
Multi-task Evaluation on Mixed Benchmark Average

Average Accuracy: 40.86

phi-balancing

[Chart: Average Accuracy over time; y-axis ticks 33.4136, 35.3468, 37.28, 39.2132; x-axis label May 14, 2026]
Updated 16d ago

Evaluation Results

Date       Average Accuracy
2026.05    40.86
2026.05    40.45
2026.05    36.12
2026.05    35.92
2026.05    35.46
2026.05    33.7