
Multi-task Evaluation on Mixed Benchmark Average

Top entry: phi-balancing, 40.86 average accuracy

Updated 2 months ago

Evaluation Results

Method	Date	Average Accuracy
phi-balancing	2026.05	40.86
ST-MoE	2026.05	40.45
phi-balancing	2026.05	36.12
ST-MoE	2026.05	35.92
Frozen checkpoint	2026.05	35.46
Frozen checkpoint	2026.05	33.7
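To make the margins between entries explicit, here is a minimal sketch that ranks the rows and reports each entry's gap to the top score in accuracy points. It reuses only the numbers listed in the table. The helper name `gaps_to_best` is hypothetical, and the sketch assumes every score is an accuracy on the same scale, so a plain subtraction is meaningful.

```python
# Leaderboard rows transcribed from the table: (method, date, average accuracy).
ROWS = [
    ("phi-balancing", "2026.05", 40.86),
    ("ST-MoE", "2026.05", 40.45),
    ("phi-balancing", "2026.05", 36.12),
    ("ST-MoE", "2026.05", 35.92),
    ("Frozen checkpoint", "2026.05", 35.46),
    ("Frozen checkpoint", "2026.05", 33.7),
]


def gaps_to_best(rows):
    """Hypothetical helper: return (rank, method, score, gap) tuples sorted by score.

    gap is the best score minus this entry's score, rounded to 2 decimals
    to hide floating-point noise from the subtraction.
    """
    ranked = sorted(rows, key=lambda row: row[2], reverse=True)
    best = ranked[0][2]
    return [
        (rank, method, score, round(best - score, 2))
        for rank, (method, _date, score) in enumerate(ranked, start=1)
    ]


for rank, method, score, gap in gaps_to_best(ROWS):
    print(f"{rank}. {method:<18} {score:6.2f}  -{gap:.2f}")
```

For example, the second entry (ST-MoE, 40.45) trails the top entry by 0.41 points, while the lowest entry (Frozen checkpoint, 33.7) trails it by 7.16 points.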