Multi-task Evaluation on Mixed Benchmark Average
[Chart: Average Accuracy by date; top result phi-balancing at 40.86, May 14, 2026]
Updated 16d ago
Evaluation Results
| Method | Model | Date | Average Accuracy |
|---|---|---|---|
| phi-balancing | DeepSeek-V2-Lite | 2026.05 | 40.86 |
| ST-MoE | DeepSeek-V2-Lite | 2026.05 | 40.45 |
| phi-balancing | DeepSeek-MoE-Chat | 2026.05 | 36.12 |
| ST-MoE | DeepSeek-MoE-Chat | 2026.05 | 35.92 |
| Frozen checkpoint | DeepSeek-V2-Lite | 2026.05 | 35.46 |
| Frozen checkpoint | DeepSeek-MoE-Chat | 2026.05 | 33.7 |