Multi-domain reasoning on BBH
Top result: 85.74 Accuracy (phi-balancing)
[Chart: Accuracy; date shown: May 14, 2026]
Updated 16d ago
Evaluation Results
Method             Model                 Date      Accuracy
phi-balancing      Moonlight-16B-A3...   2026.05   85.74
ST-MoE             Moonlight-16B-A3...   2026.05   82
phi-balancing      DeepSeek-MoE-Chat     2026.05   73.92
ST-MoE             DeepSeek-MoE-Chat     2026.05   69.86
phi-balancing      DeepSeek-V2-Lite      2026.05   61.98
Frozen checkpoint  Moonlight-16B-A3...   2026.05   59.1
ST-MoE             DeepSeek-V2-Lite      2026.05   57.34
Frozen checkpoint  DeepSeek-V2-Lite      2026.05   35.42
Frozen checkpoint  DeepSeek-MoE-Chat     2026.05   33.07