Multi-task Language Understanding on MMLU-Pro (Performance and Token Count)
[Chart: Performance over time (alternate view: Total Tokens). Best result: Agent Q-Mix, 92.86, Apr 1, 2026.]
Evaluation Results
| Method | Backbone | Date | Performance | Total Tokens |
|---|---|---|---|---|
| Agent Q-Mix | GPT-OSS:120B | 2026.04 | 92.86 | 112 |
| LongGraph | GPT-OSS:120B | 2026.04 | 92.86 | 1.25 |
| TopoDIM | GPT-OSS:120B | 2026.04 | 88.57 | 1.14 |
| Lobster | GPT-OSS:120B | 2026.04 | 87.14 | 97 |
| AutoGen | GPT-OSS:120B | 2026.04 | 87.14 | 1 |
| GTD | GPT-OSS:120B | 2026.04 | 85.71 | 1.02 |
| G-Designer | GPT-OSS:120B | 2026.04 | 84.29 | 1.05 |
| GPTSwarm | GPT-OSS:120B | 2026.04 | 81.43 | 2.17 |
| AgentFmw | GPT-OSS:120B | 2026.04 | 80 | 471 |
| LLM-Debate | GPT-OSS:120B | 2026.04 | 79 | 2.71 |