
Multi-task Language Understanding on MMLU-Pro (Performance and Token Count)

Top performance: 92.86 (Agent Q-Mix)


Evaluation Results

Method	Date	Performance	Token Count
Agent Q-Mix	2026.04	92.86	112
LongGraph	2026.04	92.86	1.25
TopoDIM	2026.04	88.57	1.14
Lobster	2026.04	87.14	97
AutoGen	2026.04	87.14	1
GTD	2026.04	85.71	1.02
G-Designer	2026.04	84.29	1.05
GPTSwarm	2026.04	81.43	2.17
AgentFmw	2026.04	80	471
LLM-Debate	2026.04	79	2.71