Mathematical Reasoning on Beyond AIME (Performance %, Tokens)
[Chart: Total Tokens and Performance (%) by date; labeled point: TopoDIM, Total Tokens = 1, Apr 1, 2026]
Updated 16d ago
Evaluation Results
| Method | Backbone | Date | Total Tokens | Performance (%) |
|---|---|---|---|---|
| TopoDIM | GPT-OSS:120B | 2026.04 | 1 | - |
| GTD | GPT-OSS:120B | 2026.04 | 1.17 | - |
| G-designer | GPT-OSS:120B | 2026.04 | 1.21 | - |
| LongGraph | GPT-OSS:120B | 2026.04 | 1.29 | - |
| AutoGen | GPT-OSS:120B | 2026.04 | 1.85 | - |
| AgentFmw | GPT-OSS:120B | 2026.04 | 1.85 | - |
| GPT-Swarm | GPT-OSS:120B | 2026.04 | 2.49 | - |
| LLM-DEBATE | GPT-OSS:120B | 2026.04 | 2.68 | - |
| Lobster | GPT-OSS:120B | 2026.04 | 108 | - |
| Agent Q-Mix | GPT-OSS:120B | 2026.04 | 708 | - |