Complex Reasoning on GAIA Text
Chart: Accuracy by date. Top result: OpenAI-GPT-5-high, 76.4 (Apr 20, 2026).
Evaluation Results
Method                    Method details              Date      Accuracy
OpenAI-GPT-5-high         Context Window=128k, M...   2026.04   76.4
Minimax-M2                Context Window=128k, M...   2026.04   75.7
GLM-4.6                   Context Window=128k, M...   2026.04   71.9
LiteResearcher-4B         Context Window=128k, M...   2026.04   71.3
Claude-4.5-Sonnet         Context Window=128k, M...   2026.04   71.2
Tongyi DeepResearch 30B   Context Window=128k, M...   2026.04   70.9
Claude-4-Sonnet           Context Window=128k, M...   2026.04   68.3
Mirothinker 8B            Context Window=128k, M...   2026.04   66.4
SFR-DeepResearch          Context Window=128k, M...   2026.04   66
AgentCPM-Explore-4B       Context Window=128k, M...   2026.04   63.9
DeepSeek-V3.2             Context Window=128k, M...   2026.04   63.5
DeepSeek-V3.1             Context Window=128k, M...   2026.04   63.1
Kimi-K2-0905              Context Window=128k, M...   2026.04   60.2
ASearcher QWQ v2          Context Window=128k, M...   2026.04   58.7
DeepMiner-32B             Context Window=128k, M...   2026.04   58.7
AFM-RL-32B                Context Window=128k, M...   2026.04   55.3
WebSailor 30B             Context Window=128k, M...   2026.04   53.2
WebDancer (QwQ)           Context Window=128k, M...   2026.04   51.5
WebExplorer-8B            Context Window=128k, M...   2026.04   50