Agentic Reasoning on HLE
Top Overall Score: 41.6 (ChatGPT-Agent)
[Chart: Overall Score and Accuracy (Text Only); date shown: Feb 9, 2026]
Updated 4d ago
Evaluation Results
| Method | Base Model | Date | Overall Score | Accuracy (Text Only) |
|---|---|---|---|---|
| ChatGPT-Agent | - | 2026.02 | 41.6 | - |
| InternAgent-1.5 | Gemini-3-pr... | 2026.02 | 40 | 40.87 |
| InternAgent-1.5 | o4-mini | 2026.02 | 34.52 | 36.1 |
| Kimi-Researcher | - | 2026.02 | 26.9 | - |
| Gemini DR | - | 2026.02 | 26.9 | - |
| OpenAI DR | - | 2026.02 | 26.6 | - |
| InternAgent-1.5 | Qwen3-235B | 2026.02 | 14.84 | 15.04 |
| MiroThinker | MiroThinker... | 2026.02 | - | 31 |
| MiroThinker | MiroThinker... | 2026.02 | - | 39.2 |
| Tongyi-DR | Tongyi-DR-30B | 2026.02 | - | 32.9 |