Expert-Level Human Knowledge Reasoning on Humanity's Last Exam
[Chart: Pass@1 over time. Highlighted point: Tongyi DeepResearch, 38.3 Pass@1, Oct 28, 2025]
Evaluation Results
Method                 Agent Type        Date      Pass@1
Tongyi DeepResearch    DeepResearc...    2025.10   38.3
Tongyi DeepResearch    DeepResearc...    2025.10   32.9
DeepSeek-V3.1          LLM-based R...    2025.10   29.8
Gemini DeepResearch    DeepResearc...    2025.10   26.9
Kimi Researcher        DeepResearc...    2025.10   26.9
OpenAI DeepResearch    DeepResearc...    2025.10   26.6
OpenAI o3              LLM-based R...    2025.10   24.9
GLM 4.5                LLM-based R...    2025.10   21.2
Claude-4-Sonnet        LLM-based R...    2025.10   20.3
Kimi K2                LLM-based R...    2025.10   18.1
OpenAI o4-mini         LLM-based R...    2025.10   17.7