Complex Reasoning on Frames
[Chart: Accuracy over time. Highlighted point: Tongyi DeepResearch 30B, 90.6 Accuracy; date shown: Apr 20, 2026.]
Updated 1mo ago
Evaluation Results
| Method | Details | Date | Accuracy |
|---|---|---|---|
| Tongyi DeepResearch 30B | Context Window=128k, M... | 2026.04 | 90.6 |
| Claude-4.5-Sonnet | Context Window=128k, M... | 2026.04 | 85 |
| DeepSeek-V3.1 | Context Window=128k, M... | 2026.04 | 83.7 |
| LiteResearcher-4B | Context Window=128k, M... | 2026.04 | 83.1 |
| SFR-DeepResearch | Context Window=128k, M... | 2026.04 | 82.8 |
| AgentCPM-Explore-4B | Context Window=128k, M... | 2026.04 | 82.7 |
| Claude-4-Sonnet | Context Window=128k, M... | 2026.04 | 80.7 |
| Mirothinker 8B | Context Window=128k, M... | 2026.04 | 80.6 |
| DeepSeek-V3.2 | Context Window=128k, M... | 2026.04 | 80.2 |
| Kimi-Researcher | Context Window=128k, M... | 2026.04 | 78.8 |
| WebExplorer-8B | Context Window=128k, M... | 2026.04 | 75.7 |
| ASearcher QWQ v2 | Context Window=128k, M... | 2026.04 | 74.5 |
| Kimi-K2-0905 | Context Window=128k, M... | 2026.04 | 58.1 |