Agentic Reasoning on FRAMES n=50 (full)
[Chart: Accuracy by date. Best result: GPT-5, 77.31 (Dec 7, 2025). Selectable metrics: Accuracy, Accuracy 95% CI, Between-Group Variance, ICC.]
Updated 4d ago
Evaluation Results
| Method | Settings | Date | Accuracy | Accuracy 95% CI | Between-Group Variance | ICC |
|---|---|---|---|---|---|---|
| GPT-5 | Web search=true, Trial... | 2025.12 | 77.31 | - | 0.088 | 0.496 |
| Claude 4.5 Haiku | Web search=true, Trial... | 2025.12 | 68.37 | - | 0.144 | 0.655 |
| Claude 4.5 Sonnet | Web search=true, Trial... | 2025.12 | 66.44 | - | 0.156 | 0.689 |
| GPT-4o | Web search=true, Trial... | 2025.12 | 63.54 | - | 0.174 | 0.735 |
| Gemini 2.5 Pro | Web search=false, Tria... | 2025.12 | 62.34 | - | 0.174 | 0.713 |
| Deepseek-v3p1 | Web search=false, Tria... | 2025.12 | 44.75 | - | 0.157 | 0.663 |
| GPT-4o | Web search=false, Tria... | 2025.12 | 38.16 | - | 0.171 | 0.712 |
| Qwen3-235b-a22b | Web search=false, Tria... | 2025.12 | 34.22 | - | 0.169 | 0.617 |
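
The page does not define the Between-Group Variance and ICC columns. As a rough illustration only, the sketch below shows one common way such quantities are estimated when each question is attempted over several trials: a one-way random-effects ANOVA with questions as groups. The function name `icc_oneway`, the 0/1 scoring, and the equal number of trials per question are all assumptions made for this example, not details taken from the benchmark.

```python
from statistics import mean


def icc_oneway(scores):
    """Estimate between-group variance and ICC(1) from repeated trials.

    `scores` is a list with one entry per question (hypothetical layout);
    each entry is a list of k trial outcomes, e.g. 1 = correct, 0 = wrong.
    Returns (between_group_variance, icc).
    """
    n = len(scores)
    k = len(scores[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need >= 2 questions, each with the same k >= 2 trials")

    grand_mean = mean(x for row in scores for x in row)
    row_means = [mean(row) for row in scores]

    # Mean square between questions and mean square within questions.
    msb = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    msw = sum(
        (x - m) ** 2 for row, m in zip(scores, row_means) for x in row
    ) / (n * (k - 1))

    # Variance component attributed to questions; clipped at zero because
    # the ANOVA estimator can go slightly negative on noisy data.
    between_var = max((msb - msw) / k, 0.0)
    total_var = between_var + msw
    icc = between_var / total_var if total_var > 0 else 0.0
    return between_var, icc


# Four questions, three trials each: outcomes never vary within a question,
# so all of the variance is between questions.
stable = [[1, 1, 1], [0, 0, 0], [1, 1, 1], [0, 0, 0]]
bv_stable, icc_stable = icc_oneway(stable)

# Every question behaves identically: no variance between questions.
noisy = [[1, 0], [1, 0], [1, 0]]
bv_noisy, icc_noisy = icc_oneway(noisy)
```

Under this reading, an ICC near 1 would mean a model's outcome is determined mostly by which question it is asked, while an ICC near 0 would mean the outcome varies mostly from trial to trial on the same question.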