Multi-task Reasoning on BigBench Hard
Top score: 31.1 (GPT-4o), Aug 8, 2025
Updated 3mo ago
Evaluation Results
Method            Variant          Links    Score  Rank
GPT-4o            Date=2024-11-20  2025.08  31.1   1
Sonnet 3.5        Date=2024-10-22  2025.08  30.4   2
Haiku 3.5         -                2025.08  30.1   3
Gemini 2.0 Flash  -                2025.08  28.7   4
Haiku 3           -                2025.08  8.2    5