Reasoning over Large Structured Context on Canvas
[Chart: ReasoningJudge Score; GPT-5 at 4.96, data point dated Apr 7, 2026. Selectable metrics: ReasoningJudge Score, Token Usage (M), Latency (s).]
Updated 11d ago
Evaluation Results
Method          Method details            Links    ReasoningJudge Score  Token Usage (M)  Latency (s)
GPT-5           Optimization=Standard...  2026.04  4.96                  122.8            10.45
GPT-5 + HYVE    Optimization=HYVE pipe... 2026.04  4.96                  38.2             8.99
GPT-4.1 + HYVE  Optimization=HYVE pipe... 2026.04  4.95                  35.1             3
GPT-4.1         Optimization=Standard...  2026.04  4.94                  123.4            3.22