Reasoning over Large Structured Context on Hard
[Chart: ReasoningJudge Score over time. Series shown: GPT-5 + HYVE, with a data point dated Apr 7, 2026. Selectable metrics: ReasoningJudge Score, Token Usage (K), Latency (s).]
Updated 11d ago
Evaluation Results

Method           Configuration               Date      ReasoningJudge Score   Token Usage (K)   Latency (s)
GPT-5 + HYVE     Optimization=HYVE pipe...   2026.04   5                      20.3              16.96
GPT-4.1 + HYVE   Optimization=HYVE pipe...   2026.04   5                      15.1              5.43
GPT-5            Optimization=Standard...    2026.04   4.33                   80.5              19.45
GPT-4.1          Optimization=Standard...    2026.04   4.04                   75.1              8.07
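
The Evaluation Results rows above can be compared pairwise, setting each base model against its HYVE counterpart. The following is a minimal Python sketch of that arithmetic, not part of the benchmark itself. The numbers are transcribed from the results listed above; the names `RESULTS`, `relative_reduction`, and `compare` are hypothetical and introduced only for illustration.

```python
# Values transcribed from the Evaluation Results listed above.
# Tuple layout: (ReasoningJudge score, token usage in K, latency in s).
# The dictionary layout and all function names are illustrative assumptions.
RESULTS = {
    "GPT-5": {"standard": (4.33, 80.5, 19.45), "hyve": (5.0, 20.3, 16.96)},
    "GPT-4.1": {"standard": (4.04, 75.1, 8.07), "hyve": (5.0, 15.1, 5.43)},
}


def relative_reduction(baseline, value):
    """Return the fraction by which `value` is lower than `baseline`."""
    return (baseline - value) / baseline


def compare(model):
    """Compare the HYVE row against the standard row for one base model."""
    base_score, base_tokens, base_latency = RESULTS[model]["standard"]
    hyve_score, hyve_tokens, hyve_latency = RESULTS[model]["hyve"]
    return {
        "score_gain": hyve_score - base_score,
        "token_reduction": relative_reduction(base_tokens, hyve_tokens),
        "latency_reduction": relative_reduction(base_latency, hyve_latency),
    }


if __name__ == "__main__":
    for model in RESULTS:
        s = compare(model)
        print(
            f"{model}: score +{s['score_gain']:.2f}, "
            f"tokens -{s['token_reduction']:.1%}, "
            f"latency -{s['latency_reduction']:.1%}"
        )
```

Relative reduction is used rather than absolute difference so that token usage and latency, which are on different scales, can be read side by side for each model.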