Reasoning over Large Structured Context

Benchmarks

Dataset Name	SOTA Method	Metric	Score
Hard	GPT-5 + HYVE	ReasoningJudge Score	5
RB-JSON	GPT-5 + HYVE	ReasoningJudge Score	4.92
Canvas	(not listed)	ReasoningJudge Score	4.96
Anom	GPT-4.1 + HYVE	ReasoningJudge Score	4.03