LLM-judge evaluation

Benchmarks

Dataset Name	SOTA Method	Metric	Trend
AXBENCH	SPLIT	Concept Score 92.5	22	5mo ago
LLM-to-LLM Evaluation Reference: GPT-5.2		Global Correlation (r) 0.84	2	4mo ago
