
Correlation Analysis on Reasoning Benchmark Suite (AIME, GSM8K, MMLU, GPQA)

Pearson r: 0.741

TRACE

[Chart omitted] Date: May 28, 2026

Evaluation Results

Date | Pearson r | Second metric
2026.05 | 0.741 | 0.755
2026.05 | 0.221 | 0.145
2026.05 | 0.207 | 0.244
(no date) | 0.147 | 0.186
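The headline number on this page is a Pearson correlation coefficient, r = cov(x, y) / (σx · σy), which measures how linearly two sets of scores move together (1 is a perfect positive linear relationship, 0 is none, -1 is perfectly inverse). A minimal sketch of the computation follows. The function name `pearson_r` and the sample scores are hypothetical illustrations, not data or code from this page.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations (covariance numerator) and the two sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when an input is constant")
    # The 1/n factors cancel, so the ratio of raw sums equals cov / (sigma_x * sigma_y).
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical per-model scores, e.g. a proxy metric vs. benchmark accuracy.
    proxy = [0.2, 0.4, 0.5, 0.7, 0.9]
    benchmark = [0.31, 0.45, 0.40, 0.66, 0.80]
    print(round(pearson_r(proxy, benchmark), 3))
```

On Python 3.10 or later, `statistics.correlation(xs, ys)` from the standard library computes the same Pearson coefficient, and `scipy.stats.pearsonr` additionally returns a p-value.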