| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| LAM benchmark | Combined Embedding | Kendall Correlation | 0.879 | 60 | 1mo ago |
| ArxivRollBench and ChatbotArena | A.R.Bench (S) - A.R.Bench (C) | Spearman Correlation Coefficient | 0.86 | 6 | 8d ago |
| Combined AIME'24 AIME'25 HMMT'25 BrUMO'25 | Bayes@1 | Kendall's tau_b (vs. Gold Standard) | 0.865 | 1 | 2mo ago |
| BrUMO'25 | Bayes_R0@1 | Kendall's tau_b (vs. Gold Standard) | 0.858 | 1 | 2mo ago |
| AIME'25 | Bayes_R0@1 | Kendall's tau_b (vs. Gold Standard) | 0.798 | 1 | 2mo ago |
| AIME'24 | Bayes_R0@1 | Kendall's tau_b (vs. Gold Standard) | 0.779 | 1 | 2mo ago |