| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| USACO | RankSVM (RBF) | Spearman Rho | 0.78 | 6 | 15d ago |
| SuperGPQA | RankSVM (linear) | Spearman Rank Correlation (rho) | 0.88 | 6 | 15d ago |
| MMLU-Pro | RankSVM (linear) | Spearman Rank Correlation | 0.88 | 6 | 15d ago |
| HMMT | RankSVM (RBF) | Spearman Rho | 0.76 | 6 | 15d ago |
| GPQA | RankSVM (linear) | Spearman's Rho | 0.87 | 6 | 15d ago |
| AIME | RankSVM (RBF) | Spearman Rho | 0.75 | 6 | 15d ago |
| Reasoning Tasks Aggregate | RankSVM (RBF) | Spearman Rho | 0.81 | 6 | 15d ago |
| UltraFeedback 13B+ Models Holdout (test) | BENCHALIGN | Pairwise Accuracy (RM1_Honest) | 74.8 | 4 | 3mo ago |
| UltraFeedback 30B+ Models Holdout (test) | BENCHALIGN | Pairwise Acc (RM1_Honest) | 77.3 | 4 | 3mo ago |
| UltraFeedback 70B+ Models Holdout (test) | BENCHALIGN | Pairwise Acc (RM1_Honest) | 77.4 | 4 | 3mo ago |
| Helpsteer 13B+ Models Holdout (test) | BENCHALIGN | Acc_pair (RM1 Helpful) | 74.1 | 4 | 3mo ago |
| Helpsteer 30B+ Models Holdout (test) | BENCHALIGN | Pairwise Accuracy (RM1) | 76.5 | 4 | 3mo ago |
| Helpsteer 70B+ Models Holdout (test) | BENCHALIGN | Pairwise Acc (RM1) | 77.8 | 4 | 3mo ago |