| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| CoGym (Out-of-Distribution) | AutoMetrics | Kendall's Tau | 0.365 | 9 | 4d ago |
| RealHumanEval (Out-of-Distribution) | AutoMetrics | Kendall's Tau | 0.16 | 9 | 4d ago |
| EvalGen (Out-of-Distribution) | AutoMetrics | Kendall's Tau | 0.382 | 9 | 4d ago |
| HelpSteer2 (In-Distribution) | AutoMetrics | Kendall's Tau | 0.342 | 9 | 4d ago |
| SimpEval (In-Distribution) | AutoMetrics | Kendall's Tau | 0.321 | 9 | 4d ago |