| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| Toy dataset 0% label noise (test) | SimPO | Accuracy | 99.6 | 76 | 1mo ago |
| Toy dataset 50% label noise (test) | SSPO | Accuracy | 75.7 | 24 | 1mo ago |
| Driving Simulated | Infogain | Alignment | 0.948 | 15 | 1mo ago |
| Toy dataset Noise 30% (test) | SSPO | Accuracy | 0.739 | 12 | 1mo ago |
| Toy dataset Noise 10% (test) | SSPO | Accuracy | 93.1 | 12 | 1mo ago |
| Simulated Matchmaking Environment (row-norm sampling, p_flip=0.2, T=6,400) | TK | Like Rate | 80 | 6 | 5d ago |
| Tennis (test) | C-GPM | Test AUC | 0.58 | 4 | 1mo ago |
| Pokémon (test) | GPM | Test AUC | 86 | 4 | 1mo ago |
| Chameleon (test) | GPM | Test AUC | 92 | 4 | 1mo ago |
| Synthetic (test) | GPM | Test AUC | 98 | 4 | 1mo ago |
| Robot Voice Design Simulated | Infogain | Alignment | 0.852 | 3 | 1mo ago |
| Robot Face Design Simulated | Infogain | Alignment | 96 | 3 | 1mo ago |
| Lunar Lander Simulated | Infogain | Alignment | 93.3 | 3 | 1mo ago |
| Anthropic HH-RLHF+VI Preference (test) | MC-STL | Overall Accuracy | 64 | 3 | 1mo ago |
| ML-100K (test) | | AUC | 69.5 | 2 | 1mo ago |
| UCI (test) | | AUC | 56.5 | 2 | 1mo ago |
| 3 Grades (test) | | AUC | 53.22 | 2 | 1mo ago |
| LSAT (test) | Spectral algorithm | AUC | 0.707 | 2 | 1mo ago |
| Website (test) | C-GPM | Test AUC | 0.66 | 2 | 1mo ago |