| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| Anthropic HH Harmless (test) | Few-Shot | Accuracy | 71.7 | 22 | 1mo ago |
| Sushi (test) | | Error Rate | 35.2 | 7 | 11d ago |
| WebGPT comparisons (test) | UMM-RM | Accuracy | 60.8 | 7 | 1mo ago |
| Anthropic HH Helpful (test) | UMM-RM | Accuracy | 57.6 | 7 | 1mo ago |
| Math Reasoning (test) | BTPO | Classification Accuracy | 85.4 | 4 | 1mo ago |
| Instruction Following (IF) (test) | BTPO | Accuracy | 61.4 | 4 | 1mo ago |
| Helpfulness & Harmlessness (HH) (test) | BTPO | Classification Accuracy | 70.4 | 4 | 1mo ago |