| Dataset Name | SOTA Method | Metric | Value | Trend | |
|---|---|---|---|---|---|
| AGIEval Out-of-Domain Law (test) | FFA-LoRA | Average OOD Accuracy | 43.41 | 16 | 23d ago |
| Diplomat, Mutual, Quality, CoQA, and Qasper Out-of-Domain Average (test) | AutoMix | Score | 70.9 | 9 | 1mo ago |
| OOD Suite BBH, HumanEval, MMLU, TruthfulQA | PACE | BBH Score | 59.1 | 4 | 1mo ago |