| Dataset Name | SOTA Method | Metric | Score | Trend | Updated |
|---|---|---|---|---|---|
| MultiHier (test) | | Pass@1 | 70.2 | 25 | 12d ago |
| Tabular Reasoning | SCOPE | F1 Score | 74.31 | 19 | 1mo ago |
| DTR-Bench | DTR (DS-v3) | Accuracy Win Rate | 1.93 | 9 | 2mo ago |
| INFOTABS (dev) | Rethinking with retrieval | Accuracy | 84.83 | 5 | 3mo ago |
| BBH Penguins in a Table | | Accuracy | 93 | 3 | 1d ago |