| Dataset Name | SOTA Method | Metric | Score | Trend | |
|---|---|---|---|---|---|
| RealHitBench | | Exact Match (EM) | 55.38 | 21 | 4d ago |
| SVAMP (test) | POET-SQL_T5 | Exact Match (EM) | 57.4 | 4 | 2d ago |
| NUPA (aggregated) | NumValue-RNN | Exact Match | 72.4 | 4 | 4d ago |
| DROP numerical reasoning Football 500 randomly sampled cases (test) | Least-to-Most | Accuracy | 63.4 | 4 | 2d ago |
| DROP numerical reasoning Non-football 500 randomly sampled cases (test) | Least-to-Most | Accuracy | 74.2 | 4 | 2d ago |
| EQUATE (test) | POET-SQL | Exact Match | 67.5 | 4 | 2d ago |
| TAT-QA (dev) | POET-SQL | Exact Match (EM) | 59.1 | 4 | 2d ago |
| HotpotQA (test) | POET-SQL | EM | 68.7 | 4 | 2d ago |
| DROP span-subset (dev) | POET-SQL | EM | 79.8 | 4 | 2d ago |
| DROP (dev) | POET-SQL_T5 | EM | 85.2 | 2 | 2d ago |