| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| OTT-QA (test) | Sparse Lexical Repr. (BM25) | Recall@10 | 96.7 | 27 | 15d ago |
| Spider Lite 2.0 (test) | JAR (w. Contriever) | Precision | 29.4 | 20 | 14d ago |
| BIRD union (test) | JAR (w. Contriever) | Precision | 54.4 | 20 | 14d ago |
| Spider union (test) | ATR (w. Contriever) | Precision (P) | 69.6 | 20 | 14d ago |
| NQ-Tables full (966 queries) (test) | CRAFT Stage 3 (large) | Recall@1 | 49.84 | 20 | 1mo ago |
| FetaQA | PIPER | R@1 | 40.1 | 19 | 15d ago |
| OTTQA | CGPT | R@1 | 86.86 | 19 | 15d ago |
| E2E-WTQ | CGPT | R@1 | 72.2 | 15 | 3mo ago |
| Mimo en | CGPT | R@1 | 60.13 | 15 | 3mo ago |
| Mimo ch | CGPT | R@1 | 56.8 | 15 | 3mo ago |
| FetaQA original (test) | PIPER | Recall@10 | 78.4 | 12 | 15d ago |
| Spider 2.0 | RankGPT-5-mini | Precision | 36.6 | 11 | 14d ago |
| BIRD | RankGPT-5-mini | Precision (P) | 57.3 | 11 | 14d ago |
| Spider | ATR | Precision | 69.6 | 11 | 14d ago |
| Average | STAR | R@1 | 51.86 | 11 | 3mo ago |
| BIRD | DCTR | Capped Recall@25 | 99.1 | 9 | 2mo ago |
| FIBEN | stella | Capped Recall@25 | 58.4 | 9 | 2mo ago |
| BEAVER | DCTR | Capped Recall@25 | 43.5 | 9 | 2mo ago |
| FIBEN | Baseline | Recall@10 | 41.3 | 9 | 2mo ago |
| BEAVER | DCTR | Recall@10 | 32.6 | 9 | 2mo ago |
| NQ | BGE-M3 | Recall@1 (Base) | 33 | 8 | 1mo ago |
| WikiSQL | SPLADE | Recall@1 (Base) | 52 | 8 | 1mo ago |
| WTQ | SPLADE | Recall@1 (Base) | 44 | 8 | 1mo ago |
| Beaver (dev) | GraphER-GCS | PR@5 | 17 | 8 | 2mo ago |
| Bird (dev) | GraphER-PPR | Precision@5 | 86.2 | 8 | 2mo ago |