| Dataset Name | SOTA Method | Metric | Trend | Updated | |
|---|---|---|---|---|---|
| EHR SQL closed-weight models | SkillTrojan | Accuracy 90.7 | 35 | 1mo ago | |
| BacktestBench synthetic (test) | Qwen3 235B | Execution Correctness Rate (ECR) 100 | 23 | 15d ago | |
| BIRD Original (dev) | Qwen3-Max | Execution Accuracy (Simple) 65.51 | 14 | 3mo ago | |
| BIRD Verified | Qwen3-Max | Execution Accuracy (Simple) 69.41 | 14 | 3mo ago | |
| Bird | APMPO | Pass@1 60.6 | 11 | 27d ago | |
| Boundary-Aware NL2SQL Synthesized Corpus | Qwen3-1.7B (Base) | Dim. Deg. 10.17 | 8 | 3mo ago | |
| Spider | FREIA | Pass@1 74.4 | 6 | 27d ago | |
| Spider | APMPO | Pass@1 76.4 | 5 | 27d ago | |
| Spider SQL 2.0 | | Execution Success Rate 72 | 4 | 1mo ago | |