| Dataset Name | SOTA Method | Metric | Trend | ||
|---|---|---|---|---|---|
| RealHitBench | DeepSeek-R1 | GPT Score 79.55 | 60 | 1mo ago | |
| DABench | ReAct | Pass@1 81.32 | 21 | 1mo ago | |
| DAEval Verified | Kimi K2 Instruct | Accuracy 92.82 | 13 | 1mo ago | |
| DABStep hard | | Accuracy 37.04 | 13 | 1mo ago | |
| DABStep easy | | Accuracy 83.33 | 13 | 1mo ago | |
| QRData Verified | Kimi K2 Instruct | Accuracy 63.68 | 13 | 1mo ago | |
| DABStep 2025 (hard-level) | DS-STAR | Accuracy 45.24 | 12 | 1mo ago | |
| DABStep 2025 (easy-level) | DS-STAR | Accuracy 87.5 | 12 | 1mo ago | |
| DACO (test H) | GPT-4 | Helpfulness 43.92 | 10 | 1mo ago | |
| DACO (test A) | GPT-4 | Helpfulness 50.79 | 10 | 1mo ago | |
| Data Analysis Benchmark | | Avg@3 45 | 5 | 1mo ago | |
| QRData | DATAMIND | Pass@1 62.04 | 4 | 1mo ago | |
| T2 1.0 (test) | | Task Completion Rate 92.5 | 4 | 1mo ago | |
| Data Analysis | FrugalGPT | Correctness 37.8 | 2 | 1mo ago | |