| Dataset Name | SOTA Method | Metric | Trend | | |
|---|---|---|---|---|---|
| RealHitBench | DeepSeek-R1 | GPT Score 79.55 | 49 | 4d ago | |
| DAEval Verified | Kimi K2 Instruct | Accuracy 92.82 | 13 | 4d ago | |
| DABStep hard | | Accuracy 37.04 | 13 | 4d ago | |
| DABStep easy | | Accuracy 83.33 | 13 | 4d ago | |
| QRData Verified | Kimi K2 Instruct | Accuracy 63.68 | 13 | 4d ago | |
| DABStep 2025 (hard-level) | DS-STAR | Accuracy 45.24 | 12 | 4d ago | |
| DABStep 2025 (easy-level) | DS-STAR | Accuracy 87.5 | 12 | 4d ago | |
| DACO (TestH) | GPT-4 | Helpfulness 43.92 | 10 | 4d ago | |
| DACO (test A) | GPT-4 | Helpfulness 50.79 | 10 | 4d ago | |
| Data Analysis Benchmark | | Avg@3 45 | 5 | 4d ago | |
| T2 1.0 (test) | | Task Completion Rate 92.5 | 4 | 4d ago | |
| Data Analysis | FrugalGPT | Correctness 37.8 | 2 | 4d ago | |