DS-STAR: Data Science Agent for Solving Diverse Tasks across Heterogeneous Formats and Open-Ended Queries
About
While large language models (LLMs) have shown promise in automating data science, existing agents often struggle with the complexity of real-world workflows that require exploring multiple sources and synthesizing open-ended insights. In this paper, we introduce DS-STAR, a specialized agent to bridge this gap. Unlike prior approaches, DS-STAR is designed to (1) seamlessly process and integrate data across diverse, heterogeneous formats, and (2) move beyond simple QA to generate comprehensive research reports for open-ended queries. Extensive evaluation shows that DS-STAR achieves state-of-the-art performance on four benchmarks: DABStep, DABStep-Research, KramaBench, and DA-Code. Most notably, it significantly outperforms existing baselines, especially on hard-level QA tasks that require multi-file processing, and it generates high-quality data science reports that are preferred over those of the best baseline in over 88% of cases.
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Data Analysis | DABStep 2025 (easy-level) | Accuracy | 87.5 | 12 |
| Data Analysis | DABStep 2025 (hard-level) | Accuracy | 45.24 | 12 |
| Data Discovery and Query Solving | KramaBench Original Setting 2025 | Archaeology Score | 25 | 11 |
| Data Science | DA-Code (test) | Data Wrangling Score | 30.4 | 9 |
| Data Discovery and Query Solving | KramaBench Oracle Setting 2025 | Archaeology Score | 25 | 5 |