DS-STAR: Data Science Agent for Solving Diverse Tasks across Heterogeneous Formats and Open-Ended Queries

About

While large language models (LLMs) have shown promise in automating data science, existing agents often struggle with real-world workflows that require exploring multiple data sources and synthesizing open-ended insights. In this paper, we introduce DS-STAR, a specialized agent designed to bridge this gap. Unlike prior approaches, DS-STAR is designed to (1) seamlessly process and integrate data across diverse, heterogeneous formats, and (2) go beyond simple question answering (QA) to generate comprehensive research reports for open-ended queries. Extensive evaluation shows that DS-STAR achieves state-of-the-art performance on four benchmarks: DABStep, DABStep-Research, KramaBench, and DA-Code. Most notably, it significantly outperforms existing baselines, especially on hard-level QA tasks that require multi-file processing, and it generates high-quality data science reports that are preferred over those of the best baseline in over 88% of cases.

Jaehyun Nam, Jinsung Yoon, Jiefeng Chen, Raj Sinha, Jinwoo Shin, Tomas Pfister • 2025
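
The abstract does not say how DS-STAR handles heterogeneous formats, so the following is only an illustrative sketch under our own assumptions, not the authors' method. It shows the general idea of turning a mix of files (CSV, JSON, plain text) into one short text context that an LLM-based agent could plan over. The helper names `describe_file` and `build_context` are hypothetical, and only the Python standard library is used.

```python
import csv
import json
from pathlib import Path


def describe_file(path: Path) -> str:
    """Return a short, format-aware summary of one data file (hypothetical helper)."""
    suffix = path.suffix.lower()
    if suffix == ".csv":
        with path.open(newline="") as f:
            rows = list(csv.reader(f))
        # First row is assumed to be the header; the rest are data rows.
        header, body = (rows[0], rows[1:]) if rows else ([], [])
        return f"{path.name}: CSV, columns={header}, rows={len(body)}"
    if suffix == ".json":
        with path.open() as f:
            data = json.load(f)
        if isinstance(data, dict):
            return f"{path.name}: JSON object, keys={sorted(data)}"
        if isinstance(data, list):
            return f"{path.name}: JSON array, length={len(data)}"
        return f"{path.name}: JSON scalar, type={type(data).__name__}"
    # Fallback: treat any other file as plain text and report its line count.
    text = path.read_text(errors="replace")
    return f"{path.name}: text, lines={len(text.splitlines())}"


def build_context(paths: list[Path]) -> str:
    """Join per-file summaries into one context string (hypothetical helper)."""
    return "\n".join(describe_file(p) for p in paths)
```

In this sketch, a directory containing `sales.csv`, `config.json`, and `notes.md` would be reduced to three one-line summaries; a real agent would likely extract far richer context (types, sample values, schemas) per format.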

Related benchmarks

Task	Dataset	Metric	Result	
Data Analysis	DABStep 2025 (easy-level)	Accuracy	87.5	12
Data Analysis	DABStep 2025 (hard-level)	Accuracy	45.24	12
Data Discovery and Query Solving	KramaBench Original Setting 2025	Archaeology Score	25	11
Data Science	DA-Code (test)	Data Wrangling Score	30.4	9
Data Discovery and Query Solving	KramaBench Oracle Setting 2025	Archaeology Score	25	5
