DS-STAR: Data Science Agent for Solving Diverse Tasks across Heterogeneous Formats and Open-Ended Queries

About

While large language models (LLMs) have shown promise in automating data science, existing agents often struggle with the complexity of real-world workflows that require exploring multiple sources and synthesizing open-ended insights. In this paper, we introduce DS-STAR, a specialized agent to bridge this gap. Unlike prior approaches, DS-STAR is designed to (1) seamlessly process and integrate data across diverse, heterogeneous formats, and (2) move beyond simple QA to generate comprehensive research reports for open-ended queries. Extensive evaluation shows that DS-STAR achieves state-of-the-art performance on four benchmarks: DABStep, DABStep-Research, KramaBench, and DA-Code. Most notably, it significantly outperforms existing baseline models, especially on hard-level QA tasks requiring multi-file processing, and generates high-quality data science reports that are preferred over those of the best baseline model in over 88% of cases.

Jaehyun Nam, Jinsung Yoon, Jiefeng Chen, Raj Sinha, Jinwoo Shin, Tomas Pfister • 2025

Related benchmarks

Task                              Dataset                           Result                     Rank
Data Analysis                     DABStep 2025 (easy-level)         Accuracy 87.5              12
Data Analysis                     DABStep 2025 (hard-level)         Accuracy 45.24             12
Data Discovery and Query Solving  KramaBench Original Setting 2025  Archaeology Score 25       11
Data Science                      DA-Code (test)                    Data Wrangling Score 30.4  9
Data Discovery and Query Solving  KramaBench Oracle Setting 2025    Archaeology Score 25       5
