DemoEvolve: Overcoming Sparse Feedback in Agentic Harness Evolution with Demonstrations

About

Agent harness evolution improves frozen language-model agents by modifying the executable structures around them. We study this paradigm as a form of sample-efficient fast adaptation: instead of updating model weights, an agent can acquire task-specific competence by changing its external harness, while leaving the base model's general capabilities intact. Prior work shows that self-generated rollouts can support harness search, suggesting that agents may acquire new task competence through practice. Yet in long-horizon stochastic environments, self-practice becomes fragile: rewards are sparse, outcomes are high-variance, and failures are hard to attribute to concrete harness mechanisms. We introduce DemoEvolve, a demonstration-bootstrapped approach to harness evolution. When reward-only search is too broad and noisy, competent human trajectories serve as expert reference experience for the coding proposer, guiding harness-level diagnosis and editing. Experiments on Liar's Dice show that self-rollout evolution can work when episodes are short and failures are attributable. In contrast, Balatro exposes a harder long-horizon stochastic regime, where self-rollout evolution is misled by sparse feedback and candidate-selection noise, while tutorial-like textual knowledge alone does not yield stable improvement. Under the same limited budget, DemoEvolve produces more effective and auditable harness edits and achieves better performance. Overall, demonstrations make sparse-feedback harness evolution more diagnosable, localizable, and stable.
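The abstract attributes part of the Balatro failure to candidate-selection noise. The toy simulation below shows that effect in isolation. It is a hypothetical illustration, not DemoEvolve's procedure or the paper's evaluation code. Several harness candidates with fixed true mean scores are each scored over a few noisy episodes, and the candidate with the highest observed mean is kept. The Gaussian noise model, the candidate scores, and names such as `wrong_pick_rate` are all assumptions made for the sketch.

```python
import random
import statistics


def evaluate(true_mean: float, n_episodes: int, noise_sd: float, rng: random.Random) -> float:
    """Noisy estimate of one candidate's score from a few high-variance episodes (toy model)."""
    return statistics.fmean(rng.gauss(true_mean, noise_sd) for _ in range(n_episodes))


def pick_best(true_means: list[float], n_episodes: int, noise_sd: float, rng: random.Random) -> int:
    """Reward-only selection: keep the candidate with the highest *observed* mean score."""
    scores = [evaluate(m, n_episodes, noise_sd, rng) for m in true_means]
    return max(range(len(true_means)), key=scores.__getitem__)


def wrong_pick_rate(true_means: list[float], n_episodes: int, noise_sd: float,
                    trials: int = 2000, seed: int = 0) -> float:
    """Fraction of selection rounds that miss the truly best candidate."""
    rng = random.Random(seed)
    best = max(range(len(true_means)), key=true_means.__getitem__)
    misses = sum(pick_best(true_means, n_episodes, noise_sd, rng) != best for _ in range(trials))
    return misses / trials


if __name__ == "__main__":
    # Hypothetical candidates whose true quality differs only slightly.
    means = [10.0, 10.5, 11.0, 11.5, 12.0]
    print("high variance, 3 episodes:", wrong_pick_rate(means, n_episodes=3, noise_sd=6.0))
    print("low variance, 3 episodes: ", wrong_pick_rate(means, n_episodes=3, noise_sd=1.0))
```

With high per-episode variance and only a few episodes per candidate, the selection step misses the truly best candidate far more often than when variance is low or episodes are plentiful. Reward-only harness search has to contend with this kind of noise, and demonstrations are meant to reduce it.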

Lirong Che, Yuzhe Yang, Peiwen Lin, Chuang Wang, Xueqian Wang, Jian Su • 2026

Related benchmarks

Task | Dataset | Result | Rank
Game Playing | Balatro In-distribution seeds | Capped Mean Final Round 24 | 16
Game Playing | Balatro Out-of-distribution seeds | Capped Mean Final Round 24 | 12
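One plausible reading of the "Capped Mean Final Round" metric, assumed here and not taken from the paper, is the mean over evaluation seeds of each run's final round, with every run clipped at a cap. The sketch uses a hypothetical cap of 24, the number of blinds in a standard eight-ante Balatro run (three per ante). Under that cap, runs that continue into endless mode do not inflate the average. The function name `capped_mean_final_round` is illustrative.

```python
from statistics import fmean


def capped_mean_final_round(final_rounds: list[int], cap: int = 24) -> float:
    """Mean final round over runs, clipping each run at `cap`.

    Assumed definition for illustration; the cap of 24 is a hypothetical choice
    (8 antes x 3 blinds in a standard Balatro run).
    """
    if not final_rounds:
        raise ValueError("need at least one run")
    return fmean(min(r, cap) for r in final_rounds)


if __name__ == "__main__":
    # The run reaching round 30 counts as 24, so the mean is (10 + 18 + 24) / 3, about 17.33.
    print(capped_mean_final_round([10, 18, 30]))
```

Clipping each run before averaging, rather than clipping the mean itself, keeps a single very long run from dominating the score across seeds.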
