
Helix: A Dual-Helix Co-Evolutionary Multi-Agent System for Prompt Optimization and Question Reformulation

About

Automated prompt optimization (APO) aims to improve large language model performance by refining prompt instructions. However, existing methods are largely constrained by fixed prompt templates, limited search spaces, or single-sided optimization that treats user questions as immutable inputs. In practice, question formulation and prompt design are inherently interdependent: clearer question structures facilitate focused reasoning and task understanding, while effective prompts reveal better ways to organize and restate queries. Ignoring this coupling fundamentally limits the effectiveness and adaptability of current APO approaches. We propose a unified multi-agent system (Helix) that jointly optimizes question reformulation and prompt instructions through a structured three-stage co-evolutionary framework. Helix integrates (1) planner-guided decomposition that breaks optimization into coupled question-prompt objectives, (2) dual-track co-evolution where specialized agents iteratively refine and critique each other to produce complementary improvements, and (3) strategy-driven question generation that instantiates high-quality reformulations for robust inference. Extensive experiments on 12 benchmarks against 6 strong baselines demonstrate the effectiveness of Helix, achieving up to 3.95% performance improvements across tasks with favorable optimization efficiency.
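The dual-track idea in the abstract, where the question and the prompt are refined alternately and each refinement is conditioned on the other track's current state, can be illustrated with a small sketch. This is not the authors' implementation: the names `Candidate`, `co_evolve`, `toy_refine_question`, `toy_refine_prompt`, and `toy_score` are hypothetical, the refiners are deterministic stand-ins for LLM agents, and the planner and critique stages are omitted.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Candidate:
    """A (question, prompt) pair with its evaluation score."""
    question: str
    prompt: str
    score: float


def co_evolve(
    question: str,
    prompt: str,
    refine_question: Callable[[str, str], str],
    refine_prompt: Callable[[str, str], str],
    score: Callable[[str, str], float],
    rounds: int = 3,
) -> Candidate:
    """Alternate between the two tracks, keeping a change only if it scores higher."""
    best = Candidate(question, prompt, score(question, prompt))
    for _ in range(rounds):
        # Question track: reformulate the question given the current prompt.
        new_q = refine_question(best.question, best.prompt)
        s = score(new_q, best.prompt)
        if s > best.score:
            best = Candidate(new_q, best.prompt, s)
        # Prompt track: revise the prompt given the (possibly updated) question.
        new_p = refine_prompt(best.prompt, best.question)
        s = score(best.question, new_p)
        if s > best.score:
            best = Candidate(best.question, new_p, s)
    return best


# Hypothetical stand-ins for LLM-backed agents and a benchmark evaluator.
def toy_refine_question(q: str, p: str) -> str:
    return q if "step by step" in q else q + " Explain step by step."


def toy_refine_prompt(p: str, q: str) -> str:
    return p if "careful" in p else "You are a careful solver. " + p


def toy_score(q: str, p: str) -> float:
    # Rewards each track's improvement independently (an assumption for the demo).
    return float("step by step" in q) + float("careful" in p)


result = co_evolve(
    "What is 17 * 24?",
    "Answer the question.",
    toy_refine_question,
    toy_refine_prompt,
    toy_score,
)
print(result)
```

In a real system the two refiners would call language models and the scorer would measure accuracy on held-out examples; the greedy accept-if-better rule here is only one possible selection strategy.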

Kewen Zhu, Liping Yi, Zhiming Zhao, Xiang Li, Qinghua Hu • 2026

Related benchmarks

Task                               Dataset           Result                             Rank
Reasoning                          BBH               Accuracy 80.11                     672
Mathematical Reasoning             AQUA-RAT          Accuracy 91.73                     120
Multi-task Language Understanding  MMLU & MMLU-Pro   Accuracy 77.95                     10
Reasoning                          AGIEval           AGIEval Reasoning Accuracy 48.88   10
