Context Pruning for Coding Agents via Multi-Rubric Latent Reasoning

About

LLM-powered coding agents spend the majority of their token budget reading repository files, yet much of the retrieved code is irrelevant to the task at hand. Existing learned pruners compress this context with a single-objective sequence labeler, collapsing all facets of code relevance into one score and one transition matrix. We show that this formulation creates a modeling bottleneck: a single CRF transition prior must serve heterogeneous retention patterns, including contiguous semantic spans and sparse structural support lines. We propose LaMR (Latent Multi-Rubric), a structured pruning framework that decomposes code relevance into two interpretable quality dimensions, semantic evidence and dependency support, each modeled by a dedicated CRF with dimension-specific transition dynamics. A mixture-of-experts gating network dynamically weights the per-rubric emissions conditioned on the query, and a final CRF layer on the fused emissions produces the aggregate keep-or-prune decision. To supervise each dimension without additional annotation cost, we derive multi-rubric labels from the existing training corpus via AST-based program analysis, which also denoises the teacher's binary labels. By filtering distracting noise, LaMR frequently matches or even outperforms unpruned full-context baselines. Experiments on four benchmarks (SWE-Bench Verified, SWE-QA, LCC, LongCodeQA) show that LaMR wins 12 of 16 head-to-head multi-turn comparisons. It saves up to 31% more tokens on multi-turn agent tasks and improves Exact Match by up to +3.5 on single-turn tasks; any remaining performance drops are marginal.
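The fusion and decoding steps described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes the per-rubric emission scores and the gate logits are already given (in the paper they come from learned components), and the names `fuse_emissions`, `viterbi`, and all numeric values are hypothetical. It shows only how softmax gate weights could mix two rubrics' per-line emissions and how a linear-chain CRF decode then yields keep-or-prune labels.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_emissions(rubric_emissions, gate_logits):
    """Mix per-rubric emissions using softmax gate weights.

    rubric_emissions: one entry per rubric; each entry is a list over code
        lines of [prune_score, keep_score].
    gate_logits: one logit per rubric (assumed here to be given; in a real
        model they would be computed from the query).
    """
    weights = softmax(gate_logits)
    n_lines = len(rubric_emissions[0])
    return [
        [sum(w * em[t][y] for w, em in zip(weights, rubric_emissions))
         for y in (0, 1)]
        for t in range(n_lines)
    ]

def viterbi(emissions, transitions):
    """Best label path for a linear-chain CRF (0 = prune, 1 = keep)."""
    n_labels = len(transitions)
    score = list(emissions[0])
    backptr = []
    for em in emissions[1:]:
        new_score, ptr = [], []
        for y in range(n_labels):
            best_prev = max(range(n_labels),
                            key=lambda p: score[p] + transitions[p][y])
            ptr.append(best_prev)
            new_score.append(score[best_prev] + transitions[best_prev][y] + em[y])
        score = new_score
        backptr.append(ptr)
    # Trace back from the best final label.
    label = max(range(n_labels), key=lambda y: score[y])
    path = [label]
    for ptr in reversed(backptr):
        label = ptr[label]
        path.append(label)
    return path[::-1]

# Hypothetical emissions for four code lines.
semantic = [[0.0, 3.0], [0.0, 3.0], [3.0, 0.0], [3.0, 0.0]]    # favors lines 0-1
dependency = [[1.0, 0.0], [1.0, 0.0], [3.0, 0.0], [0.0, 5.0]]  # favors line 3
transitions = [[0.5, 0.0], [0.0, 0.5]]  # mild bonus for repeating a label

fused = fuse_emissions([semantic, dependency], gate_logits=[0.0, 0.0])
print(viterbi(fused, transitions))  # [1, 1, 0, 1]
```

With equal gate weights, the decode keeps both the contiguous span favored by the semantic rubric and the isolated line favored by the dependency rubric. Shifting the gate strongly toward the semantic rubric (for example `gate_logits=[10.0, -10.0]`) drops the isolated line, which is the kind of query-dependent behavior the gating is meant to provide.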

Jingjing Wang, Xiwen Chen, Wenhui Zhu, Huayu Li, Zhengxiao He, Feiyang Cai, Ana S. Carreon-Rascon, Xuanzhao Dong, Feng Luo • 2026

Related benchmarks

Task	Dataset	Metric	Result
Long Code QA	LongCodeQA 4× Constraint	Accuracy	61.26	8
Long Code QA	LongCodeQA 8× Constraint	Accuracy	60	8
Long Code Completion	LCC 4× Constraint	Edit Similarity (ES)	61.15	8
Long Code Completion	LCC 8× Constraint	Edit Similarity (ES)	59.21	8
Software Engineering Problem Solving	SWE-bench Verified	Rounds Taken	25.8	6
Software Engineering Question Answering	SWE-QA Reflex	Overall Score	8.15	6
Software Engineering Question Answering	SWE-QA Conan	Score	8.71	6
Software Engineering Question Answering	SWE-QA Streamlink	Score	8.68	6
