
Context Pruning for Coding Agents via Multi-Rubric Latent Reasoning

About

LLM-powered coding agents spend the majority of their token budget reading repository files, yet much of the retrieved code is irrelevant to the task at hand. Existing learned pruners compress this context with a single-objective sequence labeler, collapsing all facets of code relevance into one score and one transition matrix. We show that this formulation creates a modeling bottleneck: a single CRF transition prior must serve heterogeneous retention patterns, including contiguous semantic spans and sparse structural support lines. We propose LaMR (Latent Multi-Rubric), a structured pruning framework that decomposes code relevance into two interpretable quality dimensions, semantic evidence and dependency support, each modeled by a dedicated CRF with dimension-specific transition dynamics. A mixture-of-experts gating network dynamically weights the per-rubric emissions conditioned on the query, and a final CRF layer on the fused emissions produces the aggregate keep-or-prune decision. To supervise each dimension without additional annotation cost, we derive multi-rubric labels from the existing training corpus via AST-based program analysis, which also denoises the teacher's binary labels. Because it filters out distracting context, LaMR frequently matches or even outperforms unpruned full-context baselines. Experiments on four benchmarks (SWE-bench Verified, SWE-QA, LCC, LongCodeQA) show that LaMR wins 12 of 16 head-to-head multi-turn comparisons. It saves up to 31% more tokens on multi-turn agent tasks and improves Exact Match by up to +3.5 on single-turn tasks; where performance does drop, the drop is marginal.
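
The fusion-and-decode step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`fuse_emissions`, `viterbi`), the softmax form of the gate, the weighted-sum fusion, and all numeric scores are assumptions made for illustration. The abstract states only that a gating network weights per-rubric emissions and that a final CRF decodes the fused emissions. The per-rubric CRFs and the learned gate are replaced here by hand-written scores.

```python
import math

PRUNE, KEEP = 0, 1  # label indices used throughout this sketch


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_emissions(rubric_emissions, gate_logits):
    """Mix per-rubric emissions with softmax gate weights (assumed form).

    rubric_emissions: one entry per rubric, each a list of
        [prune_score, keep_score] pairs, one pair per code line.
    gate_logits: one logit per rubric, standing in for the output of a
        query-conditioned gating network (hypothetical).
    """
    weights = softmax(gate_logits)
    n_lines = len(rubric_emissions[0])
    return [
        [
            sum(w * em[i][label] for w, em in zip(weights, rubric_emissions))
            for label in (PRUNE, KEEP)
        ]
        for i in range(n_lines)
    ]


def viterbi(emissions, transitions):
    """Highest-scoring label sequence of a two-label linear-chain CRF.

    emissions[i][y] is the score of label y at line i;
    transitions[a][b] is the score of moving from label a to label b.
    """
    if not emissions:
        return []
    score = list(emissions[0])
    back = []
    for i in range(1, len(emissions)):
        new_score, pointers = [], []
        for y in (PRUNE, KEEP):
            cands = [score[p] + transitions[p][y] for p in (PRUNE, KEEP)]
            best = max((PRUNE, KEEP), key=lambda p: cands[p])
            new_score.append(cands[best] + emissions[i][y])
            pointers.append(best)
        score = new_score
        back.append(pointers)
    label = max((PRUNE, KEEP), key=lambda y: score[y])
    path = [label]
    for pointers in reversed(back):
        label = pointers[label]
        path.append(label)
    path.reverse()
    return path


# Toy four-line file: the semantic rubric favors lines 1-2 (a contiguous
# span), the dependency rubric favors line 0 (e.g. an import).
semantic = [[2.0, 0.0], [0.0, 2.0], [0.0, 2.0], [2.0, 0.0]]
dependency = [[0.0, 3.0], [1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
fused = fuse_emissions([semantic, dependency], gate_logits=[0.0, 0.0])
no_prior = [[0.0, 0.0], [0.0, 0.0]]
decision = viterbi(fused, no_prior)  # [1, 1, 1, 0]: keep lines 0-2, prune line 3
```

With equal gate logits both rubrics contribute equally, so the import line survives even though the semantic rubric alone would prune it. Passing a "sticky" transition matrix such as `[[0.5, -0.5], [-0.5, 0.5]]` to `viterbi` rewards contiguous runs of the same label, which is the kind of transition prior the abstract argues should differ between rubrics.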

Jingjing Wang, Xiwen Chen, Wenhui Zhu, Huayu Li, Zhengxiao He, Feiyang Cai, Ana S. Carreon-Rascon, Xuanzhao Dong, Feng Luo • 2026

Related benchmarks

Task                                     Dataset                   Result                      Rank
Long Code QA                             LongCodeQA 4× Constraint  Accuracy 61.26              8
Long Code QA                             LongCodeQA 8× Constraint  Accuracy 60                 8
Long Code Completion                     LCC 4× Constraint         Edit Similarity (ES) 61.15  8
Long Code Completion                     LCC 8× Constraint         Edit Similarity (ES) 59.21  8
Software Engineering Problem Solving     SWE-bench Verified        Rounds Taken 25.8           6
Software Engineering Question Answering  SWE-QA Reflex             Overall Score 8.15          6
Software Engineering Question Answering  SWE-QA Conan              Score 8.71                  6
Software Engineering Question Answering  SWE-QA Streamlink         Score 8.68                  6