
The Path Matters: Learning a Token-Commitment Policy for Diffusion Language Models

About

Diffusion large language models promise faster generation by refining many token positions in parallel, but this parallelism introduces a hidden control problem: which proposed tokens should be transferred into the partially decoded sequence at each step? We refer to this decision as token commitment. Existing frozen-generator decoders largely rely on hand-designed confidence rules or block-specific acceptance filters. We argue that token commitment can instead be learned as a reusable trace-state policy. We introduce TraceLock, a lightweight plug-in controller that instantiates this policy for a frozen diffusion language model. Since oracle commitment times are unavailable, TraceLock derives self-supervision from future stability: at decoding step t, a proposed token for position i is labeled stable if it matches the final token at position i after the full decoding trace completes. The controller scores variable-length trace states and decides which active token proposals should be committed to the partially decoded sequence. Once trained for a given frozen backbone, the controller can be deployed across local-window widths, generation lengths, and step budgets without retraining or per-setting calibration. Experiments on question answering, mathematical reasoning, and code generation show that TraceLock improves the quality-step tradeoff over heuristic and learned baselines, with particularly stable behavior under cross-setting deployment. Diagnostic analyses show that its decisions are not reducible to scalar confidence, suggesting that frozen diffusion language models expose a learnable space of commitment trajectories beyond confidence-based decoding. Code is available at https://github.com/BobSun98/TraceLock.
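The future-stability labeling rule described in the abstract can be illustrated with a minimal sketch. The data layout here (a list of per-step proposal lists, with `None` marking positions that are not active at that step) and the function name `future_stability_labels` are assumptions made for illustration. They are not taken from the TraceLock codebase, which may represent traces differently.

```python
from typing import List, Optional


def future_stability_labels(
    proposals: List[List[Optional[int]]],
    final_tokens: List[int],
) -> List[List[Optional[int]]]:
    """Label each proposed token as stable (1) or unstable (0).

    proposals[t][i] is the token id proposed at decoding step t for
    position i, or None if position i has no active proposal at step t
    (hypothetical layout, assumed for this sketch).
    final_tokens[i] is the token at position i once the trace completes.
    """
    labels: List[List[Optional[int]]] = []
    for t, step in enumerate(proposals):
        if len(step) != len(final_tokens):
            raise ValueError(f"step {t} has {len(step)} positions, "
                             f"expected {len(final_tokens)}")
        # A proposal is stable if it already equals the final token.
        labels.append([
            None if tok is None else int(tok == final_tokens[i])
            for i, tok in enumerate(step)
        ])
    return labels


# Toy trace: 3 decoding steps over 3 positions, with made-up token ids.
trace = [
    [5, 9, 2],
    [5, 7, None],
    [None, 7, 3],
]
final = [5, 7, 3]
labels = future_stability_labels(trace, final)
```

In this toy trace, position 0 is already stable at the first step, while positions 1 and 2 only match the final sequence later. Labels of this kind could serve as binary training targets for a controller that scores trace states, though the actual training objective is not specified in the abstract.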

Bohang Sun, Max Zhu, Francesco Caso, Jindong Gu, Junchi Yu, Philip Torr, Pietro Liò, Jialin Yu • 2026

Related benchmarks

Task                    Dataset            Metric                       Result  Rank
Mathematical Reasoning  GSM8K              Accuracy (Acc)               80      337
Mathematical Reasoning  MATH               Accuracy (%)                 80.8    52
Code Generation         Coding             Pass@1                       56.1    40
Question Answering      QA                 Average Rank                 1.73    40
Generation              Math Domain        Average Generation Time (s)  2.96    40
Generation              QA domain          Average Generation Time (s)  2.87    40
Generation              Coding domain      Average Wall-Clock Time (s)  5.03    40
Question Answering      Natural Questions  Average Rank                 2.6     10