Thinking Before Constraining: A Unified Decoding Framework for Large Language Models

About

Natural generation allows Large Language Models (LLMs) to produce free-form responses with rich reasoning, yet the lack of structure makes outputs difficult to verify. Conversely, constrained decoding ensures standardized formats but can inadvertently restrict reasoning capabilities by imposing constraints too early in the generation process. We propose a hybrid approach, namely In-Writing, that combines free-form reasoning and structured generation in a single call. The model first performs unconstrained reasoning and applies structured decoding only after a trigger token is generated, explicitly decoupling reasoning from formatting. We show that our trigger-token strategies virtually eliminate premature triggering, a failure mode in which constrained decoding interrupts ongoing reasoning. Evaluations across diverse datasets covering classification and reasoning tasks demonstrate that our approach outperforms the state of the art, achieving accuracy gains of up to 27% over natural generation. Our code is available at: https://github.com/Nokia-Bell-Labs/InWriting.

Ngoc Trinh Hung Nguyen, Alonso Silva, Laith Zumot, Liubov Tupikina, Armen Aghasaryan, Mehwish Alam • 2026
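
The decoding scheme described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. The trigger string `<answer>`, the scripted `toy_propose` model, and the `answer_grammar` constraint are all hypothetical stand-ins chosen for illustration. The sketch only shows the control flow: tokens are chosen freely until the trigger appears, and after that each step is restricted to tokens the structure allows.

```python
from typing import Callable, List, Sequence, Set

TRIGGER = "<answer>"  # hypothetical trigger token, not taken from the paper


def decode(
    propose: Callable[[List[str]], Sequence[str]],
    allowed_after_trigger: Callable[[List[str]], Set[str]],
    max_steps: int = 32,
) -> List[str]:
    """Greedy decoding: unconstrained until TRIGGER is emitted, constrained after."""
    out: List[str] = []
    structured: List[str] = []  # tokens emitted after the trigger
    constrained = False
    for _ in range(max_steps):
        candidates = list(propose(out))  # ranked best-first by the toy model
        if constrained:
            allowed = allowed_after_trigger(structured)
            if not allowed:  # the structured part is complete
                break
            # Keep the model's ranking, but drop tokens the structure forbids.
            candidates = [t for t in candidates if t in allowed] or sorted(allowed)
        token = candidates[0]
        out.append(token)
        if constrained:
            structured.append(token)
        elif token == TRIGGER:
            constrained = True  # switch modes only after the trigger
    return out


# Toy "model": a fixed script of ranked candidates per step (assumption).
SCRIPT = [
    ["17", "the"], ["+", "plus"], ["25", "twenty"], ["=", "is"], ["42", "41"],
    [TRIGGER, "."],
    ["probably", "42", "41"],  # the top-ranked token here violates the format
    ["!", "<end>"],
]


def toy_propose(prefix: List[str]) -> Sequence[str]:
    return SCRIPT[len(prefix)] if len(prefix) < len(SCRIPT) else ["<end>"]


def answer_grammar(structured: List[str]) -> Set[str]:
    """Hypothetical format: exactly one label, then an end marker."""
    if len(structured) == 0:
        return {"41", "42"}
    if len(structured) == 1:
        return {"<end>"}
    return set()


result = decode(toy_propose, answer_grammar)
print(" ".join(result))
```

In this toy run the reasoning tokens before `<answer>` are taken as proposed, while the off-format candidates `probably` and `!` are filtered out once the constraint is active. The sketch does not model premature triggering or the trigger-token strategies the abstract mentions.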

Related benchmarks

Task                    Dataset           Result                  Rank
----------------------  ----------------  ----------------------  ----
Mathematical Reasoning  GSM8K             Accuracy (GSM8K): 89.5  358
Logical reasoning       Shuffled Objects  Accuracy: 89.9          39
Symbolic Reasoning      Last Letter       Accuracy: 0.819         31
Image Classification    Sports            Top-1 Acc: 77.4         14
Mathematical Reasoning  GSM8K             --                      6
Spatial Reasoning       ShuffleObj        --                      6
Classification          MultiFin          Accuracy: 86.9          4
Classification          Task280           Accuracy: 74.5          4
Classification          DDXPlus           Accuracy: 50.1          4
Reasoning               GSM8K zero-shot   Accuracy: 86.9          4
Showing 10 of 12 rows
