
Thinking Before Constraining: A Unified Decoding Framework for Large Language Models

About

Natural generation allows large language models (LLMs) to produce free-form responses with rich reasoning, but the lack of guaranteed structure makes outputs difficult to parse or verify. Structured generation, or constrained decoding, addresses this drawback by producing content in standardized formats such as JSON, ensuring consistent, guaranteed-parsable outputs, but it can inadvertently restrict the model's reasoning capabilities. In this work, we propose a simple approach that combines the advantages of both natural and structured generation. By allowing LLMs to reason freely until specific trigger tokens are generated, and only then switching to structured generation, our method preserves the expressive power of natural-language reasoning while ensuring the reliability of structured outputs. We evaluate our approach on several datasets covering both classification and reasoning tasks, achieving gains of up to 27% in accuracy over natural generation while requiring an overhead of only 10-20 extra tokens.
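The switch from free reasoning to constrained decoding can be sketched as follows. This is a minimal, illustrative toy: the stand-in model, the `ANSWER:` trigger string, and the set of allowed answer tokens are assumptions for demonstration, not the paper's actual implementation (which would mask logits of a real LM).

```python
# Sketch of "think freely, then constrain" decoding.
# toy_model, the trigger, and the allowed-token set are
# illustrative assumptions, not the paper's implementation.

def toy_model(prefix, allowed=None):
    """Stand-in for one LM decoding step: returns the next token.
    When `allowed` is given (constrained phase), disallowed tokens
    are masked and we fall back to the first allowed token."""
    script = ["The", "sum", "is", "clear.", "ANSWER:", '{"label":', '"yes"}']
    step = len(prefix)
    token = script[step] if step < len(script) else "<eos>"
    if allowed is not None and token not in allowed:
        token = allowed[0]  # mask: force an allowed token
    return token

def generate(max_tokens=16, trigger="ANSWER:",
             allowed=('{"label":', '"yes"}', '"no"}', "<eos>")):
    """Decode freely until `trigger` appears, then switch to
    constrained decoding over `allowed` tokens only."""
    out, constrained = [], False
    for _ in range(max_tokens):
        token = toy_model(out, allowed=list(allowed) if constrained else None)
        out.append(token)
        if token == "<eos>":
            break
        if token == trigger:
            constrained = True  # switch to structured generation
    return out

tokens = generate()
```

In a real setup the same two-phase loop would wrap an LM's `generate` step, with the constrained phase enforced by a logits mask (e.g. a grammar- or JSON-schema-based processor) rather than a fixed token list.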

Ngoc Trinh Hung Nguyen, Alonso Silva, Laith Zumot, Liubov Tupikina, Armen Aghasaryan, Mehwish Alam • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Mathematical Reasoning | GSM8K | Accuracy (GSM8K) | 89.5 | 358 |
| Symbolic Reasoning | Last Letter | Accuracy | 0.819 | 21 |
| Logical Reasoning | Shuffled Objects | Accuracy | 89.9 | 19 |
| Image Classification | Sports | Top-1 Acc | 77.4 | 14 |
| Mathematical Reasoning | GSM8K | -- | -- | 6 |
| Spatial Reasoning | ShuffleObj | -- | -- | 6 |
| Classification | MultiFin | Accuracy | 86.9 | 4 |
| Classification | Task280 | Accuracy | 74.5 | 4 |
| Classification | DDXPlus | Accuracy | 50.1 | 4 |
| Reasoning | GSM8K zero-shot | Accuracy | 86.9 | 4 |

(Showing 10 of 12 rows)
