From Sub-Ability Diagnosis to Human-Aligned Generation: Bridging the Gap for Text Length Control via MARKERGEN
About
Despite the rapid progress of large language models (LLMs), their length-controllable text generation (LCTG) ability remains below expectations, posing a major limitation for practical applications. Existing methods mainly focus on end-to-end training to reinforce adherence to length constraints. However, the lack of decomposition and targeted enhancement of LCTG sub-abilities restricts further progress. To bridge this gap, we conduct a bottom-up decomposition of LCTG sub-abilities with human patterns as reference and perform a detailed error analysis. On this basis, we propose MarkerGen, a simple-yet-effective plug-and-play approach that: (1) mitigates fundamental LLM deficiencies via external tool integration; (2) conducts explicit length modeling with dynamically inserted markers; (3) employs a three-stage generation scheme to better satisfy length constraints while maintaining content quality. Comprehensive experiments demonstrate that MarkerGen significantly improves LCTG across various settings, exhibiting outstanding effectiveness and generalizability.
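To make the "explicit length modeling with dynamically inserted markers" idea concrete, here is a minimal, hypothetical sketch: it interleaves remaining-budget markers into text at a fixed interval, using a simple word count as a stand-in for the external counting tool. The marker format (`[REMAIN=n]`), the interval, and the word-level counting are all illustrative assumptions, not the paper's exact implementation.

```python
def insert_length_markers(text: str, interval: int = 10) -> str:
    """Interleave [REMAIN=n] markers every `interval` words (illustrative).

    A model trained or prompted on text annotated this way can track how
    much of its length budget remains as it generates.
    """
    words = text.split()          # stand-in for an external tokenizer/counter
    total = len(words)
    out = []
    for i, word in enumerate(words):
        if i % interval == 0:
            out.append(f"[REMAIN={total - i}]")  # budget left at this point
        out.append(word)
    return " ".join(out)
```

For example, `insert_length_markers("a b c d e", interval=2)` yields `"[REMAIN=5] a b [REMAIN=3] c d [REMAIN=1] e"`, so the running budget is visible in the token stream itself rather than being modeled implicitly.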
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Summarization | GovReport | Token MAE | 41.45 | 14 |
| Biography Generation | Biographies | Token MAE | 42.91 | 14 |
| Length-Constrained Text Generation | CNN/DailyMail | Win Rate | 16.43 | 10 |
| Length-Constrained Text Generation | HANNA | Win Rate | 23 | 10 |
| Length-Constrained Text Generation | TruthfulQA | Win Rate | 36.91 | 10 |
| Length-Constrained Text Generation | Heuristic Gen. | Win Rate | 5.69 | 10 |
| Text Generation | CNN/DailyMail (test) | LCTG Error Rate (E) | 3.18 | 10 |
| Text Generation | HANNA (test) | LCTG Error Rate | 2.58 | 10 |
| Text Generation | TruthfulQA (test) | LCTG Error Rate | 2.8 | 10 |
| Text Generation | Heuristic Generation (test) | LCTG Error Rate | 5.03 | 10 |
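For reference, the length-control metrics in the table can be sketched as follows. Token MAE is the mean absolute deviation between target and generated token counts; the LCTG error rate here is assumed to be the mean relative deviation in percent (the paper's exact definition may differ).

```python
def token_mae(targets: list[int], generated: list[int]) -> float:
    """Mean absolute error between target and generated token counts."""
    assert len(targets) == len(generated)
    return sum(abs(t - g) for t, g in zip(targets, generated)) / len(targets)

def lctg_error_rate(targets: list[int], generated: list[int]) -> float:
    """Mean relative length deviation, in percent (assumed definition)."""
    assert len(targets) == len(generated)
    return 100 * sum(abs(t - g) / t for t, g in zip(targets, generated)) / len(targets)
```

Under these definitions, generating 97 tokens against a 100-token target contributes an absolute error of 3 tokens and a relative error of 3%.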