
Structured Style-Rewrite with Chain-of-Thought Planning for Low-Resource Character Dialogue

About

Applying Small Language Models (SLMs) to Chinese-language, character-driven generation remains challenging because of data scarcity and the difficulty of disentangling character style. Standard Supervised Fine-Tuning (SFT) often captures surface-level semantics but frequently produces Out-Of-Character (OOC) outputs. We frame the problem as a controlled sentence-level style rewriting task, which isolates stylistic quality from dialogue-context management. We propose a Structured Style-Rewrite Framework that decomposes character style into interpretable format-signature, syntactic, and pragmatic dimensions, combined with Chain-of-Thought (CoT) supervision for explicit style planning. A CoT-Shared Direct Preference Optimization (DPO) stage further aligns style planning with surface realization, ensuring that preference learning targets output-level style execution rather than differences between reasoning traces. Experiments on eight characters from four diverse source domains show that our method enables a Qwen3-1.7B model to reach a Valid Style Score of 0.632 with strong semantic fidelity (0.878), placing it on the Pareto frontier among the evaluated systems and outperforming significantly larger baselines (e.g., GLM-4.7) while running on consumer hardware.
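One reading of "CoT-Shared" DPO, assumed here rather than taken from the paper, is that the chosen and rejected responses in each preference pair reuse the same reasoning trace and differ only in the final rewrite. Under that assumption, the shared CoT tokens receive identical log-probabilities in both responses (under the policy, and separately under the reference model), so they cancel inside the standard DPO margin and the preference signal comes only from the surface realization. The sketch below checks this numerically in plain Python; the function names, the `beta` value, and all token log-probabilities are hypothetical.

```python
import math
from typing import Sequence


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_loss(pol_w: float, ref_w: float, pol_l: float, ref_l: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one (chosen w, rejected l) pair from summed sequence log-probs."""
    margin = (pol_w - ref_w) - (pol_l - ref_l)
    return -log_sigmoid(beta * margin)


def seq_logprob(cot: Sequence[float], out: Sequence[float], include_cot: bool) -> float:
    """Sum per-token log-probs, optionally including the reasoning (CoT) tokens."""
    return (sum(cot) if include_cot else 0.0) + sum(out)


# Hypothetical per-token log-probs. The CoT plan is shared by both responses,
# so each model scores it identically for the chosen and the rejected rewrite.
pol_cot, ref_cot = [-0.2, -0.5, -0.1], [-0.3, -0.4, -0.2]
pol_out_w, ref_out_w = [-0.1, -0.3], [-0.4, -0.6]   # hypothetical in-character rewrite
pol_out_l, ref_out_l = [-0.9, -1.2], [-0.5, -0.7]   # hypothetical OOC rewrite


def pair_loss(include_cot: bool) -> float:
    """DPO loss for the toy pair, scoring the shared CoT tokens or skipping them."""
    return dpo_loss(
        seq_logprob(pol_cot, pol_out_w, include_cot),
        seq_logprob(ref_cot, ref_out_w, include_cot),
        seq_logprob(pol_cot, pol_out_l, include_cot),
        seq_logprob(ref_cot, ref_out_l, include_cot),
    )


loss_full = pair_loss(include_cot=True)
loss_output_only = pair_loss(include_cot=False)
# The shared CoT terms cancel inside the margin, so the two losses agree.
print(f"{loss_full:.6f} {loss_output_only:.6f}")
```

If the rejected response instead carried its own reasoning trace, the margin would also reward or penalize differences between the two traces, which is the effect the abstract says the CoT-Shared stage is meant to avoid.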

Chanhui Zhu • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Style Transfer | Hybrid (test) | Semantic Score | 0.88 | 7 |
| Dialogue Style Transfer | Anime-style dialogue evaluation set (test) | Semantic Similarity | 4.29 | 7 |
| Classification | Character Archetype Classification Dataset | Macro-F1 | 79 | 3 |
| Dialogue Style Transfer | Anime-style dialogue human evaluation set (test) | Semantic Coherence Score | 4.24 | 3 |
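The archetype-classification result is reported as Macro-F1 (79, presumably on a 0 to 100 scale). Macro-F1 is the unweighted mean of per-class F1 scores, so a rare archetype counts as much as a common one. A minimal pure-Python sketch follows; the function name and the archetype labels in the example are hypothetical and not taken from the dataset.

```python
from typing import Hashable, Sequence


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("inputs must be non-empty")
    labels = set(y_true) | set(y_pred)
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        # F1 = 2TP / (2TP + FP + FN); the zero guard is defensive, since every
        # label in the union appears at least once in y_true or y_pred.
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical archetype labels for four utterances.
y_true = ["tsundere", "kuudere", "tsundere", "genki"]
y_pred = ["tsundere", "tsundere", "tsundere", "genki"]
print(round(macro_f1(y_true, y_pred), 4))  # prints 0.6
```

Here the per-class F1 scores are 0.8 (tsundere), 0.0 (kuudere), and 1.0 (genki); the missed minority class pulls the macro average down to 0.6 even though three of four predictions are correct. This should match scikit-learn's `f1_score(y_true, y_pred, average="macro")` on the same inputs.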
