Semantic Flow Regularization: Teaching LLMs to Generate Diverse Yet Coherent Responses

About

When large language models are fine-tuned to generate persona- or tone-conditioned responses, their output diversity is severely limited, a failure we term Cross-Style Collapse. We trace this collapse to the cross-entropy objective, which under shared representations tends to suppress diverse continuations. We propose Semantic Flow Regularization (SFR), a lightweight auxiliary objective that supervises the backbone with continuous sentence-encoder embeddings of future segments via conditional flow matching. The stochastic flow source preserves multi-modality by construction; the flow-matching head is discarded at inference, adding zero deployment cost. On a large-scale industrial dialogue dataset (Qwen3-32B, 9 personas), SFR improves output diversity, style fidelity, and response quality over SFT. We further validate on the public LiveCodeBench-v5 (Qwen2.5-Coder-7B-Instruct), where SFR consistently improves pass@k, confirming generality beyond stylized dialogue. A controlled comparison on MBPP reveals Multi-Token Prediction to be a degenerate special case of SFR.
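To make the auxiliary objective concrete, here is a minimal sketch of a one-sample conditional flow-matching loss. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the probability path, the source distribution, or how the two losses are combined. The sketch assumes a straight-line path between a standard Gaussian source sample and the target embedding, and a scalar `weight` on the auxiliary term. The names `cfm_loss`, `total_loss`, `velocity_fn`, and `weight` are hypothetical.

```python
import random

def cfm_loss(velocity_fn, hidden, target_emb, rng):
    """One-sample conditional flow-matching loss.

    Assumptions (not taken from the paper): straight-line path,
    standard Gaussian source, mean squared error on the velocity.
    """
    dim = len(target_emb)
    # Stochastic source sample x0; a fresh draw each call.
    x0 = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    # Time drawn uniformly from [0, 1).
    t = rng.random()
    # Point on the straight line from x0 to the target embedding x1.
    xt = [(1.0 - t) * a + t * b for a, b in zip(x0, target_emb)]
    # Constant velocity of that line: the regression target for the head.
    target_v = [b - a for a, b in zip(x0, target_emb)]
    # The head sees the noisy point, the time, and the backbone state.
    pred_v = velocity_fn(xt, t, hidden)
    return sum((p - v) ** 2 for p, v in zip(pred_v, target_v)) / dim

def total_loss(ce_loss, fm_loss, weight=0.1):
    """Cross-entropy plus a weighted auxiliary term (weight is hypothetical)."""
    return ce_loss + weight * fm_loss

def make_oracle(target_emb):
    """Toy velocity function that already knows the target embedding.

    On a straight-line path, (x1 - xt) / (1 - t) equals x1 - x0,
    so this oracle should drive the loss to roughly zero.
    """
    def velocity_fn(xt, t, hidden):
        return [(b - x) / (1.0 - t) for x, b in zip(xt, target_emb)]
    return velocity_fn

if __name__ == "__main__":
    rng = random.Random(0)
    target = [0.5, -1.0, 2.0]          # stand-in for a sentence embedding
    hidden = [0.0, 0.0]                # stand-in for a backbone hidden state
    fm = cfm_loss(make_oracle(target), hidden, target, rng)
    print("flow-matching loss:", fm)
    print("combined loss:", total_loss(2.0, fm))
```

In this sketch the velocity function stands in for the flow-matching head. Only the backbone that produces `hidden` would be kept at inference, consistent with the abstract's statement that the head is discarded.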

Kerui Peng, Feifei Li, Xingyu Fan, Wenhui Que • 2026

Related benchmarks

Task | Dataset | Result | Rank
Stylized Dialogue | 148-query Average across 9 styles (test) | Context Relevance 4.463 | 3
Stylized Dialogue | 148-query style-0 persona (test) | Context Score 4.622 | 3
Stylized Dialogue | 148-query style-1 persona (test) | Context Score 4.257 | 3
Stylized Dialogue | 148-query style-3 persona (test) | Context 4.696 | 3
Stylized Dialogue | 148-query style-2 persona (test) | Context 4.128 | 3
Stylized Dialogue Generation | 148-query (test) | CS-SB1 Score 0.783 | 3