Writing-RL: Advancing Long-form Writing via Adaptive Curriculum Reinforcement Learning

About

Recent advances in Large Language Models (LLMs) have enabled strong performance in long-form writing, but current training paradigms remain limited: Supervised Fine-Tuning (SFT) is constrained by data saturation and performance ceilings, while Reinforcement Learning with Verifiable Reward (RLVR), though successful in verifiable domains like math and code, cannot be directly applied to open-ended long-form writing because ground truths are lacking. We present Writing-RL, an Adaptive Curriculum Reinforcement Learning framework that advances long-form writing capabilities beyond SFT. The framework consists of three key components: a Margin-aware Data Selection strategy that prioritizes samples with high learning potential, a Pairwise Comparison Reward mechanism that provides discriminative learning signals in the absence of verifiable rewards, and a Dynamic Reference Scheduling approach that plays a critical role by adaptively adjusting task difficulty based on evolving model performance. Experiments on 7B-scale writer models show that Writing-RL effectively improves long-form writing performance over strong SFT baselines. Furthermore, we observe that models trained with long-output RL generalize surprisingly well to long-input reasoning tasks, potentially offering a promising perspective for rethinking long-context training.
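
The three components named in the abstract can be pictured with a small sketch. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not give formulas, so the margin definition (best reference score minus policy score), the win/tie/loss reward values, the rule of advancing to a harder reference after a win, and all names (`Sample`, `select_by_margin`, `pairwise_reward`, `advance_reference`, `length_judge`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Sample:
    prompt: str
    references: List[str]          # assumed ordered from weakest to strongest
    reference_scores: List[float]  # hypothetical quality score per reference
    policy_score: float            # hypothetical quality score of the current model
    ref_index: int = 0             # reference the model currently competes against


def select_by_margin(samples: List[Sample], k: int) -> List[Sample]:
    """Assumed margin: best reference score minus policy score; keep the top k."""
    def margin(s: Sample) -> float:
        return max(s.reference_scores) - s.policy_score
    return sorted(samples, key=margin, reverse=True)[:k]


def pairwise_reward(judge: Callable[[str, str, str], str],
                    prompt: str, response: str, reference: str) -> float:
    """Map a judge verdict ('win', 'tie', 'loss') to an assumed scalar reward."""
    verdict = judge(prompt, response, reference)
    return {"win": 1.0, "tie": 0.5, "loss": 0.0}[verdict]


def advance_reference(sample: Sample, reward: float) -> None:
    """Assumed schedule: after a win, compete against the next, stronger reference."""
    if reward == 1.0 and sample.ref_index < len(sample.references) - 1:
        sample.ref_index += 1


def length_judge(prompt: str, response: str, reference: str) -> str:
    """Toy stand-in for an LLM judge: the longer text wins. For illustration only."""
    if len(response) > len(reference):
        return "win"
    if len(response) == len(reference):
        return "tie"
    return "loss"
```

In a training loop, one would presumably call `select_by_margin` once to build the training set, compute `pairwise_reward` for each rollout against `sample.references[sample.ref_index]`, and call `advance_reference` afterwards so that difficulty rises as the model improves. The toy `length_judge` would be replaced by a real pairwise judge.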

Xuanyu Lei, Chenliang Li, Yuning Wu, Kaiming Liu, Weizhou Shen, Peng Li, Ming Yan, Fei Huang, Ya-Qin Zhang, Yang Liu • 2025

Related benchmarks

Task	Dataset	Result
Multitask Language Understanding	MMLU	Accuracy 69.75	568
Long-context Reasoning	LongBench v2	Average Score 32.8	113
Writing	WritingBench	Score 78.2	104
Multi-turn Conversation Evaluation	MT-Bench	MT-Bench Score 7.62	68
Professional deep-research writing	Deepresearch-Gym	KPR 68.1	19
Discourse-level Chinese-English translation	DiscoX	Accuracy 15.2	19
Long-form writing	WritingBench	Score 88.27	18
Long-form writing	Creative-W.B.	Score 83.17	18
Long-form writing	LongBench-Write	Score 94.17	18
