Evolving Language Models without Labels: Majority Drives Selection, Novelty Promotes Variation
About
Large language models (LLMs) are increasingly trained with reinforcement learning from verifiable rewards (RLVR), yet real-world deployment demands models that can self-improve without labels or external judges. Existing self-improvement approaches primarily rely on self-confirmation signals (e.g., confidence, entropy, or consistency) to generate rewards. This reliance drives models toward over-confident, majority-favored solutions, causing an entropy collapse that degrades pass@n and reasoning complexity. To address this, we propose EVOL-RL, a label-free framework that mirrors the evolutionary principle of balancing selection with variation. Concretely, EVOL-RL retains the majority-voted answer as an anchor for stability, but adds a novelty-aware reward that scores each sampled solution by how different its reasoning is from other concurrently generated responses. This majority-for-stability + novelty-for-exploration rule mirrors the variation-selection principle: selection prevents drift, while novelty prevents collapse. Evaluation results show that EVOL-RL consistently outperforms the majority-only baseline; e.g., training on label-free AIME24 lifts Qwen3-4B-Base AIME25 pass@1 from baseline's 4.6% to 16.4%, and pass@16 from 18.5% to 37.9%. EVOL-RL not only prevents in-domain diversity collapse but also improves out-of-domain generalization (from math reasoning to broader tasks, e.g., MMLU-Pro and BBEH). The code is available at: https://github.com/YujunZhou/EVOL-RL.
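The abstract's reward rule (keep the majority-voted answer as a stability anchor, then add a novelty bonus for reasoning that differs from other concurrent samples) can be sketched as follows. This is a hypothetical illustration, not the paper's exact formulation: the function name `evol_rl_rewards`, the `alpha` weight, and the precomputed pairwise-similarity matrix are all assumptions for the sketch.

```python
from collections import Counter

def evol_rl_rewards(answers, similarities, alpha=0.5):
    """Hedged sketch of a majority-for-stability + novelty-for-exploration reward.

    answers: final answer extracted from each sampled solution
    similarities: similarities[i][j] in [0, 1], similarity of the reasoning
        traces of samples i and j (e.g., from an embedding model); assumed
        precomputed, symmetric
    alpha: hypothetical weight on the novelty term
    """
    # Selection: the majority-voted answer acts as the stability anchor.
    majority, _ = Counter(answers).most_common(1)[0]
    n = len(answers)
    rewards = []
    for i in range(n):
        base = 1.0 if answers[i] == majority else 0.0
        # Variation: novelty = average dissimilarity to the other
        # concurrently generated responses.
        others = [similarities[i][j] for j in range(n) if j != i]
        novelty = 1.0 - sum(others) / len(others) if others else 0.0
        rewards.append(base + alpha * novelty)
    return majority, rewards
```

Under this sketch, a sample that agrees with the majority *and* reasons differently from its peers scores highest, which is the intended selection-plus-variation balance; the actual reward shaping in the paper may differ.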
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| General Reasoning | MMLU | Accuracy | 77.9 | 156 |
| Mathematical Reasoning | AMC | Pass@1 | 69.62 | 112 |
| Mathematical Reasoning | AIME 2025 | Pass@1 | 30.34 | 96 |
| Mathematical Reasoning | AIME 2024 | Pass@1 | 41.22 | 86 |
| Mathematical Reasoning | MATH 500 | Mean@1 | 0.734 | 55 |
| General Reasoning | GPQA | Accuracy | 30.3 | 36 |
| Reasoning | MMLU-Pro | Pass@1 | 55.3 | 27 |
| General Reasoning | GPQA | Pass@1 | 45.2 | 26 |
| Mathematical Reasoning | AMC | Mean Accuracy | 55 | 24 |
| Mathematical Reasoning | AIME 2024 | Mean Accuracy | 26.3 | 24 |