Seirênes: Adversarial Self-Play with Evolving Distractions for LLM Reasoning

About

We present Seirênes, a self-play RL framework that turns contextual interference from a failure mode of LLM reasoning into an internal training signal for co-evolving more resilient reasoners. Although RL with verifiable rewards has significantly advanced reasoning capabilities, models remain fragile in non-idealized contexts: scenarios with superfluous information, tangential instructions, or incidental correlations that differ from the clean distributions typical of standard benchmarks. Seirênes harnesses this vulnerability through a parameter-shared, adversarial self-play loop. Within this framework, a single model is trained both to construct plausible yet distracting contexts that expose its own reasoning blind spots and to solve problems by separating the essential task from these perturbations and recovering the core underlying logic. Pitting these competing objectives against each other compels the model to move beyond superficial pattern matching and anchors its capabilities in robust underlying reasoning. This continuous interaction sustains an informative co-evolutionary curriculum as the model improves. Across seven mathematical reasoning benchmarks and model scales from 4B to 30B, Seirênes achieves average gains of +10.2, +9.1, and +7.2 points. In addition, distracting contexts produced by the 4B Seirênes model reduce the accuracy of top-tier closed-source models (GPT and Gemini) by roughly 4 to 5 points, revealing Seirênes' general ability to uncover the blind spots of reasoning models.
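The competing objectives in the self-play loop described above can be illustrated with a toy reward function. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not specify the reward design, so the zero-sum form, the validity check, and the names `RoleRewards`, `self_play_rewards`, and `distraction_is_valid` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RoleRewards:
    """Rewards for the two roles played by the same shared-parameter model."""
    solver: float
    distractor: float


def self_play_rewards(
    solver_answer: str,
    reference_answer: str,
    distraction_is_valid: bool,
) -> RoleRewards:
    """Hypothetical zero-sum reward for one self-play episode.

    Assumption: the solver is scored by a verifiable exact-match check,
    and the distractor is rewarded only when a valid distraction makes
    the solver fail. The source does not state the actual reward design.
    """
    solver_correct = solver_answer.strip() == reference_answer.strip()
    solver_reward = 1.0 if solver_correct else 0.0

    if not distraction_is_valid:
        # Assumption: a distraction that alters the underlying problem
        # (and hence its answer) earns the distractor nothing.
        distractor_reward = 0.0
    else:
        distractor_reward = 1.0 - solver_reward

    return RoleRewards(solver=solver_reward, distractor=distractor_reward)
```

Under these assumptions, the distractor role gains only when it finds a context that the solver role cannot yet see through, which is one way an adversarial curriculum could stay informative as the solver improves.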

Chi Zhang, Haibo Qiu, Qiming Zhang, Yufei Xu, Xinbo Gao, Jing Zhang • 2026

Related benchmarks

Task	Dataset	Metric	Result
Mathematical Reasoning	AIME 2025	Accuracy	74.1	378
Mathematical Reasoning	AIME 2026	AIME 2026 Accuracy	79.7	80
Mathematical Reasoning	IMO-Bench	Accuracy	46.7	57
Mathematical Reasoning	HMMT 2026	Accuracy	49	16
Mathematical Reasoning	Mathematical Reasoning Suite Overall	Average Score	63.9	16
Mathematical Reasoning	Math-Perturb	Math-P Hard Score	79.1	5
Mathematical Reasoning	Robustness Evaluation Suite (GSMIR, MMLU-P, OBook-P)	GSMIR Score	3.05	5

