Universal Reasoner: A Single, Composable Plug-and-Play Reasoner for Frozen LLMs

About

Large Language Models (LLMs) have demonstrated remarkable general capabilities, but enhancing skills such as reasoning often demands substantial computational resources and may compromise generalization. While Parameter-Efficient Fine-Tuning (PEFT) methods offer a more resource-conscious alternative, they typically require retraining for each LLM backbone due to architectural dependencies. To address these challenges, we propose Universal Reasoner (UniR), a modular, composable, plug-and-play reasoning module that can be used with larger frozen LLMs to provide specialized reasoning capabilities, given a shared or aligned token space. Specifically, UniR decomposes the reward into a standalone reasoning module trained in a decoupled manner using verifiable rewards, effectively translating trajectory-level signals into token-level guidance. Once trained, UniR is combined with frozen LLMs at inference time by simply adding its output logits to those of the backbone. This additive structure enables modular composition: multiple UniR modules trained for different tasks can be applied jointly by summing their logits, enabling complex reasoning via composition. Furthermore, UniR demonstrates weak-to-strong generalization, where reasoning modules trained on smaller models effectively guide much larger LLMs in the same model family, and it generalizes across domains such as vision-language models and medical reasoning. Experiments on mathematical reasoning and machine translation show that UniR surpasses existing fine-tuning methods. Code is open-sourced at https://github.com/hangeol/UniR.
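The additive combination described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation (see the linked repository for that). It only shows what adding logits over a shared vocabulary looks like. The function names `combine_logits` and `greedy_next_token`, the three-token vocabulary, and the optional per-module `weights` argument are all hypothetical. With the default weights of 1.0 the function reduces to the plain sum the abstract describes.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def combine_logits(backbone_logits, module_logits_list, weights=None):
    """Add reasoning-module logits to frozen-backbone logits.

    All logit vectors must cover the same vocabulary (shared token space).
    `weights` is an illustrative assumption, not something stated in the
    abstract; the default of 1.0 per module gives a plain sum.
    """
    if weights is None:
        weights = [1.0] * len(module_logits_list)
    if len(weights) != len(module_logits_list):
        raise ValueError("need exactly one weight per module")

    combined = list(backbone_logits)
    for weight, module_logits in zip(weights, module_logits_list):
        if len(module_logits) != len(combined):
            raise ValueError("modules must share the backbone vocabulary")
        combined = [c + weight * m for c, m in zip(combined, module_logits)]
    return combined


def greedy_next_token(backbone_logits, module_logits_list):
    """Pick the highest-scoring token id after adding module guidance."""
    combined = combine_logits(backbone_logits, module_logits_list)
    return max(range(len(combined)), key=combined.__getitem__)


# Toy next-token logits over a 3-token vocabulary (made-up numbers).
backbone = [2.0, 1.0, 0.0]       # frozen LLM alone prefers token 0
math_module = [-1.0, 2.0, 0.0]   # hypothetical module trained for math
other_module = [0.0, 0.0, 4.0]   # hypothetical module for a second task

one_module_choice = greedy_next_token(backbone, [math_module])
two_module_choice = greedy_next_token(backbone, [math_module, other_module])
next_token_probs = softmax(combine_logits(backbone, [math_module]))
```

In this toy setting the backbone alone would pick token 0, adding one module shifts the choice to token 1, and summing both modules shifts it to token 2. The backbone's parameters are never touched, which is what makes the modules composable at inference time.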

Jaemin Kim, Hangeol Chang, Hyunmin Hwang, Choonghan Kim, Jong Chul Ye • 2025

Related benchmarks

Task	Dataset	Result
Readmission Prediction	MIMIC IV	--	90
Mortality Prediction	MIMIC IV	--	53
Machine Translation	IWSLT en-de 2017 (test)	BLEU 27.03	46
Mathematical Reasoning	MATH 500	Pass@1 Accuracy 66.8	23
Mathematical Reasoning	Minerva	Pass@1 26.3	22
Length of Stay (LOS)	MIMIC IV	F1 Score 18.32	21
Machine Translation (De-En)	IWSLT 2017 (test)	BLEU 37.21	14
Mathematical Reasoning	OlympiadBench	Pass@1 Rate 28.2	14
Mathematical Reasoning	GSM8K	Accuracy (pass@1) 84.5	14
Mathematical Reasoning	AIME24	Pass@1 Accuracy 7.7	10

Showing 10 of 12 rows
