
Universal Reasoner: A Single, Composable Plug-and-Play Reasoner for Frozen LLMs

About

Large Language Models (LLMs) have demonstrated remarkable general capabilities, but enhancing skills such as reasoning often demands substantial computational resources and may compromise generalization. While Parameter-Efficient Fine-Tuning (PEFT) methods offer a more resource-conscious alternative, they typically require retraining for each LLM backbone because of architectural dependencies. To address these challenges, we propose Universal Reasoner (UniR), a modular, composable, and plug-and-play reasoning module that can be attached to larger frozen LLMs sharing (or aligned to) its token space to provide specialized reasoning capabilities. Specifically, UniR decomposes the reward into a standalone reasoning module trained in a decoupled manner using verifiable rewards, effectively translating trajectory-level signals into token-level guidance. Once trained, UniR is combined with a frozen LLM at inference time by simply adding its output logits to those of the backbone. This additive structure enables modular composition: multiple UniR modules trained for different tasks can be applied jointly by summing their logits, enabling complex reasoning through composition. Furthermore, UniR demonstrates weak-to-strong generalization: reasoning modules trained on smaller models effectively guide much larger LLMs in the same model family, and they generalize across domains such as vision-language models and medical reasoning. Experiments on mathematical reasoning and machine translation show that UniR surpasses existing fine-tuning methods. Code is open-sourced at https://github.com/hangeol/UniR.
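The inference-time combination and composition described above can be sketched as follows. This is a minimal illustration, not code from the UniR repository: it assumes each model exposes a next-token logit vector over the same vocabulary, and the helper names (`combine_logits`, `softmax`, `greedy_next_token`) and the toy logit values are hypothetical.

```python
import math
from typing import Sequence


def combine_logits(backbone: Sequence[float], *modules: Sequence[float]) -> list[float]:
    """Add each reasoning module's logits to the frozen backbone's logits.

    Assumption: all vectors are indexed by the same (shared or aligned) vocabulary.
    """
    for module in modules:
        if len(module) != len(backbone):
            raise ValueError("backbone and module logits must cover the same token space")
    # Element-wise sum: backbone logit plus the logits of every attached module.
    return [b + sum(m[i] for m in modules) for i, b in enumerate(backbone)]


def softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable softmax (subtract the max before exponentiating)."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_next_token(backbone: Sequence[float], *modules: Sequence[float]) -> int:
    """Return the index of the highest-scoring token after combining logits."""
    combined = combine_logits(backbone, *modules)
    return max(range(len(combined)), key=combined.__getitem__)


# Hypothetical 4-token vocabulary with toy logits.
backbone = [2.0, 1.0, 0.5, 0.0]     # frozen LLM
math_module = [0.0, 1.5, 0.0, 0.0]  # hypothetical math-reasoning module
mt_module = [0.0, 0.0, 0.25, 0.0]   # hypothetical translation module

combined = combine_logits(backbone, math_module, mt_module)  # → [2.0, 2.5, 0.75, 0.0]
print(greedy_next_token(backbone))                          # backbone alone picks token 0
print(greedy_next_token(backbone, math_module, mt_module))  # composed model picks token 1
```

Because softmax(a + b) is proportional to exp(a) · exp(b), adding logits amounts to multiplying the backbone's and the modules' next-token distributions and renormalizing, so each module reweights the backbone's distribution rather than replacing it.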

Jaemin Kim, Hangeol Chang, Hyunmin Hwang, Choonghan Kim, Jong Chul Ye · 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Readmission prediction | MIMIC IV | -- | -- | 74 |
| Machine Translation | IWSLT en-de 2017 (test) | BLEU | 27.03 | 46 |
| Mortality Prediction | MIMIC IV | -- | -- | 30 |
| Machine Translation (De-En) | IWSLT 2017 (test) | BLEU | 37.21 | 14 |
| Mathematical Reasoning | MATH 500 | Pass@1 Accuracy | 66.8 | 14 |
| Mathematical Reasoning | Minerva | Pass@1 | 26.3 | 14 |
| Mathematical Reasoning | OlympiadBench | Pass@1 Rate | 28.2 | 14 |
| Mathematical Reasoning | GSM8K | Accuracy (pass@1) | 84.5 | 14 |
| Length of Stay (LOS) | MIMIC IV | F1 Score | 18.32 | 13 |
| Mathematical Reasoning | AIME24 | Pass@1 Accuracy | 7.7 | 10 |
Showing 10 of 12 rows
