ReasonAny: Incorporating Reasoning Capability to Any Model via Simple and Effective Model Merging
About
Large Reasoning Models (LRMs) with long chain-of-thought reasoning have recently achieved remarkable success. Yet equipping domain-specialized models with such reasoning capabilities, referred to as "Reasoning + X", remains a significant challenge. While model merging offers a promising training-free solution, existing methods often suffer from a destructive performance collapse: they tend to both weaken reasoning depth and compromise domain-specific utility. Interestingly, we identify a counter-intuitive phenomenon underlying this failure: reasoning ability predominantly resides in parameter regions with low gradient sensitivity, contrary to the common assumption that domain capabilities correspond to high-magnitude parameters. Motivated by this insight, we propose ReasonAny, a novel merging framework that resolves the reasoning-domain performance collapse through Contrastive Gradient Identification. Experiments across safety, biomedicine, and finance domains show that ReasonAny effectively synthesizes "Reasoning + X" capabilities, significantly outperforming state-of-the-art baselines while retaining robust reasoning performance.
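The abstract does not spell out the merging procedure, so the following is only a toy illustration of the general idea it alludes to, not the ReasonAny algorithm itself. It assumes a standard task-vector view of merging (fine-tuned weights minus base weights) and a hypothetical per-parameter `reasoning_sensitivity` score standing in for gradient sensitivity. The function names, the `keep_fraction` parameter, and the rule "take the reasoning delta where sensitivity is low, the domain delta elsewhere" are all assumptions made for illustration.

```python
from typing import List


def task_vector(finetuned: List[float], base: List[float]) -> List[float]:
    """Difference between fine-tuned and base weights (a 'task vector')."""
    return [f - b for f, b in zip(finetuned, base)]


def low_sensitivity_mask(sensitivity: List[float], keep_fraction: float) -> List[bool]:
    """Mark the keep_fraction of parameters with the smallest sensitivity scores."""
    k = int(len(sensitivity) * keep_fraction)
    order = sorted(range(len(sensitivity)), key=lambda i: sensitivity[i])
    chosen = set(order[:k])
    return [i in chosen for i in range(len(sensitivity))]


def merge(
    base: List[float],
    reasoning: List[float],
    domain: List[float],
    reasoning_sensitivity: List[float],
    keep_fraction: float = 0.5,
) -> List[float]:
    """Toy merge (an assumption, not the paper's method): use the reasoning
    delta in low-sensitivity positions and the domain delta everywhere else."""
    tv_reasoning = task_vector(reasoning, base)
    tv_domain = task_vector(domain, base)
    mask = low_sensitivity_mask(reasoning_sensitivity, keep_fraction)
    return [
        b + (r if use_reasoning else d)
        for b, r, d, use_reasoning in zip(base, tv_reasoning, tv_domain, mask)
    ]


# Hypothetical four-parameter "models" and made-up sensitivity scores.
base = [0.0, 0.0, 0.0, 0.0]
reasoning = [1.0, 2.0, 3.0, 4.0]
domain = [10.0, 20.0, 30.0, 40.0]
sensitivity = [0.9, 0.1, 0.8, 0.2]

merged = merge(base, reasoning, domain, sensitivity, keep_fraction=0.5)
print(merged)
```

In this toy setup, positions 1 and 3 have the lowest sensitivity scores, so they receive the reasoning model's delta while positions 0 and 2 receive the domain model's delta. A real implementation would operate on tensors and derive the sensitivity scores from actual gradients.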
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Science Question Answering | ARC Challenge | Accuracy | 59.83 | 342 |
| Mathematical Reasoning | AIME | AIME Accuracy | 33.33 | 288 |
| Code Generation | HumanEval | Pass@1 | 61.71 | 171 |
| Science Question Answering | ARC Easy | Accuracy | 66.75 | 155 |
| Knowledge | MMLU | Accuracy | 82.09 | 136 |
| Safety Evaluation | HarmBench | HarmBench Score | 2 | 112 |
| Reasoning | GSM8K | -- | -- | 106 |
| Code Generation | LiveCodeBench | Pass@1 | 26.48 | 86 |
| Knowledge | GPQA | Accuracy | 56.25 | 51 |
| General Knowledge Evaluation | MMLU | MMLU Accuracy | 73.01 | 45 |