ReasonAny: Incorporating Reasoning Capability to Any Model via Simple and Effective Model Merging
About
Large Reasoning Models (LRMs) with long chain-of-thought reasoning have recently achieved remarkable success. Yet equipping domain-specialized models with such reasoning capabilities, referred to as "Reasoning + X", remains a significant challenge. While model merging offers a promising training-free solution, existing methods often suffer from a destructive performance collapse, tending both to weaken reasoning depth and to compromise domain-specific utility. Interestingly, we identify a counter-intuitive phenomenon underlying this failure: reasoning ability predominantly resides in parameter regions with low gradient sensitivity, contrary to the common assumption that key capabilities correspond to high-magnitude parameters. Motivated by this insight, we propose ReasonAny, a novel merging framework that resolves the reasoning-domain performance collapse through Contrastive Gradient Identification. Experiments across the safety, biomedicine, and finance domains show that ReasonAny effectively synthesizes "Reasoning + X" capabilities, significantly outperforming state-of-the-art baselines while retaining robust reasoning performance.
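The exact merging procedure is not spelled out above, but the core idea can be sketched. Below is a minimal, hypothetical illustration in NumPy (the names `contrastive_merge`, `keep_ratio`, and the specific contrastive score are assumptions, not the paper's definitions): starting from a shared base model, it grafts the reasoning model's parameter delta into the domain model only at positions where gradient sensitivity is low for both tasks, reflecting the observation that reasoning ability resides in low-sensitivity regions.

```python
import numpy as np

def contrastive_merge(base, reasoning, domain, grad_reason, grad_domain,
                      keep_ratio=0.3):
    """Hedged sketch of gradient-guided merging (not the paper's exact method).

    base, reasoning, domain: flat parameter vectors of the shared base model
    and its two fine-tuned descendants.
    grad_reason, grad_domain: per-parameter gradient magnitudes (sensitivity
    proxies) measured on reasoning and domain data, respectively.
    """
    # Task vector: what reasoning fine-tuning changed relative to the base.
    delta_reason = reasoning - base

    # Contrastive score: prefer positions that are gradient-insensitive for
    # BOTH tasks -- low reasoning sensitivity (where, per the observation
    # above, reasoning ability resides) and low domain sensitivity (so the
    # transplant does not disturb domain-specific utility).
    score = np.abs(grad_reason) + np.abs(grad_domain)

    # Select the keep_ratio fraction of lowest-sensitivity positions.
    k = max(1, int(keep_ratio * base.size))
    mask = np.zeros(base.size, dtype=bool)
    mask[np.argsort(score)[:k]] = True

    # Graft the reasoning delta into the domain model at those positions only.
    merged = domain.copy()
    merged[mask] += delta_reason[mask]
    return merged, mask

# Toy usage on random "parameters" standing in for real model weights.
rng = np.random.default_rng(0)
base = rng.normal(size=1000)
reasoning = base + 0.1 * rng.normal(size=1000)
domain = base + 0.1 * rng.normal(size=1000)
grad_reason = rng.normal(size=1000)
grad_domain = rng.normal(size=1000)
merged, mask = contrastive_merge(base, reasoning, domain,
                                 grad_reason, grad_domain, keep_ratio=0.3)
```

Outside the selected mask the merged vector is exactly the domain model, so domain behavior is left untouched wherever either task is gradient-sensitive.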
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Mathematical Reasoning | AIME | Accuracy | 33.33 | 283 |
| Science Question Answering | ARC Challenge | Accuracy | 59.83 | 234 |
| Code Generation | HumanEval | Pass@1 | 61.71 | 108 |
| Science Question Answering | ARC Easy | Accuracy | 66.75 | 101 |
| Code Generation | LiveCodeBench | Pass@1 | 26.48 | 86 |
| Reasoning | GSM8K | -- | -- | 83 |
| Safety Evaluation | HarmBench | HarmBench Score | 2 | 76 |
| Knowledge | MMLU | Accuracy | 82.09 | 71 |
| Code Reasoning | HumanEval | HumanEval Score | 92.32 | 35 |
| Knowledge | GPQA | Accuracy | 56.25 | 34 |