Activation-Guided Consensus Merging for Large Language Models
About
Recent research has increasingly focused on reconciling the reasoning capabilities of System 2 with the efficiency of System 1. While existing training-based and prompt-based approaches face significant challenges in terms of efficiency and stability, model merging emerges as a promising strategy to integrate the diverse capabilities of different Large Language Models (LLMs) into a unified model. However, conventional model merging methods often assume uniform importance across layers, overlooking the functional heterogeneity inherent in neural components. To address this limitation, we propose Activation-Guided Consensus Merging (ACM), a plug-and-play merging framework that determines layer-specific merging coefficients based on mutual information between activations of pre-trained and fine-tuned models. ACM effectively preserves task-specific capabilities without requiring gradient computations or additional training. Extensive experiments on Long-to-Short (L2S) and general merging tasks demonstrate that ACM consistently outperforms all baseline methods. For instance, in the case of Qwen-7B models, TIES-Merging equipped with ACM achieves a 55.3% reduction in response length while simultaneously improving reasoning accuracy by 1.3 points.
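The idea of layer-specific coefficients derived from activation mutual information can be illustrated with a minimal sketch. The abstract does not give the exact formula, so everything here is an assumption made for illustration: the histogram-based estimator, the mapping `1 - normalized MI` (layers whose activations diverge more after fine-tuning receive a larger weight), and the helper names `mutual_information`, `layer_coefficients`, and `merge_layer` are all hypothetical and are not the authors' implementation.

```python
import numpy as np


def mutual_information(x, y, bins=16):
    """Histogram estimate of the mutual information (in nats) of two 1-D samples."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()                 # joint distribution over bin pairs
    px = pxy.sum(axis=1, keepdims=True)       # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)       # marginal of y, shape (1, bins)
    nz = pxy > 0                              # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))


def layer_coefficients(acts_base, acts_tuned, bins=16):
    """One coefficient per layer from paired activations (assumed mapping).

    High MI means the fine-tuned layer behaves like the pre-trained one,
    so this sketch gives it a small coefficient; low MI gives a large one.
    """
    mi = np.array([
        mutual_information(a.ravel(), b.ravel(), bins)
        for a, b in zip(acts_base, acts_tuned)
    ])
    # log(bins) is the largest MI this discretisation can produce.
    return 1.0 - np.clip(mi / np.log(bins), 0.0, 1.0)


def merge_layer(w_base, w_tuned, coeff):
    """Scale the task vector (w_tuned - w_base) by the layer's coefficient."""
    return w_base + coeff * (w_tuned - w_base)
```

No gradients or training steps are involved: the coefficients depend only on forward-pass activations collected from a small calibration set, which matches the plug-and-play framing above. In practice the resulting per-layer values would replace the single global scaling factor used by a baseline method such as TIES-Merging.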
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Mathematical Reasoning | MATH500 (test) | Accuracy | 94 | 381 |
| Mathematical Reasoning | GSM8K | Accuracy | 78.4 | 351 |
| Scientific Reasoning | GPQA | Accuracy | 27.8 | 50 |
| Mathematical Reasoning | Olympiad Bench | Accuracy | 39.4 | 23 |
| Mathematical Reasoning | Minerva Math | Accuracy | 37.5 | 14 |
| Mathematical Reasoning | MATH500 | Accuracy | 78.8 | 11 |
| Mathematical Reasoning | AIME25 | Accuracy | 16.7 | 11 |
| General Reasoning Summary | Aggregate (GSM8K, MATH500, Minerva Math, Olympiad Bench, AIME24, AIME25, GPQA) | Accuracy | 71.3 | 11 |
| Scientific Question Answering | GPQA | Accuracy | 61.6 | 11 |
| Mathematical Reasoning | GSM8K (test) | Accuracy | 94.7 | 11 |