Model Stock: All we need is just a few fine-tuned models
About
This paper introduces an efficient fine-tuning method for large pre-trained models that offers strong in-distribution (ID) and out-of-distribution (OOD) performance. Breaking away from traditional practices that need a multitude of fine-tuned models for averaging, our approach uses significantly fewer models to obtain the final weights yet yields superior accuracy. Drawing on key insights into the weight space of fine-tuned weights, we uncover a strong link between performance and proximity to the center of weight space. Based on this, we introduce a method that approximates a center-close weight using only two fine-tuned models, applicable during or after training. Our layer-wise weight averaging technique surpasses state-of-the-art model averaging methods such as Model Soup while using only two fine-tuned models. This strategy can be aptly coined Model Stock, highlighting its reliance on selecting a minimal number of models to draw a better-optimized averaged model. We demonstrate the efficacy of Model Stock with fine-tuned models based on pre-trained CLIP architectures, achieving remarkable performance on both ID and OOD tasks on the standard benchmarks while adding almost no extra computational cost. Our code and pre-trained models are available at https://github.com/naver-ai/model-stock.
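The layer-wise averaging described in the abstract can be illustrated with a minimal NumPy sketch. It assumes the two-model interpolation ratio takes the form t = 2cos(θ) / (1 + cos(θ)), where θ is the per-layer angle between the two fine-tuning deltas measured from the pre-trained weights. That ratio, the clipping of the cosine, and the helper name `model_stock_merge` are assumptions that do not appear in the abstract; the official repository remains the reference implementation.

```python
import numpy as np


def model_stock_merge(pretrained, finetuned_a, finetuned_b, eps=1e-12):
    """Hypothetical helper: merge two fine-tuned checkpoints layer by layer.

    Each argument is a dict mapping a layer name to a NumPy array.
    """
    merged = {}
    for name, w0 in pretrained.items():
        # Deltas of each fine-tuned layer relative to the pre-trained anchor.
        d1 = finetuned_a[name] - w0
        d2 = finetuned_b[name] - w0

        # Cosine of the angle between the two deltas for this layer.
        denom = np.linalg.norm(d1) * np.linalg.norm(d2)
        cos = float(np.sum(d1 * d2) / denom) if denom > eps else 0.0
        # Assumption: clip to [0, 1] so the ratio stays well defined.
        cos = min(max(cos, 0.0), 1.0)

        # Assumed two-model interpolation ratio.
        t = 2.0 * cos / (1.0 + cos)

        # Interpolate between the plain average and the pre-trained weights.
        w_avg = 0.5 * (finetuned_a[name] + finetuned_b[name])
        merged[name] = t * w_avg + (1.0 - t) * w0
    return merged


if __name__ == "__main__":
    w0 = {"layer": np.zeros(2)}
    wa = {"layer": np.array([1.0, 0.0])}
    wb = {"layer": np.array([0.5, np.sqrt(3.0) / 2.0])}  # 60 degrees from wa
    print(model_stock_merge(w0, wa, wb)["layer"])
```

Under this sketch, layers whose two deltas are nearly orthogonal are pulled back toward the pre-trained weights, while layers whose deltas agree stay close to the plain average of the two fine-tuned models.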
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Mathematical Reasoning | GSM8K | Accuracy | 59.67 | 983 |
| Mathematical Reasoning | MATH | Accuracy | 16.64 | 643 |
| Multiple-choice Question Answering | MMLU-Pro | MMLU-Pro Overall Accuracy | 36.8 | 116 |
| Safety Alignment | HarmBench | ASR | 17.25 | 88 |
| Code Generating | MBPP | Pass@1 | 47.8 | 88 |
| Multiple-choice Question Answering | SciQ | Accuracy | 95.2 | 74 |
| Safety Alignment | SORRY-Bench | ASR | 12.67 | 40 |
| Mathematical Reasoning | GSM8K Platinum | Accuracy | 59 | 37 |
| Multilingual Mathematical Reasoning | MSVAMP | Accuracy (English) | 32.3 | 33 |
| Code Generating | HumanEvalPack | Pass@1 | 39.02 | 24 |