Instella: Fully Open Language Models with Stellar Performance
About
Large language models (LLMs) have demonstrated remarkable performance across a wide range of tasks, yet the majority of high-performing models remain closed-source or only partially open, limiting transparency and reproducibility. In this work, we introduce Instella, a family of fully open three-billion-parameter language models trained entirely on openly available data with a fully open training codebase. Powered by AMD Instinct MI300X GPUs, Instella is developed through large-scale pre-training, general-purpose instruction tuning, and alignment with human preferences. Despite using substantially fewer pre-training tokens than many contemporaries, Instella achieves state-of-the-art results among fully open models and is competitive with leading open-weight models of comparable size. We further release two specialized variants: Instella-Long, capable of handling context lengths of up to 128K tokens, and Instella-Math, a reasoning-focused model enhanced through supervised fine-tuning and reinforcement learning on mathematical tasks. Together, these contributions establish Instella as a transparent, performant, and versatile alternative for the community, advancing the goal of open and reproducible language modeling research.
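Instella models can be loaded through the standard Hugging Face `transformers` causal-LM interface. The snippet below is a minimal usage sketch rather than an official quickstart: the repository id `amd/Instella-3B-Instruct`, the bfloat16/device-map settings, and the chat-template call are assumptions, not details stated on this page.

```python
# Minimal sketch: load an Instella checkpoint and run greedy generation.
# The model id below is an assumption about the published Hub repository.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "amd/Instella-3B-Instruct"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "Summarize what makes a language model 'fully open'."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```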
Related benchmarks
| Task | Dataset | Metric | Score | Rank |
|---|---|---|---|---|
| Code Generation | HumanEval | Pass@1 | 39.48 | 1036 |
| Code Generation | HumanEval+ | Pass@1 | 16.04 | 383 |
| Commonsense Reasoning | WinoGrande 5-shot | Accuracy | 72.05 | 64 |
| Instruction Following | IF-Eval 0-shot | Score | 74.02 | 55 |
| Leaderboard Evaluation | Open LLM Leaderboard 2 | Overall Score | 37.14 | 55 |
| Reasoning | BBH 3-shot | Score | 43.39 | 49 |
| Math | GSM8k 5-shot | Score | 70.66 | 46 |
| Knowledge | MMLU 5-shot | Score | 58.19 | 46 |
| Leaderboard Evaluation | Open LLM Leaderboard 1 | Overall Score | 60.96 | 46 |
| Reasoning | MuSR 0-shot | Score | 36.92 | 46 |
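The HumanEval rows above report Pass@1. For reference, pass@k is commonly computed with the unbiased estimator of Chen et al. (2021) from n sampled completions per problem, of which c pass the unit tests. The sketch below is a generic illustration of that estimator with hypothetical counts; it is not the evaluation harness or the sample budget behind the numbers in the table.

```python
# Unbiased pass@k estimator (Chen et al., 2021): probability that at least one
# of k samples drawn from n completions (c of them correct) passes the tests.
# Illustrative only; the sample counts below are hypothetical.
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Expected pass@k for a single problem with n samples and c correct ones."""
    if n - c < k:
        # Fewer than k incorrect samples: every size-k subset contains a correct one.
        return 1.0
    # 1 - C(n - c, k) / C(n, k): one minus the chance a random size-k subset
    # contains no correct completion.
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical example: 200 samples per problem, 83 of them correct.
print(round(pass_at_k(200, 83, 1), 4))   # ≈ 0.415
print(round(pass_at_k(200, 83, 10), 4))  # ≈ 0.996
```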