Bielik-Minitron-7B: Compressing Large Language Models via Structured Pruning and Knowledge Distillation for the Polish Language
About
This report details the creation of Bielik-Minitron-7B, a compressed 7.35B-parameter version of the Bielik-11B-v3.0 model, specifically optimized for European languages. Using a two-stage compression methodology inspired by the NVIDIA Minitron approach, we combined structured hybrid pruning with knowledge distillation to reduce the model's parameter count by 33.4%, from 11.04B to 7.35B. We used the NVIDIA Model Optimizer for structural pruning and the NVIDIA NeMo Framework for logit-based distillation to recover quality. After distillation, the model went through a rigorous alignment pipeline consisting of Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO-P), and Reinforcement Learning (GRPO). The final model recovers approximately 90% of the baseline model's performance while delivering up to a 50% inference speedup. This approach demonstrates an efficient pathway for creating language models for less-represented languages, preserving most of the original model's quality while reducing inference deployment costs.
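To make the distillation stage concrete, the following is a minimal sketch of the standard logit-based knowledge-distillation loss (temperature-softened KL divergence between teacher and student logits, scaled by T²). This is an illustrative pure-Python implementation, not the actual NeMo Framework code; the function names and the temperature value are assumptions for the example.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax over a list of logits (numerically stable).
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    # Forward KL divergence KL(teacher || student) between the softened
    # distributions, scaled by T**2 as in standard logit distillation.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
    return temperature ** 2 * kl
```

When the pruned student exactly matches the teacher's logits the loss is zero; any mismatch yields a positive penalty that the student minimizes during the quality-recovery phase.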
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Emotional Intelligence | Polish EQ-Bench | Overall Score: 64.09 | 106 |
| Polish Text Understanding | CPTUB | Overall Avg: 3.38 | 98 |
| Multilingual Language Proficiency | INCLUDE base 44 | Average Score: 57.4 | 46 |
| Polish Instruction Following | Open PL LLM Leaderboard | Average Score: 62.46 | 45 |
| Reading Comprehension | Belebele (28 European languages) | Overall Score: 78.03 | 34 |
| Medical Question Answering | Polish Board Certification Examinations | Average Score: 44.36 | 30 |
| Machine Translation | Flores (test) | Average BLEU: 15.53 | 11 |
| Function Calling | Berkeley Function-Calling Leaderboard (BFCL) | Non-Live Multiple AST Success Rate: 94.5 | 7 |