Yuan3.0 Ultra: A Trillion-Parameter Enterprise-Oriented MoE LLM
About
We introduce Yuan3.0 Ultra, an open-source Mixture-of-Experts (MoE) large language model with 68.8B activated parameters and 1010B total parameters, specifically designed to improve performance on enterprise-scenario tasks while maintaining competitive capabilities on general-purpose tasks. We propose the Layer-Adaptive Expert Pruning (LAEP) algorithm, designed for the pre-training stage of MoE LLMs. In contrast to previous expert-pruning approaches, which operate primarily in the post-training phase, LAEP improves training efficiency by selectively pruning underutilized experts and reorganizing the remaining experts across computing devices according to token-distribution statistics. Comprehensive experiments demonstrate that LAEP effectively reduces model size and substantially improves pre-training efficiency. When pre-training Yuan3.0 Ultra from scratch from an original 1515B parameters, the algorithm delivers a 49% boost in pre-training efficiency and a 33.3% reduction in total parameters while preserving the model's strong multi-domain performance. On enterprise-scenario benchmarks including Docmatix, ChatRAG, SummEval, and MMTab, Yuan3.0 Ultra achieves leading accuracy. The model and code are publicly available at https://github.com/Yuan-lab-LLM/Yuan3.0-Ultra.
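The pruning-and-regrouping idea behind LAEP can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function names, the simple token-share threshold, and the round-robin device layout are all assumptions made here for clarity.

```python
import numpy as np

def prune_underutilized_experts(token_counts, min_share=0.01):
    """Select experts to keep in one MoE layer from routed-token statistics.

    token_counts: 1-D array of tokens routed to each expert over a window.
    min_share:    experts receiving less than this fraction of all tokens
                  are pruned (illustrative criterion, not the paper's).
    Returns kept expert indices sorted by utilization, descending.
    """
    counts = np.asarray(token_counts, dtype=np.float64)
    shares = counts / counts.sum()
    kept = np.where(shares >= min_share)[0]
    # Order kept experts by load so they can be balanced across devices.
    return kept[np.argsort(-shares[kept])]

def shard_experts(kept, n_devices):
    """Round-robin load-sorted experts across devices to balance tokens."""
    return [list(kept[d::n_devices]) for d in range(n_devices)]
```

For example, with per-expert token counts `[500, 300, 150, 40, 8, 2]`, the last two experts fall below a 1% token share and are pruned, and the four remaining experts are interleaved across two devices so each device holds a mix of hot and cold experts.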
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Language Understanding | MMLU | Accuracy | 78 | 825 |
| Math | GSM8K | Accuracy | 0.861 | 206 |
| Coding | MBPP | Accuracy | 75.9 | 116 |
| Mathematics | MATH | Accuracy | 66.1 | 85 |
| Code | HumanEval | Accuracy | 70.7 | 79 |
| Natural Language Understanding | ARC Challenge | Accuracy | 94.3 | 14 |
| Training Efficiency | Yuan3.0-1T Pre-training Base (train) | TFLOPS | 92.6 | 6 |
| Language | Pile (test) | Accuracy | 59.4 | 3 |
| Language | NaturalQuestions | Accuracy | 0.433 | 3 |