
OLMoE: Open Mixture-of-Experts Language Models

About

We introduce OLMoE, a fully open, state-of-the-art language model leveraging sparse Mixture-of-Experts (MoE). OLMoE-1B-7B has 7 billion (B) parameters but uses only 1B per input token. We pretrain it on 5 trillion tokens and further adapt it to create OLMoE-1B-7B-Instruct. Our models outperform all available models with similar active parameters, even surpassing larger ones like Llama2-13B-Chat and DeepSeekMoE-16B. We present various experiments on MoE training, analyze routing in our model showing high specialization, and open-source all aspects of our work: model weights, training data, code, and logs.

Niklas Muennighoff, Luca Soldaini, Dirk Groeneveld, Kyle Lo, Jacob Morrison, Sewon Min, Weijia Shi, Pete Walsh, Oyvind Tafjord, Nathan Lambert, Yuling Gu, Shane Arora, Akshita Bhagia, Dustin Schwenk, David Wadden, Alexander Wettig, Binyuan Hui, Tim Dettmers, Douwe Kiela, Ali Farhadi, Noah A. Smith, Pang Wei Koh, Amanpreet Singh, Hannaneh Hajishirzi • 2024

Related benchmarks

Task                              | Dataset          | Result            | Rank
----------------------------------|------------------|-------------------|-----
Commonsense Reasoning             | WinoGrande       | Accuracy 68.4     | 1085
Question Answering                | ARC Challenge    | Accuracy 55.2     | 906
Multi-task Language Understanding | MMLU             | Accuracy 50.5     | 876
Commonsense Reasoning             | PIQA             | Accuracy 80.7     | 751
Question Answering                | ARC Easy         | Accuracy 76.89    | 597
Code Generation                   | HumanEval (test) | --                | 506
Question Answering                | OpenBookQA       | Accuracy 44.4     | 465
Natural Language Inference        | RTE              | Accuracy 71.84    | 448
Multi-task Language Understanding | MMLU             | Accuracy 53.54    | 413
Question Answering                | ARC Easy         | Normalized Acc 78 | 389

(Showing 10 of 57 rows.)
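
For readers who want to check entries like these against the released checkpoint, a common route is EleutherAI's lm-evaluation-harness. The sketch below is our assumption of a reasonable setup, not an evaluation script from the paper; exact metric keys and scores depend on harness version, prompt format, and few-shot settings.

    # Reproduction sketch using EleutherAI's lm-evaluation-harness
    # (pip install lm-eval). Our assumption, not the paper's protocol.
    from lm_eval import simple_evaluate

    results = simple_evaluate(
        model="hf",                                        # HuggingFace backend
        model_args="pretrained=allenai/OLMoE-1B-7B-0924",  # released checkpoint
        tasks=["winogrande", "piqa", "arc_challenge"],     # tasks from the table above
        batch_size=8,
    )
    for task, metrics in results["results"].items():
        print(task, metrics)  # metric key names vary by harness version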
