# Dicta-LM 3.0: Advancing the Frontier of Hebrew Sovereign LLMs

## About
Open-weight LLMs have been released by frontier labs, yet sovereign large language models (LLMs), those built for languages other than English, remain in short supply and high demand. Training LLMs for low-resource languages such as Hebrew poses unique challenges. In this paper, we introduce Dicta-LM 3.0: an open-weight collection of LLMs trained on substantial corpora of Hebrew and English text. The model is released in three sizes: 24B, adapted from the Mistral-Small-3.1 base model; 12B, adapted from the NVIDIA Nemotron Nano V2 model; and 1.7B, adapted from the Qwen3-1.7B base model. We release multiple variants of each model, each with a native context length of 65k tokens: a base model and a chat model with tool-calling support. To rigorously evaluate our models, we introduce a new benchmark suite for Hebrew chat-LLMs, covering a diverse set of tasks including translation, summarization, Winograd schemas, Israeli trivia, and diacritization (nikud). Our work not only addresses the intricacies of training LLMs for low-resource languages but also proposes a framework for adapting other LLMs to non-English languages, contributing to the broader field of multilingual NLP.
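As a hedged illustration of the tool-calling chat variant, the sketch below builds a chat request in the OpenAI-style JSON tool schema that many open-weight chat models accept. The tool name, the system prompt, and the exact schema Dicta-LM 3.0 expects are assumptions for illustration, not confirmed by this card; only the request construction is shown, since running the model itself requires downloading the weights.

```python
import json

# Hypothetical tool definition in the OpenAI-style JSON schema widely used by
# chat LLMs with tool-calling support (the exact schema expected by
# Dicta-LM 3.0 is an assumption here).
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

def build_request(user_text: str, tools: list) -> dict:
    """Assemble a chat request payload for a tool-calling chat model."""
    return {
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_text},
        ],
        "tools": tools,
    }

# Hebrew user query: "What is the weather in Tel Aviv?"
request = build_request("מה מזג האוויר בתל אביב?", [get_weather_tool])
print(json.dumps(request, ensure_ascii=False, indent=2))
```

A request like this would typically be rendered into the model's prompt format via a chat template (e.g. `tokenizer.apply_chat_template(..., tools=...)` in Hugging Face `transformers`), after which the model may emit a structured call to `get_weather` instead of free-form text.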
## Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Mathematical Reasoning | MATH | Accuracy | 74.99 | 643 |
| Instruction Following | IFEval | -- | -- | 292 |
| Knowledge | MMLU | Accuracy | 85.93 | 71 |
| Knowledge | GPQA | Accuracy | 55.13 | 34 |
| Mathematics | MATH | Accuracy | 86.41 | 32 |
| Mathematical Reasoning | OMEGA | Score | 15.19 | 28 |
| Chat Evaluation | AlpacaEval LC 2 | Score | 74.11 | 23 |
| Question Answering | PopQA | Accuracy | 26.31 | 16 |
| Reasoning | AGI Eval EN | Accuracy | 82.93 | 15 |
| Math | OMEGA | Accuracy | 28.38 | 13 |