HEBATRON: A Hebrew-Specialized Open-Weight Mixture-of-Experts Language Model
About
We present Hebatron, a Hebrew-specialized open-weight large language model built on the NVIDIA Nemotron-3 sparse Mixture-of-Experts architecture. Training employs a three-phase easy-to-hard curriculum with continuous anti-forgetting anchoring, followed by supervised fine-tuning on 2 million bilingual Hebrew–English samples. The curriculum ordering alone yields a 3-point aggregate benchmark gain over the reversed configuration. Hebatron achieves a Hebrew reasoning average of 73.8%, outperforming DictaLM-3.0-24B-Thinking (68.9%) and remaining competitive with Gemma-3-27B-IT on GSM8K-HE and Israeli Trivia. It activates only 3B of its 30B parameters per forward pass, delivering approximately 9 times higher inference throughput at native context lengths up to 65,536 tokens. To our knowledge, this is the first language-specific adaptation of the Nemotron-3 architecture for any target language, and the first open-weight Hebrew-specialized MoE model with native long-context support. Model weights are released openly to support further research in Hebrew and Semitic-language NLP.
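To make the sparse-activation claim concrete, the following is a minimal sketch of top-k expert routing, the general mechanism by which a Mixture-of-Experts layer runs only a subset of its parameters per token. Only the 3B and 30B figures come from the abstract. The function names, the four example router logits, and the choice of `k=2` are illustrative assumptions, not the Nemotron-3 or Hebatron configuration.

```python
import math

# Parameter counts taken from the abstract.
TOTAL_PARAMS = 30e9
ACTIVE_PARAMS = 3e9


def active_fraction(active: float, total: float) -> float:
    """Share of the model's parameters used in one forward pass."""
    return active / total


def top_k_route(router_logits: list[float], k: int) -> list[tuple[int, float]]:
    """Pick the k experts with the largest router logits.

    Returns (expert_index, weight) pairs, highest logit first. The weights
    are a softmax over the selected logits only, so they sum to 1.
    This is a generic top-k gate (hypothetical helper), not the routing
    function of any specific model.
    """
    chosen = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i],
                    reverse=True)[:k]
    peak = max(router_logits[i] for i in chosen)  # subtract for numerical stability
    exps = [math.exp(router_logits[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


if __name__ == "__main__":
    print(f"active fraction: {active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS):.0%}")
    # Hypothetical router output for one token over four experts.
    print(top_k_route([0.1, 2.0, -1.0, 1.5], k=2))
```

Experts that are not selected are never evaluated for that token, which is why per-token compute tracks the active parameter count rather than the total.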
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Reasoning | English Reasoning Benchmarks (HellaSwag, GSM8K, Psychometric Psi) | HellaSwag Score | 82.5 | 7 |
| Natural Language Processing | Hebrew NLP Benchmarks | SNLI Accuracy | 91.2 | 4 |
| Human Preference Evaluation | Arena (Phase 2) | Total Battles | 197 | 3 |
| Reasoning | Hebrew Reasoning Benchmarks (Copa, ARC-AI2, HellaSwag, MMLU, GSM8K, Psychometric Psi) | Copa (HE) | 91.9 | 3 |