HEBATRON: A Hebrew-Specialized Open-Weight Mixture-of-Experts Language Model

About

We present Hebatron, a Hebrew-specialized open-weight large language model built on the NVIDIA Nemotron-3 sparse Mixture-of-Experts architecture. Training employs a three-phase easy-to-hard curriculum with continuous anti-forgetting anchoring, followed by supervised fine-tuning on 2 million bilingual Hebrew-English samples. The curriculum ordering alone yields a 3-point aggregate benchmark gain over the reversed configuration. Hebatron achieves a Hebrew reasoning average of 73.8%, outperforming DictaLM-3.0-24B-Thinking (68.9%) and remaining competitive with Gemma-3-27B-IT on GSM8K-HE and Israeli Trivia. It activates only 3B of its 30B parameters per forward pass, delivering approximately 9 times higher inference throughput at native context lengths up to 65,536 tokens. To our knowledge, this is the first language-specific adaptation of the Nemotron-3 architecture for any target language, and the first open-weight Hebrew-specialized MoE model with native long-context support. Model weights are released openly to support further research in Hebrew and Semitic-language NLP.
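The headline figures in the abstract can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted above. The function and constant names are illustrative and do not come from the paper. The reported throughput gain is a measured result and is not derived from the active-parameter ratio here.

```python
# Figures are taken from the abstract; all names here are illustrative.
TOTAL_PARAMS_B = 30.0    # total parameters, in billions
ACTIVE_PARAMS_B = 3.0    # parameters activated per forward pass, in billions

HEBATRON_HE_REASONING = 73.8   # Hebrew reasoning average (%)
DICTALM_HE_REASONING = 68.9    # DictaLM-3.0-24B-Thinking (%)


def active_fraction(active_b: float, total_b: float) -> float:
    """Share of the model's parameters used on each forward pass."""
    return active_b / total_b


def point_gap(a: float, b: float) -> float:
    """Difference in percentage points, rounded to one decimal place."""
    return round(a - b, 1)


if __name__ == "__main__":
    frac = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
    gap = point_gap(HEBATRON_HE_REASONING, DICTALM_HE_REASONING)
    print(f"active fraction: {frac:.0%}")
    print(f"gap over DictaLM-3.0-24B-Thinking: {gap} points")
```

Under these figures, roughly one tenth of the weights take part in any single forward pass, and the Hebrew reasoning margin over DictaLM-3.0-24B-Thinking is 4.9 points.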

Noam Kayzer, Dan Revital, Ori Bar Joseph, Smadar Arvatz, Or Levi, Tal Geva, Shaltiel Shmidman, Amir DN Cohen, Noam Ordan, Omer Baruch, Kate Zinkovskaia, Zevi Apini, Sarel Weinberger • 2026

Related benchmarks

Task	Dataset	Result
Reasoning	English Reasoning Benchmarks (HellaSwag, GSM8K, Psychometric Psi)	HellaSwag Score: 82.5	7
Natural Language Processing	Hebrew NLP Benchmarks	SNLI Accuracy: 91.2	4
Human Preference Evaluation	Arena (Phase 2)	Total Battles: 197	3
Reasoning	Hebrew Reasoning Benchmarks (Copa, ARC-AI2, HellaSwag, MMLU, GSM8K, Psychometric Psi)	Copa (HE): 91.9	3
