Commonsense Reasoning on Hebrew Winograd Schema Challenge

Accuracy: 83.45

Llama-3.3-70B-Instruct

Updated 4mo ago

Evaluation Results

Method	Date	Accuracy
Llama-3.3-70B-Instruct	2026.02	83.45
aya-expanse-32B	2026.02	80.58
gemma3-27B-it	2026.02	79.86
DictaLM-3.0-24B-Thinking	2026.02	78.06
gemma-3-12b-it	2026.02	75.9
DictaLM-3.0-Nemotron-12B-Instruct	2026.02	73.74
Qwen3-14B (think)	2026.02	73.38
DictaLM-3.0-1.7B-Instruct	2026.02	58.2
DictaLM-3.0-1.7B-Thinking	2026.02	55.76
Qwen3-1.7B (think)	2026.02	51.08
gemma-3-1b-it	2026.02	47.84