Commonsense Reasoning on Hebrew Winograd Schema Challenge
[Chart: Accuracy over time. Best result: Llama-3.3-70B-Instruct, 83.45 accuracy (Feb 2, 2026).]
Evaluation Results
| Method | Size category | Date | Accuracy |
|---|---|---|---|
| Llama-3.3-70B-Instruct | Large | 2026.02 | 83.45 |
| aya-expanse-32B | Large | 2026.02 | 80.58 |
| gemma3-27B-it | Large | 2026.02 | 79.86 |
| DictaLM-3.0-24B-Thinking | Large, R... | 2026.02 | 78.06 |
| gemma-3-12b-it | Smaller... | 2026.02 | 75.9 |
| DictaLM-3.0-Nemotron-12B-Instruct | Smaller... | 2026.02 | 73.74 |
| Qwen3-14B (think) | Smaller... | 2026.02 | 73.38 |
| DictaLM-3.0-1.7B-Instruct | Tiny (~1... | 2026.02 | 58.2 |
| DictaLM-3.0-1.7B-Thinking | Tiny (~1... | 2026.02 | 55.76 |
| Qwen3-1.7B (think) | Tiny (~1... | 2026.02 | 51.08 |
| gemma-3-1b-it | Tiny (~1... | 2026.02 | 47.84 |