Natural Language Understanding on ARC Challenge
[Chart: Accuracy over time, Oct 2024 – Feb 2026. Current state of the art: LLaMA-3.1-405B Base, 95.3 accuracy. Updated 4d ago.]
Evaluation Results

| Method              | Settings                   | Date    | Accuracy |
|---------------------|----------------------------|---------|----------|
| LLaMA-3.1-405B Base | #Shots=25-shot, Archit...  | 2026.01 | 95.3     |
| DeepSeek-V3-Base    | #Shots=25-shot, Archit...  | 2026.01 | 95.3     |
| Yuan3.0-1T Base     | #Shots=25-shot, Archit...  | 2026.01 | 94.3     |
| Full-Attn           | #Shots=25-shot, Model...   | 2026.02 | 78.4     |
| HySparse            | #Shots=25-shot, Model...   | 2026.02 | 77.6     |
| HySparse            | #Shots=25-shot, Model...   | 2026.02 | 75.0     |
| Hybrid SWA          | #Shots=25-shot, Model...   | 2026.02 | 74.9     |
| Full-Attn           | #Shots=25-shot, Model...   | 2026.02 | 70.2     |
| Hybrid SWA          | #Shots=25-shot, Model...   | 2026.02 | 63.9     |
| Arcana              | zero-shot=true             | 2024.10 | 61.4     |
| Vicuna-v1.5         | zero-shot=true             | 2024.10 | 56.6     |
| LLaMA-2-Chat        | zero-shot=true             | 2024.10 | 54.9     |
| WizardLM            | zero-shot=true             | 2024.10 | 47.5     |
| LLaMA-2             | zero-shot=true             | 2024.10 | 40.3     |
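For reference, the Accuracy column reports the percentage of ARC Challenge multiple-choice questions a model answers correctly. A minimal sketch of that metric, with illustrative names and data (not the actual evaluation harness):

```python
def accuracy(predictions, answer_keys):
    """Percent of items where the predicted choice matches the gold key."""
    correct = sum(p == a for p, a in zip(predictions, answer_keys))
    return 100.0 * correct / len(answer_keys)

# Hypothetical predictions vs. gold keys for four questions.
preds = ["B", "C", "A", "D"]
keys = ["B", "C", "D", "D"]
print(accuracy(preds, keys))  # → 75.0
```

Few-shot settings such as "25-shot" prepend 25 solved examples to each prompt before the model picks an option; the accuracy computation itself is unchanged.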