Language Understanding on Winogrande, BoolQ, OpenBookQA, SciQ, Race, and PIQA (test)
[Chart: Avg Acc and RP. Top result: 72.8 Avg Acc (Baseline), May 5, 2025.]
Updated 4d ago
Evaluation Results
Method     Configuration              Date     Avg Acc  RP
Baseline   Model=Llama-3-70B-Inst...  2025.05  72.8     1
Baseline   Model=Llama-3.1-8B-Ins...  2025.05  71.18    1
ReplaceMe  Model=Llama-3-70B-Inst...  2025.05  70.36    0.9664
ReplaceMe  Model=Llama-3-70B-Inst...  2025.05  65.96    0.906
ReplaceMe  Model=Llama-3.1-8B-Ins...  2025.05  65.37    0.9184
Baseline   Model=Llama-3.2-1B-Ins...  2025.05  60.98    1
ReplaceMe  Model=Llama-3.2-1B-Ins...  2025.05  53.48    0.8771