Linguistic Minimal Pair Evaluation on BLiMP, SLING, and RuBLiMP Combined (test)
[Chart: Average Score over time; top entry Llama (GPb+CT), 0.928, Jun 2, 2025]
Evaluation Results
Method          Prompting Strategy   Date      Average Score
Llama (GPb+CT)  Gra...               2025.06   0.928
Llama (GPb)     Gra...               2025.06   0.884
Llama (CT)      Cha...               2025.06   0.8
Llama (Base)    Basic                2025.06   0.788