Linguistic Minimal Pair Evaluation on BLiMP, SLING, and RuBLiMP Combined (test)

Best Average Score: 0.928, achieved by Llama (GPb+CT)

Evaluation Results

Date     Average Score
2025.06  0.928
2025.06  0.884
2025.06  0.8
2025.06  0.788
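Minimal-pair benchmarks such as BLiMP typically count a pair as correct when the model assigns a higher total log-probability to the acceptable sentence than to its minimally different unacceptable counterpart, and report accuracy over all pairs. The sketch below illustrates that scoring rule. This page does not state how the three benchmarks are combined into one "Average Score", so the unweighted mean of per-benchmark accuracies used here is an assumption, and the function names and log-probability values are hypothetical toy inputs, not results from this leaderboard.

```python
from statistics import mean

# One minimal pair: (log P(acceptable sentence), log P(unacceptable sentence)).
Pair = tuple[float, float]


def minimal_pair_accuracy(pairs: list[Pair]) -> float:
    """Fraction of pairs where the model prefers the acceptable sentence."""
    if not pairs:
        raise ValueError("need at least one pair")
    correct = sum(1 for good, bad in pairs if good > bad)
    return correct / len(pairs)


def average_score(per_benchmark: dict[str, list[Pair]]) -> float:
    """ASSUMED aggregation: unweighted mean of per-benchmark accuracies."""
    return mean(minimal_pair_accuracy(p) for p in per_benchmark.values())


# Hypothetical toy log-probabilities, for illustration only.
toy = {
    "BLiMP": [(-12.1, -14.3), (-9.8, -9.5)],      # 1 of 2 correct
    "SLING": [(-20.4, -22.0), (-15.2, -17.9)],    # 2 of 2 correct
    "RuBLiMP": [(-11.0, -13.5), (-8.7, -8.1)],    # 1 of 2 correct
}

print(round(average_score(toy), 3))  # 0.667
```

If the real aggregate instead pools all pairs across benchmarks, larger benchmarks would dominate the score; the per-benchmark mean shown here weights BLiMP, SLING, and RuBLiMP equally.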