
Language Model Evaluation on NLP Evaluation Suite (WG, PIQA, BoolQ, ARC-C, ARC-E, OBQA, HS, SciQ, LM, RTE)

60.14 WG

QK sharing

[Chart: WG score by date; y-axis 55.46 to 59.105; x-axis Jan 27, 2026]
Updated 4d ago

Evaluation Results

Method  Links
2026.01  60.14  73.5   48.41  32.25  53.91  37.4  60.48  75.8  25     23.17  47.65  48.88
2026.01  59.27  73.07  62.72  33.02  56.14  36.2  58.72  83.6  49.97  47.8   52.35  55.71
2026.01  59.12  73.56  56.09  32.68  55.51  36.8  61.45  84.2  56.7   50.55  57.04  56.7
2026.01  58.25  73.01  63.33  31.57  56.44  36.2  58.17  85.9  56.1   52.45  56.32  57.07
2026.01  58.02  73.18  60.7   31.23  55.85  36    57.95  83.1  47.58  44.69  50.9   54.47
2026.01  55.64  67.03  48.41  27.13  42.55  31    51.51  75.8  25     23.17  47.65  44.99