Multi-task Evaluation on Aggregate (LAMBADA, HellaSwag, PIQA, ARC, WinoGrande)

51.9 Avg Accuracy

Mistral (Full-Attention)

[Chart: Avg Accuracy by date; y-axis ticks 38.796, 42.198, 45.6, 49.002; x-axis Jul 8, 2024]
Updated 4d ago

Evaluation Results

Date      Avg Accuracy
2024.07   51.9
2024.07   49.1
2024.07   48.9
2024.07   48.6
2024.07   44.8
2024.07   41.4
2024.07   41.2
2024.07   40.7
2024.07   40.5
2024.07   39.3