Language Modeling and Reasoning on ARC, BBH, HellaSwag, TruthfulQA, LAMBADA, WinoGrande, GSM8K, MT-Bench

ARC (Accuracy): 54.61

BitDelta

[Chart: ARC (Accuracy) over time; y-axis ticks 52.478, 53.0315, 53.585, 54.1385; x-axis Feb 15, 2024]

Evaluation Results

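Date     Method    ARC    BBH    HellaSwag  TruthfulQA  LAMBADA  WinoGrande  GSM8K  Avg.   MT-Bench
2024.02  BitDelta  54.61  34.28  79.1       46.6        70.58    69.3        15.16  52.8   4.87
2024.02            54.52  34.14  78.99      46.84       70.83    69.53       14.71  52.79  4.93
2024.02            52.56  33.76  78.96      38.96       68.39    68.98       13.57  50.74  -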
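
The second-to-last number in each result row appears to be the unweighted mean of the seven task scores before it; the last number, on a different scale, would then be MT-Bench. Below is a minimal Python sketch to check that reading. It assumes the scores follow the task order given in the page title (ARC, BBH, HellaSwag, TruthfulQA, LAMBADA, WinoGrande, GSM8K). The names `rows`, `mean`, and `matches_reported` are illustrative and not part of the source.

```python
# Each entry: (seven task scores, reported average), copied from the result rows.
# Assumed task order: ARC, BBH, HellaSwag, TruthfulQA, LAMBADA, WinoGrande, GSM8K.
rows = [
    ([54.61, 34.28, 79.1, 46.6, 70.58, 69.3, 15.16], 52.8),
    ([54.52, 34.14, 78.99, 46.84, 70.83, 69.53, 14.71], 52.79),
    ([52.56, 33.76, 78.96, 38.96, 68.39, 68.98, 13.57], 50.74),
]


def mean(scores):
    """Unweighted arithmetic mean of a list of scores."""
    return sum(scores) / len(scores)


def matches_reported(scores, reported, tol=0.01):
    """True if the computed mean agrees with the reported value within rounding."""
    return abs(mean(scores) - reported) <= tol


if __name__ == "__main__":
    for scores, reported in rows:
        status = "ok" if matches_reported(scores, reported) else "mismatch"
        print(f"computed {mean(scores):.3f}, reported {reported}: {status}")
```

The tolerance of 0.01 allows for the reported averages being rounded to one or two decimal places.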