Benchmarks
Language Modeling and Reasoning on ARC, BBH, HellaSwag, TruthfulQA, LAMBADA, WinoGrande, GSM8K, and MT-Bench
[Interactive chart: benchmark scores over time per method, with a selectable metric (ARC, BBH, HellaSwag, TruthfulQA, LAMBADA, WinoGrande, GSM8K, Average Score, MT-Bench). Updated 4d ago.]
Evaluation Results

| Method | Tags | Date | ARC (Acc.) | BBH (Acc.) | HellaSwag (Acc.) | TruthfulQA (Acc.) | LAMBADA (Acc.) | WinoGrande (Acc.) | GSM8K (Acc.) | Average Score | MT-Bench Score |
|---|---|---|---|---|---|---|---|---|---|---|---|
| BitDelta | Application=Parameter-... | 2024.02 | 54.61 | 34.28 | 79.1 | 46.6 | 70.58 | 69.3 | 15.16 | 52.8 | 4.87 |
| Llama 2-7B UltraChat | Fine-tuning=r = 16 LoR... | 2024.02 | 54.52 | 34.14 | 78.99 | 46.84 | 70.83 | 69.53 | 14.71 | 52.79 | 4.93 |
| Llama 2-7B | Model Type=Base Model | 2024.02 | 52.56 | 33.76 | 78.96 | 38.96 | 68.39 | 68.98 | 13.57 | 50.74 | - |
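The Average Score column appears to be the unweighted mean of the seven accuracy metrics, with MT-Bench excluded; this reading reproduces the reported averages for all three rows. A minimal sketch (the `rows` dict and its layout are assumptions for illustration, not part of the leaderboard's API):

```python
from statistics import mean

# Seven accuracy metrics per method, in table order:
# ARC, BBH, HellaSwag, TruthfulQA, LAMBADA, WinoGrande, GSM8K
rows = {
    "BitDelta":             [54.61, 34.28, 79.10, 46.60, 70.58, 69.30, 15.16],
    "Llama 2-7B UltraChat": [54.52, 34.14, 78.99, 46.84, 70.83, 69.53, 14.71],
    "Llama 2-7B":           [52.56, 33.76, 78.96, 38.96, 68.39, 68.98, 13.57],
}

for name, scores in rows.items():
    # Unweighted mean over the seven metrics; MT-Bench (a 1-10 scale) is excluded
    print(f"{name}: {mean(scores):.2f}")
# → BitDelta: 52.80, Llama 2-7B UltraChat: 52.79, Llama 2-7B: 50.74
```

The computed means round to 52.80, 52.79, and 50.74, matching the Average Score column, which is consistent with MT-Bench being excluded because its 1-10 scale is not comparable to the percentage-scaled accuracy metrics.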