
General Performance on MMLU, HellaSwag, TruthfulQA, GSM8K, MATH, MBPP, HumanEval

Average Score: 40.35

Sens-Merging (DARE)

[Chart: score axis from about 30.81 to 38.24; date Feb 18, 2025]
Updated 4d ago

Evaluation Results

Date      Average Score
2025.02   40.35
2025.02   40.22
2025.02   40.2
2025.02   40.13
2025.02   39.89
2025.02   37.42
2025.02   34.52
2025.02   31.21
2025.02   31.18