General Performance on MMLU, HellaSwag, TruthfulQA, GSM8K, MATH, MBPP, HumanEval
Chart: Average Score by date. Best result: Sens-Merging (DARE), 40.35 (Feb 18, 2025).
Evaluation Results
| Method | Details | Date | Average Score |
|---|---|---|---|
| Sens-Merging (DARE) | Backbone=LLaMA2-13B, U... | 2025.02 | 40.35 |
| Sens-Merging (Ties-Merging) | Backbone=LLaMA2-13B, U... | 2025.02 | 40.22 |
| Sens-Merging (Task Arithmetic) | Backbone=LLaMA2-13B, U... | 2025.02 | 40.2 |
| DARE | Backbone=LLaMA2-13B, U... | 2025.02 | 40.13 |
| Ties-Merging | Backbone=LLaMA2-13B, U... | 2025.02 | 39.89 |
| Math | Backbone=LLaMA2-13B | 2025.02 | 37.42 |
| Task Arithmetic | Backbone=LLaMA2-13B, U... | 2025.02 | 34.52 |
| Code | Backbone=LLaMA2-13B | 2025.02 | 31.21 |
| Chat | Backbone=LLaMA2-13B | 2025.02 | 31.18 |