Multi-task Generalization on Multi-task Overall Average
[Chart: Accuracy by date. Plotted point: Single Best, 40.5 accuracy, May 28, 2026.]
Evaluation Results
Method           Base model     Date      Accuracy
Single Best      Qwen2.5-1.5B   2026.05   40.5
EvoGM            Qwen2.5-1.5B   2026.05   38
CMA              Qwen2.5-1.5B   2026.05   37.5
PSO-Merging      Qwen2.5-1.5B   2026.05   37.2
Model Swarm      Qwen2.5-1.5B   2026.05   37.2
Task Arithmetic  Qwen2.5-1.5B   2026.05   36.4
DARE             Qwen2.5-1.5B   2026.05   36.4
TIES             Qwen2.5-1.5B   2026.05   35.6
Base             Qwen2.5-1.5B   2026.05   35.2
Model Soup       Qwen2.5-1.5B   2026.05   35
MTL              Qwen2.5-1.5B   2026.05   33