Selective Question Answering on AbstainQA (test)
[Chart: Accuracy over time. Top result: EvoGM, 13 (May 28, 2026).]
Evaluation Results
Method           Base model     Date      Accuracy
EvoGM            Qwen2.5-1.5B   2026.05   13
Single Best      Qwen2.5-1.5B   2026.05   11.9
Base             Qwen2.5-1.5B   2026.05   10.1
CMA              Qwen2.5-1.5B   2026.05   7.8
Model Swarm      Qwen2.5-1.5B   2026.05   7.1
Task Arithmetic  Qwen2.5-1.5B   2026.05   6
PSO-Merging      Qwen2.5-1.5B   2026.05   5.9
Model Soup       Qwen2.5-1.5B   2026.05   4.8
DARE             Qwen2.5-1.5B   2026.05   4.3
TIES             Qwen2.5-1.5B   2026.05   2.8
MTL              Qwen2.5-1.5B   2026.05   0.3