Selective Question Answering on AbstainQA (val)
Chart: Accuracy over time (data point dated May 28, 2026). Top score: Single Best, 21.
Evaluation Results
Method            Base model      Date      Accuracy
Single Best       Qwen2.5-1.5B    2026.05   21
Base              Qwen2.5-1.5B    2026.05   18.5
EvoGM             Qwen2.5-1.5B    2026.05   16.5
CMA               Qwen2.5-1.5B    2026.05   14
PSO-Merging       Qwen2.5-1.5B    2026.05   13
Model Swarm       Qwen2.5-1.5B    2026.05   13
DARE              Qwen2.5-1.5B    2026.05   9.5
Task Arithmetic   Qwen2.5-1.5B    2026.05   7.5
TIES              Qwen2.5-1.5B    2026.05   7.5
Model Soup        Qwen2.5-1.5B    2026.05   7
MTL               Qwen2.5-1.5B    2026.05   0.5