Multi-task Evaluation on General and Medical Aggregation
Best result: Anchored Learning, 51.8 Overall Average.
[Figure: Overall Average of submitted results by date; most recent entry May 6, 2026]
Evaluation Results
Method             Backbone          Date     Overall Average
Anchored Learning  Qwen2.5-3B-In...  2026.05  51.8
Base               Qwen2.5-3B-In...  2026.05  50.8
Low-SFT            Qwen2.5-3B-In...  2026.05  48.8
Self-sft           Qwen2.5-3B-In...  2026.05  48.3
Iter-SFT           Qwen2.5-3B-In...  2026.05  48.3
STM                Qwen2.5-3B-In...  2026.05  42.5
DFT                Qwen2.5-3B-In...  2026.05  40.1
KL-SFT             Qwen2.5-3B-In...  2026.05  12.9
SFT                Qwen2.5-3B-In...  2026.05  10.4