Medical Long-form Generation on MedQA (test)
[Chart: SM over time. Labeled point: APCD, SM 85, May 10, 2026. Metrics available: SM, FM.]
Updated 22d ago
Evaluation Results
| Method | Details | Date | SM | FM |
|---|---|---|---|---|
| APCD | Backbone=II-Medical-8B... | 2026.05 | 85 | 86.7 |
| CS | Backbone=II-Medical-8B... | 2026.05 | 83.4 | 85.8 |
| Sample (top k + top p) | Backbone=II-Medical-8B... | 2026.05 | 80.1 | 85.2 |
| Beam Search | Backbone=II-Medical-8B... | 2026.05 | 79.9 | 83.1 |
| II-Medical-8B-1706 | Backbone=II-Medical-8B... | 2026.05 | 79.2 | 82.8 |
| DoLa (low) | Backbone=II-Medical-8B... | 2026.05 | 76.7 | 84.5 |
| DBS | Backbone=II-Medical-8B... | 2026.05 | 72.3 | 82 |
| DoLa (high) | Backbone=II-Medical-8B... | 2026.05 | 12.3 | 12.7 |