Question Answering on PubMedQA (Accuracy and ECE)
[Chart: Accuracy and ECE over time, Dec 25, 2025 to Feb 3, 2026. Best accuracy: 78.6 (HPS-CPT).]
Updated 5d ago
Evaluation Results
| Method              | Configuration              | Date    | Accuracy | ECE   |
|---------------------|----------------------------|---------|----------|-------|
| HPS-CPT             | Backbone=Qwen3-14B-Bas...  | 2025.12 | 78.6     | -     |
| DOS-CPT             | Backbone=Qwen3-14B-Bas...  | 2025.12 | 77.4     | -     |
| RS-CPT              | Backbone=Qwen3-14B-Bas...  | 2025.12 | 76.7     | -     |
| Qwen3-14B-Base      | Backbone=Qwen3-14B-Bas...  | 2025.12 | 76.6     | -     |
| LPS-CPT             | Backbone=Qwen3-14B-Bas...  | 2025.12 | 76.4     | -     |
| UAT-LITE            | Seed=Single                | 2026.02 | 64       | 0.103 |
| Baseline            | Seed=Single                | 2026.02 | 62       | 0.114 |
| Temperature Scaling | Seed=Single                | 2026.02 | 62       | 0.128 |
| Deep Ensemble       | Ensemble Size=5, Seed=...  | 2026.02 | 52       | 0.051 |
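
The page does not say how the ECE column was computed. As a rough illustration only, the sketch below shows one common definition of expected calibration error: predictions are grouped into equal-width confidence bins, and the gaps between per-bin accuracy and per-bin mean confidence are averaged, weighted by bin size. The function name, the choice of 10 bins, and the half-open binning convention are all assumptions and may differ from what the listed methods used.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,  # assumed bin count; the leaderboard does not state one
) -> float:
    """Equal-width-bin ECE: weighted mean of |accuracy - confidence| per bin."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("need at least one prediction")

    # Per-bin accumulators: sample count, sum of confidences, number correct.
    counts = [0] * n_bins
    conf_sums = [0.0] * n_bins
    hit_sums = [0] * n_bins

    for conf, hit in zip(confidences, correct):
        # Bin k covers [k/n_bins, (k+1)/n_bins); confidence 1.0 goes in the last bin.
        k = min(int(conf * n_bins), n_bins - 1)
        counts[k] += 1
        conf_sums[k] += conf
        hit_sums[k] += int(hit)

    ece = 0.0
    for count, conf_sum, hits in zip(counts, conf_sums, hit_sums):
        if count == 0:
            continue  # empty bins contribute nothing
        gap = abs(hits / count - conf_sum / count)
        ece += (count / n) * gap
    return ece
```

Under this definition a lower value means confidence tracks accuracy more closely, so a method can have a low ECE while still having low accuracy, as the Deep Ensemble row illustrates.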