Medical Question Answering on PubMedQA (test)
[Chart: CUS Score by date. Plotted point: GPT-3.5-turbo (Baseline), CUS Score 12.2, Nov 20, 2025. Available metrics: CUS Score, ZTI Score.]
Updated 18d ago
Evaluation Results
| Method | Method details | Links | CUS Score | ZTI Score |
| GPT-3.5-turbo (Baseline) | Train Dataset=MedQA, I... | 2025.11 | 12.2 | 97.58 |
| GPT-4o (Baseline) | Train Dataset=MedQA, I... | 2025.11 | 8.03 | 90.1 |
| GPT-3.5-turbo + MedBayes-Lite | Train Dataset=MedQA, I... | 2025.11 | 7.11 | 98.2 |
| GPT-4o + MedBayes-Lite | Train Dataset=MedQA, I... | 2025.11 | 5.98 | 94.53 |