
Medical Question Answering on PubMedQA (test)

Highest CUS Score: 12.2 (GPT-3.5-turbo, Baseline)


Evaluation Results

Method	Date	CUS Score	Other metric
GPT-3.5-turbo (Baseline)	2025.11	12.2	97.58
GPT-4o (Baseline)	2025.11	8.03	90.1
GPT-3.5-turbo + MedBayes-Lite	2025.11	7.11	98.2
GPT-4o + MedBayes-Lite	2025.11	5.98	94.53