
PubMedQA

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Question Answering | PubMedQA | Accuracy 83.6 | 145
Question Answering | PubMedQA (test) | Accuracy 82.4 | 128
Medical Question Answering | PubMedQA | Accuracy 81.4 | 92
Question Answering | PubMedQA PQA-L (test) | Accuracy 78.2 | 43
Hallucination Detection | PubmedQA | F1 Score 88 | 36
Medical Question Answering | PubMedQA | Factual Accuracy (FA) 95.63 | 28
Language Modeling | PubMedQA MdQ | PPL Change (%) vs Baseline 0 | 24
Question Answering | PubMedQA | EM 79.82 | 18
Prompt Leakage Attack | PubMedQA | ASR (500) 14 | 16
Multiple-choice Question Answering | PubMedQA | Accuracy 63.62 | 15
Question Answering | PubMedQA | Context Influence 115.78 | 15
Question Answering | PubMedQA | Accuracy 82.2 | 15
Medical Question Answering | PubMedQA | Pass@1 86 | 14
Question Answering | PubMedQA (out-of-domain) | ROUGE-L 11.7 | 14
Medical Reasoning | PubMedQA | Accuracy 78.3 | 13
Biomedical Question Answering | PubMedQA | Accuracy 68.32 | 13
Medical Reasoning | PubMedQA | Token Cost (tokens/question) 1,509 | 11
Biomedical Question Answering | PubMedQA PQA-L In-Domain (test) | Accuracy 78 | 11
Medical Question Answering | PubMedQA | Accuracy 78.4 | 10
Medical Question Answering | PubMedQA | Kendall's Tau 4.03 | 10
Close-ended QA | PubMedQA | Accuracy 85 | 10
Medical Question Answering | PubMedQA Reasoning Required | Accuracy 82 | 10
Domain Adaptation | PubMedQA | PPL Delta (%) 8.3 | 9
Language Modeling | PubMedQA | PPL Change (%) 8.3 | 9
Multiple choice QA | PubMedQA (test) | AUROC 81.8 | 9
Showing 25 of 46 rows