
BioASQ

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Hallucination Detection | BioASQ | AUROC 81.13 | 104 |
| Question Answering | BioASQ | Accuracy 98.32 | 72 |
| Medical Question Answering | BioASQ | Accuracy 88.67 | 63 |
| Selective Prediction | BioASQ | E-AURC 0.2744 | 28 |
| Question Answering | BioASQ (dev) | F1 Score 77.8 | 28 |
| Biomedical reasoning | BioASQ out-of-domain | Accuracy 91.87 | 25 |
| Hallucination Detection | BioASQ (test) | AUROC 77.51 | 20 |
| Reliability Estimation | BioASQ | AUROC 70.46 | 20 |
| Domain Adaptation | BioASQ (test) | BBH 54.89 | 20 |
| Biomedical Multi-hop Question Answering | BioASQ-B | EM 40.6 | 18 |
| Extractive Question Answering | BioASQ (test) | EM 47.27 | 16 |
| Snippet Retrieval | BIOASQ 7 (test batches 1-5) | MAP 0.2518 | 16 |
| Document Retrieval | BIOASQ 7 (test batches 1-5) | MAP 19.24 | 16 |
| Question Answering | BioASQ MRQA out-of-domain evaluation 2019 (test) | EM 60.3 | 15 |
| Question Answering | BioASQ | EM 45.68 | 14 |
| Question Answering | BioASQ | T Score 66 | 14 |
| Reading Comprehension | BioASQ MRQA out-of-domain | EM 67.62 | 14 |
| Question Answering | BioASQ factoid 7b (test) | SAcc 47.4 | 13 |
| Hallucination Detection | BioASQ | Inference Throughput (Samples/sec) 5,351 | 12 |
| Extractive Question Answering | BioASQ MRQA | F1 Score 91 | 12 |
| Biomedical Question Answering | BioASQ | Factoid Acc 29 | 11 |
| Question Answering | BioASQ | F1 Score 26.1 | 10 |
| Question Answering | BioASQ | SAME_CONCLUSION Score 85.71 | 10 |
| Retrieval | BioASQ (test) | Top-20 46 | 9 |
| Biomedical Question Answering | BioASQ (test) | ROUGE 54.8 | 8 |
Showing 25 of 61 rows