SciQ

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Question Answering | SciQ | Accuracy 97.2 | 283 |
| Science Question Answering | SciQ | Normalized Accuracy 97.7 | 137 |
| Science Question Answering | SciQ | Accuracy (SciQ) 94.3 | 101 |
| Multiple Choice Question Answering | SciQ | Accuracy 100 | 91 |
| Question Answering | SciQ | PRR 60 | 66 |
| Selective Generation | SciQ | ROC-AUC 86.1 | 66 |
| Question Answering | SciQ | AUC 87.79 | 51 |
| Generation correctness prediction | SciQ (test) | AURC 35.01 | 42 |
| Generation correctness prediction | SciQ | AUROC 77.99 | 42 |
| Hallucination Detection | SciQ | AUC 88.99 | 42 |
| Question Answering | SciQ (train) | Accuracy 100 | 36 |
| Hallucination Detection | SciQ | AUROC 0.9328 | 33 |
| Question Answering | Sciq | Acc Norm 86.4 | 32 |
| Reading Comprehension | SciQ | Accuracy 93.7 | 32 |
| Question Answering | SciQ (test) | Accuracy 85.4 | 28 |
| Uncertainty quantification | SciQ (test) | AUROC 74.5 | 28 |
| Uncertainty Estimation (Factual QA) | SciQ 1,000 samples (val) | AUROC 62.6 | 27 |
| Scientific reasoning | SciQ | Accuracy 97.08 | 25 |
| Question Answering | SciQ In-Domain (test) | Precision 83.68 | 24 |
| STEM Question Answering | SciQ | First-Token Accuracy 98.3 | 24 |
| Factual Question Answering | SciQ (ID) | Precision 76.44 | 24 |
| Science Knowledge | SciQ | Accuracy 90.9 | 22 |
| Multi-turn Calibration | SciQ | ECE@14.42 | 21 |
| Open-ended generation | SciQ | ECE 5.21 | 21 |
| Uncertainty Estimation | SciQ | AUROC 82 | 18 |
Showing 25 of 67 rows