SciQ

Benchmarks

Task Name                            Dataset Name              Metric                SOTA Result  Trend
Question Answering                   SciQ                      Accuracy              97.2         283
Science Question Answering           SciQ                      Normalized Accuracy   97.7         137
Multiple Choice Question Answering   SciQ                      Accuracy              100          81
Science Question Answering           SciQ                      Accuracy (SciQ)       85.1         52
Question Answering                   SciQ (train)              Accuracy              100          36
Hallucination Detection              SciQ                      AUROC                 0.9328       33
Reading Comprehension                SciQ                      Accuracy              93.7         32
Uncertainty quantification           SciQ (test)               AUROC                 74.5         28
Uncertainty Estimation (Factual QA)  SciQ 1,000 samples (val)  AUROC                 62.6         27
Question Answering                   SciQ (test)               Accuracy              80.7         26
Question Answering                   SciQ In-Domain (test)     Precision             83.68        24
STEM Question Answering              SciQ                      First-Token Accuracy  98.3         24
Factual Question Answering           SciQ (ID)                 Precision             76.44        24
Multi-turn Calibration               SciQ                      ECE@1                 4.42         21
Open-ended generation                SciQ                      ECE                   5.21         21
Science Knowledge                    SciQ                      Accuracy              88.4         21
Hallucination Detection              SciQ                      Accuracy              96           17
Multiple Choice Question Answering   SciQ MC                   Mean Per-Step Regret  0.137        15
Question Answering                   SciQ Abstract             Mean per-step regret  0.135        15
Distractor Generation                Sciq (test)               Precision@1           24.3         15
Language Modeling                    SciQ                      Perplexity            11.95        13
Question Answering                   SciQ (D_eval)             Accuracy              71.4         12
Question Answering                   SCIQ Generalization       Accuracy              90.4         8
Question Answering                   SciQ                      Normalized Accuracy   87.9         8
Science Question Answering           SciQ standard (test)      Accuracy              90.2         8
Showing 25 of 44 rows