SimpleQA

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Question Answering | SimpleQA | Accuracy 95.3 | 92 |
| Question Answering | SimpleQA Verified | Accuracy 82.5 | 60 |
| Agentic Evaluation | SimpleQA | Accuracy 84 | 50 |
| Confidence Calibration | SimpleQA | Brier Score 0.0386 | 27 |
| Question Answering | SimpleQA-verified OOD | Accuracy 42.2 | 18 |
| Tool Use | SimpleQA | Accuracy 91.5 | 12 |
| Question Answering | SimpleQA out-domain (test) | LasJ 36.5 | 11 |
| Watermark Detection | SimpleQA | Delta_q 0.81 | 10 |
| Factual Question Answering | SimpleQA (test) | Accuracy 79.07 | 10 |
| Hallucination self-detection | SimpleQA | Accuracy 67 | 8 |
| Question Answering | SimpleQA | EM 58.3 | 7 |
| Fact-seeking Question Answering | SimpleQA no web | Accuracy 55 | 7 |
| Factuality | SimpleQA | Factuality Score 35.3 | 7 |
| Confidence Calibration | SimpleQA (test) | ECE 6.8 | 7 |
| Knowledge Retrieval | SimpleQA | Accuracy 74.01 | 5 |
| Agent Trajectory Performance | SimpleQA (test) | Pass@1 Accuracy 77.1 | 4 |
| Question Answering | SimpleQA (test) | SimpleQA Score 4.05 | 3 |
| Short-form QA | SimpleQA | Accuracy 4 | 2 |