
TriviaQA

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Hallucination Detection | TriviaQA | AUROC 0.98 | 621 |
| Hallucination Detection | TriviaQA (test) | AUC-ROC 98.3 | 243 |
| Question Answering | TriviaQA | Accuracy 86.68 | 238 |
| Predicting Answer Correctness | TriviaQA (val) | AUROC 0.904 | 199 |
| Question Answering | TriviaQA | EM 86.1 | 182 |
| Selective Prediction | TriviaQA (val) | PRR 87.9 | 175 |
| Single-hop Question Answering | TriviaQA | EM 72 | 133 |
| Question Answering | TriviaQA (test) | Accuracy 85.18 | 121 |
| Question Answering | TriviaQA | Accuracy 94.5 | 117 |
| Correctness Prediction | TriviaQA | AUROC 0.999 | 113 |
| Uncertainty Estimation | TriviaQA | AUROC 88 | 111 |
| Uncertainty Estimation | TriviaQA (test) | AUROC 87.91 | 110 |
| Open-domain Question Answering | TriviaQA | EM 76.1 | 88 |
| RAG Performance Prediction | TriviaQA | QE5 Score 0.889 | 80 |
| Question Answering | TriviaQA (test) | EM 92.1 | 80 |
| Open-Domain Question Answering | TriviaQA (test) | Exact Match 72.6 | 80 |
| Question Answering | TriviaQA | EM 62 | 71 |
| Passage retrieval | TriviaQA (test) | Top-100 Acc 90.1 | 67 |
| Selective Generation | TriviaQA | ROC-AUC 87.4 | 66 |
| Question Answering | TriviaQA | BS (%) 92.42 | 65 |
| Question Answering | TriviaQA | ACC 75 | 62 |
| Open-domain Question Answering | TriviaQA open (test) | EM 73.3 | 59 |
| Question Answering | TriviaQA (TQA) | EM 71.1 | 56 |
| General Question Answering | TriviaQA | Exact Match 69.02 | 54 |
| Retrieval-Augmented Generation (RAG) | TriviaQA | Reliability Score 80.67 | 52 |
Showing 25 of 323 rows
...