TriviaQA

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Hallucination Detection | TriviaQA | AUROC | 0.95 | 438 |
| Question Answering | TriviaQA | Accuracy | 86.68 | 238 |
| Hallucination Detection | TriviaQA (test) | AUC-ROC | 92.9 | 183 |
| Question Answering | TriviaQA | EM | 86.1 | 182 |
| Question Answering | TriviaQA (test) | Accuracy | 85.18 | 121 |
| Question Answering | TriviaQA | Accuracy | 94.5 | 112 |
| Uncertainty Estimation | TriviaQA (test) | AUROC | 87.91 | 104 |
| Single-hop Question Answering | TriviaQA | EM | 72 | 81 |
| RAG Performance Prediction | TriviaQA | QE5 Score | 0.889 | 80 |
| Open-Domain Question Answering | TriviaQA (test) | Exact Match | 72.6 | 80 |
| Uncertainty Estimation | TriviaQA | AUROC | 85.56 | 77 |
| Passage retrieval | TriviaQA (test) | Top-100 Acc | 90.1 | 67 |
| Question Answering | TriviaQA | ACC | 75 | 62 |
| Open-domain Question Answering | TriviaQA | EM | 76.1 | 62 |
| Open-domain Question Answering | TriviaQA open (test) | EM | 73.3 | 59 |
| Question Answering | TriviaQA (test) | EM | 92.1 | 58 |
| Question Answering | TriviaQA (TQA) | EM | 71.1 | 56 |
| Question Answering | TriviaQA | BLEU | 36.84 | 54 |
| General Question Answering | TriviaQA | Exact Match | 69.02 | 54 |
| Retrieval-Augmented Generation (RAG) | TriviaQA | Reliability Score | 80.67 | 52 |
| Question Answering | TriviaQA Wiki (val) | Exact Match (EM) | 87.6 | 52 |
| Question Answering | TriviaQA | C | 79.9 | 48 |
| Question Answering | TriviaQA | F1 | 89.02 | 46 |
| Correctness Prediction | TriviaQA | AUROC | 0.852 | 45 |
| Question Answering | TriviaQA (TQA) (test) | Robust Accuracy | 75.4 | 45 |
Showing 25 of 239 rows
...
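Many of the question-answering rows report Exact Match (EM) or F1 as a percentage. The sketch below shows one common way such scores are computed when each TriviaQA question has several acceptable answer aliases. It assumes SQuAD-style normalization (lowercasing, stripping punctuation and the articles "a", "an", "the", collapsing whitespace) and takes the best score over all aliases. This is an illustrative assumption, not the official TriviaQA evaluation script, whose normalization differs in small details, and individual entries in the table may use their own variants. The helper names (`normalize_answer`, `best_over_aliases`, `corpus_scores`) are hypothetical.

```python
import re
import string
from collections import Counter

# Hypothetical SQuAD-style normalization; the official TriviaQA script differs slightly.
_PUNCT = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(gold))


def token_f1(prediction: str, gold: str) -> float:
    """Harmonic mean of token-level precision and recall after normalization."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)  # multiset intersection
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def best_over_aliases(metric, prediction: str, aliases: list[str]) -> float:
    """Score a prediction against every accepted alias and keep the maximum."""
    return max(metric(prediction, alias) for alias in aliases)


def corpus_scores(predictions: list[str], alias_lists: list[list[str]]) -> tuple[float, float]:
    """Average per-question EM and F1, reported as percentages like the table above."""
    n = len(predictions)
    em = sum(best_over_aliases(exact_match, p, a) for p, a in zip(predictions, alias_lists))
    f1 = sum(best_over_aliases(token_f1, p, a) for p, a in zip(predictions, alias_lists))
    return 100.0 * em / n, 100.0 * f1 / n


if __name__ == "__main__":
    preds = ["Lincoln", "Paris"]
    golds = [["Abraham Lincoln", "Lincoln"], ["London"]]
    print(corpus_scores(preds, golds))
```

Taking the maximum over aliases means a short answer such as "Lincoln" earns full credit whenever any accepted alias matches it exactly, which is why EM and F1 on TriviaQA depend heavily on how complete the alias lists are.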