Question Answering on HotpotQA (AUROC, AUPRC)
Chart: AUROC over time (switchable to AUPRC). Current best: 63.1 AUROC, P(True) + CoT-UQ (Feb 24, 2025).
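In uncertainty-quantification benchmarks like this one, AUROC and AUPRC typically measure how well a method's confidence score separates correct answers from incorrect ones. The sketch below is a minimal, stdlib-only illustration under stated assumptions: correct answers are treated as the positive class, and the leaderboard is assumed to report both metrics multiplied by 100 (so 63.1 would correspond to 0.631). This page states neither convention. The helper names `auroc` and `average_precision` are hypothetical, and AUPRC is approximated here by average precision, one common estimator.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Step-wise AUPRC estimate: mean precision at the rank of each positive.

    Assumes no tied scores; ties would make the ranking order arbitrary.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive example")
    # Rank examples from most to least confident.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this cutoff
    return total / n_pos


# Toy data (hypothetical): model confidence per question, and whether the answer was correct.
confidences = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
correct = [1, 0, 1, 0, 0, 1]
print(f"AUROC x100: {100 * auroc(confidences, correct):.2f}")
print(f"AUPRC x100: {100 * average_precision(confidences, correct):.2f}")
```

For real evaluations, scikit-learn's `roc_auc_score` and `average_precision_score` handle ties and edge cases more carefully than this quadratic sketch.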
Evaluation Results
Method                Details                     Date     AUROC  AUPRC
P(True) + CoT-UQ      Backbone=Llama 3.1-8B,...   2025.02  63.1   37.29
P(True)               Backbone=Llama 3.1-8B,...   2025.02  62.39  35.68
Probas-mean + CoT-UQ  Backbone=Llama 3.1-8B,...   2025.02  62.01  32.25
TOKENSAR + CoT-UQ     Backbone=Llama 3.1-8B,...   2025.02  61.07  31.27
Probas-mean           Backbone=Llama 3.1-8B,...   2025.02  53.73  29.14
TOKENSAR              Backbone=Llama 3.1-8B,...   2025.02  53.57  28.41
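Every CoT-UQ entry in the results table has a matching base method, so the table can be read as three before/after pairs. The short script below is a hypothetical helper, not part of the benchmark; it only recomputes, from the reported numbers, how much adding CoT-UQ changes each base method's AUROC and AUPRC.

```python
# AUROC / AUPRC values copied from the Evaluation Results table.
RESULTS = {
    "P(True) + CoT-UQ": (63.1, 37.29),
    "P(True)": (62.39, 35.68),
    "Probas-mean + CoT-UQ": (62.01, 32.25),
    "TOKENSAR + CoT-UQ": (61.07, 31.27),
    "Probas-mean": (53.73, 29.14),
    "TOKENSAR": (53.57, 28.41),
}

SUFFIX = " + CoT-UQ"


def cot_uq_gains(results):
    """Return {base method: (AUROC gain, AUPRC gain)} for each '+ CoT-UQ' row."""
    gains = {}
    for name, (auroc, auprc) in results.items():
        if name.endswith(SUFFIX):
            base = name[: -len(SUFFIX)]
            base_auroc, base_auprc = results[base]
            # Round to the table's two-decimal precision to hide float noise.
            gains[base] = (round(auroc - base_auroc, 2), round(auprc - base_auprc, 2))
    return gains


for base, (d_auroc, d_auprc) in cot_uq_gains(RESULTS).items():
    print(f"{base:12s} AUROC {d_auroc:+.2f}  AUPRC {d_auprc:+.2f}")
```

On these numbers, CoT-UQ lifts the two token-probability baselines the most in AUROC (Probas-mean +8.28, TOKENSAR +7.50), while P(True) gains +0.71 AUROC and +1.61 AUPRC.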