Uncertainty Estimation on AmbigQA
Best result: Kernel Lang. Ent., 78.5 AUROC.
[Chart: AUROC of evaluated methods by date (Apr 18, 2026)]
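AUROC here measures how well each method's uncertainty score separates incorrect answers from correct ones. The reported values appear to be AUROC × 100, so 50 corresponds to chance-level ranking and 100 to perfect ranking. This page does not state how answer correctness on AmbigQA is judged or which orientation of the score is used. The sketch below therefore assumes that incorrect answers are the positive class and that a higher uncertainty should flag them. It computes AUROC in its pairwise (Mann-Whitney) form. The function name `auroc` is hypothetical.

```python
def auroc(uncertainty, is_correct):
    """AUROC of an uncertainty score for detecting incorrect answers.

    Assumption: incorrect answers are the positive class. The result is the
    probability that a randomly chosen incorrect answer receives a higher
    uncertainty than a randomly chosen correct one; ties count as 1/2.
    """
    pos = [u for u, c in zip(uncertainty, is_correct) if not c]  # incorrect answers
    neg = [u for u, c in zip(uncertainty, is_correct) if c]      # correct answers
    if not pos or not neg:
        raise ValueError("need at least one correct and one incorrect answer")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # incorrect answer ranked as more uncertain: good
            elif p == n:
                wins += 0.5   # tie: half credit
    return wins / (len(pos) * len(neg))


# Toy data (not from the leaderboard): both wrong answers are more uncertain
# than both right ones, so the ranking is perfect.
print(100 * auroc([0.9, 0.2, 0.7, 0.4], [False, True, False, True]))
```

The pairwise loop is O(n²) and is meant for readability. A rank-based implementation such as scikit-learn's `roc_auc_score` gives the same value on larger evaluation sets.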
Evaluation Results
| Method | Backbone | Date | AUROC |
|---|---|---|---|
| Kernel Lang. Ent. | Mistral-7B | 2026.04 | 78.5 |
| Total | Mistral-7B | 2026.04 | 76.8 |
| SelfCheckGPT | Mistral-7B | 2026.04 | 73.3 |
| Aleatoric | Mistral-7B | 2026.04 | 71.6 |
| Closeness Centrality | Mistral-7B | 2026.04 | 68.3 |
| SemanticEntropy | Mistral-7B | 2026.04 | 67.8 |
| SC + VC | Mistral-7B | 2026.04 | 67.1 |
| Perplexity | Mistral-7B | 2026.04 | 66.1 |
| SC Based VC | Mistral-7B | 2026.04 | 65.8 |
| Max Token Prob. | Mistral-7B | 2026.04 | 65.8 |
| Mean Token Entropy | Mistral-7B | 2026.04 | 65.4 |
| Token Entropy | Mistral-7B | 2026.04 | 65.4 |
| Max Sequence Prob. | Mistral-7B | 2026.04 | 65.1 |
| SC Score | Mistral-7B | 2026.04 | 64.9 |
| PTrue | Mistral-7B | 2026.04 | 60.4 |
| Self Certainty | Mistral-7B | 2026.04 | 56.9 |
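Several of the baselines in the table are computed directly from the model's token probabilities. These include Perplexity, Max Sequence Prob., and Mean Token Entropy. The sketch below uses the common textbook definitions of these three scores. It is an assumption that the leaderboard entries use exactly these formulations, because this page does not give details such as length normalization, which tokens are counted, or how Token Entropy differs from Mean Token Entropy. Max Token Prob. is omitted because its exact definition is not stated here. All function names are hypothetical. Each score is oriented so that a higher value means more uncertainty.

```python
import math


def max_sequence_prob_uncertainty(token_logprobs):
    # Negative log-likelihood of the generated answer (natural-log probabilities
    # of the sampled tokens). A low sequence probability gives a high uncertainty.
    return -sum(token_logprobs)


def perplexity(token_logprobs):
    # exp of the mean per-token negative log-likelihood.
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def mean_token_entropy(step_distributions):
    # Average Shannon entropy (in nats) of the model's full next-token
    # distribution at each generation step.
    def entropy(probs):
        return -sum(p * math.log(p) for p in probs if p > 0)
    return sum(entropy(d) for d in step_distributions) / len(step_distributions)


# Toy two-token answer (not real model output): the sampled tokens had
# probabilities 0.5 and 0.25, drawn from uniform 2-way and 4-way distributions.
logprobs = [math.log(0.5), math.log(0.25)]
dists = [[0.5, 0.5], [0.25, 0.25, 0.25, 0.25]]
print(max_sequence_prob_uncertainty(logprobs))  # log 8
print(perplexity(logprobs))                     # sqrt(8)
print(mean_token_entropy(dists))                # 1.5 * log 2
```

Any of these scores can be passed, together with per-question correctness labels, to an AUROC routine to produce a number comparable in form to the table entries.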