Uncertainty Estimation on SVAMP
Best result: BSDETECTOR, AUROC 93.6
[Chart: AUROC of submitted methods by date (Aug 30, 2023)]
Evaluation Results
Method                          LLM                 Date      AUROC
BSDETECTOR                      Text-Davinci-003    2023.08   93.6
BSDETECTOR                      GPT-3.5 Turbo       2023.08   92.7
Self-reflection Certainty       GPT-3.5 Turbo       2023.08   83.9
Temperature Sampling            GPT-3.5 Turbo       2023.08   67.1
Likelihood Based Uncertainty    Text-Davinci-003    2023.08   66.8
Temperature Sampling            Text-Davinci-003    2023.08   65.3
Self-reflection Certainty       Text-Davinci-003    2023.08   61.9
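A minimal sketch of how an AUROC score like those above can be computed, assuming (as is common for this task, though the page does not spell it out) that each method assigns a confidence score to every answer the LLM gives on SVAMP, and that AUROC measures how well those confidences separate correct answers from incorrect ones. The scores in the table appear to be reported as percentages (AUROC x 100). The function name `auroc` and the toy data are hypothetical and for illustration only.

```python
from typing import Sequence


def auroc(confidences: Sequence[float], correct: Sequence[bool]) -> float:
    """AUROC of confidence scores for separating correct from incorrect answers.

    Computed via the Mann-Whitney formulation: the probability that a randomly
    chosen correct answer gets a higher confidence than a randomly chosen
    incorrect one, with ties counted as 0.5.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    pos = [c for c, ok in zip(confidences, correct) if ok]      # correct answers
    neg = [c for c, ok in zip(confidences, correct) if not ok]  # incorrect answers
    if not pos or not neg:
        raise ValueError("need at least one correct and one incorrect answer")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical confidences for five answers and whether each answer was correct.
    scores = [0.95, 0.80, 0.60, 0.40, 0.30]
    labels = [True, True, False, True, False]
    print(round(100 * auroc(scores, labels), 1))  # 83.3 on the leaderboard's scale
```

The pairwise loop is O(n*m) and chosen for readability; for large evaluation sets a rank-based computation, or scikit-learn's `roc_auc_score`, gives the same value more efficiently.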