Uncertainty Estimation on GSM8K
Figure: AUROC on GSM8K over time (best result: BSDETECTOR, 0.951 AUROC, Aug 30, 2023)
Evaluation Results
| Method | LLM | Date | AUROC |
|---|---|---|---|
| BSDETECTOR | GPT-3.5 Turbo | 2023.08 | 0.951 |
| BSDETECTOR | Text-Davinci-003 | 2023.08 | 0.867 |
| Self-reflection Certainty | GPT-3.5 Turbo | 2023.08 | 0.831 |
| Temperature Sampling | GPT-3.5 Turbo | 2023.08 | 0.66 |
| Likelihood Based Uncertainty | Text-Davinci-003 | 2023.08 | 0.647 |
| Temperature Sampling | Text-Davinci-003 | 2023.08 | 0.614 |
| Self-reflection Certainty | Text-Davinci-003 | 2023.08 | 0.521 |
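To make the AUROC column easier to read, here is a minimal sketch of how the metric is typically computed for uncertainty estimation. It assumes the common setup where each model answer to a GSM8K question is labeled correct or incorrect, and the method assigns it a confidence score. AUROC is then the probability that a randomly chosen correct answer receives a higher confidence than a randomly chosen incorrect one. A score of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation. The function name `auroc` and the sample scores are hypothetical illustrations, not part of any benchmark code.

```python
from itertools import product


def auroc(confidences, is_correct):
    """Pairwise (Mann-Whitney) AUROC for confidence scores.

    Hypothetical helper: counts how often a correct answer's confidence
    exceeds an incorrect answer's confidence, with ties counted as 0.5.
    """
    # Split confidence scores by whether the answer was correct.
    pos = [c for c, y in zip(confidences, is_correct) if y]
    neg = [c for c, y in zip(confidences, is_correct) if not y]
    if not pos or not neg:
        raise ValueError("need at least one correct and one incorrect answer")

    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0   # correct answer ranked above incorrect one
        elif p == n:
            wins += 0.5   # tie: half credit
    return wins / (len(pos) * len(neg))


# Hypothetical example: 2 correct and 2 incorrect answers.
print(auroc([0.9, 0.4, 0.5, 0.2], [True, True, False, False]))  # 0.75
```

Here three of the four (correct, incorrect) pairs are ranked in the right order, giving 0.75. The quadratic pairwise loop is chosen for clarity; a rank-based or library implementation is preferable for large evaluation sets. On this scale, a result such as 0.521 is close to chance, while 0.951 means correct answers are almost always scored as more confident than incorrect ones.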