Uncertainty Quantification on MATH500
[Chart: ROC-AUC (Threshold 128). Highlighted point: LLaMA perplexity, 65.2, Mar 3, 2026. Available metrics: ROC-AUC (Threshold 128), ROC-AUC (Threshold 256), ROC-AUC (Threshold 512).]
Updated 1mo ago
Evaluation Results
| Method | Configuration | Date | ROC-AUC (Threshold 128) | ROC-AUC (Threshold 256) | ROC-AUC (Threshold 512) |
|---|---|---|---|---|---|
| LLaMA perplexity | Backbone=LLaDA-1.5-8B,... | 2026.03 | 65.2 | 58.8 | 55 |
| DiSE | Backbone=LLaDA-Instruc... | 2026.03 | 61.1 | 63.4 | 60.4 |
| DiSE | Backbone=LLaDA-1.5-8B | 2026.03 | 60.6 | 55.3 | 53.3 |
| MC | Backbone=LLaDA-1.5-8B,... | 2026.03 | 58 | 54.6 | 55.1 |
| LLaMA perplexity | Backbone=LLaDA-Instruc... | 2026.03 | 57.5 | 63.7 | 55.1 |
| MC | Backbone=LLaDA-1.5-8B,... | 2026.03 | 55.8 | 51.4 | 52.5 |
| MC | Backbone=LLaDA-Instruc... | 2026.03 | 52.8 | 57.8 | 53.1 |
| MC | Backbone=LLaDA-Instruc... | 2026.03 | 49.7 | 54.1 | 53.2 |
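
For readers unfamiliar with the metric, the following is a minimal sketch of how a ROC-AUC score for an uncertainty estimate could be computed. It is not the evaluation code behind this leaderboard. It assumes that each generated answer has an uncertainty score and a binary label marking whether the answer was wrong, and that results are reported in percent; the leaderboard's actual labeling convention and the meaning of the thresholds are not stated on this page. The function name `roc_auc` and the toy data are hypothetical.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC-AUC as the probability that a positive outranks a negative.

    Ties between a positive and a negative count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data (hypothetical, not taken from the leaderboard).
# Assumption: label 1 means the answer was wrong, and the score is an
# uncertainty estimate, so higher scores should accompany label 1.
uncertainty = [0.9, 0.8, 0.35, 0.3, 0.2, 0.1]
is_wrong = [1, 1, 0, 1, 0, 0]

auc_percent = 100.0 * roc_auc(uncertainty, is_wrong)
print(f"ROC-AUC: {auc_percent:.1f}")
```

Under this reading, a score of 50 corresponds to an uncertainty estimate that ranks wrong and correct answers no better than chance, and 100 to a perfect ranking.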