Uncertainty Quantification on Countdown
[Chart: top ROC-AUC (128) is 0.61 (DiSE), Mar 3, 2026. Metrics: ROC-AUC (128), ROC-AUC (256), ROC-AUC (512).]
Updated 1mo ago
Evaluation Results
Method            Setup                      Date     ROC-AUC (128)  ROC-AUC (256)  ROC-AUC (512)
DiSE              Backbone=LLaDA-1.5-8B      2026.03  0.61           0.471          0.586
MC                Backbone=LLaDA-1.5-8B,...  2026.03  0.608          0.557          0.52
LLaMA perplexity  Backbone=LLaDA-1.5-8B,...  2026.03  0.596          0.459          0.362
MC                Backbone=LLaDA-Instruc...  2026.03  0.595          0.534          0.558
DiSE              Backbone=LLaDA-Instruc...  2026.03  0.578          0.521          0.622
LLaMA perplexity  Backbone=LLaDA-Instruc...  2026.03  0.574          0.419          0.392
MC                Backbone=LLaDA-1.5-8B,...  2026.03  0.525          0.588          0.528
MC                Backbone=LLaDA-Instruc...  2026.03  0.524          0.52           0.528