Prompt Selection on Perturbation-100 (test)
[Chart: Top-1 Hit by method. Highlighted point: Risk-Cos, Top-1 Hit 76, Mar 20, 2026. Selectable metrics: Top-1 Hit, Mean Regret, AUROC Score, 95% CI Lower Bound, Delta vs 0.5 (%), Spearman ρ.]
Updated 27d ago
Evaluation Results
| Method | Mode | Date | Top-1 Hit | Mean Regret | AUROC Score | 95% CI Lower Bound | Delta vs 0.5 (%) | Spearman ρ |
|---|---|---|---|---|---|---|---|---|
| Risk-Cos | Prefill-only | 2026.03 | 76 | 18.8 | 0.604 | 0.55 | 20.8 | 0.17 |
| Risk-Margin | Prefill-only | 2026.03 | 76 | 32.5 | 0.481 | 0.43 | -3.8 | -0.02 |
| Risk-Entropy | Prefill-only | 2026.03 | 57 | 65.7 | 0.381 | 0.33 | -23.9 | -0.18 |
| Risk-Loss (NLL) | Prefill-only | 2026.03 | 52 | 87 | 0.359 | 0.31 | -28.2 | -0.23 |
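The page does not define the Delta vs 0.5 (%) column. The listed values are consistent with a relative difference between the AUROC Score and the chance level of 0.5, expressed in percent. The sketch below checks that reading against the four rows. The function name `delta_vs_chance` and the formula itself are assumptions inferred from the numbers, not something the benchmark documents.

```python
def delta_vs_chance(auroc: float, chance: float = 0.5) -> float:
    """Relative difference of an AUROC from chance, in percent.

    Assumed formula (not confirmed by the source): (auroc - chance) / chance * 100.
    """
    return (auroc - chance) / chance * 100.0


# (AUROC Score, listed Delta vs 0.5 (%)) copied from the results above.
reported = {
    "Risk-Cos": (0.604, 20.8),
    "Risk-Margin": (0.481, -3.8),
    "Risk-Entropy": (0.381, -23.9),
    "Risk-Loss (NLL)": (0.359, -28.2),
}

for method, (auroc, listed) in reported.items():
    recomputed = delta_vs_chance(auroc)
    print(f"{method}: listed {listed:+.1f}, recomputed {recomputed:+.1f}")
```

Under this assumption the recomputed values agree with the listed ones to within 0.1 percentage points. The small gap for Risk-Entropy may come from the displayed AUROC being rounded to three decimals, though the page does not say so.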