Honesty Alignment on OOD Aggregate Average
Best result: 0.8447 AUROC (EliCal)
[Chart: AUROC over time; data point labeled EliCal, Oct 20, 2025]
Updated 3mo ago
Evaluation Results
Method   Configuration                Date      AUROC
EliCal   Model=Qwen-7B, Regime=...    2025.10   0.8447
EliCal   Model=Qwen-14B, Regime...    2025.10   0.8279
EliCal   Model=Llama-8B, Regime...    2025.10   0.7952