Selective Prediction on NCQA (test)
[Chart: PRR (y-axis) by date. Highlighted point: Llama-3.1-8B, 99.1 PRR, Jun 1, 2026.]
Evaluation Results
| Method | Configuration | Date | PRR |
|---|---|---|---|
| Llama-3.1-8B | Features=all UQ, Strat... | 2026.06 | 99.1 |
| Llama-3.1-8B | Features=all UQ, Strat... | 2026.06 | 99.1 |
| Spectrum-Qwen3-14B | Features=all UQ, Strat... | 2026.06 | 69.5 |
| A^2Search-7B | Features=SAR, Strategy... | 2026.06 | 68.3 |
| A^2Search-7B | Features=SAR, Strategy... | 2026.06 | 67 |
| Qwen3-14B | Features=MSP, Strategy... | 2026.06 | 64.5 |
| A^2Search-7B | Features=MSP, Strategy... | 2026.06 | 63.8 |
| A^2Search-7B | Features=Semantic ener... | 2026.06 | 63 |
| Qwen3-14B | Features=CoCoA, Strate... | 2026.06 | 60.6 |
| Spectrum-Qwen3-14B | Features=all UQ, Strat... | 2026.06 | 59.9 |
| Qwen3-14B | Features=all UQ, Strat... | 2026.06 | 57.7 |
| Qwen3-14B | Features=all UQ, Strat... | 2026.06 | 53.4 |
| Llama-3.1-8B | Features=SAR, Strategy... | 2026.06 | 50.7 |
| Llama-3.1-8B | Features=MI, Strategy=... | 2026.06 | 50.2 |
| Qwen3-14B | Features=all UQ, Strat... | 2026.06 | 49.9 |
| Llama-3.1-8B | Features=all UQ, Strat... | 2026.06 | 44.2 |
| Spectrum-Llama-3.1-8B | Features=MI, Strategy=... | 2026.06 | 43.3 |
| Llama-3.1-8B | Features=all UQ, Strat... | 2026.06 | 36.9 |
| Llama-3.1-8B | Features=all UQ, Strat... | 2026.06 | 33 |