Selective Generation on MMLU Out-of-domain
[Chart: PRR (Accuracy). Top result: HUQ-SATRMD, 0.77 (Feb 20, 2025).]
Evaluation Results
Method                         Configuration               Date      PRR (Accuracy)
HUQ-SATRMD                     Model=Llama 8b v3.1, E...   2025.02    0.77
SATRMD+MSP                     Model=Llama 8b v3.1, E...   2025.02    0.681
Maximum Sequence Probability   Model=Llama 8b v3.1, E...   2025.02    0.405
SAR                            Model=Llama 8b v3.1, E...   2025.02    0.284
DegMat NLI Score Entail.       Model=Llama 8b v3.1, E...   2025.02    0.224
Semantic Entropy               Model=Llama 8b v3.1, E...   2025.02    0.22
Factoscope                     Model=Llama 8b v3.1, E...   2025.02    0.071
SAPLMA                         Model=Llama 8b v3.1, E...   2025.02   -0.097