Selective Generation on GSM8k (PRR)
Chart: PRR (Accuracy) by date. Top result: SATRMD+MSP, 64.2 (Feb 20, 2025).
Evaluation Results
Method                              Model           Date      PRR (Accuracy)
SATRMD+MSP                          Llama 8b v3.1   2025.02   64.2
SAPLMA                              Llama 8b v3.1   2025.02   59.8
HUQ-SATRMD                          Llama 8b v3.1   2025.02   59.2
Lexical Similarity ROUGE-L          Llama 8b v3.1   2025.02   46.7
SAR                                 Llama 8b v3.1   2025.02   45.5
EigenScore                          Llama 8b v3.1   2025.02   43
Semantic Entropy                    Llama 8b v3.1   2025.02   42.4
Eccentricity NLI Score Entail.      Llama 8b v3.1   2025.02   40.3
Maximum Sequence Probability        Llama 8b v3.1   2025.02   38
DegMat NLI Score Entail.            Llama 8b v3.1   2025.02   35.7
EigValLaplacian NLI Score Entail.   Llama 8b v3.1   2025.02   33.5
Perplexity                          Llama 8b v3.1   2025.02   25.9
SentenceSAR                         Llama 8b v3.1   2025.02   15.1
Factoscope                          Llama 8b v3.1   2025.02   4.8