Selective Generation on TruthfulQA (PRR AlignScore)
[Chart: PRR (AlignScore) by date. Top entry: SATRMD+MSP, 35.3, Feb 20, 2025.]
Evaluation Results
| Method | Model | Date | PRR (AlignScore) |
|---|---|---|---|
| SATRMD+MSP | Llama 8b v3.1 | 2025.02 | 35.3 |
| HUQ-SATRMD | Llama 8b v3.1 | 2025.02 | 30.8 |
| Maximum Sequence Probability | Llama 8b v3.1 | 2025.02 | 27.7 |
| SentenceSAR | Llama 8b v3.1 | 2025.02 | 18.5 |
| Perplexity | Llama 8b v3.1 | 2025.02 | 17.8 |
| Semantic Entropy | Llama 8b v3.1 | 2025.02 | 17.1 |
| DegMat NLI Score Entail. | Llama 8b v3.1 | 2025.02 | 15.6 |
| EigValLaplacian NLI Score Entail. | Llama 8b v3.1 | 2025.02 | 15.2 |
| Eccentricity NLI Score Entail. | Llama 8b v3.1 | 2025.02 | 12.2 |
| SAPLMA | Llama 8b v3.1 | 2025.02 | 11.2 |
| SAR | Llama 8b v3.1 | 2025.02 | 10.5 |
| EigenScore | Llama 8b v3.1 | 2025.02 | 2.3 |
| Factoscope | Llama 8b v3.1 | 2025.02 | 1.7 |
| Lexical Similarity ROUGE-L | Llama 8b v3.1 | 2025.02 | 0.8 |
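The page does not define the metric, so the following is only a rough sketch of how a prediction rejection ratio (PRR) is commonly computed: answers are rejected in order of decreasing uncertainty, and the area under the resulting quality curve is compared against random rejection (0) and oracle rejection (1). The function names `rejection_curve_area` and `prr` are hypothetical, the quality scores stand in for AlignScore values, and the benchmark's exact protocol (for example a maximum rejection rate, or scaling to a 0-100 range as the reported numbers suggest) is an assumption not confirmed here.

```python
def rejection_curve_area(order, quality):
    """Average, over k = 1..n, of the mean quality of the first k answers in `order`."""
    kept = [quality[i] for i in order]  # answers in the order they would be retained
    total, running = 0.0, 0.0
    for k, q in enumerate(kept, start=1):
        running += q
        total += running / k  # mean quality when only the first k answers are kept
    return total / len(kept)


def prr(uncertainty, quality):
    """Prediction rejection ratio: 1.0 matches the oracle, 0.0 matches random rejection."""
    n = len(quality)
    # Retain the least uncertain answers first.
    by_uncertainty = sorted(range(n), key=lambda i: uncertainty[i])
    # Oracle: retain the highest-quality answers first.
    by_oracle = sorted(range(n), key=lambda i: -quality[i])

    area_unc = rejection_curve_area(by_uncertainty, quality)
    area_oracle = rejection_curve_area(by_oracle, quality)
    # Under random rejection the expected mean quality of any subset is the overall mean.
    area_random = sum(quality) / n

    denom = area_oracle - area_random
    return (area_unc - area_random) / denom if denom > 0 else 0.0


# Made-up illustrative data, not taken from the benchmark.
quality = [0.9, 0.2, 0.7, 0.4]
well_ranked = [0.1, 0.8, 0.3, 0.6]   # low uncertainty where quality is high
badly_ranked = [0.9, 0.2, 0.7, 0.4]  # high uncertainty where quality is high
score_good = prr(well_ranked, quality)
score_bad = prr(badly_ranked, quality)
```

In this sketch an uncertainty score that ranks answers exactly like the oracle gets a PRR of 1, and one that ranks them in reverse gets a negative PRR, which is consistent with the chart axis extending below zero.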