Selective Generation on SciQ Out-of-domain
[Chart: PRR (AlignScore) by date (axis label: Feb 20, 2025). Highlighted result: HUQ-SATRMD, 64.4.]
Updated 1mo ago
Evaluation Results
Method                        Settings                     Date      PRR (AlignScore)
HUQ-SATRMD                    Model=Llama 8b v3.1, E...    2025.02   64.4
SATRMD+MSP                    Model=Llama 8b v3.1, E...    2025.02   59.8
Maximum Sequence Probability  Model=Llama 8b v3.1, E...    2025.02   58.2
Semantic Entropy              Model=Llama 8b v3.1, E...    2025.02   46.6
DegMat NLI Score Entail.      Model=Llama 8b v3.1, E...    2025.02   44.6
SAR                           Model=Llama 8b v3.1, E...    2025.02   44
Factoscope                    Model=Llama 8b v3.1, E...    2025.02   42
SAPLMA                        Model=Llama 8b v3.1, E...    2025.02   9.1