Statistical Comparison on TruthfulQA (Mean Improvement, P-Value)

0.33 Mean Improvement

PASf

Updated 2mo ago

Evaluation Results

Method	Links
PASf 2025.09		0.33	0.27	0
PASf 2025.09		0.27	0.24	0
iPASwo 2025.09		0.23	0.16	0
PASf 2025.09		0.21	0.17	0
iPASwo 2025.09		0.17	0.13	0
iPASwo 2025.09		0.16	0.14	0
iPASa 2025.09		0.15	0.07	0
iPASa 2025.09		0.13	0.11	0
iPASa 2025.09		0.08	0.04	0