Statistical Comparison on TruthfulQA (Mean Improvement, P-Value)

[Chart: Mean Improvement over time. Highlighted value: 0.33, labeled PASf, Sep 25, 2025.]
Updated 16d ago

Evaluation Results

Date      Mean Improvement   P-Value
2025.09   0.33               0.270
2025.09   0.27               0.240
2025.09   0.23               0.160
2025.09   0.21               0.170
2025.09   0.17               0.130
2025.09   0.16               0.140
2025.09   0.15               0.070
2025.09   0.13               0.110
2025.09   0.08               0.040