Truthfulness Evaluation on TruthfulQA (avg.@8)

68.69 Average Score (@8)

Llama3.1-8B-Instruct

Aug 25, 2025
Updated 4d ago

Evaluation Results

Date       Score    Method    Links
2025.08    68.69
2025.08    67.16
2025.08    67.08
2025.08    65.24
2025.08    64.84
2025.08    64.82
2025.08    63.14
2025.08    61.93