Truthfulness Evaluation on TruthfulQA (avg.@8)

Average Score (@8): 68.69

Llama3.1-8B-Instruct

Updated 3mo ago

Evaluation Results

Method	Links	Average Score (@8)
Llama3.1-8B-Instruct 2025.08		68.69
Qwen2.5-7B-Instruct 2025.08		67.16
Llama3.1-8B-Instruct 2025.08		67.08
Llama3.1-8B-Instruct 2025.08		65.24
Qwen2.5-7B-Instruct 2025.08		64.84
Llama3.1-8B-Instruct 2025.08		64.82
Qwen2.5-7B-Instruct 2025.08		63.14
Qwen2.5-7B-Instruct 2025.08		61.93