Truthfulness Evaluation on TruthfulQA (avg.@8)
Chart: Average Score (@8) over time. Best result: 68.69 (Llama3.1-8B-Instruct), Aug 25, 2025.
Evaluation Results

Model                  Training Pipeline   Date      Average Score (@8)
Llama3.1-8B-Instruct   PSFT                2025.08   68.69
Qwen2.5-7B-Instruct    PSFT                2025.08   67.16
Llama3.1-8B-Instruct   SFT                 2025.08   67.08
Llama3.1-8B-Instruct   PSFT...             2025.08   65.24
Qwen2.5-7B-Instruct    PSFT...             2025.08   64.84
Llama3.1-8B-Instruct   SFT...              2025.08   64.82
Qwen2.5-7B-Instruct    SFT                 2025.08   63.14
Qwen2.5-7B-Instruct    SFT...              2025.08   61.93
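This page reports Average Score (@8) without defining it. A common reading of avg@k, which is assumed here and not confirmed by the page, is: sample k = 8 responses per question, score each response (assumed to be a value in [0, 1], e.g. 1 if judged truthful and 0 otherwise), average the k scores for each question, then average over all questions and scale to 0-100. The sketch below implements that assumed definition; `avg_at_k` is a hypothetical helper, not part of any evaluation harness.

```python
def avg_at_k(scores_per_question: list[list[float]], k: int = 8) -> float:
    """Assumed avg@k: mean of k per-sample scores per question,
    averaged over questions and scaled to a 0-100 range."""
    if not scores_per_question:
        raise ValueError("need at least one question")
    per_question_means = []
    for scores in scores_per_question:
        # Each question must have exactly k sampled responses scored.
        if len(scores) != k:
            raise ValueError(f"expected {k} samples per question, got {len(scores)}")
        per_question_means.append(sum(scores) / k)
    # Average across questions, then express as a percentage.
    return 100.0 * sum(per_question_means) / len(per_question_means)
```

For example, with two questions where 6 of 8 and 4 of 8 samples are judged truthful, the per-question means are 0.75 and 0.5, so `avg_at_k([[1]*6 + [0]*2, [1]*4 + [0]*4])` returns 62.5. Averaging over 8 samples rather than scoring one greedy response reduces the variance introduced by sampling.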