Factuality Evaluation on TruthfulQA (0-shot)
[Chart: Factuality Score (0-shot). Highlighted entry: AquilaChat2, 64.3 (Mar 7, 2024).]
Evaluation Results
Method                 Size   Date      Factuality Score (0-shot)
AquilaChat2            34B    2024.03   64.3
Yi-Chat                34B    2024.03   62.4
Yi-Chat-8bits(GPTQ)    34B    2024.03   61.8
Yi-Chat-4bits(AWQ)     34B    2024.03   61.8
LLaMA2-Chat            70B    2024.03   54
Qwen-Chat              14B    2024.03   52.5
InternLM-Chat          20B    2024.03   51.8
Yi-Chat                6B     2024.03   50.6
Yi-Chat-4bits(AWQ)     6B     2024.03   50.3
Yi-Chat-8bits(GPTQ)    6B     2024.03   49.9
Baichuan2-Chat         13B    2024.03   49
LLaMA2-Chat            13B    2024.03   36.8
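
The results above include quantized Yi-Chat variants alongside the unquantized models. One way to read off how much each quantized variant differs from its base is a small script. The sketch below is illustrative only: the names SCORES, quantization_deltas, and ranking are hypothetical helpers, not part of this page or any benchmark tooling, and the scores are copied by hand from the results listed above.

```python
# Scores copied from the results above: (method, size) -> Factuality Score (0-shot).
SCORES = {
    ("AquilaChat2", "34B"): 64.3,
    ("Yi-Chat", "34B"): 62.4,
    ("Yi-Chat-8bits(GPTQ)", "34B"): 61.8,
    ("Yi-Chat-4bits(AWQ)", "34B"): 61.8,
    ("LLaMA2-Chat", "70B"): 54.0,
    ("Qwen-Chat", "14B"): 52.5,
    ("InternLM-Chat", "20B"): 51.8,
    ("Yi-Chat", "6B"): 50.6,
    ("Yi-Chat-4bits(AWQ)", "6B"): 50.3,
    ("Yi-Chat-8bits(GPTQ)", "6B"): 49.9,
    ("Baichuan2-Chat", "13B"): 49.0,
    ("LLaMA2-Chat", "13B"): 36.8,
}


def quantization_deltas(scores, base="Yi-Chat"):
    """Return {(variant, size): score minus the same-size unquantized base score}."""
    deltas = {}
    for (method, size), score in scores.items():
        # A variant is assumed to be named "<base>-<suffix>", e.g. "Yi-Chat-4bits(AWQ)".
        if method.startswith(base + "-"):
            deltas[(method, size)] = round(score - scores[(base, size)], 1)
    return deltas


def ranking(scores):
    """Return the (method, size) keys sorted from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    for (method, size), delta in quantization_deltas(SCORES).items():
        print(f"{method} ({size}): {delta:+.1f} vs. Yi-Chat ({size})")
```

With these numbers, every quantized Yi-Chat variant lands within one point of its unquantized counterpart. The page does not report variance or confidence intervals, so differences this small should be read with caution.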