Truthful and Informative Generation on TruthfulQA (test)
Top result: Llama3-8B-chat + E/R + Coarse, 67.14 True*Info (%), Oct 24, 2024
Updated 4d ago
Evaluation Results

Method                          Details                    Date     True*Info (%)
------------------------------  -------------------------  -------  -------------
Llama3-8B-chat + E/R + Coarse   Backbone=Llama3-8B-cha...  2024.10  67.14
Llama3-8B-chat + FactTune-FS    Backbone=Llama3-8B-cha...  2024.10  64.58
Llama3-8B-chat + EVER-Pref      Backbone=Llama3-8B-cha...  2024.10  63.01
Llama3-8B-chat + Self-Eval-SKT  Backbone=Llama3-8B-cha...  2024.10  61.88
Llama3-8B-chat + SFT            Backbone=Llama3-8B-cha...  2024.10  59.17
Llama3-8B-chat                  Backbone=Llama3-8B-chat    2024.10  58.89
Llama2-7B-chat + E/R + Coarse   Backbone=Llama2-7B-cha...  2024.10  56.47
Llama2-7B-chat + FactTune-FS    Backbone=Llama2-7B-cha...  2024.10  52.48
Llama2-7B-chat + EVER-Pref      Backbone=Llama2-7B-cha...  2024.10  51.07
Llama2-7B-chat + Self-Eval-SKT  Backbone=Llama2-7B-cha...  2024.10  48.65
Llama2-7B-chat + SFT            Backbone=Llama2-7B-cha...  2024.10  45.52
Llama2-7B-chat                  Backbone=Llama2-7B-chat    2024.10  38.83