Open-ended generation on TruthfulQA double info 1.0 (test)
[Chart: True Score and True Score (with Info) over time. Top result: TACS-T, True Score 68.4, Mar 12, 2024]
Updated 4d ago
Evaluation Results
| Method | Configuration | Date | True Score | True Score (with Info) |
|---|---|---|---|---|
| TACS-T | Backbone=Mistral-Instr... | 2024.03 | 68.4 | 61.1 |
| TACS-S | Backbone=Mistral-Instr... | 2024.03 | 64.5 | 58.9 |
| Mistral-Instruct-v0.2 | Backbone=Mistral-Instr... | 2024.03 | 62.1 | 57 |
| TACS-T | Backbone=Llama 2-Chat,... | 2024.03 | 58.4 | 54.2 |
| TACS-S | Backbone=Llama 2-Chat,... | 2024.03 | 58.4 | 53.5 |
| Llama 2-Chat | Backbone=Llama 2-Chat,... | 2024.03 | 55.4 | 52.5 |
| ITI | Backbone=Llama 2-Chat,... | 2024.03 | 52.9 | 50.2 |