Factuality Evaluation on TruthfulQA latest (test)
Best result: SkillAggregation-X, 84.57 Accuracy (chart of Accuracy over time; plotted date Oct 14, 2024)
Evaluation Results
| Method | Method details | Date | Accuracy |
| --- | --- | --- | --- |
| SkillAggregation-X | Judge Model Size=~70B,... | 2024.10 | 84.57 |
| SkillAggregation | Judge Model Size=~70B,... | 2024.10 | 84.45 |
| DawidSkene | Judge Model Size=~70B | 2024.10 | 84.08 |
| SkillAggregation w/o. Reg. | Judge Model Size=~70B,... | 2024.10 | 84.04 |
| Crowdlayer | Judge Model Size=~70B,... | 2024.10 | 83.87 |
| Average Probability | Judge Model Size=~70B | 2024.10 | 83.85 |
| Majority Voting | Judge Model Size=~70B | 2024.10 | 83.63 |
| Train on Majority Voting | Judge Model Size=~70B,... | 2024.10 | 82.41 |
| SkillAggregation-X | Judge Model Size=7B/8B... | 2024.10 | 68.77 |
| SkillAggregation | Judge Model Size=7B/8B... | 2024.10 | 68.74 |
| SkillAggregation w/o. Reg. | Judge Model Size=7B/8B... | 2024.10 | 68.07 |
| Average Probability | Judge Model Size=7B/8B | 2024.10 | 68.06 |
| DawidSkene | Judge Model Size=7B/8B | 2024.10 | 67.84 |
| Crowdlayer | Judge Model Size=7B/8B... | 2024.10 | 67.74 |
| Majority Voting | Judge Model Size=7B/8B | 2024.10 | 67.47 |
| Train on Majority Voting | Judge Model Size=7B/8B... | 2024.10 | 67.32 |