Truthfulness Evaluation on TruthfulQA (Metric: TruthfulQA)
Chart: TruthfulQA Score over time (Feb 17, 2026 to Apr 3, 2026); top result: Council Mode, 82.6.
Updated 12d ago
Evaluation Results
| Method | Setting | Date | TruthfulQA Score |
|---|---|---|---|
| Council Mode | Latency (s) = 8.4 | 2026.04 | 82.6 |
| Claude Opus 4.6 | Latency (s) = 4.1 | 2026.04 | 74.8 |
| GPT-5.4 | Latency (s) = 3.2 | 2026.04 | 71.2 |
| Gemini 3.1 Pro | Latency (s) = 2.8 | 2026.04 | 68.5 |
| DeepSeek V3.2 | Latency (s) = 5.6 | 2026.04 | 65.3 |
| Seed 2.0 Pro | Latency (s) = 3.8 | 2026.04 | 63.1 |
| Adam-NSCL | Data = ZsRE 10K | 2026.02 | 54.0 |
| AlphaEdit | Data = ZsRE 10K | 2026.02 | 53.9 |
| LLaMA-3-8B-Instruct | Data = ZsRE 10K | 2026.02 | 50.7 |
| LocBF-FT | Data = ZsRE 10K | 2026.02 | 50.7 |
| CrispEdit | Data = ZsRE 10K | 2026.02 | 50.2 |
| UltraEdit | Data = ZsRE 10K | 2026.02 | 49.8 |