Truthfulness on TruthfulQA
[Chart: MA. by date. PROBELLM: 81 (Feb 13, 2026). Selectable metrics: MA., MI., Error Rate.]
Updated 1mo ago
Evaluation Results
| Method | Target Model | Date | MA. | MI. | Error Rate |
|---|---|---|---|---|---|
| PROBELLM | olmo-3-7b... | 2026.02 | 81 | 19 | 64 |
| PROBELLM | Claude-3.... | 2026.02 | 76 | 24 | 54 |
| PROBELLM | devstral | 2026.02 | 66 | 34 | 47 |
| PROBELLM | Llama-3.1... | 2026.02 | 62 | 38 | 78 |
| PROBELLM | granite-4.0 | 2026.02 | 58 | 42 | 70 |
| PROBELLM | Deepseek-... | 2026.02 | 57 | 43 | 43 |
| PROBELLM | Gemini-2.... | 2026.02 | 56 | 44 | 38 |
| PROBELLM | phi-4 | 2026.02 | 53 | 47 | 69 |
| PROBELLM | GPT-4o-mini | 2026.02 | 47 | 53 | 56 |
| PROBELLM | ministral... | 2026.02 | 47 | 53 | 64 |
| PROBELLM | Grok-4.1-... | 2026.02 | 45 | 55 | 39 |
| PROBELLM | GPT-oss-20b | 2026.02 | 45 | 55 | 40 |
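In every row listed above, the MA. and MI. values add up to 100, while Error Rate varies independently of both. The sketch below transcribes the twelve rows and checks that arithmetic, then orders the rows by Error Rate. The page does not define MA., MI., or Error Rate, so the code treats them as opaque numbers; the names `ROWS`, `ma_mi_sums`, and `rank_by_error_rate` are hypothetical helpers introduced only for this illustration.

```python
# Rows transcribed from the results listed above:
# (target model as displayed, MA., MI., Error Rate).
# Truncated model names are kept exactly as the page shows them.
ROWS = [
    ("olmo-3-7b...", 81, 19, 64),
    ("Claude-3....", 76, 24, 54),
    ("devstral", 66, 34, 47),
    ("Llama-3.1...", 62, 38, 78),
    ("granite-4.0", 58, 42, 70),
    ("Deepseek-...", 57, 43, 43),
    ("Gemini-2....", 56, 44, 38),
    ("phi-4", 53, 47, 69),
    ("GPT-4o-mini", 47, 53, 56),
    ("ministral...", 47, 53, 64),
    ("Grok-4.1-...", 45, 55, 39),
    ("GPT-oss-20b", 45, 55, 40),
]


def ma_mi_sums(rows):
    """Return MA. + MI. for each row (hypothetical helper)."""
    return [ma + mi for _, ma, mi, _ in rows]


def rank_by_error_rate(rows):
    """Return the rows sorted by Error Rate, ascending (hypothetical helper)."""
    return sorted(rows, key=lambda row: row[3])


if __name__ == "__main__":
    # Check that MA. and MI. sum to 100 in every transcribed row.
    print(all(total == 100 for total in ma_mi_sums(ROWS)))
    # List target models from lowest to highest Error Rate.
    for model, ma, mi, err in rank_by_error_rate(ROWS):
        print(f"{model:<14} MA.={ma:>3} MI.={mi:>3} Error Rate={err:>3}")
```

Sorting is ascending only as a convention for display; the page itself does not say whether a lower or higher Error Rate is preferable.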