
Accuracy Evaluation on TruthfulQA

83.41 Accuracy

Sparse MAD

[Chart: accuracy over time, Oct 13, 2025 to Apr 7, 2026]
Updated 4d ago

Evaluation Results

Date     Accuracy
2026.03  83.41
2026.03  81
2026.03  78.03
2026.03  74.89
2026.03  71.75
2026.03  71.75
2026.03  62.78
2026.03  60.49
2026.03  59
2026.03  57
2026.01  55.67
2026.01  55.28
2026.01  53.11
2026.04  49.57
2026.04  49.36
2026.04  46.49
2026.03  40.81
2026.03  40.81
2026.03  39.91
2026.03  39.51
2026.03  37.67
2026.03  33.63
2026.03  32.74
2026.02  31.95
2026.03  31.88
2026.02  30.23
2026.02  29.74
2025.10  29.5
2026.02  29.13
2025.10  28.8
2026.02  28.52
2025.10  28.4
2026.02  27.54
2026.02  27.42
2025.10  27.1
2026.02  26.68
2026.02  26.56
2026.02  26.56
2025.10  26.2
2025.10  25.7
2026.02  25.58
2025.10  25.1
2025.10  24.9
2025.10  23.5
2026.02  23.38
2026.02  23.13
2025.10  22.8
2026.02  22.4
2026.02  21.3
2025.10  18.5
2026.03  17.94
2025.10  17.5
2025.10  17.3
2025.10  16.9
2025.10  16.7
2025.10  16.3
2025.10  15.4
2025.10  14.8
2025.10  14.4
2025.10  13.6