
Question Answering on TruthfulQA (MC1, MC2, MC3, Average Metrics)

86.64 Accuracy

ROAST

[Chart: Accuracy over time, Dec 7, 2025 to Feb 15, 2026]
Updated 4d ago

Evaluation Results

Date     Accuracy  MC1   MC2   MC3   Average
2026.02  86.64     -     -     -     -
2026.02  85.39     -     -     -     -
2026.02  82.32     -     -     -     -
2026.02  81.18     -     -     -     -
2026.02  80.6      -     -     -     -
2026.02  80.31     -     -     -     -
2026.02  78.66     -     -     -     -
2026.02  77        -     -     -     -
2026.02  76.1      -     -     -     -
2026.02  74.8      -     -     -     -
2026.02  73.9      -     -     -     -
2026.02  72.5      -     -     -     -
2026.02  70.9      -     -     -     -
2026.02  69.8      -     -     -     -
2026.02  69.51     -     -     -     -
2025.12  69.3      -     -     -     -
2025.12  69        -     -     -     -
2025.12  68.8      -     -     -     -
2026.01  68.26     -     -     -     -
2026.01  68.26     -     -     -     -
2026.02  67.9      -     -     -     -
2025.12  67        -     -     -     -
2026.02  66.9      -     -     -     -
2025.12  66.82     -     -     -     -
2026.02  66.8      -     -     -     -
2026.02  66.7      -     -     -     -
2025.12  66.5      -     -     -     -
2025.12  66.3      -     -     -     -
2026.02  65.9      -     -     -     -
2026.02  65.8      -     -     -     -
2026.02  65.8      -     -     -     -
2025.12  65.4      -     -     -     -
2026.02  65.3      -     -     -     -
2025.12  65.2      -     -     -     -
2026.02  64.8      -     -     -     -
2026.02  64.7      -     -     -     -
2026.02  64.5      -     -     -     -
2026.02  64.5      -     -     -     -
2026.02  64.2      -     -     -     -
2026.02  63.8      -     -     -     -
2025.12  63.3      -     -     -     -
2026.02  62.9      -     -     -     -
2026.02  62.4      -     -     -     -
2026.02  62.3      -     -     -     -
2026.02  61.9      -     -     -     -
2026.02  61.7      -     -     -     -
2026.02  60.7      -     -     -     -
2025.12  60.3      -     -     -     -
2026.02  60.1      -     -     -     -
2026.02  59.8      -     -     -     -
2026.02  58.9      -     -     -     -
2026.02  57.4      -     -     -     -
2025.12  57.15     -     -     -     -
2025.12  57.08     -     -     -     -
2026.02  56.9      -     -     -     -
2025.12  54.41     -     -     -     -
2025.12  54.2      -     -     -     -
2025.12  54.03     -     -     -     -
2026.02  53.83     -     -     -     -
2025.12  53.59     -     -     -     -
2025.12  52.78     -     -     -     -
2026.02  49.22     -     -     -     -
2026.02  48.78     -     -     -     -
2026.01  44.9      -     -     -     -
2026.01  44.9      -     -     -     -
2026.01  44.47     -     -     -     -
2026.01  44.47     -     -     -     -
2025.12  41.69     -     -     -     -
2026.01  40.95     -     -     -     -
2026.01  40.95     -     -     -     -
2025.12  40.14     -     -     -     -
2026.01  38.9      -     -     -     -
2026.01  38.9      -     -     -     -
2026.01  -         28.7  43.3  20.8  30.9
2026.01  -         37.6  54.6  28.1  40.1
2026.01  -         37.7  59    29.8  42.2
2026.01  -         37    54.7  27.8  39.8
2026.01  -         33    60.8  29.5  41.1
2026.01  -         28.2  54.9  29.8  37.6
2026.01  -         33.7  60    33    42.2
2026.01  -         40.5  69.7  41.3  50.5
2026.01  -         41.9  72.6  45    53.2
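
The last nine rows report four values each instead of a single Accuracy score. Reading them as MC1, MC2, MC3 and Average, in that order (an assumption based on the page title, not something the page states), the Average value looks like the arithmetic mean of the three MC scores rounded to one decimal. A minimal sketch to check that reading; the names `ROWS`, `mc_mean` and `mismatches` are hypothetical helpers, not part of any leaderboard tooling:

```python
# Rows as read from the table: (MC1, MC2, MC3, Average).
# The column order is an assumption inferred from the page title.
ROWS = [
    (28.7, 43.3, 20.8, 30.9),
    (37.6, 54.6, 28.1, 40.1),
    (37.7, 59.0, 29.8, 42.2),
    (37.0, 54.7, 27.8, 39.8),
    (33.0, 60.8, 29.5, 41.1),
    (28.2, 54.9, 29.8, 37.6),
    (33.7, 60.0, 33.0, 42.2),
    (40.5, 69.7, 41.3, 50.5),
    (41.9, 72.6, 45.0, 53.2),
]


def mc_mean(mc1, mc2, mc3):
    """Arithmetic mean of the three multiple-choice scores, to one decimal."""
    return round((mc1 + mc2 + mc3) / 3, 1)


def mismatches(rows, tol=0.05):
    """Return the rows whose listed Average differs from the recomputed mean by more than tol."""
    return [r for r in rows if abs(mc_mean(*r[:3]) - r[3]) > tol]


if __name__ == "__main__":
    # An empty list means every listed Average agrees with the recomputed mean.
    print(mismatches(ROWS))
```

For the nine rows above the check comes back empty, which is consistent with the assumed column order but does not prove how the leaderboard computes the value.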