
Question Answering on TruthfulQA

86.6 Accuracy

LLaMA-3.1-8B

[Chart: accuracy over time, May 24, 2023 to Feb 23, 2026]
Updated 3d ago

Evaluation Results

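Date | Accuracy | (unlabeled) | (unlabeled) | (unlabeled)
2024.11 | 86.6 | - | - | -
2024.11 | 85.4 | - | - | -
2024.11 | 84.8 | - | - | -
2024.11 | 82.5 | - | - | -
2024.11 | 81.2 | - | - | -
2024.11 | 79.8 | - | - | -
2024.11 | 79.3 | - | - | -
2024.11 | 78.6 | - | - | -
2024.11 | 77 | - | - | -
2024.11 | 76.7 | - | - | -
2024.11 | 76.2 | - | - | -
2024.11 | 76.2 | - | - | -
2024.11 | 72.6 | - | - | -
2024.11 | 72.5 | - | - | -
2025.02 | 61.94 | - | - | -
2024.06 | 61.93 | - | - | -
2024.06 | 61.93 | - | - | -
2024.11 | 61.6 | - | - | -
2024.06 | 61.56 | - | - | -
2024.06 | 61.02 | - | - | -
2024.06 | 60.02 | - | - | -
2026.02 | 59.5 | - | - | -
2025.02 | 58.63 | - | - | -
2024.11 | 56.7 | - | - | -
2025.02 | 56.67 | - | - | -
2025.02 | 56.1 | - | - | -
2025.02 | 55.32 | - | - | -
2025.02 | 55.21 | - | - | -
2025.02 | 55.09 | - | - | -
2025.02 | 54.02 | - | - | -
2025.02 | 53.26 | - | - | -
2025.02 | 52.68 | - | - | -
2024.11 | 50.6 | - | - | -
2025.02 | 49.91 | - | - | -
2026.02 | 49.6 | - | - | -
2024.01 | 48.8 | - | - | -
2023.05 | 47.9 | - | - | -
2025.01 | 47.5 | - | - | -
2024.06 | 46.13 | - | - | -
2024.01 | 45.57 | - | - | -
2025.01 | 45.07 | - | - | -
2024.07 | 44.6 | - | - | -
2024.01 | 43.65 | - | - | -
2025.01 | 43.5 | - | - | -
2024.01 | 41.28 | - | - | -
2024.01 | 41.25 | - | - | -
2024.07 | 39.6 | - | - | -
2024.01 | 39.04 | - | - | -
2025.01 | 39 | - | - | -
2024.11 | 39 | - | - | -
2024.01 | 38.76 | - | - | -
2023.05 | 38.7 | - | - | -
2024.07 | 38.7 | - | - | -
2025.01 | 38 | - | - | -
2024.01 | 37.82 | - | - | -
2024.11 | 37.8 | - | - | -
2024.07 | 37.6 | - | - | -
2025.01 | 37.5 | - | - | -
2025.01 | 37.5 | - | - | -
2024.06 | 37.05 | - | - | -
2024.01 | 36.32 | - | - | -
2025.01 | 36 | - | - | -
2024.01 | 35.91 | - | - | -
2024.07 | 35.9 | - | - | -
2024.01 | 35.54 | - | - | -
2024.01 | 34.33 | - | - | -
2024.01 | 34.26 | - | - | -
2024.07 | 34.1 | - | - | -
2024.07 | 33.6 | - | - | -
2025.01 | 33.5 | - | - | -
2024.07 | 32.9 | - | - | -
2024.07 | 32 | - | - | -
2024.06 | 31.48 | - | - | -
2025.01 | 30.5 | - | - | -
2024.06 | 29.01 | - | - | -
2025.01 | 26.5 | - | - | -
2024.11 | 26.2 | - | - | -
2025.01 | 26 | - | - | -
2025.01 | 24.5 | - | - | -
2023.05 | 24.4 | - | - | -
2025.01 | 17 | - | - | -
2023.05 | 16.4 | - | - | -
2025.03 | - | 24.6 | 38.41 | -
2025.03 | - | 26.19 | 39.24 | -
2025.03 | - | 26.07 | 39.74 | -
2025.03 | - | 26.07 | 39.93 | -
2025.03 | - | 25.21 | 39.24 | -
2025.03 | - | 24.36 | 38.75 | -
2025.03 | - | 26.93 | 40.56 | -
2025.03 | - | 25.95 | 38.89 | -
2025.03 | - | 27.42 | 42.62 | -
2025.03 | - | 29.62 | 43.14 | -
2025.03 | - | 28.27 | 43.91 | -
2025.03 | - | 28.52 | 43.46 | -
2025.02 | - | - | - | 51.2
2025.02 | - | - | - | 54.47
2025.02 | - | - | - | 58.26
2025.02 | - | - | - | 56.83
2025.02 | - | - | - | 53.32
2025.02 | - | - | - | 57.91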
Showing 100 of 125 rows