
Question Answering on TriviaQA (EM and Accuracy)

Top result: GPT-4-0613 (94.5 Accuracy)

Updated 1mo ago

Evaluation Results

Date     Accuracy  EM
2024.07  94.5      84.8
2024.07  94.3      80
2025.03  93        -
2024.07  92.3      86.5
2025.03  92        -
2025.03  92        -
2025.03  92        -
2025.03  92        -
2025.03  92        -
2025.03  92        -
2024.07  91.7      82.9
2024.07  91.4      85.6
2024.07  91.1      70.2
2025.03  91        -
2025.03  91        -
2025.03  91        -
2025.03  90        -
2025.03  90        -
2024.07  89.5      82.9
2024.07  89.3      82.4
2024.07  88.5      75
2024.07  88        79.7
2024.07  87.6      81
2025.03  86        -
2025.03  82        -
2024.07  80.4      70.7
2025.03  79        -
2026.01  78.8      -
2026.02  74.92     -
2026.01  74.9      -
2026.02  74.6      -
2026.02  74.53     -
2026.02  74.45     -
2025.12  74.2      -
2026.02  74.1      -
2026.02  74.1      -
2026.02  74        -
2026.02  74        -
2026.02  73.87     -
2026.02  73.77     -
2026.02  73.69     -
2026.02  73.69     -
2026.02  73.52     -
2026.02  73.49     -
2026.02  73.22     -
2024.07  73.2      65.8
2026.02  73.12     -
2026.02  72.99     -
2026.02  72.86     -
2026.02  72.83     -
2026.02  72.62     -
2026.02  72.57     -
2026.02  72.57     -
2025.12  72.4      -
2026.02  72.36     -
2026.02  72.2      -
2026.02  72.17     -
2026.02  72.03     -
2025.12  71.8      -
2026.02  71.79     -
2026.02  71.62     -
2026.01  70.3      -
2025.12  70        -
2024.07  69.3      -
2025.12  68.5      -
2026.01  68.1      -
2026.01  67.1      -
2024.07  66.4      -
2026.01  65.8      -
2025.12  65.3      -
2026.01  64        -
2026.01  63.9      -
2025.12  63.2      -
2025.12  62.2      -
2025.12  61        -
2026.01  60.3      -
2025.12  60.1      -
2026.01  59.2      -
2026.01  58.5      -
2025.02  56.3      -
2025.02  56        -
2026.01  55.6      -
2026.01  54.8      -
2026.01  53        -
2026.03  51.2      -
2026.03  50.7      -
2026.03  50.5      -
2026.03  49.5      -
2026.01  48.5      -
2026.01  48.5      -
2026.03  47.6      -
2026.03  45.3      -
2026.03  44.5      -
2026.03  44.1      -
2026.01  42        -
2026.01  42        -
-        35.2      -
-        34.1      -
2025.12  33.5      -
2026.03  30.2      -
Showing 100 of 145 rows