Multiple-choice Question Answering on CommonsenseQA (CSQA)

Accuracy: 66.4

Llama-2-13b-chat (OTTER)

[Chart: results over time; y-axis ticks 27.816 to 57.867, x-axis Apr 12, 2024 to Feb 6, 2026]
Updated 1mo ago

Evaluation Results

Date      Accuracy  (unlabeled)
2024.04   66.4      1.8
2024.04   64.5      3.1
2024.04   64        9.8
2024.04   63.4      12.9
2024.04   60.4      3.5
2024.04   58.1      2
2024.04   57.6      1
2024.04   57        10.2
2024.04   56.9      8.3
2024.04   56.5      15.2
2024.04   42.7      3.8
2026.02   37.3      -
2026.02   36.9      -
2026.02   36.2      -
2026.02   36.1      -
2026.02   35.9      -
2026.02   33.3      -
2026.02   33.2      -
2024.04   31.9      28.4
2026.02   29.6      -
2026.02   29.3      -