Reasoning on Reasoning Tasks Average

Average Score: 68.6

Llama-3-8B

[Chart: score over time, Mar 17, 2025 to Nov 13, 2025]
Updated 1mo ago

Evaluation Results

Date      Score
2025.03   68.6
2025.03   68.4
2025.03   68.2
2025.03   67.9
2025.03   67.3
2025.03   64.9
2025.03   64.5
2025.03   64.4
2025.03   63.9
2025.03   63.7
2025.11   62.8
2025.11   61.9
2025.03   61.7
2025.11   61
2025.11   59.5
2025.03   57.8
2025.03   56.4
2025.11   55.6
2025.03   53.6
2025.03   51.7
2025.03   44.9
2025.03   40.8
2025.03   40.2
2025.03   38.8
2025.03   36.8
2025.03   36.2
2025.03   36
2025.03   35.6
2025.03   35.5
2025.03   35.3
2025.03   34.9
2025.03   34.9