
Logical Reasoning on CLUTRR

95.9 Accuracy

DIVERSE

[Chart: accuracy over time, Jun 6, 2022 to Jul 7, 2025]
Updated 24d ago

Evaluation Results

Date      Accuracy
2022.06   95.9
2022.06   93.8
2022.06   92.5
2025.07   72.62
2024.12   67.7
2022.06   67
2024.12   66.1
2024.12   65.9
2025.07   62.11
2024.12   61.9
2024.12   59.4
2025.07   58.61
2024.12   57.6
2024.12   57.6
2024.12   56.7
2025.07   56.57
2024.12   56.3
2024.12   54.8
2024.12   54.7
2024.12   54.6
2024.12   54.1
2024.12   53
2024.12   52.8
2024.12   48.1
2024.12   45.9
2024.12   45.5
2022.06   42.5
2024.12   42
2024.12   37.1
2024.12   36.5
2022.06   35.6
2024.12   35.6
2024.12   35.1
2022.06   34.9
2022.06   33.6
2024.12   33.4
2022.06   32.9
2024.12   32.8
2022.06   32.4
2024.12   31.2
2024.12   24.2
2024.12   18.6