
Logical Reasoning on CLUTRR (test)

80.1 Accuracy

SATLM

[Chart: results over time, May 2023 to December 2024]
Updated 24d ago

Evaluation Results

Date       Accuracy
2023.05    80.1
2024.12    73.2
2024.12    72.3
2023.05    71.9
2024.12    71.5
2024.12    71.1
2024.12    70.2
2023.05    68.3
2024.12    67.7
2024.12    67.6
2024.12    67.4
2024.12    67.3
2024.12    65.9
2024.12    65.4
2024.12    65
2024.12    64.6
2024.12    61.7
2024.12    61.3
2024.12    61
2024.12    60.7
2023.05    58.9
2024.12    58.5
2024.12    58.4
2024.12    57.4
2024.12    55.2
2024.12    47.9
2023.05    45.7
2024.12    44.2
2024.12    44
2023.05    41.2
2023.05    40.8
2024.12    31.9
2024.12    28.6
2024.12    20.2
2024.12    2.3