
Commonsense Reasoning on CR

89.3 Accuracy

GPT-4

[Chart: accuracy over time, Mar 7, 2024 to Jun 19, 2024]
Updated 4d ago

Evaluation Results

Date      Accuracy  (unlabeled)  Method  Links
2024.03   89.3      -
2024.03   83.1      -
2024.03   80.7      -
2024.03   78.3      -
2024.06   75.9      66
2024.06   75.6      64.4
2024.03   74.4      -
2024.06   74.4      63.7
2024.03   74.2      -
2024.06   72.9      62
2024.03   72.7      -
2024.03   72.4      -
2024.03   72.2      -
2024.06   72        60.6
2024.06   71.7      62.7
2024.06   71.6      61.2
2024.03   71.1      -
2024.06   70.8      60.7
2024.03   66.3      -
2024.06   65.1      54.7
2024.06   63.8      52.7
2024.06   58.8      55.1
2024.06   57.6      53.4
2024.06   54.5      53.1
2024.06   52.6      52.1
2024.06   46.7      39.2
2024.06   46.1      35.6