
Logical Reasoning on ZebraLogic (test)

Top score: 92.2 Grid Accuracy


[Chart: Grid Accuracy by submission date (Oct 1, 2025)]

Evaluation Results

Date     Grid Accuracy  Links
2025.10  92.2           -
2025.10  90.6           -
2025.10  89.5           -
2025.10  86.8           -
2025.10  85.0           -
2025.10  81.6           -
2025.10  79.0           -
2025.10  77.9           -
2025.10  77.1           -
2025.10  73.8           -
2025.10  72.8           -
2025.10  71.2           -
2025.10  69.0           -
2025.10  68.2           -
2025.10  67.3           -
2025.10  67.2           -
2025.10  66.5           -
2025.10  65.2           -
2025.10  65.0           -
2025.10  64.5           -
2025.10  62.5           -
2025.10  61.2           -
2025.10  60.0           -
2025.10  59.0           -
2025.10  58.2           -
2025.10  56.6           -
2025.10  55.6           -
2025.10  55.1           -
2025.10  54.3           -
2025.10  53.4           -
2025.10  52.0           -
2025.10  47.8           -
2025.10  47.0           -
2025.10  46.8           -
2025.10  45.7           -
2025.10  44.1           -
2025.10  43.4           -
2025.10  42.1           -
2025.10  36.9           -
2025.10  35.9           -
2025.10  35.8           -
2025.10  35.6           -
2025.10  34.8           -
2025.10  34.1           -
2025.10  27.1           -
2025.10  26.5           -
2025.10  26.4           -
2025.10  25.1           -
2025.10  24.9           -
2025.10  23.4           -
2025.10  22.5           -
2025.10  22.3           -
2025.10  19.9           -
2025.10  16.7           -
2025.10  16.6           -
2025.10  16.0           -
2025.10  15.0           -
2025.10  14.8           -
2025.10  14.4           -
2025.10  13.5           -
2025.10  12.6           -
2025.10  11.5           -
2025.10  10.5           -
2025.10  10.5           -
2025.10  7.8            -
2025.10  7.8            -
2025.10  7.6            -
2025.10  7.3            -
2025.10  7.1            -
2025.10  5.6            -
2025.10  5.2            -
2025.10  3.6            -
2025.10  3.4            -
2025.10  3.1            -
2025.10  2.8            -
2025.10  2.8            -
2025.10  2.5            -
2025.10  2.3            -
2025.10  2.2            -
2025.10  1.7            -
2025.10  0.7            -
2025.10  0.6            -
2025.10  0.6            -
2025.10  0.6            -
2025.10  0.4            -
2025.10  0.4            -
2025.10  0.3            -
2025.10  0.0            -
2025.10  0.0            -
2025.10  0.0            -