Simulatability on Tabular Datasets Aggregate (test)

74.63 Accuracy (w/o explanation)

Gemini 3

Feb 2, 2026
Updated 4d ago

Evaluation Results

Method    Date      Accuracy (w/o explanation)                            Links
          2026.02   74.63                       82.44   7.81    -30.79    -
          2026.02   74                          81.38   7.38    -28.38    -
          2026.02   72.89                       82.08   9.19    -33.9     -
          2026.02   72.53                       81.81   9.28    -33.78    -
          2026.02   72.5                        82.05   9.55    -34.72    -
          2026.02   72.38                       81.84   9.46    -34.25    -
          2026.02   72.15                       80.3    8.14    -29.24    -
          2026.02   71.98                       79.49   7.51    -26.8     -
          2026.02   71.64                       81.21   9.57    -33.74    -
          2026.02   71.62                       79.68   8.06    -28.38    -
          2026.02   71.18                       81.44   10.27   -35.61    -
          2026.02   71.01                       81.14   10.12   -34.92    -
          2026.02   70.37                       81.19   10.82   -36.51    -
          2026.02   69.48                       79.49   10.02   -32.81    -
          2026.02   69.25                       76.69   7.44    -24.19    -
          2026.02   66.17                       71.5    5.34    -15.77    -
          2026.02   65.73                       69.51   3.78    -11.03    -
          2026.02   63.8                        70.14   6.34    -17.51    -