Social Reasoning on GRASP-Bench (test)

42.9 T1 Accuracy

Gemini 3.1 Pro

[Chart omitted; date label: May 15, 2026]
Updated 16d ago

Evaluation Results

Method    Links
2026.05  42.9  50.5  41.8  54.4  49.1  40.3  28.4  29.8  67.6  63.1  84.7  61.2  56.1  48.4  68.2  44.3  42.6    50  44.7  44.6
2026.05  38.6  43.9  34.9  50.9  26.4  23.9  27.3  38.1  60.4  63.1  69.4  55.1  51.2  38.7  59.1    39  39.1  31.8  40.8  39.2
2026.05  37.1  50.4  46.3  65.8  47.3  41.8  43.2  33.3    58  50.8  90.1  38.8    39  25.8  59.1  48.1  53.9  22.7  51.3  43.2
2026.05  37.1  52.6    48  70.2    50  46.3  46.6  27.4  64.4  54.9  90.1  59.2  46.3  41.9  63.6  45.6  48.7  40.9  48.7  39.2
2026.05  32.9  33.7  32.6  39.5  25.5  35.8  30.7  32.1  35.4  44.3  29.7  36.7  43.9  12.9  27.3  33.4  33.9  27.3  31.6  36.5
2026.05  32.9  31.7  23.6  21.9  14.5  19.4  22.7  34.5  45.7  45.9  66.7  32.7  29.3  25.8  27.3  28.2  24.3  31.8  30.3  31.1
2026.05  31.4  36.9  28.1    36  23.6  16.4  30.7  27.4  48.4    41  74.8  40.8  34.1  19.4  40.9    38  40.9  36.4  39.5  32.4
2026.05    30  36.2  34.3  55.3  27.3  28.4  27.3    31  40.7    41  48.6  36.7  34.1  22.6  45.5  33.8  33.9  31.8  31.6  36.5
2026.05    30  36.6  31.9  31.6    30  28.4  36.4  34.5    42  46.7  53.2  26.5  34.1  19.4  40.9  38.3  43.5  45.5  31.6  35.1
2026.05    30  36.3  37.1  51.8  38.2  28.4  35.2    31  34.8  42.6  29.7  36.7  26.8  25.8  40.9  36.6  36.5  22.7  30.3  47.3
2026.05  28.6  30.1  29.1  29.8  32.7  20.9  30.7  28.6  28.5  29.5  26.1  28.6  24.4  32.3  36.4  34.1  37.4  36.4  38.2  24.3
2026.05  24.3  37.3    30  36.8  33.6  17.9  30.7  29.8  47.1  35.2  82.9  32.7    22  35.5  27.3    38  42.6  31.8  35.5  35.1
2026.05  24.3  38.3  28.3  35.1  25.5  14.9  31.8  33.3  52.7  45.1  86.5  28.6  41.5  22.6  40.9    38  33.9  31.8  35.5  48.6
2026.05  24.3  34.2  29.6    36  22.7  25.4  39.8  27.4    38  36.1  49.5  38.8  29.3  25.8  22.7  37.6  32.2  27.3  36.8    50
2026.05  22.9    43  32.8  52.6    30  22.4  34.1    25    59  51.6  87.4  40.8  46.3  25.8  68.2  40.8    40  31.8  39.5  45.9
2026.05  22.9  34.2  28.3  30.7    30  26.9  27.3  29.8  41.5  31.1  76.6  32.7  19.5  12.9  22.7  35.5  42.6  40.9  30.3  28.4
2026.05    20  33.9  25.9  27.2  31.8  19.4  28.4  23.8  42.8  46.7  55.9  32.7  31.7  16.1  36.4  36.9  36.5  31.8  38.2  37.8
2026.05  17.1    37  30.2  56.1  18.2  11.9  30.7  35.7  46.5  42.6    64  30.6    39    29  54.5  37.3  38.3  18.2  42.1  36.5