Visual Grounded Reasoning on TreeBench

54.8 Overall Score

o3-0416

Nov 27, 2025
Updated 3d ago

Evaluation Results

Method  Links
2025.11  54.8  -     69    69.2  65.2  68.8  79.4  22.4  38.6  61    86.2  50
2025.11  54.1  -     51.7  61.5  56.5  75    83.8  20    36.8  65.9  86.2  54.6
2025.11  51.3  45    69.1  54.4  83.4  69.3  64.1  23.2  38.6  58.7  69.4  45.8
2025.11  50.4  44    65.5  53.8  82.6  68.8  63.3  22.4  36.8  61    69    45.5
2025.11  46.9  -     51.7  61.5  65.2  43.8  69.1  18.8  38.6  48.8  72.4  43.2
2025.11  46.4  -     62.1  61.5  52.2  68.8  52.9  16.5  33.3  61    86.2  45.5
2025.11  45.9  -     48.3  53.9  69.6  68.8  75    15.3  19.3  56.1  72.4  43.2
2025.11  42.5  -     51.7  53.8  69.6  62.5  54.4  16.5  33.3  46.3  62.1  38.6
2025.11  42.2  -     65.5  69.2  56.5  56.3  48.5  11.8  33.3  51.2  72.4  38.6
2025.11  42    -     51.7  61.5  52.2  68.8  51.5  12.9  33.3  56.1  65.5  38.6
2025.11  40.5  -     62.1  53.8  65.2  62.3  36.8  12.9  28.1  53.7  65.5  47.7
2025.11  39    35.7  58.6  61.5  65.2  50    48.5  14.1  31.6  39    44.8  40.9
2025.11  38.8  -     51.7  69.2  56.5  56.3  33.7  21.2  24.6  39    72.4  43.2
2025.11  37.5  30    62.1  53.8  65.2  68.8  51.5  11.8  24.6  36.6  51.7  47.7
2025.11  37.3  -     55.2  53.8  56.5  50    32.4  21.2  22.8  41.5  72.4  36.4
2025.11  37    -     55.2  53.8  56.5  62.5  27.9  20    35.1  39    44.8  43.2
2025.11  30.1  -     51.7  46.2  60.9  62.5  42.6  0     3.5   36.6  62.1  29.5