Interactive Tool-Use Agent Performance on VitaBench

[Figure: Cross Score over time, Dec 28, 2025 to Feb 6, 2026; labeled entry: GPT-o3]

Evaluation Results

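| Method | Date | Cross Score | Score 2 | Score 3 | Score 4 | Score 5 |
|---|---|---|---|---|---|---|
|  | 2025.12 | 30 | 53.5 | 53.5 | 37.8 | - |
|  | 2026.02 | 29 | 65 | 66 | 27 | 46.8 |
|  | 2025.12 | 23.5 | 49 | 43.8 | 26.5 | - |
|  | 2025.12 | 23 | 46 | 51.5 | 29 | - |
|  | 2025.12 | 22.8 | 54 | 52.5 | 37.5 | - |
|  | 2025.12 | 19.5 | 44.5 | 46.5 | 23.5 | - |
|  | 2025.12 | 18.8 | 44 | 46 | 17.5 | - |
|  | 2025.12 | 17.5 | 46 | 54.5 | 24 | - |
|  | 2025.12 | 16.3 | 34 | 42.5 | 18.3 | - |
|  | 2025.12 | 16 | 35 | 40 | 20.5 | - |
|  | 2025.12 | 15.5 | 35.3 | 42.5 | 22 | - |
|  | 2026.02 | 15 | 37 | 42 | 12 | - |
|  | 2026.02 | 15 | 57 | 58 | 14 | 36 |
|  | 2025.12 | 15 | 37 | 42 | 12 | - |
|  | 2026.02 | 14.5 | 45 | 32 | 15.8 | - |
|  | 2026.02 | 12 | 56 | 57 | 18 | 35.8 |
|  | 2026.02 | 11.5 | 32.5 | 30 | 18.8 | - |
|  | 2026.02 | 10.8 | 31.3 | 34.5 | 12.5 | - |
|  | 2025.12 | 10.5 | 30 | 32.5 | 11.5 | - |
|  | 2025.12 | 8 | 25 | 33 | 16 | - |
|  | 2026.02 | 6.1 | 26 | 39 | 7 | - |
|  | 2025.12 | 6.1 | 26 | 39 | 7 | - |
|  | 2026.02 | 6 | 43 | 44 | 15 | 27 |
|  | 2026.02 | 5.3 | 27 | 22.5 | 4.5 | - |
|  | 2026.02 | 4 | 26 | 17 | 10 | - |
|  | 2025.12 | 4 | 26 | 17 | 10 | - |
|  | 2025.12 | 4 | 23 | 21 | 12 | - |
|  | 2026.02 | 3 | 26.3 | 23.8 | 7 | - |
|  | 2025.12 | 2 | 14 | 11 | 2 | - |
|  | 2026.02 | 1.5 | 18.3 | 14.8 | 4.5 | - |
|  | 2026.01 | - | - | - | - | 5.8 |
|  |  | - | - | - | - | 4.5 |
|  | 2026.01 | - | - | - | - | 7 |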
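The results table above appears to be ordered by Cross Score, highest first, with entries that have no Cross Score ("-") listed last. The following minimal Python sketch reproduces that ordering rule. It assumes ties keep their input order (Python's `sorted` is stable), because the page does not state its tie-break. The `Row` type and `rank_by_cross_score` function are hypothetical names used only for illustration.

```python
from typing import List, Optional, Tuple

# One leaderboard row reduced to (date, cross_score); None stands for a "-" cell.
# Row and rank_by_cross_score are hypothetical names, not part of any real API.
Row = Tuple[Optional[str], Optional[float]]


def rank_by_cross_score(rows: List[Row]) -> List[Row]:
    """Order rows by Cross Score, highest first, with missing scores last.

    sorted() is stable, so rows with equal keys keep their input order;
    this tie-break is an assumption, not the page's documented rule.
    """
    # Key: (is_missing, negated score) -> present scores first, larger scores first.
    return sorted(rows, key=lambda row: (row[1] is None, -(row[1] or 0.0)))


if __name__ == "__main__":
    # A handful of (date, Cross Score) pairs from the table, deliberately shuffled.
    sample: List[Row] = [
        ("2026.01", None),
        ("2025.12", 23.5),
        ("2025.12", 30.0),
        ("2026.02", 29.0),
        ("2026.02", 1.5),
    ]
    for date, score in rank_by_cross_score(sample):
        print(date, "-" if score is None else score)
```

Encoding "missing" as the first element of the sort key keeps absent scores at the bottom without needing a sentinel value such as negative infinity.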