Share your thoughts, 1 month free Claude Pro on usSee more
WorkDL logo mark

Agent on τ2-Bench

85.4Accuracy

Gemini-3-Pro

56.2863.8471.478.96Dec 30, 2025Jan 13, 2026Jan 27, 2026Feb 11, 2026Feb 25, 2026Mar 11, 2026Mar 26, 2026
Updated 22d ago

Evaluation Results

MethodLinks
85.4
2026.03
80.9
2026.03
76.8
2026.03
76.6
2025.12
69.5
2025.12
69.1
2025.12
68.8
2025.12
64
2026.03
57.4