
Agent Task Completion on τ²-Bench

92.1 Avg Task Reward

GPT-5.1 with H-EPM

[Chart: Avg Task Reward, Dec 8, 2025]
Updated 1mo ago

Evaluation Results

Method  Links
2025.12  92.1
2025.12  84.2