Agentic Tool Use on τ²-Bench Retail

90.4 Accuracy

Seed2.0 Pro

Updated 23d ago

Evaluation Results

Method	Date	Accuracy
Seed2.0 Pro	2026.06	90.4
Claude-Opus-4.5	2026.06	88.9
Claude-Sonnet-4.5	2026.06	86.2
Gemini-3-pro High	2026.06	85.3
Qwen3.5-27B	2026.04	84.7
GPT-5.2 High	2026.06	82
K-EXAONE-236B-A23B	2026.04	78.6
GPT-5 mini	2026.04	78.3
EXAONE 4.5 33B	2026.04	77.9
Qwen3-VL-235B-A22B	2026.04	67