
Interactive Agent Task on TRIP-Bench Mid

Top score: 55 Loose Success Rate (GPT-5.2)

Updated 5mo ago

Evaluation Results

Method	Date	Loose Success Rate	Second score (unlabeled)
GPT-5.2	2026.02	55	13
DeepSeek-V3.2	2026.02	41	9
Claude-Sonnet-4.5	2026.02	31	6
GLM-4.7	2026.02	29	0
Gemini-3-Flash	2026.02	25	0
GLM-4.7	2026.02	20	0
DeepSeek-V3.2	2026.02	20	3
Claude-Sonnet-4.5	2026.02	18	0
Gemini-3-Pro	2026.02	16	0
GPT-5.2	2026.02	14	0
Gemini-3-Flash	2026.02	11	0
Gemini-3-Pro	2026.02	9	0
Kimi-K2-Thinking	2026.02	8	4
Qwen3-235B-A22B-Instruct-2507	2026.02	5	0
Kimi-K2-0905-Preview	2026.02	0	0
Qwen3-235B-A22B-Thinking-2507	2026.02	0	0