Interactive Agent Task on TRIP-Bench Overall

Loose Success Score: 45

GPT-5.2

Updated 4mo ago

Evaluation Results

Method	Links	Loose Success Score
GPT-5.2 2026.02		45	18.5
DeepSeek-V3.2 2026.02		40	10.5
Claude-Sonnet-4.5 2026.02		32	8.5
Gemini-3-Flash 2026.02		23.3	6.3
GLM-4.7 2026.02		20.3	4
Gemini-3-Pro 2026.02		20	2.8
DeepSeek-V3.2 2026.02		18.5	2.3
Gemini-3-Pro 2026.02		18	3
Claude-Sonnet-4.5 2026.02		17.3	1.8
Gemini-3-Flash 2026.02		17.3	5.5
GLM-4.7 2026.02		14.8	0
GPT-5.2 2026.02		13.3	0.5
Kimi-K2-Thinking 2026.02		10.8	2.3
Qwen3-235B-A22B-Instruct-2507 2026.02		5.8	0.5
Kimi-K2-0905-Preview 2026.02		3.3	0
Qwen3-235B-A22B-Thinking-2507 2026.02		0	0