Agent on Terminal-Bench

Accuracy: 45

DeepSeek V3.2

Updated 1mo ago

Evaluation Results

Method	Date	Accuracy	Links
DeepSeek V3.2	2025.12	45	-
LongCat-Flash Exp-Chat	2025.12	42.5	-
GLM 4.6	2025.12	40.5	-
LongCat-Flash Chat	2025.12	39.5	-
Reflection	2026.05	25.8	-
LiTS-Fact	2026.05	25.8	-
No Memory	2026.05	23.6	-
PromptBridge	2025.12	18.75	25
Direct Transfer	2025.12	15	-
PromptBridge	2025.12	8.75	40
ReAct	2026.05	7.9	-
Direct Transfer	2025.12	6.25	-