Agentic Coding on Project dev (test)

Tau^2: 0.881

Claude-Sonnet-4.5

Updated 4mo ago

Evaluation Results

Method	Links	Tau^2
Claude-Sonnet-4.5 2025.12		0.881
GPT-5 2025.12		0.842
DeepSeek-V3.1-Nex-N1 2025.12		0.802
Minimax-M2 2025.12		0.762
GLM-4.6 2025.12		0.759
Kimi-K2-thinking 2025.12		0.752
Qwen3-32B-Nex-N1 2025.12		0.721
Gemini-2.5-pro 2025.12		0.654
Qwen3-30B-A3B-Nex-N1 2025.12		0.653
InternLM3-8B-Nex-N1 2025.12		0.63
DeepSeek-V3.1 2025.12		0.428
Qwen3-32B 2025.12		0.415
Qwen3-30B-A3B 2025.12		0.285