Theorem Proving on DeepTheorem

False Rate: 54

DeepSeek-V3.2-Thinking (Agentic)

Updated 1mo ago

Evaluation Results

Method	Date	False Rate	(unlabeled)	(unlabeled)
DeepSeek-V3.2-Thinking (Agentic)	2026.01	54	18	18
GPT-5.2 (Agentic)	2026.01	58	16	14
DeepSeek-V3.2 (Agentic)	2026.01	62	22	22
Gemini-3-Flash	2026.01	72	32	26
Gemini-3-Pro	2026.01	76	42	36
Qwen-Max (Agentic)	2026.01	76	22	22
Baseline	2026.01	76	20	16
Claude-Sonnet-4.5 (Agentic)	2026.01	80	36	32