
Theorem Proving on Small-scale benchmark Overall

Best VR: 33 (Gemini-3-Pro)

Updated 1mo ago

Evaluation Results

Method	Date	VR
Gemini-3-Pro	2026.01	33
Claude-Sonnet-4.5 (Agentic)	2026.01	29
Gemini-3-Flash	2026.01	23
DeepSeek-V3.2 (Agentic)	2026.01	18
Qwen-Max (Agentic)	2026.01	16
Baseline	2026.01	15
GPT-5.2 (Agentic)	2026.01	14
DeepSeek-V3.2-Thinking (Agentic)	2026.01	13