Problem Solving and Unsolvability Detection on Overall

97.4 Solvable Accuracy

Gemini-3

Updated 5mo ago

Evaluation Results

Method	Links	Solvable Accuracy	(unlabeled)	(unlabeled)
Gemini-3 2025.12		97.4	84.1	90.8
Deepseek-V3.2-R 2025.12		88.4	84.4	86.1
Qwen3-4B + UnsolvableRL 2025.12		69.4	87.5	78.6
GPT-5.1-Low 2025.12		45.9	66.6	56.2
Qwen3-4B Instruct 2025.12		43.4	38.8	41.1
Qwen3-1.7B + UnsolvableRL 2025.12		25.5	76.4	50.9
Qwen3-1.7B Instruct 2025.12		23.0	41.7	32.4