Problem Solving and Unsolvability Detection on AIME 24-25

95 Solvable Accuracy

Gemini-3

Updated 5mo ago

Evaluation Results

Method	Date	Solvable Accuracy	Unsolvability Detection	Average
Gemini-3	2025.12	95	21.2	58.1
Deepseek-V3.2-R	2025.12	85	40.3	62.7
Qwen3-4B + UnsolvableRL	2025.12	69.6	40.4	55
Qwen3-4B Instruct	2025.12	67.9	14.8	41.4
GPT-5.1-Low	2025.12	61.7	42.5	52.1
Qwen3-1.7B Instruct	2025.12	38.3	21.2	29.8
Qwen3-1.7B + UnsolvableRL	2025.12	35.4	28.7	32.1
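
The page does not say how the three scores in each row relate. On inspection, the third number in every row appears to be the mean of the first two, rounded to one decimal place. The first number is the Solvable Accuracy, since Gemini-3's 95 matches the headline figure. The sketch below checks that reading; the names `ROWS`, `mean_gap`, and `max_mean_gap` are hypothetical helpers, and the mean relationship is an assumption inferred from the numbers, not something the source states.

```python
# Rows copied from the results table:
# (method, first score, second score, third score).
ROWS = [
    ("Gemini-3", 95.0, 21.2, 58.1),
    ("Deepseek-V3.2-R", 85.0, 40.3, 62.7),
    ("Qwen3-4B + UnsolvableRL", 69.6, 40.4, 55.0),
    ("Qwen3-4B Instruct", 67.9, 14.8, 41.4),
    ("GPT-5.1-Low", 61.7, 42.5, 52.1),
    ("Qwen3-1.7B Instruct", 38.3, 21.2, 29.8),
    ("Qwen3-1.7B + UnsolvableRL", 35.4, 28.7, 32.1),
]


def mean_gap(first, second, third):
    """Absolute gap between the reported third score and the mean of the first two."""
    return abs((first + second) / 2 - third)


def max_mean_gap(rows):
    """Largest gap across all rows; a value near 0.05 or below is consistent
    with one-decimal rounding of the mean (assumed, not stated by the source)."""
    return max(mean_gap(a, b, c) for _, a, b, c in rows)


if __name__ == "__main__":
    for name, a, b, c in ROWS:
        print(f"{name}: mean of first two = {(a + b) / 2:.2f}, reported = {c}")
    # Allow a little slack above 0.05 for floating-point error.
    print("consistent with rounded mean:", max_mean_gap(ROWS) < 0.051)
```

If that reading holds, the third score weights solving and unsolvability detection equally, which would explain why Deepseek-V3.2-R has the highest third score despite Gemini-3's higher Solvable Accuracy.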