Aggregated Logical Reasoning on Overall Mean

76.2 Accuracy

Deepseek-V3.2-R

Updated 5mo ago

Evaluation Results

Method	Links	Accuracy
Deepseek-V3.2-R 2025.12		76.2
GPT-5.1-Low 2025.12		71.5
Gemini-3.0-Pro 2025.12		69.9
Qwen3-4B-Instruct + UnsolRL-Final 2025.12		34.9
Qwen3-4B-Instruct 2025.12		23.2
Qwen3-1.7B-Instruct + UnsolRL-Final 2025.12		13.8
Qwen3-1.7B-Instruct 2025.12		11.5