| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| Tracking Shuffled Objects BBH | Role-Play Prompting | Accuracy | 71.33 | 54 | 1mo ago |
| Zebralogic | Qwen 3 VL 32B Think | Score | 96.1 | 42 | 1mo ago |
| Causal Judgement | Self-discover | Accuracy | 36 | 30 | 1mo ago |
| K&K Logic Puzzles OOD | - | Score Threshold 2 (OOD) | 99 | 25 | 1mo ago |
| K&K Logic Puzzles In-domain | - | Accuracy (Level 3) | 98 | 25 | 1mo ago |
| Autologic en | DARL | Score | 0.439 | 16 | 1mo ago |
| Autologic cn | DARL | Score | 40.3 | 16 | 1mo ago |
| ZebraLogic | - | Accuracy | 96 | 15 | 1mo ago |
| ZebraLogic | NPR | Avg Accuracy @1 | 0.817 | 11 | 1mo ago |
| Sudoku 8B Instruct (test) | - | Accuracy | 71.7 | 9 | 4d ago |
| Riddle 1.0 (test) | INMS | F1 Score | 69 | 7 | 1mo ago |
| Pun 1.0 (test) | - | F1 Score | 41 | 7 | 1mo ago |
| Puzzle 1.0 (test) | - | F1 Score | 19 | 7 | 1mo ago |
| Zebralogic | SUPERNOVA-4B | Pass@8 | 77 | 6 | 8d ago |
| ARC-Challenge & LogiQA OpenCompass (test) | CRITIQ | ARC-C Accuracy | 38.31 | 4 | 1mo ago |
| Large-scale model pool Logic Reasoning 15 LLMs | RouteMoA | Accuracy | 95.6 | 3 | 1mo ago |
| CommonsenseQA | MIG | Pass@1 | 69.8 | 3 | 1mo ago |
| Logic reasoning tasks | - | - | - | 0 | 1mo ago |