| Dataset Name | SOTA Method | Metric | Value | Trend | Last Updated |
|---|---|---|---|---|---|
| Tracking Shuffled Objects BBH | Role-Play Prompting | Accuracy | 71.33 | 54 | 4d ago |
| ZebraLogic | Qwen 3 VL 32B Think | Score | 96.1 | 42 | 4d ago |
| Causal Judgement | Self-discover | Accuracy | 36 | 30 | 4d ago |
| Autologic en | DARL | Score | 0.439 | 16 | 4d ago |
| Autologic cn | DARL | Score | 40.3 | 16 | 4d ago |
| ZebraLogic | - | Accuracy | 96 | 15 | 4d ago |
| ZebraLogic | NPR | Avg Accuracy @1 | 0.817 | 11 | 4d ago |
| ARC-Challenge & LogiQA OpenCompass (test) | CRITIQ | ARC-C Accuracy | 38.31 | 4 | 4d ago |
| Large-scale model pool Logic Reasoning 15 LLMs | RouteMoA | Accuracy | 95.6 | 3 | 4d ago |
| CommonsenseQA | MIG | Pass@1 | 69.8 | 3 | 4d ago |
| Logic reasoning tasks | - | - | - | 0 | 4d ago |