| Dataset Name | SOTA Method | Metric | Trend | Updated |
|---|---|---|---|---|
| BBH Logical Deduction (Seven Objects) (test) | TextReg | Accuracy 55.2 | 22 | 13d ago |
| Logical Deduction 5 objects (test) | TextReg | Accuracy 61.1 | 16 | 13d ago |
| BBH-LD | Qwen2.5-7B | BCA 87.1 | 9 | 21d ago |
| MineSweeper | GiGPO | p@152 | 8 | 3mo ago |