| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Open-ended generation | Zebra-Logic | FDR 0.75 | 9 |
| Zebra Logic Puzzle Solving | Zebra Logic Mean | Accuracy 92.7 | 7 |
| Zebra Logic Puzzle Solving | Zebra Logic Unsolvable | Accuracy 91 | 7 |
| Zebra Logic Puzzle Solving | Zebra Logic Solvable | Accuracy 94.5 | 7 |