| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Bucket Sort | Formal-language benchmark lengths 41-500 (test) | Accuracy (%) | 99.4 | 6 |
| Compute Sqrt | Formal-language benchmark lengths 41-500 (test) | Accuracy (%) | 57.8 | 6 |
| Binary Multiplication | Formal-language benchmark lengths 41-500 (test) | Accuracy (%) | 58.5 | 6 |
| Binary Addition | Formal-language benchmark lengths 41-500 (test) | Accuracy (%) | 100 | 6 |
| Odds First | Formal-language benchmark lengths 41-500 (test) | Accuracy (%) | 100 | 6 |
| Missing Duplicate | Formal-language benchmark lengths 41-500 (test) | Accuracy (%) | 100 | 6 |
| Duplicate String | Formal-language benchmark lengths 41-500 (test) | Accuracy (%) | 100 | 6 |
| Solve Equation | Formal-language benchmark lengths 41-500 (test) | Accuracy (%) | 67.8 | 6 |
| Modular Arithmetic | Formal-language benchmark lengths 41-500 (test) | Accuracy (%) | 96.1 | 6 |
| Reverse String | Formal-language benchmark lengths 41-500 (test) | Accuracy (%) | 100 | 6 |
| Stack Manipulation | Formal-language benchmark lengths 41-500 (test) | Accuracy (%) | 100 | 6 |
| Cycle Navigation | Formal-language benchmark lengths 41-500 (test) | Accuracy (%) | 100 | 6 |
| Parity Check | Formal-language benchmark lengths 41-500 (test) | Accuracy (%) | 100 | 6 |
| Modular Arithmetic (Simple) | Formal-language benchmark lengths 41-500 (test) | Accuracy (%) | 100 | 6 |
| Even Pairs | Formal-language benchmark lengths 41-500 (test) | Accuracy (%) | 100 | 6 |