| Task Name | Dataset Name | SOTA Result (Accuracy) | Trend |
|---|---|---|---|
| Reasoning | FLenQA 250 tokens | 80 | 15 |
| Reasoning | FLenQA 500 tokens | 74 | 15 |
| Reasoning | FLenQA 1000 tokens | 78.5 | 15 |
| Reasoning | FLenQA 2000 tokens | 52.5 | 9 |
| Reasoning | FLenQA 3000 tokens | 39.3 | 9 |