| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Office Application Question Answering | OfficeQA held-out (test) | Score (%) | 72.1 | 59 |
| Question Answering | OfficeQA | Accuracy | 82.5 | 25 |
| Question Answering | OfficeQA 246 questions | Accuracy | 80.1 | 15 |
| Long-context reasoning | OfficeQA | Accuracy | 57.14 | 10 |