| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Long-context reasoning | OOLONG | Accuracy | 68.4 | 37 |
| Long-context reasoning | OOLONG trec_coarse | Score | 86.6 | 28 |
| Long-context reasoning | OOLONG | Latency (s) | 7.1 | 27 |
| Long-context reasoning | Oolong-Synth | Accuracy | 78.41 | 11 |
| Long-context Question Answering | Oolong Real | Score | 37.46 | 9 |
| Long-context Question Answering | Oolong Synthetic | Score | 71.75 | 8 |