| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Language Model Evaluation | BenchPress short-context (test) | Accuracy 68.84 | 131 |
| Context Compression Evaluation | BenchPress suite macro-averaged across all datasets | Macro-averaged F1 74.33 | 130 |
| Context Compression | BenchPress short-context (test) | EM (4x Single Context) 56.41 | 21 |