| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Code Generation | MAB code | TS: 69.6 | 18 |
| Fact Consolidation | MAB FC-SH (262K context) v3 (full) | Accuracy (SubEM): 93 | 9 |
| Single-Hop Fact-based Reasoning | MAB FC-SH 262K v3 (test) | Accuracy: 93 | 8 |
| Fact Consolidation | MAB FC-MH 262K context v3 (full) | Accuracy (SubEM): 27 | 7 |
| Multi-Hop Fact-based Reasoning | MAB FC-MH, 262K v3 (test) | Accuracy: 27 | 5 |
| Online Packet Scheduling with Deadlines (OPSD) | MAB 1-bounded | Competitive Ratio: 1 | 2 |
| epsilon-Best Arm Identification | MAB Multi-task | Non-adaptive Upper Bound: 1 | 1 |
| Regret Minimization | MAB with paid observations | Metric: - | 0 |