| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Jailbreak Defense | Hard (H) | FPR | 0 | 12 |
| Classification | HARD (test) | Accuracy | 97.77 | 8 |
| Online Learning | HARD | Latency (s) | 0.2516 | 8 |
| Reasoning over Large Structured Context | Hard | Reasoning Judge Score | 5 | 4 |
| Joint Audio-Video Generation | Hard (test) | Sync-C | 6.12 | 4 |
| Online Bin Packing | Hard28-R | Gap Percentage | 8.06 | 4 |