| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Speculative Decoding | 20 prompts across 4 task categories | Mean Expected Tokens per Speculation Step | 6.55 | 20 |
| Jailbreak Prompt Quality Evaluation | 500 randomly sampled prompts | Similarity | 81 | 16 |
| Distribution-distance evaluation | 100 prompts (evaluation) | Distinct-N (WM) | 94.1 | 14 |
| Creative Plot Generation | 160 prompts NQD (test) | Character Development | 8.67 | 13 |
| Over-generation attack | 1000 prompts (test) | Succ. @≥1 | 88.2 | 8 |
| Text-to-Image Generation | 10 randomly sampled prompts | Inference Time (s) | 2.2322 | 6 |
| Property-based retrieval | Prompts (test) | MAP | 0.48 | 6 |
| Text-to-Video | 1,024 prompts (held-out) | VQ | 4.81 | 5 |
| Panorama Generation | 14 prompts, 1000 panoramas of dimensions 512×4608 | Intra-LPIPS | 0.58 | 4 |
| Text-to-Image Generation | 400 prompts (test) | HPSv2 | 29.0533 | 4 |
| Steering LLM states | 50 prompts | LogFreq (d) | 1.6666 | 3 |
| 3D Scene Editing | 15 distinct single-task prompts | LLM Time | 10.63 | 3 |
| LLM agent alignment evaluation | 1000 prompts (test) | Usefulness Score | 1 | 2 |