| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Robustness against harmful content generation | LMSYS harmful queries | Attack Success Rate | 1 | 20 |
| Instruction Following Evaluation | LMSYS In-Dist. | GPT-4o Score | 51.8 | 17 |
| Proactive next utterance prediction | LMSYS (test) | LLM-Judge | 60.98 | 17 |
| Output Length Prediction | LMSYS | MAE | 68.33 | 16 |
| LLM Serving Efficiency | LMSYS trace | GPUs Used | 246 | 2 |