LMSYS

Benchmarks

Task Name	Dataset Name	Metric	SOTA Result	
Robustness against harmful content generation	LMSYS harmful queries	Attack Success Rate	1	20
Instruction Following Evaluation	LMSYS In-Dist.	GPT-4o Score	51.8	17
Proactive next utterance prediction	LMSYS (test)	LLM-Judge	60.98	17
Output Length Prediction	LMSYS	MAE	68.33	16
LLM Request Scheduling	LMSYS (test)	Average Latency (s)	0.1103	12
KV cache reuse efficiency	LMSys	Match Rate	47.3	4
LLM Serving Efficiency	LMSYS trace	GPUs Used	246	2
