LMSYS

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Robustness against harmful content generation | LMSYS harmful queries | Attack Success Rate: 1 | 20 |
| Instruction Following Evaluation | LMSYS In-Dist. | GPT-4o Score: 51.8 | 17 |
| Proactive next utterance prediction | LMSYS (test) | LLM-Judge: 60.98 | 17 |
| Output Length Prediction | LMSYS | MAE: 68.33 | 16 |
| KV cache reuse efficiency | LMSys | Match Rate: 47.3 | 4 |
| LLM Serving Efficiency | LMSYS trace | GPUs Used: 246 | 2 |