
LMSYS

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Robustness against harmful content generation | LMSYS harmful queries | Attack Success Rate: 1 | 20 |
| Instruction Following Evaluation | LMSYS In-Dist. | GPT-4o Score: 51.8 | 17 |
| Proactive next utterance prediction | LMSYS (test) | LLM-Judge: 60.98 | 17 |
| Output Length Prediction | LMSYS | MAE: 68.33 | 16 |
| LLM Serving Efficiency | LMSYS trace | GPUs Used: 246 | 2 |