
LMSYS

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
| --- | --- | --- | --- | --- |
| Robustness against harmful content generation | LMSYS harmful queries | Attack Success Rate | 1 | 20 |
| Proactive next utterance prediction | LMSYS (test) | LLM-Judge | 60.98 | 17 |
| Output Length Prediction | LMSYS | MAE | 68.33 | 16 |