
ID

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Open-ended Dialogue | ID Average | Win Rate: 72.2 | 4 |
| LLM response quality prediction | ID Claude 3.5 Haiku 20241022 (test) | RMSE: 0.45 | 3 |