Human-in-the-loop Interactive Evaluation

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Dialogue Generation | Human-in-the-loop Interactive Evaluation Customer-Agent Dialogs | Win Rate (vs GPT-4): 41 | 8