
Ultra-Real

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Helpfulness preference labeling accuracy | Ultra-Real | Benchmark Score: 74.8 | 15 |
| Preference evaluation | Ultra-Real (target) | Win Rate: 43 | 2 |
| Preference evaluation | Ultra-Real benchmark | Win Rate: 43.1 | 2 |