Ultra-Real

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Helpfulness preference labeling accuracy | Ultra-Real | Benchmark Score: 74.8 | 15 |
| Preference evaluation | Ultra-Real (target) | Win Rate: 43 | 2 |
| Preference evaluation | Ultra-Real benchmark | Win Rate: 43.1 | 2 |