EditReward-Bench

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Reward Modeling | EditReward-Bench | PF: 85.4 | 17 |
| Multi-way preference ranking | EDITREWARD-BENCH | Preference Score (K=2): 58.61 | 11 |
| Visual Consistency Assessment | EditReward-Bench 2025 (test) | Subject Addition Accuracy: 80.88 | 6 |