Reward Modeling on Unified-Feedback (ID)
Accuracy over time (chart): top result 73.9, GRM w/ dpo-noref, Jun 14, 2024
Updated 4d ago
Evaluation Results
| Method | Details | Date | Accuracy |
|---|---|---|---|
| GRM w/ dpo-noref | Training Data Size=400... | 2024.06 | 73.9 |
| GRM w/ dpo | Training Data Size=400... | 2024.06 | 73.8 |
| GRM w/ sft | Training Data Size=400... | 2024.06 | 73.2 |
| Classifier + Ensemble | Training Data Size=400... | 2024.06 | 72.8 |
| Classifier (baseline) | Training Data Size=400... | 2024.06 | 72.1 |
| Classifier + margin | Training Data Size=400... | 2024.06 | 72.0 |
| Classifier + label smooth | Training Data Size=400... | 2024.06 | 71.5 |
| Classifier (Frozen) | Training Data Size=400... | 2024.06 | 63.8 |