Reward Modeling on MT-Bench OOD (test)
[Chart: Score / Accuracy. Highlighted entry: GRM w/ sft, Score 73, Jun 14, 2024]
Updated 4d ago
Evaluation Results
| Method | Configuration | Date | Score | Accuracy |
|---|---|---|---|---|
| GRM w/ sft | Base Model=gemma-2B-it... | 2024.06 | 73 | - |
| GRM w/ dpo-noref | Base Model=gemma-2B-it... | 2024.06 | 72.1 | - |
| Classifier + label smooth | Base Model=gemma-2B-it... | 2024.06 | 71.9 | - |
| GRM w/ dpo | Base Model=gemma-2B-it... | 2024.06 | 71.3 | - |
| Classifier + Ensemble | Base Model=gemma-2B-it... | 2024.06 | 71.1 | - |
| Classifier + margin | Base Model=gemma-2B-it... | 2024.06 | 71 | - |
| Classifier (baseline) | Base Model=gemma-2B-it... | 2024.06 | 69.1 | - |
| Classifier (Frozen) | Base Model=gemma-2B-it... | 2024.06 | 68.2 | - |
| Classifier (Frozen) | Training Data Size=400... | 2024.06 | - | 69.5 |
| Classifier (baseline) | Training Data Size=400... | 2024.06 | - | 71.2 |
| Classifier + margin | Training Data Size=400... | 2024.06 | - | 72.6 |
| Classifier + label smooth | Training Data Size=400... | 2024.06 | - | 71.2 |
| Classifier + Ensemble | Training Data Size=400... | 2024.06 | - | 73.7 |
| GRM w/ dpo | Training Data Size=400... | 2024.06 | - | 73.4 |
| GRM w/ dpo-noref | Training Data Size=400... | 2024.06 | - | 73 |
| GRM w/ sft | Training Data Size=400... | 2024.06 | - | 73.4 |