Preference Classification on Anthropic HH Helpful (test)
Top result: UMM-RM, 57.6 Accuracy (Nov 30, 2025)
Updated 4d ago
Evaluation Results
Method                             Setting                    Date     Accuracy
UMM-RM                             Backbone=TinyLlama-1.1...  2025.11  57.6
UMM-RM                             Backbone=TinyLlama-1.1...  2025.11  55.2
Mean Optimization                  Backbone=TinyLlama-1.1...  2025.11  55
Worst-Case Optimization            Backbone=TinyLlama-1.1...  2025.11  54.8
Uncertainty-Weighted Optimization  Backbone=TinyLlama-1.1...  2025.11  54.6
UMM-RM                             Backbone=TinyLlama-1.1...  2025.11  54.2
Dense RM                           Backbone=TinyLlama-1.1B    2025.11  44.6