Reward Modeling on Arabic preference (test)
Best Accuracy: 85.4 (RM-Distiller-Qwen2.5-3B-Instruct)
[Chart: Accuracy; Jan 20, 2026]
Evaluation Results
Method                            Sample Num  Date     Accuracy
RM-Distiller-Qwen2.5-3B-Instruct  10K         2026.01  85.4
Llama-3-OffsetBias-8B             70K         2026.01  83.2
Tulu-3-8B-RM-RB2                  350K        2026.01  83.2
Skywork-Reward-V2-8B              40,000K     2026.01  81.3
URM-LLaMA-3.1-8B                  100K        2026.01  76.7
Skywork-Reward-8B-v0.2            80K         2026.01  75.9
BT-Qwen2.5-3B-Instruct            10K         2026.01  72.2