Reward Modeling on Arabic preference (test)

Accuracy: 85.4

RM-Distiller-Qwen2.5-3B-Instruct

Updated 4mo ago

Evaluation Results

Method	Accuracy
RM-Distiller-Qwen2.5-3B-Instruct 2026.01	85.4
Llama-3-OffsetBias-8B 2026.01	83.2
Tulu-3-8B-RM-RB2 2026.01	83.2
Skywork-Reward-V2-8B 2026.01	81.3
URM-LLaMA-3.1-8B 2026.01	76.7
Skywork-Reward-8B-v0.2 2026.01	75.9
BT-Qwen2.5-3B-Instruct 2026.01	72.2