Personalized LLM Alignment Evaluation on PersonalRewardBench (test)

3.354 Mean Score

Llama3.1-8B-Instruct-GRPO

Updated 5mo ago

Evaluation Results

Method	Date	Mean Score	(unlabeled)	Links
Llama3.1-8B-Instruct-GRPO	2026.02	3.354	0.0102	-
Llama3.1-8B-Instruct-DPO	2026.02	3.316	0.0068	-
Qwen2.5-72B-Instruct	2026.02	3.214	0.0089	-
Llama3.1-70B-Instruct	2026.02	3.156	0.0093	-
Qwen2.5-7B-Instruct	2026.02	2.97	0.0089	-
Llama3.1-8B-Instruct	2026.02	2.954	0.0074	-