Human Preference Alignment Out-of-Domain (test)

35.3 HPS-v2.1

TAFS-GRPO

Updated 4mo ago

Evaluation Results

Method	Links	HPS-v2.1
TAFS-GRPO 2026.02		35.3	159.5	3.511
DanceGRPO 2026.02		33.3	121.2	3.484
MixGRPO 2026.02		32.4	121	3.472
Flow-GRPO 2026.02		30.4	103.5	3.46
Reward-Instruct 2026.02		28.6	97.3	3.392
RG-LCD 2026.02		28.3	92.9	3.336
Flux.1-dev 2026.02		28	84.8	3.328