
Pairwise Preference Comparison on 150 prompt-response pairs

Win Rate: 63.3333

RL-trained policy

[Chart: Win Rate]
May 28, 2026
Updated 5d ago

Evaluation Results

Date      Win Rate   (unlabeled)   (unlabeled)
2026.05   63.3333    23.3333       13.3333
2026.05   62         23.3333       14.6667
2026.05   60         25.3333       14.6667
2026.05   58.6667    26.6667       14.6667
2026.05   56.6667    26.6667       16.6667
2026.05   54.6667    34            11.3333
2026.05   51.3333    38.6667       10
2026.05   50.6667    40.6667       8.6667
2026.05   45.3333    46.6667       8
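
The three values in each result row sum to 100, and on a 150-pair benchmark each percentage corresponds to a whole number of comparisons. The sketch below converts the top row back into counts. It assumes the values are percentages of the 150 pairs named in the title; the page does not label the second and third values, so they are treated here only as anonymous shares. The helper name pct_to_count is hypothetical.

```python
N_PAIRS = 150  # benchmark size, taken from the page title


def pct_to_count(pct, n=N_PAIRS):
    """Convert a percentage share into the nearest whole number of pairs."""
    return round(pct * n / 100)


# Top row of the results table: Win Rate first, then the two
# shares whose meaning the page does not state.
top_row = (63.3333, 23.3333, 13.3333)
counts = [pct_to_count(p) for p in top_row]

print(counts)       # [95, 35, 20]
print(sum(counts))  # 150
```

Read this way, a Win Rate of 63.3333 is 95 of the 150 comparisons.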