Human Evaluation on Human Evaluation Set (test)

0.65 Win Rate

LongDPO

[Chart: Win Rate over time; data point dated Feb 4, 2025]
Updated 1mo ago

Evaluation Results

Method   Links
2025.02   0.65    0.083   0.267
2025.02   0.617   0.167   0.216
2025.02   0.617   0.067   0.316