Reinforcement Learning on Tomato

6.28 True Score

ORPO

[Chart: True Score, Apr 13, 2026]
Updated 4d ago

Evaluation Results

Method    Links
2026.04   6.28   6.83   -1.51   0.0003   -1.51
2026.04   4.56   4.68   -1.37   0        -1.37
2026.04   4      3.98   -1.09   0        -1.09