Reinforcement Learning on Tomato

True Score: 6.28

ORPO

Updated 3mo ago

Evaluation Results

Method	Links
ORPO 2026.04		6.28	6.83	-1.51	0.0003	-1.51
Max-Min 2026.04		4.56	4.68	-1.37	0	-1.37
ORPO* 2026.04		4	3.98	-1.09	0	-1.09