Multi-Task evaluation on LongBench Chat

Best Point-wise Rate: 72.6 (DPO w/ LongReward)

Evaluation Results

Method	Date	Point-wise Rate
DPO w/ LongReward	2024.10	72.6
DPO w/ Contrast	2024.10	70.6
SFT	2024.10	69.8
DPO w/ LongReward	2024.10	69.2
officially post-trained	2024.10	68.6
DPO w/ Contrast	2024.10	68.2
DPO w/ SRM	2024.10	67.4
DPO w/ SRM	2024.10	66.6
SFT	2024.10	64.8
officially post-trained	2024.10	60.2