
Instruction Following on MT-Bench GPT-4o judge (test)

7.76 MT-Bench Score

ScaleBiO

Updated 5mo ago

Evaluation Results

Method	Links	MT-Bench Score
ScaleBiO 2024.06		7.76
ScaleBiO 2024.06		7.51
RHO-LOSS 2024.06		7.38
RHO-LOSS 2024.06		7.34
LESS 2024.06		7.2
LESS 2024.06		7.18
ScaleBiO 2024.06		7.12
RHO-LOSS 2024.06		6.89
Uniform Weighting 2024.06		6.66
Uniform Weighting 2024.06		6.11
LESS 2024.06		6.06
Uniform Weighting 2024.06		5.31