Open-ended instruction following on BPO Eval (test)

59.5 Win Rate (A)

BPO + Vicuna-v1.3 13B

Updated 4mo ago

Evaluation Results

Method	Date	Win Rate (A)
BPO + Vicuna-v1.3 13B	2023.11	59.5	6	34.5	13.1
BPO + Llama-2-chat 70B	2023.11	53.5	11	35.5	16.8
BPO + Llama-2-chat 7B	2023.11	53	10.5	36.5	17.4
BPO + Llama-2-chat 13B	2023.11	53	12.5	34.5	18.1
BPO + Llama-2-chat 13B (Cross-size)	2023.11	51	7	42	11.9
BPO + Vicuna-v1.3 7B	2023.11	46	22	32	18.5
BPO + Llama-2-chat 7B (Cross-size)	2023.11	40	5	55	-7.1