Open-ended instruction following on Vicuna Eval v1.3 (test)

A Win Rate: 65

BPO + Vicuna-v1.3 7B

Updated 5mo ago

Evaluation Results

Method	Links	A Win Rate	Tie Rate	B Win Rate
BPO + Vicuna-v1.3 7B 2023.11		65	8.7	26.3	18.5
BPO + Llama-2-chat 13B 2023.11		61.3	2.5	36.2	18.1
BPO + Llama-2-chat 13B (Cross-size) 2023.11		61.3	0	38.7	11.9
BPO + Llama-2-chat 7B 2023.11		60	2.5	37.5	17.4
BPO + Llama-2-chat 70B 2023.11		59.3	5.5	35.2	16.8
BPO + Vicuna-v1.3 13B 2023.11		52.5	3.7	43.8	13.1
BPO + Llama-2-chat 7B (Cross-size) 2023.11		48.8	3.7	47.5	-7.1