Open-ended instruction following on BPO Eval (test)
Chart: Win Rate (A) over time (selectable metrics: Win Rate (A), Tie Rate, Win Rate (B), AWR). Top result: BPO + Vicuna-v1.3 13B, 59.5 (Nov 7, 2023).
Evaluation Results
| Method | Configuration | Date | Win Rate (A) (%) | Tie Rate (%) | Win Rate (B) (%) | AWR |
|---|---|---|---|---|---|---|
| BPO + Vicuna-v1.3 13B | Base LLM=Vicuna-v1.3,... | 2023.11 | 59.5 | 6 | 34.5 | 13.1 |
| BPO + Llama-2-chat 70B | Base LLM=Llama-2-chat,... | 2023.11 | 53.5 | 11 | 35.5 | 16.8 |
| BPO + Llama-2-chat 7B | Base LLM=Llama-2-chat,... | 2023.11 | 53 | 10.5 | 36.5 | 17.4 |
| BPO + Llama-2-chat 13B | Base LLM=Llama-2-chat,... | 2023.11 | 53 | 12.5 | 34.5 | 18.1 |
| BPO + Llama-2-chat 13B (Cross-size) | Base LLM=Llama-2-chat,... | 2023.11 | 51 | 7 | 42 | 11.9 |
| BPO + Vicuna-v1.3 7B | Base LLM=Vicuna-v1.3,... | 2023.11 | 46 | 22 | 32 | 18.5 |
| BPO + Llama-2-chat 7B (Cross-size) | Base LLM=Llama-2-chat,... | 2023.11 | 40 | 5 | 55 | -7.1 |
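In every row, Win Rate (A), Tie Rate, and Win Rate (B) sum to exactly 100. This is consistent with the three columns being percentage shares of a single set of pairwise judgments. The Python sketch below encodes the table, checks that property, and ranks rows by a simple margin (Win Rate (A) minus Win Rate (B)). It rests on several assumptions that are not the benchmark's official scoring: the `shares` helper, its "A"/"tie"/"B" outcome labels, and the margin are all hypothetical illustrations. AWR is carried exactly as reported and is not recomputed.

```python
from collections import Counter
from typing import List, NamedTuple, Tuple


class Row(NamedTuple):
    method: str
    win_a: float  # Win Rate (A), percent
    tie: float    # Tie Rate, percent
    win_b: float  # Win Rate (B), percent
    awr: float    # AWR as reported on the leaderboard; not recomputed here


# Values copied from the Evaluation Results table.
ROWS = [
    Row("BPO + Vicuna-v1.3 13B", 59.5, 6, 34.5, 13.1),
    Row("BPO + Llama-2-chat 70B", 53.5, 11, 35.5, 16.8),
    Row("BPO + Llama-2-chat 7B", 53, 10.5, 36.5, 17.4),
    Row("BPO + Llama-2-chat 13B", 53, 12.5, 34.5, 18.1),
    Row("BPO + Llama-2-chat 13B (Cross-size)", 51, 7, 42, 11.9),
    Row("BPO + Vicuna-v1.3 7B", 46, 22, 32, 18.5),
    Row("BPO + Llama-2-chat 7B (Cross-size)", 40, 5, 55, -7.1),
]


def shares(outcomes: List[str]) -> Tuple[float, float, float]:
    """Hypothetical helper: turn per-prompt judgments labelled "A", "tie",
    or "B" into (Win Rate (A), Tie Rate, Win Rate (B)) percentages."""
    counts = Counter(outcomes)
    n = len(outcomes)
    return (
        100.0 * counts["A"] / n,
        100.0 * counts["tie"] / n,
        100.0 * counts["B"] / n,
    )


def outcome_shares_sum_to_100(row: Row, tol: float = 1e-9) -> bool:
    """Check that the three outcome columns partition all comparisons."""
    return abs(row.win_a + row.tie + row.win_b - 100.0) <= tol


def margin(row: Row) -> float:
    """Illustrative margin: Win Rate (A) minus Win Rate (B), in points."""
    return row.win_a - row.win_b


if __name__ == "__main__":
    assert all(outcome_shares_sum_to_100(r) for r in ROWS)
    for r in sorted(ROWS, key=margin, reverse=True):
        print(f"{r.method:38s} margin={margin(r):+6.1f}  AWR={r.awr:+5.1f}")
```

Ranking by this margin places BPO + Llama-2-chat 13B (18.5 points) slightly ahead of BPO + Llama-2-chat 70B (18.0 points). That reverses their order by Win Rate (A) alone, because the 13B row has a lower Win Rate (B).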