Open-ended instruction following on Vicuna Eval v1.3 (test)
[Chart: A Win Rate by date. Top entry: BPO + Vicuna-v1.3 7B, A Win Rate 65 (Nov 7, 2023). Selectable metrics: A Win Rate, Tie Rate, B Win Rate, AWR.]
Evaluation Results
| Method | Method details | Date | A Win Rate | Tie Rate | B Win Rate | AWR |
|---|---|---|---|---|---|---|
| BPO + Vicuna-v1.3 7B | Base LLM=Vicuna-v1.3,... | 2023.11 | 65 | 8.7 | 26.3 | 18.5 |
| BPO + Llama-2-chat 13B | Base LLM=Llama-2-chat,... | 2023.11 | 61.3 | 2.5 | 36.2 | 18.1 |
| BPO + Llama-2-chat 13B (Cross-size) | Base LLM=Llama-2-chat,... | 2023.11 | 61.3 | 0 | 38.7 | 11.9 |
| BPO + Llama-2-chat 7B | Base LLM=Llama-2-chat,... | 2023.11 | 60 | 2.5 | 37.5 | 17.4 |
| BPO + Llama-2-chat 70B | Base LLM=Llama-2-chat,... | 2023.11 | 59.3 | 5.5 | 35.2 | 16.8 |
| BPO + Vicuna-v1.3 13B | Base LLM=Vicuna-v1.3,... | 2023.11 | 52.5 | 3.7 | 43.8 | 13.1 |
| BPO + Llama-2-chat 7B (Cross-size) | Base LLM=Llama-2-chat,... | 2023.11 | 48.8 | 3.7 | 47.5 | -7.1 |