Open-ended instruction following on Dolly Eval
[Chart: A Win Rate. Highlighted entry: BPO + Llama-2-chat 13B (Cross-size), A Win Rate 54, Nov 7, 2023. Available metrics: A Win Rate, Tie Rate, B Win Rate, AWR Score.]
Updated 4d ago
Evaluation Results
| Method                              | Details                | Date    | A Win Rate | Tie Rate | B Win Rate | AWR Score |
|-------------------------------------|------------------------|---------|------------|----------|------------|-----------|
| BPO + Llama-2-chat 13B (Cross-size) | Base LLM=Llama-2-chat,... | 2023.11 | 54         | 6.5      | 39.5       | 11.9      |
| BPO + Llama-2-chat 7B               | Base LLM=Llama-2-chat,... | 2023.11 | 52         | 9.5      | 38.5       | 17.4      |
| BPO + Vicuna-v1.3 13B               | Base LLM=Vicuna-v1.3,...  | 2023.11 | 52         | 8        | 40         | 13.1      |
| BPO + Llama-2-chat 70B              | Base LLM=Llama-2-chat,... | 2023.11 | 51         | 18       | 31         | 16.8      |
| BPO + Llama-2-chat 13B              | Base LLM=Llama-2-chat,... | 2023.11 | 50.5       | 13.5     | 36         | 18.1      |
| BPO + Llama-2-chat 7B (Cross-size)  | Base LLM=Llama-2-chat,... | 2023.11 | 49         | 2        | 49         | -7.1      |
| BPO + Vicuna-v1.3 7B                | Base LLM=Vicuna-v1.3,...  | 2023.11 | 47         | 22       | 31         | 18.5      |
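Each row reports one pairwise comparison, so the A Win Rate, Tie Rate, and B Win Rate values should sum to 100. The sketch below checks that for the rows listed here and computes a simple head-to-head margin (A Win Rate minus B Win Rate). It assumes the values are percentages; the names `ROWS`, `rates_sum_to_100`, and `margin` are hypothetical helpers, and the margin is not the AWR Score column, whose definition this page does not give.

```python
# Rows copied from the results on this page:
# (method, A win rate, tie rate, B win rate), all assumed to be percentages.
ROWS = [
    ("BPO + Llama-2-chat 13B (Cross-size)", 54.0, 6.5, 39.5),
    ("BPO + Llama-2-chat 7B", 52.0, 9.5, 38.5),
    ("BPO + Vicuna-v1.3 13B", 52.0, 8.0, 40.0),
    ("BPO + Llama-2-chat 70B", 51.0, 18.0, 31.0),
    ("BPO + Llama-2-chat 13B", 50.5, 13.5, 36.0),
    ("BPO + Llama-2-chat 7B (Cross-size)", 49.0, 2.0, 49.0),
    ("BPO + Vicuna-v1.3 7B", 47.0, 22.0, 31.0),
]


def rates_sum_to_100(a_win, tie, b_win, tol=1e-9):
    """Return True if the three rates account for every comparison."""
    return abs(a_win + tie + b_win - 100.0) <= tol


def margin(a_win, b_win):
    """Head-to-head margin in percentage points (hypothetical helper,
    not the page's AWR Score)."""
    return a_win - b_win


# Sanity-check every row, then collect the margin per method.
all_consistent = all(rates_sum_to_100(a, t, b) for _, a, t, b in ROWS)
margins = {name: margin(a, b) for name, a, _, b in ROWS}

if __name__ == "__main__":
    print("rows consistent:", all_consistent)
    for name, value in margins.items():
        print(f"{name}: {value:+.1f}")
```

A margin of zero, as for the 7B cross-size entry, means wins and losses are balanced even though the A Win Rate alone is close to 50.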