Open-ended Instruction Following on Self-instruct Eval
[Chart: Win Rate (A). Highest result: 53.6, BPO + Llama-2-chat 7B, Nov 7, 2023. Metrics shown: Win Rate (A), Tie Rate, Win Rate (B), AWR.]
Updated 4d ago
Evaluation Results
Method                               Configuration              Date     Win Rate (A)  Tie Rate  Win Rate (B)  AWR
-----------------------------------  -------------------------  -------  ------------  --------  ------------  ----
BPO + Llama-2-chat 7B                Base LLM=Llama-2-chat,...  2023.11  53.6          9.9       36.5          17.4
BPO + Llama-2-chat 13B               Base LLM=Llama-2-chat,...  2023.11  51.2          11.9      36.9          18.1
BPO + Llama-2-chat 13B (Cross-size)  Base LLM=Llama-2-chat,...  2023.11  48.4          4.8       46.8          11.9
BPO + Vicuna-v1.3 13B                Base LLM=Vicuna-v1.3,...   2023.11  46.4          13.9      39.7          13.1
BPO + Llama-2-chat 70B               Base LLM=Llama-2-chat,...  2023.11  46            13.1      40.9          16.8
BPO + Vicuna-v1.3 7B                 Base LLM=Vicuna-v1.3,...   2023.11  42            21.1      36.9          18.5
BPO + Llama-2-chat 7B (Cross-size)   Base LLM=Llama-2-chat,...  2023.11  40.1          5.1       54.8          -7.1
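
The three rates in each row above should account for every comparison, so they can be sanity-checked by summing to 100. The following is a minimal Python sketch of that check, with the rows transcribed by hand from the results above. The names `ROWS`, `rates_sum_to_100`, and `win_margin` are hypothetical helpers for illustration, and the win-minus-loss margin computed here is not the page's AWR column, whose definition the page does not give.

```python
# Rows transcribed from the Evaluation Results listing:
# (method, Win Rate (A), Tie Rate, Win Rate (B)), all in percent.
ROWS = [
    ("BPO + Llama-2-chat 7B", 53.6, 9.9, 36.5),
    ("BPO + Llama-2-chat 13B", 51.2, 11.9, 36.9),
    ("BPO + Llama-2-chat 13B (Cross-size)", 48.4, 4.8, 46.8),
    ("BPO + Vicuna-v1.3 13B", 46.4, 13.9, 39.7),
    ("BPO + Llama-2-chat 70B", 46.0, 13.1, 40.9),
    ("BPO + Vicuna-v1.3 7B", 42.0, 21.1, 36.9),
    ("BPO + Llama-2-chat 7B (Cross-size)", 40.1, 5.1, 54.8),
]


def rates_sum_to_100(row, tol=0.05):
    """Return True if win A + tie + win B is 100 within a rounding tolerance."""
    _, win_a, tie, win_b = row
    return abs(win_a + tie + win_b - 100.0) <= tol


def win_margin(row):
    """Illustrative margin: Win Rate (A) minus Win Rate (B). Not the AWR column."""
    _, win_a, _, win_b = row
    return round(win_a - win_b, 1)


if __name__ == "__main__":
    for row in ROWS:
        status = "ok" if rates_sum_to_100(row) else "MISMATCH"
        print(f"{row[0]:<37} sum check: {status:<9} margin: {win_margin(row):+.1f}")
```

A negative margin, as in the last row, means the (B) side won more comparisons than the (A) side.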