Open-ended Instruction Following on Self-instruct Eval
[Chart: Win Rate (A) over time on Self-instruct Eval. Current best: BPO + Llama-2-chat 7B at 53.6 (Nov 7, 2023). Metrics tracked: Win Rate (A), Tie Rate, Win Rate (B), AWR.]
Evaluation Results

| Method | Notes | Date | Win Rate (A) | Tie Rate | Win Rate (B) | AWR |
|---|---|---|---|---|---|---|
| BPO + Llama-2-chat 7B | Base LLM=Llama-2-chat,... | 2023.11 | 53.6 | 9.9 | 36.5 | 17.4 |
| BPO + Llama-2-chat 13B | Base LLM=Llama-2-chat,... | 2023.11 | 51.2 | 11.9 | 36.9 | 18.1 |
| BPO + Llama-2-chat 13B (Cross-size) | Base LLM=Llama-2-chat,... | 2023.11 | 48.4 | 4.8 | 46.8 | 11.9 |
| BPO + Vicuna-v1.3 13B | Base LLM=Vicuna-v1.3,... | 2023.11 | 46.4 | 13.9 | 39.7 | 13.1 |
| BPO + Llama-2-chat 70B | Base LLM=Llama-2-chat,... | 2023.11 | 46.0 | 13.1 | 40.9 | 16.8 |
| BPO + Vicuna-v1.3 7B | Base LLM=Vicuna-v1.3,... | 2023.11 | 42.0 | 21.1 | 36.9 | 18.5 |
| BPO + Llama-2-chat 7B (Cross-size) | Base LLM=Llama-2-chat,... | 2023.11 | 40.1 | 5.1 | 54.8 | -7.1 |
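For working with these results offline, the leaderboard rows can be encoded as plain tuples and re-ranked by any metric. A minimal sketch (values copied from the table above; the tuple layout and variable names are illustrative, not part of the benchmark):

```python
# Rows from the Self-instruct Eval leaderboard above.
# Tuple layout (assumed for this sketch):
# (method, win_rate_a, tie_rate, win_rate_b, awr)
results = [
    ("BPO + Llama-2-chat 7B", 53.6, 9.9, 36.5, 17.4),
    ("BPO + Llama-2-chat 13B", 51.2, 11.9, 36.9, 18.1),
    ("BPO + Llama-2-chat 13B (Cross-size)", 48.4, 4.8, 46.8, 11.9),
    ("BPO + Vicuna-v1.3 13B", 46.4, 13.9, 39.7, 13.1),
    ("BPO + Llama-2-chat 70B", 46.0, 13.1, 40.9, 16.8),
    ("BPO + Vicuna-v1.3 7B", 42.0, 21.1, 36.9, 18.5),
    ("BPO + Llama-2-chat 7B (Cross-size)", 40.1, 5.1, 54.8, -7.1),
]

# Rank by Win Rate (A), highest first, as the chart does.
ranked = sorted(results, key=lambda r: r[1], reverse=True)
for method, wa, tie, wb, awr in ranked:
    print(f"{method}: Win(A)={wa} Tie={tie} Win(B)={wb} AWR={awr}")
```

Swapping the `key` index re-sorts by another column, e.g. `key=lambda r: r[4]` ranks by AWR instead.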