Instruction Following Evaluation on Dolly Eval
[Chart: A Win Rate and B Win Rate by date. Leading entry: BPO, A Win Rate 62, Nov 7, 2023.]
Evaluation Results
| Method | Method details | Date | A Win Rate | B Win Rate |
| BPO | Base LLM=gpt-4, Method... | 2023.11 | 62 | 38 |
| BPO | Base LLM=text-bison, M... | 2023.11 | 60.5 | 39.5 |
| BPO | Base LLM=gpt-3.5-turbo... | 2023.11 | 60 | 40 |
| BPO | Base LLM=claude-instan... | 2023.11 | 51.5 | 48.5 |
| BPO | Base LLM=claude-2, Met... | 2023.11 | 50.5 | 49.5 |