| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| Average (Vicuna, Self-instruct, Dolly, BPO) (test) | BPO-aligned gpt-3.5-turbo | Delta Win Rate (ΔWR) | 22 | 24 | 4d ago |
| IFEval | CrispEdit | IFEval Score | 69.9 | 6 | 4d ago |
| Ours hard seed data | | Score | 56.73 | 5 | 3d ago |
| SELF-INSTRUCT Ours | | Score | 74.29 | 5 | 3d ago |
| SELF-INSTRUCT | | Score | 69.48 | 5 | 3d ago |
| SELF-INSTRUCT seed data | | Score | 72.01 | 5 | 3d ago |
| Instruction Tuning with GPT-4 | Claude3 | Score | 71.29 | 5 | 3d ago |
| WizardLM | | Score | 72.06 | 5 | 3d ago |
| BPO Eval (test) | BPO | A Win Rate | 58.5 | 5 | 4d ago |
| Dolly Eval | BPO | A Win Rate | 62 | 5 | 4d ago |
| Self-instruct Eval | BPO | Win Rate (A) | 56.7 | 5 | 3d ago |
| Vicuna Eval | BPO | Win Rate (A) | 63.8 | 5 | 3d ago |
| AlpacaEval 2.0 (test) | DAR | LC% over π0 | 54.17 | 4 | 4d ago |
| IFEval (dev) | GPT-4 | Accuracy | 92 | 3 | 4d ago |