| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Instruction Following | AlpacaFarm (test) | Reward Score | 387.196 | 40 |
| Direct Prompt Injection | AlpacaFarm (208 samples) | Naive Success Rate | 78.36 | 30 |
| Instruction Following | AlpacaFarm Eval (test) | Win Rate | 76.13 | 28 |
| Instruction Following | AlpacaFarm | Win Rate | 59.2 | 15 |
| Generation quality evaluation | AlpacaFarm | Win Rate | 36.4 | 12 |