SOTA Instruction Following Evaluation benchmarks and papers with code | Wizwand
Instruction Following Evaluation
Benchmarks
| Dataset Name | SOTA Method | Metric | Value | Results | Last Updated |
|---|---|---|---|---|---|
| IFEval | DS-V3.1-Terminus (no_think) | IFEval Score | 86.69 | 32 | 21d ago |
| PPE-IFEval | Rubric-ARROW-voting@5 | Score | 76 | 24 | 5d ago |
| Average (Vicuna, Self-instruct, Dolly, BPO) (test) | BPO-aligned gpt-3.5-turbo | Delta Win Rate (ΔWR) | 22 | 24 | 3mo ago |
| IFBench | Rubric-ARROW-voting@5 | Score | 73.2 | 23 | 5d ago |
| InfoBench | RM-R1-32B (Qwen-2.5-Inst) | Score | 86.1 | 23 | 5d ago |
| IFEval Inverse | Qwen3-30B | Accuracy | 83.7 | 18 | 22d ago |
| Vicuna Out-of-Distribution | SODA | GPT-4o Score | 51.9 | 17 | 1mo ago |
| SelfInst Out-of-Distribution | SODA | GPT-4o Score | 51.6 | 17 | 1mo ago |
| Dolly Out-of-Distribution | SODA | GPT-4o Score | 49.9 | 17 | 1mo ago |
| LMSYS In-Dist. | SODA | GPT-4o Score | 51.8 | 17 | 1mo ago |
| AlpacaEval 2 | VRM-PPO | Win Rate | 48.14 | 16 | 2mo ago |
| ArenaHard v1 | +RL (Skywork-Reward-V2-Llama-3.1-8B) | ArenaHardv1 Score | 38 | 14 | 3mo ago |
| AlpacaEval 2.0 (test) | DAR | LC% over π0 | 54.17 | 10 | 3mo ago |
| BelleEval | C-DPO | Score | 87 | 6 | 1mo ago |
| Ours hard seed data | GPT-4 Turbo | Score | 56.73 | 5 | 3mo ago |
| SELF-INSTRUCT Ours | GPT-4 Turbo | Score | 74.29 | 5 | 3mo ago |
| SELF-INSTRUCT | GPT-4 Turbo | Score | 69.48 | 5 | 3mo ago |
| SELF-INSTRUCT seed data | GPT-4 Turbo | Score | 72.01 | 5 | 3mo ago |
| Instruction Tuning with GPT-4 | Claude3 | Score | 71.29 | 5 | 3mo ago |
| WizardLM | GPT-4 Turbo | Score | 72.06 | 5 | 3mo ago |
| BPO Eval (test) | BPO | A Win Rate | 58.5 | 5 | 3mo ago |
| Dolly Eval | BPO | A Win Rate | 62 | 5 | 3mo ago |
| Self-instruct Eval | BPO | Win Rate (A) | 56.7 | 5 | 3mo ago |
| Vicuna Eval | BPO | Win Rate (A) | 63.8 | 5 | 3mo ago |
| IFEval (random subset of 50 prompts) | DIRECTER | Task Fidelity | 85.9 | 3 | 2mo ago |
Showing 25 of 26 rows