IFEval

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Instruction Following | IFEval | Accuracy (0-100): 94 | 292 |
| Instruction Following | IFEval (test) | IFEval Score: 81.17 | 45 |
| Reverse Chain-of-Thought Generation | IFEval | Accuracy: 85.1 | 20 |
| Alignment | IFEval strict prompt | pass@1: 90.2 | 16 |
| Constraint-following Instruction Evaluation | IFEval | Average Score: 54.4 | 16 |
| Chat | IFEval | Loose Prompt Metric: 48.8 | 15 |
| Instruction Following | IFEval standard (test) | Punctuation Score: 100 | 10 |
| Alignment | IFEval | IFEval Score: 86.3 | 10 |
| Instruction Following | IFEval | Pass@1 (Strict): 64.93 | 8 |
| Verifiable Instruction Following | IFEval (test) | Prompt Loose Accuracy: 75.23 | 7 |
| Instruction Following Evaluation | IFEval | IFEval Score: 69.9 | 6 |
| Personalized Interaction | IFEval Synthetic | Personalization Score: 6.58 | 6 |
| Thai output consistency | IFEval-TH | IFEval-TH Score: 99.4 | 6 |
| Other | IFEval | Score: 86.69 | 6 |
| Agent & Alignment | IFEval strict-prompt | Score: 83.73 | 5 |
| Instruction Following | IFEval v1 (test) | Accuracy: 72.1 | 4 |
| Instruction Following Evaluation | IFEval (dev) | Accuracy: 92 | 3 |
| Instruction Following | IFEval 1.0 (full) | Pass@1446 | 2 |
| Instruction Following | IFEval strict instance | Accuracy: 25.06 | 2 |
| Instruction Following | IFEval TH | Overall Score: 80.47 | 2 |
| Instruction Following | IFEval EN | Score: 87.64 | 2 |