
IFEval

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Instruction Following | IFEval | IFEval Accuracy | 95 | 836 |
| Instruction Following | IFEval | Accuracy (IFEval) | 90.39 | 89 |
| Instruction Following | IFEval (test) | IFEval Score | 84.8 | 88 |
| Instruction Following | IFEval | IFEval Score | 94.64 | 87 |
| Instruction Following | IFEval | Genuine-Followup Rate | 87.5 | 65 |
| Instruction Following | IFEval | Avg. Score (IFEval) | 88.92 | 45 |
| Instruction Following | IFEval | Win Rate | 81.2 | 36 |
| Dialogue | IFEval | IFEval | 80.2 | 34 |
| Instruction Following | IFEval Inverse | Accuracy | 83.7 | 33 |
| Instruction Following Evaluation | IFEval | IFEval Score | 86.69 | 32 |
| Instruction Following | IFEval | IF-RT Score | 76.26 | 30 |
| Instruction Following | IFEval v1 (test) | Accuracy | 79.85 | 28 |
| Instruction Following | IFEval | Avg@k | 64.23 | 27 |
| Alignment | IFEval strict prompt | pass@1 | 90.2 | 26 |
| Instruction Following | IFEval | IFEval Score | 86.5 | 25 |
| Instruction Following Evaluation | PPE-IFEval | Score | 76 | 24 |
| Text generation | IFEval | Accuracy | 74.49 | 23 |
| Instruction Following | IFEval In-Domain | Precision (L) | 0.871 | 23 |
| Instruction Following | IFEval | Average Score | 47.67 | 21 |
| Instruction Following | IFEval | Strict Accuracy | 90 | 21 |
| Instruction Following | IFEval | IFScore | 84.63 | 21 |
| Instruction Following | IFEval | Improvement Score | 16.57 | 20 |
| Reverse Chain-of-Thought Generation | IFEval | Accuracy | 85.1 | 20 |
| Instruction-following | IFEval | IFEval Score | 96.3 | 18 |
| Instruction Following | IFEval | Score (%) | 62.48 | 18 |
Showing 25 of 75 rows