
IFEval

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Instruction Following | IFEval | IFEval Accuracy: 95 | 625 |
| Instruction Following | IFEval (test) | IFEval Score: 84.8 | 55 |
| Instruction Following | IFEval | Genuine-Followup Rate: 53.97 | 38 |
| Instruction Following | IFEval In-Domain | Precision (L): 0.871 | 23 |
| Instruction Following Evaluation | IFEval | IFEval Score: 86.69 | 20 |
| Reverse Chain-of-Thought Generation | IFEval | Accuracy: 85.1 | 20 |
| Instruction Following | IFEval loose | Avg@8: 78.1 | 18 |
| Alignment | IFEval strict prompt | pass@1: 90.2 | 16 |
| Instruction Following | IFEval | Pass@1 (Strict): 82.4 | 16 |
| Constraint-following Instruction Evaluation | IFEval | Average Score: 54.4 | 16 |
| Chat | IFEval | Loose Prompt Metric: 48.8 | 15 |
| Instruction Following | IFEval-PT | Instruction Score: 79.33 | 14 |
| Instruction Following | IFEval Spanish (ES) | Strict Accuracy: 82.92 | 13 |
| Instruction Following | IFEval Catalan (CA) | Strict Accuracy: 76.81 | 13 |
| Instruction Following | IFEval Galician (GL) | Strict Accuracy: 79 | 13 |
| Instruction Following | IFEval Basque (EU) | Strict Accuracy: 62.81 | 13 |
| Instruction Following | IFEval EN | Score: 87.64 | 12 |
| Instruction Following | IFEval English (test) | IFEval Accuracy: 93.16 | 10 |
| Instruction Following | IFEval et | Instruction Level Strict Accuracy: 61.4 | 10 |
| Instruction Following | IFEval standard (test) | Punctuation Score: 100 | 10 |
| Alignment | IFEval | IFEval Score: 86.3 | 10 |
| Verifiable Instruction Following | IFEval (test) | Prompt Loose Accuracy: 75.23 | 7 |
| Personalized Interaction | IFEval Synthetic | Personalization Score: 6.58 | 6 |
| Thai output consistency | IFEval-TH | IFEval-TH Score: 99.4 | 6 |
| Other | IFEval | Score: 86.69 | 6 |
Showing 25 of 37 rows