
IFBench

Benchmarks

| Task | Dataset | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Instruction Following | IFBench | Pass@1 (Strict) | 82.9 | 72 |
| Instruction Following | IFBench | Accuracy | 76.5 | 33 |
| Instruction Following | IFBench | IFBench Score | 43.28 | 27 |
| Instruction Following | IFBench | Prompt-level Accuracy | 34.69 | 21 |
| Instruction Following | IFBench | IFBench Score | 75.4 | 19 |
| Reward Modeling | IFBench | Accuracy | 69.3 | 17 |
| Reward Modeling | IFBench Hard | Accuracy | 78 | 16 |
| Reward Modeling | IFBench Normal | Accuracy | 80.5 | 16 |
| Reward Modeling | IFBench Simple | Accuracy | 87.2 | 16 |
| Instruction Following | IFBench | IFBench Score | 39.46 | 12 |
| Instruction Following | IFBench | Exact Match (EM) | 65 | 7 |
| Alignment | IFBench | pass@1 | 41.7 | 7 |
| Reward Modeling | IFBench (test) | Accuracy | 57.9 | 7 |
| Instruction Following | IFBench | Pass@1 | 27 | 6 |
| Instruction Following | IFBench | Genuine Followup Rate | 9.7 | 6 |
| General Task (Agentic Coding) | IFBench | Score | 77.1 | 6 |
| Instruction Following | IFBench (test) | Score | 38.61 | 5 |
| Instruction Following | IFBench Strict | Avg@10 | 31.5 | 2 |
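The table mixes several metric names (prompt-level accuracy, Pass@1, Avg@10). To make the difference between them concrete, here is a minimal sketch. It assumes an IFEval-style setup in which each prompt carries one or more verifiable instructions, each already checked strictly as pass/fail, and it assumes Avg@k means the mean score over k independent sampling runs. The input format and the function names are hypothetical; this is not IFBench's official scoring code, and the "loose" checking variant is not modeled.

```python
from statistics import mean

# ASSUMPTION: each prompt carries one or more verifiable instructions, and each
# instruction has already been checked strictly, yielding True (followed) or False.
# The input format and function names are hypothetical, not IFBench's official scorer.


def prompt_level_accuracy(results: list[list[bool]]) -> float:
    """Fraction of prompts for which every attached instruction was followed."""
    return sum(all(checks) for checks in results) / len(results)


def instruction_level_accuracy(results: list[list[bool]]) -> float:
    """Fraction of individual instructions followed, pooled across all prompts."""
    flat = [ok for checks in results for ok in checks]
    return sum(flat) / len(flat)


def avg_at_k(run_scores: list[float]) -> float:
    """Avg@k (assumed meaning): mean score over k independent sampling runs."""
    return mean(run_scores)


if __name__ == "__main__":
    # Three toy prompts with 2, 2 and 1 instructions respectively.
    results = [[True, True], [True, False], [True]]
    # Scores are scaled by 100 to match the percentage-style values in the table.
    print(round(100 * prompt_level_accuracy(results), 2))       # 66.67
    print(round(100 * instruction_level_accuracy(results), 2))  # 80.0
    print(round(100 * avg_at_k([0.2, 0.3, 0.4]), 2))            # 30.0
```

In the toy data, the single missed instruction fails the second prompt outright, so the prompt-level score (66.67) sits below the instruction-level score (80.0), which counts that miss as only one of five checks.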