IFBench

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
| --- | --- | --- | --- | --- |
| Instruction Following | IFBench | Accuracy | 77.8 | 72 |
| Instruction Following | IFBench | Pass@1 (Strict) | 82.9 | 72 |
| Instruction Following | IFBench | IFBench Score | 76 | 56 |
| General instruction following | IFBench | Accuracy | 84.1 | 30 |
| Instruction Following Evaluation | IFBench | Score | 73.2 | 23 |
| Instruction Following | IFBench | Prompt-level Accuracy | 34.69 | 21 |
| Instruction Following | IFBench | IFBench Score | 79.79 | 19 |
| Instruction Following | IFBench | IFBench Score | 75.4 | 19 |
| Instruction Following | IFBench | Accuracy | 36 | 18 |
| Reward Modeling | IFBench | Accuracy | 69.3 | 17 |
| Instruction Following | IFBench (test) | Score | 55.95 | 16 |
| Reward Modeling | IFBench Hard | Accuracy | 78 | 16 |
| Reward Modeling | IFBench Normal | Accuracy | 80.5 | 16 |
| Reward Modeling | IFBench Simple | Accuracy | 87.2 | 16 |
| Instruction Following | IFBench-I | Accuracy | 73.96 | 15 |
| Instruction Following | IFBench-P | Accuracy | 17.4 | 10 |
| Instruction Following | IFBench | IFBench Score | 23.88 | 9 |
| Instruction Following | IFBench | Pr. (S) | 57.3 | 8 |
| Instruction following | IFBench | LLM Throughput (tokens/s) | 2,688 | 8 |
| Instruction Following | IFBench | Exact Match (EM) | 65 | 7 |
| Alignment | IFBench | pass@1 | 41.7 | 7 |
| Reward Modeling | IFBench (test) | Accuracy | 57.9 | 7 |
| Instruction Following | IFBench | Pass@1 | 27 | 6 |
| Instruction Following | IFBench | Genuine Followup Rate | 9.7 | 6 |
| General Task (Agentic Coding) | IFBench | Score | 77.1 | 6 |

Showing 25 of 33 rows