
IF-Eval

Benchmarks

Task Name              Dataset Name       SOTA Result                   Trend
Instruction Following  IF-Eval 0-shot     Score: 81.13                  55
Instruction Following  IF-Eval            Prompt Strict Accuracy: 86.5  13
Instruction Following  IF-Eval            Accuracy: 63.7                8
Instruction Following  IF-Eval fr         Average Score: 15.04          6
Instruction Following  IF Eval v1 (test)  Accuracy: 60.8                4