Instruction Following on IFEval (pass@1 Strict)
[Chart: Pass@1 (Strict) over time, Feb 12, 2026 to Apr 1, 2026. Best result: EXPERT, 82.4]
Updated 16d ago
Evaluation Results
| Method | Configuration | Date | Pass@1 (Strict) |
|---|---|---|---|
| EXPERT | Base Model=OLMo-3-7B,... | 2026.04 | 82.4 |
| w/o Merging | Models=Math | 2026.02 | 64.93 |
| REGMEAN | Base Model=OLMo-3-7B,... | 2026.04 | 62.8 |
| ACTMat | Base Model=OLMo-3-7B,... | 2026.04 | 47 |
| w/o Merging | Models=Code | 2026.02 | 43.05 |
| SCF-RKL | Models=Fuse | 2026.02 | 42.12 |
| Task Arithmetic | Models=Fuse | 2026.02 | 38.08 |
| Dare Task Arithmetic | Models=Fuse | 2026.02 | 37.5 |
| Dare Ties Merging | Models=Fuse | 2026.02 | 37.25 |
| Ties Merging | Models=Fuse | 2026.02 | 34.7 |
| SCE | Models=Fuse | 2026.02 | 34.47 |
| TSV | Base Model=OLMo-3-7B,... | 2026.04 | 34 |
| TA | Base Model=OLMo-3-7B,... | 2026.04 | 32.2 |
| AVERAGE | Base Model=OLMo-3-7B,... | 2026.04 | 31.2 |
| ZERO-SHOT | Base Model=OLMo-3-7B,... | 2026.04 | 30.7 |
| ISO-C | Base Model=OLMo-3-7B,... | 2026.04 | 28.8 |