Instruction Following on IF-Eval
[Chart: Accuracy over time, Sep 29, 2025 to Feb 17, 2026. Top result: 63.7 (Base).]
Updated 1mo ago
Evaluation Results
Method              Configuration                   Date     Accuracy
Base                backbone=DS-8B                  2025.09  63.7
IPO                 backbone=DS-8B                  2025.09  56.2
RealSafe            backbone=DS-8B, alignm...       2025.09  54.7
Base Model          Backbone=Meta-Llama-3-...       2026.02  40.48
Numerical           Backbone=Meta-Llama-3-...       2026.02  39.93
Random              Backbone=Meta-Llama-3-...       2026.02  39.74
WIM Fixed Judge     Backbone=Meta-Llama-3-...       2026.02  39.56
WIM Changing Judge  Backbone=Meta-Llama-3-...       2026.02  39.19