Instruction Following on IFBench (test)
[Chart: Score over time, Jul 25, 2025 to May 11, 2026. Best score: 55.95 (GEPA+Merge).]
Updated 21d ago
Evaluation Results
| Method | Setting | Date (YYYY.MM) | Score |
|---|---|---|---|
| GEPA+Merge | Model=GPT-4.1 mini | 2026.04 | 55.95 |
| GEPA | Model=GPT-4.1 mini | 2026.04 | 52.72 |
| FLOWBOT | Model=GPT-4.1 mini | 2026.04 | 52.51 |
| MetaHarness | Wall (minutes)=126 | 2026.05 | 52.3 |
| CRO | Wall (minutes)=82 | 2026.05 | 51.3 |
| Trace | Model=GPT-4.1 mini | 2026.04 | 51.19 |
| GEPA | Wall (minutes)=50 | 2026.05 | 50.1 |
| MIPROv2 | Model=GPT-4.1 mini | 2026.04 | 49.15 |
| TextGrad | Model=GPT-4.1 mini | 2026.04 | 48.64 |
| Baseline | Model=GPT-4.1 mini | 2026.04 | 47.79 |
| Baseline | | 2026.05 | 42.4 |
| GEPA | Backbone=Qwen3 8B, Opt... | 2025.07 | 38.61 |
| Baseline | Backbone=Qwen3 8B | 2025.07 | 36.9 |
| MIPROv2 | Backbone=Qwen3 8B | 2025.07 | 36.22 |
| GRPO | Backbone=Qwen3 8B, Opt... | 2025.07 | 35.88 |
| GEPA+Merge | Backbone=Qwen3 8B, Opt... | 2025.07 | 28.23 |