Instruction Following on IF
Top result: Chat-Math-IF, IF Score 39.2 (Feb 2, 2026)
[Chart: IF Score and Average Score]
Evaluation Results
| Method | Configuration | Date | IF Score | Average Score |
|---|---|---|---|---|
| Chat-Math-IF | Backbone=Qwen-2.5-7B,... | 2026.02 | 39.2 | 40.3 |
| Modular Gradient Surgery | Backbone=Qwen-2.5-7B,... | 2026.02 | 33.7 | 45 |
| Normal Mixing | Backbone=Qwen-2.5-7B,... | 2026.02 | 33.2 | 44.6 |
| Global Surgery | Backbone=Llama-3.1-8B,... | 2026.02 | 31.1 | 30.3 |
| Modular Gradient Surgery | Backbone=Llama-3.1-8B,... | 2026.02 | 30 | 32.6 |
| Global Surgery | Backbone=Qwen-2.5-7B,... | 2026.02 | 29.9 | 43 |
| Math-Chat-IF | Backbone=Qwen-2.5-7B,... | 2026.02 | 26.2 | 20.2 |
| Normal Mixing | Backbone=Llama-3.1-8B,... | 2026.02 | 25.4 | 27.3 |