Instruction Following Evaluation on IFEval Inverse
[Chart: Accuracy over time. Best result: Qwen3-30B, 83.7 accuracy, Oct 10, 2025.]
Updated 22d ago
Evaluation Results
Method             Training Variant  Date     Accuracy
Qwen3-30B          M-DPOP            2025.10  83.7
Qwen3-4B           M-DPOP            2025.10  76.1
GLM-4-9B           M-DPOP            2025.10  74.2
GPT-5.2            Zero-...          2025.10  73.7
Gemini-3-Flash     Zero-...          2025.10  70.6
Gemma-3-4B         M-DPOP            2025.10  69.8
Claude-4.5-Sonnet  Zero-...          2025.10  67.2
Tulu-3.1-8B        M-DPOP            2025.10  65.3
Qwen3-30B          GRPO              2025.10  56.8
Qwen3-4B           GRPO              2025.10  51.2
Qwen3-30B          Base              2025.10  49.2
GLM-4-9B           GRPO              2025.10  48.5
Qwen3-4B           Base              2025.10  44.7
Gemma-3-4B         GRPO              2025.10  43.7
GLM-4-9B           Base              2025.10  42.1
Tulu-3.1-8B        GRPO              2025.10  40.2
Gemma-3-4B         Base              2025.10  38.5
Tulu-3.1-8B        Base              2025.10  32.8