Get Webpage on IHEval v1 (Conflict)
[Chart: Accuracy over time. Top result: Qwen3-4B-it-CoT, 39.8 Accuracy (Apr 10, 2026). Updated 6d ago.]
Evaluation Results
| Method | Configuration | Date | Accuracy |
| --- | --- | --- | --- |
| Qwen3-4B-it-CoT | Backbone=Qwen3-4B-it,... | 2026.04 | 39.8 |
| Qwen3-4B-it-NSHA-DPO | Backbone=Qwen3-4B-it,... | 2026.04 | 36.9 |
| Qwen3-4B-it-NSHA-HCAL | Backbone=Qwen3-4B-it,... | 2026.04 | 36.3 |
| Qwen3-4B-it-NS | Backbone=Qwen3-4B-it,... | 2026.04 | 36.2 |
| Qwen3-4B-it-NSHA-SFT | Backbone=Qwen3-4B-it,... | 2026.04 | 31.6 |
| Llama3.1-8B-NS | Backbone=Llama3.1-8B,... | 2026.04 | 12.7 |
| Llama3.1-8B-NSHA-HCAL | Backbone=Llama3.1-8B,... | 2026.04 | 11.5 |
| Llama3.1-8B-CoT | Backbone=Llama3.1-8B,... | 2026.04 | 9.2 |
| Llama3.1-8B | Backbone=Llama3.1-8B | 2026.04 | 7.8 |
| Llama3.1-8B-NSHA-DPO | Backbone=Llama3.1-8B,... | 2026.04 | 5 |
| Llama3.1-8B-NSHA-SFT | Backbone=Llama3.1-8B,... | 2026.04 | 4.1 |
| Qwen3-4B-it | Backbone=Qwen3-4B-it | 2026.04 | 1.9 |