IHEval Prompt Extraction 1.0 (Reference)
Best Accuracy: 96.9 (Qwen3-4B-it-NSHA-DPO, Apr 10, 2026)
Updated 6d ago
Evaluation Results
Method                  Details                    Date     Accuracy
Qwen3-4B-it-NSHA-DPO    Backbone=Qwen3-4B, Met...  2026.04  96.9
Qwen3-4B-it-NSHA-HCAL   Backbone=Qwen3-4B, Met...  2026.04  96.9
Qwen3-4B-it             Backbone=Qwen3-4B, Met...  2026.04  96.2
Qwen3-4B-it-NS          Backbone=Qwen3-4B, Met...  2026.04  96.2
Llama3.1-8B-NSHA-DPO    Backbone=Llama3.1-8B,...   2026.04  94.7
Qwen3-4B-it-CoT         Backbone=Qwen3-4B, Met...  2026.04  83.3
Llama3.1-8B-CoT         Backbone=Llama3.1-8B,...   2026.04  81.1
Llama3.1-8B-NSHA-HCAL   Backbone=Llama3.1-8B,...   2026.04  73.6
Llama3.1-8B-NS          Backbone=Llama3.1-8B,...   2026.04  72.6
Llama3.1-8B             Backbone=Llama3.1-8B,...   2026.04  70.1
Qwen3-4B-it-NSHA-SFT    Backbone=Qwen3-4B, Met...  2026.04  69.2
Llama3.1-8B-NSHA-SFT    Backbone=Llama3.1-8B,...   2026.04  23.9