Entity Counting on NL Counting (held-out)
[Figure: Accuracy chart; labeled point: Oracle, 100 (May 5, 2026). Updated 28d ago.]
Evaluation Results
| Method | Details | Date | Accuracy |
|---|---|---|---|
| Oracle | Model=Qwen3-8B, Parame... | 2026.05 | 100 |
| DPS (hard, multi-seed protocol) | Model=Qwen3-8B, Parame... | 2026.05 | 98.7 |
| Full lm_head (held-out) | Model=Qwen3-8B, Parame... | 2026.05 | 94.2 |
| 9-row lm_head (held-out) | Model=Qwen3-8B, Parame... | 2026.05 | 93.8 |
| 9-row lm_head (held-out) | Model=Mistral-7B, Para... | 2026.05 | 92 |
| LoRA Q/V rank-16 (fullvocab) | Model=Qwen3-8B, Parame... | 2026.05 | 91.7 |
| 9-row lm_head + DPS (matched train, 3 seeds) | Model=Qwen3-14B, Param... | 2026.05 | 90.3 |
| LoRA (all 36 layers, constrained-gen) | Model=Qwen3-8B, Parame... | 2026.05 | 84 |
| DPS layer sweep (10 layers, best) | Model=Qwen3-14B, Param... | 2026.05 | 28.7 |
| DPS (α ∈ [5, 500], best) | Model=Qwen3-14B, Param... | 2026.05 | 26.5 |
| Baseline (vanilla) | Model=Mistral-7B, Para... | 2026.05 | 26.2 |
| Baseline (vanilla) | Model=Qwen3-14B, Param... | 2026.05 | 24.5 |
| Shuffled-row control | Model=Qwen3-14B, Param... | 2026.05 | 12 |
| Baseline (vanilla) | Model=Qwen3-8B, Parame... | 2026.05 | 11.3 |