Entity counting on Entity counting Qwen3-8B prompts N=200x3 seeds (test)
[Chart: Greedy Generation Accuracy by date. Best result: 97, LoRA Q/V rank-16 (entity-counting only), May 5, 2026. Other selectable metrics: Digit-Restricted Next-Token Accuracy, Full-Vocab Next-Token Accuracy.]
Evaluation Results
| Method | Details | Date | Greedy Generation Accuracy | Digit-Restricted Next-Token Accuracy | Full-Vocab Next-Token Accuracy |
|---|---|---|---|---|---|
| LoRA Q/V rank-16 (entity-counting only) | Task scenario=entity-o... | 2026.05 | 97 | - | - |
| Probe-round (oracle UB) | Note=injects probe at... | 2026.05 | 96 | 98.7 | - |
| LoRA Q/V rank-16 | Rank=16, Trainable par... | 2026.05 | 83.1 | 96 | 91.7 |
| Hard DPS | Protocol=Diagnostic Pr... | 2026.05 | 72.4 | 98.7 | - |
| LoRA Q/V rank-16 (per-seed, multi-task) | Task scenario=multi-ta... | 2026.05 | 71.5 | - | - |
| Baseline (entity counting) | Intervention=no interv... | 2026.05 | 7.2 | 13.7 | 0 |
| 9-row lm_head repair | Parameters rewritten=3... | 2026.05 | 0 | 60.7 | 60.3 |
| Soft DPS | Protocol=Diagnostic Pr... | 2026.05 | - | 13.2 | - |
| Norm rescaling (digit ×3) | Scaling Factor=3 | 2026.05 | - | - | 26.5 |

(- = no result reported for that metric)
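The page does not define the two next-token metrics. Going only by their names, one plausible reading is that Full-Vocab Next-Token Accuracy takes the argmax over the whole vocabulary at the answer position, while Digit-Restricted Next-Token Accuracy takes the argmax over digit tokens only. That would explain how a method can score well on the restricted metric yet near zero on the full-vocab one (e.g. the baseline's 13.7 vs 0). The Python sketch below illustrates that assumed reading; the function names, toy logits, and digit token ids are hypothetical and not taken from the benchmark.

```python
from typing import Sequence


def full_vocab_accuracy(logits: Sequence[Sequence[float]], targets: Sequence[int]) -> float:
    """Percent of prompts whose argmax over the entire vocabulary equals the target token id."""
    hits = sum(
        1
        for row, target in zip(logits, targets)
        if max(range(len(row)), key=row.__getitem__) == target
    )
    return 100.0 * hits / len(targets)


def digit_restricted_accuracy(
    logits: Sequence[Sequence[float]],
    targets: Sequence[int],
    digit_ids: Sequence[int],
) -> float:
    """Percent of prompts whose argmax over only the (hypothetical) digit token ids equals the target."""
    hits = sum(
        1
        for row, target in zip(logits, targets)
        if max(digit_ids, key=row.__getitem__) == target
    )
    return 100.0 * hits / len(targets)


# Toy 5-token vocabulary; ids 2, 3, 4 stand in for digit tokens (assumption).
logits = [
    [5.0, 0.1, 1.0, 3.0, 0.2],  # a non-digit token (id 0) wins over the full vocab
    [0.0, 0.0, 4.0, 1.0, 0.5],  # the correct digit (id 2) wins either way
]
targets = [3, 2]
print(full_vocab_accuracy(logits, targets))                   # → 50.0
print(digit_restricted_accuracy(logits, targets, [2, 3, 4]))  # → 100.0
```

Under this reading, the gap between the two metrics measures how often a non-digit token outranks the correct digit at the answer position.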