Multi-token generation on GSM8K
Best result: IA2 → SFT, 68.8 accuracy
[Chart: accuracy of reported results by date; all results dated Sep 26, 2025]
Evaluation Results
Method       Setting                     Date      Accuracy (%)
IA2 → SFT    Backbone=Qwen3-4B-Base...   2025.09   68.8
ICL          Backbone=Qwen3-4B-Base...   2025.09   68.4
IA2 only     Backbone=Qwen3-4B-Base...   2025.09   66.2
SFT only     Backbone=Qwen3-4B-Base...   2025.09   64.5
w/o ICL      Backbone=Qwen3-4B-Base...   2025.09   45.4