Reasoning on AIME 32K generation 25
State of the art: OSCAR (Accuracy 74)
[Figure: results over time, with a data point dated May 18, 2026; metrics shown: Accuracy, BPE]
Evaluation Results
Method          Configuration              Date     Accuracy  BPE
OSCAR           Backbone=Qwen3-32B, Ge...  2026.05  74        2.38
Original BF16   Backbone=Qwen3-32B, Ge...  2026.05  72.59     16
Kitty           Backbone=Qwen3-32B, Ge...  2026.05  69.26     2.39
OSCAR           Backbone=Qwen3-8B, Gen...  2026.05  66.67     2.38
Original BF16   Backbone=Qwen3-8B, Gen...  2026.05  66        16
Kitty           Backbone=Qwen3-8B, Gen...  2026.05  59.67     2.39
KIVI-KV2*       Backbone=Qwen3-32B, Ge...  2026.05  59.05     2.26
KIVI-KV2*       Backbone=Qwen3-8B, Gen...  2026.05  57.67     2.26
KIVI-KV2        Backbone=Qwen3-32B, Ge...  2026.05  57.41     2.25
KIVI-KV2        Backbone=Qwen3-8B, Gen...  2026.05  52.33     2.25
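The results table reports accuracy and BPE as separate columns. To read each method as a trade-off against the uncompressed baseline on the same backbone, the rows can be compared directly. The following is a minimal Python sketch that assumes BPE means average bits per element (so the Original BF16 rows sit at 16 and 16 / BPE acts as a compression ratio). `ROWS` and `compare_to_bf16` are hypothetical names; the numbers are copied from the table.

```python
# Values transcribed from the Evaluation Results table: (method, backbone, accuracy, BPE).
# Assumption: BPE is average bits per element, so the BF16 baseline corresponds to 16.
ROWS = [
    ("OSCAR", "Qwen3-32B", 74.0, 2.38),
    ("Original BF16", "Qwen3-32B", 72.59, 16.0),
    ("Kitty", "Qwen3-32B", 69.26, 2.39),
    ("KIVI-KV2*", "Qwen3-32B", 59.05, 2.26),
    ("KIVI-KV2", "Qwen3-32B", 57.41, 2.25),
    ("OSCAR", "Qwen3-8B", 66.67, 2.38),
    ("Original BF16", "Qwen3-8B", 66.0, 16.0),
    ("Kitty", "Qwen3-8B", 59.67, 2.39),
    ("KIVI-KV2*", "Qwen3-8B", 57.67, 2.26),
    ("KIVI-KV2", "Qwen3-8B", 52.33, 2.25),
]


def compare_to_bf16(rows):
    """Hypothetical helper: map (method, backbone) to a pair of
    (accuracy change vs. the Original BF16 row on the same backbone, in points;
     compression ratio = baseline BPE / row BPE), both rounded to 2 decimals."""
    # One baseline per backbone, taken from its "Original BF16" row.
    baseline = {
        backbone: (acc, bpe)
        for method, backbone, acc, bpe in rows
        if method == "Original BF16"
    }
    result = {}
    for method, backbone, acc, bpe in rows:
        base_acc, base_bpe = baseline[backbone]
        result[(method, backbone)] = (
            round(acc - base_acc, 2),
            round(base_bpe / bpe, 2),
        )
    return result


if __name__ == "__main__":
    # Print one line per row: accuracy delta in points and compression ratio.
    for (method, backbone), (delta, ratio) in compare_to_bf16(ROWS).items():
        print(f"{method:<14} {backbone:<10} {delta:+7.2f} pts  {ratio:5.2f}x")
```

Under the BPE assumption, both OSCAR rows land slightly above their BF16 baselines (+1.41 points on Qwen3-32B, +0.67 on Qwen3-8B) at roughly 6.7x fewer bits, while the KIVI-KV2 variants compress slightly more but lose between about 8 and 15 points.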