Long-context reasoning on InfiniteBench (test)

87.63 Reasoning Pa Score

Qwen3-4B-SFT w/ DAPO

[Chart: Reasoning Pa Score over time; y-axis ticks 85.6852, 86.1901, 86.695, 87.1999; Feb 5, 2026]
Updated 4d ago

Evaluation Results

Method  Links

2026.02  87.63  84.75  17.78  14.83  10.39  54.59  44.99
2026.02  87.44  89.76  15.72  27.4   22.31  58.45  50.18
2026.02  86.93  87.05  16.07  21.85  17.39  56.29  47.6
2026.02  86.78  83.44  12.76  12.68   9.65  53.28  43.1
2026.02  85.93  83.56  11.86  27.45  23.54  56.33  48.11
2026.02  85.76  86.44  14.18  25.09  14.47  54.15  46.68
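
The page does not label the seven score columns. One reading that fits the numbers is that the last value in each entry is the mean of the six values before it. The sketch below checks that reading. It is an assumption inferred from the figures above, not something the leaderboard states, and the names `rows` and `mean_gap` are made up for illustration. The split of the second entry into 27.4 and 22.31 is also an assumption, chosen because it is the one that fits this reading.

```python
# Scores transcribed from the results above, seven values per entry.
# Assumption: the last value is an average of the six before it.
rows = [
    [87.63, 84.75, 17.78, 14.83, 10.39, 54.59, 44.99],
    [87.44, 89.76, 15.72, 27.4, 22.31, 58.45, 50.18],
    [86.93, 87.05, 16.07, 21.85, 17.39, 56.29, 47.6],
    [86.78, 83.44, 12.76, 12.68, 9.65, 53.28, 43.1],
    [85.93, 83.56, 11.86, 27.45, 23.54, 56.33, 48.11],
    [85.76, 86.44, 14.18, 25.09, 14.47, 54.15, 46.68],
]


def mean_gap(row):
    """Absolute gap between the last value and the mean of the values before it."""
    *scores, reported = row
    return abs(sum(scores) / len(scores) - reported)


gaps = [mean_gap(r) for r in rows]

# Allow one unit in the last reported digit for rounding or truncation.
all_consistent = all(g <= 0.01 for g in gaps)
print(all_consistent)
```

If the check holds for every entry, the last column can be treated as a summary of the other six rather than as an independent benchmark, though the page itself would have to confirm that.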