
Long-context Temporal Reasoning on EventQA

0.856 Accuracy (64K)

Qwen3-4B RL finetuned on HanabiRewards

[Chart: accuracy axis from 0.83936 to 0.85232; Jan 26, 2026]
Updated 4d ago

Evaluation Results

Method  Links
2026.01
0.856  0.668  0.436
2026.01
0.84  0.626  0.372