
Long Context Understanding on HELMET

Top result: Synthetic Reasoning, 68.5 Accuracy

Updated 13d ago

Evaluation Results

Method                        Date      Accuracy
Synthetic Reasoning           2026.03   68.5
Qwen3 VL 235B A22B Instruct   2026.03   67.6
No-think                      2026.03   65.9
Plain Distillation            2026.03   65.7
LongCat-Flash Exp-Chat        2025.12   64.7
GLM 4.6                       2025.12   64.6
Qwen Thinking Traces          2026.03   64.1
Qwen3 VL 32B Instruct         2026.03   63
LongPO                        2026.03   62.9
Synthetic Reasoning           2026.03   62.6
DeepSeek V3.2                 2025.12   59.5
LongCat-Flash Chat            2025.12   59.1
No-think                      2026.03   55.8
Plain Distillation            2026.03   53.1
Mistral 3.1 Small 24B         2026.03   37