
Long-context evaluation on LongBench v2

Overall Score: 59.76

Qwen3-235B-A22B-Thinking

[Chart: score over time, Mar 30, 2026 to May 21, 2026]
Updated 12d ago

Evaluation Results

Date     Overall
2026.05  59.76    -     -     -     -     -
2026.05  48.9     -     -     -     -     -
2026.05  47.87    -     -     -     -     -
2026.03  31.2     34.4  29.3  37.8  25.1  32.4
2026.03  31       35.4  28.3  37.2  26    30.6
2026.03  29.9     32.9  28    33.8  27.9  31.5
2026.03  29.8     33.3  27.7  37.8  22.3  31.5
2026.03  29.2     29.5  29    31.8  26.9  29.4
2026.03  26.2     27.6  25.4  30    22.3  27.8