Long-context Question Answering on MuSiQue (F1)
Top F1 Score: 30.58 (GLM-4.1V-9B-Thinking VERA), Feb 9, 2026
Evaluation Results

Method                              RAG Strategy  Date     F1 Score
GLM-4.1V-9B-Thinking VERA           Attention...  2026.02  30.58
GLM-4.1V-9B-Thinking ColPali RAG    ColPali RAG   2026.02  30.24
GLM-4.1V-9B-Thinking Random RAG     Random RAG    2026.02  28.71
GLM-4.1V-9B-Thinking Embedding RAG  Embedding...  2026.02  28.08
GLM-4.1V-9B-Thinking                Direct (N...  2026.02  27.85
Glyph                               Direct (N...  2026.02  24.87
Qwen3-VL-8B-Instruct VERA           Attention...  2026.02  17.84
Qwen3-VL-8B-Instruct OCR RAG        OCR RAG       2026.02  14.7
Qwen3-VL-8B-Instruct Random RAG     Random RAG    2026.02  14.68
Qwen3-VL-8B-Instruct                Direct (N...  2026.02  9.25