Chart-based Reasoning on CharXivRQ
[Chart: Accuracy over time. Top result: Claude-3.7-Sonnet, 64.2 accuracy (Feb 9, 2026).]
Updated 4d ago
Evaluation Results
| Method | Method details | Date | Accuracy |
|---|---|---|---|
| Claude-3.7-Sonnet | Learning Protocol=Clos... | 2026.02 | 64.2 |
| Octopus-8B (Ours) | Rollout count (rollout... | 2026.02 | 55.7 |
| Qwen3-VL-8B-Instruct + GSPO | Rollout count (rollout... | 2026.02 | 55.3 |
| OpenAI-o1 | Learning Protocol=Clos... | 2026.02 | 55.1 |
| MiMo-VL-7B-SFT | Learning Protocol=Open... | 2026.02 | 54.8 |
| MiMo-VL-7B-RL | Learning Protocol=Open... | 2026.02 | 53.2 |
| Qwen3-VL-8B-Thinking | Learning Protocol=Open... | 2026.02 | 53 |
| Qwen3-VL-8B-Instruct + DAPO | Rollout count (rollout... | 2026.02 | 52.8 |
| Qwen3-VL-8B-Instruct + SRPO | Rollout count (rollout... | 2026.02 | 52.7 |
| Qwen3-VL-8B-Instruct + GRPO | Rollout count (rollout... | 2026.02 | 51.4 |
| Qwen3-VL-8B-Instruct + SRPO | Rollout count (rollout... | 2026.02 | 51.2 |
| Qwen3-VL-8B-Instruct + GRPO | Rollout count (rollout... | 2026.02 | 50.7 |
| GPT-4o | Learning Protocol=Clos... | 2026.02 | 50.5 |
| Qwen3-VL-8B-Instruct + GSPO | Rollout count (rollout... | 2026.02 | 47.9 |
| Qwen3-VL-8B-Instruct | Learning Protocol=Base... | 2026.02 | 45.1 |
| InternVL3.5-8B-RL | Learning Protocol=Open... | 2026.02 | 44.4 |