VLM Reasoning on NumberLine (In-Distribution)
[Chart: Success Rate by date (Mar 9, 2025). Top entry: GFlowVLM w/ Var-TB, Success Rate 100.]
Updated 4d ago
Evaluation Results
Method              Settings                   Date      Success Rate
GFlowVLM w/ Var-TB  Train Data=On-Policy,...   2025.03   100
GFlowVLM w/ SubTB   Train Data=On-Policy,...   2025.03   100
GFlowVLM w/ DB      Train Data=On-Policy,...   2025.03   100
GFlowVLM w/ Var-TB  Train Data=Off-Policy,...  2025.03   100
GFlowVLM w/ SubTB   Train Data=Off-Policy,...  2025.03   100
GFlowVLM w/ DB      Train Data=Off-Policy,...  2025.03   100
RL4VLM+             Train Data=On-Policy,...   2025.03   90.3
RL4VLM              Train Data=On-Policy,...   2025.03   89.4
RL4VLM*             Train Data=On-Policy,...   2025.03   34.8
GFlowVLM w/ SubTB   Train Data=Off-Policy,...  2025.03   34.4
GFlowVLM w/ DB      Train Data=Off-Policy,...  2025.03   33.1
SFT-w/o- [DONE]     Train Data=Off-Policy,...  2025.03   24.8
GFlowVLM w/ DB      Train Data=On-Policy,...   2025.03   24.3
SFT-w/- [DONE]      Train Data=Off-Policy,...  2025.03   24
GFlowVLM w/ SubTB   Train Data=On-Policy,...   2025.03   23