VLM Reasoning on NumberLine Out-of-Distribution
[Chart: Success Rate over time; top entry GFlowVLM w/ DB at 18.6 (Mar 9, 2025)]
Evaluation Results
| Method | Configuration | Date | Success Rate |
| --- | --- | --- | --- |
| GFlowVLM w/ DB | Train Data=Off-Policy,... | 2025.03 | 18.6 |
| GFlowVLM w/ Var-TB | Train Data=Off-Policy,... | 2025.03 | 17.3 |
| GFlowVLM w/ SubTB | Train Data=Off-Policy,... | 2025.03 | 16.7 |
| GFlowVLM w/ DB | Train Data=On-Policy,... | 2025.03 | 9.1 |
| GFlowVLM w/ SubTB | Train Data=On-Policy,... | 2025.03 | 7 |
| GFlowVLM w/ Var-TB | Train Data=On-Policy,... | 2025.03 | 6.2 |
| RL4VLM+ | Train Data=On-Policy,... | 2025.03 | 4.4 |
| RL4VLM | Train Data=On-Policy,... | 2025.03 | 3.1 |
| RL4VLM* | Train Data=On-Policy,... | 2025.03 | 1.9 |
| SFT-w/o- [DONE] | Train Data=Off-Policy,... | 2025.03 | 0 |
| SFT-w/- [DONE] | Train Data=Off-Policy,... | 2025.03 | 0 |
| GFlowVLM w/ SubTB | Train Data=On-Policy,... | 2025.03 | 0 |
| GFlowVLM w/ DB | Train Data=On-Policy,... | 2025.03 | 0 |
| GFlowVLM w/ SubTB | Train Data=Off-Policy,... | 2025.03 | 0 |
| GFlowVLM w/ DB | Train Data=Off-Policy,... | 2025.03 | 0 |