VLM Reasoning on BlackJack (In-Distribution)
[Chart: Success Rate over time. Top entry: GFlowVLM w/ DB, Success Rate 43.8, Mar 9, 2025]
Updated 4d ago
Evaluation Results
| Method | Settings | Date | Success Rate |
|---|---|---|---|
| GFlowVLM w/ DB | Train Data=Off-Policy,... | 2025.03 | 43.8 |
| GFlowVLM w/ Var-TB | Train Data=Off-Policy,... | 2025.03 | 43 |
| GFlowVLM w/ SubTB | Train Data=Off-Policy,... | 2025.03 | 42.4 |
| GFlowVLM w/ DB | Train Data=On-Policy,... | 2025.03 | 42.2 |
| GFlowVLM w/ SubTB | Train Data=On-Policy,... | 2025.03 | 41.7 |
| GFlowVLM w/ Var-TB | Train Data=On-Policy,... | 2025.03 | 41.4 |
| RL4VLM+ | Train Data=On-Policy,... | 2025.03 | 41 |
| RL4VLM | Train Data=On-Policy,... | 2025.03 | 40.2 |
| RL4VLM* | Train Data=On-Policy,... | 2025.03 | 23.5 |
| SFT-w/o- [DONE] | Train Data=Off-Policy,... | 2025.03 | 23.1 |
| SFT-w/- [DONE] | Train Data=Off-Policy,... | 2025.03 | 20.2 |
| GFlowVLM w/ SubTB | Train Data=Off-Policy,... | 2025.03 | 17.4 |
| GFlowVLM w/ DB | Train Data=Off-Policy,... | 2025.03 | 13.8 |
| GFlowVLM w/ SubTB | Train Data=On-Policy,... | 2025.03 | 8.4 |
| GFlowVLM w/ DB | Train Data=On-Policy,... | 2025.03 | 6.8 |