VisualProcessBench

Benchmarks

Task Name                 Dataset Name                 SOTA Result                  Trend
Step Verification         VisualProcessBench (test)    Macro-F1 (DynaMath): 71.32   28
Multimodal Reasoning      VisualProcessBench Overall   FEI Average: 54.6            20
Reasoning Step Judgment   VisualProcessBench           MMMU Score: 74.1             15