Visual Equivalence Reward Modeling on Visual-ERM-Bench SVG
Chart: best F1h Score on this benchmark is 33.3 (Gemini-3-Flash, Mar 13, 2026). Metrics shown: F1h Score, F1s Score, Sc Score.
Updated 1mo ago
Evaluation Results
| Method | Category | Date | F1h Score | F1s Score | Sc Score |
|---|---|---|---|---|---|
| Gemini-3-Flash | Proprietary | 2026.03 | 33.3 | 67.5 | 64.3 |
| Gemini-2.5-Pro | Proprietary | 2026.03 | 29.3 | 34.3 | 63.3 |
| GPT-5.2 | Proprietary | 2026.03 | 28.5 | 32.2 | 61.1 |
| Visual-ERM | Open-source,... | 2026.03 | 28.3 | 32.6 | 59.6 |
| Qwen3-VL-235B-Instruct | Open-source | 2026.03 | 19.4 | 22.8 | 51.5 |
| GPT-4o | Proprietary | 2026.03 | 13.0 | 19.3 | 50.3 |
| InternVL3.5-8B | Open-source | 2026.03 | 6.1 | 13.1 | 48.9 |
| Qwen3-VL-8B-Instruct | Open-source | 2026.03 | 6.1 | 9.4 | 27.1 |
| Qwen2.5-VL-7B | Open-source | 2026.03 | 1.9 | 7.5 | 37.9 |
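The table is ordered by F1h Score, but the three metrics do not always agree on ordering. The sketch below re-ranks the reported numbers by each metric. It assumes higher is better for all three scores (suggested by the descending F1h order, not stated on the page); `RESULTS` and `rank_by` are hypothetical names used only for illustration.

```python
# Scores copied from the Evaluation Results table (Visual-ERM-Bench SVG).
# Tuple order: (F1h, F1s, Sc).
RESULTS = {
    "Gemini-3-Flash": (33.3, 67.5, 64.3),
    "Gemini-2.5-Pro": (29.3, 34.3, 63.3),
    "GPT-5.2": (28.5, 32.2, 61.1),
    "Visual-ERM": (28.3, 32.6, 59.6),
    "Qwen3-VL-235B-Instruct": (19.4, 22.8, 51.5),
    "GPT-4o": (13.0, 19.3, 50.3),
    "InternVL3.5-8B": (6.1, 13.1, 48.9),
    "Qwen3-VL-8B-Instruct": (6.1, 9.4, 27.1),
    "Qwen2.5-VL-7B": (1.9, 7.5, 37.9),
}
METRICS = ("F1h", "F1s", "Sc")


def rank_by(metric: str) -> list[str]:
    """Return method names from best to worst on one metric.

    Assumption: a higher score is better for every metric.
    """
    idx = METRICS.index(metric)
    # sorted() stays stable with reverse=True, so ties (the two 6.1 F1h
    # scores) keep their original table order.
    return sorted(RESULTS, key=lambda name: RESULTS[name][idx], reverse=True)


if __name__ == "__main__":
    for metric in METRICS:
        print(metric, "->", ", ".join(rank_by(metric)))
```

On these reported numbers, Visual-ERM ranks above GPT-5.2 on F1s, and Qwen2.5-VL-7B ranks above Qwen3-VL-8B-Instruct on Sc, even though both orderings are reversed on F1h.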