Visual Equivalence Reward Modeling on Visual-ERM-Bench AVG
[Chart: F1h Score over time. Best result: Visual-ERM, 42.1 F1h Score (Mar 13, 2026). Selectable metrics: F1h Score, F1s Score, Sc Score.]
Updated 1mo ago
Evaluation Results
| Method | Category | Date | F1h Score | F1s Score | Sc Score |
|---|---|---|---|---|---|
| Visual-ERM | Open-source,... | 2026.03 | 42.1 | 44.7 | 58.4 |
| Gemini-3-Flash | Proprietary | 2026.03 | 40.6 | 43.4 | 53.4 |
| Gemini-2.5-Pro | Proprietary | 2026.03 | 37.8 | 40.9 | 59.1 |
| GPT-5.2 | Proprietary | 2026.03 | 32.7 | 35 | 58.9 |
| Qwen3-VL-235B-Instruct | Open-source | 2026.03 | 29.5 | 32.4 | 56.2 |
| GPT-4o | Proprietary | 2026.03 | 25 | 29.5 | 56.5 |
| InternVL3.5-8B | Open-source | 2026.03 | 6.7 | 9.6 | 32.5 |
| Qwen3-VL-8B-Instruct | Open-source | 2026.03 | 5.3 | 6.5 | 17.5 |
| Qwen2.5-VL-7B | Open-source | 2026.03 | 2.8 | 5.1 | 15.2 |