Visual Equivalence Reward Modeling on Visual-ERM-Bench Chart
[Chart: metrics F1h, F1s, and Sc. Highlighted point: Visual-ERM, F1h 39.9 (Mar 13, 2026).]
Updated 1mo ago
Evaluation Results
| Method | Category | Date | F1h | F1s | Sc |
|---|---|---|---|---|---|
| Visual-ERM | Open-source,... | 2026.03 | 39.9 | 42.8 | 61.2 |
| Gemini-3-Flash | Proprietary | 2026.03 | 38.5 | 41.3 | 62.8 |
| Gemini-2.5-Pro | Proprietary | 2026.03 | 33.7 | 37.5 | 61.8 |
| GPT-5.2 | Proprietary | 2026.03 | 30.1 | 32.6 | 64.8 |
| Qwen3-VL-235B-Instruct | Open-source | 2026.03 | 28 | 31.8 | 47.2 |
| GPT-4o | Proprietary | 2026.03 | 22.8 | 28.3 | 48.5 |
| Qwen2.5-VL-7B | Open-source | 2026.03 | 3.9 | 5.4 | 11.2 |
| Qwen3-VL-8B-Instruct | Open-source | 2026.03 | 3.3 | 3.5 | 3.8 |
| InternVL3.5-8B | Open-source | 2026.03 | 2.5 | 5.7 | 11 |