Visual Equivalence Reward Modeling on Visual-ERM-Bench Table
[Chart: F1 Score (h) by date. Highlighted point: Visual-ERM, 56.4, Mar 13, 2026. Metrics shown: F1 Score (h), F1 Score (s), Score (Sc).]
Evaluation Results
| Method | Category | Date | F1 Score (h) | F1 Score (s) | Score (Sc) |
| Visual-ERM | Open-source,... | 2026.03 | 56.4 | 57.6 | 74.8 |
| Gemini-3-Flash | Proprietary | 2026.03 | 48.1 | 50.1 | 45.6 |
| Gemini-2.5-Pro | Proprietary | 2026.03 | 46.4 | 48.0 | 49.9 |
| GPT-5.2 | Proprietary | 2026.03 | 39.3 | 40.6 | 54.6 |
| Qwen3-VL-235B-Instruct | Open-source | 2026.03 | 35.7 | 37.4 | 56.2 |
| GPT-4o | Proprietary | 2026.03 | 32.9 | 35.7 | 49.5 |
| InternVL3.5-8B | Open-source | 2026.03 | 9.9 | 10.9 | 31.7 |
| Qwen3-VL-8B-Instruct | Open-source | 2026.03 | 7.0 | 7.8 | 21.4 |
| Qwen2.5-VL-7B | Open-source | 2026.03 | 2.3 | 3.1 | 12.6 |