Visual Language Model Evaluation on MMVet V2
[Figure: MMVet V2 Score over time. Highlighted point: POW3R dynamic, 52.6, May 19, 2026.]
Evaluation Results
Method             Base policy   Date     MMVet V2 Score
POW3R dynamic      Qwen3-VL-8B   2026.05  52.6
POW3R dynamic      Qwen3-VL-4B   2026.05  52.2
Category-balanced  Qwen3-VL-4B   2026.05  51.8
Category-balanced  Qwen3-VL-8B   2026.05  51.8
Static scalar      Qwen3-VL-4B   2026.05  51.0
Static scalar      Qwen3-VL-8B   2026.05  51.0
Base               Qwen3-VL-4B   2026.05  49.9
Binary             Qwen3-VL-4B   2026.05  49.8
Base               Qwen3-VL-8B   2026.05  49.7
Binary             Qwen3-VL-8B   2026.05  49.5
POW3R dynamic      Gemma3-4B     2026.05  40.8
Category-balanced  Gemma3-4B     2026.05  40.2
Static scalar      Gemma3-4B     2026.05  39.5
Base               Gemma3-4B     2026.05  38.9
Binary             Gemma3-4B     2026.05  38.8