
MLLM-as-a-Judge

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Large Multimodal Model Evaluation | MLLM-as-a-Judge v1.0 (test) | Overall Score: 49 | 16 |
| Pointwise Scoring | MLLM-as-a-Judge in-domain v1.0 (test) | ImageDC Score: 80.2 | 9 |