
MuirBench

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Multi-image Reasoning | MuirBench | Accuracy 77.2 | 61 |
| Multi-image Understanding | MuirBench | Score 68 | 26 |
| Multi-image reasoning | Muirbench (test) | Accuracy 68 | 24 |
| Multi-Image Understanding | MuirBench (test) | Accuracy 68 | 21 |
| Visual Question Answering | MuirBench | Accuracy 78.6 | 19 |
| Multi-modal Reasoning | MUIRBENCH | Difference Reasoning Accuracy 92.94 | 19 |
| Multi-Image Understanding | MuirBench 142 (test) | Score 86.1 | 19 |
| Multi-image Understanding | MuirBench Multi-image Understanding | Accuracy 62.3 | 17 |
| Multi-modal Understanding | MuirBench | Score 59.6 | 16 |
| Multi-image hallucination evaluation | MUIRBench | Accuracy 62 | 12 |
| Vision-Centric Reasoning | MuirBench | Accuracy 68 | 11 |
| Multimodal Reasoning | MuirBench | Accuracy 57.14 | 11 |
| Vision-Centric Understanding | MuirBench | Accuracy 68 | 10 |
| Procedural Temporal Understanding | MuirBench (test) | Overall Score 65.04 | 7 |
| General Visual Question Answering | MuirBench | Score 70.7 | 5 |
| Multi-image Visual Question Answering | MUIRBench | Accuracy 76.4 | 4 |
| Comprehensive Multi-image | MuirBench | Accuracy 62.3 | 4 |
| Multi-image Multi-modal Understanding | MuirBench | Accuracy 41.8 | 2 |