MileBench

Benchmarks

Task Name	Dataset Name	Metric	SOTA Result	
Multi-modal Long-context Benchmarking	MileBench	Task T Score	57.23	39
Multi-image understanding	MileBench (test)	Temporal Multi-Image Score (Task T)	57.3	21
Multi-image Multi-modal Question Answering	MileBench	CL-CH Score	44.76	18
Long-context multimodal evaluation	MileBench (test)	TN Score	25.34	18
Multi-image reasoning	MileBench	T-1 Score	44.7	9
Multimodal Long-Context Understanding	MileBench	Object Existence	51.5	8