
MileBench

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Multi-modal Long-context Benchmarking | MileBench | Task T Score: 57.23 | 39
Multi-image understanding | MileBench (test) | Temporal Multi-Image Score (Task T): 57.3 | 21
Multi-image Multi-modal Question Answering | MileBench | CL-CH Score: 44.76 | 18
Long-context multimodal evaluation | MileBench (test) | TN Score: 25.34 | 18