
MAGMaR

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Article Generation | MAGMaR oracle (leaderboard snapshot) | Human Preference Score: 3.833 | 15
Grounded Multi-Video Question Answering | MAGMaR (test) | Reference Precision (Ref-P): 82.2 | 11
Multi-video Grounding and Retrieval | MAGMaR Oracle Track 2026 (val) | Human Evaluation Score: 3.833 | 11
Video Retrieval | MAGMaR 2026 | nDCG@10: 0.759 | 8
Report Generation | MAGMaR (test) | ROUGE-L: 18.39 | 5
Information Retrieval | MAGMaR (final) | Average Score: 75.9 | 5
Multimodal RAG Evaluation | MAGMaR Extrinsic Quality Judgment (EQJ) | Information Preservation (Retrieval): 65 | 3