
REASONMAP

Benchmarks

Task Name            Dataset Name                   SOTA Result               Trend
Visual Reasoning     REASONMAP-PLUS                 Weighted Accuracy 88.95   16
Visual Reasoning     REASONMAP Long questions       Weighted Accuracy 62.5    16
Visual Reasoning     REASONMAP Short questions      Weighted Accuracy 0.5998  16
High-level Planning  ReasonMap L (long questions)   Weighted Accuracy 0.0747  3
High-level Planning  ReasonMap S (short questions)  Weighted Accuracy 15.44   3