Share your thoughts, 1 month free Claude Pro on usSee more
WorkDL logo mark

WhatsUp

Benchmarks

Task NameDataset NameSOTA ResultTrend
Visual World ModellingWhatsUp
GPT-4o Score8.2
18
Visual Question AnsweringWhatsUp
Accuracy99.2
10
Spatial ReasoningWhatsUp-B
Binary Robust Accuracy99.9
9
Forward-dynamics PredictionWhatsUp AURORA-BENCH
GPT-4o Score3.3
9
Action-centric EditingWhatsUp (test)
Human Evaluation Score0.25
4
Vision-Language Spatial ReasoningWhatsUp B-FB 2x2 directional variants
GroupScore66.67
3
Vision-Language Spatial ReasoningWhatsUp B-LR 2x2 directional variants
GroupScore82.84
3
Vision-Language Spatial ReasoningWhatsUp A-OU 2x2 directional variants
GroupScore99.03
3
Vision-Language Spatial ReasoningWhatsUp A-LR 2x2 directional variants
GroupScore95.87
3
Vision-Language CompositionalityWhatsUp B (1 × 4 groups)
GroupScore49.94
2
Vision-Language CompositionalityWhatsUp A 1 × 4 groups
GroupScore56.8
2
Showing 11 of 11 rows