GOAT-Bench

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Meme Abuse Detection | GOAT-Bench 1.0 (test) | Harmfulness Accuracy: 75.1 | 32 |
| Multi-Modal Lifelong Navigation | GOAT-Bench unseen (val) | SR: 62.7 | 22 |
| Lifelong Visual Navigation | GOAT-Bench 1/10-scale subset (val-unseen) | Success Rate: 72.8 | 13 |
| Harmful Meme Detection | GOAT-Bench In-Domain | Racism F1: 88.5 | 11 |
| Harmful Meme Detection | GOAT-Bench (Out-Of-Domain) | Racism F1: 87.1 | 7 |
| Multi-modal Lifelong Navigation | GOAT-Bench Seen-Synonyms (val) | SR: 66.8 | 6 |
| Multi-modal Lifelong Navigation | GOAT-Bench Seen (val) | SR: 65.5 | 6 |