
Standard benchmarks

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Composed Image Retrieval | Standard Benchmarks CIRR, FashionIQ, GeneCIS | Average Performance: 38.3 | 10
Language Modeling and Question Answering | Standard Benchmarks (ARC-E, ARC-C, BoolQ, HellaSwag, OBQA, PIQA, WinoGrande, MMLU, SciQ) (test) | ARC-E Acc (Norm): 49.75 | 8
Text-to-image | Standard text-to-image benchmarks | CLIP Score: 97.28 | 6