OpenAI

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Text-based safety moderation | OpenAI | F1 Score: 82.3 | 26 |
| Question Answering | OpenAI (in-domain) | Accuracy: 0.8956 | 12 |
| Clustering | OpenAI | Clustering Time (s): 13.4 | 4 |
| Diverse Nearest Neighbor Search | OpenAI dataset | Search Cost: 0.331 | 4 |
| Content Moderation | OpenAI Out-of-Distribution | Pornography Score: 82.6 | 2 |
| Vector Similarity Search | OpenAI | Build Time (s): 33.63 | 2 |