
WG

Benchmarks

Task Name                Dataset Name   SOTA Result      Trend
Common Sense Reasoning   WG             Accuracy 94.1    38
Commonsense Reasoning    WG-S           Accuracy 70.9    18
Harmful Refusal          WG (test)      ASR 13.8         7