
Wino

Benchmarks

Task Name              Dataset Name                SOTA Result                Trend
CommonSense Reasoning  Wino                        Accuracy: 86.55            146
Reasoning              Wino (leave-one-out setup)  Accuracy (Wino LOO): 86.2  12