Wino

Benchmarks

Task Name             | Dataset Name               | SOTA Result                | Trend
CommonSense Reasoning | Wino                       | Accuracy: 77.4             | 102
Reasoning             | Wino (leave-one-out setup) | Accuracy (Wino LOO): 86.2  | 12