
COMMON2SENSE

Benchmarks

Task Name                       Dataset Name         SOTA Result                     Trend
Three-way probability ranking   COMMON2SENSE paired  F1 (C1): 76.2                   30
Binary decision                 COMMON2SENSE         Accuracy: 95.1                  27
Pairwise preference evaluation  Common2sense         Context 1 Preference Score: 61  21