CICERO

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Dialogue Commonsense Reasoning | CICERO v2 (test) | Accuracy 93.25 | 4 |
| Dialogue Commonsense Reasoning | CICERO v1 (test) | Accuracy 88.04 | 4 |
| Multiple Choice Question | CICERO v2 | Macro F1 88.63 | 2 |
| Multiple Choice Question | CICERO | Macro F1 70.66 | 2 |