
CodaSet

Benchmarks

Task Name                       Dataset Name                Metric           SOTA Result  Trend
General Language Modeling       CodaSet OOD Average (test)  Performance (%)  87.84        16
Holistic Evaluation             CodaSet ID Average (test)   Accuracy         90.6         16
Instruction Following           CodaSet ID IFEVAL (test)    Accuracy         88.76        16
Symbolic and Logical Reasoning  CodaSet BBH ID (test)       Accuracy         94.29        16
Mathematical Reasoning          CodaSet ID GSM8k (test)     Accuracy         0.964        16