
CLadder

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Causal Reasoning | Cladder | Accuracy: 82.7 | 20 |
| Causal Reasoning | CLadder 14 (original) | NLL: 0.465 | 14 |
| Causal Reasoning | Cladder AceReason (Reduced) | Accuracy: 80.2 | 10 |
| Causal Reasoning | Cladder AceReason (Complete) | Accuracy: 81.2 | 10 |
| Causal Reasoning | CLadder | Exact Match: 99.89 | 9 |
| Causal Reasoning | CLadder 1.0 (test) | Overall Acc: 94.8 | 7 |
| Finetuning domain recovery | CLadder | Recovery Score (Grader 1): 5 | 4 |