CLadder

Benchmarks

Task Name         Dataset Name            SOTA Result          Trend
Causal Reasoning  CLadder 14 (original)   NLL: 0.465           14
Causal Reasoning  CLadder                 Exact Match: 99.89   9
Causal Reasoning  CLadder 1.0 (test)      Overall Acc: 94.8    7