
CounterBench

Benchmarks

Task Name                 Dataset Name         SOTA Result         Trend
Logical Reasoning         CounterBench (test)  Accuracy 88.9       55
Causal Inference          CounterBench         Accuracy 91.8       40
Counterfactual Reasoning  CounterBench         Basic Score 80.8    20
Reasoning                 CounterBench         Error Rate 0.0359   11