
Counterfactual reasoning

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Counterfactual reasoning | Counterfactual reasoning Agent synthetic (test) | Accuracy: 99.7 | 16
Counterfactual reasoning | Counterfactual reasoning Human Amazon Mechanical Turk (test) | Metric: - | 0