
Counterfactual Eval

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Counterfactual Reasoning | Counterfactual Eval (dev) | Mean Score: 63.4 | 52 |
| Logical and Mathematical Reasoning under Counterfactuals | Counterfactual Eval Manual Initialization 5 random samples 1.0 (train and dev) | Arithmetic Base 8 (Mean): 32 | 4 |