| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Causal Reasoning | Cladder | Accuracy 82.7 | 20 |
| Causal Reasoning | CLadder 14 (original) | NLL 0.465 | 14 |
| Causal Reasoning | Cladder AceReason (Reduced) | Accuracy 80.2 | 10 |
| Causal Reasoning | Cladder AceReason (Complete) | Accuracy 81.2 | 10 |
| Causal Reasoning | CLadder | Exact Match 99.89 | 9 |
| Causal Reasoning | CLadder 1.0 (test) | Overall Acc 94.8 | 7 |
| Finetuning domain recovery | CLadder | Recovery Score (Grader 1) 5 | 4 |