| Dataset | SOTA Method | Metric | Value | Results | Last Updated |
|---|---|---|---|---|---|
| XCOPA | PaLM 2 | Accuracy | 94.4 | 55 | 5d ago |
| COPA | LAT | Accuracy | 90 | 51 | 14d ago |
| BBH Causal Judgement | evaluation-instructed prompt optimization | Accuracy (BBH Causal Judgement) | 78 | 40 | 1d ago |
| XCOPA (test) | PaLM 2 | Accuracy (th) | 96.4 | 31 | 2d ago |
| Corr2Cause | LLaMA-7B | Accuracy | 97.5 | 22 | 6d ago |
| Cladder | CDCR | Accuracy | 82.7 | 20 | 8d ago |
| ExecCF | UNICO | Accuracy | 80.4 | 14 | 8d ago |
| CaLM | UNICO | Accuracy | 73.9 | 14 | 8d ago |
| Com2 | Qwen3-32B | Accuracy | 79.8 | 14 | 8d ago |
| BBEH | UNICO | Accuracy (Causal Reasoning) | 55.2 | 14 | 8d ago |
| XCOPA | | Accuracy (ZH) | 99 | 14 | 1mo ago |
| CLadder 14 (original) | | NLL | 0.465 | 14 | 3mo ago |
| e-CARE | SE-GPT | Accuracy | 86.9 | 14 | 3mo ago |
| XCOPA | TokAlign + LAT | Accuracy (zh) | 55.5 | 12 | 3mo ago |
| Copa100 | Our Trained Model | Accuracy | 83 | 12 | 3mo ago |
| Cladder AceReason (Reduced) | Model-first Greedy | Accuracy | 80.2 | 10 | 8d ago |
| Cladder AceReason (Complete) | Model-first Greedy | Accuracy | 81.2 | 10 | 8d ago |
| NoisyCausal | Graph-Guided (Ours) | Accuracy (W/O Noise) | 80.7 | 10 | 27d ago |
| IndicCOPA IndicXTREME (test) | IFT | Average F1 Score | 45.45 | 10 | 3mo ago |
| CLadder | ARYA | Exact Match | 99.89 | 9 | 2mo ago |
| XCOPA ET | Llama-3.2-3B | Accuracy | 71.8 | 8 | 3mo ago |
| XCOPA | asafaya/kanarya-2b | XCOPA Causal Reasoning Score | 64.2 | 8 | 2mo ago |
| CLadder 1.0 (test) | Human | Overall Acc | 94.8 | 7 | 3mo ago |
| EXTENDED CORR2CAUSE d = 7–24 | A-CBO | Parity | 84.8 | 6 | 6d ago |
| AITP 1.0 (test) | AITP | BLEU | 0.0382 | 5 | 1mo ago |