Causal Reasoning

Benchmarks

Dataset Name	SOTA Method	Metric	Value		Updated
COPA		Accuracy	95	63	1mo ago
XCOPA	PaLM 2	Accuracy	94.4	55	1mo ago
BBH Causal Judgement	evaluation-instructed prompt optimization	Accuracy (BBH Causal Judgement)	78	40	1mo ago
XCOPA (test)	PaLM 2	Accuracy (th)	96.4	31	1mo ago
CRASS OOD (held-out)	Qwen2.5-32B	Performance P0	95	27	1mo ago
Corr2Cause	LLaMA-7B	Accuracy	97.5	22	1mo ago
Cladder	CDCR	Accuracy	82.7	20	2mo ago
e-CARE n = 200	Vernier	P0 Score	82.3	14	1mo ago
ExecCF	UNICO	Accuracy	80.4	14	2mo ago
CaLM	UNICO	Accuracy	73.9	14	2mo ago
Com2	Qwen3-32B	Accuracy	79.8	14	2mo ago
BBEH	UNICO	Accuracy (Causal Reasoning)	55.2	14	2mo ago
XCOPA		Accuracy (ZH)	99	14	3mo ago
CLadder 14 (original)		NLL	0.465	14	4mo ago
e-CARE	SE-GPT	Accuracy	86.9	14	4mo ago
e-CARE OOD (held-out)	Qwen2.5-14B	P0 Score	82.3	13	1mo ago
CLadder ID (held-out)	Mistral-7B	P0 Score	98	13	1mo ago
XCOPA	TokAlign + LAT	Accuracy (zh)	55.5	12	4mo ago
Copa100	Our Trained Model	Accuracy	83	12	4mo ago
Cladder AceReason (Reduced)	Model-first Greedy	Accuracy	80.2	10	2mo ago
Cladder AceReason (Complete)	Model-first Greedy	Accuracy	81.2	10	2mo ago
NoisyCausal	Graph-Guided (Ours)	Accuracy (W/O Noise)	80.7	10	2mo ago
IndicCOPA IndicXTREME (test)	IFT	Average F1 Score	45.45	10	4mo ago
CLadder	ARYA	Exact Match	99.89	9	4mo ago
XCOPA ET	Llama-3.2-3B	Accuracy	71.8	8	4mo ago

Showing 25 of 42 rows