Causal Reasoning on CaLM
[Figure: Accuracy over time. Best result: UNICO, 73.9 accuracy. Date shown: May 24, 2026.]
Updated 8d ago
Evaluation Results
Method                 Configuration                Date      Accuracy
UNICO                  Base Model=Qwen3-8B, D...    2026.05   73.9
UNICO                  Base Model=Qwen3-4B, D...    2026.05   70.3
Olmo3.1-32B-Instruct   Base Model=Olmo3.1-32B...    2026.05   70.1
Qwen3-32B              Base Model=Qwen3-32B         2026.05   69.2
CauGym                 Base Model=Qwen3-8B, D...    2026.05   67.1
CauGym                 Base Model=Qwen3-4B, D...    2026.05   65.5
UNICO                  Base Model=Olmo3-7B-In...    2026.05   61.2
CauGym                 Base Model=Olmo3-7B-In...    2026.05   60.6
Original               Base Model=Qwen3-8B, D...    2026.05   60.4
Original               Base Model=Qwen3-4B, D...    2026.05   59.3
Original               Base Model=Olmo3-7B-In...    2026.05   56.6
CDCR                   Base Model=Olmo3-7B-In...    2026.05   54.6
CDCR                   Base Model=Qwen3-8B, D...    2026.05   51.6
CDCR                   Base Model=Qwen3-4B, D...    2026.05   46.3