Causal Reasoning on ExecCF
[Chart: Accuracy over time. Best result: UNICO, 80.4 Accuracy. Date shown: May 24, 2026.]
Updated 8d ago
Evaluation Results
| Method | Details | Date | Accuracy |
|---|---|---|---|
| UNICO | Base Model=Qwen3-8B, D... | 2026.05 | 80.4 |
| UNICO | Base Model=Qwen3-4B, D... | 2026.05 | 76.7 |
| UNICO | Base Model=Olmo3-7B-In... | 2026.05 | 72.8 |
| CauGym | Base Model=Qwen3-8B, D... | 2026.05 | 69.6 |
| Olmo3.1-32B-Instruct | Base Model=Olmo3.1-32B... | 2026.05 | 65.2 |
| Qwen3-32B | Base Model=Qwen3-32B | 2026.05 | 61.5 |
| Original | Base Model=Qwen3-8B, D... | 2026.05 | 60.1 |
| Original | Base Model=Olmo3-7B-In... | 2026.05 | 59.9 |
| Original | Base Model=Qwen3-4B, D... | 2026.05 | 59.3 |
| CauGym | Base Model=Qwen3-4B, D... | 2026.05 | 58.6 |
| CauGym | Base Model=Olmo3-7B-In... | 2026.05 | 54.1 |
| CDCR | Base Model=Olmo3-7B-In... | 2026.05 | 50.5 |
| CDCR | Base Model=Qwen3-8B, D... | 2026.05 | 36.6 |
| CDCR | Base Model=Qwen3-4B, D... | 2026.05 | 32.4 |