Conditional-independence (belief sufficiency) testing on Cry
[Chart: CMI by method, Feb 6, 2026. Highest value: Llama, 0.4223. Other selectable metrics: CMI 95% CI, LogLoss Imp. (CB: A~p vs A~(p,θ)) and its CI, LogLoss Imp. (CB: A~(p,x) vs A~(p,x,θ)) and its CI.]
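A minimal sketch of how a score like the CMI column could be computed. It assumes discrete samples of the action A, the parameter θ and the belief p, and that the score is I(A; θ | p), the same independence that the LogLoss Imp. columns probe (A~p vs A~(p,θ)). The page states neither the estimator, the log base, nor the variable definitions, so `plugin_cmi` is a hypothetical plug-in (empirical-frequency) estimator in nats, not the benchmark's method.

```python
from collections import Counter
import math


def plugin_cmi(a, theta, p):
    """Plug-in estimate of I(A; theta | P) in nats from paired discrete samples.

    Uses I = sum P(a,t,p) * log[P(a,t,p) * P(p) / (P(a,p) * P(t,p))],
    with every probability replaced by its empirical frequency.
    """
    n = len(a)
    if n == 0 or not (n == len(theta) == len(p)):
        raise ValueError("a, theta and p must be non-empty and the same length")
    c_atp = Counter(zip(a, theta, p))  # joint counts of (a, theta, p)
    c_ap = Counter(zip(a, p))          # counts of (a, p)
    c_tp = Counter(zip(theta, p))      # counts of (theta, p)
    c_p = Counter(p)                   # counts of p
    total = 0.0
    for (ai, ti, pi), n_atp in c_atp.items():
        # The 1/n factors cancel inside the log ratio.
        ratio = n_atp * c_p[pi] / (c_ap[(ai, pi)] * c_tp[(ti, pi)])
        total += (n_atp / n) * math.log(ratio)
    return total


# A copies theta while p is constant: p is not sufficient, CMI = log 2.
print(plugin_cmi([0, 1, 0, 1], [0, 1, 0, 1], [0, 0, 0, 0]))
# A independent of theta given p: CMI = 0.
print(plugin_cmi([0, 0, 1, 1], [0, 1, 0, 1], [0, 0, 0, 0]))
```

A value of 0 means A carries no information about θ beyond p in the sample; larger values mean θ influences the action beyond what p explains. Plug-in estimates are biased upward on small samples, which is one reason a confidence interval column is useful.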
Evaluation Results
Method | CMI | CMI 95% CI | LogLoss Imp. (CB: A~p vs A~(p,θ)) | LogLoss Imp. CI (CB: A~p vs A~(p,θ)) | LogLoss Imp. (CB: A~(p,x) vs A~(p,x,θ)) | LogLoss Imp. CI (CB: A~(p,x) vs A~(p,x,θ))
--- | --- | --- | --- | --- | --- | ---
Llama (Model=Llama, 2026.02) | 0.4223 | - | 0.56 | - | 2.83 | -
GPT-Min (Model=GPT-Min, 2026.02) | 0.2232 | - | 14.84 | - | 4.88 | -
GPT-High (Model=GPT-High, 2026.02) | 0.1901 | - | 11.52 | - | 9.25 | -
DeepSeek (Model=DeepSeek, 2026.02) | 0.1745 | - | 0.88 | - | 2.66 | -
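The two LogLoss Imp. columns compare a model of A given the belief p (or p and x) against one that also sees θ: if A ⊥ θ given the context, adding θ should not lower held-out log loss. A minimal sketch under stated assumptions: variables are discrete, "CB" is taken to mean classifier-based (the page does not expand it), Laplace-smoothed frequency tables stand in for the benchmark's unstated classifiers, and the reported number is assumed to be the percent reduction in held-out log loss. `fit_table`, `log_loss` and `logloss_improvement` are hypothetical helpers. Passing p as the context gives the first comparison; passing a `(p, x)` tuple gives the second.

```python
from collections import Counter, defaultdict
import math


def fit_table(features, labels, classes, alpha=1.0):
    """Laplace-smoothed estimate of P(label | features) for discrete data."""
    counts = defaultdict(Counter)
    for f, y in zip(features, labels):
        counts[f][y] += 1

    def predict(f):
        c = counts.get(f, Counter())  # unseen feature values get a uniform prior
        denom = sum(c.values()) + alpha * len(classes)
        return {y: (c[y] + alpha) / denom for y in classes}

    return predict


def log_loss(predict, features, labels):
    """Mean negative log-likelihood (nats) of the true labels."""
    return -sum(math.log(predict(f)[y]) for f, y in zip(features, labels)) / len(labels)


def logloss_improvement(train, test, classes, alpha=1.0):
    """Percent drop in held-out log loss from A~ctx to A~(ctx, theta).

    train/test: lists of (a, ctx, theta) tuples; ctx is p or a (p, x) tuple.
    """
    y_tr = [a for a, _, _ in train]
    y_te = [a for a, _, _ in test]
    base = fit_table([c for _, c, _ in train], y_tr, classes, alpha)
    full = fit_table([(c, t) for _, c, t in train], y_tr, classes, alpha)
    ll_base = log_loss(base, [c for _, c, _ in test], y_te)
    ll_full = log_loss(full, [(c, t) for _, c, t in test], y_te)
    return 100.0 * (ll_base - ll_full) / ll_base


# theta fully determines A while the context is constant: large improvement.
dependent = [(0, 0, 0), (1, 0, 1)] * 10
print(logloss_improvement(dependent, dependent, [0, 1]))
# A independent of theta given the context: no improvement.
independent = [(a, 0, t) for a in (0, 1) for t in (0, 1)] * 5
print(logloss_improvement(independent, independent, [0, 1]))
```

The sketch evaluates on the training rows only to keep the usage short; a real test would score a held-out split, since smoothing alone does not prevent the richer model from fitting noise.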