Conditional-independence (belief sufficiency) testing on Diab (Diabetes Patient Records)
[Chart: CMI by method. Highlighted result: Llama, CMI 0.2695 (Feb 6, 2026).]
Evaluation Results
| Method | CMI | CMI 95% CI Range | LogLoss Improvement (CB: A~p vs A~(p,θ)) | LogLoss Improvement 95% CI (CB: A~p vs A~(p,θ)) | LogLoss Improvement (CB: A~(p,x) vs A~(p,x,θ)) | LogLoss Improvement 95% CI (CB: A~(p,x) vs A~(p,x,θ)) |
|---|---|---|---|---|---|---|
| Llama (Model=Llama), 2026.02 | 0.2695 | - | 0.1233 | - | -0.0026 | - |
| GPT-High (Model=GPT-High), 2026.02 | 0.0461 | - | 0.0022 | - | 0.0043 | - |
| DeepSeek (Model=DeepSeek), 2026.02 | 0.0351 | - | -0.0243 | - | 0.0001 | - |
| GPT-Min (Model=GPT-Min), 2026.02 | 0.0193 | - | 0.0284 | - | 0 | - |
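
The page does not define the LogLoss Improvement columns. One plausible reading, offered only as a sketch, is that each value is the held-out log loss of a baseline predictor of A (for example from p alone) minus the log loss of a predictor that also sees θ. Under that reading, a value near zero is consistent with A being conditionally independent of θ given the baseline inputs, and a clearly positive value suggests θ carries extra information. The function names, the binary-outcome setup, and the sample numbers below are all hypothetical. The "CB" model named in the column headers is not reproduced here.

```python
import math

def log_loss(labels, probs, eps=1e-12):
    """Mean negative log-likelihood of binary labels under predicted P(label = 1)."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= math.log(p) if y == 1 else math.log(1.0 - p)
    return total / len(labels)

def log_loss_improvement(labels, probs_base, probs_full):
    """Assumed definition: baseline log loss minus log loss of the augmented model."""
    return log_loss(labels, probs_base) - log_loss(labels, probs_full)

# Hypothetical held-out data, not taken from the benchmark.
labels = [1, 0, 1, 0]
probs_base = [0.5, 0.5, 0.5, 0.5]   # predictor that sees only p
probs_full = [0.8, 0.2, 0.8, 0.2]   # predictor that also sees theta

improvement = log_loss_improvement(labels, probs_base, probs_full)
no_change = log_loss_improvement(labels, probs_base, probs_base)
print(f"improvement = {improvement:.4f} nats, no-change baseline = {no_change:.4f}")
```

With natural logarithms the result is in nats. For ideal predictors, the expected gap between the two log losses equals the conditional mutual information between A and θ given the baseline inputs, which may be why the table reports both quantities. The CMI estimator used on this page is not specified and may differ.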