Conditional-independence (belief sufficiency) testing on Fever Expert-Constructed Bayesian Network
[Chart: CMI by method. Plotted point: Llama, CMI 32.89, Feb 6, 2026.]
Updated 4d ago
Evaluation Results
| Method | Tags | Date | CMI | CMI 95% CI | LogLoss Imp. (CB: A~p vs A~(p,θ)) | LogLoss Imp. CI (CB: A~p vs A~(p,θ)) | LogLoss Imp. (CB: A~(p,x) vs A~(p,x,θ)) | LogLoss Imp. CI (CB: A~(p,x) vs A~(p,x,θ)) |
|---|---|---|---|---|---|---|---|---|
| Llama | Model=Llama | 2026.02 | 32.89 | - | 8.38 | - | 4.75 | - |
| DeepSeek | Model=DeepSeek | 2026.02 | 20.6 | - | 22.83 | - | 15.83 | - |
| GPT-Min | Model=GPT-Min | 2026.02 | 14.46 | - | 12.67 | - | 9.45 | - |
| GPT-High | Model=GPT-High | 2026.02 | 9.44 | - | 4.74 | - | 2.27 | - |
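The column names suggest that each log-loss improvement compares a predictor of A given p with one that also sees θ. The sketch below illustrates that idea on toy discrete data, and how it relates to conditional mutual information (CMI). It is not the benchmark's procedure: the page does not specify the "CB" classifier, the estimator, or the units of the reported numbers. The function names (`conditional_log_loss`, `plugin_cmi`) and the toy data are hypothetical. When both predictors are plain empirical conditional frequencies, the in-sample log-loss improvement equals the plug-in estimate of I(A; θ | p).

```python
import math
from collections import Counter

def conditional_log_loss(targets, contexts):
    """Mean in-sample log loss (nats) of the empirical frequency
    predictor P(target | context). Illustrative only."""
    joint = Counter(zip(contexts, targets))   # counts of (context, target)
    ctx = Counter(contexts)                   # counts of context alone
    n = len(targets)
    return -sum(c * math.log(c / ctx[k[0]]) for k, c in joint.items()) / n

def plugin_cmi(a, theta, p):
    """Plug-in estimate of I(A; theta | p) in nats for discrete samples."""
    n = len(a)
    n_apt = Counter(zip(a, p, theta))
    n_ap = Counter(zip(a, p))
    n_pt = Counter(zip(p, theta))
    n_p = Counter(p)
    total = 0.0
    for (ai, pi, ti), c in n_apt.items():
        total += c * math.log(c * n_p[pi] / (n_ap[(ai, pi)] * n_pt[(pi, ti)]))
    return total / n

# Toy data (assumed, not from the benchmark): theta fully determines A,
# while p alone says nothing about A.
a     = [1, 1, 0, 0, 1, 0, 1, 0]
p     = [0, 0, 0, 0, 1, 1, 1, 1]
theta = [1, 1, 0, 0, 1, 0, 1, 0]

base = conditional_log_loss(a, p)                     # model A ~ p
full = conditional_log_loss(a, list(zip(p, theta)))   # model A ~ (p, theta)
improvement = base - full
cmi = plugin_cmi(a, theta, p)

print(f"log-loss improvement: {improvement:.4f} nats")
print(f"plug-in CMI:          {cmi:.4f} nats")
```

Under belief sufficiency (A independent of θ given p), both quantities would be close to zero, so larger values indicate that θ carries information about A beyond p. A learned classifier evaluated on held-out data, as the column headers imply, would give a noisier estimate of the same difference and can come out negative.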