Counterfactual Generation on AI-READI Class 0
[Chart: Validity by method over time. Top result: GPT-4, 0.99 (Jan 21, 2026). Other selectable metrics: Distance, Sparsity, Plausibility.]
Evaluation Results
Method        Mode          Date      Validity  Distance  Sparsity  Plausibility
GPT-4         Few-shot      2026.01   0.99      1.2       4.4       99
Llama*        Fine-tuned    2026.01   0.99      0.41      1.8       99
BioMistral*   Fine-tuned    2026.01   0.93      0.92      2.27      90
GPT-4         Zero-shot     2026.01   0.91      1.1       3.6       85
CFNOW         -             2026.01   0.85      0.1       2.9       100
DiCE          -             2026.01   0.67      0.2       2.27      100
Llama         Pre-trained   2026.01   0.62      1.6       4.6       91
BioMistral    Pre-trained   2026.01   0.51      1.4       5.2       77
NICE          -             2026.01   0.44      0.02      1.12      33
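The page does not define its metrics. As a hedged sketch, assuming the definitions common in the counterfactual-explanation literature (Validity as the share of counterfactuals the model assigns to the target class, Sparsity as the number of features changed, Distance as an L1 distance with each feature divided by a per-feature scale such as its range), three of the four columns could be computed as below. The function names, the scaling choice, and the feature values in the usage example are illustrative assumptions; the benchmark's actual definitions may differ, and Plausibility is not covered because its definition is not stated.

```python
from typing import Sequence


def validity(predicted_labels: Sequence[int], target_label: int) -> float:
    """Assumed definition: fraction of counterfactuals predicted as the target class."""
    if len(predicted_labels) == 0:
        raise ValueError("no counterfactuals given")
    hits = sum(1 for y in predicted_labels if y == target_label)
    return hits / len(predicted_labels)


def sparsity(original: Sequence[float], counterfactual: Sequence[float],
             tol: float = 1e-9) -> int:
    """Assumed definition: number of features that differ by more than `tol`."""
    if len(original) != len(counterfactual):
        raise ValueError("instances must have the same number of features")
    return sum(1 for a, b in zip(original, counterfactual) if abs(a - b) > tol)


def l1_distance(original: Sequence[float], counterfactual: Sequence[float],
                scales: Sequence[float]) -> float:
    """Assumed definition: L1 distance, each feature divided by a hypothetical scale."""
    if not (len(original) == len(counterfactual) == len(scales)):
        raise ValueError("inputs must have the same length")
    return sum(abs(a - b) / s for a, b, s in zip(original, counterfactual, scales))


if __name__ == "__main__":
    # Hypothetical three-feature instance and counterfactual, for illustration only.
    x = [120.0, 6.5, 30.0]
    cf = [100.0, 6.5, 28.0]
    feature_ranges = [200.0, 10.0, 50.0]
    print(validity([1, 1, 0, 1], target_label=1))  # 3 of 4 reach the target: 0.75
    print(sparsity(x, cf))                          # two features changed: 2
    print(l1_distance(x, cf, feature_ranges))       # 20/200 + 0 + 2/50, about 0.14
```

Under these assumed definitions, a method can score low on Distance and Sparsity while still having low Validity, which is consistent with how, for example, NICE and CFNOW rank in the table above.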