Counterfactual Generation on AI-READI (Class 1)
[Chart: Validity. Top result: Llama*, 98 (Jan 21, 2026). Metrics: Validity, Distance, Sparsity, Plausibility.]
Evaluation Results
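| Method      | Mode        | Date    | Validity | Distance | Sparsity | Plausibility |
|-------------|-------------|---------|----------|----------|----------|--------------|
| Llama*      | Fine-tuned  | 2026.01 | 98       | 20       | 1.9      | 99           |
| GPT-4       | Few-shot    | 2026.01 | 92       | 182      | 4        | 96           |
| BioMistral* | Fine-tuned  | 2026.01 | 91       | 100      | 2.1      | 95           |
| GPT-4       | Zero-shot   | 2026.01 | 89       | 150      | 3.8      | 82           |
| CFNOW       |             | 2026.01 | 84       | 25       | 3        | 99           |
| Llama       | Pre-trained | 2026.01 | 68       | 130      | 3.8      | 78           |
| DiCE        |             | 2026.01 | 58       | 41       | 2.4      | 99           |
| NICE        |             | 2026.01 | 53       | 4        | 1.31     | 35           |
| BioMistral  | Pre-trained | 2026.01 | 47       | 150      | 4.1      | 70           |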