Safety & Helpfulness Evaluation on XSTest
Top XSTest Score: 74.8 (Default)
[Chart: XSTest Score by date; Feb 16, 2026]
Updated 4d ago
Evaluation Results
| Method         | Model       | Date    | XSTest Score |
|----------------|-------------|---------|--------------|
| Default        | Qwen2.5-14B | 2026.02 | 74.8         |
| CB             | Llama3-8B   | 2026.02 | 67.2         |
| Default        | Llama3-8B   | 2026.02 | 66           |
| CAT            | Qwen2.5-14B | 2026.02 | 54.4         |
| Diffusion-only | Llama3-8B   | 2026.02 | 53.6         |
| CAT            | Llama3-8B   | 2026.02 | 46.4         |
| DAT            | Llama3-8B   | 2026.02 | 46.4         |
| DAT            | Qwen2.5-14B | 2026.02 | 46.4         |
| MixAT-GCG      | Llama3-8B   | 2026.02 | 45.6         |
| LAT            | Llama3-8B   | 2026.02 | 44.4         |
| MixAT-GCG      | Qwen2.5-14B | 2026.02 | 38.8         |