Instruction Adherence and Security Robustness on StruQ Clean 1.0
Top Capability Score: 84.87 (Text), Dec 3, 2025
[Chart: Capability Score by date. Available metrics: Capability Score, ASR (Naive), ASR (Ignore), ASR (Escape-S), ASR (Completion-R), ASR (Average), ASR (Worst)]
Updated 4d ago
Evaluation Results
Method     Settings                  Date     Capability Score  ASR (Naive)  ASR (Ignore)  ASR (Escape-S)  ASR (Completion-R)  ASR (Average)  ASR (Worst)
Text       Training Type=Vanilla...  2025.12  84.87             32.21        38.94         20.67           94.23               46.51          94.23
Delimiter  Training Type=Special...  2025.12  84.23             32.69        47.6          23.56           99.04               50.72          99.04
CAHL                                 2025.12  83.6              23.08        24.04         20.19           37.98               26.32          37.98
ISE                                  2025.12  78.64             21.15        30.29         20.67           61.54               33.41          61.54
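
The ASR (Average) and ASR (Worst) figures above appear to be derived from the four per-attack columns. The sketch below checks that reading under two assumptions that the page itself does not state: that ASR (Average) is the arithmetic mean of the Naive, Ignore, Escape-S, and Completion-R values rounded to two decimals, and that ASR (Worst) is their maximum. The names `asr_by_method` and `summarize` are hypothetical, introduced only for illustration.

```python
# Per-attack ASR values (%) copied from the Evaluation Results listing.
# Order within each list: Naive, Ignore, Escape-S, Completion-R.
asr_by_method = {
    "Text": [32.21, 38.94, 20.67, 94.23],
    "Delimiter": [32.69, 47.6, 23.56, 99.04],
    "CAHL": [23.08, 24.04, 20.19, 37.98],
    "ISE": [21.15, 30.29, 20.67, 61.54],
}


def summarize(asrs):
    """Return (average, worst) for one method.

    Assumption (not stated on the page): average is the arithmetic
    mean rounded to two decimals, and worst is the maximum, since a
    higher attack success rate means a less robust method.
    """
    average = round(sum(asrs) / len(asrs), 2)
    worst = max(asrs)
    return average, worst


summary = {method: summarize(asrs) for method, asrs in asr_by_method.items()}

for method, (average, worst) in summary.items():
    print(f"{method}: ASR (Average)={average:.2f}, ASR (Worst)={worst:.2f}")
```

Under these assumptions the computed values match the ASR (Average) and ASR (Worst) entries listed for all four methods, and in every row the worst case is the Completion-R attack.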