
Instruction Hierarchy

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Phrase Protection | Instruction Hierarchy (test) | User Message Protection Accuracy: 97.5 | 4
System Prompt Extraction | Instruction Hierarchy (test) | Attack Success Rate (Realistic User): 99.7 | 4