Indirect Prompt Injection Red-teaming on RTC-BENCH Aggregated (OwnCloud, Reddit, RocketChat)
Top result: GPT-4o, 66.19 ASR (Avg), May 28, 2025
[Chart: ASR (Avg) and AR (Avg)]
Updated 4d ago
Evaluation Results
Method                     Agent Category   Date      ASR (Avg)   AR (Avg)
GPT-4o                     Adapted...       2025.05   66.19       92.45
Claude 3.7 Sonnet | CUA    Special...       2025.05   42.93       64.39
Claude 3.5 Sonnet          Adapted...       2025.05   41.37       64.27
Claude 3.7 Sonnet          Adapted...       2025.05   39.33       58.99
Claude 3.5 Sonnet | CUA    Special...       2025.05   31.21       74.43
Operator (w/o checks)      Special...       2025.05   30.89       47.84
Operator                   Special...       2025.05   7.57        14.06