SOTA Agent Safety benchmarks and papers with code | Wizwand
Agent Safety
Benchmarks
| Dataset Name | SOTA Method | Metric | Results | Last Updated |
|---|---|---|---|---|
| R-Judge | Gemini-3.1-Pro | Accuracy: 97.3 | 92 | 1d ago |
| ASSEBench | DRAFT (Qwen3Guard-Gen-4B) | Accuracy: 92.04 | 69 | 1d ago |
| AuraGen | Extractor | Accuracy: 95.31 | 47 | 1mo ago |