Malicious Action Detection on tau-Bench retail (test)
[Chart: TPR and FPR over time. Highlighted point: GPT-OSS-Safeguard 20B, TPR 7.8, Oct 3, 2025.]
Evaluation Results
Method                     Date      TPR    FPR
GPT-OSS-Safeguard 20B      2025.10   7.8    51.8
AprielGuard                2025.10   4.3    6.3
Granite-Guardian 3.3-8B    2025.10   0      0.9
Qwen-3-Guard               2025.10   0      0.2