Malicious Action Detection on tau-Bench retail (test)

TPR: 7.8

GPT-OSS-Safeguard 20B

Oct 3, 2025
Updated 1mo ago

Evaluation Results

Method    Links
7.8    51.8
4.3    6.3
0    0.9
0    0.2