TRUST-BENCH

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Malicious Tool Call Detection | TRUST-BENCH reconstructed comparison set, curated, 1,970 episodes (test split) | AMR 100 | 20