
SHADE-Arena

Benchmarks

Task name                 Dataset name         SOTA result         Trend
LLM behavior monitoring   SHADE-Arena (test)   Accuracy: 75.2      8
Sabotage detection        SHADE-Arena          log-AUROC: 78       8