
OS

Benchmarks

Task Name                          Dataset Name                     SOTA Result     Trend
Sentence Classification            OS full (test)                   Accuracy 95.1   9
Traffic Speed Prediction           OS I4                            RMSE 5.86       8
Reasoning-Level Denial-of-Service  OS Environment Injection (test)  E2E Success 80  4