
Compliance

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Benign Infection Control | Compliance | Metric M Score: 100 | 12 |
| Multi-Evidence Aggregation | Compliance | Accuracy: 98 | 9 |
| Cross-domain generalization | Compliance (test) | Accuracy: 97.8 | 4 |