
Quilt-1M

Benchmarks

Task Name                    Dataset Name          SOTA Result  Trend
Metric Sensitivity Analysis  Quilt-1M Logic Error  Score 91     5
Metric Sensitivity Analysis  Quilt-1M Control      Score 92     5