Share your thoughts, 1 month free Claude Pro on usSee more
WorkDL logo mark

DCI Evaluation Suite

Benchmarks

Task NameDataset NameSOTA ResultTrend
Routine Task ManagementDCI Evaluation Suite Routine
Quality Score8.96
5
Software ArchitectureDCI Evaluation Suite Arch.
Quality Score9.19
5
Policy AnalysisDCI Evaluation Suite Policy
Quality Score8.97
5
Disagreement HandlingDCI Evaluation Suite Disagree
Quality Score8.87
5
Risk AssessmentDCI Evaluation Suite Risk
Quality Score8.85
5
Late-Evidence AnalysisDCI Evaluation Suite Late-Evid.
Quality Score9.6
5
Hidden-Profile IntegrationDCI Evaluation Suite Hidden-Prof
Quality Score9.56
5
Showing 7 of 7 rows