
Task 2

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Bias Evaluation | Task 2 Persona F | BAD Score 0.3 | 25 |
| Bias Evaluation | Task 2 Persona D | BAD Score 0.007 | 25 |
| Bias Evaluation | Task 2 Persona C | BAD Score -0.18 | 25 |
| Bias Evaluation | Task 2 Persona B | BAD Score -0.007 | 25 |
| Bias Evaluation | Task 2 Persona A | BAD Score -0.006 | 25 |
| Disease Prediction | Task 2 Type 2 Diabetes (test) | AUROC 0.83 | 10 |
| Morphologic and molecular classification | Task 2 | Accuracy 74.8 | 8 |
| Robot Manipulation | Task 2 Concept-rich 1.0 (train) | Probability of Improvement 0.93 | 5 |
| Question Answering | Task 2 Cross-domain | Answer Accuracy 59.4 | 4 |
| Question Answering | Task 2 Single-domain | Answer Accuracy 79.98 | 4 |
Showing 10 of 10 rows