
Robust

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Information Retrieval | Robust04 | P@20: 46.67 | 72 |
| Information Retrieval | robust | Recall@100: 46.5 | 19 |
| Document Reranking | Robust04 Description | MAP: 0.4084 | 13 |
| Relevance Assessment Label Alignment | Robust 2004 | Cohen's Kappa (κ): 0.56 | 11 |
| Document Retrieval | Robust TREC 2004 (test) | P@20: 51 | 10 |
| Pseudotime estimation | Robust V2 (pooled donor-holdout) | Mean Difference: -0.293 | 9 |
| Document Retrieval | Robust04 EN | NDCG@10: 56.38 | 8 |
| Information Retrieval | ROBUST04 (test) | AP@1000: 27.47 | 8 |
| Information Retrieval | Robust04 BEIR (test) | nDCG@10: 0.567 | 7 |
| Information Retrieval | Robust04 Title queries (test) | MAP: 29.04 | 7 |
| Passage Reranking | Robust04 (test) | MAP: 0.2901 | 5 |
| Stage classification | Robust cells V2 (test) | Balanced Accuracy: 56.7 | 4 |
| Pseudotime inference | Robust cells V2 (test) | Spearman Correlation (Pseudotime-depth): 0.249 | 2 |
| CD4/CD8 identification | Robust cells V2 (test) | AUROC: 0.867 | 2 |
| Branch classification | Robust V2 (test) | Balanced Acc: 82.8 | 2 |
| Trustworthiness Evaluation | Robust (human evaluation) | Control Wins: 100 | 1 |