
news

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Realistic color video completion | News 144×176×3×30 | PSNR 38.6 | 70 |
| Tensor Completion | News 144 x 176 x 100 | PSNR 34.9 | 35 |
| Treatment Effect Estimation | NEWS semi-synthetic | Mean Error 0 | 22 |
| Treatment Effect Estimation | NEWS semi-synthetic (test) | MSE 0 | 22 |
| Summarization | news multi | Rouge-L 23.66 | 21 |
| Named Entity Recognition | NEWS | F1 Score 86.15 | 21 |
| English-German document-level translation | News English-German (test) | s-BLEU 30.34 | 20 |
| Passage Reranking | News BEIR | NDCG@10 49.32 | 19 |
| Information Retrieval | news | Recall@100 52.7 | 19 |
| Marginal Distribution Alignment | News | Error Rate 1.72 | 18 |
| Tabular Data Synthesis | News | C2ST 97.93 | 18 |
| Tabular Data Synthesis | News | Pairwise Correlation Alignment Error 1.34 | 18 |
| Tabular Data Generation | News | DCR-0021.0325 | 18 |
| News Recommendation | NEWS (test) | AUC 64.68 | 18 |
| Out-of-Distribution Detection | News (test) | AUROC 80.7 | 17 |
| Out-of-Distribution Detection | News | FPR 69.31 | 17 |
| Regression | News (test) | MSE 0.69 | 17 |
| Privacy Preservation | News (test) | DCR Score 99 | 16 |
| LLM Unlearning | NEWS | Verification Memory (VerMem) 22.09 | 16 |
| Individual Treatment Effect (ITE) Estimation | NEWS (out) | PEHE 0.44 | 16 |
| Individual Treatment Effect (ITE) Estimation | NEWS (in) | PEHE 0.25 | 16 |
| ATE estimation | News | Joint Bias (JB) 0.07 | 14 |
| Dosage Policy Estimation (DPE) | News (test) | Mean DPE 2.69 | 12 |
| Single change-point detection | News | WD 0.12 | 12 |
| Machine Text Detection | News | Claude 3.5 Rewrite AUC 1 | 11 |
Showing 25 of 105 rows