news

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Realistic color video completion | News 144×176×3×30 | PSNR 38.6 | 70 |
| Tensor Completion | News 144 x 176 x 100 | PSNR 34.9 | 35 |
| Treatment Effect Estimation | NEWS semi-synthetic | Mean Error 0 | 22 |
| Treatment Effect Estimation | NEWS semi-synthetic (test) | MSE 0 | 22 |
| Summarization | news multi | Rouge-L 23.66 | 21 |
| Named Entity Recognition | NEWS | F1 Score 86.15 | 21 |
| English-German document-level translation | News English-German (test) | s-BLEU 30.34 | 20 |
| Information Retrieval | news | Recall@100 52.7 | 19 |
| Tabular Data Generation | News | DCR-0021.0325 | 18 |
| News Recommendation | NEWS (test) | AUC 64.68 | 18 |
| Out-of-Distribution Detection | News (test) | AUROC 80.7 | 17 |
| Out-of-Distribution Detection | News | FPR 69.31 | 17 |
| Regression | News (test) | MSE 0.69 | 17 |
| LLM Unlearning | NEWS | Verification Memory (VerMem) 22.09 | 16 |
| Individual Treatment Effect (ITE) Estimation | NEWS (out) | PEHE 0.44 | 16 |
| Individual Treatment Effect (ITE) Estimation | NEWS (in) | PEHE 0.25 | 16 |
| ATE estimation | News | Joint Bias (JB) 0.07 | 14 |
| Machine Text Detection | News | Claude 3.5 Rewrite AUC 1 | 11 |
| Misclassification Detection | News | ROC-AUC (Misclassification Detection) 88.8 | 10 |
| Tabular Data Synthesis | News | Rank 1 | 10 |
| Named Entity Recognition | News (test) | F1 Score 80.86 | 10 |
| Retrieval Question Answering | News in-domain | MRR 46.6 | 10 |
| Tabular Data Generation | News | Beta Recall 43.1 | 9 |
| Tabular Data Generation | News | alpha-PRECISION 98.24 | 9 |
| Tabular Data Privacy Evaluation | News | DCR-0050.0001 | 9 |
Showing 25 of 85 rows