news

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Treatment Effect Estimation | NEWS semi-synthetic | Mean Error: 0 | 22 |
| Treatment Effect Estimation | NEWS semi-synthetic (test) | MSE: 0 | 22 |
| Summarization | news multi | Rouge-L: 23.66 | 21 |
| Named Entity Recognition | NEWS | F1 Score: 86.15 | 21 |
| English-German document-level translation | News English-German (test) | s-BLEU: 30.34 | 20 |
| News Recommendation | NEWS (test) | AUC: 64.68 | 18 |
| Out-of-Distribution Detection | News (test) | AUROC: 80.7 | 17 |
| Out-of-Distribution Detection | News | FPR: 69.31 | 17 |
| Regression | News (test) | MSE: 0.69 | 17 |
| LLM Unlearning | NEWS | Verification Memory (VerMem): 22.09 | 16 |
| Individual Treatment Effect (ITE) Estimation | NEWS (out) | PEHE: 0.44 | 16 |
| Individual Treatment Effect (ITE) Estimation | NEWS (in) | PEHE: 0.25 | 16 |
| ATE estimation | News | Joint Bias (JB): 0.07 | 14 |
| Machine Text Detection | News | Claude 3.5 Rewrite AUC: 1 | 11 |
| Named Entity Recognition | News (test) | F1 Score: 80.86 | 10 |
| Retrieval Question Answering | News in-domain | MRR: 46.6 | 10 |
| Classification | News | Macro Precision: 93.87 | 9 |
| Classification | News (test) | Average Inference Time (s): 0.5233 | 9 |
| Hierarchical Agglomerative Clustering | news | AMI: 0.627 | 9 |
| Clustering | news | ARI: 47 | 9 |
| Scientific Text Simplification | News | d-BLEU: 4.61 | 9 |
| Text Classification | News FTC-metadataset mini 10% | AUROC: 99.07 | 8 |
| Text Classification | News FTC-metadataset full | AURAC: 0.9837 | 8 |
| Text Classification | News FTC-metadataset full | NLL: 0.1423 | 8 |
| Text Classification | News FTC-metadataset full | Average Prediction Set Size: 1.2228 | 8 |

Showing 25 of 54 rows