
Review

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Personalized Generation | Review (test) | Accuracy: 95.76 | 10 |
| Personalized Response Generation | Review Interpolated Users | Winrate: 84.6 | 8 |
| Personalized Response Generation | Review Trained Users | Winrate: 92.3 | 8 |
| Abstract Screening | Review 1 821 abstracts (Final Includes) | False Positives: 45 | 8 |
| Full-Text Screening | Review 1 | False Positives: 18 | 8 |
| Document-Level Anomaly Detection | Review (test) | AUROC: 0.9594 | 7 |
| Token-Level Anomaly Detection | Review (test) | AUROC: 0.8271 | 7 |
| Scoring | Review-5K | MAE: 1.957 | 5 |
| Full-Text Inclusion Screening | Review 2 (7741 abstracts) | False Positives (FP): 87 | 5 |
| Abstract Screening | Review 2 (Final Includes) | - | 0 |