medical

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Question Answering | Medical | GPT Accuracy | 68.81 | 31 |
| Language Modeling | Medical (Med) | PPL Change (%) vs Baseline | 0.1 | 30 |
| Partial Multi-label Learning | medical | Average Precision | 87.5 | 21 |
| Partial Multi-label Learning | medical | Ranking Loss | 0.03 | 21 |
| Rubric Satisfaction Evaluation | Medical | Claude-4 Sonnet Score | 50.9 | 21 |
| Hypernym Discovery | medical Gold standard domain-specific (test) | MRR | 77.32 | 18 |
| Medical Task | Medical | Accuracy | 100 | 16 |
| Image Quality Assessment | Medical | PLCC | 0.871 | 15 |
| Summarization | Medical Random subset | R-LCS | 25.04 | 14 |
| Medical Question Answering | Medical | Score | 81.55 | 14 |
| Preference Evaluation | Medical | Avg Score | 8.58 | 14 |
| Budgeted Hybrid Routing | Medical Average Global | Spearman Correlation | 1 | 12 |
| Budgeted Hybrid Routing | Medical Ru→En | HitRate@p | 100 | 12 |
| Budgeted Hybrid Routing | Medical Zh→En | HitRate@p | 100 | 12 |
| Budgeted Hybrid Routing | Medical En→Ru | HitRate@p | 100 | 12 |
| Budgeted Hybrid Routing | Medical En→Zh | HitRate@p | 100 | 12 |
| Retrieval-Augmented Generation | Medical | Indexing Time (minutes) | 7 | 11 |
| Multi-label Classification | Medical | Micro F1-Score | 76.6 | 11 |
| Classification | Medical | F1 Score | 76.6 | 10 |
| Importance-based Node Leakage | Medical | Leakage (Deg) | 36.2 | 10 |
| Factual Precision Evaluation | Medical | SAFE | 87.3 | 10 |
| Machine Translation | Medical (test) | BLEU | 55.42 | 9 |
| Summarization | Medical (OOV_SD) | R-LCS | 26.68 | 8 |
| Multi-label Feature Selection | medical | Hamming Loss | 0.011 | 7 |
| Multi-label Feature Selection | medical | Running Time (sec) | 0.11 | 7 |
Showing 25 of 51 rows.