DDXPlus

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Medical Question Answering | DDXPlus | Accuracy: 86.5 | 43 |
| Medical Reasoning | DDXPlus | Accuracy (DDXPlus): 77.9 | 17 |
| Medical Reasoning | DDXPlus | Token Cost: 2,383 | 11 |
| Medical Reasoning | DDXPlus | Performance Score: 90.2 | 11 |
| Automated Medical Diagnosis | DDXPlus (test) | IL: 25.75 | 9 |
| Medical Differential Diagnoses | DDXPlus | Avg Correct: 79 | 8 |
| Privacy Rewriting | DDXPlus Pri | Accuracy: 87.6 | 7 |
| Confidence Estimation | DDXPlus | AUROC: 0.795 | 7 |
| Calibration | DDXPlus | Top-1 ECE: 0.01 | 4 |
| Classification | DDXPlus | Accuracy: 50.1 | 4 |
| Synthetic Data Utility | DDXPlus | Overall Score: 99.9 | 3 |
| Privacy Evaluation | DDXPlus | Overall Score: 100 | 3 |
| Synthetic Data Detection | DDXPlus | Overall Score: 37.7 | 3 |
| Tabular Data Synthesis | DDXPlus | Overall Score: 97.2 | 3 |
Showing 14 of 14 rows