
medical

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Rubric satisfaction evaluation | Medical | Claude-4 Sonnet Score 50.9 | 21 |
| Hypernym discovery | medical Gold standard domain-specific (test) | MRR 77.32 | 18 |
| Preference Evaluation | Medical | Avg Score 8.58 | 14 |
| Importance-based Node Leakage | Medical | Leakage (Deg) 36.2 | 10 |
| Factual Precision Evaluation | Medical | SAFE 87.3 | 10 |
| Machine Translation | Medical (test) | BLEU 55.42 | 9 |
| MRI to CT translation | medical MRI→CT 256 × 256 (test) | NFE 4 | 7 |
| Machine Translation | Medical All-domain datastore (test) | BLEU 55.1 | 6 |
| Access Control | Medical | Accuracy 100 | 5 |
| Machine Translation | Medical out-of-domain (test) | BLEU 15.4 | 5 |
| Mixed Linear Regression | medical | Minimal Error (K=2) 0.1591 | 5 |
| Machine Translation | Medical multi-domain (test) | Decoding Throughput (Tok/Sec) 3,152.59 | 2 |
| DSL Evaluation | Medical | Opinion 4.4 | 1 |