Medical Logical Reasoning (Rule 2: is_mel_or_bcc) on HAM 5000 samples (test)
[Chart: F1 Score (NS-CL) by date; alternate metric: F1 Score (Decision Tree); y-axis ticks 70.568, 78.209, 85.85, 93.491; labeled point: GroundTruth (Oracle), 100, May 5, 2026]
Updated 22d ago
Evaluation Results
| Method | Supervision (%) | Date | F1 Score (NS-CL) | F1 Score (Decision Tree) |
| GroundTruth (Oracle) | Oracle/GT | 2026.05 | 100 | 100 |
| GlobalVAE | 75% | 2026.05 | 91.2 | 90.3 |
| SlotVAE (k=1) | 75% | 2026.05 | 71.7 | 70.7 |