Multi-task evaluation on Aggregated Clinical Tasks
[Chart: Average Score by date. Best result: Full FT, 74.6 (Apr 8, 2026).]
Updated 9d ago
Evaluation Results
Method    Base model        Date      Average Score
Full FT   gpt-oss 20B...    2026.04   74.6
MPT       gpt-oss 20B...    2026.04   73.9
LoRA      gpt-oss 20B...    2026.04   72.4
Full FT   Meditron3 8...    2026.04   72.2
MPT       Meditron3 8...    2026.04   71.5
Full FT   LLaMA 3.1 8...    2026.04   69.9
LoRA      Meditron3 8...    2026.04   69.9
MPT       LLaMA 3.1 8...    2026.04   69.1
PT        gpt-oss 20B...    2026.04   67.9
LoRA      LLaMA 3.1 8...    2026.04   67.4
PT        Meditron3 8...    2026.04   65.1
PT        LLaMA 3.1 8...    2026.04   62.5