Medical Reasoning on MedQA and MedMCQA mixture
Best result: 59.4 Pass@1 (FedAvg-PubSwap)
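The page does not state how Pass@1 is computed for this benchmark. A common definition, assumed in this sketch, is the unbiased pass@k estimator with k = 1. For a question with n sampled answers of which c are correct, the estimate reduces to c/n, and the benchmark score is the mean over questions, expressed as a percentage. With a single sampled answer per question this is plain accuracy. The function names and the per-question counts below are hypothetical and do not come from the leaderboard.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one question (n samples, c correct)."""
    if n - c < k:
        # Every size-k subset of the samples contains a correct answer.
        return 1.0
    # 1 minus the probability that all k drawn samples are incorrect.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_1(results: list[tuple[int, int]]) -> float:
    """Mean pass@1 over questions, as a percentage (assumed reporting scale)."""
    scores = [pass_at_k(n, c, 1) for n, c in results]
    return 100.0 * sum(scores) / len(scores)


# Hypothetical (n_samples, n_correct) counts for four questions.
example = [(4, 4), (4, 2), (4, 0), (4, 1)]
print(benchmark_pass_at_1(example))  # 43.75
```

For k = 1 the combinatorial form collapses to c/n, so each question contributes its fraction of correct samples: 1.0, 0.5, 0.0 and 0.25 here.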
[Chart: Pass@1 (y-axis) by result date (x-axis); plotted results dated Apr 14, 2026]
Updated 3d ago
Evaluation Results
| Method | Configuration | Date | Pass@1 |
|---|---|---|---|
| FedAvg-PubSwap | Backbone=Llama3.2-3B-I... | 2026.04 | 59.4 |
| FedAvg-GRPO | Backbone=Llama3.2-3B-I... | 2026.04 | 58.7 |
| FedAvg-PubSwap | Backbone=Llama3.2-3B-I... | 2026.04 | 58.5 |
| FedAvg-GRPO | Backbone=Llama3.2-3B-I... | 2026.04 | 58.2 |
| FedAvg-PubSwap | Backbone=Llama3.2-3B-I... | 2026.04 | 58.1 |
| FedAvg-GRPO | Backbone=Llama3.2-3B-I... | 2026.04 | 57.9 |
| FedAvg-PubSwap | Backbone=Llama3.2-3B-I... | 2026.04 | 57.5 |
| FedAvg-GRPO | Backbone=Llama3.2-3B-I... | 2026.04 | 56.0 |
| Base model | Backbone=Llama3.2-3B-I... | 2026.04 | 49.2 |
| Base model | Backbone=Llama3.2-3B-I... | 2026.04 | 49.2 |
| Base model | Backbone=Llama3.2-3B-I... | 2026.04 | 49.2 |
| Base model | Backbone=Llama3.2-3B-I... | 2026.04 | 49.2 |
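To read the evaluation results as improvements over the untuned backbone, the short script below takes each method's best Pass@1 and subtracts the base-model score. The configurations are truncated on the page ("Backbone=Llama3.2-3B-I..."), so this sketch groups rows by method name only and cannot pair runs by setting; since every base-model row reports 49.2, the baseline does not depend on that pairing. The helper names are illustrative, not part of the leaderboard.

```python
# Pass@1 values transcribed from the Evaluation Results table, in page order.
# Row configurations are truncated on the page, so only the method name is kept.
RESULTS = [
    ("FedAvg-PubSwap", 59.4),
    ("FedAvg-GRPO", 58.7),
    ("FedAvg-PubSwap", 58.5),
    ("FedAvg-GRPO", 58.2),
    ("FedAvg-PubSwap", 58.1),
    ("FedAvg-GRPO", 57.9),
    ("FedAvg-PubSwap", 57.5),
    ("FedAvg-GRPO", 56.0),
    ("Base model", 49.2),
    ("Base model", 49.2),
    ("Base model", 49.2),
    ("Base model", 49.2),
]


def best_by_method(rows: list[tuple[str, float]]) -> dict[str, float]:
    """Highest Pass@1 reported for each method name."""
    best: dict[str, float] = {}
    for method, score in rows:
        best[method] = max(score, best.get(method, float("-inf")))
    return best


def gains_over_baseline(rows: list[tuple[str, float]],
                        baseline: str = "Base model") -> dict[str, float]:
    """Percentage-point gain of each method's best run over the baseline."""
    best = best_by_method(rows)
    base = best[baseline]
    # Round to one decimal to match the table's precision.
    return {m: round(s - base, 1) for m, s in best.items() if m != baseline}


print(gains_over_baseline(RESULTS))  # {'FedAvg-PubSwap': 10.2, 'FedAvg-GRPO': 9.5}
```

Rounding to one decimal hides floating-point noise from the subtraction (for example, 59.4 - 49.2 is not exactly 10.2 in binary floating point).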