HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs

About

The breakthrough of OpenAI o1 highlights the potential of enhancing reasoning to improve LLM. Yet, most research in reasoning has focused on mathematical tasks, leaving domains like medicine underexplored. The medical domain, though distinct from mathematics, also demands robust reasoning to provide reliable answers, given the high standards of healthcare. However, verifying medical reasoning is challenging, unlike those in mathematics. To address this, we propose verifiable medical problems with a medical verifier to check the correctness of model outputs. This verifiable nature enables advancements in medical reasoning through a two-stage approach: (1) using the verifier to guide the search for a complex reasoning trajectory for fine-tuning LLMs, (2) applying reinforcement learning (RL) with verifier-based rewards to enhance complex reasoning further. Finally, we introduce HuatuoGPT-o1, a medical LLM capable of complex reasoning, which outperforms general and medical-specific baselines using only 40K verifiable problems. Experiments show complex reasoning improves medical problem-solving and benefits more from RL. We hope our approach inspires advancements in reasoning across medical and other specialized domains.

Junying Chen, Zhenyang Cai, Ke Ji, Xidong Wang, Wanlong Liu, Rongsheng Wang, Jianye Hou, Benyou Wang• 2024

Related benchmarks

Task	Dataset	Result
Medical Question Answering	MedMCQA	Accuracy76.76	521
Mathematical Reasoning	MATH 500	Top-1 Accuracy73.2	384
Reasoning	MMLU-Pro	Accuracy74.16	241
Reasoning	GPQA Diamond	Accuracy50	185
Question Answering	PubMedQA (test)	Accuracy80.6	170
Medical Question Answering	MedQA	Accuracy88.85	153
Medical Question Answering	MedMCQA (test)	Accuracy73.61	134
Multiple-choice Question Answering	MMLU-Pro	MMLU-Pro Overall Accuracy57.58	130
Medical Question Answering	MedQA	Accuracy72.6	124
Scientific Question Answering	GPQA Diamond	Accuracy52.53	123

Showing 10 of 86 rows

...

Other info

Follow for update

@wizwand_team Discord