UltraMedical: Building Specialized Generalists in Biomedicine

About

Large Language Models (LLMs) have demonstrated remarkable capabilities across various domains and are moving towards more specialized areas. Recent proprietary models such as GPT-4 and Gemini have achieved significant advances in biomedicine, but they have also raised privacy and security challenges. The construction of specialized generalists hinges largely on high-quality datasets, enhanced by techniques such as supervised fine-tuning, reinforcement learning from human or AI feedback, and direct preference optimization. However, these leading techniques (e.g., preference learning) remain significantly limited in the open-source community due to the scarcity of specialized data. In this paper, we present the UltraMedical collections, which consist of high-quality manual and synthetic datasets in the biomedicine domain, featuring preference annotations across multiple advanced LLMs. Using these datasets, we fine-tune a suite of specialized medical models based on the Llama-3 series, demonstrating breathtaking capabilities across various medical benchmarks. Moreover, we develop powerful reward models that perform well on biomedical and general reward benchmarks, supporting further online preference learning within the biomedical LLM community. Datasets and models are available at https://github.com/TsinghuaC3I/UltraMedical

Kaiyan Zhang, Sihang Zeng, Ermo Hua, Ning Ding, Zhang-Ren Chen, Zhiyuan Ma, Haoxin Li, Ganqu Cui, Biqing Qi, Xuekai Zhu, Xingtai Lv, Hu Jinfang, Zhiyuan Liu, Bowen Zhou • 2024

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Medical Question Answering | MedMCQA | Accuracy | 72.94 | 346 |
| Medical Question Answering | MedQA | Accuracy | 83.9 | 153 |
| Reward Modeling | RewardBench | Chat Score | 97.21 | 146 |
| Question Answering | MedQA | Accuracy | 75 | 96 |
| Medical Question Answering | PubMedQA | Accuracy | 80 | 92 |
| Medical Question Answering | MedExpQA | Overall Accuracy | 66.4 | 70 |
| Medical Question Answering | Medbullets | Accuracy | 54.5 | 65 |
| Question Answering | MMLU | Accuracy | 67.5 | 46 |
| Medical Reasoning | HealthBench Hard | Accuracy | 16.7 | 41 |
| Multilingual Medical Reasoning | CUREMED-BENCH (test) | Consistency | 47.03 | 33 |
Showing 10 of 22 rows
