
OpenChat: Advancing Open-source Language Models with Mixed-Quality Data

About

Open-source large language models such as LLaMA have recently emerged. Recent developments incorporate supervised fine-tuning (SFT) and reinforcement learning fine-tuning (RLFT) to align these models with human goals. However, SFT methods treat all training data of mixed quality equally, while RLFT methods require high-quality pairwise or ranking-based preference data. In this study, we present a novel framework, named OpenChat, to advance open-source language models with mixed-quality data. Specifically, we consider the general SFT training data, consisting of a small amount of expert data mixed with a large proportion of sub-optimal data, without any preference labels. We propose C(onditioned)-RLFT, which regards different data sources as coarse-grained reward labels and learns a class-conditioned policy to leverage complementary data-quality information. Interestingly, the optimal policy in C-RLFT can be solved through single-stage, RL-free supervised learning, which is lightweight and avoids costly human preference labeling. Through extensive experiments on three standard benchmarks, our openchat-13b fine-tuned with C-RLFT achieves the highest average performance among all 13b open-source language models. Moreover, we use AGIEval to validate the model's generalization performance, on which only openchat-13b surpasses the base model. Finally, we conduct a series of analyses to shed light on the effectiveness and robustness of OpenChat. Our code, data, and models are publicly available at https://github.com/imoneoi/openchat and https://huggingface.co/openchat.
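The abstract's key claim is that C-RLFT's optimal policy can be recovered by single-stage, RL-free supervised learning: each data source is mapped to a coarse-grained reward, responses are conditioned on their source class, and training reduces to reward-weighted SFT. A minimal sketch of that idea follows; the conditioning-tag format, the weight values, and the function names here are illustrative assumptions, not the paper's exact implementation:

```python
# Hypothetical coarse-grained rewards per data source; the paper's actual
# weights differ -- the point is only that expert data is weighted higher
# than sub-optimal data.
SOURCE_WEIGHTS = {"expert": 1.0, "suboptimal": 0.1}

def condition_example(prompt: str, source: str) -> str:
    """Prepend a class-conditioning tag (hypothetical format) so the
    policy can distinguish data sources at train and inference time."""
    return f"<{source}> {prompt}"

def c_rlft_loss(examples):
    """Reward-weighted negative log-likelihood over a batch.

    Each example is (source, token_logprobs): the per-token log-probs the
    model assigns to the reference response. With coarse rewards fixed per
    source, the C-RLFT objective reduces to ordinary supervised learning
    with per-example weights -- no RL rollout or preference pairs needed.
    """
    total, norm = 0.0, 0.0
    for source, token_logprobs in examples:
        weight = SOURCE_WEIGHTS[source]
        nll = -sum(token_logprobs) / len(token_logprobs)  # mean NLL
        total += weight * nll
        norm += weight
    return total / norm
```

In a real training loop the per-token log-probs would come from the language model's forward pass, and the weighted mean NLL would be backpropagated exactly like a standard SFT loss.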

Guan Wang, Sijie Cheng, Xianyuan Zhan, Xiangang Li, Sen Song, Yang Liu • 2023

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Summarization | Xsum | - | 108 |
| General Language Intelligence | MMLU, GSM8K, BBH, TriviaQA, NQ latest available (test) | MMLU: 63.87 | 26 |
| Multi-Robot Task Allocation | LEGO-MRTA (test) | BLEU-4: 22 | 18 |
| Instruction Following | AlpacaEval, MT-bench, Vicuna-bench | AlpacaEval Score: 89.5 | 13 |
| Question Answering | AGIEval (test) | AQUA-RAT: 19.3 | 5 |
| Information Coverage and Truthfulness Evaluation | Corpus-based Retrieval | S_fact: 34 | 4 |
| Information Coverage and Truthfulness Evaluation | Web-based Retrieval | S_fact Score: 0.741 | 4 |
| Machine Translation | FR-EN | BLEU: 40.52 | 3 |
| Machine Translation | DE-EN | BLEU: 44.32 | 3 |
| Machine Translation | EN-FR | BLEU: 35.4 | 3 |

Other info

Code

https://github.com/imoneoi/openchat
https://huggingface.co/openchat