Binary Classifier Optimization for Large Language Model Alignment

About

In real-world services such as ChatGPT, aligning models based on user feedback is crucial for improving model performance. However, due to the simplicity and convenience of providing feedback, users typically offer only basic binary signals, such as 'thumbs-up' or 'thumbs-down'. Most existing alignment research, on the other hand, relies on preference-based approaches that require both positive and negative responses as a pair. We propose Binary Classifier Optimization (BCO), a technique that effectively aligns LLMs using only binary feedback. BCO trains a binary classifier, where the logit serves as an implicit reward, effectively minimizing the Direct Preference Optimization (DPO) loss. We demonstrate that the binary cross-entropy loss employed in classifier training acts as an upper bound for the DPO loss. Additionally, a novel reward shift technique further minimizes the gap between the losses. We validate our methodology in two settings: first, on a paired preference dataset, where our method performs on par with DPO; and second, on a Likert-5 scale annotation dataset which stems from real users' queries. Our model consistently demonstrates effective and robust alignment across four base LLMs and three different datasets, showcasing the strength of our approach to learning from binary signals.
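The relationship between the two losses can be checked numerically. The following is a minimal sketch, not the authors' implementation. It assumes the implicit reward takes the DPO form, beta times the log-ratio of policy to reference likelihoods, and it uses the midpoint of two rewards as a stand-in for the reward shift, since the abstract does not specify how the shift is estimated. All function names, the `beta` value, and the log-probabilities are hypothetical.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def implicit_reward(logp_policy: float, logp_ref: float, beta: float = 0.1) -> float:
    """Assumed DPO-style implicit reward: beta * log(pi_theta / pi_ref)."""
    return beta * (logp_policy - logp_ref)


def bce_loss(reward: float, thumbs_up: bool, shift: float = 0.0) -> float:
    """Binary cross-entropy with the shifted reward used as the classifier logit."""
    logit = reward - shift
    return -log_sigmoid(logit) if thumbs_up else -log_sigmoid(-logit)


def dpo_loss(reward_chosen: float, reward_rejected: float) -> float:
    """DPO loss on a (chosen, rejected) pair of implicit rewards."""
    return -log_sigmoid(reward_chosen - reward_rejected)


# Hypothetical sequence log-probabilities for one liked and one disliked response.
r_up = implicit_reward(logp_policy=-12.0, logp_ref=-15.0)
r_down = implicit_reward(logp_policy=-20.0, logp_ref=-18.0)

# Illustrative shift only: the midpoint of the two rewards.
shift = 0.5 * (r_up + r_down)

pair_dpo = dpo_loss(r_up, r_down)
pair_bce_unshifted = bce_loss(r_up, True) + bce_loss(r_down, False)
pair_bce_shifted = bce_loss(r_up, True, shift) + bce_loss(r_down, False, shift)

print(f"DPO loss:            {pair_dpo:.4f}")
print(f"BCE sum (no shift):  {pair_bce_unshifted:.4f}")
print(f"BCE sum (with shift): {pair_bce_shifted:.4f}")
```

The bound holds for any shift because sigmoid(a) * sigmoid(b) <= sigmoid(a + b), so the sum of the two BCE terms is never smaller than the DPO loss on the same pair. For a single pair, the midpoint shift gives the smallest BCE sum, which illustrates why a well-chosen shift narrows the gap.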

Seungjae Jung, Gunsoo Han, Daniel Wontae Nam, Kyoung-Woon On • 2024

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Instruction Following | IFEval | -- | 836 |
| Physical Commonsense Reasoning | PIQA | Accuracy 80.74 | 696 |
| Multi-turn Dialogue Evaluation | MT-Bench | Overall Score 8.23 | 532 |
| Bias Evaluation | BBQ | Accuracy 88.5 | 171 |
| Mathematical Reasoning | GSM8K | EM 58.98 | 123 |
| Language Understanding | MMLU | MMLU Score 70.65 | 70 |
| LLM Alignment Evaluation | AlpacaEval 2.0 (test) | LC Win Rate 29.23 | 51 |
| General Utility Evaluation | MT_Bench | Agreement Rate 64.7 | 33 |
| Scientific Reasoning | ARC | Score 86.2 | 29 |
| Scholarly Title Generation | LaMP Scholarly Title Generation | ROUGE-1 0.507 | 21 |
Showing 10 of 21 rows
