
Skywork-Reward: Bag of Tricks for Reward Modeling in LLMs

About

In this report, we introduce a collection of methods to enhance reward modeling for LLMs, focusing specifically on data-centric techniques. We propose effective data selection and filtering strategies for curating high-quality open-source preference datasets, culminating in the Skywork-Reward data collection, which contains only 80K preference pairs -- significantly smaller than existing datasets. Using this curated dataset, we developed the Skywork-Reward model series -- Skywork-Reward-Gemma-27B and Skywork-Reward-Llama-3.1-8B -- with the former currently holding the top position on the RewardBench leaderboard. Notably, our techniques and datasets have directly enhanced the performance of many top-ranked models on RewardBench, highlighting the practical impact of our contributions in real-world preference learning applications.
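The data-centric curation described above centers on selecting and filtering preference pairs. As a minimal illustrative sketch (not the authors' actual pipeline), one common filtering step keeps only pairs where the chosen response outscores the rejected one by a clear margin, and deduplicates on prompts; the field names and threshold below are assumptions for illustration.

```python
# Hypothetical margin-based filtering of preference pairs, in the spirit of
# data-centric curation. Field names, scores, and the 0.5 threshold are
# illustrative assumptions, not the Skywork-Reward recipe.

def filter_preference_pairs(pairs, min_margin=0.5):
    """Keep pairs whose chosen response outscores the rejected one by at
    least `min_margin`, deduplicating on the prompt text."""
    seen_prompts = set()
    kept = []
    for p in pairs:
        margin = p["chosen_score"] - p["rejected_score"]
        if margin < min_margin:
            continue  # ambiguous or likely-mislabeled pair: drop it
        if p["prompt"] in seen_prompts:
            continue  # duplicate prompt: keep only the first occurrence
        seen_prompts.add(p["prompt"])
        kept.append(p)
    return kept

raw = [
    {"prompt": "a", "chosen_score": 2.0, "rejected_score": 0.5},
    {"prompt": "a", "chosen_score": 1.9, "rejected_score": 0.1},  # duplicate
    {"prompt": "b", "chosen_score": 0.6, "rejected_score": 0.5},  # small margin
    {"prompt": "c", "chosen_score": 1.0, "rejected_score": 0.0},
]
filtered = filter_preference_pairs(raw)
print([p["prompt"] for p in filtered])  # → ['a', 'c']
```

Aggressive filtering of this kind is one plausible way a curated collection could end up at only 80K pairs while remaining competitive.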

Chris Yuhao Liu, Liang Zeng, Jiacai Liu, Rui Yan, Jujie He, Chaojie Wang, Shuicheng Yan, Yang Liu, Yahui Zhou • 2024

Related benchmarks

Task | Dataset | Metric | Result | Rank
Code Generation | HumanEval | – | – | 850
Language Understanding | MMLU | Accuracy | 41 | 756
Reasoning | BBH | Accuracy | 30.2 | 507
Question Answering | ARC Easy | Normalized Acc | 85.2 | 385
Mathematical Reasoning | GSM8K | Accuracy | 84.61 | 358
Instruction Following | IFEval | – | – | 292
Multitask Language Understanding | MMLU | Accuracy | 72.33 | 206
Common Sense Reasoning | HellaSwag | Accuracy | 76.42 | 164
Reading Comprehension | RACE | Accuracy | 78.82 | 151
Instruction Following | AlpacaEval | Win Rate | 77.4 | 125

(Showing 10 of 68 rows.)
