
HelpSteer: Multi-attribute Helpfulness Dataset for SteerLM

About

Existing open-source helpfulness preference datasets do not specify what makes some responses more helpful and others less so. Models trained on these datasets can incidentally learn to model dataset artifacts (e.g. preferring longer but unhelpful responses only due to their length). To alleviate this problem, we collect HelpSteer, a multi-attribute helpfulness dataset annotated for the various aspects that make responses helpful. Specifically, our 37k-sample dataset has annotations for correctness, coherence, complexity, and verbosity in addition to the overall helpfulness of responses. Training Llama 2 70B on the HelpSteer dataset with the SteerLM technique produces a model that scores 7.54 on MT Bench, which is currently the highest score for open models that do not require training data from more powerful models (e.g. GPT-4). We release this dataset under the CC-BY-4.0 license at https://huggingface.co/datasets/nvidia/HelpSteer
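The per-attribute annotations described above lend themselves to attribute-conditioned reward shaping: instead of a single preference label, each response carries several ratings that can be combined with tunable weights. A minimal sketch of that idea is below. The field names (`helpfulness`, `correctness`, `coherence`, `complexity`, `verbosity`) and the specific weight values are assumptions for illustration, not the exact SteerLM training recipe; in practice the dataset itself would be loaded from the Hugging Face hub rather than built by hand.

```python
# Sketch: combining HelpSteer-style attribute ratings into one scalar score.
# Assumption: each sample carries integer ratings for five attributes, as
# described in the abstract. The weights here are hypothetical, chosen only
# to illustrate down-weighting verbosity relative to helpfulness/correctness.
# In practice the data would come from something like
# datasets.load_dataset("nvidia/HelpSteer"), not a hand-built dict.

ATTRIBUTES = ["helpfulness", "correctness", "coherence", "complexity", "verbosity"]

def combined_score(sample, weights=None):
    """Return a weighted sum of the five annotated attribute ratings."""
    if weights is None:
        # Hypothetical weighting: emphasize correctness and helpfulness,
        # keep verbosity's influence small so length alone is not rewarded.
        weights = {"helpfulness": 0.65, "correctness": 0.8, "coherence": 0.45,
                   "complexity": 0.55, "verbosity": 0.4}
    return sum(weights[a] * sample[a] for a in ATTRIBUTES)

# Illustrative sample in the assumed schema.
example = {
    "prompt": "Explain photosynthesis briefly.",
    "response": "Plants convert light into chemical energy...",
    "helpfulness": 3, "correctness": 4, "coherence": 4,
    "complexity": 1, "verbosity": 1,
}
print(combined_score(example))  # weighted scalar score for this response
```

Separating the attributes this way is what lets a trained model be steered at inference time: raising or lowering the weight on, say, verbosity changes which responses score highest without re-annotating the data.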

Zhilin Wang, Yi Dong, Jiaqi Zeng, Virginia Adams, Makesh Narsimhan Sreedhar, Daniel Egert, Olivier Delalleau, Jane Polak Scowcroft, Neel Kant, Aidan Swope, Oleksii Kuchaiev • 2023

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Reward Modeling | RewardBench | Accuracy | 88.8 | 166 |
| Reward Modeling | RewardBench | Chat Score | 91.3 | 146 |
| Reward Modeling | RM-Bench | Accuracy | 52.5 | 125 |
| Reward Modeling | RMB | Accuracy | 58.2 | 120 |
| Reward Modeling Evaluation | RM-Bench | Chat Score | 56.4 | 55 |
| Reward Modeling | RMB | Help Accuracy | 57.4 | 13 |
