
Fine-Tuning LLMs with Fine-Grained Human Feedback on Text Spans

About

We present a method and dataset for fine-tuning language models with preference supervision using feedback-driven improvement chains. Given a model response, an annotator provides fine-grained feedback by marking "liked" and "disliked" spans and specifying what they liked or disliked about them. The base model then rewrites the disliked spans accordingly, proceeding from left to right, forming a sequence of incremental improvements. We construct preference pairs for direct alignment from each pair of adjacent steps in the chain, enabling the model to learn from localized, targeted edits. We find that our approach outperforms direct alignment methods based on standard A/B preference ranking or full contrastive rewrites, demonstrating that structured, revision-based supervision leads to more efficient and effective preference tuning.
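
To make the pair-construction step concrete, here is a minimal Python sketch (not the authors' released code) of turning one annotated response into an improvement chain and then into adjacent-step preference pairs. `SpanFeedback`, `rewrite_chain`, `build_preference_pairs`, and the caller-supplied `rewrite_fn` are all hypothetical names; the base-model call behind `rewrite_fn` is assumed.

```python
# Minimal sketch of the improvement-chain construction described above.
# All names are hypothetical; `rewrite_fn` stands in for a call to the base
# model that rewrites one disliked span according to the annotator's comment.

from dataclasses import dataclass
from typing import Callable

@dataclass
class SpanFeedback:
    start: int       # character offset where the annotated span begins
    end: int         # character offset where the annotated span ends
    sentiment: str   # "liked" or "disliked"
    comment: str     # what the annotator liked or disliked about the span

def rewrite_chain(prompt: str, response: str, feedback: list[SpanFeedback],
                  rewrite_fn: Callable[[str, str, tuple[int, int], str], str]) -> list[str]:
    """Rewrite disliked spans left to right, yielding incremental revisions.

    NOTE: after each rewrite, the offsets of later spans can shift; a real
    implementation would remap them. This sketch ignores that for brevity.
    """
    chain = [response]
    disliked = sorted((f for f in feedback if f.sentiment == "disliked"),
                      key=lambda f: f.start)
    for fb in disliked:
        chain.append(rewrite_fn(prompt, chain[-1], (fb.start, fb.end), fb.comment))
    return chain

def build_preference_pairs(prompt: str, chain: list[str]) -> list[dict]:
    """Each pair of adjacent chain steps yields one preference pair, with the
    later (more revised) step preferred over the earlier one."""
    return [{"prompt": prompt, "chosen": chain[i + 1], "rejected": chain[i]}
            for i in range(len(chain) - 1)]
```

Pairs in this prompt/chosen/rejected shape can be fed to standard direct-alignment trainers; for example, the DPO trainer in Hugging Face TRL consumes datasets with exactly these columns.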

Sky CH-Wang, Justin Svegliato, Helen Appel, Jason Eisner • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|------|---------|--------|--------|------|
| Human Preference Ranking | Human Evaluation Elo (test) | Elo Score | 1.63e+3 | 34 |
| Response Quality Evaluation | AlpacaEval ELOM gpt4 (test) | ELOM | 1.63e+3 | 7 |
