
RLBR: Reinforcement Learning with Biasing Rewards for Contextual Speech Large Language Models

About

Speech large language models (LLMs) have driven significant progress in end-to-end speech understanding and recognition, yet they continue to struggle with accurately recognizing rare words and domain-specific terminology. This paper presents a novel fine-tuning method, Reinforcement Learning with Biasing Rewards (RLBR), which employs a specialized biasing-word-preferred reward that explicitly emphasizes biasing words in the reward calculation. In addition, we introduce reference-aware mechanisms that extend the reinforcement learning algorithm with the reference transcription to expand the trajectory exploration space. Experiments on the LibriSpeech corpus across various biasing list sizes demonstrate that RLBR delivers substantial performance improvements over a strong supervised fine-tuning (SFT) baseline and consistently outperforms several recently published methods. The proposed approach achieves excellent performance on the LibriSpeech test-clean and test-other sets, reaching Biasing Word Error Rates (BWERs) of 0.59% / 2.11%, 1.09% / 3.24%, and 1.36% / 4.04% for biasing list sizes of 100, 500, and 1000, respectively, without compromising the overall WERs.
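The abstract describes a reward that up-weights errors on biasing words relative to ordinary word errors. The paper's exact formula is not given here, so the sketch below is a minimal illustration of the general idea: a negative word-error reward in which errors on words from the biasing list incur an extra penalty `alpha` (a hypothetical weight, not from the paper).

```python
from difflib import SequenceMatcher

def word_errors(hyp_words, ref_words):
    """Count word-level edit operations (substitutions, deletions,
    insertions) between a hypothesis and a reference."""
    sm = SequenceMatcher(a=ref_words, b=hyp_words)
    errs = 0
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag != "equal":
            errs += max(i2 - i1, j2 - j1)
    return errs

def biasing_reward(hyp, ref, biasing_words, alpha=2.0):
    """Illustrative biasing-word-preferred reward (an assumption, not the
    paper's exact definition): negative per-word error count, with errors
    on biasing words additionally penalized by `alpha`."""
    hyp_w, ref_w = hyp.split(), ref.split()
    total_errs = word_errors(hyp_w, ref_w)
    # Restrict both sequences to biasing words and re-count errors there.
    bias_ref = [w for w in ref_w if w in biasing_words]
    bias_hyp = [w for w in hyp_w if w in biasing_words]
    bias_errs = word_errors(bias_hyp, bias_ref)
    n = max(len(ref_w), 1)
    return -(total_errs + alpha * bias_errs) / n
```

Under this toy reward, a hypothesis that misrecognizes a biasing word scores strictly worse than one that makes a single error on an ordinary word, which is the optimization pressure the method's reward is meant to create.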

Bo Ren, Ruchao Fan, Yelong Shen, Weizhu Chen, Jinyu Li • 2026

Related benchmarks

Task               | Dataset                  | Result    | Rank
Speech Recognition | LibriSpeech (test-other) | BWER 2.11 | 20
Speech Recognition | LibriSpeech (test-clean) | BWER 0.59 | 20
