
Small Reward Models via Backward Inference

About

Reward models (RMs) play a central role throughout the language model (LM) pipeline, particularly in non-verifiable domains. However, the dominant LLM-as-a-Judge paradigm relies on the strong reasoning capabilities of large models, while alternative approaches require reference responses or explicit rubrics, limiting flexibility and broader accessibility. In this work, we propose FLIP (FLipped Inference for Prompt reconstruction), a reference-free and rubric-free reward modeling approach that reformulates reward modeling through backward inference: inferring the instruction that would most plausibly produce a given response. The similarity between the inferred and the original instructions is then used as the reward signal. Evaluations across four domains using 13 small language models show that FLIP outperforms LLM-as-a-Judge baselines by an average of 79.6%. Moreover, FLIP substantially improves downstream performance in extrinsic evaluations under test-time scaling via parallel sampling and GRPO training. We further find that FLIP is particularly effective for longer outputs and robust to common forms of reward hacking. By explicitly exploiting the validation-generation gap, FLIP enables reliable reward modeling in downscaled regimes where judgment methods fail. Code available at https://github.com/yikee/FLIP.
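The backward-inference reward described above can be sketched in a few lines. This is a hedged illustration, not the paper's implementation: `infer_instruction` stands in for a small LM prompted to reconstruct the instruction from a response, and the similarity function here (character-level `SequenceMatcher`) is a placeholder for whatever instruction-similarity metric is actually used.

```python
from difflib import SequenceMatcher


def flip_reward(original_instruction, response, infer_instruction, similarity=None):
    """Score a response by backward inference: infer the instruction that
    would most plausibly have produced the response, then measure how
    similar it is to the original instruction. Higher similarity -> higher reward.

    `infer_instruction` is a hypothetical callable (e.g., a small LM prompted
    to reconstruct the instruction); `similarity` defaults to a simple
    string-overlap ratio as a stand-in metric.
    """
    inferred = infer_instruction(response)
    if similarity is None:
        similarity = lambda a, b: SequenceMatcher(None, a, b).ratio()
    return similarity(original_instruction, inferred)


def best_of_n(instruction, responses, infer_instruction):
    """Test-time scaling via parallel sampling: among N sampled responses,
    return the one whose inferred instruction best matches the original."""
    return max(responses,
               key=lambda r: flip_reward(instruction, r, infer_instruction))
```

In a best-of-N setup, the same scoring function ranks parallel samples; in GRPO training it would supply the per-response reward. The key property exploited is the validation-generation gap: reconstructing an instruction from a response is easier for a small model than judging the response directly.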

Yike Wang, Faeze Brahman, Shangbin Feng, Teng Xiao, Hannaneh Hajishirzi, Yulia Tsvetkov • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Reasoning | BBH | Accuracy 86.3 | 507 |
| Instruction Following | IFEval | -- | 292 |
| Question Answering | GPQA | Accuracy 46.4 | 258 |
| Reward Modeling | RewardBench Focus 2 | Accuracy 72.2 | 82 |
| Reward Modeling | RewardBench Precise IF 2 | Accuracy 25 | 70 |
| Reward Modeling | RewardBench Factuality 2 | Pairwise Accuracy 31.3 | 64 |
| Reward Modeling | RewardBench Average 2 | Accuracy 39.7 | 52 |
| Reward Modeling | RewardBench Math 2 | Accuracy 31 | 52 |
| Mathematical Reasoning | Minerva Math | Accuracy 93 | 28 |
| Instruction Following | IFBench | Accuracy 20.5 | 25 |
