
AMARIS: A Memory-Augmented Rubric Improvement System for Rubric-Based Reinforcement Learning

About

Rubric-based reward shaping provides interpretable and editable reward signals for fine-tuning LLMs via reinforcement learning (RL), but existing adaptive rubric methods typically update criteria from local evidence such as the current batch or instance-level comparisons. This local view discards diagnostic information produced during training, making it difficult to track recurring failures, evaluate previous rubric edits, or raise standards once earlier criteria become saturated. We introduce AMARIS (A Memory-Augmented Rubric Improvement System), which grounds rubric updates in longitudinal training evidence. AMARIS stores rollout analyses, step-level summaries, and rubric update records in a persistent evaluation memory, then retrieves recent and semantically relevant history to revise rubrics. We evaluate AMARIS across science, medicine, instruction following, and creative writing under both global and instance-specific rubric settings. AMARIS improves over static, local-adaptive, and memory-ablated baselines, with gains such as +2.8 points on GPQA-Diamond and +2.2 points on IFBench over the strongest baselines. Analysis shows that memory reduces oscillatory rubric edits and supports a progression from early failure correction to later curriculum advancement. AMARIS runs asynchronously alongside the normal RL loop, reducing blocking latency relative to synchronous rubric updates.
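
To make the store-then-retrieve idea in the abstract concrete, here is a minimal sketch of an evaluation memory that returns the most recent records plus the older records most similar to a query. It is an illustration under stated assumptions, not the authors' implementation: the names `MemoryRecord`, `EvaluationMemory`, and `retrieve` are hypothetical, and a bag-of-words cosine similarity stands in for whatever semantic retrieval AMARIS actually uses.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class MemoryRecord:
    step: int   # training step at which the record was written
    kind: str   # e.g. "rollout_analysis", "step_summary", "rubric_update"
    text: str   # free-text content of the record


def _bow(text):
    """Bag-of-words vector; a stand-in for a real semantic embedding."""
    return Counter(text.lower().split())


def _cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


@dataclass
class EvaluationMemory:
    """Hypothetical persistent store of training-time evaluation records."""
    records: list = field(default_factory=list)

    def add(self, step, kind, text):
        self.records.append(MemoryRecord(step, kind, text))

    def retrieve(self, query, n_recent=2, n_relevant=2):
        """Return the n_recent newest records, then the n_relevant older
        records ranked by similarity to the query."""
        by_step = sorted(self.records, key=lambda r: r.step, reverse=True)
        recent, older = by_step[:n_recent], by_step[n_recent:]
        q = _bow(query)
        relevant = sorted(
            older, key=lambda r: _cosine(q, _bow(r.text)), reverse=True
        )[:n_relevant]
        return recent + relevant
```

In a sketch like this, a rubric-revision step would pass the retrieved records to whatever component rewrites the rubric, so that an edit can take earlier failures and earlier edits into account rather than only the current batch.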

Peilin Wu, Xinlu Zhang, Kun Wan, Wentian Zhao, Gang Wu, Xinya Du, Zhiyu Chen • 2026

Related benchmarks

Task                   Dataset              Result                      Rank
Instruction Following  IFEval               Accuracy (IFEval): 81       89
Medical Reasoning      HealthBench          Accuracy: 34                36
Creative Writing       Creative Writing v3  Overall Rubric Score: 40.1  32
Creative Writing       WritingBench         Score: 57.9                 18
Instruction Following  IFBench              Accuracy: 36                18
Instruction Following  InfoBench            Accuracy: 85.2              8
Scientific Reasoning   GPQA Diamond         Accuracy: 40.4              6