PRECEPT: Planning Resilience via Experience, Context Engineering & Probing Trajectories
A Unified Framework for Test-Time Adaptation with Compositional Rule Learning and Pareto-Guided Prompt Evolution

About

LLM agents that store knowledge as natural language suffer steep retrieval degradation as condition count grows, often struggle to compose learned rules reliably, and typically lack explicit mechanisms to detect stale or adversarial knowledge. We introduce PRECEPT, a unified framework for test-time adaptation with three tightly coupled components: (1) deterministic exact-match rule retrieval over structured condition keys, (2) conflict-aware memory with Bayesian source reliability and threshold-based rule invalidation, and (3) COMPASS, a Pareto-guided prompt-evolution outer loop. Exact retrieval eliminates partial-match interpretation errors on the deterministic path (0% by construction, vs 94.4% under Theorem B.6's independence model at N=10) and supports compositional stacking through a semantic tier hierarchy; conflict-aware memory resolves static-dynamic disagreements and supports drift adaptation; COMPASS evaluates prompts through the same end-to-end execution pipeline. Results (9–10 seeds): PRECEPT achieves a +41.1pp first-try advantage over Full Reflexion (d>1.9), +33.3pp compositional generalization (d=1.55), 100% $P_1$ on 2-way logistics compositions (d=2.64), +40–55pp continuous learning gains, strong eventual robustness under adversarial static knowledge (100% logistics with adversarial SK active; partial recovery on integration), +55.0pp drift recovery (d=0.95, p=0.031), and 61% fewer steps. Core comparisons are statistically significant, often at p<0.001.
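
The first two components described above can be illustrated with a minimal sketch. This is not the PRECEPT implementation: the class names, the `frozenset` key, the Beta(1, 1) prior, the 0.3 threshold, and the example conditions are all assumptions made for illustration, and reliability is tracked per rule here as a simplification of the per-source reliability the abstract mentions.

```python
from dataclasses import dataclass, field


def condition_key(conditions):
    """Canonical, order-insensitive key; lookups succeed only on an exact match."""
    return frozenset(conditions)


@dataclass
class Rule:
    action: str
    # Beta(alpha, beta) pseudo-counts; a uniform prior is assumed here.
    alpha: float = 1.0
    beta: float = 1.0

    @property
    def reliability(self):
        # Posterior mean of the Beta distribution.
        return self.alpha / (self.alpha + self.beta)


@dataclass
class RuleMemory:
    invalidate_below: float = 0.3  # hypothetical invalidation threshold
    rules: dict = field(default_factory=dict)

    def learn(self, conditions, action):
        self.rules[condition_key(conditions)] = Rule(action)

    def retrieve(self, conditions):
        # Plain dictionary lookup: no similarity search, so no partial matches.
        rule = self.rules.get(condition_key(conditions))
        return rule.action if rule else None

    def report(self, conditions, success):
        """Update reliability from an outcome; drop the rule if it falls below threshold."""
        key = condition_key(conditions)
        rule = self.rules.get(key)
        if rule is None:
            return
        if success:
            rule.alpha += 1
        else:
            rule.beta += 1
        if rule.reliability < self.invalidate_below:
            del self.rules[key]


memory = RuleMemory()
memory.learn({"port_closed", "hazmat"}, "reroute_via_rail")
exact_hit = memory.retrieve(["hazmat", "port_closed"])  # same set, different order
partial = memory.retrieve({"hazmat"})                   # overlapping but not equal
```

In this sketch a query that shares only some conditions with a stored rule returns nothing instead of a near match, which is the sense in which partial-match errors cannot occur on the lookup path. Repeated failures reported through `report` lower the posterior mean until the rule is removed.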

Arash Shahmansoori • 2026

Related benchmarks

Task	Dataset	Metric	Result	
Continuous Learning	Integration Domain	P1 Score	70.4	15
Rule drift adaptation	Integration Domain	P1 Score	60.7	15
Rule drift adaptation	Logistics Domain	P1 Performance Change (pp)	83.3	15
Compositional Generalization	Logistics	P1	100	6
Compositional Generalization	Integration	P1 Score	49	6
Agentic Workflow Automation	Integration Domain	Overall Success Rate ($P_t$)	83.3	3
Agentic Workflow Automation	Booking domain	Success Rate ($P_t$)	99.4	3
Agentic Workflow Automation	Logistics Domain	Overall Success Rate ($P_t$)	100	3
Continuous Learning	Logistics 1st Encounter	P1	44.8	3
Continuous Learning	Logistics 2nd Encounter	P1	82.5	3

Showing 10 of 19 rows
