
PRECEPT: Planning Resilience via Experience, Context Engineering & Probing Trajectories
A Unified Framework for Test-Time Adaptation with Compositional Rule Learning and Pareto-Guided Prompt Evolution

About

LLM agents that store knowledge as natural language suffer steep retrieval degradation as condition count grows, often struggle to compose learned rules reliably, and typically lack explicit mechanisms to detect stale or adversarial knowledge. We introduce PRECEPT, a unified framework for test-time adaptation with three tightly coupled components: (1) deterministic exact-match rule retrieval over structured condition keys, (2) conflict-aware memory with Bayesian source reliability and threshold-based rule invalidation, and (3) COMPASS, a Pareto-guided prompt-evolution outer loop. Exact retrieval eliminates partial-match interpretation errors on the deterministic path (0% by construction, vs 94.4% under Theorem B.6's independence model at N=10) and supports compositional stacking through a semantic tier hierarchy; conflict-aware memory resolves static–dynamic disagreements and supports drift adaptation; COMPASS evaluates prompts through the same end-to-end execution pipeline. Results (9–10 seeds): PRECEPT achieves a +41.1pp first-try advantage over Full Reflexion (d>1.9), +33.3pp compositional generalization (d=1.55), 100% $P_1$ on 2-way logistics compositions (d=2.64), +40–55pp continuous learning gains, strong eventual robustness under adversarial static knowledge (100% logistics with adversarial SK active; partial recovery on integration), +55.0pp drift recovery (d=0.95, p=0.031), and 61% fewer steps. Core comparisons are statistically significant, often at p<0.001.
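As a rough illustration of components (1) and (2), the following minimal Python sketch shows exact-match lookup over order-insensitive condition keys, combined with a Beta-Bernoulli reliability estimate and a threshold below which a rule is dropped. It is not the paper's implementation: the names `RuleStore`, `condition_key`, and `invalidate_below`, the Beta(1, 1) prior, the 0.3 threshold, and the example conditions are all assumptions made for this example.

```python
from dataclasses import dataclass


def condition_key(conditions):
    """Canonical, order-insensitive key (hypothetical helper)."""
    return frozenset(conditions)


@dataclass
class Rule:
    action: str
    successes: int = 1  # assumed Beta(1, 1) prior pseudo-counts
    failures: int = 1

    @property
    def reliability(self):
        # Posterior mean of a Beta-Bernoulli model.
        return self.successes / (self.successes + self.failures)


class RuleStore:
    def __init__(self, invalidate_below=0.3):
        self.rules = {}
        self.invalidate_below = invalidate_below  # assumed threshold

    def learn(self, conditions, action):
        self.rules[condition_key(conditions)] = Rule(action)

    def lookup(self, conditions):
        """Return the action on an exact key match, else None.

        A subset or superset of a stored key never matches, so there is
        no partial-match interpretation step on this path.
        """
        rule = self.rules.get(condition_key(conditions))
        return rule.action if rule else None

    def report(self, conditions, success):
        """Update reliability; drop the rule once it falls below threshold."""
        key = condition_key(conditions)
        rule = self.rules.get(key)
        if rule is None:
            return
        if success:
            rule.successes += 1
        else:
            rule.failures += 1
        if rule.reliability < self.invalidate_below:
            del self.rules[key]


store = RuleStore()
store.learn(["port_closed", "hazmat"], "reroute_via_B")  # hypothetical rule
```

In this sketch, `store.lookup(["hazmat", "port_closed"])` finds the rule regardless of condition order, while `store.lookup(["hazmat"])` returns `None` because only the full key matches. Two consecutive failure reports push the posterior mean from 0.5 to 0.25, below the assumed threshold, and the rule is removed.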

Arash Shahmansoori • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Continuous Learning | Integration Domain | P1 Score | 70.4 | 15 |
| Rule drift adaptation | Integration Domain | P1 Score | 60.7 | 15 |
| Rule drift adaptation | Logistics Domain | P1 Performance Change (pp) | 83.3 | 15 |
| Compositional Generalization | Logistics | P1 | 100 | 6 |
| Compositional Generalization | Integration | P1 Score | 49 | 6 |
| Agentic Workflow Automation | Integration Domain | Overall Success Rate ($P_t$) | 83.3 | 3 |
| Agentic Workflow Automation | Booking domain | Success Rate ($P_t$) | 99.4 | 3 |
| Agentic Workflow Automation | Logistics Domain | Overall Success Rate ($P_t$) | 100 | 3 |
| Continuous Learning | Logistics 1st Encounter | P1 | 44.8 | 3 |
| Continuous Learning | Logistics 2nd Encounter | P1 | 82.5 | 3 |
Showing 10 of 19 rows
