Finding Interpretable Prompt-Specific Circuits in Language Models

About

Understanding the internal circuits that language models use to solve tasks remains a central challenge in mechanistic interpretability. A crucial part of finding circuits is understanding why each attention head attends where it does. To this end, we introduce ACC++, an improved circuit-tracing method based on the principle of attention-causal communication (ACC) [1], which identifies signals, i.e., the contents of low-dimensional subspaces that cause attention on a token pair. ACC++ extracts circuits from a single forward pass, without replacement models or patching. Circuits identified by ACC++ consist of components that are causal for the model's attention decisions, together with the low-dimensional signals used to communicate between them. Here, we first detail the conceptual advances that ACC++ makes over previous work. We then show that across multiple models, a substantial portion of ACC++ signals are interpretable: many signals admit a short natural-language description. We next present a number of new insights into model behavior obtained via ACC++. First, we use ACC++'s interpretable circuits to characterize the sensitivity of indirect object identification (IOI) circuits to prompt structure. We find that prompt-specific circuits form well-defined clusters, and across clusters, heads receive systematically different signals corresponding to distinct mechanisms for identifying the IO name. Next, in multilingual IOI, ACC++ circuits show that while model components are reused across languages, signals are often language-specific. In a four-language IOI case study, cross-language circuit distances are consistent with linguistic relatedness. Together, these results show that ACC++ can shed light on a broad spectrum of model behaviors.
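To make the idea of a low-dimensional signal that causes attention on a token pair more concrete, the sketch below splits one head's pre-softmax attention score into per-direction contributions using an SVD of the query-key matrix. This is an illustrative assumption, not the ACC++ procedure itself: the abstract does not specify how signals are extracted, and the names `score_components`, `top_components`, `W_Q`, `W_K`, and the `frac` threshold are hypothetical.

```python
import numpy as np


def score_components(x_dst, x_src, W_Q, W_K):
    """Split the score x_dst^T (W_Q W_K^T) x_src into per-singular-direction terms.

    Assumption: a single head with query/key maps of shape (d_model, d_head),
    no biases, no positional terms, and no 1/sqrt(d_head) scaling.
    """
    omega = W_Q @ W_K.T                      # (d_model, d_model), rank <= d_head
    U, S, Vt = np.linalg.svd(omega)
    r = min(W_Q.shape[1], W_K.shape[1])      # only the first r singular values are nonzero
    # Contribution of direction i: (x_dst . u_i) * s_i * (v_i . x_src)
    return (x_dst @ U[:, :r]) * S[:r] * (Vt[:r] @ x_src)


def top_components(contribs, frac=0.9):
    """Smallest set of directions (largest contributions first) reaching `frac` of the score."""
    total = contribs.sum()
    if total <= 0:
        return np.array([], dtype=int)       # no positive score to explain
    order = np.argsort(contribs)[::-1]
    csum = np.cumsum(contribs[order])
    k = int(np.argmax(csum >= frac * total)) + 1
    return order[:k]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d_model, d_head = 16, 4                  # toy sizes, chosen only for illustration
    W_Q = rng.normal(size=(d_model, d_head))
    W_K = rng.normal(size=(d_model, d_head))
    x_dst = rng.normal(size=d_model)         # residual at the attending (destination) token
    x_src = rng.normal(size=d_model)         # residual at the attended-to (source) token
    contribs = score_components(x_dst, x_src, W_Q, W_K)
    print("per-direction contributions:", contribs)
    print("directions kept:", top_components(contribs))
```

In this toy setting the contributions sum to the full score, so the few directions returned by `top_components` act as a stand-in for a low-dimensional subspace that accounts for most of the attention on that token pair.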

Gabriel Franco, Lucas M. Tassis, Azalea Rohr, Mark Crovella • 2026

Related benchmarks

Task	Dataset	Result
Circuit localization	Mixing dataset All tasks 1.0 (test)	CPR 1.026	28
Circuit localization	Mixing dataset All tasks	CMD 0.012	28
Circuit localization	Mixing dataset IOI	CMD 0.022	28
Circuit localization	Indirect Object Identification (IOI) 1.0 (test)	CPR 1.015	28
Circuit localization	Mixing dataset	CMD 0.052	28
Circuit localization	Sequence Completion 1.0 (test)	CPR 0.958	28
Circuit localization	Entity-binding 1.0 (test)	CPR 1.112	18
Circuit localization	Mixing dataset Entity Binding	CMD 0.017	18
Circuit localization	Arithmetic 1.0 (test)	CPR 1.017	9
Circuit localization	Mixing dataset Arithmetic	CMD 0.2	9

Showing 10 of 11 rows
