Finding Interpretable Prompt-Specific Circuits in Language Models

About

Understanding the internal circuits that language models use to solve tasks remains a central challenge in mechanistic interpretability. A crucial part of finding circuits is understanding why each attention head attends where it does. To this end, we introduce ACC++, an improved circuit-tracing method based on the principle of attention-causal communication (ACC) [1], which identifies signals, i.e., contents of low-dimensional subspaces that cause attention on a token pair. ACC++ extracts circuits from a single forward pass, without replacement models or patching. Circuits identified by ACC++ consist of components that are causal for the model's attention decisions, together with the low-dimensional signals used to communicate between them. Here, we first detail the conceptual advances that ACC++ makes over previous work. We then show that across multiple models, a substantial portion of ACC++ signals are interpretable: many signals admit a short natural-language description. We next present a number of new insights into model behavior obtained via ACC++. First, we use ACC++'s interpretable circuits to characterize the sensitivity of indirect object identification (IOI) circuits to prompt structure. We find that prompt-specific circuits form well-defined clusters, and across clusters, heads receive systematically different signals corresponding to distinct mechanisms for identifying the IO name. Next, in multilingual IOI, ACC++ circuits show that while model components are reused across languages, signals are often language-specific. In a four-language IOI case study, cross-language circuit distances are consistent with linguistic relatedness. Together, these results show that ACC++ can shed light on a broad spectrum of model behaviors.
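
To make the notion of a "signal" concrete, here is a minimal sketch of the general idea that a few low-dimensional components can account for most of one token pair's attention score. It is an illustration only, not the ACC++ procedure. It assumes the head's query-key bilinear form has already been factorized into rank-one terms (for example by an SVD), and the names score_terms, signal_components, sigmas, U, V, and the 0.9 threshold are all hypothetical.

```python
def score_terms(x_dst, x_src, sigmas, U, V):
    """Per-component contributions to a bilinear attention score.

    Assumption: the head's query-key form is given as
    sum_k sigmas[k] * outer(U[k], V[k]); these inputs are hypothetical.
    """
    def dot(a, b):
        return sum(p * q for p, q in zip(a, b))

    # Term k is sigma_k * (x_dst . u_k) * (v_k . x_src).
    return [s * dot(x_dst, u) * dot(v, x_src) for s, u, v in zip(sigmas, U, V)]


def signal_components(terms, frac=0.9):
    """Indices of the fewest positive terms whose sum reaches frac of the total."""
    total = sum(terms)
    if total <= 0:
        return []  # nothing pushes attention onto this token pair
    # Visit the largest contributions first.
    order = sorted(range(len(terms)), key=lambda k: terms[k], reverse=True)
    chosen, acc = [], 0.0
    for k in order:
        if terms[k] <= 0 or acc >= frac * total:
            break
        chosen.append(k)
        acc += terms[k]
    return chosen


# Toy example with an axis-aligned factorization (illustrative values only).
U = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
V = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
sigmas = [5.0, 1.0, 0.5]
terms = score_terms([2.0, 0.1, 1.0], [1.0, 1.0, -1.0], sigmas, U, V)
selected = signal_components(terms)
```

In this toy case a single component carries nearly all of the score, so the selected subspace is one-dimensional. The projections of the two token representations onto that subspace play the role of the signal in this sketch.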

Gabriel Franco, Lucas M. Tassis, Azalea Rohr, Mark Crovella • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Circuit localization | Mixing dataset All tasks 1.0 (test) | CPR 1.026 | 28 |
| Circuit localization | Mixing dataset All tasks | CMD 0.012 | 28 |
| Circuit localization | Mixing dataset IOI | CMD 0.022 | 28 |
| Circuit localization | Indirect Object Identification (IOI) 1.0 (test) | CPR 1.015 | 28 |
| Circuit localization | Mixing dataset | CMD 0.052 | 28 |
| Circuit localization | Sequence Completion 1.0 (test) | CPR 0.958 | 28 |
| Circuit localization | Entity-binding 1.0 (test) | CPR 1.112 | 18 |
| Circuit localization | Mixing dataset Entity Binding | CMD 0.017 | 18 |
| Circuit localization | Arithmetic 1.0 (test) | CPR 1.017 | 9 |
| Circuit localization | Mixing dataset Arithmetic | CMD 0.2 | 9 |
Showing 10 of 11 rows
