LINE: LLM-based Iterative Neuron Explanations for Vision Models
About
Interpreting individual neurons in deep neural networks is a crucial step towards understanding their complex decision-making processes and ensuring AI safety. Despite recent progress in neuron labeling, existing methods often limit the search space to predefined concept vocabularies or produce overly specific descriptions that fail to capture higher-order, global concepts. We introduce LINE, a novel, training-free iterative approach tailored for open-vocabulary concept labeling in vision models. Operating in a strictly black-box setting, LINE leverages a large language model and a text-to-image generator to iteratively propose and refine concepts in a closed loop, guided by activation history. LINE achieves state-of-the-art performance across multiple model architectures, yielding AUC improvements of up to 0.11 on ImageNet and 0.05 on Places365, while discovering, on average, 27% of new concepts missed by predefined vocabularies. Beyond identifying the top concept, LINE provides a complete generation history, enabling polysemanticity evaluation and producing visual explanations that rival gradient-dependent activation maximization methods. The source code will be made available soon.
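The closed loop described above (propose concepts, generate images, measure activations, refine using the activation history) can be illustrated with a minimal sketch. This is not the authors' implementation: the function `iterative_concept_search`, its parameters, and the toy stand-ins for the language model, the text-to-image generator, and the neuron are all hypothetical, and scoring a concept by its mean activation is an assumption made only for illustration.

```python
from typing import Callable, List, Tuple

# Activation history: (concept, mean activation over its generated images).
History = List[Tuple[str, float]]


def iterative_concept_search(
    propose: Callable[[History], List[str]],         # stand-in for the LLM
    generate_images: Callable[[str], List[object]],  # stand-in for text-to-image
    neuron_activation: Callable[[object], float],    # black-box neuron query
    n_iters: int = 5,
) -> History:
    """Hypothetical propose-and-refine loop guided by activation history."""
    history: History = []
    seen = set()
    for _ in range(n_iters):
        for concept in propose(history):
            if concept in seen:
                continue  # do not re-score a concept already in the history
            seen.add(concept)
            images = generate_images(concept)
            # Assumption: a concept's score is its mean activation.
            score = sum(neuron_activation(img) for img in images) / len(images)
            history.append((concept, score))
    # Return the full history, highest-scoring concept first.
    return sorted(history, key=lambda item: item[1], reverse=True)


# Toy stand-ins so the sketch runs without any model (all values invented).
TOY_SCORES = {"animal": 0.25, "dog": 0.5, "retriever": 0.875, "car": 0.125}
REFINEMENTS = {"animal": ["dog", "car"], "dog": ["retriever"]}


def toy_propose(history: History) -> List[str]:
    if not history:
        return ["animal"]
    best = max(history, key=lambda item: item[1])[0]
    return REFINEMENTS.get(best, [])


def toy_generate(concept: str) -> List[object]:
    return [concept] * 3  # three placeholder "images" per concept


def toy_activation(image: object) -> float:
    return TOY_SCORES[str(image)]


ranked = iterative_concept_search(toy_propose, toy_generate, toy_activation)
print(ranked[0])
```

Because the loop only calls `neuron_activation` on generated inputs, the sketch stays within the black-box setting the abstract describes. Returning the whole ranked history rather than a single label mirrors the idea of keeping the generation history for later inspection.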
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Neuron Interpretation | ImageNet CoSy benchmark avgpool layer 1k | AUC 0.97 | 12 |
| Concept Discovery | ImageNet | -- | 5 |
| Neuron Interpretation | Places365 CoSy benchmark avgpool layer | AUC 94 | 4 |
| Concept Discovery | Places365 | -- | 2 |