
LINE: LLM-based Iterative Neuron Explanations for Vision Models

About

Interpreting the concepts encoded by individual neurons in deep neural networks is a crucial step towards understanding their complex decision-making processes and ensuring AI safety. Despite recent progress in neuron labeling, existing methods often limit the search space to predefined concept vocabularies or produce overly specific descriptions that fail to capture higher-order, global concepts. We introduce LINE, a novel, training-free iterative approach tailored for open-vocabulary concept labeling in vision models. Operating in a strictly black-box setting, LINE leverages a large language model and a text-to-image generator to iteratively propose and refine concepts in a closed loop, guided by activation history. We demonstrate that LINE achieves state-of-the-art performance across multiple model architectures, yielding AUC improvements of up to 0.18 on ImageNet and 0.05 on Places365, while discovering, on average, 29% of new concepts missed by massive predefined vocabularies. Beyond identifying the top concept, LINE provides a complete generation history, which enables polysemanticity evaluation and produces supporting visual explanations that rival gradient-dependent activation maximization methods.
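The closed loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the interfaces `propose_concepts` (the LLM), `generate_images` (the text-to-image model), and `neuron_activation` (the black-box neuron probe) are hypothetical names invented here, and the toy stand-ins below only exist to make the loop runnable.

```python
# Sketch of an LLM-in-the-loop neuron-labeling cycle in the spirit of LINE.
# All interface names (propose_concepts, generate_images, neuron_activation)
# are assumptions for illustration, not the paper's actual API.

def line_label_neuron(propose_concepts, generate_images, neuron_activation,
                      n_iters=3, n_candidates=4):
    """Iteratively propose concepts, score each by the neuron's mean activation
    on generated images, and feed the scored history back to the proposer."""
    history = []  # (concept, mean activation) pairs across all iterations
    for _ in range(n_iters):
        # The LLM proposes fresh candidates conditioned on the activation history.
        candidates = propose_concepts(history, n_candidates)
        for concept in candidates:
            images = generate_images(concept)  # text-to-image probe set
            score = sum(neuron_activation(img) for img in images) / len(images)
            history.append((concept, score))
    # Return the top concept plus the full history (usable for
    # polysemanticity analysis, as the abstract notes).
    history.sort(key=lambda cs: cs[1], reverse=True)
    return history[0][0], history

# Toy stand-ins: a "neuron" that responds most strongly to the concept "dog".
def toy_propose(history, n):
    pool = ["dog", "cat", "car", "tree", "boat", "fish"]
    seen = {c for c, _ in history}
    return [c for c in pool if c not in seen][:n]

def toy_generate(concept):
    return [concept] * 3  # pretend each "image" is just its prompt

def toy_activation(img):
    return 1.0 if img == "dog" else 0.1

top_concept, full_history = line_label_neuron(toy_propose, toy_generate, toy_activation)
print(top_concept)  # the concept with the highest mean activation
```

In a real setting the proposer would see the scored history in its prompt, so low-scoring concepts steer later proposals away from dead ends, which is what makes the search open-vocabulary rather than a lookup in a fixed list.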

Vladimir Zaigrajew, Michał Piechota, Gaspar Sekula, Przemysław Biecek • 2026

Related benchmarks

| Task                  | Dataset                                      | Result   | Rank |
|-----------------------|----------------------------------------------|----------|------|
| Neuron Interpretation | ImageNet (CoSy benchmark, avgpool layer, 1k) | AUC 0.97 | 12   |
| Concept Discovery     | ImageNet                                     | --       | 5    |
| Neuron Interpretation | Places365 (CoSy benchmark, avgpool layer)    | AUC 94   | 4    |
| Concept Discovery     | Places365                                    | --       | 2    |
