Mechanistic understanding and validation of large AI models with SemanticLens

About

Unlike human-engineered systems such as aeroplanes, where each component's role and dependencies are well understood, the inner workings of AI models remain largely opaque, hindering verifiability and undermining trust. This paper introduces SemanticLens, a universal explanation method for neural networks that maps hidden knowledge encoded by components (e.g., individual neurons) into the semantically structured, multimodal space of a foundation model such as CLIP. In this space, unique operations become possible, including (i) textual search to identify neurons encoding specific concepts, (ii) systematic analysis and comparison of model representations, (iii) automated labelling of neurons and explanation of their functional roles, and (iv) audits to validate decision-making against requirements. Fully scalable and operating without human input, SemanticLens is shown to be effective for debugging and validation, summarizing model knowledge, aligning reasoning with expectations (e.g., adherence to the ABCDE-rule in melanoma classification), and detecting components tied to spurious correlations and their associated training data. By enabling component-level understanding and validation, the proposed approach helps bridge the "trust gap" between AI models and traditional engineered systems. We provide code for SemanticLens on https://github.com/jim-berend/semanticlens and a demo on https://semanticlens.hhi-research-insights.eu.

Maximilian Dreyer, Jim Berend, Tobias Labarta, Johanna Vielhaben, Thomas Wiegand, Sebastian Lapuschkin, Wojciech Samek • 2025
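
The textual search described in the abstract, operation (i), can be pictured as a nearest-neighbour lookup in a shared embedding space. The following is a minimal sketch under stated assumptions, not the SemanticLens implementation: it assumes each neuron has already been reduced to a single vector in the foundation model's space and that the text query has been embedded into the same space. The function name `search_neurons` and the toy vectors are hypothetical, and no call from the SemanticLens package or from CLIP is used.

```python
import numpy as np


def search_neurons(neuron_embeddings, query_embedding, top_k=3):
    """Rank neurons by cosine similarity to a query embedding.

    neuron_embeddings: array of shape (n_neurons, dim), one vector per neuron
                       (assumed to be precomputed in the foundation model's space).
    query_embedding:   array of shape (dim,), the embedded text query.
    Returns a list of (neuron_index, similarity) pairs, best match first.
    """
    neurons = np.asarray(neuron_embeddings, dtype=float)
    query = np.asarray(query_embedding, dtype=float)

    # Normalise to unit length so the dot product equals cosine similarity.
    neurons = neurons / np.linalg.norm(neurons, axis=1, keepdims=True)
    query = query / np.linalg.norm(query)

    scores = neurons @ query
    order = np.argsort(-scores)[:top_k]  # indices sorted by descending similarity
    return [(int(i), float(scores[i])) for i in order]


# Toy stand-ins for real embeddings (hypothetical values, 3 neurons, dim 3).
neuron_embeddings = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.1],
])
query_embedding = np.array([0.9, 0.1, 0.0])

results = search_neurons(neuron_embeddings, query_embedding, top_k=2)
for index, score in results:
    print(f"neuron {index}: cosine similarity {score:.3f}")
```

In a real setting the toy arrays would be replaced by embeddings produced by the foundation model; the ranking step itself stays the same.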

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Neuron Labeling | ImageNet-1K | DMA 53.64 | 60 |
| Neuron Labeling | ResNet101 Neurons (evaluated) | AUC 89 | 15 |
| Neuron Labeling | SAE Vanilla neurons | SCS Score 24.48 | 15 |
| Neuron Labeling | ISIC 2019 | SCS Score 20 | 15 |
| Neuron Labeling | ResNet101 neurons | SCS Score 22.75 | 15 |
| Neuron Labeling | ResNet50 neurons | SCS Score 21.96 | 15 |
| Neuron Labeling | ResNet50 evaluated neurons | AUC 84 | 15 |
| Neuron Labeling | SAE-TopK neurons | SCS Score 30.59 | 15 |
| Neuron Labeling Faithfulness | Evaluated Neurons ResNet50 and SAE-TopK | AUC 85 | 15 |
| Neuron Labeling | SAE-TopK Evaluated Neurons | AUC 0.94 | 15 |
Showing 10 of 11 rows
