
Inseq: An Interpretability Toolkit for Sequence Generation Models

About

Past work in natural language processing interpretability has focused mainly on popular classification tasks while largely overlooking generation settings, partly due to a lack of dedicated tools. In this work, we introduce Inseq, a Python library to democratize access to interpretability analyses of sequence generation models. Inseq enables intuitive and optimized extraction of models' internal information and feature importance scores for popular decoder-only and encoder-decoder Transformer architectures. We showcase its potential by applying it to highlight gender biases in machine translation models and to locate factual knowledge inside GPT-2. Thanks to its extensible interface supporting cutting-edge techniques such as contrastive feature attribution, Inseq can drive future advances in explainable natural language generation, centralizing good practices and enabling fair and reproducible model evaluations.
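The feature importance scores mentioned above follow the general gradient-based attribution recipe: score each input token by the gradient of the model output with respect to that token's embedding, multiplied by the embedding itself (gradient × input). The toy sketch below illustrates that idea on a one-layer linear "model" in NumPy; the model, variable names, and data are illustrative only and are not Inseq's API.

```python
import numpy as np

# Toy "model": a linear scorer over a sequence of token embeddings.
# y = sum over positions t of (w . x_t), where x_t is token t's embedding.
rng = np.random.default_rng(0)
d = 4
w = rng.normal(size=d)                  # model weights
X = rng.normal(size=(3, d))             # embeddings for 3 input tokens

# For this linear model, dy/dx_t = w at every position t, so the
# gradient-x-input attribution for token t is simply w . x_t.
grads = np.tile(w, (3, 1))              # gradient of y w.r.t. each x_t
attributions = (grads * X).sum(axis=1)  # one importance score per token

# Sanity check: for a linear model, per-token attributions
# sum exactly to the model output (a completeness property).
y = (X @ w).sum()
assert np.isclose(attributions.sum(), y)
```

For a deep Transformer the gradient is no longer constant across positions and must be computed by backpropagation per input, which is the kind of extraction a dedicated library automates.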

Gabriele Sarti, Nils Feldhus, Ludwig Sickert, Oskar van der Wal, Malvina Nissim, Arianna Bisazza • 2023

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Cell-level attribution | ToTTo | Precision 42.7 | 6 |
| Cell-level attribution | FeTaQA | Precision 0.565 | 6 |
| Cell-level attribution | AITQA (gold set) | Precision 10.99 | 6 |
| Cell-level attribution | ToTTo (gold set) | Precision 16.85 | 6 |
| Cell-level attribution | AITQA | Precision 19.2 | 6 |
| Column-level attribution | ToTTo | Precision 73.1 | 6 |
| Column-level attribution | FeTaQA | Precision (%) 82.6 | 6 |
| Row-level attribution | ToTTo | Precision 37.5 | 6 |
| Row-level attribution | FeTaQA | Precision 56.4 | 6 |
| Row-level attribution | AITQA | Precision 31.2 | 6 |
