Pengi: An Audio Language Model for Audio Tasks

About

In the domain of audio processing, Transfer Learning has facilitated the rise of Self-Supervised Learning and Zero-Shot Learning techniques. These approaches have led to the development of versatile models capable of tackling a wide array of tasks while delivering state-of-the-art performance. However, current models inherently lack the capacity to produce the requisite language for open-ended tasks, such as Audio Captioning or Audio Question & Answering. We introduce Pengi, a novel Audio Language Model that leverages Transfer Learning by framing all audio tasks as text-generation tasks. It takes an audio recording and text as input and generates free-form text as output. The input audio is represented as a sequence of continuous embeddings by an audio encoder. A text encoder does the same for the corresponding text input. Both sequences are combined as a prefix to prompt a pre-trained frozen language model. The unified architecture of Pengi enables both open-ended and closed-ended tasks without any additional fine-tuning or task-specific extensions. When evaluated on 22 downstream tasks, our approach yields state-of-the-art performance in several of them. Our results show that connecting language models with audio models is a major step towards general-purpose audio understanding.

Soham Deshmukh, Benjamin Elizalde, Rita Singh, Huaming Wang • 2023
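
The prefix construction described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the encoders below are random stand-ins, the frozen language model is a placeholder, and every name here (`encode_audio`, `encode_text`, `build_prefix`, `frozen_language_model`, `EMBED_DIM`, the prompt string) is hypothetical. It only shows how two embedding sequences might be concatenated into a single prefix.

```python
import random

EMBED_DIM = 8  # hypothetical shared embedding width (assumption, not from the paper)


def make_projection(in_dim, out_dim, seed):
    """Return a fixed random linear map standing in for a trained encoder."""
    rng = random.Random(seed)
    return [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(out_dim)]


def project(matrix, vector):
    """Apply a linear map (a list of rows) to one vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def encode_audio(samples, frame_size=4):
    """Toy audio encoder: split the waveform into frames, one embedding per frame."""
    proj = make_projection(frame_size, EMBED_DIM, seed=0)
    starts = range(0, len(samples) - frame_size + 1, frame_size)
    return [project(proj, samples[i:i + frame_size]) for i in starts]


def encode_text(prompt):
    """Toy text encoder: one deterministic embedding per whitespace token."""
    embeddings = []
    for token in prompt.split():
        rng = random.Random(token)  # same token always maps to the same vector
        embeddings.append([rng.uniform(-1, 1) for _ in range(EMBED_DIM)])
    return embeddings


def build_prefix(samples, prompt):
    """Concatenate audio and text embeddings into one prefix sequence."""
    return encode_audio(samples) + encode_text(prompt)


def frozen_language_model(prefix):
    """Placeholder for the frozen LM: it only reports what it was conditioned on."""
    return f"<generated text conditioned on {len(prefix)} prefix vectors>"


audio = [0.0, 0.1, 0.2, 0.3, 0.2, 0.1, 0.0, -0.1]  # 8 samples -> 2 frames
prefix = build_prefix(audio, "generate audio caption")  # 3 text tokens
print(len(prefix), len(prefix[0]))  # prints "5 8"
print(frozen_language_model(prefix))
```

In this sketch, changing only the text prompt changes the task while the audio path and the language model stay the same, which is the sense in which the abstract describes all tasks as text generation.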

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Audio Classification | ESC-50 | Accuracy | 91.95 | 374 |
| Audio Captioning | AudioCaps (test) | CIDEr | 0.752 | 140 |
| Audio Classification | Urbansound8K | Accuracy | 71.85 | 126 |
| Environmental Sound Classification | FSD50K | mAP | 46.76 | 91 |
| Audio Classification | ESC-50 (test) | Accuracy | 92 | 87 |
| Text-to-Audio Retrieval | Clotho (test) | R@1 | 0.094 | 69 |
| Audio Captioning | Clotho | CIDEr | 41.6 | 60 |
| Audio Classification | GTZAN | Accuracy | 80 | 59 |
| Classification | AudioSet (test) | mAP | 16.35 | 57 |
| Audio Captioning | AudioCaps | CIDEr | 75.2 | 49 |
Showing 10 of 110 rows
