EnCLAP++: Analyzing the EnCLAP Framework for Optimizing Automated Audio Captioning Performance

About

In this work, we aim to analyze and optimize the EnCLAP framework, a state-of-the-art model in automated audio captioning. We investigate the impact of modifying the acoustic encoder components, explore pretraining with different dataset scales, and study the effectiveness of a reranking scheme. Through extensive experimentation and quantitative analysis of generated captions, we develop EnCLAP++, an enhanced version that significantly surpasses the original.

Jaeyeon Kim, Minjeon Jeon, Jaeyoon Jung, Sang Hoon Woo, Jinjoo Lee • 2024

Related benchmarks

Task | Dataset | Result | Rank
Audio Captioning | AudioCaps (test) | CIDEr 0.823 | 140
Audio Captioning | Clotho | CIDEr 48 | 60
