EnCLAP++: Analyzing the EnCLAP Framework for Optimizing Automated Audio Captioning Performance

About

In this work, we analyze and optimize the EnCLAP framework, a state-of-the-art model for automated audio captioning. We investigate the impact of modifying the acoustic encoder components, explore pretraining with datasets of different scales, and study the effectiveness of a reranking scheme. Through extensive experimentation and quantitative analysis of the generated captions, we develop EnCLAP++, an enhanced version that significantly outperforms the original EnCLAP.

Jaeyeon Kim, Minjeon Jeon, Jaeyoon Jung, Sang Hoon Woo, Jinjoo Lee • 2024

Related benchmarks

Task	Dataset	Metric	Result	Rank
Audio Captioning	AudioCaps (test)	CIDEr	0.823	222
Audio Captioning	Clotho	CIDEr	48	82

