In response to the escalating challenge of audio deepfake detection, this study introduces ABC-CapsNet (Attention-Based Cascaded Capsule Network), a novel architecture that merges the perceptual strengths of Mel spectrograms with the robust feature extraction capabilities of VGG18, enhanced by a strategically placed attention mechanism. This architecture pioneers the use of cascaded capsule networks to delve deeper into complex audio data patterns, setting a new standard in the precision of identifying manipulated audio content. Distinctively, ABC-CapsNet not only addresses the inherent limitations found in traditional CNN models but also showcases remarkable effectiveness across diverse datasets. The proposed method achieved an equal error rate (EER) of 0.06% on the ASVspoof2019 dataset and an EER of 0.04% on the FoR dataset, underscoring the superior accuracy and reliability of the proposed system in combating the sophisticated threat of audio deepfakes.
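For readers unfamiliar with the metric reported above, the following is a minimal sketch of how an equal error rate can be approximated from detector scores. It is not the authors' evaluation code: the function name `equal_error_rate`, the score convention (higher means more likely bona fide), and the simple threshold sweep are all assumptions made for illustration.

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Approximate the EER by trying every observed score as a threshold.

    Assumption: a higher score means "more likely bona fide".
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best_gap, eer = None, None
    for t in thresholds:
        # False acceptance rate: spoofed audio scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        # False rejection rate: bona fide audio scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        gap = abs(far - frr)
        # Keep the operating point where the two error rates are closest.
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy scores (invented for illustration, not taken from the paper).
print(equal_error_rate([0.9, 0.8, 0.7, 0.6], [0.1, 0.2, 0.3, 0.65]))
```

The EER is the error rate at the operating point where false acceptances and false rejections are equal, so a lower value indicates better separation between bona fide and spoofed audio.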
Publication details
2024, 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Pages 2464-2472
ABC-CapsNet: Attention based Cascaded Capsule Network for Audio Deepfake Detection (04b Conference paper in proceedings volume)
Wani T. M., Gulzar R., Amerini I.
ISBN: 979-8-3503-6547-4; 979-8-3503-6548-1