WaViT-CDC: Wavelet Vision Transformer with Central Difference Convolutions for Spatial-Frequency Deepfake Detection

Badr, Nour Eldin Alaa, Nebel, Jean-Christophe, Greenhill, Darrel and Liang, Xing (2025) WaViT-CDC: wavelet vision transformer with central difference convolutions for spatial-frequency deepfake detection. IEEE Open Journal of Signal Processing, ISSN (online) 2644-1322 (Epub Ahead of Print)

Abstract

The increasing popularity of generative AI has led to a significant rise in deepfake content, creating an urgent need for generalizable and reliable deepfake detection methods. Because existing approaches rely on either spatial-domain or frequency-domain features alone, they struggle to generalize across unseen datasets, especially those with subtle manipulations. To address these challenges, a novel end-to-end Wavelet Central Difference Convolutional Vision Transformer framework is designed to enhance spatial-frequency deepfake detection. Unlike previous methods, this approach applies the Discrete Wavelet Transform for multi-level frequency decomposition and Central Difference Convolution to capture local fine-grained discrepancies and focus on texture variances, while also incorporating Vision Transformers for global contextual understanding. The Frequency-Spatial Feature Fusion Attention module integrates these features, enabling the effective detection of fake artifacts. Moreover, in contrast to earlier work, subtle perturbations are introduced in both the spatial and frequency domains to further improve generalization. Cross-dataset generalization evaluations demonstrate that WaViT-CDC outperforms state-of-the-art methods when trained on both low-quality and high-quality face images, achieving average performance increases of 2.5% and 4.5% on challenging high-resolution, real-world datasets such as Celeb-DF and WildDeepfake.
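To make the two building blocks named in the abstract concrete, the sketch below shows a minimal NumPy illustration of (a) a single-level 2-D Haar Discrete Wavelet Transform decomposition into four frequency sub-bands, and (b) a Central Difference Convolution, which blends a vanilla convolution with a central-difference term controlled by a weight theta. This is an illustrative toy, not the authors' implementation: the function names, the Haar wavelet choice, the 3x3 valid-padding convolution, and the default theta=0.7 are all assumptions for demonstration purposes.

```python
import numpy as np

def haar_dwt2(x):
    """Single-level 2-D Haar DWT of an even-sized array.

    Returns the four sub-bands (LL, LH, HL, HH): low-frequency
    approximation plus horizontal, vertical, diagonal details.
    """
    a = x[0::2, 0::2]  # top-left of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    LL = (a + b + c + d) / 2.0
    LH = (a + b - c - d) / 2.0
    HL = (a - b + c - d) / 2.0
    HH = (a - b - c + d) / 2.0
    return LL, LH, HL, HH

def cdc2d(x, w, theta=0.7):
    """Central Difference Convolution (valid padding, 3x3 kernel).

    y = sum(w * patch) - theta * sum(w) * patch_center,
    i.e. theta blends a vanilla convolution (theta=0) with a pure
    central-difference convolution (theta=1) that responds only to
    local texture variation, not absolute intensity.
    """
    H, W = x.shape
    out = np.zeros((H - 2, W - 2))
    wsum = w.sum()
    for i in range(H - 2):
        for j in range(W - 2):
            patch = x[i:i + 3, j:j + 3]
            out[i, j] = (patch * w).sum() - theta * wsum * patch[1, 1]
    return out
```

With theta=1, a constant (texture-free) region maps to exactly zero, which illustrates why the central-difference term highlights fine-grained texture discrepancies that deepfake blending tends to disturb.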
