A comparative study of CNN-based Vs. Hybrid architectures for deepfake content detection

Mahendra Khambhalia; Shivani Trivedi

Authors

Mahendra Khambhalia, Faculty of Computer Science, Kadi Sarva Vishwavidyalaya, Gandhinagar, Gujarat, India
Shivani Trivedi, Faculty of Computer Science, Kadi Sarva Vishwavidyalaya, Gandhinagar, Gujarat, India

Keywords:

CNN, DeepFake, Fake Content Detection, GAN, Hybrid DeepFake Detection Models, Transformers

Abstract

Deep learning models are increasingly used to create deepfake content. Such content has legitimate uses in areas such as filmmaking and education, but it can also be misused, for example to harm people on social media. Generative models combined with Transformers can produce highly realistic fake media, so the challenge is to determine whether a given piece of content is real or fake. Many researchers have proposed solutions to this problem. In this paper, we focus on trends in hybrid models, in which combining a CNN with a Transformer makes such fake content easier to identify.
However, hybrid models have their own advantages and drawbacks. This review discusses the available datasets, existing modeling strategies, and their robustness, and, based on this review, identifies open research areas where more work is possible. We review a number of papers that use existing benchmark datasets and compare how different models perform on datasets such as FaceForensics++, DFDC, and Celeb-DF. We also analyze dataset bias and its impact, compare different approaches in terms of their complexity, and suggest the need for a multi-model system in future work.
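Cross-dataset evaluation is a common way to expose the dataset bias discussed above: a detector is trained on one dataset and its AUC is measured on the held-out test split of every dataset, so a large gap between in-dataset (diagonal) and cross-dataset (off-diagonal) scores signals poor generalization. The Python sketch below is a minimal illustration under stated assumptions, not the procedure used in the reviewed papers. `roc_auc`, `cross_dataset_auc`, and `toy_train` are hypothetical helpers, and `toy_a`/`toy_b` are tiny synthetic placeholders standing in for benchmarks such as FaceForensics++, DFDC, or Celeb-DF; they are not real results.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Split = Tuple[List[float], List[int]]  # (samples, labels), label 1 = fake, 0 = real


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random fake outscores a random real (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need both real and fake samples")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cross_dataset_auc(
    train_fn: Callable[[Split], Callable[[float], float]],
    datasets: Dict[str, Tuple[Split, Split]],
) -> Dict[Tuple[str, str], float]:
    """datasets maps name -> (train_split, test_split); returns AUC for every (train, test) pair."""
    results = {}
    for src, (train_split, _) in datasets.items():
        detector = train_fn(train_split)  # hypothetical: returns a sample -> score function
        for tgt, (_, (xs, ys)) in datasets.items():
            results[(src, tgt)] = roc_auc(ys, [detector(x) for x in xs])
    return results


def toy_train(split: Split) -> Callable[[float], float]:
    """Toy 'detector': learns only whether fakes score higher or lower on a 1-D feature."""
    xs, ys = split
    fake = [x for x, y in zip(xs, ys) if y == 1]
    real = [x for x, y in zip(xs, ys) if y == 0]
    sign = 1.0 if sum(fake) / len(fake) > sum(real) / len(real) else -1.0
    return lambda x: sign * x


# Synthetic placeholder data only.
datasets = {
    "toy_a": (([0.85, 0.75, 0.25, 0.15], [1, 1, 0, 0]), ([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])),
    "toy_b": (([0.3, 0.35, 0.6, 0.65], [1, 1, 0, 0]), ([0.3, 0.2, 0.7, 0.8], [1, 1, 0, 0])),
}

for (src, tgt), auc in sorted(cross_dataset_auc(toy_train, datasets).items()):
    print(f"train={src} test={tgt} AUC={auc:.2f}")
# train=toy_a test=toy_a AUC=1.00
# train=toy_a test=toy_b AUC=0.00
# train=toy_b test=toy_a AUC=0.00
# train=toy_b test=toy_b AUC=1.00
```

In this deliberately extreme toy case, each detector is perfect on its own dataset and fails completely on the other, which is the pattern that cross-dataset tables are meant to reveal.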
Traditional CNN models alone are not sufficient against current fake-content generation techniques, whereas a Transformer provides a broader, global view of the input. Hence, hybrid CNN-Transformer models are the main focus of this survey.
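To make the CNN-plus-Transformer idea concrete, the sketch below shows the typical data flow of a hybrid detector: a convolutional stem extracts local features, each spatial position of the resulting feature map becomes a token, a self-attention block mixes information across the whole image, and a pooled linear head outputs a probability that the input is fake. It is a minimal NumPy illustration with random, untrained weights, assuming a single-convolution stem, one attention head, parameter-free layer normalization, and no positional embeddings. The class and method names (`HybridCNNTransformer`, `cnn_features`, `attention_block`, `predict_proba`) are hypothetical and do not reproduce any specific model from the surveyed literature.

```python
import numpy as np


def conv2d_valid(x, w, stride):
    """Naive strided 'valid' convolution (cross-correlation, as in most DL libraries).

    x: (C_in, H, W), w: (C_out, C_in, k, k) -> (C_out, H_out, W_out).
    """
    c_out, _, k, _ = w.shape
    _, h, wd = x.shape
    h_out = (h - k) // stride + 1
    w_out = (wd - k) // stride + 1
    out = np.zeros((c_out, h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            patch = x[:, i * stride:i * stride + k, j * stride:j * stride + k]
            # Sum over input channels and the k x k window for every output channel.
            out[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out


def relu(x):
    return np.maximum(x, 0.0)


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def layer_norm(x, eps=1e-5):
    # Normalizes each token; learnable gain/bias omitted for brevity.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


class HybridCNNTransformer:
    """Hypothetical toy model: CNN stem -> tokens -> one attention block -> fake probability."""

    def __init__(self, in_ch=3, d_model=16, k=4, stride=4, seed=0):
        rng = np.random.default_rng(seed)
        self.stride = stride
        self.conv_w = rng.normal(0.0, 0.1, (d_model, in_ch, k, k))
        self.wq = rng.normal(0.0, d_model ** -0.5, (d_model, d_model))
        self.wk = rng.normal(0.0, d_model ** -0.5, (d_model, d_model))
        self.wv = rng.normal(0.0, d_model ** -0.5, (d_model, d_model))
        self.w_ff1 = rng.normal(0.0, d_model ** -0.5, (d_model, 2 * d_model))
        self.w_ff2 = rng.normal(0.0, (2 * d_model) ** -0.5, (2 * d_model, d_model))
        self.w_head = rng.normal(0.0, d_model ** -0.5, (d_model,))
        self.b_head = 0.0

    def cnn_features(self, image):
        """Local features: conv + ReLU, then one token per feature-map position."""
        fmap = relu(conv2d_valid(image, self.conv_w, self.stride))  # (d, h', w')
        return fmap.reshape(fmap.shape[0], -1).T  # (n_tokens, d)

    def attention_block(self, tokens):
        """Global context: single-head self-attention + feed-forward, with residuals."""
        q, k, v = tokens @ self.wq, tokens @ self.wk, tokens @ self.wv
        attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)  # (n_tokens, n_tokens)
        x = layer_norm(tokens + attn @ v)
        x = layer_norm(x + relu(x @ self.w_ff1) @ self.w_ff2)
        return x, attn

    def predict_proba(self, image):
        x, _ = self.attention_block(self.cnn_features(image))
        logit = x.mean(axis=0) @ self.w_head + self.b_head  # mean-pool tokens, linear head
        return float(1.0 / (1.0 + np.exp(-logit)))


model = HybridCNNTransformer()
image = np.random.default_rng(1).random((3, 32, 32))  # stand-in for a face crop
print(f"P(fake) = {model.predict_proba(image):.3f}")  # arbitrary: weights are untrained
```

With a 32x32 input, a 4x4 kernel, and stride 4, the stem yields an 8x8 feature map, i.e. 64 tokens that every attention row can attend to at once. In practice, hybrid detectors often use a pretrained CNN backbone as the stem and several multi-head Transformer layers with positional embeddings, trained end to end on labelled real and fake data.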
