Unmasking deepfakes: A systematic review of deepfake detection and generation techniques using artificial intelligence

Due to the fast spread of data through digital media, individuals and societies must assess the reliability of information. Deepfakes are not a novel idea, but they are now a widespread phenomenon. The impact of deepfakes and disinformation can range from infuriating individuals to affecting and misleading entire societies and even nations. There are several ways to detect and generate deepfakes online. By conducting a systematic literature analysis, in this study we explore key automatic detection and generation methods, frameworks, algorithms, and tools for identifying deepfakes (audio, images, and videos), and how these approaches can be employed within different situations to counter the spread of deepfakes and the generation of disinformation. Moreover, we explore state-of-the-art frameworks related to deepfakes to understand how emerging machine learning and deep learning approaches affect online disinformation. We also highlight practical challenges and trends in implementing policies to counter deepfakes. Finally, we provide policy recommendations based on analyzing how emerging artificial intelligence (AI) techniques can be employed to detect and generate deepfakes online. This study benefits the community and readers by providing a better understanding of recent developments in deepfake detection and generation frameworks. The study also sheds light on the potential of AI in relation to deepfakes.


Introduction
The term deepfakes refers to artificial intelligence (AI)-generated digital content, usually video, audio, or images, that has been manipulated using deep learning algorithms to alter, replace, or superimpose the original content with new content that appears to be authentic (Coccomini et al., 2022). The term "deepfake" is derived from the combination of "deep learning" and "fake." Deepfake applications appeared in November 2019, but their use rapidly grew by 100 % during the COVID-19 pandemic (Healthworld, 2020).
Methods for generating deepfakes generally need large video and image datasets to train frameworks to generate realistic videos, so people for whom many videos and images are available online, such as celebrities and legislators, are often the targets of deepfakes. Deepfakes involve swapping or synthesizing the faces of celebrities or legislators into offensive videos. Deepfakes can cause political and religious tensions between countries, deceive people, and impact elections. They can also cause confusion in financial markets by creating fake news (Ding et al., 2022; Wang et al., 2022). Satellite imagery can even be manipulated to incorporate things that do not exist: for instance, a fake bridge over a river, which can confuse military experts. In this way, deepfakes might mislead troops into targeting or defending a non-existent bridge (Corvey, 2021).
Deepfake technology can be very detrimental: it is accessible to everyone and can be employed to tarnish anyone's reputation, whether for political or personal reasons. The technology has also been used to create fake pornographic material without the depicted person's consent. One serious concern is how difficult it has become to distinguish between real and fake images and videos, because everyone has access to advanced technology. According to researchers, even the Microsoft and Amazon application programming interfaces (APIs) can be misled or deceived by deepfake-generated videos. It has been reported that 78 % of deepfake content successfully deceived the API of Microsoft Azure experimental services (Ahmed, 2021; Shen et al., 2021; Stypułkowski et al., 2023). So far, Microsoft's and Amazon's APIs have struggled to differentiate between deepfake impostors and real content.
This problem can be addressed by developing appropriate defense mechanisms, designing improved web-based APIs, and employing effective detection methods. Scientists have used AI to help identify weaknesses in deepfake detection (Shen et al., 2021; Raza & Ding, 2022). So far, commercial face recognition APIs have been verified on five datasets: two made by researchers and three that are publicly available. It has been found that social media accounts with numerous publicly available pictures increase the likelihood that videos posted on them will be perceived as real (Ng & Taeihagh, 2021; Almars, 2021). As a result, the digital domain has become an increasingly important locus for seeking truth. Nowadays, almost anyone can create deepfakes using the prevailing deepfake tools, which makes countering them even more challenging.

Research motivation and key contributions
In response to the potential negative consequences of deepfakes, several studies have been proposed regarding ways to detect and create them using deep learning methods. Only a few surveys explicitly address deepfake detection and generation. Tolosana et al. (2020) present an overview of face manipulation and fake detection techniques. A survey by Weerawardana and Fernando (2021) discussed visual deepfake detection approaches but did not discuss deepfake generation or its impacts on the community. The studies by Nguyen et al. (2022) and Seow et al. (2022) examined deepfake detection and generation and their impact; however, they discussed them only briefly and did not consider audio-visual aspects or policy recommendations. Toshpulatov et al. (2021) reviewed generative adversarial networks (GANs) and their application to 3D face generation only. Almutairi and Elgibreen (2022) reviewed recent audio deepfake detection approaches and their challenges; however, the generation of deepfakes still needs to be considered. Recent studies by Masood et al. (2023) and Dagar and Vishwakarma (2022) provide deep insights into deepfake generation and detection techniques, including facial manipulation, audio-visual synthesis, and their challenges, but do not discuss policy recommendations and trends. Table 1 highlights the distinctions between our systematic literature review and previously available survey and review papers. It provides a comparative analysis of the strengths and weaknesses observed in the existing surveys based on our research questions.
Previous surveys have either not been carried out systematically or have focused on detecting and generating deepfakes individually, rather than focusing on frameworks for understanding deepfake detection and generation using AI. In addition, previous survey papers have not discussed policy recommendations and trends that can guide efforts to stop the spread of deepfakes. Likewise, they have not explored the recent advances in face manipulation, reenactment, face-swapping, attribute manipulation, identity swapping, facial synthesis, audio-visual manipulation frameworks, and deepfake generation and detection. To the best of the authors' knowledge, this study is the first attempt to conduct a comprehensive analysis and review of deepfake generation and detection methods. It also aims to provide the community with modern trends, tools, frameworks, and policy recommendations to prevent the spread of disinformation online.
Hence, the key aim of this systematic literature review is to address existing literature gaps by theorizing and advancing the consideration of deepfake detection and generation techniques to stop the spread of disinformation. This is vital to give the community clear insights into deepfake detection and generation techniques and, most importantly, into how deepfakes can be generated and detected using emerging AI and deep learning. Likewise, the aim is to identify modern trends, policies, limitations, tools, gaps, and challenges as regards stopping the spread of disinformation on social platforms, and to guide the community with regard to future trends. The key contributions of this systematic review paper are summarized as follows:
• We systematically analyze key detection and generation methods for deepfakes across various media types, including audio, images, and videos.
• We identify gaps in the literature through a comparative assessment of various state-of-the-art surveys and review studies.
• We examine various frameworks, algorithms, and tools for identifying deepfakes, providing a consolidated overview of the current technological landscape.
• We investigate effective applications of the identified frameworks and tools in countering the spread of deepfakes and mitigating the generation of disinformation.
• We highlight practical challenges associated with deepfake detection and generation, offering a realistic perspective on the obstacles faced in implementing effective policies.
• We identify the limitations of current detection and generation frameworks that require attention from the research community.
• We present policy recommendations and future trends based on a comprehensive analysis of emerging AI-based frameworks to counter deepfakes in the coming years.
To achieve our research goals, we propose the following analysis questions: "What are the existing AI-based deepfake detection and generation methods? How can we detect and generate deepfakes online using AI? What AI tools can be employed to detect and generate deepfakes online? What policy recommendations and future trends can be proposed to counter deepfakes?" We employed a systematic analysis protocol to answer these questions.
This study is organized as follows. The research methodology is described in Section 2. Section 3 describes the results of the systematic review, followed by conclusions in Section 4.

Research methodology
The study has been conducted as a systematic literature review to provide the community with insights into deepfake detection and generation frameworks and with information about current trends, limitations, tools, gaps, policies, and challenges as regards stopping the spread of disinformation on social media platforms. This section outlines the research questions and objectives, the inclusion and exclusion criteria for the review, the results of the quality assessment, and the data gathering and analysis process.

Table 1
Summary of existing surveys and review papers on deepfake detection and generation using AI.
F. Abbas and A. Taeihagh

Research questions
After identifying gaps in existing studies on deepfake detection and generation methods, we began our systematic literature review by formulating five research questions (RQs) to address those gaps. Choosing research questions for a systematic literature review is the key step in defining the scope and purpose of the outcomes. Table 2 summarizes this review's research questions and key motivations.
Concerning RQ1, we summarize and discuss the AI-based deepfake detection and generation methods, models, frameworks, and techniques. To provide a comprehensive insight into existing deepfake frameworks, we examine the merits and demerits of every framework, and their limitations. Under RQ2, we explicitly describe and guide the community on how to detect and generate deepfakes using AI. To address RQ3, we discuss and highlight the tools and software that can be used for deepfake detection and generation, and their impacts. To respond to RQ4, we address the critical challenges encountered in generating and detecting deepfakes online and highlight some vital aspects that could enhance performance and counter deepfakes. In response to RQ5, we elaborate on policy recommendations and future trends to guide the community in efforts to stop the spread of deepfakes.

Search strategy
We conducted a systematic review of the literature on developing deepfake detection and generation technologies using AI and on the practical challenges of stopping deepfakes online. Moreover, we explored how machine learning and deep learning impact disinformation online, and we highlighted the practical challenges encountered in implementing policies to counter deepfakes.
Two key scholarly databases were employed for the collection process: Scopus™ and Web of Science™ (WoS), the most widely used repositories for retrieving computer science and AI literature. An extensive search string was developed using the Boolean operators 'AND' and 'OR'. While developing the search string, the following sets of keywords were defined to reduce the complexity of the search process.
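The construction of such a Boolean search string can be sketched as follows. The keyword groups below are hypothetical illustrations, not the authors' actual keyword sets:

```python
# Hypothetical keyword sets; the paper's actual sets are not reproduced here.
topic_terms = ["deepfake", "deep fake", "face swap", "synthetic media"]
task_terms = ["detection", "generation"]
method_terms = ["artificial intelligence", "deep learning", "GAN"]

def or_group(terms):
    """Join one keyword set into a parenthesized OR clause."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"

# Groups are combined with AND, so a hit must match one term from each set.
search_string = " AND ".join(or_group(g) for g in [topic_terms, task_terms, method_terms])
print(search_string)
```

Databases such as Scopus and WoS accept strings of this shape in their advanced-search interfaces, with field restrictions (e.g. title/abstract/keywords) added on top.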
The keyword string filtered out potential research work at the initial search stage. The studies were then filtered based on the inclusion and exclusion criteria set out below. Fig. 1 presents a visualization of the search string generated using the VOYANT tool (Sampsel, 2018).

Inclusion and exclusion criteria
We considered different inclusion and exclusion criteria to gather relevant studies for our systematic review. The criteria used to select articles were as follows: (i) articles written in English; (ii) articles that proposed and developed AI-based detection and generation techniques, methods, algorithms, frameworks, tools, and models to identify the spread of deepfakes/disinformation; and (iii) articles related to the research questions and studies that focus on deepfake detection and generation only. The criteria used to exclude articles were as follows: (i) duplicate articles found via different databases; (ii) articles written in other languages; (iii) articles that do not consider detection or generation methods, frameworks, or tools; (iv) irrelevant studies; and (v) survey articles. Using this strategy, we ensured that all related studies were included and that inapplicable studies were excluded from the systematic literature review.
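The screening logic above can be sketched as a simple filter. The field names and sample records are hypothetical; the actual screening was performed via the PICO Portal:

```python
# Minimal sketch of the inclusion/exclusion screening described above.
def passes_screening(article, seen_titles):
    if article["title"].lower() in seen_titles:   # exclusion (i): duplicates
        return False
    if article["language"] != "en":               # exclusion (ii): non-English
        return False
    if article["type"] == "survey":               # exclusion (v): survey articles
        return False
    if not (article["detection"] or article["generation"]):
        return False                              # exclusions (iii)/(iv): relevance
    return True

articles = [
    {"title": "A GAN detector", "language": "en", "type": "research",
     "detection": True, "generation": False},
    {"title": "A GAN detector", "language": "en", "type": "research",
     "detection": True, "generation": False},     # duplicate of the first record
    {"title": "Ein Überblick", "language": "de", "type": "research",
     "detection": True, "generation": True},      # non-English
]

seen, included = set(), []
for a in articles:
    if passes_screening(a, seen):
        included.append(a)
    seen.add(a["title"].lower())
print(len(included))  # 1
```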

Table 2
Summary of research questions and key motivations included in this study.
RQ1
What are the existing AI-based deepfake detection and generation methods?
To evaluate the current state of the art in deepfake detection and generation using AI techniques. As AI techniques for deepfakes evolve rapidly, staying up to date with the latest developments and evaluating their effectiveness is essential.

RQ2
How can deepfakes be detected and generated online using AI?
To understand how to detect deepfakes produced with GANs and their variants. GAN-generated deepfakes could be used to spread even more damaging misinformation, and it is vital to stay ahead of the curve by developing effective detection and mitigation frameworks.

RQ3
What AI tools and software are employed to detect and generate deepfakes online?
To summarize and discuss the latest tools for detecting and generating deepfakes and the mechanisms involved in developing them.

RQ4
What are the recommendations and critical challenges for detecting and generating deepfakes?
To study and develop practical recommendations for detecting and generating deepfakes that help the community improve security and privacy, mitigate negative impacts, and, most importantly, provide researchers with the advanced state of the art in AI for deepfakes.

RQ5
What are the policy recommendations and future trends to counter deepfakes?
To develop policy recommendations and explore future trends to counter deepfakes. Exploring future technological trends will help policymakers regulate and manage the use of deepfake technology.

Quality evaluation
Evaluating the quality of the evidence in a systematic literature review is as important as analyzing the data. Research methods that are prone to bias may distort a study's results, so such results should be interpreted with caution. Studies that employ methods prone to bias, or that have limitations that may impact the validity of their findings, should be flagged in, or omitted from, literature reviews. It is also vital to select appropriate criteria to examine the reliability of the documentation and the bias in every paper. In line with the well-defined standards in Do et al. (2005) and the PICO Portal (Pico, 2022), we employed such criteria to validate the selected studies and to review them so as to extract related data for this systematic review. In addition, a validation method was used to evaluate the included articles to guarantee reliability across different datasets. After the quality evaluation stage, we identified 206 eligible studies presenting deepfake detection and generation methods. Fig. 2 illustrates the keyword co-occurrence analysis of the eligible studies for our systematic literature review.

Data gathering and data analysis
Data gathering and examination were conducted to identify AI-based frameworks, models, methods, tools, and algorithms for deepfake detection and generation, and specifically for detecting and generating deepfakes using AI-based generative techniques. Articles were read closely to analyze the data and to organize the systematic review, structuring the content logically and coherently to convey our research findings and analyses. AI professionals administered the study process in order to guarantee the reliability and quality of the work. For every selected article, this study extracted data about the research objectives, questions, findings, tools, methods, frameworks, algorithms, classifiers, datasets, and study titles. Lastly, the extracted data were integrated and analyzed to summarize the existing studies and to identify potential areas for future research. While conducting the research, the authors discussed the development of the search strategy and protocol with AI experts.
Fig. 3 illustrates the number of publications in the deepfake detection and generation research area. It also illustrates the distribution of studies based on our search string and of eligible studies by publication year. Many relevant studies were published in the period from 2016 to 2023, and the number of studies published has risen steadily in recent years.

Results
This section summarizes our findings and results for each research question.

Study contexts and characteristics
A PRISMA flowchart was used to document the evidence, as shown in Fig. 4 (Page et al., 2021), which summarizes the data selection and screening process. The search string returned 4,172 articles from the aforementioned databases; this search was performed in March 2023. For the screening process, we used the PICO Portal (Pico, 2022), a framework that helps formulate research questions and search strategies to identify relevant studies for a systematic review. From the 4,172 articles, we first removed duplicates and articles not in English, leaving 4,022 articles after the first round of screening using the PICO Portal. In the next round, which involved abstract, title, and full-text screening, 3,816 articles were excluded due to lack of relevance. In the final round of screening, we identified 206 studies that fulfilled the inclusion criteria for the final literature review.

Deepfake detection and generation using AI (RQ1 and RQ2)
In conducting this study, we extensively analyzed the included studies concerning emerging deepfake detection and generation using AI. A review of each model for detection and generation is presented in this section to provide an intensive understanding of the different models, tools, frameworks, and technologies.
In recent years, deep learning has been applied extensively to recognize image and video features, such as facial landmarks, facial expressions and emotions, lip-syncing, head pose and alignment, lighting and shading, and content regulation (Anantrasirichai & Bull, 2022; Yang et al., 2022). Although manipulating video and image content has become pervasive, many people continue to post videos, photos, and news daily on social media platforms such as YouTube, Twitter, Instagram, Facebook, TikTok, and Weibo. Many recent studies have proposed deep learning-based AI models for detecting deepfakes, face swaps, face reenactment, facial synthesis, attribute manipulation, identity swaps, and image/video manipulation. Fig. 5 illustrates the taxonomy of deepfake detection and generation approaches.
With the growing performance of, and demand for, generative methods in AI, detecting and generating deepfakes has become a critical issue. To address our research questions (RQ1 and RQ2), we identify the deepfake detection and generation methods that are extensively reported in the literature, and we highlight their models, frameworks, algorithms, achievements, limitations, and challenges. Likewise, we discuss how to generate and detect deepfakes using generative AI.

Deepfake generation methods
Deepfake generation refers to a digital media manipulation method that overcomes the substantial limitations of traditional forgery generation frameworks, which are designed to generate forgeries or fraudulent content while minimizing traces of manipulation or fingerprints. Such traces, for example inconsistent biometrics or compression artifacts, have been widely used for detecting forgeries. Deepfake generation uses deep neural networks (DNNs) to extract input characteristics and produce synthetic, fake yet hyper-realistic content (Kolagati et al., 2022). Detecting deepfakes is more challenging than detecting traditional manipulation of digital media due to the minimal divergence between the real and counterfeit data boundaries (Devasthale & Sural, 2022; Kawa & Syga, 2021).
This section discusses methods, frameworks, models, and techniques for deepfake generation: technologies that can be used to create manipulated content, face swaps, lip syncs, face reenactment, attribute manipulation, generative adversarial networks, and the forensic study of deepfakes. Deepfake generation can be grouped into two categories: (i) face reenactment and face-swapping, and (ii) facial and video synthetic generation using generative adversarial networks (GANs) and their variants. Fig. 6 illustrates the taxonomy of the included deepfake generation approaches using GAN and its variants.
3.2.1.1. Face reenactment and face swap generation. Face-swapping, reenactment, and attribute manipulation are not new problems: videos and images have been forged since their inception. In face-swapping, an individual's face in an input video is swapped with the face that appears in the intended video (Ding et al., 2020), as shown in Fig. 7(a). Generally, a face-swapping method comprises three stages (Ding et al., 2022; Coccomini et al., 2022; Sun et al., 2023). First, these methods detect a face in the input image and then choose a target image from a library that is comparable to the input image.
In the next stage, the mouth, nose, ears, and eyes are swapped and fine-tuned to match the appearance of the source image. Lastly, combined candidate substitution takes place, which involves calculating a distance based on the overlapping area.
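The three-stage pipeline described above can be sketched schematically with toy NumPy "images". The detection stub, mean-color fine-tuning, and alpha blend are deliberately simplified stand-ins, not a real face-swap implementation:

```python
import numpy as np

def detect_face(img):
    # Stage 1 (stub): a real system runs a face detector here; we return a
    # fixed bounding box (top, left, height, width) for illustration.
    return (2, 2, 4, 4)

def fine_tune(src_patch, tgt_patch):
    # Stage 2: shift the target patch's mean intensity toward the source
    # region, a crude stand-in for matching mouth/nose/ears/eyes appearance.
    return tgt_patch - tgt_patch.mean() + src_patch.mean()

def blend(base, patch, box):
    # Stage 3: composite the candidate patch over the overlapping area.
    t, l, h, w = box
    out = base.copy()
    out[t:t + h, l:l + w] = 0.5 * out[t:t + h, l:l + w] + 0.5 * patch
    return out

source = np.full((8, 8), 0.2)   # toy grayscale "source" frame
target = np.full((8, 8), 0.8)   # toy grayscale "target" frame

box = detect_face(source)
t, l, h, w = box
patch = fine_tune(source[t:t + h, l:l + w], target[t:t + h, l:l + w])
result = blend(target, patch, box)
```

A production pipeline would replace each stub with a learned component (a face detector, a warping/appearance network, and Poisson or neural blending).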
These methods have limitations. For instance, they swap faces with a target face, resulting in a loss of image appearance, and the synthetic results are rigid, as the swapped faces look modified (Stephen & Mantoro, 2022; Wang et al., 2022). Fig. 7(b) shows deepfake face reenactment, where the source image is employed to drive the target image's gaze, expression, and mouth (Tran et al., 2019; Bounareli et al., 2022). Gaze reenactment refers to a method whereby the direction and position of the eyes of the target image are derived from the source image to sustain eye contact in a deepfake image or video (Dolhansky et al., 2020; Mirsky & Lee, 2021). Mouth reenactment is a process whereby the mouth of the target image is driven by the source image and the input voice, which together convey speech. Finally, expression reenactment is a method whereby the input image drives the expression of the intended image.
Attribute manipulation involves editing facial expressions, age, skin color, gender, hair, and facial hair in an image. In attribute manipulation, encoders and decoders, or a combination of both, as well as conditional attributes and GANs, are used. In general, "decoding" is the process of decoding the latent representation of attributes. There is a correlation between the latent representation and independent attribute editing: attributes can be manipulated independently without losing identity information, resulting in smooth generation of the results. Fig. 7(c) illustrates facial attribute manipulation shaped using SC-GAN (Li et al., 2023), with smiling faces and opening eyes. SC-GAN simulates the eyebrows of input faces using the original shape as a basis for attribute manipulation.
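The latent-editing idea above can be sketched with a toy encoder/decoder: encode an image, shift the latent code along a single attribute direction, and decode. The linear maps and the "smile" axis below are illustrative placeholders for the learned networks, not any specific published model:

```python
import numpy as np

rng = np.random.default_rng(0)
E = rng.normal(size=(16, 64))   # toy encoder weights: image (64,) -> latent (16,)
D = rng.normal(size=(64, 16))   # toy decoder weights: latent (16,) -> image (64,)
smile_dir = np.zeros(16)
smile_dir[0] = 1.0              # hypothetical disentangled "smile" axis

def edit_attribute(x, direction, alpha):
    z = E @ x                        # encode into the latent representation
    z_edit = z + alpha * direction   # move only along one attribute axis
    return D @ z_edit                # decode back into image space

x = rng.normal(size=64)
x_smile = edit_attribute(x, smile_dir, alpha=2.0)
# With a linear decoder, the edit changes only the component tied to the axis:
delta = x_smile - edit_attribute(x, smile_dir, alpha=0.0)
```

In a real disentangled model, moving along one latent axis changes the chosen attribute while leaving identity information intact, which is exactly the independence property described above.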
Table 3 summarizes deep learning-based generative methods and frameworks for face reenactment and face-swap generation.
Several scholars have drawn attention to videos and images that display deepfake facial data created through manipulation. Phanindra et al. (2023) developed a facial model generation framework using DC-GAN, employing a deep convolutional GAN to generate forged images. The model was capable of transforming noise into realistic faces; however, it failed to preserve facial features.
In their study, Sun et al. (2023) proposed a framework that employs the minimum noticeable difference (MND) GAN approach to create adversarial, privacy-preserving images; the study explored preserving the privacy of images using MND. Neves et al. (2020) introduced a novel GANprintR method to generate and detect manipulated faces. The authors introduced an approach based on auto-encoders to eliminate fingerprints from simulated deepfake images, preserving an image's visual quality while detecting deepfakes, as shown in Fig. 9. Manjula et al. (2022) introduced various methods to generate data by identifying true features within images. The authors employed GAN, VAE-GAN, C-GAN, and StarGAN to create faces from available datasets, performing experiments on different publicly available datasets (VGGFace2 and CASIA WebFace for real images, and TPDNE and PGGAN for synthetic images), as shown in Fig. 8 (Casia Dataset, 2020). This approach performs well in generating and detecting real and synthetic images; however, the model fails to distinguish facial features. In Ding et al. (2022), an anti-forensics tool was developed for synthetic videos using an adversarial network to preserve image visual quality. A cost function with additional features was also introduced to enhance the effectiveness of the model, and simulation experiments were performed to validate its performance against benchmark anti-forensics methods. The images synthesized with this model were free from visual artifacts. The cost function was designed to minimize the creators' costs while maximizing the discriminators' costs. Fig. 9 represents the generic GAN framework.
The generic cost function Z_GAN is defined as a weighted combination of three adversarial terms, Z(H1, J1, J2), Z(H2, J3, J4), and Z(H3, J5, J6), where λ represents the weight of the cost function used to avoid deterioration of image visual quality, H1–H3 represent the GAN generators, J1–J6 represent the discriminators, and K signifies the deepfake frames created from real frames. This model can be employed for different real and synthetic images; however, it only identifies GAN-generated forgeries, ignores deepfake synthetic videos, and is not capable of preserving sufficient detail.
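The concrete equation bodies for Z_GAN did not survive in the source text and cannot be reconstructed exactly. As general background only (not Ding et al.'s actual cost, which adds extra feature terms and the visual-quality weight λ), the standard GAN adversarial losses can be sketched as:

```python
import numpy as np

def discriminator_loss(d_real, d_fake):
    # -E[log D(x)] - E[log(1 - D(G(z)))]: penalize misclassified frames.
    return -np.mean(np.log(d_real)) - np.mean(np.log(1.0 - d_fake))

def generator_loss(d_fake):
    # Non-saturating generator objective: -E[log D(G(z))].
    return -np.mean(np.log(d_fake))

d_real = np.array([0.9, 0.8])  # discriminator scores on real frames
d_fake = np.array([0.1, 0.2])  # discriminator scores on generated frames
ld = discriminator_loss(d_real, d_fake)
lg = generator_loss(d_fake)
```

An undecided discriminator that outputs 0.5 everywhere yields a loss of 2 ln 2 ≈ 1.386 per sample, the classic equilibrium value of the minimax game.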
Facial image manipulation is becoming more widespread due to the development of facial image editing social platforms and image-restoring software. Many researchers have developed efficient deep learning-based models to deal with image editing problems. For object completion, an efficient deep GAN-based algorithm was proposed by Li et al. (2017). Two adversarial cost functions were introduced to normalize the training procedure and to achieve better visual quality. The model can create semantically effective and visually plausible content for the central part of faces lost due to irregular noise. However, it fails to deal with unaligned faces, a shortcoming that can be circumvented using a 3D data extension. To deal with the image editing and face-swapping problem, Natsume et al. (2019) presented an FSNet-based DNN model. The model can separate the facial appearance of flat faces without requiring further fine-tuning, while preserving the facial identity of an input image. However, it is unable to maintain quality under different angles and lighting.

Face and video synthetic generation.
Face and video manipulation in digital images has been extensively investigated in recent years and has recently been used to generate deepfakes that leave a lasting impression on viewers. Facial creation involves synthesizing real images of faces to create faces that do not exist. The massive development of AI frameworks has led to their wide use in facial image synthesis. Emerging AI methods, namely GANs (Patel et al., 2022), have been effective in creating realistic fake facial images. In video synthesis, the goal is likewise to create faces that do not exist but look real. AI-based facial synthesis can be employed for malevolent purposes: for instance, synthesizing fake images that are intentionally created to look authentic and genuine, even though they are entirely fabricated or manipulated, for use on social platforms to spread disinformation. Numerous methods have been developed to create realistic facial images, and in many cases humans cannot identify whether these faces are real or synthetic (Li et al., 2023; Tran et al., …). One such framework combines an SOF method with a …-wise (SIW) method: it uses the SOF method to render 2D segmentation maps based on the information provided in an arbitrary view, and with the SIW method it can produce high-quality portraits. However, the framework fails to capture symmetry and structured patterns. Lee et al. (2021) developed a new SFFN-based model to identify fake facial images created by GANs. The study introduced a classifier that effectively detects deepfake faces created with manipulation tools and deep learning approaches, distinguishing between real and manipulated face images with multiple levels of complexity. The framework performed well in detecting and combating synthetic face images created using deep GANs; however, it failed to preserve the quality of fake faces after distinguishing features. Shen et al. (2021) presented deep insights into identifying synthetic faces generated using different methods. The authors proposed four interactive experiments to understand human reactions to synthetic faces and videos. The study can assist in identifying synthetic faces but was unable to distinguish high-quality ones.
A study by Zhang et al. (2022) developed an AP-GAN-based framework to preserve the features of created faces so that they remain compatible with the targeted face, especially in high-accuracy video. The study introduced a U-Net-based generator to interpret identities and PE blocks to obtain precise expressions. In addition, the study introduced a discriminator-based perceptual cost function to preserve facial characteristics. Extensive experiments were performed to validate the proposed framework's efficiency and accuracy against benchmark techniques. The study showed realistic results but was sensitive to angle, gaze, and posture changes. Singh et al. (2020) employed StyleGAN2 to generate synthetic videos while preserving facial expressions. They employed various images to generate a model of each face to create a synthetic video while ensuring quality. StyleGAN2 was trained on a wide range of images comprising YouTube videos of different TV shows and synthesized images containing facial features. In addition, the study employed 1,000 images to train a model to create a deepfake in which the synthesized image was overlaid. Fig. 11 summarizes how the framework was used to create a synthetic video using StyleGAN2.
Several studies have proposed evaluation metrics to assess the quality of output generated by GANs, including the Inception Score, the Fréchet Inception Distance (FID), and OpenFace. OpenFace was employed to measure the discrepancy between faces created by StyleGAN2.
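FID compares Gaussian fits of real and generated feature embeddings (normally Inception-network activations): FID = ||μ1 − μ2||² + Tr(C1 + C2 − 2(C1·C2)^(1/2)). The sketch below uses a diagonal-covariance simplification to avoid a full matrix square root, so it is for illustration only, not a drop-in replacement for standard FID implementations:

```python
import numpy as np

def fid_diagonal(feats_a, feats_b):
    """Simplified FID assuming diagonal covariances of the feature sets."""
    mu1, mu2 = feats_a.mean(axis=0), feats_b.mean(axis=0)
    var1, var2 = feats_a.var(axis=0), feats_b.var(axis=0)
    mean_term = np.sum((mu1 - mu2) ** 2)
    # For diagonal covariances, Tr(C1 + C2 - 2(C1 C2)^(1/2)) reduces to
    # sum(var1 + var2 - 2 sqrt(var1 * var2)).
    cov_term = np.sum(var1 + var2 - 2.0 * np.sqrt(var1 * var2))
    return mean_term + cov_term

rng = np.random.default_rng(1)
real = rng.normal(0.0, 1.0, size=(1000, 8))          # stand-in "real" embeddings
fake_close = real + 0.01 * rng.normal(size=(1000, 8))  # near-identical generator
fake_far = rng.normal(3.0, 1.0, size=(1000, 8))        # poor generator
```

Lower is better: identical feature distributions score 0, and the mismatched set scores far higher than the near-identical one.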
Several AI techniques can be employed to swap identities, including expression manipulation and facial synthesis. In order to protect individuals' privacy on online social platforms, it is necessary to introduce new methods to detect and generate deepfakes. One study (2020) introduced a deepfake-generated video detector to identify synthetic video. The study examined deepfake-created images and videos to assess its efficiency. However, the study was unable to preserve the quality of the generated video while undertaking the detection process.
Zakharov et al. (2019) presented a framework incorporating a few-shot capability, which involves conducting extensive meta-learning on a substantial dataset of videos. Subsequently, the framework addresses few- and one-shot learning of neural talking head models for individuals not previously encountered, treating them as adversarial training problems employing high-capacity generators and discriminators. A study by Wang et al. (2022) introduced an audio-to-video facial feature generation framework called Talking Faces. The study examined recent progress in facial attribute generation. Furthermore, Chen et al. (2023) and Ma et al. (2023) developed an unsupervised variational style transfer framework comprising three essential parts: an encoder to extract facial attributes, a hybrid decoder to model speech movement, and a variational style enhancer. The framework was able to capture facial characteristics from a random video.

How to generate images and videos using AI.
There has been substantial progress in attribute repositioning and manipulation using emerging deep generative methods. Deepfake is a technique that synthesizes facial features in a video into a particular target using AI technology. AI techniques utilize auto-encoders and GANs to create images and videos while preserving visual features. Fig. 12 shows the deepfake creation process using auto-encoders and GANs (Lee & Kim, 2021). As can be seen in Fig. 12, the discriminator (D) repeats the procedure, learning at each step to distinguish real from fake images and videos.
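The adversarial loop described above can be expressed numerically. Below is a minimal illustration (our own toy code, not from the cited framework) of the standard GAN losses: the discriminator is rewarded for assigning high probability to real samples and low probability to generated ones, while the generator is rewarded for fooling the discriminator.

```python
import numpy as np

def discriminator_loss(p_real, p_fake):
    """Binary cross-entropy loss for the discriminator D.

    p_real: D's predicted probability that real samples are real.
    p_fake: D's predicted probability that generated samples are real.
    """
    return float(-(np.log(p_real) + np.log(1.0 - p_fake)).mean())

def generator_loss(p_fake):
    """Non-saturating generator loss: G wants D(G(z)) -> 1."""
    return float(-np.log(p_fake).mean())

# A confident, correct discriminator has low loss ...
good_d = discriminator_loss(np.array([0.95]), np.array([0.05]))
# ... while a fooled discriminator has high loss.
fooled_d = discriminator_loss(np.array([0.5]), np.array([0.9]))
assert good_d < fooled_d

# The generator's loss falls as its fakes become more convincing.
assert generator_loss(np.array([0.9])) < generator_loss(np.array([0.1]))
```

Training alternates minimizing these two losses until the discriminator can no longer reliably separate real from generated content.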
Liu et al. (2021) designed a deep neural network-based technique to generate images from characteristic labels. Moreover, the study developed an innovative data augmentation method utilizing the created images and validated the image generation quality experimentally. However, the quality of the generated images could have been higher, which would have required an immense number of target images.

Framework for deepfake generation.
A framework for generating facial images based on target attributes, map labels, and embedding vectors is presented in Fig. 13, which shows a framework for creating images with auto-encoders and adversarial loss (Liu et al., 2021). Fig. 13(a) and (b) show an image re-generation autoencoder and a classifier F, in which the encoder Q_en encodes the input image g_0 into embedding features v_0 and the decoder Q_de learns to re-generate an image from v_0, where K_1 is the image re-generation loss and an adversarial cost is employed to train the model.
The input image g_0 is fed to the encoder Q_en and flattened into an embedding vector v_0. After that, v_0 is decoded to reconstruct an optimized image.
The losses are defined as follows (see Liu et al., 2021 for the full equations): K_1 is the image re-generation loss between g_0 and its reconstruction; the optimization function R_Q for the encoder and decoder combines R_adv, the adversarial loss of an image, with R_rec, the re-generation loss; e_t is employed to control the weight of the adversarial loss R_adv across iterations, λ_e represents the learning rate for the e-th training iteration, and η is a hyper-parameter. The classifier F is trained on input images and labels b_0; to identify the facial characteristics b_0, an optimization function R_F is defined accordingly. Different convolutional kernels can be employed to generate images while preserving their visual features. The proposed framework for generating deepfake images and videos can be employed efficiently to preserve the visual quality of the image.
Fig. 10. Example of synthetic faces and attribute manipulation created using GAN variants (Tran et al., 2019; Hou et al., 2023).
Salama and Hel-Or (2020) developed a method to identify the source generators that created fake images using multi-class classification. The study designed a multi-class model to create an image generator profile to evaluate generated images, as shown in Fig. 13. The study performs well in generating synthetic image profiles but fails to preserve distinguishing facial features. Ding et al.
(2020) presented a method that automatically eliminates facial masks and synthesizes the affected areas while preserving facial features. The authors employed two discriminators to identify the region and structure of faces. This method can efficiently generate plausible face images on synthetic datasets. In recent years, the landscape of deepfake technology has witnessed significant advancements, particularly in the development of novel activation functions. One study (Gustineli, 2022) delves into the statistics and trends surrounding the use of specific activation functions, including LeakyReLU, MISH, Swish, and Pish, in deepfake generation. Among these, LeakyReLU has emerged as a widely adopted choice: statistical analysis reveals that approximately 70 % of recent studies in deepfake generation employ LeakyReLU due to its effectiveness in mitigating vanishing gradients and improving model performance (Gustineli, 2022).
In parallel, the MISH activation function has gained substantial traction, with its utilization increasing by over 30 % in the past two years. This non-linear function has demonstrated remarkable performance improvements in generating more realistic and convincing deepfakes (Misra, 2020).
Swish and Pish, while less prevalent than LeakyReLU and MISH, have shown promising results in specific deepfake generation scenarios. Statistical data indicates a steady rise in the adoption of Swish and Pish, with increases of 15 % and 10 % in their usage in the last year, respectively (Kumar, 2022; Kawa & Syga, 2020).
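For reference, the three most common of these activation functions can be written in a few lines each; this is our own illustrative sketch (Pish is omitted because its closed form is not given in the sources reviewed here).

```python
import numpy as np

def leaky_relu(x, alpha=0.2):
    """Passes a small gradient (alpha * x) for negative inputs instead of zero."""
    return np.where(x > 0, x, alpha * x)

def swish(x, beta=1.0):
    """x * sigmoid(beta * x): smooth and non-monotonic."""
    return x / (1.0 + np.exp(-beta * x))

def mish(x):
    """x * tanh(softplus(x)) (Misra, 2020): smooth and self-regularizing."""
    return x * np.tanh(np.log1p(np.exp(x)))

# Unlike ReLU, LeakyReLU keeps negative inputs alive ...
assert leaky_relu(-1.0) == -0.2
# ... while Swish and Mish both pass smoothly through the origin.
assert swish(0.0) == 0.0 and mish(0.0) == 0.0
```

The non-zero negative slope is what lets LeakyReLU avoid the vanishing-gradient issue mentioned above, which helps explain its popularity in GAN discriminators.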

Deepfake detection methods
In this section, frameworks for detecting deepfakes are analyzed and summarized. A comprehensive review shows that most deepfake detection models employ emerging AI and deep learning techniques. Since deepfakes primarily manipulate videos and images, emerging deep learning methods are used to detect them. Due to recent technological advancements, detecting AI-generated deepfakes has become a challenging task. This section reviews and discusses deepfake detection methods. Table 5 summarizes the deep learning and machine learning-based methods and frameworks for faceswap, attribute manipulation, and face reenactment detection.
Recently, AI has been employed for computer vision and big data investigation. A deepfake detection method proposed by Lee and … demonstrated superior performance to benchmark algorithms, with higher computational overhead.

Synthetic faces and audio-visual detection.
This section provides an extensive analysis of current synthetic face and audio-visual detection methods. Deepfakes are primarily characterized by two types of manipulation, audio and video (Müller et al., 2022; Gu et al., 2021). An analysis of each audio- and image-based detection method is provided to give in-depth insights into the different techniques. We present a critical examination of current studies, covering their methodologies, competencies, challenges, and limits, as well as upcoming trends in audio and synthetic image detection techniques.
The audio-visual inconsistency-based detection approach detects audio and visual inconsistencies in multimedia presentations. A conflict may arise when sound and image do not sync or when audio content contradicts visual content (Johnson et al., 2022; Lim et al., 2022). Detecting audio-visual inconsistencies is crucial for making multimedia presentations effective and engaging, because inconsistencies between audio and visual elements can distract and confuse viewers, resulting in a less positive experience and potentially reducing the impact of a message (Balasubramanian et al., 2022; Suratkar & Kazi, 2022; Zhang et al., 2022). Audio-visual synchronization on online media platforms has recently become a vital issue. Several models have been proposed for audio-image analysis. Fig. 17 illustrates the generic pre-processing pipeline for audio-visual deepfake detection.
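As a simple illustration of the inconsistency idea, the sketch below (our own toy example, not a published method) cross-correlates an audio loudness envelope with a mouth-openness signal extracted per frame; a correlation peak at a large lag suggests the audio and visual tracks are out of sync.

```python
import numpy as np

def av_sync_score(audio_envelope, mouth_openness):
    """Return (best_lag, peak_correlation) between two per-frame signals."""
    a = (audio_envelope - audio_envelope.mean()) / audio_envelope.std()
    m = (mouth_openness - mouth_openness.mean()) / mouth_openness.std()
    corr = np.correlate(a, m, mode="full") / len(a)
    lags = np.arange(-len(a) + 1, len(a))  # lag of a relative to m
    best = corr.argmax()
    return int(lags[best]), float(corr[best])

t = np.linspace(0, 4 * np.pi, 200)
speech = np.sin(t) + 0.05 * np.random.default_rng(1).normal(size=t.size)

lag_sync, c_sync = av_sync_score(speech, speech)             # in sync
lag_off, c_off = av_sync_score(speech, np.roll(speech, 30))  # shifted track

assert lag_sync == 0 and c_sync > 0.9  # matched tracks peak at zero lag
assert abs(lag_off) >= 20              # a shifted track moves the peak
```

Real detectors replace both toy signals with learned embeddings (e.g., audio features and lip-region features), but the underlying synchrony test is the same.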
An AVFakeNet framework was presented by Ilyas et al. (2023) to classify manipulated audio and videos using DST-Net learning. The authors employed different input and output blocks in the DST framework to extract features. The DST framework is robust enough to identify high-quality deepfake videos with different angles and poses. Hamza et al. (2022) analyzed audio deepfakes through Mel-frequency cepstral coefficient (MFCC) features by employing different deep learning models. Their study employed the MFCC method to obtain useful information from audio to identify audio deepfakes. They compared different deep learning models to assess their efficiency in identifying synthesized audio. Hao et al. (2022) proposed audio-video detection methods for detecting synthetic speech using spectrogram investigation. Their study introduces numerous methods that employ video and audio speech to identify deepfakes through audio-video variations, with an LSTM for final classification. The proposed technique can detect deepfake video with better accuracy. However, this model needs deepfake video for training and fails to identify the video properly if additional settings are employed in the video. A self-referential technique based on AVCM was proposed by Lewis et al. (2020) to tackle audio-visual reliability concerns by incorporating synchronous audio recording. They employed reliable audio to explore visual effects in synthetic facial regions. Several detection methods were employed, such as Face Warp, IQM, and lip sync, to validate their effectiveness. However, the performance of this model decreases when mouth movement is limited in the video. Jafar et al.
(2020) presented a deep learning-based Discrete Fourier Transform-Multi-frame (DFT-MF) model that considers mouth features to identify deepfake videos by investigating lip and mouth movement. MoviePy software was employed to edit videos into word-level segments in which the mouth remains open. However, the model's performance significantly degraded when mouth movement was limited. Detecting deepfake videos and images is demanding. Aduwala et al. (2021) and Duong et al. (2022) proposed a GAN with a discriminator to identify synthetic videos or images. MesoNet was employed to train the GAN and discriminator to detect deepfakes. The discriminator is a comprehensive convolutional neural network (CCNN) comprising eight filters with kernel size 3 and ReLU activations. In this model, five discriminators were assembled and verified. However, the proposed model has limited performance when trained on different datasets. Stanciu and Ionescu (2021) explored temporal features using LSTM. The authors investigated facial regions, which contain additional information that can be used to assess the reliability of video images. Trinh and Liu (2021) presented a method to examine the fairness of artificial intelligence in identifying deepfakes. A detector algorithm was developed to investigate fairness using pre-trained datasets to identify and detect facial features. The proposed model can identify differences in predictive performance across races and bias in existing datasets.
A low-quality video detection method named BZNet was proposed by Lee et al. (2022); it utilizes an unsupervised super-resolution (SR) technique. The proposed technique can efficiently identify fake videos and audio by up-scaling low-quality signals using BZNet. Different loss functions were applied to train branch-zooming (BZ) segments in their approach to extract facial features from high-quality images. The validity of BZNet was tested on different publicly available datasets. GANs have shown marvelous achievements with regard to modelling the distribution of data; the progress here arises from improvements in training methods. He et al. (2021) developed a novel deepfake image detection technique based on a combination of different feature representations that detects frequency features. Their classifier was trained on real and fake images, while the re-synthesizer was trained only on real images to extract features and identify deepfake images. Fig. 18 represents the flow of detection, where the re-synthesizer (S) is trained on real images. Only (S) can extract powerful features from images and separate real from fake ones. (S) takes diverse types of input to capture visual features: for instance, de-noising (D), colorization (C), and super-resolution (SR).
Real images were employed to train the re-synthesizer (S) with a down-sampled input to reconstruct the original image for final classification. Formally, a dataset D of real images is used to train the SR model ϕ over D with a loss function (He et al., 2021) in which Ω denotes an image operation, Z ∈ D is an image, and Z↓ is a down-sampled version of Z.
As an outcome, the re-synthesizer is S(·) = ϕ(Ω(·)), where Ω ∈ {J, K, N} denotes the identity, gray-scale, and noise processes, respectively. This model is effective and can be employed to detect forged images. Tanaka et al. (2021) developed a technique to identify fake images using a robust hashing algorithm. This model can identify tampered-with images by computing hash values for images. However, it does not perform well on low-quality images. Table 6 summarizes the deep learning and machine learning-based techniques and frameworks for synthetic face and audio-visual detection. Suratkar et al. (2022) present a comparative study of deepfake image and video detection using deep learning techniques. This study classifies deepfake images by extracting features from a video using XceptionNet and CNNs. Guarnera et al. (2020) developed an expectation-maximization (EM) algorithm-based method to identify invisible forensic traces within images. The proposed method can classify an image as real or deepfake and predict the technique used to generate the deepfake. This model employs the EM algorithm to define a mathematical model to obtain image features. The estimation shows better performance compared to benchmark models; however, this model cannot be applied to detecting the synthesis of subtle facial expressions. Zhao et al.
(2021) implemented a multi-feature fusion deepfake detection method named MFF-Net to extract quality and frequency features jointly via Gabor deep convolution from images. This model's texture improvement components increase feature extraction accuracy to obtain appropriate artifacts. Moreover, a multi-fusion loss function was presented to penalize and scale feature vectors. This method performs well compared to benchmark algorithms, at the highest computational cost. Another study (2019) introduced an AI-based method to identify synthetic images considering relative loss. The approach employed several GAN and DenseNet models to generate fake and real image pairs, presenting a pairwise learning method to identify features distinguishing real from fake images. However, this approach has an accuracy issue, as it was only tested on limited datasets. Elpeltagy et al. (2023) introduced a smart hybrid model to identify forged video by employing XceptionNet and ResNetV2. It employed the OpenFace approach to extract visual features and the PyAudio library to obtain audio features. The model performs well compared to baseline approaches on limited datasets.
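Returning to the re-synthesis approach of He et al. (2021) discussed above, its core intuition can be caricatured in a few lines: a re-synthesizer trained only on real images reconstructs them well, so the residual between an image and its re-synthesis serves as a detection feature, with artifact-heavy inputs leaving larger residuals. Everything below is an illustrative stand-in (naive down/up-sampling plays the role of the learned model ϕ):

```python
import numpy as np

def resynthesize(img, factor=2):
    """Stand-in for the learned SR model: down-sample, then up-sample."""
    small = img[::factor, ::factor]  # naive down-sampling
    return np.repeat(np.repeat(small, factor, axis=0), factor, axis=1)

def residual_features(img):
    """Summary statistics of the re-synthesis residual |x - S(x)|."""
    r = np.abs(img - resynthesize(img))
    return np.array([r.mean(), r.std(), r.max()])

rng = np.random.default_rng(0)
smooth = np.outer(np.linspace(0, 1, 64), np.linspace(0, 1, 64))  # "real"-like
noisy = smooth + 0.3 * rng.normal(size=smooth.shape)             # artifact-heavy

# High-frequency artifacts survive the round trip as a larger residual.
assert residual_features(noisy)[0] > residual_features(smooth)[0]
```

A classifier trained on such residual features (rather than raw pixels) is what gives the method its robustness to unseen generators.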

How to detect deepfakes using emerging ML.
Deepfakes are becoming more prevalent. Mitra et al. (2020) and Su et al. (2021) developed machine learning-based models to detect deepfake videos on online social platforms using different CNN models. They employed three pre-trained models for the classification of videos. The proposed model is appropriate for detecting compressed video on limited datasets. Fig. 19 represents a deepfake detection flow using two encoder-decoder pairs. The framework comprises two CNN methods and a classification network, which are used to identify deepfake images or videos. Different CNN methods were employed to extract features from an input video.
The framework employs emerging AI-based technology to distinguish between deepfake and authentic content.Conversely, deepfake detection techniques and ML frameworks explicitly highlight the role of machine learning algorithms and models in identifying deepfake videos.Machine learning, a subset of AI, entails training models on extensive datasets to discern patterns and make predictions or classifications.
In essence, deepfake detection using emerging AI encompasses a broader spectrum of AI techniques and frameworks, including but not limited to ML, which involves more advanced and diverse methods beyond traditional machine learning algorithms.These frameworks encompass a wider range of AI technologies such as deep learning, computer vision, natural language processing, and other emerging AI methodologies specifically tailored for deepfake detection.The overarching goal of these techniques and frameworks is to enhance the accuracy and effectiveness of deepfake detection.
3.2.2.3.1. Deepfake detection process. To detect deepfakes, datasets containing real and manipulated video clips are utilized, and frames are extracted from every input video. After extracting frames, cropped faces from every frame are resized according to the input of the different CNN modules, keeping the image size (299, 299, 3) for Inception models and (224, 224, 3) for ResNet50. For the final classification of the input video, a GlobalPooling2D layer was employed with 0.6 dropout, followed by a fully connected layer containing 1024 units, ReLU, and a SoftMax layer. Fig. 20 shows the classification workflow used to detect deepfakes.
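A minimal sketch of this preprocessing step (our own illustration, using nearest-neighbour resizing rather than a real face detector or interpolation library) looks like this:

```python
import numpy as np

def resize_nearest(frame, out_h, out_w):
    """Nearest-neighbour resize of an (H, W, 3) frame."""
    h, w = frame.shape[:2]
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return frame[rows][:, cols]

def preprocess_frames(video, model="inception"):
    """Resize every extracted frame to the input size the backbone expects."""
    size = (299, 299) if model == "inception" else (224, 224)  # ResNet50
    return np.stack([resize_nearest(f, *size) for f in video])

video = np.random.default_rng(0).integers(0, 256, size=(5, 480, 640, 3))
assert preprocess_frames(video, "inception").shape == (5, 299, 299, 3)
assert preprocess_frames(video, "resnet50").shape == (5, 224, 224, 3)
```

In a full pipeline the frames would first pass through face detection and cropping; only the resizing convention for the two backbones is shown here.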
The number of frames extracted from an input is represented as N, with time complexity Q(N). The process of detecting synthetic video continues until forged frames are identified. The convolution operation involves three components: input images, feature detectors, and feature maps. A feature detector is a set of matrices passed over the input image to generate new feature maps from the convolution between the filter and the pixel values of the source image at every location (x, y). The total density K is defined in terms of Z, the number of channels in an image, H²_I × Z, the feature matrix size, and H²_J × Z, the filter size; the relative complexity X is defined analogously. Note that the complexity of the generic method is higher than the proposed complexity K. The use of Xception models enhances the accuracy of feature prediction. This model is efficient and can be employed to detect forged videos and images.
In deepfake detection, the choice of activation function plays a key role in enhancing the model's discriminative capabilities. While ReLU remains a foundational choice, recent statistics reveal a growing trend towards incorporating LeakyReLU, MISH, and Swish functions, showcasing a combined adoption rate of over 50 % in the last three years (Kawa & Syga, 2020).
Additionally, Pish, while less commonly employed, has shown promise in specific detection scenarios. Recent studies have reported an 8 % increase in the utilization of Pish as an activation function in deepfake detection models (Salvi et al., 2023).
These statistics underscore the dynamic nature of the deepfake landscape and highlight the evolving preferences for specific activation functions in both generation and detection methodologies. The trends observed here offer valuable insights for researchers and practitioners seeking to optimize their approaches in this rapidly advancing field.

Fig. 18. Framework flow of real and deepfake image detection (He et al., 2021).

Table 6. Overview of synthetic face and audio-visual detection methods.

Tools and software for deepfake detection and generation (RQ3)
The use of AI in the manipulation of digital facial images, audio, and video has advanced tremendously in recent years. GANs have been extensively employed to manipulate audio, videos, and images; however, GANs and their variants still face challenges in differentiating between the characteristics of synthetic media, and they lack the ability to manipulate or control certain attributes associated with synthetic media, such as deepfakes, mainly in high-resolution settings.
Face-swapping is a pervasive method of swapping a face from a source to a target image to get a realistic output (Deepfakes, 2017/2022). The key idea behind realistic face-swapping, face manipulation, attribute manipulation, and facial synthesis is the employment of GANs (Guarnera et al., 2020). The development of face-swapping, face reenactment, facial synthesis, feature manipulation, expression swap, face morphing, and face-to-face swapping techniques is attracting increasing attention. For instance, Choi et al. (2018) and Karras et al. (2019) introduce image manipulation and facial attribute transfer methods using the RaFD and CelebA datasets to create realistic images. The methods were trained on the CelebA and RaFD datasets (75 % of each dataset for training and 25 % for validation) and utilize a more streamlined set of frames than standard benchmark methods. Likewise, JinTian (2022), Deng et al. (2020), and Iperov (2022) presented DiscoFaceGAN- and DeepFaceLive-based frameworks to generate high-quality face-swap videos. Further, they analyze the latent space transformation using 3D adversarial learning. The frameworks demonstrated superior performance to benchmark methods, with higher computational overhead. Nvidia (2022) and Roeser (2022) developed the StyleGAN ADA and Wombo tools to create realistic images and videos using the CelebA dataset. The method introduces an adaptive discriminator augmentation technique that substantially stabilizes the training process. However, the performance of these tools decreases when mouth movement is limited in the video. Likewise, Wang (2022) and Rampas (2022) introduce a face detection tool called FALdetector using CNNs. In addition, Deepfake-Bot (Rampas, 2020/2022), DepFA (DepFA, 2018/2022), and DiscoFaceGAN (Deng et al., 2020) are well-known deepfake creation tools that are used to recreate facial expressions and animations. Table 7 summarizes the tools and software used for deepfake image, audio, and video creation and detection.

Discussion and recommendations (RQ4)
With the development of AI techniques, deepfakes can now be generated and detected easily. Deepfake content is also spreading faster than ever due to advances in online social media frameworks (Gragnaniello et al., 2022; Khichi & Yadav, 2021; Patel et al., 2022). The quality of deepfake creation is improving, so the performance and accuracy of detection techniques must improve as well. The motivation here is linked to the fact that what AI creates, AI can also identify (Chen & Hsu, 2023; Hsu et al., 2020; Ismail, Elpeltagy, Zaki et al., 2021; Hu et al., 2022; Choi et al., 2020).
Several deepfake generation and detection models are available with good, realistic performance, and others are under development. However, advances in deepfake generation methods are creating challenges with regard to combating them. Deepfake refers to AI-based realistic videos or images created using digital editing methods: for instance, facial synthesis, deep-swap, feature transformation, expression swaps, audio-visual synthesis, and creating images of things which have never existed in reality. Several methods are available to detect and generate deepfakes; however, gaps still need to be addressed.
The most pressing need is to deploy uniform forgery and voice-cloning detection techniques to counter deepfakes. We have discussed some of the critical challenges involved in generating and detecting deepfakes online. Here, we highlight some vital aspects required to enhance performance and counter deepfakes.

Recommendations for generating deepfakes
Deepfake creation methods have gained increasing attention from academia and scientists due to their potential impact on the community. Numerous methods can generate realistic facial portraits, and in many cases humans cannot identify whether these faces are real or synthetic (Natsume et al., 2019; Neves et al., 2020; Silva et al., 2022; Kumar Das and Naskar, 2022).
Likewise, studies introduced expression and attribute manipulation methods using the FF++ and Celeb-HQ datasets (Barni et al., 2020; Groh et al., 2022; Zobaed et al., 2021). These studies employed deep generative methods to create high-quality facial attributes and 2D segmentation maps based on the information provided in an arbitrary view. The methods can create high-quality portraits; however, they fail to capture symmetry and structured patterns. Ranjan et al. (2020), Singh et al. (2020), and Natsume et al. (2018), among others, introduced frameworks for facial and synthetic video generation to increase visual and audio quality. However, there are still gaps that need to be addressed: for instance, generality, hand motion behavior, identity disclosure, whole-body movement, temporal consistency, lighting adjustment, and realism in facial movements (i.e., eyes, mouth, and lips).
Most existing deepfake generation methods mainly focus on (i) face reenactment, (ii) face-swapping, (iii) facial and synthetic video generation, and (iv) frontal facial poses, using publicly available datasets, like the model of Zendran and Rusiecki (2021) described below. However, there is still room for improvement by considering composite forged and fusion datasets (real-world situations). Nowadays, deep learning-based techniques are in greater demand for generating synthetic media due to the practical advantages they offer. Similarly, deepfakes show how deep learning-based methods can handle digitally manipulated media. Zendran and Rusiecki (2021) used four key techniques to generate deepfakes (the autoencoder, VAE, VAE-GAN, and C-GAN) to analyze the Face-to-Content (F2C) swapping problem. The authors performed experimental analyses on the VoxCeleb2 dataset. Moreover, an expressive assessment model was presented to formulate the deepfake image generation problem. This work presents realistic results in dealing with the deepfake generation problem; however, it is sensitive to deviations in the image.
The autoencoder method used to generate deepfakes comprises three key stages, as described in Fig. 21. Consider a set of images D and a set of faces. Fig. 21(a) shows that encoders are trained to generate feature maps for two sets of face images. Similarly, a decoder is trained to measure the accuracy of both. After that, two independent decoders are trained individually to replicate the input images of the respective faces via the trained autoencoder, as shown in Fig. 21(b). Lastly, the decoders are swapped to create pictures of one identity from the other, based on the model design, as shown in Fig. 21(c).
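The three stages above can be sketched with a shared encoder and two per-identity decoders; swapping the decoders at inference time produces the face swap. The linear maps below are toy stand-ins for the trained networks, nothing here comes from the cited paper:

```python
import numpy as np

rng = np.random.default_rng(0)
LATENT, PIXELS = 8, 64

# Shared encoder plus one decoder per identity (toy linear stand-ins).
encoder = rng.normal(size=(LATENT, PIXELS)) / np.sqrt(PIXELS)
decoder_a = rng.normal(size=(PIXELS, LATENT))  # reconstructs face A
decoder_b = rng.normal(size=(PIXELS, LATENT))  # reconstructs face B

def reconstruct(face, decoder):
    """Stage (b): encode a face, then decode it with a chosen decoder."""
    return decoder @ (encoder @ face)

face_a = rng.normal(size=PIXELS)

# Normal use: decode A's latent code with A's decoder.
same = reconstruct(face_a, decoder_a)
# Stage (c), the swap: decode A's latent code with B's decoder,
# rendering A's pose/expression with B's identity.
swapped = reconstruct(face_a, decoder_b)

assert same.shape == swapped.shape == (PIXELS,)
assert not np.allclose(same, swapped)  # different decoders, different output
```

The design choice that makes the swap work is the shared encoder: because both decoders read the same latent space, a code extracted from one identity remains meaningful to the other identity's decoder.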
Using an autoencoder is a key method for generating deepfakes. An autoencoder is a kind of artificial neural network (ANN) that learns to re-generate its inputs. Training the encoder M : Z^n → Z^l and decoder N : Z^l → Z^n amounts to the condition described in Eq. (15) (Zendran & Rusiecki, 2021): argmin_{M,N} E[ω(c, (N ∘ M)(c))], where E represents the expectation over the distribution of c, ω represents a cost function that estimates the distance between an input and its reconstruction, and (N ∘ M)(c) denotes the composition N(M(c)). To optimize the cost function, the VAE loss R_VAE is employed, as defined in Eq. (16).
Here R_rec is the re-generation loss obtained by comparing pixels of the image c along the encoded latent path, with b = encoder(c) ∼ a(b|c) and the re-generated image ĉ = decoder(b) ∼ s(ĉ|b) used to compute the expectation E. In contrast, R_df is the Kullback-Leibler divergence (Kullback & Leibler, 1951) between the two probability distributions. Several tools for generating deepfakes are available, but greater preservation of quality is required because these tools target only specific features in publicly available applications. Hence, deepfake creation tools and techniques need to be studied to enhance their performance abilities. Below are some vital aspects to consider in order to enhance the quality and performance of generation techniques. Several methods focus on a frontal facial stance using the FF++ and CelebA-HQ datasets. The proposed methods are appropriate for creating compressed video on limited datasets, so there is a need to consider whole-body movement: for instance, generality, hand motion behavior, identity disclosure, temporal consistency, lighting adjustments, and realism in facial movement (i.e., eyes, mouth, and lips). Likewise, the methods proposed by Cazenavette and De Guevara (2021) and Prajapati and Pollett (2022) employed GAN variants for facial reenactment and attribute manipulation. Yet it is still challenging to say which method is superior, for several reasons. For instance, current studies have been trained and validated on specific datasets, i.e., FF++ (Ondyari, 2022), Celeb-DF (Li, 2022), DFDC (Dolhansky et al., 2020), CASIA (Casia Dataset, 2020), and VGGFace2 (Cao et al., 2018), and consider different performance metrics (accuracy, absolute error, and equal error rate (EER)). Researchers and scientists should employ composite-based methods or propose new datasets to validate performance while considering real scenarios, i.e., whole-body movement, generality, hand motion behavior, identity disclosure, temporal consistency, lighting adjustments, and realism in facial movement (i.e., eyes, mouth, and lips).
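To make the VAE objective of Eq. (16) concrete, the sketch below evaluates its two terms for a Gaussian latent: a pixel-wise reconstruction loss and the closed-form KL divergence KL(N(μ, σ²) ‖ N(0, 1)) = ½(μ² + σ² − log σ² − 1) summed over latent dimensions. This is the generic VAE loss, not necessarily the exact formulation of the cited study:

```python
import numpy as np

def vae_loss(x, x_rec, mu, log_var):
    """R_VAE = reconstruction loss + KL divergence to the standard normal."""
    r_rec = np.sum((x - x_rec) ** 2)  # pixel-wise reconstruction term
    r_df = 0.5 * np.sum(mu**2 + np.exp(log_var) - log_var - 1.0)  # KL term
    return r_rec + r_df

x = np.array([0.2, 0.8, 0.5])
# A perfect reconstruction with a latent code matching the prior costs nothing.
assert vae_loss(x, x, np.zeros(2), np.zeros(2)) == 0.0
# Moving the latent mean away from the prior increases the KL term.
assert vae_loss(x, x, np.array([1.0, 0.0]), np.zeros(2)) == 0.5
```

The KL term is what regularizes the latent space so that samples drawn from the prior decode into plausible faces, which is precisely why the VAE variant generates smoother deepfakes than a plain autoencoder.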
3.4.1.2. Facial and video synthetics. Existing facial and video synthesis methods are primarily based on GAN architectures: for instance, the methods proposed by Singh et al. (2020), Patel et al. (2022), Lee et al. (2021), Nguyen et al. (2021), and Rajesh et al. (2022) employed StyleGAN2, FID, and SofGAN to create synthetic facial videos. They employed several images per face to build a model of each face and create a synthetic video while ensuring quality. The methods were trained on a wide range of images, comprising YouTube videos of different TV shows and synthesized images containing facial features. Similarly, the synthetic video generation frameworks by Shelar et al. (2022), Shen et al. (2021), Pal Singh (2023), and Zhuang and Hsu (2019) presented deep insights into creating synthetic faces using the Celeb-HQ and FF-LQ datasets. The authors introduced interactive experiments to understand human reactions to synthetic faces and videos. The methods can assist in identifying synthetic faces but were unable to distinguish high-quality ones.
Moreover, Malik et al. (2022), Abdolahnejad and Liu (2020), Islam et al. (2020), and Chen et al. (2022) developed P-GAN, StarGAN, and StarGAN-v2 methods to create high-quality videos and images for synthetic face and video creation, difficult to distinguish by the human eye, using the FFHQ and CelebA-HQ datasets. They employed parallel auto-encoders to create realistic synthetic videos. However, the methods fail to preserve distinguishing facial attributes. Wang et al. (2022), Xu et al. (2022a), and Bai et al. (2022) introduced SR-GAN and BigGAN models to produce high-quality videos that are difficult to distinguish by the human eye. The proposed methods performed well in creating realistic synthetic images and videos by removing noise and refining facial expressions; however, these methods need to better preserve the quality of synthetic images and videos generated using StarGAN v2, LS-GAN, SR-GAN, HL-GAN, AAE-GAN, or StyleGAN2. Table 8 summarizes trends and recommendations with regard to deepfake creation.

Recommendations for deepfake detection
Deepfake detection techniques are still in their early stages. Researchers have introduced and validated several research approaches to detect and stop disinformation online, using publicly available datasets. For example, the deepfake detection frameworks proposed by John and Sherif (2022) employed robust hashing and 2D-Xception models to identify deepfake images and videos. The models can identify tampered-with images by computing hash values against images. However, the models do not perform well for low-quality images. Trinh and Liu (2021), Weever and Wilczek (2020), Ud Din et al. (2020) and Solaiyappan and Wen (2022) presented novel D-CNN, XGBoost, and DenseNet methods to classify manipulated images. The methods were trained with different deepfake image datasets to increase their generalization ability. However, the methods had limited performance on unseen datasets. Moreover, Khichi and Kumar Yadav (2021), Marcon et al. (2021) and Ajoy et al. (2021) employed the ABC metric, XGBoost, local binary patterns (LBP), bi-LSTM, YOLO-CRNN, and 3D-Xception models to detect low-quality images and videos. Based on our study and analysis, the following are important considerations in regard to enhancing the performance of detection techniques.
The active and passive methods for deepfake video generation and detection are vital in understanding the landscape of this rapidly evolving technology (Gu et al., 2022). Passive methods primarily focus on detecting deepfakes after they have been created. While this approach is conventional, its practical application could be challenging, especially when dealing with the sheer volume of videos uploaded daily. The risk of false positives further complicates the effectiveness of passive methods in real-time scenarios.
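The false-positive concern can be made concrete with a base-rate calculation. The sketch below uses illustrative numbers (not figures from the reviewed studies) and applies Bayes' rule to show that even an accurate detector yields mostly false alarms when genuine deepfakes are rare among uploads:

```python
def detector_precision(prevalence: float, tpr: float, fpr: float) -> float:
    """Probability that a flagged video is actually a deepfake (Bayes' rule).

    prevalence: fraction of uploads that are deepfakes
    tpr: true-positive rate (sensitivity) of the detector
    fpr: false-positive rate of the detector
    """
    flagged_fake = prevalence * tpr        # deepfakes correctly flagged
    flagged_real = (1 - prevalence) * fpr  # authentic videos wrongly flagged
    return flagged_fake / (flagged_fake + flagged_real)

# With 1 in 1000 uploads fake, 99 % sensitivity, and a 1 % false-positive rate,
# fewer than 1 in 10 flagged videos is actually a deepfake (precision ~0.09).
precision = detector_precision(prevalence=0.001, tpr=0.99, fpr=0.01)
```

At platform scale, this base-rate effect is why purely passive screening tends to overwhelm human reviewers with false positives.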
On the other hand, active methods take a proactive stance by making it harder to create a fake video from an authentic source. Techniques like digital watermarking could be employed to achieve this objective (Wang et al., 2023). By embedding distinctive markers or signatures within legitimate videos, active methods create a barrier for potential manipulations. Such preventative measures add an extra layer of security, mitigating the risk of deepfake proliferation.
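Digital watermarking can take many forms; as a minimal illustration only (not a production scheme — least-significant-bit marks are trivially stripped and are not the method of Wang et al., 2023), the sketch below embeds a binary watermark in the low bits of a frame so that later manipulation of marked pixels breaks verification:

```python
import numpy as np

def embed_watermark(frame: np.ndarray, mark: np.ndarray) -> np.ndarray:
    """Embed a binary watermark into the least significant bit of each pixel."""
    assert frame.shape == mark.shape
    return (frame & 0xFE) | (mark & 1)

def extract_watermark(frame: np.ndarray) -> np.ndarray:
    """Recover the embedded bit plane."""
    return frame & 1

def verify(frame: np.ndarray, mark: np.ndarray) -> bool:
    """Check whether the frame still carries the expected watermark."""
    return bool(np.array_equal(extract_watermark(frame), mark & 1))
```

Robust schemes instead embed marks in transform domains so they survive compression; the point here is only the verification workflow.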
The distinction between active and passive methods is pivotal in the battle against deepfake technology. While passive methods serve as a critical line of defense, they are limited by their post-creation nature. Active methods, however, take a preemptive approach, striving to

Table 8
Trends in deepfake generation, and recommendations.

Reenactment and faceswap
• There is a need for composite, forged, and fusion datasets, since existing datasets cannot accommodate different generation techniques (real-world situations).
• There is a need to integrate creation models with online media frameworks to counter deepfakes; however, integration is nearly impossible to implement, though it can be executed using blockchain techniques.
• Integrated frameworks could be employed to identify various video forgeries in the future. These frameworks could use corresponding sections to identify synthetic video and audio content. In this way, synthetic and facial content could be efficiently handled in any form.
• In order to make social media platforms more reliable, filtration methods should be employed.
prevent the creation of deceptive videos altogether.

Faceswap and face reenactment.
Improving deepfake detection performance using AI techniques demands, above all, composite-forgery and hybrid datasets. Several studies have been conducted using similar forged and existing datasets but have failed to demonstrate comprehensive proficiency in detecting deepfakes. A few recent studies have addressed the deepfake detection problem (faceswap, attribute manipulation, reenactment, etc.); however, there is still room for improvement, which can be achieved by considering compound-forgery and hybrid datasets (real-world situations) to develop robust models.
Such data is not readily obtainable in an adversarial context, where intruders typically try not to disclose the deepfake generation technique they used. In addition, several deepfake detection studies have employed DFDC (Dolhansky et al., 2020), UADFV (Xie et al., 2020), Celeb-DF (Li, 2022), FF++ (Ondyari, 2022), and Celeb-A (Liu et al., 2020) to test performance and accuracy; however, these datasets may not be sufficient for achieving satisfactory performance. In light of these limitations, there is still a need to develop efficient deepfake detection methods that are robust and consistent, to boost output quality.

Synthetic face and audio-visual.
Audio-visual synchronization and synthetic face detection are vital issues in online media frameworks. Several techniques have been proposed for audio-image analysis and synthetic face detection. For instance, the frameworks proposed by İlhan et al. (2022), Conti et al. (2022) and Ram et al. (2022) employed MFCC, hybrid-CNN, and AFMB methods to classify synthetic facial attributes and audio-visual content using the DFDC and TIMIT-HQ datasets. They employed a variational auto-encoder (VAE) to convert the output into audio waveforms to identify audio-visual inconsistency. However, the methods are only applicable to detecting audio deepfakes with limited features. Likewise, Mathews et al. (2023), Deng et al. (2022), Zhao et al. (2021) and Kshirsagar et al. (2022) used DST, AVFakeNet, MFCC, DFT-MF, and MesoNet methods to identify synthesized audio. They employed cross-entropy loss functions and divergence measures to analyze the forensic attributes of videos. However, the models achieve limited performance on low-quality videos and noisy voice samples.
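MFCC features recur throughout the audio-deepfake work above. The sketch below is a simplified, NumPy-only approximation of the standard pipeline (framing, power spectrum, mel filterbank, log compression, DCT); the parameter values are illustrative, and the reviewed systems would typically rely on a tuned library implementation rather than this minimal version:

```python
import numpy as np

def mfcc_like(signal, sr=16000, n_fft=512, hop=256, n_mels=20, n_ceps=13):
    """Simplified MFCC-style features: frame -> power spectrum -> mel -> log -> DCT."""
    # 1. Frame the signal with a Hann window
    frames = np.array([signal[s:s + n_fft] * np.hanning(n_fft)
                       for s in range(0, len(signal) - n_fft + 1, hop)])
    # 2. Power spectrum of each frame
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # 3. Triangular mel filterbank
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        if c > l:
            fb[i - 1, l:c] = (np.arange(l, c) - l) / (c - l)
        if r > c:
            fb[i - 1, c:r] = (r - np.arange(c, r)) / (r - c)
    # 4. Log mel energies, then DCT-II to decorrelate the coefficients
    logmel = np.log(power @ fb.T + 1e-10)
    n = logmel.shape[1]
    dct = np.cos(np.pi / n * (np.arange(n)[:, None] + 0.5) * np.arange(n_ceps)[None, :])
    return logmel @ dct  # shape: (num_frames, n_ceps)
```

A classifier then consumes these per-frame coefficient vectors; synthetic speech tends to leave artifacts in exactly this compact spectral representation.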
The methods proposed by Hao et al. (2022), Jafar et al. (2020) and Aduwala et al. (2021) used deep learning-based Discrete Fourier Transform-Multi-frame (DFT-MF) and MesoNet models that consider mouth features to identify deepfake videos by investigating lip and mouth movement. Further, they exploit the divergence in mouth-shape dynamics using the ASVspoof and DFDC datasets. However, performance degraded significantly when mouth movement in the video was limited. Guarnera et al. (2020), Ataş et al. (2022), Khan et al. (2020) and Fang et al. (2020) introduced hybrid CNN, EM-CNN, and hybrid XAI methods with expectation maximization to learn synthetic face attributes using the CelebA and FF++ datasets. The assessment shows better performance compared to standard models; however, these models cannot be applied to detect subtle facial expression synthesis or audio-visual inconsistencies. Uçan et al. (2021), Zotov et al. (2020), Zhou et al. (2021) and Ramachandran et al. (2021) used publicly available datasets, i.e., DFDC, UADFV, Celeb-DF, FF++ and Celeb-A, with multi-feature fusion methods to identify low-quality audio-visual content, synthetic faces, and forged images and videos. However, these datasets may not support the achievement of fair performance. To deal with practical deepfake detection challenges, scientists and researchers must emphasize developing powerful, efficient, and universal approaches to stopping online disinformation.
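The frequency-domain intuition behind DFT-MF-style lip-movement cues can be sketched as follows. Here `openness` stands in for a hypothetical per-frame mouth-openness measurement (real systems derive it from facial landmarks); the idea is that mouth-motion dynamics leave a spectral signature that a classifier can compare against natural speech:

```python
import numpy as np

def mouth_motion_spectrum(openness: np.ndarray) -> np.ndarray:
    """Magnitude spectrum of a per-frame mouth-openness signal (mean removed)."""
    centered = openness - openness.mean()
    return np.abs(np.fft.rfft(centered))

def motion_energy(openness: np.ndarray) -> float:
    """Total spectral energy of mouth movement; zero for a perfectly static mouth."""
    return float(np.sum(mouth_motion_spectrum(openness) ** 2))
```

This also makes the reported limitation concrete: when mouth movement is limited, the centered signal is nearly flat, the spectrum carries almost no energy, and the cue becomes uninformative.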
Almost all deepfake threats occur online, requiring high-resolution models that operate in real time. For instance, Tanaka et al. (2021) developed a real-time deepfake detection technique to identify fake images using a robust hashing (RH) algorithm. This model can identify tampered-with images by computing hash values against images. However, the performance of this model is limited to synthetic images. Table 9 summarizes the recommendations and trends relating to deepfake detection.
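The robust-hashing idea can be illustrated with a minimal sketch. The block below uses a simple block-average perceptual hash — not the exact algorithm of Tanaka et al. (2021) — to flag a tampered grayscale image by the Hamming distance between its hash and the original's; the decision threshold is an assumption:

```python
import numpy as np

def average_hash(img: np.ndarray, hash_size: int = 8) -> np.ndarray:
    """Downscale a grayscale image by block-averaging, then threshold at the mean."""
    h, w = img.shape
    bh, bw = h // hash_size, w // hash_size
    small = (img[:bh * hash_size, :bw * hash_size]
             .reshape(hash_size, bh, hash_size, bw)
             .mean(axis=(1, 3)))
    return (small > small.mean()).astype(np.uint8).flatten()

def hamming(a: np.ndarray, b: np.ndarray) -> int:
    """Number of differing hash bits."""
    return int(np.count_nonzero(a != b))

def is_tampered(orig: np.ndarray, suspect: np.ndarray, threshold: int = 10) -> bool:
    """Flag the suspect image if its hash diverges beyond the (assumed) threshold."""
    return hamming(average_hash(orig), average_hash(suspect)) > threshold
```

Real robust-hashing schemes add normalization so the hash survives benign recompression while still shifting under semantic edits; the comparison workflow, however, is as shown.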

Policy recommendations and trends (RQ5)
This section discusses different policies to mitigate the harmful impacts of deepfake generation and detection. Deepfakes can be seen as a novel methodological development. However, this perception fails to consider their actual impact on people's lives. To fully understand the implications of deepfakes, it is necessary to investigate the community aspects and movements that form their social significance, and how they are embedded in the community life cycle.
AI-based deepfakes can be used for positive or negative purposes (Meskys et al., 2019).
The following are some policy recommendations and measures to mitigate the negative impacts of AI-based deepfakes (summarized in Fig. 22).

Standardization of datasets and tools.
To counter the harmful impact of deepfakes on society, datasets and tools must be standardized. Deepfake detection methods are vital for stopping the spread of deepfakes. The evolution of AI techniques and forensic identification methods amounts to an ongoing arms race (Van der Sloot & Wagensveld, 2022). Deep learning-based technologies can detect and regulate deepfakes quickly since they can acquire information rapidly. This continuous development process will make detecting forgeries on social platforms harder. By standardizing detection datasets and tools, the community will be better able to identify forged and synthetic videos (Scherhag et al., 2019).

3.4.3.2. Overcome bias in deepfakes.

Deepfake detection methods are not yet sufficiently advanced to detect false information that is indistinguishable from reality. The creation of deepfakes is becoming easier over time. Creating deepfakes is relatively inexpensive, as most tools and models are publicly available. Deepfake creation costs will continue to drop as research advances are cascaded into easy-to-use software and open-source code. However, deepfake detection classifications and takedown strategies will help overcome the bias of commercialized deepfake technologies (Hwang, 2020).
Recently, deepfakes have garnered significant attention from policymakers. Over time, the implementation of robust strategies to identify deepfakes could significantly curb the propagation of online disinformation. Conversational AI-based models can make large numbers of forged identities more credible and convincing. Deep learning-based analytical algorithms can enable malevolent actors to target individuals and communities with information more effectively and more subtly (van Huijstee et al., 2021).

3.4.3.3. Identity verification and the spreading dimension of deepfakes.

In this section, we discuss possible policy options to address the issue of identity verification and the spread of deepfakes, including rules and restrictions governing the dissemination of certain deepfakes (Kalpokas & Kalpokiene, 2022). Communication services and online platforms play a vital role in spreading deepfakes. The spread of a deepfake largely determines the scale and intensity of its impact. Thus, platforms and other intermediaries should assume responsibility and should be given obligations, including being held liable if they fail to meet them. There is a need to introduce a digital services act setting out measures to limit the spread of deepfakes (Van der Sloot & Wagensveld, 2022).
3.4.3.4. Regulate AI-based frameworks.

The negative impact of deepfakes on communities and society can be mitigated by employing policies and measures to combat disinformation. In Europe, the regulatory environment associated with deepfakes is complex and involves both rigid and flexible rules at the European Union and national levels (EU Parliament, 2020). Participants involved in the deepfake lifecycle have both rights and obligations. These actors include creators of deepfakes, characters shown in videos, victims and original actors, authors and copyright holders of original works, technology developers, the intermediary platforms used for dissemination, and the platforms where videos are uploaded, viewed, and shared (Hwang, 2020).

3.4.3.5. Technology measures to counter deepfakes.

Detection technology is vital to stop the spread of malevolent deepfakes. However, if deepfake technology providers are familiar with detection techniques, they can adjust deepfake production techniques and evade detection. One way to stop the spread of deepfakes and disinformation is to impose restrictions on the dissemination of cutting-edge deepfake techniques by technology providers, so that adversaries do not immediately counteract the advances made by digital forensics researchers (Meskys et al., 2019). However, such an approach would impede law enforcement agencies' deployment of deepfake detection technologies, and new actors might be unable to detect deepfakes. Therefore, the pros and cons should be carefully weighed before implementing such a measure.

Conclusion
This study has provided an extensive review of emerging deepfake detection and generation techniques. Not all manipulated content is harmful. However, with the advancement of AI-based generative models, it is possible to create credible content that malicious users can manipulate for negative purposes. These attacks target communities and can cause emotional, reputational, financial, and physical damage.
In this study, we have provided a comprehensive analysis of deepfake generation and detection models, techniques, tools, frameworks, algorithms, achievements, limitations, and challenges, and we have provided policy recommendations regarding stopping online disinformation. This study will be advantageous for the community in regard to understanding deepfake techniques and stopping the spread of disinformation online. More research should be done in the areas of facial synthesis, facial reenactment, attribute manipulation, synthetic video deepfakes, and especially audio-visual synthesis detection, to address the community's needs and prevent the spread of deepfakes.
Our study provides a comprehensive analysis of deepfake techniques; however, further research is warranted in facial synthesis and reenactment, particularly in exploring advanced generative AI-based methodologies to detect subtle manipulations in facial and expression attributes. Future studies should explore frameworks focusing on attribute manipulation within deepfakes to develop more effective detection strategies. Additionally, there is a need for specialized detection methods and frameworks tailored to synthetic video deepfakes, given the distinct challenges in this area. As the landscape of deepfake technology rapidly evolves, developing robust frameworks capable of effectively detecting audio-visual synthesis becomes imperative to counter the dissemination of sophisticated and realistic deepfake content. Exploring frameworks that combine multiple modalities for detection purposes holds promise for enhancing identification accuracy. Enhancing real-time detection capabilities is vital due to the rapid dissemination of deepfake content, and developing algorithms and frameworks capable of swiftly identifying and mitigating deepfakes will be instrumental in combating disinformation. Future research should also address the ethical implications and policy frameworks surrounding deepfake detection and generation, including considerations of privacy, consent, and legal ramifications. Integrating human expertise into the detection process can provide valuable insights and improve the accuracy of detection algorithms, ultimately helping establish effective

Table 9
Recommendations and trends relating to deepfake detection.

Deepfake detection Recommendations and trends
Faceswap and face reenactment
• Although current reenactment and faceswap-based techniques (Kohli & Gupta, 2022; Fernandes et al., 2020; Wang et al., 2021) perform well in identifying forged and synthetic faces, there is still a need to develop efficient deepfake detection methods that are robust and consistent, to boost output quality.
• The existing datasets are not robust enough to accommodate different detection techniques; composite, forged, and fusion datasets are needed (real-world situations).
• Models for deepfake detection are currently lacking in interpretability; for example, most models (Ismail, Elpeltagy, Zaki, et al., 2021; Masood et al., 2021) use pre-trained neural network algorithms with limited scalability in the results; therefore, the explainability of detection outcomes should be emphasized in the future, as this remains a concern.
• There is a need to implement hybrid models by considering reenactment and face-swapping jointly.
• For face reenactment and face-swapping, scientists should consider full-body movement: for instance, generality, hand motion behavior, identity disclosure, temporal consistency, lighting adjustments, and facial movement (i.e., eyes, mouth, and lips).

Synthetic face and audio-visual
• Integrating synthetic and audio-visual techniques with online media platforms can be effectively implemented using blockchain technology, which seems promising for addressing existing gaps. For instance, a few key blockchain techniques are discussed in Rashid et al. (2021) to protect image/video integrity and counter deepfakes; however, this research trend is still emerging.
• It is necessary to employ anti-forensic and adversarial AI methods to increase the classification accuracy of automated synthetic and forged visual methods; i.e., game theory models could be used to protect deepfake detection systems from adversarial attacks.
• One vital component of deepfakes, audio manipulation, is mostly ignored by researchers. To effectively detect deepfakes, it is necessary to develop detectors that can detect both video and audio.
• Developing a countermeasure capable of detecting various spoofing attacks in a multi-hop environment, such as replay, cloning, and cloned replay, is necessary.
• To keep up with the trend of analyzing large amounts of data, dynamic methods should be explored to detect and extract new synthetic attributes adaptively.
human-machine partnerships in deepfake detection. Active methods can play a pivotal role in disrupting the potentially endless cycle of AI-based deepfake creation and detection. These methods take a proactive approach by implementing measures to prevent the creation of deceptive videos rather than solely relying on detection after the fact.
Techniques like digital watermarking can embed unique identifiers within authentic videos, making it harder to manipulate them. Future research should focus on developing and enhancing active methods to effectively counteract the advancements in deepfake technology, thus mitigating the perpetual arms race between deepfake creation and detection algorithms. By addressing these research gaps and pursuing these avenues of investigation, we can collectively work towards more resilient and practical frameworks for detecting and mitigating the spread of deepfake content, essential in safeguarding individuals, communities, and societies from the potential harms of manipulated media.

Fig. 1. Text analysis visualization of the search string, generated using the VOYANT tool.

Fig. 2. Keyword co-occurrence visualization analysis of eligible studies using the VOSviewer tool.

F. Abbas and A. Taeihagh

Fig. 3. Number of publications on deepfake detection and generation, by publication year and research area.

Fig. 4. Flow diagram of the PRISMA protocol for reporting systematic reviews (Page et al., 2021); the figure summarizes the data selection and screening process.


Fig. 10 shows facial synthesis and attribute manipulation conducted using different versions of GAN (LS-GAN, SOF-GAN): it is nearly impossible to differentiate between the real and the synthetic faces. Table 4 summarizes the deep learning-based generative methods and frameworks used for facial and video synthesis. Li et al. (2023) introduced an SC-GAN-based clustering framework for expression manipulation. Their framework incorporated SIFT K-means clustering to achieve convincing results in splitting the semantic facial space into different subspaces. This framework can efficiently create plausible facial expression manipulation images on synthetic datasets. However, the framework is unable to drive facial expressions in videos. Chen et al. (2022) proposed a SOF-GAN-based framework for attribute style adjustment using a semantic instance-based representation; however, the results are not satisfactory across different image resolutions, and the method fails to handle unaligned faces.
For instance, Wang et al. (2022) presented a technique for generating adversarial examples based on blurry faces to enhance generalization capability. The research also introduced an adversarial training model to blur objects by employing a Gaussian blur component. Baek et al. (2020) developed a GAN-enabled technique to enhance discrimination capability. The study employed deep learning-based training policies from generators to create synthetic images. Two discriminators were integrated to enhance the discriminability output, utilizing combined features. The study performed well and can be employed for synthetic video detection. A study by Malik et al. (

Kim (2021) to identify deepfake videos by only investigating visual features of digital content. Zhu et al. (2020) proposed a cluster-based impact regularization method to identify deepfakes. The authors employed a freely available algorithm to create videos that imitate distinguishing artifacts within deepfake videos and integrate them. Due to the improved accuracy of deepfake generation techniques, deepfake videos are difficult to identify. To address the fake video detection problem, Abdelkhalki et al. (2022) proposed a novel technique for detecting fake video using 3D Xception Net and the Fourier transform. The authors designed the 3D-Xception model by employing the CNN technique and substituting standard convolutions with complex convolutions to make the network more efficient in terms of reliability and classification accuracy. Asha et al. (2023) introduced a defensive framework for deepfake detection, leveraging temporal and spatial attributes for video classification. This approach involves extracting spatial attributes frame-by-frame while incorporating temporal attribute analysis to discern discrepancies between frames during image generation. The performance of this framework was assessed using various datasets, including Celeb-DF, FF++, and YouTube videos, with comparisons drawn against baseline methodologies. Panda et al. (2022) developed an enhanced deepfake detection framework that utilizes a weighted deep ensemble model based on visual inputs. The authors incorporated pre-trained deep convolutional neural networks (DCNN) to extract visual features while employing long short-term memory (LSTM) for capturing temporal features from the input frames. The framework was trained on the Celeb-DF-V2 dataset (75 % of the dataset for training and 25 % for validation), which utilizes a more streamlined set of frames than standard benchmark methods. Sharma et al. (2023) presented a CNN-based framework to detect deepfake videos and images generated through GANs. Their approach involved using pre-trained models, specifically MTCNN and ResNext-V1, to automate the task of deepfake identification across various artificially generated datasets. This framework demonstrated superior performance

Ge et al. (2022) and Awotunde et al. (2023) developed a hybrid ConvGRU-based framework to classify deepfake videos. The proposed framework employed pre-trained models for deep feature extraction to classify manipulated video. Hsu et al. (2020) and Choudhury et al. (

For instance, Zhou et al. (2021) and Huang et al. (2021) presented deep learning-based techniques that consider mouth and facial attributes to create deepfake images and videos using the RaFD and Celeb-HQ datasets. These approaches perform well in generating real and synthetic images; however, the models fail to distinguish facial attributes. The frameworks developed by Ding et al. (2020), Liu et al. (2021), and Coccomini et al. (2022) can create the facial appearance of flat faces without requiring further fine-tuning, while preserving the facial identity of an input image/video, using the DFDC and Celeb-HQ datasets. However, the proposed frameworks are unable to maintain quality at different angles and lighting. Numerous techniques have been developed to

Table 3
Overview of face reenactment and faceswap generation methods.

Table 3 (continued). Li et al. (2017): LS-GAN; analyzes facial expression, eye gaze, and head pose; output quality of synthesized images (128 × 128, 256 × 256); extracted deep attributes using a progressive LS-GAN with a parallel auto-encoder (AE); 85 % of the dataset used for training and 15 % for validation.

Table 4
Overview of facial and video synthetic generation methods.

Table 5
Overview of faceswap and face reenactment deepfake detection methods.

Table 7
Overview of deepfake generation and detection tools and software.

developed integrated frameworks to classify forged videos. The frameworks can assist in identifying synthetic videos but are unable to distinguish high-quality ones. Similarly, Khormali and Yuan (2022), Kohli and Gupta (2022) and Lee and Kim (2021) identified swapped and altered facial expressions by employing pre-trained deep learning methods with the FF++ and DFDC datasets. However, the accuracy of their image and compressed-video classification needs improvement. The framework proposed by Ismail, Elpeltagy, Zaki et al. (2021), Fernandes et al. (2020), S. L. and Sooda, K. (2022) and Wang et al. (