Deepfakes: evolution and trends

This study conducts research on deepfakes technology evolution and trends based on a bibliometric analysis of the articles published on this topic along with six research questions: What are the main research areas of the articles in deepfakes? What are the main current topics in deepfakes research and how are they related? Which are the trends in deepfakes research? How do topics in deepfakes research change over time? Who is researching deepfakes? Who is funding deepfakes research? We have found a total of 331 research articles about deepfakes in an analysis carried out on the Web of Science and Scopus databases. This data serves to provide a complete overview of deepfakes. Main insights include: different areas in which deepfakes research is being performed; which areas are the emerging ones, those that are considered basic, and those that currently have the most potential for development; most studied topics on deepfakes research, including the different artificial intelligence methods applied; emerging and niche topics; relationships among the most prominent researchers; the countries where deepfakes research is performed; main funding institutions. This paper identifies the current trends and opportunities in deepfakes research for practitioners and researchers who want to get into this topic.


Introduction
Deepfake technology can be used to forge synthetic media that people cannot differentiate from authentic ones. It is a recent research area in which researchers in academia and industry have contributed deepfake databases, as well as synthesis and detection algorithms, which has increased the popularity of deepfakes. Deepfakes are the product of artificial intelligence (AI) applications that merge, combine, replace, and superimpose images and video clips to create fake videos that appear authentic (Maras and Alexandrou 2019). On videos or images, the face of a person can be swapped with another face leaving little trace of manipulation (Chawla 2019). The emergence of deep learning has made previously existing fake face detection strategies vulnerable (Cho and Jeong 2017).
The availability of deepfake databases and of synthesis and detection algorithms has made it possible for the community, and even amateur users, to produce realistic deepfakes, which in turn has made the number of popular deepfake videos in the wild grow immensely (Pu et al. 2021a). Coupled with the reach and speed of social media, convincing deepfakes can quickly reach millions of people and have negative impacts on our society (Westerlund 2019).
The growth in deepfakes research has also been reflected in the amount of related scientific literature. Apart from technological aspects related to deepfake creation and detection, ethical, social, and legal aspects have also been carefully analyzed. There are already some reviews in specific fields, such as Creation and detection of deepfakes, Law (da Silva 2021), Forensics (Amerini et al. 2021a), and Social impact (Hancock and Bailenson 2021a), to name a few. Still, none of them covers the full spectrum of research areas in deepfakes, an overview that we believe can be very useful for researchers who wish to work on this topic. Despite its novelty, deepfakes research is a fast-growing area in which the research topics and their relationships change continuously over time and new trends appear. The different areas in which deepfakes research is performed indicate that there are researchers with a wide variety of backgrounds. Apart from current trends, analyzing the funding opportunities is useful to help focus the research effort.
The objective of this work is to get an overview of the current trends and evolution of deepfakes research, as well as to analyze the fields in which it is being applied. To this aim, all the empirical evidence that fits pre-specified eligibility criteria was collated from the Scopus and Web of Science databases to answer the following six specific research questions: What are the main research areas of the articles in deepfakes? What are the main current topics in deepfakes research and how are they related? Which are the trends in deepfakes research? How do topics in deepfakes research change over time? Who is researching deepfakes? Who is funding deepfakes research? We have determined which disciplines are developing, which are consolidating, and which are promising. The most studied areas of deepfakes research, including the various artificial intelligence techniques used, have also been examined, along with emerging and niche topics. Relationships among the most prominent researchers, the countries where deepfakes research is conducted, and the major funding organizations have also been established. This article identifies the prospects and trends in deepfakes research for practitioners and scholars who are interested in the subject.
The remainder of this paper is structured as follows. The next section presents the methods used to obtain the sample of articles under study, the specific research questions we seek to answer, and the software used to automate part of the process. In the results section, we present the findings for each research question. After some reflections in the discussion section, conclusions are drawn.

Methods
A systematic review attempts to collate all the empirical evidence that fits pre-specified eligibility criteria to answer a specific research question (Higgins et al. 2019). Therefore, the authors have ensured that the review addresses questions relevant to those who are expected to use and act upon its conclusions. More specifically, the research questions addressed by this review paper are:
• RQ1: What are the main research areas of the articles in deepfakes?
• RQ2: What are the main current topics in deepfakes research and how are they related?
• RQ3: Which are the trends in deepfakes research?
• RQ4: How do topics in deepfakes research change over time?
• RQ5: Who is researching deepfakes?
• RQ6: Who is funding deepfakes research?
Once the research questions were established, the starting point was a search carried out in Scopus in July 2021 and another in October of the same year. The specific query used in the case of Scopus was:
ALL ( ( deepfake deep-fake "deep fake" ) AND ( ( action unit OR facial action unit coding system OR facs ) OR ( video OR clip OR image OR photogram ) ) )
The same procedure was followed in Web of Science (WoS), also in July and October. The query in the case of WoS was:
TS=( ( deepfake deep-fake "deep fake" ) AND ((Action Unit OR Facial Action Unit Coding System OR FACS) OR (video OR clip OR image OR photogram)))
As summarized in Table 1, the Scopus query retrieved a total of 242 records (229 in English) in July and 331 (311 in English) in October. The retrieved records range from 2018 to 2021. There were no results before 2018 from any of the databases. In the case of Web of Science, the results were 8 in July (6 in English) and 12 in October (10 in English).
The first objective of these queries was to check whether the same articles were being published in both databases and to estimate the growth rate of the number of publications from the change between the July and October requests. Given the small number of results from Web of Science, and that just one of them is not present in the Scopus results, the detailed analysis focuses on the October results in English, i.e., 311 records from Scopus, from now on the SDO21 (Scopus Database October 2021) dataset. The dataset records are listed in Appendix A, divided into clusters based on their keywords, and available for download online. It is also worth noting the weight that conference publications have in deepfakes research, as they are not included in the Web of Science. One hundred and seventy-nine of the records from Scopus are conference papers.
Given the size of the SDO21 dataset, the review has been automated using the Bibliometrix (Aria and Cuccurullo 2017) package for R, including the Biblioshiny application, as detailed in Sect. 3. Regarding transparent reporting of systematic reviews and meta-analyses, a PRISMA Flow Diagram has not been considered necessary because the process has been simple. All the records retrieved have been considered, with the only exception of articles not in English, to facilitate the automated analysis using Bibliometrix. In any case, as observed in Table 1, the records that are not in English represent only about 5% and 6% of the results in July and October, respectively.
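The percentages quoted above can be checked directly from the Scopus counts reported in the text. The following is a minimal sketch of that arithmetic; the function and variable names are illustrative and not part of the original study.

```python
# Scopus record counts reported in the text: (all records, records in English).
scopus_counts = {"July 2021": (242, 229), "October 2021": (331, 311)}


def non_english_share(total, english):
    """Percentage of retrieved records that are not in English."""
    return 100.0 * (total - english) / total


def growth_rate(earlier, later):
    """Relative growth between two counts, as a percentage."""
    return 100.0 * (later - earlier) / earlier


shares = {month: non_english_share(total, english)
          for month, (total, english) in scopus_counts.items()}
total_growth = growth_rate(scopus_counts["July 2021"][0],
                           scopus_counts["October 2021"][0])

for month, share in shares.items():
    print(f"{month}: {share:.1f}% of records not in English")
print(f"Growth in Scopus records, July to October: {total_growth:.1f}%")
```

With these counts, the non-English share is roughly 5.4% in July and 6.0% in October, and the number of Scopus records grew by roughly 37% between the two requests.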

Main topics in deepfakes research
Regarding the first two research questions, RQ1: What are the main research areas of the articles in deepfakes? and RQ2: What are the main current topics in deepfakes research and how are they related?, our first exploration considers just the review papers, whose focus is mainly on ethical and legal aspects. If we broaden the scope to the whole set of 311 papers and analyze only the research areas they belong to, Computer Science is the most represented, with 40.8% of the records related to this area. It is followed by Engineering (19.5%) and Social Sciences (9.4%), as shown in Fig. 1. It is important to note that papers might belong to more than one area, as defined by the corresponding literature database for each journal and year.
We consider all areas when calculating these percentages as a way to recognize the interdisciplinary nature of deepfakes, with scientific journals aiming to promote interdisciplinary research and facilitate collaboration among researchers with diverse expertise.
To delve deeper into the specific topics deepfakes research is dealing with, a knowledge discovery approach has been applied to identify the underlying conceptual structure. The keywords associated with each record in the SDO21 dataset have been analyzed with the Bibliometrix R package. The conceptual structure represents the relationships among the records' keywords. Keywords that appear together in a paper corresponding to a record are connected in the resulting co-keywords network. Keywords will be close in this network if a large proportion of papers have them together. Otherwise, they will be far apart.
The process to create this co-keywords network that highlights the main research topics is first to create a symmetric co-occurrence matrix. As shown in Fig. 2, the elements in the diagonal, $k_{ii}$, correspond to the total number of occurrences of each keyword in the whole SDO21 dataset. On the other hand, the elements outside of the diagonal, $k_{ij}$, correspond to how many times keyword $i$ and keyword $j$ appear together in the same paper.
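As an illustration of how such a matrix can be built, the following minimal sketch counts keyword occurrences and co-occurrences over a handful of made-up papers. It is not the Bibliometrix implementation; the function name and the toy keyword lists are assumptions introduced only for this example.

```python
from itertools import combinations


def cooccurrence_matrix(papers):
    """Return (keywords, matrix); matrix[i][j] counts papers containing both keywords.

    The diagonal matrix[i][i] holds the total occurrences of keyword i (k_ii),
    and off-diagonal cells hold the co-occurrence counts (k_ij).
    """
    keywords = sorted({kw for paper in papers for kw in paper})
    index = {kw: i for i, kw in enumerate(keywords)}
    matrix = [[0] * len(keywords) for _ in keywords]
    for paper in papers:
        unique = sorted(set(paper))  # count each keyword once per paper
        for kw in unique:
            matrix[index[kw]][index[kw]] += 1
        for a, b in combinations(unique, 2):
            matrix[index[a]][index[b]] += 1
            matrix[index[b]][index[a]] += 1  # keep the matrix symmetric
    return keywords, matrix


# Hypothetical keyword lists, one per paper (not taken from the SDO21 dataset).
papers = [
    ["deep learning", "face recognition"],
    ["deep learning", "computer vision", "face recognition"],
    ["computer vision", "digital forensics"],
]
keywords, matrix = cooccurrence_matrix(papers)
for kw, row in zip(keywords, matrix):
    print(f"{kw:18s} {row}")
```

Keywords are compared as plain strings here, which mirrors the purely textual treatment of keywords described in this section.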
The co-keywords matrix is then used to generate the keywords network that highlights the structure of research topics in deepfakes research. The network is an undirected graph whose nodes correspond to keywords. The size of each node depends on the keyword frequency and is thus taken from the matrix's diagonal.
Then, two graph nodes are connected if the matrix cell for the corresponding keywords is greater than 0, and thus both keywords share at least one paper. The edges are weighted with the value of that cell, i.e., the number of papers where both keywords appear together as captured in the non-diagonal cells. Edge weight is interpreted as a measure of the strength of the association between two keywords: the more often they appear together, the closer they are on the graph. Based on this interpretation of the matrix, the graph can be rendered as shown in Fig. 3, where it highlights the main research topics corresponding to the most frequent keywords. This technique processes keywords as text strings and thus does not include any kind of semantic similarity measure. It focuses on the keywords associated with each publication.
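The mapping from matrix to graph described above can be sketched in a few lines. This is a simplified illustration with a hypothetical three-keyword matrix, not the code used to render Fig. 3; the function name is an assumption.

```python
def graph_from_matrix(keywords, matrix):
    """Turn a symmetric co-occurrence matrix into (node sizes, weighted edges)."""
    # Node size comes from the diagonal: total occurrences of each keyword.
    node_size = {kw: matrix[i][i] for i, kw in enumerate(keywords)}
    edges = {}
    for i in range(len(keywords)):
        for j in range(i + 1, len(keywords)):  # upper triangle: undirected graph
            if matrix[i][j] > 0:  # connect keywords sharing at least one paper
                edges[(keywords[i], keywords[j])] = matrix[i][j]
    return node_size, edges


# Hypothetical matrix for three keywords (values are illustrative only).
keywords = ["computer vision", "deep learning", "face recognition"]
matrix = [
    [2, 1, 0],
    [1, 3, 2],
    [0, 2, 2],
]
node_size, edges = graph_from_matrix(keywords, matrix)
print(node_size)
print(edges)
```

Only the upper triangle is scanned because the matrix is symmetric, so each undirected edge is emitted once.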
Co-occurrence networks use various measures to identify crucial nodes or vertices within the network. Among these measures, Betweenness (Table 2), Closeness (Table 3), and PageRank (Table 4) are used to provide notable insights. When considering the top 5 keywords for each metric, a total of 8 unique keywords is obtained. This is consistent, as each measure captures a different aspect of the network of keyword co-occurrences. Betweenness quantifies how often a node falls on the shortest paths between other nodes in the network. Nodes with high Betweenness are critical since they connect different parts of the network, playing a vital role in the flow of information or resources between distinct groups of nodes. Closeness measures how closely connected a node is to all other nodes in the network. Nodes with high Closeness are significant since they have rapid access to a vast amount of information or resources and can disseminate them quickly throughout the network. PageRank assesses a node's importance based on the number and quality of incoming links it has. Nodes with high PageRank are crucial since they are highly connected to other important nodes in the network. To identify key intermediaries or brokers in the network, Betweenness is the most critical measure. To identify nodes that can quickly disseminate information throughout the network, Closeness is the most critical measure. Finally, to identify nodes that shape the network's overall behavior, PageRank is the most important measure. It is often useful to calculate all three measures to gain a comprehensive understanding of the network's structure and dynamics.
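To make the three measures concrete, the sketch below computes them on a small, hypothetical keyword graph using only the Python standard library. It assumes an unweighted, undirected, connected graph and unnormalized Betweenness, so the absolute values will differ from those in Tables 2 to 4, which were produced with Bibliometrix; the graph and all names are illustrative.

```python
from collections import deque


def closeness(graph):
    """Closeness: (reachable nodes) / (sum of shortest-path distances)."""
    result = {}
    for source in graph:
        dist, queue = {source: 0}, deque([source])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total else 0.0
    return result


def betweenness(graph):
    """Unnormalized betweenness via Brandes' algorithm (unweighted, undirected)."""
    score = {v: 0.0 for v in graph}
    for s in graph:
        order, preds = [], {v: [] for v in graph}
        sigma = {v: 0 for v in graph}  # number of shortest paths from s
        sigma[s] = 1
        dist, queue = {s: 0}, deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):  # accumulate dependencies, farthest first
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return {v: b / 2 for v, b in score.items()}  # each pair was counted twice


def pagerank(graph, damping=0.85, iterations=100):
    """PageRank by power iteration; each undirected edge acts as two links."""
    n = len(graph)
    rank = {v: 1.0 / n for v in graph}
    for _ in range(iterations):
        new = {v: (1 - damping) / n for v in graph}
        for v, neighbours in graph.items():
            targets = neighbours if neighbours else list(graph)  # isolated node
            for w in targets:
                new[w] += damping * rank[v] / len(targets)
        rank = new
    return rank


# Hypothetical keyword graph (adjacency lists), for illustration only.
graph = {
    "deep learning": ["face recognition", "computer vision", "gan"],
    "face recognition": ["deep learning"],
    "gan": ["deep learning"],
    "computer vision": ["deep learning", "digital forensics"],
    "digital forensics": ["computer vision"],
}
for name, measure in [("betweenness", betweenness), ("closeness", closeness),
                      ("pagerank", pagerank)]:
    ranking = sorted(measure(graph).items(), key=lambda kv: -kv[1])
    print(name, [(kw, round(value, 3)) for kw, value in ranking[:2]])
```

In this toy graph the same keyword tops all three rankings, but on a real co-occurrence network the rankings generally diverge, which is why the top-5 lists of the three measures together yield more than five distinct keywords.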
This representation makes it easier to visualize how the main research topics are organized in deepfakes research. Only the most representative topics, corresponding to the most used keywords, are shown, and they are more prominent the more often they appear in the SDO21 dataset. Highly related topics, those covered jointly in many papers, are shown closer together. This also makes it possible to apply a clustering algorithm that helps identify the main research topics and

Trends and evolution of deepfakes research
In this section, we address the third and fourth research questions, RQ3: Which are the trends in deepfakes research? and RQ4: How do topics in deepfakes research change over time? Despite the short time interval under study (the SDO21 dataset includes records from 2018 to 2021), it is possible to observe the evolution of the main research topics and identify their trends. First of all, after applying a clustering algorithm to the keywords as detailed in the previous section, we can do more than just highlight the main topics of the deepfakes research domain. Each topic can be represented on a plot called a Thematic Map (Cobo et al. 2011), as shown in Fig. 4. This kind of plot classifies the clusters of keywords from the co-keyword network obtained in the previous section according to Callon's centrality and density measures (Callon et al. 1991):
• Centrality: measures the strength of the links to other topics, considering those from keywords included in a cluster to keywords in other clusters. Thus, it measures the importance of a topic in the context of the whole field of study.
• Density: is related to the strength of internal links among all keywords corresponding to the same topic cluster. It is interpreted as a measure of the topic's development degree.
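The two measures can be illustrated with a simplified sketch. It is an assumption of this example that link strength is the raw co-occurrence count; published formulations of Callon's measures typically normalize link strength and apply scaling constants, so this code conveys the idea rather than reproducing the exact values plotted in Fig. 4. The keywords, weights, and cluster labels are hypothetical.

```python
def callon_measures(edges, cluster_of):
    """Simplified centrality and density per cluster.

    edges: {(keyword_a, keyword_b): weight}, one entry per undirected edge.
    cluster_of: {keyword: cluster label}.
    Centrality sums the weights of links leaving the cluster; density sums the
    weights of internal links and divides by the number of keywords in it.
    """
    clusters = set(cluster_of.values())
    centrality = {c: 0.0 for c in clusters}
    internal = {c: 0.0 for c in clusters}
    size = {c: 0 for c in clusters}
    for cluster in cluster_of.values():
        size[cluster] += 1
    for (a, b), weight in edges.items():
        ca, cb = cluster_of[a], cluster_of[b]
        if ca == cb:
            internal[ca] += weight      # internal link: contributes to density
        else:
            centrality[ca] += weight    # external link: counts for both clusters
            centrality[cb] += weight
    density = {c: internal[c] / size[c] for c in clusters}
    return centrality, density


# Hypothetical co-occurrence edges and cluster assignment (illustrative only).
edges = {
    ("deep learning", "cnn"): 4,
    ("deep learning", "gan"): 2,
    ("face recognition", "digital forensics"): 3,
    ("cnn", "face recognition"): 1,
}
cluster_of = {
    "deep learning": "learning", "cnn": "learning", "gan": "learning",
    "face recognition": "detection", "digital forensics": "detection",
}
centrality, density = callon_measures(edges, cluster_of)
print(centrality)
print(density)
```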
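Centrality and Density define the two axes of the Thematic Map and are used to divide it into four regions. The topics in these regions are associated with the following trends:
• Motor Topics: high centrality and high density; both well-developed and important for the research field.
• Basic Topics: high centrality and low density; important for the research field but not yet fully developed.
• Niche Topics: low centrality and high density; well-developed but with a marginal role in the development of the research field.
• Emerging or Declining Topics: low centrality and low density; weakly developed and still marginal.
The Motor and Basic topics are considered those that favor the development and consolidation of a research field due to their density and/or centrality. For the particular case of deepfakes research captured by the SDO21 dataset, there is a lack of clear Motor Topics. Most of them are Basic Topics related to the core technologies used for deepfakes development, as is the case of convolutional neural networks or deep neural networks. This is also the case with detection methods such as facial recognition.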
The only topics that are partially classified as Motor Topics are computer graphics, network architecture, and digital forensics. This seems related to the fact that, as noted at the beginning of Section 3.1, there are two reviews on the particular topic of forensics in the last four years.
On the other hand, the topics partially related to Emerging Topics (declining seems unfeasible given the youth of the discipline and the short time range) are those associated with artificial intelligence, data security, and adversarial networks. Finally, the more mature topics, though apart from the main efforts in this research domain, are those related to analysis in the time and frequency domains to achieve better returns, such as video recording or social networks.
It is important to note that what is being classified into these different trends are the keywords associated with the papers. Thus, closely related topics that might even be equivalent in some contexts, like "deep neural networks" and "neural networks," might be classified in different quadrants based on their use in the analyzed literature. The approach is thus completely agnostic regarding the interpretation of these keywords because they are highly contextual, as in the case of neural network methods and applications (Samek et al. 2021).
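The assignment of topics to the four regions of a Thematic Map (Motor, Basic, Niche, and Emerging or Declining) can be sketched as follows. The use of the median of each measure as the dividing line is an assumption made for this illustration, and the topic names and values are placeholders rather than results from the SDO21 dataset.

```python
from statistics import median


def classify_topics(measures):
    """Map {topic: (centrality, density)} to a Thematic Map region per topic.

    Assumption: the axes are split at the median centrality and median density.
    """
    c_mid = median(c for c, _ in measures.values())
    d_mid = median(d for _, d in measures.values())
    regions = {}
    for topic, (c, d) in measures.items():
        if c >= c_mid and d >= d_mid:
            regions[topic] = "Motor"                  # central and developed
        elif c >= c_mid:
            regions[topic] = "Basic"                  # central, less developed
        elif d >= d_mid:
            regions[topic] = "Niche"                  # developed but peripheral
        else:
            regions[topic] = "Emerging or Declining"  # peripheral, undeveloped
    return regions


# Placeholder topics with made-up (centrality, density) values.
measures = {
    "topic A": (8.0, 7.0),
    "topic B": (9.0, 2.0),
    "topic C": (1.0, 6.0),
    "topic D": (2.0, 1.0),
}
for topic, region in classify_topics(measures).items():
    print(topic, "->", region)
```

Because the classification operates on keyword clusters as plain strings, two near-synonymous keywords can end up in different regions, as noted above.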
In addition to the static view provided by the Thematic Map in Fig. 4, it is also possible to get an idea of the underlying dynamics using the Thematic Evolution diagram shown in Fig. 5. Thematic Maps for different periods are computed to identify topics' evolution over time. Topics at a particular period are then connected with those in the following one to create a stream of topics' evolution. Linking among topics is based on the percentage of keywords shared between the identified topics at each period. This way, it is possible to observe how initial topics might remain partially and split into other topics that then include the corresponding keywords.
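The linking step can be sketched as follows. As an assumption for this example, the share of common keywords is computed as the size of the intersection divided by the size of the smaller keyword set (an inclusion index); the exact weighting used to draw Fig. 5 may differ. The topic names and keyword sets are hypothetical.

```python
def inclusion_index(keywords_a, keywords_b):
    """Share of keywords in common, relative to the smaller of the two sets."""
    a, b = set(keywords_a), set(keywords_b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def link_topics(earlier, later, threshold=0.0):
    """Connect topics of two consecutive periods that share keywords.

    earlier, later: {topic name: set of keywords}.
    Returns a list of (earlier topic, later topic, weight) tuples.
    """
    links = []
    for name_1, keywords_1 in earlier.items():
        for name_2, keywords_2 in later.items():
            weight = inclusion_index(keywords_1, keywords_2)
            if weight > threshold:
                links.append((name_1, name_2, weight))
    return links


# Hypothetical topics for two periods (keyword sets are illustrative only).
period_1 = {
    "computer vision": {"computer vision", "face recognition",
                        "image forensics", "cnn"},
}
period_2 = {
    "deep learning": {"deep learning", "cnn"},
    "digital forensics": {"digital forensics", "image forensics",
                          "face recognition"},
    "social media": {"social media", "video recording"},
}
for source, target, weight in link_topics(period_1, period_2):
    print(f"{source} -> {target}: {weight:.2f}")
```

In this toy example, one earlier topic feeds two later topics and has no link to the third, which is the kind of split that the Thematic Evolution diagram makes visible.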
For the SDO21 dataset, just two time periods have been defined given the short time span: 2018-2020 and 2021. On the left of Fig. 5 are the topics for the 2018-2020 period, including computer vision and computer graphics, among others. On the right side are those for 2021. The evolution of the topics is illustrated through the links connecting them, which are weighted based on the number of keywords shared by the topics in different periods.
For instance, the computer vision topic has split into many different ones in 2021, partially remaining as the same topic but less relevant because many of the associated keywords are now tied to other topics like deep learning, convolutional neural networks, or digital forensics. On the other hand, topics like computer graphics have disappeared and now the associated keywords are contributing to the digital forensics one, which has emerged from keywords from this topic combined with some from computer vision. Overall, Fig. 5 highlights the topics getting traction in deepfakes research and how they are consolidating from the topics that attracted the most attention just some years ago.

Deepfake technologies usage and funding
The last research questions, RQ5: Who is researching deepfakes? and RQ6: Who is funding deepfakes research?, are addressed by analyzing the intellectual and social structures of the SDO21 dataset. First of all, as can be observed in Table 5, the most relevant papers come from conferences, specifically from IEEE conferences and workshops. Forensics, signal processing, law, and blockchain are among the topics dealt with by the most cited articles about deepfakes research in Scopus between 2018 and 2021.
Going beyond this superficial analysis, the whole community that has generated the papers in SDO21 should be taken into account. For this reason, we have also carried out an analysis of the social structure to highlight how authors or institutions relate to others in this particular research field. We start with a co-authorship network, which is displayed in Fig. 6.
Many of the most referenced authors in Table 5 can also be identified in the co-authorship network, which also focuses on the most prominent authors. These authors appear in small clusters, like Amerini or Agarwal and their corresponding co-authors. This highlights that even highly cited authors collaborate in relatively closed circles and that the overall community is quite fragmented from this perspective.
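One simple way to quantify this fragmentation is to count the connected components of the co-authorship network. The sketch below is an illustration under that assumption, with placeholder author names; it is not the clustering procedure used to draw Fig. 6.

```python
from itertools import combinations


def coauthor_communities(papers):
    """Group authors into connected components of the co-authorship network.

    papers: list of author lists, one per paper.
    Returns the components as sets of authors, largest first.
    """
    adjacency = {}
    for authors in papers:
        for author in authors:
            adjacency.setdefault(author, set())
        for a, b in combinations(authors, 2):  # every pair of co-authors is linked
            adjacency[a].add(b)
            adjacency[b].add(a)
    seen, communities = set(), []
    for start in adjacency:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        communities.append(component)
    return sorted(communities, key=len, reverse=True)


# Placeholder author lists (not taken from the SDO21 dataset).
papers = [
    ["Author A", "Author B", "Author C"],
    ["Author C", "Author D"],
    ["Author E", "Author F"],
    ["Author G"],
]
communities = coauthor_communities(papers)
print([sorted(c) for c in communities])
```

Many small components and no dominant one would correspond to the closed collaboration circles described above.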
If we switch from individual researchers to their institutions and countries, we can also unveil the underlying social structures at these levels. Looking at the corresponding author countries, shown in Fig. 7, we can observe the clear leadership that researchers from China have in this particular research area. Since China is also the country with the highest number of inter-country collaborations, it might seem that part of this leadership comes from collaborations with other countries. However, these collaborations are in fact with Chinese researchers based in other countries, in most cases in the USA. This is illustrated in Fig. 8, which shows the connections from researchers to countries, and then from countries to research topics.
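Country-level collaboration is commonly summarized by splitting each corresponding-author country's output into single-country publications (SCP) and multiple-country publications (MCP), and reporting the MCP share. The following sketch shows that computation under these assumptions; the country names and records are placeholders, and the input format is hypothetical.

```python
def collaboration_by_country(papers):
    """Count SCP and MCP per corresponding-author country and the MCP ratio.

    papers: list of (corresponding author country, set of all author countries).
    """
    stats = {}
    for corresponding, countries in papers:
        entry = stats.setdefault(corresponding, {"SCP": 0, "MCP": 0})
        involved = set(countries) | {corresponding}
        if len(involved) > 1:
            entry["MCP"] += 1   # authors from more than one country
        else:
            entry["SCP"] += 1   # all authors from the same country
    for entry in stats.values():
        entry["MCP_ratio"] = entry["MCP"] / (entry["SCP"] + entry["MCP"])
    return stats


# Placeholder records (not taken from the SDO21 dataset).
papers = [
    ("Country A", {"Country A"}),
    ("Country A", {"Country A", "Country B"}),
    ("Country A", {"Country A"}),
    ("Country B", {"Country B", "Country A"}),
]
stats = collaboration_by_country(papers)
for country, entry in stats.items():
    print(country, entry)
```

Note that this count relies on affiliation countries only, so it cannot by itself distinguish a collaboration between two national communities from one among researchers of the same origin working in different countries, which is the distinction drawn from Fig. 8.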
Finally, focusing on RQ6: Who is funding deepfakes research?, the main research funding organization of the reviewed publications is the National Natural Science Foundation of China (see Table 6). Therefore, China is leading the research as a country, mostly through institutions related to the military and defense sectors. And as shown in Fig. 9, which displays the collaborations among institutions, these collaborations are kept at the national level.

Discussion
This paper employs metadata analysis to investigate the trends and tendencies related to deepfake research. It is important to note that our objective was not to conduct a review of the literature's content, but to analyze its metadata. However, we consider it valuable to include this section, which provides further insights into representative results of the included publications.
Deepfakes is a field of research that has gained significant attention in recent years due to its potential implications in manipulating digital media. According to the content of the lower-right quadrant of Fig. 4, which contains "topics that are important for the research field but are not yet fully developed", learning systems, detection methods, and algorithms are the key future directions in the topic. One of the most common approaches used in deepfakes is generative adversarial networks (GANs) (Hu et al. 2021). A GAN consists of two neural networks, one that generates fake data and another that evaluates the authenticity of the generated data. The results obtained using GANs have shown remarkable progress in generating highly realistic images and videos. Another popular method is the use of autoencoders, neural networks that are trained to reconstruct the input data. The encoded representation of the input is then used to generate new data. The results obtained using autoencoders have shown promise in generating high-quality deepfakes.
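For readers less familiar with these two families of models, their usual training objectives can be written compactly. The notation below is the standard textbook formulation and is not taken from the cited works: $G$ is the generator, $D$ the discriminator, $p_{\text{data}}$ the distribution of real samples, $p_z$ the noise prior, and $f_{\phi}$, $g_{\theta}$ the encoder and decoder of an autoencoder.

```latex
% GAN: the discriminator D maximizes, and the generator G minimizes, the value function V
\min_{G}\,\max_{D}\; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_{z}}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]

% Autoencoder: encoder f_phi and decoder g_theta minimize the reconstruction error
\mathcal{L}_{\text{AE}}(\phi, \theta) =
  \mathbb{E}_{x \sim p_{\text{data}}}\Bigl[\bigl\lVert x - g_{\theta}\bigl(f_{\phi}(x)\bigr) \bigr\rVert_2^{2}\Bigr]
```

The first expression captures the adversarial game between the network that generates fake data and the one that judges authenticity. The second captures the reconstruction training of an autoencoder, with a squared-error loss assumed here as one common choice.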
In addition to GANs and autoencoders, other methods have been used in deepfakes, such as variational autoencoders (Zendran and Rusiecki 2021), deep belief networks, and convolutional neural networks (Agrawal and Sharma 2021). Each of these methods has shown varying degrees of success in generating deepfakes. These methods keep improving, not only through new approaches but also by combining known techniques in new ways. For example, Zheng et al. (2018) propose a novel two-stage training process for deep convolutional neural networks (CNNs) that improves their generalization ability by implicit regularization, particularly when the training data is limited.
Practical cross-area applications can be found in works like Yao et al. (2021), where a method is proposed to automatically separate compound figures in biomedical research articles. It uses a deep learning model that is trained to separate the subfigures based on their visual features and is augmented with a "side loss" to ensure that the model also considers the context and layout of the subfigures. This article is a good example of how a single publication can offer insights into distant topics, from the upper-left quadrant of Fig. 4 (frequency domain analysis) and from the lower-right quadrant (detection methods), at the same time.
Despite the progress made in deepfakes, there are still limitations to the current state of the art. The primary challenges are the ability to generate realistic and high-quality deepfakes without significant artifacts (Matern et al. 2019b) and, paradoxically, the ability to detect and prevent the spread of deepfakes in the public domain (Rossler et al. 2019).
Finally, regarding funding, the top five funding institutions are either government agencies (NSFC, DARPA, AFRL, and NSF) or state-sponsored programs (NKRDPC and USNCF) that prioritize funding for research projects that are strategically important to their respective countries (see Table 6). As these projects may include those with military applications or those that promote the development of key industries, it is reasonable to infer that these strategic priorities may account for the low inter-country collaboration ratio (MCP) presented in Fig. 7. This could be because research of strategic importance often hinders collaboration due to national security concerns, funding restrictions (in some cases, funds may not be usable for international collaborations), and intellectual property issues.

Conclusions
Research publications in the area of deepfakes have grown sharply since 2018. The queries for Web of Science and Scopus did not retrieve any results before 2018 but accumulated 311 results by 2021, after less than four years. The specific findings for each of the research questions are discussed in the next paragraphs.
RQ1: What are the main research areas of the articles in deepfakes? Deepfakes research includes many different research areas. Our analysis identified 10 different areas with at least 2% of the articles about the topic. All 10 combined represent roughly 95% of the papers. However, there is a big imbalance, as just 3 of them account for almost 70% of the results. Computer Science is the most represented with 40.8%, followed by Engineering (19.5%). Thus, these technological research areas are those with the biggest percentage of articles. The third area is Social Sciences (9.4%), so deepfakes research is also noticeable in social sciences-related topics.
RQ2: What are the main current topics in deepfakes research and how are they related? Regarding the most studied topics, a knowledge discovery approach has been applied to identify the underlying conceptual structure starting from the keywords associated with the analyzed articles. Using a clustering algorithm, five main sets of topics have been identified, with the most representative topic in each cluster being: deep learning, face recognition, convolutional neural networks, computer vision, and social media. Other relevant topics in each cluster are presented in Fig. 3. As can be observed, deep learning stands out overall and, more specifically, adversarial and convolutional neural networks. Research on forgery detection and the literature related to face recognition are also relevant.
RQ3: Which are the trends in deepfakes research? The main topics identified using clustering have been analyzed using a Thematic Map, shown in Fig. 4. This kind of plot classifies the clusters of keywords obtained in the previous section according to Callon's centrality and density measures (Callon et al. 1991). Based on these measures, we can identify:
• Niche Topics: well-developed but with a marginal role in the development of the research field, like Social Media related to Video Recording or Neural Networks in the context of Frequency Domain Analysis.
• Emerging or Declining Topics: these are weakly developed and still marginal topics. Given the youth of the deepfakes discipline, they should be mainly emerging topics. Though the analysis does not identify clear emerging topics, research related to adversarial networks in the context of security might be considered an emerging area with potential relevance in the future.
• Motor Topics: these are both well-developed and important in the context of deepfakes. As previously stated, the youth of the discipline causes a lack of clear candidates. Just topics related to computer graphics, network architecture, and digital forensics might be classified as Motor.
• Basic Topics: these are the topics on which research should be focused. They are important for deepfake research but have not been developed yet. Here, we can find the bulk of the research. The most promising topics are convolutional deep neural networks and detection methods based on face recognition or deep learning.
RQ4: How do topics in deepfakes research change over time? In addition to the dynamics of deepfakes research captured by the previous trends analysis, it is also possible to visualize the underlying dynamics using a Thematic Evolution chart, as shown in Fig. 5.
We use Thematic Maps for different periods, which are then connected with those in the following one to create a stream of topics' evolution based on the percentage of keywords shared between the identified topics at each period. An insight that can be derived from this diagram is the diversification of the research around deep learning, which remains one of the main topics but with clear applications to texture analysis, fake detection, or online social networking. The same can be said about computer vision, which gets out of the main focus even more than deep learning. On the contrary, from a technical perspective, convolutional neural networks are getting more attention from recent research compared to the beginning of the analyzed period.
RQ5: Who is researching deepfakes? and RQ6: Who is funding deepfakes research? China is the country leading the research, contributing the most in all regards, including funding through the Natural Science Foundation of China and NKRDPC. Researchers are mainly from this country, though many of them perform their research in the USA. On the other hand, the collaboration communities in this research area are still small and fragmented, as observed when studying the co-authorship network. Usually, they are formed by just 2 or 3 authors, except for the most prolific Chinese researchers, who are organized in a community of 6 authors. The same happens at the country level: most collaborations are among institutions of the same country. Additionally, though authors might be based at centers in different countries, we do not observe inter-country collaborations.
In addition to the conclusions regarding the different research questions, we have identified some missing research topics that we think should already be in the literature, such as research on the repercussions of deepfakes on marketing or online negotiation processes. These kinds of risks have been tangentially addressed in the context of studies about identity usurpation, which have been the topic of some law journals. In any case, we believe that considering the emerging risks of deepfakes in connection with tasks like online meetings is crucial.
As a limitation of this work, the number of articles found on deepfakes research made it impossible to perform a systematic literature review or meta-analysis on the whole area of deepfakes research. On the other hand, this type of study can be carried out by focusing on more specific aspects of the area identified by this work, such as the different artificial intelligence techniques used to synthesize or analyze deepfakes.
To conclude, the research articles retrieved about deepfakes serve to provide a complete overview of deepfakes. The main insights of this work include the various areas in which deepfakes research is being conducted, focusing on which areas are emerging, those that are considered basic, and those that currently have the greatest potential for development. The most studied topics in deepfakes research, including the various artificial intelligence methods employed, are analyzed together with emerging and niche topics, to provide insight into the current trends.
The relationships among the most prominent researchers, together with the countries in which deepfakes research is conducted and the main funding sources, complete the outlook regarding the people who carry out research in that area and the options for collaboration and obtaining existing funds.
Overall, this article discusses current trends and opportunities in deepfakes research for practitioners and researchers interested in this field. Future research directions emerging from the review point in the direction of the identified "Basic Topics": convolutional deep neural networks and detection methods based on face recognition or deep learning.
Author Contributions Rosa Gil and Juan-Miguel López-Gil were involved in conceptualization and methodology; Jordi Virgili-Gomà and Roberto García helped in data curation; Roberto García contributed to funding acquisition; Jordi Virgili-Gomà was involved in validation; Rosa Gil helped in visualization; Rosa Gil, Juan-Miguel López-Gil, and Roberto García contributed to writing-original draft; Jordi Virgili-Gomà, Rosa Gil, Juan-Miguel López-Gil, and Roberto García helped in writing-review & editing.
Funding Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature. This work has been partially supported by the project "ANGRU: Applying kNowledge Graphs to research data ReUsability" with reference PID2020-117912RB-C22 and funded by MCIN/AEI/10.13039/501100011033. Additionally, this research benefits from funding from the Research Group program of the University of the Basque Country under contract GIU21/037.

Data Availability
The datasets generated and analyzed during the current study are available online at https://drive.google.com/file/d/1Attj4yMnsYJB1rx9kYIdVVoQeMhqhW7k/view and are in the process of being published in the CORA-RDR repository, https://dataverse.csuc.cat.

Declarations
Competing Interests The authors have no relevant financial or nonfinancial interests to disclose.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Appendix A References in SDO21 Dataset
The following table lists all references in the SDO21 dataset of records retrieved from Scopus as detailed in Sect. 1. They are divided into 4 clusters centered on the keywords associated with each of them. Cluster