Research Communities in Cyber Security: A Comprehensive Literature Review

To provide a coherent overview of cyber security research, the Scopus academic abstract and citation database was mined to create a citation graph of 98,373 authors active in the field between 1949 and early 2020. The Louvain community detection algorithm was applied to the graph to identify existing research communities. The analysis discovered twelve top-level communities: access control, authentication, biometrics, cryptography (I & II), cyber-physical systems, information hiding, intrusion detection, malwares, quantum cryptography, sensor networks, and usable security. These top-level communities were in turn composed of a total of 80 sub-communities. The analysis results are presented for each community in descriptive text, sub-community graphs, and tables with, for example, the most-cited papers and authors. A comparison between the detected communities and the topical areas defined in related work is also presented, demonstrating a greater researcher emphasis on cryptography, quantum cryptography, information hiding, and biometrics, at the expense of laws and regulation, risk management and governance, and security software lifecycle.


Introduction
The cyber security research community is an eclectic group that addresses a diverse set of research questions, draws on many theories, and deploys a wide range of methods, which makes it difficult to obtain a comprehensive view of the field. Using quantitative methods, the present work aims to summarize the activities of this group of researchers in a coherent manner. In a citation graph of 98,373 authors active in the field of cyber security between 1949 and early 2020, we identify twelve distinct communities focusing on various topics, such as Malware, Usable Security, Intrusion Detection, and Access Control. Each community is described in terms of, for example, research foci, publication fora, and sub-community evolution. Ever since Thomas Kuhn's seminal work The Structure of Scientific Revolutions [1], philosophers of science have been aware of the impact of social organization on the scientific endeavor. It is therefore not surprising to discover that cyber security research communities and sub-communities are not solely explained by their topical foci, but sometimes by other factors, such as geography.
Section 2 details the methods used to collect and analyze the abstract and citation data on which the article is based. Section 3 gives an overview of the collected data and presents some metadata. Section 4 contains the results of the analysis, presenting in some detail each of the twelve research communities. Section 5 covers related work, including a comparison with other attempts to summarize the field. Section 6 discusses the results, considering validity and reliability. The article concludes with a summary in Section 7.

Method
[...] as Data Privacy are covered by simpler ones such as Privacy. To ensure that this is indeed the case, we performed a search for each of the two keywords and concluded that the results of KEY ("Privacy") include all the results of KEY ("Data privacy").
When we examined the Mobile Security keyword, we were surprised by the fact that there were 577 results related to "cytology." To eliminate them from the results of the main query, we applied the AND NOT KEY ("Cytology") filter on the keyword term. We opted to exclude them at scraping time to reduce the time required to filter them out during the subsequent analysis.
Finally, AND TITLE-ABS-KEY ("Security") was also applied to the Digital Watermarking keyword, because 5,832 of its results concerned watermarking for purposes other than security.

Collecting 59,782 articles
The top 21 relevant keywords, together with the improvements explained above, were combined in a logical disjunction (security of data OR network security OR ...) as the core of the main search in the Scopus database. The query was also restricted, in terms of subject area, to computer science, engineering, social sciences, decision sciences, multidisciplinary, or undefined (to exclude articles clearly off topic), and to the English language. The full query can be found at https://git.io/JvRam. This search query resulted in 320,907 articles. Unfortunately, owing to the search quota limitations of the Scopus API, we were forced to limit the results to the top 5,000 most-cited articles for large queries. Thus, we performed a distinct search for each year. To avoid under-representing the years with a large number of articles (e.g., selecting all 4,374 articles from 2001, but only approximately 19% of the 26,642 articles produced in 2016), we selected the same fraction of articles for each year, with the maximum of 5,000 most-cited articles taken from the peak year of 2019, when 33,884 articles were produced. Hence, for each year, the 19.9% most-cited articles were collected. In total, this resulted in a dataset of 59,782 articles.
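The per-year selection described above can be illustrated with a short sketch. It is not the collection script used for the study: the record layout (article id, year, citation count), the function name, and the rounding rule are assumptions made for illustration only.

```python
from collections import defaultdict
from math import ceil


def sample_most_cited(articles, fraction):
    """Keep the top `fraction` most-cited articles of every publication year.

    `articles` is an iterable of (article_id, year, citation_count) tuples.
    This layout is hypothetical and is not the Scopus API response format.
    """
    by_year = defaultdict(list)
    for article in articles:
        by_year[article[1]].append(article)

    selected = []
    for year in sorted(by_year):
        ranked = sorted(by_year[year], key=lambda a: a[2], reverse=True)
        keep = ceil(fraction * len(ranked))  # rounding up is an assumption
        selected.extend(ranked[:keep])
    return selected


# Toy data: ten articles from 2016 and five from 2001.
toy = [(f"a{i}", 2016, i) for i in range(10)] + [(f"b{i}", 2001, i) for i in range(5)]
picked = sample_most_cited(toy, 0.2)
```

Applying the same fraction to every year keeps the relative size of each year intact, which is the stated reason for not simply taking a fixed number of articles per year.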
For each author, the following information was gathered: (i) Scopus author ID, (ii) surname, (iii) given name, and (iv) affiliation.
Finally, for each affiliation, the following information was gathered: (i) Scopus affiliation ID, (ii) name, and (iii) country.

Producing the author graph
Based on the collected data, a citation graph was generated, in which all authors are linked to each other according to citations. In the graph, authors are represented by nodes, and undirected edges between nodes indicate that at least one author has cited the other at least once, and the size of the nodes is related to the number of citations each author has. The main author graph is shown in Figure 1 (to reduce its size, the graph only contains the authors that have more than 12 citations globally, and the edges are hidden).
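A minimal sketch of how such an author graph could be derived from article records is given below. The input structure, the function name, and the decisions to skip self-citation edges while still counting self-citations toward node size are assumptions, not details taken from the study.

```python
from collections import defaultdict
from itertools import product


def build_author_graph(articles):
    """Return (edges, citations) for an undirected author citation graph.

    `articles` maps an article id to {"authors": [...], "cites": [...]}.
    This structure is a hypothetical stand-in for the collected data.
    """
    edges = set()                 # frozenset({a, b}): a cited b or b cited a
    citations = defaultdict(int)  # citations received, usable as node size
    for article in articles.values():
        for cited_id in article["cites"]:
            cited = articles.get(cited_id)
            if cited is None:     # cited article lies outside the dataset
                continue
            for author in cited["authors"]:
                citations[author] += 1
            for a, b in product(article["authors"], cited["authors"]):
                if a != b:        # no edge from an author to themselves
                    edges.add(frozenset((a, b)))
    return edges, dict(citations)


toy = {
    "p1": {"authors": ["Alice", "Bob"], "cites": []},
    "p2": {"authors": ["Carol"], "cites": ["p1"]},
    "p3": {"authors": ["Alice"], "cites": ["p2", "p1"]},
}
edges, citations = build_author_graph(toy)
```

Storing each edge as a frozenset makes the relation undirected, so a citation in either direction produces the same single edge.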

Community detection
The author graph is a social graph, in the sense that it represents relations between people. A significant amount of research on the analysis of such graphs has been conducted, particularly on community detection. One of the best-performing algorithms for community detection in large graphs is the Louvain method, proposed by Blondel et al. [3]. As the authors write, "The problem of community detection requires the partition of a network into communities of densely connected nodes, with the nodes belonging to different communities being only sparsely connected. Precise formulations of this optimization problem are known to be computationally intractable. Several algorithms have therefore been proposed to find reasonably good partitions in a reasonably fast way." The algorithm aims to find a graph partition that maximizes modularity, which is a scalar value between -1 and 1 that "measures the density of links inside communities as compared to links between communities," defined as follows:

$$Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j),$$

where $A_{ij}$ represents the weight of the edge between vertices $i$ and $j$, $k_i = \sum_j A_{ij}$ is the sum of the weights of the edges attached to vertex $i$, $c_i$ is the community to which vertex $i$ is assigned, the $\delta$-function $\delta(u, v)$ is 1 if $u = v$ and 0 otherwise, and $m = \frac{1}{2} \sum_{i,j} A_{ij}$. The Louvain community detection algorithm operates as follows (in the words of the authors):
Our algorithm is divided in two phases that are repeated iteratively.
Assume that we start with a weighted network of N nodes. First, we assign a different community to each node of the network. So, in this initial partition there are as many communities as there are nodes. Then, for each node i we consider the neighbours j of i and we evaluate the gain of modularity that would take place by removing i from its community and by placing it in the community of j.
The node i is then placed in the community for which this gain is maximum (in case of a tie we use a breaking rule), but only if this gain is positive. If no positive gain is possible, i stays in its original community. This process is applied repeatedly and sequentially for all nodes until no further improvement can be achieved and the first phase is then complete. [...] The second phase of the algorithm consists in building a new network whose nodes are now the communities found during the first phase. To do so, the weights of the links between the new nodes are given by the sum of the weight of the links between nodes in the corresponding two communities.
Once this second phase is completed, it is then possible to reapply the first phase of the algorithm to the resulting weighted network and to iterate. Let us denote by "pass" a combination of these two phases. By construction, the number of meta-communities decreases at each pass, and as a consequence most of the computing time is used in the first pass. [...]
Because the order in which the nodes are evaluated may affect the outcome, we performed the partitioning process for 300 different random orderings, selecting the partition that resulted in the greatest modularity. In our case, a modularity of 0.525158 was achieved.
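The modularity score and the first (local moving) phase quoted above can be sketched in plain Python. This is a minimal illustration rather than the implementation used for the study: the second (aggregation) phase is omitted, the graph is a hypothetical dict-of-dicts adjacency map, and the function names, the seed, and the toy graph are assumptions. The restart loop mirrors the idea of trying several random node orderings and keeping the partition with the greatest modularity.

```python
import random
from collections import defaultdict


def modularity(adj, community):
    """Q = 1/(2m) * sum_ij [A_ij - k_i k_j / (2m)] * delta(c_i, c_j)."""
    two_m = sum(w for nbrs in adj.values() for w in nbrs.values())
    degree = {v: sum(nbrs.values()) for v, nbrs in adj.items()}
    inside = defaultdict(float)  # sum of A_ij over ordered pairs in a community
    total = defaultdict(float)   # sum of k_i over the members of a community
    for v, nbrs in adj.items():
        total[community[v]] += degree[v]
        for u, w in nbrs.items():
            if community[u] == community[v]:
                inside[community[v]] += w
    return sum(inside[c] / two_m - (total[c] / two_m) ** 2 for c in total)


def local_moving(adj, order):
    """Phase one only: move each node to the neighbouring community with the best gain."""
    two_m = sum(w for nbrs in adj.values() for w in nbrs.values())
    degree = {v: sum(nbrs.values()) for v, nbrs in adj.items()}
    community = {v: v for v in adj}  # every node starts in its own community
    total = dict(degree)             # sum of degrees per community label
    improved = True
    while improved:
        improved = False
        for v in order:
            old = community[v]
            links = defaultdict(float)  # weight from v to each neighbouring community
            for u, w in adj[v].items():
                if u != v:
                    links[community[u]] += w
            total[old] -= degree[v]     # take v out of its community
            # Quantity proportional to the modularity gain of inserting v.
            best, best_gain = old, links[old] - total[old] * degree[v] / two_m
            for c, k_in in links.items():
                gain = k_in - total[c] * degree[v] / two_m
                if gain > best_gain:    # move only on a strict improvement
                    best, best_gain = c, gain
            total[best] += degree[v]
            community[v] = best
            if best != old:
                improved = True
    return community


def best_partition(adj, trials, seed=0):
    """Repeat phase one for several random orderings and keep the best result."""
    rng = random.Random(seed)        # fixed seed is an assumption, for repeatability
    nodes = list(adj)
    best, best_q = None, float("-inf")
    for _ in range(trials):
        rng.shuffle(nodes)
        candidate = local_moving(adj, list(nodes))
        q = modularity(adj, candidate)
        if q > best_q:
            best, best_q = candidate, q
    return best, best_q


def make_graph(edge_list):
    """Build a symmetric, unit-weight adjacency map from (a, b) pairs."""
    adj = defaultdict(dict)
    for a, b in edge_list:
        adj[a][b] = 1.0
        adj[b][a] = 1.0
    return dict(adj)


# Toy graph: two triangles joined by a single bridge edge (2, 3).
toy = make_graph([(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)])
partition, best_q = best_partition(toy, trials=5)
```

On this toy graph the two triangles end up as the two communities. A full Louvain pass would additionally collapse each community into a single node and repeat the procedure on the resulting weighted network.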
Some authors contribute to more than one research community. This may happen because the author's research focus is of interest to multiple communities, or because the author has published on several different topics. Regardless of the reason, the employed community detection algorithm will place such authors in the community to which they are most tightly connected. Such authors will strengthen the relations between the concerned communities.

Community graph
Using the spring layout algorithm in the NetworkX Python library, a graph (Figure 2) was generated, where nodes correspond to communities, node size to community size, and edge width and node distance depend on inter-community coupling. In most cases, the name of each community was given by the most influential unique keyword, i.e. the top keyword that is used in the articles with the most citations. In a few cases, however, the names were changed by the authors so that the topic of the community could be better reflected, but even in those cases a topic from the rest of the most influential community keyword list was selected. The only exceptions are the names of the cryptography I and II communities, which were assigned in an arbitrary manner based on the contained articles. (The following common keywords were not unique to any community: cyber security, cyber-attacks, security breaches, security, information security, cybersecurity, computer security, cyber threats, network security and intrusion detection systems.)
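The step from the author graph to the community graph can be sketched as a simple aggregation. The inputs, the function name, and the choice to count author-level edges as the coupling measure are assumptions made for illustration. The exact coupling measure used for Figure 2 is not specified in the text.

```python
from collections import Counter


def community_graph(edges, community):
    """Collapse an author graph into a community graph.

    `edges` is an iterable of (author_a, author_b) pairs and `community`
    maps each author to a community label. Both inputs are hypothetical.
    """
    sizes = Counter(community.values())  # node size = number of members
    coupling = Counter()                 # edge width = inter-community links
    for a, b in edges:
        ca, cb = community[a], community[b]
        if ca != cb:                     # links inside a community are ignored
            coupling[tuple(sorted((ca, cb)))] += 1
    return sizes, coupling


toy_community = {
    "Alice": "cryptography", "Bob": "cryptography",
    "Carol": "malwares", "Dave": "malwares", "Eve": "biometrics",
}
toy_edges = [("Alice", "Bob"), ("Alice", "Carol"), ("Bob", "Dave"),
             ("Carol", "Dave"), ("Dave", "Eve")]
sizes, coupling = community_graph(toy_edges, toy_community)
```

The resulting sizes and coupling weights could then be loaded into a weighted NetworkX graph and positioned with `networkx.spring_layout`, which places strongly coupled nodes closer together.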

Sub-community detection and description
For each community, the above process was repeated: an author graph was generated, sub-communities were detected using the Louvain algorithm, sub-community graphs were generated (Figures 4-26), and each sub-community was summarized in terms of most common keywords, most-cited authors, top publication fora, etc.

Data
In this section, information regarding the employed raw data is presented.
In total, we have 59,782 articles published from 1949 until early 2020, authored by 98,373 authors fully recorded in the database; the articles contain 148,202 keywords. We also have 835,664 articles recorded only as citations (i.e., we only have the article title, author surname, and publication year). All the data were acquired from the Scopus database at the end of February 2020. The most-cited articles among them are presented in Table 1, whereas the most common keywords are [...] Finally, the top ten publication fora globally are presented in Table 2. It is worth noting that only four of the top ten publication outlets are conferences, while the rest are journals, which suggests that most of the collected articles were published in journals.

Results & Analysis
In this section, we present the identified community clusters. In total, 12 communities were identified, as shown in Figure 2. The presentation of each community follows the same structure. First, we address the community topic.
Here, we provide an overview of the most prominent topics in the community. It should be noted that the clustering is based on individuals and the references in their papers. Thus, we cannot claim that clusters represent topics in a formal way (as is done in topic modeling), and every individual can of course cover multiple topics throughout a career. We do, however, find a fair cohesiveness with respect to the topics within the communities, which is interesting to report. The ambition here is thus to convey an intuitive feel for the topic(s) of the community. To this end, we list the [...] The order in which the communities are presented below is based on how active each of them is today, as shown in Figure 3.

Cryptography I & II
The Louvain clustering algorithm identified two communities concerned with cryptography. However, as they are closely related, their joint presentation allows a more coherent description. Even though cryptography has thousands of years of history, the academic discipline emerged in the 1970s with the creation of a public encryption standard (DES) and the invention of public-key cryptography. This community completely dominated cyber security research in the 1980s and 1990s, producing approximately 70% of all published papers in 1985 and 1986. Even though it has maintained its position as the most productive community, and its absolute number of publications continues to rise, its relative share of publications dropped to slightly above 20% in 2018 and 2019. At the core of this topic is provable security. The corresponding subcommunity is concerned with the fundamental mathematical assumptions and abstractions used in cryptography, such as the random oracle model [14] and universal composability [15]. Another sub-community is concerned with provable data possession, which is close to provable security but focuses more on the fundamentals of data integrity and authenticity verification, and on protocols that provide probabilistic proof that files are stored. As in the case of provable security, the sub-community concerned with public-key cryptography has its origins in the 1970s, producing notable contributions such as the RSA cryptographic scheme [4], the ElGamal cryptographic scheme [11], and, later, identity-based encryption [5]. In parallel, a sub-community concerned with symmetric ciphers and cryptanalysis emerged. This sub-community focuses on the construction of block ciphers, such as DES [16] and AES [17], as well as on the successful breaking of these cryptographic systems, most notably DES [18].
A common approach to cryptanalysis is to employ side-channel attacks, thus attempting to reveal secrets by measuring unintended side-effects of cryptographic computation. The most interesting side-channel attack is differential power analysis, for which there is a dedicated sub-community. Detecting small variations in power consumption patterns during cryptographic operations can be used to find secret keys from otherwise tamper-resistant devices [9]. As proposed by Agrawal et al. [19], side-channel information can be used to detect hardware trojans, that is, malicious alterations to integrated circuits [20].
In the first decade of the 21st century, a sub-community related to this topic emerged. Other approaches to detecting hardware trojans include the use of physical unclonable functions (PUFs). PUFs are primitives for deriving secrets from complex physical characteristics of integrated circuits rather than storing the secrets in digital memory. PUFs make use of random variations during the fabrication process of an integrated circuit, and thus the secret is difficult to predict or extract [21].
The sub-community concerned with elliptic curve cryptography emerged in the late 1980s, with the independent co-discovery of that cryptographic system by Victor Miller [22] and Neal Koblitz [23].
A useful feature of an encryption system is that it allows operations on the encrypted data without revealing their content. This is the topic of the fully homomorphic encryption sub-community, dominated by Craig Gentry, the creator of the first fully homomorphic encryption scheme [24]. A sub-community with similarities to the homomorphic encryption group is the one on privacy-preserving schemes, which is related to issues such as searchable encryption [25] and differential privacy [26].
The sub-community concerned with attribute-based encryption has come to dominate the cryptographic community only in the last decade. As in the case of homomorphic and privacy-preserving encryption, attribute-based encryption aims to develop methods that allow multiple users to access different parts or aspects of the encrypted data. This is achieved by using attributes to describe the encrypted data or user credentials [27]. The growth of this sub-community has been staggering, amounting to almost 40% of all cryptography publications in 2018 and 2019.
Finally, the most recent sub-community is concerned with blockchain, which is directly connected with cryptography: a blockchain is a growing collection of blocks, effectively a chain, in which each block is based on the cryptographic hash of the previous block. This sub-community started to make significant contributions around 2012.
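The hash-linking idea can be shown with a toy chain. This sketch only illustrates how each block depends on the hash of its predecessor; it leaves out consensus, signatures, and everything else a real blockchain needs, and the block layout and function names are assumptions.

```python
import hashlib
import json


def block_digest(payload, previous_hash):
    """SHA-256 over the block payload together with the previous block's hash."""
    body = {"payload": payload, "previous_hash": previous_hash}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def make_block(payload, previous_hash):
    """Create a block (hypothetical layout) that commits to its predecessor."""
    return {
        "payload": payload,
        "previous_hash": previous_hash,
        "hash": block_digest(payload, previous_hash),
    }


def verify_chain(chain):
    """Check every stored hash and every link to the preceding block."""
    for index, block in enumerate(chain):
        if block["hash"] != block_digest(block["payload"], block["previous_hash"]):
            return False
        if index > 0 and block["previous_hash"] != chain[index - 1]["hash"]:
            return False
    return True


chain = [make_block("genesis", "0" * 64)]
for payload in ["tx-1", "tx-2"]:
    chain.append(make_block(payload, chain[-1]["hash"]))

# Changing an earlier payload invalidates the stored hash of that block.
tampered = [dict(block) for block in chain]
tampered[1]["payload"] = "tx-1-forged"
```

Because every hash covers the previous hash, repairing the forged block would also require recomputing every block that follows it.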

Sensor networks
Sensor networks are currently attracting great attention; they represent the largest and most active community (if we consider cryptography I and II separately). [...] Agency started the Distributed Sensor Network program to explore the challenges in implementing distributed/wireless sensor networks [28].
Since its appearance, the community has seen a continuous increase in publications. The community grew rapidly from 2002 until 2011, and [...] wireless sensor networks and ad hoc networks. Finally, we have security mechanisms and attacks: physical-layer security, content-based security, [...]
The most active sub-community overall is that concerned with the internet of things. The term "internet of things" was introduced around 1999. One of the earliest important articles of this community, however, appeared in 2005 [29] and describes an end-to-end security architecture for constrained embedded devices.
As all the top five community articles (Table 7) are related to security mechanisms, it is no surprise that the sub-community concerned with physical-layer security is also at the top of the list of the most productive sub-communities.
The most cited article produced by this sub-community is [30], in which the problem of confidentiality over wireless channels is mathematically formulated.
Another common problem in sensor networks is how new nodes can be added to the network and communicate securely with the existing ones. To resolve this, a key-management system is required, such as that described in [31].
Then, a content-based security mechanism can be used so that each wireless sensor node can only have access to specific content even though the messages are available to all the nodes.
Another topic that is currently attracting attention is vehicular communications, where one is interested in the communication between vehicles and between vehicles and the road infrastructure. This area started developing dramatically after 2008 and continues to grow owing to its relation to autonomous vehicles and smart cities. The most cited article produced by this sub-community is [32].
The most influential affiliation country is the United States, which leads by a large margin over China in second place, while Canada follows closely in third. Switzerland and the United Kingdom follow at a greater distance.
The sensor networks community is closely related to the cryptography, malwares, and intrusion detection communities. This can be explained by the need for authentication and encryption methods in sensor networks. This relation can also be seen from the existence of the physical-layer security sub-community within the sensor networks community.

Information Hiding
The information hiding community is to a large extent interested in steganography, the topic that also represents the history and background of this community. Steganography is concerned with disguising information in data available to unwanted eavesdroppers. In contrast with cryptography, where it is evident that a message is sent, in steganography the challenge is to conceal the transmission of a message. This subject stems from information theory. The general principle is to identify redundant bits in the data of a cover medium and to encode the secret message in them, producing a stego medium.

Douceur, J. R. The sybil attack (2002) [36]

Country           Citations
United States     87336
China             17937
Canada            13599
Switzerland       6357
United Kingdom    6126
As in the case of cryptography, the concept of steganography dates far back in history, with examples from ancient Greece, Rome, and China. As a scientific discipline, its foundation was laid in Shannon's paper "Communication Theory of Secrecy Systems" [37], which was published in 1949. To complement the field of cryptography, Shannon introduced "[...] true secrecy systems where the meaning of the message is concealed by cipher, code, etc., although its existence is not hidden, and the enemy is assumed to have any special equipment necessary to intercept and record the transmitted signal." Despite its early birth, the community had only minor activity until the 1990s. However, since then, it has steadily grown into a large community, with its publication trend pointing upward.
Steganography represents the largest sub-community within this community, and it is concerned with both encoding and information hiding. As already mentioned, this is the core of this community and historically its first topic of interest. It gained momentum in the late 1990s and has since exhibited a steady increase in article production.
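The general principle of encoding a message in the redundant bits of a cover medium can be illustrated with least-significant-bit (LSB) embedding. LSB embedding is used here only as a familiar textbook example. The text does not single out this technique, and the function names and toy cover data are assumptions.

```python
def embed(cover, message):
    """Hide `message` (bytes) in the least significant bits of `cover` (bytes)."""
    bits = [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]
    if len(bits) > len(cover):
        raise ValueError("cover medium too small for the message")
    stego = bytearray(cover)
    for index, bit in enumerate(bits):
        stego[index] = (stego[index] & 0xFE) | bit  # overwrite the lowest bit
    return bytes(stego)


def extract(stego, length):
    """Read `length` bytes back from the least significant bits of `stego`."""
    message = bytearray()
    for start in range(0, 8 * length, 8):
        byte = 0
        for sample in stego[start:start + 8]:
            byte = (byte << 1) | (sample & 1)
        message.append(byte)
    return bytes(message)


cover = bytes(range(100, 140))  # 40 toy sample values standing in for pixels
stego = embed(cover, b"hi")
hidden = extract(stego, 2)
```

Each cover sample changes by at most one, which is why the lowest bit of image or audio samples is usually treated as redundant.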
Normally regarded as a complementary approach to steganography, watermarking has become one of the largest information hiding sub-communities.
The term watermarking relates to a paper-making technique for keeping track of provenance. Watermarking is similar to steganography in that it embeds and hides information in a source data file. However, it also differs significantly.
Watermarking has a robustness requirement: it should not be possible to remove the watermark (e.g., by image cropping, scaling, and rotation, or through conversion or compression). A watermark is not necessarily hidden, but, as in the case of Kerckhoffs's principle for cryptosystems, it should be difficult to remove even if the algorithm that generated it is known. The general concept of watermarking has a long history, but digital watermarking was born in the early 1990s [38].
Even though it is possible to describe the difference between the watermarking and the steganography sub-communities, their separation in terms of individuals appears less clear. There are people in the steganography group that have produced articles on watermarking.
In the information hiding community, there are also a number of sub-communities concerned with encryption for information hiding purposes.
The first is chaos-based image encryption. This sub-community is concerned with encryption techniques based on chaos theory. The approach rests on the observation that chaotic systems are suitable for encryption because they are sensitive to initial conditions. Authors in this community also note that Shannon, who became a member of this community, outlined the fundamental principles of the domain as early as 1949 [37] (before the development of chaos theory).
Despite the old roots of the sub-community, the number of produced papers increased significantly only in the second half of the 2000s. Currently, this sub-domain is one of the two largest, with China dominating the production.
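As a toy illustration of why sensitivity to initial conditions is attractive for encryption, the sketch below derives a keystream from the logistic map and XORs it with pixel values. The logistic map, the parameter values, the burn-in length, and the byte quantization are all assumptions chosen for illustration. The sketch does not reproduce any scheme from the surveyed literature and is not secure.

```python
def logistic_keystream(x0, r, length, burn_in=100):
    """Generate `length` key bytes by iterating the logistic map x -> r*x*(1-x).

    `x0` (strictly between 0 and 1) acts as the secret key. The burn-in lets
    small key differences grow before any output is used (an assumption).
    """
    x = x0
    for _ in range(burn_in):
        x = r * x * (1.0 - x)
    stream = bytearray()
    for _ in range(length):
        x = r * x * (1.0 - x)
        stream.append(int(x * 256) % 256)  # crude quantization to one byte
    return bytes(stream)


def xor_cipher(data, x0, r=3.99):
    """Encrypt or decrypt `data` by XOR with the chaotic keystream."""
    keystream = logistic_keystream(x0, r, len(data))
    return bytes(d ^ k for d, k in zip(data, keystream))


pixels = bytes([10, 20, 30, 40, 50, 60, 70, 80])  # toy image row
cipher = xor_cipher(pixels, x0=0.3141592653)
recovered = xor_cipher(cipher, x0=0.3141592653)
```

Decryption is the same operation with the same initial value, so the whole key is the starting point (and parameter) of the map.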
The sub-community of selective encryption is concerned with combining compression/decompression with encryption/decryption for multimedia data (video and audio). A fundamental challenge in this field is that encryption and decryption should be performed on large volumes of data and in real time.
To balance this trade-off, videos are encrypted only partially and selectively, hence the name. The community grew with the wide availability of the internet and the advent of services such as video-on-demand. It has been and remains a small community, but with a fairly steady production rate.
In visual cryptography, the fundamental principle for hiding data in images is to divide a secret image into different shadow images, called shares. The shares are devised so that if certain subsets are combined, the original secret image is recovered, whereas individual shares or combinations of unqualified shares contain no information. The community is fairly small and steady in size, with Taiwan and China at the forefront.
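The share construction can be illustrated with the classic two-out-of-two scheme, in which every secret pixel is expanded into two subpixels. The scheme is named here as a standard textbook instance, not as the method of any particular surveyed article, and the function names and toy secret are assumptions.

```python
import random

PATTERNS = [(0, 1), (1, 0)]  # 1 = black subpixel, 0 = transparent subpixel


def make_shares(secret, rng):
    """Split a row of secret pixels (1 = black, 0 = white) into two shares."""
    share_a, share_b = [], []
    for pixel in secret:
        pattern = rng.choice(PATTERNS)
        complement = (1 - pattern[0], 1 - pattern[1])
        share_a.extend(pattern)
        # Black pixel: complementary patterns. White pixel: identical patterns.
        share_b.extend(complement if pixel else pattern)
    return share_a, share_b


def stack(share_a, share_b):
    """Overlay two transparencies: a subpixel is black if either share is black."""
    return [a | b for a, b in zip(share_a, share_b)]


def decode(stacked):
    """Read a pixel as black when both of its subpixels are black."""
    return [int(stacked[i] + stacked[i + 1] == 2) for i in range(0, len(stacked), 2)]


secret = [1, 0, 0, 1, 1, 0]
share_a, share_b = make_shares(secret, random.Random(7))
revealed = decode(stack(share_a, share_b))
```

Each share on its own contains exactly one black subpixel per secret pixel, regardless of the pixel's value, so a single share reveals nothing about the image.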
As the name implies, the sub-community of optical image encryption is concerned with optical filters that diffuse the original image to noise and then recover it. These diffusers operate in both the spatial and the spatial-frequency domains, the latter using various mathematical transforms, such as Fourier, Fresnel, and Gyrator. It should be noted that this sub-community is perhaps best considered to belong to an optics community (not studied here) rather than to computer security.
Finally, there is a sub-community concerned with reversible data hiding.
This group focuses on techniques that insert information by modifying the original file or signal, but that enable the exact restoration of the original after the extraction of the embedded information. A few articles in the community date back to the 1990s; however, our data suggest that it materialized in the second half of the first decade of the 21st century, and it is now established as a small community, with China in the lead.
In general, China dominates the information hiding community, and Taiwan also has a strong position. The US is the second most influential country. Not surprisingly, the information hiding community has strong academic relationships with cryptography.
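Exact restoration after extraction can be illustrated with difference expansion on a pair of pixel values. This well-known technique is used here as an example only and is not attributed to any article in the survey. The sketch ignores overflow beyond the valid pixel range, and the function names are assumptions.

```python
def embed_bit(x, y, bit):
    """Embed one bit into the pixel pair (x, y) by expanding their difference."""
    average, difference = (x + y) // 2, x - y
    expanded = 2 * difference + bit  # the new lowest bit carries the payload
    return average + (expanded + 1) // 2, average - expanded // 2


def extract_bit(x_marked, y_marked):
    """Recover the embedded bit and the original pixel pair."""
    expanded = x_marked - y_marked
    bit = expanded & 1
    difference = expanded // 2       # floor division undoes 2 * d + bit
    average = (x_marked + y_marked) // 2
    return bit, average + (difference + 1) // 2, average - difference // 2


marked = embed_bit(100, 98, 1)
restored = extract_bit(*marked)
```

The integer average of the pair is unchanged by the embedding, which is what allows both the payload bit and the original values to be recovered exactly.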

Intrusion Detection
This community came into being in the late 1990s. It has since experienced uninterrupted growth in productivity, and it was one of the five most productive communities in 2019. The community initially focused on general intrusion/anomaly detection systems and attack graphs. Important early [...]
Figure 11: Growth of the detected information hiding sub-communities over time.

Author             Citations
Fridrich, Jiri     3495
Chang, C. C.       2267
Wang, Xing-yuan    1762
Anderson, R. J.    1692
Kilian, Joe        1399

Around 2000, a sub-community grew around a particular type of network attack, namely, distributed denial-of-service (DDoS) attacks, as exemplified by [46]. In the middle of the first decade of the 21st century, as the interest in attack graphs, anomaly detection, and DDoS attacks increased, new sub-communities also emerged. A sub-community developed around signature-based deep packet inspection, as described in, for example, [47]. Another sub-community, which appeared at the same time, is concerned with information visualization and developed methods for visualizing security-related network data to facilitate manual intrusion detection, as exemplified by [48].
Around 2010, the interest in cloud computing reached its peak. This is one of only two exceptions to the American dominance of the intrusion detection community, as the most influential country, in terms of citations, in the cloud computing sub-community is India. A characteristic article is [49], which surveys different intrusion techniques affecting the availability, confidentiality, and integrity of cloud resources and services. The second sub-community in which the US is not dominant, which also peaked around 2010, is concerned with traffic classification using machine learning. Its focus appears to be the same as that of the anomaly detection sub-community. Here, Canada and Spain are among the most influential countries, and publication fora are generally concerned more with topics related to networks and communications.
Since 2010, the sub-community concerned with software-defined networking (SDN) has significantly increased in terms of output. It focuses on multiple security concerns in the SDN domain, including intrusion detection (e.g., [50]) as well as control-plane saturation attacks [51].
Most closely connected with the malwares community, the intrusion detection community also has connections with the sensor networks community.

Evaluating intrusion detection systems (2000) [54]
Figure 13: Growth of the detected intrusion detection sub-communities over time.

IEEE Access
Computer Networks
Journal of Network and Computer Applications
Expert Systems with Applications

Malwares
The malwares community has been active since 1973. Malware research is focused on discovering, preventing, and stopping malicious software, including viruses, trojans, ransomware, and spyware. Early papers produced by members of this community were related to secure information flow [55] and the modeling of security policies [56]. Thus, this community is closely related to other cyber security communities such as intrusion detection. More fundamental work was carried out slightly later, with, for example, Fred Cohen from Lehigh University presenting early theory and experiments on computer viruses [57]. Another influential paper (from 1991) used directed-graph epidemiological models for the spread of computer viruses [58]. [...] The study of botnets, computer viruses, and virtualization techniques for operating systems that enhance security follows with slightly fewer publications.

Country          Citations
United States    58718
China            7359
Australia        5762
India            4579
Canada           4196
Finally, adversarial learning represents a novel research field that came into existence only after 2000 and employs machine learning techniques to model adversaries so that attack simulations can then be performed, as described in [62].
The malwares community is closely related to the intrusion detection, sensor networks, and cryptography communities.

Biometrics
The biometrics community is one of the largest communities in our analysis, in terms of community members. It appeared in the early 1980s, almost two decades after the introduction of the first semi-automatic face recognition system by Woodrow Bledsoe in 1968.
It followed a slow but steady productivity growth; currently, it is exactly in the middle among all communities in terms of productivity.
As seen in Figure 16, the biometrics community has eight sub-communities, which can be divided into three research domains. The first focuses on applications: biometric fingerprinting and surveillance systems. The second is concerned with authentication schemes: keystroke dynamics and biometric authentication. The third is concerned with methods: face recognition, person re-identification, background subtraction, and gait recognition. Overall, the most active and oldest sub-community is that concerned with biometric fingerprinting. Articles in this sub-community primarily focus on the general design of biometric systems and their procedures, as for example in [66].
The most interesting older article is [67], which is a study on secure, off-line, authenticated user-identification schemes based on a biometric system.

Country           Citations
United States     124853
Germany           14456
China             9242
Italy             8035
United Kingdom    4871
The surveillance systems sub-community is not only related to identifying persons but also to privacy concerns regarding such systems, as presented in [68].
The biometric fingerprinting sub-community is closely related to both the biometric authentication and the keystroke dynamics sub-communities.
The biometric authentication sub-community is concerned with all possible types of biometric features, such as neural activity and brainwaves. Interestingly, the keystroke dynamics sub-community began publishing in 1990 and is currently the second most active sub-community. One of the earliest and most cited articles is [69], which described a user authentication/identification method by studying keyboard typing habits.
Another research topic in biometrics that is currently attracting great attention is person re-identification, which is the process of associating images of a person captured from different cameras or from the same camera in different environments. As expected, this is also related to face recognition.
Finally, background subtraction is a technique that removes the background of an image or video to study only useful content, something that is used in biometrics recognition. Gait recognition is the study of human motion, which can be considered a biometric feature, and can be used to identify people.
Once more, the most influential affiliation country is the United States, leading by a significant margin over China in second place, with Italy following closely. The United Kingdom and South Korea follow at some distance.
The biometrics community appears distantly related to the other communities in our analysis, but it is closer to the information hiding community.

Cyber-Physical Systems
The cyber-physical systems community is a medium-sized community, which is relatively new, as it came into existence approximately 25 years ago. It has experienced a steady growth and is currently the seventh most productive community.

Owing to the move from simple control systems towards IT systems, the intrusion detection systems and vulnerability analysis sub-communities, which are primarily IT-related, are found within the cyber-physical systems community. The most important article published by the intrusion detection systems sub-community, which is the most active one, is [74], in which the significance of cyber infrastructure security within the power domain, to prevent, mitigate, and tolerate cyber-attacks, is highlighted.
The false data injection attacks sub-community is currently the second most active.

Finally, the most important article in the communication system security sub-community is [77]. This article is important because it is an experimental security analysis of a mix of industrial-grade networks and cyber-physical systems, which are found not only in vehicles but also in the energy domain.
The most influential affiliation country is the United States, leading by a wide margin over China in second place, while the United Kingdom and Sweden follow in third and fourth position. One observation is that this community is equally concerned with attacks on state estimators for power grids and with attacks on industrial control systems (ICS) more broadly.
The cyber-physical systems community is closely related to the sensor networks community, as sensors and sensor networks are becoming a standard in modern power grids. Additionally, it is also related to the intrusion detection community. This is due to the fact that vulnerability analysis of power grid infrastructures is becoming increasingly widespread.

Authentication
The authentication community is relatively small, although it is one of the oldest communities in our analysis. It started in the late 1970s with research on authentication (using passwords) and authenticated encryption systems for computers. The Diffie-Hellman key exchange [6] is a characteristic example.

Figure 19: Growth of the detected cyber-physical systems sub-communities over time.

Attack detection and identification in cyber-physical systems (2013) [76]
This community followed a steady growth in productivity, except for the period 2012-2016, during which it remained static. Currently, it is the eighth out of the twelve communities in terms of productivity.

Cárdenas, Alvaro A. 395

Table 30: Most-cited countries (top five) in the cyber-physical systems community.

Country Citations
United States 24938
China 2886
United Kingdom 1633
Sweden 1410
Italy 1131
As seen in Figure 20, it has six sub-communities. Among them, the mutual authentication sub-community is currently the most active. It is primarily concerned with "two-factor authentication" (also called mutual authentication), which is commonly achieved by using a hardware authentication device, such as a one-time password (OTP) generator. The majority of publications in this sub-community were made after 2006, and one of the earliest important articles was [79], in which a two-factor authentication protocol for wireless sensor networks was proposed.
However, one of the most active sub-communities in the past was the password sub-community. A characteristic example is [80], which proposes a secure password authentication method that is immune to eavesdropping and tampering by an attacker. This method, which is currently widely used, involves the use of hashed passwords. In more recent articles, a close relation to the mutual authentication sister sub-community can be seen.
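One well-known way to build password authentication from hashing that resists eavesdropping is a hash chain of one-time passwords. The Python sketch below illustrates that general idea; whether it corresponds exactly to the scheme in [80] is an assumption, and SHA-256, the class name `Server`, and the chain length of 100 are choices made only for the example.

```python
import hashlib
import hmac

def h(data: bytes) -> bytes:
    """One application of the hash function (SHA-256 as a modern stand-in)."""
    return hashlib.sha256(data).digest()

def hash_chain(seed: bytes, n: int) -> bytes:
    """Apply h to the seed n times."""
    value = seed
    for _ in range(n):
        value = h(value)
    return value

class Server:
    """Hypothetical verifier: stores only the latest accepted chain value."""

    def __init__(self, stored: bytes):
        self.stored = stored  # h applied n times to the seed; never the seed itself

    def login(self, candidate: bytes) -> bool:
        # A valid one-time password hashes to the value currently stored.
        if hmac.compare_digest(h(candidate), self.stored):
            self.stored = candidate  # the next login needs the preceding chain value
            return True
        return False

seed = b"correct horse battery staple"
n = 100
server = Server(hash_chain(seed, n))

first_password = hash_chain(seed, n - 1)
ok_first = server.login(first_password)           # accepted
ok_replay = server.login(first_password)          # an eavesdropper replaying it fails
ok_second = server.login(hash_chain(seed, n - 2)) # the next value in the chain works
```

An eavesdropper who observes a password cannot derive the next one, because that would require inverting the hash function.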
Authentication mechanisms can also be used in tandem with confidentiality mechanisms and to achieve key agreement; each of these two topics has a sub-community of the same name. The sub-community concerned with confidentiality is currently the second most active. Finally, RFID (radio-frequency identification) is another hardware solution that can be used as a two-factor authentication token, hence it exists as a sub-community within the authentication community.
The most influential affiliation country is China, leading by a small margin over Taiwan in second place, with the United States following closely.
The authentication community is closely related to the cryptography community. This is because authentication uses cryptographic elements, hence the cryptographic protocols sub-community. For example, the Diffie-Hellman key exchange, mentioned previously, is a public-key technique that lets two parties agree on a shared secret, on which encryption and authentication mechanisms can then be built. Authentication is also closely related to the sensor networks community, as sensors and sensor networks require authentication and security methods. This relation can also be seen from the physical layer security sub-community within the sensor networks community.
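To make the Diffie-Hellman key exchange mentioned above concrete, the following Python sketch runs a toy exchange between two parties. The parameters are assumptions chosen for readability (the Mersenne prime 2^127 - 1 and base 3) and are not suitable for real use. Note that the plain exchange establishes a shared secret but does not by itself authenticate either party.

```python
import secrets

# Toy public parameters (illustrative only, NOT secure choices).
P = 2**127 - 1   # a known Mersenne prime
G = 3            # public base; not checked to be a generator

def keypair(p: int, g: int):
    """Return (private exponent, public value g^private mod p)."""
    private = secrets.randbelow(p - 3) + 2   # uniform in [2, p - 2]
    public = pow(g, private, p)
    return private, public

# Each party keeps its private exponent and sends only the public value.
a_private, a_public = keypair(P, G)
b_private, b_public = keypair(P, G)

# Both sides compute g^(a*b) mod p from the other side's public value.
a_shared = pow(b_public, a_private, P)
b_shared = pow(a_public, b_private, P)
```

In practice, the exchanged public values are signed or otherwise authenticated, and the shared value is passed through a key derivation function before use.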

Usable Security
The usable security community has been active since 1973, but at that time, it was concerned more with protection motivation theory, which aims to clarify fear appeals and proposes that people protect themselves based on a number of different factors. Then, in the late 1980s, the term phishing came into existence and research that is more related to phishing, usable security, and information security awareness began to appear. Phishing relates to fraudulent techniques for obtaining sensitive information by disguising as a trustworthy entity. The protection motivation theory was founded in [84].

Burrows, M. A logic of Authentication (1990) [83]
Initially, the community produced a few papers per year, but after 2002, it experienced greater growth. Currently, it is one of the smallest communities in terms of size, and tenth out of twelve in terms of productivity. Figure 22 shows the six sub-communities. The two largest ones are concerned with password security and phishing. The former focuses on the study of password habits, graphical passwords, and other password-related topics.

Country Citations
China 21642
Taiwan 20289
United States 16858
India 10270
South Korea 5889
The phishing sub-community is concerned with both phishing and mitigation techniques for phishing (anti-phishing), which is a very modern topic of research.
The protection motivation theory sub-community, which was historically the largest one until 2009, focuses on information security awareness and information security policy compliance.
The cybercrime sub-community appeared in 2002 and is concerned with studying the social networks of malware writers and hackers, the social behavior in online black markets, and the creation of attacker profiles among others.
The rationale for this sub-community is that cybercrime can also be the result of low information security awareness and phishing attacks.
Finally, the economics sub-community is concerned with the economic effect of phishing attacks and the economics of security investments, whereas the trust sub-community is concerned with trust issues in IT systems.
Once more, the most influential affiliation country is the United States, with the United Kingdom second and Canada third, followed by Germany and Finland.
The usable security community is closely related to the malwares and intrusion detection communities.

Access Control
The access control community is currently one of the smallest (in terms of size) and least active. It began publishing in the mid-1970s and was primarily concerned with role-based access control and access-control policies. One of the most important early articles is [90], which is also the most cited in the community. This article focuses on a certain type of access control, namely, role-based access control (RBAC), and describes a framework in which the use and management of RBAC can become easier and more effective.

Figure 23: Growth of the detected usable security sub-communities over time.

Country Citations
United States 35537
United Kingdom 5444
Canada 5333
Germany 2714
Finland 1503
Until 2009, its member count was slowly increasing. Subsequently, it shrank, and in the last six years, it has remained steady.

Figure 24 shows the six sub-communities. Among them, the privacy sub-community has been one of the most active. However, its size has also shrunk, following the parent community. This sub-community is concerned more with trust and privacy issues in software applications, but it also conducts research on policy and privacy management as well as solution enforcement. The most important article is [91], which presented a trust management system, new at that time, called Policy Maker.
The security requirements sub-community is currently the second most active. It is concerned with the study, analysis, and/or modeling of the security and privacy requirements of existing applications, as presented, for example, in [92]. Then, the context-aware computing sub-community is concerned with access control mechanisms for ubiquitous computing. Finally, the grid computing sub-community is concerned with access-control systems in grid computing.
The most influential affiliation country of the whole community is the United States, leading by a significant margin over Italy in second place, with the United Kingdom following very closely.
The access control community is closely related to the cryptography and malwares communities. The first relation can be explained by the fact that access control requires an accompanying authentication mechanism. Malware, on the other hand, is related to access control systems because it is often able to bypass them.
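To illustrate the role-based access control (RBAC) model discussed in this section, the following Python sketch shows its core idea: permissions are attached to roles, and users obtain permissions only through the roles assigned to them. The role names, user names, and permissions are hypothetical, and the sketch does not reproduce the framework of [90].

```python
# Hypothetical role-to-permission assignment.
ROLE_PERMISSIONS = {
    "nurse": {"read_record"},
    "doctor": {"read_record", "write_record"},
    "admin": {"manage_users"},
}

# Hypothetical user-to-role assignment; a user may hold several roles.
USER_ROLES = {
    "alice": {"doctor"},
    "bob": {"nurse"},
    "carol": {"nurse", "admin"},
}

def is_allowed(user: str, permission: str) -> bool:
    """A user holds a permission if at least one of their roles grants it."""
    roles = USER_ROLES.get(user, set())
    return any(permission in ROLE_PERMISSIONS.get(role, set()) for role in roles)

alice_can_write = is_allowed("alice", "write_record")   # granted via "doctor"
bob_can_write = is_allowed("bob", "write_record")       # "nurse" lacks it
carol_can_manage = is_allowed("carol", "manage_users")  # granted via "admin"
```

The management benefit is that changing what a role may do, or which roles a user holds, requires editing a single mapping rather than per-user permission lists.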

Quantum Cryptography
Quantum cryptography uses quantum mechanics to perform cryptographic tasks. The best-known example of quantum cryptography is quantum key distribution. In our analysis, it corresponds to the smallest and least active community. The community came into existence in the early 1980s. One of the earliest important articles is [95], which is also one of the most cited in the community. In this article, the fundamental requirements for achieving quantum key distribution are described. The community has followed a slow but steady growth in productivity.     [96].
The quantum key distribution sub-community is the second most active and is concerned with key generation and distribution between two parties over quantum communication channels.

Ahn, Gail-Joon 738

Country Citations
United States 24210
Italy 5731
United Kingdom 4174
Germany 2032
Canada 1015

The privacy amplification sub-community is concerned with encryption techniques using quantum mechanics, as for example in [97].
The absence of a "post-quantum cryptography" sub-community might stand out, but there is an explanation for it. Since homomorphic encryption is usually based on lattice-based methods and post-quantum encryption, the post-quantum sub-community is absorbed and split among the fully homomorphic encryption and McEliece cryptosystem sub-communities, which are found within the two cryptography communities.
The most influential affiliation country is China, followed by the United States, with Canada following closely in third place.
This community is closely related to the cryptography communities and less closely related to the sensor networks and steganography communities.
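As an illustration of quantum key distribution, the following Python sketch simulates classically the basis-sifting step of the BB84 protocol, which is used here as an assumed representative example. The simulation is idealized: there is no eavesdropper and no channel noise, and the function name `bb84_sift` is hypothetical. Python's `random` module is used only so that the run is reproducible; it is not a source of cryptographic randomness.

```python
import random

def bb84_sift(n_qubits: int, rng: random.Random):
    """Simulate BB84 sifting and return (alice_key, bob_key)."""
    alice_bits = [rng.randint(0, 1) for _ in range(n_qubits)]
    alice_bases = [rng.choice("+x") for _ in range(n_qubits)]  # encoding bases
    bob_bases = [rng.choice("+x") for _ in range(n_qubits)]    # measurement bases

    # Measuring in the same basis reproduces Alice's bit;
    # measuring in the other basis yields a uniformly random outcome.
    bob_bits = [
        bit if a_basis == b_basis else rng.randint(0, 1)
        for bit, a_basis, b_basis in zip(alice_bits, alice_bases, bob_bases)
    ]

    # The bases (not the bits) are compared publicly; mismatches are discarded.
    keep = [i for i in range(n_qubits) if alice_bases[i] == bob_bases[i]]
    alice_key = [alice_bits[i] for i in keep]
    bob_key = [bob_bits[i] for i in keep]
    return alice_key, bob_key

alice_key, bob_key = bb84_sift(64, random.Random(2020))
```

On average about half of the positions survive sifting. In the full protocol, the parties then sacrifice part of the key to estimate the error rate, since an eavesdropper measuring in the wrong basis would introduce detectable errors.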

Related Work
The Cyber Security Body of Knowledge (CyBOK) [101] is an ambitious attempt to identify the foundational knowledge areas of the cyber security sector and inform both academia and practitioners about them. CyBOK differs from the current work in both subject and method. CyBOK aims to organize the cyber security knowledge rather than to understand the research community. However, the presented topics have similarities with the communities discovered in the present work. As with CyBOK, the work of Baset and Denning [102] and the present work are compared below.
In addition to the two aforementioned works spanning the whole topic of cyber security, there exist a number of literature reviews focusing on specific subtopics, such as cross-site scripting [103], information security management [104], security awareness [105], security and privacy in health [106], cloud computing risk [107], information security policy compliance [108], cyber situational awareness [109], digital forensics [110], phishing [111], threat modeling [112], and security requirements engineering [113].
There is also work employing methods similar to those of the current work, but targeting other research areas. An example of such work is [114], where a similar automated approach for collecting and analyzing abstract and citation data was used.

Comparison to CyBOK
Since CyBOK [101] aims to identify the top knowledge areas (KAs) of cyber security, it constitutes a relevant object of comparison.
In Figure 28, a comparison matrix between CyBOK and our work is presented. While there is a large overlap between the CyBOK knowledge areas and the researcher communities identified in the current work, there are some notable differences.
The quantum cryptography community is not found under any CyBOK knowledge area or knowledge area sub-category. Furthermore, information hiding and biometrics constitute significant research communities, but feature less prominently in CyBOK. It is also noteworthy that cryptography is the overwhelmingly dominant research community, but does not appear to hold a similar position in CyBOK.
As compared to CyBOK, precious little research was identified in the fields of laws and regulation, as this topic did not even qualify for a sub-community.
Additional CyBOK KAs that are only sparsely represented by the detected communities include risk management and governance, and security software lifecycle.

Finally, the mid-sized malwares community covers aspects of several CyBOK knowledge areas, including malware & attack technologies, adversarial behaviors, forensics, operating systems & virtualization security, and software security, thus indicating another difference in emphasis between the research communities and CyBOK.

Comparison to Baset and Denning
As mentioned above, Baset and Denning [102] use Latent Dirichlet Allocation to identify the topics in security and privacy research. In their article, 95 research topics were identified and categorized in 20 topic categories.
In Figure 29, a comparison matrix between Baset and Denning [102] and our work is presented. The matrix demonstrates that all of Baset and Denning's topics are either fully or partially represented by at least one sub-community.
Considering the coverage of Baset and Denning's topics, we note that quantum cryptography was not present in any of their topic categories, which was also the case for CyBOK [101]. There is perfect alignment between the present work and Baset and Denning's work for the "crypto" and "malware" topic categories. Considering the researcher communities' coverage of Baset and Denning's topics, the "formalism" and "methods" topics are the least represented by the communities detected in our work. It appears that these topics are distributed over many different communities.
It is not surprising that there are differences between the present and Baset and Denning's work. Latent Dirichlet Allocation considers the textual content of articles, while community detection is focused on the authors of those articles.
One benefit of Latent Dirichlet Allocation is precisely the ability to abstract from the research process and organization, solely considering the produced results.
Ideally, this approach would produce something similar to CyBOK, which also focuses on the abstract subject areas.
The citation relationships between researchers, as presented in the present work, are complementary to the perspectives of CyBOK and Baset and Denning. They provide information on the influence of one field on another, on the evolution of ideas, as well as on occasional topically inexplicable researcher behavior, such as why similar sub-communities sometimes maintain a distance. They also provide information on the geographical and organizational influence on different fields.
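To make the graph-based perspective more tangible, the following Python sketch computes modularity, the quality measure that the Louvain algorithm greedily maximizes, for two candidate partitions of a toy author graph. The graph (two tightly linked groups of three authors joined by a single citation link), the author labels, and the treatment of the graph as undirected and unweighted are assumptions made for illustration; the sketch does not reproduce the analysis pipeline of this study.

```python
from collections import defaultdict

def modularity(edges, community_of):
    """Modularity Q = sum over communities c of [L_c / m - (d_c / (2m))^2].

    L_c is the number of edges inside c, d_c the total degree of its nodes,
    and m the number of edges (undirected, unweighted graph).
    """
    m = len(edges)
    internal = defaultdict(int)
    degree_sum = defaultdict(int)
    for u, v in edges:
        degree_sum[community_of[u]] += 1
        degree_sum[community_of[v]] += 1
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)

# Hypothetical author graph: two triangles joined by one link (c-d).
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]

two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {node: 0 for node in "abcdef"}

q_two = modularity(edges, two_groups)   # 5/14, roughly 0.357
q_one = modularity(edges, one_group)    # 0.0
```

The split into two groups scores higher than lumping all authors together, which is the kind of structure a modularity-maximizing method is designed to find.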

Discussion
There are a number of potential objections to the reliability and validity of the results presented in this article. That we only included the 59,782 most cited of the 320,907 articles might affect the results of the study. However, most of the omitted articles have only a single-digit number of citations and are therefore arguably unlikely to affect the community detection procedure.
Another threat to the validity of this study may be that older articles have received more citations than newer ones, simply because time has provided them more possibilities to be cited. This bias emphasizes older research over newer, and may thus also emphasize old research communities over newer ones. Time-normalizing citation counts is, however, not trivial, as citations are not necessarily a linear function of time: some articles continue to be cited long after publishing, while others do not. However, the results section provides plots of the annual article count per sub-community.

There is also the question of where the line is drawn for what constitutes a sub-community. We have defined it by a lower limit to the number of included authors. It would be possible to use other or additional criteria, such as the total number of citations.
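The caveat about time-normalizing citation counts raised in this discussion can be illustrated with a naive normalization. The Python sketch below divides each citation count by the article's age; the two articles and their numbers are invented for the example, and the function `citations_per_year` is hypothetical. The sketch deliberately assumes that citations accumulate linearly over time, which is exactly the assumption noted above as questionable.

```python
def citations_per_year(citations: int, published: int, as_of: int = 2020) -> float:
    """Naive normalization: assumes citations accumulate linearly with age."""
    age_years = max(as_of - published, 1)   # avoid division by zero for new articles
    return citations / age_years

# Invented data: (total citations, publication year).
articles = {
    "older article": (400, 1980),
    "newer article": (100, 2015),
}

by_raw_count = sorted(articles, key=lambda name: articles[name][0], reverse=True)
by_yearly_rate = sorted(
    articles, key=lambda name: citations_per_year(*articles[name]), reverse=True
)
```

The ranking flips between the two orderings, which shows how strongly the choice of normalization can affect conclusions. A per-year rate would still misjudge an article that collected most of its citations in a short burst.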
Finally, the selection of Scopus as the (single) source of data has surely affected the results in terms of completeness. However, in addition to its broad coverage, Scopus also provides the application programming interface (API) access that was required for this study.

Summary
By analyzing the most-cited scientific articles of 98,373 authors in the cyber security and information security domains, we were able to detect 12 research communities and sort them based on their current activity level: cryptography (I & II), sensor networks, information hiding, intrusion detection, malwares, biometrics, cyber-physical systems, authentication, usable security, access control, and quantum cryptography.
For each of these communities, we presented, among other things, an overview of their topics, a discussion on their evolution over time, the sub-communities involved, and the most-cited articles.
As compared to related work aiming to represent both academia and practitioners, the presented research communities appear to place a greater emphasis on cryptography, quantum cryptography, information hiding and biometrics, at the expense of laws and regulation, risk management and governance, and security software lifecycle.