Respite for SMEs: A Systematic Review of Socio-Technical Cybersecurity Metrics

Featured Application: The results of this work will be incorporated in an application for SMEs in Europe, which aims to improve cybersecurity awareness and resilience, as part of the EU Horizon 2020 GEIGER project.
Abstract: Cybersecurity threats are on the rise, and small- and medium-sized enterprises (SMEs) struggle to cope with these developments. To combat threats, SMEs must first be willing and able to assess their cybersecurity posture. Cybersecurity risk assessment, generally performed with the help of metrics, provides the basis for an adequate defense. Significant challenges remain, however, especially in the complex socio-technical setting of SMEs. Seemingly basic questions, such as how to aggregate metrics and ensure solution adaptability, are still open to debate. Aggregation and adaptability are vital topics to SMEs, as they require the assimilation of metrics into actionable advice adapted to their situation and needs. To address these issues, we systematically review socio-technical cybersecurity metric research in this paper. We analyse aggregation and adaptability considerations and investigate how current findings apply to the SME situation. To ensure that we provide valuable insights to researchers and practitioners, we integrate our results in a novel socio-technical cybersecurity framework geared towards the needs of SMEs. Our framework allowed us to determine a glaring need for intuitive, threat-based cybersecurity risk assessment approaches for the least digitally mature SMEs. In the future, we hope our framework will help to offer SMEs some deserved respite by guiding the design of suitable cybersecurity assessment solutions.


Introduction
In recent times, we have seen a surge in cyber threats that businesses are struggling to cope with [1]. Additionally, the frequency with which cybersecurity incidents occur, and the costs associated with them, are on the rise [2]. Among businesses, small- and medium-sized enterprises (SMEs) are most vulnerable, due to a shortage of cybersecurity knowledge and resources [3]. The vulnerable position of SMEs is being exploited, as witnessed by the large proportion of SMEs that experience cyber incidents [4].
In SME cybersecurity, the interplay between the social and the technical is essential [5], which is why SMEs are often studied from a socio-technical systems (STS) perspective [6]. The view of STS is that joint consideration of social and technical elements is necessary [7]. This view has interesting implications in cybersecurity, where humans are generally found to be the weakest link [8,9].
Due to their lack of resources [3] and the complex socio-technical setting they operate in, SMEs struggle to address their cybersecurity issues autonomously [10]. Before SMEs can begin to improve their cybersecurity posture, it is vital they first assess their current situation [11]. Assessment of cybersecurity posture is achieved by measuring SME cybersecurity properties, which result in cybersecurity metrics. Regardless of whether measurement results are deemed relevant by the SME, the knowledge gained by those involved in the measurement process is of value [12]. This observation touches once more on the socio-technical nature of the problem, where furthering human knowledge and improving the technical cybersecurity posture of an SME go hand-in-hand.
Cybersecurity assessment generally requires the aid of cybersecurity experts, personnel that SMEs typically do not have [9,10]. A solution to this issue is to automate the cybersecurity assessment process where possible [9]. Although automation is a promising approach, the diverse nature of the SME landscape is often ignored [13,14], whereas we know from earlier research that it is vital for SMEs to have solutions adapted to their context and needs [15,16].
Another issue is that cybersecurity assessment approaches aimed at SMEs are still scarce [6], explaining why it is not uncommon to see results from other cybersecurity focus areas being applied to the SME setting [10]. Systematic literature reviews are a logical approach to gather knowledge from one focus area, summarise it, and make it available for use in other focus areas.
Systematic reviews that address both the social and technical sides of cybersecurity already exist [17,18]. These reviews identified a need for adaptable solutions [18], which we have seen are also craved by SMEs. Additionally, these papers stress the need for more clarity on how to aggregate security metrics [17,18]. Given the lack of resources available at SMEs, aggregating information into understandable insights is a requirement for a usable solution [9].
The issue with these systematic reviews is that they offer adaptability and aggregation as areas for future research, rather than addressing the topics head-on. Additionally, they do not provide actionable insights for SMEs since this is not their target audience.
In short, we can conclude that SMEs need (semi-)automated cybersecurity assessment approaches that address their needs for adaptability and aggregation of information. A systematic review offers the potential to gather and summarise such information, providing guidelines for designing usable solutions for SMEs. This motivates the need for a systematic review of cybersecurity metric research, where both the social and technical sides of the puzzle are acknowledged. This is exactly our aim in this paper, as we try to answer the following research questions:
• RQ1: How are cybersecurity metrics aggregated in socio-technical cybersecurity measurement solutions?
• RQ2: How do aggregation strategies differ in cybersecurity measurement solutions relevant to SMEs and all other solutions?
  - RQ2.1: What are the reasons for these differences?
  - RQ2.2: Which aggregation strategies can be used in SME cybersecurity measurement solutions, but currently are not?
• RQ3: How do cybersecurity measurement solutions deal with the need for adaptability?
In Section 2, we cover related work from several different perspectives to provide a basis for our systematic review. Our systematic review methodology is detailed in Section 3, after which we present our results in Section 4.
To ensure that the insights we gain on aggregation and adaptability are captured in an actionable form, we incorporate them in a novel socio-technical cybersecurity framework geared towards SME needs. Our framework, introduced in Section 5, integrates our systematic review results with existing knowledge to arrive at concise guidelines for what can be expected of various SME categories. Section 6 focuses on outlining the answers to our research questions, as well as covering limitations and threats to validity. Finally, we conclude in Section 7, additionally outlining potentially fruitful areas for future research.

Related Work
Before covering work relating to our socio-technical cybersecurity metric setting, we should be clear on our definition of what constitutes a cybersecurity metric. We make use of the definition of a cyber-system as specified in Refsdal et al. [19]: "A cyber-system is a system that makes use of a cyberspace". Refsdal et al. [19] define cyberspace as "a collection of interconnected computerized networks, including services, computer systems, embedded processors, and controllers, as well as information in storage or transit". There is no standard definition of what constitutes a (cyber)security metric [17]. Borrowing ingredients from earlier definitions, we define a cybersecurity metric to be: any value resulting from the measurement of security-related properties of a cyber-system [17,19,20].

Socio-Technical Cybersecurity
Humans are often considered the weakest link in cybersecurity [21]. It is vital to recognise the interaction of the social and technical sides of cyber-systems when modelling and measuring cybersecurity, which is why the field of STS has played such an important role in cybersecurity metric research [22]. STS research has uncovered the dangers of considering social and technical elements separately [23] and has offered insight into how to avoid these dangers [7].
Recognition of the human factor in cybersecurity goes beyond simply including static human actors. This is where behavioural theories such as protection motivation theory (PMT) and self-determination theory (SDT) come in [24,25]. PMT reserves a prominent role for extrinsic motivators and threat appraisal [26]. SDT includes extrinsic motivation as a central concept but often focuses on moving from extrinsic to increasingly internalised motivation [24]. In the context of SMEs, intrinsic motivation to improve cybersecurity is often hard to find. However, there are solutions to this problem. Committing to improving cybersecurity in an organisation can motivate employees [24]. From the STS perspective, it is common to distinguish between metrics that include the real-life threat environment and those that do not [22]. Threat perception lies at the core of PMT and is important in security applications using SDT [25]. Another solution to promote motivation among SME employees would therefore be to incorporate the real-life threat environment in our cybersecurity metrics. Later in this paper, in Section 4, we describe whether this is indeed something we observe in current research.
We will address the social dimension using the ADKAR model of Hiatt [27]. This model, originating from change management, considers five phases in managing the personal side of change: awareness, desire, knowledge, ability, and reinforcement. ADKAR has previously been applied in assessing information security culture within organisations [28]. We apply ADKAR as a means to classify the socio-technical cybersecurity metrics we encounter. We define a socio-technical cybersecurity metric to be a cybersecurity metric that requires measuring the outcome(s) of the actions of at least one (simulated) human actor. We do not address the technical dimension explicitly in this definition, as the technical dimension is implicit in the term "cybersecurity". We hypothesise that all socio-technical cybersecurity metrics can be linked to one or more of the ADKAR categories.

Cybersecurity Metric Reviews
Systematic reviews are common in cybersecurity metric research. However, as Table 1 shows, they are often narrow in scope. Either the focus area is narrow, or the research does not consider social factors. The papers that do cover both social and technical factors often do so passingly, and without covering the intricacies and implications of socio-technical interactions.
Table 1. Existing cybersecurity metric (systematic) reviews. The research focus area is shown, with "generic" indicating research without a specific focus area. We consider social factors to be evaluated when the review covers socio-technical cybersecurity metrics.
We address the acknowledged challenges of aggregation and adaptability head-on in our systematic review, ensuring that our approach is both distinct from earlier work and provides a meaningful contribution to the field. Furthermore, we employ a novel systematic review approach (as outlined in Section 3) and target our analysis to aid SMEs, a group with specific needs often not considered in earlier work.

Aggregation
In cybersecurity metric research, aggregation strategies vary, although the importance of proper aggregation is widely recognised [17,18]. To discuss different aggregation strategies, we define a mathematical context with an aggregation strategy $S : \mathbb{R}^n_{\geq 0} \to \mathbb{R}_{\geq 0}$, where $\mathbb{R}_{\geq 0}$ is the set of non-negative real numbers. We define metric value variables $x_i$, corresponding to metrics $i = 1, \ldots, n$. The metric values are assumed to be non-negative: $x_i \in \mathbb{R}_{\geq 0}\ \forall i$. We assume, without loss of generality, that for each metric a higher metric value corresponds to lower security. A negative relationship between a metric and security is common in the security literature, as it is often the lack of security, or risk, which is being measured.
A desirable property of a strategy $S$ is that it is responsive to changes in metric values. This is captured by the property of injectivity, where we consider a strategy $S$ to be injective when for $a, b \in \mathbb{R}_{\geq 0}$, $a \neq b$, $S(a, x_1, x_2, \ldots, x_n) \neq S(b, x_1, x_2, \ldots, x_n)$. Injectivity implies that a change in a metric value will always result in a change of the aggregate, provided all else remains constant. A stronger requirement would be strict monotonicity of the strategy $S$. Although this property could be desirable in the cybersecurity context, we only consider the less strict injectivity in this paper.
A common property of averages, which constitute a specific branch of aggregation, is idempotence. A strategy $S$ is idempotent when for $a \in \mathbb{R}_{\geq 0}$, $S(a, a, \ldots, a) = a$. When an aggregation strategy $S$ is both injective and idempotent, the result of the aggregation always lies between the minimum and the maximum values of all metrics. Both injectivity and idempotence capture what we would intuitively expect of an aggregation strategy, as these are properties satisfied by the Pythagorean means. In this sense, these are desirable properties in the context of SMEs, where cybersecurity knowledge is often lacking. To still allow employees to feel competence and relatedness [25] in the complex cybersecurity setting, we should at least use an aggregation strategy they understand.
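As a minimal worked illustration, assuming the standard definitions of the arithmetic mean (AM) and geometric mean (GM), idempotence can be verified directly. An unnormalised sum is included for contrast; the labels $S_{\mathrm{AM}}$, $S_{\mathrm{GM}}$, and $S_{\mathrm{sum}}$ are introduced here for illustration only.

```latex
\begin{align*}
S_{\mathrm{AM}}(a, \ldots, a) &= \frac{1}{n} \sum_{i=1}^{n} a = \frac{n a}{n} = a, \\
S_{\mathrm{GM}}(a, \ldots, a) &= \Bigl( \prod_{i=1}^{n} a \Bigr)^{1/n} = \bigl(a^{n}\bigr)^{1/n} = a, \\
S_{\mathrm{sum}}(a, \ldots, a) &= \sum_{i=1}^{n} a = n a \neq a \quad \text{for } n > 1,\ a > 0.
\end{align*}
```

Both means return the common value $a$, whereas the plain sum scales with the number of metrics and is therefore not idempotent.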
Three additional properties are important in the security context. The possibility to prioritise certain metrics over others is desirable [47]. Formally, we consider a strategy to allow for prioritisation when for any $a, b > 0$, $a \neq b$, there exists a pair $i, j$ with $i \neq j$, such that $S(x_1, \ldots, x_i = a, \ldots, x_j = b, \ldots, x_n) \neq S(x_1, \ldots, x_i = b, \ldots, x_j = a, \ldots, x_n)$.
Strategies should also be able to accommodate dependencies between security metrics. However, it is complicated to include metric dependencies, with some seeing it as "the most challenging task" in aggregation [18]. For strategies in the set $D$ of strategies that satisfy the necessary differentiability properties, we define a strategy $S$ to allow for dependencies, when there exist distinct metrics $i$, $j$, and $k$ such that:
[Equation 1]
Equation 1 captures the idea that a strategy $S$ allows for dependencies among metrics when it allows for relationships among metrics that are not proportional to other relationships. For aggregation strategies $S \notin D$, we employ the same verbal definition. Care should be taken to adjust the criterion of Equation 1 appropriately where it cannot be applied directly for the strategy $S$.
A last core principle in security is that systems are only as secure as their weakest link [48]. Assuming that we have at least two distinct values among our metrics, there exists a minimum value $x_{\min}$ and a maximum value $x_{\max}$. Since we assume metrics relate negatively to security, $x_{\max}$ corresponds to the weakest link. A strategy $S$ satisfies the weakest link principle if for any $a > 0$, $S(x_{\min} + a, \ldots, x_{\max}) \leq S(x_{\min}, \ldots, x_{\max} + a)$, and there exists an $\alpha > 0$, such that $S(x_{\min} + \alpha, \ldots, x_{\max}) < S(x_{\min}, \ldots, x_{\max} + \alpha)$. Thus, weakening the weakest link has more impact than weakening the strongest link by an equal amount.
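The weakest link principle can be explored numerically. The following is a minimal Python sketch, not a proof: it only spot-checks the two conditions of the definition for one supplied metric vector and a finite set of increments. The function name `weakest_link_spot_check` and the example values are hypothetical and introduced purely for illustration.

```python
from statistics import mean


def weakest_link_spot_check(strategy, values, deltas):
    """Spot-check the weakest link principle for one metric vector.

    Higher values mean lower security, so max(values) is the weakest link.
    Returns True when, for every delta, raising the minimum never yields a
    larger aggregate than raising the maximum, and at least one delta gives
    a strictly smaller aggregate. This checks the supplied inputs only.
    """
    lo = values.index(min(values))
    hi = values.index(max(values))
    if values[lo] == values[hi]:
        raise ValueError("need at least two distinct metric values")

    strict_somewhere = False
    for delta in deltas:
        raised_min = list(values)
        raised_min[lo] += delta  # weaken the strongest link
        raised_max = list(values)
        raised_max[hi] += delta  # weaken the weakest link
        s_min, s_max = strategy(raised_min), strategy(raised_max)
        if s_min > s_max:
            return False
        if s_min < s_max:
            strict_somewhere = True
    return strict_somewhere


# Hypothetical metric values and increments, chosen only for illustration.
example_values = [1, 2, 4]
example_deltas = [1, 2]
maximum_ok = weakest_link_spot_check(max, example_values, example_deltas)
mean_ok = weakest_link_spot_check(mean, example_values, example_deltas)
```

On this example the plain maximum passes the check, while the arithmetic mean does not: the mean treats an increase of the strongest and of the weakest link identically, so the strict inequality is never met.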
The most common aggregation strategy employed in the literature is the weighted linear combination (WLC), which can be defined as:
[Equation: WLC definition]
WLC contains the special cases of the weighted sum ($a = 0$, $b = 1$), the weighted average ($a = 0$, $b = \sum_i w_i$), and the arithmetic mean ($a = 0$, $b = n$, $w_i = 1\ \forall i$). WLC strategies are injective, idempotent, and allow for prioritisation through weighting. However, these strategies do not allow for dependencies and do not satisfy the weakest link principle.
A related set of strategies are the weighted product (WP) strategies:
[Equation: WP definition]
Among the WP strategies are the simple product ($a = 0$, $b = 1$, $w_i = 1\ \forall i$) and the geometric mean ($a = 0$, $b = 1$, $w_i = 1/n\ \forall i$). WP strategies satisfy the same properties as WLC strategies, except for the idempotence property, which these strategies do not satisfy.
Using the weighted maximum (WM) metric value, $S_{WM}(x) = \max\{w_1 \cdot x_1, \ldots, w_n \cdot x_n\}$ with $w_i > 0\ \forall i$, as the aggregated value is uncommon in most disciplines, since this strategy is not injective. However, it is used in the security field [49], and is in fact an extreme case of satisfying the weakest link principle. WM allows for prioritisation, although the basic maximum function does not.
The complementary product is another aggregation strategy that is uncommon outside of the security field [49]. Let $\tilde{x}_i$, for $i = 1, 2, \ldots, n$, denote the metric value normalised to $[0, 1)$. Let $w_i$ be the weight of metric $i$ for $i = 1, 2, \ldots, n$. We define the weighted complementary product (WCP) class as:
[Equation: WCP definition]
The regular complementary product is achieved with $a = 1$ and $w_i = 1\ \forall i$. WCP strategies are injective and can satisfy the prioritisation and weakest link principles, depending on the values of $w_i$.
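The named special cases of these strategy classes can be written down directly. The Python sketch below is illustrative only: it covers the special cases mentioned in the text rather than the general parameterised classes, the function names are hypothetical, and the complementary product is assumed to take its usual form, one minus the product of the complements of the normalised metric values.

```python
import math


def weighted_sum(x, w):
    """Weighted sum: sum of w_i * x_i."""
    return sum(wi * xi for wi, xi in zip(w, x))


def weighted_average(x, w):
    """Weighted average: weighted sum divided by the sum of the weights."""
    return weighted_sum(x, w) / sum(w)


def arithmetic_mean(x):
    """Arithmetic mean: unit weights, divided by the number of metrics."""
    return weighted_sum(x, [1] * len(x)) / len(x)


def simple_product(x):
    """Simple product: product of all metric values."""
    return math.prod(x)


def geometric_mean(x):
    """Geometric mean: product of x_i raised to the power 1/n."""
    return math.prod(xi ** (1 / len(x)) for xi in x)


def weighted_maximum(x, w):
    """Weighted maximum: largest weighted metric value."""
    return max(wi * xi for wi, xi in zip(w, x))


def complementary_product(x_norm):
    """Complementary product of values normalised to [0, 1).

    Assumed form: 1 - prod(1 - x_i).
    """
    if any(not 0 <= xi < 1 for xi in x_norm):
        raise ValueError("values must be normalised to [0, 1)")
    return 1 - math.prod(1 - xi for xi in x_norm)
```

For example, `weighted_maximum([1, 5, 3], [2, 1, 1])` is driven entirely by the second metric, which shows why the maximum is not responsive to changes in the other values.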
None of the strategies considered so far accommodates dependencies. Bayesian networks (BN) are probabilistic graphical models, often of a causal nature, that are commonly applied in the security field [33]. In BN aggregation strategies, the metric values $x_i$ are assumed to originate from discrete, bounded random variables $X_i$, corresponding to the metrics $i = 1, \ldots, n$. The conditional dependencies between the random variables, and with a potential unobserved variable $Y$, are made explicit. This allows us to infer the probabilities of different values of $Y$, based on the metric values $x_i$. BN strategies are injective, but not idempotent. Although prioritisation is generally not a goal within these strategies, the prioritisation property will usually be satisfied. BN strategies accommodate dependencies by their nature, but will mostly not satisfy the weakest link principle.
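To make the inference step concrete, the following is a minimal Python sketch of a three-node network in which an unobserved variable Y influences two metrics, and the second metric additionally depends on the first. The network structure, state names, and all probabilities are hypothetical values chosen for illustration; they are not taken from any of the reviewed papers.

```python
# All probabilities below are hypothetical illustration values.
P_Y = {"secure": 0.7, "compromised": 0.3}  # prior over the unobserved Y

P_X1_GIVEN_Y = {  # P(X1 = x1 | Y = y)
    "secure": {"low": 0.8, "high": 0.2},
    "compromised": {"low": 0.3, "high": 0.7},
}

P_X2_GIVEN_X1_Y = {  # P(X2 = x2 | X1 = x1, Y = y): X2 depends on X1 as well
    ("low", "secure"): {"low": 0.9, "high": 0.1},
    ("high", "secure"): {"low": 0.6, "high": 0.4},
    ("low", "compromised"): {"low": 0.5, "high": 0.5},
    ("high", "compromised"): {"low": 0.1, "high": 0.9},
}


def posterior_y(x1, x2):
    """Return P(Y | X1 = x1, X2 = x2) using the chain rule and Bayes' rule."""
    joint = {
        y: P_Y[y] * P_X1_GIVEN_Y[y][x1] * P_X2_GIVEN_X1_Y[(x1, y)][x2]
        for y in P_Y
    }
    total = sum(joint.values())
    return {y: p / total for y, p in joint.items()}


posterior = posterior_y("high", "high")
```

With these illustrative numbers, observing two high metric values raises the probability of the "compromised" state from the prior of 0.3 to roughly 0.77. The aggregate is a probability distribution over Y rather than a value on the same scale as the metrics.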
The strategy classes presented in Table 2 are not exhaustive but do cover the large majority of all aggregation strategies employed, as we show in Section 4. Two examples of other possibilities are the use of analytic network process (ANP) techniques [50,51], which relate to the deterministic equivalent of Bayesian networks, and the analysis of game-theoretic equilibria [52]. What is common to all strategies is that none satisfy all criteria of Table 2.

Adaptability
Adaptability is crucial to any cybersecurity solution [53]. Especially when measuring cybersecurity, a rigid solution that does not adapt to a changing environment or a new use case is far from optimal [54]. It is not surprising to see, then, that adaptability is a key focus of many studies [13,55], although operationalisation of adaptability is still a challenge [53].
We consider adaptability to be "the state of being able to change to work or fit better" [18]. This definition outlines two important dimensions of adaptability. First, a solution is considered adaptable if it can change to work better. There are several reasons why a cybersecurity metric solution may not be functioning as it should. This can relate to problems with the metrics themselves, such as missing or dirty data [56]. It can also relate to a changing security landscape that invalidates an existing model. This phenomenon is known as concept drift [57]. Second, a solution is considered adaptable if it can change to fit better. Generally, cybersecurity solutions in research are made to fit their use case. We can determine their adaptability in the "fitting" dimension by determining how easily the solution can be deployed at other (similar) use cases.
Adaptability is significant in the SME context. The SME landscape is diverse [14], and SMEs often lack the knowledge and expertise to perform extensive adaptations independently [9]. In Section 6, we assimilate observations from earlier research and our results of Section 4 to provide suggestions for improving solution adaptability.

Systematic Review Methodology
We performed a systematic literature review to address our research questions. To ensure broad coverage of the cybersecurity metrics field, we employed a novel systematic review methodology blending active learning and snowballing (SYMBALS, [58]), which combines existing methods into a swift and accessible methodology, while following authoritative systematic review guidelines [59][60][61].
Active learning is one of the cornerstones of the SYMBALS approach. Active learning is commonly applied in the title and abstract screening phase of systematic reviews, where researchers start with a large set of papers and prefer not to screen them all manually [62]. Active learning is uniquely suited to this task, as this machine learning method selects the ideal data points for an algorithm to learn from.
SYMBALS complements active learning with backward snowballing. From a set of included papers, a researcher can find additional relevant papers by consulting references (backward snowballing) and citations (forward snowballing) [63]. Snowballing has proven to be a valuable addition to systematic reviews, even when reviews already include an extensive database search [64]. Backward snowballing is especially useful in uncovering older relevant research. Forward snowballing is not employed within SYMBALS, based on the observation that databases generally have excellent coverage of recent peer-reviewed research.
After the development and evaluation of a systematic review protocol for this research, we commenced with the database search step of SYMBALS. We retrieved research from abstract databases (Scopus, Web of Science) and full-text databases (ACM Digital Library, IEEE Xplore, PubMed Central).
The Scopus API was used to retrieve an initial set of relevant research. Results from other sources were then successively added to this set. The order in which sources were consulted can be surmised from Table 3. The Python Scopus API wrapper "pybliometrics" [65] was used to retrieve all research available through the Scopus API that satisfied the query:
    AUTHKEY((security* OR cyber*) AND (assess* OR evaluat* OR measur* OR metric* OR model* OR risk* OR scor*)) AND LANGUAGE(english) AND DOCTYPE(ar OR bk OR ch OR cp OR cr OR re)
The "AUTHKEY" field corresponds to the keywords that authors provided for a paper. Our search query is intentionally broad, as the SYMBALS methodology allows us to deal with larger quantities of research, and we aim to exclude as little relevant research as possible at this stage. We did choose to only include English language research and document types where extensive and verifiable motivations for findings can be reported. Table 3 summarises the query results.
ACM Digital Library and IEEE Xplore limit the number of accessible papers to 2000. This means only the 2000 most relevant papers from these sources could be considered. Moreover, IEEE Xplore only allows the use of six wildcards in the search query. We removed the "security" and "cyber" wildcards for the IEEE Xplore search to comply with this limitation. Any research without an abstract was excluded, as this is vital to the active learning phase of SYMBALS. This led to a small set of exclusions from the PubMed Central database. Duplicate removal was performed based on the research title, although we found that this process was not perfect, due to different character sets being accepted in different databases.
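One possible way to reduce the impact of differing character sets on title-based duplicate removal is to compare normalised titles rather than raw strings. The Python sketch below is an illustration under that assumption and is not the procedure used in this review; the function names and the record structure (a dictionary with a "title" key) are hypothetical.

```python
import re
import unicodedata


def normalise_title(title):
    """Reduce a title to a comparison key.

    The key ignores case, accents, ligatures, punctuation, and spacing
    differences that may arise when databases accept different characters.
    """
    # NFKD splits accented letters into base letter plus combining mark
    # and expands compatibility characters such as ligatures.
    decomposed = unicodedata.normalize("NFKD", title)
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    lowered = without_marks.casefold()
    # Collapse every run of non-alphanumeric characters into one space.
    return re.sub(r"[^a-z0-9]+", " ", lowered).strip()


def deduplicate(records):
    """Keep the first record seen for each normalised title."""
    seen = set()
    unique = []
    for record in records:
        key = normalise_title(record["title"])
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique
```

A key of this kind treats titles that differ only in hyphen variants, accents, or capitalisation as the same paper. Titles written in non-Latin scripts would collapse to an empty key under this particular rule, so a real pipeline would need to handle that case separately.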
Altogether, our dataset resulting from database search comprised 25,773 papers. This exemplifies the broad scope of our research, as the largest initial set of papers from the reviews in Table 1 comprised 4818 papers [43]. The set of 25,773 papers is too large to perform data extraction directly. This is where the active learning phase of SYMBALS comes in. We chose to use ASReview in this phase, a tool that offers active learning capabilities for systematic reviews, specifically for the title and abstract screening step [62]. Many other active learning tools exist that are worth considering [66]. However, we found ASReview effective and easy to use, and additionally value the commitment its developers have made to open science. This shows, among other things, in the codebase that they made available open-source.
In the ASReview process, as well as in the later review phases, we made use of the following inclusion and exclusion criteria:
• Inclusion criteria:
  - I1: The research concerns cybersecurity metrics and discusses how these metrics can be used to assess the security of a (hypothetical) cyber-system.
  - I2: The research is a review of relevant papers.
• Exclusion criteria:
  - E1: The research does not concern cyber-systems.
Exclusion criterion E8 relates to the quality assessment phase of SYMBALS, which is explained below. Criterion E9 requires the consideration of the full text to be determined, as abstracts do not contain enough information to make a decision regarding this intricate topic [67]. Thus, neither of these criteria was applied during title and abstract screening.
ASReview requires users to specify prior relevant and irrelevant papers to train its algorithm. The following papers were used as initial indications of relevance to ASReview: These papers were chosen since they cover diverse topics, were written by different authors at different times and were published in different journals and conferences. ASReview additionally provides the option to label a certain number of random papers before proceeding, assuming that a significant proportion of these papers will be irrelevant. This provides the algorithm with a balance of relevant and irrelevant papers for training. We labelled five random papers, giving us a total training set of 10 papers.
The ASReview tool then presents the paper whose classification it deems most informative to learn from. The tool quickly learns to distinguish between relevant and irrelevant papers. By presenting the researcher mostly relevant papers, the process of discovering relevant papers is accelerated.
Although ASReview offers several classifier options, we employed the default Naïve Bayes classifier using term frequency-inverse document frequency (TF-IDF) feature extraction and certainty-based sampling. The default settings have been shown to produce consistently good results and are additionally commonly available in other active learning tools [62]. Thus, our decision to use the default settings can be motivated both from a performance and a reproducibility standpoint.
At some point in the active learning process, mostly irrelevant research remains. To reduce the time spent on assessing irrelevant research, a stopping criterion is used [62]. We stopped evaluating research when the last 20 reviewed papers were considered irrelevant, although more sophisticated stopping criteria exist that are worth considering [72]. All research that was not evaluated at this stage was excluded based on exclusion criterion E5. As Figure 1 shows, 1644 papers remained after the active learning phase.
We then proceeded with the backward snowballing phase of SYMBALS. We followed the ASReview evaluation order in our backward snowballing procedure. We concluded backward snowballing once 10 consecutive papers contained no new references satisfying the inclusion criteria. As can be seen in Figure 1, 1796 papers were contained in our inclusion set after the completion of this phase.
SYMBALS specifies quality assessment as an optional step, but given the large number of papers remaining, assessing quality was deemed necessary. Table 4 outlines the quality criteria that were applied. Commonly used research quality criteria were adapted for use with a Likert scale [73]. Statements could be responded to with strongly disagree, disagree, neutral, agree, or strongly agree. Instead of applying these criteria to all 1796 inclusions, the two researchers involved in quality assessment each evaluated 40 papers, with 20 papers being evaluated by both researchers, giving 60 distinct evaluated papers.
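The stopping rule used in the active learning phase can be sketched in a few lines of Python. This is a simplified illustration: it assumes a fixed screening order, whereas an active learning tool re-ranks the remaining papers after each label. The function name `screen_until_stopped` and its parameters are hypothetical.

```python
def screen_until_stopped(ranked_papers, is_relevant, patience=20):
    """Screen papers in the given order until the stopping rule fires.

    Screening stops once `patience` consecutive papers have been judged
    irrelevant. Returns the included papers and the number screened.
    """
    included = []
    consecutive_irrelevant = 0
    screened = 0
    for paper in ranked_papers:
        screened += 1
        if is_relevant(paper):
            included.append(paper)
            consecutive_irrelevant = 0  # any relevant paper resets the run
        else:
            consecutive_irrelevant += 1
            if consecutive_irrelevant >= patience:
                break
    return included, screened
```

The same pattern, with a patience of 10 and "contains a new reference satisfying the inclusion criteria" as the relevance test, mirrors the rule used to conclude backward snowballing. A lower patience saves screening effort at the risk of missing relevant papers that appear late in the ordering.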
A simple, yet effective, solution to extrapolate these results is to train a binary decision tree on basic research characteristics, to create a model that can distinguish research of sufficient quality from research of insufficient quality. The five-point Likert scale responses were assigned scores of 0 (strongly disagree), 0.25 (disagree), 0.5 (neutral), 0.75 (agree), and 1 (strongly agree). Summing the quality criteria scores, each paper received a score between 0 and 9. To make the problem a binary decision problem, we labelled papers with a score of at least 6 as having sufficient quality. The height of this threshold determines how strict the eventual model will be.
Table 4 (excerpt). The study is of value to research and/or practice: 0, 9, 12, 28, 11.
Next, we split our set of 60 evaluated papers into a training set of 48 papers (80%) and a test set of 12 papers (20%). To be able to train a model on this set, we need explanatory variables that explain the quality scores obtained by the papers. We opted to use three features: years since publication, citation count, and the number of pages. The maximum depth of the binary decision tree was set to 3, meaning at most three binary splits are performed before classifying a paper as having sufficient or insufficient quality. The model was trained on the 48 training papers and evaluated on the 12 test papers. Despite, or perhaps because of, the model's simplicity, 11 of the 12 test papers were labelled correctly. The only incorrect labelling occurred in an edge case with a quality score of 6. Similar results were obtained in replications with different random seeds. Figure 1 shows that 516 papers remained after applying the binary decision tree to our complete inclusion set.
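The scoring and labelling step that produces the training labels can be illustrated with a short Python sketch. The number of criteria (nine) is inferred here from the stated score range of 0 to 9; the function and constant names are hypothetical.

```python
# Score assigned to each Likert response, as described in the text.
LIKERT_SCORES = {
    "strongly disagree": 0.0,
    "disagree": 0.25,
    "neutral": 0.5,
    "agree": 0.75,
    "strongly agree": 1.0,
}

N_CRITERIA = 9         # assumption: inferred from the 0 to 9 score range
QUALITY_THRESHOLD = 6  # papers scoring at least 6 are of sufficient quality


def quality_score(responses):
    """Sum the Likert scores of one paper's responses to the criteria."""
    if len(responses) != N_CRITERIA:
        raise ValueError(f"expected {N_CRITERIA} responses")
    return sum(LIKERT_SCORES[response] for response in responses)


def has_sufficient_quality(responses):
    """Binary label used as the target when training the decision tree."""
    return quality_score(responses) >= QUALITY_THRESHOLD
```

A paper rated "agree" on six criteria and "neutral" on three scores exactly 6 and is therefore labelled as sufficient, which is the kind of edge case where a simple model is most likely to disagree with the manual label.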
Finally, we applied exclusion criterion E9 using a manual screening process, to filter out the papers that do not consider the social side of cybersecurity, as defined in Section 2.1. Figure 1 shows that in total, 60 papers were included after our filtering step.

Results
In this section, we focus on descriptive analysis of aggregate results. In Sections 5 and 6, we will dive deeper, to interpret and contextualise the results. Table A1 in Appendix A lists all data items that were extracted from the included papers to help us address our research questions. Appendix B provides detailed results per inclusion. Figure 2 depicts the relative prevalence of each of the five ADKAR factors over the years. Since 2010, awareness and reinforcement together constituted over half of the ADKAR considerations. Desire is the element that receives the least attention in research. Table 5 lists the related concepts that we encountered and mapped to each of the ADKAR terms.
Part of the reason for the prevalence of reinforcement research is that cybersecurity training and education belong to this ADKAR element. Researchers feel that organisational reinforcement is an important aspect of the social side of cybersecurity. At the same time, reinforcement can be easier to measure than other factors, which may offer a partial explanation for its prevalence. For example, many researchers choose to include a metric of cybersecurity awareness training (reinforcement), rather than of cybersecurity awareness itself (awareness).
Various security concepts were assessed in our inclusions, as shown in Table 6. Some researchers choose to measure security itself [74,75], but this approach is too general for most. Risk was assessed in two-thirds of all papers. This is interesting, as risk can be seen as having a negative connotation, whereas awareness, maturity, and resilience have positive connotations. This finding conflicts with the general tendency in the security community to favour SDT approaches over the fear- and threat-based approaches more associated with PMT [25], especially in the context of organisations [76].
When analysing the ADKAR factors by assessment concept, the papers assessing security maturity stood out. These papers place a large focus on the organisational reinforcement of security and ignore all other ADKAR factors. This is not a surprising finding. Maturity is generally a concept that requires an assessment of the organisation, rather than the individuals who make up this organisation. Table 6 shows that most papers stuck to WLC, WP, and WM as aggregation strategies. It is worth pointing out that not aggregating is a reasonable choice. If it is not necessary for a particular context, it should be avoided, based on our conclusion from Table 2 that no aggregation method satisfies all ideal security properties.
Table 6. The various security assessment concepts discussed in research, with an indication of the ADKAR elements covered and the aggregation strategies employed. Each paper should consider at least one ADKAR element. A paper may not aggregate at all, but could also employ several aggregation strategies. Reviews were not labelled with a specific assessment concept.
Assessment concept | Total | AW | DE | KN | AB | RE | WLC | WP | WM | WCP | BN | None
Risk               | 40    | 24 | 9  | 14 | 19 | 28 | 27  | 10 | 7  | 1   | 4  | 4
Awareness          | 5     | 5  | 3  | 4  | 3  | 2
Table 7 focuses on the actors that were considered from the social viewpoint. Almost all papers focused solely on the defender. It is interesting to see that the desire and ability factors of ADKAR are much more prominent in research including the attacker. We would expect to see more focus from research on desire, and the related concept of motivation, based on the important role that motivation and internalisation play in SDT and PMT [24]. Desire and motivation are not easily measurable concepts, but metrics such as "attendance at security sessions" can serve as useful proxies here [77].

ADKAR Elements Aggregation Strategy Classes Assessment Concept
Nearly all research that considers the attacker perspective considers the real-life threat environment, as specified in Gollmann et al. [22]. In papers covering the defender, it is quite common to ignore threats entirely [78] or to use a proxy such as the prevalence of vulnerabilities to represent threats [79]. This is remarkable given the vital role that threat perception plays in both SDT and PMT [25].
Table 7.
Actor     Total  AW  DE  KN  AB  RE  Real-life threat
Defender  52     33  7   17  17  37  18
Attacker  5      0   4   1   5   0   5
Both      3      2   3   1   3   3   2
Table 8 groups research based on the employed aggregation strategy. Inclusions were classified into one of three classes: theoretical, implementation, or review. The research was classified as an implementation if either clear and described actions were taken based on the implemented method, or the model was assessed at more than one point in time. This strict requirement explains why most papers were classed as theoretical.
One immediately notices from Table 8 that two of the four implementation papers did not employ an aggregation strategy. As discussed in Section 2.3 and shown in Table 2, aggregation should only be carried out if deemed necessary. In half of the implementation research of our inclusions, researchers felt the benefits of aggregation did not outweigh the drawbacks. We additionally see that most research sticks to WLC and WP strategies, which do not satisfy the weakest link principle and cannot take into account dependencies. Researchers prefer simple and explainable strategies, which are injective or idempotent, over strategies that satisfy more security properties. Out of our 60 inclusions, 10 used fuzzy logic approaches. Although translating qualitative statements to fuzzy numbers differentiates these methods from approaches using crisp numbers, most still used some combination of WLC, WP, and WM to aggregate (for example, [80–82]).
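To illustrate why a weighted linear combination can mask a single weak spot, the following minimal Python sketch contrasts a WLC with a simple minimum over metric scores. It assumes scores in [0, 1] where higher means more secure, and weights that sum to one. The function names, the example values, and the use of the minimum as a stand-in for the weakest link principle are illustrative assumptions; the formal definitions of Section 2.3 may differ.

```python
from math import isclose


def wlc(scores, weights):
    """Weighted linear combination: sum of weight * score (assumed form)."""
    if not isclose(sum(weights), 1.0):
        raise ValueError("weights are assumed to sum to 1")
    return sum(w * s for w, s in zip(weights, scores))


def weakest_link(scores):
    """Illustrative weakest-link aggregate: the lowest score dominates."""
    return min(scores)


# Hypothetical metric scores: two strong controls and one very weak one.
scores = [0.9, 0.9, 0.1]
weights = [1 / 3, 1 / 3, 1 / 3]

wlc_score = wlc(scores, weights)   # the weak metric is averaged away
weakest = weakest_link(scores)     # the weak metric determines the result
```

In this sketch the WLC result lies well above the weakest metric, which is the behaviour the weakest link principle is meant to rule out.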
Exceptions are Lo and Chen [50] and Brožová et al. [51], who used an ANP approach to capture dependencies. Lo and Chen [50], Brožová et al. [51] and the four papers using a Bayesian network approach [83–86] are the only papers that considered dependencies between metrics. Interestingly, all of these papers were published in 2016 or earlier. It is not immediately clear what the underlying reason is for the current drought in research considering dependencies, but it is certainly a research area that deserves more attention.
Table 9 provides detailed results regarding the research application area. Although more enterprise sizes were considered, we only encountered research applicable to medium- and large-sized enterprises, and research applicable to any enterprise size. As with research focused on maturity modelling, we see a strong focus on the reinforcement factor of ADKAR in enterprise research, especially for larger enterprises. In research intended to apply to any enterprise, Table 9 shows that WLC was by far the most popular aggregation strategy class. The only other strategy class that was used is WM. We believe it is not a coincidence that these are the only aggregation strategy classes that are both injective and idempotent. Strategies with these properties are likely to be more intuitive and easy to understand, as explained in Section 2.3. Therefore, it is not surprising that these strategies are proposed in research addressing all enterprise sizes, since especially smaller businesses need to be motivated through approachable solutions.
Regarding adaptability, of the 56 inclusions that were not review papers, 44 did not make any consideration for missing or dirty data. Of the papers that did consider one or both of these issues, the most common strategy was to ignore the associated problems. Out of these 56 papers, 46 were not able to adapt to a security event occurring, mostly since they did not operate in a live setting, but were formulated as periodic assessments. Even then, most authors did not cover this topic, and it is certainly not always clear how the security assessment would be adapted after an incident.
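As an illustration of what an explicit consideration of missing data could look like, the sketch below aggregates only the metrics that have a value and rescales the remaining weights. This is one possible strategy chosen for illustration, not a method taken from the reviewed papers; the function name, the use of `None` to mark a missing value, and the example numbers are all assumptions.

```python
def wlc_with_missing(scores, weights):
    """WLC over the metrics that have a value; missing values are None.

    Weights of the available metrics are rescaled to sum to 1 (an assumed
    strategy). Returns None when no metric is available.
    """
    available = [(s, w) for s, w in zip(scores, weights) if s is not None]
    total_weight = sum(w for _, w in available)
    if total_weight == 0:
        return None
    return sum(s * w for s, w in available) / total_weight


# Hypothetical assessment in which the second metric could not be collected.
result = wlc_with_missing([0.8, None, 0.4], [0.5, 0.3, 0.2])
```

Rescaling the weights implicitly treats a missing metric as if it would have scored like the available ones, so a solution using this approach should still report which metrics were missing.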
Concept drift and adaptation to other use cases were also often not considered. Just four of our inclusions explicitly considered concept drift and no paper mentioned a concrete timeline for when a solution should be updated. Adaptation to other use cases was discussed in 24 of our inclusions. However, the majority of these papers only gave a rough outline of how the solution could be adapted. A better practice would be to give concrete guidelines on how to adapt the solution or to immediately analyse several use cases. The former approach was not seen in research, whereas the latter was (for example, [87–90]).

Socio-Technical Cybersecurity Framework for SMEs
To offer more insight into how we can create effective cybersecurity assessment solutions for SMEs, we position our results and findings in the STS analysis framework of Davis et al. [7]. Figure 3 shows the view of STS as consisting of six internal social and technical aspects, within an external environment. We renamed the "Buildings/Infrastructure" aspect of Davis et al. [7] to "Assets". This ensures that our view is better aligned with standard terminology in cybersecurity literature. Based on the importance of policies in socio-technical cybersecurity frameworks [5], we explicitly included policies in the "Processes/Procedures" aspect of Davis et al. [7] and renamed this aspect to "Processes".
The socio-technical system we study is the SME, in the context of cybersecurity. However, the complete set of SMEs is too diverse to consider this group as a single collective. This is why the European DIGITAL SME Alliance proposes to use four SME categories, based on the different roles SMEs can play in the digital ecosystem: start-ups, digitally-dependent SMEs, digitally-based SMEs, and digital enablers [14]. The European DIGITAL SME Alliance specifies these categories in the context of cybersecurity standardisation, which is intricately related to our cybersecurity assessment setting, making it a suitable classification.
The European DIGITAL SME Alliance defines start-ups as SMEs where "security has a low priority". They "typically neglect (or are not aware of) requirements" for running a secure business. Digitally-dependent SMEs are companies that depend on digital solutions (as end users) to run their business. Digitally-based SMEs "highly depend on digital solutions for their business model", and, finally, digital enablers are SMEs that develop and provide digital solutions [14]. Table 10 introduces our framework, which synthesises the SME categories of the European DIGITAL SME Alliance [14] with the STS aspects of Davis et al. [7]. Each SME category has different cybersecurity goals based on their different roles in the digital ecosystem. In Table 10, the SME categories are ordered from least to most mature regarding cybersecurity. We expect the more mature SME categories to have achieved the goals of less mature SME categories.

SME Category  Goals  People  Culture  Processes  Technology  Assets
Define training plans and start creating cybersecurity awareness [92].
Initial cybersecurity policies and procedures show management commitment, ensuring employee support [93,94].
No standardised processes yet [5]. SME gains awareness on cybersecurity policies, processes, procedures, standards and regulation.
Employ a threat-based risk assessment tool requiring no knowledge of SME assets, using no/intuitive aggregation. External support needed to understand and implement countermeasures.
Understand relevant and critical cybersecurity asset types [92].
Management support and cybersecurity trainings stimulate employees [94] and change their perception [93].
Employ a threat-based risk assessment tool using no/intuitive aggregation. External support needed to implement countermeasures.
Systematically identify and document relevant assets and their baseline configurations [92].
Use a risk assessment framework or maturity model with adequately motivated aggregation. Implement basic countermeasures [92], external support needed for complex countermeasures.
Manage asset changes and periodically maintain assets [92].
Employees mutually reinforce their cybersecurity abilities, possibly captured in official cybersecurity roles [94].
Successive comparisons of assessment results facilitate continuous process improvement [5,15]. Business continuity plan defined and communicated to external stakeholders [92].
Use a risk assessment framework or maturity model with advanced aggregation. Independently implement countermeasures [92] and actively detect anomalies [92], with the help of automated tools [5].
Identify and document internal and external dependencies of assets, to help in determining the SME attack surface. Actively monitor assets [92].
Our framework was constructed based on earlier cybersecurity frameworks focusing on SMEs [10,15,92] or STS [5,91,93–95]. Interestingly, none of these frameworks focused on both SMEs and STS. To address the singular characteristics of our setting, we additionally incorporated the findings from our systematic review, as well as principles for designing cybersecurity maturity models for SMEs [96], in our framework. Our findings appear most prominently in the "Technology" aspect, explaining why this column of Table 10 contains relatively few references to earlier work.
Our results relating to the various ADKAR dimensions serve as input for the "People" and "Culture" aspects. Start-ups and digitally-dependent SMEs should focus on making their employees aware and providing initial cybersecurity knowledge to inspire desire and motivation. This can be achieved through a culture of organisational commitment to cybersecurity [93,94]. Digitally-based SMEs and digital enablers should progress through the ADKAR phases, with the aid of cybersecurity training, policy, and assessment. Eventually, employees should mutually reinforce each other's cybersecurity abilities [94]. The ideal cybersecurity culture will lead to trust from both the people inside the SME and the environment outside of the SME [92,94].
Start-ups and digitally-dependent SMEs are often not aware of the existence of cybersecurity standards [14]. These SMEs should first become aware and then begin to formulate basic cybersecurity policies, processes, and procedures [5,94]. Digitally-based SMEs should have formal processes in place to reinforce the desired cybersecurity behaviour of employees [5]. Digital enabler SMEs should strive towards continuous process improvement [5,15], which enables business continuity [92].
We mapped the "Technology" aspect of STS to the advised cybersecurity assessment approach and tooling for the SME. This is in line with the approach of Malatji et al. [5], who incorporated "cybersecurity tools and resources" in the "Technology" aspect of their socio-technical cybersecurity framework.
Start-ups should understand relevant cybersecurity asset types and digitally-dependent SMEs should begin identifying and documenting assets [92]. Without an asset inventory or internal cybersecurity expertise, most risk assessment and maturity model approaches are not suited to these SMEs. Additionally, they are just beginning to cultivate a desire among employees to improve cybersecurity. Incorporating the real-life threat environment [22] is an attractive option to promote motivation. Focusing on the real-life threat environment can increase the feelings of task relevance and significance employees feel, which are key motivators [97]. This is why we advise a threat-based cybersecurity risk assessment approach for start-ups and digitally-dependent SMEs.
In the same vein, we advise against aggregating scores in cybersecurity assessment solutions for start-ups and digitally-dependent SMEs. If aggregation is deemed necessary, injective and idempotent aggregation strategies should be used, such as WLC and WM. Strategies that satisfy injectivity and idempotence can be seen as intuitive. Using these strategies allows for feelings of competence and relatedness among employees, which stimulate motivation [25]. This puts employees in a position to be a part of the solution to SME cybersecurity challenges, rather than being the source of the challenges [98].
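Idempotence can be checked mechanically: aggregating identical scores should return that same score. The sketch below does this for assumed forms of WLC and WM. The exact definitions are given in Section 2.3 and may differ; in particular, the WM form used here (the largest weighted score, with at least one weight equal to 1) and all names and values are illustrative assumptions.

```python
from math import isclose


def wlc(scores, weights):
    """Weighted linear combination; weights are assumed to sum to 1."""
    return sum(w * s for w, s in zip(weights, scores))


def wm(scores, weights):
    """Weighted maximum (assumed form): largest weight * score.

    Weights are assumed to lie in [0, 1] with at least one weight equal to 1.
    """
    return max(w * s for w, s in zip(weights, scores))


def is_idempotent(aggregate, weights, value=0.6):
    """Check that aggregating identical scores returns that same score."""
    scores = [value] * len(weights)
    return isclose(aggregate(scores, weights), value)


wlc_ok = is_idempotent(wlc, [0.5, 0.3, 0.2])
wm_ok = is_idempotent(wm, [1.0, 0.7, 0.4])
```

Under these assumptions both checks succeed, which matches the intuition that an SME scoring the same on every metric should receive that score overall.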
The combination of simple aggregation and a threat-based approach offers another benefit: the corresponding assessments do not necessarily require extensive internal expertise and data. Many of the more complex aggregation strategies and comprehensive assessment approaches require cybersecurity experts at the SME to determine parameters and weights. Such resources are limited at SMEs [3], and especially at start-ups and digitally-dependent SMEs. This is why assessment approaches for these SMEs should preferably be largely based on data that can be automatically collected. Threat-based approaches are ideally suited to this requirement, as general incident data are widely available [99], and can be mapped to threats to offer SMEs insight into what is important for them [100].
Digitally-based SMEs and digital enablers can be expected to have a complete inventory of assets [92]. Digital enablers should additionally be aware of internal and external dependencies [92], allowing them to specify their attack surface [101]. For these SME categories, complete risk and maturity assessments are desirable. Digital enablers will often require comprehensive assessments that can prove compliance with cybersecurity standards and regulations.
Digitally-based SMEs should consider using aggregation strategies that reflect desirable security properties, such as the weakest link principle. Using a WCP strategy can guide these SMEs towards more accurate assessments, although intuitiveness is sacrificed.
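The trade-off can be made concrete with a small sketch. It assumes a weighted complementary product of the form 1 - prod(1 - w_i * r_i) applied to risk scores in [0, 1], where higher means more risk; this form, the function names, and the example values are assumptions for illustration rather than the definitions of Section 2.3.

```python
from math import prod


def wcp(risks, weights):
    """Weighted complementary product (assumed form): 1 - prod(1 - w * r)."""
    return 1.0 - prod(1.0 - w * r for w, r in zip(weights, risks))


def wlc(risks, weights):
    """Weighted linear combination with weights rescaled to sum to 1."""
    total = sum(weights)
    return sum(w * r for w, r in zip(weights, risks)) / total


# Hypothetical risk scores in [0, 1]: one severe risk among minor ones.
risks = [0.9, 0.1, 0.1]
weights = [1.0, 1.0, 1.0]

wcp_risk = wcp(risks, weights)   # stays at or above the largest weighted risk
wlc_risk = wlc(risks, weights)   # the severe risk is diluted by the others
```

With this form the aggregate can never fall below the largest weighted risk, so a single severe risk cannot be hidden. The price is that the result no longer reads as a simple average, which is the loss of intuitiveness noted above.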
Digital enablers with cybersecurity expertise, a specified attack surface, and large volumes of internal data should consider more advanced aggregation strategies.
Figure 4 provides a visual summary of the STS interactions inherent to our framework. We use coloured arrows to indicate interactions that are explicitly mentioned in Table 10. It is implicit in the STS model of Davis et al. [7] that all aspects are interrelated.
The direction of the arrows indicates which aspect serves as an input for another aspect. For start-ups, the external environment aspects motivate the SME to realise the necessity of investing in cybersecurity, leading to the initial goals. For digitally-dependent SMEs, the goals formulated by management serve as catalysts for culture and processes. We observe that from an initial external motivation for start-ups, SMEs gradually build up internal interactions. For digital enablers, we see many interactions, both internally and with the external environment.
Figure 4. Visual summary of the STS interactions in the framework of Table 10, using the representation of Figure 3.

Discussion
We extensively analysed and interpreted our results in Sections 4 and 5. This section will focus on a discussion of our research questions and the potential limitations of our research.
Our first research question asked: How are cybersecurity metrics aggregated in socio-technical cybersecurity measurement solutions? One interesting finding from Table 8 is that half of the research involving implementations did not aggregate at all. Table 2 gives a partial explanation for this phenomenon: no aggregation strategy satisfies all desirable security properties. Thus, aggregation should preferably be avoided. Nevertheless, aggregation using basic approaches such as WLC is prevalent, with 42 of our 60 inclusions using this aggregation technique. We observed a clear lack of dependency consideration among metrics, which could be solved using Bayesian network [83–86] or ANP techniques [50,51]. Our cybersecurity framework presented in Table 10 provides clear guidance on which aggregation strategies suit which SME categories.
Our second research question was formulated as: How do aggregation strategies differ in cybersecurity measurement solutions relevant to SMEs and all other solutions? Our analysis of Table 9 demonstrated that in enterprise research little to no attention is paid to aggregation strategies that satisfy the weakest link and dependency properties. One of the main obstacles in making aggregation strategies suitable for SMEs is the time and expertise required to carry them out. Generally, more complex aggregation strategies require the determination of more parameters and relationships, which in turn often requires consultation of security experts at the cyber-system being assessed (for example, [89,102–104]). This expertise is rarely available at smaller SMEs, although when it is, ANP approaches [50,51] could offer a path towards more accurate aggregation.
Our final research question covered the consideration of adaptability: "the state of being able to change to work or fit better" [18]. We found that very few papers consider the effects of missing data, dirty data, security events, or concept drift; all are vital elements in determining the ability of a solution to adapt to unexpected circumstances to work better. Research does often recognise the need for being able to change to fit better, as shown by the relatively large proportion that considers adaptation to other use cases. Nevertheless, there is still much to be gained in this area. It is vital that authors of research on socio-technical cybersecurity measurement solutions explicitly address the adaptability dimension in the future. Our framework of Table 10 helps in this regard, with its focus on proactive processes and active monitoring and detection capabilities.
We additionally analysed the ADKAR factors that were addressed in our inclusions. We found that desire was rarely considered in research. This was especially true for research focusing on the defender perspective. Additionally, we found that the real-life threat environment, as defined in Gollmann et al. [22], is considered in less than half of our inclusions. Both of these findings offer an interesting contrast to the increasingly important role SDT and PMT play in security research [25]. These theories focus heavily on (intrinsic) motivation and threat perception [24]. Given the low intrinsic motivation among SMEs and their employees to improve security [3], and the relatively large impact individual employees can have in the SME context, future research focusing on motivation and the real-life threat environment could provide an interesting avenue for making cybersecurity solutions more suitable to SMEs.

Limitations and Threats to Validity
We should mention at this stage that our research is not without its limitations. One potential issue is that our systematic review was not restricted to recent years, which meant that contemporary research was not as prominent in this review as it is in most other reviews. This could mean that we are overlooking certain recent developments, although 18 of our 60 inclusions were published in the past three years.
Additionally, although we believe our 60 inclusions are sufficient to help us answer our research questions, certain groupings of the inclusions resulted in relatively small sub-samples from which to draw conclusions. This could limit the generalisability of our analysis and conclusions, meaning that one could have different findings when considering different cybersecurity focus areas.
We believe in the construct validity of our systematic review methodology SYMBALS [58], as it is based on widely-accepted methods [62,63] and guidelines [59–61]. However, it is still a novel methodology that remains to be extensively tested. We feel this does not threaten the validity of our research, since SYMBALS is geared towards reproducibility and satisfies standard reporting item guidelines for systematic reviews [61].
A final mention should be made of our choice to approach the social dimension through the ADKAR change management model [27]. Although the model has been applied in the cybersecurity domain [28], it is certainly not a standard approach to use ADKAR in this setting. Nevertheless, Table 5 summarised the natural mapping of social cybersecurity metric concepts to the ADKAR framework and our framework presented in Table 10 showed how the ADKAR terms can be instinctively imported from previous research. Hence, we feel justified in using this approach.

Conclusions and Future Research
Businesses, and especially small- and medium-sized enterprises (SMEs), struggle to cope with the existing cyber threat landscape. Researchers have turned to cybersecurity measurement to deal with these issues, although many challenges remain, such as how to aggregate sub-metrics into higher-level metrics [18]. The challenges faced by SMEs are compounded by the dynamic nature of the cyber threat landscape, necessitating adaptable solutions. These current challenges motivated us to investigate the topics of aggregation and adaptability in this review, with a focus on SMEs.
The social side of cybersecurity deserves attention, certainly in the SME context. This is why we chose to direct our review at socio-technical cybersecurity measurement solutions. The ADKAR (awareness, desire, knowledge, ability, reinforcement) change management model of Hiatt [27] guided us in covering the social dimensions considered in research. To aid in the analysis of aggregation approaches, we outlined five main aggregation strategy classes in Section 2.3: weighted linear combinations, weighted products, weighted maxima, weighted complementary products, and Bayesian networks. We looked towards existing research to determine interesting dimensions of adaptability, such as missing or dirty data [56] and concept drift [57].
Based on our analysis in Sections 2.3 and 4, we found that aggregation should only be carried out if necessary, since no single aggregation strategy exists that satisfies all of the desired security properties. Notably, dependencies among metrics are often not considered. Solutions can be found in this area in Bayesian networks [83–86] and analytic network process [50,51] techniques.
We used our findings as input to construct a socio-technical cybersecurity framework for SMEs. We presented our framework in Table 10 and visualised it in Figure 4. Offering a single solution for all SMEs is too simplistic. This is why we divided SMEs into four categories, as suggested by the European DIGITAL SME Alliance [14]: start-ups, digitally-dependent SMEs, digitally-based SMEs, and digital enablers. By detailing what can be expected of each SME category, we were able to determine which cybersecurity assessment strategies were suitable in each case. For start-ups and digitally-dependent SMEs, threat-based risk assessment approaches that either do not aggregate or use intuitive aggregation strategies are ideal. By focusing on the real-life threat environment [22], the relevance and significance of the assessment task are given a central role. A simple and intuitive aggregation strategy accommodates feelings of competence and relatedness. Altogether, this ensures optimal organisation and employee motivation [25,97].
Digitally-based SMEs and digital enablers are advised to use more comprehensive risk assessment approaches and maturity models. These assessment techniques should assist in working towards or proving compliance with standards and regulations. Under ideal circumstances, this will build trust in the cybersecurity posture of the SME, both internally and externally. Digital enablers are also prime candidates for using more advanced aggregation strategies, such as Bayesian networks, since they often have the cybersecurity expertise and data required to make these solutions successful.
We hope that our socio-technical cybersecurity framework will provide a basis to design successful cybersecurity assessment solutions for SMEs. SMEs should not be forced to use solutions that are not suited to their situation. Especially start-ups and digitally-dependent SMEs currently lack suitable cybersecurity assessment solutions, even though they are most in need of "easily understandable and practical solutions" [14]. In future work, we aim to help these SMEs to become more secure. An important first step is to formulate a properly motivated, intuitive, and usable threat-based cybersecurity risk assessment approach, to offer this most vulnerable group some deserved cybersecurity respite.

Data Availability Statement:
The data presented in this study are available in Appendixes A and B of this paper.

Acknowledgments:
The authors would like to thank Rens van de Schoot and the ASReview team for their cooperation.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Abbreviations
The following abbreviations are used in this manuscript:

Appendix A. Data Extraction Items
For the 60 papers that were determined to consider social factors, we recorded the data extraction items of Table A1, along with general data such as title, abstract, keywords, and number of citations.
Real-life threat: An indication of whether the paper considers the real-life threat environment [22]. Values: yes, no, unclear.
Physical dimension: An indication of whether the paper considers the physical dimension of security. Values: yes, no, unclear.
Validation method: The validation method employed in the research [29]. Values: hypothetical, empirical, simulation, theoretical.
Validation method description: A description of the validation method. Values: description text.