A systematic method for measuring the performance of a cyber security operations centre analyst

Analysts who work in a Security Operations Centre (SOC) play an essential role in supporting businesses to protect their computer networks against cyber attacks. To manage analysts efficiently and effectively, SOC managers and stakeholders use Key Performance Indicators (KPIs) to evaluate their performance. However, existing literature suggests a lack of a systematic approach for assessing analysts’ performance. Even though cyber security researchers advocate for research into this area, little effort has been made by researchers to address this gap. Drawing on the results of a Delphi panel with industry experts and the principles of the Analytic Hierarchy Process (AHP), this paper interrogates the problem and proposes a systematic weighted approach for measuring the performance of an analyst in a SOC. The proposed method, referred to as a SOC Analyst Assessment Method (SOC-AAM), was evaluated in two SOCs as a part of an experimental case study. The results of the empirical evaluation show that the SOC-AAM enables SOC managers and stakeholders to quantify and assess analysts’ performance in a systematic manner. The SOC-AAM also provides a novel guideline for assessing the quality of incident analysis and the quality of incident reports. This study will be of interest to practitioners and cyber security researchers seeking to understand the operations of a SOC analyst. © 2022 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY license ( http://creativecommons.org/licenses/by/4.0/ )


Introduction
Security Operations Centres (SOCs) have seen a marked increase in use and popularity in recent times and have become an active topic of research ( Ahmad et al., 2021; Cho et al., 2020; Schlette et al., 2021; Vielberth et al., 2020 ). A SOC is a centralised unit inside or outside an organisation that helps businesses to defend their networks against cyberattacks by monitoring and responding to security incidents ( Achraf Chamkar et al., 2021; Majid and Ariffi, 2019 ). At the heart of a SOC's operations are cyber analysts (hereafter referred to as analysts) tasked with the responsibility of ensuring the smooth running of the SOC. It is the responsibility of an analyst to monitor, detect, analyse and report cyber threats and incidents ( Kokulu et al., 2019; Smith, 2020 ). Analysts are expected to demonstrate high operational performance, because poor performance will negatively impact the overall efficiency of a SOC ( Sundaramurthy et al., 2015 ).
To manage analysts effectively and efficiently, SOC managers and stakeholders draw on performance metrics and measures, also referred to as Key Performance Indicators (KPIs) ( Onwubiko and Onwubiko, 2019 ), to evaluate analysts' performance ( Onwubiko, 2015; Sundaramurthy et al., 2015; 2014 ). However, existing literature suggests that SOC managers and stakeholders face a challenge in evaluating the performance of analysts fairly and systematically ( Achraf Chamkar et al., 2021; Andrade and Yoo, 2019; Sundaramurthy et al., 2015 ). Recent studies also point out that existing performance metrics and measures for evaluating the performance of an analyst are inadequate and problematic ( Achraf Chamkar et al., 2021; Agyepong et al., 2019; Sundaramurthy et al., 2015; 2014; 2017 ). In the context of this work, the terms 'metric' and 'measure' are used interchangeably as they are closely linked and are often used synonymously ( Ahmed, 2016; Jacques Houngbo and Toyigbé Hounsou, 2015 ). We use the term 'stakeholders' to describe other professionals in a SOC, such as incident management managers, SOC team leaders and technical leads, who are also interested in the performance of an analyst ( Sundaramurthy et al., 2015 ).
Amongst the problems reported in the literature is that existing performance metrics for analysts do not consider several aspects of their work, such as the quality of their analysis and the handling of false positive security alerts ( Agyepong et al., 2020b; Sundaramurthy et al., 2015 ). Furthermore, there is a concern that existing quantitative performance metrics fail to take into account the severity or priority of alerts processed by an analyst ( Kokulu et al., 2019 ), even though researchers point out that analysts are expected to analyse security alerts according to alert priority ( Onwubiko and Ouazzane, 2019c; Shah et al., 2018 ). The problem with ignoring alert priority, and simply measuring analyst performance based on the number of incidents actioned regardless of their severity, is that some analysts may opt to action a large number of easy, benign or low priority incidents, thereby scoring highly on such a metric ( Sundaramurthy et al., 2017 ). Prior research also highlights that existing metrics are narrow in focus and discrete and, as such, do not present the entire picture of an analyst's efforts and performance within a SOC ( Sundaramurthy et al., 2015 ). Some researchers also assert that SOC managers usually focus on quantitative metrics, with little attention paid to qualitative metrics such as quality of analysis, when measuring the performance of analysts ( Achraf Chamkar et al., 2021 ). In addition, studies suggest that the current lack of a systematic approach for evaluating the performance of analysts frustrates both analysts and SOC managers ( Sundaramurthy et al., 2015; 2017 ). Despite the problems mentioned above, there has been little effort from cyber security researchers to improve evaluation methods for analysts.
The main contribution of this work is a method for evaluating the performance of a SOC analyst in a comprehensive and systematic way, accounting for the level of importance of each function. The proposed method includes a novel guideline for assessing the quality of incident analysis conducted by analysts and the quality of their incident reports. This guideline will be helpful to both experienced and novice analysts who, studies suggest, suffer from the complexities of security incident analysis tasks ( Zhong et al., 2018 ). We refer to the proposed method as the Security Operations Centre Analysts Assessment Method (SOC-AAM).
This work builds on our previous study that identified the main functions of analysts in a SOC and the criteria that could be used to measure their performance systematically ( Agyepong et al., 2020b ). In this work, we draw on the results of a Delphi panel of SOC experts and the principles of the Analytic Hierarchy Process (AHP) to propose a weighted approach for measuring the performance of an analyst. We tested and evaluated the proposed method in a case study at two SOCs. The evaluation results show that the SOC-AAM enables SOC managers to aggregate and quantify the performance of an analyst in a systematic manner. The results also reveal that the SOC-AAM offers a useful, easy-to-use and comprehensive approach for evaluating an analyst's performance.
The remainder of the paper is organised as follows: Section 2 discusses related work, focusing on studies that examine performance metrics for analysts. In Section 3, we present a discussion of the operations of SOC analysts from a theoretical perspective. Section 4 presents the methodology used for this study, explaining both the Delphi technique and the AHP method. Section 5 presents the results of the Delphi panel. Section 6 presents the proposed method. Section 7 presents the results from the testing and evaluation of the weighted approach. Section 8 presents a discussion and research implications. Section 9 presents the conclusions and future work.

Related work
Cyber security researchers have suggested various KPIs for evaluating the performance of analysts ( Agyepong et al., 2019; Kokulu et al., 2019; Onwubiko, 2015; Sundaramurthy et al., 2015; 2014; 2017 ). KPIs are measures for assessing performance ( Kaplan, 2009; Onwubiko and Onwubiko, 2019 ). However, studies suggest that SOC managers and analysts could benefit from an alternative approach to evaluating an analyst's performance ( Sundaramurthy et al., 2015; 2014 ). In previous work ( Agyepong et al., 2020b ), we conducted an empirical case study to understand the real work of a SOC analyst and proposed a framework that was validated by SOC experts as a useful framework that provides the foundation for developing an approach for capturing the holistic performance of an analyst. This paper builds on the findings of our previous work and proposes a method for evaluating an analyst's performance. Sundaramurthy et al. (2014) visited three SOCs to identify, amongst other things, metrics for evaluating the performance of an analyst, and found that while some SOCs use the number of incidents processed by an analyst at the end of their shift to assess their performance, other SOCs measure analysts' performance based on the time it takes to create a ticket. Their study acknowledged that there are problems with both metrics. For example, whereas the latter fails to recognise that some security incidents are more complex than others and will naturally require more time, a performance metric based on the number of incidents raised, as explained by Kokulu et al. (2019), does not consider the alert priority or severity. Thus, there will be no difference between analysts who consistently work on critical severity incidents and those who choose to work on low priority incidents.
A subsequent study by Sundaramurthy et al. (2015), which sought to investigate a burnout phenomenon amongst analysts, found that a major challenge faced by SOCs is how to evaluate the performance of analysts in an objective and consistent manner. Sundaramurthy et al. (2015) noted that the existing evaluation methods fail to fully capture the efforts of an analyst, leading to frustration and dissatisfaction amongst analysts. They report that some SOCs based analysts' performance on the time they spend creating a ticket. They noted that analysts lament that tasks such as dealing with false positives and tuning out false-positive alerts are often not recognised when it comes to performance assessment. Onwubiko (2015) discusses a number of metrics that can be used by SOC managers to measure the performance of analysts, amongst them the number of incidents detected in a certain period and the number of false positives and true positives detected over a rolling period. However, these performance metrics have similar problems to those stated above.
Achraf Chamkar et al. (2021) and Kokulu et al. (2019) also present performance metrics such as the number of incidents raised, the number of alerts analysed by an analyst during their shift, mean time to detect (MTTD) and mean time to respond (MTTR) to an incident. However, analysts see time-based measures such as MTTD and MTTR as misleading when used to evaluate their performance, because there are often issues outside their control (such as, for example, reliance on third parties for corroborating evidence) ( Achraf Chamkar et al., 2021; Agyepong et al., 2020b ). Shah et al. (2018) propose evaluating the performance of an analyst based on the number of analysed/unanalysed alerts actioned by an analyst operating an Intrusion Detection System (IDS) sensor. Their approach, however, does not account for other activities performed by an analyst.
The work presented in this paper takes a different approach to how an analyst's performance can be measured. We propose a weighted approach for evaluating an analyst's performance using multiple criteria, based on the most common and significant aspects of analysts' work identified in previous work ( Agyepong et al., 2020b ). We also present a guideline for assessing the analysis and incident reports produced by analysts.

Key functions of a SOC analyst
Analysts play a vital role in the operations of a SOC and the delivery of a SOC's services ( Aung et al., 2020; Axon et al., 2017 ). From a theoretical perspective, the activities and operations of analysts can be understood using the Activity Theory (AT) ( Sundaramurthy et al., 2016 ). The AT, which was first postulated by Leontiev and Vygotsky, and subsequently extended by Engeström (2015), can be used to model any organised human activity.
The underlying assumption of the AT is that humans are collective beings and that their actions are goal-directed or objective-directed ( Engeström et al., 1999 ). According to the AT, without an objective, there is no meaning to any planned human activity ( Sundaramurthy et al., 2016 ). The AT stresses that humans do not act in isolation but within communities ( Engeström, 2015 ). This theory is very much evident in the operations of analysts and how they engage with other members of the team to execute missions and realise key objectives successfully. To achieve their objectives, analysts must obey the rules that govern their activities in a SOC. Rules can be in the form of processes such as Standard Operating Procedures (SOPs), playbooks and runbooks ( Sundaramurthy et al., 2014 ). Analysts also rely on tools such as firewalls, Security Information and Event Management (SIEM) systems and Intrusion Detection and Prevention Systems (IDPSs) to achieve their objectives. They also draw on the idea of division of labour, through the operations of different tiers of analysts (Level 1, 2 and 3), to achieve their objectives ( Kokulu et al., 2019; Sundaramurthy et al., 2014; 2016 ). It is important to mention, however, that not all SOCs operate using the tier structure; in a non-hierarchical SOC, all the analysts are expected to possess similar skill-sets and the necessary analytic abilities to address security incidents ( Kokulu et al., 2019 ). Analysts also engage with professionals such as SOC engineers, incident handlers, penetration testers and forensic specialists in the course of their operations within a SOC ( Agyepong et al., 2020b ).
Although Sundaramurthy et al. (2016) provide an organised account of the activities of analysts within a SOC by drawing on the AT, their discussion only focuses on the threat detection function (the monitoring and detection function) performed by analysts. They do not thoroughly discuss other salient objectives pursued by analysts that are relevant when seeking to assess their holistic performance. For example, they do not discuss or comment on key analysts' objectives such as finding and fixing vulnerabilities ( Agyepong et al., 2020b; Kokulu et al., 2019 ). Likewise, they also do not discuss objectives such as the baseline and vulnerability management function that are usually performed by analysts ( Agyepong et al., 2020a; Schinagl et al., 2015 ). A revised version of the model suggested by Sundaramurthy et al. is presented in Fig. 1 to illustrate the operations of analysts and the full range of objectives expected of them.
The identification and appreciation of analysts' objectives, also referred to as analysts' functions in this study, are crucial if one wants to design or establish a systematic way of capturing analysts' holistic performance. These functions can be used as a set of criteria for evaluating analysts' performance ( Agyepong et al., 2020b ). Moreover, an effective evaluation method, as explained by Islam and bin Mohd Rasad, needs to have a set of well-defined criteria upon which the evaluation is based ( Islam and bin Mohd Rasad, 2006 ). O'Connell and Choong (2008) explain that performance metrics must focus on an analyst's real-life workplace needs and experience. However, this is problematic because no two SOCs are the same in terms of the functions that they offer; analysts' functions vary from one SOC to another ( Goodall et al., 2004; Onwubiko, 2015; Schinagl et al., 2015 ). With this idea in mind, in this study we designed a performance evaluation system based on the most common and significant functions of analysts.
The functions of analysts are shown in Fig. 1. These functions were identified in previous work and validated by a group of SOC analysts and managers as the core functions of an analyst that can be used as the basis for measuring an analyst's overall performance ( Agyepong et al., 2020b ). Existing literature also identifies these functions as core functions of a SOC ( Onwubiko, 2015; Schinagl et al., 2015 ). Table 1 summarises the main functions expected of an analyst and a description of each function.
A number of qualitative and quantitative KPIs were also identified as useful metrics for measuring the performance of an analyst under each function. However, on their own, the different functions and KPIs are discrete and, as such, do not provide an overall insight into the performance of an analyst if used in a disconnected manner. The intention of this study is, therefore, to find a systematic way of consolidating the functions expected of an analyst and the associated KPIs for each function to derive the overall performance of an analyst. It is important to highlight that time-based KPIs such as Mean Time to Detect (MTTD), Mean Time to Respond (MTTR), Mean Time to Triage (MTTT), Mean Time to Fix Vulnerability (MTTFV) and Mean Time to Mitigate (MTTM) ( Achraf Chamkar et al., 2021; Agyepong et al., 2020b ) are not used in the evaluation method proposed in this study, for the reasons discussed under Section 2.

Research methodology
In order to propose a new approach for measuring the performance of an analyst, this study adopts a practical research methodology that engages with industry experts. The Delphi technique ( Turoff and Linstone, 2018 ) and the Analytic Hierarchy Process (AHP) ( Saaty, 2008 ) were used during the engagement with the experts. Even though these two methods have been combined and used in several studies ( Arof, 2015; Taleai and Mansourian, 2008 ), to the best of our knowledge, there is no existing work that integrates both approaches in the context of assessing a SOC analyst's performance.

Table 1: The main functions of a SOC Analyst ( Agyepong et al., 2020b ).

Monitoring and Detection Function
• Real-time event monitoring of an organisation's network traffic, systems, processes and activities using security tools such as a SIEM, an IDS or an IPS to identify malicious activities.
• Monitoring security systems such as firewalls to detect policy violations, privileged user activities, security breaches or any unusual activity on the network.
• Identification of false positives and false negatives from sensors and tuning them out to decrease the load on sensors and analysts.
• Deep packet inspection and alert triage.

Analysis Function
• Analysing log files and event data reported by the monitoring and detection tools.
• Visual inspection of logs and in-depth packet analysis of network traffic and alerts using a range of packet analyser tools such as Wireshark to establish whether an activity poses a threat to an organisation.
• Drawing on historical logs to confirm trends and patterns.
• Conducting root cause analysis and creating script queries to investigate logs.

Baseline and Vulnerability Function
• Vulnerability scans.
• Applying patches to fix vulnerabilities.
• Ensuring that systems are patched to the correct level and that all systems running unsupported operating systems are identified.

Intelligence Function
• Identifying threat actors that may pose a danger to an organisation.
• Exchanging threat information with various internal and external parties.
• Correlating information on multiple threats that might affect an organisation.
• Blacklisting known malicious IP addresses such as those linked to command and control activities.
• Creating intelligence use case scenarios to track new and emerging threats.
• Creating event correlation rules and rules for event filtering.

Response and Reporting Function
• Isolation of suspicious devices to reduce damage to the enterprise network.
• Using an incident tracking system to create and track tickets.

Policies and Signature Management
• Writing and tuning correlation rules.
• Modifying signatures and rules to remove false positives.
• Modifying customised signatures and content rules.
To assess the efficacy of the proposed method, we use the Method Adoption Model (MAM). The MAM is based on the Method Evaluation Model (MEM) - a theoretical framework for validating information systems (IS) design methods. However, as explained by Paz et al. (2015), the MEM has general aspects of evaluation that can be applied to any kind of design method. According to the MEM, the success of a design method is reflected in its adoption into practice. Moody (2003) posits that the acceptability and use of a method in practice (which is the ultimate measure of its success) is driven by a set of perceptions and intentions. Only methods that are considered to be useful and easy to use will motivate practitioners to use them again in the future. The intention to use a particular method leads to its 'Actual Usage' in practice, which ultimately signifies the success of that method ( Paz et al., 2015 ). On the contrary, if practitioners do not have a good perception of a method, they are not likely to adopt or use it. We validated the proposed method in two SOCs using this evaluation strategy. Section 7 discusses the evaluation process in greater detail.

The Delphi method
The Delphi method is a widely used technique for gathering data from a group of experts on a topic within their domain of expertise through structured group communication ( Turoff and Linstone, 2018 ). It is useful in situations where no standard criteria exist for evaluation, as in the case of SOC analyst performance ( Paintsil, 2012 ). However, the Delphi method has some drawbacks, one of which is that it can be a laborious and time-consuming method because of the multiple rounds and the associated feedback process for each round ( Turoff and Linstone, 2018 ). A typical Delphi process usually involves a minimum of two rounds ( Arof, 2015 ).
In this study, the Delphi method was used to solicit the opinions of SOC experts on the weights that should be assigned to the analysts' functions and the KPIs that can be used for measuring the performance of analysts. These functions and KPIs are also referred to as assessment 'criteria' and 'subcriteria', respectively, to align the functions and associated KPIs with the AHP terminology as part of this study. As mentioned earlier, the analysts' functions (criteria) and KPIs (subcriteria) were identified as part of our earlier work with SOC experts ( Agyepong et al., 2020b ). Identification of the criteria and the subcriteria for the evaluation is an integral part of the AHP decision-making process ( Saaty, 1990 ), discussed in the next section. In addition, we used the Delphi technique to solicit experts' opinions on key indicators that can be used to assess the quality of an analyst's analysis and the quality of their report.
In the literature, different Delphi methods exist, giving researchers a choice of the specific Delphi technique to use, depending on what they seek to uncover ( Arof, 2015; Ogbeifun et al., 2016 ). The decision-making Delphi is adopted in this work, as it follows a structured approach that allows experts to create a future reality based on the choices they make ( Arof, 2015 ). Arof (2015) explains that the decision-making Delphi option is similar to the classical Delphi method because they follow similar steps. These steps are summarised in Gan et al. (2015) as follows: (1) design the questionnaire and identify the Delphi panel; (2) undertake the first round of the Delphi survey with the expert panel; (3) synthesise the opinions provided by the experts from the first round and provide that feedback to all the members of the panel; (4) request that each member of the panel reconsider their decision, based on the findings from the experts from the first round; (5) synthesise expert opinion from the second round and reach a consensus; (6) repeat steps 3 to 4 (if necessary) until a uniform result is achieved on the topic. These six steps were followed in conducting our Delphi exercise.
We began the Delphi study by contacting the SOC experts who took part in our earlier work that identified the criteria that can be used to evaluate the performance of analysts. On the recommendation of the recruited participants, we also contacted other SOC experts who did not take part in our previous study. As explained by Akins et al. (2005), participants for a Delphi study are not randomly selected but rather purposively selected, as they have the knowledge and insight on the topic under study. We sent an email to the participants explaining the objective of the research and requesting their participation. In total, 11 (eleven) SOC experts initially agreed to take part in the study. However, only 8 (eight) of them completed and returned their questionnaires. With regard to our study sample size, although there is no consensus on the minimum number of participants for a Delphi study, we noted that some scholars point out that the panel size for a Delphi study can be as small as three participants ( Arof, 2015; Ogbeifun et al., 2016; Turoff and Linstone, 2018 ). The panel consisted of SOC managers and analysts from the UK defence sector, the finance sector, the airline industry, the automobile industry and the telecommunications sector.

The Analytic Hierarchy Process Method
The Analytic Hierarchy Process (AHP) is a mathematical model that facilitates multi-criteria decision-making involving both qualitative and quantitative criteria at an individual or group level ( Saaty, 2008; 1980 ). Since its inception, the AHP has been used in several fields, including computer science and information systems ( Badie and Lashkari, 2012; Benítez et al., 2011; Costa and Santos, 2017; Fahmy, 2001; Siregar and Siregar, 2018 ). The AHP breaks a complex problem into modular parts; arranges these parts into a hierarchy; assigns numerical values to the criteria/elements in the hierarchy by making a pairwise comparison of the relative importance of each criterion; and synthesises the judgement to establish priorities (also known as weights) ( Odu, 2019; Saaty, 2008 ). Once the weights are obtained, a consistency check is applied to ensure that the judgements are not made arbitrarily ( Odu, 2019 ), reducing bias in the pairwise comparison process. If the judgements are found to be inconsistent, the judgement needs to be re-evaluated ( Benítez et al., 2011 ). Islam and bin Mohd Rasad (2006) outlined four steps for using the AHP to evaluate performance:

(i) Identify the criteria, subcriteria and employees to be evaluated, and construct the AHP model/hierarchy.

(ii) Construct an n × n pairwise comparison matrix for the criteria. Calculate the weights of the decision criteria by computing the normalised principal eigenvector of the matrix ( Odu, 2019 ). This vector gives the weights of the criteria ( Saaty, 2003; Singh Sidhu et al., 2020; Vargas, 2010 ). Construct a pairwise comparison for the subcriteria and calculate the weights in a similar manner. The weights of the subcriteria are multiplied by their respective parent criterion.

(iii) Divide each subcriterion into intensities or grades such as high, medium and low. The intensity allows one to determine the quality of an alternative for that criterion ( Saaty, 2008 ). Priorities are assigned to the intensities by conducting a pairwise comparison. The priorities of the intensities are multiplied by the weight of their parent subcriterion.

(iv) Finally, take each employee and measure their performance intensity under each subcriterion, then add the global priorities of the intensities for the employee. Repeat the process for all the employees.
The approach used in this study is similar to the steps described above; however, in our work, step (iii) is replaced with the inherent intensities of the KPIs: the individual KPIs achieved by an analyst under step (ii) are also used as the distinguishing factor, instead of creating a new set of intensities. As explained by Saaty (2008, p. 136), the purpose of intensities is to distinguish the quality of an alternative for that criterion. Since many of the KPIs used as the subcriteria (see Fig. 2 ) already serve as a distinguishing factor (for example, incidents processed by analysts are categorised as high, medium and low incidents ( Onwubiko and Ouazzane, 2019c; Shah et al., 2018 )), we opine that there is no need for additional intensities to be created. We do not use intensities under the intelligence function, the policies and signatures management function, and the baseline and vulnerability management function, because the KPIs under these functions are deemed sufficient to capture the performance of an analyst, as we discovered during our fieldwork with SOC experts ( Agyepong et al., 2020b ). This strategy is similar to the work of Vargas (2010), who did not use intensities. In step (iv), each analyst is measured against each KPI, and the total of the KPIs is used to determine their overall score.
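A minimal Python sketch of this scoring step may help: the two functions, their KPIs, and all weights and achievement scores below are hypothetical placeholders, not the weights elicited from the Delphi panel. Each KPI's global weight is its local weight multiplied by its parent criterion's weight, and the analyst's overall score is the weighted sum of their KPI-level achievements:

```python
# Hypothetical criterion weights, each with local KPI (subcriterion) weights.
# The local KPI weights under each criterion sum to 1, as do the criterion
# weights themselves, so the global weights also sum to 1.
criteria = {
    "monitoring_and_detection": (0.40, {"high_priority_incidents": 0.6,
                                        "low_priority_incidents": 0.4}),
    "analysis": (0.60, {"quality_of_analysis": 0.7,
                        "quality_of_report": 0.3}),
}

# The analyst's normalised achievement (0..1) against each KPI (hypothetical).
analyst_scores = {
    "high_priority_incidents": 0.8,
    "low_priority_incidents": 0.5,
    "quality_of_analysis": 0.9,
    "quality_of_report": 0.7,
}

# Overall score: sum of (criterion weight * local KPI weight * achievement).
overall = 0.0
for _, (criterion_weight, kpis) in criteria.items():
    for kpi, local_weight in kpis.items():
        overall += criterion_weight * local_weight * analyst_scores[kpi]

print(round(overall, 3))
```

Repeating the same computation per analyst yields comparable overall scores on a common 0 to 1 scale.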
Figure 2 depicts the architecture of the AHP hierarchical model used in this study. The first level of the hierarchy represents the goal, which is to measure the overall performance of an analyst. The second level of the hierarchy represents the main functions of an analyst, which also serve as the main criteria for the evaluation process in this work. The functions of analysts were deduced from the empirical interview data collected from SOC experts ( Agyepong et al., 2020b ) and a thorough analysis of existing literature ( Goodall et al., 2004; Onwubiko, 2015; Schinagl et al., 2015 ). The third level of the hierarchy represents the subcriteria for each respective main criterion. The KPIs under each function are used as the subcriteria. The final level represents the analysts, who are evaluated one at a time against the criteria and subcriteria defined above. The word 'alternatives' is often used in the AHP hierarchy to denote the final level.

Applying the AHP to derive the criteria weights
Having modelled the AHP hierarchy, a pairwise comparison matrix A is constructed and used to compute the weights for the criteria and subcriteria. The matrix A is an n × n real matrix, where n is the number of evaluation criteria or subcriteria being considered. Let a_ij be the pairwise comparison that the decision-maker makes between two criteria i and j. Each entry a_ij of the matrix A represents the importance of the ith criterion relative to the jth criterion; that is, a_ij denotes the entry in the ith row and the jth column of matrix A. If a_ij > 1, then the ith criterion is more important than the jth criterion, whereas if a_ij < 1, then the ith criterion is less important than the jth criterion. If the two criteria have the same importance, then the entry a_ij will be equal to 1 ( Saaty, 2008 ). In the AHP, the entries a_ij and a_ji satisfy the constraints a_ij · a_ji = 1 and a_ii = 1 for all i. If a_ij = 1, the decision-maker regards elements i and j as equally important.
The relative importance between two criteria is measured according to the numerical scale of 1 to 9, shown in Table 2 .
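As an illustration of the reciprocal constraint, the following Python sketch (assuming NumPy is available; the three criteria and the judgement values on the 1 to 9 scale are hypothetical) builds a small comparison matrix from the upper-triangle judgements alone:

```python
import numpy as np

# Hypothetical judgements on the 1-9 scale for three criteria:
# criterion 1 vs 2 = 3, criterion 1 vs 3 = 5, criterion 2 vs 3 = 2.
n = 3
A = np.ones((n, n))  # a_ii = 1 on the diagonal
judgements = {(0, 1): 3.0, (0, 2): 5.0, (1, 2): 2.0}
for (i, j), value in judgements.items():
    A[i, j] = value
    A[j, i] = 1.0 / value  # reciprocal constraint: a_ij * a_ji = 1

# Check the two AHP constraints described in the text.
assert np.allclose(np.diag(A), 1.0)  # a_ii = 1 for all i
assert np.allclose(A * A.T, 1.0)     # a_ij * a_ji = 1 for all i, j
print(A)
```

Only the n(n − 1)/2 upper-triangle judgements need to be elicited from a decision-maker; the rest of the matrix follows from the constraints.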
Once the matrix A has been constructed, the priority vector (or weights) for the criteria can be calculated ( Islam and bin Mohd Rasad, 2006 ). The process for deducing the weights starts by deriving from the matrix A a normalised pairwise comparison matrix A_norm, in which the sum of each column equals 1 ( Odu, 2019 ). Each entry ā_ij of the matrix A_norm is computed using Eq. 1 ( Ishizaka and Labib, 2011 ):

ā_ij = a_ij / ( Σ_{k=1..n} a_kj )    (1)

Finally, the criteria weight vector w is calculated by averaging the entries on each row of A_norm, using Eq. 2 ( Ishizaka and Labib, 2011 ):

w_i = ( Σ_{j=1..n} ā_ij ) / n    (2)
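The column normalisation (Eq. 1) and row averaging (Eq. 2) can be sketched in Python as follows; the comparison matrix is hypothetical and serves only to illustrate the computation:

```python
import numpy as np

# Hypothetical 3x3 pairwise comparison matrix for three criteria.
A = np.array([
    [1.0, 3.0, 5.0],
    [1/3, 1.0, 2.0],
    [1/5, 1/2, 1.0],
])

# Eq. 1: normalise each column so that it sums to 1.
A_norm = A / A.sum(axis=0)

# Eq. 2: average each row of A_norm to obtain the criteria weight vector w.
w = A_norm.mean(axis=1)

assert np.isclose(w.sum(), 1.0)  # the weights form a priority vector
print(w)
```

For a perfectly consistent matrix this column-normalisation approximation coincides with the normalised principal eigenvector; for near-consistent judgements it is a close approximation.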

Checking the consistency
The consistency of the choices made by the decision-maker for a comparison matrix can be checked by calculating the consistency ratio (CR), using Eq. 3 ( Saaty, 2008 ):

CR = CI / RI    (3)

In Eq. 3, RI denotes a Random Index: in a randomly generated matrix, the values are entered randomly and are expected to be inconsistent ( Saaty, 2008 ). Table 3 shows the values for RI ( Saaty, 2008 ). The CI (Consistency Index) in Eq. 3 is calculated using Eq. 4, where λ_max represents the maximum eigenvalue of the decision matrix A and n is the number of compared criteria ( Saaty, 2008 ):

CI = (λ_max − n) / (n − 1)    (4)

The comparison matrix A is absolutely consistent if λ_max = n ( Saaty, 2008; Vargas, 2010 ); otherwise, as explained by Saaty (2008), the difference λ_max − n is a measure of the inconsistency in the decision matrix. If the CR of the decision matrix is less than 0.1, then the judgement is acceptable and can therefore be used ( Odu, 2019 ).
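The consistency check (Eqs. 3 and 4) can be sketched as follows, using Saaty's published Random Index values and the same hypothetical comparison matrix as before:

```python
import numpy as np

# Hypothetical pairwise comparison matrix (same shape as matrix A above).
A = np.array([
    [1.0, 3.0, 5.0],
    [1/3, 1.0, 2.0],
    [1/5, 1/2, 1.0],
])
n = A.shape[0]

# lambda_max: the principal (largest real) eigenvalue of A.
lambda_max = max(np.linalg.eigvals(A).real)

# Eq. 4: Consistency Index.
CI = (lambda_max - n) / (n - 1)

# Saaty's Random Index values for n = 1..10.
RI = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
      6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}

# Eq. 3: Consistency Ratio; judgements with CR < 0.1 are acceptable.
CR = CI / RI[n]
print(f"lambda_max={lambda_max:.4f}, CI={CI:.4f}, CR={CR:.4f}")
```

For a positive reciprocal matrix λ_max is always at least n, so CI is non-negative; for this near-consistent example the CR falls well below the 0.1 acceptance threshold.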
Having done the calculations for the main criteria, a similar calculation is repeated for all subcriteria. Once the local priorities of the subcriteria are calculated, they can then be aggregated to get the final priorities ( Vargas, 2010 ).
As a part of the integrated Delphi-AHP approach, the participants were given a questionnaire for the pairwise comparison. The questionnaire was devised in a spreadsheet and submitted to the members of the Delphi panel via email, a strategy similar to the suggestion by Gordon (2011). To aggregate the results from the panel (group), a number of techniques can be used ( Ishizaka and Labib, 2011 ). Saaty (2008, p. 273) suggests that consensus can be reached by taking the geometric mean of the individual responses or by voting on the preferred judgement for each pairwise comparison. This position is supported by scholars such as Ishizaka and Labib (2011). In this study, the geometric mean of the pairwise judgements from the participants was used in favour of voting, as it was not possible to get all participants together to vote. The geometric mean values from the experts were used to construct a pairwise comparison table in a manner that satisfies the reciprocal relation in comparing the elements ( De Felice and Petrillo, 2013 ).
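The geometric-mean aggregation can be sketched as follows, using hypothetical judgements from three panel members for a single criterion pair:

```python
import numpy as np

# Hypothetical judgements from three panel members for the same
# criterion pair (i, j): two prefer criterion i, one prefers criterion j.
judgements = np.array([3.0, 5.0, 1/2])

# Consensus entry a_ij: the geometric mean of the individual responses.
a_ij = judgements.prod() ** (1 / len(judgements))

# The reciprocal entry a_ji is 1 / a_ij, preserving a_ij * a_ji = 1.
a_ji = 1 / a_ij
print(a_ij, a_ji)
```

Aggregating with the geometric mean (rather than the arithmetic mean) is what preserves the reciprocal property of the group comparison matrix.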
It is important to point out that the participants were provided with a recorded video demonstrating the AHP pairwise comparison exercise and guidance on how to complete the accompanying Excel spreadsheet for their individual AHP judgements. The purpose of the video was to familiarise the participants with the AHP concept. The Excel spreadsheet was also designed to check the consistency of the pairwise comparison and report back to the participant if their judgement was inconsistent. This enabled the participants to adjust their pairwise comparisons until they were consistent.

Study results
This section presents the results of the Delphi study and the aggregated outcome of each round.

Round 1 -Decisions matrices
We analysed individual responses from Round 1 to ensure that they satisfied the rules of reciprocity and transitivity and were aligned to the AHP consistency index ( De Felice and Petrillo, 2013 ). The rule of reciprocity dictates that when a judgement a_ij is elicited, a_ji will also be recorded as the reciprocal value in the comparison matrix. Transitivity relates to the judgement choices made by the decision-maker. For example, if a decision-maker prefers summer twice as much as spring and spring twice as much as winter, then in mathematical terms the preference of summer to winter should be 4; if the decision-maker assigns any other value, there will be a certain level of inconsistency in the judgement ( Saaty, 2008 ). The geometric mean of the values suggested by the participants for the main criteria was then used to construct the group's comparison matrix, shown in Appendix A. We then checked the consistency index of the group's decision. The resultant weights for each of the main functions, along with the CR, are shown in Appendix A, Table 8.
We applied a similar approach to derive the weights for the subcriteria and computed their respective CRs. The results of the weights for all the criteria and subcriteria are shown in Appendix A, Tables 9-14. The analysis of the data from the panel in Round 1 shows that the group's aggregated result was below the AHP consistency ratio threshold of 0.1. The findings from Round 1, which met the AHP standard, led us to adopt a two-round Delphi approach ( Arof, 2015 ).

Round 2 -Final ranking and weights
The objective of Round 2 was to establish whether the experts agreed with the group consensus achieved in Round 1, or whether any of the participants wanted to modify the values from the group judgement. Round 2 gave the experts the opportunity to make any changes or to recommend further improvement of the results from Round 1.
The outcome at the end of Round 2 revealed some interesting results, in that the participants were satisfied with the weights that had been assigned to the different tasks. As a result, no changes were made to the weights deduced from the group consensus in Round 1. Hence, the weights deduced from the group's decision are proposed as the final weights for measuring the performance of analysts.
The weights for the criteria and subcriteria from the two rounds were synthesised to yield a set of overall weights (also known as the "global weights") by multiplying the weight of each criterion by the respective weights of its subcriteria ( Vargas, 2010 ). The output of the calculation is shown in Appendix B, Table 15.
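The synthesis of local weights into global weights can be sketched as follows. The criterion and KPI names and values below are hypothetical placeholders, not those of Table 15:

```python
# Hypothetical local weights: one main criterion (an analyst function)
# and the local weights of its subcriteria (KPIs).
criterion_weight = 0.30
subcriteria_weights = {"kpi_a": 0.5, "kpi_b": 0.3, "kpi_c": 0.2}

# Global weight of a KPI = criterion weight x local subcriterion weight.
global_weights = {kpi: criterion_weight * w
                  for kpi, w in subcriteria_weights.items()}

# The global weights under one criterion sum to that criterion's weight.
print(global_weights)
```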
In addition to the weight assignment, the experts were also invited to suggest "indicators" for assessing the quality of an analyst's report and the quality of their analysis as part of the Delphi study. The word "indicators" in this study denotes guidelines for assessing the quality of an analyst's incident analysis/report. Related publications point out that an incident analysis and report must address 'who' (attacker/malicious person), 'what' (indicators of compromise/actions done), 'where' (from what IP address), 'when' (timestamp), 'why' (the risk) and 'how' (method of detection), and provide recommendations on the actions taken to address the identified incident. Using insight from the existing work ( D'Amico et al., 2005; Miloslavskaya, 2018; Mutemwa et al., 2018; Zhong et al., 2016 ), we devised a table and listed some indicators that can serve as a guideline for assessing the quality of an analyst's report. The experts were requested to review and add to this list of indicators.
Following the two rounds of the Delphi study, the indicators identified in Table 4 were reported by the participants as the most important areas that must be addressed by analysts as part of a quality analysis and in their reports. These indicators can help assess the quality of an incident report written by an analyst.

Reflection on the Delphi-AHP exercise
Although there was a group consensus on the weights for the different functions, it is important to highlight that there were some differences at an individual level in how the participants perceived the importance of functions, which was reflected in the AHP values that were assigned. In fact, the differences in opinion, which were reflected in the responses to the questionnaire, confirm the assertion made by Goodall et al. (2004) that different SOCs conceptualise the operations of a SOC differently. It is, therefore, possible that our respondents' opinions were influenced by the importance they attached to the functions in their local SOCs. Another notable observation from the AHP questionnaires returned by the participants was that some of the panel members assigned an AHP value of 1 to many of the functions, which may have facilitated the achievement of a group consensus that was consistent with the AHP CR of < 0.1.
Despite the individual differences in opinion observed in the pairwise comparison values, the inference can be made that the participants collectively tended to agree that the monitoring and detection function, the analysis function, and the response and reporting function were the most important. As such, these functions were assigned the highest weights, confirming the importance of these three functions as reported by previous SOC researchers ( Agyepong et al., 2020b; Jacobs et al., 2013; Onwubiko, 2015 ).

Security Operations Centre Analysts' Assessment Method (SOC-AAM)
In the SOC-AAM, we assign the final weights deduced from the Delphi-AHP study to the functions of an analyst (see Appendix C, Table 16). The SOC-AAM contains six main analyst functions and 31 KPIs. These six functions and associated KPIs are shown in the first column of the SOC-AAM. The second column shows the weight for each function and for each KPI. The third column (labelled "KPI Score") is reserved for the KPI score achieved by an analyst over the assessment period; an assessor or evaluator must enter the value(s) for this column during the evaluation process. The fourth column (Analyst's Score Per Activity) represents the aggregated score for each function. The fifth column (labelled Team's Total KPI) is reserved for the KPI score achieved by the entire team over the assessment period. The final column (Team's Overall Score) represents the team's aggregated score for each function. The rows labelled "Analyst's Overall Score" and "Team's Overall Score", shown at the bottom of the SOC-AAM, represent the aggregated scores for an analyst and for the team respectively, once the score(s) for each KPI have been entered by an evaluator. This process is further described below. The performance of the analyst is calculated in percentage terms, considering the overall team's performance in the assessment period. This helps track and compare the performance of the team and each individual analyst over time.
The steps required to use the SOC-AAM as an evaluation tool are detailed below. The process is facilitated by an Excel spreadsheet that automates all calculations. The output of the SOC-AAM is an aggregation of the KPI scores for a set of SOC analyst functions. Under each function, the number of achieved KPI(s) for that function is submitted; for example, the number of incidents closed is reported under the Response and Reporting function. If there are no scores for a particular function, it should be left blank.
The SOC-AAM contains two special KPIs (the quality of analysis and the quality of the incident report) that must be scored only by a SOC manager or the technical lead as part of the evaluation process. These two KPIs are important because they carry two of the three largest weights in the SOC-AAM and are based on the subjective judgement of the evaluator. As a part of the evaluation process, a SOC manager needs to review a randomly selected report written by an analyst during an assessment period and assign a score between 1 and 7 (where 1 is the lowest and 7 is the highest), depending on how many of the seven quality indicators the analyst has addressed in the report (see Fig. 4). In our previous work, we found that the quality of analysis is often reflected in the report written by an analyst ( Agyepong et al., 2020b ). Therefore, the manager could assign the same score to both the quality of analysis and the quality of the report. Alternatively, she/he could choose to assign a different score for the quality of analysis, up to a maximum of 7. However, this does not suggest that the quality of analysis is the same as the quality of the report, since the research participants assigned different weights to each.
The steps for evaluating analysts' performance are outlined below:
• Step 1: The evaluator enters the total number of analysts in the team into the SOC-AAM tool. This will calculate the maximum team score for the quality of analysis and the quality of their report. (Note: each analyst can achieve only a maximum score of 7 for the quality of their analysis and 7 for the quality of their report, based on the seven indicators stated earlier; the overall team score for each of the two functions is therefore 7 multiplied by the number of analysts.)
• Step 2: If an analyst has written a report over the assessment period, the SOC manager or the technical lead must review the report and assign a score between 1 and 7 for the quality of the report. The manager also assigns a score for the quality of analysis, as explained above.
• Step 3: The evaluator must enter the scores for the remaining functions. Once the evaluator has entered all the scores, the SOC-AAM tool will automatically calculate an analyst's overall performance score.
• Step 4: To allow a comparative assessment of an analyst's performance against their peers, the team's total scores for each function must be entered for the evaluation period. Once completed, the score for each individual analyst is displayed as a percentage, reflecting their individual contribution to the overall team's effort for the reporting period.
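The aggregation performed by the SOC-AAM spreadsheet can be sketched as follows. The weights, KPI names and scores are hypothetical placeholders; the actual tool uses the six functions and 31 KPIs of Table 16:

```python
# Hypothetical global weights for three illustrative KPIs.
global_weights = {"incidents_closed": 0.12,
                  "quality_of_report": 0.15,
                  "quality_of_analysis": 0.18}

# Hypothetical scores entered by the evaluator for one analyst and
# the totals achieved by the whole team over the same period.
analyst_scores = {"incidents_closed": 20,
                  "quality_of_report": 6,
                  "quality_of_analysis": 5}
team_scores = {"incidents_closed": 55,
               "quality_of_report": 18,
               "quality_of_analysis": 16}

# Weighted aggregation of the KPI scores.
analyst_total = sum(global_weights[k] * analyst_scores[k] for k in global_weights)
team_total = sum(global_weights[k] * team_scores[k] for k in global_weights)

# The analyst's contribution expressed as a percentage of the team's effort.
contribution_pct = 100 * analyst_total / team_total
print(f"{contribution_pct:.1f}%")
```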

Empirical evaluation of the SOC-AAM
The Method Adoption Model (MAM) ( Paz et al., 2015 ) was used to evaluate the efficacy of the SOC-AAM. As mentioned in Section 4, the MAM is derived from the Method Evaluation Model (MEM) ( Moody, 2003 ), a theoretical framework for validating design methods.
The MEM consists of six constructs whose relationships are shown in Fig. 3. The definitions for the MEM constructs which we adopted ( Moody, 2003; Paz et al., 2015; Recker, 2008 ) are as follows:
• Actual Efficiency: refers to the effort required to apply a method;
• Actual Effectiveness: denotes the degree to which a method achieves its objective;
• Perceived Ease of Use (PEOU): refers to the degree to which a person believes that using a method would be free of effort;
• Perceived Usefulness (PU): denotes the degree to which a person believes that a particular method will be effective in achieving its intended objective;
• Intention to Use (ItU): denotes the extent to which a person intends to use a particular method; and
• Actual Usage: represents the extent to which a method is used in practice.
While the MEM has six constructs, there are instances when some of them may not be relevant or even applicable ( Condori-Fernandez and Pastor, 2006; Moody, 2003; Paz et al., 2015; Recker et al., 2005 ). In this research, we focused on the perception- and intention-based constructs (see Fig. 3).
The perception- and intention-based constructs, known as the MAM, are present in all successful methods ( Moody, 2003; Paz et al., 2015 ). Our objective was to test the SOC-AAM against those constructs. Our strategy is similar to the work of Recker et al. (2005), who state that one of the major advantages of using the MAM and the associated measurement scales is that it is based on previous studies where similar surveys were used and validated in the context of method adoption. We do not use the 'Actual Usage' and 'Actual Efficacy' constructs from the MEM, for the reasons outlined below.
Firstly, Moody (2003) states that it is not possible to assess 'Actual Usage' under experimental conditions. Given that our testing and evaluation was conducted as an experiment, it was not feasible to test the 'Actual Usage' construct. However, an intention to use a particular method can be a predictor of 'Actual Usage' ( Paz et al., 2015; Recker et al., 2005 ). Although we do not include the 'Actual Usage' construct in the evaluation, we argue that an expression of intent by SOC practitioners to use the SOC-AAM in future indicates the likelihood of the SOC-AAM being adopted in practice.
Secondly, Moody (2003) emphasises that the use of the 'Actual Efficacy' constructs is only meaningful when comparing different methods. Given that the SOC-AAM is new and, to the best of the researchers' knowledge, the only existing systematic method for capturing the performance of an analyst based on multiple SOC functions, it was not possible to compare it with another systematic method to justify the use of the 'Actual Efficacy' constructs. This study, therefore, does not use these two constructs.
In addition to the perception- and intention-based constructs, we also solicited the opinions of the experts on the perceived completeness (PC) of the SOC-AAM ( Paz et al., 2015; 2013 ). We define PC as the extent to which a SOC expert believes that the SOC-AAM covers all aspects of analyst functions ( Paz et al., 2015 ). We also solicited the opinions of the SOC managers and analysts to ascertain whether the use of the SOC-AAM as the evaluation tool resulted in improved performance. Lastly, we asked SOC managers to provide feedback on whether the scores achieved by their analysts during the experiment reflected the manager's perception of the contribution of each analyst within the team.

Testing of the SOC-AAM
The testing and evaluation of the SOC-AAM took place at two different organisations. The evaluation was guided by the following research questions, which were developed based on the MEM/MAM, as explained earlier:
• (RQ1) Do SOC managers and analysts consider the SOC-AAM easy to use and useful?
• (RQ2) Would SOC managers and analysts use the SOC-AAM in practice in the future?
• (RQ3) According to the SOC managers and analysts, to what extent does the SOC-AAM cover all the main functions of an analyst?
• (RQ4) According to the SOC managers and analysts, did the introduction of the SOC-AAM lead to an improvement in an analyst's performance?
• (RQ5) According to SOC managers, did the final performance score(s) of analysts within their team reflect the manager's perceived performance of each analyst?
To protect the identity of the participants and the organisations where the evaluation took place, we refer to the two organisations as Corp1-SOC and Corp2-SOC. Both organisations were 'purposively' ( Ogbeifun et al., 2016 ) selected through opportunity and agreement with the senior managers. Participants from both SOCs were given a participant information sheet outlining the purpose of the study. The participant information sheet detailed the participants' rights, and they were free to choose whether or not to participate in the study.
Corp1-SOC provides a 24x7 security monitoring and response service for its own organisation and also offers SOC services as a Managed Security Service Provider (MSSP) to a number of other organisations in the United Kingdom and across Europe. Analysts who work in Corp1-SOC perform the range of SOC functions detailed in Table 1 above. Thirteen analysts, together with the team manager at Corp1-SOC, agreed to participate in the study without any reward for their participation.
Corp2-SOC, on the other hand, runs an internal SOC for a Norwegian telecommunication company. Analysts who work at Corp2-SOC also undertake a wide range of SOC functions, as detailed in Table 1 above. Following a discussion with senior managers at Corp2-SOC, two analysts and their SOC manager agreed to participate in the testing and evaluation of the SOC-AAM.
The SOC-AAM tool has accompanying 'Read Me' notes which detail the step-by-step process of using it, as described above. In addition, the managers from both SOCs were given a practical demonstration of the SOC-AAM via Zoom by the first author. During the demonstration, hypothetical KPI values were used to facilitate the explanation of the evaluation process.
During the evaluation experiment, monthly meetings were held with the SOC managers via Zoom to discuss issues that may have arisen while using the SOC-AAM. The meetings provided an additional opportunity to ascertain the ground truth on the weights assigned to the different functions. As part of the evaluation process, analysts from both SOCs were given the SOC-AAM template by their respective managers to record their output in the areas of measure, apart from the quality of their analysis and the quality of their report. The SOC managers provided us with anonymised scores for their analysts at the end of each month. Appendix D, Table 17 shows the monthly breakdown of scores for one of the analysts at Corp2-SOC.

Post-testing feedback
After four months of testing, the participants were invited to participate in a post-task survey. The purpose of the survey was to ascertain the opinion of the study participants on the PEOU, PU, ItU and PC of the SOC-AAM. The survey also had two questions specifically designed to find out: (1) whether the introduction of the SOC-AAM resulted in an improvement of performance, and (2) whether the SOC managers believed the score achieved by an analyst during the testing period accurately reflected the performance of the analyst.

The survey items for the PU and PEOU constructs (Table 5) were:

Perceived Usefulness (PU): the extent to which a person believes that the SOC-AAM will be effective for evaluating the performance of an analyst.
• PU1: Overall, I found the SOC-AAM to be a useful method for evaluating an analyst's performance.
• PU3: The SOC-AAM provides an effective approach for measuring the performance of a SOC analyst. (Adopted from: Moody (2003) (PU6, Q12))

Perceived Ease of Use (PEOU): the extent to which a person believes that using the SOC-AAM would be free of effort.
• PEOU1: I found the procedure for applying the SOC-AAM easy to follow. The SOC-AAM is clear and easy to grasp. (Adopted from: Moody (2003) (PEOU5, Q11); Cherdantseva (2014) (PEOU2, Q33))
The perception- and intention-based questions for the survey were formulated using a 5-point Likert scale and were based on items synthesised from previous works ( Davis, 1989; Davis et al., 1989; Moody, 2003; Paz et al., 2015 ). The wording of the items was changed to reflect the objectives of the SOC-AAM. For each question, the participants were asked to rate their responses on a scale ranging from 1 to 5, where 1 denotes an extremely negative perception of the construct and 5 a very positive rating. Given that 3 is the midpoint of the 5-point Likert scale, mean scores from the study participants greater than 3 were considered to indicate that a construct was positively perceived by the SOC experts. This approach is similar to the work of Paz et al. (2015). The constructs and the original scales adopted for the study are shown in Tables 5 and 6.

Data analysis and results of the post-testing feedback
Seventeen participants in total completed and returned the post-testing survey. The average experience of the participants was 4.7 years. The industry breakdown for the participants was: 3 (17.6%) employees from the telecommunication provider and 14 (82.4%) employees from the Managed Security Service Provider (MSSP). The feedback received from the surveys was analysed and used to answer the research questions defined under Section 7.1. Cronbach's alpha was used to assess the reliability and internal consistency of the set of scale items used in the survey. A high level of internal consistency was found for all the constructs, with Cronbach's alpha > 0.7 in all cases (see Table 7). This implies that the items in the questionnaire are highly correlated. Although there is no agreed-upon standard for reliability in the literature, α ≥ 0.7 is typically considered acceptable ( Moody, 2003 ).
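The Cronbach's alpha computation can be sketched as follows. The response matrix below is hypothetical (five respondents, three items of one construct), not the survey data of this study:

```python
import numpy as np

# Hypothetical 5-point Likert responses: rows = respondents,
# columns = items of one construct (e.g. the PU items).
X = np.array([
    [5, 4, 5],
    [4, 4, 4],
    [5, 5, 4],
    [3, 3, 4],
    [4, 5, 5],
], dtype=float)

k = X.shape[1]                          # number of items in the scale
item_vars = X.var(axis=0, ddof=1)       # sample variance of each item
total_var = X.sum(axis=1).var(ddof=1)   # sample variance of the summed scale

# Cronbach's alpha: internal consistency of the scale items.
alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print(round(alpha, 3))
```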
Under the MAM/MEM, a score greater than 3 (the neutral point on a 5-point Likert scale) indicates a positive perception ( Gonzalez-Lopez and Bustos, 2019; Paz et al., 2015; Recker et al., 2005 ). Thus, the aim was to analyse the survey data in order to determine whether the overall perception rating from the participants was greater or less than 3 for the various constructs.
A Shapiro-Wilk normality test revealed that the data from the participants was not normally distributed. Therefore, a nonparametric statistical method was used to test the data. A nonparametric method also fits the data collected because of the small sample size ( Gonzalez-Lopez and Bustos, 2019 ). The Wilcoxon signed-rank test was used to determine whether the median score of the participants was higher than 3 ( Gonzalez-Lopez and Bustos, 2019 ).
A one-sample Wilcoxon signed-rank test revealed that the median of the scores from the participants for both PEOU and PU was significantly greater than 3 (p < 0.05), indicating a positive perception of the SOC-AAM among SOC experts. The median scores for the PEOU and PU were 5 and 4 respectively.
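The one-sample test against the neutral point can be sketched as follows, using a hypothetical set of 17 Likert ratings rather than the study's actual responses:

```python
import numpy as np
from scipy import stats

# Hypothetical 5-point Likert ratings for one construct from 17 participants.
ratings = np.array([5, 4, 5, 4, 5, 3, 4, 5, 4, 5, 5, 4, 3, 5, 4, 4, 5])

# One-sample Wilcoxon signed-rank test against the neutral point 3:
# test whether the median rating is significantly greater than 3.
stat, p_value = stats.wilcoxon(ratings - 3, alternative="greater")
median = np.median(ratings)
print(median, p_value)
```

Subtracting the neutral point turns the one-sample question into the paired-difference form that `scipy.stats.wilcoxon` expects; ratings equal to 3 contribute zero differences and are discarded by the default zero-handling method.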
An intention to use a particular method is considered an important factor when evaluating the pragmatic success of a method. The median ItU score from the participants was 5, which is greater than 3 (p < 0.05). Based on this outcome, we conclude that the participants intend to use the SOC-AAM in future evaluations.
When asked how complete they perceived the SOC-AAM to be as an evaluation tool, the participants reported that the SOC-AAM covers the key areas upon which an analyst's performance can be measured. Fig. 7 shows the results for PC. The median score for PC was 4, which is greater than 3 (p < 0.05). While the SOC-AAM was initially conceptualised using existing SOC frameworks ( Majid and Ariffi, 2019; Onwubiko, 2015; Schinagl et al., 2015 ) and input from SOC experts obtained through interviews, some of the participants reported in their feedback under research question 4 that analysts could be tasked with work that may take time but that is not accounted for in the SOC-AAM. All the same, our goal was to propose an approach based on the most common analyst functions as reported by the SOC experts in our earlier work ( Agyepong et al., 2020b ) and insight from the existing literature ( Onwubiko, 2015; Schinagl et al., 2015 ). We recognise this as a limitation of our work.
When the participants were asked whether the use of the SOC-AAM resulted in an improvement, the majority of the analysts (92%) commented that the guidelines had improved their incident reports because they provided relevant cues.

The survey items for the ItU and PC constructs (Table 6) were:

Intention to Use (ItU): the extent to which a person intends to continue to use the SOC-AAM for the evaluation of an analyst's performance.
• ItU1: If I retain access to the SOC-AAM, my intention would be to continue to use it when evaluating analysts' performance.
• ItU2: In the future, I expect I will continue to use the SOC-AAM for measuring an analyst's performance. (Adopted from: Recker (2008) (ItU2))
• ItU3: I prefer to continue to use the SOC-AAM for the measuring of an analyst's performance over other ways of assessing an analyst's performance. (Adopted from: Recker (2008))

Perceived Completeness (PC): the extent to which a person believes that the SOC-AAM covers all core areas in evaluating the performance of an analyst.
• PCO1: I found the SOC-AAM to be a complete method for measuring the performance of an analyst based on their task performance.

The manager at Corp1-SOC commented that: "The guidelines for assessing the quality of analysts' analysis has been useful to the team. I think it encouraged them to expand their thinking and take a step back to think through what they need to do when writing their incident report. I also believe that the tool made it possible for everyone within the team to understand the basis upon which they are being assessed." The manager at Corp2-SOC expressed a similar opinion, but suggested that producing a good-quality report comes with experience. The manager at Corp2-SOC stated: "I think the SOC-AAM greatly helped my analysts develop their ability to analyze events. I think the quality of analysis criteria are good but analysis comes with experience, knowledge and also process of the organisation."

The managers were asked whether the scores achieved by their analysts reflected their perceived view of each individual analyst's contribution within their team. Extracts from the managers' responses are provided below. The manager at Corp1-SOC stated: "There are some competitive individuals within the team, so I was expecting those individuals to show that competitiveness. However, looking at the monthly scores, it was great to see that all the analysts did pretty well. I am of the opinion that the scores achieved by each individual analyst reflected in how I perceive their contribution to the team. One area that I saw improvement across the board is report writing."
The manager at Corp2-SOC stated that the scores achieved by analysts only reflected about 95% of their performance. According to the Corp2-SOC manager: "implementation and architect activities are not in the SOC-AAM. Therefore, when the results are collected for each period, there will be times when the outcome will not be linear because the analyst was performing other implementation activities. But overall, when compared to the general SOC, I think the SOC-AAM is satisfactory in measuring analyst performance".
The comments indicate that analysts at Corp2-SOC have other tasks that are not measured in the SOC-AAM. However, as stated earlier under Section 3, our intention was to measure analysts' performance on the basis of their most common functions. Besides, there is no evidence in the literature to suggest that implementation and architectural activities are typical functions expected of analysts ( Agyepong et al., 2020b; Onwubiko and Ouazzane, 2019b ). So while analysts at Corp2-SOC undertake those activities, this will not be the case in enough other SOCs to justify including those activities in our method.

Discussion and research implications
The overall objective of this study was to find a systematic approach to evaluating the performance of a SOC analyst. The findings from the experimental case study show that the SOC-AAM enables SOC managers and stakeholders, such as supervisors, to aggregate, quantify and evaluate the performance of analysts in a systematic manner.
Our findings and discussions with the SOC practitioners also lead us to believe that the SOC-AAM offers an operational, adaptable and practical method for evaluating analysts' performance. It is operational because it can be applied, as it is, by any SOC offering the functions identified by the SOC-AAM to evaluate the performance of an analyst ( Onwubiko, 2020 ). It is adaptable and practicable because it can be tailored to suit each SOC's specific situation and the functions offered by that SOC. Also, given that the SOC-AAM covers the main functions expected of an analyst ( Agyepong et al., 2020b ), it provides a comprehensive approach when seeking to evaluate the performance of an analyst. Nevertheless, the number of participants in the study was small, making it difficult to generalise the outcome of the experimental case study.
While the sample size of the experimental case study makes generalisation of the findings difficult, as stated by Yin (2018), the aim of a case study is not to generalise but rather to gain a deeper understanding of a specific situation. Thus, the objective of the testing was to assess the efficacy of the SOC-AAM, from the practitioners' point of view, as a method for evaluating analysts' performance.
In comparison to existing performance metrics that are based on KPIs, which generally do not differentiate the efforts of analysts ( Onwubiko, 2015; Sundaramurthy et al., 2015; 2014 ), the weights proposed by this study allow SOC managers to differentiate efforts. We make a distinction between the priorities of alerts actioned by analysts and thereby contribute to solving the current problem that performance assessment usually does not consider priority ( Kokulu et al., 2019 ). Also, the SOC-AAM considers several aspects of analysts' tasks and previously unmeasured areas, such as dealing with false positives ( Sundaramurthy et al., 2015 ).
Given that different SOCs provide different functions ( Jacobs et al., 2013; Onwubiko, 2015; Schinagl et al., 2015 ) and analyst responsibilities vary between SOCs ( Goodall et al., 2004 ), this study acknowledges that each analyst performs only a subset of the functions presented in the SOC-AAM. While some SOC researchers have attempted to organise analyst responsibilities based on the tiers in which they operate (1, 2, 3) ( Kokulu et al., 2019; Vielberth et al., 2020 ), there are often inconsistencies in the responsibilities assigned to the tiers. Onwubiko and Ouazzane (2019a), on the other hand, only describe the roles of analysts without assigning them to a specific tier; these researchers consider all analyst functions to be the generic functions one would anticipate from an analyst. Similarly, Aung et al. (2020) and Li et al. (2016) discuss how analysts apply patches to fix vulnerabilities without assigning them to a particular tier. Also, analysts working for an in-house SOC (a SOC that is owned by the organisation it is protecting) may have different functions in comparison with analysts working for a SOC that offers its services as a Managed Security Service Provider (MSSP). MSSPs are typically third-party organisations that provide SOC services under a specific contract to another organisation. However, as noted in the existing literature ( Jacobs et al., 2013; Zimmerman, 2014 ), both in-house and third-party SOCs offer the wide range of functions detailed in the SOC-AAM.
This inconsistency and current lack of consensus on the precise functions expected of an analyst at each tier influenced our decision not to assign specific tasks to tiers 1, 2 and 3 in the SOC-AAM, but instead to present a broad range of functions, allowing performance to be measured against the specific set of functions offered by a SOC.
The SOC-AAM does not make a distinction between analyst tiers and hence is more applicable in the context of a SOC with a non-hierarchical structure, where all analysts are expected to have the same level of skills, perform the same functions and work independently ( Alharbi, 2020; Kokulu et al., 2019 ). When using the SOC-AAM, SOC managers may customise the framework and choose to evaluate analysts only in relation to specific relevant functions. SOC managers and analysts should agree on assessment areas in both hierarchical and non-hierarchical contexts. Also, SOC managers could choose to compare the scores of analysts operating at the same tier as part of each assessment.
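When a manager restricts the assessment to the subset of functions a SOC actually offers, the retained function weights need to be rescaled so they again sum to one. The sketch below illustrates this renormalisation step; the function names and weight values are made-up placeholders, not the actual SOC-AAM weights.

```python
def renormalise(weights, selected):
    """Restrict a weight vector to the functions a SOC actually offers,
    rescaling so the retained weights again sum to 1."""
    kept = {f: w for f, w in weights.items() if f in selected}
    total = sum(kept.values())
    return {f: w / total for f, w in kept.items()}

# Illustrative full weight vector over four hypothetical functions.
all_functions = {"monitoring": 0.4, "analysis": 0.3,
                 "reporting": 0.2, "vulnerability": 0.1}

# A SOC that only performs monitoring and analysis.
subset = renormalise(all_functions, {"monitoring", "analysis"})
print(subset)  # monitoring ~= 0.571, analysis ~= 0.429
```

The relative importance of the retained functions is preserved (monitoring still carries 4/3 the weight of analysis), which keeps scores comparable across analysts assessed on the same subset.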
Another benefit of the proposed approach is the provision of novel guidelines for assessing the quality of an analyst's analysis and their incident report as part of an analyst's performance evaluation process. To the best of our knowledge, this is the first research to work collaboratively with industry experts to propose formal guidelines for assessing the quality of an incident analysis/report. Using the proposed guidelines, it is anticipated that novice or junior analysts can improve their performance in the analysis function. In addition, SOC managers can use these guidelines to assess the performance of analysts in this qualitative area, which is traditionally difficult to evaluate ( Achraf Chamkar et al., 2021 ).
Even though this study demonstrates that it is feasible to measure an analyst's performance using a systematic approach, the pairwise comparison conducted as part of the AHP is a time-consuming activity. Thus, a contribution of this work is to simplify this process by proposing weights that SOC managers and stakeholders can use to evaluate the performance of an analyst. Because the experts reached a consensus on these weights, SOC managers and stakeholders can apply them directly without having to go through the intensive AHP process themselves. One area of concern is whether the opinions and weights deduced by a small group of experts can be unconditionally generalised to all contexts and organisations.
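The AHP computation being simplified away here can be sketched as follows, using standard Saaty conventions: weights are approximated by normalised row geometric means of the pairwise comparison matrix, and the consistency ratio (CR) checks that the judgements are coherent. The 3x3 matrix below contains made-up example judgements, not the Delphi panel's actual comparisons.

```python
import math

# Saaty's random consistency indices for matrices of size n.
RI = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12}

def ahp_weights(matrix):
    """Approximate the principal eigenvector via normalised row geometric means."""
    n = len(matrix)
    gm = [math.prod(row) ** (1 / n) for row in matrix]
    total = sum(gm)
    return [g / total for g in gm]

def consistency_ratio(matrix, weights):
    """CR = CI / RI, where CI = (lambda_max - n) / (n - 1)."""
    n = len(matrix)
    # Estimate lambda_max by comparing A.w with w, row by row.
    lam = sum(sum(matrix[i][j] * weights[j] for j in range(n)) / weights[i]
              for i in range(n)) / n
    ci = (lam - n) / (n - 1)
    return ci / RI[n]

# Hypothetical pairwise judgements over three example functions
# (e.g. monitoring vs analysis vs reporting).
A = [[1,   3,   5],
     [1/3, 1,   3],
     [1/5, 1/3, 1]]

w = ahp_weights(A)
print([round(x, 3) for x in w])           # [0.637, 0.258, 0.105]
print(consistency_ratio(A, w) < 0.1)      # True: judgements acceptably consistent
```

A CR below 0.1 is conventionally taken as acceptable; repeating this elicitation for every criterion and sub-criterion is what makes raw AHP burdensome, and is precisely the step the pre-derived weights remove.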
We recognise this as a potential issue and therefore attempted to lessen this concern by engaging with experts from five different industries. Moreover, to avoid bias in the pairwise judgements, the Delphi method was used over other group data collection methods, such as a focus group, to ensure that a dominant participant does not hijack the session ( Brown, 2018 ).
From a research perspective, this study offers a detailed insight into the work of analysts, and as such, cyber security researchers who may not have access to SOCs can draw on this study to understand the operations of cyber analysts. The areas of measurement presented in this work would be valuable to SOC system designers, who can draw on the suggested areas when designing systems for SOCs to facilitate the evaluation of an analyst's performance. For example, analysts' monitoring dashboards for tools such as Security Information and Event Management (SIEM) and Intrusion Detection Systems (IDSs) can be designed to incorporate some of the performance metrics proposed in this study to capture analysts' performance. Dashboards are interfaces that bring together several security tools on one screen for an analyst.

Conclusion and future work
Evaluating the performance of a SOC analyst is a subject of interest for both cyber security researchers and practitioners, as poor performance from analysts will negatively impact the overall effectiveness and efficiency of a SOC. However, existing literature highlights the lack of a systematic approach for evaluating the performance of an analyst, causing frustration for both SOC managers and analysts.
In this paper, we proposed a method for evaluating the performance of analysts consistently and systematically by drawing on a Delphi panel and the principles of the AHP. Our work represents a potential change in direction in how analysts' performance is evaluated. By proposing a weighted approach, we have demonstrated that it is possible to evaluate the performance of an analyst in a systematic manner based on their task performance. To the best of our knowledge, this is the first empirical study to propose a systematic approach for evaluating the performance of an analyst.
This study has some limitations in that we focus on analysts' task performance. We recognise that individual work performance can be evaluated along other dimensions, such as adaptive performance and contextual performance ( Koopmans, 2014 ). Future work may investigate how to capture the performance of an analyst along these other dimensions. Also, some participants described the SOC-AAM as time-consuming and advised combining it with their ticketing systems, such as Jira, to streamline the evaluation process. Working with SOC system designers to integrate the proposed technique into SOC tooling to assist the evaluation could be a potential solution to this constraint. Another limitation of this study is the manager's random selection of a written report as part of the evaluation process: an incident report that is inadequately or poorly written may be missed by the manager. Furthermore, the functions and roles of analysts used in this work are based on a case study conducted with a small number of participants (SOC experts), who selected and validated the functions in prior work, and on insights from existing literature describing the primary functions of a SOC ( Majid and Ariffi, 2019; Onwubiko, 2015; Schinagl et al., 2015 ). Therefore, we recognise that the small sample size may have led to the omission of certain functions; a different group of participants may have chosen or included additional analyst functions.
Measurement item PCO2 (adopted from Paz et al., 2015): "I found the SOC-AAM to be a complete method for measuring an analyst's performance in comparison to existing approaches."

Table 3
The consistency indices for a randomly generated matrix.

Table 4
Quality of Analysis and Quality Report Criteria.

Table 5
Adopted measurement items for the study.

Table 6
Adopted measurement items for the study.

Table 7
Reliability of the scale items.

Table 9
Weights and Consistency Ratio (CR) for the Monitoring and Detection Function Subcriteria.

Table 11
Weights and Consistency Ratio (CR) for the Baseline and Vulnerability Function Subcriteria.

Table 13
Weights and Consistency Ratio (CR) for the Response and Reporting Function Subcriteria.

Table 16
An Analyst Assessment Template.