A Study of K-ISMS Fault Analysis for Constructing Secure Internet of Things Service

Although Internet of Things (IoT) technologies and services are developing rapidly worldwide, concerns about potential security threats such as privacy violations, information leaks, and hacking are growing as more and more diverse sensors are connected to the Internet. Risk management and existing security management standards (e.g., ISO27001) need to be studied and introduced to ensure the stability and reliability of IoT services. K-ISMS is a representative certification system that evaluates the security management level of enterprises in Korea, and it can be applied as a standardized process to enhance the security management of IoT services. However, because Internet security incidents occur frequently at K-ISMS-certified enterprises, there are growing concerns that the quality of K-ISMS certification assessment is deteriorating. Research is therefore required to improve the accuracy and objectivity of certification assessment. Because existing studies focus mainly on simple statistical analysis of K-ISMS assessment results, analysis of the causes of certification assessment faults based on past data remains insufficient. As a method of managing certification assessment quality, this paper analyzes the associations among the fault items in K-ISMS certification assessment results using association rule mining, which identifies association rules among the items in a database.


Introduction
According to a survey by Gartner, the number of things connected to the Internet is expected to grow to 26 billion, and the market size is forecast to reach USD 1 trillion by 2020 [1]. IoT is widely applied in areas closely related to daily life, such as smart home appliances, smart cars, and healthcare. Because multiple networks can be controlled through even a single sensor, hacking one area can be fatal: it can trigger security threats in a chain reaction. As IoT services increasingly use sensor information to provide a wide range of information, the risk of information leaks will also increase. Malfunction or suspension of IoT devices can pose a very serious threat even to social infrastructure; the resulting economic damage is predicted to reach KRW 17.7 trillion by 2020 [1]. Therefore, the technical and administrative vulnerabilities of each IoT element, such as sensors, wired and wireless networks, and platforms, should be considered from the design stage, and the application of existing standards should be reviewed and studied as a key tool for continuously inspecting these elements during the operating stage.
The Korean government operates the information security management system certification, "K-ISMS," to assess whether an organization has established and manages an appropriate information security environment. K-ISMS, which is similar to the international standard for information security management systems, "ISO27001," is designed to improve the information security management level of enterprises and protect them from various security threats [2]. Using 104 certification criteria, K-ISMS evaluates whether an enterprise has set up a comprehensive information security management system, including administrative, technical, and physical protective measures, to protect the safety of its information assets; it then issues the certification if the enterprise meets the requirements. K-ISMS was introduced in Korea in 2002, and 332 enterprises have acquired the certification. Since certification became mandatory in 2013 for enterprises over a certain size, demand for and interest in K-ISMS certification have been gradually increasing [3].
International Journal of Distributed Sensor Networks
However, with Internet security incidents (e.g., hacking) occurring frequently at K-ISMS-certified enterprises, concerns about the deteriorating quality of K-ISMS certification assessment are growing. Because the information security management systems of enterprises differ and fault cases vary, certification assessment requires specialization and extensive experience. Moreover, maintaining objective and accurate assessment quality is difficult because all 104 criteria must be evaluated within a short period. To address these problems, various studies have been conducted, including a case study of faults in K-ISMS [4], an analysis of the economic effect of K-ISMS certification [5], and analyses of security management processes for various IT services [6,7]. However, no study has yet applied data mining to the relationships among faults.
Thus, this study aims to provide a guide for selecting priority assessment items within the limited assessment period by analyzing frequently occurring fault patterns and their associations in K-ISMS fault data. For this purpose, we apply a data mining technique to analyze the associations among the fault items of the K-ISMS certification criteria. The paper is organized as follows. Section 2 introduces association rule mining, one of the best-known and most widely used unsupervised learning techniques. In Sections 3 and 4, experiments are performed on K-ISMS fault data. The conclusion is given in Section 5.

Theoretical Background
Data mining is a knowledge discovery process that extracts previously unknown, useful information by analyzing large quantities of accumulated data. Among studies that identify hidden patterns in data, association rule mining has been studied most widely, in areas such as market forecasting, medical research, and IT engineering [8,9]. Association rule analysis is a technique that finds useful patterns, expressed as "condition-result" formulas, among data items. The association rules extractable from a given data set are compared to evaluate their importance. The measures commonly used to assess the strength of an association rule are support, confidence, and interest [10].
The problem of finding association rules $X \to Y$ was first introduced in 1993 by Agrawal et al. [11] as the data mining task of finding frequently co-occurring items in a large Boolean transactional database $D$ [12]. Typical applications include retail market basket analysis, item recommendation systems, cross-selling, and loss-leader analysis. In the classical framework, an association rule is considered interesting if its support ($s$) and confidence ($c$) exceed user-defined minimum thresholds [13]. Support is defined as the percentage of transactions in the data that contain all items in both the antecedent and the consequent of the rule; that is, $\mathrm{Supp}(X \to Y) = P(E_X \cap E_Y)$, where $E_X$ denotes the event that a transaction contains itemset $X$ [14]. Confidence, on the other hand, is an estimate of the conditional probability of $Y$ given $X$; that is, $\mathrm{conf}(X \to Y) = P(E_Y \mid E_X) = P(E_X \cap E_Y)/P(E_X)$ [13].
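Both measures can be computed directly from a list of transactions. The sketch below is a minimal illustration, not the authors' implementation; it assumes each transaction is a Python `frozenset` of item names, and the five-transaction database and the function names `support_count`, `support`, and `confidence` are hypothetical. Confidence is computed as a ratio of counts, which equals the support of the combined itemset divided by the support of the antecedent.

```python
from typing import FrozenSet, List

Transaction = FrozenSet[str]


def support_count(itemset: FrozenSet[str], transactions: List[Transaction]) -> int:
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def support(itemset: FrozenSet[str], transactions: List[Transaction]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    return support_count(itemset, transactions) / len(transactions)


def confidence(lhs: FrozenSet[str], rhs: FrozenSet[str],
               transactions: List[Transaction]) -> float:
    """Estimate of P(rhs | lhs): transactions with lhs and rhs over those with lhs."""
    lhs_count = support_count(lhs, transactions)
    if lhs_count == 0:
        raise ValueError("antecedent never occurs; confidence is undefined")
    return support_count(lhs | rhs, transactions) / lhs_count


# Hypothetical toy database (not K-ISMS data).
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "diapers", "beer", "eggs"}),
    frozenset({"milk", "diapers", "beer", "cola"}),
    frozenset({"bread", "milk", "diapers", "beer"}),
    frozenset({"bread", "milk", "diapers", "cola"}),
]

X, Y = frozenset({"diapers"}), frozenset({"beer"})
print(support(X | Y, transactions))    # support of X -> Y: 0.6
print(confidence(X, Y, transactions))  # confidence of X -> Y: 0.75
```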
Association rule finding is the process of identifying association rules that meet predefined support and confidence thresholds. It broadly consists of two steps [15,16]. The first is "frequent itemset finding," which finds items that occur together in transactions by keeping only the itemsets that satisfy the support threshold, "minimum support." The second is "association rule generation," which keeps, among the association rules created from the frequent itemsets, only those that satisfy the confidence threshold, "minimum confidence."

Frequent Itemset Finding.
To find frequent itemsets, candidate itemsets are formed as combinations of the given items, and the transaction data are scanned for each candidate to check whether it satisfies the minimum support.
Let $I = \{i_1, i_2, \ldots, i_d\}$ be the set of all items in the transaction database and $T = \{t_1, t_2, \ldots, t_N\}$ a set of $N$ transactions; each transaction $t_j$ is a subset of $I$ ($t_j \subseteq I$) [17]. An itemset that contains $k$ items is called a $k$-itemset [17,18]. For example, {Beer, Diapers, Milk} is a 3-itemset; the null (empty) itemset contains no items at all.
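If transaction $t_j$ contains all items in itemset $X$, $t_j$ is said to support $X$. The support count $\sigma(X)$ is the number of transactions that contain $X$, and the support $\mathrm{Supp}(X)$ is the fraction of all transactions that contain it:
$$\sigma(X) = \left|\{\, t_j \mid X \subseteq t_j,\ t_j \in T \,\}\right|, \qquad \mathrm{Supp}(X) = \frac{\sigma(X)}{N}.$$
When the user defines a minimum support (minsup) and $\mathrm{Supp}(X) \ge \mathrm{minsup}$ holds, itemset $X$ is called a frequent itemset [19].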
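The definitions of the support count and of a frequent itemset translate into a brute-force check: enumerate every $k$-item combination, count the transactions that contain it, and keep it if its support reaches minsup. This is a minimal sketch assuming transactions are Python `frozenset`s; the helper names and the toy data are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List

Transaction = FrozenSet[str]


def support_count(itemset: FrozenSet[str], transactions: List[Transaction]) -> int:
    """Support count: number of transactions t_j with itemset ⊆ t_j."""
    return sum(1 for t in transactions if itemset <= t)


def frequent_k_itemsets(transactions: List[Transaction], k: int,
                        minsup: float) -> Dict[FrozenSet[str], float]:
    """Brute force: test every k-itemset drawn from the item universe I."""
    items = sorted(set().union(*transactions))  # the item universe I
    n = len(transactions)
    frequent = {}
    for combo in combinations(items, k):
        itemset = frozenset(combo)
        supp = support_count(itemset, transactions) / n
        if supp >= minsup:  # Supp(X) >= minsup, so X is frequent
            frequent[itemset] = supp
    return frequent


# Hypothetical toy database (not K-ISMS data).
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "diapers", "beer", "eggs"}),
    frozenset({"milk", "diapers", "beer", "cola"}),
    frozenset({"bread", "milk", "diapers", "beer"}),
    frozenset({"bread", "milk", "diapers", "cola"}),
]

for itemset, supp in frequent_k_itemsets(transactions, k=2, minsup=0.6).items():
    print(sorted(itemset), supp)
```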

Association Rule Generation.
Rule generation is the process of creating association rules from the frequent itemsets found in the "frequent itemset finding" step. Suppose $X$ and $Y$ are nonempty itemsets that share no element, where $I$ is the set of all items: $X \subset I$, $Y \subset I$, $X \cap Y = \emptyset$, $X \neq \emptyset$, and $Y \neq \emptyset$ [16]. An association rule expresses an association among frequently occurring data in the form "condition → result" (rule: $X \Rightarrow Y$), where $X$ is called the LHS (left-hand side) and $Y$ the RHS (right-hand side) [19]. Support and confidence are used as statistical criteria to verify the validity of an association rule. "Support" is the ratio of transactions that contain both $X$ and $Y$ among all transactions and is expressed as $\mathrm{Supp}(X \Rightarrow Y)$. Because support reflects how frequently the pattern or rule occurs, it should be large for the pattern or rule to be useful. With $\sigma(\cdot)$ denoting the support count and $N$ the number of transactions,
$$\mathrm{Supp}(X \Rightarrow Y) = \frac{\sigma(X \cup Y)}{N}.$$
"Confidence" measures the strength of a rule. For a rule $X \Rightarrow Y$, confidence is the ratio of transactions that also contain $Y$ among the transactions that contain $X$, and it is expressed as $\mathrm{conf}(X \Rightarrow Y)$. Confidence thus measures how accurately the rule predicts conclusion $Y$ when condition $X$ holds; high confidence enables accurate prediction:
$$\mathrm{conf}(X \Rightarrow Y) = \frac{\sigma(X \cup Y)}{\sigma(X)}.$$
Finding association rules in data means finding rules whose support and confidence are higher than the user-defined minimum support and minimum confidence. For example, assume that the following association rule candidates are identified from the itemset {bread, egg, milk} [18]. If the minimum confidence is 70%, the second and third association rules will be selected. In this way, all possible combinations of itemsets are created, and some of them are selected as association rules depending on whether they satisfy the minimum confidence.
(Rule 1) bread ⇒ egg, milk: confidence = 0.3/0.5 = 60%.
Strong association rules can be filtered using the support and confidence criteria. However, support and confidence have weaknesses. Support suffers from the "rare item problem" [20]: infrequent items that do not meet the minimum support are ignored, which is problematic if rare items are important [21]. Conversely, if the minimum support is set low, the search space becomes larger. In addition, high minimum confidence and minimum support do not necessarily indicate a strong association; such rules can arise by chance. Therefore, statistical correlation measures such as lift and conviction are needed to address these problems and find strong association rules [21]. Interest (or lift) is another statistic that attempts to correct this weakness [22].
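Rule generation from one frequent itemset can be sketched as follows: every nonempty proper subset $X$ of the itemset becomes an antecedent, the remaining items become the consequent, and the rule is kept only if its confidence reaches the minimum confidence. The ten transactions below are hypothetical, chosen so that Supp({bread}) = 0.5 and Supp({bread, egg, milk}) = 0.3, which reproduces the 60% confidence of Rule 1; the other rules this data yields come from the toy data, not from the original example. The function names are illustrative.

```python
from itertools import combinations
from typing import FrozenSet, List, Tuple

Transaction = FrozenSet[str]
Rule = Tuple[FrozenSet[str], FrozenSet[str], float]  # (LHS, RHS, confidence)


def support_count(itemset: FrozenSet[str], transactions: List[Transaction]) -> int:
    return sum(1 for t in transactions if itemset <= t)


def rules_from_itemset(itemset: FrozenSet[str], transactions: List[Transaction],
                       minconf: float) -> List[Rule]:
    """Split itemset Z into X => Z - X for every nonempty proper subset X.

    Assumes Z is frequent, so every subset X occurs at least once.
    """
    z_count = support_count(itemset, transactions)
    rules = []
    for size in range(1, len(itemset)):
        for lhs_items in combinations(sorted(itemset), size):
            lhs = frozenset(lhs_items)
            conf = z_count / support_count(lhs, transactions)  # sigma(Z) / sigma(X)
            if conf >= minconf:
                rules.append((lhs, itemset - lhs, conf))
    return rules


# Hypothetical data: bread in 5 of 10 transactions, {bread, egg, milk} in 3.
transactions = [frozenset(t) for t in (
    {"bread", "egg", "milk"}, {"bread", "egg", "milk"}, {"bread", "egg", "milk"},
    {"bread"}, {"bread", "egg"}, {"egg", "milk"}, {"milk"}, {"egg"}, {"milk"},
    {"egg", "milk"},
)]
z = frozenset({"bread", "egg", "milk"})

# Rule 1 check: conf(bread => egg, milk) = 0.3 / 0.5
print(support_count(z, transactions) / support_count(frozenset({"bread"}), transactions))  # 0.6

for lhs, rhs, conf in rules_from_itemset(z, transactions, minconf=0.7):
    print(sorted(lhs), "=>", sorted(rhs), round(conf, 2))
# ['bread', 'egg'] => ['milk'] 0.75
# ['bread', 'milk'] => ['egg'] 1.0
```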
Confidence tends to rate rules highly when the consequent ($Y$) is frequent [23]. For example, if 80% of the transactions in a database contain $Y$, then the expected confidence of any rule $X \to Y$ is 80%, even before the influence of $X$ on $Y$ is taken into account. The interest of $X \to Y$ is defined as the confidence of $X \to Y$ divided by the proportion of all transactions that contain $Y$, that is, $\mathrm{interest}(X \to Y) = \mathrm{conf}(X \to Y)/\mathrm{Supp}(Y)$. This scales the confidence to account for the commonality (or rarity) of $Y$ [22].
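The 80% example can be checked numerically. In the hypothetical ten-transaction database below, item Y occurs in 8 transactions and item X in 5, 4 of which also contain Y, so X and Y are independent. Confidence alone looks high (0.8), but dividing by the support of Y gives an interest (lift) of exactly 1. The function names are illustrative, not from any library.

```python
from typing import FrozenSet, List

Transaction = FrozenSet[str]


def support_count(itemset: FrozenSet[str], transactions: List[Transaction]) -> int:
    return sum(1 for t in transactions if itemset <= t)


def confidence(lhs: FrozenSet[str], rhs: FrozenSet[str],
               transactions: List[Transaction]) -> float:
    return support_count(lhs | rhs, transactions) / support_count(lhs, transactions)


def lift(lhs: FrozenSet[str], rhs: FrozenSet[str],
         transactions: List[Transaction]) -> float:
    """interest(X -> Y) = conf(X -> Y) / Supp(Y)."""
    supp_rhs = support_count(rhs, transactions) / len(transactions)
    return confidence(lhs, rhs, transactions) / supp_rhs


# Hypothetical data: Y in 8 of 10 transactions; X in 5, four of which contain Y.
transactions = (
    [frozenset({"X", "Y"})] * 4
    + [frozenset({"X"})]
    + [frozenset({"Y"})] * 4
    + [frozenset()]
)
X, Y = frozenset({"X"}), frozenset({"Y"})
print(confidence(X, Y, transactions))  # 0.8
print(lift(X, Y, transactions))        # 1.0
```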
The interest measure [13] is defined over $[0, \infty]$ and is interpreted as follows:
(i) If interest (lift) < 1, then $X$ and $Y$ appear together in the data less frequently than expected under the assumption of conditional independence. $X$ and $Y$ are said to be negatively interdependent.
(ii) If interest (lift) = 1, then $X$ and $Y$ appear together as frequently as expected under the assumption of conditional independence. $X$ and $Y$ are said to be independent of each other.
(iii) If interest (lift) > 1, then $X$ and $Y$ appear together in the data more frequently than expected under the assumption of conditional independence. $X$ and $Y$ are said to be positively interdependent.
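The three cases can be expressed by comparing the observed co-occurrence count of $X$ and $Y$ with the count expected if they were independent; that ratio equals the lift. The sketch below works from raw counts so that the examples divide exactly, and it uses a small tolerance for the "= 1" case because lift is rarely exactly 1 in floating point. The function names, the tolerance, and the counts are assumptions for illustration.

```python
import math


def lift_from_counts(n: int, n_xy: int, n_x: int, n_y: int) -> float:
    """Observed co-occurrences over those expected under independence.

    n: total transactions; n_xy, n_x, n_y: transactions containing X and Y, X, Y.
    lift = (n_xy / n) / ((n_x / n) * (n_y / n)) = n_xy * n / (n_x * n_y)
    """
    return (n_xy * n) / (n_x * n_y)


def interpret_lift(value: float, tol: float = 1e-9) -> str:
    if math.isclose(value, 1.0, abs_tol=tol):
        return "independent"
    return "positively interdependent" if value > 1.0 else "negatively interdependent"


# Hypothetical counts (n, n_xy, n_x, n_y) for three item pairs.
examples = {
    "pair A": (5, 3, 4, 3),    # lift = 15 / 12 = 1.25
    "pair B": (10, 4, 5, 8),   # lift = 40 / 40 = 1.0
    "pair C": (10, 1, 5, 5),   # lift = 10 / 25 = 0.4
}
for name, counts in examples.items():
    value = lift_from_counts(*counts)
    print(name, value, interpret_lift(value))
```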

Analysis Data.
In this paper, we analyzed the fault data of 76 enterprises that received certification assessment in 2013 (only the statistical results were used) and applied the representative "Apriori algorithm" [17][18][19] for association rule mining. The average fault rate of these enterprises was 15%. (The K-ISMS control item terms used in this paper are explained in the appendix.) From the 104 K-ISMS certification assessment items, the frequent itemsets whose support exceeded the minimum support were created. A total of 825 rules were found using the brute-force method. When the number of fault items included in these rules was analyzed, 58 one-itemsets, 494 two-itemsets, 312 three-itemsets, and 20 four-itemsets were created. Figure 1 illustrates the K-ISMS fault items that occur most frequently, in the order CL-1-1 (asset identification), AC-3-3 (access control), OS-2-2 (security system operation), AC-2-3 (access right review), and CC-1-1 (encryption policy establishment).
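Ranking single fault items and counting frequent itemsets by size, as reported above, can be sketched with a brute-force search. The fault records below are hypothetical (five invented enterprises that reuse control-item codes from the text) and do not reproduce the 2013 assessment data; each record is the set of control items in which a fault was found. Brute force visits all $2^d - 1$ nonempty itemsets over $d$ items, which becomes infeasible as $d$ approaches the 104 criteria.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List

# Hypothetical fault records: one set of faulty control items per enterprise.
fault_records: List[FrozenSet[str]] = [frozenset(r) for r in (
    {"CL-1-1", "OS-2-2", "AC-3-3"},
    {"CL-1-1", "OS-2-2"},
    {"CL-1-1", "AC-3-3", "AC-2-3"},
    {"OS-2-2", "CC-1-1"},
    {"CL-1-1", "OS-2-2", "AC-3-3", "AC-2-3"},
)]


def brute_force_frequent(records: List[FrozenSet[str]],
                         minsup: float) -> Dict[FrozenSet[str], float]:
    """Test every nonempty itemset over the item universe against minsup."""
    items = sorted(set().union(*records))
    n = len(records)
    frequent = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            itemset = frozenset(combo)
            supp = sum(1 for r in records if itemset <= r) / n
            if supp >= minsup:
                frequent[itemset] = supp
    return frequent


# Rank single fault items by how many records contain them (ties broken by code).
item_counts = Counter(item for r in fault_records for item in r)
ranking = sorted(item_counts.items(), key=lambda kv: (-kv[1], kv[0]))
print(ranking)
# [('CL-1-1', 4), ('OS-2-2', 4), ('AC-3-3', 3), ('AC-2-3', 2), ('CC-1-1', 1)]

frequent = brute_force_frequent(fault_records, minsup=0.4)
print(dict(sorted(Counter(len(s) for s in frequent).items())))
# {1: 4, 2: 5, 3: 2}
```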
As the number of items increases, the computational workload of generating frequent itemset candidates grows exponentially. To address this problem, a "pruning" method [24] is used to eliminate unnecessary candidates. "Pruning" discards, at each phase, the combinations that do not satisfy the threshold criterion. The most common pruning method is MSP (minimum support pruning): if the support of an itemset combination is below the threshold value, the itemset is not extended further. This is valid because support is anti-monotone: every superset of an infrequent itemset is itself infrequent. To remove duplicated association rules, 307 association rule candidate groups were created through support-based pruning.
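Minimum support pruning is what the level-wise Apriori search relies on. The following is a minimal textbook-style sketch, not the arules implementation the authors used: candidates of size $k$ are joined from frequent $(k-1)$-itemsets, any candidate with an infrequent $(k-1)$-subset is discarded before its support is counted, and only candidates that reach minsup are extended. The fault records and the function name are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

Itemset = FrozenSet[str]


def apriori(records: List[Itemset], minsup: float) -> Dict[Itemset, float]:
    n = len(records)

    def supp(itemset: Itemset) -> float:
        return sum(1 for r in records if itemset <= r) / n

    # Level 1: frequent single items.
    level: Dict[Itemset, float] = {}
    for item in sorted(set().union(*records)):
        s = supp(frozenset([item]))
        if s >= minsup:
            level[frozenset([item])] = s
    frequent = dict(level)

    k = 2
    while level:
        prev: Set[Itemset] = set(level)
        # Join step: unite pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step (anti-monotonicity): every (k-1)-subset must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in prev for sub in combinations(c, k - 1))
        }
        # Count support only for the surviving candidates.
        level = {}
        for c in candidates:
            s = supp(c)
            if s >= minsup:
                level[c] = s
        frequent.update(level)
        k += 1
    return frequent


# Hypothetical fault records (not the 2013 assessment data).
fault_records = [frozenset(r) for r in (
    {"CL-1-1", "OS-2-2", "AC-3-3"},
    {"CL-1-1", "OS-2-2"},
    {"CL-1-1", "AC-3-3", "AC-2-3"},
    {"OS-2-2", "CC-1-1"},
    {"CL-1-1", "OS-2-2", "AC-3-3", "AC-2-3"},
)]
result = apriori(fault_records, minsup=0.4)
print(dict(sorted(Counter(len(s) for s in result).items())))  # {1: 4, 2: 5, 3: 2}
```

On this data, the candidates {OS-2-2, AC-3-3, AC-2-3} and {CL-1-1, OS-2-2, AC-2-3} are discarded by the prune step without a database scan, because their subset {OS-2-2, AC-2-3} is infrequent.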

Association Rule Analysis.
In the support analysis stage, a support of 10% for a rule $X \to Y$ means that faults $X$ and $Y$ occur together in 10% of all transactions. Figure 2(a) shows the support distribution of all fault items. The average support is 13.7%; 2 rules have support above 30%, 19 rules above 20%, and 286 rules above 10%. (We used the arules package in R [19,25].) Whereas support measures how frequently two faults occur together among all transactions, a confidence of 30% for a rule $X \to Y$ means that fault $Y$ occurs with 30% probability when fault $X$ has occurred. Figure 2(b) shows the confidence distribution of all fault items. The average confidence is 51%; 11 rules have confidence above 80%, 98 rules above 60%, and 160 rules above 50%. Figure 3 visualizes the correlations among fault items found using the association analysis measures (support, confidence, and lift). Table 1 shows the top 5 association rules sorted by support. Rule 44 can be interpreted as follows. Its support is 34.2%, which is the ratio of transactions in which faults occur in both the CL-1-1 and OS-2-2 control items. There is a 59% probability that a fault occurs in the OS-2-2 control item if a fault occurs in the CL-1-1 control item. In addition, since its lift is above 1, indicating a positive correlation between the two control items, the association is strong. In contrast, rule 443 has a low correlation because its lift is below 1, even though its support and confidence are 30.2% and 52.2%, respectively. Table 2 shows the top 5 association rules sorted by confidence.
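Summaries like those above (average support and confidence, the number of rules over each threshold, and the top rules by one measure) are straightforward once each rule carries its measures. The authors used the arules package in R; the Python sketch below is only an illustrative stand-in, and the `Rule` class, the helper names, and the four rules with their measure values are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class Rule:
    lhs: FrozenSet[str]
    rhs: FrozenSet[str]
    support: float
    confidence: float
    lift: float


def summarize(rules: List[Rule]) -> Dict[str, float]:
    """Averages plus 'how many rules exceed each threshold' counts."""
    summary: Dict[str, float] = {
        "mean_support": mean(r.support for r in rules),
        "mean_confidence": mean(r.confidence for r in rules),
    }
    for cut in (0.3, 0.2, 0.1):
        summary[f"support>{cut}"] = sum(r.support > cut for r in rules)
    for cut in (0.8, 0.6, 0.5):
        summary[f"confidence>{cut}"] = sum(r.confidence > cut for r in rules)
    return summary


def top_by(rules: List[Rule], measure: str, n: int = 5) -> List[Rule]:
    """Top-n rules sorted by one measure, e.g. 'support' or 'confidence'."""
    return sorted(rules, key=lambda r: getattr(r, measure), reverse=True)[:n]


def fs(*items: str) -> FrozenSet[str]:
    return frozenset(items)


# Hypothetical rules with made-up measures (not values from Table 1 or 2).
rules = [
    Rule(fs("A"), fs("B"), 0.35, 0.55, 1.10),
    Rule(fs("C"), fs("A"), 0.22, 0.72, 1.30),
    Rule(fs("B", "C"), fs("D"), 0.12, 0.85, 1.60),
    Rule(fs("D"), fs("B"), 0.31, 0.48, 0.90),
]
print(summarize(rules))
print([r.support for r in top_by(rules, "support")])  # [0.35, 0.31, 0.22, 0.12]
```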
Rule 40 can be interpreted as follows. Its support, that is, the ratio of transactions in which faults occur in control items CC-2-1, DR-2-1, and AC-2-3 at the same time, is low (10.5%). Note, however, that its confidence, that is, the ratio of a fault occurring in control item AC-2-3 when faults occur in control items AC2-2 and OS-2-2, is 100%. In addition, since the lift of this rule is above 1, the rule appears to be a strong association rule.

Results and Discussion
To analyze the relations among K-ISMS faults, we identified the association rules that meet predefined support and confidence thresholds. The first step is "frequent itemset finding," which keeps only the itemsets that satisfy the support threshold. The second step is "association rule generation," which keeps, among the association rules created from the frequent itemsets, only those that satisfy the confidence threshold. Table 3 summarizes the strong association rules within the range of the minimum support and minimum confidence. However, not all strong association rules are useful. The support-confidence framework can flag a rule $X \to Y$ as interesting even though the occurrence of $X$ does not actually affect the occurrence of $Y$. To address this issue, an interest analysis that indicates the degree of correlation of rules is required. This paper selects 43 association rules by applying the following three conditions (see Table 4): (1) minimum support > 30%, minimum confidence > 50%, and lift > 1; (2) minimum support > 20%, minimum confidence > 30%, and lift > 1; (3) minimum support > 10%, minimum confidence > 80%, and lift > 1.
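The three selection conditions can be applied as a filter over rules that already carry their measures. The sketch keeps a rule if its lift exceeds 1 and it satisfies at least one (support, confidence) pair; the thresholds are taken from the text, while the `Rule` class, the function name, and the rules themselves are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Rule:
    name: str
    support: float
    confidence: float
    lift: float


# (minimum support, minimum confidence) pairs from conditions (1)-(3); all require lift > 1.
CONDITIONS: Tuple[Tuple[float, float], ...] = ((0.30, 0.50), (0.20, 0.30), (0.10, 0.80))


def select_rules(rules: List[Rule]) -> List[Rule]:
    return [
        r for r in rules
        if r.lift > 1
        and any(r.support > s and r.confidence > c for s, c in CONDITIONS)
    ]


# Hypothetical rules (not the 43 rules of Table 4).
rules = [
    Rule("r1", 0.35, 0.55, 1.10),  # meets condition (1)
    Rule("r2", 0.22, 0.72, 1.30),  # meets condition (2)
    Rule("r3", 0.12, 0.85, 1.60),  # meets condition (3)
    Rule("r4", 0.31, 0.48, 0.90),  # lift <= 1: rejected
    Rule("r5", 0.15, 0.70, 1.20),  # no condition met: rejected
]
print([r.name for r in select_rules(rules)])  # ['r1', 'r2', 'r3']
```

Checking lift first mirrors the text: rule r4 would pass condition (2) on support and confidence alone, but it is rejected because its consequent occurs no more often than expected by chance.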
From the results of the analysis, we can infer that faults in these control items are strongly associated, because the rule {CL-1-1 (Information asset identification) ⇒ OS-2-2 (Security system operation)} has over 30% support and 50% confidence. "Information asset identification" is a control item that requires the organization to classify and identify all of its information assets. A fault occurs if those information assets are not identified periodically or if some assets are omitted. In other words, if a fault is found in "information asset identification," there is a high possibility that a fault also occurs in "security system operation." A "security system operation" fault refers to cases in which the security system operating procedure is not complied with, or the blocking-rule management log of a security system (e.g., a firewall) is unavailable or lost. Two rules have over 20% support and 70% confidence. The first rule is {PH-1-4 (Access control of physical area) ⇒ AC-3-3 (User password management)}, and the second is {AC-4-6 (Access control of Internet connection) ⇒ CL-1-1 (Information asset identification)}. In the first rule, the "Access control of physical area" control item requires that only authorized persons be allowed to access the major systems inside the security area and that the access log be reviewed periodically. A fault occurs if access control of outsiders is insufficient or if mobile devices (e.g., USB drives) can be brought into the area. In addition, a fault in the "user password management" control item occurs when the passwords of major systems are not changed periodically or the password use requirements are not met.
Therefore, if an enterprise does not perform proper "access control" on major facilities and systems, the "user password management" of its major systems (e.g., servers and network devices) may also be insufficient.

Conclusions
In this study, we applied association rule mining with the "Apriori" algorithm to analyze the correlations among K-ISMS faults and found 43 association rules. The results suggest that faults are highly correlated: for example, if an organization identifies and manages its information assets carelessly, its security system operation can also be affected.
Therefore, these association rules can serve as useful information for decision-making about an organization's security activities and as a guide to the assessment method during K-ISMS certification assessment. If a fault occurs in any K-ISMS certification criterion, the items related to it through the association rules can be checked intensively. The rules can also guide analysis of the level of Plan-Do-Check-Act activities (the organization's security management phases) from the perspective of the correlations among faults.
However, the useful rules found can differ with the size of the data set, because whether an association rule is adopted as useful depends on the occurrence frequencies in the analysis data. Therefore, further K-ISMS fault analyses are needed, such as association analyses according to the scope of an organization's certification (e.g., number of employees and systems). Based on the association rules obtained in this study, decision tree analysis to forecast fault status and fault factor analysis using a structural equation model will be studied in future work.
Because the IT environment paradigm is shifting from conventional PCs and mobile devices to IoT, a new approach is needed in terms of the range of protection targets, their characteristics, and protection subjects in order to continuously strengthen the security level of IoT. In other words, protection targets should be expanded from existing PCs and mobile devices to all objects, such as home appliances, automobiles, and medical supplies. It is also necessary to move beyond the conventional approach of protection through separate security systems, software implementations, and interfaces, and instead establish security policies, procedures, and standards that efficiently control and manage administrative, physical, and technical security.
Moreover, an information security management system suited to the IoT environment should be applied to maintain a continuous security level for IoT services and prevent the spread of intrusion incidents. This includes identifying the key assets to be protected and the threats to them, and assessing the current security level to establish policies for coping with those threats.