Identifying Thresholds for Similarity-Based Class Cohesion (SCC) Metrics

In object-oriented design (OOD), quality can be measured from the relationships between attributes and methods in the class diagram and from the interactions between objects. The cohesion of an object-oriented software design can be calculated with the Similarity-Based Class Cohesion (SCC) metric by identifying three types of interaction: method-method, method-attribute, and attribute-attribute. However, this measurement theory is rarely used in the software development industry, because no threshold value exists to mark the boundary between good and bad design. This study aims to determine the threshold of the cohesion metric based on the class diagram. The results show that the threshold of the SCC metric is 0.45, the value with the highest level of agreement with the design expert.


Introduction
Progress in software engineering increasingly supports the development of techniques that improve the quality and maintainability of systems. At the design stage, object-oriented design is more widely used than structured design [1]. In recent years, object-oriented programming languages such as C++, PHP, and Java have gained popularity in the software industry [1]. As a result, most production software is built using object-oriented design [1].
In object-oriented design (OOD), quality can be measured from the relationships between attributes and methods in the class diagram and from the interactions between objects. This makes it possible to assess the quality of a program at the design stage, based on its cohesion, before the implementation phase begins. This is the underlying objective of calculating a cohesion value. Several researchers have proposed cohesion metrics for class diagrams [1], [2], and several have worked on the object-oriented approach [1], [2], [3]. One of these metrics is Similarity-Based Class Cohesion (SCC). The cohesion of an object-oriented software design can be calculated with the SCC metric by identifying three types of interaction: method-method, method-attribute, and attribute-attribute. Cohesion is an important metric among the basic concepts of software design. The higher the cohesion of a module, the better the quality of the resulting software [4].
However, this measurement theory is rarely used in the software development industry, because no threshold value exists to mark the boundary between good and bad design. No information is available on a threshold for the cohesion metric that IT practitioners could use [5]. The SCC metric produces a cohesion value on a scale of 0 to 1, where values closer to 0 indicate a design with low cohesion and values closer to 1 indicate a design with high cohesion.
This research formulates a threshold value for Similarity-Based Class Cohesion (SCC) so that IT practitioners can use the metric to measure quality in software development. The study includes a framework for finding the threshold value. The dataset is a set of class diagrams, and an expert is involved in determining the threshold value.

Quality Cohesion Metrics
A metric is a procedure that maps a particular characteristic of an observed entity to a numerical value [7]. The characteristics and entities to be observed can be chosen freely. The benefit of a metric therefore depends on what is to be achieved with the measurement results. The numerical value of the metric tells the observer whether a value is too high or too low, too much or too little. In other words, a metric is a reference point that gives a value a useful semantic meaning [4].
Unified Modeling Language (UML) is a graphical notation, supported by a single model, that enables the description and design of systems built using object-oriented programming. This is a simple definition. In practice, what people say about UML differs from one person to another, because of its history and because of differing perceptions of what makes a software engineering process effective [fow-04]. The Similarity-Based Class Cohesion (SCC) metric, developed by Dallal, measures cohesion based on the direct and indirect attribute-attribute, method-method, and attribute-method interactions.

Similarity-based Class Cohesion (SCC)
The SCC metric is a combination of several previously formulated metrics [1], namely Method-Method through Attributes Cohesion (MMAC), Attribute-Attribute Cohesion (AAC), Attribute-Method Cohesion (AMC), and Method-Method Invocation Cohesion (MMIC). These metrics are explained below.

Method-Method through Attributes Cohesion (MMAC) and Attribute-Attribute Cohesion (AAC)
The similarity between two rows and between two columns quantifies the cohesion between a pair of methods and a pair of attributes, respectively. The similarity between a pair of rows or columns is defined as the number of entries in one row or column that have the binary value "1" where the corresponding entry in the other row or column is also "1". The normalized similarity, denoted ns(i, j), between a pair of rows or columns i and j is defined as the ratio of the similarity between the two rows or columns to the number of entries Y in a row or column of the matrix. It is formally defined as follows: $ns(i, j) = \frac{similarity(i, j)}{Y}$ (1) Cohesion refers to the degree to which the members of a module belong together. MMAC is the average cohesion of all method pairs, and AAC is the average cohesion of all attribute pairs. MMAC is formally defined as follows: (2)
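The definitions above can be sketched in Python. This is a minimal illustration, not the authors' implementation: the function names and the example matrix are hypothetical, rows are taken to be methods and columns attribute types, and the handling of classes with fewer than two rows or columns is not stated in this text, so the sketch simply rejects them.

```python
from itertools import combinations


def normalized_similarity(u, v):
    """ns(i, j): count of positions where both binary vectors hold 1, divided by the vector length Y."""
    if len(u) != len(v) or not u:
        raise ValueError("vectors must be non-empty and of equal length")
    shared_ones = sum(1 for a, b in zip(u, v) if a == 1 and b == 1)
    return shared_ones / len(u)


def average_pairwise_similarity(vectors):
    """Average ns(i, j) over all unordered pairs of vectors."""
    pairs = list(combinations(vectors, 2))
    if not pairs:
        # Edge-case handling is not given in the text; rejecting is an assumption.
        raise ValueError("at least two vectors are needed")
    return sum(normalized_similarity(u, v) for u, v in pairs) / len(pairs)


def mmac(matrix):
    """Average cohesion over all method pairs (rows)."""
    return average_pairwise_similarity(matrix)


def aac(matrix):
    """Average cohesion over all attribute pairs (columns)."""
    columns = [list(col) for col in zip(*matrix)]
    return average_pairwise_similarity(columns)


# Hypothetical class: 3 methods (rows) x 2 attribute types (columns).
example = [
    [1, 0],
    [1, 1],
    [0, 1],
]
print(mmac(example), aac(example))
```

For the hypothetical matrix, two of the three method pairs share one of two columns and the third shares none, so both averages come out at one third.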

Attribute-Method Cohesion (AMC) Metric
The idea of similarity applies only when both elements under consideration are of the same kind. It therefore applies to method-method and attribute-attribute pairs, but not to attribute-method pairs, because attributes and methods are two different kinds of element. In this case, cohesion is the average number of attribute-method interactions represented in the AT matrix. In other words, AMC is the ratio of the number of 1s in the AT matrix to the total size of the matrix. AMC is formally defined as follows:
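The ratio described above can be sketched as follows. This is an illustration under assumptions: the function name is hypothetical, the AT matrix is represented as a list of rows (one per method), and empty matrices are rejected because their treatment is not stated in this text.

```python
def amc(at_matrix):
    """Ratio of the number of 1s in the binary AT matrix to its total number of entries."""
    k = len(at_matrix)                    # number of rows (methods)
    l = len(at_matrix[0]) if k else 0     # number of columns (attribute types)
    if k == 0 or l == 0:
        raise ValueError("the AT matrix must have at least one row and one column")
    if any(len(row) != l for row in at_matrix):
        raise ValueError("all rows must have the same length")
    ones = sum(sum(row) for row in at_matrix)
    return ones / (k * l)


# Hypothetical 3 x 2 matrix with four 1s out of six entries.
print(amc([[1, 0], [1, 1], [0, 1]]))
```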

Method-Method Invocation Cohesion (MMIC) Metric
The AT matrix does not capture the cohesion between a pair of methods when one method calls the other and the invoked method has no parameter whose type matches an attribute. In this case, cohesion is the average number of method-method invocation interactions. This is represented by the ratio of the number of 1s in the MI matrix to the overall size of the matrix. Method-method invocation cohesion (MMIC) is formally defined as follows:

Similarity-based Class Cohesion (SCC)
The SCC metric is defined as the weighted sum of the MMAC, AAC, AMC, and MMIC of the class. For k > 1 and l > 1 it is defined as follows: where MP is the number of method pairs, AP is the number of distinct attribute-type pairs, and MOP is the number of ordered method pairs. By replacing MP, AP, and MOP with their equivalents in equation (6), and by considering all cases of k and l except when both are equal to 0, SCC is formally defined as follows:
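Because the equations are not reproduced in this text, the following Python sketch shows only one plausible reading of the weighted sum, and several points in it are assumptions. It takes k as the number of methods and l as the number of distinct attribute types, derives MP, AP, and MOP by simple counting, weights AMC by the number of entries in the AT matrix (k times l), and normalizes by the sum of the weights so that the result stays between 0 and 1. None of these choices should be read as the authors' exact formula.

```python
def scc_weighted(mmac, aac, amc, mmic, k, l):
    """Weighted average of the four component metrics (a sketch, not the published equation)."""
    mp = k * (k - 1) / 2     # MP: number of unordered method pairs
    ap = l * (l - 1) / 2     # AP: number of unordered attribute-type pairs
    mop = k * (k - 1)        # MOP: number of ordered method pairs
    amc_weight = k * l       # ASSUMPTION: AMC is weighted by the size of the AT matrix
    total = mp + ap + amc_weight + mop
    if total == 0:
        raise ValueError("SCC is not defined here when both k and l are 0")
    # ASSUMPTION: the weighted sum is normalized by the total weight.
    return (mp * mmac + ap * aac + amc_weight * amc + mop * mmic) / total


# Hypothetical component values for a class with 3 methods and 2 attribute types.
print(scc_weighted(mmac=1 / 3, aac=1 / 3, amc=2 / 3, mmic=0.0, k=3, l=2))
```

With this normalization, a class whose four components are all 1 scores exactly 1, which matches the 0 to 1 scale stated elsewhere in the paper.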

Cohen's Kappa Coefficient
Cohen's kappa coefficient, proposed by Jacob Cohen in 1960, evaluates the agreement between two assessors or assessment methods. Cohen's Kappa is a method of measuring the correctness of the data [6]. The coefficient is formulated as $K = \frac{P_o - P_c}{1 - P_c}$ [6], where Po is the proportion of observed agreement and Pc is the proportion of agreement expected by chance. The data obtained from the two observers are arranged in a relevance table, as shown in Table 1, where a and d are the counts on which the observers agree (relevant and irrelevant, respectively) and n is the total number of data items. Pc is obtained using the following formula [6]: (9) Once Po and Pc are obtained, the value of K (Cohen's Kappa) can be calculated. The Kappa value indicates the degree of agreement between the experts and the system. Table 2 provides the interpretation of Kappa values.
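As a worked illustration, the standard two-rater form of Cohen's kappa can be computed from a 2x2 table. Table 1 is not reproduced in this text, so the cell names b and c (the two kinds of disagreement) and the example counts below are assumptions introduced only for the sketch.

```python
def cohens_kappa(a, b, c, d):
    """Cohen's kappa for a 2x2 agreement table.

    a: both raters say "relevant"; d: both say "irrelevant";
    b, c: the two disagreement cells (hypothetical names, Table 1 is not shown).
    """
    n = a + b + c + d
    if n == 0:
        raise ValueError("the table must contain at least one observation")
    po = (a + d) / n                                        # observed agreement
    pc = ((a + b) * (a + c) + (c + d) * (b + d)) / (n * n)  # agreement expected by chance
    if pc == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (po - pc) / (1 - pc)


# Hypothetical counts: 35 of 50 items agree, chance agreement is 0.5.
print(cohens_kappa(a=20, b=5, c=10, d=15))
```

In this hypothetical table the observed agreement is 0.7 and the chance agreement is 0.5, which gives a kappa of about 0.4.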

Methodology
The threshold cohesion value is determined in an iterative process whose aim is to find a threshold on the values produced by the SCC metric. The SCC calculation yields a number between 0 and 1; the closer the value is to 1, the more cohesive the class (good design), and vice versa. This research identifies the value that forms the boundary between good design and bad design. An expert is involved in determining this threshold. The research methodology is described in Figure 2.
To carry out these processes, we first collected several class diagrams and then converted them into XML format. We also developed a web-based application that calculates the SCC metric automatically. After the SCC values are obtained from the application, the expert takes part in the identification process by assessing, based on experience, the cohesiveness of each class and labeling it good or bad.
The first step is to set a temporary threshold value. Based on this threshold, each class is labeled good or bad. The next step compares these labels with the labels given by the expert. The Kappa coefficient is applied to measure the agreement between the threshold-based labels and the expert's labels. The process is repeated until the best Kappa value is found. Once it is found, the last step is to fix the threshold: the temporary threshold with the best Kappa value is taken as the threshold value, because that is where the agreement between the expert and the system is highest. The amount of data also affects the level of agreement with the expert.
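The iterative search described above can be sketched as follows. This is a minimal illustration rather than the authors' web application: the function names and sample data are hypothetical, a class is labeled good when its SCC value is greater than or equal to the candidate threshold, and kappa is reported as 0.0 in the degenerate case where chance agreement equals 1 (a choice made for this sketch).

```python
def kappa_from_labels(system, expert):
    """Cohen's kappa between two equally long lists of boolean labels (True = good)."""
    if len(system) != len(expert) or not system:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(system)
    agree = sum(1 for s, e in zip(system, expert) if s == e)
    sys_good = sum(1 for s in system if s)
    exp_good = sum(1 for e in expert if e)
    po = agree / n
    pc = (sys_good * exp_good + (n - sys_good) * (n - exp_good)) / (n * n)
    if pc == 1.0:
        return 0.0  # sketch-specific choice: kappa is undefined here
    return (po - pc) / (1 - pc)


def best_threshold(scc_values, expert_labels, candidates):
    """Return (threshold, kappa) for the candidate with the highest kappa."""
    if not candidates:
        raise ValueError("at least one candidate threshold is required")
    best = None
    for t in candidates:
        system_labels = [v >= t for v in scc_values]  # label each class good/bad
        k = kappa_from_labels(system_labels, expert_labels)
        if best is None or k > best[1]:               # first candidate wins ties
            best = (t, k)
    return best


# Hypothetical SCC values and expert labels for six classes.
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
expert = [True, True, True, False, False, False]
print(best_threshold(scores, expert, [0.2, 0.5, 0.8]))
```

In this toy data set the candidate 0.5 separates the classes exactly as the expert does, so it is selected with a kappa of 1.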

Dataset and testing scenario
The data used in this study consist of 50 classes taken from various sources on the Internet: ibm.com, creately.com, codeproject.com, kuwatalab.com, and gliffy.com. The classes were first redrawn and then converted into XML format using Visual Paradigm. In addition, we developed web-based software that automatically calculates the SCC value of a test class in XML format.
This study uses two scenarios to determine the threshold value. In the first scenario, we calculated the SCC value of the 50 test classes using the application, which implements the SCC metric. In the second scenario, we asked an expert to judge, based on experience, whether each class is good or bad. The main objective is to determine the degree of agreement between the values produced by the system and the judgments of the expert.

First Scenario Result
The test is performed by feeding 50 sample classes to the application as test data. The test data are class diagrams obtained from several sources on the Internet and then redrawn with Visual Paradigm to obtain each class diagram in XML format. The main purpose of this scenario is to analyze the cohesion values produced by the Similarity-Based Class Cohesion (SCC) metric. Figure 3 shows a scatter plot of the results. The cohesion values range from a minimum of 0 to a maximum of 1. The sample classes CreditCardStrategy and Transaction are shown in Figure 4. The CreditCardStrategy class, which has three methods with two types of attribute parameters, yields a cohesion value of 0.81.

Fig. 4. Class diagram and communication diagram of the CreditCardStrategy and Transaction classes
The Transaction class, in contrast, also has three methods, but some of their parameters have types that do not match any attribute, which yields a cohesion value of 0.18.

Second Scenario Result
The experiment with the expert aims to establish whether each class used as sample data in the application test has a high degree of cohesiveness or not. The expert examines the sample classes one by one without being told or shown the results produced by the cohesion measurement application. According to the expert's assessment, 33 classes have a good level of cohesiveness and 17 classes have a poor level of cohesiveness.
Combining the results of scenarios 1 and 2 for the 50 sample classes, the application and the expert agree that 21 classes have a high cohesion value and that 12 classes have a low cohesion value. For the remaining 17 classes, the application and the expert assign different cohesion grades.

Identifying the Cohesion Threshold
The cohesion value defined by Dallal lies in the range 0 to 1. The values from 0.1 to 1 are used as temporary thresholds, so the iteration is performed ten times, once for each value in that range. Each value serves as the limit that decides whether the calculated cohesion of a class is good or not good. The numbers of good and not-good classes are then compared with the results of the expert analysis and form the basis for calculating the Kappa coefficient. The relationship between the temporary thresholds and the calculated Kappa coefficients is shown in Figure 7. The calculated Kappa coefficients show differences between 0 and 0.1. The threshold value of 0.5 has the highest Kappa coefficient, at 0.29. At 0.5, the degree of agreement between the expert and the system is the highest. The calculation is then repeated at a more detailed level, with thresholds in the range 0.41 to 0.55. This second iteration is conducted to find a more precise threshold value. The same process is performed, and the results of the second iteration are shown in Figure 8.
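The two candidate grids used in the coarse and fine iterations can be generated as below. This is a small sketch under assumptions: the helper name is hypothetical, and the step of 0.01 for the second iteration is assumed, since the text gives only the range 0.41 to 0.55. Working in integer steps avoids accumulating floating-point error.

```python
def candidate_thresholds(start, stop, step, decimals):
    """Evenly spaced thresholds from start to stop inclusive, rounded to `decimals` places."""
    scale = 10 ** decimals
    first, last, inc = round(start * scale), round(stop * scale), round(step * scale)
    if inc <= 0:
        raise ValueError("step must be positive at the chosen precision")
    return [round(i / scale, decimals) for i in range(first, last + 1, inc)]


coarse = candidate_thresholds(0.1, 1.0, 0.1, 1)     # ten temporary thresholds
fine = candidate_thresholds(0.41, 0.55, 0.01, 2)    # ASSUMPTION: step of 0.01
print(coarse)
print(fine)
```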

Discussion
The threshold value of 0.45 is obtained by finding the highest level of agreement between the results of the application and the results of the expert in assessing the cohesion of a class. It can be concluded that a class with a cohesion value below 0.45 has a poor level of cohesion, while a class with a cohesion value greater than or equal to 0.45 has a relatively good level of cohesion.
The threshold value of 0.45 has a Kappa coefficient of 0.3254. This value can be interpreted as a fair level of agreement between the system and the expert (Fair Agreement).
Experts assess the level of cohesion of a class based on experience. The level of cohesion of a class is the degree of closeness between the elements of the class, namely its attributes and methods. The closer the attributes and methods of a class are, the higher the class cohesion. If all attributes are managed by all methods of the class, the closeness between methods and attributes can be considered high. The metric used in this study looks only at the data types of a method's parameters: if the data type of a method parameter is the same as the data type of a class attribute, the method is assumed to manage that attribute. Experts, however, do not assess the closeness between methods and attributes that simply. They need clearer information on whether an attribute is really managed by a method, not just a match of types, because a parameter's type can belong to other data that is not an attribute of the class. Whether a method really manages an attribute can be verified from the method's source code. A limitation of this study, however, is that it works at the design level, where cohesion is determined from the class diagram only. More in-depth information would therefore need to be extracted from the class diagram to show that a method definitely manages an attribute.
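The interpretation step can be sketched as a simple lookup. Table 2 is not reproduced in this text, so the bands below are an assumption: they follow the widely used Landis and Koch scale, which is consistent with the "Fair Agreement" label given for 0.3254. The function name is hypothetical.

```python
def interpret_kappa(kappa):
    """Map a kappa value to an agreement label (Landis and Koch bands, assumed to match Table 2)."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


print(interpret_kappa(0.3254))
```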

Conclusion
Based on this research, it can be concluded that the threshold was determined by running two scenarios: the first calculates the cohesion values using the application, and the second involves an expert who judges whether each class is good or bad. To establish a measurable criterion for the cohesion value of a class, the researchers determined the threshold using Cohen's Kappa, and the conclusion is that 0.45 is the best threshold value for classifying cohesion.