Code Smell Detection Using Whale Optimization Algorithm

Abstract: Software systems have been employed in many fields as a means to reduce human effort; consequently, stakeholders are interested in more updates of their capabilities. Code smells arise as one of the obstacles in the software industry. They are characteristics of software source code that indicate a deeper problem in design. These smells appear not only in the design but also in software implementation. Code smells introduce bugs, affect software maintainability, and lead to higher maintenance costs. Uncovering code smells can be formulated as an optimization problem of finding the best detection rules. Although researchers have recommended different techniques to improve the accuracy of code smell detection, these methods are still unstable and need to be improved. Previous research has sought to discover only a few types at a time (three or five) and did not set rules for detecting them. Our research improves code smell detection by applying a search-based technique; we use the Whale Optimization Algorithm as a classifier to find ideal detection rules. In applying this algorithm, the Fisher criterion is utilized as a fitness function. Additionally, the resulting classification rules are analyzed to find the software metrics that differentiate the nine code smells.


Introduction
The complexity of software systems is rapidly increasing, which leads software houses to anticipate continuous change. Because stakeholders rely on these systems and continuously demand more of them, software houses are under constant pressure to deliver products on time [1]. Software systems contain high levels of complexity, and maintenance can prove difficult. Developers spend more than 60% of their time understanding the code before proceeding with its maintenance, leading to massive costs [2] that account for 50%-80% of software expenditure [3]. These expenses can be reduced by detecting and eliminating code smells in the early stages of development [4].
Fowler [5] defines code smells as signs in the program's source code indicating deeper issues that make it difficult to understand, modify, and maintain software. He [5] defines 22 types of code smells and refactoring opportunities. Our research focused on nine types: Large Class (Blob), Long Method (LM), Feature Envy (FE), Spaghetti Code (SC), Data Class (DC), Lazy Class (LC), Functional Decomposition (FD), Parallel Inheritance (PI), and Long Parameter List (LPL). Some of these smells appear in the design phase, whereas others appear in the implementation phase of the software development life cycle (SDLC) [6]. These negatively impact software maintainability and reliability, and damage software quality in the long term by incurring technical debt, introducing bugs, and increasing tension among team members [7].
We select these types because of their critical effect on software systems. Additionally, they continuously appear during the development process and have been used frequently in recent studies [8][9][10][11][12][13]. Our research focuses on optimizing detection rules for bad smells. We use a search-based technique that learns identification rules from software quality metrics, which capture the structural architecture of the software.
Search-based software engineering (SBSE) [14] defines detecting code smells as a search problem; an algorithm explores the search areas, guided by a fitness function that captures the properties of the desired solutions. Search space solutions represent the rules that identify code smells. To find the optimal detection rules, this research utilizes the Whale Optimization Algorithm (WOA) [15] as a classifier. WOA encircles the prey (optimal solution) and updates the search agents' positions to find the best solution using the Fisher criterion [16] as a fitness function. The Fisher criterion calculates the desirability of the rule by maximizing the between-class distance over the within-class variance. WOA improves the accuracy of code smell detection more than other genetic and evolutionary algorithms. It adopts if-then classification rules, which detect code smells during the life cycle of software development.
The rest of this paper is structured as follows: Section 2 provides an overview of code smells. Section 3 is a literature review. Section 4 presents and discusses the proposed SBSE solution for code smell detection. Section 5 presents and discusses experimental results. Section 6 concludes our findings and suggests future research directions.

Code Smell Overview
Code smells are symptoms of inadequate design or coding practices adopted by developers due to deadline pressure, lack of skills, or lack of experience [17]. They are also called bad practices, anti-patterns, anomalies, or design defects. Fowler [5] defines 22 types of code smells, as well as refactoring opportunities. This research detects nine code smell types, explained as follows:
1. Large Class (Blob) is a class that monopolizes the behavior of the system. It can modify many fields and properties of the system without respecting the principle of consistency. This smell type causes low cohesion and high coupling, and can lead to difficulties in maintenance. It is also called Winnebago or God Class.
2. Long Method (LM) is a method that has many lines of code (LOC). If a method has ten lines of code or more, questions should be asked.
3. Feature Envy (FE) occurs when a method is more familiar with the properties of other classes than with those of its own class.
4. Spaghetti Code (SC) occurs when the necessary system structures are not included in the code. Object-oriented concepts, such as polymorphism and inheritance, are avoided.
5. Data Class (DC) is a class that has data fields but no methods that operate on the data. The only methods defined are the setters and getters of these data.
6. Lazy Class (LC) is a useless class, which means that the class has low complexity and does not do much.
7. Functional Decomposition (FD) is found in code written by novice developers and refers to classes that are built to perform a single function.
8. Parallel Inheritance (PI) occurs when one inheritance tree depends on another inheritance tree. In other words, creating a subclass for one class requires creating a subclass for another class.
9. Long Parameter List (LPL) appears when a method contains many parameters in its signature; this code smell occurs when the method has more than four parameters.
The code smells above are a direct result of bad practices by developers during the SDLC. Testers have identified a list of software metrics related to them. This research studies 12 software metrics that constitute the features of the training and testing data to which the if-then rules are applied. These metrics are as follows [7]

Literature Review
The code smell detection problem was formulated using search-based software engineering (SBSE) methods as an optimization issue. When a software engineering problem is formulated as a search issue, optimization algorithms can be used to solve it [14].
Kessentini et al. [31] employed genetic programming (GP) to describe code smell detection rules. These rules are a blend of values and metrics that formulate the optimal identification of smells. The average accuracy of this approach is 70%.
Ouni et al. [32] developed a detection and correction technique. It has two targets: the first is the detection of code smells, and the second is their correction. They used GP for the detection task and the non-dominated sorting genetic algorithm (NSGA-II) for the correction task. The average accuracy was 85% for precision and 90% for recall. This technique was applied to detect three types of code smells.
Boussaa et al. [33] implemented a detection technique that maximizes the coverage of the base of code smell examples and increases the number of "synthetic" software smells that are produced but not covered by the solutions of the first population. This technique used a competitive co-evolutionary algorithm (CCEA); it detected only three types, with an average of 80% for precision and recall. As with the technique presented by Ouni et al. [32], which also detected three types, the accuracy of this approach was not promising.
Usman et al. [8] introduced a multi-objective search-based technique to discover the most potent combination of metrics that not only maximizes the detection of code smells but also minimizes the detection of well-designed code. This research enhanced the accuracy of detection to 87% precision and 92% recall and increased the number of detected types to five.
Tab. 1 provides a comparison between the different approaches. To summarize:
• The studies detected only a few code smells (between three and five) of the 22 types that Fowler [5] defined;
• As indicated, the detection accuracy was not high;
• Just one study defined rules for detection;
• They did not define the rules based on the size of software systems.
To address these shortcomings, this research constructs detection rules for both medium and large systems, and it enhances detection accuracy via a search-based technique. The proposed framework utilizes WOA [15] search capabilities to find the optimal metrics-based detection rules. They are guided by data given by five open-source systems on nine types, as defined in Section 2. This proposed framework uses the Fisher criterion, whose objective function maximizes the distance between different classes while minimizing the within-class distance. Hence, the detection rules that best satisfy the objective will be chosen as the optimal solution. The WOA, Fisher criterion, and our proposed integration technique are introduced in Section 4.

The Proposed SBSE Framework for Code Smell Detection
Harman describes metaheuristic search in SBSE as shifting from human-based to machine-based solutions to software engineering issues [14]. SBSE aims at reformulating software engineering issues as optimization problems [38,39] in which optimal or near-optimal solutions exist in the search space of candidate solutions. Driven by a fitness function, algorithms can differentiate between good and bad solutions to find the optimal one for the problem.
To solve the optimization problem of detection, we perform optimization with WOA [15], using the Fisher criterion [16] with software metrics as a fitness function. Fig. 1 shows the Business Process Model and Notation for the proposed SBSE framework for code smell detection. The process within the framework is discussed in the following subsections.

Prepare the Dataset Metrics in Columns and Define Code Smells Based on Rules of Detection
A metric or set of metrics is used to determine whether the code has bad smells. We identify code smells for each application based on rules that were used in earlier studies. Tab. 2 shows the metrics used to define each type. This research is conducted on five Java software projects. ArgoUML [42] is an application that helps stakeholders create complex and professional diagrams. Azure [43] is a file-sharing tool. Gantt Project [44] is a project-scheduling tool. Log4j [45] is a Java library for logging. Xerces-J [46] is a tool used to parse XML. These projects are checked for code smells based on the 12 software metrics using traditional software testing techniques. The metrics data and the code smells are used to prepare training and testing datasets, which we split using an 80:20 ratio. The training dataset is used to train the algorithm to build the detection rules. The test data is used to evaluate our algorithm based on accuracy measures such as precision and recall. The number of modules and the detected code smells in each system are reported in Tab. 3.

Discretize Datasets
The probability of an exact match in metrics during the search is very low. To make the matching process possible, we discretize the search space using a binning technique that divides the selected attributes into a user-specified number of ranges (bins). The range of numerical values for each metric (feature) is divided into equal-size segments (bins) by the RapidMiner tool [47].
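The equal-size binning described above can be sketched as follows. This is only an illustration of the idea, not RapidMiner's implementation; the function name `equal_width_bins` and the sample LOC values are hypothetical.

```python
def equal_width_bins(values, n_bins):
    """Map each numeric value to a bin index in 0..n_bins-1 using equal-width ranges."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so a single bin is enough.
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would fall just past the last range, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Hypothetical lines-of-code values for five methods, split into four bins.
loc_values = [5, 12, 40, 75, 100]
print(equal_width_bins(loc_values, 4))
```

After this step, every metric value is replaced by the index of its range, so a rule and a module can be compared bin by bin.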

Create Random Population for Each System (Probable Rules)
Optimization algorithms use a random initial population of probable solutions. The solution for the bad smell detection problem is a set of if-then rules defining the code smell types, and the initial population is a random sample of these if-then rules. The software metrics form the condition of each rule, and the code smell type forms the result. For example, "if (f1 ∈ r1 & f2 ∈ r2 & ... & f12 ∈ r1), then CodeSmell = true" is a detection rule, where f is a software metric and r is a range. For each project, the population contains 100 individuals. The best if-then rule is the search agent that maximizes the calculated fitness function. A tolerance threshold is used to match search agents (rules) with the training data. The search agents with their corresponding fitness values become the input to the learning algorithm.
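A minimal sketch of such a population is given below. The text does not specify how the tolerance threshold is applied, so this sketch assumes that a rule matches a module when at most `tolerance` of the 12 metric bins disagree; that interpretation, the names `random_rule` and `matches`, and the choice of four bins are all assumptions made for illustration.

```python
import random

N_METRICS = 12        # number of software metrics used as rule conditions
POPULATION_SIZE = 100  # individuals per project, as stated in the text


def random_rule(n_bins, rng):
    """A rule is one bin index per metric: 'if f1 in r_a & ... & f12 in r_z'."""
    return [rng.randrange(n_bins) for _ in range(N_METRICS)]


def matches(rule, instance, tolerance):
    """Assumed matching: the rule fires if at most `tolerance` metrics disagree."""
    mismatches = sum(1 for r, x in zip(rule, instance) if r != x)
    return mismatches <= tolerance


rng = random.Random(0)
population = [random_rule(4, rng) for _ in range(POPULATION_SIZE)]
```

With `tolerance = 0` the rule requires an exact match on every metric; larger values make the rules more permissive.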

Calculate Fitness Function
Detection is solved using an optimization algorithm guided by a fitness function [38]. This evaluation function measures how close a given solution is to the optimum one for the problem (how fit a solution is). The fitness function evaluates the performance of each agent and gives the agents with the greatest improvement the highest probability of survival [48,49]. We use the Fisher criterion as a fitness function, which measures the ratio of between-class scatter to within-class scatter [16].
A high fitness value means that the gap between any two classes is significant. Hence, the rules that maximize the distance between different classes are the fittest ones. In its standard two-class form, the Fisher criterion between two groups, i and j, is defined as

$$F_{ij} = \frac{(\mu_i - \mu_j)^2}{V_i + V_j} \tag{1}$$

where $V_j$ is the variance of the jth class, defined as $V_j = \frac{1}{N}\sum_{k=1}^{N}(X_k - \mu_j)^2$, and $\mu_i$ represents the mean of the ith class, defined as $\mu_i = \frac{1}{N}\sum_{k=1}^{N} X_k$. Here $X_k$ represents the kth observation in the class concerned, where $1 \le k \le N$, and N is the number of observations in that class. Each search agent is matched against the dataset instances. The one that maximizes the Fisher criterion (that is, maximizes the difference between the centers of different classes while minimizing the differences between instances of the same class) is selected as the best search agent. The agents with the highest fitness function values represent the solutions closest to the optimum and are chosen as inputs for the WOA.
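As an illustration, the sketch below computes a two-class Fisher score for a single metric, assuming the standard form (µ_i − µ_j)² / (V_i + V_j) with population variances. How the score is aggregated over the 12 metrics and over matched rules is not spelled out in the text, so the function `fisher_criterion` and the sample values are hypothetical.

```python
def mean(xs):
    return sum(xs) / len(xs)


def variance(xs):
    """Population variance: average squared deviation from the class mean."""
    mu = mean(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)


def fisher_criterion(class_i, class_j):
    """Squared distance between class means over the sum of class variances."""
    mu_i, mu_j = mean(class_i), mean(class_j)
    within = variance(class_i) + variance(class_j)
    if within == 0:
        # Both classes are constant: perfectly separated or indistinguishable.
        return float("inf") if mu_i != mu_j else 0.0
    return (mu_i - mu_j) ** 2 / within


# Hypothetical binned metric values for non-smelly and smelly modules.
clean = [1, 2, 3]
smelly = [7, 8, 9]
score = fisher_criterion(clean, smelly)  # means 2 and 8, variances 2/3 each
```

A larger score indicates that the two groups are far apart relative to their internal spread, which is the property the fitness function rewards.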

Apply Whale Optimization Algorithm (WOA) as a Classifier (Classification Rules)
We choose WOA as our classifier because several studies [50][51][52] showed that WOA [53] had the highest accuracy compared to other state-of-the-art evolutionary algorithms, such as PSO [54] and GA [55]. WOA is a swarm-based metaheuristic algorithm inspired by the hunting behavior of humpback whales, which prefer to catch krill or small fish near the surface of the water by trapping them in a net of bubbles. WOA mimics humpback whales in two phases [15]. The first is the exploitation phase, in which whales encircle the prey and use the bubble-net attacking method, and the second is the exploration phase, in which they randomly search for prey (Fig. 2).

Exploitation Phase (Encircling the Prey and Bubble-Net Attacking Method)
Since the optimal agent in the search space is not known in advance, WOA assumes that the target prey is at or near the best existing solution, and the humpback whales encircle it. After the best search agent is selected, the other search agents move their positions closer to it according to

$$D = |C \cdot X^{*}(t) - X(t)| \tag{2}$$

$$X(t+1) = X^{*}(t) - A \cdot D \tag{3}$$

where C and A are coefficient vectors, $X^{*}$ is the best solution obtained so far, t is the current iteration, X is the current solution, and $|\cdot|$ denotes the absolute value. The values of A and C are determined by

$$A = 2a \cdot r - a \tag{4}$$

$$C = 2 \cdot r \tag{5}$$

where a is a vector that decreases linearly from 2 to 0, and r is a random vector between 0 and 1:

$$a = 2 - \frac{2t}{\text{MaxIteration}} \tag{6}$$

where t is the current iteration, and MaxIteration is the maximum number of allowed iterations.
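The encircling update can be sketched as below, following the standard WOA formulation (A = 2a·r − a, C = 2r, D = |C·X* − X|, X(t+1) = X* − A·D) with a decreasing linearly from 2 to 0. The sketch operates on continuous position vectors; how positions are mapped back to discrete rule bins is not described in the text and is left out. The function names are hypothetical.

```python
import random


def coefficient_a(t, max_iteration):
    """Linearly decrease a from 2 (t = 0) to 0 (t = max_iteration)."""
    return 2.0 - 2.0 * t / max_iteration


def encircle(x, x_best, a, rng):
    """Move agent x toward the best agent x_best, one dimension at a time."""
    new_x = []
    for x_d, best_d in zip(x, x_best):
        A = 2 * a * rng.random() - a     # coefficient in [-a, a]
        C = 2 * rng.random()             # coefficient in [0, 2)
        D = abs(C * best_d - x_d)        # distance to the (perturbed) best position
        new_x.append(best_d - A * D)
    return new_x


rng = random.Random(42)
a = coefficient_a(t=50, max_iteration=100)   # halfway through the run, a = 1.0
moved = encircle([0.0, 3.0], [2.0, 1.0], a, rng)
```

As a shrinks toward 0, A shrinks with it, so agents land progressively closer to the best solution; at a = 0 they land exactly on it.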
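After the shrinking encircling step, WOA calculates the distance between the current solution (X) and the best solution (X*). The spiral path between the humpback whale and the prey is represented as

$$X(t+1) = D' \cdot e^{bl} \cdot \cos(2\pi l) + X^{*}(t) \tag{7}$$

where $D' = |X^{*}(t) - X(t)|$ is the distance between the whale X and the prey, b is a constant that defines the shape of the logarithmic spiral, and l is a random value between −1 and 1. Shrinking encirclement and movement along the spiral-shaped path each occur with a 50 percent probability:

$$X(t+1) = \begin{cases} X^{*}(t) - A \cdot D, & p < 0.5 \\ D' \cdot e^{bl} \cdot \cos(2\pi l) + X^{*}(t), & p \ge 0.5 \end{cases} \tag{8}$$

where D is the encircling distance of Eq. (2) and p is a random number between 0 and 1.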
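A small sketch of the spiral move is given below, assuming the standard WOA form X(t+1) = |X* − X| · e^{bl} · cos(2πl) + X*. The default b = 1 and the function name `spiral_update` are assumptions for illustration.

```python
import math
import random


def spiral_update(x, x_best, l, b=1.0):
    """Move x along a logarithmic spiral around x_best (b sets the spiral shape)."""
    factor = math.exp(b * l) * math.cos(2 * math.pi * l)
    return [abs(best_d - x_d) * factor + best_d for x_d, best_d in zip(x, x_best)]


rng = random.Random(7)
l = rng.uniform(-1.0, 1.0)          # random value between -1 and 1
new_position = spiral_update([1.0, 4.0], [3.0, 3.0], l)
```

For l = 0 the factor is exactly 1, so the agent lands at the best position plus its current distance; other values of l place it nearer to or on the other side of the prey.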

Exploration Phase (Search for a Prey)
In this phase, WOA explores the search space by updating each position relative to a randomly chosen solution rather than the best one. To force search agents to move away from the reference whale, the vector A takes random values less than −1 or greater than 1. This behavior is modeled mathematically in (9) and (10):

$$D = |C \cdot X_{rand} - X| \tag{9}$$

$$X(t+1) = X_{rand} - A \cdot D \tag{10}$$

where $X_{rand}$ stands for a random whale picked from the current population. The flowchart for the previously outlined WOA phases is shown in Fig. 3.
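Putting the two phases together, one iteration over the population might look like the sketch below. It follows the standard WOA flow (spiral with probability 0.5; otherwise shrink toward the best agent when |A| < 1 or explore around a random agent when |A| ≥ 1). For simplicity it draws scalar A and C per agent rather than per dimension, which is an assumption, as are the function name `woa_step` and the default b = 1.

```python
import math
import random


def woa_step(population, x_best, a, rng, b=1.0):
    """Return the updated positions of all agents for one WOA iteration."""
    new_population = []
    for x in population:
        A = 2 * a * rng.random() - a
        C = 2 * rng.random()
        p = rng.random()
        if p < 0.5:
            # |A| >= 1: explore around a random whale; otherwise exploit the best one.
            reference = rng.choice(population) if abs(A) >= 1 else x_best
            new_x = [ref_d - A * abs(C * ref_d - x_d)
                     for x_d, ref_d in zip(x, reference)]
        else:
            # Bubble-net attack: spiral toward the best solution.
            l = rng.uniform(-1.0, 1.0)
            factor = math.exp(b * l) * math.cos(2 * math.pi * l)
            new_x = [abs(best_d - x_d) * factor + best_d
                     for x_d, best_d in zip(x, x_best)]
        new_population.append(new_x)
    return new_population


rng = random.Random(0)
whales = [[rng.uniform(0, 10), rng.uniform(0, 10)] for _ in range(5)]
updated = woa_step(whales, x_best=whales[0], a=2.0, rng=rng)
```

In a full run, the fitness of every updated agent would be re-evaluated after each step, the best agent replaced when a fitter one appears, and a decreased before the next iteration.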

Experimental Study and Evaluation
After completing the search, the resulting classification rules with the same software metrics are applied on a separate testing dataset to verify the efficiency of the proposed framework. In this section, we present the experimental setup and compare performance with previous approaches.

Experiment Setup
These experiments are simulated on a PC with an Intel(R) Core(TM) i5 CPU, 8 GB RAM, and a 1 TB hard disk, using the following tools and jars:
• NetBeans IDE V. 8.2 [56] for data preprocessing and defining code smell types in training data;
• JDK 1.8 [57] for installing Java and coding;
• RapidMiner V. 5.3 [47] for the discretization process;
• JXL (Java Excel API) [58] and Apache POI (HSSF and SS) [59] for reading and writing the datasets' Excel sheets;
• NetBeans IDE V. 8.2 [56] for the creation of random populations, calculating the fitness method, training the WOA, testing the WOA, and calculating accuracy.
The five software projects are traced for code smells using the corresponding software quality metrics and the rules explained in Section 4.1 (Tab. 2). The tracing process produces an initial version of the software metrics dataset that is further discretized to facilitate the subsequent matching process. Afterward, the preprocessed dataset is fed into the integrated WOA framework, which initializes a population of probable classification rules and performs the exploitation and exploration processes over the search space. The resulting classification rules are determined by the Fisher criterion, and the performance on testing data is measured.

Performance Measure
We use a confusion matrix model to measure the performance of our algorithm on test data. The performance parameters of the detection framework are shown in Tab. 4. True positive (TP) indicates smells correctly detected as smells, false positive (FP) indicates non-smells incorrectly detected as smells, true negative (TN) indicates non-smells correctly detected as non-smells, and false negative (FN) indicates smells incorrectly detected as non-smells.
We use precision and recall (Eqs. (11) and (12)) to evaluate the performance of our proposed smell detection framework. Precision is the fraction of detected code smells that are actual code smells, while recall is the fraction of all actual code smells that were detected.
$$\text{Precision (PR)} = \frac{TP}{TP + FP} \tag{11}$$

$$\text{Recall (RE)} = \frac{TP}{TP + FN} \tag{12}$$

Suppose that the proposed detection framework identifies 10 code smells in a system containing 14 actual code smells (alongside modules that are not code smells). Of the 10 identified as code smells, 7 are code smells (TP), whereas the remaining 3 are not (FP); the other 7 actual code smells are missed (FN). The model precision is 7/10, while its recall is 7/14. In this case, precision is "how useful the results of the detection model are," and recall is "how complete the results are."
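The worked example can be checked with a few lines of code. The sketch takes 14 as the number of actual code smells in the system, which is what a recall of 7/14 requires; the function names are illustrative.

```python
def precision(tp, fp):
    """Fraction of reported smells that are actual smells."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Fraction of actual smells that were reported."""
    return tp / (tp + fn)


reported = 10        # modules flagged as code smells
actual_smells = 14   # actual code smells in the system (assumed reading of the example)
tp = 7               # flagged modules that really are smells
fp = reported - tp   # 3 flagged modules that are not smells
fn = actual_smells - tp  # 7 smells that were missed

print(precision(tp, fp))  # 0.7
print(recall(tp, fn))     # 0.5
```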

Experimental Results
Tab. 5 reports the performance evaluations (confusion matrix) of each software system with 80% training and 20% testing of data. A confusion matrix was used to calculate precision and recall on the five software systems and showed promising results. Our proposed framework achieves both high precision and recall.

Evaluation and Discussions
Tab. 6 compares the precision and recall of our proposed solution with those of different search-based algorithms such as MOGP [8], GP [32], CCEA [33], Multiobjective Immune Algorithm (MOIAS) [60], and GA [61]. The comparisons demonstrate the efficiency of the proposed solution in detecting code smells on large systems such as ArgoUML [42], Azure [43], and Xerces-J [46], and on medium systems such as Gantt [44] and Log4J [45] (Figs. 4 and 5). The WOA classifier outperforms the other optimization algorithms on the five software systems in terms of precision and recall, with averages of 94.42% and 93.4%, respectively (Tab. 6). Next, the best set of search agents in the final WOA population is utilized as the code smell classification rules. The best solutions for both medium and large systems are used to further analyze the software metrics that distinguish the nine code smell types listed in Section 2. Tab. 7 provides the analysis results for both the large and medium software systems used in the experiments. Each type is characterized by one, two, or three software metrics. The ranges described in Tab. 7 indicate the bins resulting from discretization of the search space. For example, the long method code smell (LM) is defined by McCabe numbers greater than or equal to 40 for large systems and greater than 15 for medium systems.
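The long method example translates directly into a size-dependent if-then rule. The sketch below encodes only the two LM thresholds quoted above; the dictionary, the function name, and the "large"/"medium" keys are illustrative, and rules for the other eight smells would need the ranges from Tab. 7.

```python
# McCabe thresholds for the Long Method (LM) smell, as quoted in the text.
LM_RULES = {
    "large": lambda mccabe: mccabe >= 40,   # large systems: McCabe >= 40
    "medium": lambda mccabe: mccabe > 15,   # medium systems: McCabe > 15
}


def is_long_method(mccabe, system_size):
    """Apply the LM rule for the given system size ('large' or 'medium')."""
    return LM_RULES[system_size](mccabe)


print(is_long_method(22, "medium"))  # True: above the medium-system threshold
print(is_long_method(22, "large"))   # False: below the large-system threshold
```

The same method can therefore be flagged in a medium system but not in a large one, which is why the rules are reported separately per system size.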

Conclusion and Future Work
Due to additional customer requests, software houses have to implement continuous change in software systems. Hence, it is necessary to avoid software engineering problems that arise during the SDLC by discovering and determining the appropriate solutions. Code smells are critical software design problems that emerge from continuous changes. This research detects code smells by finding the optimal classification rules that characterize them. WOA is utilized to search for the best classification rules using the exploration and exploitation processes. The Fisher criterion guides the WOA through the search by measuring the fitness of probable classification rules after they are matched with the training dataset. The proposed framework detects nine types of code smells on five open-source Java software projects. Experimental results demonstrate a precision of 94.42% and a recall of 93.4%, which improve on the previous techniques in the literature. The proposed framework provides an efficient and feasible detection model that increases software quality while minimizing maintenance time, expenses, and effort. Additionally, our framework can define classification rules for these types in medium and large systems; the rules are further analyzed to distinguish the software metrics characterizing each type of detected code smell.
Future work includes verifying the validity of this framework with other types of code smells and testing it on more datasets. Another direction for future work is the automatic correction of different types, which would be of great help to software development teams and would minimize SDLC time and expenditure.

Funding Statement:
The authors received no specific funding for this study.

Conflicts of Interest:
The authors declare that they have no conflicts of interest to report regarding the present study.