Increasing the explainability and success in classification: many-objective classification rule mining based on chaos integrated SPEA2

Classification rule mining represents a significant field of machine learning, facilitating informed decision-making through the extraction of meaningful rules from complex data. Many classification methods cannot optimize explainability and several performance metrics simultaneously. Metaheuristic optimization-based solutions, inspired by natural phenomena, offer a potential paradigm shift in this field, enabling the development of interpretable and scalable classifiers. In contrast to classical methods, such rule extraction-based solutions can perform classification while taking multiple objectives into consideration simultaneously. To the best of our knowledge, although there are a limited number of studies on metaheuristic-based classification, no existing method optimizes more than three objectives while increasing the explainability and interpretability of the classification task. In this study, data sets are treated as the search space and metaheuristics as the many-objective rule discovery strategy, and a metaheuristic many-objective optimization-based rule extraction approach is proposed for the first time in the literature. Chaos theory is also integrated into the optimization method to improve performance, and the proposed chaotic rule-based SPEA2 algorithm enables the simultaneous optimization of four different success metrics and automatic rule extraction. Another distinctive feature of the proposed algorithm is that, in contrast to classical random search methods, it can mitigate issues such as correlation and poor uniformity between candidate solutions through the use of a chaotic random search mechanism in the exploration and exploitation phases. The efficacy of the proposed method is evaluated on three distinct data sets, and its performance is compared with that of classical machine learning methods.


INTRODUCTION
Classification rule mining represents a significant field of machine learning, with the objective of uncovering the concealed patterns, dependencies, and relationships that underpin the data. The method proposed in this study performs automatic rule extraction by optimizing many objectives simultaneously. The method, which employs the SPEA2 infrastructure, a state-of-the-art evolutionary metaheuristic algorithm, has been adapted for rule mining with a bespoke representation format developed for this purpose. Furthermore, it addresses issues such as correlation and poor uniformity between candidate solutions through the chaotic random search mechanism employed in the exploration and exploitation phases. To the best of our knowledge, the contributions of this method, which is proposed for the first time in the literature and whose performance was tested on different data sets, can be summarized as follows:
- This approach represents a novel application of the metaheuristic many-objective approach to rule mining.
- It is capable of performing automatic rule extraction by simultaneously optimizing four distinct performance metrics.
- A study area titled ''Many-objective rule mining'' has been proposed in the literature. Following the publication of this article, which represents a pioneering study in this field, it is anticipated that different methods will be proposed for this problem area.
- An interpretable artificial intelligence method has been proposed. It is expected that this method will lead to high-performance interpretable artificial intelligence studies.
- The method allows working on existing attribute values without the discretization steps that are frequently used in other rule extraction methods.
- It uses a chaotic random search mechanism that can minimize weak statistical properties in the exploration and exploitation phases.
The remainder of the study is organized as follows: literature summaries are presented in the Related Works section, while basic information about many-objective optimization and the methodologies used is explained in detail in the Materials and Methods section. In the Results and Discussion section, the data sets and experimental parameters used are explained, and the experimental results are shared. Finally, the article ends with the Conclusions section.

RELATED WORKS
Rule mining has been employed effectively to address numerous challenges for an extended period. For instance, it has been successful in identifying concealed patterns within expansive data sets and in pattern discovery (Langhnoja, Barot & Mehta, 2013). Additionally, the discovery of such patterns plays a pivotal role in determining the features that exert the most significant influence on classification. Lutu & Engelbrecht (2010) demonstrated that more efficient models can be constructed through feature selection using rule mining. Rule mining is also an effective solution method in decision support systems and automatic decision-making mechanisms (Cheng et al., 2013; Hayes-Roth, 1985). The best examples of this in the literature can be seen in the business world and market analysis. While Kaur & Kang (2015) demonstrated how rule mining can contribute to the development of product recommendations and inventory management in retail and e-commerce, Krishan (2023) demonstrated that customer behaviour and demographic characteristics can also be examined with this method.
Rule mining is also employed successfully in the resolution of issues that arise in contemporary information technologies. Significant cyber security solutions can be provided by rule mining in the automatic detection of fraudulent transactions and behaviours that cannot be identified with conventional methods (Sarno et al., 2015). While Barut & Yildirim (2024) demonstrated that the minimum makespan value in cloud technologies can be optimized with rule-based methods, Sozou et al. (2017) emphasized the advantages of these methods in facilitating informatics-based scientific discoveries and generating hypotheses. Duch, Setiono & Zurada (2004) used rule mining to capture insights that were difficult to obtain through traditional statistical analysis and thereby increased the depth of data understanding. This profound comprehension has facilitated the generation of bespoke, customer-centric solutions within the e-commerce domain (Okoye et al., 2013).
The machine learning methods employed in the literature for rule mining and the taxonomy of the method proposed in this study are illustrated in Fig. 1. Decision tree (Bashir et al., 2014) and Random Forest (Sirikulviriya & Sinthupinyo, 2011) based rule inference models are tree-based models that can create if-else based rules for classes with recursive mechanisms. Repeated Incremental Pruning to Produce Error Reduction (RIPPER) is a recursive induction algorithm that is specifically designed for rule-based classification (Ata & Yildiz, 2012). Bayes-based methods create classification rules based on probabilistic relationships between attributes and classes using Bayes' theorem (Langseth & Nielsen, 2006). Association rule mining algorithms such as Apriori and FP-growth (Frequent Pattern-growth) arrive at classification rules by considering class labels and attribute combinations together (Bala, KaramiLawal & Zakari, 2016). Metaheuristic optimization-based approaches, including this study, treat the data set as a search space and create rules in random search processes with the generated candidate solutions (Corcoran & Sen, 1994). The number of objectives targeted when creating the rules is a crucial factor in determining the method to be employed.
Al-Maqaleh (2021) put forth a methodology for identifying intriguing classification rules through an evolutionary metaheuristic approach. The research is concerned with the automatic discovery of augmented generation rules through the use of evolutionary algorithms. The study emphasizes the significance of accurate and engaging information for users, underscoring the necessity for high prediction accuracy and comprehensibility in classification rule discovery. Sağ & Kahramanlı Örnek (2022) present a novel classification rule mining (CRM) model based on Pareto-based multi-objective optimization, designated CRM-PM, for data sets comprising multiple classes. In order to enhance the precision of classification, the proposed model initially treats the rule mining process as a constrained optimization problem and then transforms it into a multi-objective optimization problem (MOP) by leveraging the Pareto-dominance concept. This approach allows for the simultaneous optimization of two conflicting goals, which is a significant challenge in rule mining. An approach for interpretable rule extraction in multi-objective metaheuristic data mining has also been presented (Kalia et al., 2018), which examined the challenges involved in such methods. Rule inference conducted for a single purpose is referred to as single-objective (Corcoran & Sen, 1994; Yildirim, Yildirim & Alatas, 2021), whereas rule inference conducted for two or three purposes is designated multi-objective (Yildirim & Alatas, 2021). This study introduces a novel approach to many-objective classification rule mining, which, to the best of the authors' knowledge, has not been proposed previously.
In order to highlight the advantages of the proposed method, it is necessary to consider the general limitations of classical classification rule mining methods. One such limitation is human bias. The interpretation of rules can be affected by human bias, which leads to subjectivity in the rule inference process. The method proposed in this study can minimize this effect because it performs rule inference automatically and by taking into account trade-offs between multiple objectives. In contrast, classical methods can produce complex and less interpretable rules. The proposed method offers interpretable, if-else based (glass-box) solutions without causing any discretization loss. Additionally, classical methods may encounter problems in unbalanced data sets, where one class significantly outnumbers the others, leading to the creation of rules that are biased toward the majority class. The proposed method, on the other hand, is stronger against unbalanced data because it performs separate random search processes with a certain number of candidate solutions for each class. Furthermore, the proposed method shares some limitations with other methods. These include scalability with respect to the data set size and number of features, overfitting that undermines classification generalization, and parameter sensitivity resulting from the internal structure of the method used.

MATERIAL AND METHODS
This section will present the fundamental mechanisms of the proposed many-objective metaheuristic optimization-based rule extraction method and the specifics of the algorithms utilized. To this end, we will initially provide a synopsis of the essentials of multi/many-objective optimization, after which we will introduce SPEA2, the foundational algorithm of the proposed method. The details of the chaotic rule-based SPEA2 (CRb-SPEA2) algorithm developed for the proposed method and adapted for rule-based inference will then be presented.

Multi/Many objective optimization
Multi/many objective optimization (MOO) deals with problem solutions that aim to optimize multiple conflicting goals simultaneously. The basic principle in MOO is to identify a solution set created by the trade-off between different targeted objectives. The number of conflicting objectives allows the relevant problem to be classified as multi- or many-objective. If the number of objectives is two or three, the problem is called multi-objective; for more objectives, it is called many-objective. A plethora of methodologies have been proposed in the literature for the resolution of MOO problems (Stewart, Palmer & DuPont, 2021; Taha, 2020). Classical solution techniques, such as weighted sum, lexicographic and ε-constraint, evaluate the objectives of the problem by weighting or ordering them. In these methodologies, which are referred to as a priori approaches, decision makers are involved in the solution process at the outset. Decision makers can prioritize goals or reduce multiple goals to a single goal using certain ratios. Conversely, the involvement of decision makers in these methods introduces subjectivity and limitations to the exploration process. While the subjective preferences of the decision maker may result in suboptimal or biased solutions, these preferences may also lead to a narrow focus on a specific region of the search space. Another approach is the a posteriori approach, also known as the Pareto-based approach. In such approaches, the objectives are treated equally, and a solution set is presented that includes non-dominated results that do not have absolute superiority over each other. The decision maker then evaluates this solution set and reaches a conclusion. These approaches are suitable for problems where the decision maker does not have information about the goals or where the desired goals are of equal importance. The non-domination criterion between candidates is of significant importance in the creation of the Pareto front. In Pareto-based approaches, the domination criterion is of decisive importance. In a maximization problem such as the one presented in this study, the domination criterion is expressed as in Eq. (1). If this criterion is met, the solution vector s1 is said to dominate the solution vector s2. In this context, k represents the number of objectives, i represents the relevant objective index in the solution vector, and f(.) represents the fitness function that determines the fitness value for the candidate.

∀i ∈ {1, 2, ..., k}: f_i(s1) ≥ f_i(s2) and ∃i ∈ {1, 2, ..., k}: f_i(s1) > f_i(s2). (1)

In addition to a priori and a posteriori methods, there are also progressive methods, in which the decision maker or designer can intervene throughout the iterations, and hybrid approaches, where all these methods are used together (Luo et al., 2022). The choice of method is determined by the type of problem, the decision maker/designer and the importance of the targeted goals. However, in NP-hard (non-deterministic polynomial time) optimization problems where deterministic solutions are inadequate, the number of objectives is an important factor in determining the type of method to be chosen. In such complex problems, a posteriori approaches may be preferred over a priori approaches due to the subjectivity problem. In particular, a posteriori methods using metaheuristic mechanisms are very effective in solving such NP-hard problems. Algorithms such as SPEA2 (Zitzler & Thiele, 1999; Zitzler, Laumanns & Thiele, 2001), NSGA-II/III (Non-dominated Sorting Genetic Algorithm-II/III) (Deb et al., 2002; Deb & Jain, 2014) and MOPSO (multi-objective particle swarm optimization) (Junjie et al., 2009) have been applied to a wide range of MOO problems, demonstrating their effectiveness in addressing the challenges posed by these complex problems.
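As a concrete illustration, the maximization-form dominance test of Eq. (1) can be sketched in Python (the function name and list-based objective vectors are illustrative, not from the paper):

```python
def dominates(s1, s2):
    """Eq. (1), maximization: s1 dominates s2 if it is at least as good
    in every objective and strictly better in at least one."""
    return (all(f1 >= f2 for f1, f2 in zip(s1, s2))
            and any(f1 > f2 for f1, f2 in zip(s1, s2)))

# Objective vectors in [Acc, Pre, Rec, F1] order
a = [0.96, 0.96, 0.95, 0.955]
b = [0.90, 0.96, 0.90, 0.930]
print(dominates(a, b))  # True: a is no worse anywhere and strictly better in Acc, Rec, F1
print(dominates(b, a))  # False
```

Two vectors that each win on a different objective are mutually non-dominated, which is exactly how the Pareto front accumulates candidates.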
In this study, the a posteriori solution type was selected in order to exclude user influence in rule extraction. Furthermore, the extensive feature space employed and the high number of objectives led us to the conclusion that a posteriori solutions utilizing metaheuristic mechanisms were the most appropriate. The literature review and the previous experience of the authors show that the SPEA2 algorithm has some advantages in this type of problem. For example, SPEA2 has a lower resource utilization cost since it does not perform multiple Pareto management like NSGA-II and III. It also provides more successful diversity, since it uses k-nearest-neighbour based density estimation instead of the grid-based distance criterion of MOGWO (Many Objective Grey Wolf Optimization) (Yildirim, 2022).

Strength Pareto Evolutionary Algorithm 2 (SPEA2)
The classical SPEA (Zitzler & Thiele, 1999) is a state-of-the-art MOO algorithm that identifies Pareto solution sets for problems containing multiple conflicting objectives. The classical SPEA, which employs evolutionary mechanisms, always considers the dominance and density factors. The dominance factor controls the superiority of the candidates over each other, while the density factor observes the distribution of the solutions found in the solution space and helps to create a balanced Pareto front. However, classical SPEA may require more generations to reach optimal solutions, which may result in slower convergence. Furthermore, the simple distance-based density estimation it employs may also diminish the efficacy of Pareto-front formation. Consequently, the SPEA developers have proposed the SPEA2 algorithm, which represents an enhanced iteration of this algorithm (Zitzler, Laumanns & Thiele, 2001). While SPEA2 offers faster convergence than SPEA, utilizing elitism and tournament mechanisms, it also enables more successful Pareto-front formation with an advanced density estimation method.
SPEA2 has an initial population (P_0) and an empty archive (P̄_0) in the first iteration (t = 0). These two entities have a dynamic structure and are updated at each iteration according to dominance evaluations and algorithm mechanisms (P_t and P̄_t). Generation control is carried out according to the fitness values of both population and archive individuals. The fact that individuals dominated by the same archive individuals have similar fitness values negatively affects density. To prevent this, SPEA2 takes into account both the dominating and dominated statuses of each candidate. The number of solutions dominated by the i-th individual in P_t and P̄_t is its strength value S(i) and is calculated with Eq. (2). Here, |.| represents cardinality, + represents multiset union, and ≻ represents Pareto dominance.

S(i) = |{ j | j ∈ P_t + P̄_t ∧ i ≻ j }|. (2)

The S(i) value is employed to calculate the raw fitness value, denoted by R(i), which elucidates the dominance status of the i-th individual. The greater the value of R(i) determined by Eq. (3), the more the related candidate is dominated. If R(i) is equal to zero, the candidate is considered to be non-dominated.

R(i) = Σ_{j ∈ P_t + P̄_t, j ≻ i} S(j). (3)

The raw fitness value provides an indication of the dominance status of the candidate, but may be insufficient for accurate evaluation if the number of non-dominated candidates is high. Therefore, density information is also used in addition to the raw fitness value. The k-th nearest neighbour technique is used for density estimation (Silverman, 1986). Consequently, the density at a given point is a decreasing function of the distance to the k-th nearest data point. For the i-th individual, the distances in the objective space to all other individuals, both in the archive and in the population, are calculated and stored in a list. In the list organized in ascending order, the k-th value gives the desired distance and is denoted by σ_i^k. The k value depends on the population size (N) and the archive size (N̄) and is calculated as k = √(N + N̄). The density value of a candidate is inversely proportional to σ_i^k and is found by Eq. (4). Here, θ is a constant used to ensure that the denominator value is positive, and generally θ = 2.

D(i) = 1 / (σ_i^k + θ). (4)

Consequently, a candidate's fitness value F(i) is determined by Eq. (5) according to the raw fitness R(i) and density D(i) values.

F(i) = R(i) + D(i). (5)

The next generation is created (environmental selection) according to the F(i) values of the candidates. Initially, non-dominated individuals with fitness values less than 1 are copied to the next generation's archive, as illustrated in Eq. (6).

P̄_{t+1} = { i | i ∈ P_t + P̄_t ∧ F(i) < 1 }. (6)

During this process, the archive size of the next generation is taken into account. If |P̄_{t+1}| = N̄, the environmental selection process is completed. In the case of |P̄_{t+1}| < N̄, the best N̄ − |P̄_{t+1}| dominated individuals from the previous population and archive are added to the next generation archive. In the case of |P̄_{t+1}| > N̄, candidates are removed from the next generation archive until |P̄_{t+1}| = N̄ is achieved. This truncation process considers the distance (σ_i^k) between individual i in set P̄_{t+1} and its k-th nearest neighbour, as demonstrated in Eq. (7): the individual with the shortest distance is selected for removal. In the event that there are multiple candidates with the same distance, the tie is broken by taking the second smallest distances into account.
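Under the definitions above, the fitness assignment of Eqs. (2)-(5) can be sketched as follows. This is a simplified, self-contained reading in which the union of population and archive is passed as one list of objective vectors; the function and variable names are illustrative:

```python
import math

def spea2_fitness(objs, theta=2.0):
    """Assign SPEA2 fitness to the combined population + archive.
    objs: list of objective vectors (maximization problem)."""
    n = len(objs)
    dom = lambda a, b: (all(x >= y for x, y in zip(a, b))
                        and any(x > y for x, y in zip(a, b)))
    # Eq. (2): strength = number of individuals each candidate dominates
    S = [sum(dom(objs[i], objs[j]) for j in range(n)) for i in range(n)]
    # Eq. (3): raw fitness = summed strength of all dominators (0 = non-dominated)
    R = [sum(S[j] for j in range(n) if dom(objs[j], objs[i])) for i in range(n)]
    k = max(1, round(math.sqrt(n)))  # k = sqrt(N + N_bar)
    F = []
    for i in range(n):
        dists = sorted(math.dist(objs[i], objs[j]) for j in range(n) if j != i)
        sigma_k = dists[min(k, len(dists)) - 1]   # distance to k-th neighbour
        F.append(R[i] + 1.0 / (sigma_k + theta))  # Eqs. (4)-(5)
    return F

F = spea2_fitness([[1.0, 1.0], [0.0, 0.0], [0.5, 0.5]])
# only the first vector is non-dominated, so only its fitness is below 1
```

Because D(i) < 1/θ ≤ 0.5, any dominated candidate (R(i) ≥ 1) always scores above 1, which is what makes the F(i) < 1 archive rule of Eq. (6) work.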
The Simulated Binary Crossover (SBX) operator was employed as the crossover operator (Deb & Agrawal, 1994). SBX generates two new individuals according to the probability distributions of the parent individuals (pr1, pr2). The polynomial probability distribution that ensures that the newly produced individuals are related to their parents is a function of the spread factor (β) and is expressed as in Eq. (8). The similarity of the new individuals to the parent individuals is determined by the constant ϕ (ϕ ∈ R+). A high ϕ value increases the similarity of the new individuals to the parent individuals.

P(β) = 0.5(ϕ + 1)β^ϕ if β ≤ 1; P(β) = 0.5(ϕ + 1)/β^(ϕ+2) otherwise. (8)

The spread factor value for each variable h of the candidate solution vector is distinct (β_h) and is determined by Eq. (9). Here, ϑ_c represents the random constant between 0 and 1 generated for the variable h.

β_h = (2ϑ_c)^(1/(ϕ+1)) if ϑ_c ≤ 0.5; β_h = (1/(2(1 − ϑ_c)))^(1/(ϕ+1)) otherwise. (9)

At the conclusion of the crossover process, both newly produced individuals (sv1, sv2) are identified by Eqs. (10)-(11).

sv1 = 0.5[(1 + β_h)pr1 + (1 − β_h)pr2], (10)
sv2 = 0.5[(1 − β_h)pr1 + (1 + β_h)pr2]. (11)
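A minimal sketch of the SBX step described above (function names are illustrative; the paper draws the per-variable random constant ϑ_c from the tent chaotic map, while here a plain uniform generator is the default):

```python
import random

def sbx(pr1, pr2, phi=2.0, rnd=random.random):
    """Simulated Binary Crossover (Deb & Agrawal, 1994) sketch.
    Produces two children related to the parents; a larger phi keeps
    the children closer to the parents."""
    sv1, sv2 = [], []
    for h in range(len(pr1)):
        u = rnd()                      # theta_c in (0, 1) for variable h
        if u <= 0.5:                   # Eq. (9): spread factor beta_h
            beta = (2.0 * u) ** (1.0 / (phi + 1.0))
        else:
            beta = (1.0 / (2.0 * (1.0 - u))) ** (1.0 / (phi + 1.0))
        # Eqs. (10)-(11): symmetric blend of the parents
        sv1.append(0.5 * ((1 + beta) * pr1[h] + (1 - beta) * pr2[h]))
        sv2.append(0.5 * ((1 - beta) * pr1[h] + (1 + beta) * pr2[h]))
    return sv1, sv2
```

Note that the children's mean always equals the parents' mean, and ϑ_c = 0.5 gives β_h = 1, reproducing the parents exactly.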
The mutation operation is performed for all vector variables of the selected individual. The amount of change in the solution vector for each variable, Δc_h, is calculated with Eq. (12) and added to the relevant variable. The amount of change depends on the predefined mutation coefficient (γ) and a random number (ϑ_m^h) generated for the variable h.
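The exact form of Eq. (12) is not reproduced here; the sketch below assumes a symmetric, range-scaled perturbation controlled by γ and the per-variable random number ϑ_m^h, clipped to the attribute bounds. This is an assumed reading for illustration, not the paper's formula:

```python
import random

def mutate(vec, lower, upper, gamma=0.1, rnd=random.random):
    """Mutation sketch: shift each variable h by gamma-scaled noise
    proportional to its attribute range (an assumption, not Eq. (12))."""
    out = []
    for h, v in enumerate(vec):
        delta_c = gamma * (upper[h] - lower[h]) * (2.0 * rnd() - 1.0)
        out.append(min(max(v + delta_c, lower[h]), upper[h]))  # stay in [L, U]
    return out
```

Clipping to the search-space bounds mirrors the L_j ≤ v ≤ U_j condition the candidate vectors must satisfy.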

Many objective chaotic rule-based SPEA2 and evaluation principles
This study proposes a novel many-objective metaheuristic method for machine learning-based rule extraction. To this end, the SPEA2 algorithm has been adapted to address the rule inference problem. This necessitates two fundamental changes to the SPEA2 algorithm. The first is to design an appropriate representation form for candidate solutions. The second is to determine how to evaluate the objective values according to rule compliance. In the metaheuristic-based rule extraction technique, the representation forms and search technique of the candidate solutions to be used are of significant importance. In the proposed method, the candidate solution employs a representation format comprising three distinct vectors. Consequently, candidate solution V_i comprises the sub-vectors V_i^b, V_i^l and V_i^u. V_i^b is a binary vector indicating which attributes will be utilized in the rule that the candidate will develop. If the value of v_j^b is greater than the predefined λ threshold value, as indicated in Eq. (13), the j-th attribute is included in the rule that the candidate will develop.
Throughout the iterations, candidates obtain two values, one lower and one upper, for each attribute of the search space. A candidate keeps the upper and lower values found for each attribute in V_i^u and V_i^l, respectively. In an iteration, the upper and lower values found for the j-th attribute must fall between the minimum (L_j) and maximum (U_j) values in the search space of the relevant attribute. Therefore, the condition L_j ≤ v_j^l < v_j^u ≤ U_j must be constantly checked. At each iteration, each candidate presents a single rule, which is updated throughout the process. The v_j^b value plays a pivotal role in the formation of these rules, while v_j^l and v_j^u provide context and meaning. To illustrate, consider a search for a class ''C'' that includes attributes 4, 7 and 10 in the i-th iteration (v_4^b, v_7^b and v_10^b > λ). In this case, the candidate's rule expression for the data set attributes (F_1, F_2, ..., F_n) will be as in Eq. (14).

IF (v_4^l ≤ F_4 ≤ v_4^u) AND (v_7^l ≤ F_7 ≤ v_7^u) AND (v_10^l ≤ F_10 ≤ v_10^u) THEN class = ''C''. (14)

This type of rule inference has two important contributions to increasing interpretability. The first is to show which attributes contribute to success and in what ranges. The second is that no discretization mechanism is needed to determine the lower and upper limits. Thus, there is no loss of information.
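A toy decoding of a candidate V_i = [V^b, V^l, V^u] into an if-then rule of the Eq. (14) form, with λ as in Eq. (13). The helper names and the numeric values are illustrative, not taken from the paper:

```python
def decode_rule(v_b, v_l, v_u, lam=0.5):
    """Eq. (13): attribute j enters the rule only if v_b[j] > lambda."""
    return [(j, v_l[j], v_u[j]) for j in range(len(v_b)) if v_b[j] > lam]

def rule_matches(rule, sample):
    """The 'if' part of Eq. (14): every selected attribute of the sample
    must lie inside its learned [lower, upper] interval."""
    return all(lo <= sample[j] <= up for j, lo, up in rule)

rule = decode_rule(v_b=[0.9, 0.2, 0.7], v_l=[0.1, 0.0, 0.4], v_u=[0.5, 1.0, 0.8])
# attributes 0 and 2 are active:
#   IF 0.1 <= F_0 <= 0.5 AND 0.4 <= F_2 <= 0.8 THEN class = C
print(rule_matches(rule, [0.3, 0.99, 0.6]))  # True
```

Because the intervals are kept on the raw attribute values, the rule is readable as-is and no discretization step is required.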
All candidates assess the rules they have derived throughout the iterations in accordance with their performance in the search space. A candidate's performance is contingent upon the degree of consistency exhibited by the rule derived for each class of data. In the evaluation of rules, the ''if'' and ''then'' components of the rule produced by the candidate solution V_i are taken into account. The attribute values and class of the data being compared determine which components of the rule are deemed to be consistent with them. Following the comparison in accordance with Table 1, the true positive (TP), true negative (TN), false positive (FP) and false negative (FN) counts of the candidate solution are updated. In multi-objective optimization methods, the contradiction between the selected objectives is of significant importance. The trade-offs resulting from this contradiction help to form the Pareto front. Pareto solutions assist decision makers in interpreting trade-offs between objectives. Given that this study addresses a data mining optimization problem, it is of paramount importance to select the most appropriate metrics. In data mining, precision and recall are metrics that may conflict with one another. Precision represents the ratio of true positives among all positive predictions. Recall is defined as the ratio of true positives among all actual positives. Consequently, in certain instances, an increase in one of these metrics may result in a decrease in the other. Two additional metrics that may conflict are accuracy and F1-score. Accuracy is a measure of the overall correctness of the model's predictions, whereas F1-score represents the harmonic mean of the precision and recall metrics. The conflict between these two metrics is particularly evident in unbalanced data sets. In this case, it would be misleading to consider only the accuracy metric. On the other hand, since F1-score combines precision and recall into a single metric, the trade-off between these two metrics cannot be seen directly. In data mining, it is possible to present Pareto solutions that show all trade-offs between these metrics to the decision maker with many-objective optimization. Consequently, in this study, Pareto fronts were constructed on the basis of the trade-offs between these four metrics.
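These four objectives reduce to simple ratios of the confusion-matrix counts; a minimal sketch (the zero-denominator guards are an added safety assumption the paper does not discuss):

```python
def objective_vector(tp, tn, fp, fn):
    """Build O_i = [Acc, Pre, Rec, F1] from the confusion-matrix counts."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    pre = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * pre * rec / (pre + rec) if pre + rec else 0.0  # harmonic mean
    return [acc, pre, rec, f1]

print(objective_vector(tp=8, tn=85, fp=2, fn=5))
# Acc = 0.93, Pre = 0.80, Rec = 8/13, F1 is their harmonic mean
```

The example illustrates the unbalanced-data conflict: accuracy is high because the negatives dominate, while recall reveals that several positives were missed.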
The objective values for a candidate solution are calculated from the total TP, TN, FP and FN values obtained at the end of each iteration. Equations (15)-(18) illustrate the calculation of the Accuracy (Acc), Precision (Pre), Recall (Rec) and F1-score (F1) objectives, respectively. The i-th candidate solution maintains these four metrics in the objective vector O_i = [Acc, Pre, Rec, F1]. The dominance relationship between candidates is determined by comparing these objective vectors. The randomness mechanism is of paramount importance in the exploration and exploitation processes of metaheuristic methods. Some traditional random number generation functions may exhibit statistical weaknesses, such as poor distribution properties, insufficient uniformity, or correlations between generated values. These weaknesses can lead to biased results, especially in simulations and modelling. In order to address this issue, researchers have examined chaotic maps due to their deterministic behaviour, longer periods and advanced statistical properties. In Yildirim et al. (2021), the authors of this paper tested the performance of chaotic functions in metaheuristic approaches. As a result of that study, it was observed that the tent function can be partially more resistant to local minima (or maxima). For this reason, the tent function is also preferred for random process management in this study. It is well documented in the literature that metaheuristic algorithms employing chaotic maps yield competitive outcomes and offer flexibility, particularly during the exploration phase. Given the promising results demonstrated by the authors in their previous studies, the tent chaotic map (Li et al., 2017), whose mathematical expression is given in Eq. (19), was employed. This adapted version was designated chaotic rule-based SPEA2 (CRb-SPEA2).
The chaotic map is dependent on the real constant ∂ and the initial value X_0. By setting these two coefficients correctly, the sequence produced exhibits chaotic behaviour. In CRb-SPEA2, the initial vectors V_i^b, V_i^l and V_i^u of the candidate solution, as well as the ϑ_c and ϑ_m^h values in the crossover and mutation operations, are generated by the tent chaotic map function. The pseudo code demonstrating all these processes of the rule extraction method with CRb-SPEA2 is presented in Algorithm 1, and the basic mechanisms of the algorithm are shown in Fig. 2.
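One common form of the tent map of Eq. (19) can be sketched as follows; the 0.5 branch point and the choice μ ≈ 2 are assumptions for illustration, since the paper only names the real constant ∂ and the initial value X_0:

```python
def tent_sequence(x0, n, mu=1.99):
    """Generate n chaotic values in (0, 1) with a tent map:
    x_{t+1} = mu * x_t       if x_t < 0.5
    x_{t+1} = mu * (1 - x_t) otherwise."""
    seq, x = [], x0
    for _ in range(n):
        x = mu * x if x < 0.5 else mu * (1.0 - x)
        seq.append(x)
    return seq

vals = tent_sequence(0.37, 1000)
# the sequence stays inside (0, 1); mu slightly below 2 avoids
# the exact-cycle degeneracies of mu = 2 in floating point
```

Feeding such a sequence in place of a uniform generator is what replaces the ϑ_c and ϑ_m^h draws in the crossover and mutation operators.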

RESULTS AND DISCUSSION
The performance of CRb-SPEA2 was observed for three different data sets, two of which are well known in the literature and one of which was obtained from a real engineering problem. The details of these data sets, which include both balanced and unbalanced cases, are provided in Table 2. The Ecoli data set (Nakai, 0000), which contains 336 samples, provides information about the protein localization sites in Escherichia coli bacteria. The RAC data set is the second data set and was used to determine the concrete component amounts from the data obtained from the debris of the devastating earthquake that occurred in Elazig (Turkey) in 2020 (Ulucan et al., 2023). The Iris data set contains class information for three different flower species. Ten independent experiments were performed for all data sets. In the study, the statistical results of the independent experiments performed for each data set, the rules drawn, and the training and test results of some experiments are presented separately to give an idea of performance. In order to facilitate a comparison of the metric performances, the results of well-known state-of-the-art machine learning algorithms were utilized, namely naive Bayes (NB), k-nearest neighbor (kNN), support vector machine (SVM), decision tree (DT), Multiobjective Evolutionary Fuzzy Classifier (EFC), JRIP, Ridor, Random Forest (RF), Hyperpipes (HP) and AdaboostM1 (AB). All experimental parameters are presented in Table 3. Given that the proposed system is a supervised machine learning method, the data were divided into training and test data sets. The performance of the rules derived from the training data sets was evaluated separately with the test data, and the results of both phases are shared. In the experiments, the CRb-SPEA2 algorithm was written and tested in Python. The results of the other machine learning algorithms were obtained from the WEKA software.
The initial experiments were conducted on classes belonging to the Ecoli data set. The statistical results for the accuracy (Acc), precision (Pre), recall (Rec) and F1 metric values of the Pareto candidates obtained in the training and test phase experiments are presented in Tables 4 and 5. Figure 3 illustrates the sample 4D distribution (the fourth dimension is represented by color) of the training Pareto candidates that produce high-performance rules. Table 6 presents a selection of example rules that were automatically derived for the relevant classes by Pareto candidates in the training experiments. As is known, in Pareto-based multi/many-objective methods, multiple solutions are presented to the decision maker instead of a single solution. For this reason, in Table 7 and subsequent comparison tables, some of the CRb-SPEA2 based solutions are presented.
The experimental results are discussed throughout the article, with the metric values of the candidates reported in [Acc Pre Rec F1] format. A large number of examples of a class in the data set increases the rule diversity (Figs. 3, 4 and 5). These distributions make the trade-offs between the solutions, the number of solutions and the diversity in the solution space easier to see. As illustrated in Fig. 3, Pareto candidates can generate distinct rule sets for the decision maker in classes such as cp, im and pp, which have a relatively large number of examples. In contrast, at most one or two rules are derived for classes such as omL, imL and imS, which contain few data points. In the Ecoli data set, CRb-SPEA2 was able to generate Pareto candidates with performance ranging between 0.900 and 1.00 in almost all classes and all metrics. These results were also observed on the test data, with the highest objective metrics obtained from the rules of small classes such as omL. This is to be expected given the nature of these classes: the limited number of examples resulted in wide rule intervals, which increased the proportion of the data covered by the rules. In classes containing a relatively large amount of data, the maximum Acc, Pre and Rec values generally remained above 0.900 during the training and testing phases. When compared to the results obtained by well-known ML algorithms for the same classes, CRb-SPEA2 succeeded in producing non-dominated solutions for all classes. At the same time, the highest Acc value in every class was obtained by CRb-SPEA2, and it achieved considerable success in certain classes. For example, for the cp class the Pareto candidate [0.960, 0.960, 0.950, 0.955] demonstrated superior performance compared to the other ML algorithms, except for SVM; with SVM, a non-dominance result emerged due to its high Recall and F1 metrics. In the omL class, which contains a limited number of data points, CRb-SPEA2 was able to infer a rule that covers all the data and outperformed the metrics of the other ML algorithms. Upon examination of the sample rules generated by the Pareto candidates, it was observed that mcg plays a pivotal role in the cp, im and imU classes, while gvh and alm1 also exhibited notable efficacy. In the om and omL classes, mcg has no effect, whereas chg, gvh and alm1 are determinants.
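The non-dominance comparisons discussed above can be sketched with a small Pareto filter over [Acc, Pre, Rec, F1] vectors (all maximized). Only the cp-class vector quoted in the text is from the paper; the other two candidates are illustrative.

```python
# Minimal Pareto-filter sketch for candidates scored as [Acc, Pre, Rec, F1].

def dominates(a, b):
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(candidates):
    """Keep only the candidates not dominated by any other candidate."""
    return [c for c in candidates
            if not any(dominates(o, c) for o in candidates if o is not c)]

cands = [
    [0.960, 0.960, 0.950, 0.955],  # cp-class candidate quoted in the text
    [0.940, 0.970, 0.930, 0.950],  # trades Acc/Rec for Pre (illustrative)
    [0.930, 0.940, 0.920, 0.930],  # dominated by the first vector (illustrative)
]
print(pareto_front(cands))  # the first two candidates survive
```

The first two vectors are mutually non-dominated (each wins on some metric), which is exactly the situation that produces multiple rules for the decision maker.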
The RAC data set was used to determine the amounts of concrete components from the classified construction and demolition wastes generated after the Sivrice-Elazığ (Turkey) earthquake and to estimate the early-age concrete strength class. The statistical outcomes of the training and testing phases with CRb-SPEA2 are presented in Tables 8 and 9. The sample Pareto candidate distribution obtained during the training phase is illustrated in Fig. 4. Upon examination of the training results, it was observed that the highest accuracy values of the Pareto candidates were 0.761, 0.738 and 0.904 for Classes A, B and C, respectively.
In the independent experiments, the Pareto front in the training phases usually consisted of three or four candidates, and high diversity was observed. Although the Precision and Recall metrics reached 1.00 in different classes, the trade-offs between them were high. The prominent solutions were [0.771, 0.750, 0.750, 0.750] for Class-A and [0.778, 1.00, 0.214, 0.353] for Class-B, while for Class-C a candidate reaching [0.833, 0.625, 0.714] on the last three metrics stood out. In general, the best results were obtained for Class-C, whose accuracy values ranged from 0.600 to 0.910. The same held for Precision, while Recall did not reach very high values.
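When a front contains such strong Precision/Recall trade-offs, the decision maker still has to pick one candidate. A common, simple convention (not a step of CRb-SPEA2 itself) is to choose the candidate closest to the ideal point [1, 1, 1, 1]; the Class-B vector below is the one quoted above, the second candidate is illustrative.

```python
import math

# Pick the Pareto candidate with the smallest Euclidean distance to the
# ideal objective vector. This selection rule is a common convention for
# a-posteriori decision making, not part of the paper's algorithm.

def pick_closest_to_ideal(front):
    ideal = (1.0, 1.0, 1.0, 1.0)
    return min(front, key=lambda v: math.dist(v, ideal))

front = [
    [0.778, 1.00, 0.214, 0.353],   # high Pre, very low Rec (quoted above)
    [0.750, 0.700, 0.720, 0.710],  # balanced candidate (illustrative)
]
print(pick_closest_to_ideal(front))  # the balanced candidate wins
```

The extreme-Precision candidate is far from the ideal point because of its low Recall, so a balanced candidate is preferred under this rule.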
As illustrated in Tables 10 and 11, when compared to the results of the other ML algorithms, the CRb-SPEA2 test results still produced non-dominated values in all classes. However, the highest accuracy values were obtained by the classical ML algorithms, with NB, DT, RIDOR and EFC achieving particularly strong scores. Upon examination of the sample rules automatically generated by the Pareto candidates during the training phase, it can be seen that the cement and water attributes play an important role in Class-A and Class-B, while in Class-C the coarse2 attribute also emerges as a dominant factor in addition to these.
The final experiments were conducted on the Iris data set (Fisher, 2021), one of the best-known data sets in the literature. The statistical results of the training and test experiments performed on this data set are presented in Tables 12 and 13. In all three classes, CRb-SPEA2 achieved results above 0.950 in all metrics. As illustrated in Fig. 5, the density of the Pareto fronts is relatively low during training, indicating that SPEA2's density estimation is effective on this data set, as on the others. The performance of the Pareto solutions in the training experiments was maintained in the testing stage, and in the Setosa class some rule inferences were even obtained that cover all test data.

CONCLUSIONS
This study demonstrates the applicability of metaheuristic many-objective optimization methods to interpretable, rule-inference-based classification problems. The CRb-SPEA2 algorithm, adapted to data mining problems, was employed in three independent experiments with three different benchmark data sets. The results were also compared with those of classical ML algorithms, and the competitive non-dominated scores of the proposed method were demonstrated in all experiments. The interpretable rules derived by the Pareto candidates during the training phase enable the decision maker to identify which attributes of the data set are decisive for classification. Owing to the nature of many-objective optimization, CRb-SPEA2 is able to offer the decision maker different rule sets with different trade-offs. Although high performance and explainability can be achieved, a limitation of many-objective optimization algorithms is their sensitivity to parameter settings; finding the right parameters can be time consuming. Nevertheless, the many-objective optimization-based rule mining method proposed here is expected to help create simpler, more interpretable and more understandable rules by striking a balance between the simplicity and complexity of rule sets. In this way, fewer and more descriptive rules can make it easier for users to understand and apply them. In the future, the authors aim to propose new adaptive and hybrid versions of intelligent optimization methods for high-performance rule mining problems. In addition, to reduce the model complexity and computational cost of the proposed method, parallel and distributed versions will be studied in order to obtain more efficient results on large data. First applications of the method to fuzzy rule mining, association rule mining and sequential pattern discovery are also planned. The authors' next goal is to demonstrate that metaheuristic many-objective optimization methods can be applied to different engineering problems.

Figure 3 Sample Pareto solutions obtained for each class of the Ecoli data set at the end of the training phase. Full-size DOI: 10.7717/peerjcs.2307/fig-3

Figure 4 Sample Pareto solutions obtained for each class of the RAC data set at the end of the training phase. Full-size DOI: 10.7717/peerjcs.2307/fig-4

Figure 5 Sample Pareto solutions obtained for each class of the Iris data set at the end of the training phase. Full-size DOI: 10.7717/peerjcs.2307/fig-5

… (by Table 1, Eq. 15–18, then Eq. 1–5)
6. Copy all non-dominated individuals in P_t and P̄_t to P̄_{t+1}.
7. If the size of P̄_{t+1} exceeds N̄, then reduce P̄_{t+1} (by Eq. 7).
8. Else, if the size of P̄_{t+1} is less than N̄, then fill P̄_{t+1} with the best dominated individuals in P_t and P̄_t.
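The environmental-selection steps above can be sketched as follows. This is a simplified sketch only: real SPEA2 truncates with a k-th-nearest-neighbor density rule and fills using the fitness of Eq. 1–5/Eq. 7 in the paper, whereas here truncation uses the single nearest neighbor and fill quality is approximated by how many individuals dominate each candidate.

```python
import math

# Simplified sketch of SPEA2 environmental selection (steps 6-8 above),
# for maximization objectives. Not the paper's exact implementation.

def dominates(a, b):
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def environmental_selection(pop, archive, n_bar):
    union = pop + archive
    # Step 6: copy all non-dominated individuals into the next archive.
    next_archive = [c for c in union if not any(dominates(o, c) for o in union)]
    # Step 7: truncate the most crowded individuals if the archive is too big.
    while len(next_archive) > n_bar:
        def crowding(c):
            return min(math.dist(c, o) for o in next_archive if o is not c)
        next_archive.remove(min(next_archive, key=crowding))
    # Step 8: otherwise fill with the least-dominated remaining individuals.
    if len(next_archive) < n_bar:
        dominated = [c for c in union if c not in next_archive]
        dominated.sort(key=lambda c: sum(dominates(o, c) for o in union))
        next_archive += dominated[:n_bar - len(next_archive)]
    return next_archive

pop = [[0.9, 0.9, 0.9, 0.9], [0.8, 0.95, 0.7, 0.8], [0.5, 0.5, 0.5, 0.5]]
print(environmental_selection(pop, [], 2))  # keeps the two non-dominated vectors
```

With an empty archive and N̄ = 2, the two mutually non-dominated vectors survive and the dominated one is discarded.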

Table 3 Experimental parameters.
In order to give readers an idea, sample Pareto solution distributions obtained from the training phases of some data sets are given in Figs. 3, 4 and 5.

Table 6 Some example rules obtained for the classes of the Ecoli data set.

Table 8 Statistical results of the training stage of the RAC data set.
Upon examination of the results presented in Table 14, it can be seen that CRb-SPEA2 has been able to produce experimental results superior to those of the other classical machine learning algorithms in the Setosa class, while non-dominated solutions were produced for the Virginica and Versicolor classes. The majority of methods achieved similar results on this data set, where classification performance was high. The most significant advantage of CRb-SPEA2 is that it delivers this performance through interpretable rule sets. The interpretable sample rules automatically produced by the Pareto solutions in Table 15 demonstrate that the petal-length attribute is sufficient for the Setosa and Virginica classifications, while the sepal-length and sepal-width attributes can be used for the Versicolor classification.
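To make the form of such rules concrete, the following sketch applies a single interval rule of the kind shown in Table 15. The petal-length bounds and the helper name are illustrative assumptions, not the actual CRb-SPEA2 rule from the paper.

```python
# Sketch of applying one extracted IF-THEN interval rule to a sample.
# Bounds are illustrative, not taken from Table 15.

def make_interval_rule(attribute, low, high, label):
    """IF low <= attribute <= high THEN class = label, else no decision."""
    def rule(sample):
        return label if low <= sample[attribute] <= high else None
    return rule

setosa_rule = make_interval_rule("petal-length", 1.0, 1.9, "Iris-setosa")

print(setosa_rule({"petal-length": 1.4}))  # → Iris-setosa
print(setosa_rule({"petal-length": 4.7}))  # → None (rule does not fire)
```

A rule set is then just an ordered list of such functions, tried in turn until one fires, which is what makes the extracted classifiers easy to read and apply.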