Multi-kernel support vector regression with improved moth-flame optimization algorithm for software effort estimation

In this paper, a novel Moth-Flame Optimization (MFO) algorithm, namely the MFO algorithm enhanced by Multiple Improvement Strategies (MISMFO), is proposed for parameter optimization in the Multi-Kernel Support Vector Regressor (MKSVR), and the resulting MISMFO-MKSVR model is further employed to address software effort estimation problems. In MISMFO, logistic chaotic mapping is applied to increase initial population diversity, mutation and flame number phased reduction mechanisms are carried out to improve search efficiency, and an adaptive weight adjustment mechanism is used to accelerate convergence and balance exploration and exploitation. MISMFO is verified on fifteen benchmark functions and the CEC 2020 test set. The results show that MISMFO has advantages over other meta-heuristic algorithms and MFO variants in terms of convergence speed and accuracy. Additionally, the MISMFO-MKSVR model is tested by simulations on five software effort datasets, and the results demonstrate that the proposed model performs better on software effort estimation problems. The MATLAB code of MISMFO can be found at https://github.com/loadstar1997/MISMFO.

optimization problems in the past few years, in fields such as chemical engineering, economics, image processing, and medicine 54 . Although MFO exhibits superior performance on various practical problems, its spiral search mechanism biases the algorithm toward local exploitation rather than global exploration 55 , and it easily falls into local optima during the optimization process 54 .
To overcome these limitations, considerable research effort has been devoted in recent years. These advances have notably enhanced the convergence speed and exploratory capability of MFO. However, several constraints remain to be resolved. First, the rapid loss of population diversity during the search process is still an open issue. Second, the adaptive adjustment of the search strategy across different phases requires further investigation.
In this study, an improved MFO algorithm called MISMFO is proposed, which aims to optimize the hyperparameters of the MKSVR-based software effort estimation model. By addressing the limitations of traditional MFO variants and incorporating innovative mechanisms, MISMFO seeks to achieve superior optimization performance and enhance the predictive accuracy of the model. The main contributions of this paper are as follows:
1. Logistic chaotic mapping is employed to generate the initial population of moths, which ensures the diversity of the initial population in MISMFO.
2. The flame mutation mechanism is utilized to enhance population diversity during the search process, enabling the algorithm to escape local optima and explore new regions of the solution space. Additionally, the flame number phased reduction mechanism is introduced to improve search efficiency.
3. An adaptive weight mechanism is implemented to update the positions of the moths, enabling them to dynamically adjust their strategy based on the ratio between their fitness and the optimal flame fitness, thus improving the convergence speed and accuracy of the algorithm.
4. The MISMFO algorithm is compared with other optimization algorithms and MFO variants on fifteen benchmark functions and the CEC 2020 test set. Additionally, MISMFO is employed to optimize the hyperparameters of MKSVR for estimating software effort. The results verify the effectiveness of the proposed model.
The rest of this paper is organized as follows: section "Related work" reviews the original MFO algorithm, its variants, and related applications. Section "Methodology" introduces the basic MFO algorithm and the MKSVR. Section "The proposed method" details the MISMFO algorithm and the construction of the MISMFO-MKSVR model. Section "Experiment" compares the performance of MISMFO with other optimization algorithms on benchmark functions and validates the effectiveness of the MISMFO-MKSVR model on five publicly available software effort datasets. The final section summarizes the research findings and outlines directions for future work.

Related work
Meta-heuristic algorithms have demonstrated their efficacy in addressing complex problems characterized by high dimensionality, multi-modality, and non-differentiability 42 . As a result, they have been widely applied across various fields, such as community detection 56 , engineering cases 57 , system identification 40 , medical diagnosis 58 , image segmentation 59 and multi-objective optimization 60 . Meta-heuristic algorithms can be broadly divided into two categories: non-nature-inspired and nature-inspired (NI) algorithms. Although a few algorithms, such as Tabu Search (TS) 61 , belong to the first category, the majority of meta-heuristic algorithms are inspired by nature and have been widely applied to optimization problems 62 . These NI algorithms can be further classified into three categories: evolutionary, physics-based, and swarm intelligence algorithms. Evolutionary algorithms (EAs) are a class of iterative optimization algorithms that emulate the evolutionary processes found in nature; the best-known EAs are GA and DE. Physics-based algorithms mimic physical laws in nature; popular algorithms in this category include black hole (BH) 63 and atom search optimization (ASO) 64 . Swarm intelligence algorithms (SIs) are inspired by the collective behavior of social creatures: based on swarm intelligence and evolution theory, they generate a set of random solutions and automatically investigate the whole search space through multiple iterations until the optimal solution is found. Advanced and recent algorithms in this category include GWO, MFO, HHO, SMA, and DMO. The Moth-Flame Optimization (MFO) algorithm is a population-based stochastic search algorithm proposed in 2015 65 . In MFO, moths and flames are employed as candidate solutions, where the optimal flame signifies the current optimal solution. Moths adjust their positions along a spiral trajectory in each iteration, while the algorithm iteratively updates the flame positions
to seek the optimal solution. Owing to its simplicity, minimal control parameters, and ease of implementation, MFO has been widely applied to parameter optimization and related tasks. Wei et al. found that MFO searches more accurately and converges faster than traditional optimization algorithms (e.g., PSO, GA), and used an LS-SVM optimized by MFO to diagnose bearing faults 66 . Kalita et al. used MFO to optimize the hyperparameters of SVM in a dynamic environment and verified that the MFO-optimized model achieved higher accuracy and better performance in an intrusion detection system 67 . Talaat et al. optimized an artificial neural network (ANN) with MFO to improve its accuracy 68 .
Since the introduction of MFO, various modifications have been made to overcome its limitations and enhance its performance. Lin et al. introduced an inertia weighting strategy and the Cauchy mutation operator to improve the moth-flame optimization algorithm: the former balances the search and mining capabilities in the population location search equation, and the latter helps increase population diversity and avoid entrapment in local optima 69 . Wang et al. adopted two chaotic strategies to improve MFO and increase population diversity 70 . Pelusi et al. divided the search process of MFO into three phases and proposed corresponding search strategies for the different phases to balance exploration and exploitation 71 . Another line of work applied dynamic opposite learning (DOL), incorporating a modified DOL strategy to effectively address the premature convergence and entrapment in local optima of the basic MFO 72 . In subsequent research, two further enhanced variants of MFO were developed, designed for multi-objective optimization problems and COVID-19 CT image segmentation 59,60 , respectively. Wang et al. employed inertia weights, uniform initialization, and a spiral curve updating mechanism to enhance the global search capability of MFO 73 . Shan et al. proposed a double adaptive weighting mechanism, which enables the algorithm to adaptively adjust the search strategy in different periods, thus achieving a flexible conversion between exploration and exploitation 58 . Zhao et al. investigated the effects of mutation and chaotic mechanisms on MFO, and applied the improved MFO to real optimization problems 54,74,75,76 . Elaziz et al. applied the opposition-based learning technique to generate an optimal initial population, while differential evolution was used to improve the exploitation ability of MFO 77 . Sharma et al. employed an opposition-based learning mechanism to initialize the search population for enhanced exploration, and utilized a levy flight distribution to avoid stagnation of solutions in local optima 47 . Nguyen et al. hybridized levy flight and logarithmic functions for updating the flames to improve the optimization performance of MFO 78 . Jia et al. used an adaptive inertia weighting mechanism to enhance the exploration and exploitation of the algorithm and improved MFO in combination with practical problems to achieve better results 79 .

Methodology
Moth-flame optimization algorithm
The MFO algorithm is a swarm optimization algorithm proposed by Mirjalili 65 , inspired by the nocturnal navigation behaviour of moths; it is based on a mathematical model of moths, flames, and the positional relationship between them. The search space of the moths is the solution space of the target problem, and the position of a moth in the search space encodes the problem variables. In order to find a better solution, each moth performs a spiral search around its corresponding flame and updates its position.
The flame is the best position obtained by the corresponding moth so far. The positions of the moths are represented by the matrix

$$M = \begin{bmatrix} m_{1,1} & m_{1,2} & \cdots & m_{1,d} \\ m_{2,1} & m_{2,2} & \cdots & m_{2,d} \\ \vdots & \vdots & \ddots & \vdots \\ m_{n,1} & m_{n,2} & \cdots & m_{n,d} \end{bmatrix}, \tag{1}$$

where n is the number of moths and d is the dimension of the search space. The fitness value of each moth is calculated by the fitness function and stored in the vector

$$OM = [OM_1, OM_2, \ldots, OM_n]^T. \tag{2}$$

Since each moth has a flame corresponding to it, the flame matrix has the same structure as the moth matrix,

$$F = \begin{bmatrix} f_{1,1} & f_{1,2} & \cdots & f_{1,d} \\ f_{2,1} & f_{2,2} & \cdots & f_{2,d} \\ \vdots & \vdots & \ddots & \vdots \\ f_{n,1} & f_{n,2} & \cdots & f_{n,d} \end{bmatrix}, \tag{3}$$

with the corresponding fitness vector

$$OF = [OF_1, OF_2, \ldots, OF_n]^T. \tag{4}$$
The initial position of the moth population is determined by

$$m_{ij} = lb_j + r \cdot (ub_j - lb_j), \tag{5}$$

where m_ij denotes the position of the i-th moth in the j-th dimension, lb_j and ub_j denote the lower and upper bounds of the j-th dimension of the search space, respectively, lb = [lb_1, lb_2, ..., lb_d], ub = [ub_1, ub_2, ..., ub_d], and r is a random value between 0 and 1. During the search process, the moths spiral around the flames to locate the global optimum, and each moth corresponds to one flame for its position update. However, having the moths update their positions relative to n flames may reduce the efficiency of seeking the optimal region, so in order to balance exploration and exploitation, the number of flames is dynamically reduced during the search according to

$$f_{no} = \mathrm{round}\left(n - t \cdot \frac{n-1}{T}\right), \tag{6}$$

where f_no is the number of flames, t and T are the current and maximum number of iterations, respectively, and round is the rounding function. As the number of flames decreases, the correspondence between moths and flames changes: the first f_no moths correspond to the f_no flames, and the remaining moths all correspond to the f_no-th flame. The positions of the moths are updated during the search process according to

$$M_i = D_i^t \cdot e^{bl} \cdot \cos(2\pi l) + F_i, \tag{7}$$

where D_i^t denotes the distance between the i-th moth M_i and its corresponding flame F_i at the t-th iteration, b is a constant defining the shape of the spiral curve, l is a random number in [r, 1], and r = -1 - t/T, so that r decreases linearly from -1 to -2 over the iterations. When the maximum number of iterations is reached, the position and fitness value of the first flame are returned as the optimal solution, and the MFO algorithm ends.
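The update rules above can be sketched in a few lines (a minimal Python reimplementation for illustration, not the authors' MATLAB code; the sphere function here stands in for the real fitness function):

```python
import numpy as np

def mfo(fitness, lb, ub, n=30, d=10, T=100, b=1.0, seed=0):
    """Minimal Moth-Flame Optimization sketch (minimization)."""
    rng = np.random.default_rng(seed)
    M = lb + rng.random((n, d)) * (ub - lb)          # random initialization
    flames, flame_fit = None, None
    for t in range(1, T + 1):
        OM = np.array([fitness(m) for m in M])
        # Sort current moths together with previous flames; best n become flames
        if flames is None:
            pool, pool_fit = M, OM
        else:
            pool = np.vstack([flames, M])
            pool_fit = np.concatenate([flame_fit, OM])
        order = np.argsort(pool_fit)[:n]
        flames, flame_fit = pool[order], pool_fit[order]
        # Linearly decreasing flame number
        f_no = round(n - t * (n - 1) / T)
        # r decreases linearly from -1 to -2 over the run
        r = -1.0 - t / T
        for i in range(n):
            j = min(i, f_no - 1)                     # surplus moths share the last flame
            D = np.abs(flames[j] - M[i])             # distance to the assigned flame
            l = (r - 1) * rng.random(d) + 1          # l drawn from [r, 1]
            # Logarithmic spiral update around the flame
            M[i] = D * np.exp(b * l) * np.cos(2 * np.pi * l) + flames[j]
            M[i] = np.clip(M[i], lb, ub)
    return flames[0], flame_fit[0]

best, best_fit = mfo(lambda x: float(np.sum(x**2)), lb=-5.0, ub=5.0, d=5, T=200)
```

Because the flame pool is elitist (moths and previous flames are re-sorted each iteration), the best flame fitness is monotonically non-increasing over the run.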

Multiple-kernel SVR
SVR is a non-linear kernel-based regression method which tries to find a regression hyperplane with small risk in a high-dimensional feature space. Among the various types of SVR, the most commonly used is ε-SVR, which introduces the ε-insensitive loss function

$$L_\varepsilon(y, f(x)) = \max(0, |y - f(x)| - \varepsilon). \tag{8}$$

Given a set of samples $\{(x_i, y_i)\}_{i=1}^{l}$, where $x_i \in \mathbb{R}^n$, $y_i \in \mathbb{R}$, and l is the number of samples, the objective function of ε-SVR is defined as

$$\min_{w, b, \xi, \hat{\xi}} \; \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{l}(\xi_i + \hat{\xi}_i) \quad \text{s.t.}\; \begin{cases} y_i - \langle w, \varphi(x_i)\rangle - b \le \varepsilon + \xi_i, \\ \langle w, \varphi(x_i)\rangle + b - y_i \le \varepsilon + \hat{\xi}_i, \\ \xi_i \ge 0, \; \hat{\xi}_i \ge 0, \quad i = 1, \ldots, l, \end{cases} \tag{9}$$

where C is a parameter which gives a trade-off between model complexity and training error, and ξ_i and ξ̂_i are slack variables for exceeding or falling below the output value by more than ε, respectively. Denote by φ: X → F a non-linear mapping function from the input space to a feature space F. The regression hyperplane in F is

$$f(x) = \langle w, \varphi(x)\rangle + b, \tag{10}$$

where w and b are the regression coefficients and bias, respectively. To solve Eq. (9), the Lagrange function is introduced and Eq. (9) is transformed into the following dual form

$$\max_{\alpha, \hat{\alpha}} \; -\frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l}(\alpha_i - \hat{\alpha}_i)(\alpha_j - \hat{\alpha}_j)K(x_i, x_j) - \varepsilon\sum_{i=1}^{l}(\alpha_i + \hat{\alpha}_i) + \sum_{i=1}^{l}y_i(\alpha_i - \hat{\alpha}_i) \quad \text{s.t.}\; \sum_{i=1}^{l}(\alpha_i - \hat{\alpha}_i) = 0, \; 0 \le \alpha_i, \hat{\alpha}_i \le C, \tag{11}$$

where α_i and α̂_i, i = 1, ..., l, are Lagrange multipliers, α = [α_1, α_2, ..., α_l] and α̂ = [α̂_1, α̂_2, ..., α̂_l]. K(x_i, x_j) is a kernel function representing the inner product ⟨φ(x_i), φ(x_j)⟩.
Eq. (11) can be solved by SMO. Suppose α*_i and α̂*_i, i = 1, ..., l, are the optimal values obtained. The final regression function is then given by

$$f(x) = \sum_{i=1}^{l}(\alpha_i^* - \hat{\alpha}_i^*)K(x_i, x) + b. \tag{12}$$

The traditional SVR method uses a single mapping function φ and therefore yields a single kernel function K. However, when a dataset has a locally varying distribution, a single kernel may not capture the varying distribution well 80 . Multiple-Kernel Learning (MKL) can help address this issue: MKL combines several mapping functions into an aggregate mapping. A simple direct-sum fusion applies a vector of M mapping functions 80 , i.e.,

$$\varphi(x) = [\varphi_1(x), \varphi_2(x), \ldots, \varphi_M(x)]^T, \tag{13}$$

to map the input space to the feature space. Here, the weighted-sum fusion is adopted, with the mapping function

$$\varphi_\mu(x) = [\sqrt{\mu_1}\,\varphi_1(x), \sqrt{\mu_2}\,\varphi_2(x), \ldots, \sqrt{\mu_M}\,\varphi_M(x)]^T, \tag{14}$$

where μ_1, μ_2, ..., μ_M are the weights of the component functions. Denote the weight vector μ = [μ_1, μ_2, ..., μ_M]. The objective function of multiple-kernel SVR is as follows,

$$\min_{w, b, \xi, \hat{\xi}} \; \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{l}(\xi_i + \hat{\xi}_i) \quad \text{s.t.}\; \begin{cases} y_i - \langle w, \varphi_\mu(x_i)\rangle - b \le \varepsilon + \xi_i, \\ \langle w, \varphi_\mu(x_i)\rangle + b - y_i \le \varepsilon + \hat{\xi}_i, \\ \xi_i \ge 0, \; \hat{\xi}_i \ge 0, \quad i = 1, \ldots, l, \end{cases} \tag{15}$$

where φ_μ is the vector of function mappings of Eq. (14). Likewise, by introducing the Lagrangian as usual, Eq. (15) can be converted to the following dual form,

$$\max_{\alpha, \hat{\alpha}} \; -\frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l}(\alpha_i - \hat{\alpha}_i)(\alpha_j - \hat{\alpha}_j)K_\mu(x_i, x_j) - \varepsilon\sum_{i=1}^{l}(\alpha_i + \hat{\alpha}_i) + \sum_{i=1}^{l}y_i(\alpha_i - \hat{\alpha}_i) \quad \text{s.t.}\; \sum_{i=1}^{l}(\alpha_i - \hat{\alpha}_i) = 0, \; 0 \le \alpha_i, \hat{\alpha}_i \le C, \tag{16}$$

where $K_\mu = \sum_{m=1}^{M}\mu_m K_m$ is a weighted sum of the M kernel functions K_1, K_2, ..., K_M corresponding to the mapping functions φ_1, φ_2, ..., φ_M, respectively. If we find μ, α and α̂ by solving Eq. (16), the regression function of MKSVR is

$$f(x) = \sum_{i=1}^{l}(\alpha_i^* - \hat{\alpha}_i^*)K_\mu(x_i, x) + b. \tag{17}$$
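The identity behind the weighted-sum fusion (a weighted sum of kernels is itself the inner product of a suitably scaled, stacked feature map) can be checked numerically with toy feature maps. The maps and weights below are illustrative, and scaling each block by √μ_m is one common MKL convention, not necessarily the paper's exact notation:

```python
import numpy as np

# Two toy explicit feature maps (hypothetical, for illustration only).
phi1 = lambda x: np.array([x[0], x[1]])                    # linear features
phi2 = lambda x: np.array([x[0]**2, x[1]**2, x[0] * x[1]]) # quadratic features

def k1(x, z): return float(np.dot(phi1(x), phi1(z)))
def k2(x, z): return float(np.dot(phi2(x), phi2(z)))

mu1, mu2 = 0.7, 0.3   # kernel weights

def k_combined(x, z):
    """Weighted-sum multi-kernel: mu1*K1 + mu2*K2."""
    return mu1 * k1(x, z) + mu2 * k2(x, z)

def phi_combined(x):
    """Stacked feature map, each block scaled by sqrt(mu_m): its inner
    product reproduces the weighted kernel sum exactly."""
    return np.concatenate([np.sqrt(mu1) * phi1(x), np.sqrt(mu2) * phi2(x)])

x, z = np.array([1.0, 2.0]), np.array([0.5, -1.0])
lhs = k_combined(x, z)
rhs = float(np.dot(phi_combined(x), phi_combined(z)))
```

The two values agree to floating-point precision, which is why the dual problem only ever needs the combined kernel values, never the explicit feature maps.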

The proposed method
While the MFO algorithm boasts advantages such as rapid convergence and a simple structure, it faces challenges in balancing exploration and exploitation. Additionally, population diversity decreases too rapidly during the iterative process, making the algorithm prone to falling into local optima. To solve the above problems, an MFO algorithm with multiple improvement strategies, called MISMFO, is proposed. In MISMFO, firstly, the moth population is initialized through the logistic chaotic mapping mechanism, which ensures the diversity of the initial population. Secondly, the mutation and flame number phased reduction mechanisms are proposed to increase population diversity during the iteration process, which improves the probability of the algorithm jumping out of local optima and enhances exploration. Finally, a novel adaptive weight mechanism is employed to update the moth positions, which accelerates convergence and improves the accuracy of the algorithm, as well as balancing exploration and exploitation.

Population initialization based on logistic chaotic mapping
The diversity of the initialized population significantly influences the performance of swarm optimization algorithms.A more diverse initial population enhances optimization performance and accelerates convergence.
The traditional MFO uses a random function based on a uniform distribution to generate the initial solutions.
To create a more diverse population, this paper introduces a chaotic mechanism to initialize the moth population. Chaotic techniques are effective in enhancing algorithmic convergence rates and preventing premature convergence to local optima 81,82 . In particular, chaotic sequences surpass probability-based random sequences in improving the traceability and stochasticity of the initial population, thereby increasing its diversity 74 . Among these, logistic chaotic mapping is particularly advantageous due to its simplicity, efficiency, and ability to generate highly diverse initial populations 83 . In this paper, the logistic chaotic mapping is used to initialize the moth population with the following mapping formula:

$$L_{c+1} = \mu L_c (1 - L_c), \tag{18}$$

where L_c is the c-th chaotic variable and μ ∈ [0, 4] is the bifurcation parameter indicating the degree of chaos. Figure 1 shows the variation in logistic chaotic mapping values for different values of μ. It is evident that the performance of the logistic chaotic map is significantly influenced by the bifurcation parameter. When μ = 4, the system becomes fully chaotic. The initial population positions based on logistic chaotic mapping are calculated as follows:

$$m_{ij} = lb_j + L_j \cdot (ub_j - lb_j), \tag{19}$$

where L_j denotes the random number generated by the logistic chaotic mapping in the j-th dimension.
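The logistic-map initialization can be sketched as follows. This is a minimal illustration; how the paper seeds the first chaotic value is not specified here, so a random seed per dimension is assumed:

```python
import numpy as np

def logistic_init(n, d, lb, ub, mu=4.0, seed=0):
    """Initialize a moth population from logistic-map sequences.

    Each dimension iterates L_{c+1} = mu * L_c * (1 - L_c), and the chaotic
    values in (0, 1) are scaled into the search bounds [lb, ub]."""
    rng = np.random.default_rng(seed)
    L = rng.random(d)                       # assumed random seeding of L_0
    pop = np.empty((n, d))
    for i in range(n):
        L = mu * L * (1.0 - L)              # logistic chaotic mapping step
        pop[i] = lb + L * (ub - lb)         # scale into the search bounds
    return pop

pop = logistic_init(n=50, d=10, lb=-5.0, ub=5.0)
```

With mu = 4 the map is fully chaotic, so successive rows of the population differ even though they come from a deterministic recurrence.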

Flame update based on mutation mechanisms
In the traditional MFO, the flames are determined by sorting the moths according to their fitness values. In the first iteration, the flames are obtained by sorting the initial population. In later iterations, the moths of the current and previous iterations are sorted together, and the top n moths are chosen as the flames. This interaction pattern allows rapid information transfer between the moth population and the flame population. When a single moth falls into a local optimum, it can quickly escape with the help of the flames, but when most moths fall into local optima, the algorithm may get trapped. To avoid early stagnation of the algorithm, this paper proposes a flame updating mechanism inspired by the mutation operator of genetic algorithms. This mechanism uses perturbation and mutation factors to disturb the original flames, enhancing population diversity during the search process and improving the ability to escape from local optima. Furthermore, since the flames with better fitness in the population often contain valuable information about the optimal solution, only the flames with worse fitness are selected for mutation. The mutation mechanism is explained in detail below.
1. Determine the mutant flames and their number in each iteration. In order to retain the valuable information of the optimal solution, mutation is mainly performed on the n_mu flames which rank lower in the original flame set FI = [F_1, F_2, ..., F_n]. Because the MFO algorithm prioritizes exploration in the early search phase and shifts to exploitation in the later phases, the number of mutant flames should change accordingly. For this reason, a perturbation factor rd is introduced to control the number of mutant flames,
where R ∈ [0, 1] is the maximum perturbation proportion. The number of mutant flames n_mu is then calculated accordingly, where n_mu(t) is the number of mutant flames at the t-th iteration, and rd_t is the value of the perturbation factor at the t-th iteration, i.e., the proportion of the original flames mutated at the t-th iteration. The flames to be mutated, FM, are thus obtained.
2. Mutation operations on the mutant flames FM. The moth population should explore the whole search space as much as possible in the early phase, while focusing on exploiting the region near the optimal solution later on. Therefore, the mutation degree of the flames should be greater in the early phase and gradually decrease over time. To control the mutation degree of the flames, the flame mutation factor fg is introduced, where r′ ∈ [0, 1] is a random parameter; evidently, fg decreases as t increases. The mutation operation is then applied to the n_mu flames in FM, where fm_ij denotes the value of the i-th mutated flame in the j-th dimension and r ∈ [0, 1] is a random parameter, yielding the mutated flame set FM′.
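Since the printed formulas for rd, n_mu, fg, and the mutation operation are not reproduced above, the following sketch encodes one plausible reading. The decay laws rd_t = R(1 − t/T) and fg = r′(1 − t/T), and the additive perturbation form, are assumptions, not the paper's exact equations; they only preserve the stated behaviour (strong mutation early, weak mutation late, worst-ranked flames only):

```python
import numpy as np

def mutate_flames(flames, t, T, lb, ub, R=0.3, seed=0):
    """Mutate the worst-ranked flames to restore diversity (flames are
    assumed sorted best-first). All formulas here are assumptions
    consistent with the text's description, not the published equations."""
    rng = np.random.default_rng(seed)
    n = len(flames)
    rd = R * (1.0 - t / T)                  # assumed perturbation factor, decays with t
    n_mu = round(rd * n)                    # number of flames to mutate
    out = flames.copy()
    for i in range(n - n_mu, n):            # only the worst-ranked flames
        fg = rng.random() * (1.0 - t / T)   # assumed mutation degree r' * (1 - t/T)
        r = rng.random(flames.shape[1])
        out[i] = out[i] + fg * (2 * r - 1) * (ub - lb)   # assumed perturbation form
        out[i] = np.clip(out[i], lb, ub)
    return out, n_mu

flames = np.linspace(-4, 4, 50)[:, None] * np.ones((50, 3))
mutated, n_mu = mutate_flames(flames, t=10, T=100, lb=-5.0, ub=5.0)
```

Note that the best-ranked flames are returned unchanged, which matches the stated goal of preserving the information carried by the current optimum.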

Flame number phased reduction mechanism
The update of the moth positions is guided by the flames. According to Eq. ( 6), the number of flames decreases linearly during the search process, causing the search space of the moths to gradually narrow. Population diversity is thus lost rapidly during optimization, and the MFO algorithm may fall into local optima and converge prematurely. To improve the search efficiency of the algorithm, this paper proposes a flame number phased reduction mechanism that divides the whole search process into three phases: (I) In the first phase, the moths focus on exploration, so the number of flames should decrease slowly to enhance the moths' ability to explore the entire search space. (II) The second phase is the transition phase, in which the moths gradually shift their focus from exploration to searching for the optimal region, so the number of flames should decrease steadily to improve search efficiency. (III) In the third phase, the moths concentrate on exploitation. To accelerate the convergence of the algorithm, only the optimal subset of flames is selected to update the moths' positions, so the number of flames should decrease rapidly. For this purpose, phase division factors δ 1 and δ 2 are introduced to divide the search process, where P 1 is the number of iterations in the first phase and P 2 is the number of iterations in the first two phases.
Here δ 1 is the division factor between the first and second phases, while δ 2 is the division factor between the second and third phases. The whole search process is divided into three phases: [1, P 1 ], [P 1 , P 2 ] and [P 2 , T]; the number of flames reduced in each phase is then calculated accordingly. With population size n = 50, maximum number of iterations T = 300, δ 1 = 0.7 and δ 2 = 0.8, the phase-wise decrease in the number of flames is shown in Fig. 2.
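The exact per-phase reduction formulas are not reproduced above; the sketch below implements one schedule consistent with the description (slow decrease in [1, P_1], steady decrease in [P_1, P_2], rapid decrease in [P_2, T]). The intermediate targets of 80% and 40% of the flames at the phase boundaries are illustrative assumptions:

```python
def flame_number(t, T, n, d1=0.7, d2=0.8):
    """Three-phase flame-number schedule (illustrative, not the paper's exact
    formulas): shallow slope, then steady slope, then a rapid drop to 1."""
    P1, P2 = round(d1 * T), round(d2 * T)
    # Assumed boundary targets: 80% of n at the end of phase I,
    # 40% of n at the end of phase II, and 1 flame at the end.
    n1, n2 = round(0.8 * n), round(0.4 * n)
    if t <= P1:
        return round(n - (n - n1) * t / P1)                      # phase I: slow
    if t <= P2:
        return round(n1 - (n1 - n2) * (t - P1) / (P2 - P1))      # phase II: steady
    return max(1, round(n2 - (n2 - 1) * (t - P2) / (T - P2)))    # phase III: rapid

vals = [flame_number(t, 300, 50) for t in range(1, 301)]
```

For n = 50, T = 300, δ_1 = 0.7 and δ_2 = 0.8, this produces a piecewise-linear, non-increasing curve from 50 flames down to 1, with the steepest slope in the final phase, mirroring the shape described for Fig. 2.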

Moth position update based on adaptive weight
In the traditional MFO, the moths search for the optimum in a spiral path around the flames. The update of a moth's position is exclusively influenced by its corresponding flame, and the moths follow a fixed spiral curve as their search path, which makes them prone to falling into local optima.
Hence, to improve the accuracy of the algorithm, this paper introduces the optimal flame to participate in the moth position updates along with the corresponding flames, and employs an adaptive weight to regulate the influence of the two flames on the updates. In the exploration phase, i.e., [1, P 1 ], the moths should scour the entire search space freely to identify potential optimal solution regions; thus their position updates should not be influenced by the optimal flame, and Eq. ( 7) is used for the position update calculation.
In the transition and exploitation phases, i.e., [P 1 , T], the moths should gradually approach the potential region of the optimal solution and search around it. To accelerate convergence and prevent the moths from getting trapped in local optima during the approach, the optimal flame is introduced to participate in the moth position updates. The position update mechanism in this phase involves F t i , the i-th flame generated at the t-th iteration, F t best , the optimal flame obtained so far, and ω i , the adaptive weight of the i-th moth at the t-th iteration. The weight ω i is mainly determined by the fitness of the current moth and the optimal flame, where OM i is the fitness value of the i-th moth and OF best is the fitness value of the optimal flame.
When the current moth is far from the optimal position, ω i tends to 0, and the moth's next position is updated mainly with respect to the optimal flame and less with respect to its corresponding flame, which helps the moth escape from local optima and quickly approach the region of the optimal solution, thereby accelerating the convergence of the algorithm. Conversely, when the current moth is located near the optimal position, ω i tends to 1, meaning that the moth is updated mainly with respect to its corresponding flame, which allows it to conduct a thorough search near the optimal solution region and thereby improves the accuracy of the algorithm. Finally, the pseudo-code for the proposed MISMFO algorithm is given in Algorithm 1.
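A sketch of the blended update for the transition and exploitation phases. The weight formula ω_i = OF_best/OM_i is an assumption consistent with the stated limits (ω → 0 far from the optimum, ω → 1 near it, for a minimization problem with positive fitness values), and the way the two spiral moves are blended is likewise an illustrative reading, not the paper's exact equation:

```python
import numpy as np

def adaptive_update(moth, flame_i, flame_best, om_i, of_best, t, T, b=1.0, seed=0):
    """Blend spiral moves around the paired flame and the best flame.

    Assumed weight: w = of_best / om_i for minimization with positive
    fitness, so w ~ 1 when the moth's fitness is close to the best flame's
    (exploit its own flame) and w ~ 0 when it is much worse (follow the
    best flame instead)."""
    rng = np.random.default_rng(seed)
    r = -1.0 - t / T                         # same r schedule as basic MFO
    l = (r - 1) * rng.random(moth.size) + 1  # l drawn from [r, 1]
    spiral = lambda F: np.abs(F - moth) * np.exp(b * l) * np.cos(2 * np.pi * l) + F
    w = of_best / om_i if om_i > 0 else 1.0
    return w * spiral(flame_i) + (1.0 - w) * spiral(flame_best)

new_pos = adaptive_update(np.array([2.0, 2.0]), np.array([1.0, 1.0]),
                          np.array([0.0, 0.0]), om_i=8.0, of_best=0.5, t=250, T=300)
```

With om_i much larger than of_best the update is dominated by the best flame, which is exactly the escape behaviour described in the text.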

Kernel function selection and the process of model construction
Since the performance of MKSVR is greatly affected by its hyperparameters and kernel parameters, the proposed MISMFO algorithm is used to optimize the kernel function weights, hyperparameters, and kernel parameters of MKSVR. Common kernel functions include the linear kernel, Gaussian kernel, polynomial kernel, and sigmoid kernel. Among them, the Gaussian kernel is a typical local kernel function with strong local generalization ability, while the polynomial kernel is a typical global kernel function that can better capture nonlinear relationships in data and is efficient to compute. In this paper, a linear combination of Gaussian and polynomial kernels is employed as the multi-kernel function. The Gaussian kernel is defined as

$$K_G(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right),$$

where σ² is the variance of the Gaussian kernel, while the polynomial kernel function is defined as

$$K_P(x_i, x_j) = (\gamma \langle x_i, x_j \rangle + c)^d,$$

where γ > 0 is the slope, c is the intercept, and d is the power parameter. Hence, the kernel function used in this paper is

$$K(x_i, x_j) = w_1 K_G(x_i, x_j) + w_2 K_P(x_i, x_j),$$

where w 1 is the weight of the Gaussian kernel and w 2 is the weight of the polynomial kernel. The flow of model establishment is shown in Fig. 3.
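The multi-kernel construction can be sketched directly. The parameter values below are illustrative; in the proposed model, w_1, w_2, σ², γ, c and d are among the quantities tuned by MISMFO:

```python
import numpy as np

def gaussian_kernel(xi, xj, sigma2=1.0):
    """Gaussian (RBF) kernel: exp(-||xi - xj||^2 / (2 * sigma^2))."""
    diff = np.asarray(xi) - np.asarray(xj)
    return float(np.exp(-np.dot(diff, diff) / (2.0 * sigma2)))

def poly_kernel(xi, xj, gamma=1.0, c=1.0, d=2):
    """Polynomial kernel: (gamma * <xi, xj> + c)^d."""
    return float((gamma * np.dot(xi, xj) + c) ** d)

def multi_kernel(xi, xj, w1=0.6, w2=0.4):
    """Linear combination of the Gaussian and polynomial kernels."""
    return w1 * gaussian_kernel(xi, xj) + w2 * poly_kernel(xi, xj)

k = multi_kernel(np.array([1.0, 0.0]), np.array([1.0, 0.0]))
```

At identical inputs the Gaussian term is exactly 1, so the combined value here is 0.6 * 1 + 0.4 * (1 + 1)^2 = 2.2, showing how the global polynomial term and the local Gaussian term each contribute to the final kernel value.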

The complexity of the MISMFO algorithm
Big-O notation is used for the time and space complexity of the MISMFO algorithm. The time complexity of the proposed MISMFO depends on the initialization of the moth positions (T IMP ), the mutation operation (T MO ), the distance calculation (T DC ), the fitness calculation of the moth positions (T FCMP ), and the flame generation (T FG ). Let the maximum number of iterations, the number of variables, the number of moths, and the time complexity of the fitness function be denoted by T, D, N, and L, respectively. Time complexity is used here to compare the MISMFO and MFO algorithms. The computational complexity of sorting the N flames and N moths lies between O(T · 2N · log(2N)) in the best case and O(T · (2N)²) in the worst case. Taking the worst case for sorting, the overall time complexity of the proposed MISMFO (T MISMFO ) is dominated by O(T · (2N)² + T · N · D + T · N · L). Therefore, the complexity of MISMFO is roughly the same as that of the regular MFO method in 65 . The space complexity (S MISMFO ) of the proposed MISMFO algorithm is the maximum amount of space used at any time, which occurs during the sorting process in each iteration; hence S MISMFO = O(2 · N · D), and both MFO and the proposed MISMFO have the same time and space complexity.

Experiment
In this section, the proposed MISMFO algorithm and other comparative algorithms are evaluated on two test suites. The first test suite consists of 15 classical benchmark functions selected from 84 , and the second is the standard IEEE CEC 2020 test set. Three groups of comparative tests are carried out to evaluate the performance of the MISMFO algorithm. Firstly, the convergence and scalability of MISMFO, along with the basic MFO 65 and its variants (IMFO 79 , CMFO 70 , WCMFO 69 ), are analyzed on the classical test functions. At the same time, nine other meta-heuristic algorithms, WSO 85 , SCA 86 , DA 87 , GOA 88 , GA 89 , DE 44 , SHADE 90 , LSHADE 91 , and COLSHADE 92 , are compared. Secondly, MISMFO is tested on the CEC 2020 test set and compared with 13 other algorithms. Thirdly, an ablation experiment is conducted on the proposed MISMFO to determine the contribution of each component to its overall performance, followed by a diversity analysis to determine how effectively MISMFO preserves population diversity during the optimization process. Furthermore, a sensitivity analysis is undertaken to examine the robustness of MISMFO with respect to its parameter values, specifically μ, δ 1 and δ 2 . The experimental results indicate that the MISMFO model performs best when μ = 4, δ 1 = 0.7 and δ 2 = 0.8. Therefore, in this paper, the values of μ, δ 1 and δ 2 were set to 4, 0.7 and 0.8, respectively. The basic parameters of the algorithms are shown in Table 1. Each algorithm is run independently 30 times for each function, with a population size of 30 and a maximum of 300 iterations. All simulation experiments were conducted on Windows 11 Home Edition with an AMD Ryzen 7-7840H (Radeon 780M Graphics, 3.80 GHz) and 16.00 GB RAM, using MATLAB R2023a.

Benchmark functions
In this subsection, detailed descriptions are provided for both the fifteen classical benchmark functions and the standard CEC 2020 test set, which are used to evaluate the algorithms. The 15 classical benchmark problems are divided into three groups: six high-dimensional unimodal functions (F1, F2, F4, F5, F6, F7) to evaluate the local search capability of the algorithms, five high-dimensional multimodal functions (F9, F10, F11, F12, F13) to evaluate the global search capability of the algorithms, and four fixed-dimension multimodal functions (F19, F20, F21, F23) to evaluate the convergence performance of the algorithms. The standard CEC 2020 test set can be divided into four groups: (1) unimodal problems (f1); (2) multi-modal problems (f2-f4); (3) hybrid problems (f5-f7); (4) composition problems (f8-f10). The basic information of the fifteen classical functions and their 3D graphs are shown in Table 2 and Fig. 4, while a summary of the IEEE CEC 2020 test functions is shown in Table 3.

Scalability test
In this subsection, the convergence and scalability of the MISMFO algorithm are analyzed on the benchmark functions (F1, F2, F4, F5, F6, F7, F9, F10, F11, F12, F13). MISMFO, the basic MFO, and its variants IMFO, CMFO, and WCMFO are tested across three different dimensions (30, 100, and 500), while all other experimental conditions remain constant. The scalability test evaluates how the algorithm performs as the problem dimension grows. The detailed experimental data of the scalability test are shown in Tables 4, 5 and 6, where Avg denotes the mean of 30 independent runs and Std denotes the corresponding standard deviation. As the dimension of the search space expands, both the unimodal and the multimodal benchmark functions present increasing challenges. WCMFO is better than MISMFO at dimension 30 for F7, dimension 100 for F1, and dimension 500 for F2 and F11. However, MISMFO outperforms MFO, IMFO, WCMFO, and CMFO on the other problems across the dimensional ranges. In addition, Figs. 5, 6 and 7 show partial convergence curves of MISMFO, MFO, IMFO, WCMFO and CMFO in different dimensions. These figures clearly show that the convergence speed of the MISMFO algorithm is significantly faster than that of the other four algorithms compared in the scalability test.
MISMFO is capable of finding competitive solutions in both low-dimensional and high-dimensional problems. This may be attributed to the introduction of the logistic chaotic mapping and the flame number phased reduction mechanism, which significantly increase population diversity. Furthermore, the flame mutation technique and the adaptive position update mechanism enable MISMFO to escape local optima and are more likely to yield a better solution. Taken together, it can be concluded that MISMFO outperforms the basic MFO across different dimensional ranges.

Comparisons on classical benchmark functions
In this subsection, the performance of MISMFO is compared with the basic MFO, its variants IMFO, WCMFO, and CMFO, and nine other meta-heuristic algorithms on the fifteen benchmark functions. Among these, GA and DE are established classical algorithms, while WSO, SCA, DA, GOA, DSHADE, LSHADE, and COLSHADE are novel algorithms proposed in recent years. Additionally, to accurately evaluate the performance differences between these algorithms, both the Friedman test 93 and the Bonferroni-Dunn test 94 are conducted on the experimental data from the fifteen benchmark tests. Table 7 shows the results of MISMFO and the thirteen other optimization algorithms, each independently executed 30 times on the 15 benchmark functions. The dimensionality is set to 50, with the other experimental conditions remaining constant. "Avg" denotes the mean of the test results, while "Std" denotes the standard deviation. The experimental data show that MISMFO achieved the highest accuracy on 11 out of 15 test functions. For the high-dimensional unimodal functions, MISMFO exhibited superior performance on F4, F5, and F7. Although it did not secure the highest accuracy on F1, F2, and F6, its performance was notably competitive relative to the other algorithms, which indicates that MISMFO has strong local search capabilities. For the high-dimensional multimodal functions F9-F13, MISMFO consistently achieved optimal precision, and its Std values demonstrated robust competitiveness, indicating its ability to effectively jump out of local optima and its superior global search abilities. For the fixed-dimension multimodal functions, MISMFO successfully located the global optima on F19, F21, and F23, and exhibited the smallest Std values. Although it did not find the optimal solution on F20, its performance still surpassed most other algorithms, particularly MFO and its variants. This demonstrates that MISMFO possesses considerable stability and convergence capabilities. According to the convergence curves shown in Fig. 8, MISMFO converges rapidly in the initial stages and continues to search for better values in the later phases.
Figure 9 presents the Friedman test results of MISMFO compared with the other optimization algorithms on the fifteen sets of experimental data. The Friedman test is conducted at significance level α = 0.05, where the null hypothesis states that there is no significant difference between this method and the comparison methods; if the null hypothesis is rejected, the Bonferroni-Dunn test is used as a post-hoc test to analyze the differences in ranking between the algorithms. The Critical Difference (CD) serves as the criterion for judging whether there is a significant difference between two methods, and is calculated as

CD = q_α · √( n(n + 1) / (6T) ),

where n is the number of methods involved in the comparison and T is the number of data sets. The significance level α = 0.05 with n = 14 corresponds to q_α = 4.1047, which gives CD = 6.27 (n = 14, T = 15). The Friedman test assesses whether there is a significant difference in the performance of all algorithms on a set of test data. The average rank of MISMFO is 2.0, while that of MFO is 10.27, indicating a significant improvement of MISMFO over the basic MFO. Among the fourteen algorithms tested, MISMFO ranks first, which indicates that it has better search optimization capability than the compared algorithms.
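The critical-difference computation above can be reproduced with a few lines of Python; a minimal sketch assuming the standard Bonferroni-Dunn formula (the function name is illustrative):

```python
import math

def critical_difference(q_alpha, n, T):
    """Bonferroni-Dunn critical difference:
    CD = q_alpha * sqrt(n * (n + 1) / (6 * T)),
    where n is the number of compared methods and T the number of data sets."""
    return q_alpha * math.sqrt(n * (n + 1) / (6.0 * T))

cd = critical_difference(4.1047, 14, 15)  # CD ≈ 6.27 for 14 methods, 15 data sets
```

Any pair of methods whose average ranks differ by more than this CD value is judged significantly different.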

Comparisons on CEC 2020 test set
To further evaluate the performance of MISMFO, it is analyzed on the CEC 2020 test set in this section. MISMFO is compared with the other 13 algorithms (IMFO, WCMFO, CMFO, MFO, DE, SHADE, LSHADE, COLSHADE, WSO, SCA, DA, GOA, GA). Table 8 shows the average and standard deviation of each algorithm over 30 independent runs. Figure 10 presents the Friedman test results for each algorithm on CEC 2020. According to Table 8, MISMFO achieves the best solutions for f1, f2, f4, f8, f9, and f10 compared with the other 13 algorithms. Although MISMFO does not always find the optimal solution for the other problems, its average and Std values are highly competitive. Furthermore, the ranking results over the 10 test problems clearly show that MISMFO consistently ranks first, outperforming all other algorithms, followed by SHADE, LSHADE, DE, IMFO, COLSHADE, WSO, CMFO, MFO, WCMFO, GA, DA, SCA, and GOA. In summary, this convincingly demonstrates that MISMFO obtains competitive solutions on the CEC 2020 problems.

Ablation study
In this subsection, an ablation study is conducted to evaluate the individual contributions of each component of the proposed MISMFO algorithm. The MISMFO algorithm integrates four key components: Logistic chaos mapping for population initialization (MISMFO I), a mutation-based flame update mechanism (MISMFO II), a flame number phased reduction mechanism (MISMFO III), and an adaptive position update mechanism (MISMFO IV). The evaluation was carried out on six classical benchmark functions (F4, F7, F9, F12, F20, F21). For the high-dimensional functions (F4, F7, F9, F12), the configuration was set to 30 dimensions with a maximum of 500 iterations, while the fixed-dimension functions (F20, F21) were run for 200 iterations.
Figure 11 shows the convergence curves of the MISMFO algorithm, its individual components, and the basic MFO on the selected benchmark functions. The results reveal that the mutation-based flame update mechanism (MISMFO II) plays a pivotal role in enhancing the overall performance of the MISMFO algorithm. While the contributions of the other components are beneficial, their impacts are comparatively modest. Overall, each component individually improves performance relative to the basic MFO, demonstrating the effectiveness of the integrated approach in MISMFO.
The ablation study conclusively underscores the essential role of each component within MISMFO, affirming that their coordinated functionality is crucial for boosting the efficacy and robustness of the algorithm across a variety of testing scenarios.

Diversity analysis
In this subsection, the diversity and balance of MISMFO are analyzed. Optimization algorithms employ a collective of agents to enhance the exploration of search spaces, thereby accelerating the identification of optimal solutions. Typically, agents that discover superior solutions tend to influence the overall direction of the search, promoting convergence. However, this can reduce population diversity, which in turn diminishes the breadth of the search areas explored. Conversely, intensification becomes more pronounced as the distances among agents decrease. To assess these dynamics of expanding and contracting distances among search agents, the diversity measure described in 59 is utilized:

div_j = (1/N) Σ_{i=1..N} |median(x^j) − x_i^j|,
div = (1/d) Σ_{j=1..d} div_j,

where N and d represent the number of search agents and design variables respectively, x_i^j is dimension j of the i-th search agent, and median(x^j) is the median of dimension j over the whole population. div_j is the diversity in each dimension, defined as the average distance between the j-th dimension of every search agent and the median of that dimension; the diversity of the whole population (div) is then calculated by averaging all the div_j. By employing this diversity metric, the proportions of exploration and exploitation in each iteration can be quantified through the following equations:

Exploration% = (div / div_max) × 100,
Exploitation% = (|div − div_max| / div_max) × 100,

where div_max is the maximum diversity value over the whole optimization process and |div − div_max| is the absolute difference between div and div_max. Exploration% links the diversity in each iteration to the maximum diversity obtained. Exploitation% reflects the exploitation level and is the complementary percentage to Exploration%, since the difference between the maximal diversity and the current diversity of an iteration is caused by the concentration of search agents. Figure 12 shows the diversity monitoring of MISMFO and MFO on three classical benchmark functions (F7, F21, F23). It can be seen that the diversity of MISMFO is consistently maintained at a higher level than that of MFO throughout the search process, which enhances the capability of MISMFO to escape local optima. According to Fig. 13, the proposed MISMFO algorithm was evaluated on three classical benchmark functions (F4, F10, F12), facilitating an analysis of the trade-off between exploration and exploitation. The X-axis represents the number of iterations, while the Y-axis depicts the percentage of exploration and exploitation activity. It is observable that the MISMFO algorithm initially exhibits substantial exploration, which progressively shifts towards exploitation as the iterations advance. Before convergence, the algorithm maintains a balance between exploration and exploitation, with exploitation becoming increasingly dominant. This demonstrates that MISMFO effectively balances exploration and exploitation, thereby swiftly locating optimal solutions.
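The median-based diversity measure and the exploration/exploitation percentages can be sketched directly in Python (function names are illustrative; the computation follows the definition in 59):

```python
import numpy as np

def diversity(pop):
    """Median-based population diversity: div_j is the mean distance of the
    agents from the population median in dimension j; div averages over dims."""
    med = np.median(pop, axis=0)                 # median(x^j) for each dimension j
    div_j = np.mean(np.abs(pop - med), axis=0)   # mean |median(x^j) - x_i^j| over agents
    return float(np.mean(div_j))                 # average over the d dimensions

def exploration_exploitation(div, div_max):
    """Exploration% and Exploitation% for one iteration."""
    return 100.0 * div / div_max, 100.0 * abs(div - div_max) / div_max

pop = np.array([[0.0, 0.0], [2.0, 2.0], [4.0, 4.0]])  # toy 3-agent population
d = diversity(pop)                                     # 4/3 for this population
xpl, xpt = exploration_exploitation(d, 2.0)
```

Logging `xpl` and `xpt` at every iteration reproduces the kind of curves shown in Fig. 13: the two percentages always sum to 100, so a drop in one is a rise in the other.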

Sensitivity analysis
In this subsection, a sensitivity analysis is performed to evaluate the impact of critical parameters on the performance of the MISMFO algorithm on three classical benchmark functions (F4, F7, F10). The parameters analyzed are the bifurcation parameter µ (µ ∈ [0, 4]) and the phase division factors δ1 and δ2 (0 ≤ δ1 ≤ δ2 ≤ 1). The objective of this analysis is to ascertain the robustness of MISMFO to variations in these parameters and to identify settings that optimize performance. The analysis is structured into two parts: first, assessing the effect of varying µ while keeping δ1 and δ2 constant; second, evaluating the impact of various combinations of δ1 and δ2 with a fixed µ. According to Fig. 1, the Logistic chaotic map stabilizes at zero and exhibits non-chaotic behavior when µ is within [0, 1], so µ was varied in [1, 4] to enable a more meaningful analysis. Figures 14 and 15 show the results of the sensitivity test. Figure 14 reveals that the accuracy of MISMFO fluctuates significantly with varying values of µ, demonstrating pronounced sensitivity to this parameter. The algorithm achieves optimal performance at µ = 4, indicating that this is the most effective setting within the evaluated range. Figure 15 illustrates the effects of varying combinations of δ1 and δ2, demonstrating that these parameters also significantly influence accuracy. Extensive experimentation revealed that the algorithm performs best when δ1 is set to 0.7 and δ2 to 0.8. This underscores the importance of careful parameter tuning to obtain the best performance from the MISMFO algorithm.
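The µ-dependence that motivates restricting the sweep to [1, 4] can be verified directly by iterating the map; a small sketch (the function name, seed, and iteration counts are illustrative choices):

```python
def logistic_trajectory(mu, x0=0.3, n=200, burn=100):
    """Iterate x <- mu * x * (1 - x) and return the post-transient values.
    For mu <= 1 the orbit collapses to 0, while mu = 4 is fully chaotic on (0, 1)."""
    x, traj = x0, []
    for k in range(n):
        x = mu * x * (1.0 - x)
        if k >= burn:
            traj.append(x)
    return traj

settled = logistic_trajectory(0.8)  # decays toward 0: no chaos, no diversity
chaotic = logistic_trajectory(4.0)  # wanders over nearly the whole unit interval
```

The collapsed orbit at µ = 0.8 would yield a degenerate initial population, which is why only µ close to 4 provides the diversity benefit reported in Fig. 14.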

Case study
In this section, the MISMFO-MKSVR model is applied to estimate the effort of software projects to validate its ability to solve practical problems.
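As background, the multi-kernel idea in MKSVR combines several base kernels into one Gram matrix. The sketch below is a simplified stand-in, not the paper's model: it uses a closed-form kernel ridge regressor instead of an actual SVR, and the kernel weights and parameters are illustrative values chosen only to show how an RBF and a polynomial kernel can be mixed.

```python
import numpy as np

def rbf_kernel(Xa, Xb, gamma=0.5):
    """Gaussian (RBF) kernel matrix between two sample sets."""
    d2 = ((Xa[:, None, :] - Xb[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

def poly_kernel(Xa, Xb, degree=2, c=1.0):
    """Polynomial kernel matrix."""
    return (Xa @ Xb.T + c) ** degree

def multi_kernel(Xa, Xb, w=0.6):
    """Convex combination of the two base kernels; w would be a tuned weight
    (in the paper, such parameters are what MISMFO optimizes)."""
    return w * rbf_kernel(Xa, Xb) + (1.0 - w) * poly_kernel(Xa, Xb)

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
y = 2.0 * X[:, 0] + 1.0                     # synthetic effort-like target
K = multi_kernel(X, X)
alpha = np.linalg.solve(K + 1e-3 * np.eye(len(X)), y)  # ridge-regularized fit
pred = K @ alpha
```

A real MKSVR replaces the closed-form solve with an ε-insensitive SVR trained on the precomputed multi-kernel matrix, with the kernel weights and SVR hyperparameters supplied by the optimizer.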

Description of the data sets
Five publicly available datasets from the PROMISE repository (http://promise.site.uottawa.ca/SERepository) are selected for testing the proposed MISMFO-MKSVR model. These five datasets are COCOMO81, Maxwell, Desharnais, Miyazaki and China. The attributes of these datasets can be broadly classified into two categories, numerical and categorical. Table 9 summarises the main features of the selected datasets, including name, number of cases, number of numerical attributes, number of categorical attributes, and the unit of effort.

Evaluation criteria
In order to evaluate and compare the accuracy of the proposed model, this paper chooses MMRE, MdMRE 3, MAE 24, RMSE and R² 23 as the evaluation indexes for this experiment. The formula for each index is as follows:

MRE_i = |y_i − ŷ_i| / y_i,
MMRE = (1/n) Σ_{i=1..n} MRE_i,
MdMRE = median(MRE_1, …, MRE_n),
MAE = (1/n) Σ_{i=1..n} |y_i − ŷ_i|,
RMSE = √( (1/n) Σ_{i=1..n} (y_i − ŷ_i)² ),
R² = 1 − Σ_{i=1..n} (y_i − ŷ_i)² / Σ_{i=1..n} (y_i − ȳ)²,

where y_i is the i-th true value, ŷ_i is the i-th predicted value, n is the test sample size and ȳ is the mean of the n true values. As can be seen in Tables 10, 11, 12, 13 and 14, the proposed MISMFO-MKSVR model performs best on Desharnais on all five metrics. On the COCOMO81 dataset, the MISMFO-MKSVR model exhibits superior performance on four metrics, and significantly outperforms the other algorithms on RMSE; while it does not attain the best result on R², it still outperforms the majority of competing algorithms. On the Miyazaki dataset, although MISMFO-MKSVR is not optimal on MdMRE, it significantly surpasses most competing algorithms, particularly on RMSE. On the Maxwell dataset, the results show that MISMFO-MKSVR achieves the best performance on four of the evaluated metrics; while it slightly under-performs SHADE-MKSVR on MAE, it still outstrips the majority of the competing algorithms, particularly the basic MFO and its variants.
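The five indexes above are standard and easy to compute in one pass; a minimal Python sketch (the helper name is illustrative):

```python
import numpy as np

def effort_metrics(y_true, y_pred):
    """Compute the five effort-estimation indexes used in the experiments."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mre = np.abs(err) / y_true                   # magnitude of relative error
    return {
        "MMRE": float(np.mean(mre)),             # mean MRE
        "MdMRE": float(np.median(mre)),          # median MRE
        "MAE": float(np.mean(np.abs(err))),
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "R2": float(1.0 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)),
    }

m = effort_metrics([10.0, 20.0, 30.0], [12.0, 18.0, 30.0])
```

Note that MMRE and MdMRE are scale-free (lower is better), MAE and RMSE carry the unit of effort, and R² approaches 1 for a good fit.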

Comparisons with other optimization algorithms
On the China dataset, the analysis reveals that MISMFO-MKSVR achieves the best performance on RMSE. Although it does not attain the optimal results on the other metrics, its outcomes are very close to the best values and significantly better than those of most competing algorithms. Moreover, it exhibits a significant performance enhancement compared to the basic MFO and its variants.
It is experimentally verified that MISMFO can effectively solve the parameter optimization problem of MKSVR. Meanwhile, when the MISMFO-MKSVR model is used to estimate the software effort on the five public datasets, its estimation accuracy and fitting ability are markedly better than those of MKSVR prediction models optimized by the basic MFO and the other algorithms. This means that the MISMFO-MKSVR model is strongly competitive in estimation performance compared with the other 13 models, and can solve software effort estimation problems.

Comparisons with other software effort estimation methods
In this subsection, the performance of the proposed MISMFO-MKSVR model is compared with five established software effort prediction models (SVR-Poly, SVR-Linear, SVR-RBF, Linear Regression, ANN), as delineated in 29, on the five public datasets. The experimental parameters remain unchanged, and the results are presented in Table 15. The results clearly show the superior performance of the MISMFO-MKSVR model. On the Maxwell, Desharnais, and Miyazaki datasets, MISMFO-MKSVR achieves the best performance on all five indicators. On the COCOMO81 dataset, although MISMFO-MKSVR does not attain the leading performance on R², it remains highly competitive relative to the other methods. On the China dataset, the MISMFO-MKSVR model leads on RMSE, MAE and R²; while it slightly under-performs SVR-Poly on MMRE and MdMRE, it is very close to the best values and remains competitive with the other methods. This experiment demonstrates that MKSVR generally outperforms single-kernel SVR in the majority of scenarios. The results support the adoption of MKSVR as a more effective tool for capturing the complexities inherent in software projects, thereby enhancing the accuracy of predictive analytics. The under-performance of ANN in this experiment may be attributed to its parameter settings, such as the number of hidden layers, underscoring the importance of parameter optimization. Overall, the MISMFO-MKSVR model exhibits outstanding suitability and effectiveness in addressing software effort prediction challenges.

Conclusions and future directions
In this study, a novel variant of MFO, MISMFO, is proposed and applied to the parameter optimization of MKSVR. MISMFO initializes the moth population using the Logistic chaotic mapping to improve the diversity of the initial population. Subsequently, it employs a flame mutation mechanism to perturb the flames with lower-ranked fitness values, thereby increasing population diversity throughout the search process and enabling the algorithm to escape from local optima. Concurrently, MISMFO introduces a flame number phased reduction mechanism that strategically reduces the number of flames across iteration stages, ensuring that moths initially prioritize exploration and subsequently shift to exploitation in the later phases, effectively enhancing search efficiency. Finally, an adaptive weight mechanism is proposed to update the moths' positions, allowing them to autonomously adjust their search strategies based on fitness values, thus balancing exploration and exploitation, accelerating convergence and enhancing accuracy. On the fifteen benchmark functions, the convergence and scalability of MISMFO are first analyzed in three dimensions. Moreover, MISMFO is tested and compared with 13 other optimization algorithms on both the fifteen classical benchmark functions and the CEC 2020 test set, and the results are evaluated using the Friedman test and the Bonferroni-Dunn test. The results show that MISMFO achieves superior accuracy and convergence compared to existing methods. Additionally, the proposed MISMFO-MKSVR model is applied to estimate the software effort on five publicly available datasets, showing enhanced performance over competing models in addressing the software effort estimation problem. However, owing to its structural complexity and sensitivity to hyperparameters such as δ1 and δ2, the algorithm may pose challenges in understanding and application for practitioners.
There is still much worthwhile work to be done in the future. While the introduction of Logistic chaos mapping, flame number phased reduction, flame mutation, and the adaptive weight mechanism has significantly improved performance on the software effort estimation problem, it has also increased algorithmic complexity. In subsequent studies, we will investigate alternative methods to manage algorithmic complexity efficiently while further enhancing performance. Additionally, we plan to utilize an ensemble learning framework to estimate software effort and to optimize its parameters with this optimization algorithm.

Figure 1. The bifurcation diagram of the logistic model with different values of µ.

3. Comparison and ranking based on the fitness value. Compare and rank the fitness values of the original flames FI and the mutant flames FM′, and select the top n flames to participate in the update calculation of the moth positions in this round of iteration.

Figure 2. Diagram of the flame number in different stages.

Figure 5. Convergence curves on six functions in dimension 30.

Figure 6. Convergence curves on six functions in dimension 100.

Figure 7. Convergence curves on six functions in dimension 500.

Figure 8. Convergence curves on six functions in dimension 50.

Figure 9. Average rank of different optimization algorithms under the Bonferroni-Dunn test on 15 benchmark functions.

Figure 10. Average rank of different optimization algorithms under the Bonferroni-Dunn test on CEC 2020.

Figure 11. Convergence curves of the ablation study for the MISMFO algorithm.

Figure 12. The population diversity monitor of the MISMFO on 3 benchmark functions.

Figure 14. Sensitivity analysis of parameter µ for the MISMFO.

Figure 3. Flow diagram of MISMFO optimisation for MKSVR. The diagram covers the following steps: initialize the model parameters and set the fitness function; initialize M using the Logistic mapping; calculate the fitness values; update fp_no by Eq. (29); obtain FI by ordering the fitness values; update n_mu by Eq. (24); obtain FM by Eq. (27); obtain F by sorting FI and FM; set the best flame as Fbest; update W by Eq. (31).
Scientific Reports (2024) 14:16892. https://doi.org/10.1038/s41598-024-67197-1

Table 1. Parameter settings for different algorithms.

Table 2. Basic information about the fifteen classical benchmark functions.

Table 3. Summary of the IEEE CEC2020 test functions.

Table 4. The scalability test results of five algorithms with 30d. Optimal values are in bold.

Table 6. The scalability test results of five algorithms with 500d. Optimal values are in bold.

Table 7. Comparison of MISMFO with other algorithms on 15 benchmark functions. Optimal values are in bold.

Table 8. Comparison of MISMFO with other algorithms on CEC 2020. Optimal values are in bold.

Table 9. Description of the data sets.
In this subsection, the proposed MISMFO-MKSVR model and the MKSVR models optimized by thirteen other optimization algorithms (IMFO, WCMFO, CMFO, MFO, DE, SHADE, LSHADE, COLSHADE, WSO, SCA, DA, GOA, GA) are used to estimate the software effort on the five public datasets, and the estimation results are compared by MMRE, MdMRE, R², RMSE, and MAE. The datasets are divided into training and testing sets, with 70% of the samples used for training. The experimental parameters of each algorithm are kept unchanged, and the specific experimental results are shown in Tables 10, 11, 12, 13 and 14.
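The 70/30 split described above can be sketched as follows; the function name and fixed seed are illustrative assumptions, not the authors' exact procedure:

```python
import numpy as np

def split_70_30(X, y, seed=42):
    """Shuffle the sample indices and place 70% of them in the training set."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    cut = int(0.7 * len(X))                       # 70% training boundary
    return X[idx[:cut]], X[idx[cut:]], y[idx[:cut]], y[idx[cut:]]

X = np.arange(20).reshape(10, 2)
y = np.arange(10)
X_tr, X_te, y_tr, y_te = split_70_30(X, y)
```

Fixing the seed keeps the split reproducible across the 14 compared optimizers, so every model is trained and tested on the same projects.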

Table 10. MMRE values on different software data sets based on different models. Optimal values are in bold.

Table 11. MdMRE values on different software data sets based on different models. Optimal values are in bold.

Table 12. R² values on different software data sets based on different models. Optimal values are in bold.

Table 13. RMSE values on different software data sets based on different models. Optimal values are in bold.

Table 14. MAE values on different software data sets based on different models. Optimal values are in bold.