A hybrid optimisation method incorporating an adaptive response strategy for feedforward neural networks

The particle swarm optimisation (PSO) algorithm possesses a strong exploitation capability owing to its fast search speed. However, it suffers from premature convergence and is therefore unable to preserve swarm diversity. We propose an improved particle swarm optimiser based on a constriction factor and the Gravitational Search Algorithm to overcome premature convergence. The constriction factor ensures an appropriately controlled transition from exploration to exploitation, enhancing diversity and enabling suitable learning-rate adjustment throughout the search process. We introduce the Gravitational Search Algorithm to strengthen the exploratory ability of PSO, and incorporate an adaptive response strategy that reactivates stagnated particles to curtail the tendency to become trapped in local optima. To verify the efficacy of these improvement strategies, we employ the proposed algorithm to train a single-layer feedforward neural network on real-world binary and multi-class classification datasets, on which it outperforms the comparison algorithms.


Introduction
As one of the most widely implemented Artificial Neural Network (ANN) architectures, the feedforward neural network (FNN) possesses an excellent ability to learn (Nayak et al., 2018). However, training FNNs effectively remains a challenge (Saremi et al., 2014). The traditional back-propagation (BP) algorithm (Jiang and Hu et al., 2019) used to train FNNs has inherent limitations, such as slow convergence, long training times, difficulty in adjusting the weights and thresholds, a high tendency to fall into local minima, and high sensitivity to the choice of learning rate η (Ram & Rao, 2018).
Several studies have proposed metaheuristic variants with fast training speed, good generalisation performance, and a higher likelihood of locating the global optimum to overcome the challenges of training FNNs (Amponsah et al., 2021; Han et al., 2019; Lalwani, 2021; Šešum-Čavić, 2020). Particle swarm optimisation, one of the metaheuristics most widely used to train FNNs, cannot preserve diversity throughout the optimisation process. Hence, this study proposes a hybrid PSO based on the Gravitational Search Algorithm (GSA) and an Adaptive Response Strategy (ARS). The proposed algorithm uses the GSA to enhance exploration, a constriction factor to switch from exploration to exploitation, and the ARS to prevent premature convergence and preserve swarm diversity throughout the optimisation. The resulting hybrid algorithm is called HPSOGSA-ARS. We summarise our contributions as follows:
• An Adaptive Response Strategy (ARS) is incorporated to move particles stagnated in suboptimal solutions into promising areas.
• A constriction factor comprising a convex and a concave function ensures an appropriately controlled transition from exploration to exploitation. By providing a smooth transition, the constriction factor enhances diversity, enables suitable learning-rate adjustment, and increases overall performance.
This article is organised as follows. Section 2 discusses previous works related to metaheuristics in optimising FNNs and preliminary information about the constitutive algorithms. We then present the details of the proposed FNN training method in Section 3. The experimental results on implemented UCI datasets, related mechanisms analysis, and running time are given in Section 4. Finally, we offer the concluding remarks in Section 5.

Related works
Regarding the training of FNNs, two types of approaches exist: iterative and non-iterative (Suganthan, 2018; Zhang et al., 2021). In the non-iterative training approach, the hidden nodes are randomly selected and maintained throughout the training process, whereas the output weights are computed analytically (Zhang et al., 2020). Examples of non-iterative training approaches are as follows. In (Mukherjee et al., 2021), the authors proposed an improved version of the Sine Cosine Algorithm (SCA), called chaotic oppositional SCA (COSCA), to address premature convergence. They improved the algorithm by integrating chaos theory and oppositional-based learning into the SCA optimisation process. Although the algorithm could obtain the lowest error by finding the best control parameters to train an FNN, it required considerable time and computational resources to complete execution. In (Cui et al., 2018), the authors identified the inability of existing methods to recognise malicious code at an acceptable accuracy and speed. They therefore addressed this challenge by proposing a novel approach that uses a convolutional neural network (CNN) and a bat algorithm to automatically extract the features of malware images and address the data imbalance among different malware families. Although their model achieved good accuracy and speed, it requires all input images to have a fixed size, which limits scalability. Similarly, the authors of (Yi et al., 2016) improved the Single Layer Feedforward Neural Network Extreme Learning Machine (SLFNELM) by proposing a self-adaptive mechanism that overcomes the sensitivity of SLFNELM to the number of neurons in its hidden layer. The resulting algorithm, named SaELM, can select the best number of hidden neurons to construct an optimal neural network.
A series of experiments confirmed SaELM to be a fast-training method capable of obtaining the global optimal solution with a good generalisation performance independent of parameter adjustments.
On the other hand, the iterative technique fine-tunes the weights and biases of the feedforward neural network, together with its structure, until an ideal result is achieved (Sugiyama et al., 2021). Some examples of the iterative training approach are as follows. Derivative-based learning strategies generally have high computational costs due to the large number of weight values that require tuning (Eyoh et al., 2020). In (Sağ & Jalil, 2021), the authors aimed to overcome this difficulty of derivative-based learning strategies by implementing the Vortex Search (VS) algorithm to determine the optimal weights and biases of the FNN. Although the authors achieved their research objective, the algorithm has limited generalisation performance. Similarly, (Yang & Ma, 2019) proposed a dictionary learning approach based on the singular value decomposition (SVD). The SVD creates a compact FNN structure by selecting significant hidden neurons according to their contribution to the outputs. Although this algorithm could simultaneously train the FNN and optimise its network structure, it introduced more hidden neurons, increasing computational time and resource usage. In (Wang et al., 2013), the authors proposed a mother-function selection algorithm that improved the accuracy and usefulness of wavelets for target threat assessment in aerial combat. Although the algorithm could construct an enhanced neural network with a better mean squared error (MSE) value by selecting the most appropriate wavelet function, the number of function alternatives from which the selection algorithm could choose was limited to seven.
Although the iterative approach produced superior results to the non-iterative method in most cases, it is typically computationally intensive and time-consuming (Karamichailidou et al., 2021). Therefore, researchers mostly employ the non-iterative approach by incorporating metaheuristics such as the fireworks algorithm (Konda et al., 2021; Sreeja, 2019), Particle Swarm Optimisation (PSO) (Nagra et al., 2019; Neshat et al., 2020; Zemmal et al., 2021), the Genetic Algorithm (GA) (Christo et al., 2020; Falahiazar & Shah-Hosseini, 2018), the firefly algorithm (Liu et al., 2021), and the Gravitational Search Algorithm (GSA) (Bohat & Arya, 2018; Huang et al., 2019), combined with improvement strategies to enhance training performance. Similar works include the following. Amponsah et al. (Amponsah et al., 2021) proposed an improved multi-leader comprehensive learning particle swarm optimiser based on the Karush-Kuhn-Tucker proximity measure and the Gravitational Search Algorithm. This improvement was implemented to overcome the limitation of the multi-leader comprehensive learning particle swarm optimiser in preserving diversity. The proposed algorithm outperformed other state-of-the-art FNN training algorithms used to train a feedforward neural network for epilepsy detection. Similarly, Lei et al. proposed an aggregative learning gravitational search algorithm (ALGSA) to improve GSA's exploitation ability during the later stages of iteration, when the value of the gravitational constant G is sufficiently large. ALGSA used the Kbest individuals to construct different gravitational fields that attracted other search agents at the later stages of iteration (Lei et al., 2020). Even though the authors achieved their research objective, this came at the cost of high computational expense and sensitivity to the learning rate η.
Furthermore, (Rather et al., 2021) proposed a novel hybrid of the chaotic gravitational search algorithm (CGSA) and particle swarm optimisation (PSO) to train a multi-layer perceptron (MLP) neural network for classification purposes. Although this hybridisation method produced highly accurate classification outcomes, the inherent challenge of local-minima entrapment persisted during the later stages of the search. The motivation of this work is to develop a hybrid variant of PSOGSA that alleviates the limitations mentioned above. We seek to achieve this aim by preserving diversity, curbing local-optima stagnation, and ensuring a proper balance in learning-rate adjustment during the entire optimisation process.

Particle swarm optimisation (PSO)
Particle swarm optimisation (PSO) is a population-based stochastic optimisation algorithm developed by Eberhart and Kennedy (Russell & James, 1995). The PSO mechanism begins with a randomly initialised flock of birds in a search landscape, each bird denoting a particle. Each particle moves at a velocity influenced by its momentum, its prior best position (P_id), and the best position found by all particles (P_gd). Supposing the search space has dimension D and the number of particles is n, PSO can be expressed mathematically as:

v_id(t + 1) = v_id(t) + c_1 · rand() · (P_id(t) − x_id(t)) + c_2 · rand() · (P_gd(t) − x_id(t))  (1)
x_id(t + 1) = x_id(t) + v_id(t + 1)  (2)

where v_id(t) and x_id(t) denote the velocity and position of the ith particle in the tth iteration, respectively; P_id and P_gd denote the best position of the ith particle and the best position of all particles, respectively; c_1 and c_2 denote the acceleration constants; and rand() is a random value within the interval [0, 1].
To improve the convergence ability of the original PSO, Shi and Eberhart (Xiaohui & Eberhart, 2002; Yuhui & Eberhart, 1998) proposed an adaptive PSO that incorporates an inertia weight into the velocity equation of PSO:

v_id(t + 1) = w · v_id(t) + c_1 · rand() · (P_id(t) − x_id(t)) + c_2 · rand() · (P_gd(t) − x_id(t))  (3)

where w denotes the inertia weight, 0 ≤ w ≤ 1. Shi and Eberhart further adapted the inertia weight with a linearly decreasing schedule (Xiaohui & Eberhart, 2002):

w = w_ini − (w_ini − w_end) · t / T_max  (4)

where t denotes the current iteration, and w_ini, w_end, and T_max denote the starting inertia weight, the ending inertia weight, and the maximum number of iterations, respectively.
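The inertia-weight PSO update described above can be sketched in Python. This is a minimal illustration, not the paper's implementation; the function and parameter names are assumptions, and the random coefficients can be stubbed for reproducibility:

```python
import random

def inertia_weight(t, t_max, w_ini=0.9, w_end=0.4):
    # Linearly decreasing inertia weight: w_ini at t = 0, w_end at t = t_max.
    return w_ini - (w_ini - w_end) * t / t_max

def pso_step(x, v, p_best, g_best, w, c1=2.0, c2=2.0, rng=random.random):
    # Velocity update: momentum + cognitive pull toward the particle's own
    # best (P_id) + social pull toward the swarm's best (P_gd).
    new_v = [w * v[d]
             + c1 * rng() * (p_best[d] - x[d])
             + c2 * rng() * (g_best[d] - x[d])
             for d in range(len(x))]
    # Position update: x(t+1) = x(t) + v(t+1).
    new_x = [x[d] + new_v[d] for d in range(len(x))]
    return new_x, new_v
```

Passing a deterministic `rng` (e.g. `lambda: 0.5`) makes a single step easy to verify by hand.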

Gravitational search algorithm (GSA)
The gravitational search algorithm (GSA), proposed by Rashedi et al., is an exploratory optimisation algorithm motivated primarily by the mass-gravity concept. Every mass in GSA has four attributes: position, inertial mass, and passive and active gravitational masses. The position of a mass corresponds to a solution of the problem, and the fitness function determines its gravitational and inertial masses (Rashedi et al., 2010). Given an optimisation problem with m decision variables and an objective function fobj, each variable has an upper limit (ub) and a lower limit (lb), so the m-dimensional search landscape is bounded in each dimension d as:

lb_d ≤ x^d ≤ ub_d,  d = 1, 2, …, m  (5)

Considering a system of N masses, the position of the ith mass is defined by Equation (6):

X_i = (x_i^1, …, x_i^d, …, x_i^m),  i = 1, 2, …, N  (6)

where x_i^d is the position of the ith mass in the dth dimension. Equation (7) relates the active, inertial, and passive masses to the objective function; the greater the objective value, the larger the mass:

M_ai = M_pi = M_ii = M_i,  i = 1, 2, …, N  (7)
where M_ai, M_ii, and M_pi denote the active, inertial, and passive masses, respectively, and fobj_i(t) is the objective value of agent i at time t. The aggregate force from the set of denser bodies acting on mass i at time t is defined by Equation (8):

F_i^d(t) = Σ_{j ∈ Kbest, j ≠ i} rand_j · G(t) · (M_pi(t) · M_aj(t)) / (R_ij(t) + ε) · (x_j^d(t) − x_i^d(t))  (8)

We compute the acceleration of mass i at time t in the dth dimension by Equation (9):

a_i^d(t) = F_i^d(t) / M_ii(t)  (9)

We use the fitness function in Equation (10) to calculate the masses of the agents:

m_i(t) = (fobj_i(t) − worst(t)) / (best(t) − worst(t)),   M_i(t) = m_i(t) / Σ_{j=1}^{N} m_j(t)  (10)

Equations (11) and (12) compute the new velocity and position of a mass, respectively:

v_i^d(t + 1) = rand_i · v_i^d(t) + a_i^d(t)  (11)
x_i^d(t + 1) = x_i^d(t) + v_i^d(t + 1)  (12)
M_aj denotes the active gravitational mass of agent j, and M_pi the passive gravitational mass of agent i; G(t) denotes the gravitational constant at time t; rand_i and rand_j are random values within the interval [0, 1]; ε is a small positive constant; and R_ij(t) denotes the Euclidean distance between masses i and j at time t.
The gravitational constant G is initialised at the onset and declines over time as a control mechanism for GSA's search accuracy. Its value as a function of time is given in Equation (13):

G(t) = G_0 · e^(−α·t/T)  (13)

where G_0 is the initial gravitational constant, α is a decay coefficient, t is the current iteration, and T is the maximum number of iterations.
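The core GSA quantities described in this subsection (fitness-based masses, the resulting acceleration, and the decaying gravitational constant) can be sketched as follows. This is an illustrative reading of the standard GSA of Rashedi et al.; helper names are assumptions, and the per-pair random coefficient defaults to a deterministic stub so a step can be checked by hand:

```python
import math

def gsa_masses(fitness, minimise=True):
    # Normalised masses from fitness values: better agents get larger masses.
    best, worst = (min(fitness), max(fitness)) if minimise else (max(fitness), min(fitness))
    if best == worst:
        return [1.0 / len(fitness)] * len(fitness)
    m = [(f - worst) / (best - worst) for f in fitness]
    s = sum(m)
    return [mi / s for mi in m]

def gsa_acceleration(i, positions, masses, g, eps=1e-9, rng=lambda: 1.0):
    # Acceleration of agent i: total gravitational force divided by its
    # inertial mass. Since M_pi = M_ii, the agent's own mass cancels and
    # each term is proportional to the attracting agent's mass M_aj.
    dim = len(positions[i])
    acc = [0.0] * dim
    for j, xj in enumerate(positions):
        if j == i:
            continue
        r = math.dist(positions[i], xj)  # Euclidean distance R_ij
        for d in range(dim):
            acc[d] += rng() * g * masses[j] * (xj[d] - positions[i][d]) / (r + eps)
    return acc

def gravity(t, t_max, g0=1.0, alpha=20.0):
    # Exponentially decaying gravitational constant G(t) = G0 * exp(-alpha*t/T).
    return g0 * math.exp(-alpha * t / t_max)
```

For simplicity the force sum runs over all other agents rather than only the Kbest set.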
GSA's complete operations are enumerated in (Rashedi et al., 2010). The two critical features meta-heuristic algorithms use to attain an optimal solution are exploration and exploitation. An algorithm searches locally to converge on the best solution during exploitation, whereas it surveys the entire search landscape during exploration. In PSO, the cognitive component [P_id(t) − x_id(t)] of the velocity equation, Equation (1), is responsible for exploration, whereas the social component [P_gd(t) − x_id(t)] ensures exploitation. In GSA, by contrast, exploration is achieved and enhanced by an appropriate selection of the random parameter values (G_0 and α), whereas exploitation is obtained via the slow movement of heavier agents (Rashedi et al., 2010).
Although the original GSA has a strong exploration ability, the slow movement of denser masses results in a slow search during the final iterations, as the denser masses limit the convergence of GSA to an optimal solution (Singh et al., 2018). In contrast, standard PSO performs a faster search with a better exploitation capacity but often cannot adequately explore the search landscape for the global optimum. Merging GSA and PSO on the basis that they complement each other is therefore expected to yield an improved optimisation algorithm, but this outcome is not always achieved, because the resulting algorithm often becomes trapped in a local optimum during the later stages of iteration.

Improved particle swarm optimization
In this section, we improve PSO in three steps:
• implementing an inertia weight and a constriction factor in a synchronously varying manner;
• replacing the cognitive component of PSO with the acceleration component of GSA; and
• implementing an Adaptive Response Strategy.

Synchronous implementation of an inertia weight and a constriction factor
When a constriction factor is used in PSO's velocity equation, an inertia weight is not usually used in conjunction with it. Research by Lin et al. shows that a PSO algorithm with only a constriction factor (k) and a PSO algorithm with only an inertia weight (w) are equivalent when k equals w (Lin & Yu, 2011). In general, however, the constriction factor and the inertia weight behave entirely differently. This section addresses the situation where the inertia weight and a functional constriction factor are used synchronously.
In the initial search stages of PSO, a particle needs to search within a large range, then switch to exploiting a limited search space during the later stages to obtain the global optimum. Therefore, k must be large during the early stages and small during the later stages; at the same time, k must decrease steadily to its minimum over an extended period during the final phase of the search (Wei & Xinning, 2010). This shift pattern corresponds to the selection of convex and concave functions based on the value of the constriction factor during different stages of the search. During the early stages, the algorithm relies on the value of the constriction factor to select a convex function, which lets particles search for the optimal solution over an extensive range and prevents premature convergence. During the later stages, the algorithm selects the concave function to adjust the constriction factor slowly toward its minimum, ensuring an intensified local search and smooth convergence toward the global optimum. Following this principle, the functional constriction factor based on the cosine function is presented in Equation (14), where T is the number of iterations and G_max = 40, according to the alternating curve of k shown in Figure 1. The curve of k in Figure 1 depicts the synchronous change of the functional constriction factor from a convex function at the onset of execution into a concave function during the later stages; the y-axis gives the values of the constriction factor k, and the x-axis gives the number of iterations T.
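Since Equation (14) itself is not reproduced above, the sketch below assumes one plausible cosine schedule with the stated properties: k starts high, decays slowly at first, fastest mid-run, and settles slowly toward its minimum near the end. The parameter names `k_max` and `k_min` are illustrative, not from the paper:

```python
import math

def constriction_factor(t, t_max, k_max=1.0, k_min=0.1):
    # Assumed half-cosine decay: k(0) = k_max, k(t_max) = k_min.
    # The derivative is small near both ends, so k lingers near k_max
    # early (wide exploration) and near k_min late (intensified local search).
    return k_min + 0.5 * (k_max - k_min) * (1.0 + math.cos(math.pi * t / t_max))
```

With `t_max = 40` (the G_max value mentioned in the text), the curve of `constriction_factor(t, 40)` reproduces the convex-then-concave shape described for Figure 1.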
Incorporating the constriction factor into the standard PSO velocity update, Equation (1), yields Equation (15). If the inertia weight is kept fixed, particles maintain the same exploratory ability throughout the search; Equation (3) is the velocity update of the standard PSO with a fixed inertia weight. The varying behaviour of the inertia weight is inspired by the foraging process of flocking birds. According to the results of Eberhart et al., flocking birds slow down to find the precise location of a food source as they approach it (Russell & James, 1995). Accordingly, if a particle occupies a good position in the iterative process, the inertia weight w should be decreased to maintain rigorous exploitation; when the fitness value of a particle is poor, w should be high so that more of the initial velocity is retained for better global optimisation. Taking into account the research findings of Huang et al. (Huang et al., 2008), Equation (16) specifies the changes in w, where T is the iteration counter, T ∈ [0, G_max]; P_gd is the global optimum position; and w_start and w_end are the initial and final inertia weight values, computed as shown. To facilitate the computation, we assume the values of k and w to be constants and limit the calculation to one dimension. Substituting these constants into Equation (15) results in Equation (17). We further calculate the value of k by substituting c_1 · rand() and c_2 · rand() with c_1 and c_2, respectively, to obtain Equation (18), and similarly obtain v_id(t + 2). The matrix representation of Equation (20) is denoted A; its homogeneous form and the characteristic equation of its coefficient matrix follow. From Equation (22), we obtain three characteristic roots; if (1 + kw − kc_1 − kc_2)^2 ≥ 4kw, then α and β are real roots.

Replacement of PSO's cognitive component with GSA's acceleration component
This section further enhances PSO's global exploratory capacity by replacing its cognitive component with GSA's acceleration component, which is responsible for global exploration.
Starting from the velocity update equation of PSO in Equation (28), the velocity update equation of the hybrid PSOGSA becomes:

v_i^d(t + 1) = w · v_i^d(t) + c_1 · rand() · a_i^d(t) + c_2 · rand() · (P_gd − x_i^d(t))

where v_i^d(t) indicates the velocity of agent i at iteration t in the dth dimension; c_1 = c_2 = 2 are acceleration constants; w is the inertia weight; rand() is a random number within [0, 1]; a_i^d(t) is the acceleration of agent i at iteration t in the dth dimension; and P_gd denotes the global best position obtained by all particles.
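A minimal sketch of this hybrid velocity update, with illustrative names and a deterministic random stub available for checking a single step; the GSA acceleration term stands in for PSO's cognitive component:

```python
import random

def hpsogsa_velocity(v, x, acc, g_best, w, c1=2.0, c2=2.0, rng=random.random):
    # Hybrid update: GSA's acceleration a_i replaces PSO's cognitive term,
    # while the social term still pulls each agent toward the global best P_gd.
    return [w * v[d] + c1 * rng() * acc[d] + c2 * rng() * (g_best[d] - x[d])
            for d in range(len(v))]
```

In a full optimiser, `acc` would come from a GSA acceleration routine evaluated at the current swarm state, and the position update x(t + 1) = x(t) + v(t + 1) would follow.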

The adaptive response strategy
In this section, we work on diminishing the tendency of a particle to be trapped in a suboptimal location while searching for the global optimum solution. We seek to achieve this aim by developing and implementing an adaptive response strategy (ARS). ARS enables particles entrapped in sub-optimal regions to escape entrapment by adjusting their orientation to resume their search for the global optimum solution.
As the algorithm searches for the global optimum, diversity is easily lost, and the swarm can thus become trapped in a sub-optimal solution. To help trapped particles escape, we observe the Gbest value at each iteration to determine whether it has changed. If the global best fitness value remains unchanged for more than three (3) successive iterations, the particles are deemed to have stagnated. We then begin a positioning cycle in which the Euclidean distances from the stagnated particle to its nearest particles are measured; the average position of these neighbouring particles is computed and designated as the new position of the stagnated particle. We express ARS mathematically in the position update Equation (30), in which x_i is the particle's new position after the Adaptive Response Strategy has taken effect; the random number rand_i is a value within [−1, 1]; the set N_i holds the indices of the nearest neighbours of particle x_i; and (1/|N_i|) Σ_{n∈N_i} x_n is the mean position of those neighbours. Equation (30) is the new position update equation for stagnated particles. The strategy adapts a natural phenomenon of flocking birds searching for the ultimate food source: the neighbourhood-mean term describes the adaptive response of the birds closest to the stagnated bird relative to the food source, and new solutions emerge from old ones through the collaborative behaviour of the flock. Algorithm 1 describes ARS, and Figure 2 is a flowchart of the entire operation of the proposed algorithm (HPSOGSA-ARS).
As such, we express the improved method mathematically as follows. If the swarm has not stagnated:

x_id(t + 1) = x_id(t) + v_id(t + 1)

Else: apply the ARS update of Equation (30).

Algorithm 1. Adaptive Response to Adjust Particles for a Minimisation Problem.
Let N denote the number of particles, Gbest the optimum fitness value, and Sbest the optimal solution;
for i = 1 : N
    identify the seven nearest neighbours by Euclidean distance and preserve the neighbour indices n in the set N_k;
    compute the new position via Equation (32);
end for
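Algorithm 1 can be sketched as follows. This is an illustrative reading: the rand_i ∈ [−1, 1] perturbation term of Equation (30) is omitted, and only the neighbourhood-mean relocation and the stagnation test (Gbest unchanged for more than three successive iterations) are shown. Function names are assumptions:

```python
import math
from statistics import fmean

def adaptive_response(swarm, stagnant_idx, n_neighbours=7):
    # Replace a stagnated particle's position with the mean position of its
    # n_neighbours nearest neighbours, measured by Euclidean distance.
    target = swarm[stagnant_idx]
    others = [p for i, p in enumerate(swarm) if i != stagnant_idx]
    others.sort(key=lambda p: math.dist(p, target))
    nearest = others[:n_neighbours]
    return [fmean(p[d] for p in nearest) for d in range(len(target))]

def stagnated(gbest_history, patience=3):
    # True when the global best fitness has stayed identical for more than
    # `patience` successive iterations, triggering the ARS relocation.
    if len(gbest_history) <= patience:
        return False
    recent = gbest_history[-(patience + 1):]
    return all(v == recent[0] for v in recent)
```

In the full optimiser, `adaptive_response` would be called once per stagnated particle before the next velocity update resumes the search.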

Computation complexity
The complexity of HPSOGSA-ARS comprises four aspects: the jointly enhanced exploratory search due to the implementation of GSA's acceleration component, the Adaptive Response Strategy, and the operations of the convex and the concave functions as influenced by the value of the constriction factor. The convergence of HPSOGSA-ARS depends on the progressive update of the velocity and position equations, v_id and x_id, respectively.
Suppose we execute the algorithm with N particles in a swarm of dimension (m × n) for k iterations. The exploratory impact of GSA's acceleration component and the convex function results in a complexity of O(m^2). The impact of ARS, incurred when the trap activation is triggered, raises this to O(m^2 + n). Subsequently, the exploitation effect of the concave function during the later stages of the search raises the complexity to O(m^2 + n + n); thus, a computational complexity of O(m^2 + 2n) is required to complete the search for the global optimum. Supposing the maximum number of iterations until convergence is t, the complexity of the proposed HPSOGSA-ARS becomes O(t(m^2 + 2n)).

Training feed-forward neural network using HPSOGSA-ARS and its comparing variants
This section implements our proposed HPSOGSA-ARS and the comparison algorithms to determine the weights and biases of an FNN whose structure is kept fixed throughout training. The comparison methods are PSOGSA (Mirjalili et al., 2012), HPSOGSA (Jiang et al., 2014), and SHPSOGSA (Radosavljević et al., 2018), together with the standard versions of the PSO (Xiaohui & Eberhart, 2002) and GSA (Rashedi et al., 2009) algorithms. These algorithms compute the combination of weights and biases that gives the FNN a minimal error rate. The essential steps in developing FNNPSO, FNNGSA, FNNPSOGSA, FNNHPSOGSA, FNNSHPSOGSA, and FNNHPSOGSA-ARS are as follows. Firstly, we define a fitness function that uses the error rate of the FNN to determine the fitness values of particles. Secondly, we develop a suitable encoding technique to encode the biases and weights of the FNNs. These steps are explained in the following subsections.

Fitness function
In this article, we express the fitness function mathematically as follows. Consider an FNN with three layers, I, h, and O, representing the input, hidden, and output layers, respectively, where n denotes the number of input nodes, h the number of hidden nodes, and m the number of output nodes. The output of the jth hidden node is computed as:

S_j = 1 / (1 + exp(−(Σ_{i=1}^{n} w_ij · x_i − θ_j))),  j = 1, 2, …, h  (34)

where w_ij represents the connection weight from the ith node of the input layer to the jth node of the hidden layer, θ_j is the bias of the jth hidden node, and x_i is the ith input.
After measuring the outputs of the hidden nodes, the cumulative output of the kth output node is expressed as:

o_k = Σ_{j=1}^{h} w_kj · S_j − θ_k,  k = 1, 2, …, m  (35)

In Equation (35), w_kj denotes the connection weight from the jth hidden node to the kth output node and θ_k indicates the bias of the kth output node. The learning error E (the fitness function) is computed according to:

E_k = Σ_{i=1}^{m} (y_i^k − d_i^k)^2  (36)
E = (1/q) Σ_{k=1}^{q} E_k  (37)

where q denotes the number of training samples, d_i^k denotes the desired output of the ith output unit when the kth training sample is used, and y_i^k is the actual output of the ith output unit for the kth training sample. Thus, we present the fitness function of the ith training sample in Equation (38).
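Assuming sigmoid activations (the activation function is not reproduced in this chunk), the fitness evaluation, a forward pass followed by the mean squared error over the training samples, can be sketched as:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fnn_forward(x, w_ih, b_h, w_ho, b_o):
    # Single hidden layer: hidden activations, then output activations.
    hidden = [sigmoid(sum(w_ih[j][i] * x[i] for i in range(len(x))) + b_h[j])
              for j in range(len(b_h))]
    return [sigmoid(sum(w_ho[k][j] * hidden[j] for j in range(len(hidden))) + b_o[k])
            for k in range(len(b_o))]

def mse_fitness(samples, targets, w_ih, b_h, w_ho, b_o):
    # Mean squared error over q training samples: the quantity each
    # metaheuristic minimises when training the FNN.
    q = len(samples)
    err = 0.0
    for x, d in zip(samples, targets):
        y = fnn_forward(x, w_ih, b_h, w_ho, b_o)
        err += sum((d[k] - y[k]) ** 2 for k in range(len(d)))
    return err / q
```

Each particle's decoded weights and biases would be evaluated through `mse_fitness` to obtain its fitness value.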

Encoding strategy
According to Zhang et al., when optimising FNNs with evolutionary algorithms, the matrix, vector, and binary encoding strategies are commonly used to encode the biases and weights. In matrix encoding, each particle is encoded as a matrix; in vector encoding, each particle is encoded as a vector; and in binary encoding, each particle is encoded as a string of binary bits. Each encoding method has merits and demerits that make it ideal for solving particular problems (Zhang et al., 2007). This article implements the matrix encoding strategy to train the FNN models. In the generalised matrix representation used in this article, "a" denotes a weight and "b" a bias.
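One simple way to illustrate the encoding step is to pack the FNN's weight matrices and bias vectors into a single particle and rebuild them for fitness evaluation; the helper names below are hypothetical, and for brevity the particle is shown as a flat sequence of the matrix entries rather than as a 2-D matrix:

```python
def encode(w_ih, b_h, w_ho, b_o):
    # Pack weights "a" and biases "b" into one particle, row by row.
    particle = []
    for row in w_ih:
        particle.extend(row)
    particle.extend(b_h)
    for row in w_ho:
        particle.extend(row)
    particle.extend(b_o)
    return particle

def decode(particle, n_in, n_hidden, n_out):
    # Rebuild the weight matrices and bias vectors from a particle.
    it = iter(particle)
    w_ih = [[next(it) for _ in range(n_in)] for _ in range(n_hidden)]
    b_h = [next(it) for _ in range(n_hidden)]
    w_ho = [[next(it) for _ in range(n_hidden)] for _ in range(n_out)]
    b_o = [next(it) for _ in range(n_out)]
    return w_ih, b_h, w_ho, b_o
```

The round trip `decode(encode(...))` must reproduce the original matrices exactly, so the optimiser can update particles freely and decode them only when evaluating fitness.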

Experiments
This section implements the PSO, GSA, PSOGSA, HPSOGSA, SHPSOGSA, and HPSOGSA-ARS optimisation algorithms to train an FNN with 15 hidden nodes. We set the number of input nodes equal to the number of dataset attributes and the number of output nodes equal to the number of target classes. The six FNN training models classify the datasets, and their outputs are compared on classification accuracy, avoidance of local-optima entrapment, and convergence rate.

Datasets
We verify the optimisation capabilities of the proposed FNN training model by applying the FNN classifiers to the Wisconsin Breast Cancer (WBCD), Iris, Car Assessment, Glass Identification, Primary Tumour, and Large Soybean datasets from the University of California, Irvine (UCI) Machine Learning Repository (Andrew & Asuncion, 2011). These datasets have diverse numbers of attributes, domains, and classes, as detailed in Table 1. We conduct the simulations using MATLAB R2018a on an Intel Core (T.M.) i7-4800MQ CPU @ 4.67 GHz with 8 GB RAM, running Windows 10 Professional 64-bit. The parameter settings for GSA and PSO follow the literature (Rashedi et al., 2009). Additionally, the maximum number of iterations is set to 1000, and the inertia weights of PSOGSA, HPSOGSA (Jiang and Ji et al., 2019), and SHPSOGSA (Radosavljević et al., 2018) are reduced linearly from 0.9 to 0.4. The number of particles is 30, c_1 = 1, and c_2 = 2. The stopping criterion is the maximum number of iterations. The proposed HPSOGSA-ARS uses a trap-activation value of 3, with 7 closest neighbouring particles around the current best position. SHPSOGSA uses an additional parameter r_3 ∈ [0, 1] (a randomly distributed number), and HPSOGSA uses the additional parameters c_3 = 0.5 and c_4 = 0.5. The gravitational constant G_0 in GSA is set to 1, with the maximum number of iterations as the stopping criterion.

Evaluation criteria
In this section, we obtain statistically significant outcomes by dividing each dataset into two segments, training and testing, and running the experiment a specified number of times on both segments. According to Eftimov et al., statistical tests require a thorough assessment of heuristic performance (Eftimov et al., 2017). Furthermore, Yang et al. emphasised that comparing algorithms based only on average performance and standard deviation is insufficient (Yang, 2010). As such, statistical testing is essential to show the substantial improvement of the new algorithm in addressing the limitations of existing algorithms. The following measures are used to evaluate the test results:
• Average performance (C.A.) is the average of all classification accuracy values obtained after executing an algorithm N times independently. It indicates the capability of an algorithm to train an FNN to classify accurately: the higher the accuracy values, the better the classification capability of the training algorithm. As shown in Equation (39), the mean output is computed as:

CA_mean = (1/N) Σ_{i=1}^{N} CA_i  (39)

where CA_i is the accuracy value obtained on run i.
• Mean fitness is the average fitness value obtained by evaluating the fitness function over N runs. The smaller the mean fitness (i.e. the average mean squared error (MSE) over N runs), the greater the capacity of the algorithm to find a solution near the global optimum. The mean fitness is computed as the average of the best fitness values over the N runs.
• Standard deviation is an indicator of algorithmic robustness and stability. A higher standard deviation implies wandering outcomes, while a lower value means that the algorithm converges to the same value most of the time. The standard deviation is calculated over the values g*_i, where g*_i indicates the best fitness value obtained on the ith run.
• The Wilcoxon rank-sum test is a nonparametric method for determining the difference in median values between two independent populations. It aids in examining the relationship between a numeric outcome and a categorical explanatory variable when the comparison groups are independent of each other. A larger p-value denotes a more negligible difference between the comparison groups, while a smaller p-value denotes a more significant difference (Malela-Majika, 2021; Wilcoxon et al., 1970). In our context, we use the Wilcoxon rank-sum test at the 5% significance level to assess whether the results of FNN-HPSOGSA-ARS differ statistically from those of FNNPSO, FNNGSA, FNNPSOGSA, and FNNHPSOGSA. The best p-values are highlighted in bold.
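The evaluation measures above can be sketched as follows. The rank-sum p-value uses the standard normal approximation and, for simplicity, ignores ties, so it is an illustration of the test's mechanics rather than a full Wilcoxon implementation; function names are assumptions:

```python
import math

def mean_and_std(values):
    # Average performance over N runs and the (population) standard deviation.
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return mean, std

def rank_sum_p(a, b):
    # Two-sided Wilcoxon rank-sum p-value via the normal approximation.
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    # Rank sum W of sample a (ranks start at 1; ties ignored for simplicity).
    w = sum(rank for rank, (v, lbl) in enumerate(pooled, start=1) if lbl == 0)
    n1, n2 = len(a), len(b)
    mu = n1 * (n1 + n2 + 1) / 2.0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (w - mu) / sigma
    # Two-sided p-value from the standard normal CDF.
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))
```

A p-value below 0.05 would indicate a statistically significant difference between two algorithms' accuracy distributions at the 5% level.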

Results and discussion
Drawing on recent advances toward improving PSO to train FNNs, this article investigates whether PSO can be further improved as an effective training scheme for FNNs. In line with the research objective, the experimental results indicate a significant improvement in the performance of PSO with regard to its convergence ability. As shown in Tables 2-9, we observe that HPSOGSA-ARS demonstrates improved convergence toward the global optimum solution within 400-600 iterations. These outcomes indicate that the proposed algorithm is highly stable, robust, and performs best at training an FNN for classification on binary and multi-class datasets.
There are three possible explanations for these results. Firstly, introducing the synchronous use of the inertia weight and constriction factor ensured that particles in the swarm maintained a balance between exploration and exploitation throughout the search for the global optimum solution. Particles were restricted to search within the search boundary, ensuring that the algorithm converged in the same locality. Closely following in robustness and stability are PSOGSA and SHPSOGSA, which incorporate an effective balance between exploration and exploitation into PSO via GSA's acceleration component. On binary datasets, we observe this improved outcome in Figure 3, where HPSOGSA-ARS, PSOGSA, and SHPSOGSA converge at the 555th, 609th, and 804th iterations, respectively. The standard PSO shows the earliest convergence, at the 110th iteration, demonstrating its inherent weakness of premature convergence at a sub-optimal solution. In contrast, HPSOGSA demonstrates the worst convergence ability by failing to converge even after 1000 iterations, mainly due to PSO's ineffective hybridisation with GSA. HPSOGSA-ARS maintains the best convergence ability on multi-class datasets, converging at the 515th iteration, whereas SHPSOGSA and PSOGSA show competitive convergence strength at the 454th and 420th iterations, respectively. HPSOGSA demonstrates better convergence performance on multi-class datasets, converging at the 358th and 306th iterations in Figures 4 and 5, respectively. Another possible explanation is that replacing PSO's cognitive component with the acceleration component ensured that particles searched widely enough within the search boundary during the initial stages of the search process. The resulting enhanced diversity within the swarm accounted for the algorithm's ability to find a solution near the global optimum.
This ability is evident in the best MSE values and the classification outcomes obtained by HPSOGSA-ARS and its closely related variants PSOGSA and SHPSOGSA, which use a similar hybridisation technique. In these three closely related hybrid PSOGSA variants, the authors replaced the cognitive component of PSO with the acceleration component of GSA, leading to maximum exploration. Compared with the hybrid strategy in these closely related variants, the implementation of the synchronous constriction factor and inertia weight accounted for the better performance of HPSOGSA-ARS over PSOGSA and SHPSOGSA in terms of classification accuracy and MSE values on both binary and multi-class datasets, as recorded in Tables 2-7. Furthermore, our proposed algorithm overcomes the challenge of local optima entrapment, as evidenced by a better exploratory search outcome on binary and multi-class datasets in Figures 3 and 5, with convergence at the 358th and 306th iterations, respectively. An explanation for this outcome is that incorporating an adaptive response strategy into PSO's position update equation significantly reduced the number of particles stagnated in a sub-optimal solution, allowing further searches for the global optimum solution. The ARS is responsible for the high average classification values obtained by HPSOGSA-ARS across the selected benchmark datasets. Although PSOGSA and SHPSOGSA perform well by striving to maintain balanced exploration and exploitation at different stages of the search, neither algorithm has a mechanism that detects stagnated particles and enables them to escape sub-optimal solution entrapment.
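The hybridisation discussed above, in which GSA's acceleration replaces PSO's cognitive (pbest) term, can be sketched as follows. The coefficient values and per-dimension random draws are illustrative assumptions; the paper additionally synchronises a constriction factor with the inertia weight:

```python
import random

def hybrid_velocity(v, x, gbest, acc, w=0.7, c1=1.5, c2=1.5):
    """PSOGSA-style velocity update for one particle.

    The GSA acceleration `acc` stands in for the cognitive (pbest)
    term of standard PSO; the social (gbest) term is unchanged.
    """
    return [w * vi + c1 * random.random() * ai + c2 * random.random() * (gb - xi)
            for vi, xi, gb, ai in zip(v, x, gbest, acc)]
```

The new position is then obtained as usual by adding the velocity to the current position.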
Although it is a widely held notion that hybridising PSO with other high-performing metaheuristics eventually results in an improved hybrid optimisation algorithm, our findings reveal that this is not always true. Whether the resulting hybrid algorithm will perform better than its constituent algorithms largely depends on how the algorithms are merged. As such, we offer a novel perspective: PSO's position update equation can be modified to minimise the likelihood of early convergence and the resulting entrapment of particles in sub-optimal regions. Another widely held belief is that introducing an inertia weight and a constriction factor significantly enhances PSO's search performance. Our method presents a new way of synchronising the inertia weight and constriction factor to ensure good exploration from the onset and good exploitation during the final iteration stages. Our study has two main limitations. Firstly, the computational cost of the proposed algorithm is relatively high, attributable to the numerous computational steps our algorithm performs to achieve the desired outcome. Secondly, the difference between the outcome of the proposed method and the peer algorithms used in this work is not always statistically significant, as revealed by the p-values in Table 8. In general, however, the results indicate a significant improvement over the standard PSO algorithm.
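For reference, the classic Clerc-Kennedy constriction coefficient is computed as below. The paper's own factor combines convex and concave functions synchronised with the inertia weight, so this sketch only illustrates the standard formulation it builds on:

```python
from math import sqrt

def clerc_constriction(c1=2.05, c2=2.05):
    """Classic Clerc-Kennedy constriction coefficient chi.

    Requires phi = c1 + c2 > 4; with the usual c1 = c2 = 2.05,
    chi is approximately 0.7298.
    """
    phi = c1 + c2
    return 2.0 / abs(2.0 - phi - sqrt(phi * phi - 4.0 * phi))
```

Multiplying the whole velocity update by chi damps oscillations and helps the swarm settle into exploitation.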

Results of parametric studies
This section discusses the computational time, the key parameters, and their influence on the performance of the proposed FNN training algorithm. We consider, in turn, computational time, changes in the number of particles, the number of iterations, the trap activation triggering value, and the number of neighbouring particles near a stagnated particle.

Computation time
This section presents a detailed analysis of the computational time our proposed HPSOGSA-ARS takes to complete execution on the six selected benchmark datasets compared with the closely related training algorithms implemented in this article. To support statistical testing, we perform 30 independent experimental runs of each algorithm and record the average CPU time of each FNN training method. The experiment is conducted in the MATLAB R2018a programming environment on an Intel(R) Core (T.M.) i7-4800MQ CPU @ 2.70 GHz (8 CPUs), ∼2.7 GHz, with 8 GB RAM. Table 10 presents the observed computational time for training an FNN and classifying a test sample on the selected datasets, with the best average time emphasised. It is observed in Table 10 that, except for HPSOGSA, the proposed HPSOGSA-ARS algorithm is marginally more time-consuming than the remaining variants on all the datasets. This increase in the computational time of HPSOGSA-ARS is due to its synchronous execution of the constriction factor and inertia weight in addition to the incorporation of the Adaptive Response Strategy. Although undesirable, the time spent executing HPSOGSA-ARS is worthwhile since the performance of PSO is ultimately improved.
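The average CPU times in Table 10 correspond to a measurement loop of the following shape. The experiments themselves were run in MATLAB, so this Python helper is only a generic illustration of the procedure:

```python
import time

def average_cpu_time(train_fn, runs=30):
    """Mean wall-clock time over `runs` independent executions of train_fn."""
    total = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        train_fn()  # one full FNN training run
        total += time.perf_counter() - start
    return total / runs
```

Averaging over 30 runs smooths out operating-system scheduling noise in the individual timings.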

Effect of changes in number of particles and maximum iterations on the performance of HPSOGSA-ARS
This section discusses the effects of altering two parameters, namely the number of particles and the number of iterations, on the performance of HPSOGSA-ARS. We conducted this parametric investigation by varying the number of particles from 20 to 40 in steps of 10 while varying the maximum iterations from 500 to 1500 in steps of 500 on the benchmark datasets. Figures 6 and 8 visualise the parametric study outcome of HPSOGSA-ARS and its comparing variants when the maximum iteration is fixed at 500 and the number of particles is gradually increased from 20 to 40 in steps of 10. Contrastingly, Figures 7 and 9 depict the outcome when the number of particles is fixed at 30 and the number of iterations gradually increases in steps of 500. We observed that when the maximum iteration is set at 500, increasing the number of particles generally increases accuracy on all the benchmark datasets. Moreover, we recorded the highest accuracy when the maximum iteration was fixed at 1500 and the number of particles at 30. HPSOGSA-ARS obtained increased accuracy values according to Figures 6 and 8, as well as Tables 11 and 12. A possible reason is that keeping the maximum number of iterations at 500 while increasing the number of particles from 20 to 40 means more particles search extensively within the search space; consequently, the global optimum solution is located within a relatively shorter time with little effort. Similarly, HPSOGSA-ARS obtained increased accuracy values according to Figures 7 and 9. A possible reason is that keeping the number of particles at 30 while increasing the number of iterations from 500 to 1500 in steps of 500 means fewer particles search extensively within the search space for an extended period.
Additionally, the improved strategies ensure that particles explore widely within the search space while avoiding sub-optimal solution entrapment.
As such, regarding HPSOGSA-ARS as a stochastic search algorithm, an increased number of iterations only increases the likelihood of global optimum solution attainment irrespective of its starting number of particles.
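The parametric study above amounts to a grid sweep over particle counts and iteration budgets. A minimal sketch, assuming a hypothetical `train_eval(n_particles, max_iter)` function that returns classification accuracy:

```python
from itertools import product

def parameter_sweep(train_eval, particles=(20, 30, 40), iterations=(500, 1000, 1500)):
    """Evaluate every (particles, iterations) combination and report the best.

    train_eval(n_particles, max_iter) -> classification accuracy (higher is better)
    """
    results = {(n, t): train_eval(n, t) for n, t in product(particles, iterations)}
    best = max(results, key=results.get)
    return best, results
```

In practice each `train_eval` call would itself average several independent runs, as discussed above, before the settings are compared.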

Effect of the number of nearest neighboring particles on HPSOGSA-ARS' performance
This section performs experiments to determine the optimal number of particles near a stagnated particle upon which the stagnated particle depends to adjust its position. We do this by altering the number of closest particles and exploring the effect on the performance of HPSOGSA-ARS. According to the results of our experiment, we conclude that the optimal number of nearest particles is 7. This conclusion is based on the observation that, when the maximum iteration was set at 1000, the best MSE and the highest classification accuracy were obtained when the number of nearest particles was seven.
It can be observed in Table 13 that the average MSE and classification accuracy values improved as the number of nearest particles increased from 2 to 6, peaked at 7, and then steadily declined from 8 upwards. As such, we cannot conclude that a larger number of nearest particles leads to better performance of the algorithm. However, since the algorithm is stochastic, the reason 7 is the optimal value is intrinsic to the algorithm's dynamics; it cannot be pinpointed precisely from theory but was instead determined accurately via experimentation.
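Selecting the neighbours on which a stagnated particle depends can be sketched as a k-nearest lookup by Euclidean distance. The function below is our illustrative assumption of the mechanism, with k = 7 as found optimal above:

```python
from math import dist

def nearest_neighbours(positions, idx, k=7):
    """Indices of the k particles closest (Euclidean) to particle `idx`.

    The stagnated particle would then adjust its position using
    information from these neighbours.
    """
    others = [(dist(positions[idx], p), j)
              for j, p in enumerate(positions) if j != idx]
    return [j for _, j in sorted(others)[:k]]
```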

Effect of the trap activation triggering value on HPSOGSA-ARS' performance
The Trap Activation value used to trigger the Adaptive Response Strategy (ARS) is an influential parameter that, when altered, significantly affects the FNN optimisation outcome. In this section, we seek to determine the optimal trap activation value by adjusting the parameter and exploring its effect on the performance of the HPSOGSA-ARS algorithm. Our experiment found the optimal trap activation value to be 3: keeping the trap activation value at three (3) while maintaining the maximum iteration and the number of particles at 1000 and 30, respectively, yields the best MSE and the highest classification accuracy values. The results presented in Table 14 indicate that HPSOGSA-ARS is a stable and robust algorithm on binary datasets. This stability and robustness derive from the observation that HPSOGSA-ARS attains the best convergence ability with adequately balanced exploration and exploitation capacity, in addition to the best MSE and standard deviation values. In contrast to the previous observation, Table 14 also shows that although HPSOGSA-ARS exhibited slightly early convergence on multi-class datasets, it attained the best performance in terms of MSE and standard deviation values. As stochastic algorithms are dynamic, it is challenging to explain their intrinsic behaviour precisely from a theoretical perspective; we therefore explain the possible reason for this outcome experimentally. When the trap activation value is small, the algorithm triggers the Adaptive Response Strategy (ARS) early, causing a stagnated particle to jump out of stagnation sooner. This gives such particles ample time to explore and locate the global optimum solution within fewer iterations.
On the other hand, a higher trap activation value implies a comparatively longer wait before the ARS is triggered to enable stagnated particles to jump out of sub-optimal solutions. Thus, a higher trap activation value leads to more time spent searching for the global optimum solution over more iterations.
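The trap activation mechanism described above can be sketched as a per-particle stagnation counter. The exact bookkeeping in HPSOGSA-ARS may differ, so treat this as an illustrative assumption:

```python
def update_stagnation(counter, improved, trap_activation=3):
    """Track how long a particle's personal best has stalled.

    Returns (new_counter, trigger): trigger becomes True once the
    particle has gone `trap_activation` consecutive updates without
    improving, at which point the ARS repositions it and the counter
    resets.
    """
    if improved:
        return 0, False
    counter += 1
    if counter >= trap_activation:
        return 0, True
    return counter, False
```

With the default of 3, a particle is flagged for repositioning after three consecutive non-improving updates, matching the optimal value found above.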

Effectiveness of constriction factor and adaptive response strategies
This section performs experiments to determine the efficacy of the constriction factor and adaptive response strategies by comparing the convergence outcomes of HPSOGSA-ARS and HPSOGSA. The algorithm with the constriction and adaptive response strategies is denoted HPSOGSA-ARS; the variant without them is denoted HPSOGSA. The experiment is performed on the six benchmark datasets described in Table 1, and the parameter configurations are set per Section 4, subsection one. Figure 10 depicts the outcome of the experiment.
The test results on all datasets show that HPSOGSA-ARS has better average diversity than HPSOGSA. Across all the implemented datasets, we observe that the average diversity of particles in HPSOGSA-ARS, from the early stages of iteration to the latter stages, is better than that of HPSOGSA. Specifically, HPSOGSA achieves good average diversity in the early stages, symbolising good exploration ability. In contrast, during the later stages of the search, the HPSOGSA algorithm is unable to transition appropriately from exploration into exploitation, leading to stagnation of particles, as indicated by the straight line; this behaviour is visible, for example, on the WBCD in Figure 10(a) and the Iris dataset in Figure 10. On the other hand, we observe that the HPSOGSA-ARS search process similarly obtains good exploration during the early phase and can transition from exploration into exploitation as iterations increase. Additionally, we observe in the search process of HPSOGSA-ARS that whenever stagnation occurs (indicated by the intermittent straight lines), the adaptive response strategy is activated to dislodge stagnated particles, leading to a continued search until a global optimum is attained. This group of tests validates the contributions of the constriction factor and the adaptive response strategy in preventing premature convergence and ensuring balanced exploratory and exploitative searches.
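The average diversity curves in Figure 10 are typically computed as the mean distance of particles from the swarm centroid; a minimal sketch under that assumption:

```python
from math import sqrt

def swarm_diversity(positions):
    """Average Euclidean distance of each particle from the swarm centroid.

    A flat, near-zero diversity curve over iterations signals stagnation;
    recurring rebounds indicate particles being dislodged and resuming
    the search.
    """
    dim = len(positions[0])
    centroid = [sum(p[d] for p in positions) / len(positions) for d in range(dim)]
    return sum(sqrt(sum((p[d] - centroid[d]) ** 2 for d in range(dim)))
               for p in positions) / len(positions)
```

Recording this value once per iteration produces exactly the kind of diversity-versus-iteration plot discussed above.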

Conclusions
This article reviewed previous implementations of the Particle swarm optimisation (PSO) algorithm for optimising Feedforward Neural Networks (FNN) and improved upon those works by incorporating novel strategies. The improvement strategy hybridises PSO with the Gravitational search algorithm (GSA) and includes an adaptive response strategy and a constriction factor (HPSOGSA-ARS). HPSOGSA-ARS was then used to optimise an FNN to perform classification. When evaluated on six diversified benchmark datasets, FNN-HPSOGSA-ARS demonstrated better performance in avoiding sub-optimal solution entrapment and achieved a better convergence rate. Our findings confirm that hybridising PSO with another heuristic algorithm that is strong at exploration complements PSO's weakness of early convergence. Additionally, our results demonstrate that increased performance is achieved when a mechanism for escaping sub-optimal solution stagnation is incorporated into the PSO algorithm.
Moreover, our study offers a novel perspective on position update equation improvement, modifying PSO's inertia weight and constriction factor so that they work jointly in a manner that leads to PSO's improved performance. It is noteworthy that we can use other metaheuristics in place of PSO or GSA to optimise FNNs. Examples of these metaheuristics are monarch butterfly optimisation (MBO) (Xie & Wang, 2021), earthworm optimisation algorithm (EWA) (Prasad et al., 2021), elephant herding optimisation (EHO) (Kilany et al., 2021), moth search (M.S.) algorithm (Feng & Wang, 2021), Slime mould algorithm (SMA) (Precup et al., 2021), and Harris hawks optimisation (HHO) (Mary et al., 2021) algorithm. However, we can only obtain an optimal performance when we resolve their limitations effectively, as this work does.
Future work can focus on any of the following directions. Firstly, extending HPSOGSA-ARS to handle multi-objective optimisation problems, complex classification problems, and gene selection for cancer classification. Secondly, modifying the position update equation to further enhance PSO's ability to avoid sub-optimal solution entrapment. Thirdly, furthering improvement approaches for PSO that implement the inertia weight and constriction factor simultaneously to obtain balanced exploration and exploitation. Fourthly, hybridising other metaheuristics that possess excellent exploratory search ability with PSO, with or without different improvement strategies, to solve multi-objective real-world optimisation challenges. Finally, applying the resulting algorithm to solve optimisation tasks in other fields such as agriculture, engineering, and business.