Intelligent leukaemia diagnosis with bare-bones PSO based feature optimization

In this research, we propose an intelligent decision support system for acute lymphoblastic leukaemia (ALL) diagnosis using microscopic images. Two Bare-bones Particle Swarm Optimization (BBPSO) algorithms are proposed to identify the most significant discriminative characteristics of healthy and blast cells to enable efficient ALL classification. The first BBPSO variant incorporates accelerated chaotic search mechanisms of food chasing and enemy avoidance to diversify the search and mitigate the premature convergence of the original BBPSO algorithm. The second BBPSO variant embeds both of the abovementioned new search mechanisms in a subswarm-based search. Evaluated with the ALL-IDB2 database, the proposed algorithms achieve superior geometric mean performances of 94.94% and 96.25%, respectively, and significantly outperform other metaheuristic search and related methods for ALL classification.


Introduction
Leukaemia is a type of cancer pertaining to white blood cells (WBCs), in which abnormal and immature WBCs are produced by the bone marrow and enter the bloodstream. There are two types of acute leukaemia, i.e. acute lymphoblastic leukaemia (ALL) and acute myeloid leukaemia (AML). Since ALL diagnosis is closely associated with morphological changes of WBCs, and manual morphological analysis may suffer from several potential limitations (e.g. non-standard precision and heavy reliance on medical professionals' knowledge and skill) [1][2][3], many automatic ALL diagnosis methods have been proposed in recent years [1][2][3][4][5]. In order to achieve robust and efficient computerized diagnosis, identifying the characteristics of healthy and blast cells is a crucial factor. Although many studies on the separation and retrieval of the nucleus and cytoplasm, or purely the nuclei, of the cells using segmentation techniques are available, limited investigations have been conducted on the selection of significant discriminative characteristics from the segmented regions to effectively benefit subsequent ALL diagnosis [2][3][4][5][6][7].
This research aims to deal with the aforementioned challenges by proposing an intelligent decision support system with evolutionary feature optimization for robust ALL classification. Specifically, we propose two Bare-bones Particle Swarm Optimization (BBPSO) algorithms to extract the most significant discriminative characteristics of normal and abnormal lymphocytic cells for ALL classification. The proposed BBPSO variants incorporate accelerated search mechanisms of attraction to the food source and avoidance of enemies to diversify the search and overcome premature convergence of the original BBPSO algorithm. Fig. 1 shows the overall flow of the proposed system. It contains the following key steps: (a) WBC identification from blood smear images, (b) nucleus-cytoplasm separation, (c) feature extraction, (d) BBPSO-based feature optimization, and (e) lymphocyte and lymphoblast identification. After employing marker-controlled watershed segmentation to extract WBCs from microscopic images, a stimulating discriminant measure (SDM)-based clustering algorithm proposed in our previous research [2] is used for nucleus-cytoplasm separation. We initially extract 80 raw features from the segmented nucleus and cytoplasm sub-images. The proposed BBPSO algorithms are then used to identify the most significant discriminative characteristics of healthy and blast cells from the extracted raw features, respectively. 1-Nearest Neighbour (1NN) and Support Vector Machine (SVM) with Gaussian Radial Basis Function (RBF) kernel are used to classify lymphocytes and lymphoblasts using the identified optimal feature subsets.
The contributions of this research are summarized, as follows.
1. We propose two BBPSO algorithms for feature optimization. Besides the original position updating operation of the BBPSO algorithm, the proposed variants incorporate mechanisms of attraction to the food source and avoidance of enemies to increase search diversity and overcome local optima of the original BBPSO algorithm. These two new behaviours are also accelerated by the Logistic chaotic map.
2. The food chasing behaviour is guided by the average personal best experience and the global best solution to enable the search to reach attractive optimal regions more effectively. The mechanism of fleeing from enemies enables the particles to move away from unpromising search regions, in order to accelerate convergence. The first proposed BBPSO variant incorporates these two new search behaviours to guide the search in the main swarm, while the second proposed variant exhibits these strategies in the subswarm-based search. These two search mechanisms and the original BBPSO operation work in a cooperative manner to lead the search to attain global optima.
3. In comparison with other metaheuristic search methods, the proposed BBPSO algorithms possess efficient discriminative capabilities in which significant discriminating features for lymphocytes and lymphoblasts are revealed. Evaluated with 180 microscopic images extracted from the ALL-IDB2 database [3], the proposed algorithms show great efficiency, and outperform other search methods across different experimental settings under different fitness evaluations. They also compare favourably with other related methods for ALL diagnosis reported in the literature.
The organisation of this paper is as follows. Section 2 introduces the related research on automatic ALL diagnosis and feature optimization techniques. Section 3 presents the proposed ALL diagnosis system including its pre-processing steps, the proposed BBPSO-based feature optimization methods and ALL identification using both 1NN and SVM. Evaluation of the proposed algorithms and comparison with other search methods using the ALL-IDB2 database are discussed in Section 4. Finally, we draw conclusions and identify future research directions in Section 5.

Related work
In this section, we discuss related research on automatic ALL diagnosis and state-of-the-art feature optimization techniques.

Automatic leukaemia diagnosis
Defined by the French-American-British classification systems [1,4], there are three subtypes of ALL, i.e. L1 to L3 and eight subtypes of AML, i.e. M0 to M7. Many research studies have been dedicated to the automatic diagnosis of ALL, AML and their subtypes in order to promote early diagnosis. Neoh et al. [2] proposed an automatic ALL diagnosis system using microscopic blood images. Their work proposed a clustering algorithm with the stimulating discriminant measure (i.e. SDM) that took both within and between cluster scatter variances into account for nucleus-cytoplasm separation. The SDM-based clustering algorithm was integrated with the Genetic Algorithm (GA) to perform robust segmentation of nucleus, cytoplasm and background regions. Single and ensemble classifiers were applied in their work for ALL recognition. Bootstrapping and 10-fold cross validation were used for system evaluation. Shadowed C-means (SCM) clustering was used by Mohapatra et al. [5] to perform lymphocyte image segmentation. It clustered each pixel into one of the three regions, i.e. cytoplasm, nucleus and background. An ensemble classifier, consisting of a neural network (NN), SVM, and k-Nearest Neighbour (kNN), was used to recognize lymphocytes and lymphoblasts. The ensemble classifier outperformed other single models including the NN, kNN, Naïve Bayes Classifier (NB), SVM, and Radial Basis Function Network (RBFN). Furthermore, Madhloom et al. [6] integrated colour features with morphological reconstruction to localize and isolate blast cells. They also applied the Fisher Discriminant Ratio (FDR) to rank and select features from each cell for subsequent ALL recognition using kNN. Their work employed 260 cell images with 180 and 80 images for training and test respectively, and achieved 92.5% accuracy for the distinction of healthy and blast cells.
Agaian et al. [4] proposed an AML detection system which employed K-means Clustering and morphological filtering to segment nuclei from leucocytes. Local Binary Patterns (LBP) and Hausdorff Dimension (HD) were used in their work to extract useful features in addition to other extracted shape, Gray Level Cooccurrence Matrix (GLCM) and colour features. The SVM classifier was used to recognize AML and healthy cells. Meera and Matthew [7] introduced Fuzzy Local Information C-means for AML image segmentation with multiple nuclei, and used both GLCM and HD features with the SVM for identification of healthy and blast cells.

Feature extraction techniques
Feature extraction is an important step in contributing to accurate recognition of normal and blast cells. Features that are commonly extracted from microscopic blood cell images include shape, colour, texture and statistical based information. Generally, shape-based features cover geometric information such as area, perimeter, elongation, and eccentricity, while colour-based features include colour space information such as RGB, CIELAB (CIE L*a*b*), or Hue-Saturation-Intensity (HSI). For textural features, GLCM, which provides information such as homogeneity, contrast, and entropy, is usually employed, whereas for statistical-based features, information such as mean and standard deviation is often used. Ongun et al. [8] adopted affine invariants, CIE L*a*b* colour space, colour histogram, and shape-based features from the heuristic reasoning of haematologists to make up a total of 57 features for the classification of 12 types of blood cells (e.g. monocyte, neutrophil, myelocyte, plasma, etc). Putzu et al. [1] focused on the detection of abnormality in lymphocytes. A total of 30 shape, 21 colour and 80 GLCM-based texture descriptors were extracted from the obtained sub-images. Besides the GLCM textural features, some researchers employed different methods to interpret textural information from the cell images. As an example, LBP textural extraction was proposed by Singhal and Singh [9] for the detection of lymphocytes and lymphoblasts, while Rezatofighi and Soltanian-Zadeh [10] used the LBP features for WBC extraction. In addition, HD was adopted by Mohapatra et al. [5] to extract the roughness of the nucleus boundary of lymphocytes and lymphoblasts. The influence of the LBP operator on HD has also been evaluated by Agaian et al. [4]. Their experiments indicated the positive impact of LBP on HD, which boosted the AML classification performance greatly.
A colour feature called cell energy was also employed in their work, which played a very important role in distinguishing between normal and abnormal cells.

Feature selection algorithms
The retrieval of shape, colour, texture and statistical based information from the blood cells often entails a large set of input features for the classification system, which could be computationally costly. While inadequate features reduce classification accuracy, a large feature set that involves redundant and insignificant information can reduce classification performance as well [11][12][13]. Therefore, optimal feature selection is crucial for the improvement of classification results. In this section, we first introduce well-known evolutionary optimization algorithms for feature selection, including Particle Swarm Optimization (PSO), Cuckoo Search (CS) and Dragonfly Algorithm (DA), followed by other advanced modified optimization mechanisms. Techniques dedicated to feature selection and dimension reduction for leukaemia classification are also introduced.

PSO and bare-bones PSO
A number of classical, well-known optimization algorithms have been applied to diverse engineering optimization tasks. Introduced by Kennedy and Eberhart [14], PSO is an efficient technique for feature selection [15,16]. In PSO, each particle has a position in the search space. The particle is characterised by a position vector, x_i = (x_{i1}, x_{i2}, ..., x_{iD}), and a velocity vector, v_i = (v_{i1}, v_{i2}, ..., v_{iD}), where D denotes the dimension of the search space. All particles move in the search space to search for the optimal solutions. In PSO, the best position ever achieved by a particle, i.e. the personal best, pbest, and the best position of the overall swarm, i.e. the global best, gbest, are used to update the velocity and position of each particle.
BBPSO is a variant of PSO [17]. Compared with PSO, it does not consider the velocity, but only updates the particles' positions. The Gaussian distribution is employed for position updating in BBPSO, as in Eq. (1).

x_{id}^{t+1} = N( (pbest_{id}^t + gbest_d^t) / 2, |pbest_{id}^t - gbest_d^t| )    (1)

where N denotes the Gaussian distribution, (pbest_{id}^t + gbest_d^t)/2 represents the mean or expectation of the distribution, and |pbest_{id}^t - gbest_d^t| the standard deviation. Using Eq. (1), the new position of a particle is distributed according to the Gaussian distribution, although other distribution functions can also be applied. Compared with conventional PSO, BBPSO does not require any operating parameters. Therefore, BBPSO is more efficient, and has been extensively applied to real-world single and multi-objective optimization problems [18,19].
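The BBPSO position update of Eq. (1) can be sketched in a few lines of NumPy; `bbpso_update` is an illustrative name for this sketch, which samples each dimension from the Gaussian described above:

```python
import numpy as np

def bbpso_update(pbest, gbest, rng=None):
    """Bare-bones PSO position update (Eq. (1)): each dimension of the new
    position is drawn from a Gaussian whose mean is the midpoint of the
    particle's personal best and the global best, and whose standard
    deviation is their absolute difference."""
    if rng is None:
        rng = np.random.default_rng()
    mean = (pbest + gbest) / 2.0
    std = np.abs(pbest - gbest)
    # When pbest and gbest agree in a dimension, std is 0 and the particle
    # stays exactly at the shared best value in that dimension.
    return rng.normal(mean, std)
```

Note that, unlike conventional PSO, no velocity, inertia weight, or acceleration coefficients appear anywhere in the update, which is why BBPSO is described as parameter-free.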

Cuckoo search
Proposed by Yang and Deb [20], Cuckoo Search (CS) possesses both local and global search mechanisms to attain global convergence. CS employs the following three main principles for searching for the global optimal solutions. Firstly, each cuckoo lays one egg (solution) at a time, which is deposited in a randomly chosen nest. Secondly, the best nests with high-quality eggs are selected for the next generation. Thirdly, the host bird discovers the egg laid by a cuckoo with a probability, p_a; therefore, a fraction (p_a) of the worse nests is abandoned and replaced by new nests. The algorithm employs the strategy defined in Eq. (2) to generate new nests (solutions).
x_i^{t+1} = x_i^t + αs ⊗ H(p_a − ε) ⊗ (x_k^t − x_l^t)    (2)

where x_k^t and x_l^t denote the solutions selected randomly by random permutation, while s denotes the step size and H(v) represents a Heaviside function. Note that ε is a random number drawn from a uniform distribution, while ⊗ represents the entry-wise product of two vectors. The new solution, x_i^{t+1}, is accepted if it has a better fitness value than that of x_i^t. In each iteration, the Levy flights operation defined in Eq. (3) is used to perform the global random walk.

x_i^{t+1} = x_i^t + α ⊗ Levy(λ)    (3)

where x_i^{t+1} and x_i^t denote the i-th solution in the (t+1)-th and t-th generations, respectively, while Levy(λ) represents the Levy flights operation with λ as the random step length exponent (1 < λ ≤ 3), and α is the step-size scaling factor. CS employs these local and global search operations to search for the global optima.
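The two CS moves above can be sketched as follows. The Levy step uses Mantegna's algorithm, a common way of generating Levy-distributed step lengths (the function names, the default p_a = 0.25, and the 0.01 scaling are illustrative assumptions, not values taken from the paper):

```python
import numpy as np
from math import gamma, pi, sin

def levy_step(beta=1.5, size=1, rng=None):
    """Mantegna's algorithm: draw Levy-distributed step lengths with a
    stability exponent beta (a common practical choice within the stated
    1 < lambda <= 3 range)."""
    if rng is None:
        rng = np.random.default_rng()
    sigma = (gamma(1 + beta) * sin(pi * beta / 2) /
             (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0.0, sigma, size)
    v = rng.normal(0.0, 1.0, size)
    return u / np.abs(v) ** (1 / beta)

def cs_local_walk(x_i, x_k, x_l, pa=0.25, alpha=0.01, rng=None):
    """Biased local random walk of Eq. (2): move x_i along the difference of
    two randomly selected solutions, switched on per dimension by the
    Heaviside term H(pa - eps)."""
    if rng is None:
        rng = np.random.default_rng()
    s = rng.random(x_i.shape)                 # per-dimension step size
    eps = rng.random(x_i.shape)               # uniform random number
    heaviside = (pa - eps > 0).astype(float)  # H(pa - eps)
    return x_i + alpha * s * heaviside * (x_k - x_l)
```

With p_a = 0 the Heaviside term is always zero and the solution is left unchanged, which makes the role of the abandonment fraction easy to see.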
Although CS shows impressive search capabilities, its search strategy could be further enhanced. For instance, the new nest (solution) generation strategy shown in Eq. (2) relies purely on two randomly selected individuals, x t k and x t l , and it does not explicitly employ optimal solutions identified so far for promising offspring generation. Therefore, it could be further improved by considering more explicit optimal signals, i.e. local and global best experiences, to increase the likelihood of generation of promising offspring solutions. Motivated by this perspective, we incorporate both personal and global best experiences to guide the attraction search mechanism and enable fast convergence in this research.

Dragonfly algorithm
Proposed by Mirjalili [21], DA simulates and implements static and dynamic swarming behaviours of dragonflies to balance between global exploration and local exploitation. It employs the following five social interaction behaviours, i.e. separation, alignment, cohesion, attraction (towards food), and distraction (outwards enemies) to guide the search process. Its velocity and position updating operations are defined in Eqs. (4) and (5), respectively.
Δx^{t+1} = (sS_i + aA_i + cC_i + fF_i + eE_i) + wΔx^t    (4)

x^{t+1} = x^t + Δx^{t+1}    (5)

In Eq. (4), S_i, A_i, C_i, F_i, and E_i represent the social behaviours of separation, alignment, cohesion, attraction, and distraction, respectively, while s, a, c, f, and e are the corresponding weights for the five actions. In addition, Δx^{t+1} and Δx^t represent the step/velocity vectors in the (t+1)-th and t-th iterations, respectively, with w as the inertia weight. In Eq. (5), x^{t+1} and x^t indicate the positions of an individual in the (t+1)-th and t-th iterations, respectively. Eqs. (4) and (5) model the social behaviours of an artificial dragonfly when it has at least one neighbouring individual.
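The DA updates of Eqs. (4) and (5) reduce to two one-line functions once the five behaviour vectors have been computed (the default weight values here are placeholders, not the tuning used in [21]):

```python
import numpy as np

def da_step(step_prev, S, A, C, F, E,
            s=0.1, a=0.1, c=0.7, f=1.0, e=1.0, w=0.9):
    """Dragonfly step-vector update of Eq. (4): a weighted sum of the five
    social behaviours plus the inertia-weighted previous step."""
    return (s * S + a * A + c * C + f * F + e * E) + w * step_prev

def da_position(x, step_new):
    """Position update of Eq. (5): add the new step to the current position."""
    return x + step_new
```

When all five behaviour vectors are zero (no neighbours exerting influence), the step simply decays by the inertia weight w, which is the expected limiting behaviour.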
Among the five social behaviours, the distraction (i.e. evading) operation distinguishes DA from other swarm intelligence algorithms, and is defined as follows.
E_i = x_ε + x    (6)

where E_i denotes the distraction action and x_ε represents the position of an enemy, with x denoting the position of the current individual. Since this evading operation moves the current individual away from each solution with a lower fitness value, the search process is very likely to be computationally inefficient. In this research, we propose an evading mechanism that is guided by more explicit enemy signals, i.e. average personal historical and global worst experiences, for enemy avoidance. The rationale is to ensure each particle flees away from local and global unpromising search regions effectively, therefore accelerating convergence.

Other feature optimization techniques
There are also other modified or hybrid metaheuristic search algorithms proposed in recent years to overcome the limitations of some existing methods. Since a constant setting of operation parameters in CS may have a negative impact on its performance, Valian et al. [22] and Li and Yin [23] developed modified CS with self-adaptive parameter settings to overcome the problem associated with constant CS parameter settings. Valian et al. [22] adjusted the search parameters according to the number of generations, and recommended a comparatively larger parameter setting at the beginning to increase solution diversity and a smaller parameter setting in the later iterations to fine-tune the identified solutions. The work achieved impressive performance in complex engineering optimisation problems. Li and Yin [23] proposed two new local search strategies for CS. Based on a decreasing probability rule, the strategies aimed to balance between exploitation and exploration, and an adaptive parameter setting was introduced to enhance population diversity. Their work compared favourably with other related research based on a study with 16 benchmark functions. Jordehi [24] proposed an enhanced leader PSO (ELPSO), which employed successive mutation strategies such as Gaussian, Cauchy, opposition-based and differential evolution (DE) based mutation, to further enhance the swarm leader. The results indicated its efficiency in terms of accuracy and scalability. Zhang et al. [12] proposed a binary BBPSO-based feature selection algorithm. Their work used a reinforced memory strategy for personal best updating of each particle to retain particle diversity. It also used a uniform combination to diversify the swarm when stagnation occurred. The effects of uniform combination were strengthened along with the increase of stagnant iterations. The binary BBPSO algorithm showed a competitive performance in terms of classification accuracy and convergence rate. Neoh et al. 
[25] proposed two evolutionary algorithms under a layered cascade evolutionary framework, i.e. direct similarity and Pareto-based feature selection, for facial expression recognition. The direct similarity feature selection algorithm integrated the concept of the micro Genetic Algorithm and focused on identifying common features within each class. Meanwhile, the Pareto-based optimization took both between-class and within-class variations into account for multi-objective feature optimization. Both optimization strategies achieved impressive performances and outperformed other baseline methods (e.g. GA and AdaBoosting) significantly. A comprehensive review of PSO and its applications has also been conducted by Zhang et al. [15].
There was also other research dedicated to dimension reduction for automatic leukaemia diagnosis. In order to select the input features for effective classification of normal and abnormal lymphocytes, Mohapatra et al. [5] applied an independent-sample "t" test to select 32 statistically significant features out of 44 raw features, representing shape, colour and texture information of nucleus and cytoplasm. Madhloom et al. [6] selected 7 out of 30 raw features, representing shape, colour and texture information of nucleus, cytoplasm and the whole cell, by employing FDR that considered cross-correlation among features for the identification of lymphocytes and lymphoblasts. In addition, Rezatofighi and Soltanian-Zadeh [10] further proposed sequential forward selection along with FDR for the recognition of five types of WBCs. Huang and Hung [26] used the Principal Component Analysis (PCA) to reduce the feature dimensions from 85 to 7 in leucocyte recognition. Despite the popularity of the filter-based approach, Osowski et al. [27] proposed an embedded method to recognize 11 types of blood cells (e.g. basophilic erythroblast, neutrophilic myelocyte, lymphocyte, etc) by using the GA to fine-tune the features with respect to the SVM performance during the training stage. Escalante et al. [28] employed PSO to guide the search process and automatically select ensemble classification models for different types of leukaemia detection. Their work achieved high accuracy for classification model selection without user intervention. The system achieved 97.68% for ALL and AML leukaemia detection, and 94.21% for subtypes of ALL (L1 and L2) and AML (M2, M3 and M5) identification. Besides the detection of ALL and AML, the GA was employed by Chan et al. [29] to obtain optimal feature parameter values for recognition of anaemia abnormal red blood cells.

The proposed ALL recognition system
There are five key steps in our proposed system: (a) WBC identification from blood smear images, (b) nucleus and cytoplasm separation, (c) feature extraction, (d) modified BBPSO-based feature optimization, and (e) lymphocyte and lymphoblast classification. First of all, modified marker-controlled watershed segmentation and morphological operations proposed in our previous research [30] are used to extract WBCs from microscopic images automatically. Then, an SDM-based clustering algorithm utilising both within- and between-cluster scatter variances, as proposed in our recent work [2], is used to perform nucleus-cytoplasm separation. A set of 80 raw features comprising 16 shape descriptors, 54 GLCM textural descriptors, and 10 CIELAB colour descriptors is extracted from the segmented nucleus and cytoplasm sub-images. The 16 shape features capture the cell size, nucleus size, nucleus shape, and details of the cytoplasm, and consist of the cytoplasm and nucleus areas, nucleus-to-cytoplasm ratio, length-to-diameter ratio, major axis length, orientation, filled area, perimeter, solidity, eccentricity, minor axis length, convex area, form factor, compactness based on Mohapatra et al. [5], another compactness measure based on Mohapatra et al. [31], and roundness of the nucleus region. The 54 texture features consist of 13 descriptors from the GLCM, i.e. correlation, sum of variance, normalized inverse difference moment, sum of average, contrast, difference variance, entropy, cluster prominence, cluster shade, dissimilarity, energy, homogeneity, and normalized inverse difference, computed at four different angles (i.e. 0°, 45°, 90°, and 135°), plus two additional descriptors, i.e. skewness and kurtosis.
In addition, the 10 colour features consist of the mean and standard deviations of the a* and b* components of the CIELAB colour space for both nucleus and cytoplasm, along with two descriptors pertaining to the ratio of the mean of a* and b* components between cytoplasm and nucleus. Since identifying the most discriminative characteristics of normal and abnormal lymphocytic cells and removing the redundant features have a great impact in boosting classification accuracy, in this research, we propose two modified BBPSO algorithms to identify significant discriminative feature subsets of healthy and blast cells from the 80 raw features to benefit subsequent robust ALL classification. We introduce the proposed BBPSO algorithms in detail in the following section.
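A handful of the GLCM descriptors above can be sketched with a small, self-contained co-occurrence routine. This is a simplified stand-in for library implementations (e.g. scikit-image's `graycomatrix`/`graycoprops`): `glcm` and `glcm_features` are illustrative names, only one pixel offset is shown rather than the four angles used in the paper, and only four of the thirteen descriptors are computed:

```python
import numpy as np

def glcm(image, dx=1, dy=0, levels=8):
    """Grey-level co-occurrence matrix for one offset (dx, dy), normalized
    into a joint probability table. image must hold integer grey levels
    in [0, levels)."""
    P = np.zeros((levels, levels))
    h, w = image.shape
    for y in range(h - dy):
        for x in range(w - dx):
            P[image[y, x], image[y + dy, x + dx]] += 1
    return P / P.sum()

def glcm_features(P):
    """Four of the descriptors used in the paper: contrast, homogeneity,
    energy, and entropy, computed from the normalized GLCM."""
    i, j = np.indices(P.shape)
    eps = 1e-12  # guards log2(0)
    return {
        "contrast": float(((i - j) ** 2 * P).sum()),
        "homogeneity": float((P / (1.0 + (i - j) ** 2)).sum()),
        "energy": float((P ** 2).sum()),
        "entropy": float(-(P * np.log2(P + eps)).sum()),
    }
```

In practice the offset would be varied over (1, 0), (1, 1), (0, 1) and (-1, 1) to obtain the 0°, 45°, 90° and 135° responses, and the remaining GLCM descriptors would be added in the same style.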

The proposed BBPSO algorithms with attraction and flee operations
Motivated by the accelerated search strategies of PSO, CS and DA, we propose two modified BBPSO algorithms that incorporate two new operations, i.e., attraction to the food source and flee from the enemies. The aim is to mitigate premature convergence of the original BBPSO algorithm. The first variant explores both the attraction and flee operations in the primary swarm while the second embeds them in the subswarm-based search. These special food chasing and fleeing behaviours show great potential in increasing local and global search capabilities of the original BBPSO algorithm. The details of both proposed variants are as follows.

The modified BBPSO variant 1
As mentioned above, the first proposed BBPSO algorithm (denoted as Algorithm 1) incorporates not only the conventional movement of BBPSO defined in Eq. (1), but also the newly proposed attraction and flee operations to guide the search process. The new search behaviour with attraction to the food source is defined in Eqs. (7) and (8).
x_{id}^{t+1} = x_{id}^t + c( (pbest'_{id} + gbest_d^t)/2 − x_{id}^t )    (7)

pbest'_{id} = (1/t) Σ_{k=1}^{t} pbest_{id}^k    (8)

where pbest'_{id} and gbest_d^t represent the average personal best experience and the global best solution in the d-th dimension, respectively. Note that c denotes the Logistic chaotic map, which provides chaotic adaptive steps for the search behaviour. In Eq. (7), each particle is guided by the mean of pbest'_{id} and gbest_d^t to move towards the food source (i.e. optimal regions) to accelerate the search process. In addition, pbest'_{id} is further defined in Eq. (8), where pbest_{id}^k represents the personal best solution for the i-th particle obtained in the k-th iteration, k = 1, 2, ..., t. As indicated in Eq. (8), instead of using the personal best solution identified in the current iteration, the proposed attraction action is enhanced by employing the mean of the personal historical best experiences obtained from the past t iterations. Overall, this food chasing mechanism can be viewed as a special case of CS, where c = αs ⊗ H(p_a − ε) as shown in Eq. (2), and also a special case of PSO, where the personal and global best solutions are combined to guide the search process. Guided by the global and the average personal best solutions, the proposed attraction search mechanism enables the overall population to reach promising search regions in fewer iterations. It also shows great efficiency in escaping from local optimum traps owing to the consideration of both local and global promising solutions.
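One plausible reading of the food chasing behaviour can be sketched as below: the particle is pulled towards the midpoint of its average historical personal best and the global best, with a step size supplied by the Logistic chaotic map. The function names and the exact form of the pull term are illustrative assumptions based on the description above:

```python
import numpy as np

def logistic_map(c, mu=4.0):
    """One step of the Logistic chaotic map, c in (0, 1); mu = 4 gives the
    fully chaotic regime commonly used for chaotic step sizes."""
    return mu * c * (1.0 - c)

def attraction_move(x, pbest_hist, gbest, c):
    """Food-chasing move in the spirit of Eq. (7): the particle moves
    towards the midpoint of its average historical personal best
    (the mean over past iterations) and the global best, scaled by
    the chaotic step c."""
    pbest_avg = np.mean(pbest_hist, axis=0)   # average of past personal bests
    target = (pbest_avg + gbest) / 2.0        # midpoint of the two guides
    return x + c * (target - x)
```

Iterating `logistic_map` between moves gives each particle a different, non-repeating step size, which is what diversifies the attraction behaviour.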
Motivated by the concept of enemy avoidance in DA, the proposed flee mechanism is defined in Eqs. (9) and (10).
x_{id}^{t+1} = x_{id}^t + c( x_{id}^t − (pworst'_{id} + gworst_d^t)/2 ) + αε    (9)

pworst'_{id} = (1/t) Σ_{k=1}^{t} pworst_{id}^k    (10)

where pworst'_{id} and gworst_d^t represent the average personal worst experience and the global worst solution in the d-th dimension, respectively, while α and ε represent a randomization vector (with each dimension ∈ (0, 1)) and a random walk strategy such as the Levy flights, respectively. Note that c also denotes the Logistic chaotic map. This search action allows each particle to flee from enemies and move away from less optimal search regions (e.g. avoidance of enemies) to achieve fast convergence. In addition, pworst'_{id} is further defined in Eq. (10), where pworst_{id}^k represents the personal worst solution for the i-th particle identified in the k-th iteration. Similar to the attraction behaviour, instead of using the personal worst solution identified in the current iteration, the proposed flee operation is enhanced by using the mean of the personal historical worst experiences from the past t iterations.
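A matching sketch of the flee behaviour follows, again under stated assumptions: the particle is pushed away from the midpoint of its average historical personal worst and the global worst, with a chaotic step c and an additive random-walk term. The function name and the exact composition of the push and random-walk terms are illustrative:

```python
import numpy as np

def flee_move(x, pworst_hist, gworst, c, alpha, eps):
    """Enemy-avoidance move in the spirit of Eqs. (9) and (10): the particle
    moves away from the midpoint of its average historical personal worst
    and the global worst. alpha is a randomization vector (each dimension
    in (0, 1)) and eps a random-walk term such as a Levy flight step."""
    pworst_avg = np.mean(pworst_hist, axis=0)   # average of past personal worsts
    enemy = (pworst_avg + gworst) / 2.0         # combined enemy signal
    return x + c * (x - enemy) + alpha * eps
```

Because the push term points from the enemy signal towards the particle, a particle sitting near persistently poor regions is driven outwards, which is the acceleration effect described above.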
In comparison with the evading action in DA, the proposed mechanism is guided by both average personal historical and global worst experiences for enemy avoidance. It provides a way for each particle to flee away from local and global unpromising search regions effectively, therefore accelerating convergence.
Algorithm 1 lists the first proposed BBPSO variant. After initialising the original swarm, in each iteration, any of the three actions (i.e. the movements defined in Eqs. (7) and (9) and the original search behaviour of BBPSO defined in Eq. (1)) is randomly selected to guide the search of each particle. Moreover, the original search strategy of BBPSO together with both attraction to the food source and fleeing from enemies operations work in a collaborative manner to drive the search process out of the local optimum trap. For instance, when the search guided by the food chasing behaviour that follows the global and the average personal best experiences (e.g. the special cases of PSO and CS) stagnates, the fleeing from enemies operation is able to drive the particles out of the less optimal regions, in order to overcome premature convergence. On the other hand, when the enemy avoidance behaviour shows limited improvements in finding the best solution, the search mechanism guided by the global and the average personal best solutions leads the overall population to reach more promising search regions effectively, in order to escape from the local optimum trap. These two new search operations and the original search behaviour of BBPSO work alternatively to increase search diversity and attain global optima.

The proposed BBPSO variant 2 with subswarms
In this research, both of the newly proposed search mechanisms, i.e. attraction to the food source and fleeing from enemies, are also embedded in subswarms to evaluate their efficiency. The proposed second BBPSO variant (denoted as Algorithm 2) explores these new search mechanisms in subswarm-based search activities. Algorithm 2 shows the pseudo-code of the proposed second BBPSO variant, while its flowchart is shown in Fig. 2.
As illustrated in Algorithm 2, this second BBPSO variant with subswarms firstly performs the conventional BBPSO operation for N iterations to identify the initial global best solution, gbest_bbpso. Subsequently, the overall population is divided into two subswarms, s1 and s2, and the newly proposed search mechanisms, i.e. attraction to the food source and fleeing from enemies, are employed in the subswarm-based search. Specifically, we embed the original movement of BBPSO and the food chasing behaviour in subswarm s1, and randomly select one of the two operations in each iteration to update the position of each particle in s1. Similarly, the original operation of BBPSO and the fleeing from enemies mechanism are employed in subswarm s2, with one of the two operations randomly selected in each iteration to update the position of each particle in s2. After N iterations, the subswarm leaders of both s1 and s2, i.e. gbest_s1 and gbest_s2, are identified. Then, the three optimal solutions, i.e. gbest_bbpso, gbest_s1 and gbest_s2, are compared with each other. The one with the highest fitness value is identified as the global best solution, i.e. gbest, while the worst leader among the three is discarded. Moreover, both food chasing and enemy fleeing operations embedded in the subswarms work collaboratively to reduce the probability of being trapped in local optima.
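The two-stage structure described above can be sketched as a small search skeleton. The three move functions are assumed callables of the form `move(x, swarm) -> new_position` (e.g. the BBPSO, attraction, and flee updates); the function name, the 50/50 move selection, and the equal split into subswarms are illustrative assumptions:

```python
import numpy as np

def variant2_search(fitness, init_swarm, bbpso_move, chase_move, flee_move,
                    N=10, rng=None):
    """Skeleton of the subswarm-based BBPSO variant (Algorithm 2): run plain
    BBPSO for N iterations, then split the swarm into two subswarms that mix
    the BBPSO move with food chasing (s1) and enemy fleeing (s2), and finally
    keep the best of the three leaders."""
    if rng is None:
        rng = np.random.default_rng()
    swarm = [np.array(x, dtype=float) for x in init_swarm]

    def leader(particles):
        return max(particles, key=fitness)

    # Stage 1: conventional BBPSO search for N iterations.
    for _ in range(N):
        swarm = [bbpso_move(x, swarm) for x in swarm]
    gbest_bbpso = leader(swarm)

    # Stage 2: subswarm search with the two new behaviours.
    half = len(swarm) // 2
    s1, s2 = swarm[:half], swarm[half:]
    for _ in range(N):
        # In each iteration, each particle randomly takes either the BBPSO
        # move or its subswarm's dedicated behaviour.
        s1 = [(bbpso_move if rng.random() < 0.5 else chase_move)(x, s1) for x in s1]
        s2 = [(bbpso_move if rng.random() < 0.5 else flee_move)(x, s2) for x in s2]

    # Compare the three leaders and keep the fittest as gbest.
    return leader([gbest_bbpso, leader(s1), leader(s2)])
```

In the full algorithm this comparison is wrapped in an outer loop (the worst leader is discarded, the subswarms recombined, and gbest re-inserted), which is omitted here for brevity.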
The search process iterates until the termination criteria are met, i.e. (1) the maximum number of generations is reached, or (2) the most optimal solution is found. In this research, the initial population of 30 particles and the maximum number of generation of 200 are set for each experiment. The fitness function defined in Eq. (11), as formulated in other studies [32,33], is used to evaluate Update the subswar m best solution , gbest_s 1 Randomly select Eq. (1) or (7) to update the position of each particle in subswarm s1 Update the subswar m best solution , gbest_s 2 Compare the fitness of gbest _bbp so, gbest_s 1, an d gbest_s2 and assign the bes t leader to gbes t

where α and 1 − α denote the weights for the classification performance, C_performance, and the number of selected features, C_features, respectively. Since classification performance is more important than the number of selected features, α is assigned a higher value than 1 − α. For fitness evaluation, we convert the continuous value in each dimension of each particle into a binary setting (i.e. 0 or 1), with '1' indicating the selection of a specific dimension and '0' representing the non-selection of that dimension. In order to have slow movements and avoid premature convergence, we use continuous values in each dimension of each particle during the search process; conversion to the binary setting takes place only for fitness evaluation.
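A minimal Python sketch of this evaluation step is shown below. The binarization threshold of 0.5, the weight value 0.9 for α, and the exact form of the feature-count term are our assumptions; Eq. (11) itself is not reproduced in this excerpt.

```python
def to_binary(particle, threshold=0.5):
    # Continuous positions are kept during the search; binarization
    # happens only at fitness-evaluation time (threshold is an assumption)
    return [1 if x > threshold else 0 for x in particle]

def fitness_eq11(particle, classify_gm, n_features, alpha=0.9):
    """Weighted fitness in the spirit of Eq. (11): alpha weights the
    classification term, (1 - alpha) the feature-count term."""
    mask = to_binary(particle)
    c_perf = classify_gm(mask)             # GM score on the selected features
    c_feat = 1.0 - sum(mask) / n_features  # reward smaller feature subsets
    return alpha * c_perf + (1.0 - alpha) * c_feat
```

Here `classify_gm` stands in for the 10-fold cross-validated classifier evaluation described in the text.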
Since the samples extracted from the ALL-IDB2 database are limited, the training and test data sets employed for the evaluation of the proposed algorithms exhibit imbalanced class instances. Instead of using the traditional accuracy measure, we employ the geometric mean (GM) as the performance indicator because it is frequently used for evaluation of imbalanced data problems [34,35]. Therefore, it is employed in this research for fitness evaluation during the training phase and generation of the final classification results during the test phase.
We have also compared the two proposed BBPSO algorithms (with and without subswarms) with other conventional and state-of-the-art metaheuristic search methods to evaluate their efficiency. The detailed evaluation results are provided in Section 4.

ALL recognition
The proposed BBPSO based optimization algorithms are first used to identify the significant discriminative features pertaining to healthy and blast cells. Then, the selected features are normalized into the range of [-1,1]. Two classification techniques, i.e. 1NN and RBF-based SVM, are used for ALL detection in this research. 1NN is selected because it is a popular nonparametric classification technique with an efficient computational cost, and it has been employed frequently for fitness evaluation and/or ALL classification [12,36,37]. The RBF-based SVM model is selected because the RBF kernel supports nonlinear mapping of data samples and possesses fewer hyper-parameters. In order to provide a concrete evaluation methodology, we employ a consistent classification technique for both fitness evaluation (during training) and test in each experiment. Specifically, during the training stage, 1NN and RBF-based SVM with 10-fold cross validation are employed for fitness evaluation. During the test stage, the trained 1NN and RBF-based SVM models with hold-out validation are utilized for ALL classification for each test set.
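As a minimal illustration of this classification stage, the sketch below shows the [-1, 1] feature normalization and a bare-bones 1NN classifier (the SVM step is omitted; the function names are ours, not the authors'):

```python
def scale_to_unit_interval(column):
    # Min-max normalize one feature column into [-1, 1], as described above
    lo, hi = min(column), max(column)
    return [2.0 * (x - lo) / (hi - lo) - 1.0 for x in column]

def one_nn_predict(train_X, train_y, sample):
    """Minimal 1-nearest-neighbour classifier using Euclidean distance."""
    def dist2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    nearest = min(range(len(train_X)), key=lambda i: dist2(train_X[i], sample))
    return train_y[nearest]
```

In practice each selected feature column would be scaled on the training set and the same transform applied to the test samples.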
In order to maximize the SVM performance, during both fitness evaluation and test, it is necessary to identify the optimal settings for the kernel scaling factor, σ, and the soft margin constant, Co. Although the grid search method is often used for parameter selection, it is a computationally expensive process [38]. This problem is intensified because we conduct the fitness evaluation for each particle in each iteration during the training stage. Therefore, parameter tuning using grid search during the training stage is computationally prohibitive, and sometimes infeasible. To mitigate this problem, a widely accepted design of experiment method, i.e. the central composite design (CCD) [35,39], is employed to identify the optimal SVM parameter setting during the training stage. The CCD method divides each parameter into different levels and evaluates all possible combinations of these levels to select the optimal setting. It offers an affordable computational cost in comparison with grid search, and this design of experiment approach is especially useful when the optimal setting depends on multiple (e.g. more than two) parameters.
In this research, the CCD method with 10-fold cross validation is used to identify the optimal settings for Co and σ of SVM at the training stage. Following the CCD method, three levels, i.e. low, medium and high, have been identified for each parameter. Therefore, a full factorial design of 9 (i.e. 3²) key combinations of Co and σ has been used. Since CCD employs fewer parameter combinations than grid search, we employ the range of 2¹−2³ for both Co and σ in the search for the optimal SVM parameter settings. These ranges were identified based on trial and error over the diverse training sets employed in this research. Table 1 and Fig. 3 indicate the detailed parameter settings and experimental configurations.
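The 3 × 3 full factorial sweep over the two SVM parameters can be sketched as follows. The level values 2, 4 and 8 follow the 2¹−2³ range stated above; `evaluate_gm` stands in for the 10-fold cross-validated GM evaluation and is an assumed interface, not the authors' code.

```python
from itertools import product

def ccd_select(evaluate_gm, c_levels=(2, 4, 8), sigma_levels=(2, 4, 8)):
    """Evaluate all 3 x 3 = 9 (Co, sigma) combinations and return the
    pair with the best cross-validated GM score."""
    return max(product(c_levels, sigma_levels),
               key=lambda cs: evaluate_gm(*cs))
```

Because only nine combinations are scored per particle, this remains affordable inside the per-iteration fitness loop, unlike a dense grid search.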
During the training stage, the CCD method uses all 9 key combinations of Co and σ (as indicated in Fig. 3) to evaluate each particle. The parameter setting that achieves the best GM score for a specific particle under 10-fold cross validation is used to generate the fitness value of that particle using Eq. (11). The optimal setting identified for the final global best solution is used to train the SVM on the entire training set. The trained SVM model is subsequently used with hold-out validation for evaluation of the test set.
Fig. 4. Sample sub-images from the ALL-IDB2 data set.

Evaluation
To compare the proposed BBPSO algorithms with other state-of-the-art PSO variants and classical methods, we implement the following search methods, i.e. BBPSO, binary BBPSO [12], ELPSO [24], GA, and PSO. We employ 180 microscopic images from the ALL-IDB2 database for experimentation. Some sample images are illustrated in Fig. 4. Two experimental settings with different proportions of training and test samples have been used to evaluate the efficiency of both of the proposed BBPSO algorithms.
The first experimental setting employs two distinctive sets of 90 images for training and test, respectively. For both data sets, two thirds of the images (i.e. 60) are blast cells while the remaining images (i.e. 30) represent healthy cells. The second experimental setting uses 100 images with 50 normal and 50 abnormal instances for training, and the remaining 80 (70 abnormal and 10 normal) unseen images for test. Since our algorithms and other optimization methods are all stochastic methods, we have conducted 30 trials for each algorithm. As discussed earlier, during the training stage, both 1NN and RBF-based SVM with 10-fold cross validation have been employed for fitness evaluation of each algorithm. During the test stage, the trained 1NN and RBF-based SVM models with hold-out validation have been employed for evaluation of all methods.
For the first experimental setting, we have compared the proposed algorithms with other state-of-the-art PSO variants and classical search methods over 30 trials. Table 2 shows the average GM performance of each algorithm integrated with both 1NN and SVM classifiers for evaluation of the 90 unseen test images over 30 runs. To compare the efficiency of the feature optimization process, the classification results using the entire set of 80 raw features without any feature selection are provided in the last row of Table 2. Overall, both proposed BBPSO algorithms have comparatively higher convergence rates with smaller numbers of iterations to achieve a reasonable classification performance of normal and abnormal lymphocytic cells. Comparatively, other methods exhibit lower convergence rates with more iterations to produce the results shown in Table 2.
Furthermore, as indicated in Table 2, the experimental results of both proposed algorithms greatly outperform those obtained using the original set of 80 raw features without any feature selection. Under the same experimental setting, the performances of some baseline methods are comparable with, or sometimes lower than, the result obtained using the 80 raw features. From the clinical perspective, important features for ALL diagnosis include the cytoplasm and nucleus areas, the ratio between the nucleus area and the cytoplasm area, form factor and compactness (supporting diagnosis in terms of irregularity of the nucleus shape), perimeter, texture changes related to open or closed chromatin, and eccentricity [1][2][3][4][5].
The experimental results indicate that these important features have been identified and included in the selected feature subsets of both proposed algorithms. Note that a few of the abovementioned clinically important features, such as the nucleus area and the ratio of nucleus to cytoplasm, are often overlooked, or do not co-exist, in the feature subsets selected by the other optimization methods, despite the fact that these methods sometimes select more features. Such larger subsets may lead to performance degradation owing to the inclusion of insignificant features. We have also compared the proposed algorithms and the other methods using boxplots. Fig. 5 illustrates the classification performance variations for all algorithms over 30 trials for the 90 test images with 1NN and SVM, respectively. In both boxplot diagrams, the first two boxplots depict the results of Algorithm 1 (without subswarms) and Algorithm 2 (with subswarms), respectively. Integrated with 1NN, Algorithm 2 achieves the highest average GM performance of 95.67% over 30 runs. As shown in Fig. 5(a), 50% of the GM results of Algorithm 2 (with a median of 96%) are no lower than the maximum results of BBPSO (96%), binary BBPSO (96%), and PSO (96%). Moreover, 25% of the results of Algorithm 1 (with a 3rd quartile of 96%) match the maximum scores of BBPSO, binary BBPSO and PSO. In addition, 75% of the GM results of Algorithm 2 (with a 1st quartile of 95%) are higher than the maximum results of ELPSO (94%) and GA (93%), while 50% of the results of Algorithm 1 (with a median of 94%) are higher than the maximum results of ELPSO and GA.
In the second experimental setting, we have evaluated all algorithms using 100 images for training with a 50:50 split for healthy and blast instances and 80 (70 abnormal and 10 normal) unseen images for test. A total of 30 runs for each algorithm have been conducted. Table 3 shows the average GM results integrated with both 1NN and RBF-based SVM for evaluation of the 80 unseen test images over 30 runs, as well as the result from the original set of 80 raw features for this experimental setting. Fig. 6 shows the classification performance variations in boxplots of both proposed algorithms and all other methods.
Since the current fitness function defined by Eq. (11) focuses more on classification performance as compared with the number of selected features, we have applied another fitness function defined in Eq. (12), which indicates a comparatively more balanced trade-off between the GM result and the number of selected features.
where number_all and C_features denote the overall number of raw features (i.e. 80) and the number of selected features, respectively. The second part of Eq. (12) gives the number of selected features more influence on the overall fitness than the corresponding part of the original fitness function defined in Eq. (11). In addition, the same weight settings of α and 1 − α as in Eq. (11) are used in Eq. (12). Note that C_performance represents the performance evaluation using GM. A series of tests has been conducted using the new fitness function in Eq. (12) for the above two experimental settings, in order to further evaluate the efficiency of the proposed algorithms. Evaluated over 30 runs using the first experimental setting (with the unbalanced 90 images for training and 90 unseen samples for test), Table 4 and Fig. 7 illustrate the average GM scores of all methods and the detailed performance variations of all algorithms integrated with both 1NN and SVM classifiers, respectively.
We have used the new fitness function defined in Eq. (12) to further evaluate the efficiency of all algorithms under the second experimental setting with balanced 100 images for training and 80 unseen images for test. A total of 30 runs have been conducted for each algorithm. Table 5 and Fig. 8 show the average GM results of all methods and the detailed performance variations of all algorithms integrated with both classifiers over 30 trials, respectively.
Evaluated with the new fitness function under the two experimental settings, the empirical results further ascertain the effectiveness of the proposed algorithms. Overall, the proposed algorithms outperform all other PSO variants and classical search methods across two different experimental settings with two fitness functions.
We have also compared our results with those of other related ALL detection studies reported in the literature. To the best of our knowledge, Putzu et al. [1] and Madhukar et al. [40] achieved high recognition performances using the ALL-IDB database. Evaluated using 10-fold cross validation, Putzu et al. [1] obtained 93.2% accuracy using RBF-based SVM with 131 features. Using SVM and leave-one-out cross validation, Madhukar et al. [40] achieved 93.5% accuracy with a high-dimensional feature vector consisting of shape, texture, and HD features of the nuclei extracted to distinguish normal and blast cells. However, neither study employed a feature selection process. Evaluated using SVM with hold-out validation, when trained with 100 balanced instances and tested with 80 unseen images, our study yields comparatively smaller discriminative feature subsets for healthy and blast cell classification, and achieves average GM results of 94.80% and 96.19% over 30 runs using Algorithm 1 with the fitness function defined in Eq. (11) and Algorithm 2 with the fitness function defined in Eq. (12), respectively. In short, our algorithms compare favourably with other related ALL detection methods reported in the literature, indicating the efficiency of the proposed search mechanisms.
Overall, in comparison with all other methods, the proposed algorithms incorporate two new accelerated chaotic search behaviours. Algorithm 1 employs the food chasing movement motivated by the special cases of PSO and CS to move towards promising search regions and the flee operation to avoid unattractive areas. Both these behaviours help overcome the local optimum trap and achieve fast convergence. On the other hand, Algorithm 2 enables the subswarm-based attraction and flee operations to work in a collaborative manner to avoid premature convergence. Both proposed search behaviours have also been further enhanced by using the Logistic chaotic map and the average personal best and worst  experiences from the past iterations. These mechanisms lead to the impressive performance of the proposed algorithms, which show significant efficiency in escaping from local optima.
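For reference, the Logistic chaotic map mentioned above has the simple iterative form x_{n+1} = r·x_n(1 − x_n). The fully chaotic setting r = 4 used below is a common choice and an assumption on our part, as the exact control parameter is not given in this excerpt.

```python
def logistic_map(x, iterations=1):
    # Logistic chaotic map x_{n+1} = 4 * x_n * (1 - x_n) with r = 4,
    # the fully chaotic regime; x should start in (0, 1)
    for _ in range(iterations):
        x = 4.0 * x * (1.0 - x)
    return x
```

Iterating this map yields a deterministic yet highly irregular sequence in (0, 1), which is what lets the attraction and flee operations diversify the search without a separate random number stream.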
To further evaluate the efficiency of the proposed BBPSO algorithms, a cross-domain data set (sonar) from the UCI Machine Learning Repository [41] has also been used. This sonar data set has 60 attributes, 2 classes, and 208 instances. We employ 140 and 68 instances for training and test, respectively, with the training set containing a balanced number of samples from each class. Furthermore, each optimization algorithm is used for feature selection with both classifiers using hold-out validation for test. The detailed GM results using both fitness evaluations are shown in Tables 6 and 7, respectively.
As indicated in Tables 6-7, the two proposed BBPSO algorithms achieve the best performances for this sonar data set, and they outperform all other methods consistently. Using the fitness function defined in Eq. (11), Algorithm 1 with the SVM classifier achieves the highest GM performance of 88.85%. It outperforms BBPSO, binary BBPSO, ELPSO, GA and PSO by 3.41%, 4.42%, 3.8%, 9.45% and 3.44%, respectively. Integrated with the 1NN classifier, Algorithm 2 obtains the best GM performance of 89.84%. It outperforms BBPSO, binary BBPSO, ELPSO, GA and PSO by 4.94%, 4.52%, 4.2%, 9.66% and 4.58%, respectively. The results further ascertain the efficiency of the proposed attraction and flee search mechanisms in both algorithms.
To further demonstrate the efficiency of the proposed algorithms, the two-sided Wilcoxon rank sum test [42,43] has been conducted. This non-parametric statistical test determines whether two solutions are statistically significantly different. It tests the null hypothesis that both solutions have an equal median, at the 95% significance level (i.e., α = 0.05); the null hypothesis is therefore rejected if the p-value is lower than 0.05. Tables 8 and 9 illustrate the detailed results of the rank sum test.
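A self-contained sketch of the two-sided rank-sum test via the normal approximation (adequate for 30 runs per method) is shown below; in practice a library routine such as SciPy's `ranksums` would normally be used instead.

```python
import math

def rank_sum_pvalue(sample_a, sample_b):
    """Two-sided Wilcoxon rank-sum test, normal approximation."""
    n1, n2 = len(sample_a), len(sample_b)
    pooled = sorted([(v, "a") for v in sample_a] + [(v, "b") for v in sample_b])
    # Assign average ranks to tied values; accumulate sample_a's rank sum
    rank_sum_a, i = 0.0, 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0   # mean of ranks i+1 .. j
        rank_sum_a += avg_rank * sum(1 for k in range(i, j)
                                     if pooled[k][1] == "a")
        i = j
    # Normal approximation (no tie correction, for simplicity)
    mu = n1 * (n1 + n2 + 1) / 2.0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (rank_sum_a - mu) / sigma
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))
```

A p-value below 0.05 from this routine corresponds to rejecting the equal-median null hypothesis at the 95% significance level used in the paper.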
As indicated in Tables 8 and 9, the p-values for all the experiments conducted for ALL classification are lower than 0.05. This indicates that the GM results of our algorithms for both experimental settings with two fitness functions are statistically significantly better than those of the other baseline methods. The results for the sonar data set likewise indicate that the proposed algorithms statistically outperform the other methods, with all p-values lower than 0.05.
Besides the Wilcoxon rank sum test, another popular statistical test, i.e. the two-tailed sign test [42,44], has also been conducted. In the sign test, a binomial distribution is used to define the critical number of wins needed to achieve the 95% significance level (i.e., α = 0.05) under different experimental settings. Pairwise comparisons are conducted, and the number of cases in which an algorithm is the overall winner is identified. (Table 9 lists the p-values of the Wilcoxon rank sum test for Algorithm 2, with ObjFun1 and ObjFun2 indicating the fitness functions in Eqs. (11) and (12).) According to [42], for a total of 12 experiments (as in Tables 2-7), the critical number of wins for the sign test at the 95% significance level is 10. In other words, an algorithm needs to be the overall winner at least 10 times in order to achieve the significance level of α = 0.05. Since each of the two proposed algorithms wins against each baseline method in all 12 experiments, both proposed algorithms show a statistically significant improvement (at the 95% significance level) over GA, PSO, ELPSO, BBPSO and binary BBPSO. These statistical tests confirm the effectiveness and superiority of the proposed algorithms.
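The critical number of wins can be derived directly from the binomial distribution; the short sketch below reproduces the value of 10 for 12 experiments quoted above.

```python
from math import comb

def sign_test_critical_wins(n, alpha=0.05):
    # Smallest number of wins w (out of n pairwise comparisons, no ties)
    # at which the two-tailed binomial sign test rejects at level alpha
    for w in range(n // 2 + 1, n + 1):
        # Two-tailed p-value: 2 * P(X >= w) under X ~ Binomial(n, 0.5)
        tail = sum(comb(n, k) for k in range(w, n + 1)) / 2 ** n
        if 2.0 * tail <= alpha:
            return w
    return None
```

For n = 12, the one-tailed probability of 10 or more wins under the null is 79/4096 ≈ 0.019, so the two-tailed p-value ≈ 0.039 falls below 0.05, while 9 wins does not.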

Conclusions
In this research, we have proposed two modified BBPSO algorithms for feature optimization to enhance ALL classification. Both proposed algorithms employ accelerated chaotic search mechanisms of attraction to the food source and fleeing from enemies to diversify the search and escape from the local optimum trap. These two new search behaviours have also been further enhanced by the Logistic chaotic map. The first proposed algorithm explores the two new search operations in the main swarm, while the second proposed algorithm embeds these actions in the subswarm based search. Moreover, instead of using the personal best and worst experiences from the current iteration for the attraction and flee actions, we take the mean of the personal historical best and worst positions from the past iterations, respectively, to further enhance the proposed search mechanisms. Evaluated with the ALL-IDB2 database with 100 images for training and 80 for test, using 1NN with hold-out validation, our studies achieve superior average GM scores of 94.94% and 96.25% using both proposed Algorithms 1 and 2 with the fitness function defined in Eq. (12), respectively. In comparison with other advanced and classical search methods, both proposed algorithms are able to identify smaller numbers of features and to achieve faster convergence rates, which outperform other methods and related studies for ALL diagnosis reported in the literature. We have also evaluated the proposed algorithms using a cross-domain sonar data set from the UCI Machine Learning Repository. Both proposed algorithms outperform other related methods consistently for the sonar data set, which further indicates the efficiency and robustness of the proposed BBPSO-based feature optimization algorithms. The empirical results from the two statistical tests, i.e. the Wilcoxon rank sum and the sign tests, for both ALL and sonar data sets have also indicated statistically significant performances of both proposed algorithms. 
For further work, we will use other medical image data sets [45][46][47] to further evaluate the efficiency of the proposed algorithms in general medical diagnosis problems.