A Novel Feature Selection Strategy Based on Salp Swarm Algorithm for Plant Disease Detection

Deep learning has been widely used for plant disease recognition in smart agriculture and has proven to be a powerful tool for image classification and pattern recognition. However, it has limited interpretability for deep features. With the transfer of expert knowledge, handcrafted features provide a new way for personalized diagnosis of plant diseases. However, irrelevant and redundant features lead to high dimensionality. In this study, we proposed a swarm intelligence algorithm for feature selection [salp swarm algorithm for feature selection (SSAFS)] in image-based plant disease detection. SSAFS is employed to determine the ideal combination of handcrafted features to maximize classification success while minimizing the number of features. To verify the effectiveness of the developed SSAFS algorithm, we conducted experimental studies using SSAFS and 5 metaheuristic algorithms. Several evaluation metrics were used to evaluate and analyze the performance of these methods on 4 datasets from the UCI machine learning repository and 6 plant phenomics datasets from PlantVillage. Experimental results and statistical analyses validated the outstanding performance of SSAFS compared to existing state-of-the-art algorithms, confirming the superiority of SSAFS in exploring the feature space and identifying the most valuable features for diseased plant image classification. This computational tool will allow us to explore an optimal combination of handcrafted features to improve plant disease recognition accuracy and processing time.


Introduction
The number of plant diseases caused by various bacterial, fungal, and viral infections has increased in recent years [1]. Plant diseases can cause substantial losses in agricultural production, resulting in economic losses [2]. Thus, timely and accurate detection of plant diseases is crucial for plant protection [3]. Although plant disease symptoms are evident in various organs of the plant [4], plant leaves are primarily used to detect infections [5,6]. The use of computer vision in plant disease detection has gained significant attention in the past few years due to its potential to detect plant diseases accurately and efficiently. Computer vision techniques use algorithms and machine learning (ML) models to analyze images of plant leaves and identify disease symptoms [7,8]. This can enable early disease detection, which is critical for disease management and prevention.
High-throughput phenomics allows rapid and accurate analysis of plant traits, including disease severity, on a larger scale [9,10]. However, there is still a great demand for reliable and efficient computational methods and pipelines to address plant phenomics [11]. Developing novel phenotypic image-based algorithms will help to detect and classify plant diseases quickly and accurately.
In the field of computer vision, models or algorithms for image classification rely on extracting local or global features of images [12][13][14][15][16]. There are 2 ways of describing image features: handcrafted features and non-handcrafted features [17][18][19]. Handcrafted features mainly depend on the prior knowledge of experts and are often used together with traditional ML methods for object detection and image classification [20][21][22]. Representative handcrafted descriptors include gradient statistics, local intensity comparison, and local binary pattern (LBP)-based methods [23][24][25]. However, handcrafted feature-based models require expert-designed feature detectors, descriptors, and vocabulary building methods for feature extraction and representation [26]. The whole process is labor-intensive and requires relevant expertise [26][27][28]. Non-handcrafted features can be extracted using the compact binary descriptor (CBD), convolutional neural networks (CNN), or principal component analysis network (PCAN) [29][30][31]. In particular, CNN-based deep features are robust and efficient compared to handcrafted features [18,29,32], due to their independence from prior knowledge and image extraction and their applicability to any image. However, non-handcrafted features have the following limitations: (a) The interpretability of non-handcrafted features is low, which means that it is difficult to describe the learned features [33,34]. (b) It is hard to obtain large datasets for model training [35,36]. Therefore, developing a novel feature selection strategy can potentially identify important features and increase the classification accuracy.
There is little literature on feature selection-based plant disease detection and identification. Some studies manually select top-ranked image features for leaf disease detection using feature ranking strategies [37,38]. Recently, biologically inspired metaheuristic algorithms have been used for efficient feature selection, but rarely for improving image-based plant disease detection [39].
In this study, we proposed an enhanced salp swarm algorithm for feature selection (SSAFS) in image classification of diseased plants. Firstly, for leaf images of diseased plants, we uniformly defined 171 handcrafted features, including 45 color features and 126 texture features. Secondly, SSAFS was applied to each dataset to screen the optimal subsets of handcrafted features. Finally, all the SSAFS-derived potential feature subsets were evaluated with a neural network classifier. During the experiments, we applied SSAFS to 4 datasets from the UCI machine learning repository [40] and verified the performance of the developed method. SSAFS was further tested on 6 phenomics datasets of diseased plants obtained from PlantVillage [7]. Our results demonstrated that SSAFS not only significantly reduces the number of features but also significantly improves the classification accuracy. Comparison analysis further shows that SSAFS outperforms multiple state-of-the-art (SOTA) methods.
The main contributions of the paper are as follows:
• This is the first study that develops a swarm intelligence algorithm (SIA) for image-based plant disease detection and severity grading estimation.
• An enhanced SSAFS was developed for identifying the valuable handcrafted features of plant images.
• Five kinds of well-known SIA models for feature selection were compared. The feature subset obtained from the SSAFS model achieved the highest classification accuracy.
The rest of this article is organized as follows: The Related Work section gives a brief description of the related works. Details of the SSAFS algorithm and the simulation experiments are described in Materials and Methods. Experimental results are presented and analyzed in Results. The Discussion section discusses the main findings. Finally, the Conclusion section provides the conclusion and future work.

Related Work
Computer vision-based plant disease detection is an active area of research in the field of intelligent phytoprotection. In recent years, there has been growing interest in developing automated systems that can detect and diagnose plant diseases using computer vision techniques [6][41][42][43]. Building ML models for classification tasks is a traditional but effective approach. However, discovering, selecting, and understanding appropriate features are crucial for these classification models. This determines whether the models perform well and whether the features are interpretable. Currently, there are 2 types of computational strategies available for the image-based classification of plant diseases: handcrafted feature-based methods and non-handcrafted feature-based methods.

Handcrafted feature-based methods
Traditional ML models often use "handcrafted" features for object recognition and computer vision [35,44]. These features are manually designed by experts based on their observations to address specific issues, such as occlusions and changes in scale and illumination [17]. The most commonly used manual features include various morphological features, such as color, texture, and shape, to simulate the appearance of the object of interest [45,46]. So far, many algorithms based on handcrafted features have been applied for automatic plant disease recognition [47]. The most representative of these algorithms combine morphological features extracted from leaf images with ML models (e.g., Support Vector Machine [SVM], artificial neural network [ANN], K-Nearest Neighbors Algorithm [KNN], K-means, etc.) to achieve plant disease detection [48][49][50][51][52][53][54]. Although handcrafted features are interpretable, redundant and irrelevant features are unavoidably mixed in, which affects the performance of ML models. Various variable selection methods have been developed via filter-based strategies [55] to select the most robust features for plant image classification, including ReliefF [56], correlation-based methods [57,58], and minimal redundancy-maximal relevance [59]. These filter strategies ignore feature dependencies, which makes removing redundant features difficult [60]. Unlike the filter models, wrapper methods use the prediction performance of the ML model as a metric to help search for the best subset of features [61]. It is worth mentioning that metaheuristic algorithms have been widely used for feature selection in computer vision in order to optimize feature dependencies and feature-classifier interactions [62][63][64][65]. The metaheuristic algorithms applied in this context include the Genetic Algorithm [54], Particle Swarm Optimization (PSO) [66][67][68][69], the Squirrel search algorithm [70], and Artificial Bee Colony (ABC) [71,72].
However, the potential of these algorithms may be weakened when faced with high-dimensional problems [73]. Therefore, developing efficient and generalized feature selection algorithms for plant disease detection is still challenging. Overall, handcrafted feature-based methods have played an essential role in the field of plant phenomics processing. However, their obvious limitation is overdependence on expert knowledge.

Non-handcrafted feature-based methods
Non-handcrafted features can be extracted in 3 ways: CNN [17], PCAN [74], and CBD [31]. Among them, CNN-based deep features are the most representative and have gradually superseded shallow classifiers trained using handcrafted features [17,19,20,47]. The advantages of CNN over traditional methods include hierarchical feature representation, exponentially improved expressive capability, and multi-task joint optimization [20]. Because of these advantages, CNN has been widely applied in image classification [19,75,76], face recognition [77][78][79], and video analysis [80][81][82] tasks. Similarly, CNN models have been introduced to agricultural applications, e.g., disease recognition [7], weed detection [83], flower counting [84], and fruit grading [85]. One of the advantages of using CNNs for learning is that they are able to automatically learn and extract relevant features from raw input data, thereby eliminating the need for manual feature engineering [47]. In plant disease detection, acquiring large and diverse samples can be challenging due to many factors. Therefore, transfer learning provides an effective way to train CNN classifiers. Researchers have developed extensive CNN models and extracted deep features for image-based disease classification and severity grading of various plant diseases, including cucumber leaf disease [86,87], nutritional deficiencies and damage on apple leaves [88], banana and tomato leaf diseases [89,90], pest diseases on rice [91], and leaf spot disease in sugar beet [92]. However, the prediction accuracy provided by deep learning techniques depends on sufficient training samples. Moreover, when a training set does not have samples belonging to a certain class, the decision boundary may be overstrained [93]. In addition, the uninterpretable high dimensionality of deep features can lead to higher computational costs. According to the above summary, we believe that there is still much room for improvement in both approaches.
Therefore, in this study, we use a feature selection strategy to optimize the performance of handcrafted-based plant disease detection.

SSAFS algorithm for Diseased Plant Classification
The SSAFS-derived feature selection for diseased plant classification includes 4 steps (Fig. 1), which are explained in detail in the following subsections.

Image preprocessing
Before feature extraction, image preprocessing is required. Firstly, we remove the background and edges from each raw image and keep only the effective area of the leaf so that the extracted local features can characterize the lesions in the leaf image. In this study, we used the GrabCut algorithm [94] to implement foreground segmentation (Fig. S1). Secondly, each image is further converted to 5 color spaces (RGB, HSV, Lab, YCrCb, and Luv) so that we can extract image-level color features from different color spaces.

Handcrafted feature extraction of plant phenomics
In this study, we defined 2 types of handcrafted features for image classification of diseased plants: color features and texture features. Color features are mainly represented by color moments, including the first-order color moment (average), the second-order color moment (variance), and the third-order color moment (skewness) [95,96]. Since each pixel has 3 color channels in one color space, the color moments of one image have 9 components [97]. Therefore, 5 color spaces provide 45 color features. Moreover, the texture descriptors include GLCM (gray-level co-occurrence matrix) and LBP [98][99][100]. We focus on the texture descriptors in the RGB, HSV, and Lab spaces. In total, we have 36 GLCM-based texture features (contrast, dissimilarity, correlation, and homogeneity) and 90 LBP-based features. Altogether, 171 handcrafted features are extracted from each leaf image with lesions.
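The color-moment computation described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the image is assumed to be already converted to one color space and flattened to a list of 3-channel pixels, and the third moment is taken as the signed cube root of the third central moment, which is one common convention that the text does not specify.

```python
import math


def color_moments(channel):
    """Return (mean, variance, skewness) for one color channel (flat list of values)."""
    n = len(channel)
    mean = sum(channel) / n
    variance = sum((v - mean) ** 2 for v in channel) / n
    third = sum((v - mean) ** 3 for v in channel) / n
    # Assumption: skewness is the signed cube root of the third central moment.
    skewness = math.copysign(abs(third) ** (1.0 / 3.0), third)
    return mean, variance, skewness


def image_color_features(pixels):
    """pixels: list of (c1, c2, c3) tuples in one color space -> 9 color features."""
    features = []
    for ch in range(3):
        features.extend(color_moments([p[ch] for p in pixels]))
    return features


# Toy example: a 2-pixel "image" in one color space.
feats = image_color_features([(0, 10, 5), (2, 10, 5)])
print(len(feats))  # 9 features per color space; 5 color spaces give 45
```

In practice the same routine would be applied to each of the 5 color spaces and the results concatenated.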

Feature selection with the SSAFS algorithm
The salp swarm algorithm (SSA), proposed by Mirjalili et al. [101], is a relatively new metaheuristic algorithm [102,103]. SSA is inspired by the swarming behavior of sea organisms called salps. Salps are barrel-shaped, free-floating tunicates from the family Salpidae. When navigating and foraging in the ocean, salps often float together in chains [104]. The basic idea behind SSA is to imitate the swarming behavior of salps in the deep oceans based on the salp chain. The salp at the head of the chain acts as the leader, and the remaining salps are followers [105]. Each individual represents a candidate solution for the targeted problem (food source). A population of N salp individuals is defined as a 2-dimensional matrix, in which $x_j^1$ denotes the position of the leader at the jth dimension and $x_j^i$ represents the position of the ith follower at the jth dimension ($2 \le i \le N$). When SSA is used for feature selection problems, all solutions are limited to binary values, i.e., $x_j^i \in \{0, 1\}$. Here, $x_j^i$ represents the jth feature (color or texture) in the ith individual. If $x_j^i = 1$, the jth feature is selected. The feature selection framework based on the SSAFS algorithm is shown in Fig. 2.

Population initialization
As shown in Fig. 2, the first step of SSAFS is population initialization. In this step, a swarm of N salp individuals is randomly generated. The quality of the initial population is closely related to the convergence speed of the algorithm [106]. A chaotic map is a nonlinear dynamic system that can generate random numbers with special dynamic characteristics [107]. The generated random numbers exhibit non-repetitiveness, ergodicity, regularity, and unpredictability [108].
In our study, a chaotic map with the ergodic property [108] is employed to initialize uniformly distributed salps and improve the diversity of solutions. In the original SSA, the initial state of the ith salp is defined as Eq. 1, where $x_j^i(k)$ represents the position of the ith follower in the jth dimension at iteration k; $ub_j(k)$ and $lb_j(k)$ denote the upper and lower bounds of the jth dimension, respectively. Equation 2 describes the logistic mapping for $o_k$, where μ is the bifurcation parameter of the logistic mapping. Because feature selection is a binary problem, we need a transfer function to derive the binary version of SSA from the continuous version [109,110]. Therefore, the variable $x_j^i(k)$ in Eq. 1 is further converted to a Boolean state through Eqs. 3 and 4. In Eq. 3, $x_j^i(k)$ is equal to 1 if the jth feature in the ith individual is selected; otherwise, $x_j^i(k) = 0$.
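A minimal sketch of chaotic initialization followed by binarization is given below. It assumes the standard logistic map $o_{k+1} = \mu\, o_k (1 - o_k)$ and a linear scaling of each chaotic value into $[lb, ub]$. The transfer function is assumed to be a sigmoid with a stochastic threshold, a common choice in binary swarm algorithms; it is not necessarily the form used in Eqs. 3 and 4. All function names and the seed value 0.37 are hypothetical.

```python
import math
import random


def logistic_sequence(length, mu=4.0, seed_value=0.37):
    """Generate `length` values of the logistic map o_{k+1} = mu * o_k * (1 - o_k)."""
    values = []
    o = seed_value
    for _ in range(length):
        o = mu * o * (1.0 - o)
        values.append(o)
    return values


def init_population(n_salps, n_features, lb=0.0, ub=1.0, mu=4.0, seed_value=0.37):
    """Fill an n_salps x n_features matrix with chaotic values scaled into [lb, ub]."""
    chaos = logistic_sequence(n_salps * n_features, mu, seed_value)
    return [
        [lb + chaos[i * n_features + j] * (ub - lb) for j in range(n_features)]
        for i in range(n_salps)
    ]


def binarize(position, rng):
    """Assumed S-shaped transfer: select feature j with probability sigmoid(x_j)."""
    mask = []
    for x in position:
        prob = 1.0 / (1.0 + math.exp(-x))
        mask.append(1 if rng.random() < prob else 0)
    return mask


population = init_population(n_salps=4, n_features=6)
mask = binarize(population[0], random.Random(0))
print(mask)  # a 0/1 list of length 6 marking the selected features
```

With μ = 4 the logistic map is in its fully chaotic regime, so successive values spread over the whole unit interval rather than clustering.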

Fitness calculation
In SIAs, the fitness function is an important metric to evaluate the strengths of individuals within a population [111]. The fitness value reflects the goodness of fit of each candidate solution to the targeted problem [112,113]. Therefore, the selection of the fitness function determines the balance of the multi-objective algorithm in the optimization process. As a multi-objective problem, feature selection tries to minimize the subset of selected features and maximize the accuracy of the output for a given classifier simultaneously [114]. On this basis, the fitness function, built to achieve a balance between the 2 objectives, is defined in Eq. 5.
where the function Err(*) denotes the classification error of the potential feature subset FS, and |FS| and N denote the number of selected features and the total number of features. The coefficients ρ and ϕ are the balance parameters that control the classification accuracy and the rate of features being selected. In addition, ρ and ϕ satisfy that ρ ∈ [0, 1] and ρ + ϕ = 1. A smaller fitness value is better.
The k-NN algorithm is a non-parametric supervised learning algorithm. It relies on the closest k labeled instances (neighbors) to learn a function that produces an appropriate prediction for a given unlabeled example [115]. Here, the k-NN model with k = 3 was employed as the classification method to evaluate the feature subsets generated by SSAFS. Specifically, we use 5-fold cross-validation of k-NN to calculate the value Err(*) of a feature subset.
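The fitness evaluation described above can be sketched as follows, assuming the weighted-sum form $\text{Fitness} = \rho \cdot Err(FS) + \phi \cdot |FS| / N$ with $\phi = 1 - \rho$, which matches the definitions given for Eq. 5. The pure-Python 3-NN with 5-fold cross-validation is a simplified stand-in for a library classifier; the function names, the interleaved fold assignment, and the treatment of an empty subset as invalid are assumptions.

```python
from collections import Counter


def knn_cv_error(X, y, mask, k=3, folds=5):
    """Cross-validated k-NN error rate using only the features where mask[j] == 1."""
    cols = [j for j, bit in enumerate(mask) if bit]
    rows = [[row[j] for j in cols] for row in X]
    n = len(rows)
    errors = 0
    for f in range(folds):
        # Assumption: folds are assigned by interleaving sample indices.
        test_idx = [i for i in range(n) if i % folds == f]
        train_idx = [i for i in range(n) if i % folds != f]
        for t in test_idx:
            # Squared Euclidean distance from the test sample to each training sample.
            dists = sorted(
                (sum((a - b) ** 2 for a, b in zip(rows[t], rows[i])), i)
                for i in train_idx
            )
            votes = Counter(y[i] for _, i in dists[:k])
            if votes.most_common(1)[0][0] != y[t]:
                errors += 1
    return errors / n


def fitness(mask, X, y, rho=0.9):
    """Weighted sum of classification error and selected-feature ratio (smaller is better)."""
    n_selected = sum(mask)
    if n_selected == 0:
        return float("inf")  # assumption: an empty feature subset is invalid
    phi = 1.0 - rho
    return rho * knn_cv_error(X, y, mask) + phi * n_selected / len(mask)


# Toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 5], [0.1, 1], [0.2, 9], [0.05, 3], [0.15, 7],
     [1.0, 4], [1.1, 8], [1.2, 2], [0.95, 6], [1.05, 0]]
y = [0] * 5 + [1] * 5
print(fitness([1, 0], X, y) < fitness([1, 1], X, y))  # the smaller subset scores better
```

With ρ = 0.9 the error term dominates, so a feature is only worth dropping if doing so costs little accuracy.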

Population Evaluation
The role of the leader of the salp chain is to search for the food source. Hence, the position of the leader is dynamically updated based on the location of the food source. In the original SSA, the leader is updated using Eq. 6, where $x_j^1(k)$ denotes the position of the leader in the jth dimension at the kth iteration, and $F_j(k)$ represents the position of the food source. $ub_j(k)$ and $lb_j(k)$ are the upper and lower boundaries, respectively. Parameters $c_2$ and $c_3$ control the step size and the direction of the next move, respectively ($c_2, c_3 \in [0, 1]$). Specifically, $c_1$ is defined in Eq. 7 for balancing global search and local search capabilities, where k is the index of the current iteration and K is the maximum number of iterations. From Eq. 6, we found that the position of the leader is mainly determined by the position of the current optimal food source. Once the leader falls into a local optimum, the whole population will fall into local stagnation.
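For illustration, the leader update of the original SSA [101] can be sketched as below. In that algorithm the coefficient is $c_1 = 2e^{-(4k/K)^2}$, and the leader moves around the food source by a step of $c_1\big((ub_j - lb_j)c_2 + lb_j\big)$. The switching threshold of 0.5 on $c_3$ and the clamping to the bounds are assumptions taken from common implementations, and the function names are hypothetical.

```python
import math
import random


def c1_schedule(k, K):
    """Coefficient c1 = 2 * exp(-(4k/K)^2); decays from 2 toward 0 over the iterations."""
    return 2.0 * math.exp(-((4.0 * k / K) ** 2))


def update_leader(food, lb, ub, k, K, rng):
    """Move the leader around the food source, dimension by dimension."""
    c1 = c1_schedule(k, K)
    new_pos = []
    for j in range(len(food)):
        c2 = rng.random()  # step-size factor in [0, 1)
        c3 = rng.random()  # direction switch in [0, 1)
        step = c1 * ((ub[j] - lb[j]) * c2 + lb[j])
        # Assumption: threshold 0.5 decides whether to add or subtract the step.
        x = food[j] + step if c3 >= 0.5 else food[j] - step
        new_pos.append(min(max(x, lb[j]), ub[j]))  # clamp to the search bounds
    return new_pos


rng = random.Random(1)
leader = update_leader(food=[0.2, 0.8, 0.5], lb=[0.0] * 3, ub=[1.0] * 3, k=5, K=50, rng=rng)
print(leader)
```

Because $c_1$ shrinks rapidly with k, late iterations keep the leader close to the food source, which is why a poor food source can stall the whole chain.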
To avoid the above issue, we introduce the Sine Cosine Algorithm (SCA) [116] here to improve the strategy of population evolution. The leader in our algorithm is updated through Eqs. 8 and 9. In Eq. 8, $r_2$ is a parameter in the interval $[0, 2\pi]$, which determines the neighborhood around the current solution in the search space. The coefficient $r_3$ regulates the speed of the search process. $r_4$ is used to switch the updating strategy of the leader between the sine and cosine components. In particular, $r_1$ is a random number uniformly distributed in the range [0,1] that determines the direction of the current solution. In the early stage of the iterations, $r_1$ helps to explore the search space, while in the later stage it contributes to exploiting the available search space.
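As a point of reference, the position update of the standard SCA [116] can be sketched as follows. This is the generic SCA rule, $x' = x + r_1 \sin(r_2)\,|r_3 P - x|$ or its cosine counterpart, and it may differ in detail from Eqs. 8 and 9. The linearly decreasing $r_1 = a(1 - k/K)$ with a = 2, the range [0, 2] for $r_3$, and the function name are assumptions taken from the standard algorithm.

```python
import math
import random


def sca_update_leader(leader, food, k, K, rng, a=2.0):
    """Standard sine-cosine move of the leader toward/around the food source."""
    # Assumption: r1 decreases linearly from a to 0, shifting exploration to exploitation.
    r1 = a * (1.0 - k / K)
    new_pos = []
    for x, p in zip(leader, food):
        r2 = rng.uniform(0.0, 2.0 * math.pi)  # angle in [0, 2*pi]
        r3 = rng.uniform(0.0, 2.0)            # random weight on the food source
        r4 = rng.random()                     # switch between sine and cosine
        trig = math.sin(r2) if r4 < 0.5 else math.cos(r2)
        new_pos.append(x + r1 * trig * abs(r3 * p - x))
    return new_pos


rng = random.Random(7)
print(sca_update_leader([0.3, 0.6], [0.5, 0.5], k=10, K=50, rng=rng))
```

Because the sine and cosine terms take both signs, the leader can move away from the current food source as well as toward it, which is what helps it escape a local optimum.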
Furthermore, the state of the follower is defined in terms of the initial velocity $v_0$ and the acceleration $a$, where $a = v_{\text{final}}/v_0$. Overall, the location of the ith follower in the next iteration is jointly determined by its current and previous position.
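In the original SSA [101], the follower rule reduces to averaging each follower's position with that of the salp directly ahead of it in the chain, $x_j^i(k+1) = \tfrac{1}{2}\big(x_j^i(k) + x_j^{i-1}(k)\big)$. The sketch below assumes this standard form; the function name is hypothetical, and the positions from the current iteration are used for every follower, whereas some implementations update the chain in place.

```python
def update_followers(population):
    """Return a new population in which each follower moves halfway toward its predecessor."""
    new_pop = [list(population[0])]  # row 0 is the leader, updated separately
    for i in range(1, len(population)):
        prev = population[i - 1]
        cur = population[i]
        new_pop.append([0.5 * (c + p) for c, p in zip(cur, prev)])
    return new_pop


chain = [[0.0, 0.0], [2.0, 4.0], [4.0, 8.0]]
print(update_followers(chain))  # [[0.0, 0.0], [1.0, 2.0], [3.0, 6.0]]
```

This chained averaging is what propagates the leader's movement gradually down the swarm.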

Classification performance evaluation based on an ANN
After obtaining the optimal feature subset through the SSAFS algorithm, we constructed an ANN model to investigate the performance for image classification. As shown in Fig. S2, there are 4 layers involved in this network, including an input layer, an output layer, and 2 hidden layers. The number of neurons in the input layer equals the dimensionality of an optimal feature subset obtained from SSAFS. The output layer consists of 3 neurons with a Boolean state, which indicates the discriminative results of image classification.

Data collection
In the experiments, we first applied the SSAFS approach to 4 UCI datasets, including Heart, Urban Land Cover, Arrhythmia, and CNAE-9 (Table 1). Secondly, we collected a set of diseased leaf images of apples, corn, grapes, and coffee from PlantVillage [7]. Based on the annotation information, we categorized these phenomics data into 2 groups. The first group contains 4 datasets for testing the classification performance of a classifier for different diseases on the same crop (Table 2). The second group includes 2 datasets for testing the classification (severity grading) performance of a classifier for different severity levels of the same disease (Table 3).

Experiment design
We applied SSAFS to 4 UCI datasets with different scales to verify its effectiveness. We then further tested SSAFS on 6 plant phenomics datasets. After obtaining the optimal feature subset of each dataset, a neural network classifier was constructed to test its prediction power. The training and testing sets were split at a ratio of 4:1. Moreover, we evaluated whether SSAFS provides faster convergence and stronger robustness. We ran our method with 30 replicates to avoid the impact of population initialization on the model output. Finally, we selected PSO [117], ABC [118], Improved Binary Grey Wolf Optimization (IBGWO) [119], the Squirrel search algorithm (abbreviated as "Squirrel") [120], and the standard SSA [101] as baseline algorithms for comparison analysis, which potentially reveals the advantages of our method.

Parameter optimization
All the simulations were performed using Python 3.6 in Jupyter Notebook on Ubuntu 18.04 with 6×Xeon E5-2678 v3 and 128 GB RAM. In the SSAFS framework, we set the maximum number of iterations K to 50. Other parameters were set as follows: μ = 4, $c_1$ = 6, ρ = 0.9, and a = 2. For the ANN classifier, the numbers of neurons in the 2 hidden layers are 128 and 32, respectively. Sigmoid was used as the activation function in the output layer. The number of epochs in the ANN model is 200. In the comparison analysis, the default parameters are presented in Table 4.

Performance evaluation
In this study, the fitness score shown in Eq. 5 is defined to evaluate the goodness of any potential optimal feature subset. Accuracy represents the classification performance of a feature subset on a classifier. The dimensionality for a feature subset reflects how many important features potentially contribute to classification. Table 5 shows the average performance of SSAFS and the other 5 SOTA algorithms on 4 UCI datasets. Due to the low dimensionality of the Heart dataset, the classification performance and the number of relevant features obtained by the 6 algorithms are very close. On the dataset Urban Land Cover, SSAFS significantly outperformed other algorithms. For CNAE-9 and Arrhythmia, SSAFS is close to IBGWO and SSA, but is superior to PSO, ABC, and Squirrel. However, the processed dataset is still high-dimensional after feature selection. We found that SSAFS can achieve higher classification accuracy with fewer features. Moreover, we examined the quality of the best solution provided by each algorithm to see the difference in global search. Table 6 indicates that the optimal solution found by SSAFS is better than other comparison methods.

SSAFS works well on the phenomics of diseased plant
SSAFS was first applied to 6 phenomics datasets of the diseased plant (Tables 2 and 3). As described in the Handcrafted feature extraction of plant phenomics section, all of the phenomics datasets share the common 171 handcrafted features. Table 7 indicates that the potentially important features (for each dataset) screened by SSAFS appear to provide the best classification performance in both sample stratification and severity estimation. Compared with other algorithms, the optimal solution output from SSAFS contains relatively fewer features. These differences are significant on the dataset DS_cn_rust. Secondly, we compared the best solution of each phenomics dataset obtained from each algorithm (Table 8). In datasets DS_grape and DS_coffee, SSAFS outputs a combination of ~30 features, which shows high classification accuracy. On average, the dimensionality of the optimal solutions for all the datasets is 31.8, which is obviously superior to PSO, ABC, and SSA.
To further validate the performance of SSAFS, the ANN model is employed as the classifier to evaluate the optimal solutions searched by SSAFS. From Table 9, we find that the optimal feature subsets of the first 4 datasets output by SSAFS still have the highest classification accuracy on ANN models. Comparing Figs. S3 and S4, we found that SSAFS improves on the standard SSA. Finally, the computational cost of all the algorithms was evaluated on 6 plant datasets. From Table 10, we can see that SSAFS works efficiently on datasets with smaller samples. IBGWO also provides low computational costs on DS_coffee and DS_cn_rust. In summary, we suggest that the proposed SSAFS provides a new way to screen valuable handcrafted features for image classification of diseased plants.

The robustness of the SSAFS algorithm
To analyze the robustness, we further evaluated the proposed SSAFS method from 2 aspects. In one aspect, we checked whether the convergence curve of SSAFS was steady across multiple replicates. From Fig. 4, we find that SSAFS shows stable convergence on 5 datasets, the exception being DS_coffee. This indicates that population initialization has no significant impact on the output of SSAFS optimization. In the other aspect, we checked whether the SSAFS algorithm quickly converges to the optimal solution over the iterations. In Fig. 5, we can see that SSAFS not only provides a better solution than the other 3 algorithms but also converges significantly faster. In particular, SSAFS exhibited extremely fast convergence rates on datasets DS_corn (Fig. 5A) and DS_apple (Fig. 5B) and reached global solutions within 10 iterations. Figures S5 and S6 indicate that our method displays good convergence. Overall, we conclude that the proposed method is reliable for feature selection.

Statistical analysis for the SSAFS-derived features of plant phenomics datasets
In this section, we discuss the biological meaning of the potential features calculated by SSAFS. Of the 171 handcrafted features we defined above, 18 important features were present in the optimal feature subsets of at least 3 phenomics datasets (Fig. 6A). Among these 18 features, the proportion of color features is higher than that of texture features, indicating that color features play a more important role in plant image classification. In particular, we noticed that the top 4 color features come from the RGB, Lab, HSV, and YCrCb spaces (Table 11), including 3 variables for "variance" and 1 variable for "average" of the color moment. It seems that variance has stronger stratification power than average and skewness. Moreover, we examined the proportions of color and texture features before and after feature selection. Combining the 6 optimal feature subsets (derived by SSAFS) for all the phenomics datasets, we obtained 73 color features and 137 texture features (42 for GLCM and 95 for LBP). From Fig. 6B, it is evident that only the proportion of color features increases significantly after feature selection. Compared with GLCM, LBP-based features may include some irrelevant variables, which can be removed by SSAFS.

Discussion
In this study, an enhanced SSAFS was developed for image classification of diseased plants. In the proposed method, using chaotic maps for population initialization can effectively improve the diversity of solutions. In addition, the SCA algorithm not only prevents the heuristic search from falling into local optima but also speeds up the convergence rate.
The proposed SSAFS algorithm was validated using 4 UCI public datasets and 6 phenomics datasets of diseased plants.
The results were compared with 5 other heuristic feature selection techniques, namely, PSO, ABC, IBGWO, Squirrel, and SSA. The simulation results indicate that the performance of the SSAFS-based method is better than that of traditional wrapper feature selection techniques. The optimal solutions searched by SSAFS are less dependent on the classifier. More importantly, one of the crucial contributions of this work to plant phenomics is the definition of handcrafted features and the precise screening of relevant features through a novel computational approach. It provides new insight into computer vision-based plant image classification.
Limitations exist in the current work. Although our method shows great convergence and robustness, there is no guarantee that 30 replicates on new datasets are still reasonable. Therefore, we suggest that parallel computing should be assigned at least 100 replicates. Moreover, morphology and shape are also important features that were not considered in the current study. Since the classification accuracy mostly depends on the noise level within the images, verifying the proposed method in outdoor environments is also necessary. Furthermore, more extensive testing on various phenomics datasets will be necessary in the future.

Conclusion
In this study, we proposed an SIA for feature selection (SSAFS). The SSAFS method was used to identify handcrafted features of diseased plant images to obtain classifiers with higher accuracy. Our approach outperforms other swarm intelligence methods in screening the most valuable features. Furthermore, our findings highlight the importance of local features that are critical for disease detection. Future work will adopt other population-based multiobjective methods to deal with similar problems. We propose to combine comprehensive handcrafted and non-handcrafted features of plant images for accurate and efficient detection in the field of phenomics. In addition to leaf images, phenomics data collected from other plant organs are also valuable for plant disease detection.

at Nanjing Agricultural University (No. 106/804001). This work was partially supported by the Natural Science Foundation of Zhejiang province (No. LY20F020003).
Author contributions: X.X.: Data analysis, coding, and writing. F.X.: Coding and visualization. Y.W. and S.L.: Resource and software. K.Y.: Review and editing. H.X.: Validation. Z.J.: Supervision, project administration, funding acquisition, conceptualization, and methodology.
Competing interests: The authors declare that they have no competing interests.

Data Availability
All the processed data and source code can be freely accessed at GitHub: https://github.com/JakeJiUThealth/SSAFS_V1.0.

Supplementary Materials
Fig. S1. GrabCut algorithm for background segmentation of leaf images.
Fig. S2. Neural network classifier for evaluating the optimal feature subsets.
Fig. S3. Performance test of optimal solutions obtained from SSAFS on neural network for 6 plant phenomics datasets.
Fig. S4. Performance test of optimal solutions obtained from SSA on neural network for 6 plant phenomics datasets.
Fig. S5. The distribution of the optimal solutions in parallel computing on the UCI dataset.
Fig. S6. The distribution of the optimal solutions in parallel computing on the plant phenomics dataset.