Ensemble Feature Selection Approach Based on Feature Ranking for Rice Seed Image Classification

In smart agriculture, computer-vision-based inspection systems are needed to recognize rice seed varieties in place of technical experts. In this paper, we investigate three local descriptors, namely Local Binary Pattern (LBP), Histogram of Oriented Gradients (HOG), and GIST, to characterize rice seed images. However, this approach raises the curse of dimensionality and requires selecting the relevant features to obtain a compact and better representation model. A new ensemble feature selection method is proposed to gather the useful information collected from different single feature selection methods. The experimental results show the efficiency of our proposed method in terms of accuracy.


Introduction
Rice is the most important food source for people in many regions, including Asia, Africa, Latin America, and the Middle East. Products made from rice, both direct and indirect, are indispensable in the daily meals of billions of people around the world. Nowadays, more rice varieties are created with diversified quality and productivity, and different varieties can be mixed during cultivation and trading. There is therefore a practical need for a system that automatically identifies rice seeds based on machine vision. Various works have been proposed for automatic inspection and quality control in agriculture [10]. In the past decade, a great number of local image descriptors [13] have been proposed for characterizing images. Each kind of attribute represents the data in a specific space and has precise spatial meaning and statistical properties.
Different local descriptors, such as Local Binary Pattern (LBP), Histogram of Oriented Gradients (HOG), and GIST, are extracted to create a multi-view image representation. Nhat and Hoang [20] present a method to fuse the features extracted from these three descriptors for facial image classification; canonical correlation analysis is then applied to the concatenated features to obtain a compact representation before feeding them into the classifier. Van and Hoang [24] propose to reduce noisy and irrelevant Local Ternary Pattern (LTP) and HOG features coded in different color spaces for face analysis. Hoai et al. [12] introduce a comparative study of hand-crafted descriptors and Convolutional Neural Networks (CNN) for rice seed image classification. Mebatsion et al. [17] fuse Fourier descriptors and three geometrical features for cereal grain recognition. Duong and Hoang [9] represent rice seed images by HOG features coded in multiple color spaces. Multi-view learning was introduced to exploit complementary information between different views. When concatenating different feature sets, it is evident that not all features contribute equally to the learning task, and some features might even decrease the performance. Thus, feature selection methods are applied as a preprocessing stage to the high-dimensional feature space: they select pertinent and useful features while ignoring redundant and irrelevant information [26]. A novel teacher-student feature selection approach [19] was proposed to find the best low-dimensional representation of data.
Recently, ensemble feature selection has emerged as a new approach that promises to enhance robustness and performance. It performs different feature selection methods in order to find an optimum subset of features: instead of relying on a single selection approach, an ensemble method combines the results of different approaches into a final subset of features. Seijo-Pardo et al. [23] propose to combine different feature selection approaches on heterogeneous data based on a predefined threshold value. Chiew et al. [5] introduce a hybrid ensemble feature selection based on the Cumulative Distribution Function gradient, which can determine the feature cut-off automatically. Drotar et al. [8] propose new ensemble feature selection methods based on different voting techniques such as plurality and Borda count. A complete and detailed review of ensemble feature selection methods is given in [3].
In this paper, we propose a new ensemble feature selection approach based on multi-view descriptors (LBP, HOG, and GIST) extracted from rice seed images. Several feature selection approaches are investigated and combined to find an optimum subset of features with the purpose of enhancing the classification performance. This paper is organized as follows. Section 2 introduces the feature extraction methods based on three local image descriptors. Section 3 presents the proposed ensemble feature selection framework. Section 4 shows the experimental results. Finally, the conclusion is provided in Section 5.

The Feature Extraction Methods
This section briefly reviews three local image descriptors used in experiments for feature extraction.

Local Binary Pattern
The LBP_{P,R}(x_c, y_c) code of each pixel (x_c, y_c) is calculated by comparing the gray value g_c of the central pixel with the gray values {g_p}_{p=0}^{P-1} of its P neighbors on a circle of radius R, as follows [21]:

LBP_{P,R}(x_c, y_c) = Σ_{p=0}^{P-1} ω(g_p − g_c) · 2^p,

where g_c is the gray value of the central pixel, g_p is the gray value of the p-th of the P neighbors, R is the radius of the circle, and ω(g_p − g_c) is defined as:

ω(x) = 1 if x ≥ 0, and ω(x) = 0 otherwise.
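As a minimal sketch, the LBP code map defined above can be computed with scikit-image and summarized as a normalized histogram; the image and the (P, R) = (8, 1) setting below are illustrative assumptions, not the paper's configuration.

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_histogram(gray, P=8, R=1):
    """Compute the LBP code of every pixel and summarize the code map
    as a normalized occurrence histogram (the image-level feature)."""
    codes = local_binary_pattern(gray, P, R, method="default")
    hist, _ = np.histogram(codes, bins=2**P, range=(0, 2**P), density=True)
    return hist

# Toy 8-bit grayscale image standing in for a rice seed image.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64)).astype(np.uint8)
features = lbp_histogram(img)
print(features.shape)  # (256,)
```

With P = 8 neighbors there are 2^8 = 256 possible codes, hence a 256-dimensional histogram per image.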

GIST
GIST was first proposed by Oliva and Torralba [22] in order to classify scenes by representing their overall shape. The primary idea of this approach is based on the Gabor filter:

h(x, y) = exp(−(1/2)(x²/δ_x² + y²/δ_y²)) · exp(2πi(u_0 x + v_0 y)),

where (δ_x, δ_y) define the scale of the Gaussian envelope and (u_0, v_0) the frequency of the modulating sinusoid. Filtering the image with a bank of Gabor filters yields one response map per filter. Since the resulting GIST vector would have many dimensions, its size is reduced by averaging each response map over a 4 × 4 grid. Each image is filtered with a Gabor bank of 4 scales and 8 orientations, creating 32 feature maps of the same size.
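A GIST-style feature can be sketched with scikit-image's Gabor filter: 4 scales × 8 orientations give 32 response maps, each averaged over a 4 × 4 grid. The frequency values and the toy image below are assumptions for illustration, not the parameters used in the paper.

```python
import numpy as np
from skimage.filters import gabor

def gist_like(gray, scales=(0.1, 0.2, 0.3, 0.4), n_orient=8, grid=4):
    """Filter the image with a Gabor bank and pool each response map
    over a grid x grid partition, as in the GIST pipeline above."""
    gray = gray.astype(float)
    h, w = gray.shape
    feats = []
    for freq in scales:                       # 4 scales
        for k in range(n_orient):             # 8 orientations
            theta = k * np.pi / n_orient
            real, imag = gabor(gray, frequency=freq, theta=theta)
            mag = np.hypot(real, imag)        # response magnitude
            # Average the magnitude over each of the grid x grid blocks.
            for i in range(grid):
                for j in range(grid):
                    block = mag[i*h//grid:(i+1)*h//grid,
                                j*w//grid:(j+1)*w//grid]
                    feats.append(block.mean())
    return np.array(feats)

rng = np.random.default_rng(0)
img = rng.random((64, 64))
v = gist_like(img)
print(v.shape)  # (512,): 32 maps x 16 grid cells
```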

Histograms of Oriented Gradients

HOG descriptors are applied to different tasks in machine vision [7], such as human detection [6]. The HOG feature is extracted by counting the occurrences of gradient orientations, based on the gradient angle and the gradient magnitude of local patches of an image. The gradient angle and magnitude at each pixel are computed within an 8 × 8-pixel patch. Next, the 64 gradient vectors are accumulated into 9 angular bins over 0–180° (20° each). The gradient magnitude T and angle K at each position (k, h) of an image J are computed as follows:

T(k, h) = sqrt(J_k(k, h)² + J_h(k, h)²), K(k, h) = arctan(J_h(k, h) / J_k(k, h)),

where J_k and J_h denote the horizontal and vertical derivatives of J.
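The HOG extraction described above (8 × 8-pixel cells, 9 orientation bins over 0–180°) can be sketched with scikit-image; the block-normalization setting and the toy input are illustrative assumptions.

```python
import numpy as np
from skimage.feature import hog

rng = np.random.default_rng(0)
img = rng.random((64, 64))  # stand-in for a grayscale seed image

# 8x8-pixel cells, 9 angular bins, 2x2-cell blocks with L2-Hys normalization.
features = hog(img, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), block_norm="L2-Hys")
print(features.shape)
```

For a 64 × 64 image this yields 8 × 8 cells, hence 7 × 7 overlapping blocks of 2 × 2 cells × 9 bins = 1764 features.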

Ensemble Feature Selection
Dimension reduction has several advantages in terms of data storage, generalization capability, and computing time. Based on the availability of supervised information (i.e., class labels), feature selection techniques can be grouped into two large categories: supervised and unsupervised [1]. Additionally, different feature selection strategies have been proposed based on the evaluation process, such as filter, wrapper, and hybrid methods [11]. Hybrid approaches incorporate both filter and wrapper into a single structure in order to give an effective solution for dimensionality reduction [4]. In order to study the contribution of feature selection approaches to rice seed image classification, we propose to apply several selection approaches to images represented by multi-view descriptors. In the following, we briefly present the common feature selection methods applied in the supervised learning context.
• LASSO (Least Absolute Shrinkage and Selection Operator) performs feature selection under the assumption of a linear dependency between input features and output values. LASSO minimizes the sum of squared residuals subject to the sum of the absolute values of the regression coefficients being less than a constant, which shrinks some regression coefficients exactly to zero [4, 25].
• mRMR (Maximum Relevance and Minimum Redundancy) is a feature selection criterion based on mutual information or distance/similarity scores. The aim is to penalize a feature's relevance by its redundancy with respect to the other selected features [27].
• ReliefF [15] extends Relief [14] to support multiclass problems. ReliefF is a promising heuristic function that may overcome the myopia of current inductive learning algorithms. Kira and Rendell used Relief as a preprocessor to eliminate irrelevant attributes from the data description before learning. ReliefF is general, relatively efficient, and reliable enough to guide the search in the learning process [16].
• CFS (Correlation-based Feature Selection) mainly applies heuristic methods to evaluate the effect of individual features within each group in order to obtain the optimal subset of attributes.
• Fisher [2] identifies a subset of features so that the distances between samples in different classes are as large as possible, while the distances between samples in the same class are as small as possible. Fisher selects the top-ranked features according to their scores.
• ILFS (Infinite Latent Feature Selection) is a technique consisting of three steps: preprocessing; feature weighting based on a fully connected graph whose nodes represent the features; and finally, computing energy scores over path lengths and ranking the features accordingly [18].

Figure 1 presents the proposed ensemble feature selection framework. Each individual feature selection approach has its pros and cons; the aim of this proposition is to combine the strengths of different methods to boost the performance in terms of accuracy. We propose to apply three independent feature selection methods to select the "best" subsets of features. Then, a new ranking is applied to the combined feature space. This can increase the dimension, but it allows collecting the relevant features determined by the different selection methods. Since the goal is to keep only the most relevant features, a final ranking is applied to eliminate the redundant and noisy features.
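The ensemble idea can be sketched as follows, with scikit-learn rankers standing in for the methods above (ANOVA F-score as a Fisher-like criterion, mutual information as an mRMR-like criterion; the paper's actual pipeline uses ReliefF, ILFS, and mRMR). Each ranker keeps a fraction of the features, the kept indices are pooled by union, and a final ranking over the pool trims redundant features. Dataset, fractions, and final size are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import f_classif, mutual_info_classif

# Synthetic stand-in for a multi-view descriptor matrix.
X, y = make_classification(n_samples=200, n_features=50,
                           n_informative=10, random_state=0)

def top_fraction(scores, frac):
    """Indices of the top-scoring fraction of features."""
    k = int(frac * len(scores))
    return set(np.argsort(scores)[::-1][:k])

sel_a = top_fraction(f_classif(X, y)[0], 0.4)                     # ranker 1
sel_b = top_fraction(mutual_info_classif(X, y, random_state=0), 0.4)  # ranker 2

pool = sorted(sel_a | sel_b)  # union of the individually selected subsets

# Final ranking on the pooled subspace removes redundant/noisy features.
final_scores = f_classif(X[:, pool], y)[0]
final = [pool[i] for i in np.argsort(final_scores)[::-1][:10]]
print(len(pool), sorted(final))
```

The union step is what temporarily increases the dimension; the final ranking then reduces it to the retained subset.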

Experimental Results

The rice seed image database comprises six rice seed varieties from northern Vietnam (illustrated in Fig. 2) [9]. We apply the 1-NN and SVM classifiers to evaluate the classification performance via the accuracy rate. Half of the database is selected as the training set and the rest is used as the testing set: we use the hold-out method with a 1/2–1/2 ratio and split the training and testing sets by chessboard decomposition. All experiments are implemented in Matlab 2019a and conducted on a PC with a Xeon 3.08 GHz CPU and 64 GB of RAM. Table 1 shows the accuracy obtained by the 1-NN and SVM classifiers when no feature selection is applied. The first column indicates the features used for representing images: three individual local descriptors, namely LBP, GIST, and HOG, and the concatenation of "LBP + GIST" features. The second column indicates the number of features (dimension) corresponding to each feature type. The third and fourth columns show the accuracy obtained by the 1-NN and SVM classifiers. We observe that the multi-view representation obtained by concatenating multiple features gives better performance; however, it increases the dimension. Moreover, the SVM classifier performs better than the 1-NN classifier, with 94.7% accuracy. The following tables and figures detail the classification in single- or multi-view settings based on the three descriptors:

• LBP: Table 2, Fig. 3(a) and Fig. 3(b),
• GIST: Table 4, Fig. 4(a) and Fig. 4(b),
• HOG: Table 5, Fig. 5(a) and Fig. 5(b),
• LBP + GIST: Table 3, Fig. 6(a) and Fig. 6(b).

Table 2 and Fig. 3 show that the classification performance reaches 53.0% with the 1-NN classifier on the LBP descriptor. After applying 6 different feature selection approaches, we obtain the three best candidates, in decreasing order of accuracy: mRMR (59.0%), ILFS (58.4%), and ReliefF (54.2%). Following the proposed method illustrated in Fig. 1, the 85% of features selected by ReliefF are combined with the 43% of features selected by ILFS. We thus combine the two best subsets of features, determined by ReliefF and ILFS, into a feature space of dimension 983. Next, the mRMR method and the 1-NN classifier are applied again to this vector to remove irrelevant features. Table 6 presents the comparison between single and ensemble feature selection. We observe that the ensemble method outperforms single feature selection methods for all kinds of features with the 1-NN classifier. For example, we gain 1% in accuracy compared to a single feature selection method, and 7% compared with the classification when no selection method is applied. Similar experimental results are obtained using the SVM classifier on single-view descriptors. In terms of dimension, we increase the feature space by combining and selecting the useful information of different single feature selection methods. Depending on whether accuracy or computing time is the priority, an appropriate approach has to be chosen.
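The evaluation protocol used throughout (50/50 hold-out, 1-NN and SVM classifiers, accuracy rate) can be sketched in scikit-learn; the synthetic six-class dataset below is a stand-in for the VNRICE descriptors, and a stratified random split replaces the chessboard decomposition used in the paper.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Six classes, mimicking the six rice seed varieties.
X, y = make_classification(n_samples=300, n_features=60, n_informative=12,
                           n_classes=6, random_state=0)

# 50/50 hold-out split (the paper splits by chessboard decomposition).
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5,
                                          stratify=y, random_state=0)

accs = {}
for name, clf in [("1-NN", KNeighborsClassifier(n_neighbors=1)),
                  ("SVM", SVC(kernel="linear"))]:
    accs[name] = accuracy_score(y_te, clf.fit(X_tr, y_tr).predict(X_te))
    print(f"{name}: {accs[name]:.3f}")
```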

Conclusion
In this paper, we introduced a new ensemble feature selection approach that combines multiple single feature selection methods. A pre-selected subset of features is first determined by each considered feature selection method and its associated classifier. The multiple subsets are then combined to form a final feature space, to which a feature selection method is applied again to eliminate noisy and redundant features. The experimental results on the VNRICE dataset for rice seed image classification have shown the efficiency of the proposed approach.
Future work will determine an appropriate selection method for each type of attribute and investigate different strategies for combining the final feature vectors resulting from the single feature selection methods.