Pseudo Optimization of E-Nose Data Using Region Selection with Feature Feedback Based on Regularized Linear Discriminant Analysis

In this paper, we present a pseudo optimization method for electronic nose (e-nose) data using region selection with feature feedback based on regularized linear discriminant analysis (R-LDA) to enhance the performance and cost functions of an e-nose system. To implement cost- and performance-effective e-nose systems, the number of channels, sampling time and sensing time of the e-nose must be considered. We propose a method to select both important channels and an important time-horizon by analyzing e-nose sensor data. By extending previous feature feedback results, we obtain a two-dimensional discriminant information map consisting of channels and time units by reverse mapping the feature space to the data space based on R-LDA. The discriminant information map enables optimal channels and time units to be heuristically selected to improve the performance and cost functions. The efficacy of the proposed method is demonstrated experimentally for different volatile organic compounds. In particular, our method is both cost and performance effective for the real implementation of e-nose systems.

Pattern recognition and data mining processes are essential for e-nose systems. Since two-dimensional data are obtained from numerous sensor channels with different characteristics, it is important to utilize efficient and suitable pattern-recognition methods [19][20][21][22][23][24][25][26][27][28][29]. Processing e-nose data effectively can potentially improve the performance of the designed hardware. Moreover, the results can be used to design additional sensors.
Numerous studies have applied pattern-recognition methods, such as feature extraction or selection, to e-nose systems [8][9][10][11][12][13][14][15][16][17][18]. In [11], template matching was adopted to classify e-nose data in two-dimensional image form. In [12][13][14][15], linear discriminant analysis (LDA), support vector machine (SVM) and relevance vector machine (RVM) were used for classification. Various optimization-like techniques have also been proposed to reduce the number of sensor arrays [16,17] and the processing time-horizon [18]. In [17], a rough set-based optimization technique was proposed to select sensor channels. In [16,18], feature feedback-based pattern-recognition methods were proposed for e-nose systems. In [24], feature feedback is introduced as a data refinement technique to reduce the redundancy of a high-dimensional face image dataset. For the e-nose dataset used in our paper, channel selection [16] and time-horizon selection [18] can be achieved by reverse mapping from the feature space to the original data space, using principal component analysis (PCA) and LDA (PCA + LDA). By retaining the important parts of the original data and discarding redundant data, the sensor array was further optimized, and the classification process was made more efficient; specifically, the classification rate, processing time, memory size, etc., demonstrated that the recognition performance was preserved or slightly improved.
We present a pseudo optimization method for e-nose data using feature feedback based on regularized LDA (R-LDA) [28,29] to enhance the performance and cost functions of the e-nose system. Since R-LDA [20] outperforms PCA + LDA in face recognition problems, we expect that the feature feedback using R-LDA will outperform that using PCA + LDA [16,18].
By extending previous feature feedback results [16,18], we obtain a two-dimensional discriminant information map, which is subsequently used to implement a region-based data selection method. In the two-dimensional map, each rectangular region consists of continuous rows and columns that correspond to continuous sensor channels and time units, respectively. In this scheme, important data are defined based on the region from which data are selected from the two-dimensional map; the sensor channels and time units that form the selected region are considered important data. This important information facilitates the improvement of the performance and cost functions. Experimental results for different volatile organic compounds [11] show that our method can classify data better than other existing methods. Furthermore, our method is both cost and performance effective for the real implementation of e-nose systems. This paper is organized as follows. In Section 2, we review existing literature on R-LDA-based feature feedback. We present region selection methods in Section 3 and present our experimental results in Section 4. In Section 5, we state our concluding remarks.

Feature Feedback
In [24], feature feedback is proposed as a data pre-processing algorithm to identify important data and eliminate redundant data in training and test sets. To reduce the dimension of input data, feature feedback uses several common feature extraction techniques, such as PCA, LDA or R-LDA, to create a feature mask that is then used as a reverse mapping from the feature space to the input space. Figure 1 illustrates the main idea of feature feedback compared with other feature extraction methods in a classification system. Instead of directly using the extracted features for the classification, in feature feedback, these features are used to revert to the original data as a data-refinement process. To accomplish this, as shown in Figure 2, feature feedback uses these extracted features to create a feature mask and multiplies the original data by this mask. The feature mask obtained from the feature feedback stage is a binary mask in which "1" elements indicate important pieces of the mask and "0" elements represent unimportant parts. Consequently, the pixels in the input samples that are important for the classification can be selected by means of this feature mask.

Regularized Linear Discriminant Analysis
In this section, we briefly introduce the concept of R-LDA from the viewpoint of improving the LDA method [19]. R-LDA attempts to solve the small sample size (SSS) problem. Let $\{Z_i\}_{i=1}^{C}$ be a training set of $C$ classes, where each class $Z_i = \{z_{ij}\}_{j=1}^{C_i}$ contains $C_i$ samples, for a total of $N = \sum_{i=1}^{C} C_i$ samples. For convenience, each sample is represented as a J-dimensional vector ($J = I_x \times I_h$) by lexicographic ordering. LDA finds the set of feature basis vectors (Fisherfaces), denoted by $\{w_m\}_{m=1}^{M}$, which are used to construct the feature space for classification. LDA performs dimensionality reduction, while preserving as much of the class discriminant information as possible. This is achieved by simultaneously maximizing the determinant of the between-class scatter matrix and minimizing the determinant of the within-class scatter matrix. With $W = [w_1, \ldots, w_M]$, the objective function of LDA can be written as follows:

$$W = \arg\max_{W} \frac{\left| W^{T} S_B W \right|}{\left| W^{T} S_W W \right|} \tag{1}$$

where $S_B$ and $S_W$ are the between-class and within-class scatter matrices, respectively, defined as follows:

$$S_B = \frac{1}{N} \sum_{i=1}^{C} C_i \, (\mu_i - \mu)(\mu_i - \mu)^{T} \tag{2}$$

$$S_W = \frac{1}{N} \sum_{i=1}^{C} \sum_{j=1}^{C_i} (z_{ij} - \mu_i)(z_{ij} - \mu_i)^{T} \tag{3}$$

Here, $\mu_i = \frac{1}{C_i} \sum_{j=1}^{C_i} z_{ij}$ and $\mu$ are the means of the class $Z_i$ and of the entire training data, respectively. The optimization problem in Equation (1) is equivalent to the following generalized eigenvalue problem:

$$S_B w_m = \lambda_m S_W w_m, \quad m = 1, \ldots, M \tag{4}$$

The PCA + LDA method attempts to solve the SSS problem by performing PCA [23] before LDA, which results in $S_W$ being non-singular. However, since the PCA step may discard dimensions that contain important discriminative information, the PCA + LDA method does not give the best solution to the SSS problem. To overcome this problem, R-LDA, an extended version of LDA, was developed [20]. The regularized Fisher's criterion can be expressed as follows:

$$W = \arg\max_{W} \frac{\left| W^{T} S_B W \right|}{\left| \eta \, (W^{T} S_B W) + (W^{T} S_W W) \right|} \tag{5}$$

where $0 \le \eta \le 1$ is a regularization parameter. The proof of the equivalence between Equations (1) and (5) can be found in [20]. The scatter matrices and objective functions for PCA, LDA and R-LDA are shown in Table 1. In Table 1, the columns of $W^F$, where $F \in \{P, L, R\}$, are the projection vectors. These vectors are used to represent the sample $x_k$ as a low-dimensional feature vector, obtained by projecting $x_k$ onto the columns of $W^F$.

Table 1. Characteristics of PCA, LDA and regularized linear discriminant analysis (R-LDA).

Method | Scatter Matrix Used | Objective Function

$\mu$: mean of the whole training samples; $\mu_i$: mean of the samples belonging to class $c_i$, which has $N_i$ samples; $\eta$: regularization parameter ($0 \le \eta \le 1$).
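To make the regularized criterion concrete, the following is a minimal NumPy sketch of one way to obtain projection vectors that maximize the ratio of between-class scatter to the regularized denominator $\eta S_B + S_W$. It works inside the range of $S_B$, so it remains usable when there are fewer samples than dimensions. The function name `rlda_projection`, the 1/N scaling of the scatter matrices and the eigenvalue tolerance are assumptions made for illustration; this is not necessarily the implementation used in [20] or in the experiments reported here.

```python
import numpy as np

def rlda_projection(X, y, eta=0.01, n_f=3):
    """Sketch of regularized LDA (hypothetical helper, not the paper's code).

    X: (N, J) array of samples, y: (N,) array of class labels.
    Returns W of shape (J, n_f) and the criterion value of each column.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    N = X.shape[0]
    mu = X.mean(axis=0)
    classes = np.unique(y)

    # Factor S_B = Phi_b Phi_b^T with one column per class.
    Phi_b = np.stack(
        [np.sqrt((y == c).sum() / N) * (X[y == c].mean(axis=0) - mu) for c in classes],
        axis=1,
    )
    # Factor S_W = Phi_w Phi_w^T with one column per (class-centred) sample.
    Xc = X.copy()
    for c in classes:
        Xc[y == c] -= X[y == c].mean(axis=0)
    Phi_w = Xc.T / np.sqrt(N)

    # Eigenvectors of S_B with non-zero eigenvalues, via the small C x C matrix.
    lam_b, V = np.linalg.eigh(Phi_b.T @ Phi_b)
    keep = lam_b > 1e-10 * lam_b.max()          # assumed tolerance
    lam_b, V = lam_b[keep], V[:, keep]
    U = Phi_b @ V / np.sqrt(lam_b)              # orthonormal basis of range(S_B)
    H = U / np.sqrt(lam_b)                      # whitening: H^T S_B H = I

    # Diagonalize the within-class scatter inside that subspace.
    A = H.T @ Phi_w
    lam_w, P = np.linalg.eigh(A @ A.T)
    lam_w = np.clip(lam_w, 0.0, None)

    # Per-direction criterion value is 1 / (eta + lam_w); keep the n_f largest.
    crit = 1.0 / (eta + lam_w)
    order = np.argsort(-crit)[:n_f]
    W = H @ P[:, order] / np.sqrt(eta + lam_w[order])
    return W, crit[order]
```

With this scaling, the returned columns satisfy $W^{T}(\eta S_B + S_W)W = I$, and the number of available directions is at most the number of classes minus one.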

Feature Feedback Using R-LDA
In [28,29], a basic form of R-LDA-based feature feedback is introduced. To evaluate the relative importance of the information in each variable for classification, the relationship between the basis of the feature space and the input variables is analyzed. After the useful features from the training data are extracted using R-LDA, a feature mask is constructed from the feature-related region. This mask is then used to refine the input data, including both the test and training sets. Since the R-LDA method has the ability to extract significant features for classification, the feedback step using the feature mask can effectively identify important regions, as well as eliminate redundant regions from the input data. Consequently, feature feedback based on the R-LDA method is expected to improve the classification performance.
All of the necessary steps regarding the experiment are shown in Figure 3. The procedure is as follows:
• Step 1: Projection vectors are extracted from the training data using R-LDA. Since the projection vectors corresponding to large eigenvalues are more significant feature bases, the first $n_f$ projection vectors with large eigenvalues are selected for feature feedback.
• Step 2: A feature mask is constructed by summing the $n_f$ projection vectors extracted in Step 1. In the feature mask, the N elements with the largest values are set to one, while the remaining elements are set to zero, i.e., the final mask from R-LDA contains only ones and zeros.
• Step 3: The input data are refined using the final mask. The elements in the input data corresponding to one in the final mask are selected and utilized for classification, and the remaining elements in the input data are eliminated.
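Steps 2 and 3 above can be sketched in a few lines of NumPy. The function names `feature_mask` and `refine` are hypothetical, and one detail is an assumption: "large values" is read here as large magnitude, so the absolute values of the projection vectors are summed before the N largest entries are selected.

```python
import numpy as np

def feature_mask(W, n_selected):
    """Build a binary feature mask from the selected projection vectors.

    W: (J, n_f) array whose columns are the n_f projection vectors.
    n_selected: number N of mask elements that are set to one.
    """
    # Assumption: importance is measured by magnitude, summed over the vectors.
    score = np.abs(W).sum(axis=1)
    mask = np.zeros(score.shape[0], dtype=np.uint8)
    top = np.argsort(-score, kind="stable")[:n_selected]
    mask[top] = 1
    return mask

def refine(X, mask):
    """Keep only the input elements whose mask entry is one.

    X: (n_samples, J) array; returns an (n_samples, N) array.
    """
    return X[:, mask.astype(bool)]
```

The same mask is applied to both the training and the test samples, so that the refined data live in the same reduced input space.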

Channel and Time-Horizon Selection Using Feature Feedback with R-LDA
In this section, we present a method to extract the important data from a two-dimensional discriminant information map for classification. In [18], a one-dimensional discriminant information map, considering only the time-horizon dimension, is proposed to implement the time-horizon data selection method. On the other hand, the channel selection method in [16] considers only the channel dimension for the implementation of data selection. Since the region selection method in this paper considers both channel and time-horizon dimensions for data selection, a corresponding two-dimensional discriminant information map is created to implement the data selection process. Our method consists of two stages: In the first stage, we derive a two-dimensional discriminant information map using feature feedback based on R-LDA. In the second stage, the channels and time-horizons are selected simultaneously based on the two-dimensional map for classification. The procedure of our channel and time-horizon selection method is shown in Figure 4.

Two-Dimensional Discriminant Information Map for Channels and Time Units
We first measure the distribution of the discriminant information in the data sample by using the feature feedback [18,24]. We then construct a two-dimensional discriminant information map $M^D$, which is used as a reference for selecting the channel and time section.
The amount of discriminant information in each element of the data samples is based on the projection vectors of R-LDA, $w^R_l$, where $l = 1, \ldots, n_f$. For each projection vector $w^R_l$, the magnitude of its i-th element $w^R_{li}$ reflects the amount of discriminant information in the i-th element of the data sample. Therefore, we construct a map $m^R_l$ for each $w^R_l$ representing the distribution of discriminant information in the data sample. We then merge the maps $m^R_l$, $l = 1, \ldots, n_f$, to obtain a single map $m^D$. Each value $m^D_i$, $i = 1, \ldots, J$, of $m^D$ indicates the relative amount of discriminant information in element $x_{ki}$ of $x_k = [x_{k1}, \ldots, x_{kJ}]^T$. For data reduction and normalization, we replace the values of the N largest elements $m^D_i$ with one and set the remaining entries to zero.
In our e-nose system, we use a gas sensor array chip consisting of 16 separate channels to collect vapor data samples [16]. Each data sample is acquired through 16 channels over 2000 time points ranging from 0 to 2 s. In this scheme, one data sample is represented by a vector $x_k \in \mathbb{R}^{32000}$ in the 32,000-dimensional input space. The typical multi-sensor time-response of the toluene vapor [16] is shown in Figure 5. The discriminant information map $m^D$ is represented by a 32,000-dimensional vector, similar to the input space and projection vectors produced by R-LDA.
The map $m^D$ is rearranged into the two-dimensional map $M^D$, in which each row corresponds to a channel and is divided into time units. The map $M^D$ is then represented at the unit level by the map $U^D$, in which each element $u^D_{ij}$ indicates the number of elements that equal one in the j-th unit of the i-th row. In this scheme, the higher the value of $u^D_{ij}$, the more important the role that the j-th unit in the i-th row of map $M^D$ plays. The above process is implemented as follows:
• Step 1: Apply R-LDA to the training data $x_k = [x_{k1}, \ldots, x_{kn}]^T$, $k = 1, \ldots, N$, to obtain $n_f$ projection vectors $w^R_l$, $l = 1, \ldots, n_f$.
• Step 2: For each projection vector $w^R_l$, construct the J-dimensional map $m^R_l = [m^R_{l1}, \ldots, m^R_{lJ}]^T$. The maps are then merged into $m^D$ and binarized, where N denotes the total number of selected pixels. In this scheme, if the element $m^D_i = 1$, the i-th pixel of $m^D$ is considered to be important in this discriminant information map.
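The rearrangement of the binary map $m^D$ into $M^D$ and its unit-level summary $U^D$ can be illustrated with the short NumPy sketch below. It assumes that the 32,000-dimensional vector is stored channel by channel (all 2000 time points of channel 1, then channel 2, and so on) and that each channel is split into 20 equal time units; the ordering and the function name `unit_level_map` are assumptions for illustration.

```python
import numpy as np

def unit_level_map(m_D, n_channels=16, n_points=2000, n_units=20):
    """Rearrange a binary map m_D into M_D and count ones per time unit.

    m_D: binary vector of length n_channels * n_points.
    Returns M_D of shape (n_channels, n_points) and
    U_D of shape (n_channels, n_units).
    """
    # Assumption: channel-major ordering of the flattened sample.
    M_D = np.asarray(m_D).reshape(n_channels, n_points)
    points_per_unit = n_points // n_units
    # u_ij = number of ones in the j-th time unit of the i-th channel.
    U_D = M_D.reshape(n_channels, n_units, points_per_unit).sum(axis=2)
    return M_D, U_D
```

With the default sizes, each time unit covers 100 time points, so every entry of the unit-level map lies between 0 and 100.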

Pseudo Optimization of E-Nose Data Based on the Two-Dimensional Discriminant Map
The two-dimensional discriminant information map $U^D$ defined in the previous section can be used to represent the rearranged discriminant information map $M^D$ at the unit level. In map $U^D$, an element with value one indicates that its corresponding unit in $M^D$ has high discriminant information.
High-value elements in $U^D$ are distributed heterogeneously: they mainly concentrate on certain channels and time-horizon frames, rather than dispersing over the whole map. Thus, we only use the elements of $U^D$ where high values concentrate, i.e., we only choose the elements in important channels and time-horizon frames. The pseudo optimization of e-nose data based on the two-dimensional discriminant information map $U^D$ is implemented as follows:
• Step 1: Divide the two-dimensional discriminant information map $U^D$ into two parts, important and unimportant information, using a window W. The window is defined by a channel range, which represents all of the channels from i to j, and a time range $[TU_m, TU_n]$, which is all of the time-horizon frames from m to n. Elements inside W, containing all of the channels from channel i to j and all of the time-horizon frames from m to n, are important, while the remaining elements are discarded because they are unimportant. To do this, we construct a new map, denoted by $U^M$, from $U^D$, which retains the elements of $U^D$ inside W and discards the rest.
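The window operation in Step 1 amounts to keeping a rectangular block of the unit-level map and zeroing everything else. A minimal sketch follows; the function name `apply_window` and the 1-based, inclusive index convention are assumptions chosen for illustration.

```python
import numpy as np

def apply_window(U_D, ch_from, ch_to, tu_from, tu_to):
    """Return a copy of U_D with every element outside the window set to zero.

    Channels ch_from..ch_to and time units tu_from..tu_to are 1-based and
    inclusive (an assumed convention).
    """
    U_M = np.zeros_like(U_D)
    rows = slice(ch_from - 1, ch_to)
    cols = slice(tu_from - 1, tu_to)
    U_M[rows, cols] = U_D[rows, cols]
    return U_M
```

Because the window is a single rectangle, the selection can be described by just four numbers, which is what makes contiguous channels and time frames convenient to implement in hardware.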

Experimental Results
In this section, we present the applications of the proposed method to the e-nose system described in [11]. Since the efficacy of the proposed region selection method is evaluated by comparison with the previous works in [16,18], we apply our method to the same e-nose dataset. The volatile organic compound (VOC) measurement data consisted of eight classes: acetone, benzene, cyclo-hexane, ethanol, heptane, methanol, propanol and toluene. The dataset contained 160 samples, and each sample ($x_k \in \mathbb{R}^{32000}$) consisted of 32,000 variables that were measured through 16 channels over 2000 time points. To evaluate the classification rates, we performed five-fold cross-validation [25] and computed the average value. In each fold, there were 128 data samples in the training set and 32 data samples in the testing set.
For the experiment using the proposed method, we first found a suitable value of the regularization parameter η in the R-LDA equation. To do this, we applied the R-LDA-based feature feedback mentioned in Subsection 2.3 to the e-nose data for different values of η. For each value of η, we used different numbers N of selected elements in the feature mask obtained using R-LDA. We compared the performances of all cases to determine the best value of η. Figure 6 shows the comparison of classification rates for various values of η. As depicted in Figure 6, the classification rate changes according to both the value of η and the number of selected elements N in the feature mask. Based on these results, we chose η = 0.01 for our experiments.
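The five-fold split described above (160 samples, giving 128 training and 32 test samples per fold) can be sketched as follows. The shuffling, the fixed seed and the function name `k_fold_indices` are assumptions; the source does not state how the samples were assigned to folds.

```python
import numpy as np

def k_fold_indices(n_samples, n_folds=5, seed=0):
    """Yield (train_idx, test_idx) index arrays for k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)          # assumed random assignment
    folds = np.array_split(perm, n_folds)
    for i in range(n_folds):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != i])
        yield train_idx, test_idx
```

The reported classification rate is then the average of the per-fold rates, and the same loop can be repeated for each candidate value of η and N.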

Construction of Two-Dimensional Discriminant Information Map
To construct the one-dimensional discriminant information map $m^D$, we set $n_f = 3$, because the sum of the three largest eigenvalues ($l = 1, \ldots, 3$) accounted for approximately 99% of the total sum of the eigenvalues. Among the 32,000 elements of the discriminant information map $m^D$, we perform experiments using only 8000, 9600, 12,800, 16,000 and 19,200 elements, corresponding to 25%, 30%, 40%, 50% and 60%, respectively, of the highest values in map $m^D$. All of the selected elements were set to one, and the remaining elements were set to zero. By doing this, the discriminant information map $m^D$ was divided into two parts: the important part with all 1 elements and the unimportant part with all 0 elements.
As mentioned in Section 3, in the rearranged discriminant information map $M^D$, we divided the time-horizon [0, 2] s into periods of 100 ms and obtained 20 time units $TU_i$, $i = 0, \ldots, 19$; in other words, each channel was divided into 20 units. The discriminant information map $M^D$ was represented at the unit level by introducing the two-dimensional discriminant information map $U^D$. Figure 7 shows an example of a two-dimensional discriminant information map $U^D$ obtained from 128 training samples of the first experiment. As mentioned earlier, in $U^D$, each element $u^D_{ij}$ indicates the number of high elements in its corresponding location in the rearranged discriminant information map $M^D$. As a result, the higher the value of $u^D_{ij}$ in $U^D$, the more important the role that the j-th unit of the i-th channel plays in the rearranged discriminant information map $M^D$.
Figure 7. Two-dimensional discriminant information map.

Region Selection Based on the Two-Dimensional Discriminant Information Map
For the region selection process, we first divide map $U^D$ into two parts using window W. Figure 8 gives an example to illustrate how we used window W and the two-dimensional discriminant information map $U^D$ for the region selection method. In this figure, the vertical and horizontal edges of W indicate which channels and time-horizon frames are used to extract the important elements from $U^D$. Figure 9 shows the resulting map after the channel and time selections are applied. As mentioned in Section 3.2, we denote this new discriminant map as $U^M$. In our experiment, for different values of N, the window W was determined heuristically by changing the selected region, i.e., the selected channels and time frames. We compared all of the results to choose the best window W for each case. Note that for each map $U^D$, we use only one window W for the selection, i.e., we extracted the elements of continuous channels, as well as continuous time frames. Doing this made our method more practical and easier to implement for real-life applications.
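Once a window has been chosen, it can be applied directly to the raw samples to extract the retained block of channels and time points. The sketch below assumes channel-major ordering of each 32,000-dimensional sample, 100 time points per time unit, and 1-based inclusive window indices; the function name `select_region` is hypothetical.

```python
import numpy as np

def select_region(X, ch_from, ch_to, tu_from, tu_to,
                  n_channels=16, n_points=2000, n_units=20):
    """Extract the data inside a rectangular window from every sample.

    X: (n_samples, n_channels * n_points) array.
    Channels and time units are 1-based and inclusive (assumed convention).
    Returns an array of shape (n_samples, selected_channels * selected_points).
    """
    points_per_unit = n_points // n_units
    # Assumption: each sample is stored channel by channel.
    cube = X.reshape(X.shape[0], n_channels, n_points)
    block = cube[:, ch_from - 1:ch_to,
                 (tu_from - 1) * points_per_unit:tu_to * points_per_unit]
    return block.reshape(X.shape[0], -1)
```

For example, a window covering Channels 3 to 12 and Frames 7 to 20 keeps 10 × 14 × 100 = 14,000 of the 32,000 variables of each sample, i.e., 43.75% of the data.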

Classification Using the Selected Region
In this section, we discuss the results from two experiments conducted to evaluate how our proposed method affects the classification rate. In the first experiment, we compared the classification rate when R-LDA-based feature feedback, with and without the region selection method, was used during the classification stage. In the second experiment, we compared our selection method with other selection methods, such as channel selection and time-horizon selection.
For each experiment, the variables of all samples in the training set were normalized using the mean and variance of the training set. The features used for classification were extracted using the region selection from the discriminant information map created by R-LDA. For the classification stage, the one nearest neighbor algorithm was used as a classifier, the same as in [16,18]. Since the proposed region selection method focuses on the data optimization problem at the feature extraction stage, a simple classifier, such as one nearest neighbor, is used at the classification stage to make it easier to evaluate the effects of our data selection method on the e-nose system performance. In both experiments, the distances between the pairs of samples were measured using the $\ell_2$ norm. Table 2 shows the results from the first experiment. We implemented the R-LDA-based feature feedback with and without the region selection for different values of N. For each value of N, we obtained a different two-dimensional discriminant information map $U^D$; this required using a different window W for region selection. The results in Table 2 clearly show the effects of region selection on the e-nose system. When more than 30% of the highest pixels in map $m^D$ are selected, the recognition rate of the proposed selection method is always higher than that of the R-LDA-based feature feedback without any selection method. In the best case, the average recognition rate when data from the region containing Channels 3 to 12 and time-horizon Frames 7 to 20 are used is 1% higher than that when all of the channels and time-horizons are considered.
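The normalization and one nearest neighbor classification described above can be sketched as follows. Applying the training-set statistics to the test samples as well is an assumption (the text only states that the training samples were normalized with them), as are the function names and the small constant that guards against zero variance.

```python
import numpy as np

def normalize_with_train_stats(X_train, X_test, eps=1e-12):
    """Standardize both sets using the mean and variance of the training set."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std = np.where(std < eps, 1.0, std)   # leave constant variables unscaled
    return (X_train - mean) / std, (X_test - mean) / std

def one_nn_predict(X_train, y_train, X_test):
    """Assign each test sample the label of its nearest training sample (l2 norm)."""
    # Squared l2 distances via ||a||^2 + ||b||^2 - 2 a.b to avoid a large 3-D array.
    d2 = ((X_test ** 2).sum(axis=1)[:, None]
          + (X_train ** 2).sum(axis=1)[None, :]
          - 2.0 * X_test @ X_train.T)
    return np.asarray(y_train)[np.argmin(d2, axis=1)]
```

The classification rate of a fold is then the fraction of test samples whose predicted label matches the true label.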
Note that when using the region containing Channels 3 to 12 and time-horizon Frames 7 to 20, only 44% of the whole data is utilized, and the recognition rate improves by 1%. This suggests that the proposed method has the ability to enhance performance, specifically the processing time, memory size and recognition rate.
Table 3 shows the results from the second experiment. In this experiment, we compared the effects of different methods applied to the e-nose data. When all of the elements in the discriminant information map (without any selection method) are used for classification, the R-LDA-based feature feedback obtains a higher classification rate than the PCA-LDA-based feature feedback. The classification rate improves considerably when a selection method is applied to the discriminant information map. Applying the region selection method not only improves the classification rate, but also greatly reduces the amount of data used from the discriminant information map. The last row of Table 3 shows the result when two windows are used instead of one for region selection (shown in Figure 10). In this case, although the classification rate improves by 0.4%, the amount of data used is much higher than the 17% used when only one window is employed.
Figure 10. Utilizing two windows W for region selection.

Conclusions
In this paper, we presented a region selection scheme applied to a vapor classification system using data from a portable e-nose sensor.
The high-dimensional input data from the e-nose sensor are highly redundant because of measurement noise, sensor errors and unimportant parts of the dataset. As a result, an analysis that uses the original input data requires a large amount of memory, computation time and power. Once the relative importance of data between channels and the time horizon is determined, we can efficiently extract important data for the classification process, while removing the redundant information and, hence, improving the performance of the classification system.
Consequently, we have proposed a region selection method of sensor data using feature feedback. First, we have created a two-dimensional discriminant information map by using R-LDA. Then, from the discriminant information map, we extracted important data by merging useful channels and time-horizon frames. With the region selection method, we can reduce the processing time, required memory, power consumption, and so on. Furthermore, we can improve the performance of classification by eliminating the redundant data. From the experiment for the e-nose system, we have shown that the performance of the classification can be improved in the sense of the classification rate, data processing time and memory size.
As future work, we require a more systematic algorithm to merge the selected channels and time-horizon frames. In addition, experiments with the proposed method on a more complex dataset are also the subject of future work.