E-Nose Sensor Array Optimization Based on Volatile Compound Concentration Data

Currently, most e-nose studies target laboratory-based applications, and their results cannot be accessed from other locations. To take advantage of the rapidly growing internet of things (IoT) technology, an e-nose device must be efficient. This study proposes a sensor array optimization technique. Whereas previous studies used electrical signal data, our study uses volatile organic compound (VOC) concentration data to minimize the number of sensors. Of the 10 sensors initially used in the e-nose prototype, only 4 were retained. The experimental results showed that, using the KNN algorithm, these 4 sensors could predict the ripeness of banana samples with 80% accuracy. When applied to the final e-nose product, the prediction accuracy was 78%.


Introduction
An e-nose is an electronic instrument designed to recognize smells, gases, or aromas as a human nose does. Although it is not as sensitive as the human nose, the e-nose does not become fatigued, is not affected by colds or flu, and can detect odorless harmful gases such as carbon monoxide.
Current e-noses are mostly designed for stand-alone applications: to access the sample prediction results, the user must be where the e-nose is located. Meanwhile, IoT technology is growing rapidly [1]. Sectors such as transportation, health, logistics, agriculture, and manufacturing have benefited from the application of IoT technology [2]. In general, IoT connects many smart devices, technologies, and applications and enables their automation [3].
An e-nose uses an array of non-specific chemical sensors to produce a unique aroma pattern. An array may contain 8 to 32 sensors, or even more. Using a large number of sensors increases the computing load, increases the amount of data that must be transmitted, and consumes more energy. This conflicts with the resource constraints of IoT infrastructure. In addition, using many sensors can reduce machine learning performance. Sensor array optimization techniques can address these problems [4] - [6].
Metal oxide semiconductor (MOS) sensors are the most widely used type of chemical sensor in e-nose research. The e-nose devices built in previous research [4] - [9] use the voltage (V) or resistance (R) produced by the MOS sensors as input to the pattern recognition system. They do not exploit the ability of MOS sensors to detect various volatile organic compounds (1 to 7 compounds, depending on the sensor).
In this paper, we optimize the e-nose sensor array based on volatile organic compound (VOC) concentration data. As a case study, we built an e-nose to determine the quality of bananas (Musa spp.). We chose bananas because they are the most produced fruit in Indonesia, and few studies have analyzed the relationship between changes in volatile compound concentrations and the chemical and physical properties of bananas during ripening [10].
An e-nose consists of a sample chamber, a sensor array, a signal conditioning system, and a pattern recognition system. The sample chamber holds the samples and isolates the volatile organic compounds they release from the surrounding environment. The sensor array is the core component of the e-nose; it absorbs volatile compounds and converts them into electrical signals. The gas sensor output, an electrical voltage (V) or resistance (R), is then processed by the signal conditioning system, which extracts only the information that is useful to the pattern recognition system. The pattern recognition system predicts, identifies, and classifies odors by comparing new odor data with previously recognized odor data.
An e-nose does not search for a particular gas molecule or compound. Instead, it looks for a unique pattern, a "fingerprint", in the analyzed air. This pattern is obtained using a chemical sensor array. The array consists of several types of chemical sensors, so each sensor responds differently when exposed to an aroma. For example, odorant A may produce a high response in one sensor and lower responses in the others. Odorant B, in turn, might produce high readings in sensors other than the one that "took" to odorant A. These different responses are stored in a database, which is then used to train pattern recognition algorithms for odor identification and classification [13].
The e-nose has attracted enormous attention over the past three decades, with many findings on e-nose applications coming from various fields of applied science. These findings provide many benefits [13]: e-noses can be used to detect disease [14], detect explosives [15], detect toxic odorless gases [16], determine product quality [17], identify plant species [18], and so on.
As summarized by [19], e-nose applications are very diverse, ranging from the garden to the battlefield. More than 12,000 e-nose articles have been published since 1993. Most applications are in the food industry, covering fruit, meat, milk, wine and other drinks, tea, and coffee. An estimated 5,000 publications on these applications have appeared since then, almost half of all e-nose publications [20].
Current e-nose research is laboratory-based and focuses on developing smaller, cheaper devices with better performance. Only a few publications discuss e-nose systems that can monitor samples from other locations. In [9], the authors adopted wireless sensor network (WSN) technology by installing a wireless communication module on their e-nose device. Using PCA to analyze the sensor responses (voltages), they classified young, mature, and rotten mango samples. In [21], the authors used a WSN and pattern recognition web software installed on a local server to predict 12 classes of pollutants in water. They reported 91% and 94% accuracy using a backpropagation (BP) neural network and a radial basis function (RBF) neural network, respectively. In [5], the authors proposed a mobile electronic nose system architecture to predict the level of meat decomposition. The proposed system

Making of e-nose prototype
Development began by collecting the required tools and materials. In this study, the e-nose prototype used an Arduino MEGA 2560 microcontroller and 10 MQ gas sensors mounted on a food storage box. A complete list of tools and materials can be seen in figure 2. Calculating the gas concentration in air with an MQ sensor takes several steps. First, the sensor resistance when exposed to gas (Rs) is calculated with equations (1) and (2). In these equations, ADC is the analog-to-digital conversion result obtained from the Arduino, and Vc is the voltage on the Arduino measured with a multimeter. VRL is the output voltage across the load resistor when the sensor is exposed to gas, and RL is the load resistance obtained from the datasheet.
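Equations (1) and (2) are not reproduced in this section. The sketch below uses the common form of these two steps for a load-resistor MQ circuit: a raw ADC count is converted to VRL, and Rs is then obtained from the voltage divider. The 10-bit full-scale count of 1023 and the function names are assumptions, not the authors' code.

```python
ADC_MAX = 1023  # assumption: full-scale count of the Arduino MEGA 2560's 10-bit ADC


def output_voltage(adc: int, vc: float) -> float:
    """Assumed form of equation (1): V_RL = ADC * Vc / 1023."""
    return adc * vc / ADC_MAX


def sensor_resistance(v_rl: float, vc: float, rl: float) -> float:
    """Assumed form of equation (2): voltage divider, Rs = (Vc - V_RL) / V_RL * RL."""
    if v_rl <= 0:
        raise ValueError("V_RL must be positive")
    return (vc - v_rl) / v_rl * rl


# Hypothetical reading: ADC = 512, Vc = 5.0 V, RL = 10 kOhm (illustrative values only).
v_rl = output_voltage(512, 5.0)
rs = sensor_resistance(v_rl, 5.0, 10.0)
print(f"V_RL = {v_rl:.3f} V, Rs = {rs:.3f} kOhm")
```

Rs comes out in the same unit as RL, so keeping RL in kilo-ohms keeps Rs in kilo-ohms as well.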
Once the sensor resistance is known, equation (3) can be used to calculate the gas concentration in parts per million (ppm).
Ro is the sensor resistance in clean air, without exposure to any gas; this value can be found in each MQ gas sensor's datasheet. The values of α, β, and γ were obtained by curve fitting in LibreOffice Calc. Using these equations and parameters, the Arduino is programmed to read the gas concentrations for 10 minutes at 2-second intervals. The Arduino is also programmed to clean the sample chamber by turning on the exhaust fan, to calibrate the sensors, and to send the readings to a PC or laptop via a USB cable.
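The concentration step and the reading schedule can be sketched as follows. Equation (3) is not reproduced in this section, so the power-law-plus-offset form below is a hypothetical stand-in with three fitted parameters α, β, and γ; the form actually fitted in LibreOffice Calc may differ. The 10-minute window and 2-second interval come from the text. The function names are assumptions.

```python
READ_DURATION_S = 10 * 60  # 10-minute reading window (from the text)
READ_INTERVAL_S = 2        # 2-second reading interval (from the text)


def concentration_ppm(rs: float, ro: float, alpha: float, beta: float, gamma: float) -> float:
    """HYPOTHETICAL form of equation (3): ppm = alpha * (Rs/Ro)**beta + gamma."""
    return alpha * (rs / ro) ** beta + gamma


def reading_timestamps(duration_s: int = READ_DURATION_S, interval_s: int = READ_INTERVAL_S) -> list:
    """Seconds (from the start of a session) at which each gas channel is sampled."""
    return list(range(0, duration_s, interval_s))


print(len(reading_timestamps()))                          # readings per channel per session
print(concentration_ppm(20.0, 10.0, 100.0, -2.0, 5.0))    # illustrative parameter values
```

Under this schedule, one 10-minute session yields 300 readings per gas channel. Shorter windows, such as the 8 minutes examined in the reading-duration tuning, correspond to `reading_timestamps(8 * 60)`, or 240 readings.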

Sampling
The sample chosen in this study was the banana (Musa acuminata × balbisiana). This type of banana produces almost no odor. If the e-nose prototype can distinguish its ripening groups, this suggests that the prototype can also handle samples with stronger odors. The banana samples are divided into two ripening-stage groups, unripe and ripe, by comparing each sample's color and condition. Yellow, bruised, and soft bananas are categorized as ripe, while green, gummy, and hard bananas are categorized as unripe. The collected ground-truth data consists of 50 samples: 25 ripe and 25 unripe bananas.

Sensor array optimization
The optimal sensor array combination was obtained with the following steps:
1. Signal denoising using the Discrete Wavelet Transform (DWT) with the Daubechies wavelet family. The Daubechies wavelets tested range from db1 to db15.
2. Extraction of statistical features from the VOC concentration signals: maximum value, mean, variance, standard deviation, and skewness. The last four features are obtained with equations (4) to (7):
$$\mu = \frac{1}{N}\sum_{n=1}^{N} X_n \tag{4}$$
$$\sigma^2 = \frac{1}{N}\sum_{n=1}^{N} \left(X_n - \mu\right)^2 \tag{5}$$
$$\sigma = \sqrt{\frac{1}{N}\sum_{n=1}^{N} \left(X_n - \mu\right)^2} \tag{6}$$
$$\text{skewness} = \frac{\frac{1}{N}\sum_{n=1}^{N} \left(X_n - \mu\right)^3}{\sigma^3} \tag{7}$$
where X_n is the signal value (VOC concentration in ppm) and N is the number of data points.
3. Feature scaling with min-max normalization, which maps each feature to a value between 0 and 1 using equation (8):
$$z = \frac{x - \min}{\max - \min} \tag{8}$$
where z is the normalized value, x is the original value, and min and max are the minimum and maximum values of the feature.
4. Feature selection using the chi-square statistic, calculated with equation (9):
$$\chi^2 = \sum_{i} \frac{\left(O_i - E_i\right)^2}{E_i} \tag{9}$$
where χ² is the chi-square value, O_i is the observed frequency of class i, and E_i is the expected frequency of class i.
5. Performance evaluation with seven classification algorithms: Logistic Regression, KNN, Decision Tree, Naive Bayes, Support Vector Machine, Linear SVC, and Random Forest. Each algorithm was evaluated with k-fold cross-validation on the ground-truth samples collected earlier.
6. Tuning the parameters of the two best algorithms and tuning the duration of the sample reading.
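Steps 1 to 4 can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation. The denoiser is a single-level db1 (Haar) transform with a hypothetical soft-threshold value, whereas the study tested db1 to db15 and does not state the decomposition level or thresholding rule. The population (1/N) forms of equations (5) to (7) are assumed. The chi-square score treats each min-max-scaled feature value as a class frequency, which is how scikit-learn's `chi2` scorer handles non-negative features. All function names are hypothetical.

```python
import math


def haar_denoise(x, threshold):
    """One-level db1 (Haar) DWT with soft thresholding of the detail coefficients."""
    n = len(x)
    padded = list(x) + ([x[-1]] if n % 2 else [])  # repeat last sample for odd lengths
    s = math.sqrt(2.0)
    pairs = range(0, len(padded), 2)
    approx = [(padded[i] + padded[i + 1]) / s for i in pairs]
    detail = [(padded[i] - padded[i + 1]) / s for i in pairs]
    # Soft thresholding: shrink each detail coefficient toward zero by `threshold`.
    detail = [math.copysign(max(abs(d) - threshold, 0.0), d) for d in detail]
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])  # inverse Haar step
    return out[:n]


def stat_features(x):
    """Max, mean (4), variance (5), std (6), skewness (7); population (1/N) forms assumed."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    std = math.sqrt(var)
    third = sum((v - mean) ** 3 for v in x) / n
    skew = third / std ** 3 if std > 0 else 0.0  # constant signal: define skewness as 0
    return [max(x), mean, var, std, skew]


def min_max_scale(rows):
    """Equation (8) applied column-wise; constant columns are mapped to 0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def chi2_scores(rows, labels):
    """Equation (9) per feature, treating scaled feature values as class frequencies."""
    n = len(labels)
    classes = sorted(set(labels))
    scores = []
    for j in range(len(rows[0])):
        total = sum(row[j] for row in rows)
        score = 0.0
        for c in classes:
            observed = sum(row[j] for row, lab in zip(rows, labels) if lab == c)
            expected = total * labels.count(c) / n  # share of the total expected for class c
            if expected > 0:
                score += (observed - expected) ** 2 / expected
        scores.append(score)
    return scores


def select_k_best(scores, k):
    """Indices of the k features with the highest chi-square scores."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]
```

Applied to data shaped like this study's, `stat_features` would run once per denoised VOC signal (54 per sample). The resulting 270 columns would be scaled with `min_max_scale`, and `select_k_best` would keep the top 25, 20, 15, 10, or 5 features for the evaluation step.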

Final e-nose product
A final e-nose product was built using the sensor combination selected in the sensor array optimization stage. It is configured and programmed like the prototype, except that it is fitted with a WiFi module so that it can communicate and send data to the server. In addition, its components, such as the Arduino, sensor shield, and sample chamber box, are smaller than those of the prototype, and it is equipped with a battery.

Results
The ground-truth data acquisition stage with the e-nose prototype yielded concentration data for 54 volatile compounds (gases) from each sample. Extracting the five statistical features from each compound's signal produced 270 features (54 × 5) for training the machine learning models.
The performance evaluation stage shows two things. First, most machine learning models perform better with preprocessed data. Second, performance varies when 25, 20, 15, 10, or 5 selected features are used: some models perform better than with all 270 features, others worse.
- Table 1 compares the accuracy of the algorithms on raw data and on preprocessed data (signal denoising and normalization). Four algorithms improved, two declined, and one did not change.
- Table 2 compares the seven algorithms when feature selection is performed. KNN and RF achieve the highest performance, both with 15 features. These 15 features come from 4 MQ sensors: MQ2, MQ3, MQ6, and MQ7. The details can be seen in Table 3.
- In the parameter tuning stage, the best-performing numbers of neighbors for KNN are 4, 5, 6, and 10 (figure 3), and the best number of trees for RF is 8 (figure 4). Both algorithms reach 76% accuracy.
- In the reading-duration tuning stage, the highest accuracy was obtained with 8 minutes of data for KNN and 6 minutes of data for RF. Table 4 compares the accuracy of these two algorithms.
The final e-nose device, assembled with the 4 sensors selected in the sensor array optimization stage, is shown in figure 5.
Figure 5. Final e-nose used as a WSN sensor node.
The pattern recognition system of the final e-nose was trained and validated using new data collected from 50 banana samples. With the KNN classifier and 10-fold cross-validation, the final e-nose predicts ripe and unripe bananas with 78% accuracy.
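To make the evaluation procedure concrete, the following is a minimal pure-Python sketch of KNN classification scored with k-fold cross-validation. It is illustrative only. The Euclidean distance, majority vote, random (unstratified) fold assignment, tie-breaking rule, and helper names are assumptions, not details reported for the e-nose's pattern recognition system.

```python
import math
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k=5):
    """Majority vote among the k nearest training samples (Euclidean distance)."""
    nearest = sorted(
        ((math.dist(row, x), label) for row, label in zip(train_X, train_y)),
        key=lambda pair: pair[0],
    )[:k]
    votes = Counter(label for _, label in nearest)
    # Equal counts keep first-encountered order, so a tied vote goes to the class
    # that appears first among the neighbors sorted by distance.
    return votes.most_common(1)[0][0]


def cross_val_accuracy(X, y, n_folds=10, k=5, seed=0):
    """Pooled accuracy over n_folds folds; every sample is held out exactly once."""
    order = list(range(len(X)))
    random.Random(seed).shuffle(order)
    folds = [order[f::n_folds] for f in range(n_folds)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train_idx = [i for i in order if i not in held_out]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        correct += sum(knn_predict(train_X, train_y, X[i], k) == y[i] for i in fold)
    return correct / len(X)


# Toy, well-separated data standing in for selected e-nose features (not study data).
X = [[0.1 * i, 0.0] for i in range(10)] + [[5.0 + 0.1 * i, 5.0] for i in range(10)]
y = ["unripe"] * 10 + ["ripe"] * 10
print(cross_val_accuracy(X, y, n_folds=10, k=5))
```

Tuning the number of neighbors amounts to calling `cross_val_accuracy` for each candidate `k` and keeping the best. Tuning the reading duration amounts to truncating each signal before feature extraction. With even `k` values such as 4, 6, or 10, ties between two classes are possible, which is why the sketch fixes an explicit tie-breaking rule.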

Conclusion
This study proposes an e-nose sensor array optimization technique in which the input to the pattern recognition system is volatile organic compound concentration data. This data also reveals which compounds a sample produces and which of them is dominant. In the sensor array optimization stage, KNN achieved the best accuracy: 80% using 15 features (from 4 MQ sensors) and 8 minutes of data. On the final e-nose product, its accuracy decreased slightly to 78%.