Data set from gas sensor array under flow modulation

Recent studies in neuroscience suggest that sniffing, namely sampling odors actively, plays an important role in the olfactory system, especially in certain scenarios such as novel odorant detection. While the computational advantages of high-frequency sampling have not yet been elucidated, here, in order to motivate further investigation of active sampling strategies, we share the data from an artificial olfactory system made of 16 MOX gas sensors under gas flow modulation. The data were acquired on a custom setup featuring an external mechanical ventilator that emulates the biological respiration cycle. In total, 58 samples were recorded in response to a relatively broad set of 12 gas classes, defined from different binary mixtures of acetone and ethanol in air. The acquired time series show two dominant frequency bands: the low-frequency signal corresponds to a conventional response curve of a sensor in response to a gas pulse, and the high-frequency signal has a clear principal harmonic at the respiration frequency. The data are related to the study in [1], and the data analysis results reported there should be considered as a reference point. The data presented here have been deposited at the web site of the University of California at Irvine (UCI) Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/Gas+sensor+array+under+flow+modulation). The code repository for reproducible analysis of the data is hosted on the GitHub web site (https://github.com/variani/pulmon). The data and code can be used upon citation of [1].

© 2015 Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

Value of the data
The data provide insights into the role of active sampling in the olfactory system. The findings may be particularly relevant to early-detection scenarios.
The data are suitable for different pattern recognition tasks in machine olfaction, mainly multivariate regression with multiple responses.
The dataset can be used to explore sensor redundancy in artificial gas sensor arrays.

Experimental set up
The array was composed of 16 metal-oxide gas sensors of 5 different TGS models from Figaro Inc. [2]. The sensors were configured for 10 different sensor conditioning profiles based on the combination of 5 TGS models and 2 sensor operating temperatures. Table 1 reports the configuration parameters of the sensors. The circuit board with the gas sensor array was placed in a 70 ml inner volume chamber connected to the mechanical ventilator.
The mechanical ventilator is a commercially available device from Harvard Apparatus [3]. It includes a cylinder with a volume of 63.44 cm³, a mechanical pump and three outlets, namely 'Source', 'To Animal' and 'From Animal'. The pump takes air from the 'Source' outlet and pushes the air sample through the 'To Animal' outlet. The system receives the sample again at the 'From Animal' outlet, so that the loop is closed. The chamber with the sensors is connected to both the 'To Animal' and 'From Animal' channels. The resulting gas flow modulation system also controls the air pressure decay and collects the exhaled air. The ventilator cylinder was operated at a fixed frequency of 5 breaths per minute, approximately equivalent to 0.08 Hz.
The acquisition of sensor signals was performed by a PC104-standard embedded computer, which was designed for real-time acquisition, processing and visualization of sensory data for an autonomous mobile robot [4]. The voltage output lines of the circuit board were read by a 16-bit Analog-to-Digital Converter (ADC) board with 16 channels acquiring at a frequency of 25 Hz. The embedded computer ran a custom-built GNU/Linux image designed for fine control of the measurement process. The measurements were split into 5 batches ('batch' attribute), where each batch contained records for approximately all gas classes, given in a random order. All the batches were acquired within a period of 4 days to minimize the effect of the long-term internal and environmental noise present in the system. The 58 samples are distributed among the batches as follows: day-1-morning: 19; day-2-afternoon: 10; day-2-morning: 10; day-3-morning: 11; day-4-afternoon: 8.

Measurement protocol
The measurement protocol was the following: we delivered 10 μL of the corresponding dilution to the vessel using a micropipette. The vessel was connected to the ventilator 'Source' outlet. After 3 min of exposure, the source of the gas vapor was removed from the vessel, and the recovery phase started. During the recovery phase, the system sampled room air for 2 additional minutes to record the decay in the sensor signals. Note that 2 min of recovery was not sufficient for the sensors to return to their baseline and for the initial conditions in the gas chamber to be re-established. Hence, although we recorded only 2 min of the recovery phase, the system kept pumping air until the sensors recovered their baseline and the whole gas sample was exhausted from the gas chamber.
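As a quick consistency check on the timing figures given so far, the following minimal sketch derives the respiration frequency, the number of samples per respiration cycle and the length of one recorded time series from the ventilator setting, the ADC sampling frequency and the protocol durations. The variable names are illustrative and do not come from the published code.

```python
# Values quoted in the text.
BREATHS_PER_MIN = 5   # ventilator setting
SAMPLING_HZ = 25      # ADC sampling frequency
EXPOSURE_MIN = 3      # gas exposure phase
RECOVERY_MIN = 2      # recorded part of the recovery phase

# Derived quantities.
respiration_hz = BREATHS_PER_MIN / 60.0                  # about 0.083 Hz
cycle_seconds = 60.0 / BREATHS_PER_MIN                   # duration of one cycle
samples_per_cycle = round(SAMPLING_HZ * cycle_seconds)   # data points per cycle
record_minutes = EXPOSURE_MIN + RECOVERY_MIN             # recorded duration
points_per_series = SAMPLING_HZ * 60 * record_minutes    # data points per sensor
cycles_per_record = BREATHS_PER_MIN * record_minutes     # cycles per recording

print(f"respiration frequency: {respiration_hz:.4f} Hz")
print(f"samples per cycle:     {samples_per_cycle}")
print(f"points per series:     {points_per_series}")
print(f"cycles per recording:  {cycles_per_record}")
```

Under these values, one respiration cycle lasts 12 s, so each recording spans 25 cycles.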

Signal-processing methods
The readout data were the output voltages of the sensors' conditioning circuit. The 16 acquired time-dependent voltages were converted to resistances according to the voltage-divider scheme and the corresponding load resistor. Hence, each data point in the array describes the resistance of a sensor R(t) at a certain time of measurement t. The resistance values in the data set were normalized by subtracting the baseline value R0 = R(t0) at the starting point of the measurement t0 and scaling by the factor R0, giving (R(t) - R0)/R0. Note that this format of the measured raw data allows the responses of different sensors to be compared. The time-series signal of each sensor was acquired at a sampling frequency of 25 Hz for 5 min, resulting in 7500 data points per time series of a single sensor.
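The conversion and normalization described above can be sketched as follows. This is a minimal illustration, not the code used to build the data set: it assumes the common voltage-divider arrangement in which the output voltage is measured across the load resistor, and the function names, supply voltage and load resistance are hypothetical.

```python
import numpy as np

def voltage_to_resistance(v_out, v_supply, r_load):
    """Sensor resistance from a voltage divider.

    Assumption: v_out is the voltage across the load resistor r_load,
    which is in series with the sensor and fed by v_supply.
    """
    v_out = np.asarray(v_out, dtype=float)
    return r_load * (v_supply - v_out) / v_out

def normalize(resistance):
    """Relative resistance change (R(t) - R0) / R0, with R0 = R(t0)."""
    r = np.asarray(resistance, dtype=float)
    r0 = r[0]  # baseline at the starting point of the measurement
    return (r - r0) / r0

# Hypothetical readings: 5 V supply, 10 kOhm load resistor.
voltages = [1.0, 2.0, 2.5]
resistances = voltage_to_resistance(voltages, v_supply=5.0, r_load=10e3)
relative_change = normalize(resistances)
print(resistances)       # sensor resistances in Ohm
print(relative_change)   # dimensionless, starts at 0
```

Because the normalized value is dimensionless and always starts at zero, responses from sensors with very different baseline resistances can be plotted on a common axis.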
Prior to computing the low-frequency and high-frequency features, the raw data were preprocessed by a set of digital filters. A median filter was used to remove the spikes in the signals. Then we employed two 3rd-order Butterworth filters: a low-pass filter (cut-off frequency 0.01 Hz) and a high-pass filter (pass frequency 0.07 Hz) to extract the low-frequency and high-frequency parts of the original signals, respectively. Note that these low-frequency and high-frequency signals (the outputs of the two Butterworth filters) are not distributed within the data set.
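Since the filtered signals are not distributed with the data set, a reader may want to regenerate them. The sketch below shows one possible way to do so with SciPy; it is not the implementation used in [1]. The filter orders and cut-off frequencies are taken from the text, while the median-filter kernel size and the use of zero-phase (forward-backward) filtering are assumptions, and the function name split_bands is hypothetical.

```python
import numpy as np
from scipy.signal import butter, medfilt, sosfiltfilt

FS = 25.0  # sampling frequency in Hz (from the text)

def split_bands(x, fs=FS, kernel_size=5):
    """Return (low, high) frequency parts of one sensor time series.

    kernel_size and zero-phase filtering are assumptions of this sketch.
    """
    # Median filter to remove spikes.
    x = medfilt(np.asarray(x, dtype=float), kernel_size=kernel_size)
    # 3rd-order Butterworth filters, as second-order sections for stability.
    sos_low = butter(3, 0.01, btype="lowpass", fs=fs, output="sos")
    sos_high = butter(3, 0.07, btype="highpass", fs=fs, output="sos")
    low = sosfiltfilt(sos_low, x)
    high = sosfiltfilt(sos_high, x)
    return low, high

# Synthetic example: slow ramp plus an oscillation at 5 cycles per minute.
t = np.arange(7500) / FS
slow = 0.5 * t / t[-1]
fast = 0.05 * np.sin(2 * np.pi * (5 / 60) * t)
low, high = split_bands(slow + fast)
```

Note that forward-backward filtering applies each filter twice, so the oscillation at the respiration frequency, which lies close to the 0.07 Hz edge, is attenuated more than with a single causal pass; amplitudes obtained this way may therefore differ from the published features.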
For the feature extraction implemented in [1], both low-frequency and high-frequency sensor signals were divided by respiratory cycles, where each cycle was processed independently. Thus, a feature is referred to as a feature by respiratory cycle. Since the high-frequency signals showed oscillatory behavior similar to a sine wave, we decided to follow a straightforward strategy for feature extraction in this case. We used the amplitude of the high-frequency signal (oscillation) at every respiratory cycle as a feature. The low-frequency trajectories had a monotonic behavior, and we used the magnitude of the low-frequency signal as a feature at every respiratory cycle. This value was taken at the same time point within the oscillation at which the amplitude of the high-frequency signal was measured. Note that the low-frequency and high-frequency features were computed only for the first 13 respiration cycles. Fig. 1 illustrates the feature extraction flow for a single transient of sensor No. 7.
In addition to the low/high frequency features, we also introduced a cycle-independent feature per single measurement, defined as the maximum of the low-frequency signal over the course of the measurement.
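The feature definitions above can be illustrated with the following sketch. It is a simplified reading of the text rather than the procedure of [1]: it assumes that respiratory cycles are fixed windows of 300 samples (12 s at 25 Hz) starting at the first data point, that the high-frequency amplitude is the peak value within each window, and that the maximum feature is the largest value of the low-frequency signal. The function name cycle_features is hypothetical.

```python
import numpy as np

def cycle_features(low, high, samples_per_cycle=300, n_cycles=13):
    """Per-cycle features from low/high frequency signals of one sensor.

    Assumptions of this sketch: fixed-length cycles aligned with the first
    sample, and amplitude defined as the peak value within each cycle.
    """
    low = np.asarray(low, dtype=float)
    high = np.asarray(high, dtype=float)
    a_lf, a_hf = [], []
    for k in range(n_cycles):
        start = k * samples_per_cycle
        stop = start + samples_per_cycle
        # Time point of the oscillation peak within this cycle.
        peak = start + int(np.argmax(high[start:stop]))
        a_hf.append(high[peak])  # high-frequency amplitude
        a_lf.append(low[peak])   # low-frequency value at the same time
    max_feature = float(np.max(low))  # cycle-independent feature
    return np.array(a_lf), np.array(a_hf), max_feature

# Synthetic signals: a ramp and a sine wave with a 12 s period.
t = np.arange(7500) / 25.0
demo_low = t / 300.0
demo_high = 0.05 * np.sin(2 * np.pi * t / 12.0)
a_lf, a_hf, s_max = cycle_features(demo_low, demo_high)
```

In measured data the respiration phase will generally not be aligned with the first sample, so a practical implementation would first have to locate the cycle boundaries, for example from the zero crossings of the high-frequency signal.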

Data sets and attributes
The data published here are organized in two CSV files, 'rawdata.csv.gz' (4.5 MB) and 'features.csv' (200 kB). The raw data are stored in the first file, 'rawdata.csv.gz', where each line represents a single measurement of one sensor. Consequently, one needs to read the corresponding 16 consecutive lines to obtain a single measurement from all 16 sensors. The features extracted in [1] are provided in the second file, 'features.csv', where each line represents the features extracted from all 16 sensor time series (a single measurement).
The raw data of each sample contain 16 time series (one time series per sensor). Each time series was recorded for 5 min at a sampling rate of 25 Hz (samples per second), providing 7500 data points per time series. The total number of attributes per sample in the raw data is therefore 16 × 7500 = 120,000.
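The layout of the raw file can be made concrete with a small reader that groups the per-sensor lines into samples. This is a sketch using only the Python standard library; it assumes a comma-separated file with a header row whose column names follow the attribute names 'sample', 'sensor' and 'dR_t1' to 'dR_t7500' described in this section. The function names are hypothetical.

```python
import csv
import gzip
import io
from collections import defaultdict

def read_raw(fh, n_points=7500):
    """Return {sample: {sensor: [dR values]}} from an open text file.

    Assumes a header row and columns named 'sample', 'sensor', 'dR_t<m>'.
    """
    samples = defaultdict(dict)
    for row in csv.DictReader(fh):
        series = [float(row[f"dR_t{m}"]) for m in range(1, n_points + 1)]
        samples[int(row["sample"])][int(row["sensor"])] = series
    return dict(samples)

def read_raw_gz(path, n_points=7500):
    """Convenience wrapper for a gzipped file such as 'rawdata.csv.gz'."""
    with gzip.open(path, "rt", newline="") as fh:
        return read_raw(fh, n_points)

# Tiny in-memory stand-in for the real file: 1 sample, 2 sensors, 3 points.
demo = io.StringIO(
    "sample,sensor,dR_t1,dR_t2,dR_t3\n"
    "1,1,0.0,-0.1,-0.2\n"
    "1,2,0.0,-0.3,-0.4\n"
)
raw = read_raw(demo, n_points=3)

# Attribute count per sample for the full data set.
attributes_per_sample = 16 * 7500
```

Grouping by the 'sample' and 'sensor' columns, rather than relying on line order, keeps the reader correct even if the 16 lines of a sample are not stored consecutively.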
The feature data set includes three types of features extracted from each time series. Each time series (one per sensor) is associated with 1 maximum feature, 13 high-frequency features and 13 low-frequency features (one per respiration cycle for the first 13 cycles). The total number of attributes per sample in the feature data set is therefore 16 × (1 + 13 + 13) = 432.
Both tables (raw data and features) share the following attributes:
'exp': integer (range 100-181); the experiment number registered in the laboratory;
'batch': string (5 values); the batch index of the measurements;
'ace_conc': float (range 0-1); the concentration of the acetone analyte given in vol%;
'eth_conc': float (range 0-1); the concentration of the ethanol analyte given in vol%;
'lab': string (12 values); the class label of the gas;
'gas': string (4 values); another class label that encodes either pure analytes, mixture or air;
'col': string (12 values); the color code for class labels.
The table of the raw data has the following specific attributes:
'sensor': integer (range 1-16); the sensor number;
'sample': integer (range 1-58); the sample number;
'dR_t〈m〉': float; the value of the time series for a given sensor and a given sample, measured at the time instant 〈m〉, where 〈m〉 takes values from 1 to 7500.
The table of the features has the following specific attributes, where 〈j〉 takes values from 1 to 16 and 〈k〉 takes values from 1 to 13:
'S〈j〉_max': float; the value of the maximum feature extracted from the time series of sensor 〈j〉;
'S〈j〉_r〈k〉_Alf': float; the low-frequency feature extracted from the time series of sensor 〈j〉 at respiration cycle 〈k〉;
'S〈j〉_r〈k〉_Ahf': float; the high-frequency feature extracted from the time series of sensor 〈j〉 at respiration cycle 〈k〉.
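The naming scheme of the feature attributes can be spelled out programmatically, which is convenient for selecting columns by sensor or by respiration cycle. The sketch below generates the names from the patterns given above; the function name is hypothetical, and the ordering of the generated names is an assumption that need not match the column order in 'features.csv'.

```python
def feature_columns(n_sensors=16, n_cycles=13):
    """Names of the feature attributes, following the documented patterns."""
    columns = []
    for j in range(1, n_sensors + 1):
        columns.append(f"S{j}_max")
        columns += [f"S{j}_r{k}_Alf" for k in range(1, n_cycles + 1)]
        columns += [f"S{j}_r{k}_Ahf" for k in range(1, n_cycles + 1)]
    return columns

columns = feature_columns()

# Example selection: high-frequency features of sensor 7 over all cycles.
sensor7_hf = [c for c in columns if c.startswith("S7_") and c.endswith("_Ahf")]
print(len(columns), len(sensor7_hf))
```

The total of 432 names agrees with the number of attributes per sample stated for the feature data set.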