Chemical gas sensor array dataset

To address drift in chemical sensing, an extensive dataset was collected over a period of three years. An array of 16 metal-oxide gas sensors was exposed to six different volatile organic compounds at different concentration levels under tightly controlled operating conditions. The resulting dataset is suitable for tackling a variety of challenges in chemical sensing, such as sensor drift, sensor failure, or system calibration. The data are related to “Chemical gas sensor drift compensation using classifier ensembles” by Vergara et al. [1] and “On the calibration of sensor arrays for pattern recognition using the minimal number of experiments” by Rodriguez-Lujan et al. [2]. The dataset can be accessed publicly at the UCI repository, upon citation, at: http://archive.ics.uci.edu/ml/datasets/Gas+Sensor+Array+Drift+Dataset+at+Different+Concentrations


Subject area: Chemistry
More specific subject area: Chemometrics, Machine Olfaction, Electronic Nose, Chemical Sensing, Machine Learning
Type of data: Text files
How data was acquired: Metal Oxide (MOX) gas sensors provided by Figaro Inc. (TGS2600, TGS2602, TGS2610, TGS2620; four of each type) exposed to different gas conditions over a period of 36 months.
Data format: Processed
Experimental factors: For each measurement a 128-component vector is processed from the sensors' responses to extract steady-state and transient features.
Experimental features: Sensors were exposed to clean air before and after sample presentation.

Value of the data
Response of the same chemical sensor array measured consistently over a period of 36 months.
Drift in sensors' sensitivity can be evaluated over time.
Extensive dataset (13,910 measurements) generated from chemical sensors exposed to six different volatiles, each volatile presented at different concentration levels. The problem can be formulated either as a classification problem to determine which gas is present or as a regression task to determine the gas concentration levels.
It can also be applied to concept drift, active learning, and pattern recognition in Machine Learning.
Dataset suitable for benchmarking different Machine Learning techniques designed for chemical sensing.
1. Experimental design, materials and methods

Experimental setup
The chemical detection platform included 16 commercially available metal-oxide gas sensors manufactured and commercialized by Figaro Inc. The sensor array had four types of sensors (four of each type) tagged as TGS2600, TGS2602, TGS2610, TGS2620. Hence, the detection platform generates a multivariate response upon exposure to different volatiles.
The operating temperature of the sensors is controlled by the voltage applied to the sensors' built-in heaters. The heater voltage was kept constant at 5 V.
We placed the sensor array into a 60 ml air-tight chamber where the volatiles of interest in gaseous form were injected in random order. The test chamber was attached in series to a vapor delivery system that provided the selected concentrations of the chemical substances by means of three digital mass flow controllers and calibrated gas cylinders. The total flow rate across the sensing chamber was set to 200 ml/min and kept constant for the whole measurement process. The entire measurement system setup was fully operated by a computerized environment and provided versatility for setting the concentrations with high accuracy and in a highly reproducible manner (see Fig. 1).
The dynamic response of each sensor was recorded at a sample rate of 100 Hz. Hence, each measurement produced a 16-channel time series sequence. The channels were paired with the sensors to acquire sensors' responses. Each pair remained unaltered for the whole dataset acquisition. The order of the sensors in the dataset is as follows (CH0-CH15): TGS2602; TGS2602; TGS2600; TGS2600; TGS2610; TGS2610; TGS2620; TGS2620; TGS2602; TGS2602; TGS2600; TGS2600; TGS2610; TGS2610; TGS2620; TGS2620.
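The channel-to-sensor pairing listed above can be written down as a small lookup table, which is convenient when selecting all channels of one sensor type. The following is a minimal sketch; the names CHANNEL_SENSORS and channels_for are illustrative and do not come from the dataset files.

```python
from collections import Counter

# Sensor type wired to each channel, CH0 through CH15, as listed in the text.
CHANNEL_SENSORS = [
    "TGS2602", "TGS2602", "TGS2600", "TGS2600",
    "TGS2610", "TGS2610", "TGS2620", "TGS2620",
    "TGS2602", "TGS2602", "TGS2600", "TGS2600",
    "TGS2610", "TGS2610", "TGS2620", "TGS2620",
]


def channels_for(sensor_type):
    """Return the channel indices paired with the given sensor type."""
    return [ch for ch, name in enumerate(CHANNEL_SENSORS) if name == sensor_type]


# Sanity check: the array holds four sensors of each of the four types.
sensor_counts = Counter(CHANNEL_SENSORS)

if __name__ == "__main__":
    for sensor_type in sorted(sensor_counts):
        print(sensor_type, sensor_counts[sensor_type], channels_for(sensor_type))
```

Because each pairing remained unaltered during acquisition, the same table should apply to every measurement in the dataset.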

Methods
To generate the dataset, we adopted a measurement procedure consisting of the following three steps. First, in order to stabilize the sensors and measure the baseline of the sensor response, we circulated synthetic dry air (10% R.H.) through the sensing chamber for 50 s. Second, we randomly added one of the analytes of interest to the carrier gas and circulated it through the sensing chamber for 100 s. Finally, we re-circulated clean dry air for the subsequent 200 s to record the sensors' recovery and have the system ready for a new measurement.
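The three-step procedure above, combined with the 100 Hz sample rate, fixes where each phase falls in a recorded time series. The sketch below computes the sample ranges of the three phases. It assumes that a recording starts at the beginning of the baseline phase and covers the phases back to back without gaps; that assumption, and the names PHASES and phase_slices, are illustrative and are not stated in the text.

```python
SAMPLE_RATE_HZ = 100  # sample rate given in the experimental setup

# (phase name, duration in seconds), in the order of the measurement procedure.
PHASES = [("baseline", 50), ("exposure", 100), ("recovery", 200)]


def phase_slices(phases=PHASES, rate_hz=SAMPLE_RATE_HZ):
    """Map each phase name to the slice of sample indices it occupies.

    Assumes the phases are recorded contiguously from sample 0.
    """
    slices = {}
    start = 0
    for name, duration_s in phases:
        stop = start + duration_s * rate_hz
        slices[name] = slice(start, stop)
        start = stop
    return slices


if __name__ == "__main__":
    for name, s in phase_slices().items():
        print(f"{name}: samples {s.start} to {s.stop - 1}")
```

Under these assumptions one measurement spans 350 s, that is, 35,000 samples per channel.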
The sensor array was exposed to six different volatiles, each of them at different concentration levels (see Table 1). Table 2 shows the data distribution over the 36-month period. For processing purposes, the dataset is organized into ten batches, each containing the number of measurements per class and month indicated in Table 2. This reorganization was done to ensure a sufficient number of experiments in each batch, as uniformly distributed as possible. Note that a few measurements, mainly in batch 7, appear at lower concentration levels than detailed in Table 2. This concentration mismatch is due to experimental error. For the sake of completeness, we decided to include those samples in the dataset.
The sensor responses were recorded in the presence of the analyte in gaseous form diluted at different concentrations in dry air. The measurement system operates under a fully computerized environment with minimal human intervention, which provides versatility in conveying the chemicals of interest to the sensing chamber with high accuracy while keeping the total flow constant. Therefore, no changes in the flow or flow dynamics are reflected in the sensor response (i.e., only the presence of a gas sample induces a change in sensor conductivity). Moreover, since the system continuously supplies gas to the sensing chamber (either clean dry air or a chemical component), the gas molecules are homogeneously distributed within the sensing chamber.

Feature extraction
MOX gas sensors typically exhibit a smooth, monotonic change in the conductance of the sensing layer due to the adsorption/desorption reactions of the chemical analyte to which they are exposed.
We represented each time series with an aggregate of eight features reflecting the sensor response. In particular, we considered two distinct types of features in the creation of this dataset: two steady-state features and six features reflecting the sensor dynamics.
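With eight features per sensor and 16 sensors, each measurement yields the 128-component vector mentioned earlier. The sketch below shows one way to index such a vector. It assumes a sensor-major layout (all eight features of CH0 first, then CH1, and so on) with the two steady-state features preceding the six dynamic ones within each sensor; this ordering is an assumption made for illustration and should be checked against the dataset documentation. The function names are hypothetical.

```python
N_SENSORS = 16
N_STEADY = 2      # steady-state features per sensor
N_DYNAMIC = 6     # features reflecting the sensor dynamics, per sensor
FEATURES_PER_SENSOR = N_STEADY + N_DYNAMIC
VECTOR_LENGTH = N_SENSORS * FEATURES_PER_SENSOR  # 128 components


def feature_index(channel, feature):
    """Position of a sensor's feature in the flat vector.

    ASSUMPTION: sensor-major ordering, which the text does not specify.
    """
    if not 0 <= channel < N_SENSORS:
        raise ValueError("channel must be in 0..15")
    if not 0 <= feature < FEATURES_PER_SENSOR:
        raise ValueError("feature must be in 0..7")
    return channel * FEATURES_PER_SENSOR + feature


def split_by_sensor(vector):
    """Split a flat 128-component vector into 16 per-sensor lists of 8 values."""
    if len(vector) != VECTOR_LENGTH:
        raise ValueError(f"expected {VECTOR_LENGTH} components, got {len(vector)}")
    return [
        list(vector[ch * FEATURES_PER_SENSOR:(ch + 1) * FEATURES_PER_SENSOR])
        for ch in range(N_SENSORS)
    ]


if __name__ == "__main__":
    demo = list(range(VECTOR_LENGTH))  # placeholder values, not real data
    per_sensor = split_by_sensor(demo)
    print(len(per_sensor), len(per_sensor[0]), feature_index(15, 7))
```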