Drift Correction Methods for Gas Chemical Sensors in Artificial Olfaction Systems: Techniques and Challenges



Introduction
The human sense of smell is a valuable tool in many areas of industry such as perfumery, food and drink production, clinical diagnosis, health and safety, environmental monitoring and process control (Gobbi et al., 2010; Vezzoli et al., 2008). Artificial olfaction mimics human olfaction by using arrays of gas chemical sensors combined with pattern recognition (PaRC) systems (Pearce et al., 2003). When a volatile compound comes into contact with the surface of the array, a set of physical changes modifies the properties of the material from which each sensor is composed. This perturbation can be measured, digitized and used as a feature for the specific compound. A preliminary training phase collecting samples from known volatile compounds is used to train a selected PaRC algorithm in order to map each concentration of gas to the responses from the sensor array. The trained model is then used for identification during later measurements. The classification rate of the PaRC system determines the final performance of the electronic olfaction system.
Gas sensor arrays represent a potentially low-cost and fast alternative to conventional analytical instruments such as gas chromatographs. However, successful applications of gas sensor arrays are still largely limited to specialized laboratories (Pardo & Sberveglieri, 2004). Lack of stability over time and the high cost of recalibration are factors which still limit the widespread adoption of artificial olfaction systems in real industrial setups (Padilla et al., 2010). Sensor drift consists of small and non-deterministic temporal variations of the sensor response when it is exposed to the same analytes under identical conditions (Holmberg et al., 1997). The main result is that the sensor's selectivity and sensitivity decrease. Gas sensor drift changes the way samples distribute in the data space, thus limiting the ability to operate over long periods. PaRC models become useless after a period of time, in some cases weeks or a few months. After that time the artificial olfaction system must be completely re-calibrated to ensure valid predictions (Aliwell et al., 2001). It is still impossible to fabricate chemical sensors without drift. In fact, drift phenomena afflict almost all kinds of sensors (Chen & Chan, 2008; Owens & Wong, 2009; Polster et al., 2009).
Sensor drift must therefore be detected and compensated to achieve reliable measurements from a sensor array. Algorithms to mitigate the negative effect of gas sensor drift are not new in the field; the first attempt to tackle this problem dates back to the early 90s (Pearce et al., 2003). Nevertheless, the study of sensor drift is still a challenging task for the chemical sensor community (Padilla et al., 2010; Pearce et al., 2003). Solutions proposed in the literature can be grouped into four main categories: (i) periodic calibration, (ii) attuning methods, (iii) filtering of drift components and (iv) adaptive models. This chapter introduces the main challenges faced when developing drift correction techniques and provides an in-depth overview of the state-of-the-art methodologies proposed in the scientific literature, highlighting their pros and cons and focusing on the challenges that remain open.

Gas chemical sensors and the drift phenomenon
A chemical sensor is a device that transforms chemical information, ranging from the concentration of a specific component to the total composition analysis, into an analytically useful signal (International Union of Pure and Applied Chemistry, 1991). In gas chemical sensing, the input signal is the concentration of one or more gaseous species, while the output signal depends on the transduction mechanism, which is usually a variation of some physical property of the sensing element such as conductivity (or other electrical magnitudes), oscillation frequency (mass), temperature, electrochemical potential, surface work function or optical properties (Janata, 2009). Artificial olfaction systems, usually referred to as Electronic Noses (ENs) (Pearce et al., 2003), are machines designed for detecting and discriminating among complex odors using an array of broadly-tuned (non-specific) gas chemical sensors typically belonging to the above-mentioned categories. An odor stimulus generates a characteristic fingerprint from the sensor array. Patterns from known samples can be used to construct a database (training set) and train a pattern recognition system so that unknown odor samples (test set) can be subsequently identified.
Attempts to measure odors with electronic instruments were made in the early 60s, but the "modern era" of artificial olfaction began in 1982 with the work of Persaud and Dodd (Persaud & Dodd, 1982), who used a small array of gas-sensitive metal-oxide devices to classify odors. The expression "electronic nose" appeared for the first time in 1987 (Shurmer et al., 1987), with its current definition given in the same year by Gardner (Gardner, 1987). Commercial instruments became available in the early 90s, first in Europe and immediately afterwards in the U.S., with pioneering machines developed by Alpha Mos and Aromascan.
ENs take their inspiration from the working mechanism of biological olfaction. However, current technologies based on chemical sensors are still far from the capabilities of biological systems, mainly because of their poor selectivity and sensitivity compared with biological receptors and, more importantly, their lack of stability. In recent years, classical chemical sensor technologies were complemented by new emerging technologies (Röck et al., 2008). In particular, machine olfaction has benefited from developments in several fields, ranging from optical technologies developed by the telecommunications industry to improvements in analytical chemistry such as gas chromatography, mass spectrometry and ion mobility spectrometry. This trend has also narrowed the gap between traditional ENs (used as a black box) and classical analytical techniques, which aim to quantify individual volatile components. Although constant improvements in micro fabrication techniques and the rapid development of new nano fabrication techniques have allowed the production of functional micro and nanoscale chemical sensing devices with finer sensitivity (Comini & Sberveglieri, 2010) and selectivity (Haupt & Mosbach, 2000), signal repeatability over time remains the real challenge in the chemical sensor field for all types of sensors. In order to have a reliable instrument it is of great importance that individual sensor signals are stable and reproducible. In practice, it is not worth spending several weeks training an EN system for a particular application if, as a consequence of changes in the sensor response, the system can only be used for a few days before recalibration is required.
The problem of chemical sensor stability over time is known as "sensor drift". It consists of (more or less) small and non-deterministic temporal variations of the sensor response when it is exposed to the same analytes under identical conditions (Holmberg et al., 1997). This is generally attributed to sensor aging (Sharma et al., 2001) or thermo-mechanical degradation (Mielle, 1996), but it can also be influenced by a variety of sources including environmental factors (Di Natale et al., 2002b; Ionescu et al., 2000). The main result is that sensor selectivity and sensitivity slowly decrease with time. The physical causes of this phenomenon are technology dependent and are strictly correlated to the sensing and transduction mechanisms.

Drift phenomenon in metal oxide semiconductor gas sensors
For most technologies used to fabricate chemical sensors, little is known about the physical causes of sensor drift. Conductometric metal oxide (MOX) sensors are an exception: the physical origin of their drift has been investigated in depth. In fact, signal drift is known to be a severe problem for these devices, which are in widespread commercial use. A typical example is the sintered tin dioxide Taguchi Gas Sensor (TGS), an n-type semiconductor solid state device marketed by Figaro Engineering Inc. (http://www.figaro.co.jp) since 1968 and widely applied for detection of oxidizing and reducing gases. The sensing properties of TGS are based on the electronic (n-type) conductivity of tin dioxide (SnO2) (Goepel & Schierbaum, 1995). The device consists of small SnO2 grains which are in contact with each other. The sensing effect is due to an electronic depletion layer at the surface of the grains. The depletion layer is generated when oxygen is adsorbed, thus trapping electrons from the oxide. This induces an increased resistance at the grain surfaces. When the current passes from one grain to another it has to cross these depletion layers (Schottky barriers), which thus determine the sensor's resistance. The sensing effect, i.e., the response to reducing or oxidizing gases, is therefore determined by the adsorption of these compounds and the subsequent trapping (or donation) of electrons by the adsorbed species. This modifies the space charge potential, thus changing the sensor's conductivity. Similar working models also apply to thick- and thin-film semiconductor MOX gas sensors, including for instance SnO2-RGTO gas sensors (Sberveglieri, 1992). The Rheotaxial Growth and Thermal Oxidation (RGTO) technique was developed in the early 90s at the Sensor Lab for growing metal oxide thin films. The films obtained with this technique show a structure characterized by polycrystalline agglomerates uniformly distributed and connected by necks. This highly porous structure leads to a large surface area well suited for gas adsorption.
MOX sensor drift has been attributed to different structural and morphological variations of the sensor, as discussed in the following subsections.

Chemical diffusion of oxygen vacancies
Usually two chemical effects are considered as potential sources of drift in MOX sensors: a) the resistivity change induced by chemisorption of water on the sensor (Schierbaum et al., 1991), and b) the chemical diffusion of oxygen vacancies, which was investigated by means of relaxation experiments on tin dioxide single crystals (Kamp et al., 2001). Kamp et al. showed that for TGS sensors the chemical diffusion of oxygen is fast enough to cause severe drift effects due to stoichiometry changes. This effect was in turn attributed to two main origins:
• the chemical diffusion of oxygen in SnO2 consists of the simultaneous transport of oxygen vacancies and conduction electrons, which induces conductivity changes in the bulk and/or space charge layer of SnO2 grains. Changes in oxygen vacancy concentration over the sensor working time influence the space charge and the overall conductivity, which is reflected in the sensor baseline drift;
• the diffusion of oxygen vacancies can be induced by the space charge electric field alone (field induced migration). For the case of the TGS this process is induced by a signal change, via changes of the space charge potential. Since oxygen vacancies are usually majority carriers, the result of the redistribution of oxygen vacancies is a severe modification of the properties of the space charge region itself.

Physical changes of the MOX sensing elements
Scanning electron microscopy investigation of thin-film SnO2 sensors integrated on CMOS micro-machined hotplates (Sharma et al., 2001) indicated that the sensing thin film cracks after long operation cycles. Indeed, the thermal stress in the micro-hotplate induces cracks over a large number of heating and cooling cycles. Cracks in the thin film can be considered a major physical cause of sensor drift. The authors also suggest that Cu doping could be a way to suppress (or strongly reduce) film cracking. Similar results are mentioned in other works, e.g., investigating the variation of the electrical response to CO and CH4 induced by continuous operation of a thin-film SnO2 sensor at 400 °C for six months. A slow decrease of the response to both baseline air and gases was observed (Fig. 1) and attributed to the coalescence of grains due to their intrinsically poor degree of crystallinity. In fact, a significant improvement in the stability of the sensor platforms can be achieved using single crystalline materials such as novel quasi-one-dimensional MOX nanostructures (Comini & Sberveglieri, 2010).

Degradation of the electric contacts
Besides the preparation of stable grain structures, the proper choice of electric contact geometries (shape and thickness) is particularly important in the design of reliable conductometric MOX sensors.
Recently, thin-film SnO2-RGTO sensors have been investigated at SENSOR lab by scanning electron microscopy, including the study of morphology by secondary electron detection and of back-scattered electrons, which provide compositional information (unpublished results).
Electrical tests, performed in parallel, predicted a drift time constant of about 85 days. The correlation of these results shows that the long-term stability is mostly determined by the instability of the platinum interdigitated contacts (IDCs). Two concurrent effects were observed (Fig. 2):
• first, the erosion of the IDC, which appears thinner after aging, and the formation of Pt agglomerates. Pt agglomeration was observed only on the IDCs deposited over the SnO2-RGTO layer, while the Pt structures deposited over the alumina substrate did not exhibit agglomeration. From this, we might argue that the larger RGTO roughness accelerates the degradation rate of the Pt layer deposited above it;
• second, the formation of Pt agglomerates could be promoted by electro-migration phenomena. This second hypothesis has been confirmed by observing that the formation of Pt agglomerates is more evident in the area in which the section of the electrodes is thinner, i.e., where the current density is higher.
Increasing the IDC thickness from 500 nm to 900 nm increased the drift time constant to about 130 days.

Overall considerations
In summary, even for a well-defined and deeply investigated class of sensors such as MOX sensors, the drift phenomenology is still not fully understood, since real polycrystalline samples show a variety of aspects and a high degree of complexity. Depending on the specific technology (e.g., TGS or thin-film), one aspect can be dominant over the others, but often concomitant causes are present.
Irreversible changes of the sensor response might also occur, one of the most common reasons being sensor surface poisoning (Ruhland et al., 1998). This arises when the sensor is exposed to a gas (e.g., a corrosive acid) which strongly binds or interacts with the sensing material, leading to a deep change in its physicochemical properties. Indeed, some authors have reported the occurrence of sensor faults during long measurement runs in monitoring odors on a landfill site (Romain & Nicolas, 2010). In other cases, the presence of strongly binding substances in the samples, such as sulphur compounds (Pratt & Williams, 1997) or some acids (Schaller et al., 2000), led to irreversible poisoning. When this occurs, the sensor should be replaced and there is no possibility of compensating for the effect. In contrast, slowly varying drift phenomena, which are frequently reported in the literature, can be handled with proper "soft" methods that will be illustrated in the following sections.

Drift counteraction methods: a taxonomy and review
Attempts to mitigate the negative effect of gas sensor drift are not new. A great deal of work has been directed towards the development of drift correction methods and algorithms, which tackle the problem from different perspectives, depending on the situation. Nevertheless, the study of sensor drift still remains a challenging task for the chemical sensor community, which is always looking for novel improved solutions. Fig. 3 provides a rough classification into four main categories of the solutions proposed in the literature, which are presented hereafter.

Sensor signal preprocessing
One of the simplest methods for drift compensation proposed in the literature, which is also widely used as a pre-processing method, is the transformation of individual sensor signals (Gardner & Bartlett, 1999) based on the initial value of the transient response (the so-called "baseline"), hence the name baseline manipulation. Three basic transformations are common practice:
1. Differential: subtracts the baseline of each sensor and can therefore help compensate for additive drift effects that are present in both the baseline and the gas response:
$$\hat{s}(t) = y(t) - y(0) = (x(t) + \delta) - (x(0) + \delta) = x(t) - x(0) \quad (1)$$
where $\hat{s}$ is the transformed (corrected) response, $y$ is the measured response, $x$ is the ideal sensor response without drift and $\delta$ is the drift contribution, which is assumed to be constant and uniform.
2. Relative: divides by the baseline and might correct for (constant and uniform) multiplicative drift effects:
$$\hat{s}(t) = \frac{y(t)}{y(0)} = \frac{x(t)(1 + \delta)}{x(0)(1 + \delta)} = \frac{x(t)}{x(0)} \quad (2)$$
3. Fractional: a combination of the previous two that works for multiplicative drift and has the advantage of providing dimensionless measurements and normalized sensor responses:
$$\hat{s}(t) = \frac{y(t) - y(0)}{y(0)} = \frac{x(t) - x(0)}{x(0)} \quad (3)$$
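As a minimal sketch of the three baseline manipulations listed above, the following Python snippet applies them to a hypothetical transient. The drift models (additive, y = x + δ, and multiplicative, y = x(1 + δ)), the numeric values and the function names are assumptions chosen only for illustration; the baseline is taken as the first value of the transient.

```python
def differential(response, baseline):
    """Subtract the baseline; cancels a constant additive offset."""
    return [y - baseline for y in response]


def relative(response, baseline):
    """Divide by the baseline; cancels a constant multiplicative gain."""
    return [y / baseline for y in response]


def fractional(response, baseline):
    """Baseline-subtracted response divided by the baseline (dimensionless)."""
    return [(y - baseline) / baseline for y in response]


# Hypothetical ideal transient: baseline value first, then the gas response.
ideal = [2.0, 3.0, 5.0, 4.0]

# Two assumed drift models with a constant, uniform drift term.
additive = [x + 0.5 for x in ideal]         # y = x + delta
multiplicative = [x * 1.25 for x in ideal]  # y = x * (1 + delta)

diff_ok = differential(additive, additive[0])              # additive drift cancels
rel_ok = relative(multiplicative, multiplicative[0])       # gain cancels
frac_ok = fractional(multiplicative, multiplicative[0])    # gain cancels
frac_bad = fractional(additive, additive[0])               # offset does not cancel
```

In this toy case the differential and relative transforms recover the drift-free values for the drift model each one targets, while the fractional transform applied to the additively drifted signal does not.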
The first two transformations are too specific: in real applications the drift is generally neither purely additive nor purely multiplicative, so they cannot correct the drift effect, although they (usually the second one) are used simply to "normalize" the sensor response. The last manipulation obviously does not work if additive drift is present. Moreover, it can amplify the noise in the measurements because the drift term, which is typically small, remains in the denominator, thus degrading the quality of the sample. It therefore again provides poor correction against drift effects. More advanced preprocessing methods have been tried, based on processing the sensor signal in the frequency domain rather than in the time domain.

Filtering
Counteracting methods based on filtering strategies apply signal preprocessing techniques to filter out the portions of the signal that contain drift contamination. Drift is in general a slower process than the signals of interest and therefore typically occupies a different frequency range. A proper transformation of the sensor signals from the time domain to the frequency domain, followed by careful removal of the lowest frequency components, can thus filter the drift out. Feature extraction techniques based on the Discrete Wavelet Transform (DWT) can be a powerful tool to remove the drift contamination in the low-frequency behavior of the sensor responses (Hui et al., 2003; Llobet et al., 2002; Zuppa et al., 2003). Moving median filters and Fourier band-pass filters are other examples of filters applied to remove either high-frequency fluctuations (such as noise and spikes) or low-frequency changes such as drift. In comparison to these filters, the DWT provides a flexible analysis of the signal at different resolutions by iteratively applying high-pass and low-pass filters. It allows the selected low-frequency components to be removed easily and without distorting the signal.
The DWT provides a multiresolution decomposition of the sensor response: it analyzes the signal at different frequency bands with different resolutions by successively projecting it onto two bases of functions, which are obtained by applying shift and scaling operations to two prototype functions called the "scaling function" and the "wavelet function", respectively. The scaling function is associated with a low-pass filter, and the convolution between the signal and the scaling functions gives the low-frequency components of the signal, called approximation coefficients. Conversely, the wavelet function is associated with a high-pass filter, and its convolution with the signal gives the high-frequency components, called detail coefficients.
The multilevel wavelet decomposition and signal reconstruction are performed by the Mallat algorithm. It consists of iteratively applying high-pass and low-pass filters to the vector of approximation coefficients, obtaining a sequence of increasingly smoothed and halved versions of the original signal. Once a specific wavelet has been chosen (e.g., Daubechies orthonormal functions), the pair of low-pass and high-pass filters is defined.
The DWT decomposition level is fixed after an analysis in the frequency domain has been carried out to identify the frequency range (cut-off frequency) of the drifting trend.
The approximation coefficients associated with the lowest frequencies which have drift contamination are then discarded and the wavelet reconstruction of the corrected signal is computed by using the remaining coefficients.
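The decomposition, discarding and reconstruction steps described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: it uses the Haar wavelet (the simplest member of the Daubechies family) instead of a higher-order wavelet, it requires the signal length to be divisible by 2 raised to the number of levels, and it discards the whole coarsest approximation band. The function names and the test signal are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_step(signal):
    """One Mallat step: low-pass (approximation) and high-pass (detail) halves."""
    approx = [(signal[i] + signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse_step(approx, detail):
    """Invert one Mallat step, interleaving the reconstructed samples."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / SQRT2)
        out.append((a - d) / SQRT2)
    return out


def remove_low_frequency(signal, levels):
    """Decompose to `levels`, zero the coarsest approximation, reconstruct."""
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    # Discard the lowest-frequency band, assumed to carry the drift.
    approx = [0.0] * len(approx)
    for detail in reversed(details):
        approx = haar_inverse_step(approx, detail)
    return approx


# Hypothetical signal: fast alternation (+1/-1) on top of a slow offset of 5.
signal = [5.0 + (1.0 if i % 2 == 0 else -1.0) for i in range(8)]
cleaned = remove_low_frequency(signal, levels=3)
```

For this toy signal the constant offset ends up entirely in the coarsest approximation band, so the reconstruction keeps only the fast alternation. A real drift trend is rarely a pure constant, which is why the choice of wavelet and decomposition level matters in practice.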

Periodic calibration
Several drift correction methods are based on estimating the drift effect on the system so that it can later be removed. The drift effect can be estimated, for instance, by measuring the change in the sensor responses to one or more reference gases, which are measured at intervals throughout the experiment. This strategy can be applied in a univariate way (sensor by sensor) or in a multivariate way, by removing the directions of dispersion of the reference data in the feature space.

Multiplicative drift correction
Univariate calibration is a straightforward method in which a reference value, i.e., the response to a reference gas, is used and all subsequent sensor readings are individually corrected to it.Fryder et al. (Fryder et al., 1995) and Haugen et al. (Haugen et al., 2000) proposed to model temporal variations of the system with a multiplicative drift correction (MDC) factor obtained by measuring the calibrant and then to apply the same correction to the actual samples.
In particular, Haugen et al. propose a re-calibration method performed in two steps: within a single measurement sequence, to compensate for short-term trends, and between measurement sequences, to compensate for long-term fluctuations. This strategy provides very good results and is currently in use in commercial electronic noses. However, its assumptions are restrictive: MDC assumes that the drift is multiplicative, which means that the perturbation is proportional to the signal level. Furthermore, it assumes that the relationship between the individual sensor response to the reference gas and the response to the test gas is strictly linear.
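The idea behind a multiplicative correction factor can be illustrated with a short Python sketch. It is not the procedure of Fryder et al. or Haugen et al., whose details are not given here; it simply assumes that each sensor's drift is a pure gain and that the gain seen on the calibrant equals the gain seen on the samples. All names and numbers are hypothetical.

```python
def mdc_factors(reference_initial, reference_now):
    """Per-sensor factor mapping the current calibrant response back to
    the response recorded at calibration time."""
    return [r0 / r for r0, r in zip(reference_initial, reference_now)]


def apply_mdc(sample, factors):
    """Apply the same per-sensor factors to a sample measured at the same time."""
    return [s * f for s, f in zip(sample, factors)]


# Hypothetical two-sensor array: calibrant response at calibration time and
# after the sensors have drifted by gains of 1.25 and 0.8 respectively.
reference_initial = [10.0, 20.0]
reference_now = [12.5, 16.0]

true_sample = [4.0, 8.0]
drifted_sample = [5.0, 6.4]  # same gains applied to the sample

factors = mdc_factors(reference_initial, reference_now)
corrected = apply_mdc(drifted_sample, factors)
```

If the drift also contained an additive offset, or if the calibrant and the sample drifted differently, the corrected values would no longer match the true ones, which is the limitation discussed above.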

Multivariate component correction
One of the first attempts to perform robust drift correction by multivariate methods was proposed by Artursson et al. (Artursson et al., 2000) under the name of Component Correction (CC). Two correction methods, one based on Principal Component Analysis (PCA) and one based on Partial Least Squares (PLS), are proposed in the paper.
Measurements performed with arrays of chemical sensors contain a good deal of redundant information since, in general, the different variables are collinear. PCA (Wold et al., 1987) is a common compression method used to efficiently represent this information. With PCA, the dominant variability in the measurement space can be captured by means of two matrices: 1. the loading matrix P, which represents a new coordinate system onto which the measurement vector X must be projected, and 2. the score matrix T, which represents the coordinates of the sample in the space spanned by the columns of P.
Within the new space defined by P, dimensions (referred to as components) are ordered by decreasing variability in the input measurements. CC uses PCA in conjunction with the reference gas technique. If the sensor responses to a certain reference gas contain a significant amount of drift, the first component identified by PCA on these measurements, which is the one that describes the maximum variability, will likely define the direction of the drift. This is motivated by the fact that the sensors are always exposed to the same gas and are thus expected to always provide the same response, except for some random noise. This direction is defined by a loading vector p obtained as the first column of the loading matrix P. Projecting (multiplying) the sample X onto this loading vector gives a score vector t representing the drift:
$$t = X \cdot p \quad (4)$$
Drift correction can then be implemented by subtracting from the original data the bilinear expression $t \cdot p^T$, which represents an approximation of the drift component in the original sample space (i.e., all other directions, including the variance that distinguishes and separates classes of samples in the data space, are preserved), thus obtaining a corrected sample $X_{corr}$:
$$X_{corr} = X - t \cdot p^T \quad (5)$$
Removing one component is usually enough when the drift is caused by aging of the sensors. However, if non-linear drift effects are observed (e.g., caused by both aging effects and chemical background), more than one component can be subtracted.
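The PCA-based component correction described above can be sketched as follows. This is an illustrative implementation under several assumptions: the first loading vector is estimated by power iteration on the scatter matrix of the reference-gas measurements rather than by a full PCA, the data are centered on the reference mean before projection, and only one component is removed. The function names and the synthetic data are hypothetical.

```python
import math


def first_loading(reference, iterations=200):
    """Return (mean, p): the reference mean and the first principal
    direction p, estimated by power iteration on the scatter matrix."""
    n = len(reference)
    mean = [sum(col) / n for col in zip(*reference)]
    centered = [[v - m for v, m in zip(row, mean)] for row in reference]
    dim = len(mean)
    scatter = [[sum(r[i] * r[j] for r in centered) for j in range(dim)]
               for i in range(dim)]
    # Fixed start vector; it must not be orthogonal to the drift direction.
    p = [1.0] * dim
    for _ in range(iterations):
        q = [sum(scatter[i][j] * p[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(v * v for v in q))
        if norm == 0.0:
            raise ValueError("start vector carries no variance of the data")
        p = [v / norm for v in q]
    return mean, p


def component_correction(samples, mean, p):
    """For each sample x, compute the score t = (x - mean) . p and
    return x - t * p, i.e. the sample with the drift component removed."""
    corrected = []
    for x in samples:
        t = sum((xi - mi) * pi for xi, mi, pi in zip(x, mean, p))
        corrected.append([xi - t * pi for xi, pi in zip(x, p)])
    return corrected


# Synthetic reference-gas measurements drifting along one direction.
drift = [1.0, 2.0, -1.0]
reference = [[5.0 + 0.1 * k * d for d in drift] for k in range(5)]
mean, p = first_loading(reference)

# Two measurements of the same (hypothetical) odor at different drift levels.
odor = [1.0, 4.0, 2.0]
samples = [[c + 0.3 * d for c, d in zip(odor, drift)],
           [c + 0.7 * d for c, d in zip(odor, drift)]]
corrected = component_correction(samples, mean, p)
```

In this idealized case the two corrected samples coincide, because the only difference between them lies along the direction learned from the reference gas. Any class information that happens to lie along the same direction is removed as well.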
A similar correction strategy can also be obtained by using a regression model. Since the drift caused by aging effects has a preferred direction in the measurement space, it should be possible to describe this change as a function of time. The Partial Least Squares (PLS) regression model (Wold et al., 1984) is able to infer the dependence between two sets of variables (the sample matrix X and a matrix Y representing the time in our specific case) by using a set of orthogonal score vectors. Artursson et al. propose to compute a weight vector w and a loading vector p according to the PLS model on a set of measurements of a reference gas (as in the case of PCA). These vectors are first used to compute the drift component t according to the PLS regression model:
$$t = X \cdot w \quad (6)$$
Then, similarly to (5), the corrected sample is obtained as the residual after the first component has been subtracted from the original data:
$$X_{corr} = X - t \cdot p^T \quad (7)$$
Again, several components can be removed by repeating equations (6) and (7).
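A minimal sketch of the PLS-based variant, restricted to a single component and a single response variable (the measurement time), is given below. The weight and loading vectors follow the usual first step of a one-response PLS model; mean-centering on the reference data, the function names and the synthetic data are assumptions made for illustration.

```python
import math


def pls_drift_model(reference, times):
    """Fit the first PLS component of reference-gas data against time.
    Returns (mean, w, p): reference mean, weight vector and loading vector."""
    n = len(reference)
    dim = len(reference[0])
    mean = [sum(col) / n for col in zip(*reference)]
    time_mean = sum(times) / n
    xc = [[v - m for v, m in zip(row, mean)] for row in reference]
    yc = [tm - time_mean for tm in times]
    # Weight vector: direction of maximal covariance with time (X^T y, normalized).
    w = [sum(row[j] * y for row, y in zip(xc, yc)) for j in range(dim)]
    norm = math.sqrt(sum(v * v for v in w))  # zero if the data do not vary with time
    w = [v / norm for v in w]
    # Scores of the reference data and corresponding loading vector.
    scores = [sum(a * b for a, b in zip(row, w)) for row in xc]
    score_energy = sum(s * s for s in scores)
    p = [sum(row[j] * s for row, s in zip(xc, scores)) / score_energy
         for j in range(dim)]
    return mean, w, p


def pls_correct(sample, mean, w, p):
    """Compute t = (x - mean) . w and return the residual x - t * p."""
    t = sum((x - m) * wi for x, m, wi in zip(sample, mean, w))
    return [x - t * pi for x, pi in zip(sample, p)]


# Synthetic reference-gas measurements drifting linearly with time.
drift = [1.0, 2.0, -1.0]
times = [0.0, 1.0, 2.0, 3.0, 4.0]
reference = [[5.0 + 0.1 * tm * d for d in drift] for tm in times]
mean, w, p = pls_drift_model(reference, times)

# The same hypothetical odor measured at two different drift levels.
odor = [1.0, 4.0, 2.0]
early = pls_correct([c + 0.3 * d for c, d in zip(odor, drift)], mean, w, p)
late = pls_correct([c + 0.7 * d for c, d in zip(odor, drift)], mean, w, p)
```

Unlike the PCA variant, the direction removed here is the one that covaries with time, so variance in the reference data that is large but unrelated to time does not drive the correction.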

Multivariate component deflation
Another attempt to perform multivariate drift correction has been proposed in Gutierrez-Osuna (2000). The overall idea is to introduce a set of variables defined by a vector Y whose variance can be attributed to drift or interferents. Examples of these variables are the response to a wash/reference gas, which is usually measured prior to each sample, time stamps, temperature, pressure, humidity, etc. The basic idea is to measure the sensor/array response to an odor X and remove the variance in X that can be explained by the variables in Y (by means of regression/deflation).
The approach basically applies Canonical Correlation Analysis (CCA) or Partial Least Squares (PLS) to find two linear projections $\tilde{Y} = a \cdot Y$ and $\tilde{X} = b \cdot X$ that are maximally correlated. This can be formally expressed as:
$$\{a, b\} = \arg\max_{a,b} \; \mathrm{corr}(a \cdot Y, \; b \cdot X) \quad (8)$$
$\tilde{Y}$ and $\tilde{X}$ are in fact low-dimensional projections that summarize the linear dependencies between Y and X. At this point Ordinary Least Squares (OLS) (Dillon & Goldstein, 1984) can be used to find a regression model $X_{pred} = w \cdot \tilde{X}$ that minimizes the difference between X and $X_{pred}$. This is formally expressed as:
$$w = \arg\min_{w} \; \| X - w \cdot \tilde{X} \|^2 \quad (9)$$
The OLS prediction vector $X_{pred}$ in fact contains the variance of the odor vector X that can be explained by $\tilde{X}$ and, indirectly, by Y, since $\tilde{X}$ and $\tilde{Y}$ are correlated. At this point it is enough to deflate X and use the residual $X_{corr}$ as a drift-corrected vector:
$$X_{corr} = X - X_{pred} \quad (10)$$
In summary, techniques based on recalibration of the system using a reference gas can give very good results, but care must be taken in the selection of such a gas. It must be representative of all classes being measured, since they are supposed to drift in similar ways. It should also be stable over time, available and easy to measure (Salit & Turk, 1998). Meeting all these constraints is in general complex and expensive; therefore, methods overcoming this limitation have been explored.

Attuning methods
Attuning methods try to perform component correction without resorting to calibration samples, deducing the drift components directly from the training data instead. They can provide significant improvements in the classification rate over a fixed time period, and may also make it possible to obtain real responses to be used in quantitative gas analysis.

Independent component correction
Di Natale et al. approached the problem of sensor drift with an attuning method that also considers disturbances derived from the measurement environment (Di Natale et al., 2002a). In methods such as CC (Artursson et al., 2000) that are based on PCA, the computed principal components are mutually uncorrelated. However, this condition is not enough to guarantee that relevant signals are completely separated from disturbances (non-correlation does not necessarily imply statistical independence), and often a principal component carries information on both signal and disturbances. To tackle this limitation Di Natale et al. propose to exploit Independent Component Analysis (ICA) (Comon, 1994) as a technique to separate a data matrix into a series of components, each independent from the others. In this case, independence means that the information carried by each component cannot be inferred from the others, i.e., the joint probability of independent quantities is obtained as the product of the probability of each item. ICA is applied to EN data to preserve only those components correlated with the sample features relevant to the application. ICA is computed on the training set, so no reference gas is required. Contrary to Gaussian-based models such as PCA and PLS, in ICA it is not possible to determine the variance carried by each component, and it is therefore not possible to choose the component to eliminate based on this information. In fact, it is not possible to establish an order among the different components.
The authors select the components to eliminate with a supervised method: the independent components that correlate most with the objective of the measurement are kept, while those more correlated with the disturbances are discarded. This solution provides good results, especially in removing drift effects due to external causes (e.g., temperature, pressure) that can be monitored during experiments and used as variables to select the components to discard. However, whenever the causes of the drift are not completely known, selecting the components to discard may prove complex or, in some cases, impossible.
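The supervised selection step can be illustrated with a short Python sketch. It assumes that the independent components have already been computed by some ICA routine (not shown), that a single monitored disturbance such as temperature is available, and that a component is kept when its absolute Pearson correlation with the measurement target exceeds its correlation with the disturbance. This decision rule, the function names and the data are hypothetical simplifications, not the exact criterion used by Di Natale et al.

```python
import math


def pearson(u, v):
    """Pearson correlation coefficient of two equal-length sequences
    (undefined, and raising ZeroDivisionError, if either one is constant)."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    std_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    std_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (std_u * std_v)


def select_components(components, target, disturbance):
    """Return the indices of the components that correlate more strongly
    (in absolute value) with the target than with the disturbance."""
    return [i for i, comp in enumerate(components)
            if abs(pearson(comp, target)) > abs(pearson(comp, disturbance))]


# Hypothetical data for six measurements.
target = [0.0, 0.0, 1.0, 1.0, 0.0, 1.0]              # e.g. class label
temperature = [20.0, 21.0, 22.0, 23.0, 24.0, 25.0]   # monitored disturbance

components = [
    [0.1, 0.0, 0.9, 1.1, 0.1, 1.0],  # follows the target
    [1.0, 1.9, 3.1, 4.0, 5.0, 6.1],  # follows the temperature
]
kept = select_components(components, target, temperature)
```

Here only the first component is kept. When the disturbance is not monitored there is nothing to correlate against, which is the practical limitation noted above.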

Orthogonal Signal Correction
Recently, Padilla et al. (Padilla et al., 2010) proposed a very interesting drift attuning method based on Orthogonal Signal Correction (OSC). OSC is a signal processing technique first introduced by Wold et al. (Wold et al., 1998) for NIR spectra correction. OSC analyzes a set of sensor-array data X, which represents a set of independent variables, and a concentration vector or class label vector C, which represents a set of dependent variables. The main idea is to remove the variance of X that is not correlated with the variables in C. This is achieved by constraining the deflation of non-relevant information of X so that only information orthogonal to C is removed (Fig. 4). The condition of orthogonality therefore ensures that the signal correction process removes as little information as possible from the original data. Several different variants of the OSC algorithm have been presented in the literature. Padilla et al. applied the Wise implementation of the OSC algorithm.
Although OSC has proved to be one of the most effective techniques for drift correction, it does not completely solve the problem. One of its main drawbacks is the need for a set of training data containing a significant amount of drift, so that the set of orthogonal components to be rejected can be precisely identified. This may not be possible in industrial setups, where training data are usually collected over a short period of time. Moreover, the introduction of new analytes to the recognition library represents a major problem, since rejected components might be necessary to robustly identify these new classes.

Neural networks
The first adaptive approaches for drift correction were developed using artificial neural networks. Neural networks are an important tool for building PaRC systems, with several interesting features. First, neural networks are data-driven, self-adaptive methods, in that they can adjust themselves to the data without any explicit specification of a functional or distributional form for the underlying model. Second, they are universal function approximators, in that neural networks can approximate any continuous function with arbitrary accuracy (Cybenko, 1989; Devijver & Kittler, 1982). Since any classification procedure seeks a functional relationship between the group membership and the attributes of the object, accurate identification of this underlying function is important. Third, neural networks are nonlinear models, which makes them flexible in modeling complex real-world relationships such as those exhibited by gas chemical sensors. Finally, neural networks are able to estimate posterior probabilities, which provide the basis for establishing classification rules and performing statistical analysis (Richard & Lippmann, 1991).
Several neural-network-based drift correction methods exploit Kohonen Self Organizing Maps (SOMs) (Kohonen, 1990). (Davide et al., 1994; Marco et al., 1998; Zuppa et al., 2004) use a single SOM common to all classes, while (Distante et al., 2002) proposes to use a separate SOM for each odor. A slightly different approach is proposed in (Vlachos et al., 1997), which exploits a different network architecture, namely the Adaptive Resonance Theory (ART) neural network (Carpenter et al., 1991), which allows new classes to be created. Regardless of the type and specific architecture of the neural network, all proposed methods achieve drift correction by exploiting the way neural networks inherently work. As is common in pattern recognition, a preliminary training phase is used to train the neural network to identify samples from the sensor array. In this phase a set of samples is provided to the network, which learns similarities according to different learning rules (Bishop, 1995). Each new sample slightly changes the way the nodes of the network behave. Both supervised learning, in which training samples are already labeled into a set of classes, and unsupervised learning, in which training samples are not labeled and the set of classes is inferred by the network itself, are possible. When training has converged, the network has learned the characteristics of the input patterns and can be used for classification. At this stage the learning capability of the network is usually disabled. The basic idea for drift compensation is to maintain a certain learning rate during the normal use of the network as well, in order to learn the changes of the input patterns due to drift effects. The learning rate must be kept low in order to avoid over-fitting of the model.
Although neural networks represented the first attempt at implementing adaptive drift correction methods, they have several drawbacks:
• Drift correction is possible only for slow phenomena. A discontinuity in response between consecutive exposures (regardless of the time interval between the exposures) would immediately invalidate the classification model and would prevent adaptation.
• Selecting the appropriate learning rate to keep during normal operation is complex and may strongly impact the correction capability. To the best of our knowledge, no automatic methods have been proposed so far for efficiently tuning this parameter.
• The adaptive model is rather complex and typically requires a high number of training samples.
• Several classification algorithms other than neural networks have been presented in the literature (e.g., SVM, K-NN, Random Forests, etc.). The literature shows that, depending on the specific set of data, some methods may perform better than others. A very tight integration between the correction method and the classification system prevents the use of the best PaRC model for the specific problem.
• Finally, drift correction methods based on neural networks are mainly limited to gas classification applications. Whenever both classification and quantitative gas analysis are required, it would be difficult to apply current adaptive methods to obtain reliable gas concentration measurements (Hui et al., 2003).
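The idea of keeping a small learning rate active during normal operation can be illustrated with a minimal Python sketch. It is a deliberately simplified stand-in (one prototype per class, updated by a winner-take-all rule), not a SOM or ART network and not any of the cited methods. The class name `AdaptivePrototypeClassifier`, the learning rate of 0.2 and the simulated slow drift are all assumptions chosen for illustration.

```python
def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

class AdaptivePrototypeClassifier:
    """Nearest-prototype classifier that keeps learning during use."""

    def __init__(self, prototypes, learning_rate=0.2):
        self.prototypes = {label: list(p) for label, p in prototypes.items()}
        self.learning_rate = learning_rate

    def classify(self, x, adapt=True):
        # Winner: the prototype closest to the incoming sample
        label = min(self.prototypes, key=lambda k: sq_dist(x, self.prototypes[k]))
        if adapt:
            # Move the winning prototype slightly toward the sample
            p = self.prototypes[label]
            for j in range(len(p)):
                p[j] += self.learning_rate * (x[j] - p[j])
        return label

def run(adapt):
    """Classification accuracy on two classes that drift slowly and together."""
    trained = {"A": [1.0, 1.0], "B": [3.0, 1.0]}
    clf = AdaptivePrototypeClassifier(trained, learning_rate=0.2)
    correct = total = 0
    for step in range(60):
        offset = 0.05 * step                      # simulated slow drift
        for label, base in trained.items():
            x = [base[0] + offset, base[1]]
            correct += clf.classify(x, adapt=adapt) == label
            total += 1
    return correct / total

accuracy_adaptive = run(adapt=True)
accuracy_static = run(adapt=False)
```

With adaptation enabled the prototypes follow the drifting classes and every sample is recognized, whereas the static model starts confusing class A with class B once the drift exceeds half the distance between the trained prototypes. The sketch also exposes the first drawback listed above: a sudden jump larger than that distance would attach the samples to the wrong prototype and adaptation would reinforce the error.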

Evolutionary algorithms
Recently, a new adaptive drift correction method based on evolutionary algorithms has been presented by Di Carlo et al. (Di Carlo et al., 2010; 2011). The overall idea is to exploit the learning capabilities of evolutionary algorithms to compute a multiplicative correction factor C used to correct incoming samples. Under the hypothesis that, in the very short term, the variation imposed by the drift can be considered linear in time, the paper proposes to apply the correction through a linear transformation (equation (11)). Although this assumption is a limitation for previous drift counteractions that do not allow for adaptation, it is a good approximation in this case, since the correction factor is not a fixed quantity but is continuously adapted to follow the variation imposed by the drift. The hypothesis of linearity is therefore assumed only within a restricted time window (or number of measurements), whose size can be adapted in order to respect this constraint.
The correction algorithm can be coupled with any selected PaRC model (e.g., SVM, K-NN, Random Forests, etc.) and processes groups of consecutive measurements, denoted as windows, according to the following steps:
1. The initial correction factor, immediately after training of the PaRC model, is set to the null matrix, since no correction is required at this time;
2. For each window of samples the following operations must be performed:
(a) Correct each sample of the window using the current correction factor according to equation (11);
(b) Classify each corrected sample;
(c) Use the corrected samples and the classification results in an evolutionary process able to adapt the correction factor to the changes observed in the current window (see later for an explanation of how this process works);
(d) Correct each sample again using the updated correction factor;
(e) Classify the newly corrected samples and provide the obtained results.
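The steps above can be sketched in Python as follows. Several elements are assumptions made only for illustration and are not taken from the original work: the correction is written as x_c = x + x M, a form chosen because it leaves samples unchanged when M is the null matrix; a nearest-centroid rule stands in for the PaRC model; and a simple elitist evolution strategy with Gaussian mutations stands in for the evolutionary process of step (c). All function names and the toy data are hypothetical.

```python
import random

def correct(x, M):
    """Assumed correction x_c = x + x M (the null matrix leaves x unchanged)."""
    d = len(x)
    return [x[j] + sum(x[i] * M[i][j] for i in range(d)) for j in range(d)]

def nearest_centroid(x, centroids):
    """Stand-in PaRC model: label of the closest training centroid."""
    return min(centroids,
               key=lambda k: sum((a - b) ** 2 for a, b in zip(x, centroids[k])))

def objective(M, window, labels, centroids):
    """Sum of squared distances of corrected samples from their class centroids."""
    total = 0.0
    for x, lab in zip(window, labels):
        xc = correct(x, M)
        total += sum((a - b) ** 2 for a, b in zip(xc, centroids[lab]))
    return total

def adapt(M, window, labels, centroids, rng, generations=50, offspring=8, sigma=0.05):
    """Elitist evolution strategy: keep a mutated matrix only if it improves."""
    best, best_f = M, objective(M, window, labels, centroids)
    for _ in range(generations):
        for _ in range(offspring):
            cand = [[m + rng.gauss(0.0, sigma) for m in row] for row in best]
            f = objective(cand, window, labels, centroids)
            if f < best_f:
                best, best_f = cand, f
    return best

def process_window(window, M, centroids, rng):
    corrected = [correct(x, M) for x in window]                    # step (a)
    labels = [nearest_centroid(x, centroids) for x in corrected]   # step (b)
    M = adapt(M, window, labels, centroids, rng)                   # step (c)
    corrected = [correct(x, M) for x in window]                    # step (d)
    labels = [nearest_centroid(x, centroids) for x in corrected]   # step (e)
    return M, corrected, labels

rng = random.Random(0)
centroids = {"A": [1.0, 1.0], "B": [3.0, 1.0]}   # learned during training
windows = [
    [[1.1, 1.1], [3.3, 1.1], [1.1, 1.0], [3.2, 1.2]],
    [[1.2, 1.2], [3.5, 1.2], [1.2, 1.1], [3.6, 1.3]],
]
M = [[0.0, 0.0], [0.0, 0.0]]                      # step 1: null matrix
for window in windows:
    M, corrected, labels = process_window(window, M, centroids, rng)
```

The correction matrix is carried from one window to the next, so each window only has to account for the additional drift accumulated since the previous one. Because the classifier is only called through `nearest_centroid`, any other PaRC model could be substituted without touching the adaptation code.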
The adaptation of the correction factor exploits the Covariance Matrix Adaptation Evolution Strategy (CMA-ES), an optimization method first proposed by Hansen, Ostermeier, and Gawelczyk (Hansen et al., 1995) in the mid-1990s. It iteratively searches for a matrix that minimizes a given objective function. The approach is best suited to difficult non-linear, non-convex, and non-separable problems of at least moderate dimensionality. In this specific case, the problem tackled by the CMA-ES is finding the correction factor that makes the distribution of the corrected samples of a window as similar as possible to that of the samples used to train the PaRC model. Several metrics can be exploited to measure similarity. In (Di Carlo et al., 2011), under the hypothesis of a Gaussian distribution of samples around the related centroids, the CMA-ES is exploited to identify the correction factor that best moves each sample toward the centroid of the related class, thus compensating for drift effects that tend to move classes in the feature space. The concept of proximity to the related centroid is introduced by testing the system with several types of objective functions, which compute the distance of a sample from the related centroid with different metrics.
This method introduces a set of important improvements compared to previous adaptive methods based on neural networks:
• it can work in cooperation with any PaRC algorithm, thus allowing the selection of the best classification model for the specific application;
• it is more robust to discontinuities in the data;
• together with sample classifications, it also provides corrected measures, thus also allowing quantitative gas analysis.
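As an illustration of how proximity to the related centroid can be scored with different metrics, the following Python sketch defines three candidate distances and a window-level cost. The specific metrics shown here (Euclidean, Manhattan and a variance-scaled Euclidean distance) and all function names are assumptions for illustration; the text above does not state which metrics were actually tested.

```python
import math

def euclidean(x, centroid):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, centroid)))

def manhattan(x, centroid):
    return sum(abs(a - b) for a, b in zip(x, centroid))

def scaled_euclidean(x, centroid, variances):
    """Euclidean distance with each feature scaled by its class variance."""
    return math.sqrt(sum((a - b) ** 2 / v
                         for a, b, v in zip(x, centroid, variances)))

def window_cost(samples, labels, centroids, metric):
    """Objective value for a window: total distance from the class centroids."""
    return sum(metric(x, centroids[lab]) for x, lab in zip(samples, labels))

centroids = {"A": [1.0, 1.0], "B": [3.0, 1.0]}
samples = [[1.0, 1.0], [3.0, 5.0]]
labels = ["A", "B"]
cost = window_cost(samples, labels, centroids, manhattan)
```

An optimizer such as CMA-ES would evaluate a cost of this kind on the corrected samples of each window and prefer the correction factor giving the lowest value. A variance-scaled distance is one way to reflect the Gaussian hypothesis, since it penalizes deviations relative to the spread observed for each class during training.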

Comparisons and discussions
Only partial comparisons and limited considerations can be made on the basis of literature reports. In fact, due to the lack of shared sensor datasets and public software code, only a few comparative works that partially contrast different approaches and report the pros and cons of each solution have been published (Hui et al., 2003; Padilla et al., 2010; Romain & Nicolas, 2010; Sisk & Lewis, 2005). Every paper tends to show good results for the newly proposed approach. However, in many cases measurements were collected over only a few days, thus neglecting long-term drift effects, or the drift was simulated.
Signal preprocessing techniques seem attractive for their simplicity, but cannot be considered an effective drift correction approach for chemical sensors. Baseline manipulation works well only in special cases that fit the assumptions of the employed feature transformation. Filtering methods in the frequency domain appear more promising. (Zuppa et al., 2007) compared the DWT filter with the relative baseline manipulation technique. PCA shows that the DWT approach is superior in terms of cluster dispersion, with clear improvements in sample discrimination. The DWT method has often been combined with adaptive drift correction strategies based on neural networks. Nevertheless, frequency decomposition remains mostly useful for diagnostics and analysis because of the strong dependence of the results on user-defined parameters (e.g., the cut-off frequency).
According to Sisk and Lewis (Sisk & Lewis, 2005), using a single calibrant or a set of calibrants is perhaps the only robust method to mitigate drift effects, even when sensor drift extends over an extremely long period of time. Sisk and Lewis implemented a simple linear sensor-by-sensor calibration scheme, which proved effective at restoring the classification performance of difficult binary separation tasks affected by drift. In many other cases this approach failed to work. Thus, one relevant open point is the tradeoff between model simplicity and robustness. Univariate calibration methods, like MDC, are first-order approximations that do not exploit sensor cross-correlations. Empirically they provided good performance. However, they have several intrinsic limitations. The most important one is that the MDC method assumes that the relative change (increase or decrease) of the sensor response due to the drift is the same for the reference samples and the test samples. This can be true for samples whose chemical composition and concentration are very close to those of the reference gas, but not otherwise. Therefore, the amount of drift may differ between samples, i.e., the drift for the reference gas and for some samples is not the same. As a consequence, the sensor drift is not necessarily the multiplicative factor estimated from the reference gas, as the MDC method assumes. As pointed out by Artursson et al. (Artursson et al., 2000), multivariate techniques, for instance CC, perform better in terms of both drift correction performance and flexibility. The reader may notice that both the CC method and the MDC method are linear, but they differ in what is assumed to be linear. For the CC method, drift is assumed to mainly influence the sensors along a straight line, described by the drift in the reference gas. In the univariate MDC method it is assumed that the relative change in each sensor is the same for the reference gas and all test gases. There exist cases where the drift is linear for one of the methods but not for the other. Both the MDC and CC methods suffer from the same disadvantages when it comes to handling non-linearities under their respective restrictions. However, as long as the relationship between the reference gas and the test gas (for the MDC method) or the relationship between the sensors (for the CC method) remains linear, neither method has a problem in handling non-linearities.
Calibration is certainly the most time-intensive method for drift correction, since it requires system retraining, additional measurements and labour. Hence, it must be used sparingly. Moreover, while this approach is quite simple to implement for physical sensors, where the quantity to be measured is exactly known, chemical sensors pose a series of challenging problems. Indeed, in chemical sensing the choice of the calibrant strongly depends on the specific application, especially when the sensing device is composed of a considerable number of cross-correlated sensors (Haugen et al., 2000; Hines et al., 1999). This leads to a loss of generalization and a lack of standardization, whereas standardization is an important requirement for industrial systems.
Several attuning methods, which deduce drift components directly from the training data, have been proposed to perform drift correction without resorting to calibration. Padilla et al. (Padilla et al., 2010) compared OSC and CC. According to this work, OSC outperforms PCA-CC for a limited period of time, while in the long term the advantage was not clear. More complex OSC models (with a higher number of components) resulted in a better correction of variance for shorter times, but they degraded faster than simpler OSC models. It was also observed that both OSC and PCA-CC are relatively robust to small calibration set sizes and perform rather well with a reduced calibration set. OSC results showed a higher variance than PCA-CC, but this was attributed to the number of components chosen in the models. On the other hand, PCA-CC needed a smaller training set and a single chemical species. This advantage may turn into a disadvantage if the reference class is not properly chosen. Therefore, OSC seems to be a promising approach: it is time-tested in chemometrics, uses multivariate and calibrant information, and is simple to implement and interpret. However, although attuning methods represent a step forward in the definition of an efficient solution to the drift correction problem, their main limitation is that they do not contain provisions for updating the model and thus may ultimately be invalidated by time-evolving drift effects. In particular, they can fail dramatically when the drift direction changes. This might happen if one sensor gets stressed. In this case one must resort to adaptive methods.
Adaptive methods represent an important step forward in tackling the problem of sensor drift in artificial olfaction systems. So far they have not been sufficiently investigated and contrasted with other methods to definitively assess whether their capabilities are superior. In fact, current solutions still present some limitations that prevent their widespread application. In particular, they require equiprobable and frequent sampling of all classes, to prevent a single class from drifting so much that it becomes unrecognizable. Moreover, they strongly rely on correct identification by the PaRC model to keep track of how the different classes change. Local classification errors may easily reduce the capability of adaptation, thus reducing the effectiveness of the correction system. Further research efforts are required in this direction.
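The different linearity assumptions of the MDC and CC methods discussed in this section can be made concrete with a short Python sketch. It is a simplified illustration under stated assumptions, not the implementation of Artursson et al.: MDC rescales each sensor by the ratio observed on the reference gas, while CC estimates a single drift direction from the reference gas measurements (here with a power iteration for the first principal component) and removes the component of each sample along it. The function names and the toy values are hypothetical.

```python
import math

def mdc(x, ref_initial, ref_now):
    """Multiplicative correction: same relative change assumed for all gases."""
    return [xj * r0 / r1 for xj, r0, r1 in zip(x, ref_initial, ref_now)]

def drift_direction(ref_samples, iters=100):
    """First principal direction of the centered reference gas measurements."""
    n, d = len(ref_samples), len(ref_samples[0])
    mean = [sum(r[j] for r in ref_samples) / n for j in range(d)]
    R = [[r[j] - mean[j] for j in range(d)] for r in ref_samples]
    p = [1.0] + [0.0] * (d - 1)
    for _ in range(iters):
        t = [sum(a * b for a, b in zip(row, p)) for row in R]           # t = R p
        p = [sum(R[i][j] * t[i] for i in range(n)) for j in range(d)]   # p = R^T t
        norm = math.sqrt(sum(a * a for a in p))
        p = [a / norm for a in p]
    return p

def cc(x, p):
    """Component correction: remove the part of x lying along direction p."""
    score = sum(a * b for a, b in zip(x, p))
    return [a - score * b for a, b in zip(x, p)]

# Case 1: per-sensor multiplicative drift, the situation MDC assumes.
gain = [0.8, 0.9]
ref0, sample0 = [2.0, 1.0], [3.0, 1.5]
ref_now = [g * r for g, r in zip(gain, ref0)]
sample_now = [g * s for g, s in zip(gain, sample0)]
sample_mdc = mdc(sample_now, ref0, ref_now)

# Case 2: drift along one direction in sensor space, the situation CC assumes.
ref_series = [[2.0 + 0.1 * t, 1.0 + 0.1 * t] for t in range(5)]
p = drift_direction(ref_series)
clean_cc = cc([3.0, 1.5], p)
drifted_cc = cc([3.4, 1.9], p)
```

In the first case MDC recovers the undrifted sample because the relative change is the same for the reference gas and the test gas. In the second case the drifted and undrifted samples coincide after CC because the drift lies entirely along the estimated direction. Swapping the two drift models would break each method, which is the sense in which drift can be linear for one method but not for the other.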

Conclusions and future research directions
In this chapter, after a brief overview of drift phenomenology in gas chemical sensors, the main challenges faced when developing drift correction techniques have been presented and discussed. An in-depth review of the state-of-the-art methodologies proposed in the scientific literature has been presented, organized according to a taxonomy of the various approaches.
Drift correction is perhaps one of the most relevant challenges in the field of chemical sensors. Indeed, in spite of constant improvements in micro/nano fabrication techniques, which have allowed the production of sensing devices with superior stability, it is still impossible to fabricate chemical sensors without drift. Much work has been done in the last fifteen years to develop adapted techniques and robust algorithms. In spite of this, further research is still required to compare them on benchmark data sets. A complete comparison of performance should take into account several parameters, for instance: the type of sensors and possibly the use of hybrid arrays; the presence or absence of a drift model for the given sensors; the short- and long-term drift behavior; the size of the data set (feature space dimension, number of measurements); the problem complexity (number of classes, degree of overlap, type of chemicals/odors). The problem of data correction in the presence of simultaneous sources of drift, other than sensor drift, should also be investigated, since this is often the case in practical situations. To this end, one idea could be to combine semi-supervised methods able to learn the actual source of drift, which might clearly change with the measured samples, with adaptive drift correction strategies that can account for the continuous change of drift direction in the feature space. Finally, many drift correction algorithms could be adapted to the problem of sensor failure and subsequent replacement, which has received little attention until now but represents a relevant problem in the long term. Indeed, only a few works aiming at the detection and correction of sensor faults can be found in the literature, for example Tomic et al. (Tomic et al., 2004), who studied calibration transfer techniques like MDC and CC in relation to sensor replacement.
The current trend in the field of artificial olfaction is to enlarge both the size of the data sets and the dimensionality of the sensor arrays, for instance by building huge micro-machined sensor arrays made of many thousands of sensors. In the near future, this will certainly raise challenging issues for data analysis, requiring more powerful drift correction algorithms able to handle large volumes of data as well as high-dimensional features with acceptable time and storage complexity. Very large sensor arrays will also pose problems connected with variable redundancy. Therefore, a further relevant issue is the selection of appropriate and meaningful features to combine with the drift correctors, which can greatly reduce the burden of the subsequent design of classification/regression systems.

Fig. 2. Comparison between as-prepared (left) and 120-day aged (right) sensors by SEM-EDX analysis. The aged sensor shows a larger transparency of the IDC (compare upper left and right images) and the formation of Pt agglomerates (compare bottom left with bottom right). Legend: SE = Secondary Electrons; BSE = Back Scattered Electrons (Courtesy of Dr. A. Ponzoni and Dr. M. Ferroni, University of Brescia and CNR-IDASC)

Fig. 3. Taxonomy of drift correction methodologies published in the scientific literature