Sensors and Actuators B: Chemical
Field calibration of a cluster of low-cost commercially available sensors for air quality monitoring. Part B: NO, CO and CO2

In this work, the performance of several field calibration methods for low-cost sensors, including linear/multilinear regression and supervised learning techniques, is compared. A cluster of metal oxide and electrochemical sensors for nitrogen monoxide and carbon monoxide, together with miniaturized infrared carbon dioxide sensors, was operated. Calibration was carried out against reference measurements during the first two weeks of evaluation. The accuracy of each regression method was evaluated over a five-month field experiment at a semi-rural site using different indicators and techniques: orthogonal regression, target diagram, measurement uncertainty and drift over time of sensor predictions. In addition to the analyses for ozone and nitrogen dioxide already published in Part A [1], this work assessed whether carbon monoxide sensors can reach the Data Quality Objective (DQO) of 25% uncertainty set in the European Air Quality Directive for indicative methods. As for ozone and nitrogen dioxide, it was found for NO, CO and CO2 that supervised learning techniques gave the best agreement between sensors and reference measurements, compared with linear and multilinear regression.


Introduction
Low-cost gas sensors are attracting increasing interest in the field of air pollution monitoring [2], as a complement to conventional methods such as optical/spectroscopic analysers. Compared with the reference methods defined in the Air Quality Directive [3], low-cost gas sensors would considerably reduce both installation and maintenance costs and allow larger spatial coverage, especially in remote areas. Nevertheless, these devices are known to suffer from weak metrological characteristics, such as an intrinsic lack of selectivity, which makes them unreliable [4-6]. Although various methods have been developed and studied to overcome these weaknesses, the calibration of low-cost gas sensors still represents a challenge in air quality monitoring. Besides the gaseous compound of interest, this process often needs to take additional variables into account.
Several methods and algorithms have been studied for the calibration of sensors, either establishing a linear relationship between the measured gas concentrations and the corresponding sensor responses, or using more sophisticated calibration functions that include corrections for several gaseous and physical variables in order to limit the impact of interferences. Dickow and Feiertag [7] presented a systematic method to determine calibration coefficients using polynomial equations fitted by ordinary least squares. Other simple methods include deterministic correction of the sensor response to deal with interfering gaseous compounds, for example subtracting the O3 interference from an NO2 electrochemical sensor, which is well known to respond simultaneously to NO2 and O3 [8,9]. More sophisticated algorithms use data generated by metal oxide (MOx) sensors operated with temperature cycles to improve ozone sensitivity [10], or to improve selectivity and stability for some organic compounds [11]. Vergara et al. [12] reported the performance of a gas sensor array able to discriminate a plume of ten different gases under various wind conditions. Masson et al. [13] used a simplified laboratory model combined with a collocation field calibration. More sophisticated techniques, such as artificial neural networks, turned out to be of great interest. Kamionka et al. applied a combination of temperature patterns and artificial neural networks in field calibrations [14,15].
Neural networks have already been used for the monitoring of CO or CH4 at high concentration levels [16]. This study reported satisfactory results for short periods but generally weak results for longer data series. Pardo et al. applied neural networks, in particular multilayer perceptrons, to sensor array data analysis [17]. Many attempts have been made to use neural networks for the calibration of sensors monitoring in the low concentration range (nmol/mol) [18,19]. De Vito et al. applied neural networks to the on-field calibration of CO, NO2 and NOx sensor arrays [20]. However, most of the cited studies used MOx-type sensors, which are known to suffer from a lack of stability and long response times [21].
Recently, within the EURAMET MACPoll project [22], the performance of single commercial sensors was evaluated [9,23-25] according to a precise protocol [26]. This study produced large datasets of measurements for several compounds under laboratory conditions and during field campaigns. Such datasets were not previously available in the literature, especially considering the number of controlled parameters (NOx, O3, CO, SO2, CO2, temperature, relative humidity, wind and pressure).
As in the companion paper [1], this study analyses the performance of different calibration models applied to NO, CO and CO2 sensors tested under the same conditions. The methods are compared using their measurement uncertainty as indicator. For CO sensors, we check whether the measurement uncertainty is consistent with the Data Quality Objective (DQO) of the European Directive [3] for indicative methods, i.e. a relative expanded uncertainty of 25%. Even though CO2 is not included in the European Directive, it can be used as an estimator of soil respiration profiles [27], as a traffic-related indicator [28,29] or as a proxy for combustion sources.

Material and methods
Experiments were carried out in parallel with the experiments for O3 and NO2 already discussed in Part A [1]. This work was done in collaboration with the European Reference Laboratory for Air Pollution (ERLAP) at the Joint Research Centre's EMEP station (45°48.881′ N, 8°38.165′ E). This semi-rural station was equipped with meteorological sensors (temperature, relative humidity, wind and pressure) and reference gas analysers for NOx, O3, CO, CO2 and SO2. These reference measurements were used for data validation, comparison and data treatment of sensor responses. Laboratory evaluation was carried out only for the NO sensor, including the limit of detection and the effects of gaseous interferents, temperature, relative humidity and wind velocity. Results of this evaluation are included in the Discussion section. No in-house evaluation was carried out for the CO or CO2 sensors; the selection of these sensors was therefore based on the manufacturer datasheets (Table 1). Although the nominal range of these sensors extends far beyond the pollutant concentrations observed in ambient air, their high sensitivity and precise analogue-to-digital conversion make it possible to reach a limit of detection consistent with ambient air levels. The best performing sensors were selected on the basis of high sensitivity, high resolution and short response time.

Low-cost sensors
Besides the five NO2 and the two O3 sensors previously listed [1], two types of CO sensors (one electrochemical and one metal oxide), one type of electrochemical NO sensor and two types of infrared CO2 sensors were tested. For each sensor type, two devices were used to assess sensor repeatability. The selected sensors are listed in Table 1 along with the manufacturers' names and model specifications. All sensors were connected through NI DAQ boards (NI USB 6009 and NI USB 6218 from National Instruments, USA) to our in-house LabVIEW DAQ software. Data were acquired at 100 Hz and averaged every minute without filtering. No other data treatment was applied during data acquisition. The sensors were enclosed in aluminium protective boxes, and the evaluation boards were covered with Teflon tape to protect the electronics and avoid any contamination of the sensors.
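The acquisition chain described above (100 Hz sampling, unfiltered one-minute means, and the hourly means used later for evaluation) can be sketched as follows. This is a minimal illustration, not the LabVIEW software used in the study; the function names and the 75% coverage rule for hourly values are hypothetical choices.

```python
from statistics import fmean

SAMPLE_RATE_HZ = 100                      # acquisition frequency stated in the text
SAMPLES_PER_MINUTE = SAMPLE_RATE_HZ * 60  # 6000 raw samples per minute


def minute_averages(samples):
    """Average raw 100 Hz samples into consecutive 1-minute means.

    Plain arithmetic mean, no filtering. An incomplete trailing minute
    is dropped (an assumption; the text does not say how it was handled).
    """
    n_full = len(samples) // SAMPLES_PER_MINUTE
    return [
        fmean(samples[i * SAMPLES_PER_MINUTE:(i + 1) * SAMPLES_PER_MINUTE])
        for i in range(n_full)
    ]


def hourly_averages(minute_values, min_coverage=0.75):
    """Average 1-minute means into hourly means.

    Missing minutes are given as None. Hours with fewer than
    min_coverage * 60 valid minutes are returned as None; the 75 %
    threshold is a hypothetical choice.
    """
    hours = []
    for start in range(0, len(minute_values) - 59, 60):
        block = [v for v in minute_values[start:start + 60] if v is not None]
        hours.append(fmean(block) if len(block) >= min_coverage * 60 else None)
    return hours
```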
Two NO sensors, NO 3E100 [30] from Citytech (Life Safety Germany GmbH, City Technology, Bonn, Germany), were tested. They are three-electrode amperometric sensors with organic electrolyte. Each sensor was mounted on a Citytech evaluation board that converts the raw sensor signal into a voltage, with the possibility to vary the bias potential using various loads, feedback resistors and different levels of current amplification. The board was configured to give an output of 1 V/100 nA with damping 10.
Two CO/NO2 combined metal oxide sensors from SGX Sensortech (Neuchâtel, Switzerland) were tested in this study. These sensors, the MICS 4514 [31], can detect NO2 and CO simultaneously with two different signal outputs. They were soldered by the manufacturer onto two MICS-EK1 adapters and mounted on two MICS-EK1 gas sensor evaluation kits [32]. Following the manufacturer datasheet, the evaluation kit was operated in manual mode at low power for the NO2 sensors (43 mW, corresponding to an R_LOAD of 1 kΩ) and at high power for the CO sensors (76 mW, corresponding to an R_LOAD of 256 kΩ).
Two TGS 5042-A00 carbon monoxide sensors, manufactured by Figaro (Illinois, USA), were tested. They are battery-like electrochemical sensors [33]. They were mounted on two COM5042 evaluation modules, which convert the sensor output current into a voltage [34].
Two types of CO2 sensors were used. The first was the S-100H carbon dioxide module manufactured by TCC ELT (Environment Leading Technology, South Korea), based on NDIR (non-dispersive infrared) technology [35].
The second CO2 sensor was the OEM Gascard® NG infrared gas sensor (0-1000 µmol/mol) manufactured by Edinburgh Sensors (Lancashire, UK). It is based on dual-wavelength NDIR technology with automatic temperature and pressure corrections using real-time measurements of environmental conditions. This CO2 sensor used active sampling with a 1 L/min pump.

Table 1. List of clustered sensors (columns include manufacturer and sensor). The resolution is either given by the manufacturers or estimated from data acquisition parameters or field experiments.

Reference measurements
The measuring campaign was performed at the JRC Ispra station from March to July 2014. As described in Refs. [1,9,23-25], the mobile laboratory was equipped with reference analysers, meteorological sensors and low-cost sensors. In addition to a Thermo Environment 49C UV photometer for O3 and a Thermo 42C chemiluminescence analyser for NO2/NO/NOx, we used a Horiba APMA 370 non-dispersive infrared gas-filter correlation analyser for CO and a Li-cor 6262 differential non-dispersive infrared gas analyser for CO2.
The gas analysers were calibrated in the laboratory before the field tests and then checked monthly. Field checks were carried out using filtered zero air and span concentrations. The span values were generated with low-concentration gas cylinders certified by the Joint Research Centre, which is accredited according to ISO 17025 [36] for these analyses. The gas cylinders included concentration levels of 50, 100 and 200 nmol/mol for NO/NOx, 1.3 µmol/mol for CO and 369 µmol/mol for CO2 (uncertified). The highest calibration drift observed during the field tests was 2.5% for NO, 4.5% for CO and 1.5% for CO2. For all three gaseous species, these drifts were lower than the data quality objective for reference measurements given in the European Directive on air quality (for example 15% for carbon monoxide). Therefore, no corrections were applied to the reference measurements, apart from discarding values recorded during maintenance and calibration checks.

Sensor calibration methods and selection of variables
As in the companion paper [1], three calibration methods were tested: simple linear regression (LR), multivariate linear regression (MLR) and artificial neural networks (ANN) with raw, standardized (rescaled to zero mean and unit standard deviation, see Section 3.3 below) and MLR-calibrated sensor responses.

Linear regression (LR)
For each sensor, a calibration function was established by assuming linearity between the sensor responses and the reference measurements of each pollutant. Ordinary least-squares regression was used, minimizing the squared residuals of the sensor responses versus the reference measurements. The calibration functions were of the form Rs = a·X + b, where Rs represents the sensor responses and X the corresponding reference measurements of the air pollutant. Finally, the measuring function, i.e. the inverse equation X = (Rs − b)/a, was applied to all sensor responses in order to predict air pollutant levels.
Within our dataset, the data corresponding to the first two weeks of valid measurements were used for calibration (about 336 hourly values). The remaining data (about 90% of the total dataset) were used for validation of the measuring functions.

Multivariate linear regression (MLR)
The calibration was carried out using the least-squares method, taking into account more than one explanatory variable Y_i. Table 2 shows the models and explanatory variables used. The models were established based on the manufacturer datasheets, except for the NO sensors. Coefficients a, b, c, d and e represent calibration parameters. These parameters were estimated by multilinear regression during calibration using: i) the sensor responses, ii) the reference gas measurements of CO2, CO and NO, and iii) the reference relative humidity and temperature, RH and T respectively. The resulting measuring function, X = f(Rs, Y_i), was then applied to each sensor. The same calibration/validation split as for linear regression was used for multilinear regression.
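The MLR calibration can be illustrated with a hypothetical four-coefficient model, Rs = a·X + b·T + c·RH + d; the actual model forms of Table 2 are not reproduced here. The sketch solves the least-squares normal equations in plain Python and then inverts the model to obtain the measuring function. All names are illustrative.

```python
def _solve(a_matrix, rhs):
    """Solve the square system A p = rhs by Gaussian elimination with partial pivoting."""
    n = len(a_matrix)
    m = [row[:] + [v] for row, v in zip(a_matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    p = [0.0] * n
    for r in range(n - 1, -1, -1):
        p[r] = (m[r][n] - sum(m[r][c] * p[c] for c in range(r + 1, n))) / m[r][r]
    return p


def fit_mlr(rs, x, t, rh):
    """Least-squares fit of the hypothetical model Rs = a*X + b*T + c*RH + d."""
    rows = [[xi, ti, hi, 1.0] for xi, ti, hi in zip(x, t, rh)]
    k = len(rows[0])
    # Normal equations: (D^T D) p = D^T Rs
    dtd = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    dty = [sum(r[i] * yi for r, yi in zip(rows, rs)) for i in range(k)]
    return _solve(dtd, dty)  # [a, b, c, d]


def predict_mlr(rs, t, rh, params):
    """Measuring function for this model: X = (Rs - b*T - c*RH - d) / a."""
    a, b, c, d = params
    return [(r - b * ti - c * hi - d) / a for r, ti, hi in zip(rs, t, rh)]
```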

Artificial neural network (ANN)
The calibration using artificial neural networks (ANNs), described in Ref. [1], was performed on three datasets: raw sensor data, standardized data and data calibrated with the MLR method (see Section 3.2). For the standardized values, the data were scaled with a z-transformation to a mean of zero and a standard deviation of 1.
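The z-transformation can be written in a few lines. This sketch uses the population standard deviation and returns the scaling parameters separately so that, as an assumption not stated in the text, they can be computed on the training period and reused unchanged for the test and validation periods.

```python
from statistics import fmean, pstdev


def zscore_params(values):
    """Mean and population standard deviation used for the z-transformation."""
    return fmean(values), pstdev(values)


def zscore(values, mean, sd):
    """Rescale values to zero mean and unit standard deviation: z = (v - mean) / sd."""
    return [(v - mean) / sd for v in values]
```

Reusing the training-period parameters keeps later periods on the same scale as the data the networks were trained on.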
As described in the companion paper [1], the whole dataset was divided into three parts:
- the training period used the first week of valid measurements (about 168 hourly values);
- the test period used the second week of the measuring campaign;
- the rest of the dataset (about 85% of the data) was used as a validation set to ensure that the results on both the training and test sets were real and not artefacts of the training process.
The output of the ANNs consisted of an ensemble of at most 1000 networks selected from 10 000 tested networks with different multilayer perceptron (MLP) architectures (see Table 3).
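The chronological three-way split and the ensemble output can be sketched as follows. The text does not say how the retained networks were chosen from the 10 000 candidates; ranking them by test-period RMSE, as done here, is an assumption, and the candidates are represented by arbitrary callables rather than trained MLPs. The 168-value length of the test week is also assumed.

```python
import math

TRAIN_HOURS = 168  # first week of valid hourly data
TEST_HOURS = 168   # second week (assumed to contain about 168 hourly values)


def three_way_split(rows):
    """Chronological split into training (week 1), test (week 2) and validation (rest)."""
    train = rows[:TRAIN_HOURS]
    test = rows[TRAIN_HOURS:TRAIN_HOURS + TEST_HOURS]
    validation = rows[TRAIN_HOURS + TEST_HOURS:]
    return train, test, validation


def rmse(pred, ref):
    """Root mean squared error between predictions and reference values."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))


def build_ensemble(candidates, test_inputs, test_ref, max_members=1000):
    """Keep the max_members candidates with the lowest test RMSE (hypothetical rule).

    Each candidate is a callable mapping one input row to a prediction.
    """
    ranked = sorted(candidates,
                    key=lambda m: rmse([m(x) for x in test_inputs], test_ref))
    return ranked[:max_members]


def ensemble_predict(members, inputs):
    """Ensemble output: mean of the member predictions for each input row."""
    return [sum(m(x) for m in members) / len(members) for x in inputs]
```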
The input variables were selected using a sensitivity analysis. Variables that were both mutually independent and correlated with the air pollutant of interest were selected. A second sensitivity analysis was then performed to discard variables that were not significant for the ANN architectures. This analysis used the sum of squared residuals (SSR) of the model, comparing the SSR of the full ANN model with the SSR obtained when the respective sensor was removed from the network. Non-significant parameters were discarded one by one (Table 3), and the ANN training was repeated until all parameters were found to be significant. As far as possible, we avoided selecting reference gas measurements as ANN inputs, in order to rely mainly on low-cost sensors. However, we did consider reference temperature and relative and absolute humidity, because of their potential impact on low-cost gas sensor responses.
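The one-by-one elimination based on SSR can be sketched generically. `fit_ssr` is a hypothetical callable that trains a model (an ANN in the paper) on a given list of inputs and returns its SSR; the significance rule, a minimum SSR increase ratio of 1.05 when an input is removed, is an illustrative assumption, since the text does not give the criterion.

```python
def backward_elimination(variables, fit_ssr, ratio_threshold=1.05):
    """Discard non-significant inputs one at a time using an SSR ratio.

    For each remaining input, SSR(model without it) / SSR(full model) is
    computed. If the smallest ratio is below ratio_threshold, that input
    is removed and the model is retrained; otherwise all remaining inputs
    are considered significant. fit_ssr must return a positive SSR.
    """
    selected = list(variables)
    while len(selected) > 1:
        full = fit_ssr(selected)
        ratios = {
            v: fit_ssr([u for u in selected if u != v]) / full
            for v in selected
        }
        weakest = min(ratios, key=ratios.get)
        if ratios[weakest] >= ratio_threshold:
            break  # every remaining input is significant
        selected.remove(weakest)
    return selected
```

In a usage example, a toy `fit_ssr` that adds a fixed penalty for each missing input is enough to see an uninformative input being dropped while informative ones are kept.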

Evaluation of calibration methods
Only hourly averaged values were considered for the evaluation of sensor performance. For each method, we based our study on the predicted values, using both regression coefficients and difference-based analysis. As described in Ref. [1], we calculated the coefficient of determination (R²) and the slope and intercept of the regression line, and compared the slope and intercept with their ideal values of 1 and 0, respectively. We also used the target diagram to show the Mean Bias Error (MBE) and the Root Mean Squared Error (RMSE) normalised by the standard deviation of the reference measurements. We used the measurement uncertainty based on the orthogonal regression [37] of estimated outputs against reference data to assess the performance of each calibration method. Finally, we estimated the drift over time by plotting time series of the daily residuals between reference measurements and sensor predictions.

Table 3. Lists of available parameters and selected inputs for the different ANNs (columns: available parameters and sensors; selected inputs after sensitivity analysis).
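The orthogonal regression and the resulting relative expanded uncertainty can be sketched as below. The uncertainty formula is an assumption taken from the EC Guide to the Demonstration of Equivalence of Ambient Air Monitoring Methods, u_c² = RSS/(n − 2) − u²(x) + [b0 + (b1 − 1)·x]², with U_r = k·u_c/x and k = 2; the text itself only cites [37] and the companion paper. The regression assumes equal error variances on both axes, and `u_ref` (standard uncertainty of the reference method) is an input the user must supply.

```python
import math
from statistics import fmean


def orthogonal_regression(x, y):
    """Orthogonal regression y = b0 + b1*x (equal error variances in x and y)."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))  # must be non-zero
    b1 = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    b0 = my - b1 * mx
    return b0, b1


def relative_expanded_uncertainty(x, y, u_ref, levels, k=2.0):
    """U_r at each concentration level, with x = reference and y = sensor estimates.

    u_c^2 = RSS/(n-2) - u_ref^2 + (b0 + (b1 - 1)*level)^2, where RSS is the
    sum of squared residuals y - (b0 + b1*x). Negative u_c^2 values are
    clipped to zero (a choice made for this sketch).
    """
    b0, b1 = orthogonal_regression(x, y)
    n = len(x)
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    result = []
    for level in levels:
        uc2 = rss / (n - 2) - u_ref ** 2 + (b0 + (b1 - 1.0) * level) ** 2
        result.append(k * math.sqrt(max(uc2, 0.0)) / level)
    return result
```

Evaluating `levels=[8.6]` for CO (µmol/mol) would correspond to the extrapolation to the limit value mentioned in the conclusions.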

Presentation of the dataset
The dataset was studied using descriptive statistics. The JRC's EMEP station, a semi-rural site in a humid region, shows high relative humidity and low air pollutant levels for NO and CO (see Fig. 1). However, high peak values of CO and NO2 of about 1.3 µmol/mol and 150 nmol/mol, respectively, were observed. These peaks are likely due to the temporary location of the mobile laboratory close to a railroad crossing.
The table in Fig. 1 shows the correlation coefficients r between the reference measurements. It shows that the dataset suffers from a strong lack of independence between parameters. For example, CO2 shows a high negative correlation with temperature (r = −0.61) and a high positive correlation with relative humidity (r = 0.62). Temperature and humidity are well known to be important factors that may affect sensor responses; however, using only field tests with uncontrolled temperature and humidity makes it impossible to separate their effects on the sensor response. As proposed in Ref. [1], we therefore used absolute humidity instead of temperature and relative humidity, since absolute humidity is not correlated with CO2.
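The text does not state how absolute humidity was derived from temperature and relative humidity. A common approach, assumed here, uses the Magnus approximation of the saturation vapour pressure and the ideal gas law for water vapour:

```python
import math


def saturation_vapour_pressure_hpa(t_celsius):
    """Magnus approximation of the saturation vapour pressure over water (hPa)."""
    return 6.112 * math.exp(17.67 * t_celsius / (t_celsius + 243.5))


def absolute_humidity(t_celsius, rh_percent):
    """Absolute humidity (g/m3) from temperature (degC) and relative humidity (%).

    AH = e * 100 / (R_v * T_K) * 1000 with e = RH/100 * e_s (hPa);
    2.1674 is approximately 1000 / R_v with R_v = 461.5 J/(kg K).
    """
    e_hpa = rh_percent / 100.0 * saturation_vapour_pressure_hpa(t_celsius)
    return 2.1674 * e_hpa * 100.0 / (273.15 + t_celsius)
```

At 20 °C and 100% relative humidity this gives about 17.3 g/m3, the usual saturation value.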
Regarding our three species of interest (NO, CO and CO2), the highest correlations were found for CO2 with O3 and for CO with NO2. Concerning NO, only the correlation with NO2 should be taken into account, but the coefficient is only moderate (0.52). For CO, the high correlation with NO2 (r = 0.80) suggested rejecting NO2 measurements as estimators. The same applies to CO2 and O3 (r = −0.81) and, to a lesser extent, to relative humidity and temperature (correlation coefficients of 0.62 and −0.61, respectively).

Results of calibration methods
For both linear regression (LR) and multilinear regression (MLR), we performed the calibration using the first two weeks of measurements as the calibration period. Table 4 gives the evaluation parameters for the LR and MLR methods. For every sensor, the measuring equation (X = (Rs − b)/a or X = f(Rs, Y_i)) was applied to the validation dataset. For the MLR model, we avoided using reference measurements as input. However, when needed (see Table 2), temperature and humidity were taken from the reference measurements to maximize the benefit of the calibration. The regression coefficients for both calibration and validation periods show that only a few sensor pairs of the same model type gave very different results.

Table 4. Performance of linear and multilinear regression for the calibration of single sensors. Results were observed on the validation set (quoted values are standard uncertainties u); n is the number of validation measurements.
As an example of LR calibration, Fig. 2 gives the scatterplot of the LR-predicted sensor values versus the CO2 reference measurements for the second S-100H sensor. Red dots represent the values used during the calibration process and blue dots the values predicted for the validation dataset. The scatterplot shows that the R² of the predicted values decreased on validation compared with calibration (R² = 0.71 against R² = 0.93).
Concerning NO and CO sensors, neither LR nor MLR performed well enough. For the CO sensors, the strength of association tends to decrease from calibration to validation when applying either LR or MLR. The MOx sensors in particular show a drastic drop of more than 90% (calibration R² = 0.76, validation R² = 0.035). As shown in Fig. 1, the CO and NO levels observed during the field experiment were very low. This especially affects the extrapolation of the calibration model outside the calibration range, resulting in a poor correlation between reference and sensor measurements because of the narrow concentration range.

Table 5 presents the performance of the ANN calibrations for each gaseous compound and each dataset, evaluated using the regression parameters obtained during the validation period. Based on the lists of inputs selected after the sensitivity analysis (Table 3), we performed the analysis on three types of input data: raw, standardized (std) and MLR-calibrated (MLR). We kept the same list of inputs for the three types of dataset in order to compare them. The differences in the number of data used for the calculation are mainly due to the manual validation performed to remove artefacts and erroneous values. The correlation observed for all datasets remains rather low for both CO and NO, with a maximum of 0.21 for NO (ANN based on standardized values) and an average of 0.34 for CO. For CO2, the best correlation (0.79) was found for the ANN with the raw dataset, while the use of MLR data resulted in a lower R² (0.51). Fig. 3 gives the target diagram for the LR, MLR and ANN calibration methods for NO (green), CO (red) and CO2 (black). The target diagram [38] is used for evaluating sensor data against reference measurements.
This diagram is an evolution of the Taylor diagram [39], which is based on the geometrical relation between the Centred Root Mean Square Error (CRMSE) and the standard deviations of both the reference (RM) and sensor data (S). The target diagram extends the notion of the Taylor diagram […] Tables 4 and 5. This plot represents the normalised RMSE as the quadratic sum of the normalised MBE on the y-axis and the normalised CRMSE on the x-axis. The distance between each point and the origin represents the normalised RMSE of each sensor. Furthermore, target scores are plotted in the left half of the diagram when the standard deviation of the sensor responses is lower than that of the reference measurements, and in the right half otherwise. In the original approach of the target diagram, RMSE, MBE and CRMSE are normalised by the standard deviation of the reference measurements (RM). Sensors whose RMSE equals the standard deviation of the observations lie on the circle of radius 1. Target scores inside this circle indicate a variance of the residuals between sensor and reference measurements equal to or lower than the variance of the reference measurements. In other words, sensors within the target circle are better predictors of the reference measurements than the mean concentration over the whole sampling period. The target diagram normalised by S was already used in the companion paper [1].

Fig. 4 shows the relative expanded uncertainty (U_r) versus NO reference measurements for selected sensors calibrated by ANN with raw data or ANN with scaled data. The uncertainty values for ANN with MLR data were higher than 580%, so they were not included in the figure. Fig. 5 shows U_r for CO sensors against CO reference measurements. For both species (NO and CO), LR and MLR are not shown in the figures, as their uncertainty was higher than 120%. Fig. 6 shows U_r for CO2 sensors against CO2 reference measurements. Finally, Fig. 7 gives the time series of the NO, CO and CO2 residuals between reference measurements and sensor predictions using the LR, MLR and ANN calibration methods.
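The coordinates of one point on the target diagram (normalised by the standard deviation of the reference, as in the original approach) can be computed as follows. MBE is taken here as mean(sensor − reference); this sign convention is an assumption and should be flipped if the opposite convention is used. With population statistics, RMSE² = MBE² + CRMSE² holds exactly, so the distance to the origin is the normalised RMSE.

```python
import math
from statistics import fmean, pstdev


def target_coordinates(sensor, reference):
    """Return the (x, y) position of one sensor on the target diagram.

    y = MBE / sigma_ref with MBE = mean(sensor - reference)
    x = sign(sigma_sensor - sigma_ref) * CRMSE / sigma_ref
    The point lies inside the unit circle when RMSE < sigma_ref.
    """
    ms, mr = fmean(sensor), fmean(reference)
    sd_s, sd_r = pstdev(sensor), pstdev(reference)
    mbe = ms - mr
    crmse = math.sqrt(fmean(((s - ms) - (r - mr)) ** 2
                            for s, r in zip(sensor, reference)))
    sign = 1.0 if sd_s >= sd_r else -1.0  # right half: sensor more variable
    return sign * crmse / sd_r, mbe / sd_r
```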

Discussion
The target diagram of all calibration methods is presented in Fig. 3. It shows that CO2 calibrated with the ANN raw, ANN scaled and ANN MLR methods, and NO calibrated with the ANN raw and ANN scaled methods, fell within the target circle. As already observed in Ref. [1], the ANN calibration methods resulted in both a lower bias and a lower CRMSE than LR and MLR, showing that calibration with ANN is the most effective method. For the CO sensors, even though the ANN methods were by far the most efficient, all calibration methods produced RMSEs falling outside the target circle, evidencing a lack of agreement between CO sensor values and reference measurements. Our first hypothesis is that this was primarily caused by the limited range of CO levels at the test site (50% of data within a range of less than 0.2 µmol/mol), which did not allow a correct fit of the calibration function, as further discussed below.
Generally, the lack of agreement of the sensor values falling outside the target circle was caused by both high bias and high CRMSE values, as shown by the positions of the points in the quadrants of the target diagram. A few exceptions can be observed, with sensor values free of bias but presenting large RMSEs, e.g. TGS-5042 and S-100H (see Fig. 3 and Table 4). All values fell in the right half of the target diagram, indicating that the variances of the sensor values were higher than that of the reference values and suggesting that the sensors did not suffer from a lack of sensitivity compared with the reference methods. Finally, most CO and CO2 biases were positive, showing an underestimation by the sensors just after the calibration period. This is confirmed by Fig. 7, which shows a decrease of the sensor responses, especially for the CO sensors.
In addition, Fig. 3 shows that none of the LR or MLR calibration methods gave satisfactory results for any of the gases. In fact, as shown in Table 4, all evaluation coefficients decreased drastically when applying the calibration models, except for one CO2 sensor for which the coefficient decreased by around 25%, suggesting either a lack of fit of the calibration functions or significant drift between calibration and validation. Indeed, Fig. 7 gives evidence of a drift over time of about 15 µmol/mol over about four months for the ANNs, against around 25 µmol/mol for LR and MLR. The ANN methods with raw and scaled inputs resulted in similar, constant noise, while the ANN with MLR input showed slightly higher drift and noise.
Figs. 4 and 5 show the U_r of the estimated values versus reference data as a function of NO and CO levels for the five calibration models. The full scale of the x-axis in Fig. 4 (NO) is set to 100 nmol/mol by analogy with the hourly limit value for NO2. In both cases, the concentration range was too limited and the relative expanded uncertainties remained very high (in agreement with Fig. 3). The minimum U_r reached is 70% at 6 nmol/mol for NO (ANN using raw data) and around 30% at less than 1 µmol/mol for CO (ANN using MLR data). The poor performance of the tested CO and NO models can be explained by the limited range of concentrations observed during the exposure period. Fig. 6 shows the U_r of the estimated values versus CO2 reference data for the five calibration models. The lowest uncertainty was obtained for the ANN models using either raw or scaled data, with a minimum of 5% reached at 440 µmol/mol. Moreover, none of these ANNs used ancillary data, and the sensors were not pre-calibrated. The ANN models only used a combination of one NDIR CO2 sensor, one NO2 MOx sensor, one electrochemical NO2 sensor and one combined NO2/CO MOx sensor (see Table 3).
Using the method described in Section 3.3, the same approach for selecting the input parameters was applied to the three gaseous species NO, CO and CO2. For CO2, the sensor combination gives a good uncertainty. For CO and NO, the ANN does not show a real improvement, even though the sensitivity analysis appears to be the most efficient way of handling cross-sensitivities. Moreover, Fig. 7 shows that for NO and CO the ANN models reduce the noise. This reduction reaches a factor of 10 for CO and up to 100 when comparing ANNs with LR and MLR calibration. Additionally, ANNs also seem able to partly correct the drift over time of the CO sensors, with about 0.05 µmol/mol over four months against 0.25 µmol/mol for the LR and MLR models over the same period. NO sensors appear to be free from drift over time, apart from one NO-3E100 sensor with MLR calibration.
Among the selected sensors, the NO-3E100 was later tested in the laboratory [40]. It showed a rather linear calibration line over the full concentration scale (between 0 and 150 nmol/mol) at a stable temperature of 22 °C and relative humidity of 60% (Fig. 8). Its limit of detection was found to be high, at 74.9 nmol/mol for minute averages, estimated as three times the standard deviation of repeatability at 0 nmol/mol. The influence of gaseous interferents was determined by exposing the sensor to stable levels of pollutants, one at a time, at the same temperature (22 °C) and relative humidity (60%). Table 6 gives the sensitivity coefficients of the NO sensor, calculated as the slope of the linear regression between the calibrated sensor responses and the interfering compounds: O3, NO2, CO, CO2 and NH3. For temperature and relative humidity, the sensor was exposed to a temperature ramp from 7 °C to 37 °C in 5 °C steps and to a humidity ramp from 40% to 80% in 10% steps (Table 6).
The sensor's susceptibility to hysteresis was estimated using a ramp of rising NO levels, followed by a ramp of decreasing levels and finally another rising ramp. Three calibration lines were plotted, one for each ramp. The NO-3E100 was found to be free of any NO hysteresis effect in this experiment.
The effect of the air matrix was also tested by using three different air matrices (filtered air, ambient air and indoor air) for dilution at the same NO levels. Three calibration lines were plotted, one for each matrix, and their slopes were compared. The slopes of the calibration lines differed by about 20% depending on the air matrix. This implies that one or more unknown interfering gaseous compounds were present in the dilution air.
To determine the influence of wind velocity on the sensor response, we carried out a test at four wind speeds between 1 m/s and 4 m/s in 1 m/s steps at constant NO. Temperature and humidity were kept under control during the tests. The sensor appeared to be independent of wind speed between 1 and 4 m/s, with changes in sensor response within ±1 nmol/mol. This later study showed that O3, NO2, CO and NH3 had a significant effect on the NO-3E100 sensors at concentration levels that can be observed in ambient air. Additionally, the sensor appears to suffer from a strong dependence on temperature, while it was hardly affected by changes in relative humidity. During the field tests, the interfering gaseous compounds were not abundant enough to matter, apart from O3, which may have interfered by up to 10 nmol/mol. Ammonia is expected to have been lower than 5 nmol/mol at the field site. Together with the change in temperature, these parameters likely contributed to the poor performance of the NO sensor during the field campaign.
Additionally, for NO and CO, owing to the small concentration range observed at the semi-rural site, we did not find the right combination of sensors able to compensate for the influence of the interfering parameters. Therefore, as for O3 and NO2 [1], higher concentration levels would be required to proceed with further testing.
For CO2, even though no DQO exists, we showed that by using a combination of four sensors based on three different technologies, we reached a minimum uncertainty lower than 5% at the mean concentration.

Conclusions
Based on the measurement uncertainty estimated by orthogonal regression of the sensor outputs versus reference data, the most suitable calibration method appeared to be the ANN using raw, MLR or scaled sensor inputs (lowest relative expanded uncertainty of 70% for NO, 30% for CO and 5% for CO2). In all cases, simple LR and MLR produced the highest measurement uncertainty, likely because these methods do not take all interfering factors and their weighted effects into account. The European Directive on air quality does not include CO2 as a compound of interest. However, CO2 is commonly used as an indicator of traffic-related exposure/activity or as an estimator of soil respiration. For this compound, ANN with raw and scaled data from four sensors of three different types (one spectroscopic, two metal oxide and one electrochemical) resulted in good agreement with the reference values. Using this sensor combination, the uncertainty reached a minimum of less than 5% at the mean concentration (between 370 and 490 µmol/mol). This outstanding result is mainly due to the fact that the measurement range of the CO2 sensors matches the levels of CO2 in ambient air.
Conversely, for NO and CO, we could not determine the best combination of sensors able to compensate for the interfering parameters. Although the sensors have a wide measurement range, their sensitivity is good enough for the target concentrations. A later study on the performance of the NO sensors evidenced their strong dependence on temperature and on other gaseous compounds (O3, NO2, CO and NH3). This lack of success must therefore be attributed to the strong dependence of the evaluated sensors on interfering parameters. Field measurements with higher levels of NO and CO are required to proceed further with the evaluation of field calibration methods for low-cost sensors.
As in the companion paper, it was shown that, in general, the ANN method increased the correlation between estimated and reference data (higher R² and lower CRMSE). Moreover, it also decreased the bias relative to the reference data, with the slope and intercept of the orthogonal regression approaching the unbiased values of 1 and 0, respectively.
Regarding CO2, it has been shown that a specific combination of different types of sensors used within the ANN can solve the bias issue that affects most sensors. We also observed that for CO2 the well-known humidity/temperature dependence of this type of sensor can be compensated, even without ancillary data. This is likely linked to the different influence of these parameters on the different types of sensors in the ANN. However, these ambient parameters (relative humidity and temperature) appear to be necessary for CO and NO.
Finally, we showed that by using a cluster of sensors for calibration, and extrapolating the uncertainty to the limit value of 8.6 µmol/mol, the CO data quality objective of the European Directive for indicative methods (U_r of 25%) is likely to be met in our study. Conversely, the estimated DQO for NO sensors could not be met with any calibration. Although CO2 is not regulated by the European Directive, the cluster shows a very low uncertainty (5%). Moreover, using a simple LR calibration, the uncertainty reaches less than 30% between 370 and 490 µmol/mol. This good result is also linked to the measurement range of the sensors being well suited to ambient CO2 levels, contrary to CO and NO.