Using meteorological normalisation to detect interventions in air quality time series

Interventions used to improve air quality are often difficult to detect in air quality time series due to the complex nature of the atmosphere. Meteorological normalisation is a technique which controls for meteorology/weather over time in an air quality time series so that intervention exploration (and trend analysis) can be conducted in a robust way. A meteorological normalisation technique, based on the random forest machine learning algorithm, was applied to routinely collected observations from two locations where known interventions were imposed on transportation activities which were expected to change ambient pollutant concentrations. The application of progressively stringent limits on the content of sulfur in marine fuels was very clearly represented in ambient sulfur dioxide (SO2) monitoring data in Dover, a port city in the South East of England. When the technique was applied to the oxides of nitrogen (NOx and NO2) time series at London Marylebone Road (a Central London monitoring site located in a complex urban environment), the normalised time series highlighted clear changes in NO2 and NOx which were linked to changes in primary (directly emitted) NO2 emissions at the location. The clear features in the time series were illuminated by the meteorological normalisation procedure and were not observable in the raw concentration data alone. The lack of a need for specialised inputs, and the efficient handling of collinearity and interaction effects, make the technique flexible and suitable for a range of potential applications for air quality intervention exploration.


Introduction
exploration (Grange et al., 2018). The normalised time series is in the pollutant's original units and can be thought of as concentrations in "average" or invariant weather conditions.

Some air quality research has used change-point analysis to investigate changes in atmospheric pollutant concentrations (for example Carslaw et al., 2006; Carslaw and Carslaw, 2007). Methods such as these rely on regime changes where a time series abruptly shifts from one regime to another (Lyubchich et al., 2013). In the air quality domain, this rarely happens, since changes are usually nuanced and occur progressively with much variability, which limits the generality of this approach for investigating intervention efforts. Meteorological normalisation is potentially a more general approach that can be used in a greater range of applications.

Atmospheric processes are complex and non-linear, and observations are commonly collinear with other observations. These attributes make the process of statistical modelling very challenging, especially so with parametric methods (Barmpadimos et al., 2011). With the rise of machine learning algorithms, these attributes can be much more easily accommodated due to the non-parametric and robust nature of these techniques (Friedman et al., 2001). The meteorological normalisation technique used here uses random forest, an ensemble decision tree machine learning method, as the modelling algorithm.

Random forest has been described very well and in depth elsewhere (see Breiman, 2001; Friedman et al., 2001; Tong et al., 2003; Ziegler and König, 2013; Jones and Linder, 2015; Grange et al., 2018). However, in brief, a single decision tree is formed from a series of

term which represents emissions or atmospheric chemistry which varies seasonally.
These processes are generally strong drivers of concentrations of most atmospheric pollutants (Henneman et al., 2015). Random forest's ability to handle collinearity and interaction between these and the other independent variables used, together with the lack of need for specialised or exotic inputs, results in a flexible tool kit for probing the influences of interventions on air quality time series.

pollutants which are monitored at the site (Jeanjean et al., 2017).

Table 2. All these interventions are significant investments that require large amounts of planning and resources to execute and maintain.

to monthly resolution for presentation in Section 3. A conceptual representation of the meteorological normalisation processes is displayed in Figure A1.

Zeileis et al. (2003) and was implemented with the strucchange R package.
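The meteorological normalisation process referred to above can be illustrated with a minimal, model-agnostic sketch. It assumes a resampling approach in which, for every time step, the explanatory variables other than the trend term are repeatedly replaced with values drawn at random from the observed record, the fitted model predicts each resampled case, and the predictions are averaged. The function names (`normalise`, `toy_predict`), the variable names, the number of resamples and the toy predictor standing in for a fitted random forest are all assumptions made for illustration and are not taken from the study.

```python
import random
from statistics import mean


def normalise(predict, rows, resample_vars, n_samples=200, seed=1):
    """Average predictions over randomly resampled explanatory variables.

    predict       -- any fitted model exposed as a function of one row (dict)
    rows          -- observed explanatory variables, one dict per time step
    resample_vars -- variables to resample (assumed: everything but the trend)
    """
    rng = random.Random(seed)
    normalised = []
    for row in rows:
        predictions = []
        for _ in range(n_samples):
            donor = rng.choice(rows)      # weather taken from a random time step
            sampled = dict(row)           # the trend term of this row is kept
            for var in resample_vars:
                sampled[var] = donor[var]
            predictions.append(predict(sampled))
        normalised.append(mean(predictions))
    return normalised


def toy_predict(row):
    """Hypothetical stand-in for a fitted random forest model."""
    return 20.0 - 0.1 * row["trend"] - 2.0 * row["wind_speed"]


# Synthetic example data (not observations from the study)
data_rng = random.Random(42)
rows = [{"trend": t, "wind_speed": data_rng.uniform(0, 10)} for t in range(100)]

raw = [toy_predict(r) for r in rows]               # weather-driven variability
norm = normalise(toy_predict, rows, ["wind_speed"])  # weather averaged out
```

In this toy case the normalised series retains the downward trend while most of the step-to-step variability caused by wind speed is removed, which is the behaviour that makes gradual changes easier to see.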

The random forest algorithm does not directly provide the error or uncertainty of its estimates. However, uncertainty is important to consider in many situations.

To enable uncertainty to be evaluated for the case studies, 50 random forest models were grown for each example with the hyperparameters described above, but with randomly

for the four sets of models are displayed in Table 3.

for the random forest models (Figure 2).
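One way to turn a set of repeated model runs into an uncertainty estimate is sketched below. It assumes that each of the 50 models yields a prediction series of equal length and summarises the spread at every time step with the mean and an empirical 2.5 to 97.5 percentile range. What was randomised between the models is not recoverable from the text here, so the toy runs simply differ by random seed; `summarise_ensemble`, `toy_model_run` and the choice of interval are illustrative assumptions, not the study's procedure.

```python
import random
from statistics import mean, quantiles


def summarise_ensemble(runs):
    """Summarise per-time-step spread across repeated model runs.

    runs -- list of equal-length prediction series, one per model
    """
    summary = []
    for values in zip(*runs):             # all models' values at one time step
        cuts = quantiles(values, n=40)    # cut points at 2.5 % increments
        summary.append({
            "mean": mean(values),
            "lower": cuts[0],             # 2.5th percentile
            "upper": cuts[-1],            # 97.5th percentile
        })
    return summary


def toy_model_run(seed, n_steps=24):
    """Hypothetical stand-in for one model's normalised monthly series."""
    rng = random.Random(seed)
    return [10.0 - 0.2 * t + rng.gauss(0.0, 0.5) for t in range(n_steps)]


# 50 runs, mirroring the number of models mentioned in the text
runs = [toy_model_run(seed) for seed in range(50)]
summary = summarise_ensemble(runs)
```

Plotting `lower` and `upper` as a band around `mean` gives a simple visual check of whether a feature in the series is larger than the model-to-model variability.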

Partial dependence plots of decision tree models allow the learning process to be interpreted and a data user to examine how variables are being handled in the predictive model. Figure 3 demonstrates a two-way partial dependence plot for SO2 concentrations at Dover Landon

be seen in the two-way partial dependence plot as a clear reduction in SO2 dependence when winds were sourced from the port (the south; discussed further in Section 3.1.2; Figure 3).
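The idea behind one-way and two-way partial dependence can be sketched as follows: the variable (or pair of variables) of interest is fixed at each grid value for every observation, the model predicts all of the modified observations, and the predictions are averaged. The functions below work with any model exposed as a prediction function; the toy predictor, its coefficients and the variable names `wind_dir` and `air_temp` are hypothetical and only mimic the kind of dependence described above.

```python
from statistics import mean


def partial_dependence(predict, rows, var, grid):
    """One-way partial dependence of the prediction on `var`."""
    result = {}
    for value in grid:
        # Fix `var` at the grid value for every row, then average predictions
        result[value] = mean([predict({**row, var: value}) for row in rows])
    return result


def partial_dependence_2d(predict, rows, var_a, grid_a, var_b, grid_b):
    """Two-way partial dependence on the pair (`var_a`, `var_b`)."""
    result = {}
    for a in grid_a:
        for b in grid_b:
            result[(a, b)] = mean(
                [predict({**row, var_a: a, var_b: b}) for row in rows]
            )
    return result


def toy_predict(row):
    """Hypothetical model: higher values with southerly winds and warm air."""
    from_south = 135 <= row["wind_dir"] <= 225
    return 2.0 + (3.0 if from_south else 0.0) + 0.1 * row["air_temp"]


# Synthetic observations (not data from the study)
rows = [
    {"wind_dir": 10, "air_temp": 5.0},
    {"wind_dir": 180, "air_temp": 12.0},
    {"wind_dir": 200, "air_temp": 18.0},
    {"wind_dir": 300, "air_temp": 9.0},
]

pd_wind = partial_dependence(toy_predict, rows, "wind_dir", [0, 90, 180, 270])
pd_both = partial_dependence_2d(
    toy_predict, rows, "wind_dir", [0, 180], "air_temp", [0.0, 20.0]
)
```

Because every grid value is evaluated against the same set of observations, differences between grid values reflect how the model uses that variable rather than how the variable happened to co-occur with others in the record.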

Another clear feature isolated by the partial dependence plots was that SO2 concentrations increased with increasing air temperature at the Dover monitoring sites (Figure 5). This

shown. The importances for the NOx models were very similar.

Figure A2) and the breakpoints identified could not be resolved without the meteorological normalisation technique.
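To make the notion of a breakpoint concrete, the sketch below searches a series for the single split that minimises the total squared deviation from the two segment means. This is a deliberately simplified stand-in: the strucchange package cited in the text estimates breakpoints in regression models and can locate several of them, which this example does not attempt. The function name `best_breakpoint`, the `min_segment` parameter and the synthetic series are assumptions for illustration only.

```python
def segment_sse(values):
    """Sum of squared deviations from the segment mean."""
    centre = sum(values) / len(values)
    return sum((v - centre) ** 2 for v in values)


def best_breakpoint(series, min_segment=3):
    """Return (index, cost) of the best single mean-shift breakpoint.

    The series is split into series[:index] and series[index:]; each
    segment must contain at least `min_segment` observations.
    """
    best_index, best_cost = None, float("inf")
    for i in range(min_segment, len(series) - min_segment + 1):
        cost = segment_sse(series[:i]) + segment_sse(series[i:])
        if cost < best_cost:
            best_index, best_cost = i, cost
    return best_index, best_cost


# Synthetic normalised monthly series: a level shift after 12 months,
# with a small repeating wiggle so the segments are not perfectly flat
series = [
    (10.0 if month < 12 else 6.0) + 0.1 * ((month % 3) - 1)
    for month in range(24)
]

index, cost = best_breakpoint(series)
```

On a raw, weather-dominated series the cost surface of such a search tends to be noisy, which is one way to understand why breakpoints are easier to resolve after normalisation.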

The tandem use of the meteorological normalisation procedure and breakpoint analysis is powerful and can reveal many changes, but in many cases there may not be sufficient

(Jenkin, 2004; Carslaw and Beevers, 2005). Figure