The air quality impacts of pre-operational hydraulic fracturing activities

• Prior to hydraulic fracturing there is an extensive period of preparation.
• Pre-operational activities led to an increase in NOx (274 %) and a decrease in local O3 (29 %).
• Combustion-related sources are responsible for higher primary NO2 emissions, which may exceed WHO guidelines.
• The pre-operational phase should be included in environmental assessments of shale gas extraction.


Introduction
Hydraulic fracturing or "fracking" is a technique used in unconventional oil and natural gas (O&G) development. Fracking is the industrial process of hydrocarbon extraction from shale rock formations by injecting large quantities of fluid at high pressure down a well, causing the rock to fracture and thus enabling the flow of trapped gas (Staddon et al., 2016). A combination of technological breakthroughs, such as horizontal drilling, has led to a wide-scale uptake of this technique, since it facilitates the extraction of O&G trapped within shale that cannot be exploited through conventional methods (Archibald et al., 2018). Shale gas has become a key source of natural gas in the United States (US) since 2000, accounting for 75 % of total US dry natural gas production in 2019 (U.S. Energy Information Administration, 2020). Subsequently, interest has spread to other countries, including Australia, Germany and the UK, as this technique has the potential to alter the energy landscape of a country and potentially enhance domestic energy security.
Significant environmental concerns associated with the impact of fracking have accompanied the increase in commercial application. These are most directly centred around sub-surface issues, such as the potential to cause earthquakes (Ellsworth, 2013; Mediaview, 2012) and the possible contamination of water supplies (Vengosh et al., 2014). Wider concerns around the climate impact of combustion of the fracked hydrocarbon are also highly relevant, as is the leakage of methane (CH4) during extraction (Zhang et al., 2020; Alvarez et al., 2012, 2018). Unconventional O&G development has also previously been found to have an impact on local and regional air quality. The predominant component of natural gas is CH4, a greenhouse gas with a high global warming potential, meaning it is often the focus for climate mitigation policies surrounding the O&G industry (Boucher et al., 2009). CH4 emissions also contribute to poor air quality through the generation of ozone (O3), a secondary pollutant with adverse health effects (Lippmann, 1989; Zhang et al., 2019). In addition to CH4, unconventional O&G development leads to emissions of nitrogen oxides (NOx = NO + NO2) and non-methane volatile organic compounds (VOCs), resulting from point source, mobile and fugitive emissions (Field et al., 2014). Emissions of NOx are predominantly linked to the numerous emission sources associated with combustion (Vinciguerra et al., 2015). These include engines from drilling rigs, compressors, and generators, in addition to heaters and pumps. Acute exposure to NO2 has been widely linked to adverse health effects such as reduced lung function and increased risk of stroke (Shah et al., 2015). Furthermore, both NOx and VOCs are recognised as key pollutants in the production of O3. Consequently, elevated levels of O3 in the atmospheric surface layer have been linked to emissions from regions of O&G production (Edwards et al., 2014; Helmig et al., 2014).
Often, the focus on air quality emissions relates to releases associated with the opening of a well and the subsequent extraction of gas. Previous work conducted in the US identified drilling and flaring to be the dominant sources of NOx (Dix et al., 2020). However, prior to drilling and extraction there is a significant period of preparation, during which the well pad must be built, the rig transported and constructed, and the material required for fracking transferred onto site. This results in a considerable increase in heavy duty vehicle traffic. The environmental impacts of road traffic emissions associated with unconventional O&G operations are often noted but rarely quantified (King, 2012). Previous modelling work has shown that traffic related to hydraulic fracturing could lead to 18 %-30 % increases in total daily NOx emissions (Goodman et al., 2016). Moreover, the enhancement above baseline values is most significant for rural or village locations, where concentrations are typically low and where future shale gas developments would likely occur. The future for shale gas exploitation in the UK remains uncertain, although models have been used to estimate impacts in the absence of any commercial scale activity. One modelling study demonstrated that increases of NOx and VOC emissions associated with hydrocarbon extraction could lead to approximately 110 extra premature deaths a year in the UK from increases of up to 30 ppb in the monthly mean of daily 1-hour maximum NO2 (Archibald et al., 2018). Understanding the totality of emissions from green field site through to well completion is vital to help inform decision making and future policies, should the sector be considered for expansion (Purvis et al., 2019).

UK context
Permits for shale gas extraction in the UK are currently paused and have gone no further than early exploratory stages. As of November 2019, the UK government announced a moratorium on fracking in England due to unpredictable seismic events (Priestley, 2020). However, increasing energy costs as a result of the Ukraine conflict in early 2022 have provoked renewed interest in the industry (BBC News, 2022c). In April 2022 a UK scientific review into shale gas extraction was launched to assess the progress made to address safety concerns surrounding the industry (The Guardian, 2022). In an effort to decrease reliance on imported energy, the UK government overturned the ban in September 2022, stating that fracking would be allowed in areas where there is local community support (BBC News, 2022a, 2022b).
To date, exploration has taken place at a small number of sites in the UK. These include Kirby Misperton (KM), North Yorkshire, where there has been extraction from a conventional gas field, named the KM1 well-site, for more than a decade. In 2013, an extension to KM1 was constructed and a new well was drilled, referred to as KM8, where approval was initially granted for fracking in May 2016.
Following approval, significant changes in the on-site infrastructure occurred at KM during September 2017. Machinery required for hydraulic fracturing was brought onto the well pad in preparation for the start of operational activities. Drilling rigs, pumps, compressors, diesel generators and containers holding water, sand and fracking fluid were among the equipment transported onto the site. In addition to the increase in equipment and activity on the site itself, traffic volume increased due to delivery trucks, along with additional idling vehicles in close vicinity to the site from protest activities and a high volume of policing and media interest. This phase of preparation is defined from here on as the "pre-operational" period. Despite the preparations, final government consent was never received and all fracking-related operations subsequently ceased at KM in February 2018 (BBC News, 2019). The isolated nature of the pre-operational period presents a unique opportunity to assess a relatively understudied stage of the well pad life-cycle. Currently, there are only a small number of studies which report on emissions during the pre-production stages of O&G development (Hecobian et al., 2019; Jarosławski et al., 2022). Moreover, fracking is a temporary process and usually takes only 3-5 days for a single well once drilled. This is short in comparison to the length of the preparation period, which takes place over a number of weeks or months. In the US, where large multi-well pads are common, numerous wells (10-20) can be fracked on timescales comparable to the preparation period. In both cases, pre-operational emissions would be expected to occur for a significant proportion of the entire extraction process and are thus important to consider.

Objectives
The primary objective of this study is to provide a quantitative reference for the impact of pre-operational hydraulic fracturing activities on local air quality. We apply statistical predictive models in a unique context to quantify the associated change in air pollutant mixing ratios (NOx and O3) during site preparation, whilst controlling for local meteorological conditions.

Data and methods
An air quality monitoring station was installed along the east wall of the well site, approximately 45 m from the KM8 well head, shown in Fig. 1. The enclosure was positioned to be predominantly downwind of the shale gas extraction infrastructure, whilst being open and unobstructed in all wind directions. The close proximity of the monitoring station to the well head provided a high sensitivity of observations to operational activity. Instrumentation was housed in a mains-powered, air-conditioned, weatherproof enclosure. Ambient air was sampled from gas phase inlets which were fixed to the top of the monitoring station at a height of 3 m.

Instrumentation
The monitoring station was equipped with a suite of air quality instrumentation, allowing the measurement of several air pollutants along with meteorological variables. Data was collected at a resolution of 1 minute and aggregated to hourly averages for use in this analysis. A summary of the instrumentation is provided in Table 1.
Quality assurance (QA) and quality control (QC) procedures were routinely performed for all aspects of data acquisition, including equipment evaluation, site operation and maintenance, and data review. Calibrations of air quality instrumentation were conducted on a monthly basis throughout the entirety of the measurement period. All gas phase instrument calibrations were traceable through a chain to international reference standards to maintain a high accuracy and provide known uncertainties in the recorded data. This also ensures comparability with similarly calibrated instrumentation, such as that forming part of the UK's monitoring networks.
On-site span and zero point calibrations were performed monthly for the NOx analyser. The span calibration was conducted using a 100 ppb NO standard in N2, linked to a National Physical Laboratory (NPL) binary standard and also referenced to the WMO Global Atmosphere Watch (GAW) scale. Zero calibrations were performed using an air scrubber filled with Sofnofil followed by activated charcoal. The NO2 conversion efficiency was calculated on an annual basis by returning the instrument to the laboratory to carry out a gas phase titration with known quantities of O3.
The O3 instrument provides an absolute measurement but was verified annually off-site using a Model 49i-PS Primary Standard over the calibration range 0-500 ppb. The primary standard was itself checked annually against a certified source by NPL. Instrument blanks took place monthly using air filtered through an activated charcoal trap. Instrument blanks generally read between −0.5 and 0.5 ppb, resulting in a maximum uncertainty of 7 % for typical daytime mixing ratios of O3 (15-30 ppb).

Random forest models
When considering changes in ambient air pollution, it is often difficult to disaggregate changes in mixing ratios due to meteorology from a change in the number or strength of emission sources. Baseline data collected prior to the period of interest can be exploited to identify events that deviate from the "normal" (Shaw et al., 2019). However, the influence of meteorology often adds complications, making the quantification of such events challenging. Controlling for meteorological variability allows deviation events to be more robustly assessed. This is achieved by training a statistical model in which a range of explanatory variables is used to account for some of the variability in pollutant mixing ratios.
Random forest (RF) models have been widely used elsewhere to control for the effects of weather in air quality datasets, predominantly in the application of a "meteorological normalisation" technique (Grange et al., 2018; Grange and Carslaw, 2019; Zheng et al., 2020; Cole et al., 2020).
The method here is somewhat different since the models are used to predict mixing ratios during the pre-operational period, assuming a business as usual (BAU) scenario. This is essentially an intervention study, similar to other work quantifying the effect of an airport closure (Carslaw et al., 2012) and more recently the effect of the COVID-19 lockdowns on air quality (Forster et al., 2020; Grange et al., 2021; Carslaw, 2020). The BAU scenario assumes pre-operational activities did not occur at the site and therefore baseline conditions were uninterrupted and continuous. The BAU scenario is then compared with observations to quantify the incremental effect of the pre-operational period on air quality at KM.

Model construction
RF models were developed for NO, NO2, NOx, O3 and total oxidant (OX = NO2 + O3) using the rmweather R package (R Core Team, 2020; Grange et al., 2018). Models were trained using hourly-averaged baseline data collected before and after the pre-operational period, which was an isolated period of activity on site (Fig. A.1). Of this baseline set, 80 % of the input data was used for model training whilst the remaining 20 % was used for model validation. For each species this split equated to approximately 19,000 training and 4800 testing observations. The performance of the models was assessed before they were used to predict pollutant mixing ratios using local meteorological variables as the model input. The model parameters were set as follows: the number of trees was fixed at 300, the minimum node size was set to 5 and the number of independent variables randomly sampled at each split was 3 (the square root of the number of independent variables). The explanatory variables used for prediction were: Unix date (number of seconds since 1970-01-01) as the trend term, Julian day as the seasonal term, weekday, hour of day, air temperature, atmospheric pressure, wind direction and wind speed. For the input meteorological variables, missing data was replaced with the median. An additional variable, "section", was introduced, which acted as an identifier for data "before" and "after" the pre-operational period. This variable essentially helps account for the fact that the baseline characteristics after the pre-operational period may not be identical to those before. It therefore provides the models with a way of distinguishing between the two sections of data when making predictions during the baseline period. However, the "section" variable was excluded when predicting mixing ratios during the pre-operational period since it would take neither value that was used in the training process.
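The data preparation described above can be illustrated with a minimal Python sketch. It is not the rmweather implementation used in this study; the function names, column names and the dictionary of settings are hypothetical and serve only to show the median imputation, the random 80/20 split and how a value of 3 follows from the square root of nine predictors (the eight listed variables plus "section").

```python
import math
import random
import statistics


def impute_median(rows, columns):
    """Return copies of rows with missing values (None) replaced by the column median."""
    filled = [dict(row) for row in rows]
    for col in columns:
        present = [r[col] for r in filled if r[col] is not None]
        median = statistics.median(present)
        for r in filled:
            if r[col] is None:
                r[col] = median
    return filled


def train_test_split(rows, train_fraction=0.8, seed=1):
    """Randomly split rows into training and testing subsets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


# Explanatory variables listed in the text (these column names are illustrative).
PREDICTORS = ["date_unix", "day_julian", "weekday", "hour", "air_temp",
              "air_pressure", "wind_dir", "wind_speed", "section"]

# Hyperparameters reported in the text; the key names are hypothetical.
RF_SETTINGS = {
    "n_trees": 300,
    "min_node_size": 5,
    "n_vars_per_split": math.isqrt(len(PREDICTORS)),  # floor(sqrt(9)) = 3
}
```

In this sketch the split is drawn once with a fixed seed so that the same observations are held back each time the code is run.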

Model performance
The RF models performed well, with R² values ranging from 60 % to 90 % (Table B.1). This suggests that the variation in mixing ratios of these pollutants can be reasonably well explained by a combination of meteorological conditions, along with time variables, which essentially act as proxies for emission source strength (Derwent et al., 1995).
The performance of the RF models was validated using a set of baseline data which was held back from the training process. Before initialising the models, the baseline data was randomly split into "training" and "testing" sets of data, accounting for 80 % and 20 % of observations respectively. Since the testing set of data was not used to build the models it can be used to provide insight into how well the models generalise to an independent data set. The models performed well with R² values ranging from 58 % to 90 %, suggesting the models are suitably capable of predicting unseen data (Fig. A.2). The best performing RF model was for O3, which had an R² matching that of the training set (0.9) and for which data are closely scattered around the 1:1 line (Fig. A.2). Poorer model performance was found for NOx, in particular NO. Data below a measured value of 20 ppb is well correlated around the 1:1 line but the model fails to predict short-lived spikes in mixing ratios (Fig. A.2). This is perhaps to be expected since NO is a fast-reacting primary pollutant, where enhancements are strongly linked to events in the local environment in the vicinity of the monitoring site, such as a passing vehicle. Proxies in the model such as hour or day attempt to control for this but are unlikely to be good predictors for sporadic events, hence model performance is expected to be weaker. A much better performance was seen for NO2 since this is predominantly a secondary pollutant. NO2 mixing ratios are driven by air originating from more widespread sources on larger spatial scales, such as local traffic flow, which the proxies in the model capture much better.
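As an illustration of the validation step, the sketch below computes R² for held-back observations in its coefficient-of-determination form (one minus the ratio of residual to total sum of squares). This is an assumption for illustration only: the modelling package may instead report a squared correlation coefficient, and the function name is hypothetical.

```python
def r_squared(observed, predicted):
    """Coefficient of determination for paired observed and predicted values."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot
```

A model that reproduces the test observations exactly gives a value of 1, whereas a model that always predicts the mean of the observations gives 0.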
The RF models were further evaluated by looking at the relative importance or predictive power of each independent variable. This metric is calculated by first assessing the model performance by passing a validation set of out of bag (OOB) data through the trained model. The model accuracy is then computed by comparing the predicted values to the observed values in the validation data set. Next, the values contained within the column of a single variable are permuted or randomly shuffled, essentially giving them no predictive power. The validation data are then passed through the RF model again and the performance evaluated. The feature importance is essentially the decrease in prediction accuracy caused by permuting the column (Breiman, 2001). The importance of each variable is averaged across all trees to obtain the permutation importance for the entire forest (Strobl et al., 2008). Fig. 2 shows the permutation importance of each predictive variable for each pollutant. The trend term (Unix time) and seasonal term (Julian day) were the most important explanatory variables for both components of NOx, suggesting NOx concentrations at KM are largely driven by annual cycles in regional emissions. Interestingly, hour-of-day and day-of-week were found to have little influence on the models' ability to predict NOx, suggesting time variables are relatively weak proxies for local emission source strengths, such as traffic, in a rural location such as KM. Similarly, wind direction was a relatively unimportant variable, again reflecting the characteristics of a rural background site where concentrations are not influenced by specific point sources of emissions but rather by the integrated contribution from all upwind sources. In terms of O3, wind speed was the second most important variable, which is consistent with a polar plot of O3 (Fig. A.3c), where high O3 is associated with relatively strong wind speeds (8-10 m s⁻¹) from the west.
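The permutation procedure can be sketched in a few lines of Python. This is a simplified, model-agnostic version applied to a single validation set rather than the per-tree OOB calculation averaged across the forest; the function names, the toy model and the data are hypothetical.

```python
import random


def mse(model, rows, target):
    """Mean squared error of model predictions against the target column."""
    return sum((model(r) - r[target]) ** 2 for r in rows) / len(rows)


def permutation_importance(model, rows, target, feature, seed=0):
    """Increase in MSE after shuffling one feature column across the rows."""
    baseline = mse(model, rows, target)
    values = [r[feature] for r in rows]
    random.Random(seed).shuffle(values)
    # Copy each row, swapping in the shuffled value for the chosen feature only.
    permuted = [dict(r, **{feature: v}) for r, v in zip(rows, values)]
    return mse(model, permuted, target) - baseline


# Toy check: the target depends on wind speed only, so shuffling "hour" costs nothing.
rows = [{"wind_speed": float(i), "hour": i % 24, "nox": 2.0 * i} for i in range(20)]
model = lambda r: 2.0 * r["wind_speed"]
importance = {f: permutation_importance(model, rows, "nox", f)
              for f in ("wind_speed", "hour")}
```

A variable that the model does not rely on yields an importance of zero, since shuffling it leaves every prediction unchanged.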

Effect of the pre-operational period on ambient mixing ratios
The impact of the pre-operational period on NOx mixing ratios was initially investigated by studying the variation in pollutant mixing ratios by each hour of the day. Pollutant concentrations are often influenced by the structure and diurnal variability of the planetary boundary layer (PBL). Surface heating drives the formation of the PBL. As the PBL grows throughout the day, pollutants are diluted as they mix with cleaner air from the free troposphere. Similarly, as the PBL shrinks at night time in the absence of surface heating, emissions become more concentrated as they are confined to a smaller volume of the atmosphere. Fig. 3 shows the average diurnal mixing ratios of NO, NO2 and NOx throughout the baseline and pre-operational periods. The baseline data for each year was filtered to the equivalent of the pre-operational period (19th September to 1st February) in order to prevent bias due to the seasonal variation of NOx mixing ratios. Additionally, the observations were filtered to wind directions which favoured transport from the well pad (those containing a westerly component). During both periods, mixing ratios of NOx began to increase from 06:00 and remained enhanced throughout the day before declining into the evening and overnight. Fig. 3 therefore suggests that changes in the PBL height have no obvious effect, implying that NOx mixing ratios at KM are strongly driven by local emissions rather than meteorology.
Fig. 3 highlights some clear changes in NOx across the two monitoring periods. During the pre-operational period, the range over the day was 3 times greater for both NO and NOx compared to the baseline phase. This is primarily due to high daytime mixing ratios of NO, leading to an amplified diurnal cycle throughout the pre-operational period. The largest change was observed for early evening NO, where the mean NO mixing ratio at 17:00 increased by 3975 % from 0.4 ppb in the baseline period to 16.3 ppb during the pre-operational period. This is comparable to the morning rush hour peak in NO (approximately 15 ppb) observed in North Kensington, an urban background site, during the ClearfLo campaign in London (Bohnenstengel et al., 2015). During the pre-operational phase, peak values of NOx occurred at 08:00 and 15:00 with an obvious dip at 12:00. This is suggestive of anthropogenic activities, where mixing ratios increase in the morning as the working day begins, decline during a break over lunchtime and increase again in the afternoon as work resumes.
Also of note is the change in the relative contributions of NO and NO2 to total NOx. During the baseline period NO2 dominated NOx mixing ratios, contributing 83 %. However, during the pre-operational period this trend was reversed and NO became the major component of NOx, contributing 59 %. This suggests a change in the most prevalent source of NOx at KM, specifically an additional source of primary NO close to the monitoring site such that oxidation to NO2 was yet to occur. The enhanced structure in the diurnal cycle and change in the predominant component of NOx is strong evidence that the pre-operational period had a measurable effect on ambient NOx mixing ratios at KM. Additionally, Purvis et al. (2019) reported a 4-fold and 2-fold increase in the annual means of NO and NOx from 2016 to 2017, respectively, showing that the pre-operational period led to a significant deterioration in the overall air quality at KM.
The RF models for each pollutant were used to predict the BAU values during the pre-operational period, which ran from 19th September 2017 to 1st February 2018. Whilst only the baseline data was used for model training, the entire data set was predicted using all available meteorological data as inputs to the RF models. Fig. 4 shows the daily mean time series for observed and predicted mixing ratios between 2016 and 2019 at KM. As expected, the measured and predicted values strongly agree during the baseline phase of monitoring since this data was used to grow and train the RF models. Discrepancies during this period arise when "spikes" occur in pollutant mixing ratios. In part, this is because the models here are regression models and every prediction is an average (mean) of 300 predictions from 300 trees. As a result, the models have a limited ability to capture minima and maxima in pollutant mixing ratios. Significant deviations between the predicted and observed values begin to appear at the beginning of the pre-operational period. Measured NOx values are enhanced relative to the predicted values, whilst the opposite is true for O3.
In order to evaluate the change in pollutant mixing ratios and to understand the predictions made by the RF models, the general meteorological conditions during the pre-operational period must be considered. Fig. 5 shows the average meteorological variables during the equivalent of the pre-operational period (19th September to 1st February) for each year of monitoring at KM. Crucially, the prevailing wind direction was consistently from the west or south west across all years, meaning the monitoring station was ideally located to detect the effect of the activity on site. The air pressure was lowest during 2017-2018 with a mean value of 1008 mbar compared to 1017 mbar in the previous year. Low-pressure systems generally lead to wet and windy weather conditions. Consistent with this, 2017-2018 also had the greatest mean wind speed, which was 40 % higher than the previous year and 60 % higher than the following year. This is expected to lead to lower mixing ratios of pollutants, such as NOx, due to an increase in atmospheric dispersion. Indeed, this is reflected in the model predictions, where the predicted mean NOx during the pre-operational period was 35 % lower than the equivalent period for the previous year. Conversely, the opposite is seen in the measurement data, where total NOx was enhanced 2-fold during the pre-operational period compared to the same period in the previous year. These results are therefore consistent with the hypothesis that the pre-operational period caused increased mixing ratios of NOx. In terms of O3, the meteorological conditions outlined in Fig. 5 have the opposite effect. Purvis et al. (2019) show that elevated westerly winds generally lead to enhanced O3 at KM, therefore predicted O3 during the pre-operational period was 19 % higher than the previous year. As was the case for NOx, the observations show the contrary, where O3 was 14 % lower than the previous year during the pre-operational period.
[Fig. 5 panels include air pressure (mbar) and air temperature (°C) for 2016-2017, 2017-2018 and 2018-2019.]
To link the divergence from the BAU scenario to the increase in activity due to well preparation, a plot of the increment, defined as observed minus predicted, versus wind direction is shown for NOx and O3 in Fig. 6. The plot shows that the maximum NOx increment is observed for westerly winds (NOx = 6.48 ppb), but falls to zero for northerly and easterly winds. Some of the increment shown in Fig. 6 during southerly winds could be from idling vehicles associated with the protest campaigning, policing and media presence located outside the site access point on Habton Road. Equivalent plots of NO and NO2 (not shown) also displayed an enhancement during westerly winds but the magnitude of the enhancement differed (see discussion below). For O3 the trend was reversed such that the largest, negative increments were observed under westerly winds. Fig. 6 is strong evidence that the increment in NOx mixing ratios and concurrent decline in O3 is consistent with a change in emission source strength to the west, where the well pad lies.
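The increment analysis can be sketched as follows: each hourly record contributes observed minus predicted to the wind-direction sector it falls in, and the sector means are then compared. The 45° sector width, the function name and the example values are assumptions for illustration and are not taken from the analysis behind Fig. 6.

```python
from collections import defaultdict


def increment_by_sector(records, sector_width=45):
    """Mean increment (observed - predicted) per wind-direction sector.

    records: iterable of (wind_direction_degrees, observed, predicted).
    Sectors are labelled by their lower edge in degrees (0, 45, ..., 315).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for wind_dir, observed, predicted in records:
        sector = int((wind_dir % 360) // sector_width) * sector_width
        sums[sector] += observed - predicted
        counts[sector] += 1
    return {s: sums[s] / counts[s] for s in sorted(sums)}


# Hypothetical hourly values in ppb: two westerly records and one easterly record.
example = [(270.0, 10.0, 4.0), (280.0, 8.0, 4.0), (90.0, 3.0, 3.0)]
sector_means = increment_by_sector(example)
```

With these illustrative values the westerly sector carries a positive mean increment while the easterly sector shows none, which is the pattern that would point to a source lying to the west of the monitor.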

Quantifying the change due to pre-operational activities
The observed and BAU mixing ratios were used to quantify the air quality impact of the pre-operational period. Table 2 shows the estimated percentage change in the mean mixing ratios of pollutants between 19th September 2017 and 1st February 2018. The uncertainties in the measured and predicted values represent the 95 % confidence intervals around the mean. Uncertainties in the delta values were found by summing the standard uncertainties for the mean-measured and mean-predicted values in quadrature and subsequently multiplying the result by a coverage factor of k = 2 to give an uncertainty at the 95 % confidence level. The relative uncertainties in the percentage change values were calculated by summing the relative errors of the delta and predicted values in quadrature. The absolute error was then multiplied by a coverage factor of k = 2 to give an uncertainty at the 95 % confidence level. A higher uncertainty, represented as a 95 % confidence interval, was found for the measured values, since there was a much larger variation in mixing ratios compared to the predicted values. For example, measured NOx mixing ratios ranged from 0.5 to 231 ppb, whereas predicted values of NOx only ranged between 1.3 and 20.6 ppb.
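The propagation described above can be written out as a short Python sketch. It follows one reading of the text, in which the inputs are standard uncertainties of the two means and the coverage factor is applied at the end; the function name, the dictionary keys and the example numbers are hypothetical and are not values from Table 2.

```python
import math


def percentage_change_with_uncertainty(measured, u_measured,
                                       predicted, u_predicted, k=2.0):
    """Percentage change relative to the predicted (BAU) mean, with expanded uncertainties.

    u_measured and u_predicted are standard uncertainties of the two means.
    """
    delta = measured - predicted
    # Standard uncertainty of the difference: sum in quadrature.
    u_delta = math.hypot(u_measured, u_predicted)
    change_pct = 100.0 * delta / predicted
    # Relative uncertainty of the ratio delta / predicted: relative errors in quadrature.
    rel = math.hypot(u_delta / delta, u_predicted / predicted)
    u_change_pct = abs(change_pct) * rel
    return {
        "delta": delta,
        "U_delta": k * u_delta,            # expanded uncertainty (k = 2)
        "change_pct": change_pct,
        "U_change_pct": k * u_change_pct,  # expanded uncertainty (k = 2)
    }


# Hypothetical means and standard uncertainties in ppb.
result = percentage_change_with_uncertainty(10.0, 0.3, 5.0, 0.4)
```

For these illustrative inputs the delta is 5 ppb with an expanded uncertainty of 1 ppb, and the percentage change is 100 % with an expanded uncertainty of roughly 26 percentage points.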
Table 2 shows the greatest change was observed for NO, which increased approximately 7-fold, by 566 %, compared to the BAU scenario. The associated increase in NO2 was much less (152 %), which is to be expected since primary emissions of NOx are predominantly in the form of NO (Department for Environment, Food and Rural Affairs, 2004). The increase in NOx was accompanied by a decrease in O3 of 29 %. Since O3 and NOx are closely linked through a chemical cycle within the atmosphere, incremental increases in NO lead to the destruction of O3 via titration of the two species. Locations with very high NOx emissions generally do not show as large an increase in O3 because the source is in very close proximity and NO mixing ratios remain high relative to oxidant mixing ratios (Department for the Environment, Food and Rural Affairs, 2002). This behaviour is typical of a roadside monitoring site, which in this case is a good parallel for KM since the monitoring station is located on the well pad itself. It should also be noted that the KM8 well to be fracked was housed on a pre-existing well pad for conventional gas extraction. However, for brand new wells, the preparation phase would be significantly longer since it would include building the above ground infrastructure, which may require clearing trees, levelling the surface, constructing access roads and laying the well pad itself. Therefore an extended period of preparation is likely to result in larger changes than those reported here. The measured NO2 mixing ratios were well below the Air Quality Standards Regulations for the UK, which require that the annual mean concentration of NO2 must not exceed 40 μg m⁻³ (19 ppb). However, if pre-operational activities were extended and persisted for an entire year, it is expected that NO2 would have exceeded the WHO guidelines for 2021, which set a much stricter recommendation of only 10 μg m⁻³ (5 ppb) (World Health Organization, 2021).

Table 2
Measured and predicted means, deltas (measured − predicted), and percentage change, along with the 95 % confidence intervals for the pre-operational period at KM.

Change in total oxidant
The suppression of O3 close to sources of NOx is often accompanied by enhanced levels of O3 further downwind. This is due to the oxidation of NO to NO2 with peroxy radicals and subsequent photolysis of NO2 to form O3. Therefore, to account for this photochemistry, the total oxidant (OX = NO2 + O3) is considered, since production and loss are independent of the chemical coupling that results in the interconversion of NO2 and O3. Changes in OX reflect the abundance of oxidants and are therefore more representative of the production of oxidant than O3 alone (Lu et al., 2010). OX can be described in terms of a local, NOx-dependent contribution and a regional, NOx-independent contribution (Clapp and Jenkin, 2001). The regional contribution essentially equates to the regional background level of O3, whereas the local contribution correlates with the level of primary pollution and essentially represents the fraction of directly emitted NO2. The individual contributions to OX can be quantified from an [OX] vs. [NOx] plot, where the slope obtained from a linear regression represents the local OX contribution, whilst the intercept represents the regional contribution (Clapp and Jenkin, 2001).
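The partitioning of OX into local and regional contributions can be illustrated with an ordinary least-squares fit of [OX] against [NOx]. The sketch below is a minimal example with synthetic values; the function name and the numbers are hypothetical and do not come from the KM data set.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic mixing ratios in ppb constructed so that OX = 30 + 0.1 * NOx.
nox = [2.0, 5.0, 10.0, 20.0, 40.0]
ox = [30.0 + 0.1 * v for v in nox]

# Slope: local, NOx-dependent contribution; intercept: regional contribution (ppb).
local_fraction, regional_ox = fit_line(nox, ox)
```

In this synthetic case the fit recovers a local contribution of 0.1 (10 % of NOx) and a regional contribution of 30 ppb.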
An additional RF model was constructed for OX (R² = 0.89, MSE = 8.44 ppb) to assess the change in total oxidant as a result of pre-operational activities at KM. Performing an identical analysis to that for NO, NO2, NOx and O3 yielded a 9 % increase in OX relative to the BAU scenario. Fig. 7 shows the local and regional contributions to OX during the baseline and pre-operational periods. The regional component (intercept) of OX is consistent with the expected seasonal cycle of O3, where O3 generally reaches a minimum during the winter months. Throughout the pre-operational period, the regional contribution to OX followed a declining trend as it approached a minimum and therefore does not account for the observed increase in OX. Fig. 7b shows the local contribution to OX between August 2017 and February 2018. Throughout the baseline period, the fraction of NOx directly emitted as NO2 (f-NO2) was negligible, since only secondary NO2 resulting from the oxidation of primary NO from upwind sources was observed. However, increases in f-NO2 leading to positive contributions coincided with the start of the pre-operational period. Throughout the whole of this period, f-NO2 ranged from 6 % to 37 %, suggesting the increase in OX was driven by changes in primary NO2 emissions on or near the site. This is likely a result of the presence of diesel vehicles and generators, which tend to emit a higher f-NO2 compared to petrol due to diesel emission control technologies such as Diesel Oxidation Catalysts (DOC) (Carslaw et al., 2019). Since access to active O&G sites is permitted exclusively for diesel vehicles, the associated increase in OX is likely to have significant implications for the photochemical production of ozone in regions of hydraulic fracturing.

Site characteristics
Despite the 4-fold increase in total NO x during the pre-operational period (Table 2), concentrations were still well below the air quality standards regulations. In order to place the observed changes in concentrations into context, data was compared to that from the Automatic Urban and Rural Network (AURN), the UK's primary air quality monitoring network. The locations of each site are shown in Fig. A.4. Since KM is located in a rural area and not influenced by any single point source, should it be part of the AURN it would likely be classified as a rural background site.
Fig. 8 shows the probability density distribution of NO and NO 2 during both the baseline and pre-operational periods at KM. Baseline observations were filtered to between 19th September-1st February in order to minimise differences due to seasonal factors. Additionally, data from AURN monitoring sites was aggregated to each site classification throughout the same time periods in order to make a robust comparison. Density plots show how the concentrations of pollutants are distributed and give a more detailed indication of where the bulk of measurements lie. This is more useful than using a mean concentration, which can often be skewed by spikes in data. From Fig. 8 it is clear that, as expected, throughout the baseline phase of monitoring, the distributions of NO and NO 2 were most representative of a rural background site. For NO, 93.4 % of the measurements were in the interval 0-5 μg m −3 , almost equivalent to 93.2 % for rural background sites. Similarly for NO 2 , 82.8 % of data lay in the range 0-10 μg m −3 , compared to 65 % for rural background sites. For comparison, only 21 % of data lay within the same range for urban background sites.
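The interval percentages quoted above are simple occupancy fractions of the measured distribution. A minimal sketch of that calculation follows; the helper `percent_in_range`, the half-open interval convention and the sample concentrations are all assumptions made for illustration.

```python
import numpy as np

def percent_in_range(conc, lower, upper):
    """Percentage of valid measurements with lower <= conc < upper.

    Missing values (NaN) are excluded before the fraction is computed.
    The half-open interval is an assumed convention.
    """
    conc = np.asarray(conc, dtype=float)
    conc = conc[~np.isnan(conc)]
    in_range = (conc >= lower) & (conc < upper)
    return 100.0 * in_range.mean()

# Made-up NO concentrations in ug m-3 (ten valid values, one missing).
no_conc = [1.0, 2.0, 3.0, 4.0, 6.0, 12.0, 0.5, 4.9, 25.0, 3.0, np.nan]
share_low = percent_in_range(no_conc, 0.0, 5.0)
share_mid = percent_in_range(no_conc, 5.0, 20.0)
print(f"0-5 ug m-3: {share_low:.1f} %")
print(f"5-20 ug m-3: {share_mid:.1f} %")
```

Applying the same function to each AURN site classification over the same dates would give the comparison figures, provided the data are aggregated by classification first.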
For the pre-operational period there is a clear shift in the distribution of both NO and NO 2 concentrations. The NO measurements display a bimodal distribution, in which 31.4 % of observations fall into the range 0-5 μg m −3 and the bulk of observations (49.8 %) lie in the range 5-20 μg m −3 . This suggests the initial source of NO (likely to be Habton Road) still exists, but that there is also an additional source responsible for higher levels of NO. During this phase, KM is approximately comparable to urban traffic sites, where only 25.1 % of data fell into the range 0-5 μg m −3 and the majority of observations (52.4 %) were between 10 and 100 μg m −3 . There was an evident, albeit smaller, change in the distribution of NO 2 . A much broader spread was observed in the data (note the log scale in Fig. 8) with 73.3 % of data distributed between 5 and 40 μg m −3 , compared to 74.3 % for urban industrial sites and 55.4 % for urban traffic sites.
Fig. 8 indicates that the site characteristics of KM significantly changed following the initiation of unconventional O&G development. Based on NO x concentrations, the site transitioned from an air quality climatology representative of a rural background site, with relatively clean air typical of the UK regional background, to one more analogous to an urban setting. This could have implications for residents living in the surrounding area of the well site, particularly if the industry were scaled up to facilitate hundreds of wells across the countryside. Should hydraulic fracturing have subsequently taken place, it is expected that emissions would be elevated further above baseline levels. However, it is important to note that the monitoring site was located only 45 m from the well. Emissions of primary pollutants such as NO x and VOCs are expected to decrease with increasing downwind distance due to dispersion. Whilst there are a small number of isolated dwellings located within 500 m of the well, local residents are unlikely to live in such close proximity to the well pad and as such will likely be subjected to lower levels of primary pollution than measured here. However, a study in California observed enhanced concentrations of ambient air pollutants within 4 km of pre-production wells and within 2 km of producing wells, suggesting the footprint of emissions from unconventional gas extraction extends far beyond the site itself (Gonzalez et al., 2022).

Conclusions
Well pad preparation is a key phase within the shale gas extraction process. Constructing and operating a shale gas well requires a large amount of above-ground infrastructure and equipment, which must be transported to the well pad. The resultant traffic load and subsequent on-site activity introduce an additional source of air pollutants to the local environment prior to any hydraulic fracturing. In this work, the impact of the pre-operational phase is investigated through the application of random forest machine learning models to air quality data in the rural village of Kirby Misperton in North Yorkshire.
Extensive baseline monitoring of air pollutants two years prior to the start of shale gas operations enabled the characterisation of the local air quality climatology. The baseline observations were used to predict mixing ratios in the construction of a "business as usual" scenario, which assumed no change in the activity on site. The counterfactual was then compared to the observations, revealing a 274 % increase in NO x and concurrent decrease in O 3 of 29 %. Changes in NO x were dominated by increases in NO, as expected for a traffic-related emission source. However, evaluation of the total oxidant (OX) revealed enhancements of the primary NO 2 fraction (f-NO 2 ), which could have negative implications for local public health. Whilst emissions were found to be enhanced significantly above baseline levels, concentrations remained well within UK regulatory limits set for NO 2 . However, the concentrations of NO 2 experienced, if sustained year round, would likely have been above the 2021 WHO guidelines for NO 2 . Comparison of the data to that from UK AURN monitoring sites demonstrated a shift in the chemical environment at KM to one more similar to a suburban city environment in terms of NO x . Since no hydraulic fracturing ultimately took place on the site, this work identifies a systematic change in NO x due to site preparation in isolation. Often, considerations of emissions from unconventional O&G development only emerge once infrastructure is in place and drilling begins. This work therefore exposes a relatively understudied source of emissions from the shale gas industry. Since the desire to reduce dependence on imported energy may refresh interest in domestic gas production in the UK, data that support the fullest possible assessment of the environmental impacts of activity are vital, and that assessment should include the impacts of pre-operational phases.

Fig. 1. Locations of the baseline monitoring station (circle) and the KM8 well (triangle). Lines identify major and minor roads in the area. Grey shading shows residential areas and the gold shading shows well pads operated by Third Energy.

Fig. 2. Variable importance plot for 300 RF models for NO, NO 2 , NO x and O 3 at KM.

Fig. 3. Average diurnal mixing ratios of NO, NO 2 and NO x during westerly winds throughout the baseline and pre-operational periods at KM. Baseline data was filtered between 19th September-1st February for each year. The shaded areas represent the 95 % confidence intervals.

Fig. 4. Measured and predicted daily mean mixing ratios of pollutants at KM. The grey shaded area represents the pre-operational period.

Fig. 5. Means of meteorological variables at KM between 19th September-1st February during each year of monitoring. Error bars show the 95 % confidence intervals in the mean.

Fig. 6. NO x (top) and O 3 (bottom) increment (observed − predicted) by wind direction at KM. Data have been binned into wind direction intervals of 10 degrees and averaged. The error bars represent the upper and lower 95 % confidence intervals in the mean.

Fig. 7. (a) Weekly total oxidant (OX; NO 2 + O 3 )/NO x intercept at KM calculated using linear regression. The solid line represents a loess smooth fit to the data, and the shaded region represents the 95 % confidence interval around the smooth. (b) Weekly total oxidant (OX; NO 2 + O 3 )/NO x slope at KM. Error bars represent the 95 % confidence intervals of the slope estimate.

Fig. 8. Probability density distributions of NO and NO 2 during the baseline and pre-operational periods at KM.

S.E. Wilde et al. Science of the Total Environment 858 (2023) 159702

Fig. A.2. Predicted against measured mixing ratios for the testing data set of baseline data for NO, NO 2 , NO x and O 3 at KM. The dashed line shows the 1:1 line. The R 2 values are those resulting from a linear fit of the two variables.

Fig. A.3. Polar plots of NO, NO 2 and O 3 throughout the entire measurement period at KM.

Table 1
Instrumentation details for the air quality monitoring station at KM.