Data-driven versus conventional N2O EF quantification methods in wastewater; how can we quantify reliable annual EFs?

A long-term N 2 O dataset from a full-scale biological process was analysed for knowledge discovery. Nonparametric, multivariate timeseries changepoint detection techniques were applied to operational variables (i.e. NH 4 -N loads) in the system. The majority of changepoints, could be linked with the observed changes of the N 2 O emissions profile. The results showed that even three-day sampling campaigns between changepoints have a high probability ( > 80%) to result to an emission factor (EF) quantification with ~10% error. The analysis revealed that support vector machine (SVM) classification models can be trained to detect operational behaviour of the system and the expected range of N 2 O emission loads. The proposed approach can be applied when long-term online sampling is not feasible (due to budget or equipment limitations) to identify N 2 O emissions “hotspot” periods and guide towards the identification of operational periods requiring extensive investigation of N 2 O pathways generation. © 2020 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY license. ( http://creativecommons.org/licenses/by/4.0/ )


Introduction
Nitrous oxide (N 2 O) emitted during biological nutrients removal, can significantly contribute to the total carbon footprint of Wastewater Treatment Plants (WWTPs). The recent roadmap to carbon neutrality in urban water published by Water and Wastewater Utilities for Climate Mitigation (WaCCliM) project and the International Water Association (IWA) ( Ballard et al., 2018 ), states that direct N 2 O should be considered for the carbon footprint assessment and reporting. N 2 O fluxes in wastewater processes are characterised by significant spatial and temporal variability due to the different interacting biological processes that consume or produce N 2 O and the variation of operational and environmental conditions ( Daelman et al., 2015 ;Gruber et al., 2019 ). A recent analysis of N 2 O emission factors (EF) for over 70 full-scale wastewater treatment processes revealed that the sampling frequency and sampling techniques applied in N 2 O monitoring campaigns, can significantly affect the quantified EFs ( Vasilaki et al., 2019 ). For instance, most of the monitoring campaigns lasting less than one month have reported EFs less than 0.3 % of the N-load. On the other hand, studies lasting over a year result in a median EF equal to 1.7 % of the N-load. The IPCC guidelines for the estimation of N 2 O in WWTPs were updated in 2019; the suggesting an EF of 1.6 % for of the total N-load ( IPCC, 2019 ). However, uncertainties remain; the use of measured emissions data is suggested for the estimation of country-specific EF in large WWTPs (IPCC, 2019). The development of process-based reliable N 2 O EFs requires long-term monitoring campaigns of over 1-year ( Gruber et al., 2019 ;Vasilaki et al., 2019 ).
Long-term N 2 O sampling (continuous or via grab-samples) is still rarely performed in WWTPs. High cost and complexities of long-term online monitoring are the main limiting factor. There is still lack of a holistic low-cost, practical approach for the quantification of N 2 O EFs. Therefore, new approaches are required for the quantification of EFs, minimizing sampling rate and advising on the duration and frequency of sampling campaigns.
A large amount of raw, heterogeneous operational data is available from WWTP operations ( Olsson et al., 2014 ). Several studies have demonstrated that utilisation of historical data (i.e. DO, mixed liquor suspended solids (MLSS), NH 4 + concentrations) from WWTPs can feed statistical methods and predict the profile of target process variables or key performance indicators that cannot be monitored online; an overview can be found in the study of Haimi et al. (2013) . Additionally, datadriven techniques have been extensively used to capture the nonlinearities and complex structures of wastewater treatment processes towards their optimisation, monitoring and better control ( Haimi et al., 2013 ;Corominas et al., 2018 ;Newhart et al., 2019 ). Vasilaki et al. (2018) showed that variables monitored online can be utilised to provide insights on the long-term behaviour and abrupt changes of N 2 O dynamics with the application of clustering and dimensionality reduction techniques. However, advanced information extraction methods have rarely been used to analyse data from N 2 O monitoring campaigns. Recently, Sun et al. (2017) developed a back-propagation artificial neural network (ANN) to simulate N 2 O emissions in an anaerobic-oxic (A/O) process. The authors demonstrated the feasibility and simplicity of predicting N 2 O emissions with the use of data-driven models.
Univariate and multivariate changepoint detection techniques have been widely used to detect changes in underlying distribution of sequences and regime shifts in several applications including investigation of distributional changes in financial markets ( Allen et al., 2018 ) and climate change investigation studies ( Kotta et al., 2018 ). Li et al. (2015) , recently, applied a non-parametric multivariate changepoint detection algorithm (edivisive; ( James and Matteson, 2013 )) to detect changes in water quality variables in a shallow lake (total nitrogen, total phosphorus and Chlorophyll) and linked the changepoints (CPs) with changes in the dynamics and patterns of sediment release.
In this work, the most prevalent sampling approaches in wastewater industry are presented and compared. A data-driven sampling approach and two conventional monitoring approaches have been compared and assessed. The result of the comparison proposes the most accurate and efficient amongst the three methods. Specifically, an approach that uses CPs to analyze the behaviour of online monitored variables linked with N 2 O generation (i.e. DO, NH 4 + ), is proposed, to detect i) periods (between the CP intervals) with steady N 2 O emissions profile and ii) changes in the temporal range and dynamics of N 2 O emissions. Multivariate changepoint detection was applied to identify structural changes in the variables monitored online (i.e. NH 4 + , DO, flow-rate, NO 3 − ) in a full-scale Carrousel reactor. Subsequently, the CPs were linked with changes in the N 2 O emissions behaviour and range during a 15-month monitoring campaign ( Daelman et al., 2015 ). A classification model was developed to predict the range of N 2 O emission loads (i.e. low, medium, high) based on the CP intervals. This approach can support operators to minimise GHG sampling requirements, without compromising long-term EF estimates. The accurate quantification of annual N 2 O EF requires samples collection between all CP intervals and a few sampling days can be sufficient to estimate a representative EF for different CP intervals. The classification model can support the estimation of the N 2 O emission range for new incoming data in the WWTP.

Process description and the source of data
N 2 O measurements and the extensive data-set of the operational variables from the studies of Daelman et al. (2015) and Vasilaki et al. (2018) were used in the analysis. The dataset belongs to one of the longest N 2 O monitoring campaigns undertaken in the wastewater sector (15 months). Aplug-flow reactor linked with two subsequent parallel Carrousel reactors was monitored. A full description of the WWTP can be found in the study of Daelman et al. (2015) .
The analysis and the development of the methodological approach were based on data obtained from the Carrousel reactor 1. The data matrix used in the analysis, the location of the sensors in the system and the details of the operational control are provided in the study of Vasilaki et al. (2018) . The system includes the following probes: DO (DO1, DO2, DO3) in the beginning, middle and end of the Carrousel reactor, ammonium nitrogen (NH 4 -N) and nitrate nitrogen  in the effluent of the Carrousel reactor, NH 4 -N from the middle of the second oxic zone in the plug flow reactor, temperature and influent flow-rate. The flow-chart of the secondary treatment is provided in the supplementary information.
The behaviour of N 2 O emissions at Carrousel reactor 1 showed a high level of volatility during the 15-month monitoring campaign and characterised by significant diurnal and seasonal variations (see supplementary material). The daily emission loads ranged from < 0.004 kg N 2 O / day to > 150 kg N 2 O / day. Daelman et al. (2013) simulated different sam pling strategies, based on data collected from the same plug-flow -Carrousel reactor linking EFs with different sampling strategies. The authors simulated the sampling strategies using a long-term dataset. They reported that short-term campaigns (grab sample, 24 h and 7-day sampling), cannot accurately estimate annual EFs, while there is a high probability to underestimate actual emissions. The relative error of the estimated annual N 2 O emissions ranged between -22 % and 35 % (95 % of the cases) by simulating a 50-days N 2 O sampling campaign (random 24h periods on working days were selected). The authors found that long-term offline/online sampling capturing seasonality and temperature effects is needed for reliable EF assessment. Reliable estimation of N 2 O emissions, can provide guidance on N 2 O mitigation measures and support WWTPs towards their carbon neutrality goals. However, there is high cost and resources related to long-term, of N 2 O online monitoring. Therefore, minimizing the sampling requirements can help water utilities to integrate N 2 O monitoring in practice. Vasilaki et al. (2018) applied a changepoint detection technique on the N 2 O emissions hourly time-series in Carrousel reactor 1, combined with hierarchical k -means clustering to investigate the N 2 O emission patterns and identify links with the variables monitored online. The study concluded that i) the dependencies between N 2 O and other operational variables (i.e. NH 4 + , NO 3 − , DO) varied in different sub-periods, ii) the system disturbances are mainly linked with events of elevated influent flow-rates, iii) specific ranges of operating variables have historically resulted in low or high ranges of N 2 O in the different sub-periods. These findings have been used to develop the methodological framework of Section 2.2 .

Methodological framework and data analytics
Fig. 1 summarises the methodology applied in the current study. Pre-processed data obtained from the work of Vasilaki et. al. (2018) were used in the analysis. The previous examination showed that disturbances (i.e. precipitation events) significantly affect the NH 4 -N effluent concentrations. Thus, the first step of the analysis was to isolate and categorize abnormal diurnal pattern of the influent flow-rate that affected system performance. Subsequently, sensor data were used to segment the behaviour of the system into sub-periods with different behaviour and operational variables ranges (i.e. NH 4 -N, DO). The aim was to investigate whether changes in the N 2 O emissions coincide with the changes in the range of operational variables. For this purpose, multivariate changepoint detection techniques were applied to categorize one-year historical sensor data into sub-periods exhibiting different pattern. The sequential segmentation of the operational variables enabled the quantification of N 2 O EF over 1 year using a small number of random samples between segments. Average estimated N 2 O emissions were then compared with the respective EFs from conventional monitoring techniques (equivalent sampling duration), following the methodology applied in the study of Daelman et al. (2013) . Finally, features were extracted representing the diurnal pattern of the operational variables and classification models were trained to predict the range of N 2 O emissions. The analysis was based on the N 2 O emission ranges between the changepoint segments.

Identification and isolation of influent-flow-rate increase
A variation in the behaviour of the online monitored variables was observed during the campaign. Previous analysis ( Vasilaki et al., 2018 ) showed that abrupt and rapid increases in the influent flow-rate were linked with precipitation events and often resulted into peaks in ammonium concentration in the effluent of the Carrousel reactor. The following steps were performed in order to detect and isolate diurnal influent flow-rate patterns that affected the system's performance ( Fig. 1 ): i) features were extracted representing the diurnal behaviour of influent flow-rate and ammonium concentration (i.e. daily mean), ii) the selected features were transformed into a lower dimension space using principal component analysis (PCA) and iii) density-based spatial clustering of applications with Noise (DBSCAN) was applied to detect days that did not exhibit the expected dynamics and range of the target variables.
PCA ( Jolliffe, 2002 ) was applied to reduce feature dimensionality by eliminating a proportion of variance in data. Subsequently, DBSCAN ( Ester et al., 1996 ) was applied to the features and clusters with regions of high and low density. DBSCAN has been applied in various studies to identify outliers considering multivariate sensor data (i.e. precipitation, humidity) ( Saeedi Emadi and Mazinani, 2018 ). DBSCAN was applied to the first three PCs extracted from the selected feature vector (explaining ~90% of the total variance) to isolate data at a distance greater than a pre-defined distance. DBSCAN was executed in the R package ( Hahsler et al., 2017 ); details are provided in Vasilaki et al. (2020) .

Changepoint detection
Following a similar approach to the respective one applied in the study of Li et al. (2015) , the aim was to investigate whether distributional changes and level shifts of variables conventionally monitored in wastewater systems can be used to detect changes in the range and formation of N 2 O emissions. Identifying changes in the online data collected from wastewater treatment processes is not straight forward; the time-series consist of a combination of seasonal, gradual and abrupt changes. For this purpose, the edivisive algorithm was used from the R package ecp ( James and Matteson, 2013 ). E-divisive changepoint detection algorithm is a non-parametric method that slices the time-series by detecting changes in the characteristic functions of the underlying distributions (that define a probability distribution) between segments.
The method assumes that the α absolute moment (for α ∈ (0, 2]) exists and that observations are independent. E-divisive is an iterative procedure where in each iteration one single changepoint that divides the time-series into two segments that maximise the difference between the characteristic functions of the segments is detected. Subsequently, the statistical significance of the changepoint is evaluated based on a permutation test ( James and Matteson, 2013 ). The procedure is repeated until the statistically significant changepoints (CPs) of variables have been identified. In the implementation, 21 days (3 weeks) were selected as the minimum distance between CPs, in order to account for seasonal variability. The confidence was defined as equal to 95% in order to control the false-positive rate of CPs. The identified CPs indicate distributional changes of the operational variables in the system. CP intervals are defined as the periods between CPs.

N 2 O emission factor estimation based on changepoint detection
Different N 2 O sam pling scenarios were tested following the methodology described in Daelman et al. (2013) , to evaluate whether changepoint detection applied to historical data can reduce the required number of samples for the determination of N 2 O EFs. For this purpose, the daily emission load was calculated (kgN 2 O/day), for the first year of the monitoring campaign in the Northern Carrousel reactor (N 2 O emissions dataset -DatN 2 O).
Three different scenarios were considered; i) random 3-day monitoring between the CP intervals (total samples equal to 36 days/year) (sampling strategy 1 -ST1), ii) monitoring N 2 O emissions for 3 random days each month for 1 year to account for seasonal variability of N 2 O emissions (36 days/year) (sampling strategy 2 -ST2), iii) random monitoring for 36 days/year (sampling strategy 3 -ST3).
In all sampling strategies, it is assumed that emissions were monitored continuously for 24 h (starting from 00:01 a.m. of the chosen day). Emissions averaged over the 24 h periods, represent the daily average N 2 O emitted (kg N 2 O-N / d). In ST1, DatN 2 O dataset was used to extract randomly 3 days from each CP interval (total 36 days). In ST2, 3 days were extracted randomly from each month (total 36 days), whereas in ST3 36 days were randomly selected over the whole DatN 2 O dataset (total 36 days). Subsequently, the average N 2 O emissions over these 36 days in all scenarios were estimated and were considered to represent the annual EF estimates. This procedure was repeated 10,0 0 0 times. Therefore, for each scenario, 10 0 0 0 average annual N 2 O emission loads were simulated, and a frequency histogram with the potential annual N 2 O emission estimates was developed and compared with the observed average N 2 O emissions.

Feature extraction
The data included in the analysis are characterised by 24-hour cyclical trend, therefore 24 hours were selected as the time interval. A 24-element vector was developed for each variable monitored in the system (representing hourly average). Subsequently, features were extracted, and a feature vector was developed, representing the behaviour of the system; in total > 100 features were extracted that can be grouped into three main categories (see supporting information). The first category consists of first-order statistical features including measures of central tendency (i.e. mean), measures of variability (i.e. standard deviation), measures of shape (kurtosis, skewness) and basic statistical functions such as daily maximum, minimum and interquartile range (IQR). First-order features are calculated using the real values of the time-series and provide information on the diurnal behaviour of the variables. The second category consists of second-order features calculated based on the differences between neighbouring values ( Nanopoulos et al., 2001 ). The third group of the features was developed based on specific diurnal sub-events. It captures the pattern of operational variables under specific conditions. The intensity and presence of these events and patterns can provide information on the temporal pattern of the system. In the system, aerator 1 operates under on/off pattern (when ammonium is higher than 1.2 mg/l), while aerators 2 and 3 operate always and peak when ammonium is higher than 0.6 and 0.9 mg/l, respectively. Therefore, one subset of this group of features aimed at capturing the behaviour of DO and nitrate concentrations when ammonium concentrations in the Carrousel reactor effluent were higher than 1.2 mg/L and lower than 0.6 mg/L. Additionally, the concentration of ammonium and nitrate in the plug-flow reactor and the flow-rates provide an indication of the loads entering the Carrousel reactor. Subsequently, the second subset of features belonging to this group, aimed at capturing the diurnal duration of low ( < 10 mg/L) or high ( > 18 mg/L) ammonium concentration in the plug-flow reactor and the respective pattern of nitrate concentration. Calculated second-order features of values were also used to calculate a subset of features belonging to group three. For instance, there are periods with a strong relationship between N 2 O concentration and nitrate concentration in the Carrousel effluent. Therefore, the diurnal duration and strength of increasing/decreasing nitrate concentrations were calculated and linked with and the response of other variables in the system. A detailed list of the tested features is provided in the supplementary information.
Finally, Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN) ( Torres et al., 2011 ) was applied, to deconstruct the Temperature and NH 4 -N concentration in the plug-flow reactor, into Intrinsic Mode Functions (lMFs) representing different oscillatory components. These variables were selected to investigate seasonal and cyclical components. CEEMDAN is an extension to Empirical Mode Decomposition (EMD) ( Huang et al., 1998 ) and to the ensemble empirical mode decomposition (EEMD) ( Wu and Huang, 2009 ). Results from CEEMDAN decomposition are provided in the supplementary information.
In this study, CEEMDAN was used to extract the trend of the variables monitored in the system. Trend and variability of the variables are defined based on the study of Wu et al. (2007) following the methodology applied in Antico et al. (2014) . Specifically, the variability consists of the modes with oscillatory characteristics less than 3 months; the trend is extracted by subtracting all the variability modes with oscillatory periods less than 90 days from the original variable. The frequency matching the peak of the raw periodogram is used to define the oscillatory period. The behaviour of wastewater treatment plant processes is affected by environmental factors (i.e. temperature); the exact seasonal trend is strongly linked with the local environmental conditions and on many occasions cannot be extrinsically derived. CEEMDAN is an adaptive approach based on information extracted from the raw data ( Antico et al. 2014 ) and it can be useful to extract trends when analyzing wastewater treatment processes data. In the algorithm, the noise level was set to 0.02, the realisations to 10 0 0 and the maximum sifting iterations to 10 0 0. Subsequently, the aggregated daily mean of ammonium concentration and temperature were added to the set of features.

Feature selection
The objective of the feature selection is to isolate featuresubsets, that can distinguish days belonging to different ranges of N 2 O emissions. Feature selection has been widely applied in environmental modelling, i.e. for groundwater quality monitoring ( Rodriguez-Galiano et al., 2018 ) and in renewable energy prediction problems ( Salcedo-Sanz et al., 2018 ). In the wastewater sector, Tomperi et al. (2017) recently used five different features (i.e. based on forward selection, stepwise regression and genetic algorithms) together with a multivariable linear regression, to optimise the prediction of quality wastewater parameters from process measurements and high-resolution optical monitoring.
In this study, a recursive feature elimination (RFE) approach ( Guyon et al., 2002 ) applied to the feature vector for the selection of features. It implements backward elimination of features, wrapped with a standard random forest classification algorithm is Random forest classification ( Breiman, 2001 ) is a nonparametric machine learning method where multitude of decision trees are constructed from a random subset of the features and trained in a bootstrap sample of the training set (consisting of around 2/3 of the data producing uncorrelated predictions); the final class prediction consists of the repeated outputs of these trees. It is one of the most powerful methods for feature selection and classification. The algorithm was implemented in caret package ( Kuhn, 2008 ) in R software; details are provided in the supplementary information.
The selected features were used to build and compare an SVM classification model using the same training and test sets (see supplementary information). This procedure was repeated 50 times for the different training and test dataset splits. The performance of the classification models was evaluated based on the accuracy, kappa (see supplementary material).

Support vector machine
SVMs, initially developed by Cortes and Vapnik (1995) are a range of supervised non-parametric machine learning algorithms with applications in several sectors, including wastewater ( Corominas et al., 2018 ). In SVM algorithms kernel functions can be used to map the observation into a high-dimensional (possibly infinite) feature space; the 'maximum margin hyperplane' is then selected in this space. This is the separating hyperplane that has the farthest minimum distance to observations belonging to different classes. The SVM algorithm is developed in R using kernlab ( Karatzoglou et al., 2004 ), and the radial basis function (RBF) kernel as described in Vasilaki et al. (2020) . Repeated 10-fold cross validation (3 repetitions) was applied to select the optimal cost and gamma ( γ ) parameters over a two-step grid-search with the caret package ( Kuhn, 2008 ) following the methodology proposed in Hsu et al. (2003) . The cost defines the penalty of instances misclassified or instances exceeding the maximal margin whereas γ determines the amplitude of the kernel. One versus one approach was applied to classify into more than two classes. The dataset was randomly divided into test and train, with 75% of the available data used for training the SVM model and 25% used for testing. Additionally, data from the last period of the monitoring campaign (~30 days), not used during training, were also tested, to evaluate the performance of the model.

Detection of abnormal events
The DBSCAN features (supplementary material); were selected in order to represent the diurnal behaviour of the target variables. The eps parameter was determined based on the "knee" of k-nearest neighbours of the data plotted in increasing order (see supplementary information). Based on this procedure MinPts and eps were set equal to 6 and 0.4 respectively.
The pattern of the influent flow-rate and effluent NH 4 -N concentration, in the days isolated by DBSCAN are shown in the supplementary material. In total 155 days were isolated and are mainly characterised by events with elevated influent flow-rate or/and peaks of the NH 4 -N concentration in the Carrousel efflu-ent. Subsequent inspection showed that these events varied in intensity and duration; therefore, they were categorized into three major groups. Group 1 consists of system disturbances with duration equal or less than 24 h. Days belonging to group 1 are characterised by elevated influent flow-rate (days with precipitation) and peaks in the effluent NH 4 -N concentration during the same day and thus, low removal efficiency of NH 4 -N. In group 1, the system resumes to normal NH 4 -N removal efficiency after 24h. Group 2 consists of system disturbances lasting more than 24h. Multiple days with precipitation, at close temporal proximity, that affect the system performance for several days were assigned to this group. Finally, elevated influent flow-rate events lasting less than 24h, affected significantly the behaviour of NH 4 -N concentration in the Carrousel effluent for several days. These occasions were assigned to group 3. In total, 54 different events were detected (155 days) that belong to one of the three groups. Fig. 2 shows examples of the events belonging in groups 1, 2 and 3 and the pattern of N 2 O emissions. Blue lines represent the events detected by DBSCAN and red lines represent the normal operational conditions. Overall, ~30% of the events belong to group 1. The average daily influent flow-rate is ~40 0 0 m 3 /h; therefore, days in group 1 have moderate increase of the influent flow-rate (flowrate peaks < 70 0 0 m3/h and NH 4 -N concentration in the Carrousel peaks < 6 mg/L). Fig. 2 (a-d) shows an event with the highest influent flow-rate peak; N 2 O emissions are not significantly affected. Overall, the behaviour of N 2 O emissions for 1-day events at temperatures between 12-16 °C is not significantly affected compared to the average behaviour of emissions the day prior to the event ( Fig. 2 (d)). However, significant peaks, of N 2 O emissions, coinciding with group 1 events, are observed at higher temperatures. Days belonging to group 2 are characterised by influent flow-rate peaks above 80 0 0 m 3 /h, whereas NH 4 -N concentration peaks are higher than 9 mg/L in the effluent of the Carrousel reactor ( Fig. 2  (e-h)). Again, the pattern of N 2 O emissions varies. However, emissions tend to drop after the peak of the influent flow-rate (NH 4 -N concentrations in the plug-flow < 8 mg/L). Finally, most N 2 O emissions peaks between June and September 2019 belong to group 3 ( Fig. 2 (e-h)).
The analusis system's operational behaviour can help operators to identify events that affect performance and apply mitigation strategies (i.e. regulate the anaerobic supernatant stream to the mainstream line to reduce the system loads).

Changepoint detection
Changepoint detection was applied to identify changes in the profiles of the operational variables that affect the performance of the Carrousel reactor. Daily averages of i) NH 4 -N load (kg/h) in the plug-flow reactor (as an indication of the influent ammonium in the Carrousel reactor), ii) NH 4 -N load (kg/h) in the effluent of the Carrousel reactor, iii) DO1 and DO2 concentrations were used. Daily averages of these variables were considered in order to avoid the diurnal cyclic characteristics of the variables. The volumetric flow-rate of the plug-flow and Carrousel reactors were calculated according to Daelman et al. (2015) .
The minimum transition interval was equal to 21 days (~3 weeks) on the assumption that biological processes can be affected by seasonality. The multivariate changepoint detection analysis identified 12 statistically significant CPs (with significance level << 0.05) for the first year of the monitoring campaign. Fig. 3 shows the identified CPs and the respective profile of N 2 O emissions for each period interval between CPs. On many occasions, CPs coincide with the changes of the N 2 O emissions profile during the monitoring period. For instance, the highest drop in the average N 2 O emissions between adjoining periods (CPs 6 and 7) coincides with a drop in the ammonium load and an increase in the average nitrate-nitrogen load in the plug-flow reactor. Similarly, the drop of N 2 O emissions between CPs 2 and 3, coincides with an increase in the DO1 concentration in the Carrousel reactor.

Accuracy of the monitoring strategy based on system CPs
As shown in Section 3.2 , the pattern and range of N 2 O emissions changes between the detected CP intervals. Quantification of reliable N 2 O EFs in wastewater treatment processes is still not straightforward; monitoring campaign duration and strategy significantly affect the reliability of the results. Seasonal effects have also significant impact on N 2 O emissions ( Vasilaki et al., 2019 ).
The aim of this analysis is to simulate a knowledge-based N 2 O sampling campaign between CPs and evaluate EFs following a similar approach to the study of Daelman et al. (2013) . Additionally, the knowledge-based sampling campaign is compared with two alternative monitoring strategies: i) random 24-h monitoring and ii) random 24-h monitoring for specific days at each month capturing the seasonal variability. Fig. 4 (a) shows the relative frequency histogram of the estimated annual N 2 O load (kg/day) when 24h sampling for 36 random days is applied, during the first year of the monitoring campaign ( n = 10 0 0 0 repetitions). The red vertical line represents the measured average annual N 2 O load (equal to 39~kg/day). In total 43% of the simulations resulted to an EF ranging between 35 and 43 kg/day (less than 10% error from the actual annual N 2 O load  ). Additionally, the probability to underestimate the N 2 O load by more than 10% is equal to ~30%. Fig. 4 (b), shows the histogram of the estimated annual N 2 O load (kg/day) when 24h sampling for three random days between the CPs (12 CP intervals) was assumed ( n = 10 0 0 0 repetitions). Overall, the likelihood to estimate an average N 2 O load between 35 and 43 kg N 2 O/day was equal to ~80%, with > 99% of the simulated N 2 O estimates ranging between 32 and 46 kg/day. In this case, the probability to underestimate the emissions by more than 10% was approximately 5%. Finally, when random sampling for 3 days per month was tested ( Fig. 4 (c)), the probability to estimate an N 2 O load ranging between 35 and 43 kg/day, was equal to ~70%, whereas the probability to underestimate the emissions by more than 10%, was approximately 25%. The behaviour of the operational variables needs to be considered together with seasonal effects when sampling campaigns are planned. In the investigated system, limited sampling days between the CPs could give a realistic quantification of the actual EF during a whole year. The proposed approach can be applied to identify N 2 O emissions "hotspot" periods and guide towards the identification of the operational periods that require intensive investigation of N 2 O pathways and mitigation measures.

Feature selection and classification
A classification algorithm was constructed to predict low, medium or high N 2 O emissions based on the operational behaviour of the system. The categorization of the different classes was based i) on the CP analysis, ii) on the seasonal effects. Therefore, two periods (from the CP intervals) characterised by similar N 2 O ranges but not sequential, were assigned in two different classes. Table 1 shows the average N 2 O emissions in each class and the changepoint intervals for each class (the changepoint intervals are shown in Fig. 3 ). However, five only classes were considered in the feature selection and construction of the classification algorithms.
Feature selection is a significant step of several highdimensional classification applications. However, many studies have shown that selected features depend on the training sample, and thus, a feature selection algorithm can be unstable ( He and Yu, 2010 ;Kalousis et al., 2007 ). Therefore, in many cases feature selection stability needs to be considered together with model prediction accuracy in the evaluation of the classification/regression performance ( Pes et al., 2017 ;Saeys et al., 2008 ).
The most common features for feature subset sizes equal to 6 and 10 for the resampling perturbations are shown in the supplementary information. Overall, 4 features coincided in all subsets with feature size equal to 6 and 8 features coincided in almost all subsets with feature size equal to 10. The selected features are divided between feature group 1 (i.e. first-order statistical featuresmaximum DO2, minimum influent flow-rate), feature group 3 (that capture the pattern of operational variables under specific conditions) and feature group 4 (i.e. trend extracted by CEEMDAN). For instance, the features describing the pattern of DO2 concentration for NH 4 -N higher than 1.2 mg/L in the Carrousel effluent were selected in all feature subsets. The trends of NH 4 -N concentration in the plug-flow reactor and temperature extracted by CEEMDAN, were also included in all feature subsets.
The results of the SVM and RF classification models from the different resampling perturbations for both the train and test data sets are shown in Table 2 . Feature subsets that minimise the clas-sification error were selected in each resampling. The results show similar results both for the RF and SVM classifiers, whereas classification accuracy in the test dataset is high even for small feature subsets ( > 97%). In 58% of the resampling perturbations, the size of the best feature subset was equal to 6. In total, 8 variables   were selected for 14% of the resampling perturbations, 10 for 20% of the resampling perturbations whereas all the features were selected for 4% of the resamples. Subsequently, data from the last period of the monitoring campaign were tested to assess the predictive capabilities of the models in previously unseen operational periods. In total ~30 days were tested (precipitation events were not considered). Fig. 5 displays the predicted classes (SVM classification) for different best feature subset sizes (majority rule for the different resampling perturbations) and the daily N 2 O emissions. In the first 20 days all models have predicted low emission-risk classes. Class 1 represents operational conditions from the beginning of the monitoring campaign, whereas class 5 represents conditions from the end of the monitoring campaign (after ~1 year). The investigated period is ~1 year after the start of the monitoring campaign and the observed N 2 O emissions are again low. Therefore, the alternation of the predicted classes between class 1 and 5 can be expected, mainly due to the impact of seasonal effects on the influent concentration and on N 2 O generation. Finally, the model with best feature subset size equal to 10 was the only one able to detect the change in the N 2 O range after the first 20 days.
Additional data are required to investigate the generalisation capabilities of the SVM classifier. The methodological approach is able to predict the range of N 2 O emissions, as long as the system operates within the predefined and investigated range.
Under the investigated conditions, the accuracy of the classifier with feature subset size equal to 10, was satisfactory, even when data, from the second year of the campaign were tested (these data were not used during training,30 days). Therefore, the SVM classifier can be used (with caution) to detect periods with operational behaviour that has been historically linked with elevated emissions.
The development of mitigation measures in the predicted highrisk N 2 O emission periods, can be supported with the integra-tion of mechanistic models or practical, simplified theoretical approaches. The latter facilitates the identification of potential triggering mechanisms linked with the period-specific operational conditions. For example, a simplified N 2 O risk-based model was developed by Porro et al. (2014) considering thresholds of ASM state variables linked with the generation of N 2 O emissions (i.e. DO, nitrite, COD:N) based on the treatment step (nitrification, nitrification, transition zones).
Additional long-term monitoring campaigns, in continuous wastewater systems are required, to validate the proposed strategy and standardise the selection of operational variables that should be considered during changepoint detection and classification. Finally, the development of a detection approach needs also to be integrated in the procedure, that will detect new, unobserved operational states and provide feedback to the algorithms on recalibration requirements.

Conclusions
This study shows that information hidden in conventional variables monitored in wastewater can be mined to reduce N 2 O sampling frequency without compromising the quantification of annual N 2 O EFs and ultimately predict the risk of elevated emissions. The application of changepoint detection in the process operational variables provided insights on structural changepoints of the N 2 O emissions profile. Limited 24-h N 2 O samples between the CP intervals are sufficient to estimate the average N 2 O EF for the whole year, while conventional strategies resulted in lower accuracy of the N 2 O EF. An SVM classification model was constructed to predict operational periods linked with specific N 2 O emission ranges. The results indicate that analysis of historical data and investigation of seasonal effects can be of paramount importance in the planning of monitoring campaigns. The proposed approach can be applied when long-term online sampling is not technically and economically feasible. The proposed solution is capable of pinpointing the N 2 O emissions "hotspot" periods and guiding towards the identification of operational periods that require extensive investigation of N 2 O pathways and mitigation measures.

Author statement
All authors have contributed for the development of the current work.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.