Modelling macronutrient dynamics in the Hampshire Avon river: A Bayesian approach to estimate seasonal variability and total ﬂux

The macronutrients nitrate and phosphate are aquatic pollutants that arise naturally, however, in excess concentrations they can be harmful to human health and ecosystems. These pollutants are driven by river currents and show dynamics that are a ﬀ ected by weather pattern and extreme rainfall events. As a result, the nutrient budget in the receiving estuaries and coasts can change suddenly and seasonally, causing ecological damage to resident wildlife and ﬁsh populations. In this paper, we propose a statistical change-point model with interactions between time and river ﬂow, to capture the macronutrient dynamics and their responses to river ﬂow threshold behaviour. It also accounts for the nonlinear e ﬀ ect of water quality properties via nonparametric penalised splines. This model enables us to estimate the daily levels of riverine macronutrient ﬂuxes and their seasonal and annual totals. In particular, we present a study on macronutrient dynamics on the Hampshire Avon River, which ﬂows to the southern coast of the UK through the Christchurch Harbour estuary. We model daily data for more than a year during 2013-14 in which period there were multiple severe meteorological conditions leading to localised ﬂooding. Adopting a Bayesian inference framework, we have quantiﬁed riverine macronutrient ﬂuxes based on input river ﬂow values. Out of sample empirical validation methods justify our approach, which captures also the dependencies of macronutrient concentrations with water body characteristics.


Introduction
simply as change-point structures. Tibshirani (1996)) that allows identification of the water quality properties that have the strongest 120 association with variation in the macronutrient concentrations. Lasso is a method that is used in the 121 context of regression analysis, and it can simultaneously perform variable selection and shrinkage 122 of the vector of regression coefficients toward zero. 123 We use a Bayesian formulation of Lasso regression (Park and Casella, 2008;Hans, 2009;124 O' Hara and Sillanpää, 2009) that is constructed by specifying a Laplace distribution as a prior dis-125 tribution for the model coefficients. We standardised all regressor variables and implemented the 126 Bayesian Lasso regression technique described by Lykou and Ntzoufras (2013). This Lasso tech-

Model set-up
The discussion in the previous section leads us to consider a regime switching model for 137 macronutrients according to both temporal window and river flow that is able to adjust for non-138 linear effects of the chosen water quality properties. Here the two macronutrients, nitrate and 139 phosphate, are modelled separately, although it is possible to model them jointly. Joint modelling 140 of the two macronutrients nitrate and phosphate, is not of interest here since our objective is not to 141 study their inter-relationships, which is seen to be rather weak (correlation -0.16 in Table 1), but 142 to predict their individual daily and annual fluxes into the estuary.

143
The model is developed for data y t , which denotes the natural logarithm of the observed 144 macronutrient concentration at day t, for t = 1, . . . , T = 393. We construct a Bayesian hierar-145 chical model, which encompasses the model for the observed data, the dynamics of the process 146 and the specification of parameters and hyperparameters (Berliner, 1996). At the first stage of the 147 modelling hierarchy, we assume an independent Gaussian measurement error model: where µ t denotes the time varying mean and σ 2 is the variance assumed to be constant at all time 149 points. We do not consider time varying variances as we do not have replicated data at each time 150 point to estimate them. Rather, our effort is dedicated to finding the best model for the mean 151 concentration µ t at time t in the next stage of modelling hierarchy.

152
The second stage of the hierarchy defines the model for µ t . To incorporate nonlinear effects of 153 each of the p water quality properties, we incorporate a nonparametric smoothing function g j (x t j ) 154 of x t j at each time point t, where x t j denotes the value of the jth water quality property at the tth 155 time point. The choice of the g j (·) functions ranges from linear to nonparametric penalised splines 156 (Eilers and Marx, 1996;Ruppert et al., 2003) which are well-known to be very flexible. In our over the range of x. We consider a low-rank thin-plate spline representation given by: where we treat β 0 and β 1 to be fixed but unknown parameters and assume b = (b 1 , . . . , b K ) to be 162 the vector of random parameters corresponding to the set of basis functions ( and zero otherwise, and d is the degree of the spline. Each component 164 of b is assigned an independent normal prior distribution with mean zero and unknown variance, 165 σ 2 b , to be estimated from the model.

166
Model (2), assumed for the jth covariate at tth time point, x t j , is given by: Here, we consider a model with the same set of knots and the same degree for the splines for all the covariates that have been normalised already, see Section 2.3.1. Assuming an additive model, we obtain the total contribution: However, the p separate intercept terms will not be identified and hence we only take one global 168 intercept β 0 in place of the sum p j=1 β 0 j . For ease of notation we shall write β j = β 1 j for j = 169 1, . . . , p. Now, each b k j for k = 1, . . . , K and j = 1, . . . , p is given an independent normal prior 170 distribution with mean zero and unknown variance, σ 2 b as mentioned above.

171
Ruppert (2002)   where I(A) = 1 if A is true and 0 otherwise. For model identifiability reasons, we set δ 3 = 0 so 196 that the three remaining parameters, δ 1 , δ 2 and δ 4 measure incremental slope relative to the one for 197 low river flow after the switch-point in time.

198
Putting the above discussions together, we arrive at the following model for µ t : where v th denotes the product of the two indicator functions and the incremental river flow corre-200 sponding to δ h for h = 1, . . . , 4. In subsequent discussion we denote this general model by M1.

201
We compare this model with the following sub-models of interests:

202
• M2. A linear regression model for the water quality properties, but with no change-point structures, that allows us to compare the proposed modelling innovations with a straw 204 method: • M3. A linear regression model for the water quality properties, with only a switch-point in 206 time: • M4. A linear regression model for the water quality properties, with only a change threshold 208 in river flow: • M5. A linear regression model for the water quality properties, with change-point structures 210 for time and river flow: • M6. A regression model via penalised splines for the water quality properties, but no 212 change-point structures: To account for temporal dependence that is expected to occur between measurements collected 214 on consecutive days, we also evaluated the additional inclusion in (4) of a random intercept, mod-215 elled as a linear stationary first-order autoregressive process, η t , which is a very popular choice in 216 time series analyses. Thus, the model for η t assumes the form: η t = ρη t−1 + u t , where the error u t 217 is white noise, that is normally distributed with mean 0 and variance σ 2 η , and the parameter ρ is 218 assumed be in the interval [−1, 1]. However, we were not able to fit this model to the data due to

250
To compare the quality of the model fit of the proposed modelling approach in comparison 251 to the above described simpler statistical models, we adopt the predictive model choice criterion 252 (PMCC; Laud and Ibrahim (1995); Gelfand and Ghosh (1995)) defined by: where y rep t denote the future replicate of the observed macronutrient concentrations y obs t . The

254
PMCC essentially quantifies the fit of the model by comparing the posterior predictive distribution 255 obtained from the assumed model p(y rep t |y obs t ) with the observed data. The first term of (10) gives a 256 goodness of fit measure (G) which will decrease with increasing model complexity and the second 257 term of (10) is a penalty term (P) which tends to be larger for complex models. The model with 258 the smallest value of PMCC is the preferred model.

259
To facilitate model comparisons using the traditional coefficient of determination (often termed 260 as adjusted R 2 ), we consider the analogus Bayesian statistic R 2  The implementation of the models has been performed using the freely available software  The Bayesian methods allow us to estimate the daily total deposit (mass flux) of each macronu-292 trient as follows. Note that macronutrient flux is defined as the product of concentration times river 293 flow rate (Sigleo and Frick, 2003;Quilbé et al., 2006), measured in Kg/day, i.e. flux at day t, de-294 noted by ξ t is µ t × f t where µ t is converted to be measured in Kg/m 3 and river flow is converted 295 in m 3 /day. We estimate ξ t and its uncertainty by using ξ ( ) t = µ ( ) t f t where = 1, . . . , 6000 indexes 296 the thinned MCMC iterates.

297
We predict macronutrient fluxes for the 20 days used in the out-of-sample validation test to 298 assess the predictive accuracy of the model. We also estimate daily and total fluxes for the entire 299 study period using the whole data set available. Finally, to allow a comparison with similar liter-   which is adopted henceforth in this paper.

313
To assess the adequacy of the chosen model M1 for the macronutrients data, we have checked

369
Finally, under M2 and, hence, the former model continues to be our preferred model.

376
The last two rows of Table 5

Discussion and conclusion
The principal aim of this paper consists in understanding how different macronutrient species 391 respond to changes in river flow, which can be largely driven by weather pattern and severe weather 392 conditions such as storm events. Therefore, we develop a model for riverine data collected dur-393 ing a relatively short study period characterised by unusual frequency of storms and heavy rain- shire Avon river's waters, we did not assess directly the macronutrient response to extreme events, 412 but instead we assess their response to the threshold behaviour of river flow, which is, however, 413 largely controlled by weather pattern. Therefore, taking advantage of the interaction with distri-414 butional changes in time, we found that the threshold changes in river flow causes very different 415 dynamics in nitrate and phosphate time-series.

416
An important feature of our model is that it allows the predictions of macronutrient concen-417 trations and also the quantification of riverine input fluxes to the estuary. We illustrate this by providing estimates of daily fluxes and annual totals for nitrate and phosphate along with their 419 uncertainties. These daily fluxes can be aggregated to coarser temporal levels, e.g., monthly, quar-420 terly or annually, as demonstrated in our application, where we find that the amount of macronutri-421 ents delivered to the estuary can change dramatically according to the period of the year in which 422 river flow experiences larger changes. This is particularly evident for nitrate which shows a clear 423 seasonal pattern, while flux estimates for phosphate present a weaker seasonal structure, that leads 424 to a higher uncertainty in our modelling approach.

425
The Bayesian modelling framework adopted here can be extended in various ways by including 426 more relevant covariates, such as wind field that may have a short-term mixing effect on water 427 quality, increasing the sediment re-suspension and be a driving force in exporting nutrients in the 428 estuary. This can lead to a better estimate of changes in macronutrient concentrations and fluxes.

429
Multivariate modelling for both the aquatic pollutants and for data from multiple sites may also 430 lead to fruitful research.

431
In conclusion, the Bayesian approach introduced here is able to facilitate the description of 432 complex and nonlinear environmental processes and is able to assess the associated uncertainties 433 of the reported estimates. We present an application for modelling macronutrients dynamics in 434 relationship to water quality properties and changes in river flow. Our method can be easily adapted 435 to similar data modelling and analysis problems for estuarine pollution using the accompanying 436 computer code.           In the graph, y t is the observed concentration for a macronutrient in day t. The macronutrient concentrations are assumed to be distributed normally around the mean µ t . The parameter 1/σ 2 represents the precision (i.e. 1/variance) of the normal distribution. µ t is modelled as a function of: (i) a global intercept, β 0 ; (ii) the water quality properties, which are parameterised via penalised splines, where β j is the fixed coefficient for each water quality property, x t j , and b jk are the random coefficients associated with the design matrix with elements z t jk = (x t j − k k j ) d + ; (iii) the change-point structures constructed with interaction terms described by indicator functions: I tτ is the indicator for the switch-point in time, τ, and I tϕ2 and I tϕ2 are the indicators for the two change thresholds in river flow, ϕ 1 and ϕ 2 . Finally, δ 1 , . . . , δ 4 are the coefficients associated with the change-point structures.