On the use of multi-objective optimization for multi-site calibration of extensive green roofs



Introduction
In the last few decades, green roofs have emerged as a sustainable stormwater infrastructure option.
Hydrological models are practical tools for evaluating the efficiency of various green roof design configurations under different climatic conditions. Thus, they can assist practitioners in quantifying the hydrological impact, i.e., retention and detention processes, of green roof implementation in urban catchments. Numerous hydrological models of green roofs have been developed and tested in the literature. The models can be classified into physically based (Bouzouidja et al., 2018; Y. Li & Babcock, 2015; Palla et al., 2009), conceptual (Abdalla et al., 2022; Palla et al., 2012; Vesuviano et al., 2014) and data-driven (Abdalla et al., 2021). Many studies have favored conceptual hydrological models due to their simplicity, accuracy, and computational efficiency (Abdalla et al., 2022; Palla et al., 2012).
Conceptual hydrological models apply simplified equations to simulate the hydrological processes of green roofs. Because of this simplification, they depend on empirical parameters that are not physically measurable. Therefore, calibration is required to obtain optimal values for these parameters. This high dependency on calibration limits the application of conceptual models where measured data are not available. Several studies have attempted to obtain explicit relationships between conceptual model parameters and physically measurable characteristics of green roofs. For instance, a handful of studies concluded that conceptual model parameters representing internal green roof storages could be estimated from the field capacity of green roof substrates (Abdalla et al., 2022; Stovin et al., 2013), which can be physically measured (Fassman & Simcock, 2012). Moreover, parameters controlling flow movements and dynamics within the green roof layers were found to be correlated with roof properties such as the depth of the substrate layer (Soulis et al., 2017; Yio et al., 2013), the drainage layer type, and the slope of the roof (Vesuviano & Stovin, 2013). However, no explicit formulas were obtained that could estimate flow parameters solely from the physical roof characteristics.
Few studies have attempted to transfer calibrated models amongst similar roofs located in different cities on the premise of physical similarity, a common approach in predicting flows for ungauged basins (Oudin et al., 2008; Tsegaw et al., 2019). For example, Johannessen et al. (2019) tested the transferability of calibrated parameters of the SWMM model (Rossman, 2015) between similar green roofs located in four Norwegian cities with different climatic conditions. However, only calibrated models from wetter cities (higher precipitation) were shown to yield satisfactory modelling results for the green roofs of the drier cities, but not vice versa, indicating an influence of climatic inputs on model parameters. Abdalla et al. (2021) attempted to transfer trained machine learning models between the same set of similar green roofs located in four Norwegian cities. They found that the transferred models yielded satisfactory results only between cities with similar rainfall event characteristics.
The effect of climatic variables on conceptual model parameters has not been thoroughly discussed in the context of green roof modelling. It is particularly important not only for transferring calibrated parameters of conceptual models amongst similar green roofs in different locations but also for using calibrated conceptual models of green roofs to evaluate climate change scenarios in which the climatic variables differ significantly from the ones used for model calibration. Abdalla et al. (2022) tested and evaluated the performance of a conceptual green roof model for 16 green roofs located in four Norwegian cities. They discussed the effect of climatic data on the calibrated model parameters, in particular the flow parameters. They found high values of flow parameters for cities that receive rainfall events with higher amount and intensity and have shorter antecedent dry weather periods (ADWP), in comparison to cities with low precipitation amounts and longer ADWP. They acknowledged the difficulties of estimating flow parameters from climatic conditions. Many studies that conducted hydrological modelling of large basins found the performance of conceptual models to decrease significantly when evaluated under climatic conditions different from those of the calibration period (Coron et al., 2012; Hartmann & Bárdossy, 2005). Fowler et al. (2016) discussed the effect of the calibration method on producing robust parameter sets that are applicable to contrasting climatic conditions. They recommended a calibration strategy based on multi-objective optimization to explore trade-offs between model performance in different climatic conditions. Similarly, Saavedra et al. (2022) found that the hydrological models in their study produced poor flow simulations in climatic conditions contrasting with those of the calibration periods, and proposed a model calibration strategy based on multi-objective optimization for reducing the dependency of model parameters on climatic inputs.
A multi-objective optimization aims at approximating a Pareto front that contains a set of optimal solutions. In early hydrological modelling studies, the Pareto front was estimated by aggregating the objective functions into one scalar value and running a series of independent optimization runs of the scalar value with varying weights of the objective functions (Madsen, 2000, 2003). The development of algorithms that are customized for multi-objective problems, such as the non-dominated sorting genetic algorithm II (NSGA-II) (Deb et al., 2002) and the multi-objective Shuffled Complex Evolution Metropolis (Vrugt et al., 2003), allows for efficient estimation of the Pareto front. In recent years, several multi-objective algorithms were developed and evaluated in hydrological modelling studies. Examples include the multi-objective Artificial Bee Colony optimization algorithm (Huo & Liu, 2019), the differential evolution with adaptive Cauchy mutation and Chaos searching (MODE-CMCS) (Liu et al., 2016), and the multi-objective Bayesian optimization (Emmerich et al., 2006). Some studies attempted to compare the performance of algorithms in the context of hydrological modelling (Guo et al., 2014; Wang et al., 2010).
This research sought to investigate a multi-objective optimization scheme for multi-site calibration of sixteen extensive green roofs located in four Norwegian cities with different climatic conditions. The primary aim of this study is to demonstrate the possible advantages of multi-site calibration over single-site calibration for conceptual hydrological models of green roofs. Moreover, the study provides insights into the practical implications of multi-site optimization for urban stormwater management.

Green roof data
Sixteen extensive green roofs located in four Norwegian cities were used in this study. The cities are Bergen, Sandnes, Trondheim and Oslo. Bergen receives the highest annual precipitation (3110 mm), followed by Sandnes, which receives annual precipitation of around 1700 mm. Both Sandnes and Bergen are classified as temperate oceanic climate (Cfb), according to the Köppen-Geiger climate classification (Kottek et al., 2006). Trondheim is the northernmost city, with annual precipitation of around 1100 mm and a subpolar oceanic climate (Dfc). The driest city in the study is Oslo, receiving annual precipitation of 970 mm, with a temperate oceanic climate (Cfb). A comparison between the rainfall characteristics of the four cities can be found in Abdalla et al. (2021).
The green roofs vary in geometry (i.e., width, length, and slope) among the four cities. According to similarities in configuration, they were categorized into five types, as shown in Table 1.
Precipitation, outflow, and temperature were collected between 2015 and 2017 at one-minute resolution. The roofs in Oslo have a longer data record (2011 to 2017). The reader is directed to Johannessen et al. (2018) for more details about field measurements and data pre-processing.

The rationale for multi-site calibration
The performance of the calibration is typically assessed via objective functions such as the Nash-Sutcliffe efficiency (NSE) (Nash & Sutcliffe, 1970) and the Kling-Gupta efficiency (KGE) (Gupta et al., 2009).
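As an illustration of these two objective functions, the following is a minimal sketch in plain Python, assuming the standard definitions from Nash & Sutcliffe (1970) and Gupta et al. (2009). The function names `nse` and `kge` are illustrative only and are not taken from the study's own code.

```python
import math


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 minus the ratio of squared errors to observed variance."""
    mean_obs = sum(obs) / len(obs)
    squared_error = sum((o - s) ** 2 for o, s in zip(obs, sim))
    observed_spread = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - squared_error / observed_spread


def kge(obs, sim):
    """Kling-Gupta efficiency (2009 form) from correlation, variability ratio, and bias ratio."""
    n = len(obs)
    mean_o = sum(obs) / n
    mean_s = sum(sim) / n
    sd_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs) / n)
    sd_s = math.sqrt(sum((s - mean_s) ** 2 for s in sim) / n)
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(obs, sim)) / n
    r = cov / (sd_o * sd_s)   # linear correlation
    alpha = sd_s / sd_o       # variability ratio
    beta = mean_s / mean_o    # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
```

Both metrics equal 1 for a perfect simulation. A simulation that doubles every observed value keeps a correlation of 1 but has variability and bias ratios of 2, so its KGE drops to 1 - sqrt(2).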
A single-site calibration yields solutions that are near-optimal for the specific site. Many optimization algorithms used in hydrological modelling are stochastic, such as the shuffled complex evolution (SCE-UA) (Duan et al., 1992), resulting in different solutions for the same site and calibration setup. When these solutions are applied at another site with contrasting climatic conditions, they might perform poorly, as reflected by low values of the objective functions. This was reported by Johannessen et al. (2019), who attempted to transfer unchanged model parameters between similar green roofs in different locations. On the other hand, multi-site calibration explores trade-offs between model performance in different climatic conditions. Multi-site calibration aims to approximate the Pareto front containing non-dominated solutions for the objective functions used for calibration. Figure 1 presents a hypothetical Pareto front for two green roofs with contrasting climatic conditions. According to Figure 1, calibration solutions can be classified into one of five classes: i) theoretically possible solutions that are acceptable for neither green roof, ii) theoretically impossible solutions, iii) solutions that are only acceptable for green roof 1, iv) solutions that are only acceptable for green roof 2 and v) solutions that are acceptable for both green roofs. The latter class is desirable for yielding parameters that are applicable to different climatic conditions.
Figure 1: Results of two hypothetical calibrations of two green roofs plotted in two-dimensional objective space.

Single-site calibration
In this study, single-site calibration refers to the process of obtaining optimal values of model parameters for a single green roof, using a single-objective optimization algorithm (SOO). The differential evolution algorithm (DE) was used for single-site calibration (Storn & Price, 1997).
DE is a stochastic algorithm that belongs to a family of optimization methods referred to as evolutionary algorithms. These methods are suitable for global optimization and do not require the optimized function to be differentiable or continuous. DE generates populations of candidate solutions iteratively until a stopping criterion is met. Each solution contains a vector of model parameters, and each population evolves from the previous one in such a way that each solution either improves or remains the same. The initial population is formed through random sampling of parameters from the user-defined ranges. To generate the next population, DE applies a differential mutation process to each member of the current population. In this process, three solutions ($x_0$, $x_1$, and $x_2$) are randomly selected from the current population to produce a population of mutant solutions ($v$) for each member of the population as follows:

$v_{i,G+1} = x_{0,G} + F \, (x_{1,G} - x_{2,G})$    Equation 1

where $F$, called the mutation factor, is a positive scale value typically less than 1. After the mutation process is done for each member of the population, DE applies the crossover process, which controls the fraction of parameters that are copied from the mutant or the original solution. A trial solution ($u$) is formed for each member of the population as follows:

$u_{j,i,G+1} = \begin{cases} v_{j,i,G+1}, & \text{if } rand_{j,i} \le CR \text{ or } j = I_{rand} \\ x_{j,i,G}, & \text{if } rand_{j,i} > CR \text{ and } j \ne I_{rand} \end{cases}$    Equation 2

where $rand_{j,i}$ is a random real value between 0 and 1, $j$ is the index of the parameter in the solution vector, $i$ is the index of the solution in the population, $G$ is the index of the population, $CR$ is the crossover probability, and $I_{rand}$ is a randomly selected parameter index (a random integer between 1 and the number of model parameters). $I_{rand}$ ensures that $u_{i,G+1} \ne x_{i,G}$. $D$ denotes the number of solutions in each population. After the crossover process, DE applies the selection process, in which each solution from the current population is compared with its associated trial solution from the crossover process. The solution with the best objective value is selected for the next population. If the two objective values are equal, the trial solution is selected for the next population.
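The mutation, crossover, and selection steps described above can be sketched as a single DE generation. This is a minimal illustration in plain Python, not the DEoptim implementation used in the study. It assumes a maximized objective (such as KGE), assumes that the three donor solutions are distinct and differ from the current member, and omits any handling of parameter bounds. The name `de_generation` and its arguments are hypothetical.

```python
import random


def de_generation(population, objective, F=0.8, CR=0.5, rng=random):
    """Evolve one DE generation; each member is a list of parameter values."""
    n_params = len(population[0])
    next_population = []
    for i, target in enumerate(population):
        # Mutation: combine three randomly chosen solutions (assumed distinct from the target).
        others = [k for k in range(len(population)) if k != i]
        a, b, c = rng.sample(others, 3)
        x0, x1, x2 = population[a], population[b], population[c]
        mutant = [x0[j] + F * (x1[j] - x2[j]) for j in range(n_params)]

        # Crossover: take each parameter from the mutant with probability CR;
        # the parameter at index i_rand always comes from the mutant.
        i_rand = rng.randrange(n_params)
        trial = [
            mutant[j] if (rng.random() <= CR or j == i_rand) else target[j]
            for j in range(n_params)
        ]

        # Selection: keep the better solution; the trial wins ties.
        if objective(trial) >= objective(target):
            next_population.append(trial)
        else:
            next_population.append(target)
    return next_population
```

Because selection never replaces a member with a worse trial, the objective value of every member is non-decreasing from one population to the next, which matches the behaviour described in the text.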
In this study, the DEoptim library in R was used (Mullen et al., 2011), and the KGE of the simulated outflow was selected as the objective function. The hyperparameters of the optimizer were selected as follows: CR = 0.5, F = 0.8, D = 10 × number of model parameters. The stopping criterion was running the DE until the maximum number of populations (N = 200) was reached. Typically, the best solution in the last population is considered optimal in single-site calibration. In this study, however, the best solution of each population was considered a near-optimal solution. Hence, for each single-site calibration, a group of 200 parameter sets was selected. Note that some solutions were duplicated since the best solution could remain the same over several populations.

Multi-site calibration
In this study, multi-site calibration refers to the process of estimating the Pareto front for two or more green roofs using a multi-objective optimization algorithm (MOO). For multi-site optimization, multi-objective Bayesian optimization (MBO) was selected. This algorithm requires fewer model evaluations to approximate the Pareto front than other multi-objective optimization methods (Binois & Picheny, 2019). The steps of the MBO are as follows:
i. Select an initial population of candidate solutions by random sampling from the pre-defined parameter limits, and determine the value of the objective functions for each solution.
ii. Apply the Pareto dominance test to extract the non-dominated solutions, forming an initial Pareto front.
iii. Build a surrogate model for each objective function from the candidate solutions. The Gaussian process was selected for building the surrogate model in this study (Binois & Picheny, 2019; Snoek et al., 2012; Worland et al., 2018).
iv. Select a new solution based on the surrogate models. The new solution is selected following a specific criterion that improves the Pareto front of the current iteration.
v. Evaluate the selected solution in the hydrological model, determine its objective functions, and use them to update the surrogate models of the objective functions.
vi. Repeat steps iii to v for N iterations (1000 in this study).
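The Pareto dominance test that underlies the Pareto front can be sketched as follows. This is a minimal illustration in plain Python, assuming that all objectives are maximized (as with KGE); the names `dominates` and `non_dominated` are illustrative and do not come from the GPareto library.

```python
def dominates(a, b):
    """True if solution a is not worse than b in all objectives and better in at least one."""
    not_worse = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return not_worse and strictly_better


def non_dominated(points):
    """Return the points that are not dominated by any other point in the set."""
    return [
        p for p in points
        if not any(dominates(q, p) for q in points if q is not p)
    ]
```

For example, with objective vectors given as (KGE at roof 1, KGE at roof 2), the hypothetical point (0.5, 0.5) is dominated by (0.6, 0.6) and is therefore excluded from the front, whereas (0.9, 0.2) and (0.2, 0.9) both remain because neither is better than the other in both objectives.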
This study applied a common criterion for selecting potential solutions from surrogate models, termed the expected hypervolume improvement (Emmerich et al., 2011), which is presented in Figure 2.
Figure 2: Hypervolume criterion for selecting potential solutions. The orange solution is better than the green solution based on the method. The orange solution maximizes the hypervolume, which is measured from the reference point. Modified from Binois & Picheny (2019).
The GPareto library in R (Binois & Picheny, 2019) was used for the multi-objective optimization in this study. The objective functions used were the KGE of simulated outflows for each green roof.
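To make the hypervolume idea concrete, the sketch below computes the two-objective hypervolume of a set of solutions relative to a reference point, and the improvement obtained by adding one candidate. It is a simplified, deterministic illustration in plain Python under the assumption that both objectives are maximized. The expected hypervolume improvement used in Bayesian optimization additionally averages this quantity over the surrogate model's predictive distribution, which is not shown here. The function names are hypothetical.

```python
def hypervolume_2d(points, ref):
    """Area dominated by the points and bounded below by the reference point (maximization)."""
    # Only points that beat the reference point in both objectives contribute.
    pts = [p for p in points if p[0] > ref[0] and p[1] > ref[1]]
    pts.sort(key=lambda p: p[0], reverse=True)
    area = 0.0
    best_y = ref[1]
    for x, y in pts:
        if y > best_y:
            # Add the horizontal strip newly covered by this point.
            area += (x - ref[0]) * (y - best_y)
            best_y = y
    return area


def hypervolume_improvement(front, candidate, ref):
    """Gain in hypervolume when the candidate is added to the current front."""
    return hypervolume_2d(list(front) + [candidate], ref) - hypervolume_2d(front, ref)
```

A candidate that is dominated by the current front yields zero improvement, while a candidate that extends the front yields a positive value; a selection criterion of this kind prefers the candidate with the larger gain.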

The hydrological model (CRRM linear)
The green roofs were modelled with a linear reservoir model (Figure 3). The model was developed and tested by Abdalla et al. (2022). It applies several equations (Equation 3 to Equation 10) to calculate infiltration (INF), drainage flow (Q), actual evapotranspiration (AET), soil moisture (SW), and drainage storage (DW). The potential evapotranspiration (PET) is determined using the Oudin formula (Oudin et al., 2005), which is suitable for cold climates and was found to be suitable by Johannessen et al. (2017) for the cities in this study. The model contains five calibrated parameters: S1 (available storage of the soil layer), S2 (available storage of the drain layer), S11 (the threshold of soil water after which AET is equal to PET), k1 (flow parameter of the soil layer) and k2 (flow parameter of the drainage layer).
[Equations 3 to 10 are not recoverable from the source text. Only fragments survive: Equation 9 is a product of two terms, and Equation 10 is a minimum of two terms.]

Study experiments
To investigate the performance of the multi-objective optimization, three experiments were conducted as follows:
• Experiment one: calibration of two similar green roof configurations at different sites
• Experiment two: calibration of two different green roof configurations at the same site
• Experiment three: calibration of four similar green roof configurations at different sites
In all experiments, the Pareto optimal solutions were compared with the results of single-site calibrations. Based on the value of KGE, the model results were classified as:
- Poor (KGE < 0.5)
- Satisfactory (0.5 < KGE < 0.75)
- Good (KGE > 0.75)
The classification followed the recommendation of Thiemig et al. (2013). It should be emphasized that such a classification is based on the consensus of what is considered "good" or "poor" modelling results in the literature.
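The KGE classification above can be expressed as a small helper. This is an illustrative sketch only; the text does not state how values exactly equal to 0.5 or 0.75 are treated, so assigning both boundaries to the "satisfactory" class is an assumption made here. The function name `classify_kge` is hypothetical.

```python
def classify_kge(kge_value):
    """Map a KGE value to the three performance classes used in the text."""
    if kge_value < 0.5:
        return "poor"
    if kge_value <= 0.75:
        # Assumption: the boundary values 0.5 and 0.75 are counted as satisfactory.
        return "satisfactory"
    return "good"
```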
Measurements from 2017 were selected for model calibration, while 2016 data were used for model validation. Snow periods (i.e., October to March) were excluded since the model does not simulate snow accumulation and melting. A 5-minute time step was used for the modelling. Hence, the data were aggregated accordingly.
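The pre-processing described above (dropping October to March and aggregating one-minute records to a 5-minute time step) can be sketched as follows. This is a minimal illustration in plain Python, assuming that each record is a (timestamp, depth) pair in which the depth accumulated over one minute can be summed, as is the case for precipitation or outflow depths. The function name and argument names are hypothetical, and the study's actual pre-processing may differ.

```python
from datetime import datetime


def aggregate_to_5min(records, step_minutes=5, kept_months=(4, 5, 6, 7, 8, 9)):
    """Sum one-minute values into fixed windows, skipping the snow period (October to March)."""
    totals = {}
    for timestamp, value in records:
        if timestamp.month not in kept_months:
            continue  # snow period excluded
        # Round the timestamp down to the start of its aggregation window.
        window_start = timestamp.replace(
            minute=timestamp.minute - timestamp.minute % step_minutes,
            second=0,
            microsecond=0,
        )
        totals[window_start] = totals.get(window_start, 0.0) + value
    return dict(sorted(totals.items()))
```

Variables that are averages rather than accumulated depths, such as temperature, would need a mean instead of a sum over each window.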

Two-site calibration (similar roofs configurations on different sites)
The optimal solutions for the two-site and the single-site calibration schemes were plotted and compared. Figure 4 presents the comparisons for type A and type B roofs. Some solutions found by the single-site calibration yielded poor model results when transferred to different sites.
For instance, all parameter sets of OSL3 yielded KGE values below 0.1 for the TRD1 roof. In contrast, multi-site calibration yielded solutions that were satisfactory for both the OSL3 and TRD1 roofs. In some cases, single-site calibration yielded satisfactory to good results for roofs other than the one used for calibration. For instance, all solutions of the OSL3 roof yielded good to satisfactory results for BERG1, and vice versa. However, solutions found by the multi-site calibration for the two roofs were closer to the 1:1 line (i.e., best compromised solutions).
For some roofs, different parameter sets gave the same results for the same site, indicating equifinality (Beven, 1993). These solutions, however, yielded different results when transferred to different sites. For instance, optimal solutions that produced the same model performance at BERG3 yielded poor to satisfactory results for the OSL2 roof. This shows that single-site calibration could potentially miss promising solutions that produce satisfactory results in different locations. A similar conclusion was drawn by Fowler et al. (2016), who assessed the transferability of model parameters between dry and wet conditions.

Two-site calibration (different roofs configurations on the same site)
The comparison between the two calibration schemes (single-site vs multi-site) is presented in Figure 5 for the Bergen roofs. Almost all parameter sets found by the single-site calibration could yield satisfactory to good results for the other roofs with different configurations. Only a few parameter sets of the BERG2 roof yielded poor results for BERG3. On the other hand, the two-site calibration yielded better compromised results (closer to the 1:1 line).
It can be noted that climatic variables (i.e., location) could have a greater influence on model parameters than the roof's physical characteristics, as shown in Figure 5 as opposed to Figure 4.
Similarly, Abdalla et al. (2021) found that machine learning models trained in one location could yield satisfactory model performance for roofs with different properties at the same location.
Figure 4: The comparison between two-site (MOO) and single-site (SOO) calibrations for similar green roofs located in different cities. Solutions that are close to the 1:1 line are considered the best compromised solutions. The grey-shaded area represents solutions that are considered satisfactory for both sites (0.5<KGE<0.75). The green-shaded area represents solutions that are considered good for both sites (KGE>0.75).
manuscript submitted to Water Resources Research
Figure 5: The comparison between two-site (MOO) and single-site (SOO) calibrations for different green roof types located at the same site (Bergen). The grey-shaded area represents solutions that are considered satisfactory for both sites (0.5<KGE<0.75). The green-shaded area represents solutions that are considered good for both sites (KGE>0.75).

Four site calibration (similar roofs in different cities)
The solutions of the single-site calibration were used to simulate outflows for the green roofs in the other cities in the study. Figure 6 presents the performance of these simulations for type A and type B roofs. The results showed that transferring single-site calibration results to different locations could yield poor modelling results. A similar finding was reported by Johannessen et al. (2019), in which calibrated SWMM models were found to yield poor results when validated in multiple locations. As shown in Figure 6, transferability could yield satisfactory results between some cities (for instance, Bergen and Oslo). However, obtaining one parameter set from single-site calibration that produces satisfactory results in the four cities is very difficult, if not impossible. On the other hand, multi-site calibration resulted in a set of non-dominated solutions that allowed for exploring trade-offs of model performance amongst cities.
The parameter set that yielded the highest minimum KGE across the four locations was selected (S1 = 6.794, S11 = 8.378, k1 = 0.435, k2 = 0.031, S2 = 3.989). For the four roofs, the selected set yielded KGE values ranging from 0.62 to 0.89 for the calibration periods and from 0.6 to 0.82 for the validation periods, as shown in Table 2, which are considered satisfactory to good results. The simulated outflows matched the observations well, although some of the peak values were underestimated.
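The selection rule described above, choosing the solution whose worst-site KGE is highest, can be sketched as a max-min search over the Pareto optimal solutions. This is a minimal illustration in plain Python; the function name `select_compromise` and the data layout (a list of pairs holding a parameter set and its per-site KGE values) are assumptions made for the example.

```python
def select_compromise(solutions):
    """Return the (parameters, kge_per_site) pair with the highest minimum KGE across sites."""
    return max(solutions, key=lambda solution: min(solution[1]))
```

With hypothetical per-site KGE values of (0.9, 0.3), (0.7, 0.65), and (0.5, 0.8), the rule picks the second solution, because its weakest site (0.65) is stronger than the weakest site of the other two.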
The selected parameter set was used to simulate outflows from the sixteen roofs in the study. Table 2 presents the performance of these simulations, as measured by KGE. All simulations yielded KGE values higher than 0.5, and some scored KGE above 0.75, indicating satisfactory to good results. Therefore, in contrast to single-site calibration, it is possible to obtain a common parameter set that yields satisfactory model results for different locations by evaluating Pareto optimal solutions from multi-site calibration.
It can be noted that the variation of KGE values between locations and modelling periods was slightly higher than between the different roof properties. For instance, the common set scored KGE values that ranged only between 0.58 and 0.63 for the Oslo roofs (calibration periods), and only between 0.67 and 0.89 for the Bergen roofs (validation periods). This further strengthens the conclusion that the influence of climatic variables on conceptual model parameters is higher than the influence of the roof properties.

Implications for stormwater management
Single-site calibration was found to yield optimal parameters for one location that performed poorly at the other sites, due to the different climatic conditions. In the future, climatic variables are expected to change significantly due to climate change (Sun et al., 2006). Therefore, a conceptual model calibrated with the current climate variables using a single-site scheme is likely to yield poorer simulations for the future. Nevertheless, this argument has rarely been discussed in the context of modelling sustainable stormwater measures, such as green roofs. It is common practice in sustainable stormwater modelling studies to investigate climate change scenarios using a model calibrated with the current conditions. Therefore, caution should be exercised when interpreting the results of a model that was calibrated under climatic conditions contrasting with those used in the model scenarios.
The results of this study are in line with the common consensus in catchment modelling studies, in which hydrological models were found to produce poor simulation results when evaluated on climatic conditions contrasting with those used for model calibration (Coron et al., 2012; Hartmann & Bárdossy, 2005). A solution suggested by some scholars is to calibrate models on climatic conditions similar to those used in the model scenarios (C.Z. Li et al., 2012). For instance, if the model is intended to simulate wet conditions, it must be calibrated on a wet period from the historical data. However, as argued by Fowler et al. (2016), this limits the applicability of the calibrated model beyond the climatic conditions available in the historical periods. In green roof studies, observations are even scarcer than in large catchments, which further limits the applicability of such an approach.
The results of this study show that a common parameter set that fits "reasonably well" for different locations and roof properties can be obtained by multi-site calibration. This is valuable for stormwater management, as it provides a fast and reliable tool for quantifying the hydrological impact of green roofs in different locations and climate change scenarios. It should be noted, however, that such a common parameter set will typically yield lower performance for a given roof than the best parameter set from the single-site calibration of that roof. The question is whether this decrease in performance affects the usefulness of the model for stormwater management.
Before answering this question, it is useful to discuss the common metrics used to quantify the hydrological benefits of green roofs. Typically, green roof performance is measured by assessing retention and detention. The former is the measure of how much water is retained (i.e., removed) via roof evapotranspiration. In the literature on green roof modelling, simple water balance models with hourly or daily time steps and suitable evapotranspiration equations were found to be sufficient for estimating retention (Abdalla et al., 2021; Bengtsson et al., 2005; Stovin et al., 2013).
On the other hand, green roof detention refers to the reduction and delay of outflows due to the temporary storage of water in the green roof. Estimating detention requires calibrated models and short (sub-hourly) time steps. Typically, detention is measured by event-based metrics, such as peak reduction and peak delay. However, recent studies discussed issues with event-based metrics and suggested alternative approaches based on long-term simulations (Stovin et al., 2017). Among these alternatives, flow duration curves (FDCs) were found to provide an unambiguous estimation of green roof detention (Hernes et al., 2020). Hence, FDCs were adopted in this study.
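A flow duration curve of the kind referred to above can be derived from a long outflow series by ranking the flows. The sketch below is a minimal illustration in plain Python, assuming a regular time step and defining the duration of a flow value as the total time during which that value is equalled or exceeded. The function name and the (duration in hours, flow) output layout are assumptions made for the example and may differ from the exact construction used in the cited studies.

```python
def flow_duration_curve(flows, step_minutes=5):
    """Return (duration_hours, flow) pairs, sorted from the highest flow to the lowest."""
    ordered = sorted(flows, reverse=True)
    return [
        ((rank + 1) * step_minutes / 60.0, flow)
        for rank, flow in enumerate(ordered)
    ]
```

Applying the same function to observed and simulated outflows gives two curves that can be compared directly: the short-duration end reflects how well peak flows are reproduced, and the long-duration end reflects medium and low flows.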
We investigated the accuracy of simulated FDCs from the common parameter set in Table 2 (regional set) and whether these FDCs are comparable with those derived from the best parameter set from the single-site calibration setup (best fit). Figure 8 presents the observed outflow and FDC of the BERG1 roof compared with the simulated results from the best-fit and the regional parameter sets. Both parameter sets underestimated the high flows, which correspond to the part of the FDC with short durations (e.g., less than 5 hours). However, the best-fit set produced better estimates of the high flows than the regional set. For the medium and low parts of the FDC (duration > 5 hours), the regional set produced slightly better estimates. Figure 9 presents the simulated and observed FDCs for the sixteen roofs in the study. For visualization purposes, a log-log scale was used. The regional parameter set produced FDCs that were comparable to those derived from the best-fit sets for each roof. For cities with high and intense precipitation, such as Bergen, the best-fit parameters produced better estimates of high values, while the regional set produced slightly better simulations for medium and low values. On the other hand, for Trondheim, which receives lower precipitation amount and intensity, the regional set overestimated low values and provided a better estimate for high values.
Figure 8: a) Observed outflows of the BERG1 roof compared with simulated outflows from the best parameter set (best fit) and the four-site calibration (Regional) for the selected period. b) Observed flow duration curve (FDC) of BERG1 compared with the simulated FDC obtained from the parameter set that produces the best fit at BERG1 (single site) and from the best compromised parameter set from the four-site calibration (Regional).
Figure 9: Observed flow duration curve (FDC) of the sixteen green roofs compared with the simulated FDC obtained from the parameter set that produces the best fit at each site (single-site calibration) and from the best compromised parameter set from the four-site calibration (Regional).

Summary and conclusions
The current study aimed to evaluate the potential of multi-site calibration for conceptual hydrological models of green roofs. Additionally, the study provided insights into the practical implications of multi-site calibration for stormwater management. Based on the results of the study, the following conclusions can be drawn:
• Single-site calibration obtains optimal parameters for one site that can perform poorly for other locations and climatic conditions.
• The variation of model performance due to climatic variables is greater than that due to roof properties.
• A common parameter set that yields satisfactory results (Kling-Gupta efficiency > 0.5) for different locations and roof properties can be obtained by multi-site calibration. Such a parameter set provides flow duration curves that are comparable in accuracy to those derived from the best parameter sets from single-site calibration.
• The multi-site calibration scheme is recommended not only for transferability among roofs in different cities but also when applying conceptual models to evaluate climate change scenarios for which the climatic variables are significantly different from the ones used for calibration.

Figure 7: Simulated and observed outflows of type A roofs for the validation periods.

Figure 6: The performance of parameter sets obtained from single-site calibration in similar roofs located in other cities.

Table 1: Green roof geometries and configurations
Roof type
VM: vegetation mats (sedum)
MW: a mineral wool plate
TR: textile retention fabric
L+B: a mixture of Leca and bricks
PE: plastic drainage layers of polyethylene
EPS: plastic drainage layers of expanded polystyrene
HDPE: plastic drainage layers of high-density polyethylene

Step ii of the MBO procedure: Apply the Pareto dominance test to extract non-dominated solutions, forming an initial Pareto front. A solution x1 is said to dominate solution x2 if and only if i) solution x1 is not worse than x2 in all objective functions and ii) x1 is better than x2 in at least one objective function. Non-dominated solutions are solutions that are not dominated by any member of the solution set.

Table 2: The performance of the best compromised parameter set from the four-site calibration on the 16