Green roofs for stormwater runoff retention: A global quantitative synthesis of the performance

Abstract The global popularity of green roofs (GRs) is rising as urban runoff becomes a primary environmental concern in both developed and developing countries. Although a growing number of studies have measured the runoff retention (RR) performance of GRs and investigated the underlying factors, a systematic and quantitative understanding is lacking. This study applies a statistical approach to a dataset of 2375 original experimental samples of GR RR performance observed across 21 countries, consolidated from 75 internationally peer-reviewed studies published in 2005–2020. The results show that the sampled RR rates (i.e., the proportion of rainfall retained on a per-event basis) range widely (0–100%), with an average of 62%. Rainfall intensity, substrate depth, GR surface coverage, climate type, vegetation type, and season type partially explain the variance in retention performance. Moreover, the effects of some factors (e.g., rainfall intensity) are not isolated but contingent on other factors (e.g., vegetative cover). This global synthesis shows that few samples come from Africa, Central America, and Central Asia, highlighting the need for more GR research and applications in these regions. The average GR RR rate appears lower than some rates specified in green building standards, which implies a need to further improve the RR performance of GRs or to combine GRs with other RR measures. The contingent effects of the factors influencing GR RR demonstrate the need to leverage design parameters and to account for local weather and climate characteristics when optimizing GR performance.


Introduction
The world has observed rapid urbanization over the past decades, with increased construction to accommodate the shift in population from rural to urban areas (Zhou et al., 2020). Currently, almost half of the world population lives in cities, a share projected to reach up to 67% by 2050 (United Nations, 2019). Despite its socioeconomic benefits, urbanization leads to numerous environmental problems, such as urban floods and urban heat island (UHI) effects, whereby urban areas generally exhibit higher temperatures than surrounding rural areas, mainly because of modifications of land surfaces (Manoli et al., 2019; Miller and Hutchins, 2017). Climate change further aggravates these problems by increasing both the frequency and intensity of climatic extremes (Avashia and Garg, 2020; Mishra et al., 2019; Leandro et al., 2020; Martel et al., 2020). As highlighted in the Sustainable Development Goals (SDGs), environmentally sustainable construction upgrades of current and new urban infrastructure are paramount to enhancing urban health and adapting to climate change (i.e., SDG 11, sustainable cities and communities, and SDG 13, climate action).
Given that buildings can occupy 50% or more of a city's surface area (Dunnett and Kingsbury, 2008), green roofs (GRs) have high potential (Kelly et al., 2020) to tackle the multiple threats posed by urbanization and climate change (Akther et al., 2018) and to help fulfill the SDGs. A GR is an extension of an existing roof that supports rooftop vegetation (the vegetation layer) rooted in a growing medium (the substrate layer) above a waterproofing membrane (Vijayaraghavan, 2016). In contrast to a conventional roof, from which the vast majority of rainfall flows off, rainwater landing on a GR enters a complex hydrological system (Lambrinos, 2015). The system retains water in the vegetation, substrate, and layered materials, thus providing runoff retention (RR) capacity for stormwater management (Mentens et al., 2006; Kasmin and Musa, 2012). Moreover, water leaving GR systems through evapotranspiration produces a cooling effect that can help mitigate the urban heat island (UHI) effect (Sanchez and Reames, 2019) and regulate the indoor thermal environment (Coma et al., 2016). As such, GRs are considered a multipurpose green infrastructure that alleviates multiple urban problems and strengthens urban resilience, especially in the face of climate change (Loiola et al., 2019). Further, GRs are a low-impact development (LID) measure that mimics natural processes to manage stormwater without occupying any additional urban land (Zhang et al., 2015; Buccola and Spolek, 2011).
Despite the sizable potential benefits (Berardi et al., 2014), adoption of GRs has been slow, primarily because of the high costs of design, construction, and maintenance, coupled with uncertainty about the urban services GRs actually provide (Sproul et al., 2014). In particular, although the RR performance of GRs (i.e., the proportion of rainfall that is retained, on either an annual or a per-event basis (Stovin et al., 2017)) has been assessed through experimental case studies and model simulations since 1960 (Shafique et al., 2018), the results exhibit large variations and discrepancies, likely owing to differences in location, design and construction factors, and operating scenarios (Czemiel Berndtsson, 2010). Such uncertainty, combined with high costs, discourages new GR implementation now and into the future. A more comprehensive understanding of how well GRs function, and of how their RR performance relates to design parameters and local climate, is vital for optimizing RR performance, for determining whether GRs are an appropriate choice for stormwater runoff mitigation in a given urban area, and for judging whether they can meet relevant green building performance requirements and regulations. Normalized knowledge is also needed to set plausible GR and green infrastructure performance guidelines or standards, since severe inconsistencies remain in current expectations of urban ecosystem services (Calvert et al., 2018).
Our study seeks to identify and quantify predictable patterns in GR RR in relation to varying design and climate characteristics. We consolidate experimental measurements of RR performance from 75 peer-reviewed publications published between 2005 and 2020. The resulting dataset covers 21 countries and includes 2375 original measurements. Using multiple regression analysis, including stepwise meta-regression, we then identify the factors that account for the differing conclusions about the effects of GRs on RR. To the best of our knowledge, this analysis constitutes the first global synthesis and the first attempt to find and quantify predictable patterns in the RR performance of GRs with regard to heterogeneous factors. The findings suggest global parameters for the macro-simulation of GR performance at the city level, and the quantitative insights describe GR performance in a broader context. Together, these findings can help policymakers, urban designers, and contractors make better decisions about urban planning and GR design for runoff mitigation.

Literature search and selection criteria
The workflow of the literature search and screening is illustrated in Fig. 1. We located 7127 peer-reviewed research papers published between January 1950 and June 2020 in the Web of Science database with any of four keywords (green roof, low-impact development, sponge city, and ecological roof) in the abstract or title. The peer-review process serves, to some extent, as a reasonable filter for rigorous scientific work. Based on abstracts and titles, we first eliminated duplicates and non-English articles, which left 5780 papers. Abstract-based relevance screening was then applied to identify studies that (i) addressed the RR performance of GRs, thus excluding those that investigated non-building surface greening measures (such as park landscaping) or focused on other effects of GRs (such as mitigating the UHI effect); and (ii) provided primary data obtained from experiments and monitoring, thus excluding summaries, reviews, and modeling studies using secondary data.
These criteria left us with 84 studies. We then reviewed the full texts and extracted experimental data from these primary studies. Studies lacking sufficient information for our regression models were excluded; these included studies that provided little information on influencing factors or on direct measurements of the RR effect, as well as studies that presented experimental data only in graphs from which the underlying values could not be discerned.
After the literature screening, we obtained 75 studies published between 2005 and 2020 (see the full list in Supplementary Table S1) and extracted 2375 samples for the regression analysis that follows. Among these 75 studies, 73 estimated the RR effect with 1948 samples (Table S2), and 19 estimated the peak runoff retention (PRR) effect with 427 samples (Table S3).

GR RR effect metrics
RR rates and PRR rates, specified in Eqs. (1) and (2), are two of the most commonly used metrics for measuring the RR effect of stormwater management approaches, including GRs (Hakimdavar et al., 2014):

$$RR = \frac{P - R}{P} \times 100\% \qquad (1)$$

$$PRR = \frac{I - G_p}{I} \times 100\% \qquad (2)$$

where P is rainfall intensity measured as precipitation depth per rain event (mm/rain event), R is the total runoff drained from the GR during the rain event (mm/rain event), I is the maximum rainfall intensity during the rain event (mm/min), and G_p is the peak discharge from the GR (mm/min).
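As a minimal sketch of Eqs. (1) and (2), the Python functions below compute both rates for a single rain event. The function names and the example event values are hypothetical; rainfall depth and runoff are assumed to be in mm per event, and peak intensity and peak discharge in mm/min, as defined above.

```python
def runoff_retention(p_mm: float, r_mm: float) -> float:
    """RR (%) per Eq. (1): share of event rainfall depth P that does not leave as runoff R."""
    if p_mm <= 0:
        raise ValueError("rainfall depth must be positive")
    return 100.0 * (p_mm - r_mm) / p_mm


def peak_runoff_retention(i_mm_min: float, gp_mm_min: float) -> float:
    """PRR (%) per Eq. (2): reduction of peak discharge G_p relative to peak rainfall intensity I."""
    if i_mm_min <= 0:
        raise ValueError("peak rainfall intensity must be positive")
    return 100.0 * (i_mm_min - gp_mm_min) / i_mm_min


# Hypothetical event: 20 mm of rain producing 7 mm of runoff;
# peak rainfall 0.5 mm/min and peak discharge 0.25 mm/min.
print(runoff_retention(20.0, 7.0))         # 65.0
print(peak_runoff_retention(0.5, 0.25))    # 50.0
```

Both functions reject non-positive denominators because the rates are undefined for events without rainfall.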

Factors contributing to variations in GR RR effect
A variety of factors have been reported to influence GR RR performance, although no consistent, quantitative estimate and comparison of their effects exists. The factors can be classified into two categories. One category is climate-related variables, such as climate type, rainfall intensity, season, and antecedent dry weather period (ADWP) (Voyde et al., 2010; Berghage et al., 2009; Versini et al., 2016). Climate types and seasons influence GR RR performance through differences in humidity, temperature, and precipitation (Loiola et al., 2019; Brandão et al., 2017). For example, Viola et al. (2017) reported that the RR performance of GRs increases when rainfall and potential evapotranspiration exhibit the same seasonality (as in humid subtropical climates) and decreases when they are in counter-phase (as in a Mediterranean climate). Schroll et al. (2011) indicated that climates with a cool, wet season (e.g., the Pacific Northwest) and winter are challenging for GR RR performance. It is widely acknowledged that RR is negatively correlated with rainfall intensity (Lee et al., 2015; Zhang et al., 2015; Carter and Rasmussen, 2006) and positively correlated with ADWP (Lee et al., 2015), though exceptions exist (Stovin et al., 2012).
The other category is GR design-related variables, including vegetation type, geometrical properties (i.e., GR surface coverage area and slope), substrate characteristics (i.e., type, depth, porosity, and density), and drainage layer characteristics (i.e., type and depth) (Barnhart et al., 2021; Talebi et al., 2019; Afizah Asman et al., 2017). Vegetation species, growth status (plant height and vegetation coverage), and structure significantly influence the amount of runoff (Soulis et al., 2017a; Gong et al., 2021). The most widely used vegetation on GRs consists of Crassulacean Acid Metabolism (CAM) plants, such as Sedum (Gong et al., 2021; Li et al., 2018; Butler and Orians, 2011). The maintenance cost of such vegetation is much lower than that of other vegetation types, since it is resistant to drought, temperature, and wind and requires little artificial irrigation (Dvorak and Volder, 2010). However, in most studies, CAM vegetation retains runoff less effectively than other vegetation types, such as grass (Nagase and Dunnett, 2012b; Whittinghill et al., 2015) and C3 vegetation (Cristiano et al., 2020). Moreover, some studies show that a combination of plant species increases retention performance (Brandão et al., 2017), while others found no such evidence (Nagase and Dunnett, 2012b). The geometrical properties of GRs, including slope and coverage area, also affect runoff dynamics (Czemiel Berndtsson, 2010). While some studies found little influence of GR coverage area and slope on RR capacity, others reported that the retention performance of GRs increases with increasing coverage area and decreasing slope (Gong et al., 2019). Additional key drivers of GR retention performance are substrate physical properties, such as substrate material, depth, porosity, and density (Liu et al., 2019; VanWoert et al., 2005).
It is widely recognized that deeper substrates retain more runoff than shallow ones, whereas shallow substrates suit building retrofits better because they are lighter and place less load on the existing roof structure (Castiglia Feitosa and Wilkinson, 2016). The hydrologic attributes of substrate materials, such as wet weight, retentive capacity, and hydraulic conductivity, also determine retention performance (Bollman et al., 2019; Liu and Fassman-Beck, 2018). GRs with more permeable substrates show lower retention rates because of their lower maximum storage capacity (Stovin et al., 2015). Like substrate characteristics, the characteristics of the drainage layer, such as material and depth, influence the water storage capacity and thus the RR capacity of GRs (Baryla et al., 2018).
We included in our regression analysis the seven variables most commonly considered by prior studies. Three are climate-related (climate type, rainfall intensity, and season), and four are design-related (vegetation type, substrate depth, substrate type, and GR coverage area). We were able to extract information on climate type, rainfall intensity, vegetation type, and substrate depth for all experiments presented in the sampled literature; information on substrate type, GR surface coverage area, and season is specified in more than 90% of the studies (Table 1). Beyond these seven variables, the literature also reports GR RR effects of other factors, such as GR slope, drainage layer type, drainage layer depth, substrate density, and the length of the ADWP. However, these variables are less consistently reported, being available in 80% or fewer of the sampled studies. To obtain more accurate statistical estimates from the largest possible sample, we therefore restricted the benchmark model to the seven well-reported variables. Moreover, because the substrate type data are highly skewed, we subsequently removed substrate type from the benchmark model.

Multiple regression analysis: the benchmark models
The relationships between GR RR performance and the main determinants identified in the sampled literature were then explored using multivariate statistical techniques. Following the stepwise multiple regression procedure, the year of publication (y) was added as a further explanatory variable. It captures variance due to factors other than climate conditions and GR design parameters, such as technology levels evolving over time. The stepwise procedure also allowed for non-linear relationships between some explanatory variables and the RR rate, consistent with prior studies (Yio et al., 2013; Mentens et al., 2006). Model specification and selection were based on the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC).
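The Python sketch below illustrates the kind of comparison AIC and BIC support at each step: it fits a linear and a quadratic specification in ln P to synthetic data (not the study's dataset) and compares both criteria. It assumes ordinary least squares with Gaussian errors, so that AIC = n ln(RSS/n) + 2k and BIC = n ln(RSS/n) + k ln n, with additive constants common to both models omitted. The function name and the data-generating coefficients are hypothetical.

```python
import numpy as np


def ols_information_criteria(X: np.ndarray, y: np.ndarray) -> tuple[float, float]:
    """Fit OLS with an intercept and return (AIC, BIC) under a Gaussian likelihood.

    Constants shared by all models fit to the same y are dropped, so only
    differences between models are meaningful.
    """
    n = len(y)
    Xc = np.column_stack([np.ones(n), X])          # prepend intercept column
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    rss = float(np.sum((y - Xc @ beta) ** 2))
    k = Xc.shape[1]                                # number of estimated coefficients
    base = n * np.log(rss / n)
    return base + 2 * k, base + k * np.log(n)


rng = np.random.default_rng(42)
ln_p = rng.uniform(0.0, 4.0, 200)                  # hypothetical log rainfall depths
rr = 80.0 - 5.0 * ln_p - 3.0 * ln_p**2 + rng.normal(0.0, 1.0, 200)  # synthetic RR

aic_lin, bic_lin = ols_information_criteria(ln_p[:, None], rr)
aic_quad, bic_quad = ols_information_criteria(np.column_stack([ln_p, ln_p**2]), rr)
print(aic_lin > aic_quad, bic_lin > bic_quad)      # lower values favor the quadratic model
```

BIC penalizes each extra coefficient by ln n rather than 2, so with large samples it favors more parsimonious specifications than AIC does.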
Table 1
Data availability of potential explanatory variables.
Notes: N represents the number of individual studies and n represents the number of individual experimental samples.

The benchmark regression models explaining GR RR performance are:

$$RR_i = \alpha + \beta_{2P}(\ln P_i)^2 + \beta_P \ln P_i + \beta_D \ln D_i + \beta_{2A}(\ln A_i)^2 + \beta_A \ln A_i + \beta_C C_i + \beta_S S_i + \beta_V V_i + y_i + \varepsilon_i \qquad (3)$$

$$PRR_i = \alpha + \gamma_{2P}(\ln P_i)^2 + \gamma_P \ln P_i + \gamma_D \ln D_i + \gamma_A \ln A_i + \gamma_C C_i + \gamma_S S_i + \gamma_V V_i + y_i + \varepsilon_i \qquad (4)$$

The dependent variable (P)RR_i is either the RR or the PRR rate of sample i. The six explanatory variables of interest (P, D, A, C, S, V), defined in Table 2, appear on the right-hand side. P, D, and A are continuous variables, whereas C, S, and V are dummy variables representing categorical data. Natural log transformations were applied to the continuous explanatory variables to reduce or remove the skewness of the original data and to strengthen the validity of the statistical analyses. α is the intercept; β_2P, β_P, β_D, β_2A, β_A, β_C, β_S and β_V are the coefficients of the explanatory variables in Eq. (3), and γ_2P, γ_P, γ_D, γ_A, γ_C, γ_S and γ_V are the coefficients of the explanatory variables in Eq. (4). In particular, β_2P (γ_2P) and β_P (γ_P) specify the non-linear relationship between (P)RR_i and rainfall intensity (P_i), and β_2A and β_A specify the non-linear relationship between RR_i and the area of GR surface coverage (A_i). A positive (negative) coefficient on the quadratic term indicates a U-shaped (inverted U-shaped) relationship. Because of the natural log transformation, β_D (γ_D) measures the change in (P)RR rates of GRs for every 1% increase in substrate depth (D_i). β_C, β_S and β_V (γ_C, γ_S and γ_V) represent the changes in (P)RR rates of GRs due to different climates, seasons, and vegetation types, respectively. We include y_i, a dummy variable for the year of publication, to represent the time effect (e.g., evolving technologies).
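To make the interpretation of the benchmark model concrete, the following Python sketch assumes the level-log form described above, in which the outcome is linear in the natural log of a continuous regressor. Under that assumption, a 1% increase in the regressor changes the outcome by β·ln(1.01), which is approximately β/100; with a quadratic term, the slope with respect to ln P is β_P + 2β_2P·ln P, so a negative β_2P makes the decline steeper as rainfall rises. The function names and coefficient values are hypothetical and are not the estimates reported in Table 4.

```python
import math


def effect_of_pct_increase(beta: float, pct: float = 1.0) -> float:
    """Outcome change when a log-transformed regressor X rises by `pct` percent.

    For a term beta * ln(X), moving from X to X * (1 + pct/100) changes the
    outcome by beta * ln(1 + pct/100), which is close to beta * pct / 100.
    """
    return beta * math.log1p(pct / 100.0)


def slope_wrt_log(beta_lin: float, beta_sq: float, x: float) -> float:
    """Slope d(outcome)/d(ln x) of beta_lin * ln(x) + beta_sq * ln(x)**2, for x > 0."""
    return beta_lin + 2.0 * beta_sq * math.log(x)


# Hypothetical coefficients, not estimates from Table 4.
beta_d = 10.0
print(effect_of_pct_increase(beta_d))   # about 0.0995, roughly beta_d / 100

beta_p, beta_2p = -5.0, -3.0            # negative quadratic coefficient
for p in (5.0, 20.0, 80.0):             # hypothetical rainfall depths (mm/event)
    print(p, slope_wrt_log(beta_p, beta_2p, p))  # slope grows more negative as P rises
```

The exact form β·ln(1 + pct/100) is preferable to β·pct/100 when the percentage change is large, for example when comparing a doubling of substrate depth.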

The error terms, ε, capture the effects of other variables on (P)RR. These variables are not specified separately in the equations because of limitations in the current understanding of potential determinants or the limited availability of sample data. Examples of the latter include the length of the preceding dry period and GR substrate porosity and slope. Prior research indicates that they can affect GR RR (Czemiel Berndtsson, 2010; Soulis et al., 2017b; Loiola et al., 2019), but the sampled literature provides too few quantitative estimates for statistical analysis. We initially included a dummy variable for Organization for Economic Co-operation and Development (OECD) countries to distinguish the economic development level of the GR sites, since OECD countries are usually considered more affluent and technologically advanced than non-OECD countries. However, the dummy variable was not statistically significant once the above-mentioned explanatory variables were controlled for, and it was therefore excluded from the model.
Several residual diagnostics were performed to validate the statistical models. AIC and BIC were used to assess the relative quality of candidate model specifications (Supplementary Tables S4-S5). Variance inflation factors (VIFs) were calculated to diagnose multicollinearity (Supplementary Tables S4-S5). A common rule of thumb is that VIFs exceeding 10 signal serious multicollinearity that requires correction. Since the mean VIF of every model is below 10 and most VIFs are below 4, we conclude that, although the explanatory variables are somewhat correlated, multicollinearity is not a serious concern. Furthermore, residual plots and the White general test for heteroskedasticity (White, 1980) indicate heteroskedastic residual errors. Robust standard errors were therefore estimated to obtain reasonably accurate p values for judging statistical significance.
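A minimal NumPy sketch of the two diagnostics named above, run on synthetic data: VIF_j = 1/(1 - R_j²), where R_j² comes from regressing regressor j on the others, and White's heteroskedasticity-consistent (HC0) covariance (XᵀX)⁻¹ Xᵀ diag(e²) X (XᵀX)⁻¹. The helper names are hypothetical, and the source does not state which robust-error variant (e.g., HC0 or HC1) or software the study used; HC0 is shown only as the textbook White estimator.

```python
import numpy as np


def _with_intercept(X: np.ndarray) -> np.ndarray:
    return np.column_stack([np.ones(X.shape[0]), X])


def vif(X: np.ndarray) -> np.ndarray:
    """VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing column j on the other columns."""
    k = X.shape[1]
    out = np.empty(k)
    for j in range(k):
        target = X[:, j]
        others = _with_intercept(np.delete(X, j, axis=1))
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out


def ols_hc0(X: np.ndarray, y: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """OLS coefficients and White (HC0) heteroskedasticity-robust standard errors."""
    Xc = _with_intercept(X)
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    resid = y - Xc @ beta
    bread = np.linalg.inv(Xc.T @ Xc)
    meat = Xc.T @ (Xc * resid[:, None] ** 2)   # sum over i of e_i^2 * x_i x_i'
    cov = bread @ meat @ bread
    return beta, np.sqrt(np.diag(cov))


rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + 0.05 * rng.normal(size=n)            # nearly collinear with x1

print(vif(np.column_stack([x1, x2])))          # both close to 1
print(vif(np.column_stack([x1, x2, x3])))      # x1 and x3 far above 10

# Heteroskedastic noise: the error spread grows with |x1|.
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n) * (1.0 + np.abs(x1))
beta, se = ols_hc0(np.column_stack([x1, x2]), y)
print(beta, se)
```

In practice, statsmodels offers equivalents (`variance_inflation_factor` in `statsmodels.stats.outliers_influence`, and `OLS(...).fit(cov_type="HC0")` for robust standard errors).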

Extended models with interaction terms
We also estimated models that incorporate first-order interactions between the explanatory variables. Eqs. (5)-(7) examine whether an explanatory variable has a different effect on the outcome variable (i.e., RR or PRR) depending on the value of another explanatory variable, i.e., contingent effects.
Specifically, Eq. (5) includes the interaction term of rainfall intensity (P) and the presence of vegetative cover (v), whose coefficient (η_Pv) measures the difference in the elasticity of RR rates to rainfall intensity between non-vegetated GRs (v_i = 0) and vegetated GRs (v_i = 1). The interaction between rainfall magnitude (p) and substrate depth (D) is captured in Eq. (6), in which η_pD measures how the elasticity of (P)RR rates to substrate depth differs between 'light to small' rain events (p_i = 0) and 'large' rain events (p_i = 1). The interaction between substrate magnitude (d) and rainfall intensity (P) is captured in Eq. (7), in which η_dP measures how the elasticity of RR rates to rainfall intensity changes when moving from an extensive GR with shallow substrate (d_i = 0) to an intensive GR with deep substrate (d_i = 1). Other interaction terms were also examined but excluded from the models on the basis of AIC and BIC.
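The contingent effect described for Eq. (5) can be illustrated with a hedged sketch: the Python code below fits RR on ln P, a vegetation dummy v, and their product by ordinary least squares, then reports the slope with respect to ln P for each group (η_P for v = 0 and η_P + η_Pv for v = 1). Eq. (5) itself is not reproduced in this section, so this specification, the names eta_P and eta_v, and the synthetic coefficients are assumptions for illustration only.

```python
import numpy as np


def fit_interaction(ln_p: np.ndarray, v: np.ndarray, rr: np.ndarray) -> dict[str, float]:
    """OLS of RR on ln(P), a vegetation dummy v, and their product ln(P) * v."""
    X = np.column_stack([np.ones_like(ln_p), ln_p, v, ln_p * v])
    coef, *_ = np.linalg.lstsq(X, rr, rcond=None)
    names = ["intercept", "eta_P", "eta_v", "eta_Pv"]
    return {name: float(c) for name, c in zip(names, coef)}


def rainfall_slope_by_group(coefs: dict[str, float]) -> dict[str, float]:
    """Slope of RR with respect to ln(P) for non-vegetated (v=0) and vegetated (v=1) roofs."""
    return {
        "non-vegetated": coefs["eta_P"],
        "vegetated": coefs["eta_P"] + coefs["eta_Pv"],
    }


rng = np.random.default_rng(0)
n = 300
ln_p = rng.uniform(0.0, 4.0, n)                # hypothetical log rainfall depths
v = rng.integers(0, 2, n).astype(float)        # hypothetical vegetation dummy
rr = 90.0 - 15.0 * ln_p + 5.0 * v + 4.0 * ln_p * v + rng.normal(0.0, 0.5, n)

coefs = fit_interaction(ln_p, v, rr)
print(rainfall_slope_by_group(coefs))  # roughly -15 for non-vegetated, -11 for vegetated
```

A significant interaction coefficient means that reporting a single "effect of rainfall intensity" would average over two genuinely different slopes.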

Spatial, temporal, and topical distributions of the GR runoff mitigation studies and experimental measurements
The samples from existing studies are unevenly distributed across the world (Fig. 2), with most samples obtained from developed countries. Of the 75 studies (N = 75) and 2375 experimental samples (n = 2375) investigating the runoff mitigation relationships of GRs, OECD countries contributed about three-quarters of the studies and about 70% of the samples (N = 56, n = 1661). Thirty-one studies and 1028 samples came from OECD-Europe, while 25 studies and 633 samples came from non-European OECD countries, most of the studies from the United States (N = 14, n = 230). Sixteen studies and 574 samples came from developing countries in Asia, predominantly China (N = 13, n = 481). Only three studies and 142 samples came from South America, all from a single country, Brazil. No data in our dataset come from Africa, Central America, or Central Asia. The scarcity of GR RR samples in these regions highlights the need for additional research and sampling, especially since urbanization is anticipated to progress most rapidly in developing regions (Shen et al., 2017), which thus stand to benefit most from integrating green infrastructure into urban planning.
Globally, GR runoff mitigation studies increased rapidly from 2016 to 2019, predominantly in Asia and OECD-Europe (Fig. 3). This growth likely reflects recent policy interest in green infrastructure and sustainable urban development (Liu and Jensen, 2018; Marsalek et al., 2008). Because we considered only publications written in English, we excluded peer-reviewed studies written in other languages that have also contributed to scientific understanding of this topic. Our estimates of the growth of the research field and of research interest should therefore be considered conservative.
Although intensive GRs (GRs with substrate depth >15 cm) are known to provide greater runoff mitigation (Yilmaz et al., 2016), most studies focused on extensive GRs (relatively shallow GRs with substrates <15 cm), regardless of region or year. Intensive GRs were investigated in only one-quarter of the reviewed studies. Prior studies have pointed out that the heavy weight and the high maintenance and upfront construction costs of intensive GRs limit their applicability to large-scale projects (Soulis et al., 2017b), which is why extensive GRs are more common in such projects.
When assessing RR performance, peak discharge can be measured only with continuous monitoring and hydrologic modeling. It is therefore not surprising that most studies rely solely on RR rates and neglect PRR rates in their assessment of GR RR effects (n_RR = 1948; n_PRR = 427). However, PRR rates are an important indicator for evaluating GR function, since sewer overflows during a rainfall event and maximum erosion damage are usually associated with peak flow (Li and Babcock Jr, 2014). Therefore, to obtain a more comprehensive evaluation of GR performance, we include PRR samples in our analysis despite the relatively small sample size.
Our results also reveal that most studies considered multiple factors in GR runoff performance. However, few examined the contingent effects of these variables. Such nuances are crucial for optimizing GR designs and implementations in heterogeneous situations. Only eight (10.7%) of the 75 studies considered and examined how two factors jointly affected GR RR performance. As an example, Loiola et al. (2019) revealed that the influence of soil moisture on the RR rates is contingent on vegetative cover; RR rates are more sensitive to changes in soil moisture when GRs are covered with vegetation.

Globally observed RR performance of GRs
The arithmetic mean GR RR rate of the sampled studies (n = 1948) is 62.2% (Table 3), compared with a mean PRR rate of 69.3% (n = 427). The mean RR rate appears consistent with an earlier analysis, which reported an average retention of 56% for extensive GRs (Gregoire and Clausen, 2011). Our larger mean value is probably because we include both intensive and extensive GRs in the samples, and intensive GRs usually retain more runoff than extensive ones (Razzaghmanesh and Beecham, 2014). RR rates range from 0 to 100%, and PRR rates from 0.4% to 100%. Because our data have wider coverage, these ranges are larger than those identified in previous reviews (e.g., Shafique et al., 2018; Mentens et al., 2006; Li and Babcock Jr, 2014), suggesting that the heterogeneity in empirical RR measurements is more substantial than previously known. The wide range could result from differences in climate conditions, GR design, antecedent dry days, and rainfall intensity (Gregoire and Clausen, 2011). Intuitively, when rainfall intensity is especially low, there is almost no runoff, so the retention rate is 100% (Fioretti et al., 2010). When rainfall is extremely high, or when the preceding dry period is short so that initial soil moisture is high and the substrate's retention capacity is exhausted, GRs cannot store additional water, and the retention rate falls to zero (Fioretti et al., 2010).

Six main determinants of GR RR performance: quantitative estimates and relative importance
As mentioned previously, the most studied factors affecting stormwater RR effects of GRs are climate type, rainfall intensity, vegetation type, substrate depth, season and GR coverage area. Our regression results indicate that all six factors are statistically significant predictors of GR RR effects (p<0.1).

Effects of rainfall intensity (P_i)
While holding other explanatory variables constant, our regression analyses, consistent with previous findings (Carter and Rasmussen, 2006; Zhang, 2018; Dai et al., 2016), indicate that rainfall intensity is negatively correlated with the RR and PRR rates of GRs. Using a linear model, we found that with every 1% increase in rainfall intensity in a given rain event, the RR rate decreases by 0.14% (Column 2 in Table 4) and the PRR rate decreases by 0.08% (Column 5 in Table 4). Moreover, for RR rates, the non-linear model with a quadratic rainfall intensity term (Column 3 in Table 4) shows a minor advantage in predictive power over the linear model (Column 2 in Table 4), increasing the explanatory power by 1%. The non-linear model also shows that retention performance worsens at an accelerating rate under higher rainfall intensity, i.e., the decrease grows with rainfall intensity. A similar non-linear correlation was also found for PRR performance.

Fig. 3. A timeline overview of the growth of GR runoff mitigation studies. Each circle represents one study. The color of the circle indicates the spatial or topical focus of the study.

Table 3
A statistical summary of the RR rates and PRR rates of the consolidated GR dataset.
Notes: Circles underlying each boxplot represent the distribution of RR or PRR rates associated with an individual sample, ranging from 0 to 100%. The percentages in parentheses for three factors (vegetation type, climate type, and season type) represent their proportions in the data sample.

Table 4
Estimates of the GR-RR effects based on the benchmark model.

The negative correlation between RR performance and rainfall intensity is explainable. In general, runoff occurs only when rainfall exceeds the maximum RR capacity of a GR (Palla et al., 2009; She and Pang, 2010). When rainfall intensity is low, rainfall is largely absorbed by the GR, with little to no runoff discharged. Once rainfall exceeds the GR's maximum RR capacity, runoff follows. In more intense storms, the substrate and plants have less time to absorb moisture, resulting in decreased retention capacity and increased elasticity of retention to rainfall intensity.
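The capacity-exceedance argument above can be illustrated with a simple storage ("bucket") model, a conceptual idealization rather than the regression used in this study. In the hypothetical Python sketch below, runoff is the event rainfall depth in excess of the storage still available after antecedent moisture (initial_fill, an assumed fraction of capacity already occupied). It reproduces the declining RR rate with rainfall depth and, for a fully wetted substrate, the zero-retention case; it deliberately ignores the within-storm timing effects described above. The capacity value is illustrative only.

```python
def bucket_event(rain_mm: float, capacity_mm: float, initial_fill: float = 0.0) -> tuple[float, float]:
    """Runoff (mm) and RR (%) for one event under a simple storage ('bucket') model.

    Available storage is the capacity not already occupied by antecedent moisture;
    runoff is whatever rainfall exceeds that storage. Rainfall timing is ignored.
    """
    if rain_mm <= 0:
        raise ValueError("rain_mm must be positive")
    if not 0.0 <= initial_fill <= 1.0:
        raise ValueError("initial_fill must lie in [0, 1]")
    available = capacity_mm * (1.0 - initial_fill)
    runoff = max(0.0, rain_mm - available)
    return runoff, 100.0 * (rain_mm - runoff) / rain_mm


# Hypothetical 20 mm storage capacity and a dry substrate.
for rain in (10.0, 40.0, 80.0):
    print(rain, bucket_event(rain, capacity_mm=20.0))
# (0.0, 100.0), then (20.0, 50.0), then (60.0, 25.0)

print(bucket_event(40.0, capacity_mm=20.0, initial_fill=1.0))  # (40.0, 0.0): saturated substrate
```

Above capacity, RR in this model equals 100·S/P, which falls as P grows; it captures the storage limit but not the shorter absorption time in intense storms.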

Effects of substrate depth (D_i)
Our regression results show that substrate depth is positively correlated with RR rate, which agrees with findings that intensive GRs outperform extensive GRs in stormwater RR (Mentens et al., 2006; Monterusso et al., 2002; Soulis et al., 2017a). Our quantitative estimates further refine existing knowledge: for every 1% increase in substrate depth, our analyses indicate that the RR rate increases by 0.1% (Column 3 in Table 4). This is consistent with the fact that thicker substrate layers contain more meso- and micro-pores for long-term moisture storage, essentially providing more storage capacity than a shallower substrate layer (Soulis et al., 2017a; VanWoert et al., 2005). For PRR rate, the influence of substrate depth is not robustly significant, especially after controlling for the time effect (Columns 5-6 in Table 4). The non-significance is probably because the sample size for the PRR rate is limited (n = 333) and controlling for the time effect costs many degrees of freedom.

Effects of GR coverage area (A_i)
RR has a non-linear correlation with GR coverage area. Based on our sample, the correlation is positive when the roof area is below 57 m² and reverses when the roof area exceeds 57 m² (Column 3 in Table 4). The positive correlation between RR rates and GR coverage area is explainable. When the substrate is dry, the gap between the substrate and the inner wall of a small module is proportionally larger than in larger modules, so rainfall may flow directly out of the module through the gaps, resulting in a lower RR rate (Gong et al., 2019). However, non-linear correlations between RR rates and GR coverage area are seldom reported in previous literature, and the mechanism behind the negative correlation above a threshold area remains elusive. Moreover, the non-linear correlation is not significant for PRR rates. The PRR effect displays a strong linear correlation with GR coverage area, which is consistent with previous findings (Hakimdavar et al., 2014; Gong et al., 2019). Specifically, for every 1% increase in GR coverage area, the PRR increases by 0.04% (Column 6 in Table 4).
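Assuming, consistent with the log-transformed benchmark specification, that RR is quadratic in ln A, the threshold area follows from setting the derivative with respect to ln A to zero: ln A* = -β_A / (2β_2A), which is a peak when β_2A < 0. The Python sketch below computes A*; the coefficients in the usage lines are hypothetical, chosen only so that the turning point lands at 57 m², and are not the Table 4 estimates.

```python
import math


def log_quadratic_turning_point(beta_lin: float, beta_sq: float) -> float:
    """Value A* where beta_lin * ln(A) + beta_sq * ln(A)**2 has its extremum.

    Setting the derivative with respect to ln(A) to zero gives
    ln(A*) = -beta_lin / (2 * beta_sq); it is a maximum when beta_sq < 0.
    """
    if beta_sq == 0:
        raise ValueError("no turning point without a quadratic term")
    return math.exp(-beta_lin / (2.0 * beta_sq))


# Hypothetical coefficients, chosen only so that the peak falls at 57 m^2.
beta_2a = -1.0
beta_a = 2.0 * math.log(57.0)
print(round(log_quadratic_turning_point(beta_a, beta_2a), 6))  # 57.0
```

Because the turning point depends on the ratio of two estimated coefficients, its uncertainty is typically larger than that of either coefficient alone.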

Effects of climate type (C_i)
Among the samples we collected, 77% of the observations were recorded in temperate climates (n = 1829 out of 2375). Observations for the other three climate types, i.e., continental, dry, and tropical, account for only 12%, 8%, and 3%, respectively. Our statistical analysis reveals that GRs in tropical climates, characterized by high temperatures and abundant rainfall distributed throughout the year or seasonally, outperform GRs located in dry, temperate, or continental climates in terms of RR rates (Column 3 in Table 4). However, given that observations for tropical climates are comparatively few, we are only cautiously optimistic about the positive effect of tropical climates on GR RR performance. The increased water stress (both too much and too little water) and the high temperatures in tropical climates pose great challenges for GR design, such as plant selection and maintenance (Simmons, 2015).
Regarding PRR rates, we found that GRs in continental climates, which often have a significant annual variation in temperature (i.e., hot summers and cold winters), outperform those in temperate climates (Columns 5-7 in Table 4): GRs in continental climates have PRR rates 35% higher than those in temperate climates. The high performance of continental-climate GRs may be attributable to relatively high evaporation rates associated with high overall radiation throughout the year, combined with wet summers and dry winters. High evaporation helps to facilitate the hydraulic performance of the GRs, plausibly by restoring substrate storage capacity between rain events.

Effects of vegetation types (Vi)
Our statistical analyses reveal some different or contrasting findings regarding the effects of vegetation on GRs' RR performance. Unlike the prior understanding that vegetated GRs typically have greater RR capacity than non-vegetated ones (Stovin et al., 2015; Kemp et al., 2019), our results indicate only minor advantages of vegetation. As shown in Column 3 of Table 4, the RR rates of GRs planted with Sedum or grass are not significantly higher than those of non-vegetated GRs. Only GRs covered with mixed-species plantings retain significantly more rainfall than bare substrate at the 95% confidence level. For PRR rates, the comparison runs the other way: GRs covered with Sedum or grass show lower PRR rates than those with bare substrate, while the PRR rates of GRs covered with mixed-species plantings and other vegetation types do not differ significantly from those of non-vegetated GRs. The divergence of our findings from previous ones may result from differences in substrate conditions between vegetated and non-vegetated GRs. The substrate in vegetated GRs can have a higher moisture content because vegetation requires irrigation as part of routine maintenance, which in turn leaves less storage available for retaining rainfall. Such counterintuitive patterns deserve further investigation in future research.
Moreover, consistent with previous findings (Lundholm et al., 2010), our statistical results show that GRs planted with mixed-species vegetation exhibit greater RR and PRR rates than those planted with monocultures. The (P)RR rates of GRs with mixed-species vegetative cover are approximately 10% higher than those of GRs planted with monocultures of Sedum or grass (Columns 3 and 6 in Table 4). A potential explanation is that multiple life forms provide temporal complementarity in growth phenology and water uptake, resulting in higher RR capacity than monocultures (Wolf and Lundholm, 2008).
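When bare substrate is the reference group, a comparison between two vegetated categories (e.g., mixed-species versus a Sedum monoculture) is not read off a single coefficient: the intercept and the reference cancel, leaving the difference between the two category coefficients. A minimal sketch with made-up coefficient values that are NOT the estimates in Table 4; `contrast` is a hypothetical name.

```python
# Hypothetical category coefficients relative to bare substrate (reference = 0).
# These numbers are illustrative only and are NOT the estimates in Table 4.
COEFFS = {
    "bare": 0.0,  # reference category, fixed at zero
    "sedum": 0.03,
    "grass": 0.05,
    "mixed": 0.15,
}


def contrast(coeffs: dict, a: str, b: str) -> float:
    """Estimated difference between categories a and b.

    The intercept and the reference category cancel, so a-vs-b reduces to
    coeffs[a] - coeffs[b]. Its standard error would also require the
    covariance of the two estimates, which this sketch omits.
    """
    return coeffs[a] - coeffs[b]


print(contrast(COEFFS, "mixed", "sedum"))  # mixed-species vs Sedum monoculture
print(contrast(COEFFS, "sedum", "bare"))   # equals the Sedum coefficient itself
```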

Effects of season types (Si)
Intra-annual (seasonal) variations in GR RR performance are statistically significant. Compared with winter, RR rates and PRR rates are 13% (Column 3 in Table 4) and 14% (Column 6 in Table 4) higher, respectively, in summer, and 6% and 9% higher, respectively, in spring. This is consistent with higher evapotranspiration in spring and summer, which facilitates the recovery of GR retention capacity (Mentens et al., 2006; Villarreal et al., 2004): when subsequent rain falls, the drier substrate can absorb more rainfall and retain more runoff. Seasonal variation in the length of the preceding dry period may also explain the results. In summer, the longer dry period since the last rainfall and the drier climate can jointly enhance the retention efficiency of GRs (Stovin et al., 2012). However, the dry season limits water availability, so vegetated GRs require supplemental irrigation (MacIvor et al., 2013). In such cases, the storage capacity and evaporative recovery of GRs are limited because irrigation keeps the substrate moist.
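The evapotranspiration argument can be made concrete with a deliberately simple storage ("bucket") sketch: the substrate has a fixed storage capacity, dries down at a constant evapotranspiration (ET) rate during the antecedent dry period, and the next event is retained up to whatever storage is then available. All parameter values below are hypothetical, chosen only to contrast a winter-like and a summer-like case; the function name is also hypothetical, and published GR models typically use moisture-dependent ET and drainage.

```python
def event_retention(
    capacity_mm: float,
    initial_storage_mm: float,
    et_mm_per_day: float,
    dry_days: float,
    rainfall_mm: float,
) -> float:
    """Fraction of an event's rainfall retained by a single-bucket substrate.

    Simplifications (assumptions, not from the source): constant ET, no
    drainage during the dry spell, and no runoff until the bucket is full.
    """
    if rainfall_mm <= 0:
        raise ValueError("rainfall_mm must be positive")
    # Water still stored after the antecedent dry period; cannot go negative.
    storage = max(0.0, initial_storage_mm - et_mm_per_day * dry_days)
    available = capacity_mm - storage
    retained = min(rainfall_mm, available)
    return retained / rainfall_mm


# Hypothetical inputs: 20 mm storage, bucket full after the previous event,
# a 15 mm storm, and a short low-ET "winter" vs a longer high-ET "summer" spell.
winter = event_retention(20.0, 20.0, et_mm_per_day=0.5, dry_days=3, rainfall_mm=15.0)
summer = event_retention(20.0, 20.0, et_mm_per_day=3.0, dry_days=5, rainfall_mm=15.0)
print(f"winter: {winter:.0%}, summer: {summer:.0%}")
```

Setting `initial_storage_mm` close to `capacity_mm` just before an event mimics the irrigation effect described above: little storage is free, so retention drops regardless of season.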

The contingent effects between variables
The GR RR effects of some variables are contingent on the status of other factors. Such contingent effects are found between a number of roof design parameters and natural environment conditions, as shown in Table 5.
Notably, the elasticity of RR rates to rainfall intensity depends on whether GRs are planted with vegetative cover. The negative effect of rainfall intensity on the RR rates of GRs (−0.247) is mitigated for vegetated GRs (−0.247 + 0.108 = −0.139). In other words, for GRs with bare substrate, every 1% increase in rainfall intensity is associated with a 0.25% decrease in the RR rate; for vegetated GRs, the same increase is associated with a 0.14% decrease (Eq. (5) in Table 5). The difference is probably because evapotranspiration and the water-holding capacity of plants make vegetated GRs less sensitive to increases in rainfall intensity.
The RR effects of GRs are more elastic to substrate depth in small and medium rainfall events than in large storms (Eq. (6) in Table 5). Specifically, for small and medium rainfall events (≤25 mm per event), the elasticity of the RR rate to substrate depth is 0.111, meaning that every 1% increase in substrate depth is associated with a 0.11% increase in the RR rate. For large storms (>25 mm), the elasticity is only 0.042 (0.111 − 0.069 = 0.042), meaning that every 1% increase in substrate depth is associated with a 0.04% increase in the RR rate (Eq. (6) in Table 5). This is likely because the water-holding capacity of the substrate is limited in large storm events, which shrinks the difference in RR capacity between shallow and deep substrates. Thus, increasing GR substrate depth is a more effective measure for managing small to medium rain events than for managing large ones.
The RR effects of intensive GRs are more elastic to rainfall intensity than those of extensive GRs. As shown in Eq. (7) in Table 5, every 1% increase in rainfall intensity is associated with a 0.13% decrease in RR rates for extensive GRs and a 0.17% decrease for intensive GRs. However, the mechanism explaining this finding warrants further investigation.
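The arithmetic above is the usual reading of an interaction term in a log-log regression: the elasticity for a group is the main-effect coefficient plus the interaction coefficient times that group's 0/1 indicator. A minimal sketch using the two coefficients quoted from Eq. (5); the function name is hypothetical, and the exact model form (a ln(intensity) term plus a vegetated × ln(intensity) term) is our assumption.

```python
def group_elasticity(main_effect: float, interaction: float, in_group: bool) -> float:
    """Elasticity of RR to a covariate for one group in a model with an interaction.

    Assumed form: ln(RR) = ... + main_effect * ln(x) + interaction * D * ln(x),
    with D = 1 for the group and D = 0 for the reference group.
    """
    return main_effect + interaction * (1 if in_group else 0)


# Coefficients quoted in the text for Eq. (5) in Table 5.
INTENSITY = -0.247             # rainfall intensity (bare substrate = reference)
INTENSITY_X_VEGETATED = 0.108  # interaction with the vegetated indicator

bare = group_elasticity(INTENSITY, INTENSITY_X_VEGETATED, in_group=False)
vegetated = group_elasticity(INTENSITY, INTENSITY_X_VEGETATED, in_group=True)
print(f"bare: {bare:.3f}, vegetated: {vegetated:.3f}")
# → bare: -0.247, vegetated: -0.139
```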
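The same coefficient arithmetic, combined with the 25 mm split described for Eq. (6), gives event-size-specific predictions. The sketch below converts the two quoted elasticities into the relative RR change implied by a deeper substrate, assuming a constant-elasticity (log-log) form; the 50% depth increase is an arbitrary illustrative value and the function names are hypothetical.

```python
LARGE_STORM_MM = 25.0         # events above this depth count as large storms
DEPTH_ELASTICITY = 0.111      # small and medium events (Eq. (6), Table 5)
DEPTH_X_LARGE_STORM = -0.069  # interaction with the large-storm indicator


def depth_elasticity(event_rainfall_mm: float) -> float:
    """Elasticity of RR to substrate depth for an event of the given size."""
    large = event_rainfall_mm > LARGE_STORM_MM
    return DEPTH_ELASTICITY + DEPTH_X_LARGE_STORM * (1 if large else 0)


def rr_change_for_depth(depth_ratio: float, event_rainfall_mm: float) -> float:
    """Relative RR change when substrate depth is multiplied by depth_ratio."""
    return depth_ratio ** depth_elasticity(event_rainfall_mm) - 1.0


# Illustrative only: a substrate 50% deeper (depth_ratio = 1.5).
for rain in (10.0, 40.0):
    print(f"{rain:.0f} mm event: {rr_change_for_depth(1.5, rain):+.1%}")
```

Under these assumptions the same 50% deeper substrate yields roughly 4.6% higher RR in a 10 mm event but only about 1.7% in a 40 mm storm, which is the pattern the paragraph describes.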

Discussion
Compiling existing experimental samples from across the world, our analysis systematically and quantitatively assesses the effects of GRs on RR. The study focuses on the effects of six factors that have been widely analyzed in the literature (rainfall intensity, substrate depth, GR coverage area, climate type, season, and vegetation type) on RR and PRR rates. The findings provide empirical parameters for macro-level simulation and yield several implications for urban planning. Our review also indicates that GRs need further study in Africa, Central America, and Central Asia, where rapid urbanization is anticipated and which could therefore benefit greatly from integrating green infrastructure into planning.
In our sample, the average RR rate is 62.2%, lower than the best-practice guidelines in some countries' national standards for green buildings. For example, according to China's "Assessment Standard for Green Building GB/T 50378-2019" (MOHURD, 2019), the best-practice guideline for the RR rate of green buildings is more than 70%. Developing best practices and optimizing design parameters may improve the RR performance of GRs. For example, increasing substrate depth within the range that the building structure and maintenance budget can accommodate, and adapting design parameters to local conditions, should enhance the RR effect. Moreover, combining GRs with other green infrastructure alternatives is another avenue to improve overall stormwater management performance (Cascone et al., 2019; Li et al., 2018). There have been attempts to combine GRs with permeable pavement (Palermo et al., 2020), vegetative swales, and bio-retention cells (Joksimovic and Alam, 2014), which significantly improve RR performance.
Another key insight revealed by our analysis is that the determining factors of GR RR may interact with each other. The contingent effects identified in this study provide further insight into the complexity of real-world GR performance. Specifically, the RR effects of non-vegetated GRs and intensive GRs are more elastic to rainfall intensity increases than those of vegetated GRs and extensive GRs, and the RR effects of GRs are more elastic to substrate depth in small rainfall events than in large ones. These contingent effects warrant further investigation to develop design specifications suited to local natural environments.
It is also worth noting that the findings of the statistical analyses of GRs' RR performance should be interpreted carefully. For example, bare substrates are found to retain similar or even more rainfall than GRs vegetated with some species. This finding implies that vegetating GRs may not bring additional retention capacity as measured by RR rates. This does not mean vegetation is valueless; prior studies indicate that vegetation on GRs can offer multiple ecological benefits, such as mitigating UHI effects and regulating microclimates (Oberndorfer et al., 2007; Manoli et al., 2019; Miller and Hutchins, 2017).
A limitation of this meta-analysis is that it focuses on the six key factors that have been widely studied and measured as determinants of GR RR performance in the existing literature. Other factors, such as the length of the antecedent dry weather period (ADWP) and GR age, are not examined in the main statistical analysis, mainly because few quantitative estimates are available. Despite these limitations, this meta-analysis provides parameters and references for macro-level modeling of GR installation and performance. Unlike previous case studies that focus on a single area, this study provides a worldwide quantitative synthesis of experimental data and identifies predictable patterns in GRs' effect on RR. Future macro-simulations based on these generalized parameters can inform policy decisions on climate change adaptation and mitigation with respect to rainstorm and urban flood management.
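To see what combining measures might require, consider a simplified series ("treatment train") arrangement in which all GR outflow is routed to one downstream measure that retains a fixed fraction of what it receives. Under that assumption (ours; real systems also receive runoff from other surfaces, and retention varies from event to event), the combined event retention is 1 − (1 − r_GR)(1 − r_down), and the downstream rate needed to reach a target follows directly. The sketch plugs in the 62.2% sample mean and the 70% guideline cited above; the function names are hypothetical.

```python
def combined_retention(r_gr: float, r_downstream: float) -> float:
    """Event retention of a GR whose outflow passes through a second measure."""
    return 1.0 - (1.0 - r_gr) * (1.0 - r_downstream)


def required_downstream(r_gr: float, target: float) -> float:
    """Downstream retention fraction needed for the series system to hit target."""
    if not 0.0 <= r_gr < 1.0:
        raise ValueError("r_gr must lie in [0, 1)")
    if not 0.0 <= target <= 1.0:
        raise ValueError("target must lie in [0, 1]")
    if r_gr >= target:
        return 0.0  # the GR alone already meets the target
    return 1.0 - (1.0 - target) / (1.0 - r_gr)


MEAN_GR_RR = 0.622  # sample-average RR rate reported in this study
TARGET_RR = 0.70    # GB/T 50378-2019 best-practice threshold cited in the text

need = required_downstream(MEAN_GR_RR, TARGET_RR)
print(f"downstream measure must retain about {need:.1%} of GR outflow")
print(f"check: {combined_retention(MEAN_GR_RR, need):.1%}")
```

Under these simplifying assumptions, a downstream measure retaining about a fifth of the GR outflow would close the gap between the sample mean and the 70% guideline.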

Conclusion
This study constitutes, to the best of our knowledge, the first global synthesis that quantifies predictable patterns in the RR performance of GRs with respect to heterogeneous factors. A dataset of 2375 original experimental measurements of GR RR in 21 countries was consolidated from 75 internationally peer-reviewed studies published in 2005–2020. The global dataset shows that few samples are available from South America, Africa, Central America, or Central Asia, which highlights the need for additional research and sampling in these regions. Moreover, the RR rate of GRs in a single rain event is 62% on average, which falls short of some countries' green building requirements (e.g., China's best-practice guideline for green buildings, "Assessment Standard for Green Building GB/T 50378-2019", requires an RR rate of at least 70%). This indicates the need to further improve the RR performance of GRs or to combine GRs with other RR measures for green buildings. Applying a statistical approach to the dataset, we systematically and consistently quantify the effects on RR performance of the main factors identified in prior studies: rainfall intensity, GR substrate depth, GR surface coverage, climate type, GR vegetation type, and season type. We reveal a non-linear negative correlation between RR and rainfall intensity, a linear positive correlation between RR and substrate depth, and an inverted U-shaped correlation between RR and GR surface coverage area. Crucially, the influences of some factors are not isolated but contingent on other factors. For example, the influence of rainfall intensity on RR rates depends on the vegetative cover of GRs and the substrate depth magnitude. The influence of substrate depth on GR RR rates likewise depends on rainfall magnitude. These contingent effects underscore the need to optimize GRs by not only leveraging design parameters but also considering local weather and climate circumstances.

Table 5 Estimated contingent effects of two parameters on the RR rates of GRs. [Table body not recoverable from the source; surviving column labels: Eq. (5), Coefficient of P.]
Notes: a. Contingent effects on the PRR rates using the models above were not significant. b. One of the categories is treated as the reference, i.e., samples with that categorical value (e.g., non-vegetated) form the reference group, whose coefficient is not explicitly estimated and is equivalent to zero. The coefficient estimated for another categorical value (e.g., vegetated) indicates how that category compares with the reference category. c. Statistical significance is indicated by p values: *** p < 0.01, ** p < 0.05, * p < 0.1.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.