Green roof seasonal variation: comparison of the hydrologic behavior of a thick and a thin extensive system in New York City

Green roofs have been utilized for urban stormwater management due to their ability to capture rainwater locally. Studies of the most common type, extensive green roofs, have demonstrated that green roofs can retain significant amounts of stormwater, but have also shown variation in seasonal performance. The purpose of this study is to determine how time of year impacts the hydrologic performance of extensive green roofs considering the covariates of antecedent dry weather period (ADWP), potential evapotranspiration (ET0) and storm event size. To do this, nearly four years of monitoring data from two full-scale extensive green roofs (with differing substrate depths of 100 mm and 31 mm) are analyzed. The annual performance is then modeled using a common empirical relationship between rainfall and green roof runoff, with the addition of Julian day in one approach, ET0 in another, and both ADWP and ET0 in a third approach. Together the monitoring and modeling results confirm that stormwater retention is highest in warmer months, the green roofs retain more rainfall with longer ADWPs, and the seasonal variations in behavior are more pronounced for the roof with the thinner media than the roof with the deeper media. Overall, the ability of seasonal accounting to improve stormwater retention modeling is demonstrated; modification of the empirical model to include ADWP and ET0 improves the model R2 from 0.944 to 0.975 for the thinner roof, and from 0.866 to 0.870 for the deeper roof. Furthermore, estimating the runoff with the empirical approach was shown to be more accurate than using a water balance model, with model R2 of 0.944 and 0.866 compared to 0.975 and 0.866 for the thinner and deeper roof, respectively. This finding is attributed to the difficulty of accurately parameterizing the water balance model.


Introduction
Practitioners, policymakers, and researchers have been investigating the use of green infrastructure to reduce the damaging environmental effects of excess stormwater runoff in urban environments (US EPA 2004). One significant opportunity for green infrastructure implementation is on rooftops, which account for approximately 40%-50% of the impermeable urban surface (Stovin et al 2012). Many cities that have set benchmarks for local stormwater capture, including New York City, are incentivizing the retrofit of existing buildings with lightweight, extensive green roofs (Fowler 2008, NYC Department of Buildings 2008). Extensive green roofs require minimal maintenance and consist of several layers, including a vegetation layer (most often in the genus Sedum), a 30-150 mm deep substrate layer, and a drainage course.
The ability of an extensive green roof to prevent stormwater runoff depends on the amount of stormwater it can retain during a rain event, which, in turn, depends on its ability to release stored water between rain events. Green roof stormwater retention has been shown to vary with climate, storm size, vegetation type, and season (Mentens et al 2006, Lundholm et al 2010, Voyde et al 2010). Green roof hydrological performance is usually assessed as the percent of rainfall captured over a defined period, and can be predicted fairly well using a number of approaches, summarized as: empirical relationships between rainfall and runoff derived from field observations, referred to here as characteristic runoff equations (CREs) (Mentens et al 2006, Schroll et al 2011) or curve numbers (Rasmussen 2006, Carter and Jackson 2007), process-based water balance models (Berthier et al 2011, Zhang and Guo 2013, Vanuytrecht et al 2014), and software such as SWMM (Khader and Montalto 2008, Burszta-Adamiak and Mrowiec 2013) and HYDRUS 1D (Hilten et al 2008). Empirical relationships between rainfall and runoff have the advantage of being well-calibrated and simple to use. However, such relationships combine covariates of climate, season, vegetation, and system type, making them difficult to generalize. Water balance models and software solutions offer greater predictive capacity, including the ability to account for climatic conditions that lead up to each individual rainfall event. Nonetheless, these models are more complicated to implement and require parameters that are often difficult to measure, such as leaf area index (LAI), plant wilting point, and roof depression storage (Carson et al 2015).
Green roof stormwater retention performance has been observed to be consistently higher in the warmer months of the year (Mentens et al 2006), largely accounted for by higher potential evapotranspiration (ET0). This variability is commonly presented in discrete time periods, often corresponding to seasons. A summary of performance variability reported in prior studies that were undertaken in the northern hemisphere on extensive Sedum roofs is provided in table 1, where the greatest percent retention in each study is marked by the darkest shade of red. Despite spanning many different climates within the northern hemisphere, general trends are similar, although the reported magnitude of stormwater retention differs between studies.
While the studies reported in table 1, and others, clearly show a seasonal trend in green roof hydrologic performance, detailed understanding of the primary factors driving the seasonal influence remains lacking. Most investigators include the caveat that the seasonal trends they observe might be masked by the influence of storm event size distribution and the length of antecedent dry weather periods (ADWPs) within each season of their study period (Stovin et al 2012, Carson et al 2013, Wong and Jim 2014). These confounding variables have been individually examined, with Carson et al (2013) finding the seasonal effect to be strongest in storms from 10 to 20 mm in depth, and Poë et al (2015) finding the seasonal impact on storage created by ET to be more apparent for events with shorter ADWPs. However, Wong and Jim (2014) were unable to find a significant link between stormwater retention and ADWP.
The purpose of this study is to improve understanding of the factors that contribute to seasonal variability of extensive green roof hydrological performance by (1) providing a context for how ADWP, storm size, and ET0 relate individually to green roof hydrological performance, and (2) using these variables in adaptive empirical models to determine the significance of their combined effect. To do this, rainfall, runoff, and environmental data were collected over a period of nearly four years from two full-scale extensive Sedum green roofs located in New York City, one 31 mm deep and the other 100 mm deep. The data are then used to identify 503 individual storm events for which event size, rainfall capture, ET0, and ADWP were determined. The seasonal performance is first evaluated considering the individual factors with a series of exceedance probability (EP) plots. Then a comparative modeling approach is used to determine the significance of Rf, ET0, and ADWP in predicting runoff depth through the development and evaluation of four different models.
Table 1. Stormwater retention by season expressed in percent of rainfall retained as reported by Stovin et al (2012), Uhl and Schiedt (2008), Kaufmann (1999), Liesecke (2002), and Carson et al (2013).

Green roof sites and instrumentation
Monitoring took place in Manhattan, New York, in USDA plant hardiness zone 7B, on two separate extensive green roofs, referred to herein as W118 and USPS. W118 is a Columbia University residence located at 423 West 118th Street (40°48′28″, −73°57′34″) that was outfitted with a Xero Flor America XF301+2FL vegetated mat in 2007 (Carson et al 2013). The roof is approximately 65 m above mean sea level and has a monitored watershed area of 310 m². The growing media is expanded shale, with a water storage capacity of 37%, saturated hydraulic conductivity of 0.021 cm s⁻¹, and relatively thin depth of 32 mm. The vegetated area is 53%, populated with succulent plant species including: Saxifraga granulata, Sedum acre, Sedum album, Sedum ellacombianum, Sedum hybridum 'Czars Gold', Sedum oregonum, Sedum pulchellum, Sedum reflexum, Sedum sexangulare, Sedum spurium var. coccineum, and Sedum stenopetalum.
Each roof has an Onset Hobo U30 weather station that is instrumented to measure environmental conditions in addition to the water entering the drains and gutters, referred to as runoff. Environmental conditions on each roof are monitored with an S-THB-M002 12-bit air temperature/relative humidity sensor, an S-LIB-M003 solar radiation sensor, an S-WCA-M003 wind speed sensor, an S-SMC-M005 EC-5 soil moisture sensor, and an S-RGB-M002 tipping bucket rain gauge. Runoff is measured with custom-built and calibrated in-drain V-notch weirs, which continuously measure the flow rate into the drain from watershed areas of 310 m² and 390 m² for W118 and USPS, respectively (Carson et al 2013). A full description of the instrumentation set-up, calibration and monitoring protocols is provided in Culligan et al (2014).

Monitoring data
The continuous runoff monitoring data are processed into discrete storm events using the common criterion that individual storms must be separated by a period of 6 h with no rainfall or, in this case, runoff (Washington State Department of Ecology 2008, Technology Acceptance and Reciprocity Partnership 2001). Once storms are discretized, events that are considered unsuitable for analysis are removed. An event is considered unsuitable if the peak runoff rate exceeds the limit of the monitoring device (Type 1), precipitation is in the form of snow or the air temperature is below freezing (Type 2), the cumulative rainfall is less than the runoff (Type 3), or power to the ultrasonic sensor is interrupted (Type 4). Further details on the instrumentation, parsing, and quality of the monitoring data are available in Carson et al (2013). The data record for W118 runs from 6/29/2011 to 4/7/2015, with 256 usable storms, while the data record for USPS runs from 6/17/2011 to 4/15/2015, with 247 usable storms. A summary of the storms separated by season, with winter (December-February), spring (March-May), summer (June-August), and fall (September-November), is provided in table 2. W118 had 55 unusable storms (20 Type 1, 10 Type 2, 24 Type 3, 1 Type 4) and USPS had 57 unusable storms (0 Type 1, 17 Type 2, 29 Type 3, 11 Type 4). The procedure for maintaining data quality results in a generally lower number of usable winter events, due to snow, freezing air temperature and cold weather related equipment failures.
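The event separation described above can be sketched as follows. This is a minimal illustration, not the processing code used in the study: the record layout `(timestamp, rain_mm, runoff_mm)` and the function names `split_storms`, `event_totals` and `is_usable` are hypothetical, and only the Type 3 suitability check (rainfall less than runoff) is shown.

```python
from datetime import datetime, timedelta


def split_storms(records, gap=timedelta(hours=6)):
    """Group (timestamp, rain_mm, runoff_mm) records into storm events.

    A new event starts when the time since the previous wet record
    (rain or runoff > 0) is at least `gap`.
    """
    events, current, last_wet = [], [], None
    for ts, rain, runoff in records:
        if rain <= 0 and runoff <= 0:
            continue  # dry record: belongs to no event
        if last_wet is not None and ts - last_wet >= gap:
            events.append(current)
            current = []
        current.append((ts, rain, runoff))
        last_wet = ts
    if current:
        events.append(current)
    return events


def event_totals(event):
    """Return (total rainfall, total runoff) in mm for one event."""
    return (sum(r for _, r, _ in event), sum(q for _, _, q in event))


def is_usable(event):
    """Type 3 check only: cumulative rainfall must not be less than runoff."""
    rain, runoff = event_totals(event)
    return rain >= runoff
```

Checks for Types 1, 2 and 4 would need the device limit, air temperature and sensor power status, which are not part of this simplified record layout.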
The storm event size, runoff depth, ADWP and ET0 are determined for each suitable event. Event size (Rf) is the total depth of rain per unit rooftop area and is determined from the tipping bucket data. Runoff (Ro) depth is the height of runoff generated per unit rooftop area per storm and is calculated using 5 min flow rate data from the weirs. The ADWP is taken to be the dry period leading up to the rain event as recorded by the tipping bucket, given in days. To capture the variation in the potential of the system to expel water, the potential or reference evapotranspiration (ET0) is calculated using the well-documented and simple to use Hargreaves and Samani equation (Hargreaves et al 1985):
$$ET_0 = 0.0023 \cdot R_A \cdot (T_C + 17.8) \cdot T_D^{0.5} \qquad (1)$$
where ET0 is the reference ET (mm d⁻¹), RA is extraterrestrial radiation (mm equivalent per day) calculated using the day of year and location latitude as described in Allen et al (1998), TC is average daily temperature (°C), and TD is the daily temperature range (°C). TC and TD are obtained from Belvedere Castle weather station in Central Park, NYC, which is maintained by the NOAA National Climatic Data Center (ncdc.noaa.gov).
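The ET0 calculation described above can be sketched in a few lines. The extraterrestrial radiation routine follows the standard day-of-year and latitude formulas of Allen et al (1998), with the usual 0.408 factor converting MJ m⁻² d⁻¹ to mm of equivalent evaporation. The function names and the clamp to zero for very cold days are assumptions made for this illustration, not part of the study.

```python
import math


def extraterrestrial_radiation_mm(day_of_year, latitude_deg):
    """Extraterrestrial radiation RA in mm of equivalent evaporation per day."""
    phi = math.radians(latitude_deg)
    # Inverse relative Earth-Sun distance and solar declination
    dr = 1 + 0.033 * math.cos(2 * math.pi * day_of_year / 365)
    delta = 0.409 * math.sin(2 * math.pi * day_of_year / 365 - 1.39)
    # Sunset hour angle
    ws = math.acos(-math.tan(phi) * math.tan(delta))
    gsc = 0.0820  # solar constant, MJ m-2 min-1
    ra_mj = (24 * 60 / math.pi) * gsc * dr * (
        ws * math.sin(phi) * math.sin(delta)
        + math.cos(phi) * math.cos(delta) * math.sin(ws)
    )
    return 0.408 * ra_mj  # MJ m-2 d-1 -> mm d-1


def hargreaves_et0(day_of_year, latitude_deg, t_mean_c, t_range_c):
    """Reference ET (mm d-1) from the Hargreaves and Samani equation."""
    ra = extraterrestrial_radiation_mm(day_of_year, latitude_deg)
    et0 = 0.0023 * ra * (t_mean_c + 17.8) * math.sqrt(t_range_c)
    return max(0.0, et0)  # assumption: negative values are clamped to zero
```

For a latitude near 40.8°N, a mid-July day with a mean temperature of 25 °C and a 9 °C daily range gives an ET0 of roughly 5 mm d⁻¹, while a typical January day gives well under 1 mm d⁻¹.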

Comparative modeling approach
The significance of Rf, ET0, and ADWP in predicting runoff depth, Ro, is explored using a comparative modeling approach where predicted Ro is compared with observed values of Ro for four different models.
The first (base) model is the empirical CRE, derived from observations between Rf and Ro for each green roof. CREs have been used by many investigators where C1, C2, and C3 are empirical fitting coefficients. For the second model, the CRE is modified to include Julian Day (JD), via: Julian Day embodies many of the climatic factors controlling seasonal runoff performance, and is thus a simple means of capturing green roof performance variability for a particular climate region. To determine f(JD), a genetic programming (GP) symbolic regression algorithm (Schmidt and Lipson 2009) is used. The GP algorithm generates a population of models and uses stochastic methods to 'evolve' models according to a set of rules (Koza 1992), resulting in equations that best fit the data (as defined by R2) at several levels of complexity. Similarly, the GP algorithm is used to determine the form of the model functions, f, and coefficients in equations (4)-(6).
For the third model, the CRE is modified to include a function of ET0: Equation (4) allows the modified CRE to reflect overall seasonal change, using ET0 as a surrogate for season that is more generalizable than JD.
For the fourth model, the CRE is modified to include ET0, ADWP, Rf, and their combinations, as described by: Equation (5) allows the modified CRE to reflect overall seasonal change, f(ET0), seasonal change with storm size, f(ET0, Rf), and seasonal change with the antecedent dry period, f(ET0, ADWP).
For the fifth model, a simplified reservoir equation (SRE) is evaluated, taking the form: where Kc is the crop coefficient, a unitless coefficient that accounts for a plant's ability to dispel water, and Smax is the green roof's maximum available water storage in mm.
In equation (6), Ro is predicted as Rf minus the available storage in the substrate, unless the available storage is larger than the rainfall, in which case no runoff is generated. The available storage, limited to Smax, is estimated using the product of the dry period before the storm (ADWP in days), the potential evapotranspiration (ET0 in mm d⁻¹), and the crop coefficient (Kc). For both of the roofs that are the subject of this study, Kc and Smax are determined by a best fit of predicted Ro to observed Ro.
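Read literally, the verbal description of the simplified reservoir equation corresponds to Ro = max(0, Rf − min(Smax, Kc · ET0 · ADWP)). The sketch below implements that reading. The function name `sre_runoff` is hypothetical, and the formula is a reconstruction from the prose rather than a copy of the published equation (6).

```python
def sre_runoff(rf_mm, adwp_days, et0_mm_per_day, kc, s_max_mm):
    """Runoff depth (mm) from a simplified reservoir model.

    Available storage is the water removed by evapotranspiration over the
    antecedent dry period, capped at the maximum storage s_max_mm.
    """
    available_storage = min(s_max_mm, kc * et0_mm_per_day * adwp_days)
    # No runoff is generated when storage exceeds the rainfall depth
    return max(0.0, rf_mm - available_storage)
```

With illustrative parameter values of Kc = 0.66 and Smax = 11.4 mm, a 20 mm storm after 3 dry days at ET0 = 4 mm d⁻¹ would yield about 12.1 mm of runoff, while a 5 mm storm after 10 dry days would be fully retained.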
The accuracy of each model is quantified using recommended statistics for hydrologic model performance; namely, the Nash-Sutcliffe efficiency index (NSE), the ratio of the root mean square error (RMSE) to the standard deviation of measured data (RSR), and percent bias (PBIAS) (Moriasi et al 2007). The NSE ranges from −∞ to 1, where 1 represents a perfect match and negative values indicate that the model is less accurate than the mean of the observed values (Nash and Sutcliffe 1970). RSR ranges from 0 to a large positive value, where a lower value represents a lower RMSE and better model performance. The magnitude of PBIAS represents how biased the model is, with a positive value representing over-prediction and a negative value representing under-prediction. According to Moriasi et al (2007), a watershed model is satisfactory if NSE > 0.5, RSR ≤ 0.7 and PBIAS is within ±25%.
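The three statistics can be computed directly from paired observed and predicted runoff depths, as in the sketch below. The function names are hypothetical. PBIAS is written here so that a positive value means over-prediction, matching the sign convention stated in the text; this is an assumption about the formula used, and some references define PBIAS with the opposite sign.

```python
import math


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, < 0 is worse than the mean."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1 - sse / sst


def rsr(obs, sim):
    """RMSE divided by the standard deviation of the observations."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return math.sqrt(sse) / math.sqrt(sst)  # the 1/n factors cancel


def pbias(obs, sim):
    """Percent bias; positive means over-prediction (assumed sign convention)."""
    return 100 * sum(s - o for o, s in zip(obs, sim)) / sum(obs)


def is_satisfactory(obs, sim):
    """Thresholds quoted from Moriasi et al (2007)."""
    return nse(obs, sim) > 0.5 and rsr(obs, sim) <= 0.7 and abs(pbias(obs, sim)) <= 25
```

Because RSR equals the square root of (1 − NSE) under these definitions, the first two thresholds are closely related; PBIAS adds information on systematic over- or under-prediction.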

Results and discussion
3.1. Observed seasonal behavior
Figures 1-4 present event EP plots, where the EP for each event is calculated as rank/(n+1), for runoff depth separated by season, ET0, ADWP and storm size, respectively. The general weather patterns during this study period were similar to a 40 year historical period, with most monthly averages for storm size, ADWP, and ET0 from the study period falling within the 1st and 3rd quartile of the historical period. An in-depth analysis and comparison of these factors during both the study period and a 40 year period spanning from 1971 to 2010 is available in appendix A.
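The exceedance probability calculation, EP = rank/(n+1), can be sketched as follows. The ranking direction is an assumption: rank 1 is given to the largest runoff depth, so that EP is the estimated probability of a given depth being exceeded. The function name is hypothetical and ties are not treated specially.

```python
def exceedance_probability(values):
    """Return (value, EP) pairs with EP = rank / (n + 1).

    Rank 1 is assigned to the largest value, so large runoff depths
    receive small exceedance probabilities.
    """
    n = len(values)
    ranked = sorted(values, reverse=True)
    return [(v, (i + 1) / (n + 1)) for i, v in enumerate(ranked)]


# Example: three events with runoff depths in mm
pairs = exceedance_probability([0.5, 3.0, 1.0])
```

Applying the function separately to the events in each season, ET0 group, or ADWP group gives the curves that are compared in an EP plot.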
As seen in figure 1, both roofs are more likely to generate higher stormwater runoff values for a given storm size in the winter, which agrees with the findings of Uhl and Schiedt (2008), Kaufmann (1999), Liesecke (2002), and Carson et al (2013). Additionally, the thinner W118 extensive green roof demonstrates greater seasonal variability in runoff performance than the deeper USPS roof, with lower runoff depths being much more likely to be exceeded on W118 in the winter than the summer. For example, runoff depths of 1 mm have an EP difference of 0.28 between summer and winter on W118, and an EP difference of 0.16 between summer and winter on USPS. The behavior of each season is statistically unique except for spring and summer for USPS, as demonstrated in the statistical analysis presented in appendix B. In terms of annual performance, the deeper USPS substrate layer is less likely to exceed nearly all runoff depths compared to W118.
The contribution to performance that would vary with season is presumed to be storage created by evapotranspiration, while the ability of the substrate to detain water is expected to be consistent year round (considering that days below freezing are removed from the data set). As USPS has a much deeper substrate, the generally higher retention and more consistent performance could be explained by the substrate's detention capacity having a greater contribution to stormwater retention than storage created by ET. Figure 2 shows the data separated by ET0 groupings of 0<ET0<2, 2<ET0<4, and ET0>4 mm d⁻¹, with the behavior of each group being statistically different (see appendix B). While these groupings roughly correspond to the winter, spring/fall, and summer trends, respectively, they more specifically define the environmental conditions surrounding the storm event. As seen, for both roofs, grouping by ET0 (figure 2) shows a clearer separation in hydrologic performance than grouping by season (figure 1). This is especially evident in USPS, where figure 2(B) shows distinctly separate ET0 group behavior over the full range of runoff depths, while figure 1(B) shows less distinct seasonal trends, with summer, spring and fall closely tracking one another. This increased distinction can be attributed to ET0 being more physically defined than season; specifically, the ET0>4 category isolates only the highest ET0 events of summer.
For extremely small runoff depths (0.01-0.1 mm), both the medium (2<ET0<4) and high (ET0>4) ET0 conditions exhibit the same probability of exceedance for W118. Extremely small runoff depths on W118 were associated with rainfall events that did not saturate the roof. Thus, it is possible that runoff during such events is generated by flow that occurs along preferential pathways in the thin W118 system, and that such flow is largely independent of antecedent moisture conditions when the roof is relatively dry. Nonetheless, it is important to note that there are relatively few runoff depth data points within the range 0.01-0.1 mm for W118, and that the smaller watershed area of W118, in comparison to USPS, makes the weir-based measurement system less accurate for extremely small runoff depths on W118. Hence, confirmation of this particular trend requires further data.
Figure 3 shows the seasonal behavior in runoff separated by short (<2 day) and long (>2 day) ADWPs. For both extensive green roofs, seasonal differences are more apparent with longer ADWPs. For example, the runoff behavior in the winter is statistically different from the other months only in the >2 day plots. Winter also has the most pronounced separation in the >2 day plots, with 1 mm runoff depths having an EP difference of 0.25 and 0.29 between summer and winter for W118 and USPS, respectively, while short ADWPs have corresponding differences of 0.18 and 0.01, respectively.
The greater seasonal variation with longer ADWPs can be explained by the greater storage created by ET given a longer dry period before a storm. As ET0 is highly variable with season, one would expect events with longer ADWPs to have more pronounced seasonal variation. While Poë et al (2015) found the seasonal impact on storage created by ET to be more apparent in events with shorter ADWPs, the results presented here suggest that the seasonal variation in runoff increases with ADWP. Additionally, the finding here differs from Wong and Jim (2014), who found no significant link between runoff and ADWP.
Figure 4 illustrates the seasonal percent rainfall retention for W118 and USPS separated by storm events smaller and larger than 10 mm. Both roofs retain 100% of the rainfall in most smaller storms (<10 mm) and show a range of behavior in larger storms (>10 mm). As in figures 1-3, figure 4 shows a more pronounced seasonal variation in W118, particularly for events larger than 10 mm. This agrees with Carson et al (2013), who found the seasonal impact to be strongest in storms from 10 to 20 mm in depth, and can be explained by the ability of the green roof to fully retain most small storms regardless of the seasonal climate. In contrast, USPS has no apparent difference in seasonal performance with storm size. This can potentially be explained by the greater contribution of substrate detention to stormwater capture, as discussed above. For events larger than 10 mm, all four seasons were statistically different for W118, compared to only fall being statistically different for USPS.
Table 3. Various equations used to model runoff (mm), where Rf is storm event size (mm), JD is Julian day, ET0 is potential evapotranspiration (mm d⁻¹), and ADWP is the antecedent dry weather period (days).

Modeling results
The fitting results of the genetic programming algorithm are provided in table 3 for the base CRE (2), the CRE modified to include Julian Day (3), the CRE modified to include ET0 (4), the CRE modified to include ET0, ADWP, and Rf (5), and the SRE (6), along with measures of accuracy.
All of the models' statistics fall well within the satisfactory levels set (NSE > 0.5, RSR ≤ 0.7 and PBIAS within ±25%) (Moriasi et al 2007); however, there are notable differences in their behavior and accuracy. For both roofs, the SREs have the least accurate performance, followed by the CREs. All modifications of the CRE increase accuracy, with the CRE modified to include Julian Day most accurate for W118 and the CRE modified to include ET0 and ADWP most accurate for USPS.
For both roofs, the CRE is able to produce an adequate prediction, shown below in figure 5. As runoff is only estimated using rainfall, the model performance appears as a line.
The CRE modified to incorporate Julian Day (3) has improved model statistics in W118 (R2 improved from 0.945 to 0.980) and in USPS (R2 improved from 0.866 to 0.868). In both cases, the seasonal variance takes the form of a sine curve with a period of approximately 1 year (351 days for W118 and 365.3 days for USPS) and the least runoff is generated around midsummer (July 17th for W118 and June 30th for USPS). The model, illustrated in figure 6, reveals the amplitude of the seasonal variance in W118 to be nearly four times that of USPS (sine coefficient of 5 for W118 and 1.3 for USPS).
When modified to include ET0 (4), the model performs with greater accuracy than the base CRE and is close to the Julian Day model. The model behavior is shown below in figure 7.
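The sinusoidal seasonal term reported for the Julian Day model can be visualized with a generic phase-shifted sinusoid. The sketch below is not the fitted equation from table 3, which is not reproduced here. It only uses the reported amplitudes, periods and dates of minimum runoff, and it assumes that the term is additive, that it reaches its minimum on the stated date, and that the dates map to days 198 (July 17) and 181 (June 30) of a non-leap year. All names are hypothetical.

```python
import math


def seasonal_term(julian_day, amplitude, period_days, day_of_min):
    """Sinusoidal runoff offset that is lowest on `day_of_min`."""
    phase = 2 * math.pi * (julian_day - day_of_min) / period_days
    return -amplitude * math.cos(phase)


# Reported values; the additive form and the units are assumptions
W118 = dict(amplitude=5.0, period_days=351.0, day_of_min=198)  # July 17
USPS = dict(amplitude=1.3, period_days=365.3, day_of_min=181)  # June 30

# Ratio of seasonal amplitudes between the thin and deep roofs
amplitude_ratio = W118["amplitude"] / USPS["amplitude"]
```

The amplitude ratio of about 3.8 corresponds to the "nearly four times" larger seasonal swing noted for W118.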
The CRE modified to include ET 0 and ADWP (5) performs better than using ET 0 alone and is on par with the Julian Day model. Similarly, the seasonal variance and runoff generated is found to be greater in W118, as shown in figure 8.
As a well-fitting and most generally applicable model, the behavior of equation (5) over the course of the year is further illustrated in figure 9.
All coefficients in equation (5) associated with the ET0 term are found to be negative, resulting in both roofs generating the least runoff when the potential evapotranspiration is highest, as seen in figure 9. Furthermore, the coefficients for all terms including ET0 are higher for W118, resulting in greater variation throughout the year. W118 is found to have a negative coefficient associated with the ET0 * Rf term, so that the seasonal variation in runoff grows with event size and shrinks for smaller storms. In contrast, USPS is found to have an insignificant ET0 * Rf term, so that the seasonal variation in runoff does not change with event size. This behavior confirms the observations in figure 4 and is evident in figure 9; the amplitude of seasonal change is reduced in smaller storms in W118 while remaining consistent with event size in USPS. Both roofs are found to have a negative coefficient associated with the ADWP * ET0 term, resulting in less runoff generated in events with longer antecedent dry periods. The impact of ADWP is found to be similar in both roofs, with coefficients of 0.110 and 0.106 for W118 and USPS, respectively. The 0, 3, and 10 day ADWP lines in figure 9 illustrate this impact on stormwater retention throughout the year. In colder months ADWP plays a less significant role, with the modeled difference between the 0 and 10 day ADWP smallest in January and February.
The SRE (figure 10), while being the most physically rational, has the worst model performance considering the NSE and RSR for both roofs, in addition to R2 and PBIAS for USPS. The fitted crop coefficients (Kc) are found to be 0.66 for W118 and 0.58 for USPS, which generally agree with previously reported crop coefficients as determined using an energy balance model (Kc of 0.52; Olivieri et al 2013), a weighing lysimeter (Kc of 0.53; Sherrard Jr and Jacobs 2012), and the FAO-24 method (Kc of 0.85-1.01; Voyde 2011). The fitted maximum storage depths (Smax) are found to be 11.4 mm for W118 and 17.1 mm for USPS, which, while sensible, are smaller than the Smax values reported for the roofs (17 mm for W118 and 52 mm for USPS). The inability of the SRE to accurately estimate individual storms is attributed to the difficulty of accurately parameterizing the water balance model. In particular, runoff can occur before the substrate is fully saturated, as discussed in section 3.1, but this behavior is not represented in the simple reservoir equation.

Conclusions
Using runoff observations from 503 storms across two extensive green roofs located in New York City, one 31 mm and one 100 mm deep, individual factors of ADWP, storm event size, and potential evapotranspiration (ET0) are examined with season. Modeling green roof runoff with these factors confirms several initial observations: stormwater retention was the greatest in warmer months, the deeper roof shows less seasonal variation, roofs retain more rainfall with longer ADWPs, and the thinner green roof system has greater seasonal variation with storm size. Predictive equations are developed for both roofs, with 98% and 87% of the variance in measured runoff accounted for in the thin and deep roof, respectively. This study has shown that full-scale extensive green roofs vary in their ability to retain stormwater throughout the year, and that including seasonal factors can improve runoff model accuracy. While this improvement is clear in the thinner green roof (W118), it is not as significant on the deeper system (USPS).
A limitation of this study is the lack of physical basis for the modified CRE and the inability of the climatic factors to fully represent hydrologic processes. While, in reality, the ET0 of the entire ADWP impacts the amount of storage created through evapotranspiration, only the ET0 of the event day is used here. Additionally, the plants' ability to transpire water (Kc) is expected to vary throughout the year, which is not accounted for here. Finally, ADWP does not take into account the extent to which the previous storm saturated the substrate, which would impact the amount of available storage and runoff performance accordingly. Further study and modeling work should extend to include these considerations in addition to variables of climate, slope, and plant type.

Acknowledgments
Carson and Daniel Marasco gratefully acknowledge the support of the NSF Integrative Graduate Education and Research Training (IGERT) Fellowship #DGE-0903597. Furthermore, Robert Elliott gratefully acknowledges the support of the NSF Graduate Research Fellowship Program #DGE-11-44155. Any opinions, findings, and conclusions expressed in this paper are those of the author and not meant to represent the views of any supporting institution.

Appendix A. Historical context of monitoring period
In order to better understand the context of the climate conditions under which this study occurred, the data from the study period are compared to historical data for storm size, antecedent dry weather period (ADWP), and potential evapotranspiration (ET0). The historical data were obtained from the Belvedere Castle weather station in Central Park, NYC, for a 40 year period spanning from 1971 to 2010. The occurrence of each variable was observed by month, using box plots to compare the median values observed in the study period to the minimum, first quartile, median, third quartile, and maximum values observed in the historical period. Figure A1 presents storm size, ADWP, and ET0 data in panels (A), (B), and (C), respectively, with boxplots showing the historical range for the years 1971-2010, and points showing the median values for the study period. For all climatic factors, the observed study period medians generally fall within the range of the 1st and 3rd quartile of historical data, which is considered acceptable for the purpose of generalizing some of the results of this study.
As seen in figure A1(A), there is not a strong correlation between storm size and month of the year in NYC, although the median storm size is slightly lower in the spring and summer months than at other times of the year. An exceptional deviation in storm size between monitoring period trends and the historical data is noted in February. This deviation is explained by the event suitability criteria used for the monitoring period, which involved ignoring storms with snow or freezing air conditions. Figure A1(B) reveals the typical antecedent dry weather period to be shorter in the spring and summer months for both the historical data and the study period. The median ADWP for the study period is greater than the historical median for every month except July. This bias is likely due to the methods used to separate storms; the historical data required a 6 h inter-event period with no rainfall, while the study data required a 6 h inter-event period with no rainfall or runoff, effectively reducing the number of very low ADWP events in the study period dataset. Figure A1(C) shows the potential evapotranspiration to vary greatly with season, with the warmer months having the greatest ET0. For most months the study period median falls close to the historical monthly median. The abnormally low ET0 in the study period for February and July can be explained by the deviations in temperature. While the average temperatures are close for the two data sets, the daily temperature fluctuations are typically larger in the historical period, with average fluctuations of 7.5°C (historical) and 6.4°C (study) in January and average fluctuations of 9.0°C (historical) and 8.3°C (study) in July.
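The comparison of study period medians against the historical interquartile range can be sketched with the Python standard library. The function name `median_within_iqr` and the choice of the inclusive quartile method are assumptions made for this illustration; the quartile method behind the box plots is not stated in the text.

```python
import statistics


def median_within_iqr(historical, study):
    """True if the study-period median lies between the historical Q1 and Q3."""
    # quantiles(..., n=4) returns the three quartile cut points
    q1, _, q3 = statistics.quantiles(historical, n=4, method="inclusive")
    return q1 <= statistics.median(study) <= q3


# Made-up values for one month (e.g. storm sizes in mm)
historical_values = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(median_within_iqr(historical_values, [4, 5, 6]))  # within the IQR
print(median_within_iqr(historical_values, [10, 12]))   # outside the IQR
```

Running the check month by month for storm size, ADWP and ET0 would reproduce the kind of comparison summarized in figure A1.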

Appendix B. Statistical analysis

B1. Event EP plots
To test the uniqueness of the trends within each event EP plot, multiple analyses of covariance (ANCOVA) were run. The ANCOVA was set up using the probability of exceedance as the dependent variable, the runoff depth or ET0 as the covariate, and the season or ET0 grouping, respectively, as the categories being compared.
To meet the assumptions of linearity and normal distribution, the runoff depth data were truncated and log transformed. The ANCOVA was then run with the level of significance set at α = 0.05. The corresponding plots, data range, and ANCOVA results are provided in the summary tables below. Table B1 shows that, when the W118 data are grouped by season, the runoff behavior for each category is statistically different. Furthermore, when the same analysis is applied to USPS, fall and winter are unique, while spring and summer are not statistically different. Table B2 shows that when the data are grouped by ET0 ranges, the runoff behavior for each category is statistically different for both W118 and USPS. Table B3 shows the difference in seasonal groups when separated into long and short ADWPs. For W118, winter and spring are not statistically different for the shorter dry weather period (<2 days), while all seasons are statistically different for the longer dry weather period (>2 days). For USPS, winter and summer are not statistically different for the shorter dry weather period, while only winter is shown to be statistically different for the longer dry weather period. Table B4 shows the difference in seasonal groups when separated by storm size. For W118, only winter is statistically different for the smaller storms, while all seasons are statistically different for the larger storms. In contrast, for USPS only summer is statistically different for the smaller storms, while only fall is statistically different for the larger storms.

B2. Runoff-rainfall plots for W118 and USPS
To determine if the two roofs have distinct runoff behavior, an ANCOVA was performed with each roof as a categorical input and rainfall as the covariate. Both the runoff and rainfall were log-transformed to satisfy the assumption of normal distribution. Additionally, only points with runoff greater than 2 mm were used to meet the assumptions of linearity and similar slopes. The results of the ANCOVA revealed that the two roofs have significantly different behavior, with a p-value of 4.95E-05, which is significant at the 95% confidence level (α = 0.05).
Figure A1. Storm size (A), ADWP (B) and potential evapotranspiration (C) distribution for the study period compared to historical records.
Table B1. Summary of ANCOVA set-up and results for the event exceedance probability for runoff depth separated by winter (December-February), spring (March-May), summer (June-August) and fall (September-November) (α = 0.05).
Table B2. Summary of ANCOVA set-up and results for the event exceedance probability for runoff depth separated by ET0 (0-2, 2-4, 4+ mm d⁻¹) (α = 0.05).
Table B3. Summary of ANCOVA set-up and results for the event exceedance probability for runoff depth in events with different dry weather periods and separated by season (winter (December-February), spring (March-May), summer (June-August) and fall (September-November)) for W118 and USPS events with short (<2 day) and long (>2 day) antecedent dry weather periods (ADWPs) (α = 0.05).
Table B4. Summary of ANCOVA set-up and results for the event exceedance probability for % rainfall retention separated by season (winter (December-February), spring (March-May), summer (June-August) and fall (September-November)) for W118 and USPS in small (<10 mm) (A1 and B1) and large (>10 mm) (A2 and B2) storm event sizes (α = 0.05).