A novel application of remote sensing for modelling impacts of tree shading on water quality

Uncertainty in capturing the effects of riparian tree shade for assessment of algal growth rates and water temperature hinders the predictive capability of models applied for river basin management. Using photogrammetry-derived tree canopy data, we quantified hourly shade along the River Thames (UK) and used it to estimate the reduction in the amount of direct radiation reaching the water surface. In addition we tested the suitability of freely-available LIDAR data to map ground elevation. Following removal of buildings and objects other than trees from the LIDAR dataset, results revealed considerable differences between photogrammetry- and LIDAR-derived methods in variables including mean canopy height (10.5 m and 4.0 m respectively), percentage occupancy of riparian zones by trees (45% and 16% respectively) and mid-summer fractional penetration of direct radiation (65% and 76% respectively). The generated data on daily direct radiation for 2010 were used as input to a river network water quality model (QUESTOR). Impacts of tree shading were assessed in terms of upper quartile levels, revealing substantial differences in indicators such as biochemical oxygen demand (BOD) (1.58–2.19 mg L⁻¹ respectively) and water temperature (20.1 and 21.2°C respectively) between ‘shaded’ and ‘non-shaded’ radiation inputs. Whilst the differences in canopy height and extent derived by the two methods are appreciable they only make small differences to water quality in the Thames. However such differences may prove more critical in smaller rivers. We highlight the importance of accurate estimation of shading in water quality modelling and recommend use of high resolution remotely sensed spatial data to characterise riparian canopies. Our paper illustrates how it is now possible to make better reach scale estimates of shade and make aggregations of these for use at river basin scale. This will allow provision of more effective guidance for riparian management programmes than currently possible. This is important to support adaptation to future warming and maintenance of water quality standards.


Introduction
The influence that riparian vegetation exerts on river water temperatures and light availability by intercepting incoming solar radiation has long been studied (Davies-Colley and Rutherford, 2005; Greenberg et al., 2012; Moore et al., 2005; Webb et al., 2008). Shading is a key parameter because it controls the amount of direct radiation reaching the river surface, making it an important consideration in water quality modelling and management. Solar radiation has direct effects on rates of primary production of both macrophytes and algae (Bowes et al., 2012b; Wood et al., 2012), which is important for river metabolic regime and is known to be influenced by riparian shade (Bernhardt et al., 2017). Water temperature also directly influences river fauna and dissolved oxygen concentrations. Therefore considerations of shading are of growing importance given the increasing stress on the water environment likely to arise under future climate. Effective and realistic riparian planting schemes to mitigate against these unwanted effects will become increasingly valuable and enhance water ecosystem services (Martin-Ortega et al., 2015). They will provide alternatives to traditional end-of-pipe solutions arising primarily from the EU Urban Wastewater Treatment Directive, which have been assessed through modelling (e.g. at large basin scale across Europe: Grizzetti et al., 2011).
[...] and tree height) and landscape characteristics (e.g. orientation, hill shade and channel width) (Li et al., 2012). Temporal effects are related to seasonal variation of canopy structure and sun position both during the day and over the year. Traditional methods for estimating reach-average shade have relied on location- and time-specific field measurements (e.g. hemispherical photography or clinometer) taken at a small number of points along a river stretch. These points may not be representative, and extrapolating the results to wider areas requires onerous manual surveying and computation (Davies-Colley and Rutherford, 2005; Ghermandi et al., 2009). As a result, these methods fail to capture the spatio-temporal heterogeneity of shade and introduce high uncertainty and bias in the estimates of shade. Without access to extensive remotely captured data, Chen et al. (1998b) identified that spatio-temporal variation of riparian shade could not be represented adequately for the purposes of simulating stream temperature for two main reasons: (i) lack of access to basin-wide riparian information (i.e. data that captured the vegetation characteristics in the basin); and (ii) limited ability to compute dynamic shading (i.e. algorithms to account for the geometric relationships between the diurnal arc of the sun, stream latitude, location and orientation, and the height and extent of all vegetation objects). Therefore, there is a need for simple but quantitative methods for measuring riparian vegetation shading along stream reaches that address both of these limitations (Bode et al., 2014; Chen et al., 1998a, 1998b; Greenberg et al., 2012; Li et al., 2012).
An alternative to measuring riparian shade would be to estimate radiation with GIS-based solar models and so develop daily time series of incoming solar radiation. However, this is not a practical solution because GIS models cannot account for highly chaotic and poorly understood atmospheric conditions and processes, and so require observed data either to parametrize the model or to correct the output (Ruiz-Arias et al., 2009). Instead, for purposes of providing inputs for water quality models it is more effective to pursue efforts to model shade as a means of correcting observed radiation (Loicq et al., 2018; Wawrzyniak et al., 2017). Recent technological developments in data acquisition, greater capacity to handle large amounts of data and the development and widespread use of GIS have created the opportunity to simulate the spatio-temporal variation of this important environmental parameter.
The challenge of obtaining information about spatio-temporal heterogeneity of shading was initially overcome using infrared aerial photographs and GIS technology to develop a model to simulate stream temperature at a catchment scale (Chen et al., 1998a, 1998b). Their model calculated the shadow cast on the water surface by riparian vegetation and topography every hour based on latitude, stream orientation and tree height. Nevertheless the resolution of the aerial photographs that were used was relatively coarse (1:40,000 infrared aerial photography) compared to the resolution of remotely captured data available nowadays (e.g. LIDAR) and their method of data capture relied on manual digitisation and transfer (Chen et al., 1998a). More recently, LIDAR data have been used in conjunction with GIS-based solar models to estimate the effect of vegetation-cast shade on incoming solar radiation in the US (Bode et al., 2014; Greenberg et al., 2012), in the UK (Johnson and Wilby, 2015) and in France (Wawrzyniak et al., 2017; Loicq et al., 2018). In this way, variation across large areas of landscape and on river water surface is captured at a high spatial and temporal resolution. These studies demonstrate the utility of the growing pool of LIDAR data to characterise vegetation cover (Anderson et al., 2006; Seavy et al., 2009; Slatton et al., 2007; Greenberg et al., 2012; Bode et al., 2014; Loicq et al., 2018) for a variety of ecological and forestry studies. When coupled with GIS tools, the capability of LIDAR data to capture the canopy structure offers great potential to provide the radiation inputs required of water quality models.
The last few decades have seen an increase in LIDAR surveys commissioned for the production of terrain models. Such surveys are carried out during winter to minimise the interference of vegetation on the ground signal; the resulting data are referred to as leaf-off LIDAR. This has led many authors to assess the fitness of leaf-off LIDAR data to capture vegetation structure for ecology and forestry studies (Brubaker et al., 2014; Gopalakrishnan et al., 2015; Parent and Volin, 2014; Tompalski et al., 2017; Wasser et al., 2013). Outcomes have been generally favourable but complications exist. Leaf-off LIDAR may misrepresent the canopy characteristics, and in addition the data tend to include any other objects on the ground at the time of capture. Consequently strong biases may be introduced, yielding an incorrect representation of the actual canopy structure.

Use of shade estimates in river eutrophication studies
Recent eutrophication research has identified light limitation, as induced by riparian shade, to be a very important moderator on the development of river algal blooms (Bowes et al., 2012a; Hardenbicker et al., 2014; Waylett et al., 2013). Establishing riparian vegetation has been suggested as a more cost-effective means of preventing undesirable eutrophication impacts than reducing nutrient loads (Bowes et al., 2012a; Hutchins et al., 2010). However, for the purposes of managing eutrophication, establishment of riparian shade has traditionally been considered very much of secondary importance to the mitigation of nutrient inputs, and so modelling approaches to account for shade have tended to be rudimentary at best. For example, estimates of tree height and river width taken from phenology studies and aerial photography respectively (Halliday et al., 2016; Waylett et al., 2013) have been used to estimate height to width ratios used for calculating fractional penetration of radiation (Davies-Colley and Rutherford, 2005; DeWalle, 2008, 2010). The ratio, typically applied as a static value of the amount of shade, has been used in conjunction with estimates of occupancy based on satellite imagery or other land cover mapping products (Waylett et al., 2013). This pragmatic approach to estimating shade does not involve any additional computation, but this is achieved at the expense of accuracy: ignoring spatial and temporal heterogeneity introduces high uncertainty, which limits the potential of the approach to provide management solutions.

Aims and objectives
The aim of this paper is to present and test a pragmatic method for reducing uncertainty in shade estimates to improve the utility of water quality modelling to inform management decisions. Developing such an approach is potentially very powerful as it circumvents the need for detailed field surveying of shade. The method quantifies average stream shading from nearby vegetation using high-resolution remotely-acquired data. The approach extends analysis to water quality beyond temperature simulation alone. The method was also tested on the River Thames as it has high levels of gross primary productivity due to long residence times and its water quality is known to be sensitive to shading (Bowes et al., 2012b). The specific objectives of our study are to: 1) assess how well two high resolution elevation data products characterise riparian vegetation; 2) produce daily shade maps using those two datasets (section 3.2); 3) evaluate the consequences of using these two products for water quality modelling using the QUESTOR model on the River Thames (Waylett et al., 2013), comparing these outcomes to those previously generated using the method of Waylett et al. (2013) and to an application which disregarded shading influence (section 3.3).
The study is novel in making these comparisons between elevation data products and assessing their impacts in water quality modelling scenarios.

Methodology
A detailed description of the methodology developed is provided in the supplementary information document and is summarised here:

Area of study
The Thames River Basin (Fig. 1) is situated in the south east of the United Kingdom and covers an area of 9948 km² to its tidal limit at Kingston-upon-Thames (Marsh and Hannaford, 2008). It consists of a mixture of rural areas, primarily grassland, arable, and woodland in the west and south of the region, and urban areas (7%), dominated by Greater London but also including numerous other towns and cities. Woodland, predominantly broadleaved, comprises 16% of the basin area. The basin is underlain by two major aquifers, the Chalk and the Oolitic Limestones, which provide the majority of public water supply (Bloomfield et al., 2011). The River Thames, the principal water course, has a freshwater extent of 257 km and a mean flow of ca. 78 m³ s⁻¹ at the lowest gauge in the basin at Kingston-upon-Thames; the mean annual rainfall is ∼750 mm (Marsh and Hannaford, 2008). Recent data have shown that despite major reductions in phosphorus concentrations since the late 1990s (Bowes et al., 2012b), the River Thames still suffers from accelerated phytoplankton growth, particularly in the lower reaches, and it has been suggested that light may be a major limiting factor in this freshwater ecosystem. Typically, nutrient concentrations are high; in recent years they have always exceeded 1.4 and 0.09 mg L⁻¹ nitrate-N and phosphorus respectively at Wallingford, for example (Bowes et al., 2012a).

Input datasets description
By processing, merging and re-sampling the data from all available surveys to give the best possible coverage, using the most recent data for areas flown in more than one survey, the Environment Agency (EA) have created [...]. National Tree Map™ (NTM) (Bluesky International Ltd, 2012) is a spatial database of the location, height and canopy extent of every tree of height 3 m or greater in England and Wales. NTM consists of three layers: one displaying the location of the highest point of each tree, and two polygon layers displaying the tree crown, both as captured and idealised as a circle (Fig. 1b).
Other datasets used in the analysis include:
• A polygon defining the Thames river surface, from the Thames headwaters to the tidal limit (at Kingston-upon-Thames), extracted from the OS MasterMap® (MM) Topography Layer (OS MasterMap Topography Layer, 2015).
• Hourly global radiation and daily sunshine duration observations from Little Rissington weather station (near Cheltenham, in Gloucestershire), spanning 2010-2014, downloaded from BADC (Met Office, 2006a, 2006b). This station, part of the Met Office synoptic network, was selected due to its proximity to the River Thames. It was assumed to be representative of the riparian area analysed.

Riparian shading analysis
The processing to create the canopy surfaces, shade maps and subsequent zonal statistics was performed using the ESRI site-package ArcPy, with all tasks automated in Python 2.7. The meteorological data were processed using the Python data analysis library pandas. The statistical analysis was performed using the Python library Statsmodels.

Definition of riparian zone
The effectiveness of the riparian vegetation to shade streams depends on buffer width, canopy cover, height and density (Brazier and Brown, 1973; Steinblums et al., 1984; DeWalle, 2010). A literature review focusing on utility of riparian buffers for protection of fisheries and wildlife habitats of the Pacific Northwest region (US) (Christensen, 2000) found that riparian buffer widths ranging between 11 m and 46 m provide between 60 and 100% of shading. Therefore, we have defined the riparian zone as the 50 m area extending on each side of the river.

Canopy surface models of the Thames riparian zone
Three Canopy Surface Models (CSM) of the Thames riparian zone, all of 1 m resolution, have been used for the double purpose of calculating the amount of shade cast by riparian vegetation and then reflecting on the suitability of the two raw data sets (i.e. EA LIDAR elevation data and NTM) for this type of analysis.
(i) EA LIDAR DSM.
(ii) NTM CSM, made by triangulating the tree high points and actual crown polygons into a canopy height model (CHM) and adding this to the EA LIDAR DTM.
(iii) EA LIDAR DSM under NTM canopy (LIDAR UNTMC), made by extracting the EA LIDAR DSM values overlapping the NTM actual crown polygons.
The CSMs were processed using ArcGIS spatial tools (see supplementary information document for details). Supplementary Fig. 2 shows samples of the three CSMs for a small selected area.

Creation of riparian shade maps
Using each of the three CSMs as input to the ArcGIS "Area Solar Radiation" tool (Fu and Rich, 1999), three sets of daily "duration of direct radiation" maps at 10-day intervals for the period between 11 February and 20 June were created. Those grids were used to estimate the daily number of hours the river surface was in shade (as the complement of the duration of direct radiation).

Estimation of daily percentage of shading for the Thames
Using the ArcGIS "Zonal Statistics as Table" tool, the "hours of direct radiation" values at the river surface (i.e. all cells of the daily shade map under the river polygon) were aggregated into a daily average for the River Thames. The daily maximum number of hours of direct radiation was also extracted (assumed equivalent to the "daylight length"). The daily hours of shade on the river surface were calculated by subtracting the average "hours of direct radiation" from the "daylight length". The "hours of direct radiation" as a proportion of "daylight length" is defined as the "fractional penetration (fp)". These values were then expressed as a percentage of the daily budget. After the summer solstice (Julian Day 172), i.e. in summer and autumn, the day-to-day relative position and orientation of the sun and earth follow the late winter-spring pattern in reverse. Hence the same surfaces have been used for the two sub-periods (pre- and post-summer solstice), extending the temporal coverage of the study. Finally, a record of daily shade for the Thames was created by incremental linear interpolation between the values modelled at 10-day intervals.
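The bookkeeping described in this section can be sketched in a few lines of Python. This is a minimal illustration rather than the ArcPy workflow used in the study: the function names and the example fp values are hypothetical, and the zonal averages are assumed to have already been extracted for each modelled day.

```python
def fractional_penetration(mean_direct_hours, daylight_hours):
    """Hours of direct radiation as a proportion of daylight length (fp)."""
    if daylight_hours <= 0:
        raise ValueError("daylight_hours must be positive")
    return mean_direct_hours / daylight_hours


def shade_hours(mean_direct_hours, daylight_hours):
    """Daily hours of shade: daylight length minus mean hours of direct radiation."""
    return daylight_hours - mean_direct_hours


def mirror_about_solstice(modelled, solstice=172):
    """Reuse each pre-solstice value for the matching post-solstice Julian day."""
    mirrored = dict(modelled)
    for day, value in modelled.items():
        if day < solstice:
            mirrored[2 * solstice - day] = value
    return mirrored


def interpolate_daily(modelled):
    """Linearly interpolate between modelled days to give one value per day."""
    days = sorted(modelled)
    daily = {}
    for d0, d1 in zip(days, days[1:]):
        for d in range(d0, d1):
            w = (d - d0) / (d1 - d0)
            daily[d] = (1 - w) * modelled[d0] + w * modelled[d1]
    daily[days[-1]] = modelled[days[-1]]
    return daily


# Hypothetical fp values at 10-day intervals (illustrative, not results of the study)
fp_modelled = {152: 0.70, 162: 0.68, 172: 0.66}
fp_daily = interpolate_daily(mirror_about_solstice(fp_modelled))
```

Here `fp_daily` holds one value for every Julian day from 152 to 192, with the post-solstice half mirroring the pre-solstice half.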
This analysis was undertaken for each of the three CSMs. It assumes that the CSMs are light-tight and the ground under the trees is in total shade. In reality, light would penetrate through the canopy depending on its structure and season, so these riparian shade estimates represent a 'best-case' scenario (Greenberg et al., 2012;Johnson and Wilby, 2015).

Estimation of total daily radiation reaching the Thames surface
Hourly global radiation observations (kJ) were converted to watts and summed to daily values. On the assumption that cloud cover is the dominant factor in determining the diffuse fraction (Muneer and Munawwar, 2006), the daily global radiation was disaggregated into direct and diffuse using a cloud cover factor (Robinson et al., 2017). To take account of the riparian shading effects of each of the three CSMs, for each day, the daily fraction of hours of direct radiation reaching the water surface (spatially averaged as calculated in Sec. 2.3.4) was multiplied by the total daily direct radiation. The daily amount of diffuse radiation was then added to the 'corrected' direct radiation to generate the radiation input to QUESTOR (Section 2.4).
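The shade correction amounts to scaling only the direct component by fp before recombining it with the diffuse component. The sketch below illustrates that arithmetic under stated assumptions: the diffuse fraction is taken as a given input (the cloud cover factor of Robinson et al. (2017) is not reproduced here), the hourly observations are assumed to be totals per square metre, and all function names are hypothetical.

```python
def kj_m2_per_hour_to_w_m2(hourly_kj_m2):
    """Convert an hourly total (kJ m-2) to a mean flux over that hour (W m-2)."""
    return hourly_kj_m2 * 1000.0 / 3600.0


def shade_corrected_radiation(daily_global, diffuse_fraction, fp):
    """Daily radiation reaching the water surface after riparian shading.

    daily_global     : daily global radiation (any consistent unit)
    diffuse_fraction : share of global radiation that is diffuse (0-1); an
                       assumed input standing in for the cloud cover factor
    fp               : fractional penetration of direct radiation (0-1)
    """
    if not 0.0 <= diffuse_fraction <= 1.0:
        raise ValueError("diffuse_fraction must lie between 0 and 1")
    if not 0.0 <= fp <= 1.0:
        raise ValueError("fp must lie between 0 and 1")
    direct = daily_global * (1.0 - diffuse_fraction)
    diffuse = daily_global * diffuse_fraction
    # Only the direct beam is reduced by shade; diffuse radiation is added back in full
    return fp * direct + diffuse


# Illustrative values only: 20000 kJ m-2 per day, 40% diffuse, fp = 0.655
corrected = shade_corrected_radiation(20000.0, 0.4, 0.655)
```

With fp = 1 (no shade) the function returns the observed global radiation unchanged, which corresponds to the un-shaded model application.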

Canopy occupancy and tree height and area
The percentage of occupancy for a 20 m riparian buffer and mean tree height of each CHM for the whole river were estimated using the ArcGIS "Zonal Statistics" tool. The canopy occupancy estimates included the overhang portion of the canopy (i.e. portion of the CHM overlapping the river surface polygon). In addition, to gain better insight into the distribution of individual tree height and area in both the NTM CHM and the LIDAR UNTMC CHM, the extent and height of each individual tree in these two CHMs were extracted, using the NTM actual crown polygons as the tree definition. Values for the LIDAR CHM were not reported, as statistics on tree height taken from a CHM that includes objects other than trees would be misleading.
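As a rough illustration of what the zonal summary computes, the following sketch derives occupancy and mean canopy height from a small gridded CHM and a buffer mask. It is not the ArcGIS tool: the grid values are invented, cells without canopy are assumed to hold zero, and the mean height is assumed to be taken over canopy cells only.

```python
def canopy_stats(chm, in_buffer, min_height=0.0):
    """Return (percentage occupancy, mean canopy height) inside a buffer.

    chm       : 2-D list of canopy heights in metres (0 where there is no canopy)
    in_buffer : 2-D list of booleans marking cells inside the riparian buffer
    """
    total_cells = 0
    canopy_heights = []
    for chm_row, mask_row in zip(chm, in_buffer):
        for height, inside in zip(chm_row, mask_row):
            if not inside:
                continue
            total_cells += 1
            if height > min_height:
                canopy_heights.append(height)
    if total_cells == 0:
        raise ValueError("buffer mask selects no cells")
    occupancy = 100.0 * len(canopy_heights) / total_cells
    mean_height = (
        sum(canopy_heights) / len(canopy_heights) if canopy_heights else 0.0
    )
    return occupancy, mean_height


# Invented 2 x 4 grid at 1 m resolution; the last cell lies outside the buffer
chm = [[0.0, 12.0, 8.0, 0.0],
       [0.0, 0.0, 10.0, 5.0]]
mask = [[True, True, True, True],
        [True, True, True, False]]
occupancy, mean_height = canopy_stats(chm, mask)
```

In this toy grid three of the seven buffer cells carry canopy, so occupancy is about 43% and the mean canopy height is 10 m.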

Impact of shading on water quality
To quantify the consequences of different daily estimates of riparian shade, as generated using the three CSMs described above, a model of river eutrophication was applied for 2010 for the stretch of the Thames downstream as far as Wallingford (Fig. 1). The model, QUESTOR, and its application to the River Thames are described elsewhere (Waylett et al., 2013). It requires as input a daily dataset of global radiation incident at the water surface. The model calculates the effects radiation has on water temperature, phytoplankton biomass (chlorophyll concentration), nutrients and dissolved oxygen concentrations at a daily time-step. A set of equations describing the relationships between radiation and these parameters of water quality is provided in the supplementary information document. Further explanation of how the model simulates light attenuation in the water column is given elsewhere (Hutchins, 2012). To assess the sensitivity of these river quality factors in response to varying levels of incident radiation due to the differing effect of the riparian shade cast by each CSM (Sec. 2.3.5), the QUESTOR model was run three times, each using a different global radiation time series. A further application was made assuming riparian tree shading to be completely absent. Results from an additional run are reported based on a typical pragmatic approach described in the Introduction, as adopted by Waylett et al. (2013) for the Thames. This approach assumed 20 m tree height and equal incidence of N-S and E-W trending river channel. The applications represent 27% tree occupancy as a "best" estimate used by Waylett et al. (2013). The rest of the parameters were the same for each run, each of which lasted 261 days commencing on 11th Feb 2010.
In order to assess the statistical significance of the effect of shade on water quality model outputs, differences between the un-shaded and shaded simulations were quantified and ordinary least-squares regression modelling undertaken (see supplementary for more information).
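The regression step can be illustrated with a simple one-predictor ordinary least-squares fit. The study used Statsmodels and the details are in the supplementary information, so the pure-Python function below is only a stand-in; the variable names and the data values are hypothetical.

```python
def ols_fit(x, y):
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return slope, intercept, r_squared


# Hypothetical daily differences (un-shaded minus shaded): radiation vs water temperature
radiation_diff = [0.0, 1.0, 2.0, 3.0]
temperature_diff = [0.1, 0.3, 0.5, 0.7]
slope, intercept, r_squared = ols_fit(radiation_diff, temperature_diff)
```

A significance test on the slope, as reported in the study, would additionally need the standard error of the slope, which a library such as Statsmodels provides directly.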

Results
The canopy characteristics and amount of river shading along the stretch between Hannington and Wallingford where the water quality impacts are considered (Table 1) are similar to those calculated for the entire river (Supplementary Table 1). The results of the data suitability analysis for the entire length of the Thames reveal differences arising from the three CSMs. A comparison of the impacts of these three canopy representations (Applications 1-3) with that arising from a hypothetical situation where a tree canopy is absent (Application 4) is made.
Shading is slightly more effective in the upper stretch of the River Thames. For the entire length fractional penetration is higher by 0.03, 0.001 and 0.015 for applications 1-3 respectively. Mean tree heights for both the NTM and LIDAR CHMs are slightly lower in the upper stretch of the Thames (by 0.46 and 0.38 m respectively). However, the mean tree height for the LIDAR UNTMC CHM is 1.2 m higher for this stretch than it is for the entire river.

Canopy height and occupancy estimates
Mean tree height, canopy area and canopy occupancy percentage yielded by each CHM for the total length of the Thames riparian zone are shown in Supplementary Table 1. Additionally, to gain an insight into both the NTM and LIDAR UNTMC canopy structure, individual tree height and canopy extent yielded by both CHMs were displayed and compared using boxplots (Fig. 2). The tree height of the NTM polygons has been plotted (NTM tree height, Fig. 2a) alongside the tree height yielded by the NTM CHM; the boxplots show that the NTM CHM elevation is very close to the elevation of the raw data used to generate it. The plots showed that tree height in both CHMs varied between 2 and 40 m; however, while the inter-quartile range of the height was between 6 and 15 m for NTM, it was between 4 and 10.5 m for LIDAR UNTMC (Fig. 2a). The median for NTM, just above 10.22 m, is close to the mean height of 10.55 m (Supplementary Table 1). The median of the LIDAR UNTMC height estimates, though, is 2.4 m higher than the mean height.
For the vast majority of the two distributions (i.e. to the upper whiskers), the NTM CHM yielded larger tree canopy area with values spreading over a much wider range (Fig. 2b). This is consistent with the percentage of occupancies estimated for the total length of the Thames riparian zone (Supplementary Table 1). It is also an indication of the 'thinness' of the LIDAR UNTMC CHM due to the lack of leaves.

Fractional penetration estimated from each CSM
As expected (Table 1; see Supplementary Fig. 3), NTM provides the most effective canopy: a midsummer fp value of 0.655 corresponds to 5.6 h of shade. Given the large differences in mean tree height and canopy occupancy the differences in fp between the NTM and the two LIDAR derived CHMs appear relatively small (Table 1). Visual inspection of the daily shade maps (Fig. 5) showed that the spatial distribution of shade along the Thames is variable, with sections of the river passing through areas of sparse, low (probably shrub) riparian vegetation (e.g. Fig. 5 b). These riparian areas provide almost no daily hours of shade.

Water quality modelling
The sensitivity of water temperature, phytoplankton biomass, nutrient and dissolved oxygen concentrations in response to varying the levels of incident radiation (i.e. to mimic the effect of using different shading estimates) was demonstrated by QUESTOR outputs which were calculated for the Thames at Wallingford (Table 1). Differences in chlorophyll, both in terms of the 90th percentile (levels indicative of elevated summer levels during blooms) and the number of days exceeding a trigger level indicative of accelerated eutrophic growth of 30 μg L⁻¹ (Hutchins et al., 2010), are not large. In contrast the methods reveal bigger differences in the upper quartile (typical summer) levels of BOD and water temperature.
Despite tree coverage along the Thames being fairly sparse and the river channel being wide, incorporating effects of shade has an impact on the water quality. This is apparent when comparing the hypothetical absence of a tree canopy (Application number 4) with the results derived using NTM and LIDAR (Application numbers 1-3). Specifically, the consequence of considering shade is a reduction of up to 28% and 12% in BOD upper quartile and chlorophyll 90th percentile respectively, up to 15% fewer days of elevated chlorophyll and up to 1.1°C reduction in upper quartile temperature. The differences are substantial, in particular in terms of the BOD criterion.
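The headline reductions can be checked against the upper quartile values quoted in the abstract. The short sketch below assumes that 1.58 mg L⁻¹ and 20.1°C are the 'shaded' values and 2.19 mg L⁻¹ and 21.2°C the 'non-shaded' ones; the function name is hypothetical.

```python
def percent_reduction(unshaded, shaded):
    """Reduction of the shaded value relative to the un-shaded value, in percent."""
    if unshaded == 0:
        raise ValueError("unshaded value must be non-zero")
    return 100.0 * (unshaded - shaded) / unshaded


# Upper quartile values from the abstract (assumed pairing: un-shaded first, shaded second)
bod_reduction = percent_reduction(2.19, 1.58)   # BOD, mg L-1
temperature_reduction = 21.2 - 20.1             # water temperature, degrees C
```

Under that pairing the BOD reduction rounds to 28% and the temperature difference to 1.1°C, in line with the figures given above.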
The dynamics apparent at a daily level are important to consider. Shading has the biggest impact on incoming radiation reaching the water surface in May and June, and correspondingly the largest differences in water temperature are seen at these times (Fig. 3). Similarly in May and June chlorophyll and BOD peaks are markedly lower when shade is considered. Chlorophyll and BOD follow similar patterns throughout the growing season (due to the linked diurnal cycles of photosynthesis and respiration). When shading is considered, primary productivity shows a delayed response to the development of favourable conditions (through the spring), being more active later in the summer. When evaluating the impact of shade on water quality at a day-by-day level it can be seen that all three water quality parameters become more substantially reduced under increasing shade (Fig. 4). All relations are significant, the strongest being for water temperature (Supplementary Table 2). This is unsurprising. In contrast, relationships between radiation and primary productivity are likely to be complex and less immediate. As concentrations of BOD and chlorophyll are often close to zero there are many days when no influence of shade is seen despite big differences in radiation (Fig. 4). Weekly water quality observations (Bowes et al., 2018) are generally insufficient to test the goodness of fit of the 90th percentile model results over the time period of just 261 days. However the observed upper quartile water temperature at Wallingford was 19.4°C, closer to the NTM-derived estimate (Application number 1) than the other estimates, including the previous "best" estimate from a previous study (Waylett et al., 2013) (Application number 5). Time-series model performance statistics for water temperature are reported (Table 1), revealing good fits (NSE above 0.8) although there is slight but consistent overestimation during the summer. In this context the Waylett et al. (2013) model performs favourably, but the NTM-derived model (Application number 1) performs best, demonstrating an appreciably better value for % error in mean than other applications. When shade is not considered (Application number 4) model performance is notably worse.
Ground-truth measurements of the tree canopy structure were unavailable. However, daily flow and weekly water quality observations (Bowes et al., 2018; Hutchins et al., 2016) have permitted more general assessment of the skill of the QUESTOR model over longer periods of time. At Wallingford the values for percentage error in mean (PBIAS) for an independent period of validation of the QUESTOR model (2011-2012) were 7.9, −25.7 and 1.1 for water temperature, BOD and chlorophyll respectively. The NSE values for the study period for flow and water temperature, determinands that are not calibrated, are 0.975 (at a site 7 km upstream of Wallingford) and 0.815 (Table 1) respectively. All these values are deemed acceptable based on widely-adopted criteria (Moriasi et al., 2015).
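For reference, the two performance statistics quoted here can be computed as follows. This is a generic sketch with invented data; the sign convention for PBIAS (positive when the model overestimates) is an assumption, and some references define it with the opposite sign.

```python
def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 is no better than the observed mean."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    if variance == 0:
        raise ValueError("observed series must not be constant")
    return 1.0 - residual / variance


def pbias(observed, simulated):
    """Percentage error in mean; positive when the model overestimates (assumed convention)."""
    total_obs = sum(observed)
    if total_obs == 0:
        raise ValueError("sum of observations must be non-zero")
    return 100.0 * sum(s - o for o, s in zip(observed, simulated)) / total_obs


# Invented water temperature series (degrees C), for illustration only
obs = [10.0, 12.0, 14.0, 16.0]
sim = [11.0, 13.0, 14.0, 16.0]
fit_nse = nse(obs, sim)
fit_pbias = pbias(obs, sim)
```

For these invented series NSE is 0.9 and the percentage error in mean is a little under 4%.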

Discussion
Relatively low percentages of shade are calculated along the River Thames. The implications of using both NTM and leaf-off LIDAR data are not a consequence of the methods, but a reflection of the amount and characteristics of riparian vegetation in the Thames, which is generally fairly limited and fairly low in height. The results displayed in Table 1 represent the outcome of averaging out the variation of the shade maps into a value for the whole length of the river. Aggregated summaries of the influence of shade are useful to support national-scale risk-based approaches to assess the vulnerability of water bodies (e.g. in terms of eutrophication: Charlton et al., 2017). Calculating whole-river values serves to generalise and smooth out the effects of the tree shade, since there are many areas along the Thames with sparse or no vegetation. Three specific aspects of the results are discussed in detail below.

Tree height and density derived from canopy height models
Differences in tree heights and occupancies derived from the three CHMs were apparent, due to the LIDAR data having been captured in winter when deciduous trees are leafless. The differences were as expected, despite their source data having been captured using different remote sensing techniques which could make for confounding factors. Recent publications (Brubaker et al., 2014; Parent and Volin, 2014; Wasser et al., 2013) have shown that for leaf-off LIDAR, tree height and cover are generally underestimated, particularly in deciduous canopies; however, the degree of underestimation can be highly dependent on the methods used to derive the CHM (i.e. LIDAR composite DSM). Nevertheless, the differences between the NTM and LIDAR UNTMC CHM heights were large (2-5 m when comparing the 25th, 50th and 75th percentiles) (Fig. 2a). Such large differences in height have only been reported in the case of deciduous compound trees when comparing interpolated LIDAR CHM with field measurements (Wasser et al., 2013). Deciduous compound trees have much larger differences between their leaf-on and leaf-off canopies than deciduous simple trees. Riparian trees in the Thames are primarily broadleaf but by no means exclusively of deciduous compound type (National Forest Inventory England, 2014), therefore the differences between leaf-on and leaf-off may not be that large.
The magnitude of the differences arising between methods may be related to the circumstances of collection of the raw EA LIDAR dataset. No detailed metadata are provided alongside the LIDAR composite datasets, so it is not known how the survey DSMs were created, although a method of interpolation was most likely used. In addition, the EA LIDAR composite elevation datasets have been derived from multiple surveys, carried out in different years and not always during the same season (Environment Agency, 2016), using different collection parameters, such as flight height, pulse density or number of returns (Brubaker et al., 2014).
Furthermore, to create a continuous regional/national coverage at a set spatial resolution, data of different resolutions were resampled. Resampling adds another processing step consisting of interpolation, which may have further smoothed the CHM heights. Consequently, heights from the LIDAR-derived CHMs may considerably underestimate heights and extent of riparian tree canopies in the Thames (Fig. 5e-h). Besides, the resampling done to assemble data of different resolutions may have accentuated the local variations in the final datasets due to the variations in the survey capture parameters, which became apparent in a visual analysis (e.g. by comparison of the LIDAR UNTMC CHM and the NTM CHM in Fig. 5e and 5g).
Aside from the differences in percentile levels, height differences are even larger when comparing the CHM mean heights (Table 1). The differences, of more than 6 m, easily exceed the underestimation arising from using leaf-off interpolated CHM in deciduous tree plots. This may be due to other data capture and processing issues, namely that EA LIDAR DSMs are produced from the last return point data (Orr and Lenane, 2012). In this respect, Wasser et al. (2013) found that in leaf-off conditions a large percentage of the last returns represent the ground rather than the vegetation. This would account for the large underestimates of tree height and failure to capture a large number of trees (lower canopy extent) in the LIDAR UNTMC compared to NTM, as revealed in the areal summaries (Fig. 2b). Moreover, the LIDAR UNTMC mean height for the whole riparian zone is 2.4 m lower than its individual tree height median (Fig. 2a), which indicates the presence of a very large number of vegetation objects of low height. The much lower percentage occupancy in both LIDAR-derived CHMs compared to the NTM CHM (Table 1) provides further evidence that LIDAR missed a large number of trees/shrubs in the riparian zone. Differences in occupancy between both LIDAR-derived CHMs, as inferred from Fig. 2, can be attributed to the presence of other objects in the LIDAR DSM (e.g. buildings, piers, etc.) within the riparian buffer (see Supplementary Fig. 2). While those objects can contribute to the blockage of direct radiation, their presence hampers any assessment of the specific impact of vegetation.
Fig. 5. Snapshots of duration of direct radiation (shade maps) on Julian day 172 for NTM (a-d) and LIDAR UNTMC (e-h); the snapshots also include the corresponding CHM, NTM (a-d) and LIDAR UNTMC (e-h). Frames i-l show the difference between the NTM shade map and the LIDAR UNTMC shade map. The CHMs in frames a-h have been classified according to tree height interquartile values in Fig. 2a. (For interpretation of the references to colour in this figure legend, the reader is referred to the Web version of this article.)
The choice of data processing method can alleviate some of the shortcomings of LIDAR; for example, upper height percentile LIDAR estimates have been found to be very close to field measurements (Hawbaker et al., 2010; Wasser et al., 2013). However, it appears that the EA LIDAR DSM is not particularly suited to estimating riparian shade on the Thames, notwithstanding the fact that other LIDAR products available in the UK could be beneficial to this type of analysis. In any case, our study has also revealed that it is important that available datasets, no matter their level of access, are accompanied by the necessary documentation (detailing provenance and content) for confident re-use.
In contrast, the method used to generate a gridded CHM from the NTM dataset has a very small impact on tree heights, i.e. it reduced the mean maximum height by 0.2 m (Fig. 2a). This, together with a coverage accuracy of more than 90%, provides confidence in the values. Furthermore, NTM has been used by the UK Forestry Commission to deliver assessments of tree cover outside large woodlands (Brewer et al., 2017) and for urban forest inventories (Handley and Doick, 2015). Therefore, NTM data is more suitable for studies such as ours.
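The summary statistics compared in this section (percentage occupancy, zone-wide mean height and canopy height percentiles) can be illustrated with a minimal sketch. It is not the processing chain used in the study: the function name `chm_summary`, the 2 m canopy threshold and the example heights are all assumptions made for illustration. The example shows why a zone-wide mean can fall well below the median height of the canopy cells when many cells hold little or no vegetation.

```python
import math
import statistics

def chm_summary(heights_m, min_canopy_m=2.0):
    """Summarise canopy height model (CHM) cell heights, in metres.

    min_canopy_m is a hypothetical threshold separating tree canopy from
    ground and low clutter; it is not a value taken from the study.
    NaN marks cells outside the riparian buffer.
    """
    valid = [h for h in heights_m if not math.isnan(h)]
    canopy = [h for h in valid if h >= min_canopy_m]
    # quantiles() needs at least two canopy cells
    q25, q50, q75 = statistics.quantiles(canopy, n=4, method="inclusive")
    return {
        "occupancy_pct": 100.0 * len(canopy) / len(valid),
        "zone_mean_m": sum(valid) / len(valid),  # mean over the whole zone
        "canopy_q25_m": q25,
        "canopy_median_m": q50,
        "canopy_q75_m": q75,
    }

# Illustrative cell heights only (not study data)
demo = [0.0, 0.0, 12.0, float("nan"), 0.0, 8.0, 10.0, 0.0]
summary = chm_summary(demo)
print(summary)
```

In this toy grid three of seven valid cells hold canopy, so the zone-wide mean (about 4.3 m) sits far below the canopy median (10 m).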

Fractional penetration from canopy height models
As expected, all CSMs have the highest fp values in the summer, when shadows are shorter. However, the impact of underestimating height and tree occupancy when using leaf-off EA LIDAR DSM is manifested when considering grids of direct radiation duration. The LIDAR-derived data give a higher fp than the NTM, by about 6% and 11% for the EA LIDAR DSM and LIDAR UNTMC data respectively (Supplementary Fig. 3). The presence of other objects such as buildings in the EA LIDAR DSM would account for its lower fp relative to the LIDAR UNTMC.
Grids of daily duration of direct radiation show how the prevalence and extent of shading changes downstream along the river with variations in vegetation cover (tree height and occupancy), river orientation and river width (Fig. 5 a-d for NTM-derived grids and 5 e-h for LIDAR UNTMC-derived grids). Seasonal variations in the sun path are also important, and these govern the efficiency with which the riparian vegetation blocks direct radiation. Blocking effects are more substantial in spring and autumn, when longer shadows mean that a larger number of trees have a height sufficient to block direct radiation to the river surface. All CSMs show stretches with very sparse or absent vegetation, or stretches where the vegetation is concentrated on the north bank of the river. Vegetation on north banks casts very little or no shade on the river, which mostly flows from west to east.
The canopy occupancy within 20 m of the channel as derived from the leaf-off LIDAR UNTMC CSM is only one third of that from the NTM CSM (Supplementary Table 1). Given this large difference, and as only the trees very close to the river bank provide efficient shade during midsummer (e.g. Fig. 5b), it would have been logical to expect the difference in mid-summer fp to be greater than the 11% difference arising. Furthermore, over the entire Thames, differences in duration of direct radiation in mid-summer between each CSM are between one and two hours. However, close inspection shows that there are localised areas where there is consistently five or six hours difference (Fig. 5i), demonstrating the efficiency of the NTM CSM in capturing vegetation.
Even though the NTM CHM is a closer representation of the true riparian vegetation, only that which is on the riverbank itself is effective in shading the water surface (Johnson and Wilby, 2015). It is visually apparent that the signal of the EA LIDAR DSM (and thus of the LIDAR UNTMC) is concentrated along the river banks. In addition to this, visual comparison of both CHMs showed that the ability of the LIDAR UNTMC CHMs to capture riparian vegetation varies considerably throughout its spatial extent. Leaf-off EA LIDAR DSM data captured the riparian canopy to a high level of accuracy in some stretches (Fig. 5g). A close look at the provenance of the surveys that make up the EA LIDAR composite (LIDAR Composite Extents Coverage, 2017) showed that those stretches had been captured in spring 2003 (i.e. the survey took place after the leafing period had already started) at resolutions of 0.5 and 1 m. This demonstrates high variability in the ability of the EA LIDAR DSM to record vegetation, which, in turn, could partially explain why the fp of the EA LIDAR-derived CHMs is not as high as expected.
During the summer months, the incoming radiation received during approximately 60% of daytime comes from solar angles varying between 30° and 60° (Johnson and Wilby, 2015). This means that for an E-W azimuth river the shade cast by a tree of average height (10.55 m as estimated using the Thames NTM CHM) would vary from about 17 m (at sunrise/sunset) to merely 5 m around mid-day. As the average width of the Thames is 42 m, a patch of trees of average height would not be enough to shade even half the width of the river. This highlights that the ratio between the tree height and the river width is a key indicator in determining the effectiveness of the canopy in blocking direct radiation (as identified by Davies-Colley and Rutherford, 2005; DeWalle, 2008, 2010). It could also explain why the differences in midsummer fp between the CHMs are not as big as might have been expected from the large differences in percentage of occupancy (Supplementary Table 1).
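The seasonal dependence of the sun path can be illustrated with a standard textbook approximation. The sketch below uses Cooper's formula for solar declination and the usual spherical relation for solar elevation; it is not the sun-position model applied in this study, and the latitude value and function names are assumptions chosen for illustration.

```python
import math

def solar_declination_deg(day_of_year):
    """Approximate solar declination in degrees (Cooper's formula)."""
    return 23.45 * math.sin(math.radians(360.0 * (284 + day_of_year) / 365.0))

def solar_elevation_deg(lat_deg, day_of_year, solar_hour):
    """Solar elevation in degrees at local solar time (12.0 = solar noon)."""
    lat = math.radians(lat_deg)
    dec = math.radians(solar_declination_deg(day_of_year))
    hour_angle = math.radians(15.0 * (solar_hour - 12.0))  # 15 degrees per hour
    sin_elev = (math.sin(lat) * math.sin(dec)
                + math.cos(lat) * math.cos(dec) * math.cos(hour_angle))
    return math.degrees(math.asin(sin_elev))

LATITUDE_DEG = 51.6  # assumed approximate latitude of the mid-Thames

noon_midsummer = solar_elevation_deg(LATITUDE_DEG, 172, 12.0)  # day 172
noon_spring = solar_elevation_deg(LATITUDE_DEG, 80, 12.0)      # near the equinox
print(f"noon elevation, day 172: {noon_midsummer:.1f} deg")
print(f"noon elevation, day 80:  {noon_spring:.1f} deg")
```

Under these assumptions the noon sun is roughly 24° lower near the spring equinox than at midsummer, which is consistent with the longer shadows and stronger blocking effects described for spring and autumn.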
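The relationship between tree height, solar elevation and river width can be sketched with simple flat-ground geometry. The calculation assumes a tree standing at the water's edge and a sun direction perpendicular to the bank, so it gives an upper bound on the cross-channel shadow; it therefore yields values somewhat above the approximate 17 m and 5 m quoted above and is not intended to reproduce them. The function name is hypothetical; tree height and river width are the values given in the text.

```python
import math

def shadow_length_m(tree_height_m, solar_elevation_deg):
    """Horizontal shadow length on flat ground for a given solar elevation."""
    return tree_height_m / math.tan(math.radians(solar_elevation_deg))

TREE_HEIGHT_M = 10.55  # mean NTM CHM tree height quoted in the text
RIVER_WIDTH_M = 42.0   # average width of the Thames quoted in the text

for elevation in (30.0, 45.0, 60.0):
    length = shadow_length_m(TREE_HEIGHT_M, elevation)
    shaded_fraction = min(length / RIVER_WIDTH_M, 1.0)
    print(f"elevation {elevation:4.1f} deg: shadow {length:5.2f} m, "
          f"{100 * shaded_fraction:4.1f}% of channel width")
```

Even in this idealised case the shadow stays below half the channel width (21 m) across the 30° to 60° range, in line with the argument that the height-to-width ratio limits the effectiveness of shade on a river as wide as the Thames.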

Suitability of spatial data products for characterising canopies
To summarise, we have tested the suitability of two spatial data products to capture the spatial variation of vegetation along the study area. Differences in the accuracy with which each CSM captures the riparian vegetation characteristics, and thereby allows shading effects to be calculated, are related to: (i) the time of survey (i.e. leaf-on/leaf-off capture), (ii) the accuracy of the technique for data capture and (iii) the post-processing to produce elevation surfaces (although in the case of the LIDAR data product used in this study, it is already available as elevation surfaces).
Notably, the processing of the EA LIDAR DSM data leads to a clear underestimate of height. In addition, EA LIDAR DSM data was captured for general purposes; therefore it contains any object on the surface in addition to trees. In this respect alone our analysis is very valuable, as it has discriminated between vegetative and non-vegetative objects and indicated that shade from non-vegetative objects can be locally important, especially in urban areas. Finally, the freely available EA LIDAR data was collected primarily for the purpose of mapping ground elevation, and surveys were therefore normally undertaken in winter when trees have no leaves (leaf-off LIDAR). Shade would be lower at this time of year. For these reasons, these general purpose LIDAR data need to be used in combination with other datasets in order to be of use in characterising vegetation.

Water quality modelling
Simulations reveal that shading influences water quality in the Thames (Figs. 3 and 4). Water temperature is reduced. Relationships between radiation and primary productivity, as illustrated by BOD and chlorophyll, are more complex and delayed due to the importance of other factors such as river flow, nutrient supply and biological interactions. The model suggests that shading promotes a slower response in phytoplankton biomass, with peak values reached in July rather than June when shading is not considered (Fig. 3). Previous work has suggested that mid-summer phytoplankton biomass is lower than might be expected through QUESTOR modelling, with blooms rarely seen when temperature increases above 19°C. Therefore the effects of shading could be more considerable than the significant ones suggested by the water quality modelling undertaken in the present study.
Although not large, differences in water quality impact arising from the choice of methodology and data sources for calculating riparian shade are apparent. The benefits of using the NTM are noticeable. The differences of simulated water quality for the River Thames at Wallingford in 2010 (Table 1) reflect the impact of different levels of light reaching the water column on eutrophication in the river. All five model runs used the same daily inputs of diffuse radiation (calculated as the difference between the global values and the direct component). As a considerable part of the global radiation comprises the diffuse component, this is likely to explain the somewhat limited level of variation seen between the summary water quality statistics arising from use of the different shade data sets.
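The treatment of the radiation components described above can be sketched as follows. The sketch assumes that the diffuse component reaches the water unattenuated (as implied by all runs sharing the same diffuse inputs) and that the direct component is simply scaled by the fractional penetration fp; the function name and the radiation values are illustrative assumptions, while the fp values of 0.65 and 0.76 are the mid-summer figures reported for the NTM and LIDAR-derived methods.

```python
def shaded_global_radiation(global_rad, direct_rad, fp):
    """Global radiation at the water surface after riparian shading.

    global_rad and direct_rad share the same units (e.g. MJ m-2 d-1);
    fp is the fractional penetration of direct radiation, from 0 to 1.
    Assumes diffuse radiation is unaffected by the canopy.
    """
    if not 0.0 <= fp <= 1.0:
        raise ValueError("fp must lie between 0 and 1")
    diffuse = global_rad - direct_rad  # diffuse = global minus direct
    return diffuse + fp * direct_rad

# Illustrative daily totals only (not study data)
GLOBAL_RAD, DIRECT_RAD = 20.0, 12.0

for label, fp in (("NTM", 0.65), ("LIDAR-derived", 0.76), ("no shade", 1.0)):
    value = shaded_global_radiation(GLOBAL_RAD, DIRECT_RAD, fp)
    print(f"{label:14s} fp={fp:.2f} -> {value:.2f}")
```

With these illustrative inputs, a difference of 0.11 in fp changes the radiation reaching the water by under 7% of the unshaded total, because the diffuse share is unchanged. This mirrors the limited variation seen between the summary water quality statistics.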
The effects on water temperature during the summer appear considerable. For the three CSMs (Application numbers 1-3) the upper quartile values vary by over 0.5°C. The upper quartile temperature is estimated to be 21.2°C when shade is completely absent (Application number 4), 1.1°C more than the application using NTM canopy (Application number 1). These differences are substantial. The 90th percentile air temperatures in the region are expected to increase by between 2.2 and 4.2°C by the 2050s. Use of NTM data (Application number 1) rather than LIDAR UNTMC (Application number 3) results in a better model performance when compared to weekly observations (Table 1).
Differences in chlorophyll levels do not seem to be large. It is likely this is because limiting conditions are reached periodically. The model output indicates that phosphorus becomes limiting in late-summer 2010. It becomes most limiting under conditions of least shade. The consequence of this is that some of the phytoplankton population crashes and is recycled as degradable carbon. This increases the BOD, which is reflected in the considerable differences in upper quartile BOD simulated under the various model runs.
The further consequence of this is likely to be a decrease in DO, but no substantial differences arising between methodologies are apparent in the mid-River Thames at Wallingford. Model calibration suggested low rates of BOD decay upstream of Wallingford alongside relatively higher rates downstream. It is possible that the impacts on DO between the model runs may be contrasting further downstream, but this analysis is out of the scope of the present study.

Conclusions
We have presented a methodology to estimate tree height and canopy extent, thereby allowing the calculation of daily shade, the fractional penetration of radiation and its effect on global incoming radiation for water quality modelling purposes. This methodology includes the use of high resolution spatial data capable of capturing riparian canopy structure and a model that simulates the position of the sun across the sky for hourly or sub-hourly intervals to model the daily shade over the river surface. It also uses measurements of hourly radiation and daily sunshine duration, which are corrected to account for the shade effect in order to be input to the water quality model.
The results demonstrate:
• Consideration of riparian shading is important for water quality simulation, as is demonstrated by the big differences arising when considering or not considering shade (Table 1).
• An increased level of confidence about riparian shade condition in terms of percent occupancy (proportion of channel length with trees) and fractional penetration (fraction of direct radiation reaching the river through the tree canopy) along the River Thames.
• Water quality impacts are sensitive to the level of shade as estimated using the two datasets. Calculations using EA LIDAR DSM are different to those using NTM (partly due to the seasonal coverage of the EA LIDAR DSM data).
• Assessing the impacts of using each dataset in turn gave improved insights into how changing the levels of riparian shade (by planting or felling trees) might affect the water quality and river ecology.
• Levels of riparian shading along the Thames are low and not very effective at reducing levels of direct radiation reaching the water surface.
• Although the modelling was limited in its use of daily shade data averaged along the whole river length, the high spatial and temporal resolution of the data presents considerable potential for pinpointing and evaluating very specific tree planting scenarios for water quality gains.
• This methodology allows design of more effective riparian shading strategies specifically to support water quality objectives based on location, orientation, effective cover, and distance from the river bank.
Shading is clearly an important parameter for understanding river water quality indicators related to eutrophication, such as chlorophyll and dissolved oxygen concentrations. The GIS-based method developed within this paper, utilising highly accurate tree data, provides (i) an innovative means to assess the impact of riparian shading, (ii) a valuable tool for estimating impacts of changes in future radiation levels due to climate change, and (iii) the potential for targeted planting of riparian cover to minimise impacts of eutrophication. This will inform catchment management best practice.