High correlation but high scale-dependent variance between satellite measured night lights and terrestrial exposure

Exposure to artificial light at night (ALAN) is a significant factor in ecological and epidemiological research. Although levels of exposure are frequently estimated from satellite-based measurements of upward radiance, and the correlation between upward radiance and zenith sky brightness is established, the correlation between upward radiance and the biologically relevant exposure to light experienced from all directions on the ground has not been investigated. Because ground-based exposure to ALAN can depend on local glare sources and atmospheric scattering, ecological and epidemiological studies using upward radiance have relied on an untested relationship. To establish the nature of the relationship between upward radiance and hemispherical scalar illuminance (SI) on the ground and to calibrate future experimental studies of ALAN, we used hemispheric digital photography to measure SI at 515 locations in coastal southern California, and compared those values to co-located satellite-based measures of upward radiance as described by the Visible Infrared Imaging Radiometer Suite (VIIRS) satellite’s Day-Night Band (DNB) sensor and zenith downwards radiance as estimated by the World Atlas of Artificial Night Sky Brightness (WA). We found significant variations in SI within the geographic scale defined by the resolutions of both the DNB and WA, as well as in both luminance and color correlated temperature (CCT) across individual image hemispheres. We observed up to two or more orders of magnitude in ALAN exposure within any given satellite-measured unit. Notwithstanding this variation, a linear model of log(SI) (log(SImodeled)), dependent only on the percent of the image hemisphere obscured by structures along the horizon (percent horizon) and log(WA) accounted for 76% of the variation in observed log(SI). DNB does not perform as well in alternative models and consequently future studies seeking to characterize the light environment should be built on WA data when the high temporal resolution of DNB measurements are not needed.


Introduction
Ecologists and epidemiologists conducting research on the effects of light pollution have used satellite measured night lights to investigate effects of artificial light at night (ALAN) on the biological world. This line of research has illustrated the role of ALAN in fields as varied as the epidemiology of breast and prostate cancers (Haim and Portnov 2013, James et al 2017, Rybnikova and Portnov 2017, environmental stressors (Rich and Longcore 2013, Hölker et al, Gaston et al 2014, the distributions of organisms as varied as mammals , sea turtles (Mazor et al 2013), and cacti (Correa-Cano et al 2018), as well as the landscape ecology of a wide variety of ecosystems (Witherington et al 2014, Bennie et al 2015b, de Freitas et al 2017. A common element for such studies is the use of satellite-based measurements of upwards radiance as a proxy for conditions on the ground. Such remote measurements are expected to broadly correlate with conditions on the ground (Kyba et al 2015, Zamorano et al 2015, Bennie et al 2015a, Katz and Levin 2016, either in terms of the direct radiant glare visible to an organism or the irradiance at a particular location. Given the ubiquity of lunar cycles in natural ecosystems, and their loss under the influence of ALAN , Puschnig et al 2014, models incorporating satellite-based datasets have been developed to compare the magnitude of lunar to artificial lighting (Román et al 2018). If ecological and epidemiological studies are to move beyond relative metrics, the relationship between satellite measures and ground conditions must be known, including at finer spatial scales than currently used by many remotely sensed data products.
Ground-based measurements of nighttime illumination have been developed for astronomical and ecological research, in particular through the use of hemispherical digital photography for extracting measurements of illumination, spectrum, and directionality in the light environment (Pendoley et al 2012, Thums et al 2016. The use of hemispheric digital photography in studying ALAN has expanded from research in astronomical light pollution (Luginbuhl et al 2009, Pendoley et al 2012, Duriscoe 2016, Jechow et al 2018 to include ecological light pollution (, Hänel et al 2018, Jechow et al 2018. Digital photography initially quantifies the lighting environment as an array of pixels, each with a set of radiance values within a set of spectral bands. While it is not straightforward to convert from raw camera data to measures such as luminance and illuminance (Sánchez de Miguel et al 2019), we used commercial image processing software, Sky Quality Camera (SQC; Euromix Ltd), to approximate such values from digital images. Although raw camera data can be used to estimate the Correlated Color Temperature (CCT) per pixel, tools are not currently available to generate a spectral power distribution. Notwithstanding limitations in extending human-weighted measures of the visual environment to other species, measurements in lux and the CCT are an imperfect but acceptable first order proxy for brightness and color as it pertains studying subsequent visual and nonvisual responses in a variety of organisms (Longcore et al 2018).
In this study we focused on the coastal environment of southern California, a region both designated as a biodiversity hotspot (Myers et al 2000, Calsbeek et al 2003, Gillespie et al 2018, and containing the 2nd largest urban agglomeration in the United States. As part of a study on ecological light pollution on the coast of southern California, we quantified the relationship between measures of ALAN derived from satellite-based data and conditions experienced on the ground as assessed from hemispheric photography. The resulting measurements provide both a robust quantification of the correlation between satellite-based and ground-based light pollution metrics used in ecological and epidemiological studies, as well as the variations in local illuminance.

Site selection
To select field sites that are representative of the levels of light pollution experienced along the coast of southern California (Ventura, Los Angeles, and Orange counties) we mapped upwards nighttime radiance recorded over the region using the annual composite of data captured by the VIIRS DNB sensor in 2015 (https://ngdc.noaa. gov/eog/viirs/download_dnb_composites.html). We selected DNB pixels, using QGIS 3.4 (QGIS Development Team 2016), with centroids within 250 m of the southern California coastline. To obtain a set of sample sites representative of the lighting conditions experienced within the study area we first log-transformed our distribution of DNB values (Note: in this manuscript we refer to log 10 as log). Dividing this distribution into six quantiles we then used the base R (v3.5.1 Core Team 2018) function sample to select 150 sites, 25 per quantile, representative of the distribution of upwards radiance values recorded along the coast. Given issues of physical accessibility this set of locations was further reduced to 103 ( figure 1(a)). Using the wilcox.test function in the R package stats we determined there was no significant difference in the distribution of upwards radiance values between the initial set of 150 sites and the remaining 103 accessible ones (p=0.66).

Field data
To quantify the spatial variability of illuminance within the average resolution of the pixels acquired from the 2015 annual composite VIIRS DNB data set, we acquired data at five locations within the 30 arcsecond wide bounding box centered on each site (Jing et al 2016). Going over the latitudinal range of our study, from 34.355 94 to 33.4051 degrees north, the dimensions of these bounding boxes vary from 765 m to 774 m east to west by 927 m north to south. This led to 515 images for subsequent analysis. All images were acquired on moonless nights after astronomical twilight. At each location, position was determined to within 9.0 m using mobile phone GPS (Korpilo et al 2017). All images were taken at each location using a Canon Tα camera with a Sigma 4.5 mm F −2 .8 EX DC circular fisheye lens using an ISO setting of 1600 and an aperture of f/2.8. Multiple exposures were taken at each location, with exposure times manually varied in order to maximize their duration while minimizing the level of image saturation according to the on-board camera histogram function. The camera/lens system was calibrated in the field, as well as paired with the custom software package SQC v1.9 (SQC), by Euromix Ltd in Ljubljana, Slovenia. To quantify the total exposure to light in the field we used SQC to calculate scalar illuminance (SI), which integrates night sky luminance across the entire hemisphere of the night sky using the following equation Where q represents the zenith angle, j the azimuthal angle, and q j L , ( )the night sky luminosity function. For each exposure, the camera was leveled with a 3-axis bubble level, and oriented with a compass, to standardize image orientation for analysis. As an additional assessment of the local lighting environment at each location, we took an average of 4 measurements of the brightness, in the negative logarithmic units of magnitude/arcsec 2 , of the zenithal 20 degrees of the sky using an Unihedron Sky Quality Meter (SQM) (Falchi 2011).

Data processing
For each location we used SQC to select the images which had both the longest exposure, as well as less than 0.1% saturated pixels, for downstream analysis. Images were then oriented, using SQC, with expected positions of the 30 brightest stars as reference points (figure 1(d)) along with the position and time of acquisition for each image. After reorientation we then determined the correlated color temperature (CCT) to the nearest Kelvin, SI to the nearest thousandth of a millilux (mlx) (figure 1(a)), the percent of the image hemisphere of the night sky obscured by clouds (percent clouds), and the percent of the image hemisphere of the night sky obscured by structures (including lights) along the horizon (percent horizon) (figure 1(d)). We then used SQC to determine image statistics, such as a calculated SQM, for images using the full sky hemisphere (zero horizon images) as well as those with structures along the horizon masked out and their pixel values removed from downstream calculations (horizon edited images).
To estimate values for percent horizon at locations where photographs were not taken, we generated a map of sky view factor (SVF) values, representing the fraction of the sky visible at a given location (Kidd and Chapman 2012), for our coastal study area using a digital elevation model. The digital elevation model, which was sampled at a 5 m resolution, was derived from the 2009-2011 NOAA-CA Coastal Conservancy Coastal Lidar Project. Using the UMEP plugin (Lindberg et al ) for QGIS 3.4 we calculated the SVF for each 5 m pixel in a coastal zone 3 km wide, extending 10 km north and south of our sample site locations.
Given the geographic coordinates of all 515 locations, we used QGIS 3.4 to extract the expected upwards radiance from the 2015 DNB map. Values of night sky luminance, as described by the WA (figure 1(b)) , were extracted using those same locations from a public site (lightpollutionmap.info). To enable analysis of the variability of luminance within the hemisphere of the sky we divided the hemisphere of the image into 40 equal area sectors from which we extracted luminance values (figure 1(c)). The sector boundaries were defined by 8 equal azimuthal divisions and 5 bands of elevation angles demarcated by the following range of angles (degrees) above the horizon: 0-11. 54, 11.54-23.58, 23.58-36.87, 36.87-53.13, and 53.13-90. From this schema we defined our horizon elevation band to be bounded between the angles of 0 and 11.54 degrees above the horizon, while our zenith elevation band was bounded between the angles of 51.13 and 90 degrees above the horizon.

Statistical analysis
To quantify the degree of spatial variability in log(SI) between locations within a site we calculated the coefficient of variation using the formula for log 10 transformed data ( Linear models of log(SI) were constructed using the lm function in the R package stats, along with various parameters such as percent horizon, log(DNB) as well as the log(WA) . The log of all three measurements of light were used following standard practice in studies of the brightness of skyglow (Kyba et al 2013. Data were collected under a variety of weather conditions so that any resulting model could potentially be applicable under conditions beyond cloudless nights. To test this assumption, we compared both measured values for our SQM readings and log(SI) extracted from our photographs against log(WA) using only our 173 cloudless images, 342 images with at least some cloud cover, and all 515 images (figures 2(a)-(d)). To test for the significance of differences in either the CCT or luminance between the zenith and horizon elevation bands of the night sky we used the function wilcox.test within the R packages stats.
To determine the contribution of each variable to observed variation in linear models of log(SI) we performed a 1,000 permutation PERMANOVA using the adonis function in the R package vegan (Oksanen et al 2013). The relative importance of any remaining variables in linear models were then assessed using the function calc.relimp within the R package relaimpo (Grömping 2015). We checked the distribution of the residuals for models of log(SI) (figure S1 is available online at stacks.iop.org/ERC/2/021006/mmedia). Given our large sample size (515 data points), we could not assume that a violation of the normality assumption for the model residuals could be reliably used to assess linear models of log(SI) (Ghasemi and Zahediasl 2012). Independent of sample size (Lumley et al 2002), however, the linear models of log(SI) satisfied the requirement of constant variance (figure S1). The validity of our models of log(SI) were then tested using a 10-fold cross-validation, with 100 repeats, using the train function within the R package caret (Kuhn 2019). Further analysis of correlations between various measures of the lighting environment, along with the physical environment, were calculated and visualized as correlograms using the R package corrplot (Wei et al 2017).

Modeling of log(SI)
We initially constructed linear models of log(SI) which incorporated log(WA), along with variables associated with the time, location, and atmospheric conditions with the potential to influence the optical properties of the local air column (tables 1-2). We found log(WA), percent horizon, percent clouds, and air temperature all contributed significantly to the variation observed in linear models of log(SI), although most of the variation in the models was only associated with log(WA) and percent horizon (tables 1-2). No significant contributions were observed due to the day of sampling, relative humidity, or the angle of the Sun below the horizon (tables 1-2). We  observed a shift in the relationships between our two measurements of the night sky, SQM and log(SI), and log (WA), as well as between log(SI modeled ) and log(SI), under both cloudless and cloudy conditions ( figure 2). However, the magnitude of the influence of cloud cover is minimal (figure 2) when compared to spatial variation in these measurements (figure 5). Because both the air temperature and percent clouds are far more transient variables than either log(WA) or percent horizon, and their removal results only in a slight reduction in the explanatory power of our linear models of log(SI) (  3). We found, after using a 10-fold cross-validation using either zero or horizon edited images, that log(SI modeled ) could still account for 76% of the observed variation in measured values for log(SI). Most of the observed variation in log(SI modeled ) could be accounted for by variations in log(WA) (table 4).
For both zero and edited horizon images, log(SI) had the strongest correlation with mean SQM measurements (r=−0.87, p<10 −4 ), our one other direct measurement of nighttime sky brightness. For our two independent measures of the nighttime sky, log(WA) was more strongly correlated with log(SI) than with log(DNB) (figure 4). Models that incorporated log(DNB) instead of log(WA) had a reduced explanatory power for the observed variation in log(SI) (tables S1-2).
Using the GIS-derived SVF values at sample locations in place of measured percent horizon, in addition to log(WA), we constructed another linear model of log(SI) (log(SI SVF )) with the following formula: We found this model could account for a large portion of the observed variation in log(SI) (r=0.83, p<10 −4 ). This model could still account for 69% of the observed variance in log(SI) after using a 10-fold  cross-validation using either zero or horizon edited images and provides a mechanism to calculate ground-level light exposure solely from satellite-derived data and readily available topographic data.

Hemispheric variations in the light profile of the night sky
Using zero horizon images we observed significant differences in both the luminance and CCT (p<10 −4 ) between the zenith and horizon elevation bands of the night sky (figure 3). For zero horizon images the log-ratio of mean horizon and zenith luminance values was approximately 0.33, while for the ratio of CCT values between the horizon and zenith was 0.83. For edited horizon images these ratios were 0.32 for log(luminance) and 1.45 for CCT (figure 3). When we analyzed edited horizon images, we were effectively removing relatively dark structures, such as buildings and hills, as well as sources of direct glare, such as streetlights. With or without editing out structures along the horizon we found evidence that direct glare is a large, but not sole, contributor to the difference in log(luminance) and CCT between the horizon and zenith elevation bands of the night sky. In comparing the hemispheric CV on log(luminance) with its mean we found more luminous skies tended to be more anisotropic in the angular distributions of their luminosity, with the strength of this relationship similar between zero and edited horizon images ( figure S2). This again suggests luminous skies tend to be unevenly luminous, but that their level of luminosity is associated with that of the horizon with or without direct glare sources. Similar to comparisons between the horizon and zenith elevation bands, the horizon band of the night sky is far more luminous than the sky as a whole. For zero horizon images the log-ratio of mean horizon band luminance value to that of the entire night sky hemisphere was approximately 0.12, while for edited horizon images this log-ratio is 0.11. The zenith of the night sky is far dimmer, with the log-ratio of mean zenith luminance to that of the entire night sky hemisphere was approximately −0.21 for zero horizon images and −0.20 for edited horizon ones. Further investigations of the relationship between the luminosity and the angle of elevation for bands of the sky illustrate a general decline in the luminosity going from the horizon to the zenith ( figure S3). This pattern is also reflected in stronger observed correlations between log(SI) and the luminosity within the horizon rather than the zenith elevation band of the sky (figure 4). As log(SI) involves a hemispheric integration of light across the entire night sky, we also found it to be more strongly correlated to the luminance of individual elevation bands of the sky as compared to log(DNB) or log(WA) ( figure 4). Spatial variability of SI within sites Log(SI) was highly variable within the spatial resolution of the individual VIIRS DNB pixels which defined our sites (figure S4), with a mean spatial coefficient of variation of 0.48 (using either zero or edited horizon images). This spatial variability was correlated with measures of brightness at each location (figure 5), as well as with percent horizon. Most of this variation was found to be associated with the log of the luminance in the horizon band of the sky and percent horizon, corresponding with the presence of direct glare sources, with no significant correlation found with the log of the luminance in the zenith of the sky ( figure 4). This indicates that within the urban to suburban environments we sampled, changes in the lighting environment are mostly driven by changes in sources near the horizon, rather than variations in near-zenith skyglow.

Discussion
Most of the variation in the illumination experienced at our suburban to urban coastal sites could be accounted for using a simple linear model composed of log(WA) and percent horizon. This model appears robust, even accounting for significant within-site variations in illuminance or within-hemisphere variations in luminance and CCT.
Numerous studies have been published on the effects of ALAN to a variety of species, acquired both through controlled experimental manipulations (  Our results support, at least in coastal environments, that satellite-based measurements of ALAN are useful proxies for assessing total light exposure as described by SI. The implication is that the role of ALAN in both epidemiology and landscape ecology can be reasonably be modeled when direct measurements of total exposure are not feasible. It should be noted though, that while the WA performs significantly better than the VIIRS DNB in modelling SI, the spatial resolution of both data sets smooths over the large and highly localized variations observed in the coastal lighting environment.

Modelling log(SI)
Illuminance experienced in the field, as described by log(SI), was strongly correlated with remote measures of upwards radiance as described by log(DNB), or composite measures utilizing remotely sensed measures of upwards radiance such as log(WA). However, log(WA) was found to describe more of the observed variation in log(SI) than log(DNB). This is not surprising as the WA incorporates the DNB, zenith sky brightness as measured by SQMs, and an optical model of the atmosphere (Falchi et al 2016). We also observed strong correlations between log(WA) and our SQM measurements, reflecting recent observations in the near-shore environment (Ges et al 2018). What we were surprised by, given prior observations of the role of cloud cover in reflecting upward radiated light (Kyba et al 2011, Bará 2016, Jechow et al 2017, is the small role of the percent of the sky covered by clouds in affecting the observed variation in either SI or SQM measurements against the range of night sky luminance values expected using the WA over the extent of our study area (figures 2(a)-(d)). This pattern carried over to our observed relationships between log(SI modeled ) and log(SI) (figures 2(e), (f)), as well as comparatively low contributions to observed variations in linear models of log(SI) associated with the percent horizon (tables 1-2, S1-2). We observed that cloud cover tended to increase the luminance of the night sky zenith, but as most of luminance in our hemispheric images appears to be driven by variations in lighting near the horizon we then find variations cloud cover only make a relatively small contribution to our observed variations in SI. This result may be a function of the range of lighting conditions in the study, which included a much wider range of conditions (8-8, 900 mlux SI) than other studies investigating the influence of clouds on illumination (e.g., 3-25 mlux; Jechow et al 2017). Our modeling of log(SI) was further improved through the additional incorporation of the fraction of the image hemisphere covered by structures and lights along the horizon. While structures along the horizon can obscure direct illumination (Luginbuhl et al 2009, we found log(SI) tended to increase with percent horizon. This is most likely due to those structures either acting as sources of direct illumination, or as reflectors of artificial light sources (Cabello and Kirschbaum 2001, Chalkias et al 2006, Brons et al 2008, Kocifaj 2008, Butt 2012. We did find our model which utilized GIS-derived SVF could account for almost as much of the observed variation in log(SI) as those which used ground-derived percent horizon. This indicates that there is potential in future research involving the role of ecological light pollution across a landscape for estimating total exposure to light at night using models which can be completely derived from remotely sensed measurements such as SVF and WA. We constructed our model across a large range of WA brightness values ( m 374 cd m 2 to 12 mcd m 2 ). However, this model may be less applicable in darker environments approaching natural sky brightness ( m 174 cd m 2 )  where the horizon may block light rather than reflecting it. In such instances, the percent horizon or sky view factor will decrease SI rather than increase it. Analysis of sites along a gradient that includes even darker environments than those we sampled would likely discover an inflection point at which the percent horizon switched from increasing over illuminance to decreasing it. A similar inflection point exists for clouds, which transition from reflecting light to shielding from light at a point along a gradient away from lighted areas.
Hemispheric variations in the light profile of the night sky Because of its ground-based nature and proximity within the study area, outdoor lighting created conditions where the horizon is far more luminous than the zenith of the night sky. This result, whereby most of the illumination experienced in the comes from sources near the horizon, has been reported in other urban and peri-urban environments . This indicates that the total exposure to light pollution at our coastal sites is primarily driven by sources which are not necessarily close enough to make a location appear bright from the perspective of remote sensing platforms. We propose, at least in coastal environments, that the contribution of near-horizon illumination will dominate scalar illuminance in all but the darkest night sky environments. These dark conditions did not exist within our study area.
We also found evidence that the luminance of the horizon, rather than the zenith, could explain a larger portion of the observed variation in the luminance of the full night sky ( figure 4). This hemispheric anisotropy also appears to be reflected in the significantly weaker correlations observed between log(DNB) or log(WA) and the luminance of either the zenith or horizon elevation bands and the sky, as compared to log(SI). This reinforces the importance of using a measure of illuminance which integrates light over the entire hemisphere of the sky, including the luminous near-horizon environment, as a metric for the total exposure to ambient artificial light levels.

Spatial variability of SI within sites
The lighting environment was highly heterogeneous within the spatial resolution of the WA (∼770 m by 927 m). This appears to be especially true in environments with a greater spatial variation in illumination with the spatial coefficient of variation of log(SI) within sites positively correlated with both percent horizon and log(SI), using either zero or edited horizon images (figure 4). This indicates that although log(WA) was strongly correlated with log(SI) in general, the resolution of the WA smooths over areas with a high degree of spatial heterogeneity in SI. While further refinements in modeling skyglow may be provided by the development of a city emission function (Kocifaj et al 2019) for coastal southern California our data suggest large and highly localized variations in SI, such as those associated with sources of direct glare, will still need to be considered. The importance of even relatively small scales of habitat heterogeneity on coastal communities (Hartnoll and Hawkins 1980, Underwood and Chapman 1996, Meager et al 2011, and the role of light pollution as an environmental stressor in such communities (Salmon 2003, Bird et al 2004, points towards the need to consider SI at finely resolved spatial scales (Garratt et al 2019).

Conclusion
Our results validate the use of remotely sensed nighttime lights data in ecology and epidemiology as a first-order proxy for ground-based total artificial light exposure, at least in suburban and urban coastal environments. We created a simple linear model of log(SI) relying only on the WA, a freely available data set (see www. lightpollutionmap.info), and percent horizon, which although determined via analysis of hemispheric images may potentially be inferred from SVF values calculated remotely sensed elevation data sets (Zakšek et al 2011, Kidd and Chapman 2012). This result suggests the potential to extrapolate within the geographic range of our model to estimate total exposure to ALAN within similar environments.
While we observed a strong correlation between log(WA) and log(SI), our model comes with some caveats. First, SI varies significantly within the scale defined by the spatial resolution of the WA. The spatial resolution of the WA obscures the picture of light exposure in environments with a high degree of spatial heterogeneity in SI. Given prior observations on the influence of ALAN in reshaping the connectivity of landscapes (Stone et al 2009, Azam et al 2016, Bliss-Ketchum et al 2016, this may ultimately limit the use of data at this spatial resolution as habitat variables in species distribution modeling. Second, artificial light sources create conditions where the horizon is far more luminous than the zenith of the night sky, even if they are distant enough to not contribute substantially to the upwards radiance remotely measured for that location. With the entire night sky hemisphere contributing to SI this will often lead to locations to experience higher levels of nighttime illumination than what would be expected from the WA alone. Third, the VIIRS DNB sensor is not sensitive to light <500 nm (Miller et al 2013). With the transition from lighting sources such as sodium vapor bulbs to LEDs the spectrum of ALAN is currently shifting to more blue light (Bará 2013, Luginbuhl et al 2014, Barentine et al 2018, and with it an increase in the potential for models which incorporate DNB data, such as the WA, to underreport total visible exposure. Ultimately, we find the WA can be used to make usable predictions of exposure to ALAN on the ground for ecological and epidemiological studies, and which could be further improved with the incorporation of additional satellite-based data sets with finer spatial and spectral resolutions.