Exposure measurement error in air pollution studies: A framework for assessing shared, multiplicative measurement error in ensemble learning estimates of nitrogen oxides

Background: Ensemble learning-based spatiotemporal models are increasingly used to estimate residential air pollution exposures in epidemiological studies. While these machine learning models typically perform better, they are still subject to the exposure measurement error inherent in all models. Our objective is to develop a framework to formally assess shared, multiplicative measurement error (SMME) in our previously published three-stage, ensemble learning-based nitrogen oxides (NOx) model and to identify its spatial and temporal patterns and predictors. Methods: By treating the ensembles as an external dosimetry system, we quantified shared and unshared, multiplicative and additive (SUMA) measurement error components in our exposure model. We used generalized additive models (GAMs) with a smooth term for location to identify geographic locations with significantly elevated SMME and to explain their spatial and temporal determinants. Results: We found evidence of significant shared and unshared multiplicative error (p < 0.0001) in our ensemble learning-based spatiotemporal NOx model predictions. Unshared multiplicative error was 26 times larger than SMME. We observed significant geographic (p < 0.0001) and temporal variation in SMME, with the largest share (43%) of predictions with elevated SMME occurring in the earliest time period (1992–2000). Densely populated urban prediction regions with complex air pollution sources generally exhibited the highest odds of elevated SMME. Conclusions: We developed a novel statistical framework to formally evaluate the magnitude and drivers of SMME in ensemble learning-based exposure models. Our framework can inform the development of improved exposure models.


Introduction
Exposure to traffic-related air pollution (TRAP) has repeatedly been associated with mortality and adverse health outcomes, including respiratory illnesses and cardiovascular disease, in large epidemiological cohort studies of children and adults (Zhang et al., 2002; Andersen et al., 2008; Gehring et al., 2010; Esposito et al., 2014; Ryan et al., 2005; Nordling et al., 2008; Chen et al., 2015; Rancière et al., 2017; Pollution HEIPotHEoT-RA, 2010). Nitrogen oxides (NOx), which are byproducts of fuel combustion, are one of the most commonly used measures of TRAP in epidemiological studies. NOx are also precursor gases in the secondary formation of ozone and particulate matter, air pollutants that are themselves implicated in adverse health effects (Rancière et al., 2017; Goldsmith and Kobzik, 1999; Khreis et al., 2017; Schwela, 2000). The highly reactive nature of NOx results in dynamic variability in space and time (Apte et al., 2017), limiting the utility of traditional exposure assessment methods that rely solely on interpolation from sparse central-site monitoring data or on land use regression, which typically suffer from poor spatial and poor temporal resolution, respectively (Sheppard et al., 2012). Similarly, crude spatially derived surrogates of TRAP, such as distance to roads or traffic density within buffers, often covary in space with potential confounders such as socioeconomic status, access to health care, or other environmental and psychosocial exposures (Pollution HEIPotHEoT-RA, 2010). Therefore, sophisticated spatiotemporal exposure models that incorporate machine learning techniques are increasingly being developed to more accurately predict residential TRAP exposures (and other complex spatially and temporally varying exposures) (Li et al., 2017; Russo and Soares, 2014; Di et al., 2016), given that 'gold standard' personal monitoring to capture 'true exposure' is often not feasible in large cohort studies.
However, spatial and temporal uncertainties inherent in these exposure models produce errors in exposure predictions that have a complex correlation structure; these errors are referred to as exposure measurement error. They can be categorized as independent (unshared) or dependent (shared).
Shared error can arise from shared uncertainties in exposure predictions caused by spatial and/or temporal misalignment of exposure predictors. For example, temperature is often included in spatiotemporal NOx exposure models, but it may not be available at the same spatial resolution as the predictions. When a reading from a single instrument is applied to all prediction points in a defined spatiotemporal grid, inaccuracies in that reading propagate into NOx measurement error. Shared Berkson error occurs if all, or groups of, prediction points within the defined spatiotemporal grid are misrepresented in the same way. Shared classical measurement error can occur when the average temperature across space or time is not the true average of all prediction points included in the defined spatiotemporal grid. Both scenarios violate the independence assumption between exposure and error that each error model relies on (independence from the measured exposure for Berkson error and from the true exposure for classical error). Shared error can be either classical-like or Berkson-like (Mallick et al., 2002) and results from spatial and/or temporal covariance between exposure predictions.
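The shared/unshared distinction can be made concrete with a toy simulation. This is a hypothetical illustration, not part of the NOx model: it assumes additive Berkson errors, two grid cells with two prediction points each, and unit error standard deviations, and `simulate_grid_berkson` is an illustrative helper name. Points in the same cell receive one common error draw per replicate (as with a single temperature reading applied to a whole cell), so their errors are correlated, while errors in different cells are not.

```python
import numpy as np


def simulate_grid_berkson(n_cells, pts_per_cell, sd_shared, sd_unshared, n_rep, rng):
    """Additive Berkson errors U = true - predicted for points grouped in grid cells.

    Each replicate draws one shared error per cell (copied to every point in
    that cell) plus an independent unshared error for every point.
    """
    shared = rng.normal(0.0, sd_shared, size=(n_rep, n_cells))
    shared = np.repeat(shared, pts_per_cell, axis=1)   # same draw for all points in a cell
    unshared = rng.normal(0.0, sd_unshared, size=(n_rep, n_cells * pts_per_cell))
    return shared + unshared


rng = np.random.default_rng(1)
U = simulate_grid_berkson(n_cells=2, pts_per_cell=2, sd_shared=1.0,
                          sd_unshared=1.0, n_rep=20000, rng=rng)
corr = np.corrcoef(U, rowvar=False)   # 4 x 4 error correlation matrix
# Points 0 and 1 share a cell; points 0 and 2 are in different cells.
print(round(corr[0, 1], 2), round(corr[0, 2], 2))
```

With equal shared and unshared standard deviations, the within-cell error correlation should be close to 0.5 and the between-cell correlation close to zero.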
Recently, our group developed a three-stage spatiotemporal modeling framework with ensemble learning and constrained optimization to model NOx concentrations in southern California for use in epidemiological studies of children's health (Li et al., 2017). In addition to a typical single-stage model in which a spatiotemporal mixed-effects model is fit, a second stage applies ensemble learning with bootstrap aggregation. This machine learning technique combines the output of hundreds of individual learners in a weighted fashion, which decreases the variance of the predictions (higher precision). Constrained optimization is then applied in a third stage to adjust the predictions to better reflect known physical and chemical constraints, improving overall accuracy and decreasing bias in the NOx exposure estimates. We have already demonstrated the improved performance of our modeling framework in predicting NOx exposures in southern California (R² = 0.86, RMSE = 13.4) (Li et al., 2017); however, we have not yet assessed the uncertainties inherent in these exposure predictions.
In the current work, we aim to formally evaluate the magnitude of shared and unshared, multiplicative and additive (SUMA) measurement error components in the predictions of our southern California NOx model (Li et al., 2017; 1992–2013) using a statistical dosimetry framework developed by Stram and Kopecky (2003). We extend this work by providing a framework to explain the geographic and temporal determinants of the shared multiplicative measurement error (SMME) component.

Methods
This investigation used NOx exposure predictions for the most recent cohort (E) of the southern California Children's Health Study (CHS) (Chen et al., 2015; Peters et al., 1999), which began enrolling participants in 2002, with prenatal periods starting in 1992. Information from longitudinal address confirmation, residential history questionnaires, and birth certificates was used to assemble lifetime residential histories for these participants, to which we assigned biweekly residential NOx exposures at high spatiotemporal resolution using the machine learning spatiotemporal NOx model described in detail in Li et al. (2017). Briefly, the model uses a flexible hierarchical framework with spatiotemporally referenced covariates and measurement data from both long-term routine monitoring stations with high temporal resolution and short-term, sporadic measurement campaigns with high spatial resolution. Temporal basis functions are fit to the long-term monitoring data using singular value decomposition to capture seasonality and longer-term temporal variation (Szpiro et al., 2010). Stage 1 of the model uses temporal parameters, long-term mean concentrations, and local spatial predictors, including CALINE4 line-dispersion NOx estimates (Benson, 1984), traffic density, distance to major roads, population density, and meteorological parameters (wind speed and minimum temperature), to model NOx concentrations. Spatial effects were specified both as random effects based on 500 m aggregate-distance Thiessen polygons and as nonparametric additive terms.
Stage 2 iteratively samples 90% of the predictors used in stage 1 and a random subset of 63% of the observations, testing each fit against the remaining 37% of the data set, to obtain 120 individual mixed-effects models (referred to as ensembles) that produce biweekly predictions. The estimates from the 120 ensembles are then averaged, weighted by model performance, to provide optimal NOx predictions across the distribution of the data that are robust against investigator bias arising from forced covariate inclusion and against inflated prediction variance (referred to as stage 2 NOx predictions). Stage 3 of the model uses the averaged stage 2 NOx estimates and constrains the parameter estimates of the temporal basis functions to re-predict exposure under physical constraints meant to mimic known or observed real-life behavior of NOx (e.g., a decreasing temporal trend of NOx over the study years, NO2 predictions lower than NOx predictions, and higher cool-season than warm-season concentrations). This third stage is known as constrained optimization, and its output is referred to as stage 3 NOx predictions (Li et al., 2017; Russo and Soares, 2014) (Fig. 1).
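The stage 2 resampling scheme can be sketched as follows. This is a minimal sketch under stated assumptions, not the Li et al. (2017) implementation: ordinary least squares stands in for the mixed-effects base learners, observations are subsampled without replacement, and each ensemble is weighted by the inverse of its held-out mean squared error, a hypothetical choice of performance weight. The function names are illustrative. The per-ensemble prediction matrix it returns is the kind of output that the dosimetry analysis treats as realizations.

```python
import numpy as np


def _fit_ols(X, y):
    """Least-squares fit with an intercept (stand-in for a mixed-effects learner)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def _predict(coef, X):
    return coef[0] + X @ coef[1:]


def stage2_ensemble(X, y, X_new, n_models=120, frac_pred=0.9, frac_obs=0.63, seed=0):
    """Return (weighted mean prediction, per-ensemble predictions) for X_new."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    k_pred = max(1, round(frac_pred * p))
    k_obs = round(frac_obs * n)
    preds, weights = [], []
    for _ in range(n_models):
        cols = rng.choice(p, size=k_pred, replace=False)   # ~90% of predictors
        rows = rng.choice(n, size=k_obs, replace=False)    # ~63% of observations
        held = np.setdiff1d(np.arange(n), rows)            # remaining ~37% for testing
        coef = _fit_ols(X[np.ix_(rows, cols)], y[rows])
        mse = np.mean((_predict(coef, X[np.ix_(held, cols)]) - y[held]) ** 2)
        preds.append(_predict(coef, X_new[:, cols]))
        weights.append(1.0 / mse)                           # hypothetical performance weight
    preds = np.asarray(preds)                               # (n_models, n_new)
    w = np.asarray(weights) / np.sum(weights)
    return w @ preds, preds
```

Because the weights are non-negative and sum to one, the combined prediction always lies between the smallest and largest ensemble predictions at each point.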

Using stage 2 ensembles as a dosimetry system
The stage 2 output of the 120 ensembles offers a unique opportunity to evaluate SUMA exposure measurement error. To quantify the various forms of measurement error, we treated the 120 ensemble predictions as 120 realizations generated by an external dosimetry system. In the radiation exposure literature, external dosimetry systems are used to reconstruct distributions of radiation dose by calculating and assessing radiation exposure from knowledge of the physical processes and sources of irradiation (Boyd, 2009). We reconstructed residential NOx exposure estimates in an analogous fashion. We assume that the 120 NOx ensembles are sampled from the distribution of true exposure. Each ensemble includes biweekly NOx exposure predictions for all CHS participants across their life course, and each SUMA component of exposure measurement error is quantified from these 120 ensembles. Because the ensembles are presumed to come from a distribution of true exposure given the known exposure determinants, adjustment for measurement error is based on a Berkson model.

Statistical analysis
Quantifying SUMA error components
All references to a NOx exposure prediction from here onward are to a two-week estimate for a given subject and location (denoted by $i$), unless otherwise noted. The SUMA model for shared and unshared Berkson error is written as follows:

$$X_i = \epsilon_{SM}\,\epsilon_{Mi}\,Z_i + \epsilon_{SA} + \epsilon_{Ai} \tag{1}$$

Here $X_i$ is the true exposure for the estimate of interest and $Z_i$ is the estimated exposure (a weighted mean of the ensembles). $\epsilon_{SM}$ and $\epsilon_{Mi}$ are the shared and unshared multiplicative errors, with mean equal to 1 and variances $\sigma_{SM}^2$ and $\sigma_M^2$, respectively, and $\epsilon_{SA}$ and $\epsilon_{Ai}$ are the shared and unshared additive errors, with mean equal to 0 and variances $\sigma_{SA}^2$ and $\sigma_A^2$, respectively.
Our focus in the remainder of the manuscript is primarily on the variance of the shared multiplicative error component ($\sigma_{SM}^2$), because this variance term primarily determines the behavior of variance estimates and confidence intervals for the slope in a standard regression analysis relating an exposure estimate W to an outcome D in an epidemiological investigation.
Assuming that each ensemble is a sample from the true distribution of exposure (Eq. (1)), Stram and Kopecky (2003) propose estimating the four variance terms $\sigma_{SM}^2$, $\sigma_M^2$, $\sigma_{SA}^2$, and $\sigma_A^2$ as follows.
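The regressions below follow from the first two moments of Eq. (1). A short derivation, assuming the four error terms are mutually independent and independent of $Z_i$ (treated as fixed), and using $E[\epsilon^2] = \sigma^2 + 1$ for a multiplicative error with mean 1:

$$
\begin{aligned}
\operatorname{Var}(X_i) &= Z_i^2\left(E[\epsilon_{SM}^2]\,E[\epsilon_{Mi}^2] - 1\right) + \sigma_{SA}^2 + \sigma_A^2
= \left[(\sigma_{SM}^2 + 1)(\sigma_M^2 + 1) - 1\right] Z_i^2 + \sigma_{SA}^2 + \sigma_A^2,\\
\operatorname{Cov}(X_i, X_j) &= Z_i Z_j\left(E[\epsilon_{SM}^2]\,E[\epsilon_{Mi}]\,E[\epsilon_{Mj}] - 1\right) + \sigma_{SA}^2
= \sigma_{SM}^2\, Z_i Z_j + \sigma_{SA}^2, \qquad i \neq j.
\end{aligned}
$$

The covariance is therefore linear in $Z_iZ_j$, with slope $\sigma_{SM}^2$ and intercept $\sigma_{SA}^2$, and the variance is linear in $Z_i^2$; these are the relationships the two OLS regressions exploit.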

Shared measurement error
For each pair of NOx predictions $i$ and $j$ ($i \neq j$), we calculated the covariance of the realized values of $X_i$ and $X_j$ over the 120 ensembles and called this covariance term $C_{ij}$. We also calculated $Z_i$ and $Z_j$ as the means of the realized values of $X_i$ and $X_j$ (the stage 2 exposure predictions described earlier). Next, we performed simple ordinary least squares (OLS) regression of $C_{ij}$ on the product $Z_iZ_j$ to fit the model

$$C_{ij} = a_0 + a_1 Z_i Z_j + e_{ij},$$

where $e_{ij}$ is a residual. Stram and Kopecky note that the intercept $a_0$ of this regression corresponds to $\hat{\sigma}_{SA}^2$, an estimate of $\sigma_{SA}^2$, while the slope $a_1$ corresponds to $\hat{\sigma}_{SM}^2$, an estimate of $\sigma_{SM}^2$.

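Unshared measurement error
Similarly, we calculated the variance of each $X_i$ across ensembles, $V_i$, which is shown to equal the following (Stram and Kopecky, 2003):

$$V_i = \left[(\sigma_{SM}^2 + 1)(\sigma_M^2 + 1) - 1\right] Z_i^2 + \sigma_{SA}^2 + \sigma_A^2.$$

We then used simple OLS regression of $V_i$ on $Z_i^2$, which estimates $\sigma_{SA}^2 + \sigma_A^2$ (the intercept) and $(\sigma_{SM}^2 + 1)(\sigma_M^2 + 1) - 1$ (the slope). Combined with $\hat{\sigma}_{SA}^2$ and $\hat{\sigma}_{SM}^2$ from the covariance regression, these allow us to solve for $\hat{\sigma}_M^2$, an estimate of $\sigma_M^2$, and $\hat{\sigma}_A^2$, an estimate of $\sigma_A^2$.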
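Both regressions can be sketched with NumPy. This is a minimal sketch, not the authors' code: it uses an unweighted mean across realizations for $Z_i$ (the stage 2 predictions are performance-weighted), includes every pair $i < j$ once, and checks itself on data simulated from Eq. (1) with lognormal multiplicative errors, an assumption made only to obtain mean-one errors with a chosen variance. `simulate_suma` and `estimate_suma` are hypothetical helper names.

```python
import numpy as np


def simulate_suma(z_true, n_real, s2_sm, s2_m, s2_sa, s2_a, rng):
    """Draw n_real realizations (n_real x n) of X_i = e_SM * e_Mi * Z_i + e_SA + e_Ai."""
    n = z_true.size

    def mult_err(var, size):
        # Lognormal error with mean 1 and the requested variance (sketch assumption).
        s2 = np.log1p(var)
        return rng.lognormal(mean=-s2 / 2, sigma=np.sqrt(s2), size=size)

    e_sm = mult_err(s2_sm, (n_real, 1))                  # one shared draw per realization
    e_m = mult_err(s2_m, (n_real, n))                    # independent draw per prediction
    e_sa = rng.normal(0.0, np.sqrt(s2_sa), (n_real, 1))
    e_a = rng.normal(0.0, np.sqrt(s2_a), (n_real, n))
    return e_sm * e_m * z_true + e_sa + e_a


def estimate_suma(X):
    """Estimate the four SUMA variance components from realizations X (n_real x n)."""
    z = X.mean(axis=0)                        # unweighted mean across realizations
    cov = np.cov(X, rowvar=False)             # C_ij off the diagonal, V_i on it
    iu = np.triu_indices(z.size, k=1)         # each pair i < j once
    a1, a0 = np.polyfit(np.outer(z, z)[iu], cov[iu], deg=1)   # C_ij on Z_i Z_j
    b1, b0 = np.polyfit(z ** 2, np.diag(cov), deg=1)          # V_i on Z_i^2
    s2_sm, s2_sa = a1, a0
    s2_m = (b1 + 1.0) / (s2_sm + 1.0) - 1.0   # solve the slope relation for sigma2_M
    s2_a = b0 - s2_sa                         # solve the intercept relation for sigma2_A
    return {"sigma2_SM": s2_sm, "sigma2_SA": s2_sa,
            "sigma2_M": s2_m, "sigma2_A": s2_a}


if __name__ == "__main__":
    rng = np.random.default_rng(2024)
    z_true = rng.uniform(10, 80, size=150)
    X = simulate_suma(z_true, n_real=4000, s2_sm=0.01, s2_m=0.05,
                      s2_sa=1.0, s2_a=4.0, rng=rng)
    print(estimate_suma(X))
```

The example uses 4000 simulated realizations only to show that the estimator recovers known values. With 120 realizations, as in the stage 2 output, the sample variance of the shared draws is itself noisy, so a single estimate of $\sigma_{SM}^2$ carries appreciable sampling uncertainty.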
Because these calculations are computationally intensive and time-consuming, a random subset of 2500 NOx predictions was selected for SUMA error quantification. To confirm that this sample was representative of our model and that random sampling introduced no bias, 10 additional random samples were drawn (11 in total) and the above analysis was repeated to confirm the robustness of the results. We further compared the distributions of temporal and geographic characteristics of the sampled predictions with those of the full set of NOx exposure predictions.
Spatial and temporal determinants of 'high' shared multiplicative measurement error (SMME)
Defining 'high' SMME for each prediction
For each prediction $i$, we calculated the "mean covariance" as the mean of $C_{ij}$ over all other predictions $j \neq i$, where the covariance is computed from the products $(Z_i - E(Z))(Z_j - E(Z))$. We expect that a prediction that consistently covaries with other predictions will yield an elevated mean covariance, indicating increased shared uncertainty, whereas a prediction that covaries with few other predictions will yield a low mean covariance, indicating decreased shared uncertainty. Based on the bimodality observed in the distribution of mean covariances and a visual inspection of the plotted covariances and product means (Fig. 2), we dichotomized SMME at the 80th percentile: each prediction was classified as "high" SMME (top 20% of mean covariances) or "low" SMME (below the 80th percentile).
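A minimal sketch of this dichotomization, assuming the mean covariance of prediction $i$ is the average of the ensemble covariances $C_{ij}$ over all $j \neq i$ and that values exactly at the cutoff are classified as high; `classify_smme` is a hypothetical helper name.

```python
import numpy as np


def classify_smme(X, q=80.0):
    """Return each prediction's mean covariance and a boolean 'high SMME' flag.

    X is an (n_realizations x n_predictions) array of ensemble predictions.
    """
    cov = np.cov(X, rowvar=False)                            # C_ij across realizations
    n = cov.shape[0]
    mean_cov = (cov.sum(axis=1) - np.diag(cov)) / (n - 1)    # exclude the i == j term
    cutoff = np.percentile(mean_cov, q)                      # 80th percentile by default
    return mean_cov, mean_cov >= cutoff
```

For 2500 predictions the full covariance matrix has about 6.25 million entries (roughly 50 MB in double precision), so computing it directly is feasible.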
Exposure model inputs and additional spatiotemporal parameters were summarized and compared between the low and high SMME groups to identify factors that differed significantly between locations with low and high SMME.

Temporal analysis
To assess temporal trends in SMME, the analyses were repeated stratified by time, defined as tertiles of calendar year: 1992–2000, 2001–2004, and 2005–2012. For each time period, a new random sample of 2500 NOx predictions was selected, and SMME was calculated and compared across periods.

Spatial analysis
Generalized additive models (GAMs) with a smooth term for location were used to assess spatial variability of SMME (Girguis et al., 2016). The following GAM was fit to model the odds of high SMME (with low SMME as the reference group):

$$\operatorname{logit}[p(x_1, y_1)] = s(x_1, y_1) + \gamma'\mathbf{z} \tag{4}$$

where $\operatorname{logit}[p(x_1, y_1)]$ is the log-odds of high SMME at location $(x_1, y_1)$, $s(x_1, y_1)$ is a bivariate locally weighted scatterplot smoothing (loess) function capturing the contribution of geographic location, and $\gamma'\mathbf{z}$ is the linear term for the vector $\mathbf{z}$ of spatial and/or temporal predictors explored in the model, with coefficient vector $\gamma$. Odds of high SMME were predicted across a grid of evenly spaced points constrained to the geographic extent of CHS lifetime residential locations in southern California (as NOx predictions were made only in southern California). For each grid point, a confidence band with α = 5 × 10⁻⁷ (determined by false discovery rate correction) was calculated to identify areas of statistically increased or decreased SMME. An unadjusted GAM with only a term for location was used to determine whether high SMME varied spatially. GAMs were then run iteratively, adding a single predictor at a time, to assess the importance of each predictor in explaining the spatial variability of high SMME. Predictors were included in the final model if (a) they significantly altered the spatial pattern of SMME or (b) they changed the range (minimum and maximum odds ratios) of the SMME odds that remained unexplained after their inclusion.
The following predictors were considered for inclusion in the GAM: NOx measures (including spatiotemporal predictions and ambient monitoring station measurements), traffic measures (including traffic density and distance to the nearest road by class, with FCC1 through FCC4 roads defined as freeways, arterial roads, collector-distributor roads, and local roads, respectively), meteorological measures (including minimum temperature and wind speed), time (categorical and continuous), and other geographic variables (including distance to shore and population density). See Table A2 for a full list of variables and descriptions. To determine each predictor's influence on the spatial pattern of SMME, we visually examined whether (1) the geographic locations with statistically significant SMME shifted or changed, (2) the pattern of SMME risk changed, and (3) the range of SMME risk across the study area (maximum and minimum odds ratios across space) changed.

Results
Characteristics of predicted NOx exposures and key spatiotemporal model predictors for the complete CHS cohort E lifetime residential histories and for a random sample of 2500 points are compared in Table 1. The distributions of geographic and temporal characteristics in the random sample and the entire dataset were similar, confirming the representativeness of the random sample. For both the full set of CHS prediction points and the random sample, approximately 85% were located farther than 300 m from major roadways (FCC1).
To quantify SUMA error, we calculated the covariances, product means, variances, and squared means from the random sample of exposure predictions; their distributions are shown in Table 2. The SUMA error components estimated by OLS regression are displayed in Table 3. The slope of the regression of covariance on product mean is statistically significant (p < 0.00001), indicating an SMME value of 0.00029. The intercept, corresponding to the shared additive error, is less than zero (−0.2516), indicating no evidence of shared additive error. Similarly, in the unshared error analysis (OLS regression of the variance on the squared means), the intercept is less than zero, indicating no evidence of unshared additive error. Although the additive error components (variances) are estimated to be negative, Figs. 2 and 3 show that the discrepancy between the nominal values of the additive variances and zero is very small. After setting the additive error terms ($\sigma_A^2$ and $\sigma_{SA}^2$) to zero and solving Eq. (2), we calculated the unshared multiplicative error as 0.00751, approximately 26 times larger than the shared multiplicative component.
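The reported estimates can be checked with simple arithmetic. The sketch below reproduces the ratio quoted above and the slope of the variance regression that the two multiplicative estimates imply once the additive terms are set to zero; the implied slope is computed here from the reported values and is not itself a reported result.

```python
# Reported estimates from the text above.
sigma2_sm = 0.00029   # shared multiplicative error variance (SMME)
sigma2_m = 0.00751    # unshared multiplicative error variance

# Ratio quoted in the text ("approximately 26 times larger").
ratio = sigma2_m / sigma2_sm

# Slope of V_i on Z_i^2 implied by these values: (sigma2_SM + 1)(sigma2_M + 1) - 1.
implied_slope = (sigma2_sm + 1) * (sigma2_m + 1) - 1

# Inverting the same relation recovers sigma2_M from that slope.
recovered_m = (implied_slope + 1) / (sigma2_sm + 1) - 1

print(f"{ratio:.1f} {implied_slope:.5f} {recovered_m:.5f}")  # 25.9 0.00780 0.00751
```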
The plot of the covariances and the product means (Fig. 2) reveals the presence of two distinct SMME groups: predictions without shared additive and multiplicative error (intercept and slope around zero) and predictions with highly covarying exposure predictions across replications that display evidence of SMME.
To quantify SMME and examine how it changes over time, a time-stratified analysis was conducted (Table 4). Although the magnitude of error decreased across time periods, two distinct SMME groups were consistently observed in every period (Fig. 4).
Spatial analyses using the unadjusted GAM (with only the smooth term for location) showed significant associations between geographic location and the covariance distributions (p < 0.0001). The maps indicate that the odds of high mean covariance, which represents high SMME (compared with low, classified at the 80th percentile of the distribution), ranged from 0.34 to 2.07 across the entire CHS study area. Areas with statistically significantly elevated (hot) or reduced (cold) odds of high SMME are outlined with black contour lines in Fig. 5 (color indicates the predicted odds of high SMME at each location). The greatest risk of high SMME is observed along the southern California coastline.
Geographic and temporal variables were added to the model one at a time to explain the observed spatial variability. The final model included predictors that altered spatial patterns or changed the range of the odds ratios by 10% or more. The final model that best explained the spatial variability in the odds of high SMME included population density, traffic density, CALINE4 non-freeway NOx, calendar year (categorized into tertiles), and distance to the nearest major airport (defined as the top five class 1 airports in the study region). The odds ratio (OR) range narrowed to 0.50–1.56, and most of the spatial variability in SMME risk was explained by the included predictors (Fig. 5b). Only a few locations remained significantly elevated without being fully explained. Adjusted GAM results are shown in Table 5 for an interquartile range increase in each predictor. Distance to a major airport was the strongest predictor of SMME: predictions located 0–15 km from a major airport had 1.15 times the odds (95% confidence interval (CI): 1.10, 1.23) of high SMME compared with predictions located more than 15 km from a major airport. NOx predictions in years after 2000 had decreased odds of high SMME compared with predictions from 1992 to 2000 (Table 5).
Although predictions located in the city of Long Beach make up only 6% of the random CHS sample, the largest proportion (23%) of high-covariance exposure predictions was found in Long Beach, followed by Anaheim, Riverside, and San Bernardino (8% each) (Table A1). This pattern was consistent across all 11 random samples (the original sample and 10 repeats). Therefore, to evaluate the patterns and predictors of SMME in Long Beach separately, a new random sample of 2500 exposure predictions was drawn from predictions within Long Beach.
Using this Long Beach subsample to calculate the SUMA components, we found an SMME value of 0.0021, about seven times larger than the value calculated for the entire CHS cohort. Exposure model inputs and other NOx-related predictors were compared between "high" SMME predictions (mean covariance in the upper 20% of the Long Beach covariance distribution) and "low" SMME predictions (mean covariance in the lower 80%) to identify distinguishing characteristics (Table 6). High SMME predictions had elevated ambient NOx levels, as measured at regional monitoring stations and predicted by the stage 2 NOx model. Interestingly, high SMME predictions had higher CALINE4 non-freeway NOx but lower CALINE4 freeway NOx than low SMME predictions. Compared with low SMME predictions, high SMME predictions were characterized by higher population density; closer proximity to FCC2 and FCC3 roads but greater distance from FCC1 and FCC4 roads; closer proximity to the shoreline; a greater heavy-duty vehicle (HDV) fraction on nearby FCC1 and FCC2 roads; lower average temperatures; and slightly higher average wind speeds. Elevation did not differ between high and low SMME predictions.
Examining temporal trends in SMME in Long Beach (Table 7), we found that the greatest proportions of NOx predictions with high SMME occurred in the cooler months of winter (39.5%) and fall (35.8%), whereas the majority of low SMME predictions occurred in spring (27.5%) and summer (28.9%). As in the analysis of the entire CHS, the highest proportion (43.4%) of high SMME predictions in Long Beach occurred in the earliest time period, 1992–2000.
The GAM analysis restricted to Long Beach showed significant associations between geographic location and the odds of high SMME (p < 0.0001). Maps indicate that NOx predictions with elevated odds of high SMME were located in specific regions of southwestern and north Long Beach (Fig. 6). The spatial predictors that best explained the geographic variability in the odds of high SMME in Long Beach were CALINE4 non-freeway NOx, population density, and traffic density on FCC2 roads (Table A3). After adjustment for these predictors, odds of high SMME in southwestern Long Beach were no longer elevated, and fewer locations in north Long Beach remained significantly elevated. Geographic variation was fully explained only after prediction year was added to the model, which narrowed the OR range from 0.49–2.03 to 0.67–1.51. Some locations still had elevated odds of high SMME, but these were not statistically significant (Fig. 6).

Discussion
We recently developed a three-stage NOx spatiotemporal modeling framework to predict exposures at high spatial and temporal resolution for use in CHS epidemiological analyses. The use of ensemble learning to reduce the variance and minimize the bias of exposure predictions in this model is expected to reduce overall exposure measurement error; however, as with all exposure models, such error cannot be fully eliminated. Using the Stram and Kopecky (2003) framework, we quantified the SUMA error components in the Li et al. (2017) model predictions. In a random sample representative of the entire data set, we found evidence of both shared and unshared multiplicative error but no evidence of shared or unshared additive error. The most influential predictors of the odds of high SMME were year of exposure prediction (earlier years had higher error), distance to the nearest major airport, and non-freeway NOx concentrations. Overall, unshared multiplicative error was greater in magnitude than SMME across the full geographic extent of CHS prediction points, but further analysis identified specific geographic regions with relatively high shared multiplicative error. The city of Long Beach, CA, consistently had the highest proportion of NOx predictions with high SMME across repeated random draws of the data.
We found spatial and temporal patterns in the distribution of SMME. SMME was significantly greater in the earliest years (1992–2000) than in later years (2001 onward). This decreasing temporal pattern in uncertainty is common in retrospective exposure reconstructions (Hoffmann et al., 2018) and may reflect measurement methods improving or changing over time (for example, a shift from Palmes tubes to Ogawa badges for passive NOx monitoring). The data underlying the model inputs or covariates may also have become more accurate or complete over time; for example, accurately capturing NOx emissions before 2000 is much more challenging because traffic volume and road network data are sparser. Given the observed time trend, our findings indicate that higher NOx exposure predictions (which also occurred in earlier years) are prone to greater uncertainty. Other work has found that when the magnitude and uncertainty of exposure are correlated, the exposure-response curve is notably attenuated at high exposure values (Steenland et al., 2015); we did not formally test this in the present analysis.
In addition to year of prediction, geographic location and other spatially dependent predictors also influenced exposure uncertainty. The comparison of covariate distributions between areas of high and low SMME indicates that measurement error is likely associated with non-freeway sources, or with sources and features found farther from freeways. We saw higher uncertainty in predictions located near smaller roads (FCC2 and FCC3) and lower SMME in predictions located near freeways (FCC1). Interestingly, more uncertainty was found in locations with higher heavy-duty vehicle fractions on FCC2 roads. FCC2 roads are similar to FCC3 roads in that they are state-numbered highways with stop-and-go traffic, with volumes greater than FCC3 roads but lower than FCC1 roads (for example, Pacific Coast Highway, also known as Route 1, is an FCC2 road in southern California). Although further analysis is needed, these findings indicate that the exposure model does not adequately capture NOx emissions from FCC2 roads, and more specifically from heavy-duty vehicles on these roads. This conclusion is further supported by the large proportion of SMME observed among predictions located in Long Beach, CA, a community with the busiest port in the nation and therefore a high proportion of heavy-duty vehicles. Whereas some CHS communities have no FCC2 roads and most have only one, Long Beach includes three. Our findings support the importance of accounting for local NOx sources and fine-scale spatial variability in exposure prediction models, especially in regions with complex NOx sources and dense development.
Distance to a major airport, defined as one of the top five busiest airports in the study region, was an important predictor of SMME for all CHS locations but was not influential in the Long Beach-only analysis. Beyond light- and heavy-duty vehicular NOx emissions on roads, our exposure model did not account for airports, although they are a major source of NOx emissions, not only because of increased vehicular traffic nearby but also because of idling planes and jets, takeoff and landing activity, and vehicle operations within airport boundaries (Schlenker and Walker, 2015). In our spatial analysis, we found elevated odds of high SMME in locations near Los Angeles International Airport and San Diego International Airport. The influence of smaller airports within the region was formally tested in a sensitivity analysis of the GAMs, but smaller airports did not influence the spatial variability or magnitude of SMME risk. We suspect that smaller airports were not important predictors because our exposure prediction model spans 1992 to 2012 and operations at smaller (Class 1) airports have increased only recently. Long Beach, a densely populated urban area with complex NOx source mixtures, has a single local airport and a large shipping port. Consequently, there is little variability in distance to the centrally located Long Beach Airport in this city-specific analysis, and airport operations were not consistent throughout the study period.
Although the estimated shared additive error term was larger in absolute magnitude than SMME, we focused our analysis on SMME because other work has indicated that shared additive error has minimal influence on epidemiological results in a Berkson model (Zhang et al., 2017). Shared error differs from traditional measurement error in that the errors are not independent, which is common in air pollution exposure models because (1) model covariates are usually aggregated in time and space and (2) air pollution exhibits finely resolved variability in time and space.
The SUMA method classifies "within" and "between" measurement error as unshared and shared error, respectively. One shortcoming of the SUMA approach is that it does not account for "within shared error," defined as shared uncertainties among predictions made at the same or a nearby geographic location over time. SUMA methods also do not account for "between shared error" attributable to time. For example, predictions made in the same year and month will share uncertainties. Previous simulation studies found that shared error within predictions resulted in greater bias than shared error between predictions (Hoffmann et al., 2018). We hope to extend SUMA models to enable classification of within and between shared errors in future work.
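One way to operationalize the within/between distinction is to label pairs of predictions. The sketch below is hypothetical and not part of the SUMA method. `Prediction`, `classify_pair`, `tally_pairs` and the `radius_km` proximity threshold are illustrative names, and locations are assumed to be given in projected kilometre coordinates.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Iterable, NamedTuple


class Prediction(NamedTuple):
    """Hypothetical prediction record: projected coordinates (km) and timing."""
    x_km: float
    y_km: float
    year: int
    month: int


def classify_pair(p: Prediction, q: Prediction, radius_km: float = 1.0) -> str:
    """Assign a pair of predictions to a candidate shared-error class.

    'within'       : same or nearby location (distance <= radius_km), any time
    'between_time' : different locations predicted in the same year and month
    'unclassified' : neither condition holds
    """
    distance = math.hypot(p.x_km - q.x_km, p.y_km - q.y_km)
    if distance <= radius_km:
        return "within"
    if (p.year, p.month) == (q.year, q.month):
        return "between_time"
    return "unclassified"


def tally_pairs(preds: Iterable[Prediction], radius_km: float = 1.0) -> Counter:
    """Count prediction pairs in each class (O(n^2) pairs, so sample first)."""
    return Counter(classify_pair(p, q, radius_km) for p, q in combinations(preds, 2))


preds = [
    Prediction(0.0, 0.0, 1995, 6),
    Prediction(0.5, 0.0, 2005, 1),    # near the first point, a decade later
    Prediction(20.0, 5.0, 1995, 6),   # far from it, same year and month
    Prediction(40.0, 40.0, 2010, 3),  # unrelated location and time
]
counts = tally_pairs(preds)
# counts: 1 'within', 1 'between_time', 4 'unclassified'
```

The proximity radius is a free choice in this sketch. Shrinking it moves nearby pairs out of the "within" class.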
In this work, we treat the 120 ensemble estimates as 120 realizations of a dosimetry system. The dosimetry system assumes that the realizations are generated from a random sample of normally distributed true exposures. In our application, parallel ensembles are generated using subsets of prediction points and covariates, and these subsets explain the variability among the 120 ensemble exposure estimates. Parallel ensembles take full advantage of the independence between base learners (Kotsiantis et al., 2007). Given a single set of covariates, the ensembles represent a random sample of possible exposure predictions from the distribution of possible prediction models. However, the weight given to each ensemble in producing the stage 2 output depends on its model performance.
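Treating the ensembles as realizations reduces the error analysis to moments computed across the realization axis. The following is a minimal NumPy sketch. It assumes the stage 2 estimates are stored as an array with one row per ensemble realization (120 in this application) and one column per prediction. The function name is hypothetical.

```python
import numpy as np


def ensemble_moments(ensembles: np.ndarray):
    """Summarize an (n_realizations, n_predictions) array of ensemble estimates.

    Returns the per-prediction mean, the per-prediction variance across
    realizations, and the (n_predictions x n_predictions) covariance matrix
    whose off-diagonal entries carry the shared-error signal.
    """
    if ensembles.ndim != 2 or ensembles.shape[0] < 2:
        raise ValueError("expected a 2-D array with at least two realizations")
    means = ensembles.mean(axis=0)
    variances = ensembles.var(axis=0, ddof=1)   # unbiased, consistent with np.cov
    cov = np.cov(ensembles, rowvar=False)       # columns are treated as variables
    return means, variances, cov


# Toy example: 3 realizations of 2 predictions.
toy = np.array([[1.0, 2.0],
                [2.0, 4.0],
                [3.0, 6.0]])
means, variances, cov = ensemble_moments(toy)
# means -> [2., 4.]; variances -> [1., 4.]; cov -> [[1., 2.], [2., 4.]]
```

The diagonal of `cov` equals `variances`, which is the unshared-error input. The off-diagonal entries are the pairwise covariances used to study shared error.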
One limitation of our spatiotemporal error analysis is its reliance on the average covariance of each prediction to identify high SMME. Covariance measures how two variables vary together. To dichotomize SMME as high or low, we used the average of each prediction's covariances with all other predictions. Because covariances are unstandardized, the observed spatiotemporal patterns may be an artifact of absolute NOx values, since high NOx predictions are likely to have higher covariances. We assumed that a cut-off at the 80th percentile of average covariances would capture predictions with unusually and consistently high covariances with other predictions. This definition captured some predictions with high absolute NOx concentrations, but it also classified some low NOx predictions as having high SMME.
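The dichotomization described above can be sketched as follows. As the text suggests ("with all other predictions"), we assume each prediction's own variance (the diagonal) is excluded from its average. The function name and the inclusive `>=` comparison at the cut-off are illustrative choices.

```python
import numpy as np


def flag_high_smme(cov: np.ndarray, percentile: float = 80.0) -> np.ndarray:
    """Flag predictions whose average covariance with all others is high.

    For each prediction i, cov[i, j] is averaged over j != i. The diagonal
    holds variances, which also reflect unshared error, so it is excluded.
    Predictions at or above the given percentile of these averages are flagged.
    """
    n = cov.shape[0]
    off_diag_sum = cov.sum(axis=1) - np.diag(cov)
    avg_cov = off_diag_sum / (n - 1)
    cutoff = np.percentile(avg_cov, percentile)
    return avg_cov >= cutoff


# Toy covariance matrix in which every covariance is proportional to the
# product of the two predictions' means (hypothetical NOx means in ppb).
m = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
cov = 0.01 * np.outer(m, m)
flags = flag_high_smme(cov)
# flags -> [False, False, False, False, True]
```

In this toy matrix the flag simply follows the highest-mean prediction. This is the scale dependence of unstandardized covariances noted above.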
In this analysis, we selected a sample of 2500 (approximately 0.1%) of the 1,850,415 available exposure predictions. Because the analysis manipulates large covariance matrices, this sample size was chosen arbitrarily to accommodate computational capacity and run time. Given the small proportion of predictions sampled, we compared the spatial and temporal distributions of the random sample with those of the entire prediction population. The sample was spatially and temporally representative (Table 1). To assess possible selection bias resulting from our sampling method, we drew 10 additional random samples. SUMA error magnitude was robust across samples (Table A4). We encourage future analyses of this type to ensure that samples are spatially and temporally representative of the universe of exposure predictions.
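The arithmetic behind the computational constraint is easy to check. The sketch below assumes dense float64 storage. It shows that a full covariance matrix of all predictions would need roughly 27 TB, compared with 50 MB for the sample. The period codes are simulated placeholders rather than CHS data, and the function names are hypothetical.

```python
import numpy as np

N_FULL = 1_850_415        # exposure predictions available (from the text)
N_SAMPLE = 2_500          # predictions sampled for the covariance analysis
BYTES_PER_FLOAT64 = 8


def cov_matrix_bytes(n: int) -> int:
    """Memory for a dense n x n float64 covariance matrix."""
    return n * n * BYTES_PER_FLOAT64


def draw_sample(n_full: int = N_FULL, n_sample: int = N_SAMPLE, seed: int = 0) -> np.ndarray:
    """Simple random sample of prediction indices, without replacement."""
    rng = np.random.default_rng(seed)
    return rng.choice(n_full, size=n_sample, replace=False)


def group_shares(labels: np.ndarray) -> dict:
    """Proportion of observations in each category (e.g., time period)."""
    values, counts = np.unique(labels, return_counts=True)
    return dict(zip(values.tolist(), (counts / counts.sum()).tolist()))


sample_fraction = N_SAMPLE / N_FULL             # ~0.00135, i.e. ~0.1%
full_gb = cov_matrix_bytes(N_FULL) / 1e9        # ~27,392 GB for all predictions
sample_mb = cov_matrix_bytes(N_SAMPLE) / 1e6    # 50 MB for the sample

# Representativeness check against hypothetical period codes (0, 1, 2);
# real use would compare actual time periods and geographic regions.
periods = np.random.default_rng(42).integers(0, 3, size=N_FULL)
idx = draw_sample()
population_shares = group_shares(periods)
sample_shares = group_shares(periods[idx])

# Ten additional samples, mirroring the robustness check in the text.
extra_samples = [draw_sample(seed=s) for s in range(1, 11)]
```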
In this paper, we developed a statistical framework to quantify the different components of measurement error in NOx predictions from our previously published spatiotemporal exposure model (Li et al., 2017). This demonstrates that the Stram and Kopecky (2003) radiation dosimetry framework can be applied to air pollution. We also explained the spatial (geographic) and temporal variability in the odds of observing high shared, multiplicative measurement error, the component most commonly seen in air pollution investigations. Our work highlights the ability to use ensembles in the evaluation of SUMA error and sets up a framework to evaluate potential factors that might be responsible for exposure uncertainties. Our methods can help improve the development of future exposure models in two ways. They can highlight areas in space or periods in time where more refined data or methods are needed, and they can shed light on potentially important inputs or predictors that might be overlooked. Further, characterization of exposure errors can be used to improve confidence in epidemiological inference (Hoffmann et al., 2018). This can be done through adjustment of confidence intervals to account for SMME (Stram and Kopecky, 2003) or attenuation of the dose-response curve (Stram et al., 2015). Given the importance of this work to exposure science and environmental epidemiology, our follow-up work will focus on assessing the impact of SUMA exposure error on epidemiological health estimates and on methods for adjusting them accordingly.

Average NOx (ppb) for southern California Children's Health Study (CHS) residential locations, 1992–2012. Average NOx was estimated using stage 3 of the Li et al. (2017) model. This stage uses the averaged stage 2 NOx estimates and constrained optimization to re-predict exposure based on physical constraints meant to mimic known or observed real-life behavior of NOx.
Average NOx for each unique CHS location is displayed using quantiles (6).

Fig. 2.
Scatter plot of covariance versus product of means to visualize shared exposure measurement error. The covariance and the product of means for each pair of predictions are used to demonstrate shared error. The intercept of the ordinary least squares regression line fitted to the data is −0.2516, with a slope of 0.000029. The negative intercept indicates no evidence of additive shared error, and the significant slope (p < 0.0001) indicates significant multiplicative shared error.
Girguis et al. Page 17 Environ Int. Author manuscript; available in PMC 2020 April 01.
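Reproducing the Fig. 2 diagnostic on simulated data shows what the slope and intercept estimate. The sketch uses a simplified model assumed here, Z[r, i] = mu[i] * S[r] * U[r, i] + A[r] + a[r, i]. The shared factors S and A are drawn once per realization, the unshared factors U and a are drawn per prediction, and all errors are normal. `simulate_suma`, `shared_error_fit` and all parameter values are hypothetical. The example uses 5000 realizations instead of 120 only to reduce Monte Carlo noise.

```python
import numpy as np


def simulate_suma(mu, n_real=120, sd_sm=0.05, sd_m=0.10, sd_sa=0.0, sd_a=0.0, seed=0):
    """Simulate ensemble realizations under an assumed SUMA-type error model.

    Z[r, i] = mu[i] * S[r] * U[r, i] + A[r] + a[r, i]. S and A are drawn once
    per realization (shared by all predictions); U and a are drawn per
    prediction (unshared). Normal draws and default SDs are illustrative.
    """
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu, dtype=float)
    S = rng.normal(1.0, sd_sm, size=(n_real, 1))
    U = rng.normal(1.0, sd_m, size=(n_real, mu.size))
    A = rng.normal(0.0, sd_sa, size=(n_real, 1))
    a = rng.normal(0.0, sd_a, size=(n_real, mu.size))
    return mu * S * U + A + a


def shared_error_fit(Z):
    """OLS of pairwise covariance on product of means, over pairs i < j.

    Returns (intercept, slope): estimates of the shared additive and shared
    multiplicative error variances under the assumed model.
    """
    means = Z.mean(axis=0)
    cov = np.cov(Z, rowvar=False)
    i, j = np.triu_indices(Z.shape[1], k=1)   # off-diagonal pairs only
    slope, intercept = np.polyfit(means[i] * means[j], cov[i, j], deg=1)
    return intercept, slope


mu = np.linspace(10, 80, 200)   # hypothetical true NOx means (ppb)
Z = simulate_suma(mu, n_real=5000)
intercept, slope = shared_error_fit(Z)
# slope should be near sd_sm**2 = 0.0025, not inflated by the unshared sd_m
```

The unshared factor U is independent across predictions. It therefore contributes nothing to the expected off-diagonal covariances, so the fitted slope tracks sd_sm squared even though sd_m is twice as large.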

Fig. 3.
Scatter plot of prediction variance versus square of mean to visualize unshared exposure measurement error. The variance and the square of the mean of each prediction across the 120 ensembles are used to demonstrate unshared error. The intercept of the ordinary least squares regression line fitted to the data is −5.39, with a slope of 0.0078. The negative intercept indicates no evidence of additive unshared error, and the significant slope (p < 0.0001) indicates significant multiplicative unshared error.
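The Fig. 3 diagnostic can be sketched in the same way, by regressing per-prediction variance on the squared mean. The simulation below assumes purely multiplicative errors. A shared factor with SD 0.05 is drawn once per realization, and an unshared factor with SD 0.10 is drawn per prediction. Names and values are hypothetical. Under this assumed model, the expected slope combines both multiplicative variances, 0.05² + 0.10² + (0.05 × 0.10)² = 0.012525. It is therefore dominated by the unshared part when that part is much larger.

```python
import numpy as np


def unshared_error_fit(Z):
    """OLS of per-prediction variance on squared mean across realizations.

    Z has one row per realization and one column per prediction.
    Returns (intercept, slope).
    """
    means = Z.mean(axis=0)
    variances = Z.var(axis=0, ddof=1)
    slope, intercept = np.polyfit(means ** 2, variances, deg=1)
    return intercept, slope


# Illustrative simulation with purely multiplicative errors (no additive terms).
rng = np.random.default_rng(1)
mu = np.linspace(10, 80, 200)                        # hypothetical true NOx means (ppb)
n_real = 5000                                        # more than 120 to reduce noise
S = rng.normal(1.0, 0.05, size=(n_real, 1))          # shared multiplicative factor
U = rng.normal(1.0, 0.10, size=(n_real, mu.size))    # unshared multiplicative factor
Z = mu * S * U
intercept, slope = unshared_error_fit(Z)
# Expected slope under this model: 0.05**2 + 0.10**2 + (0.05 * 0.10)**2 = 0.012525
```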

Fig. 5.
Spatial pattern of the odds of high shared multiplicative exposure measurement error (SMME) in spatiotemporal NOx predictions for the full southern California Children's Health Study (CHS) Cohort E residential histories: (a) unadjusted (crude) model and (b) fully adjusted model. High SMME is defined by a cut-off at the 80th percentile of the average covariance distribution at each unique prediction location. In the fully adjusted model, the odds of SMME are adjusted for population density, traffic density, CALINE4 non-freeway NOx, distance to airport, and prediction year. Statistically significant geographic areas of increased or decreased risk of SMME are indicated using black contour lines.

Spatial pattern of the odds of high SMME in spatiotemporal NOx predictions for a random sample of 2500 predictions from the city of Long Beach, CA: (a) unadjusted, and (b) after spatial and (c) temporal adjustments. High SMME is defined by a cut-off at the 80th percentile of the average covariance distribution in Long Beach at each unique location. Confounders of SMME risk adjusted for in the model included population density, CALINE4 non-freeway NOx, and traffic density on FCC2 roads. Statistically significant geographic areas of increased or decreased risk of SMME are indicated using black contour lines.
g CALINE4 is a line source dispersion model using quarterly average daily traffic volumes (Benson, 1984).
x measured at the EPA air quality monitoring stations.
c CALINE4 is a line source dispersion model using quarterly average daily traffic volumes (Benson, 1984).
d Traffic density calculated using distance-decayed annual average daily traffic (AADT) volume from major roads (freeways/highways and major surface streets) within 300 and 500 m circular buffers.
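The adjusted odds described in the Fig. 5 caption come from a GAM with a smooth term for location. The sketch below covers only the parametric part: an ordinary logistic regression fitted by iteratively reweighted least squares (Newton-Raphson). It omits the smooth location term, which requires a penalized spline smoother, and it uses simulated data. The function name, the covariate and the coefficients are hypothetical.

```python
import numpy as np


def logistic_irls(X, y, max_iter=50, tol=1e-10):
    """Maximum-likelihood logistic regression by Newton-Raphson (IRLS).

    X: (n, p) design matrix that already contains an intercept column.
    y: (n,) array of 0/1 outcomes (here, 1 = high SMME).
    Returns coefficients on the log-odds scale; exp(coef) gives odds ratios.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))   # fitted probabilities
        w = p * (1.0 - p)                         # IRLS weights
        grad = X.T @ (y - p)                      # score vector
        info = X.T @ (X * w[:, None])             # Fisher information
        step = np.linalg.solve(info, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Simulated example: one standardized covariate (e.g., a hypothetical
# distance-to-airport score) with a known effect on the log-odds.
rng = np.random.default_rng(7)
n = 20_000
x = rng.normal(size=n)
true_beta = np.array([-1.4, 0.8])
X = np.column_stack([np.ones(n), x])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(X @ true_beta)))).astype(float)
beta_hat = logistic_irls(X, y)
odds_ratio = np.exp(beta_hat[1])   # multiplicative change in odds of high SMME per unit x
```

A GAM would replace the linear predictor `X @ beta` with a sum that also includes a penalized smooth function of longitude and latitude. Mapping that smooth is what produces surfaces like those in Fig. 5.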