A Generalizable Evaluated Approach, Applying Advanced Geospatial Statistical Methods, to Identify High Lead Exposure Locations at Census Tract Scale: Michigan Case Study

Background: Despite great progress in reducing environmental lead (Pb) levels, many children in the United States are still being exposed. Objective: Our aim was to develop a generalizable approach for systematically identifying, verifying, and analyzing locations with high prevalence of children’s elevated blood Pb levels (EBLLs) and to assess available Pb models/indices as surrogates, using a Michigan case study. Methods: We obtained ∼1.9 million BLL test results of children <6 years of age in Michigan from 2006–2016; we then evaluated them for data representativeness by comparing two percentage EBLL (%EBLL) rates (number of children tested with EBLL divided by both number of children tested and total population). We analyzed %EBLLs across census tracts over three time periods and between two EBLL reference values (≥5 vs. ≥10μg/dL) to evaluate consistency. Locations with high %EBLLs were identified by a top 20 percentile method and a Getis-Ord Gi* geospatial cluster “hotspot” analysis. For the locations identified, we analyzed convergences with three available Pb exposure models/indices based on old housing and sociodemographics. Results: Analyses of 2014–2016 %EBLL data identified 11 Michigan locations via cluster analysis and 80 additional locations via the top 20 percentile method and their associated census tracts. Data representativeness and consistency were supported by a 0.93 correlation coefficient between the two EBLL rates over 11 y, and a Kappa score of ∼0.8 of %EBLL hotspots across the time periods (2014–2016) and reference values. Many EBLL hotspot locations converge with current Pb exposure models/indices; others diverge, suggesting additional Pb sources for targeted interventions. Discussion: This analysis confirmed known Pb hotspot locations and revealed new ones at a finer geographic resolution than previously available, using advanced geospatial statistical methods and mapping/visualization. 
It also assessed the utility of surrogates in the absence of blood Pb data. This approach could be applied to other states to inform Pb mitigation and prevention efforts. https://doi.org/10.1289/EHP9705


Introduction
Despite great progress over the last several decades, childhood lead (Pb) exposure remains a public health challenge, and "substantial individual- and community-level disparities persist." 1,2 As summarized in the "Federal Action Plan to Reduce Childhood Lead Exposures and Associated Health Impacts" 1 : Lead exposure to children can result from multiple sources and can cause irreversible and lifelong health effects. No safe blood Pb level in children has been identified. Even low levels of Pb in blood have been shown to affect intelligence quotient (IQ), ability to pay attention and academic achievement.
The Action Plan calls for data, maps, and mapping tools to identify high exposure locations and disparities for prioritization efforts to reduce children's blood Pb levels (BLLs).
Local BLL and environmental data are often difficult to obtain, incomplete, or unavailable and state-to-state testing, data collection, and reporting vary. [3][4][5][6][7] This is not a new issue. 8 Thus, different groups have developed surrogate predictive models (e.g., regression models) and indices to identify disproportionately impacted places for action. 3,6,[9][10][11][12][13][14][15][16] Predictive modeling remains challenging at both national and local scales. Generalizable approaches with state-specific models and robust data at the census tract or finer-scale level are needed.
By focusing on a state with extensive, robust BLL data, we can improve mapping resolution and analysis of available measured BLLs in this state while also providing a generalizable approach that could be applied in other states and geographic regions to identify and address high Pb exposure locations, including using available Pb indices and predictive models. Once locations with a high prevalence of elevated BLLs (EBLLs; defined below) are identified, determining the key Pb exposure factors and indicators at those locations (e.g., old housing, which may have Pb paint or Pb service lines for drinking water, and other sources of Pb exposures; sociodemographics, such as poverty and minority variables; or environmental sources other than from old housing) based on exposure pathway/source apportionment modeling can inform effective mitigation and prevention targeting efforts. 7,10 This case study focuses on Michigan, a state with an extensive, robust BLL data set and with the federal-state partnerships in place to facilitate collaboration and ground truthing during this study. We also evaluated and verified our analyses using published data from state reports 17 and published BLL data maps for Michigan 12 and the city of Flint where a 2014 drinking water-related public health crisis occurred. 18,19 Our analysis of BLL data and convergence of available Pb indices in a state with robust BLL data can also inform Pb "hotspot" identification in states with less robust BLL data.

Goals and Objectives
This paper presents a generalizable systematic approach, using new applications of geospatial and statistical methods and models with available BLL and other data at the census tract level, to inform Pb exposure reduction efforts, with a focus on Michigan as an initial case study. We present an innovative data analysis to identify locations in Michigan with high prevalence of EBLLs and initial analyses of key select Pb exposure indicators, namely, old housing and sociodemographic factors. Environmental sources of exposure are of interest for planning, but analysis related to these sources is outside the scope of this article.
Specific research objectives in this study were to
• Statistically evaluate and analyze Michigan BLL measurements over census tracts and time.
• Identify high percentage EBLL (%EBLL) census tracts, and locations near or containing them, using two different statistical methods (top 20 percentiles and an advanced geospatial statistical cluster analysis).
• Analyze how well the highest %EBLL locations are explained by Pb exposure models/indices.
• Verify identified Michigan hotspots based on previous studies.

Methods
We processed and evaluated available Michigan BLL data, used those data to identify high %EBLL locations and statistical clusters, and analyzed changes over time. We then compared three available Pb exposure indices/models (based on age of housing and sociodemographic variables) as potential surrogates when such robust BLL data are absent. We also developed and applied an initial approach to distinguish locations with a high prevalence of EBLLs ("high %EBLL locations") associated with old housing and sociodemographics from those associated with other exposure factors and indicators.

BLL Data Summary, Processing, and Evaluation
The institutional review boards at the Michigan Department of Health and Human Services (MDHHS) and the University of North Carolina at Chapel Hill, North Carolina, reviewed and approved this study. Through a data use agreement with the MDHHS, we obtained ∼1.9 million BLL data points (∼50% venous and ∼50% capillary samples, with 31,600 unidentified) from the MDHHS Childhood Lead Poisoning Prevention Program (CLPPP) for the years 2006-2016. We geocoded test results data for children <6 years of age using ArcGIS (version 10.6.1; ESRI Inc.) for all 1,076,964 address data points provided and matched 1,016,564 addresses for a 94.39% success rate, eliminating the remaining data because of the inability to geocode. Geocoding was conducted by using the U.S. EPA Office of Environmental Information (OEI) Navteq_USA Geocode Service (via the ESRI ArcGIS software suite). Only point address and street address matches located in Michigan were tabulated (unmatched data were excluded). Address matching was automated using service defaults, and manual spot checks were performed to ensure function consistency and reliability.
The <6-years-of-age group was selected to be consistent with the U.S. EPA's age groupings guidance. 20 We applied quality assurance/quality control to all aspects of the data analysis and mapping by having multiple coauthors cross-check work. We applied the following exclusion criteria to 2006-2016 data for each census tract: total population of <50 children 0 to <6 years of age with BLL tests; <10 children tested in a given year; and not containing data for greater than half of the total years assessed.
To identify hotspots, we randomly selected one measure regardless of capillary or venous sampling in Michigan if there were multiple measurements within 1 y for a given person. We conducted sensitivity analyses on the selection of one sample from multiple samples per person per year using three methods: a) random selection (the main method for this paper), b) selection of the highest value, and c) selection of the first sample of the given year.
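The three selection rules can be illustrated with a minimal Python sketch. The record layout (`person_id`, `date`, `bll`), the function name, and the reading of "within 1 y" as a calendar year are assumptions made for illustration; the study's actual processing code is not shown in this paper.

```python
import random
from collections import defaultdict

def select_one_per_person_year(tests, method="random", seed=0):
    """Keep one BLL test per person per calendar year.

    tests: list of dicts with hypothetical keys 'person_id',
           'date' (ISO 'YYYY-MM-DD' string), and 'bll' (ug/dL).
    method: 'random' (main analysis), 'highest', or 'first'.
    """
    rng = random.Random(seed)  # seeded so the random draw is reproducible
    groups = defaultdict(list)
    for t in tests:
        year = int(t["date"][:4])  # assumption: "1 y" means calendar year
        groups[(t["person_id"], year)].append(t)

    selected = []
    for key in sorted(groups):
        group = groups[key]
        if method == "random":
            selected.append(rng.choice(group))
        elif method == "highest":
            selected.append(max(group, key=lambda t: t["bll"]))
        elif method == "first":
            # ISO date strings sort chronologically
            selected.append(min(group, key=lambda t: t["date"]))
        else:
            raise ValueError(f"unknown method: {method}")
    return selected

# Hypothetical records, not study data
tests = [
    {"person_id": "A", "date": "2015-03-01", "bll": 4.0},
    {"person_id": "A", "date": "2015-09-15", "bll": 7.5},
    {"person_id": "A", "date": "2016-02-10", "bll": 3.0},
    {"person_id": "B", "date": "2015-06-20", "bll": 2.0},
]
highest = [t["bll"] for t in select_one_per_person_year(tests, "highest")]
first = [t["bll"] for t in select_one_per_person_year(tests, "first")]
```

Running the same downstream analysis on each of the three outputs is one way to carry out the sensitivity comparison described above.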
There are 2,813 census tracts in Michigan (according to the 2010 Census). We removed 39 unpopulated census tracts, for a total of 2,774 in our geospatial analyses. After applying geocoding and test exclusion criteria, we analyzed 2,411 census tracts for 2006-2016 and 2,401 census tracts for 2014-2016 BLL data (n = 363 and n = 373 excluded, respectively). We combined venous and capillary BLL data to increase the total sample size within census tracts. For 2006-2016, the total sample size of BLL test results for 0- to <6-y-olds was 1,930,943 (N = 958,215 for venous and N = 972,728 for capillary samples, after deleting duplicates). For these data, there is a 0.58 correlation between venous and capillary samples for the same person and same year.
For the analyses, we defined an EBLL as one above the 2012 Centers for Disease Control and Prevention (CDC) blood Pb reference value of 5 μg/dL, which is the 97.5th percentile of BLL of 1- to 5-y-old children in the United States based on the 2007-2008 and 2009-2010 National Health and Nutrition Examination Surveys (NHANES). 21 We analyzed the Michigan BLL data set to evaluate its robustness based on sample quantity and the degree to which samples were statistically representative of the general population of 0- to <6-y-olds in Michigan. Concretely, "representativeness" was evaluated by comparing two defined %EBLL metrics or rates, shown in Equations 1 and 2, with the same numerators and different denominators. From a statistical perspective, a high correlation between the rates in Equations 1 and 2 means that the number of children tested is proportional to the number of children residing in the census tract.
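Equations 1 and 2 are not reproduced in this text. Based on the description here and in the abstract (same numerator, number of children tested with an EBLL; denominators of children tested and total child population, respectively), they can be written as follows. The symbols and the multiplication by 100 are notation assumed for illustration, not the paper's own typesetting.

```latex
\begin{align}
\text{Exceedance rate:}\quad
  \%\mathrm{EBLL}^{\mathrm{tested}}_{i}
  &= \frac{N^{\mathrm{EBLL}}_{i}}{N^{\mathrm{tested}}_{i}} \times 100 \tag{1}\\[4pt]
\text{Population-adjusted rate:}\quad
  \%\mathrm{EBLL}^{\mathrm{pop}}_{i}
  &= \frac{N^{\mathrm{EBLL}}_{i}}{N^{\mathrm{pop}}_{i}} \times 100 \tag{2}
\end{align}
```

Here $N^{\mathrm{EBLL}}_{i}$ is the number of children 0 to <6 years of age tested with an EBLL in census tract $i$, $N^{\mathrm{tested}}_{i}$ is the number of children tested, and $N^{\mathrm{pop}}_{i}$ is the total population of children in that age group.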
Representativeness, by our definition, does not necessarily capture specific sociodemographic differences within a census tract because we did not have those data. Comparing these rates is a check that the BLL data used for the hotspot analyses statistically reflect sampling proportional to the population, providing confidence in the analyses.
Michigan uses a targeted testing approach for blood Pb surveillance that focuses on children at greater risk for Pb exposure. We note that Michigan CLPPP EBLL rate calculations are in accordance with Equation 1. 17 To assess BLL data representativeness for our analysis, we calculated the correlation coefficient between the two EBLL rates over different time periods.
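As a rough illustration of this representativeness check, the sketch below computes both rates for a handful of hypothetical tracts and their Pearson correlation. The tract counts are invented, and the choice of unit (tracts here, rather than years) is an assumption.

```python
from math import sqrt

def ebll_rates(n_ebll, n_tested, n_pop):
    """Return (exceedance rate, population-adjusted rate) in percent."""
    return 100.0 * n_ebll / n_tested, 100.0 * n_ebll / n_pop

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical tracts: (children with EBLL, children tested, child population).
# Testing is exactly 25% of the population in every tract, so the two rates
# are proportional and the correlation is 1 up to rounding.
tracts = [(5, 100, 400), (12, 150, 600), (2, 80, 320), (30, 200, 800)]
rate_pairs = [ebll_rates(*t) for t in tracts]
r = pearson([p[0] for p in rate_pairs], [p[1] for p in rate_pairs])
```

When testing coverage varies from tract to tract, the two rates stop being proportional and the correlation falls below 1, which is the signal this check is designed to detect.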
We analyzed the exceedance rate (Equation 1) to associate EBLL with high Pb exposure locations using the years 2014-2016 from the 11-y (2006-2016) BLL data set to have a large sample size from multiple recent years. We calculated correlation coefficients between EBLL measures by blood sample type to determine whether it was statistically appropriate (based on the 0.58 correlation coefficient) to combine venous and capillary results for analysis. Note that the threshold for statistical significance used in this paper is p < 0.01.
To assess the consistency of the %EBLL statistical hotspots, we calculated Cohen's Kappa statistics of hotspot locations over different time periods (2006-2013 and 2011-2013 vs. 2014-2016); between EBLL reference values (≥5 and ≥10 μg/dL); and between different age groups (5-, 6-, and 7-y-olds for a sensitivity analysis at ∼6 years of age, the focus age of the MDHHS CLPPP). Cohen's Kappa statistic is a measure of agreement between categorical variables; in our analysis, the categories are "hotspot" or "no hotspot" based on %EBLL. A Kappa score of 0.41-0.6 is considered moderate agreement; 0.61-0.8, substantial; and 0.81-0.99, near perfect. 22 If there had been inconsistency, that could suggest a problem with the data, especially if they are not statistically representative. If the data are representative and the sample size large enough, inconsistency could reflect real differences due to interventions.
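For readers unfamiliar with the statistic, Cohen's Kappa compares observed agreement with the agreement expected by chance. A minimal sketch for two hotspot classifications of the same tracts follows; the flag vectors are hypothetical.

```python
def cohens_kappa(a, b):
    """Cohen's Kappa for two equal-length sequences of category labels."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: share of items given the same label by both
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of marginal label shares, summed over labels
    labels = set(a) | set(b)
    p_e = sum((a.count(lab) / n) * (b.count(lab) / n) for lab in labels)
    if p_e == 1.0:
        return 1.0  # both classifications use one identical label throughout
    return (p_o - p_e) / (1.0 - p_e)

# Hypothetical hotspot flags (1 = hotspot, 0 = no hotspot) for 10 tracts
# in two time periods
period_1 = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
period_2 = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
kappa = cohens_kappa(period_1, period_2)  # (0.80 - 0.58) / (1 - 0.58)
```

In this toy example the two periods agree on 8 of 10 tracts, yet Kappa is only about 0.52 ("moderate" on the scale above) because most of that agreement would be expected by chance when hotspots are rare.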

Identifying Locations with High Prevalence of EBLLs
We used a top 20 percentile method and an advanced geospatial cluster analysis method to identify high %EBLL locations for census tracts and "reference locations" using 2014-2016 BLL data. We define a reference location in our analysis as the highest population city near or containing a top 20 percentile census tract or more than one geospatial cluster analysis statistical hotspot census tract.
The top 20 percentile method identified 80th-100th percentile census tracts (i.e., the census tracts with BLL data, considering exclusion criteria) for %EBLLs in Michigan (2014-2016; ≥5 μg/dL). The 80th-100th percentile metric was based on the U.S. EPA's Office of Environmental Justice EJSCREEN 2017 technical documentation. 15 The geospatial cluster analysis method, Getis-Ord Gi*, 23 involved applying the ArcGIS Hot Spot Analysis tool [ESRI ArcGIS (version 10.6.1) and ArcGIS Pro (version 2.8)] 24 to census tract exceedance rates in Michigan from 2014 to 2016 (using an EBLL reference value of ≥5 μg/dL). We conducted additional analyses to compare exceedance rates for this time period using the latest CDC EBLL reference value of ≥3.5 μg/dL. 25 Getis-Ord Gi* provided a range of statistical confidence levels when evaluating geospatial patterns of EBLL prevalence; hotspots are defined as locations where %EBLLs are clustered statistically. We queried data to only include census tracts that the tool determined to have confidence intervals (CIs) ≥95% for the Getis-Ord Gi* statistical testing significance level and then identified reference locations for the EBLL clusters.
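The top 20 percentile selection can be sketched in a few lines of Python. The rank-based cutoff and the inclusion of ties at the cutoff are assumptions, because the exact percentile convention is not stated here; tract IDs and rates are hypothetical.

```python
def top_percent_tracts(rate_by_tract, top_percent=20):
    """Return the tract IDs whose rate falls in the top `top_percent` percent.

    rate_by_tract: dict mapping tract ID -> %EBLL (exceedance rate).
    Tracts tied with the cutoff value are included (an assumption).
    """
    if not rate_by_tract:
        return set()
    ranked = sorted(rate_by_tract.values(), reverse=True)
    # Integer ceiling of n * top_percent / 100 avoids floating-point surprises
    n_top = max(1, (len(ranked) * top_percent + 99) // 100)
    cutoff = ranked[n_top - 1]
    return {tract for tract, rate in rate_by_tract.items() if rate >= cutoff}

# Hypothetical exceedance rates (%) for 10 tracts: T01 = 1.0, ..., T10 = 10.0
rate_by_tract = {f"T{i:02d}": float(i) for i in range(1, 11)}
top20 = top_percent_tracts(rate_by_tract)
```

Unlike the cluster analysis, this rule looks at each tract in isolation, which is why it can flag isolated rural tracts that have no high-rate neighbors.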
The ArcGIS Hot Spot Analysis tool is a spatial cluster analysis function that calculates the Getis-Ord Gi* statistic for each feature (i.e., census tract) in a data set. This tool looks at each feature within the context of neighboring features. In this analysis, to be identified as a statistically significant high %EBLL area, a given census tract will have a high %EBLL value and will also be adjacent to other census tracts with high %EBLL values. 24 Basically, the standardized Gi* statistic compares the weighted average of values of the outcome of interest in the neighborhood around the focus area to the overall average of values of the outcome of interest. The Gi* follows the Z distribution, and areas with high Gi* with low p-values indicate hotspots. More detail can be found in Ord and Getis. 23
After locations were detected by the top 20 percentile and geospatial cluster analysis methods, we assigned each location or cluster a reference location by comparing and spatially joining the resulting data to the U.S. Census Bureau 2010 "Places" file, 26 using the "Select by Location" function in ESRI ArcGIS (version 10.6.1). We used the U.S. Census 2010 decennial file year, 27 given that it fell in the middle of the original time frame of the Michigan BLL data set (2006-2016) and because Michigan city boundaries for the "Places" data set did not change between 2010 and 2016. In the rare case of reference locations that were completely contained/embedded within a larger city (e.g., the cities of Hamtramck and Highland Park, both embedded in the city of Detroit), the larger city was selected as the "reference location." Associating census tracts with reference locations was not a straightforward spatial assessment, especially in rural areas/townships, because not all census tracts are associated with a unique city or township.
To ground truth and verify locations identified in our Michigan analyses, we compared results with the Michigan MDHHS CLPPP annual reports of blood Pb testing from 2010-2017, which included "targeted communities." 17 To evaluate the 2014-2016 high %EBLL (exceedance rate) locations, we mapped the nine targeted communities in the MDHHS 2016 report, 28 added a 1-mi (1.6 kilometers) buffer, then overlaid the resulting nine buffers/circles with results of Getis-Ord Gi* of 2014-2016 BLLs and the EJSCREEN Pb Paint EJ Index. In addition, we conducted a visual comparison with Reuters ZIP code mapping 12 and, specifically for Flint, a visual comparison with the ward mapping shown in Figure 2 of Hanna-Attisha et al. 18
For both the exceedance rate (Equation 1) and population-adjusted rate (Equation 2), we conducted a time trend regression analysis of %EBLLs from 2006-2016, regressing the log-transformation of the mean rate on years elapsed since 2006. We also analyzed and visualized %EBLL changes over time by census tracts.
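To make the Gi* statistic concrete, the sketch below computes the standardized Gi* z-score for each tract using the published Ord and Getis form with binary weights (1 for the focal tract and its neighbors, 0 otherwise). It is not the ArcGIS implementation. The binary contiguity weights, the toy chain of six tracts, and the 1.96 cutoff for 95% confidence are assumptions for illustration.

```python
from math import sqrt

def getis_ord_gi_star(values, neighbors):
    """Standardized Getis-Ord Gi* z-scores with binary weights.

    values:    dict mapping tract ID -> attribute value (e.g., %EBLL)
    neighbors: dict mapping tract ID -> iterable of adjacent tract IDs
               (the focal tract itself is added automatically for Gi*)
    """
    n = len(values)
    mean = sum(values.values()) / n
    variance = sum(v * v for v in values.values()) / n - mean ** 2
    s = sqrt(max(variance, 0.0))  # guard against tiny negative rounding error
    z = {}
    for i in values:
        hood = set(neighbors.get(i, ())) | {i}
        w_sum = len(hood)  # binary weights: sum(w) == sum(w^2) == |hood|
        local_sum = sum(values[j] for j in hood)
        denom = s * sqrt((n * w_sum - w_sum ** 2) / (n - 1))
        z[i] = (local_sum - mean * w_sum) / denom if denom > 0 else 0.0
    return z

# Hypothetical chain of six tracts A-B-C-D-E-F with high rates at one end
values = {"A": 12.0, "B": 11.0, "C": 10.0, "D": 2.0, "E": 1.0, "F": 2.0}
order = list(values)
neighbors = {
    t: [order[k] for k in (idx - 1, idx + 1) if 0 <= k < len(order)]
    for idx, t in enumerate(order)
}
z = getis_ord_gi_star(values, neighbors)
hot = {t for t, score in z.items() if score >= 1.96}  # ~95% confidence
```

In this toy data only tract B clears the threshold: it has a high value and sits between two other high-value tracts, which mirrors the requirement above that a hotspot tract be adjacent to other high-%EBLL tracts.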
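The time trend regression of log-transformed mean rates on years elapsed since 2006 can be sketched with ordinary least squares. The use of the natural logarithm and the example rates are assumptions; the function name is hypothetical.

```python
from math import exp, log

def log_linear_trend(years, mean_rates, base_year=2006):
    """Fit log(mean_rate) = intercept + slope * (year - base_year) by OLS."""
    t = [y - base_year for y in years]
    ly = [log(r) for r in mean_rates]  # assumption: natural log
    n = len(t)
    mean_t, mean_ly = sum(t) / n, sum(ly) / n
    slope = (
        sum((a - mean_t) * (b - mean_ly) for a, b in zip(t, ly))
        / sum((a - mean_t) ** 2 for a in t)
    )
    intercept = mean_ly - slope * mean_t
    return intercept, slope

# Hypothetical statewide mean %EBLL falling 10% per year from 10% in 2006
years = list(range(2006, 2017))
mean_rates = [10.0 * 0.9 ** (y - 2006) for y in years]
intercept, slope = log_linear_trend(years, mean_rates)
annual_pct_change = (exp(slope) - 1.0) * 100.0  # about -10% per year
```

On the log scale the slope converts directly to an average annual percentage change, which makes the trends of the two rates comparable even though their magnitudes differ.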

Initial Exploration of Exposure-Related Pb Indicators
Old housing (defined as pre-1960) is prevalent throughout Michigan in both rural and urban areas. After identifying high %EBLL (exceedance rate) census tracts, we used three available Pb exposure models/indices for analyses of old housing data and sociodemographic variables with both the top 20 percentile method and the geospatial cluster analysis method. The three Pb exposure models/indices were the U.S. EPA's EJSCREEN 2017 Pb Paint EJ Index, 15 modeled BLLs of Schultz et al., 13 and U.S. Department of Housing and Urban Development's (HUD) Deteriorated Paint Index. 9 The U.S. EPA's EJSCREEN 2017 Pb Paint EJ Index, based on pre-1960 housing and sociodemographic data, was originally developed at the census block group level. We aggregated the data by averaging census block group index values to the census tract. This aggregation was done for data compatibility across index/model data sets and to analyze with BLL data at the census tract scale. Schultz et al. presented a multiple regression model approach to predict 1- and 2-y-old children's geometric mean (GM) BLLs at the census tract level across the United States; they evaluated their model against measured BLL data from three states (Massachusetts, Michigan, Texas). 13 We generated modeled 2015 GM venous BLL predictions for a 2-y-old child for all Michigan census tracts by implementing the Schultz et al. model. 13 HUD's Deteriorated Paint Index was developed to help HUD grantees better target home remediation and abatement efforts.
The Deteriorated Paint Index uses microdata from the 2011 American Housing Survey and the 2009-2013 ACS to develop a household-level predicted risk metric that identifies housing units at risk of containing large areas of deteriorated paint. This predicted risk is defined as the mean predicted percentage of occupied pre-1980 housing units at risk of containing deteriorated paint within a given jurisdiction. Results were summarized by state, county, and census tract. 9 We analyzed and mapped these three Pb exposure models/indices and %EBLL (exceedance rate) data at census tract resolution using geospatial and traditional statistical functions within ArcGIS (version 10.6.1; ESRI) and SAS (version 9.4; SAS Institute, Inc.). Analyzing the model approaches among each other (using Kappa scores and visual inspection) provided a model-to-model comparison; analyzing them against %EBLL data (using Kappa scores, sensitivity and specificity, and visual inspection) helped assess their usefulness as surrogates in the absence of BLL data and to initially identify key indicators. Visual inspection refers to visual observation of mapped hotspot locations identified in our analyses compared with hotspot maps in the other papers, 12,18 which were used to draw conclusions about the consistency of the areas identified.
Census 2010 data from Summary File 1 27 H2 (rural and urban housing) and P14 (population by age) were joined to the Getis-Ord Gi* hotspot and top 20 percentile data that were produced in this study. Any tract with >50% urban housing was designated as urban, and any tract with >50% rural housing was designated as rural. Using the resulting joined values, ratios for urban-to-rural census tracts and urban-to-rural children 0 to <6 years of age were calculated and categorized for the urban vs. rural analysis presented in the "Results" section.
We merged a selection of environmental justice-related variables from EJSCREEN 2017 (less than high school education, total minority population, low income to less than two times the federal poverty level) 15 and table DP05 (non-Hispanic African American population) of the 2011-2015 ACS 29 data to the clusters identified by applying the Getis-Ord Gi*-based hotspot analysis on Michigan 2014-2016 EBLLs. This merging was done in ArcGIS via geospatial join by census tract number. The 2011-2015 ACS was chosen to exactly match with the underlying data presented in EJSCREEN 2017. Using the resulting joined values, population totals, population percentages, and ratios for hotspot vs. non-hotspot were calculated for the environmental justice variables and Getis-Ord Gi* hotspot cluster analysis presented in the "Results" section.
To demonstrate and quantify robustness of the BLL and exposure data sets, and to corroborate analytical and visual findings, we ranked the various measures (predictions or observations) from their lowest to highest values, and used a threshold (i.e., 80th percentile; 90th percentile for a sensitivity analysis) to identify the top 20 (or 10 for sensitivity analysis) percentage highest values. This corresponded to a set of census tracts indicated as the most severely affected areas, by a particular measure. To compare two measures, we generated a 2 × 2 table tally of the inclusions (1) and exclusions (0), and summarized findings using sensitivity, specificity (with measured BLL used as the gold standard/true value), and Kappa statistics. This procedure, which enabled comparison of measures with disparate units, is referred to in this paper as "data convergence analysis."
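The data convergence procedure can be sketched as follows: each measure is converted to a 1/0 flag for its top 20% of tracts, and the flags are tallied in a 2 × 2 table against measured %EBLL as the gold standard. The rank-based cutoff with ties included, the function names, and the example values are assumptions for illustration.

```python
def flag_top(scores, top_percent=20):
    """Map tract ID -> 1 if the score is in the top `top_percent` percent, else 0."""
    ranked = sorted(scores.values(), reverse=True)
    n_top = max(1, (len(ranked) * top_percent + 99) // 100)  # integer ceiling
    cutoff = ranked[n_top - 1]
    return {tract: int(value >= cutoff) for tract, value in scores.items()}

def two_by_two(gold_flags, test_flags):
    """Tally inclusions/exclusions for tracts present in both measures."""
    table = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for tract in gold_flags.keys() & test_flags.keys():
        g, t = gold_flags[tract], test_flags[tract]
        key = ("tp" if t else "fn") if g else ("fp" if t else "tn")
        table[key] += 1
    return table

def sensitivity_specificity(table):
    sensitivity = table["tp"] / (table["tp"] + table["fn"])
    specificity = table["tn"] / (table["tn"] + table["fp"])
    return sensitivity, specificity

# Hypothetical measures in different units for the same 10 tracts
tracts = [f"T{i:02d}" for i in range(1, 11)]
measured = dict(zip(tracts, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]))         # %EBLL
index = dict(zip(tracts, [10, 20, 30, 40, 50, 60, 70, 95, 80, 90]))   # index score

table = two_by_two(flag_top(measured), flag_top(index))
sens, spec = sensitivity_specificity(table)
```

Because only ranks within each measure matter, the comparison is unaffected by the measures' units; a Kappa statistic can be computed from the same four cell counts.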

Identifying Statistical Hotspot %EBLL Locations, and Evaluation
There were 36 census tracts in 2014-2016 with %EBLL >20% for children 0 to <6 years of age (Table 1). We also used the new CDC blood Pb reference value of ≥3.5 μg/dL and found that the pattern of hotspot changes over time is similar, but overall exceedance rates are higher, as expected with the lower reference value [Figure S1(b)].
Cluster analysis with Getis-Ord Gi*. The Getis-Ord Gi* geospatial cluster analysis identified 306 census tracts across 11 "reference locations." Figure 1 and Table 2 illustrate %EBLL clusters identified with the Getis-Ord Gi* geospatial cluster analysis using 2014-2016 data. Locations with the highest %EBLLs (from 7.1% to 12.4%; "random" selection methodology, i.e., the main method of our analysis) had varying numbers of census tracts (from 4 in Flint and Kalamazoo to 190 in Detroit) and numbers of children (from 643 to 39,323). More than 74,000 children lived in the identified clusters (∼11% of the population of 0 to <6 years of age). Table 2 also shows that the results using two additional sampling selection methods (i.e., highest and first) are consistent with the random selection results.
High %EBLL locations were consistent across time periods  Figure 1 were also identified over time (≥95% confidence), but with some minor differences in specific associated census tracts making up those clusters.
High %EBLL locations were also consistent between ≥5 μg/dL and ≥10 μg/dL BLL reference values across the 2014-2016 time period (Kappa = 0.78) (Figure 3). Locations identified were consistent between the two EBLL reference values, but with some census tract differences within the clusters and one additional cluster or reference location (Laurium) identified with the ≥10 μg/dL reference value. The Kappa value of the %EBLL statistical hotspots for 2014-2016 in Michigan between 0- to 6- and 0- to 7-y-olds was 0.95, and between 0- to 5- and 0- to 6-y-olds it was 0.99. We compared Pb hotspots with EBLL reference values of ≥3.5 μg/dL vs. ≥5 μg/dL, resulting in high agreement (Kappa = 0.94) [Figure S1(a)].
Cluster detection using the top 20 percentile method. The top 20 percentile analysis identified 481 census tracts across 91 reference locations (41 reference locations had at least two census tracts; 80 reference locations were identified by this analysis and not the Getis-Ord Gi* geospatial cluster analysis). Figure S2 shows the top 20 percentile locations.
Comparing locations identified with the two statistical methods. A visual comparison of the census tracts and reference locations using the Getis-Ord Gi* geospatial cluster analysis and top 20 percentile methods is shown in Figure S3. All 11 locations selected by the Getis-Ord Gi* geospatial cluster analysis for ≥5 μg/dL were identified by the top 20 percentile method. The two analyses also had 268 census tracts in common (268 of the 306 identified by the geospatial cluster analysis and 268 of the 481 identified by the top 20 percentile method), most of which were in highly populated/urban areas. The differences reveal that the top 20 percentile method identifies rural or sparsely populated census tracts not identified by the geospatial cluster analysis. The geospatial cluster analysis preferentially identifies cities and identifies high resolution boundaries of areas with a disproportionately higher percentage of children with EBLLs, to help focus Pb reduction actions. It revealed some high EBLL locations not identified previously by the MDHHS 2016 report, as discussed below.
Comparison of locations against other published analyses. Figure S4 demonstrates the high %EBLL locations identified in Flint, Michigan, using both Getis-Ord Gi* and top 20 percentiles. Eight of the 9 targeted communities in the 2016 MDHHS report intersected with the BLL Getis-Ord Gi*-predicted hotspots (Lansing is the one location missed), and 8 of 9 intersected with the EJSCREEN Pb Paint EJ Index Getis-Ord Gi*-predicted hotspots (Adrian is the one location missed). Of the 11 reference locations selected using the Getis-Ord Gi* method, when comparing against the MDHHS CLPPP annual reports, 6 (Adrian, Detroit, Flint, Grand Rapids, Jackson, and Muskegon) were targeted communities in 2016, 3 (Battle Creek, Kalamazoo, and Saginaw) were targeted communities in previous years, and 2 (Bay City and Ludington) were not identified as targeted communities. Three MDHHS CLPPP-targeted communities were not identified as reference locations by the Getis-Ord Gi* method (Hamtramck, Highland Park, and Lansing). Of the 91 reference locations selected using the top 20 percentile method, 7 of 9 MDHHS CLPPP-targeted communities were identified (Adrian, Detroit, Flint, Grand Rapids, Jackson, Lansing, and Muskegon).

Representativeness of EBLL Data
Our geospatial and statistical analyses show that Michigan BLLs from 2006-2016 were statistically robust and consistent across census tracts, over time, and between two reference values (Figures 1-3). The correlation coefficient was ∼0.93 between the two EBLL rates over 11 y; Kappa scores of ∼0.8 for analysis of EBLLs over different time periods and a Kappa score of 0.78 for analysis of EBLLs ≥5 μg/dL vs. ≥10 μg/dL indicate substantial agreement between time periods and EBLL definition levels (Table 3). Dots inside the two reference lines indicate census tracts with representative data, and dots outside the two reference lines help identify census tracts where sampling could be improved for representativeness [as illustrated in Figure S5(b)].

EBLL Data over Time
Our analysis of the BLLs over time indicates data representativeness. Figure S5 illustrates the %EBLL and population-adjusted rate (≥5 μg/dL) over time for the state of Michigan for children 0 to <6 years of age. Figure S5(a) shows that the two metrics (defined in Equations 1 and 2) track well with each other over time and have a strong correlation of 0.93. Figure S5(b) shows the relationship of the two metrics: Each circle represents one census tract. The overwhelming majority of observations fell within the 95% CI, which demonstrates minor sampling bias in terms of the proportionality to the total population.
EBLLs >10% (based on ≥5 μg/dL). Figure 4 shows changes in children's %EBLLs over time by census tracts in Michigan, and Figures S7 and S8 show the percentage of increases and decreases of %EBLLs by census tract.

Analyses of Pb Exposure Models/Indices with %EBLLs
Comparison of Pb exposure models/indices. The census tracts identified by the three exposure models/indices were highly consistent. Kappa values among the three Pb exposure models/indices ranged from 0.83 to 0.88 using Getis-Ord Gi* (Table 3) and from 0.72 to 0.87 using the top 20 percentile method (Table 4), indicating strong statistical convergence between models.
Getis-Ord Gi* geospatial hotspot cluster analysis. High Pb exposure locations identified with the three Pb exposure models/indices, based on old housing and sociodemographics, have moderate statistical agreement (convergence) with the 2014-2016 %EBLLs using the Getis-Ord Gi* geospatial cluster analysis.
Table 2. High BLL locations identified with Getis-Ord Gi* geospatial cluster analysis for exceedance rate of EBLLs (≥5 μg/dL) in census tracts and reference locations (2014-2016; children 0 to <6 years of age; 2,401 total census tracts evaluated).
Top 20 percentile method. High Pb exposure locations identified with the three Pb exposure models/indices, based on old housing and sociodemographics, also have moderate statistical agreement (convergence) with the 2014-2016 %EBLLs using the top 20 percentile method [i.e., 80th-100th percentiles of census tracts with available Pb exposure models/indices data (Table 4; Figure S9)]. Comparison between 2014-2016 %EBLLs and the three Pb exposure models/indices shows 57-67%, 88-90%, and 0.44-0.55 for sensitivity, specificity, and Kappa values, respectively (Table 4). Furthermore, there were 140 census tracts that had high %EBLLs but did not converge with the three Pb exposure models/indices (Figure S10).

Rural vs. urban. An analysis using urban and rural housing data from the 2010 U.S. Census shows that the three indices/models overwhelmingly indicate urban hotspots with either method, but especially with the geospatial cluster analysis method (Table 5). The HUD index identifies a few more hotspots in rural areas in comparison with the EJSCREEN Pb Paint EJ Index and the Schultz et al. model. 13 If indices are used for identifying and addressing hotspots, more housing and sociodemographic data and other information (e.g., environmental sources) are needed in rural areas.
Environmental justice-related variables and Getis-Ord Gi* geospatial hotspot cluster analysis. As seen in Table S3, an analysis combining environmental justice-related data, extracted from EJSCREEN 2017 and table DP05 of the 2011-2015 ACS with Getis-Ord Gi* geospatial statistical cluster analysis hotspots identified for the exceedance rate of Michigan 2014-2016 EBLLs, was conducted to demonstrate the relationship between environmental justice variables (non-Hispanic African American, less than high school education, low income, total minority population) and these hotspot data. The total number of census tracts evaluated was 2,401. There were 306 hotspot census tracts and 2,095 non-hotspot census tracts. The ratios were 4.09, 1.97, 1.86, and 2.93 for hotspot percentage vs. non-hotspot percentage for the four selected environmental justice-related variables, respectively.

Advancing the Science
This paper presents a new science-based approach, developed through a federal-state collaboration, for better identifying locations with high prevalence of children's EBLLs. Identifying locations with the highest %EBLLs, and the contributing sources and exposures of those EBLLs, can assist with targeting and prioritization for Pb exposure risk reduction, prevention, and mitigation efforts as called for in the Federal Lead Action Plan. 1 Ideally, robust, consistently measured and reported BLL data at the individual level would be available across all U.S. states to use in identifying locations with high prevalence of EBLLs. Given the challenges in implementing such a program and the significant limitations in BLL data availability and quality, such as privacy concerns and lack of spatial resolution and representativeness, 3-5 surrogate Pb exposure models/indices have been developed. 3,6,9,11-13,16,21,30 Differences in variables and data used for housing age, sociodemographics, environmental risk factors, and BLL data (e.g., state reference values for EBLL, age categories used for analysis) can produce different results. State-to-state differences in collection, analysis, and reporting of BLL data can lead to uncertainty in national-scale analyses and maps to identify high Pb exposure locations. 6 Our analysis provided finer (i.e., census tract level) spatial resolution than county or ZIP code. Aggregating individual data to the census tract level allows for more spatially detailed analyses while avoiding data privacy issues.
To advance the science through a specific state case study analysis, we geocoded an extensive Michigan BLL data set from the MDHHS, statistically analyzed the data for representativeness, and identified high %EBLL locations by applying several geospatial and statistical methods. For locations identified, we used three available Pb exposure models/indices that combine old housing and sociodemographic variables for two purposes: a) to assess their use as surrogates for EBLL locations, and b) to help identify key indicators of EBLLs to inform Pb reduction efforts at the local scale. The resulting maps and analyses identify areas of potential high Pb exposures.

Key Issues
Use of BLL data. Comparing %EBLL vs. population-adjusted %EBLL [Figure S5(b)] can help identify census tracts where the exceedance rate and population-adjusted rates differ. These census tracts can be investigated to determine why rates differed and how BLL sampling improvements could be considered in the future (in Michigan and other states). In addition, one study using cumulative case ascertainment ratios of reported EBLL counts over expected counts based on NHANES survey data from 39 states participating in CLPPP calculated an ascertainment ratio of 0.77 for Michigan, which was better than the ascertainment ratios of 31 other states and the overall CDC estimate. 4 This further suggests robustness of the Michigan data.
Venous test results are more accurate than capillary test results, but they are usually not the first test a child receives, and follow-up testing after capillary screening is not always done. Of a total of 1,992,311 test results used, 49.62% of samples were venous and 50.34% were capillary. The advantage of an effectively doubled sample size far outweighs the uncertainty added by combining venous and capillary data for our analysis. Our correlation analysis comparing the two %EBLL rates (correlation coefficients of 0.93 for combined venous and capillary, 0.7 for venous only, and 0.81 for capillary only) supported the conclusion that it was statistically appropriate to combine venous and capillary BLL data (Figure S6). Although venous blood Pb samples are considered the gold standard, 31 one study showed that "capillary blood draws are suitable alternatives to venous blood draws when screening children <6 years of age to determine Pb exposure and provide reasonable estimates at the population level." 32,33 Combining venous and capillary data is also supported by the 2018 report, "Screening for Elevated Blood Lead Levels in Children: A Systematic Review for the U.S. Preventive Services Task Force" 34 : Four studies of capillary blood Pb testing demonstrated sensitivity of 87% to 91% and specificity >90% (range 92% to 99%) compared with venous measurement.
Other studies have also used both venous and capillary data. 2,5 Furthermore, because we used large sample sizes and assessed BLLs >5 μg/dL, we believe the influence of detection limits and accuracy in venous vs. capillary sampling to be negligible in our analysis.
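The representativeness check discussed above compares two tract-level rates: children with EBLL divided by children tested, and children with EBLL divided by the total child population. A minimal sketch of that comparison follows. The function names and the tract counts are hypothetical, and the Pearson correlation is assumed here as the correlation measure.

```python
import math


def ebll_rates(n_ebll, n_tested, n_children):
    """Return (exceedance rate, population-adjusted rate) as percentages."""
    exceedance = 100.0 * n_ebll / n_tested
    population_adjusted = 100.0 * n_ebll / n_children
    return exceedance, population_adjusted


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical tracts: (children with EBLL, children tested, child population).
tracts = [(5, 100, 400), (12, 150, 300), (30, 200, 500), (2, 80, 350)]
rates = [ebll_rates(*t) for t in tracts]
r = pearson_r([a for a, _ in rates], [b for _, b in rates])
```

A high correlation indicates that tracts rank similarly under both rates; tracts where the two rates diverge are candidates for a closer look at testing coverage.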
We used EBLL prevalence rather than actual BLL concentrations to avoid statistical challenges in using laboratory-derived concentrations that are subject to detection limit issues at the low end of the distribution. The Michigan blood Pb surveillance database did not have the capacity to store indicators for tests below analyzer limits of detection until 2018. Consequently, there was no way of determining whether a reported BLL from a given laboratory was a true low value or whether it was a test result below the limit of detection. Treating test results below the limit of detection as true test values will create biased estimates. Once the limit of detection is known, there are statistical methods for adjusting BLLs to account for results below limits of detection. 35,36 Still, there is more analytical error in quantifying BLLs at 1 μg/dL than at 5 μg/dL. In a review of 5 y of results for target blood Pb values <11 μg/dL for U.S. clinical laboratories participating in CDC's voluntary Lead and Multi-Element Proficiency (LAMP) quality assurance program, Caldwell et al. observed that 40% of the participating laboratories reported a nondetectable result at a target blood Pb value of 1.48 μg/dL compared with 5.5% at a target blood Pb of 4.60 μg/dL. 37 On the other hand, using the EBLL does not require consideration of blood Pb limit of detection, greatly simplifying analyses and enabling greater confidence in results. Analyzing and visualizing EBLL is also consistent with how Michigan and other states routinely report blood Pb surveillance results. We note that the date of the geocoded address may not match the date of the blood draw, which may result in a small misclassification, but this should not impact the results because of the large sample size and because we used year as the time period.
%EBLL trends over time. Tracking changes in %EBLL over time may serve as "lagging indicators" of progress.
Analyzing and visualizing %EBLL changes over time by census tracts showed an overall dramatic decline in %EBLLs (Figure 4). Locations where data show continuing high %EBLL (Figures S7 and S8) should be investigated to determine why rates have not declined and to guide future Pb reduction efforts. Before 2010, most census tracts in Michigan had decreasing rates of %EBLLs; after 2010, there were still reductions, although the trend was much slower. Further investigation drawing on the expertise of state and local partners is needed to confirm these trends and interpret the results.
[Figure caption fragment: "... Table 1 and Table S1. Note: %EBLLs, percentage elevated blood lead levels."]
Locations identified with 2014-2016 data. This study identified clusters of high %EBLL census tracts using the top 20 percentile analysis and geospatial cluster analysis. Ninety-one reference locations containing high %EBLL census tracts were identified: 11 (mainly urban locations) were identified with both analyses and an additional 80 with the top 20 percentile analysis. The Michigan BLL data we analyzed were predominantly from highly populated areas, although some data were from less populated areas. Blood Pb surveillance efforts in Michigan "have been concentrated on the geographic areas and populations where the exposure problem is greatest... However, it is possible to use the testing data to... characterize the population currently being tested." 28 The top 20 percentile analysis identified rural or sparsely populated census tracts not identified by the geospatial cluster analysis because it does not take into consideration the neighboring census tracts around each selected census tract. In rural areas, where census tracts are large and encompass more than one municipality, cluster detection methods will overlook a single census tract with a high %EBLL because it has no neighbors with comparable %EBLL.
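The contrast between the two methods can be sketched with a toy example. The code below computes Getis-Ord Gi* z-scores using binary contiguity weights on a hypothetical one-dimensional chain of 20 tracts, alongside a simple rank-based top 20 percent flag. The chain layout, the weights, the %EBLL values, and the 1.96 z-score threshold (roughly 95% confidence) are all assumptions for illustration; the study itself used a GIS implementation with its own spatial weights.

```python
import math


def top_20_percent_flags(values):
    """Flag values at or above a simple rank-based 80th percentile cutoff."""
    ordered = sorted(values)
    cutoff = ordered[int(0.8 * len(ordered))]
    return [v >= cutoff for v in values]


def gi_star(values, neighbors):
    """Getis-Ord Gi* z-scores with binary weights (each tract includes itself).

    Gi* = (sum_j w_ij x_j - mean * W_i)
          / (S * sqrt((n * sum_j w_ij^2 - W_i^2) / (n - 1)))
    where W_i = sum_j w_ij and S is the population standard deviation.
    Assumes the values are not all equal and no tract neighbors every tract.
    """
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    scores = []
    for i in range(n):
        members = set(neighbors[i]) | {i}
        w = len(members)  # with binary weights, sum(w_ij) == sum(w_ij**2)
        local_sum = sum(values[j] for j in members)
        numerator = local_sum - mean * w
        denominator = s * math.sqrt((n * w - w ** 2) / (n - 1))
        scores.append(numerator / denominator)
    return scores


# Hypothetical chain of 20 tracts: a cluster of high values at positions 2-4
# and a single isolated high value at position 14 (illustrative %EBLLs).
pct_ebll = [2.0] * 20
pct_ebll[2:5] = [9.0, 10.0, 11.0]
pct_ebll[14] = 12.0
chain = [[j for j in (i - 1, i + 1) if 0 <= j < 20] for i in range(20)]

flags = top_20_percent_flags(pct_ebll)
z = gi_star(pct_ebll, chain)
hotspots = [score > 1.96 for score in z]
```

In this example the isolated tract at position 14 has the highest value and is flagged by the top 20 percent rule, but its Gi* z-score stays below the threshold because its neighbors are low, whereas the center of the cluster exceeds it.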
The focus of the MDHHS CLPPP is to ensure that all children with EBLLs in the state of Michigan are identified and receive appropriate attention. Consequently, CLPPP preferentially selects the highest and most reliable blood Pb test result to assign to an individual for each year. This approach creates a data set that is intentionally biased toward higher blood Pb test results. The purpose of the analysis in this paper, which involved randomly selecting blood test results for each individual in the study, was to identify EBLL hotspots across the state of Michigan and to accurately characterize the population. Random selection results in a data set with less bias toward higher BLLs, and estimates based on this data set provide results that are more representative of all blood Pb test results in the state. We conducted sensitivity analyses with the 2014-2016 EBLLs using the MDHHS CLPPP sampling methodology to compare respective results. Table 2 demonstrates some minor statistical differences between the "highest" (i.e., MDHHS CLPPP) and "random" (i.e., our analysis) sampling selection methodologies. Figure S11 shows almost identical results between both approaches regarding reference locations identified, with one exception (Holland).
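The difference between the "highest" and "random" selection rules can be sketched as follows. This is a simplified illustration, not the MDHHS CLPPP procedure or the study's code: the record layout, the grouping by child and year, the function names, and the test values are assumptions, and the "highest" rule here ignores test reliability (venous vs. capillary).

```python
import random
from collections import defaultdict


def select_one_test(tests, method, seed=0):
    """Pick one BLL per (child, year) from (child_id, year, bll) records.

    method: "highest" keeps the maximum BLL; "random" draws one at random.
    """
    if method not in ("highest", "random"):
        raise ValueError("method must be 'highest' or 'random'")
    groups = defaultdict(list)
    for child_id, year, bll in tests:
        groups[(child_id, year)].append(bll)
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return {
        key: max(blls) if method == "highest" else rng.choice(blls)
        for key, blls in groups.items()
    }


def percent_ebll(selected, reference_value=5.0):
    """Percentage of selected results at or above the EBLL reference value."""
    flags = [bll >= reference_value for bll in selected.values()]
    return 100.0 * sum(flags) / len(flags)


# Hypothetical test records (BLL in ug/dL), not data from this study.
tests = [
    ("a", 2015, 3.0), ("a", 2015, 6.0),
    ("b", 2015, 2.0),
    ("c", 2015, 7.0), ("c", 2015, 4.0),
]
highest_rate = percent_ebll(select_one_test(tests, "highest"))
random_rate = percent_ebll(select_one_test(tests, "random"))
```

Because a randomly drawn result can never exceed the same child's highest result, the "random" rate is at most the "highest" rate, which is the direction of bias described above.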
Our methodology is an important advancement for identifying and protecting vulnerable populations of children with high Pb exposures; however, it may miss some locations, for example, in rural or disadvantaged communities. It is important to check the BLL data representativeness with the two rates in Equations 1 and 2. Even though the correlation coefficient is 0.93 (venous and capillary data combined), there are some hotspots that our analysis could miss owing to the sampling strategy. However, our analysis of BLL data and convergence of available Pb indices in a state with robust BLL data can inform hotspot identification in Michigan and other states with less robust BLL data. As noted above, Roberts et al. distinguished BLL data quality across states based on their estimated "ascertainment ratios." 4 The Getis-Ord Gi* analysis was able to detect clusters in areas of lower population density by using only venous blood tests, in comparison with cluster detection using all blood tests or only capillary blood tests (Figure S6). This included one cluster around an MDHHS CLPPP-targeted community that was not identified using all blood tests (city of Lansing), and clusters in counties in southwestern Michigan (Cass, St. Joseph, Van Buren) and the Upper Peninsula (Houghton, Marquette). Most algorithms used to identify communities for assistance favor locations where there is a large underlying population and large numbers of children tested and may overlook areas with smaller populations and low testing numbers. This may reflect a focus on taking action in areas where more children are potentially impacted, as well as difficulty in producing measures that are statistically valid in areas where less testing data are available (e.g., the small-number problems when computing rates based on small denominators). Cluster detection using venous blood Pb tests has the potential to overcome these limitations. Use of the venous-only test results appears to be able to detect clusters in low population areas, but further investigation is needed to ensure that the bias associated with using venous tests has been accounted for.
Comparison to other studies. Several reference locations identified in this study are consistent with high EBLL locations identified in previous studies. 12,17,18,38 This study improves upon those earlier studies by identifying locations at a finer geographic resolution (Pell and Schneyer 12 used ZIP codes in Michigan); identifying locations throughout the state of Michigan (Hanna-Attisha et al. 18 and Gómez et al. 38 focused on Flint); and applying geospatial statistical methods for identifying potential locations for Pb targeting. Our analysis provides better temporal and geospatial resolution (e.g., analyzed BLL trends over 11 y of data, including more recent years; census tract vs. ZIP code scale) than Pell and Schneyer 12 (with different time periods). We note that our reference location boundaries did not always align with how cities with high levels of EBLLs are identified by the MDHHS CLPPP, given that CLPPP uses the mailing address city of the subject provided by the testing laboratory and mailing address cities do not conform to census tract boundaries.
Many of the identified locations are in cities targeted by the MDHHS CLPPP for actions that are underway by state and local agencies to reduce Pb exposure in children. As reported in the "Results" section, targeted communities in the 2016 MDHHS report are mostly consistent with locations identified in our analyses. Three MDHHS CLPPP-targeted communities were not identified as reference locations by the Getis-Ord Gi* method. Of these three, two targeted communities (Hamtramck and Highland Park) were not selected as reference locations, even though they contained Getis-Ord Gi* hotspot census tracts, because they are municipalities embedded/located within the boundaries of the city of Detroit (Detroit was selected as the reference location instead). Using the top 20 percentile method, two MDHHS CLPPP-targeted communities (Hamtramck and Highland Park) were also not selected as "reference locations," for the same reasons mentioned above.
A number of locations our analysis identified in Michigan have higher %EBLL rates than Flint (Figure 1), which is consistent with findings by other researchers. 12,38 Gómez et al. 38 observed that other areas in Michigan (specifically, the city of Detroit, as well as Kent, Jackson, Lenawee, Mason, and Menominee counties) experienced higher percentages of BLL ≥5 μg/dL than Flint during the 2015 Flint drinking water crisis. With the exception of Menominee County, these counties contain reference locations (namely, the cities of Grand Rapids, Jackson, Adrian, and Ludington) indicated by our geospatial cluster analysis.
Our results for the state of Michigan are consistent with previous reports at the state level 12 and for the city of Flint (Figure S4 compared with Figure 2 of Hanna-Attisha et al., 18 which used wards rather than census tracts for geographic aggregation). Although areas outside Flint had higher EBLLs during the period of interest, Flint continues to be of great concern because changes in the drinking water supply led to a public health crisis there. 18 Flint has benefited from the attention through support from a variety of governmental and non-governmental organizations to improve conditions in the city and by increasing awareness about childhood Pb exposure among residents of Flint and the rest of Michigan. Other higher EBLL communities also deserve attention and action, but government resources for identifying and assisting these communities are not always readily available. Tools such as the cluster detection used in this paper can help identify other communities needing focused attention and resources.
We note that the CDC's recent lowering of the blood Pb reference value to ≥3.5 μg/dL does not impact our cluster analysis hotspot results for EBLL reference value ≥5 μg/dL, based on the high Kappa value (∼0.94). This can also be seen in Figure S1.
Pb models/indices as surrogates for identifying high %EBLL locations. Comparison among the three Pb exposure models/indices showed close statistical convergence. This is to be expected given their use of similar variables and data despite being developed for different purposes. Importantly, differences in the exposure variables used (such as percentage minority vs. percentage non-Hispanic African Americans, 13,15 and use of variables specific to visible peeling Pb paint 9 ) and differences in blood Pb data used (age categories, years covered) can impact the convergence statistics and index-specific maps.
Although many higher %EBLL locations identified in this study were associated with old housing and sociodemographic variables by the three Pb exposure models/indices, 140 census tracts with high %EBLLs were not (when using the top 20 percentile method). This suggests that other sources of Pb exposure may be more important in these places than the housing and sociodemographic factors considered in this study as potential surrogates. The 140 census tracts with high %EBLLs that do not converge with the old housing and sociodemographics-based models/indices and the moderate (∼0.5) Kappa scores (Tables 3 and 4) suggest other potential Pb exposure sources (Figure S10). As a potential next phase, these results can be used to determine potential causes of EBLLs (e.g., old housing or other environmental Pb exposures through ambient air, outdoor soils, drinking water, and consumer food products 1,39,40 ) along with population characteristics in those locations.
Some Michigan EBLL locations identified here have been confirmed by previous efforts 12,17,18,38 ; however, we advanced the science beyond traditional mapping/visualization approaches, by a) identifying additional locations state-wide at a finer geographic resolution, and b) applying geospatial statistical methods for identifying potential locations for Pb targeting. Locations identified were further ground truthed (with the MDHHS CLPPP annual childhood Pb reports) where actions are currently being taken by state and local agencies.
Implications and future needs. The key findings in this paper, namely a) identifying high %EBLL locations at the census tract level using two different approaches, and b) the initial exploration of the causes of Pb exposure in those locations, have important implications for federal, state, and local policies and actions. Our analysis showed that available Pb exposure models/indices can be used as reasonable surrogates to identify high BLL locations in the absence of robust, representative state BLL data but should be verified with available blood Pb data and local knowledge of Pb sources and exposures. Prior literature reviews and published modeling efforts 5,13,41 suggest that sociodemographic factors (e.g., poverty, race, housing, owner/renter status, education), independent of environmental factors (e.g., ambient air levels), are useful as predictors for childhood BLLs.
The approach presented in this paper relies on completeness and accuracy of census tract-level BLL data. The analysis using available Pb indices is not dependent on the BLL data; however, it is dependent on the completeness and accuracy of U.S. Census/ACS and American Housing Survey data.
It is challenging to identify the key factors driving EBLLs in specific communities when available data (e.g., Pb concentrations in soil, dust, drinking water, air, food, blood; exposure factors, such as housing variables, sociodemographics data, bioavailability, soil/dust ingestion rate, premise plumbing) are assembled from disparate studies for use in multimedia source apportionment modeling. 7,10 Our data convergence analysis highlights the need for more complete and coordinated environmental data collection and modeling to better pinpoint environmental sources for targeting Pb reduction actions.
Future research can also focus on further unraveling of the housing and demographic components, such as old housing contributions to BLLs from Pb paint, Pb service lines, or other sources not directly related to sociodemographics. The planned next phase of this research includes analyzing available local environmental data together with the census tract-level BLL, housing, and sociodemographic data and developing and applying new models to help understand the exposure sources contributing to higher BLL locations. Another future research need is analyzing the reason for BLL changes over time, such as the impacts of various environmental actions.
Because 55.6% of the Michigan MDHHS BLL data were missing race variables and no data were received for income and education level, we cannot directly estimate impacts of environmental justice-related variables on the Michigan %EBLL hotspots. As a surrogate for these missing data, we used U.S. Census ACS 2011-2015 data and merged them with the Michigan 2014-2016 exceedance rate of EBLLs Getis-Ord Gi* geospatial cluster hotspots produced in this analysis. This data merging and the resulting statistics presented in Table S3 provide a general understanding of the environmental justice-related variable breakdown in hotspots vs. non-hotspots by population and respective percentages. These results, more specifically the high ratios presented in Table S3 (e.g., 4.09 means that the percentage of non-Hispanic African American people is about four times higher in hotspots than in non-hotspots), also demonstrate potential correlations between high percentages of environmental justice/vulnerable communities and high Pb exposure locations. The data in Table S2 also corroborate what is seen in Table S3, as the mean percentiles of environmental justice variables in %EBLL hotspot census tracts are consistently higher than in non-hotspot census tracts.
Landrigan et al. observed that Pb poisoning is disproportionately concentrated in poor minority communities in the United States because older housing units that are in disrepair are disproportionately concentrated in these neighborhoods. 42 In an investigation of socioeconomic factors impacting childhood BLLs in Milwaukee County, Lynch and Meier stressed that community Pb prevention efforts should be informed by the intersection of multiple indicators of sociodemographic disadvantage. 43 Additional techniques may be required to project the impact of Pb point sources on neighboring communities. Moody and Grady modeled airborne Pb in the Detroit Metropolitan Area using the U.S. EPA's AERMOD model and found depositional and airborne Pb contributed directly and significantly to children's BLLs, even after controlling for housing age and social structures, with Black children at a continued disadvantage. 44 This was consistent with the authors' prior finding that Pb-emitting facilities were primarily located in, and moving to, highly Black segregated neighborhoods regardless of poverty levels. 45 Egan et al., in their review of national survey data on BLLs collected over four decades, concluded that non-Hispanic Black race/ethnicity and poverty are consistently associated with higher BLLs. 46
This study suggests a systematic approach to help identify high Pb exposure locations. Where direct BLL surveillance data are available, this paper provides two statistical methods to identify hotspots, or areas of high potential Pb exposure. In the absence of sufficient BLL data, it provides a data convergence methodology using available surrogates. These indices each have uncertainties given the limitations of the underlying survey data, but collectively they are useful for identifying sources of Pb hotspots related to old housing (e.g., Pb paint, Pb service lines). Where the indices did not show a strong statistical agreement with EBLL hotspots, collaborative engagement with state and local agencies is important to identify and address contributing sources.
This paper focused on a generalizable state-specific approach, using Michigan as a case study, with the goal of enhancing targeting efforts for Pb mitigation and prevention. This approach, using available BLL data and Pb exposure models/indices, can be used by other states in identifying Pb focus areas even where robust BLL data are lacking. This case study methodology for one state could inform potential how-to guides and training to identify high Pb exposure locations as called for by the U.S. EPA Lead Strategy and Federal Lead Action Plan. Effective risk communication will be critical when presenting the results of these analyses to stakeholders to inform the development of strategies to mitigate Pb exposures in children and when communicating these findings to the general public. Further interagency collaborations will be vital in obtaining, interpreting, and communicating available data, knowledge, and results to identify locations of high EBLL prevalence and key exposure sources. Such collaborations, supported by the scientific approach presented in this paper, can advance the public health protection goals described in the Federal Lead Action Plan.