Unconventional Oil and Gas Development Exposure and Risk of Childhood Acute Lymphoblastic Leukemia: A Case–Control Study in Pennsylvania, 2009–2017

Background: Unconventional oil and gas development (UOGD) releases chemicals that have been linked to cancer and childhood leukemia. Studies of UOGD exposure and childhood leukemia are extremely limited. Objective: The objective of this study was to evaluate potential associations between residential proximity to UOGD and risk of acute lymphoblastic leukemia (ALL), the most common form of childhood leukemia, in a large regional sample using UOGD-specific metrics, including a novel metric to represent the water pathway. Methods: We conducted a registry-based case–control study of 405 children ages 2–7 y diagnosed with ALL in Pennsylvania between 2009 and 2017, and 2,080 controls matched on birth year. We used logistic regression to estimate odds ratios (ORs) and 95% confidence intervals (CIs) for the association between residential proximity to UOGD (including a new water pathway-specific proximity metric) and ALL in two exposure windows: a primary window (3 months preconception to 1 y prior to diagnosis/reference date) and a perinatal window (preconception to birth). Results: Children with at least one UOG well within 2 km of their birth residence during the primary window had 1.98 times the odds of developing ALL in comparison with those with no UOG wells (95% CI: 1.06, 3.69). Children with at least one vs. no UOG wells within 2 km during the perinatal window had 2.80 times the odds of developing ALL (95% CI: 1.11, 7.05). These relationships were slightly attenuated after adjusting for maternal race and socioeconomic status [OR = 1.74 (95% CI: 0.93, 3.27) and OR = 2.35 (95% CI: 0.93, 5.95), respectively]. The ORs produced by models using the water pathway-specific metric were similar in magnitude to those produced using the aggregate metric. Discussion: Our study, which included a novel UOGD metric, found UOGD to be a risk factor for childhood ALL. This work adds to mounting evidence of UOGD’s impacts on children’s health, providing additional support for limiting UOGD near residences. https://doi.org/10.1289/EHP11092


Introduction
Childhood acute lymphoblastic leukemia (ALL) is a hematological malignancy that arises from immature B- and, less commonly, T-lymphoid immune cells. 1 ALL is the most common type of cancer in children (age 0-14 y), representing nearly 80% of childhood leukemia cases and 20%-30% of all childhood cancer cases. [1][2][3] Incidence of ALL typically peaks in children age 2-4 y, 1,4 indicating that the early life environment is likely etiologically important. Although long-term survival rates exceed 90%, 5 survivors may face health and wellness difficulties later in life, such as chronic illnesses (e.g., cognitive dysfunction, heart disease), [6][7][8][9] psychological issues (e.g., depression, anxiety), [9][10][11] and elevated risk of second primary cancers. 8 Despite a decrease in the incidence of cancer overall in the United States, the incidence of childhood ALL has continued to increase, underscoring the importance of primary prevention.
The etiology of ALL is likely multifactorial and attributable to both environmental exposures and underlying genetic susceptibility. Current evidence suggests that for most cases, ALL develops due to multiple genetic insults, such as chromosomal translocations or alterations. [12][13][14] The development of preleukemic clone cells commonly occurs after an initiating genetic insult from a chromosomal translocation in utero, with an additional genetic insult required for overt ALL to manifest. 2,4,14,15 Although the genetic and molecular processes behind the disease have been delineated, the upstream etiological agents triggering such biological insults remain poorly understood. Current evidence and the early age of peak ALL incidence suggest that exposure to environmental chemicals-particularly to chemicals that are hematotoxic, damage DNA, or interfere with the immune system-may provide a mechanism for pre-or postnatal insults. 2,16 To date, ALL has been linked to several environmental and chemical exposures, including ionizing or diagnostic radiation, 17,18 radon, 19 air pollution, [20][21][22][23][24] pesticides, [25][26][27][28][29] polybrominated diphenyl ethers, 30 and benzene. 22,[31][32][33][34][35] Unconventional oil and gas development (UOGD), commonly referred to as hydraulic fracturing or "fracking," is a complex process with the potential for releases of chemical and radiological contaminants into both water and air. 36 UOGD is a rapidly expanding source of energy and petrochemical production in the United States. Hydraulic fracturing, an important step in the UOGD process, involves pressurized injections of millions of gallons of water, chemicals, and proppant (e.g., sand) into underground rock formations to create small fissures, allowing natural gas to flow to the surface. 37 In addition to the natural gas, the injected fluids and formation water also rise to the surface as wastewater. 
A single well has been estimated to produce between 1.7 and 14 million liters of wastewater over the first 5 to 10 y of production, and this varies widely by producing formation. 38,39 The transport and storage of this wastewater may result in surface spills, [40][41][42][43] and improper management or structural failures of injection wells used for storage can result in migration of chemicals into groundwater or surface water. [44][45][46] Average annual spill rates (number of spills/ UOG wells drilled) across four states was estimated at 5.6%, with 31.1% of wells ever reporting a spill; many spills occurred in watersheds serving as drinking water sources. 42 Hundreds of chemicals have been reportedly used in UOGD injection water or detected in wastewater, some of which have been associated with leukemia. 47 Known and suspected carcinogens include heavy metals, radioactive material, volatile organic compounds (e.g., benzene), and polycyclic aromatic hydrocarbons. 48,49 In addition to water pollution, UOGD has the potential to generate air pollution during well and road construction and through vehicle emissions from the transport of oil, gas, and wastewater. 50,51 Studies of UOGD-related air emissions have measured several carcinogens, including radioactivity, particulate matter (PM), and volatile organic compounds (e.g., benzene). [52][53][54][55] Furthermore, elevated levels of indoor radon were measured in homes near UOGD activity. 56,57 Additionally, the process of extracting natural gas also brings technologically enhanced naturally occurring radioactive compounds to the surface with ancient brine formation water, and drill cuttings and sludge from equipment may also contain radioactivity. 58,59 The potential for children living near UOGD to be exposed to chemical carcinogens and radiological contaminants is a major public health concern.
Research on the potential association between exposure to UOGD and risk of childhood cancer is urgently needed. To our knowledge, there have been only two published studies of this relationship to date. The first was an ecological study conducted in the state of Pennsylvania, 60 which compared standardized incidence ratios of childhood cancer before and after drilling and observed no difference; this analysis did not account for a latency period or adjust for confounders. 61 The second, a registry-based case-case study in Colorado, found that children and young adults with ALL (ages 0-24 y; n = 87 cases) were four times more likely to live in areas of greater oil and gas activity (conventional and unconventional combined) than controls, which were children with nonhematological cancers, based on models adjusted for multiple confounders. 62 The case-case methodology may have attenuated the true association if UOGD was a shared risk factor. The paucity of data on the association between UOGD and childhood cancer outcomes has fueled public concerns about possible cancer clusters in heavily drilled regions and calls for more research and government action. 63 To advance understanding of the relationship between UOGD exposure and ALL risk and inform public policy, we conducted a registry- and population-based case-control study. This work builds on prior studies by incorporating a larger sample size, the use of cancer-free controls identified from birth records, and the use of UOGD-specific metrics, including a novel metric developed for capturing exposures through the water pathway. 64,65

Methods

Study Setting, Population, and Design
We conducted a population-based case-control study in the commonwealth of Pennsylvania because it is home to intense oil and gas activity. More than 10,000 UOG wells were drilled in Pennsylvania between 2002 and 2017, with the pace of drilling increasing sharply from 2007 to 2011. 66 In addition, more than 1,000 spills, 5,000 violations, and 4,000 resident complaints related to oil and gas were documented between 2005 and 2014 in Pennsylvania. 42,67 Further, up to one-third of domestic groundwater wells in Pennsylvania are located within 2 km of a hydraulically fractured well. 68
Cases included all children diagnosed with ALL between the ages of 2-7 y in Pennsylvania from 2009 to 2017 (Figure 1). We chose this age range to cover the peak age of ALL incidence in the United States 69 and exclude cases of the etiologically distinct infant leukemia (diagnosis between the ages of 0-1 y). 70,71 We selected the years of diagnosis to ensure there was opportunity for exposure after drilling commenced in the state and a latency period of at least 1 y to account for the development of disease. 72 ALL cases (n = 429) were identified from the Pennsylvania state cancer registry by Pennsylvania Department of Health staff using ICD-O-3 sites C420, C421, C424 and histology codes 9811-9818, 9826, and 9835-9837. Cases were then linked to their birth records available from the Pennsylvania Vital Records maintained by the Bureau of Health Statistics and Registries. Cases were excluded if a) the state could not match a birth record in Pennsylvania, b) the child had a previous diagnosis of cancer in the state cancer registry, or c) a birth address could not be obtained/geocoded beyond ZIP code level.
For each case, five control children were randomly selected by Pennsylvania Department of Health staff from live births in the Pennsylvania birth records with frequency-matching on birth year (n = 2,145; Figure 1). Reasons for excluding controls included: a) birth address could not be obtained or geocoded to street level, b) the child had a previous diagnosis of cancer in the state cancer registry, and c) the child was a sibling of a case or another control. After obtaining the data set, we performed additional geocoding [using SAS (version 9.4; SAS Institute Inc.)] and checked geocode quality for both case and control children, excluding those whose birth address was not street-level quality or better (n = 14 cases; n = 13 controls). Because the missingness rate for several key covariates was very low, we elected to conduct a complete case analysis by excluding from the study population children who were missing any of the following covariates (established or suspected risk factors) 16,[73][74][75] : maternal participation in the U.S. Department of Agriculture's Special Supplemental Nutrition Program for Women, Infants, and Children (WIC, an individual-level representation of socioeconomic status), birth weight, and mode of delivery (Figure 1; n = 10 cases; n = 52 controls). We included 405 cases and 2,080 controls in our final analyses. The study protocol was approved by the institutional review board of Yale University (HIC #2000021809) and by the Pennsylvania Department of Health.

Exposure Assessment
We obtained and merged permit and production report data sets from the Pennsylvania Department of Environmental Protection's Office of Oil and Gas Management 66 to construct a data set of location, permit, and production data for UOG wells that were active (i.e., drilled or producing, as confirmed by having a reported spud date or a submitted production report) in Pennsylvania during the period 2001-2015. The data were then cleaned, and their quality was checked. For example, missing data on spud date, well type, and producing formation in the permit data sets were cross-referenced with and supplemented by the production data sets. Duplicate entries were addressed by preferentially retaining the most recent entry. Wells with a missing spud date were assigned a spud date equal to the first date of the earliest production report minus the median number of days between spud and first production in the data set. The final database included 9,578 active coalbed methane, gas, oil, and combined oil and gas wells in unconventional formations.
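The spud-date imputation rule described above can be illustrated with a short sketch. This is not the study's code (the data cleaning tools are not described in detail); the record layout and the function name `impute_spud_dates` are hypothetical, and rounding the median lag to whole days is an assumption.

```python
from datetime import date, timedelta
from statistics import median

def impute_spud_dates(wells):
    """Fill missing spud dates as (first production date - median spud-to-production lag).

    `wells` is a list of dicts with keys 'spud' and 'first_production',
    each holding a datetime.date or None (hypothetical layout).
    """
    # Lags, in days, for wells where both dates are known.
    lags = [
        (w["first_production"] - w["spud"]).days
        for w in wells
        if w["spud"] is not None and w["first_production"] is not None
    ]
    median_lag = round(median(lags))  # assumption: round to whole days

    filled = []
    for w in wells:
        spud = w["spud"]
        if spud is None and w["first_production"] is not None:
            spud = w["first_production"] - timedelta(days=median_lag)
        filled.append({**w, "spud": spud})
    return filled
```

Wells with neither a spud date nor a production report are left unchanged here; by the definition of an active well given above, such records would not enter the database in the first place.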
Maternal residential address at birth was obtained and geocoded from birth records for both cases and controls, and address at diagnosis was obtained from cancer registries for cases. Birth address was used to assign exposures using inverse distance-squared weighted (ID²W) well counts [represented by $\sum_{i=1}^{n} 1/d_i^{2}$ for all $n$ UOG wells within a buffer zone, where $d_i$ is the distance between the $i$th UOG well and a residence], referred to as the "aggregate metric." We calculated this metric with buffer sizes of 2, 5, and 10 km. We selected two etiologically important exposure windows: a) 3 months prior to conception to 1 y prior to diagnosis, called the "primary window," and b) 3 months prior to conception to birth, called the "perinatal window." For the primary window, age-matched controls were assigned a reference date corresponding to the diagnosis date of a case. For the perinatal window, exposures were assigned using the respective birth dates of the cases and controls.
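As an illustration of the aggregate metric, the following minimal sketch sums inverse squared distances over wells inside a buffer. It is not the study's implementation: it assumes residence and well locations have already been projected to planar coordinates in kilometers and that wells have already been restricted to those active during the exposure window. The function names `id2w` and `exposed` are hypothetical.

```python
import math

def id2w(residence, wells, buffer_km):
    """Inverse distance-squared weighted well count within a buffer.

    `residence` is an (x, y) pair and `wells` a list of (x, y) pairs,
    all in planar kilometers (assumed pre-projected).
    """
    rx, ry = residence
    total = 0.0
    for wx, wy in wells:
        d = math.hypot(wx - rx, wy - ry)
        if d == 0:
            raise ValueError("well at zero distance: 1/d^2 is undefined")
        if d <= buffer_km:
            total += 1.0 / d ** 2
    return total

def exposed(residence, wells, buffer_km):
    """Dichotomized form: True if at least one well lies within the buffer."""
    return id2w(residence, wells, buffer_km) > 0
```

For a residence at the origin and wells at distances of 1, 2, and 5 km, the 2 km buffer gives 1/1 + 1/4 = 1.25, and the 5 km buffer adds 1/25 for a total of 1.29. The dichotomized form depends only on whether any well falls inside the buffer, not on the weights.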
To capture water as a route of exposure to UOGD, we also calculated a flow-direction, inverse distance metric based on land-surface topography, ID_ups, referred to as the "water pathway-specific metric." ID_ups is based on the widely accepted conceptual model that groundwater flow in regions of hill-and-valley topography occurs in the downhill direction, parallel to the topographic gradient. 76 ID_ups is represented by the equation $1/u$, where $u$ is the distance to the nearest upgradient UOG well, determined with the D-infinity algorithm in TauDEM. This metric and its underlying programming code were introduced by Soriano et al. and were subsequently applied in a study of UOGD-related drinking water exposure. 65 This exposure metric assumes that UOG wells that are located upgradient of a residence contribute more to exposure than downgradient wells, presuming that consumption of or contact with groundwater from domestic wells is a major exposure source. The metric was calculated using buffer sizes of 2, 5, and 10 km around the maternal residence. Our selection of buffer sizes was informed by the hydrological (2 km) and epidemiological (5 and 10 km) literature, 64,65,76,[77][78][79][80][81] and facilitates comparison between the aggregate metric and the water pathway-specific metric and comparisons with previous epidemiologic studies. We conducted a subanalysis using this metric as the main UOGD exposure assessment variable.
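The arithmetic of the water pathway-specific metric can be sketched as below. The upgradient determination itself (the D-infinity flow-direction step in TauDEM) is not reproduced; the sketch assumes each well already carries a precomputed distance and an upgradient flag, and it assigns a value of zero when no upgradient well lies within the buffer. Both the input layout and the function name `id_ups` are hypothetical.

```python
def id_ups(wells, buffer_km):
    """Inverse distance to the nearest upgradient well within a buffer.

    `wells` is a list of (distance_km, is_upgradient) pairs, where the
    upgradient flag is assumed to come from a separate terrain analysis.
    Returns 0.0 when no upgradient well falls inside the buffer (assumption).
    """
    upgradient = [d for d, is_up in wells if is_up and 0 < d <= buffer_km]
    if not upgradient:
        return 0.0
    return 1.0 / min(upgradient)
```

With wells at 0.5 km (downgradient), 1.6 km (upgradient), and 4.0 km (upgradient), the 2 km buffer yields 1/1.6 = 0.625: the closer downgradient well is ignored, which is the feature that distinguishes this metric from the aggregate one.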

Residential Mobility
Residential mobility among pregnant women or in early childhood could introduce exposure misclassification. [82][83][84][85] We used three analyses to address the potential exposure misclassification introduced by residential mobility among pregnant mothers. First, we compared all case addresses at birth and diagnosis and assessed the distance moved as well as variables associated with mobility (e.g., socioeconomic status). Second, we examined the difference in cases' exposure classification at the birth and diagnosis addresses. Third, our selection of the perinatal exposure window, which restricts the window of exposure to 3 months prior to conception to birth, addresses mobility. The exposure estimate based on birth address is likely to be most accurate during this shorter time window (i.e., less opportunity to move residences), and pregnancy is an important etiological window for childhood leukemia. 14,15,29 We used the findings from these three analyses to provide context and aid interpretation of our results.

Covariates and Confounders
To account for potential confounding, we considered adjustment for both individual-level and area-level factors. We generated a list of a priori potential confounders informed by the literature that were available from birth records or publicly available data sources including sex, mode of delivery, birth weight, race, ethnicity, maternal education, air pollution exposure, and pesticide exposure. 2,20,21,25,33,[73][74][75][86][87][88] We estimated exposure to maternal and childhood residential air pollution using the U.S. Environmental Protection Agency (U.S. EPA) Bayesian space-time downscaler models, which provide daily estimates of average fine PM with an aerodynamic diameter ≤2.5 μm (PM2.5) at census tract centroids. 89 We took the mean of daily average PM2.5 measurements from 3 months prior to conception to 1 y prior to diagnosis to produce one representative PM2.5 measurement for each individual. To represent maternal and childhood residential exposure to agricultural pesticides, we retrieved raster data of cropland for Pennsylvania from the U.S. Department of Agriculture National Agricultural Statistics Service CropScape. 90 Individuals were matched to the cropland map from their birth year, except for 2003 and 2004, which used a 2002 map, and 2005-2008, which used a 2008 map, due to data availability. We calculated the percent of land designated as cropland within buffers of 500 m and 1,000 m around each home (modeled after Reynolds et al. 91) and referred to this as "percent cropland."
We obtained information on community-level demographic and socioeconomic characteristics from the U.S. 2000 and 2010 Decennial Census [e.g., median household income, educational attainment, percentage of households living in poverty, housing occupancy, housing type (e.g., rented vs. owned)] for all Pennsylvania census tracts. 92 We also linked individuals to the Centers for Disease Control and Prevention/Agency for Toxic Substances and Disease Registry Social Vulnerability Index (SVI), a composite metric representing 15 different social conditions, including socioeconomic status, demographics, and access to transportation, among other factors. 93

Statistical Analyses
All statistical analyses were conducted in SAS (version 9.4; SAS Institute Inc.), and all tests were two-sided with an alpha level of 0.05. We used unconditional logistic regression to estimate odds ratios (ORs) and 95% confidence intervals (CIs) for the association between UOGD exposure and ALL risk, adjusting for year of birth (i.e., the matching variable). We constructed separate models for each metric, for each buffer size, and for both the primary and perinatal exposure windows. We constructed two model types: minimally adjusted (i.e., only adjusting for year of birth, the matching variable) and parsimonious (i.e., including only covariates that changed the OR by 10% or more) (see Supplemental Material, "Intermediate analyses of association and correlation to identify covariates and confounders for model building"). If two covariates were highly correlated (Spearman ρ > 0.80 or χ² p < 0.05) and led to model convergence problems, one was selected for use based on public health relevance (e.g., though both represent socioeconomic status, an individual-level measure of socioeconomic status such as maternal use of food stamps may be more relevant to a child's health outcome than their census tract-level median household income) and distribution in the population (e.g., heterogeneity of exposure). We considered several individual- and community-level variables that are proxy measures of socioeconomic status, including maternal education, maternal participation in WIC, census tract-level median household income, and census tract-level SVI. The parsimonious models included maternal race and maternal participation in WIC.
As a sensitivity analysis, we constructed a third highly adjusted model that included those covariates that were either associated with the exposure or outcome based on χ² and Fisher's exact tests at a less stringent p < 0.20 (Supplemental Material, "Intermediate analyses of association and correlation to identify covariates and confounders for model building") or had known etiological or biological importance according to the literature (infant sex, mode of delivery).
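The 10% change-in-estimate criterion used for the parsimonious models can be made concrete with a small sketch. The text does not state the exact formula used, so the version below, which measures the change relative to the minimally adjusted OR, is an assumption, and the function names are hypothetical.

```python
def percent_change_in_or(or_minimal, or_adjusted):
    """Absolute percent change in the OR after adding covariates,
    relative to the minimally adjusted OR (assumed definition)."""
    return abs(or_adjusted - or_minimal) / or_minimal * 100.0

def meets_change_criterion(or_minimal, or_adjusted, threshold_pct=10.0):
    """True if the added adjustment shifts the OR by at least threshold_pct."""
    return percent_change_in_or(or_minimal, or_adjusted) >= threshold_pct
```

Applied to the 2 km estimates reported in the abstract, the move from the minimally adjusted OR of 1.98 to the parsimonious OR of 1.74 is a change of about 12%. That figure reflects the joint adjustment for maternal race and WIC participation, so it illustrates the calculation rather than the covariate-by-covariate screening described in the Supplemental Material.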

Demographics
Cases and controls were similar with respect to sex, gestational age, birth weight, mode of delivery, educational attainment of the mother, census tract-level median household income, and SVI (Table 1). Mothers were predominantly non-Hispanic (91% of both cases and controls) and White, but there was a higher percentage of White mothers among cases (81% of cases and 73% of controls). The case group had a significantly smaller percentage of Black mothers (7% in comparison with 16% of controls). A slightly greater frequency of mothers of cases reported participating in WIC (40% of cases and 36% of controls). Case children had a greater percentage of cropland within 500 m of their birth address on average than control children (13.8% in comparison with 12.5%). Average annual PM2.5 levels were not significantly different between cases and controls (11.7 μg/m³ for both groups).

UOGD Exposure within the Study Population
A total of 85%-98% of the study population was unexposed to UOGD; the prevalence of unexposed individuals varied with the exposure metric and buffer size (Table 2). Because of the low prevalence and limited variability in UOGD exposure, there was insufficient spread to apply the exposure metrics with more than two categories or to use them continuously, so we dichotomized them. The ID²W metric, when dichotomized, effectively represents whether the participant had at least one UOG well within the buffer zone, whereas the ID_ups metric represents whether the participant had at least one UOG well within the buffer zone that was located upgradient within their watershed.

Residential Mobility
A total of 58% of cases moved residences between birth and diagnosis. The mean distance moved was 9.02 km (median: 0.49 km, interquartile range: 0-4.88 km, range: 0-374 km). Though the proportion of cases who moved (and for some, the distance moved) was substantial, <2% of individuals changed exposure designation (either exposed to unexposed or vice versa) using any metric after the move.

Association between ALL and Exposure to UOGD
Aggregate metric (ID²W). Using the aggregate UOG exposure metric and the primary exposure window, ORs were elevated for individuals living within 2, 5, and 10 km of UOGD (Figure 2). In models adjusting only for year of birth, the odds of developing ALL were 1.98 times higher in children with at least one UOG well within 2 km of their birth residence, in comparison with those with no UOG wells (95% CI: 1.06, 3.69). The magnitude of the minimally adjusted OR decreased monotonically but remained elevated as the buffer size of the exposure metrics increased to 5 km (OR = 1.33; 95% CI: 0.88, 2.00) and 10 km (OR = 1.14; 95% CI: 0.84, 1.55). After adjusting for maternal race and WIC participation in our parsimonious models, the odds of ALL were 1.74 times higher for individuals living within 2 km of UOGD (95% CI: 0.93, 3.27), with some attenuation of the odds ratio at buffer sizes of 5 km (OR = 1.18; 95% CI: 0.78, 1.78) and 10 km (OR = 1.03; 95% CI: 0.75, 1.40). Our sensitivity analysis, which included adjustment for the additional covariates of sex, delivery route, birth weight, and percentage cropland, did not appreciably change the estimates in comparison with the parsimonious model (Supplemental Material, "Sensitivity analysis using the highly adjusted model").
For the aggregate metric and the perinatal window, estimates were larger in magnitude by 20%-40% than the estimate for the corresponding buffer size using the primary window (Figure 2). Children living within 2 km of UOGD had 2.80 times the odds of developing ALL (95% CI: 1.11, 7.05) in models adjusting only for year of birth. The minimally adjusted odds of ALL were also elevated for children with UOGD within 5 km (OR = 1.54; 95% CI: 0.90, 2.63) and 10 km (OR = 1.42; 95% CI: 0.99, 2.04). In parsimonious models, children with UOGD within 2 km had 2.35 times the odds of having ALL (95% CI: 0.93, 5.95). In sensitivity analyses, the highly adjusted model results were consistent with the parsimonious models at all buffer sizes (Supplemental Material, "Sensitivity analysis using the highly adjusted model").
Water pathway-specific metric. Use of the water pathway-specific exposure metric in the regression models produced results that were similar to those for the aggregate metric for the primary exposure window (Figure 3). Children who had at least one upgradient UOG well within 2 km had 1.94 times the odds of developing ALL (95% CI: 0.75, 4.99) in comparison with unexposed children in models adjusting only for year of birth, though the CI was wide. The association was slightly attenuated by adjusting for maternal race and WIC participation (OR = 1.70; 95% CI: 0.66, 4.41), and the most adjusted model results were consistent with the parsimonious model. Children with at least one upgradient UOG well within 5 km had 1.45 times the odds of developing ALL (95% CI: 0.76, 2.77). Finally, children with at least one upgradient UOG well within 10 km in their watershed had 1.26 times higher odds of developing ALL than unexposed children (95% CI: 0.75, 2.14). Adjusting for maternal race and WIC participation attenuated this association (OR = 1.10; 95% CI: 0.64, 1.87). The estimates produced by the sensitivity analyses were not appreciably different from those produced by the parsimonious model (Supplemental Material, "Sensitivity analysis using the highly adjusted model").
The ORs for the water pathway-specific metric restricted to the perinatal window were also similar to those produced by the aggregate metric (Figure 2). In models adjusting only for year of birth, children with UOG activity within 2 km falling within their upgradient watershed had 3.10 times the odds of developing ALL (95% CI: 0.74, 13.01). In the parsimonious model, the odds of developing ALL for those children were 2.45 (95% CI: 0.58, 10.37). Children with an upgradient UOG well within 5 and 10 km had 1.48 and 1.60 times higher odds, respectively, of developing ALL than control children (95% CI: 0.59, 3.68 and 0.83, 3.08, respectively). The odds remained elevated at 5 and 10 km in the parsimonious model. Sensitivity analyses adjusting for additional covariates including sex, delivery route, birth weight, and percentage cropland did not significantly change the estimates in comparison with the parsimonious model (Supplemental Material, "Sensitivity analysis using the highly adjusted model").
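Readers who want to compare the precision of the reported estimates can back-calculate the standard error of each log OR from its confidence limits. The sketch below assumes the intervals are Wald-type, that is, symmetric on the log scale; the text does not state the interval method, so this is an assumption, and the function names are hypothetical.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval

def log_or_se(ci_low, ci_high, z=Z_95):
    """Standard error of log(OR) implied by a Wald-type confidence interval."""
    return (math.log(ci_high) - math.log(ci_low)) / (2.0 * z)

def implied_or(ci_low, ci_high):
    """Point estimate implied by the limits: their geometric mean."""
    return math.sqrt(ci_low * ci_high)
```

For the aggregate metric at 2 km, `implied_or(1.06, 3.69)` and `implied_or(1.11, 7.05)` round to 1.98 and 2.80, matching the reported minimally adjusted ORs for the primary and perinatal windows. The implied standard error is larger for the perinatal window, consistent with the smaller number of exposed children in that window.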

Discussion
In this population-based case-control study of UOGD including 405 children with ALL and 2,080 age-matched controls in Pennsylvania, we found that children living in proximity to UOGD had up to 2-3 times the odds of developing ALL. Although ORs were statistically significant in models only accounting for year of birth, elevated ORs persisted after additionally adjusting for race, socioeconomic status, and competing environmental exposures. However, low exposure prevalence limited our statistical power, and confidence intervals at the 2 km buffer size and for the water pathway-specific metric in particular were wide. Nonetheless, our results indicate that exposure to UOGD may be an important risk factor for ALL, particularly for children exposed in utero. To our knowledge, this is the first case-control study of childhood ALL that examined UOGD exposure exclusively, the largest study of unconventional oil and gas and hematological malignancies in children, and the first study to apply a water pathway-specific metric of UOGD exposure in a health context.
Our results complement those reported by the McKenzie et al. study in Colorado, which reported significantly elevated odds of ALL for children and young adults ages 5-24 y and nonsignificantly elevated or mixed odds for children ages 0-4 y. 62 In the Colorado study, the strongest odds were observed for children and young adults ages 5-24 y, who were 3-4 times as likely to live near UOG as control children with nonhematological cancers. Our ORs fell within a similar range. However, their study only had 39 cases in the 0-4 y age range, which may have hindered their ability to draw inferences for that group. Our results, which focus on children ages 2-7 y, provide more information on this younger age group.
Figure 2. Plots of the risk of childhood acute lymphoblastic leukemia (ORs and 95% CIs) by buffer size, assessed with the aggregate metric for the primary and perinatal exposure windows. The aggregate metric refers to ID²W well counts. ORs and 95% CIs calculated using unconditional logistic regression. Minimally adjusted: adjusted for year of birth only; Parsimonious: adjusted for year of birth, maternal race, and WIC. Note: CI, confidence interval; ID²W, inverse distance-squared weighted; OR, odds ratio; WIC, Special Supplemental Nutrition Program for Women, Infants, and Children.
Our results also suggest that preconception to birth is an important etiological window for exposure to UOG and the development of ALL. This finding is consistent with research on other environmental exposures, such as pesticides, 25,26,94 bolstering the evidence for the importance of this sensitive window. ORs calculated using the perinatal window were 20%-40% larger than the estimates for the same buffer size using the primary window, though there were fewer exposed individuals and more uncertainty overall. The perinatal period is a critical window for the genetic mutations that precede the development of ALL. 14,15 It is generally hypothesized that the etiology of childhood ALL is multifactorial due to two distinct genetic "hits." 95 The development of preleukemic clone cells commonly occurs after a genetic insult that results in fusion gene formation or hyperdiploidy in utero. 2,95,96 Then a second, possibly postnatal, insult is required for overt ALL to develop. 2,4,14,15 Given the similar results observed across both exposure windows, our findings suggest that UOG-related environmental exposures may contribute to both prenatal and postnatal insults leading to the development of ALL.
We applied a new metric for evaluating drinking water exposures from UOGD and identified suggestive relationships between ID_ups and ALL. This metric and our selection of buffer sizes were informed by the hydrological and epidemiological literature. 64,65,76–81 The estimates generated using the water pathway-specific metric ID_ups were similar or greater in magnitude in comparison with estimates using the traditional ID^2W metric, although the uncertainty associated with these estimates was higher. This finding could indicate that water is an important route of exposure to leukemogenic compounds for the development of ALL. Our metrics do not identify specific etiological agents underlying the observed associations. Seventeen compounds used or produced by UOGD have been previously associated with leukemia. 47 One candidate agent is benzene, which is known to be used or produced by UOGD. Maternal occupational and ambient exposure to benzene during pregnancy, whether in the air 22,32 or in the form of solvents, paints, and petroleum, has been associated with elevated odds of ALL. 35 Benzene has been detected in multiple groundwater studies in this region focused on UOGD 41,65,96–99 and in biological samples from communities near oil and gas development. 100 However, it is also possible that these results arose because the water pathway-specific metric produced exposure estimates similar to those of the aggregate metric, particularly when dichotomized. A previous analysis by our group showed that the continuous forms of these metrics tended to be moderately positively correlated with one another (Spearman ρ = 0.62 for ID_ups and ID^2W at 2 km). 65 It may be that simple proximity to UOGD, which could encompass and/or represent multiple routes of exposure, is the driving factor behind the associations for both metrics. At this time, the dominant stressor is not well understood. 36 Nonetheless, epidemiological studies should try to pinpoint specific exposure pathways underlying associations. Several recent studies of UOGD have explored metrics representing specific (rather than aggregate) routes of exposure, such as flaring, earthquakes, air pollution, and radioactivity. 50,55,101–103
Epidemiological studies of UOGD exposure have generally relied on spatial surrogates of exposure, such as ID^2W well counts. Previous epidemiological studies using spatial metrics have mainly used a 10 km buffer size or larger. 104,105 However, when considering environmental exposures like water pollution, realistic transport distances should be considered. 106 A study in northeast Pennsylvania measuring the vulnerability of groundwater wells to contamination by UOGD indicates that the extent of a domestic groundwater well's capture zone (the area around the well from which the water is pulled) is generally less than 2 km. 77 Further, Llewellyn et al. suggested that a contaminant plume migrated 1 to 3 km in groundwater from a well pad to domestic wells, 79 and the results of Osborn et al. and Jackson et al. suggest elevated methane levels (i.e., enhanced gas phase transport) within 1 km of UOG well pads. 107,108 Beyond water, an analysis of UOG-related air pollutants found that individuals whose closest UOG well was <0.5 mi (0.80 km) away were at greater risk of health effects from exposure to air pollutants than those farther than 0.5 mi from a well. 53 The extent of transport of UOG-related air pollutants would be expected to vary by pollutant and local meteorology. Because emitted pollutants attenuate as different functions of distance, there may not be a universal buffer size that optimally captures all hazards. It is possible that applying buffer sizes of 10 km or more could introduce exposure misclassification, dilute the pool of meaningfully exposed individuals, and thus attenuate the magnitude of the observed effect.
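To make the role of buffer size concrete, the following is a minimal sketch of how a buffer-based spatial surrogate might be computed for one residence. It assumes that the inverse distance-squared weighted well count is the sum of 1/d² over wells within the buffer, uses great-circle (haversine) distances, and floors distances at a small minimum to avoid division by zero. The coordinates, the function names, and the `min_km` floor are hypothetical and are not taken from the study's methods.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def proximity_metrics(residence, wells, buffer_km, min_km=0.01):
    """Return (well count, any-well indicator, sum of 1/d^2) within the buffer."""
    distances = [haversine_km(*residence, *well) for well in wells]
    # Keep only wells inside the buffer; floor tiny distances (hypothetical choice)
    in_buffer = [max(d, min_km) for d in distances if d <= buffer_km]
    id2w = sum(1.0 / d ** 2 for d in in_buffer)
    return len(in_buffer), len(in_buffer) > 0, id2w


# Hypothetical residence and well coordinates (illustration only, not study data)
residence = (41.80, -76.50)
wells = [(41.81, -76.50), (41.83, -76.52), (41.86, -76.45)]

results = {b: proximity_metrics(residence, wells, b) for b in (2, 5, 10)}
for buffer_km, (count, exposed, id2w) in results.items():
    print(f"{buffer_km} km: wells={count}, exposed={exposed}, ID2W={id2w:.3f}")
```

In this toy layout the binary indicator is the same at every buffer size, while the well count grows from one to three as the buffer widens. The added wells contribute little to the distance-weighted sum because they are farther away, which illustrates how larger buffers can pool residences with very different proximity into a single "exposed" group.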
In this analysis, we observed the largest effect sizes using a buffer size of 2 km, though the number of exposed individuals in these groups was low. The magnitudes of the effects at the 5 and 10 km buffer sizes were comparably moderate. This attenuation was likely particularly apparent in our study because the metrics were used in a binary fashion. Spatial metrics are more typically categorized (e.g., into quartiles), and our restricted exposure distribution precluded use of this method. Nonetheless, this attenuation of the observed effect with increasing buffer size may provide support for using smaller, more selective buffer sizes in epidemiological analyses despite the effects on sample size.
This work adds to a growing body of literature on UOGD exposure and women's and children's health used to inform policy, such as setback distances (the required minimum distance between a private residence or other sensitive location and a UOG well). 109,110 Current setback distances in the United States are the subject of much debate, 111,112 with some calling for setback distances to be lengthened to more than 305 m (1,000 ft) 113,114 and as far as 1,000 m (3,281 ft). 115 The current setback distance in Pennsylvania is 152 m (500 ft), extended from 61 m (200 ft) in 2012. 116 We observed elevated odds of cancer associated with UOG activity within 2 km, which exceeds any existing setback distance. Further, although effect sizes diminished with increasing buffer size, the odds of ALL were still elevated at 5 and 10 km buffer sizes. Our results in the context of the broader environmental and epidemiological literature suggest that existing setback distances are insufficiently protective of public health, particularly for vulnerable populations like children, and should be revisited and informed by more recent data.
Our study has several notable strengths. It is the largest study to date investigating the association of UOGD with ALL or any childhood cancer, the first case-control study to focus exclusively on UOGD exposure, and the first to apply a water pathway-specific UOGD exposure metric. We controlled for multiple known risk factors and examined the impact of several competing environmental exposures. We assessed UOGD exposure at multiple buffer sizes informed by the epidemiological and environmental literature. Selection bias, a typical concern in case-control studies, is unlikely to have affected our study because we selected cases from population-based cancer registries and controls from statewide birth records without the need to contact any subjects and seek consent for participation. Because we had access to addresses at two time points for the cases, we were able to examine the potential impact of residential mobility on exposure classification. We determined that only a very small percentage of cases (<2%) had different exposure assignments across birth address and diagnosis address, indicating limited potential for exposure misclassification. This finding is consistent with that of other studies of spatially defined environmental exposures, which also have not found residential mobility to be a major source of error. 82,83
Our study had several limitations. First, we were constrained by the individual-level information available in the birth records, which limited our ability to investigate potential confounders such as parental occupation. Second, though we designed a statewide study, UOGD is confined to the extent of the shale, and drilling is not performed in urbanized metropolitan centers. Therefore, most of our study population was unexposed.
However, we would expect this to have attenuated any observed relationships, because population density and the incidence of cancers with nonmodifiable risk factors tend to be higher in urban areas, 117 and urban-dwelling individuals may be more likely to experience known risk factors for ALL, such as air pollution. 118,119 Although the ORs were statistically significant in models accounting for year of birth alone, they were no longer statistically significant after adjusting for race, socioeconomic status, and other environmental exposures; nonetheless, the odds remained consistently elevated across different time periods and metrics. ALL is a rare disease, and as such, the lack of statistical significance could be due more to the rarity of the disease limiting our precision than to a lack of biological or public health significance. Low exposure prevalence (between 1% and 5%) when using the water pathway-specific metric (particularly at the smaller buffer sizes) may have reduced model stability and the overall precision of risk estimates. This metric may reveal more differences in larger study populations or in studies of more common health end points. Moreover, the metric is most relevant for people using private groundwater wells. Although a significant proportion of our suburban and urban population may be served by public water sources, up to 50% of residents in the more heavily drilled rural counties may be served by groundwater wells. 120,121 There is an opportunity to further examine drinking water sources in this population to improve the accuracy of exposure assessment.
Our study suggests that children living near UOGD have increased odds of developing ALL as assessed by multiple metrics, including a novel metric representing drinking water exposure. The magnitude of the association was greatest among those children living within 2 km of UOGD and exposed during the perinatal period. This research adds to a growing body of work documenting adverse health effects associated with UOGD, particularly among children, 36,109,110 and provides additional support for more stringent setback policies and other public health measures to reduce exposures to UOGD.