Up in smoke: characterizing the population exposed to flaring from unconventional oil and gas development in the contiguous US

Due to advances in unconventional extraction techniques, the rate of fossil fuel production in the United States (US) is higher than ever before. The disposal of waste gas via intentional combustion (flaring) from unconventional oil and gas (UOG) development has also been on the rise, and may expose nearby residents to toxic air pollutants, light pollution and noise. However, few data exist on the extent of flaring in the US or the number of people living near UOG flaring activity. Utilizing nightly satellite observations of flaring from the Visible Infrared Imaging Radiometer Suite Nightfire product, 2010 Census data and a dataset of remotely sensed building footprints, we applied a dasymetric mapping approach to estimate the number of nightly flare events across all oil shale plays in the contiguous US between March 2012 and February 2020 and characterize the populations residing within 3 km, 5 km and 10 km of UOG flares in terms of age, race and ethnicity. We found that three basins accounted for over 83% of all UOG flaring activity in the contiguous US over the 8 year study period. We estimated that over half a million people in these basins reside within 5 km of a flare, and 39% of them live near more than 100 nightly flares. Black, Indigenous, and people of color were disproportionately exposed to flaring.


Introduction
With the rise of unconventional extraction techniques, domestic oil and gas production in the United States (US) has increased to its highest level on historical record. Oil production has more than doubled since 2010, reversing a longstanding decline in production, while natural gas withdrawals have risen by 60% (U.S. Energy Information Administration 2020a, 2020b). This has been made possible in large part by advancements in directional drilling and high volume hydraulic fracturing ('fracking'), which involves the injection of fluids, sands, and chemical additives into wells (Colborn et al 2011, Webb et al 2014, Kassotis et al 2016). These unconventional extraction techniques have allowed for the development of oil and gas from areas that were previously inaccessible or uneconomical. An estimated 17.6 million residents of the contiguous US now live less than 1 mile from an active oil or gas well, raising concerns about the potential for harmful environmental exposures and impacts on public health (Czolowski et al 2017). Unconventional oil and gas (UOG) drilling has been linked to worsened air quality (Field et (Witter et al 2013). There is also growing evidence linking UOG operations with negative health impacts for nearby residents, including impacts on fetal growth and preterm birth (McKenzie et al 2014, Casey et al 2015, Stacy et al 2015, Whitworth et al 2018, Gonzalez et al 2020, Tran et al 2020). One consequence of the rapid expansion of fossil fuel extraction is flaring, the practice of intentionally combusting excess natural gas to the open atmosphere. Flaring is used during the exploration, production and processing of fossil fuels and is common in oil-producing shale plays with insufficient infrastructure for the capture and utilization of natural gas that is recovered with the oil. Shale plays are accumulations of oil and natural gas deposits within fine-grained sedimentary rock and are located within large-scale geologic depressions called basins.
Air quality monitoring studies indicate that flares, which often operate continuously for days or weeks (Johnson and Coderre 2011), release a variety of hazardous air pollutants including volatile organic compounds and polycyclic aromatic hydrocarbons along with carbon monoxide, nitrogen oxides (NOx), and black carbon (Strosher 1996, Kindzierski 1999, Leahey et al 2001, Ite and Ibok 2013, Fawole et al 2016, Gvakharia et al 2017, Schade and Roest 2018, Giwa et al 2019, Roest and Schade 2020). In recent years, flaring in the US grew substantially, and the US was responsible for the highest number of flares of any country globally, with an estimated 17.3 billion m³ of natural gas flared in 2019 (Elvidge et al 2015, World Bank 2020). This represents a 23% increase from the prior year (World Bank 2020). However, because the practice is largely unregulated, information on flaring from UOG varies by state and is often limited to aggregate (e.g. monthly, field-level) data self-reported by industry operators. A recent analysis in Texas suggests that flaring volumes reported to state regulators represent about one half of actual flare activity, highlighting the limitations of state regulatory data to assess gas flaring activity (Willyard and Schade 2019). The lack of detailed and objective information on the location of flaring from UOG also means that the number of people residing in close proximity to UOG flaring who could be exposed to flaring-related air pollutants is poorly characterized.
In prior work we used satellite observations to measure flaring activity in the Eagle Ford Shale of South Texas (Franklin et al 2019, Johnston et al 2020), and found that exposure to significant levels of flaring during pregnancy was associated with increased risk of preterm birth among women living within 5 km of flares (Cushing et al 2020). Here, we extend our approach to quantify the amount of flaring across the contiguous US over the last eight years and provide the first nationwide estimate of the number of people living near UOG flaring. We utilize a dasymetric mapping approach to estimate the exposed population. Dasymetric mapping refers to the process of disaggregating spatial data to a finer spatial unit of analysis using ancillary data (Mennis 2003). The smallest spatial unit for population data from the US Census Bureau is the 'census block'. In cities, census blocks generally correspond to city blocks. However, flaring occurs primarily in rural areas where census blocks can encompass hundreds of square miles and very few people. We utilize a remotely sensed dataset of building footprints to decompose block-level population counts to the finer spatial unit of analysis of buildings and more accurately estimate the number of people living near UOG flaring.

Study area
We utilized data from the U.S. Energy Information Administration (EIA) to define the boundaries of all oil and gas shale plays in the contiguous US (n.d.). These data were updated in 2016 and include 47 shale plays within 28 basins that intersect 714 counties across 28 states. We first estimated the density of nightly flares (count per square km) across all 28 basins, and then narrowed our study area and further refined our estimates in the three basins with the most flaring activity: the Permian, Western Gulf (Eagle Ford Shale), and Williston (Bakken Shale). These three basins cover 86 counties in 4 states: 31 counties in the Permian (4 in New Mexico and 27 in Texas), 25 in the Western Gulf (Texas), and 30 in Williston (8 in Montana and 22 in North Dakota).

UOG wells
We obtained data on UOG wells for the entire contiguous US from Enverus (www.enverus.com, formerly DrillingInfo). The data include well locations (latitude and longitude) and attributes such as drill type (horizontal or directional), production type (oil or gas), and production volumes (in barrels [BBL] of oil or in thousands of cubic feet [MCF] of gas, which we converted to barrels of oil equivalent [BOE = MCF/6]). We restricted the data to horizontally and directionally drilled wells that were actively producing during our study period: March 2012 through February 2020.
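The unit conversion and well restriction described above can be illustrated with a minimal sketch. The `Well` record and its field names are hypothetical and do not reflect the actual Enverus schema; only the BOE = MCF/6 conversion, the drill types, and the study period come from the text.

```python
from dataclasses import dataclass
from datetime import date

MCF_PER_BOE = 6.0                 # conversion stated in the text: BOE = MCF / 6
STUDY_START = date(2012, 3, 1)    # March 2012
STUDY_END = date(2020, 2, 29)     # February 2020


@dataclass
class Well:
    # Hypothetical fields, for illustration only.
    drill_type: str    # e.g. "horizontal", "directional", "vertical"
    first_prod: date   # first date with reported production
    last_prod: date    # last date with reported production
    gas_mcf: float     # gas production in thousand cubic feet


def gas_boe(well: Well) -> float:
    """Convert a well's gas production from MCF to barrels of oil equivalent."""
    return well.gas_mcf / MCF_PER_BOE


def is_study_well(well: Well) -> bool:
    """Keep horizontal or directional wells whose production overlaps the study period."""
    unconventional = well.drill_type in {"horizontal", "directional"}
    active = well.first_prod <= STUDY_END and well.last_prod >= STUDY_START
    return unconventional and active
```

Treating a well as "actively producing" when its production window overlaps the study period is an assumption of this sketch; the text does not specify the exact rule.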

VIIRS Nightfire
Flares were identified using the Visible Infrared Imaging Radiometer Suite (VIIRS) instrument Nightfire (VNF) product from the National Oceanic and Atmospheric Administration Earth Observation Group. The VNF algorithm uses near-infrared and shortwave infrared bands to detect locations of nighttime subpixel (<750 m) combustion sources (Elvidge et al 2013). For each detected 'hotspot', the VNF product provides information on source temperature, area, and radiant heat. VIIRS began collecting data in March 2012; hence, March 2012 is the first month of our 8 year study period.
We filtered the VNF data in order to limit our sample to observations that were associated with UOG extraction (figure 1). First, a black-body temperature threshold of 1600 K was used to distinguish likely UOG-related flares from other sources of combustion such as biomass burning. Prior work shows VNF detections exhibit a bimodal temperature distribution, with detections related to biomass burning primarily observed between 800 and 1200 K, and the bulk of flaring-associated detections being between 1700 and 1800 K (Elvidge et al 2013). We used a 1600 K cutoff to be conservative, because 1300-1500 K is a crossover range where detections may result from both biomass burning and flaring (Elvidge et al 2015). Second, we restricted our sample to VNF detections within 1 km of actively producing UOG wells in order to reduce the likelihood of including observations unrelated to UOG production, such as flaring at refineries. We then estimated the density of the resulting set of VNF detections across all 714 counties in the 28 EIA-defined basins at 10 × 10 km resolution with a Gaussian kernel density function. Finally, we utilized Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN*) to remove possible aberrant observations as described in detail elsewhere (Franklin et al 2019). Because of the computational intensity involved and scarcity of flaring in other basins, HDBSCAN* was performed only for the three basins with the greatest density of flares. The 86 counties in these three basins accounted for 94% of the high-temperature VNF detections near wells identified in the previous step (figure 1). HDBSCAN* is an unsupervised clustering method that requires only one input parameter, the minimum number of points required in a cluster, k, and does not require setting the number of clusters a priori as with other clustering methods (e.g. K-means clustering). Using a minimum spanning tree that connects every point in the sample and calculating cluster-stability scores, the algorithm identifies clusters containing at least k points and noise points that do not belong to any cluster. In previous work on UOG-related flaring in the Eagle Ford Shale, we applied HDBSCAN* and observed that clustered VNF detections were closer on average to active UOG wells compared to noise detections (Franklin et al 2019). We extended the same approach here: for each basin, we used HDBSCAN* to identify clusters of VNF detections and excluded detections classified as noise.
Figure 1. Flow chart illustrating the filtering process for identifying UOG flares from VIIRS Nightfire. Step 1: excluded 10,009 (1.8%) not in EIA shale counties. Step 2: excluded 49,789 (9.1%) >1 km from a UOG well. Step 3: excluded 30,451 (6.2%) outside 3 study basins. Step 4: excluded 56,020 (12%) noise points using HDBSCAN*. We use the term 'flare' to refer to one VNF nighttime detection.
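The first two filtering criteria described above (source temperature and proximity to an active UOG well) can be sketched as follows. This is an illustrative outline, not the code used in the study: the tuple layouts and function names are assumptions, distances are computed with the haversine formula on a spherical Earth, and the HDBSCAN* noise-removal step is not reproduced here.

```python
import math

EARTH_RADIUS_KM = 6371.0
MIN_TEMP_K = 1600.0        # black-body temperature threshold from the text
MAX_WELL_DIST_KM = 1.0     # maximum distance to an actively producing UOG well


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def filter_detections(detections, wells):
    """Keep hot detections that lie within 1 km of at least one well.

    detections: iterable of (lat, lon, temp_k) tuples (hypothetical layout)
    wells:      list of (lat, lon) tuples for actively producing UOG wells
    """
    kept = []
    for lat, lon, temp_k in detections:
        # The text writes the threshold both as >1600 K and as >=1600 K;
        # this sketch keeps detections at or above 1600 K.
        if temp_k < MIN_TEMP_K:
            continue
        near_well = any(
            haversine_km(lat, lon, wlat, wlon) <= MAX_WELL_DIST_KM
            for wlat, wlon in wells
        )
        if near_well:
            kept.append((lat, lon, temp_k))
    return kept
```

The brute-force comparison of every detection against every well is chosen for clarity; with hundreds of thousands of detections, a spatial index would be the practical choice.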
The clustering process involves tuning for the best k (the value resulting in the fewest noise points identified) for each basin and for each year. Flares detected in January and February 2020 were combined with flares from 2019 for the clustering process. Hereafter, we refer to our final sample of N = 407 368 high-temperature (>1600 K), HDBSCAN*-clustered VNF detections near wells as 'UOG flares'. We therefore use the term flare to refer to one nighttime VNF detection.

Population and demographic data
We obtained demographic data from the National Historical Geographic Information System (Manson et al 2018). We utilized 2010 U.S. Census data to identify populations living near flaring and assess their demographic characteristics at the smallest census geography available, the census block. While more recent population estimates are available from the American Community Survey, they are based on a sample and thus less accurate and are only available at more aggregated geographies (census block groups, which can be very large in our largely rural study areas). We examined population demographics by age (<5 and 65 or older) and race/ethnicity (Hispanic of any race, non-Hispanic White, non-Hispanic Black, non-Hispanic Native American, and all other non-Hispanic groups, including mixed race). Some racial and ethnic groups were collapsed due to the low number of individuals. We also obtained county-level population data from the 2018 American Community Survey to evaluate changes in population since 2010.

Building footprints
Because census blocks can be large in rural areas, we conducted dasymetric mapping using a national dataset of building footprints generated by Microsoft to refine our exposure assessment, illustrated in figure 2 (Anon 2019). Dasymetric mapping refers to the process of disaggregating spatial data (in this case 2010 census blocks) to a finer spatial unit of analysis using ancillary data (Mennis 2003), and has been used in previous studies of exposure to UOG development (Clough and Bell 2016). The Microsoft Building Footprint data were created using a two-stage process in which deep neural networks were used on satellite imagery to identify building pixels (semantic segmentation), and these aggregations of building pixels were then converted into polygons across all 50 US states. In our 86 counties of primary interest, we identified all buildings whose centroids fell within census blocks with non-zero populations. For a given block, we assigned population counts to those buildings by assuming a uniform distribution of total, age-specific, and race/ethnicity-specific populations within the block. This resulted in fractions of persons at the building level that we then summed to generate more refined population estimates at the levels of counties, states and basins. While an improvement over traditional methods relying on census block geography, our approach makes the simplifying assumptions that (a) all buildings are residential (as opposed to being commercial or other types of uninhabited buildings) and (b) the population within each census block is uniformly distributed with respect to age and race/ethnicity. A small number of blocks with non-zero populations contained no buildings (<3% of blocks and 1% of people in our study area). In these cases, those populations were not assigned to any building or included in our dasymetric mapping-based estimates.
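The uniform allocation of block populations to buildings can be illustrated with a short sketch. The dictionary-based inputs and the function name are assumptions made for illustration; the same allocation would be repeated for each age-specific and race/ethnicity-specific count.

```python
from collections import defaultdict


def allocate_population(block_pop, building_block):
    """Spread each block's population evenly over the buildings inside it.

    block_pop:      {block_id: population count} (hypothetical input)
    building_block: {building_id: block_id containing the building centroid}
    Returns {building_id: fractional persons}.
    """
    buildings_in_block = defaultdict(list)
    for building_id, block_id in building_block.items():
        # Only buildings in blocks with non-zero population receive people.
        if block_pop.get(block_id, 0) > 0:
            buildings_in_block[block_id].append(building_id)

    building_pop = {}
    for block_id, building_ids in buildings_in_block.items():
        share = block_pop[block_id] / len(building_ids)
        for building_id in building_ids:
            building_pop[building_id] = share
    return building_pop
```

As in the text, populated blocks that contain no buildings are simply left unassigned, so the building-level total can be slightly smaller than the block-level total.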
For each block and building, we counted the number of UOG flares that fell within its 5 km circular buffer. We focused on the 5 km distance given our prior finding of adverse associations between flaring and preterm birth at this distance (Cushing et al 2020); however, we also considered 3 km and 10 km buffers since studies of flaring are limited and the distance at which flares may result in potentially harmful exposures to nearby populations is not well understood. At the basin level, we compared our estimates of the number and demographics of people exposed derived from the dasymetric mapping with building footprints to the estimates that resulted from using census blocks alone.
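The buffer counts described above can be sketched as follows, assuming building and flare locations have already been projected to planar coordinates in kilometres (an assumption of this example, as are the function names and tuple layouts).

```python
import math


def flares_within(x_km, y_km, flares, radius_km):
    """Count flares whose planar distance from (x_km, y_km) is at most radius_km."""
    return sum(
        1 for fx, fy in flares
        if math.hypot(fx - x_km, fy - y_km) <= radius_km
    )


def exposed_population(buildings, flares, radius_km=5.0, min_flares=1):
    """Sum building-level persons with at least min_flares flares inside the buffer.

    buildings: iterable of (x_km, y_km, persons), where persons may be fractional
    flares:    list of (x_km, y_km) flare locations, one per nightly detection
    """
    return sum(
        persons
        for x_km, y_km, persons in buildings
        if flares_within(x_km, y_km, flares, radius_km) >= min_flares
    )
```

Calling `exposed_population` with `radius_km` set to 3, 5, or 10 corresponds to the three buffer distances considered, and a larger `min_flares` would isolate more intensely exposed populations (for example, those near more than 100 flares).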

Results
Eight basins contained a non-negligible number of flares (at least 1000 high-temperature VNF observations within 1 km from an active UOG well over the 8 year study period) (figure 3). The Williston, Permian, and Western Gulf basins that were the focus of our subsequent analysis accounted for 83% of all high-temperature (⩾1600 K) VNF observations in the contiguous US (table 1). The Appalachian basin, whose shale plays cover a larger geographic area than those in the three basins combined, only included about 5000 (<1%) UOG-related flares over the study period.
After further restricting to non-aberrant observations using HDBSCAN*, our final sample across the Williston, Permian, and Western Gulf basins included 407 368 UOG flares. The total number of flares was highest in the Permian Basin (170 962), followed by Williston (167 235 flares) and Western Gulf (69 171 flares). The highest flare densities were observed in the Williston Basin (maximum of 716 km⁻²), followed by the Permian (413 km⁻²) and the Western Gulf (131 km⁻²) (figure S1 (available online at stacks.iop.org/ERL/16/034032/mmedia)). The ten counties with the highest number of flares included four in Williston (McKenzie, Williams, Dunn, and Mountrail, North Dakota), four in the Permian (Reeves and Loving, Texas; and Eddy and Lea, New Mexico), and two in the Western Gulf (La Salle and Karnes, Texas) (table S1). No UOG flares were observed in 21 out of the 86 counties in these three basins. Overall, the number of flares in all three basins grew over time, particularly since 2018 (figure S2).
During the study period, there were approximately 31 000 actively producing UOG wells in the Permian Basin, 28 000 in the Western Gulf basin, and 18 000 in the Williston Basin. The Permian Basin was the most productive (4.5 billion BBL in oil production and 2.5 billion BOE in gas production) followed by the Western Gulf (3.5 billion BBL in oil and 2.8 billion BOE in gas). Although the Williston Basin experienced a similar number of UOG flares as the Permian Basin, only 3.25 billion BBL in oil and 0.85 billion BOE in gas were produced over the same period in Williston (figure 4). At the county level, the number of UOG flares was more highly correlated with the volume of oil produced (Spearman correlation coefficient r = 0.94) than the volume of gas produced (r = 0.86) in the 8 year period. We connected each UOG flare to the nearest well to identify the most likely production type (oil or gas) associated with each flare. In all three basins, most flares were attributed to oil-producing wells: 73% of flares in the Permian, 85% in the Western Gulf, and 99% in Williston. Aggregating the number of flares by the likely production source, we estimated 27 flares per million BBL of oil and 18 flares per million BOE of gas produced in the Permian, 17 flares per million BBL of oil and 4 flares per million BOE of gas produced in the Western Gulf, and 51 flares per million BBL of oil and 2 flares per million BOE of gas produced in Williston.
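The flaring intensities reported above can be approximately reconstructed from the rounded basin totals given in the text. The sketch below assumes that every flare not attributed to an oil-producing well is attributed to a gas-producing well; because the inputs are rounded, the results agree with the reported values only to within about one flare per million barrels.

```python
# Rounded inputs taken from the text: flare counts, share of flares attributed
# to oil wells, and production in millions of BBL (oil) or BOE (gas).
BASINS = {
    "Permian": {"flares": 170_962, "oil_share": 0.73, "oil_mm_bbl": 4500.0, "gas_mm_boe": 2500.0},
    "Western Gulf": {"flares": 69_171, "oil_share": 0.85, "oil_mm_bbl": 3500.0, "gas_mm_boe": 2800.0},
    "Williston": {"flares": 167_235, "oil_share": 0.99, "oil_mm_bbl": 3250.0, "gas_mm_boe": 850.0},
}


def flare_intensity(flares, oil_share, oil_mm_bbl, gas_mm_boe):
    """Return (flares per million BBL of oil, flares per million BOE of gas)."""
    oil_flares = flares * oil_share
    gas_flares = flares - oil_flares   # assumption: the remainder is gas-attributed
    return oil_flares / oil_mm_bbl, gas_flares / gas_mm_boe


for name, values in BASINS.items():
    per_oil, per_gas = flare_intensity(**values)
    print(f"{name}: {per_oil:.1f} flares per million BBL oil, "
          f"{per_gas:.1f} flares per million BOE gas")
```

The calculation makes the contrast in the text concrete: Williston has roughly twice the flares per barrel of oil of the Permian despite lower total production.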
Using the dasymetric mapping approach, we estimated that 535 907 people lived within 5 km of a UOG flare across the three basins (table 2). Relying on census blocks rather than dasymetric mapping resulted in 4%-13% larger estimates of the exposed population, depending on the basin (table S2). This was most pronounced in the Western Gulf basin and when considering the smaller 3 km buffer distance. In terms of the intensity of exposure, more people in the Permian Basin lived within 5 km of over 100 flares than in any other basin. More than half of these were in Midland and Ector counties, which contain the cities of Midland and Odessa (data not shown).
In the Permian Basin, the proportion of children under five living near flares was slightly higher than in the overall population, while in the Western Gulf, a higher proportion of seniors were exposed (table 2). With respect to race and ethnicity, Black residents were more likely to live within 5 km of a flare in the Permian Basin and Western Gulf than other groups. In the Williston Basin, the Native American and Hispanic populations were the most likely to live near flares. In particular, over a fifth of the Native American population in Williston shale counties lived within 5 km of over 100 flares. Flaring is particularly intense in the Fort Berthold Reservation in North Dakota, which accounted for 70% of the Native American population exposed to more than 100 flares. McKenzie County, which includes part of the Fort Berthold Reservation, had the most UOG flares of any county nationally (83 000) and we estimated that virtually all (93%) of its 6400 residents lived within 5 km of more than 100 flares. Patterns with respect to age and race/ethnicity were consistent when we considered populations within 3 km or 10 km of flares (tables S3 and S4).

Discussion
In this comprehensive assessment, we estimated that three oil and gas producing regions accounted for over 80% of all UOG flaring activity in the contiguous US over the 8 year study period (March 2012-February 2020). The Permian Basin in West Texas and Eastern New Mexico accounted for the greatest number of individual nightly flares, while the flaring intensity of oil production was highest in the Williston Basin (Bakken Shale) in North Dakota and Montana. We estimate that over 535 000 people live within 5 km of flaring in these three regions, and among these, over 210 000 live within 5 km of 100 or more individual nightly flare events.
Although health studies of flaring are limited, residence within 5 km of ten or more flares during pregnancy was associated with a substantial and statistically significant increase in preterm birth in our prior work (Cushing et al 2020). In addition, monitoring studies from the Eagle Ford Shale indicate that flaring is a significant source of NOx as well as more reactive compounds including formaldehyde, acetaldehyde and ethene (Schade and Roest 2018). Increasing atmospheric NOx concentrations over the Bakken Shale and Permian Basin have also been attributed to flaring (Duncan et al 2016). Nitrogen oxides contribute to the development and exacerbation of asthma as well as the formation of ground-level ozone, which in turn is linked with effects on the respiratory, cardiovascular, and nervous systems and with reproductive effects and mortality (U.S. Environmental Protection Agency 2016, 2020). Lab-based investigations and field studies from North Dakota, the Niger Delta, and Alberta, Canada have moreover shown that flaring emits hydrocarbons, including benzene and polycyclic aromatic hydrocarbons (PAHs), as well as particulate matter in the form of black carbon (Strosher 1996, Ana et al 2012, McEwen and Johnson 2012, Fawole et al 2016, Weyant et al 2016, Gvakharia et al 2017). Benzene and some PAHs are well established carcinogens (Agency for Toxic Substances and Disease Registry 1995, Agency for Toxic Substances Control Registry 2007, Kim et al 2013) and have also been linked to birth defects (Lupo et al 2011, 2012). Exposure to black carbon is associated with higher rates of all-cause and cardiovascular mortality as well as cardiopulmonary hospital admissions (Janssen et al 2011, 2012). Together, this evidence indicates that a substantial number of people in the US could be at risk of health-damaging exposures due to flaring from UOG.
However, the lack of routine air quality monitoring in these rural areas or systematic regulation and reporting of flaring activity limits efforts to estimate potential flaring-related exposures and associated health risks. Gaps in our knowledge remain about the characteristics of flaring-related emissions and contributions to local air quality in real-world settings, which are influenced by factors such as the composition of the waste gas, type and operating conditions of the flare and the resulting completeness of combustion, and local meteorology. Additional gaps remain with respect to the potential health impacts of flaring through pathways unrelated to air pollution such as noise and psychosocial stress.
Our findings also show that flaring is an environmental justice issue. Flaring in the Williston Basin disproportionately impacts Native Americans, particularly members of the Mandan, Hidatsa, and Arikara Nation living on the Fort Berthold Indian Reservation. In the Permian and Western Gulf (Eagle Ford) basins, the majority of the population are people of color (table 2). These rural regions are also among some of the poorest in Texas (Tunstall 2015). While we did not assess this in the current study, our prior work also showed that majority Hispanic census blocks in the Eagle Ford Shale had a higher number of flares within 5 km on average than less Hispanic census blocks (Johnston et al 2020). In general, rural US populations like those in our study areas face additional challenges to health such as poverty and lack of access to health care that contribute to an urban-rural gap in life expectancy that is widening and more severe for people of color (Singh and Siahpush 2014, James and Cossman 2017, Long et al 2018). Indigenous leaders have also highlighted the detrimental social and cultural impacts of the Bakken oil boom in native communities (Horwitz 2014, Finn et al 2017).
Our findings reflect larger patterns of urban-rural exploitation in which resources like fossil fuels are extracted from rural areas for the primary benefit of urban populations (Kelly-Reif and Wing 2016). This ultimately undermines progress toward more sustainable systems of energy provision because, with a few exceptions such as Los Angeles, California, the urban majority do not experience the environmental and health consequences of oil and gas extraction.
Finally, flaring also holds implications for climate change. Flaring emits greenhouse gases including carbon dioxide and nitrogen oxides and is an important source of black carbon in the Arctic, where it contributes to radiative forcing (Stohl et al 2013). Recent global estimates suggest 150 billion cubic meters of gas are flared annually, equivalent to the total annual gas consumption of Sub-Saharan Africa (World Bank 2020). Flaring thus accounts for significant losses of recovered natural gas, further contributing to the carbon footprint and climate impact of the fossil fuel industry.
Strengths of our study include the use of objective satellite observations to identify flaring and dasymetric mapping to refine population estimates. While the VNF satellite data do not provide an estimate of the volume of gas flared, they do include information on flaring at much higher spatial and temporal resolution than regulatory data and do not rely on unaudited self-reports by the industry. Utilizing building footprints rather than census blocks to estimate populations near flaring overall resulted in a more conservative and likely more accurate estimate of the exposed population. Of the three study regions, we observed the greatest difference in the size of the exposed population when using building footprints rather than census blocks in the Western Gulf (Eagle Ford Shale). This may be because the Western Gulf encompasses more urban areas than the other two basins, including the cities of Laredo and Eagle Pass; utilizing building footprints rather than census blocks resulted in the exclusion of more people from these more densely populated areas.
However, the building footprint data we used are also subject to error associated with the algorithmic classification of satellite imagery, which introduces uncertainty into our estimates. We also did not have information on building usage (e.g. residential vs commercial) and may have assigned some populations to non-residential buildings. The date of the satellite imagery underlying the Microsoft Building Footprints dataset is also unknown, may vary across the study area, and may fail to capture new housing developments.
Our analysis likely underestimates the exposed population because we rely on the 2010 US Census data (the most recent available) and many oil producing regions have experienced population growth since then. For example, the 2018 American Community Survey suggests the population of McKenzie County, ND has doubled since 2010, likely due to the oil boom. Overall, the population of our three study basins increased by between 8% and 17% from 2010 to 2018 (table S3). If the population growth followed a similar pattern to what we observed in 2010 with respect to the fraction of the population in each basin residing within 5 km of any flare, we have underestimated the exposed population by roughly 55 000 people.

Conclusions
Over half a million people in the US live within 5 km of flaring from oil and gas development. Given the recent increase in flaring and suggestive evidence of adverse health impacts to nearby residents, additional research and greater regulatory oversight of flaring is needed.

Data availability statement
The data that support the findings of this study are available upon reasonable request from the authors.