Potentially underestimated gas flaring activities—a new approach to detect combustion using machine learning and NASA’s Black Marble product suite

Monitoring changes in greenhouse gas (GHG) emission is critical for assessing climate mitigation efforts towards the Paris Agreement goal. A crucial aspect of science-based GHG monitoring is to provide objective information for quality assurance and uncertainty assessment of the reported emissions. Emission estimates from combustion events (gas flaring and biomass burning) are often calculated based on activity data (AD) from satellite observations, such as those detected from the visible infrared imaging radiometer suite (VIIRS) onboard the Suomi-NPP and NOAA-20 satellites. These estimates are often incorporated into carbon models for calculating emissions and removals. Consequently, errors and uncertainties associated with AD propagate into these models and impact emission estimates. Deriving uncertainty of AD is therefore crucial for transparency of emission estimates but remains a challenge due to the lack of evaluation data or alternate estimates. This work proposes a new approach using machine learning (ML) for combustion detection from NASA’s Black Marble product suite and explores the assessment of potential uncertainties through comparison with existing detections. We jointly characterize combustion using thermal and light emission signals, with the latter improving detection of probable weaker combustion with less distinct thermal signatures. Being methodologically independent, the differences in ML-derived estimates with existing approaches can indicate the potential uncertainties in detection. The approach was applied to detect gas flares over the Eagle Ford Shale, Texas. We analyzed the spatio-temporal variations in detections and found that approximately 79.04% and 72.14% of the light emission-based detections are missed by ML-derived detections from VIIRS thermal bands and existing datasets, respectively. This improvement in combustion detection and scope for uncertainty assessment is essential for comprehensive monitoring of resulting emissions and we discuss the steps for extending this globally.


Introduction
Monitoring changes in greenhouse gas (GHG) emissions and resulting levels of atmospheric carbon dioxide (CO 2 ) is critical for assessing climate mitigation effort towards the 1.5 • C goal under the Paris Climate Agreement (www.un.org/en/climatechange/ paris-agreement). The science research community has developed novel approaches to detect atmospheric CO 2 changes for climate monitoring by utilizing observations and modeling (Weir et al 2021, Zeng et al 2021, Hurtt et al 2022. To support decision-makers and stakeholders, GHG emission information also needs to be provided with evaluation and transparency (NASEM 2022).
From among a variety of carbon emission sources, emissions from combustion, such as gas flares, waste burning, and fires, are relatively uncertain compared to emissions from the energy sector, although these are calculated from the same activity-based (often called 'bottom-up' , NASEM 2022) approach (Eggleston 2006) as Emissions = Activity data (AD) × Emission factor (EF).
AD encompasses a wide range of indicators or drivers of GHG emission including a growing number of unconventional sources (Oda et al 2021a, NASEM 2022. For CO 2 from fossil fuel combustion, AD and EF for the energy sector are highly constrained for the system boundary (Eggleston 2006, Oda et al 2021a. AD for energy production is also reported with high precision (5% 2 sigma reported uncertainty) for fuel consumed, while AD for combustion events, such as gas flares is often based on estimates and the total fuel amount consumed within the system. Moreover, EF for biomass burning is highly uncertain (Akagi et al 2011), while EF for fossil fuels is uncertain for unknown chemical composition. Combustion emissions are incorporated in carbon modeling for estimating emissions and removals (Crowell et al 2019) causing errors and uncertainties from combustion events to potentially alias final estimates. Reducing these errors is crucial for maturing carbon monitoring systems (CMS), especially ones based on atmospheric inversions (Oda et al 2019(Oda et al , 2021b. A challenge in evaluating emissions from combustion is the lack of fiducial reference data, particularly with gridded emission reports (Andres et al 2016, Oda et al 2018. This has been tackled by intercomparing emission estimates and using differences as proxies for errors and uncertainties (Oda et al 2015, Andres et al 2016, Oda et al 2018, Pan et al 2020. As these differences are attributable to underlying computation and datasets, intercomparison allows characterization of emission differences and its drivers (Oda et al 2019, Pan et al 2020. This contributes to quality assurance (QA) and uncertainty analysis recommended by the IPCC guidelines and is essential for robust and transparent emission reporting. However, when the methodologies and underlying datasets are shared by different estimates, the process of assigning uncertainties is challenging as intercomparison is not adequately informative about potential uncertainties in estimates.
Daily satellite observations are widely used for detecting combustion from fires and flares globally. The detectors commonly utilize the Visible Infrared Imaging Radiometer Suite (VIIRS) (Justice et al 2013), thermal bands for detection (Elvidge et al 2013, Csiszar et al 2014, Zhang et al 2015, Liu et al 2018, Schroeder and Giglio 2018, Elvidge et al 2019, Lu et al 2020, Zhizhin et al 2021. These detections serve as a key AD in activitybased emission calculations from combustion and are used to derive fire emissions, and the experimental algorithm, VIIRS Nightfire (VNF), is used as the basis for gas flare detection under NASA's CMS. Flares detected by VNF (Elvidge et al 2013 have been used to assess its environmental impact (Deetz and Vogel 2017, Franklin et al 2019, Zhang et al 2019, Sun et al 2020, Cushing et al 2021 and to map emissions in gridded inventories such as Emissions Database for Global Atmospheric Research (EDGAR) (Janssens-Maenhout et al 2019). However, evaluation of VNF detections has not been thoroughly performed due to the lack of evaluation data matching its daily detection rate globally and the uncertainties are aliased into the subsequent analyses.
Improved combustion detection compared to the VIIRS thermal bands have been observed by using the VIIRS 375 m I-band imagery (Schroeder and Giglio 2018), Sentinel-2 imagery (Ramo et al 2021) that allows smaller, cooler fire detection, while smaller flare detections are obtained from the Sea and Land Surface Temperature Radiometer (SLSTR) observations due to an additional short wave infrared band (Caseiro et al 2018). Light emission from combustion, observed by the VIIRS Day/Night Band (DNB), have also been shown to capture a larger fraction of fire and flare pixels missed by VIIRS thermal bands (Polivka et al 2016, Elvidge et al 2019. The DNB lies in the visible/near-infrared region and has a large dynamic range, making it sensitive to weaker combustion, especially with a small source area (Elvidge et al 2019), and allows fire phase estimation . Despite the higher sensitivity of DNB, it is used for confirmation only, while nightlight-only images have been used to detect offshore drilling (Lu et al 2020, Wang et al 2021b. This highlights the need for incorporating features in addition to VIIRS thermal bands to monitor weaker combustion.
This study proposes a machine learning (ML) approach for detecting combustion that utilizes VIIRS thermal band and nighttime light (NTL) observations from NASA's Black Marble product suite (VNP46, Román et al 2018) by jointly characterizing its day/night visible and thermal emission. The approach is data-driven, and methodologically independent of existing algorithms, such as VNF, and leverages the orthogonal information embedded in VIIRS observations to generate detections and can be used to assess error and uncertainty in VIIRS-based combustion detection that serves as AD for derived emission analyses. We applied the approach for gas flare detection in the Eagle Ford Shale, Texas, US, explored detection improvement using light emission signals, and examined the differences with legacy methods (VNF) to generate potential detection uncertainty. While the global share is less than 1% of the total fossil fuel emissions (Gilfillan et al 2021), flaring associated with oil and natural gas production contributes to regional and local GHG and air pollution emissions with severe impacts on the environment and Earth's climate (Allen et al 2013, Fisher and Wooster 2019, Zhang et al 2019, Caseiro et al 2020, Faruolo et al 2020, Cushing et al 2021. Enhanced satellite-based monitoring and spatio-temporal attribution of these occurrences is essential for routinely tracking adherence to mitigation policies, such as Zero Routine Flaring by 2030 (The World Bank 2022) and progress towards the Paris Climate Agreement Goal (Eggleston 2006, Falkner 2016, Zhang et al 2019 globally.

Proposed methodology
We propose an anomaly detection approach utilizing the top-of-atmosphere DNB and moderate band (Mband) observations from Black Marble VNP46A1 to characterize the anomalous light and thermal emission of flares. Table 1 shows the M-bands and DNB in the VNP46A1 dataset acquired by the VIIRS instrument. We derive a high confidence set with both thermal and light response, a moderate confidence urban-masked, light-only response set, which are merged to derive daily detections.
Increased adoption of ML in combustion and emission monitoring has been observed to detect power plant activities (Couture et al 2020), emissions from combustion (Finch et al 2022), and fire using thermal bands (Wang et al 2021a). We explore its applicability in extracting multispectral thermal and light emission signature of combustion and in detecting weak combustion using light emission.
Our approach learns a multispectral model of the non-anomalous thermal and light background signal from a small volume of data from the region and monitors subsequent observations for deviations (see SI: methodology) caused by high thermal and light emissions, to derive pixel-based anomaly scores. The study duration consists of K observations that are divided into training set to learn background models, and test set when the models are applied to new observations. Each observation X k is a multispectral image where pixel i forms a seven-dimensional vector , with M representing all M-bands.

Training 2.1.1.1. M-band background model (thermal emission)
We characterize the non-anomalous multispectral thermal (M-10 to M-16) background properties using an autoencoder (Hinton andSalakhutdinov 2006, Baldi 2012) by training it on clear-sky M-band spectra from the training set. Anomalies are detected using a pixel's multispectral deviation from the background and denoted as anomaly scores. As thermal emissions have a high signal in M-10 and M-11, we also apply the Reed-Xiaoli detector (Chang and Chiang 2002) and monitor a pixel's deviation from daily background statistics in these bands to detect anomalies. These approaches jointly model thermal bands and reduce single-band spurious detections.

DNB background model (light emission)
We characterize the DNB background signal to analyze a pixel's deviation and incorporate its spatial neighbors to detect light emission. We partition the training set's radiance into clusters using a Gaussian mixture model (GMM). For each cluster, we derive a spatial relationship that predicts the central pixel's radiance as a function of its spatial neighbors using an elastic net (Zou and Hastie 2005). In new observations, the GMM assigns each pixel to a cluster, and the elastic net is applied to its neighbors to determine its high radiance likelihood or anomaly score using a daily variance-based threshold.
Clouds contaminate VIIRS M-bands and DNB, necessitating masking. The standard VIIRS cloud mask (VNP35) (Kopp et al 2014) mislabels nighttime clouds (Wang et al 2021c) and flags thermal anomalies as 'cloudy ' (Elvidge et al 2013). To minimize these errors, we train a cloud model from M-12 to M-16 using principal component analysis (PCA) to learn a projection of cloudy and clear spectra. For new observations, we apply this model and assign labels based on a pixel's proximity to cloud projections. This results in a conservative mask and avoids mislabeling of thermal anomalies as seen in VNP35. During high lunar illumination, light emission may appear through clouds. We apply the anomalous light-emission detector over clouds that sets clouds as background to remove such contaminations but retains anomalous DNB radiance appearing through clouds.

Test
We apply the trained models to new observations as shown in figure 1 to detect anomalies. After removing clouds by applying PCA-distancing on each pixel, the detectors extract candidate anomalies. High M-band deviations are used to detect thermal anomalies after the removal of cloud and water pixels, with thresholds determined from daily variance. We compute the DNB anomaly score, identify pixels exceeding the daily threshold and suppress visible clouds. We then use per-pixel urban settlement information from World Settlement Footprint (WSF) (Marconcini et al 2020) and retain pixels with no urban signal to obtain anomalous light emissions.

Detection sets
The anomalous thermal and light emissions are utilized to form the daily combined, DNB-only, and joint detection sets as shown in figure 1. The combined set consists of pixels with both anomalous thermal and light emissions. Anomalous light emissions are filtered to increase decision confidence by retaining pixels (a) that lie in a neighborhood with negligible WSF score, and (b) with at least one band (M-10 to M-13) deviating positively above the background to minimize interference from unlikely combustion signals, such as electric lighting. This forms the DNBonly set capturing anomalous light emission, including those from weaker anomalies with less distinct thermal signals. The joint detection set consists of merged detections from combined and DNB-only sets.

Experimental details
We applied the detectors over the densely-welled Eagle Ford Shale (Wolaver et al 2018, boundary from Energy Information Administration (EIA) 2022) to assess detection performance. Our study area (26.94 N to 29.85 N and −97.02 W to −99.94 W) corresponds to a 700 × 700 gridded block during 22 January 2021-28 February 2021 with 38 observations (12 clear) (see SI: experimental details). The duration was selected to encompass the lunar cycle and examine performance under varying cloud cover. This includes the winter storm Uri that affected natural gas production (Doss-Gollin et al 2021) and allows assessment of tracking variations in active flares.

Evaluating and interpreting detections
The average number of anomalies detected by the methods under clear and cloudy conditions is shown in figure 2, with the DNB-only set detecting approximately four times more anomalous pixels than the combined set and can be attributed to higher sensitivity of light emission to weaker anomalies. As the spatial extent of flaring signal can vary between Mbands and DNB, we consider the combined method to have matched a DNB-only detection, if there is at least one combined detection within a 3 × 3 grid centering a DNB-only detection and find 79.04 ± 2.23% of the DNB-only detections undetected in the combined set.
The lack of ground truth combustion data hinders validation, especially for the DNB-only set, which lacks confirmation from thermal bands. Accurate flare labeling is infeasible by experts given its spatial footprint and daily variation. This is worsened by clouds, DNB signal leakage around urban areas (Wang et al 2021c), and unsuppressed features in WSF. We assessed the likelihood of the detections being flares by contrasting the multispectral detection signal with the background and examining visible features in higher-resolution imagery after removing contaminations from false  positives (FP). We calculated the fraction of FP as n (FP) /n(pixels in the area), by outlining unremoved clouds and leakage around cities using LabelMe (Kentaro 2016), and n(.) is the number of pixels. Throughout the study duration, this fraction is 0.00282 ± 0.00101, and 0.0117 ± 0.002 in the block and Eagle Ford respectively, while no FP were observed in the combined set. Thus, contaminations are negligible due to masking and daily variancebased thresholds, showing the detectors' potential at monitoring daily combustion occurrence. The detections were analyzed using the following approaches: Multispectral profile: We calculated the ratio of the average signal from the detections to that from the background in each band as shown in table 2 (see  table SI-2). This ratio (and difference in M-12, M-13) is high in each band for the combined set, indicating these are very likely anomalies. The DNB-only detections showed a higher ratio in DNB, M-10, and M-11. The higher M-10 and M-11 signals of the DNBonly set, where gas flaring peaks, indicate that these are likely combustion that are relatively weaker than combined detections.
Co-location with flaring sites: We compared the clear night spatio-temporal detection aggregate over Eagle Ford with flaring infrastructure indicators to examine their co-location. We resampled an openly available flaring well dataset from The Texas Railroad Commission, (ArcGIS 2022), 2015 to 15 arc seconds. At least one flaring site was found in a 7 × 7 grid centering 71.04%, 73.92%, and 74.91% of the combined, DNB-only, and VNF detections respectively, showing comparable co-location of VIIRS detections with flaring sites. We also compared the DNB-only detections with a Landsat-8 composite (Gorelick et al 2017) and confirmed by visual analysis that well pads are co-located with our detections (figure 3). On examining the DNB-only and combined detections non-co-located with flaring sites, we observed 74.61% and 89.22% of the detections overlap with these visible features, respectively. The increased co-location with well pads is likely due to the composite's acquisition dates matching closely with the study duration. Although ground truth combustion information is unavailable, high co-location indicates that DNB-only detections are associated with flaring sites and minimally contaminated by nonflaring sites. We selected the minimum grid size that makes the co-location analysis feasible.
This indicates that DNB-only detections have weaker thermal signals and are probable flares missed by thermal bands, while the combined set consists of high confidence detections. These sets together capture a more accurate representation of likely daily flares at emission sites.
We examined flaring persistence by comparing the DNB-only and combined detections with the annual Black Marble composite (VNP46A4) from 2020 and observe 62.47 ± 0.32% and 82.43 ± 0.34% overlap respectively, showing persistent flaring at these locations. Persistence indicates consistent gas flaring and is important for tracking changes at these sites.

Comparison with VNF
We compare our daily detections (ML k ) with VNF (VNF k ) to evaluate the overlap and increase in detections with ML-based approaches. For a VNF detection, a larger number of adjacent pixels are detected by the methods. If we observe at least one detection within a 5 × 5 grid centering a VNF detection, such flares are considered to have overlapped and expressed as We observe high overlap between VNF and the proposed approaches as shown in table 3, showing that the methods effectively extract flaring signatures. The combined set overlaps with confirmed VNF detections. Approximately 87.06% of the non-overlaps correspond to non-confirmed VNF events, which may include spurious detections. DNB-only detections show an increased overlap with VNF. The joint set shows a high overlap with VNF. We found four observations with ML detections that are missed by VNF, and overlap is thus reported for 34 observations. By jointly learning the multispectral distribution of the M-bands, our approach lowers the chance of spurious detections in the combined set that are seen in confirmed (A2021057) and nonconfirmed (A2021041) VNF detections.
ML-enabled detections undetected by VNF are indicated as We compute d m for pixels in ML k for which at least one detection is not recorded in VNF k within  a 5 × 5 grid centering the ML detection. Table 3 shows increased detection with all ML k . For the DNBonly set d m is computed over detections that overlap with well pads in the Landsat composite. We handlabel persistent DNB detections that do not show spatial overlap with visible well-pads in Landsat imagery to exclude such detections and these have been masked to the best of our knowledge. The inclusion of urban-masked DNB extracts weaker anomalies and lowers the detection threshold without increasing FP errors.

Impact of VIIRS-derived estimates of gas flares
The proposed method is expected to impact VIIRSderived estimates of active flares by extracting currently undetected occurrences and generating independent estimates for approximating detection uncertainties through intercomparison. Figure 5 compares the binarized average spatio-temporal distribution of flaring between the ML detections.

Discussion
This work proposes an independent, data-driven combustion detection approach by jointly considering all VNP46A1 bands, and the resulting detections when viewed with detections from algorithms such as VNF, allow approximating uncertainties in satellite-based combustion detection. As these detections serve as AD for deriving emission estimates, approximating detection uncertainties show the sensitivity of derived emission estimates on input AD. As detectors, both VNF and our data-driven approach have intrinsic errors and biases. Intercomparison allows quantifying these uncertainties for transparently informing emission reports. Table 3 shows ML-based combined (16%) and DNB-only (72%) detections that are undetected by VNF, highlighting the necessity for complementary methods for accurate monitoring. The use of an urban-masked DNB-only signal detects likely weaker flares that are missed by thermal bands and underlines the importance of the DNB for combustion analysis and further studies are required for understanding these signals.
The lack of ground truth data limits validation and highlights the need for collecting evaluation data for improving detection accuracy. A limitation of the DNB-only set is that co-occurrences with non-flaring light signals, such as electric lighting, cannot be decoupled. However, retaining urban-masked detections with positive deviation from the background in at least one M-band minimizes the scope of such contaminations. The performance is also dependent on the cloud mask and WSF.
The proposed approach is agnostic to anomaly signatures and requires additional steps for generalization. We will apply the detectors over areas susceptible to combustion and extract their multispectral representation to create a training dataset. By training on this repository, we will explore generalized detectors robust to spatio-temporal heterogeneity for scaling globally and utilize anomaly score thresholds for reducing contamination. Established methods will be used to derive combustion-specific parameters ML that are currently unattributed in VIIRS-based bottom-up analysis. Our approach should allow localizing uncertainties in AD by intercomparison with existing detections and inform QA and verification analysis suggested by IPCC guidelines. The use of additional sources such as SLSTR, geostationary sensors, should further improve uncertainty assessment. Datasets with focused spatial or temporal coverage such as the SLSTR detections in 2017 (Caseiro et al 2020) can also provide independent estimates for intercomparison over localized scales. Approaches using annual flaring persistence bounds (Caseiro et al 2020), and atmospheric data-based (often called 'topdown') estimates can also contribute to uncertainty analysis.
Lastly, this is a step towards multifaceted Black Marble-based emission mapping that is suitable for CMS studies, given its extensive uncertainty assessment (Wang et al 2021c). NTL-derived estimates of human-caused emissions Maksyutov 2011, Oda et al 2018) and city-level CO 2 emissions have been improved by leveraging Black Marble (Oda et al 2021b). Being a physical measurement, satellite-derived NTL can derive value-added carbon products with science traceability through uncertainty estimates.

Conclusion
This study proposed and developed a ML-based nighttime gas flare detector using NASA's Black Marble product suite by jointly modeling the thermal and light emission signals. Our approach detects flares independently and provides an opportunity for assessing uncertainties in VIIRS flare detections through intercomparison with existing detections, for transparently informing emission reports. We applied the detector over the Eagle Ford Shale and showed the light emission signal to be sensitive to probable weak flares and should improve its detection compared to thermal bands. Our approach is agnostic to combustion type and future improvements will explore generalization techniques to scale globally.

Data availability statement
The Black Marble datasets are available through NASA's official Level-1 and Atmosphere Archive and Distribution System. LAADS is a standard website: (https://ladsweb.modaps.eosdis.nasa.gov/).
The data that support the findings of this study are available upon reasonable request from the authors. Characterizing flaring from unconventional oil and gas operations in south Texas using satellite observations