Estimating global oilfield-specific flaring with uncertainty using a detailed geographic database of oil and gas fields

Associated gas flaring during crude oil production is an important contributor to global warming. Satellite technology has made global flaring monitoring possible with high spatial resolution. In this study, we construct a granular database to geographically match global oil and gas fields with remote sensing flaring data from the Visible Infrared Imaging Radiometer Suite from 2012 to 2019. The geographic information system database contains over 50 000 oil and gas fields and around 4700 infrastructure sites (e.g. refineries, terminals) in 51 countries and regions, representing 96% of global oil production and 89% of natural gas production. Over 2900 fields and 140 infrastructure sites in 47 countries contain matching flares. The annual matched flare volume covers 89%–92% of the satellite-estimated flaring volume of these countries and 85%–87% of total worldwide volume detected by the satellite. In 2019, a set of 263 ‘high-flare’ fields (which flare more than 0.1 billion cubic meters per year) account for 67% of the total matched satellite-estimated volume. These fields are mainly concentrated in the Persian Gulf, West and East Siberia, Eastern Venezuela Basin, Permian and Williston Basins in the United States, the Gulf of Mexico, and West and North Africa. Accounting for asymmetric instrument uncertainty suggests that country-level flaring rates are accurate to within −8% to +29%, the global average within 1%.


Introduction
Crude oil production is a major contributor to the life-cycle greenhouse gas (GHG) emissions of transport fuels, accounting for ∼15%–40% of full fuel cycle emissions [1]. The natural gas that emerges at crude oil production sites is called 'associated gas'. The process of combusting the associated gas in an open-atmosphere flame is called 'gas flaring', whilst simply releasing gas to the atmosphere is called 'gas venting' [2]. If local capacity for associated gas offtake and sale is absent or undersized, the gas is routinely flared, reinjected for pressure support or enhanced oil recovery, or in some cases vented. Additionally, smaller quantities of gas flaring occur at oil refineries and natural gas processing facilities. Gas flaring is an important source of GHG emissions from oil and gas production. Routine flaring contributes more than 300 million tons of CO₂ per year [3]. A recent study showed that gas flaring is the most prevalent driver of carbon intensity from the production phase of global oilfields [4]. By reducing gas flaring, several countries could meet all or a substantial portion of their nationally determined contributions toward GHG reduction targets under the Paris Climate Agreement [5].
Gas flaring sites are dispersed across vast areas of landscape in many countries around the world. The sources of emissions are also often not reported to the public, crudely estimated, or not monitored by regulatory bodies with meaningful verification requirements. Only a relatively small number of countries, such as Norway, UK, Brazil and Mexico, provide their flare volumes to the public [6,7]. This lack of consistent public reporting makes the gas flaring data difficult to collect. Therefore, remote sensing detection technologies are needed.
Remote sensing is widely acknowledged as an important tool for flaring detection [8,9]. The earliest study using remote sensing to study gas flaring is from 1978 [10]. Multiple other studies have also made progress in this field using different remote sensing instruments [11][12][13]. In 2012, Elvidge, Zhizhin, and other US National Oceanic and Atmospheric Administration researchers successfully developed global gas flaring estimates based upon observations from the Visible Infrared Imaging Radiometer Suite (VIIRS), an instrument on a polar-orbiting satellite used for environmental monitoring and numerical weather forecasting [14]. This method is now hosted at Colorado School of Mines and reports real-time nightly flaring estimates as well as yearly summary estimates by flare and by country.
By recording nighttime data, VIIRS minimizes the impact of sunlight and collects high-quality short-wave gas flare data. Annual data start in 2012 and include the spectral bands M7, M8 and M10, where 'M' denotes moderate-resolution bands [15]. Data from the M11 spectral band became available in 2017. This band has a longer wavelength and a narrower bandwidth, thus enhancing the accuracy of detection [16]. Two desirable aspects of an ideal satellite-based detection platform would be (a) a low minimum detection threshold to allow detection of smaller flares in more difficult conditions and (b) an accurate method of using flare brightness to estimate flare gas volume. Recent studies have examined performance in this regard. Zhizhin et al performed a ground-based measurement study which showed that VIIRS was able to detect flare volumes of 0.005 billion cubic meters per year (bcm yr⁻¹), and the test results were within 10% error when measured on a nightly basis [17]. A validation study by Brandt showed that 80.8% of the VIIRS satellite flare volume estimates lay within 0.5 orders of magnitude of the volumes reported by governments, and the sum of flares studied had an expected interquartile range of −6% to +3% of reported volumes [18].
In addition to these VIIRS-based data, other monitoring studies include European satellite detection published by Casadio et al [12]. We do not examine these results further here, partly because the flare estimation products are not easily accessible.
Previous flaring detection studies have focused on flare-level detection and scaled these results to country-level totals [10][11][12][13]. However, no flare analysis has yet estimated field-level flare volumes on a global scale. This is largely due to a lack of comprehensive geographic information system (GIS) datasets of global oil and gas fields in the public domain. To date, a number of large-scale oil and gas mapping efforts exist in the commercial sector. One of the most comprehensive geographic databases of global oil and gas fields is the World Energy Atlas (WEA), produced by the Petroleum Economist Ltd [19]. WEA has been published and repeatedly updated since 1999 and covers all major infrastructure (pipelines, terminals, etc) as well as oil and gas field outlines. However, it does not include GIS data and is a proprietary data product that cannot be made open source. In addition, many narrowly focused maps and datasets exist from governments, agencies, private companies, and journalistic sources. These products tend to be local in scope. However, they can be combined and synthesized into a global mapping product that covers all major oil and gas basins.
In this study, we constructed a unique open-source granular geographical database of global oil and gas fields. We also collected data on the locations of infrastructure such as refineries, underground storage sites and gas processing plants, as gas flaring and methane emissions are also associated with this infrastructure in some cases. Next, the database was used to apportion the global VIIRS flaring data in order to estimate field-level gas flaring rates globally. This is the first time, to our knowledge, that flaring data from satellites have been comprehensively allocated to producing oilfields rather than to countries. The database we generated will help to improve the accuracy of life-cycle environmental impact assessments of global oil and gas operations. The field-level flaring data are also useful for future carbon emissions monitoring and policy making.

Geo-database
One of the main obstacles to accurately allocating flare volumes from oil and gas production to individual oil and gas fields is the lack of a comprehensive global geographic database of fields that is accessible to academic researchers. The publicly available field data are mostly aggregated on a country-level basis, usually come from different sources, and are of two data types: GIS data and non-GIS maps. For example, governmental GIS oilfield data for countries like Brazil [20], Mexico [21] and Norway [22] are accessible through their governmental webpages. However, for some crude oil producing countries like Algeria [23][24][25][26], only non-GIS maps from third-party sources are available. The data collection and data processing steps are presented below. More information can be found in the supplementary material (available online at stacks.iop.org/ERL/16/124039/mmedia).

Data sources
We first defined a priority scheme for including different sources and types of data, as shown in figure 1. Governmental data were given higher priority than third-party data, and direct GIS data were preferred to non-GIS maps because we aimed to build a GIS database. If two available data sources had the same priority, the one with higher resolution quality and more information, such as field names and product types, was preferred. In a limited number of cases, which we note, we used a source that was lower in the priority order when its quality was much higher than that of another source.
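The priority rules above can be expressed as a sort key. The following is a minimal sketch, not the scheme in figure 1: the `Source` class, its field names, the `detail_score` proxy for resolution and attribute richness, and the assumption that the governmental criterion outranks the GIS criterion are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    name: str            # hypothetical label for a data source
    governmental: bool   # True for government data, False for third-party
    is_gis: bool         # True for direct GIS data, False for non-GIS maps
    detail_score: int    # hypothetical proxy for resolution and attributes


def priority_key(src):
    # Smaller tuples sort first: governmental before third-party,
    # GIS before non-GIS maps, then the more detailed source.
    # The ordering of the first two criteria is an assumption.
    return (not src.governmental, not src.is_gis, -src.detail_score)


def pick_source(candidates):
    """Return the candidate with the highest priority."""
    return min(candidates, key=priority_key)


candidates = [
    Source("third_party_map", governmental=False, is_gis=False, detail_score=3),
    Source("gov_map", governmental=True, is_gis=False, detail_score=1),
    Source("gov_gis", governmental=True, is_gis=True, detail_score=2),
]
print(pick_source(candidates).name)
```

The manual exceptions mentioned in the text (using a lower-priority source of much higher quality) are not captured by this key and would have to be handled case by case.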
We collected data from 76 different sources, 18 with GIS data and 58 with maps. Data collection methods included both online and offline searching, with 'oil and gas field', 'GIS' and 'map' as keywords and English as the main search language. Note that for countries like Russia and China, we did not find data at the national level. Thus, regional data gathering from the main production areas was performed in these countries (see supplementary material sections 1.1.2.2 and 1.2.2.5). Another special case is the United States and Canada, where comprehensive data are available at the well level [27,28]. In those cases we generated field-level outlines from well-level data, as explained in section 2.1.2.
Data on infrastructure, including refineries, product terminals, underground storage sites and gas processing plants, were also collected from governmental sources [22,29–33] and third-party maps (see supplementary material). Most of the databases have information on facility names and types. At completion, the database contains information from 70 different data sources.

Data processing
Our final database is in GIS format, classified by product types and with as many field names as possible. The collected data could be roughly divided into three types: field (shape) shapefile data, well (point) shapefile data and map data. All three types of data required further data processing, described below. The main tool we used here is ArcGIS 10.7.1, a widely used geospatial analysis platform.
Firstly, for the oil and gas field shapefile data, basic reorganizing was needed. Secondly, we needed to convert the well (point) shapefile data into field (shape) data with product types and field names. Some government field shapefile data (e.g. those of Colorado and Louisiana) have typical characteristics resulting from use of a buffer area GIS tool. Therefore, for the states with well-level data, we created buffer shapes with wellheads as centers and merged the resulting shapes with the same field names into one large shape representing the field. During the conversion, we set the radius of the buffer shapes to 1 km around the well, following the cases of Colorado and Louisiana (see supplementary material figure S1). The same approach was used for the well-level data of Alberta in Canada.
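The buffer-and-merge step can be illustrated without a GIS package: a point lies inside the merged 1 km buffers of a field exactly when it is within 1 km of at least one well carrying that field name. The sketch below uses this equivalence with great-circle distances. It is not the ArcGIS workflow used in the study; the function names, the spherical Earth radius and the sample coordinates are assumptions for illustration.

```python
import math
from collections import defaultdict

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation, assumed)
BUFFER_KM = 1.0           # buffer radius quoted in the text


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two lon/lat points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def group_wells_by_field(wells):
    """Group (field_name, lon, lat) well records by field name."""
    fields = defaultdict(list)
    for field_name, lon, lat in wells:
        fields[field_name].append((lon, lat))
    return dict(fields)


def fields_containing(point, fields, buffer_km=BUFFER_KM):
    """Names of fields whose merged well buffers contain the point."""
    lon, lat = point
    return sorted(
        name
        for name, wells in fields.items()
        if any(haversine_km(lon, lat, wlon, wlat) <= buffer_km for wlon, wlat in wells)
    )


# Hypothetical well records: two wells in field "A", one in field "B".
wells = [("A", -104.00, 40.0), ("A", -104.01, 40.0), ("B", -103.50, 40.0)]
fields = group_wells_by_field(wells)
print(fields_containing((-104.005, 40.0), fields))
print(fields_containing((-103.800, 40.0), fields))
```

A production workflow would instead build explicit polygons (buffer, then dissolve by field name) so that the outlines can be stored and mapped; the membership test here only reproduces the containment logic.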
Additionally, for countries for which only non-GIS maps with field outlines are available, we georeferenced the maps against the ArcGIS base map (base map: imagery; projection: Robinson) and manually created shapes based on the rectified maps. Field names and product types were imported manually into the GIS database. Note that the georeferencing process could affect the data quality, especially for countries with desert or offshore areas that have few identifiable features to support georeferencing of the maps. For large maps containing multiple countries, we improved projection accuracy by breaking the maps down into smaller regions that were georeferenced one by one. The same methodology was used for infrastructure sourced from maps.

Database summary
Overall we collected geographic data for oil and gas fields in 51 countries and regions, which cover 96% of global oil production and 89% of natural gas production [34,35]. The top 20 oil producing and gas flaring countries are included in our database. The database includes 51 158 fields and 4765 locations of infrastructure. A total of 88% of the fields have government sources, and 80% of the fields have GIS data (thus do not require digitization based on non-GIS maps). 57% of these fields are in the United States due to the structure of mineral rights in the US which causes creation of many small fields compared to other countries. For infrastructure coverage, 98% of infrastructure locations identified have government sources and 84% have GIS data.

Matching with flaring data
The flare data used in this study are from the VIIRS observation system [16]. Data for 2012–2019 were used in this work because the flaring estimates for these years have been calibrated and tested in the peer-reviewed literature [18].
The steps of matching flares (points) with fields (shapes) are as follows (also see figure 2). (1) Determine if the flare is contained within the boundary of at least one field; if it is contained within a field boundary, (2) match the flares with the field containing them; (3) sum up flare volumes if multiple flares are within the field; (4) if a flare is contained within multiple overlapping fields (uncommon), we allocate its flare volume according to the production data of the fields; (5) for fields without production data, flare volumes are allocated equally between the overlapping fields containing the flare. In this branch of the allocation process, approximately 65% of the flares were assigned to fields.
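The matching steps above can be sketched in a few lines. This is a minimal illustration using planar coordinates and a ray-casting point-in-polygon test, not the GIS procedure used in the study. The function names and the toy geometry are hypothetical, and the sketch assumes that an equal split is used whenever any of the overlapping fields lacks production data.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def allocate_flares(flares, fields, production):
    """Allocate flare volumes to fields.

    flares: list of (x, y, volume); fields: {name: polygon};
    production: {name: production}, possibly missing some fields.
    Returns (per-field totals, list of flares outside every field).
    """
    totals = {name: 0.0 for name in fields}
    outside = []
    for x, y, volume in flares:
        hits = [n for n, poly in fields.items() if point_in_polygon(x, y, poly)]
        if not hits:                       # step (1) failed
            outside.append((x, y, volume))
            continue
        prods = [production.get(n) for n in hits]
        if all(p is not None for p in prods) and sum(prods) > 0:
            total_prod = sum(prods)        # step (4): weight by production
            weights = [p / total_prod for p in prods]
        else:                              # step (5): equal split
            weights = [1.0 / len(hits)] * len(hits)
        for name, w in zip(hits, weights):
            totals[name] += w * volume     # steps (2) and (3)
    return totals, outside


# Toy example: two overlapping square fields and three flares.
fields = {
    "A": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "B": [(1, 1), (3, 1), (3, 3), (1, 3)],
}
production = {"A": 30.0, "B": 10.0}
flares = [(0.5, 0.5, 1.0), (1.5, 1.5, 4.0), (5.0, 5.0, 2.0)]
totals, outside = allocate_flares(flares, fields, production)
print(totals, outside)
```

In the toy example the second flare sits in the overlap of both fields and is split 3:1 according to production, while the third flare falls outside every field and is passed on to the boundary-extension step.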
Approximately 35% of the flares failed step (1) in figure 2, placing them outside the boundary of any oilfield. This could be due to a number of reasons. First, fields generated from maps might not be as accurate as governmental GIS data, as the geo-rectifying process could slightly reduce data accuracy. Also, the geographic field definition might not take flare sites into consideration in some regions, such as a case where a gas processing facility was placed slightly outside the geographic/geologic boundary of the field. However, we noticed that a majority of the flares that were not matched to a field are in close proximity to the field boundaries. Therefore, we extended the field boundaries to cover more flares that were likely to be related to the fields (see figure 3). The distance to extend field boundaries is uncertain. Performing a sensitivity analysis (see figure 3(B)), the flare counts covered by extended fields increased quickly with distance from 1 to 4 km; however, the increase slowed with distance longer than 6 km. Hence the distance threshold was set to be 5 km, at which the proportion of matched flares rises to 86%. Some flares were found to be far away from fields but close to the infrastructure points. This is due to the fact that flaring also occurs in infrastructure like oil refineries and gas processing facilities. Thus we matched these flares with their closest infrastructure point within a certain distance (shown in figure 3(A)). The distance threshold was set to be in accordance with that of the extended field, which is 5 km. The flares that were still not matched after this step were considered 'unmatched' flares.
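The sensitivity analysis on the extension distance can be sketched as a cumulative coverage curve. The example below assumes that the distance from each initially unmatched flare to the nearest field boundary has already been computed. The function names and all distances are synthetic and chosen only to show the shape of the calculation, not to reproduce figure 3(B).

```python
def coverage_curve(n_inside, distances_km, thresholds_km):
    """Share of all flares matched for each boundary-extension distance.

    n_inside: number of flares already inside a field boundary.
    distances_km: distance to the nearest field boundary for each flare
                  that lies outside every field.
    """
    total = n_inside + len(distances_km)
    curve = {}
    for t in thresholds_km:
        newly_matched = sum(1 for d in distances_km if d <= t)
        curve[t] = (n_inside + newly_matched) / total
    return curve


def marginal_gain(curve):
    """Increase in coverage between consecutive thresholds."""
    ts = sorted(curve)
    return {ts[i]: curve[ts[i]] - curve[ts[i - 1]] for i in range(1, len(ts))}


# Synthetic inputs: 10 flares inside fields, 10 outside at these distances.
distances = [0.5, 1.2, 1.8, 2.5, 3.1, 3.9, 4.6, 7.0, 12.0, 30.0]
curve = coverage_curve(10, distances, [1, 2, 3, 4, 5, 6, 8])
gains = marginal_gain(curve)
for t in sorted(curve):
    print(t, round(curve[t], 2), round(gains.get(t, 0.0), 2))
```

A threshold is then chosen where the marginal gain flattens; flares still unmatched at that distance are candidates for the infrastructure matching step or are labeled 'unmatched'.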
After flaring data matching, we also accounted for instrument uncertainty in the field-level flare volumes based on data from Brandt [18]. Brandt's study compared VIIRS flare volume data with government-reported volumes in nine countries and seven years. The government-reported volume data were collected by self-reporting from operators. A total of 1054 flare estimates were analyzed. We divided these estimates into eight logarithmic flare size classes (see supplementary material figure S4). We then simulated the distribution of the flare volume ratios (government-reported volumes/satellite-estimated volumes). Here we assume that the government-reported volumes were accurately reported by operators. All classes are well matched by base-10 log-normal distributions (see table 1; histograms for each flare size class are given in supplementary material figure S4). Note that the high-volume flares are slightly overestimated by satellite (log-ratio below 0), whilst the low-volume flares are underestimated by satellite (log-ratio above 0). This is due in part to failed detections, which lower total average detected emissions. Also note that the mean of a base-10 lognormal distribution is $10^{\mu + \frac{\ln 10}{2}\sigma^2} > 10^{\mu}$, where $\mu$ is the mean of the base-10 logarithm of the distribution and $\sigma^2$ is the corresponding variance. For the distribution of government-reported/satellite-estimated flare volumes, this implies that if the logarithm of the distribution is approximately normal with $\mu = 0$, as is the case for larger flare volumes, the mean ratio exceeds 1, so the satellite will underestimate total flare volumes on average for flares in that size range.
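The statement about the mean of a base-10 lognormal distribution can be checked with a short derivation. Here $R$ denotes the ratio of government-reported to satellite-estimated volume (a symbol introduced for this sketch), and $Z = \log_{10} R$ is assumed to be normally distributed with mean $\mu$ and variance $\sigma^2$.

```latex
\begin{aligned}
\mathbb{E}[R] &= \mathbb{E}\left[10^{Z}\right]
               = \mathbb{E}\left[e^{Z \ln 10}\right],
               \qquad Z \sim \mathcal{N}(\mu, \sigma^2) \\
              &= \exp\left(\mu \ln 10 + \tfrac{1}{2}\sigma^2 (\ln 10)^2\right)
               = 10^{\,\mu + \frac{\ln 10}{2}\sigma^2} \\
\operatorname{median}(R) &= 10^{\mu} \; < \; \mathbb{E}[R]
               \quad \text{for } \sigma > 0
\end{aligned}
```

The second line uses the moment generating function of a normal variable, $\mathbb{E}[e^{tZ}] = \exp(\mu t + \tfrac{1}{2}\sigma^2 t^2)$ with $t = \ln 10$. As an illustration with assumed values (not taken from table 1), $\mu = 0$ and $\sigma = 0.3$ give a median ratio of 1 but a mean ratio of $10^{0.104} \approx 1.27$, so a log-ratio centered on zero still implies a mean reported volume above the satellite estimate.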
We performed a 1000-sample Monte Carlo simulation for the flare volume ratios based on these distributions. For each flare, we identified its size class, generated a random ratio from the distribution of that class, and multiplied the ratio by the satellite estimate to obtain a random uncertainty-adjusted volume. We repeated this process 1000 times for each flare. Then, for each field, we summed the random uncertainty-adjusted volumes of its flares in each iteration to obtain the uncertainty-adjusted volume of the field, so that each field has 1000 random uncertainty-adjusted volumes.
Table 1. Properties of ratios of flare volumes per bin of flare size based on Brandt [18]. The smallest size class demonstrates significant average underestimation from VIIRS, by roughly a factor of 10. Larger size classes are roughly centered around zero in log-ratio, meaning that the mean and median ratio of reported and estimated flaring tends to be close to 1:1.
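The Monte Carlo procedure described above can be sketched as follows. The three size classes and their parameters are placeholders invented for this example (the study uses eight classes fitted to the data in table 1), and the function names and the input layout are likewise assumptions.

```python
import random
from statistics import mean

# Placeholder size classes: (upper bound in bcm/yr, mu, sigma), where mu and
# sigma describe log10(reported / satellite). These values are NOT the fitted
# values of table 1; they only mimic its qualitative pattern.
SIZE_CLASSES = [
    (0.001, 1.0, 0.6),
    (0.01, 0.3, 0.5),
    (float("inf"), 0.0, 0.3),
]


def class_params(volume):
    """Return (mu, sigma) of the size class containing the flare volume."""
    for upper, mu, sigma in SIZE_CLASSES:
        if volume < upper:
            return mu, sigma
    raise ValueError("volume does not fall in any size class")


def simulate_field_volumes(flares_by_field, n_iter=1000, seed=0):
    """Return {field: list of n_iter uncertainty-adjusted volumes}."""
    rng = random.Random(seed)
    results = {}
    for field, volumes in flares_by_field.items():
        samples = []
        for _ in range(n_iter):
            total = 0.0
            for v in volumes:
                mu, sigma = class_params(v)
                ratio = 10 ** rng.gauss(mu, sigma)  # base-10 lognormal draw
                total += ratio * v                  # adjusted flare volume
            samples.append(total)                   # field total, one iteration
        results[field] = samples
    return results


def interval(samples, lo=0.025, hi=0.975):
    """Empirical percentile interval of the simulated volumes."""
    s = sorted(samples)
    return s[int(lo * len(s))], s[min(int(hi * len(s)), len(s) - 1)]


# Hypothetical satellite-estimated flare volumes (bcm/yr) per field.
flares_by_field = {"big_field": [1.0], "small_field": [0.0005, 0.0005]}
sims = simulate_field_volumes(flares_by_field, n_iter=1000, seed=1)
for field, samples in sims.items():
    print(field, mean(samples), interval(samples))
```

Summing within each iteration before summarizing, rather than summing per-flare means, keeps the full distribution of each field total, from which means and percentile intervals can then be read.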

Results and discussion
In this section, two kinds of flare volumes are analyzed and discussed: satellite-estimated volumes and uncertainty-adjusted volumes. As noted above, satellite-estimated volumes are volumes reported as yearly VIIRS flaring estimates in units of bcm yr⁻¹. Uncertainty-adjusted volumes, in contrast, are adjusted using the uncertainty analysis approach detailed above. In 2019, 2922 fields and 140 infrastructure sites in 47 countries were matched with flare volumes. The overall satellite-estimated volume matched is 138 bcm yr⁻¹, which is 90% of the total satellite-estimated volume of the surveyed countries and 86% of the worldwide total satellite-estimated volume. The mean uncertainty-adjusted volume is 141 bcm yr⁻¹ (95% confidence interval, 140–192), which is 89% of the uncertainty-adjusted volume of the surveyed countries and 86% of the worldwide total uncertainty-adjusted volume. Figure 4 shows the distribution of satellite-estimated flare volumes. The field-level volumes are approximately lognormally distributed: the average estimated volume per field is 0.045 bcm yr⁻¹, whereas the median is much lower at 0.001 bcm yr⁻¹. The highest estimated volume is 5.22 bcm yr⁻¹, whilst the lowest is 10⁻⁵ bcm yr⁻¹. A total of 263 fields have estimated flare volumes higher than 0.1 bcm yr⁻¹. In this study, we call fields with more than 0.1 bcm yr⁻¹ of satellite-estimated volume 'high-flare' fields. The cumulative satellite-estimated flare volume of high-flare fields is 67% of that of all the fields, even though these fields represent less than 10% of the total fields with observed flaring (263/2922; see supplementary material section 2.1). Thus, these fields are significant for future routine flaring reduction plans. Because the log-transformed flare ratios are roughly normally distributed, flare volume estimation errors are distributed approximately lognormally across fields.
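The summary statistics quoted above (mean, median, and the share of fields and volume attributable to high-flare fields) follow from simple operations on the list of field-level volumes. The sketch below uses a short synthetic list of volumes, not the study data; only the 0.1 bcm yr⁻¹ threshold and the 263/2922 field counts come from the text.

```python
from statistics import mean, median

HIGH_FLARE_THRESHOLD = 0.1  # bcm/yr, threshold used in the text


def summarize(volumes):
    """Summary statistics for a list of field-level flare volumes (bcm/yr)."""
    high = [v for v in volumes if v > HIGH_FLARE_THRESHOLD]
    return {
        "mean": mean(volumes),
        "median": median(volumes),
        "high_count": len(high),
        "high_field_share": len(high) / len(volumes),
        "high_volume_share": sum(high) / sum(volumes),
    }


# Synthetic, strongly right-skewed volumes (illustrative only).
synthetic = [0.0005, 0.001, 0.002, 0.01, 0.05, 0.5, 2.0]
stats = summarize(synthetic)
print(stats)

# Share of matched fields that are high-flare fields, from the counts in the text.
print(round(263 / 2922, 3))
```

For a right-skewed distribution such as this one, the mean sits far above the median and a small share of fields carries most of the volume, which is the pattern reported for the 2019 data.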
Figure 5 shows the spatial distribution of fields and their satellite-estimated volumes in 2019. Note that only fields matched with at least one flare are plotted, and that the country total satellite-estimated volume here is the sum of field and infrastructure volumes, which therefore does not include 'unmatched' flares. High-flare fields are concentrated in the following regions: the Persian Gulf, West Siberia and East Siberia in Russia, the Eastern Venezuela Basin, the Permian Basin, Gulf of Mexico and Williston Basin of the United States, and African countries such as Algeria, Libya, Nigeria and Congo (Brazzaville).
At the country level, the top ten flaring countries are labeled in figure 5 with both their satellite-estimated volumes and mean uncertainty-adjusted volumes. Russia, the United States, Iraq and Iran are the top four countries, each with more than 10 bcm yr⁻¹ of satellite-estimated volume. Together they account for 48% of the total matched satellite-estimated volume. Other high-flare countries are mainly in the Middle East, North and West Africa, North and South America and South Asia. For countries like Saudi Arabia, Canada, China and the United Arab Emirates, the estimated volumes are significantly lower when computed on a per-barrel basis (i.e. the flaring rate is low relative to their high oil and gas production). In contrast, despite their lower production of oil and gas, countries like Syria, Cameroon and Venezuela have relatively high satellite-estimated volumes.
In figure 5, uncertainty-adjusted volumes are presented in blue and satellite-estimated volumes in black. The mean uncertainty-adjusted volumes labeled in figure 5 are similar to the satellite-estimated volumes in most countries, except for the US (17% higher), Iraq (8% lower) and Venezuela (5% lower). This is because the US has more low-volume flares, which tend to be underestimated by satellite, whereas Iraq and Venezuela have more high-volume flares, which tend to be overestimated by satellite (see section 2.2). To see the relationship between the uncertainty-adjusted and satellite-estimated volumes more clearly, we plotted the distribution of flare ratios of the top 30 flaring countries in figure 6. The ratio is defined as the uncertainty-adjusted volume divided by the satellite-estimated volume. Most countries have median ratios around 1. The global average ratio is 1.01, implying that the uncertainty-adjusted volumes tend to be consistent with the satellite estimates. As discussed in section 2.2, countries with more low-volume flares tend to be more underestimated by satellite and thus have higher ratios, such as Canada (1.29), the US (1.17) and China (1.15). Conversely, countries with more high-volume flares tend to be more overestimated by satellite and thus have lower ratios, such as Iraq (0.92), Venezuela (0.95) and Iran (0.96). Note that the data we use here from Brandt's study are all from offshore fields. Studies have shown that the difference between onshore and offshore gas flaring could be large [36]. Offshore fields usually have well-defined boundaries and the flares usually occur at a limited number of points such as platforms or floating production vessels [18], whilst onshore fields are more difficult to define [37]. Therefore, Brandt's study may not represent the distribution of all flares globally [18]. Field-level uncertainty analysis can be found in the supplementary material section 2.3.
Figure 7 shows a more detailed analysis of the data matching quality of these countries. Figure 7(A) shows the country-level satellite-estimated volume coverage percentages, which reflect the quality of our data matching. Countries such as the US, with governmental data sources in GIS format, have higher coverage percentages. In contrast, China, Turkmenistan and Russia are high-flare countries with low coverage percentages (China 54%, Turkmenistan 73% and Russia 83%). This is due to the lack of GIS data sources and the difficulty of finding maps of the oilfields in these countries. In addition, for Russia and China, only data for the major oil and gas producing regions were collected, so some fields in other regions are likely still missing.
The pie charts in figure 7(A) show the breakdown of field counts according to their satellite-estimated volumes. Total field counts are indicated by the size of each pie. Again, only matched fields with flare data were counted. The US has the largest field count of 918, of which 35 are high-flare fields, accounting for 53% of the national satellite-estimated volume. These high-flare fields are concentrated in the Permian Basin, West Gulf Basin and Williston Basin rather than distributed all over the nation. Canada, Russia and China follow a similar trend to the US. In Iran and Iraq, however, over half of the fields are high-flare fields although the total field counts are less than 100. Figure 7(B) shows more information on the coverage of our database. The bar for each country shows the proportion of matched uncertainty-adjusted volume in the total uncertainty-adjusted volume of that country. Two lines represent the average proportion and the cumulative proportion of matched volume. Note that the worldwide total volume is slightly higher than the sum of total volumes in these countries, as a small number of flaring countries are not included in our database. This figure shows that the top 24 countries, from Russia to Canada, cover 80% of the world total volume. Consequently, these countries should be given higher priority in future database updates, especially those with coverage percentages below the average of 89%, such as Russia (highest volumes) and China (lowest percentage).
We also repeated our analysis for the flare data [...] 89% of those of 47 countries and above 85% of those worldwide.
[Figure 7 caption, partially recovered: ... figure 6. The green dashed line shows the average percentage of flaring captured by fields in our database across all countries (89%). The orange line shows the cumulative percentage based on the world's total mean uncertainty-adjusted volume.]

Conclusion
This study estimated field-level associated gas flare volumes during petroleum production from 2012 to 2019, based on a geographic database of global oil and gas fields. Over 2900 fields and 140 infrastructure points in 47 countries were matched with flare volumes in 2019. The annual total flare volume matched is 89%-92% of the volume of these countries and 85%-87% of worldwide total volume. This study is, to our knowledge, the first estimate of flaring from global oil and gas at a field level, rather than at the country level. The analysis is based on remote sensing technologies with global coverage, allowing for continuous updating over time. We found that a small number of high-flare fields account for a large share of flare volumes, suggesting that these are promising areas of focus for policies aimed at reducing flaring.
The geodatabase contains GIS data of over 50 000 oil and gas fields and around 4700 infrastructure points of 51 countries and regions, which account for 96% of global oil production and 89% of natural gas production. This database combines over 70 available data sources together to provide the most accurate and comprehensive geo-data. It could be a powerful data foundation of future studies on oil and gas production with spatial scale from fields to countries.
The field-specific flaring data could further support future carbon emission regulation and climate change policies. By allocating VIIRS flaring data to the oilfields, our study can give oil companies and regulators a clearer understanding of which fields contribute most to overall gas flaring. Consequently, more accurate monitoring and more specific policy making for gas flaring reduction are now possible based on this inventory. Academic studies of carbon emissions assessment could also benefit from our field-level flaring data, as flaring is an important contributor to GHG emissions in oil and gas production.
Future work could proceed in two directions. First, further updates to the database would be helpful, since our results imply that the field data in some countries are still incomplete, most notably in Russia and China. Second, the data can be used for application studies such as methane plume detection, methane flux apportionment, or bottom-up georeferenced emissions databases.