Onshore hydrocarbon seep detection using the GF-5 hyperspectral image: a case study in the Karamay area, NW China

ABSTRACT Satellite remote sensing has been widely used for offshore petroleum detection because oil spills are normally widespread. However, onshore hydrocarbon detection remains limited, because the low spectral resolution of earlier satellites made small-scale seeps difficult to identify. In this study, a dataset from China's High-Resolution 5 satellite (GF-5) was used to directly detect hydrocarbon seeps in the Karamay area. First, an IDL program was used to remove vertical stripe noise from the image to ensure high image quality. Second, principal component analysis (PCA) was combined with the classification and regression tree (CART) method RuleGen to process the image. The detected hydrocarbon information was then enhanced by RGB color synthesis. The results show that these methods detect hydrocarbons effectively, with an accuracy of 86.5% according to field verification. This study provides an effective method for hydrocarbon detection using GF-5 data.


Introduction
The characteristic spectral absorption peaks of hydrocarbons are located at 1200, 1730, 2330 and 2350 nm, owing to the vibration of -CH2 in hydrocarbons (Asadzadeh & Filho, 2017; Van der Meer et al., 2002). Taking advantage of these absorption features, remote sensing has been widely used for direct detection and contamination monitoring of offshore oil and gas seepages, because oil spills are normally widespread and no other substances interfere (Clark et al., 2010; Lammoglia & Filho, 2011). However, direct detection onshore has usually been problematic, because oil seeps are commonly small and existing spectral sensors have low resolution (Asadzadeh & Filho, 2017; Chen & Hu, 2014; Mu et al., 2020; Yang et al., 2000). Therefore, numerous indirect methods focusing on surface anomalies in minerals or on biological interactions with oil, such as hydrocarbon haloes, carbonate mineralization, clay mineral accumulation, reduction of ferric iron and increased plant reflectance, are used for onshore hydrocarbon detection (Yang et al., 2000). The problem is that these features can also form through processes unrelated to oil seepage and may therefore produce false positives. In recent years, advances in hyperspectral technology, principally airborne hyperspectral sensing, have provided effective tools for direct oil exploration and monitoring on land (Hörig et al., 2001; Moreira Scafutto et al., 2017), whereas satellite hyperspectral applications have been relatively lacking. China's High-Resolution 5 satellite (GF-5) offers an opportunity for such an application.
In this study, a GF-5 scene was used for hydrocarbon detection in the Karamay area, NW China. In this region, numerous oil seeps, oil-contaminated sands and soils, and bitumen are distributed around a natural tar hill called 'Black Oil Hill', which provides a good opportunity for the direct detection of hydrocarbons. Using image processing methods, including preprocessing, principal component analysis (PCA) and the classification and regression tree (CART) method, hydrocarbon leakage information was successfully identified (Main Map). The results represent the first direct petroleum detection using GF-5 hyperspectral data and provide guidance for using the large volume of GF-5 data to monitor the temporal and spatial variation of oil and gas leakage in the future. They also have important implications for environmental monitoring, because onshore hydrocarbon seepage, such as methane, is a major source of greenhouse gases (Etiope, 2015; Etiope & Klusman, 2010; Shikwambana & Kganyago, 2020).

Regional geology
The Karamay region is located on the western margin of the Junggar Basin, NW China, and is bounded by the Zaire, Hala'alat and other mountains (Figure 1(a-c)). Abundant diachronous volcano-sedimentary strata and granite intrusions are exposed in these mountains (Zong et al., 2016).
In the Karamay region, magmatic events predominantly occurred during the Early Carboniferous, Late Carboniferous and Silurian-Devonian. The sedimentary strata mainly comprise Upper Triassic to Lower Cretaceous fluvial-lacustrine sediments (Figure 1(a)). From bottom to top, they are: the Upper Triassic Xiaoquangou Formation (T₃x), sandstones and sandy mudstones interlayered with conglomerates; the Lower Jurassic Badaowan Formation (J₁b), mudstones, sandstones and conglomerates; the Middle Jurassic Sangonghe Formation (J₂s), sandstones interbedded with mudstones and conglomerates; the Xishanyao Formation (J₂x), sandstones interbedded with shale and coal; the Upper Jurassic Qigu Formation (J₃q), sandstones, mudstones and conglomerates; and the Lower Cretaceous Tugulu Formation (K₁t), sandstones and mudstones interlayered with gypsum and calcareous concretions (Figure 1(a)) (Jianping et al., 2016; Qiu et al., 2015). The major oil reservoirs are developed in T₃x, J₁b and J₃q. The oil accumulations are shallowly buried (200-500 m deep; Figure 1(d)) (Jianping et al., 2016). Hydrocarbons readily escaped from the traps along the fault system into shallower layers and then migrated through shallow structures to the surface (Figure 1(d)) (Du, 2005; Qiu et al., 2019).

Oil seeps around Black Oil Hill
Black Oil Hill is located northeast of Karamay city, where hydrocarbon seeps can be observed in high-spatial-resolution images (Figure 2(a,b)). Sandstones and conglomerates are commonly exposed around Black Oil Hill. Where oil seeped into these rocks, it formed oil contamination such as oil stains and oil pools (Figure 2(c,d)).
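In the urban area, many artificial playgrounds made from petroleum products can be used to test the accuracy of hydrocarbon identification (Figure 2(b)). Other substances that may interfere with the identification results include black crushed stones in a gravel plant to the north (Figure 2(e)) and widely distributed clay rocks (Figure 2(a)). The spectra of these substances and of Black Oil Hill were collected from the GF-5 satellite image (Figure 2(a)) and are shown in Figure 2(f).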

Method
The hyperspectral camera carried by GF-5 acquires spectra from 400 to 2500 nm in 330 spectral channels, with a swath width of 60 km and a spatial resolution of 30 m. The spectral resolution is 5 nm in the visible range and 10 nm in the short-wave infrared (SWIR) range (Liu et al., 2020). We used hyperspectral image recognition and field verification to establish the relationship between hydrocarbon seeps and hyperspectral images in the Karamay region. The processing of the hyperspectral images is described in the following sections.

Preprocessing of GF-5 image
One scene of the Karamay area (centre coordinates 45°38′53.3″N, 84°49′6.4″E), acquired in September 2019, was used to detect onshore hydrocarbons. A subset of the original image was selected to extract hydrocarbon information (Figure 2(a)). Preprocessing included radiometric calibration, atmospheric correction, stripe noise removal and orthorectification. Radiometric calibration, atmospheric correction and orthorectification were performed in ENVI + IDL (version 5.3) (Harris Geospatial Solutions, Inc., Broomfield, CO, USA). Stripe noise was removed with the full-image stripe noise removal method (Tan et al., 2005), which is calculated as

$$x'_{ijk} = a_{ik}\,x_{ijk} + b_{ik}, \qquad a_{ik} = \frac{\bar{s}_{ik}}{s_{ik}}, \qquad b_{ik} = \bar{m}_{ik} - a_{ik}\,m_{ik}$$

where $x_{ijk}$ is the radiance of column $i$ and line $j$ in band $k$; $x'_{ijk}$ is the corrected radiance; $m_{ik}$ and $s_{ik}$ are the mean and standard deviation of column $i$ in band $k$; $\bar{m}_{ik}$ and $\bar{s}_{ik}$ are the mean and standard deviation of column $i$ in band $k$ of the reference image; $a_{ik}$ is the sensor gain and $b_{ik}$ is the offset; $m_k$ is the mean of the whole image in band $k$; and $s_k$ is the mean standard deviation of the whole image in band $k$. In the full-image method, these band-wide values serve as the reference, i.e. $\bar{m}_{ik} = m_k$ and $\bar{s}_{ik} = s_k$.
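The column-wise correction described above is a form of moment matching: each column of a band is linearly rescaled so that its mean and standard deviation equal band-wide reference values. The following is a minimal NumPy sketch, not the authors' IDL program. It assumes a cube laid out as (bands, lines, columns) and takes the whole-band mean as $m_k$ and the average of the column standard deviations as $s_k$; that reading of "mean standard deviation" is our assumption. The function name is hypothetical.

```python
import numpy as np


def destripe_moment_matching(cube: np.ndarray) -> np.ndarray:
    """Column-wise moment matching for a (bands, lines, columns) radiance cube.

    Each column i of band k is linearly rescaled, x' = a * x + b, so that its
    mean and standard deviation match band-wide reference values.
    """
    x = np.asarray(cube, dtype=np.float64)
    col_mean = x.mean(axis=1, keepdims=True)          # m_ik, shape (bands, 1, columns)
    col_std = x.std(axis=1, keepdims=True)            # s_ik
    ref_mean = x.mean(axis=(1, 2), keepdims=True)     # m_k: mean of the whole band
    ref_std = col_std.mean(axis=2, keepdims=True)     # s_k: mean of column std devs (assumed definition)
    # Constant columns (s_ik == 0) keep a gain of 1 to avoid division by zero.
    gain = np.where(col_std > 0, ref_std / np.where(col_std > 0, col_std, 1.0), 1.0)  # a_ik
    offset = ref_mean - gain * col_mean               # b_ik
    return gain * x + offset                          # x'_ijk


# Usage on a synthetic cube with per-column gain/offset striping.
rng = np.random.default_rng(0)
clean = rng.normal(100.0, 5.0, size=(3, 200, 50))
striped = clean * rng.uniform(0.8, 1.2, size=(3, 1, 50)) + rng.uniform(-10, 10, size=(3, 1, 50))
corrected = destripe_moment_matching(striped)
print(np.ptp(striped.mean(axis=1), axis=1))    # large column-to-column spread of means
print(np.ptp(corrected.mean(axis=1), axis=1))  # close to 0 after correction
```

Because the correction is applied band by band and column by column, it removes gain and offset differences between detector elements. It does not remove noise that varies along a column.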

Hydrocarbon detection
Principal component analysis (PCA) and the classification and regression tree (CART) method RuleGen were used for hydrocarbon detection. PCA recombines the data from selected spectral bands into a lower-dimensional image with clearer characteristics (Xu & Zhao, 2015). In this study, two absorption bands (1730 and 2327 nm) and two reflectance bands (1660 and 2260 nm) were used for PCA to extract hydrocarbon information. The CART method RuleGen was then used to eliminate the effects of interfering substances (Loh & Shih, 1997). The hydrocarbon leakage result was synthesized into a false-color image to highlight the identified leakage areas. Finally, field verification was carried out at both leakage and non-leakage sites to quantify the accuracy and to analyze possible interference factors.
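The band selection and PCA step can be sketched in NumPy as follows. The sketch assumes a reflectance cube shaped (bands, lines, columns) and a per-band wavelength array. The evenly spaced 330-band grid in the usage lines is hypothetical, and real GF-5 band centres should be read from the image metadata. PCA is computed here as a plain eigendecomposition of the band covariance matrix, which may differ in scaling from ENVI's implementation. All function names are illustrative.

```python
import numpy as np


def nearest_band_indices(wavelengths_nm, targets_nm):
    """Index of the band whose centre wavelength is closest to each target."""
    wl = np.asarray(wavelengths_nm, dtype=float)
    return [int(np.argmin(np.abs(wl - t))) for t in targets_nm]


def pca_components(cube, band_indices):
    """PCA of selected bands of a (bands, lines, columns) cube.

    Returns the component images, shape (n_selected, lines, columns), ordered
    by decreasing variance, and the matching eigenvalues.
    """
    sub = np.asarray(cube, dtype=float)[band_indices]
    n, rows, cols = sub.shape
    pixels = sub.reshape(n, -1).T                 # (n_pixels, n_bands)
    pixels = pixels - pixels.mean(axis=0)         # centre each band
    cov = np.cov(pixels, rowvar=False)            # (n_bands, n_bands)
    eigvals, eigvecs = np.linalg.eigh(cov)        # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = pixels @ eigvecs                     # project pixels onto the components
    return scores.T.reshape(n, rows, cols), eigvals


# Hypothetical, evenly spaced 330-band grid; use the real band centres in practice.
wavelengths = np.linspace(400.0, 2500.0, 330)
idx = nearest_band_indices(wavelengths, [1660, 1730, 2260, 2327])
cube = np.random.default_rng(1).normal(size=(330, 20, 30))  # stand-in for reflectance
pcs, variances = pca_components(cube, idx)
pc3 = pcs[2]  # third component, the one reported to highlight hydrocarbons
```

The sign of each eigenvector is arbitrary. Whether hydrocarbons appear dark or bright in the third component can therefore flip between implementations, and the polarity should be checked against a known seep.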

Stripe noise removal result
Because the hyperspectral sensor is a push-broom imager, calibrating the tens of thousands of detector elements on its array CCD is very difficult, so hyperspectral images contain many vertical stripe noises, especially in the SWIR bands (Tan et al., 2005). As shown in Figure 3(a), the PCA result without stripe noise removal shows dense, alternating light and dark vertical stripes that seriously degrade image quality. After stripe noise removal with the IDL program, the noise in the PCA result is greatly reduced and ground features are well preserved (Figure 3(b)).

PCA result
The PCA result contains four components (pc Band 1 to pc Band 4), of which the third component highlights the hydrocarbon information (Figure 3(b)). In the PCA result, Black Oil Hill and the artificial playgrounds in the urban area appear as distinctly black areas. Because the artificial playgrounds are regular in shape and made from hydrocarbon products, this indicates that the band combination of 1660, 1730, 2260 and 2327 nm is valid for hydrocarbon identification (Figure 3(b)). However, some distractors also appear as black areas to the east, west and north of the city (Figure 3(b)). To the north, crushed stones stacked in a gravel plant appear black. To the east and west, large areas appear black after PCA but are white or dark red in the true-color image (Figure 2(a)).

CART result and the final detection result
The CART tool is a decision-tree plugin developed for ENVI. The predictor variables can be either discrete variables (CRUISE and QUEST algorithms) or continuous variables (GUIDE algorithm) (Loh & Shih, 1997). In this study, we chose the QUEST algorithm to build the classification tree. Spectra of clay rocks and crushed stones were collected from the image (Figure 2(a)), and standard spectral libraries were built from them to extract these interfering substances. Interferent masks were then created and applied to the PCA result to remove the corresponding interferents, so that only hydrocarbons remain.
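The masking step can be illustrated with a simpler, swapped-in classifier: the spectral angle between each pixel and a small library of interferent spectra. This is not the QUEST decision tree used in the study. It only shows how library spectra can be turned into a mask that blanks interferent pixels in a PC image. The library contents, the `max_angle` threshold and the function names are all hypothetical.

```python
import numpy as np


def spectral_angle(cube, reference):
    """Angle in radians between every pixel spectrum and one reference spectrum."""
    x = np.asarray(cube, dtype=float)                      # (bands, lines, columns)
    r = np.asarray(reference, dtype=float)[:, None, None]  # broadcast over pixels
    dot = (x * r).sum(axis=0)
    norms = np.linalg.norm(x, axis=0) * np.linalg.norm(r)
    cos = dot / np.where(norms > 0, norms, 1.0)            # zero spectra give cos = 0
    return np.arccos(np.clip(cos, -1.0, 1.0))


def mask_interferents(pc_image, cube, library, max_angle=0.1):
    """Blank out (NaN) PC pixels whose spectra match any interferent in `library`."""
    mask = np.zeros(np.shape(pc_image), dtype=bool)
    for spectrum in library.values():
        mask |= spectral_angle(cube, spectrum) <= max_angle
    out = np.asarray(pc_image, dtype=float).copy()
    out[mask] = np.nan                                     # masked pixels, e.g. rendered white
    return out, mask


# Toy example: one pixel matches a made-up "clay" spectrum up to a brightness scale.
clay = np.array([0.5, 0.6, 0.4, 0.3, 0.5])
cube = np.zeros((5, 2, 3))
cube[0] = 1.0
cube[:, 0, 0] = 2.0 * clay
pc3 = np.arange(6, dtype=float).reshape(2, 3)
masked, mask = mask_interferents(pc3, cube, {"clay": clay}, max_angle=0.05)
```

The spectral angle ignores overall brightness, so a darker clay pixel with the same spectral shape is still masked. A trained decision tree can also use absolute brightness and individual band thresholds.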
Figure 4(a) shows the hydrocarbon seep detection result. Clay interference has been eliminated: clay areas were masked out and are shown in white, while hydrocarbon information at Black Oil Hill and the artificial playgrounds is shown in black. Some of the crushed stones, however, were still not removed (Figure 4(a)). To highlight the detection result, we performed an RGB color composite (R: pc Band 3, G: pc Band 3, B: pc Band 2) to obtain a false-color image. In this image, oil and gas leaks and artificial plastic playgrounds are shown in blue, clay areas in white, crushed stones remain black, and other features are shown in yellow (Figure 4(b)). Overall, the hydrocarbon seep information is displayed more clearly in the false-color image.
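The false-color composite can be sketched by stretching each chosen component to the 0-1 range and stacking the results as red, green and blue. The 2-98 percentile linear stretch is an assumption, since the stretch actually applied is not stated. The channel mapping follows the text (R: pc Band 3, G: pc Band 3, B: pc Band 2), written as zero-based indices. Function names are hypothetical.

```python
import numpy as np


def percent_stretch(band, low_pct=2.0, high_pct=98.0):
    """Linear stretch between two percentiles, clipped to [0, 1]; NaNs are ignored."""
    b = np.asarray(band, dtype=float)
    lo, hi = np.nanpercentile(b, [low_pct, high_pct])
    if hi <= lo:
        return np.zeros_like(b)
    return np.clip((b - lo) / (hi - lo), 0.0, 1.0)


def false_color(pcs, channels=(2, 2, 1)):
    """Stack stretched PC images into an RGB array of shape (lines, columns, 3).

    `channels` holds zero-based component indices for R, G and B; the default
    (2, 2, 1) is pc Band 3, pc Band 3, pc Band 2.
    """
    return np.dstack([percent_stretch(pcs[i]) for i in channels])


pcs = np.random.default_rng(2).normal(size=(4, 10, 12))  # stand-in PCA output
rgb = false_color(pcs)
```

With red and green both driven by the third component, pixels that are dark in that component and bright in the second component come out blue. This is consistent with leaks appearing blue in the composite.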

Field verification
To quantitatively evaluate the detection results, field verification was carried out. A total of 37 points were verified in the field, distributed around Black Oil Hill northeast of the city and across an area of uplifted strata to the northwest (Figure 5). At 32 points the field observations are consistent with the detection results, and at the other 5 points they are inconsistent (see supplementary materials Table 1). The verification shows that oil pools, oil stains, dry asphaltene and oil shale at the surface can be detected (Figure 6(a-d)). Some white clay beds were correctly identified, but a darker clay bed was misidentified as an oil seep (Figure 6(e,f)). Some dry oil sands containing gravel were not detected (Figure 6(g,h)). Dark gravelly soil and brown sand were falsely identified as oil seeps (Figure 6(i,j)). Overall, the detection accuracy is 86.5% (32 of 37 points).
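The reported accuracy is the share of verification points at which the field observation agrees with the image result. A trivial sketch of that calculation follows. The boolean list only encodes the counts of 32 consistent and 5 inconsistent points and does not reproduce the per-point order of the supplementary table.

```python
def overall_accuracy(matches):
    """Fraction of field points whose observation agrees with the image result."""
    matches = list(matches)
    if not matches:
        raise ValueError("no verification points")
    return sum(bool(m) for m in matches) / len(matches)


# 37 field points: 32 consistent, 5 inconsistent.
points = [True] * 32 + [False] * 5
acc = overall_accuracy(points)
print(f"{acc:.1%}")  # 86.5%
```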

Discussion
In oil and gas detection from remote sensing images, clay rock is a factor that strongly interferes with the detection results. Clay rocks may be deposited or may form through secondary reduction related to oil and gas. In the Karamay region, hydrocarbon leakage is widespread, which may lead to a richer distribution of surface clay (Wang et al., 2014; Zhang et al., 2005). Clay rocks have strong characteristic absorption peaks at 2200 and 2350 nm that overlap with the absorption peaks of hydrocarbons in these bands, so they can interfere with oil and gas detection (Figure 2(f)) (D'Arcy et al., 2018; Huang et al., 2017). Combining the PCA and CART methods removes most of the clay interference from the GF-5 hyperspectral image and allows hydrocarbon seeps to be detected directly. In the detection result, the oil seeps at Black Oil Hill and the plastic playgrounds in the urban area are well identified (Figure 4). Oil pools, some oil-bearing soil and dry asphalt are also effectively identified, as are some white clay beds without oil. However, some points are still misidentified. At the points in Figure 6(f,i,j), no obvious oil seep is visible at the surface, yet they were identified as hydrocarbon leakage, probably because these rocks contain trace amounts of hydrocarbons that are invisible to the naked eye. In Figure 6(g,h), the black oil sands were not detected as oil seeps. We speculate that this is because the oil sands lie in coarse-grained strata whose large porosity allows them to hold more water. Water absorption may obscure the spectral information and lead to inaccurate identification.
Traditional methods for monitoring hydrocarbons in soil include gas chromatography, mass spectrometry, liquid chromatography and solid-phase microextraction (Chen et al., 2017). These methods require large numbers of soil samples to be collected and analyzed and are expensive, which makes monitoring oil and gas leakage over large areas impractical. In addition, leakage during pipeline transportation cannot be located by traditional methods. By contrast, hyperspectral satellite data are acquired quickly and cover large areas, enabling rapid monitoring of oil and gas leakage across large regions. This study therefore offers guidance for using hyperspectral satellite data in oil and gas exploration and environmental protection.

Conclusions
Full-image stripe noise removal with the IDL program effectively removes the vertical stripe noise of the GF-5 image, which benefits subsequent processing. Principal component analysis combined with classification and regression tree processing of high-resolution satellite remote sensing data is an effective method for identifying hydrocarbon information, with a detection accuracy of 86.5%.

Software
ENVI + IDL (version 5.3) was used to process GF-5 data, and ArcGIS 10.3 and CorelDraw X8 were used to create and modify images and maps, respectively.

Figure 2. Image and photographs of typical targets in Karamay city and Black Oil Hill. (a) GF-5 satellite true-color image of Karamay city. (b) Google Earth image showing artificial playgrounds. (c) and (d) Photos of surface hydrocarbon leakages around Black Oil Hill. (e) Google Earth image showing a gravel plant. (f) Spectral profiles of the main surface materials collected from the GF-5 image (point positions are shown in Figure 2(a)).

Figure 3. PCA results of non-stripe noise removal (a) and stripe noise removal (b).

Figure 4. The final detection result of hydrocarbon seeps. (a) Detection result after the CART method. (b) False-color image of the detection result. The blue areas show oil and gas leakage.

Figure 5. Field verification areas and points of the hydrocarbon detection.(a) and (b) False-color images showing the location of the verification points.(c) and (d) Google Earth images of the verification areas.The location of the field verification area is shown in Figure 4.

Table 1. Results of field verification of oil and gas leakage. True: the results of image recognition are consistent with the field verification. False: the results of image recognition are inconsistent with the field verification.