Comparison of chemical sediment analyses and field oiling observations from the Shoreline Cleanup Assessment Technique (SCAT) in heavily oiled areas of former mangrove in Bodo, eastern Niger Delta

Trial pitting, borehole drilling, and soil, sediment and groundwater sampling are important components of oil spill response and contaminated land assessment. These investigations provide detailed information on the subsurface geology and contaminant occurrence and transport but have disadvantages including worker safety hazards, cost and time required for completion, and may cause cross-contamination among aquifers. An alternative to such investigations applied in oil spill response is the Shoreline Cleanup Assessment Technique (SCAT) approach, which relies heavily on direct visual observations to assess the severity of oil contamination and guide cleanup efforts. Here, we compare SCAT observations of oil type, surface coverage and pit oiling with collected surface and subsurface sediment samples taken concurrently and analysed for a suite of hydrocarbon constituents. Results indicate that although limited sampling and analysis is required to chemically characterize the contamination, SCAT observations can be calibrated using limited sediment sampling and are sufficient to steer physical cleanup methods. This is particularly evident as even closely spaced chemical samples show high variability. A coarser direct visual observation is fit-for-purpose considering the wide variability in contaminant distribution at even local levels. In this contribution, we discuss the limitations of the different methods.
Supplementary material: The modified SCAT data collection form, figures showing subsurface versus ground surface total petroleum hydrocarbons (TPH) and a variogram of TPH measured in the ground surface and subsurface samples, and data tables are available at https://doi.org/10.6084/m9.figshare.c.4534682
Thematic collection: This article is part of the Measurement and monitoring collection available at: https://www.lyellcollection.org/cc/measurement-and-monitoring

The study area of Bodo Creek, located in the eastern part of the Niger Delta (Fig. 1), was exposed to two oil spillages in 2008 resulting from leaks in the Trans Niger Pipeline operated by The Shell Petroleum Development Company of Nigeria Ltd (SPDC). The affected area consists of low-lying mangrove habitat, numerous tidal channels lined with very soft mud, and harder substrates along the upper intertidal mainland and island shorelines and along constructed fish ponds no longer in use. In 2015, SPDC agreed to remediate 1000 ha of oiled former mangrove areas in the Bodo Creek. Phase 1 of the cleanup programme began in September 2017, lasted for about a year, and focused on reducing sediment oiling in the top 30 cm by using surface agitation (raking) in less contaminated areas and deeper low-pressure, high-volume water flushing in highly contaminated soft mud areas, most commonly found along intertidal channels. Phase 1 also included a mangrove replanting pilot programme to assess seedling survival in contaminated sediments and a coring programme at 30 sites to determine depth of oiling. Results will be provided in forthcoming publications. A more intensive Phase 2 remediation and mangrove replanting programme is scheduled to start in 2019. The damage to the Bodo area is the largest documented extent of spill-related damage to mangroves (Duke 2016; Gundlach 2018) and the largest cleanup of oiled mangrove habitat known to the authors. Likewise, the Shoreline Cleanup Assessment Technique (SCAT) and chemical sampling programmes described herein are the largest ever undertaken within oiled mangrove forest habitat.
Before the two primary spills of 2008, the number of reported spills in the area was relatively small (17 between 1986 and 2008), of which 10 were from illegal activities (Gundlach 2018). Illegal activities causing spillage include pipeline tapping, connection from the tap to a vessel by (leaking) hoses, transport of stolen oil by a variety of open-hulled large (e.g. >30 m) and small wooden vessels, and shore-side refining. After 2010, the number of spills caused by illegal actions increased dramatically (over 30 in 2010-2011). These illegal activities and associated spillages have continued into 2018, even as cleanup and the SCAT-chemistry programmes were continuing. Mangrove plant recovery since the primary loss in 2008 is very limited and the need for planting is evident. An aerial view of part of a central portion of the study area in 2015 and 2018 is shown in Figure 2.
The method to assess the distribution of oil spills and the preferred methods of cleanup have evolved over several decades, from purely scientific investigations of surface and subsurface spill extent to using SCAT. SCAT involves collaborative field teams that include the responsible party, regulator, landholder and representatives from the local community and non-governmental organizations. Participants all view the site at the same time and make a consensual recommendation regarding the appropriate response action (Owens & Sergy 2003; Santner et al. 2011). SCAT was also called upon by the cleanup managers who are represented in the Bodo Mediation Initiative (BMI) and SPDC to monitor and confirm when the spill-response contractor successfully completes the Phase 1 cleanup. The BMI was established following a mediation process in 2013 led by the Dutch Embassy in Nigeria between the Bodo community and SPDC (Zabbey & Arimoro 2017; Sam & Zabbey 2018).
SCAT is in essence a relatively quick, straightforward, robust and participatory approach that contrasts with traditional land-based site assessments, which rely heavily on complex investigations and soil, sediment and groundwater sampling and analyses. Although drilling boreholes, sediment sampling and laboratory analyses provide detailed information on the subsurface geology and contaminant occurrence and transport, they have disadvantages including worker safety hazards, cost and time required for completion, and may cause cross-contamination in the case of deep boreholes that connect different aquifers and are improperly sealed or drilled (Bonte et al. 2015).
Although SCAT clearly has advantages over the traditional drilling, sampling and laboratory analysis approach, a comparison between SCAT data and chemical sample analyses has to our knowledge not been reported in the literature to assess the accuracy of the direct field observations collected during SCAT. To bridge this gap, this work compares results of SCAT field observations with those of an extensive chemical sampling programme undertaken concurrently. It should be noted that this study does not assess the tolerance of mangroves to crude oil or total petroleum hydrocarbons (TPH) levels, nor does it determine crude oil derived contaminants in fish or other aquatic species. Both topics are the subject of continuing investigations.

SCAT surveys
SCAT is a systematic method for surveying an affected shoreline following an oil spill. The SCAT approach was developed during the response to the 1989 Exxon Valdez oil spill and has since been applied and further refined (e.g. NOAA 2013). It is a viable and practical technique that maximizes the recovery of oiled habitats and resources while minimizing the risk of further ecological deterioration from cleanup efforts.
SCAT field procedures in Bodo initially utilized standard forms (e.g. NOAA 2013; IMO-REMPEC 2009) but these were found inadequate to accurately describe the contaminated mud-mangrove root environments of the area affected by the spills. Principal among the difficulties in using these forms was the time needed to capture the information when dealing with security issues, boat transport, the diurnal ≥2 m tide that floods the area, very soft muds making walking to each site difficult, and the inability to determine specific horizons of oiling when oil enters a pit. The modified SCAT data collection form is included in the supplementary material. Specific to this paper, we use the visual estimation of per cent oil on the surface and per cent subsurface oil as observed on the surface of water in a pit dug to 25-30 cm and after waiting c. 5 min for the incoming water and oil quantity to stabilize. Both surface and subsurface oil estimations were averaged based on at least three SCAT members giving their estimation. Oil type and extent of coverage (%) on the surface and subsurface area were categorized as sheen (silver or rainbow), brown or black oil, or tar or asphalt. Layers of brown oil (probably with some emulsification) and black oil (no emulsification) are thicker than sheen (Bonn Agreement 2017) and would be expected to show higher concentrations measured by chemistry. Figure 3 illustrates the surface and subsurface oil observations for two SCAT sites.
A combination of surface and subsurface observations was made to steer the cleanup activities: surface observations determined whether raking to remove tar, dead stumps and garbage was needed, and subsurface pit observations determined the need for sediment flushing.
Where the percentage cover of black or brown oil in the pit was 35% or less, the cleanup Phase 1 treatment was confirmed as completed. No treatment was performed in areas within c. 3 m of living mangrove as the presence of living mangroves indicates that the residual weathered oil in the soil is not restricting mangrove growth.
To undertake cleanup and SCAT field monitoring of the Bodo Creek oil spill site, the delineated 1000 ha was subdivided into 200 m × 200 m (4 ha) grids. Because of channels and presence of living mangrove, grid size varied from a full 4 ha to much smaller sizes (e.g. tens of m²). Channels and live mangrove areas are not included in the 1000 ha to be treated. Two cleanup contractors were selected for the Phase 1 cleanup encompassing 535 grids. SCAT was applied first to advise on where and how to clean each grid and then to monitor and verify when Phase 1 cleanup was adequately performed. As cleanup progressed, SCAT categorized the grid and work to be done into four categories: 'No Treatment', where pit oiling was 35% or less and therefore no Phase 1 work was required; 'Pre-Treatment', where treatment was required (pit oiling 35% or greater) but not yet started; 'During Treatment', where treatment started but pit oiling was still 35% or greater, indicating that additional work was required; 'Post-Treatment', where pit oiling was reduced to 35% or less and therefore Phase 1 was completed. Sites were not repeated; for example, sites of 'Pre-Treatment' in a grid were not the same as 'Post-Treatment'. For this reason, this paper reviews overall trends and is not site specific.
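The four SCAT categories can be expressed as a simple decision rule. The following is a minimal sketch only, not the procedure used in the field: the function and constant names are hypothetical, and because the text assigns exactly 35% to both sides of the criterion, the sketch assumes that 35% or less counts as meeting it.

```python
# Hypothetical names throughout; this is an illustrative sketch only.
# Assumption: the Phase 1 criterion is met when pit oiling is <= 35 %.
PHASE1_THRESHOLD_PCT = 35.0


def mean_pit_oiling(estimates):
    """Average the visual estimates (in %) given by the SCAT team members."""
    if len(estimates) < 3:
        raise ValueError("at least three estimates are expected")
    return sum(estimates) / len(estimates)


def scat_category(pit_oiling_pct, treatment_started):
    """Return the SCAT work category for one pit observation.

    pit_oiling_pct    -- % cover of black or brown oil on the pit water
    treatment_started -- True once Phase 1 work has begun in the grid
    """
    meets_criterion = pit_oiling_pct <= PHASE1_THRESHOLD_PCT
    if not treatment_started:
        return "No Treatment" if meets_criterion else "Pre-Treatment"
    return "Post-Treatment" if meets_criterion else "During Treatment"


# Example with made-up estimates from three team members.
pct = mean_pit_oiling([50, 60, 70])
print(pct, scat_category(pct, treatment_started=False))
```

Keeping the threshold in a single constant makes it straightforward to test a different criterion against the same observations.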
An overview of the work grids and former mangrove area (in red) is shown in Figure 4. This figure also shows the location of the Figure 2 photograph as well as the sites of the two 2008 spills discussed above.
SCAT surveys were undertaken in August 2015 (35 sites) and from 18 September 2017 to 30 August 2018 (911 sites). The chemical sampling team participated with SCAT during the 2015 surveys and from September to December 2017. SCAT teams included representatives from federal government (DPR, NAPIMS and NOSDRA), state government (RSMENV), SPDC-ORP, BMI, the Bodo community, non-governmental organizations (NACGOND) and the two cleanup contractors. Participating personnel and organizations, with full versions for acronyms, are provided in the Acknowledgements. Sampling for chemical analyses was carried out before, during and after the first phase of cleanup in conjunction with the above-described SCAT site investigations. Sampling sites were selected to ensure a wide coverage of habitats, locations and oiling conditions. Figure 5 provides an overview of the location of sites reviewed in this paper.

Chemical sampling
At each chemical sample site, separate composites of five grab samples were taken from the surface (0-5 cm depth) and subsurface (15-25 cm depth). One of the five pits was used for the SCAT visual surface and subsurface observations. The two selected sampling depths are based on the observations from 2015 surveys, which indicated that the most heavily oiled sediments are located in the top 30 cm and that the associated cleanup method should predominantly focus on this depth of oiling. Figure 6 illustrates field-sampling activities. A clean stainless-steel spoon was used to take sediment from each of five surface and subsurface locations spaced equally within c. 5 m around a centre point. The subsamples from surface and subsurface were placed into separate and clean stainless-steel bowls. After thorough mixing, a composite sample was packaged and shipped with accompanying Chain of Custody documentation for laboratory chemical analysis by methods detailed below. Pits were backfilled following SCAT data recording and site photography. In total, we utilize data from 317 SCAT sites that have observations of surface and subsurface oiling and results from chemical analysis.
In addition to the composite sampling, for every 20 samples a blind duplicate was collected consisting of a homogenized sample divided into two subsamples. Each duplicate portion was assigned its own sample number to be unknown to the analytical laboratory. At four sites, the five individual (discrete) samples used for the composite sample were analysed independently for comparison to assess small-scale variability.
This sampling plan follows the ISO 10381 standard for soil quality sampling (ISO 2002). Laboratory analyses were carried out using certified methods (MCERTS) with associated laboratory quality assurance and quality control procedures. The 2015 samples were analysed by Alcontrol (UK) whereas the 2017 samples were analysed by I2 (UK).
Samples were analysed for total and fractionated TPH, with fraction banding as defined by the Total Petroleum Hydrocarbon Criteria Working Group (TPHCWG 1998-1999), by gas chromatography-flame ionization detector (GC-FID) for non-volatile hydrocarbons and by gas chromatography-mass spectrometry (GC-MS) for volatile hydrocarbons as well as individually reported benzene, toluene, ethylbenzene and xylene (BTEX) components. Sixteen USEPA (2003) priority polynuclear aromatic hydrocarbons (PAH) were analysed individually and as a sum parameter.
The statistical significance of the difference in TPH concentrations between different treatment classes (i.e. No Treatment, During Treatment and Post-Treatment, respectively) and the Pre-Treatment concentration was determined with the nonparametric Mann-Whitney U test. The statistical significance of the differences in TPH concentrations for the different SCAT observation groups (i.e. oil sheen, brown oil and black oil) was tested with a one-way ANOVA test. Statistical calculations were performed using the Python SciPy module (Anonymous 2018). The spatial data structure was investigated by constructing variograms using the Python Pykrige module (Anonymous 2019).
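For readers unfamiliar with the Mann-Whitney U test, the rank-based calculation can be sketched without external libraries. The code below is an illustration only and is not the SciPy routine used in this study: the function names are hypothetical, the P-value uses the large-sample normal approximation without tie or continuity correction, and the concentrations in the example are invented.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U statistic for x, approximate two-sided P-value).

    Normal approximation only; no tie or continuity correction is applied,
    so the P-value is indicative for small or heavily tied samples.
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    return u1, math.erfc(abs(z) / math.sqrt(2))


# Invented TPH values (mg/kg) for two treatment classes.
pre_treatment = [12000, 35000, 8000, 41000, 26000, 15000]
post_treatment = [7000, 3000, 9500, 4000, 11000, 2500]
u, p = mann_whitney_u(pre_treatment, post_treatment)
print(f"U = {u}, approximate P = {p:.4f}")
```

A nonparametric test of this kind compares ranks rather than raw values, which suits TPH data that are strongly skewed by hotspots.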
Nigerian regulations relating to oil spill cleanup and remediation are described in Environmental Guidelines and Standards for the Petroleum Industries in Nigeria issued in 1992 and updated in 2018 (EGASPIN 2018). EGASPIN provides a tiered approach to determine corrective requirements based on the Standard Guide for Risk-based Corrective Action Applied at Petroleum Sites prepared by ASTM (2015). The first tier comprises an intervention value of ≥5000 mg kg⁻¹, which is derived for a long-term residential exposure scenario. The higher tier assessments (tiers 2 and 3) are more complex in that they consider receptors such as persons living near or using the area of contamination. Tier 2 and 3 assessments are being carried out to derive site-specific target levels, which will provide risk-based criteria that are considerate of local circumstances and are the topic of a subsequent publication. Some of the shorelines, particularly in areas close to Bodo, are used by fishers on a regular basis. Most of the other damaged areas are visited relatively infrequently.
TPH concentrations are summarized for the four Phase 1 treatment conditions: No Treatment, Pre-Treatment, During Treatment and Post-Treatment. Median and mean TPH concentrations before Phase 1 cleanup (Pre-Treat) for the ground (G) surface samples are c. 35 000 and 50 000 mg kg⁻¹, respectively, with the 25%-ile and 75%-ile at 17 700 and 68 000 mg kg⁻¹. Mean values are typically higher than the median (50%-ile) because of 'hotspots', which bias the mean upward. Surface concentrations are considerably higher than subsurface values, with the latter having median and mean concentrations of 12 250 and 30 300 mg kg⁻¹ with the 25%-ile and 75%-ile at 2448 and 39 250 mg kg⁻¹. It is interesting to note that although subsurface sediment concentrations are consistently lower than surface concentrations, the correlation between surface and subsurface samples collected at the same location is very poor (R² = 0.12; Fig. SI-1 in supplementary material), highlighting the high degree of heterogeneity. The results of vibracoring and analyses undertaken in September-October 2018 at depths of 2-4 m reveal that the vast majority of petroleum hydrocarbon impact is restricted to depths of <50 cm (data not reported here).

Distribution of TPH concentrations
There is no clear spatial pattern present in the TPH concentrations (Fig. 5), other than lower concentrations present in the far south even though that area is close to the location of one of the 2008 spills (shown as a star at lower left in Fig. 4). High concentrations are found throughout the remaining area, probably owing to the spread of the initial spills as well as continued illegal activities. The absence of a spatial structure in the data is confirmed by the variograms that were constructed for TPH analyses from ground surface and subsurface samples and the exponential variogram model that was fitted to the data (supplementary material, Fig. SI-2). The fitted variogram models plot as horizontal lines with a nugget-to-sill ratio of close to unity, which confirms the absence of any spatial structure in the data (Cambardella et al. 1994).
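A nugget-to-sill ratio close to unity means that the semivariance at the shortest sampling distances is already about as large as at the longest. As an illustration of the quantity being plotted, the sketch below computes a binned empirical semivariogram with the classical estimator (half the mean squared difference between all sample pairs in a distance class). It is not the PyKrige workflow used in this study, it fits no model, and the function name, bin settings and example data are hypothetical.

```python
import math


def empirical_variogram(coords, values, bin_width, n_bins):
    """Return (mean lag distance, semivariance, pair count) for each bin.

    coords -- list of (x, y) positions in metres
    values -- measured value at each position (e.g. TPH or log TPH)
    Bins with no sample pairs are omitted from the result.
    """
    sq_diff = [0.0] * n_bins
    dist_sum = [0.0] * n_bins
    counts = [0] * n_bins
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            h = math.dist(coords[i], coords[j])
            b = int(h // bin_width)
            if b < n_bins:
                sq_diff[b] += (values[i] - values[j]) ** 2
                dist_sum[b] += h
                counts[b] += 1
    return [
        (dist_sum[b] / counts[b], sq_diff[b] / (2 * counts[b]), counts[b])
        for b in range(n_bins)
        if counts[b] > 0
    ]


# Invented sample positions (m) and TPH values (mg/kg).
coords = [(0, 0), (150, 40), (320, 90), (480, 300), (700, 650)]
tph = [42000, 3500, 61000, 9000, 27000]
for lag, gamma, n_pairs in empirical_variogram(coords, tph, 250.0, 5):
    print(f"lag {lag:7.1f} m  semivariance {gamma:.3e}  pairs {n_pairs}")
```

If the semivariance shows no tendency to rise with lag distance, nearby samples are no more alike than distant ones, which is the situation the flat fitted models describe.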
BTEX constituents and light aliphatic constituents are absent or significantly depleted from all samples as compared with analyses of unweathered Bonny Light crude oil. The 16 USEPA PAH sum concentrations of all but one sample are below the Nigerian regulatory (EGASPIN) threshold of 40 mg kg⁻¹ (which is actually for a subset of 10 PAH and not all 16 PAH defined by USEPA) and are not further discussed.
Post-Treatment TPH values show a statistically significant (P < 0.01) reduction for the subsurface samples, with the median concentration decreasing from 12 250 to 7250 mg kg⁻¹. In contrast, the median concentration for the surface samples shows an increase from 35 000 to 39 500 mg kg⁻¹, although this was not statistically significant. The No Treatment sites have statistically significantly lower TPH concentrations for subsurface samples compared with the Pre-Treatment samples, whereas for surface samples no statistical difference is found.
[Figure caption, beginning missing] The whiskers extend from the box to show the range of the data between the 5%-ile and 95%-ile. The green triangle shows the mean value. Flier points are those past the end of the whiskers. An asterisk above a box plot indicates that the median is significantly different at a 99% confidence level (P < 0.01) from the Pre-Treatment value.
The fractionated TPH data show that relatively heavy TPH fractions (>C16) dominate both the Pre- and Post-Treatment conditions (Fig. 8). Compared with unweathered Bonny Light crude oil, the sediments are enriched in heavier fractions as a result of weathering (combined volatilization and biodegradation) that preferentially removed the more volatile and biodegradable compounds (Brown et al. 2017a). In particular, the heavier aromatic fractions have been enriched, consistent with findings for land farming trials showing that although biodegradation decreases concentrations of all TPH fractions, the rate of depletion is lower for heavier fractions compared with lighter fractions (Brown et al. 2017b). The distribution of different TPH fractions is similar both between surface and subsurface samples as well as for Pre-Treatment and Post-Treatment conditions. The physical cleanup method applied mobilizes free oil but does not preferentially remove the lighter TPH fractions (unlike biodegradation).

Data reproducibility and variability
For TPH, an average relative percentage difference (RPD) between primary and blind duplicate samples was determined to be 63% (Fig. 9), above generally accepted criteria ranging between 40 and 50% (e.g. NJDEP 2014; IDEQ 2017). The likely explanation for the large RPD is that homogenization of the composite samples was complicated by the texture of sediment consisting of a mixture of detritus (e.g. dead mangrove roots) and clumps of clay-to-sand-sized sediments. Additionally, contamination was present in the form of isolated pockets of residual free oil droplets scattered through the sediment, which combined with the problematic homogenization may cause large differences between different parts in the composite sample (see photograph in Fig. 6b).
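The RPD calculation is not spelled out in the text. A minimal sketch is given here, assuming the commonly used definition in which the absolute difference between the two results is divided by their mean; the function names and the example concentrations are hypothetical.

```python
def relative_percent_difference(primary, duplicate):
    """RPD (%) between a primary result and its blind duplicate.

    Assumed definition: |primary - duplicate| / mean(primary, duplicate) * 100.
    """
    mean = (primary + duplicate) / 2
    if mean == 0:
        raise ValueError("RPD is undefined when both results are zero")
    return abs(primary - duplicate) / mean * 100


def mean_rpd(pairs):
    """Average RPD over a list of (primary, duplicate) result pairs."""
    rpds = [relative_percent_difference(p, d) for p, d in pairs]
    return sum(rpds) / len(rpds)


# Invented TPH results (mg/kg) for three primary/duplicate pairs.
pairs = [(10000, 20000), (35000, 30000), (4000, 9000)]
print(f"mean RPD = {mean_rpd(pairs):.0f}%")
```

Under this definition the RPD cannot exceed 200%, so an average of 63% indicates that duplicate results commonly differed by a factor of about two.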
To assess small-scale variability from the same sample site, the five individual grab samples used for the composite sample were analysed for four sites. Samples were collected within 5 m of a centre point. Results show that TPH is highly variable on a small scale (Fig. 10). The mean of the individual discrete samples deviates considerably from that of the composite sample from the same site. This is in line with the previously discussed results of blind duplicate samples, further suggesting that the high RPD in blind duplicate sampling was a result of incomplete homogenization. The relative standard deviation (ratio of standard deviation and mean) ranged between 29 and 111% for the surface samples and between 33 and 96% for the subsurface samples. This is in the same order of magnitude as the average RPD for the set of blind duplicate samples.
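The relative standard deviation defined above can be sketched as follows. Whether the sample or the population standard deviation was used is not stated, so the sample form (n - 1 denominator) is an assumption here, and the five concentrations in the example are invented.

```python
import statistics


def relative_standard_deviation(values):
    """RSD (%): standard deviation divided by the mean, times 100.

    Assumption: the sample standard deviation (n - 1) is used.
    """
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("RSD is undefined for a zero mean")
    return statistics.stdev(values) / mean * 100


# Invented TPH results (mg/kg) for five discrete grab samples at one site.
discrete = [8000, 52000, 15000, 31000, 4000]
print(f"mean = {statistics.mean(discrete):.0f} mg/kg, "
      f"RSD = {relative_standard_deviation(discrete):.0f}%")
```

With only five grab samples per site the estimate is itself uncertain, so the ranges quoted are best read as indicative of the scale of variability.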
The small-scale variability in the presence of crude oil in sediments that is observed visually through the SCAT process was investigated by replicate pit oiling observations in five SCAT pits within a 6 m radius (data shown in Table SI-3 in supplementary material). Similar to the TPH observations, these data demonstrate a high variability in the observed oil coverage in pits, with values ranging between 5 and 95% coverage. The relative standard deviation (RSD) of these observations ranges between 5 and 183%, with an average RSD of 62%. Overall, the blind duplicate and discrete samplings, and replicate SCAT pit observations, show that TPH concentrations are highly variable over short distances. The highly heterogeneous nature of TPH concentrations is probably the result of the continued and spatially variable impacts to the area over the past 10 years (including during and after the spills in 2008), the dynamic tidal environment causing variable oil settlement on intertidal sediments, and the nature of the mud-dominated sediments inhibiting internal oil movement with depth. The small-scale variability should, however, also be placed in the context of the overall variation in TPH concentrations, which ranges over three orders of magnitude as discussed further below.

Relation between chemistry and visual field observations
Comparisons between oil type and TPH concentrations are presented in Figures 11 and 12. In addition, Tables SI-1 and SI-2 in the supplementary material summarize the key statistics for TPH concentrations grouped by oil type. The comparison between SCAT-described oil type and TPH concentrations indicates a good correlation for subsurface samples (Fig. 11, lower panel). The different SCAT groups have statistically different TPH concentrations, which is confirmed by an ANOVA test. Observations of 'silver sheen' in the waters of a 25-30 cm test pit have a median TPH value of slightly less than 2000 mg kg⁻¹ and a 75%-ile value of c. 4000 mg kg⁻¹, whereas brown and black oil have higher median TPH values of 8000 and 20 000 mg kg⁻¹, respectively. The relative standard deviation (RSD) of TPH in the classes for the subsurface shown in Figure 11 ranges between 115% for black oil and 219% for silver sheen (Table SI-1 in supplementary material), which is higher than that of the small-scale sampling discussed in the previous section (33-96%). The higher RSD of TPH concentrations within each SCAT category can be caused by a combination of the following factors.
(1) SCAT observations were made in one pit whereas the analysed composite sediment sample consisted of five grab samples. For future work, it is recommended to take the average of three SCAT pits to account for the high degree of variability.
(2) Sheening and the presence of droplets of oil in a SCAT pit can occur over a range of TPH concentrations. This implies that SCAT observations are not expected to provide an exact proxy for TPH analyses, but rather that a certain SCAT observation can provide a bandwidth of TPH concentrations.
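The one-way ANOVA used to compare TPH concentrations between SCAT oil-type groups rests on the ratio of between-group to within-group variance. The sketch below computes that F statistic only; it is not the SciPy routine used in this study, it does not compute a P-value (which requires the F distribution), and the function name and example concentrations are hypothetical. Whether concentrations were transformed before testing is not stated in the text, so the function simply works on the values it is given.

```python
def one_way_anova_f(groups):
    """Return (F statistic, df between groups, df within groups).

    groups -- list of lists, one list of measurements per SCAT oil type.
    Raises ZeroDivisionError if there is no within-group variation.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Invented subsurface TPH values (mg/kg) for three oil-type groups.
sheen = [900, 1800, 2500, 4100]
brown_oil = [5000, 8000, 12000, 7500]
black_oil = [15000, 20000, 34000, 22000]
f_stat, df_b, df_w = one_way_anova_f([sheen, brown_oil, black_oil])
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large F statistic indicates that the group means differ by more than the scatter within groups would explain, which is the pattern reported for the subsurface samples.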
The ground surface samples do not show a correlation between SCAT field observations and chemical analysis results (Fig. 11, upper panel); this is confirmed by the ANOVA test, which showed no statistical difference in TPH concentrations for the different SCAT groups. The difference between ground surface and subsurface TPH correlations with SCAT observations is probably the result of the black colour of the sediment (believed to be a result of the high organic matter content and sulphate-reducing conditions), against which it is hard to differentiate weathered oil from sediment. Field reports indicate that only when oil was seen pooled on water or where oiling was fresh (black) could it be clearly seen on the sediment surface, whereas the oiling of pit waters could be discerned more easily and more consistently. In addition, the surface sample included sediments 0-5 cm deep, whereas the visual observation was solely on the sediment surface (0 cm deep) and therefore does not include oil incorporated into the sediment.
Pit oiling of ≤25% brown or black oil may be one criterion to provide a visual means to confirm completion of Phase 2 treatment (compared with the performance criterion of <35% applied in Phase 1). This is believed to represent an oil level that can practicably be achieved with flushing and physical agitation. In parallel to SCAT and cleanup activities, pilot replanting of mangrove seedlings is being carried out to determine if this cleanup target is adequate. A comparison of SCAT data and chemical data shows that the visual criterion corresponds to a median subsurface concentration of c. 4000 mg kg⁻¹ (Fig. 12). For 75% of the sites sampled Post-Treatment (upper box limit), a TPH concentration <10 000 mg kg⁻¹ was found. The sites that do not meet the criterion have a 50%-ile and 75%-ile concentration of 20 000 and 35 000 mg kg⁻¹. TPH concentrations measured in ground surface samples are much higher, reflecting the inability of SCAT observations to reliably detect surface oiling as compared with chemical values taken from 0-5 cm depth.
Fig. 11. Box-whisker plots of TPH concentrations (in mg kg⁻¹; log scale) for different oil types as delineated by SCAT surveys for surface and subsurface samples. The box extends from the lower to upper quartile values of the data, with an orange line at the median (or 50%-ile). The whiskers extend from the box to show the range of the data between the 5%-ile and 95%-ile. Flier points are those past the end of the whiskers. The number above the upper x-axis indicates the number of samples in each group.

Conclusions
Results show that the assessed cleanup area is affected by hydrocarbons that are widely spread and highly variable both spatially and with depth. Spatial variability is both on a broad scale over the entire work area (several kilometres in length) and on very small scales (within 10 m). Chemical sampling provides important data to characterize the hydrocarbons present, demonstrating that they are primarily composed of highly weathered crude oil, enriched in heavy hydrocarbons compared with unweathered Bonny Light crude oil and with low BTEX and PAH components. The high spatial variability that is present on both small and large scales, however, limits the applicability of a single chemical-based analytical criterion to directly steer and confirm completion of cleanup activities. Given this variability, SCAT pit observations can be considered sufficiently reliable to guide confirmation of Phase 2 cleanup in Bodo in conjunction with a risk assessment according to EGASPIN requirements. Another alternative being reviewed is to reduce the level of contamination in the sediments sufficiently that mangroves can be replanted and survive, aiding the further degradation of remaining oil through phytoremediation processes. Results from seven planted sites after 1 year show 82% survival of 346 seedlings planted and good growth (mean: +46% plant height) in spite of high TPH levels (2210-87 000 mg kg⁻¹ surface, 420-59 000 mg kg⁻¹ subsurface). Continued spillages, however, will affect and kill replanted mangroves.
We conclude that for the relatively large affected area under study here, with challenges pertaining to site access and sample shipment, visual SCAT observations can effectively be calibrated with a limited set of chemical field data. As such, the study provides an example of the application of SCAT as a speedy and reliable model to assess and manage cleanup endeavours in mangrove and tidal flat environments. The comparison between SCAT observations and the sediment chemical characterization did, however, reveal that SCAT observations for surface oiling are a poor indicator for surficial sediment TPH concentration. This contrasts with subsurface (or pit) SCAT observations, which showed a much better correlation with TPH concentrations. Calibration of SCAT observations (pure product or sheen type) to sediment chemical analyses is also probably effective for rapid field assessment of other hydrocarbon products (e.g. different crude oils or gasoline spills), but this requires further work given that the solubility and visual appearance will vary.