Biomarkers Profile of the Mysterious 2019 Oil Spill on the Northeast Coast of Brazil and Discrimination from Unreported Events

In 2019, large amounts of oil reached the northeast coast of Brazil, causing damage to the environment and the local economy, especially in the state of Pernambuco. In order to correlate with possible sources, investigation was made of the geochemical biomarkers of the oils using “gold standard” forensic protocols from the European Committee for Standardization (CEN). The biomarkers study was improved by using gas chromatography-tandem mass spectrometry (GC-MS/MS), rather than the standard protocol that suggests use of the selected ion monitoring (SIM) method. Analysis was made of thirteen oil samples from the Pernambuco coast, in order to identify their degrees of similarity and the possible presence of oils from unreported spills. The use of eighteen diagnostic ratios and multivariate analysis revealed a cluster formed by eleven samples with biomarker distributions typical of oil from the 2019 spill. However, two samples had anomalous fingerprints, especially due to the absence of the 18α(H)-oleanane and 18β(H)-oleanane isomers. Both the CEN protocol applied for the classical biomarkers and a comprehensive Fourier transform mass spectrometry (FT-MS) analysis of polar compounds confirmed the dissimilarities between the samples. The findings suggested that these two oils could have originated from an event unrelated to the mysterious 2019 spill.


Introduction
In late August 2019, a large amount of oil affected a significant portion of the Brazilian coast.[4][5][6] In environmental geochemical studies, the calculation of diagnostic ratios of petroleum biomarkers, such as compounds from the terpane and sterane classes, can provide geochemical information including the degree of thermal maturity and the nature of the organic matter deposition.[9][10][11][12][13] Hence, diagnostic ratios of sterane and terpane compounds have been used to characterize the oil from the 2019 spill and to determine its possible source.2,3 Data reported by Oliveira et al. 3 and Carregosa et al. 4 suggested that this petroleum material was geochemically similar to oils from Venezuelan oil basins, while the study of Reddy et al. 5 identified characteristics of processed oil, due to the presence of the more thermally stable compounds 2- and 3-methylphenanthrene (2MP and 3MP), which were preferentially enriched relative to 9/4- and 1-methylphenanthrene (9/4MP and 1MP). The compounds 2MP and 3MP can be used to differentiate processed oils from virgin crude oils, due to their high thermal stability in crude oil generation processes, compared to 9/4MP and 1MP. These proportions are commonly found to be altered in products derived from industrial thermal petroleum processes, with a higher proportion of 2MP and 3MP compared to 9/4MP and 1MP.14,15
Oil spill investigations require the application of analytical protocols to perform the chemical characterization of oil and/or environmental samples. In most cases, the evaluation is done to assess the environmental impacts caused by the spill and enables the correlation of oils with potential spill sources.7,16 Typically, the European Committee for Standardization (CEN) 8,9,17 methodology for emergency oil spill investigation is employed, in order to standardize the required analysis steps. The process for comparing the chemical compositions of different oil samples in the context of an oil spill involves extracting oil to remove impurities, visually comparing chromatographic profiles, and calculating diagnostic ratios of oil biomarkers.
In order to comply with the CEN protocol, analytical techniques such as gas chromatography coupled with mass spectrometry (GC-MS), or gas chromatography with flame ionization detection (GC-FID), are widely used for the chemical characterization of oil spill samples, as well as for the correlation of these samples with oils from possible sources, based on the identification of nonpolar hydrocarbon compounds.8,9,17 Therefore, the GC-MS and GC-FID techniques are regarded as the "gold standards" in the context of geochemical evaluation of oil spills.10,11 On the other hand, Fourier transform mass spectrometry (FT-MS) has become a new tool for the chemical characterization of spilled oils. FT-MS enables the elucidation of hydrocarbon compounds present in the polar fraction, consequently expanding the chemical characterization of oil and derivatives by identifying species not previously determined by GC-MS, GC-MS/MS or GC-FID.16,18,19
Given the magnitude of the spill and its substantial economic and environmental impacts, there was a need to perform chemical characterization of oil samples collected on the coast of Pernambuco state in Northeast Brazil, following the occurrence of the spill in the second half of 2019. Therefore, in this work, analysis was made of thirteen oil samples collected from multiple beaches affected by the 2019 spill. The investigation employed the CEN 8,9,17 protocol, using gas chromatography-tandem mass spectrometry (GC-MS/MS) to identify classical petroleum biomarkers. In addition, FT-MS was used to expand the study, at the molecular level, of the chemical compositions of the samples in terms of the most polar compounds. As this study progressed, the findings led to consideration of the possibility of additional unreported events that could have contributed to the large amounts of oil material present along the coast.
The oil samples were collected using spatulas and glass jars. The samples, which consisted of mixtures of sand, oil, and seawater, were submitted to the oil extraction procedure described by Carregosa et al.,4 using 2 g of the sample, which was extracted five times with 5 mL of dichloromethane. The solvent was removed with a rotary evaporator and the extracted oil was re-dissolved at 1 mg mL⁻¹ in dichloromethane for GC-MS/MS analysis.

GC-MS/MS analysis
The GC-MS/MS instrument was a triple quadrupole analyzer (model TQ8040, Shimadzu Corporation, Japan) with a 70 eV electron ionization (EI) source. The conditions for the chromatographic analysis followed the protocol established by Carregosa et al.4 For this, an NA-5MS (5% phenyl, 95% dimethylpolysiloxane) capillary column was used; the sample injection volume was 1 μL, in split mode (1:30), and the injection temperature was 290 °C. The oven temperature was programmed from 60 to 310 °C at 2 °C min⁻¹. Helium (99.995% purity) was used as the carrier gas, at a flow rate of 1.0 mL min⁻¹. The total run time was 125 min. The temperatures of the MS ionization source and the interface were both set at 300 °C.
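As a quick consistency check on the oven program, the short Python sketch below computes the duration of a single linear temperature ramp. It assumes there are no isothermal hold steps, since none are mentioned in the text; the function name is illustrative only.

```python
def ramp_minutes(start_c, end_c, rate_c_per_min):
    """Duration (min) of one linear oven ramp, assuming no isothermal holds."""
    if rate_c_per_min <= 0:
        raise ValueError("ramp rate must be positive")
    return (end_c - start_c) / rate_c_per_min


# Oven program described above: 60 to 310 °C at 2 °C per minute.
print(ramp_minutes(60, 310, 2))
```

Under this assumption, the ramp alone accounts for the stated total run time of 125 min.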

FT-MS analysis
Electrospray ionization in negative mode (ESI(-)) and atmospheric pressure photoionization in positive mode (APPI(+)) FT-MS analyses were performed according to the methodology and conditions used by Castiblanco et al.20 The samples were first dissolved in a mixture of toluene and methanol (1:1 v/v), to a final concentration of 125 µg mL⁻¹, and an aliquot of 100 μL was introduced by direct infusion from a 500 μL syringe into the mass spectrometer (Exactive HCD Plus Orbitrap, Thermo Scientific, Bremen, Germany), operated in full scan MS mode, with resolution of 140,000 full width at half maximum (FWHM) at m/z 200, in the range m/z 100-1000. Molecular formulas were assigned using XCalibur v. 3.1 software (Thermo Scientific, Bremen, Germany) and graphs were created using Excel software (Microsoft Co., WA, USA) with a macro developed by the Petroleum and Energy from Biomass Research Group (PEB).

Multivariate statistical data analysis
Multivariate statistical analysis was applied to the results of the diagnostic ratios of petroleum biomarkers identified by GC-MS/MS. Similarities between the samples were investigated using principal component analysis (PCA) and hierarchical clustering analysis (HCA) with heatmap dendrogram generation, using the R software (version 4.2.0) packages "mdatools" and "pheatmap".21 The HCA heatmap dendrogram was obtained using the Ward clustering method and the Manhattan distance measurement method, with correlation of the results of the HCA and heatmap evaluations.
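To illustrate the distance measure named above, the following Python sketch computes pairwise Manhattan distances between vectors of diagnostic ratios, which is the input that a hierarchical clustering step would then merge. It is not the authors' R workflow: the function names are illustrative, the ratio values are hypothetical (they are not taken from Table S2), and any scaling of the ratios before clustering is left out because the text does not specify it.

```python
def manhattan(u, v):
    """Manhattan (city-block) distance between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(abs(a - b) for a, b in zip(u, v))


def distance_matrix(samples):
    """Pairwise Manhattan distances for a dict of {sample name: ratio vector}."""
    names = sorted(samples)
    return {(p, q): manhattan(samples[p], samples[q]) for p in names for q in names}


# Hypothetical ratio vectors, NOT values from Table S2.
example = {
    "A": [0.50, 0.30, 1.20],
    "B": [0.52, 0.31, 1.18],
    "C": [0.90, 0.10, 0.40],
}
distances = distance_matrix(example)
print(distances[("A", "B")], distances[("A", "C")])
```

In this toy example, samples A and B are much closer to each other than either is to C, so an agglomerative method such as Ward's would be expected to join A and B first.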

GC-MS/MS analysis
As the starting point of the methodology recommended by CEN,8,9,17 the chromatographic profiles of the oil spill samples, acquired by GC-MS in scan or SIM mode, were compared in order to find matches or differences. The full chromatograms presented profiles corresponding to degraded oil, hindering determination of the distributions of n-alkanes and isoprenoids, since these compounds are more vulnerable to weathering processes such as biodegradation, evaporation, and photo-oxidation, making it impossible to use them to find similarities or dissimilarities among the samples. As can be seen in Figure 2, natural weathering influenced the profiles of samples S10 and S11, characterized by low peak intensity for the n-alkanes and isoprenoids (pristane and phytane), which prevented accurate identification of these biomarkers in the chromatogram distribution.
On the other hand, comparison of the m/z 85 ion chromatograms of the n-alkanes and isoprenoids, obtained in SIM acquisition mode, identified three distinct profiles for the samples, one being unique for S10 and another for S11, while the third profile was identical for eleven samples (sample S4 was chosen as representative), as shown in Figure 2. Additionally, the GC profiles of samples S10 and S11 revealed greater degradation of n-alkanes. This result was a first indication of a possible dissimilarity between the samples, although no conclusions could be drawn, since it was not possible to measure the level of weathering that modified the chromatographic profiles of the paraffinic hydrocarbons in the samples.
In contrast, the chromatographic profiles obtained using MRM analysis of the precursor and product ions of the terpane and sterane biomarkers, summarized in Table S1 (Supplementary Information (SI) section), enabled an initial qualitative analysis showing high similarity of the terpane and sterane distributions for the group of eleven samples, while samples S10 and S11 differed from this first group, corroborating the fingerprint of paraffinic constituents. Figure 3 shows an expansion of the MRM transition m/z 412→191, related to terpanes, with different intensity ratios found for C30-hopane (C30Hop) and gammacerane (Gam), as well as the absence of the oleanane (O) isomers for samples S10 and S11. Oleanane can be highlighted as an important biomarker present in oil from the spill, due to the geochemical information provided by this molecule. Calculation of the oleanane index, in addition to the other diagnostic ratios, enabled association of the oil from the 2019 spill with oils produced in the Ayacucho region of Venezuela, based on data reported in the literature.4 The characteristic presence of oleanane differentiates these oils from those found in other regions of Venezuela. For the oil from the 2019 spill, the oleanane index indicated a contribution from angiosperm plants, with limited inputs of terrigenous organic matter during deposition of the source rock of these oils.4
According to the CEN methodology,8,9,17 if the qualitative comparisons between chromatographic profiles are not sufficient to conclude similarity between samples, then the diagnostic ratios of the biomarkers should be calculated. These ratios can be used to identify similarities among oil samples and to provide chemical and geochemical information, such as thermal maturity, depositional environment, and the precursor organic matter leading to the formation of oil constituents.9,12,13 Eighteen diagnostic ratios, shown in Table S2 (SI section), were determined using the areas of the biomarker peaks obtained in the MRM analysis.
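A diagnostic ratio is simply a quotient of integrated peak areas. The Python sketch below shows this for Ts/Tm and Ts/(Ts + Tm), two of the thermal maturity ratios used in this work. The function names and the peak areas are hypothetical and serve only to illustrate the arithmetic; they are not values from Table S2.

```python
def ts_tm(ts_area, tm_area):
    """Ts/Tm ratio from integrated MRM peak areas."""
    if tm_area <= 0:
        raise ValueError("Tm peak area must be positive")
    return ts_area / tm_area


def ts_index(ts_area, tm_area):
    """Ts/(Ts + Tm) ratio from integrated MRM peak areas."""
    total = ts_area + tm_area
    if total <= 0:
        raise ValueError("sum of peak areas must be positive")
    return ts_area / total


# Hypothetical peak areas (arbitrary units), for illustration only.
ts, tm = 30.0, 70.0
print(round(ts_tm(ts, tm), 3), round(ts_index(ts, tm), 3))
```

Normalized forms such as Ts/(Ts + Tm) are bounded between 0 and 1, which makes them convenient to compare across samples.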
Determination of thermal maturity ratios including Ts/Tm, Ts/(Ts + Tm), C29αααS/(C29αααS + C29αααR) Stg, and C29αββ(S + R)/(C29αββ(S + R) + C29ααα(S + R)) indicated that the entire set of samples originated from oils generated in a low to medium thermal maturity environment.17,18 Differences were observed for the organic matter and depositional environment ratios, with steranes/hopanes ratios < 1 for samples S10 and S11, indicating that the formation of these oils was related to inputs of terrestrial organic matter, whereas the ratios for the other eleven samples were characteristic of oil formed by inputs of marine organic matter.12,13 The C31βR/C30Hop ratio, which is used for the differentiation of oils from source rocks in marine and lacustrine environments, indicated a possible lacustrine contribution for sample S10 and corroborated the identification of a marine contribution in the generation of the other samples, except in the case of sample S11, for which the results were inconclusive.12,13
The data obtained using the CEN methodology indicated that samples S10 and S11 were not only different from each other, but also different from the other eleven oil spill samples. In order to further elucidate the chemical variability among the samples and obtain a more reliable conclusion, PCA and HCA/heatmap analyses were performed using the values of the biomarker diagnostic ratios provided in Table S2.
The graphs shown in Figure 4 illustrate the PCA scores and loadings, where the first two components (PC1 and PC2) explained 74% of the total variance. The scores plot shows that the sample set was distributed in three groups, separated by PC1 (which explained 39.2% of the total variance), with the group consisting only of S10 being located on the negative side of PC1, while the group consisting only of S11 was located on the positive side. The loadings graph shows that the C29Mor/C29Hop, Gam/C30Hop, and GI% ratios were responsible for the separation of S10 on the negative side, while the C27Dia/(Dia + St), C27Dia/St, C27Dia/(Dia + St), and C27DiaS/(C27αββS + C27αααS + C27αββR) ratios were responsible for the separation of S11 on the positive side. The grouping of all the other samples in the scores plot, with PC1 scores near zero, was due to the combined effect of the results obtained for all the diagnostic ratios, shown in the loadings graph. PC2, which explained 34.8% of the total variance, separated samples S10 and S11, on the negative side, from the group formed by the other samples, on the positive side. Similarly, the loadings graph showed that the Gam/C30Hop, GI%, C27Dia/(Dia + St), C27Dia/St, C27Dia/(Dia + St), and C27DiaS/(C27αββS + C27αααS + C27αββR) ratios were responsible for the separation of samples S10 and S11 from the remaining eleven samples.
The HCA heatmap plot (Figure 5) corroborated the PCA results, with samples S10 and S11 being separated from each other, as well as from the set of the other eleven samples. The separations between the same three groups were confirmed by the color pattern in the heatmap, representing the results obtained for the diagnostic ratios of the oil biomarkers.
The ratios that were highlighted in the PCA and HCA/heatmap analyses, responsible for the separation of the S10 and S11 samples from each other and from the other samples, were used to perform an oil-oil correlation. As mentioned previously, the thermal maturity ratios overall indicated mid-generation thermal maturity for all the samples. However, the ratios related to the depositional environment and organic matter, such as C31αβR/C30Hop, indicated some differences in geochemical parameters. The Gam/C30Hop and GI% ratios are indicators of salinity stratification of the water column in marine and nonmarine environments, where higher values correspond to higher salinity of the environment, as found for sample S10, when compared to sample S11. The Gam/C30Hop ratio is also used in the differentiation of carbonate rocks (low values) and calcareous siliciclastic sedimentary rocks and deltaic deposits (higher values).12,13 A significantly higher value for GI% was obtained for sample S10, compared to all the other samples, with the results for all the samples being indicative of formation associated with calcareous sedimentary rocks. The ratio between moretanes and hopanes provides an indication of thermal maturity, with the highest value (C29Mor/C29Hop ratio) found for sample S10.
The results obtained using both statistical tools (Figures 4 and 5) supported the conclusion that samples S10 and S11 were dissimilar to each other and to the set of the other eleven samples. The multivariate processing also showed similarities among these other eleven samples, which were characteristic of the oil spilled in 2019.
In order to evaluate samples S10 and S11, they were compared with samples S4 and S6, selected at random from cluster 1 of Figure 4. The relative difference (RD) method was used to compare the petroleum biomarker diagnostic ratio values and identify similarities between the samples (Table 1). For each ratio, the difference between the values for two samples was divided by the mean of the two values and multiplied by 100% (RD = ((X − Y)/mean(X, Y)) × 100%), where X represents the diagnostic ratio for a given sample and Y represents the same diagnostic ratio for a second, different sample.8,9 For this purpose, a maximum RD of 14% was considered indicative of similar oils.8,9 The comparison between samples S4 and S6 resulted in an indication of similarity for most of the biomarker diagnostic ratios, so these samples were classified as similar, according to the RD method. However, the comparisons between samples S6 and S10, between S10 and S11, and between S6 and S11 mostly indicated dissimilarity between the oil samples. Therefore, it could be concluded that samples S10 and S11, collected on Maria Farinha beach in Pernambuco state, were dissimilar to each other and to the other eleven samples.
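The RD comparison described above can be sketched in a few lines of Python. The 14% limit is the one stated in the text. The use of the absolute difference (so that the result does not depend on which sample is called X) is an assumption of this sketch, as are the function names and the example ratio values, which are hypothetical.

```python
def relative_difference(x, y):
    """Relative difference (%) between two values of the same diagnostic ratio.

    Uses the absolute difference, which is an assumption of this sketch.
    """
    mean = (x + y) / 2
    if mean == 0:
        raise ValueError("mean of the two ratios is zero")
    return abs(x - y) / mean * 100


def is_similar(x, y, limit=14.0):
    """True when the RD does not exceed the limit (14% in the text)."""
    return relative_difference(x, y) <= limit


# Hypothetical ratio values, for illustration only.
print(round(relative_difference(0.50, 0.55), 2), is_similar(0.50, 0.55))
print(round(relative_difference(0.50, 1.00), 2), is_similar(0.50, 1.00))
```

In practice, the check is repeated for every diagnostic ratio of a sample pair, and the overall judgment depends on how many ratios fall within the limit.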

FT-MS analysis
FT-MS analyses were performed to identify similarities/dissimilarities that corroborated the results obtained previously using GC-MS/MS, by identifying and comparing the classes of polar compounds present in the oil samples. APPI(+) and ESI(-) FT-MS were used to cover as many ion assignments as possible for the medium to high polarity constituents of the oils, as well as polycyclic aromatic hydrocarbons identified in the APPI(+) mode. Those compounds in crude oils can provide geochemical insights about their sources and biodegradation. Comparing the abundance and classes of polar compounds of unknown oils can help to identify similarities and infer their origin, supporting oil-oil correlation. Additionally, the Ox, SOx, and N1 classes in oils can indicate the extent of biodegradation.6,22,23
The class distribution is shown in Figure 6 and the values are provided in Tables S3-S4 (SI section). Using both ionization modes, samples S4 and S6 showed similar abundances for most of the classes, suggesting very similar chemical compositions for compounds of medium to high polarity. However, samples S10 and S11 differed from samples S4 and S6, and from each other, in the abundances of most classes, including N1, O1, and S1 (using APPI(+)) and N1, O2, O1N1, and O4S1 (using ESI(-)), whereas these classes had similar abundances in samples S4 and S6.
The ESI(-) analysis showed higher relative abundance of the O2 class and lower relative abundance of N1, suggesting that the oil had undergone significant biodegradation, as reported previously.24 This was particularly evident for sample S11, in agreement with the n-alkanes profile shown in Figure 2. However, sample S10 showed higher abundance of N1, which suggested that the oil may have originated from a specific type of source rock.
The distributions of all the molecular formulas, assigned in terms of each ion intensity, were plotted as graphs of double bond equivalent (DBE) versus carbon number (Figure 7), allowing observation of the dissimilarities in the molecular formula distributions of the oil constituents. Samples S10 and S11 presented distinct distributions, which were different from those for samples S4 and S6, with the latter two samples showing very similar distributions. These findings were obtained using both APPI(+) and ESI(-) ionization modes.
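For reference, the DBE of a neutral molecule with formula CcHhNnOoSs follows from the standard relation DBE = c − h/2 + n/2 + 1, in which oxygen and sulfur do not contribute. The Python sketch below applies this relation. It assumes the formula has already been converted from the detected ion to the neutral molecule, and the function name is illustrative, not part of the software used in this work.

```python
def dbe(c, h, n=0):
    """Double bond equivalent of a neutral CcHhNnOoSs molecule.

    Oxygen and sulfur atoms do not change the DBE, so they are not arguments.
    """
    return c - h / 2 + n / 2 + 1


# Well-known reference compounds.
print(dbe(6, 6))       # benzene, C6H6
print(dbe(12, 9, 1))   # carbazole, C12H9N
```

Higher DBE values at a given carbon number indicate more rings and double bonds, which is why the DBE versus carbon number plots are read as a measure of aromaticity.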
The DBE versus carbon number profiles obtained using APPI(+) FT-MS (Figure 7) showed that for sample S10, the highest intensity compounds had carbon numbers between 20 and 40, with DBE from 10 to 25. This distribution was different from that for sample S11 and indicated the presence of more aromatic compounds in sample S10. The profiles obtained using ESI(-) FT-MS showed higher intensity of compounds with carbon numbers between 20 and 40 and DBE from 0 to 10 for sample S11, compared to sample S10. The analyses in both ionization modes revealed differences between samples S10 and S11, as well as differences between these samples and the larger group associated with the 2019 oil spill (using samples S4 and S6 as representative of this group). These results showed that FT-MS data, presented as compound class distributions and DBE versus carbon number graphs, can be used in support of the CEN protocol, enabling the comparison of oil spill samples to identify similarities and differences.

Conclusions
The use of the CEN protocol applied to the oil spill samples, in combination with the multivariate statistical approach, enabled the identification of dissimilarities between the chromatographic profiles and biomarker ratios of samples S10 and S11, as well as between these samples and the other eleven oil samples. The absence of the oleanane isomers suggested that samples S10 and S11 could have originated from a contamination source different from that of the other oil samples associated with the 2019 spill.
The results for the steranes/hopanes diagnostic ratio were indicative of a contribution from terrigenous or lacustrine organic matter only for samples S10 and S11, in contrast to the other samples. The dissimilarities between the oils were evidenced by the statistical data analysis, with clear formation of three groups of samples. In addition, the FT-MS results corroborated the differentiation between samples S10 and S11, as well as between these samples and the group composed of the other eleven samples.
The findings indicated that there was a contribution to the contamination of Maria Farinha beach in Pernambuco state (samples S10 and S11) by an oil of unknown origin, which differed from the one responsible for the contamination of the other beaches in the state. The results demonstrated the effectiveness of using GC-MS/MS analysis in MRM mode, together with FT-MS techniques, in investigations to evaluate the similarity of spilled oils.

Figure 1. Map of beach locations where the oil samples were collected in Pernambuco state, Northeast Brazil.

Figure 2. GC-MS profiles obtained in SIM mode at m/z 85 for n-alkanes and isoprenoids in samples S4, S10, and S11.

Figure 3. MRM profiles for the m/z 412→191 transitions of terpane biomarkers in the oil samples.