Characterization and Quantification of Naphthenic Acids in Produced Water by Orbitrap MS and a Multivariate Approach

Naphthenic acids (NAs) naturally occur in crude oil and its associated produced water, presenting significant challenges, such as corrosion, in refinery apparatus and ecotoxicity in aquatic habitats. This study delineates a multivariate method to quantify NAs in produced water via electrospray ionization coupled with high-resolution Orbitrap mass spectrometry (ESI-Orbitrap MS). By employing liquid–liquid extraction, followed by direct infusion ESI(−)-Orbitrap MS, we characterized and quantified NAs employing a partial least-squares regression (PLS) model enhanced by the ordered predictor selection (OPS) algorithm. Thirty-six produced water samples were utilized, with 24 allocated for calibration and 12 designated for validation. The PLS-OPS model demonstrated notable accuracy in predicting NA concentrations in simulated and actual produced water samples ranging from ∼30 to 300 mg·L–1. This methodology offers a rapid yet robust alternative for quantifying NAs using mass spectrometry augmented by PLS and the OPS. Its significance is underscored by its potential to equip the petroleum industry with a swift and reliable monitoring mechanism for NAs in produced water, thereby aiding in mitigating environmental and operational risks.


■ INTRODUCTION
Produced water is an abundant wastewater generated from offshore petroleum production and is a complex mixture containing different chemical substances, including heavy metals, hydrocarbons, phenols, additives, and organic acids, often represented as naphthenic acids (NAs); 1−4 however, its composition can change according to the field's geographical and geological factors. 5aphthenic acids (NAs) are natural compounds present in crude oils and their effluents whose concentrations can vary intensely depending on the source of oil.Chemically, these compounds comprise a complex mixture of acyclic, cycloaliphatic, and aromatic carboxylic acids.−12 Recently, this definition has been expanded to include a complex mixture with more oxygen atoms. 6,13−20 Considering the substantial environmental and economic impacts of NAs, using analytical methods and techniques for their characterization becomes crucial.
Since the development of ultrahigh resolution mass spectrometry (UHRMS) in the past few years, complex matrices such as petroleum and derivatives could be characterized by their thousands of compounds.−26 These characterization have been performed using different mass spectrometers, such as Orbitrap and FT-ICR, and several ionization techniques including Electrospray Ionization (ESI), 4,6,22,25 Electron Impact Ionization (EI), 21 Atmospheric Pressure Photoionization (APPI), 26,27 Matrix-Assisted Laser Desorption/Ionization (MALDI), 14 and Wooden-Tip Electrospray Ionization (WT-ESI). 13ven though UHRMS has shown a wide range of applications in characterizing NAs, quantifying NAs using UHRMS has revealed a more significant issue.In recent years, Samanipour and co-workers have studied the quantification of NAs from produced water using liquid chromatography (LC) coupled to ESI-MS, where a technical mixture of NAs was used as a standard and concentration of five produced water samples were found to range between 5 to 60 mg L −1 approximately. 3 Hindle and co-workers also performed a similar study, where two technical mixtures were used as a standard for quantifying seven oil sands process-affected water (OSPW). 28Furthermore, previous studies from our group have reported the capability of quantifying NAs using different approaches, such as WT-ESI 13 and eletromembrane extraction, followed by direct infusion ESI-MS. 4On the other hand, no study has reported a multivariate approach for quantifying NAs.
−31 Traditional methods for quantification, as highlighted in previous studies, have primarily relied on direct analytical techniques and standards-based approaches, which, while effective, may not fully account for the complexities inherent in environmental samples.This gap underscores the necessity for innovative approaches to accommodate the multifaceted nature of produced water and provide accurate, reliable quantification of NAs. 28,32he emergence of multivariate approaches heralds a transformative evolution in analyzing complex environmental samples.Multivariate statistical methods, such as partial leastsquares regression (PLS) combined with ordered predictors selection (OPS), offer a sophisticated framework for tackling the intricacies of NA quantification. 33,34Unlike univariate methods, which consider single variables independently, multivariate techniques analyze multiple variables simultaneously, capturing the inherent relationships and patterns within the data. 35This holistic view is particularly advantageous in produced water, where the interaction between various compounds can significantly influence the analytical outcomes.
PLS regression, a cornerstone of multivariate analysis, is well-suited for quantifying NAs due to its ability to handle highly collinear and complex data sets. 36Focusing on the covariance between the dependent and independent variables allows the PLS to extract the most relevant information for predicting NAs concentrations, even in a noisy or highly complex background.This feature makes it an invaluable tool for environmental chemists seeking to quantify NAs precisely and accurately.
The integration of the OPS with the PLS further enhances the robustness of the quantification process.OPS is a variable selection method that systematically identifies the most predictive variables, optimizing the PLS model. 34In the context of NAs quantification, the use of OPS can help to distill the vast array of mass spectrometry data into a manageable subset of predictors, significantly improving model performance and interpretability.Therefore, this combination of PLS and the OPS represents a powerful multivariate approach that can overcome the challenges posed by the sample complexity and variability in produced water.
For the first time, this study proposes a multivariate approach for quantifying NAs in produced water, leveraging the strengths of PLS regression combined with OPS.This study proposes a new strategy for quantifying NAs compounds from produced water.Using LLE and direct infusion ESI(−)-Orbitrap MS, the proposed methodology led to a straightforward, simple, and fast approach to extract and analyze NAs in wastewater, enabling an efficient characterization and quantification when followed by a partial least-squares regression (PLS) combined with the ordered predictor selection method (OPS).

Journal of the American Society for Mass Spectrometry
Twelve real produced water samples were also evaluated in this study.These samples were supplied by Petroĺeo Brasileiro S.A. (Petrobras).
Liquid−Liquid Extraction.Liquid−liquid extraction (LLE) extractions were conducted using 25.0 mL of simulated produced water and real produced water samples, previously adjusted the pH for 2.0, and 25.0 mL of dichloromethane (3×) in a 125.0 mL separating funnel.Anhydrous sodium sulfate (6.0 g) was used to remove residual water, and the extract solvent was reduced using an IKA Rotary Evaporators RV 10 auto.The residual extract solution was concentrated using a SpeedVac Savant concentrator (Thermo Scientific, Asheville, U.S.A.).All extracts were kept stored at 4 °C until mass spectrometry analysis.
ESI(−)-Orbitrap MS Analysis.The mass spectrometry analyses were performed using a HESI (heated electrospray ionization) ionization probe coupled to a Q-Exactive Orbitrap (Thermo Scientific, Bremen, Germany) in negative mode at 5.0 μL min −1 using a Chemyx Fusion syringe pump (Stafford, U.S.A.).The full scan was established in the mass range m/z 100 to 600 with parameters of spray voltage of 3.2 kV, capillary temperature of 275 °C, S-lens 100, and sheath gas of 5.For real produced water samples characterization, all spectra were processed using Composer software (Sierra Analytics, Modesto, U.S.A.) for molecular formula assignment.
Multivariate Quantification.ESI(−)-Orbitrap MS data obtained in .rawfiles were converted to .cdf and imported into Matlab R2020a (MathWorks, Natick, MA, U.S.A.), where multivariate models were developed.A data matrix containing the intensity values, which are the independent variables, was constructed and is called matrix X.The rows of matrix X correspond to the 36 mixtures of NAs, and the columns correspond to m/z.
A vector containing the respective NAs concentrations of the simulated produced water solution was constructed and named y, the dependent variable.The vector y has a number of rows equal to the number of samples in the X matrix.To build the calibration models, PLS was used combined with variable selection by OPS.All calculations were performed in Matlab R2020a using the NewOPS package and personal algorithms. 34he use of PLS regression offers several significant advantages, particularly when dealing with complex matrices that contain various interfering substances.One of the foremost benefits of PLS regression is its ability to handle multicollinearity among predictor variables, which is common in spectral data, where many m/z values are highly correlated.PLS regression can efficiently extract relevant information from this correlated data by projecting it onto a new set of orthogonal components that maximize the covariance between the predictors and the response variable. 39oreover, PLS regression excels in modeling data, even in the presence of interferents.By leveraging the entire spectral profile, PLS can differentiate between the signals of the target analytes and those arising from interfering substances.This capability is particularly valuable in our scenario, where samples often contain a complex mixture of compounds.The robust nature of PLS allows it to account for and correct these matrix effects, ensuring that the quantification of the target analytes remains accurate and reliable. 39,40nother key advantage of PLS regression is its capacity to handle large data sets with many variables and observations, making it suitable for high-dimensional data typical in mass spectrometry.PLS models can incorporate all relevant spectral information, providing a comprehensive analysis that captures the variability in the data due to both the analytes and the background matrix. 40ean centering or autoscaling was used in all calculations to preprocess the X and y variables.Several types of normalizations were applied to the lines of the X matrix to find the best prediction model.The cross-validation method with random removal of three samples was used to perform this optimization and selection of the model's latent variables (LV).Furthermore, in selecting variables, the OPS method was applied by using a window of 20 variables with an increment of 5, with 100% of the variables being tested.
The 36 sample set was divided into a calibration set (24 samples) and a prediction set (12 samples), separated randomly, with respect to the range modeled in both sets.The quality of the models was evaluated by the square root of the mean squared error (RMSE) and the correlation coefficient (R), which can be calculated using eqs 1 and 2, respectively. (2) In the equations, y i and y i are measured and predicted values, respectively, of a sample i. Variables y and y are the averages of the measured and predicted values for a set of n samples.The RMSE and R are called RMSEC and Rc for calibration, RMSECV and Rcv for cross-validation, and RMSEP and Rp for prediction, respectively.LC-Orbitrap MS Analysis.The mass spectrometry analyses were performed using an HPLC-UV 1220 Infinity II (Agilent Technologies) coupled with a Q-Exactive Orbitrap instrument and a HESI source.The column used in this study was a Poroshell C18 column (4.6 mm × 100 mm × 2.7 μm).All samples were analyzed by using a gradient elution program.The binary mobile phases comprised A (water with 0.1% acetic acid) and B (methanol).The gradient elution started at 5% (B), kept constant for 1 min, linearly increased to 90% (B) in 8 min, followed by increasing to 100% (B) in 5 min, and kept constant for 16 min at 100% (B).The eluent was restored to the initial conditions in 6 min (36 min run).The flow rate was set at 0.5 mL min −1 .The injection volume was 5 μL, and the column temperature was 40 °C.The ESI source conditions were established in the mass range m/z 100 to 600 with parameters of spray voltage 3.5 kV, capillary temperature 320 °C, S-lens 40, and sheath gas 5.
For the LC-MS quantification, the same ten carboxylic acid standards were used in the calibration curves at concentrations between 2. A class diagram was built to better understand the NAs in these samples.Figure 2 shows the samples's class distribution obtained from the ESI(−)-Orbitrap MS analysis.Only one sample presents an abundance of an O2-containing compound below 80%, with an abundance of an O4-containing compound close to 40%.These results indicated that all 12 samples showed similar species distribution along with a high abundance of O2 species.Also, these class distributions are similar to NAs reported before. 4,13,22,38igure S2 shows the DBE versus carbon number contour plots for NAs in the produced water samples.A similar distribution of compounds with DBE values between 1 to 10 and carbon numbers 5 to 20 were found.Only compounds with DBE 1 showed an extensive range of carbon numbers, with values up to 26.After the characterization of all NA extracts, a method for multivariate quantification was built.
Multivariate Quantification.As a first step to the method development for NA multivariate quantification, 36 simulated produced water samples were prepared using ten carboxylic acid standards in a concentration range of 0.5 to 40 mg L −1 , as described in the Methods and Materials.Each simulated produced water was extracted using the same extraction methodology applied to produced water and analyzed by ESI(−)-Orbitrap MS.
From the alignment of the 36 spectra of the NAs mixtures, the spectra were organized in the X matrix to construct the multivariate model.This matrix was aligned considering a resolution of 0.001, thus obtaining 77187 variables from m/z 100 to 400.The average mass spectrum obtained from the 36 mixtures of NAs is shown in Figure 3.
Notably, despite our initial intention to analyze the full mass range of m/z 100−600, meticulous examination revealed an absence of discernible signals in both the standard mixtures and real produced water samples beyond the m/z value of 400.This intriguing observation prompted a deliberate decision to constrain the mass range, restricting subsequent analyses to m/ z = 100−400.
Table 1 showcases the calculated parameters for the PLS-OPS model for quantifying NAs.The model was derived from the autoscaled X matrix, with intensities normalized by the total sum.The analysis of the results reveals that the RMSE values are notably low concerning the modeled range (0.5−40 mg L −1 ), underscoring the model's robustness and potential applicability in predicting NAs.The PLS-OPS model demonstrates a praiseworthy predictive performance, as evidenced by the calibration and prediction correlation coefficients (Rc and Rp, respectively), which exceed 0.95.Such high correlation coefficients indicate a strong linear relationship between the predicted and reference values, confirming the model's efficacy in accurately predicting NAs.Moreover, the relatively low RMSEs further attest to the model's accuracy and capability to   minimize prediction errors within the specified concentration range.Figure 4 displays the reference values, predicted values, and residuals for the 36 sample mixtures.Ideally, all points should align with the diagonal line depicted in Figure 3A; the closer a point is to this line, the more accurate is the prediction relative to the reference value.This visual representation underscores the accuracy of the predictions and highlights the relationship between the predicted and actual concentrations of NAs.
An overfitting model may not be advantageous, as it could be excessively tailored to the specific data set it was trained on, potentially resulting in more significant prediction errors when applied to new data sets.Therefore, striking a balance between the model fit and prediction errors for samples not included in the training phase is paramount.As observed, the random distribution of residuals suggests the absence of systematic errors in the model.This lack of pattern in the residuals indicates that the model can handle bias and uncorrected trends, which could compromise its predictive accuracy.
The implications of these findings are 2-fold.First, the PLS-OPS model exhibits a commendable level of accuracy in predicting NAs, as evidenced by the proximity of predicted values to the reference line in Figure 4A.Second, the random distribution of residuals corroborates the model's reliability and generalizability to new data.It highlights the model's ability to maintain its predictive performance without being overly specialized to the training data set, making it a versatile tool for predicting NAs concentrations across various contexts.The absence of systematic errors further enhances the model's utility, providing confidence in its predictions and underscoring its potential for widespread application in analytical chemistry.
The model's accuracy was further evaluated through its application to 12 real-produced water samples of produced water.In Figure 5, the spectral analysis juxtaposes a spectrum from the modeled mixture set with the spectrum of actual, realproduced water samples.A noteworthy observation from the spectral comparison is the substantially higher number of signals in the produced water sample than in the modeled artificial sample.This disparity underscores the complexity of real-world samples and the plethora of interfering substances that may be present.
An immense variety of peaks in the produced water spectrum indicates a diverse range of compounds, potentially affecting the prediction of NAs.Despite the increased complexity, the strength of the PLS-OPS model lies in its ability to discern the relevant patterns associated with NAs amidst a background of numerous other signals.Table 2 details the predicted values for NAs in real-produced water samples using the PLS-OPS model, offering insight into the model's capacity to cope with complex, real-world analytical challenges.
In the pursuit of accurate quantification of NAs in real produced water samples, we employed a developed LC-MS methodology emanating from our laboratory's innovative analytical techniques.LC-MS, renowned for its unparalleled accuracy and precision, is the benchmark for evaluating our multivariate PLS-OPS model with ESI(−)-Orbitrap MS.
Upon dilution of produced water samples by a factor of 20, essential for accommodating the ESI(−)-Orbitrap MS analysis, the imperative adjustment of this dilution was meticulously accounted for in comparison with LC-MS data.This careful recalibration laid the groundwork for a robust comparative analysis, encapsulating the outcomes in Table 2.The data vividly illustrate the congruence between the corrected PLS-

Figure 1 .
Figure 1.Chemical structures of ten carboxylic acid standards used in the simulated produced water with a concentration of 0.5 to 40 mg L −1 .
5 and 40 mg•L −1 .■ RESULTS AND DISCUSSION Characterization by ESI-Orbitrap MS.Before the method development for multivariate quantification of NAs from produced water, 12 produced water used in this study was extracted by liquid−liquid extraction and characterized by ESI(−)-Orbitrap MS. Figure S1 shows the 12 ESI(−)-Orbitrap MS spectra obtained in the m/z 100 to 600 range.For all spectra, the NAs compound peaks are most abundant between m/z 100 to 300, with high intensity in values below m/z 200.

Figure 2 .
Figure 2. Class distribution of species obtained from the ESI(−)-Orbitrap MS analysis of the real produced water samples.

Figure 4 .
Figure 4. (A) Scatter plot of the reference NAs versus predicted NAs for calibration (•) and prediction ( ■ ) sets, demonstrating the model's accuracy in estimating NAs concentrations.(B) Residuals plot for both calibration (•) and prediction ( ■ ) sets.

Figure 5 .
Figure 5. Mass spectra comparison between artificial and real produced water samples obtained by ESI(−)-Orbitrap MS.

a
PRED1 = Predicted values in mg L −1 by PLS-OPS; PRED2 = Predicted values corrected by dilution factor (20×); LC-MS = liquid chromatography−mass spectrometry used as the reference method.

Table 1 .
Calculated Parameters for the PLS-OPS Model for NAs (mg L −1 ) Prediction a

Table 2 .
Comparison of Predicted NAs by PLS-OPS Model with Reference Values a