¹H NMR and Chemometric Methods to Estimate the Octane Number in Brazilian C Gasolines

Brazilian type C gasoline is sold in grades that differ mainly in octane number, a quality parameter that directly affects the price. Intermediate formulations may not be easily distinguished from conform samples because of their similar visual appearance and physicochemical properties. The use of anhydrous ethanol as an additive also influences the octane values of the product. In this context, the present study describes the use of ¹H nuclear magnetic resonance spectroscopy (¹H NMR) associated with chemometrics in the characterization and distinction of gasolines with different octane numbers. Conform samples of common and premium gasolines and common-premium blends were used. The NMR-PCA (principal component analysis) and NMR-SIMCA (soft independent modelling of class analogies) models showed a good correlation with the values determined by the standard method. The octane values predicted by the NMR-PLS (partial least squares) model also correlated well with the values determined by the standard method, with a root mean square error of prediction (RMSEP) of 0.50.


Introduction
Gasoline is a fossil fuel consisting of a complex mixture of liquid hydrocarbons with four to twelve carbon atoms and boiling points ranging from 30 to 225 °C. 1 The mixture includes paraffins, isoparaffins, olefins, naphthenes and aromatic hydrocarbons (PIONA), as well as compounds containing nitrogen, sulfur and oxygen. The PIONA ratio affects properties such as the resistance of the fuel to compression, a parameter known as octane number or octane rating. This parameter, related to complete fuel combustion, is directly associated with the quality parameters of gasoline and consequently with its price. [2][3][4] In Brazil, gasolines are sold at gas stations as common, premium and podium. These gasolines have different specifications for quality parameters, anhydrous fuel ethanol content and anti-knock index (AKI). The National Agency of Petroleum, Natural Gas, and Biofuels (ANP) 5 uses the AKI to represent the octane number, and the minimum values for common and premium Brazilian automotive gasolines are 87 and 91 units, respectively. This is important because the various technologies employed in Otto cycle engines have different fuel quality requirements, which directly affect their performance. Engines with different compression ratios and combustion chamber volumes therefore require gasolines with different anti-knock qualities. In order to increase the anti-knock capacity of gasoline and to reduce vehicular emissions, additives such as aliphatic ethers and alcohols are used. The main additives are ethanol, propanol and butanol isomers, methyl tert-butyl ether (MTBE) and ethyl tert-butyl ether (ETBE). 6 In Brazil, anhydrous ethanol is used, and the proportion added to gasolines is established by the Inter-Ministerial Sugar and Alcohol Council (CIMA). The values vary according to the Brazilian government's strategies and projections for sugarcane production.
Currently, the ethanol contents  16 ABNT NBR 15441, 17 ASTM D 6277) 18 and anhydrous ethanol (ABNT NBR 13992) 19 contents. However, it is not always possible to identify adulterated gasoline by these tests because many solvents or gasoline blends exhibit physicochemical properties similar to gasoline.
As alternatives to the methods established by the regulatory agency, techniques such as near-infrared spectroscopy (NIR), 20 Fourier transform (FT)-Raman, 21 nuclear magnetic resonance (NMR), 22-26 and chromatographic methods 27,28 have been employed to assess the quality of fuels, expanding the analytical capabilities for products traded worldwide. [20][21][22][23][24][25][26][27][28] Determining the AKI value of a gasoline under ANP regulations 5 requires time-consuming tests (1 h per sample), large quantities of sample (1 L), reference standards for octane number (isooctane and n-heptane) and combustion engines dedicated only to this type of analysis. 20 Therefore, the development of rapid analytical protocols that provide information about the octane number, and about how chemical profiles and their alterations can compromise this quality parameter, is of great importance. 20,25 In this context, NMR spectroscopy has great applicability since the technique does not require any physical separation or pretreatment. 29,30 Moreover, ¹H NMR measurements are rapid and can be easily automated, allowing the analysis of a large number of samples in a short period of time. NMR also allows qualitative and quantitative analyses in a single experiment, since the area of a signal is directly proportional to the number of nuclei responsible for it.
Indeed, because of the complexity and strong spectral overlap inherent in the complex mixture of substances that characterizes gasoline, few isolated compounds (benzene and oxygenates) can be individually identified and quantified by the analysis of a single ¹H NMR spectrum. [31][32][33] Chemometric analysis is indicated to assist in the interpretation and recognition of patterns that are not evident by visual inspection of the NMR spectra. Among the major chemometric tools used for fuel NMR data are principal component analysis (PCA), hierarchical cluster analysis (HCA), soft independent modeling of class analogies (SIMCA), and partial least squares (PLS) regression. 1,28,34,35 PCA, an unsupervised analysis, generates information on possible sample groupings and indicates which spectral variables are determinant for discrimination, through the decomposition of experimental information organized in data matrices. The application of PCA in NMR studies of fuels goes beyond exploratory analysis. 36 It can be used as a statistical basis for supervised methods such as SIMCA modeling, in which each class is modeled by a multidimensional space used to classify new samples. The limits of each class are determined by critical values of variance, typical of each model, usually represented by hyperboxes or ellipses. 36,37 The multivariate study of gasoline NMR data also allows the implementation of PLS regression models, useful in estimating (or predicting) physicochemical properties (e.g., octane number) of a new sample set. During the creation of the PLS model, the NMR spectral information of a given group is organized into an X matrix, while measurements resulting from a standard test (e.g., AKI determined by physicochemical tests) are organized into a Y matrix. Multivariate regression of the X and Y matrices by the PLS algorithm yields a correlation that can be used to estimate the properties of new samples. 28
The present work describes the use of ¹H NMR associated with chemometrics as an analytical tool for detecting non-conformities in Brazilian premium type C automotive gasoline (PG) intentionally adulterated with Brazilian common type C automotive gasoline (CG). Conform samples of the gasolines evaluated in this study had their chemical profiles characterized by ¹H NMR, and these profiles were used in the PCA to discriminate conform from non-conform samples. Based on this information, a SIMCA classification of all samples was performed with the class model created from the ¹H NMR data of conform premium gasolines. An NMR-PLS model was applied to predict the AKI of 28 gasoline samples, and the predictions were compared with the results of the physicochemical analyses.

Samples and physicochemical analyses
Eight premium (PG) and eight common (CG) gasoline samples collected directly from gas stations in São Paulo State, Brazil, were used in this work. Samples were analyzed for color, appearance, anhydrous ethanol content, distillation, AKI ((motor octane number (MON) + research octane number (RON))/2), benzene, aromatic, olefin and saturated hydrocarbon contents, and the results indicated that they all conform to ANP specifications, 5 as described in Table 1. All analyses were performed on a portable Petrospec IR, GS 1000 PLUS, with a spectral range of 400-4000 cm⁻¹. The GS-1000 Multi-Function Analyzer uses a mid-infrared spectroscopic (IR) analysis technique to differentiate and quantify the individual components in a fuel sample. 38 Premium-common blends (M1-M9) were prepared from these commercial samples and used to simulate adulteration of PG with CG gasoline (Table 2). The CG and PG samples used to compose these mixtures were chosen randomly. All blends were prepared at Laboratório de Combustíveis e Biocombustíveis of Centro de Caracterização e Desenvolvimento de Materiais of Universidade Federal de São Carlos.
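As a minimal illustration of the AKI definition used here, the arithmetic can be sketched in Python. The function names, the dictionary of minimum values and the MON/RON pair are hypothetical and serve only this example; the minima of 87 and 91 units are those cited in the Introduction for common and premium gasolines.

```python
# Minimum AKI values cited in the text for Brazilian type C gasolines (ANP).
ANP_MIN_AKI = {"common": 87.0, "premium": 91.0}


def anti_knock_index(mon: float, ron: float) -> float:
    """Return the AKI as the arithmetic mean of MON and RON."""
    return (mon + ron) / 2.0


def meets_minimum(aki: float, grade: str) -> bool:
    """Check an AKI value against the minimum for the given grade."""
    return aki >= ANP_MIN_AKI[grade]


# Hypothetical MON/RON pair, chosen only to illustrate the arithmetic.
example_aki = anti_knock_index(mon=84.0, ron=96.0)
print(example_aki, meets_minimum(example_aki, "common"),
      meets_minimum(example_aki, "premium"))
```

Because only minimum values are specified, a sample can pass the check for the common grade while failing the one for the premium grade, as in this hypothetical case.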

¹H NMR analyses
The ¹H NMR experiments were performed at 25 °C on a Bruker Avance III 500 spectrometer operating at 11.75 T, observing ¹H at 500.13 MHz. The spectrometer was equipped with a 5 mm triple broadband inverse detection (TBI) four-channel (¹H, ²H, ¹³C and X-nucleus) probe. For each analysis, 100 μL of gasoline was dissolved in 500 μL of deuterated chloroform (CDCl3, CIL, Tewksbury, USA) containing tetramethylsilane (TMS), used as the internal standard. The spectra were acquired using a single excitation pulse sequence (zg, Bruker), with 32 scans, an acquisition time of 3.27 s, 64k time domain points distributed over a spectral width of 20 ppm and a recycle delay of 2 s. Spectra were processed with 64k points, applying an exponential function to the FID (free induction decay) with a line broadening factor of 0.3 Hz. Baseline and phase were automatically corrected by the software TopSpin (Bruker, Billerica, MA, USA). Spectra were obtained in triplicate, resulting in 43 spectra analyzed (8 CG, 8 PG and M1-M9 in triplicate).
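The reported acquisition parameters can be cross-checked with a short calculation. The sketch below assumes the usual Bruker convention in which the time domain size counts real plus imaginary points, so that the acquisition time equals TD / (2 × spectral width in Hz). This convention, the variable names and the reading that only the nine blends were run in triplicate are assumptions, not details stated in the text.

```python
# Acquisition parameters taken from the text.
SPECTROMETER_MHZ = 500.13        # 1H observation frequency
SPECTRAL_WIDTH_PPM = 20.0
TIME_DOMAIN_POINTS = 64 * 1024   # "64k"

# Spectral width in Hz: ppm multiplied by the observation frequency in MHz.
spectral_width_hz = SPECTRAL_WIDTH_PPM * SPECTROMETER_MHZ

# Assumed convention: TD counts real + imaginary points, so AQ = TD / (2 * SW).
acquisition_time_s = TIME_DOMAIN_POINTS / (2.0 * spectral_width_hz)

# Number of spectra if only the nine blends are measured in triplicate
# (one interpretation of the reported total of 43).
n_spectra = 8 + 8 + 9 * 3

print(f"SW = {spectral_width_hz:.1f} Hz, "
      f"AQ = {acquisition_time_s:.2f} s, spectra = {n_spectra}")
```

Under these assumptions the acquisition time comes out at roughly 3.28 s, close to the reported 3.27 s; the small difference would be expected if the actual spectral width was slightly larger than the rounded 20 ppm.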

Statistical analyses
The chemometric analyses (PCA, SIMCA, and PLS) were performed in AMIX software (Bruker BioSpin). The data matrix was obtained by bucketing the ¹H NMR spectra of the CG, PG and CG-PG blend samples. Simple rectangular buckets were applied to the signals between δ 8.0 and 0.0, with integration by sum of intensities. To keep most of the spectral information, the bucket width was set to 0.01 ppm. The spectral regions of the solvent and reference signals (CDCl3, δ 7.5-7.0; TMS, δ 0.5-0.0) were excluded from the data matrix. The exploratory analysis by PCA was performed with a non-scaled, mean-centered data matrix and a confidence interval of 95.0%. The SIMCA classification used a model based on the information from PG gasoline samples, not used in the PCA. Eight PCs (91% of variance explained) were used to model the class. The model was validated by full cross-validation, using all samples and a confidence interval of 95.0%. For the PLS model, all 43 spectra from gasoline samples previously evaluated by physicochemical parameters were used. The training set comprised 30 spectra, divided into 6 CG, 18 M1-M9 and 6 PG. The model was validated by leave-one-out cross-validation. Seven latent variables were used, the number that gave the smallest root mean squared error of cross-validation (RMSECV), whose value was 0.13892. The prediction set comprised 28 spectra, divided into 5 CG, 18 M1-M9 and 5 PG not used in the training set.
Table 3 shows the physicochemical parameters obtained for CG, PG and M1-M9 blends. Two trends related to the octane variations (AKI) and the chemical composition of the samples could be identified.
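The bucketing and mean-centering steps used to build the data matrix can be sketched in plain Python. This is only an illustration under stated assumptions, not the AMIX implementation: the function names, the rule of dropping a bucket when its center falls inside an excluded region, and the toy spectra are all hypothetical.

```python
def bucket_spectrum(ppm, intensity, lo=0.0, hi=8.0, width=0.01,
                    excluded=((0.0, 0.5), (7.0, 7.5))):
    """Sum intensities into equal-width buckets and drop excluded regions.

    Returns (centers, sums) for the retained buckets, by increasing ppm.
    """
    n_buckets = round((hi - lo) / width)
    sums = [0.0] * n_buckets
    for p, y in zip(ppm, intensity):
        if lo <= p < hi:
            k = min(int((p - lo) / width), n_buckets - 1)
            sums[k] += y
    centers, kept = [], []
    for k, total in enumerate(sums):
        center = lo + (k + 0.5) * width
        # Assumed rule: a bucket is excluded when its center lies in a
        # solvent or reference region.
        if any(a <= center < b for a, b in excluded):
            continue
        centers.append(round(center, 4))
        kept.append(total)
    return centers, kept


def mean_center(rows):
    """Subtract the column mean from every row (one row per spectrum)."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


# Toy spectra (hypothetical): four data points each, one in the CDCl3 region.
ppm = [7.26, 2.305, 1.236, 1.234]
spec_a = [50.0, 2.0, 3.0, 1.0]
spec_b = [50.0, 4.0, 1.0, 1.0]

centers, row_a = bucket_spectrum(ppm, spec_a)
_, row_b = bucket_spectrum(ppm, spec_b)
X = mean_center([row_a, row_b])
print(len(centers), sum(row_a), sum(row_b))
```

With a 0.01 ppm width over δ 8.0-0.0 and the two excluded regions, each spectrum contributes 700 variables to the data matrix under this exclusion rule. No scaling is applied after mean-centering, in line with the non-scaled PCA described in the text.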

Results and Discussion
The progressive increase in the percentage of premium gasoline in the blends resulted in a decrease in aromatic hydrocarbon content, while the saturated hydrocarbon content increased. Both variations culminated in the progressive increase of the AKI across the blends, from 88.5 units for M1 to 96.4 units for M9. Thus, a directly proportional relationship between saturated hydrocarbons and AKI was identified, while an inverse relationship was observed for aromatic hydrocarbons (Figure 1). This result was corroborated by the AKI values of the individual CG and PG samples. The CG group presented the lowest AKI (87.6) and saturated hydrocarbon content (53.0%), but the highest percentage of aromatic hydrocarbons (14.9%). On the other hand, the PG group had the highest AKI (97.6) and saturated hydrocarbon percentage (65.5%), but the lowest aromatic hydrocarbon content (2.8%).
Considering the results obtained according to ANP regulations, the blends M4 to M9, which contain up to 60% of CG, would be considered conform since they have AKI higher than 91 units. The difficulty in identifying gasolines that are non-conform with respect to AKI through the IR test lies in the fact that only minimum values are set for AKI. Therefore, in order to develop a systematic product conformity analysis for the AKI parameter, the spectral profiles of the CG and PG gasoline samples were characterized by ¹H NMR. Visually, the ¹H NMR spectra of CG and PG are readily distinguished, owing to the greater complexity of the CG chemical profile (Figure 2). In both spectra, the hydrogen signals were assigned with support from the literature. 25,32,33 In the ¹H NMR spectra of the CG and PG samples, the signals between δ 2.5 and 2.0 indicated lower contents of α- and β-benzylic hydrogens in the PG samples. The signals at δ 1.27, 0.95, 0.92 and 0.90 were assigned to the n-octane, 39 used as maximum reference standard (100 units) of AKI. The triplet and quartet signals at δ 1.2 and 3.68 (J 7.0 Hz), respectively, characterized the ethanol added as anti-knock additive in Brazilian gasolines. 1 Signals for olefin hydrogens (δ 6.5-4.5) demonstrated higher quantities of these compounds in the CG gasoline, as indicated by the comparison of the signal areas in the ¹H NMR spectra (Figure 2). This fact may be related to the different origins of the petroleum or to specific chemical processes for each type. The vinyl hydrogen signals were assigned as δ 5.1-4.8 and 6.5-5.9 for monosubstituted (RHC=CHH, RHC=CHH, respectively); δ 4.8-4.5 and 5.8-5.3 for geminal and vicinal disubstituted, respectively (RRC=CHH, RHC=CHR); and δ 5.3-5.1 for trisubstituted (RHC=CRR) alkenes. The signals from aromatic hydrocarbon hydrogens were observed between δ 8.5-6.8. The most representative compounds characterized were benzene, by the singlet at δ 7.35, and xylenes (ortho- and meta-dimethylbenzenes), by the signals between δ 7.1-6.9. These assignments agree with Burri et al., 32 Meusinger, 30 and Sun and Wang. 40 The signal areas of the aliphatic (δ 1.8-0.4) and aromatic (δ 8.0-6.5) hydrogens from the CG, PG and M1, M5 and M9 samples (Figure 3) supported the relationships between AKI, aromatic and saturated contents previously determined by IR analyses (Figure 1).
The signal areas of the aromatic hydrogens (δ 8.0-6.5) from the CG, PG and M1-M9 samples (Figure 3) showed a linear decrease in aromatic compounds with increasing premium gasoline content. This result corroborated the relationship between AKI and aromatic content previously determined by IR analyses correlated with ANP tests (Figure 1). On the other hand, the hydrogen signal areas from aliphatic groups did not show a direct relationship between the different gasolines. This is related to the overlap of hydrogen signals from aliphatic hydrocarbons with those of −CH, −CH2 and −CH3 groups in the β and γ positions of aromatic rings, observed at δ 1.8-0.4. 40 Therefore, increasing the PG content in the blends (M1 to M9) results in both a decrease in the signal areas of β and γ hydrogens from aromatics and a concomitant increase in the signal areas of aliphatic hydrocarbon hydrogens. In order to reinforce these observations, PCA of the NMR data from the CG, PG and M1-M9 samples was performed. The results evidenced the distinction between these different gasolines (Figure 4).
The first two PCs explained 86.6% of the cumulative experimental variance. Samples with lower AKI values (CG, M1 and M2) were discriminated at negative PC1 scores. The progressive increase in the percentage of PG, and thus in the AKI of the mixtures, shifted the samples toward positive PC1 scores. In this way, the gasolines with higher AKI values (M7-M9 and PG) were discriminated at positive scores of PC1 and PC2. These positive PC1 scores, responsible for discriminating the samples of higher AKI, were strongly influenced by the hydrogen signals at δ 0.92 and 0.90 according to the PC1 × PC2 loading plot (Figure 5). Compounds like n-octane, identified in Brazilian gasoline, 41 present methyl hydrogens with chemical shifts in this range. 39 On the other hand, the separation trend of the samples with lower AKI (CG, M1 and M2), observed at negative PC1, was strongly influenced by the signals from benzylic hydrogens at δ 2.39 and 2.38. Only one replicate each of M5 and M6 could not be clearly distinguished from each other by the NMR-PCA model. The SIMCA PG model (Figure 6) confirmed the separation trends observed in PCA. In the model, which was represented by the hyperbox in the lower left corner, only samples of premium gasoline were classified, with no type I errors (sample not included in its own class) or type II errors (sample included in the wrong class) occurring.
The PG class model limits were determined according to Hotelling's T-squared distribution (T²), which is understood as a statistical generalization of Student's t-analysis. The larger the difference between the sample information and the classifier (hyperbox of the class model), the higher the T². 36,37 CG samples, visually more distant from the center of the model, presented a T² value of 360.5. As the percentages of premium gasoline in the M mixtures increased, the T² values
In order to complement the information about adulterations of premium gasolines observed in SIMCA, an NMR-PLS model was applied for predicting the AKI of gasoline samples (Figure 7).
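One common way of computing a Hotelling's T² value for a sample from its PCA scores is to sum the squared scores, each divided by the variance of the training-class scores on that component. The sketch below assumes this formulation; whether AMIX uses exactly this expression is not stated in the text, and the function names and all numbers are hypothetical.

```python
from statistics import variance


def score_variances(training_scores):
    """Sample variance of the training-class scores on each component."""
    return [variance(column) for column in zip(*training_scores)]


def hotelling_t2(sample_scores, variances):
    """T2 = sum over components of (score squared / score variance)."""
    return sum(t * t / v for t, v in zip(sample_scores, variances))


# Hypothetical scores on two components for three training-class samples.
training = [[-1.0, -3.0], [0.0, 0.0], [1.0, 3.0]]
variances = score_variances(training)

# Hypothetical new samples: one close to the class center, one far from it.
near = hotelling_t2([0.5, 1.0], variances)
far = hotelling_t2([4.0, 9.0], variances)
print(near, far)
```

In this formulation a sample far from the class center in score space receives a larger T², which matches the qualitative behavior described for the CG samples relative to the PG class model.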
The predictive capacity of the model was evaluated by the root mean square error of prediction (RMSEP), whose value was 0.50 for a prediction set of 28 samples. The low RMSEP value was interpreted as a statistical measure of the good performance of the NMR-PLS model in estimating the AKI of new gasolines (not used in the calibration set), which presented good correlation with the AKI values experimentally determined by the standard ANP method. The variance explained by the NMR-PLS model was 98.7% for the 8 latent variables used. The error (e) in the prediction of the AKI values for the whole sample set was lower than 2.0%; the greatest differences were 1.1 and 0.90%, for M6 and M7, respectively, when compared to the IR analysis.
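The two figures of merit used here, RMSEP and the percentage error per sample, can be written out as a short sketch. The function names are hypothetical, the percentage error is assumed to be taken relative to the reference value, and the AKI values are invented for illustration; they are not data from this study.

```python
import math


def rmsep(reference, predicted):
    """Root mean square error of prediction over a prediction set."""
    n = len(reference)
    squared = sum((p - r) ** 2 for r, p in zip(reference, predicted))
    return math.sqrt(squared / n)


def relative_error_percent(reference, predicted):
    """Absolute prediction error as a percentage of the reference value."""
    return [abs(p - r) / r * 100.0 for r, p in zip(reference, predicted)]


# Hypothetical reference and predicted AKI values (illustration only).
reference_aki = [90.0, 92.0, 94.0]
predicted_aki = [90.5, 91.5, 94.5]

print(rmsep(reference_aki, predicted_aki))
print(relative_error_percent(reference_aki, predicted_aki))
```

RMSEP is expressed in the same units as the property, so a value of 0.50 corresponds to a typical deviation of about half an AKI unit between predicted and reference values.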

Conclusions
In this work, the use of ¹H NMR combined with chemometric data treatment (PCA, SIMCA, and PLS) was described for the fast and unequivocal detection of mixtures of gasolines with different octane values. The discrimination trend between gasolines with different anti-knock index (AKI), observed in the PCA, was associated with the variations in aromatic and saturated hydrocarbon contents. The SIMCA model classified the analyzed gasolines and their mixtures with 100% accuracy. The NMR-PLS model proved to be adequate in estimating the AKI values of CG, PG and CG/PG blends in different percentages. The good performance of the described method supports the use of NMR as a confirmatory technique for quality control of gasoline in a single experiment, in contrast to the various tests currently adopted by oversight bodies and market agents.