(2017). Group Contribution Approach To Predict the Refractive Index of Pure Organic Components in Ambient Organic Aerosol. (17).

Abstract: We introduce and assess a group contribution scheme that predicts the refractive index (RI) at λ = 589 nm of nonabsorbing components common to secondary organic aerosols from their molecular formula and chemical functionality. The group contribution method is based on representative values of the ratio of molecular polarizability to molar volume for different functional groups, derived from data for a training set of 234 compounds. The training set consists of 106 nonaromatic compounds common to atmospheric aerosols, 64 aromatic compounds, and 64 compounds containing halogens; a separate group contribution model is provided for each of these three classes of compound. The resulting predictive model reproduces the RIs of compounds in the training set with mean errors of ±0.58%, ±0.36%, and ±0.30% for the nonaromatic, aromatic, and halogen-containing compounds, respectively. We then evaluate predictions from the group contribution model for compounds with no previously reported RI, comparing them with predictions from previous treatments and with measurements from single aerosol particle experiments. We illustrate how such comparisons can be used to further refine the predictive model. We suggest that the accuracy of this model is already sufficient to better constrain the optical properties of organic aerosol of known composition.


I. INTRODUCTION
Atmospheric aerosols are important in determining the global radiative balance through scattering and absorbing solar radiation. 1−8 Organic material partitioned into the condensed phase 6,9 of accumulation and coarse mode primary and secondary aerosol particles effectively scatters and absorbs incoming solar radiation and outgoing terrestrial radiation, and it also influences the lifetime and albedo of clouds. 8 The chemical composition of particles determines their complex refractive index (RI), which governs how efficiently they interact with light of a given wavelength. This in turn affects climate, surface irradiances, and actinic flux, 10 as well as ozone formation through the reduction in surface UV radiation. 11,12 Atmospheric aerosols commonly contain both inorganic and organic compounds, with compositions that vary greatly in both space and time. 13,14 The RIs of many inorganic aerosol components are reasonably well established. 15 In contrast, the RIs of organic aerosol components are much less completely characterized.
Current quantification of the chemical composition of the organic aerosol fraction is limited by its chemical complexity: 10 hundreds (even thousands) of individual chemical compounds are simultaneously present in the aerosol phase, a far more complex situation than for inorganic species. Until recently, the number of direct measurements of the RI of ambient aerosol was also rather limited. 16−21 It is extremely difficult to infer which compounds are present in ambient aerosol because of the large number of chemical precursors and the transformation of these compounds through subsequent oxidative, photochemical, or other reaction pathways. To deal with this complexity, a number of methods have been developed to treat the properties of atmospheric organic aerosols, including gas-particle partitioning and chemical aging; these methods do not resolve individual chemical compounds but instead account for more general changes in observables such as polarity, volatility, carbon number, and O/C elemental ratio. 2−4,9,22−25 Continued improvements in both measurement and modeling should improve the accuracy with which the RIs of organic aerosol are described. However, a predictive framework that quantifies the optical properties of organic aerosol from its source and composition, and from changes due to in situ chemical processing, would be beneficial.
Given the chemical complexity of organic aerosol, it is challenging to predict fundamental physicochemical properties. This has led to the development of quantitative structure−property relationships (QSPR). This approach is most widely applied to predict vapor pressures in aerosol/gas-phase partitioning models but has also been used to a limited extent to predict the RIs of organic components germane to atmospheric aerosols. 6,26−31 For example, Redmond and Thompson 32 developed and evaluated a QSPR model for predicting the midvisible RI of organic aerosol from molecular formula. They also assessed the applicability of the predictions to organic aerosols generated in chamber experiments and to models of organic aerosol aging. They assumed a linear relationship between the RI of an organic component, $n$, and the degree of unsaturation ($\mu$), molecular polarizability ($\alpha$), and ratio of the density to molar mass of the component, $\rho_m/M$:

$$ n = a_0 + a_1\mu + a_2\alpha + a_3\frac{\rho_m}{M} \tag{1} $$

where $a_0$ to $a_3$ are fitted coefficients. There is no physical basis for assuming a linear correlation between RI and $\mu$, $\alpha$, and $\rho_m/M$; indeed, constraining the relationship in this way could lead to systematic inaccuracies in RI predictions. The molar refraction mixing rule has been used alongside bulk aqueous measurements and parametrizations of aqueous solution density to predict the RI beyond the solubility limit. 33 However, to the best of our knowledge, a robust and reliable approach for predicting the RIs of organic components is not currently available.
A plausible approach to representing the large number of different organic compounds present in ambient aerosol with a manageable degree of model complexity is to consider the contributions of organic functional groups instead of individual compounds. Rather than treating a compound as a combination of elements, the compound is represented as a combination of functional groups. Further, molecular anisotropy can lead to significant intermolecular interactions (H bonding, van der Waals forces, etc.) between components in complex internal aerosol mixtures, and these can be accounted for by introducing parameters that represent the interactions between different groups. 31,34−36 The group contribution (GC) approach has been widely used in thermodynamic predictions, for example in the well-known UNIFAC model. 37−40 UNIFAC divides a molecule into appropriate functional groups and models it as a "solution of functional groups" from which the physicochemical properties of organic molecules can be determined. This representation has been shown to predict the real behavior of aqueous−organic and organic mixtures with adequate accuracy in many instances. 37 In this publication, we develop and evaluate a GC-based method to predict the real part of the RIs of organic components germane to atmospheric aerosols; we do not consider the imaginary part here. Unlike previous models that provided parametrizations of RI based on simple metrics measurable for ambient aerosol, 32,41 this model requires molecular structures to predict organic compound RIs. Our purpose in this paper is to provide an accurate framework for estimating the RIs of pure organic components using approaches similar to those used in thermodynamic models of, for example, hygroscopic growth. Reduced complexity models based on physically justifiable and validated simplifications will be the subject of future work.
We expect that the method may already be useful for estimating the relative humidity (RH) and compositional dependence of the RIs of organic aerosols in well-defined laboratory experiments and in models of organic aerosol aging, using well-established RI mixing rules. 33,42 Motivated by deficiencies in the simulations and models of previous studies, the model described here considers the role of molecular anisotropy and the significant intermolecular interactions that might exist in complex internal aerosol mixtures. We will show that the method can provide accurate and reliable pure component organic RIs for further investigation of secondary organic aerosol growth and evolution.

II. METHOD
The Lorentz−Lorenz relation (also known as the Clausius−Mossotti relation) 43 relates the RI of a substance to the polarizability, $\alpha$ (expressed as a polarizability volume in cm³), and density, $\rho_m$, of the compound:

$$ \frac{n^2-1}{n^2+2} = \frac{4\pi N_A}{3}\,\frac{\alpha\,\rho_m}{M} \tag{2} $$

where $\rho_m$ is the mass density of the compound in g cm⁻³, $M$ is the molar mass of the compound in g mol⁻¹, $n$ is the real part of the RI, and $N_A$ is the Avogadro constant. We first introduce the molar volume, $V_m$, into this equation. $V_m$ can be expressed as

$$ V_m = \frac{M}{\rho_m} \tag{3} $$

Therefore, the ratio of polarizability to molar volume can be written as

$$ \frac{\alpha}{V_m} = \frac{3}{4\pi N_A}\,\frac{n^2-1}{n^2+2} \tag{4} $$

This ratio can be expressed in terms of the contributions of the different functional groups forming the compound, with a stoichiometric weighting for each: 41

$$ \frac{\alpha}{V_m} = \sum_i x_i\,\frac{\alpha_i}{V_{m,i}} \tag{5} $$

where $x_i$ and $\alpha_i/V_{m,i}$ are the mole fraction and $\alpha/V_m$ of each functional group $i$. Here, we provide a parametrization that leads to the calculation of $\alpha/V_m$ from this GC approach.
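As a numerical check on eqs 2−4, the short Python sketch below converts between $n$ and $\alpha/V_m$ in both directions and computes an RI directly from a tabulated polarizability. It assumes $\alpha$ is a polarizability volume in cm³ (the form tabulated in the CRC Handbook), $\rho_m$ in g cm⁻³, and $M$ in g mol⁻¹; the function names are hypothetical and are not taken from the paper.

```python
import math

N_A = 6.02214076e23                       # Avogadro constant, mol^-1
LL_PREFACTOR = 4.0 * math.pi * N_A / 3.0  # 4*pi*N_A/3 in eq 2 (CGS polarizability volumes)


def alpha_over_vm_from_n(n: float) -> float:
    """Eq 4: alpha/V_m = 3/(4 pi N_A) * (n^2 - 1)/(n^2 + 2), in mol (cm^3 / cm^3 mol^-1)."""
    return (n**2 - 1.0) / (n**2 + 2.0) / LL_PREFACTOR


def n_from_alpha_over_vm(alpha_over_vm: float) -> float:
    """Invert eq 4: with L = (4 pi N_A/3) * alpha/V_m, n = sqrt((1 + 2L)/(1 - L))."""
    lorentz = LL_PREFACTOR * alpha_over_vm
    if not 0.0 <= lorentz < 1.0:
        raise ValueError("(n^2 - 1)/(n^2 + 2) must lie in [0, 1)")
    return math.sqrt((1.0 + 2.0 * lorentz) / (1.0 - lorentz))


def n_from_polarizability(alpha_cm3: float, rho_g_cm3: float, molar_mass_g_mol: float) -> float:
    """Eqs 2-3: RI from a polarizability volume, mass density, and molar mass."""
    v_m = molar_mass_g_mol / rho_g_cm3  # eq 3: molar volume, cm^3 mol^-1
    return n_from_alpha_over_vm(alpha_cm3 / v_m)


# Illustrative input for water: alpha ~ 1.45e-24 cm^3, rho ~ 0.997 g cm^-3, M = 18.015 g mol^-1
print(round(n_from_polarizability(1.45e-24, 0.997, 18.015), 3))  # prints 1.327
```

The water example returns about 1.327, close to the familiar value of roughly 1.333 at 589 nm, which shows the order of accuracy to expect when RIs are computed directly from tabulated polarizabilities and densities.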
To constrain the contributions of individual functional groups in a GC model, literature RIs for a wide selection of compounds are used as a training set. The values of $\alpha_i/V_{m,i}$ for each functional group are then floated as fit parameters using the relationship obtained by combining eqs 2−5:

$$ n = \sqrt{\frac{1 + \frac{8\pi N_A}{3}\sum_i x_i\,\alpha_i/V_{m,i}}{1 - \frac{4\pi N_A}{3}\sum_i x_i\,\alpha_i/V_{m,i}}} \tag{6} $$

In this study, functional groups are designated as either main groups or subgroups. Main groups provide a coarse designation of the key functional groups present in the molecule but do not distinguish the number of hydrogen atoms bound to carbon atoms, e.g., CH$_n$ (alkyl main group) and CH$_n$CO (ketone main group). Subgroups resolve the structure of the molecule further by recognizing the number of H atoms within the main group. For example, CH$_n$ is subdivided into CH$_3$, CH$_2$, CH, and C alkyl chain groups, and CH$_n$CO is subdivided into CH$_3$CO and CH$_2$CO. This is the same classification of functional groups used in the Aerosol Inorganic−Organic Mixtures Functional groups Activity Coefficients (AIOMFAC-web) equilibrium composition model. 30 As in AIOMFAC-web, organic compounds are assigned subgroups representative of the functional groups contained in the organic molecule, and main groups are identified automatically. 30 With the GC approach, a limited number of main groups and subgroups can, in principle, represent an unlimited number of organic molecules. Thus, the GC method provides a practical yet scientifically robust approach to estimate properties for comparison with existing measurements, and it offers predictive capability for molecules for which no measurements are available.

Environmental Science & Technology

The GC approach has already been implemented in other studies, in models that predict the activity coefficients and pure component vapor pressures of atmospherically relevant organic compounds; here, it is used to determine pure component RIs. 37−40 In the fitting process, we initialize the values of $\alpha_i/V_{m,i}$ for the different functional groups with random values. From these initial values, we calculate the RIs of the compounds in the training set and compare them with the corresponding RIs reported in the literature. We then adjust $\alpha_i/V_{m,i}$ to minimize the difference between predicted and literature RIs using the simplex linear programming solving method with bounds on the variables, as implemented in the Excel Solver add-in. Three separate GC models were derived to treat atmospherically relevant compounds containing C, H, N, and O; aromatic compounds; and compounds containing halogens. These are referred to below as models 1−3, respectively. Models 1, 2, and 3 require 32, 27, and 25 parameters, respectively, to represent the functional groups (main groups and subgroups) of their 106, 64, and 64 training compounds. Details of the parameters for the various functional groups are given in Table S1. Clearly, new data for a wider range of compounds are desirable to improve the quality and reliability of the model; here, we provide the basis for this longer term development. The functional groups considered in fitting model 1 (see also Table S2) are the unsaturated carbon bond, OH, NH$_x$, NO$_2$, and −C(O)− groups.
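The forward GC calculation (eqs 5−6) and the parameter fit can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the group mole fraction $x_i$ is assumed to be the number of groups $i$ divided by the total number of groups in the molecule, and in place of the bounded Excel Solver fit described above it substitutes an ordinary linear least-squares fit in Lorentz−Lorenz space. Because eq 5 makes $(n^2-1)/(n^2+2)$ linear in the $\alpha_i/V_{m,i}$, this swap needs no iteration, but it minimizes residuals in $(n^2-1)/(n^2+2)$ rather than in $n$ and ignores bounds. All function names and the demo data are hypothetical.

```python
import math

N_A = 6.02214076e23                       # Avogadro constant, mol^-1
LL_PREFACTOR = 4.0 * math.pi * N_A / 3.0  # 4*pi*N_A/3 from eq 2


def mole_fractions(counts):
    """Group mole fractions x_i = nu_i / sum(nu) for one compound (assumed definition)."""
    total = float(sum(counts))
    return [c / total for c in counts]


def predict_n(counts, params):
    """Eqs 5-6: RI from group counts and per-group alpha_i/V_m,i values (mol)."""
    lorentz = LL_PREFACTOR * sum(x * p for x, p in zip(mole_fractions(counts), params))
    return math.sqrt((1.0 + 2.0 * lorentz) / (1.0 - lorentz))


def _solve(a, b):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    size = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * size
    for r in range(size - 1, -1, -1):
        x[r] = (m[r][size] - sum(m[r][c] * x[c] for c in range(r + 1, size))) / m[r][r]
    return x


def fit_group_parameters(counts_list, n_lit):
    """Least-squares alpha_i/V_m,i via the normal equations, in Lorentz-Lorenz space."""
    X = [mole_fractions(c) for c in counts_list]
    y = [(n * n - 1.0) / (n * n + 2.0) for n in n_lit]  # target = LL_PREFACTOR * sum x_i p_i
    g = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(g)] for i in range(g)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(g)]
    return [v / LL_PREFACTOR for v in _solve(xtx, xty)]  # back to alpha_i/V_m,i in mol


# Hypothetical demo: 3 groups, 5 compounds, synthetic "literature" RIs from known parameters.
counts = [[2, 1, 0], [1, 1, 1], [0, 2, 1], [3, 0, 1], [1, 0, 2]]
true_params = [0.20 / LL_PREFACTOR, 0.25 / LL_PREFACTOR, 0.30 / LL_PREFACTOR]
n_lit = [predict_n(c, true_params) for c in counts]
fitted = fit_group_parameters(counts, n_lit)
print([round(LL_PREFACTOR * p, 6) for p in fitted])  # recovers [0.2, 0.25, 0.3]
```

With noise-free synthetic data the fit recovers the generating parameters exactly. With real training data, the residuals play the role of the scatter around the 1:1 line discussed for Figure 1, and the fit needs at least as many compounds as groups, with every group represented.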
Details of all compounds in our training set, including name, molecular structure, polarizability, RIs from the different methods, and associated absolute errors, are provided in Table S2 in the Supporting Information. All of these compounds have been identified as potential organic components of the atmospheric environment. For comparison, the molar polarizability of each organic compound in this training set is also known from the CRC Handbook of Chemistry and Physics. 44 Pure component RIs are known from previous measurements, and the values adopted are those reported by Sigma-Aldrich in their catalogue. Indeed, the availability of both molar polarizability data and RIs is a key condition for selecting a compound for the training set and necessarily limits the available data. The molar masses of compounds in the training set range from ∼25 to ∼225 g mol⁻¹.
As observed in Figure 1a, although the data points (each corresponding to an individual compound) scatter around the 1:1 line, RIs derived from the GC fitting are much more consistent with measured RIs than RIs derived directly from the molecular polarizabilities reported in the CRC Handbook of Chemistry and Physics using eq 4. The quality of the fit is also shown in Figure 1b as the percentage difference between the measured RI and the RI from either the GC fitting or the CRC polarizability data. The percentage differences in Figure 1 are calculated as the difference between the modeled RI and the literature value, relative to the literature value; their averages and standard deviations are reported in Table 1. A direct comparison between polarizabilities and RIs from the GC fitting and from CRC polarizabilities is provided in Figures 1c,d. To further characterize the performance of the model, we have examined how deviations from the measured RIs depend on compound functionality. Figure S1 summarizes the influence of different functional groups on pure component RIs and the accuracy with which the GC model represents the literature data. The figure is divided into functional group categories of alkenes/alkynes, alcohols, amides/amines, nitro compounds, and aldehydes/ketones/carboxylic acids/esters. There are 22 compounds containing C−C double or triple bonds (alkenes, alkynes), 33 compounds containing a hydroxyl group (alcohols), 18 compounds containing an amino group (amides/amines), six compounds containing nitro groups (nitro compounds), and 54 compounds containing a carbonyl group (aldehydes/ketones/carboxylic acids/esters). For each compound class, an average absolute error has been calculated to determine whether systematic trends exist for certain compound classes. The average differences and the standard deviations of the RIs of compounds containing these different groups are reported in Table 2.
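The error statistics behind Tables 1 and 2 can be sketched as follows. This is an illustrative sketch only: it assumes the tabulated "average difference" is the mean absolute percentage difference and the spread is the standard deviation of the signed percentage differences, which is one plausible reading of the text. Because many compounds carry several functional groups (the class counts above sum to 133 for 106 compounds), the same compound may appear under more than one class. The function names and sample values are hypothetical.

```python
from statistics import mean, stdev


def percent_difference(n_model: float, n_lit: float) -> float:
    """Signed percentage difference of a modeled RI relative to the literature RI."""
    return 100.0 * (n_model - n_lit) / n_lit


def summarize_by_class(records):
    """records: iterable of (compound_class, n_model, n_lit) tuples.

    A compound with several functional groups may appear under several classes.
    Returns {class: (mean absolute % difference, standard deviation of % difference)}.
    """
    diffs_by_class = {}
    for compound_class, n_model, n_lit in records:
        diffs_by_class.setdefault(compound_class, []).append(percent_difference(n_model, n_lit))
    summary = {}
    for compound_class, diffs in diffs_by_class.items():
        spread = stdev(diffs) if len(diffs) > 1 else 0.0  # stdev needs at least 2 points
        summary[compound_class] = (mean([abs(d) for d in diffs]), spread)
    return summary


# Hypothetical records, not data from the training set
records = [("alcohols", 1.40, 1.41), ("alcohols", 1.42, 1.41), ("nitro", 1.55, 1.55)]
print(summarize_by_class(records))
```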
To evaluate the possible influence of the oxidative aging of aerosol on optical properties, we compare the GC model predictions with measurements as functions of O/C ratio, H/C ratio, and degree of unsaturation in Figure 2a, b, and c, respectively (the data are the same as in Figure 1). The limited data used in this study do not reveal systematic trends in RI with O/C and H/C ratios. Competing factors, such as component density, molecular weight, the presence of multiple functional group types, and molecular structure, each contribute to the value of the RI and obscure any general trends across the broad range of compounds examined.
A previous study 32 noted a linear relationship between RI and the degree of unsaturation, defined as

$$ \mu = \frac{2\,\#\mathrm{C} + 2 + \#\mathrm{N} - \#\mathrm{H}}{2} \tag{7} $$

in which $\mu$ is the degree of unsaturation and #C, #H, and #N are the numbers of atoms of each element in the molecular formula. 32 As with the potential trends in RI with O/C and H/C considered above, Figure 2c shows little correlation between RI and degree of unsaturation for the limited number of compounds considered here. As discussed above, real RIs can be influenced by a complex interplay of factors such as component density, molecular weight, functional group type, and molecular structure. General trends observed previously may not be apparent if all of these vary from compound to compound and if the coverage of compounds (the size of the training set) is too small. 45−47 Figure 2 also shows that the accuracy of the GC model does not depend systematically on the degree of oxidation and aging.
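The three quantities plotted on the x-axes of Figure 2 follow directly from a molecular formula. The sketch below uses the standard degree-of-unsaturation expression for a C, H, N, O formula, $\mu = (2\,\#\mathrm{C} + 2 + \#\mathrm{N} - \#\mathrm{H})/2$, in which oxygen does not contribute. Halogens, which would normally be counted like H, are not handled; the function names are hypothetical.

```python
def degree_of_unsaturation(n_c: int, n_h: int, n_n: int = 0) -> float:
    """mu = (2 #C + 2 + #N - #H) / 2 for a C, H, N, O formula (O does not contribute)."""
    return (2 * n_c + 2 + n_n - n_h) / 2


def elemental_ratios(n_c: int, n_h: int, n_o: int):
    """Return the (O/C, H/C) atomic ratios."""
    return n_o / n_c, n_h / n_c


# Maleic acid, C4H4O4: one C=C and two C=O bonds, so mu = 3
print(degree_of_unsaturation(4, 4), elemental_ratios(4, 4, 4))  # prints 3.0 (1.0, 1.0)
```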
We have also evaluated the performance of the GC method for the training sets of 64 aromatic organic compounds (model 2) and 64 organic compounds containing halogen atoms (model 3). The molar masses of the compounds in both of these training sets range from ∼70 to ∼270 g mol⁻¹. Detailed information for these two models is provided in the Supporting Information; here, it is sufficient to state that similar levels of accuracy and similar trends are observed as for the data used in model 1.

IV. PREDICTIONS OF PURE COMPONENT RIs FOR A BENCHMARK OXIDATION MECHANISM
To evaluate the performance of the GC method, we report estimates of the RIs of the ozonolysis products of maleic acid (MA), a benchmark system chosen on the basis of our previous work. 48,49 We chose this system because we have previously used the models of Cappa et al. 41 and Redmond and Thompson 32 to rationalize the RIs of oxidized MA aerosol measured with aerosol optical tweezers. The MA ozonolysis products considered in our previous study are listed in Table S5, along with estimates of their pure component RIs. 48 The first method, proposed by Cappa et al., 41 is based on the RIs of the products of heterogeneous oxidative aging of squalane (a C30 saturated hydrocarbon) and azelaic acid (a C9 dicarboxylic acid) by the hydroxyl radical (OH), and it parametrizes RI in terms of the elemental composition of dry aerosol. This allows the RIs of organic aerosol to be estimated without knowledge of the specific molecular identity of the aerosol components. The second method, proposed by Redmond and Thompson, 32 is based on a quantitative structure−property relationship. It uses the molecular formulas and densities of 111 compounds to derive a parametrization that predicts the RIs of secondary organic aerosol components from their molecular formula and density.
In Figure 3, we report estimates of the RIs of the products of MA ozonolysis obtained with these two previously reported methods and with the GC method. 48 Details of these compounds are presented in Table S5. Uncertainties of RIs estimated using the methods of Cappa et al. 41 and Redmond and Thompson 32 are taken from the corresponding studies, while uncertainties of RIs estimated by group contribution are derived from the standard deviations of the different GC parameters. The RIs predicted using the approach of Cappa and co-workers are consistently higher than those predicted by the method of Redmond and Thompson. Consistency between the treatments is poor, with predicted values for the same compound spanning an implausibly large range, as wide as 1.35 to 1.6. Despite this, some relative trends between compounds are similar across the models. Further, predictions from the GC method mostly lie between those from the Cappa et al. 41 and Redmond and Thompson 32 models. The RIs calculated by all three methods can be compared with the estimate of 1.481 ± 0.001 for the RI of a subcooled melt of pure MA determined in optical tweezers experiments. 48 The RIs of MA estimated using the methods of Cappa et al. 41 and Redmond and Thompson 32 are 1.57 ± 0.04 and 1.45 ± 0.03, respectively, while the RI estimated using the GC method is 1.479 ± 0.03, very close to the optical tweezers measurement for a droplet that exists as a subcooled liquid. Note that MA is not in the original training set for model 1. The crystalline RI of MA is 1.509. 49 Predictions from the two earlier methods are based on parametrizations using compounds much less oxygenated than MA and its potential ozonolysis products; by contrast, the GC method is parametrized using compounds with a wide range of O:C ratios (see Figure 2 as an example).
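The MA comparison can be made concrete by expressing each point estimate quoted above as a signed percentage deviation from the optical tweezers value of 1.481. The sketch below uses only numbers stated in the text and ignores their uncertainties; the names are illustrative.

```python
MEASURED_MA = 1.481  # optical tweezers RI of subcooled liquid MA (value quoted in the text)

# Point estimates quoted in the text for MA (uncertainties omitted here).
predictions = {
    "Cappa et al.": 1.57,
    "Redmond and Thompson": 1.45,
    "GC (this work)": 1.479,
}


def percent_deviation(predicted: float, measured: float = MEASURED_MA) -> float:
    """Signed deviation of a prediction from the measurement, in percent."""
    return 100.0 * (predicted - measured) / measured


for method, n in predictions.items():
    print(f"{method}: {percent_deviation(n):+.2f}%")
```

This prints deviations of about +6.01%, −2.09%, and −0.14%, respectively. The GC deviation is comparable to the ±0.58% mean error of model 1 on its training set, whereas the earlier methods deviate by several times that.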
On the basis of our earlier measurements, 48

To compare results from the GC fitting introduced in this study with results from previous simulations, in Figure 4 we compare the RIs of all 234 compounds in our training sets (results illustrated in Figure 1 and Figures S2 and S3) with the RIs of all 111 compounds used in the training set of Redmond and Thompson. 32 Details of the compounds in Redmond and Thompson's work 32 are presented in Table S6. The GC method represents the RIs of all 335 compounds much more accurately, including the compounds that Redmond and Thompson used to form their training set. This is apparent from the percentage differences between the measured RIs and the RIs from the GC or Redmond and Thompson methods, shown in Figure 4b.
A fuller evaluation of the performance of our GC method in reproducing the component RIs in the training set used by Redmond and Thompson, and of the performance of the Redmond and Thompson model in replicating our training set, is provided in the Supporting Information. The average differences and the standard deviations for comparisons of all RIs using training sets in different combinations are reported in Table 3.

Figure 5 compares GC predictions with the RIs of 38 organic compounds derived by applying the molar refraction mixing rule to bulk phase measurements. These measurements have been reported and discussed previously; we refer the reader to our previous publications 33,50 and to Table S7 for more information. All of these compounds are potential components of ambient aerosol. They include the following chemical functionalities: alcohols, amino acids, sugars, dicarboxylic acids, and hydroxy acids. Molar polarizabilities are not available for any of these organic compounds, so predictions can only be made using the GC approach. As observed in Figure 5a, the GC predictions are not equally accurate across all compound classes. While the predicted RIs of the alcohols and dicarboxylic acids are approximately consistent with the measured RIs, those of amino acids, sugars, and hydroxy acids can differ substantially from the measured values, especially for the amino acids. The average differences and the standard deviations of the RIs of compounds containing the different groups are summarized in Table 4.
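Pure-component RIs for compounds such as these are inferred from solution measurements through molar refraction, $R = \frac{n^2-1}{n^2+2}\,\frac{M}{\rho}$. The sketch below uses one common form of the mixing rule, $R_\mathrm{mix} = x_w R_w + x_s R_s$, for a single binary aqueous solution and assumes that the pure solute density is known or has been extrapolated. The procedures in refs 33 and 50 may differ, for example in how the solution density is parametrized. The helper names and the default water properties are illustrative assumptions.

```python
import math


def lorentz_lorenz(n: float) -> float:
    """(n^2 - 1)/(n^2 + 2); multiplying by M/rho gives molar refraction, cm^3 mol^-1."""
    return (n * n - 1.0) / (n * n + 2.0)


def n_from_lorentz_lorenz(value: float) -> float:
    """Invert (n^2 - 1)/(n^2 + 2) = value for n."""
    return math.sqrt((1.0 + 2.0 * value) / (1.0 - value))


def pure_solute_ri(n_soln, rho_soln, w_solute, m_solute, rho_solute,
                   n_water=1.333, rho_water=0.998, m_water=18.015):
    """Hypothetical helper: solute RI from one binary aqueous solution measurement.

    Assumes the mixing rule R_mix = x_w * R_w + x_s * R_s and a known (or
    extrapolated) pure solute density rho_solute; water defaults are illustrative.
    """
    mol_s = w_solute / m_solute               # moles of solute per gram of solution
    mol_w = (1.0 - w_solute) / m_water        # moles of water per gram of solution
    x_s = mol_s / (mol_s + mol_w)             # solute mole fraction
    x_w = 1.0 - x_s
    m_mix = x_s * m_solute + x_w * m_water    # mean molar mass of the solution
    r_mix = lorentz_lorenz(n_soln) * m_mix / rho_soln
    r_water = lorentz_lorenz(n_water) * m_water / rho_water
    r_solute = (r_mix - x_w * r_water) / x_s  # solve the mixing rule for the solute
    return n_from_lorentz_lorenz(r_solute * rho_solute / m_solute)
```

Because $x_s$ is small in dilute solutions, errors in the measured solution RI or density are amplified in $R_s$. This is one reason why measurements over a range of concentrations, together with a density parametrization, are preferable to a single data point.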
To provide a refined GC model that uses the compound functionalities from our molar refraction measurements, we have added these 38 compounds to the 106 organic compounds previously used in the training set for model 1. The results of this refined model are presented in Figure 5b, illustrating the improvement in the predicted values. The average differences and the standard deviations of the RIs of compounds containing the different groups are also given in Table 4. This is a first example of the model retraining that can be pursued to improve the accuracy and generality of the model as more data become available.

Figure 4. (a) Comparison between RIs from prediction and measurements in Redmond and Thompson's work 32 (gray hollow squares) and comparison of RIs from the GC model with reported measurements (colored solid circles, colored by organic molar mass). (b) Differences between RIs from prediction and measurements in Redmond and Thompson's work 32 (gray hollow squares) and differences between RIs from the GC model and reported measurements (colored solid circles, colored by organic molar mass).

Figure 5. (a) Comparison between RIs of organic compounds from application of the molar refraction mixing rule to bulk phase measurements and predictions from the GC method, with color identifying the organic functionality of the molecule. (b) Same as (a) but following a refinement of the GC parameters using the data derived by the molar refraction method in addition to the original training set used to fit model 1.

In summary, we report a group contribution model for predicting the RIs of pure organic components, relevant for rationalizing the optical properties of atmospheric organic aerosol. We suggest that this model will be invaluable in interpreting laboratory based measurements of aerosol processes. Future development will focus on simplifying the model, removing the need for structural information about the compounds considered while retaining robust and verifiable estimates of optical properties.

Supporting Information
The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.est.7b01756.
Refractive indices for all compounds used in training sets, group contribution coefficients for all models, and figures reporting errors in model predictions (PDF)