Modeling Dipolar Molecules with PCP-SAFT: A Vector Group-Contribution Method

Predicting thermodynamic equilibrium properties is essential to develop chemical and energy conversion processes in the absence of experimental data. For the modeling of thermodynamic properties, statistical associating fluid theory (SAFT)-based equations of state, such as perturbed-chain polar (PCP)-SAFT, have proven powerful and found broad application. The PCP-SAFT parameters can be predicted by group-contribution (GC) methods. However, their application to the dipole term is substantially limited: current GC methods either neglect the dipole term or allow only a single dipolar group per substance to avoid handling the symmetry effects on the molecular dipole moment. Still, substances with multiple dipolar groups are highly relevant, and their description is substantially improved by including the dipole term in SAFT models. To overcome these limitations, this work proposes a vector-addition-based (Vector-)GC method for the dipole term of PCP-SAFT that accounts for molecular symmetry. The Vector-GC employs information on the substance's molecular 3D structure to predict the molecular dipole moment through a vector addition of bond contributions. Combining the proposed sum rule for dipole moments with established sum rules for the remaining parameters yields a consistent GC method for PCP-SAFT for dipolar substances. The prediction capabilities of the Vector-GC method are analyzed against experimental data for two substance classes: nonassociating oxygenated and halogenated substances. We demonstrate that the Vector-GC method improves vapor pressure and liquid density predictions compared to neglecting the dipole term. Moreover, we show that the Vector-GC method enables differentiation between cis- and trans-isomers. The Vector-GC method, hence, substantially increases the predictive capabilities and applicability domain of GC methods. All parameters are provided as JSON and CSV files, and the Vector-GC method is available through an open-source Python package.
Additionally, the developed regression framework for GC methods for PCP-SAFT is openly available. The regression framework can be employed to regress the Vector-GC method to other substance classes and is easily adaptable to other sum rules for PCP-SAFT.
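The vector addition behind the cis/trans differentiation can be illustrated with a minimal sketch. It does not reproduce Equation (4) of the main paper; it assumes that each bond type carries a fixed dipole magnitude directed along the bond axis, that bonds without a contribution (here C-C and the omitted C-H bonds) are ignored, and that the molecular dipole moment is the norm of the vector sum. The function names, the 1.5 D C-Cl magnitude, and the idealized planar 1,2-dichloroethene geometry are hypothetical placeholders, not values from Table S 3.

```python
import numpy as np

# Hypothetical bond dipole magnitude in debye; a placeholder, NOT a regressed value.
# Key order (positive end, negative end) fixes a consistent direction convention.
BOND_DIPOLES = {("C", "Cl"): 1.5}


def molecular_dipole(symbols, coords, bonds, bond_dipoles):
    """Return the norm of the vector sum of bond dipoles (in D).

    symbols: element symbol per atom; coords: (n_atoms, 3) Cartesian coordinates;
    bonds: list of (i, j) atom-index pairs; bond_dipoles: dict keyed by element pairs.
    """
    coords = np.asarray(coords, dtype=float)
    total = np.zeros(3)
    for i, j in bonds:
        key = (symbols[i], symbols[j])
        if key in bond_dipoles:
            start, end = i, j
        elif key[::-1] in bond_dipoles:
            start, end, key = j, i, key[::-1]
        else:
            continue  # assumed: bonds without a contribution are ignored
        direction = coords[end] - coords[start]
        total += bond_dipoles[key] * direction / np.linalg.norm(direction)
    return float(np.linalg.norm(total))


def dichloroethene(cis):
    """Idealized planar 1,2-dichloroethene, hydrogens omitted (illustrative geometry)."""
    c1, c2 = np.array([-0.67, 0.0, 0.0]), np.array([0.67, 0.0, 0.0])
    a1 = np.deg2rad(120.0)
    a2 = np.deg2rad(60.0 if cis else -60.0)
    cl1 = c1 + 1.73 * np.array([np.cos(a1), np.sin(a1), 0.0])
    cl2 = c2 + 1.73 * np.array([np.cos(a2), np.sin(a2), 0.0])
    symbols = ["C", "C", "Cl", "Cl"]
    coords = np.vstack([c1, c2, cl1, cl2])
    bonds = [(0, 1), (0, 2), (1, 3)]
    return symbols, coords, bonds


if __name__ == "__main__":
    for cis in (True, False):
        mu = molecular_dipole(*dichloroethene(cis), BOND_DIPOLES)
        print("cis" if cis else "trans", round(mu, 3))
```

In the cis isomer the two C-Cl bond vectors partially add, whereas in the trans isomer they point in opposite directions and cancel. A scalar group sum that ignores geometry would assign both isomers the same value, which is the limitation the vector addition is meant to remove.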


Initial values for the regression
Table S 1
Initial values employed for the regression of both the Vector-GC and the μ=0 method. Initial values for oxygenated groups are chosen based on the corresponding group contributions of Sauer et al. [23].

Resulting group and bond contributions

Direct regression to experimental dipole moment data
In the main paper, we regress the bond contributions of the proposed dipole moment sum rule simultaneously with the other group contributions to thermodynamic data. In addition to this approach, we here investigate the minimally achievable deviation between experimental dipole moment data and the proposed dipole moment sum rule, i.e., the sum rule's model error. For this purpose, we regress the bond dipole contributions ($\mu_{\mathrm{F-C}}$, $\mu_{\mathrm{Cl-C}}$, $\mu_{\mathrm{Br-C}}$, $\mu_{\mathrm{I-C}}$, $\mu_{\mathrm{O-C}}$, and $\mu_{\mathrm{O=C}}$) directly against the molecular dipole moment data from the DIPPR [1] database. The DIPPR database contains molecular dipole moment data for 181 of the 253 substances from the defined data sets of oxygenated and halogenated substances (cf. Section 3 in the main paper). To this end, we minimize the least-squares sum of the deviations between predicted and experimental molecular dipole moments:

$$\min_{\{\mu_b\}} \; \sum_{i=1}^{N} \left( \mu_{\mathrm{pred},i} - \mu_{\mathrm{DIPPR},i} \right)^2$$

Here, $N = 181$ is the total number of available dipole moments in the DIPPR database for the considered case study, $\mu_{\mathrm{DIPPR},i}$ denotes the DIPPR dipole moment of substance $i$, and $\mu_{\mathrm{pred},i}$ represents the predicted dipole moment, determined by Equation (4) of the main paper. We define lower bounds for the considered bond dipole contributions such that $\mu_b \geq 0$ holds for all considered bonds $b$. Additionally, we perform a leave-one-out cross-validation (LOO-CV).

A comparison with the PCP-SAFT dipole parameters predicted by the Vector-GC shows that regressing the bond contributions directly to experimental dipole moment data yields a lower mean absolute deviation (Figure S 1) and a slightly stronger correlation (Pearson correlation coefficient of 0.7 for the Vector-GC, 0.73 for the regression to experimental data, and 0.71 for the LOO-CV prediction). This difference is expected because the two approaches use different objective functions. Evidently, the Vector-GC uses the degrees of freedom of the sum rule to obtain dipole moments that are optimal for PCP-SAFT: PCP-SAFT expects effective dipole moments in the fluid phase [2], whereas the DIPPR database mainly contains dipole moments measured in vacuum. Hence, it is reasonable that the Vector-GC yields slightly higher dipole moments than regressing the dipole moments directly to the DIPPR data.

In addition, the comparison between the LOO-CV prediction and the regression results shows only minor differences: the Pearson correlation coefficient is slightly higher for the regression results (LOO-CV: 0.71, regression: 0.73), and the mean absolute deviation is slightly lower (LOO-CV: 0.35 D, regression: 0.34 D). The strong similarity between the LOO-CV and regression results shows that the sum rule is robust and does not tend to overfit the considered data set.

The optimal bond dipole moments resulting from the regression to DIPPR data and from the Vector-GC are given in Table S 3.
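The bounded regression and the LOO-CV described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' regression framework: SciPy's `least_squares` is a swapped-in solver, and each substance $i$ is encoded by a hypothetical $3 \times n_{\mathrm{bond}}$ matrix $G_i$ whose column $b$ sums the unit vectors of all bonds of type $b$, so that the predicted moment is $\lVert G_i \boldsymbol{\mu}_b \rVert$. The data are synthetic rather than DIPPR values, and all function names are illustrative.

```python
import numpy as np
from scipy.optimize import least_squares


def predicted_dipoles(theta, G):
    """|G_i @ theta| for every substance i; G has shape (n_substances, 3, n_bond_types)."""
    return np.linalg.norm(G @ theta, axis=1)


def fit_bond_dipoles(G, mu_exp, x0):
    """Minimize sum_i (mu_pred_i - mu_exp_i)^2 subject to theta >= 0."""
    result = least_squares(
        lambda theta: predicted_dipoles(theta, G) - mu_exp,
        x0,
        bounds=(0.0, np.inf),  # lower bound mu_b >= 0 for every bond type
    )
    return result.x


def leave_one_out(G, mu_exp, x0):
    """Refit without substance i (same initial values), then predict substance i."""
    n = len(mu_exp)
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        theta_i = fit_bond_dipoles(G[keep], mu_exp[keep], x0)
        preds[i] = predicted_dipoles(theta_i, G[i:i + 1])[0]
    return preds


def mad(a, b):
    """Mean absolute deviation in D."""
    return float(np.mean(np.abs(a - b)))


def pearson(a, b):
    """Pearson correlation coefficient."""
    return float(np.corrcoef(a, b)[0, 1])


def synthetic_data(rng, n_substances=60, noise=0.05):
    """Hypothetical test set with random bond orientations; NOT DIPPR data."""
    theta_true = np.array([1.5, 1.2, 0.9])  # placeholder bond dipoles in D
    G = np.zeros((n_substances, 3, theta_true.size))
    for i in range(n_substances):
        for b in range(theta_true.size):
            k = rng.integers(0, 3)  # 0, 1, or 2 bonds of type b
            if k:
                v = rng.normal(size=(k, 3))
                G[i, :, b] = (v / np.linalg.norm(v, axis=1, keepdims=True)).sum(axis=0)
    mu = predicted_dipoles(theta_true, G) + noise * rng.normal(size=n_substances)
    return G, np.clip(mu, 0.0, None), theta_true


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    G, mu_exp, theta_true = synthetic_data(rng)
    x0 = np.ones(G.shape[2])
    theta = fit_bond_dipoles(G, mu_exp, x0)
    mu_fit = predicted_dipoles(theta, G)
    mu_loo = leave_one_out(G, mu_exp, x0)
    print("fitted:", np.round(theta, 3), "true:", theta_true)
    print(f"regression: MAD = {mad(mu_fit, mu_exp):.3f} D, r = {pearson(mu_fit, mu_exp):.2f}")
    print(f"LOO-CV:     MAD = {mad(mu_loo, mu_exp):.3f} D, r = {pearson(mu_loo, mu_exp):.2f}")
```

Because the predicted moment is the norm of a vector sum, the problem is nonlinear in the bond contributions even though the vector itself is linear in them; hence a bounded nonlinear least-squares solver is used here instead of a linear one. Refitting every fold from the same initial values mirrors the LOO-CV idea: a small gap between in-sample and LOO-CV deviations indicates that the sum rule does not overfit.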

Figure S 1
Parity plot of predicted molecular dipole moments against molecular dipole moment data from DIPPR [39]. Blue pentagons represent dipole moments resulting from the Vector-GC method (LOO-CV), red dots represent the regression result against the DIPPR data, and pink crosses represent the LOO-CV predictions. The solid black line represents the angle bisector. The dashed pink lines indicate a deviation of 0.35 D, which is the mean absolute deviation of the LOO-CV. The dotted blue lines indicate a deviation of 0.68 D, which is the mean absolute deviation resulting from the Vector-GC.

Table S 2
Resulting group parameters for Vector-GC and the μ=0 method for regression to vapor pressure and liquid density data.

Table S 3
Resulting bond dipole moments for regression to dipole moment data from the DIPPR database, compared to bond dipole moments resulting from the Vector-GC.