1 Introduction

An accurate description of vapor/liquid equilibria is essential for developing efficient processes in the chemical and pharmaceutical industry. In particular, accurate models are crucial for the design of distillation and absorption processes that exploit vapor/liquid phase equilibria to separate products. For systems with no or few experimental data, reliable estimation methods are necessary in the early stages of conceptual process development.

In practice, excess Gibbs energy (\(g^{\textrm{E}}\)) models like NRTL [1] or UNIFAC [2] are widely used to model phase equilibria due to their simple mathematical structure. However, \(g^{\textrm{E}}\) models reach their limits if supercritical components are present or the system is generally at higher pressures. In that case, equations of state provide a more general approach to modeling phase equilibria. Aside from being able to model supercritical components, equations of state also have the advantage of combining phase equilibria, densities, and caloric properties in one consistent model.

Equations of state differ in their complexity and applicability: In industrial practice, cubic equations of state like Redlich–Kwong [3], Soave–Redlich–Kwong [4] and Peng–Robinson [5] are used excessively. To model mixtures with cubic equations of state, mixing rules are required. The choice of mixing rules is vast and significantly influences the description of phase equilibria [6, 7]. This ambiguity is rectified with molecular equations of state, like statistical associating fluid theory (SAFT) [8] and its derivatives (e.g., SAFT-VR-Mie [9], PCP-SAFT [10,11,12,13], or soft-SAFT [14, 15]). The parameters of molecular equations of state have a physical interpretation linked to the properties of the molecules. Because the properties of individual molecules do not change in a mixture, molecular equations of state can often give reasonable predictions of mixture properties based solely on a few (typically 3 to 7) pure-component parameters.

With the small number of parameters and their physical basis, molecular equations of state extrapolate well from few experimental data points. If experimental data are abundant, more flexible, and hence more accurate, equations of state have been parametrized, these so-called reference equations of state aim to reproduce experimental data within the measurement uncertainty. The development of reference equations of state was spearheaded by Roland Span [16,17,18,19,20], in whose honor this special issue is dedicated. Due to the large amount of data required for the parametrization, reference equations of state are only available for very few pure components, like water [21], \(\hbox {CO}_2\) [16], and nitrogen [19], but also for mixtures, like natural gas [22]. Due to the large number of parameters, the application of reference equations of state is usually constrained to the state space for which they were parametrized, although there are methods available to ensure reasonable behavior at temperatures and pressures far beyond those used in the parametrization. [17]

Molecular equations of state offer the most promising route for a large-scale parametrization of thousands of different mixtures. The explicit description of hydrogen bonds renders SAFT models especially valuable for accurately modeling systems in which hydrogen bonding plays an important role. This task is typically challenging with cubic equations of state. The SAFT modeling approach becomes particularly valuable for systems in which molecules that have hydrogen bond-acceptor sites but no hydrogen bond-donor sites (like ethers, esters, ketones, and many other polar molecules) are mixed with self-associating molecules (e.g., alcohols or water). The association model in SAFT [23] is based on Wertheim’s thermodynamic perturbation theory [24,25,26,27]. Although not explicitly considered by Wertheim initially, the framework is flexible enough to capture the so-called induced association between molecules [28]. Including the induced association in the model significantly improves the description of binary mixtures as shown by Cripwell et al. [29], Marshall and Bokis [30], and more recently by Nikolaidis et al. [31] and Smith et al. [32].

Including physical insights like hydrogen bonding into the model leads to robust predictions of mixture properties for many systems. The description of mixtures can be further enhanced for systems for which experimental data are available by adjusting a single binary interaction parameter.

The parametrization of an equation of state requires adequate experimental data. In contrast, components with few or no experimental data require predictive methods. Group-contribution (GC) methods, in particular, have proven successful for the estimation of equation of state parameters based purely on the structure of molecules. GC methods can be categorized into homosegmented and heterosegmented approaches. Homosegmented GC methods determine equation of state parameters from the number of occurrences of predefined groups in the molecule. Homosegmented GC methods have been developed for various equations of state, like SAFT [33, 34], SAFT-VR [33, 34], and PCP-SAFT [35,36,37]. Recently, group-contribution SAFT equations of state were reviewed by Shaahmadi et al. [38].

In contrast, heterosegmented GC models are expressed in terms of individual segments rather than entire molecules, similar to united atom force fields used in molecular simulations. As a consequence, binary (molecular) interaction parameters become obsolete. The effect of non-ideal interactions between unlike segments is resolved directly on the level of segment/segment interactions. Examples for heterosegmented GC models are SAFT-\(\gamma\) Mie [39], GC-SAFT-VR [40], and gc-PC-SAFT [36, 41, 42].

In a recent study, Esper et al. [43] published pure-component PCP-SAFT parameters for 1842 substances. When it comes to assessing mixture properties, Nikolaidis et al. [31, 44] investigated the performance of PC-SAFT on a set of 200 representative binary mixtures [45]. However, a comprehensive parametrization based on a large amount of experimental data that also accounts for the intricacies of hydrogen bonding, including induced association, is still lacking. A gap is also present for predictive methods and particularly GC methods, for which Sauer et al. [36] limited their study to pure-component properties.

In this work, we present parametrizations for three modeling paradigms. First, we determine binary interaction parameters for every mixture that can be described by the pure-component PCP-SAFT parameter database by Esper et al. [43] (containing 1842 substances) and for which experimental data are available through the DDB [46]. Depending on the types of intermolecular interactions, the binary interaction parameter can either adjust the strength of dispersive attraction between unlike molecules or the cross-association energy.

Next, we focus on the prediction of vapor–liquid equilibria using a group-contribution method. A previous study by Sauer et al. [36] only assigns association parameters to self-associating substances (e.g., alcohols), whereas association parameters were not assigned to polar groups (aldehydes, ketones, ethers, formates, and esters). For mixtures of, e.g., a ketone with an alcohol, it is beneficial to allow for cross-associating interactions. We thus extend the homosegmented group-contribution method for PCP-SAFT by Sauer et al. [36] by assigning association parameters for polar groups. These parameters have no effect on pure-component properties but significantly improve the description of mixtures that show induced association. Based on these updated group parameters, we parametrize a group-contribution method for binary interaction parameters to further improve the correlation with experimental data.

Finally, we repeat the adjustment of the association group parameters for polar molecules and the binary group/group interaction parameters for the heterosegmented group-contribution method by Sauer et al. [36] and compare and assess the results.

2 Rapid Calculation of Phase Equilibria and Derivatives

To calculate the bubble point pressure of an N-component mixture, the following set of \(N+1\) nonlinear equations must be solved for the pressure p and composition of the vapor phase \(\varvec{y}=\begin{pmatrix}y_1,\ldots ,y_N\end{pmatrix}^\intercal\):

$$\begin{aligned} f_0&=\sum _{i=1}^Ny_i-1=0 \end{aligned}$$
(1)
$$\begin{aligned} f_i&=\mu _i^{\textrm{V}}(T,p,\varvec{y},\varvec{\theta })-\mu _i^{\textrm{L}} (T,p,\varvec{x},\varvec{\theta })=0 \quad \quad \quad i=1\ldots N \end{aligned}.$$
(2)

The temperature T and the composition of the liquid phase \(\varvec{x}=\begin{pmatrix}x_1,\ldots ,x_N\end{pmatrix}^\intercal\) are given. In the expression for the chemical potentials \(\mu _i^{\textrm{L}}\) and \(\mu _i^{\textrm{V}}\) of the liquid and vapor phase, respectively, the dependence on the model parameters \(\varvec{\theta }\) is made explicit because large-scale parameter optimizations must rapidly determine the gradients of the target function (and hence the bubble point pressure) with respect to these model parameters. The derivatives can be obtained by reverse mode automatic differentiation (AD), which efficiently calculates gradients with respect to millions of model parameters in machine learning. Reverse mode AD requires the presence of an unbroken computational graph that records all mathematical operations from the inputs (in this case, the model parameters) to the outputs (the optimization target). However, the phase equilibrium conditions in Eqs. 1 and 2 are best and most robustly solved by specialized software that can, in general, not be interfaced with the AD framework. Aside from the interface, the performance also plays a role. Recording the entire iterative determination of the bubble point pressure on the computational graph is futile when only the result of the iteration is required to calculate derivatives.

Based on these observations, we propose the following solution method: The phase equilibrium calculation is carried out using the \(\text {FeO}_\text {s}\) [47] package. Then, one additional Newton step is carried out while recording the operations in the computational graph. For the calculation of the Newton step, the Jacobian of Eqs. 1 and 2 is required, which can be expressed by the following derivatives:

$$\frac{\partial f_0}{\partial p}=0 \quad \frac{\partial f_0}{\partial y_j}=1$$
(3)
$$\begin{aligned} \frac{\partial f_i}{\partial p}& =v_i^{\textrm{V}}(T,p^*,\varvec{y}^*, \varvec{\theta })-v_i^{\textrm{L}}(T,p^*,\varvec{x},\varvec{\theta })& &\frac{\partial f_i}{\partial y_j}=\left( \frac{\partial \mu _i^{\textrm{V}}}{\partial y_j} \right) _{T,p,y_{k\ne j},\varvec{\theta }} \quad i=1\ldots N, \end{aligned}$$
(4)

with the partial molar volume \(v_i=\left( \frac{\partial \mu _i}{\partial p}\right) _{T,\varvec{x}}\). The asterisk indicates properties that are calculated in an external framework and thus have no explicit connection via the computational graph to the parameters \(\varvec{\theta }\) anymore. The Newton step then consists of solving the system of linear equations

$$\begin{aligned} \frac{\partial f_i}{\partial p}\Delta p+\sum _{j=1}^N\frac{\partial f_i}{\partial y_j}\Delta y_j+f_i=0 \quad\quad\quad i=0\ldots N \end{aligned}$$
(5)

for the steps in pressure \(\Delta p\) and vapor composition \(\Delta \varvec{y}\). If only derivatives of the bubble point pressure and no vapor compositions are required, which is often the case for parameter estimations, the linear system of Eq. 5 can be recast to a single equation. By multiplying Eq. 5 with \(y_i^*\) for \(i\ne 0\) and summing over all components, the single remaining equation is

$$\begin{aligned}&\sum _{i=1}^Ny_i^*\left( v_i^{\textrm{V}}(T,p^*,\varvec{y}^*,\varvec{\theta }) -v_i^{\textrm{L}}(T,p^*,\varvec{x},\varvec{\theta })\right) \Delta p+\sum _{j=1}^N \Delta y_j\sum _{i=1}^Ny_i^*\left( \frac{\partial \mu _i^{\textrm{V}}}{\partial y_j} \right) _{T,p,y_{k\ne j},\varvec{\theta }}\nonumber \\&\quad\quad\quad\quad\quad\quad+\sum _{i=1}^Ny_i^*\left( \mu _i^{\textrm{V}}(T,p^*,\varvec{y}^*,\varvec{\theta }) -\mu _i^{\textrm{L}}(T,p^*,\varvec{x},\varvec{\theta })\right) =0 \end{aligned}.$$
(6)

The expression can be further simplified by applying the Gibbs–Duhem relation \(\sum _{i=1}^Ny_i\left( {\textrm{d}}\mu _i\right) _{T,p}=0\) and identifying the molar volume of the vapor phase \(v^{\textrm{V}}=\sum _iy_iv_i^{\textrm{V}}\) and the molar Gibbs energy of the vapor phase \(g^{\textrm{V}}=\sum _iy_i\mu _i^{\textrm{V}}\), resulting in

$$\begin{aligned}{} & {} \left( v^{\textrm{V}}(T,p^*,\varvec{y}^*,\varvec{\theta })-\sum _iy_i^*v_i^{\textrm{L}} (T,p^*,\varvec{x},\varvec{\theta })\right) \Delta p\nonumber \\{} & {} \quad +\left( g^{\textrm{V}}(T,p^*,\varvec{y}^*,\varvec{\theta })-\sum _iy_i^*\mu _i^{\textrm{L}} (T,p^*,\varvec{x},\varvec{\theta })\right) =0 \end{aligned}.$$
(7)

Using \(\Delta p=p^{\textrm{bubble}}(T,\varvec{x},\varvec{\theta })-p^*\) and the definition of the molar \(a=g-pv\) and partial molar \(a_i=\mu _i-pv_i\) Helmholtz energy, the expression can be solved for the bubble point pressure.

$$\begin{aligned} p^{\textrm{bubble}}(T,\varvec{x},\varvec{\theta })=-\frac{a^{\textrm{V}}(T,p^*, \varvec{y}^*,\varvec{\theta })-\sum _iy_i^*a_i^{\textrm{L}}(T,p^*,\varvec{x}, \varvec{\theta })}{v^{\textrm{V}}(T,p^*,\varvec{y}^*,\varvec{\theta })-\sum _iy_i^*v_i^{\textrm{L}} (T,p^*,\varvec{x},\varvec{\theta })} \end{aligned}.$$
(8)

The bubble point pressure in this expression is now connected with the parameters \(\varvec{\theta }\) via the computational graph. Therefore, derivatives of the target function with respect to the model parameters can be calculated using backpropagation.

The novel approach can evaluate 151 756 datapoints (95 415 bubble points and 56 341 dew points) that were used for the parametrization of the GC methods in merely 12 s (249 CPUs) on an AMD EPYC 7F72 workstation CPU. The subsequent calculation of the gradients of any number of input parameters is orders of magnitude faster than the forward pass because the expensive iterations for dew and bubble points are not recorded on the computational graph and therefore skipped.

Fig. 1 outlines the novel calculation procedure. The following aspects need to be taken into account when implementing Eq. 8:

  1. 1.

    Molecular equations of state like PCP-SAFT are formulated in terms of the molar Helmholtz energy as a function of its characteristic variables temperature and density. Therefore, a more convenient way of writing Eq. 8 is

    $$\begin{aligned} p^{\textrm{bubble}}(T,\varvec{x},\varvec{\theta })=-\frac{a(T,\rho ^{{\textrm{V}}*}, \varvec{y}^*,\varvec{\theta })-\sum _iy_i^*a_i(T,\rho ^{{\textrm{L}}*},\varvec{x}, \varvec{\theta })}{v(T,\rho ^{{\textrm{V}}*},\varvec{y}^*,\varvec{\theta }) -\sum _iy_i^*v_i(T,\rho ^{{\textrm{L}}*},\varvec{x},\varvec{\theta })} \end{aligned}.$$
    (9)
  2. 2.

    While the iterative calculation of the bubble point is done externally in the open-source software framework \(\text {FeO}_\text {s}\) [47], the PCP-SAFT Helmholtz energy and its derivatives still need to be implemented in the AD framework of choice, e.g., PyTorch. The vectors of partial molar volumes \(\varvec{v}\) and partial molar Helmholtz energies \(\varvec{a}\) are related to derivatives of the Helmholtz energy by

    $$\begin{aligned} \varvec{v}=-\frac{A_{\varvec{n}V}}{A_{VV}} \quad \text {and}\quad \varvec{a}=A_{\varvec{n}}-\frac{A_VA_{\varvec{n}V}}{A_{VV}} \end{aligned}.$$
    (10)

    In our implementation, the first (\(A_{\varvec{n}}\), \(A_V\)) and second (\(A_{\varvec{n}V}\), \(A_{VV}\)) partial derivatives are calculated in a single evaluation of the Helmholtz energy using a tensor hyper-dual number [48, 49] with both the amount of substance \(\varvec{n}\) and the volume V as one variable and the volume again as the second variable.

  3. 3.

    Although the description above refers specifically to bubble points, no assumption on the properties of the phases was made. Therefore, the same procedure can be used to calculate dew point pressures and their derivatives, as

    $$\begin{aligned} p^{\textrm{dew}}(T,\varvec{y},\varvec{\theta })=-\frac{a(T,\rho ^{{\textrm{L}}*}, \varvec{x}^*,\varvec{\theta })-\sum _ix_i^*a_i(T,\rho ^{{\textrm{V}}*},\varvec{y}, \varvec{\theta })}{v(T,\rho ^{{\textrm{L}}*},\varvec{x}^*,\varvec{\theta }) -\sum _ix_i^*v_i(T,\rho ^{{\textrm{V}}*},\varvec{y},\varvec{\theta })} \end{aligned},$$
    (11)

    where now the composition of the vapor phase \(\varvec{y}\) is given and the composition of the liquid phase \(\varvec{x}^*\) is calculated in \(\text {FeO}_\text {s}\) together with the densities of both phases.

  4. 4.

    In some instances, the model might encounter metastable liquid phases for a given experimental data point. However, we did not encounter cases for which neglecting a systematic analysis of the stability of the liquid phase lead to detrimental effects on the adjustment of binary interaction parameters.

Fig. 1
figure 1

Visualization of the proposed algorithm for the calculation of derivatives of bubble point pressures with respect to model parameters. For given T, \(\varvec{x}\), and \(\varvec{\theta }\), FeOs is used to determine the phase equilibrium. The resulting densities \(\rho ^{{\textrm{L}}*}\), \(\rho ^{{\textrm{V}}*}\), and composition of the vapor phase \(\varvec{y}^*\) are passed back to the PyTorch framework in which the Helmholtz energy \(A^{\textrm{HD}}\) is calculated from PC-SAFT. The evaluation uses generalized hyper-dual numbers [49]. Therefore, the output \(A^{\textrm{HD}}\) contains the Helmholtz energy itself together with the derivatives \(A_{\varvec{n}}\), \(A_V\), \(A_{\varvec{n}V}\), and \(A_{VV}\). Finally, Eq. 9 is used to determine the bubble point pressure. In the evaluation of the derivatives of \(p^{\textrm{bubble}}\) with respect to model parameters \(\varvec{\theta }\) using backpropagation, the expensive iterative calculations of the phase equilibria in FeOs are circumvented which improves the performance significantly

3 PCP-SAFT Equation of State

In this work, the perturbed-chain polar statistical associating fluid theory (PCP-SAFT) [10, 11, 13] is used as model to calculate and predict phase equilibria. PCP-SAFT splits the residual Helmholtz energy \(A^{\textrm{res}}\) into contributions corresponding to different intermolecular interactions, as

$$\begin{aligned} A^{\textrm{res}}=A^{\textrm{hs}}+A^{\textrm{chain}}+A^{\textrm{disp}} +A^{\textrm{dipole}}+A^{\textrm{assoc}} \end{aligned}.$$
(12)

The interactions considered in this work are hard-sphere repulsion (hs), chain formation (chain), dispersive attraction (disp), dipolar interactions (dipole), and hydrogen bonding/associating interactions (assoc). In PCP-SAFT, every fluid is characterized by the number of spherical segments per chain \(m_i\), the segment diameter parameter \(\sigma _i\), and the interaction energy parameter \(\varepsilon _i\). Polar fluids are described by their dipole moment \(\mu _i\) that can be known from experiments or simulations or fitted to experimental data. Associating fluids require the numbers of associating sites of kind A and B, \(n^A_i\) and \(n^B_i\), together with the dimensionless association volume \(\kappa ^{A_iB_i}\) and the association energy \(\varepsilon ^{A_iB_i}\). In this work, \(n^A_i\) denotes the number of hydrogen bond-donor sites and \(n^B_i\) the number of hydrogen bond-acceptor sites.

PCP-SAFT is designed to describe mixtures with any number of components. However, predicting mixture properties solely based on pure-component parameters can lead to deviations from experimental data. In this case, the dispersion energy between unlike molecules \(\varepsilon _{ij}\) can be adjusted using the binary interaction parameter \(k_{ij}\), as

$$\begin{aligned} \varepsilon _{ij}=\sqrt{\varepsilon _i\varepsilon _j}\left( 1-k_{ij}\right) \end{aligned}.$$
(13)

A negative \(k_{ij}\) implies that the interactions between the unlike molecules are stronger than estimated from the Berthelot combining rule. A common reason for underestimating binary interactions are mixtures between self-associating components (e.g., alcohols or amines) and polar but not self-associating components (e.g., ester, ethers, ketones). In these cases, the polar components can form hydrogen bonds with the self-associating components in the mixture but not with themselves. If this interaction is not accounted for explicitly in the equation of state, the only remedy is a large negative binary interaction parameter \(k_{ij}\). However, the association contribution in SAFT models is fully capable of capturing this effect of so-called induced association [28,29,30,31,32].

The association contribution in PCP-SAFT can be cast in the form

$$\begin{aligned} \frac{A^{\textrm{assoc}}}{k_{\textrm{B}}TV}=\sum _i\rho _i\left( n_i^A\left( \ln X_i^A-\frac{X_i^A}{2}+\frac{1}{2}\right) +n_i^B\left( \ln X_i^B-\frac{X_i^B}{2}+\frac{1}{2}\right) \right) \end{aligned}$$
(14)

with the Boltzmann constant \(k_{\textrm{B}}\), temperature T, volume V, and the partial densities of all the components \(\rho _i\). The fractions of non-bonded association sites \(X_i^A\) and \(X_i^B\) have to be determined implicitly from the equations

$$\begin{aligned} \frac{1}{X_i^A}=1+\sum _j\rho _j n_j^BX_j^B\Delta ^{A_iB_j} \end{aligned}$$
(15)

and

$$\begin{aligned} \frac{1}{X_i^B}=1+\sum _j\rho _j n_j^AX_j^A\Delta ^{A_iB_j} \end{aligned}.$$
(16)

In PCP-SAFT, the association strength \(\Delta ^{A_iB_j}\) is calculated from

$$\begin{aligned} \Delta ^{A_iB_j}=y_{ij}(d_{ij})\sqrt{\sigma _i^3\kappa ^{A_iB_i} \sigma _j^3\kappa ^{A_jB_j}}\left( e^\frac{\varepsilon ^{A_iB_j}}{k_{\textrm{B}}T} -1\right) \end{aligned}.$$
(17)

An expression for the cavity correlation function at contact of the reference fluid \(y_{ij}(d_{ij})\) can be found in the original publication on PC-SAFT [10]. Equation 17 already includes the combining rule for the association volume according to Gross and Sadowski [11]. Therefore, the cross-association energy parameter \(\varepsilon ^{A_iB_j}\) is the last remaining mixture parameter. It can be estimated from the combining rule

$$\begin{aligned} \varepsilon ^{A_iB_j}=\frac{1}{2}\left( \varepsilon ^{A_iB_i} +\varepsilon ^{A_jB_j}\right) \end{aligned}$$
(18)

or adjusted to experimental data, like a \(k_{ij}\) parameter.

For molecules that do not self-associate, e.g., a ketone with \(n^A_i=0\) (no hydrogen bond-donor site) and \(n^B_i=1\) (one hydrogen bond-acceptor site), the parameter \(\varepsilon ^{A_iB_i}\) has no physical meaning and vanishes for pure components. However, it can still be used together with the combining rule in Eq. 18 to model mixtures with self-associating components, leading to crosswise association among the two substances.

Fig. 2
figure 2

Binary vapor/liquid equilibrium of 2-butanone and ethanol at \(p=1.013\,\text {bar}\). Comparison between PCP-SAFT predictions (\(k_{ij}=0\)), adjusted binary interaction parameter (\(k_{ij}=-0.07\)), adjusted induced association energy parameter (\(\varepsilon ^{A_iB_j}=1990\)), and experimental data [50,51,52]

For a mixture of 2-butanone and ethanol, Fig. 2 demonstrates how accounting for induced association can enhance the description of binary vapor/liquid equilibria: The pure prediction (\(k_{ij}=0\)) is not able to capture the behavior of the mixture at all. A single binary interaction parameter \(k_{ij}\) can be used to obtain an azeotrope in the correct temperature range. However, the shape of the phase diagram still differs significantly from the experimental data. Finally, adjusting a single cross-association parameter \(\varepsilon ^{A_iB_j}\) for 2-butanone/ethanol gives an excellent description of the phase behavior over the entire composition range. To achieve a consistent comparison regarding the number of degrees of freedom and because the parameters are strongly correlated, the value of \(\kappa ^{A_iB_i}\) for 2-butanone is chosen to be identical to that of ethanol.

3.1 Homosegmented group-contribution method

The homosegmented group-contribution model for PCP-SAFT by Sauer et al. [36] calculates PCP-SAFT parameters from the number of occurrences \(n_{i,\alpha }\) of group \(\alpha\) on molecule i and group-specific parameters \(m_\alpha\), \(\sigma _\alpha\), and \(\varepsilon _\alpha\) using the sum rules

$$\begin{aligned} m_i&=\sum _{\alpha \in i}n_{i,\alpha }m_\alpha \end{aligned},$$
(19)
$$\begin{aligned} m_i\sigma _i^3&=\sum _{\alpha \in i}n_{i,\alpha }m_\alpha \sigma _\alpha ^3 \end{aligned},$$
(20)
$$\begin{aligned} m_i\varepsilon _i&=\sum _{\alpha \in i}n_{i,\alpha }m_\alpha \varepsilon _\alpha \end{aligned}.$$
(21)

Polar fluids require a dipole moment \(\mu _i\). However, a similar group-contribution model for polar components is challenging because the dipole moment is not additive; opposing dipole moments within the same molecule can cancel each other. For associating molecules, Wertheim’s theory is capable of describing arbitrarily many association sites per molecule. However, only fluids with up to one polar or associating group were considered for the parametrization. Therefore, we also restrict the study to fluids with at most one polar or associating group. If a molecule contains a polar or associating group, the parameters \(\mu _i\), \(\kappa ^{A_iB_j}\), \(\varepsilon ^{A_iB_j}\), \(n_i^A\), and \(n_i^B\) are set to the corresponding group parameters.

Compared to a description of PCP-SAFT with individually adjusted pure-component parameters, the GC method has increased deviations in the description of pure-component vapor pressures. These deviations propagate in the calculation of binary phase equilibria and lead to systematic errors in the prediction of bubble and dew point pressures. By construction, binary interaction parameters do not affect pure-component properties, and it is therefore not desired that binary interaction parameters correct deviations in the mixtures stemming from inaccuracies that are already present for pure components. To alleviate this effect, we modify the sum rule for \(\varepsilon _i\) as

$$\begin{aligned} m_i\varepsilon _i^*=\varphi _i\sum _{\alpha \in i}n_{i,\alpha }m_\alpha \varepsilon _\alpha \end{aligned}$$
(22)

and adjust the parameter \(\varphi _i\) to pure-component vapor pressures for every component for which experimental data (\(T_k^{\textrm{exp}}\), \(p_k^{\textrm{sat,exp}}\)) is available by minimizing the residual

$$\begin{aligned} {{\mathcal {L}}}^{\textrm{pure}}(\varphi _i)=\frac{1}{2}\sum _k \left( \ln \left( \frac{p^{\textrm{sat}}(T_k^{\textrm{exp}}, \varphi _i)}{p_k^{\textrm{sat,exp}}}\right) \right) ^2 \end{aligned}.$$
(23)

The adjusted \(\varepsilon _i^*\) parameters are used for fitting binary parameters. For the results shown later in Fig. 5; however, the values predicted from the GC model for \(\varepsilon _i\) are used (corresponding to \(\varphi _i=1\) in Eq. 22. As a result, these systems exhibit a systematic deviation between model results and experimental data. However, we regard these deviations as preferable compared to binary parameters that incorrectly change the behavior of mixtures to counteract inaccuracies in the description of pure components.

3.1.1 GC method for binary interaction parameters

We extend the homosegmented GC method with the prediction of binary interaction parameters. A sum rule is used to determine a binary component/component interaction parameter \(k_{ij}\) based on the group counts \(n_{i,\alpha }\) and binary group/group interaction parameters \(k_{\alpha \beta }\). In line with the sum rules for the pure-component parameters, we choose a sum rule which is linear in \(k_{\alpha \beta }\), as

$$\begin{aligned} k_{ij}=\frac{\sum _{\alpha \in i}\sum _{\beta \in j}n_{i,\alpha }n_{j,\beta }k_{\alpha \beta }}{\sum _{\alpha \in i}\sum _{\beta \in j}n_{i,\alpha }n_{j,\beta }} \end{aligned}.$$
(24)

The sum rule is similar to the one proposed by Peters et al. [53] and even identical to it in the limit of vanishing \(k_{\alpha \beta }\) and \(k_{ij}\). An alternative linear sum rule, motivated by the one-fluid theory used in the derivation of PCP-SAFT, was assessed. Because no significant improvement was found, the simpler Eq. 24 was chosen. A comparison is shown in the Supplementary Material.

3.2 Heterosegmented Group-Contribution Method

Sauer et al. [36] parametrized a homosegmented and a heterosegmented GC method for PCP-SAFT. The heterosegmented approach accounts for interactions between individual segments, comparable to other heterosegmented equations of state, like SAFT-\(\gamma\)-Mie [39]. As a consequence, the heterosegmented approach does not simplify to PCP-SAFT, even if the molecule is assumed to be homosegmented and is significantly more computationally expensive. However, Sauer et al. [36] showed that the heterosegmented model is also more accurate in the description of pure-component vapor pressures and liquid densities for all chemical families under consideration. Furthermore, the refinement of individual segments rather than entire molecules leads to a more physical description of inhomogeneous systems, like interfaces [54, 55].

The heterosegmented group-contribution method for PCP-SAFT uses the same perturbation terms as regular PCP-SAFT but with individual groups as species rather than molecules [36, 41]. The interaction energy between two groups is modeled using Lorentz–Berthelot combining rules and a binary group/group interaction parameter, as

$$\begin{aligned} \varepsilon _{\alpha \beta }=\sqrt{\varepsilon _\alpha \varepsilon _\beta } \left( 1-k_{\alpha \beta }\right) \end{aligned}.$$
(25)

To preserve the essence of a binary interaction parameter, \(k_{\alpha \beta }\) is always set to 0 for group/group pairs on the same molecule. Analogously to the homosegmented group-contribution method, we introduce a component-specific parameter \(\varphi _i\) [54] that is adjusted to pure-component vapor pressure data, as

$$\begin{aligned} \varepsilon _{\alpha \beta }^*=\sqrt{\varepsilon _\alpha \varphi _i \varepsilon _\beta \varphi _j}\left( 1-k_{\alpha \beta }\right) \end{aligned}.$$
(26)

Because, the heterosegmented model explicitly accounts for interactions between groups, it requires no sum rule for the molecule/molecule interaction parameter.

4 Parametrization

Recently, an extensive database of PCP-SAFT parameters was published by Esper et al. [43]. The parameters were obtained by fitting to experimental vapor pressure and liquid density data from the Dortmund Data Bank (DDB) [46], DIPPR [56, 57], and ThermoML [58, 59]. In this work, this parameter database is extended with binary interaction parameters adjusted for mixtures for which experimental VLE data are available from the DDB. The VLE data in the DDB are vast and although many possibly spurious datapoints are marked, unreliable or inaccurate data can be encountered in some cases. Jaubert et al. [45] present a curated subset of the DDB VLE data that contain accurate data selected to cover all forms of hydrogen bonding between the constituents. The rigorous selection of the data is valuable for a comparison study; however, the comparably small number of binary mixtures (200) is insufficient for a large-scale parametrization, particularly regarding group-contribution methods. Consequently, suspicious data from DDB were filtered out manually whenever the results suggest inconsistencies or inaccuracies in the data. Some components for which pure-component parameters are available from Esper et al. [43] were excluded from the parameter estimation, mostly because PCP-SAFT cannot reproduce the experimental data with or without binary interaction parameters. Notably, all components that do not contain any C atom were excluded except for water, ammonia, and sulfur dioxide. This includes gases, like oxygen and nitrogen, which are mostly measured at low temperatures or high pressures. For some systems, PCP-SAFT shows weaknesses in the description of states close to the critical point. In many cases, the predictions of supercritical phase equilibria can be satisfying, but we want to avoid adjusting binary interaction parameters in a region with known model weaknesses. Phase equilibria of water or \(\hbox {CO}_{2}\) with many other components are challenging to describe with SAFT models. Nevertheless, we keep them as part of this study because of their general importance. Currently, however, there is no conclusion in the field on how to best model water using SAFT models [60,61,62] and therefore, it might be necessary to choose a different parameter set that is fine-tuned to a specific application.

As described in Sect. 3, mixtures containing a self-associating component and a component with hydrogen bond-acceptor but no donor sites can be described more accurately by adjusting the association energy parameter \(\varepsilon ^{A_iB_j}\) instead of a binary interaction parameter \(k_{ij}\). Including this induced association contribution improves the description of mixtures for simple systems, in which the deviation in the prediction can be attributed to the missing hydrogen bonds (e.g., the mixture of 2-butanone and ethanol shown in Fig. 2). Previous studies by Cripwell et al. [29] and Smith et al. [32] show that the association energy and volume parameters are strongly correlated. Therefore, the \(\kappa ^{A_iB_i}\) parameter of the non-self-associating component is set to that of the self-associating component. To obtain a predictive model within a confined molecular space, previous studies adjust association parameters for entire chemical families rather than individual components [29, 30, 32]. Due to the large scope of our parametrization, a classification of molecules to chemical families is not straightforward and we are left with adjusting association parameters for individual mixtures. A more predictive approach on smaller molecular space is provided by the group-contribution methods detailed in Sects. 4.1 and 4.2. For systems for which the description of the mixture is generally poor, adjusting \(k_{ij}\) can result in lower deviations compared to accounting for the induced association. Therefore, we adjust \(k_{ij}\) and \(\varepsilon ^{A_iB_j}\) in two separate optimizations and pick the parametrization with the lower deviation from experimental data. Mixtures that do exhibit induced association but are nevertheless parametrized with \(k_{ij}\) typically consist of larger molecules in which the association energy contribution is small compared to the dispersive attraction. The regression results in the Supplementary Information contain the optimized parameters and corresponding deviations for both parametrization strategies. We observed that improvements from adjusting both \(k_{ij}\) and induced association parameters together for a mixture are generally small to negligible.

Similar to mixtures exhibiting induced association, mixtures of two self-associating components can also be corrected by adjusting the cross-association energy parameter \(\varepsilon ^{A_iB_j}\). Again, for some mixtures, adjusting a \(k_{ij}\) instead leads to better results and again, we refrain from adjusting both parameters at once because their effects on the phase equilibria are similar.

The target function that is used for all parameter estimations of binary systems is

$$\begin{aligned} {{\mathcal {L}}}^{\textrm{binary}}(\varvec{\theta })= & {} \frac{1}{n^{\textrm{bubble}} +n^{\textrm{dew}}}\left( \sum _{k=1}^{n^{\textrm{bubble}}}\left( \ln \left( \frac{p^{\textrm{bubble}}(T_k,\varvec{x}_k,\varvec{\theta })}{p_k^{\textrm{bubble,exp}}} \right) \right) ^2\right. \nonumber \\{} & {} \left. +\sum _{k=1}^{n^{\textrm{dew}}}\left( \ln \left( \frac{p^{\textrm{dew}} (T_k,\varvec{y}_k,\varvec{\theta })}{p_k^{\textrm{dew,exp}}}\right) \right) ^2\right) \end{aligned},$$
(27)

where \(\varvec{\theta }\) can refer to binary interaction parameters, association parameters, or group/group interaction parameters.

4.1 Homosegmented Group-Contribution Method

The extension of the homosegmented group-contribution method two mixtures is split into two parts. First, in Sect. 4.1.1, association parameters are adjusted for the polar groups, to be able to model the induced association between polar and self-associating components. The resulting parametrization predicts mixtures with similar accuracies for all types of intermolecular interactions. In Sect. 4.1.2, the prediction is then enhanced further by adjusting the binary group/group interaction parameters \(k_{\alpha \beta }\) introduced in Eq. 24.

4.1.1 Group Parameters for Induced Association

The quality of mixture predictions using the homosegmented GC method by Sauer et al. [36] is assessed by adjusting binary interaction parameters \(k_{ij}\) for every mixture individually using pure-component parameters obtained from the GC method. The so-obtained binary interaction parameters are used to quantify the accuracy of the predictions, with \(k_{ij}=0\) meaning that the model predicts the mixture perfectly and high absolute values of \(k_{ij}\) implying that the predictions are inaccurate. Using \(k_{ij}\) as a measure for the accuracy of the prediction rather than residuals with respect to experimental data suppresses the effect of the data, which varies significantly in the amount, the quality, and the temperature or pressure distribution between the different mixtures. At the same time, high absolute values of \(k_{ij}\) can be false negatives in the sense that the model predicts the mixture qualitatively well but with systematic quantitative errors. These false negatives occur particularly for mixtures with components that have similar vapor pressures.

In total, binary interaction parameters \(k_{ij}\) are adjusted for 1721 mixtures consisting of 267 individual components. The distribution of the resulting \(k_{ij}\) values is shown in Fig. 3. The mixtures are grouped by the polarity of their constituents based on their PCP-SAFT parameters (n: non-polar, p: polar (\(\mu _i>0\)), a: (self-)associating (\(n^A_in^B_i>0\))). For five of the six mixture categories, the binary interaction parameters are centered around 0. For non-polar/polar mixtures, the distribution is spread out, showing that correcting the properties of these mixtures with a binary interaction parameter is more necessary than, e.g., for non-polar/non-polar mixtures. Most striking, however, is the systematic error in the prediction of mixtures combining one polar and one self-associating component: For these mixtures, negative \(k_{ij}\) parameters are required throughout, which means that the attractive interactions between unlike molecules are underestimated. The group of polar components consists of aldehydes, ketones, ethers, formates, and esters, all of which can form hydrogen bonds with molecules containing hydrogen bond-donor sites but not with themselves. The results strongly suggest that neglecting this interaction leads to inaccurate mixture property predictions that need to be compensated by large negative binary interaction parameters.

Fig. 3
figure 3

Distribution of \(k_{ij}\) parameters for individual mixtures based on the homosegmented GC method [36]. The mixtures are grouped by the polarity of their constituents based on their PCP-SAFT parameters (n: non-polar, p: polar (\(\mu _i>0\)), a: (self-)associating (\(n^A_in^B_i>0\))). Accounting for the induced association between polar and self-associating components removes the systematic error in the description of the mixtures. The number above the plots indicates the number of mixtures in each group

As exemplified in Fig. 2, a more precise and physically meaningful approach to reduce the prediction error is to explicitly consider the induced association between the polar molecules and the associating molecules. Therefore, the number of association sites for each polar group is determined based on the number of hydrogen bond-donor (\(n^A_i\)) and acceptor (\(n^B_i\)) sites. This strategy assigns one hydrogen bond-acceptor site (\(n^B_i=1\)) to aldehydes, ketones, and ethers, whereas for esters and formates, both oxygen atoms are counted as acceptor sites which leads to \(n^B_i=2\). The remaining parameters \(\kappa ^{A_\alpha B_\alpha }\) and \(\varepsilon ^{A_\alpha B_\alpha }\) are then adjusted to experimental data of entire chemical families. We consider alcohols as the most important class of molecules that can form hydrogen bonds with non-self-associating, polar molecules. Therefore, the association parameters \(\kappa ^{A_\alpha B_\alpha }\) and \(\varepsilon ^{A_\alpha B_\alpha }\) are adjusted to mixtures between alcohols and molecules of the polar family (e.g., esters). Amines are not used for parameter estimation because significantly less data is available. The parameters \(\kappa ^{A_\alpha B_\alpha }\) and \(\varepsilon ^{A_\alpha B_\alpha }\) are strongly correlated and require data over a wide temperature for robust parameter estimation. In this study, the benefit of adjusting both parameters was found to be small. Therefore, to circumvent the correlation between the two parameters, \(\kappa ^{A_\alpha B_\alpha }\) is chosen to be identical to the value for the hydroxy group. Furthermore, only one parameter was adjusted for all ethers, despite Sauer et al. [36] introducing two separate groups (\(\hbox {OCH}_3\) and \(\hbox {OCH}_{2}\)). In total, only five parameters were adjusted to data of 267 binary mixtures. The resulting parameters are shown in Table 1.

Table 1 Induced association parameters for the polar groups

Readjusting the binary interaction parameters \(k_{ij}\) using the group parameters that include hydrogen bond-acceptor sites for polar molecules (from Table 1) rectifies the systematic error in the description of mixtures of polar and associating components, as can be seen in Fig. 3.

4.1.2 Binary Group/Group Interaction Parameters

To parametrize the group-contribution method, Eq. 24, for the binary interaction parameter, the \(k_{\alpha \beta }\) values in Eq. 24 requires a large amount of binary vapor/liquid equilibrium data. Our study relies on VLE data from the Dortmund Data Bank (DDB) [46]. Even though the simultaneous determination of all bubble and dew points, including the calculation of gradients with respect to the \(k_{\alpha \beta }\) values, is possible within seconds using the method described in Sect. 2, finding an optimum in such a high-dimensional space is challenging. Additionally, to obtain robust parameters, it is advisable not to adjust group/group interaction parameters to chemical families with which they are not primarily associated. As an example, the large amount of data for MTBE and similar branched ethers would lead to an overfit of the \({>}\hbox {C}{<}\) group parameters to mixtures containing ethers which in turn can have a negative effect on the description of mixtures containing branched alkanes.

Therefore, analogous to the method of Sauer et al. [36], group/group interaction parameters are fitted subsequently to mixtures containing the chemical families for which the pure-component group parameters were also adjusted to. The availability of mixture data for the various chemical families is shown in Fig. 4. “Others” refers to components that can be built with the groups defined by Sauer et al. [36] but do not belong to any of the chemical families used for determining the parameters.

Fig. 4
figure 4

Number of individual binary mixtures for every combination of chemical families. The letter in parentheses indicates the polarity of the molecules (n: non-polar, p: polar, a: (self)-associating)

Every molecule belonging to the chemical families shown in Fig. 4 (excluding “others”) can be fragmented into alkane groups plus up to three groups that are characteristic for the specific family. Therefore, the parametrization starts with alkane/alkane mixtures to obtain parameters for the alkane/alkane group combinations. We observed a good representation of mixtures containing linear or branched alkanes with all \(k_{\alpha \beta }=0\). Because adjusting \(k_{\alpha \beta }\) to alkane mixtures only led to marginal improvements or even deteriorations due to the influence of outliers in the data, we decided to set \(k_{\alpha \beta }\) to zero for alkane mixtures.

In the next step, alkane/non-alkane mixtures are investigated to obtain the group/group interaction parameters between alkane and non-alkane groups. Again some of the mixtures are predicted sufficiently well by PCP-SAFT. Therefore, no group/group parameters are used for mixtures of alkanes with alkenes, cyclohexanes, cyclopentanes, alcohols, and amines.

Finally, non-alkane/non-alkane mixtures are adjusted. We do not consider group/group interaction parameters for several chemical family combinations (alkenes/alkenes, alkenes/cyclohexanes, alkenes/cyclopentanes, alkenes/aldehydes, alkenes/alcohols, alkenes/amines, cyclohexanes/cyclohexanes, cyclohexanes/cyclopentanes, cyclohexanes/amines, alcohols/alcohols), either because of insufficient data quality or because predictions with \(k_{\alpha \beta }=0\) already lead to good results with minor improvements for adjusted parameters. Additionally, Fig. 4 shows how no experimental data is available for several family combinations containing alkynes, cyclopentanes, and amines.

4.2 Heterosegmented Group-Contribution Method

For the parametrization of the binary group/group interaction parameters of the heterosegmented GC model, the same strategy as for the homosegmented approach is applied, including setting the same group/group combinations to zero. Parametrizing the model on the identical dataset allows for a comprehensive comparison between the two modeling paradigms.

5 Results

To obtain an impression of how to interpret the error metrics discussed in this section, Fig. 5 shows phase diagrams for selected binary mixtures. The results from the PCP-SAFT parameter fit, the homosegmented GC method, and the heterosegmented GC method are compared to experimental data from the DDB [46]. The mixtures are selected based on the polarity of their constituents and the performance of the model. The rows show from top to bottom: non-polar/non-polar, non-polar/polar, polar/polar, associating/non-polar, associating/polar, and associating/associating mixtures. The first column gives examples of mixtures modeled accurately with all three methods. The examples in the center columns highlight systems where the GC methods show larger deviations from experimental data. Finally, the third column shows problematic systems, in part due to low data quality.

Fig. 5
figure 5

Phase diagrams for selected binary mixtures. Comparison of PCP-SAFT with fitted parameters (black), the homosegmented GC method (green), and the heterosegmented GC method (purple) with experimental data (DDB) [46]. The percentages in the legend refer to the MARD of all data points for the mixture. Rows from top to bottom: non-polar/non-polar, non-polar/polar, polar/polar, associating/non-polar, associating/polar, and associating/associating mixtures. Columns from left to right: mixtures modeled accurately with all three methods, systems in which the GC methods show larger deviations from experimental data and problematic systems, in part due to low data quality (Color figure online)

The examples in Fig. 5 showcase how all classes of mixtures can be described satisfactorily with PCP-SAFT and both group-contribution methods. The high accuracy over all considered classes of mixtures is only possible with the inclusion of induced association in the model. Deviations to the experimental data are mainly from inaccuracies in the GC models’ description of pure-component phase equilibria (e.g., for benzene, ethanol, or 2-methylpropan-1-ol). The percentages in the legend refer to the mean average relative deviation (MARD) of the bubble or dew point pressures for all data points of the given mixture. Therefore, the values also include results at temperatures or pressures not shown in the plots. It is also noteworthy to point out that deviations in pressure that can appear satisfactory in a pure-component vapor pressure diagram that spans multiple orders of magnitude, or a binary phase diagram of a wide-boiling mixture, can look drastic in a phase diagram of a narrow-boiling mixture, like benzene/methylcyclopentane.

The pentan-2-one/butan-1-amine system exemplifies how PCP-SAFT struggles to describe the properties of mixtures containing amines accurately. This is partly due to the lower availability of experimental data, particularly compared to alcohols. However, even for systems for which experimental data is plentiful, the individual parametrization still fails to describe the double azeotrope in the pentan-2-one/butan-1-amine system and others. A possible reason is that the asymmetry between the strength of the hydrogen bond-donor and acceptor sites in amines is not captured with the current parametrization. Resolving this asymmetry might, therefore, be a route to more accurate descriptions of mixtures containing amines in future.

Fig. 6
figure 6

Distribution of the MARD in bubble and dew point pressures for the parametrization of individual mixtures for (a) all systems and (b) systems that contain \(\hbox {CO}_{2}\) or water. Comparison between predictions with \(k_{ij}=0\) (gray) and either fitted \(k_{ij}\) (blue) or fitted \(\varepsilon ^{A_iB_j}\) (red) values. The horizontal lines and numbers below the graphs correspond to the median values of each distribution. The number of mixtures within each set is indicated below the horizontal axis (Color figure online)

5.1 Parametrization of Individual Mixtures

Figure 6a shows the distribution of the MARD for all mixtures for which individual binary interaction parameters were adjusted. The data is split into the kind of interaction considered: the first set contains all mixtures for which a binary dispersion interaction parameter \(k_{ij}\) was adjusted. The second set consists of mixtures that can be described better by accounting for induced association rather than adjusting \(k_{ij}\). The third set contains the mixtures in which cross-association occurs and that can be described more accurately by fitting the cross-association energy parameter \(\varepsilon ^{A_iB_j}\) rather than \(k_{ij}\). In all cases exactly one parameter (\(k_{ij}\) or \(\varepsilon ^{A_iB_j}\)) is adjusted.

Adjusting a single binary interaction parameter has a significant effect on the distribution of the MARD; for the bulk of molecules for which a \(k_{ij}\) parameter was adjusted, the median value of the MARD (indicated by the horizontal line in Fig. 6) is reduced from 9.4 % to 2.2 %. The mixtures that exhibit induced association are generally challenging, which is indicated by the higher median MARD for pure predictions of 13.4 %. With the adjusted association energy parameters \(\varepsilon ^{A_iB_j}\), the median MARD can be reduced to 2.9 %. For mixtures of two self-associating components, the sample size is smaller, but the reduction of the median MARD from 11.1 % to 3.2 % is in line with the general trend.

The median errors are partly driven up by few components that form mixtures that are difficult to describe with PCP-SAFT or SAFT models in general. As particular important examples of fluids that are difficult to model with SAFT equations of state, Fig. 6b shows the distribution of errors for mixtures containing water or \(\hbox {CO}_{2}\). The parametrization of Esper et al. [43] is focused on the breadth of the molecular space and, therefore, does not consider quadrupolar interactions, which are only significant for few molecules, \(\hbox {CO}_{2}\) being one of them. As a consequence, quadrupolar interactions are part of the effective dispersive interactions and including induced association between \(\hbox {CO}_{2}\) and molecules with hydrogen bond-donor sites does not improve the description of the mixture. Therefore, all mixtures containing \(\hbox {CO}_{2}\) are modeled using a \(k_{ij}\) parameter.

For water, a consensus on the correct or optimal parametrization is even further away [60,61,62]. Esper et al. [43] use a 2B association scheme and no dipole moment. As a consequence, both induced association and cross-association can be considered in this study. For many systems, a reasonable description can be achieved with the parameters provided in this work. However, it remains necessary to be careful with parameters when studying systems containing components in general known to be difficult to model.

Fig. 7
figure 7

Median MARD in bubble and dew point pressures for combinations of chemical families. Large deviations can be observed for mixtures containing inorganic compounds and, to a lesser extent, carboxylic acids, amines, and nitriles. The remaining combinations of chemical families are described well with median deviations below \(5\%\)

To quantify which groups of chemicals are problematic, Fig. 7 shows the median MARD for different combinations of chemical families. The available experimental data and, hence, the published binary interaction parameters cover a large part of the matrix of chemical families, and for most combinations, the deviations are low. Higher deviations occur, in particular, for systems containing inorganic compounds. In this study, the family of inorganic compounds consists of ammonia, carbon dioxide, carbon monoxide, carbonyl sulfide, sulfur dioxide, and water. Other classes of molecules that show above-average deviations are carboxylic acids, which have a unique dimerization behavior in the vapor phase, and molecules containing nitrogen (nitriles and amines). Through this analysis, our study also gives guidance on where additional modeling effort is required to make SAFT models applicable for an even broader molecular space.

Mixtures between common organic components like hydrocarbons, esters, ethers, ketones, or alcohols are described particularly well, which can be expected from PCP-SAFT. However, it is remarkable that the addition of halogenated compounds, of which 107 are present in this study, does not worsen the deviations at all. The observation shows how PCP-SAFT is also suited well for the design of heat pumps or similar thermodynamic cycle processes with mixed refrigerants [63].

5.2 Group-Contribution Methods

Figure 8 shows the values of the individual group/group interaction parameters for (a) the homosegmented GC method and (b) the heterosegmented GC method. In both cases, the interaction between polar and non-polar groups tends to be corrected with a small negative or even positive \(k_{\alpha \beta }\), which hints at an underestimation of the strength of dipolar interactions. The correction from the combination of polar and non-polar groups is also present in the mixture between two polar components and has to be corrected by larger negative \(k_{\alpha \beta }\) for the combinations of polar groups with each other. One notable trend is visible for the alkyne group, which requires strong negative \(k_{\alpha \beta }\) for all combinations with polar groups. The observation suggests that a more physical model could be obtained if the polarity of the triple bond is reflected in the PCP-SAFT parameters.

Fig. 8
figure 8

Adjusted values for the binary group/group interaction parameters \(k_{\alpha \beta }\). (a) Homosegmented GC method (Eq. 24) and (b) heterosegmented GC method (Eq. 25)

Fig. 9
figure 9

Distribution of the MARD in bubble and dew point pressures for all mixtures that are part of the GC parametrization. Comparison between the results using the unaltered parameters from Sauer et al. [36] (gray) and including induced association and binary group/group interaction parameters (green/purple) and with (right) and without (left) adjusting the component-specific correction parameter (Color figure online)\(\varphi _i\). The horizontal lines and numbers correspond to the median values of each distribution

A quantitative comparison of the different GC methods is shown in Fig. 9. For the determination of the induced association and binary group/group interaction parameters, the values for \(\varphi _i\) were adjusted. During the optimization, the MARD is reduced considerably from 5.3 % to 3.0 % (homosegmented) and 6.1 % to 3.6 % (heterosegmented). Notably, the MARD values with \(\varphi _i\) adjusted are already on a low level, which is a testament to the quality with which PCP-SAFT predicts mixture behavior based on pure-component parameters.

In an actual application of the GC methods, e.g., a molecular design or property prediction for a novel component, the values for \(\varphi _i\) are not available. In this case, the reduction of the MARD by including group/group interaction parameters is small, especially for the homosegmented approach (7.9 % to 6.4 %). For the heterosegmented approach, a larger reduction from 7.5 % to 5.1 % is achieved. The reason for the small effect of binary corrections on the MARD is that the inaccuracies in the description of pure components carry over to the description of mixtures. What is not captured by the MARD is that including the binary corrections delivers a better description of the shape of the phase diagram, a property that in some applications (in particular separation) can be more important than the absolute value of the pressure. An example of this behavior is the acetaldehyde/propan-2-one mixture in Fig. 5. The MARD of both GC methods is high, but the shape of the phase diagram is captured perfectly well.

Fig. 10
figure 10

Distribution of the MARD in relative volatilities \(\alpha _{12}\). Comparison between the results using the unaltered parameters from Sauer et al. [36] (gray) and including induced association and binary group/group interaction parameters (green/purple). The results are calculated without correcting pure-component vapor pressures (\(\varphi _i=1\)). The horizontal lines and numbers correspond to the median values of each distribution (Color figure online)

The ability to separate a mixture by distillation is best described by the relative fugacity \(\alpha _{12}=\frac{y_1/x_1}{y_2/x_2}\). Experimentally obtained values of \(\alpha _{12}\) are highly sensitive to errors in the concentration measurements and require the composition of the liquid (\(x_i\)) and the vapor (\(y_i\)) phase. Therefore, it is not well suited to be used in the objective function for the parametrization. Instead, it can be used to assess the quality of the final model. Figure 10 shows the distribution of MARD in \(\alpha _{12}\) values for all mixtures for which Tpxy data are available. Accounting for induced association and using adjusted group/group interaction parameters reduces the median MARD by a third for both the homosegmented and the heterosegmented GC methods.

In conclusion, the GC methods already capture the separation behavior of the mixture well. With an improved description of the pure components (as demonstrated by the adjusted \(\varphi _i\) parameter) a fully quantitative description of the mixture can be achieved.

6 Conclusion

This work provides binary PCP-SAFT interaction parameters for over 7000 mixtures. The quality of the mixture description can be enhanced significantly by accounting accurately for hydrogen bonds in the system. With the combination of pure-component parameters by Esper et al. [43] and the binary interaction parameters (for dispersion or association) from this work, the median deviation to experimental data over all available binary systems is 2.3 %.

On top of that, binary group/group interaction parameters are adjusted for the homosegmented and heterosegmented group-contribution methods for PCP-SAFT [36]. Deviations to experimental mixture data for both GC methods largely stem from inaccuracies that carry over from the description of pure components. Overall, the heterosegmented GC method shows slightly smaller deviations to experimental bubble and dew point data (\(5.1\%\)) than the homosegmented approach (\(6.4\%\)). It can, therefore, be a fruitful endeavor to further strengthen the heterosegmented modeling paradigm in future work, e.g., by considering fused-sphere chains [64] as reference fluid.

The study shows how capturing more physics in the interaction between molecules improves the description of mixtures. Parameters that are strongly correlated in the calculation of pure-component properties, like the strength of different attractive interactions (dispersive attraction, polar interactions, and hydrogen bonding), can be distinguished in mixtures of unlike molecules. With today’s availability of data and computational power, it is therefore inevitable to move closer to parametrizing models more holistically on pure component and mixture data simultaneously in future. Such a parametrization endeavor also greatly benefits from our proposed method of rapidly calculating derivatives of bubble and dew point pressures with respect to model parameters using reverse mode automatic differentiation.

The database of binary interaction parameters is available online and can be readily used together with the \(\text {FeO}_\text {s}\) [47] open-source toolkit or any implementation of PCP-SAFT that provides the necessary implementations of associative interactions. With this comprehensive database of PCP-SAFT parameters, we hope to facilitate further the application of molecular equations of state in industry and academia. The binary group/group interaction parameters can be used to estimate properties of systems containing novel molecules and enhance the description of mixtures in computer-aided molecular and process design [65, 66].