Development of a method to model the mixing energy of solutions using COSMO molecular descriptors linked with a semi-empirical model using a combined ANN-QSPR methodology

https://doi.org/10.1016/j.ces.2020.115764

Highlights

  • A relationship between the model parameters and the molecular interactions is assumed.

  • A link is established between the model-parameters/COSMO molecular descriptors pairs.

  • A combined ANN-QSPR procedure is used to quantify the parameter/descriptor relation.

  • The work shows the feasibility of the methodology developed.

Abstract

A methodology is presented that links the interaction parameters of a semi-empirical model to COSMO molecular descriptors in order to represent hE(x). A database of 4395 values for ester-alkane systems is used to check the procedure. Relationships are established for the model coefficients, ai1,i2,…ip, although only for a12, and for a second parameter, k21, which governs the asymmetry of the representation of the mixing process. A discretization of the σ-profile and its statistical moments constitute two descriptors, expressed as vectors, Sσ-profilej and Sσ-momentp, for a12. A third molecular descriptor/vector is defined for k21, based on the Kullback-Leibler divergence and the size of the molecules, and is presented here for the first time as:

$$S_{R\sigma\text{-profile}} = \left[A_1,\; A_2,\; D_{KL}[p_1(\sigma)\,\|\,p_2(\sigma)],\; D_{KL}[p_2(\sigma)\,\|\,p_1(\sigma)]\right]$$

The clustering of systems in the a12/k21 space can be predicted from molecular descriptors in relation to their main interactions. A combined ANN-QSPR approach estimates the parameters a12 and k21 from the molecular descriptors. The methodology generalizes the procedure and represents the energetic effects of solutions acceptably (R2 > 0.9).

Introduction

The estimation of physicochemical properties of pure components and solutions is one of the most interesting topics in thermodynamics applied to chemical engineering. The contributions in the current literature differ in nature and can be grouped as: (a) theoretical models, developed from the thermodynamic formalism and quantum chemistry, using information on the molecular structure of the compounds; (b) semi-empirical models, either with a purely empirical mathematical form or combining both sciences, which can be called thermodynamic-mathematical, and whose parametrization is carried out using a limited amount of experimental data for some properties, such as phase equilibria, solution properties, etc.; and (c) models based on data-learning, where the information provided by theoretical thermodynamics is omitted; some of these models could be included in (b).

Within the first group, there are some important contributions (Renon and Prausnitz, 1968), although for this work we are especially interested in those that arise from quantum-chemical-based models, such as COSMO-RS (Klamt, 1995) or COSMO-SAC (Lin and Sandler, 2002). These models have a sound scientific basis and minimize the empirical input. Of the two COSMO-based models (Klamt and Schüürmann, 1993), the one proposed by Klamt (1995) stands out in the modelling of liquid-phase properties of a wide range of solutions, including non-polar, polar and associating compounds, using only theoretical quantum-mechanical calculations of the molecules' screening charges. However, so far, the methods based on first principles (a) do not produce accurate results, especially when there are specific interactions in the solutions. In such cases, an alternative that overcomes certain inconveniences is to use semi-empirical models, as indicated in (b). These models are usually defined for the excess Gibbs energy, gE, such as NRTL (Renon and Prausnitz, 1968), UNIQUAC (Abrams and Prausnitz, 1975), and others. These functions are supported by a certain degree of theory but, in practice, they are used almost exclusively as correlative models of experimental data. The third case mentioned above (c) arises when a mathematical relationship is generated, often without physical meaning, to correlate the experimental data.

One of the most important aspects to consider in this topic is experimentation, because any modelling requires good-quality experimental data. One of the quantities that can be measured directly and precisely for solutions is the mixing enthalpy, hE. Therefore, many theoretical models of different nature use estimates of this property (Lai et al., 1978, Gmehling et al., 1993) to verify the consistency of the model, and even to interpret its behavior, since the hE values are sensitive to molecular interactions. The hE is obtained from the temperature derivative of the excess Gibbs function, and the information it provides is useful; however, the estimates are still poor, and the discrepancies increase when second derivatives, such as cpE, are calculated. These properties are important not only for the analysis of interactions, but also for their role in the calculation of energy balances in process simulation.
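For reference, the derivative relations mentioned above are the standard Gibbs-Helmholtz expressions, written here at constant pressure p and composition x; they are general thermodynamic identities rather than expressions taken from this work:

$$h^E = -T^2\left[\frac{\partial (g^E/T)}{\partial T}\right]_{p,x}, \qquad c_p^E = \left(\frac{\partial h^E}{\partial T}\right)_{p,x} = -T\left(\frac{\partial^2 g^E}{\partial T^2}\right)_{p,x}$$

Since hE depends on the first temperature derivative of gE and cpE on the second, errors in the temperature dependence of a gE model are amplified in both quantities.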

In an attempt to provide an adaptable equation for the gE-function, a new model (Ortega et al., 2010) was introduced years ago, and since then many works have attempted to validate its usefulness. Some focused on the multiproperty correlation obtained from phase equilibria and mixing effects (of energetic and/or volumetric nature) (Ríos et al., 2014, Ríos et al., 2018). It was shown that the model offers great versatility, improving on the representations obtained with other semi-empirical functions proposed for gE, or for the Helmholtz function, aE, but at the expense of using a greater number of parameters, which must be fitted to experimental data.

Assuming that the molecular interactions between the compounds in solution give rise to excess properties, these macroscopic quantities can be associated with molecular descriptors which, in turn, will be related to the characteristic parameters of the modelling of said quantities. This procedure has been used in other cases, such as: estimation of the binary interaction parameter (kij) of the Peng-Robinson equation of state (EoS) (Abudour et al., 2014, Abudour et al., 2017), that of PC-SAFT (Stavrou et al., 2016), and estimation of the energetic parameters of NRTL (Gebreyohannes et al., 2014). For these works, the structural descriptors of DRAGON and CODESSA databases were used with good results.

The σ-profiles obtained from COSMO have proven useful as molecular descriptors to estimate properties of pure components and mixtures, as well as in the modelling of complex mixtures with an empirical extension (Zissimos et al., 2002) of COSMO-RS. Some studies have shown the dependence of certain properties on the σ-profiles, such as the densities of ionic liquids (IL), environmental properties such as toxicity, and other solvent properties (Palomar et al., 2008a, Palomar et al., 2008b, Torrecilla et al., 2010). A recent application of σ-profiles even involves the evaluation of surface tensions of pure components (Kondor et al., 2014), which play a key role in interfacial transport rates. Also, several researchers (Ortega et al., 2008, Vreekamp et al., 2011, Zaitseva et al., 2016) analyzed the general suitability of the COSMO-RS methodology to estimate the hE of solutions containing several molecular solvents. Therefore, the proposal of this work is to establish relationships between the parameters of a semi-empirical mathematical-thermodynamic model, applied to hE, and the COSMO σ-profiles used as a basis for molecular descriptors. It includes an analysis of the importance of each descriptor, based on the variance of the model parameters that it explains. The relationships are established by generating a Quantitative Structure-Property Relationship (QSPR) (Balaban, 2001, Diudea, 2001) using multilinear analysis and Artificial Neural Networks (ANN). The study is focused on the mixing energies of a set of homologous series of ester-alkane systems. Why have these solution sets been chosen?
There are several reasons. One is the large amount of experimental information available in the literature (Fernández et al., 2010, Fernández et al., 2013, Fernández et al., 2014, Gonzalez et al., 1993, Gonzalez et al., 1994, Gonzalez and Ortega, 1993, Gonzalez and Ortega, 1994, Lorenzana et al., 1989, Lorenzana et al., 1990, Ortega, 1990, Ortega, 1991a, Ortega, 1991b, Ortega, 1992, Ortega et al., 1990a, Ortega et al., 1990b, Ortega et al., 1991, Ortega et al., 1992a, Ortega et al., 1992b, Ortega et al., 1999a, Ortega et al., 1999b, Ortega et al., 1999c, Ortega et al., 2015, Ortega and Gonzalez, 1993a, Ortega and Gonzalez, 1993b, Pérez et al., 2016, Toledo et al., 2000, Vidal et al., 1997), which allows a more rigorous study. Another is that the experimental values of the chosen homologous series change monotonically with the chemical structure of the constituent compounds, and the proposed framework should reflect the phenomenology of the mixing process, which is a working desideratum. Together, these features should make it possible to establish a formal qualitative relationship between the parameters of the thermodynamic model, the descriptors, and the molecular interactions. Lastly, the chosen set is gaining interest for its involvement in different biotechnological processes, such as the enzymatic synthesis of esters (Martins et al., 2011), in food technology (Herrera et al., 2019), and also in chemical engineering, such as biofuel production (Nabi et al., 2006).

Section snippets

A general equation to represent excess properties

In a previous paper (Ortega et al., 2010), the properties generated in the mixing process were mathematically represented as a functional of the intensive variables pressure (p), temperature (T) and a vector of compositions (x) of size equal to n-1, in solutions of n-components. With some simple considerations, where certain hypotheses to interpret the mixing effects are assumed, a mathematical model is developed, as expressed in Eq. (1). The following considerations were raised: (a) the

Description of the QSPR methodology

The relationship between the physicochemical descriptors of solutions (a12 and k21) and the molecular ones (Sσ-profilej, Sσ-momentp, SRσ-profile) is established using a QSPR procedure (Balaban, 2001, Diudea, 2001). The first step is to convert the experimental dataset into a parameter dataset by a Nonlinear Least-Squares (NLS) fitting of the influence coefficients and the interaction-ratio coefficients to the experimental hE values. On the other hand, a multilinear model of the molecular descriptors is
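As an illustration of the kind of calculation described above, the following minimal Python sketch builds a four-component vector in the spirit of SRσ-profile from two discretized σ-profiles and fits a multilinear model by ordinary least squares. It is not the authors' implementation. The function names, the synthetic Gaussian profiles, the use of the summed profile as the size terms A1 and A2, and the regression data are all assumptions made for the example.

```python
import numpy as np


def kl_divergence(p, q, eps=1e-12):
    """Discrete Kullback-Leibler divergence D_KL[p || q] of two histograms."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # Normalize so that both histograms behave as probability distributions.
    p = p / p.sum()
    q = q / q.sum()
    # eps guards against log(0) and division by zero in empty bins.
    return float(np.sum(p * np.log((p + eps) / (q + eps))))


def relative_descriptor(profile_1, profile_2):
    """Hypothetical vector [A1, A2, D_KL[p1||p2], D_KL[p2||p1]].

    Assumption: A1 and A2 are taken as the summed (unnormalized) profiles.
    """
    a1 = float(np.sum(profile_1))
    a2 = float(np.sum(profile_2))
    return np.array([
        a1,
        a2,
        kl_divergence(profile_1, profile_2),
        kl_divergence(profile_2, profile_1),
    ])


def fit_multilinear(X, y):
    """Ordinary least squares with intercept: y ~ b0 + X @ b."""
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef


# Synthetic sigma-profiles (illustrative only, not COSMO output).
sigma = np.linspace(-0.025, 0.025, 51)
p1 = np.exp(-0.5 * (sigma / 0.004) ** 2)
p2 = np.exp(-0.5 * (sigma / 0.008) ** 2)
descriptor = relative_descriptor(p1, p2)

# Synthetic regression: recover known coefficients from noise-free data.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))
true_coef = np.array([1.5, 0.5, -2.0, 3.0, 0.1])
y = true_coef[0] + X @ true_coef[1:]
coef = fit_multilinear(X, y)
```

The two divergence terms generally differ because the Kullback-Leibler divergence is not symmetric, which is why a vector of this form can carry information about the asymmetry between the two components.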

Fitting results

The correlation of data for the chosen systems with Eq. (4) produces a good representation. All hE values, calculated at T = 298.15 K and atmospheric pressure, lie on the diagonal line in Fig. 5(a), with R2 > 0.999, justifying the consideration of second-order interactions, Q = 2, to properly describe the mixing energy of ester + alkane systems. Fig. 5(b) shows the comparison between the k21 obtained from the fitting and those calculated by Bondi's method, Eq. (5); the values for the interaction

Conclusions

The relationship between the COSMO-RS molecular descriptors and the parameters of a semi-empirical model was established to correlate the functional hE = hE(x). In this work, 293 out of 832 systems of the ester(1) + alkane(2) homologous series were used; the dataset, measured at constant pressure and temperature, belongs to the binaries empirically described as: H2v-1Cv-1COO(CH2)u-1CH3 (v = 1–16, u = 1–4) + CnH2n+2 (n = 5–17). It was proven that polarization charge density on the molecular

CRediT authorship contribution statement

Adriel Sosa: Conceptualization, Software, Data curation, Formal analysis, Investigation, Writing - original draft, Supervision, Writing - review & editing. Juan Ortega: Methodology, Investigation, Project administration, Supervision, Conceptualization, Validation, Writing - review & editing, Funding acquisition. Luís Fernández: Formal analysis, Investigation, Data curation. José Palomar: Supervision, Validation.

Declaration of Competing Interest

The authors declare that they have no conflicts of interest in this work.

Acknowledgments

The authors are grateful for financial support from the Spanish Ministry (project PGC2018-099521-B-100). One of us (AS) is grateful to the ACISI (Canary Government, 2015010110) for the support received; LF thanks the Spanish Ministry for the postdoctoral contract received under the Juan de la Cierva program (FJCI-2017-31784).

References (70)

  • A.B. Martins et al. Rapid and high yields of synthesis of butyl acetate catalyzed by Novozym 435: Reaction optimization by response surface methodology. Process Biochem. (2011)
  • M.N. Nabi et al. Improvement of engine emissions with conventional diesel fuel and diesel–biodiesel blends. Bioresour. Technol. (2006)
  • J. Ortega. Excess molar enthalpies at the temperature 298.15 K of (a methyl n-alkanoate+pentane or heptane). J. Chem. Thermodyn. (1992)
  • J. Ortega. HEm {xCH3(CH2)v–1CO2CH3 (v=5 or 6 or 14)+(1–x)CH3(CH2)11CH3, 298.15 K}. J. Chem. Thermodyn. (1991)
  • J. Ortega. Excess enthalpies of (a methyl alkanoate+n-nonane or n-undecane) at the temperature 298.15 K. J. Chem. Thermodyn. (1991)
  • J. Ortega. Measurements of excess enthalpies of a methyl n-alkanoate (from n-hexanoate to n-pentadecanoate)+n-pentadecane at 298.15 K. J. Chem. Thermodyn. (1990)
  • J. Ortega et al. Solutions of alkyl methanoates and alkanes: Simultaneous modeling of phase equilibria and mixing properties. Estimation of behavior by UNIFAC with recalculation of parameters. Fluid Phase Equilib. (2015)
  • J. Ortega et al. Thermodynamic properties of (a methyl ester+an n-alkane). V. HEm and VEm for {xCH3(CH2)u-1CO2CH3(u=1 to 6)+(1–x)CH3(CH2)12CH3}. J. Chem. Thermodyn. (1993)
  • J. Ortega et al. Thermodynamic properties of (a methyl ester + an n-alkane). III. HEm and VEm for {xCH3(CH2)u-1CO2CH3 (u=1 to 6)+(1–x)CH3(CH2)8CH3}. J. Chem. Thermodyn. (1993)
  • J. Ortega et al. Thermodynamic properties of (a methyl ester + an n-alkane) I. HEm and VEm for {xCH3(CH2)u−1CO2CH3 (u=1 to 6)+(1–x)CH3(CH2)6CH3}. J. Chem. Thermodyn. (1992)
  • J. Ortega et al. Experimental and predicted mixing enthalpies for several methyl n-alkanoates with n-pentane at 298.15 K. Thermochim. Acta (1992)
  • J. Ortega et al. Enthalpies of mixing at 298.15 K of a methyl alkanoate (from acetate to pentanoate) with n-alkanes (n-tridecane and n-pentadecane). Thermochim. Acta (1990)
  • J. Ortega et al. Excess molar enthalpies of methyl alkanoates + n-nonane at 298.15 K. Thermochim. Acta (1990)
  • J. Ortega et al. Thermodynamic properties of (an ethyl ester + an n-alkane). XI. HEm and VEm values for xCH3(CH2)uCOOCH2CH3+(1–x)CH3(CH2)2v+1CH3 with u=6, 7, 8, 10, 12, and 14, and v=(1 to 7). J. Chem. Thermodyn. (1999)
  • J. Ortega et al. Thermodynamic properties of (a propyl ester+an n-alkane). XII. Excess molar enthalpies and excess molar volumes for xCH3(CH2)u−1COO(CH2)2CH3+(1–x)CH3(CH2)2v+1CH3 with u=(1 to 3), and v=(1 to 7). J. Chem. Thermodyn. (1999)
  • E. Pérez et al. Contributions to the modeling and behavior of solutions containing ethanoates and hydrocarbons. New experimental data for binaries of butyl ester with alkanes (C5–C10). Fluid Phase Equilib. (2016)
  • M. Stavrou et al. Estimation of the binary interaction parameter kij of the PC-SAFT Equation of State based on pure component parameters using a QSPR method. Fluid Phase Equilib. (2016)
  • M. Vidal et al. Thermodynamic properties of (an ethyl ester + an n-alkane). IX. HmE and VmE for xCH3(CH2)uCOOCH2CH3+(1–x)CH3(CH2)2v+1CH3 with u=0 to 5, and v=1 to 7. J. Chem. Thermodyn. (1997)
  • D.S. Abrams et al. Statistical thermodynamics of liquid mixtures: A new expression for the excess Gibbs energy of partly or completely miscible systems. AIChE J. (1975)
  • A.T. Balaban. A personal view about topological indices for QSAR/QSPR
  • A.D. Becke. Density-functional exchange-energy approximation with correct asymptotic behavior. Phys. Rev. A (1988)
  • A. Bondi. Physical Properties of Molecular Crystals, Liquids, and Glasses (1968)
  • M. Diudea. QSPR/QSAR Studies by Molecular Descriptors (2001)
  • F. Eckert et al. Fast solvent screening via quantum chemistry: COSMO-RS approach. AIChE J. (2002)
  • K. Eichkorn et al. Auxiliary basis sets for main row atoms and transition metals and their use to approximate Coulomb potentials. Theor. Chem. Accounts Theory, Comput. Model. (Theoretica Chim. Acta) (1997)