Elsevier

Chemosphere

Volume 86, Issue 9, March 2012, Pages 959-966

A QSPR model for prediction of diffusion coefficient of non-electrolyte organic compounds in air at ambient condition

https://doi.org/10.1016/j.chemosphere.2011.11.021

Abstract

Evaluation of the diffusion coefficients of pure compounds in air is of great interest for many industrial and air quality control applications. In this communication, a QSPR method is applied to predict the molecular diffusivity of chemical compounds in air at 298.15 K and atmospheric pressure. Four thousand five hundred and seventy-nine organic compounds from a broad spectrum of chemical families were investigated to propose a comprehensive and predictive model. The final model is derived by Genetic Function Approximation (GFA) and contains five descriptors. The model gives satisfactory results, quantified by the following statistics: Squared Correlation Coefficient = 0.9723, Standard Deviation Error = 0.003 and Average Absolute Relative Deviation = 0.3% of the predicted values from the existing experimental values.

Highlights

► A simple QSPR model is presented for the diffusion coefficient of organic compounds.
► A comprehensive diffusion coefficient data set was applied for model derivation.
► The model is stable and predicts the diffusion coefficient with acceptable accuracy.

Introduction

The diffusion coefficient is one of the most important transport properties of fluids. This parameter plays a crucial role in diverse engineering applications, from the design of mass transfer equipment to atmospheric and indoor air pollution control.

From a chemical engineering point of view, designing mass transfer units requires knowledge of mass transfer coefficients. The mass transfer coefficient and the diffusion coefficient both appear in an important dimensionless group, the Sherwood number, which is defined as follows (Treybal, 1980):

$$\mathrm{Sh} = \frac{k\,x}{D} \tag{1}$$

In Eq. (1), $k$, $x$ and $D$ refer to the mass transfer coefficient, the characteristic length and the diffusion coefficient, respectively. Since many mass transfer correlations are expressed in terms of the Sherwood number, knowledge of the diffusion coefficient is a prerequisite for determining mass transfer coefficients.
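As a minimal sketch of how Eq. (1) is used in practice, the snippet below rearranges the Sherwood number definition to recover the mass transfer coefficient once a correlation has supplied Sh. The function name and the numerical values are illustrative assumptions and are not taken from the paper.

```python
def mass_transfer_coefficient(sherwood: float, diffusivity: float, length: float) -> float:
    """Rearrange Sh = k * x / D to obtain k (same length units as D and x)."""
    if diffusivity <= 0.0 or length <= 0.0:
        raise ValueError("diffusivity and length must be positive")
    return sherwood * diffusivity / length


# Illustrative (assumed) inputs: Sh = 2, D = 1e-5 m^2/s, x = 1 mm.
k = mass_transfer_coefficient(sherwood=2.0, diffusivity=1.0e-5, length=1.0e-3)
print(f"k = {k:.3g} m/s")
```

Because k scales linearly with D for a fixed Sherwood number, any relative error in the diffusion coefficient carries over directly to the estimated mass transfer coefficient.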

In addition, the diffusion coefficient is an important parameter in atmospheric pollutant dispersion models. These models are mathematical expressions that allow researchers to obtain a clear picture of pollutant fate and of the pollution profile at a specific location. They are essential for quantifying the impact of air pollution on the environment and on human health.

Finally, the concept of molecular diffusion also finds application in analytical chemistry. Monitoring organic environmental contaminants is an ongoing challenge in analytical chemistry. There are two approaches to monitoring contaminant concentrations in air: active and passive sampling.

Active sampling methods require a large number of samples to be taken from a particular location over the entire sampling period in order to determine the contaminant concentration in indoor air or in the atmosphere. The main drawbacks of this approach are that it is time consuming and costly.

Passive sampling methods were introduced to address these drawbacks. This approach is based on measuring the concentration as a weighted average over the sampling time. Since the concentration of the analyte is integrated over the exposure time, this approach is insensitive to accidental and extreme variations in pollutant concentration (Namieśnik et al., 2004).

This sampling technique is based purely on the concept of molecular diffusion. The governing equation for the free flow of the analyte from the sampled medium to the collecting medium is Fick's first law of diffusion, which is written as follows:

$$J_A = -D_A \frac{\partial C_A}{\partial z} \tag{2}$$

where $J_A$ is the molar flux of analyte A that diffuses as a result of the concentration gradient, $C_A$ is its concentration and $z$ is the coordinate along the diffusion path. As is evident from Eq. (2), the diffusion coefficient $D_A$ is the key parameter for evaluating the diffusion flux. Access to accurate values of the diffusion coefficient results in more accurate determination of pollutant concentrations in indoor and atmospheric air quality assessments (Stranger et al., 2008).
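The role of the diffusion coefficient in passive sampling can be illustrated with a short sketch of Fick's first law. It assumes a linear concentration gradient across a diffusion path of fixed length and a sorbent that keeps the analyte concentration at its surface at zero; these simplifications, the function names and all numbers are assumptions made for illustration only.

```python
import math


def fick_flux(diffusivity, conc_start, conc_end, path_length):
    """Molar flux J = -D * dC/dz for a linear gradient along the path."""
    gradient = (conc_end - conc_start) / path_length
    return -diffusivity * gradient


def time_weighted_concentration(amount, diffusivity, area, path_length, time):
    """Invert amount = J * area * time for the ambient concentration,
    assuming zero concentration at the sorbent surface."""
    return amount * path_length / (diffusivity * area * time)


# Assumed values: D in m^2/s, C in mol/m^3, path in m, area in m^2, time in s.
D, c_air, path, area, t = 1.0e-5, 1.0e-6, 0.01, 1.0e-4, 3600.0
flux = fick_flux(D, conc_start=c_air, conc_end=0.0, path_length=path)
collected = flux * area * t
recovered = time_weighted_concentration(collected, D, area, path, t)
print(flux, collected, recovered)
```

The recovered concentration is inversely proportional to D, so an error in the assumed diffusion coefficient translates into an equal relative error in the reported air concentration.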

Fig. S1 schematically portrays the application of the molecular diffusion coefficient in diverse branches of engineering.

As indicated above, knowledge of accurate diffusion coefficients is essential for many diverse applications. Unfortunately, experimental data for many compounds are scarce or unavailable. Experimental determination of these values requires accurate, costly and time-consuming measurements (Marrero and Mason, 1972). To overcome this issue, accurate predictive models are needed to estimate unknown diffusion coefficients.

To date, many correlations have been presented for determining diffusion coefficients in gaseous binary mixtures. These correlations can be divided into two general categories: theoretical and empirical.

Generally, theoretical correlations for diffusion coefficients stem from solving the Boltzmann equation. The solution is credited to Chapman and Enskog, who derived the correlation independently (Poling et al., 2001). This correlation can be applied successfully to binary mixtures at low to moderate pressures:

$$D_{AB} = \frac{3}{16}\,\frac{\left(4\pi k T / M_{AB}\right)^{1/2}}{n\,\pi\,\sigma_{AB}^{2}\,\Omega_D}\, f_D \tag{3}$$

where $M_A$ and $M_B$ are the molecular weights of molecules A and B, respectively; $M_{AB} = 2\left[(1/M_A) + (1/M_B)\right]^{-1}$; $n$ is the number density of molecules in the mixture; $k$ is Boltzmann's constant; and $T$ is the absolute temperature. $\Omega_D$, the collision integral for diffusion, is a function of temperature and depends on the choice of intermolecular force law between colliding molecules. $\sigma_{AB}$ is the characteristic length of the intermolecular force law. $f_D$ is a correction factor that accounts for molecular differences between the species present in the mixture. $\Omega_D$ and $\sigma_{AB}$ are derived from an appropriate potential function. Generally, the Lennard–Jones 12-6 potential is employed for its convenience and simplicity. The correlation presented in Eq. (3) is valid for dilute gases consisting of non-polar, spherical, monatomic molecules. Diffusion coefficients calculated with this theoretical approach deviate by up to 25% even for this class of dilute gases.
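A minimal numerical sketch of Eq. (3) in SI units may help to show how the terms combine. It assumes that the number density follows the ideal gas law, n = P/(kT), and takes the collision integral and the characteristic length as user-supplied inputs rather than computing them from a potential function. The function name and the sample inputs (molecular weights of 28 and 44 g/mol, a characteristic length of 3.7 Å and a collision integral of 1.0) are placeholders chosen for illustration, not values from the paper.

```python
import math

K_B = 1.380649e-23    # Boltzmann constant, J/K
N_A = 6.02214076e23   # Avogadro constant, 1/mol


def chapman_enskog_diffusivity(temp_k, pressure_pa, m_a, m_b,
                               sigma_ab_m, omega_d, f_d=1.0):
    """Evaluate Eq. (3); molecular weights in g/mol, result in m^2/s."""
    m_ab = 2.0 / (1.0 / m_a + 1.0 / m_b)       # combined molecular weight, g/mol
    m_ab_kg = m_ab * 1.0e-3 / N_A              # mass per molecule, kg
    n = pressure_pa / (K_B * temp_k)           # number density (ideal gas assumption)
    speed_term = math.sqrt(4.0 * math.pi * K_B * temp_k / m_ab_kg)
    return (3.0 / 16.0) * speed_term / (n * math.pi * sigma_ab_m ** 2 * omega_d) * f_d


# Placeholder inputs at 298.15 K and 1 atm; gives roughly 1.7e-5 m^2/s.
d_ab = chapman_enskog_diffusivity(298.15, 101325.0, 28.0, 44.0,
                                  sigma_ab_m=3.7e-10, omega_d=1.0)
print(f"D_AB = {d_ab:.3e} m^2/s")
```

The sketch makes the input burden of the theoretical route visible: the result depends on the characteristic length and the collision integral, both of which must come from a fitted potential function before any diffusion coefficient can be estimated.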

Another estimation method correlates diffusion coefficients with viscosity values. Since the Chapman–Enskog expressions for these two properties have a common basis, they can be combined to relate the two properties. To calculate the diffusion coefficient, the viscosity as a function of composition at constant temperature is required (Hirschfelder et al., 1965; DiPippo, 1967; Gupta and Saxena, 1968; Kestin et al., 1977; Kestin and Wakeham, 1983).

Weissman and Mason (1962) and Weissman (1964) verified this method against a large collection of experimental viscosity and diffusion coefficient data and found excellent agreement.

To extend the theoretical approach to the prediction of diffusion coefficients of polar gases, a modified version of the Lennard–Jones potential, such as the Stockmayer potential, should be applied.

Brokaw (1969) kept the original form of Eq. (3) and modified $\Omega_D$ for polar components. His modification adds an extra term to the original $\Omega_D$ to account for polarity. This extra term is calculated from the dipole moment, the liquid molar volume at the normal boiling point and the normal boiling point. His assumption that the effect of polarity is related exclusively to the dipole moment was disputed by some authors (Byrne et al., 1967). Diffusion coefficients estimated with Brokaw's method show maximum absolute deviations of up to 33% for the polar gases in the same group of gas mixtures that was investigated with the conventional theoretical method based on the Lennard–Jones 12-6 potential function (Prausnitz et al., 1999).

Several proposed methods of estimating the diffusion coefficient retain the original form of Eq. (3) with empirical constants based on experimental values.

Wilke and Lee (1955) proposed a model similar to the original form of Eq. (3). They suggested that the parameters derived from the 12-6 Lennard–Jones potential in the original equation could be satisfactorily replaced by empirical constants that are functions of the liquid molar volume and the normal boiling point.

Another modification was suggested by Fuller et al. (1966, 1969). Their model introduces a new parameter defined as the sum of the atomic diffusion volumes of the atomic species present in the structure of the molecule. These values were obtained by regression analysis of a large collection of experimental data. The authors claim that the largest error associated with this method is about 4%.

Despite the simplicity of the reported correlations, their use has several drawbacks. First, all of the correlations presented in this section require additional parameters as inputs for the estimation of diffusion coefficients. For unknown components, all of these parameters, such as the normal boiling point and the scale factor, must be estimated or calculated before the diffusion coefficient. Second, although they predict the diffusion coefficients of dilute and non-polar gases well, they cannot accurately predict the values for highly polar gases. The more polar the gas, the larger the expected error. Finally, these correlations are based on a limited number of experimental data. Many substances from various classes of compounds are absent from the original data sets, and therefore any generalization should be made with caution.

To sum up, a proper model should not only be derived from a large collection of data for generalization purposes, but should also rely on the fewest possible parameters of the studied components.

One approach that satisfies these criteria is the Quantitative Structure–Property Relationship, also known as QSPR. In recent years, QSPRs have gained recognition in the correlation and prediction of physical, chemical and biological properties (Gharagheizi, 2007b, 2008a,c, 2009; Vatani et al., 2007; Gharagheizi and Sattari, 2009; Gharagheizi et al., 2011b,c).

In the QSPR methodology, only the structures of the components serve as inputs to the model. The objective of QSPR is to relate microscopic properties derived solely from the molecular structure to macroscopic ones. This is done by introducing the concept of "descriptors". A molecular descriptor is the final result of a logical and mathematical procedure that transforms chemical information encoded within a symbolic representation of a molecule into a useful number or the result of some standardized experiment (Todeschini and Consonni, 2000). These descriptors can be empirical or based purely on topological or quantum chemistry computations. Next, with the aid of statistical approaches and regression analysis, the pool of descriptors is carefully screened and the descriptors best suited to the model are selected. This is the most critical step of QSPR modeling. Fig. S2 gives a graphical representation of typical QSPR modeling.
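To make the idea of a descriptor concrete, the sketch below computes two very simple constitutional descriptors, atom counts and molecular weight, from a molecular formula string. It is only an illustration of turning a symbolic representation into numbers; the five descriptors selected in this study are not listed in this excerpt, and the helper names and the small atomic mass table are assumptions.

```python
import math
import re

# Standard atomic masses (g/mol) for a few common elements only.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999, "N": 14.007}


def atom_counts(formula: str) -> dict:
    """Count atoms in a simple formula such as 'C2H6O' (no brackets or charges)."""
    if not re.fullmatch(r"(?:[A-Z][a-z]?\d*)+", formula):
        raise ValueError(f"unsupported formula: {formula!r}")
    counts = {}
    for symbol, digits in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(digits) if digits else 1)
    return counts


def molecular_weight(formula: str) -> float:
    """Sum atomic masses; raises KeyError for elements outside the table."""
    return sum(ATOMIC_MASS[symbol] * count
               for symbol, count in atom_counts(formula).items())


print(atom_counts("C2H6O"), round(molecular_weight("C2H6O"), 3))
```

Descriptor packages used in QSPR work generate thousands of such numbers per molecule, most of them far more elaborate than these two, which is why a dedicated selection step is needed afterwards.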

In this study, a QSPR model is proposed for the prediction of the diffusion coefficients of 4579 diverse organic compounds at dilute concentration and ambient conditions (a pressure of 1 atm and a temperature of 298.15 K).

Section snippets

Data preparation

The soundness and validity of models for the representation/prediction of physical properties, especially those dealing with a large number of experimental data, depend directly on the quality and comprehensiveness of the database used for their development (Gharagheizi et al., 2011a). These characteristics include both the diversity of the investigated chemical families and the number of pure compounds available in the dataset. In this work, we used the database

Results and discussion

An accurate linear correlation was obtained by applying the GFA algorithm to the final set of descriptors.

The procedure commences by introducing the final set of descriptors to the GFA algorithm to obtain one-descriptor models. The output models are studied and the best model in terms of R2 is selected. The process continues by incrementally adding descriptors and investigating the resulting models to obtain the best model for each number of descriptors.

The process will be halted, when
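The incremental search described above can be sketched as follows. This is not the GFA algorithm: for a small descriptor pool it simply tries every subset of a given size by ordinary least squares and keeps the one with the highest R2, which conveys the idea of "best model per number of descriptors" but would not scale to thousands of descriptors. The stopping rule is left to the caller through `max_size`, since the criterion used in the paper is not given in this excerpt. The data are synthetic, and all names are assumptions.

```python
from itertools import combinations

import numpy as np


def r_squared(X, y):
    """Coefficient of determination of a linear least-squares fit with intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residual = y - A @ coef
    ss_res = float(residual @ residual)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


def best_subsets(X, y, max_size):
    """Return {size: (best R2, descriptor column indices)} for sizes 1..max_size."""
    best = {}
    for size in range(1, max_size + 1):
        scored = [(r_squared(X[:, list(cols)], y), cols)
                  for cols in combinations(range(X.shape[1]), size)]
        best[size] = max(scored)
    return best


# Synthetic example: 6 candidate descriptors, only columns 1 and 4 matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 2.0 * X[:, 1] - 1.5 * X[:, 4] + 0.05 * rng.normal(size=200)

result = best_subsets(X, y, max_size=3)
for size, (score, cols) in result.items():
    print(size, cols, round(score, 4))
```

In this synthetic case R2 rises sharply from one to two descriptors and barely changes with a third, which is the kind of plateau that typically motivates stopping the search.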

Conclusion

A five-parameter QSPR model was developed for the representation/prediction of the diffusion coefficients of non-electrolyte organic compounds in air at infinite dilution and 298.15 K. A data set of 4579 diverse organic compounds was investigated for this model. Genetic Function Approximation was successfully employed to select the most relevant descriptors from a collection of more than 3000 molecular-based parameters. 80% of the data set was used as the training set for model generation and
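The sketch below shows one way to set up an 80% training split and to compute two of the statistics quoted for the model, the squared correlation coefficient and the average absolute relative deviation (AARD). The definitions used here (squared Pearson correlation; mean of |predicted − experimental| / |experimental| in percent) and the random shuffling are assumptions, since the excerpt does not state how the paper defines the statistics or assigns compounds to subsets. The sample values are invented for illustration.

```python
import math
import random


def squared_correlation(experimental, predicted):
    """Square of the Pearson correlation between experimental and predicted values."""
    n = len(experimental)
    mean_e = sum(experimental) / n
    mean_p = sum(predicted) / n
    cov = sum((e - mean_e) * (p - mean_p) for e, p in zip(experimental, predicted))
    var_e = sum((e - mean_e) ** 2 for e in experimental)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_e * var_p)


def aard_percent(experimental, predicted):
    """Average absolute relative deviation in percent."""
    n = len(experimental)
    return 100.0 / n * sum(abs(p - e) / abs(e) for e, p in zip(experimental, predicted))


def split_indices(n, train_fraction=0.8, seed=0):
    """Shuffle indices 0..n-1 and split them into training and remaining sets."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    cut = int(round(train_fraction * n))
    return indices[:cut], indices[cut:]


train, rest = split_indices(4579)
exp_values = [0.10, 0.08, 0.05, 0.04]         # invented values
pred_values = [0.101, 0.079, 0.0505, 0.0396]  # invented values
print(len(train), len(rest))
print(squared_correlation(exp_values, pred_values), aard_percent(exp_values, pred_values))
```

Reporting the statistics separately for the training set and for the held-out compounds is what allows a claim about predictive power rather than only about quality of fit.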

References (50)

  • F.R. Burden. Molecular identification number for substructure searches. J. Chem. Inf. Comput. Sci. (1989)
  • J.J. Byrne et al. Gas-phase interdiffusion coefficients for some polar organic compounds. J. Phys. Chem. (1967)
  • M.T. Chhabria et al. QSAR study of a series of 2,3-dihydroimidazo[1,2-c]pyrimidines as antibacterial agents. Med. Chem. Res. (2010)
  • M.T. Chhabria et al. QSAR study of a series of cholesteryl ester transfer protein inhibitors. Collect. Czech. Chem. Commun. (2011)
  • R. DiPippo. Diffusion coefficient of seven binary gaseous mixtures. J. Chem. Phys. (1967)
  • J.H. Friedman. Multivariate adaptive regression splines. Ann. Stat. (1991)
  • E. Fuller et al. New method for prediction of binary gas-phase diffusion coefficients. Ind. Eng. Chem. (1966)
  • E.N. Fuller et al. Diffusion of halogenated hydrocarbons in helium. The effect of structure on collision cross sections. J. Phys. Chem. (1969)
  • H. Gao. Application of BCUT metrics and genetic algorithm in binary QSAR analysis. J. Chem. Inf. Comput. Sci. (2001)
  • F. Gharagheizi. QSPR studies for solubility parameter by means of genetic algorithm-based multivariate linear regression and generalized regression neural network. QSAR Comb. Sci. (2008)
  • F. Gharagheizi. Quantitative structure–property relationship for prediction of the lower flammability limit of pure compounds. Energy Fuels (2008)
  • F. Gharagheizi et al. Prediction of some important physical properties of sulfur compounds using quantitative structure–properties relationships. Mol. Diversity (2008)
  • F. Gharagheizi et al. Estimation of molecular diffusivity of pure chemicals in water: a quantitative structure–property relationship study. SAR QSAR Environ. Res. (2009)
  • F. Gharagheizi et al. Estimation of aniline point temperature of pure hydrocarbons: a quantitative structure–property relationship approach. Ind. Eng. Chem. Res. (2009)
  • F. Gharagheizi et al. Artificial neural network modeling of solubilities of 21 commonly used industrial solid compounds in supercritical carbon dioxide. Ind. Eng. Chem. Res. (2011)
