Prediction of Normal Boiling Points of Hydrocarbons Using Simple Molecular Properties

Four hundred and seventy-six hydrocarbons ($C_nH_m$) were used to fit their normal boiling point temperatures (NBPT) as a function of molecular weight and carbon atomic fraction. The proposed model has the form $\mathrm{NBPT} = a\,C_{\mathrm{frac}}^{\,b}\,\mathrm{MW}^{\,c}$, where $a$, $b$, and $c$ are parameters obtained by non-linear regression; $C_{\mathrm{frac}}$ is the carbon atomic fraction of the molecule, equal to $n/(n+m)$ for a hydrocarbon $C_nH_m$; and MW is the molecular weight, calculated as $12n + m$. The model predicts NBPT with adequate accuracy, as measured by the percent relative error (PRE) of the curve-fitted NBPT. Of the 476 hydrocarbons examined, only methane, ethylene, and acetylene have PRE values above 10%. If the acceptance threshold is tightened to a PRE below 5%, 43 compounds are excluded, and the NBPT of the remaining 433 compounds is well predicted by the model. The model does not differentiate among isomers that share the same chemical formula and molecular weight; however, the NBPT differences among such isomers are small and too subtle to be captured by a simple, straightforward model. A more rigorous model could resolve these small differences, but only at the expense of model simplicity.


Introduction
Predicting physicochemical properties such as the normal boiling point temperature (NBPT) of a substance is a major goal of computational chemistry. NBPT is one of the main physicochemical properties used to identify a compound. It is a fundamental characteristic of chemical compounds and enters many correlations used to estimate thermophysical properties. Commercial simulators such as ASPEN PLUS® can identify a molecule of given chemical formula, or fill in the gaps in its property data; however, such software requires some properties of the compound as input beforehand. NBPT and standard liquid density are the most important of these, because, together with group contribution methods, they allow the remaining missing properties to be estimated.
The NBPT of a compound is, in general, related to its molecular structure, but the nature of the relationship is not straightforward. Different models have been used to correlate the boiling points of homologous hydrocarbons with the number of carbon atoms or with molecular weight [5]. The group contribution method for predicting NBPT assumes that the cohesive forces in the liquid are predominantly short-range: a complex molecule is subdivided into predefined structural groups, each of which adds a constant increment to the value of the property. In general, group contribution methods give good boiling point predictions for small, non-polar molecules [4].
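To make the additive idea concrete, here is a minimal Python sketch of a group contribution estimate. It uses Joback's method only as a well-known example, not necessarily the scheme discussed in [4], with boiling-point increments for four saturated hydrocarbon groups as commonly tabulated. Treat the numbers as assumptions to be checked against a primary table; the dictionary and function names are hypothetical.

```python
# Joback boiling-point increments (K) for a few saturated hydrocarbon groups,
# as commonly tabulated. Assumed values: check them against a primary table.
JOBACK_TB_INCREMENTS = {
    "-CH3": 23.58,   # terminal methyl
    "-CH2-": 22.88,  # non-ring methylene
    ">CH-": 21.74,   # non-ring methine
    ">C<": 18.25,    # non-ring quaternary carbon
}


def joback_boiling_point(group_counts: dict[str, int]) -> float:
    """Estimate a normal boiling point (K) as 198 K plus one constant increment per group."""
    return 198.0 + sum(
        JOBACK_TB_INCREMENTS[group] * count for group, count in group_counts.items()
    )


# n-hexane, CH3-(CH2)4-CH3: two methyl groups and four methylene groups.
print(round(joback_boiling_point({"-CH3": 2, "-CH2-": 4}), 2))  # 336.68
```

Each group contributes the same increment regardless of its neighbours, which is exactly the short-range assumption described above. For n-hexane the sum gives about 336.7 K, a few kelvin below the experimental value of about 342 K.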
Ivanciuc et al. [3] used quantitative structure-property relationship (QSPR) models, developed with the Comprehensive Descriptors for Structural and Statistical Analysis (CODESSA) program, to estimate the boiling points of organic compounds containing halogens, oxygen, or sulfur without hydrogen bonding. Using multi-linear regression (MLR), the boiling points of 185 compounds containing oxygen or sulfur were computed accurately with an MLR equation containing six theoretical descriptors, with R² = 0.992 and a standard deviation of 6.3 °C. For a set of 534 halogenated C1-C4 alkanes, the best MLR equation, with five descriptors, has R² = 0.990 and a standard deviation of 9.0 °C. The authors concluded that QSPR models developed with CODESSA allow accurate computation of the boiling points of organic compounds from simple constitutional, topological, electrostatic, and quantum-chemical indices that can be computed with standard quantum chemistry software.
Cholakov et al. [2] proposed a correlation between molecular structure and the normal boiling point of hydrocarbons. Its main features are relative simplicity, sound predictions, and applicability to diverse, industrially important structures whose boiling points and carbon numbers span a wide range. They used two types of descriptors: molecular energy descriptors and carbon atom descriptors. For the first type, a structure is treated as a collection of atoms held together by elastic (harmonic) forces, the bonds, which constitute the force field. The second type ranges from the highest level of sophistication, such as graph topological indices derived from the adjacency and distance matrices of a chemical structure, to the lowest level, such as the numbers of atoms engaged in specific groups (atom counts).
Wang et al. [6] extended the conductor-like screening model segment activity coefficient model for boiling point calculation (COSMO-SAC-BP) to predict the NBPT of environmentally significant substances that are larger, more complex molecules, including pollutants, herbicides, insecticides, and drugs. For these molecules, whose boiling points span 266-708 K, the average absolute deviation of the predicted boiling points was 17.8 K, or 3.7%. They concluded that this 3.7% is similar to the 3.2% obtained for 369 molecules in their earlier study, indicating that the method can be applied well outside the systems used to train the model.
Chan et al. [1] proposed an empirical method for estimating the boiling points of organic molecules based on density functional theory (DFT) calculations with polarizable continuum model (PCM) solvent corrections. The boiling points were calculated as the sum of three contributions. The first term was calculated directly from the structural formula of the molecule and was related to its effective surface area. The second was a measure of the electronic interactions between molecules, based on the DFT-PCM solvation energy, and the third was used only for planar aromatic molecules. The method was found to be applicable to a very diverse range of organic molecules, with normal boiling points from −50 °C to 500 °C and containing 10 different elements (C, H, Br, Cl, F, N, O, P, S, and Si).
In the present work, the NBPT of a hydrocarbon is expressed as a function of two simple molecular indicators: the carbon atomic fraction ($C_{\mathrm{frac}}$) and the molecular weight (MW). Both are easy to calculate. For methane ($\mathrm{CH_4}$), for example, $C_{\mathrm{frac}} = 1/(1 + 4) = 0.20$ and $\mathrm{MW} = 1 \times 12 + 4 \times 1 = 16$. The differences in NBPT among isomers that share the same $C_{\mathrm{frac}}$ and MW were found to be small, and any attempt to account for them would come at the expense of model simplicity.
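Both indicators follow directly from the chemical formula. The Python sketch below assumes formulas are written as plain "CnHm" strings, a convention introduced here and not part of the paper. The hypothetical helper `predict_nbpt` takes a, b, and c as arguments rather than hard-coding them, so the regressed parameter values must be supplied by the user.

```python
import re


def carbon_fraction_and_mw(formula: str) -> tuple[float, int]:
    """Return (C_frac, MW) for a simple hydrocarbon formula such as 'CH4' or 'C8H18'.

    C_frac = n / (n + m); MW = 12 n + m, using the integer atomic masses of the text.
    """
    match = re.fullmatch(r"C(\d*)H(\d*)", formula)
    if match is None:
        raise ValueError(f"not a CnHm formula: {formula!r}")
    n = int(match.group(1) or 1)  # a bare 'C' or 'H' means one atom
    m = int(match.group(2) or 1)
    return n / (n + m), 12 * n + m


def predict_nbpt(formula: str, a: float, b: float, c: float) -> float:
    """Evaluate NBPT = a * C_frac**b * MW**c for user-supplied parameters a, b, c."""
    c_frac, mw = carbon_fraction_and_mw(formula)
    return a * c_frac**b * mw**c


print(carbon_fraction_and_mw("CH4"))  # (0.2, 16)
c_frac, mw = carbon_fraction_and_mw("C8H18")
print(round(c_frac, 4), mw)  # 0.3077 114
```

Every isomer of a given formula returns the same pair, which is why a model built on these two indicators cannot distinguish isomers.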

Model development
Four hundred and seventy-six hydrocarbon compounds were used in the non-linear regression to find the best fit of their normal boiling points. The database covers several categories of hydrocarbons, for example normal paraffins (n-alkanes). The model fitted to these data is

$$\mathrm{NBPT} = a\,C_{\mathrm{frac}}^{\,b}\,\mathrm{MW}^{\,c} \qquad (1)$$
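The text does not state the solver or starting values used for the non-linear regression of $\mathrm{NBPT} = a\,C_{\mathrm{frac}}^{\,b}\,\mathrm{MW}^{\,c}$. The sketch below is one common way to fit such a power law in plain Python and is not the authors' procedure: a log-linear least-squares fit provides starting estimates (it minimizes squared error in ln NBPT), and step-halving Gauss-Newton iterations then reduce the SSE in kelvin, the quantity reported for the fit. All function names are hypothetical, and no confidence intervals are computed.

```python
import math


def _solve3(A, rhs):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [list(row) + [v] for row, v in zip(A, rhs)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, 3):
            f = M[r][col] / M[col][col]
            for k in range(col, 4):
                M[r][k] -= f * M[col][k]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (M[r][3] - sum(M[r][k] * x[k] for k in range(r + 1, 3))) / M[r][r]
    return x


def _least_squares(rows, targets):
    """Solve min ||rows @ p - targets||^2 for three unknowns via the normal equations."""
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atb = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(3)]
    return _solve3(ata, atb)


def model(p, c_frac, mw):
    """NBPT = a * C_frac**b * MW**c with p = (a, b, c)."""
    a, b, c = p
    return a * c_frac**b * mw**c


def sse(p, c_frac, mw, nbpt):
    """Sum of squared errors (K^2) of the model over the data set."""
    return sum((t - model(p, x, w)) ** 2 for x, w, t in zip(c_frac, mw, nbpt))


def fit_log_linear(c_frac, mw, nbpt):
    """Starting estimates from ln NBPT = ln a + b ln C_frac + c ln MW."""
    rows = [(1.0, math.log(x), math.log(w)) for x, w in zip(c_frac, mw)]
    ln_a, b, c = _least_squares(rows, [math.log(t) for t in nbpt])
    return [math.exp(ln_a), b, c]


def refine_gauss_newton(p, c_frac, mw, nbpt, iterations=50):
    """Reduce the SSE in kelvin with step-halving Gauss-Newton iterations."""
    p, best = list(p), sse(p, c_frac, mw, nbpt)
    for _ in range(iterations):
        rows, resid = [], []
        for x, w, t in zip(c_frac, mw, nbpt):
            y = model(p, x, w)
            # Partial derivatives of y with respect to a, b and c.
            rows.append((y / p[0], y * math.log(x), y * math.log(w)))
            resid.append(t - y)
        step = _least_squares(rows, resid)
        scale = 1.0
        while scale > 1e-6:  # halve the step until the SSE does not increase
            trial = [pi + scale * si for pi, si in zip(p, step)]
            trial_sse = sse(trial, c_frac, mw, nbpt)
            if trial_sse <= best:
                p, best = trial, trial_sse
                break
            scale /= 2
        else:
            break  # no non-increasing step found: treat as converged
    return p
```

Applied to the 476 ($C_{\mathrm{frac}}$, MW, NBPT) triples, `fit_log_linear` followed by `refine_gauss_newton` yields estimates of a, b, and c of the same kind as the paper's, although the numbers need not match the authors' regression exactly.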
The non-linear regression of Eq. (1), with 95% confidence intervals on the parameters, gives the fitted model, Eq. (2). The goodness of fit of Eq. (2) is R² = 0.9997 and adjusted R² = 0.9997, with a sum of squared errors (SSE) of 1,796 K² and a root mean squared error (RMSE) of 1.949 K. In MATLAB® notation, the RMSE is the standard error of the fit. The PRE is defined as

$$\mathrm{PRE} = \frac{\left|\mathrm{NBPT}_{\mathrm{calc}} - \mathrm{NBPT}_{\mathrm{exp}}\right|}{\mathrm{NBPT}_{\mathrm{exp}}} \times 100\%$$

From an engineering standpoint, an uncertainty in a measured or calculated quantity of up to a PRE of 10% is generally tolerated.
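The reported statistics can be reproduced with a few lines of Python. This is a minimal sketch, assuming (as in MATLAB's curve-fitting conventions) that the RMSE divides the SSE by the residual degrees of freedom n − p, with p = 3 fitted parameters, and taking the PRE as 100 |calc − exp| / exp. The function names are hypothetical.

```python
import math


def fit_statistics(observed, predicted, n_params=3):
    """SSE, RMSE, R^2 and adjusted R^2 for a fit with n_params regressed parameters."""
    n = len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    mean_obs = sum(observed) / n
    sst = sum((o - mean_obs) ** 2 for o in observed)
    dof = n - n_params  # residual degrees of freedom
    r2 = 1.0 - sse / sst
    return {
        "SSE": sse,
        "RMSE": math.sqrt(sse / dof),
        "R2": r2,
        "adj_R2": 1.0 - (1.0 - r2) * (n - 1) / dof,
    }


def percent_relative_error(observed, predicted):
    """PRE (%) of each prediction relative to its experimental value."""
    return [100.0 * abs(p - o) / o for o, p in zip(observed, predicted)]


# Consistency check with the reported fit: SSE = 1,796 K^2 over 476 compounds.
print(round(math.sqrt(1796 / (476 - 3)), 3))  # 1.949
```

The last line shows that the reported SSE and RMSE are mutually consistent under the n − 3 convention.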

Results and discussion
The mean PRE for all examined compounds was 2.07%, with a standard error of 2.1%. However, as Table 1 shows, three compounds have a PRE higher than 10%.
Apart from these, the model predicts the normal boiling point temperature of a hydrocarbon well as a function of its molecular size and carbon atomic (mole) fraction. Figure 1 plots the curve-fitted NBPT against the experimental NBPT for all 476 hydrocarbons examined. Most data points fall on the 45° diagonal (Y = X), with a small deviation in the high-boiling region. Figure 2 shows that only three data points lie above the 10% PRE line. If the threshold is taken as 5% instead of 10%, only 43 compounds are excluded for having PRE > 5%; they are listed in Table 2. The appendix lists all hydrocarbons used in this study.
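The threshold screening behind Tables 1 and 2 amounts to filtering compounds by PRE. The sketch below uses hypothetical compound names and PRE values for illustration only; with the study's data, the 10% and 5% thresholds should flag 3 and 43 compounds, leaving 473 and 433, respectively. The function name is hypothetical.

```python
def screen_by_pre(names, pre_values, thresholds=(10.0, 5.0)):
    """For each PRE threshold (%), list the compounds above it and count the rest."""
    report = {}
    for limit in thresholds:
        flagged = sorted(
            ((name, pre) for name, pre in zip(names, pre_values) if pre > limit),
            key=lambda item: item[1],
            reverse=True,
        )
        report[limit] = {"flagged": flagged, "retained": len(names) - len(flagged)}
    return report


# Hypothetical PRE values, for illustration only.
names = ["compound A", "compound B", "compound C", "compound D"]
pres = [12.3, 6.1, 2.4, 0.8]
for limit, result in screen_by_pre(names, pres).items():
    print(limit, result["retained"], [name for name, _ in result["flagged"]])
```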
Regarding isomers, an example is given here to show both the strength and the weakness of the model. Table 3 lists 17 different isomers with the same chemical formula, $\mathrm{C_8H_{18}}$, and molecular weight of 114.23. Because these isomers share the same $C_{\mathrm{frac}}$ and MW, the proposed model, Eq. (2), gives a single predicted NBPT for all of them. This predicted value matches the mean experimental value in Table 3 well, with a PRE of 0.4%. As Table 3 shows, the maximum PRE for this set of isomers is 4.1%. Moreover, within this set, the maximum percent relative difference, 3.6%, occurs between the lowest and the mean experimental NBPT. Strictly speaking, then, the proposed model does not differentiate among isomers of the same molecular weight and chemical formula; at the same time, a maximum percent relative difference of 3.6% is too small to be resolved by a model this simple. A more rigorous model could account for this 3.6%, but at the expense of model simplicity.
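The isomer comparison can be written as a short function. This sketch assumes the percent relative difference between the lowest and mean experimental NBPT is taken relative to the mean; the text names the two values but does not show the formula. The inputs below are hypothetical, not the Table 3 data, and the function name is also hypothetical.

```python
from statistics import mean


def isomer_summary(experimental_nbpt, predicted_nbpt):
    """Compare one model prediction (K) with the experimental NBPTs (K) of a set of isomers.

    All isomers of a formula share C_frac and MW, so the model yields a single value for them.
    """
    avg = mean(experimental_nbpt)
    return {
        "mean_exp": avg,
        "pre_vs_mean": 100.0 * abs(predicted_nbpt - avg) / avg,
        "max_pre": max(100.0 * abs(predicted_nbpt - t) / t for t in experimental_nbpt),
        # Assumed definition: lowest experimental value relative to the mean.
        "lowest_vs_mean": 100.0 * (avg - min(experimental_nbpt)) / avg,
    }


# Hypothetical boiling points (K) for three isomers and a hypothetical prediction.
print(isomer_summary([380.0, 390.0, 400.0], 392.0))
```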

Conclusion
The NBPT of a hydrocarbon can be expressed as a function of simple molecular properties with adequate accuracy, as measured by the PRE of the curve-fitted NBPT. Both the molecular weight and the carbon atomic fraction are easy to calculate from the chemical formula of a hydrocarbon ($C_nH_m$). Of the 476 hydrocarbons examined, only methane, ethylene, and acetylene have PRE values higher than 10%.
If the acceptance threshold is tightened to a PRE below 5%, 43 compounds are excluded, and the NBPT of the remaining 433 compounds is well predicted by the model. Consequently, within acceptable engineering accuracy, the model adequately predicts the NBPT of each of 433 different hydrocarbons with a PRE below 5%.

Appendix
List of 476 hydrocarbons used in the non-linear regression process to express the normal boiling point temperature as a function of hydrocarbon molecular weight and its carbon atomic fraction.