Quantum Chemical Investigation of Polychlorinated Dibenzodioxins, Dibenzofurans and Biphenyls: Relative Stability and Planarity Analysis

All the possible polychlorinated aromatic compounds in the classes of dibenzodioxins (PCDDs), dibenzofurans (PCDFs), and biphenyls (PCBs) were studied by the quantum chemical methods of HF/6-311++G(d,p), B3LYP/6-311++G(d,p), and MP2/cc-pVTZ. The calculated stabilities and structures of these compounds were compared with the available data on their abundance and toxicity. Prediction models for trends in energy and planarity among these congeners were proposed. The results discussed here can help contribute to the understanding of the role of dioxin-like compounds (DLCs) in the environment.


Introduction
The Stockholm Convention on Persistent Organic Pollutants (POPs) [1], initially signed in 2001, includes 12 classes of compounds, three of which can be regarded as dioxin-like compounds (DLCs). (see Figure 1). A subset of DLCs was assigned toxic equivalency factors (TEFs) by the World Health Organization (WHO) [2] to indicate their potential health impacts on humans and mammals. These compounds may accumulate and bioconcentrate through the food chain and, as a result, food products in some jurisdictions such as the European Union are strictly monitored for the presence of these compounds [3]. Quantum chemical methods, semi-empirical calculations, and other related computational techniques have previously been applied to polychlorinated dibenzodioxins (PCDDs), polychlorinated dibenzofurans (PCDFs), and polychlorinated biphenyls (PCBs) . The goals of these studies were, primarily, to predict their chemical properties and understand their mechanisms of action or degradation in biotic systems. A summary can be found in Supplementary Table S1.

Introduction
The Stockholm Convention on Persistent Organic Pollutants (POPs) [1], initially signed in 2001, includes 12 classes of compounds, three of which can be regarded as dioxin-like compounds (DLCs). (see Figure 1). A subset of DLCs was assigned toxic equivalency factors (TEFs) by the World Health Organization (WHO) [2] to indicate their potential health impacts on humans and mammals. These compounds may accumulate and bioconcentrate through the food chain and, as a result, food products in some jurisdictions such as the European Union are strictly monitored for the presence of these compounds [3]. Quantum chemical methods, semi-empirical calculations, and other related computational techniques have previously been applied to polychlorinated dibenzodioxins (PCDDs), polychlorinated dibenzofurans (PCDFs), and polychlorinated biphenyls (PCBs) . The goals of these studies were, primarily, to predict their chemical properties and understand their mechanisms of action or degradation in biotic systems. A summary can be found in Supplementary Table S1.  This communication describes the semi-automated codes for generating chemical structures, to be used as inputs for quantum chemical calculations in a standard package and to extract information from outputs for further analysis. Three different representative quantum chemical methods were selected for comparison: Hartree-Fock (HF), Density Functional Theory (DFT), and second-order Møller-Plesset perturbation theory (MP2). The lists of 76 PCDDs, 136 PCDFs, and 210 PCBs (inclusive of three non-chlorinated base structures) along with their systematic numberings [37][38][39] are shown in Supplementary Table S2-S4, respectively. The use of numberings herein will be in accordance with those in the lists. (This is not to be confused with PubChem ID in the Supplementary Materials and elution order number which is instrument-specific.) As shown in Supplementary Table S5, the congeners can be categorized into homologue groups by their empirical formulas or degrees of substitution.
To the best of our knowledge, in comparison to the existing literature this is the most comprehensive theoretical study in terms of the number of compounds in the study and the level of theory employed for the investigation. The computational results are presented in terms of the relative stability and planarity of compounds. These calculated results are compared to the experimental results in the literature in terms of their isomer distribution and toxicity. A better understanding of the nature of these compounds can lead us to a better management of and more sustainable solutions for POPs in the future.

Relative Stability
The association between the substitution patterns of each compound and its relative stability, with respect to its isomers, was assessed. The substitution pattern parameters include the degree of chlorination (using the number of H atoms, or the number of Cl atoms subtracted from 10 for PCBs and 8 for PCDDs and PCDFs) and the intra-ring and cross-ring interactions between two substituents. Intra-ring interactions are quantified by the number of pairs of Cl and non-hydrogen substituents (Cl, O, and C) at o-, mand ppositions on each phenyl ring. Cross-ring interactions are defined as the number of pairs of Cl substitution over an O-bridge (CR-O) and a C-C bond (CR-C). The possible parameters for intra-ring and cross-ring interactions are unique for each class of compounds, as shown in Table 1. The total number of o-, mand ppairs were also calculated from the tally of substituent pairs mentioned earlier. Overall, as listed in the table, there are a total of 11, 15, and 11 potential parameters for PCDDs, PCDFs, and PCBs, respectively. The following simple linear model was proposed for a relative stability prediction based on HF, B3LYP ∆G at 300 K, and MP2 electronic energy S = x 0 + 10 2 x 1 + 10 4 x 2 + 10 6 x 3 + 10 8 x 4 + . . . where S is the energy score and x 0 , x 1 , x 2 , . . . are possible parameters. Coefficients of 10 2n are used as some variables can take up to two-digit values (0 to 12 for example). Th association between energy scores and the calculated energy values was assessed using Spearman's correlation coefficient (ρ). Inspired by the knapsack problem, parameters that give the highest ρ values were added to the model one by one, until the increase in the ρ value became less than 0.0001.  Table 2 shows the resulting parameters, along with the corresponding ρ values. The results from all methodologies gave ρ > 0.99, and were mostly in agreement with each other. The major predictor for all compounds, in addition to the degree of substitution, are the intra-ring interactions. The predictor with the highest priority is the number of substituent pairs at opositions, regardless of substituent type, followed by substitutions at the mand ppositions, respectively. This supports the expectations of steric hindrance from the substituents (bulkiness: Cl > O ≈ C > H). Further, a cross-ring interaction over a C-C bond is an important parameter, especially for PCDFs, whereas a cross-ring interaction over an O-bridge is of less priority for PCDDs. With these results, the stability ranks of these compounds, regardless of methodology, can be sufficiently explained by their substituent pair interactions. For example, the energy scores S of PCDDs-13 and 14, when fitted according to the B3LYP results, can be calculated as:

PCDD HF B3LYP MP2
Parameters Parameters The weights used in this rank modelling are to provide a non-overlapping priority of consideration for each predictor variable. Qualitative interpretation then becomes obvious. Multiple linear regression for the prediction of energy values, following a similar knapsack approach, was also explored and is shown in Supplementary Table S6. In this case, the priorities of the predictors overlap and the already high ρ values can be increased further. Our computational model is able to predict the relative stabilities of all of these classes of compounds with a stronger correlation than the linear models proposed earlier [16][17][18][19], while not increasing the numbers of parameters.

Isomer Distribution
The energetic data in the previous subsection were used to predict the distribution of isomers within each homologue group. For this analysis, energy values were used to evaluate the distribution of isomers using the Boltzmann distribution at 300, 600, and 900 K [40]. The results were compared to available data on the abundance of these compounds in nature [37,[40][41][42][43][44][45][46][47][48][49][50][51][52][53][54][55][56]. A brief review of these data in the literature is shown in Supplementary Table S7. Our PCDDs and PCDFs results agree with a limited number of reports on experimental isomer distribution from incineration sources, while no clear trends were observed for PCBs (see Supplementary Table S8. The median ρ values calculated for each homologue group are 0.5833, 0.6555, and −0.1044 for PCDDs, PCDFs, and PCBs, respectively). For all three classes, the distribution of compounds found in nature usually differ due to the different accumulation and decomposition mechanisms in biotic and abiotic sources. For PCBs, it was argued that the distribution of isomers from the source is kinetically controlled rather than thermodynamically controlled [26].

Planarity
Planarity is widely claimed to be an important factor for these compounds to bind with the aryl hydrocarbon receptor (AhR) [57]. Therefore, geometric data were extracted and selected dihedral angles were calculated to represent the coplanarity of two phenyl rings. The designated carbon atoms for dihedral angle calculation are shown in Figure 1 and the list below.
The lists for PCDDs and PCDFs are separated into two groups representing two orthogonal axes of rotation. Absolute values of the acute angle representations of the angles were used for the mean calculation of each group.
Our data show that most PCDDs are planar, and therefore the data are insufficient for further prediction.
For PCDFs, with a limited number of compounds exhibiting non-planarity, there appears to be a moderate trend (R values ranging from 0.61 to 0.65) between the substitution pattern and coplanarity. The significant parameters (p < 0.05) involved are the presence or absence of a substituent pair at positions 1 and 9 and at positions 2 and 8. The reasons for this finding may be unclear due to a limited number of data points (n = 13) for analysis and moderate correlation. The prediction models were derived from multiple linear regression and are shown in Table S9.
For PCBs, a dihedral angle prediction, based on chlorine substitution position, was also assessed using multiple linear regression. In this model, the dihedral angle A of a compound is predicted by the equation: where; C 1 , C 2 , C 3 , and C 4 are the coefficients to be determined; and x i is 1 if position i is occupied by a Cl atom, and 0 if otherwise. Table 3 shows the coefficients of significant parameters (p < 0.05) and the corresponding correlation coefficients for the predictions. The results from all methodologies show that the most important parameter is Cl substitution at any one of the four positions ortho to the C-C bond connecting two rings, as mentioned in earlier findings [57]. This is then followed by the same kind of substitution on both rings (product of x 2 + x 6 and x 2 + x 6 ). Meta substitutions according to all methods and para substitutions, according to MP2 results, are less important parameters, as C 3 and C 4 are relatively small. Table 3. Constants, coefficients, and Pearson's (R) and Spearman's (ρ) correlation coefficients for the PCB planarity prediction model.

Toxicity
When comparing our geometric data to the toxic equivalency factor (TEF) values of these compounds [2], as shown in Supplementary Table S10, the following points are observed.

•
The twelve known dioxin-like PCBs with TEF values are among the ones with relatively small dihedral angles or are closer to being planar.

•
According to the B3LYP and MP2 results, there are three non-coplanar PCDDs (PCDDs-46, 65, and 71) from both methods, and two additional (PCDDs-42 and 45) from B3LYP only. All of these are not substituted at all four lateral positions (2,3,7,8), which are the substitution positions of PCDDs known to be toxic [2]. One of these, PCDD-46, is substituted at positions exactly opposite to that of 2,3,7,8-tetrachlorodibenzodioxin (TCDD), the most toxic dioxin-like compound [2].

Discussion, Conclusions and Future Work
Overall, this study provides high-quality quantum chemical calculations for the energetic and geometric properties of PCDDs, PCDFs, and PCBs. The comprehensive data provided for all of these three related classes of compounds can be used for further investigations. From our computational results, we have proposed prediction models for the important characteristics of these compounds. The relative stability prediction extends beyond earlier prediction models [16][17][18][19] with a high accuracy and with the allowance of qualitative interpretations. Prediction models have also been proposed for planarity, an important structural parameter for the toxic activities of these compounds. Furthermore, we have made associations between theoretical calculations (relative stability and planarity) and experimental data in the literature (abundance and toxicity).
Our work can be used for further analyses on the persistence and activities of the toxic compounds in nature. In terms of methodology, semi-automated codes in the Supplementary Materials can be extended further for the construction and analysis of other related compounds. All three methods, HF, B3LYP, and MP2, yielded similar trends for the classes of compounds in our study. Our results suggest that a computationally expensive treatment for correlation energy may not be needed for this purpose and a low-level method such as HF, in most cases, can sufficiently provide insightful findings on the molecular properties of the studied classes of compounds. The further investigation of different variations of DFT methods may also be useful. For example, range-separated hybrid functionals such as ωB97XD can be used for comparison with the results presented here.

Materials and Methods
We extended our exhaustive combinatorial investigation approach [58][59][60] to all members of the three classes of compounds. The Becke three-parameter hybrid functional combined with the Lee-Yang-Parr correlation functional (B3LYP), the most popular variation of DFT, was selected for this work. In total, three methods, HF/6-311++G(d,p), B3LYP/6-311++G(d,p), and MP2/cc-pVTZ [61], were used on Q-Chem 5.1 (developer version) [62] for geometry optimization. The HF-and B3LYP-optimised geometries were confirmed to be minima by verifying the absence of imaginary frequencies. All the raw output files and incomplete attempts made with MP2/6-311++G(d,p) are provided in the Supplementary Materials. The process of the structure enumeration and analysis of molecular properties, particularly relative energy and planarity, were completed by semi-automated scripts written in Mathematica for this project. The Mathematica notebook can be found in the Supplementary Materials. These Mathematica codes can be used for the construction and analysis of other compounds and for teaching purposes in classes relating to molecular modelling and cheminformatics. Our computational results were compared to the relative abundance of the compounds in nature and their estimated toxicity in humans and animals. The analyses presented are derived from the energetic and geometric data extracted from the computational results.
For energetic information, the electronic energy of each compound was obtained from the results of all methodologies and, additionally, the enthalpy, entropy, and Gibbs energy values were obtained from the frequency jobs of HF and B3LYP. Due to the free rotation about C-C bonds connecting the two phenyl rings in PCBs, 78 potential atropisomers [38] were explored and only the structures with the lower energy values were used for further calculations.
Supplementary Materials: The following are available online, Table S1: Summary of computational data and analyses on PCDDs, PCDFs, and PCBs in the literature; Table S2: List of all PCDDs by substitution positions and numbering, proposed by Ballschmiter et al., revised by Dorofeeva et al.; Table S3: List of all PCDFs by substitution positions and numbering, proposed by Ballschmiter et al.; Table S4: List of all PCBs by substitution positions and numbering, first proposed by Ballschmiter et al., later revised by IUPAC; Table S5: List of congener groups of PCDDs, PCDFs and PCBs by degree of substitution; Table S6: Values for constant C 0 and coefficients C 1 , C 2 , C 3 , . . . , parameters x 1 , x 2 , x 3 , . . . and correlation coefficients for the energy prediction model E = C 0 + C 1 x 1 + C 2 x 2 + C 3 x 3 + . . . , where E is the predicted HF, B3LYP ∆G at 300 K in Hartree and MP2 electronic energy in Hartree; Table S7: Summary of experimental data on abundance of PCDDs, PCDFs and PCBs in the literature; Table S8: Rank correlation coefficients (ρ values) for association between predicted distribution (using Boltzmann distribution) of PCDDs and PCDFs based on relative stability and observed distribution in fly ash samples from municipal waste incinerators (MWI); Table S9: Values for constant C 0 , coefficients C 1 , C 2 , and correlation coefficients for the PCDFs coplanarity prediction model; Table S10