Geometric and energetic data from quantum chemical calculations of halobenzenes and xylenes

This article presents theoretical data on geometric and energetic features of halobenzenes and xylenes. Data were obtained from ab initio geometry optimization and frequency calculations at the HF, B3LYP, MP2 and CCSD levels of theory with the 6-311++G(d,p) basis set. In total, 1504 structures of halobenzenes, three structures of xylenes and one structure of benzene were generated and processed by custom-made codes in Mathematica. The quantum chemical calculations were performed with the Q-Chem software package. Geometric and energetic data of the compounds are presented in this paper as supplementary tables. Raw output files, as well as the codes and scripts used to produce and extract the data, are also provided.


Specifications
Subject: Chemistry
Specific subject area: Physical and Theoretical Chemistry/Spectroscopy
Type of data: Tables and Q-Chem output files
How data were acquired: Quantum chemical computation on Q-Chem 5.2.1, Developer Version
Data format: Raw and analysed
Parameters for data collection: Hartree-Fock (HF)/6-311++G(d,p); Becke, 3-parameter, Lee-Yang-Parr (B3LYP)/6-311++G(d,p); second-order Møller-Plesset perturbation theory (MP2)/6-311++G(d,p); Coupled Cluster Singles and Doubles (CCSD)/6-311++G(d,p)
Description of data collection: Geometric and energetic data of halobenzenes, xylenes and benzene were generated by quantum chemical computation and processed by custom-made codes
Data source location: Mahidol University, Salaya, Thailand. Latitude and longitude: 13.792790, 100.325707
Data accessibility: With the article

Value of the data
• All 1505 possible benzene and halobenzene compounds and three xylenes are explicitly shown in this paper with numbering, IUPAC name, PubChem CID and SMILES. These can be used as a reference for both theoretical and experimental work involving this class of compounds.
• Geometric and energetic data can be used for further analysis to understand the relative stability of isomers. In particular, the unexpected trends in the relative stability of isomers are of interest to scientists in a similar manner to the cis and gauche effects. The data set includes many examples where steric hindrance alone fails to account for the behaviour observed in halobenzenes and xylenes.
• Raw data as well as associated scripts and codes are provided so that interested researchers can reproduce our data and perform calculations at other levels of theory or for other relevant classes of compounds. Vibrational spectra and other detailed information can be extracted from the output files as needed.
There are many potential uses of the spectral information, for example, detection of xylene for food safety applications [1] and understanding the formation of polychlorinated biphenyls (PCBs) [2]. The data can also serve as a test set for molecular modelling software packages.

Data description
A total of 1505 unique compounds of benzene, including all degrees of substitution with F, Cl, Br and I atoms, and three isomers of xylene were investigated. The classification and counting of the 1505 compounds are shown exhaustively in Tables 1 and 2, with specific examples in Figs. 1-3. The main difference between Tables 1 and 2 is the treatment of the hydrogen atom. In Table 1, hydrogen is treated in the same way as a halogen, which leads to the binomial coefficients $\binom{5}{k}$ for five kinds of elements. In Table 2, hydrogen is treated separately, which leads to the binomial coefficients $\binom{4}{k}$ for four kinds of halogen atoms. Table 3 summarizes the total number of Q-Chem 5.2.1 [3] output files for different classes of compounds, types of calculation (geometry optimization/frequency calculation) and levels of theory (HF, B3LYP, MP2, and CCSD). In the supplementary information, summary table files (.csv) are provided per level of theory.
• Geometric data of 12 bond lengths, 12 bond angles and 12 torsional angles in a single .csv file.
• Energetic data, in separate files, include electronic energy ($E_\mathrm{elec}$) in Hartree, thermal correction to enthalpy ($H_\mathrm{corr}$) in kcal mol$^{-1}$, zero-point vibrational energy ($E_\mathrm{ZPE}$) in kcal mol$^{-1}$ and entropy ($S$) in cal mol$^{-1}$ K$^{-1}$.

Fig. 1. List of 6 + 7 + 3 = 16 structures of halobenzene with empirical formula C$_6$αβγ$_2$δ$_2$ (distribution of elements 1-1-2-2). For simplicity, the two δ are omitted, and the structures are organised into groups in which, from left to right, the first four substituents are in positions 1,2,3,4-, 1,2,3,5- and 1,2,4,5-, respectively. If switching the red letters of a structure leads to a different isomer, then that single depiction represents two different structures, as shown with the notation "×2". Letters α, β, γ and δ represent different substituents among F, Cl, Br and I. (For Table 1, one of the letters may represent a hydrogen atom.)

Fig. 2. (Reassignment of letters is needed.)

Table 1
List of all compounds by the number of elements bonded to carbon atoms (In total, there are 1505 benzene and halobenzene compounds with 210 possible empirical formulas.).
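The totals quoted above (1505 compounds, 210 empirical formulas, and 16 isomers for a 1-1-2-2 distribution) can be cross-checked by brute-force enumeration. The following is a minimal sketch, not the Mathematica code used for this data set: it assumes that two substitution patterns describe the same compound whenever they are related by one of the 12 rotations or reflections of the six-membered ring, and all function names are illustrative.

```python
from itertools import product

# Hydrogen is treated like a halogen here, as in Table 1.
SUBSTITUENTS = ("H", "F", "Cl", "Br", "I")


def ring_symmetries(n=6):
    """Return the 2n index permutations (n rotations, n reflections) of an n-ring."""
    perms = []
    for shift in range(n):
        perms.append(tuple((i + shift) % n for i in range(n)))  # rotation
        perms.append(tuple((shift - i) % n for i in range(n)))  # reflection
    return perms


def unique_patterns(substituents=SUBSTITUENTS, n=6):
    """Enumerate substitution patterns that are distinct under ring symmetry."""
    perms = ring_symmetries(n)
    seen = set()
    for pattern in product(substituents, repeat=n):
        # The lexicographically smallest symmetry image is the canonical form.
        seen.add(min(tuple(pattern[i] for i in p) for p in perms))
    return seen


def count_isomers(patterns, formula):
    """Count unique patterns whose substituents match the given multiset."""
    target = sorted(formula)
    return sum(1 for p in patterns if sorted(p) == target)


patterns = unique_patterns()
formulas = {tuple(sorted(p)) for p in patterns}
print(len(patterns))   # 1505 compounds (benzene plus 1504 halobenzenes)
print(len(formulas))   # 210 empirical formulas
```

For instance, `count_isomers(patterns, ["F", "Cl", "Br", "Br", "I", "I"])` counts the isomers of one formula with the 1-1-2-2 distribution described in Fig. 1. The 210 empirical formulas correspond to the number of ways of choosing six substituents from five elements with repetition, $\binom{10}{6}$.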
The following associated files are also provided.
• Raw Q-Chem output files (.out) for all compounds.
• Geometry in Z-matrix and Cartesian coordinate format (.xyz) for all compounds.
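As an illustration of how the tabulated bond lengths, bond angles and torsional angles relate to Cartesian coordinates, the sketch below recomputes these quantities from an XYZ-style geometry. It is not the extraction code used for this data set. It assumes the common XYZ layout (atom count on the first line, a comment on the second, then one `symbol x y z` line per atom), which may differ from the layout of the provided files, and all function names are hypothetical.

```python
import math


def parse_xyz(text):
    """Parse XYZ-style text into a list of (symbol, (x, y, z)) tuples."""
    lines = text.strip().splitlines()
    n_atoms = int(lines[0])
    atoms = []
    for line in lines[2:2 + n_atoms]:  # skip the count and comment lines
        symbol, x, y, z = line.split()[:4]
        atoms.append((symbol, (float(x), float(y), float(z))))
    return atoms


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _norm(a):
    return math.sqrt(_dot(a, a))


def bond_length(p, q):
    """Distance between points p and q, in the units of the input."""
    return _norm(_sub(p, q))


def bond_angle(p, q, r):
    """Angle p-q-r in degrees, with q at the vertex."""
    u, v = _sub(p, q), _sub(r, q)
    cos_theta = _dot(u, v) / (_norm(u) * _norm(v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))


def torsion_angle(p, q, r, s):
    """Torsional angle p-q-r-s in degrees, in the range (-180, 180]."""
    b1, b2, b3 = _sub(q, p), _sub(r, q), _sub(s, r)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = _norm(b2) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))
```

A typical use would be to call `parse_xyz` on the contents of one geometry file and pass the coordinates of the relevant ring and substituent atoms to the three functions. For a planar ring, the torsional angles are expected to be close to 0° or ±180°.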

Experimental design, materials, and methods
Due to prohibitive computational cost, frequency calculations at the MP2 and CCSD levels of theory were excluded, and only the compounds from benzene up to dihalobenzenes, together with the xylenes, were selected for CCSD optimization jobs. The output files were processed by custom-made scripts and Wolfram Mathematica 12.0 [4] codes to extract geometric and energetic data of all halobenzene compounds in a similar manner to our previous work [5]. Data from the three xylene compounds are provided for reference purposes and were read from IQmol 2.13 manually [6].

Table 2
List of all compounds by different degrees of substitution to benzene (In total, the number of compounds and empirical formulas is the same as in Table 1).
Fig. 3.
b See Fig. 1.

Table 3
Summary of investigated compounds, levels of theory (HF, B3LYP, MP2, and CCSD) with the 6-311++G(d,p) basis set and types of calculation (opt for geometry optimization and freq for frequency calculation).
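Once extracted, the energetic quantities can be combined to compare isomers. The sketch below is one possible way to do so and is not part of the provided scripts. It converts the electronic energy from Hartree to kcal mol$^{-1}$, adds the thermal correction to enthalpy, and subtracts the entropy term. It assumes that the tabulated enthalpy correction already contains the zero-point vibrational energy (so that it is not added a second time) and that a temperature of 298.15 K is appropriate; both assumptions should be checked against the output files. The function names are hypothetical.

```python
HARTREE_TO_KCAL = 627.509474  # kcal/mol per Hartree


def gibbs_energy_kcal(e_elec_hartree, h_corr_kcal, s_cal, temperature=298.15):
    """Return G = E_elec + H_corr - T*S in kcal/mol.

    Assumes h_corr_kcal already includes the zero-point vibrational energy.
    s_cal is the entropy in cal/(mol K), hence the division by 1000.
    """
    enthalpy = e_elec_hartree * HARTREE_TO_KCAL + h_corr_kcal
    return enthalpy - temperature * s_cal / 1000.0


def relative_gibbs(isomers, temperature=298.15):
    """Map each isomer name to its Gibbs energy relative to the most stable one.

    `isomers` maps a name to a tuple (E_elec in Hartree, H_corr in kcal/mol,
    S in cal/(mol K)).
    """
    absolute = {
        name: gibbs_energy_kcal(e, h, s, temperature)
        for name, (e, h, s) in isomers.items()
    }
    lowest = min(absolute.values())
    return {name: value - lowest for name, value in absolute.items()}
```

The input dictionary would be filled from the rows of the summary tables for one set of isomers at a single level of theory. Energies from different levels of theory should not be mixed in the same comparison.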