In silico geometric and energetic data of all possible simple rotamers made of non-metal elements

This article presents theoretical data on the geometric and energetic features of halogenated rotamers with the following backbone structures: C–C, N–N, P–P, O–O, S–S, N–P, O–S, C–N, C–P, C–O, C–S, N–O, N–S, P–O and P–S. The data set covers all combinations of non-metal elements of the form abcx–ydef, where a, b, c, d, e and f are each a halogen (fluorine to iodine), hydrogen or a lone pair, and x and y are carbon, nitrogen, phosphorus, oxygen or sulfur. Data were obtained from ab initio geometry optimization and frequency calculations at the HF, B3LYP, MP2 and CCSD levels of theory with the 6-311++G(d,p) basis set. In total, 8535 non-enantiomeric structures were produced using custom-made code in Mathematica and the Q-Chem quantum chemical package. Extracted geometric and energetic data, as well as the raw output files, code and scripts associated with the data production, are available in the data repository.


Specifications
Value of the data
• The origin of the energetic preference for the staggered structure in ethane [1][2][3][4][5] and the gauche structure in 1,2-difluoroethane [6][7][8][9][10][11] has long been debated and is sometimes controversial. The comprehensive data set presented in this article fills a gap in the literature and can be used for further analysis and discussion of related topics such as the gauche effect [7,8,12] and bent bonds [7,13].
• Similar to the cis effect, where the cis or (Z) isomer is more stable than the trans or (E) isomer [14], and to the relative stability of positional isomers of substituted benzenes [15], the gauche effect is demonstrated in this data set by many examples in which steric hindrance alone fails to account for the observed relative stability trend.
• For reference purposes, 15665 rotamers are identified by internal numbering, SMILES and PubChem CID. (Of the 15665 rotamers, 1713 (11%) are identified with a CID, of which only 631 are unique.) These identifiers can be used in future theoretical or experimental work involving two-center non-metal rotamers.
• Source code and raw data are available for reproduction of the work and for further analysis. For example, molecular dipole moments and vibrational spectra can be extracted from the raw output. The source code can be used to generate molecules of related classes for further calculation.

Data description
There are 15 folders, one each for C–C, N–N, P–P, O–O, S–S, N–P, O–S, C–N, C–P, C–O, C–S, N–O, N–S, P–O and P–S. Each folder contains four subfolders for the four methods: HF, B3LYP, MP2 and CCSD. In addition to raw output files (.out) and geometries in Z-matrix and Cartesian coordinate format (.xyz), the following summary table files (.csv) are provided in each subfolder:
• A single csv file in the xyz subfolder containing geometric data: 7 bond lengths in Å, and 12 bond angles and 9 torsional angles in degrees. (If lone pairs are involved, there are fewer geometric parameters and 'de' is shown in place of a numerical value.)
• Energetic data, in separate csv files: electronic energy (E_elec) in a.u. (Hartree), thermal correction to enthalpy (H_corr) in kcal mol⁻¹, zero-point vibrational energy (E_ZPE) in kcal mol⁻¹ and entropy (S) in cal mol⁻¹ K⁻¹.
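Because the energetic quantities are stored in mixed units, comparing rotamers requires a unit conversion first. The following is a minimal sketch of how relative Gibbs energies could be derived from the tabulated quantities. The column names, the single merged table and the numerical values are illustrative assumptions, not the repository's actual layout or data, and the sketch assumes that H_corr is the full thermal correction to enthalpy (zero-point energy included).

```python
import csv
import io

HARTREE_TO_KCAL = 627.5095  # kcal mol^-1 per Hartree

# Made-up placeholder values and hypothetical column names, for illustration only.
# E_elec in Hartree, H_corr in kcal mol^-1, S in cal mol^-1 K^-1.
SAMPLE = """name,E_elec,H_corr,S
rotamer_A,-100.000000,30.000,70.000
rotamer_B,-100.001000,30.100,69.000
"""

def relative_gibbs(rows, temperature=298.15):
    """Return Gibbs energies (kcal mol^-1) relative to the lowest rotamer."""
    gibbs = {}
    for row in rows:
        # Enthalpy: electronic energy converted to kcal mol^-1 plus thermal correction.
        enthalpy = float(row["E_elec"]) * HARTREE_TO_KCAL + float(row["H_corr"])
        # Entropy is tabulated in cal, so divide by 1000 to get kcal.
        gibbs[row["name"]] = enthalpy - temperature * float(row["S"]) / 1000.0
    lowest = min(gibbs.values())
    return {name: value - lowest for name, value in gibbs.items()}

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
rel = relative_gibbs(rows)
for name, value in sorted(rel.items(), key=lambda item: item[1]):
    print(f"{name}: {value:.3f} kcal/mol")
```

With real data, the separate csv files would first have to be joined on the rotamer identifier before a function like this is applied.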
An example of these data is shown in Fig. 1. Compound names exist in two different formats and, because of symmetry, a given rotamer can be written in up to six ways in either format. A rotamer name may therefore not exactly match a file name in many instances. Source code, scripts and examples are provided in a separate folder.
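The six equivalent ways of writing a rotamer can be sketched as follows. This is an illustration under stated assumptions, not the naming code used for the data set: a rotamer is encoded here as a hypothetical 6-tuple listing the substituents clockwise around the Newman projection, alternating front (even index) and back (odd index) atoms, with "lp" standing for a lone pair. Three writings come from rotating the whole molecule about the central bond, and three more from viewing it from the other end, which swaps the backbone atoms and reverses the clockwise order.

```python
def equivalent_writings(front_atom, back_atom, ring):
    """Return the set of equivalent (front, back, ring) descriptions of one rotamer.

    ring: 6-tuple, clockwise around the Newman projection,
          even indices on the front atom, odd indices on the back atom.
    """
    def rotations(r):
        # Rotating by 120 degrees shifts every substituent by two ring positions.
        return [r[k:] + r[:k] for k in (0, 2, 4)]

    # Viewing from the other end reverses the order and swaps front and back.
    flipped = tuple(ring[(1 - i) % 6] for i in range(6))

    writings = {(front_atom, back_atom, r) for r in rotations(ring)}
    writings |= {(back_atom, front_atom, r) for r in rotations(flipped)}
    return writings

# Hypothetical C-N rotamer: C carries H, Cl, Br; N carries F, a lone pair and H.
asymmetric = equivalent_writings("C", "N", ("H", "F", "Cl", "lp", "Br", "H"))
# anti-1,2-difluoroethane: the two F atoms sit 180 degrees apart.
anti_dfe = equivalent_writings("C", "C", ("F", "H", "H", "F", "H", "H"))
# Ethane: every writing is identical.
ethane = equivalent_writings("C", "C", ("H",) * 6)

print(len(asymmetric), len(anti_dfe), len(ethane))
```

Taking the smallest element of the returned set (for example with `min`) gives one canonical writing per rotamer, which is one way a rotamer name could be matched against a file name.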

Experimental design, materials, and methods
An exhaustive listing of all rotamers can be produced in many different ways. We completed our comprehensive lists of all rotamers by extending the approach we used for substituted benzenes [15]. Rotamers as viewed in a Newman projection can be treated as equivalent to substituted benzenes with two additional conditions. First, the list of substituent elements must include a lone pair of electrons. Second, rotamers are less symmetric than benzenes with regard to rotation and flipping. Q-Chem 5.2.1 [16], IQmol 2.13 [17] and Wolfram Mathematica 12.0 [18] were used in the same way as described previously [15]. In addition to the compounds in Tables 1-8, preliminary calculations were completed for all combinations of a single C, N, P, O or S atom with hydrogen and halogen atoms from F to I.

Tables 1-8 provide a comprehensive listing of all rotamers considered in this work. The listing is arranged first by the number of substituent elements and then by the pattern of the empirical formula. An explanation of how to calculate the number of empirical formulas in each table is given in Table 9. Rotamer structures are also listed for each pattern. Each rotamer structure can be rotated three times unless it is symmetric (cannot be rotated) or has chiral center(s) (×2 for each center). Asterisks (*) shown in Tables 1, 2, 4 and 6-8 indicate chiral centers. There are two special cases of meso compounds in Tables 1 and 2 that have a reduced number of rotamer structures. Similar symmetrical cases were also found in the previous study of substituted benzenes [15]. Since enantiomeric structures are identical in energy, only one of the two enantiomeric structures is considered for each pair. Table 10 provides an overview of all computational jobs described in this paper.
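The counting behind such a listing can be checked by brute force. The sketch below is an independent illustration in Python, not the authors' Mathematica code. It assumes the ring encoding of a Newman projection (six positions clockwise, even indices on the front atom, odd indices on the back atom), a hypothetical "lp" label for lone pairs, and the usual lone-pair counts of 0 for C, 1 for N and P, and 2 for O and S. Under these assumptions it reproduces the figures quoted in the Table 6 caption for C–N (3125 rotamers, 126 formulas, 1625 non-enantiomeric rotamers) and the total of 15665 rotamers over the 15 backbones.

```python
from itertools import product

SUBSTITUENTS = ("H", "F", "Cl", "Br", "I")
LP = "lp"  # hypothetical label for a lone pair
LONE_PAIRS = {"C": 0, "N": 1, "P": 1, "O": 2, "S": 2}
BACKBONES = [("C", "C"), ("N", "N"), ("P", "P"), ("O", "O"), ("S", "S"),
             ("N", "P"), ("O", "S"), ("C", "N"), ("C", "P"), ("C", "O"),
             ("C", "S"), ("N", "O"), ("N", "S"), ("P", "O"), ("P", "S")]

def end_arrangements(atom):
    """All ordered triples on one backbone atom with the right number of lone pairs."""
    choices = SUBSTITUENTS + (LP,)
    return [t for t in product(choices, repeat=3) if t.count(LP) == LONE_PAIRS[atom]]

def rotations(ring):
    # Rotation about the central bond by 0, 120 and 240 degrees.
    return [ring[k:] + ring[:k] for k in (0, 2, 4)]

def flip(ring):
    # View from the other end: order reversed, front and back exchanged.
    return tuple(ring[(1 - i) % 6] for i in range(6))

def mirror(ring):
    # Mirror image: order reversed, front and back kept in place.
    return tuple(ring[(-i) % 6] for i in range(6))

def canonical(ring, homonuclear, merge_mirror):
    images = rotations(ring)
    if homonuclear:  # the end-to-end flip is only a symmetry when x == y
        images += rotations(flip(ring))
    if merge_mirror:  # treat enantiomers as the same structure
        images += [mirror(r) for r in images]
    return min(images)

def count_rotamers(x, y, merge_mirror=False):
    seen = set()
    for f in end_arrangements(x):
        for b in end_arrangements(y):
            ring = (f[0], b[0], f[1], b[1], f[2], b[2])
            seen.add(canonical(ring, x == y, merge_mirror))
    return len(seen)

def count_formulas(x, y):
    # An empirical formula is the multiset of substituents, ignoring lone pairs.
    return len({tuple(sorted(s for s in f + b if s != LP))
                for f in end_arrangements(x) for b in end_arrangements(y)})

print(count_rotamers("C", "N"), count_formulas("C", "N"),
      count_rotamers("C", "N", merge_mirror=True))
print(sum(count_rotamers(x, y) for x, y in BACKBONES))
```

Canonicalizing by the minimum over all symmetry images is a simple design choice that avoids any explicit group theory; it is affordable here because the largest case (C–C) has only 15625 raw arrangements.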
All optimization jobs converged in Q-Chem before geometry information was extracted. The converged structure is not necessarily the same conformer as the input; for example, some anti forms turned into gauche forms during geometry optimization. Almost all of the converged rotamers were confirmed to be local minima by frequency calculation (16,729 out of 16,840 jobs). However, 111 frequency jobs had exactly one imaginary frequency. Similarly, 28 of the 34 rotamers of N–N, P–P and N–P with an imaginary frequency are in an anti form with respect to the two lone pairs. As these observations suggest that the anti form is not stable, the gauche effect is evident in these classes of compounds.
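The check described above amounts to counting imaginary modes for each frequency job. A minimal sketch is given below; it assumes the harmonic frequencies have already been parsed into a list of numbers and that imaginary frequencies are reported as negative values, which is the common convention in quantum chemistry output. The function name and the sample frequency lists are hypothetical and are not taken from the data set.

```python
def classify_stationary_point(frequencies_cm1):
    """Classify a converged geometry from its harmonic frequencies (cm^-1)."""
    # Assumption: imaginary modes are encoded as negative numbers.
    n_imaginary = sum(1 for f in frequencies_cm1 if f < 0)
    if n_imaginary == 0:
        return "local minimum"
    if n_imaginary == 1:
        return "first-order saddle point"
    return f"higher-order saddle point ({n_imaginary} imaginary modes)"

# Made-up frequency lists, for illustration only.
minimum_like = [310.2, 820.5, 1050.7, 3010.9]
saddle_like = [-145.3, 790.1, 1102.4, 2998.6]

print(classify_stationary_point(minimum_like))
print(classify_stationary_point(saddle_like))

# Consistency of the job counts quoted in the text.
confirmed_minima, one_imaginary, total_jobs = 16729, 111, 16840
print(confirmed_minima + one_imaginary == total_jobs)
```

A structure with exactly one imaginary frequency is a first-order saddle point rather than a minimum, which is why such anti forms are read as not stable.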

Competing Interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Table 6
List of 3125 possible C–N (or C–P) rotamers in 126 formulas (1625 non-enantiomeric rotamers).

Table 9
Examples of calculating the number of empirical formulas. • k is the actual number of substituent elements and • n_i! is the product of the factorials of the numbers of substituent elements with the same subscript.

Table 10
Summary of 43450 computational jobs included in this paper (opt for geometry optimization and freq for frequency calculation).