Data on generation of Kekulé structures for graphenes, graphynes, nanotubes and fullerenes and their aza-analogs

Two new features are added to existing algorithms for kekulization of chemical structures: handling of triple and cumulene bonds in cycles, and random atom sorting to remove unmatched atoms. Handling of triple and cumulene bonds enables kekulization of graphynes and graphdiynes. Random sorting shortens the calculation time: kekulization of large chemical structures containing about 10⁷ atoms takes ≤1 min on a typical PC. Source code (Pascal, GNU GPL license) is included, together with a compiled application (Windows 64). Calculation times and unmatched atom statistics are provided for graphenes, graphynes, nanotubes, graphyne nanotubes and fullerenes. Benchmark comparisons are provided for some of the data.


Specifications
Value of the data
This data demonstrates the viability of a fast algorithm for modeling state-of-the-art materials such as graphenes in scientific and commercial applications.
The data is provided for very large chemical structures (10⁷ to 10⁸ atoms) and was obtained on ordinary hardware with short calculation times.
This data has been benchmarked against existing algorithms and can be used to benchmark future improvements to the algorithm.

Data
A detailed description of the algorithm for fast generation of Kekulé structures from a list of atomic valences and connectivity matrices is given in [1]. The source code that is based on this algorithm and was used for the calculations is provided with this publication.
Model calculations were performed for graphenes, nanotubes, fullerenes and their aza-analogs, polycyclopentadienes and porphine on a Windows Server 2012 machine with a 2.8 GHz i7 processor and 16 GB RAM. The procedures were run in a single thread, not in parallel. The Win API function GetTickCount() was used to measure elapsed time; it reports time in milliseconds. For compounds containing fewer than 1,000,000 atoms, the computing time was determined as the average of 1000 calculations; for fewer than 10,000,000 atoms, as the average of 100 calculations; and for more than 10,000,000 atoms, as the time of a single calculation.
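The averaging protocol can be illustrated with a short sketch. It is written in Python rather than the Pascal of the published source, it uses `time.perf_counter()` in place of GetTickCount(), and the function names `repetitions_for` and `mean_time_ms` are hypothetical. The text does not say how a structure of exactly 10,000,000 atoms is treated; the sketch assumes a single run.

```python
import time


def repetitions_for(n_atoms: int) -> int:
    """Number of repeated runs used for averaging, chosen by structure size."""
    if n_atoms < 1_000_000:
        return 1000
    if n_atoms < 10_000_000:
        return 100
    return 1  # assumption: exactly 10,000,000 atoms is also timed once


def mean_time_ms(task, n_atoms: int) -> float:
    """Average wall-clock time of task() in milliseconds, single-threaded."""
    runs = repetitions_for(n_atoms)
    start = time.perf_counter()
    for _ in range(runs):
        task()
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / runs
```

Here `task` stands for one complete kekulization of the structure; any callable without arguments can be passed.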
For repeated calculations, the sorting order of the sequence of chemical bonds was random. For an algorithm of aromatic bond alternation, this is equivalent to the random selection of a double bond in the node. For a large number of calculations, this procedure allows estimation of the statistical distribution of unmatched atoms after alternation of aromatic bonds.
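A minimal sketch of how a random bond order leads to a distribution of unmatched atoms is given below. It is a plain greedy placement of double bonds, not the algorithm of [1], which additionally removes unmatched atoms; the function names and the hexagon test structure are illustrative assumptions.

```python
import random
from collections import Counter


def unmatched_after_greedy(n_atoms, bonds, rng):
    """Shuffle the bond list, then place double bonds greedily.

    A bond is made double only if neither of its atoms already carries
    a double bond. Returns the number of atoms left without one.
    """
    order = list(bonds)
    rng.shuffle(order)  # random sorting order of the bond sequence
    matched = [False] * n_atoms
    for a, b in order:
        if not matched[a] and not matched[b]:
            matched[a] = matched[b] = True
    return matched.count(False)


def unmatched_distribution(n_atoms, bonds, runs=1000, seed=0):
    """Histogram of unmatched-atom counts over repeated random orders."""
    rng = random.Random(seed)
    return Counter(
        unmatched_after_greedy(n_atoms, bonds, rng) for _ in range(runs)
    )


# Six-membered ring: atoms 0..5, each bonded to its neighbor.
ring = [(i, (i + 1) % 6) for i in range(6)]
dist = unmatched_distribution(6, ring)
```

For the six-membered ring, a run ends either with three double bonds and no unmatched atoms, or with two double bonds and two unmatched atoms, so `dist` has at most the keys 0 and 2.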
The compiled application (Windows 64) used for the calculations is in Attachment1.zip. Computer-readable chemical structures are in Attachment2.zip.

Nanotubes
Carbon nanotubes of various lengths were generated using the pattern shown in Fig. 1. Generated nanotubes contain two acyclic methylene groups at the beginning and at the end of a nanotube.
In Fig. 1, the arrows show the points of attachment used to generate a polymer molecule. The points of attachment of the outer rings connect with the points of attachment of the inner rings. The order of the aromatic bond formed by connecting a pair of attachment points is not known in advance. For the end moieties of the polymer, the points of attachment were replaced with single bonds to hydrogen atoms. The results of the calculations are given in Table 1 and discussed below.
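The block-joining procedure described above can be sketched as follows. The `Block` type, the function `build_chain` and the hexagon used as a repeating unit are hypothetical; the actual repeating units are those of the figures. Bond orders are deliberately left unassigned, since they are determined only by the later kekulization step.

```python
from dataclasses import dataclass


@dataclass
class Block:
    n_atoms: int
    bonds: list  # (i, j) atom index pairs inside one block
    left: list   # atoms carrying left-hand attachment points
    right: list  # atoms carrying right-hand attachment points


def build_chain(block, k):
    """Join k copies of a block; return (n_atoms, bonds, capped_atoms).

    capped_atoms lists the end atoms whose free attachment points are
    replaced with single bonds to hydrogen.
    """
    bonds = []
    for c in range(k):
        off = c * block.n_atoms
        bonds += [(i + off, j + off) for i, j in block.bonds]
        if c + 1 < k:
            nxt = off + block.n_atoms
            # connect right-hand points of this block to left-hand
            # points of the next block
            bonds += [(r + off, l + nxt)
                      for r, l in zip(block.right, block.left)]
    last = (k - 1) * block.n_atoms
    capped = list(block.left) + [r + last for r in block.right]
    return k * block.n_atoms, bonds, capped


# Toy repeating unit: a hexagon attached through atoms 0 and 3.
hexagon = Block(6, [(i, (i + 1) % 6) for i in range(6)], [0], [3])
n, chain_bonds, capped = build_chain(hexagon, 3)
```

With three hexagons the chain has 18 atoms, 18 ring bonds plus 2 linking bonds, and the two end atoms 0 and 15 are capped with hydrogen.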

Graphene
The pattern for generation of graphene is shown in Fig. 2. To generate polymers, the points of attachment on the left were connected with the points of attachment on the right of another graphene block in a way similar to that used for nanotubes. The points of attachment of the first and the last blocks were capped with hydrogen atoms via single bonds. For acyclic carbon atoms, required for the generation of repeating aromatic cycles, two hydrogen atoms were added to the point of attachment, resulting in a methylene group. The results of the calculations are given in Table 1.
Kekulization of various compounds, including nanotubes and graphenes, is described in [2,3], where computing times are also provided. For comparison, we performed calculations for some of the same compounds. The results are in Table 1. Our computing times were dramatically shorter than those reported in [2].
A detailed discussion of this data can be found in [1].

Fullerenes
Polymeric analogs of fullerenes do not exist. Consequently, all the calculations were performed for monomers. To validate the efficiency of the algorithm for five-membered cycles, model calculations were performed for azafullerenes obtained by randomly replacing 2, 4 or 8 carbon atoms with nitrogen. Nitrogen has a valence of 3 and forms three single bonds at each node. This substitution can be done only for an even number of atoms. Otherwise, the bonds cannot be fully alternated, and the number of unmatched atoms is odd. The results of the calculations are given in Table 2.
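The parity requirement follows from simple counting: every double bond covers exactly two carbon atoms, while a nitrogen with three single bonds takes none, so the number of remaining carbon atoms must be even. The sketch below illustrates this for a cage of 60 atoms; the function names are hypothetical, and the check is a necessary condition only, not a proof that a Kekulé structure exists.

```python
import random


def substitute_nitrogen(n_atoms, k, rng):
    """Pick k distinct cage positions at random to become nitrogen."""
    return set(rng.sample(range(n_atoms), k))


def parity_allows_kekule(n_atoms, nitrogen_sites):
    """Necessary condition: carbons needing a double bond pair up evenly."""
    carbons = n_atoms - len(nitrogen_sites)
    return carbons % 2 == 0


rng = random.Random(1)
for k in (2, 4, 8):
    sites = substitute_nitrogen(60, k, rng)
    print(k, parity_allows_kekule(60, sites))
```

With an odd number of substitutions in the same cage, the count of carbon atoms is odd and at least one atom must remain unmatched.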
A detailed discussion of this data can be found in [1]. In addition, model calculations were performed for 2488 fullerenes from a library by Yoshida [4]. This data is provided in Table S3. The legend for the column headers in this table is the same as for Table 2, except that the column "No. non-existent" is not provided because every compound from the set of 1000 had a Kekulé structure.

Graphynes and graphyne nanotubes
We studied graphynes GY1 and GY7 (Fig. 4) and graphyne nanotubes of various degrees of polymerization. Graphyne nanotubes were generated by replacing hydrogen atoms in GY7 with carbon atoms and adding a bond between these atoms in a vertical position.
The times required for kekulization of graphynes and graphyne nanotubes of various degrees of polymerization are shown in Table 3. In the table, "Compound" denotes the chemical structure in Fig. 2.

Polycyclopentadienes
Polycyclopentadienes (Fig. 5) are remarkable because they contain odd-sized cycles and can be easily generated as long-chain polymers. We studied three types of polycyclopentadienes. In the first type, after generation of the structure, the free valences of the end-group carbon atoms marked with arrows were replaced with hydrogens, resulting in methylene end groups. In the second type, the free valences in the left-hand end group were combined with those in the right-hand end group to form a cycle. In the third type, the end groups were combined in a crisscross fashion to form a Moebius loop. The results of the calculations are shown in Table 4.
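The three end-group treatments differ only in how the free attachment points of a finished chain are handled. The sketch below assumes, for illustration, that each end of the chain carries two attachment points listed in the same top-to-bottom order; the function name `close_ends` and the mode labels are hypothetical.

```python
def close_ends(left, right, mode):
    """Return (extra_bonds, capped_atoms) for one end-group treatment.

    left, right: atoms carrying the free attachment points at the two
    ends of the chain, listed in the same top-to-bottom order.
    """
    if mode == "open":
        # type 1: every free valence is replaced with hydrogen
        return [], list(left) + list(right)
    if mode == "cycle":
        # type 2: right end joined to left end in the same order
        return list(zip(right, left)), []
    if mode == "moebius":
        # type 3: right end joined to left end in reversed order
        return list(zip(right, reversed(left))), []
    raise ValueError(f"unknown mode: {mode}")


print(close_ends([0, 1], [10, 11], "cycle"))    # ([(10, 0), (11, 1)], [])
print(close_ends([0, 1], [10, 11], "moebius"))  # ([(10, 1), (11, 0)], [])
```

The returned bonds are added to the bond list of the chain before kekulization; as with the other inter-block bonds, their order is left for the algorithm to assign.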
The legend for the column headers is the same as for Table 2. Parenthetical values in the column "No. non-existent" are the counts of structures for which Kekulé representations were found using a backtracking algorithm [5].
The calculation statistics for polycyclopentadienes differ from those for the other compounds studied. Specifically, 100 calculations were performed for structures with 10⁶ to 10⁷ atoms and 10 calculations for structures with more than 10⁷ atoms. The increase in the number of calculations was due to the probabilistic nature of the algorithm for polycyclopentadienes, which requires multiple initial approximations for the generation of Kekulé structures.