A proposed simulation method for directed self-assembly of nanographene

Abstract A methodology for predictive kinetic self-assembly modeling of bottom-up chemical synthesis of nanographene is proposed. The method maintains physical transparency in using a novel array format to efficiently store molecule information and by using array operations to determine reaction possibilities. Within a minimal model approach, the parameter space for the bond activation energies (i.e. molecule functionalization) at fixed reaction temperature and initial molecule concentrations is explored. Directed self-assembly of nanographene from functionalized tetrabenzanthracene and benzene is studied with regions in the activation energy phase-space showing length-to-width ratio tunability. The degree of defects and reaction reproducibility in the simulations is also determined, with the rate of functionalized benzene addition providing additional control of the dimension and quality of the nanographene. Comparison of the reaction energetics to available density functional theory data suggests the synthesis may be experimentally tenable using aryl-halide cross-coupling and noble metal surface-assisted catalysis. With full access to the intermediate reaction network and with dynamic coupling to density functional theory-informed tight-binding simulation, the method is proposed as a computationally efficient means towards detailed simulation-driven design of new nanographene systems.

energetics of chemical synthesis steps using density functional theory for specific surfaces and carbon-based molecules (e.g. Blankenburg et al [8]). In modeling self-assembly processes, molecular-dynamics [9] and kinetic Monte Carlo [10,11] have also been used for specific precursor molecules, surfaces and reaction conditions. In this paper, a physically transparent method is proposed, which uses a model-system approach and exploits the selective functionalization of precursor polyaromatic hydrocarbon molecules for directed nanographene kinetic self-assembly. The advantage of the kinetic self-assembly approach is that it provides a theoretical framework to identify simple sets of rules underpinning molecular assembly that are not easily probed using kinetic Monte Carlo or molecular dynamics methods.
Predictive modeling requires flexibility to account for various precursor molecules and reaction pathways. Already several reaction types can lead to a variety of graphene nanostructures, including complex systems such as graphene nanoribbons with added atom decoration [12]. One benefit of a predictive kinetic self-assembly approach is that other, yet-to-be explored synthesis routes can be simulated. For example, the Ullmann reaction [13] that is used in Han et al [14] provides a route to both symmetric and asymmetric couplings. However, greater versatility in nanographene products can also be obtained using asymmetric Suzuki-Miyaura couplings, which allow for different reaction conditions and energetics (i.e. catalysts, etc) [13]. Thus, there exists several types of aryl-aryl couplings for designer click-chemistry synthesis. A further benefit of predictive simulation is the ability to explore a parameter space for the reaction conditions allowing for materials discovery via directed synthesis. For example, the parameter space can be investigated to suggest reactants and reaction conditions for self-assembly products with desired length-to-width ratios. Specific to the kinetic self-assembly method, intermediate products and reaction networks can also be interrogated to determine the most energetically favorable pathway for directed nanographene design.
A novel feature of the proposed self-assembly method is that it maintains physical transparency by using matrix arrays to store the precursor, intermediate and product molecular structures. These arrays exploit the base symmetries of the molecules and can be easily manipulated using matrix operations (rotations, translations, etc) to generate complex networks of reaction possibilities, which are then stored using reciprocal space compression. All of these features allow for a predictive model that is efficient and can be modified to include different precursors, surfaces and types of reactions.
To demonstrate the predictive kinetic self-assembly method, a simple test case is explored of functionalized tetrabenzanthracene and functionalized benzene synthesis of nanographene with atomic-scale patterning (figure 1). Chemical point-functionalization of tetrabenzanthracene and benzene is well established, e.g. Artal et al [15] and Freeman et al [16], and the assembly of tetrabenzanthracene systems is also of interest [17]. For the purpose of detailing the computational method, the exemplar study assumes a minimal model approach. In this respect, ranges for the bond activation energies are explored, with surface catalysis effects included to first order. The reactions are also assumed to be couplinglimited with this deemed plausible against chemical synthesis and DFT studies that show the rate-limiting step to be defunctionalization of the precursor molecules [6]. Although the details of the surface are not explicitly considered, these can later be added as an extension. For example, the energetics for the kinetic self-assembly modelling can be expanded to included specific surface-molecule interactions determined via density functional theory simulations. The specific choice of nanographene self-assembly (figure 1) is further motivated by coherent transport results for armchair-edge devices of this kind that have hydrogen edgepassivation. These results, obtained using a density functional theory-informed, generalized tight-binding method [18], show a tunable conductance gap as a function of the device length-to-width ratio (figure 2). The formation of band gaps and conduction gaps is expected in patterned systems due to the loss of conduction channels arising from the patterning, with these effects previously reported for patterned and vacancy-defected nanographene [20,21]. The possibility to tune the conductance gap, as evidenced here, motivates the need to efficiently develop new synthesis methods for the controlled production of novel nanographene products that are of good quality and have specific features (such as the length-to-width ratio, atomic-scale patterning, etc). We will use the test case in this work (figure 1) to show that the proposed kinetic self-assembly approach is an efficient and transparent means of determining the possible chemical synthesis energetics, experimental conditions and reaction pathways to achieve these aims. Further, we will demonstrate how the model parameterization can be compared against available data to then propose a viable type of aryl-functionalization and catalyst to obtain optimised synthesis of nanographene with length-to-width ratio tunability.

Matrix formalism
Efficient and physically transparent kinetic self-assembly modeling is proposed by representing each molecule (precursors, intermediates and products) by an array. As an example of its use, figure 3 shows the array representation for a functionalized tetrabenzanthracene structure with its symmetry encapsulated in a sheared honeycomb lattice representation. In this full structure array, the molecules are succinctly described by sub-molecule groups pertaining to the A reaction sites, the two adjacent B reaction sites (considered together) and the benzene components, rather than by individual atomic positions. The sub-molecule groups are represented by the number 1 in the array, with the two adjacent B reaction sites assumed to react simultaneously.
In addition to the full structures, arrays are also constructed for the molecular reaction sites so that these can be tracked. Here, figures 4 and 5 show the full structure and reaction site arrays for the functionalized tetrabenzanthracene and benzene structures, respectively. Symbolic representations of these arrays (also shown in figures 4 and 5) provide condensed and physically transparent storage of the essential molecule sub-groups (i.e. reaction sites and benzene components). The array representation allows for lattice transformations commensurate with the array coordinates, e.g. molecular rotations in increments of 30°, translations, as well as 2-fold plane inversion transformations. These operations, and the crosscorrelation of arrays, determine possible molecule-molecule orientations, interactions and relative positions in which the molecules can form bonds.
The array functionality in the kinetic self-assembly model is illustrated in the bonding of two tetrabenzanthracene molecules via the autocorrelation of the full structure (Full Full) and reaction site (A A) tetrabenzanthracene arrays (figure 6). In this example, a sliding window translates one molecule over the other, from top to bottom and from left to right. Snapshots of two possible translations showing the overlap of the A-A reaction sites are given (figures 6(a) and (b)). For the single A-A overlap event in figure 6(a), a value of 1 is recorded in the autocorrelation arrays (circled) and, for the double A-A overlap event in figure 6(b), a value of 2 is recorded in the autocorrelation arrays (circled) (figures 6(c) and (d), respectively). The value of 13 in the centre of the Full Full array represents the event where the two molecules are fully superimposed. In figure 6(d), a corresponding value of 4 in the centre of the A A array denotes the number of overlaps of the A-A reaction sites associated with the full superposition of the two molecules.
The mathematical description of this array functionality involves the circular cross-correlation between two molecular  The conductance gap (∆G) in units of the quantum conductance (e 2 /h) as a function of the device length-to-width ratio for the armchair-edge device (figure 1) with hydrogen edgepassivation. These coherent transport results were calculated using a generalized tight-binding model [18]. structure arrays, which for arbitrary molecular structures f and g is Here, M is the maximum width and N is the maximum height of the f and g matrices having array indices i and j, and M × N is the size of the output matrix. More succinctly, the circular cross-correlation can also be obtained using discrete Fourier transforms and by applying the correlation theorem [19], such that Here, F D and F −1 D are the discrete direct and indirect Fourier transforms, respectively, and p and q are indices in phase space. Using a reciprocal space representation provides a computationally efficient means of storing detailed molecular information, full reaction history, intermediates and the final reaction products.
Following the circular cross-correlation procedure, all possible binding events between the two molecules are determined before binding with a third party molecule can be considered. Binding between molecules where two bonds form is permitted as this event prevents molecular rotation, with single bonds disallowed to prevent steric hinderance. For the two tetrabenzanthracene molecules in this example (figure 6), possible A-A binding events have been identified by , and translated into the array representation (c). The number 1 in the array corresponds to one of the three constituent sub-molecule groups defined for this system, namely the functionalized sites A and the two adjacent B sites (the latter assumed to react simultaneously), as well as the benzene ring components.  numerical matches between elements in the cross-correlated Full Full and the A A binding arrays (see figures 6(c) and (d)). Thus, only the double-bond binding events indicated by the number two (highlighted with a square), which match in the Full Full and A A arrays, are allowed.
One advantage of the array method is that intermediate arrays (i.e. product structures from binding events) can be stored and tracked throughout the simulation. The intermediate arrays are created from any permitted binding event, such as the one shown in figure 6(b), and are easily accessible. For example, assume that f g(i, j) results in a binding event at (a i , a j ) due to a match between the full and binding arrays at this coordinate position. Translating the array element g(i, j) by (a i − 1, a j − 1) will recreate the overlap sliding window for that element. By then applying this translation to each element in the g molecule structure array, the co-joined molecules, and hence an intermediate product structure array, can then be produced from This procedure can also be applied to the binding arrays. An example showing the intermediate structure generated from the A-A binding event between two tetrabenzanthracene molecules (figure 6(b)) is given in figure 7. To obtain the intermediate full molecule array from the symbolic structure (product full, figure 7(c)), all of the nonzero elements, including the bound sites (figures 7(a) and (b)) are reassigned to one. In the functional group array, all elements greater than one are removed as the binding positions have been used (figure 7(d)).

Kinetic self-assembly modeling
The reacting system evolves via the Arrhenius equation where in the example system, k = k AA and k BC are the rate constants for the A-A and B-C reactions, respectively (figure 1). Here, A o = 10 9 s −1 is the pre-exponential factor, which takes into account the number of possible reactions per second (see for example Pawin et al [22]). k B T = 0.034 eV, where k B is the Boltzmann constant and T = 400 K is the reaction temper ature suitable for aryl-aryl cross-coupling reactions that are catalysed by a metal surface or in solution [6, 13,  23-26]. In the type of self-assembly proposed, dehydrogenation is not required and therefore higher reaction temperatures have not been considered. Although catalyst effects are not explicitly nor directly included, these can be added to first order by perturbing the activation energies [28,29]. Hence, in this respect, Ea AA and Ea BC are chosen as free parameters in the range 0.01-1.0 eV corresponding to a possible molecular-reaction phase space for catalyst-assisted synthesis [6,23,[26][27][28]. In terms of surface interactions, free diffusion across a homogeneous surface is also assumed so that the system is well mixed and coupling limited. This assumption is justified against studies that show the largest reaction barrier, and hence rate-limiting step, to be defunctionalization (e.g. dehalogenation, against aryl diffusion and aryl combination as per Björk et al 2013 [23]). Within this reaction space, conditions for producing high-quality nanographene with length-to-width tunabilty via directional growth will be determined. These reaction conditions will also be compared against published density functional theory studies to assess experimental feasibility and to propose the type of molecular functionalization (A)-(C) and catalyst.
Between 400 (coarse-grain limit) and 2000 (fine-grain limit) precursor molecules are used in each simulation run with the volume of the reaction chamber set so that an initial molecule concentration of 1000 molecules per litre is maintained. The rate of bond formation is proportional to the molecule concentrations [ f ] and [g], with reactions between identical molecules containing a correction term to avoid double counting. For example, for two reacting f molecules, Once formed, it is assumed that there is no mechanism for molecular aggregates to break apart. At the start of this example simulation, functionalized tetrabenzanthracene and functionalized benzene are assigned to molecules f and g, respectively. The types of bonds between the molecules are first determined using the cross-correlation algorithm (section 2.1). In this case, f-f molecule binding is possible through eight different rotations and translations, each time forming two A-A bonds, whereas, for f-g (i.e. B-C) binding, bond formation is possible in eight different rotations. This information is contained in the geometries of interaction array, The calculated rate constants for these bindings are determined using the Arrhenius equation (equation (5)) and stored in the rate constant array [k], with the molecule concentrations also stored.
By obtaining the product of the elements in the above two arrays, a binding rate array can be determined for the rates of reaction, The sum of the upper quadrant elements of the binding rate array denotes the total reaction rate for the system until the next reaction is simulated. Within this time step, the reaction rates are normalized such that the probabilities for each reaction are in the range (0, 1).
A probabilistic approach is chosen to stochastically select the next occurring reaction using the Gillespie algorithm [30]. Using this method, only interactions that are possible within the population at each reaction step are considered, thus forming a reaction network. Reactions are chosen stochastically using a pseudo random number generator to select from the probability-weighted list of reactions. Following these processes, the simulation timer is advanced by a characteristic time step of 120s representing the time during which an experimentalist could terminate the chemical synthesis process, thus completing the Gillespie simulation step.
The Gillespie approach decreases the computational search space so that fewer interactions are considered as the total number of reacting molecules decreases over time.
In implementing the Gillespie algorithm, the relative likelihood of binding due to the mobility and size of the reacting comp onents has not been considered. These constraints are in keeping with the reactions not being diffusion limited and in assuming a well-mixed homogeneous volume. Although a well-mixed system is typical for the Gillespie approach, the model can be extended to include diffusion [31,32]. These extensions have not been explored here, but may be required for other types of reactions where the effect of surface directionality [33] and chemical reaction by-products [23] need to be considered. The simulation finishes when there are no possible reactions or when any further possible reactions are likely to take more than the characteristic time step of 120s. As the simulation is stochastic, 20 simulations have been run at each of the 100 sampling points in the Ea = 0.01-1.0 eV (coarse-grain) range to statistically represent the system behavior. For fine-grain sampling, a smaller activation energy space is explored with 40 simulations per sampling point.
At regular intervals during the simulation, the state of the system is saved so that the time-evolution and the reaction network can be analyzed. The synthesized molecular products are assessed using the parameters, length (L), width (W), and the number of whole benzene rings (N) in the molecule ( figure  8). The length-to-width ratio interrogates directional growth, and the area occupancy defines the completeness of the synthesis. Here, N I is the number of whole benzene rings in an ideal synthesized system made from the available building blocks, which can fit into the L × W area. A combination score is defined where L > 3 and W > 4, i.e. when there has been a binding event in both the A-A and B-C directions. Otherwise this parameter is set to zero when L 3 and/or W 4 and defines regions in the Ea AA versus Ea BC phase space where, on average, there is no growth, or uni-directional growth only, i.e. the latter being for Ea AA Ea BC , and vice versa. Interrogation of the properties in equations (8)-(10) within the Ea AA versus Ea BC phase space will determine the optimal energetics to produce nanographene from non-trivial, directed self-assembly growth in both length and width dimensions, and with well-defined completion (area occupancy).

Results and discussion
A schematic phase diagram corresponding to the activation bond energies Ea AA and Ea BC has been derived from the simulation results, with key regions defined in phase space for the expected type of reaction end products (figure 9). These regions can be explained in terms of the reaction energetics of the bond activation energies relative to k B T as per the Arrhenius equation (equation (5)). When both Ea AA and Ea BC are much greater than k B T , then no reactions occur within the characteristic time step (120s) (top right, figure 9). When Ea AA Ea BC , reaction products are produced that have single unit cell length, but variable width, and vice versa. Of interest is a region that occurs in the centre of figure 9 (boxed) where non-trivial two-dimensional growth occurs. This region will later be analyzed in detail.
Phase diagrams obtained via coarse-grain simulation show mean values for the length (figures 10(a) width (b)), length-to-width ratio (figure 10(c)), occupancy (i.e. degree of completion) and combination score (figure 10(d)) for the nanographene products. In the No Reactions region where Ea AA and Ea BC are k B T (see figure 9), there is evidence that the initiator molecules remain largely unreacted as the combination score C = 0, i.e. in this case, L 3 and W 4, however, the area occupancy O remains maximal. Single-width and single-length regions are found where Ea AA Ea BC and Ea AA Ea BC , respectively. These regions also have C = 0 due to uni-directional growth (i.e. L = 3 or W = 4), but with high values of O, thus indicating good completion.
If both Ea AA and Ea BC are very low (bottom left of the phase diagrams in figure 10), then all possible reactions can occur very quickly resulting in largely uncontrolled and nondirectional growth. At such low reaction energies, a single disordered nanographene product may be produced as the molecules non-discriminatorily bind to many points in the growing system. In general, regions that result in mostly non-directional growth are indicated where W ∼ L, together with low occupancy O (blue dominant region in figure 10(d)). When both C and O are low (white region in figure 10(d)), there starts to be width-dominant growth W > L, but with extremely poor completion O leading to a low combination score C. Ranges pertaining to the observables in equations (8)-(10) obtained using the coarse-grain sampling are summarized in table 1.
At the bottom of the length-dominated phase region ( figure  10(a)), and at the top of the width-dominated phase region ( figure 10(b)), there appears a band where the combination score C is high, extending from a region of length-dominant growth to width-dominant two-dimensional growth (see figure 10(d)). This region of transition is regarded as an area of  The boxed region (center) is an area of interest for non-trivial twodimensional growth that will later be explored in detail.
interest for non-trivial directed self-assembly. Thus, to obtain nanographene products with good completeness and to avoid single unit cell dimension systems, the activation energies for Ea AA and Ea BC are set to 0.25-0.70 eV and 0.60-0.90 eV, respectively (boxed area in figure 10). Fine-sampling from this reduced area ( figure 11) indicates a much richer phase space than what was extrapolated to in the coarse-grain sampling simulations (see figure 10).  A statistical representation of these simulations provides further insight and interpretation of the results as indicated by the standard error of the mean taken over the 40 measurements per sampling point ( figure 12). For example, in the region of length-dominant growth (bottom left, figure 11), there occurs a reduced area occupancy score O and number of products N P indicating that molecular bonds are produced quickly. As there are fewer constraints on unfavorable bond formation, evidence of uncontrolled synthesis can also be found in the correspondingly high standard error in R and O (figures 12(a) and (b)). In the width-dominant region ( Ea AA 0.35 -0.70 eV and Ea BC 0.60-0.75 eV), the standard error associated with these observables is comparatively lower.
Although candidate regions in the Ea AA and Ea BC phasespace show length-dominant and width-dominant, i.e. directional growth, the results indicate that further optimization is needed to achieve length-to-width tunability and to reduce defects, thus improving the quality, efficiency and predictability of the synthesis process. Modifying the bond activation energies by catalyst intervention may be one mechanism to achieve this aim. Another approach would be to change how the reaction is started such that the initial mixing of the initiator molecules is a possible experimental variable. Thus, the simulation could be modified by introducing different reactants at varying times to influence the production of lowdefect nanographene with length and/or width tunabilty.
To perform this test, a point in the Ea AA versus Ea BC phase space is selected where the variability of the length-to-width ratio R is investigated as a function of the rate of functionalized   Functionalized benzene is introduced into the reaction chamber at different rates, or all at once (∞), at Ea AA = 0.30 eV and Ea BC = 0.75 eV. Average nanographene widths relative to the width of a single tetrabenzanthracene unit are shown to be dependent on these rates. Quicker rates (red) yield predominantly single-width systems. For slow rates (green), the reaction is terminated before any benzene is bound to the ribbon. At intermediate rates (blue), wider nanographene systems form. The inset images show representative systems that are produced. benzene addition to the reaction cell. The activation energies Ea AA = 0.30 eV and Ea BC = 0.75 eV are chosen as the initial results showed good molecular completion (high O) and efficient synthesis with low N P and low corresponding standard error for all quantities measured (figures 11 and 12). Although the mean length-to-width ratio R of the products was ∼1.0, the result at Ea AA = 0.30 eV and Ea BC = 0.75 eV is in a region close to the length-dominant and width-dominant regimes. Thus, perturbing the reaction conditions may influence the system towards directional growth with good completion (i.e. to produce low defect products).
Variable mixing rates for the addition of functionalized benzene at the start of each simulation were tested, with the addition of benzene continuing until its amount became equal to tetrabenzanthracene. The characteristic time step of 120 s was maintained. The results show that the width of the nanographene products can be tuned as a function of the simulated functionalized benzene addition rate ( figure 13). If all of the functionalized benzene is added at the start of the reaction, or very quickly, then only single-width nanographene systems are formed on average (i.e. growth in the length-direction is preferred) ( figure 13 (red)). When the addition-rate approaches 10 3 s −1 , wider nanographene products are produced (figure 13 (blue)). At low addition rates (around 10 −1 s −1 ) there is not enough functionalized benzene to start a reaction before the simulation terminates, thus producing single-width nanographene only ( figure 13 (green)).
At faster addition rates, single-width growth of nanographene occurs due to larger amounts of benzene being available to cap the tetrabenanthracene. This effect makes it statistically unfavorable for molecular reactions as benzene-benzene structural clashes impede further width growth (figure 13, red-box inset). At lower addition rates, slower benzene addition can facilitate width growth (figure 13, blue-box inset). At the chosen bond activation energies, the binding of benzene to tetrabenzanthracene is not intrinsically unfavourable. Rather, the lower effective concentration of benzene corresponds to a lower rate of bond formation (equation (6)), enabling wider nanographene systems to form. Notably, these wider nanographene products also have low numbers of defects with the area occupancy O being maintained above 90% (figure 13, blue-box inset). Closing all of the gaps in the nanographene systems could be facilitated as a further simulation step after the model system has equilibrated through the addition of other functionalized molecules.
The results in figure 13 predict that the nucleation of self-assembly, in this and similar systems, could be best performed with the gradual introduction of initiator species into the reaction chamber (i.e. by using a time-dependent exper imental proto col). Further simulations are needed to determine whether this is essential to achieve optimal selfassembly under different reaction conditions, hence, parameterization of the model. The required time to halt assembly (included in the model as the characteristic time step of 120 s) is proportional to the activation energies, and this could also be altered to check for more efficient simulation conditions. A benefit of the kinetic self-assembly method is the ability to interrogate the intermediates and reaction pathways from each simulation. Figure 14 provides an example of a reaction network for the simulation of functionalized benzene and tetrabenzanthracene synthesis at Ea AA = 0.30 eV and Ea BC = 0.75 eV. To quantify the intermediate synthesis pathways in the simulation, a flow parameter f is computed, which is defined as the occurrence of a specific intermediate molecule multiplied by the number of initiator molecules that have been used to produce it. Thus, the flow parameter provides a time-averaged measure of the proportion of initiator molecules that aggregate into specific intermediate molecules normalized to the number of initiators at the start of the synthesis. For example, both functionalized tetrabenzanthracene and benzene have values of f = 1.0 at t = 0 ( figure 14).
Interrogation of the reaction network and study of the flow metric as a function of the synthesis conditions could lead to the engineering of certain synthesis pathways and experiments that target specific (i.e. directed) molecular growth or reaction products with desired properties. Coupling the bottom-up kinetic self-assembly model dynamically with rapid simulation to determine the properties of the products, such as the generalized tight-binding method [18] (figure 2), could enable further real-time selectivity of the chemical processes and outputs.
We have focused this paper on demonstrating a novel computational method for directed nanographene self-assembly, which is physically transparent with the example results from the model providing an indication of its potential use in predictive simulation. A region in the activation energy range of interest Ea AA = 0.25-0.70 eV and Ea BC = 0.60-0.90 eV was found for directed self-assembly of functionalized tetrabenzanthracene and benzene resulting in length-to-width ratio tunability and controlled quality of the reaction products. The defunctionalization step (assumed to be the rate limiting step), which occurs twice for the A-A reaction and four times for the B-C reactions leads to energy barriers of Ea AA /2 and Ea BC /4 assumed for each point defunctionalization. This results in reaction energetics ranging from ∼0.13-0.35 eV for each A-atom defunctionalized and ∼0.15-0.23 eV for each B-or C-atom defunctionalized (assumed to be energetically equal).
Current density functional theory literature shows that the activation energy barrier for dehalogenation is surface catalyst specific, for example, the Ullmann mechanism for dehalogenation of aryl halides can occur with an activation energy barrier of ∼0.4 eV for a copper surface [6,23]. Lower activation energy barriers are possible for other surfaces, such as palladium [34], or if other conditions for site activation are considered, such as the oxidative state, or metal promotion of the surface [35]. Comparison of the reaction energetics determined from this work against the available density functional theory data (referenced here) suggests the proposed synthesis may be experimentally plausible via halogen functionalized aryl-aryl cross-coupling and noble metal surface-assisted catalysis. A more precise prediction would require density functional calculations of the energy barriers (both coupling and diffusion) in relation to the specific system studied with various functionalization and catalytic surfaces tested. In this respect, the proposed kinetic self-assembly method could be used to determine new synthesis processes and for materials discovery of novel nanographene.

Conclusion
An algorithm for simulating nanographene production using a predictive, kinetic self-assembly method has been proposed. The kinetics-based model allows for refinement of existing protocols (initiator molecules, bond energies, reaction temper ature), and the ability to introduce new experimental protocols (surface interactions, etc). Nanographene systems with desired characteristics via the studied metrics (lengthto-width ratio, degree of completeness, and number of reaction products), and required properties, can be searched for by simulating different synthesis conditions. Through these metrics and their standard errors, the reproducibility of the experiments can also be assessed.
As an exemplar test-case, the self-assembly of functionalized benzene and tetrabenzanthracene was used to demonstrate the functioning of the model. A region in the bond activation energy phase space was determined, which has potential for directed self-assembly growth via tuning of the length-to-width ratio of the product molecules. Further control of the length-to-width ratio was shown by the slow addition of functionalized benzene into the reaction chamber. We provided an assessment of the reaction conditions against available published results on synthesis energetics, which suggest that the synthesis may occur via halogen-functionalized aryl-aryl cross-coupling and nobel metal assisted catalysis. Detailed studies via density functional theory on the proposed system would be needed to confirm these predictions. A benefit of the proposed application of kinetic selfassembly is that full access to the reaction network provides the possibility to tune the synthesis conditions to bias specific reaction pathways and reaction products. Dynamic coupling of the self-assembly method to a density functional theoryinformed tight-binding approach could facilitate rapid simulation-driven nanographene design with prediction of device properties. Future work will include more specific conditions, such as surface interactions, therefore providing further realistic refinements. Although only shown in a minimal model capacity, the simulations suggest an efficient and physically transparent approach towards bottom-up engineering of nanographene.
Trust (United Kingdom), 097326/Z/11/Z, PhD studentship. R T gratefully acknowledges a Royal Society Leverhulme Trust Senior Research Fellowship (United Kingdom), LT130088. Y H was awarded an EPSRC (United Kingdom), EP/P505798/1, PhD studentship from the Department of Physics for J P C B. All data created during this research are available by request from the University of York Data Catalogue http://dx.doi. org/10.15124/bd416f9b-44a8-4cc0-8831-4c2055762d39.