Application of artiﬁcial neural networks for rigid lattice kinetic Monte Carlo studies of Cu surface di ﬀ usion

Kinetic Monte Carlo (KMC) is a powerful method for simulation of di ﬀ usion processes in various systems. The accuracy of the method, however, relies on the extent of details used for the parameterization of the model. Migration barriers are often used to describe di ﬀ usion on atomic scale, but the full set of these barriers may become easily unmanageable in materials with increased chemical complexity or a large number of defects. This work is a feasibility study for applying a machine learning approach for Cu surface di ﬀ usion. We train an artiﬁcial neural network on a subset of the large set of 2 26 barriers needed to correctly describe the surface di ﬀ usion in Cu. Our KMC simulations using the obtained barrier predictor show su ﬃ cient accuracy in modelling processes on the low-index surfaces and display the correct thermodynamical stability of these surfaces.


Introduction
Diffusion in crystalline material is an important phenomenon in many situations. For example, computational studies of irradiation damage may have a hard time finding agreement with experiments without accounting for bulk defect diffusion and annihilation after the initial collision cascade [1]. Surface diffusion, on the other hand, is hypothesised to play a role in e.g. the events preceding vacuum arc breakdowns in devices with high electric field gradients [2]. Vacuum arcs hinder the operation of many such devices, including particle accelerators, freeelectron lasers and fusion reactors. The immediate motivation for studying the Cu surface specifically is the Compact Linear Collider (CLIC) [3], which has been proposed to be built in CERN to succeed the Large Hadron Collider.
Diffusion is difficult to study with molecular dynamics (MD) because it is a much slower process compared to the MD timestep, which has to be small enough to capture the atomic vibrations. Kinetic Monte Carlo (KMC) is a method suitable for longer time scale studies, such as simulations of the diffusion in crystalline solids. In this method, diffusion is approximated as a series of migration events (jumps) between potential energy minima. The jumps, while actually determined by Newton's equations of motion with atoms vibrating most of the time near potential energy minima, only occasionally crossing the energy barriers between them, can in aggregate be regarded as stochastic events that occur at rates determined by the height of those barriers.
Such migration energy barriers must be known for each event that is to be considered in the KMC simulation. Depending on the desired specificity to distinguish between different events, the number of barriers that must be known may be too high to be calculated in a feasible time. Machine learning was proposed earlier as an alternative approach-the problem of too many barriers may be solved by only calculating accurately a subset of the barriers and obtaining the rest from a computationally inexpensive regression model. Djurabekova et al. used artificial neural networks (ANN) to predict migration barriers in Fe-Cu bulk [4,5]. Castin et al. [6][7][8][9][10][11][12][13], Pascuet et al. [14] and Messina et al. [15] have since applied them to study bulk diffusion in various Fe-based alloys. Machine learning for surface diffusion barriers have been used at least by Sastry et al. [16], who applied genetic programming for vacancy-assisted {1 0 0} surface migration barriers in Cu-Co alloy, and Verma et al. [17], who built a cluster expansion model for Ag, Al, Cu, Ni, Pd and Pt (1 0 0) surface migration.
In this paper, we will study the capabilities of ANNs to predict migration barriers on Cu surfaces. As an outcome of this study, we present a configuration of trained neural networks for parameterizing a rigid lattice KMC model of the (arbitrarily oriented) Cu surface. The networks and instructions to utilise them are published in the parallel Data in Brief publication [18]. On the one hand, this work is an extension of the rigid lattice surface diffusion model developed earlier in our group [19], towards a more detailed description of the atomic environments and thus, hopefully, more accurate dynamics in the simulations. For the numerous challenges in the migration barrier calculations on the surface, we deploy solutions developed by Baibuz et al. [20]. On the other hand, this work applies the methodology of refs. [4][5][6][7][8][9][10][11][12][13][14][15] to a new system: the face-centered cubic Reaction coordinate E m,reverse Figure 1: A schematic illustration of a 1-dimensional potential energy surface from a NEB calculation. The barrier E m and the reverse barrier E m,reverse can be obtained from the same calculation.
(fcc) crystal surface. We will discuss various obstacles of the methods and how to overcome them.
The KMC method, barrier calculations, and ANN models that we have used are described in section 2. The results are presented in section 3 and discussed in section 4. Conclusions about the applicability of ANNs for the considered problems are drawn in section 5.

Kinetic Monte Carlo
We used the residence-time KMC algorithm with the rigid lattice approximation, as implemented in the Kimocs code [19].
KMC models the time evolution of a system by choosing events to be carried out one after another. The probability of an event to be chosen is proportional to its rate Γ, which in our model is given by the Arrhenius equation: where ν is the attempt frequency, E m is the migration energy barrier, k B is the Boltzmann constant and T is temperature. In our current model, E m is strongly dependent on the local atomic environment of each jump, while ν is taken to be the same for all events. The ν parameter is connected to the vibration frequency of atoms, but it can also been seen as a scaling factor for the simulation time.

Barrier calculations
The barrier data set was calculated using the nudged elastic band (NEB) [21,22] method implemented in the LAMMPS molecular dynamics program [23]. The NEB method finds the minimum energy path (MEP) between an initial and a final configuration. The barrier is the difference between the highest energy point along this path E max (saddle point, since it is the maximum along the MEP and the minimum along other directions) and the initial state energy E i : This is also illustrated in fig. 1. The potential energy function we chose was based on the molecular dynamics/Monte Carlo corrected effective medium (MD/MC-CEM) theory, developed by Stave et al. [24]. The potential is reported to describe the properties of the Cu surfaces well [25].
To connect the barriers to the corresponding processes during KMC simulations, a process descriptor is required. This same descriptor can also be used when training the ANN regressor. We describe the migration processes by their local atomic environments (LAE) before the jumps take place. We include in the LAE the first and the second nearest neighbours (1nn and 2nn) of the initial and the final position, as was done in the earlier parameterizations of Kimocs [19]. In systems with the fcc structure, this definition of LAE covers 26 lattice sites in total (see fig. 2). We did not expand the LAE beyond the 2nn sites to keep the input space at a manageable size. The impact of the size of the local atomic environment on the accuracy of the surface migration barriers is subject to future study.
In ref. [19], the LAE was described using a 4-dimensional vector (a, b, c, d), where a and c are the number of 1nn atoms of the initial and the final position of the jump and b and d are the number of 2nn atoms of the initial and the final positions of the jump. We will hereafter refer to this approach as the "4D description", as it was named in ref. [20]. KMC simulations that employ the 4D description have been found to produce good agreements with both MD and experimental results [19,20,[27][28][29][30]. The 4D description does not include the information on the exact locations of the 26 LAE atoms, but only reflects the stability of the initial and the final positions by counting the number of neighbours in each neighbour shell. This descriptor is also not necessarily extensible to systems with multiple atomic species present. To save the location information, in our new descriptor the occupation state of each site is encoded with either 0 (vacant) or 1 (occupied). The descriptor is a 26dimensional binary vector, referred to as the 26D description. The total number of different LAEs that can be distinguished with this descriptor is 2 26 ≈ 67 million. This choice of LAE encoding does not restrict the fcc surface orientation that can be described, and it is equally well applicable for bulk diffusion processes or systems with more than one atomic species, by denoting different elements with 1, 2, etc. This type of encoding is similar to what was used in refs. [6][7][8][9][10][11][13][14][15] to describe processes in bulk systems with the body-centered cubic (bcc) structure. A similar descriptor was also used by Trushin et al. [31] for a self-learning 2D KMC model fcc surfaces, and by Latz et al. [32] for the 3D expansion of that model. A selflearning KMC model, unlike a machine learning parameterized one, does not build a regression function for migration barriers, but simply calculates them on-the-fly and saves each result in a searchable catalogue for later use. This kind of model needs a descriptor for labelling the catalogue, whereas a machine learning model uses the descriptor as a way to encode input for the regression function.
The 26D descriptor is not compact in the sense that some LAEs are mirror images of each other, and thus have the same migration energy barrier: multiple descriptors have the same expected output. Up to four different LAEs may belong into these "families" of equivalent processes. This is problematic from the machine learning point of view: even though the redundant symmetric cases may be included in the training data for the ANN, training will never be perfect and the network output will be different for each case. If the ANN predicts different barriers for mirrored, but otherwise physically equivalent, LAEs, diffusion may work differently e.g. on different {1 0 0} facets like (100) and (010), or be biased towards some directions. This problem was solved by removing the redundant processes from the training set systematically: only one process represents each family in the training set. When calling the network to produce the barrier of a given jump, the LAE is first transformed to correspond to the "representative" that is shown (or would be shown, in the case of extrapolation) to the network in the training.
In the remainder of this paper we will refer to the LAE configuration around the initial and final jump positions up to the second nearest neighbour shells as the LAE cluster. Even though, as it is said, the 26D LAE cluster contains the information about the neighbours up to the second nearest neighbour shell, for barrier calculations by NEB, we embed such a cluster in a larger lattice, where all the sites are occupied by the same atoms and affect the calculations in the systematic mannercalculating the migration barriers within an isolated cluster of a few dozen atoms would not be realistic. As a first approach we embed the LAE cluster in Cu bulk for the NEB calculations, similarly as we did in refs. [19,20]. This is valid especially in studies where only a limited number of vacancies is present in the LAE. However, when calculating surface migration barriers, the LAE is on average half-empty. Previously we discovered [20] that in some cases embedding such an LAE, essentially a large vacancy cluster (void), behaves differently inside bulk compared to the surface, causing very strong forces and unphysically high barriers in the range of tens of eV. We thus returned to the original recipe, calculating all the barriers with the LAE clusters embedded in a surface (see fig. 2).
Embedding the cluster in a surface is more complicated than in bulk. It is not immediately obvious which surface should be used, and how each LAE should be oriented to be best fit into the surface. To answer these questions, we developed an automated procedure, where 1. Three different surfaces were considered: {1 0 0}, {1 1 0} and {1 1 1}. Within the local environment limited to 2nn sites, also the higher index surfaces will resemble one of these lowest-index cases, and thus the resulting model can be generally applied on any surface orientation. 2. For each of the three surface orientations, trial configurations were generated by embedding the LAE cluster in every possible orientation (with the constraint that the lattice points of the LAE have to match the lattice points of the surface). The LAE was embedded deep enough that at most one layer of atoms was above the surface of the surrounding lattice. 3. The centre-of-mass of the LAE cluster was calculated in each of the trial configurations. The configuration with the lowest centre-of-mass with respect to the surface was considered to be the most stable one, and was selected to be used in the NEB calculation to find the barrier in this LAE.
We used a simulation cell of approximately 45 × 45 × 25 Å 3 in size. Depending on the specific crystal orientation, the exact dimensions of the cell varied. Expanding the cell beyond this size did not affect the barrier value significantly. The boundary conditions were periodic in the horizontal dimensions and two fixed atom layers at the bottom. Eleven replicas were used in the NEB calculations. In addition to the force given by the MD/MC-CEM potential, a tethering spring force was applied on each atom in the same way as in [20]. The tethering force keeps the atoms close to their initial lattice sites, but does not directly contribute to the potential energy of the system. Previous KMC simulations with barrier sets for Cu and Au, calculated using the tethering method, have also shown good agreements with both MD and experimental results [20,[28][29][30].
In our case, if any atom slips to unintended lattice sites during the NEB relaxation, the process will no longer correspond to its 26D descriptor and thus the obtained descriptor-barrier pair cannot be used in the KMC simulations. This is a challenge specific to the rigid lattice KMC. The problem is more severe when performing NEB calculations on the surface, where there are other adatoms around the jumping atom, which may also have very few neighbours and thus are not strongly bound to the their lattice site. These loosely bound adatoms tend to easily "follow" the jumping atom, or otherwise leave their original lattice sites. The tethering force constant, as implemented in LAMMPS, was set to 2.0 eV/Å 2 . For details about the tethering force approach, see [20].
The first set of barriers was calculated for adatom migration processes on flat {1 0 0}, {1 1 0} and {1 1 1} surfaces, with the migrating atom surrounded by different configurations of atoms in the same atomic layer. The set formed by these LAEs will be referred to as flat surface processes for the remainder of this paper. The ANNs were first tested by training and predicting barriers within this flat surface set, comprising 3168 processes.
The set was then expanded by calculating the barriers for all processes where every atom within the LAE cluster and the migrating atom had at least three 1nn atoms around it when embedded on a surface. In total, 11 652 085 processes were found in this category-approximately 17 % of the entire LAE space. Attempting to calculate the barriers for the rest of the configurations, where some atoms have less than three neighbours, is likely to produce artificial results-even with the tethering force applied. In KMC simulations with only 1nn jumps permitted, it is still beneficial to include these unstable events in the simulation as they can serve as intermediate steps in e.g. 2nn or 3nn jumps that would be possible in a real system. An ANN regression model, that has learnt the general tendencies of the LAE dependent migration barriers, is likely to give more realistic values for the barriers of these events than NEB calculations that are less than reliable far from potential energy minima. Thus, ANNs are a good way to expand the NEB-calculated set of barriers to allow for more realistic kinetics.

Artificial neural networks
ANNs are a class of machine learning methods that can be used for classification and function regression. A regressor is trained with known input-output pairs to learn the underlying function and also to predict the output for previously unseen input. In this work, ANNs were fitted to predict values of the migration barrier function E m = E m (LAE).
Two different ANN models were studied: multilayer perceptrons (MLP) and radial basis function (RBF) networks. The MLP implementation was taken from the Fast Artificial Neural Network (FANN) library [33], and the RBF networks are from Python's SciPy package [34].

Multilayer perceptrons
MLPs consist of one input layer, one output layer, and one or more hidden layers between them. The layers hold nodes that are connected to each other. In a fully connected feed-forward network without shortcut connections, all nodes of each layer are connected to all nodes of the next layer (see fig. 3 for a schematic illustration). This was one of the network structures used in this work. The other type of MLP structure used here was a cascade network where hidden nodes are added one by one during training so that each new node is connected to each of the old nodes. This way the network will have N hidden nodes in N hidden layers with shortcut connections between each layer.
The input is mapped to the output by passing it through the hidden nodes: each node calculates the weighted sum of its inputs and passes this value to the next nodes through an activation function. The weights are chosen in an iterative training process to produce minimal error in the training set. The training algorithm used in this work was the FANN library implementation of the improved resilient back-propagation (iR-PROP) algorithm proposed by Igel and Hüsken [35]. The purpose of the activation function is to allow for nonlinear regression. For the MLPs with static layout, we used sigmoid (logistic) activation functions in the hidden nodes. In the cascade networks the Cascade2 [36] algorithm sets the hidden node activation functions automatically. In the output node we found linear activations function to perform best in all MLP networks. With the default sigmoid activation function (scaled to accommodate for the full range of energy barrier values) in the output node, the training would often stagnate, with the network weights drifting towards extreme values before sufficient accuracy was achieved. Cascade networks would completely diverge for this problem with a sigmoid output activation function.

Input layer Hidden layer Output layer
To avoid using negative barrier values given by the linear activation, we take the barrier to be E m = max(0, E ANN ) during the simulations.
Another post-processing step was to forbid processes that would lead to complete detachment of atoms from the substrate. Examples of such processes are not included in the training set, so the ANN-predicted barriers for them are greatly underestimated.
The number of input and output nodes is determined by dimensionality of the problem at hand. In the case of learning the 1D barrier values that correspond to the 26D input, the input layer will have 26 nodes, and the output layer will have one node. The number of hidden layers and nodes is another matter of optimisation. In this study, a single hidden layer with 35 nodes was found to be optimal for the static (non-cascade) network. For the cascade networks, 30-70 nodes were needed for reaching a sufficient accuracy. Overfitting was not observed: the root mean square error in the validation set was equal to the error in the training set.

Radial basis function networks
RBF networks are somewhat similar to MLPs. The difference is that the activation functions are radially symmetric functions that depend on the distance r of the input vector of the prototype vector of each node. The Gaussian function is an example of a radial basis function. In this work, we used multi-quadric functions as basis functions; ε is a width parameter that was automatically adjusted by the library function. RBF networks usually have only one hidden layer, with one basis function corresponding to each hidden node. The nodes have their own prototype vectors, and there is one weight vector that has a dimensionality equal to the number of hidden nodes. The weights are set by a matrix inversion that is faster than the iterative training procedure used with MLPs, but very memory-intensive for large training sets. For training sets beyond the 3000 barrier flat surface set, this could not be done with the available resources-only MLP networks were trained with the full 11.7 million barrier data set.
The prototype vectors for the RBF networks can be chosen from among the input vectors in the training set and the number of these vectors could in principle be smaller than the total number of data points. In the SciPy library this functionality is not implemented-instead, there is always one prototype vector for each data point. As a result, the accuracy of the obtained RBF network will always be perfect for the training set. For this reason, in this work, only 50 % of the 3000 barrier data set was used to train the RBF networks in order to get an estimate on how well they can actually learn the migration energy function. To make a fair comparison to MLPs, these were also trained using only 50 % of the flat surface data.

Error reduction techniques
When the barrier set was expanded from the initial 3000 barriers to the 11.7 million barriers, the MLP accuracy decreased significantly, with the cascade networks performing the best. Two additional techniques were used to reduce the error in barrier prediction. Firstly, the barriers calculated on each surface-{1 0 0}, {1 1 0}, or {1 1 1}-were treated as separate sets and different ANN predictors were trained to predict barriers on each of these sets. The issue of knowing which predictor to call to get the barrier for an arbitrary LAE encountered during KMC was solved by introducing an ANN classifier. This network uses the same LAE input encoding, but instead of energy output, it outputs the class, or the surface that the input LAE corresponds to. We used 1-of-C encoding for the surfaces, meaning that the classifier has three output nodes with each node corresponding to one of the surface classes. A very good success rate was achieved for the classifier (see table 1). During the KMC simulation, every encountered LAE is first passed to the classifier and an appropriate regressor based on the classifier output is used to obtain the migration barrier. In the case that more than one classifier output signals non-zero value (the classifier is uncertain of the correct surface orientation), the barrier will be calculated as a weighted average of multiple regressors. Using this technique, the RMS prediction error decreased by approx. 10 % compared to using only a single predictor for the entire data set.
Secondly, groups of ANN regressors that were trained on different, random subsets of the training data were combined    Fig. 4 shows the distribution of barriers in the initial set of 3000 flat surface processes. Fig. 5 shows the accuracy of the static MLP, cascade MLP and RBF networks in this flat surface set. 50 % of the set was used in training, while the correlation is plotted in the full flat surface set.

Regression accuracy
The migration energy distribution in the full 11.7 million barrier set is shown in fig. 6. The large quantity of 0 eV barriers are for processes that are spontaneous either in one direction (e.g. jumping towards a much higher number of 1nn atoms), or, less frequently, both directions. The latter processes include some events on the {1 1 1} surface that happen via hexagonal close-packed (hcp) sites. The full barrier set can be found as a Data in Brief entry [18].
Comparisons between the 11.7 million barrier full set and the 4D-parameterized barrier sets are shown in figs. 7 and 8. In these figures the error bars indicate the highest and the lowest values of the barriers, which were calculated for different permutations within the same 4D description. The dark dot in the middle shows the value of the mean barrier calculated for all permutations. The mean value is calculated as a simple arithmetic average. Following the notation used in refs. [20,39], we will refer to two of such sets as Cu set 1 and Cu set 2. Correspondence between physical local atomic environments and the 4D descriptors is not one-to-one, and thus the 4D barrier sets have been constructed by selecting one representative case from the family of environments that correspond to each 4D descriptor. Cu set 1 was constructed by randomly selecting one of the permutations of nearest neighbors, while Cu set 2 was selected by determining the lowest energy configurations of the initial and final sites of a jump. The tethering force approach was applied to the barrier calculations in Cu set 2, with the same tethering force constant 2.0 eV/Å 2 as in this work. The general trend of barriers in our 26D set is similar to the 4D sets, especially to Cu set 2. It can be seen that the groups of configurations that correspond to the same 4D descriptor can have deviations of a few eV in energy barrier between themselves.
The combination MLP regressor accuracy in the full 11.7 million barrier set is shown in fig. 9. As explained in section 2.3.3, the final barrier predictor consists of three ensembles (one for each low-index surface) of five regressors each and a classifier to combine them. Training specialist networks to predict migration barriers on different physical surfaces improved total regression accuracy by 10 %, compared to training a single regressor to the entire data set. We propose that this improvement can be explained by the networks implicitly learning the effect of different surface relaxation and surface stress present on the different surfaces. These surface effects affect the energy of all surface atoms in the system, and thus modify the surface migration energy barriers.
Even though KMC does not have an explicit total potential energy parameter, the information about relative potential energy change in a jump can be inferred from the forward and the reverse barrier for that jump: ∆E = E fin − E ini = E m − E m,reverse (see figure 1). For a KMC model to produce thermodynamically correct behaviour tending towards potential energy minima, these ∆E values must be sufficiently accurate. We examined how accurately our machine learning model reproduces this thermodynamical information; fig. 10 shows the comparisons between the predicted ∆E and the values given by NEB. The overall correlation of the ∆E values is good, even though the model was fitted to the migration energy barriers only, and not the energy differences. We note, however, that in the region of barriers with near-zero values, the prediction of the ∆E values is less reliable. However, these barriers describe mainly unstable configurations that can be encountered due to limitations of the rigid lattice approximation. Since the barriers are low, they are not expected to affect the overall thermodynamic behaviour of the system.

KMC simulations
We performed a number of KMC simulations to verify the applicability of the developed ANN for simulations of surface diffusion processes. We chose three typical scenarios of surface evolution driven by the surface energy minimisation principle. These are (i) the flattening of a Cu nanotip on surfaces with different crystallographic orientation; (ii) equilibrium shape of a Cu nanoparticle reached after relaxation of a particles with the different initial shape; (iii) stability of Cu nanowires. All these processes can contribute to evolution of a rough surface with various surface features, which can be observed on surfaces subject to high electric fields. Since in the current study we aim to validate the proposed model, we focus on the processes under equilibrium condition (no electric field effects are yet taken into account). It is important to show that the model is stable and predicts physically reasonable behaviour of surfaces.    [20,39] that was calculated without the tethering force. Correspondence between the 26D and the 4D descriptions is many-to-one; red bars show the minimum and the maximum values, which are calculated for all possible permutations of the same 4D description. The blue dots are the mean values of the barriers in that range. Barriers over 1 eV are generally lower in the 26D set than in the 4D set.   850 K to 1200 K. For each temperature we performed 20 statistically different simulations. In some simulations at high temperatures, the surface started to self-roughen. We excluded these cases from the statistical averaging, since it was not clear how to account for the evolution of the height of the tip. See section 3.2.2 for more detail. Hence the final number of cases used for statistical averaging were as follows: 18 (1000 K), 20 (1050 K), 13 (1100 K), 16 (1150 K), 6 (1200 K), and full 20 at all lower temperatures. The attempt frequency value ν = 2.81 × 10 14 s −1 produced the best fit for the flattening time t f , comparing to the MD results ranging from 850 K to 1200 K, reported in ref. [19] (see fig. 11). As the flattening times span over many orders of magnitude, the ν parameter was fitted to minimise The comparison shows the full parameterization to follow the trend observed in the MD simulations somewhat closer than the 4D parameterization, even though the difference is not drastic. Arising from the event rate formula (1), the mean residence time t 0 and the mean migration barrier E a can be obtained by fitting to the flattening time data over a range of temperatures. The fitting parameters for the ANN parameterized KMC model, as well as for the KMC and the MD results from ref. [19] are tabulated in table 2. With the fitted attempt frequency, the flattening of the tips on all three surfaces was simulated at 1000 K. The mean results from 20 simulations are tabulated in table 3. In both tables we see much better agreement between the results obtained in the current work and those obtained with MD simulations. Fig. 12 shows the initial and the typical final configurations of these simulations.   [19], and KMC with machine learning (KMC+ML) refers to this work. The solid lines are fits of eq. (4) to the data points. 0.72 MD [19] 7.33 · 10 −14 1.00  Figure 12: Examples of the initial and final frames of the cuboid nanotip flattening simulations on the {1 0 0}, {1 1 0} and {1 1 1} surfaces at 1000 K. The simulation was stopped when the tip had reached half of its original height, replicating the experiment in ref. [19]. The boundary conditions in both horizontal directions were periodic. Colour coding (available online) is according to the height coordinate of the atoms.

Instability of the {1 1 0} surface
At temperatures T ≥ 950 K, the {1 1 0} sometimes becomes unstable before the nanotip has time to flatten. In those cases, the surface near the nanotip starts to deplete, growing larger and larger {1 0 0} and {1 1 1} facets (see fig. 13). The atoms accumulate on top of the nanotip, which thus cannot be expected to flatten. The probability of destabilisation increases with temperature: see fig. 14 for the probability of roughening as a function of temperature. Additional 100 simulations on the {1 1 0} surface were run for the statistics in that figure, with a slightly different settings for the simulation box: the horizontal dimensions were 118 × 130 Å 2 , and surface was oriented to align atomic ridges of the {1 1 0} surface with the one of the sides of the simulation box. Fig. 15 shows examples the shapes of the nanoparticles before and after KMC relaxation. Each simulation was performed at 1000 K, each nanoparticle was evolving for about 1-4 µs, depending on the size. The figure is organized in panels separated by the thin vertical lines. Each panel is named according to the initial shape of the nanoparticles. Each panel shows three different size nanoparticles of the same initial shape, which are shown to the left of the panel and complemented by the final shape of the same nanoparticle after annealing on the right. The last panel shows the evolution starting from the minimal surface energy case, which is given by the Wulff construction, as  generated by the Atomic Simulation Environment (ASE) [37]. At finite temperature, the shape should be expected to fluctuate near the minimum energy configuration, if the model produces correct surface energies. Since all particles eventually relaxed into shapes close to the one given by the Wulff construction, we conclude that our ANN model captures the surface energies correctly through the migration barrier information, and hence the thermodynamical evolution of the system given by the model is reliable.

Nanowire stability
We simulated Cu nanowires of 0.5 nm radius at 1000 K. Thin nanowires melt at temperatures considerably lower than bulk Cu; Granberg et al. found Cu nanowires of diameter 1.5 nm to melt at 1000 K, using a similar MC/MD-CEM potential for which the bulk melting points was 1656.72 K [40]. Furthermore, Cu nanowires are known to fragment by Rayleigh instability at much lower temperatures than the melting point [41,42]. Nevertheless, we chose a small wire radius and a temperature of 1000 K to accelerate the simulation, with the primary goal of analyzing the performance of the current parameterization of KMC simulations of Rayleigh instability in thin wires as well as the mechanisms of such instability.
In fig. 16 we show examples of the 1 1 0 nanowire junction simulations. In these simulations, atoms diffuse along the wires towards the central junction, tending to minimise the surface area of the structure. The sharp dips at the junction are filled up by the atoms from nearby regions of the intersecting nanowires. When a stable channel to deliver the atoms from the wire to the knob at the junction is established, the atoms move from large surface-to-ratio areas at a nanowire to the smaller surface-tovolume regions at the knob. This leads to thinning of the wires and eventually breaking near the center. The first breaking occurred right next to the junction in all of the 20 simulations. The mean time for the first breaking to occur in a system of crossing nanowires was 8 ± 4 ns. Individual nanowires were much more stable, taking 250 ± 130 ns to break by diffusion. Errors are standard deviations observed in 20 simulations with different random seeds.
Fragmentation of Cu nanowire networks at the intersection points was observed experimentally by Mallikarjuna et al. [38] and Oh et al. [43]. Similar behaviour was found in Au nanowires by Vigonski et al. [28] both in experiments and in KMC simulations using the Kimocs code.

Discussions
The prediction performance of the ANNs in the set of the 3168 flat surface process barriers ( fig. 5) is extremely good. This evidence supports the applicability of ANNs for surface migration barrier prediction and the validity of the 26D parameterization for describing the LAEs. The accuracy observed for the full 11.7 million barrier set was considerably lower ( fig. 9). Some improvement was achieved by dividing the barrier data into three subsets by surface orientation-{1 0 0}, {1 1 0}, and {1 1 1}-and training specialist networks to predict barriers on each surface. We propose that the improvement is due to the networks implicitly learning the effect of the different surface stresses on the migration barriers. Further accuracy was gained by training ANN ensembles to each barrier subset, and averaging the predicted barrier over them. Even with these additional techniques, an accuracy comparable to the flat surface case was not reached.
The reason for the loss of accuracy in the full set is not entirely clear. The full set is three orders of magnitude larger than the set of flat surface barriers, with a more heterogeneous composition of the LAEs and a larger range of barrier values. The data itself seems also to be of reasonable quality, as the overall agreements with previous barrier sets (figs. 7 and 8) are good.
One possible explanation is, in addition to the different ranges, the different distributions of barrier values in the sets. The full set distribution ( fig. 6) resembles half of a Gaussian distribution except for the large amount of zero barriers, while the 3000barrier flat set distribution ( fig. 4) has no such "discontinuities". This kind of discontinuity in the output value distribution may be one of the reasons for the difficulties in the training. To verify this, in future works it could be considered to assign some negative pseudo-barriers to the 0 eV cases for the duration of the training to make the output distribution smoother. During KMC, negative barriers given by the ANN would be set back to zero. Another method to smoothen the distribution could be by introducing a finer classification scheme: in addition to splitting the data set into subsets of {1 0 0}, {1 1 0}, and {1 1 1} processes, further subdivisions could be made based on the jumping direction on the surface or on the stability of the LAE (e.g. how many neighbours the least bound atom in the LAE has).
The complexity of the network architecture could be questioned as a potential culprit for the lack of accuracy-are single layer ANNs and cascade networks sophisticated enough to produce sufficient fitting? However, the networks used in this work are very similar to what has been found to work well in earlier studies on machine learning for barrier prediction. References [5,[7][8][9][10][11][13][14][15] use ANNs with only one hidden layer. Cascade networks are used in refs. [6] and [12]. The earlier rigid lattice variants of the method also use a similar descriptor:  Figure 15: Examples of nanoparticle shapes before and after 1-4 microseconds of KMC simulation at 1000 K. Regardless of the initial shape, the particles relax to the same final shape that is very close to the minimum energy Wulff construction. a vector of lattice sites encoded with integers according to the occupation state of each site. One of the differences in the earlier studies compared to this work is the size of the LAE-up to hundreds of atoms instead of the 26 used here. Expanding the LAE could thus provide an additional way to improve accuracy, although in ref. [16] genetic programming was successfully applied to predict barriers using LAE with up to 2nn sites in the limited case of vacancy-assisted diffusion.
Despite the apparent lower accuracy of the ANN in the full barrier set, it produced physically reasonable results for nanotip flattening, nanoparticle shape relaxation and simulations of nanowire instability. The attempt frequency value 2.81 ×  [20]). The flattening times for the nanotips on the {1 0 0} and the {1 1 1} surfaces are also much closer to the MD results than with the KMC model proposed in ref. [19] (see table 3), although the flattening mechanism is different than seen in MD. Due to limitations of the rigid lattice KMC, the reorientation of the crystal structure, which is commonly seen within the nanotip oriented in 1 0 0 direction [19,44,45], is not possible within this approach.
As was mentioned in the results section, the thermodynamics of the Cu surface system appear to be well captured by our model, even though it was not explicitly trained to do so. Evidence towards this is given both by the prediction errors of the energy differences (see fig. 10) and the relaxation of the nanoparticle shapes ( fig. 15). Only a small fraction of the jumps have their energy differences predicted in the opposite order.
Regarding the instability of the {1 1 0} surface that was observed at T > 950 K, it is worth mentioning that the real Cu {1 1 0} surface is known to self-roughen at high temperatures. The lower bound for the roughening temperature T R was experimentally estimated to be at 600°C (approx. 900 K) initially by Mochrie [46] and later by Zeppenfeld et al. [47]. Kern estimated T R to be 1070 K [48]. Häkkinen et al. found a clear roughening transition to happen at 1000 K in molecular dynamics simulations [49].
In experiments, the roughening phenomenon is usually observed indirectly from changes in the diffraction of e.g. X-rays or He-atoms from the surface. In computational and theoretical models, the roughening is perceived as the proliferation of atomic steps, as the formation energy of the steps disappears upon reaching T = T R . In molecular dynamics simulations, roughening precedes premelting, where the whole surface becomes covered by a thin liquid-like layer at a temperature below the bulk melting temperature [50]. While rougheningformation of atomic steps-could in principle be modelled in rigid lattice KMC, premelting certainly cannot. Further studies are required as to whether the behaviour observed in this work is a good description of the roughening phenomenon, or if the transition at 1000 K is more of a coincidence. In any case, we advise caution when modelling the Cu {1 1 0} surface at elevated temperatures with this ANN KMC parameterization.
As for the results for the stability of individual and crossing thin 1 1 0 nanowires, the model suggests that fragmentation occurs first near the junctions of two nanowires. Individual nanowires took approximately 250 ns to break, while the first breaking occurred in approximately 8 ns in the systems of two nanowires. This behaviour is similar to what was earlier shown for Au wires in both simulations and experiments [28]: nanowire systems tend to break at the junction points earlier and at lower temperatures than individual nanowires. The Au and Cu systems are rather well comparable, since both metals are fcc materials with similar relative surface energies on the low index facets. Some experimental evidence for similar behaviour in Cu exists: Mallikarjuna et al. observed increased electrical resistance in Cu nanowire mesh if photonic welding was carried on excessively long [38]. They attributed the loss of resistance to breakup of nanowire junctions, which they also saw in FE-SEM images. Oh et al. also saw broken Cu nanowire junctions after using very high current in Eddy current welding [43], although they propose that breaking was due to oxidisation of the junction; in our model, oxygen is not present. Despite this limitation, the similarity of our results compared with these experiments and with previous results for Au wires, provides sufficient validation of the proposed model.

Conclusions
We have developed an artificial neural network that can predict Cu surface migration energy barriers with sufficient accuracy and implemented it into a Kinetic Monte Carlo model. The parameterization is fully three-dimensional, i.e. it is applicable to arbitrarily rough surfaces. The Kinetic Monte Carlo model is able to accurately simulate the energy minimisation of Cu nanoclusters, the thermal flattening of small nanotips as well as the fragmentation of nanowires. Our model predicts the {1 1 0} surface to be unstable at temperatures above 1000 K, very near to the known roughening temperature of this Cu surface. The exact mechanism of the instability may be different from actual roughening, so we advise some caution when using the model at high temperatures.