Cation interstitial diffusion in lead telluride and cadmium telluride studied by means of neural network potential based molecular dynamics simulations

Using a recently developed approach to represent ab initio based force fields by a neural network potential, we perform molecular dynamics simulations of lead telluride (PbTe) and cadmium telluride (CdTe) crystals. In particular, we study the diffusion of a single cation interstitial in these two systems. Our simulations indicate that the interstitials migrate via two distinct mechanisms: through hops between interstitial sites and through exchanges with lattice atoms. We extract activation energies for both of these mechanisms and show how the temperature dependence of the total diffusion coefficient deviates from Arrhenius behaviour. The accuracy of the neural network approach is estimated by comparing the results for three different independently trained potentials.


I. INTRODUCTION
Lead telluride (PbTe) and cadmium telluride (CdTe) are semiconductors that together constitute an immiscible system, which is often employed in the manufacturing of various nanostructures such as two-dimensional nanolayers, one-dimensional nanowires and zero-dimensional quantum dots [1,2]. Experiments show that even at very high temperatures, but below the melting point, these two compounds remain separated, and morphological transformations of the individual phases occur both during annealing of an as-grown sample [3] and during multilayer crystal growth [4].
By the appropriate choice of the experimental conditions, such as the thickness of the deposited layers, growth rate and temperature, one can design nanoobjects of desired shape and size [4][5][6]. From the application point of view, the most interesting structures are PbTe quantum dots and quantum wells embedded in CdTe, which can emit and detect light in the mid-infrared range [1,2,7].
Despite many experimental studies concerning PbTe/CdTe growth, the theoretical understanding of the dynamics of the underlying processes occurring during the morphological transformations is scarce. The multistage disintegration of a single PbTe layer inserted between two bulk CdTe materials into separate quantum dots during annealing at high temperatures was reproduced by the Cahn-Hilliard model [3]. Furthermore, the growth of PbTe/CdTe multilayers was studied by means of kinetic Monte Carlo [5,6]. However, the models used in these studies are coarse-grained and do not explicitly take into account the underlying processes on the microscopic level. Hence, the atomistic mechanism of the morphological transformations across the PbTe/CdTe interface is still unknown.
On the atomic scale one can expect that the morphological transformations described above are the result of diffusion of atoms across the PbTe/CdTe interface [8].
Point defects such as interstitials or vacancies could play a crucial role. For instance, interstitials from one compound could diffuse across the interface to the other one and participate there in the transformation of the crystal structure. Therefore, the first step in studying the mechanism of the morphological transformations is to focus on diffusion of native defects in PbTe and CdTe, without considering the interface.
Molecular dynamics (MD) simulations are a common method for studying the time evolution of condensed matter systems. In order to perform such simulations one needs to determine the forces acting on the atoms, which are often described by empirical potentials. Empirical potentials exist for PbTe [9,10] and CdTe [11,12] crystals. However, the accuracy of empirical potentials is in general not adequate due to their limited flexibility. Alternatively, ab initio MD simulations can be performed, which are, however, computationally expensive and therefore not practical for the analysis of the diffusion of defects, where large systems and long simulation times are required. In order to overcome the difficulties related to both these approaches, we employ the neural network method of Behler and Parrinello [13-15], in which a high-dimensional artificial neural network is trained with energies and forces determined by means of ab initio calculations. It has been shown that such neural networks are capable of accurately reproducing the reference data and deliver the accuracy of ab initio methods at a fraction of their cost. To date, this approach has been applied to study the properties of various systems [16,19-21].
In this work, neural networks for bulk PbTe and CdTe are trained and used to carry out extensive molecular simulations for the diffusion of single cationic interstitials in these materials over a range of temperatures. From the obtained MD trajectories we calculate the diffusion coefficients for the interstitials as a function of temperature and analyse the different diffusion mechanisms occurring in the two systems. For both materials, the same calculations are performed for three independently trained neural network potentials in order to estimate the overall accuracy of the method.
The remainder of the article is organized as follows. In Sec. II we review the main physical properties of PbTe and CdTe crystals. Then, in Sec. III, we introduce the computational techniques that we employed in our study and in Sec. IV we present the results of our simulation. Finally, in Sec. V we conclude by discussing our findings.

II. SYSTEMS
PbTe and CdTe are semiconductors belonging to the groups IV-VI and II-VI, respectively. They possess very similar lattice constants; however, their lattice structures are distinct. According to diffraction experiments, the lattice constant a of PbTe at room temperature is 6.46 Å [22] and that of CdTe is 6.48 Å [23]. The lattice constant mismatch is therefore very small. However, there exists a lattice-type mismatch: PbTe crystallizes in the rocksalt structure and CdTe in the zinc-blende structure. Both lattices can in fact be represented as two interpenetrating face-centered cubic (fcc) cation and anion sublattices, shifted with respect to each other by a/2 for rocksalt and a/4 for zinc-blende. Because of the different crystallographic structures, the coordination numbers of PbTe and CdTe also differ: in PbTe each atom has 6 nearest neighbours, while in CdTe each atom has 4.
From annealing and growth experiments it is known that systems containing both PbTe and CdTe are immiscible, i.e. instead of mixing they remain separated in thermal equilibrium [3,4]. Consequently, an interface is created, with which a surface free energy is associated [8]. The ratios of the energies for different interface orientations control the shape of the nanoobjects that emerge in the experiments. Among these nanoobjects are nanolayers, nanowires and quantum dots. The equilibrium shape of PbTe quantum dots embedded in CdTe was predicted using ab initio methods [8] and their electronic properties were also studied [24].
Since PbTe and CdTe share the common anion (Te), the morphological transformations at the atomic scale must occur through the diffusion of cations (Pb and Cd) across the interface and the subsequent reorganization of the local lattice structure. It has been suggested that point defects play an important role in this process of cation exchange [25].
There are several possible types of point defects in the crystal structure: vacancies, interstitials, Schottky and Frenkel defects. Vacancies and interstitials are the simplest defects; however, the barrier for the diffusion of vacancies in PbTe and CdTe is very high [26]. Therefore, in this work we focus only on interstitials. We consider native interstitials diffusing in the bulk of PbTe and CdTe. Since we study PbTe and CdTe separately, our simulated systems do not contain interfaces between the two materials.
In PbTe crystals all energetic minima for Pb interstitials are equivalent. Each Pb interstitial atom is surrounded by 4 nearest-neighbour Pb and 4 nearest-neighbour Te lattice atoms [25]. In CdTe, on the other hand, there are two non-equivalent energetic minima for Cd interstitials. One of them is tetrahedrally coordinated by Te atoms and octahedrally by Cd atoms; the other has the reversed configuration [26]. They are labelled T_a and T_c, respectively.

III. METHODS
In order to study the diffusion of cation interstitials in PbTe and CdTe, the following computational strategy is employed. First, the Vienna Ab initio Simulation Package (VASP) [17] is used to calculate the reference data, i.e. electronic ground state energies and forces for a given set of configurations of PbTe and CdTe, with density-functional theory (DFT). These energies and forces are then used to train neural networks by means of the Neural Network Potential Package (n2p2) [27]. Finally, the trained neural networks are used to perform MD simulations with LAMMPS [28]. In the post-processing of the trajectories obtained at different temperatures, the position of the interstitial is identified at each time step and its diffusion coefficient is extracted from the interstitial's trajectory.
This approach is used iteratively, that is the training set is repeatedly expanded with new configurations, for which new energies and forces are calculated, and the neural network is retrained at each iteration with the expanded set. These retrained neural networks are subsequently used to perform new MD simulations.
Below we describe the methods used in this work in more detail.
A. Density-functional theory

All our DFT calculations are carried out with the exchange-correlation functional PBEsol [18]. The spin-orbit coupling is not considered. As the starting point for generating the set of reference configurations, 4×4×4 PbTe and CdTe supercells with the experimental lattice constants were used. Subsequent structures were created by increasing and decreasing the lattice constant, and by randomly displacing the atoms from their positions in the perfect lattices. As the neural network was trained, configurations obtained from the MD trajectories were subsequently added as well. Supercells containing one interstitial atom were also included in the training set, such that the total number of atoms in a structure was either 512 or 513.
The Brillouin zone integrations for all structures were performed using the Γ point only. A convergence test for the k-point mesh revealed that the forces obtained with a denser 3×3×3 k-mesh did not differ significantly from the Γ-point results, while the calculations lasted roughly 10 times longer.

B. Neural network potential
The data obtained from VASP then can be used to train a neural network potential, which represents the force field used in the MD simulations. The input of the neural network is an atomic configuration and the output is its electronic ground state energy. Since the neural network is an analytic function of the atomic coordinates, the forces acting on each of the atoms can be determined by simple differentiation.
For training the neural network potential and employing it in MD simulations we use the approach introduced by Behler and Parrinello [13]. In this approach, the information about the atomic configurations is encoded in so called symmetry functions [29]. They transform the Cartesian coordinates into values that are invariant with respect to the translation and rotation of the whole system, and to the exchange of two arbitrary atoms of the same type. The values of these symmetry functions serve as input for the neural network.
The first thing to consider in constructing suitable symmetry functions is the cutoff radius R_c, which defines the size of the local environment around a given atom. Its value should be selected in such a way that all the relevant interactions are taken into account. Here R_c = 6 Å is chosen, which guarantees that in addition to all the nearest neighbours other atoms within a single fcc elementary cell are included in the cutoff sphere. The cutoff is implemented by cutoff functions f(R), whose values go to zero beyond R_c. Several forms of the cutoff function have been proposed [27]. In this work, we choose the polynomial

f_c(R_ij) = [(15 − 6x)x − 10] x³ + 1 for x ≤ 1, and f_c(R_ij) = 0 otherwise, (1)

where x = R_ij/R_c and R_ij is the distance between the pair of atoms i and j. The derivatives of this particular function are continuous at the cutoff radius up to second order, such that forces change continuously as atoms move in and out of the cutoff sphere.
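As a sketch, the polynomial cutoff above can be implemented and checked numerically (a minimal NumPy version; R_c = 6 Å is taken from the text):

```python
import numpy as np

def f_cut(R, Rc=6.0):
    """Polynomial cutoff: value and first two derivatives vanish at Rc."""
    x = np.asarray(R, dtype=float) / Rc
    f = ((15.0 - 6.0 * x) * x - 10.0) * x**3 + 1.0
    return np.where(x < 1.0, f, 0.0)
```

Since f_c(x) = 1 − 10x³ + 15x⁴ − 6x⁵, one finds f_c'(x) = −30x²(1 − x)², so both the value and the slope join zero smoothly at R = R_c.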
Since the purpose of the symmetry functions is to provide a structural fingerprint of the environment of every atom in the system within a certain cutoff radius, they depend on the relative position of the given atom (called the central atom) and each of its neighbours. Symmetry functions are classified either as radial or angular [29]. Radial symmetry functions depend only on the distances between the central atom and its neighbours and are expressed as a sum of two-body functions. On the other hand, angular symmetry functions depend additionally on the angle spanned by the central atom with each pair of its neighbouring atoms. Therefore, they are expressed as a sum of three-body functions.
It is important to note that since both our systems contain two different elements, separate symmetry functions must be provided to describe the distribution of atoms of each element and with each of them as the central atom. For radial functions there are four possibilities, whereas for angular functions there are six possibilities, i.e. two types of central atoms times two types of neighbour atoms, and two types of central atoms times three possible types of neighbour pairs, respectively.
We choose the following radial symmetry function, which is a sum of Gaussian functions multiplied by the cutoff function:

G²_i = Σ_j exp(−η R²_ij) f_c(R_ij). (2)

The parameter η determines the width of the Gaussian. Depending on the value of η, the function has a specific range of arguments in which it changes most steeply and is therefore most sensitive to changes of the interatomic distances. We choose the values of η in such a way that the radial symmetry functions are equally spaced and the whole relevant range of interatomic distances is covered: η = 0.0001, 0.016, 0.04, 0.07, 0.12, 0.2, 0.3, 0.5, 0.9 Å⁻².
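A minimal sketch of evaluating one radial symmetry function for a list of neighbour distances, reusing the polynomial cutoff (the Gaussian-times-cutoff form follows the description above; any radial shift of the general Behler form is assumed to be zero):

```python
import numpy as np

def g_radial(distances, eta, Rc=6.0):
    """Sum over neighbours of exp(-eta*R^2) damped by the polynomial cutoff."""
    R = np.asarray(distances, dtype=float)
    x = R / Rc
    fc = np.where(x < 1.0, ((15.0 - 6.0 * x) * x - 10.0) * x**3 + 1.0, 0.0)
    return np.sum(np.exp(-eta * R**2) * fc)
```

Neighbours beyond R_c contribute exactly zero, and larger η values concentrate the sensitivity at shorter distances.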
In addition to the radial functions we also choose the following angular symmetry functions:

G³_i = 2^(1−ζ) Σ_{j,k} (1 + λ cos θ_ijk)^ζ exp[−η(R²_ij + R²_ik + R²_jk)] f_c(R_ij) f_c(R_ik) f_c(R_jk), (3)

G⁹_i = 2^(1−ζ) Σ_{j,k} (1 + λ cos θ_ijk)^ζ exp[−η(R²_ij + R²_ik)] f_c(R_ij) f_c(R_ik). (4)

Here, θ_ijk is the angle spanned by the atoms i, j and k with the atom i at the center. The functions G³_i and G⁹_i are called narrow and wide symmetry functions, respectively [27,30]. In contrast to G³_i, in the case of G⁹_i no cutoff function acts on the distance between the atoms j and k, such that the contributions of wider angles are not suppressed. The radial parts of the angular symmetry functions are identical with those of the radial symmetry functions. Their angular parts, on the other hand, depend on the angular distribution of the neighbouring atoms; they are therefore sums of three-body functions. The parameter λ is either 1 or −1, which sets the maximum of the angular part at 0° or 180°, respectively. The value of ζ, on the other hand, controls the width of the function. We choose angular symmetry functions with λ = ±1 and ζ = 1, i.e. with only two different angular parts. The values of η in the radial part are chosen the same as for the radial functions (2).
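The distinction between narrow and wide angular functions can be illustrated with a single three-body term (a sketch; in practice θ_ijk and R_jk are related through the atomic geometry, but here they are passed independently for clarity):

```python
import numpy as np

def fc(R, Rc=6.0):
    """Polynomial cutoff used by both radial and angular terms."""
    x = np.asarray(R, dtype=float) / Rc
    return np.where(x < 1.0, ((15.0 - 6.0 * x) * x - 10.0) * x**3 + 1.0, 0.0)

def g_term(Rij, Rik, Rjk, theta, eta=0.01, lam=1.0, zeta=1.0, narrow=True):
    """One three-body contribution of the narrow (G3-like) or wide (G9-like) type."""
    ang = 2.0**(1.0 - zeta) * (1.0 + lam * np.cos(theta))**zeta
    if narrow:  # Gaussian and cutoff also act on the j-k distance
        return ang * np.exp(-eta * (Rij**2 + Rik**2 + Rjk**2)) * fc(Rij) * fc(Rik) * fc(Rjk)
    return ang * np.exp(-eta * (Rij**2 + Rik**2)) * fc(Rij) * fc(Rik)
```

For λ = 1 and ζ = 1 the angular part vanishes at θ = 180°, and for wide angles (large R_jk) the wide variant is larger than the narrow one because the j-k damping is absent.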
The values of the symmetry functions for a given configuration are passed as the input to the neural network, which gives the total energy of that configuration as the output. Between the input and the output layer are hidden layers, which consist of nodes. The value at each node depends on the values in the nodes in the preceding layer. The particular way in which the values from one layer are transformed into the values in the nodes in the next layer is determined by the parameters of the neural network, called weights and biases and by the choice of the activation functions.
A neural network architecture with two hidden layers, each of which has 25 nodes, is chosen. The contribution of the atom i to the total energy of the system is given by

E_i = f³_1 ( b³_1 + Σ_m a^{23}_{m1} f²_m ( b²_m + Σ_l a^{12}_{lm} f¹_l ( b¹_l + Σ_j a^{01}_{jl} G^i_j ) ) ), (5)

where the sums over l and m run over the 25 nodes of the hidden layers and the sum over j runs up to j_max, the total number of symmetry functions for each atom. Here G^i_j is the jth symmetry function centered at the atom i, a^{k,k+1}_{ij} is the weight of the contribution of node i in layer k to node j in layer k+1, b^j_i is the bias of node i in layer j and f^j_i is the activation function of node i in layer j. For the first and the second hidden layer we choose the hyperbolic tangent as activation function and for the output layer the linear function is used.
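A toy feed-forward pass with the stated architecture (two tanh hidden layers of 25 nodes, linear output) might look as follows; the random weight values and the number of symmetry functions n_sym are placeholders, not the trained parameters:

```python
import numpy as np

def atomic_energy(G, params):
    """Feed-forward pass: symmetry functions -> tanh hidden x2 -> linear E_i."""
    h = np.tanh(params["W1"] @ G + params["b1"])   # hidden layer 1 (25 nodes)
    h = np.tanh(params["W2"] @ h + params["b2"])   # hidden layer 2 (25 nodes)
    return float(params["w3"] @ h + params["b3"])  # linear output node

rng = np.random.default_rng(0)
n_sym = 30  # hypothetical number of symmetry functions per atom
params = {"W1": rng.normal(size=(25, n_sym)), "b1": rng.normal(size=25),
          "W2": rng.normal(size=(25, 25)),    "b2": rng.normal(size=25),
          "w3": rng.normal(size=25),          "b3": rng.normal()}
E_i = atomic_energy(rng.normal(size=n_sym), params)
```

The total energy of a configuration is then the sum of such atomic contributions, one feed-forward pass per atom.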
The sum of the energy contributions from all atoms in the system, E = Σ_i E_i, is the total potential energy of the given configuration. The force F_i acting on atom i is obtained by calculating the gradient of the total potential energy with respect to the coordinates of atom i, F_i = −∇_i E. For a three-dimensional system with N atoms, there are in total 3N force components (3 for each atom).
During the training of the neural network, the weights and biases are adjusted to minimize the difference between the energies and forces predicted by the neural network and the reference data. This difference is quantified by the cost function

Γ = (1/n) Σ_i (E^NN_i − E^ref_i)² + β [1/(3 Σ_i N_i)] Σ_i Σ_j |F^NN_ij − F^ref_ij|², (6)

where the indices i and j enumerate configurations and atoms, respectively. The total number of configurations is n, and configuration i contains N_i atoms. The parameter β tunes the importance of the forces with respect to the energies. For the purpose of training, the reference data are divided into a training set and a test set. This is done to ensure that no overfitting occurs, that is, the error of the test set should be comparable with that of the training set. The neural network training proceeds in 50 epochs, during each of which all the parameters of the neural network are updated. Since the number of forces exceeds by far the number of energies, we use only a fraction of 0.0008 of all the forces for the training, so that the numbers of energy and force updates are similar. That fraction is randomly selected at the beginning of each epoch. For the optimization of the cost function (6) we use the extended Kalman filter [31-33], which has been implemented in n2p2 [30]. The Kalman filter is an algorithm originally used for estimating a dynamical system's unknown state based on a series of noisy measurements. Later it was shown that it can also be used for training neural networks [34].
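A sketch of evaluating such a cost function for a small batch, assuming the normalization written above (the exact normalization used by n2p2 may differ):

```python
import numpy as np

def cost(E_nn, E_ref, F_nn, F_ref, beta=5.0):
    """Energy mean-square error per configuration plus a beta-weighted
    force term averaged over all 3N force components.
    E_*: sequences of configuration energies; F_*: lists of (N_i, 3) arrays."""
    e_term = np.mean((np.asarray(E_nn) - np.asarray(E_ref))**2)
    f_sq = sum(np.sum((fn - fr)**2) for fn, fr in zip(F_nn, F_ref))
    n_comp = sum(3 * f.shape[0] for f in F_nn)  # total number of force components
    return e_term + beta * f_sq / n_comp
```

With perfect predictions the cost is zero; increasing beta shifts the optimizer's attention from energies to forces.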

C. Molecular dynamics
Once the neural network for a given system is trained, it can be used for performing MD simulations. The program n2p2 provides an interface to LAMMPS [27], which is used for all MD simulations in the present work.
Since our aim is to determine the diffusion coefficient of a cation interstitial, we carry out MD simulations for 4×4×4 PbTe and CdTe supercells with an additional atom inserted in the middle of a unit cell. In total, the systems consisted of 513 atoms. The simulations are performed with a timestep of 2 fs with a Nosé-Hoover thermostat and barostat for temperatures ranging from 700 to 1200 K in intervals of 50 K. In each run the system is first equilibrated for 100 ps. Then the simulation is continued for several nanoseconds and configurations are stored every 200 fs. Both for PbTe and CdTe, the MD runs lasted 8 ns for temperatures up to 800 K, 6 ns for 850 K and 4 ns for all higher temperatures.
Due to an indirect mechanism of diffusion, in which the interstitial kicks out a lattice atom that subsequently becomes the new interstitial, it is important to identify the interstitial atom at each step. This is done by defining the interstitial as the atom with the largest number of neighbours. At high temperatures, this method may not be very reliable due to large fluctuations. Therefore, an exchange of the interstitial identity is only accepted if the newly identified atom remains the interstitial for ten consecutive steps. Otherwise the exchange is regarded as a transient fluctuation and ignored.
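The identification rule can be sketched as follows: the interstitial is the atom with the most neighbours, and a change of identity is accepted only after it persists for a given number of frames (a minimal NumPy sketch with an O(N²) neighbour count; the cutoff value is a placeholder):

```python
import numpy as np

def coordination(positions, box, rcut=3.6):
    """Neighbour counts under periodic boundary conditions (minimum image)."""
    d = positions[:, None, :] - positions[None, :, :]
    d -= box * np.round(d / box)              # minimum-image convention
    dist = np.linalg.norm(d, axis=-1)
    np.fill_diagonal(dist, np.inf)            # exclude self-distances
    return (dist < rcut).sum(axis=1)

def track_interstitial(frames, box, rcut=3.6, persist=10):
    """Interstitial = atom with the most neighbours; an identity change
    counts only if the new candidate persists for `persist` frames."""
    ids, current, cand, streak = [], None, None, 0
    for pos in frames:
        best = int(np.argmax(coordination(pos, box, rcut)))
        if current is None:
            current = best
        elif best != current:
            if best == cand:
                streak += 1
            else:
                cand, streak = best, 1
            if streak >= persist:             # accepted exchange
                current, cand, streak = best, None, 0
        else:
            cand, streak = None, 0            # transient fluctuation, reset
        ids.append(current)
    return ids
```

Short-lived changes of the most-coordinated atom are filtered out, while a persistent change is recorded as a genuine exchange event.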
Using this criterion to identify the interstitial, the trajectory of the interstitial is determined and unwrapped across the periodic boundary conditions. For a time lag of n steps, the mean square displacement (MSD) [35,36] is estimated using

MSD(n) = [1/(N − n)] Σ_{i=0}^{N−n−1} |r_{i+n} − r_i|², (7)

where r_i is the position of the interstitial after i steps and N is the total number of collected configurations. Expression (7) takes into account that for a time lag of n steps, there are N − n position pairs that can be used to determine the MSD. The estimated MSD is more accurate for shorter times, since there are more short time lags than long ones. In order to extract the diffusion coefficient from the MSD, expression (7) is plotted as a function of time t.
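The MSD estimator with averaging over all available time origins translates directly into a few lines of NumPy for an unwrapped trajectory:

```python
import numpy as np

def msd(traj, n):
    """Mean square displacement for a time lag of n steps, averaged over
    the N - n available origins of an unwrapped (N, 3) trajectory."""
    disp = traj[n:] - traj[:-n]               # displacements over lag n
    return np.mean(np.sum(disp**2, axis=1))   # average squared displacement
```

For a ballistic test trajectory r_i = i·v the estimator returns |v|²n², which provides a quick sanity check.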
By fitting a linear function to these data, the diffusion coefficient D is obtained from the slope according to

MSD(t) = 2 d D t, (8)

where d is the number of dimensions; in our case d = 3. With this procedure the diffusion coefficient can be studied in the whole range of temperatures, which further allows us to calculate the activation energy by fitting the data to the Arrhenius law. Moreover, the number of jumps in the trajectory is counted by analysing the displacements of the interstitial. In doing so, two different types of diffusion jumps are distinguished, i.e., exchanges and direct jumps.
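Extracting D from the linear regime of the MSD can be sketched as a least-squares fit of MSD = 2dDt plus an intercept:

```python
import numpy as np

def diffusion_coefficient(times, msd_values, d=3):
    """Fit MSD = 2*d*D*t (allowing an intercept) and return D from the slope."""
    slope, _ = np.polyfit(times, msd_values, 1)
    return slope / (2 * d)
```

In practice the fit is restricted to the linear regime of the MSD curve (short time lags), as described above.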

IV. RESULTS

A. Neural network training
For each of the two investigated materials, PbTe and CdTe, three independent neural networks were trained, with different random seeds chosen for the initialization of the network parameters. Consequently, the final values of these parameters differed between the networks at the end of the training, even though all three networks represented the same system.
The configurations used for the training were created iteratively. The initial set consisted of one perfect and a few hundred distorted 4×4×4 supercells with the experimental lattice structures and the lattice constants optimized using VASP. Then configurations with slightly modified lattice constants were added to account for the thermal expansion of the crystals at finite temperatures. Afterwards, new configurations were taken from MD trajectories. During the MD simulations, an extrapolation warning was produced each time the value of some symmetry function was outside the range for which the neural network had been trained. The number of such warnings was monitored during the simulations and the configurations with the largest numbers of extrapolation warnings were added to the set. Additionally, predictions for the same trajectory by two independently trained neural networks were compared, and those configurations for which the predictions differed considerably were also added. This procedure was repeated until there were no extrapolation warnings and no significant differences between different neural networks in the relevant range of temperatures.
After training the neural networks for defect-free crystals, a single cation interstitial was introduced into the systems, that is, Pb in PbTe and Cd in CdTe. This required additional ab initio calculations and further retraining of the neural networks. For creating configurations with an interstitial, the same procedure was used as for the defect-free crystal. Special attention was paid to configurations with the interstitial near the transition states. They were identified visually and added to the training set.
During the MD simulations there are fewer such configurations than those with the interstitial around one of the energetic minima but the former are very important for reproducing the correct energy barrier for the diffusion.
In total, 4898 configurations of PbTe and 2866 of CdTe were generated, of which 90% were used as the training set and 10% as the test set. For the training, all the energies from the training set were used and 0.08% of the forces, which gave a comparable number of energy and force updates in each epoch. The relative importance of the forces was set to β = 5. The training proceeded for 50 epochs and the weights and biases from the last epoch were used for the MD simulations. In Table I the root mean square errors (RMSE) for all the neural networks trained and subsequently used in this work are listed. They are comparable to the RMSE obtained previously with neural network potentials for other materials [20,30]. In all cases the RMSE is similar and the errors for the training and the test set are of the same order. We notice that for CdTe the errors are always smaller than for PbTe.

B. Diffusion of cation interstitials
Examples of trajectories of the defect obtained from the MD simulations are shown in Fig. 1. Along these trajectories, two distinct mechanisms of diffusion were observed. One of them involved simple hops of the diffusing interstitial atom between neighbouring interstitial sites, while the other mechanism was an exchange of the interstitial atom with one of the lattice atoms. During the latter process the interstitial took over the position of the lattice atom, which subsequently became the new interstitial. In both PbTe and CdTe, hops as well as exchanges were observed. However, in PbTe exchanges represent the dominant mechanism, whereas in CdTe hops are preferred. Both mechanisms will be discussed in more detail in the next subsection.
In Fig. 2 the mean square displacement of a single cation interstitial in PbTe and CdTe is shown as a function of the time lag. The plots correspond to the trajectories generated at various temperatures with the potential nnp-1. As expected, for short time lags the dependence of the MSD on time is linear. For longer times the averaging is done over fewer, more correlated time lags, leading to larger statistical errors. Therefore, in order to obtain the diffusion coefficient we fitted a linear function to the linear part of the MSD curve, that is, for short times between 2 and 14 ps. The same procedure was applied to extract the diffusion coefficient from all the trajectories studied.
Moreover, the convergence of the applied method is tested by means of block averaging [37,38]. For this purpose, the series of instantaneous square displacements at a time lag of 10 ps, for which the MSD is still linear, is obtained for the whole trajectory. This series is divided into blocks of length M_B, and the variance of the block averages is estimated by

σ²_B(MSD) = ⟨MSD_B²⟩ − ⟨MSD_B⟩², (9)

where MSD_B denotes the square displacement averaged within a single block. If the simulation is properly converged, the product M_B σ²_B(MSD) should reach a plateau s in the limit M_B → ∞. The variance of the average MSD can then be estimated as

σ²(MSD) = s/N, (10)

and the error of the diffusion coefficient is equal to

ΔD = σ(MSD)/(2dt), (11)

where t = 10 ps. In Fig. 3 the method is illustrated for nnp-1 for the interstitial trajectories in PbTe and CdTe at a temperature of 700 K. The product of the block size and the variance of the block averages is plotted as a function of the block size. For both systems it reaches a plateau around M_B = 3500. This corresponds to values of s around 8.0·10⁴ Å⁴ in PbTe and 1.2·10⁴ Å⁴ in CdTe, and to errors of the diffusion coefficient of 7.458·10⁻⁷ cm²/s and 9.134·10⁻⁷ cm²/s, respectively. The same procedure is performed over the whole range of temperatures studied and for all three neural network potentials.
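The block-averaging estimate can be sketched as follows: for uncorrelated data the product of block size and block-average variance is flat, while for correlated data it grows towards a plateau as the blocks become longer than the correlation time.

```python
import numpy as np

def block_error(series, block_size):
    """Block size times the variance of block averages; this product
    plateaus once blocks are longer than the correlation time."""
    n_blocks = len(series) // block_size
    trimmed = np.asarray(series[:n_blocks * block_size], dtype=float)
    means = trimmed.reshape(n_blocks, block_size).mean(axis=1)
    return block_size * means.var(ddof=1)
```

Scanning block_size and locating the plateau value s then gives the statistical error of the mean as sqrt(s/N).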
The temperature dependence of the diffusion coefficients is shown for all neural network potentials in the Arrhenius plots in Fig. 4. The values of the diffusion coefficient at particular temperatures are represented by dots with the corresponding error bars. Additionally, the data points are fitted with the Arrhenius equation

D = D_0 exp(−β E_a), (12)

where β = 1/(k_B T), E_a is the activation energy for the diffusion and D_0 is the diffusion prefactor. Both parameters can be extracted from the fit. In Table II the activation energies and the diffusion prefactors for all the Arrhenius plots in Fig. 4 are collected. For PbTe the activation energies tend to be lower than for CdTe, which suggests that diffusion of interstitials in PbTe occurs more easily than in CdTe. On the other hand, the diffusion prefactors in CdTe are higher than in PbTe. Due to the competition between these two effects, the diffusion coefficient is larger in PbTe at low temperatures, but at high temperatures it is higher in CdTe. For PbTe nnp-2 the diffusion prefactor is half that of the other two PbTe neural network potentials. However, this is compensated by the lower activation energy of nnp-2. As can be seen in Fig. 4, the diffusion coefficient for PbTe nnp-2 at low temperature is slightly higher than for nnp-1 and nnp-3.
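Fitting the Arrhenius law reduces to a linear regression of ln D against 1/T (a sketch with synthetic data; k_B in eV/K):

```python
import numpy as np

kB = 8.617333e-5  # Boltzmann constant in eV/K

def arrhenius_fit(T, D):
    """Fit ln D = ln D0 - Ea/(kB*T); return (Ea in eV, D0)."""
    slope, intercept = np.polyfit(1.0 / np.asarray(T, dtype=float), np.log(D), 1)
    return -slope * kB, np.exp(intercept)
```

The slope of ln D versus 1/T gives the activation energy and the intercept gives the prefactor.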

C. Diffusion mechanisms
As already mentioned, two different mechanisms of interstitial diffusion were observed in the MD simulations of PbTe and CdTe. In one of them the interstitial atom simply jumps between two different energetic minima, in the other one it exchanges its position with one of the lattice atoms. We refer to these mechanisms as hops and exchanges, respectively.
The mechanisms of diffusion in PbTe are illustrated in Fig. 5. Since there is only one equilibrium position for the interstitial atom in PbTe and two in CdTe, the diffusion in the former is simpler. During a hop, the interstitial Pb atom jumps between nearest-neighbouring minima along the [100] direction. The length of the jump is equal to a/2. During the exchange process, on the other hand, the interstitial Pb atom kicks out one of its four neighbouring Pb lattice atoms, which in turn assumes the position in one of the neighbouring minima, either in the [110] or [111] direction relative to the original position of the first interstitial. The effective jump of the interstitial Pb atom is therefore (a/2, a/2, 0) or (a/2, a/2, a/2), and analogously in all the equivalent crystallographic directions. Hence, during an exchange the interstitial moves by a distance of a√2/2 or a√3/2, respectively.

Analogously, in Fig. 6 the diffusion mechanisms in CdTe are shown. The diffusion of interstitial Cd atoms is more complex. As mentioned in Sec. II, for the interstitial Cd atom there are two different equilibrium positions, labelled T_a (tetrahedral anion site) and T_c (tetrahedral cation site). Moreover, the Cd interstitial can exist in three charge states [39,40]: one neutral and two charged ones, Cd+ and Cd2+. The height of the energy barrier between the interstitial sites depends on the particular charge state. In the present case the interstitial atom occupies the T_a site most of the time and is found in the T_c position only for a very short time, which corresponds to the state Cd2+ discussed in Refs. 39 and 40. During an exchange, the interstitial replaces one of the nearest lattice Cd atoms, which subsequently becomes the new interstitial. One exchange is therefore effectively equivalent to two successive hops.
As can be seen from the above analysis of the diffusion mechanisms, in both materials, PbTe and CdTe, the interstitial jump is longer for an exchange than for a hop. As a consequence, even though there are fewer exchanges than hops in CdTe, the contributions of both mechanisms to the total diffusion coefficient are comparable. From the MD trajectories the number of jump events for both hops and exchanges was extracted. The corresponding jump rates k are shown in an Arrhenius plot for all considered neural network potentials and compared for PbTe and CdTe in Fig. 7. This quantitative analysis confirms the observation that exchanges dominate in PbTe and hops in CdTe. However, for both materials their ratio approaches 1 as the temperature increases. At 700 K one of the jump rates is always around one order of magnitude higher than the other, while at 1200 K they are of the same order. As can be inferred from Fig. 7, for both mechanisms the jump rates follow the Arrhenius law. Therefore, in analogy to the total diffusion activation energy, the particular activation energies for hops and exchanges can be determined by fitting a line to the Arrhenius plots in Fig. 7. In Table III these activation energies, E_hops for hops and E_ex for exchanges, are summarized for PbTe and CdTe. As expected, for PbTe the activation energy is higher for hops than for exchanges, and for CdTe it is the opposite. For each neural network potential the effective activation energy given in Table II lies between the corresponding energies listed in Table III. These separate activation energies can further be used to fit a more complex function to the data than the Arrhenius law. The diffusion coefficient for a particle which jumps by means of two independent thermally activated mechanisms can be written as the sum

D = D_0^hops exp(−β E_hops) + D_0^ex exp(−β E_ex), (13)

where D_0^hops and D_0^ex are the diffusion prefactors for hops and exchanges, respectively.
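With the activation energies fixed from the jump-rate fits, the two-channel expression is linear in the prefactors, so they can be obtained by linear least squares (a sketch with synthetic data; the particular parameter values are illustrative only):

```python
import numpy as np

kB = 8.617333e-5  # Boltzmann constant in eV/K

def fit_prefactors(T, D, E_hops, E_ex):
    """With E_hops and E_ex fixed, D(T) = D0_hops*exp(-E_hops/kB T)
    + D0_ex*exp(-E_ex/kB T) is linear in the prefactors, which therefore
    follow from a linear least-squares fit."""
    T = np.asarray(T, dtype=float)
    A = np.column_stack([np.exp(-E_hops / (kB * T)), np.exp(-E_ex / (kB * T))])
    coeffs, *_ = np.linalg.lstsq(A, np.asarray(D, dtype=float), rcond=None)
    return coeffs  # [D0_hops, D0_ex]
```

Fixing the exponents turns an otherwise nonlinear four-parameter fit into a well-conditioned two-parameter linear problem.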
The results of this fitting are summarized in Table IV and the corresponding curves are shown in Fig. 8. By comparing the curves with those in Fig. 4, it can be seen that taking into account two mechanisms with separate activation energies captures the deviation of the temperature dependence of the diffusion coefficient from the Arrhenius law.

Diffusion of Pb in PbTe was studied experimentally in Ref. 41. The activation energy was measured to be 249 meV, which is close to the values reported in this work (322, 250, 338 meV). In contrast, the diffusion prefactor was found to be 3.1×10−6 cm²/s, which is three orders of magnitude smaller than our result. However, the method used in Ref. 41 is based on radioactive isotopes and does not take the exchange mechanism into account.
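Once E_hops and E_ex are fixed to the values from the jump-rate fits, the two-term expression for D(T) is linear in the prefactors, so they can be obtained by ordinary linear least squares rather than a nonlinear fit. A sketch under that assumption, with all numerical values purely illustrative (not those of Table IV):

```python
import numpy as np

kB = 8.617333e-5            # Boltzmann constant in eV/K
E_hops, E_ex = 0.25, 0.35   # activation energies in eV (illustrative, assumed known)

def D_model(T, D0_hops, D0_ex):
    """Two-mechanism diffusion coefficient D(T)."""
    return (D0_hops * np.exp(-E_hops / (kB * T))
            + D0_ex * np.exp(-E_ex / (kB * T)))

T = np.linspace(700.0, 1200.0, 11)   # temperatures in K
D = D_model(T, 2e-3, 8e-3)           # synthetic "data" in cm^2/s

# With the activation energies fixed, D(T) is linear in the two prefactors,
# so a linear least-squares solve recovers them from the data.
A = np.column_stack([np.exp(-E_hops / (kB * T)),
                     np.exp(-E_ex / (kB * T))])
D0_hops_fit, D0_ex_fit = np.linalg.lstsq(A, D, rcond=None)[0]
print(D0_hops_fit, D0_ex_fit)
```

Keeping the activation energies fixed in this way ensures that the fitted curve stays consistent with the separately determined hop and exchange barriers.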
The diffusion coefficient of a Cd interstitial in CdTe was measured in Ref. 42. The value obtained there at a temperature of 800 K was 1.75×10−6 cm²/s, which is one order of magnitude smaller than the values calculated in this work (1.95×10−5, 1.80×10−5 and 1.61×10−5 cm²/s).

V. CONCLUSIONS
In this work, the diffusion processes of interstitial Pb and Cd atoms have been studied in supercells of PbTe and CdTe, respectively. Furthermore, a procedure for extracting the diffusion coefficient from the trajectories generated by neural-network-based MD simulations has been demonstrated. For the training of the neural network potentials, ab initio data calculated with the PBEsol functional were used. For both systems, the diffusion coefficients obtained with three independently trained neural network potentials were compared. The corresponding activation energies were extracted, and it was found that they differ from each other by no more than 100 meV, which can be viewed as the accuracy of the neural network approach used in this work for the calculation of activation energies.
In both PbTe and CdTe, two different mechanisms of interstitial diffusion have been observed, namely hops and exchanges, for which separate activation energies have been extracted. Hops were more frequent in CdTe and exchanges in PbTe. However, in both materials exchanges had longer effective jump lengths, and therefore interstitial diffusion is dominantly controlled by exchanges in both materials. Taking into account two diffusion mechanisms with different activation energies explains the deviation of the temperature dependence of the diffusion coefficient from the Arrhenius law.
The mechanism of atom exchange is particularly interesting in the context of the morphological transformations observed experimentally in PbTe/CdTe systems. In this work, only self-diffusion of cations within either PbTe or CdTe has been considered. However, the framework presented here can also be used to study the diffusion of foreign atoms in a crystal. One can expect that Pb cations introduced into CdTe exchange with the host Cd cations, and that Cd cations in PbTe exchange with the host Pb cations. This could lead to a subsequent rebuilding of the local crystal structure, which in turn could be an underlying mechanism for the morphological transformations observed in PbTe/CdTe growth and annealing experiments. Further studies in this direction will be necessary to clarify this question.