In operando active learning of interatomic interaction during large-scale simulations

A well-known drawback of state-of-the-art machine-learning interatomic potentials is their poor ability to extrapolate beyond the training domain. For small-scale problems with tens to hundreds of atoms this can be solved with active learning, which selects atomic configurations on which the potential attempts to extrapolate and adds them to the ab initio-computed training set. In this sense an active learning algorithm can be viewed as an on-the-fly interpolation of an ab initio model. For large-scale problems, possibly involving tens of thousands of atoms, this is not feasible because one cannot afford even a single density functional theory (DFT) computation with such a large number of atoms. This work marks a new milestone toward fully automatic, ab initio-accurate, large-scale atomistic simulations. We develop an active learning algorithm that identifies local subregions of the simulation region in which the potential extrapolates. The algorithm then constructs periodic configurations out of these local, non-periodic subregions, sufficiently small to be computable with plane-wave DFT codes, in order to obtain accurate ab initio energies. We benchmark our algorithm on the problem of screw dislocation motion in bcc tungsten and show that it reaches ab initio accuracy, down to typical magnitudes of numerical noise in DFT codes. We show that our algorithm reproduces material properties such as the core structure, Peierls barrier, and Peierls stress. This unleashes new capabilities for computational materials science toward applications which have so far been out of scope if approached solely by ab initio methods.


Introduction
It is now understood that the design of novel materials with exotic, unprecedented mechanical properties, specifically metallic alloys, requires a computer-assisted approach since many of the underlying mechanisms cannot be directly quantified by real experiments and are often not even observed. State-of-the-art computing hardware and algorithms can simulate hundreds of atoms by direct atomistic simulations with density functional theory (DFT), which allows for an accurate prediction of ideal crystalline properties, such as phase stability or elastic moduli. However, on the macroscopic scale many critical material properties, such as yield strength or fracture toughness, are determined by the microstructure and, hence, require a multiscale approach. Presently, multiscale simulations are prohibitively expensive with DFT and are therefore carried out with empirical interatomic potentials. However, existing empirical potentials can, at their best, only suggest a qualitative mechanism but remain inadequate for a quantitative understanding of multicomponent systems and chemical interactions.
A popular multiscale approach in the materials science community is the class of quantum mechanics/molecular mechanics [QM/MM, 39,60,14,12,6] methods, where only a small subset of the computational domain, usually in the vicinity of crystalline defects, is treated quantum-mechanically while the remainder is approximated by interatomic potentials. A newer class of methods are classical atomistic simulations using machine-learning interatomic potentials [MLIPs, e.g., 4] which, in comparison with empirical potentials, admit a flexible and generic functional form allowing, at least in theory, any problem to be solved with arbitrary accuracy (cf. [48]). This makes them a promising candidate for multiscale simulations since using an interatomic potential everywhere is orders of magnitude faster than retaining a quantum-mechanical model in parts of the computational domain.
Based on these premises, different MLIPs have been proposed in recent years, mainly differing in their representation. Recent review papers [38,63] compare a number of interatomic potentials, namely the neural network potential [NNP, 5], the Gaussian approximation potential [GAP, 2, 1], the spectral neighbor analysis potential [SNAP, 55] and the moment tensor potential [MTP, 48]. Differences in the potential representation mainly influence the computational cost, but all the above-mentioned MLIPs are able to produce (qualitatively) similar results. Several other representations and their variants have been proposed to date [11,46,29,57,19,49,56], including related approaches in molecular modeling [13]. Hence, MLIPs can now be considered sufficiently mature and, consequently, applications are rapidly emerging. The first notable contributions toward applications in computational metallurgy are the GAPs for bcc tungsten and iron by Szlachta et al. [53] and Dragoni et al. [17], and the Al-Mg-Si NNP by Kobayashi et al. [31]. The authors showed that these potentials can reproduce crucial properties that govern the mechanical and thermodynamical behavior, e.g., vacancy formation energies, stacking fault energies or phonon spectra.
On the other hand, a major drawback of state-of-the-art MLIPs is their poor extrapolation beyond the known training configurations. Therefore, the most time-consuming step in developing these MLIPs is the construction of the training set. It takes months, or sometimes years, to manually construct all the configurations that would be representative of those appearing in a simulation; this is done based on physical intuition and best practices (see the PhD theses [54,18]). This approach might work well for simple problems, e.g., single-component materials, in which the atomistic mechanisms of many phenomena are well understood. However, a manual, "human-developed" procedure becomes too involved to effectively address more complex scenarios. For example, the GAP for iron [18] has been shown to successfully reproduce crucial properties of screw dislocations [35] but not of edge dislocations [21].
A different approach is active learning [47]. In contrast to passive learning, where the training set is finalized before running a simulation, any configuration appearing in a simulation is allowed to become a part of the training set. In a nutshell, the idea is to estimate the magnitude of the error of the MLIP, its extrapolation grade, during the simulation and, should it exceed some threshold, recompute energies and forces (and stresses, if necessary) with the underlying ab initio model and retrain the potential including the newly acquired data. In the field of atomistic modeling, active learning was initially proposed in the context of QM/MM methods by De Vita and Car [15] and Csányi et al. [14], to learn force fields of recurring atomic neighborhoods on the fly during the simulation, and continues to develop to date [34,7,29]. For MLIPs, an active learning method has been proposed by Behler [3] using a query-by-committee strategy (cf. [47]). The query-by-committee algorithm has also been used in other neural-network potentials [62,50]. Bayesian predictive variance has likewise been used in Gaussian process-based potentials [29,57]. In this work we use the approach based on D-optimality proposed by Podryabinkin and Shapeev [42], who estimate the extrapolation grade by means of the growth in the determinant of the system matrix associated with the loss function. Their method has been successfully applied to predict molecular properties [25], diffusion processes in metals [37] and phase stability of (random) alloys [24,43], along with a reduction of the necessary amount of DFT calculations by several orders of magnitude. However, despite these recent successes, applying active learning to larger simulations with thousands of atoms or more requires a novel approach since DFT codes cannot efficiently handle that many atoms.
Such "larger simulations" are necessary to solve problems which involve extended defects with important properties affecting the material microstructure, e.g., dislocations, cracks or grain boundaries; typical properties are their mobility, possibly altered by interactions of the defect with impurities (or other types of defects), and their long-range behavior. In particular, the number of atoms rapidly exceeds the tens of thousands when considering intrinsically three-dimensional extended defects, such as kinked or jogged dislocations.
Building on [42], the main aim of the present work is thus to develop an active learning algorithm for large-scale problems containing extended defects. The idea we pursue is as follows: from the large-scale computational domain we extract local configurations of 100-200 atoms, small enough to be feasible to compute with DFT, and measure the extrapolation grade therein. Should the extrapolation grade in one of them become too large, we automatically convert it to a periodic configuration maintaining the atomic neighborhoods from the large-scale problem, run a DFT calculation on this configuration and subsequently extend the training set.
While the algorithm described in the previous paragraph appears similar to "learning-on-the-fly QM/MM", it is novel in our understanding and does not share some of the drawbacks of QM/MM, as will be made clear in the following. The distinguishing feature of our algorithm is the use of an MLIP in the entire computational domain, as opposed to the direct coupling between DFT and interatomic potentials in QM/MM, which allows us to compute energetic quantities directly, such as energy/activation barriers or defect formation energies. In QM/MM, energetic quantities are usually computed using indirect methods (e.g., [52]) since the quantum-mechanical energy is a global quantity and cannot be computed for clusters of atoms. Nevertheless, it is still necessary that the periodic training configurations are not "too" artificial to avoid degrading the accuracy of the MLIP. To that end, we will present a new method to construct such periodic training configurations by symmetrizing the atomic positions so that atoms at the domain boundaries enjoy neighborhoods close to those in the large-scale problem.
We test our methodology by predicting properties characterizing screw dislocation motion in bcc tungsten, i.e., we perform simulations to calculate the core structure, the Peierls barrier and the Peierls stress. Thereby, the focus is on demonstrating that our training procedure ensures that the MLIP is able to (i) learn the right training configurations during these simulations and (ii) reliably predict energy differences and forces.

Summary of the active learning algorithm
Suppose we are running an atomistic simulation and {r_i} is our configuration of atoms. The total energy of {r_i} is, in the case of most interatomic potentials, including machine-learning potentials, given by a sum over contributions of per-atom energies as functions of atomic neighborhoods,

E({r_i}) = Σ_i V({r_ij}).   (1)

In this work we model the per-atom energies using moment tensor potentials [MTP, 48] such that

V({r_ij}) = Σ_α θ_α B_α({r_ij}),   (2)

where the θ_α's are the fitting parameters and the B_α({r_ij}) are the basis functions. For further details on the potential representation the reader is referred to the "Methods" section. In order to judge the accuracy of the MTP, there exist a number of active learning algorithms [3,62,50,29,57] that are able to automatically assess whether {r_i} is well-represented in the training set. If not, one could add {r_i} to the training set after computing its DFT energy and gradients. However, if {r_i} consists of 1000 or more atoms then this strategy is not feasible: DFT is very time-consuming and scales cubically with the total number of atoms.
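To make this per-atom decomposition concrete, the total energy of such a linearly parameterized potential can be assembled as follows (a minimal numpy sketch; the descriptor values are random placeholders standing in for the actual moment tensor basis functions):

```python
import numpy as np

rng = np.random.default_rng(0)

n_atoms, m = 5, 8            # atoms in the configuration, number of basis functions
theta = rng.normal(size=m)   # fitting parameters theta_alpha

# B[i, a] = B_alpha({r_ij}) evaluated on the neighborhood of atom i
# (random placeholders here; in an MTP these come from the moment tensors)
B = rng.normal(size=(n_atoms, m))

per_atom_energies = B @ theta            # V({r_ij}) = sum_alpha theta_alpha * B_alpha({r_ij})
total_energy = per_atom_energies.sum()   # E({r_i}) = sum_i V({r_ij})
```

The key point exploited later is that, for fixed neighborhoods, the energy is linear in the parameters theta, so fitting reduces to a (regularized) linear least-squares problem.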
In this work we develop an algorithm to learn interatomic interactions, as given by DFT, by automatically extracting small, representative configurations of 100-200 atoms with active learning and composing our training set from these small configurations. Our algorithm has two components: (i) detection of extrapolative neighborhoods, (ii) completion of such neighborhoods to small periodic configurations efficiently computable with plane-wave DFT codes.
Our algorithm for extracting neighborhoods is based on the D-optimality criterion developed for MTPs in [42,24]. In particular, we adapt the algorithm for detecting per-neighborhood extrapolation. In such an algorithm we determine the extrapolation grade based on how far the descriptors, B_α({r_ij}), are from those of the training set. To be more precise, we calculate the extrapolation grade for nonlinearly parameterized potentials; this calculation is a natural generalization of the linear case (the one with basis functions B_α({r_ij})) developed in [24] and outlined in Supplementary Section 1. This grade allows us to formulate a practically universal criterion by using a threshold for permissible extrapolation.

Figure 1: Schematic of the active learning algorithm. [S1] The simulation runs until active learning detects extrapolation in {r*_i} containing the dislocation core atoms. [S2] The extrapolative configuration {r*_i} is extracted from the large-scale problem. [S3] Construction of a periodic configuration which captures the extrapolative neighborhoods of {r*_i} but removes the artificial ones at the boundary in order to accurately compute the dislocation core energy. [S4] The potential is retrained on the updated training set. The coloring of the atoms is according to the centrosymmetry parameter [CSP, 30] in order to visualize their deviation from the bulk crystal configuration.
To describe the basic structure of the algorithm, we now assume that {r_i} is the configuration shown in Figure 1 containing a screw dislocation whose atoms in the rectangular region around its core are denoted by {r*_i}. Suppose that the MTP has already been trained sufficiently well on the ideal crystal structure such that extrapolation will only occur in {r*_i}. In the first step [S1] we run the simulation until active learning computes an extrapolation grade in {r*_i} which exceeds some threshold. This will be the case when the dislocation has moved to a new position whose core structure is not yet represented in the training set. In the second step [S2] we then extract this extrapolative {r*_i} from the large-scale configuration. However, running a DFT calculation on {r*_i} after naively applying periodic boundary conditions would critically degrade the accuracy of the MTP by training it on many artificial neighborhoods that occur at the boundary (note the coloring of the atoms in Figure 1). Therefore, in the third step [S3], we construct a periodic configuration maintaining the extrapolative neighborhoods in the vicinity of the dislocation core but having the boundary atoms close to those in {r_i}, run a DFT calculation on it and add it to the training set. As can be seen from the figure, this is a more involved endeavor since the number of atoms as well as the shape of this periodic configuration differ from {r*_i}; we describe this procedure in more detail in the "Methods" section. In the last step [S4] we then retrain the potential and subsequently restart the simulation.
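The four steps [S1]-[S4] can be summarized as the following schematic loop (pure pseudocode; all names are placeholders, not an actual API, and the real implementation depends on the simulation driver and the DFT code):

```
# pseudocode; gamma_threshold is the permissible extrapolation grade
while simulation not converged:
    advance the simulation by one step using the current MTP
    locate the core region {r*_i} of 100-200 atoms          # cf. Figure 1
    if extrapolation_grade({r*_i}) > gamma_threshold:       # [S1]
        extract {r*_i} from the large-scale configuration   # [S2]
        build a periodic configuration preserving the
        extrapolative core neighborhoods                    # [S3]
        compute DFT energy, forces and stresses on it
        add it to the training set and retrain the MTP      # [S4]
        continue (or restart) the simulation
```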

Details of the problem
We now test our active learning algorithm for 1/2⟨111⟩ screw dislocation motion in bcc tungsten in order to predict the core structure, the Peierls barrier and the Peierls stress. Our large-scale problem is a cylindrical configuration of atoms, and we choose the region where we check extrapolation of the MTP as the collection of atoms in the rectangular region with dimensions l_x = 5√3 a_0/√2 and l_y = 9 a_0/√2, where a_0 denotes the lattice constant, centered at the dislocation core (cf. Figure 1). From this extrapolation region we construct the training set, as proposed in the previous section, leading to a total number of 135 atoms in the periodic configurations (i.e., in {r_i^tr}, cf. Figure 7 (4)). The size of the cylindrical configuration is ≈ 5700 atoms for both the core relaxation and the Peierls stress calculation; for the barrier calculations the size is ≈ 1400 atoms. We also compare the results obtained using the cylindrical configuration to those obtained using the classical 135-atom dipole configuration (e.g., [58]). In the following we therefore refer to them as the cylinder and the periodic problem, respectively.
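The selection of the extrapolation region amounts to a simple geometric filter around the current core position (a minimal numpy sketch; the lattice constant and the toy positions are illustrative, and the z coordinate is ignored since the cell is periodic along the dislocation line):

```python
import numpy as np

A0 = 3.165  # approximate bcc W lattice constant in Angstrom (illustrative)
LX = 5 * np.sqrt(3) * A0 / np.sqrt(2)  # l_x of the extrapolation rectangle
LY = 9 * A0 / np.sqrt(2)               # l_y of the extrapolation rectangle

def extrapolation_region(positions, core_xy):
    """Indices of atoms inside the l_x-by-l_y rectangle centered at the
    current dislocation core position (x and y only)."""
    dx = np.abs(positions[:, 0] - core_xy[0])
    dy = np.abs(positions[:, 1] - core_xy[1])
    return np.where((dx <= LX / 2) & (dy <= LY / 2))[0]

# toy usage: three atoms, core at the origin; the second atom lies far outside
pos = np.array([[0.0, 0.0, 0.0], [100.0, 0.0, 0.0], [1.0, 1.0, 0.0]])
idx = extrapolation_region(pos, core_xy=(0.0, 0.0))
```

When the dislocation glides (as in the Peierls stress calculation below), `core_xy` is updated so that the rectangle follows the core.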
For validation purposes, we have also trained the MTP with respect to two empirical interatomic potentials based on the embedded atom method (EAM), namely the EAM3 and EAM4 potentials developed in [36]. For clarity, we therefore refer to the MTP trained with respect to DFT as MTP-DFT and the MTPs trained with respect to the EAM potentials as MTP-EAM3 and MTP-EAM4, respectively. For compactness, we only present a detailed discussion of the results for the large-scale cylinder problem using the MTP-DFT, which is our main contribution, and refer to the results of the validation (such as training errors etc.) for the MTP-EAM3 and the MTP-EAM4, and also the MTP-DFT for the periodic problems, to Supplementary Section 3.

Core relaxation
Starting from an initial configuration subject to the linear elastic solution of a screw dislocation, the active learning algorithm converged after 63 iterations with a training set size of 40 configurations. The training errors reported in Table 1 are in the range of the numerical precision of DFT codes themselves. Therefore, the MTP-DFT essentially reproduces the training data without introducing further errors.
The differential displacement map for the relaxed core structure computed with the MTP-DFT is shown in Figure 2. It can immediately be seen that our algorithm predicts the compact core structure from previously established DFT studies (cf., e.g., [45,58]). To quantify this result, we show the energy difference between the initial linear elastic and the final relaxed configuration in Table 2. This is of course not feasible to compute with a reference DFT calculation, so we compare our result to the 135-atom periodic problem. The energy difference is ≈ 2× larger for the periodic problem, which is expected since there are two dislocations in this system. In addition, we have computed the energy difference for the screw-cylinder configuration using the EAM4 potential as a reference model, for which we can compute the exact solution.

Table 1: Training errors corresponding to the core relaxation for the MTP-DFT; the errors are of the same order as the numerical precision of our (already conservative) DFT calculation and, consequently, the MTP-DFT is able to reproduce the DFT model practically without error.

Table 2: Energy differences (in eV per Burgers vector) between the initial and final (relaxed) configurations. The MTP-DFT exactly reproduces the DFT value for the periodic problem, while the small discrepancy between the MTP-EAM4 and the EAM4 is likely due to the nonsmooth potential energy surface (cf. Figure 3). This discrepancy is the same for both the periodic and the cylinder problem and, moreover, the fitting errors are in the range of DFT precision (cf. Table 1).

Figure 3: Embedding function F(ρ) and density function ρ(|r_ij|) of the EAM potential; the ranges of the function arguments are adapted to the atomic neighborhoods occurring during the simulation.
Here, the error of the MTP-EAM4 is ≈ 2% which confirms the excellent agreement between both models and provides further evidence that our algorithm reliably extracts the core energy; note that an error of 1-2 % also occurs for the periodic problem (compare the EAM4 and MTP-EAM4 values in Table 2).
It is interesting to note that it was "easier" for the MTP to reproduce DFT than EAM4: the difference between MTP and DFT is less than 1 meV per Burgers vector, while the difference between MTP and EAM4 is of the order of 10 meV per Burgers vector. Speculatively, we attribute this to the fact that the MTP has a flexible functional form which makes it easy to reproduce the smooth underlying DFT energy (at least when the configurational space is not too large), whereas fitting an EAM potential introduces an oscillatory behavior, commensurate with atomic shell radii, in the pair potential function, as shown in Figure 3. Such nonsmooth behavior appears to be more difficult to fit with a smooth MTP radial basis than the DFT energy.

Peierls barrier
Using the relaxed configuration from the previous section, we now compute the Peierls barrier for screw dislocation motion under zero stress by calculating the minimum energy path with the nudged elastic band [NEB, 26] method. We create the final image by translating the displacement field of the relaxed core structure by 2/3 a_0 along the ⟨112⟩ direction, i.e., to the next easy-core position on the {110} plane. To create the initial guess for the intermediate images, we scale the same displacements according to each image's reaction coordinate.
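The construction of the initial band can be sketched as a simple linear interpolation in the displacement field (a minimal numpy sketch; `x_start` and `x_end` stand for the atomic positions of the relaxed initial and translated final images):

```python
import numpy as np

def neb_initial_guess(x_start, x_end, n_images):
    """Initial NEB band: each intermediate image is obtained by scaling the
    displacement between the first and last image with its reaction
    coordinate s = k / (n_images - 1)."""
    images = []
    for k in range(n_images):
        s = k / (n_images - 1)                 # reaction coordinate of image k
        images.append(x_start + s * (x_end - x_start))
    return images

# toy example: a single atom displaced by (1, 0, 0) over 5 images
band = neb_initial_guess(np.zeros(3), np.array([1.0, 0.0, 0.0]), 5)
```

The NEB relaxation then drives these images toward the minimum energy path while active learning monitors each image's core region.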
Using an NEB with 5 images, the active learning algorithm converged after 38 iterations with a training set size of 22 configurations. Therefore, our algorithm only required a full DFT calculation on the 135-atom training configuration roughly every 9th iteration. The training errors in Table 3 are slightly worse than for the core relaxation, which is expected since the training set is now more diverse due to the intermediate core configurations. However, a maximum error in the total energy of 5.6 meV (per configuration) is still a very small quantity since the Peierls barrier is more than an order of magnitude higher (cf. Table 4). The corresponding energy curve is shown in Figure 4 (a). Again, since a reference DFT calculation is unfeasible, we have compared our result to the one obtained from a periodic problem calculated using solely DFT. According to the convergence study by Grigorev et al. [23], the difference between a periodic problem with 135 atoms and an effectively infinite configuration should be in the range of only a few percent. The difference we observe in Table 4 between the cylinder and the periodic problem lies in this range and is thus in agreement with their analysis. For testing purposes, we have also carried out calculations with the EAM4 potential as our reference model, as in the previous section. Here, the energy difference between the periodic and the cylinder problem is ≈ 6%. Moreover, the Peierls energies for the EAM4 and the MTP-EAM4 coincide almost perfectly, which further validates our training methodology.

Table 4: Peierls energies in eV per Burgers vector corresponding to the energy curves in Figure 4 (a); the small difference of ≈ 1 meV per Burgers vector and less between the available MTP-DFT/MTP-EAM4 and DFT/EAM4 values coincides with the training errors (see Supplementary Section 3.2) and is thus also expected between the MTP-DFT and DFT for the cylinder problem (cf. Table 3).
In addition, the differential displacement map for the relaxed central image is shown in Figure 4 (b), which demonstrates that the MTP-DFT also correctly predicts the split-core configuration observed in recent DFT studies.

Peierls stress
Finally, we test our algorithm for the screw-cylinder problem subject to a far-field applied stress in order to predict the Peierls stress, that is, the stress under which the dislocation starts moving. To this end, we apply a deformation corresponding to the shear stress τ yz to all atoms and subsequently relax the system. During relaxation, we detect the position of the dislocation core and update the extrapolation region accordingly as shown in Figure 1.
We have applied an initial stress of ≈ 1.74 GPa and increased it in steps of 0.1 GPa upon convergence. The dislocation then moved to the next easy-core position when the stress was 2.34 GPa. Hence, we define the obtained Peierls stress as 2.29 ± 0.05 GPa. This value is in the same range as previous DFT studies, as shown in Table 5. Nevertheless, we would like to remind the reader that these previous studies use indirect methods; e.g., the method in [16] is based on fitting a Kocks-type function to the Peierls barriers from several periodic calculations subject to different applied stresses. They are therefore computationally significantly more expensive, require prior knowledge of the slip plane and are also prone to errors due to the analytic fit.
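The ramping protocol can be sketched as follows (a toy sketch: the expensive "relax and check whether the core has moved" step is replaced by a simple threshold predicate `moves_at`, which is purely illustrative; the bracketing logic is what matters):

```python
def ramp_to_peierls_stress(moves_at, tau0, dtau):
    """Increase the applied shear stress in steps of dtau (GPa) until the
    dislocation glides; the Peierls stress is then bracketed between the
    last two stress values, so we report the midpoint +/- half a step."""
    tau = tau0
    while not moves_at(tau):
        tau += dtau
    return tau - dtau / 2, dtau / 2  # estimate and its uncertainty

# toy criterion: assume the core starts moving once tau reaches 2.30 GPa
estimate, error = ramp_to_peierls_stress(lambda tau: tau >= 2.30, 1.74, 0.10)
```

With these illustrative numbers the protocol brackets the glide event between 2.24 and 2.34 GPa and reports 2.29 ± 0.05 GPa, mirroring the resolution of the stepping scheme described above.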
Table 5: Peierls stress values obtained in this work compared with previous DFT studies (e.g., [44]).

In order to judge the accuracy of our method, we compute the Peierls stress with the EAM3 potential as the reference model, which allows us to perform an exact large-scale calculation. Following the same procedure as described above, except that we now increment the applied stress by 0.01 GPa, the predicted Peierls stress for the MTP-EAM3 agrees well with our large-scale reference calculation, showing an error of only 6-8% (cf. Table 5). Moreover, this corresponds to the relative fitting errors for the MTP-EAM3 shown in Supplementary Table 9. Given that the training errors for the MTP-EAM3 and the MTP-DFT (Table 6) are in the same range, it is likely that the DFT Peierls stress can also be predicted with the MTP-DFT up to an error of 6-8%. Should we need higher accuracy, we may simply increase the number of basis functions of the MTP, but note as well that errors up to 10% are perfectly tolerable when the Peierls stress is used to calibrate a higher-scale model, such as discrete dislocation dynamics.
To validate the correct functioning and demonstrate the capabilities of the active learning algorithm, we show the evolution of the dislocation position (with respect to the origin) and the training set size as a function of the iteration in Figure 5 (a). Having crossed the first Peierls barrier after ≈ 550 iterations, the training set contained 20 configurations, increased to 22 while crossing the second Peierls barrier, and the dislocation then continued gliding without triggering new DFT calculations. This behavior is expected since the dislocation moves by alternating from (1) easy-core → (2) split-core → (3) easy-core → ... etc., as shown in Figure 5 (b), and therefore does not encounter any significantly new, i.e., extrapolative, configurations.
Figure 5: (a) Dislocation position in units of dp = 2/3 a_0, where dp is the distance between two easy-core positions, and training set size as functions of the iteration for the Peierls stress calculation using the MTP-DFT; the figure demonstrates that the active learning algorithm works properly, showing that no further configurations are added to the training set after the dislocation has crossed the first Peierls valley after ≈ 700 iterations (and henceforth only encounters "known" neighborhoods). (b) Visualization of the dislocation core structure using a differential displacement map at selected iterations from (a), i.e., (1) before, (2) during and (3) after crossing the first Peierls barrier; thereby, the dislocation core structure correctly changes from compact to split, and back again to compact, without admitting any other intermediate configuration (which also explains the small training set size of 22 configurations shown in (a)).

Summary
We have developed an active learning algorithm for large-scale atomistic simulations using moment tensor potentials (MTPs), a class of machine-learning interatomic potentials (MLIPs). Our algorithm allows for a novel way of performing large-scale simulations with DFT accuracy fully automatically by letting active learning find the right training data during the simulations, without the need for any prior (passive) training. We have applied the algorithm to model screw dislocation motion in bcc tungsten and shown that the MTP is able to reproduce known mechanically relevant properties from the literature, such as the core structure, transition state barriers and Peierls stress, along with a significant reduction in the necessary amount of DFT calculations compared to standard methods.
More specifically, we have shown that active learning is able to reliably detect a minimum set of sufficiently distinct training configurations. This is in particular demonstrated for the Peierls stress calculation, where the dislocation moves through the effectively infinite crystal: in the beginning, active learning requests new training data during the change of the core structure but stops when the dislocation repetitively encounters the same configurations after crossing the first Peierls valley. Moreover, using our symmetrization strategy to convert extrapolative regions into periodic configurations, the MTP gave accurate predictions for forces as well as energy differences.
Our algorithm can in principle be applied by a user in the same way as a standard large-scale atomistic simulation. Hence, no additional expert knowledge is required, as it would be for extracting physical quantities such as the Peierls stress with indirect DFT methods. This opens new prospects for using MLIPs in general as a robust tool for DFT-accurate simulations of extended defects.

Outlook on potential applications
Our way of computing the Peierls stress further shows that we can include far-field applied stresses which influence the core structure of the dislocation and its elastic field. We therefore expect that our algorithm is directly applicable to related problems involving interactions between dislocations and other types of defects, e.g., vacancies or other dislocations.
While we have assumed that nonlinearities are confined to the small extrapolation region(s), we have demonstrated that active learning significantly reduces the amount of actual DFT calculations. If the size of the defect core increases to several hundreds of atoms, e.g., in the case of dissociated dislocations in fcc lattices, the saved computing capacity may thus be invested in a few larger calculations to compute energies, combined with many small cluster calculations to compute forces.
Further, we think that our methodology will also be useful for inherently three-dimensional problems, e.g., curved dislocations. For this class of problems, we currently envision two possible approaches. The first possibility is to train an MLIP on several selected straight dislocations with different character angles; the idea is then to switch off active learning and use the MLIP as a standard interatomic potential. The second possibility is to proceed as we do in the present work, i.e., we carve out a collection of atoms in the vicinity of the dislocation core and construct a periodic configuration. This procedure is illustrated in Figure 6 for a three-dimensional dislocation in a bcc lattice moving by means of the kink-pair nucleation mechanism (see, e.g., [27]). Both directions will be explored in future work.

It is finally worthwhile to emphasize that our methodology does not exclude multi-component materials involving alloying elements and chemical impurities, e.g., hydrogen or helium, as most MLIPs already include the necessary descriptors (e.g., [5,24]). The discovery of new materials mainly comes through the search for novel alloys, and we think that investigating the corresponding mechanisms using MLIPs in combination with active learning will be one of the most auspicious potential applications since the configurational space will likely be too vast to be explored with only passively selected training sets.

Moment tensor potentials
Potential representation. Following [48], the basis functions B_α from (2) are constructed as all contractions to a scalar of products of the moment tensors

M_{µ,ν}({r_ij}) = Σ_j f_µ(|r_ij|) r_ij ⊗ ⋯ ⊗ r_ij (ν-fold tensor product),

which serve as the moment tensor descriptors. Here, the f_µ's are radial basis functions which vanish for |r_ij| > r_cut, where r_cut is a predefined cut-off radius. We characterize the degree of an MTP by its level. That is, an MTP of level d comprises all basis functions B_α built from moments M_{µ_1,ν_1}, M_{µ_2,ν_2}, ... which satisfy (2µ_1 + ν_1) + (2µ_2 + ν_2) + ... ≤ d. For all numerical examples conducted in this work, we have used an MTP of level d = 16 with a cut-off radius r_cut = 5 Å.
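As an illustration of the level constraint, the admissible combinations of indices (µ_k, ν_k) can be enumerated programmatically (a sketch that counts index tuples only; the number of actual basis functions also depends on how many distinct scalar contractions each tuple admits):

```python
def admissible_tuples(d, max_factors=3):
    """Enumerate tuples ((mu_1, nu_1), ..., (mu_k, nu_k)), k >= 1, with
    (2*mu_1 + nu_1) + ... + (2*mu_k + nu_k) <= d, as in the level
    definition above; tuples are non-decreasing to avoid duplicates."""
    single = [(mu, nu) for mu in range(d // 2 + 1)
              for nu in range(d + 1) if 2 * mu + nu <= d]
    results = set()

    def extend(prefix, budget, start):
        if prefix:
            results.add(tuple(prefix))
        if len(prefix) == max_factors:
            return
        for idx in range(start, len(single)):
            mu, nu = single[idx]
            cost = 2 * mu + nu
            if cost <= budget:
                extend(prefix + [(mu, nu)], budget - cost, idx)

    extend([], d, 0)
    return results

tuples_d2 = admissible_tuples(2, max_factors=2)  # small example: level d = 2
```

Raising the level d (here 16) rapidly grows this set, which is how the flexibility of the MTP is traded against its computational cost.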
Training procedure. Since we perform structural relaxation and compute energy barriers, we need to fit the MTP with respect to energies and forces. Therefore, we compute the parameters θ by minimizing the loss functional

L(θ) = Σ_{{r_i^tr} ∈ T} ( C_e |E({r_i^tr}; θ) − E^qm({r_i^tr})|² + C_f Σ_i |f_{r_i}(θ) − f_{r_i}^qm|² ),

where f_{r_i} and f_{r_i}^qm are the MTP and, respectively, the quantum-mechanical forces on an atom r_i of a configuration {r_i^tr} in the training set T. The constants C_e and C_f are regularization (weighting) parameters which we keep fixed throughout this work.
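For a linearly parameterized potential this loss can be written down directly. Below is a minimal numpy sketch (schematic: in a real MTP the force on an atom collects contributions from all neighborhoods containing it, and the descriptor gradients `dB` come from the potential implementation; the weights C_e, C_f are illustrative):

```python
import numpy as np

def loss(theta, training_set, C_e=1.0, C_f=0.01):
    """Weighted sum of squared energy and force errors over the training
    set T. Each entry holds (B, dB, E_qm, F_qm), where B[i, a] are the
    descriptors of atom i and dB[i, a, :] the gradient of B_a with
    respect to the position of atom i (schematic)."""
    total = 0.0
    for B, dB, E_qm, F_qm in training_set:
        E = (B @ theta).sum()                    # model energy of the configuration
        F = -np.einsum('iak,a->ik', dB, theta)   # model forces (minus the gradient)
        total += C_e * (E - E_qm) ** 2 + C_f * np.sum((F - F_qm) ** 2)
    return total

# toy check: a parameter vector that reproduces its data exactly gives zero loss
B = np.ones((2, 2)); dB = np.zeros((2, 2, 3))
ts = [(B, dB, 4.0, np.zeros((2, 3)))]
zero_loss = loss(np.array([1.0, 1.0]), ts)
```

Because the model is linear in θ, the minimizer of this loss is the solution of a linear least-squares problem; the associated system matrix is the object on which the D-optimality criterion below operates.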

Active learning algorithm
The first component of our active learning algorithm is the detection of extrapolative neighborhoods. To that end, given a neighborhood $\{r^*_{ij}\}$ of the $i$-th atom in a configuration $\{r_i\}$, we compute the extrapolation grade $\gamma = \gamma(\{r^*_{ij}\})$ using the neighborhood selection criterion from [42] based on D-optimality [47]. That is, we imagine the existence of per-atom energies $E^{qm}$ and fit the parameters $\theta$ to them:

$$\sum_\alpha \theta_\alpha B_\alpha(\{r_{ij}\}) = E^{qm}(\{r_{ij}\}) \quad \text{for every neighborhood } \{r_{ij}\} \text{ in the training set.} \tag{7}$$

Since (7) would be overdetermined, we select an $m \times m$ submatrix $A$ of the system matrix associated with (7). The $m$ rows of $A$, each representing a neighborhood $\{r_{ij}\}$ from the training set, are chosen to maximize $|\det A|$ using the maxvol algorithm [22]. The extrapolation grade is then computed by solving the linear system (cf. [42], Section 3.3)

$$c^{\mathsf T} A = \big(B_1(\{r^*_{ij}\}), \ldots, B_m(\{r^*_{ij}\})\big), \qquad \gamma(\{r^*_{ij}\}) = \max_{1 \le j \le m} |c_j|.$$

If our atomistic configuration $\{r_i\}$ contained only ≈ 100-200 atoms, our job would be done: we would add this configuration to the training set together with the DFT-computed energy, forces, and stresses and continue (or restart) the simulation. This is, however, not feasible for larger systems because collecting DFT data scales cubically with the system size. Nevertheless, if the extrapolative neighborhoods $\{r^*_{ij}\}$ appear in subsets $\{r^*_i\} \subseteq \{r_i\}$ of 100-200 atoms which are sufficiently far from each other, we can construct several training configurations which, together, capture all extrapolative neighborhoods. This approach is justified if the boundary atoms of a configuration $\{r^*_i\}$ have neighborhoods close to those they would have in the corresponding training configuration obtained by periodically replicating $\{r^*_i\}$. This is of course not the case if $\{r^*_i\}$ contains extended defects, such as dislocations, as illustrated in Figure 1.
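Numerically, the grade is a single solve against the selected submatrix. The sketch below assumes the maxvol-selected matrix $A$ and the basis-value vector $b$ of the query neighborhood are already assembled; a grade of at most 1 means the neighborhood lies inside the convex span of the selected training rows, i.e., the model interpolates.

```python
import numpy as np

def extrapolation_grade(A, b):
    """gamma = max_j |c_j| where c solves c @ A = b.
    A: (m, m) maxvol-selected basis matrix; b: (m,) basis values
    of the query neighborhood."""
    c = np.linalg.solve(A.T, b)   # c @ A = b  <=>  A.T @ c = b
    return float(np.max(np.abs(c)))
```

If $b$ coincides with a row of $A$, the solution $c$ is a unit vector and $\gamma = 1$ exactly; values above the chosen threshold flag the neighborhood as extrapolative.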
Our novel contribution here is the extension of active learning to precisely such cases by combining the extrapolation detection with a second component, the construction of periodic training configurations which capture the extrapolative neighborhoods {r * ij } appearing in {r * i }. The details on how this is done are given in the next section; first we describe the full active learning algorithm.
During an atomistic simulation we may frequently encounter extrapolative configurations. Instead of retraining the potential immediately after detecting any extrapolation, we propose a greedy strategy that runs the simulation for a certain number of iterations and subsequently selects a minimal set of configurations which must be added to the training set in order to ensure that γ remains below some threshold. In this way we admit training configurations lying further outside the current training domain, which potentially interpolate configurations that an iteration-wise scheme would otherwise have selected. The individual steps of the full algorithm, which is schematically depicted in Figure 1, are described below:
[S0] Define the atomistic configuration {r i } and select the subset(s) {r * i } ⊆ {r i } in which to compute the extrapolation grades of the atomic neighborhoods.
[S1] Run the atomistic simulation for N iterations. For all n = 1, ..., N , compute the highest extrapolation grade γ n in {r * i } n after every iteration n.
[S2] Stop the simulation after the N iterations-or earlier if γ n exceeds a chosen threshold γ max -and add all configurations {r * i } n whose extrapolation grade is larger than γ min to the set of training candidates.
[S3] Select a minimal subset of the training candidates using the maxvol algorithm, construct the corresponding periodic training configurations (see the following section), and compute their energies, forces, and stresses with DFT.
[S4] Retrain the MTP on the extended training set.
The steps [S1]-[S4] are then repeated until convergence or until the maximum number of iterations is attained. For the dislocation core relaxation we set the bounds γ min and γ max , within which we consider configurations as potential candidates for the training set, to 1.1 and 10, respectively. This choice is close to the optimum (cf. [42]) and should therefore yield the best accuracy achievable with an MTP in practice. For the Peierls barrier and Peierls stress calculations we use γ min = 2 and γ max = 10. The parameter N , that is, the number of iterations we perform before we retrain the MTPs, is of the order of 30 for all our simulations.
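One cycle of the scheme above can be sketched as follows. All callables (`step`, `grade`, `select`, `make_periodic`, `retrain`) are placeholders standing in for the simulation driver, the extrapolation-grade evaluation, the maxvol-based candidate selection, the periodic-cell construction, and the refit, respectively; this is a schematic skeleton, not the implementation used in the paper.

```python
def active_learning_cycle(step, grade, select, make_periodic, retrain,
                          N=30, gamma_min=2.0, gamma_max=10.0):
    """One [S1]-[S4] cycle; returns True if the MTP was retrained."""
    candidates = []
    for _ in range(N):                 # [S1] run N simulation iterations
        cfg = step()
        g = grade(cfg)                 # highest grade in {r*_i}
        if g > gamma_min:
            candidates.append(cfg)     # [S2] collect training candidates
        if g > gamma_max:
            break                      # stop early: grade out of bounds
    if not candidates:
        return False                   # no extrapolation: keep simulating
    training = [make_periodic(c) for c in select(candidates)]   # [S3]
    retrain(training)                  # [S4] refit on DFT-labelled data
    return True
```

The outer driver simply repeats this cycle until it returns `False` (convergence) or an iteration budget is exhausted.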

Construction of periodic training configurations
We now turn to the question of how to construct an appropriate training set from configurations {r * i } in which the extrapolation grade exceeds a certain threshold. This problem is similar to hybrid quantum/classical mechanics (QM/MM) methods, where a small, but arbitrarily-shaped, cluster of atoms, treated with an ab initio model, is embedded in a much larger domain computed with interatomic potentials. To treat such clusters of atoms with plane-wave DFT, the common practice in QM/MM is to use a combination of buffer and vacuum regions to separate the QM cluster from its periodic images [60,59,61,6,21]. In principle, this idea also applies to our algorithm; however, it does not allow one to compute the quantum-mechanical energy of the cluster, which is a global quantity corresponding to the entire periodic cell. Fortunately, in the case of MLIPs, we do not stringently require exact total energies-it suffices that the MLIP is able to interpolate on the large-scale configurational space. Nevertheless, it is still necessary to prevent free surfaces and other types of spurious boundary effects in order to avoid training on artificial neighborhoods irrelevant to the large-scale problem. We therefore present a new method which properly symmetrizes the atomic displacements at the periodic domain boundaries to ensure that the neighborhood of the atoms outside the dislocation core is close to the one they would "feel" in the large-scale problem.
The idea is to extract the displacements ũ i (r 0,i ) = r i − r 0,i , where r 0,i is the ideal lattice site of the i-th atom, around the dislocation core and apply them to a periodic configuration, small enough to be computable by DFT but without introducing atomic neighborhoods different from those in the large-scale configuration. To that end, we proceed as illustrated in Figure 7 (a)-(1) and define a rectangular region of size l x × l y containing the dislocation core atoms {r * i } which we check for extrapolation. Next, we define three replicas of this rectangular region and translate them, according to Figure 7 (a)-(2), by l x along the x-axis and/or l y along the y-axis.
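Steps (1) and (2) amount to a region crop plus three lattice translations. The sketch below assumes simple `(n_atoms, 3)` arrays for positions and ideal lattice sites; names and layouts are ours, for illustration only.

```python
import numpy as np

def extract_region(r, r0, lx, ly, center=(0.0, 0.0)):
    """Step (1): displacements u = r - r0 of atoms whose ideal sites
    fall inside an lx-by-ly rectangle around `center`."""
    u = r - r0
    inside = (np.abs(r0[:, 0] - center[0]) <= lx / 2) & \
             (np.abs(r0[:, 1] - center[1]) <= ly / 2)
    return r0[inside], u[inside]

def replicas(r0_in, lx, ly):
    """Step (2): ideal sites of the three translated copies of the
    region, shifted by lx and/or ly in the x-y plane."""
    shifts = [(lx, 0.0), (0.0, ly), (lx, ly)]
    return [r0_in + np.array([sx, sy, 0.0]) for sx, sy in shifts]
```

The mirrored displacement field described next is then imposed on these replica sites to close the periodic cell.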
Assuming that the dislocation line is parallel to the z-axis, we then define the mirrored (point-reflected) displacement field on the replicas, where {r 1 0,i }-{r 3 0,i } are the ideal lattice sites in the translated regions. Imposing this displacement gives the configurations {r 1 i }-{r 3 i } shown in Figure 7 (a)-(2). The union of {r * i } and {r 1 i }-{r 3 i } is clearly periodic, and the figure shows that this procedure creates minimal spurious effects in the sense that the atomic neighborhoods are not "too far" from those of the large-scale problem: the centrosymmetry parameter is less than 0.04 in the vicinity of the periodic domain boundaries, which is more than two orders of magnitude smaller than for the core atoms. We further remark that this configuration appears similar to the well-known quadrupolar cell (cf., e.g., [8,45]); however, we emphasize that our configuration fully includes the elastic far-field contribution due to the external boundary conditions intrinsic to the large-scale problem.
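The centrosymmetry parameter quoted above (in the spirit of Kelchner et al.) measures how far an atomic environment deviates from inversion symmetry: a perfect centrosymmetric lattice gives 0, a distorted boundary a positive value. A minimal sketch with a greedy opposite-neighbor pairing (an assumption; production codes pair neighbors more carefully):

```python
import numpy as np

def centrosymmetry(neighbors):
    """Centrosymmetry parameter from an even-length list of bond
    vectors: sum over pairs of |R_i + R_j|^2, pairing each vector
    greedily with its most nearly opposite partner."""
    vecs = [np.asarray(v, dtype=float) for v in neighbors]
    csp = 0.0
    while vecs:
        v = vecs.pop(0)
        j = min(range(len(vecs)),
                key=lambda k: np.sum((v + vecs[k]) ** 2))
        csp += np.sum((v + vecs.pop(j)) ** 2)
    return csp
```

For the boundary atoms above, values below 0.04 indicate environments that remain essentially bulk-like after the mirroring.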
We may further reduce this cell to the dipole-like configuration framed by the continuous line in Figure 7 (a)-(3). It should be noted that this construction applies independently of the position of the dislocation core, as shown in Figure 7 (b). That is, if the dislocation starts moving, e.g., due to a far-field applied stress, we detect the dislocation core position and proceed in the same way as above by shifting the rectangular region accordingly. Even though the proposed symmetrization technique ensures that no artificial neighborhoods occur in the training configuration, we point out that there is potential freedom in modifying the positions of the boundary atoms: if, e.g., the symmetry of the displacement field breaks due to additional long-range effects, one may introduce an intermediate step which minimizes the centrosymmetry parameter, or, alternatively, the extrapolation grade, of atoms residing in some boundary layer.
Important remark. We find it important to emphasize that, despite the fact that the dislocation lines are extremely (some may say "unphysically") close to each other in the training set, the fitted model is still expected to (and does in fact, as we will see in the next section) give accurate predictions. The reason why such training sets are representative of configurations with normal, "physical" distances between dislocations rests on the main assumption behind the success of machine-learning potentials: that perturbations of the electronic degrees of freedom decay fast, and the electronic interaction can therefore be well described by the relative positions of atoms in a small neighborhood (on the order of 5 Å). Hence, if the atomic neighborhoods in the training set are representative of those in a large-scale simulation, then the potential is expected to give accurate predictions. The difference in the long-range elastic energy between training and actual configurations is resolved by the atomistic deformation which is explicitly represented in the problem of interest.
Simulation details
DFT simulations. As an ab initio model, we employ DFT by means of the Vienna ab initio simulation package (VASP) [32], which uses plane-wave basis sets and the projector-augmented wave (PAW) pseudopotential method [10,33]. The parameters we have used in all our VASP calculations of bcc tungsten are given in Table 7. Electronic relaxation was performed using the preconditioned minimal residual method, as implemented in VASP, and terminated when the energy difference between two subsequent iterations was less than 10 −4 eV. With these parameters, we obtained the lattice constant a 0 = 3.168 Å and the cubic elastic constants C 11 = 551.4 GPa, C 12 = 200 GPa and C 44 = 139.9 GPa, using the get_elastic_constants function from Matscipy (https://github.com/libAtoms/matscipy).
Table 7: VASP parameters used for bcc tungsten.
Exchange-correlation    PBE generalized gradient approximation [40]
PAW potential           PAW PBE W 08Apr2002
Energy cut-off          334.585 eV
k-point mesh            2×3×16
Smearing width          0.06 eV

In our problems we consider solely structural relaxation using the fast inertial relaxation engine (FIRE) [9] as implemented in ASE. The active learning algorithm is terminated when the maximum force on an atom is less than 0.001 eV/Å, except for the barrier calculations, for which we use 0.01 eV/Å.
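As a quick consistency check on the cubic constants quoted above, one can compute the Zener anisotropy ratio A = 2 C44/(C11 − C12); A = 1 would correspond to an elastically isotropic crystal, and tungsten is known to be nearly isotropic. (The ratio itself is our illustration; it is not reported in the text above.)

```python
# Elastic constants of bcc W from the VASP setup above, in GPa.
C11, C12, C44 = 551.4, 200.0, 139.9

# Zener anisotropy ratio: 1.0 for an isotropic cubic crystal.
A_zener = 2.0 * C44 / (C11 - C12)
```

The value lies close to 1, consistent with tungsten's well-known near-isotropy, which also makes the anisotropic displacement field (S5) close to its isotropic limit.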

Nonlinear moment tensor potentials
In the case of nonlinear MTPs, as used in the present work, the radial basis functions $f_\mu$ depend on an additional parameter vector $c_\mu$ as follows (cf. [1]):

$$f_\mu(|r_{ij}|) = \sum_{k} c_\mu^{(k)}\, T_k\big(\xi(|r_{ij}|)\big)\,(r_{cut} - |r_{ij}|)^2,$$

where $T_k$ is the $k$-th order Chebyshev polynomial, $\xi$ maps the interval of admissible interatomic distances onto $[-1, 1]$, and the factor $(r_{cut} - |r_{ij}|)^2$ ensures a smooth transition to 0 as $|r_{ij}| \to r_{cut}$. Hence, the basis functions have a nonlinear dependence on $c_\mu$, and we need to modify the active learning algorithm to take this nonlinearity into account. For this purpose, under the assumption that the vector $\theta$ now comprises both the basis coefficients and the $c_\mu$ parameters, we linearize the difference between the MTP and the quantum-mechanical per-atom energies around some initial $\bar\theta$ to first order:

$$E(\theta; \{r_{ij}\}) - E^{qm} \approx E(\bar\theta; \{r_{ij}\}) - E^{qm} + \frac{\partial E}{\partial \theta}(\bar\theta; \{r_{ij}\}) \cdot (\theta - \bar\theta).$$

The system matrix associated with the new loss function $L$ is then the $n \times m$ Jacobian matrix

$$J = \Big( \frac{\partial E}{\partial \theta_j}(\bar\theta; \{r_{ij}\}_k) \Big)_{k = 1, \ldots, n;\ j = 1, \ldots, m},$$

where $n$ is the total number of atomic neighborhoods in the training set, which is usually significantly larger than the size $m$ of the parameter vector.
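A minimal numerical sketch of such a radial basis function follows, using NumPy's Chebyshev evaluation. The inner radius `r_min` and the affine map onto $[-1, 1]$ are assumptions for illustration; the coefficients `c` play the role of the trainable vector $c_\mu$.

```python
import numpy as np

def radial_basis(r, c, r_min=1.9, r_cut=5.0):
    """Chebyshev expansion times the smooth cutoff (r_cut - r)^2.
    c: Chebyshev coefficients (the trainable parameters)."""
    xi = (2.0 * r - (r_min + r_cut)) / (r_cut - r_min)  # map to [-1, 1]
    return np.polynomial.chebyshev.chebval(xi, c) * (r_cut - r) ** 2
```

By construction the function and its first derivative vanish at the cutoff, so each descriptor depends smoothly on atoms entering or leaving the cutoff sphere.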
We proceed analogously to the linear case by finding an $m \times m$ submatrix $A$ of $J$ using the maxvol algorithm. The extrapolation grade $\gamma$ of a neighborhood $\{r^*_i\}$ is now defined as

$$\gamma(\{r^*_{ij}\}) = \max_{1 \le j \le m} |c_j|, \qquad c^{\mathsf T} = \frac{\partial E}{\partial \theta}(\bar\theta; \{r^*_{ij}\})\, A^{-1}.$$
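The row selection itself admits a compact (if non-optimized) sketch: starting from the first $m$ rows of the tall Jacobian, swap in any row whose coefficient exceeds 1 in modulus, which is exactly the move that grows $|\det A|$. This assumes the initial square block is nonsingular; production maxvol implementations handle that and use rank-one updates instead of full inverses.

```python
import numpy as np

def maxvol_rows(J, tol=1.001, max_iter=100):
    """Indices of an m-row submatrix of the (n, m) matrix J with
    (approximately) maximal |det|, via the basic maxvol iteration."""
    n, m = J.shape
    rows = list(range(m))                  # initial square submatrix
    for _ in range(max_iter):
        C = J @ np.linalg.inv(J[rows])     # coefficients of all n rows
        i, j = np.unravel_index(np.argmax(np.abs(C)), C.shape)
        if abs(C[i, j]) <= tol:
            break                          # no swap can grow |det A|
        rows[j] = i                        # swap row j for row i
    return rows
```

After convergence, every row of `C` has entries of modulus at most `tol`, which is precisely the statement that no training neighborhood extrapolates on the selected set.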

Equivalence of the quadrupole-and dipole-like configurations
In the following we show that the quadrupole- and the dipole-like configurations, enclosed by the dashed and continuous lines in Figure 7 (3), respectively, are equivalent when the displacement is the anisotropic linear elastic solution

$$\tilde u(x, y) = \sum_i \Big[ \frac{c_i}{2} \ln\big((x + p_i y)^2 + (q_i y)^2\big) - d_i \arctan2\big(q_i y,\, x + p_i y\big) \Big] \tag{S5}$$

of a screw dislocation. Assuming that the x, y and z-axes correspond to the ⟨112⟩, ⟨110⟩ and ⟨111⟩ directions, namely the slip direction, the slip plane normal and the dislocation line direction, the real quantities p i , q i , c i and d i can be computed for b = (0, 0, b)ᵀ using the formalism described in [2, pp. 444-445].
We now need to show that the displacement (9) on the upper boundary of the rectangular region containing the {r * i } atoms is the same as on the lower boundary of the rectangular region enclosing the {r 1 i } atoms, and vice versa. That is, for all x ∈ (−l x /2, l x /2] and for all x ∈ (−3l x /2, −l x /2], we have

$$u(x,\, l_y/2,\, z) = u(-x - l_x,\, -l_y/2,\, z)$$

or, equivalently (substituting x → −x − l x ),

$$u(-x - l_x,\, l_y/2,\, z) = u(x,\, -l_y/2,\, z).$$
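The symmetry exploited here is easiest to see in the isotropic limit, where (S5) reduces to the classic screw-dislocation field u_z = b/(2π) · arctan2(y, x). Displacements at point-reflected positions then differ by exactly b/2, i.e., half a Burgers vector, which is why the mirrored boundaries match up to a lattice-compatible shift. The sketch below is our illustration of this fact, not a computation from the paper.

```python
import math

def u_iso(x, y, b=1.0):
    """Isotropic screw-dislocation displacement along z:
    u_z = b / (2*pi) * atan2(y, x)."""
    return b / (2.0 * math.pi) * math.atan2(y, x)

# Point reflection through the dislocation line changes u_z by b/2.
d = u_iso(1.3, 0.7) - u_iso(-1.3, -0.7)
```

The same half-Burgers-vector offset carries over to the anisotropic field (S5), since each arctan2 term picks up ±π under the point reflection while the logarithmic terms are invariant.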