An arbitrary order time-stepping algorithm for tracking particles in inhomogeneous magnetic fields

The Lorentz equations describe the motion of electrically charged particles in electric and magnetic fields and are used widely in plasma physics. The most popular numerical algorithm for solving them is the Boris method, a variant of the St\"ormer-Verlet algorithm. Boris' method is phase space volume conserving and simulated particles typically remain near the correct trajectory. However, it is only second order accurate. Therefore, in scenarios where it is not enough to know that a particle stays on the right trajectory but one needs to know where on the trajectory the particle is at a given time, Boris method requires very small time steps to deliver accurate phase information, making it computationally expensive. We derive an improved version of the high-order Boris spectral deferred correction algorithm (Boris-SDC) by adopting a convergence acceleration strategy for second order problems based on the Generalised Minimum Residual (GMRES) method. Our new algorithm is easy to implement as it still relies on the standard Boris method. Like Boris-SDC it can deliver arbitrary order of accuracy through simple changes of runtime parameter but possesses better long-term energy stability. We demonstrate for two examples, a magnetic mirror trap and the Solev'ev equilibrium, that the new method can deliver better accuracy at lower computational cost compared to the standard Boris method. While our examples are motivated by tracking ions in the magnetic field of a nuclear fusion reactor, the introduced algorithm can potentially deliver similar improvements in efficiency for other applications.


Introduction
The Lorentz equationṡ model movement of charged particles in electro-magnetic fields. Here, x(t) is a vector containing all particles' position at some time t, v(t) contains all particles' velocities, α is the charge-to-mass ratio, E the electric field (both externally applied and internally generated from particle interaction) and B the magnetic field. The Lorentz equations are used in many applications in computational plasma physics, for example laser-plasma interactions [1], particle accelerators [2] or nuclear fusion reactors [3]. The Boris method, introduced by Boris in 1970 [4], is the most popular numerical scheme used for solving (1) although other numerical time stepping methods like Runge-Kutta-4 are used as well. It is based on the Leapfrog algorithm but uses a special trick to resolve the seemingly implicit dependence that arises from the fact that the Lorentz force depends on v. Its popularity stems from the fact that it is computationally cheap, second order accurate and phase space conserving [5]. While is was recently shown that for general magnetic fields this does not guarantee energy conservation and that the Boris method can exhibit energy drift [6], it is nevertheless a surprisingly good algorithm. In most cases, particles will remain close to their correct trajectory because of its conservation properties. However, Boris' method can introduce substantial phase errors and, for long time simulations, it only ensures that particles are near the right trajectory -it does not provide information about where on the trajectory they are at a given time.
For some applications this is not an issue because the only required information is whether a particle passes through some region but not when it does so. In these cases, phase errors are of no concern and the Boris algorithm is highly competitive, combining low computational cost with high quality results. There are other applications, however, where accurate phase information is crucial. One example are particle-wave interactions triggering Alfvén instabilities due to resonances between orbit frequencies and wave velocities [7]. Because it is only second order accurate, the Boris method requires very small time steps in those cases, creating substantial computational cost. In these cases, methods of order higher than two can be more efficient.
For separable Hamiltonians, the development of explicit symmetric integrators has been studied for decades [8]. However, the Lorentz equations (1) give rise to a non-separable Hamiltonian, making development of higher order methods challenging, see the overview by He et al. [9]. Quandt et al. suggest a high order integrator based on a Taylor series expansion and demonstrate high convergence order for relativistic and non-relativistic test cases [10]. The method needs derivatives of the electric and magnetic field, though, which may be difficult to obtain. A recently introduced new class of methods are so-called explicit symplectic shadowed Runge-Kutta methods or ESSRK for short [11]. They are symplectic and therefore have bounded long-term energy error. ESSRK have been shown to be more accurate than Runge-Kutta-4 both with respect to energy and phase error but also require substantially more sub-steps. No comparison with respect to computational efficiency seems to exist. He at al. introduce a high-order volume preserving method based on splitting and composition of low order methods [12]. A class of symmetric multi-step methods is derived by Hairer and Lubich but not analysed with respect to computational efficiency [13]. Instead of building higher order methods, Umeda [14] constructs a three-step version of the Boris method that can be about a factor of two faster. This paper presents a new high order algorithm of solving the Lorentz equations (1) called GMRES-SDC. Its key advantages are that it is straightforward to implement since it heavily relies on the classical Boris scheme which will be available in almost any plasma modelling code. Furthermore, it allows to flexibly tune the order of accuracy by simply changing runtime parameters without the need to solve equations for order conditions. SDC also provides dense output and allows to generate a high order solution anywhere within a time step. We use this feature to accurately compute the turning points of particles in a magnetic mirror. The codes used to generate the numerical examples are available for download from GitHub [15,16].
Boris-GMRES-SDC (below, we simply write GMRES-SDC) is an extension of Boris spectral deferred corrections (Boris-SDC), introduced and tested for homogeneous electric and magnetic fields by Winkel et al. [17]. The present paper expands their results in multiple ways. First, it provides a slightly simplified version of the method with almost identical performance.
Second, it integrates a GMRES-based convergence accelerator, originally introduced by Huang et al. [18] for first order case problems, with Boris-SDC. We show that this leads to a substantial improvement in the long-term energy error. Third, it studies the performance of GMRES-SDC for inhomogeneous magnetic fields, in contrast to Winkel et al. who only studied the homogeneous case.
While GMRES-SDC can be applied to problems where an electric field is present, we focus here on scenarios motivated by nuclear fusion reactors. There, particle motion is governed by a strong magnetic field and the electric field generated through particle interactions is negligible. Also, to be able to quantitatively compare the accuracy of GMRES-SDC and the classical Boris algorithm for single trajectories, we make two simplifying assumptions. First, we only consider test cases where the magnetic field is given by a mathematical formula (a magnetic mirror and a Solev'ev equilibrium), in contrast to a real reactor where the field is given by measured data interpolated to particle positions. Second, we neglect the stochastic models used to capture the effect of interactions of fast ions with the plasma. An implementation of GMRES-SDC into the LOCUST-GPU simulation software [19] and experiments for realistic use cases for the DIIID, JET and ITER experimental fusion reactors are ongoing work.
Verlet-based versus Leapfrog-based Boris integrator. Boris-SDC relies on the classical velocity-Verlet scheme applied to (1), which reads being numerical approximations of the analytical solution at some time step t n+1 . The seemingly implicit dependence in v n+1 is resolved using the trick sketched in Algorithm 1 introduced by Boris in 1970 [4]. What is typically referred to as "Boris algorithm" is the staggered Leapfrog method where the Boris-trick is used in (4c). While Velocity-Verlet (2) and Leapfrog (4) are similar they are not equivalent, see for example the analysis by Mazur [20]. Below, we refer to (2) plus the Boris trick as "unstaggered Boris" and to (4) with Boris trick as "staggered Boris" method. While a variant of Boris-SDC can be derived based on the staggered Leapfrog method, it requires additional storage of solutions at half-points and, in tests not documented here, was not found to improve performance over the velocity-Verlet based Boris-SDC. Substantial differences between Verlet and Leapfrog seem only to arise in simulations with very large time steps with nearly no significant digits left (phase errors well above 10 −1 ), where staggered Boris shows better stability. In such regimes, GMRES-SDC is not going to be competitive anyway so that we focus here on the simpler Verlet-based GMRES-SDC.

Spectral deferred corrections
Spectral deferred corrections [21] are based on collocation. Therefore, we first summarise the collocation formulation of the Lorentz equations (1) before deriving the GMRES-SDC algorithm.

Collocation
Consider a single time step [t n , t n+1 ]. Integrating (1) from t n to some t n ≤ t ≤ t n+1 turns them into the integral equations The exact solution at the end of the time step can theoretically be found by inserting t = t n+1 . In the original paper introducing Boris-SDC for homogeneous fields [17], the second equation is substituted into the first, resulting in double integrals over f in the equation for the position x. For the test cases studied in this paper we could not see any meaningful improvement in performance and, since the substitution leads to more complicated notation, we omit it and work directly with equations (5).
To discretise the integral equations (5) we introduce a set of quadrature nodes t n ≤ τ 1 < . . . < τ M ≤ t n+1 , set t = t n+1 and approximate and with x j , v j being approximations of x(τ j ), v(τ j ), that is of the analytical solution at the quadrature nodes. Then, approximations at t n+1 can be found from To turn this into a usable numerical method, we require equations for the x m , v m . Those can be obtained from discrete counterparts of (5) when setting t = τ m , for m = 1, . . . , M , resulting in The quadrature weights q m,j are given by with l j being Lagrange polynomials with respect to the τ m . Solving (9) directly using Newton's method gives rise to a collocation method. Collocation methods are a special type of fully implicit Runge-Kutta methods with a full Butcher tableau. Depending on the type of quadrature nodes, they have favourable properties like symmetry (Gauss-Lobatto or Gauss-Legendre nodes) [22,Theorem 8.9] or symplecticity (Gauss-Legendre nodes) [23, Theorem 16.5] and A-and B-stability [24,Theorem 12.9]. However, note that even for formally symplectic implicit methods, accumulation of round-off error from the nonlinear solver can still lead to energy drift [25].

Boris-SDC based on the Verlet integrator
Solving the stage equations (9) directly would amount to solving a size 6M d×6M d nonlinear system (d being the number of particles, each having 6 degrees-of-freedom), which is computationally intensive. Instead, Boris-SDC solves a series of 6d × 6d problems to iteratively deliver approximations that converge towards the collocation solution. In contrast to the derivation as a preconditioned Richardson iteration used initially to derive Boris-SDC [17], we use here the classical way to derive SDC via iterative solution of error equations with a low order integrator [21].
Denote the exact continuous solution to the collocation equations (5) as x(t) andv(t) and let x(t), v(t) denote some numerical approximation. The residuals, measuring how well the approximations solve the integral equa-tions (5), are defined as Note that if both residuals are zero for all t n ≤ t ≤ t n+1 , then x(t) and v(t) solve (5) exactly. Next, we define the errors between the approximate and exact solutions as (12b) Subtracting (11) and (5) and using definition (12) we find that the errors satisfy , v(t)).
Next, we will approximately solve these error equations to compute a correction for x and v. To do so, we take the difference e x (τ m ) − e x (τ m−1 ) and approximate the arising integral via where e x,m ≈ e x (τ m ), e v,m ≈ e v (τ m ) are approximations of the error at the quadrature nodes. Correspondingly, the approximate residuals R using the quadrature weights q m,j defined above. Putting this together and rearranging yields Similarly, taking the difference e v (τ m ) − e v (τ m−1 ) while approximating the arising integral with trapezoidal rule Ultimately, SDC will perform not a single but multiple iterations, each computing and adding a new correction. To get the update from iterate k to iterate k + 1, denote all terms without a correction with a superscript k (e.g., x m−1 becomes x k m−1 ) and all those with a correction with a superscript k + 1 (e.g. v m + e v,m becomes v k+1 m ). This yields the final equations for an SDC sweep for second order problems based on the Verlet integrator and and s m, , the seemingly implicit dependence on v k+1 m can be resolved using Boris' trick, leading to the actual Boris-SDC algorithm. First, define  (27) to compute the new velocity value v k+1 m . However, it is worth noting that the here described variant of SDC, in particular with the GMRES acceleration described below, is equally applicable to problems where f depends only on position so that (23) becomes explicit and no Boris trick is needed. An investigation of how Verlet-based SDC performs for other kinds of second order problems is being prepared by Minion and Speck [26].

GMRES accelerated Boris-SDC
By packing the solutions x m , v m at the quadrature nodes into a single vector the discrete collocation problem (9) can be written as see the Appendix in Winkel et al. for details [17]. First, let us consider the case where f is linear. For the Lorentz equations, this would be the case, for example, if B is homogeneous and E = 0. In a slight abuse of notation we write F for the matrix denoting the operator . . .
so that the nonlinear collocation problem (29) reduces to the linear system In this notation, one sweep of Boris-SDC, consisting of (21) and (23), can be written as with where and and Q ∆,T = 1 2 Q ∆,E + Q ∆,I and Q ∆,E := Q ∆,E • Q ∆,E , see again Winkel et al. for details [17]. Iteration (33) can be understood as a Picard iteration applied to the preconditioned system For first order differential equations, Huang et al. showed that performing k iterations of the Generalized Minimum Residual (GMRES) algorithm on (37) often gives better results than performing k standard SDC iterations [18,27]. For details on GMRES see e.g. the book by Kelley [28]. Here, we adopt their strategy to the second order Lorentz equations for cases where the magnetic field varies only weakly over a single time step. Note that while we rely on a self-written GMRES implementation in the accompanying code, we verified that it gives identical results to the GMRES implementation in the SciPy library [29].
A key advantage of GMRES is that it does not require the matrix representing the linear system or the preconditioner assembled explicitly. Instead, it only requires functions that compute (I − QF) U given some U and solve given some b. Applying QF amounts to computing IV m+1  (21), (23). Thus, all that is required to implement GMRES-Boris-SDC is a slight modification of the Boris-SDC sweep to accept a generic right hand side b and encapsulating the modified sweep and the quadrature routine in functions which can be called from GMRES.

GMRES accelerated Boris-SDC for inhomogeneous magnetic fields
GMRES is a solver for linear systems and will not work if f and thus F are nonlinear. In their original work, Huang et al. suggest to adopt GMRES-SDC to nonlinear problems by employing an outer Newton iteration and using GMRES-SDC as inner iteration to solve the arising linear problems. While possible, this procedure requires a substantial amount of sweeps and quadratures and is unlikely going to be competitive when compared against the Boris method unless very high accuracy is desired. Instead, we propose a different strategy for scenarios where B is changing slowly over the course of a time step and the nonlinearity is therefore weak.
Our modified method starts with a single sweep with standard nonstaggered Boris to generate approximate values x 0 m , v 0 m at all nodes. Note that in the notation above this is equivalent to solving Algorithm 2: Single time step of GMRES-Boris-SDC(k gmres , k picard ) for weakly nonlinear problems input : by block-wise elimination. Then, we linearize the function F by setting That is, the magnetic field applied to the velocity v m is not B(x m ) but B(x 0 m ) and therefore remains fixed during the GMRES iteration. We then apply a small number of GMRES iterations to the preconditioned linearised collocation equation For a slowly varying magnetic field this will provide an approximation U lin that is close to the solution of the nonlinear collocation problem (29). We then apply a small number of Picard iterations as sketched in Algorithm 3 using U lin as starting value. Picard iterations only require application of QF and do not need Boris' trick. However, they only converge for starting values that are close to the collocation solution or for small time steps. Therefore, Picard iterations alone were not found to be competitive with either standard or GMRES-accelerated Boris-SDC.
It was recently observed that the entries in the Q ∆ matrix can be changed without loosing the sweep-like structure of SDC. For first order problems, this allows to build more efficient sweeps resembling DIRK schemes [30]. In particular, one can use optimization routines to find entries for Q ∆ that provide rapid convergence. We tried to adopt this approach to second order problems but were unable to find a robust strategy that delivered improved results for a reasonably wide range of parameters. Further study of this approach is left for future work.
Computational effort. We count the number of evaluations of f required by each method and use this as a proxy for computational effort W . While Boris' trick requires some additional computation, in realistic simulations with experimentally given magnetic fields, evaluation of B(x m ) dominates the computational cost because of the required interpolation. Therefore, we count each application of Boris trick as one evaluation of f , ignoring the cost of computing vector products. Non-staggered Boris (2) requires one evaluation of f per time step. Thus, its total cost when computing N steps many time steps is simply In contrast, the initial predictor step in GMRES-SDC requires M − 1 Boris steps and the computation of f (x 0 , v 0 ) for a total of M evaluations. Computing F lin (X 0 ) requires M − 1 evaluations of f for Gauss-Lobatto nodes 1 .
Because we keep the magnetic field fixed in the GMRES iterations, there is no additional cost in terms of evaluations of f . Finally, the Picard iterations each require M − 1 evaluations of f and the update step requires another M − 1. Therefore, the total estimated cost of GMRES-SDC is (43) In principle, all of those steps except the predictor can be parallelised by using M − 1 threads to do the f evaluations for all quadrature nodes in parallel. That would allow them to be computed in the wall clock time required for a single evaluation. In the Picard iteration sketched in Algorithm 3, for example, one could introduce an OpenMP parallel for pragma before the for loop. Parallelisation would reduce the cost of the algorithm to Here, τ overhead accounts for any overheads, for example from threads competing for memory bandwidth. There are approaches available to parallelise a full SDC sweep instead of only the Picard iteration [31] but those have not yet been adopted for second order problems. We leave those as well as the development of an effective parallel implementation and a detailed assessment of required wall clock times for future work.

Numerical results
In this section, we compare the performance of the new GMRES-SDC algorithm against both the staggered and non-staggered Boris method for two fusion-related test problems. The first is a magnetic mirror where particles are confined by a magnetic field generated by two coils. Magnetic mirrors were one of the first designs considered for containing the plasma in a fusion reactor. Although most of today's efforts focus on the Tokamak design, magnetic mirrors are still subject of current research [32]. Our second benchmark uses a Solev'ev equilibrium which resembles the magnetic field in the Joint European Torus (JET) experimental Tokamak reactor.
Here, B 0 = ω B /α is the magnetic field at the centre of trap, ω B is the cyclotron frequency, α is the particle's charge-to-mass ratio α = q/m and z 0 the distance between coil and centre. Figure 1 shows an example trajectory of a particle that remains vertically confined, reflecting back and forth between points at around z = −3 and z = 3. Note that the particle paths is for the parameters given in Table 1 which were chosen to create a recognisable trajectory and are different than the ones for the numerical examples. The basic physical principle of magnetic mirroring [33,34] is that charged particles placed in a longitudinal axially symmetric static magnetic field, bounded by coils with higher value of magnetic field on both sides, will be reflected from these high field side regions when moving along magnetic field lines. This reflection is due to the invariance of a charged particle's magnetic moment in the adiabatic limit where with ρ L being the Larmor radius of particle, L the radius of curvature of the magnetic field line and the particle's velocity v = v ⊥ + v || being split into a part perpendicular to the magnetic field lines and a parallel part. As a particle moves from a low field to a high field side region, B increases and therefore, according to (46), v 2 ⊥ must increase in order to keep µ constant. Since the particle's kinetic energy remains constant, the parallel velocity v 2 || must decrease. Therefore, when B becomes large enough, v || approaches zero and the particle is reflected and travels back along the field line.

Near adiabatic limit
While we do not have an analytical expression for particle trajectories in a magnetic mirror, for the adiabatic limit ε → 0 we can determine the strength of the magnetic field at the points where the particle is reflected. Comparing this value against the magnetic field at numerically computed reflection points allows us to measure and compare the precision of GMRES-SDC and the Boris algorithm. Simulation parameters for the near adiabatic test case are summarised in Table 2 (left) and correspond to a value ε ∼ 8 · 10 −5 . Consider a particle with initial velocity v 0 and position x 0 and B 0 = B(x 0 ) 2 being the strength of the magnetic field at the particle's initial position. Denote as B ref the strength of the magnetic field at the reflection point and as v ⊥r the perpendicular velocity. Then, it follows from conservation of magnetic moment µ that v 2 Conservation of kinetic energy gives because, by definition, v ||r = 0, that is the parallel velocity at the reflection point is zero. Using (50) we can substitute v ⊥r in (49) which gives allowing us to compute B ref directly from the initial conditions x 0 , v 0 . Here, ϕ is the so-called pitch angle. Note that magnetic moment is only exactly conserved in the limit ε → 0. For small but finite values of ε, the actual value of B at the reflection point will be close to but not identical to B ref . Denote by B i the strengths of the magnetic field at the numerically computed reflection points and as B ref the field at the reflection point according to (51). We compute the l 2 weighted error defined as where N is the number of times the particle was reflected. To compute the B i , we exploit the fact that SDC allows to easily reconstruct solutions at arbitrary times in a time step with high order of accuracy. If a particle is reflected in the current time step [t n , t n+1 ], detected by a sign change in v , we construct the Lagrange polynomial   3) with M = 5 quadrature nodes is the most efficient algorithm, requiring approximately 2 × 10 7 evaluations of f . Boris' method would require more than ten times as many evaluations to deliver the same accuracy.

Non-adiabatic regime
For the non-adiabatic regime with ε 0 we do not have an analytical solution for either the trajectory or the magnetic field at the reflection point. Therefore, we rely on a reference solution computed numerically with a very small time step. Simulation parameters for the non-adiabatic test case are summarised in Table 2 (right) and correspond to ε ∼ 10 −2 .
Convergence order. We demonstrate that GMRES-SDC achieves high order of convergence, up to the order of the underlying collocation method. Consider a series of simulations with increasing time steps ∆t i , that is ∆t i+1 > ∆t i such that ∆t 0 is the most accurate result. Denote as x i the final position from the simulation with step ∆t i and as x i the x component of x i . While we only analyse the x component of a particle's final position, results for the other position components or velocities are similar and can be generated using the published code.
We calculate the relative error where x 0 is the result for smallest time step ∆t 0 . From that, we compute the numerically observed convergence order The resulting convergence rates p i for simulations with time steps from ∆t 0 ω = 0.0015625 to ∆t N ω = 0.4 are shown in Figure 3 for M = 3 nodes (left) and M = 5 nodes (right). Both variants of Boris, staggered and nonstaggered, achieve their theoretically expected order of p = 2 for resolutions below ∆tω < 10 −2 , that is approximately 100 steps per gyro-period. Although the more complex interplay between GMRES and Picard iterations does not allow a simple heuristic like two orders per iteration that was found for non-accelerated Boris-SDC [17], these results clearly show that GMRES-SDC can deliver high orders of convergence by changing the runtime parameter M and (K gmres , K picard ).
Work-precision analysis. We measure the error in the strength of the magnetic field at the reflection point by comparing against the values delivered by a reference simulation with ∆tω = 0.005 using Boris-SDC with M = 5 and 6 iterations. To find the reflection points, we employ the same procedure as in the near-adiabatic limit case.  1) is slightly more efficient than the other GMRES-SDC variants. To achieve errors of 10 −1 and below, GMRES-SDC is more efficient than Boris. It delivers a fixed accuracy with fewer f evaluations or, if the number of evaluations is fixed, delivers a smaller error with the same amount of computational effort. Already for errors of 10 −3 , the reduction in computational effort is about a factor of ten. The higher the desired precision, the larger the gain from GMRES-SDC. For M = 5, we observer higher convergence orders for GMRES-SDC(2,1) and GMRES-SDC(2,3), indicated by the steeper slopes. Using (2,3) iterations delivers the most efficient method for errors below 10 −5 while both GMRES-SDC(2,1) and GMRES-SDC(2,3) are about equally effective for errors up to 10 −1 . Only for errors above 0.1 does the classical Boris algorithm start to become competitive. Note that staggered Boris gives slightly smaller errors than the non-staggered Boris, but mostly the difference is very small. Only for very large time steps and therefore small numbers of f evaluations does a more substantial difference emerge. The error for staggered Boris remains bounded at roughly 0.1 while the error for non-staggered Boris continues to increase. However, for accuracies that low GMRES-SDC is not competitive with Boris anyway.
Long-time energy error. Finally, we compare the energy error for both Boris, Boris-SDC and GMRES-SDC over very long simulation times. It has been shown that Boris conserves phase-space volume [5] which typically means a bounded long-term energy error. For Boris-SDC and GMRES-SDC, depending on the choice of quadrature nodes, the collocation solution is either symmetric (Gauss-Lobatto) or symplectic (Gauss-Legendre) and will also have bounded long-term energy error, see the discussion in Winkel et al. and references therein. However, for a small number of iterations before the iterations have fully converged, both methods exhibit some energy drift. Figure 5 shows the relative error in the total energy over N steps = 3, 840, 000 time steps (with ∆tω = 0.5 and t end = 4800) for M = 3 (left figures) and M = 5 (right figures) Gauss-Lobatto nodes and different iteration numbers. The two upper figures show standard Boris-SDC, the lower ones GMRES-SDC. Except for the larger t end , parameters are identical to those in Table 2. As expected, the standard Boris integrator shows no drift, however its energy  iterations can be parallelised, GMRES-SDC delivers a smaller energy error for less computational work than standard Boris-SDC.

Solev'ev equilibrium
As a second test case we consider the Solev'ev equilibrium [35,36] where the magnetic field is given by Here, σ, , κ, ψ, r ma , r mi , z m , B 0 φ are constants given in Table 3, chosen to model an equilibrium similar to the one in the Joint European Torus (JET) fusion reactor 2 . Furthermore,x,ỹ are the intermediate coordinates Fig. 6 shows the magnetic surfaces in a R-Z cross-section as well as the two example trajectories studied below. One is for a passing particle that continues to perform full revolutions in the reactor's magnetic field. The second is a trapped particle which changes direction at some point of its orbit. It thus fails to complete a full revolution and instead travels on a so-called "banana-orbit". Initial position and velocity for both the passing and the trapped particle are given in Table 4.

Accuracy
To assess the accuracy of Boris and GMRES-SDC, we compare the their respective particle trajectories against a reference trajectory computed using GMRES-SDC(1,3) with a time step of ∆t = 0.1 ns. We choose time steps such that the time points in every run are a subset of the time points of the reference run to avoid the need for interpolation. Then, we compute the   Table 4: Initial position and velocity for a passing and trapped particle in the Solev'ev equilibrium. The charge-to-mass ratio is α = 47918787.60368 and their trajectories are simulated until t end = 10 ms. maximum defect at all points of the computed trajectory. The maximum defect in x reads d x := max n=0,...,N with t n , n = 0, . . . , N being the time steps for the current resolution and x (ref ) (t n ) the reference solution at those points. Analogous expressions are used to compute d y and d z . The maximum difference from the reference trajectory is then computed by taking the largest deviation across all three coordinate axis It is important to note that (58) compares positions at a specific time. Thus, d max measures not only particle drift but also errors in phase, that is if a particle is at a point on the trajectory but at the wrong time. Table 5 shows the resulting trajectory errors for the passing particle (upper two) and trapped particle (lower two) for staggered Boris and GMRES-SDC with M = 3 and M = 5 nodes.  Passing particle. For a passing particle, staggered Boris shows a substantial deviation from the reference trajectory. While energy conservation guarantees that the particle stays on the trajectory, the large errors mean that even for a small time step ∆t = 0.5 ns there is essentially no information about phase left in the numerical solution. In contrast, GMRES-SDC(1,3) and GMRES-SDC(2,4) deliver somewhat accurate phase information even for a time step ∆t = 1 ns with errors of around 26 cm, about ten times smaller than Boris. For larger time steps, both methods fail to produce accurate trajectories.
In terms of computational effort, GMRES-SDC(1,4) with M = 5 nodes requires 5 + 4 + 4 * 4 + 4 = 29 evaluations of f per time step according to (43). In contrast, Boris needs only one. However, running GMRES-SDC(1,4) with ∆t = 1 ns results in an error of 0.00036 or about 0.4 mm. In contrast, Boris with ∆t = 0.5 ns gives an error of about 2.8 m, about a factor of 7778 times larger, while requiring twice as many time steps. Assuming Boris achieves its full theoretical order of two (even though the results suggest it does not yet), achieving the same error as GMRES-SDC would require further reduction of ∆t by a factor of about √ 7778 ≈ 88. Therefore, even though each time step of GMRES-SDC is about 29 times more expensive than a step of plain Boris, the latter will require about 2 × 88 = 176 more time steps to achieve the same accuracy. Thus, simulations with GMRES-SDC can be expected to be about a factor of 176/29 ≈ 6.0 faster.
If, however, only precision of the order of centimetres is required, GMRES-SDC without parallelisation will struggle to be faster than Boris. GMRES-SDC(1,3) with M = 3 nodes achieves an error of about 1.9 cm for a time step of 0.5 ns. For the same accuracy, Boris would require a 2.8/0.019 ≈ 12 times smaller time step of around 0.042 ns Therefore, while Boris would need about 12 times more steps, GMRES-SDC(1,3) is 3 + 2 + 2 * 3 + 3 = 14 times more expensive per step, making it slower than the classical Boris. In the regime of centimetre precision (in contrast to millimetre precision), only the parallel GMRES-SDC(1,3) with workload model (44) would be competitive as it is only 3 + 1 + 3 + 1 = 8 times more expensive per step than Boris method.
Trapped particle. For the trapped particle errors are generally higher than for the passing case. This is probably because of the very sharp turns as the particles change directions at the tips of the banana orbit. Some form of adaptivity where the number of iterations or the time step is adjusted according to the residuals of the iteration could potentially be useful here, but is left for future work.
As for the trapped case, errors are very large for all methods when using time steps of ∆t = 5 ns or more. For M = 3 nodes, GMRES-SDC starts to produce accurate results with errors of about 4 cm for time steps of ∆ = 1 ns and below. GMRES-SDC(2,6) with M = 5 nodes already produces results with errors of around 0.2 mm for a 2 ns time step. Again assuming perfect second order convergence for Boris, we would need to reduce the time step from ∆t = 0.5 ns, where it produces an error of about 31 cm, by a factor of 0.31/0.0002 ≈ 39. While GMRES-SDC(2,6) is about 5 + 4 + 4 * 6 + 4 = 37 more expensive per time step, Boris requires about 4 × 39 = 156 times more steps to reach the same precision, making GMRES-SDC about a factor of 156/37 ≈ 4.2 more efficient.
However, to achieve centimetre precision with an error of 4 cm, Boris would only need a .31/0.04 ≈ 2.8 times smaller time step of around ∆t = 0.18 ns. GMRES-SDC(1,3) with M = 3 can deliver comparable precision with a step size of 1 ns, corresponding to 1/0.18 ≈ 5.6 times fewer steps, which is not enough to offset the 3 + 2 + 2 * 3 + 3 = 14 times higher cost per time step. Even a fully parallelised version of GMRES-SDC would still be 3+ 1+3+1 = 8 times more expensive per time step and thus costlier than Boris. While this analysis is slightly biased by assuming that the classical Boris method achieves its full second order convergence, these results nevertheless suggest that further improvements, e.g. through optimisation of coefficients, would be needed to make GMRES-SDC competitive when only centimetre precision is required. Figure 7 shows the long-time energy error for various configurations of GMRES-SDC for the Solev'ev test case. As for the magnetic trap, too few iterations or a too large time step mean that the energy conservation of the underlying collocation solution is not retained, causing the energy error to grow over time. However, the growth is relatively mild and energy errors are often very small even after computing millions of steps. GMRES-SDC (2,8) with M = 5 nodes for a 5 ns time step, for example, shows some energy drift but still delivers energy errors below 10 −5 at the end of the simulation. GMRES-SDC(2,4) for M = 3 nodes with ∆t = 5 ns shows no energy drift for a passing particle with a constant error of about 10 −4 . For a trapped particle, some drift is visible towards the end, leading to a slightly higher energy error of around 10 −2 . Note also that the reference solution with GMRES-SDC(1,3) and a time step of ∆t = 0.1 ns conserves energy exactly for the passing particle. Even at the end of the simulation, after 10 9 time steps, energy errors are below 10 −10 and thus essentially of the order of magnitude of round-off errors. For the trapped particle, some drift can be seen but errors are also at around 10 −10 at the end.

Conclusions and future work
The paper introduces Boris-GMRES-SDC, a new high order algorithm based on the widely-used Boris method to numerically solve the Lorentz equations. The method relies on a combination of spectral deferred corrections for second order problems and a GMRES-based convergence accelerator originally devised for first order problems. Since it freezes the magnetic field over the GMRES iterations to linearise the collocation problem, its applicability is limited to cases where the magnetic field does not change substantially over the course of one time step. Parts of the introduced algorithm are amenable to parallelisation, opening up a possibility to introduce some degree of parallelism in time across the method (compare for the classification by Gear [37]), but this is left for future work.
The new algorithm is compared against the standard Boris method for two problems, a magnetic mirror and a Solev'ev equilibrium, the latter resembling the magnetic field of the JET experimental fusion reactor at the Culham Centre for Fusion Energy. For the Solev'ev equilibrium, our examples show that if precisions in the millimetre range are required, GMRES-SDC can reduce computational effort by factors of around 4 to 6 compared to the standard Boris method. Gains will be greater for even smaller accuracies and will decrease if less accuracy is needed. While the break even point from where GMRES-SDC cannot produce computational gains is hard to pinpoint, our result suggest it to be for precisions in the centimetre range. However, a properly parallelised implementation of GMRES-SDC together with an effective adoption of the parameter optimisation strategy by Weiser [30] may still outperform the classical Boris method for centimetre accuracies. Devising and studying these improvements of the algorithm is left for future work.