Explicit energy-conserving modification of relativistic PIC method

The use of explicit particle-in-cell (PIC) method for relativistic plasma simulations is restricted by numerical heating and instabilities that may significantly constrain the choice of time and space steps. To partially eliminate these limitations we consider a possibility to enforce exact energy conservation by altering the standard time step splitting. Instead of updating particles in a given field and then the field using the current they produce, we consider subsystems that describe the coupling of a single particle and the field at the nearby nodes and solve them with enforced energy conservation sequentially for all particles, which is completed by the field update with zero current. Such an approach is compatible with various advances, ranging from accounting for additional physical effects to the use of numerical-dispersion-free field solvers, high-order weighting shapes and particle push subcycling. To facilitate further considerations and use, we provide a basic implementation in a 3D, relativistic, spectral code $\pi$-PIC, which we make publicly available. The method and its implementations are verified using simulations of cold plasma oscillations, Landau damping and relativistic two-stream instability. The capabilities of the method to deal with large time and space steps are demonstrated in the problem of plasma heating by intense incident radiation.


Introduction
The particle-in-cell (PIC) method [1] has a long history of developments that gradually made it suitable for a vast range of numerical studies in fundamental and applied plasma physics. Some of the developments are related to efficient use of computational resources and the possibility to perform computations in parallel on modern supercomputers [2][3][4][5][6][7][8]. Other developments concern the inclusion of physical processes, such as ionization or the effects due to strong-field quantum electrodynamics [9][10][11][12][13][14][15][16], as well as the solution of the concomitant algorithmic difficulties, such as particle ensembles resampling [17][18][19][20][21][22][23][24][25][26][27][28][29][30] to compensate for the growth of particle number due to ionization or particle/photon production. Finally there are developments related to the elimination of fundamental limitations, which can potentially facilitate a broad range of PIC method applications.
One such limitation, which remains a long-standing problem, is the numerical dispersion of field solvers. Although dispersion-free spectral solvers have been known from late 70's [31,32], their practical use is limited by difficulties of arranging parallel computations. That is why parallel codes are commonly based on grid-based field solvers (exceptions include codes described in Refs. [8,33,34]). One way to overcome the limitation while still using grid-based methods is to modify FDTD scheme [35] so that the numerical dispersion is suppressed/eliminated along the directions of interest [36][37][38][39][40].
One another long-standing problem of the PIC method is the lack of exact energy conservation, which leads to numerical heating and instabilities [1], thereby restricting the choice of time and space steps. The early developments related to energy-conserving PIC methods were carried out in early 70's by Lewis [41] and Langdon [42]. The idea of energy-conserving simulation is that the state of particles and the sate of electromagnetic field have to be advanced in time concurrently so that the energy conservation is enforced. This can be done by implicit scheme derived from Lagrangian formulation. Nevertheless this leads to the necessity of solving nonlinear system of equations, those number grows with the number of particles. Using Newton Krylov solver makes it possible to exactly preserve energy, under assumption of convergence. This approach was developed and used for the implementation of implicit PIC solvers in Refs. [43][44][45][46][47]. Some recent developments on relativistic energy-conserving implicit PIC method are reported in Ref. [48].
In 2017 Lapenta proposed a semi-implicit method (referred to as Energy Conserving Semi-Implicit Method (ECSIM)) that achieves the property of exact energy conservation with computational costs similar to that of explicit methods [49]. The idea of Lapenta method is to include the linear response of particles to the field into Maxwell's equations solved using θ-scheme [50,51] and advance particles using explicit for coordinate and implicit for velocity leap-frog scheme with electric field at the θ time level [52,53]. Although in Ref. [49] the method was developed for non-relativistic case, it is possible that the approach can be generalized to relativistic case and be useful even in highly non-linear simulations.
Nevertheless, for many studies that concern highly relativistic non-linear plasma dynamics, it is common to use explicit PIC method with sufficiently small time and space steps to avoid numerical heating and instabilities (these effects can be also suppressed by high-frequency field filtering). The choice of explicit methods can be related to the readiness for parallel implementation and overall possibility to use smaller time and space steps than that possible with implicit methods under similar computational demands.
In this paper we describe a possibility to modify the standard relativistic PIC scheme so that we enforce exact energy conservation while still keeping the scheme explicit and permitting the use of dispersion-free spectral field solver. This is achieved by splitting the time step into two actions: current-free energyconserving advancement of the field and energy-conserving local coupling of the field (current-related part) with the dynamics of each particle. The coupling is performed sequentially for all particles (this however doesn't prevent computations in parallel). We discuss options for achieving the desired coupling and describe a basic option of corresponding particle-field pusher. We show that this scheme exactly agrees with the standard explicit PIC method when the energy non-conservation is not an issue for the latter. In doing so we explain how the proposed method exactly removes the source of heating and numerical instabilities. This leads to the following properties of the proposed PIC modification: • the method preserves energy even in case of highly nonlinear, relativistic dynamics and thus is not subject to numerical heating/instabilities; • the method doesn't include implicit parts and admits parallel implementation; • the method is compatible with an arbitrary field solver, including the spectral solver, which is free from numerical dispersion.
We conjecture that due to this properties the approach described in this paper might be found useful in various numerical studies of relativistic, as well as non-relativistic plasma dynamics. The paper is arranged as follows. In Sec. 2 we motivate and describe the modified PIC method. In Sec. 3 we derive one basic implementation of the energy-conserving particle-field pusher used in the method and discuss possible improvements. The use of the standard spectral solver is detailed in Sec. 4 for completeness. The basic implementation of the method is described in Sec. 5. We study basic properties of the method using plasma oscillations in Sec. 6. In Sec. 7 we present a more sophisticated validation test and show the capabilities of the proposed method. We summarise the work in Sec. 8.

Modified PIC scheme
When particle dynamics becomes relativistic the idea of enforcing energy conservation faces several difficulties. The first difficulty is that the velocity response becomes nonlinear due to Lorentz factor. One way to combat this is to linearize the response near the current state. Nevertheless, it is possible that the change of particle momentum and velocity over a single time step is not small enough. Recent studies showed [54] that in some cases of interest strong magnetic fields in combination with low particle energies cause difficult-to-resolve gyration, thus requiring advanced particle pushers [55] and subcycling [56]. Although this aspect is clearly important for ensuring accurate numerical results, it doesn't seem the primary difficulty for the energy conservation because even standard Boris pusher [57] preserves energy when accounting for the magnetic field.
The second difficulty is that over a single time step ∆t a relativistic particle can traverse a distance comparable with the space step ∆x. (If the spectral field solver is used, the time step can be equal or even larger than ∆x/c, where c is the speed of light.) In case of significant change of particle velocity over ∆t, the coordinate becomes another nonlinear part in the response to the field (along with the velocity itself). Moreover, the update of coordinate by an explicit method based on "old" velocity (as Lapenta method prescribes) unambiguously determines the current, depriving us of the freedom critical for enforcing energy conservation. This is because the current has to be consistent with the charge relocation to comply with the current continuity equation. This consistency is known to be important in relativistic simulations and is perfectly addressed by Esirkepov scheme for current deposition [58]. (For spectral solvers, there are also options of corrections via direct solution of the Poisson's equation.) To identify requirements and possibilities for enforcing energy conservation it is instructive to consider 1D problem of electric field interacting with a moving charged particle. Within a sufficiently large time step the energy conservation can become intricate in two different ways: (1) the particle is being accelerated and this process halts when the field energy is exhausted and (2) the field is being set up by the moving particle and this totally exhausts its energy turning the particle in the opposite direction. If both happens within a single step, an oscillating behaviour occurs. This consideration indicates one notable point: to achieve energy-conservation it is not possible to advance particle location based on the "old" momentum because we do not know in advance whether the particle will propagate that far. That is why we choose to reverse the common logic: we first determine current that is consistent with momentum change and then determine the next particle location to satisfy the charge continuity equation.
To implement the outlined principle we start from the standard formulation of coupled evolution of classical electromagnetic field and charged particles (CGS units are used): where B, E are magnetic and electric field vectors; J is the current; r i , p i , v i , q i and m i are the i-th particle coordinate, momentum (given in units of m i c), velocity, charge and mass, respectively; N is the number of particles; w(·) quantifies particles' form factor. To enforce energy conservation we consider time step splitting into steps that advance the state according to subsystems of our choice. We first notice that the field evolution without current do not directly concern the energy coupling between particles and fields. This defines a useful choice for the first subsystem: which can be exactly solved using spectral solver to advance the state of field with exact energy conservation (details are given in Sec. 4). Since magnetic field does no work, the effect of magnetic field on momentum p i is another part that doesn't concern energy exchange and defines a useful set of independent subsystems: Each subsystem (8) can also be solved exactly by rotating p i about B (r i ), or approximately, using Boris scheme, which also preserves energy. The remaining parts account for the current in Maxwell's equations and for the electric part of the Lorentz force in the equation of motion. These parts are responsible for the energy coupling between the field and particles. We now notice that we can split the current into contributions of each particle J i and separate the remaining subsystem into N subsystems. The i-th subsystem describes local coupling between the field and i-th particle: Even thought the particles can overlap we intentionally split the coupling between the field and particles into N independent subsystems, which we choose to advance sequentially to be able to enforce energy conservation for each subsystem. This corresponds to the principle of particle-mesh approach that implies that particles do not interact with each other directly within a single time step, but instead interact through the field defined on a mesh/grid. In what follows we omit index i because the subsystems for each particle are independent and their consideration is identical.
To construct an energy-conserving numerical method for the outlined step splitting we need to formulate a consistent form of subsystem (9) -(12) for the field defined on a grid. Although a complete treatment with advanced approaches can be useful here, we are primarily aiming at preserving energy and thus use the following simplification. We assume that the interaction with the nearest grid values of the field is represented by coefficients c j > 0 that do not change in time during a single time step: where index j enumerates the nearest M grid nodes involved in the interaction and E j (τ ) is the electric field vector at the j-th node. Certainly, the normalization condition must be satisfied: The coefficients are defined by the particle form factor and can be determined in different ways depending on necessary accuracy. In general we need to compute a cumulative overlap. This can be done assuming that the particle propagates with the "old" velocity: where S(r) describes the shape of the particle, V g denotes the volume of each cell centered at the j-th node r (j) . If the integration can be done analytically the procedure becomes computationally cheap. Alternatively one can approximate the value of the integral using one or several points within the time interval. One can further increase accuracy of the final particle location at the end of the time step by running a predictor-corrector cycle, using the entire procedure described below. Nevertheless, it is not clear whether such more accurate calculations with non-propagating field within one time-step would me more efficient than the overall reduction of the time step for the simplest method. Since our primary goal is to consider energy conservation, in the provided implementation we use the simplest option of integral estimation based on the middle point and cloud-in-cell (CIC) weighting for the shape function.
Assuming that coefficients c j are fixed for the duration of time step, we define a grid-base version of subsystem (9) -(11): whereJ has the meaning of the current produced by the particle. To demonstrate that this formulation is consistent in terms of energy conservation, let us explicitly show that the energy is preserved. To do so we multiply both sides of (17) on E j , sum over all j = 1, ..., M , substitute (18) and (16), and finally obtain Multiplying both sides on V g /4π, we finally get which explicitly shows that the system (16) -(18) preserves total energy of the particle and the field defined on a grid. This means that if we advance the state according (16) - (18) so that the energy is preserved, then the whole method preserves energy. Now let us show that eqs. (16) -(18) form a self-consistent system, i.e. a system that define a single solution for arbitrary initial conditions. To show this we substitute (18) into (17), multiply both sides by c j and sum over all j = 1, ..., M : where we defined a constant We then take time derivative of (16) and substitute (21) into the right hand side: Introducing another constant we transform eq. (23) into the formp where we used dot to denote time derivative. In what follows we use the following notations: superscript "+" denotes that the value is taken at the time instance t + ∆t, the absence of such superscript indicates that the value is take at time instance t, and ∆ is used to denote the difference, e.g. ∆p = p + − p = p (t + ∆t) − p(t) (in some cases we indicate the time instance explicitly to avoid confusion). As we can see (25) is the equation that describes classical motion of an abstract particle with coordinate p in a central potent η 1 + p 2 . For the time interval ∆t this equation defines a definite solution p + ,ṗ + for arbitrary initial conditions: We consider the solution of (25) in the next section, while here we complete the proof of consistency by showing that its solution unambiguously and consistently determine E + j and r + . To show this let us express the change of coordinate over the entire time step ∆r = r + − r using eq. (12): Substituting (18) into (17) and integrating over the time step we get the change of E j : where we denote: Considering (16) at time instance t + ∆t we get: Thus our task is to find p + andṗ + according to eq. (25), preserving energy given by (20). The solution of this task in combination with updates (31) -(33) and the momentum update according to (8) can be referred to as energy-conserving particle-field pusher. We consider one possible implementation of such pusher in the next section. We conclude this section by summarizing the proposed modification of the PIC method. The method implies that we sequentially advance the state of each particle and of electric field at the coupled nearby nodes according to (16) - (18). Apart from this for each particle we advance the momentum according to the local magnetic field using rotation about the magnetic field vector. Finally, after advancing the states of all particles together with current-related change of the electric field at the coupled nodes, we use spectral solver to advance the field state according to Maxwell's equations without current. In fig. 1 we show the standard PIC scheme and the described modification in a way that highlights the differences: the circles are used to denote data, while the data processing principles are summarized in rectangular elements; the superscript "+" indicates the updated data; subscript "g" denotes grid values, E (i) g and E (i+1) g denote the state of electric field before and after processing the i-th particle, respectively.
3 Explicit energy-conserving particle-field pusher The energy-conserving particle-field pushed introduced in fig. 1 can be generally defined as an algorithm that advances the state of a given particle together with the state of the field at the nearby nodes according to both (8) and (9) - (12). An exact advancement warrants energy conservation given by eq. (20). For approximate solution we need to enforce energy conservation. Note that improved properties that reduce violation of energy conservation can be also practical because they can broaden the parameter region of numerical stability. However we here develop a solution with exact energy conservation to make the method unconditionally stable.
Although a more comprehensive treatment can be useful, we here use several simplifications to get an approximate, yet energy-conserving solution of this problem. First, as mentioned above, we separate and split the effect of magnetic field, which we account for through the rotation of p about B before the advancement according to eqs. (9) - (12); the details are given in the summary of the algorithm below. Next, we assume that coefficients c j are not changing in time during each time step. Under these assumptions, given relations (31) -(33), the remaining task is to advance the state of p(t) andṗ(t) defined by (26) over one time step according to (25): This equation can be seen as an equation that describes motion of an abstract non-relativistic particle with coordinate p in a central potential η 1 + p 2 . Hereafter we refer to this particle as p-particle, so as not to confuse it with the real particle in question. The relation to the p-particle dynamics provides an insight into the origin of numerical heating and corresponding instability. In case of exact dynamics, p-particle is constantly being deflected towards the center, while in the standard PIC scheme the p-particle is advanced in time along a straight line pointing towardsṗ(t) (tangent to the trajectory at the initial point of the time step). In case of a small enough time step, the deviation is minor but always leads to p-particle being further from the center than it should be, meaning that the energy is constantly increased. We thus can conclude that accounting for the potential that restricts p-particle motion is the way to get rid of numerical heating and instability.
For the motion of p-particle we have two invariants: where the first one represents the angular momentum conservation and the second one represent the energy conservation. Note that, despite having similar terms proportional to 1 + p 2 , the energy conservation for p-particle differs from the energy conservation for the field and the real particle given by eq. (20). In case of eq. (34), the standard way of determining particle dynamics in a central potential by adding centrifugal potential leads to differential equation: Since the exact solution is intricate, we choose to use another simplification with the following plan. We use another potential that approximates the one in eq. (34), but leads to an explicit solution. We then have two main options to enforce exact energy conservation according to eq. (20). We can either use approximate p + and correct the approximate ∆E (ṗ + ) (see eq. (31)), or use approximateṗ + , determine ∆E (ṗ + ) and then correct the approximate p + . Since the correction concerns 3D vector based on a single equation, there are many options. We choose to correct p + keeping its direction unchanged, i.e. by multiplying vector p + by a factor σ, which presumably should be close to unit. Since the correction of particle energy concerns the length of p only, the vector of difference between the approximate p + and the corrected σp + has the minimal length among all possible ways of correcting p + (determining such a minimal correction for ∆E is also possible but requires some derivations). The condition for enforced energy conservation then takes the form: We can explicitly express the correction factor σ: There is, however, an important restriction on the choice of approximate potential. It should restrict the motion of p-particle in such a way that the correction is always possible. In our case, it means that the resultant approximateṗ + doesn't lead to the field taking more energy than the particle can provide, otherwise we get imaginary value for σ.
Lemma. Ifṗ + is limited by the maximal kinetic energy gain for p-particle then σ is real.
Proof. The restriction on the maximal kinetic energy gain for p-particle can be expressed as which is achieved for L = 0, p + = 0. Let us now determine the upper bound for the electric field energy gain: where we substituted E + j = E j + c j ∆E given eqs. (31) - (32). Since c j > 0, the largest value is achieved in case all E j are pointing along ∆E, therefore Substituting eq. (31), we get Substituting restriction (41) and expressingṗ 2 via eq. (16), we obtain Recalling the definition of ∆E E (41) and substituting inequality (44) into eq. (39), we can see that radicand is greater or equal to zero and thus σ is real.
Given the obtained restriction we choose to approximate the exact potential η 1 + p 2 by a parabolic potential κp 2 /2, where coefficient κ is chosen to match the slope at the initial point p(t): This potential satisfies the identified requirement because the maximal drop of potential energy (reached at p = 0) is less than that of the exact potential. This follows from the fact that the gradient on the way to p = 0 is always smaller for the approximate potential: Note that the potentials exactly match in the limit p 2 1. The equation for the dynamics of p-particle with the approximate potential takes the form of three dimensional symmetric harmonic oscillator:p To express the solution in explicit form we define a function and compute its time derivative using (47): Using the solution g + = g exp (i √ κ∆t), we geṫ Note that in case of non-relativistic description the approximate parabolic potential coincides with the exact one. In this case we do not need to do any correction and have exact energy conservation (20) already for (50) -(51) with E + j determined according to (31) - (32). In case of relativistic simulations, the solution (50) -(51) for approximate potential becomes increasingly close to that for exact η 1 + p 2 .
To complete the description of this basic form of energy-conserving particle-field pusher we address the question of changing the momentum according to the local magnetic field B: For constant magnetic field this equation describes the rotation of vector p about vector B. Under this assumption we can express the solution using a rotation operatorR a b, which rotates vector b about a by an angle a. To obtain explicit form of this operator we decompose b into the sum of parallel and perpendicular components with respect to vector a: Using this decomposition, we can express the rotation operator: The rotation of p described by eq. (52) can be be given by The rotation of momentum p due to magnetic field doesn't change the energy of the particle. We can apply it before or after coupling with the electric field. One can also split the step into two half-step rotations applied before and after coupling with the electric field. We here consider the simplest case leaving possible improvements for further considerations elsewhere. The method is applicable for arbitrary weighting scheme. To exemplify the use of the method let us detail the case of CIC weighting, which we use in the provided implementation. To do so we define the CIC weighting function: where x, y, z are the Cartesian components of r; ∆x, ∆y, ∆z are the corresponding space steps (V g = ∆x∆y∆z), and T (·) is the triangular shape function: We can now collect all the steps of such basic energy-conserving particle-field pusher, which for i-th particle converts p = p i and E = E (i) into p + i = p + and E (i+1) = E + (see fig. 1): We conclude this section by outlining some limitations and possible improvements of the described algorithm. The presented consideration indicates two characteristic time scales of cyclic behaviour. The first one is induced by magnetic field and describes particle gyration with frequency ω B = qB/mcγ (see eq. (52)), where γ = 1 + p 2 is the particle gamma factor. The frequency of the second one √ κ (see eq. (47)) corresponds to plasma frequency for the partial density. To see this and clarify the meaning of the partial density, we recall that in PIC method the time evolution is computed for macroparticles that are assumed to represent a number of real particles, the number of which is commonly referred to as weight. Denoting the real particle mass and charge for the i-th macroparticle with weight w i by q r = q i /w i and m r = m i /w i respectively, we can represent the frequency √ κ in the following form (see eqs. (45) and (24)): where partial density n p i = w i /(V g /ξ) corresponds to the density that real particles associated the i-th macroparticle would have in case of even distribution in an effective volume V g /ξ (note that ξ ∼ 1 for the CIC weighting). The right-hand side of eq. (60) indicates that √ k represents the scale of plasma oscillations with the density of n p i and with account for the relativistic mass increase. The algorithm described in this section provides the state advancement that can be sufficient for simulations even if the outlined time scales of cyclic behaviour locally become comparable to or even shorter than the time step. For example, the algorithm can be sufficient if the accurate tracking of the phases of oscillations is not important and the energy conservation is a sufficient property that provides an appropriate modelling of particle's response for the simulation of a macroscopic processes in question. Nevertheless, if a more accurate treatment is needed, there are few interesting possibilities for improvement. To reveal such possibilities let us consider in what ways the algorithm can be inaccurate.
First, a poor resolution of particle gyration in magnetic field may lead to inaccurate description of the particle drift and energy gain near stopping points during direct laser acceleration (see [54,56]). To mitigate these effects in relativistic simulations one can apply the approach of individual subcycling proposed by Arefiev et al. [56] and further developed in [54]. In the context of described algorithm the individual subcycling means that for each particle we determine an individual time sub-step ∆t i so that ω B ∆t i 1, and repeat steps 1-3 with this time sub-step (if ∆t i ≥ ∆t we do a single time step with ∆t). The individual subcycling can also provide a way to account for the changing gamma factor during plasma oscillation. This is related to another source of inaccuracy that is caused by the use of approximate parabolic potential instead of η 1 + p 2 . To mitigate this inaccuracy the choice of individual sub-step should be driven by requiring ∆t i √ κ 1.
One another source of inaccuracy is related to the fact that the simulated plasma frequency (60) corresponds to partial density of a single macroparticle, while a large number of macroparticles for the given space location would imply a higher plasma frequency. This kind of inaccuracy is related to the fact that particles interact via field, not directly. To mitigate this effect, one can apply the following procedure. For each particle we determine a time sub-step that ensures a good resolution of the actual plasma oscillations where n r is the estimated local density of particles. If ∆t i < ∆t the particle is advanced in time by ∆t i and flagged as a particle with uncompleted time step with individual time t i = t + ∆t i . Once all particles are processed, we return back to the flagged particles and advance each again by another individual time sub-step, increasing t i by ∆t i and unflagging it if t i = t + ∆t (the substep ∆t i can either be adjusted to fit a whole number of sub steps in ∆t or the last step can be cut to fit the time at the end). We repeat this cycle over flagged particles until they all become unflagged. To highlight the difference in the way of processing particles one can call this collective subcycling. It is important that the procedure becomes computationally demanding only locally, where the density is high. Nevertheless, accounting for multiple plasma oscillations within a single time step may still be inaccurate because we do not account for the electromagnetic field evolution during a single time step. Effectively, we inaccurately describe the oscillating current that would otherwise result in energy emission in the form of electromagnetic waves.
In a more general case, the plasma oscillations can be strongly coupled with electromagnetic waves. That is why there should be some practical limit for the use of subcycling. One another reason for resolving the time scale of √ k is related to the motion of particles. To see the mechanism of this limitation, let us consider the case, when the time step is so large that we have √ k∆t = π and also let us assume that p = 0. In this case, ∆E = 2E no matter how strong the field is (one can interpret this as a half oscillation for plasma associated with a single particle). Following step 7 of the algorythm this means that the change of coordinate r + − r = V g E/2πq can potentially exceed c∆t. One can interpret this limitation as the necessity to resolve plasma period for partial density created by individual particles. Assuming equal weights of particles, this period is larger than the actual plasma period by a factor that is equal to the square root of the number of particles per cell.
There is a source of inaccuracy that is related to the discrepancy between the c i determined based on the "old" velocity and the c i that correspond to the actual change of particle coordinate ∆r. This is related to the violation of the continuity equation for the current and thus can lead to the accumulation of spurious charge (note that the correction based on solving Poisson's equation is possible but can violate the energy conservation). This kind of inaccuracy becomes increasingly large with more localized shape functions and when the motion is relativistic (and also when the particle is near the stopping points). Although subcycling can also alleviate this difficulty, there is another possibility. In case the determined ∆r corresponds to largely different c i , we can repeat the computations with c i determined at r + ∆r/2. One can even run a cycle of such correcting procedure, expecting that it quickly converges to a consistent advancement.
Finally, there is one another collective source of inaccuracy related to the fact that during each step the particles interact with the field sequentially. This means that field experienced by each particle depends on the order of processing. The effect is small if the time step is small, as is the total field change over each time step. Nevertheless, it is interesting to consider the possibility of using large time steps. In this case, one can again use local collective subcycling. However, there is also another simple measure to suppress the effect of order. Since the order of processing can be correlated with the particle location in phase space, ordered processing means that different parts of phase space are repeatedly experiencing the fields at the same stage of update after the update by the field solver. To remove the accumulation of such bias we can shuffle the order of processing. This is implemented in the provided implementation, while the positive effect of shuffling is shown in Sec. 6.2.

Field advancement using spectral method
To complete the time step so that the energy is exactly preserved we need to do energy-conserving advancement of electromagnetic field. This can be done by standard spectral method that in addition is free of numerical dispersion. Although the spectral method can be considered a well-recognized alternative to finite-difference and finite-element methods (i.e. grid-based methods) it remains a rather uncommon choice in many numerical studies. We thus provide a brief description of the method for completeness.
The spectral solver of Maxwell's equations was described by Haber et al. in 1973 [31], by Buneman et al. in 1980 [32], and later by Birdsall and Langdon in their textbook "Plasma Physics via Computer Simulation" (see chapter 15.9 (b)) published in 1984 [1]. The solver implies that Fast Fourier Transform (FFT) quickly converts the field defined on a grid into a sum of plane wave modes, each of those is independently advanced in time without errors, and then returns the field to the initial coordinate representation on the grid. This gives exact solution (without numerical dispersion) in case of constant current. In 1997 Liu considered an approximate/truncated version called pseudo-spectral time-dependent (PSTD) [59], which doesn't require sine/cosine function computation but is subject to numerical dispersion. In 2013 Vay et al. considered the exact solution (also for time staggered field components) and introduced the term pseudo-spectral analytical time-domain (PSATD), which highlights the difference from PSTD [33]. In the same work Vay et al. proposed a solution to the problem of arranging parallel computations being the main stumbling block for the broad use of the method. Exploiting linearity of Maxwell's equation and the finiteness of the signal propagation speed, the method in parallel advances the field in spatial domains having overlaps, where the cross-summation of the fields is carried out once in a time interval less than that needed for the light to traverse the overlap. A different method for arranging parallel computations, as well as the PIC code ELMIS based on this method, was described and used for numerical studies in PhD thesis [34] also published in 2013. The method is based on alternating data decomposition throughout each computational step (for details see Sec. 5). Although the term PSATD is being used in recent works [8,60], we here revert to the original term spectral solver used in the pioneering works [1,31,32].
The spectral solver for Maxwell's equations can be seen as a particular case of a general approach to numerical solution of partial differential equations (PDE). The idea is to use time step splitting so that the time advancement due to linear part (with optional inclusion of constant terms) of a PDE is carried out using eigenbasis so that the time-advancing operational exponent is computed exactly via the Jordan canonical form. In physics the relevant bases for getting useful diagonal representation of the operational exponent correspond to wave and coordinate representations of the state function, which means that the method can be greatly facilitated by the use of FFT. Although general procedure is applicable to the case of Maxwell's equations, it is instructive to consider the derivation of the spectral methods in physical terms. Following Ref. [32], we consider complex-value field that combines electric (real part) and magnetic (imaginary part) field components: where i is the imaginary unit. For vector F Maxwell's equations with zero current take the following form: where ρ is the charge density and the current J(r) is omitted in accordance with the PIC modification in question. Applying Fourier transform to both sides of equations (63,64) and denoting the representations of functions F(r) and ρ(r) in the basis of wave-vectors k by subscript k, we get: Eq. (66) corresponds to Poisson's equation and remains true if the current and charge fulfill the continuity equation. The propagation of electromagnetic radiation is described by eq. (65), which describes the rotation of vector F k about vector k with frequency ck. Note that one can also exactly solve the equation for F with constant current (see [1]), but this is not of interest for us because the current is already accounted for in the particle-field pusher. We thus can express the time advancement via rotation operator given by eq. (55):

π-PIC
Here we briefly describe an open-source code π-PIC (PI-PIC: Python-controlled Interactive PIC), which we make publicly available for use and extension under the GPL license [61]. The code is designed for relativistic simulations in 3D, 2D and 1D geometry. The computational routine is based on spectral field solver and includes a basic implementation of the described energy conservation concept (mid point CIC coupling) along with the standard PIC method with Boris pusher. The inclusion of two solvers is to facilitate comparison and studies of the proposed approach. Simulations can be configured and executed using Python (or Jupyter notebook). The code is written in C++ and resource intensive parts exploit OpenMP. For each type particles are stored in separate vectors (operated by standard library vector) allocated for each cell. To make possible use of multiple threads the cells are processed with stride 4, which in case of CIC coupling/weighting ensures thread-safe operation. If a particle migrates to a neighboring cell it is also thread-safe to remove it from the current vector and add to the corresponding vector of the neighboring cell (migrated processed particles are stored in the demarcated end part of the vector to avoid double processing). If a particle migrates to a non-neighboring cell, the reference to this particle is added to thread-local list of particles to be moved after the parallel loop over particles (note that in this case CIC might be insufficient and one should consider reducing time step or using more advanced coupling method). In addition, in case of energy-conserving solver the particles are shuffled in each cell and processed in two halves (first a half of particles in each cell is processed for all cells and then the remaining one) to remove repeated effect of ordered interaction with the field (for more details see Sec. 6.2). Relative to Boris method, the computations of energy-conserving method together with shuffling were measured to result in a slow down by a factor that ranges from 1.1 to 1.9. For example, in case of using 4 cores of Intel Core i7-1165G7 (via WSL) in the test case presented below the best iteration time corresponds to 30.7 ns per particle for Boris pusher and 33.5 ns per particle for energy-conserving pusher. The simulation with a single thread gives a slow down by a factor of about 3.8, which suggests an efficiency of about 95%. Computations related to spectral field solver are also arranged using OpenMP loop over individual 1D FFT calls. The time demands should be attributed to the used FFTW library [62]. Nevertheless we note that the described method doesn't require FFT for the current. For reference, for the above mentioned CPU the best time for 64 × 64 × 64 was measured to be 22 ns per cell. Nevertheless, these results should not be considered conclusive. The performance aspects are beyond the scope of this paper and more studies as well as optimization are desirable.
The controls of π-PIC are rather minimalistic, yet they can be found sufficient for many cases. To maintain reasonable performance the communication between the data of simulations and Python is arranged using C callbacks provided by Numba package. From Python it is possible to call a full cycle over all cells or particles so that for each particle or cell a function defined in Python is called. Along with the address of that function, it is possible to pass pointers to arbitrary arrays of integer or double type to be used or amended during the function calls. The use is exemplified in the next subsection.
The interface is based on several assumptions: • CGS units are used for all dimensional variables; • the components of vectors are enumerated: x, y, z cooresond to [0], [1], [2], respectively; • the sizes of the grid nx, ny and nz should be powers of two to facilitate FFT; • 2D simulation is enabled by setting nz=1, 1D simulations is enabled by nz=1 and ny=1; • the space is assumed to be periodic (other geometries can be implemented within Python).

Example: plasma oscillation
Here we provide a Python file that illustrates the use of π-PIC. The case in question is the process of plasma oscillation.   The code produces a sequence of images. The 25-th image is shown in fig. 2. When choosing solver in line 16 it is possible to select Boris pusher by replacing "ec" by "boris". In this example the results of both solvers match to within statistical variations.

.1 The limit of low resolution
Even thought the method warrants energy conservation by construction, we need to verify that its use is not precluded by numerical artefacts that do not violate this conservation law. For example, preserving energy per se does not contradict to the hypothetical process of transferring kinetic energy of thermal particle motion to electromagnetic field of plasma oscillations, i.e. to the process of spontaneous generation of plasma oscillations. Clearly, such a behavior would contradict to the second law of thermodynamics; any plasma oscillations should in opposite decay in line with Ladnau damping. Although accurate description of Landau damping is not questionable in case of high resolution, it is crucial to check if the method reverses the Landau damping in the limit of low resolution, e.g. in case of two time steps per plasma period. To assess the presence of this and potentially other numerical artefacts in case of low resolution we perform a more extensive study of plasma oscillations as probably one of the most indicative plasma processes.
We consider a simulation of 10 plasma periods for a setup described in Sec. 5.1 with XMin, XMax = -500 * DebyeLength, 500 * DebyeLength represented by nx = 32 cells, using 100 particles per cell and varied time step (see Appendix. A). To test the limits we intentionally consider low resolution for both time and space, while the results were measured not to crucially depend on the spatial resolution in this numerical experiment. The results for different time resolution are presented in fig. 3.
In case of using the energy-conserving method (labeled "ec") the total energy is preserved to within machine accuracy (deviation is below 10 −11 ) independently on the time step. For comparison we show the results of using the standard Boris pusher in combination with spectral field solver without any filtering that one can use to reduce numerical heating. For Boris pusher (labeled "boris") we can see that the total energy notably deviates from the initial value even in case of 128 steps per plasma period, whereas in case of 16 steps per plasma period the system becomes unstable within just about two periods.   Figure 3: Dependencies of electromagnetic E f ield (solid curves) and total E tot (dotted curves) energy on time t during 10 plasma periods T p for difference values of time step dt. The velues of energy are normalized to the initial energy E 0 . The energy-conserving method is labeled "ec", while the combination of spectral field solver and standard Boris pusher is labeled as "boris".
Let us consider how the results change when resolution becomes low. We can see that in the limit of low resolution the presented energy-conserving method doesn't cause any more severe deviations than increased rate of decay and frequency altering, both effects being fairly minor until about 16 steps per plasma period. In this case, the increased rate of decay becomes significant for the resolution of 8 steps per plasma period, while even for 4 steps per period the effect of plasma oscillations is qualitatively reproduced. In case of 2 steps per plasma period the oscillations decay almost immediately, while still not causing any clearly destructive nonphysical plasma states.
These results indicate that the method can be found capable of performing simulations with notably large time step than that needed for Boris pusher. Moreover, even if the time step is insufficient for resolving plasma oscillations in some regions of high plasma density, these regions are expected to remain physically stable. The latter can be useful if the physics of interest doesn't require good resolution of many plasma oscillations, or such oscillations are resolved using reduced time step during a limited part of simulation.

The cost of using low resolution
The results presented in the previous subsection indicate that in the limit of low resolution the use of energy-conserving method leads to an increased rate of decay for plasma oscillations. It is interesting to consider this effect in more detail. This is to reveal potentially less prominent effects that constitute the price we pay for using low resolution. For this purpose in this subsection we consider how a low resolution affects the particle distribution in the phase space.
Since in π-PIC the particles are stored and processed in a cell-by-cell order, the effect of order mentioned in the end of Sec. 3 can clearly play a significant role. In case of single loop over all particles, the particles that arrive to a given cell are processed either before or after all other particles in this cell. Since the chance of migration is correlated with energy, this clearly introduces a bias. That is why when energy-conserving scheme is used in π-PIC the particles are shuffled in each cell (before each iteration) and processed in two halves: first we make a loop over all cells processing a half of particles and then make another loop over all cells processing the remaining particles. To see the effect of such shuffling, we present a separate result for simulation without it.
To perform analysis we repeat the simulation of a single plasma oscillation described in sec. 5.1 with nx=8. In fig. 4 we show the phase space x-p x described in fig. 2 for different time step values and for different setups: Boris pusher, energy-conserving method without shuffling and energy-conserving method with shuffling. We can see that shuffling indeed significantly suppresses the effect of order and overall improves the accuracy of simulation in the limit of low resolution. In case of low resolution the energy-conserving method with shuffling shows a rather reasonable distribution that doesn't seem to deviate dramatically from the thermal distribution, which one could observe after the decay of plasma

Validation and capabilities
In this section we pursue two goals. First, we validate the correctness of the π-PIC code. Second, we show that in a rather generic case the proposed energy-conserving method indeed makes the code capable of performing simulations with time and space steps notably larger than that needed to resolve plasma period and Debye length, respectively. The validation and comparison can be more indicative if the simulated process involves complex relativistic plasma dynamics. In this case exact analytical results that provide complex data for comparison are rarely achievable. One example is the theory for relativistic radiation reflection from dense plasmas, referred to as Relativistic Electronic Spring (RES) [63,64]. The RES theory provides a way to compute the field structure of the electromagnetic pulse after reflection from a plasma surface with arbitrary density distribution. The results become increasingly more accurate for higher intensity of the incident pulse so that highly relativistic plasma dynamics is induced. In case of non-zero incidence angle this process can still be simulated in 1D geometry using a moving reference frame (in π-PIC this can be arranged by setting initial moment for all particles). Nevertheless, we intentionally consider the process in laboratory  reference frame to make testing more comprehensive. We choose to perform validation and comparison using 2D simulations (3D case can be considered by setting nz = 1 and modifying the shape of the pulse along z).
We consider a short circularly polarized pulse with the wavelength of λ = 1 µm impinging on a plasma layer with the incidence angle of θ = π/3. The layer has smooth density drops within 1 µm and the shape of ∝ sin 2 . The bulk density is N 0 = 100n cr , where n cr = πmc 2 /e 2 λ 2 is the plasma critical density. The peak field amplitude of the pulse is 100a 0 , where a 0 = 2πmc 2 /λ|e| is the relativistic field amplitude. We perform simulations for different time and space steps keeping 2c∆t = ∆x = ∆y. The initial plasma temperature is chosen so that the Debye length is set to λ D = λ/640. Other details can be read from the listing of the Python code presented in Appendix. B. Note that this setup is related to experiments on the interaction of intense laser pulses with solid targets, which is motivated by various applications including high harmonic generation and particle acceleration.
In fig. 5 we show the initial state of the field and plasma. The pulse, shown via B z field component, propagates toward point x = y = 0. The state of the system at the time instance t ≈ 3.5 × 10 −14 s is shown in fig. 6 for all considered cases. One can see the outgoing pulse that can be benchmarked against the RES results. In fig. 7 we compare the transverse field along the line x = y tan θ at t ≈ 3.5 × 10 −14 s.
From fig. 7 we can see that numerical simulations result in shapes that are close to what obtained with the RES theory. This indicates that the code is likely to work correctly. Note that we should not expect the exact agreement with the RES theory because of many reasons including limited transverse size and limited amplitude of the pulse. We can also see that the results remain fairly accurate even in case the time step is comparable with the plasma period T p = λ/10c for the density inside the target. Note that in the case of T p /∆t = 1.6, it is likely that the low spatial resolution of the wavelength λ affects the accuracy of the simulation. Finally, from fig. 2 we see that there are no clear numerical artefacts that prevent the use of the method with large time and space steps.

Conclusions
In this paper we described a way to modify the standard relativistic PIC method so that the exact energy conservation is enforced, while the computational scheme remains explicit and involves similar amount of computations and memory calls. We derived governing principles and a basic algorithm for their implementation. We considered possible limitations and some ways for overcoming them. To facilitate further studies and use of the proposed method we developed and made publicly available the π-PIC code, which is relativistic energy-conserving PIC code based on spectral field solver [61]. The performed tests indicate that the code and the proposed approach in general can be found useful for plasma simulations.