Parallel-in-time integration of Kinematic Dynamos

The precise mechanisms responsible for the natural dynamos in the Earth and Sun are still not fully understood. Numerical simulations of natural dynamos are extremely computationally intensive, and are carried out in parameter regimes many orders of magnitude away from real conditions. Parallelization in space is a common strategy to speed up simulations on high performance computers, but eventually hits a scaling limit. Additional directions of parallelization are desirable to utilise the high number of processor cores now available. Parallel-in-time methods can deliver speed up in addition to that offered by spatial partitioning but have not yet been applied to dynamo simulations. This paper investigates the feasibility of using the parallel-in-time algorithm Parareal to speed up initial value problem simulations of the kinematic dynamo, using the open source Dedalus spectral solver. Both the time independent Roberts and time dependent Galloway-Proctor 2.5D dynamos are investigated over a range of magnetic Reynolds numbers. Speed ups beyond those possible from spatial parallelization are found in both cases. Results for the Galloway-Proctor flow are promising, with Parareal efficiency found to be close to 0.3. Roberts flow results are less efficient, but Parareal still shows some speed up over spatial parallelization alone. Parallel in space and time speed ups of $\sim300$ were found for 1600 cores for the Galloway-Proctor flow, with total parallel efficiency of $\sim0.16$.


Introduction
Dynamo theory seeks to explain the processes that generate magnetic fields in stars, planets, and galaxies. These magnetic fields are sustained by currents generated by the flow of a conducting fluid [1,2], such as molten iron in the Earth's core, or plasma in the Sun and stars. The fluid velocity, u, twists, stretches, and shears field lines, counteracting the Ohmic diffusion of the magnetic field, B, which would otherwise cause the field to decay [3]. The dynamo problem is typically modelled using the induction equation coupled with the momentum equation for the fluid, which determines u. The magnetic Reynolds number is defined by where U is a characteristic speed, L a characteristic length scale, and η is the magnetic diffusivity [4]. A solution of this coupled system is called a dynamo if the magnetic field continues to be generated as time t → ∞. The smallest scale of magnetic features created by the evolution of the induction equation (1) is the resistive scale which is proportional to R m −1/2 [3], so that the spatial resolution required to capture these structures increases as R m 1/2 . This leads to very high computational requirements that limit the parameter regimes that can be studied. Simulations are currently carried out at parameter regimes far removed from those found in natural dynamos of real stars and planets [5].
Owing to the complexity of the full dynamo problem much attention has focused on the simpler problem of kinematic dynamos where u is prescribed. As the induction equation is linear in B , solutions to the kinematic dynamo problem are either exponentially growing or decaying with a well defined growth-rate for a statistically stationary u. Due to a number of anti-dynamo theorems [6,7], relatively complex space or time dependent flows are required in order to generate a growing magnetic field. Such a problem is difficult to address analytically, so numerical and computational methods are employed.
If u is steady, then the growth-rate can be found by solving an eigenvalue problem, whilst if the flow is periodic in time, the growth-rate can be found by solving a Floquet problem. If in addition the flow is independent of one co-ordinate, say the z-co-ordinate (2.5D flow), then one can seek monochromatic solutions of the form B(x, y, z, t) = b(x, y, t)e ik z z , so that k z becomes a parameter and the problem becomes 2D. The behaviour of dynamos at different R m has received much attention [8][9][10]. Low R m systems are dominated by diffusion, while high R m systems are dominated by advection. Laboratory and engineering type flows tend to have R m < 1 [11], flows in the Earth's core are characterised by R m ∼ 10 2 − 10 3 [12], whilst flows in the Sun have R m ∼ 10 6 − 10 10 [13].
Behaviour of dynamos at large R m is of particular interest astrophysically. If the growth-rate stays bounded away from zero as R m → ∞ then a dynamo is called fast, otherwise it is called slow. It is a necessary (but not sufficient) condition that a fast dynamo have chaotic Lagrangian particle paths [14]. In 2.5D flows, this can only be the case for unsteady flows. In fully 3D flows, the paths tend to be chaotic even without time dependence. No fast dynamos have yet been proven mathematically, but some have been found to act as fast dynamos in numerical simulations.
A particular choice of 2.5D cellular steady flow was considered by Roberts [8]. The Galloway-Proctor circularly polarised (CP) [9] flow is an extension of this flow to include time dependence. While the Roberts flow must act as a slow dynamo, as it is steady, the Galloway-Proctor dynamo is thought to be fast. The Roberts flow dynamo has been used to investigate the experimental dynamo at Karlsruhe [10], whilst the Galloway-Proctor flow was used to investigate the formation of large scale magnetic fields at high R m [15].
High accuracy solutions are needed for dynamo simulations and so the majority of studies use spectral methods to discretise in space [16], expanding in Fourier series, spherical harmonics, or Chebyshev polynomials as best fits the domain, to make use of their spectral convergence properties. Because of the high resolution requirements, very long compute times are found for the full dynamo system, even when simulations are run on large numbers of cores in high performance computing (HPC) facilities. Matsui et al. [17], for example, tested scaling on up to 16,384 cores while Schaeffer et al. [18] ran simulations that required over 10 million cpu hours. Transforming between spectral and spatial coordinates, usually via Fast Fourier Transform (FFT), acts as a limiting factor on the parallel scalability of pseudospectral codes [19]. Owing to the global communications requirements of spectral methods, simulations are currently unable to scale to the huge number of processors available in modern HPC facilities. Further ways to parallelize computations are therefore required to investigate more realistic parameter regimes.
Parallel-in-time methods can increase scalability of computer simulations beyond saturation of spatial parallelisation [20] and have been investigated for over 50 years [21]. The most widely studied parallel-in-time algorithm is Parareal [22], which was introduced in 2001 and has spurred a renewed interest in the subject. Further parallel in time methods have been proposed recently, like PFASST [23], PITA [24], and Paraexp [25]. The Parareal algorithm has been utilised in fluid flow problems [26,20], plasma physics [27], financial simulations [28], and in planetary mantle simulations [29]. The method has been extensively analysed mathematically by Gander and Vandewalle [30], and is thought to perform badly for purely hyperbolic and highly advective systems [31], as it primarily corrects amplitude defects, rather than phase or frequency defects [32]. There has been recent work in parallel-in-time for advective problems, where the linear operator was solved using the method of characteristics [33]. In this work, however, we seek to inform future studies of the non-linear dynamo problem, where the advective terms are no longer linear, and this approach would not be suitable. While some recent works demonstrate efficient application of Parareal to hyperbolic PDEs [34], the applicability of the method for dynamo simulations has not yet been studied.
The present paper studies the ability of Parareal to speed up kinematic dynamo simulations. It presents the first demonstration that parallel-in-time methods can deliver speedup for the induction equation beyond the saturation point of spatial parallelization. We introduce an implementation of the Parareal algorithm in the open source spectral solver Dedalus [35] and investigate its performance for the stationary Roberts [8] as well as the time-dependent Galloway-Proctor [9] flow. Although these flows are relatively simple, they generate complex dynamics in the magnetic field and are good initial test problems to demonstrate Parareal's efficiency for dynamo simulations.

The kinematic dynamo problem
In this work, the kinematic dynamo is studied for a subset of the ABC (Arnold, Beltrami and Childress) class of flows u = (C sin(z) + B cos( y), A sin(x) + C cos(z), B sin( y) + A cos(x)) , [36], with one of A, B, or C equal to 0, making the flow 2.5D.

Roberts flow
The Roberts flow is found from the ABC flow by setting A = B = 1, C = 0, giving u = (cos(y), sin(x), sin( y) + cos(x)). Noting that ∇ · B = ∇ · u = 0, equation (1) becomes This flow is periodic in x and y, and equations (4) are solved in a 2π square plane with periodic boundary conditions.

Galloway-Proctor flow
The Galloway-Proctor circularly polarised (CP) flow adds time dependence to a Roberts like flow u = (C sin(z + sin ωt) + B cos( y + cos ωt), C cos(z + sin ωt), B sin( y + cos ωt)) , (5) with ω = 1 and C = B = √ 3/2, A = 0. We look for solutions of the form B = b( y, z, t)e ik x x . When put into (1), we have (6c)  Fig. 1d. For the Roberts flow, the magnetic field has a constant morphology and simply grows exponentially in magnitude.

Parareal algorithm
We give only a brief description of the Parareal algorithm, for a more detailed description see for example the works by Lions et al. or Gander and Vandewalle [22,30]. Parareal is an iterative method for solving initial value problems (IVPs) of the form where the right hand side f typically comes from spatially discretizing a partial differential equation. Parareal uses two different time stepping methods, the coarse method G and the fine method F . The time domain is split up into N p time slices, defined by the time-points 0 = t 0 < t 1 < ... < t N p −1 < t N p = T , where N p is the number of processors available for time parallelisation. We assume that all time slices are of equal length T = T /N p . The fine time stepping method has time step δt, while the coarse method has time step t. Parareal starts with running the coarse solver from t = 0 to t = T , giving an initial approximation to the solution U at time t n , U k=0 n , for every n = 0, 1, ..., N p , where k denotes the Parareal iteration number, and n denotes the time slice. Then, the approximation is refined using the iteration The Galloway Proctor field shows more spatial variability, due to the time dependence of the flow. The Roberts field stays effectively fixed in space, only varying in magnitude over the course of the simulation, whilst the morphology and magnitude of the Galloway Proctor field changes over time.
Computing the fine method can be parallelized across time slices. However, the correction from the coarse propagator has to be computed in serial, propagating the Parareal correction throughout the time domain.

Convergence and performance
As k → N p , Parareal converges to the solution that would have been obtained through the use of the fine method in serial over the time domain [0, T ] [30]. Achieving speed up, however, is dependent upon converging to an acceptable tolerance tol within a much smaller number of iterations than N p . We test for convergence by measuring the relative two-norm defect at the last time slice between successive iterations We denote as k con the number of iterations required until σ ≤ tol. Speed up s of Parareal can be estimated as where R c is the runtime of the coarse method over one time slice and R f is the runtime of the fine method over one time slice. Speed up is bounded by [30]. The bound illustrates the trade off that needs to be optimised to gain optimal performance with Parareal. On the one hand, the ratio between the run times of the fine and coarse solvers should be large, so that the second bound is high.
This can be achieved by using a very coarse and cheap G. However, the method also needs to converge in a small number of iterations to ensure that the first bound is high. A less accurate coarse solver will typically require a larger number of iterations to converge. Finding a good compromise between these two competing factors is key to gaining high speed up with Parareal. Parallel efficiency is defined as where ideal scaling s = N p would give an efficiency of one. Parareal has an efficiency bound of 1/k con , which highlights the need to keep the number of iterations low. The need to test for convergence between two successive iterations of Parareal means there is an effective limit of 1/2 for parallel efficiency when using Parareal.

Coarse solvers
A number of strategies have been proposed to design coarse solvers for Parareal. The most simple to implement is an increase in time step size for the coarse solver compared with the fine solver, whilst leaving the timestepping method and spatial discretization unchanged [20,37]. A second strategy is to use an implicit time stepping method with large step size for the coarse solver, with explicit time stepping for the fine solver [38]. Further strategies to find a coarse solver include simplifying the physics so that a less complicated model can be used [39][40][41]. Another strategy to be considered is to use a reduced spatial resolution, along with a larger time step [42]. Using this strategy, the method of interpolation from coarse to fine grids has been found to be important to Parareal convergence [43]. In section 5.2 we explore different options in the context of the kinematic dynamo problem.

Implementation
This work was carried out using the Dedalus [35] open source, spectral solver. We use a collocation-based pseudospectral method. Spectral methods benefit from spectral convergence: for a sufficiently smooth solution, the error of the method is O[(1/N s ) N s ] for a discretisation with N s collocation points. This compares favourably with a standard finite difference discretisation, which has error O[(1/N) p ] for a pth order method. Because of the periodic domain, Fourier bases are used. Dedalus uses the FFTW library to perform fast parallel transforms between real and spectral space and parallelizes in space over n − 1 dimensions of an n dimensional domain using the mpi4py library [44].
Dedalus offers a number of different time stepping methods up to 4th order, including multi-step and Runge Kutta implicit/explicit (IMEX) methods [45][46][47]. Because multi-step methods in Parareal require restarting in every time slice and every iteration, we focus on Runge-Kutta methods. For optimal serial performance, we rely on IMEX Runge-Kutta methods [45]. Here, we integrate the terms (u · ∇)B and (B · ∇)u explicitly, whilst the diffusion term is treated implicitly. Below, we compare the IMEX Runge-Kutta methods RK111 (1 implicit stage, 1 explicit stage, 1st order), RK222 (2 implicit stages, 2 explicit stages, 2nd order), and RK443 (4 implicit stages, 4 explicit stages, 3rd order) to determine the most efficient serial fine solver against which to compare Parareal.
Simulations are carried out in a periodic domain with side length 2π . The spatial resolution is measured in terms of the number of spectral modes in x and y, (N x , and N y respectively) for the Roberts flow, or number of modes in y and z, (N y and N z respectively) for the Galloway-Proctor flow. Since the domain is a square, we set N x = N y = N for the Roberts flow and N y = N z = N for the Galloway-Proctor flow in all examples, so that the total spatial problem size is N 2 . The initial conditions were set to a random perturbation of magnitude 10 −5 for B x and B y in the Roberts flow, and for B y and B z in the Galloway Proctor flow.
In the kinematic approach, the simulation time needs to be long enough that the largest growing mode can be found and a steady growth rate calculated. In this work, this time was found to be around 10-20 turnover time periods. However, in a fully dynamic simulation, much longer simulation times are required to reach a statistically steady state in both the fluid flow and the magnetic field. For example, Smith et al. [48] ran simulations until T = 450 to achieve a statistically steady flow. As the aim of the work is to inform future studies of dynamic simulations, we choose a longer time interval of 50 turnover times.
Parareal was implemented by splitting the MPI world communicator into a space communicator and a time communicator. The space communicator was utilised by Dedalus to parallelize in space, whilst the time communicator was used to communicate between time slices. Two instances of Dedalus were created on each time slice, one with the coarse resolution (N C spectral modes in each direction) and time step t, and one with the fine resolution (N F spectral modes in each direction) and time step δt. Interpolation from coarse to fine grids, and restriction from fine to coarse grids, were carried out using the Dedalus set_scales(ratio) method on the field objects, which allows efficient interpolation based on Fourier re-sampling.

Results
To ensure that reported speedups are a like-to-like comparison and meaningful, we need to make sure that the solution provided by Parareal is of the same accuracy as the fine solver used serially. Given the need for highly accurate results in dynamo simulations, we aim for an accuracy of 10 −5 . Furthermore, we need to compare Parareal against an efficient fine solver -achieving speedup with Parareal when a much more efficient alternative to the fine solver exists would not provide convincing evidence of Parareal's usefulness. Therefore, we first find the optimal fine solver in Dedalus to deliver the required accuracy and then parallelise it with Parareal.

Fixing the fine solver
First, we need to fix the required spatial resolution for F . Convergence in space was tested by running simulations at double the previous spatial resolution until the normalised L 2 difference between two solutions was ∼ 10 −15 . At this point, the error from the spatial discretisation is of the order of machine precision. We denote U N as the solution vector containing b x and b y in real space with resolution N. The lower resolution solution was interpolated onto the same grid size as the fine solution using spectral interpolation so that the difference could be computed. The most highly resolved solution, U Nmax , was then used as a reference solution to compute relative error of each U N . The results are shown in Fig. 2, confirming the expected spectral convergence behaviour. For each R m , we set N F to the smallest value that gives a solution with error smaller than 10 −5 (indicated by the dashed red line). Because higher magnetic Reynolds numbers produce smaller scale features, they require better spatial resolution to match our error tolerance.
Next, we fix the time stepping method and time step. Creating a reference solution with a temporal error of the order of machine precision proved to be unfeasible, due to computational constraints, especially in the higher R m cases. Therefore, a result with error lower than 10 −7 was used as a reference solution for setting δt. This is two orders of magnitude smaller than the desired result of 10 −5 and should provide an accurate estimation of the error due to time-stepping. A comparison of the Runge-Kutta time steppers available in Dedalus is shown in Fig. 3. Because RK443 reaches the required tolerance of 10 −5 with the smallest number of evaluations of the right hand side function, it is the most efficient choice. Similar results were found for other magnetic Reynolds numbers. We therefore use RK443 for the fine method F throughout this work. Fig. 4 shows the dependence of the solution error on time step size for a range of R m for the Roberts and Galloway-Proctor flows. All simulations were carried out with the optimal N F found from the spatial resolution study (Fig. 2). We can see that smaller step sizes are required to meet a given level of accuracy for the Galloway-Proctor flow than for the Roberts flow. This is likely due to the Galloway-Proctor flow depending explicitly on time. In the case of the Roberts flow, a δt small enough to satisfy the stability requirements for a given N F is sufficient to also satisfy the accuracy requirement of 10 −5 ,  except for the most simple case of R m = 4. This affects the performance of Parareal through the ratio of computational run times R f /R c because we have to use essentially the same time step for both the coarse and fine method. In contrast, for the Galloway-Proctor simulations, satisfaction of the stability requirement did not guarantee accuracy within the required tolerance, and a reduced δt must be used for the fine solver, leading to a better coarse-to-fine computation time ratio.

Fixing the coarse solver
The N F and δt determined in Section 5.1 are used in the fine solver. We now discuss the different possibilities available for choosing a coarse solver. Using the same spatial resolution with t > δt was not suitable for the Roberts flow, as δt was the largest stable time step for a given resolution. It was also unsuitable for the Galloway-Proctor simulations, as the ratio of t/δt was not large enough to give meaningful speedup. Use of a fully implicit coarse solver was rejected, as the spectral spatial discretisation meant that a dense matrix solve would be required at each time step. This large increase in computational complexity would reduce the difference in computations required between the fine and coarse solvers, leading to smaller speed ups. There was little scope to attempt to use reduced physics in this study, as we are already considering the simplest form of dynamo problem. However, this strategy may be useful in further work on a non-linear dynamo. Coarsening in both space and time was found to be the most promising strategy. As time step stability is linked to spatial resolution, reducing spatial resolution allows a larger time step to be taken, even where the fine solver is at the largest stable time step. This means that N C < N F and t > δt, opening up the possibility of a large difference in computational complexity between the coarse and fine solvers. However, too aggressive coarsening will lead to a very Accuracy is constrained by spatial resolution, until the finest resolution is reached in each case. As the resolution reduces, the largest stable time step increases as expected. Error increases above 10 1 (off the top of the graph) indicate that the method has become unstable at that time step. Accuracy for a given resolution/time step size is higher for the Roberts flow than the Galloway-Proctor flow. inaccurate coarse solver and slow convergence. The most efficient amount of spatial coarsening was studied by carrying out Parareal simulations with a wide range of coarse method spatial resolution.
Simulations were carried out for the Roberts flow with R m = 512. This is moderately high, whilst allowing relatively modest compute resources to be used. The fine solver parameters were fixed, with N F = 160, and δt = 10 −2 , while the coarse step t was set to the highest stable step for the given N C . This was found by estimating the error at different time steps for each resolution, as shown in Fig. 5a. A similar study was carried out for the Galloway-Proctor flow (Fig. 5b). The number of Parareal time slices N P was fixed at 10. Fig. 6 shows that the peak speed up is acquired when N C = 0.5N F . When N C < 0.5N F , the speed up is reduced by the extra number of Parareal iterations required to converge, and when N C > 0.5N F , the difference in computational complexity between the coarse and fine solvers is insufficient.

Scaling results
Scaling tests were carried out for both the Roberts flow and the Galloway-Proctor flow. Simulations of the Roberts flow were carried out on the ARC 3 HPC facility at the University of Leeds, made up of Intel Xeon E5-2650v4 (Broadwell) CPUs, with a total of 6,048 cores. Simulations of the Galloway-Proctor flow were carried out on the ARCHER HPC facility, made up of Intel Xeon E5-2697v2 (Ivy Bridge) CPUs, with a total of 109,056 cores. A range of R m were simulated to see the effect on Parareal performance (see Table 1). Scaling performance was compared with pure spatial scaling of the Dedalus solver. Fully parallel in space and in time simulations were also carried out in order to show how Parareal can increase scalability beyond the saturation of spatial scaling. Validation was carried out by comparing computed growth rates with those found in the literature. Growth rates for the Roberts dynamo were found to be consistent with those obtained by Plunian and Radler [10], which were reported to 2 significant digits, with the peak growth rate for each R m occurring at the same k z wave number reported. Growth rates for the Galloway-Proctor simulations were consistent with those found by Charbonneau et al. [49], with the peak growth rate occurring at a k x of 0.57. The Galloway-Proctor flow had the correct behaviour in terms of growth rate for large R m , with the growth rate staying positive, showing the expected fast-dynamo behaviour.

Roberts flow
Scaling results for the Roberts flow are shown in Fig. 7 for R m = 512 (upper figures) and R m = 4096 (lower figures). For both values of R m , both parallel scaling and efficiency in space are superior to Parareal at low processor counts. As expected, spatial scaling is better for the R m = 4096 case with higher spatial resolution, due to higher workload per processor. While Parareal alone is not competitive, in both cases a combined space-time parallelization generates slightly more speedup than a pure spatial parallelization. The theoretical maximum efficiency for Parareal is 1/3, indicated by a horizontal dashed line, due to the simulation requiring three iterations to converge. However, because of the relatively expensive coarse solver, Parareal's observed efficiency is mostly substantially lower. As the efficiency of the combined space-time parallelisation is the product of the parallel in space efficiency and the parallel in time efficiency, it is low for high numbers of processors because of the low efficiency of Parareal for the Roberts flow. Despite the larger overall speedup, with efficiencies below 0.1, space-time parallelization using Parareal may not be particularly attractive.

Galloway Proctor flow
Results for Rm= 3, 300 and 3000 for the Galloway Proctor flow are shown in Fig. 8. Performance of Parareal is much better than for the Roberts flow. Parareal speed up is competitive with spatial parallelisation at a relatively low number of processors. In all of these cases, the number of iterations required to converge was three, so that the efficiency is bounded by 1/3, and the efficiency of Parareal stays close to this bound over a range of N p , and does not fall much as the number of processors increases. The poor performance of Dedalus in parallelising the Rm=3 case is attributable to the fact that there are only 16 2 spectral modes. In the R m = 3000 case using 32 processors in space, the results show that speed up above that of spatial parallelisation alone is possible, with an efficiency around 0.16. In all cases, Parareal has not yet reached saturation in its scaling performance, and has almost ideal scaling behaviour, except for a constant offset due to the bounds on Parareal scaling. This is shown in Fig. 9, where the efficiency of the method is tracked over the different R m . Efficiency is close to the bound of 1/3, and Parareal efficiency does not fall with increasing R m . Pure Parareal efficiency was estimated in the R m = 3000 case by dividing by the efficiency of the spatial parallelisation found for that particular N S (32). In that case, total parallel efficiency is lower than for the other Galloway-Proctor cases due to the combination of the spatial and temporal parallelisation, and is approximately the product of the two, as expected. This reduction in overall efficiency is unavoidable, as parallel in space is more efficient than Parareal for lower numbers of processors, and Parareal only becomes competitive after spatial efficiency falls away. Also shown on this Figure are the efficiencies obtained at different R m for the Roberts flow, highlighting the difference in performance of the method for the two cases.

Conclusions
The Parareal algorithm has been found to offer parallel speed up for kinematic dynamo simulations beyond what can be achieved through spatial parallelisation alone. In the case of the simpler steady Roberts dynamo, the speed up is modest and parallel efficiencies are low. Here, owing to the steady nature of the imposed velocity, the difference in computational complexity of the coarse and fine methods is found to be too small for good performance of Parareal. The issue was that the Total number of processors is calculated as number of processors in space (N S ) multiplied by number of processors for Parareal (N P ). Speed up and efficiency are both poor for low numbers of processors for Parareal, but as the parallelisation in space saturates, further gains can be made from Parareal, although they are small. Parareal does not offer any gain over parallelisation in space until parallel efficiency is less than 0.1, and does not come close to the theoretical maximum of 1/k, where k is number of Parareal iterations. time step size was not a limiting factor on the accuracy of the fine solver; as long as the time step was stable, it was within the accuracy required. Therefore, there was little room to use a coarser resolution for the coarse propagator in Parareal.
Performance for the time-dependent Galloway-Proctor flow was better and the efficiency stayed close to the theoretical limit over a wide range of magnetic Reynolds numbers, while scaling well to large numbers of processors. In this problem, since evolution of the magnetic field depends explicitly on the current time, the accuracy of the solution depends more on the size of the time step. This means that a time step in the coarse solver much larger than that of the fine step is possible, allowing for better Parareal performance.
To summarise the difference in performance of the two problems, the Roberts flow fine solver is accurate with relatively high timestep size (close to the stability limit). This means that the coarse solver timestep is only a little larger (∼ 2 times) than the fine timestep. In the Galloway-Proctor flow, the accuracy of the fine solver requires a timestep lower than the stability limit −− due to the time dependence of the driving flow. Therefore, the coarse solver, which only has to be stable, can have a timestep ∼ 20 times larger than the fine solver in this case. This leaves more potential for speedup.
Fully coupled dynamic dynamo simulation is complicated, and has non-linear dependencies, and so the accuracy of the fine solver is expected to behave more like the Galloway Proctor flow. Therefore, the good performance of Parareal for the Galloway-Proctor flow suggests that good performance can be expected also for more complex dynamos. This paper therefore gives an interesting example of how studying too simple a problem can lead to an overly negative assessment of the performance of Parareal.
The parallel efficiency of the Parareal algorithm applied to the Galloway Proctor dynamo is close to the theoretical maximum of 1/k con . This means that the overheads due to communication are small, in comparison to the serial cost of the coarse method, pointing to an efficient implementation of the algorithm. Performance of the algorithm when applied to Fig. 8. Speed up and parallel efficiency of the parareal method compared to parallelisation in space for R m of 3, 300, and 3000, Galloway Proctor flow. Total number of processors is calculated as number of processors in space (N S ) multiplied by number of processors for Parareal (N P ). In the case of R m =3000, parareal simulations were carried out with 32 processors in space, as serial runs with one processor were time intensive. Spatial resolutions required were 16 2 , 128 2 , and 512 2 respectively. Results here are more promising than in the Roberts flow. Parareal becomes more efficient than spatial parallelisation for smaller processor numbers, and keeps higher efficiency for longer, closer to the theoretical maximum of 1/k (k: number of Parareal iterations). Scaling saturation for parareal has not been reached even at 1600 processors in the R m =3000 case. Fig. 9. Parallel efficiency vs R m for Galloway-Proctor and Roberts dynamos. Galloway-Proctor results show higher efficiency than the Roberts flow. Parallel efficiency of the method does not appear to degrade with increasing R m . There is a reduction for total efficiency for R m =3000, however, this is due to a combination of the efficiency of the spatial parallelisation with the parareal efficiency. Efficiency of parareal alone is comparable to the efficiency of the lower R m simulations in the Galloway-Proctor case. this problem does not appear to degrade with increased R m , as can be seen in Fig. 9. Performance has remained constant, with Parareal efficiency not much lower than 1/3 for R m = 3, R m = 300 and R m = 3000. This is noteworthy since highly advective problems are thought to cause problems with Parareal convergence, but this has not yet been found in the highly advective case with R m up to ∼ 10 3 .
The results in this paper show that parallel in time methods can speed up kinematic dynamo simulations and are therefore worthy of further study. Better parallelization would enable the study of dynamos at larger magnetic Reynolds numbers by reducing simulation times by harnessing the ever growing number of available cores in HPC facilities. Current and future work involves extending our analysis to non-linear dynamic dynamo systems and the interaction of magnetic fields with convection. The development of successful parallel-in-time methods for these problems could allow the integration of geoand astro-dynamo simulations in parameter regimes closer to reality than have been hitherto possible.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.