Efficient simulation of cardiac electrical propagation using high order finite elements

We present an application of high order hierarchical finite elements for the efficient approximation of solutions to the cardiac monodomain problem. We detail the hurdles which must be overcome in order to achieve theoretically-optimal errors in the approximations generated, including the choice of method for approximating the solution to the cardiac cell model component. We place our work on a solid theoretical foundation and show that it can greatly improve the accuracy in the approximation which can be achieved in a given amount of processor time. Our results demonstrate superior accuracy over linear finite elements at a cheaper computational cost and thus indicate the potential indispensability of our approach for large-scale cardiac simulation.


Introduction
New insight into cardiac function is of great importance to medical science, not least because heart disease is the leading cause of death in the developed world; in the United Kingdom alone it accounts for more than one in six of all deaths [1]. Increased understanding of the working of the heart in both physiological and pathological conditions will therefore aid the development of new treatments for a variety of cardiac and non-cardiac [2,3]

diseases.
A major hurdle we face is that obtaining high spatial and temporal resolution data on the dynamics of the heart is difficult. In a clinical environment, we must settle for low resolution methods such as magnetic resonance imaging (MRI) or electrocardiograms (ECG). Outside of the clinical setting, more information can be obtained from animal studies, for example by optical mapping [4]. Unfortunately, these only provide incomplete data on a limited subset of the parameters of interest. Computational multiscale simulation provides another tool, allowing the measurement and modification of hundreds of different variables in the whole three-dimensional tissue volume, with the added benefits of procedural simplicity and avoiding the need for animal studies.
Myocardial electrical propagation can be simulated using the monodomain or bidomain PDEs [5,6]. Due to its capacity to represent complex geometries with ease, approximations are often obtained using the finite element method (FEM) to discretise the PDEs in space on realistic cardiac geometry meshes; this results in very large (up to forty-million degrees of freedom (DOF) for human heart geometries) systems of linear equations which must be solved many thousands of times over the course of even a short simulation. Thus, they are extremely computationally demanding, presenting taxing problems even to high-end supercomputing resources. This computational demand means that effort has been invested in developing efficient 0021 solution techniques, including work on preconditioning, parallelisation and adaptivity in space and time [7][8][9][10][11][12]. In this study, we investigate the potential of reducing the number of DOF by using a high-order polynomial FEM [13][14][15] to approximate the monodomain PDE in space, with the goal of significantly improving simulation efficiency over the piecewise-linear FEM approach commonly used in the field [16][17][18][19]. For schemes where the polynomial degree p of the elements is adjusted according to the error in the approximation, this is known as the finite element p-version. In the work presented here, we work with schemes which keep p fixed.
Because the a priori L 2 error ku À u hp k 0,2 between the true solution u and finite element approximation u hp of spatially semi-discrete parabolic PDEs satisfies ku À u hp k 0;2 6 Ch l p Àk kuk k;2 ; ð1Þ on a mesh with quasiuniform element diameter h, for some constant C with l = min{k, p + 1} (when the true solution has k square-integrable derivatives, allowing the norm k Á k k,2 , which we recall later, to exist) [15], and because for our problem we believe that k is large, we have good reason to look for greater computational efficiency from using larger values of p and h in order to obtain a desired accuracy. This allows for an exponential rate of error reduction as p increases. It is reasonable that we should expect k to be large given that our problem is a homogenised model of a physical process. Careful choice of h and p can result in a linear system with fewer DOF and thus improved computational efficiency. In other fields such as acoustics, elastodynamics and electromagnetics, this approach has been shown to produce a speed-up of five to ten times over a standard linear FEM approach [20]. Work to-date on higher-order elements [21,22] has focused on hexahedral meshes and what is effectively lumping of the mass matrix [23] (despite claims that an advantage of such high-order methods is that they avoid mass-lumping [20]). The existing approaches demonstrate a two-to threefold speed up over linear FEM for a 3D parabolic test problem on a coarse cardiac geometry [22]. Our approach allows the use of tetrahedral meshes and lends itself to spatial adaptivity, although we do not investigate the latter here.
In this study, we focus on the monodomain equation, although the presented techniques can easily be extended to the bidomain problem. In one spatial dimension, we provide comparisons of the error in the simulations using different polynomial degrees with theoretical a priori results in certain norms, displaying strong agreement. We also use simpler error measures such as activation times and conduction velocities (CV); these are applied in both the one-and two-dimensional cases.

Introducing the governing equations
The myocardium consists of roughly cylindrical cardiac myocytes which are connected to their neighbours by gap junctions, creating an electrically-connected syncytium known as the intracellular space which sits within the extracellular space. The myocytes are arranged into thin fibres, aligned axially. These fibres are in turn arranged into sheets of myocardial tissue. The coordinated contraction of the myocytes which makes the heart beat is orchestrated by spatial waves in the electric potential difference between the intracellular and extracellular spaces, which we refer to as the transmembrane potential. These waves are caused by triggered pulses in the transmembrane potential on a cellular level, called action potentials (APs), triggering further APs in neighbouring cells via tissue-level electrical conductivity.
For isolated cells, APs can be simulated using a system of ODEs [24][25][26][27], or [28] for a detailed exposition, describing how the cellular machinery of the myocytes controls the flux of ionic species through ion channels across the cell membrane. Full cardiac tissue simulation is made tractable via a homogenisation procedure which does away with the individual cells, modelling the tissue instead as two compartments representing the intracellular and extracellular spaces; both are considered to exist at every point in the domain. The spaces are electrically-isolated except for the transmembrane ionic currents between them, which are given by PDE analogues of the isolated cell ODE models. This homogenisation results in the bidomain reaction-diffusion-type system of differential equations describing AP propagation. It consists of two PDEs describing the electrical conduction in the intracellular and extracellular spaces coupled to the PDE system for the transmembrane ionic currents and concentrations w. We must be careful how we discretise the latter, as we shall demonstrate. The two spaces display anisotropic conductivity due to the sheets and fibres; in order to capture this, each has an associated conductivity tensor. Approximating these as being proportional to one another, this system can be reduced to the monodomain: a single PDE describing the dynamics of the transmembrane potential together with a formulation for w that remains unchanged from the bidomain. For more detail, see for example [6,29].
The results obtained with the monodomain equation will not be identical to those found with the bidomain, differing in CV by around 2% [30], but because of the difference in computational effort required to solve the two forms, the researchers use the monodomain where possible. Monodomain simulations can reproduce most of the behaviour seen when using the bidomain, including some which involve phenomena which once required the bidomain such as bath-loading effects [31], but excluding defibrillation due to the need to simulate virtual electrodes [32]. The techniques that we present here are expected to extend without difficulty to the bidomain.
The monodomain system with a cell model is given by C m @u @t À 1 b r Á ðrruÞ À I ionic ðu; wÞ ¼ I stim ðx; tÞ in X; ð2aÞ @w @t uðx; 0Þ ¼ u 0 ðxÞ 8x 2 X; where x is a point in the n-dimensional myocardial domain X, X has boundary oX which is often polygonal due to the methods used to generate it from cardiac MRI [33], outward-pointing unit surface normaln to oX, transmembrane potential u(x, t), initial conditions u 0 and w 0 , conductivity tensor r(x) (related to fibre orientation), time t, cell membrane capacitance C m , surface area to volume ratio b and current I total (u, w, x, t) = I ionic (u, w) + I stim (x, t), consisting of I ionic (u, w) the transmembrane ionic current as described by the cell model and I stim (x, t), the stimulus current as determined by the experimental protocol. g describes how the m cell model state variables w(x, t) = (w 1 (x, t), . . ., w m (x, t)) T vary in time. In this work, we use the Luo-Rudy Phase I (LR91) cell model [24], modified according to [34], for g and I ionic . We are primarily interested in the transmembrane potential u.

Discretisation
For the space discretisation of the transmembrane potential PDE, the FEM approach is outlined in what follows. We recall the definition of the Hilbert space H 1 ðXÞ, . . . ; a n Þ; jaj ¼ P n i¼1 a i and D a :¼ D a 1 1 . . . D an n , where D i :¼ @ @x i [35]. In what follows, for convenience we shall often write ða; bÞ ¼ R X a Á bdx.
We cast the transmembrane potential PDE from system (2)  Choosing some finite-dimensional subspace S & H 1 ðXÞ to work in, with basis f/ i ðxÞg N i¼1 , we can find a spatially discrete approximation u d ¼ P N i¼1 u i / i to u by determining appropriate basis function weights u i . Inserting this, we obtain the semi-discrete system of equations We now write u = (u 1 , u 2 , . . . , u N ) T , I = ((I total , / 1 ), (I total , / 2 ), . . . , (I total , / N )), and M and A for the matrices with (j, i)th entry (/ i , / j ) and (rr/ i , r/ j ), the mass and stiffness matrices respectively, to obtain which we want to solve for u. More details can be found for example in [36,37].
Practically, we require a mesh M ¼ fs i g of elements s i with the property that [ i s i = X and such that when ij, if s i \ s j is non-empty, it is an element vertex (mesh node) or an entire edge of both s i and s j . In 1D, M consists of non-overlapping line segments. In 2D, we use triangular elements. Let is the standard Euclidean metric on R 2 , be the diameter of s i ; h ¼ max i fh i g; q s i ¼ supfdiamðBÞjBa ball ins i g, and Q and R be constants independent of M. We work with meshes that satisfy for all i h h i 6 Q; ð4aÞ where relation (4a) is the property of quasiuniformity of the mesh [38]. The subspace S is determined in our case by the choosing M and a basis f/ i g N i¼1 of continuous piecewise polynomials of degree at most p in each element; we denote by S hp the particular S associated with M and p. The usual choice in the field is to use linear elements (p = 1) with / i (x j ) = d i,j " i, j, where fx j g N j¼1 is the set of nodes of the mesh and d is the Kronecker delta function. In this work we demonstrate how to work with p > 1 for system (2). The spaces we use can be obtained from the standard linear FEM S hp = S h1 by hierarchically adding basis functions of increasing degree to each element. Because each basis function has a DOF associated with it, increasing p results in an enlargement of the system of linear Eqs. (3). However, the significantly improved accuracy that increased p provides allows us to more than offset this by using a coarser mesh.
We take the hierarchical approach because it has the advantage that on each element the basis of degree p is the same as the basis of degree p + 1 with the degree p + 1 functions removed. This lends itself to adaptive techniques, although we do not explore them here. For example, with p = 3 on a 1D reference element [0, 1] we have x; 1 À x; xð1 À xÞ and xð1 À xÞ 1 2 À x which are the two components of a linear basis function, the quadratic and the cubic respectively. In 2D the analogous approach has three linear, three quadratic and four cubic basis functions partially or wholly supported on each element; see Fig. 1 for some of these.

Treating I total and w
We wish to have an approximation to I total of the form P N i¼1 I i / i ; I i 2 R; so that the integration required to generate I in the linear system (3) can be reduced to the computationally efficient product M(I 1 , . . . , I N ) T , referred to as matrix-based assembly. Because the term I ionic in (2a) is nonlinear, in general I total R S hp . Thus we require a choice of a suitable projection P : L 2 ðXÞ ! S hp & S hp for somep; we can then work with PI total . Additionally, the non-diffusing cell state variables w must be solved for with sufficient spatial accuracy; a piecewise-linear approach to w, which is effectively what is used in most implementations, will limit the overall accuracy of the scheme. We can obtain the accuracy needed by using finite elements of degreep to approximate w; in this case there is a very natural way to construct P mapping into S hp (see Section 2.4), so we take the samep for the degree of the image of P and the order of the finite elements used for w.
Takingp ¼ 1 is not sufficient; the problem with this can be seen by considering the error in a 1D simulation at different levels of h-refinement with p kept fixed. Fig. 2 shows how the error in the L 1 (L 2 ) norm (maximum-in-time of the L 2 (X) spatial norm) varies with h using different finite element basis degrees p whenp ¼ 1, demonstrating the restricted convergence. Because of inequality (1) we might expect the error in this norm to be O(h p+1 ), but we only see O(h 2 ). This is consistent with the quadratic convergence of the error in w caused byp ¼ 1. Instead, we letp ¼ p to allow for the full convergence rate. In the next subsection we put this on a solid theoretical foundation.

Analysis for the error in L 2
We begin with a necessary lemma, supposing that the true solution to system (2) has at least k derivatives. Note that when mapped to the real mesh from the reference element, each of these is just part of one of the basis functions that is actually supported on multiple elements. for quadrilateral meshes; the proof Theorem 4 of [40] contains the details necessary to modify w so that it applies when the mesh consists of triangles. Because p is the L 2 projection, we have ku À puk 0,2 6 ku À wk 0,2 . h The importance ofp can be seen via an a priori error estimate (see [41]). We examine this in what follows, where we work with the vectorised formulation of system (2), treating the spatial discretisation of the non-diffusing state variables in the finite element framework. To this end we introduce the inner product ð½a 1 a 2 T ; the L 2 (X) projection into S hp in each of its m + 1 components, with m the number of components of w, and also the (m + 1) Â (m + 1) matrix-like operator Dropping the constants and conductivity tensor r from system (2) for clarity, we obtain the weak form of the full system where u(x, t) : where we are solving for u using elements of order p and for w using elements of orderp, and we have applied P hp to obtain an approximation in S hp to the current I total for the purpose of computationally-efficient matrix-based right-hand side assembly of the linear system (3).  1D simulation results demonstrating that we never do better than quadratic convergence in the L 1 (L 2 ) norm when the approximation to w is linearonly, even when the finite element space S hp is of high enough order to allow for better convergence. The points are the measured errors, the dashed line shows a theoretical quadratic convergence gradient. The simulation was 1D and the errors are measured against a reference with mesh spacing h = 0.0001 cm, whereas the test simulations use meshes with six different spacings ranging from h = 0.01 cm to h = 0.001 cm.
Let R hp : H 1 ðXÞ Â ðL 2 ðXÞÞ m ! S hp Â ðS hp Þ m be a Ritz-L 2 projection, given by where b I is the (m + 1) Â (m + 1) identity matrix adjusted so that b Ið1; 1Þ ¼ 0; ðzÞ 1 denotes the first component of z, and Eq. (7b) ensures that R hp is well-defined. We note that in particular R hp satisfies ðGu; Informed by [41], we now prove the following theorem on the error in the finite element approximation, which demonstrates the importance ofp.
Theorem 2.1. Let u be the solution to system (5) with initial conditions as given in system (2), and u hp the solution to its semidiscrete-in-space form (6). Let l = min{p + 1, k} andl ¼ minfp þ 1; kg, where u has spatial derivatives of order at least k.
Suppose k > 3/2 and that f is Lipschitz continuous in u with respect to the norm k Á k F . Then for some constant C, the following a priori estimate for the error at time T holds: Proof. Following [41], we decompose the error we wish to bound as Taking v = h (which is possible because we chose h such that h 2 S hp Â ðS hp Þ m ) and applying Cauchy-Schwarz, ðh t ; hÞ F þ ðGh; GhÞ F ¼ ðfðuÞ À P hp fðu hp Þ; hÞ F À ðq t ; hÞ F ; or since krhk 2 0;2 P 0 and d dt khk 2 Using the Lipschitz property of f and the L 2 projection bound kP hp ðfðvÞÞk F 6 kfðvÞk F , d dt Now we can integrate to obtain Applying Gronwall's lemma, we see that 41]. Using this estimate and the bounds from [15] and Lemma 2.1, where k can be as large as is allowed by the smoothness of u and w. By combining the bounds for h and q together with which can essentially be found in [41], we arrive at the estimate for the full error: The theorem demonstrates that the approximation properties of S hp are crucial. For example, with p = 3 we would like to get a quartic convergence rate of u hp to u at any particular time T in the norm ku hp À uk 0,2 as we refine h, but we can not expect this ifp ¼ 1 because of the O(h 2 ) terms that then appear in inequality (9). This agrees with our experimental results (see Fig. 2).

Proper treatment of the cell model PDEs for high order FEM
For an element s, a nodal basis / L i È É NðpÞ i¼1 of degree p (for the space of polynomials of degree 6 p on s) associated with a set of nodes fx i g [36]. For our problem (2), we use a nodal basis of degreep on the elements of M to approximate w; the Gauss-Lobatto points associated with polynomials of orderp [42] make a good choice here for high-quality approximation. This naturally gives us an approximation i to P hp ; we compute the current at each x i , and we immediately have the nodal basis weights for our degree-p projection iI total , interpolating I total at the points x i . Informed by Theorem 2.1, we takep P p to obtain the expected convergence rate for our choice of p.
If we assume that the L 2 projection P hp used in Theorem 2.1 and the interpolation operator i that we replace it by in practice are sufficiently close, we can suppose that the error between f and if is Oðhlp Àk Þ, where we also assume sufficient smoothness of f. Of course, we cannot be certain that this error will be achieved with our scheme given that we have not attempted to carefully approximate P hp , but our experimental work has indicated that i is sufficient.
In order to use the product M(I 1 , . . ., I N ) to efficiently integrate the current, because M uses a hierarchical basis, on each iteration of our simulation we change the basis iI total is represented in from nodal to hierarchical.
Because there are no spatial derivatives in the PDE for w, the nodal finite element system reduces to effectively a large number of local ODE systems; these are familiar in concept as the pointwise ODE models that occur in standard linear FEM approaches to this problem. In practice, this means that we never have to form matrices for the nodal system.

Discontinuities in the cell model
This subsection describes a deficiency in the cell model which must be overcome for a proper high-order FEM implementation. Many of the cardiac cell models in the literature include some discontinuous functions which describe the voltagedependent rate at which conductive ion channels embedded in the cell membrane open and close [43]. Such issues can prevent high-order numerical schemes from attaining their theoretical rates of convergence. This has been noted previously for high order temporal schemes [43]; here we identify it as a problem for spatial schemes also. Fig. 4(a) shows the problem; note the deviation of the quartic and cubic solutions from their respective theoretical convergence gradients. The discontinuity exists because two different analytic expressions have been fitted to experimental data for the voltage-dependent transition rates for the ion channel gates in the cell model. Which of these expressions is used is determined by the transmembrane potential, with the discontinuity at the point where the model switches between them (À40 mV).
A continuous replacement has been around for some time [34] but has not been adopted; the problem has propagated due to the fact that some cell models have been created as modifications older ones. Introducing this continuous form is required to ensure theoretically optimal errors in the solutions to system (2). Compare Fig. 4(a) with Fig. 4(b) which differs only in that the cell model used in Fig. 4(b) has undergone this modification. Fig. 3 shows the AP difference between the standard LR91 model and the modified Noble-form LR91; they are quite minor. Given the fact that cell models are generated from experimental data naturally prone to experimental error [44], these differences are probably not worth being concerned with, especially given that the discontinuities do not appear to be biologically justified. We must check for and remove such discontinuities when using a particular cell model for simulation.

Including a smooth fibre field
In one of our test simulations (see Section 3.3.2), we use a geometry that includes holes representing blood vessels passing through the tissue (see Fig. 7(b)). In order to construct a realistic conductivity tensor r we generate fibre orientation vector fields using a Laplace-Dirichlet approach [45]. This involves solving Laplace's equation on the domain with Dirichlet boundary condition +1 on one external edge of X, À1 on the opposite external edge and zero Neumann conditions on all other boundaries. The result is a conductivity tensor field which approximates the way that cardiac fibres negotiate around blood vessels [46].

Simulations
All simulations were performed in MATLAB and use a semi-implicit backward Euler time discretisation scheme. We use a fixedp ¼ 4 regardless of the value of p(64) so that we can focus on the effect of varying the approximation order for the transmembrane potential u; leavingp fixed means that we can examine this in a fair manner. Timings presented are for a 3.4 GHz CPU.

Convergence in 1D
We stimulated a 2 cm 1D domain at one end with a ramp stimulus using a time-step of Dt = 0.001 ms and simulated the first 20 ms of activation. The errors in two different norms are presented in Fig. 4(b) and Fig. 5(a) and use as a reference a quartic solution generated with h = 0.0001 cm. Fig. 5(a) shows the error measured using the L 2 -in-time norm of the H 1 ðXÞ norm in space, for which we expect O(h p ) convergence gradients [15]. Note the agreement of Figs theoretical error gradients presented, and the limited accuracy displayed in Fig. 4(a) caused by the discontinuities in the standard LR91 cell model. The exponential error convergence rate achievable using p-refinement is emphasised in Fig. 5(b). Note that we use a conductivity of 0.5 S m À1 for all our convergence figures.
In order to perform a robust investigation of conduction velocity (CV), we used a 6 cm domain, stimulated at one end with a ramp stimulus and used Dt = 0.01 ms; the activation time for the node at the opposite end of the domain and the CVs are presented in Table 1. We performed two sets of simulations in order to gather data on the fast and slow conductivities that we use later, respectively parallel and perpendicular to the fibres in anisotropic 2D simulations. Note how small h needs to be when using linear elements to achieve CV convergence.

2D homogeneous conductivity
Having demonstrated the superior accuracy in 1D, we move on to 2D. In this subsection, unless otherwise noted we use a time-step D t = 0.01 ms.
Before performing our experiments, we demonstrate the effect of anisotropy in our simulations. Fig. 6 shows wavefront locations (u = 0 mV) with p = 1-4 for two simulation modes at 12 ms, one with homogeneous isotropic conductivity   Fig. 5(a) shows the L 2 (H 1 ) norm of the error; this is the L 2 -in-time norm of the Sobolev k Á k 1,2 norm of the error in space. Fig. 5(b) shows the same data as Fig. 4(b), but with the exponential convergence in p highlighted instead. Data from 20 ms simulations on a 2 cm 1D domain. Dt = 0.001 ms. The errors are against a quartic reference solution with h = 10 À4 cm. The values of h given are in cm.
( Fig. 6(a)) and one with homogeneous anisotropic conductivity using fibres aligned to the x-axis (Fig. 6(b)). In the latter case, the conductivity along the fibres is the same as that for the isotropic case (1 S cm À1 ), and perpendicular to the fibres we use one-fifth of that value. Both simulations were initiated with the same stimulus current in the lower-left corner of the domain and use a mesh with mean element diameter 0.0113 cm. Note the poor accuracy of the linear case, and note further that where the anisotropic p = 1 wavefront has propagated predominantly perpendicular to the fibres in Fig. 6(b), its accuracy is even worse. This is due to the low conduction velocity in that direction, requiring better approximation properties (smaller elements, larger p) to properly capture u at the wavefront.
In our first 2D experiment we investigated a 1 cm by 1 cm 2D domain with homogeneous isotropic conductivity. A ramp stimulus was applied to a rectangular region along the bottom of the domain, generating a planar propagating wave. The measured CVs and activation times at the top-right corner of the domain are given in Table 2. Each simulation was performed with Dt = 0.01 ms and D t = 0.001 ms (asterisked) in order to investigate how the temporal error affects the Table 1 Six cm 1D domain simulation with a ramp stimulus at one end and Dt = 0.01 ms. The activation time is the time until the transmembrane potential at the node at the opposite end of the domain (at 6 cm) passes up through 0 mV. CVs are computed using the nodes at 2 and 4 cm.  results. The CV is an average taken over many randomly selected point pairs in the domain. Percentage errors in conduction velocity are also presented in the table, using the finest quartic simulation with the same value of Dt as a reference. Note the high accuracy of the quartic simulations, and the minimal variation in the percentage errors caused by varying Dt. Details on the degrees of freedom are in Table 5. The data shows that p = 4 on the coarsest mesh or p = 3 with Dt = 0.01 ms on the second coarsest mesh ought to be preferred over p = 1 on the finest mesh, given that this produces at worst a halving of the percentage error using a linear system which takes less than half the time to solve. Alternatively, p = 2 on the second coarsest mesh with Dt = 0.01 ms produces roughly equivalent accuracy to the finest linear solution, but does so using a linear system which can be solved eight times faster.

Plain square domain with cubic fibre field
We studied simulation on a 1 cm by 1 cm domain using Dt = 0.01 ms and inhomogeneous anisotropic conductivity, with the conductivity along the fibres the same as that used in the isotropic case above (1 S m À1 ), and one-fifth of this value in the perpendicular direction. The fibre field is defined by a cubic polynomial designed to represent cardiac fibres rapidly changing orientation (see Fig. 7(a)) and is integrated by reading the value of the vector field at each Gauss point [47] during matrix construction. The domain was stimulated with a ramp in the bottom-left corner of the domain. The measured activation times at the four points shown in Fig. 7(a) are given in Table 3. Note how cubic and quartic elements display superior accuracy to all tested linear simulations by the second coarsest level. Details on the degrees of freedom are in Table 5. Fig. 7(b) shows the fibre vector field on a 1 cm by 1 cm domain with two holes representing blood vessels passing through the simulation plane. We performed this study to demonstrate the applicability of the method when the domain contains Table 2 Activation times, conduction velocities, percentage conduction velocity errors and linear system solve times for a variety of 2D meshes of X = [0, 1] Â [0, 1] cm. The first timings presented are for a single time-step only and use preconditioned conjugate gradients (PCG) with an incomplete LU (ILU) decomposition of the portion of the system matrix corresponding to the linear basis functions as a preconditioner, and the second times all code on a time-step iteration (ILU PCG + cell updates, etc.). The stimulus was along the bottom edge of the domain and the activation time is for the node in the top-right corner. Asterisked basis degrees indicate that the simulation was run with D t = 0.001 ms instead of the usual Dt = 0.01 ms. The percentage errors use as a reference the p = 4 simulation on the finest mesh with the same value of Dt. fine structure that must be properly captured using small elements. This vector field was generated using the Laplace-Dirichlet approach and the time-step used was Dt = 0.01 ms. The activation times at the three points shown in Fig. 7(b) are given in Table 4. Our results show that the second coarsest mesh with p = 3 can half the worst percentage error in activation time using a linear system which can be solved twice as quickly when compared to p = 1 on the finest mesh, or with p = 4, we can reduce the error to one-tenth that of p = 1 on the finest mesh with a linear system which takes slightly longer to solve. See Fig. 8 for some wavefront locations using various meshes and values of p; note how poor the wavefront location can be when p = 1 even on highly refined meshes, and how on the second coarsest mesh with p = 3 the results are better than the computationally twice-as-demanding p = 1 on the finest mesh.

Degrees of freedom
Details on the degrees of freedom for the various meshes and basis degrees are presented in Table 5.

Summary of results
We have shown how to successfully employ high-order finite element methods for simulation of the cardiac monodomain system (2), meeting or exceeding theoretical error convergence rates. The method achieves our goal of  Fig. 7(a), results are displayed in Table 3 and stimulation was a ramp in the bottom-left corner. For Fig. 7(b), results are displayed in in Table 4 and stimulation was a ramp along the bottom of the domain.

Table 3
Activation times in the 1 cm by 1 cm domain with fibre orientation and four measurement points shown in Fig. 7 improved efficiency over linear finite elements; we produce highly accurate numerical solutions cheaply, as can be seen for the homogeneous conductivity by comparing the finest linear solution to the second coarsest cubic Dt = 0.01 ms solution in Table 2. In this case, taking the finest quartic solution as a reference, we see that we obtain a six times smaller percentage error in activation time with a linear system that takes one-third of the time to solve. It can be used with isotropic and anisotropic conductivity, and geometries which include microstructure such as blood vessels passing through the tissue. The improved efficiency is key because good numerical approximations can take days to obtain with present technology. As shown in Table 2, convergence in the conduction velocity requires very fine meshes when working with p = 1. Even at h = 0.0111 cm, the CV error is still around 2.5%; at that level, the error in the position of the wavefront will become very large during long-time, whole-heart (large domain) simulations.
In the case of inhomogeneous conductivity (Table 3) the second coarsest cubic solution is the point at which the accuracy starts to beat that of the finest linear solution. We note that in this case, the coarsest mesh should not be used for simulation; due to the low conductivity perpendicular to the fibre direction (one-fifth of that parallel to the fibres), we see that the activation times at nodes one and two overshoot the converged activation times. This is due to the effective mesh size being coarser when the conductivity is lower, introducing error into the solution. This explanation is supported by the fact that node three, being connected to the stimulus site by a straight line to which all the fibres along its length are aligned, does not experience any activation time problems. See also Table 1, which shows that the same effect occurs in 1D on very coarse meshes when using a low conductivity equal to that for the slow direction here (see h = 0.1, p = 4). Because error due to the coarse mesh discretisation of u will be compensated for by the accuracy gained as we increase p, we believe that this effect is due to insufficient resolution in the spatial discretisation of w, as this does not change as we refine p. Further investigation is needed to confirm this.
When holes are present in the domain, the simulations are much slower (Table 4) due to the increased number of elements required to mesh around the holes (see Fig. 8). Here we recommend using, for example, p = 2 on the third coarsest mesh for a sixfold accuracy improvement using a linear system which can be solved in half the time when compared to the finest mesh with p = 1. This accuracy can be seen by the position of the wave fronts in Figs. 8(c) and 8(d). Alternatively, the second coarsest p = 2 simulation can be seen as providing considerable accuracy improvement over p = 1 on the third coarsest mesh with a linear system which takes the same amount of time to solve. An example high-resolution anatomically-derived heart mesh has h = 0.0125 cm [33]; this is comparable with the third coarsest mesh here. See Table 6 for more detail and other possible choices of mesh and p.

Limitations on error reduction
We note that Figs. 4(b) and 5(b) show some unexpected behaviour, apparent in Fig. 4(b) as a crossing over of the errors for the p = 3 and p = 4 solutions, and in Fig. 5(b) as a smaller-than-expected error reduction when going from p = 3 to p = 4 on the coarser meshes.
We are not certain of the cause of this. Possible explanations include it being caused by the error introduced by our approximation of P hp by i, or that the smoothness k of the cell model is limiting, for example where Lemma 2.1 is applied. Table 4 Activation times in the 1 cm Â 1 cm domain with holes, fibre orientation and three measurement points shown in Fig. 7(b) using various h and p values. The meshes are identified by maximum element diameter because the mean element size would convey little information due to the very small elements near the holes. The timings presented are for the linear system only (ILU PCG) and for the whole-timestep (ILU PCG + cell updates, etc.). More information about the meshes used is presented in Table 5, and some plots of the wavefront at 17 ms are shown in Fig. 8 Table 4. Simulations using the very simple FitzHugh-Nagumo model [48] instead of LR91 withp ¼ 4 do not display any problems, neither dop ¼ 6 simulations with LR91 for p = 1-4. Regardless of this, the efficiency remains compelling, and we note that the time-integrated Sobolev norm spatial error ( Fig. 5(a)) does not appear to be so strongly affected.

Assumptions on the regularity of the cell model
Note that the requirement in Theorem 2.1 that f be Lipschitz in u is not satisfied for general cell models. However, if we suppose that u is bounded, the Lipschitz property will hold. For simulations, there is evidence to suggest this boundedness occurs for cell models modified to take electroporation currents into account [49][50][51]. Indeed, if we do not have boundedness, we can say that there is a problem in the simulation study design rather than in the numerics; it is certain that realworld quantities such as the transmembrane potential will be bounded in general, and all the more stringently whilst the cell is still functional. We also need to assume that the [Ca 2+ ] is bounded away from zero; again this is likely to be the case in a functioning cell due to leak currents.

Representation of I total
The approach that we take to I total could be compared to a more standard scheme which used a piecewise linear I total . Neither approach involves making any statements about what I total looks like; even though I total is smooth on elements and has discontinuous derivative at element boundaries, the positioning of the elements themselves is essentially arbitrary within the domain. It is clear therefore that no claims are really being made about any lack of smoothness in I total . Therefore, the assertion that we make in using the p-version that the smoothness of u (depending on the smoothness of I total ) is sufficient so as not to limit the rate of convergence the p-version is not really any different from the assumptions that are implicitly made when discretising X for any standard application of FEM to the cardiac equations.

Gauss points for the cell model
Another approach to approximating the cell model variables w would be to construct a nodal basis using the Gauss points [47]. Instead of integrating the right-hand side of Eq. (3) using the mass matrix, we could do soby computing the current at each node and then integrating it against each basis function on each element using quadrature. This would work, but would be considerably more computationally demanding than the matrix-based assembly we use [52].
Another alternative would be to use the same nodal basis for u and for w; this would allow matrix-based assembly, but without the need for the change of basis (see Section 2.4). However, without the use of the hierarchical basis for u, our method would no longer be suitable for the p-adaptivity that we wish to investigate in future work.

Parallel simulation and timings
We note that while we have presented timings for both solving the linear system and also for all code that runs on each iteration (the difference between the two being primarily due to the cell model updates), less attention should be paid to the all-code timings. This is because our simulations were performed on one processor, and the cell state variable PDEs can be parallelised straightforwardly. As the future of cardiac simulation will be on massively parallel machines, this is important.
A consideration for implementing the approach presented in this work in a parallel computing environment is that as much of the work as possible should be achievable using data local to each element or patch of elements so that when the workload is divided up between different processors, the communication between them can be kept to a minimum. Our approach satisfies this requirement; for example, the change of basis that we perform to switch the current representation from nodal to hierarchical is performed locally on each element. In fact, the cell model and change of basis code could in future be implemented as GPU code; this has promise as some workers have already demonstrated that considerable speed-up is possible [53,54]. Table 6 Simulations on the mesh with holes in the context of the prescribed error tolerances for percentage activation time error at which they would be acceptable. The average single-iteration linear system solve time ratio (time for simulation)/(time for finest linear simulation) is given. Finest mesh p = 4 solution taken to be the ''true'' solution. Errors are worst-case over the three test nodes used. Full data in Table 4 Timings in general should be treated with some caution, as there are many factors which can affect them. At the most basic level, solving the linear system using an iterative method requires matrix-vector multiplications (matvecs). With the sparse data structures used in this sort of problem, the cost of each matvec is a function of the number of nonzero entries in each row of the system matrix and of the number of rows it has; note that the last two vary with h and p. Each solve will require multiple matvecs to achieve convergence, with the number needed varying with the condition number of the system matrix. Thus, in addition to depending on h and p, the relative time costs of solving different linear systems will vary according to the preconditioner, the hardware type (parallel architecture, CPU cache size) and the software implementation. In this work, we have not looked at all of these; for example, further study investigating efficient preconditioning [29,7,8] for the linear system we have constructed may lead to further overall efficiency gains.

Conclusion
Because of the efficiency we have demonstrated, our work leads us to recommend preferring higher-order finite elements for cardiac monodomain simulation. With careful choice of meshes, this has the potential to reduce the amount of time required to perform whole-heart simulations; these have linear systems which are many times larger than the ones for our relatively small test simulations, so the overall efficiency savings will likely be substantial. Across all meshes, the advantages of using high p are considerable. In the case of anisotropic conductivity, it is likely that obtaining even larger gains will be possible by using our method together with meshes which are finer perpendicular to the fibre orientation than parallel to it in order to compensate for the reduced conduction velocity in that direction.
A recent paper comparing the convergence properties of eleven monodomain solvers [55] introduced a benchmark problem which can be easily applied to new codes so that they can be robustly evaluated relative to existing software. In future work, we will further evaluate our method by applying it to this problem.
Our one-and two-dimensional work using the monodomain should be seen as a proof-of-concept; in the future we shall look to extend this to three dimensions, bidomain and realistic cardiac geometry. The hierarchical nature of the elements we use lends them to adaptive methods which have the potential to bring about further significant gains in simulation efficiency.