EPIC: The Elliptical Parcel-In-Cell method

We present a novel approach to simulating general two-dimensional flows, which could also be applied to other areas of continuum mechanics. The approach generalises the Particle-In-Cell (PIC) method, originally used to model two-dimensional hydrodynamics, by representing fluid elements by elliptical parcels. The rotation and deformation of these parcels are calculated, and parcels split beyond a critical aspect ratio. Conversely, small parcels are eliminated by merging them with larger ones. The elliptical parcels well represent the flow deformation and have excellent conservation properties. In contrast to earlier work that combined PIC with elliptical parcels that split and merge, a vorticity-based framework is used, and accurate integration over ellipses is performed efficiently by two-point Gaussian quadrature. The small-scale mixing associated with parcel splitting and merging is shown to be strongly convergent with grid resolution. The robustness, versatility, accuracy and efficiency of the new Elliptical Parcel-In-Cell (EPIC) method are demonstrated for a variety of standard test cases, and compared with a standard pseudo-spectral method. The results indicate that EPIC is a promising, Lagrangian-based alternative to grid-based methods. © 2022 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).


Introduction
The numerical representation of fluid flows and other continuum mechanical systems by Lagrangian particles or parcels (the latter objects having a finite volume) communicating with an underlying grid has a long history dating back to [1,2] (for a historical review please see [3][4][5]). There have been diverse applications, ranging from two-dimensional vortex interactions, to galaxy dynamics and even to gaming. There have also been diverse approaches, such as in the way the particles/parcels communicate with the underlying grid (e.g. interpolation). Yet, despite this, fully grid-based methods are overwhelmingly preferred by the scientific community, especially in fluid dynamics (hydrodynamics) and magnetohydrodynamics. For example, some of the most intensive uses of computer resources are in weather forecasting and in climate modelling, and those models almost without exception use a fixed computational grid. Largely this is driven by developments in parallelism and the need to parametrise processes such as unresolved convection, radiation and gravity-wave drag, which are often designed around grid-based approaches. Having a fixed number of computational elements allows certain efficiencies and appears simplest.
Fluid flows, in particular the ultra-high Reynolds number turbulent fluid flows found in atmosphere/ocean dynamics, exhibit energetic fine-scale structures like fronts, storms, etc., which play an important role in the evolution of the system. These are not well captured using a fixed computational grid, and invariably significant errors accrue when under-resolving (or not resolving) such structures. While adaptive mesh refinement (AMR, e.g. [6][7][8][9]) helps to provide additional resolution in the most turbulent regions of a flow field and can go part way in following complex structures if a moving mesh is used, this method can be challenging to implement in an efficient way. Arguably, a more natural approach is to use Lagrangian computational elements (particles/parcels). They offer great flexibility and may be relatively easily focused in regions requiring greater resolution. They also offer great adaptability, allowing one to increase or decrease the number of parcels in response to changing conditions.
Other advantages include a fully explicit representation of turbulent mixing (especially in the Elliptical Parcel-In-Cell method we introduce below; this is in contrast to Eulerian methods, where there is numerical diffusion in all prognostic fields), and an ideal framework for the Lagrangian advection of tracers (both active and passive). Each parcel can carry an arbitrary list of attributes besides its position: shape, volume, mass, vorticity, as well as thermodynamical, chemical and even biological attributes. A separate partial differential equation for each attribute is not required as in fully grid-based methods. Any materially-conserved quantity remains exactly conserved (e.g. a passive tracer, and density or mass in an incompressible fluid), at least when parcel splitting/merging is done consistently (see below and [4]). This means that conservative tracer dynamics can be computed almost for free; only a space in memory is required. Other attributes (e.g. vorticity) are updated by ordinary differential equations (ODEs). Parcels also sample subgrid scales, and thereby offer an alternative to grid-based subgrid closure schemes.
Parcel-based approaches, however, do have potential shortcomings. Besides the greater challenges of parallelisation mentioned above, two serious issues are (1) the difficulty in maintaining an adequate number of parcels in each grid cell, and (2) an underestimation of turbulent mixing. The first issue can also lead to problems in representing incompressible flows, where one would ideally like to ensure that the gridded volume (or area in two dimensions), obtained by interpolation from the parcels, remains constant. The Lagrangian advection of a finite set of parcels does not automatically preserve gridded volume. The second issue regarding mixing is discussed in more detail in [10]. Although this previous study included a description of parcel stretching and mixing at small scales, convergence analysis suggested that too little mixing occurred at small scales. This is important for example in properly modelling (in a statistical sense) thermodynamical processes and chemical reactions on small scales. A lack of mixing may be the result of either the formulation of parcel stretching, or due to the lack of explicit subgrid scale parcel-parcel interactions [e.g. 11,12] or processes at scales smaller than individual parcels.
The current paper makes major progress on both of these issues. We develop an approach which ensures adequate parcel representation in each grid cell and the preservation of the parcel-interpolated gridded volume, without the need to occasionally reset the parcel distribution. We also show that deformable elliptical parcels significantly improve the representation of subgrid scale turbulent mixing.
The article is structured as follows. In the next section, we detail the key elements of the EPIC method after first reviewing the basic aspects of the Particle/Parcel-In-Cell (PIC) method. In Section 3, the structure of the numerical algorithm is presented, including a flowchart. In Section 4, we examine the performance of the EPIC method in three diverse test cases. The results allow us to determine numerical parameter values by balancing cost and accuracy. The objective is to produce the most efficient method for a chosen underlying grid resolution, and to provide a simple user interface wherein only initial data on the grid is required, as in conventional fully grid-based approaches. Our conclusions are offered in Section 5. There we note that the present two-dimensional method is easily extended to three dimensions, with all of the same advantages. We will report on this extension in a subsequent paper.

Parcel-In-Cell (PIC): a brief overview
The PIC method was developed to accurately treat advection by following Lagrangian parcels (or particles) directly, while making use of an underlying grid structure to accelerate the calculation of interactions between parcels (the parcels communicate via the grid). For a large number of parcels, the direct calculation of all such interactions is computationally expensive. Instead, the PIC method interpolates parcel properties to the underlying grid, where well-established efficient methods exist, e.g. for inverting Poisson's equation and for calculating spatial derivatives. Quantities computed on the grid are then interpolated to the parcel positions, e.g. the velocity field required to move each parcel. Then, simple ODEs are used to update parcel positions and possibly other attributes.
The version of the PIC method developed by [2], for example, followed clouds of point vortices to study the behaviour of an unstable jet (two side-by-side bands of approximately uniform vorticity). Point vortices are particles of infinite vorticity, zero area, but finite circulation (the product of vorticity and area). Their direct interaction can be computed in theory, but this is much more expensive than first building a gridded vorticity field from the point vortices, inverting Poisson's equation and computing the velocity field on a regular grid, and finally interpolating this velocity field to each point position. Each particle requires two ODEs to update its x and y coordinates. The approach is simple and efficient, and when it was first developed, it led to a breakthrough in the understanding of nonlinear, inviscid two-dimensional flows.
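The PIC cycle just described (deposit vorticity on the grid, invert Poisson's equation, interpolate the velocity back, advance the particle ODEs) can be sketched as follows. The doubly periodic domain, bilinear weights, FFT inversion and forward-Euler step are illustrative assumptions, not the scheme of [2]:

```python
import numpy as np

def pic_step(xp, yp, circ, n, L, dt):
    """One PIC cycle for point vortices on a doubly periodic n-by-n grid:
    deposit circulation, invert Poisson's equation spectrally, interpolate
    the velocity back, and advect with a forward-Euler step (sketch only)."""
    dx = L / n
    fx, fy = xp / dx, yp / dx
    i0, j0 = np.floor(fx).astype(int) % n, np.floor(fy).astype(int) % n
    ax, ay = fx - np.floor(fx), fy - np.floor(fy)
    # 1) bilinear deposit of circulation -> gridded vorticity
    zeta = np.zeros((n, n))
    for di in (0, 1):
        for dj in (0, 1):
            w = (ax if di else 1 - ax) * (ay if dj else 1 - ay)
            np.add.at(zeta, ((i0 + di) % n, (j0 + dj) % n), w * circ / dx**2)
    # 2) invert lap(psi) = zeta in spectral space
    k = 2 * np.pi * np.fft.fftfreq(n, d=dx)
    kx, ky = np.meshgrid(k, k, indexing="ij")
    k2 = kx**2 + ky**2
    k2[0, 0] = 1.0            # avoid 0/0; the mean of psi is set to zero below
    psih = np.fft.fft2(zeta) / (-k2)
    psih[0, 0] = 0.0
    # 3) u = -psi_y, v = psi_x on the grid
    u = np.real(np.fft.ifft2(-1j * ky * psih))
    v = np.real(np.fft.ifft2(1j * kx * psih))
    # 4) bilinear interpolation of (u, v) back to the particles, then advect
    def interp(f):
        out = np.zeros_like(xp)
        for di in (0, 1):
            for dj in (0, 1):
                w = (ax if di else 1 - ax) * (ay if dj else 1 - ay)
                out += w * f[(i0 + di) % n, (j0 + dj) % n]
        return out
    return (xp + dt * interp(u)) % L, (yp + dt * interp(v)) % L
```

A single vortex placed exactly at a grid node induces no self-advection by symmetry, which is a standard sanity check for this kind of solver.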
The PIC method approximates the particle-particle or parcel-parcel interaction to gain efficiency (for a large number of parcels). The error depends on the resolution of the underlying grid and the number of parcels used (as well as details of the interpolation, etc.). This error is unlike that made by fully grid-based methods: it is not diffusive (no numerical diffusion is required for stability).
Parcels, unlike particles, have finite area (or volume in three dimensions). Parcels are usually overlapped to represent properties continuously on a grid. Using a small number of parcels per grid cell will invariably lead to errors in this representation, but computational cost usually dictates the grid resolution and the approximate number of parcels per cell.
The PIC method has been extended in many ways and we will not attempt to review the entire literature on this subject (some discussion can be found in [3][4][5][10]). Here we instead focus on a relatively simple form of PIC recently developed to model three-dimensional convective clouds in the atmosphere [4], and shown to provide some advantages over fully grid-based Large-Eddy Simulations (LES) [10]. The PIC method in [4] represents an incompressible density-stratified three-dimensional flow containing water vapour and liquid water by a space-filling distribution of spherical parcels of variable size. Each parcel carries a series of attributes (volume, buoyancy, specific humidity and vorticity), some of which can change in time either by condensation/evaporation or natural advection (e.g. stretching of vorticity). Gridded properties are obtained from the parcels using simple tri-linear interpolation, and these are used to compute the velocity field and the vorticity tendency on the grid. These gridded quantities are then interpolated back to the parcels, again using (reverse) tri-linear interpolation. Then ODEs are solved to move the parcels forward and to update their vorticity. To model the natural scale cascade in a turbulent three-dimensional flow, an estimate of parcel stretch is used to approximate each parcel's total deformation; however, the parcels are kept spherical at all times. When this stretch exceeds a threshold, the parcel is split into two equal parts along the direction of the local vorticity vector. This splitting will create progressively smaller parcels, and a growing population of them. To limit this process, small parcels below a threshold volume ((1/6)³ of the grid cell volume) are eliminated. Their properties are given back to other parcels in their vicinity by interpolation to and from the grid. In this way, conservation of total volume, mass, etc. is preserved.
The present paper describes a further extension of the PIC method which allows for deformable, elliptical parcels. Here we develop the two-dimensional algorithm for a dry density-stratified flow and defer the three-dimensional algorithm to a subsequent paper. A similar extension (the Deformable Particle-In-Cell method, which was the first method to combine a PIC method with elliptical parcels that split and merge) was recently developed in [5], who showed that elliptical parcels significantly improve the representation of fluid flows in various test cases examined. In particular, elliptical parcels much better preserve the gridded areas obtained from parcel interpolation in an incompressible flow compared to non-deformable (circular) parcels. Elliptical and ellipsoidal parcels have also been used in other approaches, such as Smoothed Particle Hydrodynamics (a meshfree approach) and the moving particle semi-implicit (MPS) method [e.g. 13-15].
The EPIC method developed here could be considered to fall into a wider class of Deformable Parcel-In-Cell methods: we use the name EPIC as it makes it explicit that ellipses are used, and makes it easier to refer to the entire framework presented here. The main difference between our work and [5] is in the way parcel attributes are transferred to the grid, and vice-versa. Here we use a much simpler approach involving two support points placed between the foci of the ellipse, rather than a much more expensive numerical area integration. These two points correspond to the Gaussian quadrature points for an ellipse, as shown in [16], and there is a simple generalisation to 3D ellipsoids requiring four support points [17]. Another difference is that our parcels are space filling (sum to the domain area), whereas they are not in [5] (they are largely kept from overlapping). In contrast to DPIC, the parcels in the EPIC framework carry vorticity, which evolves only due to horizontal buoyancy gradients. We document the procedure for obtaining the velocity field from vorticity below. Furthermore, the parcel splitting and mixing procedures differ. During splitting, we additionally preserve the second-order spatial moments (besides volume and centroid). We identify mergers using the distance between centres and the parcel areas, whereas [5] does this by locating two or more parcels in a grid box on a finer grid. The construction of the resulting merged parcel also differs, as illustrated in Fig. 1 (for details, see below). Our criterion aims to preserve the second-order moments as well as possible. We also preserve parcel gridded area with much smaller errors by applying an area-based correction to the parcel positions, avoiding any need for parcel re-gridding. Finally, there are differences in interpolation: here we use bi-linear interpolation whereas [5] uses a step function (nearest grid cell).

The flow model
We here develop the EPIC method for simulating two-dimensional (2D) inviscid, incompressible density-stratified flows. However, we stress that the method is much more widely applicable, even beyond fluid dynamics (see discussion). The flow model here is still sufficiently complex to stringently test the new method and relates to a problem of significant importance in environmental modelling. Moreover, it is readily extended in multiple ways, e.g. to include moisture, a magnetic field, etc.
We use x for the horizontal coordinate and y for the vertical one (gravity points downwards in y). We assume the density ρ varies only slightly from a constant background value ρ_0. Then, making the Boussinesq approximation, we arrive at the vorticity form of the equations of motion,

Dζ/Dt = ∂b/∂x , (1)

Db/Dt = 0 , (2)

where b = g(ρ_0 − ρ)/ρ_0 is the buoyancy, g is the acceleration due to gravity and D/Dt denotes the material or Lagrangian derivative (see e.g. [18]). Since the flow is incompressible (u_x + v_y = 0), we can introduce a streamfunction ψ in terms of which u = −ψ_y and v = ψ_x. Then the definition of vorticity leads to a Poisson equation for ψ:

∇²ψ = ζ . (3)

To solve this, we need to consider boundary conditions. Here we assume the domain is periodic in x and is confined between impermeable boundaries at y = y_min and y = y_max. As the flow is inviscid, only the normal component of the velocity must be zero on the y boundaries, i.e. v = 0 there. But since v = ψ_x, then ψ must be constant on each y boundary.

Fig. 1. Merger of three elliptical parcels in DPIC [5] and EPIC. While the centroid is located at the same position, the rotation angle, semi-major and semi-minor axes are not identical.
Due to the periodic x boundaries, and the fact that the governing equations Eq. (1) and Eq. (2) have no explicit x dependence, the solution of Eq. (3) for ψ is most easily accomplished in spectral space after a Fast Fourier Transform (FFT) in x. However, this does not generally provide the entire velocity field since ζ may have a component which is independent of x, say ζ̂₀(y) in spectral space. But using again the definition of vorticity, we have dû₀/dy = −ζ̂₀. Integration in y then provides the x-independent part of u, apart from a constant. The latter is determined by fixing the global average value of u, which is constant by momentum conservation. Details can be found in Appendix A.
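The recovery of the x-independent flow component can be sketched as below, assuming a simple trapezoidal integration in physical space (the paper performs this step in spectral space; see Appendix A):

```python
import numpy as np

def mean_flow_u(zeta0, y, u_avg=0.0):
    """Recover the x-independent velocity u0(y) from the x-averaged
    vorticity zeta0(y) by integrating d(u0)/dy = -zeta0, then fixing the
    free constant so that the global average of u0 equals u_avg (sketch)."""
    # trapezoidal integration of -zeta0 upwards from y[0]
    u0 = np.concatenate(
        ([0.0], np.cumsum(-0.5 * (zeta0[1:] + zeta0[:-1]) * np.diff(y))))
    # fix the constant via the domain average of u0
    avg = np.sum(0.5 * (u0[1:] + u0[:-1]) * np.diff(y)) / (y[-1] - y[0])
    return u0 + (u_avg - avg)
```

For a uniform vorticity band zeta0 = 1 on [0, 1], this returns the linear shear u0 = 1/2 - y with zero mean, as expected from direct integration.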

Elliptical parcels: basic representation and dynamics
Given an Eulerian velocity field u(x, t), a Lagrangian fluid parcel i at position x_i(t) evolves in time t according to

dx_i/dt = u_i(t) , (4)

where u_i(t) denotes the parcel velocity interpolated from the grid. We represent parcels in coordinates relative to their centres by the ellipse equation [17]

x^T B^{-1} x = 1 , (5)

where ^T denotes a transpose, and B is a symmetric positive-definite matrix with constant determinant det B = a²b². Here a and b are the semi-major and semi-minor axis lengths, and their product multiplied by π gives the ellipse area (here subscripts i are suppressed for simplicity).
Unlike point parcels, elliptical parcels can deform due to local gradients in the velocity field,

S = ∇u = [u_x u_y; v_x v_y] , (6)

as described in [19]. Assuming a locally linear background flow

u(x, t) = S(t) x(t) (7)

in a frame of reference moving with the parcel, the time derivative of Eq. (5) yields

dB/dt = B S^T + S B (8)

after making use of Eq. (4) and Eq. (7) - see [17,19] or Appendix B. Although we only apply this to two-dimensional (2D) flows in this paper, Eq. (8) is also valid in 3D, and indeed in any dimension. For a 2D incompressible flow, i.e. v_y = −u_x, the components of Eq. (8) reduce to

dB₁₁/dt = 2(u_x B₁₁ + u_y B₁₂) , (9)

dB₁₂/dt = v_x B₁₁ + u_y B₂₂ , (10)

where we have exploited the fact that the determinant of B is preserved in time. This property further allows us to store only two matrix components and compute B₂₂ = (B₁₂² + det B)/B₁₁ when needed. This makes EPIC as memory efficient as possible.
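As an illustration, the reduced shape-matrix system (the component form of Eq. (8) for trace-free S, with B22 recovered from the invariant determinant) can be integrated numerically. The RK4 stepper and strain values below are assumptions for the sketch, not the paper's time stepper:

```python
import numpy as np

def b22(b11, b12, ab):
    # recover B22 from the invariant det(B) = (a*b)^2
    return (ab**2 + b12**2) / b11

def evolve_shape(b11, b12, ab, S, dt, nsteps):
    """Integrate dB11/dt = 2(ux*B11 + uy*B12), dB12/dt = vx*B11 + uy*B22
    with RK4, storing only B11 and B12 (sketch only)."""
    ux, uy, vx = S[0, 0], S[0, 1], S[1, 0]
    def rhs(p, q):
        return 2.0 * (ux * p + uy * q), vx * p + uy * b22(p, q, ab)
    for _ in range(nsteps):
        k1 = rhs(b11, b12)
        k2 = rhs(b11 + 0.5 * dt * k1[0], b12 + 0.5 * dt * k1[1])
        k3 = rhs(b11 + 0.5 * dt * k2[0], b12 + 0.5 * dt * k2[1])
        k4 = rhs(b11 + dt * k3[0], b12 + dt * k3[1])
        b11 += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0
        b12 += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0
    return b11, b12
```

For a pure strain S = diag(γ, −γ) acting on an initially circular parcel, the exact solution is B11 = exp(2γt), with B12 = 0 and B22 = exp(−2γt), which the sketch reproduces.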
In [5], the elliptical parcel deformation was computed instead with the simpler equation dM/dt = S M, where M is related to B above by B = M M^T. This leads to our equation for B in Eq. (8). Nevertheless, here we evolve B since this is needed to find the ellipse semi-axis lengths a and b from the eigenvalues of M M^T = B (see below and section 4.3 in [5]). Furthermore, B is more useful than M for parcel splitting and merging.

Interpolation
As mentioned in the introduction, we only use two support points to interpolate from the parcels to the grid and vice versa. These points lie between the foci of the ellipse at x± = x ± (1/2) c â (these are the Gaussian points, see [16]), with semi-focal length

c = √(a² − b²) (11)

and principal eigenvector (lying along the major axis)

â = (B₁₂, a² − B₁₁) / |(B₁₂, a² − B₁₁)| , (12)

where a² is the largest eigenvalue of B.
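A sketch of locating the two support points from the stored shape-matrix components, assuming the semi-focal length c = √(a² − b²) and the standard eigen-decomposition of a symmetric 2×2 matrix:

```python
import numpy as np

def support_points(x, b11, b12, b22):
    """Two support points of an ellipse with shape matrix B (eigenvalues
    a^2 >= b^2), placed at x +/- (c/2) a_hat along the major axis, with
    c = sqrt(a^2 - b^2) the semi-focal length (sketch only)."""
    tr, det = b11 + b22, b11 * b22 - b12**2
    disc = np.sqrt(max(0.25 * tr**2 - det, 0.0))
    a2 = 0.5 * tr + disc                   # largest eigenvalue, a^2
    b2 = 0.5 * tr - disc                   # smallest eigenvalue, b^2
    # principal eigenvector along the major axis
    if abs(b12) > 1e-15:
        ahat = np.array([b12, a2 - b11])
    else:
        ahat = np.array([1.0, 0.0]) if b11 >= b22 else np.array([0.0, 1.0])
    ahat /= np.linalg.norm(ahat)
    h = 0.5 * np.sqrt(a2 - b2) * ahat      # half the semi-focal length
    return x + h, x - h
```

For a circular parcel (a = b) both points coincide with the centre, so the rule degenerates gracefully to single-point interpolation.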
Following the notation of [4], the value at a grid point x̄ = (x̄, ȳ) of some gridded field q(x̄, t) is found by interpolating the attribute q_i of all parcel support points x_i^± lying within the four surrounding grid cells, P(x̄). As we are using two-point Gaussian quadrature, there is an additional weight of 1/2 to apply:

q(x̄) = (1/V(x̄)) Σ_{x_i^± ∈ P(x̄)} (1/2) φ(x̄, x_i^±) V_i q_i , (14)

where analogously

V(x̄) = Σ_{x_i^± ∈ P(x̄)} (1/2) φ(x̄, x_i^±) V_i (15)

is the gridded area and φ(x̄, x_i^±) are the bi-linear interpolation weights.
Similarly, the interpolation from the grid to the parcels is given by

q_i^± = Σ_{x̄ ∈ G_i^±} φ(x̄, x_i^±) q(x̄) , (16)

q_i = (1/2)(q_i^+ + q_i^−) , (17)

where G_i^± stands for the four grid points of the cell in which the support point x_i^± is located. The two support points do not necessarily lie in the same cell. The sums in Eq. (14) to Eq. (17) are efficiently computed by summing over all parcel support points.

Fig. 2. An elliptical parcel splitting into two parcels of half the area and equal shape, with the same total centroid and total second order moments as the original parcel.
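The parcel-to-grid step, Eq. (14)-(15), can be sketched as below. A doubly periodic grid is assumed for brevity; the actual method treats the y boundaries with halo cells, as described later:

```python
import numpy as np

def deposit(points, areas, values, n, dx):
    """Parcel-to-grid interpolation in sketch form: each support point of a
    parcel is deposited with bilinear weights and an extra factor 1/2,
    since every parcel contributes through two support points.
    Doubly periodic n-by-n grid for simplicity."""
    qV = np.zeros((n, n))
    V = np.zeros((n, n))
    for (xs, ys), vol, q in zip(points, areas, values):
        fx, fy = xs / dx, ys / dx
        i0, j0 = int(np.floor(fx)), int(np.floor(fy))
        ax, ay = fx - i0, fy - j0
        for di, wx in ((0, 1 - ax), (1, ax)):
            for dj, wy in ((0, 1 - ay), (1, ay)):
                w = 0.5 * wx * wy * vol      # factor 1/2: two support points
                qV[(i0 + di) % n, (j0 + dj) % n] += w * q
                V[(i0 + di) % n, (j0 + dj) % n] += w
    return np.where(V > 0, qV / np.maximum(V, 1e-300), 0.0), V
```

The construction conserves the total deposited area (each support point carries half the parcel area), and a uniform attribute is reproduced exactly on the grid.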

Parcel splitting
When the elliptical parcels evolve in an incompressible velocity field with non-zero strain, they deform according to Eq. (8) while preserving their area. This can result in extremely elongated parcels with aspect ratios λ = a/b ≫ 1. By contrast, a material area would also bend as it stretches: the elliptical deformation only captures the local strain (or average strain over the area). As the elliptical parcels cannot bend, we instead split them when λ > λ_max (a user-defined threshold > 2) or when their area is bigger than a maximum prescribed parcel area V_max = Ṽ_max V_cell (Ṽ_max is also user-defined; see the recommended parameter settings below). Although the individual parcel area is not preserved when splitting or merging occurs, the total area (i.e. the sum over all parcels) is conserved at all times. Parcel merging, as discussed in Section 2.6, reduces the number of parcels per grid box, while the maximum area criterion ensures that we have at least a certain minimum number of parcels per grid cell.
A parcel is split into two equal-shaped ellipses ('child parcels') having half the area of the 'parent' parcel, as illustrated in Fig. 2. Their centres are separated by a distance h = (√3/4) a from the centre of the parent parcel x, and along the direction of the eigenvector â corresponding to the largest eigenvalue, i.e. the new centres are x ± h â. This construction guarantees that the total area, centroid and second-order spatial moments are conserved. This also results in a simple equation for the shape matrices of the child parcels:

B₁ = B₂ = B − (3a²/4) â â^T . (18)

Other parcel properties such as vorticity and buoyancy remain unchanged. A complete derivation of Eq. (18) is given in Appendix C. Note that the parcel splitting used by [5] does not conserve the second-order moments; instead the parcels are prevented from overlapping. A small overlap is required to conserve these moments.
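A sketch of the splitting operation; the child shape matrix B − (3a²/4) â âᵀ used here follows from conserving area, centroid and second-order moments (cf. Appendix C), and should be read as a derivation sketch rather than the paper's exact implementation:

```python
import numpy as np

def split(x, b11, b12, b22):
    """Split an ellipse (centre x, shape matrix B) into two equal children:
    centres at x +/- h*a_hat with h = sqrt(3)*a/4, child shape matrix
    B' = B - (3*a^2/4) a_hat a_hat^T (sketch only)."""
    B = np.array([[b11, b12], [b12, b22]])
    evals, evecs = np.linalg.eigh(B)    # ascending eigenvalues
    a2 = evals[-1]                      # largest eigenvalue, a^2
    ahat = evecs[:, -1]                 # major-axis direction
    h = np.sqrt(3.0 * a2) / 4.0         # = sqrt(3)*a/4
    Bc = B - 0.75 * a2 * np.outer(ahat, ahat)
    return (x + h * ahat, Bc), (x - h * ahat, Bc)
```

For a parent with semi-axes a = 2, b = 1 (B = diag(4, 1)), the children are unit circles centred at ±√3/2 along the major axis; a direct moment check confirms the total area, centroid and second-order moments match the parent.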

Parcel merging
Splitting leads to an ever-increasing number of small parcels. In order to limit computational cost, and to model the eventual mixing that takes place due to this scale cascade, we remove parcels of area less than a prescribed minimum area V_min by merging them with neighbouring ones. In practice, V_min is chosen to be a prescribed fraction Ṽ_min of the cell area V_cell = Δx Δy. A parcel, indexed say by i, is first marked ready to be merged if its area is smaller than V_min (i.e. V_i < V_min). Then nearby grid cells are searched for the closest other parcel, say i′ (for an efficient algorithm, see Section 3.4 and Appendix D). Every small parcel i always has a closest other parcel i′: we say 'i points to i′'. Note, several parcels can point to the same parcel, and two small parcels can point to each other (they are 'double-linked'). Small parcels can point to other small parcels, because in some cases a large number of parcels in a region can become small in a single time step. In such cases, allowing mergers between small parcels ensures that merging takes place over small distances. If i′ is a large parcel with V_{i′} ≥ V_min, then i′ does not point to another parcel, but small parcels with V_i < V_min can point to i′ (all links are 'single links').
Details of the numerical algorithm which enforces these rules can be found in Appendix D, where directed graphs are used to represent different configurations of parcel links. The algorithm prevents chains of parcels from merging, as in parcel a pointing to b pointing to c and so on. This is done by breaking some single links between small parcels using an iterative approach. Otherwise, mergers could result in a highly distorted parcel and may be more difficult to parallelise. Nevertheless, the algorithm ensures that every small parcel is merged in each time step. This is important in particular for the Taylor-Green flow studied in Section 4.1, where we found that the number of small parcels increased without bound when using a simpler approach that did not ensure this. The merging algorithm results in a unique solution, except if the neighbouring parcels are equidistant from the small parcel to be merged with.
Similar to parcel splitting, parcel merging aims to preserve the total area, centroid, and second-order moments.
The merged parcel has area V = πab = Σ_i V_i (where V_i = πa_i b_i and the sum is over all merging parcels), centroid

x̄ = (1/V) Σ_i V_i x_i ,

and (tentative) second-order moments given by the shape matrix

B̃ = (1/V) Σ_i V_i [B_i + 4 (x_i − x̄)(x_i − x̄)^T] .

Note that merger is the inverse of splitting, in that two split parcels, if allowed to re-merge, would give back the original parent parcel. In general, however, the merging parcels are not identical nor aligned, so it is impossible to exactly conserve the second-order moments after merging, though area and centroid are conserved.

Fig. 3. The elliptical parcel whose centre lies in the first horizontal grid cell has a support point in the periodic extension lying in the grid cell at the opposite end of the domain.
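A sketch of the merge bookkeeping, with the tentative shape matrix chosen to match the total second-order moments (in the actual method the determinant would subsequently be normalised to the merged area; the formula here is a derivation sketch):

```python
import numpy as np

def merge(centres, Bs, areas):
    """Merge parcels: total area, area-weighted centroid, and a tentative
    shape matrix matching the total second-order moments (sketch only)."""
    V = sum(areas)
    xbar = sum(v * x for v, x in zip(areas, centres)) / V
    # second moment of an ellipse about its own centre is (V_i/4) B_i,
    # plus the parallel-axis term V_i * (x_i - xbar)(x_i - xbar)^T
    Bt = sum(v * (B + 4.0 * np.outer(x - xbar, x - xbar))
             for v, B, x in zip(areas, Bs, centres)) / V
    return V, xbar, Bt
```

Merging the two children produced by the splitting example (unit circles at ±√3/2) recovers the parent B = diag(4, 1) exactly, illustrating that merger is the inverse of splitting.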

Enforcing boundary conditions on parcel operations
We use periodic boundaries in the horizontal (x) direction and free-slip boundaries in the vertical (y) direction. The implementation of periodic boundary conditions is the same as in PIC with point-like parcels. Parcels and support points that leave the domain on one side re-enter on the opposite side. A simulation having n_x grid cells horizontally consists of the same number of grid points, i.e. the rightmost grid point of Fig. 3 is a periodic copy of grid point 0 on the left. In the example shown, one of the ellipse support points lies in grid cell 1 along with the ellipse centre, while the other lies in grid cell n_x across the periodic edge. The orange lines show how the support points contribute to the gridded values at the cell corners. Note that both support points contribute to the two grid points on the periodic edge (i_x = 0).
To ensure the x coordinate of a point lies within the periodic domain, a simple construction is used. Let x_c = (x_min + x_max)/2 denote the domain centre in x, and L_x = x_max − x_min denote the domain width. Then we use

x → x − L_x [2(x − x_c)/L_x] ,

where [s] denotes the 'integer part' of s. This integer part is −1 if x < x_min; in this case x is increased by L_x. The integer part is +1 if x > x_max; in this case x is decreased by L_x. Otherwise x is unchanged. It is assumed that x is within L_x/2 of a periodic edge, but in practice x is always within a few grid spacings Δx.
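The integer-part construction can be written as a one-line helper (a sketch; the variable names are illustrative):

```python
def wrap_x(x, x_min, x_max):
    """Map x into the periodic interval [x_min, x_max) via the integer-part
    construction; assumes x lies within L_x/2 of a periodic edge."""
    lx = x_max - x_min
    xc = 0.5 * (x_min + x_max)
    # int() truncates toward zero, i.e. takes the 'integer part'
    return x - lx * int(2.0 * (x - xc) / lx)
```

Points just beyond x_max are shifted down by one domain width, points just below x_min are shifted up, and interior points are untouched.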
For the free-slip impermeable boundaries at y = y_min and y_max, a different interpolation in the parcel-to-grid algorithm is required. We introduce a row of halo cells next to each y boundary, effectively extending the y grid points from i_y = −1 to n_y + 1. Then if an ellipse support point lies within a halo cell, see e.g. Fig. 4, each parcel attribute q_i and area V_i is interpolated to the four corners of that cell in the normal way, using Eq. (14) to obtain the product qV and Eq. (15) to obtain V. Next, after all parcel points have been interpolated, the gridded values of qV and V at i_y = −1 are added to those at i_y = 1; likewise the gridded values at i_y = n_y + 1 are added to those at i_y = n_y − 1. Furthermore, the gridded values at i_y = 0 and n_y are doubled. Finally, qV is divided by V to determine q. This construction ensures that the total parcel area Σ_i V_i equals the total gridded area Σ_x̄ V(x̄) = L_x L_y, and moreover ensures Σ_i q_i V_i = Σ_x̄ q(x̄) V(x̄) for each attribute.
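The halo folding can be sketched as follows; the array layout (columns i_y = −1 through n_y + 1) and the in-place updates are assumptions for illustration:

```python
import numpy as np

def fold_halo(qV, V):
    """Fold the free-slip halo rows (iy = -1 and ny+1) back into the first
    interior rows, double the boundary rows iy = 0 and ny, then form
    q = qV/V.  Arrays are indexed [ix, iy-stored], with iy = -1 stored in
    column 0, so the shape is (nx, ny+3).  Sketch only."""
    qV[:, 2] += qV[:, 0];  V[:, 2] += V[:, 0]      # iy = -1  ->  iy = 1
    qV[:, -3] += qV[:, -1]; V[:, -3] += V[:, -1]   # iy = ny+1 -> iy = ny-1
    qV[:, 1] *= 2.0; V[:, 1] *= 2.0                # boundary row iy = 0
    qV[:, -2] *= 2.0; V[:, -2] *= 2.0              # boundary row iy = ny
    Vi = V[:, 1:-1]
    q = np.where(Vi > 0, qV[:, 1:-1] / np.maximum(Vi, 1e-300), 0.0)
    return q, Vi
```

A uniform attribute deposited with a uniform q remains uniform after folding, which is the basic consistency property the construction must satisfy.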
Care must also be taken when interpolating from the grid to the ellipse support points, see Eq. (17). If one of these support points lies in a halo cell, interpolation requires the gridded values at the outer edge of that cell (either at i_y = −1 or n_y + 1). We only need this interpolation for the velocity field to advect the parcels, for the strain matrix to update the shape matrix B, and for the vorticity tendency to update dζ_i/dt on each parcel i. For the velocity, we assume the x component ū is symmetric across the y boundaries for simplicity, so we copy the values at i_y = 1 to i_y = −1, and likewise copy the values at i_y = n_y − 1 to i_y = n_y + 1. We assume the y component v̄ is anti-symmetric, so we do the same copy but also flip the sign of v̄. This is compatible with the no normal flow boundary conditions, v̄ = 0 at i_y = 0 and n_y. We have tested a more complicated interpolation which exploits the definition of vorticity, ū_y = v̄_x − ζ̄, but numerical tests indicate this produces only very minor differences because ellipse support points are never more than a small fraction of the grid cell width Δy from a boundary. The current procedure is efficient.
For consistency, for the strain matrix S components, we assume symmetry for ū_x and v̄_y (= −ū_x), and anti-symmetry for v̄_x. For ū_y, we use the definition of vorticity, but this requires ζ̄ in the halo regions. As there are no symmetry requirements to guide us here, we use extrapolation,

ζ̄_{i,−1} = 2ζ̄_{i,0} − ζ̄_{i,1} , ζ̄_{i,n_y+1} = 2ζ̄_{i,n_y} − ζ̄_{i,n_y−1} , ∀ i = 0, 1, ..., n_x − 1 .

This extrapolation assumes the second y derivative of ζ̄ is zero at each domain edge. For consistency, the same extrapolation is used to set the gridded vorticity tendency b̄_x in the halo regions. Notably, it is rare that parcel support points lie in the halo, and when they do, they lie close to the inner edge of the halo region near one of the y boundaries (this is ensured by the time step choice as explained in Section 3.2). Thus, such support points receive their dominant contribution from grid points along the boundaries. To ensure this, parcel centres are forced to lie within the physical domain after each full time step. The vanishing of v̄ on the y boundaries implies that the parcel centres remain within the domain, assuming accurate advection. On the other hand, a parcel near a boundary could split into two, and one part could have a centre lying in a halo. When this occurs, the parcel centre is mirrored across the domain edge and the B₁₂ component of its shape matrix B is flipped in sign. This has the effect of mirroring the entire parcel across the domain edge (cf. Fig. 5). In this way, all parcel centres remain inside the physical domain, though support points may still lie in the halos.

Incompressibility
In general, parcel methods struggle to maintain incompressibility, as a result of using a finite number of parcels to represent the flow (see e.g. [20] and section 4.8 in [4]). One common solution is to occasionally re-distribute the parcels to ensure the gridded area found by particle interpolation is the same as the actual grid box area (see e.g. [5] and references therein). For short-time integrations, the error in the interpolated gridded area can be kept small when using a sufficient number of parcels [4,5,10]. However, for long-time integrations, a remedy is required. Re-distributing parcels is excessively diffusive [5]; in particular, it destroys fine-scale structure which might be needed to resolve important features like fronts. Below, we discuss an effective alternative which preserves this fine-scale structure. We perform two operations: a divergent flow correction followed by a parcel density gradient correction. This is done at the end of every time step (it can be done less frequently but it is efficient), and usually repeated twice (tests show that this is adequate, see below). The corrections provide a systematic way to ensure the gridded area remains nearly uniform.

Fig. 6. Reduction of the average error of the gridded area V due to the divergent flow and gradient correction. The divergent flow and gradient corrections alone do not correct the gridded area as well as when they are combined.

Divergent flow correction
The basic idea here is to displace parcel centres slightly to correct for errors in the gridded area V in Eq. (15). In theory, we would like to maintain V = V_cell = Δx Δy at all times, but invariably V ≠ V_cell due to the advection of a finite number of parcels. To correct for this, we add a purely divergent displacement field d = ∇φ, where φ(x̄) is the solution to the Poisson equation

∇²φ = (V − V_cell)/V_cell .

Tests with random area errors show that this correction rapidly and monotonically reduces those errors, with the biggest error reduction occurring in the first iteration. However, the reduction in error is not as great as found when coupling this correction to the gradient correction discussed next.
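A sketch of the divergent flow correction, assuming the Poisson right-hand side (V − V_cell)/V_cell and a doubly periodic FFT inversion (the paper's channel geometry would require a non-periodic solve in y):

```python
import numpy as np

def divergent_correction(V, v_cell, dx):
    """Displacement field d = grad(phi) with lap(phi) = (V - v_cell)/v_cell,
    computed spectrally on a doubly periodic n-by-n grid (sketch only).
    Regions with excess gridded area then push parcels outwards."""
    n = V.shape[0]
    k = 2 * np.pi * np.fft.fftfreq(n, d=dx)
    kx, ky = np.meshgrid(k, k, indexing="ij")
    k2 = kx**2 + ky**2
    k2[0, 0] = 1.0                       # avoid 0/0; mean of phi set to zero
    ph = np.fft.fft2((V - v_cell) / v_cell) / (-k2)
    ph[0, 0] = 0.0
    dxp = np.real(np.fft.ifft2(1j * kx * ph))   # d_x = phi_x
    dyp = np.real(np.fft.ifft2(1j * ky * ph))   # d_y = phi_y
    return dxp, dyp
```

For a single-mode area error V = 1 + ε cos(kx), the displacement is ε sin(kx)/k, i.e. directed away from the over-dense region, which the spectral solve reproduces to machine precision.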

Gradient correction
In order to further improve the homogeneous distribution of parcels over the grid, parcels are moved down gradients of gridded area. This additional correction is applied inside each individual grid box, and we ensure a parcel cannot leave a grid box due to this correction, so that the flow field remains continuous. Below, we use x̃ to denote the x coordinate relative to the grid cell (i.e. it has a value between 0 and 1). For a one-dimensional problem, we can specify a gradient correction in position by a fractional grid-box distance dx̃ at the end of the time step, where C is a pre-factor which depends on the gradient in gridded area: |C| determines the maximum degree to which the locations of parcel centres are compressed at the edges of the cell. We limit this such that |C| ≤ C_max; otherwise, C between grid points i and i + 1 is given by the gridded-area gradient between these points, scaled by a constant β. The default value of β, 1.8, was chosen on the basis of one-dimensional tests with parcels initially located in random locations, where the gradient correction and the divergent flow correction were combined.
In the two-dimensional case, a similar correction is applied independently in the y-direction. The gradients with respect to x which determine C_x are obtained by linear interpolation between the bottom and the top of the grid box (depending on the vertical position of the parcel), while those in y are obtained by linear interpolation between the left and right sides of the grid box.
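The one-dimensional correction can be illustrated as follows. The functional form dx̃ = C x̃(1 − x̃) and the pre-factor C = −β (area gradient) are assumptions made for this sketch, not the paper's exact expressions; the chosen form vanishes at the cell edges, so a parcel cannot leave its grid box when |C| ≤ 1, consistent with the constraint described above.

```python
def gradient_correction_1d(x_tilde, grad, beta=1.8, c_max=1.0):
    """Illustrative 1-D parcel gradient correction.

    x_tilde : parcel position relative to its grid cell (in [0, 1]).
    grad    : gridded-area gradient across the cell.
    The shift C * x_tilde * (1 - x_tilde) is an assumed form: it vanishes
    at both cell edges, so the parcel stays inside the cell for |C| <= 1."""
    C = -beta * grad                      # move down the area gradient
    C = max(-c_max, min(c_max, C))        # enforce |C| <= C_max
    return x_tilde + C * x_tilde * (1.0 - x_tilde)
```

With |C| capped at 1, the map x̃ → x̃ + C x̃(1 − x̃) is monotone on [0, 1] and fixes the endpoints, so parcels are compressed towards one cell edge without crossing it.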
In Fig. 6 we show the convergence of the gridded area V towards the cell area V_cell when applying both correction algorithms. To generate this figure we first placed four circular parcels per grid cell (as described below in Section 3.1) in the domain [0, 1]² discretised by 32 × 32 cells, and we assigned an area of V_cell/4 to each parcel. We then perturbed their horizontal and vertical positions randomly by up to ±Δx/4 from this optimal setup and applied the parcel correction algorithms iteratively, as would be done in a simulation (cf. Fig. 7). Both the divergent flow and the gradient correction reduce the area error, but the combination significantly increases the convergence rate.

Energy conservation
The total energy, here kinetic plus potential, is conserved in the density-stratified flow model considered (see Section 2.2). The degree to which the numerical model conserves energy is a useful measure of accuracy, and is discussed in relation to the Straka and Robert test cases in Section 4.2 and Section 4.3 below. Here, we discuss the calculation of kinetic and potential energy using elliptical parcels.
The kinetic energy (per unit mass) is the integral

K = (1/2) ∬ (u² + v²) dx dy,

where the integral is over the entire domain. The actual kinetic energy is K multiplied by the constant background density ρ_0 (a consequence of the Boussinesq approximation). Numerically, we calculate K by a sum over parcels:

K ≈ (1/2) Σ_i V_i (u_i² + v_i²),

where (u_i, v_i) is the velocity at the parcel centre, and i ranges over all parcels.
The potential energy (per unit mass) is the integral

P = −∬ b y dx dy.

Here, recall b = −g(ρ − ρ_0)/ρ_0, where g is the acceleration due to gravity. Numerically, we calculate P by the sum

P ≈ −Σ_i b_i y_i V_i,

where y_i is the height of the parcel centre. However, this potential energy cannot all be converted to kinetic energy. The part which cannot be converted is the potential energy of the hydrostatic equilibrium, P_ref, found by re-arranging b as a monotonically increasing (or non-decreasing) function of y. Using parcels, this is done by sorting the b_i, then obtaining a reference height y_ref for each parcel by cumulatively summing the parcel areas V_i divided by the domain width L_x; P_ref is then found by evaluating the sum above with each y_i replaced by the corresponding reference height. Since b is materially conserved, this only needs to be computed at t = 0; it remains constant. The available potential energy is then just the difference P − P_ref.
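A sketch of the P_ref computation, under the assumption (made here, not stated above) that the sorted parcels are stacked from the bottom of the domain in layers of thickness V_i/L_x, with each reference height taken at the layer centre:

```python
def available_potential_energy(b, y, V, Lx):
    """Available potential energy P - P_ref from parcel buoyancies b,
    heights y and areas V, for a domain of width Lx. Parcels are sorted
    by buoyancy (lightest on top in the reference state)."""
    P = -sum(bi * yi * Vi for bi, yi, Vi in zip(b, y, V))
    order = sorted(range(len(b)), key=lambda i: b[i])   # ascending b
    P_ref, height = 0.0, 0.0
    for i in order:
        thickness = V[i] / Lx            # layer occupied by parcel i
        y_ref = height + 0.5 * thickness # reference height (layer centre)
        P_ref -= b[i] * y_ref * V[i]
        height += thickness
    return P - P_ref
```

For a stably stratified state (b increasing with y) the reference state coincides with the actual one and the available potential energy vanishes; any rearrangement gives a positive value.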
In the results below, we simply use P to denote the available potential energy.
The flow model in Section 2.2 exactly conserves any functional of b, namely ∬ F(b) dx dy, where F(b) is an arbitrary function of b; this is a consequence of material conservation of b. The numerical model only conserves the linear functional F(b) ∝ b, due to the mixing associated with parcel merging.

Initialisation: parcel locations and attributes
To prepare the initial conditions for an EPIC simulation, the first step is to place the parcels on the grid. Here we use m × m parcels per grid cell (m = 3 is sufficient, as the tests below demonstrate). In each cell, their centres x_i are placed in a regular array, at the locations (Δx(i − 1/2)/m, Δy(j − 1/2)/m) relative to the lower left corner of the grid cell, for i = 1, ..., m and j = 1, ..., m. For square grid cells (Δx = Δy), each parcel is a circle with area ΔxΔy/m². Otherwise, we initialise with ellipses of the same area; if Δx > Δy, they are aligned with the x axis and have aspect ratio λ = Δx/Δy; if Δx < Δy, they are aligned with the y axis and have aspect ratio λ = Δy/Δx. The grid length ratio is assumed to lie in the range λ_max^-1 to λ_max, but a unit value is recommended. In principle, higher grid anisotropy can be handled using a different arrangement of parcel centres (not m × m). In practice, we currently split parcels if λ > λ_max.
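The placement rule can be sketched as follows; representing a parcel as a simple (x, y, area, aspect ratio) tuple is a simplification of the actual parcel data structure.

```python
def place_parcels(nx, ny, dx, dy, m=3):
    """Initial parcel placement: m x m parcels per grid cell on a regular
    array, each with area dx*dy/m**2. For anisotropic cells the initial
    aspect ratio is the grid-length ratio (parcels aligned with the longer
    grid direction)."""
    area = dx * dy / m**2
    lam = max(dx / dy, dy / dx)     # 1 for square cells
    parcels = []
    for cy in range(ny):
        for cx in range(nx):
            for j in range(1, m + 1):
                for i in range(1, m + 1):
                    x = cx * dx + dx * (i - 0.5) / m
                    y = cy * dy + dy * (j - 0.5) / m
                    parcels.append((x, y, area, lam))
    return parcels
```

By construction the parcel areas sum exactly to the domain area, so the initial gridded area error vanishes.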
Once the parcels have been placed, the next step is to prepare the initialisation of their attributes. The user supplies a gridded field for each attribute, say q(x, 0) for all grid points x, and the algorithm below converts these data into parcel attributes q_i(0), for i = 1, ..., n, where n is the total number of parcels (this algorithm is adapted from section 2.2 of [21]).
The q_i are chosen so that the parcel-interpolated gridded field in Eq. (14) matches the given gridded field (to high precision, see below).
To this end, first the parcel weights φ are computed from Eq. (16), but using the parcel centres x_i in place of the support points x_i±. Then the parcel density ρ at each grid point x is determined by summing these weights over all parcels. For the parcel placement described above, this is equal to m² at all grid points (the approach taken here is analogous to the interpolation algorithm for gridded area in Section 2.4). From this, the inverse parcel density α_i of each parcel is determined; again, this works out to 1/m² for all parcels when placing them as described above. The next steps can be repeated for each attribute q. First, a guess is made for the parcel attribute by interpolating the gridded field to the parcel centre. Then the gridded residual R (the difference between the target field and the parcel-interpolated field) is computed, and the parcel attributes are updated by adding the residual, interpolated to the parcel and weighted by α_i. These values are accepted if the maximum value of |R| < ε_q (a prescribed tolerance); otherwise we repeat the last two operations (updating R and q_i). The tolerance used here is ε_q = 10^-9 q'_rms, where q' is the departure of q from its domain mean. In practice, the algorithm converges rapidly, requiring only a few tens of iterations. If the user supplies a constant field q, the algorithm is bypassed and we simply assign q_i = q (constant).
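A one-dimensional, linear-interpolation sketch of this iteration (the actual weights of Eqs. (14) and (16) are bilinear and two-dimensional; here, interpolating the residual back to the parcels plays the role of the α_i-weighted update):

```python
import math

def _weights(x, n):
    """Linear interpolation weights of position x (in grid units) onto the
    two nearest points of a periodic grid of n points."""
    j = int(math.floor(x)) % n
    frac = x - math.floor(x)
    return j, (j + 1) % n, 1.0 - frac, frac

def par2grid(xs, q_par, n):
    """Normalised parcel-to-grid interpolation (1-D analogue of Eq. 14)."""
    num, den = [0.0] * n, [0.0] * n
    for x, q in zip(xs, q_par):
        j0, j1, w0, w1 = _weights(x, n)
        num[j0] += w0 * q; den[j0] += w0
        num[j1] += w1 * q; den[j1] += w1
    return [a / b for a, b in zip(num, den)]

def grid2par(f, xs, n):
    """Grid-to-parcel interpolation."""
    out = []
    for x in xs:
        j0, j1, w0, w1 = _weights(x, n)
        out.append(w0 * f[j0] + w1 * f[j1])
    return out

def init_parcel_attributes(q_grid, m=3, tol=1e-9, max_iter=200):
    """Iteratively choose parcel values so that par2grid reproduces the
    target gridded field, by repeatedly adding the interpolated residual."""
    n = len(q_grid)
    xs = [(k + 0.5) / m for k in range(n * m)]   # regular parcel placement
    q_par = grid2par(q_grid, xs, n)              # first guess
    for _ in range(max_iter):
        R = [t - a for t, a in zip(q_grid, par2grid(xs, q_par, n))]
        if max(abs(r) for r in R) < tol:
            break
        q_par = [q + dq for q, dq in zip(q_par, grid2par(R, xs, n))]
    return q_par, xs
```

Because both interpolation operators damp grid-scale oscillations, the residual shrinks geometrically and the iteration converges quickly for smooth fields.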
Note that this algorithm still works even for an anisotropic grid. Only the parcel centres are required, because the interpolation of the ellipse support points is equivalent to the interpolation of the ellipse centres for elliptical parcels aligned with the x or y axes. This property is convenient for initialisation.

Time stepping: low-storage Runge-Kutta and adaptation
In EPIC, all time-dependent quantities (x_i, B_i and ζ_i) are propagated in time using the five-stage fourth-order 2N-storage Runge-Kutta scheme (ls-RK4) [22,23] to reduce the memory footprint. Each such quantity, say q_i, then requires only one additional array δq_i. Given a time step Δt, the ls-RK4 scheme loops over the following two steps for j = 1, 2, ..., 5:

δq_i ← a_j δq_i + q̇_i(t_j) Δt,    q_i ← q_i + b_j δq_i,

where q̇_i = dq_i/dt is the tendency, t_j = t_0 + c_j Δt, and a_j, b_j and c_j are numerical coefficients listed in Table 1 (note a_1 = c_1 = 0, hence δq_i(t_0) is not required and t_1 = t_0). δq_i stores a weighted combination of the actual tendency and those from earlier stages for memory efficiency (note that the weights do not add up to one at each stage). q̇_i must be updated at each stage j of the scheme, since the tendency may depend on other evolving quantities as well. This also means that all quantities must be iterated in the same loop, not sequentially. Before entering the above loop, we adapt the time step using

Δt = min(α_s/γ_max, α_b/N_max),    (41)

where α_s and α_b are dimensionless constants, γ_max is the maximum gridded strain rate (see Appendix E), and N_max is the maximum (generalised) buoyancy frequency [18].
The constant α_s controls how much stretch a parcel can experience in a time step Δt. The maximum parcel stretch occurs when the major axis of the parcel is aligned with a local straining flow in which γ = γ_max. Taking this axis to be the x axis without loss of generality, and a straining flow with u = γ_max x and v = −γ_max y, one can use Eq. (9) to show that dλ/dt = 2γ_max λ. Thus, over one time step, the maximum parcel stretch is λ(t + Δt)/λ(t) = e^{2α_s}, assuming perfect time integration and that Δt = α_s/γ_max in Eq. (41). It is less than this if Δt is limited by N_max.
In EPIC, α_s is chosen small enough to limit the distance between the ellipse support points, c in Eq. (11), to the minimum grid spacing, Δ_g = min(Δx, Δy). In this way, parcels may deform and split at subgrid scales, thereby providing a good representation of the subgrid flow. Moreover, we must ensure that parcel support points never extend beyond the halo regions next to each y boundary (see Section 2.7) during a time step; the condition c < Δ_g ≤ Δy strongly satisfies this constraint (indeed, a support point should never reach further than midway into the halo, but given that the time stepping is not perfect, this is done for safety). If we consider the worst-case scenario, in which λ(t + Δt) = λ_max e^{2α_s}, then this condition implies an upper bound on α_s, after neglecting λ^-1 compared to λ.
Now, the most restrictive condition is obtained when the right-hand side takes its minimum value, i.e. for a parcel of maximum size ab = V_max/π. This is a restriction on α_s, but it may equally be viewed as a restriction on either λ_max or V_max/Δ_g².
Written this way, it is convenient to choose a small value of α_s (for time-stepping accuracy) and a moderate value of λ_max (to control parcel splitting), then ensure V_max/Δ_g² satisfies Eq. (48). For an isotropic grid (recommended), note that V_max/Δ_g² = Ṽ_max, the dimensionless maximum parcel area fraction. Otherwise, V_max/Δ_g² = Ṽ_max A_g, where A_g = max(Δx/Δy, Δy/Δx) is the grid aspect ratio.
The constraint in Eq. (48) not only determines the maximum distance between ellipse support points, but also the maximum separation distance when parcels are split (2h). By neglecting λ^-1 compared to λ in Eq. (45) above, we get a² < Δ_g². Substituting h = √3 a/4, as derived in Appendix C, we find 2h < √3 Δ_g/2.
We require the additional constant α_b not just to resolve buoyancy oscillations (frequencies of O(N_max)), but also to ensure a finite time step when starting from a flow at rest (which would have γ_max = 0). In practice, we take α_b = α_s = α, a single user-defined parameter, based on numerical tests presented in Appendix G.

Tendency calculation
The time stepping above requires the tendencies q̇_i for each time-dependent quantity q_i (and for each parcel i). These are calculated as follows. First, the parcel attributes V_i, b_i and ζ_i are interpolated to obtain the corresponding gridded fields V, b and ζ (in EPIC, this is par2grid). Next, the gridded vorticity ζ is inverted to find the gridded velocity field u (this is vor2vel, see Appendix A). These components are differentiated in x to form the strain matrix S (only u_x and v_x need be computed, since v_y = −u_x by incompressibility, and u_y = v_x − ζ by the definition of vorticity). Next, the gridded vorticity tendency b_x is accurately computed using FFTs and wavenumber multiplication (as are u_x and v_x). Then these gridded tendencies are interpolated to the parcel support points to provide the parcel tendencies ẋ_i, Ḃ_i and ζ̇_i (this is grid2par).
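The vorticity inversion step can be illustrated on a doubly periodic grid (an assumption made here for brevity; the paper's domain has free-slip y boundaries and uses a semi-spectral solver, see Appendix A):

```python
import numpy as np

def vor2vel(zeta, Lx, Ly):
    """Illustrative spectral vorticity inversion on a doubly periodic grid.
    Solves lap(psi) = zeta for the streamfunction, then returns
    u = -dpsi/dy and v = dpsi/dx."""
    ny, nx = zeta.shape
    kx = 2.0 * np.pi * np.fft.fftfreq(nx, d=Lx / nx)
    ky = 2.0 * np.pi * np.fft.fftfreq(ny, d=Ly / ny)
    KX, KY = np.meshgrid(kx, ky)
    k2 = KX**2 + KY**2
    k2[0, 0] = 1.0                      # avoid dividing by zero (mean mode)
    psi_hat = -np.fft.fft2(zeta) / k2
    psi_hat[0, 0] = 0.0
    u = np.real(np.fft.ifft2(-1j * KY * psi_hat))
    v = np.real(np.fft.ifft2(1j * KX * psi_hat))
    return u, v
```

The strain components and the buoyancy gradient b_x are obtained from the same spectral representation by further wavenumber multiplications.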

Parcel merging and splitting
After the parcels have been advanced one time step, they are checked for merging and for splitting. For merging, any small parcels with V_i < V_min are identified and added to a list of parcels available for merger, of length n_merge. To minimise search costs, separate lists of parcels occupying each grid cell are created (this can be done in O(n) operations for n parcels).
The list of small parcels i is then processed to locate the nearest other parcel i′. This is done by locating the grid point (i_x, i_y) closest to parcel i (using nearest-integer arithmetic), then by searching only the 4 grid boxes sharing this grid point (or 2 when adjacent to a y boundary). After the n_merge small parcels have been processed in this way, the merger rules described in Section 2.6 are enforced (details of the algorithm may be found in Appendix D).
Note that merging changes all parcel attributes. The new attribute of the merged parcel, say q_m, is the area-weighted mean of the attributes of all merging parcels, i.e.

q_m = (Σ_i V_i q_i) / (Σ_i V_i),    (49)

where the sum extends over the merging parcels only. The same formula is used for position x_m, vorticity ζ_m and buoyancy b_m (and other attributes if included). In this way the total mass, proportional to Σ_{i=1}^n b_i V_i, and the total circulation Σ_{i=1}^n ζ_i V_i remain conserved for all time, consistent with Eq. (1) and Eq. (2). The shape matrix B_m is handled slightly differently in that, after the area weighting, B_m needs to be scaled to preserve the parcel area (see Section 2.6).
Next, any parcel satisfying the splitting criteria (see Section 2.5) is split into two parts, creating a new parcel with identical attributes. Splitting is also conservative and, moreover, preserves the second-order spatial moments, besides area and centroid (the zeroth and first-order moments).
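A minimal sketch of the area-weighted merge for a single scalar attribute:

```python
def merge_attributes(parcels):
    """Area-weighted merging of parcel attributes:
    q_m = sum(V_i * q_i) / sum(V_i) over the merging parcels.
    Each parcel is a (V, q) pair; q could be buoyancy, vorticity,
    or a position coordinate."""
    V_tot = sum(V for V, _ in parcels)
    q_m = sum(V * q for V, q in parcels) / V_tot
    return V_tot, q_m
```

Because q_m is a convex combination of the merging values, it always lies between their minimum and maximum, and V_tot * q_m equals the pre-merge sum of V_i q_i, so mass and circulation are conserved exactly.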

Parcel adjustment to correct gridded area errors
At the end of every time step, and after all parcel splitting and merging, the parcels are slightly shifted to correct for gridded area errors. Ideally, we would like to preserve V = V_cell = ΔxΔy, but the advection of a finite number of parcels, together with splitting and merging, inevitably leads to a discrepancy. The algorithm described in Section 2.8 is used to keep this discrepancy small, typically around 0.01% of V_cell (see tests in Appendix G). It can be reduced further, if desired (and at a cost), by increasing the number of times the divergent flow and parcel gradient corrections are applied, as shown in Fig. 6.

Flowchart
The structure of the EPIC algorithm is shown in Fig. 8. In order to run a simulation, the user only needs to provide gridded field data from which the parcels are initialised (cf. Section 3.1). Before performing the first low-storage RK4 step, the time step is calculated and the initial configuration of field and/or parcel data is written to HDF5 [24] files. In an ls-RK4 sub-step, we first interpolate parcel attributes to the grid (par2grid), then compute the gridded velocity field using inversion (cf. Appendix A), and the vorticity tendency from the gridded buoyancy by differentiation in semi-spectral space. The parcels are then evolved in time using the ls-RK4 algorithm (each update of field tendencies requires a call to par2grid); in the first stage of this algorithm, the time step is adapted after the vorticity tendency has been computed. Parcel and field data are written at specified time intervals (following a par2grid and before the first stage of the ls-RK4 algorithm). After each ls-RK4 step, the parcels are merged, split and then corrected for incompressibility. The program terminates once the time limit is reached.

Table 2
Recommended numerical settings in EPIC, independent of the grid resolution used.

Numerical parameter                     Value
Initial number of parcels per cell      9
Maximum aspect ratio, λ_max

The strain matrix S needed for evolving the shape matrix B is similarly found by differentiation of the gridded velocity. Afterwards, these gridded quantities (the tendencies) are interpolated back to the parcels (grid2par) in order to propagate them.
After each ls-RK4 step, any very small parcels are merged with others, and highly-elongated parcels are split. Then the parcel positions are corrected to enforce incompressibility (see Section 2.8). The parcel correction is an iterative procedure in which the parcel area is interpolated to the grid (vol2grid) to apply the divergent flow correction, followed by another vol2grid to apply the gradient correction. The number of correction steps can be controlled by the user, but normally two steps are sufficient to keep gridded area errors small (cf. Fig. 6). Before the updated configuration is written to the HDF5 files, a new time step is calculated, as explained in Section 3.2. Once the time limit is reached, EPIC terminates.
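The flowchart can be summarised in pseudocode (step names follow the text; Section and Appendix references are those of the paper):

```
initialise parcels from user-supplied gridded fields    (Section 3.1)
compute initial time step; write initial HDF5 output
while t < t_end:
    for stage j = 1, ..., 5:                            (ls-RK4)
        par2grid: interpolate V_i, b_i, zeta_i to the grid
        vor2vel:  invert gridded vorticity for velocity (Appendix A)
        differentiate velocity for the strain matrix S; compute the
            vorticity tendency from gridded buoyancy (semi-spectral)
        if j == 1: adapt the time step                  (Section 3.2)
        grid2par: interpolate tendencies to parcels; advance the stage
    merge very small parcels; split highly-elongated parcels
    repeat (default twice):
        vol2grid; apply divergent flow correction
        vol2grid; apply gradient correction             (Section 2.8)
    write parcel/field data at specified time intervals
terminate once the time limit is reached
```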

Results
In the following subsections we present results for three standard fluid dynamical test cases: (1) the Taylor-Green flow, (2) a density current simulation developed by Straka [25], and (3) a test case by Robert [26] which collides a large warm bubble with a much smaller cold bubble. We use the Straka test case to examine the model sensitivity to its numerical input parameters (see also Appendix G), and establish a recommended model setup in Table 2. These default values are used in all simulations unless otherwise stated.
All simulations are performed on a laptop running an Intel(R) Core(TM) i7-10750H CPU @ 2.60 GHz. The timing comparison between the numerical models is obtained running EPIC single-threaded; otherwise we used 4 OpenMP threads.

Fig. 9. The slopes of the decay are estimates of the effective viscosity ν, as given in Table 3.

Taylor-Green test case
In our first test case we use, as the initial condition, the steady incompressible Taylor-Green flow [27], which exhibits two counter-rotating vortices:

u(x, y) = −(1/2) sin 2x sin y,    v(x, y) = −cos 2x cos y,    (50)

in the domain [−π/2, π/2]². The corresponding vorticity is ζ = (5/2) sin 2x cos y. Note, this flow is dimensionless. We employ the same flow model outlined in Section 2.2, except that here b = 0. In a viscous fluid, the flow preserves its spatial form but decays as e^{−5νt}, where ν is the viscosity coefficient (the factor of 5 comes from the sum of the squared wavenumbers, 2² + 1²). The Taylor-Green flow thus provides an analytical solution to the Navier-Stokes equations. (Note: a simpler version of this test was used in [5] to check code accuracy; results for EPIC may be found in Appendix F.) In EPIC, we do not include viscosity, so we expect the flow to remain approximately steady. The maximum vorticity in the flow is 2.5, corresponding to a rotation period of 4π/2.5, or approximately 5 time units. While the flow is (approximately) steady, fluid particles still circulate, so in EPIC the parcels will move and deform. Moreover, they will split and merge, resulting in vorticity mixing. This will cause the flow to become (slightly) unsteady.
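As a consistency check on the quoted initial condition, the vorticity and incompressibility of Eq. (50) can be verified numerically:

```python
import math

def tg_velocity(x, y):
    """Taylor-Green initial condition, Eq. (50)."""
    u = -0.5 * math.sin(2.0 * x) * math.sin(y)
    v = -math.cos(2.0 * x) * math.cos(y)
    return u, v

def tg_vorticity(x, y, h=1e-6):
    """zeta = dv/dx - du/dy, evaluated by central differences."""
    dvdx = (tg_velocity(x + h, y)[1] - tg_velocity(x - h, y)[1]) / (2 * h)
    dudy = (tg_velocity(x, y + h)[0] - tg_velocity(x, y - h)[0]) / (2 * h)
    return dvdx - dudy
```

The numerical curl matches ζ = (5/2) sin 2x cos y, whose maximum of 2.5 gives the rotation period of approximately 5 time units quoted above.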
Here, we measure the impact of this by examining the evolution of the r.m.s. vorticity ζ_rms in a series of simulations on grids of dimensions 32², 64² and 128². As shown in Fig. 9, the r.m.s. vorticity decays as if the flow were slightly viscous: the fits to an exponential decay are excellent (and continue well beyond the times shown), enabling us to estimate the effective viscosity ν by fitting ln(ζ_rms) to a straight line with slope −5ν, as done in Table 3. Moreover, ν is seen to be inversely proportional to the total number of grid points. Dimensionally, we expect ν ≈ C_ν ζ_max(0) ΔxΔy for some dimensionless constant C_ν which should be independent of grid resolution. The data in Table 3 support this relationship and indicate C_ν ≈ 4.15 × 10^-4. Hence, the merger and splitting occurring in EPIC result in a weak diffusion. By comparison with conventional numerical methods like pseudo-spectral and semi-Lagrangian, the diffusion in EPIC appears to be much less for comparable grid resolutions [28]. While the latter study examined shallow-water flows, the numerical diffusion effects are expected to be broadly similar, and this is confirmed for the more realistic Straka and Robert test cases examined in the following subsections. From the viscosity diagnosed for the 32² grid, we can derive a scaling Reynolds number Re = ζ_max(0) ℓ²/ν ≈ 6.26 × 10^5, with ℓ being the shorter extent of one of the vortices (i.e. π/2). This compares to a prescribed value of 2.48 × 10^4 used in the convergence tests of the Taylor-Green flow in a previous study [23] (although the model used there can be run as an implicit LES, i.e. without a prescribed or parametrised diffusion).
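The quoted scaling can be checked arithmetically. The Table 3 viscosity for the 32² grid is approximated here by the scaling law itself, so this is a consistency check rather than a reproduction of the table:

```python
import math

# Effective-viscosity scaling and Reynolds number for the 32^2 grid
# (C_nu and the target Re are the values quoted in the text).
C_nu = 4.15e-4
zeta_max0 = 2.5                    # maximum initial vorticity
dx = dy = math.pi / 32             # 32^2 cells on a domain of width pi
nu = C_nu * zeta_max0 * dx * dy    # nu ~ C_nu * zeta_max(0) * dx * dy
ell = math.pi / 2                  # shorter extent of one vortex
Re = zeta_max0 * ell**2 / nu       # Re = zeta_max(0) * ell^2 / nu
```

This yields ν of order 10^-5 and Re close to the quoted 6.26 × 10^5 (the small residual difference reflects the rounding of C_ν and the use of the scaling law in place of the tabulated ν).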
A similar series of simulations having a smaller time step pre-factor of α = 0.05 results in viscosity estimates that are only 3% to 4% smaller compared to Table 3. In fact, by also considering α = 0.1, we find that C_ν increases linearly with α, closely obeying the relationship C_ν ≈ 3.96 × 10^-4 (1 + 0.263α). Hence, a larger time step results in (slightly) more diffusive behaviour, which suggests that parcel splitting and merging is more vigorous, likely because parcels tend to be more deformed on average.
We next examine the degree to which the flow remains incompressible. This is measured by the relative r.m.s. error of the gridded area V with respect to V_cell. We consider three different initialisations, with 4, 9 and 16 (circular) parcels per grid cell. The evolution of the relative r.m.s. area error for the three simulations is shown in Fig. 10, where we see that the eventual error is approximately the same (and small, around 0.01%) in all cases. This is because parcels can split and merge, and in time the number of parcels becomes closely similar despite using different numbers initially. At early times, however, the r.m.s. area error exhibits a significant overshoot when only 4 parcels per cell are used initially. A good choice appears to be 9 parcels per cell initially, as this keeps the relative r.m.s. area error approximately constant during the early stages of the simulation. The average number of parcels per grid cell saturates at around 21, with a standard deviation between grid cells of approximately 4 at late times, independent of the number of parcels used initially (not shown).

Fig. 10. Gridded relative r.m.s. area error for different initial numbers of parcels per cell. These results are obtained using a grid of 32 × 32 cells.
Finally, we examine the degree to which symmetry is maintained in EPIC. To this end, Fig. 11 shows the parcel configuration at times 0, 5, 20 and 80 (for the simulation using all the default parameters in Table 2). The parcels are coloured with respect to their aspect ratio λ (initially all parcels have λ = 1). The mean aspect ratio, time averaged, is approximately 2.38, with a standard deviation of 0.69, when using either 4, 9 or 16 parcels per cell to initialise. One may observe that, in time, symmetry is not preserved (compare, for example, the green-yellow spots at the top left and right of Fig. 11 at t = 20). Symmetry is broken by parcel mergers on or near the symmetry axis (x = 0) and across the periodic boundary (x = π/2), another symmetry axis of the Taylor-Green flow. It appears extremely difficult to design a parcel merger algorithm that preserves symmetry, without first imposing symmetry, e.g. by simulating only one cell of the flow in a domain entirely enclosed by free-slip boundaries. Nonetheless, the vorticity field remains closely anti-symmetric in x, as shown in Fig. 12, which plots the r.m.s. value of ζ(x, y, t) + ζ(−x, y, t) (for x ≥ 0), scaled by ζ_max(0), as a function of time. Symmetry is first broken between t = 2 and 3, but the loss of symmetry is only slight (around 0.15%) over the entire duration of the simulation, and saturates by around t = 50. The largest errors occur where vorticity gradients are largest, as expected (not shown).

Straka density current test case
In the second test case we initialise a cold bubble following the setup of Straka [25]: a temperature perturbation (a cold anomaly) of elliptical cross-section is introduced, with maximum amplitude ΔT_max = 15 K. The total buoyancy [4] is therefore given by b(r) = gΔT(r)/T_0, with reference temperature T_0 = 300 K and gravitational acceleration g = 9.81 m/s² (there is no moisture in this setup). Unlike [25], we consider an incompressible flow and therefore do not need to solve a pressure or energy equation. Moreover, we consider an inviscid flow, whereas [25] used a moderately strong prescribed viscosity. Other results for the Straka density current test case without viscosity can be found in [29], who used radial basis function generated finite differences (RBF-FD) to solve the governing equations.
In Fig. 13 we show the parcel configuration in the right half of the domain at time 900 s. The simulation is initialised with 9 parcels per cell (the default) and 4096 × 512 grid cells. In the upper panel of the figure, the parcels are coloured according to their aspect ratio. At the interface between the passive and active regions, the parcels undergo significant stretching and have a relatively high aspect ratio. The corresponding parcel buoyancy is shown in the lower panel. Many small-scale vortices are evident, arising from early Kelvin-Helmholtz instabilities developing on the spreading density current.
The evolution of the energy, including the kinetic and potential components, is shown in Fig. 14. The EPIC model conserves the total energy very well considering the great complexity of the flow, and this is true even at lower grid resolution. The net energy loss over time as a function of grid resolution is listed in Table 4, including results for pseudo-spectral simulations using hyperviscosity and molecular viscosity, PS-h and PS-m (see below). The data indicate that doubling resolution reduces the loss in energy by less than a factor of 2. This is less than the expected factor of 4 for two-dimensional flows without buoyancy [see sec. 3.4 in 30]; however, buoyancy-driven flows exhibit fine-scale vorticity production, and thus greater dissipation is to be expected [31].
We next compare EPIC with a standard pseudo-spectral (PS) method [32]. The latter is a grid-based method that has been extensively used in simulations of fluid flows, in particular turbulence. It is simple and accurate, and is ideally suited to the flow model considered here (for details, see Appendix I). It uses many of the same components as EPIC, in particular the same FFTs and the same time-step adaption, to facilitate comparison. For numerical stability, the PS method must obey an additional CFL constraint on the time step, and furthermore requires small-scale damping. As in many previous studies, we have implemented both hyperviscosity, ∝ ∇⁶ζ and ∇⁶b on the right-hand sides of Eq. (1) and Eq. (2), and ordinary (molecular) viscosity. For both forms of damping, the resolution-dependent viscosity coefficient is tuned to minimise spurious Gibbs fringing and ensure decay in spectra at high wavenumbers (small scales); full details may be found in Appendix I. We start with a qualitative comparison of the methods in Fig. 15 (note that some of the details of these plots are best seen using a detailed digital zoom), which shows a portion of the buoyancy field at the final time, at various grid resolutions, and for EPIC and PS simulations using hyperviscosity and molecular viscosity (PS-h and PS-m). The EPIC fields are derived using the fine-scale visualisation method in Appendix H. Across any row, the grid resolution used in EPIC is half that (in each direction) used in PS. The EPIC fields compare well with the PS-h ones at four times higher resolution, while the PS-m fields are comparatively diffusive. Indeed, only the highest resolution PS-m field (bottom right) is comparable to, yet still more diffusive than, the coarsest resolution EPIC field (upper left), which uses 16 times lower resolution.

Fig. 13. Each parcel is coloured according to its aspect ratio (above) or buoyancy (below). The square zoom windows in the lower panel are 600, 100 and 25 m wide, the last corresponding to two grid cell widths. Individual parcels can be seen in the finest zoom; they are overplotted depending on their order and thus do not give the actual buoyancy field. To make the images in the lower panel, the buoyancy field is first obtained on a very fine grid using the subgrid scale visualisation method described in Appendix H, then the individual parcels are plotted over this background. This ensures that any voids between the parcels are filled. Subsequent figures use subgrid scale visualisation alone, which produces a smoother image at the finest scales. Note: the parcel number indicated on this figure refers only to the subdomain shown.

Fig. 14. Evolution of the available potential (P), kinetic (K) and total (P + K) energy. The results are obtained using the default parameter values listed in Table 2 and 4096 × 512 grid cells. The relative total energy loss between the initial time (t = 0 s) and the final time (t = 900 s) is 0.53%.

Table 4
Relative total energy loss between the initial time (t = 0 s) and the final time (t = 900 s) as a function of grid resolution, and for different numerical methods (see text). Doubling the grid resolution in both directions for EPIC reduces the energy loss by a factor ranging from 1.32.
While PS-h is much less diffusive than PS-m, there is a price to pay: the extrema b_min(t) and b_max(t) do not remain within their initial bounds, and do not vary monotonically in time, as they do in PS-m (almost: here the diffusion is marginally sufficient) and in EPIC; see Fig. 16. Not only does hyperviscosity cause fringing, it leads to overshoots and undershoots where steep gradients develop, as is well known [33,34]. As a result, field extrema may lie well outside their initial range, here by up to 50% of b_max(0) − b_min(0). Moreover, the situation worsens with increasing resolution, and is not improved by increasing the damping rate (we have tested a 10-fold increase). This is a particularly undesirable feature in atmospheric models, as it is critical that certain fields remain monotone to prevent spurious negative tracer concentrations, e.g. water vapour.
While other advection schemes could be used to preserve monotonicity, they are typically much more expensive than the PS method used here [25]. EPIC, by construction, preserves monotonicity: buoyancy only changes when parcels merge, and the resulting buoyancy value always lies between the minimum and maximum buoyancy of the merging parcels, by Eq. (49).
A quantitative comparison of the EPIC and PS methods is provided by examining the buoyancy power spectra P(k), the spectral variance of b as a function of total wavenumber k (the magnitude of the wavevector). Fig. 17 compares the spectra at the final time in the EPIC, PS-h and PS-m simulations across a wide range of resolutions. The strong diffusion in PS-m is especially evident, and it affects the entire range of scales. By contrast, EPIC and PS-h exhibit power-law spectra with little evidence of diffusion, and the spectra are closely comparable over shared wavenumber ranges, indicating good convergence with resolution. Overall, EPIC captures the widest range of scales for a given grid resolution, about four times wider than PS-h. And EPIC does this while remaining monotone.
Notably, Table 4 indicates that the net energy loss in the EPIC simulations is comparable to that in the PS-h simulations at the same grid resolution. As discussed below in Section 4.3, this is due to the fact that EPIC obtains its vorticity tendency b_x from gridded values of the buoyancy b. This limits vorticity growth (though it is still higher than in PS-h, as shown below), and reduces the kinetic energy content of the flow, since a part of the subgrid scale velocity field is unresolved in EPIC. This explains the comparable decay in total energy seen in EPIC and in PS-h. The latter method, however, achieves its level of energy conservation by creating spurious overshoots and undershoots in b and ζ. A monotone method like PS-m is far more dissipative. EPIC achieves its level of conservation while remaining monotone.

Fig. 17. Comparison of the gridded buoyancy power spectrum P(k) at t = 900 s in the Straka test case, for different grid resolutions, and for different models (EPIC, PS-h and PS-m). The power spectrum satisfies Parseval's identity, namely ∫ P(k) dk = ∬ b² dx dy. Note: the EPIC results were obtained by first interpolating the buoyancy onto a grid which is 12 times finer in each direction, using the method described in Appendix H. Also note: spectra are cut off below P = 10³.
Regarding the numerical cost, Fig. 18 compares EPIC, PS-h and PS-m over a 16-fold variation in grid resolution. EPIC is seen to be the most costly, as expected, given the relative simplicity of the PS method. However, the cost increase per doubling in resolution is smaller, a factor of approximately 6.3, and at the highest resolution, 4096 × 512, the numerical costs of the three simulations are more closely comparable. EPIC then takes 3.4 times longer than PS-h (at 2048 × 256, EPIC takes 8.4 times longer). The steeper growth in cost in the PS method is due to the CFL stability constraint on the time step, which is not present in EPIC. This constraint limits the time step over an increasing proportion of the simulation with increasing resolution (at early times, the buoyancy frequency limits the time step, see Eq. (41)). It is evident from Fig. 16 and Fig. 17 that EPIC is considerably more accurate than PS at the same resolution, comparable to PS-h at 4 times the resolution (and with no fringes or overshoots), and more accurate than PS-m even at 16 times the resolution. Taking this into account, EPIC can be seen to be an efficient, cost-effective means to model density-stratified flows.

Robert warm bubble test case
In our final example we demonstrate EPIC with the asymmetric bubble test case introduced by [26]. This case starts with a large warm and a small cold 'bubble', each with a specified potential temperature perturbation. The evolution of the flow to the final time of 900 s is shown in Fig. 19, which displays the buoyancy field b in an EPIC simulation using 256 × 384 grid cells. The two bubbles collide around t = 300 s, and the larger bubble draws in environmental air from below, forming a characteristic mushroom shape. Thereafter the flow grows rapidly in complexity.
By t = 600 s, the stretched and sharpened buoyancy interfaces roll up into a multitude of vortices, with many incipient ones developing. This is especially visible on the right, along the thin filament connecting the main head of the bubble with the complex structure to the lower right. By the final time (t = 900 s), the flow has become exceedingly complex, but the wake region also shows significant mixing with the environment and is thus losing its buoyancy.

This test case was run at various grid resolutions, with initially 9 parcels per cell and the default parameter settings in Table 2. The net energy loss in EPIC reduces by a factor of around 2 when doubling the resolution, as summarised in Table 5. The evolution of the kinetic and potential energy for 1024 × 1536 grid cells is illustrated in Fig. 20. By the end of the simulation, nearly half of the available potential energy has been converted into kinetic energy. The kinetic energy likely grows further before saturating and oscillating, as in the Straka test case (cf. Fig. 14).

Table 5
Relative total energy loss between the initial time (t = 0 s) and the final time (t = 900 s) as a function of grid cell resolution. Doubling the grid resolution in both directions for EPIC reduces the energy loss by a factor ranging from 1.88 to 2.27. The energy loss in the EPIC simulations is comparable to that in the PS-h simulations at the same grid resolution.

Regarding accuracy and efficiency, Fig. 21 (top panel) shows that the r.m.s. area error overshoots at early times, reaching approximately 1.5 × 10⁻⁴, before decreasing and stabilising at later times to around 10⁻⁴, independent of grid resolution. This decrease occurs because the number of parcels per cell (middle panel) increases, enabling the code to better preserve incompressibility. Despite the increase in parcels, the percentage of small parcels (with V_i < V_min, see bottom panel) in the pool of all parcels before merging remains around 5% or less.
The convergence of the spatial fields of both buoyancy and vorticity at the final time t = 900 s is presented in Fig. 22 and Fig. 23, comparing EPIC and pseudo-spectral simulations using hyperviscosity and ordinary viscosity, PS-h and PS-m. The height of the leading edge of the rising bubble is closely similar at all resolutions in EPIC, but this edge becomes increasingly convoluted with resolution. As expected in a flow where the production of vorticity is proportional to the buoyancy gradient, and therefore inversely proportional to the grid cell width, each increase in grid resolution will activate more intense and smaller-scale vortices. As soon as buoyancy gradients are limited by the grid resolution, results (on small scales) will differ from those generated at higher resolution. At the time shown in these figures, buoyancy gradients are limited even at the highest resolution, so results at yet higher resolution will differ, but primarily at small scales. The only way to achieve pointwise convergence is to compare an early time at which buoyancy gradients are not limited by resolution, like t = 304.1 s in Fig. 19. However, this hardly challenges the numerical method.
As previously found in the Straka test, EPIC qualitatively resembles a PS-h simulation run at 4 times finer resolution (in each direction). Moreover, EPIC has none of the numerical artefacts of overshoots and undershoots seen in PS-h (see Fig. 24). PS-m is more diffusive than EPIC even when the latter uses 16 times coarser resolution in each direction. Notably, however, vorticity extrema in EPIC and PS-h are comparable at the same grid resolution. This is because EPIC computes the vorticity tendency, b_x, from the gridded buoyancy field b. Thus, the growth in vorticity is limited ultimately by the inverse of the grid spacing. Yet, what matters to the velocity field is the circulation deposition, effectively the area-integrated vorticity tendency, and this only requires an accurate gridded buoyancy field. Note, the streamfunction ψ(x, t) is essentially the (spatial) sum of the local circulation ζ(x′, t) dx′ dy′ multiplied by the appropriate Green function G(x, x′) for the domain. In EPIC, the main limitation is that the subgrid-scale velocity field is crudely represented: it varies only linearly in each coordinate, holding the other fixed, by Eq. (17). Nonetheless, the missing part of the subgrid-scale velocity field is much weaker than the resolved field, and so parcel advection is accurate, indeed significantly more accurate than the finite grid resolution would suggest. The dominance of the resolved velocity field in advection is key to the accuracy of Lagrangian-based methods, see e.g. [18,28,35] and references therein.
As in the Straka test case, it is instructive to study how the numerical results converge across scales by examining the buoyancy power spectrum P(k), again comparing with PS simulations using hyperviscosity and molecular viscosity, PS-h and PS-m. This is done in Fig. 25 for the final time, across a wide range of resolutions. Again, PS-m exhibits strong diffusive effects and poor convergence. PS-h and EPIC are much less diffusive and show stronger convergence. Again, EPIC captures the widest scale range without noticeable effects of diffusion, about 4 times wider than PS-h. The cost of the simulations over a wide range of resolutions is shown in Fig. 26, comparing EPIC with PS-h and PS-m. Again, EPIC costs more at any given resolution, but considering it is also the most accurate, perhaps 4 times more accurate than PS-h (and without the associated numerical artefacts), EPIC is seen to be highly cost effective. The cost increase in EPIC per doubling in resolution is approximately 6.5, while it is more than 8 in PS-h and PS-m. Closely similar results were found in the Straka test case, see Fig. 18.
We finish by examining the mixing properties of EPIC for this case. For this, we form a histogram of parcel buoyancy to see how the initial buoyancy values mix as a result of parcel splitting and merging (the only source of mixing in EPIC). The histogram is created by summing parcel volumes V_i into buoyancy bins (as when forming a PDF), then normalising so that the area under the histogram is 1 (without loss of generality). The results are shown in Fig. 27 for the initial conditions (top panel), the final configuration (middle panel), and the difference between the final and the initial configurations (bottom panel), for all six grid resolutions considered. The large peak at b = 0 corresponds to the neutrally buoyant background (here most of the domain), while the secondary peak near b = 0.0163 corresponds to the uniform buoyancy inside the large bubble. The broader distribution of the histogram seen at low resolution is a consequence of our initialisation of parcel buoyancy values from a given gridded field. The differences shown in the bottom panel indicate the net mixing that has taken place from the initial to the final time. We see that the middle range of positive buoyancy values has filled in, while the lower negative range of buoyancy has been depleted. The latter is a consequence of the intense stretching and folding of the cold bubble, seen e.g. in Fig. 19 (right panel). Similarly, high buoyancy values mix with lower values or the environment, especially around the sharp interfaces which develop as a result of material stretching.
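The volume-weighted histogram described above can be sketched in a few lines. The array names are hypothetical; numpy's density normalisation enforces unit area under the histogram:

```python
import numpy as np

def buoyancy_histogram(b_parcel, vol_parcel, nbins=50, brange=None):
    """Sum parcel volumes into buoyancy bins (as when forming a PDF) and
    normalise so that the area under the histogram is 1.  A minimal
    sketch; b_parcel and vol_parcel are hypothetical parcel arrays."""
    hist, edges = np.histogram(b_parcel, bins=nbins, range=brange,
                               weights=vol_parcel, density=True)
    return hist, edges
```

Evaluating this at the initial and final times on shared bin edges (via the `brange` argument) and subtracting then gives a mixing diagnostic of the kind shown in the bottom panel of Fig. 27.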
An especially notable feature is the strong convergence displayed by even the difference histograms with increasing grid resolution. This is promising, as it shows that the mixing of buoyancy, here a material tracer, is well modelled by parcel splitting and merging in EPIC. Such strong convergence was not obtained previously using point parcels in MPIC [10], which used an ad hoc, implicit model of stretching (since point parcels cannot explicitly deform). It appears, therefore, that the ability to explicitly deform parcels in EPIC is key to properly modelling small-scale mixing. Stretching is accurately modelled by following the shape of each parcel, and a single criterion is used for parcel splitting (a high aspect ratio). Parcel merging is also handled differently in EPIC compared to MPIC. The latter method removes small parcels by spreading their properties to the corners of the grid cell they occupy; these properties are then shared among all remaining parcels. In EPIC, by contrast, parcels are merged with the closest other parcel, an operation which is much more local and independent of the grid. Arguably, it is also a more natural approach, and leads to the excellent mixing properties observed in Fig. 27.

Discussion
In this paper, we have formulated and tested a new, robust and efficient parcel-in-cell based approach for the simulation of two-dimensional flows and related physical systems. The approach, called EPIC after 'Elliptical Parcel-In-Cell', makes use of deformable elliptical parcels as in [5], but differs in several fundamental ways. These include (1) ensuring that the parcels are space-filling (generally overlapping), (2) simplifying their interpolation using a pair of representative 'support points', (3) splitting highly-deformed parcels in a way that conserves all spatial moments up to second order, (4) merging very small parcels with larger ones in a more geometric way, and (5) automatically correcting for errors in incompressibility without the need to ever re-grid the parcels. Other details also differ.
Three different test flows were used to quantify accuracy, measure convergence and investigate the dependence on numerical parameters. These tests demonstrate that complex flows exhibiting sharp variations and extreme values can be handled efficiently and robustly with the new EPIC method. Moreover, we have compared results obtained with a standard pseudo-spectral method, widely used in fluid dynamical modelling, and have shown that EPIC offers a highly cost-effective alternative. During the development stage, we exhaustively tested each component individually in a wide variety of 'unit tests'. We have ensured that steady flows like Taylor-Green remain very close to steady for long times. We have ensured that incompressible flows remain area preserving, a known challenge for parcel-based methods [4,5,10,20]. Finally, we have also ensured that the EPIC code base is easy to use and is readily adaptable to other problems (see 'code availability' below for access).
The many tests conducted helped to pin down choices for the numerical parameters in Table 2. The choice of each parameter affects both accuracy and efficiency, and we have sought a balance. It makes no sense to use a tiny minimum area fraction Ṽ_min, because this does not improve the subgrid representation without an associated subgrid-scale velocity field; moreover, it is computationally expensive and consumes much memory. Likewise, the time step pre-factor α need not be very small, since the greater accuracy in time stepping is then swamped by other errors, like simply using a finite spatial resolution. The recommended parameters in Table 2 do not appear to depend on the grid resolution (not listed in the table). At higher resolution, we simply use more parcels. The time step will be shorter, as the velocity strain and buoyancy gradients will naturally increase.
Note, the recommended settings need not be strictly adhered to: there is some liberty in the values. We advise against changing β and C_max in particular. Making λ_max smaller may be more accurate but will lead to many more parcels, so the code will be less efficient. Likewise, α could be reduced if higher temporal accuracy is needed for some reason. The values of Ṽ_min and Ṽ_max can be adjusted, but extensive tests indicate that the recommended values enable an adequate representation of subgrid-scale variations. An improvement we hope to make in the near future concerns small, inactive parcels left over following a period of strong stretching. In the density-stratified flows examined here, a descending patch of denser fluid or a rising patch of lighter fluid leaves a turbulent wake which decays in time through entrainment and mixing. This may leave many small parcels in nearly stagnant regions with low strain. Numerically, these small parcels over-represent the flow, and moreover increase the computational burden and memory footprint. A possible remedy is to make the minimum parcel area fraction Ṽ_min depend on certain flow properties. For example, Ṽ_min could increase as the stretching rate γ, buoyancy gradient |∇b| and/or vorticity gradient |∇ω| decrease.
The EPIC method, developed here for incompressible density-stratified flows under the Boussinesq approximation, could be extended to much more diverse physical systems. The core aspect of the method, namely representing a continuum by a space-filling set of elliptical parcels, does not depend crucially on the physical system modelled, so long as it is advection dominated, with diffusion playing a minor role except perhaps in subgrid-scale turbulent mixing. Even if diffusion is important for certain quantities but not for all, as in magneto-hydrodynamic turbulence at small magnetic Prandtl number [35], a hybrid EPIC method could be used. This hybrid method would evolve diffused quantities on a regular grid using conventional numerical methods, and use parcels to evolve the remaining advection-dominated quantities (this would be analogous to the 'Contour Advection' method used in [35]).
A similar strategy may be applied to systems involving a mix of predominantly large-scale waves and advection-dominated quantities, as in rotating shallow-water turbulence [36]. The latter is an example of a compressible (2D) flow. In EPIC, parcel areas would then generally change in time, meaning that all three distinct components of the shape matrix (B11, B12 and B22) would have to be evolved independently. This is however straightforward. We would still require that the parcel-interpolated area V̄ at each grid point is as close as possible to the grid cell area V_cell. The algorithm to do this is exactly the same as the one we have developed for incompressible EPIC.
The original motivation for developing EPIC was to improve the modelling of cloud growth and decay in the atmosphere, following on from the moist parcel-in-cell (MPIC) method [4,10]. In fact, the 2D model developed here optionally includes moisture to allow for condensation and evaporation, and the associated latent heat exchanges. Our next step is to generalise EPIC to three dimensions, to properly account for turbulent processes like vortex stretching. As in MPIC, each parcel will then carry a vector vorticity which evolves not only because of buoyancy gradients (baroclinic processes), but also because of stretching. Otherwise, the attributes carried on parcels are the same as in the 2D model. Moreover, many of the numerical methods carry over, largely unchanged, to the 3D model. The exception is the shape matrix B, which then has 5 independent components (in an incompressible flow) rather than 2. The evolution of B is however formally identical. The interpolation in 3D will use 4 support points lying in the major-middle axis plane of the ellipsoid (rather than 2). These support points are the analogues of Gaussian quadrature points for volume integration [17]. Their location depends on obtaining the eigenvalues and eigenvectors of B, which is less straightforward than in 2D but is nonetheless efficient. A 3D version will also incorporate distributed-memory parallelism. Most of the functionality required for parallelism has been previously implemented for the Moist Parcel-in-Cell method [37], but some adjustments will be needed because the 4 parcel support points can be in different grid cells. The new merging algorithm (see Appendix D) will also require local parallel communication. We hope to report on the 3D version of EPIC in the near future.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

The eigenvector along the major axis of the ellipse is given by â = (â_x, â_y) ≡ (cos φ, sin φ), the (normalised) solution to the eigenvalue equation for B. Both rows are equivalent; using the second gives Eq. (12). Under the assumption that a ≥ b, the semi-focal length c as given in Eq. (11) is obtained by inserting Eq. (B.13), yielding Eq. (B.16).

Appendix C. Parcel splitting
To derive the positions and centres of the two identical 'child' parcels that result from splitting, we orientate, without loss of generality, the major axis of the 'parent' parcel along the x axis, so that its minor axis is along the y axis (cf. Fig. C.28). After splitting, the child parcels must each have half of the area, the same total centroid and the same total second-order spatial moments as the parent parcel. The child parcels are therefore centred on the x axis at x = x̄ ± (h, 0), with h to be determined by equating the second-order moments. The moments for the parent parcel are given by

∬_E x² dx dy = (π/4) a³ b,  ∬_E y² dx dy = (π/4) a b³,  ∬_E xy dx dy = 0,

where E denotes the area of integration (here we have used elliptical polar coordinates (R, θ), in which x = aR cos θ and y = bR sin θ). The last integral is zero due to symmetry. Assuming the child parcels have the same shape, we can evaluate the moments for just one of them and double the result:
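Carrying these constraints through (equal areas, same centroid, equal second-order moments, same shape) gives children with semi-axes a/2 and b, centred at x = ±(√3/4) a along the major axis. The sketch below checks this numerically for an axis-aligned parent; it is an illustration of the algebra under the stated assumptions, not the EPIC code:

```python
import numpy as np

def moments(a, b, xc):
    """Area and second moments of an ellipse with semi-axes a, b (major
    axis along x) centred at (xc, 0); the parallel-axis theorem supplies
    the offset contribution to the x moment."""
    V = np.pi * a * b
    return V, 0.25 * np.pi * a**3 * b + V * xc**2, 0.25 * np.pi * a * b**3

def split(a, b):
    """Split a parent ellipse into two equal children that conserve area,
    centroid and second-order moments: semi-axes a/2 and b, centres at
    x = +/-(sqrt(3)/4) a along the major axis."""
    h = np.sqrt(3.0) / 4.0 * a
    return (a / 2.0, b, h), (a / 2.0, b, -h)
```

Summing the children's area and moments and comparing with the parent's confirms the conservation properties of the split.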

D.1. Enforcing the merger rules
In this section, the algorithm for merging parcels is discussed in detail. We start by showing a number of parcel configurations of increasing complexity, introducing the concepts of leaf parcels, unavailable parcels and double links. Subsequently, a full example of the merging algorithm with several iterations is given. The merging algorithm ensures all small parcels are merged, and consists of an iterative stage, and a second stage that deals with small parcels that point to each other (i.e. they are each other's nearest neighbour).

1. The first example is a small parcel pointing to a large parcel, with no other parcels involved. In this example, the algorithm will merge the two parcels. The small parcel is labelled in green to indicate that it is a so-called leaf parcel. This terminology comes from graph theory, and more specifically the study of trees. A tree is a connected graph without cycles. In our case, a leaf parcel is defined as a small parcel that does not have any parcels pointing to it. In the iterative approach, we ignore merged parcels after each iteration, which means a parcel can become a leaf parcel in a later iteration (after a parcel that was pointing to it is merged).
2. The second example is a chain, where an additional intermediate small parcel is added. In this case, we merge small parcels a and b, and ignore parcel c as it is a large parcel. In our algorithm, every parcel that has an (unmerged) non-leaf parcel pointing at it, such as parcel c which has parcel b pointing at it, is marked as unavailable for merging in the current iteration. Unavailable parcels are identified only after all leaf parcels have been determined.
3. In the third example, another intermediate parcel is introduced. In this case, we merge parcels a and b in the first iteration, while parcels c and d are marked as unavailable. Parcels c and d are merged in a second iteration. An example of the merging procedure with several iterations is given later in this section.
4. The fourth example is similar, but involves a pair of double-linked parcels instead of a small and large parcel. Again, a and b are merged in the first iteration. A special procedure that deals with double-linked parcels is introduced later in this section. Double-linked parcels cannot become leaf parcels during the iterative stage, as they will always continue to have the incoming link from the other double-linked parcel. For the same reason, they will remain marked as unavailable.
5. The fifth example shows a group merge, where two parcels (a and b) point to the same parcel (c). Any parcel configuration without a double link can be described as a tree structure, where the edges are all directed towards the large parcel. In graph theory, this is known as an in-tree. Here, the parcels a and b merge with c. The parcel d merges with parcel e.
A more complex example of a configuration having a double link is given in Fig. D.30. A double link introduces a cycle into the graph, which is therefore no longer a tree in the strict sense. However, the graph can be described as a combination of two trees which share two nodes at the double link (a dual tree). In Fig. D.30, the different trees are indicated by the shading of parcels and links (the purple nodes indicate parcels that are shared between the blue and red tree). Any graph can contain at most one double link, as each parcel can only point to one other parcel. Larger cycles are also excluded: given a set of small parcels, there will be one link which corresponds to the shortest distance (the double link). The corresponding parcel pair will not have another outward link that would be needed to form a larger cycle.
The first (iterative) stage of the algorithm deals both with trees having a large parcel, and with any links in a dual tree that do not involve a double-linked parcel. As discussed above, the double-linked parcels remain unavailable for merging during the iterative stage of the algorithm and require a separate (though simple) procedure.

An example of the iterative algorithm is given in Fig. D.31. The algorithm gradually merges the (dual) tree, starting from the initial leaf parcels. In each iteration, we first establish the leaf parcels, followed by the unavailable parcels. Then we determine valid mergers, which involve one or more leaf parcels merging with an available parcel.

In the first iteration shown in Fig. D.31, parcel c is the only available parcel, and we merge parcels a and b with it. Subsequently, these parcels are marked as merged, indicated by light blue shading here. The link from the merged parcel c to the unmerged parcel d is disregarded in the second iteration, and d now becomes a leaf parcel. This in turn means parcel f becomes available, and parcels d and e can be merged with it. Once this has been done, no further mergers can be established: note parcel i cannot merge with the unavailable parcel h.

Once no further mergers have been found during an iteration, the algorithm proceeds to resolving the double links and the parcels that point to double-linked parcels (see Fig. D.32). We now mark any double-linked parcel that is connected to at least one leaf parcel as available for merging. Here, this means h becomes available for merging, and both g and i are merged with it. In principle, we could mark parcel g as a leaf parcel as well at this point, but there is no need to do this in our implementation.

The outgoing links from an available parcel are removed in this stage. In some other parcel configurations, both double-linked parcels have leaf parcels pointing to them and become available: in such cases the double link is broken in both directions (see Fig. D.33).

An isolated pair of double-linked parcels results in both parcels being marked as unavailable. If a link between two unavailable parcels is found while resolving double links (e.g. a pointing to b and vice versa, as in Fig. D.34, panel 1), we mark the parcel with an outgoing link (e.g. parcel a in panel 2) as available, so that parcel b merges with it when the link pointing from b to a is found (panel 3). Note that because of the conservative properties of the merging process, the direction of merging is arbitrary: parcel a merging into b yields the same result as b merging into a.
Although the current implementation of EPIC only has shared-memory parallelism, the algorithm has been designed with distributed parallelism in mind. The number of iterations involved in tests with about 300,000 parcels is typically at most 5, although this number goes up to about 10 during the initial stage of the Taylor-Green simulation, where chains of small parcels tend to occur near the symmetry axis. The number of parcels that need to be merged in any time step is small, and none of the operations involved are global.
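The leaf/unavailable bookkeeping described in this appendix can be sketched as follows. This is a simplified model of the rules, not the EPIC implementation; `links[i]` holds the nearest neighbour that small parcel i points to, or None for large parcels:

```python
def merge_stage(links, is_small):
    """Sketch of the iterative merging stage: find leaf parcels, mark
    unavailable parcels, perform valid mergers, and repeat; a second
    stage resolves an isolated double link.  Returns the list of
    (small parcel, target) mergers performed."""
    n = len(links)
    merged = [False] * n
    mergers = []
    while True:
        active = [i for i in range(n) if is_small[i] and not merged[i]]
        pointed_at = {links[i] for i in active}
        leaves = {i for i in active if i not in pointed_at}
        # Targets of unmerged non-leaf parcels are unavailable this iteration.
        unavailable = {links[i] for i in active if i not in leaves}
        found = [(i, links[i]) for i in leaves
                 if links[i] not in unavailable and not merged[links[i]]]
        if not found:
            break
        for i, t in found:          # mark after collecting: allows group merges
            mergers.append((i, t))
            merged[i] = True
            merged[t] = True
    # Second stage: resolve a remaining isolated double link (i <-> j).
    for i in range(n):
        if is_small[i] and not merged[i]:
            j = links[i]
            if j is not None and not merged[j] and links[j] == i:
                mergers.append((i, j))
                merged[i] = merged[j] = True
    return mergers
```

Within one iteration, all valid mergers are collected before any parcel is marked as merged, so group merges (several leaves pointing to the same parcel) are handled naturally.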

D.2. Merging parcel groups
The merger rules above identify groups of parcels to be merged in the current time step. The actual merger is carried out as follows. Each merger group consists of a set of elliptical parcels centred at x_k, with areas V_k = π a_k b_k and shape matrices with components

B_11,k = (4/V_k) ∬_{E_k} x′² dx dy,  B_12,k = (4/V_k) ∬_{E_k} x′ y′ dx dy,  B_22,k = (4/V_k) ∬_{E_k} y′² dx dy,

where E_k denotes the area of integration, while x′ ≡ x − x_k and y′ ≡ y − y_k are coordinates relative to the parcel centres. Here the index k identifies the parcels belonging to the specific merger group under consideration.

We then create a merged parcel having total area V = Σ_k V_k and centre x̄ = Σ_k V_k x_k / V (modulo periodicity in x). The corresponding B matrix components are found from the combined second-order moments, e.g.

B_11 = (4/V) Σ_k ∬_{E_k} (x − x̄)² dx dy,

where x̃_k ≡ x_k − x̄. Performing the integrals yields Eq. (19). These components are scaled by a constant factor, as described in Section 2.6, to ensure that the determinant of B is a²b² = (V/π)², as required for consistency.
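A sketch of this group merger, under the convention that B_11 = a² for an axis-aligned parcel (so parcel second moments are V_k B_k / 4), might look as follows; this is an illustration, not the EPIC code:

```python
import numpy as np

def merge_group(x, vol, B):
    """Merge a group of elliptical parcels (centres x[k], areas vol[k],
    2x2 shape matrices B[k]) into a single parcel conserving total area,
    centroid and second-order spatial moments, then rescale so that
    det(B) = (V/pi)**2.  Assumes the convention B = diag(a**2, b**2)
    for an axis-aligned parcel."""
    V = vol.sum()
    xc = (vol[:, None] * x).sum(axis=0) / V          # merged centroid
    Bm = np.zeros((2, 2))
    for k in range(len(vol)):
        xt = x[k] - xc                               # offset from merged centre
        # parcel moment (V_k B_k / 4) plus parallel-axis term, times 4/V
        Bm += vol[k] * (B[k] + 4.0 * np.outer(xt, xt))
    Bm /= V
    Bm *= (V / np.pi) / np.sqrt(np.linalg.det(Bm))   # enforce det(B) = (V/pi)**2
    return xc, V, Bm
```

Merging the two children produced by the splitting rule of Appendix C recovers the parent parcel exactly, which makes a convenient round-trip test of the conservation properties.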

Appendix E. Time step adaptation
In EPIC, the time step is adapted to ensure that the parcel stretch between successive time steps is kept small, and that buoyancy oscillations (internal waves) are well resolved in time.

Fig.: Parameters are as in Table 2, except for the initial number of parcels per cell, here 4 as in [5]. The black solid line defines the reference solution obtained from the contour advection (CA) algorithm on a 400 × 400 grid.
Next, we seek to find the parcel shape which maximises γ .

G.2. Variations in the numerical parameter settings
We next use the Straka test case to find suitable values for all user-defined input parameters. To this end, we fix the grid resolution at 512 × 64 cells but vary individual input parameters over the ranges indicated in Table G.6, holding all other parameters fixed to the values indicated in bold. We also use 9 parcels per cell initially, as this appears to be a good choice for keeping the r.m.s. area error uniformly small in time (see Fig. 10).
The default parameters in Table 2 were chosen for efficiency and accuracy, but the results prove not to be sensitive to these choices. This is quantified next by comparing the location of the buoyancy-weighted centre of mass (x_b, y_b) at time t = 900 s, evaluated over the right half of the domain (x ≥ 0). We then compare these values with their reference values x_b^ref and y_b^ref, obtained using all the default parameters in bold, by computing the maximum relative deviation, normalised by the mean radius of the initial elliptical buoyancy distribution. The percentage maximum relative deviation is listed in Table G.7 for each varied parameter (holding all others fixed). We see that the deviations are exceedingly small for both Ṽ_max and C_max, likely because these parameters are rarely needed over the course of the simulation. On the other hand, and as expected, the maximum aspect ratio λ_max and the time step pre-factor α have the greatest influence on the flow. Nevertheless, the deviations remain small in either case.

Fig.: Timings for 512 × 64 and 256 × 384 grid cells, respectively. As expected, the operations par2grid and grid2par take most of the time; together they account for about 60% of the total time. All timings below 5% of the total time are accumulated in others.
We next consider the impact of λ_max on other measures of accuracy. Fig. G.40 (top panel) shows that the gridded relative r.m.s. area error reduces when using smaller maximum aspect ratios λ_max. This improvement comes, however, at a higher cost, since the number of parcels per cell increases (see middle panel) and parcels split more frequently (see bottom panel).
As the loss in efficiency going from λ max = 5 to λ max = 4 is relatively small, while the gain in accuracy is more significant, we have adopted λ max = 4 as the default, recommended value.
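The buoyancy-weighted centre-of-mass diagnostic used in this section can be sketched as follows. The function and array names are our own; the gridded buoyancy b is evaluated on coordinate arrays x and y:

```python
import numpy as np

def centre_of_mass_right_half(x, y, b):
    """Buoyancy-weighted centre of mass over the right half (x >= 0) of a
    gridded field b; a sketch of the diagnostic, with hypothetical
    array names."""
    mask = x >= 0.0
    w = b[mask]
    return (w * x[mask]).sum() / w.sum(), (w * y[mask]).sum() / w.sum()

def max_relative_deviation(xb, yb, xb_ref, yb_ref, r_mean):
    """Maximum relative deviation from the reference values, normalised
    by the mean radius of the initial buoyancy distribution."""
    return max(abs(xb - xb_ref), abs(yb - yb_ref)) / r_mean
```

Computing this for each varied parameter, with the reference values taken from the run using the default (bold) settings, reproduces the kind of sensitivity measure tabulated in Table G.7.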

G.3. Timings
Detailed timing percentages for the Straka and Robert test cases are given in Fig. G.41. The figure shows that the routines interpolating to and from the grid, i.e. par2grid and grid2par, are responsible for the highest costs. Together, they take approximately 60% of the total time. The advection of the parcels (parcel push) by the low-storage RK4 algorithm uses about 12%. The newly developed parcel correction methods (gradient and laplace correction) account for about 20%. All other operations taking less than 5% each are accumulated in others, accounting for 8% to 9%.

Appendix H. Subgrid visualisation
For subgrid visualisation and detailed spectral analysis, we follow the approach described in [10], extended to elliptical parcels with some modifications. Each parcel contributes to the fine-scale gridded field at a number of grid points in its surroundings. The values of a field q on the fine-scale grid, q(x) (written without an overbar, to prevent confusion with the coarser grid used for advection), are given by a sum of contributions from nearby parcels (indexed by i):

q(x) = Σ_i q_i f(r_eff(x, x_i)) / Σ_i f(r_eff(x, x_i)),

where r_eff(x, x_i) is the effective radial distance in elliptical coordinates, i.e. the distance between x_i and x normalised by the distance between the ellipse centre point and its edge on the line connecting x_i and x. For the plots in this article, we use a generalised form of equation A7 in [10]. Here, r_v and p_v are two coefficients, for which we recommend the choices r_v = 1 and p_v = 3. A low value of r_v (e.g. 1) and a high value of p_v (e.g. 3) result in the shapes of individual ellipses being visible, with sharp gradients between individual ellipses, and give larger spectral contributions at high subgrid wavenumbers, whereas the combination of a relatively high value of r_v (e.g. 2) and a low value of p_v (e.g. 2) smooths out filaments in the flow. A low value of p_v also means that parcel contributions fall off less rapidly at larger distances from a parcel.
The area integral ∬ f(r_eff) dA is proportional to the volume of the parcel. We choose a relatively large value of r_max = 4 here, so that the discontinuity in f at r_eff = r_max is small. In order to calculate the contributions of each parcel efficiently, we first determine the major semi-axis length a_i of the ellipse from B_i. We then consider the fine-scale grid points in a square with sides 2 a_i r_max around the ellipse centre. For each such grid point, we first check whether it lies within the circle of radius a_i r_max (this is relatively efficient). If it does, we further check if it falls within the projection ellipse r_eff < r_max defined in Eq. (H.3). Only then do we add its contribution to q(x).
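For an ellipse written as x′ᵀ B⁻¹ x′ = 1 (with B = diag(a², b²) when axis-aligned), the effective radial distance reduces to sqrt(x′ᵀ B⁻¹ x′), since the edge point along the direction of x′ lies at unit value of this quadratic form. A sketch of this computation (our own helper, not the EPIC routine):

```python
import numpy as np

def r_eff(x, xi, B):
    """Effective radial distance between grid point x and parcel centre
    xi: the Euclidean distance normalised by the distance from the
    centre to the ellipse edge along the same line, i.e.
    sqrt(x'^T B^{-1} x') for an ellipse x'^T B^{-1} x' = 1.
    Assumes the convention B = diag(a**2, b**2) when axis-aligned."""
    xp = np.asarray(x, float) - np.asarray(xi, float)
    return float(np.sqrt(xp @ np.linalg.solve(B, xp)))
```

Points on the ellipse boundary give r_eff = 1, and the cheap circle test of radius a_i r_max bounds r_eff from below, so it can safely pre-filter grid points before this evaluation.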
The time stepping of Eq. (I.1) and Eq. (I.2) is done in spectral space, using the (implicit) trapezoidal rule with iteration.
Specifically, if we denote the (k, l) spectral coefficient of any quantity q by q̂, then we solve the discretised equations below, where n denotes the time level, Ŝ_ζ is the Fourier coefficient of the vorticity source term −u · ∇ζ + b_x, and similarly Ŝ_b is the Fourier coefficient of the buoyancy source term −u · ∇b (all spatial derivatives are computed spectrally by wavenumber multiplication). In the above, quantities at time level n correspond to time t and are known, while we seek ζ̂^{n+1} and b̂^{n+1} at time t + Δt. In spectral space, the dissipation operator is D = ν(k² + l²) for molecular diffusion, and D = C ζ_char^n (k² + l²)³ for hyperdiffusion.
We solve the above system iteratively, where L ≡ 2/(1 + ½ Δt D) is fixed. The above update is performed three times, each time with an improved estimate of the source terms. The time step Δt is adapted before the iteration is carried out, using the same restrictions on the maximum strain rate and buoyancy frequency enforced in EPIC, see Eq. (41), except here we take α_s = α_b = 0.1 by default. This is half the default used in EPIC, but the time stepping used in the pseudo-spectral code is only second order, compared to fourth order in EPIC. Additionally, to maintain numerical stability, it is necessary to keep the time step below the CFL limit Δt_CFL = min(Δx, Δy)/|u|_max. Here we require both Eq. (41) and Δt < 0.8 Δt_CFL; the factor of 0.8 is necessary due to nonlinearity, which is not accounted for when deriving the CFL limit. The source terms in Eq. (I.6) depend on the velocity field u, which must also be updated using the most recent estimate of ζ^{n+1}. In spectral space, the streamfunction satisfies ψ̂ = −ζ̂/(k² + l²) for k and l not both zero. The problem with this solution is that it does not satisfy the boundary condition v = ∂ψ/∂x = 0 on y = y_min and y_max. This is because ζ is represented as a cosine series in y to allow the vorticity to take arbitrary boundary values, and v = 0 there requires ψ = 0, except for the x-independent Fourier modes with k = 0. It is thus impossible to express ψ entirely in terms of a cosine series in y. If we further require that the mean horizontal flow, i.e. the part of u = −∂ψ/∂y with k = 0, be zero on the boundaries (without loss of generality), we can then require ψ = 0 on the boundaries, even for the k = 0 modes. The boundary conditions on ψ can be enforced by adding solutions to Laplace's equation, ∇²ψ = 0.
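A minimal sketch of the iterated trapezoidal update is given below, assuming the fixed quantity ζ̃_m ≡ ζ̂^n + ¼ Δt Ŝ^n and the update ζ̂^{n+1} = L(ζ̃_m + ¼ Δt Ŝ^{n+1}) − ζ̂^n, which follows algebraically from the trapezoidal rule with the definition of L above. `S_func` stands for whatever evaluates the source term from the current estimate (an assumed interface, not EPIC's):

```python
def step_trapezoidal(zeta_hat, S_func, D, dt, n_iter=3):
    """One iterated implicit trapezoidal step in spectral space for
    d(zeta)/dt = S - D*zeta, treating the dissipation implicitly:
      zeta_m   = zeta^n + (dt/4) S^n                       (fixed)
      zeta^{n+1} = L (zeta_m + (dt/4) S^{n+1}) - zeta^n,   L = 2/(1 + dt*D/2),
    re-evaluating S^{n+1} at each of n_iter passes (three, as in the text)."""
    L = 2.0 / (1.0 + 0.5 * dt * D)
    S_n = S_func(zeta_hat)
    zeta_m = zeta_hat + 0.25 * dt * S_n
    zeta_new = zeta_hat  # first pass then effectively uses S^{n+1} ~ S^n
    for _ in range(n_iter):
        S_new = S_func(zeta_new)
        zeta_new = L * (zeta_m + 0.25 * dt * S_new) - zeta_hat
    return zeta_new
```

With S = 0 this reduces to the exact trapezoidal decay factor (1 − ½ Δt D)/(1 + ½ Δt D), which is a quick way to check the algebra.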
Let the streamfunction ψ be separated into three parts, ψ = ψ_a + ψ_p + ψ_h, where ψ_a = ½ ζ_a (y − y_min)(y − y_max) is the part due to the domain-average vorticity ζ_a (associated with k = l = 0), ψ_p is the particular solution arising from ψ̂_p = −ζ̂/(k² + l²) in spectral space (for k and l not both zero), and ψ_h consists of solutions to Laplace's equation, i.e. linear combinations of e^{ikx} e^{±ky} for each discrete k. We want ψ_h = −ψ_p on each boundary, and there are just enough solutions of Laplace's equation to achieve this, Eq. (I.8). The first term in the sum, for k = 0, is found by taking the limit k → 0; e.g. sinh k(y_max − y)/sinh kL_y reduces to (y_max − y)/L_y.
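The boundary matching and the k = 0 limit can be checked with a short sketch; `psi_h_profile` is a hypothetical helper evaluating the harmonic correction for a single horizontal wavenumber, with a∓_k the boundary values of ψ_p at y_min and y_max:

```python
import numpy as np

def psi_h_profile(k, a_minus, a_plus, y, y_min, y_max):
    """Harmonic correction for one wavenumber k: a combination of sinh
    profiles equal to -a_minus at y = y_min and -a_plus at y = y_max,
    so that psi_p + psi_h vanishes on both boundaries.  Each profile
    times e^{ikx} solves Laplace's equation.  The k = 0 case uses the
    limit sinh(k s)/sinh(k L_y) -> s/L_y quoted in the text."""
    L_y = y_max - y_min
    if k == 0:
        f_lo = (y_max - y) / L_y
        f_hi = (y - y_min) / L_y
    else:
        f_lo = np.sinh(k * (y_max - y)) / np.sinh(k * L_y)
        f_hi = np.sinh(k * (y - y_min)) / np.sinh(k * L_y)
    return -(a_minus * f_lo + a_plus * f_hi)
```

Evaluating at y = y_min gives −a⁻_k exactly (the first sinh ratio is 1, the second 0), confirming that the two profiles per wavenumber are just enough to cancel ψ_p on both boundaries.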
Once the streamfunction ψ is determined, it is re-expressed as a sine series in y. This allows the velocity field u to be computed efficiently in spectral space by simple wavenumber multiplication. An inverse fast Fourier transform is then performed to recover u in physical space.
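The spectral differentiation step can be sketched as follows, on a doubly periodic [0, 2π)² grid for simplicity (the paper instead uses sine/cosine series in y, so this is illustrative only):

```python
import numpy as np

def velocity_from_streamfunction(psi):
    """Compute u = -d(psi)/dy, v = d(psi)/dx by wavenumber multiplication
    on a doubly periodic grid (axis 0 is y, axis 1 is x), followed by an
    inverse FFT back to physical space."""
    ny, nx = psi.shape
    ikx = 1j * np.fft.fftfreq(nx, d=1.0 / nx)  # i*k for d/dx
    iky = 1j * np.fft.fftfreq(ny, d=1.0 / ny)  # i*l for d/dy
    psi_hat = np.fft.fft2(psi)
    u = np.real(np.fft.ifft2(-psi_hat * iky[:, None]))  # u = -dpsi/dy
    v = np.real(np.fft.ifft2(psi_hat * ikx[None, :]))   # v =  dpsi/dx
    return u, v
```

For ψ = sin(y) this returns u = −cos(y) and v = 0 to machine precision, which is the standard sanity check for the sign conventions.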
To justify the choice of hyperdiffusion, in particular the use of the characteristic vorticity ζ_char defined above as a premultiplier, Fig. I.43 shows vorticity diagnostics from a 4096 × 512 pseudo-spectral simulation of the Straka test case. Here, we recall the governing system of equations for the scalar vorticity ζ = v_x − u_y (spatial subscripts indicate partial differentiation, while u = (u, v) is the velocity field) and buoyancy b = −g(ρ − ρ_0)/ρ_0:

Dζ/Dt = b_x

Fig. 3. Parcel-to-grid interpolation across periodic boundaries (x grid points are labelled). Here, the elliptical parcel whose centre lies in the first horizontal grid cell has a support point in the periodic extension, lying in the grid cell at the opposite end of the domain.

Fig. 4. Parcel-to-grid interpolation across free-slip boundaries (y grid points are labelled).

Fig. 5. Parcel mirroring at vertical boundaries. The child parcel in the halo region (black, dashed ellipse) is mirrored such that its centre is inside the domain. The orange, dashed ellipse denotes the parent parcel that was split.

Fig. 7. Parcel configurations before (leftmost) and after applying the different parcel correction methods twice in the domain [0, 1]², discretised into 32 × 32 cells. Each subplot shows only the lower-left sixteenth of the domain. The divergent flow correction tends to cluster the parcels compared to the gradient correction. The parcels are more homogeneously distributed after applying the divergent flow and gradient corrections sequentially.

Fig. 8. Flow chart of the EPIC program. The user provides gridded data fields, which are used to initialise the parcels. Occasionally, parcel and field data are …

Fig. 9. Evolution of the r.m.s. vorticity ζ_rms for the Taylor-Green test case with initially 9 parcels per cell. The decay reduces with increasing grid resolution.

Fig. 11. Parcel configuration at times 0, 5, 20 and 80 (from left to right and top to bottom). The parcels are coloured according to their aspect ratio λ. This simulation starts with 9 parcels per cell on a 32 × 32 grid, giving a total of 9216 parcels.

Fig. 12. The symmetry error in the Taylor-Green simulation, as measured by the r.m.s. value of the vorticity folded into x ≥ 0, normalised by the initial maximum vorticity.

Fig. 13. Parcel configuration of the Straka density current test case (4096 × 512 grid cells) at 900 s in the subdomain [0, 18] × [0, 6.4] km. This simulation uses the default parameter values listed in Table 2. Each parcel is coloured according to its aspect ratio (above) or buoyancy (below). The square zoom windows in the lower panel are 600, 100 and 25 m wide, the last corresponding to two grid cell widths. Individual parcels can be seen in the finest zoom; they are overplotted depending on their order and thus do not give the actual buoyancy field. To make the images in the lower panel, the buoyancy field is first obtained on a very fine grid using the subgrid-scale visualisation method described in Appendix H; then the individual parcels are plotted over this background. This ensures that any voids between the parcels are filled. Subsequent figures use subgrid-scale visualisation alone, which produces a smoother image at the finest scales. Note: the parcel number indicated on this figure refers only to the subdomain shown.

Fig. 15. Zoom of the buoyancy field b at t = 900 s, comparing EPIC (left column), PS with hyperviscosity (PS-h, middle column), and PS with molecular viscosity (PS-m, right column) for various grid resolutions (increasing downwards as indicated). Note that, in any row, the PS simulations employ a grid twice as fine as the EPIC one. In this test case, the exact b_min = −g ΔT_max/T_0 = −0.4905 m/s², while b_max = 0 m/s².

Fig. 16. Comparison of the buoyancy extrema b_max(t) (top row) and b_min(t) (bottom row) in the Straka test case, for different grid resolutions and different models (EPIC, PS-h and PS-m). The dashed lines indicate the exact initial values, which are theoretically conserved by Eq. (2). In EPIC, b_max(0) and b_min(0) vary with resolution because parcel attributes are initialised from gridded fields; see Section 3.1.

Fig. 18. Cost in CPU seconds versus resolution for EPIC, PS-h and PS-m in simulating the Straka test case.

Fig. 19. Development of the buoyancy field in the Robert test case using 256 × 384 grid cells in EPIC. Only the vertical range [100, 1350] m is shown.
The warm bubble is centred at x_c = (0, 300) m with A = 0.5 K, R = 150 m and σ = 50 m, while the small cold bubble is centred at x_c = (60, 640) m with A = −0.15 K, R = 0 m and σ = 50 m. The bubbles are contained in a box of width 1000 m and height 1500 m, whose lower-left corner lies at (−500, 0) m. The corresponding buoyancy used in EPIC is b(r) = g θ(r)/θ_0 with reference potential temperature θ_0 = 303.15 K and gravitational acceleration g = 9.81 m/s².

Fig. 21. Evolution of the gridded relative r.m.s. area error (top), the average number of parcels per grid cell (middle), and the percentage of small parcels (bottom) for the asymmetric Robert test case at different grid resolutions.

Fig. 24. Comparison of the buoyancy extrema b_max(t) (top row) and b_min(t) (bottom row) in the Robert test case, for different grid resolutions and different models (EPIC, PS-h and PS-m). The dashed lines indicate the exact initial values, which are theoretically conserved by Eq. (2). In EPIC, b_max(0) and b_min(0) vary with resolution because parcel attributes are initialised from gridded fields; see Section 3.1.

Fig. 25. Comparison of the gridded buoyancy power spectrum P(k) at t = 900 s in the Robert test case, for different grid resolutions and different models (EPIC, PS-h and PS-m). Note: spectra are cut off below P = 0.01. For comparison with the Straka test case, see Fig. 17.


Fig. 26. Cost in CPU seconds versus resolution for EPIC, PS-h and PS-m in simulating the Robert test case.

Fig. 27. Volume-weighted buoyancy histograms of the Robert test case at times t = 0 s (top panel) and t = 900 s (middle panel), together with the difference (final minus initial, bottom panel) for various grid resolutions. The bin width is 2.32 × 10⁻⁴ m/s². Each histogram is normalised such that its area sums to 1.

Fig. C.28. An elliptical parcel centred at the origin with major axis along the x axis before splitting. The dashed ellipses represent the child parcels obtained after splitting.

Fig. D.29. Examples of parcel merging chains and a tree.

Fig. D.29 shows five different parcel configurations: the arrows in the diagrams refer to parcels pointing to other parcels.

Fig. D.30. A parcel merging graph that is a dual tree.

Fig. D.32. Merging procedure example: resolution of a dual link and final configuration.

Fig. D.33. Merging procedure example: resolution of a dual link with leaf parcels on both sides. 1) Configuration after the first (iterative) stage. 2) Both dual-linked parcels are made available, and outgoing links are removed. 3) Final configuration.

Fig. D.34. Procedure for resolving isolated dual links. 1) Configuration after the first (iterative) stage. 2) The first parcel encountered is made available, and the corresponding outgoing link is removed. 3) Final configuration.

Fig. F.35. Evolution of the tracer scalar field in the static Taylor-Green vortex test case (as defined in [5]) using 100 × 100 grid cells. We use the default parameter settings given in Table 2, except for the initial number of parcels per cell, here 4 as in [5]. The black solid line defines the reference solution.
Deviations of, e.g., the buoyancy-weighted centre of mass ȳ_b are still small, as shown in Fig. G.39 for variations in α, despite the complexity of the flow evolution.

Fig. G.40. Effect of changing the maximum aspect ratio λ_max on (top) the relative r.m.s. area error, (middle) the average number of parcels per grid cell, and (bottom) the percentage of small parcels with V_i ≤ V_min following splitting.

Fig. G.41. Timing percentages of the Straka and Robert cases using the default parameter values in Table 2 and 512 × 64 and 256 × 384 grid cells, respectively. As expected, the operations par2grid and grid2par take most of the time; together they account for about 60% of the total time. All timings below 5% of the total time are accumulated in others.

Fig. H.42. Determination of the region contributing to the subgrid visualisation of a parcel, with the fine-scale visualisation grid shown by grey points.
We first make a first estimate of ζ̂^{n+1} and b̂^{n+1}, then iterate the above equations two more times with improved estimates of the source terms Ŝ^{n+1}_ζ and Ŝ^{n+1}_b. Algebraically, it is simplest to first store the fixed quantities ζ̃_m ≡ ζ̂^n + ¼ Δt Ŝ^n_ζ and b̃_m ≡ b̂^n + ¼ Δt Ŝ^n_b; then, at each iteration, update ζ̂^{n+1} = L(ζ̃_m + ¼ Δt Ŝ^{n+1}_ζ) − ζ̂^n (and similarly for b̂^{n+1}), using the most recent estimate of ζ^{n+1} in the source terms. Since the flow is incompressible, we compute the velocity by expressing u = (−∂ψ/∂y, ∂ψ/∂x) and finding the streamfunction ψ from the solution to Poisson's equation ∇²ψ = ζ, as in Appendix A. Here, however, since ζ is represented as a double Fourier series, it makes sense to represent ψ similarly. Then, the (k, l) Fourier coefficient ψ̂ is found simply from ψ̂ = −ζ̂/(k² + l²), and we use solutions of Laplace's equation to enforce the boundary conditions. If we represent the boundary values of ψ_p by ψ_p(x, y_min) = Σ_k a⁻_k e^{ikx} and ψ_p(x, y_max) = Σ_k a⁺_k e^{ikx}, the required form for ψ_h is simply

ψ_h(x, y) = −Σ_k [a⁻_k sinh k(y_max − y) + a⁺_k sinh k(y − y_min)] / sinh(kL_y) e^{ikx}.

Fig. I.43. Time evolution of the r.m.s., maximum and characteristic vorticity values in the Straka test case with hyperdiffusion, as found in a pseudo-spectral simulation with grid resolution 4096 × 512. We see that the r.m.s. vorticity ζ_rms is smallest, as it is dominated by vast areas of the domain with little or no vorticity. On the other hand, the maximum vorticity ζ_max is largest, and is furthermore highly variable in time. The characteristic vorticity ζ_char lies between ζ_rms and ζ_max, and varies slowly in time like ζ_rms. Notably, ζ_char is about 4 times smaller than ζ_max.
Here x̃_i ≡ x_i − x̄ (see Appendix D for details). These matrix components are tentative because, as given, they do not generally preserve the total area, i.e. B_11 B_22 − (B_12)² = a²b² is not guaranteed. In order to resolve this issue, we simply scale each entry in Eq. (19) so that the new matrix preserves area.
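The rescaling amounts to one multiplication per entry; a minimal sketch (function name assumed), exploiting the fact that multiplying all entries of a 2 × 2 matrix by s multiplies its determinant by s²:

```python
import math

def rescale_B(B11, B12, B22, a, b):
    """Rescale tentative shape-matrix entries so that the determinant
    equals (a*b)^2, i.e. the parcel area is preserved.  The common
    factor (a*b)/sqrt(det) restores B11*B22 - B12^2 = a^2*b^2 exactly."""
    det = B11 * B22 - B12 * B12
    scale = (a * b) / math.sqrt(det)
    return B11 * scale, B12 * scale, B22 * scale
```

Because a single common factor is used, the ellipse's orientation and aspect ratio are unchanged; only its area is corrected.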

Table 1
Coefficients of the five-stage fourth-order 2N-storage Runge-Kutta scheme.

Table 3
Viscosity as a function of grid resolution.

Table G.7
The maximum relative deviations of the mass-weighted centre of mass at 900 s. The reference values are obtained using the default parameter settings.

Fig. G.39. Buoyancy-weighted centre of mass in the vertical direction for various time-step pre-factors α. Results are obtained with 512 × 64 grid cells.