LEoPart: a particle library for FEniCS

This paper introduces LEoPart, an add-on for the open source finite element software library FEniCS to seamlessly integrate Lagrangian particle functionality with (Eulerian) mesh-based finite element (FE) approaches. LEoPart - which is so much as to say: `Lagrangian-Eulerian on Particles' - contains tools for efficient, accurate and scalable advection of Lagrangian particles on arbitrary polyhedral meshes. In addition, LEoPart comes with several projection operators for exchanging information between the scattered particles and the mesh and \textit{vice versa}. These projection operators are based on a variational framework, which allows extension to high-order accuracy. In particular, by implementing a dedicated PDE-constrained particle-mesh projection operator, LEoPart provides all the tools for diffusion-free advection, while simultaneously achieving optimal convergence and ensuring conservation of the projected particle quantities on the underlying mesh. A range of numerical examples that are prototypical to passive and active tracer methods highlight the properties and the parallel performance of the different tools in LEoPart. Finally, future developments are identified. The source code for LEoPart is actively maintained and available under an open source license at https://bitbucket.org/jakob_maljaars/leopart.


Introduction
Passive and active tracer methods find applications in a versatile range of engineering areas such as geophysical flows and environmental fluid mechanics [1,2,3,4], experimental fluid mechanics [5,6,7], and bio-medical applications [8,9], to name a few. In passive tracers methods, the Lagrangian particle motion is fully determined by the carrier flow and there is no feedback mechanism from the particles to the carrier flow.
Such a feedback mechanism between the tracer particles and the carrier fluid is, however, typical to active tracer methods, in which particles actively affect the flow properties of the carrier fluid and vice versa. To Email addresses: j.m.maljaars@tudelft.nl / jakobmaljaars@gmail.com (Jakob M. Maljaars ), chris@bpi.cam.ac.uk (Chris N. Richardson ), nsime@carnegiescience.edu (Nathan Sime ) capture this interaction between the particles and the flow in numerical models, physical information which is carried by the tracer particles is used to estimate one or more additional source terms in the governing fluid equations. In a discrete setting, this typically requires the reconstruction of mesh fields from the scattered particle data when the fluid flow equations are solved using a mesh-based approach, such as finite difference (FD), finite volumes (FV), or finite elements (FE). Application examples of active tracer methods include, among many others, the modelling of turbulent (reacting) flows [10,11], and mantle convection problems [4,12,13,14]. Alternatively, the particle information can be used to solve the advective part of physical transport phenomena, resulting in so-called particle-mesh operator splitting schemes, see, e.g., [15,16], and the earlier work by Maljaars et al. [17,18,19] from which LEoPart has evolved. Eliminating artificial dissipation by using Lagrangian particles for the discretization of the advection operator primarily motivates such methods, which are particularly promising for simulating advection dominated flows [17,18] or free-surface flows [16,20].
In all the aforementioned methods and applications, particle-based and mesh-based discretization techniques essentially become intertwined. To render such a combination of Lagrangian particle methods in conjunction with mesh-based FD, FV, or FE solvers tractable for simulating practical engineering problems, a suite of dedicated, flexible and efficient tools is indispensable.
The open source library LEoPart which is presented in this paper provides such a toolbox by integrating particle functionality in the open source FE library FEniCS [21]. LEoPart -which stands for 'Lagrangian-Eulerian on Particles' -contains utilities for efficiently and flexibly advecting and tracking a set of user-defined particles on arbitrary polyhedral meshes. In addition, LEoPart contains a suite of tools for projecting scattered particle data onto the mesh and vice versa in a high-order accurate, efficient and physically sound manner. These features render LEoPart unique in its kind in that it is one of the few, if not the only, open source package embedding Lagrangian particles functionality in a mature open source FE library. The resulting combination of LEoPart and FEniCS is particularly suited for application to flow problems involving active or passive tracers, or to implement particle-mesh operator splitting schemes, as will be demonstrated by various numerical examples throughout.
The paper is structured as follows. Section 2 gives some background information on the encompassing FEniCS library, and provides a helicopter view of LEoPart. Section 3 describes the implementation of particles, as well as the advection and tracking of particles in LEoPart. Section 4 details the available particle-mesh interaction strategies, paying particular attention to the PDE-constrained particle-mesh interaction in Section 4.1.3, which enables the reconstruction of conservative mesh fields from a set of moving particles. Section 5 illustrates some example applications, meanwhile paying attention to the performance and scaling properties of LEoPart. Section 6 closes the paper by presenting conclusions and providing an outlook on future developments.

A primer on FEniCS
FEniCS is a Finite Element (FE) framework, written in C++, with a Python interface. One of the major challenges of writing an FE code is that the computation which needs most user configuration is that of the local element tensor, which lies at the innermost part of the assembly loop. The local element tensor describes the matrix entries on each individual element, and relates to the physical equations of the system. FEniCS solves this problem by allowing the user to write these equations in a Domain Specific Language (DSL), which is then automatically compiled to C code to be called by the assembly loop.
In addition to simplifying the construction of the local element tensor, FEniCS makes it easier to run in parallel using MPI. Running a DOLFIN code with MPI can be as simple as mpirun -n 32 python3 demo.py. The mesh will be partitioned into 32 chunks, and the problem will be distributed across 32 processes for this job. Calling the solve() method runs the form compiler (FFC) and compiles the symbolic expressions into C code, which is compiled and loaded into memory. A global matrix equation is then assembled using this generated code for the local element tensor, and finally an LU solver is called to solve the resulting system of equations. Whilst this example is simple and compact, many options exist to expand each part of the problem, for example by applying Dirichlet boundary conditions, or assembling matrix and vector separately, and choosing more sophisticated solvers.
For larger problems, it is important to run in parallel using MPI. Mesh partitioning is performed using PT-SCOTCH or ParMETIS, and there is support for the HDF5 file format, which allows parallel access of large datasets. Third party libraries are used throughout, wherever possible: PETSc is the linear algebra backend of choice, with a large selection of parallel solvers available.
FEniCS is an open source package, and is available for various platforms. The latest information can be found on the project website www.fenicsproject.org.

LE oP art code structure
LEoPart is built on top of the FEniCS package, and adds new concepts for the advection of particles on polygonal meshes, and the interaction between particles and the mesh. A central paradigm in the design of LEoPart is that it serves as an add-on to FEniCS, using the existing FEniCS tool chain wherever possible.
As a result all the FEniCS functionality remains available for the user. In particular, LEoPart is designed such that it seamlessly integrates with the mesh partitioning in FEniCS facilitating parallelism using MPI.
To provide a fast and user friendly suite of tools, LEoPart wraps C++ code in Python using pybind 1 for the computationally demanding parts such as the particle advection and the matrix assembly. Particle pre-processing and post-processing is done in Python. Fig. 1   In the remainder of this paper, we essentially discuss the different components from Fig. 1. Particular attention is paid to the particles class, the advection of the particles, and the tools for the interaction between the particles and the mesh via ℓ 2 projections or PDE-constrained projections.
Implementation details of this solver are beyond the scope of this paper.

Particle Functionality
This section explains the implementation of particles in LEoPart and discusses the particle advection and particle tracking strategy on polyhedral meshes.

Particle initialization
The particles class forms the backbone for dealing with the Lagrangian particles in LEoPart. Operations such as particle advection and the particle-mesh projections require as input an instance of this particles class, see Fig. 1.
Each particle is assumed to have at least one property attached to it, viz. a spatial coordinate, which henceforth is denoted by x p for a single particle. Moreover, it is presumed that particles always live in a spatial domain, denoted Ω, so that the particle coordinate set is defined as with N p the total number of particles. For notational convenience, we also will make use of the index set of particles and the index set of particles hosted by cell K, at a fixed time instant t, which are defined as Whereas carrying a spatial coordinate might be sufficient for passive particle tracing, additional properties, such as density, concentration, or momentum values, need to be attached to the particles for active particle tracing. For a scalar and vector valued property, such particle quantities are defined as respectively, where d = 1, 2, 3 is the spatial dimension.
The set of Lagrangian particles which is formed by the coordinate set X t , and an arbitrary number and ordering of scalar and/or vector valued particle quantities is used to instantiate the particles class in LEoPart. For example, instantiating the particles class on a unit square mesh from a user-defined coordinate array, together with arrays for a scalar-and a vector-valued property is as straightforward as For the sake of generality it is noted that the ordering and the length of the list with the particle quantities, i.e. [psi p, v p] in the above example is arbitrary.

Particle advection
Three different particle advection schemes are currently supported by LEoPart. These advection schemes solve the system of ODEs: given a vector-valued velocity field a h , solve ∀ p ∈ S t where a particle can carry an arbitrary number of scalar-and/or vector-valued quantities that will stay constant during the particle advection stage. The advect particles class solves Eq. (7) with a first-order accurate Euler forward method, and the two and three stage Runge-Kutta methods are available via the advect rk2 and the advect rk3 classes, respectively. The two multi-stage Runge-Kutta advection schemes inherit from the advect particles class, and a typical constructor for the latter reads This snippet shows that the particle advection classes require a particles instance, a velocity field specified in the Function and its corresponding FunctionSpace, and a MeshFunction for marking the boundaries, see Section 3.3. A complete overview of all the particle advection constructors is found in advect particles.h. Imperative for both the particle advection, as well as the particle-mesh interactions later on, is the evaluation of mesh fields at a potentially large number of points inside the domain. In order to do so, it must be known in which cell a particle resides, for which two options are available at a meta-level. The first option is that each particle carries a reference to its hosting cell in the mesh. As soon as a particle crosses a cell boundary, this particle-to-cell reference is updated. The second option is to consider a cell as a bucket filled with particles. Rather than a particle keeping track of its hosting cell, the bookkeeping is done at the cell level, i.e. each cell contains a list of particles. When a particle leaves a cell, the particle is removed from the cell's 'particle bucket', and added to the receiving cell's particle bucket.
The latter method is used in LEoPart as this allows efficient evaluation of a mesh-field at the particle positions. The expansion coefficients for the mesh-field are evaluated only once per cell, and are subsequently re-used for all particle positions within that cell. In parallel computations with distributed memory, the mesh partitioning which is taken care of by FEniCS, provides a convenient structure to store the particle data on the different processes. From a load balance perspective, such an approach is particularly suited for cases in which particles are more or less uniformly distributed over the spatial domain.
A second challenge, specifically related to the particle advection, is to efficiently keep track of the hosting cell for the Lagrangian particles in the unstructured polyhedral mesh. In the literature several procedures have been developed with this purpose. For example: superposition of a coarse Cartesian mesh onto the unstructured mesh [24], the tetrahedral walk method [25], or methods based on barycentric interpolation [26].
An alternative method is the convex polyhedron method [27], which assumes that the mesh consists of convex polyhedral cells. This indeed is the case for the simplicial cells used in FEniCS. For each facet in the mesh, the midpoint, the unit normal, and the connectivity of the facet are pre-computed. Concerning the connectivity, a facet has two neighboring cells for facets internal to the mesh, or only one neighboring cell for exterior boundary facets and internal facets which are located on processor boundaries. From the perspective of a mesh cell, indicated by K, the sign of the facet unit normals is adapted so as to make sure that they are always outwardly directed. Algorithm 1 Convex polyhedron particle tracking: pseudo-code for a single particle initially located in a cell K, using Euler forward for the particle advection.

7:
Time to facet intersection: ∆ti = bi/(a x 0 p , t · nK i ), with i the indices of the neighboring cells.

12:
∆t (k) ⇐ 0 13: else 14: Push particle to facet 15: if Facet has two neighboring cells then The convex polyhedron particle tracking then proceeds as follows for a particle p, which at time t is located at x p (t) within cell K 0 , see Fig. 2 for a principle sketch. Assume the particle has a velocity a (x p , t) and the time step used for advecting the particle is ∆t (0) = ∆t, where the use of a superscript will become clear shortly. As the first step in the particle tracking algorithm, the time to intersect the i-th facet of element K 0 is computed as ∆t i = b i /(a (x p , t) · n Ki ), with b i the orthogonal distance between the particle and the facet with index i. Next, the minimum, yet positive, time to intersection is computed as ∆t imin = min{max (0, ∆t i )}, with i the indices of the neighboring cells. If ∆t imin > ∆t (k) the particle is pushed to its new position using timestep ∆t (k) , and the time step is terminated by setting ∆t (k+1) = 0. If ∆t imin < ∆t (k) , the particle p is pushed to the facet intersection x (k) p using ∆t imin , and the hosting cell is updated to facet i min , that is the facet with the index corresponding to ∆t imin . Furthermore, the time step is decremented to ∆t (k+1) = ∆t (k) − ∆t imin , after which the particle tracking continues until the time step for a particle has zero time remaining. Algorithm 1 contains the pseudo-code for the convex polyhedron particle tracking, using an Euler method for the particle advection.
The convex polyhedron procedure comes with a number of advantages, also pointed out in [27]. First of all, it is applicable to arbitrary polyhedral meshes, both in two and three spatial dimensions, on the premise that cells are convex. Secondly, by marking the facets on the boundary of the domain, it is straightforward to detect if, when, and where a particle hits a specific boundary. This feature is useful when dealing with external boundaries, as well as internal boundaries, with the latter resulting from the mesh partitioning in parallel computations. Finally, the fraction of the time step spent in a certain cell is explicitly known in the convex polyhedron method. This can facilitate the updating of the particle velocity along its trajectory [28], although it is not further pursued in the code in its present form.

Boundary conditions at particle level
Apart from enforcing the boundary conditions on the background mesh, modifications at the particle level are also required when a particle hits a specific boundary. This event is detected when a particle is pushed to a facet having only one neighboring cell, see Line 20 in Algorithm 1. The following particle boundaries are supported in LEoPart, see for details advect particles.h: The user can mark the different parts of the boundary using a MeshFunction, where this mesh function is passed to the particle advection class, see, for instance, the unit tests in test 2d advect.py. The implementation of the different particle boundary conditions is briefly discussed below.

Periodic boundaries
When a particle crosses a facet which is marked as a periodic boundary, it should reappear at the opposing side of the domain. To implement this, facets on a periodic boundary, need to be matched against facets at the opposing side of the domain. This is taken care of in the advection class when the boundary is marked as a periodic boundary via a MeshFunction, and the coordinate-limits of the opposing boundaries are specified pairwise by the user. To illustrate this, a bi-periodic unit square domain is marked in LEoPart, when using the forward Euler particle advection, as Full code examples of how periodic boundaries are applied in 2D and 3D are found in TaylorGreen 2D.py and TaylorGreen 3D.py.

Open boundaries and particle insertion/deletion
At boundaries that are marked as open, particles either escape or enter the domain. When a particle escapes through an open boundary facet, it simply is deleted from the list of particles. Inflow boundaries, however, are less straightforward since new particles are to be created. This is done via the AddDelete class, which also allows a user to keep control over the number of particles per cell.
AddDelete takes as arguments the particle class, a lower and an upper bound for the number of particles per cell, and a list of FEniCS functions which are to be used for initializing the particle values. If a cell is marked as almost empty, i.e. the number of particles is lower than a preset lower bound for the number of particles per cell, the particle deficit is complemented by creating new particles. The locations for the new particles in their hosting cell are determined using a random number generator. To initialize the other particle quantities, two options are at the user's convenience: the particle value is either initialized based on a point interpolation from the underlying mesh field, or the particle value is assigned based on rounding-off the interpolated field value to a lower or upper boundary, i.e.
This feature is particularly useful when the particles carry binary fields, such as the density in two-fluid simulations.
The minimal example below demonstrates the LEoPart implementation using the two options for particle insertion/deletion. Results are depicted in Fig. 3. The AddDelete class can also be used for keeping control over the maximum number of particles per cell by specifying the variable p max in the above presented code. If a cell in the do sweep method is marked to contain more particles than prescribed, the surplus of, say, m particles is removed by deleting m particles with the shortest distance to another particle in that cell. This procedure ensures that particles are removed evenly from the cell interior.
As a final remark: an upwind initialization of the particle value, i.e. initializing the particle value near open boundaries based on the value at the (inflow) boundary facet, is expected to be a useful feature not yet included in LEoPart. Figure 3: Particle insertion: initial particle field (left), particle value assignment by interpolation (middle), binary particle value assignment (right).

Closed boundaries
When a particle hits a closed boundary during the time step, the particle is reflected by setting the particle velocity to the reflected value, i.e.
in which n the outwardly directed unit normal vector to a boundary facet.

Particle-mesh interaction
Obtaining mesh fields from the scattered particle data and updating the particle values from a known mesh field is essential to active tracer problems. These particle-mesh interaction steps go by various names in the literature such as: 'gather-scatter' steps [25,29], 'forward interpolation -backward estimation' [30] or 'particle weighting' [31].
In line with our earlier work on particle-mesh schemes [17,18,19], the data transfer operators are consistently coined 'particle-mesh projection' for the data transfer from the set of scattered particles to the mesh, whereas the opposite route is indicated by 'mesh-particle projection'. This convention reflects that the data transfer operators are perceived as projections between different spaces. More precisely, information needs to be projected from a particle space onto a mesh space and vice versa. Adopting this point of view, it readily follows that the data transfer operations are auxiliary steps, which should not deteriorate accuracy, violate consistency, or compromise on conservation.
To comply with these requirements LEoPart adopts a variational approach to formulate the particlemesh and the mesh-particle projections. An ℓ 2 objective function is the starting point for deriving the mutual particle-mesh interactions. For a scalar-valued mesh field ψ h and a scalar-valued particle field ψ p , this objective function reads where it remains to specify the minimizer, other than to say that either ψ h or ψ p are used to fit this purpose. The implementation of the projection strategies which can be formulated based on Eq. (10) are further highlighted for a scalar quantity ψ in the remainder of this section, and the projections for a vectorvalued quantity follow the same path. More specifically, Section 4.1 discusses the various particle-mesh projections available in LEoPart, and in Section 4.2 the available mesh-particle projections are discussed.
The notation T := {K} is used throughout to indicate the set of disjoint cells K that constitutes a meshing of the domain Ω, and each cell K has a boundary ∂K.

Particle-mesh projections
Common to the available particle-mesh projections in LEoPart is the minimization of the objective function Eq. (10) with respect to an unknown mesh field ψ h given a known particle field ψ p . For this, the function space in which ψ h is approximated must be defined. To this end, LEoPart conveniently exploits existing FEniCS tools for defining arbitrary order polynomial function spaces. For reasons that become clear shortly, LEoPart is tailored for projecting the particle data onto discontinuous function spaces at the mesh. In the scalar-valued setting these function spaces are defined by where T is the partitioning of the domain Ω into a set of cells K, and P k (K) denotes the space spanned by Lagrange polynomials on K, where the subscript k ≥ 0 indicates the polynomial order.

ℓ 2 -projection
With these definitions, the most elementary particle-mesh projection is found by minimizing Eq. (10) for ψ h ∈ W h , which results in the ℓ 2 -projection: given the particle values ψ n p ∈ Ψ t and particle positions Given the definition for W h in Eq. (11), ψ h , w h ∈ W h are discontinuous across cell boundaries. Hence, Eq. (12) is solved in a cellwise fashion, i.e.
requiring the inversion of small, local matrices only, thus being amenable to an efficient, parallel implementation.
The particle-mesh projection via the ℓ 2 projection is implemented in LEoPart in the l2projection class, which is instantiated as LEoPart also allows projection of particle data onto a continuous Galerkin space -which leads to a global system for Eq.

Bounded ℓ 2 -projection
The minimization problem, Eq. (12), can be interpreted as a quadratic programming problem. This class of problems has been thoroughly analyzed in literature, and well-known techniques exist to extend these problems with equality, inequality, and box constraints, see e.g. [32] and references. In the context of the particle-mesh projection, imposing box constraints of the form can be particularly useful to ensure that the mesh field is bounded by [l, u].
In LEoPart, the box-constrained ℓ 2 projection is implemented via a specialization of the project method, which can be invoked as

PDE-constrained particle-mesh interaction
The motivation for introducing Lagrangian particles -particularly when used as active tracers -is to conveniently accommodate advection. The particle-mesh projections presented in the preceding two sections, however, do not possess conservation properties. That is, initializing a particle quantity from an initial mesh field, advecting the particles, and subsequently reconstructing a mesh field from the updated particle positions with the (box-constrained) ℓ 2 -projection, results in a reconstructed mesh field with different integral properties. One way to conserve the mesh properties over the sequence of particle steps, is to keep track of the integral quantities on the mesh. This is accomplished by constraining the objective function for the particle-mesh projection to obtain fields ψ h that satisfy a hyperbolic conservation law. The resulting PDEconstrained particle-mesh projection, developed in [18], possesses local (i.e. cellwise) and global conservation properties, and essentially involves solving the minimization problem: given such that: is satisfied in a weak sense. For brevity, only periodic boundaries or boundaries with vanishing normal velocity (i.e. a · n = 0) are considered in this paper. For a more elaborate discussion on other boundary conditions in Eq. (15), reference is made to [18].
By casting the strong form of the constraint into a weak form by multiplying Eq. (15b) with a Lagrange multiplier field λ h ∈ T h -with T h defined in Appendix A, Eq. (A.2) -and after applying integration by parts, the PDE-constrained optimization problem amounts to finding the stationary points of the Lagrangian (16) in which the first term at the right-hand side is similar to the objective function in Eq.
where the different contributions readily follow from Appendix A, Eq. (A.4).
Secondly, the assemble method assembles the local contributions into a global matrix-vector system.
Since ψ h and λ h are local to a cell, the resulting global system of equations is expressed in terms of the flux variableψ h only. That is, the assemble method assembles the global system as follows in which the wedge denotes assembly of the cell contributions into the global matrix, where this requires the inversion of a small saddle-point problem for each cell K independently, so that the assembly procedure is amenable to a fast parallel implementation.
The method solve problem for obtaining the local unknownsψ n+1 and (optionally) the Lagrange multiplier unknowns λ n+1 .

Mesh-particle projection
The mesh-particle projections, for updating particle properties from a given mesh field, also take the objective functional Eq. (10) as their starting point. Contrary to the particle-mesh projections, the particlefield is the unknown, so that the objective function needs minimization with respect to the particle field ψ p , for the projection of a scalar-valued quantity. Performing the minimization results in: given ψ h ∈ W h , find Since this equation must hold for arbitrary variations δψ, the particularly simple result for the mesh-particle projection becomes i.e. particles values are obtained via interpolation of the mesh field. Interpolating a mesh field to particles is done in LEoPart via the interpolate method in the particle class, i.e. An interpolation overwrites the particle quantities with the interpolated mesh values. However, one of the assets of combining a high resolution particle field with a comparatively low-resolution mesh field is that the particle field may provide sub-grid information to the mesh [36,37,28]. In order to take advantage of this, the particles need to have a certain degree of independence from the mesh field. This is achieved by updating the particle quantities by projecting the change in the mesh field rather than overwriting particle quantities. For a scalar valued quantity, this incremental update readṡ in whichψ h ∈ W h the time derivative of the mesh field.
A fully-discrete implementation of Eq. (22) is implemented in LEoPart using the θ method for the time discretization: where ∆t the time step, 1/2 ≤ θ ≤ 1, andψ n h ∈ W h is defined asψ n h = (ψ n h − ψ n Two closing remarks are made in view of this incremental update: • For step 1, θ = 1 sinceψ 0 h is usually not defined. • The increment from the old time level, i.e.ψ n h x n p is stored at the particle level between consecutive time steps, for efficiency reasons. This requires an additional slot on the particles, i.e. dpsi p dt. The integer array in the increment call indicates which particle slots are used for the incremental update, i.e. p.increment(psih new, psih old, [1,2], theta, step).

Example applications
This section demonstrates the performance of LEoPart in terms of accuracy, conservation and computational run time for a number of advection-dominated problems. On a per-time-step basis, the particle-mesh approach typically comprises the following sequence of steps: 1. Lagrangian advection of the particles, as outlined in Section 3.2.
2. particle-mesh projection to project the scattered particle data onto an Eulerian mesh field using either the ℓ 2 -projection, discussed in Section 4.1.1, the bound constrained ℓ 2 -projection from Section 4.1.2, or the PDE-constrained projection, Section 4.1.3.
3. (optional) solve the physical problem -e.g. a diffusion or Stokes problem -at the mesh, using the reconstructed mesh field as a source term or as an intermediate solution to the discrete equations.
4. (optional) update the particles given the solution at the mesh, using the mesh-particle interaction tools from Section 4.2.
Step 1 and 2 are sufficient to solve an advection problem at the particles and to test the reconstruction of mesh fields from the moving particles. The sequence of steps 1-4 can be used for active tracer modelling or a particle-mesh operator splitting for, e.g., the advection-diffusion equation or the incompressible Navier-Stokes equations, see also Maljaars [17,18]. For all the examples presented below, reference is made to the corresponding computer code in the LEoPart repository on Bitbucket.

Translation of a periodic pulse
As a straightforward, yet illustrative example, the translation of the sinusoidal profile ψ(x, 0) = sin 2πx sin 2πy (24) on the bi-periodic unit square is considered, in analogy to LeVeque [38]. Owing to its simplicity, this test allows to assess the accuracy and the convergence properties of the ℓ 2 -and the PDE-constrained particlemesh projection. Furthermore, it is used to illustrate the performance of the scheme by means of a strongscaling study. Test results can be reproduced by running SineHump convergence.py for the convergence study, and SineHump hires.py for the scaling study. In this example, the advective velocity field a = [1, 1] ⊤ is used, so that at t = 1 the initial data should be recovered. To investigate convergence, a coarse mesh, consisting of 11 × 11 × 2 regular triangular cells is refined 5 times where the finest mesh contains 176 × 176 × 2 cells. Different polynomial orders k = 1, 2, 3 are used for the discontinuous function space W h , Eq. (11), onto which the particle data is projected.
Particles are seeded in a regular lattice on the mesh, such that each cell contains approximately 15 particles, independent of the mesh resolution, see Fig. 4 as an example. An Euler scheme suffices for exact particle advection, and the time step corresponds to a CFL-number of approximately 1. Furthermore, in the PDEconstrained particle-mesh projection, the β-parameter is set to 1e-6, and ζ is set to 0. All computations use a direct sparse solver (SuperLU) to solve the global system of equations. Also, note that for this advection problem, the scalar valued property ψ p , attached to each particle p, needs no updating and stays constant throughout the computation. The accuracy of the method is assessed at t = 1, using both the ℓ 2 -particle-mesh projection from Section 4.1.1, and the PDE-constrained projection from Section 4.1.3 upon refining the mesh and the time step, see Table 1. Optimal convergence rates of order k + 1 are observed for both projections strategies, thus highlighting the accuracy of the particle-mesh projections in conjunction with particle advection.  As reported in Table 1 the error levels for the ℓ 2 -and the PDE-constrained particle-mesh are similar.
The difference between these projections becomes clear, however, by investigating the mass error which is plotted as a function of time for the ℓ 2 -projection in Fig. 5a, and for the PDE-constrained projection in Fig. 5b. Evident from these figures is that the ℓ 2 -projection yields a nonzero mass error, whereas for the PDE-constrained projection the mass error is zero to machine precision.
The trade-off between the non-conservative ℓ 2 -projection and the conservative PDE-constrained is elucidated by investigating the computational times. Wallclock times for the high-resolution case -polynomial order k = 3 with 61, 952 cells, 921, 837 particles and 160 time steps -run on different number of Intel Xeon CPU E5-2690 v4 processors are presented in Fig. 6. Solving the global system for the PDE-constrained projection using a direct solver is computationally much more demanding compared to the (local) ℓ 2 -projection.
This illustrates the need for an efficient iterative solver for the PDE-constrained projection step.    The scaling of the different components is further investigated in Table 2, summarizing the speed-up for the different tests relative to the run on one processor. The particle advection and the assembly step -with the latter only relevant for the PDE-constrained projection -exhibit excellent scaling properties, which is explained by the locality of these operations, i.e. these steps are performed cellwise. This also holds true for the ℓ 2 -projection, which amounts to a cellwise projection of the particle properties onto a discontinuous function space, see Section 4.1.1. Clearly, the direct sparse solver for the PDE-constrained projection does not possess optimal scaling properties, which thus appears appears the limiting factor for the scalability of this step.

Slotted disk
Combining particle-based and mesh-based techniques appears particularly promising for applications in which sharp flow features are to be preserved. The solid body rotation of a slotted disk after Zalesak [39] is a prototypical example of such problems, and often serves as a benchmark for interface tracking schemes, see [40,41], among many others. We now use this test to demonstrate the various tools that LEoPart offers for tracking sharp discontinuities in material properties, such as a density jump in immiscible multi-fluid flows.
The problem set-up is as follows. A disk with radius 0.
The time step is set to 0.02, so that one full rotation is performed in 100 steps. The three-stage Runge-Kutta scheme, available via the advect rk3 class, is used for the paticle advection. Particle-mesh projection k l ζ Different particle-mesh projection strategies that are available in LEoPart are investigated in this example, see Table 3. Note that Case 2 and Case 3 only differ in terms of the ζ parameter, see Eq. (16).
The computer code in SlottedDisk rotation l2.py reproduces Case 1, the computer code for reproducing Case 2 and Case 3 is found in SlottedDisk rotation PDE.py. The test is run using 8 Intel Xeon CPU E5-2690 v4 processors.
As for the previous example, there is no updating of the scalar valued property attached to a particle from the mesh-solution for this advection problem. Hence, sharp features at the particle level pertain, and the particle advection part for all the different cases is equal. Fig. 7 shows the particle field at t = 0 (initial field), t = 1 (half rotation) and after a full rotation at t = 2.  For the bounded ℓ 2 -projection (Case 1), the discontinuity in the particle field is captured at the mesh without under-or overshoot, and values stay within the prescribed bounds 0 ≤ ψ h ≤ 1 to machine precision, see Table 4. Another advantage of this approach is that it is fast, and easy to implement in parallel. However, as reported in Table 4, the mass error for the bounded ℓ 2 -projection is non-zero. The latter issue is overcome by using the conservative PDE-constrained particle-mesh projection for the reconstruction of the mesh fields, Case 2 and Case 3. However, for Case 2 -in which ζ = 0 -localized over-and undershoot is observed near the discontinuities. As argued in [18], this artifact is a resolution issue with the mesh resolution being too coarse to capture the sharp discontinuity at the particle level monotonically. By setting ζ to a value of 30, i.e. approximately equal to the number of particles per cell, this issue is effectively mitigated, see also the minimum and the maximum values for ψ h over the entire computation reported in Table 4.

Lock exchange test
As an example which is geared towards practical multi-fluid applications, the lock-exchange test is considered. Driven by gravity, a dense fluid current moves underneath a lighter fluid, where a thin vertical membrane initially separates the two fluids. Using LEoPart, density tracking and momentum advection is done using Lagrangian particles, and the incompressible, unsteady Stokes equations are discretized on the mesh using the hybridized discontinuous Galerkin (HDG) method from [22,23]. The computer code for this test can be found in LockExchange.py.  : Lock exchange: density field at the mesh at t * = 10 using ℓ 2 or PDE constrained particle-mesh projections.
Two different particle-mesh projection configurations are considered. In a first configuration, the local ℓ 2 -projections are used to project density and momentum data from the particles to the mesh. The density projection is rendered bound preserving by imposing box constraints on the local least squares problem.
The density values attached to a particle stay constant throughout the computation, whereas the specific momentum value attached to a particle is updated using Eq. (23) after the Stokes solve.
In the second configuration, the PDE-constrained projection is used for the projection of density and momentum data from the particles to the mesh. This results in a global problem with 964,160 unknowns for the density projection, and 1,928,320 unknowns for the momentum projection. The global systems resulting from the PDE-constrained projections are solved using a GMRES solver in conjunction with an algebraic multigrid preconditioner, where this solver/preconditioner pair is used as a black-box. Furthermore, for the density projection, ζ = 20 to penalize over-and undershoot in the reconstructed density fields, and as for the other configuration, the specific momentum value attached to a particle is updated using Eq. (23).
Visually, the mesh density fields at t * = 10 are comparable in terms of the bulk behavior for both projections, Fig. 9, although differences in the small scale features are observed. No further attempts are made to interpret and value these small scale differences between the local ℓ 2 -projections and the PDE-constrained projections, other than to say that the PDE-constrained approach results in mass-and momentum-conservative fields, whereas this is not so for the ℓ 2 -projection, see Fig. 10.
Timings are reported in Fig. 11  Results provide insight into the performance of the different parts, and indicate which parts of the scheme are critical for obtaining higher performance. Clearly, the computational time is dominated by the global solves for the Stokes system, and the PDE-constrained particle-mesh projections, Fig. 11a. The advantage of using iterative solvers for the PDE-projections also becomes clear from this figure. Even though the system for the momentum projection is larger than the system for the Stokes projection, the wallclock time for the momentum projection is considerably smaller and appears to possess better scaling compared to the Stokes solve. Therefore, implementing the iterative solver for the Stokes solver proposed in [42] is believed to be an important step for improving the performance, and probably indispensable for problems in three spatial dimensions. Noteworthy to mention is that the ℓ 2 particle-mesh projections exhibit excellent scaling properties, on top of their low computational footprint, see Fig. 11b.

Rayleigh-Taylor instability benchmark
As a last example, the applicability of LEoPart for simulating active particle tracing problems is demonstrated. A well-established benchmark from the geodynamics community is used to fit this purpose, and we consider the Rayleigh-Taylor instability community benchmark proposed by van Keken and co-workers [12].
This benchmark involves the evolution of a geodynamic laminar flow whose initial state is a compositionally light material, situated under a considerably thicker and denser layer. Tools from LEoPart are used to discretise the Stokes system using the HDG method, and Lagrangian particles are used for a diffusion-free tracking of the chemical composition field. Conservative composition fields at the mesh are reconstructed from the particle representation using the PDE-constrained projection. The reconstructed composition field is subsequently used to compute a source term f and a composition-dependent viscosity η in the Stokes equations. The code for this numerical experiment can be found in RayleighTaylorInstability.py, considering the following problem description: where Rb = 1 is the compositional Rayleigh number andĝ = (0, −1) ⊤ is the unit vector acting in the direction of gravity. The viscosity of the Stokes system is dependent on the chemical composition function where η top and η bottom are the viscosities of the initial heavy top and light bottom layers, respectively. The initial state of ψ is a small perturbation from the rest state of a light layer of depth The boundary conditions are set such that u = 0 on the bottom (y = 0) and top (y = 1) boundaries, and u · n = 0 on the left (x = 0) and right (x = L) boundaries. Additionally, ψ = 0 on the bottom boundary (y = 0) and ψ = 1 on the top boundary (y = 1).
The domain is triangulated into 40 2 ×2, 80 2 ×2 and 160 2 ×2 regular simplices. The mesh is then displaced in order to align the cells with the initial viscosity discontinuity described in (29). Each cell is assigned 25 particles -carrying a composition value ψ p -resulting in 80 000, 320 000 and 1 280 000 particles in the meshes, respectively. This number of particles remains constant throughout the simulation. Furthermore, the time step size is chosen based on the Courant-Friedrichs-Levy (CFL) criterion, i.e. ∆t j = C CFL h min /max Ω |u(t j )| where t j is the jth time step, h min is the minimum mesh cell diameter and C CFL > 0 is the CFL parameter, chosen here to be C CFL = 0.5. The HDG method is used with k = 1 to solve the Stokes system. After computing the solution approximation of the Stokes system, the particles are advected. Given ψ p , the conservative PDE-constrained projection is used to update the composition field ψ h on the mesh and thereby the source term f and the viscosity η. The composition function ψ h is represented by the k = 1 DG-finite element space, and we choose ζ = 25 to penalise over-and undershoot of the the reconstructed field.
The benchmark [12] considers three cases, η top /η bottom ∈ {1, 10, 100}. For brevity we document the case with a 100-fold difference in the viscosity layers, namely, η top = 1 and η bottom = 0.01. For comparison with [12], the distribution of the 1 280 000 particles in the 160 2 × 2 mesh at simulation times t = 500, 1000 and 1500 are shown in Fig. 12. Noteworthy to mention is that the particle distribution remains uniform throughout. This feature owes to HDG discretization of the Stokes system, yielding pointwise div-free velocity fields by which the particles are advected [17].
(a) t = 500 (b) t = 1000 (c) t = 1500  Here n is the number of cells along one axis of the mesh and np is the number of particles used in the simulation.
A quantitative comparison with literature results from [12,43] is made by investigating the root mean square (RMS) velocity, u rms , the mass conservation error, ǫ ∆ψΩ , and growth rate, γ, where u rms = Ω u · u dΩ λ , ǫ ∆ψΩ = Ω (ψ h (x, t) − ψ h (x, 0)) dΩ , γ = ln (u rms (t 1 )) − ln (u rms (0)) t 1 , and t 1 is the simulation time at the first time step. The computed quantities (30) are shown in Fig. 13, and key results are reported in Table 6. This includes comparison to the marker chain method of [12] and the case with no artificial diffusion in [43], showing good agreement for the growth rate γ and the RMS-velocity.
On top of this, discrete conservation for the composition field ψ h can be ensured, Fig. 13b, when using the PDE-constrained particle-mesh projection provided by LEoPart.

Conclusion and outlook
This paper introduced LEoPart, a library which integrates a suite of tools for Lagrangian particle advection and different particle-mesh interaction strategies into the open source FE library FEniCS. For efficiently implementing the particle advection, particles are tracked on the (unstructured) polygonal mesh using the convex polyhedron method. As demonstrated in the numerical examples, the particle advection exhibits near optimal performance for distributed-memory parallel runs. Furthermore, several options are available for the projection of particle data onto mesh fields and vice versa. In particular, the PDEconstrained particle-mesh projection allows to track particle quantities on the mesh in an accurate, diffusionfree, and conservative manner.
The code for LEoPart is hosted under an open source license at https://bitbucket.org/jakob_maljaars/leopart.
The various numerical tests demonstrate that LEoPart is sufficiently mature to be applicable to a range of practical particle tracing problems, and the community is invited to tailor LEoPart to suit their specific needs. At the same time, LEoPart remains work in progress, and possible directions for further development may include: • Iterative solvers: as demonstrated in the numerical examples, all the components exhibit excellent scaling for distributed-memory parallel runs, except for solving the global systems which arise in the PDE-constrained projection and the incompressible Stokes equations. To improve the performance for these steps, the implementation of scalable iterative solvers heads our wish list. The optimal preconditioner presented by [42] for similar systems can serve as a starting point.
• Implementation of particle advection: the choice for particle advection schemes is, as yet, limited to an explicit Euler, or a two-or three-stage Runge-Kutta scheme. LEoPart could benefit from the implementation of other advection schemes, such as multi-step schemes. In addition, the velocity of the particles is dictated by a velocity field at the mesh. To further increase the applicability of LEoPart, the option can be added that particles are (partially) advected with a particle-specific velocity, which, for instance, is induced by a force exerted on the particle or follows from Brownian motion.
• Particle output: as yet, a view of the particles and their properties can be dumped to pickle objects.
Writing the output to HDF5 files would be more efficient, particularly so for large scale computations.
• dolfin-x support: the LEoPart functionality works with the latest stable version of FEniCS (v2019.1.0), but is not yet supported in dolfin-x, the development version of FEniCS.
Given these function space definitions, the fully-discrete optimality system is obtained after taking variations of the Lagrangian functional Eq. (16) with respect to ψ h , λ h ,ψ h ∈ W h , T hWh and using a θ-method for the time discretization. The fully-discrete co-state equation in this optimality system reads: given the particle field ψ n p ∈ Ψ t , the particle positions x n+1 p ∈ X t , and the intermediate field ψ