A staggered overset grid method for resolved simulation of incompressible flow around moving spheres

Article history: Received 5 August 2016 Received in revised form 13 November 2016 Accepted 16 December 2016 Available online 23 December 2016


Introduction
For particle-resolved direct numerical simulation of turbulent flows with embedded rigid particles several numerical techniques exist. A powerful technique is the so-called immersed boundary method. In this method the particle boundary is approximated on a Cartesian grid and its effect on the flow is accounted for by a forcing term, which is either a so-called direct forcing term or a continuous forcing term [1]. The grid is usually chosen to be uniform and the forced Navier-Stokes equations are typically solved in the domain including the interiors of the particles [2][3][4][5][6], although there are also variants that exclude the interiors of the particles [7]. The papers cited are only a few of the large amount of publications on this method. A second method is the lattice Boltzmann method, which is also usually applied on a uniform Cartesian grid [8]. Another method, Physalis, uses series of analytical solutions of the steady Stokes equations to bridge the narrow gaps between the uniform Cartesian grid and the particle surfaces [9]. In Physalis, the boundaries of the spherical particles are implemented as sharp interfaces, such that, in contrast to immersed boundary and lattice Boltzmann methods, there is no explicit or implicit smoothing of the particle boundaries.
Traditionally, a body fitted grid is adopted to represent the boundary of an embedded object by a sharp interface, and then the flow around the object is solved by a finite difference, finite volume or finite element method or by a spectral or spectral element method (see [10] and [11] for applications of spectral methods to turbulent flows around a single fixed sphere). Also most commercial computational fluid dynamics packages are based on the body-fitted approach.
For flows around moving particles, the geometry of the flow domain changes in time. This is a challenge, in particular for methods using body-fitted grids. One way to deal with this problem is to use unstructured body-fitted grids, to let the grid points move and, if the grid has become too distorted, to project the solution on a new grid produced by an automatic generator for unstructured grids. In combination with a finite element method, this approach has been used by Johnson and Tezduyar [12] to simulate multiple spheres falling in a liquid-filled tube. Another way to handle the problem of the changing geometry is offered by the so-called overset grid method, also called composite mesh, overlapping grid or Chimera method. The main idea is to attach to each object a body-fitted structured orthogonal grid, which is overset on a background Cartesian grid, or on grids attached to other objects. Interpolation is used to connect the numerical solutions of two overlapping grids to each other.
The concept of overlapping grids was originally introduced for the solution of the Laplace equation and other elliptical partial differential equations on two dimensional domains with a complicated non-moving boundary [13,14]. For the simulation of compressible flows around multiple objects, the Chimera overset grid method was developed [15], which has been used for many flow problems in aerospace engineering [16]. Moreover, Chesshire and Henshaw [17] presented a general method for the construction of colocated overlapping grids for the solution of partial differential equations, general in the sense that it can be used for an arbitrary number of overlapping grids and also for higher-order schemes for the spatial discretization and interpolations. They applied their method to the compressible Navier-Stokes equations. For the incompressible Navier-Stokes equations, Henshaw [18] developed a fourth-order overset grid method and showed an application to steady flow around a sphere, in which the surface of the sphere was covered by two overlapping orthographic patches to avoid pole singularities. In this method the pressure is solved from a Poisson equation equipped with a damping term to keep the velocity divergence small. Others have used the artificial compressibility technique (instead of the pressure Poisson equation) in incompressible flow simulations on overset grids [19]. All overset grid methods mentioned in this paragraph were designed for colocated grids. It seems that, so far, none of them has been used for resolved simulation of particles in turbulent flows.
The first staggered overset grid method was developed by Burton and Eaton [20] and based on the standard second-order staggered approach by Harlow and Welch [21]. Staggered methods are known to be relatively robust, also in combination with standard central differencing. Furthermore, they naturally lead to a compact stencil of the discrete Laplacian of the pressure and a relatively straightforward treatment of the pressure near solid boundaries. The implementation of Burton and Eaton was developed for two fixed staggered grids: a spherical polar grid overset on a Cartesian background grid. The method was successfully applied to direct numerical simulation of turbulent incompressible flow around a single fixed sphere [22]. In each time step, the pressure Poisson equation was solved iteratively by a multigrid matrix factorization technique (a Schwarz alternating method), in which the Poisson equations on the two subdomains were each iteratively solved in turn, while the subdomain boundary conditions were obtained in the outer iteration loop by interpolation of the pressure gradient. Very recently, the staggered overset grid method for spherical particles was used in direct numerical simulation of turbulent flow modified by 64 fixed particles [23]. For that purpose the method was extended to multiple fixed particles. Instead of using an alternating method and interpolation of the pressure gradient, the BiCGStab method with interpolation of the pressure in each iteration was used to solve the pressure Poisson equation. The implementation was parallel, but limited to shared memory systems.
In the present paper, the staggered overset grid method is extended to (freely) moving spherical particles in incompressible flow. Thus a staggered spherical polar grid is attached to each particle and all these grids are overset on the Cartesian background grid. The data structure of the interpolation and grid routines of the implementation in [23] has been altered to realize a parallel implementation based on MPI (Message Passing Interface), which is the standard for high performance computing on large distributed memory systems. In order to make the step of increased complexity not too large, we avoid interpolations from one spherical to another spherical grid in this paper. Thus the particles cannot collide. Narrow gaps between particles can in principle be simulated, but at high computational cost. This apparent limitation of the overset grid method is compensated by a number of advantages. Due to its body-fitted character, the method facilitates an accurate computation of the turbulence dissipation rate and other turbulence statistics of spatial derivatives in the direct vicinity of the particle surfaces, as demonstrated in [23]. Another attractive feature of the overset grid method is that by radial stretching of the spherical grids the total number of grid cells can be largely reduced without sacrificing fine resolution near the particles. The latter feature is particularly useful for resolved simulations of flows around small particles in dilute flows.
After a presentation of the governing equations in section 2 and a description of the staggered overset grid method in section 3, seven validation studies for moving spherical particles will be presented in sections 4 and 5. In section 4, we will consider simulations of four Stokes flows around a single moving sphere and compare the results with non-trivial analytical results. The cases are the Stokes flows due to a translating sphere, an oscillating sphere, a rotating sphere, and an instantaneously accelerated sphere. In section 5, we will present validation results of three Navier-Stokes cases: a translating array of spheres, a falling sphere in a channel, and eight freely moving small spheres in Taylor-Green flow.
The material presented is novel in several respects. The first novelty is the extension of the staggered overset grid method to incompressible flows with moving spheres. Furthermore, pointwise convergence to analytical solutions is demonstrated in numerical tests, which is new in the context of moving spheres and 3D finite difference or finite volume methods. Moreover, the quantitative comparison of immersed boundary methods with a body-fitted method applied to a flow with moving spheres is novel. Also, resolved simulation of small moving spheres in incompressible three-dimensional Taylor-Green flow seems to have been performed for the first time.
Since the flow solver is a Navier-Stokes solver to simulate flows around an arbitrary number of spheres, it is called NSpheres. Source data of the validation simulations presented in this paper are available at www.vremanresearch.nl. Although the MPI implementation opens the way to use the method for large-scale computations on many processors with one or more particles per processor, it is outside the scope of the present work to perform such computations. The purpose of this paper is to describe the method and to validate it for cases that can be simulated in reasonable time with use of a few parallel MPI processes.

Governing equations
We denote the Cartesian position vector by x = [x 1 x 2 x 3 ] T . The Cartesian base vectors are denoted by e 1 , e 2 , e 3 . We consider a spherical particle embedded in the flow, centered at position x p . The particle radius is denoted by r 0 and the particle diameter is denoted by d p = 2r 0 . The spherical coordinates around the particle are given by the nonlinear where atan2( y, x) is the function that provides the argument of the complex number with real part x and complex part y, such that −π < φ ≤ π . Here r denotes the radial, θ the polar and φ the azimuthal coordinate. The base vectors of the spherical coordinate system are denoted by ẽ 1 =ẽ r , ẽ 2 =ẽ θ and ẽ 3 =ẽ φ and specified by the following orthogonal matrix Equation (1) describes the (nonlinear) transformation from Cartesian to spherical coordinates. The inverse transformation is given by and the components can be written as x j = x p j + r A 1 j . If the last two columns of the matrix of partial derivatives (Jacobian matrix) of the inverse coordinate transformation are divided by r, then A T is obtained.
The translational and rotational velocity vectors of the particle are denoted by v p and ω p , respectively. The fluid velocity vector is denoted by u = [u 1 u 2 u 3 ] T in the Cartesian coordinate system and by ũ = [ũ 1ũ2ũ3 ] T = [u r u θ u φ ] T in the spherical coordinate system. The latter two vectors are related through u i e i =ũ iẽi , which implies the following (linear) relationships between the Cartesian and spherical velocity components In this paper, we will use the index notation and the summation convention (only for the indices i and j) in some equations, for example in the Navier-Stokes equations in Cartesian and spherical coordinates and in equations (13)- (14) and (22)- (23), to indicate more clearly in which form these equations, term by term, are used in the computer code. In addition, we use the comma notation for derivatives, for example u r,t = ∂u r /∂t (t denotes time), u r,θ = ∂u r /∂θ and u j,i = ∂u j /∂x i , for integer values i and j, while u j,ii denotes ∇ 2 u j in Cartesian coordinates.
The form of the Navier-Stokes equations in the Cartesian frame of reference is given by for component j = 1, 2, 3, where ν denotes the kinematic viscosity and g the gravity vector, while a is a spatially constant acceleration term, useful for a driving force in combination with periodic boundary conditions for q. The latter is related to the physical pressure p through the definition q = p/ρ + a · x, where ρ is the constant fluid density. If we mention the pressure in this paper, we usually refer to q. The second term on the left-hand side of (6) is called the convective term. The four terms on the right-hand side are called the pressure gradient term, the viscous term, the gravity term and the forcing term, respectively.
In the spherical frame of reference of a particle, the following form of the Navier-Stokes equations is used: u r,t + (r 2 u r u r ) ,r Applied to a scalar c, the Laplace operator in spherical coordinates appearing in these equations is given bỹ Impermeability and no-slip conditions are imposed at the particle surface (r = r 0 ), which means that the fluid velocity should be equal to the surface velocity of the solid particle: u = v p + r 0 ω p × e r at r 0 . The surface spherical velocity components are derived by substituting (2) These expressions and the continuity equation (7) imply that u r,r = 0 at r = r 0 .
The equations that govern the particle motion are given by: where ρ p is the particle density, V p = πd 3 p /6 the particle volume and I p = ρ p d 2 p V p /10 the moment of the inertia of the particle. The expressions for the particle force F p and particle torque T p are based on the stress tensor in spherical coordinates, where δ ij is the Kronecker delta and the tensor G ijẽiẽ j with is the gradient of the velocity expressed in the spherical coordinate system [24].
Since the outward normal of the particle surface S p is equal to ẽ 1 , we write At the particle surface, u r = 0 and u r,r = G 11 = 0, while the former also implies u r,θ = 0 and u r,φ = 0. Since the components of the spherical base vectors are given by (2) and u θ and u φ by (13) and (14), the integrals over G 21ẽ2 and G 31ẽ3 vanish. Thus the components of the particle force and particle torque can be written as All fluid variables that appear in the latter two expressions are evaluated at r = r 0 . The force F p is naturally decomposed into a pressure part F p,q and a viscous part F p,ν . If q is periodic in a direction and gravity acts in that direction, buoyancy enters through the term ρa (a = −g for a system at rest).

Computational method
In this section, the staggered overset grid method for moving particles is explained. Subsection 3.1 contains a basic description of the numerical method. The description is basic in the sense that the explanation of grid structures, interpolations, boundary conditions and parallelization are described in general terms or omitted. The technical explanation of the grid structure and interpolation procedure is provided in subsections 3.2, 3.3 and 3.4. In subsections 3.5 and 3.6, all activities during the initialization and time step are listed, including explanations of several details (on the implementation of boundary conditions, for example). Additional information and discussion of the parallelization can be found in Appendix A, and the interpolation formulas can be found in Appendix B.

Basic description of the numerical method
The present overset grid method is based on the standard second-order staggered discretization scheme for incompressible flow. Thus the velocity components are defined at the faces of a given grid cell. More specifically, velocity component i is defined at the faces whose normal vector is aligned to the unit vector of direction i. The pressure and the discrete velocity divergence are defined at the center of the cell, such that the continuity equation reduces to the balance of the volumetric flows through the cell faces. The advantage of the staggered scheme is that the discrete velocity divergence and pressure gradient operators are computed on the most narrow stencils possible. In each direction, the stencils of these operators are only one grid spacing wide (the distance between two grid points). This automatically provides a tight coupling between the pressures at adjacent grid points, and staggered schemes are therefore known to be relatively robust.
We consider a rectangular computational domain c = [0, The domain c includes the regions inside the particles. The domain that excludes the volumes occupied by the particles is denoted by and changes with time if the particles are moving. A two-dimensional illustration of a Cartesian and a spherical domain is shown in Fig. 1. The radii of the three concentrical spheres in Fig. 1(a) are denoted by r 0 , r a and r b , and these are visualized in Fig. 1(b). The particle is represented by the inner circle (r ≤ r 0 ). The interior of the spherical domain is represented by the two rings around the particle (r 0 < r < r b ). The interior of the Cartesian domain is represented by the second ring and the white region outside (r > r a ). The union of the Cartesian and spherical domain represents (r > r 0 ). The intersection of the Cartesian and spherical domain is the second ring. This is the region where the domains overlap. The Navier-Stokes equations in Cartesian coordinates are solved on the Cartesian domain, which is meshed by a staggered Cartesian grid. The Navier-Stokes equations in spherical coordinates are solved on each spherical domain, which is meshed by a staggered spherical grid, which can be stretched in the radial direction. The discretization on each domain is based on the standard second-order finite difference scheme for incompressible flow. The rectangular domain c is partitioned into M = M 1 × M 2 × M 3 rectangular blocks. The total number of MPI processes is equal to M. Each one of the M processors contains a Cartesian field, which we define as the numerical solution on the intersection of a given block and the interior of the entire Cartesian domain. Each block is equipped with a Cartesian grid which contains N k /M k cells in the x k direction, such that all blocks together contain N 1 × N 2 × N 3 grid cells. A spherical polar grid of N r × N θ × N φ cells is attached to each particle. The numerical solution of the Navier-Stokes equations in spherical coordinates around a given particle is called a spherical field. Each processor contains up to N p1 spherical fields, where N p1 is the smallest integer number satisfying N p1 ≥ N p /M.
We proceed with an overview of the algorithms used in each time step. We distinguish between three parts of the time step, in which the variables are updated from time level n to n + 1 (the time level is included in the superscripts). At the end of each part the interpolation procedure is performed for the velocity, which means that the velocity interpolations from Cartesian to spherical grids and vice versa are performed and that the boundary conditions are applied. For conciseness, the terms of the Navier-Stokes equations are grouped in vectors. However, from the expressions in index notation in the previous section, the contents of the vectors can be deduced.
In the first part of the time step, the positions and the translational and angular velocities of the particles are updated: v p,n+1 = v p,n + t (F p,n + ρ p V p g + ρ V p a n )/(ρ p V p ), (25) ω p,n+1 = ω p,n + t T p,n /I p , (26) where t is the length of the time step. Then the first intermediate fluid velocities, u * and ũ * , are obtained: The last expression accounts for the term −Av p ,t in (8) to (10). Afterwards, the new surface velocities are computed by substituting the angular velocity at level n + 1 into equations (13)- (14). Then the interpolation procedure is applied to u * and ũ * .
In the second part of the time step, the second intermediate velocities, u * * and ũ * * , are computed. For each spherical domain, ũ * * is obtained by solving where I is the identity matrix, H * φ the coefficient matrix of the convective derivatives with respect to φ (the coefficients depend on u * φ ) and J φ the coefficient matrix of the viscous second-order derivatives with respect to φ. The other convective terms are denoted by H * , while the other viscous terms and the forcing terms are denoted by J * (which does not include −Av p ,t since that term is treated in the first part of the time step). All terms in −H * +J * are based on ũ * . After evaluation of the right-hand side, Gauss elimination of the tri-diagonal systems is used to obtain ũ * * in each spherical domain. The second intermediate velocity u * * in each Cartesian block is computed by u * * = u * + t(−H * + J * ), (29) where H * represents the Cartesian convective terms and J * the Cartesian viscous and forcing terms, all based on u * . For the spatial discretization, standard second-order central difference approximations are applied to the convective and viscous terms. The discretization of each term is based on the form specified in section 2. In the discretization of the convective terms, appropriate averages of velocity components over two points with weights 1 2 and 1 2 are used before a velocity component is squared or two velocity components are multiplied. The second part of the time step is completed by applying the interpolation procedure to u * * and ũ * * .
In the third part of the time step is the pressure Poisson equation problem is solved for overset grids (see [23], but as the notation was slightly different there, the description is repeated below). The pressure Poisson equations of all domains are assembled and solved as a combined linear system. The augmented matrix approach of [18] is used, which means that an extra variable and an extra equation are introduced to overcome the singularity of the original linear system. The discrete system of equations for the pressure q is given by where ∇ 2 and ∇· represent the discrete Laplacian and divergence in Cartesian coordinates, while ∇ 2 and ∇ · represent the discrete Laplacian and divergence in spherical coordinates. The forms of the discrete Laplacians follow from the approximations of the continuity equation and the pressure gradient term by straightforward second-order central differences on staggered grids. The system contains N q + 1 equations and N q + 1 unknowns, where N q is the total number of internal points The last equation is the extra equation and the extra variable is b (b converges to zero in the limit of zero grid size). The coefficients c k represent normalization coefficients, such that all diagonal elements of the first N q rows of the coefficient matrix are equal to 1. The total linear system to be solved is written as Cy = z, where C is the (N q + 1) × (N q + 1) coefficient matrix, y the vector of unknowns and z the vector of right-hand sides of (30). The system Cy = z is iteratively solved by the BiCGstab(1) method [25]. The starting pressure is the pressure from the previous time step. In each iteration the pressure is interpolated from Cartesian to spherical grids and vice versa. The velocities at time level n + 1 are obtained by projecting the intermediate velocities on the space of functions that is (approximately) divergence free: where ∇ and ∇ represent the discrete gradient operators, in Cartesian and spherical coordinates respectively. Like the first and second part of the time step, the third part of the time step is completed by applying the interpolation procedure to the velocities, this time to u n+1 and ũ n+1 .

Numbering of blocks, particles and grids
As mentioned above, the rectangular domain c is partitioned into Each x p is updated such that it always remains inside c . In addition to the N p physical particles, shadow particles are defined to account for periodic boundary conditions. These shadow particles are shifted copies of physical particles such that the centers of the shadow particles reside in the shadow blocks outside c . For each shadow particle, the corresponding physical particle number is stored in a pointer array. The number of particles including shadow particles is called the virtual particle number. This number is in general larger than the number of spherical grids, which is equal to the true number of particles, N p . For each block, a block list of particles is defined that contains the numbers of all particles whose cell centers lie inside the block. In addition a pointer array is defined which contains for each block the number of the physical block (and its grid), while it contains for each particle the number of the physical particle (and its grid). The Cartesian grids of the blocks can be stretched, but in this paper we assume them to be uniform, and we also assume that each block has the same dimensions. Furthermore, all spherical grids are the same. Grids are essentially partitions of certain domains into grid cells. The corners of the grid cells are usually called vertices or nodes. The vertices are often not the locations where the discrete flow variables are defined. In the staggered method, the pressure is defined at the cell centers, and velocity component i is defined at the centers of each cell face whose normal vector points in direction i. In this paper, each location where the discrete pressure or a discrete velocity component is defined is called a grid point.
The structure of the Cartesian and spherical grids makes it convenient to store, on each processor, the Cartesian fields as three dimensional arrays and the spherical fields as four dimensional arrays, as shown in Table 1. The original grid points, the grid points inside the Cartesian blocks and the grid points inside the spherical domains, are defined by the ranges of the grid indices listed in the last three columns of Table 1. Each Cartesian grid has N = N k /M k original pressure grid points in direction x k . The corresponding N 1 × N 2 × N 3 original cells cover the Cartesian block. Likewise, each spherical domain The indices in the direction of a staggered quantity refer to staggered locations. Staggered quantities have some grid points on the boundaries of the domain. For example, a grid point of the staggered u r velocity lies on the particle surface if the original i index for the u r velocity is zero, which refers to a dummy location. For a staggered quantity near a boundary  that is not a wall, an extra dummy layer is required at index i = −1. This layer is not required for the discretization of derivatives, but is convenient for the interpolation stencil (this will be clarified later on). Since in this paper we assume that the interpolation stencils do not use points outside the physical domain, no extra dummy layer is required for the normal velocity at a wall. In Table 1, it is assumed that the block end faces are no walls. For example, if the x 1 end face were a wall, the original i index for u 1 would run up to N 1 − 1, i = N 1 would coincide with the wall and thus be a dummy point, and i = N 1 + 1 would not play any role.

Interpolation points
In addition to the distinction between original and dummy points, we classify all grid points into four categories: internal points, where the Navier-Stokes equations are discretized; interpolation points (query points), where the pressure or a velocity component is obtained by interpolation; boundary points, which are dummy points used to implement the boundary conditions; and passive points, which (temporarily) play no role in the numerical scheme. Each original point is either an internal point, an interpolation point, or a passive point. Each dummy point is either an interpolation point, a boundary point, or a passive point. The relation between the two classifications is visualized in Fig. 2.
The characterization of the staggered grid points is derived from the characterization of the pressure grid points (the cell centers), and therefore we consider the pressure grid points first. For each spherical grid, the internal pressure grid points are precisely the original grid points, while the interpolation points are the dummy grid points with r > r b (see Fig. 3(a)). All other dummy pressure grid points are boundary points. The spherical grids have no passive points. For the Cartesian grids, the characterization of points is determined by r a , the radius that is used to cut holes in c . If a Cartesian pressure point is located inside a sphere with radius r a around a particle (see Fig. 3(b)), then the point is an interpolation or a passive point. Otherwise, it is an internal or a boundary point. If the point is an interpolation or a passive point, it is a passive point if none of the six direct neighbours is an internal or a boundary point.
In view of the characterization of points explained above, type arrays are defined, which are integer arrays with a similar structure as the floating point arrays in Table 1 (but each Cartesian type array has one extra dummy layer around the block). The types of the pressure points of a given Cartesian grid with grid identification number b are set in two rounds. In the first round, the nearest particle is determined for each Cartesian pressure point (the distance to the nearest particle center is equal to the minimum of the distances to all the particles in the block lists of the block under consideration and the 26 surrounding blocks). If the distance to the nearest particle is less than r a , then the type is set equal to the grid identification number of the nearest particle (which can be a shadow particle), otherwise the type is set equal to the default type, which is b. The first round includes the outer dummy layers, which are the extra dummy layers just mentioned. In the second round, in another loop performed over all Cartesian pressure points except those in the outer dummy layers, the type is overwritten by 0 if the type of none of the six direct neighbours is equal to b. The result of these two rounds can be summarized as follows. If the type of a grid point is equal to b the point is either an internal or a boundary point (to distinguish between internal and boundary points, the values of the grid indices i, j and k of the three coordinate directions are used). If the type is zero, the grid point is passive. However, if the type is unequal to zero and unequal to b, then the grid point is an interpolation point and the type value represents the grid identification number from which the fluid variable at the grid point under consideration is interpolated.
The types of the staggered points are derived from the types of the two nearest cell centers (each staggered point is located on a face that joins two neighbouring cells). If the type numbers of these center points are the same, the type is copied to the staggered type. If one of them is zero, the type of the staggered point is also set to zero (passive). If one of them is an internal point and the other one an interpolation point, the staggered point is an interpolation point. Although the staggered velocity points are not explicitly indicated in Fig. 3, above explanation may be clarified by mentioning that the spherical staggered points located on the two thick solid circular lines and the Cartesian staggered points on the step line twined around the dashed circular line are not internal grid points, but interpolation points (r = r b and inner step line) or boundary points (r = r 0 ).
For a given interpolation point, the grid on which the interpolation stencil is defined (say grid a) is called the donor grid, while the grid to which the interpolation point itself belongs is called the acceptor grid. Each interpolation stencil contains 3 × 3 × 3 points. The type of each point of the interpolation stencil should be equal to a, which means that each point of the interpolation stencil should either be an internal point or a boundary point. Furthermore, a boundary point due to a wall is not allowed to be a member of an interpolation stencil, except if it is a boundary point for the wall normal velocity on the wall. The interpolation stencil is chosen such that the position of the interpolation point is not further away from the central point of the stencil than from any of the other 26 points of the stencil. As an example, we consider the pressure interpolation point indicated by the enlarged plus symbol in Fig. 3(a). For this point, the Cartesian grid is the donor grid and the spherical grid is the acceptor grid. The 3 × 3 encircled Cartesian pressure points near the bottom right corner of

The radial grid
The grid in the r direction can be uniform or stretched. The cell center radial locations r c i are defined by where f (ξ ) represents the stretching function, i denotes the grid index, while i = 0 and i = N r + 1 represent dummy locations. The staggered radial locations of the u r velocity are defined by where i = 0 and i = N r represent dummy locations. The radial grid size is defined by r c i = r s i − r s i−1 at location r c i and by r s i = r c i+1 − r c i at location r s i . If we wish to simulate a case in which particles are relatively close to each other, we may want to minimize N r and r. Then a uniform radial grid seems to be a suitable option, which is obtained if f (ξ ) = 1 + ξ( r)/r 0 and r = (r b − r 0 )/N r . If stretching is used in the radial direction, we choose f (ξ ) = exp(γ ξ) with stretching factor γ = θ = π/N θ [23], which implies that r c It is possible to derive sufficient conditions for r a (the radius of the holes in the Cartesian grid) and r b (the radius of the outer boundaries of the spherical grids), such that each point of an interpolation stencil is ensured to be an internal point or a staggered dummy point on a solid wall. For simplicity, we consider uniform Cartesian grids with the same grid spacing h in each direction. Without loss of generality we assume that x p = 0 for the nearest particle. A one-dimensional illustration of two basic grid configurations used in this paper is shown in Fig. 4. The left and right grids in each subplot represent the spherical grid and Cartesian grid, respectively. The right grids have been shifted slightly downward, for clarity.
Consider a pressure or velocity interpolation point of the Cartesian grid located at x. From the definition of the interpolation point, we infer that |x| ≥ r a − h. The interpolation stencils of x do not include a dummy point inside a particle if |x| ≥ r s Furthermore, |x| < r a for a pressure location and |x| < r a + 1 2 h for a velocity location. The radius of each outer dummy point of the spherical grid is at least r b . Thus the velocity interpolation stencil does not include outer dummy points of the spherical grid if |x| < r c Moreover, the interpolation stencils of the spherical interpolation points should fit on the Cartesian grid. Let x be an interpolation point of the spherical grid. By definition, |x| ≥ r b . The corresponding three interpolation stencils should consist of internal Cartesian velocity points. From the in total 81 stencil points, let y be the point farthest from x (y is not always unique). Because of the definition of the interpolation stencils, The interpolation stencil for the pressure fits if |y| ≥ r a , but the interpolation stencils of the velocity are more critical. For a velocity interpolation stencil, we consider the two pressure points at distance 1 2 h from the staggered y, we choose the pressure point with the largest distance to x, and call it z. The latter distance satisfies Since the point y is internal if the pressure point z is internal, the velocity interpolation stencils fit on the internal points of the Cartesian grid if |z| ≥ r a , which is ensured if The argumentation also applies to multiple particles, provided the distance of each spherical interpolation point to the nearest particle is not smaller than r b . This means that all interpolations can be performed if the distance between two particle centers is at least r b + r c N r +1 and the inequalities (34), (35) and (38) are satisfied. Mutual overlapping spherical grids are avoided as long as the minimum interparticle distance, d min , satisfies In addition, as long as the minimum distance between a particle and a wall, d min,wall , satisfies the presence of a wall does not hinder the interpolations. It is instructive to simplify the set of conditions (34), (35), (38) and expressions (39) and (40) for cases with uniform grid size r: In case r = h, the minimum suitable value for r b to guarantee the existence of the interpolation stencils for any configuration with a minimum interparticle distance of 2r b + 1 2 r is r b = r 0 + 5h. Then r a should be chosen between r 0 + 2h and r 0 + 2.08h. Then d min should be at least 2r 0 + 10.5h, which implies that the width of the gap between two particles should remain at least 10.5h. The minimum width of the gap between a particle and a flat wall should remain at least 6.5h if r = h. In theory, arbitrarily small nonzero gap widths are allowed in the limit h → 0 and t → 0. In practice, a very large number of grid points will be required for a simulation that allows very small gap widths.

Details of the initialization
The initialization procedure consists of 7 substeps: 1. Define the spatial domains. Store the coordinates, the grid spacings, the elements of matrix A, and some metric quantities for cell-center and the staggered locations. Store also the matrix coefficients of the left-hand side of the pressure Poisson equation. 2. Initialize time t and evaluation time t eval , particle positions, and particle and fluid velocities and pressure. 3. Compute the velocity field on the particle surfaces. 4. Set up the interpolation lists. 5. Set the velocity and pressure at all dummy and interpolation points. 6. Make the initial velocity field divergence free (optional). 7. Call the evaluation routine if t = t eval .
In substep 1, the initial reference position vectors and the sizes of the Cartesian and spherical domains are defined. Then the coordinates of all cell center and staggered points are defined. Each cell centered grid spacing is a distance between two staggered points, while each staggered grid spacing is the distance between two cell centers. It is important to mention that the elements of matrix A and metric quantities are directly computed by inserting r, θ and φ, for the cell center and for the staggered locations, such that none of these quantities is obtained by interpolation.
The setup of the interpolation lists (substep 5) consists of several routines. First the block lists of particles (defined in subsection 3.2) are determined. Second, the block lists are used to find the distance to the nearest particle center. This distance and the corresponding particle number are stored for each Cartesian pressure point (cell center location). Third, the types of the grid points are set as described in subsection 3.3. Fourth, the acceptor and donor interpolations lists are set-up, and the interpolation weights are computed.
In substep 5 of the initialization routine, the interpolations are performed for the first time. This means that two interpolation procedures are called, one for the velocity, the other one for the pressure. Both the velocity and pressure interpolation procedure consist of three main actions: (1) call the boundary condition routine of the flow variable under consideration (set the values at dummy points), (2) perform the interpolations (set the variable at interpolation points), (3) call the boundary condition routine again.
In the boundary condition routines, the following actions are performed. The variables at the Cartesian dummy points are copied from the corresponding points in adjacent cell block or, in the case of periodic boundary conditions, from a block that becomes adjacent after a periodic shift. If the boundary is a wall, the normal velocity is prescribed at the wall location, while the tangential velocity components are extrapolated using the third-order extrapolation formula specified in an appendix of [26]. For each spherical grid, u r is set to zero at the particle surface. The tangential velocity components, u θ and u φ , are set at the dummy points inside the particle, using the third-order extrapolation formula just mentioned. This extrapolation is based on the tangential velocity of the wall and the velocity at two internal grid points. The pressure at the dummy points is copied from the first layer of internal points (q(r c 0 ) = q(r c 1 )). It is stressed that the pressure at the dummy points is only used by the procedure that solves the pressure Poisson equation. It does not mean at all that a homogeneous Neumann boundary condition of the pressure is prescribed to the continuous pressure Poisson equation, since the discrete pressure equation at r c 1 should not be regarded as a discretization of the continuous pressure Poisson equation [26]. The values of the variables in the dummy points in the φ direction are obtained using the periodicity in that direction. For the poles (θ = 0 and θ = π ), the fourth-order interpolation formula specified in the appendix of [23] is used to calculate u θ at the poles. It was shown in [23] that this pole treatment is adequate. In fact, this is the only pole treatment that is required for the discretization of the spherical Navier-Stokes equations, since many metrical quantities are zero if θ = 0 or θ = π . However, for the interpolation routines it is necessary to use meaningful extensions for the dummy grid cells in the θ direction (thus for θ < 0 and for θ > π ). Near θ = 0, we define for θ < 0: u r (r, θ, φ, t) = u r (r, −θ, φ + π, t), u θ (r, θ, φ, t) = −u θ (r, −θ, φ + π, t), u φ (r, θ, φ, t) = −u φ (r, −θ, φ + π, t) and q(r, θ, φ, t) = q(r, −θ, φ + π, t). Analogous extensions are applied near θ = π .
Substep 6, a correction of the initial velocity field, is optional and not applied by default (it has only been used in the simulations presented in subsections 4.1-4.3). To make the velocity field divergence free (down to the truncation error), ∇ 2 ψ = ∇ · u is solved iteratively, and ∇ψ is subtracted from u. Although ψ does not represent the variable q at t = 0, the procedure that solves the Poisson equations for ψ is the same as the procedure that solves the Poisson equations for q.
In the last substep of the initialization procedure the evaluation routine is called if t is equal to t eval , the evaluation time. This is a post processing routine, intermediate results are computed and written to file, for example forces and torques on particles, errors and some spatial integrals. In addition, t eval is set to the next evaluation time.

Details of the time step
The main parts of the time step and temporal and spatial discretization have been presented in section 3.1. In this section, a complete list of all actions performed during a time step is given, and additional explanation is provided. Each time step consists of 16 substeps: 1. Increase t with the time step t. 2. Store a copy of the particle velocities in v p old . 3. Compute the hydrodynamic forces and torques on the particles, F p and T p . 4. Update the particle positions by incrementing the particle position vector x p by t v p . In substep 3, the hydrodynamic forces and torques exerted on the particles are computed. The integrals over the particle surface in definitions (22) and (23) are performed using the midpoint rule. The tangential velocity derivatives at the wall are computed by central differences using the dummy points inside the particle and the first layer of internal points near the particle surface. The pressure at the particle surface is obtained by second-order accurate extrapolation using the first two layers of internal points near the particle surface, The pressure values in the dummy cells inside the particle are not used in this step. In substep 4, the particle locations are updated. For each periodic direction i, the domain length L i is added or subtracted from x p i if that is required to keep x p i inside the domain c . In substep 5, the hydrodynamic forces and torques computed in substep 3 are used to further update the velocities and the angular velocities of the particles. Substeps 3 to 5 are only performed for freely moving particles.
In the results section, we present several simulations of spheres with a prescribed motion. In those cases, substeps 3 to 5 are replaced by statements that compute x p , v p and ω p by substitution of time t + t in the functions that prescribe the motion of the particles. In substep 6, the fluid velocity at the internal points of each spherical domain is modified to account for the change of the particle velocity. In substep 7, the surface velocity of each particle is modified to account for the change of the angular velocity of the particle. In substep 8, the new interpolation structure is set up, based on the new positions of the particles. Substep 8 includes an update of the types of all grid points. This update leads to so-called exposed points [17], internal points that were not internal before the types were updated. If v max t is sufficiently small (v max is the maximum of the magnitude of the particle velocity), each exposed point was an interpolation point in the previous time step, which implies that the variable at the exposed point has a meaningful value. Since for the present overset grid method only Cartesian points can be exposed, the condition v max t < h/ √ 3 should be satisfied. This is a mild restriction, such that no special treatment of exposed points is needed, provided the spatial discretization is based on one previous time level (n). In substep 9, the first intermediate spherical and Cartesian velocities are completed by computing their values at interpolation and dummy points, using the new interpolation structure and the results of substeps 6 and 7.
The method reduces to the standard explicit Euler method for the intermediate velocity update if all particles are fixed. In the method restricted to fixed particles [23], the explicit part of the convective terms was based on two time levels (n and n − 1) and second-order in time. This has some advantages, higher accuracy of the convective motion and less severe convective restriction on the time step if the viscosity is low, but it also requires that the velocity at interpolation points is available at time levels n and n − 1. However, if particles are moving, an interpolation point at time level n may have been a passive point at time level n − 1. Thus for the convective term at exposed points the time level n − 1 should not be used.
To avoid a special treatment of exposed points, the temporal discretization of the convective terms has been simplified such that time level n − 1 is not used at all anymore.
The time step t remains fixed during a simulation. For the simulations in the present paper, the viscous stability criterion is much more restrictive than the common convective criterion that the Courant number should be less than one. The theoretical upperbound for numerical stability of the explicit viscous terms is given by Each t used in this paper satisfies t < 0.6 t visc . This implies that the temporal truncation error, which is proportional to t, is second order in terms of the spatial grid spacing, like the truncation errors in the spatial derivatives. Another factor relevant for numerical stability is the density ratio ρ p /ρ, which is at least one in all cases presented in this paper. However, it was found that the present method is unstable for ρ p /ρ < 1 2 , while for ρ p /ρ ↓ 1 2 , the time step needs to be reduced for stability. The instability for ρ p /ρ < 1 2 is related to added mass effects in combination with the explicit advancement of the particle translational velocities, see for example [27].
In substep 13, the pressure Poisson equation problem is iteratively solved, as explained in subsection 3.1. The stopping criterion of the iterative method is that the maximum norm of the residual vector is less than a specified tolerance, 10 −5 min(u 2 where u ref is a reference value for the fluid velocity relative to the particles. This tolerance can be regarded as an estimate of the absolute error in the pressure caused by the finite number of iterations. The pressure interpolation routine, executed in each iteration, includes the assignment q(r c 0 ) = q(r c 1 ). As discussed in the previous subsection, this does not reflect a physical pressure boundary condition, but is just a convenient choice for the discrete system, a choice that implies that the boundary condition that should be imposed on the intermediate velocity is u * * r = 0 at r 0 . The latter is done in substep 11, before the divergence is taken in substep 12.
Even if the linear system were solved down to machine precision, the discrete divergence of the fluid velocity would not be zero down to machine precision. The first cause is the augmented matrix technique, which introduces a small error at each internal point. The second cause is the interpolation in substep 15, which leads to a nonzero velocity divergence at internal pressure points with at least one neighbouring velocity interpolation point. Both errors are truncation errors, and are therefore expected to converge to zero if the grid size is refined to zero. The second cause can be avoided if, instead of the pressure at pressure interpolation points, the pressure gradient is interpolated at velocity interpolation points in each iteration of the pressure Poisson equation [20]. Therefore, interpolation of the pressure gradient in combination with the BiCGstab iterative method was also implemented by the author of the present paper, but without success. The computational time per iteration was significantly increased and the iterative procedure failed to converge.
For completeness, it is mentioned that if two particles become too close, the program does not terminate, but modifies the velocities of the two particles according to an artificial hard-sphere repulsion (in between substeps 2 and 3). The repulsion occurs if the particles approach each other and the distance between the centers becomes smaller than d min + v max t. If a repulsion occurs, the user is warned. However, no repulsions occurred in the simulations presented in this paper. In particle resolved simulations performed with other methods, collisions have been modelled by soft-sphere collisions (or soft-sphere short-range repulsions), sometimes in conjunction with a lubrication model, see for example [2,5,9].

Results of Stokes simulations
In this section, we show that canonical analytical solutions of the (unsteady) Stokes equations for flows around a single sphere have been reproduced by the code. Results of four test cases are presented in four subsections. In each case d p = 1, ν = 1 and ρ = 1, while the acceleration terms a and g are zero unless mentioned otherwise. The Cartesian domain c is cubical (L 1 = L 2 = L 3 = L) and the Cartesian grid cells are cubes of uniform edge length h, which implies N 1 = N 2 = N 3 . Each simulation was run in parallel on eight processors, using MPI, for which the Cartesian domain was divided into 2 ×2 ×2 blocks. The computing times can be found in Appendix C.
In the simulations presented in this section, the convective terms in the code were multiplied by zero in the equations on the spherical grid, while they were replaced by the term v p (t) · ∇u in the equations on the Cartesian grid, where ∇u was discretized by the standard second-order central difference scheme. The analytical Stokes solutions used are solutions of the unsteady Stokes equations in the frame of reference of the sphere. Since the unsteady Stokes equations are not Galilean invariant, the convective term v p (t) · ∇u appears in the unsteady Stokes equations after transformation to the Cartesian frame of reference.

Translating sphere
In this and the following two subsections, where we consider the Stokes flows due to a translating, oscillating and rotating sphere, the size of c is given by L 1 = 4. The center of c is denoted by x cen . At t = 0, the fluid velocity is set equal to the exact velocity. Each time the boundary condition routine is called, the exact fluid velocity vector at actual time t is copied to the dummy points on and near the outer faces of c . This is combined with zeroth order extrapolation of the pressure, which, as explained in section 3.4, is consistent since the exact velocity is also imposed on the intermediate velocity on the faces of c , just before the divergence operator is applied. The resolutions used are summarized in Table 2.
Each grid is uniform in the radial direction and r = h. Moreover, r a = r 0 + 0.4(r b − r 0 ) + 10 −8 (the grid structure is illustrated in Fig. 4(a)). In order to allow a meaningful comparison with the exact pressure, the simulated pressure is modified by subtracting the simulated pressure averaged over the particle surface. The subscript "exact" refers to the corresponding analytical solution.
In the first test case, we consider Stokes flow around a sphere translating with constant velocity V. The translation is given by: The exact solution can be written as [28] (p. 265): q(x, t) = 3νr 0 where r = x − x p (t) and r = |r|. In the numerical tests, we use V = e 1 or V = e 3 (u ref = |V| = 1). These two choices are numerically not equivalent, because the polar axis in spherical coordinates is always aligned with e 3 , by definition. The translating sphere simulations performed on the grids specified in Table 2 are indicated by Tran1, Tran2, Tran2a and Tran2b. The number in each name refers to the level of refinement. The deviations from the analytical solutions are shown in Fig. 5, as functions of time: the relative error in the particle force and the maximum norms of the velocity divergence, the error in the velocity and the error in the pressure. The discrete maximum norm is defined as the maximum of the absolute value of a quantity over the centers of all internal cells in the spherical domain and the centers of all internal cells in the Cartesian domain (thus the overlap region is used twice). This is straightforward if the quantity is the pressure or velocity divergence, since these are defined at the centers of the cells. For a velocity difference, we first square the velocity differences of each component at their staggered locations, then interpolate these squares to the centers of the cell using the weights 1/2 and 1/2, then we sum the three squares, and then the root of this sum is inserted as argument into the discrete maximum norm. The maximum norm of the error in the velocity is normalized by V = |V| = 1. The force is normalized by F exact = 3πρνd p V = 3π . The maximum norm of the error in the pressure is normalized by the maximum of the analytical pressure field, q max = 3ν V /d p = 3.
All errors are around 1% or less for the coarsest grid and they decrease upon grid refinement (Fig. 5). Comparison between Tran2 and Tran2a shows that it matters if r a and r b are kept constant during grid refinement (Tran2) or if the radial extent of the grid shrinks during grid refinement (Tran2a). All errors are reduced in both cases, but the reduction is largest if r a and r b are kept constant. Thus fixed overlap during grid refinement reduces the error more than shrinking overlap. This was also found by others [17,20]. For fixed overlap, the particle force and the velocity field display second-order accuracy, but the pressure and the velocity divergence only first-order accuracy. The errors in case Tran2b, in which the velocity of the sphere is directed along the polar line, are comparable to those in Tran2, except the error of the force, which is much smaller in Tran2b than in Tran2.
It is surprising that the accuracy of the pressure in an overset grid method based on second-order finite differences and third-order interpolations reduces to first-order (at some times at some locations). It is mentioned that the requirement of vanishing maximum norms of truncation errors in simulations of flows around moving objects can be regarded as severe tests, because pointwise convergence of the numerical solution to nontrivial exact solutions is verified. The first-order accuracy of the pressure can probably not be attributed to unsteady behaviour, since it also occurs for uniform Stokes flow past a fixed sphere, which is a steady state flow (see Appendix D, where also a reduced order of accuracy is observed for the pressure, in the maximum norm and to smaller extent also in the L 2 norm).
The reduction of the order of accuracy of the divergence and the pressure can be caused by the abrupt change of the structure of the discretization error in the overlapping region. Consider an internal velocity point adjacent to an interpolation point. The leading terms of the truncation errors in both points are both second order (since the velocity is shown to be second-order accurate), but their prefactors may be discontinuous and then the accuracy of the derivative based on these two points can reduce to first order. The same argument shows that the accuracy of a second-order velocity derivative based on three points with one of them being an interpolation point can locally reduce to zeroth order. Because of the projection of the second intermediate velocity, part of the viscous term induces part of the pressure gradient. Thus if the viscous term is zeroth order at some points, the pressure gradient required to make the velocity divergence free can be a zeroth order quantity at some points. If this is the case, the pressure itself can be first-order accurate at some points.
To validate a three-dimensional particle-resolved simulation method, analytical solutions of flows around an isolated sphere have been used before, but apparently only for steady flow around a fixed non-rotating and a fixed rotating sphere. For example, Tang et al. [29] and Baltussen [30] used the Stokes expressions for the force and torque on an isolated fixed sphere to test their immersed boundary methods. Kempe et al. [31] used the exact solutions for both no-slip and free-slip surface conditions to test an immersed boundary method for steady flow past a fixed sphere. For the free slip case, the maximum norm of the error in the velocity was shown, which remained (slightly) above 1% in all tests. According to Ref. [23], the maximum norm of the velocity error dropped below 0.2% in overset grid simulations of steady flow past a fixed sphere with a no-slip surface. Moreover, Hasimoto's analytical expressions for the drag force exerted on spheres in ordered particle arrays have been used in verification of numerical methods, see for example [8,3,5,29]. Not for the purpose of validation, but to investigate the effect of Reynolds number ranging from 0.1 to 100, Mei [32,33] has simulated unsteady flow past a fixed sphere, using an axisymmetric flow solver.
It is remarked that in Ref. [23] the Cartesian and spherical domains were much larger than in this section since the spherical grid was stretched. The present test is more challenging, not only because the sphere moves but also because the interpolations are performed much closer to the surface of the sphere. Near the surface the variation of the pressure is relatively strong (the pressure decays with the square of the distance to the center).

Oscillating sphere
The second test case is an oscillating sphere. The oscillation is given by where i 2 = −1 (only in this subsection i represents the imaginary unit), the function (.) takes the real part of its argument, while the constants V and ω denote the velocity amplitude and the oscillation frequency, respectively. The exact solution, presented in terms of Stokeslets in [28] (p. 310), can be rewritten as where r = x − x p (t) and r = |r|, while B = 6π νr 0 (1 + λ + λ 2 /3), In the numerical tests, (52) is prescribed, using ω = 2π , and V = e 1 or V = e 3 .
The oscillating sphere simulations performed on the grids specified in Table 2 are indicated by Osc1, Osc2, Osc2a and Osc2b. The force exerted on the sphere and the pressure at stagnation point (θ, φ) = (π /2, 0) on the particle surface are shown in Fig. 6 for Osc1 (coarse grid) and Osc2 (fine grid), together with the analytical solutions. The amplitude of the force, F max ≈ 20.4, is more than two times higher than 3π , the magnitude of the force exerted on the sphere translating with velocity V = 1. This indicates that the oscillating flow is essentially different from the translating case and that it can certainly not be considered as a quasi-steady flow in the frame of reference of the sphere. In Fig. 6(a) the curves for the force are on top of each other, while in Fig. 6(b) slight underprediction of the pressure amplitude by the coarse grid simulation can be observed. The amplitude of the pressure at this location equals q max , the maximum of the absolute pressure over space and time (q max ≈ 7.1).
Results for the relative error of the force and the maximum norms of the velocity divergence and the errors in the velocity vector and pressure are shown in Fig. 7, for simulations Osc1, Osc2, Osc2a and Osc2b. Comparing Osc2 to Osc2a, we again notice that the error reduction by grid refinement is larger for grid 2 (fixed overlap) than for grid 2a (shrinking overlap).
In cases Osc2 and Osc2b, the same grid is used, but the direction of the velocity vector is different. The results of these two cases are equally good. The figures indicate pointwise convergence of the numerical solution to the analytical solution upon grid refinement. To check the approximate order of convergence it may be more appropriate to look to the temporal maximum or average of the spatial norm than to individual times. For the velocity, we see that the maximum over time reduces by a factor 4 upon grid refinement, indicating that the velocity is second-order accurate. Whereas for the translating sphere, second-order accurate convergence of the particle force was observed, the error reduction of the force upon grid refinement is reduced by factor of approximately 2.8, which suggests that the order of convergence is 1.5 in this case. The pressure and velocity divergence display roughly first-order convergence in the maximum norm (see the discussion on the order of accuracy in the previous subsection). It is not known whether other methods would provide better than first-order pointwise convergence of the pressure if they were applied to this problem. Despite the first-order spatial accuracy of the  Table 3 Results for the rotating sphere at t = 0.3. The maximum norms of the velocity difference, pressure difference and velocity divergence are normalized by |ω 0 |d p , |ω 0 |d p ν and |ω 0 |, respectively.
Case pressure, all errors, including those in the pressure, are less than 1% of the reference values in all internal grid cells for all times for 30 grid points per particle diameter. In conclusion, this test case is a further confirmation of adequate performance of the present overset grid method.

Rotating sphere
The third test case is a rotating sphere. The rotation, which is the only movement of the sphere, is given by The exact solution can be written as [28] (p. 268): where r = x − x p (t) and r = |r|. Since the translational velocity of the sphere is zero, this is a solution of the (steady) Stokes equations in both the Cartesian and spherical frames of reference. Baltussen [30] used this flow to test a second-order immersed boundary method and concluded that d p /h > 80 was required to predict the torque within 1% accuracy. Table 3 shows the steady state results of the overset grid method simulations, which were performed on the spatial grids specified in Table 2 (u ref = π|ω 0 |d p = π ) for 0 ≤ t ≤ 0.3. The discrepancies with the analytical solution are small. The torque errors are very small, less than 0.05% in all cases. In fact, the coarse grid leads to the smallest torque error. This is probably due to a fortunate cancellation of errors for this quantity since the maximum norms do show that the simulation on the coarse grid (Rot1) is the least accurate one. The reduction of the maximum norms in case Rot2 compared to those in case Rot2a (grid refinement with fixed overlap) clearly indicates pointwise convergence of the numerical to the analytical solution. Overall, the smallest errors are obtained if the torque directed into the x 3 direction (Rot2b), which is not surprising since in this case there is only one component of the spherical velocity nonzero and that one is constant (u φ ).

Instantaneously accelerated sphere
In the fourth test case, a step change is applied to the velocity of a sphere. Initially the fluid and the sphere are both at rest. At t = 0, the sphere is given an instantaneous impulse ρ p V p v 0 e 1 and afterwards the system evolves freely. There is no gravity.
An exact closed system for the velocity v p = v(t)e 1 of the sphere is given by The first three terms on the right-hand side of (62) are the Stokes drag force, the Boussinesq-Basset viscous memory force and the reaction acceleration force (added mass force), respectively. These well-known terms can be derived from the Laplace transform of the unsteady Stokes equations, see [28]. The fourth term, proportional to the Dirac delta function δ(t), represents the momentum pulse injected into the system at t = 0. This system is a special case of the so-called where H(t) is the Heaviside function. We derive the following system for v (t): The function v (t) is continuous at t = 0, such that v (t) and F (t) for t > 0 can be accurately computed by applying a standard numerical method to the latter system. The implied solutions v(t) and F (t), which follow from (64) and (65), are called v exact and F exact . Since the velocity of the sphere undergoes a step change, the overset grid simulations of this problem are denoted by Step1 and Step2. A velocity step change of a sphere is an interesting and also relevant phenomenon, because in reality collisions create sudden changes of particle velocities, relatively large changes if ρ p /ρ is large, like in a gas-solid flow. For this reason ρ p /ρ = 1000 is selected in Step1 and Step2. The investigation of a step change is also relevant from a more theoretical point of view; if a particle of finite size is injected in a nontrivial smooth flow at location x, discontinuities are generated at the particle surface, even if the particle is injected with translational and angular velocity of the particle respectively equal to the velocity and half vorticity of the fluid at x just before injection.
The impulsive acceleration is applied through the initial condition: at t = 0 the fluid velocity and pressure field are zero, but the velocity of the sphere is set to v 0 e 1 (v 0 = u ref = 1, ω p = 0). It is essential that the option to make the velocity divergence free after the initialization is not used in this case. The velocity of the sphere is not prescribed for t > 0. Since for this case no analytical expression for the full velocity field was found in literature, the domain c is large (L 1 = 20) and equipped with periodic boundary conditions. The sphere is initially positioned at the center of c . The extent of the spherical domain size is determined by r b ≈ 3.562d p , while r a = (r 0 + r b )/2. The spatial resolution of two simulations is given by 15κ × 24κ × 48κ and N 3 1 = (40κ) 3 , where κ = 1 (Step1) or κ = 2 (Step2). The grid configuration for κ = 1 is illustrated in Fig. 4(b). The minimum radial grid spacing is approximately d p /(15κ). The radial grid is stretched using the exponential stretching function defined in section 3. The time step equals t = 0.0004/κ 2 . Fig. 8 shows the velocity change and the force on the particle due to the impulse input at t = 0 as functions of time.
The axis scalings are logarithmic to capture the large variation of quantities and to show the large changes in the first time steps clearly. Equation (62) contains three terms that are singular at t = 0. The impulse injected into the system at t = 0, ρ p V p δ(t), is accounted for by the initial condition of the simulations. The other two singular terms are the history force and the added mass force, which are both negative at t = 0. These negative singularities lead to the very large −F in the first time step of the simulations, roughly proportional to 1/ t. The largest contribution of −F in the first time step is due to the added mass singularity since the delta function singularity in the added mass force is stronger than the root singularity in the history force. In both cases, the simulations takes two time steps to adapt v 0 to approximately v 0 , which indicates that in the numerical solutions v(2 t) converges to v 0 if t → 0. This reflects the theoretical limit v(t) →v 0 if t ↓ 0. In the fine grid simulation (Step2), the relative error drops below 1% within 8 time steps for v 0 − v(t) and within 16 time steps (t = 0.0016) for F (t). In a test for density ratio ρ p /ρ = 1, a larger error was found (less than 4% in F (t) for t ≥ 0.0016, for the same grid and same time step as in Step 2), but in that test the error could be lowered by a factor 2 by reducing the time step. It is concluded that the response to a step change of the particle velocity can be captured by the present method.

Results of Navier-Stokes simulations
In this section, we present results of simulations of three Navier-Stokes flow cases: (1) a moving array of spheres, (2) a sphere falling in channel flow, and (3) eight small spheres freely moving in a Taylor-Green flow. In the last two cases, the stretching option of the spherical grid is used to reduce the number of grid points of the Cartesian background grid with orders of magnitude (compared to uniform Cartesian grid methods). In each case d p = 1, ν = 1 and ρ = 1. The acceleration terms a and g are zero, except in the second case. The Cartesian grid cells are cubes of uniform edge length h. Each simulation was run in parallel on eight processors, using MPI, for which the Cartesian domain was divided into 2 × 2 × 2 blocks. The computing times can be found in Appendix C.
Although in the multi-particle cases, cases (1) and (3), the particles do not collide, these cases do add value to the present validation study. The results of the first case demonstrate that the Galilean invariance of the Navier-Stokes equations is numerically respected, which is a non-trivial validation of the implementation of the convective terms in the overset grid approach. The third case is a transient flow with a number of symmetries in the initial condition. The motion of multiple particles, initially symmetrically embedded in the flow, appears to obey the initial symmetry up to high accuracy during the entire simulation. The verification that these expected symmetries are persistent in the simulation confirms that the implementation of geometrically complicated parts of the algorithm, such as the time-dependent motion of interpolation points and interpolation stencils is correct. Furthermore, the present code is the first MPI implementation of a staggered overset grid method that can be used for particle-resolved direct numerical simulations of turbulent flows around fixed or oscillating particles, which is a valid reason to demonstrate that the method is able to simulate flows with multiple non-colliding particles. Of course such flows are theoretical, but they do give insight into particle-fluid interaction [37,23]. Table 4 Results of simulations of an FCC array of particles moving with velocity V 1 . The particle volume fraction equals 0.2 and the Reynolds number is 40.

Moving FCC array of spheres
We consider a face-centered cubic (FCC) particle array with particle volume fraction α = 0.2 at finite Reynolds number, like in Refs. [3,23,29,8]. Periodic boundary conditions are applied in each direction and 4 spherical particles are put at The size of the cubical domain is given by 19. The velocities of the particles are constant and equal to V = V 1 e 1 and the forcing term is set to a = a 1 e 1 with where U 1 = 40 is the preset superficial relative velocity. For stability reasons the explicit forcing term depends on t. The difference between the preset and the computed superficial relative velocity decreases with decreasing t. The Reynolds In the definition of a 1 , the operator · appears. This represents a volume average over , the domain that excludes the particles. For the approximation of the volume integral over , we define the radius r e for the evaluation of integrals, which should coincide with a staggered radial position (r e = 0.8r b in this subsection). The integration over is then the sum of: (1) a midpoint integration over all spherical cells within distance r e of a particle location, (2) a midpoint integration over all Cartesian cells entire outside the spheres with radius r e , and (3) a midpoint integration over tiny Cartesian subcells that cover the parts of not covered by (1) and (2). The size of the tiny Cartesian subcells is h/5. This integration procedure has been described in more detail in Ref. [23].
Simulations have been performed for both fixed particles, V 1 = 0, and moving particles, V 1 = −U 1 /(1 − α), for three different resolutions and for 0 ≤ t ≤ 0.3. The number of Cartesian grid cells is N 3 1 = (42κ) 3 , where the grid refinement factor κ equals 1, 2 or 4. The corresponding resolutions of the spherical grids are 5κ × 24κ × 48κ . In each case, the radial grid spacing r is uniform and equal to h, while u ref = U 1 and t = 0.000025. The grid structure for κ = 1 is the same as in Fig. 4(a), except that h is approximately 1.3 times smaller than in Fig. 4(a). The time step is given the same small value in each case to ensure that the difference between preset and computed superficial relative velocity is negligible in each case. Results of the simulations, the three contributions to the particle force, F p,q 1 , F p,ν 1 and ρa 1 V p at t = 0.3, are also shown in Table 4 (after normalization by F 0 = 3πd p νU 1 ). The third contribution is due to the spatially constant pressure gradient −ρa 1 . Moreover, the total particle force F p,tot 1 The forces shown in the table are mean values, averaged over the particles and over time between t = 0.25 and t = 0.3. The corresponding standard deviations were verified to be small: for the fixed particles they were zero, while for the moving particles the relative standard deviations of F p,q 1 were 1.2% for the coarse, 0.2% for the finer and 0.03% for the finest grid (those of F p,ν 1 were an order of magnitude smaller). Table 4 demonstrates clear convergence upon grid refinement for both the fixed and moving configurations. The differences between the results of the fixed and moving configurations are negligible, which confirms that the implementation mimics the Galilean invariance of the Navier-Stokes equations. A perfect momentum balance implies that the mean of (1 − α)a 1 is equal to the mean of N p F p 1 . Thus the last number in the table, the mean of F p 1 divided by the mean of F tot 1 , should converge to 1 − α = 0.8, and it does. For fixed particles, the same test case was considered in Refs. [3] and [23]. It is remarked that the force contributions F p,q and F p,ν mentioned in Ref. [23] were larger because those were normalized by (1 − α)F 0 . As a check, the values from Table 4 were divided by (1 − α) and compared to those in [23]. Small discrepancies (<1.5%) were observed. Their source was traced, and a small error in the treatment of interpolation points outside c was found in the code used for the simulations in [23]. This error affected only the FCC test case and no other simulation presented in [23].

Falling sphere in channel flow
In this subsection we consider a falling sphere in a laminar channel flow. The test case was introduced by Uhlmann [2], while the initial condition was slightly modified by Breugem [5]. The domain c is given by L 3 = H = 20d p and L 1 = L 2 = H/2. There is a flat solid no-slip wall at x 3 = 0 and a free slip wall at x 3 = H . Periodic boundary conditions are used in the x 1 and x 2 directions. The initial velocity of the fluid is given by the parabolic profile  Fig. 4(b). The radius r e for the evaluation of integrals (introduced in the previous subsection) is approximately 3.125d p in this case. The evolution of the streamwise and wall normal particle velocity and the lateral rotational particle velocity are shown in Fig. 9. A clear dependency on resolution is observed, in particular for the streamwise and wall normal velocity. The fact that the differences between FS1 and FS2 are much smaller than the differences between FS2 and FS4 suggests that the grid refinement sequence is convergent. The curves of the streamwise velocity from FS2 and FS4 fall one upon the other. At the highest resolution (κ = 4), the total number of grid cells is approximately 1.2 million and the total number of time steps 16000.
Furthermore, Fig. 9 shows results of two immersed boundary simulations IBM1 and IBM2 performed by Breugem, both for d p /h = 54 (see Figs. 13-15 in [5]). IBM1 is the immersed boundary method as originally developed by Uhlmann [2] and has been used in impressive resolved simulations with many particles, see for example [34]. Breugem [5] showed that the first-order accurate IBM1 can be made second-order accurate if the fluid-solid interface is retracted. Thus, he formulated IBM2, in which the rectraction distance is 0.3h. Other improvements of IBM2 over IBM1 are that IBM2 uses the technique of multidirect forcing and, like [4], takes the fluid inertia inside the particle into account. Fig. 9 shows a remarkably good agreement between IBM2 and the overset grid simulation FS4. The results of these two simulations are apparently more accurate than those of IBM1. On coarse grids (for example d p /h ≈ 16), IBM2 produced small but clearly discernible rapid oscillations on the curve of ω 1 , which were attributed to grid locking. Such oscillations are not observed in the results of the overset grid simulations, also not in those of FS1 (see Fig. 9(c)). The total number of grid cells in FS4 is approximately 260 times smaller than in IBM1 and IBM2, which used a grid of 314.9 million cells. Also timewise FS4 is more efficient; the simulation required approximately 20 times less CPU time than IBM2 on the same hardware (see Appendix C). It is remarked that the immersed boundary method is more suitable for dense flows, whereas the present overset grid method is more suitable for dilute flows (with non-colliding particles).

Moving spheres in Taylor-Green flow
The third example of a Navier-Stokes flow is the three-dimensional Taylor-Green flow that evolves from the following initial condition for the fluid velocity in Cartesian coordinates: The flow is interesting because this very simple initial condition evolves into a complicated flow in which small scales are created by vortex stretching, like in turbulent flows [35]. The flow is periodic in each direction. The initial condition contains many symmetries and, provided the Reynolds number is moderate, the symmetries are probably stable under small perturbations. The domain c is cubical with L 1 = 32, while U = u ref = 40 and ν = 1, such that the initial Reynolds number is 1280. Schneiders et al. [36] performed a resolved simulation of small particles in a three-dimensional compressible Taylor Green flow (at low Mach number, 0.1). They showed an illustrative snapshot of the vorticity, but no quantitatively verifiable result of a flow variable.
Eight small spheres of diameter d p = 1 are injected at t = 0, four in the plane x 3 = 8 and four in the plane x 3 = 24. The initial particle velocity is obtained by substituting the initial particle positions into (69) to (71). The initial angular velocities of the particles are set to ζ /2 at the particle positions (ζ = ∇ ×û). We consider two cases: density ratio ρ p /ρ = 1 (TG1 and TG2) and ρ p /ρ = 8 (TG1a and TG2a, no gravity). Simulations have been performed using spherical grids of 30κ × 24κ × 48κ cells overset on a Cartesian grid of (64κ) 2 cells, for κ = 1 (TG1 and TG1a) and κ = 2 (TG2 and TG2a). The radial stretching function, r a , r b and r e are the same as in the previous subsection, and again d p /h = 2κ (see the illustration in Fig. 4(b)).
The locations of the particles and the structure of the velocity field is clarified by the simulated particle trajectories, shown in Fig. 10. For example, the trajectory of the first particle in TG2 is indicated by open circles. The diameter of the circles indicates the particle size on scale. When time evolves each particle moves to the outer region of the corresponding vortex. Each pair of neighbouring circles on the same curve corresponds to a time interval of 0.5. The simulations were terminated at t = 5. Due to the structure of the velocity (u 3 remains zero in these two planes), the x 3 coordinate of the particles does not change in time. Comparing the fine grid to the coarse grid simulations, we observe that the trajectories are similar, but not the same. Apparently, the effect of discretization errors gradually increases and becomes significant at later times in case TG1. For clarity, only the trajectory of the first particle is shown for ρ p /ρ = 8, one for TG1a and one for (e, f) Contours of the magnitude of the velocity from TG2 at t = 0.5 in the plane x 3 = 8. The contour increment equals 2 in (e) and 0.5 in (f), which also shows 12 velocity vectors (6 at r = 0.5 and 6 at r = 1). TG2a. The initial particle positions are the same as for ρ p /ρ = 1 and also in these cases the symmetries are preserved. For ρ p /ρ = 1 particles remain in the vortex where they were injected. However, for ρ p /ρ = 8 the particles respond to the fluid motion more slowly: when the particle shown exits the periodic domain at x 1 = 0 and reenters at x 1 = 32, it leaves the original vortex and moves over to a vortex rotating in opposite direction.
Further details of the simulations for ρ p /ρ = 1 are shown in Fig. 11. In Fig. 11(a), the total kinetic energy K = 1 2 |u| 2 and the total energy dissipation rate = ν ∇u : ∇u are shown as functions of time. The kinetic energy decays, but the dissipation rate first increases, due to the generation of smaller scales, until it attains its maximum around t ≈ 1, and then it decreases. The differences between the coarse and fine grid simulations remain small until t ≈ 1.5. At later times most differences gradually increase and can be clearly observed. Fig. 11(c) shows that the angular velocity is surprisingly accurate, compared to the velocity. The particle force components of the first particle are shown in Fig. 11(d) and the corresponding velocity components in Fig. 11(b). While the particle velocity looks very smooth for both resolutions, its time derivative (the particle force) displays small rapid oscillations in the coarse grid case TG1, which is another indication that TG1 is somewhat underresolved at later times. Furthermore, the viscous contribution to the particle force is shown in Fig. 11(d). It appears to be much smaller than the total force, which implies that the particle force is dominated by the pressure contribution if The motion of the eight particles follows the theoretically expected symmetries very accurately. The particles at the left bottom quarter of Fig. 11(a) and 11(b) are particle 1 and 5, respectively. Furthermore, in both figures, the numbering is anti-clockwise. The theoretically expected symmetries are v 1 7 , ω 1 = ω 3 = ω 6 = ω 8 , and ω 2 = ω 4 = ω 5 = ω 7 (the superscripts denote the particle numbers). All these equations have been explicitly checked for case TG2; during the entire simulation, all equations were satisfied with errors less than 10 −4 .

Conclusions
A staggered overset grid method for resolved simulation of incompressible flows around moving spheres has been presented. In this method, body-fitted spherical polar grids attached to the moving spheres are used to resolve the flow in the vicinity of the spheres. An accurate representation of the sharp particle-fluid interface is thus provided. The spherical grids are overset on a fixed Cartesian background grid. The Navier-Stokes equations in spherical coordinates and in Cartesian coordinates are respectively approximated on the spherical and Cartesian grids, using the standard staggered second-order finite difference scheme. The velocities and pressures on different grids are coupled by third-order Lagrange interpolations. A comprehensive description of the method has been provided, including the description of the data structure and communication algorithms required for an implementation in the form of an MPI parallel program.
The solver has been validated for seven different cases, four Stokes flows and three Navier-Stokes flows. The first three Stokes flows were the flows around a translating sphere, oscillating sphere and a rotating sphere. The analytical solutions for these flows are known, and the results obtained indicated pointwise convergence of the numerical to the analytical solutions. More specifically, it was found that the spatial maximum norm of the error in both velocity and pressure dropped below 1% of the reference values if the grid contains 30 points per particle diameter. The fourth test case was an instantaneously accelerated sphere. The effects of the step change of the velocity of the sphere were compared with those predicted by the Basset-Boussinesq-Oseen equation of particle motion. Good agreement was obtained within several time steps.
The first Navier-Stokes case was the flow due to a translating FCC array of spherical particles at particle volume fraction 0.2. The grid refinement study displayed a clear convergence of both the pressure and the viscous part of the drag force. Second, a sphere falling in a laminar channel flow was simulated using grid refinement. The results were compared to results obtained by two immersed boundary methods, IBM1 (first-order) and IBM2 (second-order, due to retraction of the particle surface), which were published in literature [5]. The results of the overset grid simulation at the highest resolution and those of the IBM2 simulation at the highest resolution displayed a very good agreement. In this case, the radial stretching facility of the overset grid method was exploited, such that the overset simulation at the highest resolution used approximately 260 times less grid points and 20 times less computing time than the IBM2 simulation at the highest resolution. The radial stretching facility was also used in the third Navier-Stokes case, incompressible three-dimensional Taylor-Green flow with 8 freely moving particles. The effect of grid refinement was shown for this flow, as for the other flows in this paper.
In the present method, the spherical grids do not overlap each other. Thus particle-particle contact cannot be simulated. Simulations in which particles closely approach each other at random locations require a fine Cartesian background grid (the minimum number of grid points across the gap between particles needs to be at least ten). Therefore, the present method is intended for relatively theoretical studies, for parallel simulations of incompressible flows with a low volume fraction of non-colliding solid spherical particles, flows that are described by an initial boundary value problem of the Navier-Stokes equations that is not too complicated for a direct numerical simulation in which the Navier-Stokes equations are accurately solved without simplification of the equations in any part of the fluid domain. In the opinion of the author, particle-resolved direct numerical simulation of cases with multiple non-colliding particles are useful for insight into particle-fluid interaction and for validation of point-particle models in turbulent flows, even if collisions are prevented by fixing the particles or limiting their motion in another way, see for example [37] and [23]. Furthermore, the overset grid method can be used to validate and benchmark other methods for particle-resolved simulation (for example, in the case of the falling sphere, not only the overset grid method, but also the immersed boundary methods IBM1 and IBM2 were tested).
To increase the flexibility and applicability of the present method, there are several possibilities. The first option is to allow mutual overlap of spherical grids. (In fact, the routines for interpolating from one spherical grid to another and for cutting out parts of spherical grids have been implemented in the solver, but apart from the observation that these routines work robustly, they have not been validated yet.) A second option is to construct an (implicit) method in which the radial extent of the spherical grids is kept as small as possible and the Cartesian background grid is nonuniform and adaptive. One could also think of a fine spherical grid overset on a local fine Cartesian grid, which is on its turn overset on the background Cartesian grid. For the narrow gaps between particles one could attempt to devise a deformable (perhaps non-orthogonal) grid overset on mutually overlapping spherical grids.

Appendix A. Parallel implementation
A common parallel algorithm for particle-resolved simulation is the so-called master and slave technique developed by Uhlmann [38]. In that technique the block structure is two-dimensional, while each time when a particle center moves to another block, the fluid variables associated to a particle are transferred to the processor of the new block (the particle gets another master). The present method differs from the master and slave technique in two important respects: (1) in order to minimize the total area of the contact surfaces of the blocks and the related MPI communication, the present method employs a three-dimensional instead of a two-dimensional block structure, and (2) in the present method, the center of the corresponding moving particle does not necessarily reside inside the Cartesian block on that processor, because a given spherical field stored on a certain processor remains on this processor during the entire simulation. The latter is done to avoid MPI communication of entire three-dimensional spherical fields.
As explained in section 3.1, each of the M processors contains the Cartesian fluid field of one Cartesian block and the spherical fluid field of maximum N p1 particles (N p /M ≤ N p1 < N p /M + 1). Thus, if N p = M N p1 , each processor is associated with N p1 + 1 grids, namely of the Cartesian block grid and the grids of N p1 particles. The acceptor lists and the primary and secondary donor lists, which include all interpolation weight coefficients, are accordingly distributed over the processors. Furthermore, the hydrodynamic forces exerted on the particles are distributed over the processors. However, all reference position vectors, all particle translational and angular velocities, the pointer array relating shadow to physical particles and the block lists containing particle numbers are stored on each processor. This is not strictly necessary for the algorithm, but it is convenient and not expected to produce significant overhead or affect the scalability of the solver, provided the number of particles is not too large (say less than 10 4 ). Under this condition, the resources required are almost entirely determined by the arrays for the fluid motion on the Cartesian and spherical grids, and these arrays are, for a given particle or a given block, stored on one processor only.
With respect to the actions in each time step in subsection 3.6, MPI communication is required in substeps 5, 8, 9, 11, 13, 15 and 16. In substep 5, after the particle translational and angular velocities have been updated on a given processor for the N p1 particles associated to the processor, these quantities are broadcasted to all other processors. In substep 16, the post processing substep, MPI communication is required for the computation of extrema and integrals over the entire flow domain. The MPI communication in the other substeps is related to either setting up the interpolation structure, performing the interpolations, or determining the fluid variables at boundary points.
Let us consider the determination of the interpolation structure on a given processor m. Since the Cartesian type arrays are equipped with multiple dummy layers, no MPI communication is required to fill the type arrays. Also, the construction of the acceptor lists does not require MPI communication, but the construction of the donor lists does. Processor m, being a future acceptor, prepares an a priori unknown number of fixed-length packages for distribution to other processors. A given package contains a package identification number, the number of acceptor interpolation points represented by the package, the processor number m b = m, the acceptor grid number b, the donor grid number a, and, for all interpolation points represented by the package, the grid acceptor grid indices (i, j, k). All interpolations points in a given package have the same donor grid a. Thus the package is prepared to be sent to processor m a (which could be equal to m). Processor m may prepare multiple packages for given processor m a . Because of the fixed size of the packages, the last package in a row of packages prepared for m a may be partially empty. After the preparation of a row of packages, processor m uses MPI communication to inform each processor m a how many packages it can expect from processor m. At the same time processor m is informed by other processors how many packages it can expect from them. Afterwards processor m, being a future acceptor, uses MPI to send all packages in the row sequentially to each processor m a . At the same time, processor m, being also a future donor, may receive packages from other processors. All received packages are copied into the primary donor lists.
After processor m has filled its primary donor lists, it collects additional information for the interpolation points in these lists. Going through its primary donor lists, it performs the following actions for each interpolation point in the lists. First, being a future donor, it recomputes the acceptor interpolation coordinates of a for each acceptor interpolation point x b , using the acceptor grid number and the grid indices (i, j, k) of the acceptor interpolation point. Subsequently, the indices of the reference corner of the corresponding interpolation stencil (which resides on processor m) and the interpolation weights are determined and stored in the secondary donor lists.
The interpolations themselves are performed in the following way. We remind that the donor lists contain the package identification numbers that were sent by the acceptor processors when the interpolation lists were set up. According to these identification numbers floating point packages of fixed size are prepared. Such a package contains the package identification number (converted to floating point format) and the corresponding interpolated values. MPI is used to send the package to the acceptor processor. After the MPI communication, the contents of the received packages is copied to the corresponding interpolation points, whose location is received from the acceptor lists. Before and after the interpolations are performed, the boundary conditions routines are called. The latter routines require MPI communication to copy the fluid variables at Cartesian dummy points from other blocks.  Table 6 Uniform flow past a fixed sphere. Maximum and L 2 norms of errors at t = 1. The numbers in brackets are estimates of the order of accuracy based on κ − 1 and κ. The pressure q was normalized by A small test of scalability of the parallel implementation up to 8 processors was performed by repeating the first 100 time steps of case TG2, on M = 1, 2, 4 and 8 processors. The scaling efficiency for M processors is defined by the wall clock time for M = 1, divided by the product of M and the wall clock time for M processors. The scaling efficiencies in this test were excellent since they appeared to be larger than one: 1.17, 1.30 and 1.11 for M = 2, 4 and 8.
Furthermore, the computing times of the overset grid simulation FS4 and the immersed boundary simulation using method IBM2 (see section 5.2) were compared. The results of these simulations displayed the same accuracy (see section 5.2). In order to perform this comparison for the same computer architecture, (parts of) these simulations were repeated on Intel Xeon 2690v3 nodes equipped with an Intel compiler (ifort). Since due to the 260 times larger grid, IBM2 required much more memory than FS4, more than 8 (namely 30) processors were used for IBM2. It was found that the CPU time was

Appendix D. Order of accuracy (steady Stokes flow)
To verify that the first-order behaviour of the pressure was not necessarily due to unsteady effects, a convergence analysis was performed for the steady state variant of the translating sphere case: uniform Stokes flow past a fixed sphere, for d p = 1, ν = 1 and free stream velocity equal to −e 1 . The analytical solution was prescribed at t = 0 and on the outer boundaries of the Cartesian domain for all times (0 ≤ t ≤ 1). The domain sizes were the same as in case Tran1. The spherical and Cartesian grids contain 5κ × 24κ × 48κ and (60κ) 3 cells, and three simulations were performed (κ = 1, 2 and 4). In Table 6, the maximum and L 2 norms are shown for both the velocity and the pressure. The estimated order of accuracy based on κ − 1 and κ, denoted by the number in brackets, is defined by 2 log(E κ−1 /E κ ), where E κ is the error in the quantity under consideration. Both the maximum and L 2 norm were based on all internal points. The L 2 norms were normalized by the total volume of all internal cells.