Efficient Convergent Boundary Integral Methods for Slender Bodies

The interaction of fibers in a viscous (Stokes) fluid plays a crucial role in industrial and biological processes, such as sedimentation, rheology, transport, cell division, and locomotion. Numerical simulations generally rely on slender body theory (SBT), an asymptotic, nonconvergent approximation whose error blows up as fibers approach each other. Yet convergent boundary integral equation (BIE) methods which completely resolve the fiber surface have so far been impractical due to the prohibitive cost of layer-potential quadratures in such high aspect-ratio 3D geometries. We present a high-order Nystr\"om quadrature scheme with aspect-ratio independent cost, making such BIEs practical. It combines centerline panels (each with a small number of poloidal Fourier modes), toroidal Green's functions, generalized Chebyshev quadratures, HPC parallel implementation, and FMM acceleration. We also present new BIE formulations for slender bodies that lead to well conditioned linear systems upon discretization. We test Laplace and Stokes Dirichlet problems, and Stokes mobility problems, for slender rigid closed fibers with (possibly varying) circular cross-section, at separations down to $1/20$ of the slender radius, reporting convergence typically to at least 10 digits. We use this to quantify the breakdown of numerical SBT for close-to-touching rigid fibers. We also apply the methods to time-step the sedimentation of 512 loops with up to $1.65$ million unknowns at around 7 digits of accuracy.


Introduction
Understanding viscous hydrodynamics in the presence of high aspect ratio bodies, such as filaments, rods, and rings, is key to many areas of science and industry, including the rheology of fiber and polymer suspensions [1] and permeability of fiber structures [2].Numerical simulation is crucial, since theoretical approximations may only apply in the dilute limit, or not at all [3].For instance, the gravity induced sedimentation of rigid fibers (with applications to paper and pulp manufacture) is shaped by spatio-temporal correlations in density, velocity and orientation, and instabilities such as streamers and flocculation, that are an active research area [4,5,6,7].Modern experimental methods allow a detailed comparison with numerics [1].Simulation also sheds light on many areas of biological fluids, including locomotion by driven flagellae [8], the collective effects of such swimmers [9], and transport by arrays of driven cilia [10].Molecular motors moving along a bed of flexible filaments can lead to a swirling instability in oocytes [11].In cells, the hydrodynamics of large assemblies of microtubules or actin filaments controls cell motion, transport, and division [7].
The inner step in modeling such phenomena is solving a quasistatic elliptic boundary value problem (BVP) for the incompressible Stokes equations in the exterior of the collection of slender bodies in question; see Eqs. (1)(2)(3) below.From this, the linear relationship between velocities (and angular velocities) and forces (and torques) may be extracted.In the case of rigid bodies, this is encoded by a matrix, leading to two distinct tasks: [12,13] 1) the resistance problem where velocities and angular velocities are prescribed (as in porosity, where they vanish), and 2) its inverse, the mobility problem where body forces and torques are prescribed (as in sedimentation).In the former case statistics of the velocity field are of interest [2,14].In the latter case the resulting velocities enable the body dynamics to be evolved in a time-stepping scheme for a system of ordinary differential equations (see, e.g., [15,16]).The case of flexible fibers is in some ways easier, since no BVP solution is needed (simply an evaluation of prescribed Greens function sources); however, more specialized time-stepping schemes are needed to handle bending and inextensibility [7,1].Stochastic averaging over many expensive long-time simulations is often needed to extract meaningful bulk quantities (viscosity, transport rates, etc), making computational efficiency a pressing concern.In this work we consider the rigid (and non-Brownian) case where the challenge of an accurate BVP solution is foremost.

Prior work
We now overview numerical methods for the Stokes BVP and associated mobility problem for many slender bodies; also see [1,Sec. 3] in the flexible case.Firstly, direct discretization (e.g., finite elements), while in principle convergent as the mesh size tends to zero, is inefficient and thus rarely used.This is due to the high cost of meshing (or remeshing every time step) the exterior domain and its inability to handle unbounded domains.More common volumetric approaches include Lattice-Boltzmann methods [17] in which fictitious gas particles interact within a grid of cells covering the domain, and immersed boundary methods where fibers are overlayed on a finite difference grid [18].However, both of these are accurate only when the grid spacing is somewhat less than the smallest features or separations [17], forcing the number of grid points to be huge if accuracy is needed.For a comparison including finite elements for aggregates of spheres see [19].
The most popular approach is nonlocal slender body theory (SBT) [20,21,22].This expresses the fluid velocity via a 1D integral over a given force density living on the union of the centerlines of the bodies.The integral kernel is the stokeslet Green's function, plus a "doublet" correction due to Johnson [21].The self-interaction of a body requires a special form with local and nonlocal terms.SBT was originally derived using matched asymptotics as ε → 0, where ε is a slenderness parameter, with leading neglected term O ε 2 log ε −1 [21].Ellipsoid-like rounding of open fiber ends is assumed; for closed fibers (as we consider in this work) there is no such complication [23].For flexible fibers the force is given locally by the geometry, thus evolving the dynamics involves simply applying the 1D SBT integral operator (modulo a tension solve) [24].In contrast, for rigid fibers one must solve a 1D integral equation for the unknown force density [15] (the so-called slender body inverse problem [25]).There are other subtleties.Since the classical self-interaction operator is in fact logarithmically divergent [22,23] [24, App.B], regularization of the kernel is often needed for numerical stability.Only recently has SBT been derived [26] from boundary integral equations (discussed below).Very recent rigorous error bounds, by Ohm and coworkers, include O(ε 3/2 ) in the flexible case (which required defining a new slender body BVP [27]), and numerical tests give best-fit errors around O(ε 1.7 ) [28,Sec. 5].
Yet, despite their wide use, classical and regularized SBT do not give convergent numerical methods: for any finite ε > 0, the error in solving the desired Stokes BVP (or in the flexible case, slender body BVP), does not vanish as the centerline discretization becomes finer.Especially problematic is the growth towards O(1) errors as fibers approach O(ε) or smaller separation; even at a separation of 10ε, [28,Fig. 8] shows a maximum-norm centerline velocity error of about 0.003, as ε → 0, in the flexible loop case.In Section 5.4 we explore the analogous breakdown for a resistance problem in the rigid loop case.The cause is failure of the inner SBT asymptotic expansion when there is another fiber within distance O(ε).This has led to various ad hoc methods to better handle the close-touching case, such as added lubrication forces [33,4] or blending to the centerline [29].However, the upshot is that in any given fixed-slenderness simulation, there is no way to vary a convergence parameter in order to assess the size of the errors.The errors induced by SBT in most practical settings are unknown.
We should note that SBT is one in a family of popular lower-order approximate models used in slender hydrodynamic interactions, such as local SBT [33,4], pairwise Rotne-Prage-Yamakawa tensors in Stokesian dynamics [34], regularized stokeslets (e.g.[35,10]), bead models (e.g.[36]), and short-range approximations.Such methods are reviewed in [15,1].A common difficulty is that the regularization scale, or bead diameter, must be of the same order as the radius ε for accuracy, making such models arbitrarily expensive as ε → 0.
Finally, via potential theory with the free-space Green's functions (stokeslet and/or stresslet) for the Stokes system, a Stokes BVP can be reformulated as a boundary integral equation (BIE) involving an unknown density on the surfaces of the bodies [37,12,13,38].There are various choices in formulation for the resistance problem (Stokes Dirichlet BVP) and for the mobility problem for rigid bodies; we discuss these below.The discretized BIE results in a dense linear system, but with many less unknowns (N) than for volume discretization to the same accuracy.An iterative solution for the density often converges rapidly if a well-conditioned formulation is chosen.Here, each matrix-vector multiplication may be accelerated by a Stokes FMM [39,40] or Ewald method [41], giving optimal O(N) or O(N log N) complexity.With the density found, solution evaluation can also benefit from such acceleration.BIE methods have had much success for bodies of moderate aspect ratio, including mobility simulations with nearly 10 5 spheres [42].Accurate and efficient quadrature schemes to discretize the weakly-singular integral operator is still an active research challenge, especially when surfaces approach each other or evaluation is needed near to a surface.The root cause is the 1/r singularity on-surface, and typically 1/r 2 singularity for surface-to-volume interactions.While we cannot review all such high-order quadratures, for 3D Stokes the major types are: Galerkin on triangle patches [13,43,44], and Nystr öm methods including partitions of unity [45,46,47], triangular patches [48], local weight corrections on regular grids [49,50], quadrature by expansion (QBX) [51,14], line extrapolation [52], and spectral schemes exploiting spherical harmonics for spheres [53,42] or (by grid rotations) smooth deformations of spheres [54].
At high aspect ratio (ε → 0), conventional BIE quadratures suffer: the requirement that the surface patches or grids retain a local O(1) aspect ratio forces the N per body needed for fixed accuracy to grow like ε −1 .This is growth is illustrated by the 2nd-order slender body tests of Keaveny-Shelley [55], the 16th-order torus tests of Bremer-Gimbutas [48,Tbl. 3], and the rounded rods of aspect ratio 10 of Bagge-Tornberg [56,Sec. 6.3] via QBX.A related difficulty is the need to replace any "completion flow"-a term traditionally applied using a stokeslet and rotlet at a single interior point [13]-with centerline sources whose number also grows like ε −1 [55] [56,Sec. 7.2.1].These issues motivated the present work.The method of fundamental solutions has also been used for Stokes flows [13,Ch. 7], but also suffers at least as much as BIE at high aspect ratio.

Contributions
We present a convergent Nystr öm discretization scheme (that we dub CSBQ for "convergent slender-body quadrature"), that is high-order accurate for the Stokes BIE over a wide range of aspect ratios ε −1 , and has constant cost in the limit ε → 0. It exploits the idea that the density, although highly anisotropic with respect to the surface metric, is smooth as a function of centerline (s) and angular (θ) coordinates.The centerline is discretized into panels, on which the density is interpolated from a tensor product grid in s and θ involving a small number (usually ∼ 10) of uniform angular nodes.To handle the weakly singular on-surface kernels, precomputed custom quadratures are used both for inner toroidal Green's function evaluations and for outer centerline integrals.This minimizes the number of auxiliary nodes needed, boosting efficiency.We show that, even at aspect ratio only 10, our set-up and solution times are 10× faster than a recent optimized more general torus solver (Section 5.1), and this ratio grows linearly with aspect ratio.By interpolation onto new target-dependent panels, we provide accurate velocity evaluations with a cost that is only logarithmic in distance.By choosing parameters adaptively, CSBQ also accurately handles close-to-touching fibers: we show 11-digit accuracy for a separation of ε/20.
Through our HPC implementation of these tools, we envision that accuracy may be controlled (via simple convergence studies) in rigid-fiber Stokes simulations, at a CPU cost comparable to that of a numerical SBT simulation.Towards this goal, we show that typically 6-digit accuracy is achieved with a quadrature set-up throughput of 20,000 unknowns per second per core, which is comparable to a few Stokes FMM calls at this accuracy.In settings where SBT is inaccurate, or has unknown accuracy, such as the common case of moderate aspect ratio (10-10 2 ) and/or close-to-touching fibers where lubrication forces play a role, CSBQ for the first time enables a reliable reference solution at an acceptable cost.
Our secondary contributions are: • We present and test a layer representation (an ε-dependent admixture of single-and double-layer) which leads to a O(1) condition number as ε → 0, for both Laplace and Stokes BVPs (Section 4).
• We prove that our new projected combined field integral equation for Stokes mobility Eq. ( 42) is uniquely solvable (Theorem 3).
• We use the presented tools to explore the breakdown of the SBT solution for resistance (drag) as a function of the slenderness ε and separation of two rigid tori; see Section 5.4.For this we solve the SBT inverse problem with high order accuracy.This complements a recent study [28,Sec. 5.3] in the flexible-fiber case.
• In mobility problem tests (sedimentation of up to 512 rigid loops), we combine our BVP solver with a high-order timestepper to accurately evolve the body dynamics, approaching the time of first numerical collision.We study parallel strong scaling for this problem.
We expect that CSBQ could also much accelerate the solution of the slender body BVP for the flexible fiber case, for which a high-order accurate, but non-accelerated, BIE quadrature was devised recently [28,App. B].
Remark 1.We develop and test the presented quadratures in the scalar Laplace as well as the Stokes case.The former can be directly applied to the electrostatics of thin wires and loops.The techniques would also easily adapt to the Helmholtz (and possibly Maxwell) equations with high aspect ratio geometries when the radius is subwavelength.These are crucial to the modeling of electromagnetic scattering or radiation from arbitrarily curved (or close-to-touching) wires or antennae, for which 1D integral equations are only exact in the isolated, straight, axisymmetric antenna case [57].

Limitations
CSBQ as presented is limited to the common case of bodies with circular cross-section, although their radius may smoothly vary along the centerline.This allows all θ-integrals to be evaluated rapidly via toroidal Green's functions, for which we precompute custom quadratures.We present and test only on closed fibers (loops); note that the open-fiber case has endpoint singularities that add an extra complication, unless a parabolic radius scaling is assumed as in SBT [21,22,24].In order to tackle singular geometries in classical Laplace and Stokes BVPs, we restrict to rigid body problems (resistance and mobility).Thus our work is not (directly) applicable to the recently-formulated slender-body BVP needed for flexible-fiber hydrodynamics [27].
Finally, we do not address the explicit prevention of collisions: we solve the mathematical Stokes BVPs and the resulting rigid-body dynamics until the bodies become too close to resolve (see Remark 5).

Organization of the paper
The following Section 2 defines the Stokes BVPs under study and their layer potentials.Section 3 presents the core quadrature techniques.Section 4 the new well-conditioned scaling for slender-body combined-field representations, for both Laplace and Stokes BVPs, and motivates them via extracting operator eigenvalues on the straight periodic fiber.Several different types of numerical tests are reported in Section 5, including complicated fibers, close-touching geometries, a study of the breakdown of SBT in a close-to-touching mobility setting, comparison with an existing solver, and the sedimentation in time of many rigid tori.We summarize and discuss open problems in Section 6.Three appendices organize the details of the proofs, generalized Chebyshev quadrature construction, and the numerical SBT inverse problem.In Table 1, we list some frequently used symbols for easy reference.
Symbol Description y(s, θ) surface parameterization along centerline and angular direction respectively x c (s) slender-body centerline coordinates ε(s) cross-sectional radius n(s, θ) outward unit surface normal

Problem setup and notation
Consider B distinct slender bodies Ω = B b=1 Ω b , where Ω b ⊂ R 3 .Each body Ω b is described by its centerline, restricted in this work to be a smooth closed curve (loop), plus a function ε giving the radius of the circular cross section at each point on the centerline.The bodies are suspended in a Stokesian (linear viscous) fluid where the constant dynamic viscosity has been nondimensionalized to 1.The fluid velocity u and pressure p in the exterior of Ω are governed by the Stokes equations, Here Eq. ( 1) and Eq. ( 2) denote the momentum balance and incompressibility constraint respectively.In addition, we also assume that the fluid velocity at infinity decays to zero, In this work we consider the Stokes problem with Dirichlet boundary conditions (as arises in the resistance setting where body motions are known), and the Stokes mobility problem (corresponding to solving the unknown rigid motions of bodies with specified forces and torques).The boundary conditions in each case are discussed below.

Stokes Dirichlet problem
The fluid phase R 3 \ Ω satisfies the Stokes equations Eqs.(1, 2) with decay boundary conditions at infinity Eq. ( 3), and specified fluid velocity on the geometry boundary ∂Ω, The velocity field u in the fluid phase R 3 \ Ω has a unique solution [37, p. 60-64] which is to be determined.A common application is to the resistance problem [13, §4.9]: one solves this BVP with u 0 equal to minus the value on ∂Ω of a given background flow.Then u is interpreted as the change from the background flow due to the presence of the bodies Ω with non-slip boundary conditions.

Stokes mobility problem
where v b is the unknown translational velocity and ω b is the unknown angular velocity of Ω b about the point x c b .A slip velocity boundary condition u s between the rigid bodies and the fluid is prescribed, so that We are given u s , and x c b , F b , and T b for each b = 1, . . ., B. The rigid body motion V (i.e., v b and ω b for each b = 1, . . ., B), and the flow u, are not known and must be determined.Such a solution exists and is unique, as shown via potential theory and Fredholm theory [13, §4.9] [58, Secs.2.5.1 and 4.6] [59].
Boundary integral equations Solutions to homogeneous linear constant coefficient elliptic PDEs can be represented by layer potentials, i.e., convolution of a Green's function for the PDE with a boundary density function.For the Stokes equations, Eqs.(1-3), the fluid velocity u can be represented as [37,13,38] where S(r) := 1 8π I |r| + rr T |r| 3 is the (tensor-valued) single-layer Stokes velocity kernel, σ is an unknown vector-valued boundary density function, and dS is the surface area element.By construction, u satisfies the PDE everywhere in R 3 \ Ω.We will also need the double-layer velocity potential where D(r; n) := − 3 4π (r • n) rr T |r| 5 is the (tensor-valued) double-layer velocity kernel, and n y is the outwardpointing unit surface normal at y ∈ ∂Ω.
Applying the boundary conditions to a representation as above (which may involve jump relations) results in an integral equation for σ, which may be discretized to give a linear system whose solution vector is σ at the set of surface nodes.To effectively use such integral equation methods, we need layer potential representations that lead to second-kind Fredholm integral equations, hence to well-conditioned linear systems in the discretized unknown boundary density.We also need efficient quadrature schemes to evaluate the singular integrals required for computing layer potentials.We discuss the algorithms for evaluating layer potentials for slender bodies in Section 3 and give boundary integral equation (BIE) formulations for the Stokes Dirichlet and mobility problems in Section 4.

Numerical algorithms
In Section 3.1, we describe the discretization of the slender body surface and then discuss the construction of boundary integral operators in Section 3.2.In Section 3.3, we give details of the parallel algorithms.

Surface discretization
The geometry of the b th slender body Ω b is shown in Fig. 1 (left); for notational simplicity we drop the dependence on b.Table 1 provides a summary of commonly used notations.The smooth closed centerline curve γ is parameterized by s ∈ [0, 1], so that γ = x c ([0, 1)).The circular cross-sectional radius is given by the smooth periodic function ε(s).In addition, we require an orientation vector e 1 (s) of unit length at each point on the centerline, orthogonal to the centerline.This orientation vector may be given or determined automatically as described below in Section 3.1.1.A second orientation unit vector e 2 is computed as, ds × e 1 .Points y on the slender body surface Γ = ∂Ω b are then given by, We discretize a slender body surface by constructing a piecewise polynomial approximation of x c , ε, and e 1 along the centerline to a given accuracy tolerance.This results in partitioning of [0, 1] into K intervals (or "panels") {I 1 , . . ., I K }, with corresponding partition of Γ into surface elements {Γ 1 , . . ., Γ K }.
θ .Smooth functions f (s, θ) on the slender body surface ∂Ω are represented to high-order accuracy by their point values at these nodes, f For instance, approximating a vector-valued density on the single body requires 3N nodes unknowns.A function can then be evaluated at any another parameter pair (s, θ) by using barycentric Lagrange interpolation from the nodes of the containing panel in s [60] and trigonometric interpolation in θ.By differentiating the interpolant, we can also compute surface gradients of these functions.

Constructing an orientation vector automatically
The orientation vector e 1 can be any smoothly varying vector defined at each point on the centerline and orthonormal to it.For a given slender body Ω b , we set e 1 at s = 0 to be any random vector orthonormalized with respect to the centerline tangent vector dx c /ds.Then, using this as the initial value, we solve the following ordinary differential equation (ODE) in s ∈ (0, 1), It can be shown that e 1 constructed in this way is orthonormal to the centerline everywhere.We solve for e 1 using spectral discretization of the ODE on the Gauss-Legendre nodes of each panel in s. (Note that the accuracy of the scheme is not crucial, only that the approximate solution be smooth.)The derivatives dx c /ds and d 2 x c /ds 2 are computed through numerical differentiation of the polynomial representation of x c on the piecewise centerline panels.Since the numerical solution and its piecewise polynomial representation along the centerline are not exact, each time we evaluate e 1 (s), we re-orthonormalize it with respect to the centerline.

Layer potential operators
We now describe the numerical computation of boundary integrals of the form in Eqs.(7,8).Our method is kernel independent, and therefore can be applied to most elliptic PDE kernels.In this work we demonstrate it for Laplace single-and double-layer potentials (S L and D L ), and Stokes single-and double-layer velocity potentials (S and D).The target point x can be either on-surface (such as when solving the boundary integral equation) or off-surface.For on-surface targets, the boundary integral is singular and we need special quadrature rules.For off-surface targets, when the targets are far away from the boundary, the integral is smooth and we can use standard quadratures; however, for targets close to the boundary the integrand is sharply peaked and using standard quadrature rules is not feasible.In the following sections, we describe how our algorithm handles each of these cases.

Nystr öm discretization with local corrections
We consider a single slender body surface Γ = ∂Ω b , discretized into K surface elements {Γ 1 , . . ., Γ K }, as described in Section 3.1.Also consider a surface density function σ discretized similarly with the function values σ Let G be a kernel function whose convolution with σ on Γ gives the desired potential.The potential at a target point x is given by summing the potential from each slender body element The kernel G(x, y) can be any linear combination of single-layer S(x − y) and double-layer D(x − y; n y ) kernels.The potential from each element Γ k is given by integrating along its interval I k in s and over θ ∈ [0, 2π).When the target x is sufficiently far from Γ k , the integrand is smooth and the integral can be computed numerically using the existing Gauss-Legendre quadrature rule in s and periodic trapezoidal rule in θ, where J = ∂y ∂s × ∂y ∂θ is the Jacobian of Eq. ( 9), and w ij are the quadrature weights.These weights are defined as w nodes total work.This can be accelerated through the use of a fast multipole method (FMM) [61,30] and computed in O(N nodes ) time.We use the PVFMM library [62, 63], which is an optimized distributed-memory implementation of the kernel independent FMM [64] and supports several elliptic PDE kernels.
The slender body surface is discretized into elements Γ k .For a given quadrature accuracy tolerance ϵ quad , there is a region N Γ k around each element such that for target points outside this region, standard quadrature rules (tensor product of Gauss-Legendre and periodic trapezoidal) can be used.For target points inside N Γ k we use special quadrature rules.We approximate the region N Γ k by a set of overlapping spheres centered at the surface discretization nodes.This simplifies the task of identifying the target points within N Γ k .
Error estimates for Gauss-Legendre quadrature are given in [65,Thm. 19.3], and for periodic trapezoidal quadrature in [66].These estimates depend on how far the integrand can be extended analytically into the complex parameter plane.In our application, this is related to the distance of the target point from the surface.Given an accuracy tolerance ϵ quad , the k th element quadrature in Eq. ( 12) is accurate for target points outside a region N Γ k around Γ k , as shown in Fig. 2. Such regions have been determined accurately for quadratures on curves in [31] and for quadratures on surfaces in [67].In our setting, when the radius ε is small, N Γ k takes the form of a prolate ellipsoid (a body of revolution generated by a Bernstein ellipse) distorted by the map x c (I k ) giving the centerline panel [31,Sec. 3.2].For efficiency, we approximate this N Γ k by a union of overlapping spheres centered at each surface discretization node.The radius of the spheres is chosen based on the accuracy tolerance ϵ quad , the dimensions of the element, and the orders of the Gauss-Legendre and periodic trapezoidal quadrature rules.By sorting the target points and these spheres on a space-filling curve we can efficiently determine the target points which lie within each neighborhood N Γ k for all elements Γ k in a collection of slender bodies.We describe this algorithm in more detail in Section 3.3.For all such pairs of elements and target points x ∈ N Γ k , we add a correction to Eq. ( 12), where are target-specific corrections to the discretized linear operator.Since the corrections are local, we expect only O(1) storage and evaluation cost per target point.The corrections have the effect of subtracting the incorrect first term above and adding the correct potential.The correct potential is computed using special target-specific quadrature rules described below.These quadratures can be relatively expensive to evaluate on-the-fly each iteration.Therefore, for every surface element Γ k we precompute the needed correction matrix elements its near field.The result may be kept in memory as a sparse N nodes × N nodes matrix, whose product with the current density vector is added to the result of the FMM.This makes subsequent applications of the discretized boundary integral operator faster.We refer to the computation of these corrections as the quadrature setup step, and the subsequent application of the integral operator as the evaluation step.
x k γ circ θ α 2α Figure 3: A circular source loop γ circ (in red) of unit radius, parameterized by θ with points on the loop given by y circ (θ).An annular sheet (in blue) with radius between α and 2α, and a target point x k on this annulus.We use the method described in Appendix B to build a generalized Chebyshev quadrature rule in θ that can integrate the potential at any target point on the annulus.

Fast modal Green's function evaluation
We consider the setup in Fig. 3.A circular source γ circ with unit radius is parameterized by the angle θ and points on this loop are given by y circ (θ).If the density functions are Fourier modes of the form e ilθ for l ∈ {0, 1, . . ., l 0 }, the potentials generated by this circular source are called the modal Green's functions g l and are given by the following integral, where x is the location of the target evaluation point.Such integrals must be computed to evaluate the integral in θ (the inner integral in Eq. ( 11)) when computing layer potentials from slender body elements.As discussed earlier, for a target point that is sufficiently far away from the source loop, the integral can be computed using the periodic trapezoidal rule.We now describe an efficient method for evaluating these integrals when the target is arbitrarily close to the source γ circ .In Fig. 3, we consider an annular region with radius between α and 2α, centered at y circ (0) and orthogonal to the circular source at θ = 0. We sample a uniform distribution (i.e., a polar grid uniform in angle and radius) of points x k for k = 1, . . ., k 0 on the annulus, and define integrand functions φ kl (θ) := e ilθ G(x k , y circ (θ)).We choose the number of points k 0 to be sufficiently large so that the integrand corresponding to a target point anywhere on the annulus can be represented as a linear combination of φ kl up to the desired accuracy tolerance.Typically, k 0 ≈ 600 is sufficient for 14-digits accuracy.Next, we build a panel based (composite) Gauss-Legendre quadrature rule in the interval [−π, π] with the panels refined dyadically around θ = 0 until the smallest panel size is commensurate with the dimensions of the annulus.The order of the Gauss-Legendre rule is chosen to be sufficiently high to integrate the product of any two integrands.We then follow the algorithm described in Appendix B to build a generalized Chebyshev quadrature rule with nodes m 0 } that can integrate all integrands φ kl .This quadrature rule can be used to approximate the integral in Eq. ( 14) for any target point x on the annulus and |l| ≤ l 0 .We construct such quadrature rules for all values of α ∈ {2 −1 , . . ., 2 −30 }.The number of quadrature nodes m 0 is found to be independent of the distance α, and is significantly smaller than what would be required for an adaptive panel based quadrature rule.For instance, for l 0 = 8 and quadrature accuracy tolerance ϵ quad = 10 −10 , we have m 0 ≈ 38 (for all distances α).
In our application, the kernel function G is scale invariant; therefore, these precomputed generalized Chebyshev quadratures can be applied to source loops of any radius when the annulus region is also appropriately scaled.Due to rotational symmetry, by rotating in θ by an appropriate angle, the quadrature can be applied to targets anywhere in the volume of revolution of the annulus.To compute the inner integral in Eq. ( 11), for a given s and a target x, we determine θ 0 corresponding to the closest point y(s, θ 0 ) to x.Based on the distance |y(s, θ 0 ) − x| relative to the loop radius ε(s), we pick the appropriate quadrature rule with nodes {θ Instead of computing the integral directly for the density σ, we compute it for each Fourier mode e −il(θ 0 +θ) , ĥl (x, s) := In practice we only need to evaluate ĥl for l ≥ 0, since ĥ−l = ĥ * l .The Fourier modes evaluated at the quadrature nodes (e −ilθ (α) m ) are precomputed and stored along with the quadrature rule to avoid expensive evaluation of complex exponentials each time.Then the length-N θ inverse discrete Fourier transform (DFT) of the vector of complex-valued coefficients ĥ(x, s) := { ĥ1−N θ /2 , . . ., ĥN θ /2 } gives the N θ real-valued weights of the discretized integral operator, h = F −1 ( ĥ).Writing out the inverse DFT explicitly, Since N θ is small (≤ 100), we build the operator matrix for the inverse DFT and apply it as a matrix-vector product (or matrix-matrix product when batched).The loop integral can then be approximated to high order for any smooth density σ using 2π 0 where θ j are the equispaced discretization nodes in [0, 2π).Here the values σ(s, θ j ) are interpolated from the known surface values σ (k) ij in the manner explained in the next section.

Near-singular quadrature
When evaluating the potential from a slender body element Γ k at a target point x that is near it (i.e.x ∈ N Γ k ) but off-surface, we use an adaptive panel based Gauss-Legendre quadrature rule to compute the integral in s along the length of the element, as shown in Fig. 4. The panels are refined adaptively near the target x and the order of the Gauss-Legendre rule is chosen to achieve the required quadrature accuracy ϵ quad .The x ada ptiv ely refi ned GL pan el qua dra ture integral in θ is computed as described in Section 3.2.2.The potential at x due to Γ k can be approximated as, where {s  13) are given by where w (k) ij are the surface element far-field quadrature weights that were used in Eq. ( 12).

Singular quadrature
When evaluating the potential from a slender body element Γ k at a target point x that is on-surface, for the Laplace and Stokes kernels, the integrand in s has a logarithmic singularity at the target parameter s 0 .We could compute this integral using panel based quadrature rule as shown in Fig. 5 (top).The panels are refined dyadically around the target until the smallest panel length is approximately ε(s).Gauss-Legendre quadrature rule is used for panels not touching x and a special singular quadrature rule is used for the panels on either side of x.We constructed this singular quadrature rule using the method described in Appendix B using integrands of the form p(s) log |s − s 0 | + q(s), where p(s) and q(s) are polynomials in s.However, using dyadically refined panel quadratures can become expensive when ε(s) is small, simply because the panels must refine down to the ε scale before the logarithmic singularity becomes dominant (note that at larger scales the singularity tends to the pointwise Green's function, thus is 1/r or 1/r 2 ).This has been realized in the electromagnetic setting [57].Instead, we precompute generalized Chebyshev quadrature rules that integrate the entire length of Γ k at once.For this we generate integrand functions using several (∼ 20) straight cylindrical slender body elements with different aspect ratios (the ratio of the length of the slender body element to its radius ε) in the range β to 2β.We did not try using elements with different curvature since the number of integrands was already very large; e.g. for the Stokes single-layer kernel we have 6 × 20N s N θ scalar integrand functions (6× is because the kernel is symmetric 3 × 3 tensor).We first use the dyadically refined panel quadratures to discretize these integrands, then use the method in Appendix B to construct a generalized Chebyshev quadrature with far fewer quadrature nodes.We precompute and store separate quadrature rules for each of the N s positions of the target within the panel (i.e. each Gauss-Legendre node in s, since the on-surface targets are just the surface discretization nodes) and for different slender element aspect ratios for β ∈ {2 −7 , 2 −6 , . . ., 2 30 }.For 10-digit accuracy and N θ = 10, typically m 0 is in the range 32 to 47.
To compute on-surface Nystr öm corrections E (k) ij (x) in Eq. ( 13), we select the appropriate quadrature rule for the aspect ratio of Γ k .Let {s sing 1 , . . ., s sing m 0 } and {w sing 1 , . . ., w sing m 0 } be the nodes and weights for the singular quadrature in s.As in the previous subsection, we build a Lagrange interpolation matrix P to interpolate from the s discretization nodes to the special quadrature nodes such that σ(s The Nystr öm corrections are given by where h j are again given by Eq. ( 17) and w (k) ij are the far-field quadrature weights that were used in Eq. ( 12).

Parallel algorithms
We support distributed memory parallelism using MPI.This requires partitioning the data across MPI processes, while also achieving good load balance across processes.The data consists of a global array of slender body elements that is partitioned across processes.We load balance for the quadrature setup stage by assigning a cost estimate to each element using the cost analysis for T setup in Section 3.4, and repartition this global array so that each local section has similar total cost.For the layer potential evaluation stage, the far-field computation (i.e. the N-body sum) is the dominant cost, and this has a different cost estimate than the quadrature setup.Therefore, for the evaluation stage, all the source points (the discretization nodes y (k) ij ) and the target points are repartitioned equally across processes; this is done within the PVFMM library.
The quadrature setup step is essentially a local operation for each slender body element.However, identifying the set of target points in the near region N Γ k of a slender body element Γ k requires communication since these points may be on different processes.As described previously in Section 3.2.1,we approximate N Γ k by a set of overlapping spheres centered at the surface discretization nodes y (k) ij .To identify the target points that overlap with these spheres requires an octree-like data structure to partition the space hierarchically and allow for searching in the neighborhood of a tree node.However, instead of using a standard tree data structure, we use space-filling curves with Morton ordering (also called a hashed octree [68]).It provides most of the same functionalities (such as O(log N) searching), while having several performance advantages due to better memory access patterns.The spheres and the target points are sorted in Morton order using a distributed sorting algorithm [69].Each tree node can be identified as a contiguous section of this Morton sorted array and these sections can be identified through binary searching in logarithmic time complexity.The partitioning of this array across processes also gives a partitioning of the domain across processes.The radius of a sphere determines its depth in the tree (according to d = ⌊log 2 r⌋ where r is the radius), and we search among the sphere's neighboring tree nodes at that depth to identify target points that overlap with the sphere.For spheres that have neighboring nodes outside of the domain of the current process, we send the spheres to the processes containing those neighboring nodes.After identifying the target points in each slender body element's near region, these points are sent to the process where the element originated.Duplicate target points must be removed since a point can overlap with more than one sphere of the same element.For each target and element pair (x, Γ k ) such that x ∈ N Γ k , the local Nystr öm corrections E (k) ij (x) can now be computed as described in Sections 3.2.3 and 3.2.4.We keep track of where each target point originated, since in the evaluation stage, the corrections from all near-elements of each target must be gathered and added to the final target potential.

Computational Cost
The quadrature setup step requires computing the local Nystr öm corrections for the singular and nearsingular interactions.The algorithm in Section 3.3 to identify the near interaction targets is dominated by the cost of the sorting algorithm and requires O(N/p log N) time, where N is the total number of unknowns and p is the number of processes.The number of near interactions is geometry dependent; however, we will assume that the total number of singular and near-singular interactions is proportional to the number of surface discretization nodes.We also assume that the number of generalized Chebyshev quadrature nodes in θ and s scale linearly with the discretization order N θ and N s respectively; this is justified by a heuristic geometric convergence with respect to each of these parameters.Then, Eqs.(16,17) to compute all h j .For each target x, Eq. ( 21) requires computing O(N s ) modal Green's functions and O(N s ) work (for each i, j index) to apply the interpolation operator.Therefore, building the singular corrections has If we assume a constant number (on average) of dyadically refined panels in s for near-singular interactions, then we arrive at the same asymptotic cost estimate for the near-singular interactions as well.If the discretization orders N θ and N s are assumed to not vary too much across all the elements, then the overall quadrature setup time is, The quadrature evaluation step is dominated by N-body sum (first term in Eq. ( 12)).When the sum is computed directly, the evaluation time is T eval = O N 2 /p .This can be accelerated to T eval = O(N/p) (neglecting communication costs) using the PVFMM library.However, due to extra overheads and lower parallel efficiency of FMM for small problem sizes, direct evaluation may be faster for problem sizes smaller than ∼ 30K points per CPU core on up to 40 cores.

Boundary integral equation formulations
Although our focus is (vector-valued) Stokes BVPs, it will be beneficial first to study the simpler (scalar) Laplace BVP.Thus in Section 4.1, for Laplace and Stokes Dirichlet boundary value problems, we present new indirect boundary integral equation (BIE) formulations which exhibit small condition numbers uniformly as the cross-sectional radius ε → 0. This allows us to solve these BIEs efficiently using iterative solvers such as GMRES.Then, using the analogous formulation, in Section 4.2, we develop a new BIE formulation for the Stokes mobility problem.Unlike existing formulations, it remains well-conditioned for slender-body geometries.

Dirichlet boundary value problems
We first consider the Laplace Dirichlet exterior BVP where u 0 ∈ C(∂Ω) is given surface voltage data, and the solution u may be interpreted as an electrostatic potential.The solution exists and is unique [70,Ch. 6].Recall the single-layer kernel G(x, y) = (4π|x − y|) −1 and double-layer kernel ∂G(x, y)/∂n y , which induce the layer representations S L and D L respectively [70].
We will use a "combined field" representation, where η > 0 is a fixed mixing parameter.Then, u satisfies Eqs.(23,24) by construction.Taking the limit as x → ∂Ω, invoking the jump relations, and substituting in Eq. ( 25), we get a second-kind boundary integral equation (BIE) in the unknown σ, namely where S L and D L are the weakly singular (hence compact) boundary integral operators given by restricting the single-and double-layer representations to ∂Ω; note that D L is taken in the principal value sense.We can solve this BIE for σ, and then use Eq. ( 26) to evaluate u in R 3 \ Ω.
A word is needed about the representation choice Eq. ( 26), which is not commonly used for Laplace.A plain single-layer representation u = S L [σ] would lead to a first-kind integral equation, whose discretization is thus ill-conditioned; yet this is sometimes used in 3D low-order settings.A plain double-layer (setting η = 0 above) cannot represent the O(1/r) term associated with non-zero net charge, and furthermore the resulting integral equation, while second-kind, is not uniquely solvable: its operator I/2 + D has a null space of dimension equal to the number of bodies comprising Ω.A common remedy to recover unique solvability is to add a rank-1 operator (per body) to I/2 + D [71, §38] [70, Thm.6.23].Using the representation Eq. ( 26) is an alternative remedy; its proof of unique solvability hinges on uniqueness for the interior Robin BVP when η > 0 (see [58, §4.2] for the trickier R 2 version).It is analogous to the popular combined-field Helmholtz [43, §3.9.4] or "completed" Stokes representations [72,73].In all known prior work η is chosen as an O(1) constant.
Even though Eq. ( 27) is second-kind, with fixed η ≈ 1 we observed that it becomes increasingly illconditioned for slender-body geometries as the fiber radius ε → 0. When solving the discretized BIE using GMRES, for extremely slender geometries, the result is failure to converge even after hundreds of iterations.However, we found that this can be remedied by a specific ε-dependent scaling of η, empirically resulting in uniformly small condition number as ε → 0.
Figure 6: The infinite cylinder of radius ε with a periodic boundary condition with respect to length.The surface of one period is parameterized by (s, θ) ∈ [0, 2π) 2 .For different values of ε, in Fig. 7 we plot the eigenvalues of the Laplace single-layer, double-layer, and the proposed new admixture boundary integral operators.
To understand these issues, we study the spectrum of the boundary integral operators in the subspace of periodic functions on the infinite cylinder (Fig. 6).The cylinder has radius ε, the periodicity is L period = 2π, and the surface is parameterized by s along the length and θ in angle.This is a simple model for a closed filament ignoring center-line curvature, also chosen for a recent slender-body BVP analysis [25].By Figure 7: Plot of the eigenvalues of the Laplace single-layer operator (S L ) on the first row, the exterior double-layer operator I/2 + D L on the second row, and the new admixture operator K L = I/2 + D L + S L /(2ε log ε −1 ) on the third row for infinite periodic cylinder (Fig. 6) with period length L period = 2π and different radii ε = 0.1, 0.01, and 0.001.The eigenvectors are of the form shown in Eq. ( 28), with index k (on the horizontal axis) corresponding to the Fourier mode in s.The condition numbers κ are for discretizations with 20 Fourier modes in both s and θ.For the new admixture formulation, the condition number of the discretized operator K L remains uniformly small as ε → 0. symmetry argument, the eigenfunctions for the Laplace single-and double-layer operators must be all complex exponentials of the form Let λ S kj and λ D kj be the corresponding eigenvalues of the on-surface single-layer operator S L and the on-surface exterior double-layer operator I/2 + D L respectively, i.e., S L f kj = λ S kj f kj , I/2 + D L f kj = λ D kj f kj .Pending an analytic study, we computed these eigenvalues numerically by applying the quadratures described in Section 3 to a single period, and using an approximately periodized Green's function (sufficient accuracy of a few digits was achieved here by naive summation of around 10 4 periodic image sources).The results are plotted in Fig. 7.The eigenvalue λ S 0 0 of the single-layer operator is unbounded (due to the log divergence of a line sum in 3D) and therefore not plotted.For computing the condition number of S L , we assume that the operator acts on the space of mean-zero functions and therefore λ S 0 0 can be neglected.The eigenvalue λ D 0 0 of the exterior double-layer operator I/2 + D L is always zero; therefore, we report the condition number of the latter after removing this null-space by adding a constant-kernel rank-1 update to the operator.Since both the single-and double-layer operators have the same eigenfunctions f kj , so does any linear combination of these operators; the corresponding eigenvalues are the same linear combination of λ S kj and λ D kj .For each k ∈ Z, k ̸ = 0, but staying in the limit |kε| ≪ 1 (s-wavelength much larger than the radius), we observed the eigenvalues λ S kj and λ D kj to be approximately given by Firstly, note that λ S kj → 0 as |j| → ∞, with the same form as the 2D single-layer operator on a radius-ε circle.The combined-field operator of Eq. ( 27) has the eigenvalues λ D kj + ηλ S kj .Since λ D kj ≈ 0.5 for j ̸ = 0, these eigenvalues remain safely bounded away from zero.However, the eigenvalues corresponding to j = 0 are approximately O ε log |kε| −1 and therefore, for fixed η, the condition number blows up as O 1/(ε log ε −1 ) as ε → 0.

Laplace slender combined field integral equation formulation
We propose a new slender-body admixture where the single-layer operator is scaled by η = 1/(2ε log ε −1 ).This choice fixes the problematic j = 0 eigenvalues so that λ D k 0 + ηλ S k 0 ≈ 0.5 for all k ̸ = 0 in Eq. ( 29), yet allows decay towards 0.5 for each other j ̸ = 0.In the exterior of Ω (which may comprise one or many bodies of similar slenderness), the solution u to the Eqs.(23)(24)(25) is thus represented as Taking the exterior limit to ∂Ω and applying the boundary conditions we get the BIE where K L := I/2 + D L + S L /(2ε log ε −1 ).Indeed, for the cylinder in Fig. 7 (last row), we show that the condition number of K L remains small as ε shrinks.This BIE can therefore be solved efficiently using GMRES.

Stokes slender combined field integral equation formulation
Combined field representations with mixing parameter η ≈ 1 are already used for the Stokes exterior Dirichlet BVP, Eqs.(1-4), for which their unique solvability has been proven [72,73].However, inspired by the above Laplace analysis, we propose a new admixture for the slender geometry case.In R 3 \ Ω, the solution u is represented as (recalling the velocity potentials Eqs.(7, 8)), Then, u satisfies the Stokes equations Eqs.(1-3) by construction.Taking the limit to ∂Ω and applying the boundary conditions in Eq. ( 4), we get the second-kind BIE In the top row, the surface geometry is a torus with major radius R = 1 and a fixed minor radius ε.In the bottom row, the minor radius varies from ε to 3ε.Geometries are shown on the right.The surface is discretized using eight 10th-order Chebyshev panels, each with Fourier order of 20.
Note that the varying-radius case demands N GMRES around 8 times larger than the constant-radius case.
where K := I/2 + D + S/(2ε log ε −1 ), and S and D are the (tensor-valued) boundary integral operators, D being taken in the principal value sense.This BIE can be solved efficiently using GMRES, and then Eq. ( 32) can be used to evaluate u in R 3 \ Ω.
To test this claim, in Fig. 8, we return to true toroidal (closed filament) geometries, and track the condition number κ of the operator and the number of GMRES iteration N GMRES required to solve the combined field integral equation for different values of the single-layer parameter η > 0. We clearly observe minima in both the condition number and the number of GMRES iterations around η = 1/(2ε log ε −1 ), matching our Laplace analysis.We show similar results for more complicated geometries in Section 5.2, for both Laplace and Stokes Dirichlet BVPs, where GMRES converged in a few tens of iterations, instead of the hundreds of iterations for the standard η = 1 admixture formulation.

Stokes mobility problem
We turn to the Stokes mobility problem stated in Section 2. We first present a standard "completed doublelayer" BIE formulation [13] with a slight variant of the completion flow.The fluid velocity is represented in terms of the Stokes double-and single-layer potentials as, where σ is an unknown vector density field on ∂Ω, and S[ν] is a "completion flow".Some form of completion flow is necessary because the double-layer potential by itself can only represent flows with zero net force and zero net torque on each closed surface.In the literature, completion flows using an interior point stokeslet and rotlet dominate [74,13,12,14], although in a slender body they would behave poorly since they would give surface data that is only smooth on the ε scale.The latter motivated the use of a line source completion flow [55].Instead, we propose a single-layer potential representation of the completion flow, with density ν given by the restriction to each surface of a certain rigid-body velocity field, Here α b ∈ R 3 and β b ∈ R 3 are chosen such that ∂Ω b ν(x) dS x = F b , the given force, and ∂Ω b ν(x) × (x − x c b ) dS x = T b , the given torque.This guarantees that the representation Eq. ( 34) has the required body forces and torques.For each body independently, such a pair (α b , β b ) is found by solving a simple 6 × 6 linear system whose entries are approximated using the surface quadrature used in the Nystr öm method.This form is convenient since it needs only existing BIE quadratures for S, and the completion flow data remains smooth with respect to s and θ as ε → 0.
By construction, u in Eq. ( 34) satisfies the Stokes equations Eqs.(1-3).Taking the limit of Eq. ( 34) to ∂Ω, applying the boundary conditions Eqs. (5,6), and rearranging the terms, we get, where both terms on the right are known (recall u s is the given slip velocity on ∂Ω).On the left, σ is an unknown density and V is an unknown rigid body motion of the form Eq. ( 5).For B rigid bodies, let V be the 6B dimensional space of all rigid body motions on ∂Ω b , b = 1, . . ., B (i.e., it contains functions of the form , and let {v 1 , . . ., v 6B } be an orthonormal basis of V in L 2 (∂Ω).Then (as in [59]), is the orthogonal projector onto the subspace of rigid body motions in L 2 (∂Ω).We will need the following well known fact that V is in the null space of the Stokes exterior double-layer operator [12,Ch. 16,Thm. 4]; we include a concise proof in Appendix A.
Therefore, the solution σ to Eq. ( 36) is not unique and is determined only up to a vector in V. Yet, we can use the extra degrees of freedom in σ and the fact that V ∈ V to represent V in terms of σ as, Substituting in Eq. (36) gives us the following BIE formulation, The boundary integral operator on the left side of Eq. ( 39) is Fredholm and invertible; this follows by Riesz-Fredholm theory [70, since it is the adjoint of the injective Fredholm operator arising in the interior traction mobility formulation [59,Lem. 6].Finally, after solving Eq. ( 39) for σ, we can recover the rigid body motion V from Eq. ( 38).An issue with this formulation is that, as shown in Fig. 7 for the Laplace case where κ(I/2 + D L + 11 T ) = O ε −2 / log ε −1 , the Stokes exterior double-layer operator I/2 + D is extremely ill-conditioned for slenderbody geometries.The cause-eigenvalue clustering around zero-is empirically the same.Therefore, solving the BIE formulation in Eq. ( 39) with GMRES requires unreasonably large numbers of iterations.

Stokes slender mobility combined field integral equation formulation We now present a new BIE formulation for the Stokes mobility problem which gives well-conditioned discretizations for slender-body geometries. The representation is
Notice that this combines two features: i) it replaces the double-layer in the above completed formulation by the same carefully scaled admixture of the single-and double-layer potentials as in Section 4.1, and ii) the density σ is first projected to the space orthogonal to V to remove any additional net force or net torque on any rigid body Ω k due to the new single-layer source.Taking the exterior limit of Eq. ( 40) to ∂Ω and applying the boundary condition in Eq. ( 6), we get, where, as before, K := I/2 + D + S/(2ε log ε −1 ) is the injective combined-field operator, and V is the unknown rigid body velocity.As before, since σ is determined only up to a vector in V, and V ∈ V, we can choose the representation for V in Eq. ( 38) to get the new BIE, The following shows that this proposed BIE is uniquely solvable.Its proof is in Appendix A; note that the sign of the admixture parameter η is crucial.

Theorem 3.
Let Ω ⊂ R 3 be the union of one or more smooth bounded bodies.Let L be the orthogonal projector onto the space of rigid-body motions on ∂Ω, as in Eq. (37).Let K := I/2 + D + ηS be the Stokes exterior combined-field integral operator with parameter η > 0 on ∂Ω.Then has a unique solution σ for any right-hand side f ∈ C(∂Ω) 3 .
After discretizing (42) and solving the linear system for σ, we can recover the rigid body motion using Eq.(38).We expect the left-hand side operator in Eq. ( 42) to have a small condition number, comparable to the condition number of operator K. Therefore, it should converge rapidly when solved using iterative solvers like GMRES, uniformly as ε → 0. This is confirmed through the numerical experiments presented next.

Numerical results
We present numerical results to validate our method and demonstrate its efficiency.We first compare the accuracy and computational cost of our method with a generic boundary integral code in Section 5.1.In Section 5.2, we solve Laplace and Stokes boundary value problems on a slender-body geometry and in Section 5.3, we show similar results for close-to-touching geometries.In Section 5.4, we compare with slender-body theory (SBT) to show when SBT is a good approximation to the true solution and when it fails to give accurate results.We solve the Stokes mobility problem in Section 5.5 and present parallel scalability results in Section 5.6.

Hardware
All numerical experiments were performed on the Skylake nodes of the Iron cluster at the Flatiron Institute.Each node has two 20-core Intel Xeon Gold 6148 CPUs running at 2.4GHz and 768GB of RAM.All experiments, except those in Section 5.6, were run on a single node (using up to 40 cores).
Software Our software was run on a Linux operating system and compiled using GCC-11.3.0, with the OpenMPI library version 4.0.7, and optimization flags "-O3 -march=native".We linked against the Intel MKL library version 2023.0.0 and the FFTW library version 3.3.10.Most code is in C++, makes use of the first author's SCTL library, and is publicly available at https://github.com/dmalhotra/CSBQ0.5 1.0 Table 2: Results for the Laplace Green's representation test for the geometry of Fig. 9.With increasing mesh refinement (N unknowns), we report the L ∞ -norm of the error in Eq. ( 44) for the slender-body quadrature and for the BIEST code.
For the same accuracy, the quadrature setup time T setup and evaluation time T eval are compared for both methods.

Comparison with a general boundary integral method
We first compare the accuracy and efficiency of our method for Laplace BVPs against BIEST [47,75], a general boundary integral code for smooth surfaces of genus one.BIEST uses a uniform doubly-periodic grid to discretize the toroidal parameterization, with a smooth blending to a polar coordinate transform to compute the singular integral.Unlike our method, BIEST does not require the cross-section to be circular; however, it cannot handle slender geometries efficiently.To do a fair comparison of the two methods, we choose the geometry shown in Fig. 9, which can be handled efficiently by both methods.We use the interior Green's representation theorem [70,Thm. 6.5] to validate the quadrature accuracy.It states that, for a harmonic function u on a bounded domain Ω ⊂ R 3 , recalling G(r) = (4π|r|) −1 , the free-space Green's function for Laplace's equation.Taking the limit as x → ∂Ω with x ∈ Ω and applying jump relations, we have We evaluate the RHS in Eq. ( 44) using the two boundary integral methods and compare it to the reference potential u on ∂Ω.The reference potential u (visualized in Fig. 9) is generated using a unit point charge at x 0 outside of but near to the domain Ω; i.e., u(x) = G(x − x 0 ).
In Table 2, we show convergence in L ∞ -norm with mesh refinement for both methods.We also report the quadrature setup time T setup and evaluation time T eval for each method.For the same accuracy, the setup for slender-body quadrature is up to 40× faster that for BIEST.The evaluation time for both methods is dominated by the far-field computation, which has O N 2 cost.The constant for the slender-body code is about half that of BIEST due to better vectorization.Also note that the slender-body method supports FMM acceleration and should scale as O(N) for larger N; however, the FMM is not advantageous until N is in the tens-of-thousands.3. Left: the solution (visualized on a planar cross-section) corresponds to the potential in the exterior of a thin conducting wire maintained at a constant unit potential.Right: pointwise error magnitudes compared to a reference solution computed to much higher accuracy.10.Here, K is the number of slender body elements, N θ max is the maximum discretization order of the elements in the angular direction, N is the number of unknowns in the boundary integral equation, and ϵ GMRES is the tolerance for the GMRES solve.The solution is computed using the Laplace combined field integral operator (K L = I/2 + D L + ηS L ), and we show results for different values of the mixing parameter η.We report the number of GMRES iterations N GMRES , the quadrature setup time T setup , the setup rate N/T setup and the BIE solve time T solve for the serial implementation on 1 core and in parallel on 40 cores.

Dirichlet boundary value problems on slender geometry
We demonstrate the performance of our method for the slender closed fiber in Fig. 10.Its centerline x c (s) coordinates are Fourier series with iid normal random sine and cosine coefficients over the frequency index range k = 1, . . ., 10, with decaying standard deviations 1/(1 + k/3).We choose a varying ε with typical aspect ratio (circumference to length) of order 10 3 , making it extremely challenging for prior boundary integral methods.We solve a Laplace Dirichlet boundary value problem in the exterior of the domain.The solution u satisfies Laplace's equation ∆u = 0 for all points x ∈ R 3 \ Ω, has boundary condition u| ∂Ω ≡ 1, and decay u(x) → 0 as |x| → ∞.This corresponds to an electrostatic problem where a thin conducting wire loop is maintained at a constant unit potential.We compute the numerical solution by solving the BIE formulation in Eq. ( 31), for an unknown density σ on the surface ∂Ω.Then, using Eq. ( 30), we evaluate the solution u on a uniform 3D grid of dimensions 100 × 100 × 100 enclosing Ω.We estimate the error by comparing it to a reference solution computed to much higher precision on a fine boundary mesh.Fig. 10 shows the potential u and the error in the computed solution on a planar cross-section.
In Table 3, we present results for different boundary mesh resolutions (number of unknowns N), GMRES tolerance (ϵ GMRES ), and mixing parameter η (in K L = I/2 + D L + ηS L ).In each case, we report the number of GMRES iterations in the linear solve (N GMRES ), and the maximum solution error on the grid (∥e∥ ∞ ) compared to the reference solution.We also report the timings for the quadrature setup stage (T setup ) and the solve time (T solve , excluding T setup ) on 1 core and 40 cores (using MPI parallelization) when evaluated directly ("no-FMM" case) and using FMM acceleration.
By varying the mixing factor η we determined the optimal value to be around η = 32; this is similar to the expected 1/(2ε log ε −1 ) ≈ 19 using the lower end 0.005 of the radius range for the geometry.As discussed in Section 4, η affects the condition number of the discretized linear system and therefore the number of GMRES iterations (N GMRES ) required.The top rows of the table shows that a poor choice (such as η = O(1)) not only takes three times longer to solve due to the iteration count, but also has noticeably worse accuracy.
By refining the surface mesh and reducing the GMRES tolerance, we observe convergence to about 12 digits in the L ∞ norm.For 8-digit accuracy, we get quadrature setup rates (N/T setup ) of 17K unknowns/s on 1 core, and 214K unknowns/s on 40 cores.The solve time (T solve ) is dominated by the cost of the far-field evaluation.With N, this cost scales as O N 2 when evaluated directly and as O(N) with FMM acceleration; however, due to the larger constants in the FMM cost, this benefit is not apparent until N is large.With our current implementation, when using the FMM, the single-layer and the double-layer operators have to be evaluated separately.With some modifications, it would be possible to use a single combined-field kernel function (as we already do in the "no-FMM" case).This would reduce T setup and T solve by a factor of two for the FMM accelerated case.4. Left: the streamlines for the flow around the geometry.The flow corresponds to a thin rigid wire dragged through a stationary viscous fluid.Right: pointwise error magnitude compared to a reference solution computed to much higher accuracy.
In Fig. 11 and Table 4, we present corresponding results for a Stokes Dirichlet (resistance) boundary value problem for this same fiber.The solution u satisfies the Stokes equations in the exterior of the wire loop: ∆u − ∇p = 0, ∇ • u = 0 for all points x ∈ R 3 \ Ω, with boundary conditions u| ∂Ω = u 0 where u 0 = (1, 1, 1) is constant, and |u(x)| → 0 as |x| → ∞.This corresponds to a rigid body dragged through a stationary  11.For different surface discretizations (K, N θ ), number of unknowns N, and GMRES tolerance ϵ GMRES , we report the number of iterations N GMRES , the quadrature setup time T setup , the setup rate N/T setup and the total BIE solve time T solve for the serial implementation on 1 core and in parallel on 40 cores.
viscous fluid.We solve the boundary integral equation and evaluate the velocity field u on a 100 × 100 × 100 grid as before.The L ∞ -norm errors on the grid (compared to a reference solution, computed to much higher accuracy) are reported.The quadrature setup time, setup rate and the BIE solve times are again reported for 1 core and 40 cores, with and without using FMM acceleration.For single precision accuracies (about 7 digits), we find setup rate of about 23K unknowns/s on one core and over 255K unknowns/s on 40 cores.

Close-to-touching interactions
In order to demonstrate that our slender BIE quadrature remains accurate even in the presence of close-totouching surfaces, we test the geometry of Fig. 12 with two rings (tori) of unit major radius, minor radius ε = 0.0625, and separation less than ε/20.Note that slender body theory would give meaningless answers in this case, since it already breaks down at separations of order ε (see next section)-here we are an order  12.The error ∥e∥ ∞ converges to about 10 digits as we reduce the GMRES tolerance ϵ GMRES and the quadrature accuracy tolerance ϵ quad .We also report the quadrature setup time T setup and the BIE solve time T solve for the serial case on 1 core and in parallel on 40 cores.
of magnitude closer yet.A true PDE solve is required, and it is difficult to imagine a method other than a boundary integral equation achieving the accuracy that we now exhibit.The mesh is refined adaptively with N = 65K unknowns and a maximum Fourier discretization order N θ max = 88.We solve the Stokes Dirichlet boundary value in the exterior of the rings with boundary conditions u| ∂Ω = u 0 where u 0 = (1, 1, 1).In Table 5, we report maximum error compared to a reference solution computed to much higher accuracy evaluated on a 100 × 100 × 100 grid as before.We observe convergence to about 11 digits in L ∞ -norm as we reduce the quadrature accuracy tolerance ϵ quad and the GMRES tolerance ϵ GMRES .
Most of the unknowns are in the close-touching region, due to panel adaptivity in s along the centerline, and in the angular discretization order N θ for each resulting slender element.The high resulting N θ values for many of the elements makes the quadrature setup relatively expensive, since it scales quadratically with N θ .For 6-digits of accuracy we get a setup rate of 3.9K unknowns/s on 1 core and 29K unknowns/s in parallel on 40 cores.These are about 10 times slower than for the previous (non-close) geometries tested.
Remark 4. The tori geometries and separation in Fig. 12 are similar to those in the recent BIE tests of [52, Sec.6.3] using "hedgehog" quadrature (not designed for slender bodies, nor parallelized), which reported 6 CPU hours at 5-digit accuracy.While a strict comparison is not meaningful, we note that our CSBQ 1-core solve time of 40 seconds at this accuracy is roughly three orders of magnitude faster.

Comparisons with slender body theory
Here we compare numerical solutions based on an SBT asymptotic approximation to converged solutions of the true Stokes Dirichlet BVP.Indeed, the present CSBQ method enables one to quantify errors in using SBT as a numerical tool for rigid body hydrodynamics, which has not yet received numerical study in the close-touching case.(See [55] for a comparison for a helical fiber without close-touching interactions, and [28] for a close-touching flexible loop).The only relevant rigorous error bound is the recent special case of a rigid straight periodic fiber, with regularized SBT kernel, for which a O ε 2 error bound in the L 2 -norm was given for H 2 -regular velocity data [25].
Following [28], we start with a simple case without close-to-touching issues: the axial mobility (sedimentation) of tori of varying ε, shown in Table 6.Due to symmetry, all functions of s are constant and the SBT formulation (see Appendix C; here Kf and ŝT f vanish) degenerates into a single scalar ratio between force and velocity.In Table 6 we show numerical results for rings (tori) of unit radius and varying cross-sectional radius ε, when a unit downward force (0, 0, −1) is applied to them.The true downward velocity (0, 0, −U exact ) of the rings is obtained from the drag coefficients (computed to at least 13 digits using a semi-analytic approach) reported in Table 2 of [28].We compare this with the velocity computed using SBT, and that using our boundary integral (CSBQ) method.The latter exhibits a relative accuracy of at least 12 digits for all values of ε, which serves as an independent validation of CSBQ.The error of SBT agrees extremely well with expectations, namely its leading asymptotic omitted term of O ε 2 log ε −1 [21].We turn to a more interesting case in Fig. 13: the accuracy of numerical SBT for close-touching rigid bodies with relative motion.There are two rings of unit radius, each with cross-sectional radius ε, separated by a minimum surface-to-surface distance δ.The first ring is held stationary while the second ring is translating with a given velocity, and we compute the total drag force on the first ring.Recall that SBT expresses velocity in terms of centerline forces, thus for such a resistance (e.g., rigid body drag) problem one must invert this to solve for centerline force given centerline velocities.This amounts to solving a 1D linear integral equation.Appendix C outlines our high-order Nystr öm method for this, which uses the classical (un-regularized) SBT kernel for self-interactions, and the correction of [21] for interactions between bodies.With CSBQ we resolve the solution to about 10 digits, and use it as the reference solution F BIE .We plot the relative error in the SBT-approximated force for different values of δ and ε.For separations δ ≫ ε (upper part of the plot), the error is well explained by O ε 2 log ε −1 as seen for the single ring in Table 6.For δ = 10ε (see dashed line) we see 2-3 digit accuracy, similar to that found by Mitchell et al. in a flexible case [28].However, the lower-left of the plot indicates about 10% error when δ = ε.Finally, the net force magnitude using SBT is wrong by greater than a factor of two when δ = ε/10.

Stokes mobility problem
In Fig. 14, we show the solution of a mobility problem with two sedimenting rings using the BIE formulation in Eq. (42).Each ring has a major radius of 0.45, and ε = 0.025.The rings are suspended in a Stokesian fluid and a unit downward force is applied to each ring at their center of mass.We use 5 th -order spectral deferred correction (SDC) [76] to evolve in time.We use adaptive time-stepping with an error tolerance of 1e-7 and use the method of [77] to determine the step size.The smallest and largest step sizes are 1.08 and 3.89 respectively.The minimum separation between the rings is 0.0014.In each time step, we adaptively refine (and coarsen) the geometry to a tolerance of 1e-8.In Fig. 15, we show the number of unknowns in the discretization at different points in time.We also show the number of GMRES iterations required to solve the mobility boundary integral equation with and without using a preconditioner, as follows.We precompute the exact (dense direct) inverse of the discretized boundary integral operator for one ring at the finest discretization in a reference orientation.Then, we use this precomputed matrix to construct a block diagonal preconditioner with two diagonal blocks by appropriately rotating and refining the mesh to the reference geometry and back.The reference geometry used to construct this preconditioner contains 44K unknowns.From the plots in Fig. 15, the preconditioner reduces the number of GMRES iterations by fewer than 10 iterations; it is not very effective.

Parallel scalability
We consider the sedimentation flow in Fig. 16 with 512 slender rings, each with aspect ratio about 20.The full simulation took about 2 days of wall-clock time on four nodes (160 cores); most of this time is devoted to the last 10% of simulated time when rings approach each other, N grows, and the time step ∆T drops.We present a strong scaling study in Fig. 17, showing a breakdown of the total CPU time for one time step at T = 6.6 with step size ∆T = 0.16.This one time step of the 5 th -order spectral deferred correction required

Conclusions
We have presented an efficient quadrature scheme (CSBQ) for second-kind boundary integral formulations of rigid 3D slender-body Stokes Dirichlet and mobility BVPs, whose cost is independent of the body aspect ratio ε −1 .We have shown that adaptivity along the centerline and in the angular discretization can achieve close to machine accuracy, even down to lubrication-dominated separations (< ε/10).This convergent scheme contrasts the commonly used asymptotic slender body theory (SBT), which is non-convergent at any ε, has uncontrolled errors, and has O(1) errors for distances of order ε or less.We include the first known study (enabled by CSBQ) of such SBT errors in the setting of close-to-touching rigid bodies.
We have strived for efficiency in implementation, including using precomputed generalized Chebyshev quadratures to accelerate modal and centerline quadratures.Our quadrature setup rate is thus about 20,000 unknowns/sec on a single core.We present and test newly-scaled combined-field formulations for Dirichlet (Laplace and Stokes) and Stokes mobility problems, with condition number bounded independent of ε → 0. We prove that our new (projected) mobility formulation has a unique solution.We expect these combined tools to give an efficient and trustworthy alternative to SBT in simulations of viscous rigid-fiber hydrodynamics.In an HPC distributed-memory parallel FMM-accelerated implementation we show its use when coupled with an iterative solver and high-order time-stepping for rigid multi-body sedimentation problems.

Remark 5 (Collision handling).
In this work we have chosen to present a high-order convergent scheme for what could be called the "mathematical" mobility problem, i.e., integrating velocities and angular velocities given by an accurate Stokes BVP solve, no matter how close the surfaces come.Yet, in 3D, smooth rigid bodies approach exponentially fast in the lubrication limit under constant forces [12, §7.1] [6].The decay time scales as the typical radius of curvature, which is O(ε), so is extremely short.Thus, once an approach starts, miniscule distances of, say, 10 −100 are rapidly reached in the mathematical solution.Of course, due to surface roughness, friction, and molecular effects [6], this model no longer matches any physical experiment.Numerical breakdown (as in Section 5.5) occurs well before this, certainly when machine precision fails to distinguish surface coordinates correctly.Thus, in practical solvers, explicit short-range force pairs are usually added to prevent rigid body collisions [42] (although not always for flexible fibers [29]).Choice of such ad hoc forces is application dependent and beyond the scope of this work.However, we believe that our work will allow correct hydrodynamics to be used with shorter-range collision-avoiding forces than for SBT-based schemes.
There are several other fruitful avenues to extend the presented techniques, including: • Open fibers.Here, the centerline panelization would need to be adjusted to respect possible endpoint singularities in s, when the radius does not have parabolic behavior at endpoints.In informal studies we have found no difficulties with ε-radius hemispherically rounded endpoints, and will report results at a later date.
• Non-circular cross-sections, as in [55,9].The same quadrature rules should work as long as the cross-section curve is not too irregular.In evaluating the θ-integral (Section 3.2.2),we would need to also evaluate the surface coordinates, normals and Jacobians at each quadrature node from a Fourier representation of the cross-section (as we do now for the surface density).
• Flexible fibers.We expect that the ideas presented could accelerate the completed single-layer formulation of the slender-body BVP of [28].or to a second-kind formulation of this (non-classical, angle-averaged) BVP.
• Nonuniform adaptivity in the angular direction.This could increase efficiency somewhat in the lubrication case of separations less than ε.Likewise, the use of generalized Gaussian [78] (as opposed to Chebyshev) quadratures could increase quadrature throughput.
• Extension to the alternative single-layer mobility formulation involving the interior traction BVP [79,80].We believe that the question is open: Does there exist a single-layer mobility formulation that remains well-conditioned for slender bodies as ε → 0 ?
Given the above m-node rule for [a, b], we construct the following m-by-k matrix to which compression will be applied: .
The scaling by the square-root of the quadrature weights means that the dot-product of any two columns of A is equal to the L 2 -inner product of the corresponding functions since b a φ i φ j dx = ∑ m k=1 φ i (x k )φ j (x k )w k .Given a tolerance ϵ, we construct an orthonormal basis for the columns of A, which can approximate any column of A to an accuracy of ϵ in the l 2 -norm.This can be done by computing a singular value decomposition (A = UΣV T ) then truncating the U matrix to the first n columns such that the singular values corresponding to the discarded columns are smaller than ϵ.(A rank-revealing QR decomposition A = U R may similarly be truncated.)This is equivalent to approximating an L 2 ([a, b])-orthonormal basis {u 1 , . . ., u n } for Span {φ 1 , . . ., φ k }, to an accuracy of ϵ, and then discretizing these basis functions using the above adaptive quadrature rule.Therefore the truncated m-by-n matrix is .
Finding stable quadrature nodes and weights We now build an n-point quadrature rule that integrates each of {u 1 , . . ., u n }.We first compute a column pivoted QR decomposition of U T .Each pivot column corresponds to a node in {x 1 , . . ., x m } and we take these n nodes to be our quadrature nodes {y 1 , . . ., y n } ⊂ {x 1 , . . ., x m }.
The quadrature weights ω j are obtained by solving the square "Vandermonde transpose" linear system, n ∑ j=1 u i (y j )ω j = b a u i dx, for all 1 ≤ i ≤ n.
The use of column pivoting insures system has a small condition number and that the quadrature weights can be stably computed [78,Thm. 3.2].In all of the above precomputations we use quad (128-bit real) precision, although double precision is often adequate for ϵ > 10 −10 .Note that further optimization to generalized Gaussian rules with less nodes is possible [78]; however, we favored the above since it is completely automatic and performed well enough.
Here the first term is a local drag term, whereas the second ε-independent term is nonlocal and defined for the self-interaction of the bth fiber (diagonal blocks) by where r := x b (s) − x b (s ′ ) is the displacement vector, and S is the Stokeslet velocity kernel defined beneath Eq. (7).The between-fiber interactions (off-diagonal blocks b ′ ̸ = b) are where the between-fiber displacement is r := x b (s) − x b ′ (s ′ ), and the so-called "doublet"1 kernel [13, (3.3.8)] is 1  2 ∆S(r) = − 1 8π ∇∇|r| −1 = 1 8π I |r| 3 − 3 rr T |r| 5 .To devise a high-order Nystr öm quadrature, the diagonal singularity of the self-interaction kernel must be understood.The two terms in Eq. ( 48) are each divergent like |s − s ′ | −1 , thus neither is integrable, but the periodized arc-distance function d(s, s ′ ) is such that their singularities cancel.It can be shown that for a smooth curve and force density the integrand is smooth apart from a discontinuity at s ′ = s, whose size is related to curvature and to df b /ds [32,Sec. 3.1].For example, for the unit circle in the xy-plane with force density f = (0, 0, f ), the z-component of Eq. ( 48) is 2π 0 ( f (s ′ ) − f (s))/|e is ′ − e is | ds ′ , whose integrand has a jump from − f ′ (s) to f ′ (s) at s ′ = s.
Force and velocity are discretized on the centerline using panel-based Gauss-Legendre quadrature.The centerline parameter is split into intervals {I 1 , . . ., I K }, whose union is [0, L).Let w  By setting s to each of the KN s nodes for centerline b, this defines elements of each offdiagonal block A bb ′ of the overall Nystr öm matrix A. Unlike in Section 3.2, we do not use near-singular corrections, so must push K high enough so that all panels are in each other's far fields.Blocks of self-interaction matrices A bb between different panels are filled similarly using plain Nystr öm quadrature for Eq. ( 48); note here that the 2nd term involving f b (s) subtracts only diagonal entries given by the row-sums of the Nystr öm matrix for (I + ŝ ŝT )/8πd(s, s ′ ).
A special rule is needed to handle the self-interation of each panel in Eq. (48).Two options are product quadratures [32] or auxiliary nodes; for simplicity we choose the latter.(Note that, unlike with algebraic singularities, no special rule is needed for neighboring panels.)Let the panel parameter interval I k be [a, b), and consider a target parameter s ∈ I k .Define auxiliary nodes as the union of N s GL nodes for (a, s) with N s nodes for (s, b), and define corresponding weights.This split at the target point handles the diagonal discontinuity.We interpolate from the original GL nodes on I k onto these auxiliary nodes using a barycentric Lagrange matrix P as in Section 3.2.3.Finally, the local drag term (first term in Eq. ( 47)) simply adds to the diagonal of the Nystr öm matrix.
We fill the 3N × 3N Nystr öm matrix A as above, where N is the total number of nodes on all fibers, thus get the discretization of Eq. (47) as where U , F ∈ R 3N are the vectors of velocities and force densities at all nodes.The resistance problem specifies U , thus we solve (50) for F (the SBT inverse problem [25]), then sum F using the quadrature weights to get the total force (drag) on each body.For convergence to 7 digit accuracy in the tests of Section 5.4, we found K ≥ max 50, 3.8/(δ + 2ε) panels sufficient, requiring up to K ≈ 400.A dense direct solution of (50) was then adequate for this task.This completes the high-order accurate SBT numerical solution.Clearly, many efficiency gains would be possible, but are beyond the needs of this paper.
Remark 6.The classical SBT self-interaction operator K bb : C(γ b ) → C(γ b ) in Eq. ( 48) is unbounded, with negative eigenvalues growing logarithmically in magnitude [22,23], causing the spectrum of the total SBT operator on the right side of Eq. (47) to pass close to zero.This may cause resonance, or even lack of invertibility, but is only numerically relevant when N ≳ ε −1 , i.e., for larger ε values.Our high-order Nystroöm discretization allowed convergence to around 7 digits even though resonances (oscillatory f (s)) were sometimes visible at the largest ε.Regularized versions (either by mollifying the kernel [24] or by deleting a small interval in s ′ about the target point [29]) have been proposed which avoid the resonance problem; yet, since they cause a change in the right side of Eq. (47) of O ε 2 , we do not expect them to give different conclusions in Section 5.4.A comparison of different approaches to regularizing SBT is beyond the scope of this paper.
The documented MATLAB implementation used is found in the SBT directory of the CSBQ repository discussed in Section 5.
The bodies Ω b , b = 1, . . ., B, are embedded in a viscous fluid satisfying the Stokes equations Eqs.(1, 2) with decay at infinity Eq. (3).The b th body has a net force F b and a net torque T b acting about a fiduciary point x c b ∈ R 3 .The bodies undergo rigid body motions with velocity V of the form

Figure 1 :
Figure 1: Left: A slender body is described by a centerline γ parameterized as x c (s), and cross-sectional radius ε(s).The vectors dx c /ds, e 1 (s) and e 2 (s) are orthogonal, with e 1 (s) setting the θ angular origin at each s.Right: The slender body surface Γ (= ∂Ω b for the b th body) is partitioned into surface elements Γ k , k = 1, . . ., K, each discretized by the tensor product of N (k) s Gauss-Legendre "panel" nodes in s with N (k) θ equispaced nodes in θ.
are the values at the surface discretization nodes, w (GL,k) i are the weights for the N (k) s -th order Gauss-Legendre quadrature rule on I k , and 2π/N (k) θ are the equal periodic trapezoidal quadrature rule weights.Evaluating Eq. (12) requires O(N nodes ) work per target point, where N nodes is the number of surface discretization points.When the number of targets is also O(N nodes ), this requires O N 2

Figure 4 :
Figure 4: A slender body element Γ k with a target point x close to it.To evaluate the potential at x, for the outer integral we use a panel based Gauss-Legendre quadrature rule in s.Panels are refined adaptively near the target x.

x
log sin gu lar ity |s − s 0 | −α special quadrature for p(s) log |s − s 0 | + q(s) dyad ically refin ed GL pane l quad ratur e x spe cial qua dra ture rule rep lace s sev eral GL pan els

Figure 5 :
Figure 5: Illustration of our special quadrature rule in s for an on-surface target point x = y(s 0 , θ).The s-integral is singular with the asymptotic forms shown.Top: This integral could be approximated using dyadically refined Gauss-Legendre panel quadrature, plus a special singular quadrature rule for the panels on either side of x.Bottom: For efficiency we replace such a panel quadrature rule by a single generalized Chebyshev quadrature rule with far fewer nodes.

Figure 8 :
Figure 8: Condition numbers κ(K) of the Stokes combined field integral operator K := I/2 + D + ηS (left), and the corresponding number of GMRES iterations for an exterior Stokes Dirichlet BVP solve to ϵ GMRES = 10 −8 (middle), for three slenderness choices, and different scaling parameters η > 0.In the top row, the surface geometry is a torus with major radius R = 1 and a fixed minor radius ε.In the bottom row, the minor radius varies from ε to 3ε.Geometries are shown on the right.The surface is discretized using eight 10th-order Chebyshev panels, each with Fourier order of 20.Note that the varying-radius case demands N GMRES around 8 times larger than the constant-radius case.

Figure 10 :
Figure 10: A closed fiber of length about 58, circular cross-sectional radius ε varying in the range [0.005, 0.015], hence aspect ratio ∼ 10 3 , used for the convergence study in Table3.Left: the solution (visualized on a planar cross-section) corresponds to the potential in the exterior of a thin conducting wire maintained at a constant unit potential.Right: pointwise error magnitudes compared to a reference solution computed to much higher accuracy.

Figure 11 :
Figure 11: Stokes BVP tests for the slender fiber geometry of Fig. 10, with convergence results in Table4.Left: the streamlines for the flow around the geometry.The flow corresponds to a thin rigid wire dragged through a stationary viscous fluid.Right: pointwise error magnitude compared to a reference solution computed to much higher accuracy.

Figure 12 :
Figure 12: Stokes flow around two tori with a narrow separation between them (see plot for parameters).Left: the surface discretization mesh used to solve the BIE for the Stokes Dirichlet BVP.The finest elements (see zoomed view) require N θ = 88 points in the angular direction to resolve the interactions in the gap.Right: streamlines showing the flow around the geometry.

log 10 ∥FFigure 13 :
Figure 13: Left: two tori, each with major radius 1 and minor radius ε, with surface separation δ.The left ring is centered at the origin and has axis (0, 0, 1), while the right one is centered at (2 + 2ε + δ, 0, 0) and has a twisted axis (0, − sin π/3, cos π/3).We compute the total drag force F on the left ring (held stationary) due to the right ring being translated with the velocity (0.8, −0.5, 1) shown by arrows.The colors denote the magnitude of the density σ in the boundary integral solution.Right: contour plot of the relative error in the mutual drag force computed using the slenderbody theory inverse problem (F SBT ), compared to the fully resolved solution (F BIE ) computed using the boundary integral formulation, for different values of ε and δ.Note the O(1) errors for δ ≤ ε.

Figure 14 :Figure 15 :
Figure 14: Two rings in a time-periodic sedimentation flow visualized at different points in time.At each instant, the velocity and angular velocity of each ring is computed by solving the Stokes mobility problem.The solution is resolved to 8-digit accuracy in spatial discretization, quadrature accuracy, and GMRES residual; we use a 5 th -order adaptive spectral deferred correction (SDC) scheme for time-stepping with a relative error tolerance of 10 −7 per unit time.

Figure 16 :
Figure16: Sedimentation flow with 512 rings, each with major radius 0.45 and ε = 0.025.The rings are visualized at times T = 0 (left), 6.6 (center), and 13.2 (right, just before two rings become exponentially close and cause breakdown).The solution is computed using 5 th -order spectral deferred correction (SDC) time-stepping, solved to 7 digits of accuracy in time and 8 digits of accuracy in spatial discretization.The color denotes the magnitude of the velocity on ∂Ω.In Fig.17, we show strong scaling results on 160 CPU cores for one time step of this flow.

K
bb [f b ](s) := L b 0 S(r)f b (s ′ ) − I + ŝ ŝT 8πd(s, s ′ ) f b (s) ds ′ , with d(s, s ′ ) = L π sin π L |s − s ′ | , i = 1, . . ., N s , for GL quadrature on I k .It was sufficient for our experiments to fix K equal-length panels for each body, and fix an order N s = 12.The nodes map to points y(k) b,i = x b (s (k) i ) with force density samples f (k)b,i and velocity samples u . Between-fiber blocks are discretized using plain Nystr öm quadrature, soK bb ′ [f b ′ ](s) ≈

Table 1 :
Index of frequently used symbols.
A surface element Γ k is approximated to polynomial order (number of nodes) N The total number of surface nodes for this body is thus N nodes } are the complete sets of nodes and weights of the adaptive panel quadrature rule in s, while h j are the components of Eq. (17).We use barycentric Lagrange interpolation in s to interpolate from the given N

Table 3 :
Results for a Laplace Dirichlet boundary value problem shown in Fig.

Table 4 :
Results for a Stokes Dirichlet boundary value problem shown in Fig.

Table 5 :
GMRES N GMRES ∥e∥ ∞ T setup (N/T setup ) T solve T setup T solve Convergence for a Stokes Dirichlet boundary value problem for close-to-touching tori as in Fig.
Figure17: Strong scaling results on 4 nodes (160 cores total) for one time step (at T = 6.6) of sedimentation flow in Fig.16with 1.65 million unknowns.We report the cumulative CPU time for different stages of the computation.T setup is the quadrature setup time for building the local quadrature corrections.The quadrature evaluation time T eval is composed of T near for applying the near corrections, and T FMM for the FMM computation.T other is the time for the remaining steps in the mobility solve.18solves of the Stokes mobility problem, each of which required on average 16.5 GMRES iterations.This corresponds to 18 quadrature setups and 297 quadrature evaluations each for the single-and double-layer operators.The computation is overwhelmingly dominated by FMM, comprising 70-83% of the total time.Scaling from 1 core to 160 cores (4 nodes), we get a 34× speedup, thus a parallel efficiency of 21.6%.Here, results on up to 40 cores are for a single node, and the rest are multi-node results using all 40 cores per node.We tuned the number of OpenMP threads and MPI processes per node for best performance.It is preferable to avoid threads accessing data on different NUMA nodes, so it is best to have at least one MPI process per NUMA node (same as a CPU socket in this case); however, too many MPI processes (e.g., pure MPI) makes the load imbalance worse.