Original article
Parallel collision detection of ellipsoids with applications in large scale multibody dynamics

https://doi.org/10.1016/j.matcom.2011.11.005Get rights and content

Abstract

This contribution describes a parallel approach for determining the collision state of a large collection of ellipsoids. Collision detection is required in granular dynamics simulation where it can combine with a differential variational inequality solver or discrete element method to approximate the time evolution of a collection of rigid bodies interacting through frictional contact. The approach proposed is structured on three levels. At the lowest level, the collision information associated with two colliding ellipsoids is obtained as the solution of a two-variable unconstrained optimization problem for which first and second order sensitivity information is derived analytically. Although this optimization approach suffices to resolve the collision problem between any two arbitrary ellipsoids, a less versatile but more efficient approach precedes it to gauge whether two ellipsoids are actually in contact and require the more costly optimization approach. This intermediate level draws on the analytical solution of a 3rd order polynomial obtained from the characteristic equation of two arbitrary ellipsoids. Finally, this intermediate level is invoked by the outer level only when a 3D spatial binning algorithm indicates that two ellipsoids share the same bin (box) and therefore could potentially collide. This multi-level approach is implemented in parallel and when executed on a ubiquitous Graphics Processing Unit (GPU) card scales linearly and yields a two orders of magnitude speedup over a similar algorithm executed on the Central Processing Unit (CPU). The GPU-based ellipsoid contact detection algorithm yields a 14-fold speedup over a CPU-based sphere contact detection algorithm implemented in the third party open source Bullet Physics Library (BPL). The proposed methodology provides the efficiency demanded by granular dynamics applications, which routinely handle scenarios with millions of collision events.

Introduction

Constraints in a multibody system can be classified as bilateral or unilateral. For holonomic systems, the former impose algebraic conditions that must be satisfied by the generalized coordinates used to capture the location and orientation of the rigid bodies in the dynamic system. They are associated with mechanical joints such as spherical, revolute, translational, and are discussed extensively in [13]. Unilateral constraints are associated with non-penetration conditions that arise in frictional contact dynamics when using, for instance, a differential variational inequality (DVI) approach to numerically determine the time evolution of the multibody system [1]. Contact detection is critical for imposing the non-penetration condition, and its efficiency affects the overall performance of the multibody dynamics simulation, which at each integration time step requires the resolution of a contact detection problem.

Collision detection is critical when simulating the dynamics of granular material. Most of these simulations consider the grains as being either polyhedral or spherical. Polyhedral elements are useful for rocks/boulders while spherical elements are typically used for fine grain material such as sand. The main drawback of granular flow modeling with spherical bodies is that grain rolling can dominate the flow mechanism, resulting in low shear resistance. The variable-curvature shape of ellipsoids alters the rolling modes of bodies, which is conjectured to be responsible for differences between granular flow measurements and simulation results. By changing the three radii of ellipsoids independently, a wide range of microstructures can be obtained resulting in different macroscopic granular flow behavior. The use of ellipsoids provides an opportunity to investigate the anisotropic fabric of soil deposits [23]. Additionally, the irregularity in the particle shape causes the mechanical interlocking of the particles, which tends to increase the fluidized bed voidages and minimum fluidization velocities [9], [19], [42]. In spite of the special features associated with the 3D granular flow of ellipsoidal bodies, spherical geometries have usually been preferred for the simulation of large numbers of 3D bodies due to the low computational burden incurred. The current contribution aims at addressing this issue by providing an efficient ellipsoid contact detection algorithm amenable to realistic simulations of large systems of granular material.

This paper is organized as follows. The next subsection provides a brief review of relevant literature. The subsequent three sections concentrate on the ellipsoid collision detection algorithm and its parallelization. Section 5 presents several numerical results that validate the algorithm and illustrate its scalability on the GPU. The last section summarizes findings and directions of future work.

Several types of ellipsoid geometry have been considered to date in collision detection tasks: classical ellipsoids [4], [5], [6], four-arc ellipsoid approximations [17], [20], [30], [36], and super-ellipsoids [4], [7], [38], [40]. In spite of their rather simple geometric representation, the salient attribute of the ellipsoid collision detection problem is its lack of an analytical solution. Typically, the numerical methods used rely on optimization algorithms or the solution of a series of systems of nonlinear equations. A common approach sets out by searching for the nearest pair of points P1 and P2 which satisfies the constraint of parallelism for the normal vectors at these points. The approach was originally posed as a six-variable constrained optimization problem [23] and subsequently extended to super-ellipsoids as an unconstrained optimization problem [38]. Improved performance can be achieved by approximating, when acceptable, the ellipsoids with simpler geometries. For instance, in the four arc ellipsoid approximation, the actual ellipsoid is replaced with a geometry obtained by revolving an ellipse approximated with four circular arcs [17], [20], [30], [36]. This approach results only in axisymmetric objects. In a similar fashion, ellipsoids were represented with clusters of spheres to simulate the flow of soybeans in [43]. Another ellipsoid contact detection approach draws on the mathematical concept of geometric potential whose minimization leads to the desired contact points [23], [33], [34]. In this method the geometric parallelism of the normal vectors at the contact point is only approximately fulfilled, and the approximation improves as the amount of penetration decreases. The penetration volume, contact area, and its shape are not calculated in this approach, which precludes its use in several classes of frictional contact dynamics problems that require these quantities. The geometric potential approach is implemented using a discrete function representation (DFR) in [16], [39]. In DFR, each body's surface is discretized by a number of points subsequently used to check for their inclusion in a potentially colliding body. Essentially, this method minimizes the geometric potential by direct search leading to an inefficient approach that is nonetheless attractive for contact detection of complicated surfaces. Finally, in [29] the contact normal is found by determining the intersection of an elliptic curve on the surface of one ellipsoid with the surface of the other ellipsoid and vice versa. Hence, the boundary of the region containing the contact point is determined. The location of the contact point is found by iterating this procedure. The resulting algorithm requires the solution of a system of nonlinear equations.

The most commonly used broad phase method for collision detection of ellipsoids is based on the contact of bounding spheres; i.e., spheres that completely enclose the space around an object [5]. Bounding boxes or rectangular swept sphere (RSS) volumes can also be used for the contact test [21]. RSS corresponds to a volume covered by a sphere whose center is swept across the surface of a 3D rectangle. As elaborated upon later, the proximity query used herein employs the algebraic criteria proposed specifically for ellipsoids [37]. These criteria are based on the sign of the roots of a fourth order polynomial whose coefficients are computed from the ellipsoids’ radii, positions, and orientations. This method is implemented in [6] and has demonstrated good efficiency in continuous collision detection of ellipsoids.

Since the collision detection procedure pursued herein is intended to be used in large granular dynamics problems, approaches that leverage parallel computing are extremely desirable. This disqualifies several traditional methods, bringing to the forefront approaches which may be considered suboptimal in a sequential execution framework. One of the most widely used collision detection algorithms, discussed in [2], [3], is a method known as “Sort and Sweep”. This approach leverages the properties of Axis Aligned Bounding Boxes (AABBs) to detect collisions between objects of varying size. The basic premise of this algorithm is that if two objects are colliding, their projections/shadows will overlap on all three principal axes. Bounding boxes are generated for all objects and projected onto the global X, Y, and Z-axes. Each axis is then sorted by position and traversed in linear time. A stack-like data structure is used to keep track of collision pairs; a collision will be recorded only if two bodies overlap on all three axes. This approach is simple to implement and is ideally suited for execution on the CPU. At most three parallel threads are used to sort and traverse the axes sequentially and only one thread can be used to compare the three lists to detect collisions, thus limiting its uses on parallel architectures. One salient feature of this algorithm is its ability to take advantage of temporal coherence. Since objects in a dynamics simulation rarely move large distances in one time step (the positions are temporally coherent), updates can be made to the sorted axes easily and with relatively little cost.

Parallel collision detection algorithms became popular with the onset of easily programmable GPUs. Many algorithms that use GPUs rely on shaders to do computations. Shaders are sets of instructions designed to perform calculations on polygons and textures, but they can also be used to perform basic computations. Several collision detection algorithms have been developed using shaders, such as CULLIDE [12], R-CULLIDE [11], and Q-CULLIDE [10]. These algorithms could handle contacts between tens of thousands of polygons at near real-time speed and were faster than their CPU counterparts.

A parallel spatial subdivision algorithm for collision detection of spheres of equal size was introduced in [22]. It relies on the property that a sphere in a given bin in the spatially subdivided space can only interact with spheres in the 26 surrounding bins in a 3 × 3 × 3 grid. The bins are also sized larger than the spheres used. The algorithm presented in [22] was implemented on NVIDIA hardware using the CUDA Software Development Kit [27]. One unique feature, discussed at great length in [22] is the radix sort algorithm. Most collision detection algorithms require a fast sorting algorithm to arrange data into a usable form. Commonly used sorting methods such as quick sort, while fast on CPUs, are difficult to parallelize. Radix sort is not hindered by this drawback, as it is able to sort key-value pairs in parallel extremely efficiently. Additional work in [31] was done to improve the performance and scalability of the sorting algorithm, which was used in the GPU-based spatial subdivision method for collision detection proposed in [25]. The approach therein was designed to only determine sphere-to-sphere contacts and calculate the information needed by a Cone Complementarity Problem (CCP) [32] solver in a multibody dynamics simulation. The approach was successfully tested for contact detection of problems with more than 5 billion spheres and validated on smaller problems against the Bullet Physics Library (BPL) [8]. The ability to handle multi-billion body problems in [25] hinged on the use of GPU-computing. Since the ellipsoid collision detection algorithm proposed herein is designed for parallel execution on the GPU, this hardware platform and its relevance to scientific computing is briefly discussed next.

The GPU was originally designed for intense graphics processing, primarily encountered in video games, which require a dedicated hardware device capable of rendering, at high rates, hundreds of thousands of polygons at every frame. The computations performed are relatively simple and when enough fine grain parallelism can be exposed in an application, the GPU is capable of reaching extremely high FLoating-point OPeration (FLOP) rates. Specifically, the NVIDIA Tesla C2050 has a peak FLOP rate of 1.03 Teraflops in single precision and 0.515 Teraflops in double precision [28]. Comparatively, Intel core i7-965 CPU reaches 51.2 Gigaflops in single precision and 25.6 Gigaflops in double precision [18]. The GPU consists of several Streaming Multiprocessors (SMs). On an NVIDIA Tesla C1060 GPU, currently one of the most widely used GPU cards in scientific computing, each SM is made up of eight cores on which individual threads are processed in parallel. These cores are called Scalar Processors (SPs). Groups of two to three SMs are organized within a Texture Processing Cluster (TPC). The Tesla C1060 GPU contains 240 SP cores organized in 30 SMs grouped into 10 TPCs. This setup can manage simultaneously up to 30 720 parallel threads which the SPs swap between for execution with extremely low overhead due to hardware implemented support [24]. Threads on the GPU can use registers, which, like on the CPU, incur very little overhead. Generally, accessing a register is zero extra clock cycles per instruction, but delays may occur due to register read-after-write conflicts. On the GPU, each multiprocessor, along with having access to global/main memory (4 GB on Tesla C1060), has access to three other types of memory. The constant memory has extremely fast read times for cached data and is ideal for values that do not change during a kernel (function) call. Next, texture memory specializes in fetching (reading) of two dimensional textures; fetches are cached, increasing bandwidth if data is read from the same area. Lastly, shared memory is a smaller amount of extremely fast memory that is unique for each SM and shared by its SPs, unlike texture and constant memory that are global. This deep memory architecture was used to implement the ellipsoid collision detection algorithm described next.

Section snippets

Ellipsoid contact detection (the inner level)

The inner level of the proposed contact detection algorithm relies on the unconstrained optimization problem in [38]. For an efficient solution of this problem, the analytical expression of the Jacobian and Hessian are derived and later used in the minimization process. To this end, the positions (expressed in the global reference frame) of the contact points on two colliding ellipsoids are defined as explicit functions of an arbitrary unit vector. As a result, the parallelism constraint of the

Determining the contact state of two ellipsoids (the intermediate level)

At the inner level of the proposed methodology, the optimization-based contact detection approach outlined in the previous section and called herein the “Optimization-based Method (OpM)”, is computationally expensive. However, it provides all the information needed in a dynamics simulation engine to characterize the frictional contact forces associated with the contact of the two ellipsoids. Note that OpM can be used to gauge the contact state of any two ellipsoids. As illustrated shortly in

Parallel contact detection

The parallel contact detection approach is implemented on the GPU and draws on a spatial subdivision idea [25] modified to accommodate the specifics of the problem at hand. The strategy is to divide the space into bins and perform an exhaustive collision detection per bin that checks all pairs of bodies that intersect the considered bin. This exposes a level of parallelism that is called herein the “outer level”, and which can be controlled through the bin size. An optimal choice of the bin

Algorithm validation: comparison to state of the art

The following subsections briefly describe the validation process for ChM, OpM, and the overall algorithm, which is validated against BPL [8]. BPL is also used to gauge the performance of the parallel CUDA-based [27] ellipsoid collision detection implementation.

Conclusions

Three ideas combine in this paper to lead to a multi-level parallel ellipsoid collision detection algorithm. The inner level of this algorithm solves an optimization problem for which first and second order analytical sensitivity information is used to produce collision state information (normal vector, penetration, contact points, etc.). At an intermediate level, a new approach based on the characteristic equation associated with the relative position and orientation of two ellipsoids takes

Acknowledgments

Financial support for this project was provided by a National Science Foundation grant NSF-CMMI-0840442 and Function Bay, Inc. The authors would like to thank Naresh Khude, Toby Heyn, and Andrew Seidl for feedback provided on an earlier version of this document.

References (43)

  • N. Chakraborty et al.

    Proximity queries between convex objects: an interior point approach for implicit surfaces

    IEEE Transactions on Robotics

    (2008)
  • Y. Choi et al.

    Continuous collision detection for ellipsoids

    IEEE Transactions on Visualization and Computer Graphics

    (2009)
  • P.W. Cleary, N. Stokes, J. Hurley, Efficient collision detection for three dimensional super-ellipsoidal particles, in:...
  • C. Erwin, Physics Simulation Forum, http://www.bulletphysics.com/Bullet/wordpress/, January 15,...
  • N. Govindaraju et al.

    Quick-cullide. Fast inter-and intra-object collision culling using graphics hardware

    Proceedings VR 2005

    IEEE Virtual Reality

    (2005)
  • N. Govindaraju et al.

    Fast and reliable collision culling using graphics hardware

    IEEE Transactions on Visualization and Computer Graphics

    (2006)
  • N. Govindaraju et al.

    CULLIDE. Interactive collision detection between complex models in large environments using graphics hardware

  • E. Haug

    Computer Aided Kinematics and Dynamics of Mechanical Systems

    (1989)
  • T. Heyn, Simulation of Tracked Vehicles on Granular Terrain Leveraging GPU Computing, M.Sc. Thesis, University of...
  • J. Hoberock, N. Bell, Thrust, C++ Template Library for CUDA, http://code.google.com/p/thrust/, July 1,...
  • C. Hogue

    Shape representation and contact detection for discrete element simulations of arbitrary geometries

    Engineering Computations

    (1998)
  • Cited by (21)

    • Contact detection between a small ellipsoid and another quadric

      2022, Computer Aided Geometric Design
      Citation Excerpt :

      Moreover, the three degrees of freedom provided by its three axes allow to approximate many different objects. This justifies that an important part of the literature in this field involves ellipsoids (see, for instance, Jia et al. (2011); Pazouki et al. (2012); Wang et al. (2004, 2001) or Caravantes et al. (2022); Jia et al. (2020) for more recent progress on the treatment of ellipsoids). However, the ellipsoid has positive curvature and the shape of other objects requires the use of other quadric surfaces, for example, hyperboloids with negative curvature.

    • Modeling granular material dynamics and its two-way coupling with moving solid bodies using a continuum representation and the SPH method

      2021, Computer Methods in Applied Mechanics and Engineering
      Citation Excerpt :

      Hence, a fully resolved simulation of a practical granular material flow problem in conjunction with DEM will typically lead to very large degrees of freedom (DOF) counts, which poses both computational and storage challenges. Furthermore, the non-homogeneity of the grains in real-world problems poses another difficulty in the contact model and the contact-detection stage of the DEM, see, for instance, [26]. Continuum models of granular material flow have gained popularity over the last two decades as they have proved capable to address scale limitations in the DEM approach.

    • A benchmark study on accuracy-controlled distance calculation between superellipsoid and superovoid contact geometries

      2017, Mechanism and Machine Theory
      Citation Excerpt :

      An iterative application of the method to find the minimum distance between two ellipsoids is presented, and achieves an error less than 10−6 in 11 iterations, but the method was not benchmarked or compared with other algorithms. Pazouki et al. 2012 [12] propose an efficient and parallelizable method based on ellipsoids for simulating large scale multibody systems, instead of the traditional spheres. The algorithm is divided in three levels.

    • Improving the Scalability of GPU Synchronization Primitives

      2023, IEEE Transactions on Parallel and Distributed Systems
    View all citing articles on Scopus
    View full text