Generalizing the Convex Hull of a Sample: The R Package alphahull

This vignette presents the R package alphahull which implements the α -convex hull and the α -shape of a ﬁnite set of points in the plane. These geometric structures provide an informative overview of the shape and properties of the point set. Unlike the convex hull, the α -convex hull and the α -shape are able to reconstruct non-convex sets. This ﬂexibility make them specially useful in set estimation. Since the implementation is based on the intimate relation of theses constructs with Delaunay triangulations, the R package alphahull also includes functions to compute Voronoi and Delaunay tesselations.


Introduction
The problem of reconstructing a set S from a finite set of points taken into it has been addressed in different fields of research. In computational geometry, for instance, the efficient construction of convex hulls for finite sets of points has important applications in pattern recognition, cluster analysis and image processing, among others. We refer the reader to Preparata and Shamos (1985) for an introduction to computational geometry and its applications. From a probabilistic point of view, the set of points from which we try to reconstruct S is assumed to be non-deterministic. Thus, the term set estimation refers to the statistical problem of estimating an unknown set given a random sample of points whose distribution is closely related to it. Under this perspective, the target S might be, for example, a distribution support, its boundary or a level set. See Cuevas and Fraiman (2009) for a survey of set estimation theory.
The support estimation problem is formally established as the problem of estimating the support of an absolutely continuous probability measure from independent observations drawn from it. The papers by Geffroy (1964), Rényi and Sulanke (1963), and Rényi and Sulanke (1964) are the first works on support estimation. They deal with the convex case. When S is a convex support the natural estimator is the convex hull of the sample. See Schneider (1988) for classical results on the convex hull estimator. However, when S is not convex, the convex hull of the sample is not an appropriate choice. In a more flexible framework, Chevalier (1976) and Devroye and Wise (1980) proposed to estimate the support (without any shape restriction) of an unknown probability measure by means of a smoothed version of the sample. The problem of support estimation is introduced by Devroye and Wise (1980) in connection with a practical application, the detection of abnormal behaviour of a system, plant or machine. We refer to Korostelëv and Tsybakov (1993) for a compilation of the most relevant theoretical results on the performance of such estimator. Anyhow, there are also approaches in-between the two aforementioned. In Rodríguez-Casal (2007), the estimation of an α-convex support is considered. The α-convexity is a condition that affects the shape of the set of interest but which is much more flexible than convexity and therefore, it allows a wider range of applications. The α-convex hull of the sample is the natural estimator when S is α-convex. In this work we discuss the details on the implementation of this estimator.
Set estimation is also related to another interesting problem, the estimation of certain geometric characteristics of the set S such as the surface area (boundary length in R 2 ). There are other statistical fields which also cope with problems regarding set measurements as, for example, the stereology. However, stereology focuses on the estimation of certain geometric characteristics of S without needing to reconstruct the set, see, e.g., Baddeley and Jensen (2005), Cruz-Orive (2001/02), whereas the primary object of interest of set estimation is the set itself. So in our framework, given a random sample of points in S, the solution to the surface area estimation problem consists in defining an estimator that captures the shape of S and then estimating the surface area of S by means of the surface area of the estimator. Bräker and Hsing (1998) studied the asymptotic properties of the boundary length of the convex hull of a random sample of points in R 2 . They obtained the asymptotic normality of the boundary length of the convex hull estimator as well as its convergence rate in mean. In spite of the fact that the results are really significant, they are established on the assumption that the set of interest is convex, which may be too restrictive in practice. As mentioned before, more flexible estimators, such as the α-convex hull, can be considered. The α-shape, introduced by Edelsbrunner, Kirkpatrick, and Seidel (1983), is another geometrical structure that serves to characterize the shape of a set. Its definition is based on a generalization of the convex hull. This vignette presents the R implementation (R Core Team (2022)) of the α-convex hull and α-shape of a random sample of points in the plane with the package alphahull, see Pateiro-López and Rodríguez-Casal (2010).
The document is organized as follows. In Section 2 we introduce some notation and describe the primary estimators under study, the α-convex hull and the α-shape of a random sample of points taken in the set of interest. The details on the implementation in R of the estimators are given in Section 3, along with the comprehensive description of the library alphahull. Section 4 concludes with a discussion of the contributions included in this document.

The α-convex hull
Let S be a nonempty compact subset of the d-dimensional Euclidean space R d , equipped with the norm ∥·∥. Assume that we are given a random sample X n = {X 1 , . . . , X n } from X, where X denotes a random variable in R d with distribution P X and support S. The problem is to find a suitable estimator of S based on the sample. It seems natural that, in order to properly define such estimator, we should impose some geometric restriction on S. As we have already commented, assuming convexity considerably limits the family of sets to estimate. So we focus on a more flexible shape condition, named α-convexity.
We denote byB(x, r) and B(x, r) the open and closed ball with center x and radius r, respectively. Given A ¢ R d , A c and ∂A will denote the complement and boundary of A, respectively.
is called the α-convex hull of A. Therefore, if A is α-convex any point of A c can be separated from A by means of an open ball of radius α. Note that the definition of the α-convex hull given by (1) reminds us of the definition of the convex hull, but replacing the open balls of radius α with half-spaces. Regarding the relation between convexity and α-convexity, it can be proved that, if A is convex and closed, then it is also α-convex for all α > 0. If the interior of the convex hull is not empty then the reciprocal is also true, see Walther (1999).
Using the same ideas as in the convex case, an estimator for an α-convex support can be defined. Assume that S is α-convex for some α > 0. Then it seems reasonable to consider an estimator fulfilling this shape restriction. So, given a sample X n ¢ S, the natural estimator of S is the smallest α-convex set which contains X n , that is, the α-convex hull of the sample, C α (X n ), see Figure 1. What can we say about the performance of C α (X n ) as a support estimator? Like in other contexts, in order to evaluate a set estimator, we need certain measure of the distance between the estimator and the target S. Thus, the performance of a set estimator is usually evaluated through either the Hausdorff distance, d H , or the distance in measure, d µ , where µ stands for the Lebesgue measure. We refer to Edgar (1990) for a discussion on metrics between sets. Rodríguez-Casal (2007) proved that, if S is α-convex and standard with respect to P X , then d H (S, C α (X n )) = O((log n/n) 1/d ) almost surely. A Borel set A is said to be standard with respect to a Borel measure ν if there exists ε 0 > 0 and δ > 0 such that ν(B(x, ε) ∩ A) g δµ (B(x, ε)), for all x ∈ A, 0 < ε f ε 0 . The standardness condition prevents the set S from being too spiky, see Cuevas and Fraiman (1997) for more details. Although the family of α-convex sets is much wider than the family of convex sets, C α (X n ) achieves the same convergence rates as the convex hull of X n in the convex case, see Dümbgen and Walther (1996). Moreover, if S belongs to Serra's regular model, that is, if S is morphologically open and closed with respect to a compact ball of radius α (see Serra (1984)), then d H (S, C α (X n )) = O((log n/n) 2/(d+1) ) almost surely. Again, the α-convex hull is able to achieve the same convergence rate as the convex hull when S belongs to the Serra's regular model. Edelsbrunner et al. (1983) defined in R 2 a similar construct, the λ-hull of a finite set of points in the plane, for an arbitrary λ ∈ R. Following their terminology, C α (X n ) equals the λ-hull of X n for λ = −1/α and it can be computed in time O(n log n) using O(n) space. The algorithm, described in Section 3, is based on the closed relationship that exists between λ-hulls and Delaunay triangulations. The Delaunay triangulation of a finite set of points contains, as subgraphs, various structures that have important applications in pattern recognition, cluster analysis, etc. See Aurenhammer and Klein (2000) for a survey. The α-shape is one of those subgraphs, derived from a straightforward generalization of the convex hull. For α > 0, the α-shape of X n is defined as the straight line graph connecting α-neighbour points. Two points X i , X j ∈ X n are α-neighbour if there exists an open ball of radius α with both points on its boundary and which contains no sample points, see Figure 2. The definition of α-shape can be extended to arbitrary real α and higher dimensions, see Edelsbrunner and Mücke (1994). The α-shape is an approach to formalize the intuitive notion of shape for spatial point sets. The value of the parameter α controls the shape of the estimator. For sufficiently large α, Figure 1: In pink, α-convex hull of a set of points in the plane for some α > 0. the α-shape is identical to the boundary of the convex hull of the sample. As α decreases, the shape shrinks until that, for sufficiently small α, the α-shape is the empty set, see Figure  3. As with the α-convex hull, the α-shape of n points in the plane can be determined in time O(n log n) and space O(n), see Edelsbrunner et al. (1983). The description of the algorithm and the details of its implementation in R are given in Section 3.

Implementation
The R package alphahull consists of three main functions: delvor, ashape, and ahull. The implementation of the α-convex hull and of the α-shape is based on the intimate relation of theses constructs with Delaunay triangulations. The function delvor described in Subsection 3.1 computes the Delaunay triangulation and the Voronoi diagram of a given sample of points in the plane. Based on the information provided by the function delvor, the function ashape described in Subsection 3.2 constructs the α-shape for a given value of α > 0. Finally, the function ahull described in Subsection 3.3 constructs the α-convex hull. The R package alphahull also includes plot functions for the different objects and some other auxiliary functions which are described throughout this section.

Voronoi diagram and Delaunay triangulation
The Voronoi diagram of a finite sample of points X n = {X 1 , . . . , X n } in R 2 is a covering of the plane by n regions V i where, for each i ∈ {1, . . . , n}, the cell V i consists of all points in R 2 which have X i as nearest sample point. That is, We denote the Voronoi Diagram of X n by V D(X n ). The Voronoi cells are closed and convex. Furthermore, V i is unbounded if and only if X i lies on the boundary of the convex hull of X n . Otherwise V i is a nonempty convex polygon. Two sample points X i and X j are said to be Voronoi neighbours if the cells V i and V j share a common point.  A triangulation of X n is a planar graph with vertex set X n and straight line edges that partition into triangles the convex hull of the nodes X n . The Delaunay triangulation of X n , denoted by DT (X n ), is defined as the straight line dual to V D(X n ). That is, there exists a Delaunay edge connecting the sample points X i and X j if and only if they are Voronoi neighbours, see Figure 4. In other words, each circumcentre of a Delaunay triangle coincides with a Voronoi cell vertex. The Delaunay triangulation was introduced by Voronoi (1908) and extended by Delaunay (1934), to whom it owes its name. A complete overview over the existing literature on these geometric constructions can be found in Okabe, Boots, Sugihara, and Chiu (2000). We also refer to the survey by Aurenhammer (1991) for more details on the properties of the Delaunay triangulations and the Voronoi diagrams and their multiple applications.
From the computational viewpoint, efficient methods for the computation of Delaunay triangulations and Voronoi diagrams have been developed. Aurenhammer and Klein (2000) presented a review of algorithms, from the earliest intuitive methods to more efficient representations of these geometric structures. For example, the incremental insertion process by Green and Sibson (1978), the Divide and Conquer method by Shamos and Hoey (1975) or the plane-sweep technique by Fortune (1987) are the basis of a large class of worst-case optimal algorithms for computing the whole Voronoi diagram in the plane. Of course, these techniques can also be applied to the computation of the Delaunay triangulation. See for example the efficient incremental algorithm by Lawson (1977). Both the Voronoi diagram and the Delaunay triangulation of n points can be computed in O(n log n) time and linear space. Furthermore, by the duality between the Voronoi diagram and the Delaunay triangulation, either tessellation is easily obtained from a representation of the other in O(n) time.
Currently, there are several libraries available in R that compute the Delaunay triangulation or the Voronoi diagram of a given set of points. See the packages deldir by Turner (2021) or geometry by Habel, Grasman, Gramacy, Mozharovskyi, and Sterratt (2022), among others. These libraries differ on the implemented algorithms and the data structures that store the information. For example, the package deldir computes the Delaunay triangulation and the Voronoi diagram of a planar point set according to the second iterative algorithm of Lee and Schachter (1980). Unfortunately, this package does not return the kind of data structure we need in order to compute the α-shape and the α-convex hull. The function deldir computes the triangulation of a set of points enclosed in a finite rectangular window. In consequence, the endpoints of the Voronoi edges outside that window are discarded. This fact does not appear to be a problem unless we need to know all the Voronoi edges, as in our case. Note that, in principle, the location of the furthest Voronoi vertex is unknown and enlarging the window size to ensure that the information is complete has a considerable computational cost. Our package alphahull computes the Delaunay triangulation and the Voronoi diagram with respect to the whole plane. For each edge of the Delaunay triangulation the corresponding segment in the Voronoi diagram is obtained by connecting the circumcenters of the two neighbour triangles that share that edge. For those edges of the Delaunay triangulation that lie on the boundary of the convex hull, the corresponding segments in the Voronoi diagram are semi-infinite. We compute the triangulation by invoking the function tri.mesh from package interp, see Gebhardt, Bivand, and Sinclair (2022). The code to compute the corresponding Voronoi diagram is a modified version of the function voronoi.mosaic which is also included in the package interp. The output of the function delvor is a list with three components. The first component, mesh, stores the primal and dual information. For each edge of the Delaunay triangulation mesh contains the indexes ind1, ind2 and the coordinates (x1, y1), (x2, y2) of the sample points that form the edge, the coordinates of the endpoints of the corresponding segment in the Voronoi diagram (mx1, my1), (mx2, my2), and an indicator that takes the value 1 for those endpoints of the Voronoi edges that represent a boundless extreme, that is, bp1 = 1 if (mx1, my1) is a dummy node and the same for bp2. The second component, x, stores the sample points and the third component, tri.obj, stores the information of the Delaunay triangulation obtained from the function tri.mesh. As an illustration, we have applied the function delvor to a given set of n = 5 points. The result is assigned to the object dv and printed out. The plot of the previous Delaunay triangulation and Voronoi diagram is displayed in Figure  5. This graph is produced using the function plot.delvor (S3 method for class 'delvor'). The arguments are the same as those of the function plot.deldir from package deldir.

The α-shape
For the construction of the α-shape of a finite sample of points X n , the package alphahull implements the algorithm by Edelsbrunner et al. (1983), see Table 1.
Algorithm α-shape 1: Construct the Voronoi diagram and Delaunay triangulation of X n . 2: Determine the α-extremes of X n . 3: Determine the α-neighbours of X n . 4: Output the α-shape.  which contains no sample points. Finding the α-extreme points is not a difficult task once the Delaunay triangulation and the Voronoi diagram are determined. Note that, if the sample points X i lies on the convex hull of X n , then X i is α-extreme for all α > 0. If X i does not lie on the convex hull we only need to compute the distances from X i to the vertexes of the Voronoi Finding the α-neighbour points is, however, trickier. Consider an edge of the Delaunay triangulation connecting the sample points X i and X j and its dual edge of the Voronoi diagram. The points X i and X j are α-neighbours for all α satisfying α min f α f α max , where α min and α max are determined from the position of X i and X j with respect to the vertexes of the dual Voronoi edge.
The function ashape, included in the library alphahull, computes the α-shape of a given sample X n for a given value of α > 0. The output of the function ashape is a list with six components. The components x and alpha store the input information whereas the component delvor.obj stores the output object from the function delvor, see Subsection 3.1. The indexes of the α-extremes are given by the component alpha.extremes. The αneighbours connections are given by the first two columns of the matrix edges. The structure of edges is that of matrix mesh returned by the function delvor. Note that the α-shape is a subgraph of the Delaunay triangulation and, therefore, edges is a submatrix of mesh. The length of the α-shape is stored in the component length. The function plot.ashape (S3 method for class 'ashape') produces a plot of the α-shape. Graphic parameters can control the plot appearance. Moreover, the Delaunay triangulation and the Voronoi diagram can be added to the plot by specifying the argument wlines (wlines = "del" for the Delaunay triangulation, wlines = "vor" for the Voronoi diagram or wlines = "both" for both plots). Next, we show an example on how these functions work. We have applied the function ashape to a uniform random sample of size n = 50 on the unit square with α = 0.2. The result is assigned to the object alphashape. > alpha <-0.2 > alphashape <-ashape(x, alpha = alpha) > names(alphashape) [1] "edges" "length" "alpha" "alpha.extremes" [5] "delvor.obj" "x" A plot of the α-shape may be obtained as follows. The result is displayed in Figure 6.

The α-convex hull
Recall the definition of the α-convex hull given in Equation (1). By DeMorgan's law, the complement of C α (X n ) can be written as the union of all open balls of radius α which contain no point of X n , that is, Therefore, in order to compute the complement of the α-convex hull of a sample of points we should identify all those balls of radius α that do contain no point of the sample. Fortunately, the problem simplifies thanks to the following lemma by Edelsbrunner et al. (1983).  Edelsbrunner et al. (1983). The package alphahull implements the algorithm in Table 2.
Algorithm α-convex hull 1: Construct the Voronoi diagram and Delaunay triangulation of X n . 2: Determine the union of open balls and halfplanes that form C α (X n ) c . 3: Output the α-convex hull. The function ahull, included in the library alphahull, computes the α-convex hull of a given On the other hand, the component arcs stores the boundary of the α-convex. Note that ∂C α (X n ) is formed by arcs of balls of radius α (besides possible isolated sample points). These arcs are determined by the intersections of some of the balls that define the complement of the α-convex hull. The extremes of an arc can be written as c + rA θ (v) and c + rA −θ (v) where c and r represent the center and radius of the arc, respectively, and A θ (v) represents the clockwise rotation of angle θ of a unitary vector v, see Figure 8. Thus, the structure of the matrix arcs is as follows. For each arc of the boundary, the first two columns (c1, c2) correspond to the coordinates of the center c. The third column r corresponds to the radius α. The next two columns (v.x, v.y) correspond to the coordinates of the unitary vector v whereas the angle θ is stored in the sixth column theta. Finally, the indices of the end points of each arc are stored in the last two columns, end1 and end2. For isolated points in the boundary of the α-convex hull, columns 3 to 6 of the matrix arcs are equal to zero. The component xahull consists on a 2-column matrix with the coordinates of the original set of points besides possible new end points of the arcs in the boundary of the α-convex hull. The component alpha stores the input information and the component ashape.obj stores the output object from the function ashape (see Subsection 3.2) which is invoked during the computation of the α-convex hull. Finally, the boundary length of the α-convex hull is stored in the component length. The function plot.ahull (S3 method for class 'ahull') produces a plot of the α-convex hull. As with plot.ashape and plot.delvor some graphic parameters can control the plot appearance. The α-shape can be added to the plot by specifying the argument do.shape = TRUE. Moreover, the Delaunay triangulation and the Voronoi diagram can be added to the plot by specifying the argument wlines (wlines = "del" for the Delaunay triangulation, wlines = "vor" for the Voronoi diagram or wlines = "both" for both plots). As an example, we have applied the function ahull to a uniform sample of size n = 200 on the annulus with outer radius 0.5 and inner radius 0.25. The result is assigned to the object alphahull.
> plot (alphahull,do.shape = TRUE,col = c(6,4,rep(1,4)), xlab = "x-coordinate", ylab = Once C α (X n ) c is constructed we can decide whether a given point p ∈ R 2 belongs to the α-convex hull or not, by checking if it belongs to any of the open balls or halfplanes that form the complement of the α-convex hull. The function inahull returns a logical value specifying whether p belongs to C α (X n ) c . The function areahull returns the area of the α-convex hull. The idea behind this function is that, by joining the end points of adjacent arcs in the boundary of the α-convex hull, we can define polygons that help us to determine the area of the estimator.

Conclusions
In this document we have described the implementation in R of the α-convex hull and of the α-shape of a random sample of points in the plane. The package alphahull is the result of that implementation, which is based on efficient algorithms presented by Edelsbrunner et al. (1983). The α-convex hull and the α-shape generalize the notion of convex hull and serve to characterize the shape of an unknown set based on random sample of points taken into it. The results obtained by Rodríguez-Casal (2007) on the behaviour of the α-convex hull estimator give us the theoretical basis for using this geometrical construct as a support estimator.