Shape Spaces via Medial Axis Transforms for Segmentation of Complex Geometry in 3D Voxel Data

In this paper we construct a shape space of medial ball representations from given shape training data using methods of Computational Geometry and Statistics. The ultimate goal is to employ the shape space as prior information in supervised segmentation algorithms for complex geometries in 3D voxel data. For this purpose, a novel representation of the shape space (i


Introduction
Segmentation and detection of objects in image data is of major importance in several applications, such as for instance medical imaging, video surveillance and motion tracking. There are basically two different categories of segmentation algorithms, namely supervised and unsupervised methods. Unsupervised algorithms do not make benefit out of object models or statistics on object models; this models will not be studied in this paper. Supervised methods and algorithms on the other hand make use of statistics on object models to be segmented. This is the type of algorithm which we investigate in this paper. Historically, unsupervised segmentation algorithms have been introduced first. However, there the segmentation results are poor if the data is noisy, has little contrast, or if the object is partially occluded. To cope with these difficult situations supervised segmentation has to be used.
In this paper we focus on supervised segmentation methods based on minimization of energy functionals, which have been proven to be very efficient. The shape prior model is a medial ball representation of low dimensionality. In this context it is common to differ between edge based (see [51]) and region based energy functionals (see [41]).
In the following we review some energy minimization segmentation approaches which incorporate shape prior information: In [19] (this work is actually in 2D) contours are represented as simple, closed Bspline curves. The shape prior and statistic are computed as the mean and the principal components of the spline control points of the training shapes. For segmentation a Mumford-Shah like energy functional is supplemented by the Mahalanobis distance to the shape prior, which is calculated from the statistics. This approach is based on the assumption that the shapes are distributed according to a multivariate Gaussian distribution. Using kernel space techniques, a generalization to non-Gaussian distributed shapes is given in [18]. Applications to medical MR image segmentation have been studied in [15].
A level set approach for segmentation with shape priors has been studied in [38]. There shapes are associated with the associated signed distance functions. On the training distance functions of the shapes the actual statistics, which is used in the segmentation algorithm, is calculated. Shape statistics are implemented in a geodesic active contour evolution as maximum a posteriori estimator of the shape. By this approach geometrically and topologically complex shape priors can be considered. However, this representation requires large computational resources, and efficient implementations with narrow bands tend to create inaccuracies and artifacts in the segmentation results. In [23], the work of [38] has been improved concerning efficiency. The same shape prior strategy as in [38] has been used in combination with the unsupervised Chan-Vese energy minimization [13] in [50]. Further generalizations of [38] are given in [45] and [46].
In [16], shape statistical prior information is incorporated in a variational segmentation functional with an additional regularization term. A generalization of this work is presented in [27]. More recently, the paper [12] proposes a variational level set framework that takes into account shape prior information which combines gradient and region based segmentation.
A compact representation of shapes, called discrete m-rep (Medial atom representation) has been proposed in [25] and [43] (see also [47]). A discrete m-rep consists of medial atoms aligned on a regular grid structure, that approximates the medial axis defined by Blum in [9]. Medial atoms consist of a coordinate position in space, a radius and two boundary vectors. The actual surface associated with the m-rep can be constructed using an enfolding B-spline of the surface that interpolates the boundary points and normals defined by the medial atoms. In contrast to discrete m-reps as defined above, continuous m-reps were introduced in [52]. There, the boundary of the object and the medial axis itself are approximated using B-splines. Discrete m-reps can be regarded as elements of a Riemannian shape manifold, which defines distances between m-reps by lengths of geodesics amongst them. An m-rep shape space is suitable for modeling shapes of very similar geometry and topology. Image segmentation with m-reps has been considered in [30,42]. Mumford-Shah energy segmentation on an m-rep shape space has been presented in [17]. A limitation of using m-reps defined on a fixed grid structure is that the geometry of the object to be segmented has to be determined in advance.
Therefore, in this work, we use shapes which are defined via a fixed number of medial balls which are defined by its position and radius only. This shape representation does not require an underlying grid-structure as m-reps do. In fact this flexibility in combination with surface generation using skin surfaces (cf. Section 5.1) allows us to consider objects of complex geometry. To clarify and highlight the notation we give the following comparision of the two different approaches: medial balls: uses (position,radius) [our work] medial atom: uses (position, radius, two boundary vectors) [47] Note that m-reps are defined by medial atoms on a regular grid that approximates the medial axis, while our representation is a collection of medial balls that sample the medial axis. Medial balls are a basic tool in Computational Geometry. Since in our approach we rely on medial balls rather than atoms, it is natural to work with established Computational Geometry algorithms for surface creation rather than algorithms usually applied for m-reps. Such combinations of algorithms -as for instance skin surfaces -are outlined below.
In this paper, we consider the construction of medial ball shape spaces and the computation of statistics of training shapes, which are then used for supervised energy minimization segmentation. We are given training shapes, which are represented as triangular surface meshes. They can either be provided by expert segmentation, or have been obtained from unsupervised segmentation (see e.g. [2,17]). For the latter one can also imagine to use data of other imaging modalities, which have higher contrast or are less affected by noise.
The following scheme, which can as well serve as an outline of our work, is studied:

Preprocessing (Shape Statistics)
• For each mesh a discrete medial axis transform is computed. We refer to c(M ) = (x 1 , . . . , x k ) ∈ R 3×k as the centers of the k medial balls of M , and r 1 , . . . , r k are their radii. It is important for our application that each discrete medial axis transform M j consists of the same number k of medial balls (x i , r i ) ∈ R 3 × R + . The algorithm developed in here is capable to do so, which is a novel aspect in the literature. For sake of simplicity, the discrete medial axis transform computed in this step will be called ball representation according to its definition (Section 2).
• A labeling of the ball representation M j is computed by an EM-algorithm and the Kuhn-Munkres algorithm (Section 3).
• Using Procrustes analysis, a mean ball representation is computed from the labeled ball representations, and a Mahalanobis distance between ball representations is defined. These are the building blocks of our medial ball shape space. (Section 4).

Segmentation
• For segmentation the shape space is implemented as shape prior information in a simplified Mumford-Shah functional. The implied surface of the ball representations is constructed by a skin surface [21] (Section 5).
In Section 6 the full pipeline of statistics and segmentation is applied to synthetic data, and Section 7 concludes the paper.

Medial Axis Transform
In this section we consider the problem of approximating a number of bounded open sets Ω 1 , . . . , Ω n ⊂ R 3 by sets B 1 , . . . , B n of approximate medial balls, where all sets B i have the same cardinality. Stability of the approximation, described below, is of vital importance for the statistical analysis of medial axis representations. Our novel approach for calculating sets of approximate medial balls of the same cardinality for a class of objects combines three well known methods. First we utilize Voronoi diagrams to approximate objects by sets of (approximate) medial balls [7,8]. As these sets will usually have a rather huge cardinality we then use set covering methods to obtain sufficiently small and stable subsets [5,4]. Finally, we use a k-means clustering algorithm [31] to control the cardinality of the approximating sets in order to obtain sets of approximate medial balls of uniform cardinality. We start by reviewing some basic facts of the medial axis transform.

Discrete medial axis transform
The following definitions of the medial axis and medial axis transform are standard and can for instance be found in [10]. Note that the cardinality of the inner Medial Axis Transform is in general infinite. Moreover the medial axis M(Ω) behaves unstable with respect to perturbations of high curvature of ∂Ω. That is, the medial axis might contain branches that are far from being intuitive. The corresponding inner medial axis transform MT in (Ω) expresses this unstable behavior by small balls near the surface of Ω. For these reasons we seek for an approximation of MT in (Ω), which we define next. In the following we assume that the input object Ω is given by a set S Ω of sample points of the boundary ∂Ω of Ω and a triangular mesh T (S Ω ), representing ∂Ω. We first show how to approximate one three-dimensional object Ω by the union of its discrete medial axis transform DMT(Ω) in in a stable way. We start with the well known Voronoi approach [7,8].

Definition 2.3
The Voronoi Diagram of a set of points S Ω in R 3 is a partition of R 3 into (possibly unbounded) convex polyhedral regions, called Voronoi cells, such that each point s i ∈ S Ω has an associated Voronoi cell v( We first compute the Voronoi diagram of S Ω . Then we extract for each sample point s i ∈ S Ω the inner pole point p i , which is the vertex of the Voronoi cell v(s i ) being farthest away from s i and inside Ω. Finally, we construct for each inner pole point p i a so called polar ball B pi,ρi centered at p i with radius ρ i = s i − p i . The set of all polar balls created in this way is the inner discrete medial axis transform DMT(Ω) in , and S Ω is contained in the boundary of its union [7,8]. See Figure 1(a) for an example with more than 10000 balls. In the original approach a dense sampling S Ω of a smooth surface ∂Ω of Ω is required in order to be able to distinguish inner Voronoi vertices from outer ones [7,8].
To overcome this restriction we use, as mentioned above, the triangular surface mesh T (S Ω ) as additional input. This allows us to easily distinguish between inner and outer Voronoi vertices. Thus noise -always present in real-world data sets -and poor sampling quality do not affect the correctness of our inner/outer labeling, and correct operation of this step is ensured.
The centers of DMT(Ω) in are close to M(Ω), see [6] for a quantitative analysis and a precise statement of that fact. This implies that DMT(Ω) in might include small, surface near balls corresponding to unwanted features of MT in (Ω). Moreover, DMT(Ω) in approximates Ω by up to |S Ω | balls. For our purposes the instability and the high cardinality of DMT(Ω) in prevent its direct usage.
To avoid these disadvantages we apply a pruning algorithm to DMT(Ω) in in order to extract a proper subset of DMT(Ω) in . The result will be a stable (we remove surface near balls which result from instability) and compact representation of Ω, cf. Moreover the pruning step will give us control to obtain sets of approximate medial balls of predefined cardinality, see Subsection 2.2 for details. In the following we briefly describe the pruning approach of [4], see there for further details. By the above construction each ball in DMT(Ω) in has four sample points on its boundary but none in its interior. We enlarge each ball of DMT(Ω) in by a sufficiently small constant ε > 0. Then we use a spatial search structure to find for each enlarged ball b i all sample points from S Ω which are now covered by b i (typically tens or even hundreds of sample points are contained). Finally we use a set-covering algorithm to find an (almost) minimal subset DMT(Ω) in * of the enlarged balls whose union covers all sample points S Ω . This set DMT(Ω) in * is the output of the pruning step. Let us stress the fact that the applied set-covering algorithm to find a minimal subset of balls will favor balls that cover a large fraction of S Ω and thus large areas of ∂Ω in order to make the cardinality of DMT(Ω) in * , DMT(Ω) in * , as small as possible. These balls are centered near stable parts of M(Ω). This implies that surface-near balls, which originate from small perturbations of high curvature of ∂Ω (recall the above discussion) are avoided. Thus stability of our approach is significantly improved by the set-covering algorithm, see also [4]. In addition the one sided Hausdorff distance from the union of the obtained compact discrete medial axis transform DMT(Ω) in * to the original object Ω is bounded by O(ε).

Sets of approximate medial balls of uniform cardinality
As described in Section 1 our goal is to obtain sets of approximate medial balls of uniform cardinality for different input objects, Ω i , i = 1, . . . , n. So far we have shown how to compute a stable representation DMT(Ω i ) in * of the enlarged discrete medial axis transform for one fixed object Ω i . We can control |DMT(Ω i ) in * | to some extent by the value ε by which we enlarge the polar balls.
For each single object Ω i , the value ε is chosen such that the resulting set |DMT(Ω i ) in * | is at least as large as the desired uniform cardinality k, that is, k ≤ min . This approach is justified by the fact, that for piecewise linear ∂Ω, the quantity |DMT(Ω i ) in * | tends to infinity for ε → 0. We then apply the kmeans clustering algorithm [31] to the centers of each DMT(Ω i ) in * and get for every set DMT(Ω i ) in * exactly k clusters. For each cluster we choose as an representative ball the one whose center is closest to the center of that cluster. This results in n stable representations of Ω 1 , . . . , Ω n by sets of balls B 1 , . . . , B n with uniform cardinality k.
For the remainder of the paper we will use the term ball representation to refer to the pruned inner discrete medial axis transform of uniform cardinality, denoted by M 1 , . . . , M n .

Labeling of the Ball Representations
In this section we use statistical algorithms for labeling of the ball representations. Given two ball representations M = (x 1 , . . . , x k ; r 1 , . . . , r k ) and M = (x 1 , . . . ,x k ;r 1 , . . . ,r k ), the labeling problem consists in determining a rearrangement of indices such that the two ball representations optimally match. Every matching involves a distance measure between ball representations. In view of the used definition of an Euclidean invariant shape space of ball representations considered below (see Section 4), we consider the following minimization problem for determining the optimal alinement: Here SO (3) is the special orthogonal group of degree 3 (that is the group of rotations in R 3 , which can be represented as unitary matrices with determinant one), S k is the symmetric group of degree k, and · denotes the Euclidean norm in R 3 . The parameter α is a weighting parameter, providing a trade-off between matching of balls and radii.
Since it is difficult to compute global minimizers of (2) efficiently, we use a two-step algorithm from [35], to compute approximate minimizers. This algorithm consists of the following two steps: • An Expectation Maximization (EM) algorithm is used to compute a similarity transformation, i.e., a translation b ∈ R 3 , a rotation A ∈ SO(3), and a scaling ρ > 0, which is close to an optimal one in (2). The algorithm is explained in Subsection 3.1.
• Keeping the similarity transform fixed, the Kuhn-Munkres algorithm is used to compute an optimal labeling π ∈ S k in (2). This algorithm is explained in Subsection 3.2.
The algorithm outlined below also applies to variants of the minimization problem (2), such as For the sake of simplicity of notation we restrict attention to (2) and omit a discussion on general minimization problems.

EM-algorithm to compute the similarity transformation
EM-algorithms have been used successfully for solving a wide range of labeling problems [34]. In general, EM algorithms [40] consist in maximizing a likelihood function, which depends on parameters and hidden variables. There two successive steps are performed iteratively: • An Expectation step, which consists in computation of the expected values of the hidden variables and • a Maximization step, which is to optimize the likelihood function for the parameters.
We propose an algorithm which is closely related to [35]. However, the difference is that we take into account information on the radii of the medial balls in addition.
The key idea of [35] is to compute the optimal similarity transformation of (2) relaxing the constraint π ∈ S k . This allows to apply the EM algorithm.
A mapping π ∈ S n can be represented as a matrix P ∈ {0, 1} k×k with entries The relaxation consists in assuming instead of S k stochastic matrices It is common to interpret P j,i as the probability that j is mapped to i. For medial ball representation this means that the ball ( Furthermore, a distribution p = (p 1 , . . . , p k ) is assumed over {1, . . . , k}. The number p i represents the probability, that an arbitrary element of {1, . . . , k} is mapped to i.

Distances of balls.
Under the condition π(j) = i and for a given similarity transformation (ρ, A, b), we assume that the transformed ball (x j , r j ) is normally distributed around (x i ,r i ), i.e., ρAx j +b is normally distributed with meanx i and ρr j is normally distributed with meanr i . Concerning the radii, this assumption is slightly inaccurate since radii cannot become negative; we overcome this problem by choosing the variance σ sufficiently small, such that the 0.01 quantile is positive. Hence we define (for i, j = 1, . . . , k) The likelihood function for the parameters of the similarity transform (ρ, A, b) and the relaxed labeling (P, p) for given ball representations M andM is then given by the product probability Taking the logarithm on both sides defines the log-likelihood function It is important to note, that if P is a permutation, then p i = 1/k. Therefore in this case, maximizers of (5) are minimizers in (2) and vice versa. Following [35], enlarging the class of labelings from S k to stochastic matrices has little effect on the optimal similarity transform. The advantage of the formulation (5), compared to (2), is, that (5) can be minimized efficiently with an EM-algorithm, where the variables ρ, A, b, p are regarded as parameters, and the entries of P are regarded as hidden variables of the likelihood function l. The EM algorithm for minimization of (5) reads as follows: Step: Compute the expected value of the relaxed permutation matrix P , By this assignment it is guaranteed that over A, ρ, b and p 1 , . . . , p k . Compute the new value of l. end while Compared to different algorithms for minimizing (2) such as Markov Chain Monte Carlo Methods (MCMC), the output of the EM Algorithm is more sensitive to initial values of the parameters. This is no drawback in our case, since the ball representations are often already quite well aligned. However, the provided algorithm is assumed to be more efficient than MCMC.

Labeling
This subsection is concerned with computing an optimal labeling π ∈ S k in (2).
It is instructive to reformulate the labeling problem as a combinatorial optimization problem.
• Let α be as in (2). The weights of the edges between the nodes are defined by Here, (ρ, A, b) is the similarity transform computed in Subsection 3.1.

A permutation
is computed.
For the fixed similarity transform transformation (ρ, A, b), a minimizer π ∈ S k of (2) is a minimizer of (8) and vice versa. For the solution of the matching problem (8) we apply the Kuhn-Munkres algorithm, which is a special case of the primal-dual algorithm in linear programming [3]. According to [49] the complexity of the algorithm is O(k 3 ). In our applications, the high order of complexity of the algorithm is not an issue since it is only applied for medial ball representations of dimension k, which is in the range of 10 to 50.

Medial Ball Shape Space by Procrustes Analysis
In the following we perform a statistics of the ball representations M 1 , . . . , M n and establish a shape space, which we call medial ball shape space. Moreover, a distance between elements of the medial ball shape space is employed to compare them quantitatively.

Elements of the medial ball shape space
A shape is all the geometrical information that remains when location, scale and rotational effects are filtered out from an object (taken from [20, Chapter 1]). We consider this definition when constructing the medial ball shape space as the factor space of ball representations M = (x 1 , . . . , x k ; r 1 , . . . , r k ) modulo similarity transformations. That is In other words, two medial ball representations M = (x 1 , . . . , x k ; r 1 , . . . , r k ) and M = (x 1 , . . . ,x k ;r 1 , . . . ,r k ) represent the same shape (or in other words are equivalent with respect to ∼ ST ) if there exists a similarity transformation (ρ, A, b) such that We emphasize that in the segmentation literature (see e.g. [17,42,38,19]) frequently shape spaces are not defined as factor spaces. As a consequence the according shape space allows for different representations of the same shape.
Below we derive a segmentation model taking into account the original shape concept and coordinates of Kendall [33]. Compared to Bookstein [11], Kendall coordinates do not depend on the choice of a baseline i.e., two medial balls in our case. Therefore statistical analysis like PCA is not distorted. Since it is bulky to define a distance between equivalence classes we systematically choose representatives, for which we define a shape distance. The process is similar to [20,Chapter 4] where mid axis representations M = (x 1 , . . . , x k ) (without radii) have been considered. where The parameter α is the same as in Equation (2). The minimization problem (11) can be solved with an iterative algorithm, which is along the lines of [20, Chapter 5].
2. M is normalized with respect to translation by multiplying the ball centers of M with the (k − 1) × k Helmert submatrix H (see [20, p.34] for the definition). The advantage in using the Helmert matrix for this task (compared to for instance centering the coordinates) is that the arising representation only consists of k − 1 centers and that the Helmert matrices can be computed efficiently inductively. The resulting representation T (M ) is given by 3. T (M ) is normalized with respect to scaling by dividing the coordinates and radii by the Frobenius norm c(T (M )) . The resulting scaled representation is denoted by S(T (M )).

S(T (M ))
is normalized with respect to Rotation by minimizing the Frobenius norm min Rotating S(T (M )) by the optimal A ∈ SO(3) gives , which is the representative of the class. The difference between the projection of vec(M ) and vec(µ) is c(P µ (M )). PCA and Mahalanobis distance are visualized on the right hand side. The data points vary in horizontal direction much more than in vertical direction. The Mahalanobis distance between A and B is smaller than the distance between A and C, which is the opposite measured in the Euclidean distance.
As shown in [33], the optimal rotation A in (14) can be calculated as follows: If (c(S(T (µ))))c(S(T (M ))) T = V ΛU T is the signed singular value decomposition of (c(S(T (µ))))c(S(T (M ))) T , then The mean shape µ and the representative are utilized to compute distances in Σ in the next Subsection 4.2.
We emphasize that in our approach we consider ball representations of the shapes. This should not be confused with m-reps, which have an additional grid structure. In the latter a principal geodesic analysis seems preferable to a PCA (see [47]).

Variability in the medial ball shape space
Standard techniques for capturing the variability of data are principal component analysis (PCA) and the Mahalanobis distance. Historically, these techniques have been defined in Euclidean space. A standard way to transfer these concepts to Riemannian manifolds is to project the data to a tangent space (which is Euclidean) and perform an variability analysis there. Recently, techniques have been established, which consider the whole tangent bundle for parallel transport of tangent vectors along geodesics and perform a principal geodesic analysis, [24]. In our case, the medial ball shape space Σ is a Riemannian manifold with singularities. As is detailed in [48] for shape spaces of point configurations and carries over to our situation, Σ has singularities at equivalence classes of ball representations under translation and scaling, where the rotation group does not act free. These shapes arise from ball representations consisting of balls of the same radius lying on a straight line. We can exclude these ball representations from our further considerations since they are not of interest to us, or apply the simulation of simplicity concept [22] to ball representations M 1 , . . . , M n and avoid this case. For sake of simplicity, we present only details on the projection technique, although the whole shape analysis and segmentation pipeline is generalizable to a nonlinear framework.
The tangent space of Σ at [µ] ∼ ST . For our medial ball shape space Σ, we compute projections of shapes on the tangent space at [µ] ∼ ST . Let M = (x 1 , . . . , x k−1 ; r 1 , . . . , r k ) be a ball representation, in normalized form (15), and vec(M ) = (x T 1 , . . . , x T k−1 ) T ∈ R 3k−3 . Then the orthogonal projection of M to the tangent space of Σ at µ is given by The eigenvectors e 1 , . . . , e d are sorted by the size of the corresponding eigenvalues λ 1 , . . . , λ d and give an orthonormal basis of R d . This basis represents the main directions of variability of the data vectors m 1 , . . . , m n .
A distance which takes into account the variability of the data m 1 , . . . , m n is given by the Mahalanobis distance. This distance is defined by withŨ = (e 1 , . . . , ed) andD the diagonal matrix with entries λ 1 , . . . , λd.
Applying these concepts to ball representations, let The Mahalanobis distance d Σ between M andM in Σ is then given by (with P µ (M ) = (c µ ; r µ ) and P µ (M ) = (c µ ;r µ )) In case that the ball representations contain many balls, a PCA is performed first and a Mahalanobis distance as in (21) is computed. PCA and Mahalanobis distance are illustrated in Figure 2, right part.
As a result of this section, a mean shape and distances between shapes have been defined and this statistics can be used as shape prior in segmentation in the sequel (cf. (32) and (33)). Generally spoken, the Mahalanobis distance is used to make a point cloud uniformly distributed. Therefore one can use quadratic regularization functionals. For a comprehensive work on the problem of defining suitable approximations of shape metrics in image processing we refer to Charpiat et al. [14] where the authors also perform warping of two shapes onto another by gradient descent that minimizes the corresponding distance.

Segmentation
In this section we consider object segmentation in 3D voxel data. Thereby we aim for taking into account shape prior information given in the form that the object to be recovered belongs to the medial ball shape space. The distance on the shape space is given by the Mahalanobis distance (see . The medial ball shape space is computed from boundaries of training objects as discussed in Section 2.
The boundary of a ball representation is defined as the skin surface [21]. Below we give a definition and some properties of skin surfaces which predestine them as boundary surfaces of medial ball representations.
Finally we describe the whole pipeline of the proposed segmentation process.

Skin Surfaces
In this subsection, we review the definition and basic properties of skin surfaces.
Properties of skin surfaces. We employ the concept of a skin surface later on for definition of a surface from a ball representation. We believe that skin surfaces are very suitable for this task, since they have several attractive properties which we discuss in the sequel.
• Information efficiency: A skin surface only requires center points, weights and a shrinking factor for its complete description. Center points and weights are exactly the information available to us after (mean) ball representation computation in Sections 2 and 3. The shrinking factor can be used as a tuning parameter depending on the application. Other popular surface construction methods like NURBS (or other splines), used for instance in [19], or Gregory patches (see [44,17]) suffer from the drawback, that an additional grid structure of the control points is required for surface generation, which is difficult to obtain in an automatic way.
• Computational efficiency: There exists an efficient algorithm for computing skin surfaces (see [37]). For k weighted input points, the algorithm produces a triangular mesh consisting of O(k 2 ) vertices.
• Ball representation fidelity: If M is a ball representation of some surface in R 3 , and skn(M ) is the skin surface defined by M , then M is contained in the ball representation of skn(M ). This follows easily from the definition of the skin surface as a convex hull.
• Economy: A small number of input points can generate complicated skin surfaces [21].
An example of a skin surface, meshed with the algorithm from [37], is given in Figure  3.

Definition of a skin surface.
Basic building blocks of skin surfaces are weighted points p = (x p , w p ) ∈ R 3 × R. Weighted points can for instance represent balls with center x and radius r, setting x p = x and w p = r 2 , and therefore a ball representation can be regarded as a set of weighted points. Addition and scalar multiplication of weighted points p, q are defined by and sp = (sx p , sw p + (s 2 − s)||x p || 2 ) for s > 0.
Here, ·, · and || · || denote the Euclidean scalar product and its induced norm. As usual, but with these algebraic operations, the convex hull of a set of weighted points shrinkage p s of a weighted point p = (x p , w p ) by a factor s ≥ 0 is defined by p s = (x p , sw p ).
Assume now we are given a set P of weighted points p = (x, w) with w > 0. Shrinkage applied to P by s ≥ 0 is defined by P s = {p s | p ∈ P }. Let B w (x) = {y ∈ R 3 | x − y ≤ w} be a ball with radius w and center x. We denote by ∂ denote the boundary of a set in Euclidean space. Let s ≥ 0. Then the s-skin of P is defined by Note that the square root of a negative number is imaginary and a ball with an imaginary radius is the empty set.
In the special case s = 0, the skin surface of p 1 , . . . , p k reduces to the convex hull of x p1 , . . . , x p k , and in case s = 1, the skin surface is the boundary of the union of the input weighted points.
Since weighted points are shrunk by a factor of s ∈ [0, 1] in the construction of the skin surface, the skin surface of a surface represented by medial balls is smaller than the original surface. A way to circumvent this problem is by prescaling the input weights w p1 , . . . , w p k by 1/s. The implied skin surface of a shape. For a fixed ball representation µ, and some given similarity transformation (ρ, A, b), a shape in Σ implies a surface. For its computation, it is necessary to reverse the transformations in Subsection 4.1. To give some details, assume that the shape in Σ is given by its tangent coordinates P µ = (c µ , r µ ). First note that the tangent projection (18) is injective, with inverse mapping tangent coordinates to normalized ball representations. Normalized ball representations are then aligned in R 3 by applying A −1 and scaling with 1/ρ. When reversing translation, note that the Helmert matrix H has a right inverseH ∈ R (k−1)×(k−1) . ApplyingH and translating the result by b, an (aligned) ball representation M consisting of k balls in R 3 is obtained, to which the skin surface is the implied boundary.

Region and edge based segmentation
For segmentation of voxel images, there are essentially two types of segmentation methods. Region based segmentation is applied to images, if the mean intensity of voxels inside the object to segment differs significantly from the mean intensity outside the object. If contrasts are low and objects are only separated by curves, gradient based segmentation is used. Here, we briefly discuss both approaches. Let Ω ⊂ R 3 be a bounded domain, and u : Ω → R an image intensity function.

Region based segmentation.
For a closed surface γ ⊂ Ω, let I(γ) ⊂ Ω be the inner part of γ and O(γ) ⊂ Ω the outer part of γ. The mean values of an image intensity function u on I(γ) and O(γ) are then given by A simplified version of the Mumford-Shah model introduced in [13] consists in minimization of the functional If we assume that the mean intensity of an object in image u differs significantly from the mean intensity of the region outside of this object, the functional I SM S is minimized by a submanifold γ which represents the boundary of the object to segment, and I(γ) is the object itself.

Edge based segmentation.
In edge based segmentation, an object and its closed boundary surface γ are implied by high gradients. In this case, it is common to minimize a functional which penalizes small gradients, as for example the Snakes energy introduced in [32]: Note that edge based energy functionals are usually minimized by active contour methods. They consist of evolving a small initial contour mesh towards the boundary of the object, which is a minimizer of the energy functional. An advantage of active contour methods is that the evolution converges to a minimizer close to the initial contour. However, since the evolution equations only contain local information, but shape prior is a global information, active contour methods cannot be used efficiently in this case.
In order to retain control over local minimizers of (31), we use the term αArea(γ) in (31) instead, thus forcing the contour closer to the initial contour.
In the presence of noisy data that e.g. arises in MRI or ultrasound imaging, minimization of both functionals (30) and (31) over γ is not well-posed. Furthermore, if the object to segment is partially occluded, minimizers of (30) and (31) might not only segment the object, but also parts of the background, and knowledge about the true shape of the object to segment has to be considered.
Here we use the mean ball representation µ of Equation (11) as a priori information and the Mahalanobis distance d Σ as defined in Subsection 4.2, formula (23) to measure the distance between two ball representations. We recall that with a given fixed ball representation µ, an equivalence class [M ] ∼ ST of ball representations can be uniquely represented by tangent coordinates P µ (M ) = (c µ (M ), r µ (M )), see (18). Tangent coordinates and a similarity transformation (ρ, A, b) ∈ R >0 × SO(3) + ×R 3 uniquely determine a skin surface γ = γ(P µ (M ), f ) as defined in Subsection 5.1, formula (28). We obtain the following functionals as a regularization of (30) resp. (31) for some β > 0: Note that the regularization term actually measures the Mahalanobis distance between M and µ, since P µ (µ) = 0.

Results
In this section we discuss the results obtained by applying our approach to test images. A dataset of synthetic torus voxel images was segmented using functional (32) in Section 6.1. Additionally, we conducted an experiment on real-world data by segmenting the human hippocampus in Section 6.2 using functional (33). Both functionals were minimized using the evolutionary algorithm CMA-ES (see [28,36]). This choice is mainly motivated by the fact that the calculation of the gradient for the functional is tough, since it requires finding the derivative with respect to the skin surface, which is given by the coordinates of the medial ball representation. Thus, instead of analytical differentation, numerical differentiation should be used for this purpose. However, the skin surface meshing process we used for surface construction ( [37]) does not guarantee that the resulting polyhedra always have the same structure or even the same number of vertices and faces, as it is for example the case by using spline based interpolation surfaces. Those irregularities in the resulting surface meshes complicate the numerical differentiation process and thus we decided to use gradient-free evolutionary minimization algorithms. We chose the evolutionary algorithm CMA-ES for the minimization of (32), because it has been successfully applied to a similar task in [17], where its superior behavior over two other common evolutionary minimization techniques was demonstrated. For more details on the minimization process, especially also concerning the choice of the minimization parameters, we also refer to [17].

Synthetic Examples
We now consider the segmentation of a synthetically created and corrupted torus voxel image. By this example we demonstrate not only the influence of the shape prior on the segmentation result, but in particular also the capability of the method to segment surfaces of compley geometry, i.e., a torus in this case. We generated 15 binary training voxel images containing elliptic tori with differing radii and positions (see Figure 4). Using the software ITK-SNAP [53] we computed the according input meshes Ω 1 , . . . , Ω 15 which are the actual input data for the shape statistics. Using the surface meshes, Scheme 1.1 has been used to compute a shape space and a Mahalanobis distance in it. Figure 3 shows the mean medial ball representation of the shape space. This shape information was then used to segment a torus corrupted by a large black strip as shown in Figure 5(a). Because high contrasts occur in the images, we chose (32) as the segmentation functional, which we minimized using the CMA-ES. The segmentation algorithm was run with k = 15 balls. The shrink factor s was included in the minimization process. As initial value for s it proved suitable to take high values if long and thin structures were expected and smaller values in case of more compact objects. The results of the segmentation process with different values for the regularization parameter β are shown in Figure 5(b)-(d). As expected, small values of β, i.e., segmentation with low influence of the regularization, results in improper segmentation results ( Figure 5(b)-(c)), while the right choice of β yields a good segmentation, as seen in Figure 5(d).
Note that the slight inaccuracies in the segmentation are a consequence of using skin surfaces as surface generation method, which often results in a surface of ball-like structure by nature. This effect can be minimized by using more medial balls to represent the shape. In principle, the medial ball representation presented above is able to describe complex shapes with bended or flat surfaces and sharp corners provided that the number of medial balls is sufficiently high. It has to be mentioned though that the time complexity of the segmentation substantially rises when using more input balls. This is mainly due to the skin surface meshing algorithm which has to be computed for every new instantiation of a medial ball representation. Minimization using evolutionary optimization algorithms like the CMA-ES involves multiple evaluations of the functional with different models at each step of the segmentation and thus the time complexity for each step rises significantly. However, each evaluation of the functional is performed independently of each other and thus parallelization can be applied which is especially boosting the performance on modern multi-core processor environments.
For the synthetic images above, 15 balls were used to describe the torus shape and the CMA-ES algorithm was applied with λ = 100 functional evaluations at each step. Considering 100 iterations for one segmentation, the average time for one step in this setting was 20 seconds on a standard desktop PC with an Intel Core 2 Q9550 2.83 Ghz Quad Core CPU and 4 GB of RAM. For the hippocampus model with 50 medial balls (cf. Section 6.2 below), an average time of 60 seconds per step was measured.

Real-World Example -Hippocampus Segmentation
In this section we test the presented algorithm on a real-world example from medical imaging. Our object of interest to segment is the human hippocampus. To this end, we use a database of MRI images of the human brain presented in [29] which includes 25 labelled samples of the right and left hippocampi respectively. Using the method described above we constructed a shape space out of the set of right hippocampi for all training images. Here, k = 50 balls were used to model the hippocampus. Afterwards we segmented a test image utilizing the shape prior with the edge based functional (33). This is due to the fact that region based segmentation is not applicable in MRI images where many regions with different contrast exist. To construct a gradient image of the hippocampus, we thresholded the test image according to the mean contrast values gained from the training images and computed a morphological gradient which is shown in Figure 7. The computation of the gradient image thus depends on the given training image data. We want to point out however that the presented methodology is not intended as a fully automatic medical imaging segmentation workflow but could prove to be effective in this setting when used in conjunction with standard medical imaging preprocessing steps.
To evaluate our segmentation results we used two common similarity coefficients in the image segmentation community: Jaccard and Dice's coefficient. Interpreting two binary segmentation images as sets A and B Jaccard and Dice's coefficient, J C and D C , are defined as follows: Additionally, to compare our results against a similar approach based on classical mreps (cf. [30,42]), we manually created a shape space of m-reps for the training images using a mesh of size 5 × 10 , i.e. 50 atoms as with the medial ball model. This shape prior was then used to perform image segmentation with a similar edge-based functional as (33). For a detailed outline of the segmentation algorithm using m-reps and a-priori shape knowledge we refer to the work of [17]. Figure 6 shows the mean medial models constructed for the m-rep and medial ball representation respectively. Figures 8 and 9 show the segmentation results using the m-rep and medial ball model. In Table 1 the similarity coefficients for the resulting segmentations are summarized. Both segmentations yield good results, however the medial ball model is able to outperform the m-rep based method by a small margin. We want to note however, that the m-rep approach does not include automatic shape space generation as it is the case with our method.  [30,42] with and without surface is shown. The bottom row corresponds to the mean medial ball representation of the hippocampus as presented in this work.

Conclusion
In this paper we presented a new approach for segmentation of 3D voxel images taking into account statistical shape information. The algorithm requires minimal user interactions for segmentation which is achieved by implementing a pipeline of algorithms. The pipeline involves the automatic generation of training shapes represented as medial ball representations that were automatically generated from a given set of input meshes. Afterwards the training shapes are labeled, and a shape space based on Procrustes statistics is established. The resulting shape space and the Mahalanobis distance are used as a prior in region-and edge based segmentation algorithms.
We did a proof-of-concept for our algorithm by evaluating its performance on a synthetic dataset of 15 randomly generated torus images. By utilizing a segmentation  Table 1: Segmentation evaluation results for the test image. The suggested medial ball algorithm performs better than the m-reps algorithm with respect to the two distance measures JC and DC . energy which includes statistical regularization using the PCA of the training data, we were able to segment a distorted torus image correctly. Additionally, this example shows the benefit of using only a set of medial balls as object model in combination with skin surfaces as surface generation method, which enables us to segment images with complex geometry.
Additionally, the algorithm was applied to a real-world dataset of human hippocampi in MRI images. A comparison with the segmentation using shape priors on the traditional m-rep model showed that our methodology is able to outperform m-rep based segmentation by a small margin.
The automatic generation of a medial ball object model from a set of training images is another advantage of our algorithm in contrast to the work of Pizer et al. (e.g. [42]), where expert segmentations have to be constructed by using a predefined medial ball model itself.
We emphasize that Siddiqi and Pizer [47] work with medial atoms while we work with medial balls, which allows for more flexibility in the objects to be represented.