A Bayesian Approach to Statistical Shape Analysis via the Projected Normal Distribution

This work presents a Bayesian predictive approach to statistical shape analysis. A modeling strategy that starts with a Gaussian distribution on the configuration space, and then removes the effects of location, rotation and scale, is studied. This boils down to an application of the projected normal distribution to model the configurations in the shape space, which, together with certain identifiability constraints, facilitates parameter interpretation. Having better control over the parameters allows us to generalize the model to a regression setting where the effect of predictors on shapes can be considered. The methodology is illustrated and tested using both simulated scenarios and a real data set concerning eight anatomical landmarks on a sagittal plane of the corpus callosum in patients with autism and in a group of controls.


Introduction
Kendall (1984) defined a shape as the geometric feature of an object that is invariant to rigid motions and global scaling. There has since been a growing literature devoted to developing both theory and methods for statistical shape analysis. Most approaches borrow traditional ideas from geometry and statistics in order to model and analyze shapes (e.g. Small, 1996), with a more recent trend focusing on techniques for closed curves, e.g. to describe the boundary of an object (Klassen et al., 2004; Srivastava et al., 2005; Kurtek et al., 2012; Cheng et al., 2015). These extensions go back to the original question: On which space should we define the probability models required for shape analysis?
There are at least three spaces on which models can be defined (Dryden and Mardia, 2016): 1) The configuration space, represented by the Euclidean space R^{mp}, where configurations Z_1, ..., Z_n of m-dimensional shapes are depicted by p landmarks. This is the space where raw objects are represented. 2) The pre-shape space, typically a hypersphere of unit radius in m(p − 1) dimensions, where configurations are represented after translations and scalings have been removed. We denote the configurations in the pre-shape space by Z†_1, ..., Z†_n. 3) The shape space, where configurations are invariant under location, rotation and scaling. We denote configurations in this space by Z*_1, ..., Z*_n. One can go from the configuration space to the pre-shape space using the transformation Z† = HZ/‖HZ‖, where H is the Helmert sub-matrix, and from the configuration space to the shape space via Procrustes analysis (Gower, 1975; Goodall, 1991).
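As a concrete illustration of the configuration-to-pre-shape map, the following sketch (in Python rather than the R used for the paper's implementation; the helper names are ours) builds the Helmert sub-matrix H and computes Z† = HZ/‖HZ‖ for a planar configuration. Because every row of H is orthogonal to the vector of ones, HZ is translation-invariant, and dividing by the norm removes scale.

```python
import math

def helmert_submatrix(p):
    """(p-1) x p Helmert sub-matrix: orthonormal rows, each orthogonal to 1_p.

    Row j (j = 1, ..., p-1) is (h_j, ..., h_j, -j*h_j, 0, ..., 0) with
    h_j = -1/sqrt(j(j+1)), following Dryden and Mardia's convention."""
    H = []
    for j in range(1, p):
        h = -1.0 / math.sqrt(j * (j + 1))
        H.append([h] * j + [-j * h] + [0.0] * (p - j - 1))
    return H

def preshape(Z):
    """Map a p x m configuration Z to the pre-shape sphere: HZ / ||HZ||."""
    p, m = len(Z), len(Z[0])
    H = helmert_submatrix(p)
    HZ = [[sum(H[r][k] * Z[k][c] for k in range(p)) for c in range(m)]
          for r in range(p - 1)]
    norm = math.sqrt(sum(v * v for row in HZ for v in row))
    return [[v / norm for v in row] for row in HZ]
```

Translating every landmark of Z by the same vector leaves `preshape(Z)` unchanged, and the result always has unit Frobenius norm, matching the description of the pre-shape sphere above.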
Models on the pre-shape space are widely used due to the availability of the complex elliptical family of distributions, which have the property of being rotation-invariant (see, e.g. Micheas et al., 2006). Popular members of this family are the complex Bingham distribution (Kent, 1994) and the complex Watson distribution (Mardia and Dryden, 1999), with the former being a conjugate family to the latter. See also Leu and Damien (2014) for a novel Bayesian analysis of the complex Bingham distribution, and Mardia et al. (2013) for a model for the alignment of two unlabeled landmark configurations.
The rotation-invariance property can be seen as a drawback for the statistical identification of the underlying parameters. The lack of identifiability may hinder the correct interpretation of the parameters and make a Bayesian approach difficult. Indeed, the parameters of these distributions are not explicitly related to the corresponding distributional moments; thus, for example, extensions to the regression setting are not straightforward. In order to overcome these drawbacks of shape analysis on the pre-shape space, we use a statistical model on the configuration space with the similarity transformations "integrated out", i.e. a marginal approach (Dryden and Mardia, 2016).
An alternative way to remove similarity transformations is the use of Bookstein coordinates (Bookstein, 1986), which allows us to use well-known models on the configuration space and then project them onto the shape space. Following this idea, Kume and Welling (2010) proposed a maximum-likelihood estimation procedure; however, they were not concerned with the identifiability of the model. In this paper we undertake a Bayesian analysis of the projected normal distribution on the shape space that accounts for the identifiability of the model and facilitates parameter interpretation. We also derive some new properties of the projected normal distribution and extend the model to consider the effect of predictors on shapes.
The rest of the paper is organized as follows. In Section 2 we introduce the projected normal distribution on the shape space and discuss both the data structure and the transformations, including Bookstein's coordinates. Some useful properties of the projected normal distribution are presented in Section 3. Section 4 is concerned with the identifiability analysis. To this end, constraints on the parameter space and a strategy to fix some of the parameters using the observed shape variables are proposed. Based on these constraints and a suitable reparameterization, in Section 5 we develop a Gibbs sampling algorithm for posterior inference. In Section 6, we extend the model to a regression setting. Section 7 illustrates our proposal using simulated scenarios; in particular, a Monte Carlo study allows us to visualize the asymptotic behavior of our density estimator of the projected normal distribution. In Section 8 we fit the model to a data set consisting of eight anatomical landmarks on a sagittal plane of the corpus callosum in patients with autism and in a group of controls. Section 9 provides some concluding remarks.

The projected normal distribution on the shape space
The projected normal distribution (Dryden and Mardia, 1991), is obtained by the projection of a p-dimensional normal distribution onto a manifold M ⊂ R p . A common example is the projection of a bivariate normal distribution on a circle, used for the analysis of directional data (Mardia and Jupp, 2000;Jammalamadaka and SenGupta, 2001). Let us describe the data structure required for shape analysis and the transformations needed to derive the projected normal distribution on the shape space.
Let Z be a p × 2 random configuration matrix, and denote by X = (x_1, y_1, ..., x_p, y_p)^T the 2p-dimensional landmark vector with elements from Z. We assume X ~ N_{2p}(ν, Ω), with N_d denoting a d-dimensional Gaussian distribution. We remove the effects of location, rotation and scale from the landmarks using the method proposed by Bookstein (1986).

Location
In order to remove the location effect, we consider the linear transformation

X̃ = LX,   (1)

where L is the 2p × 2p matrix that subtracts the first landmark from each of the remaining ones. This transformation shifts all landmarks, except the first one, by the vector (x_1, y_1). The random vector X̃ is still normally distributed on R^{2p}, with mean μ̃ = Lν and covariance matrix Σ̃ = LΩL^T. If, in (1), we shifted the first landmark too, then it would be fixed at the location (0, 0), yielding a singular normal distribution. Instead, we marginalize with respect to the first landmark (x_1, y_1) of X̃; the corresponding (p − 1) shifted landmarks are denoted by X* = (x*_2, y*_2, ..., x*_p, y*_p)^T, so X* ∈ R^{2(p−1)}, where x*_i = x_i − x_1 and y*_i = y_i − y_1. Then X* ~ N_{2(p−1)}(μ, Σ), where μ = (μ_{x_2}, μ_{y_2}, ..., μ_{x_p}, μ_{y_p})^T and Σ is the sub-matrix of Σ̃ corresponding to the last 2(p − 1) coordinates of X̃.
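A minimal sketch of this location step (Python; the function name is ours): subtract the first landmark from the others and drop it, yielding X* ∈ R^{2(p−1)}.

```python
def remove_location(landmarks):
    """Shift every landmark by the first one and drop it:
    x*_i = x_i - x_1, y*_i = y_i - y_1, for i = 2, ..., p."""
    (x1, y1) = landmarks[0]
    return [(x - x1, y - y1) for (x, y) in landmarks[1:]]
```

For example, `remove_location([(1.0, 1.0), (2.0, 3.0), (0.0, 0.0)])` returns the two shifted landmarks `[(1.0, 2.0), (-1.0, -1.0)]`; marginalizing rather than fixing the first landmark at the origin is what avoids the singular distribution mentioned above.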

Rotation and scale
To remove the effect of rotation and scale, we apply the following transformation:

u_i = (x*_2 x*_i + y*_2 y*_i)/(x*_2² + y*_2²),   v_i = (x*_2 y*_i − y*_2 x*_i)/(x*_2² + y*_2²),   i = 3, ..., p.   (2)

Note that the second landmark remains the same. If the transformations defined for u_i and v_i (i = 3, ..., p) were applied to the second landmark as well, it would be moved to the location (1, 0), giving rise to a singular distribution on R^{2(p−1)}. The vector U = (u_3, v_3, ..., u_p, v_p)^T is known as the Bookstein coordinates, and it lies in the shape space. Letting H = (h_x, h_y)^T = (x*_2, y*_2)^T and det(J) = (h²_x + h²_y)^{p−2} denote the Jacobian of the inverse transformation, the distribution of (H, U) is given by

f(h, u) = N_{2(p−1)}(x*(h, u); μ, Σ) (h²_x + h²_y)^{p−2},   (3)

where x*(h, u) denotes the inverse transformation x*_2 = h_x, y*_2 = h_y, x*_i = h_x u_i − h_y v_i, y*_i = h_x v_i + h_y u_i, for i = 3, ..., p.
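Identifying each landmark (x_i, y_i) with the complex number z_i = x_i + i y_i, the Bookstein construction is simply the complex ratio (z_i − z_1)/(z_2 − z_1). A small Python sketch (names ours), which is invariant under any similarity transformation z → cz + t of the raw configuration:

```python
def bookstein_coords(landmarks):
    """Bookstein coordinates: (u_i, v_i) = (z_i - z_1)/(z_2 - z_1), i = 3, ..., p."""
    z = [complex(x, y) for (x, y) in landmarks]
    base = z[1] - z[0]            # this is (x*_2, y*_2): the latent scale/rotation H
    coords = []
    for zi in z[2:]:
        w = (zi - z[0]) / base    # complex division removes rotation and scale
        coords.append((w.real, w.imag))
    return coords
```

Translating, rotating or rescaling the input landmarks leaves the output unchanged, which is exactly the invariance that places U in the shape space.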

Properties of the projected normal distribution
The next result gives conditions for the symmetry of the projected normal distribution.
The parameters affect the shape of the projected normal distribution in different ways. For example, in panel (b), the parameters μ 1 and μ 2 induce asymmetry. This asymmetry affects the tails of the distribution. In panel (c), the parameter Σ 12 induces asymmetry near the mode of the distribution and the tails remain mostly unaffected. Thus, certain choices of the parameters μ and Σ 12 induce different kinds of asymmetry, while the parameter Σ 22 controls dispersion and orientation of the distribution.
We now exhibit an interesting relationship between the projected normal and the Student t distributions.
The proof of Theorem 2 can be found in the Supplementary Material. For p = 3, Theorem 2 shows that U follows a bivariate Student t distribution. In this case, E(U) = 0_2, while Var(U) is undefined, as there are only 2 degrees of freedom. For p ≥ 4, the projected normal distribution with μ = 0_d and Σ = I_d is a rescaled version of a standard Student t distribution. Thus, when landmarks follow a normal distribution with μ = 0_d and Σ = I_d in the configuration space, the statistical shape analysis reduces, after the Bookstein transformation, to that of a rescaled multivariate Student t distribution.

Figure 1: The asymmetry in (b) mainly affects the tails of the distribution, while in (c) the region near the mode is more affected. In (d), a different kind of dispersion combined with asymmetry is illustrated.
The invariance properties of the projected normal distribution make it appealing for statistical shape analysis. However, different sets of parameters (μ, Σ) can generate the same model, leading to lack of parameter identifiability. This is an important issue and we discuss it in some detail in the next section.

Identifiability analysis
The following result states in what sense the distribution (5) is not identifiable.
The proof of Proposition 1 is given in the Supplementary Material. This result states that the projected normal distribution is invariant within the set of rescaled and/or rotated versions of μ and Σ, so that the actual parameter space is the equivalence class

[(μ, Σ)] = {(rRμ, r²RΣR^T) : r > 0, R ∈ SO(2)},

where SO(2) denotes the group of rotations in the plane. This clearly complicates the interpretability of the parameters and the comparability between different models.
The distribution on the left-hand side of (7) corresponds to the projection of the random vector X*_r = rRX*. If we apply transformation (2) to X*_r, the Bookstein coordinates U remain the same; only H is affected by the transformation. Thus the lack of identifiability is related to H = (x*_2, y*_2)^T, the unobserved variables that determine the scale and orientation of the remaining coordinates of X*. In order to obtain an identifiable model, we constrain the parameters that control the location and variability of (x*_2, y*_2). To this end, consider the following partition of the parameters:

μ = (μ_1^T, μ_2^T)^T,   Σ = [Σ_11  Σ_12; Σ_21  Σ_22],

where the 2 × 1 vector μ_1 and the 2 × 2 matrix Σ_11 are fixed and the remaining parameters are estimated. We fix the value of Σ_11 as in Hernandez-Stumpfhauser et al. (2016), where an identifiability constraint is derived for a p-dimensional projected normal distribution with support on the unit hypersphere.
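To see the mechanism behind this non-identifiability numerically, the following sketch (Python; function names ours) applies the similarity map x* → rRx* blockwise to the centered landmark pairs and recomputes the Bookstein coordinates: U is unchanged, so only the latent H = (x*_2, y*_2) can absorb r and R.

```python
import math

def u_from_xstar(xstar):
    """Bookstein coordinates from centered pairs (x*_2, y*_2), ..., (x*_p, y*_p)."""
    h = complex(*xstar[0])                 # latent H = (x*_2, y*_2)
    return [(w.real, w.imag)
            for w in (complex(x, y) / h for (x, y) in xstar[1:])]

def rotate_scale(xstar, r, theta):
    """The map x* -> r R x*, applied to each coordinate pair as a complex scalar."""
    c = r * complex(math.cos(theta), math.sin(theta))
    return [((c * complex(x, y)).real, (c * complex(x, y)).imag)
            for (x, y) in xstar]
```

Because the likelihood of U is flat along these orbits, constraining μ_1 and Σ_11 as described above is what restores identifiability.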
The proof of Proposition 2 is given in the Supplementary Material. We fix the value of μ 1 by means of the procedure described in the following remark.
Remark 1. The value of μ_1 can be fixed at μ_1 = (x̄*_3, ȳ*_3)^T, where x̄*_3 and ȳ*_3 are the sample means of the inverse transformations x̄*_3 = ū_3 x*_2 − v̄_3 y*_2 and ȳ*_3 = ū_3 y*_2 + v̄_3 x*_2, for values of x*_2 and y*_2 obtained under the constraint Var([x*_3, y*_3]) ≈ I_2. This result follows from the constraint Σ_11 = I_2, since any procedure that fixes the value of μ_1 based on the observed shape variables must respect the constraint imposed on Σ_11. The values of x*_2 and y*_2 can be obtained with a simple optimization algorithm; see Algorithm 1 and the corresponding discussion in the Supplementary Material. Once μ_1 and Σ_11 have been fixed, we are in a position to derive a Bayesian approach for inference on the parameters μ_2, Σ*_22 and γ.

Posterior inference
Based on the reparameterization (9), we have Θ = (μ_2, Σ*_22, γ). Posterior inference on Θ from the marginal density (5) is not straightforward, but inference based on the joint density (3) of (H, U) is relatively simple provided that H is regarded as a latent variable. Thus, the augmented model has density

f(h, u | Θ) = N_{2(p−1)}(x*(h, u); μ, Σ) (h²_x + h²_y)^{p−2},   (10)

where X* = X*(H, U). The model is completed with the conjugate prior specification (11). To derive a Gibbs sampling algorithm for model (10), first consider the partition X* = (X*_1, X*_2)^T, where the dimensions of X*_1 and X*_2 are 2 × 1 and 2(p − 2) × 1, respectively. Now, using the reparameterization (9) and the identifiability constraints on Σ_11 and μ_1, the joint density (3) can be factorized as

f(x*_1, x*_2) = f(x*_2 | x*_1) f(x*_1),   (12)

with X*_1 ~ N_2(μ_1, I_2) and X*_2 | X*_1 ~ N_{2(p−2)}(μ_2 + γ^T(x*_1 − μ_1), Σ*_22). Using (12), we can now derive the following Gibbs sampler algorithm.

Extension to shape regression
We now propose an extension of the projected normal distribution to the shape regression setting, where the effect of predictors enters through the parameter μ. Specifically, we let

X* ~ N_{2(p−1)}(μ_z, Σ),   (14)

where μ_z = (μ_1^T, (βz)^T)^T, β is a 2(p − 2) × q matrix of regression coefficients and z is a q-dimensional vector of predictors. The conjugate prior (11) can be readily extended to this regression setting. The model (14) respects the identifiability constraints on Σ and μ; thus, only the coordinates of the last p − 2 landmarks of μ are affected by the value of the predictor z.
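A sketch of how the predictor enters the mean under the identifiability constraint (Python; names ours): the fixed head μ_1 is left untouched and only the remaining 2(p − 2) coordinates depend on z through βz.

```python
def regression_mean(mu1, beta, z):
    """mu_z = (mu1, beta z), where beta is 2(p-2) x q and z is a q-vector."""
    tail = [sum(b_jk * z_k for b_jk, z_k in zip(row, z)) for row in beta]
    return list(mu1) + tail
```

With z = (1, group) coding an intercept plus a two-level group indicator, as in the example below, the two groups share the fixed block μ_1 and differ only in the tail of the mean vector.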
In this case, instead of updating μ_2 in Algorithm 2, we update β. Under the conjugate prior, the full conditional distribution for the j-th column of β is again multivariate normal.

Examples with simulated data
Here we provide two illustrations and a simulation study. The first example illustrates the estimation process without predictors, whereas the second considers the regression model. For the first exercise we simulated random triangles, each described using three landmarks.
We considered two cases: Scenario 1, where random triangles were simulated from a normal distribution in the configuration space with mean μ_1 = (1, −1, −3, 0.5, 3.5, 5)^T and covariance matrix Σ_1 = 0.2 I_6; and Scenario 2, where triangles were simulated from a normal distribution with mean μ_2 = μ_1 and block-diagonal covariance matrix Σ_2 = diag{B_1, B_2, B_3}. The difference between these scenarios is the non-null correlation between the x and y coordinates of the landmarks in Scenario 2, induced by the 2 × 2 blocks B_j. Figure 2 shows the landmarks together with the corresponding Bookstein coordinates. The algorithm was coded in R (R Core Team, 2016). Figure 3 shows the true and estimated densities for Scenarios 1 and 2 under vague priors and illustrates the effect of the correlated landmark coordinates in Scenario 2, where the density becomes asymmetric and deformed. The density is well estimated with a sample size of 300. To study the impact of a small sample size on the estimation of the projected normal distribution, we performed a Monte Carlo study. Specifically, we simulated from the true densities of Scenarios 1 and 2, using sample sizes ranging from 30 to 300. The Hellinger distance and the Kullback-Leibler divergence between the true and the estimated densities were calculated for each sample. Figure 4 shows how these metrics decrease to zero as the sample size increases.
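The Scenario 1 data generation can be sketched as follows (Python with the stdlib `random` module, whereas the paper's implementation was in R; the seed and helper names are ours): sample landmark vectors from N(μ_1, 0.2 I_6) and map each triangle to its Bookstein coordinate via the complex ratio.

```python
import math
import random

MU1 = [1.0, -1.0, -3.0, 0.5, 3.5, 5.0]   # (x_1, y_1, x_2, y_2, x_3, y_3)

def simulate_triangles(n, mu=MU1, var=0.2, seed=1):
    """n triangles with independent N(mu_j, var) coordinates (Sigma = 0.2 I_6)."""
    rng = random.Random(seed)
    return [[rng.gauss(m, math.sqrt(var)) for m in mu] for _ in range(n)]

def to_bookstein(x):
    """Bookstein coordinate (u_3, v_3) of one triangle."""
    z1, z2, z3 = complex(x[0], x[1]), complex(x[2], x[3]), complex(x[4], x[5])
    w = (z3 - z1) / (z2 - z1)
    return (w.real, w.imag)

sample = [to_bookstein(x) for x in simulate_triangles(300)]
```

Since the Bookstein image of the mean triangle is (−1 − 27.75i)/18.25, approximately (−0.05, −1.52), the simulated shape coordinates cluster around that point, with spread governed by the configuration-space variance.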
The second illustration is designed to assess the changes in the projected normal distribution as the predictor varies. It also allows us to explore how the model discriminates between groups. We considered a two-level predictor z affecting three landmarks. Specifically, z takes values in the set {1 : sick, 0 : healthy}. Here, μ_i = β_0 + β_1 z_i, where β_0 = (1, −1, −3, 0.5, 3.5, 5)^T and β_1 = (0.2, 0.95, −0.8, 0.95, −2.3, 2.1)^T. For each level of the predictor we used a sample size of n_1 = n_2 = 150, and we set Σ = diag{A_1, A_2, A_3}. Figure 5 shows the true and estimated densities for the sick (z = 1) and healthy (z = 0) groups. Location, shape and orientation are all affected by the predictor. The estimated densities are very similar to the true ones.
Note that we can classify individuals as sick or healthy, for example based on the posterior probability P(sick | U). Table 2 shows the results of this classification using the prior probabilities π(sick) = π(healthy) = 0.5. Using the simple rule "classify the individual as sick if and only if P(sick | U) > 0.5", the false-positive rate was 16%, while the false-negative rate was 14.7%. Thus, the shape regression model can be used as a classification tool.

Figure 2: In Scenario 1, three uncorrelated landmarks in the configuration space were simulated; panel (a). After the Bookstein transformation, using landmarks 1 and 2, the shape information is represented in panel (b). In Scenario 2, the x and y coordinates of each landmark are correlated; panel (c). After the transformation, the shape information is displayed in panel (d).
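The classification rule can be sketched as a two-class Bayes computation (Python; we use isotropic bivariate normal densities as stand-ins for the fitted projected normal shape densities, so the specific densities and their parameters are our assumptions):

```python
import math

def iso_normal_pdf(u, mean, var):
    """Isotropic bivariate normal density; a stand-in for a fitted shape density."""
    d2 = (u[0] - mean[0]) ** 2 + (u[1] - mean[1]) ** 2
    return math.exp(-d2 / (2.0 * var)) / (2.0 * math.pi * var)

def posterior_sick(u, f_sick, f_healthy, prior_sick=0.5):
    """P(sick | U = u) via Bayes' rule with class priors pi(sick), pi(healthy)."""
    num = prior_sick * f_sick(u)
    return num / (num + (1.0 - prior_sick) * f_healthy(u))

f_s = lambda u: iso_normal_pdf(u, (1.0, 0.0), 0.5)    # hypothetical "sick" density
f_h = lambda u: iso_normal_pdf(u, (-1.0, 0.0), 0.5)   # hypothetical "healthy" density
```

An observation is then labeled sick if and only if `posterior_sick(u, f_s, f_h)` exceeds 0.5, which is the rule used to produce the classification rates above.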

Application
The corpus callosum is a wide, flat bundle of neural fibers beneath the cortex of the brain at the longitudinal fissure. Its function is to connect the left and right cerebral hemispheres and facilitate inter-hemispheric communication. Many studies have reported a reduction in the size or volume of the corpus callosum in patients diagnosed with Autism Spectrum Disorder (ASD); see, e.g., Piven et al. (1997), Hardan et al. (2000), Brambilla et al. (2003), Vidal et al. (2006), Hardan et al. (2009) and Prigge et al. (2013). However, only a few of these studies have considered possible differences in the shape of the corpus callosum between patients with autism and healthy individuals. Indeed, potential differences in shape might reveal focal abnormalities, helping to pinpoint which cortical regions are involved in the pathophysiology of autism (cf. Casanova et al., 2011). With this in mind, we produced a data set consisting of eight anatomical landmarks that describe the shape of the corpus callosum, both in children diagnosed with autism and in a control group. For our study, the landmarks were located in the sagittal plane view of the corpus callosum using magnetic resonance images (MRI) of the brain. These images were obtained from the public repository ABIDE (Autism Brain Imaging Data Exchange) (Di Martino et al., 2014).
The MRI images were downloaded and stored in NIfTI format, and then loaded into R using the oro.nifti library (Whitcher et al., 2011). Once in R, the images were explored using the EBImage library (Pau et al., 2010). During this exploration, a trained technician, working under the supervision of a radiologist, selected a sagittal view of the corpus callosum for each patient. Each sagittal view was saved as an image and loaded into the StereoMorph library (Olsen and Westneat, 2015) in order to locate and digitize the landmarks. Figure 6 shows the sagittal plane view of the corpus callosum of a typical patient, together with the location of the eight anatomical landmarks. Locating the landmarks can be a difficult task, since there is considerable variability between patients (see the images in the Supplementary Material). Table 3 shows the sample sizes for each level of disease and sex. The ages of the patients range from 6 to 13 years.
We used landmarks 1 and 2 to perform the Bookstein transformation, and then fitted models that included disease, sex and age as predictors; we also included some interactions. In particular, we considered the following three models: model 1, including the main effects only; model 2, which adds the interaction between disease and age; and model 3, which adds the interaction between sex and disease. In all three models we used the reference-cell parametrization, i.e. the parameter β_1 represents the baseline given by female patients of the control group; β_2 quantifies the effect of the disease (1 : yes, 0 : no); β_3 corresponds to sex, coded as (1 : male, 0 : female). Additionally, in model 2, β_4 quantifies the effect of the disease×age interaction; similarly, in model 3, β_4 represents the effect of the sex×disease interaction. In each of these three models we used vague priors on all of the parameters.

Table 3: Sample size for each condition.

          Male   Female   Total
Autism     153       24     177
Control    145       37     182
Total      298       61     359
For model comparison, we computed the Conditional Predictive Ordinates, CPO_i (Chen et al., 2000), summarized in the log-pseudo marginal likelihood statistic LPML = Σ_{i=1}^n log(CPO_i) (Geisser and Eddy, 1979). The values of the LPML were −1555.39, −1559.167 and −1550.228 for models 1, 2 and 3, respectively. Even though model 3 attained the highest LPML, the 95% credible intervals for the sex×disease interaction coefficients suggest these terms can be removed from the model. Since there does not seem to be an effect of the interaction terms in either model 2 or model 3, we opted for model 1, which has the second highest LPML. The posterior mean of each parameter, together with the corresponding credible intervals for model 1, are shown in Table 4. These results suggest that age does have an effect on the shape of the corpus callosum for landmarks 3 to 8. On the other hand, all the credible intervals for the coefficients of disease and sex include the value zero, suggesting a lack of effect of those variables on the shape of the corpus callosum.
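The model-comparison statistic can be computed from per-observation likelihood draws. A minimal sketch (Python; the input layout is our assumption): CPO_i is the harmonic mean of f(y_i | θ^(m)) over the M posterior draws, and LPML sums the log CPOs.

```python
import math

def cpo(lik_draws):
    """CPO_i: harmonic mean of the likelihood of observation i over posterior draws."""
    return len(lik_draws) / sum(1.0 / l for l in lik_draws)

def lpml(lik_matrix):
    """LPML = sum_i log(CPO_i), where lik_matrix[i][m] = f(y_i | theta^(m))."""
    return sum(math.log(cpo(row)) for row in lik_matrix)
```

A higher (less negative) LPML indicates better predictive fit, which is how models 1 to 3 are ranked above.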
With the aim of visualizing the distributions of some of the landmarks in the shape space, we computed the marginal posterior predictive distributions with model 1. Figure 7 shows these predictive distributions for comparison purposes.
The distributions of cases and controls for females and males all look quite similar, while the predictive distributions for the female-control and male-autism groups show small differences. As expected, larger differences in the shape of the corpus callosum are observed with changes in age. Furthermore, our results agree with those of He et al. (2010), who used nine anatomical landmarks to compare the shape of the corpus callosum in autistic patients and controls. That said, the results for our data set are not conclusive enough to highlight significant differences in shape due to the disease. Indeed, He et al. (2010) did not find significant differences in their landmark-based analyses. However, by comparing the boundaries of the corpus callosum after automatic or semi-automatic segmentation of the MRI images, they found significant differences in the distance between the interior genu and the posterior-most section of the corpus callosum. Finally, it is important to mention that our regression model allows us to perform comparisons controlling for the effect of covariates such as sex and age. Here, age is an important factor that helps explain the observed differences in the shape of the corpus callosum.

Discussion
Our approach offers a simple way to perform statistical shape analysis. The definition of a probability model in the configuration space and its projection onto the shape space allows us to work with a well-known model in the configuration space. This approach, together with the constraints to attain identifiability, facilitates the Bayesian analysis of the model and the inclusion of predictors. In the particular case of the projection of a standard normal distribution, we have found connections with the Student t distribution that give us some insight into the tail behavior of the projected normal distribution.
Typically, the parameters of a projected distribution have a different interpretation than in the original space. In our case, μ (representing the mean in the configuration space) becomes an asymmetry parameter in the shape space and also affects the tails of the distribution. Similarly, Σ (the variance-covariance matrix in the configuration space) represents the dispersion in the shape space and also affects the symmetry of the distribution. In our extension to the regression case, we are indexing the parameter μ with the predictors. Thus, the tails of the projected normal distribution are more affected by the changes in the predictors than other aspects of the distribution.
Even though the use of the Bookstein coordinates induces correlation in the shape variables, our approach estimates the full covariance structure, so this correlation does not pose a problem. The proposed strategy can also be applied to other distributions on the configuration space and their projections onto the shape space. We are currently studying the use of infinite mixture models on the configuration space, which would allow us to model data sets of landmarks with heavy tails or asymmetries in the configuration space.

Supplementary Material
Supplementary Material for 'A Bayesian approach to statistical shape analysis via the projected normal distribution' (DOI: 10.1214/18-BA1113SUPP; .pdf). The online Supplementary Material contains the proofs of Theorems 1 and 2, as well as those of Propositions 1 and 2. It also contains Algorithm 1 from Section 4 and some images relating to the application described in Section 8.