Analysis of AneuRisk65 data: k -mean alignment

: We describe the k -mean alignment procedure, for the joint alignment and clustering of functional data and we apply it to the analysis of the AneuRisk65 data. Thanks to the eﬃcient separation of the variabil- ity in phase variability and within/between clusters amplitude variability, we are able to discriminate subjects having aneurysms in diﬀerent cerebral districts and identifying diﬀerent morphological shapes of Inner Carotid Arteries, unveiling a strong association between arteries morphologies and the aneurysmal pathology.


k-mean alignment
We here summarize the k-mean alignment procedure that we shall use in Section 2 to analyze the AneuRisk65 data described in Sangalli, Secchi and Vantini (2013). This procedure, introduced in Sangalli et al. (2010), is able to efficiently align and cluster in k groups a set of curves. The procedure can be seen as a continuous alignment with k ≥ 1 templates, or equivalently as a k-mean clustering of curves with warping allowed. In fact, if the number of clusters k is set equal to 1, the algorithm implements the Procrustes aligning procedure described in Sangalli et al. (2009), whereas, if no alignment is allowed, it implements a functional k-mean clustering of curves (see, e.g., Tarpey and Kinateder (2003)).
The described procedure merges the goal of alignment, i.e., decoupling phase and amplitude variability, with the goal of k-mean clustering, i.e., decoupling within and between-cluster amplitude variability. K-mean alignment is also able to disclose clustering structures in the phase, even though this is not one of the stated goals of the procedure. Overall, the technique has the capacity to point out features that can neither be captured by simple k-mean clustering without alignment nor by simple curve alignment without clustering.

A problem-specific definition of phase and amplitude variabilities
Consider a set F of (possibly multidimensional) curves f (s) : R → R d . The choice of the considered space F depends on the data features and on the regulari ty required by the data analysis; typical examples are L 2 spaces or a subset of L 2 composed of curves in L 2 having derivatives up to a given order also in L 2 . Continuously aligning f 1 ∈ F to f 2 ∈ F means finding a warping function h(s) : R → R, of the abscissa parameter s, such that the two curves f 1 •h and f 2 are the most similar or equivalently the least dissimilar (with (f • h)(s) := f (h(s))); see, e.g., Ramsay and Silverman (2005). It is thus necessary to specify a similarity index ρ(·, ·) : F × F → R that translates the concept of similarity between two functional data for the problem under study (or equivalently a dissimilarity index E(·, ·) : F × F → R that measures the dissimilarity between two functional data), and a class H of warping functions h (such that f •h ∈ F , for all f ∈ F and h ∈ H) that indicates which transformations for the abscissa are admissible for the considered problem. Aligning f 1 to f 2 , according to (ρ, H), means finding h * ∈ H that maximizes ρ(f 1 •h, f 2 ) (or equivalently that minimizes E(f 1 •h, f 2 )). The chosen index of similarity/dissimilarity and class of warping functions define univocally what are phase and amplitude variabilities for the problem being analyzed. The choice of the couple (ρ, H), or equivalently (E, H), must hence be problem-specific. The couple similarity/dissimilarity index and class of warping functions must satisfy the following properties, that we deem minimal requirements for coherence: (a) The similarity index ρ is bounded from above, with maximum value equal to 1. Moreover, ρ is Equivalently, the dissimilarity index E is bounded from below, with minimal value equal to 0, and the properties above are suitably reformulated. (b) The class of warping functions H is a convex vector space and has a group structure with respect to function composition •. (c) The index ρ and the class H are coherent in the sense that, if two curves f 1 and f 2 are simultaneously warped by the same warping function h ∈ H, their similarity does not change and the same property can of course be formulated for the dissimilarity E. This guarantees that it is not possible to obtain a fictitious increment of the similarity between two curves f 1 and f 2 by simply warping them simultaneously to f 1 • h and f 2 • h. This property is also referred as the parallel-orbit property or isometry of the action of H (e.g., Wu and Srivastava, 2014).
Together, (b) and (c) imply the following property (d) For all h 1 and h 2 ∈ H, This means that a change in similarity between f 1 and f 2 obtained by warping simultaneously f 1 and f 2 , can also be obtained by warping the sole f 1 or the sole f 2 .
Many indices for measuring similarity (dissimilarity) between functions have been considered in the literature on functional data analysis (see, e.g., Ferraty and Vieu, 2006, for a proficient mathematical introduction to the issue). Table 1 reports some possible choices of couples (dissimilarity index, class of warping functions) that satisfy properties (a)-(d). Equivalent similarity indices can of course be considered. For instance, the normalized Pearson correlation f1,f2 ||f1|| 2 ||f2|| 2 is a similarity index naturally induced by the dissimilarity (semimetric) || f1 ||f1|| − f2 ||f2|| ||, where ·, · denotes the inner product in L 2 and || · || the corresponding norm. A multivariate extension of the similarity 2 || ||, is introduced in Section 2 to analyze the AneuRisk65 data set; see, eq. (1.3). The class of warping functions indicated in Table 1 Examples of coherent couples (dissimilarity index, class of warping functions) satisfying property (c).f denotes the spatial mean of the curve f ; f ′ denotes the first derivative of f ; sign denotes the sign function, i.e., sign( (1.1) and the more general class of diffeomorphisms H diffeomorphism , composed by increasing functions that are smooth and have a smooth inverse. The case where no alignment is performed corresponds to the special case H identity = {h : h(t) = t}.

Aligning to k templates
Consider the problem of aligning and clustering a set of n curves {f 1 , . . . , f n } ⊂ F with respect to a set of k template curves µ = {µ 1 , . . . , µ k } ⊂ F . For each template curve µ j , define its domain of attraction and the labeling function λ(µ, f ) = min{r : f ∈ ∆ r (µ)}. Thus µ λ(µ,f ) indicates the template whose f can be best aligned to and hence λ(µ, f ) indicates the cluster that f should be assigned to.
If the k templates µ = {µ 1 , . . . , µ k } were known, then aligning and clustering the set of n curves {f 1 , . . . , f n } with respect to µ would simply mean assigning f i to the cluster λ(µ, f i ) and aligning it to the corresponding template µ λ(µ,fi) , for i = 1, . . . , n. Here we are interested in the more complex case when the k templates are unknown. Ideally, if our aim is aligning and clustering the set of n curves {f 1 , . . . , f n } with respect to k unknown templates, we should first solve the following optimization problem for any other set of k templates ψ = {ψ 1 , . . . , ψ k } ⊂ F and any other set of n warping functions g = {g 1 , . . . , g n } ⊂ H, and then, for i = 1, . . . , n, The optimization problem (i) describes a search both for the set of optimal k templates, and for the set of optimal n warping functions. Note that the solution (µ, h) to (i) has mean similarity 1 n n i=1 ρ(µ λ(µ,fi) , f i • h i ) equal to 1 if and only if it is possible to perfectly align and cluster in k groups the set of n curves, i.e., if and only if there exists h = {h 1 , . . . , h n } ⊂ H and a partition for all i and j belonging to the same element of P.
The non-linear optimization problem (i) is not analytically solvable in its complete generality. For this reason, in Sangalli et al. (2010) we proposed to simultaneously deal with (i) and (ii) via a k-mean alignment algorithm that iteratively alternates template identification steps, assignment and alignment steps and normalization steps. In the template identification step, we estimate the set of k templates associated to the k clusters identified at the previous assignment and alignment step. The jth template can be identified as the curve µ j , in some set of curves C, that maximizes the total similarity within the jth cluster: ( 1.2) Two choices for the set of curves C are particularly natural: C may coincide with the entire considered functional space F , in which case the templates are the within cluster Frechet templates, or C may coincide with the sample of curves {f 1 , . . . , f n }, in which case the templates coincide with the within cluster medoids, or Karcher templates. In the assignment and alignment step, we align the n curves to the set of the k templates obtained in the previous template identification step, as described above. The k-mean alignment algorithm also considers the problem of non-uniqueness of the solution, by targeting a specific solution via a normalization step. The algorithm is initialized with a set of initial templates, and stopped when, in the assignment and alignment step, the increments of the similarity indices are all lower than a fixed threshold. In many practical situations, as for instance in the case of the AneuRisk65 dataset, the functional data are not available on the entire real axis but observed on arbitrary intervals and thus both the template identification step and the assignment and alignment step have to be carried out in an approximated way. In this case, the similarities between two functions are computed over the intersection of the domains of the two functions; moreover, if the template identification is carried out on the entire space F , the Frechet mean (no longer analytically available) is approximated by a local mean.
Details for the practical implementation of a k-mean alignment procedure are given in Sangalli et al. (2010). The procedure is coded in the fdakma R package downloadable from CRAN Parodi et al. (see 2014).

Shape invariant models
When analyzing the AneuRisk65 data, it makes sense to consider two vessel centrelines to be perfectly aligned if they are identical except for a shift and/or a dilation along the three space coordinates. Because location of the scanned volume and proportions of the skull change across patients (see the data presentation), different shift and/or dilation for each space coordinate must be permitted. For this reason, Sangalli et al. (2009Sangalli et al. ( , 2010, used the following bounded similarity index between two curves f 1 , 3) represents the average of the cosines of the angles between the derivatives of homologous components of f 1 and f 2 , which is a possible multidimensional extension of the univariate similarity index reported at the sixth row of Table 1. The index 1.3 assumes its maximal value 1 when the two curves are identical except for shifts and dilations of homologous components, i.e., This similarity index is coherent, in the sense of property (c), with the classes of warping functions of the abscissa given in eq. (1.1). Here, in particular we shall consider the largest of these classes, i.e., the group of strictly increasing affine transformations. For f = {c 1 , . . . , c d } ∈ F , where c p indicates the pth component of f , for p = 1, . . . , d, assume the existence of µ = {µ 1 , . . . , µ d } ∈ F and of a parameter vector (a 1 , . . . , a d , b 1 , . . . , b d , m, q), with a p ∈ R + and b p ∈ R for p = 1, . . . , d, m ∈ R + , q ∈ R, such that c p (t) = a p µ p (mt + q) + b p for p = 1, . . . , d. (1.6) We shall write f ∈ SIM(µ), since the condition (1.6) means that f falls within a shape invariant model (SIM), with characteristic shape curve µ. For d = 1, SIM models were introduced by Lawton, Sylvestre and Maggio (1972). For further details, see Kneip and Gasser (1988). SIM models are strongly connected with the couple (ρ, H) defined in (1.3) and (1.5). Indeed, (1.7) Note that, thanks to property (d), the roles of f and µ can be swapped. Now, consider a set of n curves {f 1 , . . . , f n } ⊂ F , such that f i ∈ SIM(µ) for all i = 1, . . . , n; then, the following property follows immediately: Because of (1.7), when using the couple (ρ, H) defined in (1.3) and (1.5), it is possible to perfectly align and cluster, in k groups, a set of n curves if there exist k characteristic shape curves, µ 1 , . . . , µ k , such that ∀ i = 1, . . . , n, ∃ l i ∈ {1, . . . , k} : f i ∈ SIM(µ li ).
In this case the optimization problem (i) is solved by setting µ λ(µ,fi) ≡ µ li , and its objective function achieves the maximum total similarity 1.

Theoretical framework
The introduction, in a functional data analysis, of a similarity (derived by a metric E satisfying some specific properties) and of a group H of warping functions, with respect to which the similarity is invariant (i.e., property (d) in Section 1.1), provides a mathematical framework where a sound and not ambiguous definition of phase and amplitude variability can be given. Indeed in this framework, we can prove that the analysis of a continuously-registered functional data set can be re-interpreted as the analysis of a set of suitable equivalence classes associated to unaligned functions and induced by the group of the warping functions. The theoretical investigation for a coherent formalization of the problem of registration, in relation to properties required to the metric E and to the group H of warping functions, is deepened and detailed in Vantini (2012).
The most important required property is the H-invariance of the metric E (Sangalli et al. (2009(Sangalli et al. ( , 2010) which indeed induces property (d) of Section 1.1. H-invariance provides the quotient set made by the orbits induced by the action of H over F with a natural metric δ (dependent on the joint choice of E and H) defined as follows: The introduction of a quotient set provided with a natural metric jointly induced by the original metric E and by the group H enables a not ambiguous definition of Phase Variability and Amplitude Variability. Phase variability is defined as that occuring between functions belonging to the same equivalence class, i.e. the variability within equivalence classes, and the amplitude variability is the one between functions not belonging to the same equivalence class and not imputable to phase variability, i.e. the variability between equivalence classes. Moreover, within this mathematical framework the k-mean alignment algorithm is an implementation of a k-mean clustering algorithm on the n equivalence classes {[f 1 ],[f 2 ], . . . , [f n ]} induced by the action of H over the functional data set { f 1 ,f 2 , . . . , f n } performed by using the naturally defined metric δ.

Decoupling phase and amplitude variability in the AneuRisk65 data by 1-mean alignment
To enable meaningful comparisons across subjects in the AneuRisk study, it is necessary to first efficiently decouple the phase and the amplitude variability. As described in the data presentation, in this application phase variation is mainly due to differences in the dimension of skulls among subjects, whereas the amplitude variation is mainly due to differences in the carotid morphological shapes. We shall in particular explore if the morphological features of these vessels relates with aneurysms presence and location, contrasting subjects in the Upper group and subjects in the Lower and No-aneurysm groups, often joined in a unique "Lower-No" group, as indicated in the data description. First derivatives of the vessel centrelines have been aligned by the 1-mean alignment, using the similarity index (1.3) and the class of warping functions (1.5). The variability captured by the optimal warping functions found during this alignment process was analyzed in Sangalli et al. (2009) and was not found to be associated to the aneurysmal pathology. In particular, no significant difference exists between the warping functions of subjects in the Upper and Lower-No groups. Subsequent analysis may hence focus on the aligned data. The optimal warping functions can be used to correspondingly align the radius and curvature profiles; see Figure 3. After alignment it is possible to start appre- ciating a common pattern for the curvature profiles (bottom right panel) that was not visible before alignment (bottom left panel). The registered radius and curvature profiles highlight many interesting aspects. Figure 3 shows that the vessel gets progressively narrower toward the terminal bifurcation of the ICA; this is the so-called tapering effect. Tapering concerns all arteries, but it is particularly apparent close to the terminal bifurcation of the ICA, where the artery has to enter the skull. The figure also shows that most ICAs display two peaks of curvature in the terminal part of the vessel; these peaks of curvature are in correspondence with the carotid syphon. The same figure displays also Gaussian kernel density estimates of the aneurysm location along the ICA, before  and after alignment (left and right, respectively), based on data from patients having an aneurysm along the ICA or at its terminal bifurcation. The majority of the ICA aneurysms are located in the terminal part of the vessel, where tapering is stronger, and after the main curvature peak. These results support the conjecture concerning the influence of the vessel morphology and the aneurysm onset, via the hemodynamics. In fact, the tapering of the vessel and the peak in its curvature determine hemodynamic regimes that may facilitate aneurysm formation and development. The density estimates of the aneurysm location show that, after alignment, the locations of the ICA aneurysms cluster in two neatly separated groups, before and after −13 mm from the vessel terminal bifurcation. This fact suggests that this is the average position of the dural ring, i.e., the hole in the skull bone the ICA goes through to enter inside the skull. Notice that this ring cannot be detected directly through angiographies, but indications of the location of the aneurysm relative to the dural ring may be of great importance, since aneurysms within the skull are more dangerous, as explained in the data description.
The autocovariance structures of aligned radius and curvature profiles are thus explored separately by means of Functional Principal Component Aanalysis (FPCA), in order to estimate their main uncorrelated modes of variability. Since the 65 curves are observed on different abscissa intervals, these analyses focus on the interval where all curves are available, i.e., for values of the abscissa between −3.37 cm and −0.78 cm. Figure 4 shows the first two eigenfunctions of the autocovariance function for radius and for curvature, respectively, and the percentage of total variance explained is printed over each graph. Figure 4 also reports, for each considered principal component, the distributions of the corresponding scores for subjects in the Upper group and subjects in the Lower-No group. These scores may be used to discriminate the two groups of patients. In fact, the distribution of FPCA scores have significantly different means and/or variances for the two groups, as confirmed by appropriate t-tests and F-tests for equality of means and variances. According to these differences, Upper group patients tend to have wider, more tapered and less curved ICA's compared to patients belonging to the Lower-No group. Moreover the variance of these geometrical features is significantly smaller in the Upper group than in the Lower-No group. The Upper group is indeed very well characterized in terms of the geometrical features represented by the first two principal components of aligned radius and curvature profiles. Sangalli et al. (2009) shows in fact that a quadratic discriminant analysis of the scores of the first two principal components of aligned radius and curvature profiles correctly identifies 31 out of the 33 patients in the Upper group. This gives strong statistical evidence in favor of the conjecture explored within the project.

Identifying ICA's with different morphological shapes by k-mean alignment
The problem at hand might suggest the presence of more than one prototype of morphological shape of ICA. To evaluate this possibility we use k-mean alignment.
The top-center and top-right panels of Figure 1 give an indication of how many clusters k should be considered, i.e., how many morphological shapes of ICA are present in the datasest. The first panel reports the boxplot of the similarity indices between the unaligned centrelines and their estimated mean curve ("unaligned"), and the boxplots of the similarity indices between the k-mean aligned centrelines and the associated k estimated templates, for k = 1, 2, 3. The second panel displays the corresponding means of the similarity indices (orange circles linked by orange lines). The same plot also shows the means of the similarity indices that would be obtained by the simple k-mean algorithm without alignment (black circles linked by black lines). As highlighted by this figure, 1mean alignment leads to a large increase in the similarities, with respect to the similarities of the unaligned centrelines, but a further significant gain can be obtained by setting k = 2 in the aligning and clustering procedure; instead, a choice of k = 3 would not be justified by any additional increase in the similarities. Thus the k-mean alignment algorithm suggests the presence of k = 2 amplitude clusters within the analyzed centrelines. The bottom-right panels of Figure 1 show the results of 2-mean alignment of these three-dimensional curves. Figure 5 gives a three-dimensional visualization of the estimated templates of the two amplitude clusters. These two templates indeed identify two prototypal shapes of ICA commonly used in the medical literature, see e.g. Krayenbuehl, Huber and Yasargil (1982), and the two clusters can be described as the Ω-shaped ICA's cluster (35 curves in orange), whose siphons feature just one bend, and the Sshaped ICA's cluster (30 curves in green), whose siphons are characterized by two bends. As displayed in the right panel of Figure 6, the simple k-mean clus- Fig 5. 3D image of the estimated templates of the 2 amplitude clusters, found by 2-mean alignment of the ICA centrelines. The template of the orange cluster is the prototype of an Ω-shaped ICA (single-bend siphon), whereas the one of the green cluster is the prototype of an S-shaped ICA (double-bend siphon) .   Fig 6. 3D image of the estimated templates of the 2 clusters found by simple 2-mean clustering without alignment of the ICA centrelines. The two templates appear to have almost the same morphological shape, and seem to differ mainly in their phase. Clustering, without joint alignment, is driven by the predominant phase variability.
tering of these data, without joint alignment, leads instead to uninteresting classification results, failing to identify ICA's with different morphological shapes.
It is very interesting to note that the clustering found by 2-mean alignment seems indeed to be relevant for the aneurysmal pathology, in the sense that there is statistical evidence of a dependence between cluster membership and aneurysm presence and location. Looking at how subjects in the Upper, Lower and No-aneurysm group have been allocated respectively to the Ω-shaped ICA's and S-shaped ICA's clusters, we obtain the following conditional contingency table.
Upper group (33) Lower group (25) No-aneurysm (7) Ω 70% 48% 0% S 30% 52% 100% Note that the 7 subjects in the No-aneurysm group all display S-shaped ICA's and that among the 33 patients in the Upper group (those having the most dangerous aneurysms) only a minority has an S-shaped ICA, whilst the majority (70%) has an Ω-shaped one. This prompted us to conjecture that the ICA syphons act as flow energy dissipators. While S-shaped (double bend syphon) ICAs are expected to be very effective in dissipating the flow energy, Ω-shaped (single-bend syphon) ICA's would not be as efficient. A higher flow energy downstream of Ω-shaped ICA's would result in an overloaded mechanical stress for downstream arteries, creating more stimuli for aneurysm onset and development.
More results in support of this conjecture are reported in Passerini et al. (2012), where the relationship between morphological and hemodynamical features, and their impact on aneurysm pathology, is further explored.

Discussion
In this work, after the decoupling of phase and amplitude variabilities by k-mean alignment, we focussed on amplitude variability and on clustering in the am-plitude. This is because, in the specific application here considered, the phase variability, which is mostly due to the different dimensions of patients skulls, seems not to be relevant in the study, as it does not influences the considered pathology. In other applications, though, the clustering might be in the phase, rather than in the amplitude, or even in both phase and amplitude. It is thus important to mention that the alignment and clustering procedure here described, beside correctly detecting true amplitude clusters, is also able to simultaneously disclose clustering structures present in the phase, as illustrated for instance in Sangalli et al. (2010) and in Bernardi et al. (2014a,b); Patriarca et al. (2014), via both simulations and applications to real data.