Analysis of nonlinear modes of variation for functional data

A set of curves or images of similar shape is an increasingly common functional data set collected in the sciences. Principal Component Analysis (PCA) is the most widely used technique to decompose variation in functional data. However, the linear modes of variation found by PCA are not always interpretable by the experimenters. In addition, the modes of variation of interest to the experimenter are not always linear. We present in this paper a new analysis of variance for Functional Data. Our method was motivated by decomposing the variation in the data into predetermined and interpretable directions (i.e. modes) of interest. Since some of these modes could be nonlinear, we develop a new defined ratio of sums of squares which takes into account the curvature of the space of variation. We discuss, in the general case, consistency of our estimates of variation, using mathematical tools from differential geometry and shape statistics. We successfully applied our method to a motivating example of biological data. This decomposition allows biologists to compare the prevalence of different genetic tradeoffs in a population and to quantify the effect of selection on evolution.


Introduction
Researchers in an increasing number of fields, including biology, medicine, and engineering, collect samples of curves, images or objects of common shape. Growth curves of plants and animals, gene expression signals, medical images, and human speech are all real life examples of high dimensional multivariate or functional data, see Dryden and Mardia (1998) ;Fletcher et al. (2004); Ramsay and Silverman (2005). Understanding the variation in this type of data set is usually of primary interest. Principal Component Analysis (PCA) is the most widely used method to decompose variation in functional data. However, the directions of variation found by PCA are data driven and linear. So, principal directions are not always easily interpretable by the experimenters and the decomposition fails in the presence of nonlinear variation. The analysis presented in this paper decomposes the variation in terms of modes that are predetermined by the experimenters. Since some of these modes could be nonlinear, we develop a new method, Template Mode of Variation or TMV, to decompose variation along nonlinear modes. This decomposition problem is particularly challenging because of the complexity of non-Euclidean geometry. Our method was applied successfully to several sets of reaction norm curves in evolutionary biology, thermal performance curves of caterpillars in Izem (2004) and Izem and Kingsolver (2005), flies in Izem (2004) and viruses in Knies et al. (2006). The matlab code for this decomposition is available at http://www.fas.harvard.edu/˜rizem. The method that we present here could be generalized to decompose predetermined modes in any set of curves or images and along directions of interest that satisfy our assumptions.
The data and problem which motivated our analysis is presented in Subsection 1.1. We discuss the main assumptions of our analysis in Subsection 1.2. We contrast our approach to other approaches in the literature for analysis of nonlinear variation in Functional Data Analysis (FDA), Shape Analysis, or Manifold Learning in Subsection 1.3. Finally, we summarize the content of the paper in Subsection 1.4.

Motivating data and problem
The data which motivated the analysis in this paper comes from evolutionary biology and is shown in Fig. 1. Each curve shows the growth rate as a function of temperature for a given family of caterpillars, where a family is a set of offsprings of same parents. This data is an example of reaction norm curves, a widely collected data set in biology, where each curve shows the change of a trait as a function of the environment for some genotype, see Scheiner (1993). Thus, in our example, the growth rate is the trait of interest, the temperature is the environmental condition of interest, and each family represents a different genotype in the population. Other examples of reaction norm curves are growth of a virus as a function of temperature in Knies et al. (2006) and photosynthesis of a plant as a function of light intensity or nutrients in the soil.
The curves' variation represents the genetic variation in the population for different environmental conditions. Three gene-environment interactions were identified by biologists as being of particular interest: horizontal shift, vertical shift and generalist-specialist, see Kingsolver et al. (2001). These three variations are shown in Fig. 2. The first mode of variation, the vertical shift, corresponds to a population where some genes clearly dominate other genes in all environments. The two other modes of variation, the horizontal shift and the generalist-specialist, show two different gene-environment trade-offs where all genes are better in some environments and worse in others. In Kingsolver et al. (2001), the given hypothesis is that all these directions exist simultaneously, and there is a need for decomposing the genetic variation in the data into these modes of interest to see which directions are more important.
PCA is one of the most commonly used method to decompose variation in functional data. PCA was applied to the data set in Kingsolver et al. (2004) but it failed to answer the biological questions of interest. In particular, the paper found that the first linear principal directions were difficult to interpret biologically and could not distinguish the horizontal shift from the generalistspecialist. PCA failed to give biologically meaningful results in this data set because: first, PCA decomposition is data-driven rather than hypothesis driven; second, PCA can only find linear directions of interest and some of the directions of interest above are nonlinear.
The method described in this paper was successfully applied to decompose the variation in the data shown in Fig. 1 onto the three modes of variation of interest to biologists represented in Fig. 2. Our results are presented in Section 7 and show that the model fits 84% of the variation in the data, most of the variation (73%) is explained by the two nonlinear modes and the remaining 11% is explained by the, linear, vertical shift.

Common shape and predetermined directions
Our analysis assumes that functional data sets have common shape and that the modes of variation of interest are predetermined and parameterizable.
In the caterpillar example shown in Fig. 1, we see that each curve has a common shape. Each curve increases slowly, tends to reach a maximum and finally decreases rapidly. This common shape assumption is meaningful for several examples in functional data. Because functional data represent different realizations of the same underlying process, growth rate in our example, it is often considered that the variation is around a common template shape reflecting that process, see Gasser (1997, 1999); Ramsay and Silverman (2005); Dryden and Mardia (1998). In practise, it is often easy to determine the common shape from simply looking at the data. For example: a template gene expression signal could be periodic in time with each period corresponding to a cell cycle, and a template growth curve over time of a plant could be a logistic function. However, the term common shape is not easily formally defined. In this paper, the term of common shape or template shape will mean the shape in the center of the variation in the data.
There are often some directions of interest identified by the experimenters around this common template shape which we call modes of variation. This is the common shape curve and h i is the height of family i. Middle: Horizontal shift of curves: z i (t) = z(t − m i ), where z(t) is the common shape curve and m i is the location of maximum of family i. Bottom: Generalist-Specialist variation: z i (t) = w i z(w i t), where z is the common shape curve and w i is the width of the curve of family i. term was used early on, as in Castro et al. (1986). When these directions are predetermined, the variation in the data is then fully described by the template shape and the modes of variation around the template shape. In our example, we constructed the following 3-parameter Shape Invariant Model (SIM) Lawton et al. (1972) to model the three modes of variation of interest where z i (t j ) is the growth rate of family i at temperature t j . The function z represents the common shape. Parameters (h i , m i , w i ) represent the vertical, horizontal and generalist-specialist variation of family i from the template shape. The vertical shift is a linear mode of variation, however since z is not a linear function, the modes of variation parameterized by m and w are nonlinear. Note that the three parameters are on different scales, h is on the growth rate scale, m on the temperature scale and w is unit-less. Thus, it is difficult to compare the magnitude of the variance of different parameters. In addition, it is difficult to quantify the contribution of the variation of each parameter to the between growth curve variation. Our method is a nonlinear ANOVA type decomposition of the variation in growth rate induced by the variation of the parameters. Because our decomposition is done in the same scale as the data, the contributions of the parameters to the total variation in the data are comparable. The 3-parameter SIM is a particular case of a general Self Modeling Nonlinear Regression (SEMOR) model Kneip and Gasser (1988) where the response z i varies nonlinearly as a function of t ∈ R, where θ i ∈ R d ′ is the vector of parameters of variation, and where R is the common regression function. In the caterpillar example, θ i is the vector Other examples of modes are: variation in frequency and duration of speech signals, or variation in asymptotic height and growth rate of a logistic growth curve of plants. Definitions and results in this paper will be formulated in terms of the general model (1.2). Theoretical results will also assume that the common shape R is known. We discuss how to estimate R from the data using a procrustes type method in Section 6.

Our method and nonlinear variation in the literature
The importance of nonlinear variation of curves around a common shape, such as registration, or horizontal shift, has been recognized for some time in the FDA literature, see Chapter 5 in Silverman (2002, 2005). SIM or SEMOR models are also a common way to account for nonlinearity in FDA, some examples of the extensive literature in the topic are Lawton et al. (1972); Kneip and Gasser (1988); Härdle and Marron (1990); Kneip and Engel (1995); Lindstrom (1995); Ladd and Lindstrom (2000); Brumback and Lindstrom (2004). However, most registration methods which account for nonlinearity of the data have treated nonlinearity as a nuisance. These methods remove nonlinear modes to better estimate the common curve as in Gasser (1997, 1999), and then to apply conventional linear techniques such as PCA in Silverman (1995). In contrast, motivated by the goals of our collaborators, our premise is that nonlinear variation should be part of the decomposition. We believe that when nonlinear variation is itself of main interest, as is the case in some of the examples cited above, these modes should be properly accounted for in the decomposition. In SIM models, the focus has been on estimating the parameters and the common shape curve in an optimal way, such as in Kneip and Gasser (1988); Härdle and Marron (1990); Kneip and Engel (1995) and quantifying the variability by assuming random parameters and estimating their variance Lindstrom (1995); Brumback and Lindstrom (2004). The variance of each parameter in the SIM model can be used to quantify the variation in the data along a mode. However, as noted in Subsection 1.2, the parameters are on different scales which makes it difficult to compare the contribution of each parameter to the total variation in the data. In addition, the parametrization might not be unique and different parametric scales would have different variance values. In this paper, instead of using the variance of the parameters to quantify variability, we decompose the variation in the natural or intrinsic scale of the data in a nonlinear ANOVA type decomposition. The results of our decomposition are not tied to the parametrization we chose since an equivalent parametrization would produce the same space of variation. Thus, an equivalent parametrization of the modes of variation will produce the same decomposition. Our decomposition is also invariant to a linear transformation of the data, since a linear transformation of the data will be absorbed by the common shape function. In Shape Analysis, the notion of mean and variance in a metric manifold were defined in Fréchet (1948) as the Fréchet mean and Fréchet variance. These notions are widely studied in the field of robust statistics and have more recently been used by probabilists for characterizing distributions of shape data, see Kume and Le (2000); Le (2001); Patrangenaru (2003, 2005). These authors showed fundamental results for the Fréchet mean, conditions of its existence, consistency of its estimates from the sample, and a central limit theorem for the Fréchet mean on a Riemannian manifold. Our focus in this paper is not in estimating the Fréchet mean, but in decomposing variability. One drawback of the Fréchet variance is that it is a number regardless of the dimensionality of the manifold. It represents the total variation in the data, but it does not offer the full understanding of the variation that the variance-covariance matrix offers in the Euclidean case. In our approach, we decompose the Fréchet variance SSM into meaningful quantities, each representing variation along a mode of interest. i.e.
where SSM i quantifies the variation along mode i. This goal is reached by defining new metrics which allow for the decomposition of the variation. More precisely, we define a new d V in the manifold such that, whereR is a Fréchet mean. In the one dimensional case, the metric is simply the arcdistance or shortest distance between two points along the curved one dimensional differentiable manifold. In the two dimensional case we use a Pythagorean like formula to define a path metric in the manifold as a function of the arcdistances along each mode. Contrary to the Euclidean case, the path metric between two points along one-dimensional geodesics are not unique. So, our final distance accounts for this multiplicity by assigning weights for each path metric. Manifold learning methods as in Hastie and Stuetzle (1989); Tenenbaum et al. (2000); Donoho and Grimes (2003); de Silva and Tenenbaum (2003a); de Silva and Tenenbaum (2003b);Fletcher et al. (2004) extend the dimensionality reduction aspect of PCA to multivariate data lying in a nonlinear manifold. They are data-driven and exploratory, so the results they find are not always interpretable. Moreover, these methods do not attempt to quantify the variation along given nonlinear principal directions. In contrast, in our approach, we decompose the variation into pre-identified, and interpretable, modes given by the experimenter. Lastly, our decomposition exploits the fact that the data of interest to us are not arbitrary multivariate data but instead are functional data, with approximately common shape.

Structure of this paper
We illustrate the geometry of nonlinear spaces of variation with four toy examples in Section 2. Two toy examples illustrate one dimensional spaces of variation, and two other toy examples illustrate two dimensional spaces of variation. In Section 3 we define the Fréchet mean and variance for functional random variables varying in a manifold. We also propose a ratio measuring the variation along nonlinear modes. Section 4 discusses our choice of metrics in the manifold and the proposed quantification of one, two or multiple simultaneous modes of variation. In the one-dimensional case, the metric is simply the arcdistance. In the two-dimensional case and higher, the choice of metric is key to decomposing variability. The metric in higher dimensions is defined by a Pythagorean like formula using all the one dimensional arcdistances. The consistency of our estimates is shown in Section 5. We discuss the implementation of this method as well as present results of our method on the motivating example in Section 6. We finally discuss the result of the decomposition to the motivating data set in Section 7.

Geometry of the space of variation
Before considering the space of variation in the most general case in model (1.2), we explore four particular examples with a given common shape. The first two examples are one mode cases, the two other examples are two-mode cases. The one mode cases illustrate what the geometry of the space of variation is for a linear and a nonlinear mode, they are: (a) Linear mode example: vertical shift of curves around a common shape. In this example, θ i = h i , and R(θ i , t) = z(t) + h i in model (1.2).
(b) Nonlinear mode example: horizontal shift of curves around a common shape. In this example, θ i = m i , and R(θ i , t) = z(t − m i ) in model (1.2).
The two modes examples are: (c) Linearly separable model example: simultaneous variation of curves along the vertical shift and horizontal shift. In this example . We call this model linearly separable because we can write the function R((m i , h i ), t) as the sum of two different functions, each depending on only one parameter. More precisely, As we will see in Section 4, the geometry of the space of variation and the decomposition will be simpler to describe in this case because of this property. (d) General case example: simultaneous variation of curves along the horizontal shift and generalist-specialist. In this example, (1.2). As we will see in Section 4 defining a distance that allows the decomposition will be less trivial in this case, but will rely on similar ideas defined for one-mode and two-mode linearly separable examples.
For illustration, the curve variation as well as the space of variation is shown for each of these four examples with a parabola common shape in Fig. 3 and Fig. 4.
These four examples and their illustrations are discussed in details in Subsection 2.1. In Subsection 2.2, we present the general result on the dimensionality and geometry of the space of variation for a set of curves of common shape.

Examples in one and two dimensions
To visualize the variation in the curve space and in the point cloud space, we sample z at three distinct points (t 1 , t 2 , t 3 ). As illustrated in Fig.s 3 and 4, sampling at three points allows for a representation of an infinite dimensional curve as a point in three dimensions. So, a set of parabolic curves in the curve space appears as a set of points in the point cloud space and the variation in the curves corresponds to variation of the points. The linear mode example, the vertical shift, is presented in the two left panels of Fig. 3. We see that when there is one mode of variation of the curves and it is linear, here vertical shift, the points in the point cloud space fall along a line. We call the line in this example the space of variation of the data. Note that when the mode of variation is linear, the arithmetic average falls within the space of variation. Note also that deviation of each data point from the mean could be measured by the Euclidean metric. The nonlinear mode example is presented in the two right panels of Fig. 3. We see that when there is one mode of variation of the curves and it is nonlinear, here the horizontal shift, the points in the point cloud space fall along a curve. We call this curve in this example the space of variation of the data. Note that when the mode of variation is nonlinear, the arithmetic mean will not fall in the space of variation.  Similarly, using the Euclidean distance will not be the best characterization of deviation along the space of variation. For the one mode case, we will propose in Section 3 and Section 4 to use the arcdistance, or distance along the manifold, instead of the Euclidean distance as measure of proximity. We also propose to use the Fréchet mean and variance, or mean and variance along the manifold, as a measure of center and spread in the manifold. Fig. 4 illustrates the case of simultaneous variation along two modes, vertical shift and horizontal shift in the left two panels, horizontal shift and generalistspecialist in the right two panels. We see in both examples that when there are two modes of variation, the points in the point cloud space fall in a plane or a surface. We call the surface in each example the space of variation of the data. When the space of variation is curved, the mean does not fall in the space of variation and the Euclidean metric is not appropriate for proximity. As in the one-dimensional case, we propose in Section 3 to use the Fréchet mean set for a center in the manifold of variation. However, in contrast to the onedimensional case, we do not use the arcdistance or shortest distance along the manifold as measure of proximity. Instead, we define in Section 4 new metrics by a Pythagorean like formula using the one-dimensional arcdistances along each mode. The advantage of these new metrics is that, by construction, we can decompose variation in the manifold in an ANOVA type decomposition along the modes of interest. As discussed in Section 4, the construction of this metric is simpler in the linearly separable case than in the general case.

General dimension
For a variation or a set of variation around a common shape, the space supporting the points in the point cloud space is called the space of variation. In the one-mode examples, the line or the curve were the space of variation. In the two-modes examples, the surface is the space of variation. In general, given data as in model (1.2) with a common shape R(θ, t) and parameter space Θ ⊂ R d ′ , we define the the space of variation V as the subspace of R d such that, Theorem 2.1 states that under certain conditions on the common shape and the parameter space, the space of variation is a manifold of the same dimension d ′ as the parameter space See proof in Appendix. For the caterpillar data, this condition is satisfied for any polynomial common shape z of degree higher than 2 and the three modes of variation parameterized by (w, m, h) in model (1). The metrics d V which we define in the manifold of variation V in Section 4 using the arcdistances along each mode will additionally require that R(., t) be differentiable.

Variation in a manifold
In the previous section, we visualized the space of variation in the case of a one-dimensional mode or a two-dimensional simultaneous modes. The geometry of the space is non-euclidean in the presence of nonlinear variation, so the usual notions of mean and variance are not representative of the center and spread of the distribution. We first define in this section an appropriate mean, as a measure of center of variation, and variance, as a measure of spread along the manifold. The goal is to use these new mean and variance to define a ratio to quantify nonlinear modes.

Fréchet mean and Fréchet variance
It is well known that given a set of data in a nonlinear manifold, the arithmetic mean will not necessarily fall in the manifold. Similarly using the Euclidean distance as a measure of variation around the mean shape is not appropriate in the manifold because this distance is not a good measure of proximity in a nonlinear space. These facts have been discussed extensively in the shape analysis literature, and illustrated with a toy example in functional data in Izem et al. (2003). Because the usual definitions of mean and variance in a linear space are not meaningful measures of center and spread in a nonlinear space, Fréchet generalized the notions of mean and variance to manifolds in Fréchet (1948). The generalization proceeds as follows, for a real random variables X in a Euclidean space with measure µ, the expected value E(X) is the point in the space which minimizes the variance, i.e. let F (y) = ||X − y|| 2 dµ(x), then E(X) = argmin y∈R (F (y)) and V ar(X) = F (E(X)).
For a metric manifold (M, d) with measure µ, let the Fréchet function be, This function is well defined if M d 2 (x, y)dµ(x) < ∞, ∀y ∈ M In a metric manifold (M, d), the Fréchet mean set E F (X) of a random variable X, in the manifold M , with probability measure µ, is the set of points on the manifold which minimize the function F (y). The Fréchet variance V ar F (X)is the value F (X F ) for anyX F in the Fréchet mean set E F (X). i.e.
Note that although we can have more than one Fréchet mean, we can define a unique Fréchet variance. Note also that the Fréchet variance is a scalar, and not a matrix, even for manifolds of dimension d ′ ≥ 2. In this paper, we are concerned with functional data X, so the Fréchet mean set is a set of functional data.

Quantifying the variation
In our context, let z 1 , . . . , z n be a functional data set of common shape as in model (1.2), where each z i is associated to the regression functions R i and parameter θ i in a closed set Θ. Thus, and an intuitive estimate of the Fréchet function is We can also derive an estimate of the Fréchet mean set E F,n (R) and an estimate of the Fréchet variance 1 n SSM , wherẽ We propose to quantify the variation in the manifold by the following ratio, Note that for linear modes, if we take d V to be the Euclidean metric, this ratio corresponds to the usual ratio of sums of squares used in PCA. Note also that the choice of the distance d V is critical, it is the main building block to the decomposition we show in this paper. In one dimension, we choose d V to be the arcdistance, or the distance along the curved one dimensional manifold. For a differentiable one-dimensional manifold with parametrization γ : In higher dimensional spaces of variation, our choice of d V is motivated by our decomposition goal, i.e. we define d V such that by definition, and where RSS k quantifies the variation along mode k. Constructing the distance d V is challenging in manifolds since the notion of orthogonality is local whereas it is global in linear spaces. We reach this goal by incorporating in d V the arcdistances along each mode in a Pythagorean like formula. We first define these distances more formally in Section 4. We show in Section 5 that in each case, with our choice of distance, the estimates above are consistent estimates of the Fréchet mean and variance. Moreover, we find that in the one dimensional and linearly separable cases the Fréchet mean and its estimate (for a given n) are unique. Finally, we show in the general case, consistency. Using these estimates of Fréchet mean and variance, we use the proposed the above ratios RSS k 's to quantify nonlinear modes in a manifold.

Distance and decomposition of variation in the manifold
In the one dimensional case, the metric d V is simply the arcdistance and the quantification is easy to do as seen in Subsection 4.1. In the two-dimensional case and higher, the geodesic distance does not allow for a straightforward decomposition. So we define the metric as a function of the arcdistances along each mode. As seen in Subsection 4.2, the expression of this metric is easy in the linearly separable case. The distance is not trivial to generalize to the general, non linearly separable, case. We first see how to generalize it in the two-dimensional case in Subsection 4.3. Finally, Subsection 4.4 discusses the generalization of this metric to higher dimensional spaces of variation.

One mode
A natural choice of distance d V (x, y) between two points x and y in the onedimensional manifold of variation is the arcdistance, Arcd(x, y). We show in Section 5 that in the one dimensional case, the ratio RSS defined in Equation (3.1) is well defined, the Fréchet mean is unique, its estimate for a sample of points of size n is also unique, and this estimate is consistent.

Multiple simultaneous modes, linearly separable case
We described in Section 2 an example of a 2-dimensional linearly separable space of variation. It is shown in the two left panels of Fig. 4, the two modes of variation are the vertical shift and the horizontal shift. We first review the equality of path property in Subsection 4.2.1 and define it in the general case. In Subsection 4.2.2, we use the property to construct the metric and the decomposition of variation.

Linear separability and equality of path
We illustrate the geometry of the space of variation in the two dimensional, linearly separable case, in . For this special case, the two solid line segments and the two dashed curve segments from the two different paths are of equal length. This equality of path property is satisfied by any manifold generated by a linearly separable model. Namely, the arcdistance between two points along the space of variation if we fix one of the parameters, depends only on the other parameter. For our example, we have that so this distance along the surface of variation between the points (m, h 1 ) and (m, h 2 ) depends only on h 1 and h 2 and not on m. In addition, we have as in Equation 3.2 that Since the derivative ∂ ∂m z does not depend on h, the distance along the surface of variation between the points (m 1 , h) and (m 2 , h) depends only on m 1 and m 2 and not on h. In general, we define a linearly separable model as follows.

Distance and decomposition
Using this property, it is straightforward to define a metric in the manifold which will decompose the total variation in the data onto the modes of interest using the following metric We show that d V satisfies the conditions of a distance in the Appendix. The proof shows that the space of variation for a linearly separable model is homeomorphic to a subspace of R d ′ . Furthermore, the function d V we defined above corresponds to the Euclidean metric in this subspace of R d ′ . The right panel of Fig. 5 illustrates this mapping in the case of d ′ = 2. This mapping preserves the arcdistances along the curves of the modes of variation, and d V corresponds to the Euclidean distance in this mapping.
Finally, we have a simple decomposition in the linearly separable case. Recall that by definition SSM = n i=1 d 2 V (R i ,R). By construction from the above theorem, we have that where C 2 i,k is a function of the kth mode. So, finally we have the decomposition By construction, RSS k quantifies the variability around the Fréchet mean along mode k.

Two modes, general case
We described in Section 2 an example of a 2-dimensional general case space of variation. We saw in the two right panels of Fig. 4 that when the two modes of variation are horizontal shift and generalist-specialist, the space of variation is a two-dimensional manifold. The right panel of Fig. 6 is a representation of the same space. In this figure, the three parabolic curves correspond to the horizontal shift variation for three different fixed values of the width parameter and the three lines correspond to portions of the curve representing the generalistspecialist mode for three different fixed values of the horizontal shift variation. As illustrated in Fig. 6, in the general case we do not have the equality of path property. More precisely, the paths for going in two steps from one point to another in the manifold will not be equal. We define in Subsection 4.3.1 and Subsection 4.3.2 two different metrics d 1,O and d 2,O by considering two different mappings L 1,O and L 2,O of the space of variation into a subspace of R 2 . Each metric is homeomorphic to the Euclidean distance in the transformed subspace of R 2 . We finally combine the two distances in Subsection 4.3.3 to define a metric d V in the manifold indexed by the weight γ as As discussed in Remark 4.1, we chose γ to be 1/2 in our analysis. We see in Subsection 4.3.4 how we can use this distance for the decomposition of interest.
Because each mapping is a different transformation of the manifold into a plane, we need to define each transformation with respect to an origin O = R 0,0 = R((α 0 , β 0 ), t) in the space. We defer the discussion on the choice of origin to Section 5.

First transformation
The first transformation L 1,O maps the space of variation V into R 2 by preserving the arcdistances along the two curves which cross the origin O. The two curves which cross at the origin are the dotted curves in Fig. 6, and the image of the transformation L 1,O is shown in the bottom right panel of Fig. 6. We define the distance d 1,O in V as the Euclidean distance in L 1,O (V ). More formally, let After transformation, we define the metric d 1,O between two points in the manifold as equivalent to the Euclidean distance in the space L 1,O (V ) such that The following Proposition states that this function defines a metric, This Proposition is equivalent to showing that L 1,0 is isometric. By using simple algebra, we can further simplify the expression of d 1,O as a function of the parameters of variation rather than the linearizing function as,

Second transformation
The second transformation L 2,O maps V into R 2 by preserving the arcdistances along the curves which cross the points of interest. For example, the curves which cross at point 1 in Fig. 6 are the two dashed curves. Similarly, the two curves which cross at point 2 in Fig. 6 are the two dotted and dashed curves. The image of the space of variation by this transformation L 2,O is shown in the top right panel of Fig. 6. As for the first transformation, we define a metric in V as the Euclidean distance in L 2,O (V ). More formally, the second transformation L 2,O is defined as follows, We can then define a function d 2,O which corresponds to the Euclidean distance in the linearized space.
We can rewrite d 2,O as a function of the parametrization as follows which, in addition to the above conditions, makes d 2,O a distance.
Note that if the space V is linearly separable, then the two functions d 1,O and d 2,O are the same, and they are equal to the distance defined in the linearly separable case. This result is stated in the following proposition and proved in the Appendix

Distance in the manifold
The fact that d 2,O is not always a distance in the general case is not an issue since we can combine it with d 1,O to define a distance in the space of variation as stated in Proposition 4.4 proved in the Appendix.
is a metric for all γ ∈ [0, 1) in a differentiable manifold V . The parameter γ is a weight of each distance.
Remark 4.1. When analyzing the caterpillar data (see results in Section 7), the distance in the manifold generated by the generalist-specialist and horizontal shift corresponded to the equal weights case (i.e.γ = 1 2 ) We chose equal weights because there was no a-priori reason why one path should be weighted more than the other path. We repeated the decomposition by changing the distance from full weight on one distance (γ = 0) to full weight on the other distance (γ = 1) and the results of the decomposition were similar.

Decomposition
We have finally defined all the tools for our proposed decomposition. LetR O = RÕ 1 ,Õ2 be an estimate of a Fréchet mean (i.e. in the Fréchet mean set) then we can see how to decompose the variation with the distance d V,O For a given origin O, the term SSM 1,O quantifies the variation along the mode parameterized by α and the term SSM 2,O quantifies the variation along the mode parameterized by β.

Multiple simultaneous modes, general case
We constructed in Subsection 4.2 a metric and a decomposition for multiple dimensional case satisfying the linearly separable case. More generally, when have d ′ simultaneous modes of variation, we might have the linearly separable property satisfied for a subset of parameters but not for all parameters. For example, in model (1.1), we can write R((w, m, h), t) = R 1 ((w, m), t) + R 2 (h, t) where R 1 ((w, m), t) = w × z(w(t − m)) and R 2 (h, t) = h. We can define a distance in this three dimensional manifold as is the metric defined for a two-dimensional space of variation in Subsection 4.3.
Using the same tools we developed in Subsection 4.3 we can generalize the above metric and decomposition to a d ′ dimensional space where the equality of path property is not necessarily satisfied by any subset of parameters. In a 2-dimensional manifold, we found two possible mappings of V , each preserving one (out of two) possible 2-steps paths between two points along the curves of variation. In the general d ′ -dimensional manifold, the number of mappings is the number of possible d ′ -step paths between two points along the curves of variation which is (d ′ !). For each possible path i, we can define a mapping L i,O , which allows us to define a function d i,O . We define the metric in the manifold to be d O with weights γ i associated to each path metric. More precisely, for any two points R 1 and R 2 in the manifold Each d 2 i,O is the sum of d ′ terms, each depending on one parameter. Thus, we can reorganize the sums of square in the previous formula to be the sum of d ′ terms, each depending on one parameter.

Consistency and choice of origin
We state in this section the main theorems proving the consistency of our estimates with the distances defined in the previous section. We also discuss our choice of origin in the space of variation used in the non-separable case.

Consistency
Theorem 5.1 states the uniqueness of the Fréchet mean in the one dimensional case and the d ′ -dimensional case linearly separable case. We use Theorem 2.3 from Bhattacharya and Patrangenaru (2003) to prove consistency of our estimates in the general case in Proposition 5.1.
Theorem 5.1. For i.i.d random variables R 1 , . . . , R n of measure µ in (V, d V ) of finite Fréchet mean and Fréchet variance. For V a one dimensional differentiable manifold or a d ′ dimensional differentiable manifold linearly separable. We have that the Fréchet meanR F is unique, for all n its estimateR from the data is unique and See proof in the Appendix. Showing consistency in the general case is not trivial. We use Theorem 2.3 from Bhattacharya and Patrangenaru (2003) which establishes consistency of the Fréchet mean estimates for a complete manifold, to prove the following corollary

Choice of an origin
We propose to use an originÕ = R((α 0 , β 0 ), t) in a compact set K in V which is a minimizer of the Fréchet Variance, i.e.
andR F,O is an element of the Fréchet mean set. This choice of origin is conservative, it will guarantee that the estimate of the variance in the manifold will not be inflated. A question of interest is, can this variance be estimated from the data when this origin is not set in advance? Let F n (O) be the estimate of the Fréchet variance from the data, i.e.
whereR n,O is the estimate of the Fréchet mean associated to the origin O and the data R 1,1 , . . . , R n,n . An estimate of F n (O) from the sample is the minimum of F n (O) over a compact set, i.e. the estimate is The following proposition states that the sequence F n (O n ) converges to the desired variance function Var F,Õ . See proof in Appendix.

Implementation and extensions
We have discussed in this paper a useful decomposition of the variation in the manifold of variation. In the motivated model, the data does not lie in the manifold but is projected onto the manifold. More specifically, the data z i ∈ R d is such that z i = R i + ǫ i where R i is the projection of z i onto the manifold and ǫ i is additive error. The projection is the point in the manifold which is closest to the data in R d , i.e. which minimizes SSE such that This projection might not be unique but we can use diagnostic plots shown in Fig. 7 to detect multiplicity of fits. We can also use local information to choose between multiple projections, so that for data which are close together in R d , their projections onto V should be close together. We also suppose in this paper that the common shape function is known or could be estimated from the data. For the caterpillar example, the common shape function is estimated by a polynomial in an iterative procedure. Each iteration includes two steps, in each step parameters are fit to minimize SSE. Given initial parameters of variation, the first step finds the optimal values of the parameters of the polynomial. These polynomial coefficients are then used in the second step to find the optimal values of the parameters of variation. The parameters found in this iteration are used as initial parameters of the following iteration and this procedure is repeated until the algorithm converges to an optimal solution. This iterative procedure could be easily generalized for a common shape in any parametric family satisfying the condition of Theorem 2.1. This iterative procedure could also be generalized to estimate non-parameterically the common shape as well as the parameters of variation using algorithms described in Gasser (1997, 1999). Finding the geodesic distance along one-dimensional manifolds is central to our method. This distance could be computed analytically using Equation 3.2. Another approach based on approximation is to consider the path length between two points in a one-dimensional manifold as the sum of lengths of small segments along the curve. The length of each small segment is approximated by the corresponding Euclidean distances.

Results
Our TMV analysis has been implemented and used by biologists to decompose variation along modes of interest in thermal performance curves of caterpillars in Izem and Kingsolver (2005), and viruses in Knies et al. (2006). The fitted parameters and fitted template polynomial optimized the weighted sums of squared error, weighted by the sample sizes of each family. The results of the decomposition on the motivating caterpillar data are presented in Table 1. Fig. 7 shows a visual diagnostic of the fits.
The data have only six measurements per curve, as shown in Fig. 1. Thus, we assumed for simplicity that the template shape is a polynomial of degree 4 in the results shown in Table 1 and the top two panels of Fig. 7. For identifiability, we assumed further that the fitted height parameters sum to zero. The polynomial fit to each curve as well as the template shape diagnostic fit are shown in the top two panels of Fig. 7. We see in the top left panel of Fig. 7 that the variation of the fitted polynomials closely matches the variation in the original data shown in Fig. 1. The top right panel of Fig. 7 shows the warped data compared to the fitted template shape polynomial of degree 4. Each curve i was warped by its fitted parameters (ŵ i ,m i ,ĥ i ) so that all curves could be compared on the same scale to the fitted template shape. More precisely, a warped curve i represents the warped growth rate (z i −ĥ i )/ŵ i plotted at the warped temperature (t −m i )ŵ i , where z i and t represent the observed growth rate and temperature vectors of curve i. We found that this representation is an excellent graphic diagnostic of the model fit. Under the assumptions of our model, after warping, we expect that the global maxima of all curves is at 0, that the warped curves are aligned and that the template shape is lying in the middle of the warped curves. We see in the top right panel of Fig. 7 that these expectations are mostly met. One of the curves appear to be an outlier, it corresponds to a family with small sample size. Although the fitted polynomial lies in the middle of the warped curves, the template curve seem to slightly overestimate the growth rate at low temperatures, which is due to the limitation of an oversmooth polynomial fit. In contrast to this good fit, the two bottom panels of Fig. 7 show the warped data compared to two fitted template shapes of a polynomial of degree 6 in the left panel and of a polynomial of degree 3 in the right panel. These fits are not as good as the polynomial of degree 4 fit because we see that in both panels the fitted maxima (at zero for warped data) is outside of the measurement range for many curves and the warped curves are not aligned. Because the curves are not aligned, a small set of curves contribute to fitting a large range of the polynomial of degree 6 (colder temperatures) and different groups of curves fit different regions of the template polynomial of degree 3.
We see in Table 1 that our model explains about 5/6 of the variation in the data, which is surprisingly good since all the modes have biological meaning. Very little variation is occurring along the vertical shift which suggests that selection of caterpillars with high growth rate at a particular temperature in one generation, will not result in individuals with high growth rate overall temperatures in other generations. Our decomposition separates the contribution of the generalist-specialist variation (27.11%) from the contribution of the horizontal shift variation (45.76%), which was not possible using Euclidean methods such as PCA and conventional ANOVA. We tested the robustness of our results using 500 bootstrap samples, where each bootstrap sample was obtained by sampling with replacement the family curves. We fitted the parameters and derived the decomposition for each bootstrap sample and we see in Table 1 that our decomposition of the variation hold.
Proof. To show that V is a manifold of dimension d ′ we need to show that it is a separable topological space and that every neighborhood Ω in V is homeomorphic to a neighborhood in R d ′ . V is a topological subspace of R d with topology induced by the topology in (R d , ||.||). Since R d ′ is separable, and R(., t) is an homeomorphism then V is also separable. Since R(., t) is an homeomorphism from (R d ′ , ||.||) to (V, ||.||), then every neighborhoods Ω in V is homeomorphic to a neighborhood in R d ′ .
By definition of the linearly separable space, the first term Arcd 2 (R 1,O , R 2,O ) is equal to C 2 α1,α2 using the notation of Subsection 4.2 and the distance does not depend on the origin O. Similarly, the second term Arcd 2 (R O,1 , R O,2 ) is equal to C 2 β1,β2 using the notation of Subsection 4.2 and the distance does not depend on the origin O. Thus, d 1,O is equal to the distance d V defined in Theorem 4.1. The formula for d 2,O (R 1,1 , R 2,2 ) given in Subsection 4.3.2 simplifies in the linearly separable case to Thus, d 2,O is equal to the distance d V defined in Theorem 4.1.
Proposition 4.4. The non-negative function d V,O,γ defined as is a metric for all γ ∈ [0, 1) in a differentiable manifold V Proof. We prove this proposition in the general case that d is a distance where d = γd 2 1 + (1 − γ)d 2 2 , d 1 is a distance and d 2 satisfies properties (i) to (iii) in Proposition 4.2. We need to show that d is (i) positive and symmetric, (ii) d(x, y) = 0 iff x = y, and (iii) d satisfies the triangular inequality. (i) is satisfied by definition. Let us show (ii), if d(x, y) = 0, then d 1 (x, y) = 0 and d 2 (x, y) = 0 which implies that x = y since d 1 is a distance. Let us show (iii), the triangular inequality. We need to prove that d(x, y) ≤ d(x, z) + d(z, y) or equivalently that Let us consider the left hand side of the inequality To prove the inequality (A.2), it is sufficient to prove that We will reason by equivalence, inequality (A.2) is equivalent to On one hand, On the other hand, After canceling the common terms from the left and the right hand side of inequality (A.4), we have the inequality Since this last inequality is always true, then the triangular inequality is always true. Since the function d satisfies (i), (ii), and (iii) then it is a metric in M .
Proof. To show the uniqueness and convergence in the one dimensional case, we first map the manifold into R using an isometry L and then we use known consistency of the mean results in R to prove consistency in M . Let L be the class of functions, L = {L θ , θ ∈ Θ} such that for all θ 0 in Θ For a given parameter θ 0 , we can consider R(θ 0 , t) as the origin in the manifold V by the transformation L θ0 because by definition L θ0 (R(θ 0 , t)) = 0. For all θ 0 , L θ0 satisfies the isometry property, i.e. |L θ0 (R 1 ) − L θ0 (R 2 )| = Arcd(R 1 , R 2 ), and the distance between two points L θ0 (R 1 ) and L θ0 (R 2 ) does not depend on the origin. By definition of the space of variation and by construction of the transforming function L θ , this function is a homeomorphism. i.e.
By isometry the left hand side implies that Arcd(x 1 , x 2 ) = 0, which implies that x 1 = x 2 (because Arcd is a metric in V ). (ii) Continuity of the inverse of L θ0 : We need to show that This property follows directly from the isometry.
Moreover, L is such that L θ1 (x) = L θ0 (x) + sign(θ 1 − θ 0 )L θ1 (θ 0 ) Using L, let us show that the Fréchet mean is unique. We have that E(L θ0 (R)) = Argmin x∈R (L θ0 (R) − x) 2 dµ Let R 0 = L −1 θ0 (E(L θ0 (R i ))), we can show that R 0 is the Fréchet mean. Let F (R) be the Fréchet function, then ∀R = R 0 , F (R) > (L θ0 (R i ) − E(L θ0 (R))) 2 dµ, and by isometry, > Arcd 2 (R i , R 0 )dµ So, R O is the only element in the Fréchet mean set. We can repeat the same argument to show uniqueness of the Fréchet sample meanR and the empirical measure.
The proof in this case is equivalent to the proof in the one-dimensional case.
Since d V is a metric and L (θ1,0,...,θ d ′ ,0 ) satisfies the isometry property, then we can show that it is continuous, invertible and the inverse is continuous. Similarly, the proof of uniqueness of Fréchet mean estimate and Fréchet mean is equivalent to the one dimensional case.  Let x n be a minimizer of F n , i.e.
F n (x n ) = min x∈V F n (x) To use Theorem 2.3 of (Bhattacharya & Patrangenaru (2003)), we need to show that (V, d V ) is a complete metric space and all bounded closed set is compact.
1. Completeness: Let x n , and x m two points in V , then by definition of d V d V (x n , x m ) = γd 2 1 (x n , x m ) + (1 − γ)d 2 2 (x n , x m ) So, if (x n ) is a cauchy sequence in (V, d V ), it is cauchy in (V, d 1 ) and (V, d 2 ). We know by construction of d 1 and d 2 , (V, d 1 ) and (V, d 2 ) are both homeomorphic to (R 2 , ||.||) (where ||.|| denotes the norm derived from the Euclidean metric). By this homeomorphism, a cauchy sequence (x n ) in (V, d 1 ) converges to a 1 in V . Similarly, a cauchy sequence (x n ) in (V, d 2 ) converges to a 2 in V . Let us finally show that a 1 = a 2 . Since V is a subspace of R d ′ , we have that ||x n − a 1 || ≤ d 1 (x n , a 1 ) (A.8) (b/c the Euclidean distance is the shortest distance), so (x n ) converges to a 1 in R d ′ . Similarly, we have that ||x n − a 2 || ≤ d 2 (x n , a 2 ) (A.9) (b/c the Euclidean distance is the shortest distance), so (x n ) converges to a 2 in R d ′ . By equations (A.8) and (A.9), and using the triangular inequality of the Euclidean distance, we have that ||a 1 − a 2 || = 0 So, a 1 = a 2 . Then, d V (x n , a 1 ) converges, so (x n ) converges. 2. Compact sets in (V, d V ). Similarly, using the homeomorphism, a closed and bounded set A in (V, d V ) is closed and bounded in (V, d 1 ) and (V, d 2 ). By the homeomorphisms to (R 2 , ||.||), the space A is compact in (V, d 1 ) and (V, d 2 ), so it is compact in (V, d V ).
So, using Theorem 2.3 of (Bhattacharya and Patrangenaru (2003)), we have that (x n ) → R F (a.s) where R F is a Fréchet mean. On the other hand, we have that F n (R F ) → V ar F Since |F n (x n ) − V ar F | ≤ |F n (x n ) − F n (R F )| + |F n (R F ) − V ar F | We have that F n (x n ) → Var F (a.s) Proof. Let F (O) = Var F,O . Then, by the property of minimums F n (O n ) and F (Õ) and continuity of F n and F we have that, First, ∀ǫ, ∃(α, β) s.t. F n (α) − ǫ/2 < F n (O n ), and F (β) − ǫ/2 < f (Õ).
Second, F n (O n ) ≤ F n (β) because O n is a minimizer of F n , and F (Õ) ≤ F (α) becauseÕ is a minimizer of F .
which proves the convergence.