1 Introduction

Principal component analysis (PCA) is a powerful and popular method for visualizing and reducing the dimensionality of high-dimensional datasets using tools from linear algebra [1]. Principal components can be obtained by solving an optimization problem that finds the best-fit linear space to a given sample in a Euclidean space. The primal problem of this optimization minimizes the sum of squared distances between each observation in the dataset and its orthogonal projection onto the linear space, and the dual problem finds the direction of largest variance of the dataset. In multivariate analyses, the results rely heavily on the chosen metric, which essentially determines how similar any pair of data points is. Thus, replacing the conventional Euclidean metric with another metric can be advantageous for certain problems, especially for datasets from non-Euclidean spaces. Tropical linear algebra has been studied extensively by many mathematicians (for example, Refs. [2,3,4]). In particular, it is well known that convexity with respect to the tropical metric behaves very well [5]. Therefore, in 2017, Yoshida et al. [6] applied tropical linear algebra to PCA by solving the primal optimization problem with the tropical metric over the tropical projective torus using the max-plus algebra.

Therein [6], two approaches to PCA using tropical geometry have been developed: (i) the tropical polytope with a fixed number of vertices “closest” to the data points in the tropical projective torus or in the space of phylogenetic trees with respect to the tropical metric; and (ii) the Stiefel tropical linear space of fixed dimension “closest” to the data points in the tropical projective torus with respect to the tropical metric. Here “closest” means that the tropical polytope or Stiefel tropical linear space has the smallest sum of tropical distances between each observation in the given sample and its projection onto it. The first approach (i) has been well studied and applied to phylogenomics [7], as the space of equidistant trees has a nice property of tropical convexity [5]: since the space of equidistant trees with a fixed set of leaf labels is tropically convex [8] and since a tropical polytope is tropically convex [2], a tropical polytope lies in the space of equidistant trees if all of its vertices are in the tree space. Meanwhile, the second approach (ii) has received little attention, even though the tropical projective space can be essential in data analyses such as the characterization of neural responses under nonstationarity [9].

The Stiefel tropical linear space, which can be characterized by the Plücker coordinates computed from a matrix, has been studied and has nice properties concerning, for example, projection and intersection [3, 8, 10, 11]. For the second approach (ii), Yoshida et al. [6] gave an explicit formulation of the best-fit Stiefel tropical linear space to a given sample when the Stiefel tropical linear space is a tropical hyperplane and the sample size equals the dimension of the tropical projective space. Recently, Akian et al. developed tropical linear regression over the tropical projective space and extended the best-fit tropical hyperplane to samples of any size \(\ge 2\) [12]. In general, however, their formulation does not yield the best-fit Stiefel tropical linear space when the dimension of the Stiefel tropical linear space is varied.

In this paper, therefore, we consider an explicit formulation of the best-fit Stiefel tropical linear space to a sample when both the dimension of the space and the sample size vary. More specifically, we focus on fitting a Stiefel tropical linear space of any smaller dimension to a sample of size \(\ge 2\) generated from a mixture of Gaussian distributions. In order to uniquely specify a Stiefel tropical linear space over the tropical projective space \(({\mathbb {R}} \cup \{-\infty \})^d \!/{\mathbb {R}} \textbf{1}\) with \(\textbf{1} = (1, 1, \ldots , 1)\), we use its Plücker coordinates or the associated matrix \(A \in (\mathbb {R} \cup \{-\infty \})^{m \times d}\), where \(m < d\) and \(m-1\) is the dimension of the Stiefel tropical linear space. Computing the Plücker coordinates of a Stiefel tropical linear space is equivalent to computing the tropical determinants of the \(m \times m\) minors of its associated matrix. As Xie studied the geometry of tropical determinants of \(2 \times 2\) matrices in [13], we study the geometry of the best-fit Stiefel tropical linear space to a sample generated by a Gaussian distribution, namely the location of its “apex”, i.e., the center of the Stiefel tropical linear space. Then we also study the geometry of tropical polynomials. Specifically, we give an algorithm to project an observation onto a tropical polynomial in terms of the tropical metric and propose an algorithm to compute the best-fit tropical polynomial to a given sample in \({{\mathbb {R}}}^3 \!/{\mathbb {R}} \textbf{1}\).

This paper is organized as follows: In Sect. 2 we describe basics of tropical arithmetic and geometry. In Sect. 3, we define the best-fit Stiefel tropical linear space over \(({\mathbb {R}} \cup \{-\infty \})^d \!/{\mathbb {R}} \textbf{1}\) for a given sample. Section 4 characterizes the matrix associated with the Plücker coordinates of the best-fit Stiefel tropical linear space to a sample generated by a Gaussian distribution as the variances are sent to zero. In this section we also investigate the geometry of the best-fit Stiefel tropical linear space of dimension \(m-1\) when \(m = d - 1\). Section 5 generalizes the results of Sect. 4 to a mixture of two Gaussian distributions. In Sect. 6 we give an algorithm to project an observation onto a tropical polynomial in terms of the tropical metric and investigate the best-fit tropical polynomial to a sample when the variances are very small.

1.1 Contribution

We characterize the matrix associated with the Plücker coordinates of the best-fit Stiefel tropical linear space of dimension \(m-1\) to a sample generated by a Gaussian distribution over the tropical projective torus \({\mathbb {R}}^d \!/{\mathbb {R}} \textbf{1}\) as all variances are sent to zero. We then characterize the corresponding matrix for a sample generated by a mixture of l Gaussian distributions over the tropical projective torus \({\mathbb {R}}^d \!/{\mathbb {R}} \textbf{1}\), again as all variances are sent to zero. Finally, we investigate the best-fit tropical polynomial to a sample generated by a mixture of Gaussian distributions and propose a way to estimate the best-fit tropical polynomial fitting such a set of observations.

2 Basics of Stiefel tropical linear spaces

Recall that throughout this paper we consider the tropical projective torus \({\mathbb {R}}^d \!/{\mathbb {R}} \textbf{1}\), which is isometric to \({\mathbb {R}}^{d-1}\). Here is a remark for experts: tropical linear spaces are subsets of the tropical projective space \(({\mathbb {R}}\cup \{-\infty \})^d/{\mathbb {R}}{} \textbf{1}\) rather than of the tropical projective torus \({\mathbb {R}}^d/{\mathbb {R}} \textbf{1}\). This relatively technical point will not be important in what follows, as the projection of a point in the tropical projective torus onto a Stiefel tropical linear space remains in the tropical projective torus. So in the basic definitions we will use \((\mathbb R\cup \{-\infty \})^d/{\mathbb {R}}{} \textbf{1}\) instead of \({\mathbb {R}}^d \!/{\mathbb {R}} \textbf{1}\). For the basics of tropical geometry, see [3] for more details. In addition, we recommend that readers consult [14], which contains very nice properties of tropical linear spaces and tropical convexity with the max-plus algebra.

Definition 1

(Tropical Arithmetic Operations) Throughout this paper we will perform arithmetic in the max-plus tropical semiring \((\,\mathbb {R} \cup \{-\infty \},\boxplus ,\odot )\,\). In this tropical semiring, the basic tropical arithmetic operations of addition and multiplication are defined as:

$$\begin{aligned} a \boxplus b := \max \{a, b\}, ~~~~ a \odot b := a + b, ~~~~ \text{ where } a, b \in \mathbb {R}\cup \{-\infty \}. \end{aligned}$$

Definition 2

(Tropical Scalar Multiplication and Vector Addition) For any scalars \(a,b \in \mathbb {R}\cup \{-\infty \}\) and for any vectors \(v = (v_1, \ldots ,v_d), w= (w_1, \ldots , w_d) \in (\mathbb {R}\cup \{-\infty \})^d\), we define tropical scalar multiplication and tropical vector addition as follows:

$$\begin{aligned} a \odot v:= & {} \left( a + v_1, \ldots ,a + v_d\right) ,\\ a \odot v \boxplus b \odot w := & {} \left( \max \left\{ a+v_1,b+w_1\right\} , \ldots , \max \left\{ a+v_d,b+w_d\right\} \right) . \end{aligned}$$
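As a computational illustration, here is a minimal Python sketch of the operations in Definitions 1 and 2 (the function names are ours):

```python
# Minimal sketch of the max-plus operations in Definitions 1 and 2.
NEG_INF = float("-inf")        # tropical additive identity

def trop_add(a, b):            # a ⊞ b := max{a, b}
    return max(a, b)

def trop_mult(a, b):           # a ⊙ b := a + b
    return a + b

def trop_scale(a, v):          # a ⊙ v = (a + v_1, ..., a + v_d)
    return [a + vi for vi in v]

def trop_vec_add(a, v, b, w):  # a ⊙ v ⊞ b ⊙ w, coordinatewise maximum
    return [max(a + vi, b + wi) for vi, wi in zip(v, w)]

# e.g. trop_vec_add(1, [0, 2, -1], 0, [3, 0, 0]) == [3, 3, 0]
```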

Definition 3

(Generalized Hilbert Projective Metric) For any two vectors \(v, \, w \in (\mathbb R\cup \{-\infty \})^d \!/{\mathbb {R}} \textbf{1}\), the tropical distance \(d_{\textrm{tr}}(v,w)\) between v and w is defined as:

$$\begin{aligned} d_{\textrm{tr}}(v,w)&:= \max _{i,j} \left\{ |v_i - w_i - v_j + w_j |: 1 \le i < j \le d \right\} \\&= \max _{i} \left\{ v_i - w_i \right\} - \min _{i} \left\{ v_i - w_i \right\} , \end{aligned}$$

where \(v = (v_1, \ldots , v_d)\) and \(w= (w_1, \ldots , w_d)\).
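For concreteness, a short Python sketch of \(d_{\textrm{tr}}\), computed via the second expression above for points with finite coordinates (the function name is ours):

```python
def trop_dist(v, w):
    """Tropical (generalized Hilbert projective) distance between the classes
    of v and w, via max_i(v_i - w_i) - min_i(v_i - w_i); coordinates assumed finite."""
    diffs = [vi - wi for vi, wi in zip(v, w)]
    return max(diffs) - min(diffs)

# d_tr is invariant under adding a common constant to all coordinates:
# trop_dist([0, 7, 0, 1], [0, 5, 0, 1]) == trop_dist([2, 9, 2, 3], [0, 5, 0, 1]) == 2
```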

Remark 1

(Lemma 5.2 in [14]) The tropical metric \(d_{\textrm{tr}}\) over \({\mathbb {R}}^d \!/{\mathbb {R}} \textbf{1}\) is twice the quotient norm of the maximum norm on \({\mathbb {R}}^d\).

Definition 4

(Tropical Hyperplane) For any \(\omega :=(\omega _1, \ldots , \omega _d)\in {(\mathbb R\cup \{-\infty \})^d} \!/{\mathbb {R}} \textbf{1}\) such that \(\omega \not = (-\infty , \ldots , -\infty )\), the tropical hyperplane \(H_{\omega }\) is the set of points \(x\in ({\mathbb {R}}\cup \{-\infty \})^d \!/{\mathbb {R}} \textbf{1}\) such that the maximum of \(\{\omega _1+x_1, \ldots , \omega _d+x_d\}\) is attained at least twice [2, 6]. We call \(\omega \) the normal vector of the tropical hyperplane \(H_{\omega }\).

Example 1

\(H_\omega \) for \((\omega _1,\omega _2,\omega _3)=(0,0,0)\), or simply \(H_0\), is illustrated by the gray lines in Fig. 2 (left).

There is an explicit formula to compute the tropical distance from an observation to a tropical hyperplane (Lemma 2.1 and Corollary 2.3 in [15]). Therefore it is relatively easy to find the best-fit tropical hyperplane over the tropical projective space. However, it is not enough to work with best-fit tropical hyperplanes, because a tropical hyperplane reduces the dimension of the ambient space by only one. Therefore, in this paper, we consider lower-dimensional Stiefel tropical linear spaces as the subspaces onto which we project data points. In what follows, let [m] denote the set of integers \(\{1, 2, \ldots , m\}\), where m is a positive integer.

Definition 5

(Tropical Matrix) For any \(V = \{v^{(1)}, \ldots , v^{(m)}\} \subset (\mathbb R\cup \{-\infty \})^d /{\mathbb {R}}{} \textbf{1}\), we define the tropical matrix \(M_V\) of size \(m \times d\) whose i-th row is \(v^{(i)}\) for each \(i\in [m]\) (here we regard \(v^{(i)}\) as a row vector).

Definition 6

(Tropical Determinant) Let q be a positive integer. For any tropical matrix A of size \(q\times q\) with entries in \(\mathbb {R}\cup \{-\infty \}\), the tropical determinant of A is defined as:

$$\begin{aligned} \text {tdet\,}(A) := \max _{\sigma \in S_{q}}\left\{ A_{1, \sigma (1)}+A_{2, \sigma (2)}+\ldots +A_{q, \sigma (q)}\right\} , \end{aligned}$$

where \(S_{q}\) is the set of all permutations of \([q]:=\{1, \ldots , q\}\), and \(A_{i,j}\) denotes the (i, j)-th entry of A.
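A direct Python sketch of this definition, by brute force over all q! permutations (the function name is ours; entries equal to \(-\infty \) can be encoded as float("-inf")):

```python
from itertools import permutations

def tdet(A):
    """Tropical determinant of a q x q matrix A: the maximum over permutations
    of the sum of the selected entries."""
    q = len(A)
    return max(sum(A[i][s[i]] for i in range(q)) for s in permutations(range(q)))

# e.g. tdet([[0, 5], [0, -5]]) == 5, matching p_A({1,2}) in Example 3 below
```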

Remark 2

The tropical determinant of any non-square matrix is \(-\infty \).

Our treatment of tropical linear spaces largely follows [10, Sections 3 and 4].

Definition 7

(Tropical Plücker Vector) Let \([d]:= \{1, \ldots , d\}\). For two positive integers d, m with \(d>m\), if a map \(p:[d]^m\mapsto \mathbb {R}\cup \{-\infty \}\) satisfies the following conditions:

  1. \(p(\omega )\) depends only on the unordered set \(\omega =\{\omega _1,\ldots ,\omega _m\}\subseteq [d]\),

  2. \(p(\omega )=-\infty \) whenever \(\omega \) has fewer than m elements, and

  3. for any \(\sigma = \{\sigma _1,\ldots ,\sigma _{m-1}\}\subseteq [d]\) and for any \(\tau = \{\tau _1,\ldots ,\tau _{m+1}\}\subseteq [d]\), the maximum

    $$\begin{aligned}\max \left\{ p\left( \sigma \cup \{\tau _1\}\right) +p\left( \tau \setminus \{\tau _1\}\right) ,\ldots ,p\left( \sigma \cup \{\tau _{m+1}\}\right) +p\left( \tau \setminus \{\tau _{m+1}\}\right) \right\} \end{aligned}$$

    is attained at least twice,

then we say p is a tropical Plücker vector.

Definition 8

(Tropical Plücker Coordinate) For two positive integers d, m with \(d>m\), let \(p:[d]^m\mapsto \mathbb {R}\cup \{-\infty \}\) be a tropical Plücker vector. For any m-sized subset \(\omega \subseteq [d]\), \(p(\omega )\) is called a tropical Plücker coordinate of p.

Definition 9

(Tropical Linear Space) Let \(p:[d]^m\mapsto \mathbb {R}\cup \{-\infty \}\) be a tropical Plücker vector. The tropical linear space of p is the set of points \(x\in ({\mathbb {R}}\cup \{-\infty \})^d/{\mathbb {R}} \textbf{1}\) such that, for any \(\tau = \{\tau _1,\ldots ,\tau _{m+1}\}\subseteq [d]\), the maximum

$$\begin{aligned} \max \left\{ p\left( \tau \setminus \{\tau _1\}\right) +x_{\tau _1}, \ldots , p\left( \tau \setminus \{\tau _{m+1}\}\right) +x_{\tau _{m+1}}\right\} \end{aligned}$$

is attained at least twice. We denote by \(L_p\) the tropical linear space of p.

A note for experts: it is well known that tropical linear spaces are tropically convex [3, Proposition 5.2.8].

Definition 10

(Stiefel Tropical Linear Space) For two positive integers d, m with \(d>m\), let A be a matrix of size \(m\times d\) with entries in \({\mathbb {R}}\cup \{-\infty \}\). For any m-sized subset \(\omega \subseteq [d]\), we write \(A_\omega \) for the \(m\times m\) matrix whose columns are the columns of A indexed by the elements of \(\omega \). Notice that

$$\begin{aligned} p_{A}:\omega \mapsto \text {tdet\,}\left( A_\omega \right) \end{aligned}$$

is a tropical Plücker vector associated with A. The tropical linear space of \(p_{A}\) is called the Stiefel tropical linear space of A, denoted by L(A).
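The Plücker coordinates \(p_A(\omega )=\text {tdet}(A_\omega )\) can be computed by brute force from the matrix A; a minimal Python sketch (function names are ours; tdet is as in the sketch after Definition 6):

```python
from itertools import combinations, permutations

def tdet(M):
    q = len(M)
    return max(sum(M[i][s[i]] for i in range(q)) for s in permutations(range(q)))

def pluecker_vector(A):
    """Map each m-subset ω of the column indices {1,...,d} to tdet(A_ω)."""
    m, d = len(A), len(A[0])
    return {frozenset(w): tdet([[row[j - 1] for j in w] for row in A])
            for w in combinations(range(1, d + 1), m)}

# Matrix A of Example 3 below (with c = 1); this reproduces
# p({1,2}) = p({1,3}) = 5, p({1,4}) = 1, p({2,3}) = 10, p({2,4}) = 4, p({3,4}) = 6.
c = 1
A = [[0, 5, -5, c], [0, -5, 5, -c]]
p = pluecker_vector(A)
```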

Remark 3

Let A be a tropical matrix of size \((d-1) \times d\). Then the Stiefel tropical linear space of A is a tropical hyperplane. Furthermore, any tropical hyperplane is a Stiefel tropical linear space [16, Remark 1.21]. For more details on the geometry of tropical linear spaces including tropical hyperplanes, see [14, 17].

Example 2

Let \( A = \left( \begin{matrix}-\omega _1 &{} -\infty &{} 0\\ -\infty &{} -\omega _2 &{} 0\end{matrix}\right) \). Then the tropical Plücker coordinates of \(p_A\) are \(p_{A}(\{1,2\})\!=\!\text {tdet\,}\left( \begin{matrix}-\omega _1&{}-\infty \\ -\infty &{}-\omega _2\end{matrix}\right) \!=\!-\omega _1-\omega _2\), \(p_{A}(\{1,3\})\!=\!\text {tdet\,}\left( \begin{matrix}-\omega _1&{}0\\ -\infty &{}0\end{matrix}\right) \!=\!-\omega _1\), and \(p_{A}(\{2,3\})\!=\!\text {tdet\,}\left( \begin{matrix}-\infty &{}0\\ -\omega _2&{}0\end{matrix}\right) \!=\!-\omega _2\). The Stiefel tropical linear space of A consists of the points x for which

$$\begin{aligned}{} & {} \max \left\{ p_{A}(\{2,3\})+x_1, ~ p_{A}(\{1,3\})+x_2, ~ p_{A}(\{1,2\})+x_3 \right\} \\= & {} \max \left\{ -\omega _2 + x_1, ~ -\omega _1 + x_2, ~ -\omega _1-\omega _2+x_3\right\} \end{aligned}$$

is attained at least twice. This is the tropical hyperplane \(H_{(\omega _1,\omega _2,0)}\).

Example 3

Let \(A = \left( \begin{matrix}0 &{} 5 &{} -5 &{} c\\ 0 &{} -5 &{} 5 &{} -c\end{matrix}\right) \) with \(0< c < 5\). Then the tropical Plücker coordinates of \(p_A\) are \(p_{A}(\{1,2\})=p_{A}(\{1,3\})=5\), \(p_{A}(\{1,4\})=c\), \(p_{A}(\{2,3\})=10\), \(p_{A}(\{2,4\})=5-c\), and \(p_{A}(\{3,4\})=5+c.\) The Stiefel tropical linear space of A consists of x for which the maximum is attained at least twice for any of the following four cases of \((2+1)\)-subsets of [4]:

$$\begin{aligned} \text {For } \tau =\{2,3,4\},{} & {} \max \left\{ 5+c+x_2, 5-c+x_3, 10+x_4\right\} .\end{aligned}$$
(1)
$$\begin{aligned} \text {For } \tau =\{1,3,4\},{} & {} \max \left\{ 5+c+x_1, c+x_3, 5+x_4\right\} .\end{aligned}$$
(2)
$$\begin{aligned} \text {For } \tau =\{1,2,4\},{} & {} \max \left\{ 5-c+x_1, c+x_2, 5+x_4\right\} .\end{aligned}$$
(3)
$$\begin{aligned} \text {For } \tau =\{1,2,3\},{} & {} \max \left\{ 10+x_1, 5+x_2, 5+x_3\right\} . \end{aligned}$$
(4)

Without loss of generality, we will set \(x_1=0\). From (4), Case-A (\(x_2=5\) and \(x_3 \le 5\)), Case-B (\(x_3=5\) and \(x_2 \le 5\)), or Case-C (\(x_2=x_3 \ge 5\)) holds.

  • Case-A: (3) \(\Rightarrow x_4=c \Rightarrow \) (2), (1) (already satisfied). Thus, \(x_2=5, x_3 \le 5, x_4=c\).

  • Case-B: (2) \(\Rightarrow x_4 \le c\). (1) \(\Leftrightarrow \) (3). Thus, \(x_2=5-2c, x_3=5, x_4\le -c\) or \(x_2\le 5-2c, x_3=5, x_4=-c\) or \(x_3=5, x_2=x_4+5-c, -c \le x_4 \le c\).

  • Case-C: (3) \(\Rightarrow x_4 = x_3 + c-5 \Rightarrow \) (2), (1). Thus, \(x_2=x_3 \ge 5, x_4=x_3+c-5\).

Taken together, the Stiefel tropical linear space for \(c=1\) is shown in Fig. 1. The two hinges of the tropical linear space, \((0,5-2c,5,-c)\) and \((0, 5, 5, c)\), are connected to each other, and each hinge is also connected to two more half-lines in a balanced manner.

Fig. 1

The Stiefel tropical linear space in Example 3 with \(c=1\) (red). The blue perpendicular represents the projection of \(u=(0,7,0,1)\) to \(w=(0,5,0,1)\) on the tropical linear space in Example 4

Remark 4

This Stiefel tropical linear space contains the unique tropical line segment connecting \(p=(0,5,-5,1)\) and \(q=(0,-5,5,-1)\) \((= p + (0,0,10,0) + (0,-2,0,-2) + (0,-8,0,0))\), with four half-lines attached [17]. The Stiefel tropical linear space for \(m=2\) as in Example 3 is one-dimensional and is determined by specifying two points in general position on it ([17] or Theorem 17 in this paper). In practice, one-dimensional Stiefel tropical linear spaces are suitable for visualization.

Remark 5

Although the four conditions (1), (2), (3) and (4) are imposed, the solution set is neither a single point nor empty. That is, the conditions are somewhat redundant, and it is not the case that each condition cuts down the dimension by one. For example, the intersection (and the stable intersection) of (1) and (2) alone is already L(A). Here the stable intersection, which can be computed using only the tropical Plücker coordinates, always reduces the dimension [11]. The intersection of (1) and (3) calculated by hand is a set of mixed dimension, while the stable intersection of (1) and (3) results in a one-dimensional Stiefel tropical linear space different from L(A). Similarly, the stable intersection of (3) and (4) is L(A). Finally, the stable intersection of L(A) and (3) is the single point \((0,5-2c,5,-c)\).

To perform a “tropical principal component analysis”, we need to project a data point onto a Stiefel tropical linear space, which is realized by the Red and Blue Rules [10, Theorem 15].

Theorem 1

(The Blue Rule) Let \(p:[d]^m\mapsto {\mathbb {R}}\cup \{-\infty \}\) be a tropical Plücker vector and \(L_p\) its associated tropical linear space. Fix \(u\in {\mathbb {R}}^d/{\mathbb {R}} \textbf{1}\), and define the point \(w\in {\mathbb {R}}^d/{\mathbb {R}} \textbf{1}\) whose i-th coordinate is

$$\begin{aligned} \quad w_i \,\, = \,\, \textrm{max}_\tau \,\textrm{min}_{j \not \in \tau } \bigl \{ u_j + p({\tau \cup \{i\}}) - p({\tau \cup \{j\}}) \bigr \}, \end{aligned}$$

for \(i = 1,2,\ldots , d\), where \(\tau \) runs over all \((m-1)\)-subsets of [d] that do not contain i. Then \(w\in L_p\), and every other \(x\in L_p\) satisfies \(d_{\textrm{tr}}(u,x)\ge d_{\textrm{tr}}(u,w)\). In other words, w attains the minimum distance to u among all points of \(L_p\).

Remark 6

This closest point may not be unique and there may be other points in \(L_p\) which have the same tropical distance from u.
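A direct Python sketch of the Blue Rule formula, by brute force over all \((m-1)\)-subsets \(\tau \); the Plücker vector p is passed as a dictionary keyed by frozensets of indices, as in the sketch after Definition 10 (function and variable names are ours):

```python
from itertools import combinations

def blue_rule(u, p, m, d):
    """Project u ∈ R^d/R·1 onto the tropical linear space L_p (Theorem 1).
    p maps each frozenset of m indices from {1,...,d} to its Plücker coordinate."""
    w = []
    for i in range(1, d + 1):
        candidates = []
        # τ runs over all (m-1)-subsets of [d] not containing i
        for tau in combinations([k for k in range(1, d + 1) if k != i], m - 1):
            tau = set(tau)
            candidates.append(min(u[j - 1] + p[frozenset(tau | {i})] - p[frozenset(tau | {j})]
                                  for j in range(1, d + 1) if j not in tau))
        w.append(max(candidates))
    return w

# With the Plücker coordinates of Example 3 (c = 1), projecting u = (0, 7, 0, 1)
# returns (0, 5, 0, 1), matching the calculation in Example 4 below.
```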

Theorem 2

(The Red Rule) Let \(p:[d]^m\mapsto {\mathbb {R}} \cup \{-\infty \}\) be a tropical Plücker vector and \(L_p\) its associated tropical linear space. Fix \(u\in {\mathbb {R}}^d/{\mathbb {R}} \textbf{1}\). Let v be the all-zeros vector. For every \((m+1)\)-sized subset \(\tau \) of [d], compute \(\max _{i}\{p(\tau \setminus \{\tau _i\}) + u_{\tau _i}\}\). If this maximum is attained uniquely, say at index \(\tau _i\), then let \(\gamma _{\tau ,\tau _i}\) be the (positive) difference between this maximum and the second maximum, and set \(v_{\tau _i}=\max \{v_{\tau _i}, \gamma _{\tau ,\tau _i}\}\).

Then v gives the difference between u and a closest point of \(L_p\). In particular, if w is the point in \(L_p\) returned by the Blue Rule, we have \(u = w + v\).
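A matching Python sketch of the Red Rule (again brute force; names are ours). It returns the vector v, so that the Blue Rule projection can be recovered coordinatewise as \(w = u - v\), consistent with \(u = w + v\):

```python
from itertools import combinations

def red_rule(u, p, m, d):
    """Red Rule of Theorem 2: v_i records, over all (m+1)-subsets τ containing i,
    the largest gap by which the term p(τ\\{i}) + u_i uniquely wins the maximum."""
    v = [0.0] * d
    for tau in combinations(range(1, d + 1), m + 1):
        terms = sorted(((p[frozenset(set(tau) - {t})] + u[t - 1], t) for t in tau),
                       reverse=True)
        (top, t_top), (second, _) = terms[0], terms[1]
        if top > second:                     # maximum attained uniquely at index t_top
            v[t_top - 1] = max(v[t_top - 1], top - second)
    return v

# For Example 3 (c = 1) and u = (0, 7, 0, 1) this gives v = (0, 2, 0, 0),
# and u - v = (0, 5, 0, 1) agrees with the Blue Rule output in Example 4 below.
```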

Example 4

The projection from a point \(u=(0,7,0,1) \in {\mathbb {R}}^4/{\mathbb {R}} \textbf{1}\) to the Stiefel tropical linear space in Example 3 with \(c=1\) is given by the Blue Rule as

$$\begin{aligned} w_i{} & {} := \max _{\tau \ne i} \left\{ \min _{j \ne \tau } \left[ u_j + p(\{\tau ,i\}) - p(\{\tau ,j\}) \right] \right\} \\{} & {} = \max _{\tau \ne i} \left\{ p(\{\tau ,i\}) + \min _{j \ne \tau } \left[ u_j - p(\{\tau ,j\}) \right] \right\} , \end{aligned}$$

where the second term is independent of i. As we do not want to repeat the same calculation for different i, we define

$$\begin{aligned} C(\tau ) := \min _{j \ne \tau } \left[ u_j - p(\{\tau ,j\}) \right] = \min _{j \ne \tau } \left( \begin{array}{r} 0 - p(\{\tau ,1\}) \\ 7 - p(\{\tau ,2\}) \\ 0 - p(\{\tau ,3\}) \\ 1 - p(\{\tau ,4\}) \\ \end{array} \right) _j , \end{aligned}$$
(5)

whose values are

$$\begin{aligned} C(1)= & {} \min \left( \begin{array}{r} 7 - p(\{1,2\}) \\ 0 - p(\{1,3\}) \\ 1 - p(\{1,4\}) \\ \end{array} \right) = \min \left( \begin{array}{r} 7 - 5 \\ 0 - 5 \\ 1 - 1 \\ \end{array} \right) = -5 , \nonumber \\ C(2)= & {} \min \left( \begin{array}{r} 0 - p(\{2,1\}) \\ 0 - p(\{2,3\}) \\ 1 - p(\{2,4\}) \\ \end{array} \right) = \min \left( \begin{array}{c} 0 - 5 \\ 0 - 10\\ 1 - 4 \\ \end{array} \right) = -10 , \nonumber \\ C(3)= & {} \min \left( \begin{array}{r} 0 - p(\{3,1\}) \\ 7 - p(\{3,2\}) \\ 1 - p(\{3,4\}) \\ \end{array} \right) = \min \left( \begin{array}{c} 0 - 5 \\ 7 - 10 \\ 1 - 6 \\ \end{array} \right) = -5 , \nonumber \\ C(4)= & {} \min \left( \begin{array}{r} 0 - p(\{4,1\}) \\ 7 - p(\{4,2\}) \\ 0 - p(\{4,3\}) \\ \end{array} \right) = \min \left( \begin{array}{r} 0 - 1 \\ 7 - 4 \\ 0 - 6 \\ \end{array} \right) = -6 . \end{aligned}$$
(6)

Thus

$$\begin{aligned} w_i = \max _{\tau \ne i} \left\{ p(\{\tau ,i\}) + C(\tau ) \right\} = \max _{\tau \ne i} \left( \begin{array}{c} p(\{1,i\}) -5 \\ p(\{2,i\}) -10 \\ p(\{3,i\}) -5 \\ p(\{4,i\}) -6 \\ \end{array} \right) _\tau , \end{aligned}$$
(7)

that is,

$$\begin{aligned} w_1= & {} \max \left( \begin{array}{c} p(\{2,1\}) -10\\ p(\{3,1\}) -5\\ p(\{4,1\}) -6\\ \end{array} \right) = \max \left( \begin{array}{c} 5 - 10 \\ 5 - 5 \\ 1 - 6 \\ \end{array} \right) = 0 , \nonumber \\ w_2= & {} \max \left( \begin{array}{r} p(\{1,2\}) -5\\ p(\{3,2\}) -5\\ p(\{4,2\}) -6\\ \end{array} \right) = \max \left( \begin{array}{c} 5 - 5 \\ 10 - 5 \\ 4 - 6 \\ \end{array} \right) = 5 , \nonumber \\ w_3= & {} \max \left( \begin{array}{c} p(\{1,3\}) -5\\ p(\{2,3\}) -10\\ p(\{4,3\}) -6\\ \end{array} \right) = \max \left( \begin{array}{c} 5 - 5 \\ 10 - 10 \\ 6 - 6 \\ \end{array} \right) = 0 , \nonumber \\ w_4= & {} \max \left( \begin{array}{c} p(\{1,4\}) -5\\ p(\{2,4\}) -10\\ p(\{3,4\}) -5\\ \end{array} \right) = \max \left( \begin{array}{c} 1 - 5 \\ 4 - 10 \\ 6 - 5 \\ \end{array} \right) = 1 . \end{aligned}$$
(8)

So the Blue Rule outputs the vector (0, 5, 0, 1).

The Red Rule constructs a vector v as follows. First, we begin with \(v = (0, 0, 0, 0)\). Next we run over the 3-sized subsets \(\tau \) of [4] to update the components of v. When \(\tau = \{2,3,4\}\), compute \(\max \{p(\{3,4\})+u_2, p(\{2,4\})+u_3, p(\{2,3\})+u_4\} = \max \{6 + 7, 4 + 0, 10 + 1\} = 13\), attained at index 2, so \(v_2 = 13-11=2\). When \(\tau = \{1,3,4\}\), compute \(\max \{p(\{3,4\})+u_1, p(\{1,4\})+u_3, p(\{1,3\})+u_4\} = \max \{6 + 0, 1 + 0, 5 + 1\} = 6\), attained at both indices 1 and 4. As it is a tie, v is not updated. When \(\tau = \{1,2,4\}\), compute \(\max \{p(\{2,4\})+u_1, p(\{1,4\})+u_2, p(\{1,2\})+u_4\} = \max \{4 + 0, 1 + 7, 5 + 1\} = 8\), attained at index 2, so \(v_2 = 8-6=2\). When \(\tau = \{1,2,3\}\), compute \(\max \{p(\{2,3\})+u_1, p(\{1,3\})+u_2, p(\{1,2\})+u_3\} = \max \{10 + 0, 5 + 7, 5 + 0\} = 12\), attained at index 2, so \(v_2 = 12 - 10 = 2\). Hence the output vector is \( v = (0,2,0,0)\).

The statement of Theorem 2 that \(u = w + v\) clearly holds here: \((0,7,0,1) = (0,5,0,1) + (0,2,0,0)\).

Remark 7

For simplicity, when \(\tau \) contains only one element, we treat it as a positive integer instead of a set.

We write \(\pi _{L(A)}\) for the projection map that takes a point \(u\in \mathbb {R}^d/{\mathbb {R}} \textbf{1}\) and returns the nearest point \(w\in L(A)\) given by the Blue Rule. Depending on the size of m (i.e., the number of rows of A), we may prefer either the Blue Rule or the Red Rule to compute \(\pi _{L(A)}(u)\). If m is relatively small, then we can compute \(\pi _{L(A)}(u)\) naively with the Blue Rule in \(O(d^{m+1})\) operations. Conversely, if m is relatively large, then we can use the Red Rule to compute the projection in \(O(m\cdot (d/m)^{m+1})\) operations. In practice, we note that most of the permutations considered in the Red and Blue Rules do not seem to affect the computation. There is an algorithm faster than the Red and Blue Rules for computing a projection onto a tropical linear space, given by Theorem 2 in [18]; however, in this research we only consider the Red and Blue Rules.

3 Best-fit Stiefel tropical linear space

In analogy with the classical PCA, the \((m-1)\)-th tropical PCA in [6] minimizes the sum of the tropical distances between the data points and their projections onto a best-fit Stiefel tropical linear space of dimension \(m-1\), defined by a tropical matrix of size \(m\times d\).

Definition 11

(Best-fit Stiefel Tropical Linear Space) Suppose we have a sample \(\mathcal {S} = \{x^{(1)}, \ldots , x^{(n)}\} \subset \mathbb {R}^d /{\mathbb {R}}{} \textbf{1}\). Let A be a tropical matrix of size \(m\times d\) with \(d>m\), and let L(A) be the Stiefel tropical linear space of A. If L(A) minimizes

$$\begin{aligned} \sum _{i=1}^n d_{\textrm{tr}} \left( x^{(i)}, \pi _{L(A)}\left( {x}^{(i)}\right) \right) , \end{aligned}$$

then we say L(A) is an \((m-1)\)-dimensional best-fit Stiefel tropical linear space of \(\mathcal {S}\). Here we recall that \(\pi _{L(A)}({x}^{(i)})\) is the projection of \(x^{(i)}\) onto the Stiefel tropical linear space L(A) for \(i = 1, \ldots , n\).
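The objective that a best-fit L(A) minimizes can be evaluated numerically by composing the earlier sketches; a minimal illustration, assuming the trop_dist, pluecker_vector, and blue_rule helpers sketched after Definitions 3 and 10 and Theorem 1 (all names are ours):

```python
def tropical_pca_cost(A, sample):
    """Sum of tropical distances between each sample point and its projection
    onto the Stiefel tropical linear space L(A)."""
    m, d = len(A), len(A[0])
    p = pluecker_vector(A)
    return sum(trop_dist(x, blue_rule(list(x), p, m, d)) for x in sample)
```

A best-fit L(A) can then be searched for, e.g., by minimizing this cost over the entries of A numerically, as is done for the contour plots in Fig. 2.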

Example 5

In the case of \(m=2\) and \(d=3\), the Stiefel tropical linear space becomes a tropical hyperplane, as shown in Example 2. For the sample \(\mathcal {S} = \{x^{(1)}, \ldots , x^{(8)}\} \subset \mathbb {R}^3 /{\mathbb {R}}{} \textbf{1}\) in Fig. 2 (left), the best-fit hyperplane according to the numerical calculation in Fig. 2 (middle) has the normal vector \(\omega =(0,0)\), which also gives the coordinates of the apex (hinge point).

Fig. 2

(left) The best-fit hyperplane (gray) and the Fermat–Weber points (green) for the eight points (top) and iris data (bottom, only Setosa and Versicolor used). (middle) The contour plots of the cost function for the tropical PCA with minimum pointed by the red cross for the eight points (top) and iris data (bottom). (right) The contour plot of the cost function for the Fermat–Weber point (green) for the eight points (top) and iris data (bottom)

Definition 12

(Fermat–Weber Point) Suppose we have a sample \(\mathcal {S} = \{x^{(1)}, \ldots , x^{(n)}\} \subset \mathbb {R}^d /{\mathbb {R}}{} \textbf{1}\). A Fermat–Weber point \(x^*\) of \(\mathcal {S}\) is defined as:

$$\begin{aligned} x^* := \mathop {\textrm{arg}\,\textrm{min}}_{z \in \mathbb {R}^d /{\mathbb {R}}{} \textbf{1}}\sum _{i = 1}^n d_{\textrm{tr}}\left( z, x^{(i)}\right) . \end{aligned}$$

Remark 8

Under the tropical metric \(d_{\textrm{tr}}\), a Fermat–Weber point need not be unique [19].

Remark 9

A Fermat–Weber point is a 0-dimensional best-fit Stiefel tropical linear space of a sample with respect to the tropical metric over the tropical projective torus \({\mathbb {R}}^d/{\mathbb {R}} \textbf{1}\).
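Since \(d_{\textrm{tr}}(z, x) = \max _{j,k}\{(z_j - x_j) - (z_k - x_k)\}\), one Fermat–Weber point can be computed by linear programming. The sketch below is one such formulation (not the method of [19]; variable and function names are ours): minimize \(\sum _i r_i\) subject to \(r_i \ge (z_j - x^{(i)}_j) - (z_k - x^{(i)}_k)\) for all j, k.

```python
import numpy as np
from scipy.optimize import linprog

def fermat_weber(sample):
    """One Fermat-Weber point of the sample (rows of `sample`) under d_tr,
    via the LP: minimize sum_i r_i with z_j - z_k - r_i <= x_ij - x_ik."""
    X = np.asarray(sample, dtype=float)
    n, d = X.shape
    c = np.concatenate([np.zeros(d), np.ones(n)])      # variables: z_1..z_d, r_1..r_n
    A_ub, b_ub = [], []
    for i in range(n):
        for j in range(d):
            for k in range(d):
                if j == k:
                    continue
                row = np.zeros(d + n)
                row[j], row[k], row[d + i] = 1.0, -1.0, -1.0
                A_ub.append(row)
                b_ub.append(X[i, j] - X[i, k])
    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(None, None)] * (d + n))
    z = res.x[:d]
    return z - z[-1]            # normalize the representative so that z_d = 0
```

Because the Fermat–Weber set need not be a single point (Remark 8), different LP solvers may return different optimal representatives, all with the same objective value.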

Example 6

The Fermat–Weber points for Example 5 are all the points in the green region in Fig. 2 (left), according to the numerical calculation in Fig. 2 (right).

4 Gaussian distribution fitted by Stiefel tropical linear spaces over \(\mathbb {R}^d/\mathbb {R}{} \textbf{1}\)

4.1 Best-fit tropical hyperplanes

As a simple special case of the tropical PCA, we begin with a sample \(\mathcal {S} = \{X_1, \ldots , X_d\}\) from a single uncorrelated Gaussian, i.e., \(X_i \sim N((0,\ldots ,0), \sigma \mathbb {I}_{d \times d})\), where \(\mathbb {I}_{d \times d}\) is the identity matrix and \(\sigma > 0\), and consider its best-fit hyperplane. The first goal is to show that, as \(\sigma \rightarrow 0\), the best-fit hyperplane is the one whose apex is located at the center of the Gaussian.

Lemma 3

Let \(X_1, X_2, X_3 \sim N(0,\sigma ^2)\). Then the mean tropical distances in \({\mathbb {R}}^3/{\mathbb {R}} \textbf{1}\) from \((X_1,X_2,X_3)\) to the tropical hyperplane \(H_0\) and to the tropical line consisting of the points (0, 0, z) for \(z \in {\mathbb {R}}\) are given by \(\frac{3}{2\sqrt{\pi }}\sigma \) and \(\frac{2}{\sqrt{\pi }}\sigma \), respectively.

Proof

As \((X_1,X_2,X_3) = (X_1-X_3,X_2-X_3,0)\) in \({\mathbb {R}}^3/{\mathbb {R}} \textbf{1}\), we define new coordinates as

$$\begin{aligned} \left\{ \, \begin{aligned}&Y_1 := \frac{(X_1-X_3)+(X_2-X_3)}{\sqrt{2}}\\&Y_2 := \frac{-(X_1-X_3)+(X_2-X_3)}{\sqrt{2}} \end{aligned} \right. ~ ~ ~ ,\text { whose covariances are} ~ ~ \Sigma _Y = \left( \begin{array}{cc} 3\sigma ^2 &{} 0 \\ 0 &{} \sigma ^2 \\ \end{array} \right) . \end{aligned}$$

Then, due to the symmetry in integration, the mean tropical distance to \(H_0\) is given by

$$\begin{aligned}{} & {} \int _{-\infty }^\infty \int _{-\infty }^\infty d_\textrm{tr}\left( \left( \frac{y_1-y_2}{\sqrt{2}},\frac{y_1+y_2}{\sqrt{2}},0\right) ,H_0\right) \frac{1}{2\pi \sqrt{3}\sigma ^2}e^{-\frac{y_1^2}{2\left( 3\sigma ^2\right) }}e^{-\frac{y_2^2}{2\sigma ^2}} dy_1 dy_2 \\{} & {} \quad = 2 \int _0^\infty dy_2 \int _{-\infty }^\infty dy_1 d_\textrm{tr}\left( \left( \frac{y_1-y_2}{\sqrt{2}},\frac{y_1+y_2}{\sqrt{2}},0\right) ,H_0\right) \frac{1}{2\pi \sqrt{3}\sigma ^2}e^{-\frac{y_1^2}{2(3\sigma ^2)}}e^{-\frac{y_2^2}{2\sigma ^2}} \\{} & {} \quad = 2 \int _0^\infty \int _0^\infty \left[ d_\textrm{tr}\left( \left( \frac{y_1-y_2}{\sqrt{2}},\frac{y_1+y_2}{\sqrt{2}},0\right) ,H_0\right) + d_\textrm{tr}\left( \left( \frac{-y_1-y_2}{\sqrt{2}},\frac{-y_1+y_2}{\sqrt{2}},0\right) ,H_0\right) \right] \\{} & {} \quad \quad \times \frac{1}{2\pi \sqrt{3}\sigma ^2}e^{-\frac{y_1^2}{2\left( 3\sigma ^2\right) }}e^{-\frac{y_2^2}{2\sigma ^2}} dy_1 dy_2 \\{} & {} \quad = { 2 \left( \int _0^\infty \int _0^{y_1} + \int _0^\infty \int _{y_1}^\infty \right) \left[ d_\textrm{tr}\left( \left( \frac{y_1-y_2}{\sqrt{2}},\frac{y_1+y_2}{\sqrt{2}},0\right) ,H_0\right) + d_\textrm{tr}\left( \left( \frac{-y_1-y_2}{\sqrt{2}},\frac{-y_1+y_2}{\sqrt{2}},0\right) ,H_0\right) \right] } \\{} & {} \quad \quad \times { \frac{1}{2\pi \sqrt{3}\sigma ^2}e^{-\frac{y_1^2}{2\left( 3\sigma ^2\right) }}e^{-\frac{y_2^2}{2\sigma ^2}} dy_2 dy_1 } \\{} & {} \quad = { 2\int _0^\infty \int _0^{y_1} \left[ \frac{y_2+y_1}{\sqrt{2}}\right] \frac{1}{2\pi \sqrt{3}\sigma ^2}e^{-\frac{y_1^2}{2\left( 3\sigma ^2\right) }}e^{-\frac{y_2^2}{2\sigma ^2}} dy_2 dy_1 } \\{} & {} \quad \quad + { 2\int _0^\infty \int _{y_1}^\infty \left[ \frac{2 y_2}{\sqrt{2}}\right] \frac{1}{2\pi \sqrt{3}\sigma ^2}e^{-\frac{y_1^2}{2\left( 3\sigma ^2\right) }}e^{-\frac{y_2^2}{2\sigma ^2}} dy_2 dy_1 } \\{} & {} \quad = 2 \int _0^\infty \int _0^\infty \sqrt{2}y_2 \frac{1}{2\pi \sqrt{3}\sigma ^2}e^{-\frac{y_1^2}{2\left( 3\sigma ^2\right) }}e^{-\frac{y_2^2}{2\sigma ^2}} dy_1 dy_2 \\{} & {} \quad \quad + 2 \int _0^\infty \int _0^{y_1} \frac{y_1 - y_2}{\sqrt{2}} \frac{1}{2\pi \sqrt{3}\sigma ^2}e^{-\frac{y_1^2}{2\left( 3\sigma ^2\right) }}e^{-\frac{y_2^2}{2\sigma ^2}} dy_1 dy_2 \\{} & {} \quad = \frac{3}{2\sqrt{\pi }}\sigma . \end{aligned}$$

There we used

$$\begin{aligned} \int _0^\infty \int _0^\infty y_2 \frac{1}{2\pi \sqrt{3}\sigma ^2}e^{-\frac{y_1^2}{2\left( 3\sigma ^2\right) }}e^{-\frac{y_2^2}{2\sigma ^2}} dy_1 dy_2 = \frac{1}{2}\frac{\sigma }{\sqrt{2\pi }}, \\ \int _0^\infty \int _0^{y_1} y_1 \frac{1}{2\pi \sqrt{3}\sigma ^2} e^{-\frac{y_1^2}{2\left( 3\sigma ^2\right) }}e^{-\frac{y_2^2}{2\sigma ^2}} dy_1 dy_2= & {} \frac{3\sqrt{2}\sigma }{8\sqrt{\pi }}, \end{aligned}$$

and

$$\begin{aligned} \int _0^\infty \int _0^{y_1} y_2 \frac{1}{2\pi \sqrt{3}\sigma ^2} e^{-\frac{y_1^2}{2\left( 3\sigma ^2\right) }}e^{-\frac{y_2^2}{2\sigma ^2}} dy_1 dy_2= & {} \frac{\sqrt{2}\sigma }{8\sqrt{\pi }}. \end{aligned}$$

As \(\min _z d_{\textrm{tr}}((X_1,X_2,X_3),(0,0,z)) = d_\textrm{tr}((X_1,X_2,X_3),(0,0, X_3 - \frac{X_1+X_2}{2})) = |X_2-X_1|\), the mean tropical distance to the line is, by the symmetry in integration, given by

$$\begin{aligned} \int _{-\infty }^\infty \int _{-\infty }^\infty \left|x_2-x_1 \right|\frac{1}{2\pi \sigma ^2}e^{-\frac{x_1^2+x_2^2}{2\sigma ^2}} dx_1 dx_2= & {} \int _0^\infty \int _{x_2}^\infty 8 x_1 \frac{1}{2\pi \sigma ^2}e^{-\frac{x_1^2+x_2^2}{2\sigma ^2}} dx_2 dx_1\\= & {} \frac{2}{\sqrt{\pi }}\sigma . \end{aligned}$$

\(\square \)

Lemma 4

Let \(X_1, X_2, X_3 \sim N(0,\sigma ^2)\). Then the mean tropical distances in \({\mathbb {R}}^3/{\mathbb {R}} \textbf{1}\) from \((X_1,X_2,X_3)\) to the tropical hyperplane \(H_0\) and to the tropical hyperplane \(H_{(0,0,-c)}\) for \(c > 0\), divided by \(\sigma \), are given by \(\frac{3}{2\sqrt{\pi }}\) and \(\frac{2}{\sqrt{\pi }}\), respectively, as \(\sigma \rightarrow 0\).

Proof

The distance to \(H_0\) is given in Lemma 3. We regard \(x^{\prime }_1=\frac{x_1-x_3}{\sigma }\) and \(x^{\prime }_2=\frac{x_2-x_3}{\sigma }\) as random variables, whose joint probability density function \(p(x^{\prime }_1,x^{\prime }_2)\) was shown to be the correlated Gaussian in the proof of Lemma 3, to get

$$\begin{aligned}{} & {} \lim _{\sigma \rightarrow 0} \left( \int _{-\infty }^{-c/\sigma } + \int _{-c/\sigma }^\infty \right) \left( \int _{-\infty }^{-c/\sigma } + \int _{-c/\sigma }^\infty \right) \\{} & {} \qquad \times \frac{1}{\sigma } d_\textrm{tr}\left( (x_1-x_3,x_2-x_3,0),H_{(0,0,-c)}\right) p\left( \frac{x_1-x_3}{\sigma },\frac{x_2-x_3}{\sigma }\right) d\left( \frac{x_1-x_3}{\sigma }\right) d\left( \frac{x_2-x_3}{\sigma }\right) \\{} & {} \quad =\lim _{\sigma \rightarrow 0} \left( \int _{-\infty }^{-c/\sigma } + \int _{-c/\sigma }^\infty \right) \left( \int _{-\infty }^{-c/\sigma } + \int _{-c/\sigma }^\infty \right) d_{\textrm{tr}}\left( \left( x^{\prime }_1,x^{\prime }_2,0\right) ,H_{\left( 0,0,-c/\sigma \right) }\right) p\left( x^{\prime }_1,x^{\prime }_2\right) d\left( x^{\prime }_1\right) d\left( x^{\prime }_2\right) \\{} & {} \quad = \lim _{\sigma \rightarrow 0} \int _{-c/\sigma }^\infty \int _{-c/\sigma }^\infty d_\textrm{tr}\left( \left( x^{\prime }_1,x^{\prime }_2,0\right) ,H_{\left( 0,0,-c/\sigma \right) }\right) p\left( x^{\prime }_1,x^{\prime }_2\right) d\left( x^{\prime }_1\right) d\left( x^{\prime }_2\right) \\{} & {} \quad = \frac{2}{\sqrt{\pi }}. \end{aligned}$$

\(\square \)

It was shown that, for \(d=3\), the mean of the sum of distances between observations and their projections onto a tropical hyperplane \(H_\omega \) for \(\omega \in {\mathbb {R}}^d \!/{\mathbb {R}} \textbf{1}\) attains its minimum at \(H_0\), i.e., when the center of the Gaussian is at the apex of the hyperplane. We are curious whether the same holds for hyperplanes with general d. Note that if the center of the Gaussian does not lie on the hyperplane, the mean distance remains finite (\(>0\)) even as \(\sigma \rightarrow 0\). Thus, to find the best-fit hyperplane, it suffices to consider the case when the Gaussian center lies on the hyperplane (if not exactly at the apex). Furthermore, since the defining condition of \(H_0\) is that the maximum of \(\{x_1, x_2, \ldots , x_d\}\) is attained at least twice, we only need to consider the cases separately according to how many times the maximum is attained.

Theorem 5

Let \(X_1, X_2, \ldots , X_d \sim N(0,\sigma ^2)\). Then, in the limit \(\sigma \rightarrow 0\), the mean tropical distance divided by \(\sigma \) from \((X_1,X_2,\ldots , X_d) \in {\mathbb {R}}^{d}/{\mathbb {R}} \textbf{1}\) to a tropical hyperplane \(H_\omega \) that passes through the origin depends only on how many times (say k times) the maximum is attained in the defining equation at the origin. It equals the mean tropical distance, divided by \(\sigma \), to \(H_0\) in the k-dimensional case \((=d_{\textrm{tr}}((X_1, \ldots , X_k), H_0))\). Specifically, when the maximum is attained three times or twice, it is \(\frac{3}{2\sqrt{\pi }}\) or \(\frac{2}{\sqrt{\pi }}\), respectively.

Proof

When the hyperplane \(H_\omega \) passes through the origin, \(\max \{\omega _1, \omega _2, \ldots , \omega _d\}\) is attained at least twice (k times). By changing the coordinates, the condition can be written as

$$\begin{aligned} \omega _1 = \omega _2 = \cdots = \omega _k = 0 ~ ~ \text {and} ~ ~ \omega _i< 0 ~ ~ \text {for} ~ ~ k < i \le d. \\ \begin{array}{rl} &{} d_{\textrm{tr}}\left( \left( X_1, \ldots , X_k, X_{k+1}, \ldots , X_d\right) , H_\omega \right) \\ = &{} d_{\textrm{tr}}\left( \left( X_1, \ldots , X_k, X_{k+1}+\omega _{k+1}, \ldots , X_d+\omega _d\right) , H_0\right) \\ = &{} \max _{1 \le i \le k} X_i - \textrm{2nd} \max _{1 \le i \le k} X_i . \end{array} \end{aligned}$$

Note that the last equality holds only in a neighborhood of the origin, namely when \(|x_i|< |\max _{k < j \le d} \omega _j|/2\) for \(1 \le i \le d\), which holds with probability tending to one as \(\sigma \rightarrow 0\). The numerical calculation of the distance is plotted in Fig. 3, and the specific cases \(k=2,3\) coincide with the values \(\frac{2}{\sqrt{\pi }}\) and \(\frac{3}{2\sqrt{\pi }}\) obtained in Lemmas 3 and 4. \(\square \)

Remark 10

A point on \(H_0\) can be represented, after reordering coordinates, as \(x_1=x_2=\cdots =x_k = \max _i x_i\) for some \(2 \le k \le d\). Specifically, the apex is the case of highest codimension, where all the \(x_i\)'s are equal \((k=d)\). Note that \(d_\textrm{tr}((X_1, \ldots , X_k, X_{k+1}, \ldots , X_d), H_\omega )\) in \(\mathbb R^{d}/{\mathbb {R}} \textbf{1}\) is the same as \(d_{\textrm{tr}}((X_1, \ldots , X_k), H_0)\) in \({\mathbb {R}}^{k}/{\mathbb {R}} \textbf{1}\), because the difference of the first and second maxima of \((X_1,X_2,\ldots ,X_k)\) equals \(d_{\textrm{tr}}((X_1, \ldots , X_k), H_0)\). Thus, to prove that the mean distance attains its minimum at \(H_0\), i.e., when the center of the Gaussian is at the apex of the hyperplane, for general d, it suffices to show that the mean distance to \(H_0\) decreases with k, as suggested by Fig. 3.

Fig. 3

Monte Carlo calculation of \(\mathbb {E} [d_{\textrm{tr}}((X_1, \ldots , X_k), H_0)]\) as a function of dimension k where \(X_1, X_2, \ldots , X_k \sim N(0,\sigma ^2)\). The average over \(10^6\) realizations suggests that it is monotonically decreasing. The red circle denotes the theoretical predictions \(\frac{2}{\sqrt{\pi }}\) and \(\frac{3}{2\sqrt{\pi }}\)
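A minimal Monte Carlo sketch of the quantity plotted in Fig. 3, using the fact from the proof of Theorem 5 that \(d_{\textrm{tr}}((X_1,\ldots ,X_k), H_0)\) equals the gap between the largest and second largest coordinates (here \(\sigma =1\); names are ours):

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_dist_to_H0(k, n_samples=10**6):
    """Monte Carlo estimate of E[d_tr((X_1,...,X_k), H_0)] for X_i ~ N(0,1),
    i.e. E[max X_i - second max X_i]."""
    X = rng.standard_normal((n_samples, k))
    X.sort(axis=1)
    return (X[:, -1] - X[:, -2]).mean()

# k = 2 and k = 3 approach 2/sqrt(pi) ≈ 1.128 and 3/(2 sqrt(pi)) ≈ 0.846,
# matching Lemmas 3 and 4; the estimates decrease with k, as stated in
# Conjecture 6 below.
for k in range(2, 9):
    print(k, round(mean_dist_to_H0(k), 4))
```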

Conjecture 6

\(\mathbb {E} [d_{\textrm{tr}}((X_1, \ldots , X_k), H_0)]\) monotonically decreases with k. Thus, the hyperplane that best fits X converges, as \(\sigma \rightarrow 0\), to \(H_0\), i.e., the hyperplane whose apex is at the center of the Gaussian.

4.2 Best-fit Stiefel tropical linear spaces

Next we consider a non-hyperplane Stiefel tropical linear space as the subspace. In the hyperplane case, we considered not only the convergence of the mean tropical distance to zero but also its convergence rate. Along this line, our ultimate goal is to prove the following conjecture.

Conjecture 7

Let \(X_1, X_2, \ldots , X_d \sim N(0,\sigma ^2)\). Then, as \(\sigma \rightarrow 0\), the expectation of the tropical distance from \((X_1,X_2,\ldots , X_d) \in {\mathbb {R}}^{d}/{\mathbb {R}} \textbf{1}\) to the Stiefel tropical linear space \(L_P\), divided by \(\sigma \), attains its minimum at \(P=0\).

However, for a general Stiefel tropical linear space it is hard to determine the convergence rate exactly, although we can give an upper bound for it. Therefore, we mostly focus on the convergence itself, although the minimizers whose mean distance goes to zero as \(\sigma \rightarrow 0\) are not unique in general. In what follows, we begin with a specific example of a (non-hyperplane) Stiefel tropical linear space for which the projection distance goes to zero as \(\sigma \rightarrow 0\). We end this section with a discussion of the non-uniqueness of the minimizer, by showing that the mean distance goes to zero as \(\sigma \rightarrow 0\) whenever a Stiefel tropical linear space passes through the center of the Gaussian.

Lemma 8

The Plücker coordinates of the Stiefel tropical linear space associated with the \(2 \times d\) matrix,

$$\begin{aligned} A_1 = \left( \begin{matrix} \mu _1 &{} -\infty &{} 0 &{} \ldots &{} 0\\ -\infty &{} \mu _2&{} 0 &{} \ldots &{} 0 \end{matrix} \right) . \end{aligned}$$
(9)

are

$$\begin{aligned} p(\{1,2\})= \textrm{tdet} \left( \begin{matrix} \mu _1 &{} -\infty \\ -\infty &{} \mu _2 \end{matrix}\right) =\mu _1 + \mu _2, \\ p(\{1,i\})=\textrm{tdet}\left( \begin{matrix} \mu _1 &{} 0 \\ -\infty &{} 0 \end{matrix}\right) =\mu _1, \end{aligned}$$

for \(i = 3, \ldots , d\),

$$\begin{aligned} p(\{2,i\})=\textrm{tdet}\left( \begin{matrix}-\infty &{} 0\\ \mu _2 &{} 0 \end{matrix}\right) =\mu _2, \end{aligned}$$

for \(i = 3, \ldots , d\), and

$$\begin{aligned} p(\{i,j\})=\textrm{tdet}\left( \begin{matrix}0 &{} 0\\ 0 &{} 0 \end{matrix}\right) =0, \end{aligned}$$

for \(i \not = j\) such that \(i = 3, \ldots , d\) and \(j = 3, \ldots , d\).

To unify the notation in the following proofs, we consistently use the indicator function, for \(j \in \mathbb {N}\) and any fixed \(m \in \mathbb {N}\),

$$\begin{aligned} \text {I}_{j\le m}:= {\left\{ \begin{array}{ll}1 &{} \text{ if } j\le m \\ 0 &{} \text{ if } j> m\end{array}\right. }, \end{aligned}$$

and the Kronecker delta for \(i, j \in \mathbb {N}\),

$$\begin{aligned} \delta _{i j} := {\left\{ \begin{array}{ll}0 &{} \text{ if } i \ne j \\ 1 &{} \text{ if } i=j .\end{array}\right. } \end{aligned}$$

Lemma 9

Suppose \(X = (\mu _1 + \epsilon _1, \mu _2 + \epsilon _2, \epsilon _3, \ldots , \epsilon _d) \in \mathbb {R}^d/\mathbb {R}{} \textbf{1}\) where \(\mu _1, \mu _2, \epsilon _j \in \mathbb {R}\) for \(j = 1, \ldots , d\). Then the projected point \(X' \in \mathbb {R}^d/\mathbb {R}{} \textbf{1}\) of X onto the Stiefel tropical linear space of the matrix (9) is

$$\begin{aligned} X' = \left( \mu _1 + \min \{\epsilon ^*,\epsilon _1\}, \mu _2 + \min \{\epsilon ^*,\epsilon _2\}, \min \{\epsilon ^*,\epsilon _3\}, \ldots , \min \{\epsilon ^*,\epsilon _d\}\right) , \end{aligned}$$

where \(\epsilon ^*\) is the second smallest value in \(\{\epsilon _1, \ldots , \epsilon _d\}\).

Proof

     By using the indicator function, we can unify the notation as

$$\begin{aligned} X_j = \mu _j \text {I}_{j \le 2} + \epsilon _j, \end{aligned}$$

and, by Lemma 8,

$$\begin{aligned} p(\{\tau , i\}) = \mu _\tau \text {I}_{\tau \le 2} + \mu _i \text {I}_{i \le 2} . \end{aligned}$$

Then, the Blue Rule becomes

$$\begin{aligned} X'_i = \max _{\tau \ne i} \min _{j \ne \tau } \left( X_j+p(\{\tau , i\}) - p(\{\tau , j\})\right) = \mu _i \text {I}_{i \le 2} + \max _{\tau \ne i} \min _{j \ne \tau } \epsilon _j. \end{aligned}$$

Suppose \(\epsilon _{i_\text {min}}\) reaches the smallest value in \(\{\epsilon _1, \ldots , \epsilon _d\}\), then

$$\begin{aligned} X'_i = {\left\{ \begin{array}{ll} \mu _i \text {I}_{i \le 2} + \epsilon _{i_\text {min}} &{} i=i_\text {min} \\ \mu _i \text {I}_{i \le 2} + \epsilon ^* &{} i\ne i_\text {min} \end{array}\right. } \end{aligned}$$

\(\square \)

Remark 11

For example, if \(\epsilon _{1}\le \epsilon _{2}\le \ldots \le \epsilon _{d}\), then the second smallest value in \(\{\epsilon _1, \ldots , \epsilon _d\}\) is \(\epsilon _{2}\). The second smallest value in \(\{2,2,1\}\) is 2, and the second smallest value in \(\{2,1,1\}\) is 1.

Theorem 10

Suppose \(X = (\mu _1 + \epsilon _1, \mu _2 + \epsilon _2, \epsilon _3, \ldots , \epsilon _d) \in \mathbb {R}^d/\mathbb {R}{} \textbf{1}\) such that \(\mu _1, \mu _2 \in \mathbb {R}\) and \(\epsilon _j \sim N(0, \sigma )\) for \(j = 1, \ldots , d\). Let \(X' \in \mathbb {R}^d/\mathbb {R}{} \textbf{1}\) be the projected point of X onto the one-dimensional Stiefel tropical linear space of the matrix (9). Then the tropical distance between X and \(X'\) is

$$\begin{aligned} d_{\textrm{tr}}\left( X, X'\right) = \max _{1\le i\le d}\left( \epsilon _i - \epsilon ^*\right) , \end{aligned}$$

where \(\epsilon ^*\) is the second smallest value in \(\{\epsilon _1, \ldots , \epsilon _d\}\), and its expected value satisfies

$$\begin{aligned} \mathbb {E} \left[ d_{\textrm{tr}}\left( X, X'\right) \right] \le 2\sigma \sqrt{2\log (d)}. \end{aligned}$$

Specifically,

$$\begin{aligned} \lim _{\sigma \rightarrow 0} \mathbb {E} \left[ d_{\textrm{tr}}\left( X, X'\right) \right] = 0. \end{aligned}$$

Proof

Lemma 9 leads to the tropical distance. By the upper bound in [20],

$$\begin{aligned} \mathbb {E} \left[ d_{\textrm{tr}}\left( X, X'\right) \right]{} & {} \le \mathbb {E}\left[ \max _{1\le i\le d} \epsilon _i-\min _{1\le i\le d} \epsilon _i\right] = \mathbb {E}\left[ \max _{1\le i\le d} \epsilon _i\right] +\mathbb {E}\left[ \max _{1\le i\le d} \left( -\epsilon _i\right) \right] \\{} & {} \le 2\sigma \sqrt{2\log (d)}. \end{aligned}$$

\(\square \)
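A quick Monte Carlo sanity check of the distance formula and the bound in Theorem 10, drawing \(\epsilon _j\) with standard deviation \(\sigma \) (the chosen d, \(\sigma \) and all names are ours):

```python
import numpy as np

rng = np.random.default_rng(1)
d, sigma, n_samples = 10, 0.3, 10**5

eps = rng.normal(0.0, sigma, size=(n_samples, d))     # epsilon_1, ..., epsilon_d
eps_sorted = np.sort(eps, axis=1)
# d_tr(X, X') = max_i epsilon_i - epsilon*, with epsilon* the second smallest epsilon
dist = eps_sorted[:, -1] - eps_sorted[:, 1]

print(dist.mean(), 2 * sigma * np.sqrt(2 * np.log(d)))   # the mean stays below the bound
```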

Next, we consider a generalization to the correlated Gaussian. Suppose we have a sample \(\mathcal {S} = \{X_1, \ldots , X_n\}\) where \(X_i \sim N(\mu , \Sigma )\), such that \(\mu =(\mu _1, \mu _2, 0, \ldots , 0)\in {\mathbb {R}}^d/{\mathbb {R}} \textbf{1}\) and \(\Sigma \in \mathbb {R}^{d \times d}\) such that

$$\begin{aligned} \Sigma = \left( \begin{array}{llllll} {2\sigma ^2} &{} {\sigma ^2} &{} 0 &{} 0 &{}\ldots &{}0\\ {\sigma ^2} &{} {2\sigma ^2} &{} 0 &{} 0 &{}\ldots &{}0\\ 0 &{} 0 &{} \sigma ^2 &{} 0 &{} \ldots &{}0\\ 0 &{} 0 &{} 0&{} \sigma ^2 &{} \ldots &{}0\\ \vdots &{} \vdots &{} \vdots &{} \vdots &{} \ddots &{} \vdots \\ 0 &{} 0 &{} 0 &{} 0 &{}\ldots &{} \sigma ^2 \\ \end{array} \right) , \end{aligned}$$

for \(\sigma > 0\). Then by [21, p. 202], we have

$$\begin{aligned} \begin{array}{lll} X_{i,1} = &{}\mu _1 + \sigma Z_{i,1} + {\sigma Z_{i} }&{} \\ X_{i,2} = &{}\mu _2 + \sigma Z_{i,2} + {\sigma Z_{i} } &{} \\ X_{i,j} = &{}\sigma Z_{i,j}&{} ~ ~ \text{ for } j = 3, \ldots , d,\\ \end{array} \end{aligned}$$

where \(X_i = (X_{i,1}, X_{i,2}, \ldots , X_{i,d})\) for \(i = 1, \ldots , n\), and \( { Z_{i}}, Z_{i,1}, Z_{i,2}, \ldots , Z_{i,d} \sim N(0, 1)\) for \(i = 1, \ldots , n\).

Lemma 11

Suppose \(X = (\mu _1 + \epsilon _1 + {\epsilon }, \mu _2 + \epsilon _2 + {\epsilon }, \epsilon _3, \ldots , \epsilon _d) \in \mathbb {R}^d/\mathbb {R}{} \textbf{1}\) where \(\mu _1, \mu _2, \epsilon , \epsilon _j \in \mathbb {R}\) for \(j = 1, \ldots , d\). Then the projected point \(X' \in \mathbb {R}^d/\mathbb {R}{} \textbf{1}\) of X onto the Stiefel tropical linear space of the matrix (9) is

$$\begin{aligned} X' =\left( \mu _1 + \min \{\epsilon ^*,\epsilon _1 + {\epsilon }\}, \mu _2 + \min \left\{ \epsilon ^*,\epsilon _2 + {\epsilon }\right\} , \min \left\{ \epsilon ^*,\epsilon _3\right\} , \ldots , \min \left\{ \epsilon ^*,\epsilon _d\right\} \right) , \end{aligned}$$

where \(\epsilon ^*\) is the second smallest value in \(\{(\epsilon _1 + {\epsilon }),(\epsilon _2 + {\epsilon }), \epsilon _3, \ldots , \epsilon _d\}\).

Proof

     By using

$$\begin{aligned} X_j = \mu _j \text {I}_{j \le 2} + \epsilon _j + {\epsilon \text {I}_{j \le 2}}, \end{aligned}$$

and

$$\begin{aligned} p(\{\tau , i\}) = \mu _\tau \text {I}_{\tau \le 2} + \mu _i \text {I}_{i \le 2} , \end{aligned}$$

the Blue Rule becomes

$$\begin{aligned} X'_i = \mu _i \text {I}_{i \le 2} + \max _{\tau \ne i} \min _{j \ne \tau } \left( \epsilon _j + { \epsilon \text {I}_{j \le 2}}\right) = \mu _i \text {I}_{i \le 2} + \min \left\{ \epsilon ^*, \epsilon _i + { \epsilon \text {I}_{i \le 2}}\right\} . \end{aligned}$$

Note that we essentially repeated the same arguments for \(\epsilon '_j := \epsilon _j + { \epsilon \text {I}_{j \le 2}}\) instead of \(\epsilon _j\) in Lemma 9. \(\square \)

Theorem 12

Suppose \(X = (\mu _1 + \epsilon _1 + {\epsilon }, \mu _2 + \epsilon _2 + {\epsilon }, \epsilon _3, \ldots , \epsilon _d) \in \mathbb {R}^d/\mathbb {R}{} \textbf{1}\) such that \(\mu _{1}, \mu _{2} \in \mathbb {R}\) and \(\epsilon _j \sim N(0, \sigma )\) for \(j = 1, \ldots , d\). Let \(X' \in \mathbb {R}^d/\mathbb {R}{} \textbf{1}\) be the projected point of X onto the one-dimensional Stiefel tropical linear space of the matrix (9). Then the tropical distance between X and \(X'\) is

$$\begin{aligned} d_{\textrm{tr}}\left( X, X'\right) = \max _{i = 1, \ldots , d}\left\{ \epsilon _i + { \epsilon \text {I}_{i \le 2}} - \epsilon ^*\right\} , \end{aligned}$$

where \(\epsilon ^*\) is the second smallest value in \(\{(\epsilon _1 + {\epsilon }),(\epsilon _2 + {\epsilon }), \epsilon _3, \ldots , \epsilon _d\}\), and its expected value satisfies

$$\begin{aligned} \mathbb {E} \left[ d_{\textrm{tr}}\left( X, X'\right) \right] \le {3}\sigma \sqrt{2\log (d)}. \end{aligned}$$

Specifically,

$$\begin{aligned} \lim _{\sigma \rightarrow 0} \mathbb {E} \left[ d_{\textrm{tr}}\left( X, X'\right) \right] = 0. \end{aligned}$$

Proof

Lemma 11 leads to the tropical distance. By the upper bound in [20],

$$\begin{aligned} \mathbb {E} \left[ d_{\textrm{tr}}\left( X, X'\right) \right] \le \mathbb {E} \left[ \max _{i = 1, \ldots , d}(2\epsilon _i)+\max _{i = 1, \ldots , d}(-\epsilon _i)\right] \le {3}\sigma \sqrt{2\log (d)}. \end{aligned}$$

\(\square \)

Next, we consider a generalization to Stiefel tropical linear spaces of dimension greater than one. Suppose we have a sample \(\mathcal {S} = \{X_1, \ldots , X_n\}\) where \(X_i \sim N(\mu , \sigma ^2 \mathbb {I}_{d \times d})\), such that \(\mu =(\mu _1, \mu _2, \ldots , \mu _m, 0, \ldots , 0)\in {\mathbb {R}}^d/{\mathbb {R}} \textbf{1}\) for \(m < d\), and \(\sigma > 0\).

Theorem 13

Suppose \(X = (\mu _1 + \epsilon _1, \ldots , \mu _m + \epsilon _m, \epsilon _{m+1}, \ldots , \epsilon _d) \in \mathbb {R}^d/\mathbb {R}\textbf{1}\) such that \(\mu _1, \mu _2, \ldots , \mu _m \in \mathbb {R}\) and \(\epsilon _j \sim N(0, \sigma )\) for \(j = 1, \ldots , d\). Let \(X' \in \mathbb {R}^d/\mathbb {R}{} \textbf{1}\) be the projected point of X onto the \((m-1)\)-dimensional Stiefel tropical linear space of the \(m \times d\) matrix \(A_{m-1}\),

$$\begin{aligned} A_{m-1} = \left( \begin{array}{llllllll} \mu _1 &{} -\infty &{} -\infty &{} \ldots &{} -\infty &{} 0 &{} \ldots &{} 0\\ -\infty &{} \mu _2&{} -\infty &{} \ldots &{} -\infty &{} 0 &{} \ldots &{} 0\\ -\infty &{} -\infty &{} \mu _3 &{} \ldots &{} -\infty &{} 0 &{} \ldots &{} 0\\ \vdots &{} \vdots &{} \vdots &{} \ddots &{} \vdots &{} \vdots &{} \vdots &{} \vdots \\ -\infty &{} -\infty &{} -\infty &{} -\infty &{} \mu _m&{} 0 &{} \ldots &{} 0\\ \end{array} \right) . \end{aligned}$$
(10)

Then

$$\begin{aligned} \lim _{\sigma \rightarrow 0} \mathbb {E} \left[ d_{\textrm{tr}}\left( X, X'\right) \right] = 0. \end{aligned}$$

Proof

By using

$$\begin{aligned} X_{j} = \mu _j \text {I}_{j \le m} + \epsilon _j \end{aligned}$$

and

$$\begin{aligned} p(\tau \cup \{i\}) = \sum _{s \in \tau \cup \{i\}} \mu _s \text {I}_{s \le m}, \end{aligned}$$

the Blue Rule becomes

$$\begin{aligned} X'_{i}= & {} \max _{\tau \subset [d]\setminus \{i\}} \min _{j \notin \tau } \left( X_{j}+p(\tau \cup \{i\}) - p(\tau \cup \{j\})\right) \\= & {} \max _{\tau \subset [d]\setminus \{i\}} \min _{j \notin \tau } \left( \mu _j \text {I}_{j \le m} + \epsilon _j + \mu _i\text {I}_{i\le m} - \mu _j \text {I}_{j \le m}\right) \\= & {} \mu _i\text {I}_{i\le m} + \max _{\tau \subset [d]\setminus \{i\}} \min _{j \notin \tau } \epsilon _j \\= & {} \mu _i\text {I}_{i\le m} + \min \{\epsilon ^*,\epsilon _i\} , \end{aligned}$$

where \(\epsilon ^*\) denotes the m-th minimum value in \(\{\epsilon _1, \ldots , \epsilon _d\}\). Then

$$\begin{aligned} d_{\textrm{tr}}\left( X, X'\right) = \max _{i = 1, \ldots , d}\left( \epsilon _i - \epsilon ^*\right) . \end{aligned}$$

By [20],

$$\begin{aligned} \mathbb {E} \left[ d_{\textrm{tr}}\left( X, X'\right) \right] \le \mathbb {E} \left[ \max _{i = 1, \ldots , d} \epsilon _i+\max _{i = 1, \ldots , d} (-\epsilon _i)\right] \le 2\sigma \sqrt{2\log (d)}. \end{aligned}$$

\(\square \)

Next, we consider a generalization to the correlated Gaussian as well as more than one dimensional Stiefel tropical linear spaces. Suppose we have a sample \(\mathcal {S} = \{X_1, \ldots , X_n\}\) where \(X_i \sim N(\mu , \Sigma )\), such that \(\mu =(\mu _1, \mu _2, \ldots , \mu _m, 0, \ldots , 0)\in {{\mathbb {R}}}^d/{\mathbb {R}} \textbf{1}\) for \(m < d\), and \(\Sigma \in \mathbb {R}^{d \times d}\) such that

$$\begin{aligned} \Sigma = \left( \begin{array}{ll} \mathbb {M} &{} {\textbf{0}_{m \times (d-m)}}\\ \textbf{0}_{(d - m) \times m} &{} {\sigma ^{2}} {\mathbb {I}}_{(d-m) \times (d-m)}\\ \end{array} \right) , \end{aligned}$$

where \(\mathbb {M}\) is a \(m \times m\) matrix such that

$$\begin{aligned} \mathbb {M} = { \left( \begin{matrix} 2\sigma ^2 &{} \sigma ^2 &{} \sigma ^2 &{}\ldots &{} \sigma ^2\\ \sigma ^2 &{} 2\sigma ^2 &{} \sigma ^2 &{} \ldots &{} \sigma ^2\\ \sigma ^2 &{} \sigma ^2 &{} 2\sigma ^2 &{} \ldots &{} \sigma ^2\\ \vdots &{} \vdots &{} \vdots &{} \ddots &{} \vdots \\ \sigma ^2 &{} \sigma ^2 &{} \sigma ^2&{} \ldots &{} 2\sigma ^2\\ \end{matrix}\right) }, \end{aligned}$$

\(\mathbb {I}_{(d-m) \times (d-m)}\) is the \((d-m) \times (d-m)\) identity matrix, \(\textbf{0}_{m \times (d-m)}\) is the \(m \times (d-m)\) matrix with all zeros, \(\textbf{0}_{(d-m) \times m}\) is the \((d-m) \times m\) matrix with all zeros, and \(\sigma > 0\).

Then we have

$$\begin{aligned} \begin{matrix} X_{i,1} &{}=&{} \mu _1 + \sigma Z_{i,1} + {\sigma Z_{i}}\\ \vdots &{} \vdots &{} \vdots \\ X_{i,m} &{}=&{} \mu _m + \sigma Z_{i,m} + {\sigma Z_{i}} \\ X_{i,j} &{}=&{} \sigma Z_{i,j}, \text{ for } j = (m+1), \ldots ,d,\\ \end{matrix} \end{aligned}$$

where \(X_i = (X_{i,1}, X_{i,2}, \ldots , X_{i,d})\) for \(i = 1, \ldots , n\), and \({Z_{i}}, Z_{i,1}, Z_{i,2}, \ldots , Z_{i,d} \sim N(0, 1)\) for \(i = 1, \ldots , n\).

Theorem 14

Suppose \(X = (\mu _1 + { \epsilon _1 + \epsilon }, \ldots , \mu _m + { \epsilon _m + \epsilon }, \epsilon _{m+1}, \ldots , \epsilon _d) \in \mathbb {R}^d/\mathbb {R}{} \textbf{1}\) such that \(\mu _1, \mu _2, \ldots , \mu _m \in \mathbb {R}\) and \(\epsilon _j \sim N(0, \sigma )\) for \(j = 1, \ldots , d\). Let \(X' \in \mathbb {R}^d/\mathbb {R}{} \textbf{1}\) be the projected point of X onto the \((m-1)\)-dimensional Stiefel tropical linear space of the matrix \(A_{m-1}\) in (10). Then

$$\begin{aligned} \lim _{\sigma \rightarrow 0} \mathbb {E} \left[ d_{\textrm{tr}}\left( X, X'\right) \right] = 0. \end{aligned}$$

Proof

By using

$$\begin{aligned} X_{j} = \mu _j \text {I}_{j \le m} + \epsilon _j + {\epsilon \text {I}_{j \le m}} \end{aligned}$$

and

$$\begin{aligned} p(\tau \cup \{i\}) = \sum _{s \in \tau \cup \{i\}} \mu _s \text {I}_{s \le m}, \end{aligned}$$

the Blue Rule becomes

$$\begin{aligned} X'_i= & {} \max _{\tau \subset [d]\setminus \{i\}} \min _{j \notin \tau } \left( X_j+p(\tau \cup \{i\}) - p(\tau \cup \{j\})\right) \\= & {} \mu _i \text {I}_{i \le m} + \max _{\tau \subset [d]\setminus \{i\}} \min _{j \notin \tau } \left( \epsilon _j + {\epsilon \text {I}_{j \le m}}\right) \\= & {} \mu _i \text {I}_{i \le m} + \min \left\{ \epsilon ^*,\epsilon _i + {\epsilon \text {I}_{i \le m}}\right\} , \end{aligned}$$

where \(\epsilon ^*\) denotes the m-th minimum value in \(\{\epsilon _1 + {\epsilon }, \ldots , \epsilon _m + {\epsilon }, \epsilon _{m+1}, \ldots , \epsilon _d\}\). Then

$$\begin{aligned} d_{\textrm{tr}}(X, X') = \max _{i = 1, \ldots , d}\left( \epsilon _i + {\epsilon \text {I}_{i \le m}} - \epsilon ^*\right) . \end{aligned}$$

By [20],

$$\begin{aligned} \mathbb {E} \left[ d_{\textrm{tr}}\left( X, X'\right) \right] \le {3} \mathbb {E} \left[ \max _{i = 1, \ldots , d} \epsilon _i\right] \le {3} \sigma \sqrt{2\log (d)} \end{aligned}$$
(11)

\(\square \)

So far we have fixed the specific Stiefel tropical linear spaces associated with the matrices \(A_1\) in (9) and \(A_{m-1}\) in (10), and we have shown that \(\lim _{\sigma \rightarrow 0} \mathbb {E} [d_{\textrm{tr}}(X, X')] = 0\). However, note that a Stiefel tropical linear space with this property is not unique. In fact, any Stiefel tropical linear space that passes through the center of the Gaussian distribution has this property.

Theorem 15

Suppose we have a random variable

$$\begin{aligned} X = \left( \mu _{1} + \epsilon _{1}, \mu _{2} + \epsilon _{2}, \mu _{3}+\epsilon _{3}, \ldots , \mu _{d} +\epsilon _{d}\right) \end{aligned}$$

where \(\mu _{j} \in \mathbb {R}\) and \(\epsilon _{j} \sim N(0, \sigma )\) with small \(\sigma > 0\) for \(j = 1, \ldots , d\). Suppose we project X to a Stiefel tropical linear space that passes through \(\mu =(\mu _{1}, \mu _{2}, \mu _{3}, \ldots , \mu _{d})\). Then the expected value of the tropical distance between X and the projected point \(X'\) goes to 0 as \(\sigma \rightarrow 0\).

Proof

Since \(\mu \) lies on the Stiefel tropical linear space and \(X'\) is the tropical projection of X onto it, \(d_{\textrm{tr}}(X, X') \le d_{\textrm{tr}}(X, \mu )\). By [20],

$$\begin{aligned} \mathbb {E}\left[ d_{\textrm{tr}}\left( X, X'\right) \right] \le \mathbb {E}\left[ d_{\textrm{tr}}(X, \mu )\right] = \mathbb {E}\left[ d_{\textrm{tr}}\left( \mu +\epsilon , \mu \right) \right] = \mathbb {E}\left[ d_\textrm{tr}(\epsilon , 0)\right] \le 2 \sigma \sqrt{2 \log d}. \end{aligned}$$

\(\square \)

Example 7

The projection of the point \(U = (\mu _1, \mu _2, \mu _3) \in \mathbb {R}^3/\mathbb {R}{} \textbf{1}\) to the Stiefel tropical linear space with \(P = (P_{12}, P_{13}, P_{23})\) is \(w = (\mu _1, \mu _2, \mu _3)\) if and only if P is on the hyperplane \(H_U\).

5 Mixture of two Gaussians fitted by a Stiefel tropical linear space of dimension one over \({{\mathbb {R}}}^d \!/{\mathbb {R}} \textbf{1}\)

Here we consider the tropical PCA for the mixture of two Gaussians, whose centers are located in general positions.

5.1 Deterministic setting: Stiefel tropical linear space of dimension one that passes through given two points

Under the assumption of infinitesimal variances, the problem of finding the best-fit Stiefel tropical linear space for a mixture of two Gaussians reduces to the deterministic problem of finding the one-dimensional Stiefel tropical linear space that passes through the centers of both Gaussians. Here we specifically prove that the one-dimensional Stiefel tropical linear space that passes through two given points exists and is unique.

Lemma 16

The Stiefel tropical linear space with the Plücker coordinates \(P = (P_{12}, P_{13}, P_{23})\) that passes through two given points \(\mu = (\mu _1, \mu _2, \mu _3)\) and \(\nu = (\nu _1, \nu _2, \nu _3) \in \mathbb {R}^3/\mathbb {R}{} \textbf{1}\) in general position (\(\mu _i-\nu _i \ne \mu _j-\nu _j\) for \(1 \le i < j \le 3\)) is unique.

Proof

The condition that \(\mu \) is on P is (by the definition of the Stiefel tropical linear space) that

$$\begin{aligned} \max \left\{ P_{23}+\mu _1, P_{13}+\mu _2, P_{12}(=0)+\mu _3 \right\} \end{aligned}$$

is attained at least twice. Similarly, the condition that \(\nu \) is on P is that

$$\begin{aligned} \max \left\{ P_{23}+\nu _1, P_{13}+\nu _2, P_{12}(=0)+\nu _3 \right\} \end{aligned}$$

is attained at least twice. Thus P must be in the union of the following nine regions.

  1. (1-1)

    When \(P_{23}+\mu _1 \le P_{13}+\mu _2 = \mu _3\) and \(P_{23}+\nu _1 \le P_{13}+\nu _2 = \nu _3\), then \(P_{13} = \mu _3 - \mu _2 = \nu _3 - \nu _2\), which contradicts the general position assumption.

  2. (1-2)

    When \(P_{23}+\mu _1 \le P_{13}+\mu _2 = \mu _3\) and \(\nu _3 = P_{23}+\nu _1 \ge P_{13}+\nu _2\), then, \(P_{23}=\nu _3-\nu _1\) and \(P_{13}=\mu _3-\mu _2\) if \(\mu _1 - \nu _1 \le \mu _3 - \nu _3 \le \mu _2 - \nu _2\) is satisfied.

  3. (1-3)

    When \(P_{23}+\mu _1 \le P_{13}+\mu _2 = \mu _3\) and \(P_{23}+\nu _1 = P_{13}+\nu _2 \ge \nu _3\), then, \(P_{13} = \mu _3-\mu _2\) and \(P_{23} = \mu _3-\mu _2 +\nu _2 -\nu _1\) if \(\mu _1 - \nu _1 \le \mu _2 - \nu _2 \le \mu _3 - \nu _3\) is satisfied.

  4. (2-1)

    Swap \(\mu \) and \(\nu \) in (1-2).

  5. (2-2)

    When \(\mu _3 = P_{23}+\mu _1 \ge P_{13}+\mu _2\) and \(\nu _3 = P_{23}+\nu _1 \ge P_{13}+\nu _2\), then \(P_{23} = \mu _3 - \mu _1 = \nu _3 - \nu _1\), which contradicts the general position assumption.

  6. (2-3)

    When \(\mu _3 = P_{23}+\mu _1 \ge P_{13}+\mu _2\) and \(P_{23}+\nu _1 = P_{13}+\nu _2 \ge \nu _3\), then \(P_{23}=\mu _3-\mu _1\) and \(P_{13}= \mu _3-\mu _1+\nu _1 -\nu _2\) if \(\mu _2 - \nu _2 \le \mu _1 - \nu _1 \le \mu _3 - \nu _3\) is satisfied.

  7. (3-1)

    Swap \(\mu \) and \(\nu \) in (1-3).

  8. (3-2)

    Swap \(\mu \) and \(\nu \) in (2-3).

  9. (3-3)

    When \(P_{23}+\mu _1 = P_{13}+\mu _2 \ge \mu _3\) and \(P_{23}+\nu _1 = P_{13}+\nu _2 \ge \nu _3\), then \(P_{13} - P_{23} = \mu _1-\mu _2 = \nu _1-\nu _2\), which contradicts the general position assumption.

In a unified description, \(P_{13} = \nu _3 - \nu _2 + \max (\mu _1-\nu _1, \mu _3-\nu _3) - \max (\mu _1-\nu _1, \mu _2-\nu _2)\) and \(P_{23} = \nu _3 - \nu _1 + \max (\mu _2-\nu _2, \mu _3-\nu _3) - \max (\mu _2-\nu _2, \mu _1-\nu _1)\); these are clearly unique. \(\square \)
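
The unified description above can be checked numerically. The sketch below (assuming NumPy; the two points and the helper names are ours) computes \((P_{12}, P_{13}, P_{23})\) with the normalization \(P_{12} = 0\) and verifies that the maximum in the membership condition is attained at least twice at both \(\mu \) and \(\nu \).

```python
import numpy as np

def line_through(mu, nu):
    """Plucker coordinates (P12, P13, P23), normalized so that P12 = 0, of the
    tropical line through mu and nu in R^3/R1 (unified formula from the proof)."""
    m, n = np.asarray(mu, float), np.asarray(nu, float)
    d1, d2, d3 = m - n                        # mu_i - nu_i
    P12 = 0.0
    P13 = n[2] - n[1] + max(d1, d3) - max(d1, d2)
    P23 = n[2] - n[0] + max(d2, d3) - max(d2, d1)
    return P12, P13, P23

def on_line(point, P, tol=1e-9):
    """A point lies on the tropical line iff the maximum below is attained twice."""
    P12, P13, P23 = P
    terms = np.array([P23 + point[0], P13 + point[1], P12 + point[2]])
    return np.sum(terms >= terms.max() - tol) >= 2

mu, nu = (3.0, 1.0, 0.0), (0.0, 2.0, 0.0)     # arbitrary points in general position
P = line_through(mu, nu)
print(P, on_line(mu, P), on_line(nu, P))
```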

Remark 12

Another, simpler proof is available: by Example 7, P must lie on both the hyperplane \(H_\mu \) and the hyperplane \(H_\nu \), whose intersection is unique. However, our proof generalizes easily to higher dimensions and clarifies the conditions on general position.

Remark 13

One intuitive interpretation of why (1-1), (2-2) and (3-3) lead to contradictions is that they impose symmetric conditions on \(\mu \) and \(\nu \). The other conditions impose \(\mu _i- \nu _i \le \mu _j - \nu _j \le \mu _k - \nu _k\) and yield the unique hyperplane with a specific configuration that passes through the two points. Without loss of generality we can set \(\mu _3=\nu _3=0\). Then (1-2), (1-3) and (2-3) hold for \(\mu _1 < \nu _1\), while (2-1), (3-1) and (3-2) hold for \(\mu _1 > \nu _1\).

Theorem 17

The Stiefel tropical linear space with \(P_{ij}\) for \(1 \le i < j \le d\) that passes through two given points \(\mu = (\mu _1, \mu _2, \mu _3, \ldots , \mu _d)\) and \(\nu = (\nu _1, \nu _2, \nu _3, \ldots , \nu _d) \in \mathbb {R}^d/\mathbb {R}{} \textbf{1}\) in general position (\(\mu _i-\nu _i \ne \mu _j-\nu _j\) for \(1 \le i < j \le d\)) is unique and is obtained via the tropical determinant.

Proof

By the definition of the Stiefel tropical linear space, the condition that \(\mu \) is on P is that

$$\begin{aligned} \max \left\{ P_{\tau _2\tau _3}+\mu _{\tau _1}, P_{\tau _1\tau _3}+\mu _{\tau _2}, P_{\tau _1\tau _2}+\mu _{\tau _3} \right\} \end{aligned}$$

is attained at least twice for all possible triplets \((\tau _1, \tau _2, \tau _3)\). The condition that \(\nu \) is on P is that

$$\begin{aligned} \max \left\{ P_{\tau _2\tau _3}+\nu _{\tau _1}, P_{\tau _1\tau _3}+\nu _{\tau _2}, P_{\tau _1\tau _2}+\nu _{\tau _3} \right\} \end{aligned}$$

is attained at least twice for all possible triplets \((\tau _1, \tau _2, \tau _3)\). By considering both \(\mu \) and \(\nu \) simultaneously for a specific \(\tau = (i, j, k)\), we come back to Lemma 16:

$$\begin{aligned} P_{ik} - P_{ij} = \nu _k - \nu _j + \max \left( \mu _k-\nu _k, \mu _i-\nu _i\right) - \max \left( \mu _j-\nu _j, \mu _i-\nu _i\right) . \end{aligned}$$

Specifically,

$$\begin{aligned} P_{ik} - P_{12} = (P_{ik} - P_{i2}) + (P_{i2} - P_{12}) = \nu _i + \nu _k + \max (\mu _i-\nu _i, \mu _k-\nu _k) - \nu _1 - \nu _2 - \max (\mu _1-\nu _1, \mu _2-\nu _2), \end{aligned}$$

where, without loss of generality, we set \(P_{12}=\nu _1 + \nu _2 + \max (\mu _1-\nu _1, \mu _2-\nu _2)\) to get

$$\begin{aligned} P_{ik} = \nu _i + \nu _k + \max \left( \mu _i-\nu _i, \mu _k-\nu _k\right) = \max \left( \mu _i+\nu _k, \mu _k+\nu _i\right) . \end{aligned}$$

This solution is unique for any \(P_{ik}\). Suppose we obtain \(P_{ik}\) in two different ways, through \(P_{ik}-P_{i l_1}\) and through \(P_{ik}-P_{i l_2}\). Then the difference between the two solutions vanishes, \(P_{i k}^{\text {through } P_{i l_1}}-P_{i k}^{\text {through } P_{i l_2}} = 0\). Similarly, \(P_{i k}^{\text {through } P_{i l_1}}-P_{i k}^{\text {through } P_{l_2 k}} = 0\). Thus the solution does not depend on the path used to compute it; that is, the solution is consistent (non-empty) and unique. \(\square \)
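
Since \(P_{ik} = \max (\mu _i+\nu _k, \mu _k+\nu _i)\) is the tropical determinant of the \(2 \times 2\) minor on columns i and k of the matrix with rows \(\mu \) and \(\nu \), the construction is easy to implement. A minimal sketch (assuming NumPy; the points and helper names are ours), which also checks the triplet membership condition used in this proof:

```python
import itertools
import numpy as np

def plucker_from_two_points(mu, nu):
    """P_{ik} = max(mu_i + nu_k, mu_k + nu_i): tropical 2x2 minors of the
    matrix with rows mu and nu (the normalization chosen in the proof)."""
    mu, nu = np.asarray(mu, float), np.asarray(nu, float)
    d = len(mu)
    return {(i, k): max(mu[i] + nu[k], mu[k] + nu[i])
            for i, k in itertools.combinations(range(d), 2)}

def on_space(x, P, tol=1e-9):
    """x lies on the tropical linear space iff, for every triplet (i, j, k),
    the maximum of P_{jk}+x_i, P_{ik}+x_j, P_{ij}+x_k is attained twice."""
    d = len(x)
    for i, j, k in itertools.combinations(range(d), 3):
        terms = np.array([P[(j, k)] + x[i], P[(i, k)] + x[j], P[(i, j)] + x[k]])
        if np.sum(terms >= terms.max() - tol) < 2:
            return False
    return True

mu = np.array([1.0, 4.0, 0.0, -2.0])          # arbitrary points in general position
nu = np.array([0.0, 1.0, 3.0, 2.0])
P = plucker_from_two_points(mu, nu)
print(on_space(mu, P), on_space(nu, P))
```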

Remark 14

One can prove Theorem 17 using the fact that the tropical line segment between two points is unique if and only if these two points are in relative general position, i.e., all the inequalities in (5.9) in [17] are strict. Then we can extend the tropical line segment to its associated Stiefel tropical linear space in the way described on page 293 of [17].

Remark 15

The Stiefel tropical linear space with \(P_{ij}\) for \(1 \le i < j \le d\) that passes through two given points \(\mu = (\mu _1, \mu _2, 0, \ldots , 0)\) and \(\nu = (\nu _1, \nu _2, 0, \ldots , 0) \in \mathbb {R}^d/\mathbb {R}{} \textbf{1}\) only partly in general position (\(\mu _i-\nu _i \ne \mu _j-\nu _j\) for \(1 \le i < j \le 3\)) is NOT unique.

5.2 Probabilistic setting: distance to best-fit space

For simplicity, suppose we have the two random variables

$$\begin{aligned} \begin{array}{cccccccc} X_1 &=& (5 + \epsilon _{11}, & -5 + \epsilon _{12}, & \epsilon _{13}, & \ldots , & \epsilon _{1d}), \\ X_2 &=& (-5 + \epsilon _{21}, & 5 + \epsilon _{22}, & \epsilon _{23}, & \ldots , & \epsilon _{2d}), \\ \end{array} \end{aligned}$$

where \(\epsilon _{ij} \sim N(0, \sigma )\) with small \(\sigma > 0\) for \(i = 1, 2\) and \(j = 1, \ldots , d\).

Lemma 18

The Plücker coordinates of the Stiefel tropical linear space corresponding to the \(2 \times d\) matrix,

$$\begin{aligned}A_0 = \left( \begin{matrix} 5 & -5 & 0 & \ldots & 0 \\ -5 & 5 & 0 & \ldots & 0 \end{matrix}\right) ,\end{aligned}$$

which contains \(X_1\) and \(X_2\), are

$$\begin{aligned} p_{A_0}(\{i, j\}) = {\left\{ \begin{array}{ll} 10 & \text{ if } i = 1 \text{ and } j = 2,\\ 5 & \text{ if } i = 1 \text{ and } j = 3, \ldots , d,\\ 5 & \text{ if } i = 2 \text{ and } j = 3, \ldots , d,\\ 0 & \text{ if } i \not = j \text{ such that } i, j = 3, \ldots , d. \end{array}\right. } \end{aligned}$$
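
These values can be reproduced as tropical \(2 \times 2\) minors of \(A_0\); a minimal sketch (assuming NumPy and 0-based indices, with an arbitrary choice of d):

```python
import itertools
import numpy as np

# Rows of A_0 for, e.g., d = 5 (the value of d is arbitrary here).
d = 5
r1 = np.array([5.0, -5.0] + [0.0] * (d - 2))
r2 = np.array([-5.0, 5.0] + [0.0] * (d - 2))

# p_{A_0}({i, j}) is the tropical determinant of the 2x2 minor on columns i, j.
p = {(i, j): max(r1[i] + r2[j], r1[j] + r2[i])
     for i, j in itertools.combinations(range(d), 2)}
print(p)   # expect 10 for (0, 1); 5 for (0, j) and (1, j) with j >= 2; 0 otherwise
```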

Lemma 19

Suppose we project \(X_1\) to the Stiefel tropical linear space \(p_{A_0}\) and \(P(\cup _j\{|\epsilon _{1j}|\ge 5\}) \le \delta \) for some \(\delta > 0\), where j ranges over \(1, \ldots , d\). Then, with probability at least \(1-\delta \), the projected point \(w \in \mathbb {R}^d /\mathbb {R}\textbf{1}\) is \((w_1, \ldots , w_d)=(5+\alpha , -5+\epsilon _{12}, \alpha , \ldots , \alpha )\) and

$$\begin{aligned} d_{\textrm{tr}}(X_1, w) = \max _{j \in [d] \setminus \{2\}} \epsilon _{1j} -\alpha , \quad \text {where} \quad \alpha =\min _{j \in [d] \setminus \{2\}} \epsilon _{1j}. \end{aligned}$$

Proof

By the Blue Rule and \(p_{A_0}(\{i,j\}) = 5\text {I}_{i \le 2} + 5\text {I}_{j \le 2}\),

$$\begin{aligned} w_i &= \max _{\tau \ne i} \min _{j \ne \tau } \left( X_{1,j} + p_{A_0}(\{\tau ,i\}) -p_{A_0}(\{\tau ,j\})\right) \\ &= 5\text {I}_{i \le 2} + \max _{\tau \ne i} \min _{j \ne \tau } \left( X_{1,j} - 5\text {I}_{j \le 2}\right) \\ &= 5\text {I}_{i \le 2} + \min \left\{ X_{1,i} - 5\text {I}_{i \le 2},\ \text {2nd-min}_j \left( X_{1,j} - 5\text {I}_{j \le 2}\right) \right\} . \end{aligned}$$

If \(|\epsilon _{1i}|<5\) for \(i = 1, \ldots , d\), then \(w_i = 5\text {I}_{i \le 2} + \min \{ X_{1,i} - 5\text {I}_{i \le 2}, \alpha \}.\) \(\square \)
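
The closed form at the end of this proof is easy to implement; the sketch below (assuming NumPy; the helper name project_onto_A0 and all numerical values are ours) checks the statement of Lemma 19 on a random draw with \(\sigma \) small enough that \(|\epsilon _{1j}| < 5\) holds.

```python
import numpy as np

def project_onto_A0(x):
    """Closed form from the proof of Lemma 19 (0-based indices): w_i =
    5*I(i <= 2) + min(x_i - 5*I(i <= 2), 2nd smallest of x_j - 5*I(j <= 2))."""
    x = np.asarray(x, float)
    d = len(x)
    shift = 5.0 * (np.arange(d) < 2)          # 5*I(i <= 2) in 0-based indexing
    y = x - shift
    second_min = np.sort(y)[1]
    return shift + np.minimum(y, second_min)

rng = np.random.default_rng(2)
d, sigma = 6, 0.3                             # sigma small, so |eps| < 5 holds here
eps = sigma * rng.standard_normal(d)
X1 = np.array([5.0, -5.0] + [0.0] * (d - 2)) + eps

w = project_onto_A0(X1)
alpha = np.min(np.delete(eps, 1))             # min over j != 2 of eps_{1j}
dist = np.max(np.delete(eps, 1)) - alpha      # Lemma 19's distance formula
print(np.allclose(w[0], 5 + alpha),
      np.allclose(np.max(X1 - w) - np.min(X1 - w), dist))
```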

Lemma 20

Suppose we project \(X_2\) to the Stiefel tropical linear space \(p_{A_0}\) and \(P(\cup _j\{|\epsilon _{2j}|\ge 5\}) \le \delta \) for some \(\delta > 0\), where j ranges over \(1, \ldots , d\). Then, with probability at least \(1-\delta \), the projected point \(w \in \mathbb {R}^d /\mathbb {R}\textbf{1}\) is \((w_1, \ldots , w_d)=(-5 + \epsilon _{21}, 5+\beta , \beta , \ldots , \beta )\) and

$$\begin{aligned} d_{\textrm{tr}}(X_2, w) = \max _{j = 2, \ldots ,d} \epsilon _{2j} - \beta , \quad \text {where} \quad \beta =\min _{j = 2, \ldots ,d} \epsilon _{2j}. \end{aligned}$$

Theorem 21

Suppose w is the projected point of \(X_1\) (or \(X_2\)) onto the Stiefel tropical linear space \(p_{A_0}\) and \(P(\cup _{i, j}\{|\epsilon _{ij}|\ge 5\}) \le \delta \) for some \(\delta > 0\), \(i = 1, 2\) and \(j = 1, \ldots , d\). Then, with probability at least \(1-\delta \), the expected value of the tropical distance between \(X_1\) (or \(X_2\)) and w is at most \(2\sigma \sqrt{2\log (d-1)}\).

Proof

Let \(\epsilon _i \sim N(0, \sigma )\) for \(i = 1, \ldots , (d-1)\). By Lemmas 19 and 20 and by [20],

$$\begin{aligned} \mathbb {E}[d_{\textrm{tr}}(X_1, w)] = \mathbb {E}[d_{\textrm{tr}}(X_2, w)] &= \mathbb {E}\left[ \max _{i = 1, \ldots , (d-1)}\epsilon _i\right] - \mathbb {E}\left[ \min _{i = 1, \ldots , (d-1)}\epsilon _i\right] \\ &= 2\,\mathbb {E}\left[ \max _{i = 1, \ldots , (d-1)}\epsilon _i\right] \\ &\le 2 \sigma \sqrt{2 \log (d-1)}. \end{aligned}$$

\(\square \)
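
A quick Monte Carlo illustration of the bound in Theorem 21, using the distance formula of Lemma 19 directly (a minimal sketch assuming NumPy; the sample sizes and parameters are arbitrary choices of ours):

```python
import numpy as np

rng = np.random.default_rng(3)
d, sigma, reps = 10, 0.2, 50_000

# Draw epsilon_{1j}, apply Lemma 19's distance formula, and compare the
# empirical mean with the bound 2*sigma*sqrt(2*log(d-1)) of Theorem 21.
eps = sigma * rng.standard_normal((reps, d))
eps_no2 = np.delete(eps, 1, axis=1)            # drop j = 2 (0-based column 1)
dists = eps_no2.max(axis=1) - eps_no2.min(axis=1)
print(dists.mean(), 2 * sigma * np.sqrt(2 * np.log(d - 1)))
```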

Remark 16

We can similarly prove the same theorem for \(X_1 = (\nu _1 + \epsilon _{11}, -\nu _2 + \epsilon _{12}, \epsilon _{13}, \ldots , \epsilon _{1d})\) and \(X_2 = (-\nu _1 + \epsilon _{21}, \nu _2 + \epsilon _{22}, \epsilon _{23}, \ldots , \epsilon _{2d})\), where \(\nu _1, \nu _2 > 0\) are positive real numbers with \(\nu _1 \not = \nu _2\), under the assumption \(\max \{P(\cup _{i, j}\{|\epsilon _{ij}|\ge \nu _1\}), P(\cup _{i, j}\{|\epsilon _{ij}|\ge \nu _2\})\} \le \delta \) for some \(\delta > 0\), for \(i = 1, 2\) and \(j = 1, \ldots , d.\)

Remark 17

One issue here is that \(X_1\) and \(X_2\) are not in general position. In fact, the best-fit one-dimensional Stiefel tropical linear space for the two Gaussians described in Theorem 21 may not be unique in the limit \(\sigma \rightarrow 0\). However, the one above is the best in the sense that it is natural and stable (robust).

It may be more convenient to consider a general position case, for which the solution is unique and should coincide with the deterministic one shown in the previous subsection. In this general case, however, the Blue Rule becomes too complicated to evaluate explicitly, so we simply bound the distance with inequalities instead.

Theorem 22

Suppose we have random variables

$$\begin{aligned} \begin{array}{cccccccc} X_1 &=& (\mu _{11} + \epsilon _{11}, & \mu _{12} + \epsilon _{12}, & \mu _{13}+\epsilon _{13}, & \ldots , & \mu _{1d}+\epsilon _{1d}) \\ X_2 &=& (\mu _{21} + \epsilon _{21}, & \mu _{22} + \epsilon _{22}, & \mu _{23}+\epsilon _{23}, & \ldots , & \mu _{2d}+\epsilon _{2d}) \\ \end{array} \end{aligned}$$

where \(\mu _{ij} \in \mathbb {R}\) are in general position (\(\mu _{1i}-\mu _{2i} \ne \mu _{1j}-\mu _{2j}\) for \(1 \le i < j \le d\)) and \(\epsilon _{ij} \sim N(0, \sigma )\) with small \(\sigma > 0\) for \(i = 1, 2\) and \(j = 1, \ldots , d\). Suppose we project \(X_1\) (or \(X_2\)) to the Stiefel tropical linear space that passes through \(\mu _1=(\mu _{11}, \mu _{12}, \mu _{13}, \ldots , \mu _{1d})\) and \(\mu _2=(\mu _{21}, \mu _{22}, \mu _{23}, \ldots , \mu _{2d})\). Then the expected value of the tropical distance between \(X_1\) (or \(X_2\)) and the projected point \(X'_1\) (or \(X'_2\)) goes to 0 as \(\sigma \rightarrow 0\).

Proof

Since \(\mu _1\) lies on the Stiefel tropical linear space and \(X'_1\) is the tropical projection of \(X_1\) onto it, \(d_{\textrm{tr}}(X_1, X'_1) \le d_{\textrm{tr}}(X_1, \mu _1)\). By [20],

$$\begin{aligned} \mathbb {E}[d_{\textrm{tr}}(X_1, X'_1)] \le \mathbb {E}[d_{\textrm{tr}}(X_1, \mu _1)] = \mathbb {E}[d_{\textrm{tr}}(\mu _1+\epsilon _1, \mu _1)]= \mathbb {E}[d_{\textrm{tr}}(\epsilon _1, 0)] \le 2 \sigma \sqrt{2 \log d}. \end{aligned}$$

\(\square \)

6 Mixture of three or more Gaussians fitted by tropical polynomials over \({{\mathbb {R}}}^3 \!/{\mathbb {R}} \textbf{1}\)

To explore a possible extension of a Stiefel tropical linear space as a subspace, we consider the projection of data points onto tropical polynomials. In \({{\mathbb {R}}}^3 \!/{\mathbb {R}} \textbf{1}\), the only nontrivial Stiefel tropical linear space is a tropical hyperplane, which is specified by a tropical linear function with a normal vector \(\omega =(\omega _x, \omega _y, 0)\),

$$\begin{aligned} \omega _x \odot x \boxplus \omega _y \odot y \boxplus 0 . \end{aligned}$$
(12)

Similarly, we can consider an x-quadratic tropical hypersurface, which is specified by the corresponding tropical quadratic function,

$$\begin{aligned} \omega _{xx} \odot x^2 \boxplus \omega _x \odot x \boxplus \omega _y \odot y \boxplus 0 . \end{aligned}$$
(13)

We can further consider an x-cubic tropical hypersurface, which is specified by the corresponding tropical cubic function,

$$\begin{aligned} \omega _{xxx} \odot x^3 \boxplus \omega _{xx} \odot x^2 \boxplus \omega _x \odot x \boxplus \omega _y \odot y \boxplus 0 , \end{aligned}$$
(14)

although we do not treat cubic cases in this paper.
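
In the max-plus convention used here, evaluating (13) at a point \((x, y, 0)\) amounts to taking the maximum of the four affine terms, and the point lies on the tropical hypersurface exactly when this maximum is attained at least twice (as in the hyperplane conditions used earlier). A minimal sketch (assuming NumPy; the coefficients and test points are arbitrary choices of ours):

```python
import numpy as np

def trop_poly_terms(x, y, w):
    """Terms of the x-quadratic tropical polynomial (13) at (x, y, 0):
    w = (w_xx, w_x, w_y); tropical product is +, tropical sum is max."""
    return np.array([w[0] + 2 * x, w[1] + x, w[2] + y, 0.0])

def on_curve(x, y, w, tol=1e-9):
    """(x, y, 0) lies on the tropical hypersurface iff the max is attained twice."""
    t = trop_poly_terms(x, y, w)
    return np.sum(t >= t.max() - tol) >= 2

# Hypothetical coefficients; the curve is where the maximizing term changes.
w = (-1.0, 0.0, 0.5)
print(on_curve(0.5, 0.0, w), on_curve(0.0, -3.0, w), on_curve(2.0, 0.0, w))
```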

6.1 Deterministic setting: possible configurations of tropical curves that pass through given points

Throughout this paper, we have in mind fitting a mixture of Gaussians whose centers are located in general position. Furthermore, under the assumption of infinitesimal variances, the problem of finding the best-fit tropical curve for a mixture of Gaussians in \({{\mathbb {R}}}^3 \!/{\mathbb {R}} \textbf{1}\) reduces to finding the curve that passes through the centers of all the Gaussians. Thus, we first summarize the possible configurations in this deterministic case. Recall that a tropical linear curve has only enough degrees of freedom to pass through two points. Thus higher-degree polynomial curves may be more suitable for fitting three or more Gaussians.

6.1.1 Best-fit tropical linear curves or hyperplanes

Let us briefly review the linear curve, or hyperplane, case, where we try to find the straight line that passes through the two given points \((x_1, y_1, z_1)\) and \((x_2, y_2, z_2)\) in \({{\mathbb {R}}}^3 \!/{\mathbb {R}} \textbf{1}\). Without loss of generality, \(z_1=z_2=0\) and \(x_1 < x_2\) are assumed, as well as \(y_1 \ne y_2\). Depending on the slope of the line connecting the given two points, there are three possible configurations in which the two points lie on different half lines, as in Fig. 4.

Fig. 4

Examples of all three possible configurations for two points on a plane. Depending on the configuration pattern, the points lie on the different half lines

Lemma 23

(Best-fit tropical linear curves or hyperplanes (Fig. 4)) When the slope of the line connecting the given two points \((x_1, y_1, 0)\) and \((x_2, y_2, 0)\) in \({{\mathbb {R}}}^3 \!/{\mathbb {R}} \textbf{1}\) is larger than 1 on the plane of the first two coordinates, the normal vector of the hyperplane that passes through the two points is \((\omega _x, \omega _y, 0) = (-x_1,-x_1+x_2-y_2,0)\). When the slope is between 0 and 1, the normal vector is \((\omega _x, \omega _y, 0) = (-x_2-y_1+y_2,-y_1,0)\). When the slope is negative, the normal vector is \((\omega _x, \omega _y, 0) = (-x_2,-y_1,0)\).

Proof

Direct calculations. \(\square \)
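
Lemma 23 translates directly into code. The sketch below (assuming NumPy; the point coordinates and helper names are ours, and exact slopes 0 and 1 are excluded as in the lemma) selects the normal vector according to the slope and checks that both points satisfy the "maximum attained at least twice" condition.

```python
import numpy as np

def hyperplane_through(p1, p2):
    """Normal vector (w_x, w_y, 0) of the tropical hyperplane through
    (x1, y1, 0) and (x2, y2, 0) with x1 < x2 and y1 != y2 (Lemma 23)."""
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)
    if slope > 1:
        return (-x1, -x1 + x2 - y2, 0.0)
    if 0 < slope < 1:
        return (-x2 - y1 + y2, -y1, 0.0)
    return (-x2, -y1, 0.0)                    # negative slope

def on_hyperplane(x, y, w, tol=1e-9):
    t = np.array([w[0] + x, w[1] + y, w[2]])
    return np.sum(t >= t.max() - tol) >= 2

for p1, p2 in [((0.0, 0.0), (1.0, 3.0)),      # slope > 1
               ((0.0, 0.0), (3.0, 1.0)),      # slope in (0, 1)
               ((0.0, 2.0), (3.0, -1.0))]:    # negative slope
    w = hyperplane_through(p1, p2)
    print(w, on_hyperplane(*p1, w), on_hyperplane(*p2, w))
```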

Remark 18

Algebraically speaking, the condition that a point \((x, y)\) is on a hyperplane is equivalent to the condition that the normal vector of the hyperplane is on the hyperplane whose normal vector is \((x, y)\). Thus, if two points \((x_1,y_1)\) and \((x_2,y_2)\) are on a hyperplane, the normal vector of that hyperplane is the intersection of the two hyperplanes whose normal vectors are \((x_1,y_1)\) and \((x_2,y_2)\).

6.1.2 Best-fit tropical x-quadratic curves

Here we try to find the quadratic curve that passes through the three given points \((x_i, y_i, z_i)\) in \({{\mathbb {R}}}^3 \!/{\mathbb {R}} \textbf{1}\) for \(i=1,2,3\). Without loss of generality, \(z_1 = z_2 = z_3 = 0\) and \(x_1< x_2 < x_3\) are assumed, as well as \(y_1 \ne y_2 \ne y_3 \ne y_1\). Depending on the slopes of the connecting line segments, there are \(9 (=3 \times 3)\) possible configurations in which the three points lie on different half lines or line segments, as in Fig. 5. Interestingly, one of the nine configurations cannot be realized by any x-quadratic curve.

Fig. 5

Examples of all eight possible configurations for three points on a plane. Depending on the configuration pattern, the points lie on the different half lines or line segments. The third and the following figures show, respectively, “TooHigh-TooHigh”, “TooHigh-High/Low”, “High-TooHigh”, “High-High/Low” and “Low-High/Low” configurations

Lemma 24

(Best-fit tropical x-quadratic curves (Fig. 5)) In the case of the “Low-TooHigh” configuration with \(y_1 > y_2\) and \(y_1+2(x_3-x_2) < y_3\), there is no x-quadratic curve that passes through the three points in \({{\mathbb {R}}}^3 \!/{\mathbb {R}} \textbf{1}\). In the other eight configurations, there is a unique x-quadratic curve that passes through the three points in \({{\mathbb {R}}}^3 \!/{\mathbb {R}} \textbf{1}\), where the points lie on different half lines or line segments depending on the configuration, as in Fig. 5.

Proof

Direct calculations. \(\square \)

6.2 Probabilistic setting: distance to best-fit space

To perform a PCA for point clouds, we need a projection rule onto a curve.

Lemma 25

The projection rules in each delineated region of \(\mathbb R^3/{\mathbb {R}} \textbf{1}\) onto the hyperplane \(H_0\), as well as onto the x-quadratic curve whose nodes are (0, 0, 0) and (0, 1, 1), are the rules shown in Fig. 6. In particular, the distances from \((x, y, 0)\) to the curves are given by the red text.

Proof

By the triangle inequality, we only need to consider the boundaries of each region as candidates for the projection. The rest follows by direct calculation for each region. \(\square \)

Fig. 6

Projection rule in \({\mathbb {R}}^3/{\mathbb {R}} \textbf{1}\) onto the hyperplane \(H_0\) (left) and the quadratic curve whose nodes are (0, 0, 0) and (0, 1, 1) (right). The red text represents the distance from a point \((0, x, y)\) to the curve, with one of the geodesics shown as a red arrow. This distance function is piecewise linear on the domains delineated by the dotted gray lines and the curve itself. Although, in the quadratic case, we do not have a simple rule like “max − 2nd max” as for the hyperplane, at least one of the geodesics is a vertical or horizontal line segment, demonstrating the equivalence to the \(L_1\) norm

As with fitting a Stiefel tropical linear space, for fitting a tropical polynomial we also have an upper bound on the convergence rate of the mean distance between observations in a given sample and their projections as \(\sigma \rightarrow 0\). In practice, we do not know the Gaussian center \(\mu \) in general, so we estimate \(\mu \) by its point estimate \({\hat{\mu }} =\frac{1}{n}\sum _{i=1,\ldots ,n} X_i\).

Lemma 26

Suppose \(X_i \sim N(\mu , \sigma \mathbb {I}_d)\) in \(\mathbb R^d/{\mathbb {R}} \textbf{1}\) for \(i=1,...,n\). Then \(\mathbb {E} \left[ d_{\textrm{tr}}(X_i, \frac{1}{n}\sum _{i=1,..,n} X_i) \right] \le \sqrt{\frac{n-1}{n}} 2\sigma \sqrt{2\log (d)}\) .

Proof

For univariate random variables \(U_i\), if \(U_i \sim N(\mu , \sigma )\) for \(i=1,\ldots ,n\), then \(U_i - \frac{1}{n}\sum _{i=1,\ldots ,n} U_i \sim N(0, \sqrt{\frac{n-1}{n}} \sigma )\). By [20],

$$\begin{aligned} \mathbb {E} \left[ d_{\textrm{tr}}(X_i, \frac{1}{n}\sum _{i=1,..,n} X_i) \right] = \mathbb {E} \left[ d_{\textrm{tr}}(X_i - \frac{1}{n}\sum _{i=1,..,n} X_i, 0) \right] \le \sqrt{\frac{n-1}{n}} 2\sigma \sqrt{2\log (d)} \end{aligned}$$

\(\square \)
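
A quick Monte Carlo illustration of Lemma 26 (a minimal sketch assuming NumPy; d, \(\sigma \), n, the center and the number of repetitions are arbitrary choices of ours):

```python
import numpy as np

rng = np.random.default_rng(4)
d, sigma, n, reps = 4, 0.5, 10, 20_000

# Empirical check of Lemma 26: the tropical distance of X_1 to the sample mean
# versus the bound sqrt((n-1)/n) * 2 * sigma * sqrt(2 * log(d)).
mu = rng.standard_normal(d)                   # arbitrary center
X = mu + sigma * rng.standard_normal((reps, n, d))
centered = X[:, 0, :] - X.mean(axis=1)        # X_1 minus the sample mean
dists = centered.max(axis=1) - centered.min(axis=1)
print(dists.mean(), np.sqrt((n - 1) / n) * 2 * sigma * np.sqrt(2 * np.log(d)))
```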

Theorem 27

Suppose the centers of l Gaussians in \({\mathbb {R}}^3/{\mathbb {R}} \textbf{1}\) are estimated as \({\hat{\mu ^k}} = \frac{1}{n}\sum _{i=1,\ldots ,n} X^k_i\), where \(X^k_i \sim N(\mu ^k, \sigma \mathbb {I}_3)\) for \(i=1,\ldots ,n\) and \(k=1,\ldots ,l\). Let \(X^k_{i, proj}\) be the projection of \(X^k_i\) onto the tropical polynomial curve that passes through all the estimated centers of the l Gaussians. Then the expectation of their tropical distance is at most \(\sqrt{\frac{n-1}{n}} 2\sigma \sqrt{2\log (3)}\).

Proof

By Lemma 26,

$$\begin{aligned} \mathbb {E}\left[ d_{\textrm{tr}}(X^k_i,X^k_{i, proj}) \right] \le \mathbb {E}\left[ d_{\textrm{tr}}(X^k_i, {\hat{\mu ^k}})\right] \le \sqrt{\frac{n-1}{n}} 2\sigma \sqrt{2\log (3)}. \end{aligned}$$

\(\square \)

7 Discussion

In this paper, we focused on the asymptotic behavior of best-fit Stiefel tropical linear spaces over the tropical projective space when a sample is generated from a mixture of Gaussian distributions. Specifically, we focused on the asymptotic behavior of the matrix associated with the Plücker coordinates of a Stiefel tropical linear space over the tropical projective space when a sample is generated from a mixture of Gaussian distributions. We then investigated best-fit tropical polynomials over the tropical projective space when a sample is generated from a Gaussian mixture.

First, we considered a single Gaussian case and showed that when the mean of the Gaussian distribution is located at the point of a Stiefel tropical linear space of co-dimension d (i.e., the apex of a hyperplane), then that space is a best-fit Stiefel tropical linear space over the tropical projective space. However, it is not clear whether this is the only best-fit Stiefel tropical linear space for a sample generated by a single Gaussian distribution. For \(d = 3\), we proved that when the mean of the Gaussian distribution is located at the point of a Stiefel tropical linear space of co-dimension \(d = 3\), then it is the unique best-fit Stiefel tropical linear space over the tropical projective space. In general this is still an open problem.

Actually, the convergence results (but not the convergence rates) in Theorems 10, 12, 13, 14, 15, and 22 immediately follow from the continuity of \(d_{\textrm{tr}}\), i.e., \(\lim _{\sigma \rightarrow 0} d_{\textrm{tr}}(X,L) = d_{\textrm{tr}}(\lim _{\sigma \rightarrow 0} X,L) = 0\) whenever the Stiefel tropical linear space L contains the mean of the distribution from which X is sampled. However, our proofs give upper bounds on the convergence rates along the way, as additional information.

In addition, in this paper, for simplicity, we considered a mixture of Gaussian distributions in which each Gaussian distribution has a diagonal covariance matrix, i.e., the variables are uncorrelated within each Gaussian distribution. We do not know the asymptotic behavior of the best-fit Stiefel tropical linear space when there are general correlations between variables in each Gaussian distribution.

Then we considered fitting a tropical polynomial to a sample generated by a mixture of Gaussian distributions. Specifically, we considered a special type of polynomial when \(d = 3\). In general, it is not clear how to project an observation onto a given tropical polynomial in terms of the tropical metric, analogous to the Blue Rule and Red Rule in the case of a Stiefel tropical linear space. Projecting a point onto a tropical polynomial over the tropical projective space is a necessary and important tool for statistical inference (supervised learning) using tropical geometry. We propose an algorithm to project a point onto a tropical polynomial for \(d = 3\), and it is future work to generalize this algorithm to higher dimensions \(d > 3\).