Approximate triangulations of Grassmann manifolds

We define the notion of an approximate triangulation for a manifold $M$ embedded in euclidean space. The basic idea is to build a nested family of simplicial complexes whose vertices lie in $M$ and use persistent homology to find a complex in the family whose homology agrees with that of $M$. Our key examples are various Grassmann manifolds $G_k({\mathbb R}^n)$.


Introduction
Smooth manifolds admit piecewise-linear triangulations [14]. However, there are many subsequent questions one might ask: How many simplices are required? What is the minimal number of vertices? Is there an algorithm to construct a triangulation?
A great deal of work in algebraic topology has been devoted to these questions. The question of the number of simplices required to triangulate a given manifold is often attacked by sophisticated cohomological methods involving characteristic classes (such arguments also often yield estimates on the minimal embedding dimension for the manifold). Surprisingly, much of this work is very recent [6], [7]. A main result in [7] is the following. Theorem 1.1 ([7], Theorem 3.10). Every triangulation of the Grassmann manifold G k (R n+k ) must have at least [(n + k)(n + k + 1) − 2kn] · (2 kn+1 − 1)

simplices.
For example, any triangulation of the manifold G 2 (R 4 ) must have at least 372 simplices. The Grassmann manifolds will be defined in Section 2.1 below. These are important spaces to study because of their utility in algebraic topology, especially with respect to the study of characteristic classes [11].
Unfortunately, most results along these lines are not constructive; that is, the proofs do not yield an explicit triangulation of the manifold. In fact, if one seeks a triangulation of a Grassmannian G k (R n ) the end result is usually disappointment. For the smallest nontrivial space, G 1 (R 3 ) = RP 2 , there are many well-known small triangulations, and even an algorithm to generate a triangulation from any collection of points in general position [1]. Beyond that, however, results are sparse.
In this paper, we develop a procedure to find what we call an approximate triangulation of the manifold G k (R n ) (Definition 2.8). The basic idea is to first generate a sample of points on G k (R n ). This already leads to technical difficulties involving embeddings of these spaces into a euclidean space R N , but we are able to solve this. We then build a nested family of simplicial complexes on the point cloud, parametrized by the positive real numbers. The persistent homology of this family is then computed and we identify an interval of parameters for which the mod 2 homology of the complexes in that range agrees with that of G k (R n ). Such a complex is then a viable model for the manifold: its vertices lie in G k (R n ) ⊂ R N and it has the correct homology. We then implement this procedure for the following spaces: Computational limitations have so far prohibited further calculations; we discuss this in Section 4.
Acknowledgments. This problem was suggested to me by Vidit Nanda; I thank him for the inspiration and helpful conversations. Henry Adams provided useful tips for Javaplex. I am also grateful to Mikael Vejdemo-Johansson for the use of his rather powerful computer.

Materials and Methods
Further details and proofs of the results in Subsections 2.1 and 2.2 may be found in [11].
2.1. Grassmann manifolds. Denote by R n the euclidean space of dimension n. By a k-frame in R n we mean a k-tuple of linearly independent vectors; denote by V k (R n ) the collection of k-frames in R n . This is an open subset of the k-fold cartesian product R n × · · · × R n .
Definition 2.1. The Grassmann manifold G k (R n ) is the set of all k-dimensional planes through the origin in R n . It is topologized via the quotient map V k (R n ) → G k (R n ) which takes a k-frame to the k-plane it spans.
When k = 1, we see that G 1 (R n ) is the real projective space RP n−1 , a manifold of dimension n − 1. In general we have the following result.
The map X → X ⊥ , which takes a k-plane to its orthogonal complement is a diffeomorphism between G k (R n ) and G n−k (R n ).

Schubert cells.
Grassmannians have a well-known cell decompostion into Schubert cells. Consider the sequence of subspaces of R n : R 0 ⊂ R 1 ⊂ R 2 ⊂ · · · ⊂ R n , where R i consists of the vectors of the form (a 1 , . . . , a i , 0, . . . , 0). Any k-plane X gives rise to a sequence of integers Consecutive integers differ by at most 1.
Given a Schubert symbol σ, let e(σ) ⊂ G k (R n ) denote the set of k-planes X such that Each X ∈ G k (R n ) belongs to precisely one of the sets e(σ).
Theorem 2.5. The n k sets e(σ) form the cells of a CW-decomposition of G k (R n ).
Proposition 2.6. The number of r-cells in G k (R n ) is equal to the number of partitions of r into at most k integers each of which is ≤ n − k.
For example, the possible Schubert symbols and cells for G 2 (R 4 ) are as follows. Such a symbol has the form σ = (σ 1 , The mod 2 homology of G k (R n ) is easily computed from the Schubert cell decomposition: since the induced boundary maps are all either 0 or multiplication by 2, the mod 2 homology has basis corresponding to the cells.
Continuing the example of G 2 (R 4 ), we have 2.3. Persistent Homology. Suppose we are given a finite nested sequence of finite simplicial complexes where the R i are real numbers R 1 < R 2 < · · · < R p . For each homological degree ≥ 0, we then obtain a sequence of homology groups and induced linear transformations (homology with Z/2-coefficients for simplicity) Since the complexes are finite, each H (K R i ) is a finite-dimensional vector space. Thus, there are only finitely many distinct homology classes. A particular class z may come into existence in H (K Rs ), and then one of two things happens. Either z maps to 0 (i.e., the cycle representing z gets filled in) in some H (K Rt ), R s < R t , or z maps to a nontrivial element in H (K Rp ). This yields a barcode, a collection of interval graphs lying above an axis parametrized by R. An interval of the form [R s , R t ] corresponds to a class that appears at R s and dies at R t . Classes that live to K Rp are usually represented by the infinite interval [R s , ∞) to indicate that such classes are real features of the full complex K Rp .
As an example, consider the tetrahedron T with filtration , and T 5 = T . The barcodes for this filtration are shown in Figure 1. Note that initially, there are 4 components (β 0 = 4), which get connected in T 1 , when 3 independent 1-cycles are born (β 1 = 3). These three 1-cycles die successively as triangles get added in T 2 , T 3 , and T 4 . The addition of the final triangle in T 5 creates a 2-cycle (β 2 = 1). For analyzing point cloud data, one needs a simplicial complex modeling the underlying space. Since it is impossible to know a priori if a complex is "correct", one builds a nested family of complexes approximating the data cloud, computes the persistent homology of the resulting filtration, and looks for homology classes that exist in long sections of the filtration. We discuss two popular methods for doing this in the next subsection. Vietoris-Rips and witness complexes. Now suppose we are given a discrete set X of points in some metric space (typically a Euclidean space R m ). The standard example of such an object is a sample of points from some geometric object M . We would like to recover information about M from the sample X, and the first step is to obtain an approximation of M using only the point cloud X. There are many such techniques; perhaps the most classical is the Delaunay triangulation of X. This is defined as follows.
The corresponding Delaunay triangulation, Del(X), is the nerve of the Voronoi decomposition; that is, a collection V ( Figure 2 for an example. While the Delaunay triangulation provides a good approximation to the underlying space M , it has several disadvantages. If the point cloud X is large, there will be a very large number of simplices in Del(X). Also, Del(X) suffers from the "curse of dimensionality;" that is, if the ambient dimension (m) is large, calculating the Voronoi decomposition is computationally expensive.
There are many popular alternatives to the Delaunay triangulation. The one used most often is the Vietoris-Rips complex, which is built as follows. Consider the point cloud X and let r > 0. The Vietoris-Rips complex with parameter r is the simplicial complex V R(X, r) whose k-simplices are That is, if one imagines a ball of radius r/2 around each point x ∈ X, then we join the points x i and x j with an edge if the balls intersect. Observe that if r < r then there is an inclusion of complexes V R(X, r) ⊂ V R(X, r ). We Many software packages support the calculation of Vietoris-Rips persistence on point clouds. In this paper, we use the Eirene package developed by Gregory Henselman [9]. Other popular programs include Ulrich Bauer's Ripser [2] and Vidit Nanda's Perseus [12].
In Section 3.5, we shall use the witness complexes of de Silva and Carlsson [5]. The idea is to model the Delaunay triangulation on a smaller set of points L ⊂ X, called landmarks, in such a way that the topology of the underlying object is well-approximated. Moreover, the definition makes sense in any metric space, so assume that X is a metric space with distance function d (e.g., X could be a finite point cloud in R m with the usual Euclidean distance). Choose a subset L = { 1 , 2 , . . . , n } of X = {x 1 , x 2 , . . . , x N } and let R ≥ 0 be a real number.
The witness complex W (X, L, R) is defined as follows: • The vertex set of W (X, L, R) is L; • , ∈ L span an edge if there exists an x ∈ X, called a witness, such that • A collection 0 , . . . , p ∈ L spans a p-simplex if { i , j } span an edge for all i = j. Examples of witness complexes are shown in Figure 2(b) alongside the associated Delaunay triangulation. Four landmark points were chosen using the maxmin procedure described below. The complex on the left has R = .0329, and the complex on the right has R = .1317. Note that the larger value of R yields a complex with more simplices. Also, note that the witness complex is a coarse approximation of the Delaunay triangulation.
We make some observations about this definition. Let D be the n × N matrix of distances from points in L to points in X.
• If R = 0, then , ∈ L form an edge if there is an x i ∈ X such that d(x i , ) and d(x i , ) are the two smallest entries in the i-th column of D. This is analogous to the existence of an edge in the Delaunay triangulation Del(L). • For R > 0, one may think of relaxing the boundaries of the Voronoi diagram of L and taking the nerve of the resulting covering of X. • If 0 ≤ R < R , then there is an inclusion of simplicial complexes W (X, L, R) ⊆ W (X, L, R ). By a theorem of de Silva and Carlsson [5], this complex is a natural analogue of the Delaunay triangulation for a space represented by point cloud data.
Suppose that X is a sample of points from some object M ⊂ R m . There is no guarantee that W (X, L, R) recovers the topology of M , but experiments on familiar geometric objects [5] (spheres, for example) suggest that for a suitable range of values of R and good choices of landmarks L, the topology of W (X, L, R) is the same as that of M . This begs the questions: (1) How should the landmark set L be chosen?
(2) What is the correct value of R? The second question is best handled via the use of persistent homology, which we discussed in Section 2.3 above. As for the choice of landmarks, there are three standard options: (1) Select landmarks at random.
The maxmin procedure yields more evenly-spaced landmarks, but tends to emphasize extremal points. It is generally more reliable than a random selection [5]. Another useful resource is [3]. In our experiments in Section 3.5 below we use the maxmin process to generate landmarks.

Sampling procedures.
To build a Vietoris-Rips or witness complex on points in G k (R n ), we need to develop a sampling procedure. The first question to be asked is in which euclidean space do we embed G k (R n )? This is highly nontrivial. Even in the case of projective spaces (k = 1) it is not so obvious how to proceed. A whole industry has been devoted to the question of the minimal embedding dimension of RP n [4], but the proof of the minimality of any particular embedding rarely comes with an explicit formula for the map. An exception is if one insists on an isometric embedding [15], but the minimal dimension of such an embedding for RP n is n(n + 3)/2, which grows rather quickly.
For arbitrary Grassmannians, one could try to use the Plücker embedding (where [v] denotes the line spanned by the vector v) and then embed the target projective space into euclidean space. Of course this explodes the dimension further, making this an impractical solution. Aside from some low dimensional projective spaces, we will instead approach this problem via the following result.
Proposition 2.7. The manifold G k (R n ) is diffeomorphic to the smooth manifold consisting of all n × n symmetric, idempotent matrices of trace k. The map ϕ realizing this takes a k-plane X to the operator defined by orthogonal projection onto X.
Proof. If X is a k-plane with orthonormal basis x 1 , . . . , x k , denote by A the n × k matrix having the x i as columns. Define a map ϕ : G k (R n ) → M n (R) by X → AA T . This map is clearly smooth since it consists of polynomials in the entries of the various x i . Note that choosing a different basis for X amounts to conjugating AA T by the corresponding change of basis matrix. The matrix AA T is symmetric: the k × k identity matrix, since the columns of A are orthonormal). Finally, the trace of AA T is k since its rank is k and its only eigenvalues are 0 and 1. Thus the image of ϕ lies in the set of symmetric, idempotent matrices of trace k. To see that ϕ surjects onto this set, note that such a matrix B is projection onto a k-dimensional subspace X and there exists a basis x 1 , . . . , x k with ϕ(X) = B. Injectivity of ϕ follows since the subspace determined by a projection is unique. Now, to generate a sample of points on which to build a Vietoris-Rips or witness complex, we will use the embedding ϕ. A crude sampling is then obtained by the following procedure.
• Select k random vectors in R n .
• Perform the Gram-Schmidt orthogonalization algorithm to yield an orthornomal set x 1 , . . . , x k . Let A be the matrix with x i as columns. • Compute AA T . One immediate problem with this process is that the k-plane it constructs lives in the top-dimensional Schubert cell with probability 1. However, since we know the space we are interested in, and we know its homology, we can bias our sample to ensure we include points from each Schubert cell. The following procedure implements this idea.
• Determine the percentage of sample points desired from each Schubert cell. For example, one might choose 5% from a 1-cell, 10% from a 2-cell, and so on. • Elements of a given Schubert cell correspond to the column space of a particular matrix form. Generate such a matrix B using random vectors of the required form. • Generate a random n × n orthogonal matrix X.
• Add the matrix A = X(BB T )X T to the point cloud. Note the final step above. If we merely took the matrix B, we would not end up with a well-distributed sample. For example, in the case of G 2 (R 4 ), such a matrix lying in the 1-cell of the Schubert decomposition has the following form The corresponding point in R 16 would have most coordinates equal to 0, which is clearly not what we want. Conjugating the various BB T by a random orthogonal matrix X (a different X for each B) yields a wider distribution of points in G k (R n ).
The MATLAB files we used to generate samples in various projective spaces and Grassmannians are available at https://github.com/niveknosdunk/ grassmann.

Approximate triangulations.
We are now ready to search for simplicial complexes modeling the spaces G k (R n ). The procedure we employ is as follows.
• Construct a sample of points on G k (R n ).
• Construct a collection of Vietoris-Rips or witness complexes on the point cloud. • Compute the persistent homology of this filtration.
• Determine a range of parameters where the homology of the complexes agrees with that of G k (R n ).
Definition 2.8. Let K r denote either V R(X, r) or W (X, L, r). If there exists a parameter r > 0 for which the homology of K r agrees with that of G k (R n ), then we call K r an approximate triangulation of G k (R n ).  Note that K r is a subcomplex of the euclidean space in which we have embedded G k (R n ). However, it does not necessarily lie inside the embedded G k (R n ). Still, its vertices do lie on G k (R n ) and so we can think of this as being close to a triangulation of this manifold.
Note that ψ(−x, −y, −z) = ψ(x, y, z) and so it descends to a map RP 2 → R 4 . Generate a sample of 100 points on S 2 and then use this map to get the points in R 4 . The persistence diagrams are shown in Figure 3. There is a tiny window, around r = 0.87 where we get the correct homology. Now generate a sample of 200 points. As expected the Vietoris-Rips complex has the correct homology for a longer range of parameters, as indicated in Figure 4. Here we see a long interval 0.69 < r < 0.87 where we get the correct homology. So the Vietoris-Rips complex built on these 200 points in R 4 is a good approximation to RP 2 .

Figure 5.
Vietoris-Rips persistence diagrams for 100 points on RP 2 , using the isometric embedding into R 5 (a) H 1 persistence and (b) H 2 persistence Figure 6.
Vietoris-Rips persistence diagrams for 200 points on RP 2 , using the isometric embedding into R 5 (a) H 1 persistence and (b) H 2 persistence 3.2. RP 2 , Part II. The embedding of RP 2 into R 4 is not an isometric embedding, though. For that we need R 5 : (x, y, z) → yz, xz, xy, If we then generate 100 random points on this surface, we obtain the Vietoris-Rips barcodes in Figure 5. This works better than the embedding into R 4 ; we get the correct answer for 0.625 < r < 0.871. The result for 200 points is even better, and is shown in Figure 6 3.3. RP 3 . We use the fact that RP 3 is diffeomorphic to SO(3), the space of 3 × 3 orthogonal matrices of determinant 1. If we select 100 random points on this space in R 9 , we find that there is only a tiny window where β 2 = 1, so 100 points probably is not enough to yield a good approximate triangulation. The H 3 barcode is shown in Figure 8.
If we now sample 200 points at random on RP 3 (computation time 6:54) we obtain the barcodes in Figures 9 and 10. Note that we get the correct homology for 2.1 < r < 2.4.  Vietoris-Rips persistence diagrams for 200 points on RP 3 (a) H 1 persistence and (b) H 2 persistence 3.4. G 2 (R 4 ), Part I. We now consider the first Grassmannian that is not a projective space. Embed the 4-manifold G 2 (R 4 ) as the space of symmetric idempotent 4 × 4 matrices of trace 2. As a first attempt, we take the naïve sampling approach of generating random pairs of orthonormal vectors to build a point cloud of such matrices. However, persistence calculations now become rather cumbersome. Table 1 shows some statistics on computation times for point clouds of various sizes on a MacBook Pro, 16GB RAM, computing homology up to dimension 4.  Table 1. Computation times for Vietoris-Rips persistence up to dimension 4 for G 2 (R 4 ). An X indicates that the software could not complete the calculation. In a quest for more memory, we received an offer from Mikael Vejdemo-Johannson to use his machine. It has 256GB RAM. We began the 200 point Vietoris-Rips calculation in Eirene in the background and logged out. After 10 hours it was still processing and was using 97% of the system memory. The next morning the process was complete; the output file (in JLD2 format) was 74 GB (!). Since Eirene uses PlotlyJS to render barcodes, they cannot be viewed remotely. Even if the file could be retrieved, it is unclear that our 3.5. G 2 (R 4 ), Part II. We then took a different approach. The Vietoris-Rips complex is nice because it is easy to compute, but it suffers from combinatorial explosion. We turned to witness complexes and made the associated computations using the Javaplex package [10] in MATLAB.
The initial attempt simply generated elements of G 2 (R 4 ) by taking a pair of orthonormal vectors in R 4 and using them to build a certain 4 × 4 matrix. For this experiment, we biased the sample in the following way. For a given number M of points on G 2 (R 4 ), we took 5% from the 1-cell, 15% from each of the 2-cells, 25% from the 3-cell, and 40% from the 4-cell. One could choose different proportions, of course.
This worked remarkably well. We generated 5000 points on G 2 (R 4 ) and constructed the witness complex on 100 landmarks chosen using the maxmin process. The barcodes for one such trial are shown in Figure 13. Note that we get the correct homology for r > 0.125. This witness complex, which has 145,011 simplices, is therefore a good approximate triangulation of G 2 (R 4 ). The point cloud and witness points are available as text files at https://github.com/niveknosdunk/grassmann.

Conclusions
In this paper we demonstrated the utility of using Vietoris-Rips and witness complexes to obtain approximate triangulations of the Grassmann manifolds G k (R n ). We were able to construct such spaces with relatively few vertices, but some questions remain for further study.
(1) How small of a sample can we use to generate an approximate triangulation? For example, a result in [7] asserts that any triangulation of G 2 (R 4 ) must have at least 14 vertices. We built an approximate triangulation using a witness complex on 100 landmarks. Surely our Figure 13. Barcodes for a witness complex on 100 points in a 5000-point sample on G 2 (R 4 ) algorithm will not work with only 14 points, but we plan to investigate how few we can get away with. A theorem of Niyogi-Smale-Weinberger [13] provides lower bounds on the number of points required to compute homology correctly with high probability, but these are certainly too high and can be improved in practice. (2) Can we push the computations further? The next Grassmannian to study is G 2 (R 5 ). This is a 6-manifold, and using our procedure we would embed it in R 25 . The machine used to compute the persistent homology of the witness complexes on G 2 (R 4 ) in MATLAB ran out of memory on 100 landmarks in G 2 (R 5 ). We therefore need either a bigger machine running MATLAB, or software that can handle witness complexes. The GUDHI package [8] is one option, but we have not attempted it yet. (3) The author expects to gain access to a new GPU based supercomputer at his institution in the next year. This may allow for similar computations on higher-dimensional G k (R n ).