Parallel Enumeration of Triangulations

We report on the implementation of an algorithm for computing the set of all regular triangulations of finitely many points in Euclidean space. This algorithm, which we call down-flip reverse search, can be restricted, e.g., to computing full triangulations only; this case is particularly relevant for tropical geometry. Most importantly, down-flip reverse search allows for massive parallelization, i.e., it scales well even for many cores. Our implementation allows to compute the triangulations of much larger point sets than before.


I n t ro d u c t i o n
Triangulations are ubiquitous in combinatorics, optimization, algebra and other parts of mathematics. For an overview about the range of applications we recommend the first chapter of the monograph [7] by De Loera, Rambau and Santos, and the rest of that book is recommended for the foundations. The software which defines the state of the art is Rambau's TOPCOM [21]; see also [19]. An implementation of a different method is part of Gfan by Jensen [15]. While this comes with some advantages of its own, TOPCOM is superior in most cases. In this article we describe an algorithm using different ideas but which still shares some of TOPCOM's features. Most importantly we report on new computational results obtained with our implementation MPTOPCOM; some of these are out of the reach of the other software systems.
In order to explain the method it is best to start by looking into TOPCOM's algorithm; see also [7, §8.3]. As a first step, for a fixed point configuration, P , TOPCOM creates one triangulation as a seed. Then TOPCOM generates all triangulations of P which can be obtained by local transformations, known as flips. The flips induce a graph structure on the set of triangulations of P , and TOPCOM makes a breadth first search starting at the seed. In practical applications it is often important to restrict the attention to those triangulations of P which are regular, i.e., they are induced by a convex lifting function. Regularity can be checked via solving a linear programming feasibility problem, and this is supported by TOPCOM, e.g., via calling cddlib [9]. Any two regular triangulations of P are connected by a sequence of flips. Checking for regularity on the way is costly but is also beneficial since the regular triangulations of P have a particularly nice structure. The reason is that they correspond to the vertices of a convex polytope, the secondary polytope of P , which was introduced by Gel fand, Kapranov and Zelevinsky [11], and the edges of the secondary polytope correspond to flips (but there are flips which do not arise from edges). The coordinates of a regular triangulation ∆, seen as a vertex of the secondary polytope, are the GKZ-vector of ∆. Already in 2002 Imai et al. [12] proposed an algorithm which exploits the GKZ-vectors to tailor a reverse search scheme [2] for enumerating the regular triangulations of P , and we call their method down-flip reverse search. Surprisingly, the down-flip reverse search algorithm apparently received little attention, and some of its properties have been discovered independently by Pournin and Liebling [20]. A key challenge in practice is that relevant point sets P exhibit a great deal of symmetry. Usually this symmetry is given in terms of generators of some finite group G acting on P by permutations. The real task now is to compute the (regular) triangulations of P up to symmetry, i.e., exactly one representative from each G-orbit. This reduction is strictly necessary since the total number of triangulations is often too large. Imai et al. [12] also describe down-flip reverse search up to symmetry.
While the general algorithmic idea is known, a practical implementation of down-flip reverse search requires one to overcome several challenges. That is the focus of this paper. We will show that our implementation MPTOPCOM of down-flip reverse search, which can enumerate hundreds of millions of triangulations of a given point set, is superior to other methods in practice. Two aspects are most important. First, for large input, the implementation must be parallelized in order to be able to benefit from modern hardware. This is where the reverse search scheme shows its full strength; mts [4] is a competitive parallel implementation of the abstract reverse search method based on MPI [25]. Second, by far the most frequent subtask is to compute a canonical representative for each G-orbit of a given triangulation. A drawback of reverse search is that the representatives are often recomputed, even for the same orbit and the same triangulation. This is because we want to avoid any communication between the nodes in the computation tree since this is what makes reverse search so successful. Our main new algorithmic contribution is a procedure for computing these canonical representatives in a way that is -in our particular situation-much faster than standard methods from computational group theory. Despite our highly optimized setup, depending on the input and the desired output, the computation of the canonical representatives still may take more than 90% of the combined total time. The benefit of our approach, however, is tremendous. It turns out that, even if we restrict our parallel implementation to a single core, we are able to beat TOPCOM by a factor of five or more on medium size input. More importantly, on large input MPTOPCOM scales rather well even for more than a hundred cores. This means that we are now able to compute triangulations of point sets an order of magnitude larger than before. This paper is organized as follows. In Section 2 we collect the basic facts concerning the reverse search method of Avis and Fukuda [2]. This is a powerful general scheme for organizing a large enumeration via a rooted tree. The paradigmatical example is a dual convex hull computation. Here the input is a system of linear inequalities, and the output are the vertices and the rays of the feasible region, which is a convex polyhedron. A generic linear objective function induces an orientation on the edges of that polyhedron, and any pivoting strategy for the simplex method yields a tree, in which each edge is directed toward the global optimum. The basic idea of down-flip reverse search is to mimic this behavior on the secondary polytope of a point configuration. We close this section by describing the principles underlying the parallelization employed by mts [4]. Section 3 deals with the basic notions concerning triangulations, and we formulate the down-flip reverse search algorithm. Then, in Section 4 we briefly explain the general approach of Imai et al. [12] to combine enumeration up to symmetry with down-flip reverse search. The core of our paper is Section 5. The major challenge in MPTOPCOM's implementation is to efficiently find canonical representatives for each orbit of triangulations. To this end we introduce the concept of switch tables and evaluations to facilitate that computation (cf. Definition 9). This is developed in a way which specifically addresses the enumeration of triangulations, and yet we believe that it may also be useful in other circumstances. Our main theoretical contribution is a procedure for computing canonical representatives via switch tables (cf. Algorithm 4) and its analysis. In Section 7 we report on experimental results. First we consider a few standard examples, such as the 16 vertices of the 4-dimensional regular cube. More interesting are our new results. This concerns, e.g., products of simplices; these computations have been employed by Schröter for deriving results on coarsest matroid subdivision of hypersimplices [23]. Further, for the first time we are able to enumerate all regular and full triangulations of the 3-simplex with dilation factor of three. This is a configuration of 20 points in R 3 , and it turns out that it has precisely 21 125 102 regular and full triangulations, up to the symmetry induced by the symmetric group of degree four, acting on the vertices of the tetrahedron (cf. Theorem 19). This outcome is relevant, e.g., for tropical geometry, as this leads to the classification of the smooth tropical cubic surfaces in the tropical 3-torus; see [16, §4.5]. A full account of the consequences from that one computation is beyond the scope of this paper.
We are grateful to Jörg Rambau and Benjamin Schröter for comments and suggestions.
2. B u d g e t e d r e v e r s e s e a rc h a n d i t s pa r a l l e l i z at i o n Reverse search [2] is a technique for enumerating large sets of objects. Essentially, one explores a graph Γ = (V, E) where V is the set of objects to be enumerated and the edges are given by an adjacency oracle. For every node v, the number δ(v) denotes the outdegree of v in Γ. The adjacency oracle Adj(v, j) returns the j'th neighbor of v ∈ V for j ∈ [δ] or null if no such neighbor exists. To apply reverse search to a problem, one provides the adjacency oracle, a local search function π(v) and an element v * ∈ V . It is required that π(v) returns the tuple (u, j) such that Adj(u, j) = v and that repeated application of π to any v ∈ V results in a path from v to v * . The local search function therefore generates a spanning tree of Γ with root v * .

Algorithm 1 Reverse Search
repeat 4: while j < δ(v) do 5: j ← j + 1 6: if π(Adj(v, j)) = v then 7: v ← Adj(v, j) until depth = 0 and j = δ(v) 18: end procedure Reverse search (cf. Algorithm 1) traverses this spanning tree in depth-first fashion starting from v * , and at each node outputs the object corresponding to that node. One can use the local search function to backtrack. This allows to implement reverse search in a way which does not require any additional memory after an initial setup. However, it is also possible to cache auxiliary information needed by the application, for example to avoid recomputation when backtracking. This way the memory usage can be controlled even for computations with extremely large output. For more details on reverse search see [2].
It was recognized from the beginning that reverse search can be easily parallelized. The enumeration process can be restarted from any node given only a description of that node, and no node is reachable via multiple paths. This means that one does not have to store previously-visited nodes, and communication between processes is minimal. Many reverse search trees are highly unbalanced and so the underlying problem is to explore an unbalanced tree in parallel.
The first parallel reverse search implementation was the generic reverse search layer in ZRAM [6] which was applied in various areas, including the first parallel program for vertex/facet enumeration [6] and a program for certain quadratic maximization problems [8]. Other parallel reverse search applications include the computation of Minkowski sums [26]. Furthermore, Jensen computed exact mixed volumes [13] and homotopy continuation [14] via parallel reverse search implemented in Gfan [15].
Recently budgeted reverse search [3] was introduced as a simple scheme for load-balancing in parallel reverse search. There, one adds an additional parameter (the budget) and each of the parallel processes explores its assigned subtree subject to the budget. Once the budget is exhausted, the process backtracks to the root of the subtree while reporting unexplored nodes along the backtrack path. These unexplored nodes are scheduled later for exploration. This is a particularly simple approach to parallel tree exploration that in practice can scale beyond 1000 cores [3]. If properties of the trees generated by an application are known, it can be possible to prove certain performance guarantees [1].
Budgeted reverse search inherits a number of other features from reverse search. One can checkpoint and restart the overall process by waiting for all processes to exhaust their budget and writing the descriptions of unexplored nodes to a file. The budget can be tuned dynamically based on the number of unexplored subtrees available. While the implementation of [3] is specific to vertex/facet enumeration, a generic implementation of parallel budgeted reverse search (mts) is also available along with a tutorial [4].
Implementing a budgeted reverse search application with mts allows a clean separation of the parallelization layer and application code. This permits independent development of the application and the parallelization layer. While the current mts implementation uses MPI [25] and can run on heterogeneous clusters, a compatible shared memory implementation is also possible. The current mts implementation dedicates a process as master and another to handling output. This overhead is insignificant when many processes are available, but limits parallel efficiency when only few processes are used. We refer to Section 7 below for further details on how MPTOPCOM employs mts for enumerating triangulations.
3. T r i a n g u l at i o n s o f p o i n t c o n f i g u r at i o n s Let P ⊂ R d be a finite set of n points that affinely spans the entire space. A (polyhedral) subdivision Σ of P is a polytopal complex which covers the convex hull conv P , such that the vertices of each cell form a subset of the given points P ; cf. [7, §2.3.1]. The subdivision Σ is regular if it is induced by a height function h : P → R in the sense that the lower convex hull of conv (p, h(p)) | p ∈ P ⊂ R d+1 projects to Σ by omitting the last coordinate. The subdivision Σ is a triangulation if all its cells are simplices. The set of all subdivisions of P is partially ordered by refinement, and the triangulations are precisely the finest subdivisions. Our goal in this section is to discuss and present algorithms for enumerating all regular triangulations of P . For a given subdivision Σ the set of all height functions which induce Σ on P is a relatively open polyhedral cone, the secondary cone of Σ. The secondary cone of Σ is non-empty if and only if Σ is regular. The set of all secondary cones forms a polyhedral fan, the secondary fan Σ-fan(P ). The relatively open secondary cones partition the space R n of height functions, i.e., the secondary fan is complete. Each non-empty secondary cone contains a (d + 1)-dimensional linear subspace, the space of linealities of the secondary fan. Fixing the heights on an affine basis in P amounts to intersecting the secondary fan of P in a way such that each cone in the resulting fan is pointed. We call that (n − d − 1)-dimensional pointed polyhedral fan the pointed secondary fan of P . It is unique up to linear transformations.
As a key fact the pointed secondary fan is polytopal, i.e., it is the normal fan of a convex polytope of dimension n − d − 1. We quickly review the construction. For a triangulation ∆ of P the GKZ-vector is gkz ∆ = gkz ∆ (p) | p ∈ P , where gkz ∆ (p) is the sum of the (Euclidean) volumes of those simplices in ∆ which contain p as a vertex. The convex hull Σ-poly(P ) = conv { gkz ∆ | ∆ triangulation of P } of all GKZ-vectors is the secondary polytope of P ; its normal fan is the (pointed) secondary fan of P ; cf. [7, §5.2.2]. The vertices of Σ-poly(P ) are precisely the GKZ-vectors of the regular triangulations. As we assumed the point configuration P is spanning, the dimension of its secondary polytope equals We will now sketch an algorithm by Imai et al. [12] to enumerate (regular) triangulations of P based on applying reverse search to the vertex-edge graph of Σ-poly(P ) (or a suitable supergraph). See or if λ(gkz ∆ ) = λ(gkz ∆ ) and ∆ is lexicographically larger than ∆.
This defines a total ordering on the set of all triangulations of P . Since λ induces a total ordering on the vertices of Σ-poly(P ) the lexicographic ordering is not required as a tie-breaker if restricted to regular triangulations. However, it is important for nonregular triangulations; see Example 8 below.
A flip f is a local modification of a triangulation, ∆, of P which yields another triangulation, ∆ ; we write f = [∆ ∆ ]. Each edge of the secondary polytope comes from a flip, but the converse does not hold [7, §5.3.1]. We call f an up-flip if ∆ > ∆. Otherwise ∆ < ∆, and f is a down-flip. The flip graph Φ of P is the graph whose nodes are the triangulations of P and whose edges are given by the flips. The flip graph Φ is directed with respect to up-flips. Note that Φ is not necessarily connected [7, §7.3 and §7.4]. However, the subgraph Φ reg induced on the subset of regular triangulations is always connected.
is a flip and ∆ is regular, then ∆ does not need to be regular. That is, the connected component Φ c reg of the regular triangulations may be strictly larger than Φ reg . We call Φ c reg the regular component of the flip graph, and we call a triangulation in the regular component semi-regular. Moreover, even if ∆ and ∆ both are regular then f does not necessarily correspond to an edge of Σ-poly(P ). In general the vertex-edge graph of Σ-poly(P ) may be a proper subgraph of Φ reg as it may have fewer edges.
The reverse search algorithm (cf. Algorithm 1) works on a graph which is given only implicitly by means of an oracle. It is only after the termination of the procedure that the entire graph (or rather a spanning tree) is known. Therefore, we introduce the following notation. Our adjacency oracle Φ reg (∆, j) for the regular flip graph Φ reg at a given regular triangulation ∆ of P returns a regular triangulation which can be obtained from ∆ by some down-flip, and it is the jth one in the total ordering from Definition 1. If there are less than j regular triangulations accessible via down-flips, then the oracle returns null . We call our predecessor function π, and it returns the maximal triangulation which can be obtained via an up-flip, if it exists. There is a unique regular triangulation ∆ * whose GKZ-vector is maximal among all triangulations, regular or not. We set π(∆ * ) = null , and we call ∆ * the optimal triangulation of P . Recall that our ordering of the triangulations and thus this notion of optimality relies on the choice of an ordering of the points in P . R(∆, Φ reg , π) 7: end procedure Theorem 3 (Imai et al. [12]). The down-flip reverse search algorithm 2 computes all regular triangulations of P .
Proof. For each fixed ordering of the point set P the beneath-and-beyond method provides the corresponding placing triangulation, which is known to be regular (cf. [7,Lemma 4.3.5]). This can be taken to implement Step 2. The triangulation ∆ which arises as an argument to the call of reverse search in Step 6 is the optimal triangulation ∆ * . Checking the regularity of a given triangulation requires to solve a linear program. That is, in practice the oracle Φ reg first needs to determine all flips, then to reduce to the set of regular down-flips and finally sort them according to Definition 1. The sequence of flips from the initial triangulation ∆ to the optimal triangulation ∆ * is monotone in the sense of [7,Def. 5.3.8].
The claim now follows from the fact that, first, the regular flip graph is connected, second, λ induces an acyclic orientation, and, third, the correctness of Algorithm 1.
Note that, by [7,Cor. 5.3.11], the number of up-flips to the optimal triangulation does not exceed Remark 4. Solving the linear programs on the way to implement the oracle Φ reg is costly. Therefore, in practice, this is often avoided (or rather postponed as a post-processing step, which can be parallelized in a trivial way). The breadth-first-search strategy of TOPCOM then computes the entire component Φ c reg instead of Φ reg . The situation differs, however, for the down-flip reverse search algorithm. The reason is that there may be nonregular triangulations which are locally maximal with respect to the objective function λ. Therefore, in the steps prior to calling reverse search checking for regularity in the predecessor function π cannot be dispensed with. In the actual reverse search part of the computation this is not necessary. This variant of down-flip reverse search will, in general, compute a proper subgraph of Φ c reg which comprises all regular triangulations, since each regular triangulation reaches ∆ * via a sequence of up-flips.
Example 5. Figure 1 shows the secondary polytope of the "mother of all examples" (MOAE) from [7,Exmp. 5.5.7]; see also [18]. This is a configuration of six points in R 2 with three vertices of the convex hull and three points in the interior of the outer triangle. There are 18 triangulations, 16 of which are regular. The 18 triangulations are grouped into five orbits, and the coloring of the vertices in Figure 1 shows those orbits. The two nonregular triangulations (are semi-regular and) share the same GKZ-vector, and these occur as the central blue point in a hexagon with yellow vertices. The optimal triangulation corresponds to the one black vertex (at the top and toward the back). The thick edges form the reverse search tree of the graph Φ c reg = Φ; the optimal triangulation is the root.

Remark 6.
A triangulation of P is full if it uses all the points in P . It was proved in [12, §5] that the subgraph of Φ reg induced on the full triangulations is connected; see also [20] and [7, Cor. 5.3.14]. As a consequence, the down-flip reverse search algorithm can be applied to enumerate the full triangulations only. The simple idea is to apply the reverse search scheme to the graph whose nodes correspond to the orbits of the (possibly regular) triangulations and whose edges are induced by the edges in Φ, i.e., by flips. Following Imai et al. [12] we suggest to adapt the approach via GKZ-vectors from Section 3 to the symmetric setting. Let P ⊂ R d be a finite point set, and let Reverse search tree of MOAE from Example 5. The two blue points correspond to the two nonregular triangulations. Since they share the same GKZ-vector they actually coincide; here we chose to draw them slightly apart in order to reveal the tree. be a finite group of affine unimodular automorphisms which acts on the set P by permutations. In particular, this action is faithful, and it preserves the volume. Clearly, there is an induced action of G on the set of all triangulations of P , which leaves the set of regular triangulations invariant.
Lemma 7. Let g be an element of G, and let ∆ be a triangulation. Then we have gkz g·∆ (g(p)) = gkz ∆ (p) for all points p ∈ P .
Proof. This is an immediate consequence of G preserving the volume.
The main challenge in implementing reverse search for enumerating triangulations is to find a fast implementation for computing the canonical representatives. To this end we will to some extent deviate from standard wisdom in the algorithmic theory of permutation groups as presented, e.g., in the monograph [24] by Seress. One reason is that our groups are fairly small. In most relevant cases G contains a few thousand elements, while the set of triangulations on which G acts often has millions of orbits. Moreover, despite the fact that the point set is symmetric, few triangulations exhibit much symmetry. This entails that most orbits are about the size of the entire group. See Section 7 for more details and precise numbers.
In order to employ reverse search for traversing (a connected component of) the flip graph, the total ordering on the set of triangulations introduced in Definition 1 is crucial. This leads us to represent the G-orbit of a triangulation ∆ of P by That is, the canonical representative ρ(∆) of an orbit G · ∆ is characterized as follows: (i) its GKZ-vector is lexicographically maximal among all GKZ-vectors of the triangulations in G · ∆, and (ii) among all triangulations in G · ∆ which satisfy (i) it is lexicographically maximal, considered as a characteristic vector of maximal simplices.
In particular, ρ maps triangulations to triangulations in the same orbit such that ρ(∆) = ρ(∆ ) if and only if G · ∆ = G · ∆ , and it follows that ρ(ρ(∆)) = ρ(∆). Recall that the above definition still depends on the ordering of the points in P as well as on the ordering of the maximal simplices. If the triangulation is regular, then the entire orbit G · ∆ consists of regular triangulations and thus their GKZ-vectors are pairwise distinct.
The following example shows that the GKZ-vectors of nonregular triangulations may behave in an unexpected way, and this shows that (ii) is necessary.
Example 8. Let I 4 be the set of 16 vertices of the 4-dimensional 0/1-cube. Here and below we use I = [0, 1] to denote the unit interval on the real line. We may read each vertex as a bitstring of length four, and in this way we obtain a natural encoding in terms of the hexadecimal digits 0 through F. The full group G of affine unimodular automorphisms is the wreath product of a cyclic group of order two (corresponding to a reflection which sends x i to 1 − x i ) by the symmetric group Sym(4) (acting on the four neighbors of a fixed vertex); the total size of G equals 384. Consider the triangulation ∆ whose maximal simplices read Yet the lexicographically maximal GKZ-vector in both orbits is the same, and it reads (23, 3, 2, 8, 2, 6, 10, 6, 2, 3, 10, 9, 10, 11, 1, 14) .
It follows that neither ∆ nor ∆ are regular (but semi-regular). The same holds for the canonical representatives ρ(∆) and ρ(∆ ).
This example illustrates that, in general, the GKZ-vectors cannot distinguish between G-orbits. Moreover, it also shows that GKZ-vectors alone do not suffice to establish a total ordering on the flip graph. For a tie-breaker we need, e.g., the lexicographic ordering as in Definition 1.

C a n o n i c a l r e p r e s e n tat i v e s v i a s w i t c h ta b l e s
The purpose of this section is to explain why the canonical representatives of triangulations are chosen as in (1) and how this can be exploited. To this end we start out with a more abstract setting. We assume that the finite group G acts on the set [m] := {0, . . . , m − 1}.
The following concept is our main tool.
Note that we have j > i if st(i, j) = id. We denote by µ(st) the depth of the switch table, which is defined as That is, µ is the index of the first row of st which only contains the identity. In general, a switch table is by no means unique as there may be many candidates in G for st(i, j).
The ith row of a switch table tells us which elements ≥ i can be moved to position i while leaving the first i elements of [m] unchanged. That is, the switch st(i, j) lies in the stabilizer Proof. Consider the case i = 0, where we need to show that the 0th row of the switch table gives a transversal of H := G [1] in G = G [0] . The group G acts transitively on the left cosets of H by multiplication on the left. Let g be an element in the complement G \ H. If this does not exist there is nothing to show. Since g does not stabilize i = 0 there is an index j > 0 such that g · j = 0. By definition of the switch table there must also be a switch s = s(0, j) with s · j = 0. We obtain that s −1 g lies in H and thus sH = gH, which proves the claim for i = 0.
Erasing the 0th row and the 0th column of the switch table yields a switch table for H = G [1] . This shows that, inductively, the above argument also resolves the general case.
Let σ := σ 0 σ 1 · · · σ m−1 be the product of the σ i . A switch table is somewhat reminiscent of a system of strong generators for G; e.g., see [22]. However, the concepts differ as can be seen from the example below. and all other entries of the switch table are the identity element. The depth of this switch table is 3. We have σ 0 = 4, σ 1 = 3, σ 2 = 2 and σ = 24, which is the order of Sym(4). Let G st i be the subgroup of G, called the ith switch group, which is generated by the ith row of the switch table st. The switch group G st 0 is generated by the 4-cycle (0 1 2 3), while G st 1 is generated by the 3-cycle (1 2 3), and G st 2 is generated by the transposition (2 3).
Similarly, G st 1 is a proper subgroup of G [1] , which is the symmetric group of degree three acting on {1, 2, 3}, whereas G st 2 agrees with G [2] . Observe that G st 2 is not a subgroup of G st 1 , which in turn is not a subgroup of G st 0 . Now we consider a second action of G on some other set Ω and a map v : Ω → R m . Our first action of G on [m] induces a (linear) action on the vector space R m by permuting the coordinates. If the compatibility condition v(g · ω) = g · v(ω) is met we call v an m-evaluation map of the G-action on the set Ω. Due to the compatibility we obtain an action of G on the set v(Ω) ⊂ R m . Now we define the canonical representative ρ(v(ω)) for that action as the lexicographically maximal vector in the orbit G · v(ω). Below we will discuss the relationship of this definition with the canonical representative of a triangulation from (1). The following example is natural.
Example 14. If G acts on [m] then this induces a second action on the set Ω of all subsets of [m]. Sending ω ∈ Ω to its characteristic vector of length m yields an evaluation map. The canonical representative of an orbit is the lexicographically smallest set in that orbit. This is TOPCOM's approach.
Our main method for computing canonical representatives is the procedure c a n o n i c a l in Algorithm 4. It essentially relies on the function G o o d S w i t c h e s in Algorithm 3 which determines all switches that may lexicographically improve a given vector z ∈ R m in the orbit of G [i] . The idea is to employ a depth first search.

Algorithm 3 Find switches which may yield a larger vector in G
return { st(i, j) | j ∈ J} 12: end procedure Proposition 15. Let ω ∈ Ω be an arbitrary element. For all i ∈ [m], Algorithm 4 computes an element ω ∈ G [i] · ω such that v(ω ) is lexicographically maximal among all elements in the orbit G [i] · ω. In particular, for i = 0, the evaluation v(ω ) is the canonical representative ρ(v(ω)).
Proof. The correctness of Algorithm 4 essentially follows from Corollary 11, which provides a transversal of G [m] as a subgroup of G in terms of switches. As that transversal arises from an inductive construction Algorithm 4 works recursively.
It remains to discuss the case where v(ω ) 0 = v(ω) 0 . Then we need to consider all possible switches which may or may not change the entry with index 0 but keep the v-value. This may include the identity. for s ∈ S do 8: ω ← c a n o n i c a l(s · ω, v, i + 1, st) 9: if v(ω ) > v(ω ) then For ω ∈ Ω denote by ψ(ω) the maximum number of identical entries of the vector v(ω). Since G acts on the set v(Ω) by coordinate permutations, the number ψ(ω) is an invariant of the orbit G · ω. We let ψ be the maximal ψ(ω), taken over all elements in Ω.
Define φ i as the minimum of ψ and σ i ; this depends both on the switch table and on the evaluation function. Furthermore, we set φ := φ 0 φ 1 · · · φ µ(st)−1 . By construction we have φ ≤ σ ≤ #G. The benefit from our somewhat elaborate setup comes from the fact that the number φ may be much smaller than the order of the group G; see Example 18 below. Proof. In the procedure G o o d S w i t c h e s, called with the vector z of length m and the index i, the set E constructed in Step 2 has size at most m − 1. Hence sorting E is of order O(m log m). The size of set J constructed in Step 5 or Step 11 is bounded by the number of identical entries ψ(ω) in the vector v(ω). Thus, the set S constructed in Step 6 or Step 12 is bounded by the size of J as well. But the set S also contains only non-trivial entries from the ith row of the switch table, in case of Step 12 a single copy of the identity. Hence the set of switches returned is of size at most φ i .
The height of the recursion tree of c a n o n i c a l is at most m − 1, and hence the number of leaves is bounded by φ 0 φ 1 · · · φ m−1 = φ. Consequently, the total number of nodes and also the total number of edges is of order O(φ). For each edge we lexicographically compare two vectors of length m. All other costs are dominated by the total complexity of these comparisons.
Finally, we can explain why we chose the canonical representative of a triangulation as in (1) which rests on Definition 1. Let us recall our setup. The point set P ⊂ R d , of cardinality n, is affinely spanning. It is equipped with a group G of affine unimodular automorphisms. This action induces an action on the set Ω of all triangulations of P . We can encode a triangulation as its set of d-simplices, which are the maximal cells. The latter are encoded as those (d+1)-subsets of P which are affinely spanning. If m is the number of all d-simplices then encoding a triangulation ∆ as its characteristic vector among the set of all d-simplices yields an m-evaluation map of the action of G on Ω; see Example 14. For computing the canonical representative of a triangulation ∆ ∈ Ω in a brute-force approach all elements of G are applied and the lexicographically largest one is picked. Of course, the elements of G can be precomputed once in the initialization. Since most orbits are expected to be about the size of G this is often superior to the more traditional approach of trying generators of G until no new triangulations are found, since it requires fewer comparisons of (characteristic vectors of) triangulations.
Yet evaluating at GKZ-vectors leads to a significant improvement. The map which sends a triangulation ∆ to its GKZ-vector is an n-evaluation, and n is always much smaller than m. More importantly most entries of a typical GKZ-vector are distinct and hence the parameter ψ which enters the complexity analysis in Corollary 17 is very small. This explains criterion (i) in Definition 1. The tie-breaker criterion (ii) is only necessary for dealing with nonregular triangulations. The following example occurs in a computation which should be considered small by current standards. Larger input results in larger gains.
Example 18. Let us again look at the set P = I 4 of vertices of the 4-dimensional regular cube. The group G is the full group of affine (unimodular) automorphisms of order 384. Mapping to GKZ-vectors yields a 16-evaluation. Any switch table has depth four with Their product σ = σ 0 σ 1 σ 2 σ 3 equals 384, which is the order of the group G.
To assess the complexity of computing the canonical representatives, the shape of the GKZ-vectors is the key. Here is the full distribution of values of ψ(·) for the 247 451 orbits of semi-regular triangulations of I 4 . The average value of ψ (taken over all orbits) is approximately 3.22. This results in an average complexity parameter of φ ≈ 66.8, which is much smaller than 384, the order of the group.
6. I m p l e m e n tat i o n d e ta i l s Our implementation MPTOPCOM builds on and uses existing code from mts [4], polymake [10] and TOPCOM [21]. The mts setup dedicates one process to the master and a second one for dealing with the output; the remaining processes are reserved for the workers. Let us give a brief overview of the interplay of the different software systems: While parallelization is handled by mts, the single worker processes are in the domain of polymake and TOPCOM. Triangulations and groups are handled by TOPCOM's highly optimized code. Since TOPCOM provides vectors and matrices already, a first implementation used these for gkz-computation. However, replacing TOPCOM's vectors and matrices by the corresponding objects from polymake vastly improved performance. At this point, all vectors, matrices, and sets are handled by polymake. Checking regularity is handled by TOPCOM's internal interface to cddlib [9].
Parallelizing with more threads will scale almost linearly at first, but depending on the size of the example, the curve of time consumption over number of threads will flatten sooner or later, as can be seen in the Figures 2, 3, and 4. One reason for this is that the reverse search algorithm makes workers check all neighbors of a node in order to filter out those that are actually being visited (see Algorithm 1, Step 6), leading to duplication of certain tasks among the workers. Even if a worker A does not visit a node v, this node can be a neighbor of a node being visited. In this case worker A computes the flips of v to determine its predecessor, in the symmetric case it also needs to compute the canonical representative. If worker B actually visits node v it will redo the computations of worker A. This leads to duplication of work between A and B and multiplication of work in general for many threads. Furthermore every worker will do the same preparatory work at startup, namely compute all determinants of non-trivial simplices formed from the point set at startup and cache the result, in order to speed up gkz-computation, and as TOPCOM usually proceeds, every worker will compute the full group from the generators provided. This is already a problem for a single worker that might encounter node v several times as neighbor of different nodes. We approach this problem by introducing caches for flips and canonical representatives for the single workers. Sharing these caches among the workers as hinted at before is possible, but not realized yet, and would probably damage the flexibility of MPTOPCOM.
We use several encodings of the same triangulation simultaneously. For instance, enumerating all maximal simplices spanned by our point configuration first allows to store a triangulation as a set of machine-size integers. Those integers are the indices pointing into the array of maximal simplices.
It is of major practical importance to equip each worker with constant sized caches for various objects; they follow the least-recently-used paradigm. Caching is combined with hashing such that previously computed data is instantly available, if it is still cached. Cached data includes the volumes of all maximal simplices. Further, for each triangulation we cache, e.g., the sets of up-and down-flips, the canonical representatives, and the GKZ-vectors of the canonical representatives.
The implementation is flexible enough to deal with several scenarios. In our fastest setup we assume that the coordinates of the point configuration are machine size integers. This means that we may use int for the entries of the GKZ-vectors. The condition on the integrality is natural for the applications for algebraic and tropical geometry we have in mind. If the coefficients need to be so large that they do not fit into an int then it is hopeless to enumerate all triangulations anyways.
As pointed out several times the total ordering on the triangulations from Definition 1 depends on an ordering of the points. In practice it is often beneficial to pick a random ordering. This may reduce the height of the search tree for the canonical representative.
We experimented with several compilers on various kinds of hardware. By and large we found clang, version 3.8.0, to be about 10% faster than various versions of gcc compilers. Therefore our timings below employ clang. For the hardware we tried the following: • A cluster with four nodes, each of which is equipped with 2 × Intel Xeon E5-2630 v2 Hexa-Core (2600-3100MHz, 5201.45 bogomips) and 64GB RAM per node. On this machine we used 40 threads. Recall that two processes are reserved for the master and the output processes. That is, n processes means n − 2 workers.
7. E x p e r i m e n ta l r e s u lt s We tried our implementation on a number of point configurations which occur naturally in geometric combinatorics and related areas. Our notation is as follows. The point set ∆ d comprises the d+1 vertices 0, e 1 , e 2 , . . . , e d of the standard d-dimensional simplex, where e i is the ith standard basis vector of R d . We abbreviate I = [0, 1], and so I d is the d-dimensional unit cube. From these several interesting point configurations can be formed, e.g., by taking products. The point configurations of type I d or ∆ p × ∆ q or I p × ∆ q are in convex position, i.e., they form the vertices of polytope. For such point configurations Table 1 shows the number of orbits of regular and semi-regular triangulations; in all cases the group G is the full group of affine symmetries.
To the best of our knowledge the results for ∆ 2 × ∆ 6 , ∆ 3 × ∆ 4 and 3 · ∆ 3 are new. On an AMD Ryzen 7 1700 TOPCOM takes 2345 seconds to enumerate the regular component, while MPTOPCOM requires 2598 seconds (single-threaded) and 457 seconds (with 10 processes), respectively. Figure 2 shows how MPTOPCOM scales with the number of processes. It shows that, on a standard desktop computer, already four processes, i.e., two workers, suffice for MPTOPCOM to be substantially faster than TOPCOM. This should be compared with Figure 3 which shows very similar behavior on our cluster. The only exception is that, on this hardware, even a single-threaded version of MPTOPCOM beats TOPCOM. One possible reason could be a better cache performance. The single-threaded MPTOPCOM-1 is the pure down-flip reverse search algorithm built without the overhead of mts and MPI. The natural group action is by the product Sym(p + 1) × Sym(q + 1) of symmetric groups. In tropical geometry, e.g., their regular subdivisions control the combinatorial types of tropical polytopes [16, §5.2]. The special case where one of the factors is one-dimensional, i.e., when the product of simplices is a prism, is fully understood [7, §6.2.1]. Therefore, in our experiments we restrict our attention to cases with 2 ≤ p ≤ q. There is a formula for the number of all triangulations of ∆ 2 × ∆ q [7, 9.2.5], but this does not immediately yield the number of (semi-)regular triangulations or the number of orbits. Figure 4 shows the speed for computing the triangulations in the regular component of ∆ 2 × ∆ 6 with MPTOPCOM on the cluster, depending on the number of processes. This computation is medium size, i.e., a bit larger than the previous, and so it pays to use Our new results for ∆ 2 × ∆ 6 and ∆ 3 × ∆ 4 helped Schröter [23] to obtain new results on coarsest subdivisions of hypersimplices. An attempt to handle ∆ 3 × ∆ 5 is currently under way (running on more than a hundred cores for a few weeks). So far it has found more than 400 million orbits of semi-regular triangulations. 7.3. Dilated simplices. A third class of point configurations is denoted as k · ∆ d . These are the n+d n = n+d d lattice points in the simplex ∆ d which is dilated by the factor k. For any polynomial in d + 1 indeterminates which is homogeneous of degree k the monomials correspond to points in the point configuration k · ∆ d . In particular, the vertices of the Newton polytope form a subconfiguration. It follows that the tropical hypersurfaces in the tropical d-torus T d+1 /R1 of homogenous degree k are dual to regular subdivisions of k · ∆ d ; see [16, §3.1]. The regular unimodular triangulations of k · ∆ d , which are necessarily full, correspond to those tropical hypersurfaces which are smooth. For the first time, we computed the full triangulations of 3 · ∆ 3 , and these classify the smooth tropical cubics in 3-space [16, §4.5].
Theorem 19. There are exactly 21 125 102 orbits of regular and full triangulations of 3 · ∆ 3 with respect to the natural action of the symmetric group of degree four. Out of these, 14 373 645 are unimodular. 8. C o n c l u d i n g r e m a r k s Table 1 also contains some empty rows, where we do not know the respective number of triangulations. Most of these will be out of reach for the current implementations, including MPTOPCOM. The reason for listing these nonetheless is to give a feel for the orders of magnitude involved. One main complexity parameter for enumerating triangulations is the difference n − d of the number of points and the dimension. This is also one plus the dimension of the secondary fan, modulo linealities. Our experiments suggest that, as a very rough estimate, the range for TOPCOM seems to be limited by n − d ≈ 13. This bar is raised substantially by MPTOPCOM to cover point configurations with n − d = 17 such as 3 · ∆ 3 . It is an interesting question if MPTOPCOM can, e.g., deal with I 3 × ∆ 2 where n − d = 19. That particular point configuration played a role in work of Orden and Santos [17] on efficient triangulations of cubes; see also [7, §6.3.3]. The empty rows of Table 1 show some cases which seem to be rather difficult challenges, with the current techniques. This includes the five-dimensional cube I 5 or the dilated simplex 4 · ∆ 3 . Proving results about their triangulations might require clever strategies for random probing.
R e f e r e n c e s