Multi-way dual Cheeger constants and spectral bounds of graphs

We introduce a set of multi-way dual Cheeger constants and prove universal higher-order dual Cheeger inequalities for eigenvalues of normalized Laplace operators on weighted finite graphs. Our proof proposes a new spectral clustering phenomenon deduced from metrics on real projective spaces. We further extend those results to a general reversible Markov operator and find applications in characterizing its essential spectrum.


Introduction and main ideas
The Cheeger constant, which encodes the global connectivity of the underlying space, was introduced by Cheeger [7], who related it to the first nonzero eigenvalue of the Laplace-Beltrami operator on a compact Riemannian manifold; this relation is now well known as the Cheeger inequality. It was subsequently extended to discrete settings by several authors in spectral graph theory and Markov chain theory, see e.g. [17], [2], [1], [24], [31], [37], [16], [8]. Intriguingly, this has significantly promoted research in many unexpected theoretical and practical areas, such as the explicit construction of expander graphs (see e.g. [1], [27], [38], [34]), graph coloring, image segmentation, web search and approximate counting; we refer to [25], [23] for detailed references.
Recently, Miclo [29] (see also [14]) introduced a set of multi-way Cheeger constants (alternatively called higher-order isoperimetric constants) $h(k)$, $k = 1, 2, \ldots$, in the discrete setting and conjectured a higher-order Cheeger inequality holding universally for any weighted graph. This conjecture was solved by Lee, Gharan and Trevisan [25] by introducing the powerful tool of random metric partitions, originally developed in theoretical computer science. Their approach also justifies the empirical spectral clustering algorithms of [33], which are important and powerful tools in many practical fields (see e.g. [28], [35]). In an interesting turn of the tables, this progress in spectral graph theory has fed back into the setting of Riemannian manifolds: Funano [18] and Miclo [30] (with different strategies) extended the higher-order Cheeger inequality to weighted Riemannian manifolds and found important applications there.
In contrast to the Riemannian setting, the spectrum of the normalized Laplace operator on a graph is bounded from above by 2. Explicitly, one can list the eigenvalues as
$$0 = \lambda_1 \le \lambda_2 \le \cdots \le \lambda_N \le 2, \qquad (1)$$
where $N$ is the number of vertices of the graph $G$. Therefore, a graph has its own particular spectral gaps $2 - \lambda_k$, which have no counterparts in the Riemannian setting. In order to investigate the spectral gap $2 - \lambda_N$, Bauer and Jost [4] introduced a dual Cheeger constant, $\bar h(1)$ in our notation below, encoding the bipartiteness of the underlying graph. Explicitly, for a connected graph $G$,
$$G \text{ is bipartite} \iff \bar h(1) = 1.$$
(A graph is bipartite if its vertex set can be divided into two classes such that edges only join vertices from opposite classes.) They then proved a dual Cheeger inequality, providing a strong quantitative version of the fact that $2 - \lambda_N$ vanishes if and only if the underlying (connected) graph is bipartite. This has already found important applications to the convergence of random walks on graphs and the synchronization of coupled map lattices [4], and to characterizing the behavior of the essential spectrum of infinite graphs [3].
In this paper, we introduce a set of multi-way dual Cheeger constants $\bar h(k)$, $k = 1, 2, \ldots, N$, encoding more detailed information about how far a graph is from being bipartite. The duality between $h(k)$ and $\bar h(k)$ is nicely shown by the fact that
$$h(k) + \bar h(k) \le 1 \quad \text{for every } k.$$
In fact, if a graph satisfies $h(k) + \bar h(k) = 1$ for a small number $k$, then, roughly speaking, it has a large bipartite subgraph in a reasonable sense (Proposition 3.1 (iii)). For example, for an odd cycle $C_N$ it holds that (Proposition 7.2)
$$h(k) + \bar h(k) = 1, \qquad 2 \le k \le N.$$
Recall that an odd cycle is not bipartite but is very close to being so. Moreover, this framework provides a new viewpoint on previously defined constants: the dual Cheeger constant $\bar h(1)$ of Bauer-Jost is actually dual to $h(1) = 0$, and the Cheeger constant $h(2)$ is dual to $\bar h(2)$. We prove higher-order dual Cheeger inequalities, estimating the spectral gaps $2 - \lambda_{N-k+1}$ in terms of $\bar h(k)$, which hold universally for any weighted finite graph (see Theorem 1.2). This completes the picture relating graph spectra and (dual) isoperimetric constants. Interestingly, our proof proposes a new type of spectral clustering via the top $k$ eigenfunctions, employing a metric on a real projective space. As in [25], the proof is in principle algorithmic, and hence we anticipate practical applications of this new spectral clustering.
The deep relations between higher eigenvalues and the geometry of graphs have been explored in several important works of Chung, Grigor'yan and Yau [12,10,11]. For discussions of the spectral gap $2 - \lambda_N$ and curvature of graphs, we refer the reader to [5]. In Markov chain theory, there is the fundamental work of Diaconis and Stroock [16] on various geometric bounds of eigenvalues, in particular of $2 - \lambda_N$. Note that the language of Markov chains and that of the normalized graph Laplacian used here can be translated into each other. For example, an irreducible chain is aperiodic if and only if its associated graph is not bipartite.
In this spirit, our results turn out to apply in a very general setting. Following recent works of Miclo [30] and F.-Y. Wang [39], we extend the multi-way dual Cheeger constants and higher-order dual Cheeger inequalities to a reversible Markov operator $P$ on a probability space $(X, \mathcal F, \mu)$. Let us denote the infimum (supremum, resp.) of the essential spectrum of $P$ by $\underline{\lambda}_{\mathrm{ess}}(P)$ ($\overline{\lambda}_{\mathrm{ess}}(P)$, resp.). We obtain a characterization of $\underline{\lambda}_{\mathrm{ess}}(P)$ in terms of extended multi-way dual Cheeger constants $\bar h_X(k)$. It can be considered as the counterpart of F.-Y. Wang's new criterion for $\overline{\lambda}_{\mathrm{ess}}(P)$ in terms of multi-way Cheeger constants $h_X(k)$. Both arguments depend on an approximation procedure developed by Miclo, by which he solved the conjecture of Simon and Høegh-Krohn in a semigroup context. A further discussion of the relations between $h_X(k)$ and $\bar h_X(k)$ enables us to arrive at
$$\sup_{k\ge1} h_X(k) > 0 \iff -1 < \underline{\lambda}_{\mathrm{ess}}(P) \le \overline{\lambda}_{\mathrm{ess}}(P) < 1.$$
1.1. Statements of main results. To put our results into perspective, we start by recalling the (higher-order) Cheeger inequalities. Let $G = (V, E, w)$ be an undirected, weighted finite graph, where $V$ and $E$ stand for the sets of vertices and edges, respectively. We denote by $w_{uv}$ the positive symmetric weight associated to $u, v \in V$ with $e = (u, v) \in E$ (sometimes we also write $u \sim v$). For convenience, we put $w_{uv} = 0$ if $u$ and $v$ are not joined by an edge. The degree $d_u$ of a vertex $u$ is then defined as $d_u := \sum_{v: v\sim u} w_{uv}$.
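The expansion (or conductance) of any non-empty subset $S \subseteq V$ is defined as
$$\phi(S) := \frac{|E(S, \bar S)|}{\mathrm{vol}(S)},$$
where $\bar S$ represents the complement of $S$ in $V$, $|E(S, \bar S)| := \sum_{u\in S, v\in \bar S} w_{uv}$, and $\mathrm{vol}(S) := \sum_{u\in S} d_u = |E(S, \bar S)| + |E(S, S)|$. Then, for every $k \in \mathbb N$, the $k$-way Cheeger constant is defined as
$$h(k) := \min_{S_1, \ldots, S_k} \max_{1\le i\le k} \phi(S_i),$$
where the minimum is taken over all collections of $k$ non-empty, mutually disjoint subsets $S_1, S_2, \ldots, S_k \subseteq V$. Following [14], we call such a collection of $k$ subsets a $k$-subpartition of $V$. Note that by definition we have the monotonicity $h(k) \le h(k+1)$. The classical Cheeger inequality asserts that
$$\frac{h(2)^2}{2} \le \lambda_2 \le 2h(2). \qquad (5)$$
Resolving a conjecture of Miclo [29] (see also [14]), Lee-Gharan-Trevisan [25] proved the following higher-order Cheeger inequality.
Theorem 1.1 (Lee-Gharan-Trevisan). For every graph $G$ and each natural number $1 \le k \le N$,
$$\frac{\lambda_k}{2} \le h(k) \le Ck^2\sqrt{\lambda_k}, \qquad (6)$$
or in another form
$$\frac{h(k)^2}{C^2k^4} \le \lambda_k \le 2h(k), \qquad (7)$$
where $C$ is a universal constant.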
Observe that when $k > N/2$, one of any $k$ disjoint non-empty subsets must consist of a single vertex, hence $h(k) = 1$. Therefore (7) is more useful for the first half of the spectrum.
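The definitions of $\phi(S)$ and $h(k)$ can be made concrete with a brute-force computation. The following Python sketch is our own illustration (the graph encoding `{(u, v): w_uv}` and the function names `cycle`, `expansion` and `h` are hypothetical, not from the paper); it enumerates all $k$-subpartitions, so it only works for toy graphs. On the unweighted cycle $C_6$ it shows the monotonicity of $h(k)$ and the saturation $h(k) = 1$ for $k > N/2$ noted above.

```python
from fractions import Fraction
from itertools import product


def cycle(n):
    """Unweighted cycle C_n encoded as {(u, v): w_uv} with u < v."""
    return {tuple(sorted((i, (i + 1) % n))): 1 for i in range(n)}


def degrees(edges, n):
    """d_u = sum of w_uv over the neighbours v of u."""
    d = [0] * n
    for (u, v), w in edges.items():
        d[u] += w
        d[v] += w
    return d


def expansion(edges, d, S):
    """phi(S) = |E(S, complement of S)| / vol(S); assumes vol(S) > 0."""
    boundary = sum(w for (u, v), w in edges.items() if (u in S) != (v in S))
    return Fraction(boundary, sum(d[u] for u in S))


def h(edges, n, k):
    """k-way Cheeger constant by exhaustive search over k-subpartitions.

    Each vertex gets a label in {0, ..., k}: label 0 means "in no S_i",
    label i >= 1 means "in S_i". Cost is (k + 1)**n, so toy graphs only.
    """
    d = degrees(edges, n)
    best = None
    for labels in product(range(k + 1), repeat=n):
        parts = [{u for u in range(n) if labels[u] == i} for i in range(1, k + 1)]
        if any(not S for S in parts):
            continue  # every S_i must be non-empty
        worst = max(expansion(edges, d, S) for S in parts)
        if best is None or worst < best:
            best = worst
    return best


C6 = cycle(6)
values = [h(C6, 6, k) for k in range(1, 5)]  # h(1), ..., h(4) on C_6
print([str(x) for x in values])  # ['0', '1/3', '1/2', '1']
```

Exact rational arithmetic via `Fraction` avoids floating-point ties when comparing expansions.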
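We now study the corresponding phenomena for the remaining part of the spectrum. Define the following quantity for a pair of disjoint subsets $V_1, V_2 \subseteq V$ with $V_1 \cup V_2 \neq \emptyset$:
$$\bar\phi(V_1, V_2) := \frac{2|E(V_1, V_2)|}{\mathrm{vol}(V_1 \cup V_2)}.$$
Then, for every $k \in \mathbb N$, we define a $k$-way dual Cheeger constant as follows.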
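$$\bar h(k) := \max_{\{(V_{2i-1}, V_{2i})\}_{i=1}^k} \min_{1\le i\le k} \bar\phi(V_{2i-1}, V_{2i}),$$
where the maximum is taken over all collections of $k$ pairs of subsets $(V_1, V_2), \ldots, (V_{2k-1}, V_{2k})$ such that $V_p \cap V_q = \emptyset$ for $p \neq q$ and $V_{2i-1} \cup V_{2i} \neq \emptyset$ for each $1 \le i \le k$. For ease of notation, we denote the space of all such collections by $\mathrm{Pair}(k)$ and call every element of $\mathrm{Pair}(k)$ a $k$-sub-bipartition of $V$. Here we have the monotonicity $\bar h(k) \ge \bar h(k+1)$. Bauer-Jost [4] proved a dual Cheeger inequality
$$\frac{(1 - \bar h(1))^2}{2} \le 2 - \lambda_N \le 2(1 - \bar h(1)). \qquad (9)$$
Our main result in this paper is the following higher-order dual Cheeger inequality.
Theorem 1.2. For every graph $G$ and each natural number $1 \le k \le N$,
$$\frac{2 - \lambda_{N-k+1}}{2} \le 1 - \bar h(k) \le Ck^3\sqrt{2 - \lambda_{N-k+1}}, \qquad (10)$$
or in another form,
$$\frac{(1 - \bar h(k))^2}{C^2k^6} \le 2 - \lambda_{N-k+1} \le 2(1 - \bar h(k)), \qquad (11)$$
where $C$ is a universal constant.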
This can be considered as a strong quantitative version of the fact that $\lambda_{N-k+1} = 2$ if and only if $G$ has at least $k$ bipartite connected components (see Proposition 3.1 (i)).
Dually, when $k > N/2$, one of the subset pairs $\{(V_{2i-1}, V_{2i})\}_{i=1}^k \in \mathrm{Pair}(k)$ must contain an empty subset, hence $\bar h(k) = 0$. Therefore (11) is more useful for the second half of the spectrum.
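Similarly, here is a brute-force sketch (again our own, with hypothetical names `dual_expansion` and `h_bar`) of $\bar\phi$ and $\bar h(k)$, enumerating all of $\mathrm{Pair}(k)$ by labeling each vertex with $0$ (unused) or one of $1, \ldots, 2k$. It illustrates that the even cycle $C_6$, being bipartite, has $\bar h(1) = 1$, while the odd cycle $C_5$ has $\bar h(1) < 1$, and that $\bar h(k) = 0$ once $k > N/2$.

```python
from fractions import Fraction
from itertools import product


def cycle(n):
    """Unweighted cycle C_n encoded as {(u, v): w_uv} with u < v."""
    return {tuple(sorted((i, (i + 1) % n))): 1 for i in range(n)}


def degrees(edges, n):
    d = [0] * n
    for (u, v), w in edges.items():
        d[u] += w
        d[v] += w
    return d


def dual_expansion(edges, d, A, B):
    """phi_bar(A, B) = 2|E(A, B)| / vol(A u B) for disjoint A, B, A u B non-empty."""
    between = sum(w for (u, v), w in edges.items()
                  if (u in A and v in B) or (u in B and v in A))
    return Fraction(2 * between, sum(d[u] for u in A | B))


def h_bar(edges, n, k):
    """k-way dual Cheeger constant by exhaustive search over Pair(k).

    Label 0 = unused vertex; labels 2i-1 and 2i put a vertex in V_{2i-1}, V_{2i}.
    """
    d = degrees(edges, n)
    best = None
    for labels in product(range(2 * k + 1), repeat=n):
        pairs = [({u for u in range(n) if labels[u] == 2 * i - 1},
                  {u for u in range(n) if labels[u] == 2 * i})
                 for i in range(1, k + 1)]
        if any(not (A | B) for A, B in pairs):
            continue  # each union V_{2i-1} u V_{2i} must be non-empty
        worst = min(dual_expansion(edges, d, A, B) for A, B in pairs)
        if best is None or worst > best:
            best = worst
    return best


even = h_bar(cycle(6), 6, 1)  # bipartite cycle
odd = [h_bar(cycle(5), 5, k) for k in (1, 2, 3)]  # odd cycle C_5
print(str(even), [str(x) for x in odd])
```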

1.2. Clustering on real projective spaces. The lower bound estimate of $2 - \lambda_{N-k+1}$ in (11) is the essential part of Theorem 1.2. For its proof, we follow the route of Lee-Gharan-Trevisan [25], which justifies the spectral clustering algorithms of [33] using the bottom $k$ eigenfunctions. (Note that along this route one can only obtain order $k^3$ in (6); Lee-Gharan-Trevisan used other, stronger techniques to derive $k^2$ at the price of a much larger $C$.) The novelty of our proof is the exploration of a new type of spectral clustering.
For an orthogonal system of eigenfunctions $f_1, f_2, \ldots, f_k$ of the normalized Laplace operator $\Delta$, one can construct the map
$$F: V \to \mathbb R^k, \qquad F(u) := (f_1(u), f_2(u), \ldots, f_k(u)). \qquad (12)$$
For illustration, we ignore the vertices on which $F$ vanishes and compose $F$ further with the projection onto the unit sphere,
$$u \mapsto \frac{F(u)}{\|F(u)\|} \in S^{k-1},$$
where $\|\cdot\|$ is the Euclidean norm in $\mathbb R^k$. We also use $\langle \cdot, \cdot \rangle$ for the inner product of vectors in $\mathbb R^k$. The spectral clustering algorithms using the bottom $k$ eigenfunctions aim at obtaining $k$ subsets of $V$ with small expansions, i.e. clustering groups of vertices that are closely connected inside the group and loosely connected with the outside. Roughly speaking, [25] used the sphere distance to cluster vertices in $V$ via their images on the unit sphere under $F$.
We now explore the clustering phenomena using the top $k$ eigenfunctions $f_{N-k+1}, \ldots, f_N$, and redefine $F$ via (12) using these functions. We first observe that, for any $u \in V$,
$$\frac{1}{d_u}\sum_{v: v\sim u} w_{uv}\langle F(u), F(v)\rangle = \sum_{i=N-k+1}^{N} (1 - \lambda_i) f_i(u)^2 \le (1 - \lambda_{N-k+1})\|F(u)\|^2.$$
Therefore, if $\lambda_{N-k+1} > 1$ is large, every vertex $u$ with $F(u) \neq 0$ has at least one neighbor $v_0$ such that $\langle F(u), F(v_0)\rangle < 0$. That is, every such vertex has at least one neighbor far away from it in the sphere distance. This indicates that the aim of a proper clustering in this case should be different. Instead of pursuing subsets with small expansion, we aim at finding $k$ subsets, each of which has a bipartition such that the quantity $1 - \bar\phi$ is small. Roughly speaking, we hope to find $k$ subsets whose induced subgraphs are all close to bipartite ones. Let us explain how real projective spaces come into the picture through the following extreme but inspiring example. Consider a disconnected graph $G$ with two bipartite connected components, with bipartitions $V_1 \cup V_2$ and $V_3 \cup V_4$ respectively. The embedding of its vertices into the sphere $S^1$ via its top two eigenfunctions is shown in Figure 1.
[Figure 1. The graph G and its embedding into the sphere.]
If we use the sphere distance, we get two clusters, e.g. $V_1 \cup V_3$ and $V_2 \cup V_4$. But recall that our purpose is to obtain $V_1 \cup V_2$ and $V_3 \cup V_4$. A solution to this problem is to identify antipodal points of the sphere: then the images of $V_1$ and $V_2$ coincide, as do those of $V_3$ and $V_4$, and we obtain the two clusters $V_1 \cup V_2$ and $V_3 \cup V_4$. Afterwards, we "open" each cluster to get the two pairs of subsets we desire. Therefore, we should use the metric on the real projective space instead of that of the sphere.
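The extreme example above can be reproduced numerically. In this Python sketch (our own construction) the two bipartite components are taken to be 4-cycles, the eigenfunctions for the eigenvalue 2 are the $\pm1$ indicator functions of the two sides of a component, and the projective distance is taken to be $\min\{\|x-y\|, \|x+y\|\}$ for unit vectors, an assumption consistent with identifying antipodal points. The sphere distance separates $V_1$ from $V_2$ maximally, while the projective distance puts them at distance zero.

```python
import math

# Toy instance (our choice): two disjoint unweighted 4-cycles,
# 0-1-2-3-0 with bipartition V1 = {0, 2}, V2 = {1, 3}, and
# 4-5-6-7-4 with bipartition V3 = {4, 6}, V4 = {5, 7}.
N = 8
EDGES = [(0, 1), (1, 2), (2, 3), (3, 0), (4, 5), (5, 6), (6, 7), (7, 4)]
V1, V2, V3, V4 = {0, 2}, {1, 3}, {4, 6}, {5, 7}

NBRS = {u: [] for u in range(N)}
for a, b in EDGES:
    NBRS[a].append(b)
    NBRS[b].append(a)


def laplacian(f):
    """Normalized Laplacian with unit weights: f(u) minus the mean of f over neighbours."""
    return [f[u] - sum(f[v] for v in NBRS[u]) / len(NBRS[u]) for u in range(N)]


# +1 / -1 on the two sides of one bipartite component: eigenvalue 2.
f_a = [1 if u in V1 else -1 if u in V2 else 0 for u in range(N)]
f_b = [1 if u in V3 else -1 if u in V4 else 0 for u in range(N)]


def unit(x):
    r = math.hypot(*x)
    return tuple(c / r for c in x)


# Embedding into the circle S^1 via the top two eigenfunctions.
F = [unit((f_a[u], f_b[u])) for u in range(N)]


def sphere_dist(x, y):
    return math.dist(x, y)


def proj_dist(x, y):
    """Rough projective metric: antipodal points x and -x are identified."""
    return min(math.dist(x, y), math.dist(x, tuple(-c for c in y)))


for u, v in [(0, 1), (0, 4), (1, 5)]:
    print(u, v, sphere_dist(F[u], F[v]), proj_dist(F[u], F[v]))
```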
To understand the above clustering via the top $k$ eigenfunctions more intuitively, we can think of the edges in $E$ as "hostile" relations: vertices are clustered because they share common enemies. In contrast, the traditional clustering via the bottom $k$ eigenfunctions treats edges as "friendly" relations. We anticipate applications of this kind of hostile spectral clustering in practical fields, e.g. the study of social relationship networks. This hostile clustering is technically crucial for our proof of Theorem 1.2, as discussed in Lemmas 5.2 and 5.3.

1.3. Organization of the paper. In Section 2 we collect necessary results from spectral graph theory and the theory of random partitions of doubling metric spaces. Section 3 is devoted to various relations between $h(k)$ and $\bar h(k)$. We discuss lower bound estimates of $\lambda_{N-k+1}$ in Section 4. In Sections 5 and 6 we present the proof of the lower bound estimate of $2 - \lambda_{N-k+1}$ in (11). In Section 7, based on the results of [14], we prove for cycles a slightly "shifted" version of the higher-order dual Cheeger inequalities with absolute constants independent of $k$. We also calculate the example of unweighted cycles in detail. In Section 8, we explore an application of higher-order dual Cheeger inequalities to the essential spectrum of a general reversible Markov operator.
We comment that the results about weighted graphs in this paper, except Proposition 3.5, can be extended to graphs permitting self-loops, or in the language of Markov chains, lazy random walks. One only needs to take care of the fact that $\mu(u) \ge \sum_{v: v\sim u, v\neq u} w_{uv}$ (see below for $\mu$) in that case.

Preliminaries
2.1. Spectral theory for the normalized graph Laplacian. We equip $V$ with the natural measure $\mu$ given by $\mu(u) = d_u$ for every $u \in V$. The inner product of two functions $f, g: V \to \mathbb R$ is given by
$$\langle f, g\rangle_\mu := \sum_{u\in V} f(u)g(u)\mu(u).$$
We denote by $l^2(V, \mu)$ the Hilbert space of functions on $V$ with the above inner product.
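The normalized graph Laplacian $\Delta$ is defined as follows: for any $f \in l^2(V, \mu)$ and $u \in V$,
$$\Delta f(u) := \frac{1}{\mu(u)}\sum_{v: v\sim u} w_{uv}\big(f(u) - f(v)\big).$$
In matrix form, $\Delta = I - P$, where $I$ is the identity matrix and $P: l^2(V,\mu) \to l^2(V,\mu)$ is given by $Pf(u) := \frac{1}{\mu(u)}\sum_{v: v\sim u} w_{uv} f(v)$. For a map $F: V \to \mathbb R^k$ that does not vanish identically, we consider its Rayleigh quotient
$$R(F) := \frac{\sum_{(u,v)\in E} w_{uv}\|F(u) - F(v)\|^2}{\sum_{u\in V}\mu(u)\|F(u)\|^2}$$
and a dual version of the Rayleigh quotient of $F$,
$$\bar R(F) := \frac{\sum_{(u,v)\in E} w_{uv}\|F(u) + F(v)\|^2}{\sum_{u\in V}\mu(u)\|F(u)\|^2},$$
where each edge is counted once in the numerators. The support of a map $F$ is defined as $\mathrm{supp}(F) := \{u \in V: F(u) \neq 0\}$. We call $\lambda$ an eigenvalue of $\Delta$ if there exists some $f \not\equiv 0$ with $\Delta f = \lambda f$. Let $0 = \lambda_1 \le \lambda_2 \le \cdots \le \lambda_N$ be all the eigenvalues of $\Delta$. The Courant-Fischer-Weyl min-max principle tells us
$$\lambda_k = \min_{H_k}\max_{f\in H_k\setminus\{0\}} R(f),$$
and dually
$$\lambda_{N-k+1} = \max_{H_k}\min_{f\in H_k\setminus\{0\}} R(f), \qquad (17)$$
where $H_k$ ranges over all $k$-dimensional subspaces of $l^2(V, \mu)$. The next lemma can be found in Bauer-Jost [4] (Lemma 3.1 there). For its various variants, see e.g. [9], [25].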
The next lemma is basically contained in the proof of Theorem 3.2 in Bauer-Jost [4].
We remark that here we do not require each of $V_1, V_2$ to be non-empty, but only their union. This lemma is derived from the combination of Lemma 2.1 and a construction in Bauer-Jost [4] (following previous ideas of Desai-Rao [15]). For convenience, we briefly recall the proof here.
from the original graph $G$ in the following way. Duplicate all the vertices in $P(f)$ and $N(f)$, and denote by $u'$ the new vertex duplicated from $u$. Replace every edge $(u, v)$ with $u, v \in P(f)$ or $u, v \in N(f)$ by the two new edges $(u, v')$ and $(v, u')$, with the same weight $w_{uv'} = w_{vu'} = w_{uv}$. All the other vertices, edges and weights are unchanged.
Consider the function $g : V \to \mathbb R$, Then the above construction converts the inside edges of $P(f)$, $N(f)$ into boundary edges of $\mathrm{supp}(g)$. Furthermore, one can check where for the last equality we used This completes the proof of the lemma.
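The two Rayleigh quotients are tied together by the identity $\|a-b\|^2 + \|a+b\|^2 = 2\|a\|^2 + 2\|b\|^2$, which gives $R(f) + \bar R(f) = 2$ whenever each edge is counted once in the numerators (the convention we assume here); this is why the top of the spectrum of $\Delta$ corresponds to the bottom of the spectrum of $I + P$. The Python sketch below (hypothetical helper names, randomly chosen weights) checks the identity numerically and verifies that $u \mapsto \cos(2\pi ju/N)$ is an eigenfunction of $\Delta$ on the unweighted cycle.

```python
import math
import random


def degrees(edges, n):
    d = [0.0] * n
    for (u, v), w in edges.items():
        d[u] += w
        d[v] += w
    return d


def laplacian(edges, d, f):
    """(Delta f)(u) = f(u) - (1 / mu(u)) * sum_v w_uv f(v), with mu(u) = d_u."""
    pf = [0.0] * len(d)
    for (u, v), w in edges.items():
        pf[u] += w * f[v]
        pf[v] += w * f[u]
    return [f[u] - pf[u] / d[u] for u in range(len(d))]


def rayleigh(edges, d, f, dual=False):
    """R(f), or the dual quotient R_bar(f) if dual=True; each edge counted once."""
    s = 1 if dual else -1
    num = sum(w * (f[u] + s * f[v]) ** 2 for (u, v), w in edges.items())
    den = sum(d[u] * f[u] ** 2 for u in range(len(d)))
    return num / den


# A small weighted graph (our choice): a 6-cycle plus two chords, random weights.
rng = random.Random(0)
n = 6
G = {tuple(sorted((i, (i + 1) % n))): rng.uniform(0.5, 2.0) for i in range(n)}
G[(0, 3)] = rng.uniform(0.5, 2.0)
G[(1, 4)] = rng.uniform(0.5, 2.0)
dG = degrees(G, n)
f = [rng.uniform(-1.0, 1.0) for _ in range(n)]
print(rayleigh(G, dG, f) + rayleigh(G, dG, f, dual=True))  # 2 up to rounding

# On the unweighted cycle C_N, u -> cos(2*pi*j*u/N) is an eigenfunction
# with eigenvalue 1 - cos(2*pi*j/N).
N, j = 7, 3
C = {tuple(sorted((i, (i + 1) % N))): 1.0 for i in range(N)}
dC = degrees(C, N)
g = [math.cos(2 * math.pi * j * u / N) for u in range(N)]
lam = 1 - math.cos(2 * math.pi * j / N)
```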

2.2. Padded random partitions of doubling metric spaces. Random partition theory of metric spaces was first developed in theoretical computer science. It has found plenty of important applications in pure mathematics, see e.g. [26], [22], [25]. In this section we discuss one such result that is needed in our later arguments. We first introduce the concept of doubling metric spaces. There are two kinds of doubling properties: metric doubling and measure doubling.
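The metric doubling constant $\rho_X$ of a metric space $(X, d)$ is defined as the smallest number $m$ such that every ball in $X$ can be covered by at most $m$ balls of half the radius, and $\dim_d(X) := \log_2\rho_X$ is called the metric doubling dimension. A measure $\mu$ on $X$ is called a doubling measure if there exists a number $C_\mu$ such that for any $x \in X$, $r > 0$,
$$\mu(B(x, 2r)) \le C_\mu\,\mu(B(x, r)).$$
Similarly, we call $\dim_\mu := \log_2 C_\mu$ the measure doubling dimension. Note that the measure doubling dimension of $\mathbb R^k$ with the standard Euclidean volume measure is exactly $k$.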
The two doubling dimensions are related by the following result (see e.g. the Remark on p. 67 of [13]).
A partition of a metric space $(X, d)$ is a collection of subsets $S_1, \ldots, S_m$ for some number $m$, where $S_i \cap S_j = \emptyset$ for any $i \neq j$ and $X = \bigcup_{i=1}^m S_i$. A partition can also be considered as a map $P: X \to 2^X$ that sends each $x \in X$ to the unique subset $P(x)$ that contains $x$. A random partition $\mathcal P$ of $X$ is a distribution $\nu$ over the space of partitions of $X$. The following padded random partition theorem is a slight modification of Theorem 3.2 in Gupta-Krauthgamer-Lee [20] (see also Lemma 3.11 in [26]).
Theorem 2.4. Let (X, d) be a finite metric subspace of (Y, d). Then for every r > 0, δ ∈ (0, 1) there exists a random partition P, i.e. a distribution ν over all possible partitions of X, such that • diam(S) ≤ r, for any S in every partition P in the support of ν; The random partition obtained in the above theorem is called a (r, α, 1 − δ)-padded random partition in [25].
Proof. We refer the reader to [20] for the proof of this theorem. But we comment here that one can replace $\dim_d(X)$ in Theorem 3.2 of [20] by $\dim_d(Y)$, as we do in the conclusion here. The reason is that the only point where the metric doubling dimension plays a role in the proof is the following fact (this is clearer in [26]). Let $Z \subseteq X$ be a subset in which each pair of distinct points has distance at least some $\epsilon > 0$. Then the cardinality of $Z \cap B(x, t)$ is bounded in terms of the doubling constant and $t/\epsilon$, for any $x \in X$ and radius $t \ge \epsilon$. Clearly, one can count this cardinality in the outer space $Y$.
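Theorem 2.4 is used as a black box. To give a feel for what a random partition with bounded cluster diameter looks like, here is a Python sketch of a simple ball-carving scheme on a finite point set. This is a standard illustrative construction chosen by us, not the construction of [20], and it makes no claim about the padding parameter $\alpha$; the hypothetical function `padded_fraction` merely estimates empirically how often $B(x, r/\alpha) \subseteq \mathcal P(x)$ holds for a chosen $\alpha$.

```python
import math
import random


def ball_carving(points, dist, r, rng):
    """Sample one random partition whose clusters have diameter at most r.

    Illustrative ball carving (not the construction of [20]): pick R
    uniformly in [r/4, r/2] and a random order of centres; every point
    joins the first centre within distance R (itself, at worst).
    """
    R = rng.uniform(r / 4, r / 2)
    centres = list(points)
    rng.shuffle(centres)
    clusters = {}
    for x in points:
        c = next(c for c in centres if dist(x, c) <= R)
        clusters.setdefault(c, []).append(x)
    return list(clusters.values())


def padded_fraction(points, dist, r, alpha, trials, rng):
    """Empirical frequency of the event that B(x, r/alpha) lies inside P(x)."""
    hits = 0
    for _ in range(trials):
        part = ball_carving(points, dist, r, rng)
        owner = {x: i for i, S in enumerate(part) for x in S}
        for x in points:
            ball = [y for y in points if dist(x, y) <= r / alpha]
            hits += all(owner[y] == owner[x] for y in ball)
    return hits / (trials * len(points))


rng = random.Random(1)
pts = [(rng.random(), rng.random()) for _ in range(60)]
r = 0.3
P = ball_carving(pts, math.dist, r, rng)
print(len(P), padded_fraction(pts, math.dist, r, alpha=8, trials=20, rng=rng))
```

Each cluster lies within distance $R \le r/2$ of its centre, so its diameter is at most $r$ by the triangle inequality.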

Relations between $h(k)$ and its dual $\bar h(k)$
In this section, we explore some interesting relations between the multiway Cheeger and dual Cheeger constants. The following two propositions can be considered as strong extensions of Theorem 3.1 and Proposition 3.1 in Bauer-Jost [4].
Moreover, we have (i) $\bar h(k) = 1$ if and only if $G$ has at least $k$ connected components, each of which is bipartite. Proof. By definition of $h(k)$, $\bar h(k)$ and the formula (20), we have Observe further that in the above calculation, equality in (23) is achieved when the graph $G$ is bipartite, since then for each $S_i$ we can always find a bipartition This proves (ii).
Hence for each $i$, Then recalling (20), we get are $k$ connected components, each of which is bipartite. Conversely, if $G$ has $k$ connected components, each of which is bipartite, we can choose the $k$-sub-bipartition to be the bipartitions of those $k$ components. Then by definition, we have Together with (22), we know $\bar h(k) = 1$. For (iii), let $V_{2i_0-1} \cup V_{2i_0}$ attain the maximum in the above; we have Remark 3.2. Property (ii) above shows nicely the duality between $h(k)$ and $\bar h(k)$. Recalling that an equivalent characterization of bipartiteness is that whenever $\lambda$ is an eigenvalue, so is $2 - \lambda$ (see e.g. Lemma 1.8 in [9]), and employing property (i) above, we conclude We note that the fact $h(k) + \bar h(k) = 1$ for only certain $k$ does not imply that $G$ is bipartite. One can think of the trivial case: for any graph with $N$ vertices and $k > N/2$, we have $h(k) = 1$ and $\bar h(k) = 0$, i.e. $h(k) + \bar h(k) = 1$. We also have the following example.
Example 3.3. Consider the unweighted (i.e. every edge has weight 1) complete graph $K_{2n}$ with $2n$ vertices. By considering $n$ disjoint edges, it is not hard to check that
$$h(n) = \frac{2n-2}{2n-1}, \qquad \bar h(n) = \frac{1}{2n-1}.$$
Therefore we have $h(n) + \bar h(n) = 1$.
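Example 3.3 is small enough to verify exhaustively for $n = 2$. The Python sketch below (our own, with hypothetical function names) brute-forces $h(2)$ and $\bar h(2)$ on $K_4$, and evaluates the candidate optimizers (one edge as $S_i$, and the pair $(\{u\}, \{v\})$) on $K_{2n}$ for a few $n$ to confirm the closed forms $h(n) = \frac{2n-2}{2n-1}$ and $\bar h(n) = \frac{1}{2n-1}$.

```python
from fractions import Fraction
from itertools import product


def complete_graph(m):
    """Unweighted complete graph K_m as {(u, v): 1} with u < v."""
    return {(u, v): 1 for u in range(m) for v in range(u + 1, m)}


def degrees(edges, m):
    d = [0] * m
    for (u, v), w in edges.items():
        d[u] += w
        d[v] += w
    return d


def phi(edges, d, S):
    boundary = sum(w for (u, v), w in edges.items() if (u in S) != (v in S))
    return Fraction(boundary, sum(d[u] for u in S))


def phi_bar(edges, d, A, B):
    between = sum(w for (u, v), w in edges.items()
                  if (u in A and v in B) or (u in B and v in A))
    return Fraction(2 * between, sum(d[u] for u in A | B))


def h(edges, m, k):
    """min over k-subpartitions of the largest expansion (brute force)."""
    d, best = degrees(edges, m), None
    for lab in product(range(k + 1), repeat=m):
        parts = [{u for u in range(m) if lab[u] == i} for i in range(1, k + 1)]
        if all(parts):
            val = max(phi(edges, d, S) for S in parts)
            best = val if best is None else min(best, val)
    return best


def h_bar(edges, m, k):
    """max over k-sub-bipartitions of the smallest dual expansion (brute force)."""
    d, best = degrees(edges, m), None
    for lab in product(range(2 * k + 1), repeat=m):
        pairs = [({u for u in range(m) if lab[u] == 2 * i - 1},
                  {u for u in range(m) if lab[u] == 2 * i}) for i in range(1, k + 1)]
        if all(A | B for A, B in pairs):
            val = min(phi_bar(edges, d, A, B) for A, B in pairs)
            best = val if best is None else max(best, val)
    return best


# Brute force on K_4 (n = 2).
K4 = complete_graph(4)
print(h(K4, 4, 2), h_bar(K4, 4, 2))

# Candidate optimizers on K_{2n}: the edge {0, 1} and the pair ({0}, {1}).
for n in range(2, 6):
    K = complete_graph(2 * n)
    d = degrees(K, 2 * n)
    print(n, phi(K, d, {0, 1}), phi_bar(K, d, {0}, {1}))
```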
In fact, even when $h(k) + \bar h(k) = 1$ for all $2 \le k \le N$, the graph can still be non-bipartite. One example is an odd cycle (see Proposition 7.2). However, such a graph is already very close to being bipartite.
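The inequality $h(k) + \bar h(k) \le 1$ ultimately rests on an elementary identity for a bipartition $S = V_1 \cup V_2$: writing $|E(A, B)|$ for the sum of $w_{uv}$ over ordered pairs $u \in A$, $v \in B$, one has $\mathrm{vol}(S) = |E(S, \bar S)| + 2|E(V_1, V_2)| + |E(V_1, V_1)| + |E(V_2, V_2)|$, hence $\phi(S) + \bar\phi(V_1, V_2) = 1 - \frac{|E(V_1,V_1)| + |E(V_2,V_2)|}{\mathrm{vol}(S)} \le 1$. We state this identity ourselves and do not claim it is verbatim the paper's formula (20). The Python sketch below checks it exhaustively, with exact fractions, on a small random weighted graph of our choosing.

```python
import random
from fractions import Fraction
from itertools import product


def e(edges, A, B):
    """|E(A, B)|: sum of w_uv over ordered pairs (u, v) with u in A, v in B."""
    total = 0
    for (u, v), w in edges.items():
        if u in A and v in B:
            total += w
        if v in A and u in B:
            total += w
    return total


def check_identity(edges, n):
    """Check phi(S) + phi_bar(V1, V2) = 1 - (|E(V1,V1)| + |E(V2,V2)|)/vol(S) <= 1."""
    everything = set(range(n))
    d = [e(edges, {u}, everything) for u in range(n)]  # degrees
    for labels in product(range(3), repeat=n):  # 0: outside S, 1: V1, 2: V2
        V1 = {u for u in range(n) if labels[u] == 1}
        V2 = {u for u in range(n) if labels[u] == 2}
        S = V1 | V2
        vol = sum(d[u] for u in S)
        if vol == 0:
            continue
        phi = Fraction(e(edges, S, everything - S), vol)
        phi_bar = Fraction(2 * e(edges, V1, V2), vol)
        inside = Fraction(e(edges, V1, V1) + e(edges, V2, V2), vol)
        if phi + phi_bar != 1 - inside or phi + phi_bar > 1:
            return False
    return True


rng = random.Random(2)
n = 6
edges = {(u, v): rng.randint(1, 5)
         for u in range(n) for v in range(u + 1, n) if rng.random() < 0.6}
print(check_identity(edges, n))
```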
By (24), it is immediate to see that for bipartite graphs the classical Cheeger inequality (5) and Lee-Gharan-Trevisan's Theorem 1.1 are equivalent to the following dual estimates.
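Corollary 3.4. Let $G$ be any bipartite graph. Then
$$\frac{(1 - \bar h(2))^2}{2} \le 2 - \lambda_{N-1} \le 2(1 - \bar h(2)),$$
and, for each natural number $1 \le k \le N$,
$$\frac{(1 - \bar h(k))^2}{C^2k^4} \le 2 - \lambda_{N-k+1} \le 2(1 - \bar h(k)).$$
It is interesting to note that the dual fact of Bauer-Jost's dual Cheeger inequality (9) is that $\lambda_1 = 0$.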
To prove this proposition, we need the following lemma due to Bauer-Jost [4] (see also Theorem 4.2 in [3]).
Proof of Proposition 3.5. For any $k$-subpartition $S_1, S_2, \ldots, S_k$ of $V$, by Lemma 3.6, for each $1 \le i \le k$ we have a partition $S_i = V_{2i-1} \cup V_{2i}$ such that By definition, we know Combining (20) and (27), we arrive at Therefore, we obtain Since $S_1, S_2, \ldots, S_k$ are chosen arbitrarily, we prove

Lower bound estimate of $\lambda_{N-k+1}$
In this section, we prove the lower bound estimate of $\lambda_{N-k+1}$. For any . Then for every $k \in \mathbb N$, we define a new constant which is no smaller than $\bar h(k)$, .
We prove the following result.
We note that the right hand side of (11) is a quick corollary of this result.
By construction, every $f_i$ is not identically $0$. Then (17) tells us where the maximum is taken over all collections of $k$ constants, at least one of which is non-zero. It is straightforward to see For the numerator of the quotient in (29), we have Then we obtain
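In the simplest case $k = 1$, the mechanism behind the upper bound $2 - \lambda_N \le 2(1 - \bar h(1))$ can be seen directly: since $\bar R = 2 - R$, the quantity $2 - \lambda_N$ is the minimum of $\bar R(f)$ over nonzero $f$, and a short computation shows that $f = \mathbf 1_{V_1} - \mathbf 1_{V_2}$ satisfies $\bar R(f) \le 2(1 - \bar\phi(V_1, V_2))$ (edges inside $V_1$ or $V_2$ contribute $4w_{uv}$, edges leaving $V_1 \cup V_2$ contribute $w_{uv}$, and edges between $V_1$ and $V_2$ contribute $0$). The Python sketch below, our own illustration rather than the proof above (which treats general $k$), checks this inequality exhaustively over all pairs $(V_1, V_2)$ on small graphs; the function name `check_test_functions` is hypothetical.

```python
from fractions import Fraction
from itertools import product


def check_test_functions(edges, n):
    """For every pair (A, B), check R_bar(1_A - 1_B) <= 2 * (1 - phi_bar(A, B))."""
    d = [0] * n
    for (u, v), w in edges.items():
        d[u] += w
        d[v] += w
    for labels in product(range(3), repeat=n):  # 0: outside, 1: A, 2: B
        f = [1 if x == 1 else -1 if x == 2 else 0 for x in labels]
        vol = sum(d[u] for u in range(n) if f[u] != 0)  # = sum_u mu(u) f(u)^2
        if vol == 0:
            continue
        r_bar = Fraction(sum(w * (f[u] + f[v]) ** 2 for (u, v), w in edges.items()), vol)
        cross = sum(w for (u, v), w in edges.items() if f[u] * f[v] == -1)
        phi_bar = Fraction(2 * cross, vol)
        if r_bar > 2 * (1 - phi_bar):
            return False
    return True


# A small weighted graph of our choosing: a 6-cycle with two chords.
G = {(0, 1): 3, (1, 2): 1, (2, 3): 2, (3, 4): 1, (4, 5): 4, (0, 5): 2, (0, 3): 1, (2, 5): 2}
print(check_test_functions(G, 6))
```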

The metric for clustering via top k eigenfunctions
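In this section, we begin the proof of the lower bound estimate of $2 - \lambda_{N-k+1}$ in (11). Recall that the max-min problem in (17) is solved by the corresponding eigenfunctions. Hence for the top $k$ eigenfunctions $f_{N-k+1}, \ldots, f_N$, we have
$$2 - \lambda_{N-k+1} = \max_{f \in \mathrm{span}\{f_{N-k+1}, \ldots, f_N\}\setminus\{0\}} \bar R(f).$$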
A direct calculation then gives
$$\bar R(F) \le 2 - \lambda_{N-k+1}, \qquad (31)$$
where $F$ is the map from $V$ to $\mathbb R^k$ defined by $F(u) := (f_{N-k+1}(u), \ldots, f_N(u))$, i.e. in the way of (12). One can also obtain (31) by applying the min-max principle directly to the operator $I + P$. Following the route of [25] for dealing with the Rayleigh quotient of the bottom $k$ eigenfunctions, we will localize $F$ to $k$ disjointly supported maps $\{\Psi_i\}_{i=1}^k$ for which $\bar R(\Psi_i)$ can be controlled from above by $\bar R(F)$. Afterwards, we apply Lemma 2.2 to handle each $\bar R(\Psi_i)$ further. More explicitly, our requirements for the localization are, for each $i$:
• $\sum_{u\in V}\mu(u)\|\Psi_i(u)\|^2$ can be bounded from below by a certain fraction of $\sum_{u\in V}\mu(u)\|F(u)\|^2$;
• the numerator of $\bar R(\Psi_i)$ can be bounded from above in terms of that of $\bar R(F)$.
The first one will be realized by the theory of random partitions on doubling metric spaces combined with the crucial Lemma 5.2 below. The second one is solved by Lemma 5.3 below. Before all those arguments, we need to introduce our new metric.
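Inequality (31) can be checked on an example where the eigenfunctions are explicit. For the unweighted cycle $C_7$ (our choice), the functions $\cos(2\pi ju/N)/\sqrt N$ and $\sin(2\pi ju/N)/\sqrt N$ are $l^2(V,\mu)$-orthonormal eigenfunctions with eigenvalue $1 - \cos(2\pi j/N)$. The Python sketch below builds $F$ from the top three of them and confirms that $\bar R(F)$ equals the average of $2 - \lambda_i$ over those eigenvalues, and hence is at most $2 - \lambda_{N-k+1}$.

```python
import math

# Unweighted cycle C_7: mu(u) = d_u = 2, and for 1 <= j <= 3 the functions
# cos(2*pi*j*u/N) / sqrt(N), sin(2*pi*j*u/N) / sqrt(N) are l^2(mu)-orthonormal
# eigenfunctions of Delta with eigenvalue 1 - cos(2*pi*j/N).
N, k = 7, 3
EDGES = [(u, (u + 1) % N) for u in range(N)]
MU = [2.0] * N


def eigenfunction(j, trig):
    f = [trig(2 * math.pi * j * u / N) / math.sqrt(N) for u in range(N)]
    return f, 1 - math.cos(2 * math.pi * j / N)


# The top three eigenvalues of Delta on C_7 come from j = 3 (twice) and j = 2.
top = [eigenfunction(3, math.cos), eigenfunction(3, math.sin), eigenfunction(2, math.cos)]
fs = [f for f, _ in top]
lams = [lam for _, lam in top]
F = list(zip(*fs))  # F(u) = (f_1(u), f_2(u), f_3(u)), as in (12)


def inner(f, g):
    return sum(MU[u] * f[u] * g[u] for u in range(N))


num = sum(sum((a + b) ** 2 for a, b in zip(F[u], F[v])) for u, v in EDGES)
den = sum(MU[u] * sum(c * c for c in F[u]) for u in range(N))
R_bar_F = num / den
lam_bottom_of_top = min(lams)  # lambda_{N-k+1}
print(R_bar_F, 2 - lam_bottom_of_top)
```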

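5.1. Real projective space with a rough metric. We can use the standard Riemannian metric on real projective spaces inherited from that of spheres via the canonical antipodal projection $S^{k-1} \to P^{k-1}\mathbb R$, $x \mapsto [x]$. But for ease of calculation, we adopt a rough metric: for any $[x], [y] \in P^{k-1}\mathbb R$ with $x, y \in S^{k-1}$,
$$d([x], [y]) := \min\{\|x - y\|, \|x + y\|\},$$
where $\|\cdot\|$ is the Euclidean norm of vectors in $S^{k-1} \subset \mathbb R^k$. It is easy to check that $d$ is a metric on $P^{k-1}\mathbb R$.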
Proof. Let us denote the distance function induced by the standard Riemannian metric on $P^{k-1}\mathbb R$ by $d_{Rie}$, and the Riemannian volume measure by $\mu_{Rie}$. (i) is easy (compare the fact that $\mathrm{diam}(P^{k-1}\mathbb R, d_{Rie}) = \pi/2$). One can further observe Since $(P^{k-1}\mathbb R, d_{Rie})$ has constant sectional curvature 1 (see more geometric properties of projective spaces scattered in [19]), by the comparison theorem, Further recalling Lemma 2.3, this implies
Consider the vertex set $V$ of a graph $G$ and a nontrivial map $F: V \to \mathbb R^k$. We write $V_F := \mathrm{supp}(F)$ for convenience. Then we define a map to the real projective space,
$$\bar F: V_F \to P^{k-1}\mathbb R, \qquad \bar F(u) := \Big[\frac{F(u)}{\|F(u)\|}\Big],$$
where $[x]$ denotes the class of $x \in S^{k-1}$ in $P^{k-1}\mathbb R$. Via the metric $d$ defined above, we get a non-negative symmetric function $d_F(u, v) := d(\bar F(u), \bar F(v))$ on $V_F$ which satisfies the triangle inequality. That is, we obtain a pseudo-metric space $(V_F, d_F)$.
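The rough metric and the induced pseudo-metric $d_F$ are easy to experiment with. The Python sketch below takes $d([x],[y]) = \min\{\|x-y\|, \|x+y\|\}$ for unit representatives (our reading of the definition; the helper names are hypothetical) and checks numerically the metric properties, the bound $\mathrm{diam} \le \sqrt 2$ (since $d([x],[y])^2 = 2 - 2|\langle x, y\rangle|$), and that $d_F$ is only a pseudo-metric: distinct vertices with $F(u) = -F(v)$ are at distance zero.

```python
import math
import random


def normalize(x):
    r = math.sqrt(sum(c * c for c in x))
    return tuple(c / r for c in x)


def d_proj(x, y):
    """Rough metric min(|x - y|, |x + y|) on the classes [x], [y] of unit vectors."""
    minus = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    plus = math.sqrt(sum((a + b) ** 2 for a, b in zip(x, y)))
    return min(minus, plus)


def d_F(F, u, v):
    """Pseudo-metric on supp(F): compare the projective classes of F(u), F(v)."""
    return d_proj(normalize(F[u]), normalize(F[v]))


rng = random.Random(3)
k = 4
pts = [normalize([rng.gauss(0.0, 1.0) for _ in range(k)]) for _ in range(30)]
# For unit vectors d^2 = 2 - 2|<x, y>|, so the diameter is at most sqrt(2).
print(max(d_proj(x, y) for x in pts for y in pts))

# A map F with F(0) = -F(1): two distinct vertices at d_F-distance zero.
F = {0: (1.0, 2.0, 0.0), 1: (-1.0, -2.0, 0.0), 2: (0.0, 0.0, 3.0)}
print(d_F(F, 0, 1), d_F(F, 0, 2))
```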

Spreading lemma.
We prove the following spreading lemma for the new metric, extending Lemma 3.2 in Lee-Gharan-Trevisan [25]. For any map $F: V \to \mathbb R^k$, we call the quantity $\sum_{u\in V}\mu(u)\|F(u)\|^2$ the $l^2$ mass of $F$ on $V$, denoted by $\mathcal E_V$ for short; similarly, $\mathcal E_S := \sum_{u\in S}\mu(u)\|F(u)\|^2$ for $S \subseteq V$. By spreading, we mean that the $l^2$ mass of $F$ is distributed evenly over $V_F$.
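Lemma 5.2. Let $F: V \to \mathbb R^k$ be constructed from $l^2(V, \mu)$-orthonormal functions $f_1, \ldots, f_k$ in the way of (12). If $S \subseteq V$ satisfies $\mathrm{diam}(S \cap V_F, d_F) \le r$ for some $0 < r < 1$, then we have
$$\mathcal E_S \le \frac{\mathcal E_V}{k(1 - r^2)}.$$
The map $F$ is said to be $\big(r, \frac{1}{k(1-r^2)}\big)$-spreading if it satisfies the conclusion of the lemma. This property says that the $l^2$ mass of $F$ on a subset of small diameter cannot be too large; in other words, the $l^2$ mass of $F$ cannot concentrate in a small region.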
Proof. Since $\mathcal E_S = \mathcal E_{S\cap V_F}$, we may suppose w.l.o.g. that $S \subseteq V_F$. As in [25], for a unit vector $x \in \mathbb R^k$, i.e., and we also have Now, for any $u \in S$, we get Note that Therefore we have Inserting this back into (35), we arrive at This proves the lemma.
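The spreading property can be tested numerically. In the following Python sketch (our own; the measure $\mu$ is a random positive weight standing in for the degrees, which is all the orthonormality argument uses), we orthonormalize random functions in $l^2(V,\mu)$ by Gram-Schmidt, check that $\mathcal E_V = k$, and then, for balls $S = \{u : d_F(u, u_0) \le r/2\}$ (so that $\mathrm{diam}(S, d_F) \le r$), compare $\mathcal E_S$ with the bound $\mathcal E_V / (k(1 - r^2))$ of Lemma 5.2. The helper `worst_ratio` is hypothetical.

```python
import math
import random


def inner(f, g, mu):
    return sum(a * b * m for a, b, m in zip(f, g, mu))


def gram_schmidt(funcs, mu):
    """Orthonormalize functions in l^2(V, mu) (modified Gram-Schmidt)."""
    basis = []
    for f in funcs:
        g = list(f)
        for b in basis:
            c = inner(g, b, mu)
            g = [x - c * y for x, y in zip(g, b)]
        norm = math.sqrt(inner(g, g, mu))
        basis.append([x / norm for x in g])
    return basis


def d_proj(x, y):
    """Rough projective distance between the directions of nonzero x, y."""
    nx = math.sqrt(sum(c * c for c in x))
    ny = math.sqrt(sum(c * c for c in y))
    x = [c / nx for c in x]
    y = [c / ny for c in y]
    return min(math.dist(x, y), math.dist(x, [-c for c in y]))


rng = random.Random(4)
n, k = 20, 3
mu = [rng.uniform(1.0, 4.0) for _ in range(n)]  # stands in for the degrees d_u
fs = gram_schmidt([[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(k)], mu)
F = [tuple(f[u] for f in fs) for u in range(n)]
mass = [mu[u] * sum(c * c for c in F[u]) for u in range(n)]
E_V = sum(mass)  # equals k for orthonormal f_1, ..., f_k


def worst_ratio(r):
    """Max over balls S = B(u0, r/2) of E_S divided by E_V / (k (1 - r^2))."""
    bound = E_V / (k * (1 - r * r))
    return max(sum(mass[u] for u in range(n) if d_proj(F[u], F[u0]) <= r / 2) / bound
               for u0 in range(n))


for r in (0.2, 0.5, 0.8):
    print(r, worst_ratio(r))  # Lemma 5.2 predicts values <= 1
```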

Localization lemma.
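Let $F$ be a map from $V$ to $\mathbb R^k$. For any subset $S \subseteq V_F$, we define its $\epsilon$-neighborhood with respect to the metric $d_F$ as
$$N_\epsilon(S, d_F) := \{v \in V_F : d_F(v, S) \le \epsilon\}.$$
Now, for any given subset $S \subseteq V$, we define a cut-off function
$$\theta(u) := \begin{cases}\max\{0, 1 - d_F(u, S\cap V_F)/\epsilon\}, & u \in V_F,\\ 0, & u \notin V_F.\end{cases}$$
Then we can localize $F$ to $\Psi := \theta F$. It is obvious that $\Psi|_S = F|_S$ and $\mathrm{supp}(\Psi) \subseteq N_\epsilon(S \cap V_F, d_F)$. We extend Lemma 3.3 in [25] to our new metric in the following localization lemma.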
Lemma 5.3. Given $0 < \epsilon < 2$, let $\Psi$ be the localization of $F$ via $\theta$ as above.
Then for any e = (u, v) ∈ E, we have Proof. First observe that if F (u) = F (v) = 0, (38) is trivial. If only one of F (u), F (v) vanishes, (38) is implied by the fact that |θ| ≤ 1. Therefore we only need to consider the case that both u, v ∈ V F in the following.
In conclusion, we have Recalling (39) and the fact |θ| ≤ 1, we complete the proof of the lemma.
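For illustration, here is a Python sketch of the localization step, assuming (as in Lemma 3.3 of [25]) the cut-off $\theta(u) = \max\{0, 1 - d_F(u, S \cap V_F)/\epsilon\}$ on $V_F$ and $\theta = 0$ off $V_F$; if the paper's cut-off differs, only the hypothetical function `localize` needs changing. It checks the two properties stated before Lemma 5.3: $\Psi|_S = F|_S$, and $\Psi$ vanishes outside the $\epsilon$-neighborhood of $S \cap V_F$.

```python
import math
import random


def d_proj(x, y):
    """Rough projective distance between the directions of nonzero x, y."""
    nx = math.sqrt(sum(c * c for c in x))
    ny = math.sqrt(sum(c * c for c in y))
    x = [c / nx for c in x]
    y = [c / ny for c in y]
    return min(math.dist(x, y), math.dist(x, [-c for c in y]))


def localize(F, S, eps):
    """Return (theta, Psi) with theta(u) = max(0, 1 - d_F(u, S n V_F) / eps) on V_F."""
    VF = [u for u in range(len(F)) if any(F[u])]
    core = [u for u in S if any(F[u])]  # S intersected with V_F, assumed non-empty
    theta = [0.0] * len(F)
    for u in VF:
        dist_to_core = min(d_proj(F[u], F[v]) for v in core)
        theta[u] = max(0.0, 1.0 - dist_to_core / eps)
    psi = [tuple(theta[u] * c for c in F[u]) for u in range(len(F))]
    return theta, psi


rng = random.Random(5)
n, k, eps = 15, 3, 0.5
F = [tuple(rng.gauss(0.0, 1.0) for _ in range(k)) if u % 5 else (0.0,) * k
     for u in range(n)]  # vertices 0, 5, 10 lie outside supp(F)
S = {0, 1, 2, 3}
theta, psi = localize(F, S, eps)
print([round(t, 3) for t in theta])
```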
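6. Finding a k-sub-bipartition with small $1 - \bar\phi$
In this section, we prove the following theorem.
Theorem 6.1. For a map $F: V \to \mathbb R^k$ constructed from $l^2(V, \mu)$-orthonormal functions $f_1, f_2, \ldots, f_k$ in the way of (12), there exists a $k$-sub-bipartition $\{(V_{2i-1}, V_{2i})\}_{i=1}^k \in \mathrm{Pair}(k)$ such that
$$\max_{1\le i\le k}\big(1 - \bar\phi(V_{2i-1}, V_{2i})\big) \le Ck^3\sqrt{\bar R(F)},$$
where $C$ is a universal constant.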
Recalling (31), once we take the $k$ functions above to be the top $k$ orthonormal eigenfunctions, we have $2 - \lambda_{N-k+1} \ge \bar R(F)$. Then by the definition of $\bar h(k)$, (11) follows immediately from this theorem.
We need the following lemma (modified from Lemma 3.5 of [25]) to obtain $k$ disjoint subsets of $(V_F, d_F)$.
Lemma 6.2. Suppose $F$ is $\big(r, \frac{1}{k}(1 + \frac{1}{8k})\big)$-spreading and $(V_F, d_F)$ has an $(r, \alpha, 1 - \frac{1}{4k})$-padded random partition. Then there exist $k$ non-empty, mutually disjoint subsets $T_1, T_2, \ldots, T_k \subseteq V_F$ such that Proof. Let $\mathcal P$ be the $(r, \alpha, 1 - \frac{1}{4k})$-padded random partition in the assumption. Denote by $\mathbb I_{B_{d_F}(v, r/\alpha) \subseteq \mathcal P(v)}$ the indicator of the event $B_{d_F}(v, r/\alpha) \subseteq \mathcal P(v)$. Then we calculate the expectation Therefore there exists at least one partition By the spreading property in the assumption, we know that for every $1 \le i \le m$, We can construct the desired $k$ disjoint subsets from $S_1, S_2, \ldots, S_m$ by the following procedure. If we can find two of them, say $S_i, S_j$, such that then we replace them by $S_i \cup S_j$. Note that this process does not violate (43), since Repeat the above operation until no two of the subsets satisfy (44). When we stop, we obtain subsets $T_1, T_2, \ldots, T_r$ for some number $r$ such that and We are not sure about the lower bound of at most one of those Observe that Recalling (42) and (46), we know $r \ge k$, and if we take we will have This proves the lemma.

Trees and cycles
In this section, we explore the relations between spectra and the $k$-way Cheeger and dual Cheeger constants on trees (i.e. graphs without cycles) and cycles, in particular relations of the following kind. It is proved by Miclo [29] and Daneshgar-Javadi-Miclo [14] that for trees $C_1(k)$ can be taken to be $1/2$, and for cycles it can be That is, for these special classes of graphs, $C_1(k)$ can even be independent of $k$. (Of course the case $k = 1$ is trivial and we list it here just for comparison.) Recalling (24) in Remark 3.2, we can take $C_2(k) = C_1(k)$ to be independent of $k$ for trees and even cycles, since they are all bipartite. If we shift the relative position of eigenvalues and multi-way dual Cheeger constants a little, we can prove the following results for cycles.
with the notation h(0) := h(1) and , if N − k + 1 ≤ N − 2 is even. As commented above, we only need to prove the theorem for odd cycles, for which we need the following observation.
Proof. Observe that any proper subset $S$ of $V$ possesses a bipartition Recalling the proof of Proposition 3.1 (ii), we know $h(k) + \bar h(k) = 1$ for any $2 \le k \le N$.
Proof of Theorem 7.1. Let $C_{N-1}$ be the cycle obtained from $C_N$ by contracting one edge $(u_0, v_0)$. Precisely, we remove $(u_0, v_0)$ from the edge set of $C_N$ and identify the vertices $u_0, v_0$ as one vertex, renamed $\eta$ in $C_{N-1}$.
Let us use a prime to indicate quantities of the new graph $C_{N-1}$. Then we Now, by an interlacing idea of Butler [6], we claim (Note that the interlacing results for the so-called weak coverings in [6] do not apply to our case, since different weights and vertex degrees of the new graph are chosen there. See also the general result, Theorem 4.1, for contracting operations in Horak-Jost [21], which works for unweighted graphs.) Indeed, by (17) we have where the maximum is taken over all possible $(N-k+1)$-dimensional subspaces This satisfies Now, for odd $N$, $C_{N-1}$ is bipartite, and then for $k \ge 2$, By the result (53) of [14] and Proposition 7.2, we prove the theorem for odd $N$ and $k \ge 3$. For $k = 1, 2$, by the dual Cheeger inequality (9), $2 - \lambda_{N-1} \ge 2 - \lambda_N \ge \frac{1}{2}(1 - \bar h(1))^2$. In the following, we consider the special class of unweighted cycles as an example. We will see that the constants in (53) can be improved for unweighted cycles, and that $C_2(k)$ can also be independent of $k$ for unweighted odd cycles.
Proof. For $2 \le k \le N$, since $\bar h(k) = 1 - h(k)$ always holds, we only need to calculate $h(k)$.
be the $k$-subpartition which achieves $h(k)$ of a cycle. Then we can always suppose that $V^* := V \setminus \bigcup_{i=1}^k S_i = \emptyset$ and that every $S_i$ is connected, since otherwise we can construct from them $k$ connected subsets without increasing their expansions, as follows. First suppose $V^*_1$ is a non-empty connected component of $V^*$ that is connected to some $S_{i_1} \in \{S_i\}_{i=1}^k$ by an edge.
Then replace $S_{i_1}$ by $S_{i_1} \cup V^*_1$. Note that in this replacement the boundary measure is unchanged while the volume increases, hence Therefore we can suppose be the one with the minimal expansion; then we have . The same arguments as those for (58) tell us that these replacements do not increase the expansion.
Observe that in an unweighted cycle, for any connected non-empty proper subset $S_i \subseteq V$, we have $\phi(S_i) = \frac{2}{2|S_i|} = \frac{1}{|S_i|}$, where $|S_i|$ represents the number of vertices in $S_i$. Then it is direct to obtain $h(k) = 1/\lfloor N/k\rfloor$ for $2 \le k \le N$. By similar arguments, one can show that when $N$ is odd, It is known that the eigenvalues of an unweighted cycle (see e.g. Example 1.5 in [9]), listed in increasing order, are $\lambda_k = 1 - \cos\frac{2\pi\lfloor k/2\rfloor}{N}$, $1 \le k \le N$. It is then direct to verify
Proposition 7.4. For an unweighted cycle, we have for each $1 \le k \le N$ where , if $k$ is odd,
Proof. Recall the following basic inequalities We only need to consider $k \ge 2$. When $N - k + 1$ is odd, we have This verifies (60) for odd $N - k + 1$ and in fact also (59) for even $k$. When $N - k + 1$ is even, similarly we have Otherwise, $N/2 < k \le N/2 + 1$, and then $h(k) = 1$. Observing that $2 - \lambda_{N-k+1}$ equals $1$ if $N$ is even and equals $1 - \cos\frac{(N-1)\pi}{2N} \ge 1 - \cos\frac{\pi}{3} > \frac{\pi}{9}$ if $N$ is odd, we have verified (60) for even $N - k + 1$. This also verifies (59) for odd $k$.
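The closed forms used in this example can be spot-checked numerically. The Python sketch below (our own; function names are hypothetical) brute-forces $h(k)$ on small unweighted cycles and compares it with $1/\lfloor N/k\rfloor$, which is what the arc argument above gives for $2 \le k \le N$ (we only test $k = 2, 3$ to keep the search small). It also checks the sorted cycle spectrum against $1 - \cos(2\pi\lfloor k/2\rfloor/N)$ and the elementary inequality $1 - \cos\frac{(N-1)\pi}{2N} \ge 1 - \cos\frac{\pi}{3} > \frac{\pi}{9}$ for odd $N$ used in the proof.

```python
import math
from fractions import Fraction
from itertools import product


def cycle_h(N, k):
    """Brute-force k-way Cheeger constant h(k) of the unweighted cycle C_N."""
    edges = [(u, (u + 1) % N) for u in range(N)]
    best = None
    for labels in product(range(k + 1), repeat=N):  # 0: unused, i: in S_i
        parts = [{u for u in range(N) if labels[u] == i} for i in range(1, k + 1)]
        if not all(parts):
            continue
        # every vertex has degree 2, so vol(S) = 2|S|
        worst = max(Fraction(sum((u in S) != (v in S) for u, v in edges), 2 * len(S))
                    for S in parts)
        best = worst if best is None else min(best, worst)
    return best


def cycle_spectrum(N):
    """Eigenvalues 1 - cos(2 pi j / N), j = 0, ..., N - 1, sorted increasingly."""
    return sorted(1 - math.cos(2 * math.pi * j / N) for j in range(N))


for N in (5, 6, 7):
    print(N, [str(cycle_h(N, k)) for k in (2, 3)], [1 / (N // k) for k in (2, 3)])
```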
Remark 7.5. (i) This example shows that unweighted cycles form a class of graphs for which $\lambda_k$ and $h(k)^2$, as well as $2 - \lambda_{N-k+1}$ and $(1 - \bar h(k))^2$, are equivalent up to an absolute constant. In fact, one can easily extend this conclusion to weighted cycles with uniformly bounded weights, at the price that the constant then depends on the uniform weight bounds.
(ii) We emphasize that this example also shows that, for certain classes of graphs, one may expect $C_1$ and $C_2$ to be improved to $1$ for even $k$ and odd $N - k + 1$, respectively. For related discussions about the Cheeger inequality, see Chapter 5 of [32].

Essential spectrum of reversible Markov operators
In this last section, we discuss an application of the higher-order dual Cheeger inequality (11) to characterize the essential spectrum of a general reversible Markov operator, following the spirit of Miclo [30] and F.-Y. Wang [39].
Let us start by extending our notation to this abstract setting. Let $(X, \mathcal F, \mu)$ be a probability space. We define Then Let $P: L^2(X, \mu) \to L^2(X, \mu)$ be a linear operator such that $P\mathbf 1 = \mathbf 1$, and for any $f \in L^2(X, \mu)$, $f \ge 0$ implies $Pf \ge 0$, where $\mathbf 1$ stands for the function that always takes the value one. We will also use $\mathbf 1_S$ for a measurable subset $S \subseteq X$ to represent the characteristic function of $S$. The operator $P$ is then called a Markov operator. We will consider a reversible (alternatively called symmetric) Markov operator $P$ such that $\mu$ is an invariant measure for it. Explicitly, for any $f \in L^2(X, \mu)$, we require Actually, there exists a symmetric measure $J$ on $X \times X$ (see e.g. [39]) such that Then, for any two measurable subsets $A, B$ of $X$, we have where the inner product notation is extended from the previous finite graph setting.
Remark 8.1. The previous weighted finite graph setting fits into this general framework; see [30] for a dictionary. However, in this section we only discuss the case that $L^2(X, \mu)$ is infinite dimensional.
It is known that the spectrum $\sigma(P)$ of the operator $P$ lies in $[-1, 1]$. In the following, we denote the top and bottom of the essential spectrum $\sigma_{\mathrm{ess}}(P)$ of $P$ by $\overline{\lambda}_{\mathrm{ess}}(P) := \sup\sigma_{\mathrm{ess}}(P)$ and $\underline{\lambda}_{\mathrm{ess}}(P) := \inf\sigma_{\mathrm{ess}}(P)$.
In their spirit, we extend the k-way dual Cheeger constant to the present setting as follows.
$$\bar h_X(k) := \sup \min_{1\le i\le k} \frac{2J(A_{2i-1}\times A_{2i})}{\mu(A_{2i-1}\cup A_{2i})},$$
where the supremum is taken over all possible $k$-sub-bipartitions of $X$, precisely, all collections of $k$ pairs of measurable subsets $(A_1, A_2), \ldots, (A_{2k-1}, A_{2k})$, where for any $1 \le p \neq q \le 2k$, $A_p$ and $A_q$ are disjoint, and for each $1 \le i \le k$, $\mu(A_{2i-1} \cup A_{2i}) > 0$. Accordingly, we have $\bar h_X(k) \ge \bar h_X(k+1)$. Then we have the following relations between $h_X(k)$ and $\bar h_X(k)$, extending the previous Proposition 3.1.
Proof. This proposition can be proved in the same way as (22) and (23), bearing in mind the following fact for a partition $A_1 \cup A_2$ of $S \subseteq X$: We prove the following characterization of $\underline{\lambda}_{\mathrm{ess}}(P)$ in terms of the multi-way dual Cheeger constants.
An immediate corollary is the following one.
To prove Theorem 8.4, we need to extend the higher-order dual Cheeger inequalities to the present setting. Recalling the comments below (31), the proper operator to use here is $\bar L := I + P$, which is bounded and self-adjoint. Following [39], we then study $\bar\lambda_k := \sup$ Define $\underline{\lambda}_{\mathrm{ess}}(\bar L) := \inf\sigma_{\mathrm{ess}}(\bar L)$. Then $\bar\lambda_k$ is the $k$-th eigenvalue of $\bar L$ if $\bar\lambda_k < \underline{\lambda}_{\mathrm{ess}}(\bar L)$, and $\bar\lambda_k = \underline{\lambda}_{\mathrm{ess}}(\bar L)$ otherwise (see e.g. [39]). We can now state the following inequalities.
Theorem 8.6. Let $C$ be the same constant as in (11). Then for $k \ge 1$,
$$\frac{1}{C^2k^6}(1 - \bar h_X(k))^2 \le \bar\lambda_k \le 2(1 - \bar h_X(k)). \qquad (67)$$
Proof. The upper bound can be proved by the same technique as Theorem 4.1. One only needs to keep in mind the following fact, for any $f \in L^2(X, \mu)$.
For the lower bound, we refer to the proof of Lemma 2.2 in [39]: we only need to replace the operator $L = I - P$ there by $\bar L$ and use the higher-order dual Cheeger inequalities for the finite discrete structures there. Basically, the approximation procedure only involves the operator $P$. We also recall here the fact that on a graph with $N$ vertices, $\bar\lambda_k$ of the operator $I + P$ equals $2 - \lambda_{N-k+1}$, where $\lambda_{N-k+1}$ is the $(N-k+1)$-th eigenvalue of $\Delta = I - P$.
Proof of Theorem 8.4. The proof can be done in the same way as in [39]. For completeness, we recall it here. First observe that $\underline{\lambda}_{\mathrm{ess}}(P) > -1 \iff \underline{\lambda}_{\mathrm{ess}}(\bar L) > 0$.
Remark 8.7. Observe that for this application, the order of k in (67) is not important. But we do need the constant in (11) to be universal for any weighted finite graph to derive (67) via the approximation procedure in [29], [39].