Multi-Frequency Joint Community Detection and Phase Synchronization

This paper studies the joint community detection and phase synchronization problem on the stochastic block model with relative phase, where each node is associated with an unknown phase angle. This problem, with a variety of real-world applications, aims to recover the cluster structure and associated phase angles simultaneously. We show this problem exhibits a “multi-frequency” structure by closely examining its maximum likelihood estimation (MLE) formulation, whereas existing methods are not originated from this perspective. To this end, two simple yet efficient algorithms that leverage the MLE formulation and benefit from the information across multiple frequencies are proposed. The former is a spectral method based on the novel multi-frequency column-pivoted QR factorization. The factorization applied to the top eigenvectors of the observation matrix provides key information about the cluster structure and associated phase angles. The second approach is an iterative multi-frequency generalized power method, where each iteration updates the estimation in a matrix-multiplication-then-projection manner. Numerical experiments show that our proposed algorithms significantly improve the ability of exactly recovering the cluster structure and the accuracy of the estimated phase angles, compared to state-of-the-art algorithms.

Community detection on SBM.Consider the symmetric SBM with N nodes that fall into M underlying clusters of equal size s = N /M.SBM generates a random graph G such that each pair of nodes (i, j) are connected independently with probability p if (i, j) belong to the same cluster, and with probability q otherwise.The goal is to recover underlying cluster structure of nodes, given the adjacency matrix A SBM ∈ {0, 1} N ×N of the observed graph G.During the past decade, significant progress has been made on the information-theoretic threshold of the exact recovery on SBM [1], [10], [11], in the regime where p = α log N /N, q = β log N /N, and √ α − √ β > √ M .The maximum likelihood estimation (MLE) formulation of community detection on SBM is capable of achieving the exact recovery in the above regime, where H := {H ∈ {0, 1} N ×M : H1 M = 1 N , H 1 N = s1 M } is the feasible set.However, the MLE (1) is nonconvex and NP-hard in the worst case.Therefore, different approaches based on MLE (1) or other formulations are proposed to tackle this problem, such as spectral method [12]- [19], semidefinite programming (SDP) [10], [20]- [27], and belief propagation [11], [28].
Phase synchronization.The phase synchronization problem concerns recovering phase angles θ 1 , . . ., θ N in [0, 2π) from a subset of possibly noisy phase transitions θ ij := (θ i − θ j ) mod 2π.The phase synchronization problem can be encoded into an observation graph G, where each phase angle is associated with a node i and the phase transitions are observed between θ i and θ j if and only if there is an edge in G connecting the pair of nodes (i, j).Under the random corruption model [2], [29], observations constitute a Hermitian matrix whose (i, j)th entry for any i < j satisfies, A Ph,ij = e ι(θi−θj ) , with probability r ∈ [0, 1), u ij ∼ Unif(U (1)), with probability 1 − r, where ι = √ −1 is the imaginary unit, and U (1) is unitary group of dimension 1.The most common formulation of the phase synchronization problem is through the following nonconvex optimization program where C N 1 is the Cartesian product of N copies of U (1).Again, similar to SBM, solving (2) is non-convex and NPhard [30].Many algorithms have been proposed for practical and approximate solutions of (2), including spectral and SDP relaxations [2], [31]- [34], and generalized power method (GPM) [35]- [37].Besides, [38]- [40] consider the phase synchronization problem in multiple frequency channels, which in general outperforms the formulation (2).
Recently, an increasing interest [41]- [43] has been seen in the joint community detection and phase (or group) synchronization problem (joint estimation problem, for brevity).As illustrated in Figure 1, the joint estimation problem assumes data points associated with phase angles (or group elements) in a network fall into M underlying clusters, and aims to simultaneously recover the cluster structure and associated phase angles (or group elements).The joint estimation problem is motivated by the 2D class averaging procedure in cryo-electron microscopy single particle reconstruction [7], [8], [44], which aims to cluster 2D projection images taken from similar viewing directions, align (U (1) or SO(2) synchronization due to the Fig. 1: Illustration of the joint estimation problem on a network with two clusters of equal size.Each node is associated with a phase angle.Each pair of nodes within the same cluster (resp.across clusters) are independently connected with probability p (resp.q) as shown in solid (resp.dash) lines.Also, a phase transition θ ij = θ i − θ j (resp.θ ij is noise) is observed on each edge (i, j) within each cluster (resp.across clusters).The goal is to recover the cluster structure and the associated phase angles simultaneously.in-plane rotation) and average projection images in each cluster to improve their signal-to-noise ratio.
In this paper, we study the joint estimation problem based on the probabilistic model, stochastic block model with relative phase (SBM-Ph), which is similar to the probabilistic model considered in [41]- [43].Specifically, given N nodes in a network assigned into M underlying clusters of equal size s = N /M, we assume that each node i is associated with an unknown phase angle θ * i ∈ Ω, where Ω is a discretization of [0, 2π) 1 .For each pair of nodes (i, j), if they belong to the same cluster, their phase transition θ ij := (θ i − θ j ) mod 2π can be obtained with probability p; otherwise, we obtain noise generated uniformly at random from Ω with probability q.The goal of the joint estimation problem is to simultaneously recover the cluster structure and associated phase angles.This problem can be formulated as an optimization program maximizing not only the edge connections inside each cluster, but also the consistency among the observed phase transitions within each cluster.Still, such kind of optimization programs, similar to community detection on SBM (1) and phase synchronization (2), is non-convex.In [41], an SDP based method is proposed to achieve approximate solutions with a polynomial computational complexity.[42] proposes a spectral method based on the block-wise column-pivoted QR (CPQR) factorization, which scales linearly with the number of edges in the network.The most recent work [43] develops an iterative GPM, where each iteration follows a matrix-multiplication-thenprojection manner.The iterative GPM requires an initialization, and the computational complexity of each iteration also scales linearly with the number of edges in the network2 .However, existing methods are not developed from the MLE perspective, which limit their performance on the joint estimation problem.

A. Contributions
Unlike existing methods, this paper studies the joint estimation problem by first closely examining its MLE formulation, which exhibits a "multi-frequency" structure (detailed in Section III).More specifically, the MLE formulation is maximizing the summation over multiple frequency components, whose first frequency component is actually the objective function studied in [41]- [43].Based on the new insight, a spectral method based on the multi-frequency column-pivoted QR (MF-CPQR) factorization and an iterative multi-frequency generalized power method (MF-GPM) are proposed to tackle the MLE formulation of the joint estimation problem, and both significantly outperform state-of-the-art methods in numerical experiments.The contributions of this paper can be summarized as follows: • We study the MLE formulation of the joint community detection and phase synchronization problem with discretized phase angles, and show it contains a "multifrequency" structure.In a similar manner, we introduce the truncated MLE for the joint estimation problem with continuous phase angles in [0, 2π).• Inspired by [42] and the "multi-frequency" nature of the MLE formulation, we propose a spectral method based on the novel MF-CPQR factorization.The MF-CPQR factorization is adjusted from the CPQR factorization to cope with information across multiple frequencies.Similarly, we also introduce an iterative MF-GPM by carefully designing steps of leveraging the "multi-frequency" structure.• We compare the performance of our proposed methods to state-of-the-art methods [42], [43] on both discrete and continuous phase angles in [0, 2π) via a series of numerical experiments.Our proposed methods significantly outperform them in exact recovery of the cluster structure and error of phase synchronization.

B. Organization
The rest of this paper is organized as follows.The formal definition of SBM-Ph, the MLE formulation of the joint estimation problem, and the extension to continuous phase angles, are detailed in Section III.Section IV and V present the spectral method based on the MF-CPQR factorization and the iterative MF-GPM, respectively.Numerical experiments are in Section VI.Finally, we conclude the paper in Section VII.

C. Notations
Throughout this paper, we use [n] to denote the set {1, 2, . . ., n}, and I{•} to denote the indicator function.The uppercase and lowercase letters in boldface are used to represent matrices and vectors, while normal letters are reserved for scalars.X F and Tr(X) denote the Frobenius norm and the trace of matrix X, and v 2 denotes the 2 norm of the vector v.The transpose and conjugate transpose of a matrix X (resp.a vector x) are denoted by X and X H (resp. x and x H ), respectively.An m × n matrix of all zeros is denoted by 0 m×n (or 0, for brevity).An identity matrix of size n × n is defined as I n .The complex conjugate of x is denoted by x.The inner product •, • between two scalars, vectors, and matrices are x, y = xy, x, y = x H y, and X, Y = Tr(X H Y ), respectively.In terms of indexing, (i, j)th entry of X is denoted by X ij , and ith entry of x is denoted by x i .X i,• (resp.X •,j ) is used to denote ith row (resp.jth column) of X.We use X i,j: (resp.X i:,j ) to denote the segment of the ith row (resp.jth column) from the jth entry (resp.ith entry) to the end, and x i: to denote the segment from ith entry to the end.In addition, the sub-matrix of X from the ith row and jth column to the end is denoted by X i:,j: .Lastly, we use O and Θ to denote the usual Big-O and Big-Theta notations.The notations are summarized in Table I. the ith row (the jth column) of X.
X i,j: (X i:,j ) Segment of the ith row (the jth column) from the jth entry (the ith entry) to the end of the row (the column).
x i: Segment of the vector x from the ith entry to the end.
X i:,j: Sub-matrix of X from the ith row and the jth column to the end.O Big-O notation.Θ Big-Theta notation.

II. PRELIMINARIES
In this section, we introduce some important definitions that will be used in our algorithms later.
Definition 1 (QR factorization).Given X ∈ C m×n , a QR factorization of X satisfies X = QR, where Q ∈ C m×m is a unitary matrix, and R ∈ C m×n is an upper triangular matrix.
Such factorization always exists for any X.The most common methods for computing the QR factorization are Gram-Schmidt process [45], and Householder transformation [46].Definition 2 (Column-pivoted QR factorization).Let X ∈ C m×n with m ≤ n has rank m.The column-pivoted QR factorization of X is the factorization as computed via the Golub-Businger algorithm [47] where Π n ∈ {0, 1} n×n is a permutation matrix, Q is a unitary matrix, R 1 is an upper triangular matrix, and R 2 ∈ C m×(n−m) .
The ordinary QR factorization is proceeded on X from the first column to the last column in order, whereas the order of the CPQR factorization is indicated by Π n .We refer to [47] for more details on the CPQR factorization.
Definition 3 (Projection onto H in (1)).For an arbitrary matrix X ∈ R m×n , we define as the projection of X onto H.
The projection aims to find the cluster structure that has the largest overall score given by X.It is shown in [48] that projection onto H is equivalent to a minimum-cost assignment problem (MCAP), and can be efficiently solved by the "incremental algorithm" for MCAP [49,Section 3] with O(n 2 m log m) computational complexity.The uniqueness condition of the projection P H (X) can be found in the proof of [49,Theorem 2.1] and [50,Theorem 2].If the solution is not unique, the "incremental algorithm" for MCAP [49, Section 3] will generates a feasible projection randomly.

III. PROBLEM FORMULATION
In this section, we formally define the probabilistic model, SBM-Ph, studied in this paper.We first consider discrete phase angles and formulate the corresponding MLE problem, which exhibits a multi-frequency structure.Then, we extend the problem to continuous phase angles and formulate a truncated MLE problem. .Each pair of nodes (i, j) are connected independently with probability p if i and j belong to the same cluster, or equivalently, M * (i) = M * (j).Otherwise, i and j are connected independently with probability Our observation model can be represented by the observation matrix A ∈ C N ×N , which is a Hermitian matrix whose (i, j)th entry for any i < j satisfies, where A ji = A ij .We also set the diagonal entry Notice that a realization generated by the above observation matrix ( 3) is a noisy version of the clean observation matrix A clean ∈ C N ×N , whose (i, j)th entry satisfies, Specially, A is equal to A clean when p = 1 and q = 0.
Remark 1. Unlike the observation matrix (or adjacency matrix) A SBM in SBM [1], [10], [11], [21] with only {0, 1}valued entries, A in (3) extends to incorporating the relative phase angles θ ij into edges.On the other hand, while entries of the observation matrix A Ph in the phase synchronization problem [2], [39], [40] encode the the pairwise transformation information, they do not have the underlying M -cluster structure.

B. MLE with Multi-Frequency Nature
Based on the observation matrix A, we detail the MLE formulation for recovering the cluster structure and phase angles in this section.Given parameters, phase angles associated with nodes {θ i ∈ Ω} N i=1 and the cluster structure {S m } M m=1 of equal size s, the probability model of observing A ij between node pair (i, j) is where M(•) is the assignment function corresponding to the cluster structure {S m } M m=1 , and K = 2K max +1.The likelihood function given observations on the edge set E is due to the independence among edges within E. Notice that maximizing the likelihood function ( 5) is equal to maximizing the following log-likelihood function By taking the FFT w.r.t. the support Ω of ((θ i −θ j ) mod 2π)s and inverse FFT (IFFT) back, ( 7) is equivalent to where A (k) is the kth entry-wise power of A with A As indicated by (8), the MLE exhibits a multifrequency nature, where the kth frequency component is ij , e ι(θi−θj ) in (8).Although the following program using the first frequency component is a reasonable formulation for the joint estimation problem as suggested by [41]- [43], it is indeed not a MLE formulation.One can show that ( 9) is equivalent to which is not the MLE (7) of the joint estimation problem.
To proceed, we perform a change of optimization variables for (8).By defining a unitary matrix V ∈ C N ×M whose (i, m)th entry satisfies the cluster structure {S m } M m=1 and the associated phase angles i=1 are encoded into one simple unitary matrix V .Then, the optimization program (8) can be reformulated as V satisfies the form (10), (11) where each V (k) is generated by V through the entry-wise power that satisfies The optimization program ( 11) is non-convex, and is thus computationally intractable to be solved exactly.Although one can try SDP based approaches similar to [41], it is not guaranteed to obtain exact solutions to the MLE, let alone the high computational complexity when N and K max are large.Therefore, we propose a spectral method based on the MF-CPQR factorization and an iterative MF-GPM in Section IV and Section V, respectively.

C. Extension to Continuous Phase Angles: A Truncated MLE
We consider the joint estimation problem on a discretization of [0, 2π) in Section III-A, and then derive the MLE formulation in Section III-B.Now, we turn to the joint estimation problem with continuous phase angles in [0, 2π) Following the similar steps as ( 5), ( 6), (7), the MLE formulation is The MLE formulation ( 13) is essentially equal to counting the times that δ([ is the Dirac delta function.We can express the Dirac delta function with its Fourier series expansion, (14) The straightforward truncation in (14) corresponds to approximating the Dirac delta with the Dirichlet kernel.By this truncation, the problem in ( 13) is converted to The optimization program ( 15) is a truncated MLE of the joint estimation problem with continuous phase angles of (13).
As one can observe from ( 8) and ( 15), the only difference is that θ i ∈ Ω is discrete in (8), and θ i ∈ [0, 2π) is continuous in (15).Algorithms in Section IV and V can also be directly applied to the joint estimation problem with continuous phase angles after simple modification.Due to the similarity between the joint estimation problem and its continuous extension, we will only focus on the joint estimation problem on Ω (despite numerical experiments) in remaining parts of this paper for brevity.

IV. SPECTRAL METHOD BASED ON THE MF-CPQR FACTORIZATION
In this section, we propose a spectral method based on the novel MF-CPQR factorization for the joint estimation problem.We start with introducing main steps and motivations of Algorithm 1 in Section IV-A.Section IV-B states the novel algorithm, the MF-CPQR factorization, designed for our spectral method, together with the difference between the MF-CPQR factorization and the CPQR factorization.In Section IV-C, we discuss the computational complexity of our proposed algorithm in details.
Our spectral method based on the MF-CPQR factorization is inspired by the CPQR-type algorithms [42], [51], together with the multi-frequency nature of the MLE formulation (11).Similar to the CPQR-type algorithms, Algorithm 1 is deterministic and free of any initialization.Meanwhile, in terms of computational complexity, Algorithm 1 scales linearly w.r.t. the number of edges |E| and near-linearly w.r.t.K max .

A. Motivations
, which yields . ., Kmax 3 (Recovery of the cluster structure and the phase angles) For each node i ∈ [N ], assign its cluster as Then estimate the phase angle given the recovered cluster assignment Output: Estimated cluster structure { M(i)} N i=1 and estimated phase angles { θi } N i=1 recovering the cluster structure and associated phase angles based on {R (k) } Kmax k=−Kmax via ( 17) and ( 18).In terms of motivations for Algorithm 1, we start from the MLE formulation (11).We first relax (11) by replacing the constraints in (10) with by noticing that V in (10) forms an orthonormal basis.The optimization problem in (19) is still non-convex and there is no simple spectral method that can directly solve the problem.One approach is to relax the dependency of V (k) among different frequencies and split (19) into different frequencies, and that is, for k = −K max , . . ., K max , we have The optimizer of ( 20) is the matrix that contains the top M eigenvectors of A (k) denoted by Φ (k) ∈ C N ×M .This accounts for step 1 (eigendecomposition) in Algorithm 1.
In fact, one can infer the cluster structure from {Φ (k) } Kmax k=−Kmax .To see this, for k = −K max , . . ., K max , we split A (k) into deterministic and random parts: where clean being the entry-wise kth power of A clean (4), and the residual ∆ (k) is a random perturbation with E[∆ (k) ] = 0. Obviously, each E[A (k) ] is a low rank matrix that satisfies the following eigendecomposition: where Ψ (k) ∈ C N ×M is a matrix defined in a similar manner as V (k) in ( 12), and satisfies Ψ (k) H Ψ (k) = I M .Then, for k = −K max , . . ., K max (except for k = 0), the non-zero entry in each row of Ψ (k) indicates the underlying cluster assignment M * (i) and the exact phase angle θ * i of node i.Therefore, to recover the cluster structure and associated phase angles, it suffices to extract For the ease of illustration, we first consider the case when p = 1 and q = 0.This indicates, for , where O (k) ∈ C M ×M is some unitary matrix.However, {O (k) } Kmax k=−Kmax are unknown and even not synchronized among all frequencies.To address this issue, the MF-CPQR factorization is introduced.Here, we assume that the first s nodes are from the cluster S * 1 , the following s nodes are from S * 2 , and so on.Applying the MF-CPQR factorization (step 2) in Algorithm 1 yields (assume for k = −K max , . . ., K max .Therefore, each Q (k) ∈ C M ×M is a unitary matrix that includes the unknown unitary matrix O (k) , and each R (k) ∈ C M ×N is a matrix excludes O (k) .More significantly, {R (k) } Kmax k=−Kmax contains all the information needed to recover the cluster structure and associated phase angles.
To recover the cluster structure, the CPQR-type algorithm [42] only uses R (1) .By noticing that for each node i, the ith column of R (1) (e.g., R •,i ) is sparse (its mth entry R (1) mi is nonzero if and only if m = M * (i)), one can determine the cluster assignment of node i by the position of the nonzero entry.Meanwhile, the associated phase angle can also be determined by obtaining the phase angle from the nonzero entry (up to some global phase transition in the same cluster).When the observation A is noisy, the CPQR-type algorithm recovers the cluster structure and associated phase angle of node i by the position of the entry with the largest amplitude.The following Theorem 1 proves as long as the perturbation to E[A (k) ] is less than a certain threshold, Φ (k) is still close to Ψ (k) O (k) , for k = −K max , . . ., K max (except for k = 0).Theorem 1 (Row-wise error bound, adapted from [42]).Given a network with N nodes and M = 2 underlying clusters, for a sufficiently large N , we suppose for some small constant c 0 .Consequently, with probability at where Theorem 1 guarantees that i) amplitudes of other entries are less than the entry indicating the true cluster structure with high probability, ii) the phase angle information is preserved with high fidelity.Theorem 1 can be proved by following the same routines as [42] by replacing the orthogonal group element O i with the U (1) group element (e.g., e ιθi ).The reason why Theorem 1 holds for k = −K max , . . ., K max (despite k = 0) is due to statistics of random perturbations {∆ (k) } k in (21) do not change among different frequencies.This is because the noise models of A (k) and A are the same.More specifically, the noisy entry e ιuij : u ij ∼ Unif(Ω) in (3) has the same statistics as e ιkuij in A (k) due to the fact that ku ij still yields the distribution Unif(Ω).all the ith columns in R (k) Kmax k=−Kmax are extracted to estimate the cluster assignment and phase angle following (17) and (18).
Note that the CPQR-type algorithm in [42] is not developed from the MLE formulation (11) of the joint estimation problem, and thus does not capture the multi-frequency nature.In this paper, we leverage {R (k) } Kmax k=−Kmax that contain information about the cluster structure and associated phase angles across multiple frequencies (step 3).Specifically, we first consider the same case as that in (22) for intuition.As illustrated in Figure 2, the matrix concatenated by the ith (i ≤ s) columns across all frequencies is The cluster assignment of i can be acquired by finding the non-sparse row of the above matrix, and the phase angle can be determined by evaluating the non-sparse row (e.g., FFT).When the observation A is noisy, ( 17) and ( 18) are used to estimate the cluster structure and associated phase angles, which can be interpreted as checking the consistency or conducting majority vote among all frequencies.The performance is expected to be at least as good as the CPQR-type algorithm.This is because each Φ (k) has the same theoretical guarantee as the CPQR-type algorithm according to Theorem 1, and ( 17) (18) are just checking the consistency across all frequencies.In Section VI, we will show that our proposed spectral method based on the MF-CPQR factorization is capable of significantly outperforming the CPQR-type algorithm.
Besides, for the joint estimation problem with continuous phase angles, ( 17) and ( 18) will be modified as Solving the max problem over [0, 2π) is infeasible in general.Instead, one can apply the zero-padding and FFT for an approximate solution with any desired precision.Specifically, in estimating the cluster assignment, by padding zeros to [R

B. MF-CPQR Factorization
As stated in Definition 2, the difference between the ordinary QR factorization and the CPQR factorization is selecting appropriate pivot ordering (encoded in Π N ).The CPQR factorization attempts to find a subset of columns that are as most linearly independent as possible and are used to determine the basis.In this paper, the CPQR factorization across multiple frequencies is developed to cope with the multi-frequency structure of the MLE formulation.
Output: {Q (k) } Kmax k=−Kmax , {R (k) } Kmax k=−Kmax , and Π N Algorithm 3: One step QR factorization using Householder transformation , where ∠r 1 is the phase angle of r 1 3 u ← r − θe, where e = [1, 0, . . ., 0] Output: Q and R. k = −K max , . . ., K max .The multi-frequency column-pivoted QR factorization of X (k) is the factorization , as computed via Algorithm 2 where Π n ∈ {0, 1} n×n is a permutation matrix fixed for all k = −K max , . . ., K max , is an upper triangular matrix, and It requires to i) obtain the same subset of columns among all frequencies that are as most linearly independent as possible, and ii) use the same pivot ordering (or Π N ) among all frequencies.The former promotes the cluster structure estimation performance because each node i (other than the pivots) is assigned to a cluster mainly according to the similarities between the column i and the columns of pivots, the latter ensures the validity of ( 17) and (18).
The MF-CPQR factorization is detailed in Algorithm 2, where the Householder transform [46] (Algorithm 3) is adopted for a better numerical stability.Specifically, the novel MF-CPQR factorization is different from the ordinary CPQR [45], [52] in the pivot selection.The pivot is determined by finding the column with the largest summation of 2 norm of residuals over all frequencies (see line 3 in Algorithm 2).

C. Computational Complexity
In this section, the computational complexity of Algorithm 1 is summarized step by step in Table II.Here, we suppose M = Θ(1).First, it consists of O(K max ) times of eigendecomposition for M eigenvectors, which is O(|E|) per time if using Lanczos method [53].For the MF-CPQR factorization, it consists of M times of column pivoting (O(N K max ) per time) and M K max times of one step QR factorization (O(N ) per step).In terms of recovering the cluster structure, we first compute M N times of FFT for length-K max vectors (O(K max log K max ) per vector) and then compute the maximums (O(N K max )+O(N )).Since the FFT of {R (k) } Kmax k=−Kmax is already computed, it is only O(N ) for synchronizing the phase angles.Overall, the computational cost is linear with the number of edges |E| and nearly linear in ), the complexity of Algorithm 1 will be reduced.For instance, in the case when |E| = O(N log N ) or |E| = O(N ), which is very common as shown in [54], the complexity of Algorithm 1 will be O(K max N max{log N, log K max }) or O(K max N log K max }), respectively.

V. ITERATIVE MULTI-FREQUENCY GENERALIZED POWER METHOD
In addition to the spectral method based on the MF-CPQR factorization proposed in Section IV, we develop an iterative multi-frequency generalized power method for the joint estimation problem, which is inspired by the generalized power method [43] and the"multi-frequency" nature of the MLE formulation (11).

A. Detailed Steps and Motivations
Since the joint estimation problem is non-convex, the iterative multi-frequency generalized power method requires a good initialization of the cluster structure and associated phase angles that are sufficiently close to the ground truth.Various spectral algorithms (e.g., CPQR-type algorithm [42, Algorithm 1], [43,Algorithm 3], and Algorithm 1) can be used for initialization.It is observed experimentally that random initialization will result in convergence to a sub-optimal solution.Each iteration of Algorithm 4 consists of three main steps.The first step (line 3) is the matrix multiplication between A (k) and V (k),t for all k = −K max , . . ., K max (line 4).Then we leverage (line 4) V (k),t+1 across all frequencies to aggregate and refine the information needed for the joint estimation problem (23), which is inspired by (17).The last step is estimating the cluster structure and associated phase angles.As mentioned before, Algorithm 4: Iterative multi-frequency generalized power method Input: The observation matrix A, the initialization {Sm} M m=1 and {θ i ∈ Ω} N i=1 , and the number of iterations T 1 Construct { V (k),0 } Kmax k=−Kmax using {Sm} M m=1 and {θ i ∈ Ω} N i=1 according to (12) 2 for t = 0, 1, . . ., T − 1 do / * Matrix multiplication * / 3 For k = −Kmax, . . ., Kmax, compute the matrix multiplication / * Recovery of the cluster structure and associated phase angles * / 5 For each node i ∈ [N ], assign its cluster assignment as then estimate the associated phase angle given the estimated cluster assignment 6 Output: Estimated cluster structure { M(i)} N i=1 and estimated phase angles { θi } N i=1 giving V max,t+1 and then finding the corresponding cluster assignment is equal to solving the MCAP (see Definition 3).This is equivalent to projecting V max,t+1 onto the feasible set H (line 5), after which the matrix H t+1 is obtained.The reason why the projection P(•) is needed rather than directly using the index of the largest entry in each row of V max,t+1 is because the solution of the latter approach does not necessarily satisfy the constraint based on the size of each cluster.The associated phase angles can be recovered according to the recovered cluster structure (24).Besides, the modification of the iterative MF-GPM for the joint estimation problem with continuous phase angles is the same as that of the spectral method based on the MF-CPQR factorization.
The iterative GPM in [36] is built upon the classical power method, which is used to compute the leading eigenvectors of a matrix.The method in [36] adds an important step: projection onto the feasible set that is induced by the constraints on the cluster structure and phase angles.The iterative MF-GPM introduced here takes a step further by not only taking advantage of the efficiency of the power method and the projection, but also leveraging the information across multiple frequencies.In Section VI, numerical experiments show that the iterative MF-GPM largely outperforms GPM [43].

B. Computational Complexity
In this section, we compute the complexity of Algorithm 4 step by step in Table III.Again, here we assume   This section deals with numerical experiments of the spectral method based on the MF-CPQR factorization (Algorithm 1) and the iterative MF-GPM (Algorithm 4) to showcase their performance against state-of-the-art benchmark algorithms 3 .For comparison, the benchmark algorithms are chosen as i) the CPQR-type algorithm [42], ii) the GPM [43], where both of them can be modified identically from the joint community and group synchronization problem into the joint community detection and phase synchronization problem.Specifically, algorithms in [42], [43] are single frequency version of our proposed algorithms, which can be realized by replacing the summation over k in ( 17), ( 18), (23), and ( 24) with k = 1.
In each experiment, we generate the observation matrix A using the probabilistic model, SBM-Ph, as discussed in Section III and estimate the cluster structure and associated phase angles by the spectral algorithms based on the MF-CPQR factorization, the iterative MF-GPM, and the benchmark algorithms.To evaluate the numerical results, we defined two metrics, success rate of exact recovery (SRER) and error of phase synchronization (EPS), for recovering the cluster structure and associated phase angles.In terms of SRER, it shows the rate of algorithms exactly recover the cluster structure.Let Ŝm = {i ∈ [N ]| M(i) = m} be the set of nodes assigned into the mth cluster by algorithms, and we have that As for the EPS, it assesses the performance of recovering phase angles.We define θ * ,(m) = [e ιθ * i ] i∈S * m ∈ C s for each cluster that concatenates the ground truth θ * i for all i ∈ S * m , and similarly θ(m) = [e ι θi ] i∈S * m ∈ C s for the estimated phase angles.Then, after removing the ambiguity with aligning θ(m) with θ * ,(m) in each cluster as The EPS is actually the maximum error of estimated phase angles among all nodes.Besides, both the SRER and EPS are computed over 20 independent and identical realizations for each experiment in the following.In the rest of this section, we first present the results of the joint estimation problem in Section VI-A and followed by the extension to continuous phase angles in Section VI-B.

A. Results of the Joint Estimation Problem
We first show the results of the spectral method based on the MF-CPQR factorization (Algorithm 1) against the CPQRtype algorithm [42] on the joint estimation problem, where the case of M = 2, s = 500, and K max = 16 is considered.Similar to [42], [43], we test the recovery performance in the regime p, q = O( log n /n), where different p = α log n /n and q = β log n /n with varying α and β are included.In Figure 3, we show SRER (25) and EPS (26).As one can observe from Figure 3a and 3c, our proposed spectral method based on the MF-CPQR factorization outperforms the CPQRtype algorithm [42] in SRER.EPS follows a similar pattern.
Next, we test the performance of the iterative MF-GPM (Algorithm 4) against the GPM [43] under the same choice of M , s, and K max as before.Since the GPM and the iterative MF-GPM require initialization that is close enough to the ground truth, we can choose either [43,Algorithm 3] or the CPQR-type algorithm [42].We set the number of iterations to be 50 as suggested by [43].Again, as one can observe from Figure 4, our proposed iterative MF-GPM achieves higher accuracy in both SRER and EPS.Surprisingly, one may also notice the region where p is small and q is large (top left area in Figure 4c), the iterative MF-GPM is capable of recovering the cluster structure with high probability, however, this is not the case in recovering associated phase angles.
When compare the results shown in Figure 3 and 4 together, the spectral method based on the MF-CPQR factorization shows very similar result as the iterative MF-GPM, which are both significantly better than the GPM [43] and the CPQR-type algorithm [42].However, compared to the iterative MF-GPM, the spectral method based on the MF-CPQR factorization is free of initialization.One may also observe the performance of the GPM [43] outperform the CPQR-type algorithm [42].

B. Results with Continuous Phase Angles
In this section, we show the results of our proposed algorithms against benchmark algorithms on the joint estimation problem with continuous phase angles.As mentioned in Section III-C, the algorithms tested in Section VI-A can be directly applied after simple modification (See Section IV-A for details), and thus we choose the similar setting as Section VI-A.Besides, since (15) is a truncated MLE formulation of the true one (13), experiments of the spectral method based on the MF-CPQR factorization and the iterative MF-GPM with different K max are conducted to study the trend of the results as K max grows.The results are detailed in Figure 5, with very similar performance as shown in Figure 3 and 4. In addition, as K max grows, the cluster structure recovery and phase synchronization become more accurate in both MF-CPQR based method and iterative MF-GPM.
To choose a suitable K max for the continuous phase angles, we need to consider the trade-off between the performance and the computational complexity.We observe that the estimation accuracy is improved as K max increases.On the other hand, the computational complexity scales linearly with K max .In addition, the computational complexity also depends on the number of nodes N and the number of clusters M , which needs to be taken into consideration for the trade-off between accuracy and efficiency.Thus, it is difficult to state a simple optimal policy for choosing K max for the continuous phase angles.Despite this, we have shown that our methods outperform the CPQR-type algorithm and the GPM as long as K max ≥ 1, and moreover largely outperform other baseline algorithms when K max ≥ 10.Therefore, our choice of K max is between 10 to 30 for most cases.

VII. CONCLUSION
In this paper, we study the joint community detection and phase synchronization problem from an MLE perspective, and provide the new insight that its MLE formulation has a "multifrequency" nature.We then propose two methods, the spectral method based on the novel MF-CPQR factorization and the iterative MF-GPM, to tackle the MLE formulation of the joint estimation problem, where the latter one requires the initialization from spectral methods.Numerical experiments demonstrate the advantage of our proposed algorithms against state-of-the-art algorithms.
It remains open to establish the theoretical analysis that can tightly characterize the noise robustness of our proposed algorithms.Sub-optimal bounds can be easily derived following the analysis in [42], [43] by considering the frequency-1 component.However, these results do not explore additional frequency information.The key difficulties lie in i) analyzing the properties and relationships of eigenvectors among different frequency components with dependent noises, and ii) analyzing the power method across multiple frequencies.We leave them for future investigation.
In addition, there are several directions that can be further explored.It is natural to expect that the proposed approach can be extended to compact non-Abelian groups (e.g., rotational groups, orthogonal groups, and symmetric groups) using the corresponding irreducible representations.

A
. Stochastic Block Model with Discrete Relative Phase Angles SBM-Ph is considered in a network with N nodes and M ≥ 2 underlying clusters of equal size s = N /M.We assume each node i ∈ [N ] falls into one of M underlying clusters with the assignment M * (i) ∈ [M ], and is associated with an unknown phase angle θ * i ∈ Ω, where Ω := {0, . . ., (2K max + 1)∆} is a discretization of [0, 2π) with ∆ = 2π /(2Kmax + 1).We use S * m to denote the set of nodes belonging to the mth cluster for all m ∈ [M ].SBM-Ph generates a random graph G = ([N ], E) with the node set [N ] and the edge set E ⊆ [N ]×[N ]

1 :
Algorithm 1 consists of three steps: i) Eigendecomposition of A (k) , ii) MF-CPQR factorization, and iii) Recovery of the cluster structure and phase angles.It first computes matrices {Φ (k) } Kmax k=−Kmax that contain the top M eigenvectors of each A (k) via eigendecomposition.Secondly, matrices {R (k) } Kmax k=−Kmax are obtained through the MF-CPQR factorization, which is detailed in Algorithm 2. The last step is Algorithm The spectral method based on the MF-CPQR factorization Input: The observation matrix A, and the number of clusters M . 1 (Eigendecomposition) For k = −Kmax, . . ., Kmax, compute the top

Fig. 2 :
Fig. 2: Illustration of step 3 in Algorithm 1.For each i ∈ [N ], all the ith columns in R (k) Kmax k=−Kmax are extracted to estimate the cluster assignment and phase angle following (17) and (18).

Fig. 3 :
Fig. 3: Comparison between the CPQR-type algorithm [42] (in the first row) and the spectral method based on the MF-CPQR factorization (in the second row) in terms of the success rate of exact recovery (SPER) and the error of phase synchronization (EPS), where a smaller black area in each figure indicates a better performance.Experiments are conducted with the setting M = 2, N = 1000, and K max = 16.(a) and (c): SRER (25) under varying α in p = α log n /n and β in q = β log n /n; (b) and (d): EPS (26) under varying α and β.

Fig. 4 :
Fig. 4: Comparison between the GPM [43] (in the first row) and the iterative MF-GPM (in the second row) in terms of the success rate of exact recovery (SPER) and the error of phase synchronization (EPS), where a smaller black area in each figure indicates a better performance.Experiments are conducted with the same setting as Figure 3. (a) and (c): SRER (25) under varying α in p = α log n /n and β in q = β log n /n; (b) and (d): EPS (26) under varying α and β.

20 Fig. 5 :
Fig. 5: Results for the joint estimation problem with continuous phase angles in [0, 2π) using the CPQR-type algorithm [42], the GPM [43], the spectral method based on the MF-CPQR factorization, and the iterative MF-GPM, where a smaller black area in each figure indicates better performance.The choice of M and N are the same as Figure 3.The first and third columns show the SRER, and the second and fourth columns shows EPS.(a), (b), (c), and (d): The results of the CPQR-type algorithm [42] and the GPM [43]; (e), (f), (g), and (h): The results of the spectral method based on the MF-CPQR factorization and the iterative MF-GPM with K max = 5; (i), (j), (k), and (l): The results of the spectral method based on the MF-CPQR factorization and the iterative MF-GPM with K max = 10; (m), (n), (o), and (p): The results of the spectral method based on the MF-CPQR factorization and the iterative MF-GPM with K max = 20.

TABLE I :
Notation table.

TABLE II :
The computational complexity of Algorithm 1 in each step.

TABLE III :
The computational complexity of Algorithm 4 in each step.