Dynamic community discovery via common subspace projection

Detecting communities with dense internal and sparse external interactions in dynamically evolving networks has become increasingly important owing to its wide applications in diverse fields. Conventional solutions based on static community detection approaches treat each snapshot of a dynamic network independently, which may fragment communities in time (Aynaud T and Guillaume J L 2010 8th Int. Symp. on Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks (IEEE) pp 513–9), resulting in the problem of instability. In this work, we develop a novel dynamic community detection algorithm by leveraging the encoding–decoding scheme present in a succinct network representation method to reconstruct each snapshot via a common low-dimensional subspace, which removes non-significant links and highlights the community structures, thereby mitigating community instability to a large degree. We conduct experiments on simulated data and real social networking data with ground truths (GT) and compare the proposed method with several baselines. Our method is shown to be more stable, without missing communities, and more effective than the baselines, with competitive performance. In addition, the distribution of community sizes produced by our method is more in line with the real distribution than those of the baselines.


Introduction
Community detection, also known as node clustering, is one of the most active topics in the field of graph mining and network science [2][3][4][5]. The task of community detection is to group the vertices of a graph into clusters by considering the connection structure in such a way that the edges within the clusters far exceed the edges between them. Generally, the wide variety of grouping techniques is based on similarity measures defined on structural properties of the vertices [6][7][8]. As community detection has been used extensively in many applications, such as identifying political parties [9], genetically similar structures [6] and fraud in telecommunication networks [10], there are many community detection methods available in the literature [11,12]. Most existing approaches are developed to tackle static community discovery, where network connections are fixed in time. However, in many real-world systems, the relationships between the nodes are constantly changing, leading to the evolution of community structures. That is, a community may grow by absorbing new members, shrink as some of its nodes leave, or even disappear after a period of time. Examples include social networks such as active collaboration circles [13] and video sharing [14], communication networks such as email communications [15], and food webs [16].
Though conventional community detection methods for static networks can be applied in dynamic scenarios when a time-evolving network is represented by a sequence of snapshots, a fundamental drawback of such a scheme is that most traditional algorithms are sensitive to tiny changes in the network structure [1][17][18][19]. Specifically, for two similar networks, an algorithm can produce very different partitions even if only a few links are perturbed. Besides, as dynamic community discovery is usually associated with community tracking [20], the framework of independent identification in each snapshot needs an additional set matching operation to obtain the evolution trajectories of the nodes' affiliations. Instead of detecting and tracking communities in separate phases as in the aforementioned solution, we propose to achieve the two goals simultaneously. Precisely, we tailor the succinct representation method [21] to project all snapshots into a common but lower-dimensional subspace where the snapshots can be reconstructed and compared more effectively. It should be noted that the proposed method differs from the previous work [21]. That work aims at summarizing a dynamic network, i.e. finding a representation of all snapshots of the dynamic network in a period of interest, which is achieved with principal component analysis (PCA) on the multiple network snapshots. In the present method, however, we focus on dynamic community detection using node similarities across the period of interest, i.e. the correlation between nodes' concatenated connectivity profiles. Although we also use PCA, this technique is applied to the node similarity matrix to obtain an eigen-space, wherein all connections among nodes at each time step can be reconstructed. The benefit of this operation is that it not only retains the strong ties between similar nodes but also serves the need of global smoothing.
Furthermore, we compare the proposed method with various existing baselines including the state-of-the-art methods and show the superiority of our method. Surprisingly, we find that most existing dynamic community detection approaches lack adaptiveness to different community evolution patterns, which will be detailed in the experimental results. Besides, we explore the community properties related to the performance of the algorithms and especially analyze the results detected by our method.
In the remainder of this paper, we first provide a short summary of previous works relevant to this topic in section 2. In section 3 we describe the proposed method in detail, followed by its experimental evaluations and discussion. Finally, we conclude with a discussion of the proposed method and highlight some promising directions for future work.

Static community discovery
A community is described as a substructure in a network [22], where the nodes are more densely connected internally than with the rest of the network. However, a community is not defined strictly, but algorithmically [12]. Even so, it is still of particular importance to view the network structure at mesoscales. To this end, many approaches have been developed to mine the communities in modular networks [23][24][25]. Generally, those approaches fall into several categories: node similarity based methods [26], optimization oriented methods [27], network dynamics based methods [28] and statistical modeling [25]. Among the various methods, spectral clustering [29] is a commonly used community detection method based on nodes' similarity, which has been adopted to solve the problem of overlapping community structure [7]. Spectral clustering based community detection is closely associated with subspace sparse coding. Several recent studies [8,30,31] have shown that, by introducing subspace sparse representations of the nodes, the performance of spectral clustering can be improved.
However, most of the existing community discovery algorithms work on static networks without temporal information.

Dynamic community discovery
Community discovery in dynamic networks focuses on identifying and tracing the community evolution, which can characterize the time-varying behaviors of the dynamic network as well [18,32]. Generally, dynamic community discovery algorithms fall into two main categories: one-stage methods [33,34] and two-stage methods [1,35]. A one-stage method detects and traces communities simultaneously; TimeRank [34] falls in this category. By attaching a time attribute to the nodes of each snapshot and transforming the dynamic network into a static network, the TimeRank method [34] uses a static community discovery algorithm to detect the cross-time communities and trace the evolution of communities. In contrast, a two-stage method detects communities in each snapshot and then matches each community between pairs of snapshots to trace the dynamic community structure. Moreover, the two-stage method is more complex than the one-stage method due to the additional matching for all potential community pairs. A critical issue in one-stage dynamic community detection is the stability of the results, which generally involves two aspects: a tiny variation in network structure may lead to quite different detection results for a generic algorithm, and the algorithms may fail to detect meaningful communities due to the complex evolution processes of dynamic networks [1]. Note that the two-stage method still cannot solve the instability problem if its detection operation does not involve smoothing over snapshots [18,32]. Recently, by considering the time-dependent relationship between sequential snapshots, some two-stage methods [36,37] have shown their advantages in mitigating the instability. Those methods belong to evolutionary clustering [19].
In this work, we propose a one-stage method. Different from the existing algorithms, however, our approach solves the problem from a signal processing perspective, i.e. we treat a dynamic network as the combination of signal connections and noisy connections, for which the theoretical basis is given in the following section. We first generate a similarity matrix to characterize the proximity strengths of the nodes in terms of their dynamic connection records. Under the above signal combination assumption, we suppose that small weights in the similarity matrix may indicate noisy links. Therefore, we apply PCA to the similarity matrix to get the signal subspace, where the links between dissimilar nodes can be filtered out by projection and reconstruction operations on the connections of the original network snapshots. As a result, the connections between different communities are most likely to be removed and the relationships among nodes within the communities are emphasized, facilitating the identification of communities. Compared to two-stage methods that require additional matching for all potential community pairs, our method detects the communities all at once, which makes it more efficient. Moreover, since our method operates on the individual snapshots of the dynamic network rather than on the conjunction of all snapshots as in many one-stage methods, it is also superior to state-of-the-art one-stage methods such as TimeRank.

Problem formulation
Considering a dynamic network $G$ that describes the evolution of relations between interacting objects, there are two ways to model it: by temporal networks [38][39][40] or alternatively, by a sequence of snapshots [41]. Our proposed method belongs to the latter form. That is, $G = \{G_t \mid t = 1, 2, \ldots, T\}$, where $G_t$ is the snapshot at the $t$th time, and $T$ is the length of the sequence $G$. Let $A_t$ denote the adjacency matrix corresponding to the snapshot $G_t = (V_t, E_t)$, where $E_t$ and $V_t$ are the edges and nodes appearing in $G_t$, respectively. Accordingly, the node set of $G$ is $V = \bigcup_t V_t$ within the period-of-interest.
Dynamic community discovery aims to detect all dynamic communities in a dynamic network. We define the dynamic communities $C = \{C_1, \ldots, C_j, \ldots, C_K\}$ as a set of clusters with constant labels $C_j$ ($j = 1, \ldots, K$), where $K$ is the total number of dynamic communities. Moreover, $C_j = \{i_t \mid i_t \in V_t, 1 \le t \le T\}$ consists of nodes from different time steps, where $i_t$ denotes node $i$ at time $t$. That is, the community members are time-stamped. As such, it is convenient to observe the life-cycle of a dynamic community as well as the transition of community affiliation of a node [32]. For instance, to see how many members a community $C_j$ has at a specific time $t_0$, one can count the nodes with time stamp $t_0$ in $C_j$. Similarly, since there is a correspondence between community affiliations and time-stamped nodes, by tracing the community affiliation of a node in time, the transition from one affiliation to another can be detected.
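The time-stamped view of communities can be illustrated with a minimal sketch. The community labels, node names and helper functions below are hypothetical toy choices, not part of the proposed method; they only show how counting members at a time $t_0$ and tracing a node's affiliation follow directly from storing each member as a (node, time) pair.

```python
# A dynamic community is stored as a set of time-stamped nodes (node, t).
# Labels and members are hypothetical toy data.
communities = {
    "C1": {("a", 1), ("b", 1), ("a", 2), ("b", 2), ("c", 2)},
    "C2": {("c", 1), ("d", 1), ("d", 2)},
}

def size_at(community, t0):
    """Number of members a dynamic community has at time t0."""
    return sum(1 for (_, t) in community if t == t0)

def affiliation_trajectory(communities, node):
    """Map each time step to the label of the community containing `node`."""
    trajectory = {}
    for label, members in communities.items():
        for (i, t) in members:
            if i == node:
                trajectory[t] = label
    return dict(sorted(trajectory.items()))

def transitions(trajectory):
    """List (t_prev, t_next, old_label, new_label) where the affiliation changes."""
    steps = list(trajectory.items())
    return [(t0, t1, c0, c1)
            for (t0, c0), (t1, c1) in zip(steps, steps[1:]) if c0 != c1]

trajectory_c = affiliation_trajectory(communities, "c")
moves_c = transitions(trajectory_c)
```

In this toy example node "c" moves from "C2" at time 1 to "C1" at time 2, which is the kind of affiliation transition the definition is meant to expose.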
Generally, discovering the underlying community patterns in networks is also known as clustering of nodes, which is usually based on certain similarity measures defined over networks in a high-dimensional feature space. Among the existing approaches, matrix factorization has been actively used to map the nodes to a lower-dimensional vector subspace of the latent feature space [30,35]. For instance, a common practice is the spectral decomposition of the Laplacian of the adjacency matrix or its variants, which embeds nodes into the space composed of one or more eigenvectors. In this work, the proposed method is also based on matrix factorization. However, different from spectral clustering, where the nodes are encoded in a subspace and a spatial clustering algorithm is then applied to the embedding, we construct a common subspace to act as a filter for the connections among nodes, which can be derived from the following theoretical analysis.
Since the connections that link different communities usually cause the ambiguity of community structure and affect the decision boundary of detection models, such connections can be treated as noise. From this point of view, we decompose the zero-mean similarity matrix as $\bar{M} = X + W$, where $X$ denotes the signal part and $W$ is the noise part. Suppose that all signals and noises are mutually uncorrelated and the noises have identical variance $\delta$, then the covariance matrix $R = \bar{M}^{T}\bar{M}$ can be rewritten as

$$R = X^{T}X + \delta I,$$

where $I$ is the identity matrix. By applying eigen-decomposition on $R$, we have

$$R = U \Sigma U^{T},$$

where $U$ is the eigen-matrix composed of the eigenvectors of $R$ and $\Sigma$ is the diagonal matrix composed of the corresponding eigenvalues. Suppose the signal $X$ is low-rank and the signal-to-noise ratio is large, then this decomposition can be rewritten as

$$R = U \Lambda U^{T} + \delta I,$$

where $\Lambda = \begin{pmatrix} \Sigma_s & 0 \\ 0 & 0 \end{pmatrix}$ and $\Sigma_s$ is the diagonal matrix of the $s$ non-zero eigenvalues of $X^{T}X$. Obviously, the first term in this expression corresponds to the reconstructed signal part $X$. The principal components in $U$ can be obtained by performing PCA on $R$.

Figure 1. [...] (5) time-stamped node clustering. There are two data processing channels indicated by gray arrows and blue arrows, respectively. Specifically, gray arrows represent the process of finding the common subspace (steps (1)-(3)), while blue arrows represent the process of structural filtering on individual network snapshots (steps (4), (5)).
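The signal-plus-noise argument can be checked numerically with a small sketch. It assumes the idealised setting stated in the text (uncorrelated signal and noise, identical noise variance $\delta$), so that the covariance is a low-rank signal term plus $\delta$ times the identity; the sizes, the random signal and the variable names are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)
n, s, delta = 8, 2, 0.5      # matrix size, signal rank, noise variance (toy values)

# Low-rank signal covariance X^T X with exactly s non-zero eigenvalues.
X = rng.normal(size=(s, n))
signal_cov = X.T @ X

# Idealised covariance under the uncorrelated, equal-variance noise assumption.
R = signal_cov + delta * np.eye(n)

# eigh returns eigenvalues in ascending order; flip to descending.
eigvals, U = np.linalg.eigh(R)
eigvals, U = eigvals[::-1], U[:, ::-1]

# The n - s trailing eigenvalues sit at the noise level delta ...
noise_floor = eigvals[s:]

# ... and the s leading ones are the signal eigenvalues shifted by delta.
Lambda = np.zeros((n, n))
Lambda[:s, :s] = np.diag(eigvals[:s] - delta)

# Keeping only the leading s directions recovers the signal covariance.
reconstructed = U @ Lambda @ U.T
```

Under these assumptions the leading eigenvectors span the signal subspace, which is why truncating to the top components acts as a denoising filter.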
In the light of the theoretical analysis, we use PCA to obtain a common low-dimensional subspace for all network snapshots, where the encoding-decoding operation acts as filtering and retains the strong ties between similar nodes in each snapshot. Then, we perform a conventional clustering method on the node column vectors corresponding to the decoded snapshots. As a result, changes of the community affiliations of a node can be easily identified.

Community discovery in common subspace
The framework of our method is diagrammatically illustrated in figure 1 with a toy example. Specifically, we first measure global similarities between nodes by comparing their connectivity profiles across time. In the example shown in the figure, the inner product is performed on each pair of historical connectivity profiles of the nodes (e.g., $i$, $n$, $k$ and $j$) obtained by concatenating the adjacency matrices of snapshots, resulting in the similarity matrix (as shown in the left-hand side of figure 1). Then we employ the PCA method on the node similarity matrix to derive an eigen-subspace (as shown in the center of the plot) with which different network snapshots are reconstructed in an encoding-decoding scheme. Thus the left-hand side adjacency matrices in figure 1 can be projected to the eigen-subspace and then reconstructed with inverse projection. This way, the weak ties lying between two communities are most likely to be filtered out as noise. As a result, two nodes in the same community are much more likely to be similar to each other, but distinct if they belong to different communities. That is, the community structure in the snapshots becomes more prominent, which facilitates the follow-up clustering.
Below are the details. As the connections of each node are time-varying, a matrix $A \in \mathbb{R}^{(T \times |V|) \times |V|}$ is constructed to store the historical edges in the dynamic network $G$. Here, $A((t-1)|V| + i, j) = 1$ indicates that node $i$ is connected with node $j$ in the $t$th snapshot, where $i, j \in V$ and $t \in \{1, 2, \ldots, T\}$. Clearly, a node can be characterized by its immediate and higher-order neighborhoods. For simplicity, we use the nearest neighbors to describe a node, that is, the $i$th column $A_t(:, i)$ (or, equivalently, the $i$th row) is a basic representation of node $i$ in snapshot $G_t$, which is denoted by $v_{i_t}$. Then the complete representation of node $i$ is the column vector $A(:, i) = [v_{i_1}^{T}, v_{i_2}^{T}, \ldots, v_{i_T}^{T}]^{T}$ as shown in figure 1.
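A minimal sketch of the history matrix may help fix the indexing. It assumes that every snapshot is given as a $|V| \times |V|$ adjacency matrix over the full node set (nodes absent from a snapshot simply have zero rows and columns) and that snapshots are stacked block by block; the function names and toy graphs are hypothetical. Node indices are 0-based in the code while $t$ stays 1-based.

```python
import numpy as np

def stack_snapshots(snapshots):
    """Stack T adjacency matrices (each |V| x |V|) into a (T*|V|) x |V| matrix."""
    return np.vstack(snapshots)

def profile(A, i):
    """Complete representation of node i: its concatenated connectivity profile."""
    return A[:, i]

def snapshot_profile(A, i, t, n):
    """Basic representation v_{i_t}: column i restricted to the block of snapshot t."""
    return A[(t - 1) * n:t * n, i]

# Toy dynamic network with 3 nodes and 2 snapshots.
A1 = np.array([[0, 1, 0],
               [1, 0, 0],
               [0, 0, 0]])
A2 = np.array([[0, 1, 1],
               [1, 0, 0],
               [1, 0, 0]])
n = A1.shape[0]
A = stack_snapshots([A1, A2])
```

With this layout, the row of node `i` in snapshot `t` is `(t - 1) * n + i`, and `profile(A, i)` is exactly the concatenation of the per-snapshot profiles.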
Next, node similarity is calculated according to the similarity of connectivity profiles between two nodes. In general, the direct connections and the second-order neighbors are used to depict the connectivity profile of a node, as a node is more relevant to its nearest neighbors than to distant nodes, while the second-order neighbors can account for the similarity when the nearest neighbors change frequently in time. In this case, the corresponding similarity matrix $M$ can be written as $M = A^{T}A$, where $M_{ij} = \sum_{t=1}^{T} v_{i_t}^{T} v_{j_t}$ is the total number of common neighbors of two nodes in $T$ time stamps, indicating a rough similarity between nodes. As aforementioned, the direct connectivity is a straightforward indicator of the pairwise similarity, especially for the case where dynamic networks change unevenly. Thus, alternatively, one can combine these two components in a weighted sum, where $\alpha, \beta \in [0, 1]$ determine the relative importance of the direct and indirect connectivity in measuring dynamic similarity between nodes, respectively. Then the columns of $M$ are employed to represent the nodes, instead of $A$. Assuming that different neighboring nodes contribute differently to the characterization of a target node, we expect that the nodes can be expressed by their similar neighbors. Therefore, we take the links between a node pair with a low similarity value (measured by the number of common neighbors) as noise. Then, a denoising technique can be devised to derive a low-dimensional subspace based on the previous theoretical analysis.
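The similarity computation can be sketched as follows. The common-neighbor term is the inner product of concatenated connectivity profiles described above. The exact way in which the direct-connectivity term enters is an assumption of this sketch: here it is taken as the sum of the snapshot adjacency matrices, weighted by `alpha`, with the common-neighbor term weighted by `beta`. The function name and default values are hypothetical.

```python
import numpy as np

def similarity_matrix(snapshots, alpha=0.0, beta=1.0):
    """Dynamic node similarity from a list of |V| x |V| adjacency matrices.

    common[i, j]: number of common neighbours of i and j, summed over snapshots.
    direct[i, j]: number of snapshots in which i and j are linked (assumed form).
    """
    A = np.vstack(snapshots).astype(float)          # (T*|V|) x |V| history matrix
    common = A.T @ A                                # inner products of profiles
    direct = np.sum(snapshots, axis=0).astype(float)
    return alpha * direct + beta * common

A1 = np.array([[0, 1, 0],
               [1, 0, 0],
               [0, 0, 0]])
A2 = np.array([[0, 1, 1],
               [1, 0, 0],
               [1, 0, 0]])

M_common = similarity_matrix([A1, A2])                       # common neighbours only
M_mixed = similarity_matrix([A1, A2], alpha=0.5, beta=0.5)   # weighted combination
```

In the toy data, nodes 1 and 2 are never linked directly but share neighbor 0 in the second snapshot, so only the common-neighbor term makes them similar, whereas nodes 0 and 1 are similar only through their direct links.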
We apply PCA to the similarity matrix $M$ to get the principal matrix $P$, which usually spans a low-dimensional space. The spanned space is taken as the common projection subspace. Specifically, in step 1 we normalize the column representation vectors $M(:, i)$ to have zero mean. In step 2, we calculate the covariance matrix $\Phi$ of the normalized vectors. Step 3 performs the spectral decomposition of $\Phi$ to get the principal matrix $P = (p_1^{T}, p_2^{T}, \ldots, p_s^{T})^{T}$, where $p_i = (p_{i1}, \ldots, p_{i|V|}) \in \mathbb{R}^{|V|}$ is the eigenvector corresponding to the $i$th of the top-$s$ eigenvalues. Generally, $s$ is a prior parameter and can be set according to the distribution of the eigenvalues, which is detailed in appendix A. Alternatively, $s$ can also be determined by the contribution rate of the $s$ principal components, i.e. the ratio of the sum of the top-$s$ eigenvalues to the sum of all eigenvalues.
The subspace spanned by the principal matrix P is what we seek.
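The three PCA steps can be sketched with NumPy. Two details are assumptions of this sketch rather than statements of the paper: the centering subtracts the mean column of $M$, and the covariance is scaled by $1/|V|$. The `rate` threshold for the contribution rate and the function name are likewise hypothetical.

```python
import numpy as np

def principal_matrix(M, s=None, rate=0.9):
    """Return (P, eigvals): P is s x |V| with the top-s eigenvectors as rows."""
    M = np.asarray(M, dtype=float)
    # Step 1: centre the column representations (assumed: subtract the mean column).
    M_bar = M - M.mean(axis=1, keepdims=True)
    # Step 2: covariance of the centred columns (assumed 1/|V| scaling).
    Phi = M_bar @ M_bar.T / M.shape[1]
    # Step 3: spectral decomposition, eigenvalues sorted in descending order.
    eigvals, eigvecs = np.linalg.eigh(Phi)
    order = np.argsort(eigvals)[::-1]
    eigvals = np.clip(eigvals[order], 0.0, None)   # remove tiny negative round-off
    eigvecs = eigvecs[:, order]
    if s is None:
        # Smallest s whose cumulative contribution rate reaches `rate`.
        ratios = np.cumsum(eigvals) / eigvals.sum()
        s = min(int(np.searchsorted(ratios, rate)) + 1, len(eigvals))
    return eigvecs[:, :s].T, eigvals

# Toy similarity matrix with two clear blocks of nodes.
M_toy = np.array([[4, 4, 0, 0],
                  [4, 4, 0, 0],
                  [0, 0, 4, 4],
                  [0, 0, 4, 4]])
P, eigvals = principal_matrix(M_toy)
```

For the two-block toy matrix a single component carries all the variance, and its entries have opposite signs on the two blocks, so the subspace already separates the groups.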
To mitigate the instability of the communities detected by static community detection approaches, we project all snapshots to the common subspace and then reconstruct the connection patterns by filtering the negative weights. The details are as follows:
(i) Firstly, the feature representations of the nodes in each snapshot are normalized to have zero mean: $\bar{v}_{i_t} = v_{i_t} - \frac{1}{|V_t|} \sum_{j_t \in V_t} v_{j_t}$, where $j_t$ is a node in $V_t$ and $v_{j_t}$ is its vector representation.
(ii) Then the normalized vector $\bar{v}_{i_t}$ is projected to the common subspace spanned by $P$.
(iii) Furthermore, the low-dimensional embedding of each snapshot is decoded back to the original space to give $f_{i_t}$, which means that the relational patterns among the nodes in each snapshot are reconstructed. In fact, $v_{i_t}$ embodies the information about the neighbors of node $i$. Thus, $f_{i_t}$ reconstructs the relationship between node $i$ and the other nodes in snapshot $G_t$, in which positive values correspond to statistically strong connections among similar nodes, whereas negative values indicate a weak relation over the time window. Through the above encoding-decoding process, $f_{i_t}$ combines the local connection pattern with temporal information for global smoothing.
(iv) By filtering out the negative connections, we keep the strong connections of the reconstructed relationship, which gives the succinct and globally smoothed representation of each node $i$ at time $t$. The operation uses the step function $\delta(\cdot)$, which takes 1 for positive variables and 0 otherwise, and yields $\tilde{v}_{i_t}$, the normalized vector of node $i$ in snapshot $G_t$.
Finally, we cluster all the novel time-stamped representations of the nodes $\{\tilde{v}_{i_t} \mid t = 1, 2, \ldots, T, i = 1, 2, \ldots, |V|\}$ using existing clustering methods such as the K-means algorithm [42]. This way, detecting and tracing dynamic communities are done in one stage. Specifically, the labels of the clusters are constant over time in this case and the nodes that appear in several snapshots may have multiple community labels, which makes it very convenient to trace possible changes of each community's members as well as the evolution of communities.

Algorithm 1. Dynamic community discovery by a common subspace (ComSP).
Input: $G = \{G_t \mid t = 1, 2, \ldots, T\}$, the dynamic network; $s$, the selected number of principal components; $K$, the number of clusters
Output: the set of dynamic communities $C$
$A$ ← the matrix storing the historical edges in $G$
$M$ ← the similarity matrix produced by equation (6)
$P$ ← the principal matrix by equations (7) and (8)
$R$ ← ∅
for each snapshot $G_t$, $t = 1, 2, \ldots, T$ do
    for each node $i_t \in V_t$ do
        $\bar{v}_{i_t}$ ← the normalized vector by equation (9)
        $\tilde{v}_{i_t}$ ← the representation by equations (10)-(12)
        $R = R \cup \tilde{v}_{i_t}$
    end for
end for
$C$ ← the result of the K-means algorithm for $R$
Return $C$

Algorithm 1 gives a detailed description of the whole procedure, which is called ComSP for short. The time complexity of the algorithm is $O(T \times |V|^{2} \times K)$, which is lower than that of the state-of-the-art one-stage method TimeRank [34].

Table 1. Statistics of the datasets: the total number of nodes ($n$), the minimum number of nodes and edges in the snapshots ($n_{\min}$ and $m_{\min}$, respectively), the maximum number of nodes and edges in the snapshots ($n_{\max}$ and $m_{\max}$, respectively), the average ($\bar{n}$ and $\bar{m}$) and the variance ($n_{\mathrm{var}}$ and $m_{\mathrm{var}}$) of node and edge numbers, respectively, the number of dynamic communities ($K$) and the size of the dynamic network ($T$). Decimal part is omitted.
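The whole procedure can be sketched end to end in a few lines of NumPy. This is an illustrative reading under several assumptions, not the reference implementation: every node is present in every snapshot, the similarity is the common-neighbor matrix $A^{T}A$, the filter simply zeroes the non-positive entries of the decoded vector, and the small `kmeans` helper uses a deterministic farthest-point initialisation instead of a library K-means. All function names are hypothetical.

```python
import numpy as np

def common_subspace(snapshots, s):
    """Top-s principal directions (s x |V|) of the centred similarity matrix."""
    A = np.vstack(snapshots).astype(float)         # (T*|V|) x |V| history matrix
    M = A.T @ A                                    # common-neighbour similarity (assumed form)
    M_bar = M - M.mean(axis=1, keepdims=True)      # centre the column representations
    Phi = M_bar @ M_bar.T / M.shape[1]             # covariance (assumed scaling)
    _, eigvecs = np.linalg.eigh(Phi)               # eigenvalues in ascending order
    return eigvecs[:, ::-1][:, :s].T

def filter_snapshot(A_t, P):
    """Encode, decode and filter the node vectors of one snapshot (one row per node)."""
    V = np.asarray(A_t, dtype=float).T             # row i holds v_{i_t}
    V_bar = V - V.mean(axis=0, keepdims=True)      # (i) subtract the mean node vector
    Y = V_bar @ P.T                                # (ii) encode into the common subspace
    F = Y @ P                                      # (iii) decode back to |V| dimensions
    return np.where(F > 0, F, 0.0)                 # (iv) keep positive entries (assumed form)

def kmeans(X, k, n_iter=20):
    """Lloyd iterations with deterministic farthest-point initialisation."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[int(d.argmax())])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels

def comsp(snapshots, s, k):
    """Return a T x |V| array: entry (t, i) is the community label of node i at time t."""
    P = common_subspace(snapshots, s)
    R = np.vstack([filter_snapshot(A_t, P) for A_t in snapshots])
    return kmeans(R, k).reshape(len(snapshots), -1)
```

Because all time-stamped vectors are clustered together, a label keeps its meaning across snapshots, so reading the returned array column by column gives each node's affiliation trajectory without a separate matching step.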

Dataset description
We conduct experiments using synthetic networks and a real-world social network with different periods. Table 1 summarizes the statistics of the datasets.
Synthetic networks: a sequence of networks with four snapshots, designed to imitate the evolution of communities in dynamic networks. We use the stochastic block model (SBM) [43][44][45][46][47] to generate the four snapshots, where the cross-block connection probability is set to 0.2 and the in-block connection probability is 0.6. There are 4 communities with 250 nodes each in these snapshots. Then a perturbation is applied to the edges of the snapshots with probability 0.01. In particular, in the last snapshot, the community affiliations of two randomly selected nodes are changed compared to the previous snapshot. Hence, this sequence embodies typical operations on communities, i.e. apparent continuation and unnoticeable growth/contraction [48].
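A generator in the spirit of this synthetic setup can be sketched as follows. The text does not specify how the perturbation or the affiliation change is implemented, so two choices here are assumptions: the perturbation flips each node pair independently with the given probability, and a moved node has its edges resampled according to its new block. Function names are hypothetical.

```python
import numpy as np

def sbm_snapshot(sizes, p_in, p_out, rng):
    """Undirected, unweighted SBM graph; returns (adjacency, block labels)."""
    labels = np.repeat(np.arange(len(sizes)), sizes)
    same = labels[:, None] == labels[None, :]
    prob = np.where(same, p_in, p_out)
    upper = np.triu(rng.random(prob.shape) < prob, k=1)   # sample each pair once
    return (upper | upper.T).astype(int), labels

def perturb(A, p_flip, rng):
    """Flip each node pair with probability p_flip (assumed form of the perturbation)."""
    flip = np.triu(rng.random(A.shape) < p_flip, k=1)
    flip = flip | flip.T
    return np.where(flip, 1 - A, A)

def reassign(A, labels, nodes, new_label, p_in, p_out, rng):
    """Move `nodes` to block `new_label` and resample their edges (assumed mechanism)."""
    A, labels = A.copy(), labels.copy()
    for v in nodes:
        labels[v] = new_label
        prob = np.where(labels == new_label, p_in, p_out)
        row = (rng.random(len(labels)) < prob).astype(int)
        row[v] = 0                                         # no self-loop
        A[v, :] = row
        A[:, v] = row
    return A, labels

def synthetic_sequence(T=4, sizes=(250, 250, 250, 250), p_in=0.6, p_out=0.2,
                       p_flip=0.01, seed=0):
    """T perturbed snapshots; two random nodes change community in the last one."""
    rng = np.random.default_rng(seed)
    base, labels = sbm_snapshot(list(sizes), p_in, p_out, rng)
    snapshots = [perturb(base, p_flip, rng) for _ in range(T)]
    label_seq = [labels.copy() for _ in range(T)]
    movers = rng.choice(len(labels), size=2, replace=False)
    target = (labels[movers[0]] + 1) % len(sizes)
    snapshots[-1], label_seq[-1] = reassign(snapshots[-1], labels, movers, target,
                                            p_in, p_out, rng)
    return snapshots, label_seq
```

Calling `synthetic_sequence()` with the defaults mirrors the parameters quoted above (four blocks of 250 nodes, probabilities 0.6 and 0.2, perturbation 0.01); whether the original snapshots share a common base graph is not stated, so that part is a design choice of the sketch.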
Reddit: the raw dataset records social networking activities as threads of posts and comments associated with subreddits, e.g. 'dogs', 'tennis', which are the innate community labels for the networks composed of the users and their interactions. We take these subreddits as the ground truths (GT) of communities. Moreover, the community label of a node is determined by the majority of the subreddits its replies belong to. The communities of the evolving social network then behave in a dynamic way, e.g. birth/death, growth/contraction and merging/splitting. Note that, to reduce the influence of random sampling on the performance comparison, we use the entire dataset contributed by Pushshift. Based on the raw records, we construct three sequences of weekly unweighted and undirected networks (referred to as Reddit-I(a), Reddit-II and Reddit-III(a), respectively) from 1st September 2010 to 28th September 2010, and two longer sequences (Reddit-I(b), Reddit-III(b)) from September and October 2010. The details of how to construct these network sequences can be found at https://github.com/NightmareNyx/CommunityTracking.

Baselines and experimental settings
The baseline methods used for comparison cover all three categories of dynamic community discovery approaches [32], namely, the instant optimal approach, which considers the communities of each snapshot independently; the temporal trade-off approach, in which the communities at t are the result of a trade-off between the optimal solution at t and the known past (or global optimization); and the cross-time approach, which searches partition solutions for all snapshots simultaneously.
MultiGL [41,49] utilizes a multislice generalization of modularity to study the community structure of a dynamic network. Specifically, this method couples successive snapshots and rewrites the weights of intra-slice and inter-slice connections to realize the 'cross-time' category. Then the GenLouvain algorithm with the specified quality function is used to detect communities in the new network with |V| × T nodes.
TimeRank [34] also belongs to the 'cross-time' category. In this method, a time-weighted network with (|V| × T) nodes is produced by MutuRank [50] and then undergoes spectral clustering. It has two variants, TR-AOC and TR-NOC, depending on the type of relations between nodes. Since it was shown that TR-NOC is generally better than TR-AOC [34], we use TR-NOC as the baseline in our experiment.
Spectral clustering [29] is designed for a static community discovery. Here, we use it as an 'instant optimization' method. Thus we implement the two-stage detection with spectral clustering, which is referred to as ts-Spect approach. Specifically, we use the spectral clustering algorithm to discover communities in each snapshot, and match the communities in two consecutive snapshots. The match quality is measured with the Jaccard index.
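The matching stage of such a two-stage baseline can be sketched with the Jaccard index. The text only states that match quality is measured with the Jaccard index; the greedy one-to-one assignment, the `threshold` below which a community is treated as newly born, and the integer labels are assumptions of this sketch.

```python
def jaccard(a, b):
    """Jaccard index of two node sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def match_communities(prev, curr, threshold=0.3):
    """Carry labels from `prev` (label -> node set) over to `curr` (list of node sets).

    Pairs are matched greedily in order of decreasing Jaccard index; a current
    community without a sufficiently similar predecessor receives a fresh label.
    """
    pairs = sorted(
        ((jaccard(p_nodes, c_nodes), p_label, c_idx)
         for p_label, p_nodes in prev.items()
         for c_idx, c_nodes in enumerate(curr)),
        key=lambda item: -item[0],
    )
    assigned, used = {}, set()
    for score, p_label, c_idx in pairs:
        if score < threshold:
            break
        if p_label in used or c_idx in assigned:
            continue
        assigned[c_idx] = p_label
        used.add(p_label)
    next_label = max(prev, default=-1) + 1
    result = {}
    for c_idx, c_nodes in enumerate(curr):
        if c_idx in assigned:
            result[assigned[c_idx]] = set(c_nodes)
        else:
            result[next_label] = set(c_nodes)
            next_label += 1
    return result

prev = {0: {1, 2, 3}, 1: {4, 5, 6}}
curr = [{4, 5, 6, 7}, {1, 2}, {8, 9}]
tracked = match_communities(prev, curr)
```

Here the community {4, 5, 6, 7} inherits label 1, {1, 2} inherits label 0, and {8, 9} is treated as a new community. This pairwise comparison is the extra cost that one-stage methods avoid.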
GDG [30] uses a continuous density field to map each node into a geometric space and then detects static communities with a clustering algorithm. In order to conduct comparisons with two-stage methods, we use this algorithm to detect communities in each snapshot and then align the communities to trace dynamic communities.
PisCES [35] is a temporal trade-off approach, which achieves the global smoothing of the eigenvectors of each snapshot by a regularized optimization method. PisCES detects the clusters in each snapshot by applying the K-means algorithm on their corresponding smoothed eigenvectors. These two operations are performed simultaneously. Finally, we utilize the maximum matching degree to trace the communities detected by PisCES.
sE-NMF [36] uses evolutionary non-negative matrix factorization as a temporal smoothness framework for community detection. Then, by a greedy search procedure, sE-NMF uses mutual information (MI) to measure the similarity between clusters of successive snapshots and maps local clusters to dynamic communities for tracing.
CCPSO [37] formulates dynamic community detection as a multi-objective optimization problem. It regards the common knowledge between the optimal partitions of the current and previous snapshots as the consensus community, and then utilizes the consensus community to guide particle swarm optimization (PSO) for the current snapshot. At the last step, CCPSO detects the clusters for each snapshot. In this sense, CCPSO falls into the category of temporal trade-off approaches. Due to the lack of a tracing method in its original algorithm [37], we track communities in a way similar to PisCES.
Adj-Mat uses the adjacency matrix $A_t(:, i)$ as the representations of the nodes in each snapshot, which is the basic representation $v_{i_t}$ of our method. By performing K-means on all basic representations of each snapshot, Adj-Mat detects and traces dynamic communities.
SuRep [21] obtains the symmetric matrix M by averaging all snapshots. Then, similar to the proposed method, SuRep groups the novel time-stamped representation of the existing nodes and traces the dynamic communities.
Other methods such as non-negative tensor factorization [33] have been compared by Sarantopoulos et al [34], where TimeRank outperformed this method in most cases. We therefore omit it in our experiments.

Evaluation metrics
We evaluate the model performance with several metrics that are borrowed from the field of clustering and commonly used in testing and comparing methods. They are normalized mutual information (NMI) [51], adjusted Rand index (ARI) [52], and the BCubed versions of precision (P), recall (R) and the combined metric ($F_1$) [53], whose definitions and calculations are detailed in appendix B. In particular, NMI derives from entropy in information theory and calculates the mutual information between ground truth labels and labels from clustering results. However, the most important information we can get from clustering results is not the labels themselves, but rather which nodes are clustered together and which ones are not, which motivates the Rand index. To account for the importance of small clusters in imbalanced data, ARI was proposed. Different from the macroscopic measures NMI and ARI, BCubed measures the precision and recall of the predicted community affiliation of each node and averages them to compare predicted communities with GT communities.
Table 2 displays the NMI, ARI and BCubed indexes for all six datasets. There are several observations in the results. Apparently, no method excels in every dataset. However, our proposed model shows a competitive performance compared to the baselines. In particular, the improvement is prominent in ARI and $F_1$. More importantly, the proposed method is more stable than all nine baselines when confronted with different datasets. Surprisingly, TR-NOC, PisCES and Adj-Mat identify only one giant community in Reddit-I(a) (with 4 snapshots) and Reddit-I(b) (with 8 snapshots) respectively, which explains their ARI value of 0. Similar results are found for CCPSO in SBM, implying a lack of adaptiveness to diverse evolution patterns of communities.
In fact, the poor detection results of CCPSO in SBM can be traced to its initialization (PGLP) [54], which is more likely to identify the whole network as one community in networks where there are many links between communities, resulting in high recall but low precision and NMI. Another observation is that MultiGL performs poorly on all six datasets. The reason is that MultiGL takes little notice of the time coherence between snapshots, leading to high sensitivity to small changes of links. Moreover, our method is superior to SuRep on most of the datasets, though both methods use PCA for network reconstruction. We also note that in the simulated dynamic network dataset (SBM), the proposed method and SuRep fail to spot the two nodes whose community affiliations change in the fourth snapshot. We believe this is due to the denoising effect of the PCA operation, as trivial changes of a community are neglected in the encoding-decoding process.
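The BCubed scores discussed above can be sketched in pure Python. The paper defers its exact definitions to an appendix, so this sketch follows the standard item-wise BCubed definition as an assumption: for each time-stamped node, precision is the fraction of its predicted cluster that shares its ground-truth label, recall is the fraction of its ground-truth community that shares its predicted cluster, and both are averaged over all items. The function name is hypothetical.

```python
from collections import Counter

def bcubed(pred, truth):
    """Return (precision, recall, F1) for two equally long label sequences."""
    if len(pred) != len(truth) or not pred:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(pred)
    cluster_size = Counter(pred)            # size of each predicted cluster
    class_size = Counter(truth)             # size of each ground-truth community
    joint = Counter(zip(pred, truth))       # items sharing both cluster and community
    precision = sum(joint[(p, g)] / cluster_size[p] for p, g in zip(pred, truth)) / n
    recall = sum(joint[(p, g)] / class_size[g] for p, g in zip(pred, truth)) / n
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# One giant predicted community over two ground-truth communities.
giant = bcubed([0, 0, 0, 0], [0, 0, 1, 1])
# A perfect partition, up to renaming of the labels.
perfect = bcubed([1, 1, 0, 0], [0, 0, 1, 1])
```

The first example reproduces the pattern noted in the results: lumping everything into one community yields perfect recall but low precision.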

Results
Modularity is the most widely used measure to evaluate the compactness and topological consistency of communities. However, a major drawback of using such a gold-standard quality function is that it favors methods that are designed to maximize it, which may result in misleading comparisons. The stability of the detection methods can also be seen from the variations of the modularity [55] at each time step, as shown in figure 2. Comparing the modularity curves shows that the instant (static) community detection methods (ts-Spect and GDG) are considerably volatile, with large variations of modularity, while our method is more stable than the baselines in terms of both modularity changes and scores. Specifically, our method outperforms the baseline methods significantly, except for CCPSO and TR-NOC, which are competitive on the last two datasets. It should be noted that PisCES and CCPSO are designed for smoothing [32,35,37], leading to even curves in all datasets. This is especially visible in Reddit-III(b), where there is a remarkable structural change in the snapshot at time step 5 compared to the previous snapshots, and where the performance of the other methods varies evidently. However, global smoothing (e.g. PisCES) overlooks the evolution of communities, which usually causes poor performance in community identification.
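A per-snapshot modularity curve of the kind compared here can be computed with a short sketch. It assumes the standard Newman-Girvan modularity for undirected, unweighted graphs, which may differ in detail from the variant cited in the text; the function names are hypothetical.

```python
import numpy as np

def modularity(A, labels):
    """Newman-Girvan modularity of a partition of an undirected graph."""
    A = np.asarray(A, dtype=float)
    labels = np.asarray(labels)
    two_m = A.sum()                                   # twice the number of edges
    if two_m == 0:
        return 0.0
    k = A.sum(axis=1)                                 # node degrees
    same = labels[:, None] == labels[None, :]         # pairs in the same community
    return float(((A - np.outer(k, k) / two_m) * same).sum() / two_m)

def modularity_curve(snapshots, label_seq):
    """Modularity of each snapshot under its own partition."""
    return [modularity(A_t, l_t) for A_t, l_t in zip(snapshots, label_seq)]

# Two triangles joined by a single bridging edge (2-3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
A = np.zeros((6, 6), dtype=int)
for u, v in edges:
    A[u, v] = A[v, u] = 1

q_split = modularity(A, [0, 0, 0, 1, 1, 1])   # the two triangles as communities
q_giant = modularity(A, [0, 0, 0, 0, 0, 0])   # one giant community
```

Large jumps between consecutive entries of `modularity_curve` are one way to read off the volatility discussed above, while a partition into a single giant community always scores zero.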
Besides the ground-truth-based metrics and gold-standard indices, the partition quality can also be evaluated by comparing the distributions of community sizes produced by the different methods with those of the ground truths. Figure 3 shows the community sizes as box plots, where the red boxes are the ground-truth distributions. Compared with the others, our method and SuRep produce size distributions that closely approximate the ground-truth distribution, while, as shown in table 2, our method outperforms SuRep. Moreover, the distribution of community sizes in TR-NOC shows large variances and low medians, indicating that TR-NOC tends to generate large communities, which explains why TR-NOC has high recall. By contrast, the distribution in CCPSO has low variances and low medians, which means that it generates a lot of small communities, as shown in table 2, and explains why it has high precision.
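The size comparison can be sketched as follows, assuming each partition is available as a node-to-label mapping. The helper names and the chosen summary statistics (count, median, variance, largest size) are illustrative assumptions that mirror what a box plot conveys; they are not taken from the original evaluation.

```python
from collections import Counter
from statistics import median, pvariance


def community_sizes(membership):
    """Sorted list of community sizes for a node -> label mapping."""
    return sorted(Counter(membership.values()).values())


def size_summary(membership):
    """Summary statistics of the community-size distribution."""
    sizes = community_sizes(membership)
    return {
        "count": len(sizes),
        "median": median(sizes),
        "variance": pvariance(sizes),
        "largest": sizes[-1],
    }


# Toy ground truth with two communities of three nodes each.
truth = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
# A fragmented partition: many small communities.
fragmented = {0: "p", 1: "p", 2: "q", 3: "r", 4: "r", 5: "s"}
# An over-merged partition: a single large community.
merged = {n: "x" for n in range(6)}

truth_stats = size_summary(truth)
fragmented_stats = size_summary(fragmented)
merged_stats = size_summary(merged)
```

In this toy case the fragmented partition has a lower median size and more communities than the ground truth, while the merged partition has a single community of maximal size.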
Note that the tracking operation in two-stage methods, e.g. GDG and ts-Spect, may produce a large number of communities. We evaluate our method and the two-stage methods by averaging the metrics over the snapshots in Reddit; the results are shown in table 3. The proposed method shows competitive performance compared to the state-of-the-art two-stage methods. In particular, although CCPSO is superior to our method in terms of precision, our method has much higher recall values than CCPSO. Looking into the precisions of CCPSO, which are close to 1, we find that CCPSO generates a lot of communities in each snapshot, independent of the community alignment operation. As a result, CCPSO performs well in Reddit-III, which consists of a large number of small communities.
Table 3. Performance comparisons between ComSP and the two-stage methods on the identification of community affiliations of each snapshot in Reddit. The top-2 results are highlighted in bold, and the second-best results are also underlined.
To gain more insight into the differences between the proposed method and its rival TR-NOC, we visualize the partition results of the two methods for each snapshot of Reddit-I(a), with the ground truth as a reference. Figure 4 shows that the communities discovered by our method agree with the ground truth in tracking the community evolution. Specifically, there are three ground-truth clusters (labeled in three colors) and two major components in the first snapshot, which are successfully identified by the proposed method. However, TR-NOC neither uncovers the three clusters in the first snapshot nor tracks the split of the second largest component at the second time step. More noticeably, in the challenging situation at the third time step, where there are some bridging nodes between two communities (as shown in the ground truth), our method provides a largely clear separation between these two communities, with only one mislabeled node on the boundary. In contrast, TR-NOC mixes the two communities (in violet and green, respectively) together. We can conclude from the visualized results that TR-NOC inherently tends to form large communities, which explains why the recall of TR-NOC is remarkably higher than that of most of the baseline methods (as shown in figure 3).
Furthermore, we compare the performance of our method with TR-NOC in identifying the community affiliations of the nodes in the top 10% by average clustering coefficient in dynamic networks. Tracking the community affiliations of critical nodes has its own merits in fields such as neuroscience [56]. In the experiments, we use this task to evaluate the performance of our method. Specifically, we rank the nodes according to their average clustering coefficient in the period of interest, where the average clustering coefficient of a node is defined as its local clustering coefficient averaged over all snapshots. We then select the nodes whose average clustering coefficient is in the top 10% and identify the community memberships of these nodes in the Reddit datasets with different time spans. Table 4 shows the overall superiority of our method on the four datasets. Note that the recall of TR-NOC remains higher than ours in this application, as discussed above.
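The node selection described here can be sketched as follows. This is a minimal illustration under stated assumptions: each snapshot is an undirected simple graph stored as an adjacency dictionary, a node absent from a snapshot contributes a coefficient of 0 for that snapshot, and the top-10% cutoff is rounded up so that at least one node is selected. All function names are hypothetical.

```python
from itertools import combinations
import math


def to_adj(edges):
    """Build an undirected adjacency dict (node -> set of neighbours)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def local_clustering(adj, node):
    """Local clustering coefficient of `node` in one snapshot."""
    neigh = adj.get(node, set()) - {node}
    k = len(neigh)
    if k < 2:
        return 0.0
    # Count the edges that exist among the neighbours of `node`.
    links = sum(1 for u, v in combinations(neigh, 2) if v in adj.get(u, set()))
    return 2.0 * links / (k * (k - 1))


def average_clustering_over_time(snapshots):
    """Average each node's local clustering coefficient over all snapshots."""
    nodes = set().union(*(adj.keys() for adj in snapshots))
    n_steps = len(snapshots)
    return {n: sum(local_clustering(adj, n) for adj in snapshots) / n_steps
            for n in nodes}


def top_fraction(scores, fraction=0.10):
    """Nodes with the highest scores; ties are broken by node name (assumption)."""
    k = max(1, math.ceil(fraction * len(scores)))
    ranked = sorted(scores, key=lambda n: (-scores[n], str(n)))
    return ranked[:k]


# Two toy snapshots on four nodes.
s1 = to_adj([(0, 1), (1, 2), (0, 2), (2, 3)])
s2 = to_adj([(0, 1), (1, 2), (2, 3), (1, 3)])
avg = average_clustering_over_time([s1, s2])
critical = top_fraction(avg, fraction=0.5)
```

The community affiliations of the nodes in `critical` would then be compared against the ground truth with the same metrics as for the full node set.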

Conclusion
In this work, we have proposed a novel dynamic community discovery algorithm, which projects each snapshot into a common subspace to produce a global smoothing for each snapshot, and clusters all time-stamped nodes in dynamic networks. In this way, our method achieves the best stability in dynamic networks compared to the state-of-the-art methods. Another advantage of our method is that, by clustering the nodes in the projected subspace, community detection and tracing are performed in one stage. Compared with the two-stage methods, the one-stage method omits the matching stage for sequential snapshots and reduces the computational complexity. However, one limitation of our method, like other one-stage approaches, is that it is offline, which means we cannot yet detect communities of dynamic networks in real time.
We have evaluated the proposed method on both real and synthetic datasets and demonstrated that it performs more stably than the baselines. The sizes of the communities discovered by the proposed method are closer to the GT; that is, the number of communities is almost the same as in the ground truth, indicating that our method can successfully trace most of the communities. However, the recall of our method is inferior to that of the state-of-the-art method. We believe this is because clustering based on the constructed node similarity matrix tends to neglect some details of the temporal connection patterns among nodes. Mitigating the influence of scattered nodes and increasing the purity of clusters is an important part of our future work.

Appendix B. Evaluation metrics
Let |S| be the number of nodes, L(i) be the real community of node i, and |L(i)| be the size of the community L(i). The indicator ψ(i, j) is 1 if and only if nodes i and j are correctly detected in the same community, and 0 otherwise; the Bcubed measure is defined in terms of these quantities.
We construct a confusion matrix N with dimensions |C| × |L|, where |C| is the number of detected communities and |L| is the number of true communities. The entry $N_{cl}$ is the number of nodes that belong to both the cth predicted community and the lth true community, and $N_{c\cdot}$ and $N_{\cdot l}$ are the numbers of nodes in the cth predicted community and the lth true community, respectively. NMI is computed from these counts.
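A minimal Python sketch of both metrics is given below. It assumes the standard B-cubed precision and recall (the fraction of each node's predicted, respectively true, community that shares both its predicted and its true label, averaged over all |S| nodes) and the widely used confusion-matrix form of NMI. That these standard forms coincide with the definitions intended in this appendix is an assumption, and the function names `bcubed` and `nmi` are hypothetical.

```python
from collections import Counter
from math import log


def bcubed(pred, true):
    """B-cubed precision and recall for two node -> label mappings."""
    nodes = list(true)
    pred_size = Counter(pred[n] for n in nodes)
    true_size = Counter(true[n] for n in nodes)
    # joint[(c, l)] counts nodes in predicted community c and true community l,
    # i.e. for node i the number of j with psi(i, j) = 1 (i itself included).
    joint = Counter((pred[n], true[n]) for n in nodes)
    precision = sum(joint[(pred[n], true[n])] / pred_size[pred[n]]
                    for n in nodes) / len(nodes)
    recall = sum(joint[(pred[n], true[n])] / true_size[true[n]]
                 for n in nodes) / len(nodes)
    return precision, recall


def nmi(pred, true):
    """Normalized mutual information computed from the confusion matrix N."""
    nodes = list(true)
    n = len(nodes)
    n_c = Counter(pred[x] for x in nodes)              # row sums N_{c.}
    n_l = Counter(true[x] for x in nodes)              # column sums N_{.l}
    n_cl = Counter((pred[x], true[x]) for x in nodes)  # entries N_{cl}
    mutual = sum(v * log(v * n / (n_c[c] * n_l[l]))
                 for (c, l), v in n_cl.items())
    denom = (sum(v * log(v / n) for v in n_c.values())
             + sum(v * log(v / n) for v in n_l.values()))
    if denom == 0:
        return 1.0  # both partitions consist of a single community
    return -2.0 * mutual / denom


# Toy example: the true partition, a relabeled copy, and an over-merged one.
true = {0: "a", 1: "a", 2: "b", 3: "b"}
relabeled = {0: "x", 1: "x", 2: "y", 3: "y"}
merged = {0: "x", 1: "x", 2: "x", 3: "x"}

p_same, r_same = bcubed(relabeled, true)
p_merged, r_merged = bcubed(merged, true)
```

In the over-merged case the sketch yields full recall but halved precision and zero NMI, the same pattern reported for a method that identifies the whole network as one community.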