Modularity-based approach for tracking communities in dynamic social networks

Community detection is a crucial task to unravel the intricate dynamics of online social networks. The emergence of these networks has dramatically increased the volume and speed of interactions among users, presenting researchers with unprecedented opportunities to explore and analyze the underlying structure of social communities. Despite a growing interest in tracking the evolution of groups of users in real-world social networks, the predominant focus of community detection efforts has been on communities within static networks. In this paper, we introduce a novel framework for tracking communities over time in a dynamic network, where a series of significant events is identified for each community. Our framework adopts a modularity-based strategy and does not require a predefined threshold, leading to a more accurate and robust tracking of dynamic communities. We validated the efficacy of our framework through extensive experiments on synthetic networks featuring embedded events. The results indicate that our framework can outperform the state-of-the-art methods. Furthermore, we utilized the proposed approach on a Twitter network comprising over 60,000 users and 5 million tweets throughout 2020, showcasing its potential in identifying dynamic communities in real-world scenarios. The proposed framework can be applied to different social networks and provides a valuable tool to gain deeper insights into the evolution of communities in dynamic social networks.


Introduction
With the rapid rise of different social networking systems, such as online social networks, mobile networks [1], and collaboration networks [2], social network analysis has emerged as a critical research focus. One of the most intriguing tasks in social network analysis is community detection, which helps to identify meaningful group structures in the network. Although community detection is well covered in recent academic literature, the literature mainly focuses on detecting communities in static networks, i.e., networks without temporal properties. However, real-world networks, such as online social networks, are not static: they often possess temporal properties, as nodes and edges can appear and disappear. Conventional social network analysis methods aggregate all observed interactions over a period of time and build a static network to represent all the interactions over that period [3]. Inevitably, such a representation is unable to capture temporal changes in the communities. For example, in a Twitter retweet network [4,5], the evolution of the community structure could be caused by new users starting to retweet a specific account or by members of a community who suddenly stop doing so. Treating these types of networks as static can lead to invalid associations between the users, e.g., a user may show similarities with a community at a certain time and later move towards a different community. Since this user would represent the only connection between the two communities, omitting the temporal properties risks aggregating these communities into a single large community. In addition, time can also affect content semantically: for example, the meaning of the hashtag #MeToo changed dramatically with the emergence of the #MeToo social movement. Thus, enabling community detection in such dynamic networks is crucial.
A dynamic network can be represented by a time series of static networks called snapshots [6]. Each snapshot corresponds to the interactions aggregated over a defined period, such as a week or an hour. An intuitive method to detect communities from a dynamic network partitioned in this way consists of applying well-studied static community detection algorithms to each snapshot.
Next, communities in dynamic social networks can be tracked by identifying the events that shape the evolution of a community over time [7,8,9]. This step involves matching the communities found in different snapshots through an algorithm, usually based on the similarity of community members. Therefore, an arbitrary threshold is required to determine if two or more communities found in different snapshots represent the same community at different times. The main drawback is that the final result may change drastically depending on the selected threshold value: a high similarity threshold means less tolerance for fluctuating members, whereas a similarity threshold that is too low relates dissimilar communities to each other. The scarcity of threshold-free frameworks hinders the real-world adoption of community tracking, which could instead prove useful in countering threats that plague online social platforms, such as disinformation [10,11] and information manipulation [12,13].
In this paper, we propose a threshold-free framework to track the evolution and structure of communities over multiple snapshots of a dynamic network. Given the communities detected at each snapshot, we represent the entire network as a similarity network, where the nodes are the static communities found, and the weight of the edges corresponds to their similarity. By applying Local Modularity Optimization [14], we reveal the groups of nodes having high modularity and similarity. In this manner, we turn the community matching task into a modularity optimization problem where dynamic communities are found by optimizing modularity locally on all nodes. The node groups found embody the evolution of a community over time. Finally, each group is disentangled to reconstruct the temporal evolution of the represented dynamic community.
Unlike most approaches, our framework does not require any threshold to be set.
To evaluate the proposed framework, we used four synthetic dynamic networks [7] containing embedded communities and events. Results show that, in most cases, our approach performs better than other state-of-the-art community tracking methods. In the second part of our evaluation, we show how the proposed approach can be applied to a real-world co-hashtag Twitter network [15] to extract relevant dynamic information about communities.
In the next section, we provide a brief overview of existing research in dynamic community tracking. In Section 3, we provide a detailed description of the proposed framework. The evaluation and comparison of our framework with other approaches are given in Section 5. Furthermore, Section 5 presents the results on a real-world Twitter co-hashtag network. Section 6 concludes the paper with a summary and suggestions for future work.

Related work
Since most social networks evolve over time [16], their evolution has attracted increasing interest from researchers over the last few years. In the following, we report works that, like the one described in this paper, are based on existing static community detection algorithms that are adapted to deal with the temporal nature of social networks [17]. This approach is appropriate for social networks with highly dynamic community structures [18]. One of the first works in which static network snapshots have been used to track the evolution of the communities over time is [19]. The authors proposed a method based on agglomerative hierarchical clustering to identify and track stable clusters over time. The authors of [20] extended the clique percolation method to track events in the evolution of dynamic networks. After building joint graphs for pairs of consecutive snapshots, they matched the clique-based communities obtained using an autocorrelation function to find the overlap between two states of a community. The authors of [21] described a community event identification strategy between communities detected in two consecutive snapshots that is implemented as bit operations. In [7], a community matching strategy is reported to efficiently identify and track dynamic communities in multiple snapshots through weighted bipartite matching, whereas [18] provided an event-based framework to detect transitions between communities in consecutive snapshots. In a later work from the same authors [8], the event definition formula has been improved to track community transitions throughout the observation time, no longer restricting it to consecutive snapshots. The authors of [9] proposed the Group Evolution Discovery (GED) framework. This method, in addition to the similarity of community members, considers node position and importance within the community to match communities and identify the evolution of the community in successive snapshots. More recent research [22,23] proposed a method to detect and model the evolution of a community based on a novel similarity measure, named mutual transition. In [17], a method for the Identification of Community Evolution by Mapping (ICEM) has been presented. In this approach, community evolution is tracked through the community members, which are stored in a hash map. More information on [7,8,18,9,22,23,17] is provided in Section 5, as these studies have been used for comparative purposes.

Methods
Before presenting the framework, let us describe a few general concepts related to dynamic social networks.

Dynamic communities
We represent a dynamic social network as a set of $n$ graphs $\{G_1, G_2, \ldots, G_n\}$, where $G_i$ includes only the set of nodes and edges in a particular snapshot $i$. Next, we define $C_i = \{C_{i1}, C_{i2}, \ldots, C_{ik_i}\}$ as the set of $k_i$ communities found at the $i$-th snapshot. The problem we address is to find a set of $m$ dynamic communities $D = \{D_1, D_2, \ldots, D_m\}$ that occur in one or more snapshots. Each dynamic community $D_j$ is represented by a timeline of its constituent communities ordered in time. Figure 1 shows an example of timelines for five dynamic communities. [...] dynamic communities independently or as a single entity composed of the union of all their communities; in this case, $D_{(2,3)}$ and $D_{(4,5)}$.

Critical events
In order to track communities and their structural changes over time, we need to define structural events to describe the evolutionary behavior of dynamic social networks. In [20] and [21], some types of events are proposed, but they do not cover all the possible ways in which a community can evolve. This paper uses the same models of critical events as proposed in [7]: Growth (a community gains new members), Contraction (a community loses some members), Merging (two or more communities merge into one), Splitting (a community splits into two or more communities), Birth (a community appears without any predecessor), and Death (a community has no successor in later snapshots). It is worth noting that a community may not necessarily be present in every snapshot. Considering the example in Figure 1, the community in $D_2$ is observed at the 1st snapshot and then again at the 3rd snapshot. This "intermittence" can be caused by the behavior of community members or depend on the duration granularity of each snapshot.

Similarity
When tracking dynamic communities and the evolutionary events of their constituent communities, the key to finding relationships between communities from different snapshots is similarity. Most previous works require setting a similarity threshold: communities are considered related to each other only if their similarity value crosses this threshold. In dynamic social networks where the structure evolves, unstable communities may gradually lose members while gaining new ones. Such communities can last long, even if all of their original members have left. An excessively high similarity threshold fails to identify this type of community, whereas a low similarity threshold relates disparate communities to each other.
The framework proposed in this paper exploits similarity, but does not require any threshold to be set. Here, we compute the similarity as the overlap of members between two communities. Let $C_i$ and $C_j$ denote the sets of communities found in the snapshots $i$ and $j$, where $i \neq j$. Further, let $\alpha \in C_i$ and $\beta \in C_j$. We define the overlap between communities $\alpha$ and $\beta$ as:

$$overlap(\alpha, \beta) = \frac{|\alpha \cap \beta|}{\min(|\alpha|, |\beta|)}$$

where $|\alpha|$ and $|\beta|$ are the number of members in $\alpha$ and $\beta$, respectively, while $|\alpha \cap \beta|$ is the number of members they share. Although we opted for overlap as a similarity metric, the approach described here is not constrained to any particular metric. Therefore, it is possible to use a different measure as needed, for example, Jaccard as used in [7].
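The member-overlap similarity can be illustrated with a few lines of Python. This is a minimal sketch, not the authors' implementation: it assumes that the overlap is normalized by the size of the smaller community (the overlap coefficient), and the function names and the example sets are hypothetical. The Jaccard coefficient is included only to show how a different metric could be swapped in.

```python
def overlap(alpha: set, beta: set) -> float:
    """Overlap coefficient (assumed form): shared members divided by
    the size of the smaller of the two communities."""
    if not alpha or not beta:
        return 0.0
    return len(alpha & beta) / min(len(alpha), len(beta))


def jaccard(alpha: set, beta: set) -> float:
    """Jaccard coefficient, shown as an alternative similarity metric."""
    union = alpha | beta
    return len(alpha & beta) / len(union) if union else 0.0


# Hypothetical communities taken from two different snapshots.
alpha = {"u1", "u2", "u3", "u4"}
beta = {"u3", "u4", "u5", "u6", "u7", "u8"}

print(overlap(alpha, beta))  # 2 shared members / min(4, 6) = 0.5
print(jaccard(alpha, beta))  # 2 shared members / 8 distinct members = 0.25
```

Under this assumption, a small community that is fully contained in a larger one gets the maximum similarity of 1, whereas Jaccard penalizes the size difference.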
Figure 2: Workflow of the proposed approach. Starting from a network, we proceed by dividing it into snapshots. Then, we apply a static community detection algorithm on each snapshot. We build the community similarity network using the communities found in the previous step. Through local modularity optimization, dynamic communities are found. For each dynamic community, the constituent communities are reordered by their origin snapshot, bringing out critical events (disentangling and reconstruction of the temporal evolution of communities).

Community tracking
In order to identify the dynamic communities, the communities extracted at each snapshot need to be matched with the communities belonging to other snapshots. Figure 2 reports the steps that make up the framework.
Starting with an initial network, we first break it down into snapshots. Second, we apply a static community detection method on each snapshot. Third, we construct the community similarity network, where each static community is represented by a node. This step does not consider the source snapshots of the various static communities, thus allowing matching between communities found in non-consecutive snapshots. Then, we find the communities forming the dynamic communities by optimizing modularity locally on all nodes. Finally, we reconstruct the temporal evolution by reordering the communities in time and deriving the events that occurred.
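The first step of the workflow, breaking the network down into snapshots, can be sketched as follows. This is only an illustration under stated assumptions: interactions are given as hypothetical `(u, v, timestamp)` triples, the snapshot length `window` is expressed in the same unit as the timestamps, and edges are treated as undirected and unweighted.

```python
from collections import defaultdict


def split_into_snapshots(interactions, window):
    """Group (u, v, timestamp) interactions into consecutive snapshots.

    Each snapshot is a set of undirected edges (frozensets of endpoints)
    observed within one time window of length `window`.
    """
    if not interactions:
        return []
    t0 = min(t for _, _, t in interactions)
    buckets = defaultdict(set)
    for u, v, t in interactions:
        index = int((t - t0) // window)  # snapshot the interaction falls into
        buckets[index].add(frozenset((u, v)))
    # Keep empty windows so that snapshot indices stay aligned with time.
    return [buckets.get(i, set()) for i in range(max(buckets) + 1)]


# Hypothetical interactions with integer timestamps and a window of 10 units.
interactions = [("a", "b", 0), ("b", "c", 5), ("a", "c", 12), ("a", "b", 25)]
snapshots = split_into_snapshots(interactions, window=10)
print(len(snapshots))  # 3
```

A static community detection algorithm would then be run independently on each element of `snapshots`.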

Community similarity network
In our framework, the matching step relies on a weighted undirected community similarity network $G_c = (V_c, E_c, W_c)$, where the vertex set (nodes) $V_c$ is made up of all the communities found at each snapshot, $E_c$ is the set of edges (links) linking the communities, and $W_c$ is the set of weights assigned to each edge, corresponding to the similarity value between the linked communities. An edge exists between two communities only if they originate from different snapshots and their similarity value is greater than zero.
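The construction of the community similarity network can be sketched in plain Python. The snippet below is an illustrative sketch rather than the authors' code: nodes are identified by hypothetical `(snapshot, index)` pairs, the similarity is assumed to be the overlap coefficient, and the network is stored as a dictionary of weighted edges.

```python
from itertools import combinations


def overlap(alpha, beta):
    """Overlap coefficient (assumed similarity metric)."""
    if not alpha or not beta:
        return 0.0
    return len(alpha & beta) / min(len(alpha), len(beta))


def build_similarity_network(snapshots):
    """Build the community similarity network.

    `snapshots` is a list with one entry per snapshot; each entry is a list
    of member sets (the static communities found at that snapshot).
    Returns (nodes, edges): nodes maps (snapshot, index) -> members,
    edges maps (node_a, node_b) -> similarity weight.
    """
    nodes = {
        (s, c): frozenset(members)
        for s, communities in enumerate(snapshots)
        for c, members in enumerate(communities)
    }
    edges = {}
    for a, b in combinations(sorted(nodes), 2):
        if a[0] == b[0]:
            continue  # communities of the same snapshot are never linked
        weight = overlap(nodes[a], nodes[b])
        if weight > 0:
            edges[(a, b)] = weight
    return nodes, edges


# Hypothetical static communities for three snapshots.
snapshots = [[{1, 2, 3}, {4, 5}], [{1, 2}, {4, 5, 6}], [{2, 3, 7}]]
nodes, edges = build_similarity_network(snapshots)
for (a, b), w in edges.items():
    print(a, b, round(w, 3))
```

Note that the edge between `(0, 0)` and `(2, 0)` links communities from non-consecutive snapshots, which is what allows intermittent communities to be matched.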

Local modularity optimization
After building the similarity network, we identify groups of similar nodes by optimizing modularity locally on all nodes. Modularity is a scalar value in the range [−0.5, 1] that measures the density of links within communities compared to links between communities. Modularity is the most widely used metric to evaluate community structures [24]. By optimizing modularity, we seek a community assignment for each node in the network such that modularity $M$ is maximized, using the function defined by:

$$M = \frac{1}{2m} \sum_{ij} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)$$

where $A$ is the adjacency matrix with $A_{ij}$ representing the weight of the edge between the vertices $i$ and $j$, $k_i = \sum_j A_{ij}$ is the sum of the weights of the edges attached to the vertex $i$, $c_i$ is the community to which vertex $i$ belongs, $\delta$ is the Kronecker delta function ($\delta(c_i, c_j)$ returns 1 if $c_i = c_j$ and 0 otherwise), and $m = \frac{1}{2} \sum_{ij} A_{ij}$ is the sum of the weights of all edges in the graph. In modularity optimization, weights are interpreted as the strength of connection, influencing the optimization process. To optimize modularity, each vertex in the network is first assigned to its own community. Then, each vertex $i$ is removed from its community and tentatively moved to the community of each neighbor $j$. When a vertex is assigned to a new community, the modularity increase is calculated:

$$\Delta M = \left[ \frac{\Sigma_{in} + 2 k_{i,in}}{2m} - \left( \frac{\Sigma_{tot} + k_i}{2m} \right)^2 \right] - \left[ \frac{\Sigma_{in}}{2m} - \left( \frac{\Sigma_{tot}}{2m} \right)^2 - \left( \frac{k_i}{2m} \right)^2 \right]$$

where $\Sigma_{in}$ is the sum of all the weights of the links within the community into which $i$ is moving, $\Sigma_{tot}$ is the sum of the weights of the links incident to the nodes of that community, $k_i$ is the weighted degree of $i$, $k_{i,in}$ is the sum of the weights of the edges between $i$ and the other nodes in the community into which $i$ is moving, and $m$ is the sum of the weights of all edges in the network. Then, once $\Delta M$ is calculated for all communities connected to $i$, the vertex $i$ is placed in the community that yields the largest increase in modularity. If no increase is possible, $i$ remains in its original community. The process described above is applied to all vertices until no modularity increase occurs. This is the same approach as the Louvain algorithm in its first phase [14]. However, differently from our approach, Louvain proceeds with a second phase, building a new network where the vertices represent the communities found in the previous phase. Then it applies the first phase to it, further optimizing the modularity. A complete run of both phases is called a pass. Such passes are carried out repeatedly until a maximum of modularity is achieved. As our goal is to aggregate vertices with high similarity and not to reach maximum modularity, we leverage only the first phase. Moreover, due to the underlying characteristics of the community similarity network, applying the entire Louvain procedure may yield a coarse-grained set of communities, in which some members can be poorly related to others. Therefore, by applying local modularity optimization to the vertices of the community similarity network, we obtain groups with high modularity composed of similar communities that correspond to the dynamic communities featured in the dynamic network.
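The local optimization described above, i.e., the first phase of Louvain without the aggregation phase, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the graph is a hypothetical symmetric dictionary `{node: {neighbor: weight}}` without self-loops, and candidate communities are ranked with the simplified gain $k_{i,in} - \Sigma_{tot} k_i / (2m)$, which is proportional to the modularity increase obtained by inserting an isolated vertex into a community.

```python
def local_modularity_optimization(adj):
    """First phase of Louvain on a weighted undirected graph.

    `adj` maps each node to a dict {neighbor: weight}; it must be symmetric
    and contain no self-loops. Returns a dict node -> community label.
    """
    two_m = sum(sum(nbrs.values()) for nbrs in adj.values())  # equals 2m
    community = {v: v for v in adj}  # every vertex starts in its own community
    if two_m == 0:
        return community
    degree = {v: sum(nbrs.values()) for v, nbrs in adj.items()}
    sigma_tot = dict(degree)  # total weighted degree of each community

    improved = True
    while improved:
        improved = False
        for v in adj:
            current = community[v]
            # Weight of the links between v and each neighboring community.
            links = {}
            for u, w in adj[v].items():
                links[community[u]] = links.get(community[u], 0.0) + w
            # Remove v from its community before evaluating the candidates.
            sigma_tot[current] -= degree[v]
            best = current
            best_gain = links.get(current, 0.0) - sigma_tot[current] * degree[v] / two_m
            for c, k_in in links.items():
                gain = k_in - sigma_tot[c] * degree[v] / two_m
                if gain > best_gain + 1e-12:  # move only on a strict improvement
                    best, best_gain = c, gain
            sigma_tot[best] += degree[v]
            if best != current:
                community[v] = best
                improved = True
    return community


# Hypothetical similarity network: two triangles joined by a weak edge.
adj = {
    "a": {"b": 1.0, "c": 1.0},
    "b": {"a": 1.0, "c": 1.0},
    "c": {"a": 1.0, "b": 1.0, "d": 0.1},
    "d": {"c": 0.1, "e": 1.0, "f": 1.0},
    "e": {"d": 1.0, "f": 1.0},
    "f": {"d": 1.0, "e": 1.0},
}
print(local_modularity_optimization(adj))
```

In the framework, the nodes of `adj` would be the static communities and the weights their similarity values, so each resulting group corresponds to one dynamic community.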

Events reconstruction
Once the dynamic communities have been extracted from the community similarity network, we proceed with the identification of the events that describe their evolutionary behavior. Given a group of communities $D$, let us define more formally each critical event as follows:
• Growth: a community $C_i \in D$ grows at the $i$-th snapshot if there exists a community $C_j \in D$ at the $j$-th snapshot, where $i > j$, that shares members with $C_i$ and is smaller than $C_i$.
• Contraction: a community $C_i \in D$ contracts at the $i$-th snapshot if there exists a community $C_j \in D$ at the $j$-th snapshot, where $i > j$, that shares members with $C_i$ and is bigger than $C_i$.
• Merging: a community $C_i$ is the result of a merge at the $i$-th snapshot if there exists a set of communities $C_j^{set} = \{C_{j1}, C_{j2}, \ldots, C_{jn}\}$, where $C_j^{set} \subseteq D$ and $i > j$, such that each community in $C_j^{set}$ shares members with $C_i$.
• Splitting: a community $C_j$ splits at the $i$-th snapshot if there exists a set of communities $C_i^{set} = \{C_{i1}, C_{i2}, \ldots, C_{in}\}$, where $C_i^{set} \subseteq D$, $C_j \in D$, and $i > j$, such that each community in $C_i^{set}$ shares members with $C_j$.
• Birth: a community $C_i$ is born at the $i$-th snapshot if there does not exist a community $C_j \in D$ at the $j$-th snapshot, where $i > j$, that shares members with $C_i$.
• Death: a community $C_i$ is dead at the $i$-th snapshot if there does not exist a community $C_j \in D$ at the $j$-th snapshot, where $j > i$, that shares members with $C_i$.
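The event definitions above can be turned into a small labeling routine. The following sketch is only one possible reading of those definitions: it assumes that a community is compared with the member-sharing communities of the closest earlier (or later) snapshot in which any exist, and the data layout, a list of hypothetical `(snapshot, members)` pairs for one dynamic community, is chosen for illustration.

```python
def reconstruct_events(dynamic_community):
    """Label the critical events of one dynamic community.

    `dynamic_community` is a list of (snapshot, members) pairs.
    Returns a list of (snapshot, event) pairs ordered in time.
    """
    timeline = sorted(dynamic_community, key=lambda item: item[0])
    events = []
    for snap, members in timeline:
        earlier = [(s, m) for s, m in timeline if s < snap and m & members]
        later = [(s, m) for s, m in timeline if s > snap and m & members]

        if not earlier:
            events.append((snap, "birth"))
        else:
            # Assumption: compare with the closest earlier snapshot only.
            last = max(s for s, _ in earlier)
            predecessors = [m for s, m in earlier if s == last]
            if len(predecessors) > 1:
                events.append((snap, "merging"))
            elif len(members) > len(predecessors[0]):
                events.append((snap, "growth"))
            elif len(members) < len(predecessors[0]):
                events.append((snap, "contraction"))

        if not later:
            events.append((snap, "death"))
        else:
            nxt = min(s for s, _ in later)
            successors = [m for s, m in later if s == nxt]
            if len(successors) > 1:
                events.append((snap, "splitting"))
    return events


# Hypothetical dynamic community observed over four snapshots.
D = [
    (0, {1, 2, 3}),
    (1, {1, 2, 3, 4, 5}),
    (2, {1, 2}),
    (2, {3, 4, 5}),
    (3, {1, 2, 3, 4}),
]
for event in reconstruct_events(D):
    print(event)
```

In this example, the community grows at snapshot 1, splits into two smaller communities at snapshot 2, and the two parts merge again at snapshot 3, after which no successor is observed.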

Experiments and Results
In this section, we describe the experiments to compare our framework with the state of the art. In addition, we provide an example based on real-world data to show the potential of our framework in extracting valuable information from social interactions.

Datasets
To validate our method, we use synthetic and real-world datasets. For benchmarking purposes, we use the synthetic networks contained in the four benchmark datasets proposed in [7]. These synthetic networks have been constructed through a dynamic extension of the static LFR benchmark [25] to model different types of evolutionary behavior of communities over time. Each graph comprises 15,000 vertices and contains five static networks, which means there are five snapshots to simulate evolving communities. For each snapshot, the ground-truth information about communities is available. The nodes in each of the four synthetic networks have a mean degree of 20, a maximum degree of 40, and a mixing parameter value of µ = 0.2, which controls the overlap between communities. Furthermore, at each snapshot, 20% of the nodes changed their memberships to reflect the natural movement of members between communities over time. Synthetic networks were designed to contain all types of community evolution events:
• BirthDeath: 40 new communities are constructed to replace 40 existing communities.
• ExpandContract: 40 randomly selected communities expand or contract by 25% of their size at each snapshot.
• MergeSplit: 40 communities are randomly selected to be split, while 40 mergings of two random communities happen.
• Intermittent: 10% of existing communities are hidden, and thus unobserved, at each snapshot.

Evaluation metric and experimental setup
As noted previously, our framework is independent of the choice of the static community detection algorithm applied at each snapshot. To produce a fair evaluation, we used the same approach for static community detection for all the considered competitors, namely the Louvain algorithm [14], choosing the hierarchical level at which the number of communities most closely matches the number of communities in the ground-truth at the 1st snapshot. To evaluate the performance in dynamic community tracking, we adopted the same approach as in [7,26,27]. For each evaluated competitor, dynamic community tracking was performed considering two scenarios: i) using the ground-truth memberships of the static communities at each snapshot; ii) using the static communities derived by Louvain as mentioned above. Then, we used Normalized Mutual Information (NMI) to compare the memberships of the dynamic communities found in these two scenarios.
The NMI is a well-known measure from information theory, which quantifies the similarity of two clusterings. It is defined as:

$$NMI(X, Y) = \frac{H(X) + H(Y) - H(X, Y)}{\left( H(X) + H(Y) \right) / 2}$$

where $H(X)$ is the entropy of the random variable $X$ associated with an identified community, $H(Y)$ is the entropy of the random variable $Y$ associated with the ground-truth, and $H(X, Y)$ is the joint entropy. An NMI closer to 1 indicates higher similarity, and thus a dynamic community detection that is more robust to noise in the information related to the static communities available at each snapshot. Such resilience is paramount to perform reliable dynamic community tracking in real-world data.
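As an illustration of how the entropies enter the score, the NMI of two membership assignments can be computed directly from label counts. The sketch below assumes the normalization by the mean of the two entropies and uses hypothetical function names and toy labels; it is not the evaluation code used in the experiments.

```python
from collections import Counter
from math import log


def entropy(counts, n):
    """Shannon entropy (natural logarithm) of a distribution given as counts."""
    return -sum(c / n * log(c / n) for c in counts if c)


def nmi(labels_x, labels_y):
    """Normalized mutual information between two label assignments,
    normalized by the mean of the two entropies (assumed form)."""
    n = len(labels_x)
    h_x = entropy(Counter(labels_x).values(), n)
    h_y = entropy(Counter(labels_y).values(), n)
    h_xy = entropy(Counter(zip(labels_x, labels_y)).values(), n)
    if h_x + h_y == 0:
        return 1.0  # both assignments put every element in a single group
    return 2 * (h_x + h_y - h_xy) / (h_x + h_y)


# Same partition with different label names: maximal agreement.
print(nmi([0, 0, 1, 1], [1, 1, 0, 0]))
# Statistically independent partitions: agreement close to zero.
print(nmi([0, 0, 1, 1], [0, 1, 0, 1]))
```

The score depends only on how elements are grouped, not on the label names, which is why it suits the comparison of dynamic community memberships obtained in the two scenarios.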
Since the datasets consist of five snapshots, we obtained five values of NMI per dataset. For each dataset, we calculated the NMI in five sequential experiments. While for the first evaluation the timeline is composed solely of the 1st snapshot, for each succeeding experiment we added the subsequent snapshot, ending with the evaluation where the entire timeline composed of five snapshots is analyzed. As mentioned above, we compared our framework with other state-of-the-art approaches using the same community sets derived by Louvain for an objective and consistent comparison. Since some of these approaches require a threshold to be set, we tested them with different threshold values: 0.1, 0.3, and 0.5. To select a threshold value to use for comparison, we calculated the average of the five NMI values for each dataset. Then, for each approach, we selected the threshold value that gave the highest average NMI.

Algorithms
In this section, we briefly describe the state-of-the-art algorithms used for comparison with our proposed method. The approaches we describe are those proposed by: Greene [7], Takaffoli [8,18], Brodka [9], Tajeuna [22,23], and Mohammadmosaferi [17]. Essentially, all these frameworks, including ours, begin by considering the dynamic network as a series of snapshots. Then, they independently identify community structures at each of these snapshots using a community detection algorithm. However, the approaches differ in the strategy and measure of similarity used to track and detect the changes that communities may undergo over time.
Greene et al. [7]: in this work, the authors proposed a heuristic threshold-based method, which allows for many-to-many mappings between communities across different time steps. The strategy proceeds as follows. First, a static community finding algorithm is applied at each snapshot. Each community belonging to the 1st snapshot is assigned to a dynamic community. Then, the communities of each subsequent snapshot are compared with the front community of each dynamic community. The front community of a dynamic community is the community found in the most recent snapshot belonging to that dynamic community. To perform the matching, the authors used the Jaccard coefficient for binary sets [28]:

$$sim(\alpha, \beta) = \frac{|\alpha \cap \beta|}{|\alpha \cup \beta|}$$

The pair is matched if the similarity exceeds a matching threshold k. Furthermore, the authors assumed that a community that emerged at a certain snapshot t is considered dissolved if no match is found for d consecutive snapshots. This condition allows the discovery of evolving communities at non-consecutive snapshots. For our comparisons, we set d = ∞ to enable matching with all non-consecutive snapshots. The approach gave the best results with k = 0.1.
Takaffoli et al. [8,18]: in their method, after producing sets of static communities for each snapshot, the authors looked for critical events occurring at consecutive and non-consecutive snapshots. The tracking process compares communities at distinct snapshots through a similarity measure that is subject to a similarity threshold k. In their work, the authors automatically determined the similarity threshold k using a text-mining approach, since they assessed their approach on networks incorporating content information. Since we evaluate synthetic networks, it is not possible to determine the threshold automatically. From our experiments, we obtained the best results for k = 0.3.
Brodka et al. [9]: the authors of this work developed a framework called group evolution discovery (GED), which can identify overlapping communities. The authors of previous approaches relied on a member-based similarity metric to identify the changes and evolution that communities can undergo. In this work, this measure has been extended by including a topological metric in the comparison, based on $NI_\alpha(x)$, the importance of node x within the community α. This importance can be any centrality metric, e.g., degree centrality, social position, betweenness centrality, PageRank, etc. Although the comparison is made at consecutive timestamps, the resulting inclusion measure helps to track overlapping and non-overlapping communities. The method requires two threshold values, k and j. We used the same value for k and j and obtained the best results for k, j = 0.1.
Tajeuna et al. [22,23]: in this approach, each community has been represented as a vector (transition probability vector) containing information on the number of members in common between communities over time. Then, the authors compared the vectors corresponding to different communities: given two communities α and β and their transition probability vectors $v_\alpha$ and $v_\beta$, the similarity between the communities is computed from the probabilities $p_{\alpha,x}$ and $p_{\beta,x}$, which are the components of the vectors $v_\alpha$ and $v_\beta$, respectively. The threshold value k is set automatically as the predicted point of intersection of two Gamma curves based on the non-zero values obtained by rating the similarity of two transition probability vectors.
Mohammadmosaferi et al. [17]: the authors of this work introduced a novel method for Identification of Community Evolution by Mapping (ICEM). In this approach, a hash map is used to map the members of each community into a pair, which includes the snapshot and a community index. From the second snapshot, ICEM builds a similarity list for each community and determines the evolution of a community based on that list. In addition to common critical events, the authors identified partial events resulting from partial similarity between two communities α and β, where α is a community from the i-th snapshot and β a community from the j-th snapshot, with i < j. The communities α and β are partially similar if sim(α, β) > k and sim(β, α) > k, while they are very similar if sim(α, β) > j. Therefore, k and j are the thresholds to identify partially similar and very similar communities, respectively. Since our concern is not to distinguish different types of critical events in this evaluation, we set j = 0.5 and obtained the best results for k = 0.1.

Results of the evaluation and comparison
Table 6 shows the NMI values per dataset achieved by the proposed framework at each snapshot. We observe that the worst results are obtained with the MergeSplit dataset. Merge and split events are the ones that most distort the structure of dynamic communities and are, therefore, the most difficult to track. GED and Takaffoli obtain the worst results. The poor results obtained by GED may depend on its inability to relate non-consecutive communities to each other, thus leaving the newly identified dynamic communities without a proper follow-up. Interestingly, all the evaluated approaches achieve their worst results with the MergeSplit dataset. However, even for this dataset, our method does not fall below 0.96 in terms of NMI. The design of our framework seems effective in handling this type of event since, instead of performing sequential matching, it aims to optimize modularity in a network where the temporal component is omitted. This makes it more robust to structural changes in communities across snapshots. Moreover, using a threshold parameter also affects the number of dynamic communities found by a given method. Figure 4 shows the number of communities found by each method on the MergeSplit dataset. Our proposed method, along with Tajeuna and Takaffoli, produces a number of communities quite close to that present in the ground-truth. The reason why Greene and ICEM find fewer communities is related to the low threshold value set. A low value results in more matches between communities, merging them together. On the other hand, GED, as noted earlier, suffers from comparing only communities present in consecutive snapshots.

Application to Real-World Data
In our second evaluation, we used the dataset provided by [15] to test the behavior of the proposed framework on a real dataset. This is a dataset containing the activity of 63,358 fake Twitter accounts, which produced 5,457,758 tweets during 2020. By the term fake account, we refer to those social media accounts that contain false information or pretend to be a real person or organization [29]. This kind of dataset is ideal for experimenting with the proposed method, as the tweets from the accounts were not collected using predefined languages or keywords that tied the activity to specific topics. We preprocessed the data to produce a weighted undirected graph for each snapshot, which in this instance equals one day of activity. Daily graphs were obtained by constructing a co-hashtag network: in this graph, two users are connected if they used the same hashtag on the same day. The weight of each edge is given by the number of hashtags used in common between two users. To find the static communities at each snapshot, we used the Order Statistics Local Optimization Method (OSLOM) [30], which is based on measuring the statistical significance of communities compared to a null model without community structure [31]. After identifying statistically significant communities by agglomerating neighborhood nodes, OSLOM performs several iterations of adjustments, like node deletion or addition, in order to increase the significance of the communities. We chose OSLOM for its good performance in community detection in online social networks [32,33]. Its main drawback is that it tends to find low-dimensional static communities [34] that may not represent the meaningful community structure of the network [35]. This drawback is not relevant to the experiment since, through the proposed framework, the identified static communities will be joined into bigger dynamic communities.
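The construction of a daily co-hashtag graph described above can be sketched as follows. The input format, an iterable of hypothetical `(user, hashtag)` pairs for a single day, and the case-insensitive comparison of hashtags are assumptions made for illustration; the preprocessing actually applied to the dataset may differ.

```python
from collections import defaultdict
from itertools import combinations


def co_hashtag_graph(daily_usage):
    """Build the weighted undirected co-hashtag graph of one day.

    `daily_usage` is an iterable of (user, hashtag) pairs observed that day.
    Returns a dict mapping (user_a, user_b), with user_a < user_b, to the
    number of distinct hashtags the two users have in common.
    """
    users_by_tag = defaultdict(set)
    for user, tag in daily_usage:
        users_by_tag[tag.lower()].add(user)  # assumption: case-insensitive tags

    weights = defaultdict(int)
    for users in users_by_tag.values():
        for u, v in combinations(sorted(users), 2):
            weights[(u, v)] += 1  # one more hashtag shared by u and v
    return dict(weights)


# Hypothetical usage records for a single day.
usage = [
    ("u1", "#btc"), ("u2", "#btc"),
    ("u1", "#bitcoin"), ("u2", "#Bitcoin"),
    ("u3", "#beach"), ("u1", "#beach"),
]
print(co_hashtag_graph(usage))  # {('u1', 'u2'): 2, ('u1', 'u3'): 1}
```

One such graph per day would then be passed to the static community detection algorithm.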
Having the static communities identified by OSLOM, we applied our framework, obtaining 103 dynamic communities.
To investigate the behavior of the found dynamic communities over time, we extracted the hashtags used in each static community, namely the hashtags used daily by each dynamic community. Then, we introduced a new metric related to a dynamic community: the average hashtag overlap. Given two sets of hashtags $h_1$ and $h_2$, the hashtag overlap is found as:

$$overlap(h_1, h_2) = \frac{|h_1 \cap h_2|}{\min(|h_1|, |h_2|)}$$

The average hashtag overlap of a dynamic community D is found by averaging all the hashtag overlaps between the static communities belonging to D. An acceptable alternative metric might have been the one used in [8], where instead of keywords or hashtags, the authors relied on topics. Moreover, since this is a dataset composed of the activity of sold fake accounts during a one-year period, it is not unlikely that communities of accounts used quite different sets of hashtags on different days. Despite this, a set of hashtags effectively represents a community on a specific day.
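The average hashtag overlap can be illustrated with a short sketch. It assumes that the overlap of two hashtag sets is normalized by the size of the smaller set and that the average is taken over all pairs of static communities in the dynamic community; the function names and the example hashtag sets are hypothetical.

```python
from itertools import combinations


def hashtag_overlap(h1, h2):
    """Overlap of two hashtag sets (assumed form: shared hashtags divided
    by the size of the smaller set)."""
    if not h1 or not h2:
        return 0.0
    return len(h1 & h2) / min(len(h1), len(h2))


def average_hashtag_overlap(hashtag_sets):
    """Average the pairwise hashtag overlap over all static communities
    (one hashtag set each) belonging to a dynamic community."""
    pairs = list(combinations(hashtag_sets, 2))
    if not pairs:
        return 0.0
    return sum(hashtag_overlap(a, b) for a, b in pairs) / len(pairs)


# Hypothetical hashtag sets of three static communities of one dynamic community.
daily_hashtags = [
    {"#apple", "#iphone"},
    {"#apple", "#iphone", "#iphone11"},
    {"#btc", "#bitcoin"},
]
print(round(average_hashtag_overlap(daily_hashtags), 3))  # 0.333
```

A value close to 1 would indicate that the dynamic community keeps using the same hashtags over time, while a value close to 0 would indicate frequent topic changes.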
By averaging the hashtags overlap values among all static communities belonging to a dynamic community, we obtain insights into the characteristics of the identified dynamic communities. In Figure 5, we plotted the identified dynamic communities against the overall number of members on the x axis and the days of activity on the y axis. Moreover, through color, we reported the average hashtags overlap of each dynamic community. A correlation between the number of members and the days of activity emerges from Figure 5, i.e., the more users there are, the more days a community tends to be active, and vice versa.
In addition, dynamic communities with many members and many days of activity also exhibit a low average hashtags overlap, meaning that during the one-year period, these dynamic communities discussed disparate topics. A low average hashtags overlap does not imply that a dynamic community is not relevant, as this metric may be affected by other factors, such as the strategy that the accounts may have employed.
As an example, in Figure 6, we reported a time glimpse of a dynamic community ranging from February 9 to February 13, 2020. To better understand its temporal behavior, each community is represented by its ten most used hashtags. We observe that initially, there is only one community using mostly generic hashtags such as #relax, #beach, and #nature and some more specific ones such as #apple and #iphone. The next day, this community splits into two communities, one of which turns out to be larger than the initial one due to the arrival of new users. This community carries over the hashtags about Apple products used on the previous day, introducing others such as #iphone11 and #iphone12. The other community, instead, focuses on cryptocurrency and blockchain technology through the hashtags #btc, #bitcoin and #blockchain.
On the following day, a portion of users from each of the two communities merges into a new community centered on hemp-related topics: #cannabis, #cbd, #cannabisnews. However, this community expands on the subsequent day, reverting to more generic hashtags. This behavior continues the next day, where we find #funny as well as #valentines day, since it is Valentine's Eve.
This example shows how the topics addressed by users of an online social network can vary drastically from day to day. Observing this kind of phenomenon and, more generally, studying the behavior of a community over time is possible only by taking into account its temporal properties, that is, by analyzing it as a dynamic community.
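The per-day summaries used in the Figure 6 example, where each community is represented by its ten most used hashtags, can be sketched as follows. The input layout (one list of hashtags per tweet), the lowercasing, and the function name `top_hashtags` are assumptions for illustration.

```python
from collections import Counter


def top_hashtags(hashtag_lists, k=10):
    """Return the k most used hashtags of a community on a given day.

    hashtag_lists: iterable of hashtag lists, e.g. one list per tweet
    posted by the community's members that day (hypothetical layout).
    """
    counts = Counter(tag.lower() for tags in hashtag_lists for tag in tags)
    return [tag for tag, _ in counts.most_common(k)]


# Toy tweets of one static community on one day (invented data)
community_day = [
    ["#relax", "#beach"],
    ["#relax", "#apple"],
    ["#relax", "#beach", "#nature"],
]
summary = top_hashtags(community_day, k=2)
```

Comparing such summaries across consecutive days is one simple way to see the topic shifts described above.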

Conclusions
In this paper, we have described a framework to track dynamic communities and their evolution over time. Unlike most frameworks proposed in the literature, the community matching phase does not require a threshold value since it is based on modularity optimization, which is applied on a weighted undirected community similarity network. We used synthetic graphs with embedded events to evaluate the framework and compared the results with those obtained using other state-of-the-art frameworks. Notably, for most competitor frameworks we first performed a series of experiments to determine the best threshold value.
Nevertheless, our framework achieved scores comparable with those of the other frameworks, and outperformed them in several instances, without the need to compute a threshold value. In addition, we have presented a preliminary evaluation on a real-world Twitter network. Our framework revealed 103 dynamic communities with different characteristics. Experiments on this network suggest that, especially for networks built from online social media, the topics covered by a dynamic community can vary widely over time. Since our method is free from any threshold value, it provides consistent outcomes regardless of how frequently community membership fluctuates. Moreover, the proposed framework is independent of the choice of the underlying community detection algorithm, so the latter may be chosen based on the properties of the network, e.g., weighted or unweighted, directed or undirected. When used on online social media, the proposed framework is useful because it facilitates the identification of user groups with similar interests or behaviors. In addition, tracking dynamic communities can assist in studying the behavior of influencers and key opinion leaders within a community from a temporal perspective, thus providing a valuable addition to research applications.

Figure 1: Example of five dynamic communities tracked over five snapshots, featuring growth, contraction, merging, splitting and death events.

(two or more communities merge into a new one), Splitting (a community is split into two or more new ones), Birth (a new community appears), Death (a community disappears). Examples of such events are shown in Figure 1: the community in D1 grows at the 2nd snapshot; the community in D4 contracts at the 3rd snapshot; the community in D3 merges with the community in D2 at the 4th snapshot; the community in D4 splits into two communities, respectively, in D4 and D5 at the 5th snapshot; the communities in D1, D2 and D4 were born at the 1st snapshot, the community in D3 at the 2nd snapshot, and the community in D5 at the 5th snapshot; the community in D1 dies at the 4th snapshot.

Figure 3: NMI of the proposed dynamic community tracking framework and state-of-the-art methods on the four synthetic datasets.

Figure 4: Number of communities found on the MergeSplit synthetic dynamic network, by the proposed dynamic community tracking method and state-of-the-art methods, relative to the number of communities in the ground-truth.

Figure 5: Dynamic communities plotted by overall number of members (x axis) and days of activity (y axis). Each point represents a dynamic community and the color encodes the average hashtags overlap value.

Figure 6: Alluvial diagram representing the structure of the analyzed dynamic community.

Table 1: NMI values obtained for each dataset with the Greene framework.

Table 2: NMI values obtained for each dataset with the Takaffoli framework.

Table 3: NMI values obtained for each dataset with the Brodka framework.

Table 4: NMI values obtained for each dataset with the Tajeuna framework.

Table 5: NMI values obtained for each dataset with the Mohammadmosaferi framework.

Table 6: NMI values obtained for each dataset with our framework.