Polarization dynamics: a study of individuals shifting between political communities on social media

Individuals engaging on social media often tend to establish online communities where interactions predominantly occur among like-minded peers. While considerable efforts have been devoted to studying and delineating these communities, there has been limited attention directed towards individuals who diverge from these patterns. In this study, we examine the community structure of re-post networks within the context of a polarized political environment at two different times. We specifically identify individuals who consistently switch between opposing communities and analyze the key features that distinguish them. Our investigation focuses on two crucial aspects of these users: the topological properties of their interactions and the political bias in the content of their posts. Our analysis is based on a dataset comprising 2 million tweets related to US President Donald Trump, coupled with data from over 100 000 individual user accounts spanning the 2020 US presidential election year. Our findings indicate that individuals who switch communities exhibit disparities compared to those who remain within the same communities, both in terms of the topological aspects of their interaction patterns (pagerank, degree, betweenness centrality.) and in the sentiment bias of their content towards Donald Trump.


Introduction
Social media platforms have become influential spaces for political discourse, enabling users to express their opinions and engage with political figures.However, the exchange of ideas among users with opposing ideologies remains limited, primarily due to the formation of echo chambers [1].Echo chambers refer to homogeneous communities where individuals gather based on their shared beliefs, limiting their exposure to alternative viewpoints and hindering meaningful debates [2].These communities have been found in multiple platforms [3,4] and contexts such as the 2010 U.S. congressional midterm elections [5], the climate change debate [6] or the 2011 parliament elections in Germany [7].
Prior studies have extensively documented the adverse implications associated with closed communities and echo chambers.Specifically, these include the increase in negative discourse, hate speech and political extremism [8], confirmation bias (i.e. the users tendency to seek out and receive information that strengthens their preferred narrative) [4] and spreading of rumors and fake news [9,10].
The exploration of how users change their communities constitutes a vital area of investigation within computational social science [11].This phenomenon holds significant real-world implications, particularly as political elections can hinge upon individuals who swing from one political party to another [12].By studying the behavioral patterns of users within these dynamic social networks, valuable insights can be gained for the design of better communities.Additionally, the study of users who transition from a political community to other political community helps in the comprehension of polarization and depolarization processes, thus contributing to the development of strategies aimed at mitigating polarization and extreme ideological divisions, fostering a healthier political discourse.
There is limited research on individuals who transition between ideological communities.In this context, Wang et al. focus on understanding the "bandwagon fan" phenomenon in Reddit sport communities, showing how some sports fans start following a new sports team solely because of its recent success [11].Albanese et al. propose a method based on XGBoost, the Louvain method for community detection and topic modeling to detect users who change their political community on Twitter [13].Zolezzi et al. analyze the flow of users between these communities [14].However, their approaches do not incorporate sentiment analysis to determine whether these changes reflect only shifts in interaction patterns or also signify changes in political perspectives.Thus, the comprehensive understanding of opinion dynamic patterns of individuals switching from one political community to another remains relatively unexplored in the existing literature, to the best of our knowledge.
To address these gaps, we first curate a Twitter dataset with 2 million tweets about US president Donald Trump during the 2020 US presidential election year.We identify communities within two distinct time periods in the retweet interaction network.We propose a method to automatically detect users who change their political community.Then, we examine the salient features between users who shift their community and those who remain within their original community.We found that individuals who switch communities differ both in the topological aspects of their interaction patterns as well in the bias of their content towards Donald Trump.
The main contributions of this work are: • We propose a machine learning framework that effectively identifies and characterize users who undergo online community changes.We find significant correlations between community-shifting and changes in sentiment towards Donald Trump.
• We found that users who change their community have lower pagerank values, indicating reduced relevance in the online debate and their messages did not spread.
A possible interpretation of these results is that a user changes community when they do not have strong affinities with their community and their messages have no response.
The paper is organized as follows.In the "Data" section, we describe the data used in the study and how we collect it.In the "Method" section, we describe the unsupervised community detection framework, the sentiment analysis pipeline, the definition of community-shifting users and how we characterize the users interactions.In the "Results" section, we present the main findings of this work.Finally, we interpret these results in the "Conclusions" section.

Data
Twitter provided developers with a range of APIs to facilitate the development of applications.The Streaming API, which enabled real-time acquisition of tweet samples from the platform, allowed developers to filter the data based on language, terms, hashtags, and other criteria [15].The dataset used in this study includes attributes such as tweet ID, text, timestamp, user ID, username.In the case of retweets, information about the original tweet's user account is also included.
For the purpose of our research, we used a dataset of tweets mentioning Donald Trump in the United States during the 2020 presidential election year.The dataset was collected using the keyword "realDonaldTrump", the official Twitter account of President Donald Trump.The data collection process encompassed two distinct time intervals: May 9th to May 16th (hereafter referred as t 1 ) and June 10th to June 16th, 2020 (hereafter referred as t 2 ).A subset of the dataset employed in this study has been previously used in [13].To ensure consistency and alignment in the analysis, we follow the methodology of that work, considering only users who received or made more than five retweets in each time period.This threshold of five retweets serves as a trade-off between having a very low filter, which could compromise the robustness of community detection, and setting a very high filter, which could result in the exclusion of a significant amount of data.

Ethical Considerations and Data Availability
This study strictly adheres to ethical considerations by exclusively utilizing publicly available data [16].No data from private user accounts were collected or utilized.The tweet IDs were made publicly available for reproducibility and transparency purposes ‡. ‡ https://github.com/fedealbanese/community-changing-users/

Community detection
A graph community refers to a subset of nodes within a larger network that exhibits a higher density of connections among its constituent nodes compared to nodes outside the community [17].In the context of our Twitter dataset, it can be represented as a group of users who are more interconnected through retweets [18], for instance.
The interaction among individuals in our study is represented as a graph, denoted as G = (N, E), where nodes (N ) correspond to users and edges (E) represent retweets between them.To accurately capture the dynamics of multiple retweets from one user to another, we model the graph as a directed and weighted network.The edge points from the content creator to the retweeter, representing the flow of information through the network, originating from the content creator and spreading to the users who amplify the message [13].
The Stochastic Block Model (SBM) is a method that can be used for uncovering the latent communities within a network [19].In contrast to alternative and popular methods that maximize modularity [17], the SBM incorporates hierarchical priors and Bayesian inference techniques, enabling the model to provide a robust representation of the underlying data [20].On the other hand, models that focus on maximizing modularity do not adequately account for deviations from the null model in a statistically consistent manner [21].Therefore, such models primarily serve as descriptive tools and lack the ability to support inferential conclusions.In this study, we employ the SBM implemented in the graph-tool python library to discern communities within directed and weighted networks [22].

Sentiment Analysis
Sentiment analysis is a natural language processing technique employed to ascertain the sentiment or emotional tone expressed in a text.There are multiple approaches in order to classify the sentiment of a text such as Rule-Based algorithms [23], machine learning models like Support Vector Machine (SVM) or Naive Bayes (NB) [24] and deep learning networks [25].Considering the characteristics of our dataset (tweets with short and informal text), we use VADER (Valence Aware Dictionary and sEntiment Reasoner), that is a lexicon and rule-based sentiment analysis tool which was specifically design for social media content [26].For each tweet, VADER gives a sentiment score.According to the tool documentation, the score takes values from 1 to −1 where a score bigger than 0.05 is considered positive, a score between 0.05 and −0.05 neutral, and lower than −0.05 negative.The sentiment score of a user is defined as the average sentiment of their tweets.

Users interactions
By considering users as nodes and retweets as edges within the retweet network, we characterize users by computing the following graph metrics: (i) In-degree: the number of retweet the user did.
(ii) Out-degree: the number of times they were retweeted.
(iii) Pagerank: the importance of the user within the network.
(iv) Betweenness centrality: the influence the user has over the flow of information in the network.
These metrics portray the users relationship with the other users and distinguish the relevance in the network that they might have.

Communities in the retweet networks
We use SBM to identify the communities of users of the retweet networks.In consideration of the bipartisan nature of US politics, characterized by the presence of two predominant parties (Democrats and Republicans) [27,28], we configured the SBM parameters to identify two communities within the networks.
In Fig. 1, we visualize the retweet network and the results of the community detection algorithm for each time period.The users are concentrated in two communities, portraying the political polarization in that country.The graph visualizations were produced with Force Atlas 2 layout using Gephi software [29].
We would assign the labels "republican community" to the community that have the users "@realDonaldTrump" and "@Mike Pence" (the official Twitter accounts of the 2020 republican presidential and vice-presidential candidates) and "democratic community" to the community that have the users "@JoeBiden" and "@KamalaHarris" (the official accounts of the 2020 democratic presidential and vice-presidential candidates).The sentiment analysis results discussed in the next section further support this label assignment.
In table 1 we show main features of community analysis: the size and percentage of both groups at both times and how similar they are using the Jaccard Index.First, we should notice that republican communities are consistently bigger that the democratic ones.This imbalance can be attributed to the fact that data was collected using the keyword "realDonaldTrump", the Republican President of the United State at those times.Second, we can see most of users remain in both groups when they are compared at these two moments: 95.5% for republicans and 92.7% for democrats.Those who don't are the focus of our research in this work.

Sentiment of the communities
We hypothesize that when users from different political communities mention Donald Trump, their expressed sentiments will exhibit notable discrepancies.We would expect that the republican community would have a more positive sentiment when mentioning Donald Trump and the democratic community would have a more negative sentiment when mentioning Donald Trump.In Fig. 2, we show the statistical distribution of the sentiment values of the users when mentioning President Donald Trump in each community and time periods.Given the complex nature of the data, we use bootstrap sampling, a method in which we repetitively draw sample data with replacement form the data distribution to estimate mean values and confidence intervals which gives simple representation of the data [30].Fig. 2 depicts both the original distribution of sentiment values for t 1 (a) and t 2 (b) and the bootstrap sample of mean values, also for t 1 (c) and t 2 (d).Results show that the users of the republican community have positive mean sentiment scores, meanwhile users from the democratic community have negative mean sentiment scores confirming our hypothesis.These result is consistent for both time periods.We also found that there are statistically significant differences between the sentiment scores of the communities (p−value < 0.01) using the Mann-Whitney U non-parametric test [31] and the Kruskal-Wallis H-test [32] at t 1 and at t 2 .Also, it is worth noticing that the differences between the communities is stable thought time.
While our analysis has revealed distinct sentiment scores for each community when mentioning Trump, not all community members align with these sentiments.In particular, 71.6% of the users in the democratic community users express negativity, and 67.7% of the users in the republican community users have a positive sentiment score.Although these percentages represent a majority sentiment alignment, it's noteworthy that there exist users within these communities whose sentiments deviate from the prevailing trend.

Users who change their community
In this subsection, we aim to explore the differences between those users who change communities when comparing them at two different time instants t 1 and t 2 .Understanding the differences between them can provide insights on how individuals engage with other users from diverse ideological communities.Formally, we define a user as a community-shifting user (u c ) when their community at time t 1 and t 2 are different.On the other hand, we define a user as a non communityshifting user (u nc ) when their community at time t 1 and t 2 are the same.

Topological analysis
From the community detection performed previously, we can identify those users who change their community.Table 2 shows the graph metrics at t 1 for the users who change their community (u c ) and those than remains in their own (u nc ).Taking into account that u nc represent between 5% and 8% of the total members of each communities, mean values and confidence intervals were estimated through a subsampling approach.Specifically, we conducted 10.000 iterations of random sub-sampling with replacement, measuring the mean of each sub-sample.The resulting distribution of means was then utilized to report the mean and standard deviation values.
We found that, in all four metrics, there are statistically significant differences (p − value < 0.01) between u c and u nc using the Mann-Whitney U nonparametric test [31] and the Kruskal-Wallis H-test [32].It should be notices that u c have smaller values than u nc in all cases.These behavioral patterns can be attributed to several factors.Firstly, users who will switch communities display lower values of in-degree and outdegree, meaning that they are not active retweeters nor retweeted.Also, the fact that u c had statistically lower pagerank values mean that those users were less relevant to the tweeter conversation and their messages did not spread in their original community.A possible interpretation of these results is that a user changes community when they do not have strong affinities with their community and their messages have no response.

Sentiment score
In this section we first analyze the mean sentiment scores towards Donald Trump for users from Republican and Democratic communities who either maintained (u nc ) or changed their community (u c ) affiliations at t 1 .We would like to know if the differences between these two groups found in topological properties have a counterpart in this feature of tweets content.
Equivalently to table 2, we report the mean values and confidence intervals using the same sub-sampling approach.For Republican users, the mean sentiment score at t 1 for those who remained within their community was s nc = 0.081 ± 0.433, while for the observed differences in sentiment scores between users who changed and those who did not were not statistically significant, which is consistent with the fact that the confidence intervals exceed the mean values in an order of magnitude.Therefore there is limited predictive capability based on sentiment scores, in contrast to the topological features.
On the other hand, we also characterize the variation of the average sentiment score between the first (t 1 ) and second (t 2 ) time periods of the users when mentioning President Donald Trump in each community (the sentiment score at t 2 minus the sentiment score at t 1 ).
Table 3 show the variation in average sentiment scores between t 1 and t 2 for the users of the republican and democratic community.Results show that republican users that maintain their community (those who were in the republican community at t 1 and remain there at t 2 ) change their sentiment score on average 0.035 ± 0.082.But users who were in the republican community at t 1 and change their community to the democratic one at t 2 show variation in their average sentiment score of −0.066 ± 0.074.An equivalent effect was measure for democrat users.Therefore, we observe that users who change their communities also change their sentiment scores different than the user who maintain their community.
These variations in the average sentiment score of the users who change their community are statistically significantly different (p − value < 0.01) than those of the users who did not change their community, according to the Mann-Whitney U nonparametric test and the Kruskal-Wallis H-test.The results show that the classification of users who undergo community changes aligns with significant changes in sentiment when mentioning Donald Trump between the first and second time periods.Despite the limited predictive capability of the sentiment scores, results demonstrate a relationship between changing the community and changing their sentiment towards Donald Trump.

Conclusions
In conclusion, our research has provided insights into the dynamics of online community changes and their implications for user behavior within social networks.Through the introduction of a machine learning framework, we have identified and characterized users undergoing community shifts.
Analyzing the changes in sentiment expressed by users during community-shifts can provide valuable insights into how these transitions could impact individuals' political views.Importantly, we have found significant correlations between community shifts and alterations in sentiment towards Donald Trump, shedding light on the interplay between ideological shifts and online interactions.
Furthermore, our findings have showed that users who undergo community changes exhibit significantly lower pagerank values, suggesting a small relevance in the online discussion and limited message propagation thought the network.One plausible interpretation of these findings suggests that users may transition to a different community when their affinity to their current community weakens, and their messages have little resonance within their peers.
This work present an analysis of sentiment in tweets and user interactions, leveraging topological network metrics.The results contributes to advancing our understanding of the dynamics of online social networks and provides computational tools for effectively identifying and studying users who undergo community changes.
Future research involve the analysis of diverse datasets across various contexts, countries, and social networks.Also, examining an extended dataset spanning multiple years, employing different time windows, could offer valuable insights.

Figure 1 .
Figure 1.Retweet network of the first time period from May 9th to May 16th, t 1 , (a) and second time period from June 10th to June 16th, t 2 , (b) during 2020 US presidential election.Each node is a Twitter user and each edge (directed and weighted) represents the retweets between two given users.The nodes are colored depending on its community: the "republican community" users are in red and the "democratic community" users are in blue.

Figure 2 .
Figure 2. Sentiment score of tweets belonging to the republican and the democratic community for the first time period from May 9th to May 16th, t 1 , (a) and second time period from June 10th to June 16th, t 2 , (b) during 2020 US presidential election.(c) and (d) show the Bootsrap sample of mean values at the same times.

Table 1 .
The total number of users of each community for both time periods (t 1 and t 2 ).

Table 2 .
Mean graph features of users who change their community (u c ) and users who have the same community over time (u nc ) at t 1 .Differences are statistically significant (Mann-Whitney U nonparametric test and Kruskal-Wallis H-test).

Table 3 .
Delta of the average sentiment score between the users of the republican and democratic community at t 1 and t 2 (sentiment t2 −sentiment t1 .Note: Differences are statistically significant (Mann-Whitney U nonparametric test and Kruskal-Wallis H-test).communities, it was s c = 0.080 ± 0.401.Similarly, Democratic users who retained their community affiliation exhibited a mean sentiment score of s nc = −0.102± 0.408, whereas those who underwent a community change showed a mean score of s c = −0.087± 0.456.Conducting Mann-Whitney U nonparametric test and the Kruskal-Wallis H-test statistical tests, we determined that