Profile Update: The Effects of Identity Disclosure on Network Connections and Language

Our social identities determine how we interact and engage with the world surrounding us. In online settings, individuals can make these identities explicit by including them in their public biography, possibly signaling a change to what is important to them and how they should be viewed. Here, we perform the first large-scale study on Twitter that examines behavioral changes following identity signal addition on Twitter profiles. Combining social networks with NLP and quasi-experimental analyses, we discover that after disclosing an identity on their profiles, users (1) generate more tweets containing language that aligns with their identity and (2) connect more to same-identity users. We also examine whether adding an identity signal increases the number of offensive replies and find that (3) the combined effect of disclosing identity via both tweets and profiles is associated with a reduced number of offensive replies from others.


Introduction
Our social identities such as age, gender, or occupation play a crucial role in shaping how we express thoughts and opinions through language, and in turn, how others interact with us. In social media platforms such as Twitter, the identity that one chooses to associate oneself with can influence behaviors such as the topics one engages with or the ties one forms. One can choose to explicitly disclose their identity through various means, but the effects and consequences of such actions are largely unknown. In this paper, we perform a large-scale study to understand how explicitly disclosing social identities leads to changes in the interactions of one's social network.
Identity disclosure and management is an essential part of online behavior (Joinson et al., 2010; Pavalanathan and De Choudhury, 2015), as individuals navigate what aspects of themselves are salient to others. In more public platforms like Twitter, individuals must weigh how to present themselves based on the mix of audiences who may find them (Marwick and boyd, 2011; Bazarova and Choi, 2014; Duguay, 2016). People may explicitly express social identities in social media by including phrases related to the identity in profile descriptions, as shown in Figure 1. Profile descriptions, similar to posts, also contain rich textual features associated with the user's social identity (Li et al., 2014; Priante et al., 2016; Wilson and Wun, 2020; Wang et al., 2019). Crucially, these profiles are not static: individuals add and remove identity markers from their bios to emphasize new or specific aspects of themselves, such as political affiliations (Jones, 2021) or gender pronouns (Tucker and Jones, 2023; Jiang et al., 2022).
Disclosed social identities can affect how users are perceived and targeted by others. Prior studies have drawn connections between the disclosure of identities, especially marginalized or minority identities, and identity-based hate or cyberbullying, which can hinder people from fully expressing themselves and sometimes even force them to hide identities online (Haimson et al., 2015; Jhaver et al., 2018). However, not all identities are marginalized, and the potentially varied outcomes of identity disclosure are yet to be quantified.
To understand the effects of identity disclosure, we conduct a large-scale quasi-experimental study on hundreds of thousands of users who updated their profiles to disclose a particular social identity. We observe that while overall tweet activity levels remain stable post-disclosure, users' tweets contain significantly higher volumes of identity-relevant language, which we further dissect into topic and style properties. We demonstrate that this disclosure is also associated with social network changes: users actively engage more with similar-identity individuals following disclosure. Finally, we examine the number of offensive replies received from others during pre- and post-disclosure periods, where we show that, contrary to existing studies (Chan, 2022; Meyer, 2003), the addition of identity signals in profiles did not lead to increased levels of received offensiveness, even for identity categories known to be prone to targeted offensiveness such as sexual and gender minorities. Overall, our findings suggest profile-based identity disclosure is an active process signaling future behavior changes in the priorities of a user.

Social Identities and Self-disclosure
Prior work has examined identity disclosure from the perspectives of language, networks, and social interactions, particularly in online spaces. We build on these studies to formulate hypotheses that examine whether disclosure of social identities leads to changes in both the user's own behavior and how the user is perceived by others.

Social Identities and Language
Sociolinguistics has long associated language with social identities of the speaker (Labov, 1966; Eckert, 2000; Pomerantz, 2007). Specifically, Bucholtz and Hall (2005) propose a framework for understanding identity through linguistic interaction, where they suggest that identities can be indexed through linguistic aspects such as style, stances, and labels (Schilling-Estes, 2004). This framework also posits that the display of identity through language can be an intentional form of agency to meet social goals (Duranti, 2008). From this perspective, we can assert that the intention to disclose one's social identity can be reflected through their language, which may be indicative of the identity.
Our first hypothesis examines the relationship between identity disclosed through language and through profile updates. We hypothesize that the modification of one's profile to disclose a particular social identity will motivate the user to tune their linguistic style to accommodate their presented identity.
H1: The addition of a social identity on a Twitter profile will lead to posting more identity-aligned tweets compared to a reference group.

Networked Effects of Identity Disclosure
People present themselves to others by controlling the amount of information available to maintain a publicly desirable image, a concept known as impression management (Goffman, 1959). This management helps achieve socially desirable goals such as maintaining reputation (Schlenker and Britt, 1999; Zivnuska et al., 2004). In social networking platforms such as Twitter or Instagram, the downstream effects of impression management can be translated into measurable outcomes such as maintaining connections with "friends" in the platform who can provide desirable effects such as social support or access to information (Lampe et al., 2007; Yan et al., 2022). We thus expect that the addition of social identity in one's profile reflects a desire to connect with like-minded others, which results in an increased effort to forge connections with people of the same identity.
H2: The addition of a social identity on a Twitter profile will directly lead to establishing more network connections with users of the same identity compared to a reference group.

Consequences of Identity Disclosure
Identity disclosure can lead to undesirable consequences. Privacy is a major risk of disclosure in online spaces (Ampong et al., 2018). Also, the disclosure of minority or marginalized identities can lead to being targeted for online harassment. For example, nonbinary users consider disclosing their identity on social media a stressful event (Haimson et al., 2015; Haimson and Veinot, 2020).
As our final hypothesis, we test whether disclosure of one's identity can lead to increased hostility directed at the user. Specifically, we measure whether a user becomes a target of offensive content following the addition of their identity on the profile.
H3: The addition of a social identity on a Twitter profile will result in receiving more offensive replies compared to a reference group.

Identifying Identity Change in Profiles
Here, we describe our pipeline for identifying instances of Twitter users disclosing social identities on their profiles. An overview of the data collection and processing is shown in Figure 2.

Identifying Twitter Profile Changes
We first identify a set of users who have added signals of their social identity to their Twitter profiles. This information is unobtainable using just the Twitter API, as it only returns a user's profile information at the time of the API call and does not provide a chronological timeline of profile changes. We instead use the Twitter Decahose dataset, which contains a 10% sample of all Twitter activity over a period of more than 12 months. We identify all activities of every user between April 2020 and April 2021. Each tweet or retweet object includes various metadata, one of which is the user's profile description at the time of the tweet. We collect all instances of user profiles for our Twitter users and sort them in chronological order, enabling us to identify when a user changed their profile. We remove verified accounts and users whose language is set to a language other than English, resulting in 15,215,776 users and 73,048,466 unique profiles.
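The timeline construction described above can be sketched as follows. This is a minimal illustration, not the authors' code: the `(timestamp, description)` record layout and the function names are assumptions, and with a 10% sample the first tweet observed with a new description only approximates the true time of the change.

```python
from datetime import datetime

def profile_timeline(records):
    """Collapse per-tweet profile observations into distinct profile versions.

    `records` is a hypothetical list of (ISO timestamp, profile description)
    pairs, one per observed tweet or retweet of a single user.
    """
    ordered = sorted(records, key=lambda r: datetime.fromisoformat(r[0]))
    versions = []
    for ts, desc in ordered:
        # A new version starts whenever the description differs from the last one kept.
        if not versions or versions[-1][1] != desc:
            versions.append((ts, desc))
    return versions

def count_changes(records):
    """Number of observed profile changes (distinct versions minus the first)."""
    return max(len(profile_timeline(records)) - 1, 0)

example = [
    ("2020-06-01T10:00:00", "coffee lover"),
    ("2020-05-01T09:00:00", "coffee lover"),
    ("2020-07-15T12:00:00", "coffee lover | she/her"),
]
print(profile_timeline(example))
print(count_changes(example))
```

Users with exactly one observed change are the candidates that the later filtering step considers.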

Categorizing Social Identities
Deciding what counts as a social identity can be challenging. Here, we start from an initial set of social categories based on two relevant studies. We also create subcategory-level identities within each category, which is the basic unit of social identity in this study. This process results in a total of ten categories (Table 1) and 44 subcategories of identities (Appendix Table 2). Further details on the subcategories are in Appendix Section A.
After categorizing n-grams into identity categories and subcategories, we follow the approach from prior work (Yoder et al., 2020; Pathak et al., 2021) and construct regular expressions for each category and subcategory based on the n-grams to improve precision. For example, when constructing regular expressions for age, we ensure that the corresponding phrases include identifiers such as 'years old' or 'y/o'.
Next, we identify a set of users who have changed their profiles to disclose their social identity. We run our regular expressions on every unique profile to determine whether a profile is associated with a particular identity. We assign multiple labels if a user's profile is associated with multiple identity categories (e.g., "18yo -he/him -father of two wonderful children"), but leave out profiles that our method labels as belonging to multiple subcategories within the same category when they are meant to be mutually exclusive (e.g., age: "18yo -30y/o"; political affiliation: "devout democrat -conservative"). Based on the mapped identities per profile, we can identify all users who satisfy the following two conditions: (1) each user has made only one change in their profile during the 1-year observation period, and (2) the only change is the addition of a new social identity, i.e., the phrase indicating the identity should exist only in the changed profile and not in the previous version. This filtering results in a set of 283,793 users who added a single new social identity through Twitter profiles, which we refer to as IDENTITYADDED. Tables 1 and 2 contain category- and subcategory-level counts.
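As a rough sketch of the labeling and filtering steps above: the two patterns below are illustrative stand-ins, not the study's actual regular expressions, and the names `PATTERNS`, `identities`, and `added_identity` are hypothetical. The check for mutually exclusive subcategories is omitted for brevity.

```python
import re

# Illustrative patterns only; the study's real expressions are not reproduced here.
PATTERNS = {
    "age": re.compile(r"\b\d{2}\s?(?:yo|y/o|years old)\b", re.IGNORECASE),
    "gender:pronoun": re.compile(r"\b(?:he/him|she/her|they/them)\b", re.IGNORECASE),
}

def identities(profile):
    """Set of identity labels whose pattern matches the profile text."""
    return {name for name, pattern in PATTERNS.items() if pattern.search(profile)}

def added_identity(versions):
    """Return the identity added by a user's single profile change, else None.

    `versions` is the chronological list of a user's distinct profile texts.
    Condition (1): exactly one change, i.e. two versions.
    Condition (2): exactly one new identity appears and none is removed.
    """
    if len(versions) != 2:
        return None
    before, after = identities(versions[0]), identities(versions[1])
    if before - after:  # an identity was removed, so not a pure addition
        return None
    new = after - before
    return next(iter(new)) if len(new) == 1 else None

print(identities("18yo -he/him -father of two wonderful children"))
print(added_identity(["coffee lover", "coffee lover | she/her"]))
```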
We validate the quality of our pipeline for capturing instances of identity disclosure through an annotation task. For each subcategory, three annotators are given twenty samples, each consisting of two consecutive profiles, one pre- and one post-change. The twenty samples include ten positive samples from IDENTITYADDED as well as ten negative samples, which cover cases of (1) no disclosure in either profile, (2) disclosure in both, and (3) disclosure only in the pre-change profile. The resulting Krippendorff's α was 0.74, indicating a high level of agreement that the changes detected by our approach do constitute meaningful self-disclosure of identity. We then evaluate our pipeline against the majority vote from the annotations, and find that 41/44 identities achieved an F1 score higher than 0.5 (Appendix Tables 3 and 4). We therefore removed the three identities with low performance: education:student, ethnicity:korean, and occupation:art.
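The evaluation against the majority vote can be illustrated with a small sketch. The annotation values and pipeline predictions below are made up for the example, and the helper names are hypothetical; only the majority-vote and F1 definitions are standard.

```python
from collections import Counter

def majority_vote(labels):
    """Most common label among an odd number of annotators."""
    return Counter(labels).most_common(1)[0][0]

def f1_score(gold, pred):
    """F1 of binary predictions against binary gold labels (1 = disclosure)."""
    tp = sum(1 for g, p in zip(gold, pred) if g and p)
    fp = sum(1 for g, p in zip(gold, pred) if not g and p)
    fn = sum(1 for g, p in zip(gold, pred) if g and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy data: four samples, three annotators each (invented for illustration).
annotations = [[1, 1, 0], [0, 0, 1], [1, 1, 1], [0, 1, 1]]
gold = [majority_vote(a) for a in annotations]
pipeline_pred = [1, 0, 1, 0]
print(gold, f1_score(gold, pipeline_pred))
```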

Propensity Score Matching
Since our research questions center around behavioral changes following social identity disclosure through profiles, a meaningful measurement can be made by comparing against a control group that displays similar behaviors but does not disclose social identities through profile updates. We adopt propensity score matching (PSM), a quasi-experimental method widely adopted in observational studies of social media data (Yuan et al., 2023; Choi et al., 2023).
Apart from the IDENTITYADDED users, we also identify 849,901 users who (1) made one profile update during the 1-year observation period but (2) did not include any phrases of social identity in their profiles before or after the update, which we refer to as NOTADDED users. For each user in IDENTITYADDED and NOTADDED, we identify the following covariates obtained at the date of the profile change: number of days since account creation, number of friends, number of followers, number of total posts, and number of tweets and retweets posted during the month prior to the profile update. Further details of the matching can be found in Appendix Section C.
As a result of the matching process, we are left with 283,566 treated users and 1,228,945 matched users.
We refer to the resulting matched set as CONTROL users. Figure 9 in the Appendix shows that the standardized mean difference of every covariate drops sharply after matching, indicating that confounding from these covariates is reduced.
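For readers unfamiliar with the balance check, a standardized mean difference (SMD) for one covariate can be computed as below. This is a generic sketch using the common pooled-standard-deviation definition; the exact variant used for Figure 9 is not stated in the text, and the numbers are invented.

```python
from math import sqrt
from statistics import mean, variance

def standardized_mean_difference(treated, control):
    """SMD of one covariate: mean difference divided by the pooled standard deviation."""
    pooled_sd = sqrt((variance(treated) + variance(control)) / 2)
    if pooled_sd == 0:
        return 0.0
    return (mean(treated) - mean(control)) / pooled_sd

# Toy follower counts (invented): the matched control group resembles the treated group.
treated = [10.0, 12.0, 14.0]
unmatched_control = [2.0, 4.0, 6.0]
matched_control = [9.0, 12.0, 15.0]
print(standardized_mean_difference(treated, unmatched_control))
print(standardized_mean_difference(treated, matched_control))
```

Values close to zero after matching indicate that the covariate is balanced between the two groups.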

Estimating Treatment Effects
Our setting of treated and control groups allows us to perform a widely used causal inference method known as difference-in-differences (DiD; Abadie, 2005). Though DiD is most commonly used when the outcome variable is continuous, it can be applied to other types of outcomes such as count variables (Cameron and Trivedi, 2013; Mark et al., 2013). Accordingly, we use the following equation:
$$\log(y_{i,t}) = \beta_0 + \beta_1 X_i + \beta_2 (T = 1) + \beta_3 (t \geq \mathrm{tr}) + \beta_4 (T = 1)(t \geq \mathrm{tr})$$
where $y_{i,t}$ is the outcome variable at time $t$ for user $i$, $(T = 1)$ is a binary indicator of assignment to the treatment group, and $(t \geq \mathrm{tr})$ indicates whether time $t$ is beyond the treatment period. $X_i$ denotes the time-invariant covariates of $i$, which consist of the number of friends, followers, and total posts. All experiments are modeled as a negative binomial regression using generalized estimating equations (GEE) in statsmodels. Because our hypothesis testing is done on multiple identities, we apply the Bonferroni-Holm correction (Holm, 1979) to account for false positives when reporting significance test results from the regressions.
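The Bonferroni-Holm step-down procedure mentioned above can be written in a few lines. This is a generic sketch of the standard procedure, not the authors' code; the significance level of 0.05 and the p-values are assumptions made for the example.

```python
def holm_reject(p_values, alpha=0.05):
    """Return a list of booleans marking which hypotheses Holm's procedure rejects."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is compared against alpha / (m - k + 1).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one test fails, every larger p-value is retained as well
    return reject

print(holm_reject([0.001, 0.04, 0.03, 0.2]))
print(holm_reject([0.01, 0.02, 0.03]))
```

In the first call only the smallest p-value survives, because 0.03 exceeds the second threshold of 0.05/3.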

How does Identity Disclosure affect Language?
To understand behavioral changes following identity disclosure, we first study whether users change their language following profile updates to include a social identity. We hypothesize that the addition of an identity signal encourages users to represent their identity more through the content they produce and engage with.
Measuring Identity-specific Language. We first construct classifiers to measure the amount of identity alignment from a tweet. Based on existing findings that posts and profile descriptions in online platforms are reflective of one's social identity (e.g., Priante et al., 2016; Preoţiuc-Pietro and Ungar, 2018), we assume that if a user has disclosed a social identity on their profile description for a sufficiently long time, then the text created by the user contains topical and stylistic features indicative of the disclosed identity. Accordingly, we first identify users who did not update their profile during our observation period, and identify cases where their profile did (ALWAYSPOSITIVE) or did not (ALWAYSNEGATIVE) include an identity (refer to Figure 2). We then aggregate the tweets created by each user and assign positive or negative labels to the tweets based on whether the user's profile contains the identity. Each classifier is a RoBERTa (Liu et al., 2019) model pretrained on tweets and further finetuned on the labeled tweet dataset. Further training details and examinations of classifier performance can be found in Appendix Section B.
Experiment Setting. We use the scores from the classifiers to measure levels of identity-specific language from both the content that users post (tweets) and engage with through sharing (retweets). Using the identity classifiers, we obtain scores for every tweet and retweet generated by each IDENTITYADDED and CONTROL user between one month before and after the profile update. We then count the number of tweets with an inferred identity score higher than 0.5 and aggregate them into two periods, before and after the profile update.
We consider these as the total number of identity-relevant tweets the user tweeted or retweeted before or after treatment. We also count the number of total tweets regardless of identity score, which captures overall activity levels. We run separate regressions with the number of total tweets/retweets and identity-specific tweets/retweets as outcome variables, and include the number of total activities as a control variable when modeling identity-specific activities.
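The counting step can be sketched as follows, assuming each tweet has already been scored by the identity classifier and restricted to the one-month window on either side of the update. The data layout and the function name are hypothetical, and treating a tweet posted exactly at the update time as post-update is an assumption.

```python
from datetime import datetime

THRESHOLD = 0.5  # identity-score cut-off stated in the text

def count_activity(tweets, update_time):
    """Count total and identity-relevant tweets before and after a profile update.

    `tweets` is an iterable of (timestamp, identity_score) pairs for one user.
    """
    counts = {
        "pre": {"total": 0, "identity": 0},
        "post": {"total": 0, "identity": 0},
    }
    for ts, score in tweets:
        period = "post" if ts >= update_time else "pre"
        counts[period]["total"] += 1
        if score > THRESHOLD:
            counts[period]["identity"] += 1
    return counts

update = datetime(2020, 9, 1)
scored = [
    (datetime(2020, 8, 20), 0.2),
    (datetime(2020, 8, 25), 0.7),
    (datetime(2020, 9, 3), 0.9),
    (datetime(2020, 9, 10), 0.6),
    (datetime(2020, 9, 12), 0.1),
]
print(count_activity(scored, update))
```

The "total" counts feed the overall-activity regressions, while the "identity" counts are the outcome for the identity-specific regressions, with the totals used as a control.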
Results. Figure 3 shows the effects of adding identities to profiles on four different types of tweet activity counts: the number of total tweets (Figure 3(a)) and retweets (Figure 3(b)), and the number of identity-specific tweets (Figure 3(c)) and retweets (Figure 3(d)). We can first observe that, contrary to prior work (Lampe et al., 2007), the additional disclosure of social identity via profiles does not lead to greater overall activity levels compared to profile updates without such disclosure (Figures 3(a) and 3(b)). In fact, we observe the opposite for several types of identities, most notably drops of both tweet and retweet levels in binary gender pronouns and student status.
The only statistically significant increases we observe arise from disclosing political statuses.
On the other hand, we observe statistically significant increases in the number of tweets posted and retweeted which contain identity-specific language, across almost every category (Figures 3(c) and 3(d)).
Though there exists variance among categories, in general, we observe that identity-specific tweets increased by around 20-40% and identity-specific retweets increased by around 10-30%, indicating that though the content volume does not change, the percent of identity-related content within that volume increases substantially.
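To relate these percentages to the regression, the following short derivation may help. It assumes that the reported effect sizes are obtained from the interaction coefficient $\beta_4$ of the log-link model in the usual way; the text does not spell out this conversion, so it should be read as an interpretation aid rather than the authors' stated procedure.

```latex
% Ratio of the treated group's post/pre change to the control group's post/pre change
\frac{E[y \mid T=1,\, t \geq \mathrm{tr}] \,/\, E[y \mid T=1,\, t < \mathrm{tr}]}
     {E[y \mid T=0,\, t \geq \mathrm{tr}] \,/\, E[y \mid T=0,\, t < \mathrm{tr}]}
  = \frac{e^{\beta_3 + \beta_4}}{e^{\beta_3}}
  = e^{\beta_4},
\qquad
\text{percentage change} = 100 \left( e^{\beta_4} - 1 \right)
```

Under this reading, a coefficient of about 0.26 would correspond to an increase of roughly 30%.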
Further comparisons within identity categories reveal interesting findings. For instance, we observe that for both tweets and retweets, the increase following identity disclosure of men is lower than that of women and nonbinary genders. One possible reason is that women and nonbinary users may face harder decisions when disclosing their identity, which results in a greater change in their behavior following disclosure. Similarly, our results on ethnicity disclosures show larger increases in identity-specific activities for African identities compared to the American identity, suggesting the level of language change may differ by identity type.
Figure 3: Effect sizes of identity disclosure on tweet and retweet-level activities. The x-axis indicates percentage increase in number of tweets following identity disclosure. Significant positive and negative values that pass the correction test are marked in red (positive) and blue (negative). While identity disclosure does not lead to increased activity levels, there are significant increases in the number of tweets and retweets that contain identity-specific language.
Identity-specific language: topic or style? To further understand which aspects of language change following identity disclosure, we compare the tweets through two components of language: topic and style. We examine whether having an IDENTITYADDED user disclose their identity results in their language becoming more similar to that of an ALWAYSPOSITIVE user regarding each component. Further details for computing the distances can be found in Appendix Section D.
Figure 4 shows changes in the distance between the language of users who newly disclose their identity and that of users who have always had the identity visible. While topic differences remain relatively unchanged, the difference in style between the two user groups is reduced following identity disclosure for all categories apart from age. Though users do not significantly shift their topics of interest, they tune their language to appear more similar to the style associated with the identity that they choose to disclose.

Does identity disclosure in profiles lead to network rewiring towards same-identity connections?
In our next analysis, we investigate whether the addition of identities leads to bridging more connections with like-minded others. To do so, we collect the ego networks of every IDENTITYADDED and CONTROL user, where an edge between two users u and v is defined when u replies to or retweets a tweet posted by v. We divide a user's network activities into pre- and post-treatment periods, where we look at a timespan of 12 weeks. We apply the same set of regular expressions to the profiles of all users included in the networks and extract any social identities from their profiles during the 12-week period. The subset of connected users who have adopted the same identity as the ego user at any point will be considered same-identity nodes. Thus, in our subsequent diff-in-diff analysis, the outcome variable is the number of same-identity nodes before and after the identity disclosure.
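A minimal sketch of the outcome computation for one ego user and one period is shown below. The edge direction follows the definition in the text (u replies to or retweets v); the data layout, the identity labels, and the function name are assumptions for illustration.

```python
def same_identity_degrees(ego, edges, identities, ego_identity):
    """Return (out_degree, in_degree) of the ego restricted to same-identity neighbors.

    `edges` holds (u, v) pairs meaning u replied to or retweeted v.
    `identities` maps a user to the set of identities found in their profiles.
    """
    out_neighbors, in_neighbors = set(), set()
    for u, v in edges:
        if u == v:
            continue  # ignore self-replies
        if u == ego:
            out_neighbors.add(v)
        elif v == ego:
            in_neighbors.add(u)

    def shares_identity(user):
        return ego_identity in identities.get(user, set())

    out_degree = sum(1 for n in out_neighbors if shares_identity(n))
    in_degree = sum(1 for n in in_neighbors if shares_identity(n))
    return out_degree, in_degree

edges = [("ego", "a"), ("ego", "b"), ("c", "ego"), ("a", "ego"), ("ego", "a")]
identities = {
    "a": {"gender:nonbinary"},
    "b": {"age"},
    "c": {"gender:nonbinary"},
}
print(same_identity_degrees("ego", edges, identities, "gender:nonbinary"))
```

Running this once on pre-treatment edges and once on post-treatment edges gives the two counts that enter the difference-in-differences regression.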
Results. Figure 5 displays the treatment effect on the out- and in-degree of the network when restricted to users of the same identity. We can observe that across most categories, the out-degree of same-identity neighbors significantly increases after identity disclosure in profiles (Figure 5(a)). This indicates that the users who choose to disclose their identities also choose to connect to more people that share the same identity.
We next look at the in-degree level changes, which are a stronger indicator of how the addition of identity is viewed by others (Figure 5(b)). We observe that the in-degree of same-identity groups is less likely to increase compared to the out-degree, which indicates that inbound connections are less likely to be made than outbound connections, as the former requires others to actually be motivated to establish new connections with the user who has made a profile change.
Figure 5: Effect sizes of identity disclosure on out- and in-degree network sizes. Users reach out to those of the same identity following disclosure (out-degree), but not all identities receive increased attention from others in return (in-degree).
Additional results, shown in the Appendix, highlight identity-specific changes. Figure 12 contains the effect sizes of the total out- and in-degree network sizes following disclosure, revealing that the overall network size only increases for political identities. These results support our claim that users choose to strategically rewire their connections more towards those of the same identity while keeping overall network sizes stable instead of merely being more open in general. Figure 13 shows changes in connection levels towards different identities in the same category. We find that gender pronouns is the only category to increase in both in-degree and out-degree for all identities, which is in line with existing work that showed tie clustering among such pronouns (Tucker and Jones, 2023). Finally, we compare changes in cross-partisan connections for conservative and liberal users, where we observe significant increases of outbound connections from those who disclose their liberal identity to conservative users, but not the other way round.

Does identity disclosure lead to receiving more offensive content?
In our final research question, we investigate possible negative consequences of disclosing one's identity, namely whether identity disclosure leads to increased targeted offensive content.
Experiment Setting. For each IDENTITYADDED and CONTROL user, we use the 10% sample dataset to collect a history of the tweets posted by the user during one month before and after the time of their profile update, as well as all replies received from other users during this period. Next, we use a publicly available classifier for detecting offensiveness from Hugging Face (Barbieri et al., 2020) to obtain offensiveness scores of both the tweets posted and the replies from others. We then formulate an equation to model the expected number of offensive replies:
$$\log(y_{i,t}) = \beta_0 + \beta_1 X_i + \beta_2 (T = 1) + \beta_3 (t \geq \mathrm{tr}) + \beta_4 (T = 1)(t \geq \mathrm{tr}) + \log(\beta_5 n_{id}) + \log(\beta_6 n_{id})(T = 1)(t \geq \mathrm{tr})$$
The added term $\log(\beta_5 n_{id})$ indicates the log-normalized number of identity-specific tweets posted by the user, and $\log(\beta_6 n_{id})(T = 1)(t \geq \mathrm{tr})$ is the interaction effect between identity disclosure via profile and identity-specific tweets.
Results. Figure 6(a) ($\beta_4$) first shows that identity disclosure through profiles increases offensiveness for only a handful of categories: ethnicity:American, gender:men, personal:socialmedia, and political:activism.
However, when we observe changes in offensiveness levels caused by increased identity-specific language in tweets (Figure 6(b), $\beta_5$), we can see significant effects for several categories. Interestingly, the disclosure of identity through tweets leads to reduced levels of offensiveness from others for the three studied gender types, as well as for occupations and religion types. Meanwhile, we observe increased levels of offensive replies from all three types within the political category, hinting that this may be due to heated political conversations that often correlate with offensiveness. Lastly, the interaction effect of identity disclosure via both tweet and profile (Figure 6(c), $\beta_6$) suggests that the combined effect from disclosure through both channels reduces levels of offensiveness for every category where increased identity disclosure through tweets was associated with increased offensiveness. One potential explanation is that disclosing identity through both profile and tweet could create a sense of consistency, which helps reduce levels of hostility towards that identity group.

Discussion
Our findings indicate that disclosing social identities, regardless of category, follows similar behaviors in that both the content produced and connections made by the user become more aligned with the announced identity. We can assume that at the heart of such disclosure lies the innate desire to express oneself and find comfort among like-minded peers. It is also notable that instead of just becoming more active overall, users maintain similar levels of activity and connectivity while channeling their effort towards more identity-aligned decisions. This comes at the expense of interactions with those unassociated with the identity, and coupled with existing findings that more people are disclosing their identity on Twitter (Pathak et al., 2021; Jones, 2021), could even signal that our Twitter networks might become more homogeneous over time.
Another interesting finding regarding the effects of disclosure was that identity disclosure via profiles did not result in significant increases in offensive replies targeted to the user for marginalized categories such as nonbinary gender, LGBTQ+ sexualities, or minority ethnicities. While our results do not and are not meant to deny the existence of identity-targeted hate in social media platforms that is a major source of harm, we take a more positive view suggesting that the consequences of disclosing identity through profiles may not be as severe as anticipated, and that disclosure should be promoted and more widely accepted.

Conclusion
We conduct a case study for identifying whether added disclosure of one's social identity through profile updates leads to subsequent changes in linguistic style and network connections, and whether the disclosure leads to increased offensiveness from others. We propose methods for measuring identity disclosure through both profile- and tweet-level language and apply them to quasi-experimental difference-in-differences methods to show that identity disclosure through labels leads to increased disclosure in subsequent language.
Furthermore, we also observe that identity disclosure can lead to increased connections with like-minded identities, which is much more prevalent for outward than for inward ties. Finally, we observe that, contrary to existing concerns, the negative effect of increased offensiveness from disclosing a social identity via profiles does not exist for most identities, and that the combined disclosure at both profile and tweet levels led to reduced targeted offensiveness levels. Overall, our results suggest that the decision to disclose one's social identity can be encouraged, with negative effects appearing less severe than feared. The code and annotated data for the study will be available at https://github.com/minjechoi/twitteridentity.

Limitations and Ethical Considerations
One limitation is that our analysis focuses only on Twitter. The amount of disclosure may differ by type of platform depending on why people use it (Jaidka et al., 2018; van Dijck, 2013). Identity may also be visible through means other than the profile text. One example would be a profile image, which can indicate demographic features such as age, gender, and ethnicity (Yoder et al., 2020).

Identity-unaware offensiveness classifiers
To conduct the experiment on offensiveness levels after identity disclosure, we use finetuned classifiers trained on an external Twitter corpus (Barbieri et al., 2020). The black-box nature of these classifiers and datasets carries the risk of predicting text features of some identities as more offensive than others without sufficient understanding of contexts surrounding the identity, such as African-American English (Sap et al., 2019; Harris et al., 2022). In fact, our correlation results between the scores of the offensive classifier and identity-specific classifiers on a large corpus (Figure 10 in the Appendix) may lead to conclusions such as identity-specific language from nonbinary genders being more likely to be offensive than that of men or women, or the identity-specific language of Mexicans being the most offensive compared to other ethnicities.

Purpose of identity classifier
It is possible that one may associate the regular expression-based pipelines for identifying profile disclosures and the identity classifier models with purposes such as detecting whether a user possesses a hidden identity trait based on their prior Twitter history. We stress that our models are not intended for that purpose. Rather, our categorization of users is entirely based on self-declared phrases indicative of social identities, which we examine through a meticulous verification process. Our results are derived from purely observational data aggregated at a scale of hundreds or thousands of users, which removes the possibility of identification.

Results on the disclosure of marginalized identities
One of the findings of our study is that the disclosure of social identities via profile changes did not result in increased levels of targeted offensiveness, even for marginalized identity groups such as specific gender or ethnicity groups. One possible limitation is that our study is based on users who have willingly made the decision at some point to update their profile and make their identity visible to their friends and to the public, and those who did update may have been in a situation where they felt more comfortable disclosing in the first place. This creates a selection bias that might interfere with the generalizability of our findings to the general population of Twitter users, and thus further caution should be exercised when estimating the reactions following disclosure in online spaces.
Nevertheless, we conclude from our findings that identity disclosure through profiles can be an effective means of expressing oneself and connecting with like-minded others, and would encourage users to do so if seeking such outcomes.

D Measuring topic and stylistic distances between IDENTITYADDED and ALWAYSPOSITIVE users
To measure topic distributions, for each identity we run zero-shot contextualized topic models (Bianchi et al., 2021a,b) on the tweets of ALWAYSPOSITIVE users with 50 topics for 20 epochs, then obtain a 50-dimensional distribution that represents their topics, $D^{T}_{AP}$. We then infer the topic distributions of the pre- and post-treatment tweets from IDENTITYADDED users as $D^{T}_{pre}$ and $D^{T}_{post}$, and measure the Jensen-Shannon distance of each distribution to $D^{T}_{AP}$.
For style, we select five style variables from Kang and Hovy (2021) as well as classifier models from the Hugging Face API trained on public datasets: offensiveness (Barbieri et al., 2020), formality (Rao and Tetreault, 2018; Pavlick and Tetreault, 2016), sarcasm (Misra and Arora, 2023), toxicity (cjadams et al., 2017, 2019), and positive sentiment (Hartmann et al., 2023). For each identity, we computed the binary style scores for every tweet of the ALWAYSPOSITIVE users to obtain an N × 5 matrix of style scores, with N as the number of tweets. We fitted PCA on the matrix to obtain the projection onto its first principal component, $D^{S}_{AP}$, which we use to represent the stylistic distribution of ALWAYSPOSITIVE users. Likewise, we obtained the same matrices for tweets from the pre- and post-treatment periods of IDENTITYADDED users, and transformed these matrices into a single dimension using the principal component from the PCA fitted on ALWAYSPOSITIVE, resulting in $D^{S}_{pre}$ and $D^{S}_{post}$. We then used Cohen's d (Cohen, 2013) to compute the difference between each of these style distributions and $D^{S}_{AP}$.
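The two distance measures described above can be illustrated with a minimal sketch. It is not the implementation used in this study: the function names, the toy 3-topic distributions, the choice of log base 2 for the Jensen-Shannon distance, and the pooled-standard-deviation form of Cohen's d are all assumptions made for illustration. The sketch also assumes that the topic distributions and the one-dimensional PCA projections have already been computed.

```python
import math
from statistics import mean, variance


def js_distance(p, q):
    """Jensen-Shannon distance: square root of the JS divergence.

    Log base 2 is an assumption here; it bounds the distance in [0, 1].
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    # Normalize so that both inputs sum to 1.
    sp, sq = sum(p), sum(q)
    p = [x / sp for x in p]
    q = [x / sq for x in q]
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        # Terms with a == 0 contribute nothing to the KL divergence.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    divergence = (kl(p, m) + kl(q, m)) / 2
    return math.sqrt(max(0.0, divergence))


def cohens_d(x, y):
    """Cohen's d with a pooled standard deviation (assumed variant)."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(pooled_var)


# Hypothetical toy topic distributions (3 topics instead of 50).
d_ap = [0.5, 0.3, 0.2]    # stands in for D^T_AP
d_pre = [0.2, 0.3, 0.5]   # stands in for D^T_pre
d_post = [0.4, 0.3, 0.3]  # stands in for D^T_post

topic_dist_pre = js_distance(d_pre, d_ap)
topic_dist_post = js_distance(d_post, d_ap)

# Hypothetical 1-D style projections standing in for D^S_pre and D^S_AP.
style_effect = cohens_d([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
```

In this toy setup, a smaller Jensen-Shannon distance after treatment than before would correspond to the topics of IDENTITYADDED users moving closer to those of ALWAYSPOSITIVE users, and a Cohen's d closer to zero would correspond to a smaller stylistic gap.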

Figure 3: Effect sizes of identity disclosure on tweet and retweet-level activities. The x-axis indicates percentage increase

Figure 4: Changes in style and topic differences between ALWAYSPOSITIVE users and IDENTITYADDED users before and after identity disclosure. Style becomes more similar after the disclosure compared to topics, where the relative distances are much smaller.

Figure 6: Effect size of identity disclosure on the number of offensive replies received: (left) identity addition to profile, (middle) number of identity-specific tweets per week, (right) interaction effect of identity disclosure through profile and number of identity-specific tweets per week.

Figure 8: F1 scores of identity-specific language classifiers on the test set. While most tasks have a relatively high F1 score above 0.7, some identities are harder to predict correctly in a binary setting.

Figure 10: A heatmap comparing the correlations of identity-specific language with different styles. Similar categories exhibit similar styles.

Figure 11: Pairwise comparison of the identity classifiers. Each identity classifier was used to obtain identity scores from an identical dataset of 1 million randomly sampled tweets. Spearman's rank correlation was used to obtain the pairwise similarities between the score distributions of any two identities. Pairwise similarities are largest for within-category comparisons, indicating that the language associated with identity disclosure follows some categorical properties as well.

Figure 14: Plot of effect sizes of in- and out-degree when restricted to the opposite political ideology. Interestingly, we observe increased out-degree connectivity from those who declare themselves as liberals towards conservative users, but not from conservatives.

Table 1: The number of users who updated their Twitter profiles to disclose social identities. Refer to Table 2 in the Appendix for counts at subcategory level.

Table 2: Count of users who added social identities to their Twitter profiles once in our observation period for each subcategory-level identity.
Figure 7: AUC scores of identity-specific language classifiers on the test set. Almost all of our categories exceed 0.7, a reasonable cutoff for binary classification.