Studying Anti-Social Behaviour on Reddit with Communalytic



Introduction
The main goal of this chapter is to demonstrate a mixed-methods approach and a new research tool to study anti-social behaviour within online communities, and specifically on Reddit. We will refer to anti-social behaviour as any behaviour that may cause, or is likely to cause, harm or distress to one or more persons ("Anti-Social Behaviour, Crime and Policing Act 2014," 2014). Two common forms of anti-social behaviour on social media are trolling and hate speech. Trolling is often initiated to disrupt on-topic conversations and provoke other users through deceptive behaviours, accompanied by "inflammatory, extraneous, or off-topic messages" (Buckels, Trapnell, & Paulhus, 2014; Lampe et al., 2014). Trolling may be driven by the perpetrator's own entertainment and online fame without a clear purpose, but it has also been used by state actors to dissuade and incite others online (Howard & Bradshaw, 2017). In contrast, hate speech refers to negative expressions that are typically directed toward a collective of people based on their religious affiliation, ethnicity, race, nationality, sexual orientation, gender identity, disability, or other shared group characteristics (Costello, Hawdon, & Ratliff, 2017; Faris et al., 2016). For some online groups, what is described here as 'anti-social' may actually be a communal norm, practiced by group members to socialize. We are more interested in studying group dynamics where 'anti-social' is not a norm and where such behaviour may negatively affect the overall group cohesion and interactions at the community level and may have psychological and emotional consequences for individuals (Craker & March, 2016; Duggan, 2017; Giménez Gualdo et al., 2015; Hodson et al., 2018; Lindsay et al., 2016). There is also a concern that some forms of anti-social behaviour, such as hate speech, may galvanize xenophobic behaviour offline (Awan, 2014; Awan & Zempi, 2016) and lead to changing social norms at the societal level.
Even though anti-social behaviour on the internet is not a new phenomenon, the number of people who are exposed to it has risen exponentially with the widespread adoption of social media (Anderson, 2018; Runions, Bak, & Shaw, 2017). For example, nearly 60% of Canadian adults encounter hate speech, racist, or sexist content online at least once a month (Ryerson Leadership Lab, 2019). Given the ubiquity of social media in modern-day society, it is imperative that the extent and types of anti-social behaviour on social media are well-documented and studied. However, one of the major challenges in studying anti-social behaviour online is that not all "anti-social" posts can be easily flagged as offensive or abusive. Are there varying degrees of online anti-social behaviour? How do we distinguish sarcastic texts, 'netspeak' jargon, and other content (e.g., emojis, images, and memes) that may go unnoticed due to their subtlety, but have detrimental effects on groups and individuals? Questions like these require rigorous empirical analysis to better understand how online anti-social behaviour may be changing our society and the practice of public discourse in the 21st century.
To help researchers interested in examining online anti-social behaviour, this chapter introduces Communalytic, a new research tool for studying anti-social acts in public groups on Reddit, a popular social media site. We will use a case study approach to demonstrate how a researcher can use Communalytic to examine interactions which may lead to toxic exchanges among members of a public group called r/metacanada.
While previous research has extensively examined anti-social acts on social media, most of the literature in this area has relied primarily on Twitter data (e.g., Gorrell et al., 2019; Maity, Chakraborty, Goyal, & Mukherjee, 2018; Southern & Harmer, 2019; Theocharis et al., 2016). This over-reliance on Twitter data for academic studies is likely due to the public nature of the Twitter platform, and the wide availability of research tools for collecting and analysing data from Twitter. Notably, there are far fewer studies that examine anti-social acts on Reddit (Massanari & Chess, 2018; Massanari, 2017). Communalytic was developed to address the lack of Reddit research tools and to enable internet researchers to study online communities and communication practices on this platform, and more specifically to study how anti-social acts manifest themselves there. Considering the anonymous nature of online interactions on Reddit and how often it is used for political and polarizing discussions, Reddit is a useful platform for researchers to study and better understand the dynamics that drive online anti-social behaviour and their impact on various online communities. This section will provide an overview of Reddit as a social networking platform, as well as a specific subreddit called r/metacanada which we use as a case study in this chapter.

What is Reddit?
Reddit is a social media platform that was founded by Alexis Ohanian and Steven Huffman in 2005 (Anderson, 2015). Originally billed as "the front page of the internet", it is comprised of online communities called subreddits, where users (also known as redditors) can share posts, images, or URLs. These subreddits cover a vast array of topics from history to politics and everything in between, with an estimated 1.8 million subreddits available on Reddit (Redditmetrics, n.d.). Users can "upvote" (i.e., "like") or "downvote" (i.e., "dislike") posted content, influencing the rank of that post both in their own main feed and within the subreddit that it belongs to. In other words, more popular content will become more visible, and less popular content will be shuffled down to the bottom of the feed. In addition to upvotes and downvotes, discussion between users is facilitated on the posts in the form of comments and replies. To help supervise the content that is posted in each subreddit, user-appointed moderators, or mods, are tasked with regulating each subreddit based on that subreddit's rules on appropriate content. For example, the subreddit r/politics has a list of eleven rules that users must abide by when posting content, such as "no hateful speech" and "no copy-pasted articles."
Given the specificity and expansiveness of subreddit topics, as well as the open and public nature of the posted content, Reddit has become a subject of interest for researchers in various fields of study. For example, several recent papers have examined the positive benefits and practicality of Reddit as a platform for informal learning (e.g., Del Valle et al., 2020; Haythornthwaite et al., 2018; Staudt Willet & Carpenter, 2019). In addition, content analyses of subreddits have revealed that Reddit has facilitated increased public engagement with scientists (e.g., Hara, Abbazio, & Perkins, 2019), and Reddit has often been used by researchers to learn more about topics ranging from weight management (Pappa, Cunha, Bicalho, Ribeiro, Silva, Meira, & Beleigoli, 2017) to users' attitudes toward vaccination (O'Kane, Zhang, Lama, Hu, Jamison, Quinn, & Broniatowski, 2019) and users' experiences with mental illness (Yoo, Lee, & Ha, 2019). Finally, and more broadly, Reddit has also been touted as a potential platform for participant recruitment (Gutierrez, 2018; Shatz, 2017).
However, despite the positive potential of Reddit as a social platform, such as the enabling of supportive communities and access to user-generated, niche information, Reddit can facilitate online anti-social behaviour (Ging & Siapera, 2018; Massanari, 2017; Massanari & Chess, 2018; Topinka, 2018), or what Massanari (2017) calls 'toxic technocultures'. These are defined as "the toxic cultures that are enabled by and propagated through sociotechnical networks such as Reddit, 4chan, Twitter, and online gaming" (p. 333). Such 'toxic technocultures' can be fueled by platform affordances. In the case of Reddit, the ease with which users can create multiple accounts and subreddits allows them to initiate and participate in anti-social acts with little to no repercussions. For example, subreddits that are banned for toxic content, such as r/incels, which promoted misogynistic views and rape, often have similar subreddits that users can flock to (e.g., r/Braincels). In addition, the policies enforced by platform administrators promote Reddit as a "neutral platform" for discussion. As a result, administrators rarely intervene in disputes, citing their neutrality toward the nature of the content, irrespective of how inappropriate or toxic it may be. Still, users may report behaviour or an entire subreddit community that they deem to violate Reddit's community guidelines on harassment, bullying, and threatening behaviour (Reddit, 2019). In these cases, intervention may result in the banning of users and/or the subreddit community. This course of action is usually reliant on users' self-directed action, such as flagging and reporting toxic and hateful content, including links, comments, and subreddits, on Reddit. Given the open and permissive structure and policy guidelines of Reddit, coupled with the ease of access to open and public data available on subreddit communities, we chose Reddit as a social media platform to analyse anti-social behaviour in our case study. The following section will describe our selection process of the r/metacanada subreddit used in this case study.

Case of r/metacanada
To identify a subreddit for the case study, we decided to look for groups that are known to solicit strong reactions from other users, such as those that discuss and espouse nationalistic and extreme right-wing ideologies. To locate potential subreddits, we took the following steps. First, a broad keyword search was conducted on Google Scholar using the following keywords: Reddit, nationalism, altright, right-wing, islamophobia, and white nationalism. This step was conducted in order to help us locate any existing research that may have examined these constructs within Reddit, as well as the specific subreddits of interest. Several studies were located (Nithyanand et al., 2017; Qian et al., 2019; Topinka, 2018), and were reviewed to help us identify what subreddits were generally associated with extreme right-wing sentiment. Finally, the same keyword search was conducted within Reddit in order to identify subreddits relating to our research interest. Only publicly accessible subreddits were searched.
Based on these preliminary searches, the following ten subreddits were identified as potentially being suitable for use as a case study: r/AskThe_Donald, r/Conservative, r/politics, r/ConservativesOnly, r/LeftistWatch, r/POLITIC, r/canada, r/The_Europe, r/askaconservative, r/metacanada. Using Communalytic, we extracted one full day's worth of posts for all ten subreddits and analyzed the posts for toxicity (the tool and the process are described in Section 3.2). During the selection process, we examined the following aspects of each dataset: 1) the number of posts and replies extracted in the one-day period, 2) the highest and average toxicity scores for each dataset, and 3) the top ten toxic posts. At this stage, four subreddits (r/ConservativesOnly, r/LeftistWatch, r/The_Europe, r/askaconservative) were eliminated from consideration for low posting activity. We also excluded three subreddits (r/AskThe_Donald, r/politics, r/POLITIC) because their top ten most toxic posts did not include comments that specifically focussed on nationalistic or right-wing topics or issues. For the remaining three subreddits (r/Conservative, r/canada, and r/metacanada), we reviewed a small sample of their posts to gauge their level of toxicity. At the end of the review, we selected r/metacanada for this case study due to the high level of toxic and nationalistic content present in the subreddit.
The r/metacanada subreddit was created on May 6, 2011 and is self-described as "The only not retarded Canadian subreddit." We collected data from this subreddit in the two weeks (October 9, 2019 to October 22, 2019) leading up to the Canadian Federal Election which took place on October 21, 2019. At the time of data collection, there were 31.2 thousand subredditors who subscribed to this community. This particular subreddit has ten rules that all members must abide by, including "no doxxing" (revealing other users' personal information) and "no brigading", which includes the organization of a group of subredditors to attack, harass, and/or downvote another user. Other rules in this subreddit are "use NP for reddit links", where redditors are asked to "replace 'www' in the link with 'np'", "don't vote/comment in linked threads", "no floodposting/disruptive shitposting", "no racism", "no condoning/threatening illegal activity", "follow rules of reddit", "Mark NSFW posts NSFW", where NSFW is an acronym indicating that the posted content is "not safe for work", and "no shitty bots."

3 Method

Detecting Anti-Social Acts at Scale
Examining anti-social interactions in online communities is a rapidly growing area of inquiry. Recent research has examined a number of different types of anti-social acts, such as hate speech (Southern & Harmer, 2019), impoliteness (Theocharis et al., 2016), rudeness (Su et al., 2018), incivility (Kenski, Coe, & Rains, 2017; Rossini, 2019), offensive comments (Kwon & Gruzd, 2017), and stereotyping (Southern & Harmer, 2019). Considering the volume of available data, we will focus on the automated approaches to detecting anti-social posts in text-based communication on Reddit.

Content-based approaches:
Prior literature on detecting anti-social acts at scale has primarily used supervised machine learning that predominantly relies on content-based features to identify relevant posts (Al-Makhadmeh & Tolba, 2019; Kwok & Wang, 2013; Pitsilis, Ramampiaro, & Langseth, 2018; Gorrell et al., 2019; Borkan et al., 2019; Hosseini, Kannan, Zhang, & Poovendran, 2017). For example, Dybala et al. (2010) used support vector machines (SVM) to classify comments posted on unofficial school websites in Japan into those that are potentially harmful and those that are not. The SVM method relied heavily on the use of vulgar words by the purported bullies. Alternatively, Dinakar et al. (2011) developed a binary text classifier to determine whether a message is on a sensitive topic or not. The authors then trained multiclass classifiers to categorize messages into one of three possible attacks: an attack on 1) sexual minorities ('sexuality'), 2) race and culture, or 3) one's intelligence. Dadvar and colleagues (2013) developed a multi-criteria evaluation system to detect cyberbullying among YouTube commentators. Their system assigns a 'bulliness' score to each user based on user information (age and membership duration), content features (post length, presence of profane words, profanity and bullying sensitive topics, the use of first and second person pronouns, nonstandard spelling), and activity features (number of uploads, subscribed channels, posts). The researchers further improved the performance of their expert system by adding supervised machine learning using a Naïve Bayes classifier (Dadvar et al., 2014).
One of the most promising works is by Nahar and colleagues (2014), which involves a fuzzy SVM approach to cyberbullying detection designed to handle noisy, imbalanced, and streaming text from social media. The advantage of their approach is that it only requires a small training set which can then be expanded based on unlabelled streaming data. The feature set included keywords, the number of swear words, presence of pronouns, the degree of users' emotions, the number of capitalized letters that may indicate shouting, additional metadata, and users' age and gender. The evaluation, based on three different datasets from Myspace, Kongregate, and Slashdot, demonstrated the superior performance of the proposed approach over more traditional, fully supervised approaches.
Although they show a lot of potential, solutions based on content-based classifiers, as described above, suffer from several shortcomings. They require training and as such tend to be domain and context dependent, making them less effective in environments where bullies or trolls may, and often do, use slang, image-based messaging, or other subversion techniques to attack others. Another limitation is that techniques based only on content focus on individual messages and are, therefore, not well equipped to detect a coordinated campaign by a set of users (or a set of accounts managed by a single entity) to disrupt an online group or discussion.

Graph-based approaches:
To address some of the limitations of the content-based approaches, we can turn to graph-based approaches which focus on user accounts (instead of posts) and connections between them. From a graph perspective, online participants can be considered as nodes, and interactions between them as edges. A graph-based approach has the benefit of not relying on the content of messages and therefore removes the need to train a text classifier to support different languages, communities, and platforms. Another advantage is that such approaches are capable of identifying clusters of related accounts based on certain network properties (e.g., densely-connected accounts). This, therefore, allows for the detection of coordinated anti-social acts. Existing graph-based approaches rely on the detection of anomalies in the network structure. They can generally be divided into three broad categories: feature-based methods, community-based methods, and relational learning methods (Aggarwal, 2013).
First, feature-based methods "transform the graph anomaly detection problem to the well-known and understood outlier detection problem" (Akoglu et al., 2014). Features may include node-level measures such as various node centralities, dyadic measures such as the number of common neighbours, or group-level measures such as density, reciprocity and modularity (Gruzd & Tsyganova, 2015). An example is a technique called OddBall (Akoglu et al., 2010) which uses graph-based measures, such as the number of neighbours and the number of triangles, for each ego network (that is, a node/ego, all of its neighbours and connections among the neighbours) in order to detect those ego networks that deviate from the majority. The second type are community-based methods. They usually rely on partition or community detection techniques that are able to identify densely connected groups of nodes. Outliers are then usually the nodes that bridge different communities. For example, the gSkeletonClu algorithm (Huang et al., 2013) finds outlier nodes as a by-product of the graph clustering algorithm. In FocusCO, another implementation of a community-based approach (Perozzi et al., 2014), the algorithm also requires clusters to include nodes that have similar attributes. Nodes that are placed in the same cluster but differ from the other nodes in some attribute values are labelled as outliers. The third approach is based on relational learning methods. This is a binary classification approach that classifies graph objects such as nodes and edges, while considering their inter-dependencies. For example, if one node is labelled as a 'troll', then this would increase the chance that the node connected to it is also a 'troll'; in other words, nodes connected to each other will likely have the same class label. Thus, in addition to node attributes, relational learning algorithms exploit class labels and attributes of node neighbours. Algorithms in this category often rely on an inference procedure to classify unlabelled nodes iteratively (Macskassy & Provost, 2007).
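To make the feature-based idea more concrete, the following minimal Python sketch computes two ego-network features of the kind described above: the number of neighbours of each node and the number of edges inside its ego network. It is an illustration only and not the OddBall implementation; the function name `ego_features` and the toy edge list are hypothetical, and the outlier detection step that would follow is omitted.

```python
from collections import defaultdict


def ego_features(edges):
    """Return {node: (n_neighbours, n_ego_edges)} for an undirected edge list."""
    neighbours = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            neighbours[u].add(v)
            neighbours[v].add(u)

    features = {}
    for ego, nbrs in neighbours.items():
        members = nbrs | {ego}
        # Collect each undirected edge inside the ego network exactly once.
        ego_edges = {
            frozenset((a, b))
            for a in members
            for b in neighbours[a]
            if b in members
        }
        features[ego] = (len(nbrs), len(ego_edges))
    return features


# Hypothetical toy network: a small triangle (a, b, c) plus a chain a-d-e.
toy_edges = [("a", "b"), ("a", "c"), ("b", "c"), ("a", "d"), ("d", "e")]
features = ego_features(toy_edges)
for node in sorted(features):
    print(node, features[node])
```

A feature-based method would then look for nodes whose feature values deviate from the pattern followed by the majority of ego networks.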
In this work, we propose to combine both a content-based and a graph-based approach. We start by discovering a communication network that represents who interacts with whom in an online group. Next, we apply a content-based machine learning classifier to determine whether an interaction between any two nodes in the communication network can be viewed as an anti-social act. Specifically, we rely on Perspective API, a machine learning classifier developed by Google that can recognize different types of anti-social acts such as toxicity, identity attack, insult, and threat (Chakrabarty, 2020). While the earlier versions of the Perspective API have been criticized for assigning high toxicity scores to non-toxic posts mentioning one's identity, such as posts with LGBTQ+ related words (Hosseini et al., 2017; Jain et al., 2018), the most recent iteration of the Perspective API (the current being version 6) has shown a high level of accuracy (~80%) in offensive language detection (Jigsaw, 2019; Pavlopoulos et al., 2019), and has been used in a number of recent empirical studies (Delisle et al., 2019; Hopp & Vargo, 2019; Mittos et al., 2019; Obadimu et al., 2019).
Using Perspective API, we score each interaction between two users on a scale from 0 to 1 based on the likelihood of that text-based interaction exhibiting an anti-social act. We then assign these scores as individual weights to each edge in the network. Edges with scores closer or equal to one are more likely to denote 'anti-social' exchanges between users. The following section describes our approach in more detail, including data collection from subreddits, the exporting of datasets in various formats, and analysis, as implemented in an online research system called Communalytic. Once communication networks are discovered and anti-social scores are assigned to edges, we use Gephi, a popular social network analysis tool (Bastian et al., 2009), to examine anti-social patterns in the network.
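The combined approach can be sketched in a few lines of Python. The example below is a simplified illustration under stated assumptions, not Communalytic's implementation: the `Reply` record, the `build_scored_edges` function, and the `toy_score` function are all hypothetical, and `toy_score` is a crude word-list stand-in for a real classifier such as Perspective API.

```python
from dataclasses import dataclass


@dataclass
class Reply:
    author: str         # user who wrote the reply
    parent_author: str  # user who is being replied to
    text: str


HOSTILE_WORDS = {"idiot", "stupid"}  # hypothetical toy word list


def toy_score(text):
    """Stand-in scorer: share of words found in the toy word list (0 to 1)."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    if not words:
        return 0.0
    return sum(w in HOSTILE_WORDS for w in words) / len(words)


def build_scored_edges(replies, score_fn):
    """Create one directed edge per reply, weighted by score_fn(text)."""
    edges = []
    for reply in replies:
        score = score_fn(reply.text)
        if not 0.0 <= score <= 1.0:
            raise ValueError("score must lie between 0 and 1")
        edges.append({
            "source": reply.author,
            "target": reply.parent_author,
            "toxicity": score,
        })
    return edges


sample_replies = [
    Reply("n6", "n2125", "you idiot"),
    Reply("n8", "n428", "thanks for sharing"),
]
scored_edges = build_scored_edges(sample_replies, toy_score)
print(scored_edges)
```

In practice, `score_fn` would wrap a call to the chosen classifier, and the resulting edge list could be written out in a network format for analysis in other software.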

Introduction to Communalytic
Communalytic is a web-based research tool that can collect and analyze publicly available data from Reddit. It is designed to study patterns of anti-social behaviour and can display the results of analyses visually in a variety of ways. Currently, the main data source for Communalytic is Reddit, specifically the subreddits within Reddit. Subreddits are the online forums that comprise Reddit and are denoted with an "r/" before the subreddit title. They are often dedicated to a specific topic, and users can subscribe to a subreddit and post and comment within it. Using Communalytic, researchers can collect publicly available submissions, comments, and replies posted within a subreddit. When importing data from Reddit, users are asked to specify the subreddit that they wish to collect data from and indicate the length of data collection (see Figure 1).

Figure 1. Data collection form in Communalytic
Once the data is collected, users can perform a variety of tasks with the dataset. Communalytic provides an overview of the dataset, including the subreddit name, the number of posts extracted, and the time period of extraction (see Figure 2). In the Dataset Overview screen, users can view visual representations of the number of posts per day, a word cloud depicting the most frequent words used in the dataset, as well as the top ten posters of a selected subreddit. In addition, this view enables users to export both posts and communication network data to their own computer for further analysis. Posts are exported as a CSV file, which also contains metadata about the collected posts, such as the author's username, date published, the content of the post, and the number of upvotes, as provided by Reddit's Public API. The network data file is exported as a GraphML file and can then be imported to other software, such as Gephi, for social network analysis. A snapshot of the network is also automatically generated by Communalytic, providing users with a static preview image of the dataset.

Figure 2. Dataset overview screen in Communalytic
After data collection has been completed, users can run a 'toxicity analysis' on the dataset. During this stage, with the help of Perspective API, Communalytic generates seven types of anti-social scores (between 0 and 1) for each post in the dataset, with scores closer to one indicating higher levels of toxicity. Communalytic uses the following scores as provided by Perspective: toxicity, severe toxicity, insult, identity attack, profanity, threat, and attack on commenter. Table 2 includes definitions and corresponding sample posts for each category. Using the toxicity analysis option in Communalytic, users are able to determine the overall level of anti-social behaviour in the dataset by examining the average scores, distributions of scores for all posts in the datasets, as well as by reviewing the top 10 posts that received the highest and lowest scores (see Figure 3a, 3b). The scores for individual posts and replies are also downloadable as a CSV file.

Figure 3a. Toxicity analysis summary table for r/metacanada subreddit.
One particularly useful feature of Communalytic is that within the exported network file, one can access edge-level weights corresponding to each of the seven Perspective scores of anti-social acts available in Communalytic. For example, based on the data in Table 1, a user n6 sent a highly toxic (toxicity=0.86) and insulting (insult=0.85) post to user n2125 (edge e2). And since each reply is recorded as a single edge, some users will have multiple edges between them. For instance, user n8 has two edges connecting them to user n428 (edges e8 and e9). These additional metadata fields embedded in the network file allow researchers to visualize and examine different communication layers based on different types of anti-social acts. This will be demonstrated in Section 4.2.
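As an illustration of how such edge-level scores might be read outside of Gephi, the sketch below parses a small GraphML fragment using Python's standard library. The fragment is a hypothetical example written for this sketch and is not an actual Communalytic export: the attribute names (`toxicity`, `insult`) and key ids are assumptions, and only the scores for edge e2 come from the example above, while the scores for e8 and e9 are invented placeholders.

```python
import xml.etree.ElementTree as ET
from collections import Counter

NS = {"g": "http://graphml.graphdrawing.org/xmlns"}

# Hypothetical GraphML fragment; attribute names and key ids are assumed.
SAMPLE = """
<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <key id="d0" for="edge" attr.name="toxicity" attr.type="double"/>
  <key id="d1" for="edge" attr.name="insult" attr.type="double"/>
  <graph edgedefault="directed">
    <node id="n6"/><node id="n2125"/><node id="n8"/><node id="n428"/>
    <edge id="e2" source="n6" target="n2125">
      <data key="d0">0.86</data><data key="d1">0.85</data>
    </edge>
    <edge id="e8" source="n8" target="n428">
      <data key="d0">0.10</data><data key="d1">0.05</data>
    </edge>
    <edge id="e9" source="n8" target="n428">
      <data key="d0">0.20</data><data key="d1">0.15</data>
    </edge>
  </graph>
</graphml>
"""


def read_edges(graphml_text):
    """Return a list of dicts, one per edge, with its numeric attributes."""
    root = ET.fromstring(graphml_text.strip())
    # Map key ids (e.g. "d0") to attribute names (e.g. "toxicity").
    names = {
        key.get("id"): key.get("attr.name")
        for key in root.findall("g:key", NS)
        if key.get("for") == "edge"
    }
    edges = []
    for edge in root.iterfind(".//g:edge", NS):
        row = {
            "id": edge.get("id"),
            "source": edge.get("source"),
            "target": edge.get("target"),
        }
        for data in edge.findall("g:data", NS):
            name = names.get(data.get("key"))
            if name is not None:
                row[name] = float(data.text)
        edges.append(row)
    return edges


parsed_edges = read_edges(SAMPLE)
# Count parallel edges (multiple replies) between the same pair of users.
pair_counts = Counter((e["source"], e["target"]) for e in parsed_edges)
print(parsed_edges[0])
print(pair_counts[("n8", "n428")])
```

Keeping one record per edge, rather than merging parallel edges, preserves the score of each individual reply.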

Toxicity analysis
In total, there were 22,560 posts, including 1,717 submissions (posts that start a new thread), and 20,843 replies. Table 3 shows how many posts are automatically classified under each of the seven Perspective anti-social scores available in Communalytic. For example, 5.3% to 15.0% of posts can be characterized as toxic, whereas only 0.2% to 6.2% of posts are characterized as severely toxic. These ranges vary depending on the threshold used. The table shows the counts based on three different thresholds: 0.7, 0.8, and 0.9. Considering the polarizing nature of this group, we expected a larger portion of posts to be toxic, but only a small fraction of them really are. It is likely that the level of toxicity was limited due to the active moderation by eight moderators in this subreddit. Future work will need to compare these levels to other subreddits to establish the baseline. To determine a cut-off value for the Perspective scores, we recommend testing different thresholds to identify a suitable level based on the research questions and the focus of a given subreddit. This is because by lowering the threshold to 0.7 or lower, the system will more likely catch most anti-social acts, but at the same time, it will increase the likelihood of labelling a post as 'anti-social' when it is not, thus introducing false positive results. On the other hand, by setting the threshold to 0.9 or higher, we will reduce the chance of false positives but will be at risk of missing some anti-social posts that scored below 0.9. In general, if your project aims to identify more severe cases of toxicity, and explicit cases of anti-social behaviour, then setting the threshold to 0.9 may be appropriate. But if your project is seeking to examine all possible anti-social acts, then you may consider casting a wider net by lowering the threshold to 0.7. To evaluate the accuracy of the Perspective scores, we recommend recruiting human coders who would review and score a smaller, random sample of the collected posts to compare their scores with the ones assigned by Perspective API. This way, you would be able to establish and report accuracy, precision and recall measures for how well Perspective detects anti-social acts in your specific dataset. The calculation of these evaluation metrics is outside the scope of this chapter, but well covered in other texts (see, for example, Dhaoui et al., 2017). In this case study, we use the threshold of 0.8.
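The trade-off between thresholds can be illustrated with a short Python sketch. The scores and human labels below are hypothetical toy values, not data from r/metacanada, and the function names `share_above` and `evaluate` are introduced only for this example.

```python
def share_above(scores, threshold):
    """Fraction of posts whose score is at or above the threshold."""
    return sum(s >= threshold for s in scores) / len(scores)


def evaluate(scores, human_labels, threshold):
    """Compare thresholded scores with human judgements (True = anti-social)."""
    predicted = [s >= threshold for s in scores]
    pairs = list(zip(predicted, human_labels))
    tp = sum(p and h for p, h in pairs)              # true positives
    fp = sum(p and not h for p, h in pairs)          # false positives
    fn = sum((not p) and h for p, h in pairs)        # false negatives
    tn = sum((not p) and (not h) for p, h in pairs)  # true negatives
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "accuracy": (tp + tn) / len(pairs),
    }


# Hypothetical toy sample: six posts scored by a classifier and by human coders.
toy_scores = [0.95, 0.85, 0.75, 0.72, 0.40, 0.10]
toy_labels = [True, True, True, False, False, False]

for threshold in (0.7, 0.8, 0.9):
    print(threshold, share_above(toy_scores, threshold),
          evaluate(toy_scores, toy_labels, threshold))
```

In this toy sample, the 0.7 threshold recovers every post the coders marked as anti-social but also flags one post they did not, whereas the 0.9 threshold flags no such post but misses two of the three anti-social ones.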
While Perspective calculates several different scores, some of them are interrelated. For example, based on correlation analyses (see Table 4), the following four scores are highly correlated with each other (Pearson correlation > 0.9): toxicity, severe toxicity, insult, and profanity. This suggests that depending on one's research questions, it might be enough to examine one of the above-mentioned scores. For the purpose of this chapter, out of the four highly correlated scores, we will examine the toxicity score.
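For readers who wish to run a similar check on their own exported scores, the Pearson correlation between two score columns can be computed as in the sketch below. The three short score lists are hypothetical toy values, not the values behind Table 4, and the `pearson` helper is written out by hand only to keep the example self-contained.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (spread_x * spread_y)


# Hypothetical per-post scores for three score types.
toy_toxicity = [0.1, 0.4, 0.6, 0.9]
toy_insult = [0.05, 0.35, 0.65, 0.85]
toy_threat = [0.3, 0.1, 0.2, 0.1]

r_tox_insult = pearson(toy_toxicity, toy_insult)
r_tox_threat = pearson(toy_toxicity, toy_threat)
print(round(r_tox_insult, 3), round(r_tox_threat, 3))
```

Score pairs whose coefficient exceeds a chosen cut-off, such as the 0.9 used above, are candidates for being represented by a single score in later analysis.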
We also note that the threat score is the most 'conservative' metric because it flags a smaller proportion of posts as anti-social, between 0.2% and 1.7% of posts, depending on the set threshold. Considering the limited scope of the threat score relative to the other scores, we exclude it from further analysis. In sum, for the remainder of this chapter, we will examine the types of interactions and resulting communication networks based on the following anti-social scores: toxicity, identity attack and attack on commenter scores.

Social network analysis
While the review of toxicity scores offers a general sense of how toxic a particular group is, this analysis alone does not highlight which users tend to instigate such behaviour, which users are on the receiving end, and whether there is a specific pattern to the spread of anti-social behaviour. For example, is the anti-social behaviour a norm in the group as a whole, or are there users who are more likely to engage in such behaviour than others? Furthermore, are there signs of coordination among users (a behaviour known as brigading) to target others in the group? To help us answer these questions, we use Communalytic to discover and export a communication network representing who replies to whom in the group. As noted earlier, the uniqueness of this network data is that edges have additional attributes assigned to them in the form of Perspective API scores.
To fully understand the inner dynamics of r/metacanada, we turned to Gephi, an open-source software package. Communalytic exports network files in the GraphML format, which is supported by a wide variety of programs for social network analysis (SNA), including Gephi. Previous studies of various online groups suggest that by examining communication network structures, we may be able to predict the level and quality of group participation, and even the group's longevity (Chua et al., 2007; Gruzd & Haythornthwaite, 2013).
Excluding isolates (that is, posters who have not received any replies), the resulting network consisted of 2,454 nodes and 14,579 edges (see Figure 4). In this case, each node represents a redditor, and each edge represents a reply to an original post or another reply. The size of each node corresponds to the number of other users they replied to or received a reply from (also known as total degree centrality). Similar to other online groups (Yang et al., 2018), the r/metacanada network exhibits a core-periphery structure; that is, there is an active group of users in the core of the network who post and reply to each other, with less active group members found at the periphery of this network. Another metric that can be used to describe this network is modularity. It is a network-level measure that ranges from 0 to 1 where values closer to 0 suggest a highly connected network (Gruzd et al., 2016). By applying the label propagation algorithm (Raghavan et al., 2007), we calculated the value of modularity as 0.264. Because this value is closer to 0, it indicates that most conversations were primarily among the same group of users. We also note that the overall reciprocity of the network is 0.38, meaning that 38% of all ties among users are reciprocal (that is, they received a reply); this value is consistent with other online discussion groups sharing similar interests (i.e., politics and identity) (Del Valle et al., 2020; Sun, 2019). In order to identify the most toxic users and their interaction patterns, we used the Filter Tool in Gephi to show only edges with values higher than or equal to the selected threshold of 0.8 for each of the three scores we are examining: toxicity, identity attack and attack on commenter scores. Figure 5 shows the resulting network visualizations after the filter was applied.
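The filtering step and the reciprocity measure can also be reproduced outside of Gephi. The sketch below is a minimal illustration on a hypothetical toy edge list. It assumes one common definition of reciprocity, namely the share of distinct directed ties (u, v) for which the reverse tie (v, u) also exists, which may differ in detail from the calculation used for the values reported in this chapter. The function names are introduced only for this example.

```python
def filter_edges(edges, attribute, threshold):
    """Keep only edges whose score for `attribute` is at or above the threshold."""
    return [e for e in edges if e.get(attribute, 0.0) >= threshold]


def reciprocity(edges):
    """Share of distinct directed ties that have a tie in the reverse direction."""
    ties = {(e["source"], e["target"]) for e in edges if e["source"] != e["target"]}
    if not ties:
        return 0.0
    mutual = sum((v, u) in ties for u, v in ties)
    return mutual / len(ties)


# Hypothetical toy edge list; each dict is one reply with its toxicity score.
toy_network = [
    {"source": "a", "target": "b", "toxicity": 0.90},
    {"source": "b", "target": "a", "toxicity": 0.85},
    {"source": "a", "target": "c", "toxicity": 0.82},
    {"source": "c", "target": "d", "toxicity": 0.30},
    {"source": "a", "target": "b", "toxicity": 0.10},  # second reply from a to b
]

toxic_layer = filter_edges(toy_network, "toxicity", 0.8)
print(len(toxic_layer), reciprocity(toy_network), reciprocity(toxic_layer))
```

Applying the same two functions with a different `attribute` would yield the corresponding layer for another score type, provided that score is stored on each edge.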
In each network visualization, the node size represents the number of other nodes the user replied to (out-degree centrality). This way, larger nodes represent users who tend to attack others. Table 5 lists the network-level properties and metrics for each of these sub-networks. Even though all three networks in Figure 5 display a similar core-periphery structure, there are some structural differences between them. For example, the modularity scores for the Identity Attack and Attack on Commenter networks are much higher than for the Toxicity network. This suggests that interactions classified as Identity Attack and Attack on Commenter tend to stay within closely connected groups of users (higher values of modularity), likely due to the directed nature of such attacks.
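To make the modularity comparison more concrete, the sketch below computes the standard Newman modularity score for a given assignment of users to communities. It rests on several assumptions: the network is treated as undirected and unweighted, the community labels are supplied by hand rather than produced by label propagation, and the function and variable names are hypothetical. It is not a reconstruction of the exact procedure used for Table 5.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman modularity of a partition on an undirected, unweighted graph.

    edges: iterable of (u, v) pairs; direction and duplicates are ignored.
    community: dict mapping each node to a community label.
    """
    undirected = {frozenset((u, v)) for u, v in edges if u != v}
    m = len(undirected)
    if m == 0:
        return 0.0

    intra = defaultdict(int)       # edges with both ends in the community
    degree_sum = defaultdict(int)  # total degree of the community's nodes
    for pair in undirected:
        u, v = tuple(pair)
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            intra[community[u]] += 1

    return sum(
        intra[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Toy example: two triangles joined by a single edge.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("d", "e"), ("e", "f"), ("d", "f"),
             ("c", "d")]
two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {node: 0 for node in two_groups}

q_split = modularity(toy_edges, two_groups)
q_single = modularity(toy_edges, one_group)
```

In the toy example, splitting the users into the two triangles yields a higher score than placing everyone in a single community, which is the sense in which higher modularity points to interactions staying within closely connected groups.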
Based on the network visualizations, we can see how some users tend to be the primary spreaders of anti-social acts in this group (users depicted as larger nodes). Furthermore, based on relatively low values of reciprocity, users who are attacked do not tend to reply in kind. The only exception is the Attack on Commenter network, which has the highest reciprocity value of 0.234 relative to the other two networks. This means that about 23.4% of interactions that were classified as "Attack on Commenter" are reciprocal, as opposed to only about 6.7% for the Identity Attack type interactions.
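As an illustration of how such a reciprocity value can be obtained, the sketch below uses one common definition: the share of directed ties whose reverse tie is also present. The chapter does not spell out the exact formula behind the reported values, so this definition, the function name, and the sample data are assumptions.

```python
def reciprocity(edges):
    """Share of directed ties (u, v) for which the reverse tie (v, u) also exists.

    Self-replies and duplicate ties are ignored.
    """
    ties = {(u, v) for u, v in edges if u != v}
    if not ties:
        return 0.0
    mutual = sum(1 for u, v in ties if (v, u) in ties)
    return mutual / len(ties)


# Hypothetical sample: only the a <-> b pair is reciprocated.
sample_ties = [("a", "b"), ("b", "a"), ("c", "a"), ("a", "d"), ("d", "c")]
sample_reciprocity = reciprocity(sample_ties)  # 2 of 5 ties are reciprocated
```

Under this definition, a value of 0.234 would mean that roughly one in four attack ties is answered by a tie in the opposite direction.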
To explore this pattern of interactions further, for each of the three scores, a researcher can use the Data Laboratory option in Gephi to locate and examine users with the greatest number of incoming edges (in-degree centrality) and those with the greatest number of outgoing edges (out-degree centrality). Users with the highest in-degree values would be recipients of anti-social content from the greatest number of nodes. Users with the highest out-degree values, in turn, are those who post anti-social replies to the greatest number of nodes in the network. While outside the scope of this chapter, future research may include employing a more qualitative and content-driven approach in examining the anti-social acts of these key users within the network in more detail. Coupled with the network-level data analysis and visualization, a qualitative approach would provide a more nuanced understanding of anti-social behaviour present within an online community, and help add richness to this line of research.
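The ranking of key users by in-degree and out-degree can be sketched in a few lines of Python. The example assumes a filtered edge list of (replier, recipient) pairs and counts distinct neighbours; the function name and the sample data are hypothetical.

```python
from collections import defaultdict


def degree_rankings(edges):
    """Rank users by in-degree and out-degree, counting distinct neighbours.

    edges: iterable of (replier, recipient) pairs; self-replies are ignored.
    Returns two lists of (user, degree) tuples, highest degree first.
    """
    repliers_of = defaultdict(set)    # user -> distinct users who replied to them
    recipients_of = defaultdict(set)  # user -> distinct users they replied to
    for src, dst in edges:
        if src == dst:
            continue
        recipients_of[src].add(dst)
        repliers_of[dst].add(src)

    in_degree = {user: len(others) for user, others in repliers_of.items()}
    out_degree = {user: len(others) for user, others in recipients_of.items()}

    # Sort by degree (descending), then by name for a stable order.
    by_in = sorted(in_degree.items(), key=lambda kv: (-kv[1], kv[0]))
    by_out = sorted(out_degree.items(), key=lambda kv: (-kv[1], kv[0]))
    return by_in, by_out


# Hypothetical filtered sub-network.
sample_edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("d", "c")]
top_in, top_out = degree_rankings(sample_edges)
```

Here `top_in[0]` is the user targeted by the most distinct repliers and `top_out[0]` is the user who directed replies at the most distinct recipients; these are the accounts a follow-up qualitative reading would start from.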

Conclusions
The rising tide of online anti-social behaviour has elevated public concern and skepticism over the perceived benefits and promise of social media in society (Bauman & Baldasare, 2015). A darker side of social media has emerged and remains evident today, with various countries, governing bodies, and citizens grappling with the impending normalization of aggressive behaviour, hostility, and negative discourse in online spaces. This realization has led to an influx of research examining the patterns of 'anti-social' behaviour in online communities, as well as the development of the tools required for systematic investigations.
One such tool is Communalytic. It provides researchers with an accessible and easy-to-use approach for analyzing public groups on Reddit. Its ability to export data for social network analysis, along with anti-social scores, makes it a useful tool for researchers to examine and analyze anti-social behaviour at both the group and user levels. These functions are supported by the use of Google's Perspective API, which uses a machine learning classification system to score the content across various categories of anti-social behaviour and in six different languages (English, French, German, Italian, Portuguese and Spanish). Furthermore, the ability to export network-level data allows for an additional analysis of online exchanges using metrics from social network analysis.

Figure 3b. The distribution of toxicity scores for the r/metacanada subreddit, showing that the toxicity score for most posts is under 0.3. (Y axis: post count; X axis: toxicity score)

Figure 5. Network visualization of anti-social interactions in r/metacanada with a threshold of 0.8 (Node size = out-degree centrality)

Table 2. Seven categories of anti-social acts from Perspective API available for analysis in Communalytic, their definitions and examples

Table 1. Edge-level attributes for each of the seven categories of anti-social acts (highlighted values are referenced in text above).

Table 3. Number and percentage of toxic posts

Table 4. Pearson correlation analysis among Perspective API scores

Table 5. Sub-network-level properties and metrics for the three selected scores of anti-social acts