Abstract
Content moderation is critical for maintaining healthy online spaces. However, it remains a predominantly manual task. Moderators are often exhausted by a low moderator-to-post ratio. Researchers have been exploring computational tools to assist human moderators. The natural language understanding capabilities of large language models (LLMs) open up possibilities to use LLMs for online moderation. This work explores the feasibility of using LLMs to identify rule violations on Reddit. We examine how an LLM-based moderator (LLM-Mod) reasons about 744 posts across 9 subreddits that violate different types of rules. We find that while LLM-Mod has a good true-negative rate (92.3%), it has a poor true-positive rate (43.1%), performing poorly when flagging rule-violating posts. LLM-Mod reliably flags keyword-matched rule violations, but cannot reason about posts requiring more complex judgment. We discuss the considerations for integrating LLMs into content moderation workflows and designing platforms that support both AI-driven and human-in-the-loop moderation.
1 INTRODUCTION AND BACKGROUND
Online communities function in large part because of effective moderation. Well-moderated communities are productive, open to a variety of members, and incur low physical and social costs [16]. Productivity is often stymied by the common abuses detailed in Grimmelmann’s taxonomy of moderation [16]. In particular, because readers can only view a limited number of posts at a time, the moderator’s role becomes increasingly important for reducing the “cacophony” and “manipulation” of antisocial content in communities that receive thousands of posts a day [16, 32]. While we often see content moderation in the form of censoring hate speech and abuse [14, 18, 38] or adding trigger warnings, many other types of posts need to be flagged and removed [31]. The “abuse” that moderators need to filter lies on a spectrum beyond explicit hate speech, including irrelevant or off-topic content, trolling, and content violating community rules. This expanded definition makes abuse harder to detect, because the task often requires human-level reasoning [7, 22, 24, 31].
For example, Figure 1a shows the guidelines of r/AskHistorians, and Figure 1b shows a complex rule violation in the same subreddit, breaking the community guideline “Rule 7) Answers should not be speculative or anecdotal.” This violation is not easily detectable by a rule-based model that simply looks for the words “what if”; identifying it requires an understanding of hypotheticals. Given the complexities of community rule violations, social media platforms such as Facebook, Twitter, and Reddit heavily rely on user reporting and manual effort by human moderators. Additionally, online communities typically suffer from a severely high post-to-moderator ratio; for example, large subreddits such as r/AskReddit, with 44.7M members, have only 33 moderators to deal with thousands of daily posts. This causes emotional and physical exhaustion for moderators inundated with posts, and frustration for community members who deal with lower-quality content, a lack of transparency into removed posts, and a flaky appeals process [12, 18, 21]. Further, in unpaid moderation contexts like Reddit and Facebook, moderators often quit due to time allocation issues, conflicts with other moderators over policies, and shifts in community values as membership changes [35]. Reddit’s community rules represent community values, but their interpretation can be subjective [23], making moderation taxing for both automated agents and human moderators. Values misalignment and moderator infighting, in particular, were often found to occur because moderators would manipulate the rules for “power” and “dominance” [35]. A consistent reading of the rules, or an impartial judge, could ease this administrative struggle.
Reddit offers automated moderation tools such as AutoModerator (Automod) that are configured to filter for undesirable phrases defined in a wiki of regular expressions. If a post contains an undesirable phrase, the post is automatically taken down. However, regex-based tools like Automod are not able to parse more complex cultural conversations or provide transparency for enforcement actions [21]. Keeping Automod up-to-date also creates additional work for human moderators [18]. Prior work has explored other automated techniques leveraging machine learning and NLP methods [3, 17]. However, these systems are often based on word-ban classifiers, which are inflexible to changing community guidelines and rarely provide transparency in their decision-making. With the growing adoption of large language models (LLMs), research has started exploring LLMs for content moderation tasks [30]; notable is the work of Kumar et al. on leveraging LLMs for toxicity detection [24]. Prior work has shown that fine-tuning LLMs may lead to overfitting for content moderation cases [27]. Other studies have discussed the idea that moderation is difficult because of disagreement on how to read a rule and that LLMs can provide an objective third-party decision [28, 38]. However, the effectiveness and reasoning capacities of LLMs in identifying rule violations on online platforms still remain unknown and underexplored.
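To make the contrast concrete, the following minimal Python sketch mimics a regex-based filter in the spirit of Automod; the banned patterns and example posts are illustrative assumptions rather than Automod’s actual configuration, and they show how a keyword rule catches a literal “what if” but misses the same hypothetical phrased differently.

```python
import re

# Illustrative patterns only; a real Automod wiki holds moderator-curated rules.
BANNED_PATTERNS = [
    r"\bwhat\s+if\b",    # crude proxy for "no hypothetical questions"
    r"\bupvote\s+if\b",  # vote solicitation
]

def should_remove(post_text: str) -> bool:
    """Return True if the post matches any banned phrase (case-insensitive)."""
    return any(re.search(p, post_text, re.IGNORECASE) for p in BANNED_PATTERNS)

print(should_remove("What if Rome never fell?"))          # True: literal keyword match
print(should_remove("Suppose Rome had never fallen..."))  # False: same violation, no keyword
```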
Motivated by the above, our work asks the following research question—What is the reasoning capability of LLMs when handling rule violations in online communities? We conduct our study by designing an LLM-based moderator workflow (LLM-Mod) and examining how it reasons about 744 posts, spanning nine subreddits, that pass or violate different types of community rules.
A key objective of this work is to evaluate the reasoning of off-the-shelf LLMs and their performance on Reddit data without much additional context or fine-tuning. We propose a workflow for moderators of community-based platforms to manage content at scale while providing meaningful feedback and explanations to their users. We explore the conditions where LLMs succeed in distinguishing rule violations and where they struggle. We find that while LLM-Mod rarely misflags rule-passing posts (a 92.3% true-negative rate), it identifies fewer than half of the rule-violating posts (a 43.1% true-positive rate), handling keyword-matched violations far better than violations that require more complex reasoning.
2 STUDY AND METHODS
2.1 System and Study Design
We propose an LLM-based moderator (LLM-Mod), built on GPT-3.5, that is given a subreddit’s community guidelines along with a post and asked to judge whether the post violates any rule and to justify its decision.
2.1.1 Proposed Workflow.
Figure 2 depicts how we prompted LLM-Mod: each prompt provides the subreddit’s rules and the post under review, and the model is asked to return a judgment on whether the post violates a rule, along with a justification for its decision.
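As a concrete illustration of this workflow, the sketch below assumes the OpenAI chat completions API and a GPT-3.5 model; the prompt wording is a simplified stand-in for the prompts summarized in Figure 2, not the exact text we used.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def moderate(rules: str, title: str, body: str) -> str:
    """Ask the model for a judgment and justification on a single post."""
    prompt = (
        "You are a moderator for a subreddit with the following rules:\n"
        f"{rules}\n\n"
        "Decide whether the post below violates any rule. Respond with a "
        "Judgment (Pass, or Violation with the rule number) and a brief "
        "Justification.\n\n"
        f"Title: {title}\nBody: {body}"
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content
```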
2.2 Evaluation and Dataset
We divide our evaluation into quantitative performance metrics and human-evaluated performance metrics with multi-step prompting to better gauge the reasoning ability of the model. The quantitative performance metrics include—1) precision, 2) recall, 3) identifying which guidelines the model is unable to reason about, and 4) identifying the subreddit category in which the model reasoned best. In the human evaluation task, for some key representative examples, we aimed to determine—1) what kind of prompt engineering (e.g., multi-step prompting, justification, etc.) can help the model better reason about nuanced details, 2) why the model may reach an incorrect decision, and 3) what types of rules the model has trouble reasoning about.
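For reference, the quantitative rates can be computed directly from binary labels (1 = rule-violating, 0 = rule-passing), as in the short sketch below; the labels shown are toy values, not our study data.

```python
def rates(y_true, y_pred):
    """Precision, recall (true-positive rate), and true-negative rate."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    return {
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall_tpr": tp / (tp + fn) if (tp + fn) else 0.0,
        "tnr": tn / (tn + fp) if (tn + fp) else 0.0,
    }

print(rates([1, 1, 1, 0, 0], [1, 0, 0, 0, 1]))  # toy example
```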
While quantitative metrics are important for determining the consistency of the model, this work primarily centers on manual evaluation and follow-up data collection on Reddit posts. This research focuses on text-based subreddits since image reasoning models are not as readily available. The subreddits from which we sample are r/askhistorians, r/askscience, r/changemyview, r/explainlikeimfive, r/jokes, r/nostupidquestions, r/outoftheloop, r/philosophy, and r/writingprompts.
For each subreddit, we collected two types of posts: (1) Rule-Passing Posts that are valid in the subreddit and (2) Rule-Violating Posts that violate community guidelines beyond keyword-based violations. For rule-passing posts, we used the Reddit API endpoint that gets “hot” posts, assuming that “hot” posts are valid given that they have not been removed despite high interactions. The rule-violating posts were hard to collect as the Reddit API does not allow obtaining removed or reported posts. So, we obtained this data through the following means (Table A1 shows examples):
—Manual Selection. We read through new Reddit posts, and manually selected ones that violated community guidelines.
—Manual Writing. We manually wrote posts that intentionally violated a certain rule.
—AI-Generation. We provided ChatGPT with the rules and mission of a subreddit, and asked it to generate posts that violate a specific rule. We then manually modified each post so that it was not a word-choice-detectable rule break.
Overall, our dataset consisted of 600 rule-passing posts—100 each from r/askscience, r/changemyview, r/explainlikeimfive, r/jokes, r/outoftheloop, and r/writingprompts. We obtained a total of 144 rule-violating posts—24 from r/askhistorians, 34 from r/changemyview, 39 from r/explainlikeimfive, 24 from r/nostupidquestions, and 23 from r/philosophy.
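For reproducibility, the rule-passing collection step described above can be sketched as follows, assuming the PRAW library; the credentials are placeholders, and the subreddit list matches the six communities from which we drew rule-passing posts.

```python
import praw

reddit = praw.Reddit(
    client_id="YOUR_CLIENT_ID",        # placeholder credentials
    client_secret="YOUR_CLIENT_SECRET",
    user_agent="llm-mod-data-collection",
)

SUBREDDITS = ["askscience", "changemyview", "explainlikeimfive",
              "jokes", "outoftheloop", "writingprompts"]

rule_passing = []
for name in SUBREDDITS:
    # "Hot" posts are assumed valid: they remain up despite high interaction.
    for submission in reddit.subreddit(name).hot(limit=100):
        rule_passing.append({
            "subreddit": name,
            "title": submission.title,
            "body": submission.selftext,
        })
```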
3 RESULTS
We evaluate the performance of LLM-Mod separately on rule-passing posts (Section 3.1) and rule-violating posts (Section 3.2).
3.1 LLM-Mod on Rule-Passing posts
We first evaluate how LLM-Mod handles rule-passing posts, where flagging a post constitutes a false positive. Below, we describe the types of posts and guidelines that LLM-Mod struggled to reason about.
3.1.1 Current community activity.
Classification of posts based on guidelines that relate to current forum activity was impossible because we did not provide this information as context. For example, several subreddits expect users to check that a question has not already been asked and answered before posting; judging compliance with such rules requires knowledge of recent community activity that our prompts did not include.
3.1.2 Organized and helpful responses.
[Example post, with LLM-Mod’s judgment and justification.]
3.1.3 Gauging Human Emotion.
The model struggled to gauge the human emotion associated with certain posts. This was especially evident when classifying posts in r/ChangeMyView, where users post a personal opinion or stance that they feel strongly about and ask other users to change their viewpoint. The subreddit’s guidelines specifically state, “Posts cannot express a neutral stance,” and also, “Don’t be rude or hostile to other users.” The model sometimes treated these two rules as conflicting. For example, for the following post, the model took the user’s exaggerated language as disrespectful to others. While the original poster used some exaggerated language and was upset by the song, they did not single out other community members, nor were they unnecessarily rude to others who enjoyed the song. The model was unable to capture the user’s emotional tone.
[Example post, with LLM-Mod’s judgment and justification.]
3.1.4 (Potentially) Discriminatory Jokes.
On r/Jokes, the model was unable to correctly classify jokes whose punchline or setup included statements about race, sexual orientation, gender, etc. Upon human inspection, these posts were not necessarily discriminatory and were typically a clever play on words. However, reasoning about whether a joke crosses a line requires a deeper understanding of the joke and a way to handle the subjectivity of where the line of civility lies. In the following example, the poster plays on the double meaning of “straight” and does not necessarily mean to discriminate against any group based on sexual orientation. While this post would have been allowed on r/Jokes, LLM-Mod flagged it as a violation.
[Example post, with LLM-Mod’s judgment and justification.]
3.2 LLM-Mod on Rule-Violating posts
Now, we evaluate the performance of LLM-Mod on the 144 rule-violating posts in our dataset, where missing a violation constitutes a false negative.
3.2.1 Multi-step prompting.
We adopted a multi-step prompting approach to augment the prompts given to LLM-Mod with additional information. Based on the kind of reasoning needed to detect a violation, we group the rule-violating posts into two levels: keyword association and stance analysis.
Level 1: Keyword Association. This occurs when a post contains keywords directly associated with a rule violation. For instance, the example post below (from r/askhistorians) highlights how LLM-Mod correctly flags a hypothetical “what if” question.
[Example post, with LLM-Mod’s judgment and justification.]
Rules 1 and 2 of r/askhistorians prohibit hypothetical posts as they are not historical in nature. Hypothetical questions often contain the words “what if”, as seen in the above post. We found that, for this particular rule, LLM-Mod flagged violations reliably, likely because the phrase “what if” serves as a strong lexical cue.
Level 2: Stance Analysis. This includes cases where a post must be analyzed for its stance, beyond simple word association. In the example post below from r/changemyview, LLM-Mod fails to recognize the violation, which requires analyzing the stance the post expresses rather than matching keywords.
[Example post, with LLM-Mod’s judgment and justification.]
In comparison to Level 1 (keyword association), we note that Level 2 (stance analysis) lacks particular keywords that associate with loaded questions in the way that “what if” associates with hypotheticals. Similarly, it is hard to conduct a keyword-based classification of whether a post takes a neutral stance.
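The difference between the two levels can be illustrated in code. The sketch below, which again assumes the OpenAI chat API and illustrative prompt wording, shows that a Level 1 violation can be caught with a keyword test, while a Level 2 check must ask the model to analyze the post’s stance directly.

```python
import re
from openai import OpenAI

client = OpenAI()

def level1_hypothetical(post: str) -> bool:
    # Level 1: keyword association -- "what if" strongly signals a hypothetical.
    return re.search(r"\bwhat\s+if\b", post, re.IGNORECASE) is not None

def level2_neutral_stance(post: str) -> str:
    # Level 2: stance analysis -- there is no keyword to match, so we must ask
    # the model whether the post takes a stance at all.
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{
            "role": "user",
            "content": (
                "r/changemyview requires posts to take a non-neutral stance. "
                "Does the following post express a neutral stance? Answer Yes "
                f"or No, then explain briefly.\n\n{post}"
            ),
        }],
        temperature=0,
    )
    return response.choices[0].message.content
```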
3.3 Other Strengths and Weaknesses in LLM-Mod’s Responses
We thematically group other characteristics of LLM-Mod’s responses into strengths and weaknesses.
Strengths. We note two strengths of LLM-Mod’s responses below.
—Defining Key Terms. One of the prompt augmentation methods we used was asking the model to define key terms, for instance, asking the LLM to define a “neutral stance” and come up with example posts that express one. GPT 3.5 was able to do this very well and consistently. This indicates that it knows, at least definitionally, the terms of a rule and can identify simple cases of rule breaks if they are especially egregious.
—Identifying problematic parts of a post. Another prompt augmentation method we used was asking the LLM to identify the problematic part of a post after telling it that the post violates a certain rule. With this information, the LLM consistently identified which part of the post was in violation and why it broke the given rule (a sketch of both augmentations appears at the end of this section). This indicates that the LLM can reason about Level 2 concepts but cannot identify them without help.
Weaknesses. We noted two weaknesses of LLM-Mod’s responses.
—Vacillating responses: Another observation that makes LLM-Mod’s judgments difficult to rely on is its tendency to vacillate, changing its decision when pressed with follow-up questions, as in the exchange below.
[Follow-up exchange between the user and LLM-Mod.]
—Non-committal language: Despite the ability to identify a problem with a post, LLM-Mod often couched its judgments in hedging, non-committal language rather than stating a clear decision.
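For completeness, here is a minimal sketch of the two prompt augmentations described under Strengths above (defining key terms, and localizing the violating span once the rule break is given); the prompts are illustrative stand-ins rather than the exact wording from our study, and the chat API is assumed as before.

```python
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content

# Augmentation 1: define a rule's key term and generate example violations.
definition = ask(
    "In the context of r/changemyview's rules, define what a 'neutral stance' "
    "is and write two short example posts that would violate that rule."
)

# Augmentation 2: given that a post violates a rule, localize the violation.
localization = ask(
    "The following post violates the rule 'Posts cannot express a neutral "
    "stance.' Quote the sentence(s) responsible and explain why.\n\n"
    "POST: <post text here>"
)
```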
4 DISCUSSION AND CONCLUSION
4.1 Implications
Our research indicates that while there are promising signs, further exploration is necessary before directly adopting LLMs for automating moderation. Caution is critical, considering the potentially severe consequences of moderation actions [15], including content removal [18], user account suspension [10], community-wide bans [6], and quarantines [5]. However, we do not see replacing human moderators as the end goal. Human moderators are essential for communities to maintain the pulse of the members [36]. Instead, automated moderators should be seen as helpful tools to lessen the burden [9] and support more proactive moderation of online communities [26]. Our results indicated moderation contexts where LLM-Mod can already help (e.g., keyword-driven violations) as well as contexts where it falls short (e.g., violations that require reasoning about stance or emotion).
LLMs handle many common natural language tasks well; they recognize sentiment and detect slurs or derogatory remarks, making them useful in identifying explicit hate and offensive speech, a common violation on Reddit [32]. This can also be extended to tasks such as the “hypothetical argument” rule in r/AskHistorians, the “homework question” filter in r/AskScience, or the “must start with ‘CMV:’ ” rule in r/ChangeMyView. LLMs could also be used to generate sample invalid posts with explanations, or to define the key terms in a community’s guidelines; when the LLM controls the generation of the post, it can reason with higher accuracy. In our case, ChatGPT readily generated example rule-violating posts for a subreddit when prompted with its rules and mission (Section 2.2).
4.2 Limitations and Future Directions
Our dataset for this project was relatively small because our objectives were not limited to analyzing automated performance metrics, but also included gaining interpretable insights into what works for LLM-based moderators and what does not. Due to the nature of the Reddit API, it was challenging to obtain rule-violating posts. In the future, we would explore gathering violating posts by accessing the moderator report queue, reaching out to Reddit moderators across several large subreddits, and utilizing available large-scale datasets on removed content [4, 7]. With a larger corpus of violating posts, we may be able to extrapolate further trends in LLM reasoning on subtle rule violations. Further, specialized communities and communities serving sensitive populations [2, 34] may require additional considerations and safeguarding strategies when relying on automated (and LLM-based) tools for content moderation.
Although this paper primarily focused on content removals, the role of moderators—and content moderation more broadly—also involves promoting resilience within online conversations (e.g., enabling discussions to proceed despite an adverse event occurring [25, 39], ensuring that conflicts do not escalate [8]) and encouraging desirable behavior (e.g., prosocial outcomes [1]) within online communities. Future work should explore how LLMs can be leveraged to foster resilience and desirable behavior online.
In addition, several subreddit community guidelines included rules that depended on the current activity in the forum. For example, users are expected to check that a question has not already been answered before making their post. This was not within the context we could reasonably provide to LLM-Mod.
This work inspires future research into how providing more context on a post could help an LLM reason about rule-violating posts. We would consider including an analysis of community sentiment (from comments) and metadata from Reddit, such as a post’s upvote/downvote counts and number of comments. Future models may be capable of reasoning more accurately, with clearer explanations, over multimedia posts. Further exploration could be done on how to effectively incorporate human moderators in the loop of automated decisions while decreasing their overall workload. This could include having them engage only with content that has been appealed, review every decision and ask follow-up questions, or some other appropriate middle ground. The ethical implications, the role of the modern moderator, and changes in community dynamics must be assessed before productionizing LLM-based moderation tools such as LLM-Mod.
A APPENDIX
Table A1: Examples of rule-violating posts by post source, along with sample follow-up prompts. [Table contents not shown.]
REFERENCES
[1] Jiajun Bao, Junjie Wu, Yiming Zhang, Eshwar Chandrasekharan, and David Jurgens. 2021. Conversations gone alright: Quantifying and predicting prosocial outcomes in online conversations. In Proceedings of the Web Conference 2021. 1134–1145.
[2] Stevie Chancellor, Andrea Hu, and Munmun De Choudhury. 2018. Norms matter: Contrasting social support around behavior change in online weight loss communities. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems. 1–14.
[3] Eshwar Chandrasekharan, Chaitrali Gandhi, Matthew Wortley Mustelier, and Eric Gilbert. 2019. Crossmod: A cross-community learning-based system to assist Reddit moderators. Proceedings of the ACM on Human-Computer Interaction 3, CSCW (2019), 1–30.
[4] Eshwar Chandrasekharan and Eric Gilbert. 2019. Hybrid approaches to detect comments violating macro norms on Reddit. arXiv preprint arXiv:1904.03596 (2019).
[5] Eshwar Chandrasekharan, Shagun Jhaver, Amy Bruckman, and Eric Gilbert. 2022. Quarantined! Examining the effects of a community-wide moderation intervention on Reddit. ACM Transactions on Computer-Human Interaction (TOCHI) 29, 4 (2022), 1–26.
[6] Eshwar Chandrasekharan, Umashanthi Pavalanathan, Anirudh Srinivasan, Adam Glynn, Jacob Eisenstein, and Eric Gilbert. 2017. You can’t stay here: The efficacy of Reddit’s 2015 ban examined through hate speech. Proceedings of the ACM on Human-Computer Interaction 1, CSCW (2017).
[7] Eshwar Chandrasekharan, Mattia Samory, Shagun Jhaver, Hunter Charvat, Amy Bruckman, Cliff Lampe, Jacob Eisenstein, and Eric Gilbert. 2018. The Internet’s hidden rules: An empirical study of Reddit norm violations at micro, meso, and macro scales. Proceedings of the ACM on Human-Computer Interaction 2, CSCW (2018), 1–25.
[8] Jonathan P Chang, Charlotte Schluger, and Cristian Danescu-Niculescu-Mizil. 2022. Thread with caution: Proactively helping users assess and deescalate tension in their online discussions. Proceedings of the ACM on Human-Computer Interaction 6, CSCW2 (2022), 1–37.
[9] Frederick Choi, Tanvi Bajpai, Sowmya Pratipati, and Eshwar Chandrasekharan. 2023. ConvEx: A Visual Conversation Exploration System for Discord Moderators. Proceedings of the ACM on Human-Computer Interaction 7, CSCW2 (2023), 1–30.
[10] Farhan Asif Chowdhury, Dheeman Saha, Md Rashidul Hasan, Koustuv Saha, and Abdullah Mueen. 2021. Examining factors associated with Twitter account suspension following the 2020 US presidential election. In Proceedings of the 2021 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining. 607–612.
[11] Thomas Davidson, Dana Warmsley, Michael Macy, and Ingmar Weber. 2017. Automated Hate Speech Detection and the Problem of Offensive Language. In International AAAI Conference on Web and Social Media.
[12] Bryan Dosono and Bryan Semaan. 2019. Moderation practices as emotional labor in sustaining online communities: The case of AAPI identity work on Reddit. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems. 1–13.
[13] Casey Fiesler, Jialun Jiang, Joshua McCann, Kyle Frye, and Jed Brubaker. 2018. Reddit rules! Characterizing an ecosystem of governance. In Proceedings of the International AAAI Conference on Web and Social Media, Vol. 12.
[14] Mirko Franco, Ombretta Gaggi, and Claudio E Palazzi. 2023. Analyzing the Use of Large Language Models for Content Moderation with ChatGPT Examples. In Proceedings of the 3rd International Workshop on Open Challenges in Online Social Networks. 1–8.
[15] Tarleton Gillespie. 2022. Do not recommend? Reduction as a form of content moderation. Social Media + Society 8, 3 (2022), 20563051221117552.
[16] James Grimmelmann. 2015. The virtues of moderation. Yale JL & Tech. 17 (2015), 42.
[17] Manoel Horta Ribeiro, Justin Cheng, and Robert West. 2023. Automated content moderation increases adherence to community guidelines. In Proceedings of the ACM Web Conference 2023. 2666–2676.
[18] Shagun Jhaver, Iris Birman, Eric Gilbert, and Amy Bruckman. 2019. Human-machine collaboration for content regulation: The case of Reddit AutoModerator. ACM Transactions on Computer-Human Interaction (TOCHI) 26, 5 (2019), 1–35.
[19] Shagun Jhaver, Amy Bruckman, and Eric Gilbert. 2019. Does transparency in moderation really matter? User behavior after content removal explanations on Reddit. Proceedings of the ACM on Human-Computer Interaction 3, CSCW (2019).
[20] Shagun Jhaver, Himanshu Rathi, and Koustuv Saha. 2024. Bystanders of Online Moderation: Examining the Effects of Witnessing Post-Removal Explanations. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems.
[21] Prerna Juneja, Deepika Rama Subramanian, and Tanushree Mitra. 2020. Through the looking glass: Study of transparency in Reddit’s moderation practices. Proceedings of the ACM on Human-Computer Interaction 4, GROUP (2020), 1–35.
[22] David Jurgens, Libby Hemphill, and Eshwar Chandrasekharan. 2019. A Just and Comprehensive Strategy for Using NLP to Address Online Abuse. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 3658–3666.
[23] Vinay Koshy, Tanvi Bajpai, Eshwar Chandrasekharan, Hari Sundaram, and Karrie Karahalios. 2023. Measuring User-Moderator Alignment on r/ChangeMyView. Proceedings of the ACM on Human-Computer Interaction 7, CSCW2 (2023), 1–36.
[24] Deepak Kumar, Yousef AbuHashem, and Zakir Durumeric. 2023. Watch Your Language: Large Language Models and Content Moderation. arXiv preprint arXiv:2309.14517 (2023).
[25] Charlotte Lambert, Ananya Rajagopal, and Eshwar Chandrasekharan. 2022. Conversational resilience: Quantifying and predicting conversational outcomes following adverse events. In Proceedings of the International AAAI Conference on Web and Social Media, Vol. 16. 548–559.
[26] Daniel Link, Bernd Hellingrath, and Jie Ling. 2016. A Human-is-the-Loop Approach for Semi-Automated Content Moderation. In ISCRAM.
[27] Huan Ma, Changqing Zhang, Huazhu Fu, Peilin Zhao, and Bingzhe Wu. 2023. Adapting Large Language Models for Content Moderation: Pitfalls in Data Engineering and Supervised Fine-tuning. arXiv preprint arXiv:2310.03400 (2023).
[28] Todor Markov, Chong Zhang, Sandhini Agarwal, Florentine Eloundou Nekoul, Theodore Lee, Steven Adler, Angela Jiang, and Lilian Weng. 2023. A holistic approach to undesired content detection in the real world. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 37. 15009–15018.
[29] Mainack Mondal, Leandro Araújo Silva, and Fabrício Benevenuto. 2017. A Measurement Study of Hate Speech in Social Media. In Proceedings of the 28th ACM Conference on Hypertext and Social Media (HT ’17).
[30] Sankha Subhra Mullick, Mohan Bhambhani, Suhit Sinha, Akshat Mathur, Somya Gupta, and Jidnya Shah. 2023. Content Moderation for Evolving Policies using Binary Question Answering. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 5: Industry Track). 561–573.
[31] Chan Young Park, Julia Mendelsohn, Karthik Radhakrishnan, Kinjal Jain, Tushar Kanakagiri, David Jurgens, and Yulia Tsvetkov. 2021. Detecting Community Sensitive Norm Violations in Online Conversations. In Findings of the Association for Computational Linguistics: EMNLP 2021. 3386–3397.
[32] Joon Sung Park, Joseph Seering, and Michael S Bernstein. 2022. Measuring the prevalence of anti-social behavior in online communities. Proceedings of the ACM on Human-Computer Interaction 6, CSCW2 (2022), 1–29.
[33] Koustuv Saha, Eshwar Chandrasekharan, and Munmun De Choudhury. 2019. Prevalence and psychological effects of hateful speech in online college communities. In Proceedings of the 10th ACM Conference on Web Science. 255–264.
[34] Koustuv Saha, Sindhu Kiranmai Ernala, Sarmistha Dutta, Eva Sharma, and Munmun De Choudhury. 2020. Understanding Moderation in Online Mental Health Communities. In HCII. Springer.
[35] Angela M Schöpke-Gonzalez, Shubham Atreja, Han Na Shin, Najmin Ahmed, and Libby Hemphill. 2022. Why do volunteer content moderators quit? Burnout, conflict, and harmful behaviors. New Media & Society (2022), 14614448221138529.
[36] Joseph Seering, Tony Wang, Jina Yoon, and Geoff Kaufman. 2019. Moderator engagement and community development in the age of algorithms. New Media & Society 21, 7 (2019), 1417–1443.
[37] Qiaosi Wang, Koustuv Saha, Eric Gregori, David Joyner, and Ashok K Goel. 2021. Mutual Theory of Mind in Human-AI Interaction: How Language Reflects What Students Perceive About a Virtual Teaching Assistant. In Proc. CHI.
[38] Meng Ye, Karan Sikka, Katherine Atwell, Sabit Hassan, Ajay Divakaran, and Malihe Alikhani. 2023. Multilingual Content Moderation: A Case Study on Reddit. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics. 3828–3844.
[39] Justine Zhang, Jonathan Chang, Cristian Danescu-Niculescu-Mizil, Lucas Dixon, Yiqing Hua, Dario Taraborelli, and Nithum Thain. 2018. Conversations Gone Awry: Detecting Early Signs of Conversational Failure. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 1350–1361.