From Answering for Points to Commenting for Others

In this paper, we examine posting behavior on a large-scale online knowledge sharing platform, Stack Overflow, and investigate how that behavior changes as users gain experience and status. We use a trace ethnographic approach, analyzing both the sayings and doings of users and the numerical and categorical trace data produced by their activities on the platform. Through analysis of this data, as well as interviews with users at different levels of experience, we identify a pattern of behavior where users shift their focus over time from directly answering questions themselves to supporting others and maintaining the quality of knowledge on the platform, thus playing a particular role as the knowledge community scales.


INTRODUCTION
In this paper, we examine posting activity on Stack Overflow, a large online Community Question Answering (CQA) platform where more than 100 million people visit each month to share and obtain knowledge on computer programming. As a CQA platform, knowledge sharing on Stack Overflow is mainly organized through the asking and answering of questions. The resulting threads can be searched using keywords and tags, making the knowledge shared a resource not just for the question asker but also for other users. However, unlike a thread where a single user answers the question of another, threads on Stack Overflow may have multiple answers supplied by multiple users, along with comments on both the question and answers by still more users. This results in rich knowledge resources, and while the structure of threads may become complex, their quality is moderated by a system of up- and downvoting along with the ability for users with certain status levels to suggest edits or even directly edit the posts of others themselves. Status level is determined through a "reputation score" system where users receive points for their contributions, largely based on the upvotes of other users on their questions and answers. In this paper, we take an interest in the activities of Stack Overflow users in relation to their reputation scores, with a particular focus on how posting behavior changes as reputation score increases.
Taking a trace ethnographic approach, we maintain an interpretivist stance while engaging both with the sayings and doings of users and with the numerical and categorical trace data that their activities produce on the digital platform. This means that our analysis is mixed methods in nature, including both ethnographically oriented approaches and close statistical analysis of online activity. Through these analyses, we identify and unpack a pattern of behavior where, over time, users largely stop answering questions themselves and focus on commenting on the questions and answers of others. In this way, they shift the focus of their activity to helping other users learn to produce high-quality questions and answers through activity that can be described as distributed mentoring [2,5]. As our analysis will show, while relatively few in number, high-reputation-score users play an important role in nurturing the knowledge community as it scales by focusing on supporting newer users and maintaining quality norms.

BACKGROUND
The task of understanding posting behaviors on online platforms is one that has been taken on by researchers working in a wide range of subject areas and traditions. When the activity on an online platform can be viewed as knowledge sharing, or the platform itself can be viewed as a space for learning, posting behavior has been of particular interest in the tradition of computer supported collaborative learning [3]. However, much of this interest has been expressed in relation to formal educational settings where the scale of participation is relatively small and activity is framed by school or higher education contexts [13]. Similarly, in the emerging area of Learning at Scale, there has been a good deal of interest in posting behavior, but most clearly in relation to MOOCs, where there are still aspects of the structures and framing of formal education [7]. However, in their 2020 argument that studies of informal learning communities should form a more significant part of the Learning at Scale research contribution, Hudgins et al. [7] call for research that better understands what allows such communities to scale to sometimes enormous sizes while supporting learning. Examining an aspect of such large-scale communities, namely how the knowledge-sharing practices users undertake change as their experience and status increase, is the aim of the present study.
Like more formal large-scale online settings and MOOCs, informal communities such as the focus in this study, Stack Overflow, also have structures and mechanisms that support learning and knowledge-sharing activities. These include reward systems, such as reputation score, as well as community standards and crowd-based moderation. Such features are found on a wide range of different platforms [7]. In the case of Stack Overflow, one of the most highly visible mechanisms is the reputation score. Users receive and lose reputation score for almost all activity on the platform, and the metric is used in a wide variety of processes, from sorting questions and answers to deciding moderation privileges. Several studies have examined reputation score in relation to activity on the platform, with results that suggest reputation has a significant role in shaping the pattern of activity on the platform [14]. For example, Roy et al. [10] found that while the score purports to incentivize the sharing of high-quality knowledge in the form of answers to the questions of other users, it is also associated with the phenomenon of a small group of around 1% of users producing a large number of low-quality posts through a behavior the authors describe as 'reputation collecting'. At the same time, however, a number of studies have suggested that reputation score is a good metric of the expertise of a user [9] and that it accurately predicts the quality of contributions [8]. Such studies of reputation score in relation to the number and quality of user contributions offer a way of interrogating large-scale data sets and uncovering trends in user behaviors. However, they tend to focus more on quantifiable performance and tend not to include users' own perspectives on their online participation, therefore limiting opportunities to interpret the discovered trends and patterns.
Another strand of studies examining learning and knowledge sharing on large-scale platforms takes a more interpretive approach. For example, Sengupta and Haythornthwaite [11] used qualitative content analysis to examine the role of commenting on Stack Overflow, finding that the activity supported learning, knowledge development, and community. In their study of fan fiction communities, Campbell et al. [2] found that the posting activity of members as they reviewed each other's contributions formed what they describe as distributed mentoring: a process where feedback is aggregated and the quality of content is taken to be a collective responsibility. However, the authors do not address the relationship between the structures and mechanics of the online platform hosting the communities studied and the way in which feedback aggregation takes place. In this way, the study implies that the work of aggregating feedback is done by the individual members who receive it. While this strand of research allows us to interpretively unpack how learning may be happening on large-scale platforms, its interpretive orientation is rarely accompanied by attention to the specificities of platform mechanics or to the broader trajectories of user participation on platforms.
This study combines examination of large-scale numerical and categorical data, to understand how posting behavior changes over time in relation to the metric of reputation score, with analysis of the accounts of higher-reputation users as they discuss their participation on the platform.
Higher-reputation users' trajectories and accounts are of particular interest here precisely because they have participated actively on the platform and their contributions have been generally recognized via voting as valuable. The availability and abundance of their traces on the platform allows us to examine the longitudinal aspect of their participation and their use of different platform mechanics, such as answering and commenting. In addition, as both their reputation scores and contributed content are publicly visible, it may be argued that the traces of their participation can serve as a resource for socialization for newer users. Therefore, examining their activities may shed light on what counts as legitimate participation on the platform. Importantly, complementing the analysis of user trajectories with their own accounts allows us to interpret those trajectories as constituting particular knowledge-sharing practices.
It can be argued that in addressing the issue of scale, both streams of studies outlined above tend to approach scale as a given that de facto allows for different learning processes, such as those described through the notion of distributed mentoring. However, as Chen et al. [3] note, scale can also present challenges for learning and collaboration online, associated, for instance, with coordinating collective actions and assessing the quality of content. By employing the chosen methodological approach, we aim to examine how Stack Overflow users do not just operate in a large-scale environment, but continuously produce it throughout their participation careers, as well as what they position as challenges and assets of the scale.

METHOD
To examine the platform careers of Stack Overflow users, this study maintains an interpretivist stance to the analysis of both data generated through interviews and the numeric and categorical data produced by the platform itself. To do this, we draw on trace ethnography [6] working from the premise that digital traces such as posts, votes and tags left by users on platforms like Stack Overflow are not only a way for platforms to operate but are also how users understand their own activities. In this sense, we argue that these traces form part of the field in any ethnographic study of platform activity.
Trace data in this study has been used in three different ways. Firstly, trace data was collected for a specific tag for a short duration to grasp typical activity patterns on the platform. Secondly, trace data was used to identify potential interviewees by their reputation scores. Thirdly, after identifying interviewees, more detailed trace data was captured for each participant with the aim of tracing their participation trajectories.

Figure 1: Distribution of reputation scores
In the first phase of the procedure, a dataset was compiled which contains traces of various forms of activity on the platform, as well as metadata such as timestamps and the reputation score of posters. Given the sheer size of Stack Overflow, included users were limited to those who posted in a thread tagged for the Python programming language during a one-month period. Using the Stack Overflow Application Programming Interface (API) to extract data from the platform's database, this resulted in a dataset with the activity of 368,176 users who had contributed at least one question, answer, or comment during the month.
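The first-phase collection can be sketched along the following lines. This is a minimal illustration rather than the authors' actual pipeline: the endpoint, parameters, and paging logic below follow the public Stack Exchange API's /questions route, but the study's exact query is not specified in the text, a full replication would also collect answers and comments, and in practice an API key would be passed to raise the request quota.

```python
import gzip
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.stackexchange.com/2.3"

def build_query(tag, fromdate, todate, page=1):
    """URL for the public /questions endpoint, limited to one tag and
    a creation-date window (both dates as Unix epoch seconds)."""
    params = {
        "site": "stackoverflow",
        "tagged": tag,
        "fromdate": int(fromdate),
        "todate": int(todate),
        "page": page,
        "pagesize": 100,  # API maximum per page
        "order": "asc",
        "sort": "creation",
    }
    return f"{API_BASE}/questions?" + urllib.parse.urlencode(params)

def fetch_questions(tag, fromdate, todate):
    """Page through every matching question. The API gzip-compresses
    responses, so the body is decompressed before JSON parsing."""
    page, items = 1, []
    while True:
        with urllib.request.urlopen(build_query(tag, fromdate, todate, page)) as resp:
            raw = resp.read()
        try:
            raw = gzip.decompress(raw)
        except OSError:  # body was already plain JSON
            pass
        data = json.loads(raw)
        items.extend(data.get("items", []))
        if not data.get("has_more"):
            return items
        page += 1
```

A call such as `fetch_questions("python", start, end)` returns question objects whose shallow `owner` field includes the poster's reputation at retrieval time, which is the kind of metadata the compiled dataset draws on.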
In the second phase, the dataset was organized by user reputation score at the end of the month and a selection of users at different score levels were then invited to take part in interviews. This resulted in in-depth interviews with 16 users with a broad span of reputation scores that at the time ranged from a few thousand to over a million. However, in relation to the overall distribution of scores on the platform, this span can be seen to be rather narrow. Figure 1 shows a semi-log plot of the distribution of reputation scores for users included in the one-month dataset. It shows a strongly skewed, heavy-tailed distribution with most users having a low score and very few having a high score.
Given this distribution, it is reasonable to conclude that Stack Overflow follows the widely observed 'long tail' pattern [1], where active contributors tend to reach the top few percent of scores within their first few weeks of participation. For this reason, all 16 interviewees fall within the top 2% of reputation scores despite having a wide range of scores. The interviews conducted with these selected users were semi-structured in nature. They followed a guide that included topics related to the participants' use of the platform and how it has changed over time. Each was audio recorded and transcribed in full before the responses were analyzed thematically in relation to the questions asked. Participants have been assigned pseudonyms based on a random name generator.
In the third phase, the Stack Overflow API was used again to collect all available traces of activity for each of the interviewees so that trajectories of their participation on the platform could be assembled. These trajectories were then considered in relation to the interviews and readings of the participants' activity on the platform so that an understanding of each participant's platform career could be developed.
The project this study is conducted within was reviewed by the Swedish Ethical Review Authority and was judged to be of low risk and in compliance with prevailing laws and regulations. In the case of the trace data collected from all users who contributed to a thread tagged with the Python tag during a one-month period, the data has only been analyzed and presented at an aggregated level, and care has been taken to avoid presenting items in such a way that individual users can be identified. In compliance with the terms and conditions established by parent company Stack Exchange for the reuse of data from Stack Overflow, the trace data used in this project is covered by a Creative Commons license [4]. Consent was not sought from users for the collection and use of this data. In contrast, those selected users who participated in interviews as part of the study did provide informed consent. Those users are identified in the findings through fictionalized usernames with indicators of their approximate reputation score to protect their identities.

Variation in Contribution Types
Starting with the dataset that includes all traces produced through the activity of users in threads tagged with the Python tag over the period of one month, we calculated the number of questions, answers, and comments contributed by each user. This revealed a large variation in the number of posts per user and between post types (see Table 1). Users asked relatively few questions, answered some, but commented on the questions and answers of others the most. For all post types, some users contributed no posts during the month, while in the case of answers, and even more so comments, some posted large numbers.
Returning to the trace dataset, we extracted the reputation score for each user at the end of the month and compared it with the number of posts that user had made. Like reputation score (see Figure 1), the other variables of interest also have strongly skewed distributions. To investigate the relationships between them, we first transformed the data using the Box-Cox technique, adding one to all values beforehand to avoid zeros. This produced approximately normal distributions for all the variables (see Table 2). With the data transformed, we then examined possible correlations between reputation score and each of the three post types. This yielded a set of modest correlations. The results show that users with higher reputation scores tend to post marginally fewer questions per month (r = -0.13, p < 0.001), though it should be noted that users in general ask relatively few questions (see Table 1). Higher-reputation users also post more answers per month (r = 0.25, p < 0.001), but post even more comments (r = 0.32, p < 0.001). This shows, as might be expected, that as users gain experience on the platform, they ask fewer questions and contribute more answers, but also that a greater proportion of their posting activity is focused on commenting on the questions and answers of others. While relatively subtle when examined through a cross-sectional analysis, this difference suggests a possible shift in posting behavior as users gain reputation score. This shift stands out in the context of Stack Overflow, where the mechanics of the reputation score, along with the status and privileges it affords, can be seen to incentivize certain activities over others. For example, an upvote on a question or answer gains the poster 10 points [15]. Within this system of incentives, however, there are no mechanisms by which posting comments yields points.
In other words, while subtle, the differences in the relationships between question, answer, and comment posting and reputation score suggest that higher-reputation users tend to focus more on activities that do not yield them reputation points.
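The transform-then-correlate step described above can be illustrated in miniature. The sketch below runs on synthetic data in pure Python, with a simple grid-search maximum-likelihood choice of the Box-Cox λ; an analysis like the one reported would more likely use scipy.stats.boxcox and scipy.stats.pearsonr on the real per-user counts, so the variable names, the simulated heavy-tailed scores, and the λ grid here are illustrative assumptions only.

```python
import math
import random

def boxcox(xs, lam):
    """Box-Cox transform of strictly positive values."""
    if abs(lam) < 1e-12:
        return [math.log(x) for x in xs]
    return [(x ** lam - 1.0) / lam for x in xs]

def boxcox_mle(xs):
    """Transform xs using the lambda that maximizes the Box-Cox
    log-likelihood over a coarse grid (a stand-in for scipy.stats.boxcox)."""
    n = len(xs)
    sum_log = sum(math.log(x) for x in xs)

    def loglik(lam):
        ys = boxcox(xs, lam)
        mu = sum(ys) / n
        var = sum((y - mu) ** 2 for y in ys) / n
        return -0.5 * n * math.log(var) + (lam - 1.0) * sum_log

    best = max((i / 20.0 for i in range(-40, 41)), key=loglik)  # grid -2.0 .. 2.0
    return boxcox(xs, best), best

def pearson_r(xs, ys):
    """Plain Pearson product-moment correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = math.sqrt(sum((a - mx) ** 2 for a in xs))
    sy = math.sqrt(sum((b - my) ** 2 for b in ys))
    return cov / (sx * sy)

# Synthetic stand-ins for per-user reputation and comment counts:
# heavy-tailed scores, with comment counts loosely tied to score.
random.seed(0)
rep = [random.lognormvariate(4, 2) for _ in range(1000)]
comments = [int(v ** 0.3 + random.random() * 3) for v in rep]

# "+ 1" before transforming, as in the text, so zero counts are valid.
rep_t, lam = boxcox_mle([v + 1 for v in rep])
com_t, _ = boxcox_mle([c + 1 for c in comments])
r = pearson_r(rep_t, com_t)
```

The same transform is applied to each post-type count before computing r against the transformed reputation score, which is what makes the three reported coefficients comparable.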
Preference for commenting over answering was also brought up by higher-reputation-score interview participants, who offered accounts of typical comments they post and rationales for choosing to comment:

Sometimes it will be: "Ok, I do not think this question is worth answering, because it is just a typo, but here is where the typo is." Or even: "I do not have enough time to write a good answer, but I think this is where you should be looking." And I would rather put that as a comment than a one line, not really a full answer. I want to take pride in my answers. (frogwhale, score of more than a million)

Such accounts demonstrate that the choice of contribution type is to some extent conditioned by users' understanding of what counts as a 'good question' or a 'good answer' on the platform. As one user noted:

I post comments when the answer of the question is obvious or where there is no real answer at all. (octavemoss, score in the hundreds of thousands)

This points towards the assumption that comments may be used as workarounds to help question askers even when the formulation of a question, or what are seen as possible solutions to it, do not meet what can be seen as the rigid quality criteria norms of the Stack Overflow community. Comments are used not only to provide solutions, but to address the quality of questions. One interviewee stated that:

seven out of ten questions usually is either a duplicate question or it is missing some information. (carrotpie, score in the hundreds of thousands)

In relation to this issue, several participants described how much of their commenting activity involves repetition of a few key phrases that they have saved so that they can copy-paste into Stack Overflow, such as:

Please post a minimal reproducible example, […] I have it in a separate text and I just copy/paste that because we need a minimal reproducible example to work with the question. (onioncereal, score in the tens of thousands)

These interview accounts indicate that users do not necessarily choose to comment instead of answering. Rather, some questions lack sufficient details, and comments can be employed to bridge the gap and help posters reformulate their questions in ways that are answerable and in line with Stack Overflow quality criteria norms. This suggests that one of the reasons that comments may make up a greater proportion of the posts contributed by higher-reputation-score users is that they are more familiar with the quality criteria norms of Stack Overflow and therefore more likely to police those norms.

Moving Towards Moderation and Mentoring
Examining the posting activity of the higher-reputation-score users interviewed for this study over time suggests that comments may not only make up a greater proportion of the contributions made by these users, but that overall levels of contribution may decline. Taking a user with about 14 years of experience of using the platform and a reputation score of nearly 100 thousand as an example, it is possible to see that after around five years of participation, the overall number of contributions per month started to decline. In Figure 2, this trend can be seen to continue for the remainder of the trajectory, but by about 8 years of participation, the user stopped contributing questions and answers entirely, while continuing to post comments, albeit at a lower frequency than at the peak of their participation. A similar pattern can also be seen in the activity of another interviewed higher-reputation-score user. In the example visualized in Figure 3, the user has amassed a reputation score of over a million during their 14 years of participation. Their number of monthly contributions can be seen to increase for the first four years and then decline until about eight years, when it stabilizes for the remaining six years, with almost all the contributions during that later period consisting of comments. A trend toward commenting rather than answering was also recognized by several of our high-reputation-score interview participants. For example, one interviewee outlined the shift in their posting behavior, saying:

So, I certainly answer far fewer questions than I used to. Partly I think my own standards for questions have gone up a little bit. […] these days there is a lot of cases where I would ask clarifying questions and comments instead, and sort of encourage the user to do more of the work themselves when asking the question. That combined with, there is now an awful lot of questions, so a lot of questions end up as duplicates, mean that although I still spend a significant time on the platform, I spend a lot of time adding comments rather than answering questions. (earthdeer, score in the hundreds of thousands)

As our interviews indicate, shifting away from answering questions is often associated both with changes in personal preferences (e.g., higher 'standards for questions' worth answering, amount of time a person is willing to dedicate to answering), and with such reported aspects of Stack Overflow as the large number of duplicate questions and the extremely fast pace of answers associated with competition over reputation points. These aspects may be associated with the large scale of Stack Overflow discouraging users from answering. At the same time, users' commenting activities can be seen to mitigate these challenges. For example, comments may connect duplicate questions to already answered ones, therefore reducing 'noise' and structuring the knowledge on the platform, while to a certain extent providing the question asker with suggestions on the solution. As the interviewee whose trajectory is shown in Figure 2 describes:

If I notice something that's related to the question or the answer that may be relevant for other people passing by, or if I notice something that's confusingly worded or misleadingly worded, I will also write a comment. (geminieagle, score around a hundred thousand)

Similarly elaborating on their use of comments as a productive activity, the interviewee whose trajectory is shown in Figure 3 offered:

Sometimes you find people adding an answer which doesn't contribute anything new to something which has already got multiple answers, and I will then say … have a standard spiel which says: "If you answer late on a question, and don't provide anything new, you won't get very much in the way of upvotes, you may even get downvoted." (frogwhale, score of more than a million)

Later, the same interviewee further reflected on the role of comments in helping new users to understand the norms on the platform, noting that:

Stack Overflow has been accused of being hostile to newcomers […] And I often will end up with adding a comment to help them understand why they're getting downvoted, what they need to do to improve the question or whatever. (frogwhale, score of more than a million)

These extracts illustrate two common approaches amongst the higher-reputation-score users interviewed for the study. The first can be summarized as moderating, in the sense of using comments to improve the quality of questions and answers on Stack Overflow for the benefit of future users who will be searching for similar problems and reading existing answers. The second relates to using comments to mentor newer users on the platform by offering them feedback on their questions and answers and by advising on platform guidelines and quality norms.
Interestingly, however, regardless of approaches to commenting, there is a common feature visible in the trajectories of higher-reputation-score users. As can be seen in both Figure 2 and Figure 3, reputation score continues to increase even after a user has shifted to contributing non-reputation-score-generating comments. This continued increase in score is primarily generated by answers posted earlier that continue to attract attention and upvotes. It suggests that higher-reputation-score users may reach a point where their existing contributions generate reputation score to an extent that means they do not need to attend to generating more, while still maintaining the status implied by a top-level score within the community.

DISCUSSION
This study contributes to the research on learning and knowledge sharing in informal large-scale online communities, using the case of Stack Overflow. We first examined the correlation between users' reputation scores and their posting behaviors. Earlier research has highlighted that the number of answers posted is positively correlated with reputation score [14]. This study confirms that result but shows that the number of comments posted is marginally more strongly correlated. Several earlier studies argue or assume that the relationship between answers and reputation score can at least partly be attributed to users whose primary motivation is to acquire reputation points and who therefore look specifically for questions that can be easily answered [10]. The results of this study, however, show that users with the highest levels of reputation score tend not to post answers, but instead post comments on the questions and answers of other users, thereby focusing on activity that yields no score. Our analysis of user trajectories shows that not only are higher-reputation-score users more likely to comment, but that over time users tend to prioritize commenting over other contribution types. In terms of platform mechanics and design choices, these results also suggest that reputation score may not motivate users' active and continuous engagement throughout their entire trajectories of participation on the platform. At the same time, while commenting offers no possibility for gaining reputation points, the reputation scores of the examined users continued to gradually increase due to their earlier contributions. Therefore, rather than taking increases in reputation score to be a proxy of a user's increased or even stable participation or expertise, its role in reflecting participation or expertise requires more careful consideration in relation to other activities, such as commenting.
Analysis of the interviews in this study suggests that users are themselves aware of some aspects of the trends we identify. Our analysis shows that commenting is seen as a way to moderate content, which can be seen as a way to mitigate some of the problems associated with scale and produce the scale of the platform as an advantage for a large audience of users. Commenting can also be discussed as a form of distributed mentoring [2,5], since the aggregation and abundance of comments may become a resource for users to improve their questions and answers based on commenters' own conceptions of the norms for quality on the platform.
Our findings also point to the importance of considering moderating and distributed mentoring within the context of users' participation trajectories, as well as particular characteristics of a platform. In the case of the interviewed Stack Overflow users, moderating and mentoring activities to a large extent displaced answering activities, and this shift at least partly came as a response to problems such as high numbers of duplicate or low-quality questions, problems that can be understood as challenges associated with the size of Stack Overflow, or indeed of other similar platforms as they scale.
Based on the analysis presented here, we argue that commenting both addresses the challenges associated with scale and is also key to producing norms that reflect a certain kind of community and by extension, platform. In the case of Stack Overflow, establishment and policing of those norms can be argued to be oriented toward producing widely generalizable knowledge that meets specific quality standards. Specifically, as interpreted by the interviewed high-reputation users themselves, comments are associated with making questions and answers valuable for future readers, structuring the knowledge available by indicating duplicate questions, and improving the quality of questions to occasion higher quality answers despite a constant influx of new users. Of course, this is only possible due to the mechanics of Stack Overflow that allow for commenting, editing, and the durability and searchability of the knowledge made available.

CONCLUSION
In relation to the mechanics of Stack Overflow, the analysis presented in this study suggests that the premise that posting behavior is motivated by increasing score may not hold for those with already high scores. The findings indicate that as a user's reputation score grows, they may reach a point where their activity orients to the common good of maintaining the quality of the platform as a knowledge resource and to supporting others. In the case of Stack Overflow, this change is marked by a shift to posting that does not generate score, but the mechanics of the platform are such that the scores of higher-reputation-score users who choose not to post score-generating posts still tend to increase. On the one hand, this feature can be seen to lead to an exacerbation of an unfair cumulative advantage, since a kind of Matthew effect is produced where those who already have high scores only continue to get higher scores while those with lower scores may struggle to reach the highest status levels. Attention has been drawn to the inherent inequities associated with this phenomenon by users on Stack Overflow and by researchers who have explored implementing decay functions [12]. On the other hand, however, the mechanics allow the most experienced users to maintain their status while shifting their attention to what can be argued to be more altruistic posting behaviors. From this perspective, our findings suggest that any plan to implement a decay function to alleviate Matthew effect issues associated with user scores should consider the effect that function might have on the behavior of higher-score users who may play a particular role in the functioning of a knowledge community as it scales.
That the relationship between posting behavior and the mechanics of reputation score involves such a balance of opportunities and costs suggests that understanding the details of such relationships, beyond simply viewing scores as a fixed motivating factor, is necessary for understanding the dynamics of knowledging on large-scale platforms. Given the widespread implementation of score mechanisms on knowledge sharing platforms, this complexity suggests the need for studies that examine changes in user behavior in relation to scores, that go beyond cross-sectional approaches to further examine changes over time, and that further examine the norms produced in relation to those changes. Such studies would help to further unpack the roles that users at different score levels, and in relation to different scoring mechanics, play in enabling the scaling of platforms.