Dynamics of Disagreement: Large-Scale Temporal Network Analysis Reveals Negative Interactions in Online Collaboration

Disagreement and conflict are a fact of social life. However, negative interactions are rarely explicitly declared and recorded and this makes them hard for scientists to study. In an attempt to understand the structural and temporal features of negative interactions in the community, we use complex network methods to analyze patterns in the timing and configuration of reverts of article edits to Wikipedia. We investigate how often and how fast pairs of reverts occur compared to a null model in order to control for patterns that are natural to the content production or are due to the internal rules of Wikipedia. Our results suggest that Wikipedia editors systematically revert the same person, revert back their reverter, and come to defend a reverted editor. We further relate these interactions to the status of the involved editors. Even though the individual reverts might not necessarily be negative social interactions, our analysis points to the existence of certain patterns of negative social dynamics within the community of editors. Some of these patterns have not been previously explored and carry implications for the knowledge collection practice conducted on Wikipedia. Our method can be applied to other large-scale temporal collaboration networks to identify the existence of negative social interactions and other social processes.

In the last few decades, network science has significantly advanced our understanding of the structure and dynamics of the human social fabric 1 . Much of the research, however, has focused on positive relations and interactions such as friendship and collaboration 2,3 . Considerably less is known about networks of negative social interactions such as distrust, disapproval, and disagreement. Although such interactions are not as common, they strongly affect people's psychological well-being, physical health, and work performance [4][5][6][7][8] .
Previous work has investigated social networks with negative relations mainly from the perspective of structural balance theory [9][10][11]. According to this theory, the structure of social networks with both positive and negative relations, also known as "signed networks," should follow principles such as "the friend of my friend is my friend" and "the enemy of my friend is my enemy." Empirical research has found evidence that balanced triads are indeed dominant [12][13][14], that the global network structure tends to be balanced 15,16, and that networks evolve to increase their balance 17. While this research illuminates how negative social relations interact with positive social relations, we still know very little about the structure and dynamics of negative interactions. For example, we have limited knowledge regarding the extent to which interaction mechanisms that we have observed for positive interactions, such as direct reciprocity and generalized, or "pay-it-forward" reciprocity, transfer to negative interactions.
Motivated by these gaps in our knowledge, we study a large online collaboration community and analyze the timing and configuration of sequences of contributions to identify patterns of negative social interactions among users. In such communities, users sometimes undo or downrate contributions made by other users. Most often these actions intend to maintain and improve the collaborative project. However, it is also possible that these actions are social in nature. Although we cannot tell whether a particular content-related action is dictated by hostility towards another user, we can study the dynamic patterns of such actions in aggregate to find if social processes occur in the community and if so, the form they take. This is what we do here.
We focus on six interactions, each of which is a sequence of two content-related actions between two or three users. We analyze the six interactions as temporal "motifs" 18,19 . We compare the observed frequency and dynamics of the motifs to those generated by a null model without any systematic clustering of actions in time for individuals but with equivalent daily patterns and community structure. In addition, we study the status effects associated with the motifs, where we use the contributor's volume of activity on the collaborative platform as a proxy for status. We confirm that the interaction is social if the corresponding motif exhibits time patterns that significantly differ from those observed in the null model and if the status effects fit with our expectations based on social psychology and results from previous studies. To present a richer picture of the observed patterns of interaction, we also look for prominent cultural effects associated with the motifs.
Our data come from Wikipedia, the largest free-access online encyclopedia. There has been much research on the coverage, quality, and biases of information provided by Wikipedia [20][21][22] and the controversy of topics and articles on it 23,24. Wikipedia and its community of editors have also been extensively studied to investigate fundamental social processes. For example, using the Wikipedia community as a test case, previous research has shown that social rewards improve individual effort 25, language choice reveals power relations 26, success breeds success 27, and conflict can be both productive and counter-productive 28-30. Wikipedia is edited by millions of volunteers. Usually, editors add text to articles, polish the existing text, or correct minor errors. Sometimes, however, they revert other editors: they undo other editors' contributions by restoring an earlier version of the article. The option to "revert" an edit is a specific feature of the MediaWiki software that was originally implemented to cope with vandalism. Reverts are technically easy to detect regardless of the context and the language and hence, they enable analysis at the scale of the whole system.
Although reverts on Wikipedia are intended to improve the encyclopedia's content, previous research has acknowledged that they could also imply negative social interactions. According to one perspective, reverts constitute maintenance work and imply conflict and coordination costs 31,32 . According to another perspective, the number of words added, deleted, and restored can be used to visualize the bipolarity of the network of editors behind an article, which in turn is associated with the perceived controversy of the article 33 . In fact, information on reverts can be used to construct a measure of article controversy and study editorial wars in relation to topics and articles [34][35][36] . Further, the time-series of reverts and non-revert edits can be analyzed to infer states of conflict and cooperation over the history of an article 37 . Reverts can also be combined with other actions with negative valence on Wikipedia, such as voting against an editor's promotion to admin level, to infer a signed network of relations 14 . Finally, Wikipedia reverts have been used to develop formal mathematical models of opinion dynamics explaining conflict in collaborative environments in general and beyond the Wikipedia platform 38,39 . Similarly to these previous studies, we acknowledge that a revert is not necessarily socially motivated but reverts consistently occurring in certain patterns could be signs of negative social interactions. Going beyond most existing research, we analyze the dynamics of the entire networks of reverts by considering various temporal motifs. Previous work has analyzed the distribution and structure of editing sequences and editing motifs on Wikipedia 40,41 and demonstrated the potential of this method to reveal unknown patterns in online collaboration. We here apply this method to reverts only in order to reveal patterns of negative interactions on Wikipedia. 
We are also the first to analyze data of reverts on multiple language editions of Wikipedia as social networks. The networks we analyze comprise about 4.7 million reverts over the period January 2001-October 2011 among the editors of 13 different language editions of Wikipedia: English, Spanish, German, Japanese, French, Portuguese, Chinese, Hebrew, Arabic, Hungarian, Persian, Czech, and Romanian. The evidence we find suggests that certain social processes interfere with how knowledge is negotiated on Wikipedia. The approach we assume can be applied to other online collaboration networks and content communities to identify the presence of similar interferences.

Results
To build the revert networks, we create time-stamped directed links going from the reverters to the reverted users. The aggregate revert networks in different language editions vary in size from 4,019 to 701,169 nodes (Table 1). All networks have average clustering and reciprocity that are significantly larger than expected in a simple random graph with the same size and density (these results were confirmed in exponential-random graph models 42,43 not reported here). The giant components in the networks have short path lengths in the range 3.04-4.16. The out-degree and in-degree distributions are fat-tailed, consisting of a few editors with many reverts and many editors with only a few reverts (Supplementary Fig. S1). Further, the out-degree assortativity in the directed networks is significantly negative, implying that users who revert frequently tend to revert users who revert rarely. There is no evidence for in-degree assortativity.
Our analysis focuses on temporal motifs, which are classes of event sequences that are similar not only in the topology but also in the temporal order of the events 44. In particular, we analyze the two-event temporal motifs in which after A reverts B, A reverts B again (AB-AB), B reverts A back (AB-BA), B reverts C (AB-BC), C reverts B (AB-CB), A reverts C (AB-AC), and C reverts A (AB-CA), as illustrated in Fig. 1. To identify the motifs, we look at every revert and identify if and when a response of these six forms occurs, restricted to a time window of 24 hours. The response may happen either in the same or in a different article. Further, the response may occur immediately after the original revert or alternatively, the reverter and the reverted may be involved in other reverts in-between the original revert and the response. Figure 2 illustrates the identified motifs in an example time-snippet of the English-language network.
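The motif identification described above can be illustrated with a minimal sketch. It is not the authors' code: the event format `(time_in_hours, reverter, reverted)`, the function names `classify` and `find_motifs`, and the choice to record only the first response of each form per original revert are all assumptions made for illustration. Self-reverts are assumed to have been removed beforehand.

```python
WINDOW_HOURS = 24.0  # response window stated in the text


def classify(first, second):
    """Label the response `second` relative to the original revert `first`.

    Both arguments are (reverter, reverted) pairs; self-reverts are assumed
    to have been pruned. Returns None if the two reverts share no editor.
    """
    a, b = first
    s, t = second
    if s == a and t == b:
        return "AB-AB"  # A reverts B again
    if s == b and t == a:
        return "AB-BA"  # B reverts A back
    if s == b:
        return "AB-BC"  # B reverts a third editor C
    if t == b:
        return "AB-CB"  # a third editor C also reverts B
    if s == a:
        return "AB-AC"  # A reverts a third editor C
    if t == a:
        return "AB-CA"  # a third editor C reverts A
    return None


def find_motifs(events, window=WINDOW_HOURS):
    """Return (motif, lag_in_hours) pairs for all reverts in `events`.

    Assumption: for each original revert, only the first response of each
    of the six forms within the window is recorded.
    """
    ordered = sorted(events)  # chronological order
    found = []
    for i, (t1, a, b) in enumerate(ordered):
        seen = set()
        for t2, s, t in ordered[i + 1:]:
            lag = t2 - t1
            if lag > window:
                break  # later events fall outside the window
            motif = classify((a, b), (s, t))
            if motif is not None and motif not in seen:
                seen.add(motif)
                found.append((motif, lag))
    return found
```

Because responses are matched on editors only, the sketch allows the response to occur in a different article and with other reverts in between, as in the text.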
If the actions in the temporal motif are socially motivated, then the motif represents a particular social interaction. For example, if B reciprocates a revert by A (AB-BA), this could indicate "self-defense" (undoing the "attack") or "retaliation" (responding to the "attacker" with the same); if B reverts someone else after being reverted (AB-BC) then this could be a case of "pay it forward" retaliation. Our method does not allow us to establish the exact meaning of the social interaction as it is extremely hard to make causal claims with observational data. Instead, we look for evidence that the interaction is social. We assume that, if this is the case, the temporal motif will be common, occur fast, be associated with social status effects, and possibly, be related to prominent cultural differences between editors working in different languages. Thus, our analyses test against the null hypothesis that the six motifs we observe are not socially motivated: that they do not occur more often than expected by chance and that they are not associated with status effects as predicted by social psychology theories and previous research.
Thus, to establish evidence for social interactions, we need appropriate null models for comparison. First, if the motifs occur intentionally and systematically, they should exhibit time patterns that differ from any random collection of pairs of reverts. Typically, in motif analysis, the null model is a suitably randomized version of the empirical data 18 . The null model we use preserves the network structure and major temporal features such as burstiness but shuffles the timestamps of events within individuals and within a window of 24 hours. The shuffle occurs with these restrictions to preserve the general patterns of individual activity and to identify each social interaction against this expected sequence of events.
Second, if the motifs represent social interactions, they should exhibit social status patterns that differ from those associated with any other pairs of reverts. We rely on common social psychology theories and previous research to derive expected status effects for most of the interactions we study and we test them below. To measure status, we use the base-ten logarithm of the number of edits the editor has completed by the time of the revert under question. This measure has a number of advantages over alternative operationalizations of status in the community of editors. The number of edits gives us a well-balanced continuous measure of experience and seniority that is easy to compare across different language versions of Wikipedia and that can be used over the entire ten-year period of observation. Importantly, the number of edits has been shown to be the strongest predictor for promotion to administrator status 45.
Here, we visualize results only for the largest network we study, the English language Wikipedia; the results for the other 12 networks are reported in the Supplementary Information. Since we test multiple hypotheses on multiple large networks, we approach the significance of results with more scrutiny. We take a conservative approach and only interpret the results that are consistent in direction and significance across different languages and operationalizations (see Materials and Methods). Most of the differences observed by language are not robust to different operationalizations and hence, apart from the few cases mentioned below, we do not attribute different results by language to cultural differences.
AB-AB. In our data, the AB-AB motif is more common, and has a shorter time interval between the two actions, than expected (Fig. 3a; see also Supplementary Fig. S2, Supplementary Tables S1 and S2). In general, more active and experienced editors do more reverts and hence, the reverters tend to have higher status than the reverted user. The same holds for the AB-AB motif, without any significant deviations for most languages (see Supplementary Fig. S8 and Supplementary Table S3). This result is somewhat surprising because we expected that individuals who are closer in status and thus, in a sense, status rivals, would be more likely to revert each other repeatedly 46.
AB-BA. The AB-BA motif occurs more commonly and the response time (the time lapse between AB and BA) is shorter than what the null model generates (Fig. 3b). However, for the Chinese and Japanese languages the opposite is true: the AB-BA motif is not more common and the response time is significantly longer than expected (Supplementary Fig. S3). The revert networks for these two languages do not differ in any other structural aspect from the networks of the other editions. Further, no such deviations are observed in any of the other interactions that we study. Since there is no specific editorial rule that may drive these deviations, we attribute them to cultural differences.
Social comparison theory states that people strive to gain accurate self-evaluations and as a result, they tend to compare themselves to those who are similar 47 . But since focus on relative performance heightens feelings of competitiveness, rivalry is stronger among similar individuals 48 . Indeed, we find that for all languages, the users involved in the AB-BA motif tend to be closer in status than expected ( Supplementary Fig. S9). In fact, the average difference in status is negative, meaning that A tends to have lower status than B. Additional analyses show that the status effect is driven by both equal-status and low-status reverters (Fig. 4).
We also found that the motif is more likely to occur in the same article compared to a baseline built by randomly pairing each node's reverts (Supplementary Table S4). This suggests that the interaction is largely driven by "self-defense", whereby when editor A reverts B, B reverts that revert. Nevertheless, up to 50% of the back-reverts occur in different articles.
AB-BC. The AB-BC motif does not occur more commonly than expected and when it happens, the response time is longer (Fig. 3c). The literature suggests that individuals pay it forward to lower-status individuals 49,50 but this is not the case here. The only status effect we observe is that B is likely to revert someone else if B was reverted by a lower-status user ( Supplementary Fig. S10 and Supplementary Table S3).
AB-CB. The AB-CB motif is not more common (Fig. 3d) although when it happens, it appears slightly faster (except for the Arabic, Chinese, and Japanese Wikipedias; Supplementary Table S2). In the AB-CB motif, A and C have a leader-follower relationship 51 and from organizational theory we know that rational actors imitate each other when they want to maintain relative competitive position 52 . We thus expected that C is likely to be competing for status with A, implying that C should have slightly lower status than A. However, this does not appear to be the case ( Supplementary Fig. S11 and Supplementary Table S3). In general, we find no significant status effects associated with the AB-CB motif.
AB-AC. The AB-AC motif is rare in our data, and the time interval between two successive actions is longer than expected (Fig. 3e). The reverts in this motif are executed by users with high status on users with low status (Supplementary Fig. S12). Low-status users do not participate in such reverts (Fig. 4c). We expected these status effects as we know that more senior Wikipedia editors tend to scout out for vandals and watch out for edits made by newcomers. Thus, we do not find evidence for systematic serial attacks but rather for senior editors "keeping the streets clean."
AB-CA. The AB-CA motif is more common than expected, although with a response rate as fast as expected (Fig. 3f). On average, C is significantly lower in status than both A and B (Supplementary Fig. S13). One interpretation of the AB-CA motif is that when a revert occurs, a third party intervenes and questions the authority of the reverter. A priori, we expected that an editor of higher status would be more likely to contest another editor's opinion. In contrast, the status effects we observe suggest that "third-party defense" happens due to daring newcomers, rather than experienced and established users.
[Figure caption] The results on frequency are quantified using z-scores for counts. The results on rate are quantified using multiple measures: Kolmogorov-Smirnov test statistic, mean time, and skewness. The results on status are quantified using regression analyses with standard errors adjusted for clustering within reverter and reverted. (a) The AB-AB temporal motif occurs more commonly and with a faster response rate than expected.

Discussion
Large-scale collaboration by volunteers online provides much of the information we obtain and the software products we use today. The repeated interactions of these volunteers have also given rise to communities with shared identity and practice. The social interactions in these communities can in turn induce biases and subjectivities into collaborative public goods. Identifying the social interactions that play a role in these communities is thus an important first step towards understanding the biases in the content and products we consume.
In online collaboration communities, individuals may sometimes negatively evaluate or even undo other individuals' contributions. These actions may be related to the collaborative project but may also indicate negative social interactions among individuals. To better understand the latter, we here proposed to investigate the temporal and structural patterns of sequences of content-related actions.
To exemplify this approach, we analyzed six temporal motifs in the network of reverts among Wikipedia editors. We found evidence that Wikipedia editors systematically revert the same person, revert back their reverter, and come to defend a reverted editor. We did not find evidence that they "pay forward" a revert, coordinate with others to revert an editor, or revert different editors serially. In addition, we found that reciprocation of reverts tends to occur between status equals, while those who revert the reverters of others tend to be of low status. Further, high-status contributors are more likely to be involved in serial reverts. In essence, if senior Wikipedia editors revert, they may continue and revert someone else, they may get reverted by a low-status contributor, or they may get reverted back if the editor they reverted is of equal standing.
Some of our findings are intuitive in the context of the structure and organization of Wikipedia. For example, vigilantism is common on Wikipedia and in particular, senior editors are always on the lookout for potential vandals or clueless newcomers. Hence, it is not surprising that serial reverts are not systematic but mainly conducted by senior editors. There are also explicit Wikipedia rules that discourage serial reverts; in particular, the "three-revert rule" prohibits editors from performing more than three reverts on the same page within 24 hours.
The context of Wikipedia also provides alternative explanations for some of our findings. For example, the result that "third-party defenders" tend to be of low status can be due to sock-puppetry, which refers to the practice of maintaining multiple accounts on Wikipedia with the aim to disrupt discussions or avoid editorial restrictions. If this were the case, senior editors would systematically have secondary accounts which they would use to revert the reverts of the edits they have done from their primary account. Such cases undoubtedly occur in our data. Nevertheless, a lot of this secretive behavior could also be happening through anonymous accounts and our analyses excluded those. Future research should investigate whether sock-puppetry indeed dictates the patterns we observed.
Still, some of our findings are unexpected. For example, although editorial wars and back-and-forth conflict have been extensively studied before, we were surprised to find that reciprocation of reverts is common except among the editors of the Japanese and Chinese editions of Wikipedia. The Japanese and Chinese cultures are known as "honor-shame cultures" so one plausible explanation is that editors in these languages avoid direct conflict in fear of ostracism 53,54 . In general, we treated most differences by language as a robustness test, as we acknowledge that multiple hypothesis testing on large amounts of data is bound to produce significant results in either direction 55 . Nevertheless, it is worthwhile to use more robust techniques to specifically investigate any cultural differences between the different language communities on Wikipedia.
Overall, our aim was to quantify the extent to which certain types of negative social interactions play a role in online collaborative communities, using the particular example of Wikipedia. We focused our analysis on the macro-level and thus, admittedly, we miss the nuanced understanding that thick ethnographic descriptions can produce 56 . Nevertheless, our findings provide the foundation and hopefully, an inspiration, for such more focused studies in the future. Our finding that certain types of negative social interactions and status considerations interfere with knowledge production on Wikipedia has practical implications both for controlling the quality of content and for maintaining editors' productive involvement on Wikipedia. Future research should use more in-depth analyses at a smaller scale to verify and qualify our findings.
Although our study revealed new knowledge about the structure and dynamics of negative interactions on Wikipedia, our findings do not automatically extend to other social systems. Still, in alignment with Keegan et al. 41, we strongly believe in the research potential of sequence and temporal motif analysis for illuminating this research problem. Thus, future research should apply our approach to open-source software projects and bulletin-board-type systems to examine the extent to which our findings generalize to other collaborative communities online.
[Figure caption, continued; panel (a)] The difference in status between the reverter and the reverted user is significantly larger for English but this result is not significant for the majority of languages. (b) The AB-BA motif is more common and occurs at a faster rate than expected. The results are consistent for all languages except Japanese and Chinese. Regardless of language, the editors involved have significantly smaller status differences and in fact, on average, B has smaller status than A. (c) The AB-BC motif does not occur more commonly and has slower response rate than expected. The difference in status between A and B is significantly smaller and in fact, negative. (d) The AB-CB motif is not more common than expected but usually occurs at a faster rate. There is no evidence for status effects, as the results observed here for English Wikipedia are not consistent across the other languages. (e) The AB-AC motif does not occur more commonly and is in fact slower than expected. The difference in status between the reverter and the reverted user is significantly larger than expected. (f) The AB-CA motif occurs more commonly than expected. It is significantly faster for the English Wikipedia here but this result is neither consistent among the other languages nor robust to different measures of the shape of the distribution. The status difference between C and A is significantly smaller than expected.

Methods
Wikipedia data. Our data contain who reverts whom, when, and in what article. To obtain this information, we analyzed the Wikipedia XML Dumps (https://dumps.wikimedia.org/mirrors.html) of the 13 language editions we study. To detect restored versions of an article, a hash was calculated for the complete article text following each revision and the hashes were compared between revisions. To create the network, we assumed that a link goes from the editor who restored an earlier version of the article (the "reverter") to the editor who made the revision immediately after that version (the "reverted"). In an alternative operationalization, we could have created links from the reverter to every editor who made a revision in the time between the restored version and the revert but this would have resulted in considerably denser networks.
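The hash-based detection of reverts can be sketched as follows. This is an illustrative sketch rather than the pipeline actually used: the input format (a chronological list of `(editor, text)` pairs for one article), the function name `detect_reverts`, the use of SHA-1 as the hash, and the rule of matching against the most recent identical version are all assumptions.

```python
import hashlib


def detect_reverts(revisions):
    """Detect reverts in one article's chronological revision history.

    `revisions` is a list of (editor, full_article_text) pairs.
    Returns a list of (reverter, reverted, revision_index) tuples.
    """
    last_seen = {}  # text hash -> index of the most recent revision with that text
    reverts = []
    for i, (editor, text) in enumerate(revisions):
        digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
        j = last_seen.get(digest)
        # A revert restores a version older than the immediately preceding one.
        if j is not None and j < i - 1:
            # The link targets the editor of the revision right after the
            # restored version, as described in the text.
            reverted = revisions[j + 1][0]
            reverts.append((editor, reverted, i))
        last_seen[digest] = i
    return reverts
```

For a history in which alice writes a version, bob changes it, and alice restores her version, the sketch yields a single link from alice to bob. Self-reverts are not filtered at this stage.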
The revert networks were then pruned by removing self-reverts and by removing anonymous users, vandals, and bots, as well as any resulting isolates. We identified anonymous users as editors who did not use a registered username to revert (the data include an IP address instead) and vandals as editors who had all their edits reverted by others. The latter rule meant that we also removed newcomers who became discouraged and left Wikipedia after all their initial contributions were reverted. Since we are interested in social interactions emerging from repeated activity, we do not believe that this decision affects our results. To remove the bots, we identified all accounts with bot status (https://en.wikipedia.org/wiki/Wikipedia:Bots/Status) in the Wikipedia database (as of August 6, 2015). In addition, we removed any accounts with a username containing different spelling variations of the word "bot."
Null model. A number of sophisticated statistical methods for analyzing network data already exist but they are not adapted for large growing networks observed over a long continuous period of time. Exponential random-graph models (ERGMs) cannot account for the timing of interactions because they express the structural properties of aggregate networks or networks observed at a single moment in time 42. Extensions such as temporal ERGMs 57 and stochastic actor-based models 58 account for network dynamics but are nevertheless restricted to relatively small networks with a fixed set of nodes and a few "snapshots" over time.
To establish the statistical significance of our observations, we use a null model created by randomizing the underlying network. The randomization needs to preserve the daily pattern and the community structure of the network, while removing any systematic clustering in the timing of events in which an individual is involved, as we consider such clustering evidence for a social process. Hence, we do not randomize the network structure but only the timing of events. There are several possible ways to do this 18,59 . First, one can shuffle the entire time sequence of events but this destroys individual activity patterns and increases the time variation in the individual's activities, as many editors joined Wikipedia for a limited period only and the shuffle is over almost 11 years. Individual activity patterns can be preserved if the randomization instead repeatedly samples two random nodes with equal number of events and swaps the time sequences for the events. The same could be achieved by shuffling within shorter periods of time, say 24 or 168 hours.
In addition to swapping and shuffling within a node, one can swap and shuffle within dyad, only within a node's in-links, or only within a node's out-links. These methods preserve the activity patterns characteristic for dyadic exchange, individual "visibility", and individual activity, respectively. However, in this way they also restrict the baseline to these narrow scopes. For example, swapping within dyads would compare the occurrence of the AB-AB motif to the AB-BA motif but would not answer whether either of them is more common than chance.
To account for these potential caveats, we choose to use as a baseline a shuffle of the timestamps of events within individuals and within a window of 24 hours. For each link l, we look at the source i and collect other links in which i participated (as either source or target) up to 24 hours before or after l. We then swap l's time with the time of one of these links selected at random; if the set of candidate links is empty, no swap occurs. We shuffle within individuals to preserve individual activity patterns and to identify each social interaction against this expected sequence of events. We execute the shuffle within a limited time window to preserve the duration of activity per individual. The time window should be at least 24 hours to allow for interactions between editors from different time zones and with different daily routines. We found that a time window on the order of 24-240 hours produces similar results. Hence, we chose 24 hours as this is the time window we use to define the social interactions.
In short, our shuffling method does not change the structure of the network (who reverts whom), which implies that it preserves the community structure centered around articles and topics. Neither does the method change the overall sequence of timestamps, thus preserving any natural burstiness of activity due to editors being in the same time zone or due to the occurrence of news-worthy events, for example. In contrast, the method deliberately shuffles the sequence of reverts that an editor is involved in, thus removing any individual behavioral patterns.
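The shuffle described above can be sketched as follows. The sketch is illustrative only: the event format `(time_in_hours, source, target)` and the function name `shuffle_timestamps` are hypothetical, and it assumes that links are processed one after another, with each swap using the timestamps left by earlier swaps. The text does not specify whether candidates are selected from the original or the current timestamps.

```python
import random


def shuffle_timestamps(events, window=24.0, seed=None):
    """Return one shuffled realization of `events`.

    `events` is a list of (time_in_hours, source, target) tuples. The edge
    list and the multiset of timestamps are preserved; only the assignment
    of timestamps to links changes.
    """
    rng = random.Random(seed)
    times = [t for t, _, _ in events]

    # Index the links each editor participates in, as source or target.
    by_node = {}
    for idx, (_, src, dst) in enumerate(events):
        by_node.setdefault(src, []).append(idx)
        if dst != src:
            by_node.setdefault(dst, []).append(idx)

    for idx, (_, src, _) in enumerate(events):
        # Other links of the source within `window` hours before or after.
        candidates = [
            j for j in by_node[src]
            if j != idx and abs(times[j] - times[idx]) <= window
        ]
        if candidates:  # if the candidate set is empty, no swap occurs
            j = rng.choice(candidates)
            times[idx], times[j] = times[j], times[idx]

    return [(times[i], src, dst) for i, (_, src, dst) in enumerate(events)]
```

Repeated calls with different seeds give the multiple shuffled realizations against which the observed motif counts and response times can be compared.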

Statistical tests.
The shuffled network provides us with an expected distribution for each interaction over the period of 24 hours. Comparing the weight of the distribution in the observed data with that in multiple shuffled realizations of the network can tell us the extent to which the interaction is more common than expected. Similarly, comparing the shape of the distribution can tell us the extent to which the interaction occurs at a faster rate than expected. To quantify the comparison, we estimate the z-score as follows:

$$z = \frac{X_D - \mu_{X_R}}{\sigma_{X_R}} \qquad (1)$$
In equation (1), $X_D$ is the relevant statistic in the data (for example, count or mean), $X_R$ is the same statistic in the shuffled networks, μ is the mean, and σ is the standard deviation, both taken over the shuffled realizations. The statistic we use for frequency is the count of events (Supplementary Table S1). The two statistics we use for rate are the mean and the skewness. Alternatively, to compare the shapes of two distributions, we also conduct a two-sample Kolmogorov-Smirnov (KS) test between the data and one randomly chosen baseline realization. The KS test measures the maximum deviation between the two normalized cumulative distributions. The sign of the KS statistic tells us the direction of the difference while the p-value tells us the likelihood of the observed distance given the null hypothesis of the distributions being the same. The problem with the KS test is that it is extremely sensitive to small deviations and thus overestimates the significance of the differences. Hence, we use the signed KS statistic to report the results but we use the mean and skewness z-scores to discuss their significance. Overall, the results are similar in terms of direction across the three statistics (Supplementary Table S2).
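As a rough illustration of these two comparisons, the sketch below computes a z-score against a set of shuffled realizations and a signed two-sample KS statistic. The function names are hypothetical, the use of the sample (rather than population) standard deviation is an assumption, and so is the sign convention: here a positive KS value means that the cumulative distribution of the data lies above that of the baseline, that is, that response times in the data are shorter.

```python
from bisect import bisect_right
from statistics import mean, stdev


def z_score(observed, shuffled_values):
    """z-score of an observed statistic against shuffled realizations.

    Assumption: sigma is the sample standard deviation, which requires
    at least two realizations with non-identical values.
    """
    return (observed - mean(shuffled_values)) / stdev(shuffled_values)


def signed_ks(data, baseline):
    """Largest deviation between two empirical CDFs, keeping its sign."""
    xs, ys = sorted(data), sorted(baseline)
    best = 0.0
    for v in sorted(set(xs) | set(ys)):
        # Empirical CDFs evaluated at v.
        diff = bisect_right(xs, v) / len(xs) - bisect_right(ys, v) / len(ys)
        if abs(diff) > abs(best):
            best = diff
    return best
```

The sketch does not compute a p-value; a library routine such as a two-sample KS test from a statistics package would be needed for that.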

Status.
To investigate the effect of status in the interactions, we operationalize status as the base-ten logarithm of the number of edits the editor has completed by the time of the revert under question. We transformed the number of edits with the logarithm because they follow a power-law distribution (Supplementary Fig. S18). As a result, the difference in status can then also be expressed as the base-ten logarithm of the ratio of number of edits. To find evidence that status plays a role, we need to compare the status difference in the focal interaction with the status difference in any other revert. The observations are not independent across and within these two groups, however, but nested within individuals. To account for this, we execute regression analyses with standard errors adjusted for clustering within reverter and reverted (Supplementary Table S3).
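The status measure and the status difference can be written out in a few lines. This is only a sketch with hypothetical function names; it assumes that every editor has completed at least one edit by the time of the revert, so that the logarithm is defined. The clustered regression itself is not shown.

```python
import math


def status(n_edits):
    """Status as the base-ten logarithm of the cumulative edit count."""
    return math.log10(n_edits)


def status_difference(reverter_edits, reverted_edits):
    """Status of the reverter minus status of the reverted editor.

    Equal to the base-ten logarithm of the ratio of the two edit counts.
    """
    return math.log10(reverter_edits / reverted_edits)
```

For example, a reverter with 1,000 edits and a reverted editor with 10 edits differ in status by 2, that is, by two orders of magnitude in activity.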