An analysis of the spatial and temporal distribution of large‐scale data production events in OpenStreetMap

Organized mapping activities within OpenStreetMap frequently lead to the production of massive amounts of data over a short period. In this article we utilize a novel procedure to identify such large‐scale data production events in the history of OpenStreetMap and analyze their patterns. We find that events account for a significant share of OpenStreetMap data and that organizational practices have shifted over time towards local knowledge‐based events and well‐organized data imports. However, regions in the “Global South” remain dependent on remote mapping events, pointing to uneven geographies of representation. We also find that events are frequently followed by periods of increased activity, with the exact nature of effects depending on contextual elements such as previous events. These findings portray organized activities as a significant and unique component which requires consideration when using OpenStreetMap data and analyzing their quality.

For the evaluation of VGI projects, Fast and Rinner (2014) identify the project (i.e., the key conceptual and methodological decisions made by the initiators) as one of the contextual levels of organization shaping the data, alongside participants and infrastructure. They also acknowledge an increasing institutional involvement in VGI, especially by governmental agencies (supported by the findings detailed in Haklay, Antoniou, Basiouka, Soden, & Mooney, 2014). This exposes organizations as an additional level of organization located between "project" and "participants".
Given that organized activities undermine the principle of independence required for achieving collective intelligence, or at least transform its meaning, studying VGI projects such as OpenStreetMap (OSM) also requires understanding the role and impacts of these activities. The literature offers various explorations of organized activities in OSM (see Section 2) yet, to the best of our knowledge, it still lacks a systematic analysis of the patterns and implications of this level of organization. In fact, two recent analyses, one of the VGI literature (Yan et al., 2020) and one of research related to the OSM project (Sehra, Singh, & Rai, 2017), do not mention this issue in their lists of trends and themes. This article aims to address this knowledge gap regarding the role of organized activities in OSM by exploring a specific yet informative case: large-scale data production events, that is, events which produce large volumes of data for a specific region over a short period of time. Such events hold the potential to significantly affect the data and, due to the interconnectedness of production procedures and data, also contribution patterns. As such, studying large-scale events can contribute knowledge on the roles of organized activity in OSM. In this article we identify such events via a global analysis of the entire history of contributions to the OSM database. By studying the characteristics and the spatial and temporal patterns of events, along with the dynamics following them, we provide an extensive account of the role of organized activities in OSM and their impacts.
We continue as follows. Section 2 discusses the OSM project and the current knowledge regarding the organization of mapping activities within it. Section 3 then presents an event identification procedure used to identify a global set of events taking place throughout the history of OSM. Section 4 uses the results of this procedure to analyze the nature of the events, defining a typology of events and exploring their spatial and temporal patterns and effects on contribution activities. Section 5 then turns to discuss the practical and theoretical implications of our results for OSM. Section 6 concludes by offering directions for future research.

| ORGANIZED ACTIVITIES WITHIN THE OSM PROJECT
This article studies large-scale data production events, as a form of organized activity, within OSM, a prominent VGI project and community dedicated to creating an editable and open map of the surface of the Earth (Mooney & Minghini, 2017). In OSM, spatial entities are represented as nodes (single points, e.g., a tree), ways (collections of nodes forming a line or a polygon, e.g., a road, a building), or relations (collections of nodes, ways, and other relations, e.g., a bus route consisting of a way representing the route and a collection of bus stop nodes).
Additionally, it is possible to add tags to entities, containing semantic information such as land use or name using "key-value" pairs (e.g., landuse = residential). Anyone owning an OSM account can create a new entity, delete an existing one, or edit entities' geometries and tags (some less frequent operations such as posting notes on the map are possible as well).
The OpenStreetMap Foundation maintains the project and, while at times intervening in the data through the acts of Working Groups (cf. Bittner, 2017), it does not control the data (OpenStreetMap, 2020e). Guidelines for contributors are available in the project's Wiki pages (https://wiki.openstreetmap.org) which are also open for editing, thus making it a negotiated product which offers a relatively loose structure for data and mapping activities (Ballatore & Mooney, 2015; Mocnik, Zipf, & Raifer, 2017). Hence, OSM is not characterized by a strong structural mechanism, keeping the project open for different kinds of interactions, some of them of an organizational nature, thus making OSM a "community of communities" (Anderson, Sarkar, & Palen, 2019, p. 2).
As noted above, governmental agencies have become increasingly active within OSM (Haklay et al., 2014).
However, governmental agencies are far from the only type of organization involved in OSM today. They are joined by local communities (Hristova, Quattrone, Mashhadi, & Capra, 2013; Mooney, Minghini, & Stanley-Jones, 2015; Perkins & Dodge, 2008), humanitarian organizations (Dittus, Quattrone, & Capra, 2017; Eckle & de Albuquerque, 2015) and even business corporations such as Facebook and Apple (Anderson et al., 2019). In these cases, data are produced in an organized fashion, thus contradicting the original bottom-up conceptualization of OSM as a VGI project.
Institutional engagements with OSM are frequently embodied in (usually) time-constrained organized contribution activities which aim to enrich the database for a specific region. Such events take multiple forms: mapping parties, where mappers meet to do on-the-ground mapping (OpenStreetMap, 2020k); mapathons, where mappers meet (mostly physically, but sometimes virtually) to jointly map a certain area from afar ("armchair mapping"; OpenStreetMap, 2020j); and globally coordinated remote mapping events, as usually organized by the Humanitarian OpenStreetMap Team (HOT) when responding to disaster and hazard situations (OpenStreetMap, 2020g). The last two types are remote mapping events, which rely on auxiliary geographic information such as satellite images to map from afar. Another type of organized contribution common within OSM is a bulk import of an external data set, usually from an authoritative source (Zielstra, Hochmair, & Neis, 2013). While an import can be carried out by a single user, this event still represents a type of organized contribution, with the organization taking place outside of OSM (the import itself could also entail some coordination efforts, cf. Grinberger, 2018; OpenStreetMap, 2020i).
These three primary types of events (local, remote, and data import; Coetzee, Minghini, Solís, Rautenbach, & Green, 2018) are grouped together here under the term "large-scale data production events." Several studies have attempted to study the effects of such events. Imports, for example, were found to introduce quality issues that affect the feasibility of future imports (Zielstra et al., 2013) and to have a limited effect on the behavior of established mappers or on retaining new users (Juhász & Hochmair, 2018). Studies of humanitarian remote mapping efforts show that coordination practices and event frequency are important in engaging newcomers during events and retaining them (Dittus, Quattrone, & Capra, 2016a, 2016b). Yet, apart from high-profile cases, these events generally rely on a relatively stable community of experienced mappers (Dittus et al., 2017). Finally, local mapping parties, despite early evidence (Perkins & Dodge, 2008), are generally found to produce a meaningful amount of data while cultivating a lasting behavioral effect only for light and medium mappers (Hristova et al., 2013; Mooney et al., 2015).
Similarly, remote mapping events were found to impact mostly the activity of newcomers (Schott, 2019).
Despite these efforts, knowledge about the nature and effects of large-scale events in OSM remains limited. This is evident in the ongoing (and sometimes heated) discussion within the OSM community and project regarding data imports (cf. https://lists.openstreetmap.org/pipermail/imports/2013-January/001715.html; Juhász & Hochmair, 2018; OpenStreetMap, 2020i; Zielstra et al., 2013). Below we present a global analysis of such events throughout the history of OSM, facilitating the production of new knowledge on their characteristics, patterns, and impacts on contribution patterns.

| Event identification procedure
Identifying a large-scale event is not a straightforward task, given the relative nature of the term "large"; for example, 1,000 contributions made within a span of one month will not form a unique event in a region where the average monthly number of contributions is 800, but will have a significant effect in a region where the monthly average is 100. Hence, in this article we utilize a relative definition of large-scale events which considers the trajectory of data production activities within a given region. Such an approach requires defining a naïve model describing the dynamics of data production in the absence of an event. Gröching, Brunauer, and Rehrl (2014) provide such a model based on empirical patterns identified in OSM data. Their model uses measures of the growth of the database (difference in the number of features between consecutive time periods) and of progress (relative growth) to identify four stages: no data; start (small number of active contributors); growth (low, medium, and high, differentiated by growth values); and a state of saturation. This translates into an S-shaped cumulative development trajectory with a relatively moderate rate of change when no event is introduced (Figure 1). We define large-scale events as sudden increases in the size of the database that greatly exceed growth patterns predicted by the S-shaped curve (Figure 1). Accordingly, we base our event identification procedure on fitting a logistic curve to empirical data describing the size of the database over time, and finding significant and positive errors ($e_{r,t} = y_{r,t} - \hat{y}_{r,t}$). Here $y_{r,t}$ is the observed and $\hat{y}_{r,t}$ the predicted cumulative value for region $r$ at time $t$ (measured as the time since the first contribution to the region), and $a$, $b$, $c$, and $d$ are the parameters of the curve.
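As a rough illustration of the error definition above, the following sketch evaluates a four-parameter logistic curve and computes the errors for a toy series with one synthetic event. The functional form, the roles assigned to a, b, c, and d, and all numeric values are assumptions made for illustration only; they are not taken from the fitted model used in this article.

```python
import math

def logistic(t, a, b, c, d):
    """Hypothetical four-parameter logistic curve (assumed form):
    a = lower asymptote, b = upper asymptote, c = growth rate,
    d = time of the inflection point."""
    return a + (b - a) / (1.0 + math.exp(-c * (t - d)))

def errors(observed, params):
    """e_t = y_t - y_hat_t, with t counted in months since the first contribution."""
    return [y - logistic(t, *params) for t, y in enumerate(observed)]

# Illustrative parameter values, not estimates from OSM data.
params = (0.0, 1000.0, 0.5, 10.0)
baseline = [logistic(t, *params) for t in range(20)]

# Inject a synthetic event: 400 extra operations arriving in month 8.
observed = [y + (400.0 if t >= 8 else 0.0) for t, y in enumerate(baseline)]
e = errors(observed, params)
```

Here the parameters are held fixed, so the event shows up as a step in the errors from month 8 onwards. When the parameters are estimated from the observed series, part of that step would be absorbed by the fit.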

FIGURE 1 A synthetic example of the effects of an event on cumulative values and errors

The work by Gröching et al. (2014) relies on counting the number of entities in a region. Mapping activities, however, also include other types of contribution actions such as deleting objects and modifying their geometries or tags (Mooney & Minghini, 2017). While it is probable that most events will include the addition of new entities to the database, there is no reason to assume that creating data will be the primary focus of all events.
Furthermore, the level of effort varies between contribution types: deleting an object may be done with one simple operation, but creating or editing an entity requires multiple operations such as locating nodes, tagging entities, or modifying individual nodes. To account for this, our procedure counts the number of contribution operations per spatial unit and time period instead of the number of entities or contributions. Deletions are counted as one operation, creations as the number of nodes added plus the number of tags added, and editing operations as the number of new tags and nodes plus the number of those deleted. Events lead to significant estimation errors. These errors also affect estimation errors for consecutive periods until predicted values "catch up" with observations (Figure 1). To control for such effects, we use time-lagged errors, $e^{*}_{r,t} = e_{r,t} - e_{r,t-1}$, which eliminate the temporal trend within errors (Figure 1). We compute standard scores for lagged errors and identify events as time periods in which the error is significantly large and positive at 99% confidence.
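The counting rules and the lagged-error test described above can be sketched as follows. This is a minimal illustration, not the implementation used in this article: the function names, the contribution type labels, and the use of a one-sided normal quantile (2.326) for the 99% threshold are all assumptions.

```python
from statistics import mean, stdev

def operations(kind, nodes_added=0, tags_added=0, nodes_deleted=0, tags_deleted=0):
    """Number of operations in one contribution, following the rules in the text.
    The labels 'deletion', 'creation', and 'edit' are illustrative."""
    if kind == "deletion":
        return 1
    if kind == "creation":
        return nodes_added + tags_added
    if kind == "edit":
        return nodes_added + tags_added + nodes_deleted + tags_deleted
    raise ValueError(f"unknown contribution type: {kind}")

def detect_events(errs, z_crit=2.326):
    """Return the time indices whose lagged error e*_t = e_t - e_{t-1} has a
    standard score above z_crit. The value 2.326 is the one-sided 99% normal
    quantile; the exact threshold used in the article is an assumption here."""
    lagged = [errs[t] - errs[t - 1] for t in range(1, len(errs))]
    mu, sigma = mean(lagged), stdev(lagged)
    # lagged[i] belongs to time index i + 1
    return [i + 1 for i, x in enumerate(lagged) if (x - mu) / sigma > z_crit]

# Toy error series: a step of 400 operations that persists from month 12 on.
errs = [0.0] * 12 + [400.0] * 12
events = detect_events(errs)
```

Differencing turns the persistent step in the raw errors into a single spike, so only the month in which the jump occurs is flagged.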

| Data extraction and preprocessing
The procedure detailed above first requires delineating regions and producing time-series data describing the cumulative number of contribution operations. As theoretically the spatial extent of an event can range from subnational to international, national boundaries do not present a useful geographic system for this analysis.
Instead, a data-driven approach based on a quad-tree-like process was used to produce a spatial system based on the number of entities per cell (relations were excluded from the analysis). The procedure then recursively divided each cell into equally sized sub-quadrants until none of them included more than 50,000 entities. The resulting division thus contains cells of varying sizes and numbers of entities, allowing for some cells containing a relatively low number of entities (in cases where the division of one cell produced some sub-quadrants with at least 50,000 entities and others with fewer). To reduce the probability of incorrect identification of events, cells with fewer than 20,000 entities were excluded from the analysis, resulting in 10,136 cells. Figure 5 presents the outcome of this process.
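The recursive subdivision described above can be sketched as below. This is an illustrative reading of the procedure: entities are reduced to hypothetical (x, y) points, the starting extent is a single bounding box, and the `max_depth` guard is an addition that is not mentioned in the text.

```python
def split_cells(points, bbox, max_entities=50_000, depth=0, max_depth=20):
    """Recursively split bbox = (min_x, min_y, max_x, max_y) into four equally
    sized quadrants until no cell holds more than max_entities points.
    Returns a list of (bbox, count) leaves. max_depth is a safety guard
    (an assumption) against endless splitting of coincident points."""
    min_x, min_y, max_x, max_y = bbox
    if len(points) <= max_entities or depth >= max_depth:
        return [(bbox, len(points))]
    mid_x, mid_y = (min_x + max_x) / 2, (min_y + max_y) / 2
    leaves = []
    for east in (False, True):
        for north in (False, True):
            pts = [(x, y) for x, y in points
                   if (x >= mid_x) == east and (y >= mid_y) == north]
            sub = (mid_x if east else min_x, mid_y if north else min_y,
                   max_x if east else mid_x, max_y if north else mid_y)
            leaves.extend(split_cells(pts, sub, max_entities, depth + 1, max_depth))
    return leaves

def keep_dense(leaves, min_entities=20_000):
    """Drop cells with fewer than min_entities entities."""
    return [(bbox, n) for bbox, n in leaves if n >= min_entities]

# Toy example with small thresholds: a regular 10 x 10 grid of points.
points = [(i / 10 + 0.05, j / 10 + 0.05) for i in range(10) for j in range(10)]
leaves = split_cells(points, (0.0, 0.0, 1.0, 1.0), max_entities=30)
```

With the thresholds from the text, `split_cells(points, bbox)` followed by `keep_dense(leaves)` would reproduce the two steps (subdivision at 50,000 entities, exclusion below 20,000) on any point set.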
Using the OSHDB tool again, we extracted all contributions within each cell along with their time-stamp and computed the number of operations included in them. To create a less noisy and smoother curve, the estimation procedure utilized a temporal resolution of 1 month. While this may have led to a state in which some non-event operations were attributed to an event, it reduced the sensitivity of the procedure to the day-to-day or week-to-week fluctuations that are part of OSM data. The cumulative number of operations was calculated per month and cell, producing the basic information for the curve fitting procedure. The OSHDB query also produced additional information per spatiotemporal unit, including the number of contributions by type (i.e., creation, deletion, tag editing, and geometry editing) and the maximal share of all operations in a given month carried out by one user only. Because object changes were not recorded through versioning until the introduction of version 0.5 of the OSM API in October 2007 (OpenStreetMap, 2020b), earlier data carry uncertainties. Hence, only data from November 2007 onwards were used in the fitting procedure. The fitting itself was carried out using a Levenberg-Marquardt nonlinear least-squares procedure and implemented in R using the nlsLM function from the minpack.lm package (Elzhov, Mullen, Spiess, & Bolker, 2016). Python packages such as pandas, numpy, and scipy were used for subsequent analysis.

| Event identification results
To assess the results of the curve fitting procedure, we computed normalized root-mean-square error (NRMSE) values (computed as a percentage of the median observed value) for each cell in our spatial system. This analysis shows that the procedure generally produced reasonable results as NRMSE values were relatively low and followed a long-tail distribution (Figure 2) with a median value of 12.45% and interquartile range of 26.15%. In some extreme cases, almost all contribution operations were made during a single month, leading to an almost perfect fitting of the curve. In such cases estimation errors were minimal, hence leading to the identification of small errors as events instead of the true larger contributions. To control for this, we identified false positives by applying a Box-Cox transformation to the number of operations in events and removing events in which the number of operations was more than two standard deviations below the mean value (i.e., events accounting for fewer than 15,714 operations). To identify false negatives, the same procedure was performed for the set of observations not identified as events, adding observations in which the number of operations was more than two standard deviations above the mean to the set of events (i.e., months accounting for more than 47,267 operations).
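The two checks described above can be sketched as follows. The NRMSE follows the definition given in the text (RMSE as a percentage of the median observed value). For the outlier filter, the Box-Cox parameter is fixed by hand rather than estimated, which is a simplifying assumption; the function names are hypothetical.

```python
import math
from statistics import mean, median, stdev

def nrmse_percent(observed, predicted):
    """RMSE expressed as a percentage of the median observed value."""
    squared = [(y - p) ** 2 for y, p in zip(observed, predicted)]
    rmse = math.sqrt(sum(squared) / len(squared))
    return 100.0 * rmse / median(observed)

def box_cox(x, lam):
    """Box-Cox transform of a positive value; lam = 0 gives the natural log."""
    return math.log(x) if lam == 0 else (x ** lam - 1.0) / lam

def low_outliers(volumes, lam=0.0, n_sd=2.0):
    """Indices of values more than n_sd standard deviations below the mean
    after the transform. lam is fixed here (an assumption), not estimated."""
    transformed = [box_cox(v, lam) for v in volumes]
    threshold = mean(transformed) - n_sd * stdev(transformed)
    return [i for i, v in enumerate(transformed) if v < threshold]

fit_quality = nrmse_percent([100, 200, 300], [110, 190, 310])
suspect = low_outliers([10_000] * 20 + [10])
```

The same filter with the inequality reversed (values above the mean plus two standard deviations) would correspond to the search for false negatives.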
The procedure identified 55,374 events. The spatiotemporal resolution of the analysis led to events spanning more than one cell and/or month being counted more than once, meaning this number is inflated relative to the true number of events. The median number of events per cell was 5.00, the maximum number of events was 46, and 47.89% of the cells recorded three to five different events (Figure 3). When translated to the percentage of months identified as events for each cell, the median value was 3.65% and the maximum value was 33.58%. As might be expected, cells in which more events were identified show a higher total number of operations (Figure 3a).
However, there is no clear relation between the frequency of events in a cell and their volumes, expressed in the average number of operations per event (Figure 3b): a higher frequency is associated with smaller events only for cells in which fewer than six events were identified; beyond that the average size of events increases again before stabilizing, suggesting that a high frequency of events in a cell does not make each individual event less meaningful in terms of the volume of contributions.

| Characterizing events
As discussed in Section 2, there are different types of large-scale data production events (e.g., data imports, local mapping parties, and remote mapping efforts). We used a k-means clustering procedure to better differentiate between these types of events, thus producing a typology of events. The procedure used the following variables, extracted as part of the OSHDB query (see Section 3.2):
• Creations ratio, the share of creations (contributions adding new entities to the database) out of all contributions during the event;
• Deletions ratio, the share of deletions (contributions deleting existing entities);
• Tag changes ratio, the share of tag changes (contributions adding a key-value pair to an existing entity or changing an existing tag);
• Geometry changes ratio, the share of geometry edits (contributions changing the geometry of an existing entity);
• Maximal share, the maximal share of operations made by one user out of all contribution operations (i.e., we computed the number of operations made by each of the users active during the event and divided the maximum of these by the total number of operations).
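The five variables listed above could be derived from a list of contributions along the following lines. The record layout (user, contribution type, number of operations) and the type labels are hypothetical; the actual OSHDB query output is not reproduced here.

```python
from collections import Counter

def event_features(contributions):
    """Compute the five clustering variables for one event.
    contributions: list of (user, kind, n_operations) tuples, where kind is
    one of 'creation', 'deletion', 'tag', 'geometry' (illustrative labels)."""
    n = len(contributions)
    kinds = Counter(kind for _, kind, _ in contributions)
    ops_by_user = Counter()
    for user, _, ops in contributions:
        ops_by_user[user] += ops
    total_ops = sum(ops_by_user.values())
    return {
        # Ratios are shares of contributions, not of operations.
        "creations_ratio": kinds["creation"] / n,
        "deletions_ratio": kinds["deletion"] / n,
        "tag_changes_ratio": kinds["tag"] / n,
        "geometry_changes_ratio": kinds["geometry"] / n,
        # Largest per-user operation count divided by all operations.
        "maximal_share": max(ops_by_user.values()) / total_ops,
    }

sample = [
    ("a", "creation", 6),
    ("a", "creation", 4),
    ("b", "tag", 2),
    ("c", "deletion", 1),
    ("b", "geometry", 7),
]
features = event_features(sample)
```

In this toy event, user "a" accounts for 10 of 20 operations, giving a maximal share of 0.5; a bulk import by a single account would push this value towards 1.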
The first four variables characterize contributions during the event, representing the extent to which they were focused on adding data to the database or on altering existing entries (through deletions, tag changes, or geometry edits). The fifth variable represents the extent to which the event was the result of a centralized effort carried out by one user (or a few users), as in a data import, or a more distributed effort where multiple users were making meaningful contributions, in terms of the volume of operations. We used a combination of clustering metrics and visual analysis to determine the value of the k (number of clusters) parameter. First, we computed solutions for values of k ranging between 2 and 12. Second, we computed three clustering metrics for each solution: the Davies-Bouldin score (DBS), the Calinski-Harabasz score (CHS) and the mean silhouette coefficient (MSC). All three agreed that four clusters presented the best solution (DBS = 0.90; CHS = 40,420.04; MSC = 0.41). Yet, when we surveyed the characteristics of clusters in this solution, we discovered that the intra-cluster variance within one of the clusters was large, hence limiting the interpretability of the results. We therefore decided to choose a second-best solution. The three measures did not agree on which solution was the second best (for DBS it was k = 8 with a score of 0.92; for CHS it was k = 6 with a score of 39,200.08; and for MSC it was k = 7 with a mean coefficient of 0.40). We chose to set k = 6 after comparing boxplot figures for all three second-best solutions, as it presented the most coherent set of clusters. Table 1 presents the characteristics of each cluster using two additional variables: the mean number of active users during the event (total users) and the mean time-step value, the time-step representing the number of months since the first contribution recorded in the region.
Of the six clusters, four represented highly centralized contributions (clusters 1-4) characteristic of a data import, showing average maximal share values above 84%. The other variables help differentiate between these imports. Cluster 2 is almost entirely focused on creating new entities and is the second earliest type of event on average, pointing to an intervention within a relatively immature database, hence termed "early import." Clusters 3 and 4 seem to operate within a more mature data environment (average time-step values of 93.18 and 98.29) and hence shift their focus from creation to editing and deletions.
Cluster 4 is especially focused on updating geometries and hence is named here "geometry update," while the other is termed "late import." The last import cluster (cluster 1) is unique as, beyond being the earliest type of event on average, it focuses almost entirely on tagging activities. As the results in the next sections show, this "tag import" cluster signifies a unique phenomenon within the history of OSM.
The other, more distributed, two clusters (5 and 6) represent community mapping events. The greatest differences between the two exist in the number of active users and editing and creation contributions: cluster 6 involves more users and is characterized by a focus on creating new entities rather than editing existing ones. This pattern fits with the profile of remote mapping events which mobilize larger communities that contribute based on little local knowledge. Accordingly, we identify cluster 6 with remote mapping events and cluster 5 with local knowledge events, that is, events in which the mappers show more knowledge about the area, evident in higher rates of tag and geometry changes. This cluster may include mapping parties along with mapathons and other remote mapping events.

| Event patterns
Based on this characterization of events, we now turn to examine the role of events in the entire OSM database, their patterns, and their relations with spatial contexts and the state of the database. Table 2 explores the weights of events in the context of the entire history of OSM mapping activities, starting from November 2007. The first insight that emerges from this table is that events are far from a negligible phenomenon within the OSM project: a significant share across all measures is attributed to events. While these numbers are somewhat inflated due to the temporal resolution of the analysis, they still point to the significant role of events in shaping the OSM database, with about 60% of creation contributions being attributed to events and more than 50% of all operations. The earliest types of imports (tag imports, early imports) present the most significant weights, especially in the context of creations and tagging of entities. Yet these results are certainly biased by temporal trends: events introduced later to the project necessarily make a smaller impact due to the increase in the size of the database.
Given this, the finding that remote events are responsible for more than 12% of all creations despite becoming a common practice 2.5 years after early imports (on average) suggests that over the more recent time periods this type of event is becoming more significant, as do the other types of later events (late imports, geometry imports, local knowledge events).
Investigating the temporal trends of event weights (Figure 4)

involvement of business corporations within OSM (Anderson et al., 2019). A shift of practices thus emerges which implies that events remain a significant mode of contribution in OSM today.
It is plausible to expect that this shift in practices would follow a geographical pattern: all event types, from remote events, usually associated with humanitarian efforts, through local knowledge events, and down to bulk imports, are inherently tied to specific regions. To explore this, Figure 5  France, Japan, and New Zealand; late imports and especially geometry imports have little weight, if any, in the sub-Saharan region and in far-east Asia. Data in these latter regions are mainly attributed to remote events and, to a lesser extent, to local knowledge events. Therefore, it seems that less developed regions do not experience this shift in practices the same way regions in the Global North do.
The patterns in Figure 5 can also be used to assess the validity of the event identification results, as they point to known events in OSM history. For example, the impacts of tag import events are especially clustered in the

TABLE 2 The share of event operations and contributions out of all contributions/operations in the history of OSM (last row), by event cluster

| Effects of events
An event is a singular occurrence that greatly affects the data. Due to the inseparability of data and production processes, it is probable that events will also affect contribution behaviors in the months following them. One way to explore this point could be to compute a measure of change, such as the percentage change in the value of a certain measure (e.g., number of operations) in the period after the event in relation to a similar period before it, that is:

$$c_{i,t,k} = 100 \cdot \frac{k_{T,i} - k_{-T,i}}{k_{-T,i}}$$

where $c_{i,t,k}$ is the percentage change in the value of measure $k$ (e.g., change in the number of operations or the number of active users) for an event taking place at time $t$ in cell $i$; $k_{T,i}$ is the median value of $k$ in cell $i$ for a certain time period $T$, where $-T$ indicates time before the event (e.g., if $T = 6$, $-T$ refers to $[t - 6, t - 1]$) and $T$ time after the event (e.g., $T = 6$ refers to $[t + 1, t + 6]$).
One issue with this measure is that its values may be excessively large for early time-steps: as the volume of activity during these time-steps is low, a small increase in absolute numbers may translate to a large percentage increase. To control for this, we compare events' $c_{i,t,k}$ values with values for corresponding months in regions in which no event was registered. We call this measure the relative median change $rc_{i,t,k}$ and its computation includes:
• identifying "control" observations (months in which no event was registered) and computing $c$ values for them ($c_{i,t,n}$);
• computing median $c_{i,t,n}$ values per time-step for control observations (i.e., $c_{t,n}$); and
• computing the relative median change as the difference between each event observation $c_{i,t,k}$ and the $c_{t,n}$ value for the respective time-step:

$$rc_{i,t,k} = c_{i,t,k} - c_{t,n}$$

To avoid biasing the results, only observations for which no event was registered $T$ months before and after the observation were included in the analysis. Also, events of the same type taking place in consecutive months were considered as one event. In such cases, $-T$ ended one month before the first month in which an event was identified and $T$ started one month after the last month in which an event was identified. Figure 6 uses boxplots to represent the distribution of $rc_{i,t,k}$ values for number of operations, number of contributions by type (creations, deletions, tag changes, geometry changes), and number of active users for T = 6 months (black boxes) and T = 12 months (grey boxes), by event type.
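A minimal sketch of the two measures described above, assuming a monthly series of one measure for a single cell and a list of percentage changes from control observations at the same time-step. The function names and the toy values are illustrative.

```python
from statistics import median

def pct_change(series, t, T=6):
    """Percentage change of the median value in the T months after month t
    relative to the median in the T months before it (requires t >= T)."""
    before = median(series[t - T:t])          # months t-T .. t-1
    after = median(series[t + 1:t + 1 + T])   # months t+1 .. t+T
    return 100.0 * (after - before) / before

def relative_change(event_c, control_cs):
    """Event change minus the median change of the control observations
    registered for the same time-step."""
    return event_c - median(control_cs)

# Toy cell: 10 operations per month, an event in month 6, then 15 per month.
series = [10] * 6 + [500] + [15] * 6
c_event = pct_change(series, t=6, T=6)
rc_event = relative_change(c_event, control_cs=[5.0, 10.0, 20.0])
```

The event month itself is excluded from both windows, so the measure reflects activity around the event rather than the event's own volume.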
One prominent result is that for almost all combinations of measures and event types, the median value is positive, meaning that in more than 50% of cases the volume of activity increased after the event. Additionally, in most cases the range of positive values is greater than the range of negative values, providing further support to the impression that events of all types frequently act to encourage activity and have no intrinsic negative effect.
Over most measures, the widest ranges and hence the highest values for T = 6 are registered for remote events, early imports, and tag imports. Notice the distinction here between early types of imports (early, tag) and later ones (late, geometry). This suggests that not only has the database matured but also the community, developing sounder import practices that require less correction afterwards (see OpenStreetMap, 2020i). In this sense, imports not following good practices can be seen as catalysts for community engagement (in addition to the natural increase in activity over time attributed to the growth of global and local communities). The increase in activity following remote events can be explained by the tendency of humanitarian mapping projects to be mission-centric campaigns with no clear deadline (Dittus et al., 2017) and by the validation procedures practiced by organizations such as HOT. The 7.3% increase in the median number of active users for T = 12 in relation to T = 6 for this type of event further supports this. Beyond this, differences between the T = 6 and T = 12 results are visible mostly for tag imports. As this is the earliest type of event (see Table 1), this result may still be related to the volume of activity prior to the event. However, given the relative nature of the measure, it is still probable that this import did serve as an incentive for users to correct the data.
Another way in which an event can affect data production dynamics is by affecting the probability of the emergence of future events (as discussed in the context of imports by Zielstra et al., 2013): if an event is deemed successful, the community may be interested in replicating it; if unsuccessful, the community may strive to change practices and organize an event of a different type, countering the effects of the previous event, or decide to avoid organizing any type of event.
To explore this possible effect, the first type of event in each cell was isolated. Then the frequency with which initial event types were followed by events of all types (considering all subsequent events) was computed (Table 3). The results do show a certain form of path dependency: apart from tag imports, all event types were followed by events of the same type in more than 40% of cases. Additionally, imports are more frequently followed by imports of all types; for example, early imports are frequently followed by early imports, tag imports, and late imports and least frequently by local knowledge and remote events. Hence it is possible to differentiate between "import" regions and "community" regions.
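One possible way to tabulate such follow-up frequencies is sketched below. It assumes that "followed by" means the type appears at least once among all later events in the same cell, and that frequencies are expressed as a percentage of cells sharing the same first event type; both are interpretations, and the input layout is hypothetical.

```python
from collections import Counter, defaultdict

def follow_up_shares(cell_events):
    """cell_events: {cell_id: [event types in chronological order]}.
    Returns {first_type: {later_type: percentage of cells}}, where a later
    type is counted once per cell if it occurs among the subsequent events."""
    cells_by_first = Counter()
    followed = defaultdict(Counter)
    for events in cell_events.values():
        if not events:
            continue
        first, later = events[0], set(events[1:])
        cells_by_first[first] += 1
        for kind in later:
            followed[first][kind] += 1
    return {
        first: {kind: 100.0 * n / cells_by_first[first]
                for kind, n in followed[first].items()}
        for first in cells_by_first
    }

cells = {
    1: ["remote", "local", "remote"],
    2: ["remote", "remote"],
    3: ["early import", "late import"],
    4: ["remote"],
}
shares = follow_up_shares(cells)
```

In this toy input, three cells start with a remote event; two of them see another remote event later and one sees a local knowledge event.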
FIGURE 6 Distribution of relative median change values by event type for T = 6 months (black boxes) and T = 12 months (gray boxes). Numbers indicate the median value

Interestingly, remote events are almost as frequently followed by local knowledge events as they are by additional remote events. It is probable that in many cases these subsequent events are part of validation efforts made by organizations such as HOT, as suggested above. The data support this: in 47.59% of the 1,866 cases in which a remote event was the first event registered in a cell, the next event was a remote or a local knowledge event, with a median time difference of 6 months. Thus, it seems that remote events show only limited success in encouraging organized activity beyond the scope of a specific, time-dependent, context. Finally, it is interesting to note that local knowledge events frequently follow the later types of imports (late, geometry) and are also frequently followed by these types of events. This suggests a common causal origin: both organizing the event and identifying available data and importing it while following good practices require a motivated local community.
These results, along with the findings from Figure 6, suggest that the effects of events and the processes leading to their creation emerge from intricate relations between local context, the global community, and previous contribution patterns.

DISCUSSION
In this article we have set out to assess the role of organized activity in OSM by identifying large-scale data production events and characterizing them in terms of their spatial and temporal patterns, contextual relations, and effects. Our findings show that organized activity is indeed a major component of OSM as a VGI project, with large-scale events being a frequent phenomenon encompassing much, and possibly the majority (considering the share of operations and creations attributed to events; Table 2), of data production. As such, the results here can offer some empirical insights relevant to the discussions within OSM mentioned in Section 2. First, it seems that in terms of activity following an event, this phenomenon does not have an intrinsic negative effect. Not only was it rare for activity volumes to decrease after such events, but the most contested type of event (early imports) has led to more community engagement by increasing the need to control for adverse data effects. This is perhaps most evident in the case of the 2007 import of TIGER data in the USA, which is notorious for its adverse effects on data quality. On the one hand, this import may have fueled the suspicion towards bulk imports. Yet the phrasing in the OSM Wiki entry on TIGER (e.g., "Enough editing has occurred since the original upload"; OpenStreetMap, 2020l), the numerous discussions on the American OSM community mailing list (https://lists.openstreetmap.org/pipermail/talk-us), and the still ongoing TIGER fixup efforts suggest that these issues have served as a call for action for the American OSM community, perhaps even becoming part of its identity. In at least one case this import directly motivated a second organized effort: the 2009 automated TIGER fixup, identified in our results as a cluster of tag imports.
TABLE 3 The frequency with which each event type, when being the first one recorded for a cell, was followed by each of the types of events
Another case where such an early import was followed by a tag import deleting and changing redundant data was registered for the city of Tel Aviv-Jaffa, Israel. In this case, a December 2012 import of address data (labeled here as an early import) was immediately followed by an automated correction in January 2013 (a tag import) in an effort coordinated by the local community (Grinberger, 2018).
This last case relates to another finding: the frequent co-occurrence of late imports, geometry imports, and update events. This finding suggests that imports may be an indication of a functioning and vibrant local community (as in the case of Tel Aviv-Jaffa) which can sustain any possible adverse effects. Nevertheless, local knowledge events tend to have a limited effect on non-organized activity or on the number of active users. This is not to say that field mapping events or local mapathons hold no value. First, it is possible that these are required to keep the community engaged and hence their impact may lie in retaining mappers and avoiding a decrease in activity.
Second, these events may work in other ways not captured by our data, such as increasing the cohesiveness of the local community and improving coordination and communication. These possible impacts can be further studied through analyses of changes to collaboration networks within the local community (Mooney & Corcoran, 2014; Truong, de Runz, & Touya, 2019), of the effects of dedicated interventions (Juhász & Hochmair, 2018), or of changes to retention rates and the "life cycles" of contributions (Bégin, Devillers, & Roche, 2017).
The results provide some indications that remote mapping events encourage further activity, but the findings are somewhat contradictory and need to be interpreted in relation to the contexts producing the events themselves (i.e., organized validation efforts after humanitarian mapping events). For example, in the 12 months following the response to the 2010 Haiti earthquake (OpenStreetMap, 2020f), the median number of monthly operations across the 20 cells covering Haiti was 99.50 (range 11.5–4,061) and the median number of active users per month was 2.5 (range 1–7.5). Yet this event was a major milestone in the institution of HOT (OpenStreetMap, 2020g), which later led the 2016–2017 mapping efforts following Hurricane Matthew, which included Haiti as well.
The findings here support the critique that production processes cannot be separated from the data in VGI (Sieber & Haklay, 2015), pointing to complex relationships between communities, spaces, previous data contributions, and sociocultural contexts. The maturity of the data encourages a shift in organizational practices as the weight of initial imports decreases while new modes of organization, such as community-based events and imports that enhance data rather than just producing them, are emerging. Accordingly, it is likely that in the future community-based events will become an even more important component of data production procedures. This shift towards community events will not necessarily work to support the main objective of OSM (see https://www.openstreetmap.org/about): representing local knowledge. Remote events, which are more effective than local knowledge events in terms of scale and number of involved mappers (see Tables 1 and 2), rely on external information in the shape of satellite images, and thus integrate local knowledge only to a limited extent. The geographical focus of remote events on less developed spaces in sub-Saharan Africa and south-east Asia points to a possible adverse effect on local representation, despite the general enrichment of representation through such events. From this it follows that not all spaces enjoy the shift in organizational practices alike and that organized activity, despite being more community-driven, cannot solve the digital divide typical of VGI (Sui, Goodchild, & Elwood, 2013) and may even enhance it.
This points to an important theoretical gap that requires attention. Despite contribution behaviors receiving attention as a topic related to understanding the data and their quality (Sehra et al., 2017; Yan et al., 2020), studies of this topic tend to focus on individual mappers, their behaviors, and the interactions between them. The findings here suggest that this may not always be the best resolution for studying OSM data. In their discussion of the relevance of actor-network theory for studying Web 2.0 cartographies, Bittner, Glasze, and Turk (2013, p. 940) mention "asymmetries between actors because bigger actors (e.g., international organizations or companies) succeed in stabilizing many transactions in black boxes, which allow them to appear as a huge and single homogenous actor." The concept of "black boxes" (actors jointly operating so smoothly as to form a seemingly unified entity) has not received enough attention in the study of OSM data quality. An event can be read as an attempt to homogenize contributions; this is clear for imports that originate from a specific organizational environment, but it is also true for local and remote events that follow common goals, guidelines, and practices. As such, events act to introduce specific dimensions of performance into the project. Studying the behaviors of individuals without considering this may lead to "not seeing the wood for the trees"; the total effect of an assemblage may hide behind the individual actions of its members. Hence, quality measures that treat mapper groups as assemblages acting alongside individual mappers can expose important information regarding asymmetries of knowledge and representation. For example, conceptualizing the 2007 TIGER import as one assemblage leading to the creation of the fixup assemblage (by individual actions and automated fixes) can expose whether these black boxes steered attention away from other mapping issues, thus affecting the quality of the data.
Analysis of co-contribution networks identifying dense clusters of cooperation (see Ma, Sandberg, & Jiang, 2015), in which tags or entity types are conceptualized as nodes, holds the potential for identifying such assemblages, alongside spatiotemporal clustering of contribution behaviors. Analyses of this kind should also consider additional contextual elements, such as the aggregated expertise of users (Muttaqien, Ostermann, & Lemmens, 2018) or activities beyond mapping, such as involvement in the development of the project and the discourse around it (Plennert, 2018)  OpenStreetMap, 2020d), in response to users' feedback, or via remote/on-the-ground mapping. Some of these corporations (e.g., Mapbox, Apple, Grab, Facebook) also contribute to areas which did not experience the shift towards community events, thus opening up the integration of more local knowledge into OSM. Yet this knowledge is collected and incorporated only if it fits with the corporation's needs and practices. Hence, they bring into the project their own perspective, at times through massive contributions. Quality measures relating to these groups of mappers as one entity can help generate knowledge regarding the effects of their actions on the data and on contribution patterns, knowledge that is still lacking given the recency of this phenomenon.

CONCLUSIONS
This article explores the role and effects of large-scale data production events in the history of the OSM project, events in which many data operations are carried out over a short period of time. As these events require some form of organization, we use this to consider the significance of organizational context in OSM. Our results show that organizational contexts play a significant role in the development of the data in OSM, yet that their specific effects change over space and time: as the project and the data mature, partially through events and the responses to them, practices shift towards community-based events; yet this does not automatically imply an increase in the extent of representation of local knowledge, as evident in event dynamics in regions outside of the "Global North." The insights emerging from these findings point to theoretical issues that require consideration. First, the findings show that organizational activities are not just a byproduct of OSM but a crucial and constitutive element of it, which calls for integrating organizational contexts in a formal way into analyses of OSM. This research makes only a first step in this direction. Formal measures based on conceptualizing organized activity as a unique entity and utilizing techniques such as network-based analyses of mapping interactions (Mooney & Corcoran, 2013, 2014; Stein, Kremer, & Schlieder, 2015), inquiries into the identity of mappers active in different spaces (Quinn, 2017), and studies of more specific effects on the nature of representation still need to be developed.
Second, the results expose complex interactions between practices, data, context, physical spaces, and digital environments during data production. These dynamics picture OSM as an ontogenetic digital space (Kitchin & Dodge, 2011), a space that is always in a state of becoming through the practices of participating mappers. New contributions, whether creations, edits, or deletions, continuously alter these spaces while relying on their existing spatial forms and imperfectly citing previous contribution practices. Such spaces extend themselves through interactions beyond any specific scale and are constantly engaged with other spaces, such as physical and cultural ones. OSM research has yet to consider the implications of this state for analysis (perhaps with the exception of Schott, 2019). Conceptualizing OSM in this way calls for relational modes of inquiry that focus on the relations that produce space. The network-based studies mentioned above certainly mark a step in the right direction. Yet more work is required in order to integrate dynamics, meaning, and practices into such methods. Future endeavors should explore issues such as how networks and community structures change following major contributions (see Grinberger, 2018), how meaning is being introduced into OSM projects through activities such as tagging and adding Wiki entries (Ballatore & Mooney, 2015; Mooney & Corcoran, 2012), what trajectories mappers follow through OSM, and what practices they employ. While the VGI literature relates to some of these issues, a holistic inquiry into the subject, one that will probably extend beyond quantitative and computational approaches, can serve to further enhance our understanding of OSM and possibly also other crowd-based mapping endeavors.