Quantifying the drivers behind collective attention in information ecosystems

Understanding human interactions in online communications is of paramount importance for our society. Alarming phenomena such as the spreading of fake news or the formation of echo-chambers can emerge in unhealthy communication environments and, ultimately, undermine the democratic discourse. In this context, unveiling the individual drivers that give rise to collective attention can help to conserve the health of our information ecosystems. Here, following a recently proposed analogy between natural and information ecosystems, we explore how competition for attention in online social networks and the strategies adopted by the users to maximize their visibility shape our communication dynamics. Specifically, by analyzing large-scale datasets from the micro-blogging platform Twitter and performing numerical modeling of the system dynamics, we are able to measure the amount of competition for attention experienced by users and how it changes when exogenous events captivate collective attention. The work relies on topic modeling to extract users’ interests and memes context from the data and a framework based on ecological niche theory to quantify the strength of negative (competitive) and positive (mutualistic) interactions for both users and memes. Interestingly, our findings show two different behaviors. While memes undergo a sharp increase in competition during exceptional events that can lead to their extinction, users perceive a decrease in effective competition due to a stronger effect of mutualistic interaction, explaining the focus of collective attention around specific topics. Finally, to confirm our results we reproduce the observed shifts with a data-driven model of species dynamics.


Introduction
Online social networks (OSNs) have transformed not only the way we communicate, but also how we access and process information. Since their appearance, they have played a double role: as spaces to build social interactions and as news media platforms. Consequently, our communication model has shifted from a centralized mass media environment and face-to-face interactions to an era in which all actors are, at the same time, information sources and receivers. This duality, and the new paradigm it induced, also make OSNs a perfect example of social information processing, leading to emergent phenomena such as viral information spreading [1], fake news [2] and the formation of echo-chambers [3,4].
These changes have also exposed our cognitive limitations. Our brains went from attending to a few broadcast information sources to being bombarded by millions of messages demanding our attention. This has led to a variety of phenomena, such as an acceleration of social dynamics [5], and terms like 'competition for attention' have entered our everyday vocabulary. Moreover, OSNs are often designed to captivate our brains and maximize the time we spend on them by providing reinforcing feedback [6] and instant gratification [7,8]. These limitations, along with the huge amount of information we produce every second, induce competition between ideas/memes for visibility, and users tend to adopt different strategies to increase the chances that their messages are read.
Different approaches have been proposed in recent years to understand how these low-level drivers shape the way our society processes information. From a data-analytic perspective, several works have demonstrated the role of competition in information diffusion [9,10], in our social interactions [11-14] and in the quality of the information we share [15]. These results have also been supported by theoretical models, for example for the distribution of meme popularity [16-18] or for how echo-chambers emerge [19].
This strong emphasis on the effects of competition, and on how users respond to it, has also led researchers to draw an analogy between information and natural ecosystems [5,20-22]. Actors of OSNs (e.g. users and memes) are seen as species in ecological communities that seek to maximize their abundance (visibility, in this case) while competing for limited resources (e.g. individuals' attention). Moreover, the communication strategies adopted by users, such as the use of specific hashtags to provide context to their messages, can be represented as mutualistic, since they favor both the visibility of users and the growth of certain memes.
One example of how this analogy can lead to insights into the functioning of information ecosystems is the work of Borge-Holthoefer et al [20], who studied the organization of online discussions around social protests in Spain in 2011 as an ecological network, finding that its structure evolved toward a nested architecture, very close to the typical organization of natural mutualistic assemblages [23,24]. Building on these results, Palazzi et al [21] recently proposed an ecology-inspired model [25,26] to explain the structural flexibility shown by OSNs in response to external shocks such as breaking news. In [5], instead, the authors explained the acceleration in collective attention they found in digital streams and cultural items by employing a mathematical model based on Lotka-Volterra dynamics, a theoretical framework often used in population dynamics and theoretical ecology [27,28]. Finally, from a more theoretical perspective, models based on the concept of ecological neutrality [29,30] were able to reproduce several emergent patterns found in online communications [22].
Here, following this research line, we exploit the similarities between natural and information ecosystems to understand the main drivers that shape collective attention. Specifically, by analyzing different datasets extracted from the online platform Twitter and an ecology-inspired numerical model, we are able to quantify the intensity of the competitive and mutualistic interactions experienced by each user, and how these change when exogenous events focus collective attention. We start by relying on topic modeling to study the evolution of users' interests over time, along with the diversity of the discussion around the different topics. In this way, we can measure the similarity between users and memes and then, employing a modeling framework based on ecological niche theory [26], build interaction networks similar to ecological communities with both competitive and mutualistic terms. We then analyze the obtained networks with the aim of understanding how variations in the amount of competition and mutualism experienced by the users can explain the focus of collective attention around one or a few dominating topics. These analyses allow us to explain these shifts in attention in terms of a reduction of the effective competition experienced by the users. We then confirm this finding by reproducing the patterns we observe in the data with numerical simulations of an ecological model based on species abundance maximization [21]. Our results not only shed light on how information processing drives users' behaviors in social media but, in a broader sense, also spotlight the opportunities that an ecological approach can offer to the study of information ecosystems for understanding collective phenomena in techno-social systems.
The rest of the work is organized as follows. In the next section, we introduce the datasets we employ in the analysis, the topic modeling technique used to extract information topics from Twitter discussions, the ecologically inspired numerical model and the methods to quantify the strength of competitive and mutualistic interactions from the data. Then, in sections 3.1 and 3.2 we present our results for the time evolution of topics and users' interests, respectively, while in section 3.3 we study the user-meme interaction networks. In section 3.4, we confirm our previous results with numerical simulations of our model. Finally, in section 4 we summarize the main findings of this work and draw our conclusions.

Twitter events datasets
Since we are interested in quantifying the changes in collective attention during exceptional events, we focus on data from the online social platform Twitter. Because of its microblogging nature, Twitter is not only a social network but is also often considered a news and information medium. Thus, social, political or natural events are likely to be reflected in its users' activity, allowing us to record changes in both their interests and their interactions.

For our analyses, we focus on three large datasets describing Twitter activity: the consultations and protests around the self-determination referendum organized by the Catalan government in November 2014 [21]; the general elections held in Spain in 2019 [21]; and the response to the earthquake that hit Nepal in 2015 [31]. All the datasets have been collected through the Twitter streaming API and are publicly available (see appendix A for details on the data collection process and their availability). Table 1 summarizes the main features of the datasets used. We chose these datasets because they cover both expected and unexpected events [32,33]: from political discussions with a fixed timeline (Spanish elections), to political/social protests like the Catalan referendum with a mixture of organized activities and improvised protests, and an inherently unexpected natural disaster (Nepal's earthquake). In particular, since our aim is to study how exceptional events shape the communication dynamics in online discussions, we focus only on events that are exogenous to the classical OSN dynamics and for which we have a detailed view of their evolution (e.g. salient periods can be easily identified from the news and other media).
In order to extract information topics from the tweets, for all the datasets we only considered tweets with at least one hashtag (the only exception being the Nepal earthquake dataset where, due to its extremely large size, we had to restrict our analysis to hashtags that were posted at least 100 times). From each tweet, we consider the timestamp, the (previously anonymized) ID of the user who posted it and all the hashtags it contains. In this way, we were able to extract users' interests and their evolution throughout the discussion, along with hashtag co-occurrences, which are fundamental to infer the information topics. Figure 1 presents the time evolution of the number of tweets for the datasets considered. In all cases, it is easy to see peaks in the activity, usually related to external events, such as the television debate between candidates or the polling day for the Spanish elections, or the referendum day for the Catalan referendum. These high activity periods are usually preceded and followed by calmer periods characterized by a lower and constant activity. Following this separation, we split each dataset into different parts, each corresponding to a 'peak' or 'rest' period, and study how users' interests and competition for attention vary between them. In particular, our aim in splitting the datasets is to have periods short enough to clearly distinguish between peaks, each containing only a single event, but also to have enough data during the rest intervals. Thus, for each dataset, we identified the largest spike in activity and used its duration as the length of the period. For example, for the Catalan referendum, since the largest spike lasts around 10 days, we divided the dataset into intervals of exactly 10 days. We applied a similar reasoning to the Nepal earthquake dataset, in which the activity peak lasts 3 days, so we have 3 periods of 3 days each. However, for the Spanish elections dataset, due to its short duration and the effects of circadian rhythms, we were forced to use slightly uneven intervals to guarantee enough data during the resting periods.

Information topics
Once the datasets and the different periods are defined, we focus on how to extract users' interests from the raw data. This step is fundamental to measure the similarity between users and, eventually, to quantify the competition for attention that both users and memes experience over the development of the events. To extract users' interests from their timeline, we rely on a network theory approach based on hashtag co-occurrence, designed specifically to infer information topics in Twitter [34-36]. Hashtags are used in many social media as keywords to indicate the content of a message and they often represent memes whose meaning is known to the users. They thus provide a concise indication of a tweet's semantic context. The appearance of two or more hashtags in the same tweet is therefore often the sign of a semantic association between them (they belong to the same topic), as happens with words that frequently co-occur in texts [37,38]. Based on this association, we extract information topics as cohesive clusters of hashtags that significantly appear together in different tweets.
To do so, for each period considered in the data, we build a weighted co-occurrence network between hashtags, where a link is laid if two hashtags appeared together in the same tweet and its weight represents the number of different tweets in which they co-occurred. In order to eliminate spurious links and ensure the significance of the semantic associations between the hashtags, we then remove all the links whose weight is less than or equal to 3. Once the networks have been built, information topics can be detected as clusters of densely connected hashtags, i.e. communities in the graph. Following the procedure proposed in [35,36], we employ the OSLOM tool [39] to detect communities in the different networks. We use OSLOM for community detection because it is able to extract overlapping communities, where nodes can belong to more than one community at a time. Thus, in our case, it is able to capture the fact that hashtags with different meanings can be part of several topics. Figure 2 shows an example of topics extracted from the Spanish elections dataset. In the four topics, the names of the distinct national parties are clearly identifiable, along with abbreviations of the most important dates.
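The network construction described above can be sketched in a few lines. This is only a minimal illustration under stated assumptions: the function name `cooccurrence_network`, the input format (one list of hashtags per tweet) and the toy hashtags are hypothetical, and the community detection step (OSLOM, an external tool) is not reproduced here.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(tweets, min_weight=3):
    """Build a weighted hashtag co-occurrence network.

    tweets: iterable of hashtag lists, one list per tweet (assumed format).
    Returns {(tag_a, tag_b): weight}, keeping only links whose weight is
    strictly greater than min_weight (links with weight <= 3 are dropped).
    """
    weights = Counter()
    for hashtags in tweets:
        # Deduplicate inside a tweet so each tweet counts once per pair.
        unique = sorted(set(hashtags))
        for pair in combinations(unique, 2):
            weights[pair] += 1
    return {pair: w for pair, w in weights.items() if w > min_weight}


# Toy data (hypothetical hashtags).
tweets = (
    [["28A", "debate"]] * 4
    + [["28A", "vote"]] * 2
    + [["debate", "vote", "28A"]]
)
network = cooccurrence_network(tweets)
print(network)
```

In this toy example only the link between `28A` and `debate` survives the threshold, since the other two pairs co-occur in three tweets or fewer. The resulting edge list would then be passed to a community detection tool.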
Once the information topics are extracted from the data, we can use them to detect users' interests from the hashtags they posted and their membership in the different topics. For each user, we build a feature vector $u$ whose entries $u_i$ quantify the user's interest in topic $i$. Specifically, $u_i$ accounts for the number of times the user posted a hashtag belonging to topic $i$ in their tweets. Moreover, since hashtags can be part of different topics, we split the weight of each hashtag according to the number of communities it belongs to. For example, a hashtag that is part of only one topic $t$ will contribute to the pertinent $u_t$ by adding one. On the contrary, one belonging to two topics will contribute 0.5 to each of the two corresponding entries of $u$. In figures 3(a) and (b) we provide a diagram showing how the hashtags posted by the users are classified into topics and then used to build the user vector $u$. The same procedure can then be applied to hashtags to build an equivalent hashtag vector $h$ for each of them, accounting for their participation in the different topics.
Figure 3. [...] [35]. (panel (b)) Hashtag-topic vectors are built directly from the membership of each hashtag in one or more topics. User-topic vectors can be calculated from the hashtags tweeted by each user. If a hashtag belongs to more than one topic, its weight is split evenly between all the topics it participates in. (panels (c), (d) and (e)) From the user and hashtag vectors, cosine similarity is employed to calculate topic overlap. (panels (c) and (e)) The hashtag-hashtag and user-user competitive matrices are calculated proportionally to the topic similarity between users (hashtags). (panel (d)) Finally, the mutualistic matrix is built as the similarity between users and hashtags.
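The weight-splitting rule for user-topic vectors can be illustrated as follows. This is a sketch under assumptions: the helper `topic_vector`, the `memberships` mapping (hashtag to list of topic indices) and the toy values are hypothetical names introduced only for the example.

```python
def topic_vector(hashtags, memberships, n_topics):
    """Accumulate topic interest from a list of posted hashtags.

    A hashtag belonging to k topics adds 1/k to each of them, so every
    posted hashtag contributes a total weight of one.
    """
    vector = [0.0] * n_topics
    for tag in hashtags:
        topics = memberships.get(tag, ())  # hashtags outside any topic are ignored
        for topic in topics:
            vector[topic] += 1.0 / len(topics)
    return vector


# Hypothetical memberships: "28A" overlaps topics 0 and 1.
memberships = {"28A": [0, 1], "debate": [0], "vote": [1]}
u = topic_vector(["28A", "debate", "debate"], memberships, n_topics=2)
print(u)  # [2.5, 0.5]
```

Here the overlapping hashtag contributes 0.5 to each of its two topics, while each use of the single-topic hashtag adds one to topic 0.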
User vectors quantify how individual attention is distributed across the different topics and how external events can focus it on a specific theme. However, individual interests are also a proxy for competition for attention. If everyone is interested in a specific topic, hashtags on that topic will experience stronger competition for space on the users' screens. In the same way, users compete with their peers to get their messages read. On the other hand, a user can decide to post about popular topics to gain the advantage of reaching a potentially larger audience, leading to a sort of mutualistic interaction.
To quantify how much competition users experience in peak and resting periods in our datasets, we measure how similar the user vectors are to one another. For each pair of user vectors $u$ and $v$ we calculate the cosine similarity:
$$
\mathrm{sim}(u, v) = \frac{u \cdot v}{\lVert u \rVert \, \lVert v \rVert} = \frac{\sum_i u_i v_i}{\sqrt{\sum_i u_i^2} \, \sqrt{\sum_i v_i^2}} \tag{1}
$$
In this way, we have a measure that ranges from 0 (users with totally disjoint interests) to 1 (totally aligned users). Finally, hashtag competition can be estimated in the same way using the hashtag vectors $h$ instead of $u$ (see section 2.4 for details).
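As a quick numerical check of the cosine similarity between topic vectors, the following sketch computes it with the standard library only. The function name and the convention of returning 0 for an all-zero vector are assumptions made for the example, not choices stated in the text.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two topic vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # Assumed convention: a user with no topic activity overlaps with nobody.
    return dot / norm if norm > 0 else 0.0


print(cosine_similarity([1, 0], [0, 1]))  # disjoint interests
print(cosine_similarity([2, 0], [5, 0]))  # aligned interests, different activity
print(cosine_similarity([1, 1], [1, 0]))  # partial overlap
```

Since topic vectors have non-negative entries, the value always lies between 0 and 1, and it does not depend on how active each user is, only on how their activity is distributed across topics.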

Dynamical model of users' attention
Defining users and hashtags similarity allows us to quantify the amount of competition and mutualism experienced by each node. However, to get further insights on the driving mechanisms behind these changes in collective attention, we also employ an ecology-inspired visibility optimization model proposed to explain structural changes in the user-hashtag interaction networks [21] over the course of an event.
Following the analogies between ecological communities [25,26] and information ecosystems [21,22], the model is based on the assumption that the attention dynamics recorded in the data are the result of an optimization process in which users aim to maximize their visibility. Users and hashtags are thus seen as species of a mutualistic assemblage belonging to two different classes (e.g. plants and pollinators in natural ecosystems). Competition takes place between species of the same guild (user-user or hashtag-hashtag), while mutualistic interactions occur between species of different guilds (user-hashtag), with the species dynamics modeled by Lotka-Volterra equations with a Holling-type II functional response [24,25]:
$$
\frac{dn^U_i}{dt} = n^U_i \left( \rho^U_i - \sum_j \beta^{UU}_{ij} n^U_j + \frac{\sum_k \gamma^{UH}_{ik} \theta^{UH}_{ik} n^H_k}{1 + h \sum_l \gamma^{UH}_{il} \theta^{UH}_{il} n^H_l} \right),
$$
$$
\frac{dn^H_i}{dt} = n^H_i \left( \rho^H_i - \sum_j \beta^{HH}_{ij} n^H_j + \frac{\sum_k \gamma^{UH}_{ki} \theta^{UH}_{ki} n^U_k}{1 + h \sum_l \gamma^{UH}_{li} \theta^{UH}_{li} n^U_l} \right),
$$
where $n^U_i$ and $n^H_i$ stand for the abundance (visibility) of species $i$ in the users' ($U$) or hashtags' ($H$) guild, while $\rho^U_i$ and $\rho^H_i$ represent the respective growth rates and $h$ is the handling time of the Holling-type II mutualistic functional response. The intensity of the competitive interactions among users and among hashtags, and of the mutualistic interactions between them, is defined by the matrices $\beta^{UU}$, $\beta^{HH}$ and $\gamma^{UH}$, respectively. Finally, $\theta$ is the adjacency matrix of the mutualistic interactions, accounting for the hashtags produced by each user in their posts.
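To make the dynamics concrete, the sketch below integrates a two-guild Lotka-Volterra system with a saturating (Holling-type II) mutualistic term. It is an illustration under assumptions, not the authors' implementation: the forward-Euler scheme, the step size, the function names and the toy parameter values are all hypothetical, and the mutualistic benefit is assumed to take the form $M / (1 + hM)$ with $M = \sum_k \gamma_{ik} \theta_{ik} n_k$.

```python
def derivatives(n_u, n_h, rho_u, rho_h, beta_uu, beta_hh, gamma, theta, h):
    """Right-hand side of the two-guild system (users U, hashtags H)."""
    nu, nh = len(n_u), len(n_h)
    d_u, d_h = [], []
    for i in range(nu):
        comp = sum(beta_uu[i][j] * n_u[j] for j in range(nu))
        mut = sum(gamma[i][k] * theta[i][k] * n_h[k] for k in range(nh))
        d_u.append(n_u[i] * (rho_u[i] - comp + mut / (1.0 + h * mut)))
    for k in range(nh):
        comp = sum(beta_hh[k][l] * n_h[l] for l in range(nh))
        mut = sum(gamma[i][k] * theta[i][k] * n_u[i] for i in range(nu))
        d_h.append(n_h[k] * (rho_h[k] - comp + mut / (1.0 + h * mut)))
    return d_u, d_h


def integrate(n_u, n_h, params, dt=0.01, steps=5000):
    """Forward-Euler integration (assumed scheme) toward equilibrium."""
    for _ in range(steps):
        d_u, d_h = derivatives(n_u, n_h, *params)
        n_u = [x + dt * d for x, d in zip(n_u, d_u)]
        n_h = [x + dt * d for x, d in zip(n_h, d_h)]
    return n_u, n_h


# Toy system: one user, one hashtag, unit growth rates and self-competition.
def toy_params(link):
    return ([1.0], [1.0], [[1.0]], [[1.0]], [[0.5]], [[link]], 0.1)


alone_u, _ = integrate([0.5], [0.5], toy_params(0))
linked_u, _ = integrate([0.5], [0.5], toy_params(1))
print(alone_u[0], linked_u[0])
```

Without the mutualistic link, the user settles at the logistic equilibrium $\rho / \beta = 1$; with the link, the equilibrium abundance is higher, which is the quantity the optimization process acts on.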
The optimization process at the basis of the model is the same as the one proposed in Suweis et al [25], where species (users) rewire their mutualistic connections to randomly selected partners (hashtags) and the new links are kept only if they lead to an increase in abundance (visibility). Otherwise, the original connection is restored. Specifically, at constant time intervals, a random user $U$ is selected and one of its existing connections to a hashtag $H$ is rewired to a new hashtag $H'$. The link to be rewired is selected with probability $p_{UH} \propto 1 - k_H^{-1}$, where $k_H$ is the degree of $H$. After the rewiring, we let the system evolve until equilibrium is reached. If at the end of this period $U$'s abundance is greater than the previous one, the link is kept; otherwise, the previous configuration is restored.
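A single rewiring attempt can be sketched as below. Several details are assumptions made only for illustration: `abundance_of` is a hypothetical callback standing in for "integrate the dynamics to equilibrium and read the user's abundance", the new hashtag is drawn uniformly among those the user is not yet linked to, and a uniform choice is used as a fallback when all selection weights are zero.

```python
import random


def rewiring_attempt(theta, user, abundance_of, rng=random):
    """Try to rewire one link of `user`; return True if the change is kept.

    theta: binary user-hashtag adjacency matrix (list of lists), edited in place.
    abundance_of: hypothetical callback (theta, user) -> equilibrium abundance.
    """
    n_tags = len(theta[user])
    degree = [sum(row[k] for row in theta) for k in range(n_tags)]
    linked = [k for k in range(n_tags) if theta[user][k]]
    free = [k for k in range(n_tags) if not theta[user][k]]
    if not linked or not free:
        return False
    # Selection weight 1 - 1/k_H: links to high-degree hashtags are picked more often.
    weights = [1.0 - 1.0 / degree[k] for k in linked]
    if sum(weights) > 0:
        old = rng.choices(linked, weights=weights)[0]
    else:
        old = rng.choice(linked)  # assumed fallback when every degree is one
    new = rng.choice(free)
    before = abundance_of(theta, user)
    theta[user][old], theta[user][new] = 0, 1
    if abundance_of(theta, user) > before:
        return True
    theta[user][old], theta[user][new] = 1, 0  # restore the previous configuration
    return False


# Toy check with a made-up abundance that rewards using hashtag 1.
theta = [[1, 0], [1, 0]]
kept = rewiring_attempt(theta, 0, lambda th, u: th[u][1])
print(kept, theta)
```

In a full simulation, this step would be repeated for randomly selected users, with the callback replaced by a numerical integration of the species dynamics.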
It is important to notice that in our model only users maximize their abundance (visibility) since they choose the hashtags to post in their tweets. Thus, changes in the hashtag networks are due only to users' actions.
Finally, information topics are modeled as ecological niches [40] associated with each species. For the users, they represent the set of individual interests while, for the hashtags, they define their semantic context.

Estimating competition and mutualism from data
Once the species dynamics are defined, we need to estimate the interaction strengths among species ($\beta^{UU}$, $\beta^{HH}$ and $\gamma^{UH}$) from our data. To do so, we follow the approach proposed by Cai et al [26], where competitive and mutualistic strengths are proportional to the niche overlap between species of the same (competition) and opposite (mutualism) guilds. We estimate the niche overlap $G^{gg'}_{ij}$ between species $i$ belonging to guild $g$ and species $j$ of guild $g'$ from the data as the cosine similarity between the topic vectors of $i$ and $j$. The elements of the competition matrices are then simply proportional to the overlap: $\beta^{HH}_{ij} \propto \Omega_c \cdot G^{HH}_{ij}$ and $\beta^{UU}_{ij} \propto \Omega_c \cdot G^{UU}_{ij}$, with $\Omega_c$ a global factor that tunes the absolute strength of the interaction. Similarly, we define the mutualistic strength as $\gamma^{UH}_{ik} \propto \Omega_m \cdot G^{UH}_{ik}$, with the only difference that, in the model, the mutualistic strength $\gamma^{UH}_{ik}$ is then multiplied by $\theta^{UH}_{ik}$ to account for the fact that user $i$ may ($\theta^{UH}_{ik} = 1$) or may not ($\theta^{UH}_{ik} = 0$) interact with hashtag $k$. Finally, to be able to compare the two strengths, we impose $\Omega_c = \Omega_m$.
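The estimation of the interaction matrices from topic vectors can be sketched as follows. The proportionality constant is assumed to be one, so that each matrix is simply $\Omega$ times the niche overlap; the function names and toy vectors are hypothetical, and the default $\Omega = 0.1$ is used here only as an example value.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0 for an all-zero vector (assumed convention)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def overlap(vecs_a, vecs_b):
    """Niche overlap matrix G between two sets of topic vectors."""
    return [[cosine(a, b) for b in vecs_b] for a in vecs_a]


def interaction_matrices(user_vecs, tag_vecs, theta, omega=0.1):
    """Return beta_UU, beta_HH, gamma_UH and the effective term gamma * theta."""
    def scale(matrix):
        return [[omega * x for x in row] for row in matrix]

    beta_uu = scale(overlap(user_vecs, user_vecs))
    beta_hh = scale(overlap(tag_vecs, tag_vecs))
    gamma = scale(overlap(user_vecs, tag_vecs))
    effective = [
        [g * t for g, t in zip(g_row, t_row)]
        for g_row, t_row in zip(gamma, theta)
    ]
    return beta_uu, beta_hh, gamma, effective


# Toy data: two users, two hashtags, two topics.
users = [[1, 0], [0, 1]]
tags = [[1, 0], [1, 1]]
theta = [[1, 1], [0, 1]]
beta_uu, beta_hh, gamma, effective = interaction_matrices(users, tags, theta)
print(effective)
```

Because all three matrices are built from the same bounded similarity measure and share the same global factor, competitive and mutualistic strengths end up on a common scale and can be compared directly.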

Results
We start our analyses by dividing our datasets into different periods, depending on their activity (i.e., the overall number of tweets produced). We thus distinguish between what we define as peak periods, when external events produce spikes in Twitter activity, such as the television debate day for the Spanish elections dataset or the referendum day for the Catalan dataset; and resting periods, where the discussion is still active but not driven by major external events. In this way, we can use the resting periods to define a baseline for activity and collective attention and compare them with the peak periods. Figures 4(a)-(c) show the split of the three datasets we are considering. For the Spanish elections dataset (figure 4(a)) we define five different periods, with the second and fourth covering large events, while the first and fifth represent our baseline. The third one is a mixture of peak and resting periods, since it is characterized by increased activity but only smaller events took place. We adopt a similar split for the Catalan self-determination referendum dataset. In this case, we define seven periods, with the first and seventh as peaks and the second, fourth and sixth as resting states. The remaining two (third and fifth intervals) are classified as mixed. Finally, in the Nepal earthquake dataset we identify three periods, where the first one represents the resting state, the second the highest activity and the third one a mix of peak and rest.
Once our baseline and active periods are defined, we apply the topic extraction procedure to each period and study how the hashtag communities and users' interests evolve through time. To do so, for each period we compute the hashtag co-occurrence network and extract the most significant communities from it, defining our topics and the membership of hashtags and users as described in section 2.2.

Topics evolution
From the networks it is possible to study how information topics evolve and how the corresponding discussions re-arrange around the external events. Table 2 summarizes the main features of the co-occurrence networks obtained from the data. A result that first catches the eye is that, even though the networks for different periods have different sizes (e.g., the number of nodes/hashtags for the Catalan referendum dataset ranges from 310 to 1602 between the least and most active periods), the number of communities/topics is more or less constant. This contrasts with our assumption that, during relevant events, collective attention focuses only on one or a few 'important topics'. This discrepancy is resolved if we consider not only the number of topics but also the activity around them, in terms of the number of tweets they generate. Figure 4(d) shows how tweets are distributed among the ten largest topics for the different periods of the Spanish elections dataset. Although activity is in general highly heterogeneous, during peak periods (marked by gray bars) this tendency is exacerbated, with one or two topics generating almost 90% of the tweets (we obtain similar results for the other two datasets, not shown). Taken together, the last two results (i.e., the almost constant number of topics over different periods, as shown in table 2, and the higher heterogeneity in attention during peak events presented in figure 4(d)) depict a clear scenario. During normal activity periods, users' interests are focused on specific topics and hashtags compete for attention inside individual discussions. When an external event captures collective attention, the original topics remain active but generate far less activity than before. To visualize how attention flows between the different topics over time, figure 5 shows an alluvial diagram of the activity around the topics in the Catalan referendum dataset (we obtain similar results for the other datasets, not shown). In each block (relative to a time period), the size, in terms of number of tweets, and the flows of activity for the four largest topics are shown. As a confirmation of the results in figures 4(c) and (d), during peak periods (first, third and seventh) all flows converge around one single topic. When external events fade out, flows split back into different discussions, or some hashtags stop being used during calm intervals (e.g. the second period) and regain interest in high activity periods as shown, for example, by the large flows moving directly from the first period to the third.

Users' similarity
Having studied how exceptional events shape hashtag dynamics, we now focus on the other side of the coin: users' behavior. We use the cosine similarity between user-topic vectors to quantify the overlap of interests between pairs of users. A higher average similarity points to a focus of collective attention around a few specific topics and, in turn, to increased competition.
Figure 6. Cosine similarity between user-topic vectors of the most active users for the Catalan referendum dataset. Cosine similarity has been calculated following equation (1). The most active users have been selected as the ones that participated in the discussion for at least 90% of the time and posted at least ten tweets during the whole period, leading to 400 individuals. Shaded areas mark peak periods.
Figure 7. Estimation of competitive and mutualistic strengths for: hashtag-hashtag and user-user competition (first and second row, respectively) and user-hashtag mutualistic interactions (third row) for the 400 most active users and the hashtags they used. Interaction strength for each user/hashtag is calculated from the matrices $\beta^{HH}$, $\beta^{UU}$ and $\gamma^{UH}$ as the average value of each row. In the box-plots the gray line represents the median of the distribution and colored areas the 25th-75th percentiles. Shaded areas mark peak periods.
In figure 6 we present the distribution of similarity between all the possible pairs of the most active users (active at least 90% of the time and who posted at least ten tweets with hashtags, for a total of 400 users) for the Catalan referendum dataset. As expected, the results match what we found for the hashtags. During low activity intervals (i.e. periods 2 and 4) users' similarity is practically uniformly distributed between 0 and 1, meaning that separate discussions are taking place, each with almost equivalent activity. On the other hand, high activity periods are characterized by an almost perfect alignment between users, with the similarity distribution sharply peaked at one: everyone is talking about the same topic.
Figure 8. Re-scaled difference between competition and mutualism over the different periods. Each point represents the average difference between competitive and mutualistic strength for each user (hashtag), re-scaled to the value observed during the resting period (fifth interval for the Spanish elections, first for the Nepal earthquake and second period for the Catalan referendum dataset). Shaded areas mark peak periods.

Quantifying competition and mutualism during exceptional events
The results so far in figures 4-6 can be easily understood in terms of visibility maximization. In normal times, users tend to focus on their specific interests and aim for visibility inside their communities. Important events can captivate everyone's attention, leading users to join the conversation. This simple response, driven by individuals' striving for visibility, has some complex consequences, however. On the one hand, it increases competition for attention: with a huge amount of messages covering one single topic, the probability that each user's ideas are heard decreases. On the other hand, the use of hashtags related to relevant topics can lead to a mutualistic (positive) interaction, as they allow users to reach a larger audience. Our goal is to quantify the strength of both these interactions to understand which one dominates the dynamics in the different periods. To do so, as detailed in section 2.4, from users' and hashtags' similarities we build three interaction networks, two for user-user ($\beta^{UU}$) and hashtag-hashtag ($\beta^{HH}$) competition and one for user-hashtag mutualism ($\gamma^{UH}$), and analyze changes in node strength over time. Using niche overlap to estimate competitive and mutualistic strengths [26] has the advantage of re-scaling both terms, making them comparable. This way we can quantify how much each one weighs in the dynamics of the system.
Figure 9. Comparison between the effective mutualistic interaction strength $\gamma^{UH}_{ij} \theta^{UH}_{ij}$ extracted directly from the data and the one obtained from the visibility optimization model. (panels (a)-(c)) Calm-to-peak transition. In this case, the model is initialized as follows: to simulate the effect of an exceptional event, we consider the $\beta$ and $\gamma$ matrices for the 400 most active users and the hashtags they used from a high activity period in the dataset (period 2 for the Spanish elections and the Nepal earthquake datasets, period 1 for the Catalan referendum), while the user-hashtag interaction matrix $\theta$, which marks whether a user posted a certain hashtag, is taken from the previous/next calmer period (period 1 for the Spanish elections and the Nepal earthquake, and period 2 for the Catalan referendum). (panel (d)) Peak-to-calm transition. Matrices $\beta$ and $\gamma$ for the 400 most active users and the related hashtags are selected from a calm period following a period of high activity for the Spanish elections dataset (period 5), while matrix $\theta$ is extracted from the previous period (period 4). In all the cases we let the model evolve for $5 \times 10^4$ rewiring attempts. The remaining parameters are $\Omega_c = \Omega_m = 0.08$ for the Spanish elections and the Nepal earthquake, and $\Omega_c = \Omega_m = 0.1$ for the Catalan referendum. The box-plots represent the distribution of the values $\gamma^{UH}_{ij} \theta^{UH}_{ij}$. The orange lines represent the median of the distributions and shaded areas the 25th-75th percentiles.
The box-plots in figure 7 show how strength is distributed in the different periods for the three datasets, for the 400 most active users and the hashtags they used. In all cases we find an increase in both competition and mutualism during high activity events. However, there are differences between interaction types. Competition experienced by users is quite high even during low activity periods. The arrival of an external event, and the subsequent shift in attention, only slightly increases its strength. Hashtag competition is instead generally lower during resting periods, but shows a larger increase during events. Something similar happens with mutualistic interactions, which take moderate values (with a median of around 0.4) in normal periods and increase sharply, up to almost 1.0, in peak periods.
These differences, in both basal and aroused levels, lead to the question of which mechanism weights the most during the different periods. To answer it, in figure 8 we plot the average difference between competitive and mutualistic strength for each of the 400 users and associated hashtags, re-scaled to the value they experience during the resting periods. Interestingly, we find two opposite behaviors for users and hashtags. During the peak intervals, the effective competition perceived by the users decreases considerably as mutualistic strength increases more than competitive strength. This effect pushes users to actively participate in the discussion as they see that the ideas they post receive more attention. On the contrary, during events, hashtags on average experience an even stronger competition with respect to normal periods. This is probably due to the fact that hashtags not related to important topics will receive extremely low attention, while the ones affiliated with the major topics have to endure the increased pressure generated by users' interests.
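The effective-competition measure described above can be sketched as follows. This is only an illustration under two assumptions that the text does not spell out: that the "average difference" is the mean over nodes of competitive minus mutualistic strength, and that re-scaling means dividing by the resting-period value. Function names and numbers are hypothetical.

```python
from statistics import mean


def effective_competition(competition, mutualism):
    """Mean over nodes of (competitive strength - mutualistic strength)."""
    return mean(c - m for c, m in zip(competition, mutualism))


def rescale_to_rest(values_by_period, rest_period):
    """Divide every period's value by the resting-period value.

    Assumption: re-scaling is a ratio; a value below 1 then means lower
    effective competition than at rest.
    """
    rest = values_by_period[rest_period]
    return {period: value / rest for period, value in values_by_period.items()}


# Toy per-user strengths for a resting and a peak period (made-up values).
strengths = {
    "rest": {"competition": [2.0, 2.5, 3.0], "mutualism": [0.4, 0.5, 0.6]},
    "peak": {"competition": [2.2, 2.7, 3.1], "mutualism": [1.0, 1.2, 1.4]},
}

effective = {period: effective_competition(s["competition"], s["mutualism"])
             for period, s in strengths.items()}
relative = rescale_to_rest(effective, "rest")
```

In this toy case both strengths grow from rest to peak, but mutualism grows more, so the re-scaled effective competition at the peak falls below 1, which mirrors the qualitative behavior reported for users.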

Modeling attention switches during exceptional events
After quantifying the effective competition experienced by the agents, we conclude our analyses by asking what the major drivers are behind the re-organization of the information topics we see in the data. To do that, we employ the visibility optimization model introduced in section 2.3. In particular, while in [21] the authors run the same model on synthetic data to reproduce the structural reorganization of the user-hashtag interaction networks during exogenous events, here we go one step further and inform the model with the matrices obtained directly from the event datasets. This way, we can test the hypothesis that visibility optimization, and thus the reduced perceived competition, is the main driver behind the focusing of collective attention.
We do so by running the model for two consecutive periods of our data, a calmer one followed by a peak one (e.g., periods 1 and 2 in the Spanish elections dataset), and then comparing the user-hashtag network optimized by the model, $\gamma^{UH}_{ij}\theta^{UH}_{ij}$, with the one extracted from the data. Specifically, to simulate the arrival of an external event, we initialize our simulations with matrices $\beta^{UU}$, $\beta^{HH}$ and $\gamma^{UH}$ from the peak period, but with the user-hashtag adjacency matrix $\theta$ from the previous, calmer one. In this way, $\theta$ represents the interests users had before the arrival of the event, while $\beta$ and $\gamma$ denote the push that the new topic exerts on the agents. We then let the model evolve, rewiring the connections in $\theta$, until a steady state in species abundance has been reached (see the caption of figure 9 for the model parameters). At the end of the simulation, we compare the effective mutualistic interactions $\gamma^{UH}_{ij}\theta^{UH}_{ij}$ obtained from the data of the peak period with the ones optimized by the model. In addition, we check whether visibility optimization can explain not only the focusing of collective attention around a specific event, but also the opposite transition in which, at the end of a peak period, users return to their original topics of interest. To this end, we initialize the model with matrices $\beta^{UU}$, $\beta^{HH}$ and $\gamma^{UH}$ from a calm period (period 5 for the Spanish elections), and then optimize matrix $\theta$ starting from the previous peak period (period 4).
In figure 9, the box-plots show the comparison between the data and the model, for the calm-to-peak transitions in all three datasets (panels (a)-(c)) and for the peak-to-calm transition in the Spanish elections (panel (d)). In all cases, we find good agreement between the matrices $\gamma^{UH}_{ij}\theta^{UH}_{ij}$, showing that the optimization process alone reproduces the effective strength of the user-hashtag interaction networks extracted from the data. This further confirms our previous results, i.e., that users focusing on the trending topics during exogenous events experience a lower effective competition and that, once the event ends, visibility optimization drives them back to their original interests.
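The initialization used for the two transitions can be sketched as below. This is a hypothetical helper, not the model of section 2.3: it only shows how the interaction matrices of one period are combined with the adjacency matrix of another, and it assumes that the matrices of both periods are indexed by the same users and hashtags. The dictionary layout, key names and toy values are illustrative.

```python
import copy


def initial_state(periods, interactions_from, theta_from):
    """Build a starting state for a simulated transition.

    beta_uu, beta_hh and gamma_uh come from the period whose pressure is
    being simulated; theta_uh comes from the period that precedes it.
    Deep copies are returned so that later rewiring of theta_uh does not
    modify the stored data.
    """
    source = periods[interactions_from]
    return {
        "beta_uu": copy.deepcopy(source["beta_uu"]),
        "beta_hh": copy.deepcopy(source["beta_hh"]),
        "gamma_uh": copy.deepcopy(source["gamma_uh"]),
        "theta_uh": copy.deepcopy(periods[theta_from]["theta_uh"]),
    }


# Toy data: 2 users, 2 hashtags, one calm and one peak period (made-up values).
periods = {
    "calm": {
        "beta_uu": [[1.0, 0.2], [0.2, 1.0]],
        "beta_hh": [[1.0, 0.1], [0.1, 1.0]],
        "gamma_uh": [[0.4, 0.0], [0.0, 0.5]],
        "theta_uh": [[1, 0], [0, 1]],
    },
    "peak": {
        "beta_uu": [[1.0, 0.6], [0.6, 1.0]],
        "beta_hh": [[1.0, 0.5], [0.5, 1.0]],
        "gamma_uh": [[0.9, 0.3], [0.8, 0.4]],
        "theta_uh": [[1, 0], [1, 0]],
    },
}

# Calm-to-peak: peak-period interactions, calm-period links.
calm_to_peak = initial_state(periods, interactions_from="peak", theta_from="calm")
# Peak-to-calm: calm-period interactions, peak-period links.
peak_to_calm = initial_state(periods, interactions_from="calm", theta_from="peak")
```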
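The comparison summarized by the box-plots can be illustrated with a short sketch. It assumes that the distributions are taken over realized links only (entries where $\theta^{UH}_{ij} = 1$), which the caption does not state explicitly, and it uses toy matrices in place of the data and of the model output. Function names are hypothetical.

```python
from statistics import quantiles


def effective_mutualism(gamma, theta):
    """Flattened list of gamma[i][k] * theta[i][k] over realized links.

    Assumption: entries with theta[i][k] == 0 are left out of the
    distribution.
    """
    return [g * t
            for g_row, t_row in zip(gamma, theta)
            for g, t in zip(g_row, t_row) if t]


def box_summary(values):
    """Median and 25th/75th percentiles, as drawn in a box-plot."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return {"q1": q1, "median": q2, "q3": q3}


# Same gamma for both cases; theta differs between data and model (toy values).
gamma_uh = [[0.9, 0.1, 0.5],
            [0.6, 0.3, 0.7]]
theta_data = [[1, 0, 1],
              [1, 1, 0]]
theta_model = [[1, 0, 1],
               [1, 0, 1]]

summary_data = box_summary(effective_mutualism(gamma_uh, theta_data))
summary_model = box_summary(effective_mutualism(gamma_uh, theta_model))
```

Comparing `summary_data` with `summary_model` gives a compact numerical counterpart to a visual comparison of two boxes; a fuller analysis would compare the whole distributions.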

Discussion and conclusions
The advent of OSNs changed the way in which we, as a society, process information. We moved from the classical one emitter-multiple receivers scheme typical of the mass media era to an intricate network where everyone is a producer and consumer at the same time. Thus, understanding the drivers that shape our collective behavior in this new scenario is of paramount importance to avoid distortions of our information processing such as the spreading of fake news or the formation of echo-chambers that can alter the democratic debate.
In this work, we built on the analogy between natural and information ecosystems to quantify the competition for attention experienced by the agents of an OSN during exceptional events and how they respond to it. In particular, employing a modeling framework based on ecological niche theory, we were able to measure the intensity of competitive and mutualistic interactions between users and hashtags. Our results show that, although competition increases appreciably for both users and hashtags during the peak of the event, this increase is accompanied by a sharp rise in users' mutualism. Interestingly, for users, who on average already experience higher levels of competition, this translates into a positive gain in terms of interactions, effectively reducing the net competition. On the other side, this effect is limited for the hashtags, which end up experiencing stronger competition. Moreover, these patterns seem to be universal, as they depend neither on the nature of the event (political, social or natural) nor on the way tweets have been collected.
Our findings also have some profound implications for information processing and collective behavior, as they shed light on the mechanisms behind the emergence of collective attention in online social media. In [21], employing the same visibility optimization model, the authors showed that exogenous events cause a structural change in the user-hashtag interaction network. Here, informing the model with real data, we understand why this new configuration is more advantageous for the users. The agreement between the data and the numerical simulations suggests that the search for visibility drives users' behavior: individuals participating in online discussions tend to maximize their visibility inside their topics of interest, leading to high levels of competition inside each topic. At the arrival of an event, the same visibility maximization invites users to follow the dominating topic, increasing competition but also creating larger audiences. This reorganization is nevertheless stable, because the mutualistic advantages exceed the stronger competition.
Although the success of our modeling in shedding light on users' behavior is encouraging, it still represents an idealization of the complex dynamics taking place in online social systems. For example, we are implicitly assuming, as in the vast majority of ecological models, that mutualistic (competitive) interactions only take place between species of different (same) guilds. However, this assumption may overlook some second-order dynamics, such as the intra-guild mutualism that has recently been discussed in the ecological modeling literature [41,42]. On Twitter, these effects translate into positive interactions between users or between hashtags, for example the boost in visibility obtained by one user when their messages are retweeted by another, or the co-occurrence of hashtags that allows each of them to reach larger communities. The inclusion of such mechanisms could help to understand the emergence of influential users and the endogenous dynamics of Twitter events. Finally, another simplifying assumption in our framework is that the interaction strength $\Omega$ is set equal for competitive and mutualistic interactions ($\Omega_c = \Omega_m$). This is a common assumption in ecological modeling to reduce the parameter space and, in our framework, it is needed to correctly compare the interaction matrices $\beta$ and $\gamma$. Nevertheless, inferring $\Omega$ from the data would allow for a better characterization of competition dynamics in OSNs.
In conclusion, we believe that our work demonstrates, once more, how an ecological approach to the study of information ecosystems can provide useful tools for understanding collective and social dynamics.