Nearcasting forwarding behaviors and information propagation in Chinese Sina-Microblog

As the largest social media platform in China, the Sina-Microblog plays an important role in the dissemination of public opinion. Despite intensive efforts to understand information propagation dynamics, no attempt has been made to use a simple outbreak model to generate summative indices that characterize the time series of a single Weibo event. This work fills this gap, and illustrates the potential of using a simple outbreak model, in conjunction with historical data on the cumulative forwarding users, for nearcasting the propagation trend.


Introduction
The Chinese Sina-Microblog is reported to have had 446 million monthly active users (32% of the Chinese population) at the end of 2018 and is particularly popular among the younger generations. Taking the event (Weibo) "304 car theft case" as an example, around 185 thousand forwarding users were generated within 36 hours of its posting [1]. This important platform presents some unique characteristics for public opinion diffusion that differ from those of other microblogs [2,3], so a systematic approach towards understanding the diffusion patterns and nearcasting the information propagation is called for.
The information propagation of a microblog takes place on an interaction network created through the action of "follow", where a following user is called a "follower" and a followed user is called a "followee" [4,5]. There is a large body of literature, and there seem to be three typical and important classes of information diffusion models developed for microblogs [6], based respectively on the spreading process, influence ability [7,8] and forwarding factors [9,10]. The model we explore is based on the propagation process, as we focus on the dynamic changes in the status of users during the spreading process.
Our approach follows the classical rumor model that evolved from epidemic models, in which the population is stratified into three mutually exclusive and exhaustive classes: those who have not heard the rumor (ignorants), those actively spreading the rumor (spreaders) and those no longer spreading the rumor (stiflers). The propagation of information (rumor) occurs through transitions from ignorants to spreaders (infections) and from spreaders to stiflers (removals) [11]. This closely resembles epidemic models for the spread of infectious diseases in a population. Extensions of this classic rumor model to information spreading in microblogs include propagation transition models [12][13][14][15][16] and population classification models [17][18][19][20][21]. In particular, taking into account the different behaviors of different users on different platforms, Huang et al. [12] extended the rumor spreading model to characterize the browsing behaviors of users and examine differing rumor-refuting effects. This model involves a parameter similar to the birth rate in a standard epidemic model to describe the entry of new susceptibles. Liu et al. [13] proposed a modified rumor spreading model (SIRe), in which two contact parameters describe the stifler's broadcasting effect and the degree of social intimacy. Su et al. [14] developed a Microblog-Susceptible-Infected-Removed (Mb-SIR) model for information propagation that explicitly considers the incomplete reading behavior of users through the probability that a newly posted or retweeted message will be read by its followers. These models also used real data from the Chinese Sina-Microblog in their numerical simulations.
Our framework follows the work of Borge-Holthoefer et al. [15], which considered the situation where spreaders are not always active and an ignorant may not be interested in spreading the rumor. Their numerical simulations were based on data from Twitter. A similar model for an information dissemination network was proposed in [16], where different transition probabilities from the spreader stage to the stifler stage and from the ignorant stage to the stifler stage were used. This set a theoretical foundation for comparing microblog information dissemination with the spread of epidemic diseases. There have been other efforts to incorporate further states during the propagation process, in addition to the states of ignorant, spreader and stifler. In particular, reflecting some information features and diffusion characteristics of Weibos in the Chinese Sina-Microblog, Li et al. [17] incorporated the number of fans of infectious users and validated the modified SIR model using actual event data. Liu et al. [18] considered a dynamic model to characterize the super-spreading phenomenon in tweet information propagation. An ignorants, spreaders, super-spreaders and stiflers model (ISJR model) was proposed in [19] to show how super-spreaders can accelerate information dissemination and amplify information influence in microblogging networks. Other models have also been proposed to incorporate the recovery state of users on different platforms, including the Chinese Microblogs, Japanese Mixi, and Facebook [20,21].
Here we consider the capacity of a simple compartmental model of Microblog information propagation for nearcasting the trend of information propagation in the Chinese Microblogs. Nearcasting, an important issue in assessing public opinion, aims to project the forwarding trend at the earliest possible stage of a Weibo outbreak, so that interventions in the information propagation and/or rapid responses to public opinion can be designed and implemented effectively. Our ultimate goal is to develop nearcasting technologies for a group of Weibos (a Weibo being similar to a "Tweet" on Twitter) in the ecosystem of Chinese Microblogs. To achieve this goal, we need to develop computable summative indices that characterize each Weibo, and to see how these indices can be effectively calculated, or at least estimated, from the publicly available information at the early stage of the information propagation of the Weibo. Here, we formulate the basic compartmental model (SFI) and introduce the associated propagation indices for the Chinese Sina-Microblog (Section 2), and then contrast two methods of using this SFI-model for nearcasting.
2. The SFI model for forwarding behavior

Model description
We consider a population of Chinese Sina-Microblog users, stratified into three distinct states: the susceptible state (S), in which users are unaware of but susceptible to the Weibo; the forwarding state (F), in which users are actively forwarding the Weibo to influence other users; and the immune state (I), in which users have already forwarded the Weibo but are no longer forwarding it, even if they receive it again.
Assuming the population susceptible to the Weibo is sufficiently large, and denoting the total numbers of users in the susceptible, forwarding and immune states at time t ≥ 0 by S(t), F(t) and I(t), respectively, we obtain the following susceptible-forwarding-immune model:

$$ S'(t) = -\beta S(t)F(t), \tag{2.1} $$
$$ F'(t) = p\beta S(t)F(t) - \alpha F(t), \tag{2.2} $$
$$ I'(t) = (1-p)\beta S(t)F(t) + \alpha F(t). \tag{2.3} $$

In the model, β is the average exposure rate of a susceptible user to the Weibo, p is the probability that an exposed user will forward the Weibo, and α is the rate at which a user in the F state becomes inactive in forwarding. The total population N = S(t) + F(t) + I(t) is constant. The mass action term can also be interpreted as follows: an active forwarding user contacts an average of βN users per unit time, among which pβN will choose to forward the Weibo and (1 − p)βN will not. Since the probability that a contacted user is susceptible is S(t)/N, the number of exposed users who leave the state S per unit time is βS(t)F(t), among which the new forwarding and immune users are pβN(S(t)/N)F(t) = pβS(t)F(t) and (1 − p)βS(t)F(t), respectively. As usual, 1/α is the average duration an F-user remains active in forwarding.
An important distinction between a standard epidemic SIR model and the Weibo SFI model is the direct immunity a susceptible user gains through exposure to the Weibo. The parameter 1 − p reflects how poorly the Weibo is suited to trigger forwarding by a susceptible user. This SFI model was used in [15,16] as a basic building block of a more complicated framework. We adopt it as the Weibo information spread dynamics model in the hope that it can be further expanded to discuss the dynamics of an ecosystem consisting of multiple Weibos sharing the same set of key words. The novelty of our approach is to develop analytic indices, which will be used in our subsequent studies of Weibo spreading dynamics in a complex Weibo ecosystem, and to explore the feasibility of using openly available Weibo data from the Chinese Sina-Microblog to estimate these indices for the purpose of nearcasting the propagation trend.
The official Chinese Sina-Microblog provides only limited information about the propagation. An important piece of information we can obtain from it directly is the number of cumulative forwarding users, denoted by E(t). Observe that E(t) is not a new compartment of the coupled system (2.1)-(2.3). Instead, since new forwarding users are generated at the rate pβS(t)F(t), we obtain

$$ E'(t) = p\beta S(t)F(t). \tag{2.5} $$

It is easy to show that S(t), F(t), I(t), and hence E(t), all remain nonnegative. We consider the case when the Weibo is posted by a single user at the beginning, leading to the initial condition

$$ S(0) = S_0 > 0, \quad F(0) = F_0 = 1, \quad I(0) = 0, \quad E(0) = E_0 = 1. $$

From Eqs. (2.1) and (2.5) it follows that S(t) is decreasing, since S'(t) = −βS(t)F(t) < 0, and that E(t) is increasing, since E'(t) = pβS(t)F(t) > 0. Therefore, S(t) decreases to a limit $S_\infty := \lim_{t\to\infty} S(t) > 0$, E(t) increases to a limit $E_\infty := \lim_{t\to\infty} E(t) < N$, F(t) tends to 0 ($F_\infty = 0$), and $I_\infty = N - S_\infty$. These limits are related to the so-called final size of the Weibo spreading.
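As a minimal numerical sketch of the model described above, the following Python code integrates S, F, I and the cumulative count E with a hand-written classical Runge-Kutta (RK4) scheme. The function names, the step size `dt = 0.01` day and the 20-day horizon are illustrative assumptions; the parameter values are the rounded estimates reported later in the paper, and the initial state assumes a single posting user (F(0) = E(0) = 1, I(0) = 0).

```python
def sfi_rhs(state, beta, alpha, p):
    """Right-hand side of the SFI model, extended by the cumulative count E."""
    S, F, I, E = state
    exposed = beta * S * F                    # users leaving S per unit time
    dS = -exposed
    dF = p * exposed - alpha * F              # new forwarders minus those going inactive
    dI = (1.0 - p) * exposed + alpha * F      # directly immune plus inactive forwarders
    dE = p * exposed                          # every new forwarder is counted once
    return (dS, dF, dI, dE)


def rk4_step(state, dt, beta, alpha, p):
    """Advance the state by one classical fourth-order Runge-Kutta step."""
    def shifted(k, h):
        return tuple(s + h * ki for s, ki in zip(state, k))

    k1 = sfi_rhs(state, beta, alpha, p)
    k2 = sfi_rhs(shifted(k1, dt / 2), beta, alpha, p)
    k3 = sfi_rhs(shifted(k2, dt / 2), beta, alpha, p)
    k4 = sfi_rhs(shifted(k3, dt), beta, alpha, p)
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate(beta, alpha, p, S0, days=20.0, dt=0.01):
    """Return a list of (t, S, F, I, E) samples; dt and days are assumptions."""
    state = (S0, 1.0, 0.0, 1.0)               # single initial poster
    steps = int(round(days / dt))
    samples = [(0.0,) + state]
    for n in range(1, steps + 1):
        state = rk4_step(state, dt, beta, alpha, p)
        samples.append((n * dt,) + state)
    return samples


# Rounded parameter estimates reported in the paper for the example event.
trajectory = simulate(beta=2.5651e-6, alpha=1.0365, p=0.1006, S0=1.0221e7)
t_end, S_end, F_end, I_end, E_end = trajectory[-1]
print(f"day {t_end:.0f}: S={S_end:.4g}, F={F_end:.4g}, I={I_end:.4g}, E={E_end:.4g}")
```

The sketch keeps S + F + I constant up to rounding, and E(t) stays equal to 1 + p(S_0 − S(t)), which is a convenient sanity check on any alternative solver.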

The Weibo propagation indices
Weibo reproduction ratio $\mathcal{R}_0$: The initial growth rate of the F-population is given by $r = p\beta S_0 - \alpha$, and the propagation of the Weibo will never take off if r < 0, that is, if $p\beta S_0 < \alpha$. We therefore define

$$ \mathcal{R}_0 = \frac{p\beta S_0}{\alpha} $$

as the Weibo reproduction ratio. Then $\mathcal{R}_0 < 1$ implies a rapid decline of F-users, so that the information propagation never takes off. However, when $\mathcal{R}_0 > 1$, the F-population grows exponentially initially.
The index $\mathcal{R}_0$ denotes the number of F-users generated by one active forwarding user during an active period. A typical curve of the F-population for a Weibo with $\mathcal{R}_0 > 1$ has the bell shape shown in Figure 2.

Figure 2. The temporal variation of the actively forwarding and accumulated forwarding users, F(t) and E(t), expressed as functions of the time variable t (horizontal axis). Throughout the paper, the time unit is the day and the variables for the numbers of users in the different compartments are real nonnegative numbers.
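Maximal Weibo forwarding users $F_{\max}$: The maximum of the F-curve is achieved when F'(t) = 0. At this point, we have $S(t) = \alpha/(p\beta)$. Dividing Eq. (2.2) by Eq. (2.1), we get

$$ \frac{dF}{dS} = -p + \frac{\alpha}{\beta S}. $$

Integrating from $S_0$ to $\alpha/(p\beta)$ with $F_0 = 1$ then yields

$$ F_{\max} = 1 + pS_0 - \frac{\alpha}{\beta} - \frac{\alpha}{\beta}\ln \mathcal{R}_0 = 1 + pS_0\left(1 - \frac{1 + \ln \mathcal{R}_0}{\mathcal{R}_0}\right). \tag{2.9} $$

Maximum Weibo cumulative forwarding users $E_s$: The maximum number of cumulative forwarding users, $E_s = E_\infty$, which represents the total number of users who ever forwarded the Weibo, is also a quantity of interest. To establish an analytic formula, we divide Eq. (2.1) by S(t) and integrate from 0 to +∞ to get

$$ \ln\frac{S_0}{S_\infty} = \beta \int_0^{\infty} F(t)\,dt. \tag{2.11} $$

Multiplying Eq. (2.1) by p, adding Eq. (2.2) and integrating also yields, since $F_\infty = 0$,

$$ F_0 + p(S_0 - S_\infty) = \alpha \int_0^{\infty} F(t)\,dt. \tag{2.12} $$

From Eqs. (2.11) and (2.12), we get

$$ F_0 + p(S_0 - S_\infty) = \frac{\alpha}{\beta}\ln\frac{S_0}{S_\infty}. \tag{2.13} $$

In order to relate E and S, we note that Eqs. (2.1) and (2.5) give E'(t) = −pS'(t), and integrate to obtain

$$ E(t) - E_0 = p\,(S_0 - S(t)). $$

Since $E_0 = F_0 = 1$, $F_\infty = 0$ and $\mathcal{R}_0 = p\beta S_0/\alpha$, the last relation gives $S_\infty = S_0 - (E_s - 1)/p$, and Eq. (2.13) then becomes

$$ E_s = -\frac{pS_0}{\mathcal{R}_0}\ln\left(1 - \frac{E_s - 1}{pS_0}\right). \tag{2.15} $$

We observe from Eqs. (2.9) and (2.15) that, with a fixed $\mathcal{R}_0$, the final outcome of the Weibo spread is determined by $pS_0$ and hence, for a given p, by the parameter $S_0$. Note that, unlike in an epidemic study, the number of initially susceptible Weibo users is usually unknown.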
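The indices above can be evaluated without simulating the model. The following Python sketch is one possible way to do so, under stated assumptions: it uses the first integral F + pS − (α/β) ln S = constant (obtained by dividing the F-equation by the S-equation and integrating), the relation E(t) − 1 = p(S_0 − S(t)), and F_0 = E_0 = 1. The function names and the bisection solver are illustrative choices, not the authors' procedure.

```python
import math


def reproduction_ratio(beta, alpha, p, S0):
    """Weibo reproduction ratio p*beta*S0/alpha."""
    return p * beta * S0 / alpha


def max_forwarding_users(beta, alpha, p, S0, F0=1.0):
    """Peak of F, reached where S = alpha/(p*beta), from the first integral."""
    if reproduction_ratio(beta, alpha, p, S0) <= 1.0:
        return F0                             # F never grows beyond its initial value
    S_peak = alpha / (p * beta)
    return F0 + p * (S0 - S_peak) + (alpha / beta) * math.log(S_peak / S0)


def final_susceptibles(beta, alpha, p, S0, F0=1.0, iterations=200):
    """Solve F0 + p*(S0 - x) = (alpha/beta)*ln(S0/x) for x = S_inf by bisection."""
    def g(x):
        return F0 + p * (S0 - x) - (alpha / beta) * math.log(S0 / x)

    hi = min(S0, alpha / (p * beta))          # g(hi) > 0; the root lies below hi
    lo = hi
    while g(lo) > 0.0:                        # g tends to -infinity as x -> 0+
        lo *= 0.5
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def max_cumulative_forwarders(beta, alpha, p, S0, E0=1.0):
    """E_s = E0 + p*(S0 - S_inf), assuming E0 = F0."""
    return E0 + p * (S0 - final_susceptibles(beta, alpha, p, S0, F0=E0))


# Rounded parameter estimates reported in the paper for the example event.
params = dict(beta=2.5651e-6, alpha=1.0365, p=0.1006, S0=1.0221e7)
R0 = reproduction_ratio(**params)
F_max = max_forwarding_users(**params)
E_s = max_cumulative_forwarders(**params)
print(f"R0 = {R0:.4f}, F_max = {F_max:.4e}, E_s = {E_s:.4e}")
```

With the rounded parameters, the results should be close to, but not necessarily identical with, the values reported in the paper (2.5460, 2.4724 × 10^5 and 9.2430 × 10^5), since the published parameters are given to only a few significant digits.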
Weibo public opinion times and velocities: To understand the global performance of the Weibo ecosystem, we need to know the take-off and extinction times of a particular Weibo, denoted by $t_b$ and $t_e$, respectively. The definitions depend on a threshold $F^*$, set in advance, such that $F(t_b) = F(t_e) = F^*$ with $t_b < t_e$. In the experiment we design, we let $F^* = 0.1 \times F_{\max}$. The difference $t_i := t_e - t_b$ is the duration during which the Weibo remains active in the ecosystem. In relation to these timings and the spread duration, we define the (initial) outbreak velocity $V_o$, the propagation decline velocity $V_d$, and the average spreading velocity $V_a$, with $t_{\max}$ denoting the time at which $F(t) = F_{\max}$. Note that $t_b$, $t_e$, $t_{\max}$, $V_o$, $V_d$ and $V_a$ can be calculated once the model parameters are estimated, although we do not have closed forms for these quantities. A schematic picture is given in Figure 2.
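Because the key times have no closed form, they have to be read off a numerical solution. The following Python sketch illustrates one way to do this, assuming a hand-written RK4 integrator, a step of 0.01 day, a 20-day horizon and the threshold F* = 0.1 × F_max; the function names are hypothetical. The velocities are not computed here because their defining formulas are not reproduced in the text.

```python
def simulate_F(beta, alpha, p, S0, days=20.0, dt=0.01):
    """Integrate (S, F) with classical RK4; return lists of times and F values."""
    def rhs(S, F):
        return -beta * S * F, p * beta * S * F - alpha * F

    S, F = S0, 1.0                            # single initial forwarding user
    times, Fs = [0.0], [F]
    for n in range(1, int(round(days / dt)) + 1):
        a1, b1 = rhs(S, F)
        a2, b2 = rhs(S + dt / 2 * a1, F + dt / 2 * b1)
        a3, b3 = rhs(S + dt / 2 * a2, F + dt / 2 * b2)
        a4, b4 = rhs(S + dt * a3, F + dt * b3)
        S += dt / 6 * (a1 + 2 * a2 + 2 * a3 + a4)
        F += dt / 6 * (b1 + 2 * b2 + 2 * b3 + b4)
        times.append(n * dt)
        Fs.append(F)
    return times, Fs


def key_times(times, Fs, fraction=0.1):
    """Return (t_b, t_max, t_e, F_max) on the simulation grid.

    Raises StopIteration if F does not fall back to the threshold
    within the simulated horizon.
    """
    i_max = max(range(len(Fs)), key=Fs.__getitem__)
    threshold = fraction * Fs[i_max]
    # First grid time, up to the peak, at which F has reached the threshold.
    i_b = next(i for i in range(i_max + 1) if Fs[i] >= threshold)
    # First grid time, from the peak on, at which F is back at or below it.
    i_e = next(i for i in range(i_max, len(Fs)) if Fs[i] <= threshold)
    return times[i_b], times[i_max], times[i_e], Fs[i_max]


# Rounded parameter estimates reported in the paper for the example event.
times, Fs = simulate_F(beta=2.5651e-6, alpha=1.0365, p=0.1006, S0=1.0221e7)
t_b, t_max, t_e, F_max = key_times(times, Fs)
t_i = t_e - t_b                               # active duration of the Weibo
print(f"t_b={t_b:.2f}, t_max={t_max:.2f}, t_e={t_e:.2f}, t_i={t_i:.2f} days")
```

The times are resolved only to the grid spacing `dt`, and they need not coincide with the paper's reported figures, since the parameters are rounded and the authors' solver settings and time origin are not given.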

An example:
The number of cumulative forwarding users can be collected through the Weibo Application Programming Interface (API). Table 1 lists such a data set, studied in [22], for an actual event that lasted 16 days. The least-squares (LS) method [23] can be used to estimate the parameters of the SFI model (2.1), (2.2), (2.3) and (2.5). The parameter vector is set as Θ = (β, α, p, $S_0$), and the corresponding numerical result for E(t) is denoted by $f_E(t, \Theta)$. The LS error function

$$ LS(\Theta) = \sum_{k=1}^{16} \left| f_E(t_k, \Theta) - ID_k \right|^2 $$

can be used, where $ID_k$ denotes the real number of cumulative forwarding users given in Table 1 and $t_k = k$ is the sampling time, k = 1, 2, ..., 16.
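The least-squares error can be sketched in Python as follows. This is an illustrative stand-in, not the authors' MATLAB code: the function names `model_E` and `ls_error` are hypothetical, the integrator is a hand-written RK4 with 100 steps per day, and, because Table 1 is not reproduced here, the "observations" are synthetic values generated from the model itself with the rounded parameter estimates reported in the paper.

```python
def model_E(theta, days, steps_per_day=100):
    """Model values E(t_k) at t_k = 1, ..., days for theta = (beta, alpha, p, S0)."""
    beta, alpha, p, S0 = theta
    dt = 1.0 / steps_per_day

    def rhs(S, F):
        return -beta * S * F, p * beta * S * F - alpha * F

    S, F = S0, 1.0                            # single initial forwarding user
    values = []
    for _day in range(days):
        for _step in range(steps_per_day):    # classical RK4 steps within one day
            a1, b1 = rhs(S, F)
            a2, b2 = rhs(S + dt / 2 * a1, F + dt / 2 * b1)
            a3, b3 = rhs(S + dt / 2 * a2, F + dt / 2 * b2)
            a4, b4 = rhs(S + dt * a3, F + dt * b3)
            S += dt / 6 * (a1 + 2 * a2 + 2 * a3 + a4)
            F += dt / 6 * (b1 + 2 * b2 + 2 * b3 + b4)
        values.append(1.0 + p * (S0 - S))     # E(t) = E0 + p*(S0 - S(t)), E0 = 1
    return values


def ls_error(theta, observed):
    """Sum of squared differences between model and observed cumulative counts."""
    predicted = model_E(theta, len(observed))
    return sum((f - d) ** 2 for f, d in zip(predicted, observed))


# Synthetic stand-in for Table 1: 16 daily values generated from the model.
theta_true = (2.5651e-6, 1.0365, 0.1006, 1.0221e7)
synthetic = model_E(theta_true, 16)

# A perturbed parameter vector gives a strictly larger error.
theta_other = (2.5651e-6, 1.0365, 0.12, 1.0221e7)
print(ls_error(theta_true, synthetic), ls_error(theta_other, synthetic))
```

In practice `ls_error` would be passed to a bounded nonlinear optimizer, with the real API counts in place of `synthetic`.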
We use the standard MATLAB package (fmincon) to solve the nonlinear LS problem, with parameter initial values and ranges given as follows: β ∈ [2 × 10^− …

Figure 3a shows the fit between the numerically calculated and the actual numbers of cumulative forwarding users. From the data fitting, we can estimate the model parameters and then infer other information not available from the API. Namely, after obtaining the model parameters we can obtain the temporal variations of the numbers of susceptible, forwarding, immune and cumulative forwarding users, S(t), F(t), I(t) and E(t).
Note that for the Weibo data listed in Table 1, the estimated basic reproduction ratio is $\mathcal{R}_0 = 2.5460$, and since it is much greater than one, we should expect a rapid information spread at the beginning of the propagation. Figure 3a confirms this expectation. By analogy with surveillance data for infectious disease outbreaks such as influenza, we should also expect a large number of F-users and a large number of E-users for the Weibo. The numerical simulations reported in Figure 3b, however, show otherwise: despite a large number of susceptibles ($S_0 = 1.0221 \times 10^7$), the cumulative number of E-users during the entire propagation is only $E_s = 9.2430 \times 10^5$, and the maximum number of F-users at the propagation peak is only $F_{\max} = 2.4724 \times 10^5$. One reason is that p is only 0.1006, and hence a large number of individuals exposed to the Weibo became immune immediately upon exposure.

Method 1 (all parameters estimated):
The numerical illustration in the last section indicates a good fit to the numbers of cumulative forwarding users over the entire outbreak period. These numbers can be collected from the API, and therefore a Weibo can be characterized retroactively by the SFI-model parameters (β, α, p, $S_0$) using the information from the API.
An important question in Weibo information management is how many units (days) of data on the cumulative forwarding users we need in order to estimate the model parameters, and thus to predict the propagation trend for the near future and calculate the key Weibo indices. Figure 4 reports our numerical experiments, in which the parameters (β, α, p, $S_0$) are estimated from the data of the past days, and the cumulative forwarding users are then predicted and compared with the actual data from the API.

Figure 4. Numerical experiments with all parameters estimated based on days since the outbreak started: the numbers of E-users used for prediction are marked with red asterisks; the numbers of E-users to be predicted are marked with pink circles; the predicted E-population and F-population are shown as a blue line and a black line, respectively; and the maximal F-population is marked with a black asterisk. In the figure, the horizontal axis is the time (in days) and the vertical axis is the number of users.
Comparing with the parameters β = 2.5651 × 10^−6, α = 1.0365, p = 0.1006 and $S_0 = 1.0221 \times 10^7$ estimated from the entire outbreak duration, we observe that the estimated parameters and the resulting SFI-model prediction do not fit the actual data until 7 days have passed, while the peak time is 8.49 days.

Method 2 (a-priori estimation):
Therefore, the use of an SFI-model for nearcasting is not promising if we need to use the historical data to estimate all the model parameters together. On the other hand, in the Chinese Sina-Microblog, the parameter β is determined by the compactness of the network structure, and the parameter α is usually user-specific rather than Weibo-specific. Thus, it is feasible to estimate the parameters β and α before the outbreak of an event (for example, from other events). Figure 5 gives the numerical results with fixed β and α given a-priori.
This experiment indicates that the remaining parameters (p, $S_0$) can be estimated from the data of the first 3 days, well ahead of the outbreak peak time. Therefore, the nearcasting capacity of an SFI-model can be significantly enhanced if the network-specific parameter β and the user-specific parameter α are estimated in advance.
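A hedged sketch of the a-priori approach in Python is given below. It is not the authors' fmincon procedure. With F_0 = E_0 = 1, the variable u = pS satisfies u' = −βuF and F' = βuF − αF, and E' = βuF, so for fixed β and α the model output E(t) depends on (p, S_0) only through the product q = pS_0; the sketch therefore fits q alone, by a golden-section search over log10(q). The function names, the search bracket [10^5, 10^7], the RK4 step and the synthetic three-day "observations" (generated from the model, since Table 1 is not reproduced here) are all illustrative assumptions.

```python
import math

BETA, ALPHA = 2.5651e-6, 1.0365               # fixed a-priori (values reported in the paper)


def model_E(q, days, steps_per_day=100):
    """Daily values E(1), ..., E(days) when u = p*S starts from q = p*S0."""
    dt = 1.0 / steps_per_day

    def rhs(u, F):
        flow = BETA * u * F                   # new forwarding users per day
        return -flow, flow - ALPHA * F, flow

    u, F, E = q, 1.0, 1.0
    values = []
    for _day in range(days):
        for _step in range(steps_per_day):    # classical RK4 steps within one day
            k1 = rhs(u, F)
            k2 = rhs(u + dt / 2 * k1[0], F + dt / 2 * k1[1])
            k3 = rhs(u + dt / 2 * k2[0], F + dt / 2 * k2[1])
            k4 = rhs(u + dt * k3[0], F + dt * k3[1])
            u += dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
            F += dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
            E += dt / 6 * (k1[2] + 2 * k2[2] + 2 * k3[2] + k4[2])
        values.append(E)
    return values


def fit_q(observed, lo=1e5, hi=1e7, iterations=60):
    """Golden-section search over log10(q) minimizing the least-squares error."""
    def loss(x):
        predicted = model_E(10.0 ** x, len(observed))
        return sum((f - d) ** 2 for f, d in zip(predicted, observed))

    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = math.log10(lo), math.log10(hi)
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = loss(c), loss(d)
    for _ in range(iterations):
        if fc < fd:                           # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = loss(c)
        else:                                 # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = loss(d)
    return 10.0 ** ((a + b) / 2.0)


q_true = 0.1006 * 1.0221e7                    # p*S0 from the paper's rounded estimates
first_three_days = model_E(q_true, 3)         # synthetic stand-in for early API data
q_hat = fit_q(first_three_days)
forecast = model_E(q_hat, 16)                 # nearcast of E over the full 16 days
print(f"fitted p*S0 = {q_hat:.4e}, implied R0 = {BETA * q_hat / ALPHA:.4f}")
```

On noise-free synthetic data the search recovers q closely; with real, noisy counts the quality of the fit from three days of data would have to be checked against later observations, as the paper does in its tables.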

Comparison:
The numerical experiments with all parameters estimated (Method 1) and with some parameters estimated a-priori (Method 2) clearly show that Method 2 performs much better in nearcasting with limited historical data well before the peak time of the Weibo information outbreak. Tables 2, 3 and 4 compare the indices $\mathcal{R}_0$, $F_{\max}$, $E_s$, the indices $t_b$, $t_e$, $t_i$, $t_{\max}$, and the indices $V_o$, $V_d$, $V_a$, respectively. For Method 1, Tables 2, 3 and 4 show that at the beginning of the Weibo propagation, our predictions have low accuracy when using the historical data up to Day 7. Using the historical data until just after the propagation peak (Day 8), the predicted maximum F-users ($F_{\max}$), cumulative forwarding users ($E_s$), reproduction ratio ($\mathcal{R}_0$) and outbreak velocity ($V_o$) are larger than the actual values, indicating over-estimation of the propagation potential. However, as one gains more and more historical data, the prediction converges to the actual values. For the key times of the Weibo information outbreak, Method 1 can, after Day 7, predict the take-off time, extinction time, duration and propagation peak time to within 1 day. The predicted maximal forwarding users ($F_{\max}$), the outbreak velocity ($V_o$), the propagation decline velocity ($V_d$) and the average spreading velocity ($V_a$) are then all consistent with the corresponding actual values. So, for Method 1, the time after which a good nearcasting prediction is possible is Day 7, and the prediction quality is very high if historical data up to Day 9 is used. Method 2 is more effective in nearcasting, both in estimating the peak time as early as Day 3 and in estimating all relevant indices. Therefore, we advise performing the nearcasting by estimating the parameters β and α before the outbreak of the event.
The proposed a-priori estimation method for nearcasting emphasizes the importance of understanding, in advance, the propagation of earlier Weibo events, in order to better monitor and respond to public opinion dynamics in real time.

Figure 5. Numerical experiments with a-priori estimated (and thus fixed) β (2.5651 × 10^−6) and α (1.0365) based on days since the outbreak started: the numbers of E-users used for prediction are marked with red asterisks; the numbers of E-users to be predicted are marked with pink circles; the predicted E-population and F-population are shown as a blue line and a black line, respectively; and the maximal F-population is marked with a black asterisk. In the figure, the horizontal axis is the time (in days) and the vertical axis is the number of users.

Discussions
This paper is concerned with nearcasting forwarding behaviors in the Chinese Sina-Microblog, based on a simple compartmental SFI model for a Weibo being forwarded by users exposed to it. A significant difference between the propagation of Weibo information and that of an infectious disease pathogen is that an exposed user may gain immunity to the Weibo, so that the user becomes completely uninterested in forwarding it.
The relative simplicity of the model permitted the construction of various indices and their calculation or estimation: the Weibo reproduction ratio, the maximal forwarding users, the maximum cumulative forwarding users, the propagation peak time, the take-off and extinction times, and the propagation velocities during different phases of the information outbreak. An important issue, namely how to use this simple model to predict these critical indices and thus contribute to nearcasting the propagation trend from historical Application Programming Interface (API) data, was addressed through numerical experiments based on data from a real Weibo event. We considered two cases: one where all model parameters can be accurately estimated only with historical data extending past the peak time, and one where two critical parameters relevant to the Weibo can be estimated rapidly and accurately because the two other model parameters, relevant to the network characteristics and the rate at which forwarding users' interest wanes, are estimated a-priori. This second approach clearly shows the nearcasting capacity of a simple compartmental SFI-model, as long as we have some information about the network characteristics and the anticipated public interest in a class of Weibo events.
The simple model provides two important pieces of information about the propagation characteristics of a particular Weibo: the intrinsic growth rate (the basic Weibo reproduction ratio), and the maximal capacity of the network for the particular Weibo event (the maximal number of cumulative forwarding users). As anticipated from long-standing and intensive modeling studies of biological ecosystems, the intrinsic growth rate and the maximal capacity give the logistic growth dynamics [24] of the propagation of a single Weibo. A full understanding of the propagation dynamics of a single Weibo event is the first critical step towards examining the propagation dynamics of a group of interacting Weibo events in a complex network.
There are many different model frameworks, as briefly reviewed in the introduction. We have listed the existing studies that are directly relevant to our mechanistic approach towards developing a simple compartmental model for characterizing the spread of Weibo information in the Sina-Microblog. We refer to [25][26][27] and the references therein for further studies at the intersection of mathematics, epidemics, information diffusion and control. Interestingly, the work [25], which introduces a class of infants with maternal antibodies giving passive temporary immunity to the infectious disease considered, may find an application in our modeling of Weibo spread within the Sina-Microblog, as some immediate followers of opinion leaders may inherit a certain immunity to a Weibo. The Sina-Microblog evidently possesses all kinds of features of complex networks [26]; the sub-network structure and the network topology require much further investigation, for which our simple mechanistic model can serve as a building block towards a more complicated network framework. A primary goal of nearcasting the propagation trend is to inform the optimal design of interventions, so the methodologies designed for the control of infection dynamics [27] may also be adopted in our simple model and its variations.
We also mention that a variant of the epidemiological SIR model was used in [28] to accurately describe the diffusion of online content over the online social network Digg.com. That work also examined qualitative properties of the viral information propagation model and demonstrated the model's applications to nearcasting the spread of social media content in online social networks. Alternatively, a linear diffusive model was proposed and studied by Feng et al. [29,30], using a temporal-spatial partial differential equation (PDE) model to explain the rates of spread in the DOSN. PDE and network analogues of the SFI model should be developed to reflect spatial spread and network heterogeneity.