Dynamic Influence Prediction of Social Network Based on Partial Autoregression Single Index Model

Everything in the world is connected. From small groups to global societies, the interactions among people, technology, and policies require sophisticated techniques to be perceived and forecasted. In social networks, it has been shown that a microblog user's influence and microblog grade are nonlinearly dependent. However, to the best of our knowledge, nonlinear influence prediction for social networks has not been explored in the existing literature. This article proposes a partial autoregression single index model that flexibly combines network structure (linear) and static covariates (nonparametric). Compared with previous work, our model has fewer restrictions and wider applicability. Profile least squares estimation is employed to fit this semiparametric model, and variable selection is performed via the smoothly clipped absolute deviation (SCAD) penalty. Simulations are conducted to demonstrate its finite-sample behavior.


Introduction
The 21st century has seen an explosion of data collection in this era of Big Data. Nowadays the focus has shifted from time series and longitudinal data to network data. Specifically, network data can reveal the details of complex relationships rather than isolated individuals. The rise of such data has been fundamental in many areas, including the biomedical sciences [1], transportation [2], socialization [3], and physics [4].
For traditional dynamic data, time series analysis [5] has played an important role. Common univariate models include ARMA, ARIMA, ARCH, and GARCH. Multivariate time series analysis also pays attention to the relationships between variables; classical models such as vector autoregression (VAR) and the state space model are well studied, with illustrations in Fan (2003) [6] and Box (2010) [7]. The longitudinal data sets arising frequently in the applied sciences have drawn scientific interest to either the pattern of change over time or simply the dependence of the outcome on the covariates. Statistical methods for longitudinal data are well developed. The earliest research used parametric regression methods, including the linear model, the generalized linear model [8], and the mixed effects model [9]. When the actual model is unclear, Yang et al. (2009) [10] applied a single index model to longitudinal data and established large-sample properties for the estimated index coefficients. Semiparametric regression models combine the advantages of parametric and nonparametric regression; Li (2015) [11] analyzed longitudinal data with a partially linear single index model. In addition, partial differential equations (PDEs) containing unknown multivariable functions and their partial derivatives can also describe multidimensional dynamic systems; see Wu (2014) [12] and Chen (2017) [13].
Turning to another type of data, network analysis offers a wide variety of tools for observing complex connected systems, and it has changed the way we perceive and analyze networks in the world. The ability to store and operate on network data in a digital environment has enabled a multiplicity of new analytic methods. Beyond well-developed static network analysis, researchers have made a substantial effort on dynamic networks to explore their inherent evolution.
Recently, anomaly detection, and specifically incremental community detection in dynamic networks, has attracted much attention. In [14], Bansal et al. developed an iterative, vertex-level procedure based on the CNM algorithm, and Nguyen et al. [15] proposed the rule-based QCA algorithm.
As regards complex dynamic networks, Lu (2016) [16] extended the H-index to quantify nodal influence, with emphasis on the spreading influence of a node. Here, we focus on nodal influence prediction in social networks. The current frontier is the network vector autoregression (NAR) model (Zhu, 2017) [17], a linear regression approach synthesizing time series dynamics and network structure. However, it is hard for such a linear model to fit complex and volatile configurations at all times. Motivated by the NAR model, this paper introduces a competitive semiparametric approach that applies partially linear single index models to dynamic autoregressive network analysis, inheriting the advantages of both models.
When conducting quantitative analysis, the real model and its variables are usually unknown and possibly misspecified. A crucial problem in building a multiple regression model is the selection of predictors to include. When the number of predictors is large compared with the sample size, it is desirable to produce sparse models that involve only a small subset of them. To improve prediction accuracy, variable selection should be done in advance. Many selection criteria and penalties are in common use, such as AIC, BIC, Cp [18], LASSO [19], group LASSO [20], SCAD [21], and the elastic net [22]. It is worth noting that SCAD, which is employed in this paper, enjoys the oracle property.
The rest of the paper is organized as follows. Section 2 constructs the partial autoregression single index model and introduces its relevant properties. Section 3 adopts profile least squares for parameter estimation of our model. Abundant simulations are carried out in Section 4 to demonstrate the finite-sample performance of this method; we also confirm that our model outperforms the previous model. Our proposed model can be applied to the dynamic influence (response variable) prediction of users. Specifically, nonlinear static covariates, such as a user's age, gender, and registration time, are modeled in the single index part; the current average effect of other users is quantified linearly, while the previous nodal influences enter autoregressively. However, since no appropriate dataset is available to our knowledge, we have not performed this empirical analysis. The relevant technical proofs are deferred to the appendix.

Model
Define the number of nodes in the social network as $N$, which is usually large. To describe the relationships among nodes, we denote the adjacency matrix $A = (a_{ij}) \in \mathbb{R}^{N \times N}$, where $a_{ij} \in \{0, 1\}$. If $a_{ij} = 1$, node $i$ can be affected by node $j$; otherwise $a_{ij} = 0$. Let $Y_{it}$ represent the response of the $i$-th node at time $t$. For each node $i$, assume there exists a $p$-dimensional time-invariant covariate vector $Z_i = (z_{i1}, \cdots, z_{ip})^\top$. Considering the flexible correlation between these explanatory variables and the response, we present the following partial autoregression single index (PASI) model (written here in its first-order form):
$$Y_{it} = g\left(Z_i^\top \gamma\right) + \lambda_1 \frac{1}{n_i} \sum_{j=1}^{N} a_{ij} Y_{j(t-1)} + \lambda_2 Y_{i(t-1)} + \varepsilon_{it},$$
where $n_i = \sum_{j=1}^{N} a_{ij}$, $g(\cdot)$ is an unknown link function, and the errors $\varepsilon_{it}$ are i.i.d. $N(0, \sigma^2)$.
We merely discuss the first-order ($P = 1$) case of this model for simplicity, since higher orders share similar inference and properties.
This PASI model is divided into linear and nonlinear parts. The nonlinear term $g(Z_i^\top \gamma)$ characterizes the influence of $Z_i$ on the dependent variable, which extends the range of nodal effects the model can capture. Here $\gamma = (\gamma_1, \cdots, \gamma_p)^\top \in \mathbb{R}^p$ denotes the covariate (i.e., nodal effect) coefficient. In the linear part, $n_i^{-1} \sum_{j} a_{ij} Y_{j(t-1)}$ indicates the average effect of the other nodes on node $i$ at time $t-1$, and $\lambda_1$ is the network effect coefficient; $Y_{i(t-1)}$ represents the standard autoregression effect, meaning the observation at time $t-1$ is correlated with $Y_{it}$, and $\lambda_2$ is the autoregression coefficient.
It is worth noting that the nonparametric/nonlinear effect of the static covariates is indeed nontrivial. Ma et al. (2013) [24] concluded that a microblog user's influence (i.e., $Y_{it}$) and microblog grade (i.e., a certain component of $Z_i$) are nonlinearly dependent. Thus, the purely linear NAR model is not sufficient, and the PASI model is established, which has fewer restrictions and can be applied in more settings. In analogy with the NAR model, we address the identification of the PASI model (3). The proof of the following theorem is deferred to the appendix.
Stacking over nodes, write $Y_t = (Y_{1t}, \cdots, Y_{Nt})^\top$, $B_0 = \left(g(Z_1^\top \gamma), \cdots, g(Z_N^\top \gamma)\right)^\top$, $\mathcal{E}_t = (\varepsilon_{1t}, \cdots, \varepsilon_{Nt})^\top$, the row-normalized adjacency matrix $W = \operatorname{diag}(n_1, \cdots, n_N)^{-1} A$, and $G = \lambda_1 W + \lambda_2 I$, so that the model takes the vector form
$$Y_t = B_0 + G Y_{t-1} + \mathcal{E}_t. \qquad (3)$$

Theorem 1. Suppose that $\|B_0\| < \infty$ and the number of nodes $N$ is fixed. If $|\lambda_1| + |\lambda_2| < 1$, there exists a unique strictly stationary solution of model (3), and this solution can be expressed as
$$Y_t = \sum_{j=0}^{\infty} G^j \left(B_0 + \mathcal{E}_{t-j}\right).$$

Based on this form of the solution, it is convenient to deduce the conditional distribution of $Y_t$ given the nodal information $Z$. Denote $E^*(\cdot) = E(\cdot \mid Z)$ and $\operatorname{cov}^*(\cdot) = \operatorname{cov}(\cdot \mid Z)$ for simplicity. For any integer $h$, the conditional autocovariance is $\Gamma(h) = \operatorname{cov}^*(Y_t, Y_{t-h})$. It is easy to prove that $\Gamma(h) = G^h \Gamma(0)$ for $h > 0$ and $\Gamma(h) = \Gamma(0) (G^\top)^{-h}$ for $h < 0$. The conditional mean and variance of $Y_t$ are given in the following proposition.

Proposition 2. Assume the conditions of Theorem 1 hold. Conditioned on a given $Z$, the strictly stationary solution follows a normal distribution with mean and covariance
$$E^*(Y_t) = (I - G)^{-1} B_0, \qquad \operatorname{vec}\left(\Gamma(0)\right) = \sigma^2 \left(I - G \otimes G\right)^{-1} \operatorname{vec}(I),$$
where $\operatorname{vec}(\cdot)$ is the vectorization of a matrix and $\otimes$ denotes the Kronecker product.

According to this formula, the conditional mean of $Y_t$ depends on four factors: the nodal effect $B_0$, the network effect coefficient $\lambda_1$, the autoregression coefficient $\lambda_2$, and the structure of the network $W$. To explain this proposition, we discuss several special cases below. The proof of the proposition, including the following three cases, is given in the appendix.
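The moments in Proposition 2 can be checked numerically. The following sketch uses illustrative values of our own choosing (the random network, $\lambda_1$, $\lambda_2$, and $\sigma^2$ are assumptions, not settings from the paper): it builds $G = \lambda_1 W + \lambda_2 I$, computes $E^*(Y_t) = (I-G)^{-1}B_0$ and $\operatorname{vec}(\Gamma(0)) = \sigma^2 (I - G \otimes G)^{-1}\operatorname{vec}(I)$, and compares the mean against a long simulated path of the recursion $Y_t = B_0 + G Y_{t-1} + \mathcal{E}_t$.

```python
import numpy as np

rng = np.random.default_rng(0)
N, sigma2 = 30, 1.0
lam1, lam2 = 0.3, 0.4                      # |lam1| + |lam2| < 1

# Random directed adjacency, row-normalised: W = diag(n_i)^{-1} A
A = (rng.random((N, N)) < 0.2).astype(float)
np.fill_diagonal(A, 0.0)
W = A / np.maximum(A.sum(axis=1), 1.0)[:, None]

B0 = rng.normal(size=N)                    # nodal effects g(Z_i^T gamma)
G = lam1 * W + lam2 * np.eye(N)

# Stationarity: the spectral radius of G is below |lam1| + |lam2| < 1
assert np.max(np.abs(np.linalg.eigvals(G))) < 1.0

# Conditional mean mu = (I - G)^{-1} B0 and Gamma(0) solving
# Gamma(0) = G Gamma(0) G^T + sigma^2 I, i.e.
# vec(Gamma(0)) = sigma^2 (I - G (x) G)^{-1} vec(I)
I = np.eye(N)
mu = np.linalg.solve(I - G, B0)
vecG0 = sigma2 * np.linalg.solve(np.eye(N * N) - np.kron(G, G), I.ravel())
Gamma0 = vecG0.reshape(N, N)

# A long simulated path of Y_t = B0 + G Y_{t-1} + eps_t should reproduce mu
Y, avg, T = mu.copy(), np.zeros(N), 20000
for _ in range(T):
    Y = B0 + G @ Y + rng.normal(scale=np.sqrt(sigma2), size=N)
    avg += Y / T
print(np.max(np.abs(avg - mu)))            # small Monte-Carlo error
```

Because the stationarity condition bounds the spectral radius of $G$, both linear systems are well posed and the time average converges to the stationary conditional mean.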
Case 1. Suppose $B_0 = \beta_0 \mathbf{1}$; that is to say, each node has the same nodal effect. Without loss of generality, set $\beta_0 > 0$. Since each row of $W$ sums to one, it can be proved that $\mathbf{1}$ is an eigenvector of $(I - \lambda_1 W - \lambda_2 I)^{-1}$, and the corresponding eigenvalue is $(1 - \lambda_1 - \lambda_2)^{-1}$, so that
$$E^*(Y_{it}) = \frac{\beta_0}{1 - \lambda_1 - \lambda_2}.$$
Obviously, the conditional mean is irrelevant to the network structure $W$, and the stronger the network effect (i.e., $\lambda_1$) or the autoregression effect (i.e., $\lambda_2$), the larger the value the nodal mean takes. In this case, $E^*(Y_{it}) \to \infty$ when $\lambda_1 + \lambda_2 \to 1$.
Case 2. Assuming $a_{ij} = 1$ for all $i \neq j$, every node is connected to all the others, which makes the network fully connected and extremely dense. Such a network rarely exists in practice; we include this case only for theoretical completeness. Here $W = (N-1)^{-1}(\mathbf{1}\mathbf{1}^\top - I)$, and under the stability condition $|\lambda_1| + |\lambda_2| < 1$ it can be proved that
$$E^*(Y_{it}) = \left(1 - \lambda_2 + \frac{\lambda_1}{N-1}\right)^{-1} \left\{\beta_{0i} + \frac{\lambda_1}{(N-1)(1 - \lambda_1 - \lambda_2)} \sum_{j=1}^{N} \beta_{0j}\right\},$$
which depends on the node's own effect $\beta_{0i}$; the nodal mean increases as $\beta_{0i}$ rises.

Case 3 (first-order Taylor expansion). It is difficult to interpret a general network structure through the exact expressions (5) and (6). Accordingly, we utilize a first-order Taylor expansion in $\lambda_1$ to approximate $E^*(Y_t)$ and $\operatorname{cov}^*(Y_t)$. Note that if $|\lambda_1|$ is relatively large (small), such an approximation performs badly (well).
Based on (8) and (9),
$$E^*(Y_{it}) \approx \frac{\beta_{0i}}{1 - \lambda_2} + \frac{\lambda_1 \nu_i}{(1 - \lambda_2)^2}. \qquad (10)$$
It should be noticed that $\nu_i = n_i^{-1} \sum_{j} a_{ij} \beta_{0j}$ in formula (10) represents the average nodal impact from the neighbors of node $i$; we call $\nu_i$ the local impact of node $i$. In addition, (10) shows that both a large own effect $\beta_{0i}$ and a large local impact $\nu_i$ amplify the node's mean. To first order, the conditional variance,
$$\operatorname{var}^*(Y_{it}) \approx \frac{\sigma^2}{1 - \lambda_2^2}, \qquad (11)$$
is determined only by the autoregression coefficient $\lambda_2$ and the variance of $\varepsilon_{it}$. Finally, the first-order conditional covariance,
$$\operatorname{cov}^*(Y_{it}, Y_{jt}) \approx \frac{\lambda_1 \lambda_2 \sigma^2}{(1 - \lambda_2^2)^2} \left(\frac{a_{ij}}{n_i} + \frac{a_{ji}}{n_j}\right), \qquad (12)$$
implies that the correlation between interconnected nodes ($a_{ij} = a_{ji} = 1$) is stronger than that between unconnected ones ($a_{ij} = a_{ji} = 0$).
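As a sanity check on the first-order expansion, the sketch below (with an arbitrary random network and a deliberately small $\lambda_1$; all parameter values are our own assumptions) compares the exact conditional mean $(I - G)^{-1} B_0$ with the approximation $\beta_{0i}/(1-\lambda_2) + \lambda_1 \nu_i/(1-\lambda_2)^2$:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 25
lam1, lam2 = 0.05, 0.4         # small |lam1|: the expansion should be accurate

A = (rng.random((N, N)) < 0.3).astype(float)
np.fill_diagonal(A, 0.0)
n = np.maximum(A.sum(axis=1), 1.0)
W = A / n[:, None]
B0 = rng.normal(size=N)

G = lam1 * W + lam2 * np.eye(N)
mu_exact = np.linalg.solve(np.eye(N) - G, B0)

# First-order expansion in lam1:
# (I - G)^{-1} = ((1 - lam2) I - lam1 W)^{-1}
#             ~  (1 - lam2)^{-1} I + lam1 (1 - lam2)^{-2} W
nu = W @ B0                    # local impact nu_i = (sum_j a_ij beta_0j) / n_i
mu_approx = B0 / (1 - lam2) + lam1 * nu / (1 - lam2) ** 2

print(np.max(np.abs(mu_exact - mu_approx)))   # O(lam1^2) discrepancy
```

Rerunning with a larger $\lambda_1$ makes the discrepancy grow quadratically, matching the remark that the approximation degrades when $|\lambda_1|$ is large.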

Parameter Estimation
As shown in the previous section, the PASI model combines the advantages of the partially linear single index and NAR models. We now employ the profile least squares method to estimate the parameters. Denote $W_i = (a_{1i}/n_1, \cdots, a_{Ni}/n_N)^\top \in \mathbb{R}^N$ as the $i$-th column of $W$. Then model (2) can be reconfigured as follows.
Herein we use the Newton-Raphson iterative algorithm to perform the calculation, which alternately updates the estimates of the nonparametric and parametric parts through their corresponding objective functions. The procedure stops once the parameters converge, at which point the estimate $\hat{\xi}$ is obtained. Accordingly, we get the following.
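A minimal sketch of the profiling idea is given below. To keep it short we depart from the full procedure in two labeled ways: the index coefficient $\gamma$ is held fixed at its true value (so the Newton-Raphson update of $\gamma$ is omitted), and a Nadaraya-Watson smoother stands in for a local polynomial fit. The nonparametric part is profiled out with the smoother matrix $S$, after which $(\lambda_1, \lambda_2)$ is obtained by ordinary least squares; the network, bandwidth, and noise level are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T = 40, 40
lam1_true, lam2_true = 0.3, 0.3
gamma = np.array([0.6, 0.8])               # index coefficient, held fixed here

# random directed network, row-normalised weights
A = (rng.random((N, N)) < 0.1).astype(float)
np.fill_diagonal(A, 0.0)
W = A / np.maximum(A.sum(axis=1), 1.0)[:, None]

Z = rng.normal(size=(N, 2))
u = Z @ gamma
B0 = np.sin(u)                             # true (unknown) link g
G = lam1_true * W + lam2_true * np.eye(N)

Y = np.zeros((T + 1, N))
Y[0] = np.linalg.solve(np.eye(N) - G, B0)  # start near the stationary mean
for t in range(T):
    Y[t + 1] = B0 + G @ Y[t] + rng.normal(scale=0.2, size=N)

# stacked regressors: network lag, own lag, and each observation's index value
x1 = (Y[:-1] @ W.T).ravel()
x2 = Y[:-1].ravel()
y = Y[1:].ravel()
uu = np.tile(u, T)

# Nadaraya-Watson smoother matrix on the index (bandwidth is an assumption)
h = 0.15
K = np.exp(-0.5 * ((uu[:, None] - uu[None, :]) / h) ** 2)
S = K / K.sum(axis=1, keepdims=True)

# profile least squares: profile out g(.), then ordinary least squares
X = np.column_stack([x1, x2])
lam, *_ = np.linalg.lstsq(X - S @ X, y - S @ y, rcond=None)
ghat = S @ (y - X @ lam)                   # plug-in estimate of g at the index
print(lam)                                 # should be close to (0.3, 0.3)
```

The full algorithm wraps this profiling step inside Newton-Raphson updates of $\gamma$, iterating until all parameters converge.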
Besides, since the consistency and asymptotic normality of the NAR model have been established and the nonlinear $g(\cdot)$ admits a linear expansion, the large-sample properties of our estimators behave well; the related proofs are omitted. Now consider the original PASI model (1) with an order-$P$ autoregressive part. When the number of covariates $p$ and the number of autoregression terms $P$ are large compared with the sample size, it is desirable to produce sparse models that involve only a small subset of predictors. With such models one can improve prediction accuracy and enhance model interpretability. To this end, we use the following penalized objective function to reduce the dimension of the model, where $p_\lambda(\cdot)$ is a penalty function with tuning parameter $\lambda$.
It is noteworthy that each component of $\gamma$ and $\lambda$ has its own penalty function with its own tuning parameter. To select autoregression terms only, we set $p_{\lambda_1}(\cdot) = 0$ and adjust the objective accordingly; analogous work can be done to select the $Z$-variables. A number of penalty functions have been studied in the literature.
In this paper, we use the SCAD penalty for the sake of its oracle property. Its first-order derivative is
$$p'_\lambda(t) = \lambda \left\{ I(t \le \lambda) + \frac{(a\lambda - t)_+}{(a-1)\lambda} I(t > \lambda) \right\}, \qquad t > 0,$$
with $p_\lambda(0) = 0$, where $a = 3.7$ and $(x)_+ = x\, I(x > 0)$ denotes the positive part. The tuning parameters $\lambda_1$ and $\lambda_2$ are chosen by BIC. In the end, we obtain the penalized estimators by minimizing the penalized objective with respect to $\gamma$ and $\lambda$.
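The SCAD derivative, the penalty it integrates to, and the standard univariate thresholding rule can be coded directly ($a = 3.7$ as in the text; the `scad_threshold` helper is our own addition for illustration, not part of the paper's procedure):

```python
import numpy as np

def scad_deriv(t, lam, a=3.7):
    """First derivative p'_lam(t) of the SCAD penalty for t >= 0."""
    t = np.asarray(t, dtype=float)
    return lam * ((t <= lam) +
                  np.maximum(a * lam - t, 0.0) / ((a - 1) * lam) * (t > lam))

def scad_penalty(t, lam, a=3.7):
    """SCAD penalty p_lam(t): linear near 0, quadratic blend, constant tail."""
    t = np.asarray(t, dtype=float)
    p1 = lam * t
    p2 = (2 * a * lam * t - t ** 2 - lam ** 2) / (2 * (a - 1))
    p3 = lam ** 2 * (a + 1) / 2
    return np.where(t <= lam, p1, np.where(t <= a * lam, p2, p3))

def scad_threshold(z, lam, a=3.7):
    """Minimiser of 0.5 (z - th)^2 + p_lam(|th|): soft-thresholding for
    small |z|, a linear blend in between, and identity for |z| > a lam."""
    az = abs(z)
    if az <= 2 * lam:
        return np.sign(z) * max(az - lam, 0.0)
    if az <= a * lam:
        return ((a - 1) * z - np.sign(z) * a * lam) / (a - 2)
    return z
```

The thresholding rule makes SCAD's two key features visible: small inputs are set exactly to zero (sparsity), while inputs beyond $a\lambda$ are returned unchanged (no bias on large coefficients), which underlies the oracle property.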

Numerical Simulation and Results
We demonstrate the advantage of our proposal under three generation mechanisms, corresponding to different adjacency matrices, in the following subsections. We ran a simulation with data generated from the single index link $g(u) = \sin\{\pi (u - A)/(B - A)\}$, where $A = \sqrt{3}/2 - 1.645/\sqrt{10}$ and $B = \sqrt{3}/2 + 1.645/\sqrt{10}$.
The random error $\varepsilon_{it}$ comes from the standard normal distribution $N(0, 1)$. The covariate $Z_i = (z_{i1}, z_{i2}, z_{i3})^\top \in \mathbb{R}^3$ follows a multivariate normal distribution with mean $\mu = (0, 0, 0)^\top$ and a specified covariance matrix $\Sigma_Z$. We set the covariate coefficient to $\gamma = (0.1, 0.3, 0.4)^\top$. The above single index model was analyzed by Carroll et al. (1997) [26]. To generate a sequence $Y_t$ of response variables, we first draw the initial value $Y_0$ from the strictly stationary distribution given in Proposition 2; once $Y_0$ is generated, the response sequence $Y_t$ is generated recursively from formula (3).
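The generation scheme can be sketched as follows. Two ingredients are assumptions of ours, flagged in the comments: the printed covariance matrix of $Z_i$ did not survive extraction, so we substitute an AR(1)-type $\Sigma_Z$ with $\rho = 0.5$, and an Erdős-Rényi graph stands in for the dyad independent adjacency mechanism. The network size is also kept small for the sketch.

```python
import numpy as np

rng = np.random.default_rng(3)
N, T = 50, 10                  # paper uses N up to 300; kept small here
lam1, lam2 = 0.3, 0.6
gamma = np.array([0.1, 0.3, 0.4])
A_ = np.sqrt(3) / 2 - 1.645 / np.sqrt(10)
B_ = np.sqrt(3) / 2 + 1.645 / np.sqrt(10)

def g(u):
    # single index link analysed by Carroll et al. (1997)
    return np.sin(np.pi * (u - A_) / (B_ - A_))

# Z ~ N(0, Sigma_Z); Sigma_Z is ASSUMED AR(1)-type with rho = 0.5 here,
# because the paper's covariance matrix is not reproduced in this sketch
rho = 0.5
Sigma_Z = rho ** np.abs(np.subtract.outer(np.arange(3), np.arange(3)))
Z = rng.multivariate_normal(np.zeros(3), Sigma_Z, size=N)
B0 = g(Z @ gamma)

# adjacency matrix: Erdos-Renyi placeholder for the dyad independent design
Adj = (rng.random((N, N)) < 0.1).astype(float)
np.fill_diagonal(Adj, 0.0)
W = Adj / np.maximum(Adj.sum(axis=1), 1.0)[:, None]
G = lam1 * W + lam2 * np.eye(N)

# Y_0 drawn from the strictly stationary distribution of Proposition 2
I = np.eye(N)
mu = np.linalg.solve(I - G, B0)
vec = np.linalg.solve(np.eye(N * N) - np.kron(G, G), I.ravel())
Gamma0 = vec.reshape(N, N)
Y = [rng.multivariate_normal(mu, (Gamma0 + Gamma0.T) / 2)]
for t in range(T):             # then iterate formula (3)
    Y.append(B0 + G @ Y[-1] + rng.normal(size=N))
Y = np.array(Y)                # (T + 1) x N response sequence
```

Starting from the stationary distribution means every $Y_t$ in the sequence is a draw from the same conditional law, so no burn-in period is needed.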

4.2. Stochastic Block Model.
A stochastic block model [28] divides the $N$ individuals in the network into $K$ communities (blocks) $B_1, B_2, \cdots, B_K$. In other words, there exists a mapping $\phi$ that assigns each individual to a block; for example, if individual $i$ is in community $B_1$, then $\phi(i) = B_1$. As can be seen, the stochastic block model is a simplified representation of the multiple relationships in the network, and it explains the network's overall structural characteristics. It is noteworthy that the stochastic block model operates at the block level rather than the individual level.
We randomly assign a block label $k = 1, 2, \cdots, K$ to each network node and set $K = 5, 10, 20$, indicating that there are $K$ communities in the network. If nodes $i$ and $j$ belong to the same community, we set $P(a_{ij} = 1) = 0.3 N^{-0.3}$; otherwise $P(a_{ij} = 1) = 0.3 N^{-1}$. Nodes in the same community are thus more likely to share an edge than nodes in different communities. Set $T = 10$ and $(\lambda_1, \lambda_2)^\top = (0.3, 0.6)^\top$. Figure 4 shows the network graph generated by a stochastic block model with $N = 100$ and $K = 5$. Under the same conditions, the histograms of the out-degree and in-degree of the network nodes are given in Figure 5, where the horizontal and vertical axes are the out-degree (or in-degree) and frequency, respectively. From Figure 5, it can be seen that the distributions of the out-degree and in-degree of the network are right-skewed.
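A stochastic block adjacency matrix of this kind can be generated in a few lines. The sketch below reads the garbled probabilities as $0.3N^{-0.3}$ within a block and $0.3N^{-1}$ between blocks (our reading, matching standard NAR simulation settings), and mirrors the $N = 100$, $K = 5$ configuration of the figure:

```python
import numpy as np

def sbm_adjacency(N, K, rng):
    """Stochastic block adjacency: P(a_ij = 1) = 0.3 * N**-0.3 within a
    block and 0.3 * N**-1 between blocks, with no self-loops."""
    label = rng.integers(0, K, size=N)                 # random block labels
    same = label[:, None] == label[None, :]
    p = np.where(same, 0.3 * N ** -0.3, 0.3 / N)
    A = (rng.random((N, N)) < p).astype(float)
    np.fill_diagonal(A, 0.0)
    return A, label

rng = np.random.default_rng(4)
A, label = sbm_adjacency(100, 5, rng)
out_deg, in_deg = A.sum(axis=1), A.sum(axis=0)         # Figure 5 quantities
```

With these probabilities the within-block edge density is an order of magnitude higher than the between-block density, which is what produces the visible community structure in the network graph.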
4.3. Power Law Distribution Model. The power law distribution model [29] reveals the scale-free property of social networks: most users have few social relations, and only a very small number of users have many. The scale-free characteristic means that the node degree follows a power law distribution; that is, $P(k) \propto k^{-\alpha}$, where $\alpha$ is the power law exponent and $P(k)$ denotes the proportion of nodes with degree $k$ in the whole network.
Let the in-degree of the nodes, $D_i = \sum_{j} a_{ij}$, follow the power law distribution; that is, $P(D_i = k) = c k^{-\alpha}$ ($c$ a normalizing constant). We set the power exponent $\alpha$ to 2, 3, and 5. As shown in Figure 9, the smaller the power exponent, the longer the tail of the in-degree distribution. Set $T = 10$ and $(\lambda_1, \lambda_2)^\top = (0.4, 0.5)^\top$. Figure 7 is a network graph generated with node degrees obeying the power law distribution ($\alpha = 2.5$). Figure 8 shows histograms of the network's in-degree and out-degree. We can see that the distributions are unimodal: the middle part is high and the two sides are low.
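One simple way to realize such a network, used here purely as an illustration (the attachment scheme is our own assumption), is to draw each node's in-degree from the discrete power law and then connect it to that many uniformly chosen distinct nodes:

```python
import numpy as np

def power_law_adjacency(N, alpha, rng):
    """Draw each node's in-degree from P(D = k) proportional to k**-alpha,
    k = 1..N-1, then connect node i to that many distinct random nodes
    (a_ij = 1 meaning node i is affected by node j)."""
    k = np.arange(1, N)
    p = k ** -float(alpha)
    p /= p.sum()                                   # normalising constant c
    A = np.zeros((N, N))
    for i in range(N):
        d = rng.choice(k, p=p)                     # in-degree D_i of node i
        nbr = rng.choice(np.delete(np.arange(N), i), size=d, replace=False)
        A[i, nbr] = 1.0                            # node i affected by nbr
    return A

rng = np.random.default_rng(5)
A = power_law_adjacency(200, 2.5, rng)
in_deg = A.sum(axis=1)                             # D_i = sum_j a_ij
```

With $\alpha = 2.5$ the bulk of the nodes receive only one or two in-edges while a handful receive many, reproducing the long-tailed in-degree distribution of Figure 9.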
4.4. Conclusions. Based on the parameter settings in Section 4.1, we next explore the consistency of the parameter estimates via the mean squared error (MSE) criterion. First, we simulated networks under the three generation mechanisms with distinct network sizes ($N = 100, 200, 300$). Furthermore, the experiments were repeated $M = 30$ times, where $\theta^{(m)} = (\gamma^{(m)}, \lambda_1^{(m)}, \lambda_2^{(m)})^\top$ ($m = 1, 2, \cdots, 30$) denotes the parameter estimate in the $m$-th experiment. Finally, we compute the MSE by averaging the squared differences between the true and estimated parameters. Before constructing the PASI model, we need to examine the correlations between the explanatory variables and the response in order to determine the linear and nonlinear parts of the model. A correlation matrix can be used to represent the linear relationships between the response and the explanatory variables: if the correlation coefficient between the response and an explanatory variable is large, that variable is assigned to the linear part of the model; otherwise, it is assigned to the nonlinear part. In addition, we also test whether a linear regression is reasonable, as presented in Table 1.
Figure 11 shows the correlation coefficient matrix. The columns represent the dependent variable $Y_t$, the covariates $z_1, z_2, z_3$, and the lagged variables, from left to right. We can see the correlation coefficients $\rho(Y_t, Y_{t-1}) = 0.80$ and $\rho(Y_t, n_i^{-1}\sum_j a_{ij} Y_{j(t-1)}) = 0.42$; that is, the correlations of the autoregressive and network lag terms with $Y_t$ are stronger than those of the other three variables. Table 1 shows the results of a linear regression analysis between the dependent and explanatory variables: the coefficients $\lambda_1$ and $\lambda_2$ are significant while $\gamma_1$, $\gamma_2$, and $\gamma_3$ are not, so it is not appropriate to place $z_1$, $z_2$, and $z_3$ in the linear part. Combining Figure 11 with Table 1, we take the network lag and autoregressive terms as the linear part, while the remaining covariates enter the nonparametric part of the PASI model.
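The screening step can be reproduced in a few lines. The sketch below simulates a small PASI-type data set (the link function, noise level, network, and the 0.3 correlation threshold are illustrative choices of ours, not the paper's) and computes the correlation of $Y_t$ with each candidate regressor; the autoregressive term shows by far the strongest correlation and is selected for the linear part:

```python
import numpy as np

rng = np.random.default_rng(6)
N, T = 100, 10
lam1, lam2 = 0.4, 0.5
gamma = np.array([0.1, 0.3, 0.4])

Adj = (rng.random((N, N)) < 0.05).astype(float)
np.fill_diagonal(Adj, 0.0)
W = Adj / np.maximum(Adj.sum(axis=1), 1.0)[:, None]
G = lam1 * W + lam2 * np.eye(N)

Z = rng.normal(size=(N, 3))
B0 = np.sin(np.pi * (Z @ gamma))          # a nonlinear nodal effect
Y = [np.linalg.solve(np.eye(N) - G, B0)]  # start at the stationary mean
for _ in range(T):
    Y.append(B0 + G @ Y[-1] + 0.5 * rng.normal(size=N))
Y = np.array(Y)

# correlation screening: candidates for the linear part of the model
y = Y[1:].ravel()
cand = {
    "net_lag": (Y[:-1] @ W.T).ravel(),    # average effect of neighbours
    "own_lag": Y[:-1].ravel(),            # autoregressive term
    "z1": np.tile(Z[:, 0], T),
    "z2": np.tile(Z[:, 1], T),
    "z3": np.tile(Z[:, 2], T),
}
corr = {k: np.corrcoef(v, y)[0, 1] for k, v in cand.items()}
linear_part = [k for k, c in corr.items() if abs(c) > 0.3]
print(corr, linear_part)
```

Covariates whose marginal correlation with $Y_t$ falls below the threshold are routed to the single index part, mirroring the decision rule applied to Figure 11 and Table 1.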
The MSEs of the parameter estimates are displayed in Tables 2, 3, and 4, corresponding to the dyad independent model, the stochastic block model, and the power law distribution model, respectively. These tables present consistent results.
As expected, the MSE of the parameter estimates decreases as the network size grows; that is, the parameter estimation is consistent.
To compare the results of the PASI and NAR models, we summarize the MSE of the dependent variable over $t = 1, \cdots, 10$ with $N = 100$ in Figure 12. One can see that the MSEs of the PASI model are smaller than those of the NAR model under all the generation mechanisms of the adjacency matrix. This suggests that PASI is a competitive tool for analyzing nodal effects in finite samples from social networks.

Figure 3: Dependent variable generated by the dyad independent model, over time.

Figure 4: Social network of the stochastic block model.

Figure 5: Out-degree and in-degree histograms of the stochastic block model.

Figure 6: Dependent variables generated by the stochastic block model, over time.

Figures 3, 6, and 10 show sequence diagrams and histograms under the three network generation mechanisms. The vertical axis of the left graph in each figure denotes the average response at each moment, $\bar{Y}_t = N^{-1} \sum_{i=1}^{N} Y_{it}$, and the horizontal axis marks the 11 time points. The right graph of each figure shows a histogram of the nodes' dependent variable: its horizontal axis represents the sum of each node's responses over the 11 time points, and its vertical axis denotes the frequency. The distributions are unimodal: the middle part is high and the two sides are low.

Figure 7: Social network of the power law model.

Figure 8: Out-degree and in-degree histograms of the power law model.

Figure 9: In-degree histograms of the power law model (different power law exponents).

Figure 10: Dependent variables generated by the power law model, over time.

Figure 11: Correlation coefficients between variables generated by the power law model.

Figure 12: MSE of the dependent variable for the PASI and NAR models.

Table 2: MSE of parameter estimation (dyad independent model).

Table 3: MSE of parameter estimation (stochastic block model).

Table 4: MSE of parameter estimation (power law distribution model).

We review formula (3), $Y_t = B_0 + G Y_{t-1} + \mathcal{E}_t$. Based on time series theory, we first center the sequence $\{Y_t\}$ because $B_0 \neq 0$; we then obtain the expectation of $Y_t$.