Joint Estimation of the Non-parametric Transitivity and Preferential Attachment Functions in Scientific Co-authorship Networks

We propose a statistical method to estimate simultaneously the non-parametric transitivity and preferential attachment functions in a growing network, in contrast to conventional methods that either estimate each function in isolation or assume some functional form for them. Our model is shown to be a good fit to two real-world co-authorship networks and be able to bring to light intriguing details of the preferential attachment and transitivity phenomena that would be unavailable under traditional methods. We also introduce a method to quantify the amount of contributions of those phenomena in the growth process of a network based on the probabilistic dynamic process induced by the model formula. Applying this method, we found that transitivity dominated PA in both co-authorship networks. This suggests the importance of indirect relations in scientific creative processes. The proposed methods are implemented in the R package FoFaF.


Introduction
Science has never been more collaborative. In this era that has been witnessing an unprecedented explosion of multi-author scholarly articles (Larivière, Gingras, Sugimoto, & Tsou, 2015), collaboration has become more and more important in the path to scientific success (Jones, Wuchty, & Uzzi, 2008;Bornmann, 2017). Promising ideas from numerous analytic fields, including complex network theory, statistics, and informetrics, have been weaved together to understand this collaborative nature of science (Zeng et al., 2017;Fortunato et al., 2018).
One of the early attempts to analyze the formative process of scientific collaborations came from physics when Newman proposed a non-parametric method to estimate the preferential attachment (PA) and transitivity functions from scientific collaboration networks (Newman, 2001a). PA (Price, 1965;Merton, 1968;Price, 1976;Albert & Barabási, 1999) and transitivity (Heider, 1946;Holland & Leinhardt, 1970, 1971, 1976 are two of the most fundamental mechanisms of network growth. On the one hand, PA is a phenomenon concerning the first-order structure of a network. In PA, the higher the number of co-authors a scientist already has, the more collaborators they will form. On the other hand, transitivity concerns the second-order structure: co-authors of co-authors are likely to collaborate. Newman's method is non-parametric in the sense that it does not assume any forms for either the PA or transitivity function. The method, however, considers each phenomenon in isolation and thus completely ignores any entanglements of the two phenomena, which are entirely plausible in real-world networks. Apart from this non-parametric-in-isolation approach, a joint-estimation approach, in which the two phenomena are considered simultaneously, has been attempted recently (Kronegger, Mali, Ferligoj, & Doreian, 2012;Ferligoj, Kronegger, Mali, Snijders, & Doreian, 2015;Zinilli, 2016), all under the framework of stochastic actor-based models (Snijders, 2001). This approach is, however, inherently parametric: it assumes the forms of the PA and transitivity functions a priori, thus risks losing fine details of the two phenomena, details that are difficult to be captured by any parametric functional forms.
We argue that the ideal method, whenever possible, should combine the best of both worlds: it should consider both phenomena simultaneously, and it should not assume any functional forms for them.
Our main contributions are three-fold. In our first contribution, we propose a network growth model that combines non-parametric PA and transitivity functions and derive an efficient Minorize-Maximization (MM) algorithm (Hunter & Lange, 2000) to estimate them simultaneously. This iterative algorithm is guaranteed to increase the model's log-likelihood per iteration. We demonstrate through simulated examples that our approach are capable of capturing complex details of PA and transitivity, while the conventional approaches cannot (cf. Fig. 1). We also perform a systematic simulation to confirm the performance of our algorithm.
In our second contribution, we suggest a method to quantify the amount of contributions of PA and transitivity in the growth process of a network. Our quantification exploits the probabilistic dynamic process induced by the network growth formula and can be easily extended to other network growth mechanisms.
In our third contribution, we apply the proposed methods to two real-world co-authorship networks and uncover some interesting properties that would be unavailable under conventional approaches. In particular, as contrast to the typical power-law functional form assumption, the transitivity effect seems to be highly non-power-law. We also found that transitivity dominated PA in the growth processes of both networks. This suggests the importance of indirect relations in scientific creative processes: it does matter who your collaborators collaborate with. All the proposed methods are implemented in the R package FoFaF.
The rest of the paper is organized as follows. The proposed method is discussed in details in Section 2. In Section 3, we discuss how to exploit the probabilistic dynamic process imposed by the model formula to sensibly quantify the amount of contributions of PA and transitivity. We apply the proposed method to two real-world collaboration networks and discuss the results in Section 4. Concluding remarks are given in Section 5.

Proposed Method
We first review briefly the history of PA and transitivity modelling and then describe our network growth model that incorporates non-parametric PA and transitivity functions. We also explain its relation to some conventional network models. We then discuss maximum partial likelihood estimation for the model and provide two simulated examples to demonstrate how our method works. We conclude the section with a systematic simulation to investigate the performance of the proposed method.

PA and transitivity modelling
The notion of a rich-get-richer phenomenon has its root in the theoretical works of Yule (Yule, 1925) and Simon (Simon, 1955). Its status as a fundamental process in informetrics was cemented by revolutionary works of Merton (Merton, 1968) and Price (Price, 1965(Price, , 1976. The term "preferential attachment" was coined by Barabási and Albert when they re-discovered the mechanism in the context of complex networks (Albert & Barabási, 1999).
In PA, the probability a node with degree k receives a new edge is proportional to its PA function A k . When A k is an increasing function on average, the PA effect exists: a node with a high degree k is more likely to receive more new connections. To estimate the PA phenomenon in a network is to estimate the function A k given that network's growth data. Various non-parametric approaches (Newman, 2001a;Pham, Sheridan, & Shimodaira, 2015) and parametric ones (Massen & Jonathan, 2007;Gómez, Kappen, & Kaltenbrunner, 2011) have been proposed. In parametric methods, power-law function forms, e.g., A k = (k + 1) α , are often employed (Krapivsky, Rodgers, & Redner, 2001).
Transitivity started out as a concept in psychology (Heider, 1946) and was developed theoretically in the framework of social network analysis by Holland and Leinhardt in the 1970s (Holland & Leinhardt, 1970, 1971, 1976. It was introduced to the informetrician's modelling toolbox in 2001 when Newman provided a heuristic method to estimate the transitivity function in real-world co-authorship networks (Newman, 2001a) and Snijders introduced his now-famous stochastic actor-based models that include transitivity as a network formation mechanism (Snijders, 2001).
In transitivity, the probability that a pair of two nodes with b common neighbors is proportional to the transitivity function B b . When B b is an increasing function on average, the transitivity effect is at play: the more common neighbors a pair of nodes share, the easier for them to connect. Similar to the case of PA, non-parametric approaches (Newman, 2001a) and parametric approaches (Kronegger et al., 2012;Ferligoj et al., 2015;Zinilli, 2016) have been proposed to estimate B b from observed network data.
We re-emphasize that all existing methods either consider PA or transitivity in isolation, or are of a parametric nature.

Proposed network model
Our model can be viewed as a discrete Markov model, which is a popular framework in social network modeling (Holland & Leinhardt, 1977). Let G t denote the network at time t. Starting from a seed network G 0 , at each time-step t = 1, · · · , T , v(t) new nodes and m(t) new edges are added to G t−1 to form G t . In particular, at the onset of time-step t, let k i (t) denote the degree of node i and b ij (t) the number of common neighbors between nodes i and j in G t−1 . Our model dictates that the probability that a new edge emerges between node i and node j at time step t is independent of other new edges at that time and is equal to where A k is the PA function of the degree k and B b is the transitivity function of the number of common neighbors b. In other words, the un-ordered pair of the two ends (i, j) of a new edge follows a categorical distribution over all un-ordered pairs of nodes existing at time t. Each pair's weight is proportional to the product of PA and transitivity values of that pair at t. Thus this formulation can capture simultaneously PA and transitivity effects. Suppose that the joint distribution of v(t), m(t), and G 0 is governed by some parameter vector θ. We make a standard assumption, which is virtually employed in all network models, that θ is independent of A k and B b . As we shall see later, this independence assumption enables a partial likelihood approach in which one can ignore θ in estimating A k and B b . Next we discuss the relation between the model in Eq. (1) and models in the literature.

Related models
As explained earlier, while there are models that either include a non-parametric A k function (Pham et al., 2015) or a non-parametric B b function (Newman, 2001a), Eq. (1) is the first to combine both non-parametric functions. It includes as special cases some well-known complex network models, such as the Barabási-Albert model (Albert & Barabási, 1999) or the Erdös-Rényi-with-growth model (Callaway, Hopcroft, Kleinberg, Newman, & Strogatz, 2001).
The well-known stochastic actor-based model (Snijders, 2001(Snijders, , 2017Ripley, Snijders, Boda, Vörös, & Preciado, 2018) has been employed in studies of scientific co-authorship networks (Kronegger et al., 2012;Ferligoj et al., 2015;Zinilli, 2016). It is, however, not clear how to convert the PA and transitivity functions in our probabilistic setting to those in the setting of stochastic actor-based model, since the two models are defined differently. We note that the PA and transitivity phenomena are virtually modelled in a parametric manner in the stochastic actor-based model.
One key assumption of the model in Eq. (1) is that A k and B b do not depend on t, i.e., they stay unchanged throughout the growth process. While this time invariability assumption is standard and employed in all the network models mentioned previously, there is a growing body of models departing from it. A time-varying A k has been discussed in the context of citation networks (Csárdi, Strandburg, Zalányi, Tobochnik, & Érdi, 2007;Wang, Yu, & Yu, 2008;Medo, Cimini, & Gualdi, 2011), while different parametric forms for such A k are studied by Medo (Medo, 2014). More recently, the R package tergm (Krivitsky & Handcock, 2019) allows the estimation of time-varying parametric PA and transitivity functions. There is, however, no existing work that employs time-varying and non-parametric modelling simulataneously, presumably for the reason that a huge amount of data is probably needed in such a model. It is likely that in practical situations one always has to choose between non-parametric modelling and time-varying modelling. We demonstrate in Section 4.4 that the time invariability assumption do indeed hold in all real-world networks analyzed in this paper.

Maximum Partial Likelihood Estimation
Here we describe how to estimate the parameters of the model in Eq. (1). Denote D = {G 0 , G 1 , · · · , G T } the observed data, and let A = [A 0 , A 1 , . . . , A kmax ] with A k > 0 be the PA function and B = [B 0 , B 1 , . . . , B bmax ] with B b > 0 be the transitivity function. Here k max is the maximum degree and b max is the maximum number of common neighbors between a pair of nodes. Given D, our goal is to estimate A and B without assuming any specific functional forms, an approach we call "non-parametric".
With the independence assumption stated in the previous section, the part of the log-likelihood that contains A and B and the part of the log-likelihood that contains θ are separable, i.e., L(A, B, θ|D) = L(A, B|D) + L(θ|D) holds, where L denotes the log-likelihood function. This allows us to ignore θ in estimating A k and B b . Starting from Eq. (1), with some calculations we arrive at where n k1,k2,b (t) is the number of node pairs (i, j) that satisfy (k i (t), k j (t), b ij (t)) = (k 1 , k 2 , b) with k 1 ≤ k 2 at time-step t, and m k1,k2,b (t) is the number of new edges between such node pairs. The number of new edges at time-step t is then expressed as m(t) = kmax k1=0 kmax k2=k1 bmax b=0 m k1,k2,b (t). Although analytically maximizing L(A, B|D) is intractable, we can derive an efficient MM algorithm that iteratively updates A and B. See Appendix A for its derivation. We also denote the final result of the algorithmÂ andB, estimates of A and B.

Illustrated examples
We demonstrate the effectiveness of our method in two examples. In the first example, we simulate a network using Eq. (1) with A k = 3 log(max k, 1) α + 1 and B b = 3 log(max b, 1) α + 1. This functional form, which deviates substantially from the power-law form, has been used to demonstrate the working of non-parametric PA estimation methods (Pham et al., 2015). The network has a total of N = 1000 nodes. At each time-step, one new node is added to the network with m(t) = 5 new edges. In the second example, we first estimate A k and B b by applying our proposed method to a real-world co-authorship network between authors in statistics journals (cf. Section 4), and then use these parameter values for simulating a network based on Eq. (1). In the process, we kept the initial condition and the number of new nodes and new edges at each time-step exactly as what were observed in the real-world network.
We apply three estimation methods to each simulated network. The first is our proposed method, which jointly estimates the non-parametric functions A k and B b . The second is a joint parametric method, which jointly estimates PA and transitivity using the simplistic functional forms A k = (k + 1) α and B b = (b + 1) α . This parametric formation is used widely in various PA and transitivity estimation methods (Massen & Jonathan, 2007;Gómez et al., 2011). The third method ignores the joint existence of PA and transitivity: it consists of two sub-methods: the first one is a non-parametric method for estimating PA in isolation (Pham et al., 2015), and the second one is a maximum likelihood version of a non-parametric method for estimating transitivity in isolation (Newman, 2001a).
The results are shown in Fig. 1.
In both examples, while the joint parametric method somehow succeeded in obtaining the general trends of A k and B b , it failed to capture the deviations from the power-law form in the two functions. The non-parametric-in-isolation method grossly over-estimated both PA and transitivity mechanisms, due to its complete disregard of their joint existence. The proposed method worked reasonably well, succeeding in capturing the PA and transitivity functions in fine details.

Simulation study
We perform a systematic simulation to evaluate how well the proposed method can estimate A k and B b . We choose A k = (k + 1) α and B b = (b + 1) β as the true functions. This power-law functional form has been used in previous simulation studies of PA estimation methods (Pham et al., 2015;Pham, Sheridan, & Shimodaira, to appear). We consider five values (0, 0.5, 1, 1.5, and 2) for the exponent α and six values (0, 0.5, 1, 1.5, 2, 2.5, and 3) for the exponent β. These are the ranges of α and β observed in Section 4.2. For each combination of α and β, we simulate 10 networks. In each network, the total number of nodes is 1000 and there are five new edges at each time-step.
For each simulated network, we first estimate A k and B b as described in the previous section and then fit (k + 1) α and (b + 1) β to the estimation results to find the estimates of α and β. In other words, we indirectly measure how well A k and B b are estimated by looking at how well α and β are estimated: if the estimates of α and β are good, the estimations of A k and B b are likely successful, too. Figure 2 shows the true and estimated values of α and β. The proposed method successfully recovers α and β in all combinations. This implies that the estimation of A k and B b went well.

Quantifying the amount of contributions of PA and transitivity
Our model leads to a simple answer to a previously-unraised yet fascinating question: how can one compare the amount of contributions of PA and transitivity in the growth process of a network? To the best of our knowledge, no one has attempted to quantify the amount of contributions of different network growth mechanisms. To answer this question, one must find a meaningful way to define the amount of contributions so that they are computable and comparable. We achieve this by considering the dynamic process expressed in Eq. (1). This probabilistic dynamic process suggests that the variability of the PA/transitivity values in the set of node pairs is a sensible measure for the amount of contribution of PA/transitivity.
Let us define the amount of contributions of PA and transitivity at time-step t. Denote them as s PA (t) and s trans (t), respectively. Taking logarithm of both sides of Eq. (1), one gets: with is the logarithm of the normalizing constant at time-step t and is independent of i and j. Equation (3) implies that, looking locally at a node pair (i, j), PA and transitivity contribute to log 2 P ij (t) by the amounts of log 2 [A ki(t) A kj (t) ] and log 2 B bij (t) , respectively; the amount of contribution is measured by log 2 fold-changes. What is important globally is, however, the relative sizes of all log 2 [A ki(t) A kj (t) ] and log 2 B bij (t) at that time-step t. For example, consider the case when A k = 1, ∀k. In this case, the the value of log 2 [A ki(t) A kj (t) ] will be the same for all node pairs, and thus PA would have no role in determining which pair would get a new edge. By considering the case when B b = 1, ∀b, one can see that the same reasoning should also apply to log 2 B ij (t).
The exponents are estimated by a two-step procedure: first A k and B b are estimated jointly by the proposed method, then (k + 1) α and (b + 1) β are fitted to the estimated results by least square. Each estimated point is the mean of the results of 10 simulations, with the error bars display the standard errors of the mean.
This observation prompts us to define s PA (t) and s trans (t) as the standard deviations of log 2 [A ki(t) A kj (t) ] and log 2 B bij (t) , respectively, when (i, j) is sampled based on Eq. (1). Let U (t) be the set formed by all node pairs (i, j) that exist at time-step t. The probability P ij (t) in Eq. (1) can be written explicitly as The aforementioned standard deviations can be calculated as follows.
and log 2 B bij (t) are invariant to constant factors in A k and B b , and thus s PA (t) and s trans (t) are well-defined. The use of base-2 logarithms allows us to interpret s PA (t) and s trans (t) as log 2 fold-changes; a contribution value of s indicates a change of the probability 2 s times in Eq. (1). We also note that, although A k and B b are assumed to be time-invariant, k i (t), b ij (t), and P ij (t) change over time, thus leading to dynamic nature of s PA (t) and s trans (t).
In real-world situations, what are available to us is not the true values A and B, but only their estimateŝ A andB. We plug these estimates into Eqs. (4) and (5) to obtainŝ PA (t) andŝ trans (t), estimates of s PA (t) and s trans (t).
The requirement that (i, j) is sampled from Eq. (1) is needed to faithfully reflect the probabilistic dynamic process and leads to the following interpretation of s PA (t) and s trans (t). Assume that at some time-step t we observed m(t) ≥ 2 new edges whose end points are (i 1 , j 1 ), · · · , (i m(t) , j m(t) ). Consider the sample standard deviation of log 2 (B bi l j l (t) ) for l = 1, · · · , m(t), which is defined as Plugging in the estimatesÂ and B, we can viewŝ PA (t) andŝ trans (t) as the estimates of the expectations of the sample standard deviations in PA and transitivity values observed at the end points of new edges at time-step t. As we shall see in Section 4.3, this interpretation also gives us a mean to visualize how well the model fits an observed network.
Finally, we note that this quantification approach is not limited to PA and transitivity. Given a growth formula in which all growth mechanisms are combined in a multiplicative way, for example, as in Eq. (1), the standard deviations of the logarithmic value of each growth mechanism can be used as a measure of the contribution of that mechanism.  Table 1 shows the summary statistics for two networks. The ratios ∆|V |/|V | and ∆|E|/|E| are both close to one, which indicate that each network grows out from a very small initial network. Since the number of new edges ∆|V | is loosely corresponding to the number of available data in our statistical model, STA has the biggest amount of data. The clustering coefficients in both networks are rather high, but nevertheless fall in the normal range observed in real-world networks (Newman, 2001b). Table 1: Summary statistics for two scientific co-authorship networks. |V | and |E| are the total numbers of nodes and edges in the final snapshot, respectively. T is the number of time-steps. ∆|V | and ∆|E| are the increments of nodes and edges after the initial snapshot, respectively. C is the clustering coefficient of the final snapshot. k max is the maximum degree and b max is the maximum number of common neighbors. It is instructive to look at more fine-grained statistics. Figures 3A and B show the distributions of the number of collaborators k in the final snapshot of SMJ and STA, respectively. Since the degree distributions in two datasets exhibit signs of heavy-tails, we fitted one of the most representative class of heavy-tail distribution, the power-law distribution k −γ deg , to these degree distributions by Clauset's procedure (Clauset, Shalizi, & Newman, 2009). This procedure first chooses the minimum degree k min from which the powerlaw holds, and then uses a maximum likelihood approach to estimate the power-law exponent γ deg . The estimated power-law exponents for degree distributions in SMJ and STA are 2.97 and 3.35, respectively. These values fall in the range of 2 < γ deg < 4, which is a commonly observed range for γ deg in real-world networks (Newman, 2005;Clauset et al., 2009).

Two co-authorship networks
The situation with the distributions of b ij is, however, less clear. Figures 3C and D show the distributions of the number of node pairs with b common neighbors in the final snapshot of SMJ and STA, respectively. We also fitted the power-law distribution b −γcn to the distributions of b by Clauset's procedure and found that γ cn in SMJ and STA are 2.99 and 3.22, respectively. The power-law form, however, seems to be not a very good fit for these distributions. The ranges of b in two distributions seem to be too narrow to say anything definitely about the tails. To the best of our knowledge, no previous work has studied the distributions of b ij , either in co-authorship networks or any other network types. Since figuring out the distributional form of b ij is not our main goal, we leave this task as future work.

PA and transitivity effects are highly non-power-law
Applying the proposed method to two data-sets, we found that the estimated PA and transitivity functions display non-power-law and complex trends (Figures 4).
In both networks, the value of A k increases on average in k, which implies the existence of the PA phenomenon: the more collaborators an author gets, the more likely they would get a new one. This is consistent with previous results in the literature, in which the phenomenon has been found in collaboration networks in diverse fields (Newman, 2001a;Milojević, 2010;Kronegger et al., 2012;Ferligoj et al., 2015).
The situation with the transitivity functions is more complex. In both SMJ and STA, there is a huge jump when b goes from 0 to 1: B 1 /B 0 is about 60 times in SMJ and almost 100 times in STA. These jumps in B b values have been previously observed in co-authorship networks (Newman, 2001a;Milojević, 2010). After this initial jump, B b , however, stays relatively horizontal in both SMJ and STA, before slightly increases again in SMJ. This complex departure from the power-law form renders any statement about a universal transitivity effect moot. The value of B b at every b > 0 is, however, at least one order of magnitude higher than B 0 , which suggests that, co-authors of co-authors seem to be at least ten times more likely to become new co-authors, comparing with the case when there is no mutual co-author.
It is informative to supplement the non-parametric analysis with a parametric one, since the theoretical literature offers many insights in this context. Here, we follow the standard practice and fit the power-law functional forms A k = (k + 1) α and B b = (b + 1) β (Krapivsky et al., 2001;Jeong, Néda, & Barabási, 2003;Pham et al., 2015). To find the PA attachment exponent α and the transitivity attachment exponent β, we substitute these forms to Eq. (1) and numerically maximizes the resulted log-likelihood function with respected to α and β. Table 2 shows the estimated values of α and β. The PA attachment α in both networks are in the sub-linear region, i.e., 0 < α < 1, which is a frequently observed range in real-world networks (Newman, 2001a;Pham et al., 2015;Ronda-Pupo & Pham, 2018). While this region has been shown to give rise to a heavy-tail degree distribution when there is only PA at play (Krapivsky et al., 2001), there is no such theoretical result when PA jointly exists with transitivity. It is, however, not entirely unreasonable to expect that the sub-linear value of α is responsible for the observed heavy-tail degree distributions in Figs. 3A and B. This implies the existence of the PA phenomenon: a highly-connected author is likely to get more new collaborations than a lowly-connected one. B: The transitivity effect is highly non-power-law in both networks. While B b greatly increases when b changes from 0 to 1 in both networks, after this initial huge jump, B b stays relatively horizontal in SMJ and only slightly increases in STA. The huge jump at b = 1 implies that co-authors of co-authors is at least ten times more likely to become new co-authors, comparing with the case when there is no mutual co-author.
The transitivity attachment exponents β are both greater than 1, indicating an exponentially faster growth rate of the transitivity function comparing to the PA function. This is evident in, for example, STA: while A 10 is less than 10, B 10 is already larger than 100. To the best of our knowledge, there is no theoretical result on the effect of β on the structure of a growing network, even for the supposedly simpler case when there is only transitivity.
Overall, the results in this section indicate the joint existence of PA and transitivity phenomena in both networks. Our non-parametric approach revealed that a conventional power-law functional form in a parametric approach may not be the best to describe the two phenomena. For A k , the power-law form fits reasonably well the low-degree part, but cannot capture the deviations from the power-law form in the high-degree part. For B b , the power-law form is even less suitable. We hope our non-parametric findings could offer hints on more suitable parametric forms for A k and B b .

Transitivity dominated PA in both networks
After obtaining the estimatesÂ andB, we can compute the amount of contributions of PA and transitivity in the growth process of each network by plugging these estimates into Eqs. (4) and (5). The estimated amount of contributionsŝ PA (t) andŝ trans (t) are shown in Fig. 5 as solid lines. In each network,ŝ trans (t) is greater thanŝ PA (t) for all t. One might ask whether these tendencies hold for the true values s PA (t) and s trans (t) as well, or they are just artifacts arising when we plugÂ andB into Eqs. (4) and (5). We demonstrate by simulations that, if the true A and B are close to the estimateŝ A andB, s PA (t) and s trans (t) are similar toŝ PA (t) andŝ trans (t). For each real network, we simulated 100 networks based on Eq. (1) using the estimatesÂ andB as true functions. We kept all the aspects of the growth process that are not governed by Eq. (1) the same as what observed in the real network. This includes using the observed initial graph and the observed numbers of new nodes and new edges at each time-step in the simulation. SinceÂ andB are the true PA and transitivity functions for each simulated network, we were able to calculate the true contributions of PA and transitivity in each simulated network using Eqs. (4) and (5). The behaviours of the simulated contributions are very similar to the estimated contributionsŝ PA (t) andŝ trans (t), which indicates that the latter are likely to be reliable.
As explained in Section 3, one can interpret the contributionsŝ PA (t) andŝ trans (t) as estimates of the expectations ofĥ PA (t) andĥ trans (t), the sample standard deviations of PA and transitivity values at end points of actually-observed new edges at time-step t. This is expressed as where the estimatesŝ PA (t) andŝ trans (t) slightly overestimate the expectations, because Eĥ trans (t) ≤ (Eĥ trans (t) 2 ) 1/2 ≈ŝ trans (t).  Overall, the data indicate the governing role of transitivity in the growth processes of both networks: it is mostly the differences in the transitivity values that decide where new collaborations are formed. This intuitive result is consistent with previous results which found that common neighbors are more effective than PA at link prediction in co-authorship networks (Liben-Nowell & Kleinberg, 2007). If PA was what dominates, a scientist would only need to indiscriminately acquire as much collaborators as possible in order to boost their number of collaborators in the future. In light of the current result, they, however, might need to be more selective, since a collaborator who has collaborated with a lot of people might offer more advantages.

Diagnosis: time-invariance and goodness-of-fit
Finally, we consider two questions that are critical to our real-world data analysis. The first concerns the validity of the time-invariance assumption of A k and B b in two networks: in each network, do A k and B b stay relatively unchanged throughout the growth process? The second is whether Eq. (1) is a reasonably good model for the networks. Although Fig. (6) already hinted at an affirmative answer for both questions, we examine each question in finer details.

Time invariance of the PA and transitivity functions
One way to answer the first question is to compare the A k and B b in Fig. 4 with the A k and B b estimated using only some portion of the growth process, for many different portions. If they are similar, one can conclude that A k and B b indeed stay unchanged throughout the growth process, and thus the time-invariance assumption is valid.
To this end, from each original network, we create three new networks. The first new network ("First Half") contains only the first half of the growth process, thus allows estimating A k and B b in this portion. In the second new network ("Initial 0.5"), we set the initial time at the middle of the time-line, effectively enabling estimation of A k and B b of the second half of the growth process. In the third new network ("Initial 0.75"), we set the initial time at the 3/4 point of the time-line. This network lets us estimating A k and B B in the last quarter of the growth process. The estimated A k and B b in these three new networks then are compared with the A k and B b obtained from the full growth process (Figure 7). Visual inspection of Fig. 7 suggests that both the PA and transitivity functions stay relatively unchanged in the growth process of each network. This validates the time-invariance assumption.

Goodness-of-fit
We use a simulation-based approach to investigate the goodness-of-fit of the model. For each real-world network, we re-use the simulation data used in Fig. 5, which consists of 100 simulated networks generated using the estimated A k and B b of that network as true functions. We compare some statistics of the simulated networks with the corresponding statistics of the real network. If Eq. (1) is a good fit, then the observed statistics and the simulated statistics must be close. Similar simulation-based approaches have been proposed for inspecting goodness-of-fit of exponential random-graph models (Hunter, Goodreau, & Handcock, 2008) and stochastic actor-based models (Conaldi, Lomi, & Tonellato, 2012;J. Lospinoso, 2012).
For an overview, we look at how well the model can replicate the observed degree curves. In Fig. 8, for each real-world network we choose uniformly at random ten nodes from the top 1% of all nodes in term of the number of new edges accumulated during the growth process. For each node, we then plot the evolution line of the observed degree value and the simulated degree value. The closer this line to the line of equality is, the better the model captures the observed degree growth of that node. Although for some nodes the simulated degree sometimes tends to be lower than the observed degree, the lines are overall reasonably close to the identity line, which implies the model captured the degree growth well.
For a closer inspection, we then look at how well the model replicates the probability distribution of new edges during the growth process. In particular, consider sampling uniformly at random an edge e from the set of all new edges in the growth process. Suppose that e is between a node pair with degrees K 1 and K 2 (K 1 ≤ K 2 ) and the number of their common neighbors is X. The relative frequency, or observed probability, that K 1 = k 1 , K 2 = k 2 , and X = b is p k1,k2,b = t m k1,k2,b (t)/ While "First Half" contains only the first half of the growth process, the initial time is set at the middle and at the 3/4 point of the time-line in "Initial 0.5" and "Initial 0.75", respectively. In each data-set, all four PA /transitivity functions agree well with each other, which suggests that the PA and transitivity functions stay relatively unchanged throughout the growth process. which m k1,k2,b (t) is the number of new edges emerged at time t between a node pair whose degrees are k 1 and k 2 and their number of common neighbors is b. The probability p k1,k2,b thus summarizes information about the associations of k 1 , k 2 , and b at the end points of new edges through out the growth process. Our joint estimation of PA and transitivity is compared with two conventional approaches in which PA (Pham et al., 2015) and transitivity (Newman, 2001a) are estimated in isolation. For each of these two approaches, we first estimate the PA/transitivity function in isolation and then use the estimated function to generate 100 networks in order to inspect how well each existing method replicates p k1,k2,b . In order to visualize this probability distribution, which is multi-dimensional, we slice it into many one-dimensional ones by conditioning. Firstly, we look at with the convention that p k1,k2,b = 0 whenever k 1 > k 2 or k 2 > k max . This is the probability distribution of K 1 + K 2 conditioning on the event X ∈ B. Since we know from Fig. 3 that the number of node pairs with b = 0 or b = 1 is vastly greater than the rest, we consider two probability distributions p k|b≤1 and p k|b≥2 and show their cumulative probability distributions in Fig. 9. In all cases, the joint estimation approach best replicated the observed distributions. It is surprising to observe that the B b -in-isolation approach, which does not explicitly leverage any information about k, has more or less the same replication performance as the A k -in-isolation approach, which explicitly does. This suggests that the dimension of b preserves a fair amount of the information about k. Secondly, we look at where K is a non-empty set of un-ordered pairs. This is the probability distribution of X conditioning on the event (K 1 , K 2 ) ∈ K. Given a pair of node whose degrees are k 1 and k 2 and their number of common neighbours is b, there is a natural condition imposed on b: b must be not greater than either k 1 or k 2 . So if one chooses K such that k 1 or k 2 could be too small, the range of b would be severely limited. For this reason, we consider two probability distributions: p b| max(k1,k2)≤9 and p b| max(k1,k2)≥10 , both allow a large range for b. Their cumulative distributions are shown in Fig. 10. Once again, the joint estimation approach best replicated the observed cumulative probability distributions in all cases. While the B b -in-isolation approach replicated fairly well the observed distributions in most cases, the A k -in-isolation approach completely failed to do so in all cases. This implies that, while the dimension of b seems to preserve a fair amount of the information about k 1 and k 2 , the dimensions of k 1 and k 2 maintain little information about b.
Overall, the joint estimation approach performed comparatively well. The surprisingly good performance of the B b -in-isolation approach is, in fact, in agreement with the dominating role of B b in the growth process of both networks. Combining the results in Fig. 8 with those in Figs. 9 and 10, we conclude that the joint estimation approach captured reasonably well both first-order and second-order information of the networks. This good fit is consistent with the fact that the key assumption of time-invariability of A k and B b is satisfied in both networks.

Conclusion
We proposed a statistical network model that incorporates non-parametric PA and transitivity functions and derived an efficient MM algorithm for estimating its parameters. We also presented a method that is able to quantify the amount of contributions of not only PA and transitivity but also many other network growth mechanisms by exploiting the probabilistic dynamic process induced by the model formula.   Figure 9: Observed and simulated cumulative probability distributions p k|b≤1 and p k|b≥2 of k = k 1 + k 2 in two networks. For each estimation method, we generate 100 networks from the estimation result and report the average values over 100 simulations. A and B: the cumulative probability distribution p k|b≤1 in SMJ and STA, respectively. C and D: the cumulative probability distribution p k|b≥2 in SMJ and STA, respectively. In all cases, our joint estimation approach replicated the observed distributions comparatively well.   Figure 10: Observed and simulated cumulative probability distributions p b| max(k1,k2)≤9 and p b| max(k1,k2)≥10 in two networks. For each estimation method, we generate 100 networks from the estimation result and report the average values over 100 simulations. A and B: the cumulative probability distribution p b| max(k1,k2)≤9 in SMJ and STA, respectively. C and D: the cumulative probability distribution of p b| max(k1,k2)≥10 in SMJ and STA, respectively. In all cases, our joint estimation approach replicated the observed distributions comparatively well.
We showed that the proposed network model is a reasonably good fit to two real-world co-authorship networks and revealed intriguing properties of the PA and transitivity functions in those networks. The PA function is increasing on average in both networks, which implies the PA effect is at play. Excluding the high degree part, it does follow the conventional power-law form reasonably well. The transitivity function is, however, highly non-power-law in two networks: it jumps greatly after b = 0, but stays relatively horizontal or only slightly increases afterwards. This non-conventional form implies that co-authors of co-authors seems to be at least ten times more likely to become new co-authors, comparing with the case when there is no mutual co-author. We also found transitivity dominating PA in both networks, which suggests the importance of indirect relations in scientific creative processes.
There are some fascinating directions for further developing the statistical methodology. Firstly, although the proposed model and most other network models in the literature assume that new edges at each timestep are independent, such edges are hardly so in real-world collaboration networks. Efficiently relaxing this assumption might lead to better models for this network type. Secondly, it is curious to see whether one could take the time-invariability test developed for stochastic actor-based models (J. A. Lospinoso, Schweinberger, Snijders, & Ripley, 2011) and adapt it to our model.
On the application front, this work lays out a potentially fruitful approach for analyzing complex networks, while raising more questions than it answers. Does transitivity always dominate PA in co-authorship networks? Which parametric forms are capable of capturing the fine details seen in Fig. 4? What are the properties of PA and transitivity in co-authorship networks at the level of institutions or countries? We hope this paper has convinced informetricians to include non-parametric modelling of PA and transitivity into their toolbox.

Acknowledgements
This work was supported in part by JSPS KAKENHI Grant Numbers JP19K20231 to TP and JP16H02789 to HS. The funding source had no role in study design; in the collection, analysis and interpretation of data; in the writing of the report; and in the decision to submit the article for publication.

A An MM algorithm for estimating the non-parametric PA and transitivity functions
To maximize the partial log-likelihood function l(A, B) in Eq. (2), we derive an instance of the Minorize-Maximization algorithms (Hunter & Lange, 2000). Denote A (q) k the value of A k at iteration q (q ≥ 0), and A (q) = [A b and B (q) in a similar way. Starting from some initial values (A 0 , B 0 ) at iteration q = 0, we want to compute (A (q+1) , B (q+1) ) from (A (q) , B (q) ). In MM algorithms, one derives such update formulas by first finding a surrogate function Q(A, B) that satisfies l(A, B) ≥ Q (A, B), ∀A, B and l(A (q) , B (q) ) = Q(A (q) , B (q) ), and then maximize the surrogate function. One can prove that, if (A q+1 , B q+1 ) maximizes Q(A, B), then l(A (q+1) , B (q+1) ) ≥ l(A (q) , B (q) ), i.e., the objective function is increased monotonically per iteration. Since there can be many surrogate functions that satisfy the conditions, the main indicator for evaluating a particular Q(A, B) is how easily we can maximize it.
Based on previous works (Pham et al., 2015;Pham, Sheridan, & Shimodaira, 2016), the following function is a surrogate function of l: where K := k max and B := b max . The product A i A j B l in the numerator of the third term in the r.h.s. of Eq. (A.1) prevents parallel updating of A and B. One way to deal with this product is to apply the AM-GM inequality (Hunter & Lange, 2004): where m i,k,· (t) := B l=0 m i,k,l (t) and m ·,·,b := K i=0 K j=i m i,j,b (t). Based on these formulas, at each iteration A (q+1) and B (q+1) can be computed in parallel without solving any additional optimization problems. This enables the method to work with large data-sets. The objective function value l(A (q+1) , B (q+1) ), as explained earlier, is guaranteed to be increasing in q.
The standard deviation ofĥ trans (t) then can be calculated by pluggingÂ andB into the above formula. The standard deviation ofĥ PA (t) follows the same derivation.