Biased diffusion on Japanese inter-firm trading network: Estimation of sales from network structure

To investigate the actual phenomena of transport on a complex network, we analysed empirical data for an inter-firm trading network, which consists of about one million Japanese firms, together with the sales of these firms (the sales of a firm correspond to the total in-flow into a node). First, we analysed the relationships between the sales of a firm and the sales of its nearest neighbours, and obtained a simple linear relationship between sales and the weighted sum of the sales of nearest neighbours (i.e., customers). In addition, we introduce a simple money-transport model that is coherent with this empirical observation. In this model, a firm (i.e., customer) distributes money to its out-edges (suppliers) in proportion to the in-degrees of the destinations. From intensive numerical simulations, we find that the steady flows derived from this model can approximately reproduce the distribution of sales of actual firms. The sales of individual firms deduced from the money-transport model are shown to be proportional, on average, to the real sales.


Introduction
The circulation of money is often likened to the circulation of blood. For instance, in the middle of the 18th century, the French physiocrat François Quesnay introduced his economic theory, the 'Tableau économique', which is one of the important foundations of modern economics, drawing on the theory of blood circulation described by William Harvey in the 17th century [1]. If this analogy is acceptable, what are the differences between the flow of money (i.e., the flow within society) and the flow of blood (i.e., the flow within the body) in terms of transport phenomena?
Transport processes such as diffusion, advection and radiation play a fundamental role in physics, and theories of transport processes are widely applied in chemistry, biology, engineering, etc. One transport problem that has long attracted interest is transport in complex systems such as biological or social systems. Owing to the recent accumulation of data on complex systems and developments in the theory of complex networks, we can now provide a new perspective on such problems.
Complex networks have been studied intensively over the past decade [2]-[4]. These studies revealed that complex networks can be observed in a wide range of real systems, both natural and man-made. In particular, transport phenomena on complex networks have been investigated. For example, random walks on complex networks have been studied from various viewpoints [5,6,9]. PageRank, which is one of the most successful indices for evaluating the importance of web pages and is used by Internet search engines, corresponds to the steady-state density of random walkers on the World Wide Web. Other types of transport on complex networks have also been studied [7,8,10].
What properties characterize actual transport on a complex network? The great majority of studies of transport on complex networks have been theoretical; only a few studies have examined actual transport on complex networks. For example, R. Guimera et al. revealed nonlinear relationships between the degrees of airports and their traffic in the worldwide airport network [11], P. Sen et al. studied the number of trains on the Indian railway network [12], and A. Chmiel et al. investigated visitors to portal sites and showed that self-attracting walks describe their properties well [13].
To investigate real transport phenomena on a complex network, we focus on the transport of money on the trading network, where firms correspond to network nodes and the trade relations between firms correspond to network edges. This system is useful for studying real transport phenomena on a complex network because we can estimate the total in-flow of each node from the sales data.
In this study, we show that we can consistently estimate sales from the structure of the inter-firm trading network, which is analogous to estimating blood flow from the vascular structure. In Sec. II, we start by analysing the trading network data for about 900,000 Japanese firms and their corresponding sales data. Next, we introduce two transport models and discuss their properties in Sec. III. In Sec. IV, we compare the steady-state sales of the models with the actual sales and show that the flows generated by one of the models can reproduce the well-known Zipf's law; namely, the cumulative distribution of sales obeys a power law with an exponent close to -1. Finally, we conclude with a discussion in Sec. V.

Data analysis
The data set was provided by Tokyo Shoko Research, Ltd. (TSR) and contains about one million firms, practically covering all active firms in Japan. For each firm, the data set contains the annual sales and a list of business partners, categorized into suppliers and customers [14,15].
From this list, we generated a network (the firm network) whose nodes are the firms and whose edges are defined by the following rule: if the i-th firm buys something from the j-th firm, or equivalently if money flows from the i-th firm to the j-th firm, we connect the i-th node to the j-th node with a directed link [14]. Figure 1(a) shows the average degree of the nearest neighbours as a function of degree $k$, denoted by $k_{nn}(k)$ [16]. The data shown in this figure confirm that the firm network has a negative degree-degree correlation.
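The degree-degree correlation $k_{nn}(k)$ can be computed as sketched below. This is a minimal illustration, not the authors' code: we assume that edge direction is ignored and that $k$ is the total (undirected) degree, since the text does not specify which degree is used. The function name `average_neighbour_degree` and the toy edge list are hypothetical.

```python
from collections import defaultdict

def average_neighbour_degree(edges):
    """Return {k: k_nn(k)}, treating each directed edge (i, j) as undirected.

    Assumption: direction is ignored and k is the total degree.
    """
    nbrs = defaultdict(set)
    for i, j in edges:
        if i != j:                     # skip self-loops
            nbrs[i].add(j)             # reciprocal edges collapse into one link
            nbrs[j].add(i)
    deg = {v: len(ns) for v, ns in nbrs.items()}
    sums = defaultdict(float)          # sum of mean neighbour degree, per degree class k
    counts = defaultdict(int)          # number of nodes in degree class k
    for v, ns in nbrs.items():
        sums[deg[v]] += sum(deg[u] for u in ns) / deg[v]
        counts[deg[v]] += 1
    return {k: sums[k] / counts[k] for k in sorted(counts)}

# Toy graph: hub 0 is a supplier of firms 1..4, and firm 1 also buys from firm 2.
toy_edges = [(1, 0), (2, 0), (3, 0), (4, 0), (1, 2)]
print(average_neighbour_degree(toy_edges))  # {1: 4.0, 2: 3.0, 4: 1.5}
```

A $k_{nn}(k)$ that decreases with $k$, as in this toy output, is the signature of a negative degree-degree correlation.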
To clarify the properties of this firm network, we perform a parallel analysis using an artificial random network having the same degree distribution as the firm network [14,17,18]. We generate this artificial random network by using the Markov-chain Monte Carlo switching algorithm [19], which repeatedly chooses pairs of edges at random and switches them from '$X_1 \to Y_1$ and $X_2 \to Y_2$' to '$X_1 \to Y_2$ and $X_2 \to Y_1$' until the network is well randomized. This artificial network, which we call the shuffled network in this paper, is an almost uncorrelated network having the same degree distribution as the original firm network. The red line in figure 1(a) shows the degree-degree correlation of this network. By comparing the behaviour of the real firm network with that of the shuffled network, we can check the effect of the correlation.
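The switching procedure can be sketched as follows. This is a minimal illustration under our own assumptions (a Python edge list without duplicate edges, uniform choice of edge pairs, rejection of moves that would create self-loops or multi-edges, and a fixed number of attempted swaps `n_swaps` in place of a randomization criterion); the function names are hypothetical and this is not the implementation used for the paper.

```python
import random
from collections import Counter

def shuffle_directed(edges, n_swaps, seed=0):
    """Degree-preserving randomisation of a directed edge list (no self-loops, no multi-edges).

    Each accepted move replaces X1->Y1, X2->Y2 by X1->Y2, X2->Y1, so every
    node keeps both its in-degree and its out-degree.
    """
    rng = random.Random(seed)
    edges = list(edges)
    present = set(edges)
    for _ in range(n_swaps):
        a, b = rng.sample(range(len(edges)), 2)
        (x1, y1), (x2, y2) = edges[a], edges[b]
        new_a, new_b = (x1, y2), (x2, y1)
        # Reject moves that would create a self-loop or duplicate an existing edge.
        if x1 == y2 or x2 == y1 or new_a in present or new_b in present:
            continue
        present.difference_update((edges[a], edges[b]))
        present.update((new_a, new_b))
        edges[a], edges[b] = new_a, new_b
    return edges

def degree_sequences(edges):
    """Out-degree and in-degree counters of a directed edge list."""
    return Counter(i for i, _ in edges), Counter(j for _, j in edges)

toy = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2), (1, 3)]
shuffled = shuffle_directed(toy, n_swaps=1000, seed=1)
print(degree_sequences(shuffled) == degree_sequences(toy))  # True
```

Because every accepted move preserves each node's in- and out-degree, the degree distribution is unchanged by construction, while correlations between the degrees of connected nodes are progressively destroyed.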

Statistical properties of individual firms
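We begin by investigating the properties of individual firms. Figure 1(b) shows the cumulative distributions of $s$, $k^{(in)}$ and $k^{(out)}$, where $s$ is the annual sales in 2005, $k^{(in)}$ is the in-degree and $k^{(out)}$ is the out-degree. The data shown in the figure indicate that $s$, $k^{(in)}$ and $k^{(out)}$ obey the following power-law cumulative distribution functions (CDFs):

$$P(>s) \propto s^{-\alpha_s}, \qquad (1)$$
$$P(>k^{(in)}) \propto (k^{(in)})^{-\alpha_{in}}, \qquad (2)$$
$$P(>k^{(out)}) \propto (k^{(out)})^{-\alpha_{out}}, \qquad (3)$$

with exponents $\alpha_s = 1.0$, $\alpha_{in} = 1.3$ and $\alpha_{out} = 1.3$. For sales $s$, this empirical fact, well known as Zipf's law, is observed in various countries [20,21,22,14]. Next, we investigate the correlation between sales and degrees by calculating the conditional mean of $s$ as a function of $k^{(in)}$. As shown in figure 1(c), for large in-degree $k^{(in)}$, $\langle s \rangle_{k^{(in)}}$ can be described as a power law:

$$\langle s \rangle_{k^{(in)}} \propto (k^{(in)})^{\beta_{s|k}}, \qquad (4)$$

where $\langle s \rangle_{k^{(in)}}$ is the conditional mean of $s$ for given $k^{(in)}$. These results imply that the mean of 'sales per in-degree' increases with increasing in-degree. Roughly speaking, we can derive a relationship among $\alpha_{in}$, $\alpha_s$ and $\beta_{s|k}$ by a transformation of random variables. Assume that $k^{(in)}$ obeys the power-law distribution with the probability density function (PDF)

$$p(k^{(in)}) \propto (k^{(in)})^{-\alpha_{in}-1}, \qquad (5)$$

and that $s$ and $k^{(in)}$ satisfy the power-law relationship

$$s \propto (k^{(in)})^{\beta_{s|k}}. \qquad (6)$$

Then, by changing the variables, the PDF of $s$ becomes

$$p(s) \propto s^{-\alpha_{in}/\beta_{s|k}-1}. \qquad (7)$$

Thus, we get the following nontrivial relationship between the power-law indices:

$$\alpha_s = \frac{\alpha_{in}}{\beta_{s|k}}. \qquad (8)$$

This relationship is consistent with the observed values: with $\alpha_{in} = 1.3$ and $\alpha_s = 1.0$, it requires $\beta_{s|k} \approx 1.3$.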
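Relation (8) can be checked numerically on synthetic data. The sketch below assumes an exact Pareto law for $k^{(in)}$ and a deterministic $s = (k^{(in)})^{\beta_{s|k}}$, an idealisation of the noisy empirical relation (4). It estimates tail exponents with the Hill estimator, a standard choice that is our assumption rather than the fitting method used in this paper; the function name `hill_tail_exponent` is hypothetical and the exponent values are those quoted above.

```python
import numpy as np

def hill_tail_exponent(samples, n_tail):
    """Hill estimate of alpha for a tail P(>x) ~ x^(-alpha), from the n_tail largest samples."""
    x = np.sort(np.asarray(samples, dtype=float))[::-1]   # descending order
    threshold = x[n_tail]                                  # (n_tail + 1)-th largest value
    return n_tail / np.sum(np.log(x[:n_tail] / threshold))

rng = np.random.default_rng(42)
alpha_in, beta_sk = 1.3, 1.3          # exponents quoted in the text
u = 1.0 - rng.random(200_000)          # uniform on (0, 1]
k_in = u ** (-1.0 / alpha_in)          # inverse transform: P(>k) = k^(-alpha_in), k >= 1
s = k_in ** beta_sk                    # idealised s proportional to k^beta
print(hill_tail_exponent(k_in, 5_000))  # close to alpha_in = 1.3
print(hill_tail_exponent(s, 5_000))     # close to alpha_in / beta = 1.0
```

Since $s$ is a monotone power of $k^{(in)}$, every log-ratio in the Hill sum is multiplied by $\beta_{s|k}$, so the two estimates differ exactly by the factor $\beta_{s|k}$, which is the discrete counterpart of equation (8).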

Nearest-neighbourhood correlations
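We now consider the relationships between the sales of a firm and the sales of its customers. The customers of the $m$-th node are defined as the nodes whose outgoing edges reach the $m$-th node. We introduce two kinds of weighted sums of customer sales, $s^{(1)}_m$ and $s^{(2)}_m$:

$$s^{(1)}_m = \sum_{i=1}^{N} A_{im}\,\frac{s_i}{k^{(out)}_i}, \qquad (9)$$
$$s^{(2)}_m = \sum_{i=1}^{N} A_{im}\,\frac{k^{(in)}_m}{\sum_{j=1}^{N} A_{ij}\,k^{(in)}_j}\,s_i, \qquad (10)$$

where $A$ is the $N \times N$ adjacency matrix defined by

$$A_{ij} = \begin{cases} 1 & \text{if there is an edge from } i \text{ to } j, \\ 0 & \text{otherwise.} \end{cases} \qquad (11)$$

In the case of $s^{(1)}$, a customer distributes money evenly among all its suppliers, whereas in the case of $s^{(2)}$, a customer distributes money among its suppliers in proportion to the suppliers' in-degrees. We calculate the conditional means of $s^{(1)}$ and $s^{(2)}$ given $s$, denoted by $\langle s^{(1)} \rangle_s$ and $\langle s^{(2)} \rangle_s$, as functions of $s$:

$$\langle s^{(1)} \rangle_s = \frac{\sum_{m=1}^{N} s^{(1)}_m\,\delta(s_m - s)}{\sum_{m=1}^{N} \delta(s_m - s)}, \qquad (12)$$
$$\langle s^{(2)} \rangle_s = \frac{\sum_{m=1}^{N} s^{(2)}_m\,\delta(s_m - s)}{\sum_{m=1}^{N} \delta(s_m - s)}, \qquad (13)$$

where $\delta$ denotes the delta function. In this paper, we estimate the other conditional means $\langle \cdot \rangle_\cdot$ in a similar manner.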
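Equations (9) and (10) translate directly into matrix operations. The sketch below uses a dense NumPy adjacency matrix for readability (for about one million firms a sparse matrix would be needed); the function name `customer_weighted_sums` and the three-firm example are hypothetical.

```python
import numpy as np

def customer_weighted_sums(A, s):
    """Return (s1, s2) of equations (9) and (10) for a dense 0/1 adjacency matrix A.

    A[i, j] = 1 means customer i pays supplier j (edge i -> j).
    """
    A = np.asarray(A, dtype=float)
    s = np.asarray(s, dtype=float)
    k_in = A.sum(axis=0)                       # in-degree of each node
    k_out = A.sum(axis=1)                      # out-degree of each node
    # Equation (9): each customer splits its sales evenly over its suppliers.
    paid = np.where(k_out > 0, s / np.where(k_out > 0, k_out, 1), 0.0)
    s1 = A.T @ paid
    # Equation (10): each customer splits its sales in proportion to supplier in-degree.
    norm = A @ k_in                            # sum_j A_ij k_in_j for each customer i
    share = np.where(norm > 0, s / np.where(norm > 0, norm, 1), 0.0)
    s2 = k_in * (A.T @ share)
    return s1, s2

A = np.array([[0, 1, 1],    # firm 0 buys from firms 1 and 2
              [0, 0, 1],    # firm 1 buys from firm 2
              [1, 0, 0]])   # firm 2 buys from firm 0
s = np.array([100.0, 50.0, 30.0])
s1, s2 = customer_weighted_sums(A, s)
print(s1)          # [ 30.  50. 100.]
print(s2.sum())    # equals s.sum() up to rounding: customer sales are fully redistributed
```

Firm 2 has the largest in-degree, so under $s^{(2)}$ it receives a larger share of firm 0's sales than under the even split of $s^{(1)}$.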
As shown in figures 2(a) and 2(b), for large sales $s$, $\langle s^{(1)} \rangle_s$ and $\langle s^{(2)} \rangle_s$ can be described by power laws:

$$\langle s^{(1)} \rangle_s \propto s^{\gamma}, \quad \gamma \neq 1, \qquad (14)$$
$$\langle s^{(2)} \rangle_s \simeq s. \qquad (15)$$

For $s^{(2)}$, in particular, we observe a simple linear relationship in the region above $10^9$ yen, with a proportionality constant of about 1. The red broken line shows $s^{(2)} = 0.0045\, s^{1.3}$. We confirm that, for the firm network, $\langle s^{(2)} \rangle_s$ is almost equal to $s$ in the large-sales region. In contrast, $\langle s^{(1)} \rangle_s$ for the firm network and $\langle s^{(2)} \rangle_s$ for the shuffled network are not proportional to $s$.
The results shown in figure 2(c) confirm that $\langle s^{(2)} \rangle_s$ is not proportional to $s$ for the shuffled network. In other words, equation (15) does not hold for the shuffled network, which has the same degree distribution as the real firm network.
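Conditional means such as $\langle s^{(2)} \rangle_s$ and their power-law exponents can be estimated by averaging within logarithmic bins. The binning scheme (five bins per decade, geometric bin centres) and the least-squares fit in log-log space are our assumptions, since the text does not state how the bins were chosen; the function names are hypothetical, and the synthetic test simply reuses the form $0.0045\, s^{1.3}$ of the red broken line.

```python
import numpy as np

def log_binned_conditional_mean(x, y, bins_per_decade=5):
    """Estimate <y>_x by averaging y inside logarithmic bins of x (requires x > 0).

    Returns the geometric bin centres and the mean of y for each non-empty bin.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    lo, hi = np.log10(x.min()), np.log10(x.max())
    n_bins = max(1, int(np.ceil((hi - lo) * bins_per_decade)))
    edges = np.logspace(lo, hi, n_bins + 1)
    # digitize(...) - 1 is the bin index; clip keeps x.min() and x.max() in the end bins.
    idx = np.clip(np.digitize(x, edges) - 1, 0, n_bins - 1)
    centres, means = [], []
    for b in range(n_bins):
        in_bin = idx == b
        if in_bin.any():
            centres.append(np.sqrt(edges[b] * edges[b + 1]))
            means.append(y[in_bin].mean())
    return np.array(centres), np.array(means)

def loglog_slope(centres, means):
    """Least-squares slope of log10(mean) against log10(centre)."""
    slope, _intercept = np.polyfit(np.log10(centres), np.log10(means), 1)
    return slope

rng = np.random.default_rng(0)
sales = 10 ** rng.uniform(6, 9, 100_000)          # synthetic, log-uniform sales
neighbour_sum = 0.0045 * sales ** 1.3              # noiseless power law, as in the red line
c, m = log_binned_conditional_mean(sales, neighbour_sum)
print(loglog_slope(c, m))                          # close to 1.3
```

On real data, a fitted slope near 1 with intercept near 0 in log-log space would correspond to the linear relationship (15).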

Models
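In correspondence with the local relationships given in equations (9) and (10), we introduce the following two models of the time evolution of a locally conserved scalar quantity $x_m(t)$.

Model-1 (PageRank model)
$$x_m(t+1) = \sum_{i=1}^{N} A_{im}\,\frac{x_i(t)}{k^{(out)}_i}, \qquad (16)$$

Model-2 (Biased diffusion model)
$$x_m(t+1) = \sum_{i=1}^{N} A_{im}\,\frac{k^{(in)}_m}{\sum_{j=1}^{N} A_{ij}\,k^{(in)}_j}\,x_i(t). \qquad (17)$$

Note that for $k^{(out)}_i = 0$ in equation (16) or $\sum_{j=1}^{N} A_{ij} k^{(in)}_j = 0$ in equation (17), we omit the contributions of the $i$-th node.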
In equation (16), a node (customer) evenly distributes its scalar (money) among its outgoing edges (suppliers); i.e., Model-1 corresponds to the PageRank model [5]. In contrast, Model-2 corresponds to a kind of biased random-walk model [23]. In equation (17), a node (customer) distributes its scalar (money) among its outgoing edges in proportion to the in-degrees of the nodes (suppliers) at which those edges end.
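The redistribution rules of equations (16) and (17) can both be written as a single column-stochastic matrix acting on the state vector. The sketch below is a hypothetical dense-matrix helper (`transition_matrix`, `step`), not code from the paper; it also checks that one update conserves the total scalar when every node has an out-edge.

```python
import numpy as np

def transition_matrix(A, model):
    """Return Q with Q[m, i] = fraction of node i's scalar sent to node m in one step.

    model=1: even split over out-edges, equation (16).
    model=2: split in proportion to the receivers' in-degrees, equation (17).
    Columns of nodes without out-edges stay zero (their contribution is omitted).
    """
    A = np.asarray(A, dtype=float)
    k_in = A.sum(axis=0)
    if model == 1:
        W = A                          # every edge i -> m has weight 1
    elif model == 2:
        W = A * k_in[np.newaxis, :]    # edge i -> m has weight k_in[m]
    else:
        raise ValueError("model must be 1 or 2")
    out_weight = W.sum(axis=1)         # total weight leaving each node i
    safe = np.where(out_weight > 0, out_weight, 1.0)
    return (W / safe[:, np.newaxis]).T

def step(Q, x):
    """One update x(t+1) = Q x(t)."""
    return Q @ x

A = np.array([[0, 1, 1],
              [0, 0, 1],
              [1, 0, 0]])
Q2 = transition_matrix(A, model=2)
print(Q2[:, 0])                        # node 0 sends 1/3 to node 1 and 2/3 to node 2
print(step(Q2, np.ones(3)).sum())      # total is conserved (3 up to rounding)
```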
We apply these time-evolution models to two types of networks. The first network is the largest strongly connected component (LSCC) of the real firm network. A strongly connected component (SCC) is defined as a maximal subset of nodes in a network such that each node can reach all others and is itself reachable from all others along a directed path [24]. The LSCC is the SCC having the largest number of nodes when the firm network is decomposed into SCCs. The LSCC is the core of the firm network, containing 462,602 nodes and 2,583,620 edges. In addition, the statistical properties of the firm network described above also hold for its LSCC. Note that for both Model-1 and Model-2, there is no need to consider boundary conditions or end effects, because an SCC has no exits (nodes having no out-edges) or entrances (nodes having no in-edges). Therefore, we can focus on the properties of the models in the bulk of the network; this is why we use the LSCC of the firm network, rather than the original firm network, in the first step of the numerical simulations.
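Extracting the LSCC is a standard graph operation. One way to do it, assuming SciPy is available, is `scipy.sparse.csgraph.connected_components` with `connection="strong"`; the wrapper `largest_scc` and the toy edge list are hypothetical.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def largest_scc(edges, n_nodes):
    """Return (sorted node ids, induced edge list) of the largest strongly connected component."""
    rows, cols = zip(*edges)
    G = csr_matrix((np.ones(len(edges)), (rows, cols)), shape=(n_nodes, n_nodes))
    _, labels = connected_components(G, directed=True, connection="strong")
    biggest = np.bincount(labels).argmax()          # label of the most populated SCC
    keep = set(np.flatnonzero(labels == biggest).tolist())
    sub_edges = [(i, j) for i, j in edges if i in keep and j in keep]
    return sorted(keep), sub_edges

# 0 -> 1 -> 2 -> 0 is a cycle; 3 and 4 are downstream, 5 is upstream.
toy_edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (5, 0)]
print(largest_scc(toy_edges, 6))  # ([0, 1, 2], [(0, 1), (1, 2), (2, 0)])
```

Within the induced subgraph every node has at least one in-edge and one out-edge, which is why no boundary conditions are needed on the LSCC.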
The second network is the shuffled LSCC of the firm network, generated by the Markov-chain Monte Carlo switching algorithm described above [19]. The shuffled LSCC is an almost uncorrelated network with the same degree distribution as the LSCC of the firm network. For our simulation, we used a shuffled LSCC that itself consists of a single SCC, and we also assume this for the theoretical analysis.

Properties of models
For a strongly connected network, we consider the existence of a steady state for Model-1 and Model-2. Because an SCC has no outlets, the total scalar $\sum_{i=1}^{N} x_i(t) = \sum_{i=1}^{N} x_i(0)$ is conserved in both models. With the normalization $\sum_{i=1}^{N} x_i(t) = 1$, both models are Markov chains, where the probability $p_m(t)$ of existence at the $m$-th node is given by $x_m(t)$, and the transition probability $Q_{mi}$ from the $i$-th node to the $m$-th node is $A_{im}/k^{(out)}_i$ for Model-1 and $A_{im}\,k^{(in)}_m / \sum_{j=1}^{N} A_{ij} k^{(in)}_j$ for Model-2. These Markov chains are irreducible; i.e., every state can be reached from every other state with non-zero probability in a finite number of steps. This property arises because on a strongly connected network a directed path exists between any two nodes, and the transition probability from the $i$-th node to the $j$-th node is non-zero for every node pair with $A_{ij} = 1$. In general, an irreducible Markov chain with a finite number of states has a unique steady state [25]; therefore, both models have a unique steady state on a strongly connected network. We denote this steady state for the given strongly connected network by $p^{(s)}$. By linearity, the steady state $x^{(s)}$ is obtained as

$$x^{(s)} = \left(\sum_{i=1}^{N} x_i(0)\right) p^{(s)}, \qquad (18)$$

where $\sum_{i=1}^{N} x_i(0)$ is the total sum of the initial values.
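The uniqueness argument and equation (18) can be illustrated on a toy strongly connected graph. The sketch below (hypothetical helpers, dense matrices, Model-2 only) computes $p^{(s)}$ as the eigenvector of the transition matrix for eigenvalue 1 and compares it with direct iteration from $x_i(0) = 1$. The toy graph contains cycles of length 2 and 3, so the chain is also aperiodic; aperiodicity is what guarantees that plain iteration converges rather than oscillates, a condition beyond the uniqueness result quoted above.

```python
import numpy as np

def model2_matrix(A):
    """Column-stochastic Model-2 matrix (equation 17); assumes every node has an out-edge."""
    A = np.asarray(A, dtype=float)
    W = A * A.sum(axis=0)[np.newaxis, :]        # edge i -> m weighted by k_in[m]
    return (W / W.sum(axis=1, keepdims=True)).T

def stationary_distribution(Q):
    """Eigenvector of Q for the eigenvalue closest to 1, normalised to sum to 1."""
    vals, vecs = np.linalg.eig(Q)
    v = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return v / v.sum()

def iterate(Q, x0, n_steps=500):
    """Apply x(t+1) = Q x(t) n_steps times."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        x = Q @ x
    return x

# Edges 0->1, 1->2, 2->0 and 1->0: strongly connected, cycles of length 3 and 2.
A = np.zeros((3, 3))
A[0, 1] = A[1, 2] = A[2, 0] = A[1, 0] = 1
Q = model2_matrix(A)
p = stationary_distribution(Q)          # analytically (3/7, 3/7, 1/7)
x = iterate(Q, np.ones(3))              # equation (18): x converges to 3 * p
print(p, x)
```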

Model-1 (PageRank model)
To understand the properties of our models on the LSCC of the firm network and on the shuffled LSCC, we simulate Model-1 using equation (16). Starting with the initial condition $x_i(0) = 1$ ($i = 1, 2, \dots, N$), the CDF of $x$ converges to the steady-state distribution shown in figure 3(a). For both the real and the shuffled network, the distribution of $x$ follows a power law $P(>x) \propto x^{-1.3}$, whose exponent equals the in-degree exponent $\alpha_{in}$. Thus, Model-1 is not consistent with the empirical sales distribution, which follows Zipf's law with exponent $-1$. Figure 3(b) shows the conditional mean of $x$ for given $k^{(in)}$ as a function of $k^{(in)}$, denoted by $\langle x \rangle_{k^{(in)}}$. For both networks, we obtain the following linear function:

$$\langle x \rangle_{k^{(in)}} \propto k^{(in)}. \qquad (19)$$

This result can be explained by the mean-field solution of PageRank [9]. However, it is inadequate to regard $x$ as sales because equation (19) disagrees with the empirical relationship, equation (4).

Model-2 (Biased diffusion model)
We apply the same analysis to Model-2. Starting with the same initial condition as for Model-1, the CDF of $x$ converges quickly to a power law, as shown in figure 3(c), both for the LSCC of the real firm network and for the shuffled LSCC. The exponent of the steady-state power law is about $-1$ for the LSCC of the firm network, which agrees with the empirical observation for sales, equation (1). However, for the shuffled LSCC, the exponent is about $-0.65$, which disagrees with equation (1). Note that the steady state of Model-2 is sensitive to correlations in the network structure. Figure 3(d) shows the conditional mean of $x$ for given $k^{(in)}$ as a function of $k^{(in)}$, denoted by $\langle x \rangle_{k^{(in)}}$. In all cases, its behaviour approximately follows the power law

$$\langle x \rangle_{k^{(in)}} \propto (k^{(in)})^{\beta_{x|k}}, \qquad (20)$$

with an exponent $\beta_{x|k}$ satisfying equation (8), with $\alpha_s$ replaced by the CDF exponent of $x$. For the shuffled LSCC, the exponent $\beta_{x|k}$, which is about 2, is explained by an annealed approximation of the biased random-walk model on an uncorrelated network [23]. However, this exponent disagrees with the empirical exponent in equation (4). Conversely, for the LSCC of the firm network, $\beta_{x|k} \approx 1.3$, which agrees with the empirical exponent in equation (4). These results indicate that the power-law exponents are connected to the degree-degree correlations of the network.
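As a worked check, inserting the exponents reported above into equation (8), with $\alpha_s$ replaced by the CDF exponent $\alpha_x$ of $x$ and $\alpha_{in} = 1.3$, reproduces both simulated CDF exponents:

```latex
\alpha_x = \frac{\alpha_{in}}{\beta_{x|k}}:
\qquad \text{LSCC of the firm network: } \frac{1.3}{1.3} = 1.0,
\qquad \text{shuffled LSCC: } \frac{1.3}{2} = 0.65 .
```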

Comparisons between simulations and observations
In this section, we consider the statistical properties of the entire firm network. For general networks, non-trivial steady states of Model-1 and Model-2 do not always exist. In addition, because the scalar quantity defined on the nodes flows into nodes that have no out-edges, most nodes in the bulk of the network have zero scalar after many time steps.
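To obtain a non-trivial steady state, we add the effects of injection and dissipation in the following forms:

Model-1 (PageRank model)
$$x_m(t+1) = r \sum_{i=1}^{N} A_{im}\,\frac{x_i(t)}{k^{(out)}_i} + f, \qquad (21)$$

Model-2 (Biased diffusion model)
$$x_m(t+1) = r \sum_{i=1}^{N} A_{im}\,\frac{k^{(in)}_m}{\sum_{j=1}^{N} A_{ij}\,k^{(in)}_j}\,x_i(t) + f, \qquad (22)$$

where $0 < 1 - r \le 1$ is the dissipation factor and $f > 0$ is the injection term, a positive constant. Note that, for $k^{(out)}_i = 0$ in equations (21) and (22), we omit the contributions of the $i$-th node. Both models can be written as

$$x(t+1) = rBx(t) + f, \qquad (23)$$

where $x(t)$ is the state vector at time $t$, $B$ is the square matrix with elements $B_{mi} = A_{im}/k^{(out)}_i$ (Model-1) or $B_{mi} = A_{im}\,k^{(in)}_m / \sum_{j=1}^{N} A_{ij} k^{(in)}_j$ (Model-2), and $f$ denotes the vector whose components all equal $f$. In general, starting from any initial state, this time evolution converges to a unique steady state provided that the maximum absolute eigenvalue of $rB$ is less than 1 [26]. Denoting the eigenvalue of $rB$ with maximum absolute value by $\lambda$ and the corresponding eigenvector by $y$ gives

$$\lambda y_m = r \sum_{i=1}^{N} B_{mi}\, y_i. \qquad (24)$$

From this definition, for Model-1 we obtain

$$\sum_{m=1}^{N} B_{mi} = \sum_{m=1}^{N} \frac{A_{im}}{k^{(out)}_i} = 1 \quad (k^{(out)}_i > 0), \qquad (25)$$

and for Model-2 we obtain

$$\sum_{m=1}^{N} B_{mi} = \frac{\sum_{m=1}^{N} A_{im}\, k^{(in)}_m}{\sum_{j=1}^{N} A_{ij}\, k^{(in)}_j} = 1 \quad (k^{(out)}_i > 0). \qquad (26)$$

Thus, in both cases, since the columns of nodes without out-edges are zero,

$$\sum_{m=1}^{N} B_{mi} \le 1. \qquad (27)$$

Taking absolute values in equation (24), summing over $m$ and substituting equation (27), we obtain

$$|\lambda| \sum_{m=1}^{N} |y_m| \le r \sum_{i=1}^{N} \left(\sum_{m=1}^{N} B_{mi}\right) |y_i| \le r \sum_{i=1}^{N} |y_i|. \qquad (28)$$

Thus,

$$|\lambda| \le r < 1. \qquad (29)$$

Therefore, starting from any initial state, $x$ converges to a unique steady state.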
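The convergence argument can be checked numerically. The sketch below builds the matrix $B$ of equations (21) and (22) for a toy network with a sink node (a firm with no out-edges), iterates equation (23) from $x(0) = 0$, and compares the result with a direct solution of $(I - rB)x = f$, which exists because $|\lambda| \le r < 1$. The helper names, the toy graph and the values $r = 0.95$, $f = 1$ are illustrative assumptions.

```python
import numpy as np

def transport_matrix(A, model):
    """B[m, i] of equations (21)/(22); columns of nodes without out-edges are zero."""
    A = np.asarray(A, dtype=float)
    W = A if model == 1 else A * A.sum(axis=0)[np.newaxis, :]   # model 2: weight k_in[m]
    out_weight = W.sum(axis=1)
    return (W / np.where(out_weight > 0, out_weight, 1.0)[:, np.newaxis]).T

def steady_state_iterative(B, r, f, tol=1e-12, max_steps=100_000):
    """Iterate x(t+1) = r B x(t) + f from x(0) = 0 until the update is below tol."""
    x = np.zeros(B.shape[0])
    for _ in range(max_steps):
        x_new = r * (B @ x) + f
        if np.max(np.abs(x_new - x)) < tol:
            return x_new
        x = x_new
    raise RuntimeError("did not converge")

def steady_state_direct(B, r, f):
    """Solve (I - r B) x = f directly."""
    n = B.shape[0]
    return np.linalg.solve(np.eye(n) - r * B, np.full(n, f, dtype=float))

# Toy network: 0->1, 1->2, 2->0 and 1->3, where node 3 has no out-edges.
A = np.zeros((4, 4))
A[0, 1] = A[1, 2] = A[2, 0] = A[1, 3] = 1
r, f = 0.95, 1.0
for model in (1, 2):
    B = transport_matrix(A, model)
    rho = np.max(np.abs(np.linalg.eigvals(r * B)))   # spectral radius, at most r
    x_it = steady_state_iterative(B, r, f)
    x_dir = steady_state_direct(B, r, f)
    print(model, rho <= r, np.allclose(x_it, x_dir))
```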
In figure 4 we compare the CDFs obtained from the simulations with the observed CDF. Figure 4(a) shows the results for Model-1 on the firm network, figure 4(b) shows the results for Model-2 on the firm network, and figure 4(c) shows the results for Model-2 on the shuffled network. For these figures, we used $r = 0.95$ and $f = 1.33 \times 10^5$ (in units of 1000 yen). Under these conditions, figures 4(a) and 4(b) show that the CDF of $x$ for Model-2 agrees well with the sales distribution observed for the firm network, whereas the CDF of $x$ for Model-1 disagrees with the observed CDF. In addition, the result shown in figure 4(c) implies that Zipf's law, which is observed in actual data, is strongly related to the degree-degree correlation mentioned in the preceding section.
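A simple consequence of equations (21) and (22), not stated in the text but following from the fact that each column of $B$ sums to 1 (or to 0 for nodes without out-edges), bounds the total steady-state flow and hence the mean of $x$ for the parameters used here:

```latex
\sum_{m} x_m(t+1) = r \sum_{i:\,k^{(out)}_i > 0} x_i(t) + N f \le r \sum_{i} x_i(t) + N f
\;\Longrightarrow\;
\frac{1}{N} \sum_{m} x^{(s)}_m \le \frac{f}{1-r} = \frac{1.33 \times 10^{5}}{0.05} = 2.66 \times 10^{6} \ \text{(1000 yen)}.
```

Equality holds only if every node has at least one out-edge; nodes without out-edges act as sinks and lower the mean below this bound.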
Next, we compare the simulated values of $x$ with the sales of the real firms one by one. Figures 4(d), 4(e) and 4(f) show the actual firm sales on the vertical axis and the value of $x$ obtained from the above-mentioned network-flow simulation on the horizontal axis. Figure 4(d) shows the result for Model-1 on the firm network, figure 4(e) for Model-2 on the firm network, and figure 4(f) for Model-2 on the shuffled network. Figure 4(e) shows that, for large $x$, in the case of Model-2, the conditional mean of actual sales $s$ given the simulated sales $x$, denoted by $\langle s \rangle_x$, is almost equal to $x$. Meanwhile, figure 4(d) shows that for Model-1 applied to the firm network, $\langle s \rangle_x$ is proportional to about the 1.3 power of $x$, whereas for Model-2 applied to the shuffled network it is proportional to about the 0.7 power of $x$, as shown in figure 4(f). In both of these cases, $\langle s \rangle_x$ is not proportional to $x$. This result suggests that Model-2 with injection and dissipation, applied to the firm network, roughly reproduces the sales of actual firms for simulated sales $x$ larger than about $3 \times 10^6$ (1000 yen).

Conclusion and discussion
In this paper, we demonstrated that we can roughly estimate the sales of firms from the structure of the Japanese inter-firm trading network. First, we found a simple linear local relationship between the sales of a firm and the weighted sum of the sales of its customers by analysing data from the Japanese inter-firm trading network and the corresponding sales data. Next, we introduced a model (Model-2) that satisfies this local linear relationship between adjacent nodes. In this model, a firm (customer) distributes money to its out-edges (suppliers) in proportion to the in-degrees of the destinations. By using this model to numerically simulate the real firm network, we confirmed that the steady flows derived from the money-transport model reproduce the distribution of real firm sales and, on average, the sales of individual firms. In addition, we confirmed that the PageRank model (Model-1), which corresponds to an equal distribution of money to out-edges, does not reproduce the distribution of sales. Note that Model-2 corresponds to a biased random walk whose transition probabilities are proportional to the in-degrees of the destinations. Therefore, based on our model, we argue that actual firm sales are proportional, on average, to the existence probability (or the mean stay time) in the steady state of a biased random walker on the firm network.