Network Structure of Japanese Firms. Scale-Free, Hierarchy, and Degree Correlation: Analysis from 800,000 Firms

We analyze fundamental characteristics of the inter-firm transaction network using data on 800,000 Japanese firms. We find that this transaction network has a hierarchical structure and a negative degree correlation. We also find that the undirected network is scale-free. We bring these characteristics to light and discuss why research on the actual network structure is needed.

1 Introduction
nity should have. They also contend that cooperation is diminished in PD games on hierarchical scale-free networks. Aoki and Yoshikawa (2006) discuss the importance of a hierarchical structure in Finance and Economics.
Turning to degree correlation, co-star and co-author networks have a positive degree correlation. In contrast, networks appearing in Biology, such as gene networks, protein networks, nerve circuits, and food chains, have a negative correlation. Artificial networks like power grids and the Internet have a negative but weak correlation. Degree correlation thus differs across networks. Positive correlation tends to lead to a lower percolation transition point (Newman, 2002; Callaway et al., 2001). It is also well known that degree correlation affects the synchronization of oscillators on networks (Motter et al., 2005; Di Bernardo et al., 2007; Sorrentino et al., 2006; Di Bernardo et al., 2005). In this sense, degree correlation is also an important network characteristic.

Outline of this Paper
The outline of this paper is as follows. Section 2 consists of three subsections. The first subsection presents the approach to identifying hierarchical structures and degree correlation. It comprises four blocks. We begin by explaining the figure that we will refer to many times thereafter. Then, we briefly touch upon random and scale-free networks. Next, we describe the hierarchical structure of a network by comparing it with other types of networks that are devoid of hierarchy, and we describe the approach to identifying a hierarchical structure. Finally, we explain what is meant by degree correlation. The second subsection has three blocks. Here, we first describe the data. Thereafter, we explain the method through which the degree distribution is calculated. Finally, we introduce clustering coefficients. In the last subsection, we show the results of the study and discuss them in three blocks. First, we review the degree distribution of the Japanese inter-firm undirected network. Next, we show that the network has a hierarchical structure by analyzing the clustering coefficient. Finally, we show that the network has a negative degree correlation. Section 3 contains the conclusion.

First, we explain Fig.1 briefly. In this figure, there are three types of networks: random networks, scale-free networks, and hierarchical networks. We will explain the networks in this order. In the figure, k stands for the degree, which is defined as the number of links that the vertex has. The first row (a) illustrates an example of each network, the second row (b) exhibits the degree distribution P(k), and the third row (c) shows the clustering coefficient C(k). The latter two terms will be explained later.

Scale-Free Network and Random Network
First, let us explain scale-free networks. A scale-free network is a network whose degree distribution follows a power law:
$$P(k) \sim k^{-\gamma} \tag{1}$$
The degree is the number of links that a vertex has and is denoted by $k$. A scale-free network is quite different from the "Random Graph" that was defined by Erdos and Renyi (1959). A random graph is constructed in the following way. Choose two vertices and either link them with uniform probability $p$ or do not link them with probability $1 - p$. Complete this procedure for all pairs of vertices. If the whole network has $n$ vertices, the degree distribution is binomial:
$$P(k) = \binom{n-1}{k} p^{k} (1-p)^{n-1-k} \tag{2}$$
In the limit of large $n$ with the mean degree held fixed, Eq.(2) becomes the Poisson distribution
$$P(k) = \frac{e^{-\lambda} \lambda^{k}}{k!} \tag{3}$$
Here, $\lambda$ is the mean degree of the network. In Fig.1, Aa and Ab illustrate a random graph and its degree distribution.
On the other hand, Ba and Bb in Fig.1 show a scale-free network and its degree distribution in a log-log plot. In a random graph, there are no vertices with very large degrees. In a scale-free network, in contrast, there are a small number of vertices with very large degrees that are called "Hubs." Roughly speaking, the two networks differ from each other in that hubs are present in one while being absent in the other.
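The contrast between the two degree distributions can be checked numerically. The following is a minimal sketch, not part of the original analysis: the function names, the network size n = 1000, the mean degree 4, and the truncation of the power law at k_max are all illustrative assumptions. It compares the binomial and Poisson forms of a random graph and shows how much more probability a power law places on large degrees.

```python
import math

def binomial_degree_pmf(k, n, p):
    """Degree distribution of a random graph with n vertices and link probability p."""
    return math.comb(n - 1, k) * p**k * (1 - p) ** (n - 1 - k)

def poisson_degree_pmf(k, lam):
    """Poisson limit of the binomial form, with mean degree lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def power_law_pmf(k, gamma, k_max):
    """Power law P(k) ~ k^(-gamma), truncated to 1..k_max and normalized (an assumption)."""
    norm = sum(j ** -gamma for j in range(1, k_max + 1))
    return k ** -gamma / norm

n, lam = 1000, 4.0          # illustrative values, not taken from the data
p = lam / (n - 1)           # link probability that gives mean degree lam
for k in (1, 4, 10, 50):
    print(k,
          binomial_degree_pmf(k, n, p),
          poisson_degree_pmf(k, lam),
          power_law_pmf(k, 2.4, n - 1))
```

For small k the binomial and Poisson columns nearly coincide, while at k = 50 only the power-law column remains non-negligible, which is the numerical counterpart of hubs being present in one network and absent in the other.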

Identifying Hierarchical Structure
The third network (C) is a hierarchical network and has a scale-free degree distribution $P(k) \sim k^{-\gamma}$. The difference between a scale-free network (B) and a hierarchical network (C) is depicted in the third row (Bc, Cc), which illustrates the clustering coefficients of these networks. The clustering coefficient, which will be explained later, of the scale-free network in the figure is constant, $C(k) = \text{Const}$. On the other hand, the clustering coefficient of a hierarchical network depends on $k$ as $C(k) \sim k^{-1}$. Comparing the hierarchical network with the other two types of networks makes its structure clear. In many real networks, the clustering coefficient and the degree have the special relation $C(k) \sim k^{-1}$. Examples of this relation are the Co-Actor network, the Language network, the World Wide Web, and the Internet at the autonomous system level. Ravasz and Barabási (2003) show that this relation, $C(k) \sim k^{-1}$, implies a hierarchical structure. The authors term networks like (Cc) in Fig.1 "hierarchical structure" networks. Barabási and Oltvai (2004) have also explained this relation and hierarchical structure.
As graph (Cc) shows, $\log C(k)$ and $\log k$ are linearly related with slope $-1$ in the hierarchical network, while the clustering coefficient $C(k)$ is constant in the other two networks. Therefore, in order to detect the relation $C(k) \sim k^{-1}$ that implies a hierarchical structure as in (Ca) in Fig.1, we study the clustering coefficient.

Degree Correlation
We study the degree correlation of the network. Degree correlation is defined by the following equation:
$$k_{nn}(k) = \sum_{k'} k' \Pr(k' \mid k) \tag{4}$$
Here, $\Pr(k' \mid k)$ is the conditional probability that a vertex with degree $k$ is adjacent to a vertex with degree $k'$. In a nutshell, $k_{nn}(k)$ is the mean degree of the neighbors of a vertex with degree $k$. If $k_{nn}(k)$ decreases with $k$, the degree correlation is negative; if it increases with $k$, the degree correlation is positive. This is also an important network characteristic.
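The mean neighbor degree described above can be computed directly from an adjacency structure. The sketch below is only an illustration under stated assumptions: the function name, the dict-of-sets representation, and the four-vertex star network are hypothetical and not taken from the paper's data.

```python
from collections import defaultdict

def mean_neighbor_degree_by_degree(adjacency):
    """Return {k: k_nn(k)}, the mean degree of neighbors of vertices with degree k.

    adjacency maps each vertex to the set of its neighbors (undirected network).
    """
    degree = {v: len(neighbors) for v, neighbors in adjacency.items()}
    sums = defaultdict(float)
    counts = defaultdict(int)
    for v, neighbors in adjacency.items():
        if not neighbors:
            continue  # isolated vertices have no neighbors to average over
        k = degree[v]
        sums[k] += sum(degree[u] for u in neighbors) / k
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sums}

# Toy star network: one hub linked to three leaves (hypothetical example).
adjacency = {
    "hub": {"x", "y", "z"},
    "x": {"hub"},
    "y": {"hub"},
    "z": {"hub"},
}
print(mean_neighbor_degree_by_degree(adjacency))
```

In the star, the degree-3 hub has neighbors of degree 1 and each degree-1 leaf has a neighbor of degree 3, so the mean neighbor degree falls as the degree rises. This is the simplest instance of a negative degree correlation.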

The Data
The data for this study have been supplied by Tokyo Shoko Research, Ltd. (TSR) via RIETI and consist of the financial data and network relationships of 800,000 firms; in other words, the data comprise their buying, selling, and shareholder relationships. The data contain reports on 4,000,000 of the firms' relations and include information on gross sales, region, year of establishment, number of employees, number of offices, number of factories, industrial classification, and so on. We did not use the data of any firm that does not report its gross sales; thus, the number of firms we used was reduced to 800,000. We used the buying and selling relationships between the firms. This data set was created by asking the firms to report their business partners. Thus, the data set suffers from one limitation: it does not include all the transactional relationships.

In this paper, we construct an undirected network in which we do not distinguish the fact that firm A sells to firm B from the fact that firm B sells to firm A. We do so because we believe that an undirected network is the appropriate object when analyzing the clustering coefficient, and that the clustering coefficient of an undirected network is more essential than that of a directed one. Thus, this paper studies only whether a transaction exists between two firms.

We built an adjacency matrix, which is the common method through which networks are analyzed. We set element ij to 1 if there is any transaction between firm i and firm j, regardless of whether it involves buying or selling, and to 0 if no transaction takes place between them. Hence, the number of transactions between firms is not considered either; our only concern is whether any transaction has taken place. In fact, we are unable to discern the number of transactions between specific firms from the data. New data that include the number of transactions would therefore be able to reveal new information.
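The construction of the undirected network can be sketched in a few lines. This is an illustrative sketch rather than the procedure actually used: the function name, the (seller, buyer) report format, and the decision to skip self-reports are assumptions. Because a dense matrix for 800,000 firms would be impractical, the sketch stores, for each firm, the set of partners for which element ij equals 1.

```python
def build_undirected_adjacency(reports):
    """Build an undirected network from (seller, buyer) reports.

    Direction and repeated reports are discarded: a link exists if any
    transaction is reported in either direction.
    """
    adjacency = {}
    for seller, buyer in reports:
        if seller == buyer:
            continue  # skipping self-reports is an assumption, not from the paper
        adjacency.setdefault(seller, set()).add(buyer)
        adjacency.setdefault(buyer, set()).add(seller)
    return adjacency

# Hypothetical reports: A sells to B (reported twice), B sells to A, B sells to C.
reports = [("A", "B"), ("B", "A"), ("A", "B"), ("B", "C")]
adjacency = build_undirected_adjacency(reports)
degrees = {firm: len(partners) for firm, partners in adjacency.items()}
print(degrees)
```

Using sets makes the result symmetric and insensitive to how many times a relationship is reported, which matches the statement that only the existence of a transaction matters.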

Degree Distribution
Now, we present the method of calculating the degree distribution in detail. First, we count all the links from the vertices with degree k. Let this be denoted by "All degree(k)." Remember that the degree indicates the number of links that the vertex has. Second, we calculate $P(k)$ from All degree(k), normalized so that the distribution sums to one. In order to detect a scale-free distribution, it is better to draw the CDF rather than the PDF, the reason for which is given in Dorogovtsev and Mendes (2003). Hence, we illustrate the CDF.
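A cumulative plot of this kind can be prepared from a list of vertex degrees as follows. This is a minimal sketch under assumptions: the function name is hypothetical, and it uses the convention "fraction of vertices with degree at least k", which differs from 1 − CDF only by a one-step shift and keeps every point positive for a log-log plot. For a power law $P(k) \sim k^{-\gamma}$, the cumulative tail decays roughly as $k^{-(\gamma - 1)}$, so its log-log slope is shallower than that of the PDF by one.

```python
from collections import Counter

def degree_ccdf(degrees):
    """Return a list of (k, fraction of vertices with degree >= k), sorted by k."""
    counts = Counter(degrees)
    n = len(degrees)
    remaining = n            # vertices with degree >= current k
    ccdf = []
    for k in sorted(counts):
        ccdf.append((k, remaining / n))
        remaining -= counts[k]
    return ccdf

# Hypothetical degree sequence for illustration only.
for k, tail in degree_ccdf([1, 1, 2, 3]):
    print(k, tail)
```

The cumulative form averages out the noise in the sparsely populated large-degree region, which is one practical reason it is easier to read than a raw histogram.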

Clustering Coefficient
Now, we provide the definition of the clustering coefficient. The clustering coefficient can be defined for every vertex whose degree is larger than 1. For example, the clustering coefficient of vertex $j$ is defined as follows:
$$C_j = \frac{\text{The number of triangles around vertex } j}{\text{The maximum number of possible triangles around vertex } j} \tag{6}$$
Figure 2: Explanation of the clustering coefficient.

Fig.2 explains the clustering coefficient. The number of triangles around vertex $j$ is 2. The maximum number of triangles we could make around vertex $j$ is ${}_4C_2 = 6$ because there are 4 vertices around vertex $j$. Thus, the clustering coefficient of vertex $j$ is $2/6 = 1/3$. If the degree is 1, we cannot define the clustering coefficient because we cannot make any triangle around the vertex. Recall that the degree is defined as the number of edges that the vertex has. For example, in Fig.2, the degree of vertex $j$ is 4.

In a nutshell, the clustering coefficient measures how densely the neighbors of a vertex are connected to one another. We use a friends' network as an example. If the clustering coefficient is large, it is likely that a friend of a friend is also your friend. The same is unlikely if the clustering coefficient is small. It is worth noting that many real networks have a larger clustering coefficient and a smaller mean path length than random networks. The mean path length is defined as the average of the path lengths over all the pairs of vertices.

Results and Discussions
We study the transaction network of Japanese firms. The solid line in Fig.3 illustrates the $1 - \text{CDF}$ of the degree distribution in a log-log plot. Saito et al. (2007) show that the directed network of the transactions of Japanese firms, in which the buying and selling transactions are distinguished from each other, has a scale-free distribution. On the other hand, we show that the degree distribution of the undirected network, in which we consider that the firms are linked if there exists either transaction, follows a scale-free distribution of the form $P(k) \sim k^{-2.4}$. The $1 - \text{CDF}$ of $P(k) \sim k^{-2.4}$ is illustrated by a dashed line in the same figure.
The drop of the actual data line on the right of the figure stems from the fact that there is only a finite number of vertices in the network. To observe a perfect scale-free distribution, we would need a network with an infinite number of vertices.

Hierarchical Structure of Japanese Firms
In this section, we discuss the hierarchical structure of the transaction network of Japanese firms that is implied by the clustering coefficient.
We plot the clustering coefficient of the transaction network of 800,000 Japanese firms. Fig.4 depicts the scatter plot and the estimated line. The x-axis is log(k) and the y-axis is log(clustering coefficient). Table 1 shows the estimation result. The estimated relation, Eq.(7), is linear in log(k), and its $R^2$ is 0.66. Remember that $C(k)$ stands for the clustering coefficient of the vertex with degree $k$. Eq.(7) shows that the coefficient of log(k) is very close to $-1$; this relation is equivalent to $C(k) \sim k^{-1}$, which is the desired relation. As we have discussed previously, this relation implies that the transaction network of Japanese firms not only has a scale-free structure but also possesses a hierarchical structure. This structure is exemplified by (Ca) in Fig.1.
Fig.4 seems to comprise fewer than 4 structures of dots aligned on lines with a negative slope. We need to explain this observation. Remember that the clustering coefficient is defined as
$$C(k) = \frac{\text{The number of triangles}}{k(k-1)/2} \sim \frac{\text{The number of triangles}}{k^{2}/2} \tag{8}$$
The number of triangles in Eq.(8) is discrete, i.e., 1, 2, 3, and so on. Hence, the bottommost structure consists of points that comprise 1 triangle, and thus, their clustering coefficients are $2 \times 1/k^{2}$. Similarly, the second structure from the bottom consists of points that comprise 2 triangles, so their clustering coefficients are $2 \times 2/k^{2}$. The clustering coefficients of the third structure from the bottom are $2 \times 3/k^{2}$, and so on. Since we take their logarithm, the slope of these structures is $-2$, so that in the region where the number of triangles is small, the points appear to be aligned.
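Both the definition of the clustering coefficient and the slope of the aligned structures can be verified with a short computation. The sketch below is illustrative only: the function names are hypothetical, and the toy network is an assumed configuration with 4 neighbors and 2 triangles around vertex j, chosen to match the counts quoted for Fig.2 rather than copied from the figure.

```python
import math
from itertools import combinations

def clustering_coefficient(adjacency, vertex):
    """Triangles around the vertex divided by the k(k-1)/2 possible ones.

    Returns None when the degree is below 2, where the coefficient is undefined.
    """
    neighbors = adjacency[vertex]
    k = len(neighbors)
    if k < 2:
        return None
    triangles = sum(1 for u, v in combinations(neighbors, 2) if v in adjacency[u])
    return triangles / (k * (k - 1) / 2)

def band_slope(triangles, k_low, k_high):
    """Log-log slope of C = triangles / (k(k-1)/2) between two degrees."""
    c_low = triangles / (k_low * (k_low - 1) / 2)
    c_high = triangles / (k_high * (k_high - 1) / 2)
    return (math.log(c_high) - math.log(c_low)) / (math.log(k_high) - math.log(k_low))

# Assumed layout: j has neighbors a, b, c, d; links a-b and c-d close two triangles.
adjacency = {
    "j": {"a", "b", "c", "d"},
    "a": {"j", "b"},
    "b": {"j", "a"},
    "c": {"j", "d"},
    "d": {"j", "c"},
}
print(clustering_coefficient(adjacency, "j"))   # 2 triangles out of 6 possible pairs
print(band_slope(1, 100, 1000))                 # slope of the one-triangle structure
print(band_slope(3, 100, 1000))                 # same slope for three triangles
```

The slope does not depend on the number of triangles, only the vertical offset does, which is why the structures appear as parallel lines that are steeper than the overall fitted line.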
We would like to mention the following. Barabási and Albert (1999) introduced the famous scale-free network generating mechanism known as preferential attachment. In a nutshell, the higher a vertex's degree, the more likely it is to attract links from other vertices. However, it is well known that the network generated by this preferential attachment mechanism does not have a hierarchical structure: the relation $C(k) \sim k^{-1}$ cannot be observed in it. Since the transaction network has a hierarchical structure, another mechanism must be governing the firms' transaction network.
Another important discovery is the existence of a degree-degree correlation. The degree and the mean of the next neighbor degrees are related by a power law in this network.

Degree Correlation
Here, $k_{nn}$ stands for the mean of the next neighbor degrees. The result is
$$k_{nn} = 1289\, k^{-0.546} \tag{9}$$
Table 2 shows the regression results.
We thus obtain $\log(k_{nn}) = -0.546 \log(k) + 7.162$, which is almost the same as Eq.(9) because $e^{7.162} \approx 1289$. Further, $R^2$ is 0.681. Since the exponent is negative, $k_{nn}$ decreases with $k$; that is, the network has a negative degree correlation. Fig.5 illustrates the scatter plot and fitted curve, while Fig.6 shows the log-log plot and fitted curve.
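The correspondence between the power-law form and the log-log regression can be illustrated with an ordinary least squares fit. This sketch is an assumption-laden illustration, not the estimation reported in Table 2: the function name is hypothetical, natural logarithms are assumed, and the data points are synthetic, generated without noise from the reported relation rather than taken from the firm data.

```python
import math

def fit_power_law(ks, ys):
    """Least squares fit of log(y) = slope * log(k) + intercept."""
    xs = [math.log(k) for k in ks]
    ls = [math.log(y) for y in ys]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_l = sum(ls) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxl = sum((x - mean_x) * (l - mean_l) for x, l in zip(xs, ls))
    slope = sxl / sxx
    intercept = mean_l - slope * mean_x
    return slope, intercept

# Synthetic, noise-free points generated from the reported relation.
ks = [1, 2, 5, 10, 50, 100]
knn = [1289 * k ** -0.546 for k in ks]
slope, intercept = fit_power_law(ks, knn)
print(slope, math.exp(intercept))   # exponent and prefactor recovered by the fit
print(math.exp(7.162))              # prefactor implied by the reported intercept
```

The slope of the log-log fit is the exponent of the power law and the exponential of the intercept is its prefactor, so the two reported forms describe the same relation up to rounding.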

Conclusion
We study the transaction network among Japanese firms by analyzing its degree distribution, clustering coefficient, and degree correlation. We discover the following three important characteristics. First, the undirected network is a scale-free network. Second, the network has a hierarchical structure. Third, the network has a negative degree correlation.
As mentioned earlier, we believe that the study of a real network will lead to further research that will reveal the hidden relation between the underlying network structure and the economy. We need information on the real network structure when we build models on networks. In many other fields, it has been discovered that micro and macro properties depend on the network structure: for example, on whether the network is random or scale-free, on whether it has a hierarchical structure, and on its clustering coefficients and degree correlation. We expect that a similar relation will be discovered in Economics as well. In this sense, the study of a real network toward obtaining the above information is crucial.