Dynamics of essential interaction between firms on financial reports

Companies tend to publish financial reports in order to articulate strategies, disclose key performance measurements and summarise the complex relationships with external stakeholders that result from their business activities. Therefore, any major changes to business models or key relationships will naturally be reflected within these documents, albeit in an unstructured manner. In this research, we automatically scan a large and rich database, containing over 400,000 reports of companies in Japan, in order to generate structured sets of data that capture the essential features, interactions and resulting relationships among these firms. In doing so, we generate a citation-type network in which we empirically observe node creation, annihilation and link rewiring to be the dominant processes driving its structure and formation. These processes prompt the network to evolve rapidly, with over a quarter of the interactions between firms being altered within every single calendar year. In order to confirm our empirical observations and to highlight and replicate the essential dynamics of each of the three processes separately, we borrow inspiration from ecosystems and evolutionary theory. Specifically, we construct a network evolutionary model in which we adapt and incorporate the concept of fitness, a real number that serves as a proxy measure of a company's importance, within our numerical analysis. By making use of parameters estimated from the real data, we find that our model reliably replicates the degree distributions and motif formations of the citation network, thereby reproducing both macro-level and micro-level (local) structural features. The exception is the real frequency of bidirectional links, which are primarily formed as a result of an entirely separate and distinct process, namely equity investments from one company into another.


Introduction
Recent developments in the field of complexity science have led to a renewed interest in social and economic activities [1][2][3] being captured as complex systems characterised by properties such as small world [4] and scale-free [5]. It has previously been observed that business firm networks belong to a class of complex systems whose elements interact with one another in distinct ways, accompanied by specific scaling relations [6][7][8][9][10][11][12][13][14][15]. A typical example of a business firm network is the inter-firm business transactions network, in which nodes are firms linked through business transactions between customers and suppliers, producing a directional money flow (with the goods/service flow running in the opposite direction). Recent studies have found that this type of network shares generic properties with other complex networks, but with particular formation dynamics. As a result, models have been proposed to understand the dynamics of these networks [11,16,17,18,19,20] beyond the generic formation models. Here, we highlight the work carried out by Takayasu et al. [11], an investigation into the Japanese inter-firm business transactions network covering over 1 million firms in Japan, and the subsequent follow-on work carried out by Miura et al. [18]. Within the latter, a simple business network model, in which a directed link connecting nodes represents the money flow between a pair of firms, is proposed in order to evaluate the effects of new establishments, bankruptcies, and mergers and acquisitions (M&As). These business processes are represented by the creation, removal and aggregation of nodes and related links, respectively. The model accurately reproduced the statistical characteristics of the inter-firm business transactions network. The dataset underpinning the above research was provided by one of the leading corporate research bureaus in Japan. It encompasses information on business trading partners, such as customers and suppliers, for almost the entire corporate ecosystem within Japan, including both public and private firms. The dataset contains detailed information on circa 85% of all registered companies within the country, which make up 98% of total annual sales [15].
Additional related research provides insight into the concept of 'metabolism' of firms [15,18]. It shows that the cumulative distribution of the age of firms since establishment is well characterised by an exponential function, exp(−t/τ), where τ ≈ 55 years is the characteristic decay time, so that a firm can be assumed to disappear randomly following a Poisson process. Likewise, the distribution of the duration of business transactions is also well approximated by an exponential function with τ ≈ 6 years [21]. Therefore, only about 1.8% of nodes and 16.7% of links on the inter-firm business transactions network are replaced over the following year. In other words, these characteristics indicate that most components of the inter-firm business transactions network are stable.
Furthermore, Atalay et al. [16] proposed a scale-free model and confirmed it using yearly firm-level buyer-supplier networks of the US economy, covering a total of over 39,000 firm-year observations from 1979 to 2007. The dataset used for that research was compiled by one of the credit rating agencies, which held lists of core partners of all the listed firms and a few private firms, generated from company information such as financial reports. In general, public firms are required by regulatory listing requirements to disclose all audited financial statements. Almost all firms refer to essential information such as business partners and capital ties in their financial reports in order to explain the prospects of their businesses to external stakeholders [22,23]. Hence, it can be expected that changes to business models and fundamental relationships are reflected within the yearly financial reports, and that the essential dynamics of interactions among firms can therefore be derived from the financial reports. However, this potentially differs from the specific dynamics of the inter-firm business transactions networks described earlier, as the latter only provides information related to major transactions whereas the former also contains middle-sized and smaller business transactions. In any case, little is currently known about the differences between these two network dynamics, owing to the lack of data on the financial reports of smaller private firms.
In summary, although a network solely derived from financial information is only partial, it is found to replicate the key structural features of the wider real interfirm business network, with more specific differences reconciled by a relatively simple evolutionary dynamic algorithm [24].
Motivated by these discussions, a key objective of this study is to investigate the dynamics of firms' interactions derived from the financial reports and how they interrelate with the real business transaction networks. Here, we make use of a comprehensive dataset of Japanese firms' financial reports collected by one of the largest corporate research providers in Japan. This dataset covers not only listed firms but also small and middle-scaled private firms in Japan, about 400,000 firms in total, so that it is large enough to compare with the previous studies described above. To do this, inter-firm business citation networks (hereafter referred to as "citation networks") are compiled from the text in the financial reports that summarises each firm's business under the headings "Line of Business", "Characteristics of the Company", "Operating Performance", "Financial Position and Fund-Raising Capacity", and "Latest Trend and Prospects" [25].
Essentially, we find that about a quarter of the links on the citation network are replaced in the next year, so that rewiring is one of the dominant processes in the network, even though only about 16.7% of links on the inter-firm business transactions network are replaced in the next year [21]. In addition, we propose a network generation model with fitness, which is a real number measuring companies' importance proposed by Caldarelli et al. [26], and conduct a numerical analysis of our model with parameters estimated from real data. We verify that our model is able to replicate the major statistical characteristics of the citation network, with the exception of the real frequency of bidirectional links, which are mainly generated by inter-firm investment relationships. This result suggests that investment interactions tend to be formed as a result of a different, and separate, process from other business transactions.

Empirical data analysis
About dataset. In this report, we analyse a Japanese firm dataset provided by Teikoku Databank Ltd (TDB), one of the largest corporate research providers in Japan. It is common business practice within Japan for companies to gather detailed corporate information from business partners in order to build long-term trustworthy relationships as well as to manage credit and operational risk. The data collection process and credit analysis are normally outsourced to, and carried out by, professional third-party organisations such as TDB, which has been assessing the credit status of firms for 119 years. TDB's credit research reports include detailed information about the financial statements of firms, their history profile, business partners, management structure as well as banking transactions and relationships. Companies, including all private enterprises, business owners, government organizations and other public offices in Japan, are tracked over time by an allocated unique ID, the "Teikoku Company Code". This code is embedded in all databases so that one can combine all the different types of data provided by TDB with ease. Table 1 shows details of three specific datasets, namely (a) financial reports, containing detailed financial data from around 400,000 firms including the years 2017 and 2018; (b) inter-firm business transactions records for about 800,000 firms, which have been extensively used in academic research over the last fifteen years [11,12,15,18,20,21,27]; and (c) investment relationships from business partners, group companies, capital ties and structures as well as other investment relationships. The latter data is consistent with that found in the first database, as firms usually report such essential information within their financial statements.
Therefore, the real network of investment relationships and business partners combined with the inter-firm business transactions can be used to directly compare with the citation network derived from the scanning of the financial reports.
Scaling relations. The citation network is constructed by automatically scanning the text data in the financial reports. This text is unstructured, which inevitably leads to the construction of some incorrect links due to companies having the same names or names that are common nouns. For example, there is a restaurant run by CHINA Co., Ltd. in Japan. Even though this company is very small, it receives a large number of citations from companies that trade with China as a country, since it is one of the most important destinations for their exports.
To eliminate such errors, we attempt to detect outliers where the total number of citations is inconsistent with the size of the company. In order to set a standard for a firm's number of citations, we observe a scaling relation between the number of links on the citation network, k_c, and the sum of the numbers of links on both the inter-firm business transactions network and the inter-firm investment network, k_b (hereafter referred to as "business partners"), which serves as the measure of firm size. This is similar to the approach of a previous study [24], and k_b can be substituted by the number of employees or annual sales, given that there are scaling relations among those quantities as described in [12].
We next produce an intersection network among the inter-firm business transactions network, the inter-firm investment network, and the citation network. This is composed of the links that the citation network shares with the other two networks, so it is logical to regard the intersection network as a subset of the citation network that is free of errors induced by data harvesting. Therefore, the intersection network can be used to observe the real scaling relation between k_b and k_c. Fig 1 shows the scaling relations between the means of paired k_b and k_c on each type of network. Here we divide k_c into √N bins (the square-root choice), where N is the total number of firms on each network and each bin contains the same number of items, and we then calculate the mean of k_b per bin. Panel (a) shows the relation on the intersection network. We identify a scaling relation k_b ∝ k_c^(0.9±0.2), which we take as the reference since the intersection network is a subset of the citation network without any errors, as mentioned before. On the other hand, panel (b) shows the relation on the citation network. In the range of large k_c, the relation found in panel (a) breaks down. Considering this, we delete firms that deviate from the scaling relation k_b ∝ k_c^(0.9±0.2). Concretely, we eliminate any firm with k_c > μ + 2σ for each range of k_b (μ and σ being the mean and standard deviation of k_c within that range), which yields the same scaling relation on the citation network as that on the intersection network, as shown in panel (c). In the following discussion, we use the citation network without the outliers. Table 2 shows the number of nodes and links in each type of network. It shows that 94,009 firms with 1,035,482 links on the citation network were eliminated by this procedure in 2018. However, only 2% of all nodes on the intersection network were eliminated, corroborating the fact that almost all valid data is preserved.
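The binning and filtering steps described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, each firm is represented as a (k_c, k_b) pair with k_b ≥ 1, and, because the text does not say how the "ranges of k_b" are defined, the sketch assumes power-of-two ranges.

```python
import math
import statistics

def equal_count_bins(pairs, n_bins):
    """Split (k_c, k_b) pairs, sorted by k_c, into bins of near-equal size."""
    ordered = sorted(pairs)
    size = math.ceil(len(ordered) / n_bins)
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]

def binned_means(pairs):
    """Mean k_c and mean k_b per bin, using the square-root choice for the bin count."""
    n_bins = max(1, round(math.sqrt(len(pairs))))
    result = []
    for members in equal_count_bins(pairs, n_bins):
        mean_kc = statistics.fmean([kc for kc, _ in members])
        mean_kb = statistics.fmean([kb for _, kb in members])
        result.append((mean_kc, mean_kb))
    return result

def drop_outliers(pairs, n_sigma=2.0):
    """Drop firms whose k_c exceeds mu + n_sigma * sigma within their k_b range.

    Assumption: k_b ranges are powers of two, i.e. floor(log2(k_b));
    the source does not specify how the ranges are chosen.
    """
    groups = {}
    for kc, kb in pairs:
        groups.setdefault(int(math.log2(kb)), []).append((kc, kb))
    kept = []
    for members in groups.values():
        kcs = [kc for kc, _ in members]
        mu = statistics.fmean(kcs)
        sigma = statistics.pstdev(kcs)
        kept.extend((kc, kb) for kc, kb in members if kc <= mu + n_sigma * sigma)
    return kept
```

In this sketch a small firm that collects hundreds of spurious citations (the CHINA Co., Ltd. case) sits far above μ + 2σ for its k_b range and is removed, while firms of typical citation count are kept.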
The elimination rate is also very consistent with the implied error rate arising from the manual procedure described in. Furthermore, in order to obtain additional comfort with regard to the effectiveness of the filtering method, we manually sampled 1,000 firms within our dataset in descending order of the respective differences between k_b and k_c. We found that links were correctly removed from 989 firms within the sample, implying an error rate of around 1.1%, which is perfectly acceptable for the purposes of this study. We also note that all the financial reports we have made use of have been proofread manually, since TDB has a strict data checking process to maintain data quality. Therefore, abbreviations or misspellings of company names are rare within the reports. Moreover, we segment sentences into their parts of speech and extract only nouns using MeCab, which is an open-source text segmentation library for text written in the Japanese language [28]. We then replace any misspelt word or abbreviation with the correct word through this process, so that the rare cases are reduced even further and incorrect links caused by such human error are eliminated.

Metabolism of network evolution. It has previously been revealed that a model structured by the processes of node creation, annihilation and coagulation, together with a preferential attachment rule (where new nodes are attached to older ones with a probability that is a growing function of the number of pre-existing links [5,29]), reproduces a degree distribution that follows a power law consistent with that of the Japanese inter-firm business transactions network [18,27]. Therefore, the balance between new entrants, bankruptcies and mergers plays the key role in the time evolution of this network.
In terms of the 'metabolism' of the network, the cumulative distributions of both node lifespan and link lifespan on the inter-firm business transactions network are well characterised by an exponential function, exp(−t/τ), where τ is the characteristic decay time [18,21]. Concretely, τ ≈ 55 years for nodes and τ ≈ 6 years for links, so that about 1.8% of nodes and 16.7% of links on the network are replaced in the next year, respectively.
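As a quick check of the turnover figures quoted above, the short sketch below converts a characteristic decay time into a yearly replacement fraction. It assumes that the quoted percentages correspond to the rate 1/τ of the exponential lifespan distribution; the exact one-year fraction, 1 − exp(−1/τ), is slightly smaller and is printed alongside for comparison. The function name is hypothetical.

```python
import math

def annual_turnover(tau_years):
    """Return (1/tau, 1 - exp(-1/tau)) for an exponential lifespan with decay time tau."""
    rate = 1.0 / tau_years                      # hazard rate per year
    exact = 1.0 - math.exp(-1.0 / tau_years)    # fraction lost within one year
    return rate, exact

for label, tau in (("nodes", 55.0), ("links", 6.0)):
    rate, exact = annual_turnover(tau)
    print(f"{label}: 1/tau = {rate:.1%}, 1 - exp(-1/tau) = {exact:.1%}")
```

For nodes the two quantities are nearly identical because τ is large; for links the difference is more visible, which is worth keeping in mind when comparing against the yearly rates observed on the citation network.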
Similar to previous studies on the inter-firm business network, we also observe here the occurrence probabilities from 2015 to 2018 for both nodes and links in order to evaluate the specific dynamics of the citation network. We assume that the citation network is a non-growing network because the number of newcomers is almost the same as the number of firms that disappear, although about 14.9% of nodes disappear each year (Table 3). Additionally, the citation network has a minimal number of node coagulations (i.e. mergers) in comparison with the inter-firm business transactions network. Table 4 shows the details of the occurrence probabilities for links. This also indicates that the network appears to be non-growing in terms of links, although its characteristics differ from those of links on the business transactions network. About a quarter of the links on the citation network are replaced in the next year, so that rewiring appears to be one of the dominant processes in the citation network, independently of time.

Asymmetrical degree distribution. Data from several countries suggest that firm indicators such as annual sales, number of employees, and the number of business partners follow a power law [6][7][8][9][10][11]. In particular, it has been reported that the numbers of both in-links and out-links (i.e. in- and out-degrees) on the inter-firm business transactions network in Japan follow a power law with a cumulative exponent of 1.4 ± 0.1 [11]. In contrast, we find that in- and out-degrees are asymmetrical in the citation network.

Network motif. Motif formation is one of the basic characteristics of interactions among three-node subsets in networks [30,31]. To observe the microscopic characteristics of the citation network, we observe the motif formation distribution and compare it with that of the business transactions network.
Moreover, we randomise both the business transactions network and the citation network over 10,000,000 times while preserving firms' degrees [32] (hereinafter called the "randomised network") and observe the motif formation distributions. Panels (a) and (b) in Fig 3 show all 13 types of three-node connected subgraphs and the probability density distribution of the subgraphs, respectively. In panel (b), the black circles, black filled circles, grey squares, and grey filled squares show the probability distribution of the subgraphs on the citation network, on the randomised network based on the citation network, on the inter-firm business transactions network, and on the randomised network based on the inter-firm business transactions network, respectively.
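A degree-preserving randomisation of this kind can be sketched with repeated edge swaps. The source cites [32] without describing the procedure, so the scheme below is an assumption: it is the commonly used swap that turns two links a → b and c → d into a → d and c → b, which leaves every node's in-degree and out-degree unchanged. The function name and the rejection rules (no self-loops, no duplicate links) are choices made for this illustration.

```python
import random

def randomise_preserving_degrees(edges, n_swaps, seed=0):
    """Rewire a directed edge list by repeated edge swaps.

    Each accepted swap replaces (a -> b, c -> d) with (a -> d, c -> b),
    so in-degrees and out-degrees are preserved. Swaps that would create
    a self-loop or a duplicate link are rejected.
    """
    rng = random.Random(seed)
    edge_list = list(edges)
    edge_set = set(edge_list)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edge_list)), 2)
        (a, b), (c, d) = edge_list[i], edge_list[j]
        if a == d or c == b or (a, d) in edge_set or (c, b) in edge_set:
            continue  # rejected attempt; still counts towards n_swaps
        edge_set -= {(a, b), (c, d)}
        edge_set |= {(a, d), (c, b)}
        edge_list[i], edge_list[j] = (a, d), (c, b)
    return edge_list
```

Note that this variant preserves in- and out-degrees only; it does not fix the number of bidirectional links, so the frequency of subgraphs containing such links is free to change under randomisation.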
It appears that subgraph number two is dominant within the citation network, whereas subgraphs number one, two, and three are dominant in the business transactions network, as also reported in previous studies [33,34]. By comparison with the result for the randomised network based on the inter-firm business transactions network, there are differences in the frequencies of subgraphs that have bidirectional links, such as numbers four, five and seven to thirteen. Consequently, it seems that the occurrence probability of bidirectional links on the citation network is unusually high. Moreover, there are significant differences in the frequencies of all subgraphs within the citation network, except for the dominant subgraphs. This indicates that, in terms of network motifs, the dynamics of the citation network are far from random. Besides, one can assume that the difference in probabilities between subgraphs one and two on the citation network might be caused by the in-out asymmetry, since the two probabilities are almost the same on the business transactions network, which does not have this asymmetry.

Fig 3. Network motifs on both the citation network and the inter-firm business transactions network in 2018. Panels (a) and (b) show all 13 types of three-node connected subgraphs and the probability density distribution of the subgraphs, respectively. The black circles, black filled circles, grey squares, and grey filled squares show the probability distribution of the subgraphs on the citation network, on the randomised network generated from the citation network, on the inter-firm business transactions network, and on the randomised network generated from the inter-firm business transactions network, respectively.
https://doi.org/10.1371/journal.pone.0225853.g003

Model analysis
Fitness. Various models that generate different types of complex networks have been proposed. As described above, Miura et al. [18] make use of a simple business network model in which a directed link connecting nodes represents the money flow between a pair of firms, and they take into account the effects of new establishments, bankruptcies, and mergers and acquisitions (M&As) through the creation of new nodes, the removal of nodes, and the aggregation of nodes together with links, respectively. Additionally, by using a merger kernel estimated through an M&A data analysis, the model reproduces business network characteristics with parameters estimated from real firm data [15,27]. As shown in Tables 3 and 4, however, the citation network has few M&As but a large number of rewirings. Reka et al. [35] proposed an evolving network model that takes into account the effects of link additions, rewiring, and node additions. In terms of the total number of nodes, however, this also does not fit the citation network, because the citation network is not an evolving network but a quasi-steady-state network and, additionally, node annihilations cannot be ignored, as shown in Table 3. Moreover, Moore et al. [36] studied a model that has both node creations and annihilations. They showed that a power law degree distribution is realised in the case of a growing network with the preferential attachment rule. They also showed that the tail of the cumulative degree distribution follows a power law with an exponent γ = ((3 − r)/(1 − r)) − 1 ≥ 2.0, where r is the rate of node annihilations. Even though their model fits the citation network in terms of metabolism, it does not match the citation network either, because the cumulative in-degree distribution follows a power law with the exponent γ = 1.4, as shown in Fig 2(b).
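The lower bound on the exponent quoted above follows from a one-line simplification, written out here under the assumption that the annihilation rate satisfies 0 ≤ r < 1:

```latex
\gamma \;=\; \frac{3 - r}{1 - r} - 1
       \;=\; \frac{(3 - r) - (1 - r)}{1 - r}
       \;=\; \frac{2}{1 - r} \;\ge\; 2 \qquad (0 \le r < 1)
```

Since 2/(1 − r) increases with r, the minimum value γ = 2 is attained at r = 0, so no admissible annihilation rate can yield the observed cumulative exponent of 1.4.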
In summary, it seems that the scale-free property of the citation network is not related to the concept of preferential attachment, in which new nodes are attached to older ones with a probability that is a growing function of the number of pre-existing links [5]. A financial report (or an annual report) is written about a company's activities throughout the following year so that a company mentions its business partners based on relationships that have already been constructed. We therefore consider that a fitness, which is a real number measuring companies' importance proposed by Caldarelli et al. [26], is assigned to every firm initially. Caldarelli et al. studied scale-free networks with fitness, and they revealed that link creations among nodes with a probability depending on the fitnesses give rise to a rich-get-richer mechanism, in which sites with larger fitness are more likely to become hubs. Fig 4 shows distributions where Q(k_b) is the probability of a new entrant connecting to an old firm of size k_b, N(k_b) is the number of firms with k_b, and λ is the exponent of fitness, for the newcomers in panel (a) and the new links in panel (b), respectively. This measurement was introduced by Jeong et al. [37]; here, however, we use the number of business transactions k_b, which is given initially by the business transactions network, so that the measurement reveals the fitness distribution. The upper and lower dotted lines indicate power law distributions with the exponents λ_in = 1.1 ± 0.1 and λ_out = 0.6 ± 0.1, respectively.
Monte Carlo simulation. Here, we propose a model introducing both the fitness of each node and the effect of node annihilations as follows:
Step1. Start with N_0 nodes having m/2 in-links and m/2 out-links. The other end of each link is chosen randomly (m = 4, which is consistent with the mean number of in- and out-links for newcomers in Fig 2(c)).
Step2. Choose one of the following four events stochastically. The occurrence probabilities of link creations, link annihilations, node annihilations, and node creations are denoted by r_p, r_q, r_r, and 1 − r_p − r_q − r_r, respectively (r_p : r_q : r_r = 0.203 : 0.181 : 0.301, which corresponds to the observation in Table 4).
Link creations. A node i is selected at random as the starting point of a new citation link with probability P_out(i) = (k_{b,i}^{λ_out} + 1) / Σ_j (k_{b,j}^{λ_out} + 1). The other end of the link is chosen at random with probability P_in(i) = (k_{b,i}^{λ_in} + 1) / Σ_j (k_{b,j}^{λ_in} + 1). This process is repeated m times.
Link annihilations. A citation link is chosen randomly and deleted. This process is repeated m times.
Node annihilations. A randomly chosen node is removed along with all citation links connected to this node because a firm's lifetime follows an exponential distribution; it is roughly consistent with the simple assumption that a firm disappears randomly following a Poisson process [18,21].
Node creations. A new node having m/2 in-links and m/2 out-links is added. Each in-link and each out-link is connected to a node chosen randomly following the rules P_in and P_out, respectively.
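The steps above can be sketched as a short simulation. This is an illustrative reading of the model, not the authors' code, and it rests on several assumptions that the text leaves open: the fitness k_b of a new node is resampled from the initial fitness list (a hypothetical choice), multiple links between the same pair are allowed while self-loops are excluded, one event is drawn per step, and a node annihilation is skipped when only two nodes remain. The function name and signature are also hypothetical; the default exponents are the values read from Fig 4.

```python
import random

def simulate(fitness, steps, m=4, r_p=0.203, r_q=0.181, r_r=0.301,
             lam_in=1.1, lam_out=0.6, seed=0):
    """Run the fitness model; return (node -> k_b mapping, list of directed links)."""
    rng = random.Random(seed)
    k_b = dict(enumerate(fitness))   # node id -> fitness k_b, held fixed over time
    next_id = len(k_b)
    links = []                       # directed citation links as (source, target)

    def pick(lam, exclude=None):
        """Draw one node with probability proportional to k_b**lam + 1."""
        nodes = [n for n in k_b if n != exclude]
        weights = [k_b[n] ** lam + 1 for n in nodes]
        return rng.choices(nodes, weights=weights)[0]

    def add_node_links(node, uniform):
        """Give `node` m/2 in-links and m/2 out-links."""
        for _ in range(m // 2):
            others = [n for n in k_b if n != node]
            # Literal reading of the text: in-link partners follow P_in,
            # out-link partners follow P_out (uniform choice in Step 1).
            src = rng.choice(others) if uniform else pick(lam_in, node)
            dst = rng.choice(others) if uniform else pick(lam_out, node)
            links.append((src, node))
            links.append((node, dst))

    for node in list(k_b):                       # Step 1: initial network
        add_node_links(node, uniform=True)

    for _ in range(steps):                       # Step 2: one event per step
        u = rng.random()
        if u < r_p:                              # link creations
            for _ in range(m):
                src = pick(lam_out)
                links.append((src, pick(lam_in, exclude=src)))
        elif u < r_p + r_q:                      # link annihilations
            for _ in range(min(m, len(links))):
                links.pop(rng.randrange(len(links)))
        elif u < r_p + r_q + r_r:                # node annihilation
            if len(k_b) > 2:
                dead = rng.choice(list(k_b))
                del k_b[dead]
                links[:] = [(s, t) for s, t in links if dead not in (s, t)]
        else:                                    # node creation
            k_b[next_id] = rng.choice(fitness)   # assumption: resampled fitness
            add_node_links(next_id, uniform=False)
            next_id += 1
    return k_b, links
```

A call such as `simulate([1, 2, 5, 10, 50] * 10, steps=500)` returns the surviving nodes and links, from which in- and out-degree distributions can be tabulated. Each draw scans all nodes, which keeps the sketch short but would be slow at the scale of the real network.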
We also illustrate our algorithm by a flow chart in Fig 5. In order to verify whether our model grasps the essence of the dynamics of the citation network, we compare both the degree distributions and the motif formation distributions generated by our model with those of the citation network. Regarding the in-degree distribution, the results for λ_in = 1.1 to 1.2 fit well with the power law distribution with the exponent γ = 1.4. Moreover, the simulation results for λ_out = 0.5 ± 0.1 also replicate the exponential decay of the out-degree distribution. These parameters correspond with the observation in Fig 4; therefore, we confirm that the model seems to be reasonable in terms of degree distributions.

Fig 5. Flow chart of the algorithm. The occurrence probabilities, taken from Table 4, are r_p : r_q : r_r = 0.203 : 0.181 : 0.301, and the preferential attachment exponents for in-links and out-links are λ_in = 1.0 and λ_out = 0.5, respectively.
https://doi.org/10.1371/journal.pone.0225853.g005

When we compare the simulated motif formation results with the distribution of the citation network, however, our model does not fit well. Similar to the comparison of the real network with the randomised network shown in Fig 3, there are differences especially in the frequencies of subgraphs that have bidirectional links.
For this reason, we investigate bidirectional links on the citation network. As shown in Table 2, there are 587,213 links. Among these links, 568,901 pairs of firms (about 97% of all links) are unidirectional, and the remaining 9,156 pairs of firms (about 3%) are bidirectional. To understand the meaning of each link, we check the overlap among links on the citation network, the list of inter-firm business transactions and the list of inter-firm investment relationships. Table 5 shows the details. It appears that about 76% of bidirectional links on the citation network overlap with links on the list of inter-firm investment relationships, whereas only about 10% of unidirectional links do so. We then delete the links on the citation network that overlap with links on the list of inter-firm investment relationships (hereafter referred to as "capital ties") and compare our simulation results with the probability density distribution of subgraphs on the resulting network (black filled circle in Fig 7). It appears that our results better replicate the motif formation distribution on the citation network when capital ties are excluded from it. Therefore, it is confirmed that our model reflects the dynamics of firms' interactions, mainly inter-firm business transactions, on the citation network, except for the real frequency of bidirectional links, which are mainly generated by inter-firm investment relationships and form only a tiny component of the network. This is consistent with the fact that in any group structure both parent companies and subsidiaries must report the relationship [38], which leads to bidirectional links. Furthermore, to find the best parameters to replicate the motif formation distribution, we apply both the two-sample Kolmogorov-Smirnov test [39] and the two-sample Anderson-Darling test [40], which measure the difference between two distributions.
The Anderson-Darling test gives more weight to the tails of the distribution, whereas the Kolmogorov-Smirnov test is more sensitive to the centre of the distribution [41,42]. The chi-square statistic in the Kolmogorov-Smirnov test is defined as χ² = 4D² · mn/(m + n), where the test statistic D is the maximum vertical deviation between the two distributions and m and n are the numbers of samples of those distributions. Moreover, the two-sample Anderson-Darling test statistic is defined as A²_{mn} = (mn/(m + n)) ∫ {F_m(x) − G_n(x)}² / [H_{m+n}(x){1 − H_{m+n}(x)}] dH_{m+n}(x), where F_m(x) and G_n(x) are the empirical distribution functions of the two samples and H_{m+n}(x) = {mF_m(x) + nG_n(x)}/(m + n) is the empirical distribution function of the pooled sample [40]. Fig 7(d) and 7(e) show the results for each parameter. The horizontal and vertical axes show λ_out and the test statistics defined above, respectively, and the grey filled squares, purple filled triangles and orange filled rhombuses show λ_in = 1.0, λ_in = 1.1, and λ_in = 1.2, respectively. As a result, we find that the simulated distribution comes closest to the real one with the parameter set λ_in = 1.2, λ_out = 0.5, which fits well with the real observation.
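The Kolmogorov-Smirnov quantities defined above can be computed with a few lines of standard-library Python. This is a minimal sketch with hypothetical function names; it takes two plain samples as input, and how the samples are formed from the motif distributions is not specified in the text, so that step is left to the reader.

```python
import bisect

def ks_statistic(sample_a, sample_b):
    """Maximum vertical distance D between two empirical distribution functions."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in a + b:                                # D is attained at an observed value
        f = bisect.bisect_right(a, x) / len(a)     # F_m(x)
        g = bisect.bisect_right(b, x) / len(b)     # G_n(x)
        d = max(d, abs(f - g))
    return d

def ks_chi_square(sample_a, sample_b):
    """chi^2 = 4 D^2 m n / (m + n), following the definition in the text."""
    m, n = len(sample_a), len(sample_b)
    d = ks_statistic(sample_a, sample_b)
    return 4.0 * d * d * m * n / (m + n)
```

Identical samples give D = 0, and two samples with no overlap in range give D = 1, the largest possible value; smaller values of the statistic therefore indicate a simulated distribution closer to the observed one.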

Conclusions
In this paper, we investigated firms' interactions generated by the automated scanning of financial reports of around 400,000 firms in Japan, in order to establish the essential dynamics of interactions between firms. As a key finding, we observe that the metabolism of the derived citation network is significantly different from that of the actual inter-firm business transactions network that had previously been studied. Remarkably, we found that about a quarter of the links on the citation network were replaced in the next year, so that rewiring was one of the dominant processes in the network, even though only about 16.7% of links on the inter-firm business transactions network were replaced in the next year. Furthermore, we introduced simulations based on a network evolution model with fitness, because the financial reports were written about a firm's activities throughout the following year so that firms mentioned their business partners based on relationships that had already been constructed. Our results show that our model is able to replicate the statistical properties of the citation network with parameters estimated from real underlying data. We also found that the bidirectional links, which overlapped with the inter-firm investment relationships, were hard to replicate with our simple numerical model. Even though the bidirectional links were tiny components of the citation network, this result suggests that essential investment interactions tend to be generated through a process that diverges from that of business transactions.