Money flow network among firms’ accounts in a regional bank of Japan

In this study, we investigate the flow of money among bank accounts possessed by firms in a region by employing an exhaustive list of all the bank transfers in a regional bank in Japan, to clarify how the network of money flow is related to the economic activities of the firms. The network statistics and structures are examined and shown to be similar to those of a nationwide production network. Specifically, the bowtie analysis indicates what we refer to as a “walnut” structure with core and upstream/downstream components. To quantify the location of an individual account in the network, we used the Hodge decomposition method and found that the Hodge potential of the account has a significant correlation to its position in the bowtie structure as well as to its net flow of incoming and outgoing money and links, namely the net demand/supply of individual accounts. In addition, we used non-negative matrix factorization to identify important factors underlying the entire flow of money; it can be interpreted that these factors are associated with regional economic activities. One factor has a feature whereby the remittance source is localized to the largest city in the region, while the destination is scattered. The other factors correspond to the economic activities specific to different local places. This study serves as a basis for further investigation on the relationship between money flow and economic activities of firms.


Introduction
Determining how money flows among economic entities is an important aspect of understanding the underlying economic activities. For example, the so-called flow of funds accounts record the financial transactions and the resulting credits and liabilities among households, firms, banks, and the government (see, e.g., [1]). Another example is the input-output table, which describes the purchase and sale relationships among producers and consumers within an economy and clarifies the flows of final and intermediate goods and services with respect to industrial sectors and product outputs (e.g., [2]). These data are used in macroscopic studies, such as those of industrial sectors and aggregated economic entities.
Recent years have witnessed the increasing emergence of microscopic data. For example, one can study a nationwide production network, i.e., how individual firms transfer money among one another as suppliers and customers for transactions of goods and services (see [3] and the references therein). In contrast to the macroscopic studies mentioned above, microscopic studies can uncover the heterogeneous structure of the network and its role in economic activities, how the activities are subject to shocks due to natural disasters [4] and pandemics [5], and so forth. However, microscopic data are not exhaustive; although they may cover most active firms, not all the suppliers and customers are recorded. Such records are based on a survey in which a firm nominates a selected number of important customers and suppliers. In addition, the transaction amounts are often lacking; hence, the network is directed but only binary. More importantly, microscopic and macroscopic data are compiled and updated annually or quarterly at most (see [3,6] and the references therein).
To uncover how economic entities such as firms perform economic activities in a real economy, we should ideally study how money flows among firms by using real-time data of bank transfers with exhaustive lists of accounts and transfers. Also, investigating money flows among accounts will help to tackle real-world problems including the prediction of the economic impact of COVID-19, the defaults of firms, and the bank accounts that could be involved in illegal activities. However, these problems have been addressed without utilizing the information about the network of money flow [7]. The prediction accuracy will be improved by taking into account the network as well as other features. To the best of our knowledge, such a study has not been conducted thus far, simply because such data are not available for academic purposes. The present study precisely performs such an analysis of a Japanese bank's dataset. The bank is a regional bank, which has a high market share with respect to the loans and deposits in a prefecture, particularly supporting financial transactions among the manufacturing firms located there (according to a disclosure issued by the bank).
The objective of this study is to investigate economic activities via bank transfers among firms' accounts by selecting all the transfers related to the firms to uncover how money flows behind the economic activities. More specifically, we examine the network and flow structures, especially the so-called bowtie structure, to locate the position of individual accounts upstream and downstream of the entire flow. We quantify the location using the method of Hodge decomposition of the flow. Furthermore, we examine geographical information of bank transfers in order to see how geographical relations between remittance source and destination are represented by a small number of components of areas.

Data
Our dataset comprises all the bank transfers that are sent from or received by the bank accounts in a regional bank. The regional bank is Shiga Bank, Ltd., the largest bank in a prefecture of Japan that is mid-sized in terms of its population (more than a million) and economic activity. Hereafter, we refer to it simply as Bank A for brevity. All the accounts are anonymized, while several attributes such as geographical locations are provided for the accounts owned by firms. The period covered in our study is from March 1, 2017, to July 31, 2019, i.e., a period of 29 months or 883 days.
During this period, there were 23 million transfers among 1.7 million bank accounts involving a total of 17.4 trillion yen (roughly 160 billion USD or 140 billion Euros). Let us denote a transfer from account i to account j by i → j. To focus only on the firms' accounts in Bank A, we filtered the data such that (i) both i and j are the accounts of Bank A, (ii) both i and j are owned by firms excluding households, and (iii) self-loops i → i are deleted. Point (ii) is important for our purpose, because our concern here is how money flows and circulates among firms' accounts, which is considered to be closely related to the firms' economic activities. The resulting data are summarized in Table 1 (see the rightmost column). Note that multiple transfers i → j can exist for a given pair of i and j, because of frequent transfers. One can quantify the strength of the directional relationship between a pair of accounts either by the flow of transfers or by their frequency. To do so, we aggregate multiple transfers, if present, into a single link i → j with two types of weights, namely flow $f_{ij}$ and frequency $g_{ij}$ (see the illustration in Fig. 1). Hereafter, we use the term link for aggregated transfers.

Table 1
For a transfer i → j, the column "Entire data" includes the cases in which either i or j is not an account of Bank A. The column "Within Bank A" corresponds to the case in which both i and j are accounts of Bank A. "firms" implies that both the source and the target of a link are firm accounts. M and T denote million and trillion, respectively.

Figure 1
Construction of bank-transfer network by aggregation. How bank transfers are aggregated into links. i made three transfers (1, 2, and 4) in an arbitrary unit of money to j, while j made one transfer (1) to i during a certain period. Flow $f_{ij}$ is defined by the total flow of transfers along i → j. Frequency $g_{ij}$ is the frequency of these transfers.
The number of accounts or nodes in the network is N = 30,613, while the number of links is M = 280,864 after the aggregation (see Table 1).
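The aggregation of transfers into links can be illustrated with a minimal sketch. It assumes that each transfer is available as a (source, target, amount) tuple, which is a hypothetical input format chosen for illustration and not the format of the bank's data; the function name `aggregate_transfers` is likewise an assumption.

```python
from collections import defaultdict

def aggregate_transfers(transfers):
    """Aggregate individual transfers (source, target, amount) into links.

    Returns two dicts keyed by (source, target):
    flow[(i, j)] is f_ij, the total amount sent from i to j, and
    freq[(i, j)] is g_ij, the number of transfers from i to j.
    Self-loops i -> i are dropped, as in filtering step (iii).
    """
    flow = defaultdict(float)
    freq = defaultdict(int)
    for src, dst, amount in transfers:
        if src == dst:
            continue  # skip self-loops
        flow[(src, dst)] += amount
        freq[(src, dst)] += 1
    return dict(flow), dict(freq)

# Toy data mirroring Fig. 1: i sends 1, 2, and 4 units to j; j sends 1 unit to i.
# The extra self-loop ("i", "i") is added only to show that it is ignored.
transfers = [
    ("i", "j", 1),
    ("i", "j", 2),
    ("j", "i", 1),
    ("i", "j", 4),
    ("i", "i", 5),
]
flow, freq = aggregate_transfers(transfers)
print(flow)
print(freq)
```

For the toy data, the link i → j carries flow 7 and frequency 3, while the link j → i carries flow 1 and frequency 1, as in the illustration of Fig. 1.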
The summary statistics of the links' flows $f_{ij}$ and frequencies $g_{ij}$ for all the pairs of accounts i and j are presented in Table 2. One can observe that the distributions for flow and frequency have large skewness, implying that a considerable fraction of the total money flow is carried by a small number of links with large amounts.

Network of firms' accounts and links of transfers
First, let us summarize the network structure comprising firms' accounts as nodes and aggregated transfers as links. We remark that transfers are aggregated into links as shown in Fig. 1. The degree of an account is the number of links, i.e., aggregated transfers, that it receives or sends. The numbers of incoming and outgoing links of an account are called the in-degree and out-degree, respectively. Figure 2 shows the distributions of the in-degree and out-degree as complementary cumulative distributions. By noting that the total number of accounts is N = 30,613, we can see that a small fraction of accounts has a considerable degree, i.e., a thousand or more links, while most accounts have a limited number of links. In fact, the tail in Fig. 2 can be approximated by a Pareto distribution $P(k) \propto k^{-\mu}$ for degree k with an exponent $\mu$, which can be estimated as $\mu = 1.63\ (\pm 0.05)$ for in-degree and $\mu = 1.99\ (\pm 0.06)$ for out-degree (standard errors in parentheses), both obtained by Hill's estimator for the top 1000 (corresponding to roughly k > 50). Such hubs are presumably entities associated with the local government or the public sector in the region. We summarize the basic properties of the network in Appendix A.

Figure 2
Degree distributions for the bank transfer network. Complementary cumulative distributions for in-degree and out-degree, which refer to the number of incoming and outgoing links, respectively, of each account.
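The tail exponents for Fig. 2 are obtained by Hill's estimator. The following is a minimal sketch of that estimator in its textbook form, not the authors' code: it uses the k largest observations with the (k+1)-th largest as threshold and reports the asymptotic standard error $\mu/\sqrt{k}$. That the standard errors in the text were computed in this way is an assumption, although $1.63/\sqrt{1000} \approx 0.05$ and $1.99/\sqrt{1000} \approx 0.06$ are consistent with the quoted values. The function name and the synthetic Pareto sample are hypothetical.

```python
import math
import random

def hill_estimator(values, k):
    """Hill estimate of the tail exponent mu of P(X > x) ~ x^(-mu).

    Uses the k largest observations, with the (k+1)-th largest as threshold.
    Returns (mu_hat, standard_error), where the asymptotic standard error
    is mu_hat / sqrt(k).
    """
    xs = sorted(values, reverse=True)
    if not 0 < k < len(xs):
        raise ValueError("k must satisfy 0 < k < number of observations")
    threshold = xs[k]
    if threshold <= 0:
        raise ValueError("the threshold must be positive")
    mean_log_excess = sum(math.log(x / threshold) for x in xs[:k]) / k
    mu_hat = 1.0 / mean_log_excess
    return mu_hat, mu_hat / math.sqrt(k)

# Synthetic check: Pareto samples with a known exponent (not the bank data).
rng = random.Random(0)
true_mu = 2.0
sample = [(1.0 - rng.random()) ** (-1.0 / true_mu) for _ in range(30000)]
mu_hat, se = hill_estimator(sample, k=1000)
print(f"mu = {mu_hat:.2f} (+/- {se:.2f})")
```

With degree data, `values` would be the list of in-degrees or out-degrees and k = 1000 as in the text; ties at the threshold, which are common for integer-valued degrees, are not treated specially in this sketch.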
Because each node has an in-degree and out-degree, we can examine how they are correlated. Figure 3 shows the scatter plot for the in-degree and out-degree of each account. We can observe a tendency for a positive correlation between the degrees (Pearson's $r = 0.303$ ($p < 10^{-6}$); Kendall's $\tau = 0.164$ ($p < 10^{-6}$)). We also observe the accounts that have many more incoming links than the outgoing ones (and vice versa), which can be respectively considered as "sinks" and "sources" with respect to money flow. If household accounts were included, one would have a larger number of sinks corresponding to the situation that income and saving are likely larger than expenditure and dissaving, but such sinks are not present here.
We can observe each link's weights, namely flow $f_{ij}$ and frequency $g_{ij}$ (see Fig. 1). Figure 4 shows the complementary cumulative distribution for the flow along each link. The distribution is highly skewed; there exist a small number of links that have a large amount of flow exceeding a billion yen, which are likely important channels with large flows of money. Quantitatively, 0.1% of the links have flows larger than a billion yen. Figure 5 shows the complementary cumulative distribution for the frequency along each link. The steps at 30 and 60 on the horizontal axis are considered to correspond to transfers performed once or twice in each month (recall that the entire period includes 29 months). We can see that 0.1% of the links have frequencies of 500 or more corresponding to daily transfers on weekdays.

Table 3
Each level corresponds to the hierarchical level in the Infomap community analysis [10]. A community at a level can be decomposed at the next lower level (from top to bottom). If a community cannot be decomposed further, it is called an irreducible community. The numbers of irreducible communities are listed in the third column. The fourth column lists the numbers of accounts belonging to these irreducible communities at each level.

Community analysis
Communities or clusters in a network are tightly knit groups with high intra-group density and low inter-group connectivity [8]. Community analysis is useful for understanding how a network has such heterogeneous structures. We adopt the widely used Infomap method [9,10] to detect communities in our data.
The results are presented in Table 3. "Level" indicates the level of communities in a hierarchical tree of communities that are detected recursively (see [10]). The number of communities indicates how many communities are detected at the corresponding level. The label "irr. comm." denotes irreducible communities that cannot be decomposed further to the next level of smaller communities in the hierarchical decomposition. For example, 143 of 164 communities at the first level are irreducible ones, whereas the rest of them are decomposed into 2327 smaller communities at the next level, and so forth.

Figure 6
Distributions of the sizes of irreducible communities. Rank-size plot for the sizes of irreducible communities detected using the Infomap method at all the levels, where the ranks are in descending order of the size with the lowest rank equal to the total number of irreducible communities (see Table 3). The size of a community is simply the number of nodes included in the community.
We find that most of the communities are at the second level, and that most of the accounts (94.5%) belong to the second-level communities. In our previous study [11] on the application of hierarchical community analysis using Infomap to a large-scale production network, we showed that the hierarchy is relatively shallow, with the fifth level as the lowest level; in particular, most firms are included at the second level, exactly as we find here. This is reasonable, because our data on bank transfers among firms' accounts should reflect a regional fraction of the entire production network on a nationwide scale. The finding here is interesting, because it implies a self-similar structure of the production network, meaning that a partial system has a network property similar to that of the global system. Figure 6 shows the distribution of the sizes of irreducible communities at the lowest level that includes all the accounts. The size of a community is simply the number of nodes included in the community. The result indicates that the size of the communities is highly skewed over a few orders of magnitude. We note that there exist more than 10 communities with sizes exceeding 100, which correspond to important clusters of economic activities that depend on geographical sub-regions and industrial sectors. We shall discuss this issue in our analysis of non-negative matrix factorization later.

Bowtie-walnut structure
With respect to the flow of money, the accounts can be located in a classification of the so-called bowtie structure, which was first adopted in the study of the Internet [12]. In the context of economics and finance, the method has been applied to business relationship networks [13] and credit default swap network [14], for example. Nodes in a directed network can be classified into a giant strongly connected component (GSCC), its upstream side as the IN component, its downstream side as the OUT component, and the rest of the nodes that do not belong to any of GSCC, IN, and OUT. In general, they can be defined as follows.
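GWCC: giant weakly connected component, i.e., the largest connected component when the network is viewed as an undirected graph.
GSCC: giant strongly connected component, i.e., the largest component in which a directed path exists between any pair of nodes in both directions.
IN: the nodes outside the GSCC from which the GSCC can be reached via a directed path.
OUT: the nodes outside the GSCC that can be reached from the GSCC via a directed path.
TE: "tendrils", i.e., the rest of the nodes in the GWCC.
The results are summarized in Table 4. As is seen here, nearly 40% of the accounts are inside GSCC. Further, 15% of the accounts are in the upstream portion or IN, whereas 37% are in the downstream portion or OUT (see Fig. 7). These figures are similar to those observed in the production network in Japan in a previous study [11].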
The similarity between the current money-flow network and the production network requires careful elaboration. First, the flow in the current network is in the direction of money transfer, while in the production network the flow is in the direction of goods and services, i.e., from suppliers to customers. Therefore, the IN component in the production network should be compared to the OUT component in the current network and vice versa. Upon making this comparison, we notice that the OUT component in the current network occupies a much larger portion of the network (37.3%) than the one in the production network (20.6%; see [11]). This is understandable based on two facts: (i) nodes are bank accounts in the current network, while in the production network, nodes are firms' headquarters; (ii) the prefecture where the current regional bank resides has none of the major cities of Japan, such as Tokyo, Osaka, or Nagoya. This implies that many firm headquarters are outside of this prefecture and that the prefecture is dominated by agriculture and production facilities. Given that the number of bank accounts of factories and associated offices is expected to be much higher than that of accounts closer to the consumer market, the OUT component in the current network occupies a larger portion than it does in the production network.

Table 4
"Ratio" refers to the ratio of the number of firms to the total number of accounts in GWCC.

Figure 7
Walnut structure: a schematic view. The so-called bowtie structure reveals that GSCC includes nearly 40% of all the nodes or accounts, while the IN and OUT components include 15% and 37%, respectively (see Table 4 for the details). The prominent features are as follows. (i) The shortest distances to IN and OUT from GSCC are quite small, typically 1 or 2, and 4 at most (Table 5); hence, the ties are not elongated like a "bowtie" but rather like a "walnut" skin. (ii) The nodes in the components of IN and OUT are connected to the nodes scattered widely in GSCC. See also the study of a supplier-customer network [11] with similar features.

The global structure of the network, namely its connectivity, is another important property. The term "bowtie" refers to the connectivity structure observed in many social and technological networks, such as the Internet, where the maximum distances from GSCC to IN or OUT are often very long (see the original paper [12]) and which in fact look similar to a bowtie in their visualization. However, in the case of production networks, it was found that the connectivity between IN and GSCC and that between GSCC and OUT are very high: over 90% of the nodes in IN can reach a node in GSCC with only one link, and similarly between GSCC and OUT. There, network visualization showed a tightly bound shape, where IN and OUT form two thin half shells with GSCC at the core, reflecting the high connectivity. The authors of [11] found that this structure, lacking two wings elongating from the center, does not resemble a "bowtie" at all and coined the more fitting term "walnut" to describe it. The shortest-path lengths between GSCC and IN or OUT in the current money-flow network are given in Table 5, where we can observe that the accounts in the IN and OUT components are only a few steps away from GSCC: the money-flow network is "walnut" in structure, similar to the production network. This is most understandable when we look at their relationship. Given that firm headquarters form a tight "walnut" network, their factories and related offices cannot form an elongated link structure.
Finally, it should be noted that the decomposition into the IN, OUT, and GSCC components is based on the identification of the largest strongly connected component (GSCC) and the reachability to and from it for the other parts of the network. Taken individually, the components do not differ significantly in their network structure: each component is merely a subgraph of the original network, i.e., a part of the bank transfers reflecting the supplier-customer relationships of firms. The difference is the relative position of IN and OUT with respect to GSCC, which can be quantified and interpreted as the relative position in the upstream and downstream of money flow (as discussed in the following section).
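The bowtie classification and the shortest distances from GSCC can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used for Tables 4 and 5: links are given as (source, target) pairs, the function names are hypothetical, and the strongly connected components are found by intersecting forward and backward reachable sets, which is simple but only practical for small graphs. The set `rest` collects every node that is in none of GSCC, IN, and OUT, i.e., tendrils as well as any node outside the GWCC.

```python
from collections import defaultdict, deque

def bfs_distances(adj, sources):
    """Shortest-path lengths (in links) from a set of source nodes."""
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def bowtie(links):
    """Classify the nodes of a directed graph into GSCC, IN, OUT, and the rest."""
    fwd, bwd = defaultdict(set), defaultdict(set)
    nodes = set()
    for u, v in links:
        fwd[u].add(v)
        bwd[v].add(u)
        nodes.update((u, v))

    # The strongly connected component of a node is the intersection of its
    # forward-reachable and backward-reachable sets; keep the largest one.
    gscc, assigned = set(), set()
    for n in sorted(nodes):
        if n in assigned:
            continue
        scc = set(bfs_distances(fwd, [n])) & set(bfs_distances(bwd, [n]))
        assigned |= scc
        if len(scc) > len(gscc):
            gscc = scc

    core = sorted(gscc)
    reached_from_core = bfs_distances(fwd, core)  # follow links forwards
    reaching_core = bfs_distances(bwd, core)      # follow links backwards
    out_dist = {n: d for n, d in reached_from_core.items() if n not in gscc}
    in_dist = {n: d for n, d in reaching_core.items() if n not in gscc}
    rest = nodes - gscc - set(in_dist) - set(out_dist)
    return gscc, in_dist, out_dist, rest

# Hypothetical toy network: a 3-cycle as the core, two upstream nodes,
# one downstream node, and one tendril hanging off the IN component.
links = [("a", "b"), ("b", "c"), ("c", "a"),
         ("s", "a"), ("s2", "s"), ("c", "t"), ("s", "x")]
gscc, in_dist, out_dist, rest = bowtie(links)
print(sorted(gscc), in_dist, out_dist, sorted(rest))
```

The dictionaries `in_dist` and `out_dist` hold, for each node in IN and OUT, its shortest distance to or from GSCC, which is the quantity used to distinguish an elongated "bowtie" from a tightly bound "walnut".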

Hodge decomposition: upstream/downstream flow
Our analysis of the bowtie structure implies that the nodes in IN and OUT are located in the upstream and downstream sides in the flow of money. The Hodge decomposition of the flow in a network is a mathematical method of ranking nodes according to their locations upstream or downstream of the flow [15]. This method, also known as the Helmholtz-Hodge-Kodaira decomposition, has been used to find such a structure in complex networks (see, e.g., neural networks [16] and economic networks [17][18][19]).
First, we recapitulate the method in a manner suitable for our purpose here. Let $A_{ij}$ denote the adjacency matrix of our directed network of bank transfers, i.e.,

$$A_{ij} = \begin{cases} 1 & \text{if there is a link from account } i \text{ to account } j, \\ 0 & \text{otherwise.} \end{cases} \tag{2}$$

Recall that the numbers of accounts and links are N and M, respectively. We excluded all the self-loops, implying that $A_{ii} = 0$. Each link has a flow, denoted by $\tilde{F}_{ij}$, either of the total amount of transfers, $f_{ij}$, or the frequency of transfers, $g_{ij}$ (see Fig. 1), i.e.,

$$\tilde{F}_{ij} = \begin{cases} f_{ij} \text{ or } g_{ij} & \text{if } A_{ij} = 1, \\ 0 & \text{otherwise.} \end{cases} \tag{3}$$

Note that there may be a pair of accounts such that $A_{ij} = A_{ji} = 1$ and $\tilde{F}_{ij}, \tilde{F}_{ji} > 0$. Next, we shall take the frequency of transfers, $g_{ij}$, by assuming that it represents the strength of the link. Let us define a "net flow" $F_{ij}$ by

$$F_{ij} = \tilde{F}_{ij} - \tilde{F}_{ji}, \tag{4}$$

and a "net weight" $w_{ij}$ by

$$w_{ij} = A_{ij} + A_{ji}. \tag{5}$$

Note that $w_{ij}$ is symmetric, i.e., $w_{ij} = w_{ji}$, and non-negative, i.e., $w_{ij} \ge 0$ for any pair of i and j. We remark that Eq. (5) is simply a convention to consider the effect of mutual links between i and j. One could multiply Eq. (5) by 0.5 or an arbitrary positive number, which does not change the result significantly for a large network. Now, the Hodge decomposition is given by

$$F_{ij} = F^{(c)}_{ij} + F^{(g)}_{ij}, \tag{6}$$

where the circular flow $F^{(c)}_{ij}$ satisfies

$$\sum_{j} F^{(c)}_{ij} = 0, \tag{7}$$

which implies that the circular flow is divergence-free. The gradient flow $F^{(g)}_{ij}$ can be expressed as

$$F^{(g)}_{ij} = w_{ij}\,(\phi_i - \phi_j), \tag{8}$$

i.e., the difference of "potentials". In this manner, the weight $w_{ij}$ serves to make the gradient flow possible only where a link exists. We refer to the quantity $\phi_i$ as the Hodge potential.

If $\phi_i$ is relatively large, the account i is located in the upstream side of the entire network, while a small $\phi_i$ implies that i is located in the downstream side of the entire network. Equations (6)-(8) can be solved as follows. First, we combine them into the following equation for the Hodge potentials $\phi \equiv (\phi_1, \ldots, \phi_N)$:

$$\sum_{j} L_{ij}\,\phi_j = \sum_{j} F_{ij}, \tag{9}$$

for $i = 1, \ldots, N$. Here, $L_{ij}$ is the so-called graph Laplacian and defined by

$$L_{ij} = \delta_{ij} \sum_{k} w_{ik} - w_{ij}, \tag{10}$$

where $\delta_{ij}$ is the Kronecker delta. It is straightforward to show that the matrix $L = (L_{ij})$ has only one zero mode (eigenvector with zero eigenvalue), i.e., $\phi = (1, 1, \ldots, 1)/\sqrt{N}$. The presence of this zero mode simply corresponds to the arbitrariness in the origin of $\phi$. We can show that all the other eigenvalues are positive (see, e.g., [20]). Therefore, Eq. (9) can be solved for the potentials by fixing the potentials' origin. We assume that the average value of $\phi$ is zero, i.e., $\sum_i \phi_i = 0$.
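The solution for the Hodge potentials can be sketched numerically for a small network. The sketch below assumes that the net flow is the difference of the flows in the two directions, that the net weight is the sum of the adjacency entries in the two directions, and that the gradient flow is the weight times the potential difference $\phi_i - \phi_j$, so that upstream nodes have larger potentials as stated above. It uses dense NumPy arrays and the pseudo-inverse of the graph Laplacian, which is a choice made here for brevity; for a network with tens of thousands of nodes a sparse solver would be needed. The function name and the toy data are hypothetical.

```python
import numpy as np

def hodge_potentials(n, weighted_links):
    """Hodge potentials for a weakly connected directed network with n nodes.

    weighted_links maps (i, j) to the flow on the link i -> j (for example
    the frequency g_ij), with nodes labelled 0..n-1 and no self-loops.
    Returns the potentials, the gradient flow, and the circular flow.
    """
    A = np.zeros((n, n))   # adjacency matrix
    B = np.zeros((n, n))   # flow on each directed link
    for (i, j), value in weighted_links.items():
        A[i, j] = 1.0
        B[i, j] = value
    F = B - B.T                        # net flow, antisymmetric
    w = A + A.T                        # net weight, symmetric
    L = np.diag(w.sum(axis=1)) - w     # graph Laplacian
    rhs = F.sum(axis=1)                # sum over j of F_ij
    # The pseudo-inverse gives the minimum-norm solution, which is orthogonal
    # to the zero mode (1, ..., 1) and therefore has zero mean.
    phi = np.linalg.pinv(L) @ rhs
    F_grad = w * (phi[:, None] - phi[None, :])
    F_circ = F - F_grad
    return phi, F_grad, F_circ

# Toy chain 0 -> 1 -> 2 with unit flows: node 0 is upstream, node 2 downstream.
phi, F_grad, F_circ = hodge_potentials(3, {(0, 1): 1.0, (1, 2): 1.0})
print(phi)
```

For a pure chain the whole flow is gradient flow and the circular part vanishes, whereas for a directed cycle with equal flows all potentials are zero and the whole flow is circular.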
We note that the Hodge decomposition described above plays an essential role in deciphering the structure of the entire network, as well as the position and the role of each node in it. In studying the nodes, one may think of simply evaluating the cumulative out-flow and using it in place of the Hodge potential. This, however, misses the whole point of studying the network. Consider two nodes in the IN component that have the same total out-flow. If we use the total out-flow as a measure of their locations, they are at an equal level regardless of the nodes to which they are connected, even if one is connected to a GSCC node close to the IN side and the other is connected to a GSCC node close to the OUT side. The same applies to those GSCC nodes in a reverse way: in evaluating their locations, it matters to which nodes in the IN/OUT components they are connected. The Hodge decomposition solves this problem at once, as it is based on the network structure. Those IN nodes are given appropriate Hodge potentials in relation to their connections to those GSCC nodes, which in turn are given appropriate Hodge potentials in view of all the other edges of the entire network. (See Appendix B for an intuitive explanation and simple examples.)

The Hodge potentials obtained for the entire network of GWCC are shown in Fig. 8 as the distribution for the potentials of all the accounts in GWCC. By noting that the average is zero by definition, we can see that it is a bimodal distribution with two peaks at positive and negative values, while there are a number of potential values close to zero (peaks around zero). The nodes in TE (tendrils) can be considered to have locations that are not [...]

It can be expected that there is a correlation between the value of the Hodge potential and the net amount of demand or supply of money for each node. We can measure the net amount of demand/supply by examining the in-degree and out-degree of the node, or alternatively, the in-flow and out-flow of money. Figure 9 and Fig. 10 show the results.
We find that if the potential is positive, the node is located in the upstream side, and its net degree and flow are negative. If the potential is negative, the node is located in the downstream side, and its net degree and flow are positive.
This finding can be interpreted as follows. Consider a supplier in the production network, which supplies its products to a number of customers. The supplier has a bank account (or possibly multiple accounts) that receives money from the customers' accounts as the supplier's sales. If the supplier is in the upstream side of the supplier-customer relationship, it is likely that the account is located in the downstream side of the money flows in this study. As the supplier not only makes sales but also incurs costs, typically labor costs, there must be an outgoing flow from its account to be linked with households and other non-commercial entities, which are not included in the present study. Consequently, the supplier's account has a positive net degree and flow, while its Hodge potential is likely negative. A similar argument would hold for customers in an opposite way. In other words, our finding is a direct observation of how the flow of money reflects the economic activities among the firms' accounts.
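The net degree and net flow used in this comparison can be computed directly from the aggregated links. The following is a minimal sketch, assuming that links are stored as a mapping from (source, target) to the flow on the link; the function name and the toy accounts are hypothetical. The net degree is the in-degree minus the out-degree, and the net flow is the in-flow minus the out-flow, so that an account that mostly receives money has positive values.

```python
from collections import defaultdict

def net_degree_and_flow(flow):
    """Net degree (in minus out) and net flow (in minus out) of each account."""
    net_degree = defaultdict(int)
    net_flow = defaultdict(float)
    for (src, dst), amount in flow.items():
        net_degree[dst] += 1      # one more incoming link for the target
        net_degree[src] -= 1      # one more outgoing link for the source
        net_flow[dst] += amount   # money received by the target
        net_flow[src] -= amount   # money sent by the source
    return dict(net_degree), dict(net_flow)

# Toy example: a customer pays two suppliers, one of which pays the other.
flow = {
    ("customer", "supplier"): 10.0,
    ("supplier", "subsupplier"): 4.0,
    ("customer", "subsupplier"): 1.0,
}
net_degree, net_flow = net_degree_and_flow(flow)
print(net_degree)
print(net_flow)
```

In this toy example the customer account, which only sends money, has negative net degree and net flow, while the accounts further along the money flow have non-negative values.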

Figure 9
Hodge potential and net degree for each node. Each point represents a node or an account. The net degree is defined by the difference between the in-degree and the out-degree of the node. If the net degree is positive, the node has more incoming links than outgoing ones and vice versa.

In response, a keen reader may wonder how the results might change if the consumption of households is included in our study. In fact, in the case of the economic activity of households in this particular region as well as other regions of Japan during the period of our study, cash was the largest channel of payment. Hence, transfers from households to firms are quite negligible in frequency and amount, even if the consumption of households is included.

Non-negative matrix factorization (NMF): decomposition of geographical structures of bank transfers
In this section, we focus on the geographical information of bank transfers. Each bank account has an address that was registered when the account was created. We obtain the latitudes and longitudes of the bank accounts by using geocoding. Consequently, a bank transfer between two bank accounts has two coordinates, those of its remittance source and destination. Can geographical relations between source and destination be represented by only a small number of components of areas? We construct a non-negative matrix defined from the frequencies between the geographical areas, and we adopt NMF to find such components of geographical structures of the bank transfers. NMF constructs an approximate factorization of a non-negative matrix [21]. Applications of NMF to real datasets give a small number of components whose linear sums can approximate elements of the dataset. For example, NMF is useful for processing facial images because it produces parts-based representations of such images [22]. To obtain the basic components whose linear sums can approximate bank transfers, we apply NMF to a non-negative matrix V constructed as a geographical aggregation from the frequencies of bank transfers $g_{ij}$ in the following way.
Let the geographical location of account i be loc(i), that is, the pair of the longitude and latitude of the registered address of i. We set a lattice grid in the entire region including the Shiga prefecture using L by L sufficiently small squares, where L = 100. Let $R_\ell$ be such squares ($\ell = 1, 2, \ldots, L^2$). We aggregate the frequencies of bank transfers from a source grid $R_s$ to a destination grid $R_d$ by

$$\tilde{g}_{sd} = \sum_{\mathrm{loc}(i) \in R_s} \; \sum_{\mathrm{loc}(j) \in R_d} g_{ij}, \tag{11}$$

where the summation is taken over all pairs of accounts (i, j) such that the source i is located in $R_s$ and the destination j is located in $R_d$. Then let us convert the aggregated frequency to its logarithm to reduce the influence of outstanding values by

$$V_{sd} = \log\left(1 + \tilde{g}_{sd}\right). \tag{12}$$

Note that every entry $V_{sd}$ is non-negative. $V = (V_{sd})$ is a sparse matrix of size $L^2 \times L^2$; that is, only a small fraction of the entries are non-zero because bank transfers do not occur between many pairs of source and destination, for which we have $V_{sd} = 0$. NMF provides the approximate factorization:

$$V \approx W H, \tag{13}$$

where W and H are non-negative matrices of size $L^2 \times K$ and $K \times L^2$, respectively, and K is an integer. Because of the sparsity of V, we expect that $K \ll L^2$. We assume that the approximation is based on the minimization of the following loss function given by the Frobenius norm:

$$\min_{W \ge 0,\; H \ge 0} \left\| V - W H \right\|_F, \tag{14}$$

where $W \ge 0$ and $H \ge 0$ imply non-negativity. Technically, we solve Eq. (14) numerically with the initialization of W, H using non-negative double singular value decomposition (see the review [23] and references therein). The minimization yields local minima in general. However, our numerical solutions under different random seeds provided essentially the same decomposition.

The decomposition by NMF can be interpreted as follows. Equation (13) is explicitly written as

$$V_{sd} \approx \sum_{k=1}^{K} W_{sk} H_{kd}. \tag{15}$$

For an arbitrary source s, Eq. (15) can read as

$$\left(V_{s1}, V_{s2}, \ldots, V_{sL^2}\right) \approx \sum_{k=1}^{K} W_{sk}\, h_k, \tag{16}$$

where $h_k$ is the vector given by the kth row of H. This equation means that the transfers from the source s can be expanded by such "basis vectors" $h_k$ ($k = 1, 2, \ldots, K$). Similarly, for an arbitrary destination d, Eq. (15) can read as

$$\left(V_{1d}, V_{2d}, \ldots, V_{L^2 d}\right)^{\top} \approx \sum_{k=1}^{K} H_{kd}\, w_k, \tag{17}$$

where $w_k$ is the vector given by the kth column of W. This implies that the transfers to the destination d can be expanded by the basis vectors $w_k$ ($k = 1, 2, \ldots, K$). The basis vector's components $(w_k)_s = W_{sk}$ represent a spatial pattern of how sources s are distributed geographically for the kth NMF component. In fact, we can regard Eq. (13) as the approximation of V by the sum of products of these basis vectors:

$$V \approx \sum_{k=1}^{K} w_k\, h_k, \tag{18}$$

where $w_k$ is a column vector and $h_k$ is a row vector, so that $(w_k h_k)_{sd} = W_{sk} H_{kd}$. This expression can be understood in the way that bank transfers can be decomposed into K "NMF components" comprising pairs of basis vectors, $w_k$ and $h_k$ ($k = 1, 2, \ldots, K$). We let K = 10 from the prior knowledge that the number of local communities in the prefecture is around 10. We later discuss how results depend on different choices of K. Results of all the basis vectors for each NMF component $k = 1, 2, \ldots, K$ are depicted in Fig. 11, Fig. 12, Fig. 13 [...]

The exceptional case is the basis vector $h_k$ for k = 7 in Fig. 13. In this case, while the source is concentrated in the largest city of the Shiga prefecture, the destination spreads over the entire prefecture and also its neighboring city of Kyoto. This implies that one of the NMF components corresponds to bank transfers from firms in the largest city to other firms in different local areas as well as in Kyoto.
Figure 13
[...] See Fig. 11 for explanation. Note that for the box k = 7, the basis $h_{k=7}$ (right figure) has its vector components of destination widely distributed over the Shiga prefecture and its neighbor in Kyoto, which is located to the left (west) of Shiga. So, in this exceptional case, the figure does not include a circle corresponding to any peak of vector components.

In all the other cases, we can observe that the pair of source and destination is located in mostly similar regions. To clarify this, Fig. 15 shows a matrix of cosine similarities between a basis vector of the source and a basis vector of the destination, where the cosine similarity of $h_k$ and $w_{k'}$ is calculated by

$$\cos\left(w_{k'}, h_k\right) = \frac{w_{k'} \cdot h_k}{\left\| w_{k'} \right\| \, \left\| h_k \right\|}.$$
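The whole pipeline of this section, from grid aggregation to the cosine similarities between source and destination basis vectors, can be sketched on toy data. Several points are assumptions rather than the authors' implementation: the logarithmic transform is taken as log(1 + aggregated frequency), which keeps empty pairs at zero; the factorization uses the standard multiplicative update rules for the Frobenius loss with random initialization, swapped in here instead of the non-negative double singular value decomposition initialization mentioned in the text; and all function names, coordinates, and frequencies are hypothetical.

```python
import numpy as np

def grid_index(lon, lat, bounds, L):
    """Index (0 .. L*L-1) of the grid square that contains (lon, lat)."""
    lon_min, lon_max, lat_min, lat_max = bounds
    col = min(int((lon - lon_min) / (lon_max - lon_min) * L), L - 1)
    row = min(int((lat - lat_min) / (lat_max - lat_min) * L), L - 1)
    return row * L + col

def build_matrix(freq, loc, bounds, L):
    """V_sd = log(1 + frequency aggregated from square s to square d)."""
    G = np.zeros((L * L, L * L))
    for (i, j), g in freq.items():
        s = grid_index(*loc[i], bounds, L)
        d = grid_index(*loc[j], bounds, L)
        G[s, d] += g
    return np.log1p(G)

def nmf(V, K, n_iter=500, seed=0, eps=1e-12):
    """NMF by multiplicative updates for the Frobenius loss."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], K)) + eps
    H = rng.random((K, V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def cosine_similarity_matrix(W, H, eps=1e-12):
    """Entry (k, k') compares source vector w_k with destination vector h_k'."""
    Wn = W / (np.linalg.norm(W, axis=0, keepdims=True) + eps)
    Hn = H / (np.linalg.norm(H, axis=1, keepdims=True) + eps)
    return Wn.T @ Hn.T

# Toy data: four accounts in two corners of a 2 x 2 grid (hypothetical values).
loc = {"a": (0.1, 0.1), "b": (0.2, 0.2), "c": (0.9, 0.9), "d": (0.8, 0.8)}
freq = {("a", "b"): 5, ("c", "d"): 7, ("a", "c"): 1}
V = build_matrix(freq, loc, bounds=(0.0, 1.0, 0.0, 1.0), L=2)
W, H = nmf(V, K=2)
S = cosine_similarity_matrix(W, H)
print(np.round(S, 2))
```

A similarity close to 1 between a source vector and a destination vector indicates that the sources and destinations of that component occupy nearly the same squares, which is the situation described for all components except k = 7.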

Conclusion
We studied an exhaustive list of bank accounts of firms and remittances from source to destination within a regional bank with a high market share of loans and deposits in a prefecture of Japan. By studying such a network of money flow, we could uncover how firms conduct the underlying economic activities as suppliers and customers from the upstream side to the downstream side of the money flow. We aggregated the remittances that occurred for each pair of accounts into a link over the period from March 2017 to July 2019 (i.e., approximately two and a half years); the resulting network comprises 30K nodes and 0.28M links. We found that the statistical features of the network are actually similar to those of a production network on a nationwide scale in Japan [3], but with greater emphasis on the regional aspects.
The bowtie analysis revealed what we refer to as a "walnut" structure in which the core and upstream/downstream components are tightly connected within the shortest distances, typically a few steps. By quantifying the location of the individual account of a firm using the method of Hodge decomposition, we found that the Hodge potential of each node describes its location in the entire flow of money from the upstream side to the downstream side. In particular, the Hodge potentials are significantly correlated with the net flows of incoming and outgoing money and links, as well as with the position in the walnut structure. This implies that we can characterize the net demand/supply of each node and decompose the flows into those due to the difference in potentials and divergence-free flows.
In addition, the network structure uncovered in this study can be used in predicting the default of firms. In particular, because the financial information of small and medium-sized enterprises is often difficult to access, the credit risk management of banks will be improved by utilizing the information obtained from the network. Information on the network structure will also be useful in promoting the regional economy because the hubs of the GSCC can be firms playing a key role in the region. Studying the network of money flow can enable the prediction of what arises following an economic shock, which is essential in economic policymaking.
Furthermore, by using non-negative matrix factorization, we uncovered the fact that the entire flow can be considered as a combination of several significant factors. One factor has a feature whereby the remittance source is localized to the largest city in the region, while the destination is scattered. The other factors correspond to the economic activities specific to different local places, which can be interpreted as local activities of the economy.
We can consider several points that remain to be studied separately from the present work. While we aggregated the entire period in this paper, it would be interesting to determine how the network changes with time by examining the time-stamps recorded in every remittance. At time scales of days, weeks, and months, it is quite likely that there are intraday, weekly, and seasonal patterns of activities. More interestingly, under mild changes in the booms and busts of the regional economy on a relatively long time scale, the economic agents might change their behaviors possibly by changing peers in the transactions. Alternatively, under sudden changes due to natural disasters or pandemics, the agents can change their usual patterns abruptly. In other words, these are important aspects of a temporally changing network. Capturing such dynamic patterns of remittances allows banks to forecast the timing of transactions and manage their liquidity more effectively.
In addition, further investigation of the aspect of money flow amounts is warranted in the sense that the dominant driving force likely comes from "giant players" who demand or supply a large amount of money. Moreover, it would be interesting to select them in a subgraph by choosing only links with flow amounts that are larger than a certain threshold. These topics will be studied in our future work.

Appendix A: Network analysis
It would be beneficial to provide a set of stylized facts on the flow of money in our dataset of the bank accounts. In this appendix, we summarize the basic properties and statistics of the network.
A summary of network properties is given in this appendix, with the measures defined as follows.
• Clustering coefficient of a node i is defined as $e_i/(k_i(k_i - 1))$, where $k_i$ is the number of neighbors of i and $e_i$ is the number of actually connected pairs among all neighbors of i.
• Connectivity of a node i is the number of its neighbors (i.e., $k_i$). Neighborhood connectivity of a node i is defined as the average connectivity of all neighbors of i.
• Average shortest path of a node i is the average length of the shortest paths from i to any other node reachable from i. Let us denote it as $L_{\mathrm{avg}}(i)$.
• Closeness of a node i is defined as $1/L_{\mathrm{avg}}(i)$ (i.e., the reciprocal of the average shortest path of i).
• Eccentricity of a node i is the maximum non-infinite length of the shortest path between i and another node reachable from i in the network.
• Betweenness of a node i, $C_b(i)$, is defined by $C_b(i) = \sum_{s,t} \sigma_{s,t}(i)/\sigma_{s,t}$, where s and t are the starting and terminating nodes different from i, $\sigma_{s,t}$ is the number of shortest paths from s to t, and $\sigma_{s,t}(i)$ is the number of shortest paths from s to t such that i is on the path.
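Some of the measures above can be computed on a toy directed network as in the following minimal sketch. Two points are assumptions of the sketch rather than statements of the source: the neighbors of a node are taken to be all nodes linked to it in either direction, and $e_i$ counts directed links among those neighbors. Betweenness is omitted for brevity. The graph is hypothetical.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path lengths (in links) from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    del dist[source]  # keep only the other reachable nodes
    return dist

def path_measures(adj, i):
    """Average shortest path, closeness, and eccentricity of node i."""
    dist = bfs_distances(adj, i)
    if not dist:
        return None  # no other node is reachable from i
    l_avg = sum(dist.values()) / len(dist)
    return {"average_shortest_path": l_avg,
            "closeness": 1.0 / l_avg,
            "eccentricity": max(dist.values())}

def clustering_coefficient(adj, i):
    """e_i / (k_i (k_i - 1)) with neighbors taken in either direction (assumption)."""
    nbrs = set(adj.get(i, ())) | {u for u, targets in adj.items() if i in targets}
    nbrs.discard(i)
    k = len(nbrs)
    if k < 2:
        return 0.0
    e = sum(1 for u in nbrs for v in adj.get(u, ()) if v in nbrs and v != u)
    return e / (k * (k - 1))

# Hypothetical remittance network: links point from source to destination.
adj = {1: [2, 3], 3: [2], 2: [4], 4: []}
measures = path_measures(adj, 1)
cc = clustering_coefficient(adj, 1)
```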

Appendix B: Hodge decomposition
As explained in the main text, the Hodge decomposition plays an essential role in studying the network structure because it allows one to order the nodes quantitatively according to their connectivity to other nodes. One way to understand it is to study some simple examples. One of the simplest nontrivial examples is illustrated in Fig. B.1. The network illustrated on the far left ("Original Flow") consists of three nodes with the given flows. The flows are decomposed into "Circular flow" and "Gradient Flow", as illustrated. The sum of the two flows is equal to the original flow: for example, from node no.1 to node no.2, the circular flow is -1/3 (as it is +1/3 in the other direction) and the gradient flow is +4/3, which add up to 1, the value of the original flow. Also, the gradient flow satisfies the property (7). Furthermore, the gradient flow satisfies Eq. (8) with all the weights equal to one ($w_{ij} = 1$) and the Hodge potential $(\phi_i) = (+2/3, -2/3, 0)$. Figure B.2 shows a visualization of this network with the Hodge potential $\phi_i$ as the vertical coordinate. In this illustration, it is straightforward to see that the gradient flows are equal to the differences between the Hodge potentials of the relevant nodes.

[Figure caption fragment: visualization of the same network with the vertical coordinate determined by the Hodge potential and the horizontal coordinate determined by the spring-charge method.]
In Fig. B.3, the nodes are placed in a left-right symmetric manner on the left panel, although the links do not have the same symmetry. Nodes no.1 and no.3 are placed at the same vertical position. If one used the total out-flow as a measure of rank, they would be placed just like this, as both of them have a total out-flow equal to three. The right panel, however, shows a different picture: nodes no.1 and no.3 are placed at different heights because their Hodge potentials differ, which in turn reflects the difference in the way they are connected to other nodes.
The example in Fig. B.3 shows the power of the Hodge decomposition in a different manner: on the left panel, we do not see any symmetry and the roles of the nodes are not apparent. In contrast, the right panel shows the left-right symmetry except for node no.6. Nodes no.1 and no.5 play very similar roles in this network, the only difference being that no.1 is connected to no.6. The same is true for nodes no.4 and no.3. Without the Hodge decomposition, this fact is rather difficult to see.
As seen in these examples, the Hodge potential plays an important role in clarifying the whole structure of the network.
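The three-node example of Fig. B.1 can be reproduced with a short sketch. The original flows are not listed in the text, so the sketch assumes unit flows on the links 1→2, 1→3, and 3→2, which is the configuration consistent with the quoted values. The potential is obtained by solving the graph Laplacian equation with unit weights under the constraint that the potentials sum to zero. Each linked pair is assumed to carry flow in one direction only, and the function names are illustrative.

```python
from fractions import Fraction

def solve(A, b):
    """Gauss-Jordan elimination with exact rational arithmetic."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [x / M[col][col] for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]

def hodge_decomposition(n, flows):
    """Split flows {(i, j): amount} into gradient and circular parts (w_ij = 1)."""
    net_out = [Fraction(0)] * n                      # net outflow of each node
    lap = [[Fraction(0)] * n for _ in range(n)]      # graph Laplacian
    for (i, j), f in flows.items():
        net_out[i] += f
        net_out[j] -= f
        lap[i][i] += 1
        lap[j][j] += 1
        lap[i][j] -= 1
        lap[j][i] -= 1
    # Adding 1 to every entry selects the solution whose potentials sum to zero.
    shifted = [[x + 1 for x in row] for row in lap]
    phi = solve(shifted, net_out)
    gradient = {(i, j): phi[i] - phi[j] for (i, j) in flows}
    circular = {(i, j): flows[(i, j)] - gradient[(i, j)] for (i, j) in flows}
    return phi, gradient, circular

# Indices 0, 1, 2 stand for nodes no.1, no.2, no.3 (assumed unit flows).
flows = {(0, 1): 1, (0, 2): 1, (2, 1): 1}
phi, gradient, circular = hodge_decomposition(3, flows)
print("potential:", [str(p) for p in phi])
```

The resulting gradient flow on each link is the difference of the potentials at its two ends, and the circular part has zero net flow at every node.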

Appendix C: NMF basis vectors: spatial concentration
As we showed in the main text, the components of the NMF basis vectors are concentrated in small geographical regions because, in most cases, the vector components have peaks at specific locations. In this appendix, we quantify the concentration and the peaks, and show the results.
Recall that the entire region was divided into L by L small squares $R_\ell$ ($\ell = 1, 2, \ldots, L^2$) in a lattice grid, where we set L = 100. A basis vector v, which is either a column vector $w_k$ of W or a row vector $h_k$ of H, has its components at the indices $\ell$, each of which corresponds to a different location $R_\ell$. Because all components of the basis vector v are non-negative by construction of the NMF, the components of v can be represented as a heatmap in the geographical region. This representation was used in Fig. 11, Fig. 12, Fig. 13, and Fig. 14.
Let $r_\ell$ be the center of the square $R_\ell$, and $C_\ell$ be a circle centered at $r_\ell$ with a certain radius. We choose the radius as 5 km in order to avoid overlapping of circles. The choice of the radius is not essential because the circle is not related to the NMF and is used only to quantify the geographically localized structure. For a vector v and a circle $C_\ell$, let us define the level of concentration as the share of the components of v that falls within $C_\ell$, and denote its maximum over $\ell$ by $\beta_{\max}(v)$; a circle with a high level of concentration marks a peak of the vector components.

In all the cases (except $h_k$ for k = 7), the basis vectors $w_k$ and $h_k$ have such peaks, meaning that the corresponding sources and destinations are well localized in the geographical region. The circles shown in Fig. 11, Fig. 12, Fig. 13, and Fig. 14 are given by this procedure. Figure C.1 shows the levels of concentration $\beta_{\max}(v)$ for all the pairs of $w_k$ and $h_k$, depicted together as 10 circles corresponding to the k = 1, 2, ..., K = 10 different NMF components. The numbers in the circles are the levels of concentration. The levels of concentration exceed 23% in both the source and destination figures, with one exception; therefore, most basis vectors of bank transfers are localized geographically. Since the circular areas are positioned around the centers of cities, the geographically localized properties are thought to reflect the economic activity in those local areas. This is how we identified the city names in each box of Fig. 11, Fig. 12, Fig. 13, and Fig. 14. The single exception is the basis vector $h_k$ for k = 7, for which the level is only 9%. This means that the destinations are spread over the prefecture of Shiga and also Kyoto.
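The quantification of spatial concentration can be sketched as follows. The sketch assumes that the level of concentration is the share of a basis vector's total weight that falls within a circle of fixed radius around a cell center, and that the reported level is the maximum of this share over all cells. The cell size and the toy vector are hypothetical, and the brute-force loop is meant for small grids only, not for L = 100.

```python
import math

def concentration_levels(v, L, cell_km, radius_km):
    """Share of the total weight of v inside a circle around each cell center.

    v is a flat list of length L * L; index l maps to row l // L and column l % L.
    """
    total = sum(v)
    centers = [((l % L + 0.5) * cell_km, (l // L + 0.5) * cell_km)
               for l in range(L * L)]
    levels = []
    for cx, cy in centers:
        inside = sum(val for val, (x, y) in zip(v, centers)
                     if math.hypot(x - cx, y - cy) <= radius_km)
        levels.append(inside / total if total > 0 else 0.0)
    return levels

# Hypothetical basis vector on a 4 x 4 grid of 5 km cells, peaked at cell 5.
L = 4
v = [0] * (L * L)
v[5], v[4], v[6], v[15] = 6, 1, 1, 2
levels = concentration_levels(v, L, cell_km=5.0, radius_km=5.0)
beta_max = max(levels)
peak_cell = levels.index(beta_max)
```

A vector whose weight is spread evenly over the grid would give a small maximum share, which is the situation described for the destination vector of k = 7.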