Geography versus topology in the European Ownership Network

In this paper, we investigate the network of ownership relationships among European firms and its embedding in the geographical space. We carry out a detailed analysis of geographical distances between pairs of nodes, connected by edges or by shortest paths of varying length. In particular, we study the relation between geographical distance and network distance in comparison with a random spatial network model. While the distribution of geographical distance can be fairly well reproduced, important deviations appear in the network distance and in the size of the largest strongly connected component. Our results show that geographical factors allow us to capture several features of the network, while the deviations quantify the effect of additional economic factors at work in shaping the topology. The analysis is relevant to other types of geographically embedded networks and sheds light on the link formation process in the presence of spatial constraints.


Introduction
A large body of work has shown how the theory of complex networks can be fruitfully applied to a wide range of domains [1][2][3]. The case when complex networks are embedded in a geometric space (e.g. geographical) has been attracting growing attention (for a recent extensive review, see [4]). Indeed, when the spatial location of the nodes plays a role in the formation of links, the geometric space contributes to shaping the topology of the network [5][6][7] and, as a consequence, affects the dynamics that may take place on the network.
Most of the works on spatially embedded complex networks have focused on the geographical distance between first neighbours. However, in all networks in which information or matter flows along links, it is of interest to understand how the geographical distance of nodes that are higher-order neighbours relates to the length of their shortest path. In this paper, we investigate the relation between geographical distance and network distance. In particular, we study how the geographical distance between higher-order neighbours is predicted by the distance between first neighbours and how this affects the small world (SW) properties of the network. We address these issues in an empirical dataset of a directed and weighted network using high-resolution geographical locations.
Empirical works have investigated several contexts in which geographical space is relevant, including transportation services (e.g. airport network), mobility (e.g. commuting), infrastructures (e.g. power grids, roads and the Internet) and social networks (e.g. friendship and phone calls). For instance, the worldwide airport network has been thoroughly investigated [8,9], as well as various public transportation services at the national [10] or metropolitan level [11]. In all the above-mentioned contexts, the effect of space is that links are associated with a cost growing with geographical distance. As a result, the probability that two given nodes are connected decreases exponentially with distance. A related fact that emerges in a number of contexts is the so-called 'gravity law', originally introduced in the geography literature during the 1960s. According to this law, the intensity of the connection between two nodes is proportional to the attractivity of the nodes and is inversely proportional to the square (or variant exponents) of the distance.
From a theoretical point of view, models of spatial networks can be grouped into several classes [4]. Typically, nodes are assigned a given location either according to a convenient geometry (e.g. a lattice) or a distribution (e.g. uniformly scattered) or according to some empirical pattern (e.g. the airport location). On the other hand, links are usually assigned stochastically according to a probability that depends on the distance between two nodes. Some models assume that the network grows with a preferential attachment so that the linking probability depends also on the current degree of the nodes. In general, as is done in fitness models, the linking probability depends on a state variable associated with the nodes. In turn, this variable can be quenched or can evolve dynamically with the topology.
Overall, many models have focused on reproducing the scale-free nature of the degree distribution together with the spatial distribution of the nodes [6,12]. In addition, others offer some insights into how spatial location may constrain the network topology [13].
Among the domains in which geographically embedded networks have been investigated, the case of economic networks has so far received very little attention. Yet, geographical space plays an important role in economic activities, as witnessed by a rich stream of works on economic geography [14]. In particular, in this field there is an open debate on whether a firm's spatial location (e.g. proximity between firms) is more important than the firm's nonspatial relationships (e.g. R&D collaborations, ownership and venture capital, board interlock) for its competitiveness [15]. For instance, it has been shown that certain corporate practices (i.e. the 'poison pill' versus the 'golden parachute') spread among firms via interlock links, whereas others do so via proximity (i.e. firms simply adopt the practice of nearby firms) [16]. Moreover, a related issue is the extent to which network relations enable economic actors to access knowledge beyond their organizational and geographical boundaries [14,17,18]. Finally, economic networks have been recognized to be one of the main research challenges in the field of complex networks [19].
In this paper, we analyse the ownership network of European firms in relation to their geographical location. An ownership relation refers to the fact that a firm owns fully or in part another firm. Ownership plays a role not only in the control that shareholders can exert over a firm [20], but also in the spread of information and business practices in the corporate community. Several properties of these networks have been investigated so far. First, small world (SW) properties have been observed in national samples [21,22]. Moreover, the scale-free nature of degree and scaling laws, relating degree and portfolio volumes, have been observed in various stock markets [20,23], as well as in foreign direct investments [24]. To some extent, the geographical information has been considered in [24], where ownership links are aggregated within regions at the NUTS3 level (i.e. roughly corresponding to provinces). Similarly, in [25] the aggregation is at the level of countries, and the links are used to simulate the propagation of financial crises worldwide. However, to our knowledge no previous study has used the geographical location of firms at the postal code level in a large sample.
In our analysis, we find an exponential decay of the linking probability with the distance, similar to what was found in transportation and infrastructure networks. We observe, instead, deviations from the gravity law in the weights of the links. However, the main contribution of the paper is to go beyond the level of first neighbours and to analyse the relationship between network distance and geographical distance. So far, a study of this kind has been carried out only for the metropolitan transportation network of Berlin [11], in which distances are on a much smaller scale. We compare our empirical results with those obtained with a random spatial model (RSM) that belongs to the class of geometric graph models [4]. We keep the empirical location of nodes and we apply a link reshuffling procedure that preserves the degree sequence and is also subject to a tunable spatial constraint. We then analyse how SW properties are affected by geographical distance (i.e. we look at the shortest path of nodes located at different geographical distances). We find that the empirical network is a 'smaller world' than it would be without spatial constraints.
In addition, we analyse how the distribution of geographical distance, which is exponential among first neighbours, is affected when we look at pairs of nodes at higher network distance. Such a distribution turns out to be fairly well reproduced by the RSM. Overall, our results show that geographical space allows the capture of a number of features of ownership networks. Nevertheless, the important discrepancies observed between the data and the model point to the role played by economic factors. In particular, the organization of economic control in the empirical data is significantly more hierarchical than would emerge solely from geographical factors.

Data
The dataset covers about 200 000 firms located in 14 EU countries (see table 2), their geographical location at postal code level and about 188 000 ownership relationships among them. Ownership data are obtained from the Orbis database of Bureau Van Dijk (2007 release), by selecting those firms that have at least one ownership relationship with another firm in Europe (i.e. firms owning shares or having a shareholder firm in the same country or in another EU country). The geographical coordinates of the postal codes of the firms are obtained from the TeleAtlas GIS database. Our dataset allows us to construct the European ownership network, in which the nodes correspond to firms and the links correspond to ownership relationships. The adjacency matrix of the network is denoted by $A$, where $A_{ij} = 1$ if there is an edge (or link) $i \to j$ and $A_{ij} = 0$ otherwise. The network is directed (i.e. $A$ is not symmetric) and the direction is taken with the following convention: $A_{ij} = 1$ implies that $i$ holds a share of $j$. The network is also weighted and is thus associated with the matrix $W$, with $W_{ij} \in (0, 1]$ corresponding to the ownership share, i.e. the fraction of the value of $j$ owned by $i$. There is a constraint on the weights of the incoming links, $\sum_i W_{ij} = 1$, but not on those of the outgoing links, because a single firm can have large shares in many other firms.
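The text does not state how the distance between two postal codes is computed from their coordinates. A minimal sketch, assuming a great-circle (haversine) distance on a spherical Earth of radius 6371 km; the function name and the sample coordinates are illustrative and not taken from the dataset:

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical Earth is an assumption


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Two points on the equator, a quarter of the circumference apart.
quarter = haversine_km(0.0, 0.0, 0.0, 90.0)
# Two firms sharing the same postal code are at distance zero.
same_place = haversine_km(45.0, 9.0, 45.0, 9.0)
```

Any other geodesic formula could be substituted without affecting the rest of the analysis, as long as the same distance function is used for the data and for the randomized networks.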
A peculiar aspect of these data concerns the paths in the network. We recall that a path between two nodes $i$ and $j$ that are not directly connected is a sequence of adjacent nodes leading from $i$ to $j$. In many networks, it is not possible to assign a precise meaning to paths. For instance, in the airport network, while a connection between airports A and B reflects the number of nonstop flights per day from A to B, one cannot derive much information on indirect connections, i.e. on the number of passengers who travel from airport A to B and then on to airport C in the same journey. In contrast, in the case of ownership relationships, if firm A owns a share $W_{AB}$ of B and B owns a share $W_{BC}$ of C, this implies an indirect relationship in which A indirectly owns a share $W_{AB} W_{BC}$ of C [20].
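The multiplicative rule for indirect ownership can be illustrated with a short sketch. The firm labels, the share values and the helper name `indirect_share` are hypothetical and serve only to show the arithmetic along a single path:

```python
from math import prod


def indirect_share(weights, path):
    """Product of the direct shares along a path given as a list of firm labels."""
    return prod(weights[(s, t)] for s, t in zip(path, path[1:]))


# Hypothetical shares: A owns 60% of B, and B owns 50% of C.
W = {("A", "B"): 0.6, ("B", "C"): 0.5}
share_A_in_C = indirect_share(W, ["A", "B", "C"])  # 0.6 * 0.5
```

When several paths lead from A to C, the contributions of the individual paths would have to be combined; that step is outside the scope of this sketch.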

Link reshuffling and spatial constraints
Throughout the paper, we aim at assessing the extent to which the empirical properties of the European ownership network are consistent with those of a network obtained from a random link formation process. In this section, we introduce an algorithm to reshuffle the links in the network, taking into account some structural and geographical constraints.
Let us denote the network by the graph $G = (V, E)$, where $V$ is the set of vertices and $E = \{(s, t)\}$ is the set of edges connecting the vertices, with $s$ being the origin (or source) vertex and $t$ the target vertex of the link $(s, t)$. Let $W_{s,t}$ be the share of firm $t$ held by firm $s$. There are no loops in the network, i.e. $s \neq t$ for every link $(s, t)$. We want to construct a randomized network that preserves the following properties:
1. Degree sequence, i.e. the number of outgoing and incoming links of each node.
2. Ownership share. As mentioned in section 2, the total weight of incoming links has to sum up to one.
3. Geographical location. If geography matters, then constraining the origin and the destination of a link should allow us to recover some structural properties of the network.
As in the Maslov-Sneppen algorithm [26][27][28], the basic idea is to choose random pairs of links and swap the sources of the two links. This automatically satisfies constraints 1-2, provided that each new link of the pair is accepted if and only if: (i) it is not a loop (the no-loop condition); and (ii) it does not already exist (the no-multiple-links condition). A possible implementation consists in randomly choosing pairs of links and rejecting those swaps that violate the no-loop and no-multiple-links conditions. A more instructive way is to determine the pairs of links that need to be excluded a priori from the set of those eligible to be swapped. Consider the two links $(s_1, t_1)$ and $(s_2, t_2)$; the patterns to be excluded are listed below:
• coinciding sources: $s_1 = s_2$ (case 1);
• coinciding targets: $t_1 = t_2$ (case 2);
• chains: $t_1 = s_2$ or $s_1 = t_2$ (case 3);
• Z-pattern: the edge $(s_2, t_1)$ exists in the original network (case 4);
• Z-mirror pattern: the edge $(s_1, t_2)$ exists in the original network (case 5).
An example of each pattern is provided in table 1, where it is straightforward to see that swapping the source nodes $s_1$ and $s_2$ would create loops or multiple links or both. One can also verify that these are all the possible patterns to be excluded. After excluding the patterns above, we are sure that each link either will be swapped or cannot be swapped in the current configuration of the network. Compared with the Maslov-Sneppen algorithm, this method makes more apparent the topological limits to the reshuffling procedure. Incidentally, it is enough to loop once along the list of links to ensure that all the eligible links have been effectively swapped.
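The five exclusion patterns and the single pass over the link list can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `swap_allowed` and `reshuffle` are hypothetical, the partner link is drawn uniformly among the eligible ones, link weights are omitted, and the toy edge list is illustrative.

```python
import random


def swap_allowed(link1, link2, edges):
    """Return True if swapping the sources of the two links creates no loop or multiple link."""
    (s1, t1), (s2, t2) = link1, link2
    if s1 == s2:                # case 1: coinciding sources
        return False
    if t1 == t2:                # case 2: coinciding targets
        return False
    if t1 == s2 or s1 == t2:    # case 3: chains (the swap would create a loop)
        return False
    if (s2, t1) in edges:       # case 4: Z-pattern
        return False
    if (s1, t2) in edges:       # case 5: Z-mirror pattern
        return False
    return True


def reshuffle(edge_list, rng):
    """One pass over the link list, swapping sources with a randomly chosen eligible link."""
    links = list(edge_list)
    edges = set(links)
    for i in range(len(links)):
        candidates = [j for j in range(len(links))
                      if j != i and swap_allowed(links[i], links[j], edges)]
        if not candidates:
            continue            # this link cannot be swapped in the current configuration
        j = rng.choice(candidates)
        (s1, t1), (s2, t2) = links[i], links[j]
        edges -= {links[i], links[j]}
        links[i], links[j] = (s2, t1), (s1, t2)
        edges |= {links[i], links[j]}
    return links


# Toy directed edge list (illustrative node labels).
links = [(1, 2), (1, 3), (4, 6), (5, 6), (7, 8), (8, 7),
         (9, 11), (10, 12), (10, 11), (13, 15), (14, 16), (13, 16)]
shuffled = reshuffle(links, random.Random(0))
```

Because only sources are exchanged, every node keeps its in-degree and out-degree. The quadratic search for candidates is acceptable for a toy example but would need an index for a network of the size considered here.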
The randomization procedure described above, which we call the random direct model (RDM), does not consider the geographical location. In each swap, the source $s_2$ is chosen randomly among all the existing sources in the list of eligible links. Clearly, sources with large out-degree have a higher probability of being selected, because they occur in several links, but their location does not matter. We thus introduce the RSM, in which the choice of the source $s_2$ is stochastically dependent on its geographical proximity to source $s_1$, according to an exponentially decaying probability,
$$P(s_2 \mid s_1) \propto \exp\left[-d(s_1, s_2)/d_c\right],$$
where $d(s_1, s_2)$ denotes the distance between the two sources to be swapped and $d_c$ is a characteristic distance. The value of $d_c$ modulates the role of geographical distance. For $d_c \to 0$, the source $s_2$ tends to be the firm closest to $s_1$ among all the eligible sources. For $d_c \to \infty$, the source $s_2$ tends to be chosen randomly and in this limit RSM coincides with RDM.

Table 1. Example links for the excluded patterns (source and target of each link).
The source, s:  1  1  4  5  7  8   9  10  10  13  14  13
The target, t:  2  3  6  6  8  7  11  12  11  15  16  16

Finally, it should be noted that the procedure of the RSM described above shares some basic principles with the class of geometric graph models, as classified in [4]. In particular, in [6], nodes are located in a d-dimensional lattice and are assigned the degree from a given distribution. Then, links are formed by connecting each node to nodes within a given radius until the degree is reached. In contrast, in [12] the probability of forming a link is a power law of the distance with exponent δ, which plays the role of control parameter. Geometric graph models have also been combined with the fitness model [29]. In [30] links are formed depending both on the distance and on the fitness values associated with the two nodes of each pair.
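The distance-dependent choice of the second source can be sketched as below, assuming that the selection probability is proportional to exp(-d/d_c) over the eligible sources. The function names and the sample distances are hypothetical; subtracting the minimum distance is a numerical convenience that leaves the normalized probabilities unchanged.

```python
import math
import random


def selection_weights(distances_km, d_c):
    """Unnormalized weights proportional to exp(-d / d_c) for each eligible source."""
    d_min = min(distances_km)
    # Shifting by d_min avoids underflow of every weight when d_c is very small.
    return [math.exp(-(d - d_min) / d_c) for d in distances_km]


def choose_partner(distances_km, d_c, rng):
    """Draw the index of the source s2 given its distances from s1."""
    weights = selection_weights(distances_km, d_c)
    return rng.choices(range(len(distances_km)), weights=weights, k=1)[0]


distances = [5.0, 50.0, 500.0]   # hypothetical distances of three eligible sources from s1
rng = random.Random(42)
nearest_only = [choose_partner(distances, 1e-6, rng) for _ in range(20)]   # d_c -> 0
uniform_weights = selection_weights(distances, math.inf)                   # d_c -> infinity
```

With a vanishing d_c the nearest eligible source is always drawn, while an infinite d_c gives equal weights, which corresponds to the RDM limit described in the text.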
In the present work, the location of nodes is taken from the empirical data of the firm's location. Thus, the concentration of nodes in space is quite heterogeneous (e.g. high density in metropolitan areas and low density in rural areas). The degree is also taken from the empirical data and follows approximately a power law. The probability of forming a link decays exponentially with the distance. Our aim is not to reproduce the distribution of degree nor the distribution of the distance between first neighbours. Instead, we use both degree and distance as input to determine the properties they induce in terms of: (1) geographical distance between higher-order neighbours, (2) small world and (3) connected components.

Basic network statistics
As a preliminary step, we perform a coarse-grained analysis of the location of sources and destinations of links. For each country $r$, we compute an integration index, defined as the ratio of intra-country links over the total number of links departing from that country, $\sum_{i \in c_r, j \in c_r} A_{ij} / \sum_{i \in c_r, j} A_{ij}$, where $c_r$ is the set of firms located in country $r$. This is a common indicator of economic integration in the economic geography literature [31]. We extend this index to a weighted integration index by taking into account the weights of the links, $\sum_{i \in c_r, j \in c_r} W_{ij} / \sum_{i \in c_r, j} W_{ij}$. In both cases, we find that about 80% of the ownership relations (both in number and in weight) are intra-country (i.e. the owner and the owned are in the same country); see table 2. This first result implies that, already at the level of countries, a preference for links to be formed at short distance is apparent. Notable exceptions are Switzerland and Luxembourg. A plausible explanation is that many companies set their headquarters in these countries.
Moving to the basic measures of network connectivity, since this is a directed network, we have to distinguish between the in-degree of a node, denoted by $k_{in}$ (i.e. the number of incoming links), and the out-degree, denoted by $k_{out}$ (i.e. the number of outgoing links). As shown in figure 1 (left), in our dataset the out-degree has a very broad distribution. Note that the tail of the distribution deviates from the straight line that fits the bulk of the data. This means that the frequency of firms with out-degree in the upper end of the range exceeds what could be expected even based on a power-law behaviour. In the context of ownership, the out-degree corresponds to the number of firms in which a given shareholder owns shares. This number is related to the level of portfolio diversification of a shareholder [20], and it is known to correlate with the volume of the portfolio of the shareholder.
Thus, the result is essentially in line with what was previously observed in other ownership datasets [20,23,24]. On the other hand, indegree corresponds to the number of shareholders of a firm and, consequently, to the level of ownership concentration, although here weights have to be taken carefully into account [20]. In our dataset, as in previously studied ones, the in-degree distribution is bounded because of the reporting system: only important shareholders are recorded in the database. Thus, the distribution of in-degree displays a clear cut-off.
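The integration index defined at the beginning of this section, and its weighted variant, can be computed as in the following sketch. The firm labels, country codes and weights are hypothetical, and the function name `integration_index` is an illustrative choice:

```python
def integration_index(links, country, r, weights=None):
    """Fraction of the links (or of their weight) departing from country r that stay in r."""
    intra = total = 0.0
    for i, j in links:
        if country[i] != r:
            continue                      # only links whose owner is located in r
        w = 1.0 if weights is None else weights[(i, j)]
        total += w
        if country[j] == r:
            intra += w
    return intra / total if total else float("nan")


# Hypothetical toy data: firms a and b in Italy, c and d in Germany.
country = {"a": "IT", "b": "IT", "c": "DE", "d": "DE"}
links = [("a", "b"), ("a", "c"), ("b", "c"), ("d", "a")]
W = {("a", "b"): 1.0, ("a", "c"): 0.2, ("b", "c"): 0.3, ("d", "a"): 0.5}

index_IT = integration_index(links, country, "IT")               # 1 of 3 links
weighted_index_IT = integration_index(links, country, "IT", W)   # 1.0 of 1.5 total weight
```

In this toy case the weighted index is larger than the unweighted one because the single domestic link carries full ownership, whereas the two cross-border links carry small shares.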
We then analyse the weight of the ownership relations. Interestingly, the empirical observations suggest a tri-modal distribution of weights. As shown in figure 1 (right), the majority of the relations imply either full ownership (i.e. 100% of the shares) or a very small share, while a smaller but considerable number of links have a weight just above 50%. This evidence can be interpreted as follows. The first pattern corresponds to a relationship between a holding company and a subsidiary. The second one corresponds to an investment without the aim of controlling the target firm. The third one corresponds to an evident intention to control the target firm without owning all the shares, since a share just above 50% is the minimum that allows full control.
In terms of components, the largest weakly connected component (LWCC) in the network comprises 32.6% (70 070) of all firms. Since all the other components are significantly smaller (the second largest one contains only 134 nodes), from now onwards we focus only on the LWCC.
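Extracting the weakly connected components amounts to ignoring link directions and collecting the nodes reachable from each other. A minimal sketch with a breadth-first search follows; the function name and the toy links are illustrative:

```python
from collections import defaultdict, deque


def weakly_connected_components(nodes, links):
    """Return the weakly connected components as sets of nodes, largest first."""
    neighbours = defaultdict(set)
    for s, t in links:                    # link direction is ignored
        neighbours[s].add(t)
        neighbours[t].add(s)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            u = queue.popleft()
            for v in neighbours[u]:
                if v not in seen:
                    seen.add(v)
                    component.add(v)
                    queue.append(v)
        components.append(component)
    return sorted(components, key=len, reverse=True)


# Toy network: firms 1 and 3 both own shares of 2, firm 4 owns 5, firm 6 is isolated.
components = weakly_connected_components(range(1, 7), [(1, 2), (3, 2), (4, 5)])
lwcc = components[0]
```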

Geographical distance along links
We first analyse the geographical distance between all pairs of nodes connected by a link. As shown in figure 2 (left), we find that the frequency distribution of links decreases exponentially with the distance: $P_{ij} \propto \exp(-d_{ij}/d^*_c)$, with $d^*_c = 1/0.0027 \approx 370.37$ km ($R^2 = 0.949$). This result is consistent with previous empirical works on geographically embedded networks. The airport [7] and the Internet router network [32,33] also display exponential decay in the distance probability. This finding should not be confused with the so-called 'gravity law' from the geography literature [34,35], which predicts that the strength $F_{ij}$ of a relationship between two entities $i$ and $j$ positioned in the geographical space decreases as a power law of their distance $d_{ij}$:
$$F_{ij} \propto \frac{m_i m_j}{d_{ij}^{\alpha}},$$
where $m_i$ and $m_j$ represent the attractivity of the entities and $\alpha$ is a parameter. In our context, the role of attractivity could be played by some proxy of the size of the firms, which unfortunately was not available in our dataset. For the sake of completeness, we analysed how the link weight depends on the distance, assuming that $m_i = m_j = 1$. Figure 2 (right) shows the frequencies of edges between two firms at a given distance and with a given weight value. For a given geographical distance interval, the colour code indicates how the edge weights are distributed. We can see that for small distances (i.e. roughly $d < d^*_c \approx 370$ km), most ownership relationships are associated with shares close to 100%. This would suggest that many holding companies have subsidiary firms located nearby, e.g. in the same city. In many cases, a subsidiary is located in the same postal code as the owner company; consequently, the two companies share the same headquarters location. In contrast, at larger distance, most relationships involve small shares (<5%). This finding may be due to the fact that at short distances, companies have better possibilities to monitor the management of the companies they own and to influence their decisions.
The investment is perceived to be safer and thus a higher share is invested. In contrast, longer distances may imply differences in culture and legal settings, in addition to supervision difficulties, making the investment riskier. Consequently, shareholders are willing to hold smaller shares. However, it is important to observe that about 30% of the relationships are still associated with shares of more than 95%. There is also a smaller but not negligible proportion of shares of around 50% that, similarly, remains constant with distance. Even by averaging weight values in each distance bin, we do not obtain a monotonic decay of the weight with the distance. Thus, our results seem to deviate from the prediction of the gravity law.
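The characteristic distance of an exponential decay can be estimated from binned link frequencies by a straight-line fit in semi-logarithmic scale. The sketch below assumes an ordinary least-squares fit of log-frequency against distance, which is one plausible procedure and not necessarily the one used for figure 2; the synthetic frequencies are generated from the decay rate 0.0027 per km quoted in the text, and the function name is hypothetical.

```python
import math


def fit_exponential(distances_km, frequencies):
    """Least-squares fit of log(frequency) = intercept + slope * distance.

    Returns the characteristic distance -1/slope and the R^2 of the linear fit.
    """
    xs = list(distances_km)
    ys = [math.log(f) for f in frequencies]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return -1.0 / slope, 1.0 - ss_res / ss_tot


# Synthetic, noise-free frequencies at bin centres 50, 150, ..., 950 km.
bins_km = [50.0 * (2 * b + 1) for b in range(10)]
freqs = [math.exp(-0.0027 * d) for d in bins_km]
d_c, r2 = fit_exponential(bins_km, freqs)
```

On real data, empty bins must be dropped before taking logarithms, and the choice of bin width affects the estimate.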

A small geographical world?
SW properties have been found in most empirical complex networks, including networks of corporate board membership [36] and ownership [21,22]. We first verify that the European Ownership Network is also a SW; we then investigate how this property interacts with geographical distance.

Directed small world (SW).
A weaker definition of SW is that the average network distance (i.e. the average path length) of the graph is comparable to that of a random graph of the same size. A stronger definition of SW implies that, in addition, the network has a high clustering coefficient when compared with a random graph [37]. We recall that the clustering coefficient is the average ratio of the number of existing links among the neighbours of a node $i$ to the number of all their possible links (see [38]):
$$C = \frac{1}{n} \sum_i C_i, \qquad C_i = \frac{[(A + A^T)^3]_{ii}}{2\,[k^{tot}_i (k^{tot}_i - 1) - 2 k^{\leftrightarrow}_i]},$$
where $k^{tot}_i$ is the total degree of $i$ and $k^{\leftrightarrow}_i$ is the number of bilateral edges between $i$ and its neighbours. On the other hand, the average path length is the average number of edges along all the shortest paths connecting all pairs of nodes,
$$\bar{\ell} = \frac{1}{n(n-1)} \sum_{i \neq j} \ell_{ij},$$
with $\ell_{ij}$ being the length of the shortest path connecting $i$ and $j$ [3]. As benchmarks to compare the values of $C$ and $\bar{\ell}$, we consider two synthetic networks. The first one is an undirected Erdős-Rényi random graph (Erdős-Rényi model, ERM) [2] with the same number of nodes and edges as that in the empirical network. The second one is generated with the reshuffling method described in section 3 and is referred to as RDM. In the case of ERM, analytical expected values of the clustering $C$ and of the average path length $\bar{\ell}$ are known. For the former, it is simply $C = k/n$, where $k$ is the average degree and $n$ the number of nodes. For the latter, $\bar{\ell} = \ln(n)/\ln(k)$ [3]. In the case of the reshuffled network, we can measure the quantities only empirically. As shown in table 3, we find that $\bar{\ell}_{actual} \approx \bar{\ell}_{random}$ and $C_{actual} \gg C_{random}$ for both synthetic benchmarks. $\bar{\ell}$ takes similar values in the three networks. In contrast, $C$ in the empirical network is almost $10^3$ times larger than that in the ERM and about 20 times larger than that in the reshuffled network RDM. The difference in $C$ between ERM and RDM reflects the impact of imposing the constraint on the degree sequence. Our result is in line with those of [21,22,39].
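The two ingredients of the comparison can be sketched in a few lines. The first function evaluates the Erdős-Rényi expectations C = k/n and ln(n)/ln(k) quoted above. The second computes a directed clustering coefficient, assuming that the definition of [38] takes the form C_i = [(A + A^T)^3]_ii / (2 [k_tot (k_tot - 1) - 2 k_bilateral]); this form, the convention C_i = 0 when the denominator vanishes, the function names and the toy sizes are all assumptions of the sketch.

```python
import math


def er_benchmarks(n, m):
    """Expected clustering and average path length of an undirected random graph
    with n nodes and m edges, using C = k/n and l = ln(n)/ln(k)."""
    k = 2.0 * m / n                       # average degree
    return k / n, math.log(n) / math.log(k)


def directed_clustering(nodes, edges):
    """Average over nodes of the directed clustering coefficient (assumed form, see lead-in)."""
    edge_set = set(edges)

    def a(i, j):                          # adjacency entry A_ij
        return 1 if (i, j) in edge_set else 0

    def sym(i, j):                        # entry of A + A^T
        return a(i, j) + a(j, i)

    values = []
    for i in nodes:
        nbrs = [j for j in nodes if j != i and sym(i, j) > 0]
        k_tot = sum(sym(i, j) for j in nbrs)
        k_bil = sum(a(i, j) * a(j, i) for j in nbrs)
        denom = 2 * (k_tot * (k_tot - 1) - 2 * k_bil)
        if denom == 0:
            values.append(0.0)            # convention for nodes with too few neighbours
            continue
        closed = sum(sym(i, j) * sym(j, h) * sym(h, i)
                     for j in nbrs for h in nbrs if h != j)
        values.append(closed / denom)
    return sum(values) / len(values)


c_er, l_er = er_benchmarks(1000, 5000)                                   # hypothetical sizes
c_cycle = directed_clustering([0, 1, 2], [(0, 1), (1, 2), (2, 0)])       # one directed triangle
c_full = directed_clustering([0, 1, 2], [(0, 1), (1, 0), (1, 2), (2, 1), (2, 0), (0, 2)])
```

The double loop over neighbours is quadratic in the degree, so hubs with very large out-degree would call for a sparse-matrix implementation on a network of realistic size.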

SW properties at varying geographical distance.
The fact that the European ownership network is an SW (in the stronger sense) suggests that the geographical location of companies does not matter much for the exchange of business practices and information, and for exerting corporate control. Indeed, the existence of shortcuts shrinks the network distance, allowing firms to reach other firms located far away in the network. An important question that, to our knowledge, has not been addressed so far is to what extent these shortcuts also reduce the geographical distance. In particular, when a network is embedded in geographical space, we can ask how the typical network distance depends on the geographical distance. Since most of the links are between nodes at short distance, one could expect that the further two nodes are geographically apart, the longer is the shortest path that connects them. Figure 3 (top left) shows the distribution of network distance for several values of geographical distance. We select all pairs of nodes located within a given interval of geographical distance $d$ and we plot the distribution of their shortest path length $\ell$. The empirical distribution of $\ell$ shifts slightly towards the right for increasing values of geographical distance. Moreover, we observe that at large geographical distance, some probability mass builds up on values of $\ell$ around 15-25. This means that, proportionally, among pairs that are far away geographically, longer paths are more probable. However, the distribution is essentially stable, which means that the SW property (in the weak sense) is robust with respect to geography. As a comparison, in the random benchmark RDM all distributions coincide. This is not surprising as, by construction, links are assigned without dependence on the geographical distance. On the other hand, the RSM reproduces qualitatively the shifts to the right in the distribution, but does not reproduce the distribution itself.
As expected, with increasing characteristic distance $d_c$, the distribution becomes closer to that obtained with RDM. However, even at very small $d_c$, the distributions are broader and are centred at higher values of $\ell$. Figure 4 compares directly the empirical distribution with those obtained with the various models. Interestingly, in the empirical network, the network distance tends to be smaller than in its randomized versions, especially when nodes are geographically close. Note that this holds even if the randomization takes geographical distance into account. Indeed, in figure 4 (left), the distribution of the RSM is to the right of the empirical one at both small and large geographical distance (recall that RDM coincides with RSM at infinite $d_c$). The values of mean network distance (with their respective error of the mean) are: 7.647 36 ± 0.000 69 for the empirical network and 9.832 33 ± 0.000 20 for RSM (1 km). These findings suggest that (i) there is a strong SW property, especially at short geographical distance, and that (ii) SW properties are quite stable across geographical distance.
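The distributions of network distance conditional on geographical distance can be obtained by running a breadth-first search from every node and binning the reachable pairs by their geographical distance. The sketch below assumes directed shortest paths and fixed-width distance bins; both choices, the function names and the toy positions along a line are assumptions made for illustration.

```python
from collections import defaultdict, deque


def bfs_lengths(adj, source):
    """Shortest path length (number of links) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def path_lengths_by_distance(adj, geo_dist, bin_km):
    """Histogram of path lengths for each geographical distance bin: hist[bin][length] = count."""
    hist = defaultdict(lambda: defaultdict(int))
    for s in adj:
        for t, length in bfs_lengths(adj, s).items():
            if t != s:
                hist[int(geo_dist(s, t) // bin_km)][length] += 1
    return hist


# Toy chain 0 -> 1 -> 2 -> 3 with hypothetical positions (in km) along a line.
adj = {0: [1], 1: [2], 2: [3], 3: []}
pos = {0: 0, 1: 100, 2: 250, 3: 900}
hist = path_lengths_by_distance(adj, lambda s, t: abs(pos[s] - pos[t]), bin_km=500)
near, far = dict(hist[0]), dict(hist[1])
```

Normalizing each inner histogram by its total yields one distribution per distance interval. Running a search from every node is costly on a large network, so sampling the sources may be preferable in practice.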
An analysis similar to that described above has been carried out only in a few previous works, e.g. in the case of metropolitan public transportation networks. There, the distance between initial and final stations of passengers' journeys scales as a power law of the path length (i.e. the number of stations along the journey) [11]. This behaviour reflects the fact that journeys are typically planned with the purpose of travelling the longest possible distance through the smallest number of stations. In contrast, in our network, the path length grows only slowly with the geographical distance. This finding means that when firms acquire shares of other firms, they build ownership chains that tend to depart from the origin, but typically are not intended to cover long distances in the smallest number of steps.
From a theoretical point of view, it is interesting to mention here the model of [12] in which, depending on the control parameter δ and the dimension d of the space in which the network is embedded, three different qualitative topological regimes are observed, i.e. random graph for δ < d, lattice-like for δ > 2d and scale-free in the interval d < δ < 2d. In our case, d = 2 and the distribution of distances is exponential, i.e. the closest case is δ > 3. This would imply that our network is located at the border between the scale-free regime and the lattice regime. Indeed in our network a majority of the links are at short distance as in a lattice, but the degree is definitely scale-free. There are, however, important differences between the two models in the way the network is constructed, so an understanding of the relations between their results is deferred to future investigations.

Geographical distance at varying network distance
Finally, we investigate how geographical distance between two nodes $i$ and $j$ connected by a path varies with their network distance $\ell_{ij}$. We already studied in section 5.1 the case $\ell = 1$, i.e. when the two nodes are directly connected, and we found an exponential decay with the geographical distance. We now ask how the geographical distance is distributed when nodes are located at different network distances. One may expect that, for nodes located at large network distance, geography does not matter any more. Then, their geographical distance should be distributed as a random variable subject only to the constraint imposed by the distribution of locations.
Thus, two further questions arise. Firstly, given that for $\ell = 1$ it is an exponential, how does the empirical distribution evolve to the random case? Secondly, and more importantly, is the distribution of distance induced by the RSM consistent with the empirical one? In other words, is the RSM able to reproduce the distribution of geographical distance between nodes that are connected not directly by a link, but via a longer path? Figure 5 shows that the empirical distribution evolves from the exponential one to a bimodal one. The location of firms in the space reflects the specific geographical characteristics of the countries included in the dataset. For instance, the concentration of firms in urban areas that are disjoint may result in links associated with values of distance that are either small (i.e. two firms in the same urban area) or larger than a minimal distance (i.e. two firms in two different areas). In other words, the bimodality of the distribution may be explained by the fact that firms seek to have relations in the next closest large city, which typically is not closer than 200 km. Hence the gap and the hump in the distribution. The presence of geographical constraints such as mountains, seas, rivers, etc may also contribute to shaping the distribution in a similar way. Moreover, as seen in section 4, firms tend to prefer links within their respective national boundaries because of legal settings and cultural similarities. This could create a sort of finite-size effect that could be absent or smaller in a larger country such as the US. However, even a large country is divided by administrative boundaries that may have a similar effect. In any case, we are not able to quantify this effect with our data. From $\ell \approx 8$ onwards, the empirical distribution starts looking like the random one. As shown in figure 5, at large network distance, the distribution in the empirical network and in RSM tends to coincide with that of RDM.
Moreover, as shown in figure 6, the distribution obtained with the RSM follows the empirical one quite closely when d_c < 100 km. At network distance ℓ = 1, this is expected because, in the limit d_c → 0, the RSM replaces the source of a link with the closest eligible source, and therefore the new value of distance for a given pair is close to the original value if the area is densely populated. In contrast, the match observed at ℓ > 1 is not a trivial finding, and it implies that the network distance induced by the RSM is roughly correct.
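The distributions discussed above are conditional on the network distance ℓ, so they require grouping node pairs by the length of their shortest path. A minimal sketch of this grouping follows, using one breadth-first search per source node. It assumes that paths follow the direction of the links and that distances are great-circle distances; both are illustrative assumptions, and the function names and toy data are hypothetical.

```python
import math
from collections import defaultdict, deque

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


def distances_by_path_length(succ, locations):
    """Group geographical distances by shortest directed path length.

    succ maps each node to an iterable of its successors; locations maps
    each node to (lat, lon). Returns {l: [distance_km, ...]} over all
    ordered pairs (i, j) such that j is reachable from i in exactly l hops.
    """
    by_l = defaultdict(list)
    for source in locations:
        hops = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in succ.get(u, ()):
                if v not in hops:  # first visit gives the shortest path
                    hops[v] = hops[u] + 1
                    by_l[hops[v]].append(
                        haversine_km(locations[source], locations[v]))
                    queue.append(v)
    return dict(by_l)


# Hypothetical toy chain a -> b -> c -> d, for illustration only.
toy_succ = {"a": ["b"], "b": ["c"], "c": ["d"], "d": []}
toy_locations = {
    "a": (48.8566, 2.3522),
    "b": (45.4642, 9.1900),
    "c": (52.5200, 13.4050),
    "d": (41.9028, 12.4964),
}
by_l = distances_by_path_length(toy_succ, toy_locations)
```

Running one search per node costs on the order of N(N + E) operations, so on a network with many nodes one would probably restrict the outer loop to a random sample of sources.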
As we have seen, the RSM reproduces the empirical distribution of geographical distance fairly well at different levels of network distance. In contrast, there are significant deviations in the distribution of network distance. These deviations are also reflected in other topological properties, as reported in table 4. In the randomization process, the size of the largest weakly connected component is preserved within a few per cent. However, the number of strongly connected components (SCCs) and the size of the largest one (LSCC) are radically modified. The empirical network contains 1544 SCCs (most with fewer than 10 nodes), with an LSCC consisting of 502 nodes. In contrast, the RSM yields an LSCC that is at least seven times larger.

Table 4. Network and geographical statistics.
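The number of SCCs and the size of the LSCC can be obtained with any standard algorithm for strongly connected components. Below is a minimal sketch based on an iterative version of Kosaraju's two-pass algorithm. This is a textbook technique chosen for illustration; the source does not state which algorithm or software was used, and the function name and toy graph are hypothetical. The sketch counts single-node components as well, which may differ from the counting convention behind the figures reported above.

```python
from collections import defaultdict


def strongly_connected_components(succ):
    """Return the SCCs of a directed graph as a list of node lists.

    succ maps each node to an iterable of its successors.
    """
    nodes = set(succ)
    for targets in succ.values():
        nodes.update(targets)

    # Reversed graph: pred[v] lists the nodes with a link into v.
    pred = defaultdict(list)
    for u, targets in succ.items():
        for v in targets:
            pred[v].append(u)

    # Pass 1: record nodes in order of DFS completion on the original graph.
    order, seen = [], set()
    for root in nodes:
        if root in seen:
            continue
        seen.add(root)
        stack = [(root, iter(succ.get(root, ())))]
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(succ.get(nxt, ()))))
                    break
            else:  # all successors explored: the node is finished
                order.append(node)
                stack.pop()

    # Pass 2: sweep the reversed graph in decreasing completion order.
    components, assigned = [], set()
    for root in reversed(order):
        if root in assigned:
            continue
        assigned.add(root)
        component, stack = [], [root]
        while stack:
            node = stack.pop()
            component.append(node)
            for prv in pred.get(node, ()):
                if prv not in assigned:
                    assigned.add(prv)
                    stack.append(prv)
        components.append(component)
    return components


# Hypothetical toy graph: a cycle a -> b -> c -> a with a tail c -> d -> e.
toy_succ = {"a": ["b"], "b": ["c"], "c": ["a", "d"], "d": ["e"], "e": []}
sccs = strongly_connected_components(toy_succ)
n_scc = len(sccs)
lscc_size = max(len(c) for c in sccs)
```

Applying the same function to the empirical network and to each randomized realization gives the two quantities compared in the text, namely the number of SCCs and the size of the largest one.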
types of networks embedded in geography. In contrast, the dependence of link weights on the geographical distance is not monotonic and seems to be at odds with the gravity law, which is commonly assumed in the geography literature.
An important contribution of this paper is to look also at the geographical distance between nodes that are not first neighbours, but are connected via paths of length ℓ > 1. As ℓ increases, the distribution evolves from an exponential decay to a bimodal distribution that reflects the specific geographical location of firms in the EU. On the one hand, the distribution of geographical distance is qualitatively reproduced, even at ℓ > 1, by an RSM that reshuffles the links in the network with a likelihood that depends on the geographical distance between the nodes to be swapped. On the other hand, the empirical distribution of the network distance, for a given geographical distance, is not well reproduced by such a model. Empirical distances tend to be shorter than in the model, implying a world 'smaller' than it would be by chance, especially at short geographical distance. This is also reflected by deviations in other topological quantities. In particular, the empirical network is characterized by a largest SCC much smaller than would be obtained in the benchmark models. Such a deviation quantifies the extent to which economic mechanisms are at work in addition to geographical ones. Their effect seems to lead to a more hierarchical organization of the network.
The relevance of our results for the economic geography literature lies in the investigation of the extent to which the geographical factor is able to explain the organization of links among firms. In particular, we find that the structure of the European ownership network is only partially explained by the geographical location of firms. Although geography matters a great deal, firms establish ownership relations also on the basis of socio-economic factors. For example, additional variables to be taken into account are the size and the sector of the firms or the presence of special structures such as 'pyramids' and 'cross-shareholdings', which firms build up in order to access corporate control and fiscal advantages [40].
The analysis presented in this paper can be applied to other kinds of geographically embedded complex networks. A comparison of the results across domains could contribute to a better understanding of the mechanisms underlying the formation of spatially embedded networks.