Transfer market activities and sportive performance in European first football leagues: A dynamic network approach

Professional football is a globalized game in which players are the most valuable assets for clubs. In this study, we explore the evolution of the football players’ transfer network among 21 European first leagues between the seasons 1996/1997 and 2015/2016. From a topological point of view, we show that this network achieved an upper limit expansion around season 2007/2008, thereafter becoming more connected and dense. Using a machine learning approach based on Self-Organizing Maps and Principal Component Analysis we confirm that European competitions, such as the UEFA Champions League or UEFA Europa League, are indeed a “money game” where the clubs with the highest transfer spending achieve better sportive performance. Some clubs’ transfer market activities also affect domestic performance. We conclude from our findings that the relationship between transfer spending and domestic or international sportive performance might lead to substantial inequality between clubs and leagues, while potentially creating a virtuous (vicious) circle in which these variables reinforce (weaken) each other.


Introduction
Professional football is regarded as the most popular sport throughout the world, famous for both its players and its clubs. Several football players are recognized as international superstars, including Cristiano Ronaldo, Lionel Messi, Neymar, or Paul Pogba, while less renowned players are also essential to delivering the final team 'product' of football on the pitch. Accordingly, players constitute the most valuable asset for football clubs, particularly for top clubs such as Real Madrid, FC Barcelona, Paris Saint-Germain, or Manchester United. Regardless of the reputation of players or clubs, the fundamental aim of both is to achieve outstanding sportive performance, as it is the essence of the clubs' financial performance and survival.
Sportive performance in football is ultimately achieved by a squad of approximately 24 players and the corresponding team of trainers. Thereby, a football team can be built in two
The aim of the paper is to analyze the relationship between the clubs' transfer market activities and sportive performance. Thereby, the contributions of the paper are twofold. First, we apply a network approach and extend previous analyses by introducing a dynamic perspective on the topological characteristics of the European transfer network over 20 years. Second, the machine learning approach based on Self-Organizing Maps (SOMs) and Principal Component Analysis (PCA) allows us to explore the relationships between clubs' transfer market activities and sportive performance over time. Our results describe the evolution of the transfer network and show that financial resources used to acquire football players are a decisive variable in explaining sportive performance in some domestic leagues but not in others. Remarkably, only clubs that invest substantial financial resources reach top positions at UEFA level.
The remainder of this article is organized as follows. First, we introduce the dataset and methodology. Second, we present the results of our network analyses. Here, we illustrate the connection between transfer market activities and sportive performance. We conclude with our key findings, discuss the inequality in financial resources among leagues and clubs, and provide future research directions.

Data and methodology
We studied the football player transfer market activities among European first leagues from 21 countries (hereafter EFL21) between the seasons 1996/1997 and 2015/2016. Of these, the English Premier League, Spanish LaLiga, German Bundesliga, Italian Serie A, and French Ligue 1 are the most prominent leagues, often referred to as the 'big five' leagues in Europe. Within this time period, the studied transfer market activities include more than 135,000 transfers to and from football clubs of these 21 leagues and other national or international leagues.
For every transfer, we collected data including the specification of the respective transfer, such as the club before and after the actual transfer, transfer fee or loan fee, as well as player market value, field position, age, and nationality. This dataset was enriched with clubs' national performance, measured as end of season league ranks, and international performance, measured with the UEFA club coefficients. The latter is based on a club's points obtained at European club competitions such as the UEFA Champions League and UEFA Europa League. The transfer market data was retrieved from the transfer market website www.transfermarkt.de. National performance data was derived from www.transfermarkt.de, www.kicker.de, www.soccerway.com, www.weltfussball.de or www.weltfussball.com, while international performance data was sourced from www.uefa.com and https://kassiesa.home.xs4all.nl.
With the above information, a transfer network was constructed for every season with clubs as nodes and player transfers among clubs/nodes during the respective season as links. The data included also transfers towards the end of the players' careers. Hence, a node called 'end of career' was included. In addition, missing information regarding source and target clubs was found in the dataset. Thus, an 'unknown' node was included. However, transfers from or to these nodes only accounted for less than 0.3% of all transfers. Within this network, links were weighted by the fees paid (or received) for each transfer. The network included 'core nodes', which incorporated the EFL21 clubs, and 'neighboring nodes', which involved clubs that received or delivered players from or to the 'core nodes' respectively. Note that not only players moved throughout every season, but 'core nodes' also changed due to promotions or relegations of clubs to or from the respective first league. By studying the yearly constructed networks sequentially over time, a dynamic view of the transfer market was generated. In sum, for each node and their links the following attributes were accumulated:
• Nodes (clubs): Name, country, transfer spending (transfer spending is the amount of money each club has spent in transfers in each season, while transfer earnings represent the amount of money each club has generated from transfers in each season), transfer fee volume (transfer spending + transfer earnings), league, domestic rank, UEFA points, continent
• Links (players transferred): Transfer fee, name, age, nationality, field position
We calculated several of the most commonly used measures of complex network properties. For this, unweighted directed networks for each season were generated. As such, the weights from transfer/loan fees were neglected since the analysis focused purely on the topological properties of the transfer network.
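The construction of one seasonal network described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the record layout `(season, from_club, to_club, fee)`, the club names, the fee values, and the function name `build_season_network` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical transfer records: (season, from_club, to_club, fee in million EUR).
transfers = [
    ("2015/2016", "Club A", "Club B", 10.0),
    ("2015/2016", "Club B", "Club C", 2.5),
    ("2015/2016", "Club C", "Club A", 0.0),          # free transfer
    ("2015/2016", "Club A", "end of career", 0.0),   # special node from the text
    ("2014/2015", "Club B", "Club A", 4.0),          # other season, ignored below
]


def build_season_network(records, season):
    """Return (links, spending, earnings) for one season.

    links maps an ordered pair (source, target) to the list of fees of all
    transfers between that pair; spending and earnings are per-club totals.
    """
    links = defaultdict(list)
    spending = defaultdict(float)
    earnings = defaultdict(float)
    for rec_season, source, target, fee in records:
        if rec_season != season:
            continue
        links[(source, target)].append(fee)
        spending[target] += fee   # the receiving club pays the fee
        earnings[source] += fee   # the delivering club receives it
    return dict(links), dict(spending), dict(earnings)


links, spending, earnings = build_season_network(transfers, "2015/2016")
nodes = {club for pair in links for club in pair}

# Transfer fee volume = transfer spending + transfer earnings, as defined above.
fee_volume = {
    club: spending.get(club, 0.0) + earnings.get(club, 0.0) for club in nodes
}

# Dropping the fees gives the unweighted directed network used for topology.
unweighted_edges = set(links)
```

Repeating the call for every season yields the sequence of yearly networks on which the dynamic view is based.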
Based on the R package igraph, important network measures were calculated, such as the Average Path Length (APL), the Density of Links (Density), and the average Clustering Coefficient (CC) [15]. The APL is defined as the average of the shortest paths between every pair of nodes in the network in terms of the number of steps along the nodes. In this sense, low values of the APL imply a highly connected topology. Density is defined as the ratio of the actual number of links in the network to the number of all possible links among the network nodes. Lastly, the average CC quantifies the degree to which the network's nodes are inclined to cluster together. High values of CC imply the existence of rather dense subgroups of network nodes. Extended explanations on network property measures can be found in S1 Text.
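The three measures defined above can be illustrated with a small standard-library sketch. It does not reproduce igraph's implementation or its default options. The assumptions here are that the APL averages only over ordered pairs joined by a directed path, that the CC ignores link direction, and that nodes with fewer than two neighbours count as zero. The toy graph is hypothetical.

```python
from collections import deque


def density(nodes, edges):
    """Links present divided by the n * (n - 1) possible directed links."""
    n = len(nodes)
    return len(edges) / (n * (n - 1)) if n > 1 else 0.0


def average_path_length(nodes, edges):
    """Mean shortest-path length over ordered pairs joined by a directed path."""
    successors = {v: set() for v in nodes}
    for u, v in edges:
        successors[u].add(v)
    total, pairs = 0, 0
    for start in nodes:
        dist = {start: 0}
        queue = deque([start])
        while queue:                      # breadth-first search from start
            u = queue.popleft()
            for v in successors[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1            # reachable nodes other than start
    return total / pairs if pairs else 0.0


def average_clustering(nodes, edges):
    """Mean local clustering coefficient, ignoring link direction."""
    neighbours = {v: set() for v in nodes}
    for u, v in edges:
        if u != v:
            neighbours[u].add(v)
            neighbours[v].add(u)
    coefficients = []
    for v in nodes:
        nb = list(neighbours[v])
        k = len(nb)
        if k < 2:
            coefficients.append(0.0)      # assumption: counted as zero
            continue
        closed = sum(
            1
            for i in range(k)
            for j in range(i + 1, k)
            if nb[j] in neighbours[nb[i]]
        )
        coefficients.append(2 * closed / (k * (k - 1)))
    return sum(coefficients) / len(coefficients)


# Hypothetical toy network: a directed triangle A -> B -> C -> A plus C -> D.
toy_nodes = ["A", "B", "C", "D"]
toy_edges = {("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")}

toy_density = density(toy_nodes, toy_edges)          # 4 / 12
toy_apl = average_path_length(toy_nodes, toy_edges)  # 15 / 9
toy_cc = average_clustering(toy_nodes, toy_edges)    # 7 / 12
```

In the toy network, adding links between existing nodes raises Density and the CC and can only shorten paths, which is the pattern the text associates with a more connected network.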
Further, a machine learning approach was employed to deal with the multivariate information attached to every club. For this, the following five variables of every EFL21 club in every season were taken into account:
• Transfer spending
• Transfer balance: Transfer earnings minus transfer spending on player transfers in the respective season.
• Domestic rank
• UEFA points
• Relative transfer spending: A coefficient of relative transfer spending in relation to the overall transfer spending in the corresponding league during the respective season.
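The derived variables in the list above reduce to simple arithmetic, sketched here with hypothetical figures. The function names are illustrative only. The rank rescaling follows the convention used in this section, with the top position mapped to one, the last position to zero, and linear interpolation in between.

```python
def transfer_balance(earnings, spending):
    """Transfer earnings minus transfer spending for one club and season."""
    return earnings - spending


def relative_spending(club_spending, league_spendings):
    """Club spending as a share of total spending in its league and season."""
    total = sum(league_spendings)
    return club_spending / total if total > 0 else 0.0


def rescale_rank(rank, n_clubs):
    """Map rank 1 to 1.0 and rank n_clubs to 0.0, linearly in between."""
    if n_clubs < 2:
        return 1.0
    return (n_clubs - rank) / (n_clubs - 1)


# Hypothetical league of three clubs, spending in million EUR.
league_spendings = [30.0, 10.0, 0.0]
shares = [relative_spending(s, league_spendings) for s in league_spendings]

balance = transfer_balance(earnings=12.0, spending=30.0)
champion = rescale_rank(1, 20)
mid_table = rescale_rank(10, 19)
bottom = rescale_rank(20, 20)
```

Dividing by league-wide spending makes clubs comparable across leagues of very different financial size, which is why a club can have low absolute but high relative spending.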
We tracked the evolution of these five variables associated with every club across the entire time period. An effective way to map such five-dimensional data visually while maintaining the original closeness relations is through Kohonen SOMs [13,14,16]. This algorithm approximately preserves neighboring relationships among points (i.e. clubs' characteristics) in the high-dimensional input space when mapped onto a two-dimensional space. In this way, original patterns of the (dis-)similarities among clubs and relationships with the variables considered are visualized in a two-dimensional plot. Euclidean distance was used, and all variables were normalized. Kohonen self-organizing maps have been extensively applied to clustering problems and data exploration in linguistics, artificial intelligence, natural sciences, and more recently, business and finance contexts [17,18]. As a final step, PCA [16] was employed to quantify the relative weight of each variable. In addition, the domestic rank variable was rescaled to assign higher numbers to better final positions. A 'one' was assigned to the top position and a 'zero' to the last position, while linear interpolation was conducted for the positions in-between.
Fig 2). Throughout most regions, we observe similar patterns. However, transfers from/to Asian clubs also increase after season 2008/2009. In contrast, African clubs experience an extraordinary decline in their transfer market activity with EFL21 clubs. Interestingly, an increasing number of players of African descent have started to return to Africa recently in order to play in their home regions [19]. Overall, taking into account only the number of transfers and clubs involved, the network seems to have reached a tipping point just at the time of the recent financial crisis, while only the interaction with Asian clubs presents an exception.
S1 Fig shows the chained index for the same geographical regions, which includes a comparison among successive calculation periods and, therefore, focuses on the network evolution from year to year. As before, this figure shows that most regions observe a downward trend over time, while Asia and Oceania remain exceptions.
afterwards. However, the ratio of the number of transfers to the number of clubs (Gsize/Gorder) displays a steady increase from the beginning of the analyzed period until 2016. Concordantly, the evolution of Density shows a continuous decline while the number of clubs and transfers involved in the network increases before 2007/2008, and reveals a sharp uplift thereafter. Consequently, after a period of growth with less connection, the network begins to slowly diminish in size, reaching a higher grade of connectivity. This development is also apparent in the APL, which shows a steady and rather strong decrease (20% since 1996), implying an increasing connectedness of the transfer network where nodes get 'closer' to each other over time. Note also the increase in average CC, which has almost doubled in value since the beginning of the period analyzed. Hence, the transfer network evolves towards a clustered network. These findings imply that the transfer network evolves towards a small-world network [20], as anticipated by Liu et al. [12]. Likewise, the average number of incoming (degree-in) and outgoing links (degree-out) per club has slowly grown throughout the time period. Finally, the CC has doubled its initial value at the end of the period. The initial CC value (season 1996) was around 0.
chained index of the same measures. APL and Density measures experience an upward trend, reinforcing the idea of a potential transformation from a dynamic to a more connected network.

Transfer market activities and sportive performance
In this section we study the relation between transfer market activities and international sportive performance at both league and club levels. Fig 4 shows the correlations between transfer spending and international sportive performance, measured by UEFA points. The blue line shows the correlations between the net amount of transfer spending by all clubs of a league and the corresponding total amount of UEFA points that all clubs of a league achieved in the respective season. These correlations remain rather flat over time and fluctuate between values of 0.42 and 0.52. Even though the overall level of correlations is high, it does not provide fully convincing evidence that international performance can be achieved through high investments in the transfer market. The red line in Fig 4 presents the same correlation at club level, relating clubs' transfer spending to the UEFA points they earned, irrespective of their league. Here correlation coefficients reach values between 0.80 and 0.95. This indicates more clearly that performance in UEFA competitions is money-driven, as only clubs with sufficient financial resources seem to be able to achieve substantial sportive performance.
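The two correlation levels discussed above can be sketched as follows. The text does not state which correlation coefficient was used, so the Pearson coefficient is an assumption here, and the club records are hypothetical. At league level, spending and UEFA points are summed over all clubs of a league before correlating.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical season: (league, transfer spending, UEFA points) per club.
clubs = [
    ("League 1", 100.0, 30.0),
    ("League 1", 20.0, 5.0),
    ("League 2", 50.0, 18.0),
    ("League 2", 5.0, 0.0),
    ("League 3", 10.0, 4.0),
    ("League 3", 2.0, 1.0),
]

# Club level: one observation per club, irrespective of league.
club_r = pearson([c[1] for c in clubs], [c[2] for c in clubs])

# League level: aggregate spending and points per league first.
league_spend = defaultdict(float)
league_points = defaultdict(float)
for league, spend, points in clubs:
    league_spend[league] += spend
    league_points[league] += points
leagues = sorted(league_spend)
league_r = pearson(
    [league_spend[name] for name in leagues],
    [league_points[name] for name in leagues],
)
```

Computing both coefficients for every season gives the two time series that a figure such as Fig 4 would plot.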
In addition, a multivariate analysis was performed using the Kohonen SOMs. By considering five characteristics, namely transfer spending (Spend), domestic rank (Domestic Rank), UEFA points (UEFA Points), relative transfer spending (Relat Spend), and transfer balance (Balance), SOM maps were constructed for each season.
A more complete perspective of this season is obtained by plotting the SOM map of codes, as displayed in the left panel of Fig 6. As such, a comprehensive picture of variable distributions along the cells set is created. Thereby, every pie chart inside these cells shows the vector of weights of each variable, and every map cell represents regions with similar characteristics. In this way, clubs with similar characteristics belong to the same or adjacent cells. This map distinguishes the clubs' characteristics along the entire population, taking into account the variables' importance and closeness. For instance, SOM cells in the lower left corner of the left panel of Fig 6 show the existence of a high correlation between transfer spending, transfer balance, UEFA points, and domestic rank. This implies that clubs belonging to these cells will also reveal these characteristics. Further, confirming the maps of Fig 5, a high correlation between domestic rank and relative transfer spending exists as displayed in the cells in the upper left corner. For further information, the number of clubs belonging to these cells is shown in the right panel of Fig 6. Every black dot inside the cells represents a club. For instance, only two clubs belong to the cell in the lower left corner where the correlations between the examined variables are the highest. Finally, the characteristics' distributions served as basis for naturally grouping the clubs by performing a clustering partition. For this, we first determined the optimal number of clusters (function NbClust() from the R package NbClust [21]).
Secondly, the Ward method was used to perform hierarchical clustering, resulting in three clusters overall, the optimal number.
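The SOM step that precedes the clustering can be illustrated with a minimal online Kohonen map written with the standard library only. It is not the implementation used in the study. The min-max normalization, the 3 x 3 grid, the Gaussian neighbourhood, the linear decay schedules, and the club vectors are all assumptions made for illustration.

```python
import math
import random


def min_max_normalise(rows):
    """Rescale every column to [0, 1] (an assumed normalization scheme)."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def best_matching_unit(weights, x):
    """Map cell whose code vector is closest to x in Euclidean distance."""
    return min(
        weights,
        key=lambda cell: sum((w - v) ** 2 for w, v in zip(weights[cell], x)),
    )


def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a rows x cols map online and return {cell: code vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    radius0 = max(rows, cols) / 2
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    samples = list(data)
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)                       # decaying learning rate
        radius = max(radius0 * (1 - frac), 0.5)     # shrinking neighbourhood
        rng.shuffle(samples)
        for x in samples:
            b_row, b_col = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                grid_d2 = (r - b_row) ** 2 + (c - b_col) ** 2
                h = math.exp(-grid_d2 / (2 * radius ** 2))
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])  # pull code towards x
    return weights


# Hypothetical clubs: [spend, balance, rescaled rank, UEFA points, relat spend].
raw = [
    [120.0, -60.0, 1.0, 30.0, 0.40],
    [95.0, -40.0, 0.9, 25.0, 0.35],
    [5.0, 3.0, 0.5, 0.0, 0.05],
    [2.0, 1.0, 0.2, 0.0, 0.02],
    [0.5, 4.0, 0.0, 0.0, 0.01],
]
data = min_max_normalise(raw)
som = train_som(data, rows=3, cols=3)
cells = [best_matching_unit(som, x) for x in data]  # map cell of each club
```

The code vectors in `som` correspond to what a map of codes displays, and `cells` gives the club counts per cell. A hierarchical clustering such as the Ward method would then be applied to the code vectors.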
In Fig 6 (right panel), the obtained clusters are displayed with uniform color codes along the cells delimited with solid lines. Comparing both panels, the blue cluster (with only seven clubs) reveals high correlations between domestic rank, transfer balance, transfer spending, and UEFA points, of which transfer spending seems to be the dominant variable. For this specific year, we identified Real Madrid and FC Barcelona from Spain, AC Venezia, Atalanta Bergamo, and Juventus Turin from Italy, Manchester United from England, and Budapest JH from Hungary in this cluster. In contrast, the green cluster is comparably larger, and both transfer spending and UEFA points are clearly much less important variables for the respective clubs. In fact, for most clubs these two variables do not exist at all while the transfer balance, domestic rank, and relative transfer spending are the most important variables driving the configuration of this cluster. As stated above, for some clubs the relative transfer spending is an influential variable connected to domestic ranking. Finally, the orange cluster is the largest in terms of the number of clubs. In this cluster, we identify many clubs that are not able to achieve UEFA points and only achieve low domestic performance (lower right corner of both panels).
To examine the evolution of these variables, we repeated the calculations for every season. Fig 7 (upper panel) depicts the number of clubs belonging to the "blue" cluster (i.e. with high correlations between domestic rank, transfer balance, transfer spending, and UEFA points) according to their corresponding leagues. This cluster grows in the number of clubs and, to a lesser extent, in the number of leagues over time. We further observe that the Premier League contributes the largest number of clubs to this cluster, followed by Spain. The next best represented first leagues are those from Germany, Portugal, Finland, Italy, France and Hungary.
The remaining leagues are only represented by one club, except for Turkey and the Netherlands, which surprisingly are not represented at all in the blue cluster. These results suggest that variables driving this cluster have increased in their importance over time, and transfer spending is therefore gaining relevance in achieving European and domestic sportive performance. The blue cluster consists of 142 club observations during the entire time period. However, these observations correspond to only 71 unique clubs; hence, many of them appear more than once in the blue cluster over time. For instance, Real Madrid is found in the blue cluster with 13 observations, followed by Manchester United and FC Chelsea (9 observations each). In fact, only 10 clubs make up 50% of all observations (72 times) in the blue cluster. Ordered by the number of observations, these clubs are: Real Madrid, Manchester United, FC Chelsea, FC Barcelona, Manchester City, FC Liverpool, Atlético Madrid, Bayern Munich, Tottenham Hotspur, Juventus Turin. This cluster is thus formed by the most successful and renowned clubs both at a domestic and European level and, therefore, we name it 'top cluster'. In contrast, the lower panel in Fig 7 presents the same information for the green cluster. Here, clubs from all leagues are represented with different frequencies.
Finally, Fig 8 shows the results derived from the PCA. A principal decomposition was performed in every season, resulting in a corresponding set of eigenvalues. For every season across the entire time period, the first two components retain more than 70% of the data variance with the first component retaining around 50% of the variance. A representation of the variables' correlations with the first and second components is displayed in the left panel of  Although correlation with components is an important variable in every PCA, it is necessary to also consider how much variance is explained by every principal component. This is

Concluding remarks
This paper employs a dynamic network approach to analyze the transfer market activities among 21 European first leagues between the seasons 1996/1997 and 2015/2016. We collected transfer records among more than 2,200 clubs, which were involved in more than 135,000 transfers during the time period. In the networks, nodes were clubs and links represented the players transferred from one club to another, which were weighted by the actual fees paid (or received) for each transfer. We extended the work of Liu et al. [12] by analyzing the evolution of the network over time. Additionally, we employed a machine learning approach based on both Kohonen SOMs and PCA to reveal similarities among clubs and cluster them in terms of their transfer market activities. This led to multiple findings from which we derived future research directions.
As a first finding and from a topology perspective, the European transfer network seems to have reached an upper limit in both the number of clubs involved and in the number of players transferred. At approximately the time of the global financial crisis (2007/2008) these numbers stopped growing and the network became more connected and dense. Note that during the time of the Dotcom crisis in the late 1990s and the 9/11 attacks, the network continues to grow even though the clubs' transfer spending sharply declines thereafter (as shown in S3 Fig). In contrast to this stagnation in network size, only transfers to and from Asia continued to increase, while all other regions decreased their interaction with the European football market. At the end of the examined time period, the transfer network became less diverse in terms of clubs and leagues compared to the mid-2000s. Remarkably, the whole transfer network evolved towards a small-world network which is accentuated over time.
A second finding focuses on the relationship between transfer market activities and sportive performance. As such, transfer spending is identified as a key factor for success in international UEFA competitions. We find intense rivalry among leagues at the European level but only a few rich clubs can achieve top positions. We furthermore show that transfer spending is the main driver of UEFA and domestic sportive performance, while relative transfer spending and transfer balance are less important. When looking at club clusters we find significant heterogeneity among clubs and leagues as also reported by Liu et al. [12]. However, within the 'top cluster' the most important European clubs achieve sportive performance very similarly through transfer spending, which originates mainly from England, Spain, Germany and Italy. As such, club managers might have to reconsider their strategic focus and reevaluate the (financial) resource allocation within their clubs to achieve optimal sportive results.
Further findings can be derived by looking at the inequality of clubs' transfer activities, specifically at transfer spending differences in acquiring talent through the transfer market. Although the transfer market is a key factor in achieving sportive performance, at least for the most important leagues in Europe, this connection is far from being perfect. As such, uncertainty about the sportive outcome might still exist. For instance, Rottenberg [22] explicitly states that a roughly equal distribution of talent is needed in order to generate uncertainty of outcome. (For recent studies on outcome uncertainty in football see, e.g., [23] and [24-27]). In turn, this might be relevant for keeping fans interested in both individual matches and countries' leagues. From our findings, however, it also becomes clear that most European first football leagues seem not to comply with this theoretical axiom as their clubs are characterized by decisive differences in transfer spending behavior and, hence, financial resource endowments. In turn, these differences might affect sportive results since they are crucial in acquiring future talent. Similarly, Rottenberg [22] finds theoretical evidence for equal opportunities in European professional football but less uncertainty of outcome at the league level as long as national leagues are dominated by a small number of clubs. We find that the connection between transfer spending and sportive performance, especially in UEFA competitions, is extremely strong, particularly for clubs from the 'top cluster', which might further limit the overall level of uncertainty of outcome. In addition, such a strong connection produces further opportunities and risks. As improved sportive performance from the acquisition of new talents will in turn lead to higher financial income, it might be possible that a virtuous circle is created. As such, financially well-endowed clubs might also be successful in the long-term.
However, this interdependence might also lead to a vicious circle with substantial negative implications. Such developments would further increase inequality within the European football market.
From these findings, we suggest three areas for future research. First, an in-depth examination of lower league levels would broaden the scope of our analyses. For example, the second divisions in England and Germany include clubs with a wide range of financial resource endowments, providing further opportunities to examine inequalities within the transfer market. Further, the inclusion of football leagues from Asia, North America, and South America in our dynamic network approach would enable an inter-continental comparison with regard to transfer market activities. Second, further variables such as clubs' financial indicators (e.g., revenues, profitability, cash flow, leverage) should be examined in future studies. Thereby, a more detailed understanding of sportive performance drivers with regard to network specifics could be obtained while better identifying potential sources for future transfer spending. Third, a detailed investigation of football clubs' transfer spending behavior during times of economic crises would aid in understanding the direct and indirect connection between the football market and worldwide economic shocks.