The interplay of university and industry through the FP5 network

To improve the quality of life in a modern society it is essential to reduce the distance between basic research and applications, whose crucial roles in shaping today's society prompt us to seek their understanding. Existing studies on this subject, however, have neglected the network character of the interaction between university and industry. Here we use state-of-the-art network theory methods to analyze this interplay in the so-called Framework Programme, an initiative which sets out the priorities for the European Union's research and technological development. In particular, we study the role played by companies and scientific institutions in the 5th Framework Programme (FP5) and how they contribute to enhancing the relationship between research and industry. Our approach provides quantitative evidence that, while firms are hierarchically organized by size, universities and research organizations keep the network from falling into pieces, paving the way for an effective knowledge transfer.


I. INTRODUCTION
Understanding the relationship between research and industry is essential to improve the quality of life in any modern society. Ranging from faster application of new discoveries to knowing whether or where investment should be employed, this flow of knowledge between research and industry has long been of general interest. Yet, knowledge is a very special resource whose study demands new techniques. The traditional approach to resources is based on the concept of scarcity, since they are usually finite. But knowledge cannot be seen this way because it grows, and the more it is used the more it spreads [1]. In addition, existing studies on the research and industry interplay [2,3,4] have neglected its network character.
Our approach consists in analyzing this issue from a complex network viewpoint [5]. Many other systems are better understood in this manner [6,7,8,9]. In this approach, the interaction between research and industry is best described as a network whose vertices (or nodes) represent either companies or institutions devoted to research, and each edge (or link) represents collaboration between any two of them. Hence, we can quantitatively study how research and industry influence each other, if we have access to data describing a real system.
Here, we focus our attention on the so-called Framework Programme (FP), a mechanism aiming to improve the transfer of knowledge in the European Union (EU) by setting out its priorities for research and technological development. The data to generate the corresponding FP network were gathered from the CORDIS website [10] by a robot. Since the 6th programme is currently under execution and the 7th is being planned, we focused our study on the 5th Framework Programme (FP5), covering the period from 1998 to 2002, in order to analyze a completely finished programme. Although there are more than 25,000 participants, they can be split into two major groups: Companies and Universities. The first is made up of over 16,700 companies and other industry-related participants who expect their investments in R+D+I to be profitable. The second group can be regarded as the opposite: more than 8,500 participants involved in some type of research whose results do not necessarily return immediate income (see Appendix). Exploring the relationship between these two groups not only provides a good example of the interplay between structure and information flow, but also offers a glimpse of how research links with innovation and whether the distance between basic research, applications and products is reduced [11].
It is worth remarking that we are mainly interested in the capacity of the FP5 to create and transfer information, and nothing can be said about this issue inside each node. Notice that some participants are large institutions or companies with complex organization charts, which may have several projects whose coordination cannot be guaranteed in general. However, our main concern is how to set the means to integrate research, development and innovation efficiently, not whether these means are successfully used.

II. ANALYSIS OF THE DATA
To characterize the FP5, in this section we compute five important features in any network: degree distribution, shortest path distribution, betweenness centrality, clustering coefficient and the degree-degree correlation. The detailed description of the dataset can be found in the Appendix.

A. Degree distribution
The probability that a University collaborates with $k$ other Universities (i.e., the degree distribution of the Universities) decays as a power law, $P_U(k) \sim k^{-\gamma_U}$, with $\gamma_U = 1.76$. Similarly, Companies follow a power law with $\gamma_C = 2.76$. The two distributions can be seen in Fig. 1, where a log-log scale is used in the plot, providing evidence for the scale-free topology [12] of both networks. The degree distribution of the whole FP5 network is also well approximated by a power law with exponent $\gamma$ close to 2.1 [13].
Note that the degree distribution of Universities is described by a power law with $\gamma_U < 2$, implying that their mean degree grows in time. Indeed, the first moment (i.e., the mean degree in this case) of a distribution with a power-law tail diverges when its exponent is less than 2. This result suggests that Universities form an accelerated growing network [14], where the total number of edges grows faster than a linear function of the total number of vertices and, consequently, the exponent satisfies $1 < \gamma < 2$.
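The divergence argument can be made explicit with a short calculation. The following is only a sketch under a continuous approximation, assuming a pure power-law tail between a minimum degree $k_{\min}$ and a cutoff $k_{\max}$ (both symbols are introduced here for illustration and do not appear in the analysis itself):

```latex
\langle k \rangle
  = \int_{k_{\min}}^{k_{\max}} k \, P(k) \, dk
  \propto \int_{k_{\min}}^{k_{\max}} k^{1-\gamma} \, dk
  = \frac{k_{\max}^{\,2-\gamma} - k_{\min}^{\,2-\gamma}}{2-\gamma},
  \qquad \gamma \neq 2 .
```

For $\gamma < 2$ the term $k_{\max}^{2-\gamma}$ dominates and keeps growing as the largest degree in the network grows, so the mean degree cannot stay constant while the network expands. For $\gamma > 2$ the same expression converges to a finite value as $k_{\max} \to \infty$.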
To elucidate this issue, we computed the average degree $\langle k \rangle$ in several years to check its trend. Though we only have data corresponding to 4 years (table I), they are enough to confirm the existence of accelerated growth, since the average degree is not constant (46% increase for the network of Universities in the four-year period). If the collaborations grow faster than proportionally to the number of participants, they cannot emerge from the mere increase in participants. Not only the new participants contribute to increasing the number of collaborations, but also the old ones, meaning that some form of synergy exists that encourages the creation of new collaborations between Universities.
On the other hand, the average degree of Companies also grows (though significantly slower) during the four-year span of the dataset (table I). However, the fact that $\gamma_C > 2$ suggests that this increase should be transient. Therefore, although the creation of collaborations is encouraged (since when the FP5 was finished the mean number of collaborations had risen from 10 to 26 and some participants had surpassed 2,500 collaborations), these results reveal that the synergy is more pronounced between Universities. In this sense, the FP5 is less effective in improving the network of Companies, and Universities seem to take more advantage of this opportunity to create new collaborations.

FIG. 1. Probability that a University collaborates with $k$ other Universities, that is, its degree distribution. The degree distribution of Companies is shown with blue circles. Data were log-binned. We find that both distributions follow a power-law tail, $P(k) \sim k^{-\gamma}$, thus having a scale-free topology, with vertices connecting to each other in a heterogeneous manner: most vertices have few connections, but some have a very large degree. The best fit for the straight region of the curves gives $\gamma_U = 1.76 \pm 0.01$ with a correlation coefficient $R = 0.998$ for Universities, and $\gamma_C = 2.76 \pm 0.03$ with $R = 0.991$ for Companies. However, the fact that Universities show $\gamma_U < 2$ whereas Companies have $\gamma_C > 2$ implies that the mean degree of Universities grows in time but not the mean degree of Companies. This result suggests that some form of synergy encourages the creation of new collaborations mainly between Universities, while the network of Companies is less dynamic in this respect.
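The log-binning and straight-line fit used for Fig. 1 can be illustrated with a minimal Python sketch. It is not the procedure actually used for the figure: the function names, the choice of base-2 bins and the geometric bin centers are all assumptions made here for illustration.

```python
import math
from collections import Counter


def bin_index(k, base):
    """Index b of the logarithmic bin [base**b, base**(b+1)) that contains k >= 1."""
    b = 0
    while base ** (b + 1) <= k:
        b += 1
    return b


def log_binned_distribution(degrees, base=2):
    """Estimate P(k) with logarithmic bins (assumed base 2 by default).

    Returns (centers, densities): the geometric center of every non-empty bin
    and the fraction of vertices in it divided by the bin width.
    """
    degrees = [k for k in degrees if k >= 1]  # isolated vertices are ignored
    counts = Counter(bin_index(k, base) for k in degrees)
    centers, densities = [], []
    for b in sorted(counts):
        lo, hi = base ** b, base ** (b + 1)
        centers.append(math.sqrt(lo * hi))
        densities.append(counts[b] / (len(degrees) * (hi - lo)))
    return centers, densities


def fit_power_law_exponent(ks, ps):
    """Least-squares straight line through (log k, log P); returns gamma = -slope."""
    xs = [math.log(k) for k in ks]
    ys = [math.log(p) for p in ps]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return -sxy / sxx
```

In practice one would pass only the points of the straight region of the curve to `fit_power_law_exponent`, since the sketch fits every point it receives.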
Also noticeable in table I is the fact that the number of Companies increases faster than the number of Universities (72% and 64% increase, respectively, in the four-year period), indicating another difference in the evolution of both networks.

B. Shortest path distribution
In the network of Universities the farthest pair is separated by 7 edges whereas, in the case of Companies, the farthest pair is separated by 14 edges and the average distance is $\langle d \rangle = 5.67$ [15]. This can be seen in Fig. 2, where we plot the distance distribution, $P(d)$ versus $d$. Hence, also here Universities are essential for Companies, since the largest distance in the entire network is only 8 and the average distance is $\langle d \rangle = 3.14$, which implies that, on average, there are only two intermediaries between two participants.
The average distance is a coarse characteristic, though. As a finer measure, it is possible to compute the average distance of a vertex of degree $k$ to all other vertices in the largest component [16]. In Fig. 3 we plot $d(k)$ for both networks on a log-linear scale, where the Y axis is $d(k)$ and the X axis is $\log k$.

FIG. 2. Distance distribution $P(d)$. While the farthest pair of Companies has 13 intermediaries, for Universities the maximum separation is 7 edges. Therefore, Universities are important for Companies since, when they cooperate, in the whole FP5 network the largest distance reduces to 8 and the average distance to 3.14.
Therefore, although both networks display the so-called small-world effect [17], there are important differences. The presence of Universities eases the flow of information, since they are much closer to each other than Companies. This could be expected, since the main purpose of a company is to satisfy its shareholders, which does not include the spread of information from which competitors can take advantage. But, interestingly, the consequences of this fact go further. When Universities are excluded from the projects, Companies become isolated.

FIG. 3. Average distance $d(k)$ of a vertex of degree $k$ to all other vertices, for Universities and Companies. The decay is faster (i.e., $\beta_C > \beta_U$) in the net with the larger value of the exponent $\gamma$ (see Fig. 1), providing empirical evidence for the results of Ref. [16]. Note that the lowest-degree vertices in the network of Universities show a distance to other vertices comparable to that of the highest-degree vertices in the network of Companies. Also note that in both networks $\max d(k) \approx 2 \min d(k)$, as had been previously observed in network models [16].
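The distance measures discussed in this section can be sketched with a breadth-first search. The snippet below is an illustrative sketch, not the code used for Figs. 2 and 3: the adjacency-dictionary representation and all function names are assumptions, and the graph is assumed to contain at least one edge.

```python
from collections import Counter, defaultdict, deque


def bfs_distances(adj, source):
    """Shortest-path length from `source` to every vertex reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def distance_distribution(adj):
    """Return (P(d), average distance, largest distance) over connected pairs."""
    counts = Counter()
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                counts[d] += 1  # every unordered pair is counted twice
    total = sum(counts.values())
    p = {d: c / total for d, c in sorted(counts.items())}
    mean = sum(d * c for d, c in counts.items()) / total
    return p, mean, max(counts)


def mean_distance_by_degree(adj):
    """d(k): average distance from vertices of degree k to all reachable vertices."""
    sums, n = defaultdict(float), defaultdict(int)
    for s in adj:
        others = [d for t, d in bfs_distances(adj, s).items() if t != s]
        if others:
            k = len(adj[s])
            sums[k] += sum(others) / len(others)
            n[k] += 1
    return {k: sums[k] / n[k] for k in sorted(sums)}
```

For a connected graph the averages coincide with those taken over the largest component; for a fragmented graph one would first restrict `adj` to that component.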

C. Betweenness centrality
To further investigate the interplay between the two kinds of participants, we can also measure the betweenness centrality [18] in the FP5. The betweenness $\sigma_m$ of vertex $m$ measures the extent to which $m$ lies on the paths between other participants. It therefore accounts for the influence of a participant between two other distant participants, relating the local structure and the global topology of the network. It is defined as
$$\sigma_m = \frac{2}{(N-1)(N-2)} \sum_{\substack{i < j \\ i, j \neq m}} \frac{B(i, m, j)}{B(i, j)},$$
where $B(i, j)$ is the number of shortest paths between nodes $i$ and $j$, $B(i, m, j)$ is the number of such shortest paths passing through vertex $m$, and the sum is taken over all pairs of vertices $i$ and $j$ which do not include $m$. The pre-factor, where $N$ is the total number of nodes, accounts for normalization, so that $0 \leq \sigma_m \leq 1$.
Since the computation of the betweenness for the whole FP5 is an extremely time-consuming task, we focus our study on one of its subprograms: 'Promotion of innovation and encouragement of small and medium sized enterprises participation' (SME), which is formed by 195 research institutions and 212 Companies (see Appendix). Given our ability to split the SME into Universities and Companies, several different situations are considered. The average betweenness of the SME, taken over all its vertices, turns out to be $\sigma = 5.19 \times 10^{-3}$. Considering only those vertices $m$ which are Universities, we find that their average betweenness among all other vertices in the SME is $\sigma_U = 6.76 \times 10^{-3}$. Likewise, we obtain $\sigma_C = 3.74 \times 10^{-3}$ for Companies.
Now, if we only take into account those shortest paths whose endpoints are Companies, the betweenness measures the role Universities play in linking Companies: $\sigma_{CUC} = 5.44 \times 10^{-3}$; on the other hand, when the endpoints are Universities, the average betweenness of Companies is $\sigma_{UCU} = 2.34 \times 10^{-3}$. Thus, we see that the role Universities play between Companies is more than twice as large as that played by Companies between Universities. Moreover, given that $\sigma_U > \sigma > \sigma_C$, we observe again the central function played by research institutions in the FP5 network.
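The betweenness measurements of this section, including the variant restricted to shortest paths whose endpoints belong to a single group, can be sketched directly from the definition. This is a brute-force illustration and not the implementation used here: the function names are hypothetical, the pre-factor $2/[(N-1)(N-2)]$ over unordered pairs is assumed (the usual normalization for undirected graphs), and the same pre-factor is assumed for the restricted variant.

```python
from collections import deque


def shortest_path_counts(adj, source):
    """BFS returning (dist, npaths): distance and number of shortest paths from source."""
    dist, npaths = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                npaths[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                npaths[v] += npaths[u]
    return dist, npaths


def betweenness(adj, endpoints=None):
    """Normalized betweenness of every vertex (requires at least 3 vertices).

    If `endpoints` is given, only shortest paths whose two ends belong to that
    set are counted (assumption: the normalization is left unchanged).
    """
    nodes = list(adj)
    info = {s: shortest_path_counts(adj, s) for s in nodes}
    ends = nodes if endpoints is None else [v for v in nodes if v in endpoints]
    n = len(nodes)
    norm = 2.0 / ((n - 1) * (n - 2))  # assumed pre-factor, unordered pairs
    sigma = {}
    for m in nodes:
        dist_m, paths_m = info[m]
        total = 0.0
        for a, i in enumerate(ends):
            dist_i, paths_i = info[i]
            if i == m or m not in dist_i:
                continue
            for j in ends[a + 1:]:
                if j == m or j not in dist_i:
                    continue
                # m lies on a shortest i-j path iff the two legs add up exactly
                if dist_i[m] + dist_m[j] == dist_i[j]:
                    total += paths_i[m] * paths_m[j] / paths_i[j]
        sigma[m] = norm * total
    return sigma
```

Averaging `betweenness(adj, endpoints=companies)` over the University vertices would give a quantity analogous to $\sigma_{CUC}$, where `companies` is a hypothetical set holding the Company vertices. The cubic cost of this direct approach is one reason to restrict the computation to a small subprogram.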

D. Clustering coefficient
The clustering coefficient of a vertex $i$ is defined as
$$C_i = \frac{2 n_i}{k_i (k_i - 1)},$$
where $n_i$ is the number of edges connecting its $k_i$ nearest neighbors. It equals 1 for a participant at the center of a completely connected cluster, and 0 for a node whose neighbors are not linked at all. Taking the average of the clustering coefficient, we obtain $C = 0.68$ for Universities and $C = 0.59$ for Companies, which are much higher than the average clustering coefficient of a random graph [19] with the same number of nodes and average degree (namely, $C = \langle k \rangle / N$). Moreover, $C$ is independent of the number $N$ of participants in both cases (see table I), in contrast with the prediction of a scale-free model [12], where $C \sim N^{-0.75}$ [5,20].
This high and size-independent average clustering coefficient evidences the organization of Universities and Companies in modules.
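As an illustration of the clustering coefficient, the following Python sketch computes $C_i = 2 n_i / [k_i (k_i - 1)]$ for each vertex, together with its average. The representation of the network as a dictionary of neighbor sets, the function names, and the convention $C_i = 0$ for vertices with fewer than two neighbors are assumptions made for this example.

```python
def clustering_coefficient(adj, i):
    """C_i = 2 n_i / (k_i (k_i - 1)), with n_i the edges among the neighbors of i."""
    neighbors = list(adj[i])
    k = len(neighbors)
    if k < 2:
        return 0.0  # assumed convention: undefined cases count as zero
    links = 0
    for a in range(k):
        for b in range(a + 1, k):
            if neighbors[b] in adj[neighbors[a]]:
                links += 1
    return 2.0 * links / (k * (k - 1))


def average_clustering(adj):
    """Average of C_i over all vertices of the network."""
    return sum(clustering_coefficient(adj, i) for i in adj) / len(adj)
```

For a group of participants that share one project and nothing else, every vertex is linked to all the others, so each of them obtains $C_i = 1$.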
However, when we measure the clustering coefficient of a node with $k$ links, $C(k)$, for both networks (Fig. 4), we find that it decays as a power law for large $k$. We therefore infer that the two nets have hierarchical modularity, which is characterized by the scaling law $C(k) \sim k^{-\alpha}$, in contrast to some scale-free or modular networks where the clustering coefficient is degree-independent [21]. This result suggests that Universities and Companies have an inherent self-similar structure [22], being made of many highly connected small modules, which integrate into larger modules, which in turn group into even larger modules (Fig. 5A). Actually, we observe that 4,333 Universities (50.8%) and 10,564 Companies (63.5%) have $C_i = 1$, indicating the presence of many totally connected groups. This is due to the fact that most of these entities participate in only one project, so that their neighbors are all connected to each other by virtue of their participation in the same project. Furthermore, given that this result suggests weak geographical constraints [23], we searched for communities in them [24] and found precisely that they were not based on nationality (Fig. 5B); hence, the FP is successfully applying a policy which avoids segregation by nationality.

FIG. 5. [...] algorithm [25], we find that they are all mixed.

E. Degree-degree correlation
An interesting question is which vertices pair up with which others. It may happen that vertices connect randomly, no matter how different they are. Usually, however, there is a selective linking, i.e. there is some feature which makes more (or less) likely the connection [26]. There is assortative mixing when vertices of similar degree tend to be connected, and disassortative mixing in the opposite case (i.e. when vertices of high degree tend to connect to vertices of low degree) [27,28].
A first approach to elucidate this issue is by means of the joint degree-degree distribution $P(k, k')$, which gives the probability of finding an edge connecting vertices of degree $k$ and $k'$. We see that for Companies the distribution has sharp peaks for $k = k'$ (Fig. 6A).
This network thus seems to display assortative mixing, i.e. if one chooses at random a vertex of degree k then, with considerable probability, it will be connected to vertices of degree k. In other words, Companies with similar degree tend to collaborate more frequently than Companies with different degrees.
Notice that the fact (mentioned in the previous section) that many entities participate in only one project may, by itself, explain these peaks: if the $X$ participants of a certain project have no other projects, each of them has degree $X - 1$ and each of their neighbors has degree $X - 1$, giving rise to an assortative trend. On the other hand, one can also argue that, when a Company has high degree, it is due to being involved in many projects. It is then reasonable to assume that nodes with high degree represent large institutions, given that only these can deal with many projects at the same time. That being the case, the observed assortativity means that the spread of information between Companies depends on the institution's size. On the contrary, for Universities $P(k, k')$ is scattered throughout the $k$-$k'$ plane (Fig. 6B).
It is also interesting to analyze how Universities and Companies link to each other, which can be done as follows. We search for all Companies with $k$ links and then compute the average degree of all their neighboring Universities. Note that the former degrees are always calculated in the corresponding network; thus a Company with degree $k$ has $k$ neighbor Companies, though it may have more links (to Universities) in the complete FP5 network. Analogously, we can find all Universities with $k$ links and average the degrees of all their neighbor Companies. The results are depicted in Fig. 8 where, as before, a log-log scale is used.
Again, we plot as squares (Universities) or circles (Companies) the points obtained from more than 10 observations to identify the region where the tendency is well defined. We find that, while Companies link to Universities independently of their sizes, Universities with high degree tend to collaborate with large Companies.
Finally, another way to quantify the mixing in the FP5 is by means of the assortativity coefficient [27], which is just the Pearson correlation coefficient of the degrees of connected vertices. In this case, we characterize the type of mixing that takes place in the network by means of a single number instead of a distribution. If $e_{jk}$ is the probability that a randomly chosen edge has vertices with degree $j$ and $k$ at either end, the assortativity coefficient takes the following form:
$$r = \frac{\sum_{jk} jk \, (e_{jk} - q_j q_k)}{\sum_k k^2 q_k - \left( \sum_k k q_k \right)^2},$$
where $q_k = \sum_j e_{jk}$ and $q_j = \sum_k e_{jk}$. This coefficient satisfies $-1 \leq r \leq 1$, being positive when the network is assortative and negative when it is disassortative. We find $r_C = 0.13$ for the network of Companies and $r_U = 0.06$ for Universities, corroborating an assortative trend usual in social networks [28].
Therefore, Companies and Universities differ in the way they establish collaborations.
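Since the assortativity coefficient is the Pearson correlation of the degrees found at the two ends of an edge, it can be sketched in a few lines of Python. The edge-list input and the function name are assumptions made for this illustration; each edge is counted in both directions so that the joint distribution $e_{jk}$ is symmetric.

```python
from collections import Counter


def assortativity(edges):
    """Pearson correlation of the degrees at either end of the edges."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    xs, ys = [], []
    for u, v in edges:  # both orientations, so that e_jk = e_kj
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]
    n = len(xs)
    mean = sum(xs) / n  # xs and ys share the same mean and variance
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    if var == 0:
        raise ValueError("r is undefined when all edge ends have the same degree")
    return cov / var
```

A star, where a single hub links to vertices of degree one, gives the extreme disassortative value $r = -1$, whereas the small positive values reported above indicate a weak assortative trend.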

III. CONCLUSION
We have presented here a study of the interplay between research and industry in the scope of the Fifth Framework Programme. Using network theory methods, we performed several measurements that allow us to quantify the features of this relationship and assess their potential improvements. Naturally, the FP5 network does not include all interactions between university and industry (such as the recruitment of graduates by companies, the transfer of knowledge through scientific and technical literature or industry conferences).
Furthermore, as already mentioned in the introduction, it also neglects the fact that internal connections in an institution (e.g. between different departments) may be absent, which would mean that a node in the studied network would split into disconnected nodes. While these issues may significantly influence the flow of information in the network, addressing all of them requires information that is beyond reach for most researchers at this point. The presented analysis thus represents a starting point for a quantitative understanding of the university-industry interplay network. It is possible, however, to foresee advances in these directions, given the increasing availability of information on how institutions self-organize.
The results point to the central function played by Universities in the FP5 network in reducing the distance between research and applications. Indeed, we show that Universities play a crucial role in connecting the network of Companies, which would otherwise be separated into many small clusters. While the network of Universities is well integrated and established in accordance with what is observed for other social networks, the same does not seem to apply to the network of Companies, mainly due to its relatively small largest connected component. Competition is probably the origin of this effect, which is moderated by the presence of Universities. It seems reasonable, then, to conclude that special attention should be devoted to company-company collaborations. This conclusion is also supported by the fact that new collaborations arise at a higher rate between Universities.
Our observations suggest in addition that Companies and Universities establish collaborations differently: while Companies seem to exhibit a hierarchical structure in terms of their size, Universities are less selective in their collaborations. We also observed that both networks display hierarchical modularity and that communities in the FP5 network are not nation-based. The FP appears then to mix all nationalities of the European Union, thus reaching one of its main goals: to promote the transfer of knowledge throughout Europe.

APPENDIX
• QOL: Quality of life and management of living resources (2,524 projects).
• NUKE: Research and Training in the field of Nuclear Energy (1,032 projects).
And there are three Horizontal Programmes to cover the common needs across all research areas:
• INCO: Confirming the international role of Community research (1,034 projects).
• SME: Promotion of Innovation and encouragement of small and medium enterprises participation (142 projects).
• HPOT: Improving human research potential and the socio-economic knowledge base (4,876 projects).
The data to analyze the FP5 as a complex network were obtained from the web pages of CORDIS [10] with a robot implemented in Perl. The result was a database with 15,776 records as follows:
Programme | Year | Participant1 -Nation -Dedication | Participant2 -Nation -Dedication | ...
The first field refers to the specific programme to which the project belongs, and the second field gives the year in which it started. The following fields are the participants in the project with their corresponding nationality and dedication ('research', 'education', 'industry'...). We then have a bipartite graph [5,14].
Therefore, all records could be classified in one of the following levels: 'Non Companies' (41,317), 'Industry' (6,447), 'Other' (17,588) and 'Not Available' (12,346). The total number of records (77,698) is larger than the number of participants (25,287) since many of them collaborate in several projects. Then, it was necessary to verify whether repeated records were always classified in the same level of 'Dedication'.
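The record layout and the consistency check just described can be sketched in Python. This is a hypothetical illustration rather than the Perl robot mentioned above: the exact separators (`|` between fields, `-` between the three parts of a participant entry), the sample participant names and all function names are assumptions. The projection links every pair of participants of the same project, as implied by the statement that the members of a project are all connected to each other.

```python
from collections import Counter, defaultdict
from itertools import combinations


def parse_record(line):
    """Split one record into (programme, year, [(name, nation, dedication), ...])."""
    fields = [f.strip() for f in line.split("|")]
    programme, year = fields[0], int(fields[1])
    participants = []
    for field in fields[2:]:
        if not field:
            continue
        # rsplit keeps hyphens that belong to the participant's name
        name, nation, dedication = (p.strip() for p in field.rsplit("-", 2))
        participants.append((name, nation, dedication))
    return programme, year, participants


def collaboration_edges(records):
    """Project the project-participant graph: one edge per pair sharing a project."""
    edges = set()
    for _, _, participants in records:
        names = sorted({name for name, _, _ in participants})
        edges.update(combinations(names, 2))
    return edges


def ambiguous_participants(records):
    """Participants whose repeated records carry different 'Dedication' levels."""
    levels = defaultdict(Counter)
    for _, _, participants in records:
        for name, _, dedication in participants:
            levels[name][dedication] += 1
    return sorted(name for name, seen in levels.items() if len(seen) > 1)
```

Participants returned by `ambiguous_participants` are those for which an additional rule is needed before they can be assigned to a single group.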
We found that many participants were classified in different levels, thus we had to define a set of rules which eliminated this ambiguity. Hence, the following step was to study each level to understand its composition. For every level, we chose 100 records randomly and checked their dedication by direct inspection. The result was that all selected records in 'Industry' were companies, none in 'Non Companies', 95 in 'Other' and 55 in 'Not Available'.
With the former information, we proceeded as follows. We first defined for each participant [...]
In order to confirm this result and to classify the remaining 3,286 entities, we defined a filter based on keywords related to the Universities group, such as 'univer', 'schule', 'laborato'... When we focused our attention on the group of 22,001 participants classified using 'Dedication', we found that those classified as Universities according to the filter were also Universities according to 'Dedication'. Since the filter was a completely different way of splitting the dataset, we could use it for the rest of the entries. Note that we only trusted the result of the filter if it was University, not if the result was Company. This is reasonable, since the filter was designed to identify terms related to Universities, not to Companies.
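A keyword filter of this kind can be sketched as follows. Only the three keywords quoted above are known, so the list is incomplete by construction; the function names, the case-insensitive substring matching and the sample names in the usage note are assumptions made for illustration.

```python
# Only these three keywords are given in the text; the full list is not known.
UNIVERSITY_KEYWORDS = ("univer", "schule", "laborato")


def looks_like_university(name):
    """True if the participant's name contains any University-related keyword."""
    lowered = name.lower()
    return any(keyword in lowered for keyword in UNIVERSITY_KEYWORDS)


def classify(name, dedication_group=None):
    """Return 'University', 'Company' or None (still undecided).

    A group already assigned through 'Dedication' is kept. Otherwise the filter
    is trusted only when it answers 'University'; a negative answer is not
    taken as evidence that the participant is a Company.
    """
    if dedication_group is not None:
        return dedication_group
    if looks_like_university(name):
        return "University"
    return None
```

For instance, `classify("Technische Hochschule Darmstadt")` yields `"University"`, while a name without any keyword stays undecided and has to be placed by a further rule.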
By means of the filter we classified all participants but 309. To place these entities, we paid attention to which value was higher: 'Non Companies' or 'Industry', independently of