Influence of technological innovations on industrial production: A motif analysis on the multilayer network

We study whether specific combinations of technological advancements can signal the presence of local capabilities allowing for a given industrial production. To this end, we generate a multi-layer network using country-level patent and trade data, and perform a motif-based analysis on this network using a statistical validation approach derived from maximum entropy arguments. We show that in many cases the signal far exceeds the noise, providing robust evidence of synergies between different technologies that can lead to a competitive advantage in specific markets. Our results can be highly useful for policy makers to inform industrial and innovation policies.


Introduction
Technological innovation is the main driver of modern economic growth [1,2]. It is therefore not surprising that measuring and predicting the potential impact of technological innovations on export competitiveness has been the central issue of many studies in the last forty years [3][4][5][6], as well as the focus of general interest in the field of innovation systems [7]. Several empirical studies that tried to measure such effects of national innovativeness on productivity and trade were also carried out, with mixed results [8,9]. Overall, such academic efforts provided a theoretical framework and empirical stylized facts that helped to understand the aggregate effect of innovation in determining the competitive advantage of countries in different markets. Policymakers are, however, more interested in identifying specific technologies that are relevant for specific markets [10,11], a task that is much harder to deal with in an organic and objective fashion. Indeed, the scientific effort addressing the impact of specific technologies on specific markets has been limited to ad hoc case studies that are difficult to compare [12,13].
A recent paper of ours [14] deals with this issue using a multilayer-network characterization of the innovation system. Specifically, a three-layered network of innovation activities is derived, starting from three bipartite networks describing the scientific, technological, and production activities of countries; the connections of the multilayer network represent the conditional probability that the information produced by an innovation activity (e.g., a technological sector) is used in another innovation activity (e.g., an industrial-product category) after a given time. Grounded on previous fundamental studies of economic complexity [15][16][17][18], this is the first attempt to build a representation of the innovation system as a complex multilayer network.
In this paper, we generalize this approach by measuring the potential influence that a pair of activities has on another activity. That is, we go beyond single-link analysis and consider the motifs of the multilayer network [19,20]. For simplicity, we limit our analysis to the relationships between (pairs of) technologies and products, which nevertheless represent a crucial aspect of the innovation system, since the interaction between different technologies is often a driver of innovation and progress [21]. As in Reference [14], we carry out statistical validation of our results against null network models derived from maximum-entropy principles.

Materials and Methods
In order to build the bilayered network of technologies and products that is used in our analysis, we started from the following popular databases.
PATSTAT (www.epo.org/searching-for-patents/business/patstat) collects patents from patent offices around the world. The basic unit of observation in the dataset is a patent family (i.e., the set of patents with common priorities, that is, referring to the same innovation). Each family is associated with the countries of origin of the applicants and with one or more technological codes defined by the International Patent Classification (IPC). We define $W_{ct}(y)$ as the number of patent families associated with IPC code $t$ and filed by firms located in country $c$ in year $y$.
BACI export data, recorded by UN COMTRADE (https://comtrade.un.org/), collect the import-export flows (quantified in thousands of current U.S. dollars) between countries in the world, with products classified according to the Harmonized System 2007 of the World Customs Organization. We define $W_{cp}(y)$ as the monetary value of the overall export of country $c$ for product $p$ during year $y$. Note that we use export data as proxies of (competitive) industrial production, as is typically done in the economic complexity literature.
In this work, we consider a data timespan ranging from 1995 to 2012, for which we have reliable coverage for both patent and export data. For technologies, we use a 4-digit resolution of IPC codes, resulting in a number of technological sectors $N_t$ ranging between 629 and 636. For products, we again use a 4-digit resolution of the Harmonized System, resulting in a number of product categories $N_p$ ranging between 1140 and 1176. Finally, the number of considered countries $N_c$ varies between 66 and 72. The slight variations of these numbers depend on the particular year considered, and are due to geopolitical changes and periodic recategorization of technologies and products.
Using these basic data, we can define the Revealed Comparative Advantage (RCA) [22] of a country $c$ in an activity $a$ (which is either a technological sector $t$ or a product category $p$) in a given year $y$:

$$\mathrm{RCA}_{ca}(y) = \frac{W_{ca}(y) \,/\, \sum_{a'} W_{ca'}(y)}{\sum_{c'} W_{c'a}(y) \,/\, \sum_{c',a'} W_{c'a'}(y)}. \qquad (1)$$

Thanks to the RCA, we can further define, for each year $y$, the binary bipartite networks countries-technologies and countries-products. These are, respectively, represented by binary biadjacency matrices $M^{C,T}(y)$ and $M^{C,P}(y)$ whose elements $M_{ct}(y)$ and $M_{cp}(y)$ are:

$$M_{ca}(y) = \begin{cases} 1 & \text{if } \mathrm{RCA}_{ca}(y) \geq 1, \\ 0 & \text{otherwise} \end{cases} \qquad (2)$$

(where, again, $a$ refers to technological sector $t$ in the case of the bipartite countries-technologies network, and to product category $p$ in the countries-products case).
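As an illustration of this step, the following minimal sketch computes the RCA of a toy volume matrix and binarizes it. It assumes the standard Balassa definition of the RCA and the customary threshold RCA ≥ 1; the function names (`rca_matrix`, `binarize`), the toy data, and the convention of assigning zero RCA to empty rows or columns are hypothetical choices made here for illustration only.

```python
from typing import List

Matrix = List[List[float]]


def rca_matrix(W: Matrix) -> Matrix:
    """Balassa RCA for a countries x activities volume matrix W[c][a]."""
    country_tot = [sum(row) for row in W]          # total volume of each country
    activity_tot = [sum(col) for col in zip(*W)]   # world volume of each activity
    world_tot = sum(country_tot)                   # overall world volume
    rca = []
    for c, row in enumerate(W):
        rca_row = []
        for a, w in enumerate(row):
            if country_tot[c] == 0 or activity_tot[a] == 0:
                rca_row.append(0.0)  # assumed convention for empty rows/columns
            else:
                share_in_country = w / country_tot[c]
                share_in_world = activity_tot[a] / world_tot
                rca_row.append(share_in_country / share_in_world)
        rca.append(rca_row)
    return rca


def binarize(rca: Matrix, threshold: float = 1.0) -> List[List[int]]:
    """Binary biadjacency matrix: 1 where RCA >= threshold (assumed to be 1)."""
    return [[1 if value >= threshold else 0 for value in row] for row in rca]


# Toy data: 2 countries x 3 activities (made-up numbers).
W_toy = [[10.0, 0.0, 5.0],
         [2.0, 8.0, 0.0]]
RCA_toy = rca_matrix(W_toy)
M_toy = binarize(RCA_toy)
print(M_toy)
```

The same function can be applied to patent counts and to export values, since both are countries-by-activities matrices.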
Once we have matrices $M^{C,T}(y_1)$ for year $y_1$ and $M^{C,P}(y_2)$ for year $y_2$, in analogy with References [14,18], we can construct the assist matrix $B^{T \to P}(y_1, y_2)$, whose generic element is defined as:

$$B_{tp}(y_1, y_2) = \frac{1}{k_t(y_1)} \sum_{c=1}^{N_c} \frac{M_{ct}(y_1)\, M_{cp}(y_2)}{k^{(p)}_c(y_2)}. \qquad (3)$$

In the above expression, $k_t(y_1) = \sum_{c=1}^{N_c} M_{ct}(y_1)$ is the number of countries having technology $t$ in their technological portfolio in year $y_1$, and $k^{(p)}_c(y_2) = \sum_{p=1}^{N_p} M_{cp}(y_2)$ is the cardinality of the product basket of country $c$ in year $y_2$. As explained in Reference [14], $B_{tp}(y_1, y_2)$ with $y_1 \leq y_2$ gives the conditional probability that a bit of information produced in technological sector $t$ in year $y_1$ arrives (via a random walk on the coupled bipartite network) at product category $p$ in year $y_2$, through one of the countries having $t$ in its technological basket at $y_1$ as well as $p$ in its product basket at $y_2$. Elements $B_{tp}(y_1, y_2)$ then represent the weighted links of the bilayered (or bipartite) network technologies-products, with the former at year $y_1$ and the latter at year $y_2$. These links are extensively studied in Reference [14].
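The contraction over countries described above can be sketched as follows. This is only an illustrative implementation under the random-walk reading given in the text: each technology spreads its weight uniformly over the countries holding it, and each country spreads it uniformly over its products. The function name `assist_matrix` and the toy matrices are hypothetical, and technologies with zero ubiquity are simply given an all-zero row (an assumption made here).

```python
from typing import List


def assist_matrix(M_ct: List[List[int]], M_cp: List[List[int]]) -> List[List[float]]:
    """B[t][p] = (1 / k_t) * sum_c M_ct[c][t] * M_cp[c][p] / k_c."""
    n_c, n_t, n_p = len(M_ct), len(M_ct[0]), len(M_cp[0])
    k_t = [sum(M_ct[c][t] for c in range(n_c)) for t in range(n_t)]  # technology ubiquity
    k_c = [sum(M_cp[c]) for c in range(n_c)]                         # product diversification
    B = [[0.0] * n_p for _ in range(n_t)]
    for t in range(n_t):
        if k_t[t] == 0:
            continue  # assumed convention: no country holds t, row stays zero
        for c in range(n_c):
            if M_ct[c][t] == 0 or k_c[c] == 0:
                continue
            step = 1.0 / (k_t[t] * k_c[c])  # technology -> country -> product
            for p in range(n_p):
                if M_cp[c][p]:
                    B[t][p] += step
    return B


# Toy example: 3 countries, 2 technologies (year y1), 3 products (year y2).
M_ct_toy = [[1, 1],
            [1, 0],
            [0, 1]]
M_cp_toy = [[1, 1, 0],
            [0, 1, 0],
            [0, 0, 1]]
B_toy = assist_matrix(M_ct_toy, M_cp_toy)
print(B_toy)
```

In this toy case every country holding a technology also exports at least one product, so each row of `B_toy` sums to one, as expected for transition probabilities.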
Here, we move forward and consider the Λ motifs of the bipartite technologies-products network:

$$\Lambda^{p}_{tt'}(y_1, y_2) = B_{tp}(y_1, y_2)\, B_{t'p}(y_1, y_2). \qquad (4)$$

$\Lambda^{p}_{tt'}(y_1, y_2)$ gives the conditional probability that two bits of information originally located on technologies $t$ and $t'$, respectively, at year $y_1$ both reach product $p$ at year $y_2$. In other words, this motif quantifies the joint probability of the co-occurrence in a single country of technology $t$ and product $p$, and of technology $t'$ and product $p$, where the two events are considered independent. Note that, while this interpretation of the Λ motifs cannot be directly related to "impact" or "causality", it does go beyond a simpler measure of (time-dependent) correlation. Note also that the name Λ motif comes from the fact that, in the bipartite technologies-products network, this quantity gives the weight of a Λ-shaped set of two links having different origins in the technology layer and the same end in the product layer [20]. In principle, it is possible to generalize this approach to higher-order motifs by, e.g., assessing the influence of a wider group of technologies on a single product. For the sake of simplicity and due to limits of statistical significance, here we focus on the simple Λ motif.
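A short sketch of how Λ motifs follow from an assist matrix is given below, taking each motif weight as the product of the two link weights that share the same product. The function name `lambda_motifs` and the small assist matrix are hypothetical and serve only as an example.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def lambda_motifs(B: List[List[float]]) -> Dict[Tuple[int, int, int], float]:
    """Weight of each Lambda motif (t, t', p) with t < t', as B[t][p] * B[t'][p]."""
    n_t, n_p = len(B), len(B[0])
    return {
        (t, u, p): B[t][p] * B[u][p]
        for t, u in combinations(range(n_t), 2)  # unordered pairs of technologies
        for p in range(n_p)
    }


# Hypothetical assist matrix: 2 technologies x 3 products.
B_example = [[0.25, 0.75, 0.0],
             [0.25, 0.25, 0.5]]
motifs = lambda_motifs(B_example)
print(motifs)
```

Because the product is symmetric in the two technologies, only unordered pairs are enumerated.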
After obtaining the empirical values of the Λ motifs from the data, we statistically validate them using their probability distribution derived from an appropriate null model, which is schematically defined as follows (see the appendix for a thorough presentation). For each year $y$, we build two statistical ensembles of biadjacency matrices $\tilde{M}^{C,T}(y)$ and $\tilde{M}^{C,P}(y)$, respectively, for the countries-technologies and countries-products bipartite networks. These networks are built to be maximally random, apart from having the ensemble average of node degrees equal to the observed values in the empirical networks. By node degrees, we mean both the technological diversification of countries $\tilde{k}^{(t)}_c(y) = \sum_t \tilde{M}_{ct}(y)$ and technology ubiquities $\tilde{k}_t(y) = \sum_c \tilde{M}_{ct}(y)$ for the countries-technologies network, and analogously both the product diversification of countries $\tilde{k}^{(p)}_c(y) = \sum_p \tilde{M}_{cp}(y)$ and product ubiquities $\tilde{k}_p(y) = \sum_c \tilde{M}_{cp}(y)$ for the countries-products network. We chose these quantities as constraints because we wanted our null model to bear only the information contained in the diversification of countries and the ubiquity of activities, without taking into account the specific pattern of co-occurrences found in the empirical networks. In the spirit of the information-theory interpretation of statistical mechanics [23][24][25], the probability measure defining both statistical ensembles of binary bipartite networks is obtained using a constrained entropy maximization approach. The resulting ensembles are known in the literature as Bipartite Configuration Models (BiCM) [20]. Finally, given BiCM ensembles for bipartite networks $\tilde{M}^{C,T}(y_1)$ and $\tilde{M}^{C,P}(y_2)$, we can use Equations (3) and (4), appropriately applied to BiCM quantities, to derive the probability distribution for value $\tilde{\Lambda}^{p}_{tt'}(y_1, y_2)$ in the null model [26,27].
Numerically, we populate the BiCM ensembles by generating $10^3$ matrices $\tilde{M}^{C,T}(y_1)$ and $\tilde{M}^{C,P}(y_2)$, and then contract each pair to generate a final ensemble of $10^3$ null matrices $\tilde{B}^{T \to P}(y_1, y_2)$.
In the following, we focus on $\Lambda^{p}_{tt'}(\Delta y)$, that is, the average value of $\Lambda^{p}_{tt'}(y_1, y_2)$ over all pairs of years giving the same difference $y_2 - y_1 = \Delta y$. This represents the conditional probability that two bits of information, produced in the same year for a pair of technologies $t$ and $t'$, reach product $p$ after $\Delta y$ years. We define signal $\phi(\Delta y)$ as the fraction of significant $\Lambda^{p}_{tt'}(\Delta y)$ (at the $\alpha = 0.01$ significance level, according to the probability distribution of the null model) for combinations of $t$, $t'$, and $p$ chosen for selected matrix regions. In general, we consider a population of motifs equal to 5500 units (see below).
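The definition of the signal can be made concrete with a small sketch. It assumes a right-tailed empirical p-value, i.e., the share of null-model samples at least as large as the observed motif weight; the text does not spell out the tail convention, so this is an assumption, and the names `empirical_p_value` and `signal` as well as the toy numbers are hypothetical.

```python
from typing import Dict, Hashable, List


def empirical_p_value(observed: float, null_samples: List[float]) -> float:
    """Right-tailed p-value (assumed): share of null samples >= observed."""
    return sum(1 for x in null_samples if x >= observed) / len(null_samples)


def signal(observed: Dict[Hashable, float],
           null: Dict[Hashable, List[float]],
           alpha: float = 0.01) -> float:
    """Fraction of motifs whose observed weight is significant at level alpha."""
    n_significant = sum(
        1 for motif, value in observed.items()
        if empirical_p_value(value, null[motif]) < alpha
    )
    return n_significant / len(observed)


# Toy example: three motifs sharing the same made-up null sample of 1000 values.
null_sample = [float(i) for i in range(1000)]
observed_toy = {"m1": 1000.0, "m2": 500.0, "m3": 994.5}
null_toy = {name: null_sample for name in observed_toy}
phi_toy = signal(observed_toy, null_toy, alpha=0.01)
print(phi_toy)
```

Here `m1` and `m3` have empirical p-values of 0 and 0.005, respectively, while `m2` has 0.5, so two of the three toy motifs count as significant.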

Results
As a first test, we report mean signal $\phi$, that is, the signal averaged over all combinations of $t$, $t'$, and $p$. Since the total number of such motifs is extremely large, we chose 5500 motifs at random and took their mean signal as representative of the global average. Figure 1 shows that $\phi$ basically remains within one standard deviation of noise level $\alpha$, indicating that the mean signal within the data is negligible. We then report the signal relative to motifs within selected regions of the assist matrix (Figure 2). Specifically, we chose subregions of 11 technologies and 100 products (related to specific technological sectors and product categories), whose total number of Λ motifs was 5500 (since $\Lambda^{p}_{tt'} = \Lambda^{p}_{t't}$, only the 55 unordered pairs of distinct technologies are counted). From Figure 2 we see that, by selecting coherent sets of technologies and products, the signal is considerably enhanced: the presence of a pair of technologies in the capability basket of a country can predict whether that country can successfully export a product, and this happens almost independently of time lag $\Delta y$.
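The count of 5500 motifs per region can be checked with a one-line computation: 11 technologies give 55 unordered pairs, each combined with 100 products. The variable names below are illustrative only.

```python
from math import comb

n_technologies = 11
n_products = 100

# Lambda motifs are symmetric in the two technologies, so only unordered
# pairs of distinct technologies are counted.
n_technology_pairs = comb(n_technologies, 2)
n_motifs = n_technology_pairs * n_products
print(n_technology_pairs, n_motifs)
```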
As consistency checks, we performed two exercises, both reported in Figure 3. Firstly, we show that the results we have just presented do not depend on the particular resolution used to choose the motifs. Secondly, we show that, for incoherent technologies and products, we indeed get a much lower signal, even lower than the significance level. In this latter case, a significant development of specific technologies corresponds to a low export level for a given product.

[Figure caption, beginning missing] ... Figure 2) and products in the 2706-3104 region "inorganic and organic chemicals, pharmaceuticals". Orange lower triangles: technological codes in the H01P-H02M region, related to sector "physics: electricity", and products in the 4411-5516 region "textiles". In all cases, error bars represent the standard deviation over the year pairs giving the same time lag, whereas the dotted line is significance level $\alpha$.

We finally provide a few examples of motifs with a high signal (i.e., with a low p-value). Since the total number of motifs is extremely high, so that a complete exploration cannot be performed efficiently, we considered the motifs made up of link pairs $B_{tp}(y_1, y_2)$ and $B_{t'p}(y_1, y_2)$ which were independently those with the highest signal. Table 1 reports some instances of such motifs for the specific choice $\Delta y = 0$. Triplets $t$, $t'$, $p$ appearing in the table indeed seem coherent, and confirm that our method can actually extract meaningful information in an unsupervised way.

Table 1. Examples of highly significant Λ motifs. The p-value is averaged over all year pairs $y_1 = y_2$ giving $\Delta y = 0$. To make this selection, we picked the most significant pairs (individual links) $(t, p)$, and then chose $t'$ within the region where the average p-value was highest.

Conclusions
In this work, we provide an effective way of measuring the combined effect of a set of technologies on one product. In particular, we considered Λ motifs, which quantify the paired effect of two technologies together. In the process of finding relevant combinations, we highlight several results.
First of all, we showed that the combination of multiple technologies plays a very different role in different industrial and technological sectors. Technologies within the same sector (we showed examples for physics, engineering and chemistry) tend to show synergies in enhancing the chances of successful exports of a product (Figure 2), whereas no such effect is found for two generic technologies (Figure 1). This heterogeneity, while expected, is quantitatively measured here. Secondly, we confirm that co-occurrences between technological activities in a country allow extracting information on shared capabilities, which in turn can inform policymakers and stakeholders of relevant synergies, like those highlighted in Table 1, for specific export markets.
The mapping provided by our approach for the effects of pairs of technologies on products can represent the fundamental building block for the formulation of a powerful instrument to inform policies and industrial strategies about technology transfer. This operational step will be an important aspect of future research.

Funding: This work was supported by Italian PNR Project CRISIS-Lab. G.C. also acknowledges support from the CoeGSS EU project (676547).

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Bipartite Configuration Model (BiCM) and Assist-Matrix Null Model
In order to assess the statistical significance of elements of the assist matrices, we resorted to a null model for bipartite matrices $M^{C,T}(y)$ and $M^{C,P}(y)$, built by randomly reshuffling their elements (i.e., the network links connecting nodes in countries layer $C$ to nodes in technologies layer $T$ and products layer $P$, respectively), but preserving the diversification of countries and the ubiquity of different innovation activities (i.e., the degrees of the nodes in both layers of the countries-technologies and countries-products bipartite networks). This means that we randomized the signal coming from the network-connectivity patterns beyond that encoded in the node degrees. In order to formulate the null model analytically, without relying on a conditional uniform graph test [28,29], degree constraints are imposed on average, i.e., in a way formally similar to what happens for the canonical ensemble in statistical mechanics with the constraint on energy [23]. This amounts to setting a null hypothesis described by the BiCM [20], which is an extension of the configuration model [24] to bipartite networks.
In the following, we use symbols with the tilde for quantities assessed on null-model configurations, and without the tilde for empirically observed values. From an operational viewpoint, the BiCM null model for any binary biadjacency matrix M, representing a (real) empirical bipartite network with two node layers a ∈ A and b ∈ B, is built using two main steps:

1. Through a constrained maximum-entropy approach, we define ensemble $\Omega$ of bipartite networks that are maximally random, apart from the ensemble average of the node degrees on both layers of the bipartite network, which are constrained to generic fixed values. Such an ensemble is thus an instance of an Exponential Random Binary Graph (ERBG).

2. In order to determine the ERBG that best represents the empirical bipartite network, we use a maximum-likelihood argument showing that the mean values of the node degrees have to be taken equal to the observed ones in the empirical network [30]: $\langle \tilde{k}_a \rangle_\Omega = k_a, \ \forall a \in A$ and $\langle \tilde{k}_b \rangle_\Omega = k_b, \ \forall b \in B$, where we have indicated with $k$ the observed degrees in the real network, and with $\tilde{k}$ the degrees in a generic configuration of the null model. Recall that $k_a = \sum_b M_{ab}$ and $k_b = \sum_a M_{ab}$, and analogously for "tilded" quantities.
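Let us start by introducing the ERBG with fixed mean node degrees. Let $\tilde{M} \in \Omega$ be a network configuration in such an ensemble, and $P(\tilde{M})$ be its occurrence probability. By implementing the prescriptions from information theory and statistical mechanics [23,31], the least biased choice of $P(\tilde{M})$ is the one that maximizes the informational entropy

$$S = -\sum_{\tilde{M} \in \Omega} P(\tilde{M}) \ln P(\tilde{M}) \qquad (A1)$$

subject to normalization condition $\sum_{\tilde{M} \in \Omega} P(\tilde{M}) = 1$ plus the constraints:

$$\langle \tilde{k}_a \rangle_\Omega = \sum_{\tilde{M} \in \Omega} P(\tilde{M})\, \tilde{k}_a(\tilde{M}) = k^*_a \ \ \forall a \in A, \qquad \langle \tilde{k}_b \rangle_\Omega = \sum_{\tilde{M} \in \Omega} P(\tilde{M})\, \tilde{k}_b(\tilde{M}) = k^*_b \ \ \forall b \in B, \qquad (A2)$$

where $k^*_a$ and $k^*_b$ are arbitrarily fixed values for the mean degrees of nodes belonging to layers $A$ and $B$. By defining respective Lagrange multipliers $\omega$, $\{\mu_a\}_{a \in A}$, and $\{\nu_b\}_{b \in B}$ (one for each node of the bipartite network), the probability distribution of all configurations $\tilde{M} \in \Omega$ that maximizes the entropy, satisfying at the same time all the constraints, is determined by the following variational equation:

$$\frac{\delta}{\delta P(\tilde{M})} \left[ S + \omega \Big( 1 - \sum_{\tilde{M}' \in \Omega} P(\tilde{M}') \Big) + \sum_{a \in A} \mu_a \big( k^*_a - \langle \tilde{k}_a \rangle_\Omega \big) + \sum_{b \in B} \nu_b \big( k^*_b - \langle \tilde{k}_b \rangle_\Omega \big) \right] = 0. \qquad (A3)$$

It is a matter of simple algebra to show that the solution of this equation is:

$$P(\tilde{M}) = \frac{e^{-H(\tilde{M})}}{Z}, \qquad (A4)$$

$$H(\tilde{M}) = \sum_{a \in A} \mu_a \tilde{k}_a + \sum_{b \in B} \nu_b \tilde{k}_b = \sum_{a \in A} \sum_{b \in B} (\mu_a + \nu_b)\, \tilde{M}_{ab}, \qquad (A5)$$

$$Z = e^{1+\omega} = \sum_{\tilde{M} \in \Omega} e^{-H(\tilde{M})} = \prod_{a \in A} \prod_{b \in B} \left( 1 + e^{-(\mu_a + \nu_b)} \right). \qquad (A6)$$

Equations (A4)-(A6) above define the network ensemble known as the BiCM model. Note that, as we have implemented only local constraints, i.e., the mean node degrees, Equation (A4) can be rewritten as the product of single-link probability distributions over all pairs of nodes belonging, respectively, to the two different layers [20]:

$$P(\tilde{M}) = \prod_{a \in A} \prod_{b \in B} \pi_{ab}^{\tilde{M}_{ab}} (1 - \pi_{ab})^{1 - \tilde{M}_{ab}}, \qquad (A7)$$

where $\pi_{ab}$ is simply the probability of the link between nodes $a \in A$ and $b \in B$:

$$\pi_{ab} = \frac{\eta_a \theta_b}{1 + \eta_a \theta_b}, \qquad (A8)$$

with $\eta_a = e^{-\mu_a}$ and $\theta_b = e^{-\nu_b}$. In other words, the existence of each link is an independent event whose probability is a function only of the Lagrange multipliers associated with the pair of nodes defining the link. The values of the Lagrange multipliers are determined by the constraints of Equation (A2), which can be rewritten in terms of the derivatives of the partition function:

$$\langle \tilde{k}_a \rangle_\Omega = -\frac{\partial \ln Z}{\partial \mu_a} = \sum_{b \in B} \pi_{ab} = k^*_a, \qquad \langle \tilde{k}_b \rangle_\Omega = -\frac{\partial \ln Z}{\partial \nu_b} = \sum_{a \in A} \pi_{ab} = k^*_b. \qquad (A9)$$

Equations (A4)-(A8) define the generic ERBG with fixed mean degrees of nodes on both layers of the bipartite network.
Once the generic ERBG has been defined, we can move to the step of determining, among all possible ERBGs, the optimal null model for a given real biadjacency matrix (i.e., a bipartite network) $M$. Equivalently, we have to choose the best values for $\{k^*_a\}_{a \in A}$ and $\{k^*_b\}_{b \in B}$, i.e., for Lagrange multipliers $\{\mu_a\}_{a \in A}$ and $\{\nu_b\}_{b \in B}$, in relation to the connectivity properties of $M$.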
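The factorization into independent links can be checked numerically on a tiny example. The sketch below assumes the standard BiCM form, with configuration weight proportional to exp(-H) and H the sum of (µ_a + ν_b) over existing links, and compares a brute-force enumeration of all configurations of a 2 x 2 bipartite network with the single-link probabilities η_a θ_b / (1 + η_a θ_b). The multiplier values and function names are arbitrary and hypothetical.

```python
from itertools import product
from math import exp, prod

# Hypothetical Lagrange multipliers for a 2 x 2 bipartite network.
mu = [0.3, -0.5]   # layer A
nu = [0.1, 1.2]    # layer B
cells = [(a, b) for a in range(len(mu)) for b in range(len(nu))]


def link_probability(a: int, b: int) -> float:
    """Single-link probability eta_a * theta_b / (1 + eta_a * theta_b)."""
    x = exp(-(mu[a] + nu[b]))  # eta_a * theta_b
    return x / (1.0 + x)


# Brute force: enumerate all 2^4 configurations with weight exp(-H).
weights = {}
for bits in product([0, 1], repeat=len(cells)):
    H = sum(m * (mu[a] + nu[b]) for m, (a, b) in zip(bits, cells))
    weights[bits] = exp(-H)
Z = sum(weights.values())
Z_factorized = prod(1.0 + exp(-(mu[a] + nu[b])) for a, b in cells)


def brute_force_mean_degree(a: int) -> float:
    """Ensemble average of the degree of node a in layer A."""
    total = 0.0
    for bits, w in weights.items():
        degree = sum(m for m, (i, _) in zip(bits, cells) if i == a)
        total += (w / Z) * degree
    return total


brute = [brute_force_mean_degree(a) for a in range(len(mu))]
factorized = [sum(link_probability(a, b) for b in range(len(nu))) for a in range(len(mu))]
print(brute, factorized)
```

The two lists of mean degrees agree up to rounding, which is the practical content of the statement that each link is an independent event.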
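To this aim, we write the log-likelihood function [30]

$$\mathcal{L}(\{\mu_a\}, \{\nu_b\}) = \ln P(M \mid \{\mu_a\}, \{\nu_b\}) = -\sum_{a \in A} \mu_a k_a - \sum_{b \in B} \nu_b k_b - \sum_{a \in A} \sum_{b \in B} \ln \left( 1 + e^{-(\mu_a + \nu_b)} \right), \qquad (A10)$$

and maximize it with respect to the Lagrange multipliers, obtaining the conditions

$$k_a = \sum_{b \in B} \pi_{ab} \ \ \forall a \in A, \qquad k_b = \sum_{a \in A} \pi_{ab} \ \ \forall b \in B, \qquad (A11)$$

which exactly amounts to choosing $k^*_a = k_a \ \forall a \in A$ and $k^*_b = k_b \ \forall b \in B$. Finally, the null model for a real bipartite network $M$ is defined by Equations (A7) and (A8), with the Lagrange multipliers set by Equation (A11). This recipe can therefore be applied to construct an appropriate null model for all empirical bipartite networks $\{M^{C,T}(y), M^{C,P}(y)\}_{y_{\min} \leq y \leq y_{\max}}$ obtained, respectively, from PATSTAT and COMTRADE data.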
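The degree-matching conditions that fix the Lagrange multipliers generally have no closed-form solution and are solved numerically. The sketch below uses a simple fixed-point iteration on the fugacities x_a = exp(-µ_a) and y_b = exp(-ν_b); this particular solver, its starting point, and its stopping rule are choices made here for illustration and are not described in the text. It assumes at least one link in the network and no node connected to every node of the opposite layer.

```python
from math import sqrt
from typing import List, Sequence, Tuple


def link_probability(x_a: float, y_b: float) -> float:
    """Link probability x_a * y_b / (1 + x_a * y_b)."""
    return x_a * y_b / (1.0 + x_a * y_b)


def solve_bicm(k_a: Sequence[int], k_b: Sequence[int],
               tol: float = 1e-10, max_iter: int = 10000
               ) -> Tuple[List[float], List[float]]:
    """Fixed-point iteration so that expected degrees match k_a and k_b."""
    n_links = sum(k_a)
    x = [k / sqrt(n_links) for k in k_a]  # sparse-limit initial guess
    y = [k / sqrt(n_links) for k in k_b]
    for _ in range(max_iter):
        # Update layer A first, then layer B using the new x values.
        x = [k / sum(yb / (1.0 + xa * yb) for yb in y) if k > 0 else 0.0
             for k, xa in zip(k_a, x)]
        y = [k / sum(xa / (1.0 + xa * yb) for xa in x) if k > 0 else 0.0
             for k, yb in zip(k_b, y)]
        err_a = max(abs(sum(link_probability(xa, yb) for yb in y) - k)
                    for k, xa in zip(k_a, x))
        err_b = max(abs(sum(link_probability(xa, yb) for xa in x) - k)
                    for k, yb in zip(k_b, y))
        if max(err_a, err_b) < tol:
            break
    return x, y


# Toy degree sequences: 3 nodes in layer A, 4 nodes in layer B, 6 links.
k_a_toy = [3, 2, 1]
k_b_toy = [2, 2, 1, 1]
x_sol, y_sol = solve_bicm(k_a_toy, k_b_toy)
pi_toy = [[link_probability(xa, yb) for yb in y_sol] for xa in x_sol]
print([round(sum(row), 6) for row in pi_toy])
```

The matrix `pi_toy` then plays the role of the link probabilities from which null configurations are drawn.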
In order to build a null model for assist matrices $B^{T \to P}(y_1, y_2)$, and consequently for the Λ motifs defined by Equation (4), we now have to compose the null models for bipartite networks $M^{C,T}(y_1)$ and $M^{C,P}(y_2)$. This is done by contracting the two BiCMs for matrices $M^{C,T}(y_1)$ and $M^{C,P}(y_2)$ along the country dimension, as in Equation (3). We have:

$$\tilde{B}_{tp}(y_1, y_2) = \frac{1}{\tilde{k}_t(y_1)} \sum_{c=1}^{N_c} \frac{\tilde{M}_{ct}(y_1)\, \tilde{M}_{cp}(y_2)}{\tilde{k}^{(p)}_c(y_2)}, \qquad (A12)$$

where $\tilde{k}_t(y_1) = \sum_c \tilde{M}_{ct}(y_1)$ and $\tilde{k}^{(p)}_c(y_2) = \sum_p \tilde{M}_{cp}(y_2)$ are, respectively, the ubiquity of technology $t$ and the product diversification of country $c$ in the two single configurations of the BiCM null models for the two bipartite networks countries-technologies of year $y_1$ and countries-products of year $y_2$. In other words, starting from the two BiCM ensembles for $M^{C,T}(y_1)$ and $M^{C,P}(y_2)$, we build by composition an ensemble of configurations of bipartite networks $\Omega^{T \to P}(y_1, y_2)$ with link weights given by Equation (A12). The probability distributions of elements $\tilde{B}_{tp}(y_1, y_2)$, describing the null model, can be, in principle, obtained using exact techniques [26,27]. However, due to the strong non-Gaussianity of such distributions, we adopt a more practical sampling technique: starting from the BiCMs for $M^{C,T}(y_1)$ and $M^{C,P}(y_2)$, we use Equations (A7), (A8) and (A12) to generate null assist matrices and populate related ensemble $\Omega^{T \to P}(y_1, y_2)$ to estimate the full probability distributions. In a similar way, by using composition Equation (4) for ensemble $\Omega^{T \to P}(y_1, y_2)$,

$$\tilde{\Lambda}^{p}_{tt'}(y_1, y_2) = \tilde{B}_{tp}(y_1, y_2)\, \tilde{B}_{t'p}(y_1, y_2),$$

and averaging over all pairs of years $y_1$ and $y_2$ with fixed delay $\Delta y = y_2 - y_1$ to obtain $\tilde{\Lambda}^{p}_{tt'}(\Delta y)$, we can construct the null distribution of Λ motifs for each triple $t$, $t'$, $p$ and delay $\Delta y$. The generic observed element $\Lambda^{p}_{tt'}(\Delta y)$ is then considered statistically significant depending on the p-value that we can infer from its distribution under the null hypothesis.
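The sampling procedure can be sketched as follows: draw each link independently with its BiCM probability, contract the two sampled matrices along the country dimension, and collect the resulting motif weights. The link-probability matrices below are made up for the example (in practice they would come from solving the degree constraints), all function names are hypothetical, and a sampled technology with zero ubiquity is assumed to contribute a zero assist-matrix element.

```python
import random
from typing import List


def sample_biadjacency(pi: List[List[float]], rng: random.Random) -> List[List[int]]:
    """Draw each link independently with its own probability."""
    return [[1 if rng.random() < p else 0 for p in row] for row in pi]


def assist_element(M_ct: List[List[int]], M_cp: List[List[int]], t: int, p: int) -> float:
    """Single element B_tp obtained by contracting over countries."""
    k_t = sum(row[t] for row in M_ct)
    if k_t == 0:
        return 0.0  # assumed convention when no sampled country holds t
    total = 0.0
    for row_t, row_p in zip(M_ct, M_cp):
        if row_t[t] and row_p[p]:
            total += 1.0 / sum(row_p)  # diversification is >= 1 since row_p[p] == 1
    return total / k_t


def null_lambda(pi_ct: List[List[float]], pi_cp: List[List[float]],
                t: int, u: int, p: int,
                n_samples: int = 1000, seed: int = 0) -> List[float]:
    """Null sample of the motif weight B_tp * B_up for one triple (t, u, p)."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_samples):
        M_ct = sample_biadjacency(pi_ct, rng)
        M_cp = sample_biadjacency(pi_cp, rng)
        values.append(assist_element(M_ct, M_cp, t, p) * assist_element(M_ct, M_cp, u, p))
    return values


# Hypothetical link probabilities: 4 countries, 2 technologies, 3 products.
pi_ct_toy = [[0.9, 0.6], [0.7, 0.2], [0.4, 0.5], [0.1, 0.3]]
pi_cp_toy = [[0.8, 0.5, 0.3], [0.6, 0.4, 0.2], [0.3, 0.7, 0.5], [0.2, 0.1, 0.6]]
null_sample_toy = null_lambda(pi_ct_toy, pi_cp_toy, t=0, u=1, p=0)
print(len(null_sample_toy), max(null_sample_toy))
```

An observed motif weight can then be compared with `null_sample_toy` to obtain an empirical p-value.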
The specific threshold for statistical significance and the size of the generated ensemble depend on the exercise performed (as highlighted in the text). In our case, we fixed the statistical significance level at $\alpha = 0.01$. It is useful to recall that the two choices, the threshold and the size of the ensemble, are not unrelated: the more stringent the threshold we want to test (i.e., the smaller $\alpha$), the bigger the sample we require. We consequently extracted, for each pair of years $y_1$ and $y_2$, two ensembles of 1000 configurations/matrices for the two BiCM null models $\tilde{M}^{C,T}(y_1)$ and $\tilde{M}^{C,P}(y_2)$; by one-to-one contracting the configurations in the two ensembles, we obtain 1000 values of $\tilde{\Lambda}^{p}_{tt'}(y_1, y_2)$ to finally determine the statistical significance of observed value $\Lambda^{p}_{tt'}(\Delta y)$, as explained in Section 2.
A final comment on the issue of multiple hypothesis testing is in order here. Since we perform many statistical tests at once (one for each motif of the assist matrices), in order to determine the truly significant elements, we should use a correction for the significance threshold (such as Bonferroni or False Discovery Rate). However, signal $\phi(\Delta y)$ is meant to measure the average outcome of a statistical test over all possible motifs in the network, and as such we do not need to implement any correction for the significance threshold.
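The link between the threshold and the ensemble size can be illustrated with exact arithmetic. With an empirical p-value defined as the share of null samples reaching the observed value (an assumption made here, as are the function names), p-values are multiples of 1/n, so an ensemble of n samples cannot resolve thresholds much smaller than 1/n.

```python
from fractions import Fraction


def p_value_resolution(n_samples: int) -> Fraction:
    """Smallest nonzero empirical p-value obtainable from n_samples null draws."""
    return Fraction(1, n_samples)


def max_exceedances(n_samples: int, alpha: Fraction) -> int:
    """Largest number r of null draws reaching the observed value with r/n < alpha."""
    r = 0
    while Fraction(r + 1, n_samples) < alpha:
        r += 1
    return r


alpha = Fraction(1, 100)
resolution_1000 = p_value_resolution(1000)
allowed_1000 = max_exceedances(1000, alpha)
allowed_50 = max_exceedances(50, alpha)
print(resolution_1000, allowed_1000, allowed_50)
```

With 1000 samples and α = 0.01, a motif is significant when at most 9 null draws reach its observed weight, whereas with only 50 samples it would be significant only if no null draw did.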