R&D productivity in Europe: towards a regional taxonomy in the European Union

Harvard Deusto Business Research. Volume III. Issue 1. Pages 2-22. ISSN: 2254-6235

Abstract
Research, development and innovation activities have become key sources of competitive advantage, which is one of the main factors behind the wellbeing of citizens living in a given territory. Aware of this fact, public administrations at different administrative levels have encouraged the production of innovations through public policies. In Europe, regional disparities in the amount of innovation inputs and outputs are very high. In this paper, the authors measure the productivity of research and development activities performed by all regions in the EU. To do so, the authors take into account a set of indicators that measure innovation inputs and outputs at the regional level. Using Data Envelopment Analysis (DEA), the authors measure regional productivity in the field of R&D and then compare this productivity outcome between regions in the EU. After explaining this first DEA model, the authors use cluster analysis to obtain a typology of regions according to their productivity in R&D activities. With these results, policy makers can compare the situation of their own regions and adapt the policies of efficient regions to their own institutional and economic background in the field of R&D.


Introduction
R&D investment has become one of the main variables in achieving competitive advantages. These competitive advantages will, in the long run, create higher levels of prosperity in a given region. This idea has been accepted by economic theory since Adam Smith, but only in recent times has economic theory focused on R&D and its connection with policy makers and society in general (Dodgson & Rothwell, 1994; Porter, 1998; Porter, Furman & Stern, 2000).
Given the importance of R&D as a basic tool for achieving higher levels of prosperity in a given society, it follows that public administrations would support R&D activities through appropriate public policy. Additionally, the different schools of economic thought favor this kind of intervention. The neoclassical literature accepts that the competitive market underinvests in R&D activities (Mani, 2002). Hence, the level of R&D investment that maximizes profit for firms is smaller than the level of R&D that maximizes social prosperity (Arrow, 1962; Beije, 1998).
On the other hand, the evolutionary school, linked with the concept of the national/regional innovation system (NIS/RIS), proposes public intervention to strengthen the different economic agents inside a NIS/RIS and to increase the interaction among these actors (Lundvall, 1992).
A consequence of the positive economic results that governments and firms associate with R&D investment has been a continuous increase in the public and private funding devoted to R&D in almost all developed economies (Martínez & Aguado, 2009).
Although the volume of private and public expenditure on R&D activities has been growing in recent decades at both the national and regional levels, there are few studies of the efficiency of this kind of expenditure, especially at the regional level.
In this work, we present a comparative study of all regions in the European Union (27 countries). We analyze the efficiency of R&D expenditures in EU-27 regions in order to build a common taxonomy, discovering similarities and disparities between regions. Some attempts to measure the efficiency of RIS at the European level have been made in recent times (Navarro, Gibaja, Aguado & Bilbao-Osorio, 2009; Martínez Pellitero, 2007). In these studies, the conceptual framework of RIS has been used to select a range of variables linked with the inputs and outputs of R&D activities. In all these cases, the methodology and statistical treatment of the data have been similar: principal component analysis to highlight the main dimensions that explain regional behavior in R&D activities, followed by a cluster analysis to gather regions into groups with common features measured along the axes defined by the principal component analysis. This kind of econometric analysis is used to group regions with similar levels of economic development, R&D inputs and R&D outputs. Moreover, it helps to identify the strong and weak points of each group of regions in comparison with the other groups.
However, this kind of analysis does not directly link the amount of output achieved with the amount of inputs devoted to R&D. A region (region Y) using a large quantity of R&D inputs and achieving exactly the same output as another region (region Z) that uses a smaller amount of R&D inputs would appear in a higher position in the ranking of innovative regions. In reality, region Z is using its resources more efficiently than region Y, so region Z should be highlighted as more efficient and ranked higher.
Different authors have tried to measure the efficiency of RIS in certain national economies (Buesa & Heijs, 2007; Miceli, 2010). In these analyses, the number of patent applications at the national patent office or at the European Patent Office (EPO) has been used as one of the main R&D output indicators, or even as the only one.
The number of patent applications has been a widely used indicator in the economic literature (Kamien & Schwartz, 1975; Mani, 2002), and allows quick comparisons between regions and nations. However, using this indicator as the only variable to measure R&D output does not capture the whole result achieved by a region in this field (Álvarez, Aguado & Martínez, 2008). In some economic sectors, the propensity to patent may be very low. In other cases, firms may develop products or processes which are new to the firm, but not to the sector at the global level. In this case, a patent is not possible, although the company has achieved an R&D output. For these reasons, it may be sensible to complement the number of patent applications with other variables in order to have a better measure of the R&D output of European regions.
The objective of this research is to measure the efficiency (productivity) of EU-27 regions in R&D activities, building a regional taxonomy according to those efficiency levels. In order to fulfill this task we will use the statistical tool Data Envelopment Analysis.
The paper is organized as follows. In section 2, the evolution of R&D expenditure will be analyzed in the context of the EU. In section 3, the Data Envelopment Analysis tool will be explained in detail, together with its relation to measuring the efficiency (productivity) of R&D activities. In section 4, the methodology followed in this paper will be described and, in section 5, the results of the DEA analysis will be presented. The paper ends with a conclusions section.

Evolution of R&D expenditures in the context of the EU
As mentioned in the introduction, the relevance of the productivity of R&D investment to the long-term growth of the economy is widely accepted in the economic literature (Cameron, 1998). Recent articles have examined the relationship between investment in R&D and production, showing its importance. In this section, we give a brief overview of the status of R&D in EU-27 countries.
As seen in Table 1, Spain's position is low in terms of total investment in R&D relative to GDP, rising from 1.12% in 2005 to 1.38% in 2010. In the Italian case, the evolution is similar: from 1.07% to 1.21%. The development is positive but insufficient to reach the leading countries, such as Sweden and Finland, which clearly exceed 3%. Both Spain and Italy are below the average for the EU-27 and the EU-15. Countries with low rates of business investment in R&D (such as Portugal, Italy and Spain) show an above-average growth rate in this indicator. This would indicate a process of convergence in this variable between those countries and the EU-27 average, at least until 2010 (the last year available for the whole EU-27). The effect of the current economic crisis on this process of convergence might be negative (OECD, 2013).

Assessing R&D effectiveness and productivity using data envelopment analysis (DEA)
In many economic studies, performance or productivity is defined or measured as the quantity of resources (inputs) needed to obtain a given quantity of product (outputs).
This performance analysis leads us to the study of efficiency: how to find the best mix of resources for obtaining those results.
In general terms, the modelling approaches to measuring comparative performance can be summarized in two groups:
• Parametric methods, such as Stochastic Frontier Analysis (SFA), which use multivariate techniques to analyze the variation in the production rate or cost rate among different organizations running the same activity (e.g. financial services, hospitals).
• Non-parametric methods, such as Data Envelopment Analysis (DEA), which try to measure the efficiency of such homogeneous entities by estimating the optimum level of product as a function of the type and quantity of available resources (Smith & Street, 2005).
In this paper, DEA is used as introduced by Charnes, Cooper and Rhodes (1978) in their seminal paper, which builds on earlier work by Farrell (1957). DEA measures relative efficiency, so an organization that consumes fewer resources to obtain the same quantity of product is considered more efficient. On this premise, the methodology starts from the definition of the Decision Making Unit (DMU) as the unit of assessment, that is, the entity whose relative efficiency is measured. The efficiency ratio is defined as a weighted sum of outputs divided by a weighted sum of inputs.
How are the weight factors obtained? A linear program is used, in which the objective function is the efficiency ratio of the DMU under assessment and the constraint set requires that no DMU's efficiency ratio, computed with those same weights, can exceed 1 (or 100%).
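As a sketch of the ratio problem just described, consider n DMUs with r inputs and s outputs, and write x_{jk} for the level of input j and y_{ik} for the level of output i of DMU k. The weight symbols u_i and v_j and the subscript 0 for the DMU under assessment are notation assumed here for illustration; the paper does not name them.

```latex
\begin{aligned}
\max_{u,\,v}\quad & h_0 = \frac{\sum_{i=1}^{s} u_i\, y_{i0}}{\sum_{j=1}^{r} v_j\, x_{j0}} \\
\text{s.t.}\quad & \frac{\sum_{i=1}^{s} u_i\, y_{ik}}{\sum_{j=1}^{r} v_j\, x_{jk}} \le 1, \qquad k = 1, \dots, n, \\
& u_i \ge 0, \quad v_j \ge 0 .
\end{aligned}
```

Fixing the denominator of h_0 at 1 turns this fractional problem into an ordinary linear program, which is why linear programming can be used to obtain the weights.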
Repeating the analysis for each DMU allows us to build an efficiency frontier on which the most efficient DMUs are located (those which minimize input levels for given output levels or, alternatively, maximize output for given input levels). All efficient DMUs have an efficiency score equal to 1, while the rest obtain a lower value. The outcome of the process is shown in Figure 1.
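In the simplest case of one input and one output under constant returns to scale, the frontier reduces to the best output/input ratio observed, and each DMU's score is its own ratio divided by that best ratio. The sketch below illustrates this special case only; the function name and the data are hypothetical, and the multi-input, multi-output models used in this paper require solving a linear program for each DMU instead.

```python
def crs_efficiency(inputs, outputs):
    """Relative efficiency of each DMU with one input and one output (CRS).

    Each DMU's output/input ratio is divided by the best ratio observed,
    so DMUs on the frontier score 1.0 and all others score below 1.0.
    """
    if len(inputs) != len(outputs) or not inputs:
        raise ValueError("inputs and outputs must be non-empty and of equal length")
    if any(x <= 0 for x in inputs):
        raise ValueError("input levels must be positive")
    ratios = [y / x for x, y in zip(inputs, outputs)]
    best = max(ratios)
    return [ratio / best for ratio in ratios]


# Hypothetical data: four DMUs, each with one input (x) and one output (y).
x = [2.0, 4.0, 5.0, 8.0]
y = [1.0, 4.0, 4.0, 5.0]
scores = crs_efficiency(x, y)
# The second DMU has the best ratio (4/4) and therefore defines the frontier.
```

With these made-up figures only the second DMU obtains a score of 1; every other unit is measured against it.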
DEA models can be classified according to two criteria:
• The Pareto definition. Two definitions are given: the "output oriented" one, used when outputs are controllable (e.g. goods produced), which seeks to produce the highest possible amount of outputs with given amounts of inputs; and the "input oriented" one, used when inputs are controllable (e.g. workers and machinery), which seeks to produce given amounts of outputs with the lowest possible amount of inputs.
As R&D investment is mainly focused on obtaining results (output maximization), and following, for example, Graves and Langowitz (1996), who study the behaviour of R&D expenditure, an output-oriented model with constant returns to scale (CRS) has been selected.
Let us assume that we have n DMUs (k = 1, 2, ..., n) using r inputs to secure s outputs. Let x_jk (j = 1, 2, ..., r) be the level of input j used by DMU k and y_ik (i = 1, 2, ..., s) the level of output i secured by DMU k. Let λ_k be the weight factor assigned to DMU k.
The following linear programming model can be stated and solved for every DMU, where ε is a very small positive number used to avoid null weight factors.
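One standard output-oriented CRS formulation that is consistent with the definitions above can be sketched as follows. It is offered as an illustration under assumptions, not as the authors' exact statement: the subscript 0 for the DMU under assessment, the expansion factor φ, the slack variables s_j^- and s_i^+, and the use of λ_k for the weight attached to DMU k are all notation assumed here.

```latex
\begin{aligned}
\max_{\phi,\,\lambda,\,s^-,\,s^+}\quad & \phi + \varepsilon \left( \sum_{j=1}^{r} s_j^- + \sum_{i=1}^{s} s_i^+ \right) \\
\text{s.t.}\quad & \sum_{k=1}^{n} \lambda_k\, x_{jk} + s_j^- = x_{j0}, \qquad j = 1, \dots, r, \\
& \sum_{k=1}^{n} \lambda_k\, y_{ik} - s_i^+ = \phi\, y_{i0}, \qquad i = 1, \dots, s, \\
& \lambda_k \ge 0, \quad s_j^- \ge 0, \quad s_i^+ \ge 0 .
\end{aligned}
```

In this form the efficiency score of the assessed DMU is 1/φ*, which equals 1 for DMUs on the frontier. In the dual (multiplier) problem, ε acts as a lower bound on the input and output weights, which is how it prevents null weight factors.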

Methodology
The methodology used in this paper is straightforward. It is depicted in Figure 2.
• Firstly, the input and output variables have been selected following recommendations found in previous studies that analyze RIS efficiency.
• Secondly, data from all EU-27 regions have been collected.
• Next, the efficiency of R&D activities has been measured using DEA.
• Finally, the analyzed regions have been clustered according to the previous findings and results.
In building the efficiency models, three inputs and three outputs have been considered. For measuring R&D outputs, three variables have been selected: GDP per capita; employment in knowledge-intensive services and in high and medium-high tech manufacturing; and the number of patents applied for at the European Patent Office. Several approaches to measuring R&D outputs can be found in the recent literature. Some authors (Navarro et al., 2009) have used the number of patents, while others (Martínez Pellitero, 2002, 2007; Buesa & Heijs, 2007) have used GDP per capita and employment in knowledge-intensive services and in high and medium-high tech manufacturing as output variables.
In selecting the time period covered by the input and output data, a lag has been used, as R&D inputs are not turned into outputs instantaneously. Some studies (e.g. Lee & Park, 2005) state that there is a lag of three to five years before R&D inputs are converted into outputs. In this paper, inputs are measured as the average of the values for the period 2004-2007, while all output data have been gathered for 2008-2009 (or the latest available data).
The whole dataset has been obtained from Eurostat. We use regions at the NUTS 3 level, with the exception of Belgium, where the only data available are at the NUTS 2 level. We have eliminated from the analysis regions with at least one of the following characteristics:
• Number of patents < 10 per million inhabitants.
These filters are necessary in order to obtain valid results, not distorted by regions without significant R&D expenditures.After applying the filters, we have ended with 190 regions as valid units for the final analysis.
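The patent filter can be sketched as a simple predicate over regional records. The record layout, field names and sample values below are hypothetical; only the threshold of 10 patents per million inhabitants comes from the text, and any other exclusion criteria would be added as further conditions.

```python
# Threshold stated in the text: regions below it are dropped from the analysis.
MIN_PATENTS_PER_MILLION = 10


def keep_region(region):
    """Return True if the region passes the patent filter."""
    return region["patents_per_million"] >= MIN_PATENTS_PER_MILLION


# Hypothetical records; these are not Eurostat figures.
regions = [
    {"name": "Region A", "patents_per_million": 3.5},
    {"name": "Region B", "patents_per_million": 10.0},
    {"name": "Region C", "patents_per_million": 120.0},
]
valid = [region["name"] for region in regions if keep_region(region)]
```

Region A falls below the threshold and is excluded, while a region sitting exactly at 10 is kept because the criterion removes only values strictly below it.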

Results
This section shows the results of measuring the efficiency of R&D investment in the 190 regions using data envelopment analysis (DEA). First, we analyzed efficiency using the basic model (which includes all inputs and outputs). Then we estimated partial models that combine a single output with all inputs. In this way, it is possible to measure R&D efficiency for each selected output.
For example, the DEA model that includes all inputs and has patents as its only output can be understood as the model that measures efficiency oriented to the achievement of patents. Thus, in addition to the basic model, we have estimated three further models: the GDP-oriented efficiency model, the patent-oriented efficiency model and the employment-oriented efficiency model. Table 3 shows the inputs and outputs included in each of the four DEA models that have been calculated.
Table 4 shows the results of the efficiency of R&D for the 190 regions using the basic DEA model, in which all inputs and outputs are taken into account. Regions are divided into three groups according to their level of efficiency: high efficiency regions (efficiency of 0.70 or higher), average efficiency regions (efficiency from 0.40 to 0.69) and low efficiency regions (efficiency below 0.40). Regions shown in italics achieve maximum efficiency. Within each group, regions are ordered from maximum to minimum efficiency.
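The three-way grouping can be expressed as a small function. This is a sketch: the function name and labels are hypothetical, and it assumes that scores lying between 0.69 and 0.70 (which the stated bands do not cover explicitly) fall in the average group.

```python
def efficiency_group(score):
    """Assign a DEA efficiency score in [0, 1] to one of the three groups."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("efficiency scores must lie between 0 and 1")
    if score >= 0.70:
        return "high"
    if score >= 0.40:
        return "average"
    return "low"


# Hypothetical scores, ordered from maximum to minimum as in Table 4.
example_scores = [1.0, 0.70, 0.69, 0.40, 0.39]
groups = [efficiency_group(s) for s in example_scores]
```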
Regions from different countries achieve maximum efficiency in the basic model. The most inefficient regions (less than 40% efficiency) are likewise spread across Europe. The rest are in an intermediate position between these two extremes. It is noteworthy that some of the regions that are 100% efficient in the basic model show a low level of R&D investment relative to GDP compared with others, such as capital regions (Madrid, London, Lazio, Berlin...), which use higher levels of inputs.
These results differ from those obtained by Buesa and Heijs (2007) using a DEA model with patent applications as the only output of R&D investment. For Buesa and Heijs, the most efficient regions tend to coincide with those showing the highest R&D expenditure, both per capita and in absolute terms. However, in this study, those regions are in most cases in an intermediate position (Bruxelles, Stockholm). In contrast, some regions with low R&D investment, in both absolute and relative terms, are capable of reaching the highest level of efficiency (Sardegna, Illes Balears).

After the estimation of the basic model, the patent-oriented efficiency model and the employment-oriented efficiency model have been calculated. As shown in Table 5, there is a strong correlation between three of the models considered in this analysis. The basic model is highly correlated with the employment-oriented efficiency model (0.787) and the GDP-oriented efficiency model (0.847), and moderately correlated with the patent-oriented model (0.540). There is also a high correlation between the employment-oriented and GDP-oriented efficiency models (0.722), whereas the correlation between the patent-oriented efficiency model and the employment-oriented and GDP-oriented efficiency models is low. In short, there is a strong correlation between the basic, GDP and employment models, while all of them differ clearly from the patent-oriented efficiency model. Results similar to those presented in Table 4 are therefore to be expected in the three highly correlated models (basic, GDP and employment), and different ones in the patent-oriented efficiency model.
Considering the output variables used to estimate all DEA models (GDP, patents, and employment in knowledge-intensive services and in high and medium-high tech manufacturing), we have conducted a cluster analysis of the 190 regions included in the models. The results of this analysis are shown in Figure 3. The aim is to identify groups of regions that share similar patterns in terms of efficiency and differ from the other groups. The results in Table 5 suggest that two DEA models are enough to capture almost all the information contained in all the DEA models, due to the high correlation between three of them (basic, GDP and employment).
Accordingly, we have selected the employment-oriented efficiency model (highly correlated with the basic and GDP models) and the patent-oriented efficiency model (not highly correlated with the other models). Moreover, the correlation between these two selected models is the smallest one (see Table 5). Following this procedure, we have identified groups of regions which share similar features in all DEA models, using only two of them.
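The pairwise comparison of models can be illustrated with a correlation coefficient computed over the efficiency scores that two models assign to the same regions. The sketch below assumes Pearson's coefficient, since the text does not say which coefficient underlies Table 5; the function name and the sample scores are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equally long lists of scores."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("need two lists of equal length with at least two values")
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_a * var_b)


# Hypothetical efficiency scores of the same four regions under two models.
employment_scores = [0.20, 0.40, 0.60, 0.80]
patent_scores = [0.90, 0.70, 0.50, 0.30]
r = pearson(employment_scores, patent_scores)
```

Choosing the two models with the lowest mutual correlation, as done here, keeps the two clustering axes as informative as possible.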
As shown in Figure 3, we can distinguish seven groups of regions:
• Cluster 1: Regions with the lowest level of efficiency in both models. This group contains regions from peripheral European countries, mainly from Southern and Eastern Europe (29 regions) (see Table 6).
• Cluster 2: Regions with very low levels of efficiency in both models, although slightly more efficient regarding patents. The cluster average is nevertheless below the general average in both models. These regions belong mainly to France, the UK and Spain (50 regions) (see Table 7).
• Cluster 3: Regions with the most efficient result regarding patents and an almost average result regarding employment. These regions are therefore especially productive in using their inputs to generate patents. This small group of successful regions contains German, Austrian, British, Danish and Italian regions (13 regions) (see Table 8).
• Cluster 4: Very high efficiency regions in both patent generation and employment creation. These regions are the European leaders in terms of efficiency, achieving the best result in comparison with the rest of the groups. The geographical origin of these regions is similar to that of cluster 3 (9 regions) (see Table 9).
• Cluster 5: Regions with high efficiency regarding employment and low efficiency regarding patent generation. These regions are more efficient than average regarding employment. Most of them are Eastern European regions, with a small number of Spanish, French and British regions (38 regions) (see Table 10).
• Cluster 6: Regions which are more efficient than average in patent generation and slightly less efficient than average in employment-oriented efficiency. This group contains mainly German regions (37 regions) (see Table 11).
• Cluster 7: Regions with the highest level of efficiency in employment. However, their efficiency regarding patents is lower than the average. These regions belong mainly to Eastern Europe (14 regions) (see Table 12).
Table 8 Cluster 3: leading regions in patent efficiency
In Table 13, we can compare the average results for each of the seven clusters with the general average for the 190 regions. Each cluster presents averages in the two variables under study that are different with statistical significance.
In Figure 3 we can observe the concentration of the leading regions in efficiency in both models (cluster 4), regions with a high level of efficiency regarding employment in knowledge- and technology-intensive sectors (cluster 7), regions with a high level of efficiency in patent generation (cluster 3), intermediate regions (clusters 5 and 6) and regions with a low efficiency level (clusters 1 and 2). The largest group of regions is one with the lowest level of efficiency regarding employment (cluster 2: 50 regions). The leading group of regions (cluster 4) gathers 9 regions, in contrast with the 79 regions of the low efficiency clusters (1 and 2).
We can conclude from this analysis that there is a strong polarization between leading regions and low efficiency regions. The number of low efficiency regions is very high in comparison with the number of leading regions. This shows the need to improve the use of R&D inputs at the regional level in Europe and the value of establishing learning processes between regions, so that low efficiency regions can adapt policies and know-how from the leading ones.

Conclusions
The aim of this study has been to measure the efficiency of R&D activities performed at the regional level in the EU-27 using data envelopment analysis (DEA). In addition to the basic model (which includes three inputs and three outputs), we have built three models in order to measure the efficiency associated with each output. After analyzing the four DEA models, we have grouped all regions into seven categories, according to the efficiency levels achieved in the DEA models.
The results of this study could be used to assess R&D policy at the regional level in the EU-27. The final objective of DEA is to give each region a tool to improve the efficiency of regional R&D expenditure and to offer a context in which to compare the results of each region with those of other regions located in the same economic and cultural environment. With this tool, non-efficient regions could calculate the increase in output needed to become 100% efficient. Regional policy makers could benefit from this tool and take into account the efficiency level of their region in order to design policies to improve it. Policy makers in low efficiency regions should consider this low level of efficiency in their territories and analyze its causes, which may differ from region to region. North Eastern Scotland and Veneto, for example, obtain satisfactory results in general terms, but show a clear weakness in terms of efficiency in employment. If these regions improved their situation in that field, they could achieve higher levels of efficiency. On the other hand, regions such as Koblenz or Niederösterreich show very high efficiency levels in all models. Low efficiency regions (especially those in clusters 1 and 2) could try to adapt the R&D policies implemented in those high efficiency regions to their own regional features and possibilities.
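The output increase a non-efficient region would need can be sketched from its efficiency score. The example assumes an output-oriented score between 0 and 1 that is read as the ratio of current to frontier output, with inputs held fixed and any slacks ignored; the function name and the figures are hypothetical.

```python
def required_output_increase(current_output, efficiency):
    """Extra output needed to reach the frontier with unchanged inputs.

    Assumes a radial output-oriented efficiency score in (0, 1], so the
    frontier (target) output is current_output / efficiency.
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be greater than 0 and at most 1")
    target_output = current_output / efficiency
    return target_output - current_output


# Hypothetical region: output level 40 with an efficiency score of 0.5.
gap = required_output_increase(40.0, 0.5)
```

A region that is 50% efficient would have to double every output to reach the frontier, whereas a fully efficient region has no gap to close.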
The limitations of this study are twofold. On the one hand, the DEA models we have estimated assume constant returns to scale, following the vast majority of authors presenting this kind of analysis. On the other hand, the number of input and output indicators used in this work is very limited. A wider range of indicators could strengthen the final outcome.
A qualitative analysis of the regional innovation systems (RIS) of regions taken into consideration in this study could clarify the reasons why some regions are more efficient than others.Using the concept of regional innovation system, it could be possible to conclude whether the lack of interaction between RIS agents, the lack of investment and/or the lack of an institutional framework at the regional level are lowering the efficiency of regional R&D activities.
Figure 2 A four-step methodology

Figure 3 Cluster analysis of the EU-27 regions using results in the patent-oriented model and the employment-oriented model

Figure 1 Efficient frontiers
Note: In this figure, A, B, C and D are efficient DMUs under variable returns to scale (VRS), whereas E and F are relatively inefficient units. Unit C achieves a greater output level than E with the same input level, while unit B achieves the same level of output as E with a smaller level of input. Under CRS, the only efficient DMU is B.

Table 3 Inputs and Outputs considered in the DEA models
Source: Own elaboration.

Table 4 Results of the basic DEA model for the EU-27 regions
Source: Own elaboration.