Ranking-Based Second Stage in Data Envelopment Analysis: An Application to Research Efficiency in Higher Education

An alternative approach for the panel second stage of data envelopment analysis (DEA) is presented in this paper. Instead of efficiency scores, we propose to model rankings in the second stage using a dynamic ranking model in the score-driven framework. We argue that this approach is suitable to complement traditional panel regression as a robustness check. To demonstrate the proposed approach, we determine research efficiency of higher education systems at country level by examining scientific publications and analyze its relation to good governance. The proposed approach confirms positive relation to the Voice and Accountability indicator, as found by the standard panel linear regression, while suggesting caution regarding the Government Effectiveness indicator.


Introduction
In operations research, data envelopment analysis (DEA) is a non-parametric method used to measure the relative efficiency of decision-making units (DMUs) that convert inputs into outputs. It compares DMUs by calculating their efficiency scores based on a set of inputs and outputs. The method has been widely applied in the fields of agriculture, education, energy, finance, government, healthcare, manufacturing, retail, sport, and transportation (Liu et al., 2013).
In DEA research, it is common to follow the efficiency measurement with a second-stage regression analysis that uses efficiency scores as the dependent variable and includes contextual (or environmental) variables as independent variables. This approach is known as two-stage DEA. In many cases, efficiency is assessed annually, which may require a panel regression as the second-stage model to account for time-varying contextual variables. The most frequently employed panel methods for the second stage are panel linear regression (see, e.g., Chen et al., 2019; Mamatzakis et al., 2013) and panel Tobit regression (see, e.g., Borozan, 2018; Fonchamnyo and Sama, 2016). In linear regression, log transformations of efficiency scores are often used (see, e.g., Poveda, 2011; Zhang et al., 2018). Other panel methods include panel quantile regression (see, e.g., Frýd and Sokol, 2021; Zhang et al., 2018), panel fractional regression (see, e.g., Da Silva e Souza and Gomes, 2015; Fonchamnyo and Sama, 2016), and panel beta regression (see, e.g., Pirani et al., 2018; Song et al., 2016). Liu et al. (2016) identify two-stage DEA as one of the active fronts in DEA research.
The standard two-stage DEA has been criticized by Simar and Wilson (2007), Simar and Wilson (2011), and Kneip et al. (2015). The criticisms mainly stem from three issues: (1) correlation among the estimated efficiency scores due to the complex structure of the data generating process, (2) the use of estimated efficiency scores as the dependent variable instead of the true unobserved efficiency scores, and (3) the potential inseparability between the production frontier and the impact of contextual variables. These issues can significantly affect the validity of inference. When dealing with repeated assessment of efficiency, there is also the issue of temporal dependence. Nevertheless, some authors such as Banker and Natarajan (2008), McDonald (2009), and Banker et al. (2019) argue for the use of linear regression. For a survey on statistical approaches in nonparametric frontier models, see Moradi-Motlagh and Emrouznejad (2022).
The focus of this paper is on the two-stage DEA in the multiple-period setting. However, the time series dimension can be incorporated into DEA models in various ways. Notable examples include window analysis (Charnes et al., 1985), the Malmquist productivity index (Färe et al., 1994), the dynamic DEA model (Nemoto and Goto, 1999; Sueyoshi and Sekitani, 2005), multiperiod aggregative efficiency (Park and Park, 2009), and the dynamic slacks-based measure (Tone and Tsutsui, 2010).
In this paper, we present an alternative approach for the panel second stage of DEA. Instead of modeling efficiency scores, we propose to model the rankings. In the recent literature, Holý and Zouhar (2022) developed a time series model for rankings that utilizes the Plackett-Luce distribution and incorporates autoregressive and score dynamics. This model is based on the modern framework of score-driven models introduced by Creal et al. (2013) and Harvey (2013). While Holý and Zouhar (2022) applied the model to the results of the Ice Hockey World Championships, they also suggested its potential use in the second stage of DEA. Following this call, we devote this paper to exploring the use of this dynamic ranking model in DEA.
The motivation for using the score-driven dynamic ranking model in the second stage of DEA arises from the following properties:
• Relevance of Rankings. Rankings preserve the important information of mutual comparison among DMUs. In certain scenarios, the primary objective of DEA may even be to obtain rankings of DMUs, in which case modeling rankings directly is more appropriate. The long-term behavior of DMUs may also be of interest, in which case the long-term ranking may have a clearer interpretation than an aggregate of efficiency scores.
• Robustness to DEA Model. Consider two DEA models: the super-efficiency DEA model of Andersen and Petersen (1993) and the universal DEA model of Hladík (2019), both with either constant returns to scale (CRS) of Charnes et al. (1978) or variable returns to scale (VRS) of Banker et al. (1984). Despite producing different efficiency scores, these models generate the exact same ranking. By modeling rankings instead of efficiency scores in the second stage, any differences between these models are eliminated, as the response variable is the same in both cases. An additional consideration when modeling efficiency scores is whether to use the logarithmic transformation. However, since the log transformation preserves rankings, this is not a concern when using a ranking model.
• Robustness to Outliers. Outliers, in the form of extreme values of efficiency scores, can significantly influence the coefficients in a second-stage regression model. However, using rankings can mitigate this issue, as a DMU with an extremely low or high efficiency score would simply be ranked last or first, respectively. Thus, a ranking model can effectively handle such outliers.
• Simple yet Powerful. The model of Holý and Zouhar (2022) is straightforward to work with. The Plackett-Luce distribution, unlike its alternatives, is available in a closed form (see Alvo and Yu, 2014) and the dynamics are observation-driven (see Cox, 1981). As a result, the model can be estimated using the maximum likelihood method, and conventional Hessian-based standard errors can be used. Moreover, the model only requires a modest number of parameters, consisting of individual effects of DMUs, regression coefficients common for all DMUs, and two additional parameters controlling dynamics common for all DMUs.
Our approach also faces the following limitations:
• Loss of Information. While using rankings instead of efficiency scores can provide robustness to the DEA model and outliers (as discussed above), it also leads to loss of information. This loss can be beneficial in some scenarios, but it is still important to recognize that it occurs. One drawback of using rankings alone is that it is not possible to determine the boundary between inefficient and efficient DMUs. Efficiency scores, on the other hand, provide a clear distinction between the two groups.
• Different Data Generating Process. Our approach does not address the criticism of Simar and Wilson (2007), Simar and Wilson (2011), and Kneip et al. (2015). Indeed, the dependence between the DMUs is not captured by the Plackett-Luce distribution, which assumes the property known as the independence of irrelevant alternatives. The data generating process assumed by the model of Holý and Zouhar (2022) is much simpler than the true one generated by DEA.
• Absence of Ties. The model of Holý and Zouhar (2022) has a limitation in that it does not allow for rankings with ties. This means that in the second stage, we need to use a suitable DEA model that can rank all DMUs, including the efficient ones. However, this can be addressed by extending the Plackett-Luce distribution to incorporate ties, as demonstrated by Turner et al. (2020).
• Sufficient Variation in Rankings. A single realization of efficiency scores is often used in a second-stage regression model. A single ranking is, however, not enough for a meaningful analysis. Repeated rankings are therefore needed, which naturally take the form of panel data. Our approach is therefore suitable only when the time dimension is present. Even with repeated rankings, however, the Plackett-Luce distribution requires that for any possible partition of DMUs into two non-empty subsets, there exists at least one DMU in the second subset that is ranked higher than at least one DMU in the first subset (see Hunter, 2004).
Our approach is fundamentally different from traditional panel regressions, but it is not intended to replace them, particularly as it suffers from the same shortcomings highlighted by Simar and Wilson (2007), Simar and Wilson (2011), and Kneip et al. (2015). Instead, our approach is best used as a complement to traditional panel regressions to provide valuable insights that are not burdened by the problems specific to efficiency scores. This can be viewed as a form of robustness check, where both approaches are used to provide a more complete picture of the data. Given the controversies surrounding the second-stage DEA, conducting extensive robustness checks is crucial for ensuring the reliability and validity of the results. DEA practitioners who wish to utilize the dynamic ranking model can do so easily using the gasmodel R package, which offers all the necessary tools for estimation, forecasting, and simulation.
As an illustration of the proposed approach, we explore the research efficiency in higher education of European Union (EU) countries through the analysis of scientific publications in 2005-2020. In the first stage, we perform DEA for each year independently. We use gross domestic expenditure on R&D and the number of researchers as inputs to reflect the financial and human resources, respectively. For outputs, we use the number of publications and the number of citations to reflect the quantity and quality of scientific research, respectively. In the second stage, we investigate the influence of good governance on the research efficiency. As contextual variables, we use the six Worldwide Governance Indicators (WGI) of Kaufmann et al. (2011), together with the gross domestic product (GDP). We perform panel linear regression analysis of efficiency scores obtained by three DEA models proposed by Charnes et al. (1978), Andersen and Petersen (1993), and Hladík (2019), along with the dynamic ranking model of Holý and Zouhar (2022). As a side note, we demonstrate that the results obtained from the model of Andersen and Petersen (1993) can be derived from the model of Hladík (2019). All models uncover that the Voice and Accountability indicator is significantly positively correlated with research efficiency, suggesting that participation in selecting the government, freedom of expression, freedom of association, and freedom of media are key factors of governance influencing research efficiency. The Government Effectiveness indicator also has a positive effect; however, its significance is not confirmed by all models and this result is therefore not robust. No other significant relations are found. By utilizing the proposed approach in this study, we are able to assess the robustness of the relationship to the Voice and Accountability indicator. However, the results also indicate caution in interpreting the findings related to the Government Effectiveness indicator. Therefore, conducting extensive robustness checks such as this one is important to increase the reliability of the analysis and prevent misleading conclusions.
The rest of the paper is structured as follows. In Section 2, we present three DEA models proposed by Charnes et al. (1978), Andersen and Petersen (1993), and Hladík (2019), which are utilized in the subsequent analysis. In Section 3, we present details on the dynamic ranking model of Holý and Zouhar (2022) and its estimation, along with some modifications suitable to our case. In Section 4, we conduct an empirical study to examine research efficiency in higher education and compare the proposed ranking approach with the traditional panel regression approach. We conclude the paper in Section 5.

First Stage: Measuring Efficiency
The first stage of DEA involves determining the relative efficiency scores of the DMUs. The number of DMUs is denoted by $N$. Each DMU transforms $I$ inputs into $J$ outputs. Let $x_{ni}$ denote the $i$-th input of the $n$-th DMU, and $y_{nj}$ denote the $j$-th output of the $n$-th DMU. The matrix of inputs is denoted by $X = (x_{ni})_{n=1,i=1}^{N,I}$, while the matrix of outputs is denoted by $Y = (y_{nj})_{n=1,j=1}^{N,J}$. The inputs of a single DMU $n$ are denoted by $x_n = (x_{n1}, \ldots, x_{nI})^\intercal$, and the outputs of a DMU $n$ are denoted by $y_n = (y_{n1}, \ldots, y_{nJ})^\intercal$. The notation $X_{-n}$ represents the inputs of every DMU but $n$, while $Y_{-n}$ represents the outputs of every DMU but $n$.

Basic DEA
Charnes et al. (1978) proposed the very first DEA model, which has since become one of the most widely used DEA models. This model is commonly referred to as the CCR model and is based on the assumption of constant returns to scale (CRS). The efficiency scores $\theta^{CCR}_n$ are found for each DMU $n$ by the following linear program:
$$\theta^{CCR}_n = \max_{u, v} \; y_n^\intercal u \quad \text{s.t.} \quad x_n^\intercal v = 1, \quad Y u - X v \le 0, \quad u \ge 0, \quad v \ge 0, \tag{1}$$
where $u$ and $v$ are vectors of weights for the outputs and inputs, respectively. The efficiency scores for inefficient DMUs lie in $[0, 1)$ and are equal to 1 for efficient DMUs. Note that DMUs achieving an efficiency score of 1 may still be only weakly efficient, as they may have non-zero slack in either inputs or outputs (see, e.g., Cooper et al., 2007).
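A minimal sketch of how the CCR score can be computed numerically is given below. It assumes the standard multiplier form (maximize $y_n^\intercal u$ subject to $x_n^\intercal v = 1$, $Yu - Xv \le 0$, and non-negative weights) and uses `scipy.optimize.linprog`. The function name `ccr_efficiency` and the toy data are hypothetical and are not part of the empirical study.

```python
import numpy as np
from scipy.optimize import linprog


def ccr_efficiency(X, Y, n):
    """CCR score of DMU n in multiplier form under CRS.

    X is the N x I input matrix, Y is the N x J output matrix.
    The decision vector stacks J output weights u and I input weights v.
    """
    N, I = X.shape
    J = Y.shape[1]
    # linprog minimizes, so y_n'u is maximized by minimizing -y_n'u.
    c = np.concatenate([-Y[n], np.zeros(I)])
    # Benchmark constraints: Y u - X v <= 0 for every DMU.
    A_ub = np.hstack([Y, -X])
    b_ub = np.zeros(N)
    # Normalization: x_n'v = 1.
    A_eq = np.concatenate([np.zeros(J), X[n]]).reshape(1, -1)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=(0, None))
    return -res.fun


# Hypothetical toy data: three DMUs, one input, one output.
X = np.array([[1.0], [2.0], [4.0]])
Y = np.array([[1.0], [1.0], [1.0]])
scores = [ccr_efficiency(X, Y, n) for n in range(3)]
```

With a single input and a single output, the score reduces to the output-to-input ratio relative to the best ratio in the sample, which makes the toy example easy to verify by hand.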

Super-Efficiency DEA
A shortcoming of the CCR model is that it cannot differentiate between efficient DMUs, which can lead to the loss of valuable information. Andersen and Petersen (1993) proposed a super-efficiency DEA to overcome this limitation. In this model, the DMU under evaluation is excluded from the set of benchmarks, which allows efficient DMUs to achieve a score greater than 1. The super-efficiency model with CRS (labeled as the AP model) is given by the following linear program:
$$\theta^{AP}_n = \max_{u, v} \; y_n^\intercal u \quad \text{s.t.} \quad x_n^\intercal v = 1, \quad Y_{-n} u - X_{-n} v \le 0, \quad u \ge 0, \quad v \ge 0. \tag{2}$$
The efficiency scores for inefficient DMUs are the same as those obtained from the CCR model, while the scores for efficient DMUs are greater than or equal to 1.

Universal DEA
Recently, Hladík (2019) proposed a DEA formulation that focuses on a robust optimization viewpoint. The model uses a scaled Chebyshev norm to measure efficiency as a distance to inefficiency and inefficiency as a distance to efficiency. The scores generated by this model are universal in the sense that they are naturally normalized, and therefore, can be compared across unrelated models. The universal DEA model with CRS (labeled as the H model) is given by the following linear program:
$$\theta^{H}_n = \max_{u, v, \delta} \; 1 + \delta \quad \text{s.t.} \quad y_n^\intercal u \ge 1 + \delta, \quad x_n^\intercal v \le 1 - \delta, \quad Y_{-n} u - X_{-n} v \le 0, \quad u \ge 0, \quad v \ge 0. \tag{3}$$
Note that Hladík (2019) also proposed a nonlinear DEA model based on the Chebyshev norm, to which (3) is a tight approximation. The efficiency scores for inefficient DMUs lie in $[0, 1)$, while the scores for efficient DMUs lie in $[1, 2]$.
Applications of the universal DEA model include Holý and Šafr (2018), which analyzes the efficiency of basic and applied research; Frýd and Sokol (2021), which focuses on the efficiency of farms; and Holý (2022), which examines the efficiency of public libraries.

Relation of Super-Efficiency and Universal DEA
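The universal DEA model is closely related to the super-efficiency DEA model of Andersen and Petersen (1993). Hladík (2019) showed that the ranking of DMUs according to $\theta^{AP}_n$ is the same as the ranking according to $\theta^{H}_n$. We go a bit further and show that the models are even more connected, as the efficiency scores themselves can be derived by the following transformations:
$$\theta^{H}_n = \frac{2 \theta^{AP}_n}{1 + \theta^{AP}_n}, \qquad \theta^{AP}_n = \frac{\theta^{H}_n}{2 - \theta^{H}_n}. \tag{4}$$
The proof is as follows. First, we rewrite model (2) as
$$\theta^{AP}_n = \max_{u, v, \alpha} \; \alpha \quad \text{s.t.} \quad y_n^\intercal u \ge \alpha, \quad x_n^\intercal v \le 1, \quad Y_{-n} u - X_{-n} v \le 0, \quad u \ge 0, \quad v \ge 0, \quad \alpha \ge 0. \tag{5}$$
The objective function $y_n^\intercal u$ in (2) can be replaced by an additional variable $\alpha \ge 0$ when the constraint $y_n^\intercal u \ge \alpha$ is added. The constraint $x_n^\intercal v = 1$ can be relaxed to $x_n^\intercal v \le 1$, as any solution with $x_n^\intercal v = c < 1$ must be suboptimal. This is because both $v$ and $u$ could be multiplied by $1/c$ to achieve $x_n^\intercal v = 1$, resulting in an objective that is $1/c$ times higher. Next, using the substitution
$$\tilde{u} = \frac{u}{1 - \delta}, \qquad \tilde{v} = \frac{v}{1 - \delta}, \qquad \alpha = \frac{1 + \delta}{1 - \delta}, \tag{6}$$
we rewrite model (3) as
$$\theta^{H}_n = \max_{\tilde{u}, \tilde{v}, \alpha} \; \frac{2 \alpha}{1 + \alpha} \quad \text{s.t.} \quad y_n^\intercal \tilde{u} \ge \alpha, \quad x_n^\intercal \tilde{v} \le 1, \quad Y_{-n} \tilde{u} - X_{-n} \tilde{v} \le 0, \quad \tilde{u} \ge 0, \quad \tilde{v} \ge 0, \quad \alpha \ge 0. \tag{7}$$
Note that we can impose $\alpha \ge 0$ as $y_n^\intercal \tilde{u} \ge 0$. This also follows from $\delta \in [-1, 1)$ in model (3). Models (5) and (7) have the same constraints and differ only in the objective function. The function $2\alpha / (1 + \alpha)$ is monotonically increasing on $[0, \infty)$, thereby attaining its maximum value at the same point as $\alpha$. Thus, we have $\theta^{H}_n = 2 \theta^{AP}_n / (1 + \theta^{AP}_n)$. This proof follows and extends Proposition 2 in Hladík (2019).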
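The relation between the two scores can also be checked numerically. The sketch below assumes that the AP model is the CCR multiplier program with the evaluated DMU removed from the benchmark constraints, and that the H model maximizes $1 + \delta$ subject to $y_n^\intercal u \ge 1 + \delta$ and $x_n^\intercal v \le 1 - \delta$ with the same benchmark constraints. The function names and the toy data are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def ap_score(X, Y, n):
    """Super-efficiency (AP) score: DMU n is dropped from the benchmarks."""
    I, J = X.shape[1], Y.shape[1]
    keep = np.arange(X.shape[0]) != n
    c = np.concatenate([-Y[n], np.zeros(I)])           # maximize y_n'u
    A_ub = np.hstack([Y[keep], -X[keep]])              # Y_{-n} u - X_{-n} v <= 0
    A_eq = np.concatenate([np.zeros(J), X[n]])[None]   # x_n'v = 1
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(keep.sum()),
                  A_eq=A_eq, b_eq=[1.0], bounds=(0, None))
    return -res.fun


def h_score(X, Y, n):
    """Universal (H) score from the linear program max 1 + delta."""
    I, J = X.shape[1], Y.shape[1]
    keep = np.arange(X.shape[0]) != n
    # Variables are (u, v, delta); maximizing delta means minimizing -delta.
    c = np.concatenate([np.zeros(J + I), [-1.0]])
    rows = [
        np.concatenate([-Y[n], np.zeros(I), [1.0]]),   # -y_n'u + delta <= -1
        np.concatenate([np.zeros(J), X[n], [1.0]]),    # x_n'v + delta <= 1
    ]
    bench = np.hstack([Y[keep], -X[keep], np.zeros((keep.sum(), 1))])
    A_ub = np.vstack(rows + [bench])
    b_ub = np.concatenate([[-1.0, 1.0], np.zeros(keep.sum())])
    bounds = [(0, None)] * (J + I) + [(None, 1.0)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    return 1.0 - res.fun


# Hypothetical toy data: three DMUs, one input, one output.
X = np.array([[1.0], [2.0], [4.0]])
Y = np.array([[1.0], [1.0], [1.0]])
ap = [ap_score(X, Y, n) for n in range(3)]
h = [h_score(X, Y, n) for n in range(3)]
# Largest deviation from the claimed transformation theta_H = 2 theta_AP / (1 + theta_AP).
gap = max(abs(h[n] - 2 * ap[n] / (1 + ap[n])) for n in range(3))
```

For the first DMU in the toy data, the AP score is 2 and the H score is 4/3, in line with the transformation $2 \cdot 2 / (1 + 2) = 4/3$.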

Second Stage: Modeling Dynamic Rankings
The second stage of DEA involves identifying the factors that affect efficiency scores and measuring their impact. We assume periodic evaluation of efficiency of the same set of DMUs at times $t = 1, \ldots, T$ with efficiency scores $\theta_t = (\theta_{1t}, \ldots, \theta_{Nt})^\intercal$. In this paper, we propose to model rankings of DMUs instead of their efficiency scores, as is usual in the second-stage DEA. Let $R_t(n)$ denote the rank of a DMU $n$ according to efficiency scores $\theta_t$ at time $t$. The complete ranking at time $t$ is then denoted by $R_t = (R_t(1), \ldots, R_t(N))^\intercal$. The inverse of this ranking is the ordering $O_t = (O_t(1), \ldots, O_t(N))^\intercal$ at time $t$, where $O_t(r)$ represents the DMU with rank $r$ at time $t$. We employ the dynamic ranking model of Holý and Zouhar (2022).
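To make the distinction between the ranking and the ordering concrete, the following sketch converts a vector of efficiency scores into both objects. It assumes that rank 1 belongs to the highest score and that there are no ties. The function name and the example scores are hypothetical.

```python
def ranking_and_ordering(scores):
    """Return (R, O) with 1-based labels; rank 1 is the highest score.

    R[n] is the rank of DMU n + 1, O[r] is the DMU holding rank r + 1.
    """
    order = sorted(range(len(scores)), key=lambda n: -scores[n])
    ranking = [0] * len(scores)
    for rank, n in enumerate(order, start=1):
        ranking[n] = rank
    ordering = [n + 1 for n in order]
    return ranking, ordering


# Hypothetical scores of three DMUs at a single time period.
R, O = ranking_and_ordering([0.5, 1.3, 0.8])
```

Here the second DMU has the highest score, so it receives rank 1 and appears first in the ordering.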

Plackett-Luce Distribution
We assume that at each time $t$ the ranking $R_t$ follows the Plackett-Luce distribution proposed by Luce (1959) and Plackett (1975). In the ranking literature, it is a widely used probability distribution for random variables in the form of permutations. Each DMU $n$ at each time $t$ has a worth parameter $w_{nt} \in \mathbb{R}$ reflecting its rank at time $t$. The probability of a higher rank increases with a higher worth parameter value. Specifically, the probability mass function is given by
$$\mathrm{P}[R_t \mid w_t] = \prod_{r=1}^{N} \frac{\exp\left(w_{O_t(r)t}\right)}{\sum_{s=r}^{N} \exp\left(w_{O_t(s)t}\right)}. \tag{8}$$
In other words, a ranking is iteratively constructed by selecting the best DMU, followed by the second best, the third best, and so on. At each stage, the probability of selecting a particular DMU is equal to the exponential of its worth parameter divided by the sum of the exponentials of the worth parameters of all DMUs that have not been selected yet. The log-likelihood function is given by
$$\ell(w_t \mid R_t) = \sum_{n=1}^{N} w_{nt} - \sum_{r=1}^{N} \ln \left( \sum_{s=r}^{N} \exp\left(w_{O_t(s)t}\right) \right). \tag{9}$$
The score (i.e., the gradient of the log-likelihood function) is given by
$$\nabla_n(w_t \mid R_t) = \frac{\partial \ell(w_t \mid R_t)}{\partial w_{nt}} = 1 - \sum_{r=1}^{R_t(n)} \frac{\exp\left(w_{nt}\right)}{\sum_{s=r}^{N} \exp\left(w_{O_t(s)t}\right)}, \qquad n = 1, \ldots, N. \tag{10}$$
The Plackett-Luce distribution is based on Luce's choice axiom, which states that the probability of selecting one item over another from a set of items is not influenced by the presence or absence of other items in the set (see Luce, 1977). This property of choice is known as the independence of irrelevant alternatives. Clearly, this property is not met in the case of DEA, as addition or removal of DMUs from the set can influence efficiency scores and even the ranking of other DMUs. As in the case of many second-stage models, the proposed dynamic ranking model therefore does not conform to the complex data generating process of DEA efficiency scores and rankings. Nevertheless, the proposed model can be a useful tool due to its simplicity when applied with caution.
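The sequential construction described above translates directly into code. The following sketch evaluates the Plackett-Luce log-likelihood and its gradient for a single ranking, with the ranking supplied as an ordering of 0-based DMU indices from best to worst. The function names are hypothetical, and the code is an illustration rather than the implementation used in the gasmodel package.

```python
import math


def pl_loglik(w, ordering):
    """Log-likelihood of one ordering (0-based DMU indices, best first)."""
    total = 0.0
    for r in range(len(ordering)):
        # DMUs not yet selected at stage r compete for the next position.
        denom = sum(math.exp(w[m]) for m in ordering[r:])
        total += w[ordering[r]] - math.log(denom)
    return total


def pl_score(w, ordering):
    """Gradient of the log-likelihood with respect to each worth parameter."""
    score = [1.0] * len(w)
    for r in range(len(ordering)):
        denom = sum(math.exp(w[m]) for m in ordering[r:])
        # Every DMU still in the choice set at stage r loses its selection
        # probability at that stage.
        for m in ordering[r:]:
            score[m] -= math.exp(w[m]) / denom
    return score


# Three DMUs with equal worth; DMU 2 is ranked first, then DMU 0, then DMU 1.
w = [0.0, 0.0, 0.0]
ordering = [2, 0, 1]
loglik = pl_loglik(w, ordering)
score = pl_score(w, ordering)
```

With equal worth parameters, all $3! = 6$ rankings are equally likely, so the log-likelihood equals $-\ln 6$. The score is positive for the DMU ranked first, negative for the DMU ranked last, and sums to zero across DMUs.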

Regression and Dynamics
We let the worth parameters linearly depend on $K$ contextual variables and also include an autoregressive and score-driven component. For $n = 1, \ldots, N$, $t = 1, \ldots, T$, the worth parameters are then given by the recursion
$$w_{nt} = \omega_n + \sum_{k=1}^{K} \beta_k z_{nkt} + e_{nt}, \qquad e_{nt} = \varphi e_{n,t-1} + \alpha \nabla_n(w_{t-1} \mid R_{t-1}), \tag{11}$$
where $\omega_n$ are the individual effects for each DMU $n$, $\beta_k$ are the regression parameters for the contextual variables $z_{nkt}$, $\varphi$ is the autoregressive parameter, and $\alpha$ is the score parameter for the lagged score $\nabla_n(w_{t-1} \mid R_{t-1})$ given by (10). The model corresponds to panel regression with fixed effects and a dynamic error term. Note that the model is overparametrized, as the probability mass function (8) is invariant to the addition of a constant to all worth parameters. We therefore use the standardization
$$\sum_{n=1}^{N} \omega_n = 0. \tag{12}$$
Our specification differs from the model of Holý and Zouhar (2022) by introducing the separate $e_{nt}$ component. Our specification is inspired by the regression with ARMA errors, while the specification of Holý and Zouhar (2022) resembles the ARMAX model. In our specification, the contextual variables influence only the concurrent ranking, which is easier to interpret. Our model is also easier for numerical estimation, as $\omega_n$ and $\varphi$ are disconnected.
The $e_{nt}$ component captures dynamic effects by the autoregressive term and the lagged score. The model therefore belongs to the class of score-driven models, also known as generalized autoregressive score (GAS) models or dynamic conditional score (DCS) models, proposed by Creal et al. (2013) and Harvey (2013). The score can be interpreted as a measure of the fit of the Plackett-Luce model to the observed rankings. A positive score indicates that a DMU $n$ is ranked higher than what its worth parameter $w_{nt}$ suggests, while a negative score suggests that it is ranked lower. A score of zero indicates that the DMU is ranked as expected according to its worth parameter. Thus, the score can be used as a correction term for the worth parameter after the ranking is observed.
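The following sketch illustrates how the score acts as a correction term in the recursion. It assumes that the worth parameter is the sum of the individual effect, the regression term, and a dynamic component that is updated by the autoregressive parameter and the lagged score, and it initializes the dynamic component at zero, which is an assumption made here for illustration. The function names and the toy inputs are hypothetical.

```python
import math


def pl_score(w, ordering):
    """Plackett-Luce score for one ordering (0-based indices, best first)."""
    score = [1.0] * len(w)
    for r in range(len(ordering)):
        denom = sum(math.exp(w[m]) for m in ordering[r:])
        for m in ordering[r:]:
            score[m] -= math.exp(w[m]) / denom
    return score


def filter_worth(omega, beta, z, orderings, phi, alpha):
    """Run the worth recursion over time.

    z[t][n] is the list of K contextual variables of DMU n at time t.
    orderings[t] lists the DMUs at time t from best to worst.
    The dynamic component starts at zero (an illustrative assumption).
    """
    n_dmus = len(omega)
    e = [0.0] * n_dmus
    path = []
    for t, ordering in enumerate(orderings):
        w = [omega[n] + sum(b * x for b, x in zip(beta, z[t][n])) + e[n]
             for n in range(n_dmus)]
        path.append(w)
        # After the ranking is observed, the score corrects the next value.
        s = pl_score(w, ordering)
        e = [phi * e[n] + alpha * s[n] for n in range(n_dmus)]
    return path


# Hypothetical example: three DMUs, no contextual variables, two periods.
no_z = [[[], [], []], [[], [], []]]
path = filter_worth([0.0, 0.0, 0.0], [], no_z, [[2, 0, 1], [2, 0, 1]],
                    phi=0.5, alpha=1.0)
```

In the toy example, all DMUs start with equal worth. The DMU ranked first in the first period then has the highest worth in the second period, while the DMU ranked last has the lowest.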

Maximum Likelihood Estimation
The model is observation-driven and can be estimated by the maximum likelihood method. Let $\theta = (\omega_1, \ldots, \omega_{N-1}, \beta_1, \ldots, \beta_K, \varphi, \alpha)'$ denote the vector of the $N + K + 1$ parameters to be estimated. Note that $\omega_N$ is obtained from (12) as $\omega_N = -\sum_{n=1}^{N-1} \omega_n$. The maximum likelihood estimate $\hat{\theta}$ is then given by
$$\hat{\theta} \in \arg\max_{\theta} \sum_{t=1}^{T} \ell(w_t \mid R_t), \tag{13}$$
where the log-likelihood $\ell(w_t \mid R_t)$ is given by (9) and $w_t$ follows (11). The problem (13) can be numerically solved by any general-purpose algorithm for nonlinear optimization. Furthermore, the standard errors of the estimated parameters are computed using the empirical Hessian of the log-likelihood evaluated at $\hat{\theta}$.
In order for the log-likelihood to have a unique maximum, it is necessary that for any possible partition of DMUs into two non-empty subsets, there exists at least one DMU in the second subset that is ranked higher than at least one DMU in the first subset (see Hunter, 2004). This condition ensures that no DMU is always ranked first, which would result in an infinite worth parameter and violate the assumptions of maximum likelihood estimation.
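The partition condition can be verified before estimation. One way to do so, sketched below, is to build a directed graph with an edge from DMU $a$ to DMU $b$ whenever $a$ is ranked above $b$ in at least one period, and to check that the graph is strongly connected, which is equivalent to the condition holding for every partition. The function name and the example rankings are hypothetical.

```python
def satisfies_partition_condition(orderings):
    """Check the partition condition via strong connectivity.

    orderings is a list of rankings, each listing DMU indices 0..N-1
    from best to worst.
    """
    n_dmus = len(orderings[0])
    beats = [set() for _ in range(n_dmus)]
    loses = [set() for _ in range(n_dmus)]
    for ordering in orderings:
        for pos, winner in enumerate(ordering):
            for loser in ordering[pos + 1:]:
                beats[winner].add(loser)
                loses[loser].add(winner)

    def reachable(adjacency):
        # Depth-first search from DMU 0.
        seen, stack = {0}, [0]
        while stack:
            for nxt in adjacency[stack.pop()]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    # Strongly connected: every DMU is reachable from DMU 0 in the graph
    # and in the reversed graph.
    return (len(reachable(beats)) == n_dmus
            and len(reachable(loses)) == n_dmus)


always_first = satisfies_partition_condition([[0, 1, 2], [0, 2, 1]])
mixed = satisfies_partition_condition([[0, 1, 2], [2, 1, 0]])
```

In the first example, DMU 0 is ranked first in every period, so the condition fails. In the second example, every DMU is ranked both above and below the others at some point, so the condition holds.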

Empirical Study
Our empirical study aims to analyze research efficiency in the higher education sector by examining scientific publications on a country-level basis, with a particular focus on the EU countries between 2005 and 2020. Specifically, we seek to determine whether certain aspects of good governance have a positive impact on research efficiency.

Relevant Studies
Assessing the efficiency of research and development (R&D) is a widely studied topic in the data envelopment analysis (DEA) literature. In Table 1, we present a list of several relevant DEA papers and the key specifics of each study. We focus on the assessment of countries (and regions), although similar analyses can be performed at more detailed levels of institutions (see, e.g., Jablonsky, 2016) and projects (see, e.g., Lee et al., 2009). Typically, studies on R&D efficiency use financial resources and human resources as the two main inputs. In terms of outputs, some studies focus on variables related to scientific publications (such as Hung et al., 2009), some on patents (such as Cullmann et al., 2012), while the majority consider both types of R&D-related outcomes.

Input, Output, and Contextual Variables
As inputs, we use the following variables:
• R&D Expenditure refers to the gross domestic expenditure on R&D activities performed in the higher education sector. The unit is million purchasing power standards. Holý and Šafr (2018) emphasize the importance of accounting for purchasing power parity when adjusting prices to ensure meaningful comparisons between countries with varying purchasing power. This variable reflects the financial resources.
• Number of Researchers refers to the total number of researchers employed in the higher education sector. The unit is full-time equivalent. This variable reflects the human resources.
As outputs, we use the following variables:
• Number of Publications represents the number of articles, reviews, and conference papers published. This variable reflects the quantity of scientific research.
• Number of Citations represents the number of citations to the published articles, reviews, and conference papers. This variable reflects the quality of scientific research.
As contextual variables, we use the six Worldwide Governance Indicators (WGI), which Kaufmann et al. (2011) define in the following way:
• Voice and Accountability captures perceptions of the extent to which a country's citizens are able to participate in selecting their government, as well as freedom of expression, freedom of association, and a free media.
• Political Stability and Absence of Violence/Terrorism captures perceptions of the likelihood that the government will be destabilized or overthrown by unconstitutional or violent means, including politically-motivated violence and terrorism.
• Government Effectiveness captures perceptions of the quality of public services, the quality of the civil service and the degree of its independence from political pressures, the quality of policy formulation and implementation, and the credibility of the government's commitment to such policies.
• Regulatory Quality captures perceptions of the ability of the government to formulate and implement sound policies and regulations that permit and promote private sector development.
• Rule of Law captures perceptions of the extent to which agents have confidence in and abide by the rules of society, and in particular the quality of contract enforcement, property rights, the police, and the courts, as well as the likelihood of crime and violence.
• Control of Corruption captures perceptions of the extent to which public power is exercised for private gain, including both petty and grand forms of corruption, as well as "capture" of the state by elites and private interests.
Finally, we also include the following variable as a contextual variable:
• Gross Domestic Product is used to control for the economic development of a country. To filter out the trend, we use the percentage of EU total GDP per capita based on million purchasing power standards.
We therefore have I = 2 input variables, J = 2 output variables, and K = 7 contextual variables.
Similarly to Holý and Šafr (2018), we lag the input and contextual variables by one year, recognizing that there is typically a delay between the input variables and the corresponding output variables.

Data Sample
Our data sample contains all $N = 27$ countries of the EU. The outputs are taken from 2005 to 2020, while the inputs and contextual variables are taken with a one-year lag from 2004 to 2019. We therefore have $T = 16$ time periods to analyze. The source of the R&D expenditure, the number of researchers, and the GDP is Eurostat. There were 4 missing observations for the number of researchers of Greece in 2004, 2008, 2009, and 2010. We have interpolated these values using linear regression. The source of the number of documents and the number of citations is Scimago Journal & Country Rank. The source of the Worldwide Governance Indicators is the World Bank. Table 2 presents the five-number summary of all variables. Figure 1 depicts the correlation matrix. It reveals strong positive correlations among all input and output variables. The contextual variables form a separate block, in which all variables also demonstrate positive correlations, albeit to a lesser degree in some cases. The correlations between input/output variables and contextual variables are relatively small.

Suitability of DEA Model
In order for efficiency scores to be interpretable, several criteria need to be met. We have adopted the best practices in DEA as outlined by Dyson et al. (2001) and Cook et al. (2014). We begin by establishing that the process under evaluation is well-defined. Our focus is on the research output in the form of scientific publications. The two chosen output variables encompass both the quantity and quality of scientific publications. While quantity is naturally quantifiable, measuring quality can be achieved using several metrics such as the number of citations and the h-index. However, combining indices and volume measures can pose difficulties and we have therefore decided to use the number of citations for our analysis. The two primary resources for conducting research are funding and personnel, both of which are represented by the two input variables we have selected. All input and output variables are volume measures and are isotonic (i.e., increased input reduces efficiency, while increased output increases efficiency). With a total of 4 input and output variables and 27 DMUs, our DEA model possesses sufficient discriminatory power.
Next, we examine the homogeneity assumption. Our set of DMUs encompasses all EU countries as of February 2020, although it should be noted that EU membership changed during the period under observation. Specifically, Romania and Bulgaria joined in 2007, and Croatia became a member in July 2013, whereas the United Kingdom departed in January 2020. Nevertheless, EU countries should be considered homogeneous in terms of research due to the harmonized policies and frameworks.
Finally, we analyze the appropriate returns to scale. Note that EU countries exhibit considerable variation in size, with Germany being the most populous country at 83.17 million people and Malta being the least populous with a population of 0.51 million as of January 2020. Our focus is on the higher education sector, which is composed of (1) universities, colleges of technology, and other institutions providing formal tertiary education programmes, and (2) research institutes, centres, experimental stations and clinics that have their R&D activities under the direct control of, or administered by, tertiary education institutions (see OECD, 2015). The scientific output of a country can be seen as the sum of outputs from these individual institutions. As a result, we assume that country size does not have a significant impact on the relative scientific output and employ the constant returns to scale (CRS) assumption in our analysis.

Efficiency Scores
Table 3 reports the efficiency scores obtained from the universal DEA model (3) of Hladík (2019) for all countries and years. Figure 2 then displays the ranges of the annual rankings. Bulgaria consistently shows high levels of efficiency across most years, which can be primarily attributed to its extremely low R&D spending, both in absolute value and relative to the number of publications, citations, and even researchers. Romania is also found to be efficient in many years due to its relatively low R&D spending. Cyprus has an average of 3.10 publications per researcher, the highest among all countries, followed by Slovenia with 2.57. Moving to Western Europe, the Netherlands stands out as the country with the highest number of citations per researcher, with an average of 81.49. The final country that is ever found efficient in our sample is Luxembourg. Germany, as the largest country, dominates in absolute values of all inputs and outputs; its efficiency is, however, average. At the other end of the efficiency spectrum, we find Latvia with 0.66 publications and 9.09 citations per researcher on average, and Lithuania with 0.61 publications and 8.82 citations per researcher on average.
To assess how sensitive the rankings are to the composition of the set of DMUs, we conduct a simple experiment. We remove a single DMU from the set, compute DEA efficiency, and compare the resulting ranking with the original ranking based on the full set of DMUs. We repeat this process for all DMUs and time periods. Thus, we obtain a total of N · T = 432 DEA rankings. In total, 87 percent of the rankings remain unchanged after removing a single DMU. The correlation coefficient between the rankings based on the reduced sets and the full set is 0.9954. Naturally, the ranking is more likely to change when DMUs with higher efficiency scores are removed. In our case, removing countries such as Bulgaria, Cyprus, Luxembourg, the Netherlands, Romania, and Slovenia, all of which hold high ranks according to Figure 2, results in changes to the ranking across multiple time periods. In summary, we confirm the violation of the IIA assumption in our empirical study. Nevertheless, the extent of this violation is rather mild, as evidenced by the relatively high correlation between rankings.
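The leave-one-out experiment can be sketched in a few lines for a single time period. The following Python code is only an illustration under stated assumptions: it is not the code used in the study (which was run in R), it uses an input-oriented, constant-returns super-efficiency model solved with `scipy.optimize.linprog`, and the four DMUs with one input and two outputs are hypothetical toy data. The function names `super_efficiency`, `ranking`, and `leave_one_out` are introduced here for illustration.

```python
import numpy as np
from scipy.optimize import linprog


def super_efficiency(X, Y):
    """Input-oriented, constant-returns super-efficiency score of every DMU.

    X has shape (n, m) with inputs, Y has shape (n, s) with outputs.
    Each DMU is evaluated against a frontier built from all other DMUs.
    """
    n, m = X.shape
    s = Y.shape[1]
    scores = np.empty(n)
    for o in range(n):
        others = np.delete(np.arange(n), o)
        # Decision variables: [theta, lambda_1, ..., lambda_{n-1}], all >= 0.
        c = np.concatenate(([1.0], np.zeros(n - 1)))
        # Inputs:  sum_j lambda_j * x_j - theta * x_o <= 0
        A_in = np.hstack([-X[o][:, None], X[others].T])
        # Outputs: -sum_j lambda_j * y_j <= -y_o
        A_out = np.hstack([np.zeros((s, 1)), -Y[others].T])
        res = linprog(c, A_ub=np.vstack([A_in, A_out]),
                      b_ub=np.concatenate([np.zeros(m), -Y[o]]),
                      method="highs")
        if not res.success:
            raise RuntimeError(f"LP failed for DMU {o}: {res.message}")
        scores[o] = res.fun
    return scores


def ranking(scores):
    """Rank 1 is assigned to the highest score."""
    order = np.argsort(-scores)
    ranks = np.empty(len(scores), dtype=int)
    ranks[order] = np.arange(1, len(scores) + 1)
    return ranks


def leave_one_out(X, Y):
    """For each removed DMU, check whether the order of the others is kept."""
    n = X.shape[0]
    full_scores = super_efficiency(X, Y)
    unchanged = []
    for k in range(n):
        keep = np.delete(np.arange(n), k)
        reduced = ranking(super_efficiency(X[keep], Y[keep]))
        baseline = ranking(full_scores[keep])
        unchanged.append(bool(np.array_equal(reduced, baseline)))
    return full_scores, unchanged


# Hypothetical toy data: one input (equal to 1 for all DMUs), two outputs.
X = np.ones((4, 1))
Y = np.array([[4.0, 1.0], [1.0, 3.0], [2.0, 2.0], [1.0, 1.2]])
full_scores, unchanged = leave_one_out(X, Y)
share_unchanged = sum(unchanged) / len(unchanged)
```

In this toy example, removing the top-ranked DMU reorders two of the remaining DMUs, while removing any other DMU leaves the order intact, which mirrors the observation that rankings change mostly when highly ranked DMUs are removed.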

Long-Term Ranking
When conducting efficiency analysis using DEA across multiple time periods, it can be beneficial to report the long-term behavior. This could be done by simple aggregate statistics, such as modal or median ranks. It is, however, also a natural task for our dynamic ranking model. For this purpose, we estimate the model without any contextual variables, that is, as a purely stationary time series model. We can then rank DMUs according to the unconditional values of the worth parameters, which are simply equal to ω_n. This long-term or "ultimate" ranking is visualized in Figure 3.
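As a minimal sketch of this step, the Python code below orders DMUs by their unconditional worth and evaluates the Plackett-Luce probability of a complete ordering. It assumes the common parametrization in which the probability of picking an item is proportional to the exponential of its worth; the values in `omega` are hypothetical and are not estimates from the study.

```python
import math
from itertools import permutations


def long_term_ranking(omega):
    """Return DMU names ordered from best to worst by unconditional worth."""
    return sorted(omega, key=omega.get, reverse=True)


def plackett_luce_log_prob(ordering, worth):
    """Log-probability of a complete ordering (best first) under Plackett-Luce.

    Assumes choice probabilities proportional to exp(worth).
    """
    logp = 0.0
    for r, name in enumerate(ordering):
        remaining = ordering[r:]
        log_denominator = math.log(sum(math.exp(worth[k]) for k in remaining))
        logp += worth[name] - log_denominator
    return logp


# Hypothetical unconditional worth parameters for three DMUs.
omega = {"A": 0.8, "B": -0.3, "C": 0.1}
ultimate = long_term_ranking(omega)

# Probability of every possible ordering; the probabilities sum to one.
probs = {p: math.exp(plackett_luce_log_prob(list(p), omega))
         for p in permutations(omega)}
most_likely = max(probs, key=probs.get)
```

Under this parametrization, the ordering by decreasing worth is also the most probable single ranking, which is one reason to read it as the long-term ranking.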

Panel Regression and Ranking Model
We proceed to the second stage, where we examine the relation between the efficiency scores, or their associated rankings, and the contextual variables. For the efficiency scores, we employ a standard panel linear regression model with robust standard errors estimated by the White method. As the dependent variable, we use the efficiency scores obtained by the basic DEA model (1) of Charnes et al. (1978) (denoted as CCR), the super-efficiency model (2) of Andersen and Petersen (1993) (denoted as AP), and the universal DEA model (3) of Hladík (2019) (denoted as H). We also use the log transform of the AP efficiency scores, which is equal to the logit transform of the H efficiency scores (denoted as Log). Furthermore, we use the AP efficiency scores, or equivalently the H efficiency scores, to derive rankings of the DMUs, which serve as the dependent variable in our dynamic ranking model.
The results of the estimated models are reported in Table 4. All panel linear regression models exhibit consistent signs of coefficients. The only exception is the Rule of Law indicator, with a positive coefficient for the Log model but a negative one for the CCR, AP, and H models. All these coefficients are, however, very close to zero and statistically insignificant. In contrast, all panel linear regression models find the Voice and Accountability indicator to be statistically significant at the 0.05 level. Furthermore, the Government Effectiveness indicator is significant according to all panel regression models except AP. All the remaining contextual variables are found insignificant by all models.
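To make the regression step concrete, the following Python sketch computes a panel estimator with White standard errors using only NumPy. It rests on assumptions that the text does not settle: it uses the fixed-effects (within) estimator, the HC0 form of the White covariance without clustering or degrees-of-freedom correction, and simulated data. It is not the plm-based estimation used in the study.

```python
import numpy as np


def within_ols_white(y, X, entity):
    """Fixed-effects (within) estimator with White HC0 standard errors."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    entity = np.asarray(entity)
    y_w = y.copy()
    X_w = X.copy()
    # Within transformation: subtract entity-specific means.
    for e in np.unique(entity):
        mask = entity == e
        y_w[mask] -= y[mask].mean()
        X_w[mask] -= X[mask].mean(axis=0)
    XtX_inv = np.linalg.inv(X_w.T @ X_w)
    beta = XtX_inv @ X_w.T @ y_w
    resid = y_w - X_w @ beta
    # Sandwich estimator: (X'X)^-1 [sum_i e_i^2 x_i x_i'] (X'X)^-1.
    meat = (X_w * resid[:, None] ** 2).T @ X_w
    cov = XtX_inv @ meat @ XtX_inv
    return beta, np.sqrt(np.diag(cov))


# Simulated panel: 6 entities, 8 periods, one contextual variable.
rng = np.random.default_rng(0)
n_entities, n_periods = 6, 8
entity = np.repeat(np.arange(n_entities), n_periods)
x = rng.normal(size=(n_entities * n_periods, 1))
effects = rng.normal(size=n_entities)[entity]  # unobserved entity effects
y = effects + 0.5 * x[:, 0] + rng.normal(scale=0.1, size=len(entity))

beta, se = within_ols_white(y, x, entity)
```

The within transformation removes the entity effects, so `beta` estimates the slope of 0.5 used in the simulation, and `se` holds the heteroskedasticity-robust standard error.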
The dynamic ranking model confirms the positive and significant relation to the Voice and Accountability indicator, which is consistent with the results of all panel regression models. Regarding the Government Effectiveness indicator, however, the model agrees with AP and finds it to be insignificant. The coefficients of the Political Stability and GDP variables have signs opposite to those in the panel regression models, but remain insignificant. It is important to note that while the signs and significance of coefficients can be compared between the panel regression models and the dynamic ranking model, the estimated values cannot be directly compared due to differences in the model specifications. The coefficients φ and α, which control the dynamics, both have positive values, as expected. The estimated value of 0.86 for φ suggests that the process is stationary, with high persistence over time.
The inference of the ranking model is derived using the empirical Hessian (denoted as Hess. in Table 4). To ensure the robustness of our findings, we additionally employ the parametric bootstrap technique to compute standard errors and p-values (denoted as Boot. in Table 4). Our bootstrap procedure is based on 1,000 simulated samples. According to Table 4, the estimated standard errors across the two methods are quite similar. An exception is the Rule of Law variable, for which the bootstrapped standard error is noticeably lower. Despite this discrepancy, the coefficient for this variable remains statistically insignificant at typical significance levels. The p-values exhibit similar behavior, and the significance of the variables remains unchanged; only the Voice and Accountability variable achieves significance at a lower level. Collectively, these findings affirm the validity of inference based on the empirical Hessian within our finite sample.
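The logic of comparing Hessian-based and parametric bootstrap standard errors can be illustrated on a much simpler model. The Python sketch below deliberately swaps the dynamic ranking model for an exponential distribution with a single rate parameter, because its maximum likelihood estimator and Hessian are available in closed form; the data are simulated, and the function names are introduced here for illustration only.

```python
import random
import statistics


def fit_rate(sample):
    """Maximum likelihood estimate of the exponential rate."""
    return len(sample) / sum(sample)


def hessian_se(rate, n):
    """Standard error from the second derivative of the log-likelihood.

    l(rate) = n * log(rate) - rate * sum(x), so -l''(rate) = n / rate**2.
    """
    return rate / n ** 0.5


def bootstrap_se(rate, n, replications=1000, seed=1):
    """Parametric bootstrap: simulate from the fitted model and re-estimate."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(replications):
        simulated = [rng.expovariate(rate) for _ in range(n)]
        estimates.append(fit_rate(simulated))
    return statistics.stdev(estimates)


# Simulated data standing in for the observed sample.
data_rng = random.Random(0)
data = [data_rng.expovariate(2.0) for _ in range(200)]

rate_hat = fit_rate(data)
se_hess = hessian_se(rate_hat, len(data))
se_boot = bootstrap_se(rate_hat, len(data))
```

When the two standard errors are close, as they should be here for a sample of 200 observations, the asymptotic approximation behind the Hessian-based inference is supported in the given finite sample.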

Computing Environment
The empirical study was performed in R. The CCR and AP DEA efficiency scores were obtained using the dea() and sdea() functions from the Benchmarking package. The H efficiency scores were obtained from the AP DEA efficiency scores using transformation (4). The panel regressions were estimated using the plm() function from the plm package, with robust inference obtained using the coeftest() function from the lmtest package. The dynamic ranking model was estimated by the gas() and gas_bootstrap() functions from the gasmodel package. All these packages are available on CRAN.

Discussion of Results
The results of our analysis show that the Voice and Accountability indicator has a consistently positive and significant correlation with research efficiency across all models. This indicates that factors such as participation in selecting the government, freedom of expression, freedom of association, and freedom of media, which form the Voice and Accountability indicator, play a crucial role in enhancing research efficiency. The Government Effectiveness indicator also has a positive estimated effect on research efficiency, but its significance is not confirmed by all models. This suggests that while Government Effectiveness can enhance research efficiency, it may not be as crucial as the Voice and Accountability indicator, and the finding lacks robustness. The findings of this study can inform policy decisions and strategic planning to enhance research performance and impact, ultimately advancing knowledge and innovation in various fields.

Conclusion
The main contribution of the paper lies in proposing and showcasing the use of the dynamic ranking model of Holý and Zouhar (2022) in the second stage of DEA. The primary objective of the model is to serve as a complement to conventional second-stage models and provide a robustness check. In the empirical study evaluating research efficiency in the higher education sector, we find that the results of the second stage may slightly differ between panel regression models applied to the efficiency scores from various DEA models and the studied dynamic ranking model. All models agree that the Voice and Accountability variable, one of the indicators of good governance, has a positive and significant relation to research efficiency. Additionally, all models suggest that the Political Stability, Regulatory Quality, Rule of Law, and Control of Corruption indicators do not significantly correlate with research efficiency. However, the results on the Government Effectiveness indicator are mixed: while some models suggest that it has a significant positive effect, others, including the dynamic ranking model, consider this variable insignificant.
As the second stage of DEA is surrounded by controversies about the appropriateness of its use, while being widely applied by researchers at the same time, it is important to perform various robustness checks and not rely on results from a single model. In this sense, the dynamic ranking model can be quite useful, as it is fundamentally different from traditional panel models and is thus able to provide a different perspective. However, just like traditional second-stage models, it also suffers from the drawback of misspecifying the process generating the efficiency scores and rankings. It can also be used only for repeated rankings according to efficiency scores, such as annual assessments of efficiency, that do not include ties. While the use of the dynamic ranking model in the second stage of DEA may not be a perfect solution for all situations, it is a valuable addition to the DEA researcher's toolkit.
As an additional contribution, we show that the recently proposed universal DEA model of Hladík (2019), with attractive properties and interpretation, is closely related to the widely used super-efficiency DEA model of Andersen and Petersen (1993). Specifically, we demonstrate that the efficiency scores from one model can be derived from the other model using a simple transformation.
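The relation between the two scores can be checked numerically. The Python sketch below does not reproduce the paper's transformation (4); it uses the mapping implied by the statement that the log transform of the AP score equals the logit transform of the H score, taking logit in its usual sense, log(p / (1 - p)). The function names and the example scores are hypothetical.

```python
import math


def ap_to_h(ap):
    """Map an AP super-efficiency score (> 0) to an H score in (0, 1)."""
    return ap / (1.0 + ap)


def h_to_ap(h):
    """Inverse mapping from an H score in (0, 1) to an AP score."""
    return h / (1.0 - h)


def logit(p):
    """Usual logit transform of a value in (0, 1)."""
    return math.log(p / (1.0 - p))


# Hypothetical AP scores; values of at least 1 correspond to efficient DMUs.
ap_scores = [0.4, 0.9, 1.0, 2.5]
h_scores = [ap_to_h(a) for a in ap_scores]

# The log of AP and the logit of H coincide, and the mapping is invertible.
log_ap = [math.log(a) for a in ap_scores]
logit_h = [logit(h) for h in h_scores]
```

Because the mapping is strictly increasing, both scores order the DMUs identically, which is why rankings can be derived from either of them.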
Future research efforts should be directed towards expanding the dynamic ranking model in two ways. Firstly, the model should be able to incorporate ties, which may occur due to DEA models lacking super-efficiency. Secondly, the model should be able to capture more complex interdependencies between DMUs, which can perhaps be achieved, or at least approximated, by employing Thurstone order statistics models based on the multivariate normal or multivariate extreme value distributions.

Figure 1: The correlation matrix of the input, output, and contextual variables.

Figure 2: The ranges of annual rankings by research efficiency, together with the long-term ranking according to the dynamic ranking model.

Table 1: An overview of relevant studies.

Table 2: The minimum (Min), first quartile (Q1), median (Q2), third quartile (Q3), and maximum (Max) of the input, output, and contextual variables.
European Commission, such as the European Research Area (ERA) and the Horizon Europe program. These initiatives aim to promote collaboration and standardization among EU member states, facilitating the dissemination of research findings and enhancing the overall quality of scientific output.

Table 3: The efficiency scores obtained from the universal DEA model (3) of Hladík (2019).
As discussed in Sections 1 and 3.1, DEA rankings do not adhere to the independence of irrelevant alternatives (IIA) assumption of the Plackett-Luce distribution. This means, among other things, that if a DMU is removed from the set, the ranking of the remaining DMUs can change. The question is to what extent the IIA assumption is violated in real data.

Table 4: The estimated coefficients with standard errors for panel linear regressions and the dynamic ranking model.
*** p < 0.001; ** p < 0.01; * p < 0.05
model, the estimated values cannot be directly compared due to differences in the model specifications. The coefficients φ and α, which control the dynamics, both have positive values, as expected.