A robust DEA model for measuring the relative efficiency of Iranian high schools

Article history: Received October 2


Introduction
In the existing economic circumstances, all organizations are confronted with critical decisions, and the primary concern in many organizations is to work as efficiently as possible in order to keep overall costs at a minimum level. Financial figures are normally used to measure the relative performance of different financial organizations. However, there are many governmental organizations, such as educational systems, where financial data do not play an important role in measuring the success of the firm. For instance, consider two high schools that run with the same budget; we could compare the relative performance of these two firms using nonfinancial figures such as the number of students who enter universities. In such a case, we may have several input/output factors, and the relative efficiency can be measured using the popular DEA method. The main concern is to measure the input and output parameters carefully, since there is always uncertainty in both the input and output information. Methods for estimating efficiency are normally classified as parametric and non-parametric. Parametric methods assume a particular functional form, such as a cost or production function, whereas non-parametric methods do not require estimating such a function. Stochastic frontier analysis (SFA) and DEA are the most important parametric and non-parametric models, respectively, for estimating the efficiency of a set of decision-making units (DMUs).
DEA is a data-oriented method that uses linear programming for measuring and benchmarking the relative efficiency of different decision-making units (DMUs) with multiple inputs and outputs. In contrast, SFA is an alternative frontier-oriented approach that uses regression analysis to estimate a function. The ability to separate the inefficiency effect from the statistical noise caused by data errors gives parametric methods an advantage. Forsund et al. (1980) discussed the advantages and disadvantages of DEA and SFA. DEA has been widely applied in educational organizations for measuring efficiency; however, the majority of cases do not consider uncertainty in input/output data. For example, Levin (1974), Bessent and Bessent (1980), Bessent et al. (1982), Ludwin and Guthrie (1989), and Färe et al. (1989) used this method to measure the efficiency of U.S. schools, and Jesson et al. (1987) applied the DEA method to study the efficiency of school districts (LEAs) in the UK. Bonesrønning and Rattsø (1994) conducted an efficiency analysis of Norwegian high schools. SFA was applied by Barrow (1991) to estimate a stochastic cost frontier based on data from schools in England. Wyckoff and Lavigne (1991) and Cooper and Cohn (1997) estimated technical efficiency using school district data from New York and South Carolina, respectively. Grosskopf et al. (1997) applied the parametric approach to measure allocative and technical efficiency in Texas school districts. Levin (1974, 1976) is considered the earliest researcher to measure technical inefficiency in education production. He applied the Aigner and Chu (1968) parametric non-stochastic linear programming model to estimate the coefficients of the production frontier. In these studies, he found that parameter estimation by ordinary least squares (OLS) does not provide the correct estimated relationship between inputs and outputs for technically efficient schools and only determines an average relationship. Klitgaard and Hall (1975) used OLS techniques to conclude that the schools that produce remarkable achievement scores have smaller classes and more qualified teachers.
In the series of studies on technical efficiency in public schools using the DEA method, one of the earliest was by Charnes et al. (1978), who evaluated the relative efficiency of individual schools against a production frontier. Bessent and Bessent (1980) and Bessent et al. (1982) incorporated a nonparametric form of the production function and considered multiple outputs. They also determined the sources of inefficiency for an individual school to guide further improvements. Ray (1991) and McCarty and Yaisawarng (1993) considered controllable inputs in the first stage of the DEA model to measure technical efficiency. The environmental (i.e., non-controllable) inputs were then used as regressors in a second stage using OLS or a Tobit model, and the residuals were analyzed to evaluate the performance of each school district. In these studies, it is assumed that the production frontier of all firms is the same and deterministic, and that any deviation from that frontier is caused by differences in efficiency. The concept of the stochastic production frontier was introduced and developed by Aigner et al. (1977), Battese and Corra (1977), Meeusen and van den Broeck (1977), Lee and Tyler (1978), Pitt and Lee (1981), Jondrow et al. (1982), Kalirajan and Flinn (1983), Bagi and Huang (1983), Schmidt and Sickles (1984), Waldman (1984) and Battese and Coelli (1988).
In practice, there are many cases in which we face perturbations in the input/output data, while the original DEA model assumes that inputs and outputs are measured with full accuracy. Noise or perturbation in the data affects the estimation of the efficient frontier, which in turn influences the estimated efficiency scores of the high schools. The DEA approach is therefore open to criticism in that a small perturbation can cause a big change in the ranking.
In a survey study, through solving some benchmark problems, Ben-Tal and Nemirovski (2000) showed that a small perturbation in the data could produce an infeasible solution; the generated ranking could therefore be unreliable. This problem is more serious when the efficiency of a particular DMU is close to that of another. Recent studies on robust optimization provide an opportunity to develop robust DEA models that can handle problems caused by perturbation and lead to reliable rankings. Robust optimization was developed by Ben-Tal and Nemirovski (1998, 1999, 2000), Bertsimas and Sim (2003, 2004, 2006) and Bertsimas et al. (2004). The technique generally refers to modeling optimization problems with uncertain data so as to obtain a solution that is good for all, or most, possible realizations of the uncertain parameters. Soyster (1973) investigated a very simple approach to robust optimization in which the column vectors of the constraint matrix were assumed to belong to convex uncertainty sets. Falk (1976) followed this idea with further research into inexact linear programs. Ben-Tal and Nemirovski (1998, 1999, 2000) and El-Ghaoui and Lebret (1997) proposed a new model for uncertain data based on ellipsoidal uncertainty sets. More recently, Bertsimas and Sim (2003, 2004, 2006) and Bertsimas et al. (2004) developed a robust optimization approach based on polyhedral uncertainty sets; this method is capable of preserving the class of problems under analysis. Sadjadi and Omrani (2008) introduced a DEA model based on a robust optimization approach.
In this paper, we propose a general method for efficiency estimation with uncertain data, with a case study of Iranian high schools. Our proposed robust DEA model is based on the Ben-Tal (BN) and Bertsimas (BR) approaches. In the educational system, we face cases where it is necessary to consider perturbation; we therefore apply the robust optimization technique to overcome the uncertainty in the data. The method is applied to Iranian high schools and the results are compared with other existing methods. This paper is organized as follows. We first present the DEA model in Section 2. In Section 3, robust DEA with uncertain data is discussed, and the SFA method is presented in Section 4. The proposed DEA and SFA methods are solved using actual data to show their validity in Section 5, which also presents a comparison between the Ben-Tal, Bertsimas and SFA approaches. Finally, the conclusions in Section 6 summarize the contribution of the paper.

Data envelopment analysis
Various methods have been developed for estimating the efficiency scores of different decision-making units (DMUs), and they are generally classified as deterministic or stochastic. Deterministic methods ignore errors in the data, whereas statistical methods include an error term representing statistical noise. These methods can be further organized as parametric or non-parametric. In contrast to non-parametric methods, parametric methods require the estimation of a cost or production frontier. Corrected ordinary least squares (COLS) and stochastic frontier analysis (SFA) are examples of parametric models, while data envelopment analysis (DEA) and principal component analysis (PCA) are considered non-parametric models. Furthermore, COLS, DEA and PCA are normally considered deterministic, and SFA is considered stochastic.
The COLS method is based on regression analysis. The OLS technique is used to estimate the regression equation, which is then shifted to the efficient frontier by adding the value of the largest negative estimated error to the estimated intercept (for a cost function). The method ignores stochastic errors and relies heavily on the position of the single most efficient firm; it assumes that all deviations from the frontier are accounted for by inefficiency (Coelli and Perelman, 1999). SFA, another parametric method, is used to estimate the efficient frontier and the efficiency scores. Estimation of efficiency scores in SFA is similar to COLS, but in contrast to COLS, SFA treats part of the error term as statistical noise. The basic idea behind the stochastic frontier model is to decompose the error term into two parts: the first captures the effects of inefficiency and the second describes the statistical noise. PCA, a multivariate statistical method, is commonly used to reduce the number of indicators under study and, consequently, to simplify the ranking and analysis of the DMUs; it has been applied as an alternative to DEA in several studies (Zhu, 1998). DEA is a linear programming-based methodology that has proven to be a useful tool in efficiency measurement and was originally introduced by Charnes et al. (1978). DEA can be applied under either the constant returns to scale (CRS) or the variable returns to scale (VRS) assumption. The CRS hypothesis assumes that DMUs can flexibly adjust their sizes to the one optimal firm size. By contrast, the VRS approach is less restrictive, since it only compares the productivity of DMUs of similar size. The VRS approach therefore produces technical efficiency scores that are greater than or equal to those obtained under CRS and is considered the more flexible frontier. When both approaches generate similar results, it can be concluded that returns to scale do not play an important role in the process. There are also two types of orientation in the DEA approach: input oriented and output oriented. In multiplier form, the input-oriented model maximizes the weighted outputs given the level of inputs, while the output-oriented model minimizes the weighted inputs given the level of outputs (Cooper and Cohn, 1997).
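The COLS intercept shift described above can be sketched numerically. The data below are hypothetical (a single log input and log output for six schools), and the shift shown is for a production frontier, so the intercept moves up by the largest residual; for a cost frontier, the shift would use the largest negative residual, as in the text.

```python
import numpy as np

# Hypothetical data: log input (x) and log output (y) for six schools.
x = np.array([1.0, 1.5, 2.0, 2.5, 3.0, 3.5])
y = np.array([0.8, 1.6, 1.9, 2.6, 2.7, 3.4])

# Step 1: ordinary least squares fit, y = a + b*x + e.
b, a = np.polyfit(x, y, 1)
residuals = y - (a + b * x)

# Step 2 (COLS): shift the intercept up by the largest residual so the
# frontier envelops all observations from above (production frontier).
a_cols = a + residuals.max()

# Technical efficiency of each school: in logs, the gap to the frontier.
efficiency = np.exp(residuals - residuals.max())
```

By construction, the school defining the shift sits on the frontier with efficiency one, and every other school scores strictly below one, which is exactly the "single most efficient firm" dependence the text criticizes.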
The original fractional DEA model (1) estimates the relative efficiencies of n DMUs, each with m inputs and s outputs, denoted by $x_{ij}$ and $y_{rj}$ respectively, where the ratio of the weighted sum of outputs to the weighted sum of inputs is maximized:

$$\max\; e_0 = \frac{\sum_{r=1}^{s} u_r y_{r0}}{\sum_{i=1}^{m} v_i x_{i0}} \quad \text{subject to} \quad \frac{\sum_{r=1}^{s} u_r y_{rj}}{\sum_{i=1}^{m} v_i x_{ij}} \le 1,\; j = 1,\dots,n; \qquad u_r, v_i \ge 0. \tag{1}$$

Here $e_0$ is the efficiency of the DMU under consideration and $u_r$ and $v_i$ are the weight factors. Model (1) is a non-linear fractional program, which Charnes et al. (1978) converted into the following LP model:

$$\max\; W = \sum_{r=1}^{s} u_r y_{r0} \quad \text{subject to} \quad \sum_{i=1}^{m} v_i x_{i0} = 1, \quad \sum_{r=1}^{s} u_r y_{rj} - \sum_{i=1}^{m} v_i x_{ij} \le 0,\; j = 1,\dots,n; \qquad u_r, v_i \ge 0. \tag{2}$$

In this model, $y$ and $x$ denote the output and input data, and the indices $r$, $i$ and $j$ range over the outputs, inputs and DMUs, respectively.
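The multiplier LP above can be solved with any LP solver. The sketch below uses SciPy's `linprog` on a hypothetical single-input, single-output data set; the function name `ccr_efficiency` and the data are illustrative, not taken from the paper.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical data: rows = DMUs, single input and single output for clarity.
X = np.array([[2.0], [4.0], [3.0]])   # inputs  x_ij
Y = np.array([[2.0], [3.0], [3.0]])   # outputs y_rj

def ccr_efficiency(j0, X, Y):
    """Input-oriented CCR multiplier model for DMU j0:
    max u'y0  s.t.  v'x0 = 1,  u'y_j - v'x_j <= 0 for all j,  u, v >= 0."""
    n, m = X.shape
    s = Y.shape[1]
    # Decision vector: [u_1..u_s, v_1..v_m]; linprog minimizes, so negate.
    c = np.concatenate([-Y[j0], np.zeros(m)])
    A_ub = np.hstack([Y, -X])          # u'y_j - v'x_j <= 0
    b_ub = np.zeros(n)
    A_eq = np.concatenate([np.zeros(s), X[j0]]).reshape(1, -1)
    b_eq = np.array([1.0])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * (s + m), method="highs")
    return -res.fun

scores = [ccr_efficiency(j, X, Y) for j in range(len(X))]
```

With these toy numbers, the first and third DMUs have the best output/input ratio and score one, while the second is dominated and scores below one, matching the CRS interpretation in the text.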

Robust optimization
Robust optimization has emerged as a leading methodology for addressing optimization problems under uncertainty (Bertsimas & Sim, 2003). In classical modeling, a full probabilistic characterization of the uncertainty is assumed; in many other models, however, the uncertainty is ignored and a representative nominal value is used instead. Stochastic programming (SP) is the classical approach to handling uncertainty. More recently, robust optimization has been developed as an alternative to sensitivity analysis and SP.
In this paper, an alternative to the bootstrapping techniques used in DEA by Simar and Wilson (1998) is proposed; both simulate the effect of noise in the data on efficiency evaluation. Simar and Wilson (1998, 2000) carefully studied the statistical properties of DEA models and developed a bootstrap algorithm to examine the statistical properties of the efficiency scores produced by DEA. The bootstrapping technique first generates efficiency scores with the original DEA model, and the algorithm proposed by Simar and Wilson (1998) then obtains the standard errors of the DEA estimators. Applying this algorithm involves some difficulties: one is finding an appropriate value of the smoothing parameter, and choosing the number of iterations is another. The approach based on robust optimization is preferred here for its applicability: the percentage of perturbation in the data is specified first, and the robust efficiency estimate is then obtained. To understand the robust structure developed by Bertsimas and Sim (2004), consider the LP problem $\min c'x$ subject to $Ax \le b$. The uncertainty is assumed to affect the constraint matrix $A$ and to be described by a polyhedron. To model uncertainty in the data, a particular row $i$ of the matrix $A$ is considered, where $J_i$ represents the set of coefficients in row $i$ that are subject to uncertainty. Each entry $\tilde{a}_{ij}$ is then modeled as a symmetric and bounded random variable (see Ben-Tal and Nemirovski, 2000) centered at the nominal value $a_{ij}$, with $\hat{a}_{ij}$ denoting the precision of the estimate.

Robust DEA based on Ben-Tal (BN) approach
Although Soyster (1973) considered the highest protection, his approach is also the most conservative in practice, in the sense that the robust solution has an objective function value much worse than that of the solution of the nominal linear optimization problem (Bertsimas and Sim, 2004). In the robust counterpart of Ben-Tal and Nemirovski (2000), the $z_{ij}$ represent auxiliary decision variables and $\varepsilon$, as mentioned above, is the percentage of perturbation (for example, $\varepsilon = 0.01$).
Considering the data uncertainty model $U$, the probability that the $i$th constraint is violated is at most $\exp(-\Omega^2/2)$. Here $\kappa$ denotes the reliability level, and by changing $\Omega$ the reliability level is kept under control.
Depending on the features of the problem, uncertainty may be considered in different parts of the inputs and outputs. By applying the idea of robust optimization to DEA, we immunize the results against infeasibility when confronted with uncertain data; the robust DEA model based on the Ben-Tal approach is proposed accordingly. The robust formulation of Ben-Tal provides probabilistic guarantees on the feasibility of the robust solution when the uncertain coefficients have a natural probability distribution. On the other hand, applying this approach turns linear programming models into nonlinear programming models, which are more complicated to solve. In addition, the natural exclusion of discrete optimization models, which are generally LP-based, is another disadvantage of this approach.
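The flavor of the Ben-Tal protection can be illustrated with a Monte-Carlo sketch on a single uncertain constraint. The coefficients below are hypothetical; the ellipsoidal protection term $a'x + \Omega\|\hat{a}\circ x\|_2$ is the standard Ben-Tal/Nemirovski counterpart of one uncertain row, and the bound $\exp(-\Omega^2/2)$ holds for symmetric bounded perturbations.

```python
import numpy as np

rng = np.random.default_rng(1)

# One uncertain constraint a~'x <= b with a~_j = a_j + a_hat_j * eta_j and
# eta_j symmetric in [-1, 1]. The Ben-Tal/Nemirovski robust counterpart adds
# an ellipsoidal protection term: a'x + Omega * ||a_hat * x||_2 <= b.
a = np.array([1.0, 2.0, 1.5])          # nominal coefficients (hypothetical)
a_hat = 0.01 * np.abs(a)               # 1% perturbation, i.e. eps = 0.01
x = np.array([3.0, 1.0, 2.0])          # a fixed candidate solution
Omega = 1.0

protected = a @ x + Omega * np.linalg.norm(a_hat * x)

# Monte-Carlo check: if b is set to the protected value, the probability of
# violating the true constraint stays below exp(-Omega^2 / 2); raising Omega
# tightens the guarantee at the cost of conservatism.
eta = rng.uniform(-1.0, 1.0, size=(200_000, 3))
lhs = (a + a_hat * eta) @ x
violation_rate = np.mean(lhs > protected)
```

The quadratic norm in the protection term is what turns the robust counterpart of an LP into a second-order cone (nonlinear) problem, which is the drawback noted above.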

Robust DEA based on Bertsimas (BR) approach
Bertsimas and Sim (2003, 2004, 2006) and Bertsimas et al. (2004) proposed another approach to robust linear optimization that provides full control over the degree of conservatism and retains the advantages of the linear framework of Soyster. Let $J_i$ represent the set of indices of uncertain data in the $i$th constraint and $\tilde{a}_i$ be the $i$th row of $A'$, so that model (4) can be reformulated as $\min c'x$ subject to the corresponding robust constraints. They defined the scaled deviation from the nominal value $a_{ij}$ as $\eta_{ij} = (\tilde{a}_{ij} - a_{ij})/\hat{a}_{ij}$. In this approach, $\Gamma_i$ is not necessarily integer and, for every constraint $i$, takes a value in the interval $[0, |J_i|]$. This parameter adjusts the robustness of the proposed method against the level of conservatism of the solution; $\Gamma_i$ is called the budget of uncertainty of constraint $i$.
• When $\Gamma_i = 0$, there is no protection against uncertainty.
• When $\Gamma_i = |J_i|$, the $i$th constraint of the problem is protected against all possible uncertainty.
• When $0 < \Gamma_i < |J_i|$, the decision maker considers a trade-off between the protection level of the constraint and the degree of conservatism of the solution.

With the set $J_i$ defined in this way, problem (9) can be formulated as $\min c'x$ subject to constraints carrying the protection function

$$\beta_i(x, \Gamma_i) = \max_{S \subseteq J_i,\ |S| \le \Gamma_i} \sum_{j \in S} \hat{a}_{ij}\,|x_j|,$$

which, for a given $x$, is equivalent to a linear program. The dual of (15) is expressed as follows (Bertsimas and Sim, 2004):

$$\min\ \Gamma_i p_i + \sum_{j \in J_i} q_{ij} \quad \text{subject to} \quad p_i + q_{ij} \ge \hat{a}_{ij}\,|x_j|, \quad p_i \ge 0,\ q_{ij} \ge 0,$$

where $p_i$ and $q_{ij}$ are the corresponding dual variables. When the changes obtained from Eq. (16) are applied to Eq. (14), the robust formulation based on Bertsimas and Sim (2003, 2004, 2006) and Bertsimas et al. (2004) follows. Since the DEA model (3) is a linear program and the uncertainty here lies in the outputs, the robust Bertsimas (BR) DEA model takes the form $\max W$ subject to the robust counterparts of the output constraints, each carrying its protection term $\Gamma_j p_j + \sum_r q_{jr}$ together with the conditions $p_j + q_{jr} \ge \hat{y}_{rj} u_r$. The robust counterparts proposed by Bertsimas et al. (2004) are linear optimization problems; in other words, this approach preserves the class of the original problem, e.g., the robust counterpart of a linear programming problem such as DEA is again a linear programming problem, as in model (18).
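The equivalence between the protection function of a constraint and its dual LP, which is the step that keeps the robust counterpart linear, can be checked numerically. The deviation bounds below are hypothetical, and the budget is taken integer for simplicity.

```python
import numpy as np
from scipy.optimize import linprog

# Protection function beta(x, Gamma) = the sum of the Gamma largest
# deviations a_hat_j * |x_j| (integer Gamma for simplicity).
a_hat = np.array([0.5, 1.0, 0.2, 0.8])   # hypothetical deviation bounds
x = np.array([2.0, 1.0, 3.0, 1.0])
Gamma = 2

dev = a_hat * np.abs(x)
beta_primal = np.sort(dev)[::-1][:Gamma].sum()   # pick the Gamma largest terms

# Dual LP (Bertsimas & Sim, 2004): min Gamma*p + sum(q_j)
#   s.t.  p + q_j >= dev_j,  p, q >= 0.
c = np.concatenate([[Gamma], np.ones(len(dev))])
A_ub = np.hstack([-np.ones((len(dev), 1)), -np.eye(len(dev))])
b_ub = -dev
res = linprog(c, A_ub=A_ub, b_ub=b_ub,
              bounds=[(0, None)] * (1 + len(dev)), method="highs")
```

By LP strong duality, `res.fun` coincides with `beta_primal`, which is why the protection term can be embedded directly into the constraints of model (18) without leaving the class of linear programs.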

Stochastic frontier analysis (SFA)
SFA is a common statistical method in efficiency analysis that separates a stochastic error and an inefficiency term using the residuals obtained from an estimated production frontier. The method is based on regression analysis and is an alternative approach for measuring the relative efficiency of a set of DMUs. Following Färe et al. (1989) and Coelli and Perelman (1999), the distance function can be calculated as shown in Eq. (19), where $M$ and $K$ represent the numbers of inputs and outputs and $i$ denotes the $i$th firm in the sample (Kumbhakar et al., 1991). Although the two components of the residual could follow a number of different distributions, a common assumption in the estimation procedure is that $v_i$ is normally distributed, $v_i \sim N(0, \sigma_v^2)$, while $u_i$ is often represented by a half-normal distribution, $u_i \sim |N(0, \sigma_u^2)|$. Here, $\gamma = \sigma_u^2/(\sigma_u^2 + \sigma_v^2)$ shows the share of the error term attributable to inefficiency and must lie between 0 and 1 (Kumbhakar & Lovell, 1991). To examine whether the application of SFA is necessary, a hypothesis test $H_0: \gamma = 0$ is defined using the likelihood-ratio statistic $\lambda$ (Kumbhakar & Lovell, 1991). When the null hypothesis holds, the test statistic has a chi-square distribution with degrees of freedom equal to the number of parameters involved in the null hypothesis.
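A quick simulation illustrates the composed SFA error term and the role of $\gamma$; the variances below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Composed SFA error: eps = v - u, with v ~ N(0, sigma_v^2) the statistical
# noise and u ~ |N(0, sigma_u^2)| the half-normal inefficiency term.
sigma_v, sigma_u = 0.2, 0.4
v = rng.normal(0.0, sigma_v, 100_000)
u = np.abs(rng.normal(0.0, sigma_u, 100_000))
eps = v - u

# gamma = sigma_u^2 / (sigma_u^2 + sigma_v^2): the share of total variation
# attributed to inefficiency; it must lie strictly between 0 and 1.
gamma = sigma_u**2 / (sigma_u**2 + sigma_v**2)

# E[u] for a half-normal is sigma_u * sqrt(2/pi), so the composed error has
# a negative mean, pulling observed output below the frontier on average.
mean_u_theory = sigma_u * np.sqrt(2.0 / np.pi)
```

With these values, $\gamma = 0.8$, i.e., most of the residual variation would be interpreted as inefficiency rather than noise; when $\gamma$ is near zero, the likelihood-ratio test above would not reject $H_0$ and plain OLS would suffice.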

Case study
We implement the proposed robust DEA model using actual data from a sample of Iranian high schools. Our data series involves annual data on 35 high schools observed in 2009, retrieved from the Department of Education of Tehran, region one. Table 1 summarizes the input and output parameters used for the case study presented in the next section. Educational institutions differ in various respects, for example in being private or public; therefore they most often differ both in the inputs they utilize and in their outputs, which poses the challenge of evaluating their performance in a multi-dimensional setting.
It is important to determine the input and output variables of the proposed model for evaluating the performance of these high schools. Various school-related variables, such as class size, teacher experience, teacher qualification and the student/teacher ratio, as well as environmental factors, can be considered as inputs, and academic and non-academic achievements as outputs. School- and home-related variables are the inputs and test scores the outputs in the studies by Bessent and Bessent (1980) and Bessent et al. (1982). In this study, we consider the results of the university entrance examination and the average score of the final exam as the outputs of the learning activity. Educational achievements are obtained using human resources and materials, and some factors not directly related to human resources are difficult to measure quantitatively. The uncertain inputs used in this study comprise the number of students enrolled in the school, the number of teachers, the number of administrative or supporting staff, and the number of classes.

DEA result
We have used the information of 35 high schools in Tehran, and an efficiency score is estimated for each high school in 2009 in order to illustrate the performance of the DEA-CRS model. The proposed DEA model is first solved without considering uncertainty in the data. The results, shown in the second column of Table 2, indicate that high schools with an efficiency score equal to one form the efficient production frontier; in other words, in terms of technical efficiency these high schools are a reference set for the others. The efficiency ratings calculated for the other high schools are below one and range between a minimum of 0.517 and 0.982. The mean overall technical efficiency score of the high schools is 0.9351. To apply the robust DEA based on the Ben-Tal approach, the perturbation $\varepsilon$ in model (7) is set to 0.01, 0.05 and 0.10, respectively. As the results of model (7) in Table 2 demonstrate, when the perturbation is 0.05, for example, the efficiency ranges from a minimum of 0.509 to 0.986. Applying the model for different perturbations shows that, as the perturbation increases from 0.01 to 0.1, the mean efficiency decreases from 0.930 to 0.905; the results are shown in Fig. 1. We now implement the robust DEA model based on the Bertsimas (BR) approach. For the $i$th constraint to be violated with probability at most $\varepsilon_i$, when each $\tilde{a}_{ij}$ has a symmetric distribution centered at the nominal value $a_{ij}$, it is sufficient to choose $\Gamma_i$ at least equal to Eq. (25):

$$\Gamma_i \ge 1 + \Phi^{-1}(1 - \varepsilon_i)\sqrt{n},$$

where $\Phi$ is the cumulative distribution function of the standard Gaussian (normal) variable and $n$ is the number of sources of uncertainty in each constraint. It is necessary to ensure full protection for constraints with few uncertain coefficients, which is then essentially equivalent to Soyster's method (Bertsimas & Sim, 2004). In this study, $\Gamma$ is set to 3 for all constraints, and the perturbations $\varepsilon_i$ are set to 0.01, 0.05 and 0.10, respectively. As with the approach of Ben-Tal and Nemirovski (2000), the efficiency scores of the robust DEA based on the Bertsimas (BR) approach decrease as the perturbation increases; however, the two approaches differ in the rate of decrease, the latter having the higher rate. The results, shown in Table 2 and Fig. 2, indicate that increasing the perturbation increases the difference between the results of the two approaches. In this case, as the perturbation increases from 0.01 to 0.1, the mean efficiency score decreases from 0.930 to 0.907.
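The budget formula of Eq. (25) is easy to evaluate. The sketch below (assuming the bound $\Gamma_i \ge 1 + \Phi^{-1}(1 - \varepsilon)\sqrt{n}$ from Bertsimas and Sim, 2004) computes the budgets implied by the three perturbation levels used in the study for a constraint with four uncertain coefficients, matching the four uncertain inputs listed earlier.

```python
from math import sqrt
from scipy.stats import norm

def gamma_budget(n_uncertain, eps):
    """Smallest budget Gamma_i guaranteeing that the ith constraint is
    violated with probability at most eps (Bertsimas & Sim, 2004):
    Gamma_i >= 1 + Phi^{-1}(1 - eps) * sqrt(n)."""
    return 1.0 + norm.ppf(1.0 - eps) * sqrt(n_uncertain)

# Budgets for a constraint with four uncertain coefficients at the three
# perturbation levels used in the study.
budgets = {eps: gamma_budget(4, eps) for eps in (0.01, 0.05, 0.10)}
```

Note that a smaller tolerated violation probability requires a larger budget; a flat budget such as $\Gamma = 3$ trades some of this guarantee for a less conservative solution.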

SFA results
To estimate efficiency with the SFA method, a translog distance function is applied to assess the SFA parameters. The results of implementing the SFA method on a cross-section of 35 high schools are obtained using FRONTIER version 4.1 (Coelli, 1994). In this study, we present the common truncated normal model, since the truncated normal and half-normal specifications for $u_i$ yield similar results. As Table 2 shows, efficiency ranges from 0.652 to 0.972, and the mean overall technical efficiency score of the high schools is 0.83.

Comparison between Bertsimas (BR) And Ben-Tal (BN) approaches
We consider the following indicators:
• preserving the class of problems under analysis;
• the number of constraints in the proposed approach;
• the number of variables in the proposed approach.
As mentioned in the previous sections, the Bertsimas and Sim approach preserves the class of the problem; in other words, the robust counterpart of a linear programming problem remains a linear programming problem, whereas the Ben-Tal approach changes a linear problem into a nonlinear one. It follows that, as the number of variables and constraints in the original model increases, the Bertsimas approach yields better results. To examine the number of constraints and variables of the two approaches, we assume that there are k coefficients of the nominal matrix A that are subject to uncertainty (Bertsimas & Sim, 2004). In other words, the first approach has more variables, and when k > n, the second approach has fewer constraints.

Comparison between SFA and robust DEA approaches
We apply the Pearson correlation test to compare the results of SFA with those of the robust DEA models. As observed in Table 3, the correlation coefficients among all results are significant at the 0.01 level; both robust approaches are applied with $\varepsilon = 0.05$. Fig. 3 shows the comparison between the two robust DEA approaches and SFA. It is clear that the Bertsimas (BR) approach performs better than the others, for two reasons:
• the implementation of the Bertsimas (BR) approach is the easiest;
• because it relies on a linear programming model, it is the most accessible method.
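The Pearson test used above can be reproduced with SciPy; the efficiency scores below are illustrative placeholders, not the values of Table 2.

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical efficiency scores for eight schools under two methods
# (illustrative values, not the paper's data).
robust_br = np.array([1.00, 0.92, 0.85, 1.00, 0.77, 0.95, 0.88, 0.70])
sfa       = np.array([0.97, 0.90, 0.80, 0.95, 0.75, 0.93, 0.84, 0.68])

r, p_value = pearsonr(robust_br, sfa)
# A coefficient near 1 with a small p-value indicates that the two methods
# rank the schools consistently.
```

A high, significant correlation between the robust DEA and SFA scores, as reported in Table 3, supports using the simpler linear (BR) model in place of the stochastic frontier estimation.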

Conclusion
We have proposed a new robust DEA model that considers uncertainty in the output parameters. The method can handle uncertainty for all parameters whose uncertain data have a symmetric unknown distribution. The presented method is based on two recent approaches: we have applied robust DEA based on Ben-Tal and Nemirovski (2000) and on Bertsimas et al. (2004), together with an econometric method, SFA, in order to compare the results and evaluate the methods. These methods were applied to data gathered from Iranian high schools, and the results indicate that the robust DEA approaches are the more appropriate methods for efficiency estimation.
Ben-Tal and Nemirovski (2000) assumed that the true values $\tilde{a}_{ij}$ of the uncertain coefficients in the $i$th inequality are obtained from the nominal values $a_{ij}$ by random perturbation, and proposed a corresponding robust problem in which $\tilde{a}_{ij}$ and $a_{ij}$ represent the uncertain and nominal values, respectively, and $\hat{a}_{ij}$ denotes the precision of the estimate. The perturbation $\eta_{ij}$ has an unknown but symmetric distribution taking values in $[-1, 1]$, so that the aggregated scaled deviation for constraint $i$ can take any value in the corresponding interval. In the stochastic frontier equation, $u_i$ and $v_i$ can be considered the asymmetric non-negative inefficiency term and the symmetric stochastic noise, respectively.

Fig. 3. The results from different stochastic approaches.

The translog form is generally applied for the distance function, which is estimated in the multi-output, multi-input setting.

Table 1
Summary statistics for input and output variables

Table 3
The k coefficients of the m × n nominal matrix A are subject to uncertainty. Given that the original nominal problem has n variables and m constraints (not counting the bound constraints), the first approach is a second-order cone problem with additional variables and constraints.