Some Recent Developments in Efficiency Measurement in Stochastic Frontier Models

This paper addresses some recent developments in efficiency measurement using stochastic frontier (SF) models in selected areas. The following three issues are discussed in detail. First, estimation of SF models with input-oriented technical efficiency. Second, estimation of latent class models to address technological heterogeneity as well as heterogeneity in economic behavior. Finally, estimation of SF models using the local maximum likelihood method. Estimation of some of these models was, in the past, considered too difficult. We focus on the advances that have been made in recent years to estimate some of these so-called difficult models, and we complement these with developments in other areas as well.


Introduction
In this paper we focus on three issues. First, we discuss issues, mostly econometric, related to input-oriented (IO) and output-oriented (OO) measures of technical inefficiency, and consider the estimation of production functions with IO technical inefficiency. We discuss implications of the IO and OO measures from both the primal and dual perspectives. Second, the latent class (finite mixture) modeling approach is extended to accommodate behavioral heterogeneity. Specifically, we consider profit- (revenue-) maximizing and cost-minimizing behaviors with technical inefficiency. In our mixing/latent class model, we first consider a system approach in which some producers maximize profit while others simply minimize cost, and then a distance function approach that mixes the input and output distance functions, in which it is assumed, at least implicitly, that some producers maximize revenue while others minimize cost. In the distance function approach the behavioral assumptions are not explicitly taken into account. The prior probability in favor of profit- (revenue-) maximizing behavior is assumed to depend on some exogenous variables. Third, we consider stochastic frontier (SF) models that are estimated by the local maximum likelihood (LML) method.

The IO and OO Debate
The technology, with or without inefficiency, can be looked at from either a primal or a dual perspective. In a primal setup two measures of technical efficiency are mostly used in the efficiency literature: (i) input-oriented (IO) technical inefficiency and (ii) output-oriented (OO) technical inefficiency. There are some basic differences between the IO and OO models so far as features of the technology are concerned. Although some of these differences and their implications are well known, except for Kumbhakar and Tsionas [1], no one has estimated a stochastic production frontier model econometrically with IO technical inefficiency using cross-sectional data. Here we consider estimation of a translog production model with IO technical inefficiency.

The IO and OO Models
Consider a single-output production technology where Y is a scalar output and X is a vector of inputs. The production technology with the IO measure of technical inefficiency can be expressed as

Y_i = f(X_i Θ_i), (2.1)

where Y_i is a scalar output, Θ_i ≤ 1 is IO efficiency (a scalar), X_i is the J × 1 vector of inputs, and i indexes firms. The IO technical inefficiency for firm i is defined as ln Θ_i ≤ 0 and is interpreted as the rate at which all the inputs can be reduced without reducing output. On the other hand, the technology with the OO measure of technical inefficiency is specified as

Y_i = Λ_i f(X_i), (2.2)

where Λ_i ≤ 1 represents OO efficiency (a scalar), and ln Λ_i ≤ 0 is defined as OO technical inefficiency. It shows the percent by which actual output could be increased without increasing inputs (for more details, see Figure 1). It is clear from (2.1) and (2.2) that if f(·) is homogeneous of degree r, then Θ_i^r = Λ_i, that is, the relationship is independent of X and Y. If homogeneity is not present, their relationship will depend on the input quantities and the parametric form of f(·).
We now show the IO and OO measures of technical efficiency graphically. The observed production plan (Y, X) is indicated by the point A. The vertical length AB measures OO technical inefficiency, while the horizontal distance AC measures IO technical inefficiency. Since the former measures the percentage loss of output while the latter measures the percentage increase in input usage in moving to the production frontier from the inefficient production plan indicated by point A, these two measures are, in general, not directly comparable. If the production function is homogeneous, then one measure is a constant multiple of the other, and they coincide if the degree of homogeneity is one.
In the more general case, they are related in the following manner: f(X) · Λ = f(XΘ).
Although we consider technologies with a single output, IO and OO inefficiency can be discussed in the context of multiple-output technologies as well.
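The homogeneity relation above is easy to verify numerically. The sketch below is illustrative only: the Cobb-Douglas function and all parameter values are hypothetical. It checks that for a technology homogeneous of degree r, the IO representation Y = f(XΘ) and the OO representation Y = Λf(X) describe the same production plan exactly when Λ = Θ^r.

```python
import numpy as np

def f(x):
    # Hypothetical Cobb-Douglas technology, homogeneous of degree r = 0.3 + 0.2 = 0.5
    return x[0] ** 0.3 * x[1] ** 0.2

r = 0.5
X = np.array([4.0, 9.0])   # input vector
theta = 0.8                # IO efficiency, Theta <= 1

Y_io = f(theta * X)        # IO representation: all inputs shrunk by factor Theta
Lam = Y_io / f(X)          # implied OO efficiency Lambda

print(np.isclose(Lam, theta ** r))  # True: under homogeneity, Lambda = Theta^r
```

For a nonhomogeneous f the same check fails, and the implied Λ varies with X, which is exactly the source of the IO/OO differences discussed below.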

Economic Implications of the IO and OO Models
Here we ask two questions. First, does it matter whether one uses the IO or the OO representation so far as estimation of the technology is concerned? That is, are features of the estimated technology, such as elasticities, returns to scale, and so forth, invariant to the choice of efficiency orientation? Second, are efficiency rankings of firms invariant to the choice of efficiency orientation? That is, does one get the same efficiency measures (converted in terms of either output loss or increase in costs) in both cases? It is not possible to provide general theoretical answers to these questions. These are clearly empirical issues, so it is necessary to engage in applied research to get a feel for the similarities and differences of the two approaches. Answers to these questions depend on the form of the production technology. If it is homogeneous, then there is no difference between these two models econometrically. This is because for a homogeneous function r ln Θ_i = ln Λ_i, where r is the degree of homogeneity. Thus, rankings of firms with respect to ln Λ_i and ln Θ_i will be exactly the same (one being a constant multiple of the other). Moreover, since f(X) · Λ = f(X)Θ^r, the input elasticities as well as returns to scale measures based on these two specifications of the technology will be the same. This is, however, not the case if the technology is nonhomogeneous. In the OO model the elasticities and returns to scale are independent of technical inefficiency, because technical efficiency (assumed to be independent of inputs) enters multiplicatively into the production function. This is not true for the IO model, where technical inefficiency enters multiplicatively with the inputs. This will be shown explicitly later for a nonhomogeneous translog production function.

Econometric Modeling and Efficiency Measurement
Using lower-case letters to indicate the log of a variable, and assuming that f(·) has a translog form, the IO model can be expressed as

y_i = β_0 + β′(x_i + 1_J ln Θ_i) + (1/2)(x_i + 1_J ln Θ_i)′Γ(x_i + 1_J ln Θ_i) + β_T T_i + (1/2)β_TT T_i² + T_i ϕ′(x_i + 1_J ln Θ_i) + v_i, (2.3)

Journal of Probability and Statistics
where y_i is the log of output, 1_J denotes the J × 1 vector of ones, x_i is the J × 1 vector of inputs in logs, T_i is the trend/shift variable, β_0, β_T, and β_TT are scalar parameters, β and ϕ are J × 1 parameter vectors, Γ is a J × J symmetric matrix of parameters, and v_i is the noise term. To make the inefficiency term nonnegative we define θ = −ln Θ.
We rewrite the IO model above as

y_i = β_0 + β′x_i + (1/2)x_i′Γx_i + β_T T_i + (1/2)β_TT T_i² + T_i ϕ′x_i − g(θ_i, x_i) + v_i, where g(θ_i, x_i) = θ_i Ξ_i − (1/2)θ_i²Ψ. (2.4)

Note that if the production function is homogeneous of degree r, then Γ1_J = 0, 1_J′β = r, and 1_J′ϕ = 0. In such a case the g(θ_i, x_i) function becomes a constant multiple of θ_i, namely, g(θ_i, x_i) = θ_i Ξ_i − (1/2)θ_i²Ψ = rθ_i, and consequently the IO model cannot be distinguished from the OO model. The g(θ_i, x_i) function shows the percent by which output is lost due to technical inefficiency. For a well-behaved production function, g(θ_i, x_i) ≥ 0 for each i.
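The claim that homogeneity reduces the IO shift to a constant multiple of θ can be checked numerically. In the sketch below all parameter values are hypothetical: Γ has rows summing to zero, 1′β = r, and 1′ϕ = 0, so the translog evaluated at x − θ1_J differs from its value at x by exactly −rθ for any x and θ.

```python
import numpy as np

beta = np.array([0.3, 0.4])                 # 1'beta = r = 0.7
Gam = np.array([[0.1, -0.1], [-0.1, 0.1]])  # symmetric, Gam @ 1 = 0  (so Psi = 0)
phi = np.array([0.05, -0.05])               # 1'phi = 0
b0, bT, bTT, T = 1.0, 0.02, 0.001, 5.0      # intercept and trend parameters

def translog(x):
    # Translog frontier in logs with the hypothetical parameters above
    return (b0 + beta @ x + 0.5 * x @ Gam @ x
            + bT * T + 0.5 * bTT * T ** 2 + T * (phi @ x))

x = np.array([1.3, -0.4])   # arbitrary log-input vector
theta = 0.15                # IO inefficiency

shift = translog(x - theta) - translog(x)   # equals -g(theta, x)
print(np.isclose(shift, -0.7 * theta))      # True: g collapses to r*theta
```

Perturbing Γ or β so that the homogeneity restrictions fail makes the shift depend on x, which is the nonhomogeneous case discussed in the text.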
The OO model, on the other hand, takes a much simpler form, namely,

y_i = β_0 + β′x_i + (1/2)x_i′Γx_i + β_T T_i + (1/2)β_TT T_i² + T_i ϕ′x_i − λ_i + v_i, (2.5)

where we define λ = −ln Λ to make it nonnegative. The OO model in this form is the one introduced by Aigner et al. [2] and Meeusen and van den Broeck [3], and since then it has been used extensively in the efficiency literature. Here we follow the framework used in Kumbhakar and Tsionas [1], where θ is random. We write (2.4) more compactly as

y_i = z_i′α − θ_i Ξ_i + (1/2)θ_i²Ψ + v_i, i = 1, . . . , n.

(2.6)
Both Ψ and Ξ_i are functions of the original parameters, and Ξ_i also depends on the data x_i and T_i. Under the assumption that v_i ∼ N(0, σ²) and θ_i is distributed independently of v_i with density function f(θ_i; ω), where ω is a parameter, the probability density function of y_i can be expressed as

f(y_i; μ) = ∫₀^∞ (1/σ) φ((y_i − z_i′α + θ_iΞ_i − (1/2)θ_i²Ψ)/σ) f(θ_i; ω) dθ_i, (2.7)

where μ denotes the entire parameter vector and φ(·) is the standard normal density. We consider a half-normal and an exponential specification for the density f(θ_i; ω), namely,

f(θ_i; ω) = (2/ω)φ(θ_i/ω) or f(θ_i; ω) = (1/ω) exp(−θ_i/ω), θ_i ≥ 0.

(2.8)
The likelihood function of the model is then

L(μ) = ∏_{i=1}^n f(y_i; μ),

where f(y_i; μ) has been defined above. Since the integral defining f(y_i; μ) is not available in closed form, we cannot find an analytical expression for the likelihood function. However, we can approximate the integrals by simulation as follows. Suppose θ_{i,s}, s = 1, . . . , S, is a random sample from f(θ_i; ω). Then it is clear that

f(y_i; μ) ≈ (1/S) ∑_{s=1}^S (1/σ) φ((y_i − z_i′α + θ_{i,s}Ξ_i − (1/2)θ_{i,s}²Ψ)/σ),

and an approximation of the log-likelihood function is given by the sum over i of the logs of these averages, which can be maximized by numerical optimization procedures to obtain the ML estimator.
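The simulated-likelihood idea can be illustrated with the simplest special case, the normal/half-normal convolution, where a closed form is available for comparison. The sketch below uses hypothetical parameter values: it approximates the density of a composed error e = v − θ by averaging normal densities over half-normal draws and checks the result against the analytic Aigner-Lovell-Schmidt form.

```python
import math
import numpy as np

rng = np.random.default_rng(0)
sigma_v, omega = 0.3, 0.5    # hypothetical noise and inefficiency scales
e = -0.2                     # a composed residual, e = v - theta

# Simulated density: f(e) = E_theta[ (1/sigma_v) * phi((e + theta)/sigma_v) ]
S = 200_000
theta_s = np.abs(rng.normal(0.0, omega, size=S))   # half-normal draws
z = (e + theta_s) / sigma_v
f_sml = np.mean(np.exp(-0.5 * z ** 2) / (sigma_v * math.sqrt(2 * math.pi)))

# Closed form for the normal/half-normal convolution
s = math.hypot(sigma_v, omega)
lam = omega / sigma_v
phi = math.exp(-0.5 * (e / s) ** 2) / math.sqrt(2 * math.pi)
Phi = 0.5 * math.erfc((e * lam / s) / math.sqrt(2))   # Phi(-e*lam/s)
f_exact = 2.0 / s * phi * Phi

print(f_sml, f_exact)   # the two agree up to simulation noise
```

In the general IO model of the text the closed form is unavailable, and the simulated average is the only practical option; this check simply confirms the mechanics of the approximation.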
For the distributions we adopt, random number generation is trivial, so implementing the SML estimator is straightforward. Inefficiency estimation is accomplished by considering the distribution of θ_i conditional on the data and the estimated parameters,

f(θ_i | y_i, D_i; μ̃) ∝ φ((y_i − z_i′α̃ + θ_iΞ̃_i − (1/2)θ_i²Ψ̃)/σ̃) f(θ_i; ω̃),

where a tilde denotes the ML estimate and D_i = (x_i, T_i) denotes the data. For example, when f(θ_i; ω) is half-normal we get

(2.13)
This is not a known density, and even the normalizing constant cannot be obtained in closed form. However, the first two moments and the normalizing constant can be obtained by numerical integration, for example, using Simpson's rule.
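As an illustration of the numerical-integration route, the sketch below computes the normalizing constant and posterior mean of θ_i by composite Simpson's rule for the half-normal case, and checks the result against the closed-form conditional mean of Jondrow et al. that is available for this special case. All parameter values are hypothetical.

```python
import math
import numpy as np

def simpson(y, h):
    # Composite Simpson's rule; len(y) must be odd (an even number of intervals)
    w = np.ones_like(y)
    w[1:-1:2] = 4.0
    w[2:-1:2] = 2.0
    return h / 3.0 * np.sum(w * y)

sigma_v, omega = 0.3, 0.5    # hypothetical noise and inefficiency scales
e = -0.2                     # composed residual for one firm, e = v - theta

theta = np.linspace(0.0, 6.0, 4001)
h = theta[1] - theta[0]
# Unnormalized posterior kernel: normal likelihood times half-normal density
kern = np.exp(-0.5 * ((e + theta) / sigma_v) ** 2) * np.exp(-0.5 * (theta / omega) ** 2)

const = simpson(kern, h)                       # normalizing constant (up to fixed factors)
post_mean = simpson(theta * kern, h) / const   # E[theta_i | data]

# Closed-form check (normal/half-normal case)
s2 = sigma_v ** 2 + omega ** 2
mu_star = -e * omega ** 2 / s2
sig_star = sigma_v * omega / math.sqrt(s2)
zz = mu_star / sig_star
ratio = (math.exp(-0.5 * zz ** 2) / math.sqrt(2 * math.pi)) / (0.5 * math.erfc(-zz / math.sqrt(2)))
jlms = mu_star + sig_star * ratio

print(post_mean, jlms)   # should agree to several decimals
```

The same grid-based approach gives the second moment and the normalizing constant of (2.13) when no closed form exists.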
To make inferences on efficiency, define efficiency as r_i = exp(−θ_i) and obtain the distribution of r_i and its moments by a change of variable from θ_i to r_i.

The IO Approach
We now examine the IO and OO models when behavioral assumptions are explicitly introduced. First, we examine the models when producers minimize cost to produce the given level of output(s). The objective of a producer is to

min_X {w′X : Y = f(XΘ)},

from which conditional input demand functions can be derived. The corresponding cost function can then be expressed as

C^a = C(w, Y)/Θ ≥ C(w, Y),

where C(w, Y) is the minimum cost function (cost frontier) and C^a is the actual cost. Finally, one can use Shephard's lemma to obtain X_j^a = X_j*(w, Y)/Θ ≥ X_j*(w, Y) for all j, where the superscripts a and * indicate actual and cost-minimizing levels of input X_j.
Thus, the IO model implies (i) a neutral shift in the cost function, which in turn implies that RTS and input elasticities are unchanged by technical inefficiency, and (ii) an equiproportional increase, at the rate given by θ, in the use of all inputs due to technical inefficiency, irrespective of the output level and input prices.
To summarize, result (i) is just the opposite of what we obtained in the primal case (see [6]). Result (ii) states that when inefficiency is reduced, firms will move horizontally to the frontier (as expected under the IO model).

The OO Model
Here the objective function is written as

min_X {w′X : Y/Λ = f(X)},

from which conditional input demand functions can be derived. The corresponding cost function can then be expressed as

C^a = C(w, Y/Λ) = q(·) C(w, Y),

where, as before, C(w, Y) is the minimum cost function (cost frontier), C^a is the actual cost, and q(·) = C(w, Y/Λ)/C(w, Y) ≥ 1. One can then use Shephard's lemma to obtain the actual input demands, where the last inequality will hold if the cost function is well behaved. Note that the ratio X_j^a/X_j*(w, Y) is not the same for all j unless q(·) is a constant. Thus, the results from the OO model are just the opposite of those from the IO model. Here (i) inefficiency shifts the cost function nonneutrally, meaning that q(·) depends on output and input prices as well as Λ; (ii) increases in input use are not equiproportional (they depend on output and input prices); (iii) the cost shares are not independent of technical inefficiency; (iv) the model is harder to estimate (similar to the IO model in the primal case). More importantly, the result in (i) is just the opposite of what we reported in the primal case. Result (ii) is not what the OO model predicts (an increase in output when inefficiency is eliminated): since output is exogenously given in a cost-minimizing framework, input use has to be reduced when inefficiency is eliminated.
The results from the dual cost function models are just the opposite of what the primal models predict. Since the estimated technologies using cost functions are different in the IO and OO models, as in the primal case, we do not repeat the results based on the production/distance functions here.

The IO Model
Here we assume that the objective of a producer is to

max_X {pY − w′X : Y = f(XΘ)},

(2.20)
from which unconditional input demand and output supply functions can be derived. Since the above problem reduces to a standard neoclassical profit-maximizing problem when X is replaced by X·Θ and w is replaced by w/Θ, the corresponding profit function can be expressed as

π^a = π(w/Θ, p) = h(w, p, Θ) π(w, p),

where π^a is actual profit, π(w, p) is the profit frontier (homogeneous of degree one in w and p), and h(w, p, Θ) = π(w/Θ, p)/π(w, p) ≤ 1 is profit inefficiency. Note that the h(w, p, Θ) function depends on w, p, and Θ in general. Application of Hotelling's lemma yields expressions for the output supply and input demand functions, where the superscripts a and * indicate actual and optimum levels of output Y and inputs X_j. The last inequality in these equations will hold if the underlying production technology is well behaved.

The OO Model
Here the objective function can be written as

max_X {pY − w′X : Y/Λ = f(X)},

(2.23)
which can be viewed as a standard neoclassical profit-maximizing problem when Y is replaced by Y/Λ and p is replaced by p·Λ. The corresponding profit function can be expressed as

π^a = π(w, p·Λ) = g(w, p, Λ) π(w, p),

where g(w, p, Λ) = π(w, p·Λ)/π(w, p) ≤ 1. As in the IO model, using Hotelling's lemma we get

(2.25)
The last inequality in the above equations will hold if the underlying production technology is well behaved.
To summarize: (i) the shift in the profit function is nonneutral for both the IO and OO models; therefore, estimated elasticities, RTS, and so on are affected by the presence of technical inefficiency, no matter which form is used. (ii) Technical inefficiency leads to a decrease in output and decreases in input use in both models; however, the predicted reductions in input use and output are not the same under the two models.
Even under profit maximization, which recognizes endogeneity of both inputs and outputs, it matters which model is used to represent the technology. These results are different from those obtained under the primal models and from the cost minimization framework. Thus, it matters both theoretically and empirically whether one uses an input- or output-oriented measure of technical inefficiency.

Modeling Technological Heterogeneity
In modeling production technology we almost always assume that all producers use the same technology. In other words, we do not allow for the possibility that more than one technology is being used by the producers in the sample. Furthermore, the analyst may not know who is using which technology. Recently, a few studies have combined the stochastic frontier approach with the latent class structure in order to estimate a mixture of several technologies (frontier functions). Greene [7, 8] proposes maximum likelihood estimation of a latent class stochastic frontier with more than two classes. Caudill [9] introduces an expectation-maximization (EM) algorithm to estimate a mixture of two stochastic cost frontiers. Orea and Kumbhakar [10] estimated a four-class (translog) stochastic frontier cost function with time-varying technical inefficiency.
Following the notation of Greene [7, 8], we specify the technology for class j as

y_i = f(x_i; β_j) + v_{i|j} − u_{i|j},

where u_{i|j} is a nonnegative random term introduced to accommodate technical inefficiency. We assume that the noise term for class j follows a normal distribution with mean zero and constant variance, σ²_{vj}. The inefficiency term u_{i|j} is modeled as a half-normal random variable, following standard practice in the frontier literature, namely,

u_{i|j} ∼ |N(0, ω_j²)|,

that is, a half-normal distribution with scale parameter ω_j for each class. With these distributional assumptions, the likelihood for firm i, if it belongs to class j, can be written as

L_{ij} = (2/σ_j) φ(ε_{i|j}/σ_j) Φ(−ε_{i|j}λ_j/σ_j), with ε_{i|j} = y_i − f(x_i; β_j), σ_j² = σ²_{vj} + ω_j², λ_j = ω_j/σ_{vj}.

Finally, φ(·) and Φ(·) are the pdf and cdf of a standard normal variable.
The unconditional likelihood for firm i is obtained as the weighted sum of its j-class likelihoods, where the weights are the prior probabilities of class membership. That is,

L_i = ∑_j π_{ij} L_{ij},

where the class probabilities π_{ij} can be parameterized by, for example, a logistic function. Finally, the log likelihood function is

ln L = ∑_i ln L_i.

The estimated parameters can be used to compute the conditional posterior class probabilities. Using Bayes' theorem (see Greene [7, 8] and Orea and Kumbhakar [10]), the posterior class probabilities can be obtained from

P(j | i) = π_{ij} L_{ij} / ∑_m π_{im} L_{im}. (3.6)

This expression shows that the posterior class probabilities depend not only on the estimated parameters in π_{ij}, but also on the parameters of the production frontier and the data. This means that a latent class model classifies the sample into several groups even when the π_{ij} are fixed parameters (independent of i).
In the standard stochastic frontier approach, where the frontier function is the same for every firm, we estimate inefficiency relative to that frontier for all observations, namely, inefficiency from E(u_i | ε_i) and efficiency from E[exp(−u_i) | ε_i]. In the present case, we estimate as many frontiers as there are classes. So the question is how to measure the efficiency level of an individual firm when there is no unique technology against which inefficiency is to be computed. This is solved by using

EF_i = ∑_j P(j | i) EF_i^j, (3.10)

where P(j | i) is the posterior probability of being in the jth class for a given firm i, defined in (3.9), and EF_i^j is its efficiency using the technology of class j as the reference technology. Note that here we do not have a single reference technology; the measure takes into account the technologies of every class. The efficiency results obtained from (3.10) will differ from those based on taking the most likely frontier as the reference technology. The magnitude of the difference depends on the posterior probability of the most likely frontier: the higher that posterior probability, the smaller the difference. For an application see Orea and Kumbhakar [10].
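The posterior-probability calculations are straightforward to implement. The sketch below uses made-up class-conditional likelihoods, priors, and class-specific efficiency scores (all numbers are illustrative, not estimates) to show the Bayes update and the posterior-weighted efficiency measure.

```python
import numpy as np

# Hypothetical class-conditional likelihood values L_ij for 3 firms, 2 classes
L = np.array([[0.8, 0.1],
              [0.3, 0.6],
              [0.5, 0.5]])
pi = np.array([0.7, 0.3])       # prior class probabilities (fixed across firms)

# Bayes' theorem: P(j|i) = pi_j * L_ij / sum_m pi_m * L_im
post = pi * L
post /= post.sum(axis=1, keepdims=True)

# Efficiency of firm i measured against the class-j frontier (hypothetical)
EF = np.array([[0.90, 0.60],
               [0.80, 0.70],
               [0.95, 0.50]])

EF_overall = (post * EF).sum(axis=1)   # posterior-weighted efficiency
print(post.sum(axis=1))                # each row sums to 1
```

The weighted measure always lies between a firm's best and worst class-specific efficiency, and collapses to the most-likely-frontier measure only when one posterior probability approaches one.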

Modeling Directional Heterogeneity
In Section 2.3 we discussed estimating IO technical inefficiency. In practice most researchers use the OO model because it is easy to estimate. Now we address the question of choosing one over the other. Orea et al. [12] used a model selection test procedure to determine whether the data support the IO, OO, or hyperbolic model. Based on such a test, one may decide to use the direction that fits the data best. This implicitly assumes that all producers in the sample behave in the same way. In reality, firms in a particular industry, although using the same technology, may choose different directions to move toward the frontier. For example, some producers might find it costly to adjust input levels to attain the production frontier, while for others it might be easier to do so. This means that some producers will choose to shrink their inputs while others will augment their output. In such a case, imposing one direction on all sample observations is restrictive. The other practical problem is that no one knows in advance which producers follow which direction. Thus, we cannot estimate the IO model for one group and the OO model for another. The advantage of the LCM is that it is not necessary to impose an a priori criterion to identify which producers are in which class. Moreover, we can formally examine whether some exogenous factors are responsible for the choice of the input or the output direction by making the probabilities functions of exogenous variables. Furthermore, when panel data are available, we do not need to assume that producers follow one direction at all times, so we can accommodate switching behavior and determine when they move in the input (output) direction.

The Input-Oriented Model
Under the assumption that v_i ∼ N(0, σ²) and θ_i is distributed independently of v_i according to a distribution with density f_θ(θ_i; ω), where ω is a parameter, the distribution of y_i has density

f_IO(y_i | z_i, Δ) = ∫₀^∞ (1/σ) φ((y_i − z_i′α + g(θ_i, x_i))/σ) f_θ(θ_i; ω) dθ_i, (3.11)

where Δ denotes the entire parameter vector. We use a half-normal specification for θ, namely, f_θ(θ_i; ω) = (2/ω)φ(θ_i/ω), θ_i ≥ 0. The likelihood function of the IO model is

L(Δ) = ∏_{i=1}^n f_IO(y_i | z_i, Δ),

where f_IO(y_i | z_i, Δ) has been defined in (3.8). Since the integral defining f_IO(y_i | z_i, Δ) in (3.11) is not available in closed form, we cannot find an analytical expression for the likelihood function. However, we can approximate the integrals using Monte Carlo simulation as follows. Suppose θ_{i,s}, s = 1, . . . , S, is a random sample from f_θ(θ_i; ω). Then

f_IO(y_i | z_i, Δ) ≈ (1/S) ∑_{s=1}^S (1/σ) φ((y_i − z_i′α + g(θ_{i,s}, x_i))/σ),

and an approximation of the log-likelihood function is given by the sum of the logs of these averages, which can be maximized by numerical optimization procedures to obtain the ML estimator.
To perform SML estimation, we consider the integral in (3.11). We can transform the range of integration to (0, 1) by using the transformation r_i = exp(−θ_i), which has a natural interpretation as IO technical efficiency. Then, (3.11) becomes

(3.13)
Suppose r_{i,s}, s = 1, . . . , S, is a set of standard uniform random numbers. Then the integral can be approximated by the Monte Carlo estimator that averages the integrand over these draws. The standard uniform random numbers and their log transformations can be saved in an n × S matrix before maximum likelihood estimation and reused, to ensure that the likelihood function is a differentiable function of the parameters. An alternative is to maintain the same random number seed and redraw these numbers at each call to the likelihood function; this increases computing time but implies considerable savings in memory. An alternative to pseudorandom numbers is the Halton sequence, which produces quasi-random numbers that fill the interval (0, 1) more evenly. The Halton sequence has been used in econometrics by Train [13] for the multinomial probit model and by Greene [14] to implement SML estimation of the normal-gamma stochastic frontier model.
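A minimal Halton generator is only a few lines of code. The sketch below produces the base-2 (van der Corput) sequence; draws r ∈ (0, 1) can then be mapped to inefficiency draws, for example via θ = −ln r in the exponential case.

```python
def halton(n, base=2):
    # First n terms of the Halton sequence in the given base (radical-inverse digits)
    seq = []
    for i in range(1, n + 1):
        x, f = 0.0, 1.0 / base
        while i > 0:
            i, d = divmod(i, base)
            x += d * f
            f /= base
        seq.append(x)
    return seq

print(halton(5))  # [0.5, 0.25, 0.75, 0.125, 0.625]
```

In practice one typically discards the first few draws and uses a different prime base for each dimension of integration to avoid correlation across dimensions.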

The Output-Oriented Model
Estimation of the OO model is easy since the likelihood function is available analytically. The model is

y_i = z_i′α − λ_i + v_i. (3.15)

We make the standard assumptions that v_i ∼ N(0, σ_v²), λ_i ∼ |N(0, σ_λ²)|, and the two are mutually independent as well as independent of z_i. The density of y_i is [11, page 75]

f_OO(y_i | z_i, O) = (2/ρ) φ_N(e_i/ρ) Φ_N(−e_i τ/ρ), (3.16)

where e_i = y_i − z_i′α, ρ² = σ_v² + σ_λ², τ = σ_λ/σ_v, and φ_N and Φ_N denote the standard normal pdf and cdf, respectively. The log likelihood function of the model is

ln L = ∑_{i=1}^n [ln 2 − ln ρ + ln φ_N(e_i/ρ) + ln Φ_N(−e_i τ/ρ)]. (3.17)
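Because the OO likelihood is available analytically, it can be coded directly from (3.16)-(3.17). The sketch below (variable names are ours) implements the composed-error density and log likelihood, and confirms numerically that the density integrates to one.

```python
import math
import numpy as np

def oo_density(e, sigma_v, sigma_l):
    # f(e) = (2/rho) * phi(e/rho) * Phi(-e*tau/rho), with e = y - z'alpha
    rho = math.hypot(sigma_v, sigma_l)
    tau = sigma_l / sigma_v
    e = np.atleast_1d(np.asarray(e, dtype=float))
    phi = np.exp(-0.5 * (e / rho) ** 2) / math.sqrt(2 * math.pi)
    Phi = np.array([0.5 * math.erfc(v / math.sqrt(2)) for v in e * tau / rho])
    return 2.0 / rho * phi * Phi

def oo_loglik(y, Z, alpha, sigma_v, sigma_l):
    # Log likelihood (3.17): sum of log densities of the residuals e_i
    e = y - Z @ alpha
    return float(np.sum(np.log(oo_density(e, sigma_v, sigma_l))))

# Sanity check: the composed-error density integrates to one
grid = np.linspace(-8.0, 8.0, 20001)
total = oo_density(grid, 0.3, 0.5).sum() * (grid[1] - grid[0])
print(total)   # ~1.0
```

Passing `oo_loglik` (with the sign of the return value flipped) to any numerical optimizer gives the ML estimator of (α, σ_v, σ_λ).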

The Finite Mixture (Latent Class) Model
The IO and OO models can be embedded in a general model that allows model choice for each observation in the absence of sample separation information. Specifically, we assume that each observation y i is associated with the OO class with probability p, and with the IO class with probability 1 − p.
To be more precise, we have the model

y_i = z_i′α − λ_i + v_i with probability p,

and the model

y_i = z_i′α − g(θ_i, x_i) + v_i with probability 1 − p,

where the stochastic elements obey the assumptions stated previously in connection with the OO and IO models. Notice that the technical parameters, α, are the same in the two classes. Denote the parameter vector by ψ = (α′, σ², ω², σ_v², σ_λ², p)′. The density of y_i will be

f(y_i | z_i, ψ) = p f_OO(y_i | z_i, O) + (1 − p) f_IO(y_i | z_i, Δ),

where O = (α′, σ_v², σ_λ²)′ and Δ = (α′, σ², ω²)′ are subsets of ψ. The log likelihood function of the model is

(3.21)
The log likelihood function depends on the IO density f_IO(y_i | z_i, Δ), which is not available in closed form but can be obtained with the aid of simulation, using the principles presented previously, to obtain

(3.22)
where f_IO(y_i | z_i, Δ) has been defined in (3.14) and f_OO(y_i | z_i, O) in (3.16). This log likelihood function can be maximized using standard techniques to obtain the SML estimates of the LCM.

Technical Efficiency Estimation in the Latent Class Model
A natural output-based efficiency measure derived from the LCM is

EF_i = P_i EF_i^OO + (1 − P_i) EF_i^IO,

where

P_i = p f_OO(y_i | z_i, O) / [p f_OO(y_i | z_i, O) + (1 − p) f_IO(y_i | z_i, Δ)]

is the posterior probability that the ith observation came from the OO class. These posterior probabilities are of independent interest, since they can be used to provide inferences on whether a firm came from the OO or IO universe, depending on whether, for example, P_i ≥ 1/2 or P_i < 1/2. This information can be important in deciding which type of adjustment cost (input- or output-related) is more important for a particular firm.

Returns to Scale and Technical Change
Note that returns to scale, defined as RTS = ∑_j ∂y/∂x_j, is not affected by the presence of technical inefficiency in the OO model. The same is true for input elasticities and elasticities of substitution (not explored here). This is because inefficiency in the OO model shifts the production function in a neutral fashion. On the contrary, the magnitude of technical inefficiency affects RTS in the IO model. Using the translog specification in (2.4), we get the IO formula, whereas the formula for RTS in the OO model is

(3.26)
We now focus on estimates of technical change (TC) from the IO and OO models. TC in the IO model can be measured either conditional on θ (TC_IO^I) or at the frontier (TC_IO^II), namely,

(3.27)
These two formulas will give different results if technical change is nonneutral and/or the production function is nonhomogeneous (i.e., 1_J′ϕ ≠ 0). The formula for TC_OO is the same as TC_IO^II, except for the fact that the estimated parameters in (3.3) are from the IO model, whereas the parameters used to compute TC_OO are from the OO model. It should be noted that in the LCM we enforce the restriction that the technical parameters, α, are the same in the IO and OO components of the mixture. This implies that RTS and TC will be the same in both components if we follow the first approach, but they will differ if we follow the second approach. In the second approach, a single measure of RTS and TC can be defined as the weighted average of the two measures, using the posterior probabilities P_i as weights. To be more precise, suppose RTS_IO

Relaxing Functional Form Assumptions (SF Model with LML)
In this section we introduce the LML methodology [15] for estimating SF models in such a way that many of the limitations of the SF models originally proposed by Aigner et al. [2] and Meeusen and van den Broeck [3], and of their extensions over the last two and a half decades, are relaxed. Removal of these deficiencies generalizes the SF models and makes them comparable to DEA models. Moreover, we can apply standard econometric tools to perform estimation and draw inferences.
To fix ideas, suppose we have a parametric model that specifies the density of an observed dependent variable y_i conditional on a vector of observable covariates x_i ∈ X ⊆ R^k and a vector of unknown parameters θ ∈ Θ ⊆ R^m, and let the density be l(y_i; x_i, θ). The parametric ML estimator is given by

θ̂ = arg max_{θ∈Θ} ∑_{i=1}^n ln l(y_i; x_i, θ).

The problem with the parametric ML estimator is that it relies heavily on the parametric model, which can be incorrect if there is uncertainty regarding the functional form of the model, the density, and so forth. A natural way to convert the parametric model to a nonparametric one is to make the parameter θ a function of the covariates x_i. Within LML this is accomplished as follows. For an arbitrary x ∈ X, the LML estimator solves the problem

θ̂(x) = arg max_{θ∈Θ} ∑_{i=1}^n K_H(x_i − x) ln l(y_i; x_i, θ),

where K_H is a kernel that depends on a bandwidth matrix H. The idea behind LML is to choose an anchoring parametric model and maximize a weighted log-likelihood function that places more weight on observations near x, rather than weighting each observation equally as the parametric ML estimator would do. By solving the LML problem for several points x ∈ X, we can construct the function θ̂(x) as an estimator for θ(x), and effectively we have a fully general way to convert the parametric model into a nonparametric approximation of the unknown model. Suppose we have the following stochastic frontier cost model:

y_i = x_i′β + v_i + u_i, u_i ≥ 0,

where y_i is log cost and x_i is a vector of input prices and outputs; v_i and u_i are the noise and inefficiency components, respectively. Furthermore, v_i and u_i are assumed to be mutually independent as well as independent of x_i.
To make the frontier model more flexible (nonparametric), we adopt the following strategy. Consider the usual parametric ML estimator for the normal (v) and truncated normal (u) stochastic cost frontier model, which solves the following problem:

(4.5)
where ψ = μ/ω and Φ denotes the standard normal cumulative distribution function. The parameter vector is θ = (β′, σ, ω, ψ)′ and the parameter space is Θ = R^k × R_+ × R_+ × R. Local ML estimation of the corresponding nonparametric model involves the following steps. First, we choose a kernel function. A reasonable choice is the multivariate normal kernel

K_H(x_i − x) = (2π)^{−k/2} |H|^{−1/2} exp(−(1/2)(x_i − x)′H^{−1}(x_i − x)),

where k is the dimensionality of x_i, H = h · S, h > 0 is a scalar bandwidth, and S is the sample covariance matrix of the x_i. Second, we choose a particular point x ∈ X and solve the following problem:

(4.7)
A solution to this problem provides the LML parameter estimates β̂(x), σ̂(x), ω̂(x), and ψ̂(x). Notice also that the weights K_H(x_i − x) do not involve unknown parameters if h is known, so they can be computed in advance; therefore, the estimator can be programmed in any standard econometric software. For an application of this methodology to US commercial banks see Kumbhakar
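The LML recipe is easy to prototype. The sketch below substitutes a plain Gaussian anchoring model (local mean and variance) for the frontier likelihood in (4.5), purely to keep the example short and closed-form; replacing the weighted normal log density with the weighted frontier log density gives the estimator described above. All data and bandwidth choices are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400
x = rng.uniform(-2.0, 2.0, n)
y = np.sin(x) + rng.normal(0.0, 0.2, n)   # true conditional mean: sin(x)

def lml_fit(x0, x, y, h=0.3):
    # Local ML for a Gaussian anchoring model y ~ N(mu, s2):
    # maximize sum_i K_h(x_i - x0) * log N(y_i; mu, s2) over (mu, s2).
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)       # Gaussian kernel weights
    mu = np.sum(w * y) / np.sum(w)               # weighted-ML mean
    s2 = np.sum(w * (y - mu) ** 2) / np.sum(w)   # weighted-ML variance
    return mu, s2

mu0, s20 = lml_fit(0.5, x, y)
print(mu0, np.sin(0.5))   # local estimate should be close to the true mean
```

Solving the weighted problem at a grid of points x0 traces out θ̂(x), which is exactly how the nonparametric frontier surface is constructed in the LML approach.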

General
There have been many innovative empirical applications of stochastic frontier analysis in recent years. One of them is in the field of auctions, a particular area of game theory. Advances and empirical applications in this field are likely to accumulate rapidly and contribute positively to the advancement of empirical game theory and empirical IO. Kumbhakar et al. [19] propose Bayesian analysis of an auction model in which systematic over-bidding and under-bidding are allowed. Extensive simulations are used to show that the new techniques perform well, and that accounting for measurement error and systematic over- and under-bidding matters for the final results. Kumbhakar and Parmeter [20] derive the closed-form likelihood and associated efficiency measures for a two-sided stochastic frontier model under the assumption of normal-exponential components. The model has an important application in the labor market, where employees and employers have asymmetric information and each tries to manipulate the situation to his own advantage: employers would like to hire for less, and employees to obtain more, in the bargaining process. The precise measurement of these components is clearly important.
Kumbhakar et al. [21] acknowledge explicitly the fact that certain decision-making units can be fully (i.e., 100%) efficient, and propose a new model that is a mixture of (i) a half-normal component for inefficient firms and (ii) a mass at zero for efficient firms. Of course, it is not known in advance which firms are fully efficient. The authors propose classical methods of inference organized around maximum likelihood and provide extensive simulations to explore the validity and relevance of the new techniques under various data generating processes. Tsionas [22] explores the implications of the convolution ε = v + u in stochastic frontier models. The fundamental point is that even when the distributions of the error components are nonstandard (e.g., Student-t and half-Student, or normal and half-Student, gamma, symmetric stable, etc.), it is possible to estimate the model by ML via the fast Fourier transform (FFT) when the characteristic functions are available in closed form. These methods can also be used in mixture models, input-oriented efficiency models, two-tiered stochastic frontiers, and so forth. The properties of ML and some GLS techniques are explored, with an emphasis on the normal-truncated normal model, for which the likelihood is available analytically, and simulations are used to determine various quantities that must be set in order to apply ML by FFT.
Starting with Annaert et al. [23], stochastic frontier models have been applied very successfully in finance, especially to the important issue of mutual fund performance. Schaefer and Maurer [24] apply these techniques to German funds and find that a fund "may be able to reduce its costs by 46 to 74% when compared with the best-practice complex in the sample." Of course, much remains to be done in this area to connect stochastic frontier models more closely with practical finance and better mutual fund performance evaluation.

Panel Data
Panel data have always been a source of inspiration and new models in stochastic frontier analysis. Roughly speaking, panel data are concerned with models of the form y_it = α_i + x_it'β + v_it, where the α_i's are individual effects, random or fixed, x_it is a k × 1 vector of covariates, β is a k × 1 parameter vector and, typically, the error term is v_it ~ iid N(0, σ_v^2). An important contribution in panel data models of efficiency is the incorporation of factors, as in Kneip et al. [25]. Factors arise from the necessity of incorporating more structure into frontier models, a point that is clear after Lee and Schmidt [26]. The authors use smoothing techniques to perform the econometric analysis of the model.
In recent years, the focus of the profession has shifted from the fixed effects model (e.g., Cornwell et al. [27]) to a so-called "true fixed effects model" (TFEM), first proposed by Greene [28]. In this model, the individual effects are separated from technical inefficiency. Similar models had been proposed previously by Kumbhakar [29] and Kumbhakar and Hjalmarsson [30], although in these models firm effects were treated as persistent inefficiency. Greene shows that the TFEM can be estimated easily using special Gauss-Newton iterations without the need to explicitly introduce individual dummy variables, which is prohibitive if the number of firms N is large. As Greene [28] notes: "the fixed and random effects estimators force any time invariant cross unit heterogeneity into the same term that is being used to capture the inefficiency. Inefficiency measures in these models may be picking up heterogeneity in addition to or even instead of inefficiency." For important points and applications see Greene [8, 31]. Greene's [8] findings are somewhat at odds with the perceived incidental parameters problem in this model, as he himself acknowledges. His findings motivated a body of research that tries to deal with the incidental parameters problem in stochastic frontier models and, of course, with efficiency estimation. The incidental parameters problem in statistics began with the well-known contribution of Neyman and Scott [32] (see also [33]). Consider stochastic frontier models of the form y_it = α_i + x_it'β + v_it, for i = 1, . . . , N and t = 1, . . . , T. The essence of the problem is that as N gets large, the number of unknown parameters (the individual effects α_i, i = 1, . . . , N) increases at the same rate, so consistency cannot be achieved. Another route to the incidental parameters problem is well known in efficiency estimation with cross-sectional data (T = 1), where the JLMS estimates are not consistent.
To appreciate the incidental parameters problem better, note that the TFEM implies a density for the ith unit, say p_i(α_i, δ; Y_i), where δ collects the common parameters. The problem is that the ML estimator obtained by maximizing over (α_1, . . . , α_N, δ) is not consistent. The source of the problem is that the concentrated likelihood, which replaces each α_i by its ML estimator, will not deliver consistent estimators for all elements of δ.
In frontier models we know that the ML estimators for β and σ^2 + ω^2 seem to be all right, but the estimator for ω, or the ratio λ = ω/σ, can be badly off. This is also validated in a recent paper by Chen et al. [34].
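A minimal simulation makes the incidental parameters problem concrete. In the classic Neyman-Scott setup, which is the TFEM stripped of the one-sided component, concentrating out the fixed effects leaves an ML variance estimator with probability limit σ^2 (T − 1)/T, a bias that no amount of cross-sectional data removes. This is only a sketch of the mechanism, not a frontier estimation:

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, sigma2 = 5000, 2, 1.0

# y_it = alpha_i + v_it: each alpha_i is an incidental parameter
alpha = rng.normal(size=(N, 1))
y = alpha + rng.normal(scale=np.sqrt(sigma2), size=(N, T))

# ML concentrates out alpha_i via the unit means; the implied variance
# estimator divides by N*T rather than N*(T-1), so its plim is
# sigma2 * (T - 1) / T regardless of how large N becomes
resid = y - y.mean(axis=1, keepdims=True)
sigma2_ml = (resid**2).sum() / (N * T)
# sigma2_ml is near 0.5 here, not 1.0
```

The same mechanism is what distorts λ = ω/σ in the frontier case: the parameters tied to the within-unit variation inherit a bias of order O(T^-1).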
There are some approaches to correct such biases in the literature on nonlinear panel data models.
(i) Correct the bias to first order using a modified score (first derivatives of the log likelihood).
(ii) Use a penalty function for the log likelihood. This can, of course, be related to (i) above.
(iii) Apply the panel jackknife. Satchachai and Schmidt [35] have done that recently in a model with fixed effects but without the one-sided component. They derive some interesting results on rates of convergence depending on whether or not there are ties: first differencing produces an O(T^-1) bias, but with ties the estimator applied in the presence of a tie is O(T^-1/2).

(iv) In line with (ii), one could use a modified likelihood of the form p*_i(δ; Y_i) = ∫ p_i(α_i, δ; Y_i) w(α_i) dα_i, where w(α_i) is some weighting function; clearly, this has a Bayesian interpretation.

Since Greene [8] derived a computationally efficient algorithm for the true fixed effects model, one would think that applying the panel jackknife would reduce the first-order bias of the estimator, and for empirical purposes this might be enough. For further reductions in the bias there remains only the possibility of asymptotic expansions along the lines of related work in nonlinear panel data models. This point has not been explored in the literature, but it seems that it could be used profitably.
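To fix ideas, here is a hedged sketch of a split-panel jackknife applied to the Neyman-Scott variance estimator, whose fixed-T plim is σ^2 (T − 1)/T. Combining the full-panel estimate with estimates from the two half-panels removes the O(T^-1) bias term, exactly so in this simple example:

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, sigma2 = 4000, 4, 1.0
y = rng.normal(size=(N, 1)) + rng.normal(scale=np.sqrt(sigma2), size=(N, T))

def sigma2_ml(panel):
    # Fixed-effects ML variance estimator; biased by the factor (T-1)/T
    r = panel - panel.mean(axis=1, keepdims=True)
    return (r**2).sum() / panel.size

full = sigma2_ml(y)                         # plim 0.75 when T = 4
half = 0.5 * (sigma2_ml(y[:, :T // 2])      # plim 0.50: each half has T/2 = 2
              + sigma2_ml(y[:, T // 2:]))
jack = 2.0 * full - half                    # split-panel jackknife: plim 1.00
```

In a frontier model the bias is generally not removed exactly, but under stationarity over time (the usual requirement for the split-panel jackknife) the first-order bias term cancels in the same way.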
Wang and Ho [36] show that "first-difference and within-transformation can be analytically performed on this model to remove the fixed individual effects, and thus the estimator is immune to the incidental parameters problem." The model is, naturally, less general than a standard stochastic frontier model in that the authors assume u_it = f(z_it'δ) u_i, where u_i is a positive half-normal random variable and f is a positive function. In this model, the dynamics of inefficiency are determined entirely by the function f and the covariates that enter into it.
Recently, Chen et al. [34] proposed a new estimator for this model. If the model is y_it = α_i + x_it'β + v_it − u_it, taking deviations from the within-unit means removes the α_i. Given β̂ = β̂_OLS, we have e_it ≡ y_it − x_it'β̂ ≈ v_it − u_it, and the e_it can be treated as "data." The distribution of the within-transformed e_it = v_it − u_it belongs to the family of the multivariate closed skew-normal (CSN), so estimating λ and σ is easy. Of course, the multivariate CSN density requires evaluating a multivariate normal integral in R^(T+1). With T > 5, this is not a trivial problem (see [37]).
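To give a sense of the computational issue, the sketch below evaluates one orthant probability of a (T+1)-dimensional normal vector with SciPy's quasi-Monte Carlo routine. The equicorrelated covariance used here is purely illustrative, an assumption for the sketch, not the CSN covariance that the Chen et al. likelihood actually requires:

```python
import numpy as np
from scipy.stats import multivariate_normal

T, rho = 6, 0.3
d = T + 1
# Illustrative equicorrelated covariance: unit variances, correlation rho
cov = (1.0 - rho) * np.eye(d) + rho * np.ones((d, d))

# Orthant probability P(X_1 <= 0, ..., X_d <= 0): the kind of multivariate
# normal integral the CSN likelihood needs for every firm, at every
# evaluation of the log likelihood
p = multivariate_normal(mean=np.zeros(d), cov=cov).cdf(np.zeros(d))
```

One such integral per firm per likelihood evaluation means the cost, and the numerical noise of the quadrature, add up quickly as N and T grow.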
There is reason to believe that "average likelihood" or a fully Bayesian approach can perform much better than sampling-theory treatments. Indeed, the true fixed effects model is nothing but another instance of the incidental parameters problem. Recent advances suggest that the best treatment can be found in "average" or "integrated" likelihood functions. For work in this direction, see Lancaster [33], Arellano and Bonhomme [38], Arellano and Hahn [39, 40], Berger et al. [41], and Bester and Hansen [42, 43]. The performance of such methods in the context of the TFEM remains to be seen.
Tsionas and Kumbhakar [44] propose a fully Bayesian solution to the problem. The approach is natural in a sense, since the TFEM can be cast as a hierarchical model. The authors show that the obvious parameterization of the model does not perform well in simulation experiments and, therefore, they propose a new parameterization that is shown to effectively eliminate the incidental parameters problem. They also extend the TFEM to models with both individual and time effects. Of course, their TFEM is cast in terms of a random effects model, so at first sight it is not directly related to [35].

Nature of Individual Effects
If we think about the model y_it = α_i + x_it'β + v_it + u_it, u_it ≥ 0, as N → ∞, one natural question is: do we really expect to be so agnostic about the fixed effects as to allow α_{n+1} to be completely different from what we already know about α_1, . . . , α_n? This is rarely the case. But that is not the reason we adopt the true fixed effects model; there are other reasons. Setting this choice aside, we can adopt a finite mixture of normal distributions for the effects.
In principle this can approximate any distribution of the effects well, so with enough latent classes we should be able to approximate the weighting function w(α_i) quite well. That would impose some structure on the model, and it would avoid the incidental parameters problem if the number of classes grows slowly, at a lower rate than N; so for fixed T there should be no significant bias. For really small T, a further bias-correction device like asymptotic expansions or the jackknife could be used.
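As an illustration of the approximating idea, a plain EM fit of a two-component normal mixture recovers a bimodal effects distribution. The effects below are hypothetical, drawn for the sketch; in practice they would be estimated effects, and the fitted mixture density would stand in for w(α_i):

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical individual effects from a bimodal two-component mixture
alpha = np.concatenate([rng.normal(-2.0, 0.5, 600), rng.normal(1.5, 0.8, 400)])

# Plain EM for a two-component normal mixture
mu = np.array([-1.0, 1.0])      # crude starting values
var = np.array([1.0, 1.0])
pi = np.array([0.5, 0.5])
for _ in range(200):
    # E-step: posterior class probabilities for each alpha_i
    dens = (pi / np.sqrt(2 * np.pi * var)
            * np.exp(-0.5 * (alpha[:, None] - mu) ** 2 / var))
    post = dens / dens.sum(axis=1, keepdims=True)
    # M-step: weighted means, variances, and mixing proportions
    nk = post.sum(axis=0)
    mu = (post * alpha[:, None]).sum(axis=0) / nk
    var = (post * (alpha[:, None] - mu) ** 2).sum(axis=0) / nk
    pi = nk / alpha.size

means = np.sort(mu)   # close to (-2.0, 1.5)
```

With the number of components growing slowly with N, the fitted density carries the shape of the effects distribution without the model carrying N free effect parameters.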
Since we do not adopt the true fixed effects model for that reason, why do we adopt it? Because the effects and the regressors are potentially correlated in a random effects framework, so it is preferable to think of the effects as parameters. It could be that α_i = h(x_i1, . . . , x_iT); Mundlak [45] first wrote about this model. In some cases it makes sense. Consider the alternative model:

(6.1)
Under stationarity this has the same implications as Mundlak's original model, but in many cases it makes much more sense: mutual fund rating and evaluation is one of them. Even if we stay with Mundlak's original specification, many other possibilities are open. For small T, the most interesting case, approximating α_i = h(x_i1, . . . , x_iT) by some flexible functional form should be enough for practical purposes. By "practical purposes" we mean bias reduction to order O(T^-1) or better. If the model becomes

Random Coefficient Models
Consider y_it = α_i + x_it'β_i + v_it + u_it ≡ z_it'γ_i + v_it + u_it [46]. Typically we assume that γ_i ~ iid N_K(γ, Ω). For small to moderate panels (say T = 5 to 10), adapting the techniques in the paper by Chen et al. [34] to a fixed effects context would be quite difficult to implement. The concern is again the evaluation of (T+1)-dimensional normal integrals when T is large.
Here, again, we are subject to the incidental parameters problem: we never really escape the "small sample" situation.
One way to proceed is the so-called CAR (conditionally autoregressive) prior, a model of the form:

(6.4)
In the multivariate case we would need something like the BEKK factorization of a covariance matrix, as in multivariate GARCH processes. The point is that the coefficients cannot be too dissimilar, and their degree of dissimilarity depends on a parameter ϕ^2 that can be made a function of covariates, if any. It would be interesting to know how the Bayesian estimator of this model behaves in practice under different DGPs.

More on Individual Effects
Related to the above discussion, it is productive to think about sources, that is, where these α_i's or γ_i's come from. Of course, we have Mundlak's [45] interpretation in place. In practice we have different technologies, represented by cost functions, say. The standard cost function y_it = α_0 + x_it'β + v_it with input-oriented technical inefficiency becomes y_it = α_0 + x_it'β + v_it + u_it, u_it ≥ 0. The presence of allocative inefficiency results in a much more complicated model: y_it = α_0 + x_it'β + v_it + u_it + G_it(x_it, β, ξ_it), where ξ_it is the vector of price distortions (see [47, 48]). So, under some reasonable economic assumptions and a common technology, we end up with a nonlinear effects model through the G(·) function. One can, of course, apply the TFEM here, but that would not correspond to the true DGP, so issues of consistency are at stake.
It is hard to imagine a situation where, in a TFEM y_it = α_i + x_it'β + v_it + u_it, the α_i's can be anything, subject to no "similarity" constraints. We can, of course, accept that α_i ≈ α_0 + G_it(x_it, β, ξ_it), so at least for the translog we have a rough guide to what these effects represent under allocative inefficiency (first-order approximations to the complicated G term are available when the ξ_it's are small). One then has to think about the nature of the allocative distortions, but at least that is an economic problem.

Why a Bayesian Approach?
Chen et al.'s transformation based on the multivariate CSN is one class of transformations, but many others are possible because the TFEM does not have the property of information orthogonality [33]. The "best" transformation, the one that is "maximally bias reducing," cannot be taking deviations from the means, because the information matrix is not block diagonal with respect to α_i and λ = ω/σ (the signal-to-noise ratio). Other transformations would be more effective, and, in principle, it is not difficult to find them.
Recently, Tsionas and Kumbhakar [44] considered a different model, namely y_it = α_i − δ_i + x_it'β + v_it − u_it, where δ_i ≥ 0 is persistent inefficiency, and used a Bayesian approach. Colombi et al. [49] used the same model but a classical ML approach to estimate the parameters as well as the inefficiency components. The finite-sample properties of the Bayes estimators (posterior means and medians) in Tsionas and Kumbhakar [44] were found to be very good in small samples with λ values typically encountered in practice (of course, one needs to keep λ away from zero in the DGP). The moral of the story is that in the random effects model, an integrated likelihood approach based on reasonable priors, a nonparametric approach based on low-order polynomials, or a finite mixture model might provide an acceptable approximation to parameters like λ.
Coupled with a panel jackknife device, these approaches can be really effective in mitigating the incidental parameters problem. For one, in the context of the TFEM, we do not know how the Chen et al. [34] estimator would behave under strange DGPs, that is, under strange processes for the incidental parameters. We have some evidence from Monte Carlo studies, but we need to think about more general "mitigating strategies." The integrated likelihood approach is one, and it is close to a Bayesian approach. Finite mixtures also hold great promise, since they have good approximating properties. The panel jackknife is certainly something to think about. Analytical devices for bias reduction to order O(T^-1) or O(T^-2) are also available from the likelihood function of the TFEM (score and information); their implementation in software should be quite easy.

Conclusions
In this paper we presented some new techniques to estimate technical inefficiency using stochastic frontier methods. First, we presented a technique to estimate a nonhomogeneous technology with IO technical inefficiency. We then discussed the IO versus OO controversy in the light of distance functions and the dual cost and profit functions. The second part of the paper addressed the latent class modeling approach incorporating behavioral heterogeneity. The last part of the paper addressed the LML method, which can address the functional form issue in parametric stochastic frontier models. Finally, we added a section that deals with some very recent advances.

Endnotes
1. Another measure is hyperbolic technical inefficiency, which combines the IO and OO measures in a special way (see, e.g., [50], Cuesta and Zofio (1999), and [12]). This measure is not as popular as the other two.
2. By contrast, the OO model has been estimated by many authors using DEA (see, e.g., [51] and the references cited therein).
3. Alvarez et al. [52] addressed these issues in a panel data framework with time-invariant technical inefficiency using a fixed effects model.
11. LML estimation was proposed by Tibshirani [58] and has been applied by Gozalo and Linton [59] in the context of nonparametric estimation of discrete response models.
12. The cost function specification is discussed in detail in Section 5.2.
13. An alternative, which could be relevant in some applications, is to localize based on a vector of exogenous variables z_i instead of the x_i's. In that case, the LML problem becomes θ(z) = arg max of the locally weighted log likelihood, where z denotes the given values of the vector of exogenous variables. The main feature of this formulation is that the β parameters, as well as σ, ω, and ψ, now become functions of z instead of x.
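To illustrate the localization idea in the simplest possible setting, the sketch below localizes a plain Gaussian likelihood rather than the full frontier likelihood, so it shows only the mechanics: kernel weights centered at z turn the global ML problem into a weighted one whose solution varies with z. The data-generating process here is hypothetical:

```python
import numpy as np

def local_ml_gaussian(y, X, z, z0, h):
    # Gaussian kernel weights localize the log likelihood around z0
    w = np.exp(-0.5 * ((z - z0) / h) ** 2)
    # Weighted ML for a Gaussian linear model: beta(z0) solves the weighted
    # normal equations; sigma2(z0) is the weighted mean squared residual
    WX = X * w[:, None]
    beta = np.linalg.solve(X.T @ WX, WX.T @ y)
    sigma2 = (w * (y - X @ beta) ** 2).sum() / w.sum()
    return beta, sigma2

# Hypothetical data whose slope drifts with the localizing variable z
rng = np.random.default_rng(4)
n = 5000
z = rng.uniform(0.0, 1.0, n)
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = (1.0 + z) * x + 0.1 * rng.normal(size=n)

beta_half, _ = local_ml_gaussian(y, X, z, z0=0.5, h=0.1)  # slope near 1.5
```

In the frontier case the same weights multiply each observation's contribution to the log likelihood, so σ, ω, and ψ likewise become functions of z.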