Performance estimation when the distribution of inefficiency is unknown

We show how to compute inefficiency or performance scores when the distribution of the one-sided error component in Stochastic Frontier Models (SFMs) is unknown; and we do the same with Data Envelopment Analysis (DEA). Our procedure, which is based on the Fast Fourier Transform (FFT), utilizes the empirical characteristic function of the residuals in SFMs or efficiency scores in DEA. The new techniques perform well in Monte Carlo experiments and deliver reasonable results in an empirical application to large U.S. banks. In both cases, deconvolution of DEA scores with the FFT brings the results much closer to the inefficiency estimates from SFM.


Introduction
Stochastic Frontier Models (SFMs; Kumbhakar and Lovell, 2000, also known as Stochastic Frontier Analysis, SFA) as well as Data Envelopment Analysis (DEA; Charnes et al., 1978; Dyson et al., 2001) are essential in the measurement of performance and efficiency. However, SFMs rely on distributional assumptions about the two-sided error term that represents noise, and the one-sided error component that represents technical inefficiency.
Although the normality assumption for the two-sided error is reasonable, specific distributional assumptions for the one-sided error component are mostly hard to defend. Such assumptions usually involve the half-normal, the truncated-normal, the exponential and, less commonly, the gamma distribution. See also Adams and Sickles (2007).
For DEA, it is widely recognized that inefficiency scores incorporate random noise from measurement error and are likely to overestimate inefficiency; see Johnson and Kuosmanen (2012), Cordero et al. (2015), Oh and Shin (2015), Holland and Lee (2002), Wanke et al. (2015), and Tsionas (2003). Since distributional assumptions for the one-sided error in SFMs are more or less arbitrary and subject to the researcher's preference, it is important to develop a procedure that does not depend on such assumptions. In this paper, we use the convolution of the two-sided (normal) and one-sided components to recover efficiency or performance estimates without specific assumptions on the one-sided error component. Specifically, we construct the empirical characteristic function of the composed error term, we recover the distribution of the one-sided disturbance through the Fourier transform, and we compute performance estimates using numerical integration. Essential to the numerical efficiency of this approach is the use of the Fast Fourier Transform (FFT).
The same method is extended to the case of DEA under the assumption that DEA scores can be decomposed into noise and pure inefficiency. Therefore, we "adjust" DEA scores for the presence of noise using the FFT deconvolution of DEA scores into measurement error and (in)efficiency. Other methods that address the problem of measurement error include the Bayesian approach of Tsionas (2019) and the smooth monotone concave approximation of Tsionas and Izzeldin (2018), which is based on Allon, Beenstock, Hackman, Passy, and Shapiro (2007), Keshvari and Kuosmanen (2013), Kuosmanen and Kortelainen (2012), and Lee et al. (2013). Of course, the presence of noise in obtaining performance scores from DEA is an important issue and much of the literature is devoted to addressing problems arising from measurement error (see, for example, Andor et al., 2019). In SFM, the main problem is to avoid assuming a specific distribution for the one-sided error component; in DEA, the main issue is to deconvolve (in)efficiency scores from the possible presence of measurement error without explicitly specifying the way in which measurement error affects inputs or outputs. This is important because different ways of introducing measurement error into DEA problems may yield different efficiency scores, and choosing between different DEA models is an under-explored area. To summarize: in DEA, our problem is to deconvolve "signal" (efficiency or inefficiency scores) from noise without specifically assuming how measurement error interacts with inputs and outputs (which could give rise to a whole constellation of possible models that would have to be considered). In SFM, our problem is to obtain (in)efficiency estimates (under normality, say, of the two-sided error component) without distributional assumptions about the one-sided error term. The new techniques are investigated using Monte Carlo simulations as well as in the context of panel data from large U.S. banks.

Model
Suppose x_it ∈ R^K is a vector of explanatory variables (say, log input prices and log outputs), y_it is the dependent variable (typically log cost), and the model is

y_it = α_i + x_it'β + v_it + u_it, i = 1, …, I, t = 1, …, T,  (1)

where v_it ∼ i.i.d. N(0, σ_v²), α_i represents firm effects that capture heterogeneity, u_it is a non-negative error component representing technical inefficiency, I is the number of decision-making units or firms, and T is the number of time periods. Alternatively, we can define a production frontier of the form

y_it = α_i + x_it'β + v_it − u_it,  (2)

where x_it is the vector of log inputs, y_it is log output, and the interpretation of v_it and u_it is the same as in (1).
Typical assumptions about u it include the exponential, half-normal, truncated-normal, gamma, etc. distributions.
We make no specific distributional assumption about u_it other than that these error components are i.i.d. and independent of the regressors. If the composed error term is denoted ε_it = v_it + u_it, then its characteristic function is

ϕ_ε(τ) = E(e^{ιτε}) = ∫ e^{ιτε} f_ε(ε) dε,

where E(·) denotes expectation (with respect to ε in this instance, whose density is denoted f_ε(ε)) and ι = √−1.
By standard properties of characteristic functions we have

ϕ_ε(τ) = ϕ_v(τ) ϕ_u(τ).  (3)

In the case of production frontiers, where the composed error is ε = v − u, the formula is modified to

ϕ_ε(τ) = ϕ_v(τ) ϕ_u(−τ) = ϕ_v(τ) conj{ϕ_u(τ)}.  (4)

To recover the density from the characteristic function we have

f_ε(ε) = (1/2π) ∫ e^{−ιτε} ϕ_ε(τ) dτ.  (5)

Given residuals ε̂_it, the empirical characteristic function is

ϕ̂_ε(τ) = (1/(IT)) Σ_{i=1}^{I} Σ_{t=1}^{T} e^{ιτ ε̂_it}.  (6)

Therefore, since ϕ_v(τ) = e^{−σ_v² τ²/2} under normality, the characteristic function of u_it is given by

ϕ_u(τ) = ϕ_ε(τ)/ϕ_v(τ) = ϕ_ε(τ) e^{σ_v² τ²/2}.  (7)

The characteristic functions corresponding to several one-sided error distributions are presented in (A.6) of Appendix A. Taking natural logs we have

ln ϕ_ε(τ) = −σ_v² τ²/2 + ln ϕ_u(τ).  (8)

Given ϕ̂_ε(τ), σ_v² can be estimated as the coefficient of −τ_j²/2 in the following regression-like equation, see (A.5) in Appendix A:

ln ϕ̂_ε(τ_j) = −σ_v² τ_j²/2 + ln ϕ_u(τ_j) + ξ_j, j = 1, …, J,  (9)

where {τ_j, j = 1, …, J} is a set of J points around zero, and ξ_j is an error term. Of course, σ_v² must be restricted to be non-negative. Since the values of the dependent variable and ln ϕ_u(τ_j) can be complex, we consider stacking the real and imaginary parts in (9) (Feuerverger and McDunnough, 1981a,b; see also Koutrouvelis, 1980, 1981). As an example, for the half-normal distribution we have

ϕ_u(τ) = 2 e^{−σ_u² τ²/2} Φ(ισ_u τ),  (10)

where Φ(·) is the standard normal distribution function; see equation (A.6) in Appendix A. However, since in practice we cannot observe the one-sided error terms and therefore cannot form their empirical characteristic function directly, we need a close approximation, described in what follows. To introduce our procedures, notice that a density may be obtained as

p(z) = (1/2π) ∫ e^{−ιτz} ϕ(τ) dτ,  (11)

where p(z) is the density of a random variable Z taking values z, and ϕ(τ) is the corresponding characteristic function. The FFT-based algorithm recovers the density by applying the inverse Fourier transform to the characteristic function.
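As an illustrative sketch of the deconvolution step ϕ_u(τ) = ϕ̂_ε(τ)/ϕ_v(τ) (not the paper's code; the sample size, σ_v = 0.3, and the half-normal scale 0.5 are arbitrary choices), the characteristic function of a simulated one-sided component can be recovered from the composed error alone:

```python
import numpy as np

def empirical_cf(x, tau):
    """Empirical characteristic function: phi_hat(tau_j) = mean_k exp(i * tau_j * x_k)."""
    return np.exp(1j * np.outer(tau, x)).mean(axis=1)

# Simulated composed error eps = v + u; all numbers are illustrative choices.
rng = np.random.default_rng(0)
n, sigma_v = 5000, 0.3
v = rng.normal(0.0, sigma_v, n)        # two-sided noise
u = np.abs(rng.normal(0.0, 0.5, n))    # half-normal inefficiency
eps = v + u

tau = np.linspace(-2.0, 2.0, 41)
phi_eps = empirical_cf(eps, tau)

# Deconvolution: phi_u(tau) = phi_eps(tau) / phi_v(tau), phi_v(tau) = exp(-sigma_v^2 tau^2 / 2)
phi_u = phi_eps * np.exp(0.5 * sigma_v ** 2 * tau ** 2)

# In a simulation u is observable, so the recovered phi_u can be checked directly
err = np.max(np.abs(phi_u - empirical_cf(u, tau)))
```

Note that dividing by the normal characteristic function amplifies sampling noise at large |τ|, which is why the grid {τ_j} is kept around zero.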
The integral in (11) is evaluated for N equally spaced points with distance h, that is, z_k = (k − 1 − N/2)h, k = 1, …, N. If we set τ = 2πω we have

p(z) = ∫ e^{−2πιωz} ϕ(2πω) dω.  (12)

This integral can be approximated using the rectangle rule for N points with spacing s, i.e.,

p(z_k) ≈ s Σ_{n=1}^{N} e^{−2πιω_n z_k} ϕ(2πω_n), ω_n = (n − 1 − N/2)s,  (13)

where hs = 1/N, so that the sum can be computed by the FFT.
Each element of the transform (say the kth one) is normalized by s(−1)^{k−1−N/2} to obtain the density function at each point of the grid.
Any characteristic function satisfies the conditions: (i) ϕ(0) = 1; (ii) |ϕ(τ)| ≤ 1; (iii) ϕ(−τ) = conj{ϕ(τ)}, where conj{·} denotes complex conjugation. Since ϕ_u(τ) is a univariate function, we could use standard kernel smoothing techniques to estimate (9) as a semi-parametric linear model, re-normalized so that properties (i), (ii) and (iii) hold. As this strategy seems prohibitively difficult, a simpler alternative is to use a mixture-of-normal-distributions model (MNM) for the characteristic function (see (A.5) in Appendix A as well as the additivity property of characteristic functions in (A.3)):

ϕ_u(τ) = Σ_{g=1}^{G} p_g e^{ιμ_g τ − σ_g² τ²/2},  (16)

where {μ_g, σ_g, p_g, g = 1, …, G} are unknown coefficients, Σ_{g=1}^{G} p_g = 1, p_g ≥ 0, g = 1, …, G, and G is the number of terms in the mixture. From (9) and (16) we obtain:

ln ϕ̂_ε(τ_j) = −σ_v² τ_j²/2 + ln Σ_{g=1}^{G} p_g e^{ιμ_g τ_j − σ_g² τ_j²/2} + ξ_j, j = 1, …, J.  (17)

In principle, this model approximates well any distribution of the one-sided error component, as long as G is large enough. The model is nonlinear but can be estimated using the method of nonlinear least squares. The parameters to be estimated are σ_v² and {p_g, μ_g, σ_g}_{g=1}^{G}. The data consist of the points {τ_j, j = 1, …, J}. The exact configuration of the tuning parameters will be described as we proceed, but within certain bounds their exact specification is immaterial.
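The MNM characteristic function in (16) is straightforward to evaluate and satisfies properties (i)-(iii) by construction; a short sketch with arbitrary illustrative weights and parameters checks this numerically against draws from the same mixture:

```python
import numpy as np

def mixture_cf(tau, p, mu, sigma):
    """MNM characteristic function: sum_g p_g * exp(i*mu_g*tau - sigma_g^2 * tau^2 / 2)."""
    tau = np.asarray(tau, dtype=float)[:, None]
    return (p * np.exp(1j * mu * tau - 0.5 * sigma ** 2 * tau ** 2)).sum(axis=1)

# Illustrative G = 2 mixture; the weights and parameters are arbitrary, not estimates.
p = np.array([0.6, 0.4])
mu = np.array([0.2, 1.0])
sigma = np.array([0.3, 0.5])
tau = np.linspace(-3.0, 3.0, 61)
phi = mixture_cf(tau, p, mu, sigma)

# Compare with the empirical cf of draws from the same mixture
rng = np.random.default_rng(1)
comp = rng.choice(2, size=20000, p=p)
x = rng.normal(mu[comp], sigma[comp])
phi_hat = np.exp(1j * np.outer(tau, x)).mean(axis=1)
err = np.max(np.abs(phi - phi_hat))
```

In practice the weights and parameters would come from the nonlinear least squares fit of the stacked real and imaginary parts; here they are fixed so the check is deterministic.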
To obtain the density we use the Fast Fourier Transform (FFT) in (5); see Tsionas (2012). Specifically, let N = 2^Ñ for some integer Ñ, and define the sequence

z_k = (k − 1 − N/2)h, k = 1, …, N,  (18)

for some h > 0. The density can be expressed as

f(z) = (1/2π) ∫ e^{−ιτz} ϕ(τ) dτ = ∫ e^{−2πιωz} ϕ(2πω) dω.  (19)

As this is the inverse Fourier transform, it provides the density once we know the characteristic function. This is based on equations (6) and (7) of Mittnik et al. (1999). An approximation is obtained by rectangle integration:

f(z_k) ≈ s (−1)^{k−1−N/2} Σ_{n=1}^{N} (−1)^{n−1} ϕ(2πω_n) e^{−2πι(n−1)(k−1)/N},  (20)

where

ω_n = (n − 1 − N/2)s, s = 1/(Nh),  (21)

see Mittnik et al. (1999). Based on the results of Doganoglu and Mittnik (1998) and Mittnik et al. (1999), the error of approximation is close to 10^{−7}. Therefore, an efficient way to compute the density is to apply the FFT to the sequence (−1)^{n−1} ϕ(2πω_n) and normalize the kth element of the transform by s(−1)^{k−1−N/2}. In our computations, we use Ñ = 18 (yielding N = 262,144) and compute the transform with the FFT (Duhamel and Vetterli, 1990). To compute individual inefficiency scores, we need the distribution of inefficiency conditional on the data, which is given by

f(u_it | D) ∝ exp{−(r_it − u_it)²/(2σ̂_v²)} f_u(u_it),  (22)

where D denotes the data on y_it and x_it, and r_it = y_it − α̂_i − x_it'β̂ given estimates σ̂_v², α̂_i and β̂. The first term, exp{−(r_it − u_it)²/(2σ̂_v²)}, corresponds to the normality of v_it given u_it. The second term, f_u(u_it), is simply the density of u_it.
The first problem is to compute the density f_u(u_it) at points that are different from the u_k s (1 ≤ k ≤ N) used in the FFT. The values of the density can be computed easily using interpolation (Tsionas, 2012). The second problem is to compute the normalizing constants

c_it = ∫_0^∞ exp{−(r_it − u)²/(2σ̂_v²)} f_u(u) du,  (23)

using standard quadrature techniques. In turn, inefficiency scores can be computed as

û_it = E(u_it | r_it) = c_it^{−1} ∫_0^∞ u exp{−(r_it − u)²/(2σ̂_v²)} f_u(u) du,  (24)

again using quadrature techniques. The expression in (24) is simply the expected value of u_it from its posterior conditional distribution.
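As a sketch of the quadrature behind (24), the following code computes E(u | r) by trapezoid integration for an illustrative normal/half-normal pair, chosen because a closed-form benchmark of the Jondrow et al. type is then available; r, σ_v and σ_u are arbitrary values:

```python
import math
import numpy as np

def score_by_quadrature(r, sigma_v, sigma_u, n_grid=20001, u_max=5.0):
    """E(u | r) for the cost-frontier residual r = v + u, computed by trapezoid
    quadrature of u * f(u | r), with half-normal f_u (illustrative choice)."""
    u = np.linspace(0.0, u_max, n_grid)
    w = np.exp(-0.5 * (r - u) ** 2 / sigma_v ** 2) * np.exp(-0.5 * u ** 2 / sigma_u ** 2)
    return np.trapz(u * w, u) / np.trapz(w, u)

def score_closed_form(r, sigma_v, sigma_u):
    """Closed form for the normal/half-normal case (Jondrow et al.-type formula)."""
    s2 = sigma_u ** 2 + sigma_v ** 2
    mu_star = r * sigma_u ** 2 / s2
    sig_star = sigma_u * sigma_v / math.sqrt(s2)
    z = mu_star / sig_star
    pdf = math.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return mu_star + sig_star * pdf / cdf

q = score_by_quadrature(0.4, 0.3, 0.5)
c = score_closed_form(0.4, 0.3, 0.5)
```

In the paper's procedure f_u is of course not half-normal but the FFT-recovered density, evaluated on the quadrature grid by interpolation.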

Deconvolution in DEA
Clearly, a certain amount of noise (denoted v_it) is present in DEA scores, with the implication that the true inefficiency scores u_it differ from the measured ones. Therefore, we can write the DEA scores as

U_it = u_it + v_it,  (25)

and we assume that v_it ∼ i.i.d. N(0, σ_v²). Under this assumption, the distribution of the measured scores is the convolution

f_U(U) = ∫ f_u(U − v) σ_v^{−1} φ(v/σ_v) dv.  (26)

If σ_v were known, this integral could be computed using draws from a normal distribution, and the empirical density f̂_U(U) would be computed by standard kernel density techniques. To estimate σ_v we consider the log characteristic function

ln ϕ̂_U(τ) = −σ_v² τ²/2 + ln ϕ_u(τ) + ξ(τ),  (27)

for some error term ξ(τ), where ϕ̂_U(τ) is the empirical characteristic function of {U_it}. We can approximate ln ϕ_u(τ) using the same methods as in (9) and (16). As this, in turn, provides an estimate of σ_v, we can recover the density of u_it in a flexible way using (5) together with the FFT as implemented in (18)-(21).
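A minimal end-to-end sketch of this deconvolution follows, with σ_v treated as known (in practice it is estimated from the log characteristic function regression) and a simple spectral cutoff in place of the MNM approximation; all numbers are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
n, sigma_v = 20000, 0.2
u = np.abs(rng.normal(0.0, 0.5, n))     # "true" inefficiency (half-normal, illustrative)
U = u + rng.normal(0.0, sigma_v, n)     # noisy DEA-type scores

# Empirical cf of the scores, divided by the normal cf of the noise
tau = np.linspace(-8.0, 8.0, 801)
phi_U = np.array([np.exp(1j * t * U).mean() for t in tau])
phi_u = phi_U * np.exp(0.5 * sigma_v ** 2 * tau ** 2)   # deconvolution step

# Invert on a grid (plain trapezoid rule here; the paper uses the FFT scheme)
grid = np.linspace(-2.0, 4.0, 601)
f_u = np.trapz(np.exp(-1j * np.outer(grid, tau)) * phi_u, tau, axis=1).real / (2.0 * np.pi)

mass = np.trapz(f_u, grid)              # should be close to 1
mean_u = np.trapz(grid * f_u, grid)     # should be close to E(u) = 0.5 * sqrt(2/pi)
```

The cutoff at |τ| = 8 limits the noise amplification caused by dividing by exp(−σ_v²τ²/2); the recovered density exhibits small ripples but integrated quantities such as the mean are close to the truth.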

Monte Carlo evidence
To assess our new techniques we consider a production frontier in which the x_i s are generated from a lognormal distribution with location parameter 1 and standard deviation 2. We define the variance parameter σ² = σ_v² + σ_u² and the signal-to-noise ratio λ = σ_u/σ_v; the original parameters may be recovered as σ_v = σ/√(1 + λ²) and σ_u = λσ/√(1 + λ²). We examine various values of I, λ and σ with K = 10, which gives five input prices and five outputs. Since we have a production frontier we must take into account (4). In Table 1, Monte Carlo results for correlation coefficients between actual and predicted inefficiency are reported for the SFM and for output-oriented, variable-returns-to-scale DEA. We use 5,000 Monte Carlo replications. SFMs are estimated using the method of maximum likelihood for the MNM model, when the actual generating mechanism is a half-normal distribution, allowing for a maximum of 500 iterations; in case of non-convergence we draw another data set. The half-normal is an obvious and popular case to consider, and Monte Carlo results for other distributions are reported below. Moreover, when the distribution of the one-sided error component is known, the density recovered by application of the FFT is closely approximated by the MNM with an error of about 10^{−7}. DEA performs well when the amount of measurement error is low but not when noise is relatively large. Corrected ordinary least squares (COLS) is shown to be more sensitive to data noise, so that its results are inferior to those of DEA (see, however, Ruggiero, 1999, on this point). For this reason we give no further consideration to COLS.
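The mapping between (σ, λ) and the component parameters noted above can be sketched as:

```python
import numpy as np

def recover_sigmas(sigma, lam):
    """Given sigma^2 = sigma_v^2 + sigma_u^2 and lam = sigma_u / sigma_v,
    recover the component standard deviations."""
    sigma_v = sigma / np.sqrt(1.0 + lam ** 2)
    return sigma_v, lam * sigma_v

sigma_v, sigma_u = recover_sigmas(1.0, 0.5)
```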
In Table 2 we report mean absolute errors (MAE) between actual and estimated inefficiency. Another important issue is the sensitivity of results when the inputs and the two-sided error are correlated, i.e., when there is an endogeneity problem (this comment was provided by an anonymous reviewer). The x_i s are generated as previously, but they now depend on the two-sided error term. Each regressor is generated as x_i = x̃_i + ϵ_i, where the x̃_i s are generated from a lognormal distribution with location parameter 1 and standard deviation 2, as before.
The error term ϵ_i is normally distributed with zero mean and a variance chosen according to the correlation coefficient ρ between x̃_i and the two-sided error term. Our results are reported in Table 3, when λ = 0.5 and for different values of σ and ρ. FFT-SFM and FFT-DEA are based on the MNM model defined in (16). The BIC criterion favored G = 3 normal components. Based on the results in Table 3, MAEs increase with ρ and with σ but decrease with higher values of λ (Table 2). Even with ρ = 0.25, the results are acceptable, but as ρ increases to 0.5 or 0.75 the MAEs increase and, therefore, the results are not to be trusted. This is reasonable, as ignored endogeneity has a material impact on coefficient estimates and functions of interest when ρ exceeds 0.25 or so. Another issue (pointed out by an anonymous reviewer) is the performance of efficiency estimates when the wrong distribution is adopted for technical inefficiency. Again, we use the MNM model in (16).

Empirical application
Our empirical application uses panel data on large U.S. banks, for which we estimate an input distance function (IDF). The IDF is homogeneous of degree one in inputs, increasing in inputs, decreasing in outputs, concave in inputs, and satisfies D(x, y) ≤ 1. These constraints are enforced at ten randomly selected observed points and at the geometric means of the data, and we then check whether they hold at the majority of observations. Suppose x_it ∈ R^K_+ and y_it ∈ R^M_+ are, respectively, logs of inputs and outputs; a translog IDF is given as follows:

−x_it,1 = a_i + β_1' z_it + ½ z_it' B z_it + T_it + v_it − u_it,  (31)

where β_1 and B are a vector and a symmetric matrix of parameters, respectively, and a_i denotes bank effects. Let us define

z_it = [x_it,2 − x_it,1, …, x_it,K − x_it,1, y_it,1, y_it,2, …, y_it,M]',  (32)

where T_it denotes an approximation to temporal effects:

T_it = Σ_{τ=2}^{T} δ_{iτ} I(TR_it = τ),  (33)

where I(·) denotes the indicator function, and TR_it = t (for all i and t). The coefficients in (33) are firm-specific and we assume the following parameterization:

δ_{iτ} = γ' z̃_{iτ},  (34)

where z̃_iτ = [x_iτ,2 − x_iτ,1, …, x_iτ,K − x_iτ,1, y_iτ,1, y_iτ,2, …, y_iτ,M]' and γ is a conformable vector. The specification in (33) resembles the Baltagi and Griffin (1988) general index of technological change. The difference is that the coefficients are firm-specific due to the parameterization in (34). This makes it possible to estimate firm-specific as well as time-varying technical change (TC_it), given as

TC_it = T_it − T_{i,t−1}.  (35)

The differencing of inputs in (32) is due to the homogeneity of degree one in (31). Based on estimates of the model in (31), some banks experienced TC as high as 8.5% and EC as high as 4%; for other banks, TC and EC were as low as −4%.
All models yield different results for TC and PC and, therefore, for PG as well. We report sample distributions of efficiency change (EC) in panel (a) of Figure 7. In panel (b) we report sampling distributions of technical change (TC) under different distributional assumptions. The implication is that different specifications yield different results and, if we need to accept one model out of many, then we need to use the BIC or other techniques such as, for example, Bayes factors. This is not attempted here, as our main interest is in comparing different models rather than in selecting the "best" model. Second, adding environmental or contextual variables to inefficiency is an interesting avenue for further research that is not considered in this paper due to its complexity. Third, new challenges for the FFT are likely to arise if inefficiency is dynamic, in both DEA and SFA. Finally, Bayesian inference procedures are likely to be important in the class of models considered in this paper. The major obstacle is that Bayesian analysis using characteristic functions instead of densities is quite difficult. In this field, it would be useful to have results on Bayesian updating of characteristic functions corresponding to the familiar updating of densities.

Appendix A. Mathematical Appendix
For any random variable X, the characteristic function is defined as

ϕ_X(τ) = E(e^{ιτX}) = ∫ e^{ιτx} f_X(x) dx,  (A.1)

where ι = √−1, τ ∈ R, and f_X(x) is the probability density function of X evaluated at a point x. The characteristic function completely characterizes the distribution of X. If the characteristic function of a random variable X is integrable, then its distribution function is absolutely continuous, and therefore X has a probability density function, which is given by inverting (A.1):

f_X(x) = (1/2π) ∫ e^{−ιτx} ϕ_X(τ) dτ.  (A.2)

For a complete treatment, see Lukacs (1970). Some of the properties of the characteristic function are listed below.