Minimax regret priors for efficiency estimation



Introduction
The problem of specifying a distribution for inefficiency in stochastic frontier models (SFMs) is long-standing and well-known. Typical choices are the half-normal, the exponential, and the truncated-normal; a partial list of contributions includes Chen and Lin (2009) and Kao et al. (2019), among others. Tsionas (2003) proposed using DEA scores for crafting a prior for SFMs, an approach that was refined in Tsionas (2023).
In this paper we propose the use of a minimax regret empirical prior relative to the sample distribution of efficiency scores from Data Envelopment Analysis (DEA). Despite the fact that DEA does not account for noisy data, interval estimators of efficiency (a quantity naturally defined in (0, 1]) are likely to be more robust, and they therefore offer a natural candidate for crafting or benchmarking an (in)efficiency prior. These interval priors are closer to the idea of "thick frontiers" (Berger and Humphrey, 1992; Tsionas, 2019) in which classes rather than point inferences are used. The minimax prior is taken with respect to all distributions that are consistent with DEA interval estimators of inefficiency, and it can be used to mitigate concerns about the use of a single prior (be it, for example, a half-normal or DEA itself). For related work that uses DEA in SFA with non-standard efficiency distributions, see Campbell, Rogers and Rezek (2008), Rezek, Campbell and Rogers (2011), Macedo and Scotto (2014), and Macedo, Silva and Scotto (2014).
The term "prior" in conjunction with inefficiency is used as the latter represent missing data.We consider the minimax regret prior as an empirical prior for inefficiencies as well as other SFM parameters and we use DEA and maximum likelihood estimation to craft this empirical (Bayes) prior.Given that inefficiencies are not observed and they are coarse, any given interval is compatible with different "data" on inefficiency, making the construction of a single prior problematic.When observations are coarse and may overlap, it is not clear how to define the likelihood function: several options exist, according to whether we take into account the measurement process governing the incompleteness or not.In this paper, we focus on optimizing the likelihood function that we should have observed, had observations been precise.Due to incomplete observations, this likelihood function is imprecise, since there are several possible precise datasets compatible with the coarse observations.Two approaches have been proposed: one considers an optimistic point of view aiming to disambiguate the data, by maximizing the maximum likelihood value across candidate datasets.Another more cautious one tries to maximize the minimum likelihood value across candidate datasets thus adopting a robust optimization approach.Both approaches have their weaknesses and can be criticized as being extreme ones, yielding either too deterministic or too dispersed distribution functions.In this paper we propose an alternative criterion that tries to define a trade-off between the two previous approaches, and can be seen as minimizing a maximal regret criterion.Finally, we apply the new techniques to a dataset of large U.S. banks to showcase the applicability of minimax regret priors.

Formulation
We consider a stochastic frontier model (SFM) of the general form (1), in which, in addition to the regressors, v_i represents measurement error (noise) and u_i stands for technical inefficiency. We denote efficiency by r_i = e^{-u_i} ∈ (0, 1] (i = 1, ..., n). More specifically, we consider the class of input distance functions (IDFs). Suppose the inputs are denoted X_i = (X_{i1}, ..., X_{iI}) ∈ R_+^I, not to be confused with the regressors x_i, and the outputs are denoted Y_i = (Y_{i1}, ..., Y_{iJ}) ∈ R_+^J. When there is no risk of confusion we omit the observation index i (i = 1, ..., n). Let T denote the set of feasible production plans. The IDF is defined as the maximal radial contraction of inputs that keeps the production plan feasible, D(X, Y) = max{ρ > 0 : (X/ρ, Y) ∈ T}, and it has several properties: it is homogeneous of degree one in inputs, increasing in inputs, decreasing in outputs, and concave in inputs. It can be shown that a translog approximation to an IDF takes the form in (4) (see also O'Donnell, 2018, p. 284), where X̃_{ik} = X_{ik}/X_{i1} and α_0, (α_k, α_{kk}, β_{mm}, γ_{km}, δ_{kl}, ζ_{ml}) are unknown parameters; w_i = (w_{il}, l = 1, ..., L) (i = 1, ..., n) is an L × 1 vector of exogenous variables that includes firm-specific time effects, a time trend, and possibly quasi-fixed inputs. A full translog functional form is used to incorporate this vector into the analysis. Notice that (4) is in the form of (1). To benchmark a prior, we use DEA with the given sets of inputs and outputs. DEA provides a set of inefficiency scores or estimates u_i^{DEA} for u_i (and efficiency scores R_i^{DEA} = e^{-u_i^{DEA}}), which ignore the presence of noise as well as any simplifications arising from knowing, at least approximately, the functional form. For example, the translog is known to be second-order accurate for an arbitrary IDF. Our purpose is to use the efficiency estimates R_i^{DEA} for benchmarking a prior p_R(R_i), from which a prior p_u(u) on inefficiency u_i = -log r_i can be computed by a change of variables. Although DEA delivers numerical estimates, we prefer to use the concept of "thick frontiers" to mitigate the problem of noise in DEA. Specifically, we assume that efficiency scores fall in r distinct groups,
defined by (0, ā_1), [ā_1, ā_2), ..., [ā_{r-1}, ā_r), [ā_r, 1].
Despite the influence of noise, the thick frontier is less affected by it, so interval estimators of efficiency should be closer to reality. As we are dealing with crafting a prior, we need not worry about true inefficiency falling in the correct interval all the time. The efficiency groups can be transformed to inefficiency intervals: with a_j = -log ā_j, we obtain [0, a_r), [a_r, a_{r-1}), ..., [a_2, a_1), [a_1, ∞). The intervals are fixed but their number 1 + r can vary, so we can choose r. In selecting r, we must keep in mind that finer intervals are less robust to noise, while coarser intervals are more robust but less informative in terms of locating inefficiency. Our purpose is to adopt a prior which has minimax regret with respect to all priors compatible with the DEA estimates.
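As a concrete sketch of this construction (with hypothetical cut points and simulated scores standing in for the DEA output), the following bins efficiency scores into the thick-frontier groups and maps the cut points to inefficiency space via a_j = -log ā_j; the change-of-variables relation appears as a comment:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical DEA efficiency scores in (0, 1] (placeholders, not the paper's data)
R_dea = np.clip(rng.beta(8, 2, size=200), 1e-6, 1.0)

r = 10                                       # number of interior cut points
a_bar = np.linspace(0.0, 1.0, r + 2)[1:-1]   # 0 < a_bar_1 < ... < a_bar_r < 1

# Efficiency groups (0, a_bar_1), [a_bar_1, a_bar_2), ..., [a_bar_r, 1]:
# np.digitize assigns each score to one of the 1 + r groups.
group = np.digitize(R_dea, a_bar)

# Inefficiency cut points a_j = -log(a_bar_j); note that a prior p_R on
# efficiency R induces p_u(u) = p_R(exp(-u)) * exp(-u) on u = -log(R).
a = -np.log(a_bar)

# Empirical share of scores per group: the sample distribution against which
# the minimax regret prior is benchmarked.
shares = np.bincount(group, minlength=r + 1) / R_dea.size
print(shares.round(3), a.round(3))
```

The group shares give the sample distribution with which all priors compatible with the intervals are later compared.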

The minimax regret empirical prior
To summarize, for a random variable u representing inefficiency we consider an experiment where u is unobserved but there is an estimator û which provides incomplete observations for inefficiency, taking values in s(U), the set of all subsets of the support U. We assume Û = {A_1, ..., A_r}, A_i ⊆ U, albeit in our problem r = M. Our question is: what is the best choice for the distribution P(u) given the estimates? This choice concerns the selection of a prior probability distribution to be used subsequently in a more elaborate analysis; in other words, the estimator û is considered as a rough guide. We write the sample of observations as û = (G_1, ..., G_n), with G_j ∈ Û representing the estimate that inefficiency lies in group or interval j, and u_j denoting the jth unobserved outcome of u = (u_1, ..., u_n), generated through the random process u. Notice that G_j merely provides a group or interval to which inefficiency belongs, which is broadly consistent with the idea of "thick frontiers". Let also U(û) denote the set of complete samples u compatible with the observations û. As usual in cases involving latent data, we have three likelihood functions, namely: 1. The observed sample in Û: P(û|θ) = ∏_{i=1}^n p(û_i|θ), where p(û_i|θ) is a density.

2. The latent sample in U: P(u|θ) = ∏_{i=1}^n p(u_i|θ). 3. The complete sample of pairs z = ((u_1, û_1), ..., (u_n, û_n)): P(z|θ) = ∏_{i=1}^n p(u_i, û_i|θ).
In this discussion we are interested in the second option, viz., the latent sample in U.
We will not consider these problems in detail; we only refer to Guillaume and Dubois (2019), where they are analyzed, albeit not in the context of efficiency estimation. It suffices to say that both approaches are extreme, yielding either "too deterministic" or "too dispersed" distribution functions. Suppose n_k is the number of appearances of a_k in u, q_j is the number of observations of A_j in û, and n_ik is the number of times that (a_i, A_k) appears in the complete sample z. It follows that u ∈ U(û) if and only if certain counting conditions are satisfied (Guillaume and Dubois, 2019); these are the constraints in (5). Using log-likelihoods and letting p_k^θ = Pr(U = a_k|θ), the two problems can be written as in (6), subject to the constraints in (5). If we let θ = (p_1, ..., p_M) be a probability mass function, we have the probability mass assignment pmf(A_j) = q_j/n (j = 1, ..., r), and the optimal solution to the maximin likelihood problem, subject to the first three constraints of (5) and n_k ∈ R_+, is the distribution with maximal entropy. The interpretation is that we try to find the value of θ that reaches the best compromise between the various ideal values θ̂_u for all u in agreement with the estimates in û. The problem of minimizing maximal regret is equivalent to a mathematical programming model in which V(·) signifies the vertices of a set; the problem can be solved only numerically using standard solvers. In our application we set r = 10, the value that maximizes the log marginal likelihood in our estimated model. The model is estimated using Bayesian techniques organized around Markov chain Monte Carlo (MCMC) with 15,000 iterations, the first 5,000 of which are discarded in the interest of mitigating possible start-up effects.
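The maximal-entropy solution can be computed directly as a convex program over the allocation of the coarse counts to support points. The sketch below is illustrative (the support sets A_j and counts q_j are hypothetical), using scipy's SLSQP solver:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical coarse sample: M support points and observed sets A_j with counts q_j.
M = 4
A = [{0}, {0, 1}, {1, 2, 3}, {3}]     # sets of support indices
q = np.array([5, 10, 12, 3])          # counts of each coarse observation
n = q.sum()

# Feasible completions: n_k = sum_j n_{jk}, with n_{jk} >= 0 supported on A_j
# and sum_k n_{jk} = q_j.  Parameterise by the allocation matrix, maximise entropy.
def neg_entropy(x):
    p = np.clip(x.reshape(len(A), M).sum(axis=0) / n, 1e-12, 1)
    return float(np.sum(p * np.log(p)))   # minimising this maximises entropy

mask = np.zeros((len(A), M))
for j, s in enumerate(A):
    mask[j, list(s)] = 1.0

cons = [{"type": "eq", "fun": lambda x, j=j: x.reshape(len(A), M)[j].sum() - q[j]}
        for j in range(len(A))]
bounds = [(0, None) if mask.flat[i] else (0, 0) for i in range(mask.size)]
x0 = (mask * (q / mask.sum(axis=1))[:, None]).ravel()  # uniform split within sets

res = minimize(neg_entropy, x0, bounds=bounds, constraints=cons, method="SLSQP")
p_maxent = res.x.reshape(len(A), M).sum(axis=0) / n
print(p_maxent.round(3))
```

The minimal-entropy (maximax) counterpart is obtained by flipping the sign of the objective; the minimax regret prior then searches between these two extremes.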
Flat priors are used on the translog parameters and the two-sided error standard deviation.

The minimax regret prior for all other parameters
In the SFM given by (1), the parameters β and σ_v can also be assigned a minimax regret prior, constructed from the LS residuals e_i = y_i - x_i b. An alternative is to estimate (1) by maximum likelihood using any conventional distributional assumption, e.g., half-normality, which provides consistent standard errors S as well. Using the 95% confidence intervals b_j ± 2S_j (j = 1, ..., d), it is possible to place r points in them (the two extremes being the endpoints of the intervals). In turn, using the methodology of Section 2.2, it is possible to compute the minimax regret empirical prior of β_j conditional on all other βs, denoted β_{-j}, as well as σ_v and U; viz., we obtain the conditional prior in (10). The prior of σ_v can be constructed in the same way. In connection with (4), we can use the same methodology or argue that, in this instance, the βs do not have a structural interpretation, and it is therefore better to place priors on functions of interest like returns-to-scale (RTS), technical change (TC), and efficiency change (EC). These functions are linear in β and provide a set of linear functions that we denote Aβ, where A has dimensionality F × d, F is the number of functions of interest (F = 4 in this case), and β is d × 1. A class of possible priors is given in (11), where r is an F × 1 vector and I_F denotes the F × F identity matrix. As the properties of a prior like (11) are well understood, we proceed with the former option. We assume that r_{(β,σ_v)} is the common value of r corresponding to all conditional priors of β and σ_v in (10).
We assume a common value merely for convenience; in the empirical application it was found to be 8 when we maximize the log marginal likelihood of the model. With a full minimax regret empirical prior on β, σ_v, and U, we proceed to showcase the new techniques on actual data from large U.S. banks.
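The construction of the support points for these parameter priors is straightforward; the sketch below (with hypothetical estimates b and standard errors S) places r equally spaced points in each interval b_j ± 2S_j, with the extremes at the endpoints:

```python
import numpy as np

# Hypothetical LS/ML estimates and standard errors (placeholders)
b = np.array([0.80, 0.25, -0.10])
S = np.array([0.05, 0.02, 0.04])
r = 8   # common number of support points, the value chosen by marginal likelihood

# r equally spaced support points spanning each 95% interval b_j +/- 2 S_j,
# with the two extremes at the interval endpoints; shape (r, d)
support = np.linspace(b - 2 * S, b + 2 * S, num=r)
print(support[:, 0].round(3))
```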

Monte Carlo study
To examine the properties of the minimax regret empirical prior, we consider a Monte Carlo study using model (4), retaining only the first-order terms. We therefore have a Cobb-Douglas approximation to an IDF (see O'Donnell, 2018, p. 284 on this point, where it is argued that if there is more than one output and output sets are bounded, then IDFs cannot be exact Cobb-Douglas functions). We have five inputs and five outputs, as in our empirical application in the next Section. The regressors are generated by bootstrapping with replacement the dataset of the next Section, so that we have a more realistic scenario. The coefficients α_k are set to 1/4 and the coefficients β_j to 1/5, and we consider values of σ_v ∈ {0.01, 0.05, 0.1, 0.3}. The sample size is n ∈ {100, 250, 500, 1000, 2500, 5000}. Inefficiency is generated from a gamma distribution G(a, c) with shape a = 2/3 and rate c = 1, so its mean is a/c = 2/3. The density of efficiency is reported in Figure 1. We use r = 10 supported points both for the inefficiencies and for the parameters of the SFM, as this value turned out quite reasonable in the empirical application. We consider 5,000 Monte Carlo replications. For each replication we use 15,000 MCMC passes, omitting the first 5,000, and starting values are obtained from maximum likelihood (ML) with half-normally distributed inefficiencies. For ML we know that the estimates of inefficiency are generally consistent unless we have time-invariant fixed effects (Greene, 2005). For DEA we know that efficiency scores converge very slowly, at a rate that deteriorates with the total number of inputs and outputs. This information is enough to benchmark our own estimates. Before proceeding, however, it is interesting to examine what the minimax regret priors look like. In Figure 2 we present the joint density of the regression parameters α_1 and α_2 (panels (a) and (b)) and their marginals (panels (c) and (d)). Clearly, the two parameters presented here are positively correlated, as they are based on the data. Moreover, the marginals and the joint density
show that the empirical prior is not symmetric and, therefore, cannot be close to a bivariate normal. More evidence of non-normality is provided by the marginal prior of σ_v reported in panel (e), which is clearly multimodal. Multimodality is the result of the compromising role of the minimax regret prior, which balances the maximax and maximin priors defined in the previous Section. As the minimax regret empirical prior balances two extremes (uniformity on the one hand and Dirac masses on the other), multimodality is not overly surprising, and it is also evident in the densities of α_3 and α_4 presented in panels (f), (g), (h), and (i). Our Monte Carlo results are reported in Table 1, where we show mean squared errors of inefficiency estimates. The surprising feature of these results is that the mean squared error of inefficiency estimates, for a given value of σ_v, goes down at the rate n instead of the usual √n, a fact that can be justified if we recall that the prior is empirical, that is, it is based on the same data as the posterior.
One may argue that the results in Table 1 are too optimistic. To show that this is not the case, we report additional results in Table 2 with values of σ_v closer to σ_U. Although the mean squared errors are higher, the general tendency of decreasing at rate n rather than √n continues to apply. From the Monte Carlo study, we may conclude that the efficiency estimates converge rather quickly (as opposed to DEA, which converges very slowly relative to the total number of inputs and outputs, I + J) and that they are essentially unbiased and consistent. This important property means that minimax regret priors inherit the advantages of DEA as a nonparametric estimator and the advantages of SFA in terms of handling noise. Additionally, the efficiency estimators converge as a function of the sample size rather than of I + J. This shows that the minimax regret prior can effectively combine the virtues of both DEA and SFA in providing accurate estimates of technical inefficiency.
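The data-generating process of the Monte Carlo study can be sketched as follows; the regressors here are simulated placeholders (the paper bootstraps the bank data instead), and the sign convention for the composed error is an assumption of this sketch:

```python
import numpy as np

rng = np.random.default_rng(42)
n, sigma_v = 500, 0.05

# Inefficiency u ~ Gamma(shape a = 2/3, rate c = 1), so E[u] = a/c = 2/3
u = rng.gamma(shape=2 / 3, scale=1.0, size=n)
v = rng.normal(0.0, sigma_v, size=n)      # two-sided noise

# First-order (Cobb-Douglas) IDF approximation: 4 log input ratios (homogeneity
# in inputs drops one input) and 5 log outputs; simulated regressors stand in
# for the bootstrapped bank data used in the paper.
X = rng.normal(0.0, 1.0, size=(n, 4))     # log input ratios
Y = rng.normal(0.0, 1.0, size=(n, 5))     # log outputs
alpha = np.full(4, 0.25)                  # coefficients set to 1/4
beta = np.full(5, 0.20)                   # coefficients set to 1/5
y = X @ alpha + Y @ beta + v + u          # composed error: noise + inefficiency

efficiency = np.exp(-u)                   # r_i = exp(-u_i) in (0, 1]
print(round(float(u.mean()), 2), round(float(efficiency.mean()), 2))
```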

Data
To illustrate the new techniques we use U.S. banking data previously used by Malikov, Kumbhakar and Tsionas (2015). As in their paper, we focus on a selected subsample of relatively homogeneous large banks, i.e., those with total assets in excess of $1 billion (2005 US dollars) in the first three years of observation. We have the following outputs: consumer loans (y1), real estate loans (y2), commercial and industrial loans (y3), securities (y4), and off-balance-sheet income (y5); see Berger and Mester (1997, 2003) and Hughes and Mester (1998, 2013). The inputs are labor, i.e., the number of full-time equivalent employees (x1), physical capital (x2), purchased funds (x3), interest-bearing transaction accounts (x4), and non-transaction accounts (x5). In terms of policy implications, the new model is likely to be useful, as we do not rely on a parametric distribution for inefficiency but rather anchor on DEA (a popular approach in most policy discussions and empirical applications) and a minimax regret approach to construct the (prior) model. Therefore, inefficiency estimates share some of the good properties of DEA (a nonparametric specification) while avoiding some of its drawbacks, like slow convergence in the total number of inputs and outputs. In this example, as DEA overestimates efficiency, we find that the minimax regret prior for incomplete data provides lower efficiency and brings results closer to what is known from previous studies; specifically, bank efficiencies are in the neighborhood of 80-85%. Additionally, we provide a perspective with respect to the following idea. Although DEA may be biased, it is nevertheless biased in the right direction, at least in this example, so policy recommendations to reduce inefficiency will not surprise the decision maker in the wrong direction. In other words, if according to DEA inefficiency is 30% and in fact it is 20%, and policy measures are taken to reduce inefficiency by the true amount (20%), then the decision maker will not be disappointed. In this sense DEA is a useful benchmark for policy analysis. However, SFA often performs much better due to better statistical properties, so a combination of both SFA
and DEA is to be recommended in practice. One way to accomplish this task is to craft a prior based on the minimax regret criterion for incomplete data and use this prior specification as a distribution in SFA. To conclude this discussion, having access to the most reliable information (e.g., from this new, essentially unbiased and consistent estimator) allows the use of adequate and proportional policy measures to reduce the true amount of inefficiency.

Concluding remarks
In this paper we construct a minimax regret prior for inefficiencies in SFMs (i) to avoid the specific distributional assumptions which are invariably made in the literature, and (ii) to avoid the use of a single prior on the unobserved or "missing data" inefficiencies. The minimax regret prior is taken with respect to all priors consistent with interval estimates of inefficiencies from DEA. Interval estimation is used as it makes DEA more robust to the presence of noise. The minimax regret inefficiency prior is extended to include the parameters of the SFM. MCMC methods are developed, and the new techniques are applied successfully to a dataset on large U.S. banks.

The maximal-entropy problem is max −∑_{k=1}^{M} (n_k/n) log(n_k/n). The solution to the second problem in (6) is the solution with minimal entropy, namely the solution to min −∑_{k=1}^{M} (n_k/n) log(n_k/n), subject to the first three constraints of (5). The maximax approach tends to resolve uncertainty very strongly, yielding Dirac distributions consistent with the estimates. The maximin approach yields distributions with high variances, interpreting incomplete information as the result of extreme randomness: the less information, the larger the variance, which is not fully satisfactory either. Suppose θ̂_u is the maximum likelihood estimator of θ when u is observed. The minimax relative regret problem is max_{θ∈Θ} min_{u∈U(û)} P(u|θ)/P(u|θ̂_u).
on inefficiencies U = (u_1, ..., u_n). With the exception of the intercept, we know that least squares (LS) estimates of β are consistent. Suppose the vector b contains the LS parameter estimates ignoring technical inefficiency and S is the vector of LS standard errors. Then we know that the intervals b ± 2S contain the true parameters with sampling probability 95% (element-wise). We assume that for the intercept a consistent estimator and its standard error are available in the first elements of b and S, viz., b_1 and S_1. Such a consistent estimator is provided by b_1 := b_1 + min_{i=1,...,n} e_i, where e_i = y_i − x_i b are the LS residuals.
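This intercept correction can be illustrated on simulated data (the data-generating process below is hypothetical, and the composed-error sign convention is an assumption of the sketch): LS is consistent for the slope, but the intercept absorbs the mean of inefficiency, and shifting it by the minimum residual recovers the frontier intercept.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical data: y = b0 + b1*x + v + u with u >= 0 (inefficiency)
n = 2000
x = rng.normal(size=n)
u = rng.exponential(0.3, size=n)
y = 1.0 + 0.5 * x + rng.normal(0, 0.05, size=n) + u

# LS ignoring inefficiency: the slope is consistent, the intercept absorbs E[u]
Xmat = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(Xmat, y, rcond=None)
e = y - Xmat @ b                    # LS residuals

# Corrected (COLS-style) intercept: b_1 := b_1 + min_i e_i shifts the fitted
# line down to the frontier, so the intercept is consistently estimated
b_corrected = b[0] + e.min()
print(round(b[0], 2), round(b_corrected, 2))
```

With the true frontier intercept equal to 1.0, the raw LS intercept sits near 1.0 + E[u] = 1.3, while the corrected value is close to 1.0.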

Figure 2:
Figure 2: Minimax regret empirical priors of selected parameters. For comparison purposes we also estimate a translog output distance function (ODF) using the same inputs and outputs. To account for technical change we include a time trend. In panel (a) of Figure 3, reported are sample distributions of posterior-mean estimates of technical efficiency; DEA corresponds to input-oriented Data Envelopment Analysis. We also consider SFMs with exponentially, half-normally, and truncated-normally distributed inefficiencies (for the priors of β and σ_v in this case, see Footnote 5; for the shape parameters of the distributions we assume flat priors). In panel (b) reported are sample distributions of posterior-mean estimates of returns to scale (RTS). In panel (c) reported are sample distributions of posterior-mean estimates of technical change (TC). In panel (d) we present sample distributions of posterior-mean estimates of efficiency change (EC). Finally, in panel (e) we present the actual minimax regret prior (using linear interpolation) compared to the DEA sample distribution of efficiency scores. Clearly, there are important differences when a minimax regret prior is placed on the inefficiencies. RTS, TC, and EC are different and, from panel (e), the minimax regret prior is strikingly different from DEA despite the fact that the former is based on the latter. In turn, the choice of a minimax regret prior materially affects all measures of interest like RTS, TC, etc.

Figure 3:
Figure 3: Distributions of interest

Notes:
In panel (a) reported are sample distributions of posterior-mean estimates of technical efficiency; DEA corresponds to input-oriented Data Envelopment Analysis. In panel (b) reported are sample distributions of posterior-mean estimates of returns to scale (RTS). In panel (c) reported are sample distributions of posterior-mean estimates of technical change (TC). In panel (d) we present sample distributions of posterior-mean estimates of efficiency change (EC). In panel (e) we present the actual minimax regret prior (using linear interpolation) compared to the DEA sample distribution of efficiency scores.

Table 1:
Monte Carlo results I

Table 2:
Monte Carlo results II

The data on commercial banks come from Call Reports available from the Federal Reserve Bank of Chicago and include all FDIC-insured commercial banks with reported data for 2001:Q1-2019:Q4. We have an unbalanced panel with 3,897 bank-year observations for 285 banks. We deflate all nominal stock variables to 2005 US dollars using the consumer price index for all urban consumers.