A generalized true random-effects model with spatially autocorrelated persistent and transient inefficiency

This study extends the generalized true random-effects model to account for spatial dependence in persistent and transient inefficiency. For this purpose, a model with spatially autocorrelated persistent and transient inefficiency components is specified. Additionally, spatial dependence is also modeled in the noise component to account for uncontrolled spatial correlations. The proposed model is applied to a panel dataset of Wisconsin dairy farms observed between 2009 and 2017 and estimated using Bayesian techniques. Apart from the traditional output-input quantities, the utilized dataset also contains information on the exact location of farms based on their latitude and longitude coordinates as well as on environmental factors. The empirical findings suggest low levels of both persistent and transient inefficiency for farms. Additionally, all components exhibit spatial dependence with its magnitude being more than double for persistent inefficiency. © 2021 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license ( http://creativecommons.org/licenses/by/4.0/ )


Introduction
In neoclassical economics, the theory of the firm states that producers successfully optimize their production processes. However, irrespective of the firms' objectives (cost minimization or profit maximization), this assumption rarely holds in practice. This can be due to governmental regulation, poor management practices or even unforeseen events that are outside the control of producers. Therefore, empirical studies have focused on quantifying deviations of observed from optimal production. In a parametric setting, the measurement of shortfalls in production is achieved using the technique of Stochastic Frontier Analysis (SFA), introduced simultaneously by Aigner, Lovell, and Schmidt (1977) and Meeusen and van den Broeck (1977). The SFA model recognizes that firms may not operate on the frontier due to pure inefficiency and noise. The former is captured by a one-sided non-negative inefficiency component and the latter by a two-sided error term.
Since its introduction, the SFA tool has undergone several amendments, mainly related to the distributional assumptions imposed on inefficiency. Also, the availability of panel data enabled empirical studies to capture changes in firms' performance over time by specifying the inefficiency component as time-varying. Nevertheless, the main challenge when panel data are available lies in controlling for unobserved (time-invariant) heterogeneity. With this objective in mind, firm effects were accounted for by including an additional time-invariant component in the SFA model. The main dilemma of studies following this practice was whether to treat these firm effects as (persistent or time-invariant) inefficiency or not. For instance, Kumbhakar (1991) and Kumbhakar and Heshmati (1995) treated firm effects as persistent inefficiency, while Greene (2005a) and Greene (2005b) assumed that these effects are not part of inefficiency.
This methodological conflict led to the simultaneous introduction of a new state-of-the-art model by Colombi, Kumbhakar, Martini, and Vittadini (2014), Kumbhakar, Lien, and Hardaker (2014) and Tsionas and Kumbhakar (2014), called the Generalized True Random-Effects (GTRE) model. This model separates the time-invariant firm effects into a (two-sided) random firm effect that captures unobserved heterogeneity and a (one-sided) persistent inefficiency effect. Overall, the GTRE model therefore contains two two-sided noise components (one time-invariant and one time-varying) and two one-sided inefficiency components (one time-invariant or persistent and one time-varying or transient). The need for including time-invariant and time-varying noise terms is obvious when panel data are at hand. The question is why one needs two inefficiency components. Below we provide some lines of reasoning. On the one hand, persistent inefficiency is a long-run concept and can be due to rigidities producers face with regard to reorganizing their production processes. For instance, quasi-fixed factors of production such as capital cannot be instantaneously altered due to the existence of adjustment costs (Stefanou, 2009; Silva, Oude Lansink, & Stefanou, 2015). These adjustment costs can be either pecuniary in nature (e.g. high debts) or related to learning (Skevas, Emvalomatis, & Brümmer, 2018a). In either case, these costs can result in persistent inefficiency for firms. On the other hand, transient inefficiency is a short-run concept related, for instance, to management qualities that can vary from year to year and can cause temporal changes in inefficiency (Tsionas & Kumbhakar, 2014).
Recognizing the need to account for these two inefficiency components, as motivated above, the GTRE model has recently been used in several empirical studies, including Filippini and Greene (2016), Skevas, Emvalomatis, and Brümmer (2018c), Lien, Kumbhakar, and Alem (2018) and Adom and Adams (2020).
Another direction of the efficiency measurement literature that runs in parallel concerns the identification of spatial dependencies in firms' inefficiency levels. The assumption of the existence of spatial relationships between neighboring units originates from Tobler (1970, p. 236), with his first law of geography stating that "everything is related to everything else, but near things are more related than distant things". Recently, a plethora of empirical studies have emerged that account for spatial dependencies in firms' inefficiencies, including the studies of Areal, Balcombe, and Tiffin (2012), Fusco and Vidoli (2013), Tsionas and Michaelides (2016), Pede, Areal, Singbo, McKinley, and Kajisa (2018) and Skevas (2020). All the aforementioned studies specify a spatial autoregressive process on inefficiency, assuming that each individual's inefficiency depends on neighbors' inefficiency levels plus an individual noise component. The motivation for looking for spatial dependencies in firms' inefficiency scores stems from the similar preferences or tastes of producers who own land in the same area (Skevas, Skevas, & Swinton, 2018d) or the potential communication between neighboring producers regarding their production decisions/practices, the use of (new) technologies and the flow of knowledge regarding the use of resources (Skevas & Oude Lansink, 2020; Skevas, 2020; Schneider, Skevas, & Oude Lansink, 2021). Furthermore, spatial dependence in producers' inefficiency levels may arise not only from imitating behavior but also from cases whereby producers are advised by common local consultants and/or are members of the same local cooperative (Orea & Álvarez, 2019).
Given that studies employing the GTRE model ignore the aforementioned spatial efficiency studies, and vice versa, this article blends the state-of-the-art GTRE model with the spatial autoregressive efficiency model, making it the first to provide empirical evidence on the existence of spatial dependence in firms' persistent and transient inefficiencies. Questions such as whether both inefficiencies are spatially dependent and, if so, which component exhibits higher spatial dependence are explored in the present article. Furthermore, spatial dependence in the noise component is taken into account in order to capture correlations in factors that are outside the control of producers. The proposed model is applied to the case of Wisconsin dairy farms. The available panel dataset contains information on farms' output, inputs and geographical coordinates of latitude and longitude, thus allowing us to explore their exact location and identify neighboring producers. Additionally, environmental factors are used to account for observed environmental heterogeneity across farms. The next section presents the model, the Bayesian estimation method and a simulation study. A description of the data along with information on the empirical specification follows. The results are then presented and the final section provides some concluding remarks.

Model
We first introduce some notation. Let $i = 1, 2, \ldots, N$ and $t = 1, 2, \ldots, T$ indicate individuals and time periods, respectively. Let $y_{it}$ denote the log output of an individual $i$ in time $t$, $x_{it}$ denote a $K \times 1$ vector of log inputs (including an intercept) of an individual $i$ in time $t$ and $z_{it}$ denote an $L \times 1$ vector of log environmental characteristics of an individual $i$ in time $t$. The GTRE model introduced by Colombi et al. (2014), Kumbhakar et al. (2014) and Tsionas and Kumbhakar (2014), adjusted for environmental factors as in O'Donnell (2016) and Njuki, Bravo-Ureta, and Cabrera (2020), in a production function setting is written as:

$$ y_{it} = x_{it}'\beta + z_{it}'\delta + \alpha_i + v_{it} - \eta_i^{+} - u_{it}^{+}, \qquad (1) $$

where $\beta$ and $\delta$ are $K \times 1$ and $L \times 1$ vectors of parameters to be estimated, respectively, and the remaining terms are error components, with the superscript $+$ denoting non-negative values. The term $\alpha_i$ is a time-invariant random firm effect that captures unobserved heterogeneity, $v_{it}$ is a time-varying noise component, $\eta_i^{+}$ represents the time-invariant persistent inefficiency and $u_{it}^{+}$ captures the time-varying transient inefficiency. The typical normality assumption is made for the firm effect $\alpha_i$:

$$ \alpha_i \sim \text{i.i.d. } N(0, \sigma_\alpha^2), $$

where "i.i.d." stands for "independent and identically distributed" and $\sigma_\alpha^2$ denotes the variance of the firm effect. Additionally, distributional assumptions need to be made for the inefficiency components $\eta_i^{+}$ and $u_{it}^{+}$. The typical approach is to assume that both of them follow a half-normal distribution (Tsionas & Kumbhakar, 2014).
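However, given that the present paper aims to account for spatial dependence in the inefficiency components, their distributional assumptions are complemented by a spatial component. Following Fusco and Vidoli (2013), the main idea is that the inefficiency of an individual $i$ depends on the inefficiencies of the neighboring individuals $j = 1, 2, \ldots, N$ plus an individual noise component. Applying this approach to the GTRE model implies that this spatial structure needs to be imposed on both the persistent and the transient inefficiency components. Therefore, persistent inefficiency of an individual $i$ is allowed to be a linear combination of neighbors' persistent inefficiencies and an individual noise component as follows:

$$ \eta_i^{+} = \lambda \sum_{j=1}^{N} w_{ij}\, \eta_j^{+} + \tilde{\eta}_i, $$

where $w_{ij}$ is the $j$-th element of the standardized $i$-th row of an $N \times N$ spatial weights matrix $W$ (whose specification is described in the data section), $\lambda$ is a parameter that measures the strength of spatial dependence and $\tilde{\eta}_i$ is a noise component assumed to be i.i.d. $N(0, \sigma^2_{\tilde{\eta}})$. Therefore, the typical half-normal distribution imposed on persistent inefficiency becomes:

$$ \eta_i^{+} - \lambda \sum_{j=1}^{N} w_{ij}\, \eta_j^{+} \sim N^{+}(0, \sigma^2_{\tilde{\eta}}). $$

Moving the spatial component to the right-hand side yields:

$$ \eta_i^{+} \sim N^{+}\Big(\lambda \sum_{j=1}^{N} w_{ij}\, \eta_j^{+},\ \sigma^2_{\tilde{\eta}}\Big). $$

Similarly, transient inefficiency of an individual $i$ in period $t$ is allowed to be a linear combination of neighbors' transient inefficiencies and an individual noise component:

$$ u_{it}^{+} = \rho \sum_{j=1}^{N} w_{ij}\, u_{jt}^{+} + \tilde{u}_{it}, $$

where $\rho$ measures the strength of spatial dependence and $\tilde{u}_{it}$ is a noise component assumed to be i.i.d. $N(0, \sigma^2_{\tilde{u}})$. In the same spirit, the half-normal distribution imposed on transient inefficiency is:

$$ u_{it}^{+} \sim N^{+}\Big(\rho \sum_{j=1}^{N} w_{ij}\, u_{jt}^{+},\ \sigma^2_{\tilde{u}}\Big), $$

JID: EOR [m5G;January 16, 2021;19:34 ]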

ARTICLE IN PRESS
where $\sigma^2_{\tilde{u}}$ is a variance component. Finally, besides allowing for spatial dependence in the inefficiency components, we also account for spatial dependence in the noise term $v_{it}$. This is because, apart from accounting for "behavioral correlations" through the specification of spatial dependence in the inefficiencies, we also recognize that there may exist "uncontrolled correlations" among individuals, a procedure also followed by Orea and Álvarez (2019). Note also that allowing the two-sided random noise term to be spatially dependent is in fact equivalent to allowing for spatial dependence in the dependent (output) and independent (inputs) variables (Gibbons & Overman, 2012), which gives us the opportunity to specify a more parsimonious model than one that accounts for spatial dependence in several frontier parameters. Technically, allowing for spatial dependence in the random noise term $v_{it}$ is similar to the spatial specification presented above for the two inefficiency components. Specifically, the two-sided noise component of an individual $i$ is allowed to be a linear combination of neighbors' two-sided noise components and a new individual noise term as

$$ v_{it} = \mu \sum_{j=1}^{N} w_{ij}\, v_{jt} + \tilde{v}_{it}, $$

where $w_{ij}$ are the elements of the spatial weights matrix $W$, $\mu$ measures the strength of spatial dependence in the noise component and $\tilde{v}_{it}$ is the new individual noise term, assumed to be i.i.d. $N(0, \sigma^2_{\tilde{v}})$.
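As an illustration of the spatial structure imposed on the noise term, the following minimal sketch simulates a spatially autoregressive noise vector for a single period through the reduced form $v = (I - \mu W)^{-1}\tilde{v}$. The function name `simulate_spatial_noise`, the four-unit weights matrix and the parameter values are hypothetical and chosen only for illustration; they are not taken from the paper's estimation code.

```python
import numpy as np

def simulate_spatial_noise(W, mu, sigma, rng):
    """Draw v solving v = mu * W @ v + v_tilde, with v_tilde ~ N(0, sigma^2 I)."""
    n = W.shape[0]
    v_tilde = rng.normal(0.0, sigma, size=n)
    # Reduced form: (I - mu * W) v = v_tilde
    v = np.linalg.solve(np.eye(n) - mu * W, v_tilde)
    return v, v_tilde

rng = np.random.default_rng(0)
# Hypothetical ring of four units, each with two equally weighted neighbours
W = np.array([[0.0, 0.5, 0.0, 0.5],
              [0.5, 0.0, 0.5, 0.0],
              [0.0, 0.5, 0.0, 0.5],
              [0.5, 0.0, 0.5, 0.0]])
v, v_tilde = simulate_spatial_noise(W, mu=0.1, sigma=0.2, rng=rng)
```

Solving the linear system instead of inverting $(I - \mu W)$ explicitly is a numerical convenience; both give the same draw.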

Estimation
The model presented in equations (1-5) is estimated using Bayesian techniques. We gather all parameters to be estimated in a vector $\theta = [\beta, \delta, \sigma_{\tilde{v}}, \mu, \sigma_{\alpha}, \lambda, \sigma_{\tilde{\eta}}, \rho, \sigma_{\tilde{u}}]$. In the likelihood of the model, Eq. (6), $y$ is the stacked output vector over $i$ and $t$, and $X$ and $Z$ are the stacked input and environmental-factor matrices across $i$ and $t$, respectively. The first and the second terms of the likelihood function presented in Eq. (6) Table 1 .
The typical approach of using a multivariate Normal prior for $\beta$ and $\delta$ is followed, where the prior means are set equal to 0 and the prior covariance matrices are diagonal with the diagonal entries being equal to 1,000. This high prior variance implies that our prior beliefs will have a negligible impact on the results. As is conventional in Bayesian inference, we work with precisions instead of variances or standard deviations. Precision is simply the inverse of the variance. A Gamma prior distribution (as is typical in the Bayesian econometrics literature) is imposed on the precisions of the noise component $v_{it}$ and the firm effect $\alpha_i$. Both the shape and the rate parameters are set equal to 0.001. Given that the variance of the Gamma distribution equals the ratio of the shape parameter to the square of the rate parameter, the imposed values yield a variance of 1,000. Again, this high value manifests our intention to let the data speak about the true parameter values.
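The prior variance quoted above follows directly from the moments of the Gamma distribution under the shape-rate parameterization. A minimal check, with the helper name `gamma_moments` introduced here purely for illustration:

```python
def gamma_moments(shape, rate):
    """Mean and variance of a Gamma(shape, rate) distribution."""
    mean = shape / rate
    variance = shape / rate ** 2
    return mean, variance

# Diffuse prior on the precisions of the noise component and the firm effect
prior_mean, prior_variance = gamma_moments(shape=0.001, rate=0.001)
```

With both parameters equal to 0.001 the prior mean of the precision is 1 and its variance is 1,000, which is what makes the prior diffuse.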
Following Fusco and Vidoli (2013) in that the spatial parameters lie on the unit interval, a Beta distribution is imposed on $\lambda$ and $\rho$. Given that this is the first study to look for spatial effects in persistent and transient inefficiency, we have no prior knowledge of the values of the spatial parameters apart from case studies that specify spatial autoregressive processes on a single inefficiency component. We choose to follow Areal et al. (2012) because they apply their model to the same type of case study as ours (i.e. dairy farming) and report a value for their spatial dependence parameter of around 0.15. Therefore, and given that the mean of the Beta distribution equals $a/(a+b)$, we set the shape parameter $a$ equal to 2 and the shape parameter $b$ equal to 10, which gives a prior mean of about 0.17. The same procedure is followed for the prior of the spatial parameter $\mu$.
Finally, a Gamma prior distribution is used for the precisions of the inefficiency components $\eta_i^{+}$ and $u_{it}^{+}$. Their parameterization differs from that of the other precision parameters: in both cases, the shape and rate parameters equal 7 and 0.5, respectively. This results in a lower variance and a more informative prior. The need to place a more informative prior on such parameters is stressed by Van den Broeck, Koop, Osiewalski, and Steel (1994), Fernandez, Osiewalski, and Steel (1997), and Griffin and Steel (2007), who warn that an uninformative prior for the variance of inefficiency may lead to an improper posterior.
Having specified the model's likelihood and the parameters' priors, Bayes' rule yields a posterior distribution that is proportional to the product of the likelihood in Eq. (6) and $p(\theta)$, the prior distribution of the parameters. Estimation of the posterior moments of the model's parameters is organised around Markov Chain Monte Carlo (MCMC) simulation and data augmentation techniques for the latent variables.
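The implications of these prior choices can be checked with a few lines of arithmetic. The sketch below computes the mean and variance of the Beta(2, 10) prior on the spatial parameters and compares the variance of the Gamma(7, 0.5) prior with that of a Gamma(0.001, 0.001) prior; the helper names are illustrative assumptions.

```python
def beta_mean_var(a, b):
    """Mean and variance of a Beta(a, b) distribution."""
    mean = a / (a + b)
    variance = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, variance

def gamma_variance(shape, rate):
    """Variance of a Gamma(shape, rate) distribution."""
    return shape / rate ** 2

# Prior on the spatial parameters
spatial_mean, spatial_variance = beta_mean_var(2, 10)

# Informative prior on inefficiency precisions versus a diffuse Gamma prior
ineff_precision_variance = gamma_variance(7, 0.5)
diffuse_precision_variance = gamma_variance(0.001, 0.001)
```

The Beta prior is centred at 2/12, close to the value of around 0.15 reported by Areal et al. (2012), and the Gamma(7, 0.5) prior has a variance of 28, far below the 1,000 of the diffuse prior.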

Simulation
The proposed model is tested using simulated data. A panel dataset is created with N = 100 and T = 8 . A constant term and one independent variable are generated as random draws from standard normal distributions and form the x vector. We generate two additional variables as random draws from standard normal distributions in order to form the z vector. Data in the form of latitude and longitude are created as random draws from a uniform distribution. Subsequently, they are used to calculate the distance between individuals. The minimum and maximum values of the uniform distribution equal those of the real dataset to better represent reality as Wang, Kockelman, and Wang (2013) propose.
The spatial weights matrix W is then constructed based on the inverse distances. Zeros are specified on the diagonal of W and in the entries where the distance is above the minimum value at

14, $\rho = 0.1$, $\sigma_{\tilde{u}} = 0.33$. The MCMC scheme involved 140,000 iterations, discarding the first 20,000 to remove the influence of the initial values (which were set equal to the true parameter values) and keeping one out of every two draws to mitigate potential autocorrelation. The posterior distributions of the parameters are presented in Fig. 1.
Note that convergence of MCMC was met for all parameters based on the Geweke (1992) diagnostic, while Monte Carlo Standard Errors (MCSE) were quite small, indicating that autocorrelation of draws is not an issue here. All true parameter values are inside the regions of their associated posterior distributions. This means that we can effectively estimate the model's parameters without biases.
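The burn-in and thinning scheme described above, together with a convergence check in the spirit of Geweke (1992), can be sketched as follows. This is a simplified illustration: the draws are independent stand-ins rather than output of the actual sampler, and the z-score uses plain sample variances instead of the spectral density estimates of the original diagnostic, so it ignores autocorrelation within segments.

```python
import numpy as np

def thin_chain(draws, burn_in, thin):
    """Discard the first `burn_in` draws, then keep every `thin`-th draw."""
    return draws[burn_in::thin]

def geweke_z(chain, first=0.1, last=0.5):
    """Simplified Geweke z-score comparing early and late segment means."""
    n = len(chain)
    early = chain[: int(first * n)]
    late = chain[int((1 - last) * n):]
    se = np.sqrt(early.var(ddof=1) / len(early) + late.var(ddof=1) / len(late))
    return (early.mean() - late.mean()) / se

rng = np.random.default_rng(1)
raw_draws = rng.normal(0.1, 0.02, size=140_000)  # hypothetical stand-in for MCMC output
kept = thin_chain(raw_draws, burn_in=20_000, thin=2)
z_score = geweke_z(kept)
```

With 140,000 iterations, a burn-in of 20,000 and a thinning interval of two, 60,000 draws are retained. An absolute z-score well below conventional critical values is consistent with convergence.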

Data & specification
The proposed model is applied to the case of specialized dairy farms in Wisconsin, which participate in the Agricultural Financial Advisor (AgFA) program at the University of Wisconsin-Madison Center for Dairy Profitability (CDF). We use a balanced panel dataset of 139 farms observed between 2009 and 2017. This yields a total of 1251 observations. We stress though that the initial dataset at hand was unbalanced. Farms typically leave the panel because they no longer want to be part of the AgFA program, because they are no longer a client of the farm association that collected farm financial data on behalf of CDF, or because they exited the sector. We also note that even though the transformation of the dataset into a balanced panel excludes several observations, the resulting dataset is still representative of the initial one. This is because the mean values of the utilized variables in the balanced dataset are very similar to those in the unbalanced dataset, which is logical given that farms participating in the AgFA program are relatively homogeneous in that they are relatively small and financially weaker farms that are seeking to improve their profitability.
The decision to construct and work with a balanced panel dataset is based on the following reasoning. As Elhorst (2014b) stresses, the asymptotic properties of global spatial estimators may become problematic for unbalanced panels if the reason why data are missing is not known with certainty, which is the case in our study. Therefore, extending spatial estimators to an unbalanced panel data setting involves making a strong assumption about why observations are missing and using data imputation techniques. For instance, Pfaffermayr (2013) and Wang and Lee (2013) assume that data are missing at random (i.e. the missing data may depend on variables observed in the data set, but not on the missing values themselves) for their unbalanced spatial panels, an assumption that does not hold in our case. Our data are missing not at random (MNAR) because the missingness could be related to unobserved conditions (e.g. a farmer's decision to exit the sector). In the case of MNAR, there is no direct method of analysis. This is because crucial parts of the data are missing, making it unclear what their effect on the results is. Most importantly, because the necessary information is missing, one cannot verify whether MNAR has occurred. Addressing this problem is not straightforward from a statistical point of view, as assumptions must be made that cannot be tested empirically (Buehl, Heinzl, Mittlboeck, & Findl, 2008). In light of this, and the fact that a general approach to addressing the issue of missing observations in spatial panels is still not available (Elhorst, 2014b), we proceed with the constructed balanced panel dataset.

One output and six inputs namely capital, labor, land, livestock, purchased feed and materials are specified. Output consists of milk, meat and crops. We do not separate the different categories because farms almost exclusively produce milk. Capital consists of machinery and buildings, while labor includes own and hired labor hours. Land represents the own and hired acres of agricultural area and livestock includes the total number of heads. Finally, the last two inputs are purchased feed and materials, with the latter including all intermediate inputs excluding purchased feed, such as veterinary expenses, energy, contract work, crop-specific costs and other variable costs. Output, capital, purchased feed and materials are measured in monetary units (i.e. constant 2010 prices). The monetary output and inputs are transformed into implicit quantity indices by computing the ratio of value to its corresponding price index. Price indices for output and inputs are obtained from the National Agricultural Statistics Service and, when necessary, aggregated to Törnqvist price indexes.
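The deflation step can be sketched as follows, assuming a Törnqvist price index computed for each period relative to the first period from sub-aggregate prices and values. The function name, the fixed-base (rather than chained) construction and the numbers are illustrative assumptions, not a description of how the dataset was actually built.

```python
import numpy as np

def tornqvist_price_index(prices, values):
    """Törnqvist price index of each period relative to the first period.

    prices, values: array-likes of shape (T, K) for K sub-aggregates.
    """
    prices = np.asarray(prices, dtype=float)
    values = np.asarray(values, dtype=float)
    shares = values / values.sum(axis=1, keepdims=True)
    avg_shares = 0.5 * (shares + shares[0])  # average of current and base shares
    log_index = (avg_shares * np.log(prices / prices[0])).sum(axis=1)
    return np.exp(log_index)

# Hypothetical prices and monetary values of two sub-aggregates over three years
prices = [[1.00, 1.00], [1.10, 1.05], [1.21, 1.10]]
values = [[60.0, 40.0], [66.0, 42.0], [72.0, 45.0]]
price_index = tornqvist_price_index(prices, values)

# Implicit quantity index: total value divided by the price index
implicit_quantity = np.sum(values, axis=1) / price_index
```

Because the value shares sum to one in every period, a uniform doubling of all prices doubles the index, which is a convenient sanity check on the implementation.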
Additionally, two environmental indicators are used: summer precipitation and summer temperature. These variables are specified on the frontier and not on the distribution of inefficiencies because they should not affect the way producers manage their assets but rather the production of milk. As Qi, Bravo-Ureta, and Cabrera (2015, page 8664) note "In general, research on the connection between climatic variables and livestock has focused on  (2015) stress that livestock production is particularly vulnerable to extreme weather, which can cause significant output losses. In particular, increased summer precipitation contributes to high humidity, which is known to be related to mastitis infection (Morse et al., 1988). Mastitis is the most economically important disease in the dairy industry worldwide, causing among others milk yield losses, increased veterinary costs, involuntary culling of cows, and higher workload for the farmers (Halasa, Huijps, Østerås, & Hogeveen, 2007). Increased summer temperature can cause heat stress to dairy cattle (Armstrong, 1994). One of the effects of heat stress in dairy cows is increased somatic cell count (Hammami, Bormann, M'hamdi, Montaldo, & Gengler, 2013), which is known to negatively affect milk quantity and quality (Cinar, Serbester, Ceyhan, & Gorgulu, 2015). Given the above arguments, this study incorporates the environmental indicators on the frontier through the "period-and-environment specific frontier" presented in Eq. (1) as in O'Donnell (2016) and Njuki et al. (2020).
Table 2 offers summary statistics of all variables. A time trend is also included to capture technological progress/regress. A translog specification is used, including interactions between inputs and environmental variables, interactions between the time trend, the inputs and the environmental variables, and their squared values. Prior to estimation, output, inputs and environmental variables are normalized by their geometric means, thus allowing for a direct interpretation of the first-order terms of inputs as output elasticities evaluated at the geometric mean of the data.
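The role of the geometric-mean normalization can be illustrated with a small sketch that builds translog regressors for the inputs only; the time trend and environmental variables of the actual specification are omitted for brevity, and the function names and data are hypothetical. After normalization the logged variables are zero at the geometric mean, so all second-order terms vanish there and the first-order coefficients equal the output elasticities at that point.

```python
import numpy as np

def log_normalize(x):
    """Log of each column divided by its geometric mean."""
    log_x = np.log(np.asarray(x, dtype=float))
    return log_x - log_x.mean(axis=0)

def translog_terms(log_x):
    """First-order terms, 0.5 * squares and cross-products of the columns."""
    n, k = log_x.shape
    columns = [log_x]
    for a in range(k):
        for b in range(a, k):
            scale = 0.5 if a == b else 1.0
            columns.append((scale * log_x[:, a] * log_x[:, b])[:, None])
    return np.hstack(columns)

# Hypothetical input quantities: three observations, three inputs
x = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 1.0],
              [4.0, 1.0, 2.0]])
log_x = log_normalize(x)
regressors = translog_terms(log_x)
```

For $K$ inputs the design matrix has $K + K(K+1)/2$ columns, which is nine in this three-input example.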
Apart from output, input and environmental quantities, the dataset contains exact location information of farms based on their latitude and longitude. This information is used to calculate the distance between farms and form the spatial weights matrix $W$. Apart from its diagonal elements, which equal zero so that an individual is not treated as a neighbor to himself/herself, the remaining elements are set equal to the inverse distance ($1/d_{ij}$), thus placing higher weight on closer neighbors. This choice is based on the argument used by Roe, Irwin, and Sharp (2002), stressing that individuals are more likely to be influenced by closer than by more distant neighbors. Following common practice, a distance threshold $d^{*}$ is used beyond which spatial relationships no longer exist. As in Marasteanu and Jaenicke (2016) and Skevas (2020), this threshold is set equal to the minimum distance at which all individuals in the sample have at least one neighbor. This is 50 km in our case study. Finally, all elements in $W$ are normalized by its maximum eigenvalue as in Vega and Elhorst (2015).
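The construction of $W$ described above can be sketched as follows. The use of great-circle (haversine) distances, the function names and the randomly generated coordinates are assumptions made for illustration; the text does not state which distance formula was used. The sketch also assumes that no two farms share identical coordinates.

```python
import numpy as np

def haversine_km(lat, lon):
    """Pairwise great-circle distances in km from coordinates in degrees."""
    lat, lon = np.radians(lat), np.radians(lon)
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlon / 2) ** 2)
    return 2 * 6371.0 * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))

def spatial_weights(dist):
    """Inverse-distance weights with a cutoff, scaled by the largest eigenvalue."""
    n = dist.shape[0]
    off_diag = ~np.eye(n, dtype=bool)
    # Smallest cutoff at which every unit has at least one neighbour:
    # the largest nearest-neighbour distance in the sample.
    nearest = np.where(off_diag, dist, np.inf).min(axis=1)
    threshold = nearest.max()
    W = np.zeros_like(dist)
    neighbours = off_diag & (dist <= threshold)
    W[neighbours] = 1.0 / dist[neighbours]
    W /= np.linalg.eigvalsh(W).max()  # W is symmetric, so eigvalsh applies
    return W, threshold

rng = np.random.default_rng(0)
lat = rng.uniform(42.5, 47.0, size=30)    # hypothetical coordinates
lon = rng.uniform(-92.8, -87.0, size=30)
W, d_star = spatial_weights(haversine_km(lat, lon))
```

Scaling by the largest eigenvalue, unlike row normalization, preserves the symmetry of the inverse-distance weights and their relative magnitudes across rows.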

Results
Using the same sampling scheme used with the simulated data, estimation of our model with the Wisconsin dairy farms' data yields the posterior moments of the model's parameters that are presented in Table 3   zero for all parameters suggesting that the posterior draws do not exhibit autocorrelation. All output elasticities are positive and "statistically significant", given that their associated 95% credible intervals do not contain zero. This is an expected finding given that the utilized inputs play a key role in the production of dairy farms' output, as is also reported by Emvalomatis, Stefanou, and Oude Lansink (2011), Sauer and Latacz-Lohmann (2015) and Skevas et al. (2018a). Material inputs have the biggest effect on production, followed by livestock and purchased feed. Adding the output elasticities yields a scale elasticity of 0.995, suggesting that dairy farms in Wisconsin operate, on average, on the decreasing returns to scale part of the technology with a probability of 65%. Finally, there is evidence that Wisconsin dairy farms experience an inverted U-shaped technical change due to the positive estimate of the trend variable and the negative estimate of its square term. Note that the remaining parameter estimates (i.e. interaction and square terms) are not discussed because the performed geometric mean normalization of the data directs interest only to the first-order terms. However, all estimates are presented for the sake of completeness.
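A scale elasticity and the posterior probability of decreasing returns to scale can be obtained from the retained MCMC draws of the first-order input coefficients, as in the sketch below. The draws shown are a tiny hypothetical array with two inputs, used only to make the calculation transparent; they are not the paper's estimates.

```python
import numpy as np

def scale_elasticity_summary(beta_draws):
    """Posterior mean of the scale elasticity and probability that it is below one.

    beta_draws: array of shape (S, K) with S posterior draws of K output elasticities.
    """
    scale = beta_draws.sum(axis=1)      # scale elasticity in each draw
    return scale.mean(), (scale < 1.0).mean()

# Hypothetical draws: four draws of two first-order coefficients
beta_draws = np.array([[0.50, 0.40],
                       [0.60, 0.50],
                       [0.30, 0.60],
                       [0.50, 0.45]])
mean_scale, prob_drs = scale_elasticity_summary(beta_draws)
```

In this toy example three of the four draws imply a scale elasticity below one, so the posterior probability of decreasing returns to scale is 0.75.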
Regarding the environmental variables, summer precipitation is "statistically insignificant" while summer temperature is negative and "statistically significant". This is an expected finding since, as stated above, increased summer temperature can lead to heat stress in dairy cattle and in turn high somatic cell count, which can negatively affect milk quantity (Cinar et al., 2015). We also note that squared summer precipitation is negative and "statistically significant". As mentioned above, this is because excessive summer precipitation causes high humidity, which is related to mastitis infection that negatively affects milk output (Halasa et al., 2007).

Moving to the estimates of persistent and transient inefficiency, summary statistics are provided in Table 4. Average (across individuals) persistent inefficiency is estimated at 9% while average (across both individuals and time) transient inefficiency is 8%. The fact that transient and persistent inefficiency exhibit similar average values implies that the time-span covered by the utilized data is close to farms' equilibrium, which is a conclusion also drawn by Skevas, Emvalomatis, and Brümmer (2018b), who found similar persistent and transient inefficiencies for German dairy farms. Variation in the inefficiency scores is low, particularly in farms' transient inefficiency. Persistent inefficiency exhibits more extreme values than transient inefficiency on both sides. On the one hand, minimum persistent inefficiency is only 2%, while minimum transient inefficiency is 5%. On the other hand, maximum persistent inefficiency is 20% and maximum transient inefficiency is 6% lower. In general, Wisconsin dairy farms exhibit low inefficiency levels, although there is still scope for improvement in both their long-run and short-run performance.
To put the inefficiency results into context, using the same case study as ours, Cabrera, Solis, and Del Corral (2010) reported an average inefficiency score of 12% for 2007 and Chidmi, Solís, and Cabrera (2011) a mean inefficiency of 10% for the period 2004-2008. The small differences in the inefficiency estimates between the above-cited studies and ours can be attributed to the more recent dataset that we use and to our different modeling approach, which captures both persistent and transient inefficiency as well as their spatial associations. Furthermore, Njuki et al. (2020) used both the same case study (for the period 1996-2012) and the GTRE model and reported an average persistent inefficiency of 6% and mean transient inefficiency of 14%. Although our study also reports relatively low inefficiency scores for Wisconsin dairy farms, the observed differences may stem from the older dataset that Njuki et al. (2020) use and from the fact that they do not account for spatial dependence in the inefficiency components.
We note in passing that the GTRE model accommodates by construction random-effects as its name manifests. Although there exists a debate on whether fixed or random effects fit a panel dataset best, treating the two-sided unobserved heterogeneity term as a fixed effect is not common because when trying to eliminate it using the typical "first-differences" transformation the time-invariant persistent inefficiency component is also cancelled out. A valid alternative that does not eliminate the persistent inefficiency term could be Mundlak's approach of including the variables' group means as additional independent variables. Although this is an uncommon approach in the efficiency literature that uses the GTRE model, we also estimate the model using Mundlak's approach. While the obtained estimates change only slightly, the persistent inefficiency component is inflated, manifesting that it may be still capturing part of unobserved time-invariant heterogeneity, which is not the case in the random-effects specification. Additionally, Bayes factors reveal that the random-effects specification fits the data best. The results from these comparisons are presented in Table A.2 in the Appendix.
Turning to the estimates of the spatial parameters, Table 5 presents their associated posterior moments. Note that, as mentioned in the previous section, we followed the typical procedure of setting the distance threshold $d^{*}$ equal to the minimum distance at which all farms in our sample have at least one neighbor (i.e. 50 km). Roe et al. (2002) follow a more arbitrary approach and set a threshold that results in all farms having several neighbors. Although we do not follow this more arbitrary approach, we conduct robustness checks with respect to higher thresholds (i.e. 55 km, 60 km and 65 km). The posterior distributions of the spatial parameters for each threshold are presented in Figs. A.1, A.2 and A.3 in the Appendix. The results reveal that the estimates are not sensitive to the threshold choice. This is logical since, even though higher thresholds add more neighbors, the inverse distance specification of the spatial weights matrix gives less weight to more distant neighbors.
The spatial dependence parameter $\mu$ is estimated at 0.082, revealing that there indeed exists spatial dependence in the noise component, which can stem from "uncontrolled correlations" between individuals. These can include similarities in soil quality (that can affect feed production), identical local shocks (extreme weather events) and proximity of urban areas (advantage of better infrastructure). Furthermore, both persistent and transient inefficiencies of Wisconsin dairy farms are spatially dependent. Specifically, the spatial parameter with regard to persistent inefficiency ($\lambda$) is estimated at 0.247, while the spatial parameter associated with transient inefficiency ($\rho$) is 0.091. A possible explanation for this finding is that neighboring dairy farmers in Wisconsin communicate with or imitate each other regarding both long-run and short-run choices/practices. For instance, Brock and Barham (2013) provide evidence of the impact of social networks on the adoption of organic dairy farming in Wisconsin. More specifically, farmers faced information constraints related to organic dairy farming and overcame these constraints through the exchange of information with early adopters. Furthermore, Lewis, Barham, and Robinson (2011), in their study of the role of spatial spillovers in organic dairy farming adoption in southwestern Wisconsin, show that the presence of nearby organic dairy farmers affects the adoption decision. Given that the decision to adopt a new technology, such as organic farming, can affect both short-run and long-run performance (since adoption of new technologies requires substantial changes in equipment, facilities, and managerial strategy), this shows that communication/imitation between neighboring Wisconsin dairy farmers can result in spatial dependence in their (in)efficiencies.
Additionally, producers may have common consultants who advise them on their short/long-run production practices (i.e. correct use of variable inputs/new machines), which can result in spatial dependence in their (in)efficiency levels. Also, farmers may belong to the same local cooperative, thus having similar input qualities and machinery services, which can again result in similar short/long-run performances. Although there does not exist any study that reports the magnitude of spatial dependence for both persistent and transient inefficiency, it is worth mentioning that Skevas (2020) reports a value of 0.371 for the inefficiency's spatial dependence parameter. This value is larger than the ones reported in our study, but the two are not directly comparable because a different dataset is used and, more importantly, because our reported spatial dependencies concern two inefficiency components. What is striking in our results, though, is that the magnitude of spatial dependence in persistent inefficiency is more than double the magnitude of spatial dependence in transient inefficiency. This can be because dairy farmers consider some choices to be more important than others, and therefore imitate or seek information from neighboring peers, or follow the advice of their common consultants/cooperatives, mostly for those choices. For example, the costs at stake can be much higher in the decision to adopt a new technology (a decision that can also result in persistent (in)efficiency) than in the decision of how much of a variable input to use (a decision that will only affect transient (in)efficiency), because the former constitutes a large investment for farms. Therefore, producers that are reluctant to adopt new technologies can influence neighboring ones to make similar choices, thus collectively exhibiting some persistent inefficiency.
Conversely, farmers that are keen to adopt new technologies can influence neighboring farmers to also do so, thus collectively exhibiting higher persistent efficiency. By contrast, producers may not discuss with or imitate their neighbors regarding the decision of how much feed to use because they may not consider this decision as important as the decision to innovate.

Conclusions
We propose a model that combines the (environmentally-adjusted) GTRE model of Colombi et al. (2014), Kumbhakar et al. (2014) and Tsionas and Kumbhakar (2014) and the spatial autoregressive efficiency model complemented by spatial autoregressive disturbances. This makes us the first to present a model that simultaneously separates time-invariant firm effects from persistent and transient inefficiency while accounting for spatial dependence in these two inefficiency components.
The need to separate unobserved heterogeneity from inefficiency is a well-discussed topic in the panel data econometrics literature, with its main objective being to prevent distortions in the inefficiency estimates (Greene, 2005a, 2005b). The need to decompose inefficiency into persistent (i.e. time-invariant) and transient (i.e. time-varying) components rests on the rigidity of some production factors that make inefficiency persist (Stefanou, 2009) and on factors that are volatile and cause temporal changes in inefficiency (Tsionas & Kumbhakar, 2014). Finally, among other factors, spatial dependence in inefficiency can stem from imitation or communication between neighboring units regarding production choices/practices and the flow of knowledge (Skevas, 2020).
The proposed model is first tested using simulated data in a Bayesian estimation framework. The results from the simulation study reveal that all parameters are well identified without yielding biases. The model is then applied to a panel dataset of specialized dairy farms in Wisconsin observed over the period 2009-2017. The utilized dataset does not only provide information on farms' physical units (i.e. output and inputs) and environmental characteristics (i.e. summer precipitation and temperature) but also on their exact location based on latitude and longitude coordinates, thus allowing us to identify neighboring farmers.
The results reveal that all output elasticities are positive, with material inputs having the highest effect on the production of Wisconsin dairy farms. Additionally, farms operate under mildly decreasing returns to scale, while an inverted U-shaped technical change is reported. High summer temperature and excessive precipitation limit production. Coming to the inefficiency scores, mean persistent inefficiency is 9% and mean transient inefficiency is 8%. That is, Wisconsin dairy farms can still improve both their long-run and short-run performances. Variation in inefficiency scores is slightly higher in the persistent component.
The empirical findings also provide evidence of spatial dependence in the noise component as well as in both persistent and transient inefficiencies. Spatial dependence in the noise component can be attributed to correlations in uncontrolled factors such as soil quality and to common local shocks. The finding of spatial dependence in inefficiencies can arise because dairy farmers communicate with their neighbors and exchange ideas on how to manage both their short-run and long-run production processes, or because they receive similar advice from common consultants/cooperatives. Nevertheless, the strength of spatial dependence is much higher for the persistent inefficiency component. An explanation for this result is that farmers deem factors that cause (in)efficiency to persist (e.g. delay in the adoption of a new technology) more important than factors related to typical production choices (e.g. amount of feed use), resulting in a higher level of communication regarding the former.
Finally, we note that the rule of thumb of at least one neighbor, on which the parameterization of the spatial weights matrix W is based, was used because estimation procedures for the distance threshold have not been developed for more advanced models such as the utilized GTRE model, but only for simpler models. An example is the linear Spatial Lag of X (SLX) model, for which Elhorst (2014a) proposes an estimation procedure where the distance cut-off point is estimated according to an algorithm that minimizes the ordinary least squares residuals. Based on that, future research can focus on developing an algorithm for estimating the distance cut-off point for more complicated models such as the stochastic frontier. Furthermore, although the efficiency measurement literature has raised many concerns regarding the endogeneity of inputs (Kutlu, 2010; Shee & Stefanou, 2015; Karakaplan & Levent, 2017), the majority of GTRE studies, including ours, ignore this issue (which can be particularly relevant for variable inputs), with the exceptions of Lai and Kumbhakar (2018) and Lien et al. (2018). Hence, future work can combine the spatial GTRE model presented in this study with the endogeneity-correcting approaches proposed by Lai and Kumbhakar (2018) and Lien et al. (2018).
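To illustrate the general idea of choosing a distance cut-off by minimizing least squares residuals in a linear SLX model, a generic grid search can be sketched as below. This is not the procedure of Elhorst (2014a), whose details are not reproduced here. It is a hypothetical example that assumes a row-normalized, truncated inverse-distance W, a user-supplied list of candidate cut-offs, and a residual sum of squares criterion. All function names are introduced for this sketch only.

```python
import numpy as np

def truncated_inverse_distance(dist, cutoff):
    """Row-normalized inverse-distance weights truncated at `cutoff`
    (assumed form of W; locations are assumed to be distinct)."""
    d = dist.copy()
    np.fill_diagonal(d, np.inf)              # no self-neighbors
    w = np.where(d <= cutoff, 1.0 / d, 0.0)
    rs = w.sum(axis=1, keepdims=True)
    return np.divide(w, rs, out=np.zeros_like(w), where=rs > 0)

def slx_rss(y, X, dist, cutoff):
    """Residual sum of squares from OLS of y on [1, X, WX]."""
    W = truncated_inverse_distance(dist, cutoff)
    Z = np.column_stack([np.ones(len(y)), X, W @ X])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    return float(resid @ resid)

def grid_search_cutoff(y, X, dist, candidates):
    """Return the candidate cut-off with the smallest RSS and all RSS values."""
    rss = [slx_rss(y, X, dist, c) for c in candidates]
    return candidates[int(np.argmin(rss))], rss
```

A grid search of this kind is only feasible because each candidate requires a single OLS fit. For a spatial stochastic frontier estimated by Bayesian methods, every candidate would require a full re-estimation, which is one reason a dedicated procedure would be needed.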