The Scaled Beta2 Distribution as a Robust Prior for Scales

We put forward the Scaled Beta2 (SBeta2) as a flexible and tractable family for modeling scales in both hierarchical and non-hierarchical settings. Various sensible alternatives to the overuse of vague Inverted Gamma priors have been proposed, mainly for hierarchical models. Several of these alternatives are particular cases of the SBeta2 or can be well approximated by it. This family of distributions can be obtained in closed form as a Gamma scale mixture of Gamma distributions, just as the Student distribution can be obtained as a Gamma scale mixture of Normal variables. Members of the SBeta2 family arise as intrinsic priors and as divergence based priors in diverse situations, hierarchical and non-hierarchical. The SBeta2 family unifies and generalizes different proposals in the Bayesian literature, and has numerous theoretical and practical advantages: it is flexible, its members can have tails lighter than, as heavy as, or heavier than those of the half-Cauchy, and different behaviors at the origin can be modeled. It has the reciprocality property, i.e., if the variance parameter is in the family, so is the precision. It is easy to simulate from, and can be embedded in a Gibbs sampling scheme. Although it is not conjugate, it is amazingly tractable: when coupled with a conditional Cauchy prior for locations, the marginal prior for locations can be found explicitly as proportional to known transcendental functions, and for integer values of the hyperparameters an analytical closed form exists. Furthermore, for specific choices of the hyperparameters, the marginal is found to be an explicit “horseshoe prior”, a type of prior known to have excellent theoretical and practical properties. To our knowledge this is the first closed form horseshoe prior obtained. We also show that for certain values of the hyperparameters the mixture of a Normal and a Scaled Beta2 distribution also gives a closed form marginal.
Examples include robust normal and binomial hierarchical modeling and meta-analysis, with real and simulated data.


Introduction
The focus of this paper is to propose the Scaled Beta2 (SBeta2) family of distributions, as a convenient family of prior distributions for scale parameters, both for informative and quasi-non-informative scenarios, and for hierarchical and non-hierarchical models. The intention is to provide an alternative to the use of the Inverted Gamma distribution, and to show that the SBeta2 has a natural motivation, is flexible and tractable.
The SBeta2 is a comprehensive family that encompasses and expands several previous proposals. Two of the most noteworthy, in the context of random effects models, are Gelman (2006) and Berger (2006). Gelman (2006) proposes the half-Cauchy and the Uniform distributions for the between groups standard deviations. Berger (2006) proposes using $1/\sqrt{\sigma^2}$ as prior for the between groups variance. We claim that the SBeta2 contains these proposals, exactly or approximately. Take the case of the half-Cauchy for the standard deviation, where an application of the change of variables formula leads to a SBeta2(1/2, 1/2, b) for the variance. On the other hand, the SBeta2(1/2, q, b), for small q, say 0 < q ≤ 1/2, and large b is a useful approximation to $1/\sqrt{\sigma^2}$. Other proposals are contained in Pericchi (2010) and Polson and Scott (2012), both putting forward a SBeta2 but without explicit mention of scale. However, a more flexible family is achieved by including an adjustable scale, without which, for example, the distribution may not approximate the Uniform distribution sufficiently well.
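The change of variables mentioned above can be written out as a short check. The sketch below assumes that the SBeta2(p, q, b) density has the scaled Beta-prime form given on its first line, and it uses a half-Cauchy with scale s (a symbol introduced here for illustration), so that b = s^2. The last line makes the large-b approximation explicit.

```latex
% Assumed form of the SBeta2(p, q, b) density, for \psi > 0
p(\psi) = \frac{\Gamma(p+q)}{\Gamma(p)\,\Gamma(q)\,b}\,
          \frac{(\psi/b)^{p-1}}{(1+\psi/b)^{p+q}}

% Half-Cauchy with scale s on \sigma; set \psi = \sigma^2, so d\sigma/d\psi = 1/(2\sqrt{\psi})
f(\sigma) = \frac{2}{\pi s\,(1+\sigma^2/s^2)}
\;\Longrightarrow\;
f(\psi) = \frac{1}{\pi s\,\sqrt{\psi}\,(1+\psi/s^2)}
        = \frac{\Gamma(1)}{\Gamma(1/2)^2\,s^2}\,
          \frac{(\psi/s^2)^{-1/2}}{1+\psi/s^2}
% which is the assumed density with p = q = 1/2 and b = s^2

% For p = 1/2 and \psi much smaller than b:
p(\psi) \propto \psi^{-1/2}\,(1+\psi/b)^{-(1/2+q)} \approx \psi^{-1/2} = 1/\sqrt{\sigma^2}
```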
Certainly, other alternative families to the Gamma/Inverted-Gamma have been proposed, for instance Griffin and Brown (2010) and Frühwirth-Schnatter and Wagner (2010), among others. Nevertheless, we argue that the SBeta2 is a flexible family that is able to model the advantages of the previous proposals, like heavy tails or boundedness/unboundedness at the origin, etc. Besides it has additional advantages that will be discussed in the sequel. It is no surprise that in Bayesian Statistics modeling and testing, scattered particular cases of the SBeta2 or of the Beta2 distribution have appeared (like Gelman's half-Cauchy). These include: Bradlow et al. (2002), Scott and Berger (2006), Liang et al. (2008), Maruyama and George (2011), Wang and Sun (2013) and Sparks et al. (2013). Noteworthy is the appearance of particular members of the SBeta2 family in Pericchi (2005) as intrinsic priors for testing the scale of an Exponential Law and in Giron et al. (2006) as intrinsic priors for scales in the Linear Model. In Supplementary Appendix 3 (Pérez et al., 2016) we show that in a normal model with known mean, the SBeta2 distribution is the intrinsic prior of the scale parameter.
We justify our proposal of using the SBeta2 family for modeling scales based on a combination of theoretical and practical considerations.
First of all, it has a natural motivation as a Gamma scaled mixture of a Gamma distribution as shown in Lemma 1. It has the property of reciprocality, i.e., if p(ψ) belongs to the SBeta2 family, so does p(1/ψ), which is not a property of the Gamma/Inverted-Gamma family (the half-Cauchy distribution proposed by Gelman to model standard deviations also has this attractive property, and it is reassuring that this is a particular case of our proposal as mentioned before). It is flexible enough for modeling a variety of behaviors at the origin and at the tail, and for specific hyperparameters boundedness at the origin and heavy right tail is obtained, as heavy or even heavier than the Cauchy distribution.
Secondly, it is convenient in practice: it can be simulated from in various ways (as a Gamma scaled mixture of Gammas or as the scaled odds of a Beta distribution), and thus it can be easily embedded in a Gibbs sampler scheme. Also, the SBeta2 family is amenable to elicitation, as one of its parameters controls the behavior at the origin, another the right tail behavior, and the scale can be assessed in a variety of ways, as will be seen later in this work.
Thirdly, a variety of analytical results spring from the SBeta2 family, which we summarize here:
a) With conditional Normal priors for location and SBeta2 for the precision, the marginal prior for location can be found in closed form for specific values of hyperparameters. It is bounded at zero and heavy tailed.
b) When a SBeta2 prior for the scale is coupled with a conditional Cauchy prior for locations, the marginal prior for locations can be found explicitly as proportional to known transcendental functions, and for integer values of the hyperparameters it is found in analytical closed form. Furthermore, for specific choices of the hyperparameters, the marginal is found to be an explicit horseshoe prior (Carvalho et al., 2010) with a pole at the origin and heavy tail, leading to a sort of nearly optimal choice as a prior for sparse locations. This seems to be the first explicit horseshoe prior in the literature.
c) Again for a conditional Cauchy prior for location, if now the square of the scale is modeled as a SBeta2, the marginal is no longer a horseshoe prior, but a general closed form result is obtained. This strategy also leads to a very useful prior distribution, called the Student-SBeta2 distribution (Fúquene et al., 2014).
It is important to emphasize that the analytical results in cases (b) and (c) are obtained for heavy tailed distributions for locations, and in (a) for light tailed distributions.
This paper is organized as follows: in Section 2 we motivate the SBeta2 family showing that the SBeta2 distribution can be obtained as a scale mixture of Gamma distributions, we present some of its properties and we discuss how to use the SBeta2 distribution as a prior for variances and precisions. Section 3 deals with closed forms for mixtures of SBeta2 and Cauchy, Student and Normal distributions. In Section 4 we give examples to illustrate the advantages of the use of SBeta2 distributions as priors for scale parameters. Finally, we summarize some conclusions about those advantages.
The usual second level in the hierarchy reads as This model is known to be "non-robust" in the sense that the amount of shrinkage may be too heavy for an outlying observation. To reduce the excessive shrinkage, following a suggestion that goes back to De Finetti (1961), a level of hierarchy is added to have a scale mixture of Normals, replacing (3) by This effectively replaces the Normal by a Student distribution since Similarly, replacing (4) as yields the SBeta2 distribution as prior for the precisions h i . This, we prove in the sequel, effectively replaces the Gamma by a scaled version of the Beta2 distribution, the Scaled Beta2 distribution given in (1). This result, formalized in Lemma 1 below, describes an effective way to generate SBeta2 random variables.
Lemma 1. The SBeta2 density is obtained as a Gamma mixture of Gamma densities or Inverted Gamma densities as follows: Similarly, Proof. See Supplementary Appendix 1.
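The mixture identity behind Lemma 1 can be checked directly. The derivation below is a sketch under two assumptions that are not taken from the lemma's display: the Gamma distributions are parametrized by shape and rate, and the SBeta2(p, q, b) density has the scaled Beta-prime form in the last line. The mixing variable \rho is a symbol chosen here for illustration.

```latex
% Assumed: \psi \mid \rho \sim \mathrm{Gamma}(\text{shape } p, \text{rate } \rho),
%          \rho \sim \mathrm{Gamma}(\text{shape } q, \text{rate } b)
\int_0^\infty \frac{\rho^{p}\,\psi^{p-1} e^{-\rho\psi}}{\Gamma(p)}\;
              \frac{b^{q}\,\rho^{q-1} e^{-b\rho}}{\Gamma(q)}\, d\rho
= \frac{b^{q}\,\psi^{p-1}}{\Gamma(p)\,\Gamma(q)}
  \int_0^\infty \rho^{p+q-1} e^{-(\psi+b)\rho}\, d\rho
= \frac{\Gamma(p+q)}{\Gamma(p)\,\Gamma(q)}\,
  \frac{b^{q}\,\psi^{p-1}}{(\psi+b)^{p+q}}
= \frac{\Gamma(p+q)}{\Gamma(p)\,\Gamma(q)\,b}\,
  \frac{(\psi/b)^{p-1}}{(1+\psi/b)^{p+q}}
```

Under the same assumptions, the Inverted Gamma version follows by applying this identity to 1/\psi.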
In Section 4 we will present examples of the use of the SBeta2 in practical situations, where it will be seen that the use of this distribution as prior for scale parameters promotes robustness in hierarchical models and produces sensible analyses in diverse settings.

Properties of the SBeta2
We now explore the properties of the SBeta2 distribution that can be helpful for assessment of the hyperparameters.
As we already discussed, the SBeta2 can be generated using Lemma 1, as a Gamma scale mixture of Gamma distributions. Another easy way to generate ψ ∼ SBeta2(p, q, b) random variables is as $\psi = b\,\theta/(1-\theta)$, where θ follows a Beta(p, q) distribution, that is, as the scaled odds of a Beta random variable.
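Both generation routes can be sketched with the Python standard library. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the Gamma mixture uses an assumed shape and rate parametrization (mixing rate drawn from a Gamma with shape q and rate b), and the sample medians are compared with b because the median equals b when p = q.

```python
import random
import statistics


def sbeta2_scaled_odds(p, q, b, rng):
    """Draw psi = b * theta / (1 - theta) with theta ~ Beta(p, q)."""
    theta = rng.betavariate(p, q)
    return b * theta / (1.0 - theta)


def sbeta2_gamma_mixture(p, q, b, rng):
    """Draw rho ~ Gamma(shape q, rate b), then psi | rho ~ Gamma(shape p, rate rho)."""
    # random.gammavariate takes (shape, scale), so a rate r is passed as scale 1 / r.
    rho = rng.gammavariate(q, 1.0 / b)
    return rng.gammavariate(p, 1.0 / rho)


rng = random.Random(2024)
p, q, b, n = 1.0, 1.0, 2.0, 20000
odds_draws = [sbeta2_scaled_odds(p, q, b, rng) for _ in range(n)]
mix_draws = [sbeta2_gamma_mixture(p, q, b, rng) for _ in range(n)]
median_odds = statistics.median(odds_draws)
median_mix = statistics.median(mix_draws)
print(median_odds, median_mix)  # both should be close to b, since p = q
```

Either sampler could serve as the scale-update step inside a larger Gibbs scheme; the scaled-odds version needs only one Beta draw per value.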
Consider a distribution in a scale or location-scale family with unknown scale parameter σ. We suggest specifying the prior as $\sigma^2 \sim \text{SBeta2}(p, q, b)$, or equivalently for the reciprocal (precision), $h = 1/\sigma^2 \sim \text{SBeta2}(q, p, 1/b)$. For selecting values for the hyperparameters p, q and b, the following properties can be useful.
1. Behavior at zero:
3. Location of the mode: Otherwise
4. When p = q, the median of the SBeta2 distribution is the scale parameter b. For p = 1, the median turns out to be $b\,(2^{1/q} - 1)$.
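The two median statements in item 4 can be verified in a few lines. The sketch assumes that the SBeta2(p, q, b) density is the scaled Beta-prime density, which for p = 1 reduces to the first expression below; F denotes its CDF and m the median, both symbols introduced here.

```latex
% Case p = 1: the density integrates in closed form
f(\psi) = \frac{q}{b}\,(1+\psi/b)^{-(q+1)}, \qquad
F(\psi) = 1 - (1+\psi/b)^{-q}
% Solving F(m) = 1/2
(1+m/b)^{-q} = \tfrac{1}{2} \;\Longrightarrow\; m = b\,(2^{1/q} - 1)

% Case p = q: by reciprocality, \psi/b and b/\psi have the same distribution, so
P(\psi \le b) = P(b/\psi \le 1) = P(\psi \ge b) = \tfrac{1}{2}
\;\Longrightarrow\; m = b
```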

Robustness of the SBeta2
One way to measure the thickness of the tails is to measure the index ρ of a regularly varying density (Andrade and O'Hagan, 2006).

Definition. The right-hand tail of a density f (y) is regularly varying with index
Note that a Student-t distribution with ν degrees of freedom has index ρ = −(ν + 1). Computation shows that the SBeta2(p, q, b) has ρ = −(q + 1), so the tail behavior is completely determined by q. Thus for q = 1 we have a tail behavior of the same index as a Cauchy distribution. More generally, the tail index of a SBeta2(p, q, b) is that of a Student-t with q degrees of freedom. On the other hand, the behavior at the origin is governed by the value of the parameter p. For p > 1 the density function is zero at the origin, and for p < 1 it is unbounded at the origin. For p = 1 the density function is finite and nonzero at the origin.
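The tail index can be checked numerically. The sketch below assumes the usual definition of regular variation, namely that f(ty)/f(y) approaches t^ρ as y grows for every t > 0, and the scaled Beta-prime form of the SBeta2 density; the function names are hypothetical. It evaluates the ratio at a large y and compares it with t^(−(q+1)).

```python
import math


def sbeta2_logpdf(psi, p, q, b):
    """Log density of SBeta2(p, q, b) in the assumed scaled Beta-prime form."""
    log_norm = math.lgamma(p + q) - math.lgamma(p) - math.lgamma(q) - math.log(b)
    z = psi / b
    return log_norm + (p - 1.0) * math.log(z) - (p + q) * math.log1p(z)


def tail_ratio(t, y, p, q, b):
    """f(t * y) / f(y), which should approach t ** rho for large y."""
    return math.exp(sbeta2_logpdf(t * y, p, q, b) - sbeta2_logpdf(y, p, q, b))


t, y, b = 2.0, 1e9, 1.0
for p, q in [(0.5, 0.5), (1.0, 1.0), (2.0, 3.0)]:
    # The two printed numbers should agree closely: the ratio and t ** -(q + 1).
    print(p, q, tail_ratio(t, y, p, q, b), t ** -(q + 1.0))
```

Working with the log density avoids underflow at large arguments. The value of p drops out of the limit, which matches the statement that q alone determines the tail.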

Some thoughts about elicitation
In general, we want to have heavy tails for a robust inference, but we don't want to give high weight to $\sigma^2 = 0$. So, our suggestion for the hyperparameters p and q is to take 1/2 ≤ p ≤ 1 and 0 < q ≤ 1. Note however that other values of (p, q) may be necessary, like q > 1, based on stability considerations in complex Markov Chain Monte Carlo (MCMC) modeling (Pericchi et al., 2011). One way to assess b is to fix it empirically as the median (or somewhat higher than the median) or based on subject matter knowledge. Another possibility is to assess probability statements like P(ψ > a) = c for ψ ∼ SBeta2(p, q, b). This approach can be very useful since the SBeta2 distribution can be regarded as the distribution of the scaled odds of a Beta random variable. We then have $P(\psi > a) = P(\theta > a/(a+b))$ where θ ∼ Beta(p, q), which can be easily solved using statistical software. Note that the p = q = 1 and p = q = 1

Closed form results for mixtures with SBeta2 distribution
In this section, we show that the SBeta2 is amazingly tractable for Bayesian analysis, and produces some interesting heavy tail distributions for locations. Here the SBeta2 will be used as a prior distribution of the precision of a Normal and the scale and square scale of a Cauchy distribution (some results in this section were obtained using Wolfram Alpha LLC., 2014).

Normal-Scaled Beta2 distribution prior
where the precision τ follows a SBeta2(p, q, b) distribution. We use the representation of the SBeta2 distribution as Gamma scale mixtures of Gamma distributions in Lemma 1 for calculating the marginal distribution of θ by changing the order of integration The integrand with respect to τ simplifies to: and the integral becomes, For the important particular case p = q = 1, the integral reduces to This last integral can be explicitly calculated, and the result is We will call this the Normal-Scaled Beta2 distribution. It is a scale family with scale The density is shown in Figure 1 for different values of b. It can be shown that tails of this distribution go to zero as O(θ −3 ). This implies that this distribution has a finite mean, but does not have a finite second moment. Its cumulative distribution function (CDF) can be calculated also in closed form, For θ < 0, symmetry can be used for finding the CDF.

Cauchy-Scaled Beta2 distribution: an explicit horseshoe distribution
Now, instead of a normal, let θ be a Cauchy random variable with location parameter 0 and scale parameter τ , and assume τ ∼ SBeta2(p, q, b). Then, the joint distribution of θ and τ is given by The marginal distribution of θ can be calculated as Note that when p and q are integers, the integrand is a rational function. For example, if p = q = 1, Note that this density function depends on θ only through |θ|, and so it is clearly symmetric around 0.
In this case, the cumulative distribution function also has a closed form, given by For θ < 0, symmetry can be used for finding the CDF.
The density functions of Cauchy-Scaled Beta2 variables for p = q = 1 and different values of b are shown in Figure 2. Larger values of b are associated with less probability mass around the origin. This distribution has a pole at θ = 0 and flat tails, which makes it an example of a horseshoe prior (Carvalho et al., 2010). To the best of our knowledge, this is the first horseshoe prior in explicit algebraic form. Figure 3 compares densities for quartile matching ($|q_1| = q_3 = 1$) Normal, Cauchy, Normal-SBeta2 (p = q = 1) and Cauchy-SBeta2 (p = q = 1) distributions. The heaviest tails correspond to the Cauchy-SBeta2(1,1,1), while the Normal-SBeta2 tails are lighter than those of the Cauchy. Other choices of the hyperparameters may lead to a closed form marginal but not necessarily a horseshoe prior. For instance, for p = 2 and q = 1, we obtain This density does not have a pole at θ = 0. The corresponding CDF is Again, symmetry can be used for calculating the CDF value when θ < 0.
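For the case p = q = 1, the rational integrand and the pole at the origin can be explored numerically. The sketch below is an illustration under stated assumptions: the Cauchy(0, τ) density is τ/(π(τ² + θ²)), the SBeta2(1, 1, b) density is b/(b + τ)², and the closed form in `cauchy_sbeta2_closed` was re-derived here by partial fractions rather than copied from the paper's display, so it should be read as a candidate expression that the quadrature is used to check. All function names are hypothetical.

```python
import math


def cauchy_sbeta2_numeric(theta, b, half_width=40.0, step=0.01):
    """Trapezoid rule in u = log(tau) for the marginal density with p = q = 1."""
    a2 = theta * theta
    n = int(round(2.0 * half_width / step))
    total = 0.0
    for i in range(n + 1):
        tau = math.exp(-half_width + i * step)
        # Cauchy(0, tau) density * SBeta2(1, 1, b) density * (d tau = tau du)
        value = (tau / (math.pi * (tau * tau + a2))) * (b / (tau + b) ** 2) * tau
        total += value if 0 < i < n else 0.5 * value
    return total * step


def cauchy_sbeta2_closed(theta, b):
    """Candidate closed form from partial fractions (p = q = 1, theta != 0)."""
    a = abs(theta)
    s = a * a + b * b
    numerator = math.pi * a * b - s + (b * b - a * a) * math.log(b / a)
    return (b / math.pi) * numerator / (s * s)


for theta in (0.1, 1.0, 10.0):
    # The two columns should agree to several digits if the derivation is right.
    print(theta, cauchy_sbeta2_numeric(theta, 1.0), cauchy_sbeta2_closed(theta, 1.0))
```

The logarithmic term is what produces the pole: as θ approaches 0 the candidate expression grows like log(b/|θ|), while for large |θ| it decays like log|θ|/θ², slightly more slowly than a Cauchy density.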

Assigning a SBeta2 prior to the square of the scale: a general result
As before, suppose that Cauchy or, more generally, Student-t distributions are assumed for locations. What if instead of the scale, the square of the scale is assumed to be SBeta2? For example, In this case, the marginal for θ is: This is an interesting distribution, close to a Cauchy. It does not have a pole at zero, so it is not a horseshoe prior.
In Fúquene et al. (2014) this distribution, called Student-t-Beta2, is studied and applied in detail. There a general result for the marginal of the location for any p and q is obtained in terms of the Hypergeometric Function, as summarized in Supplementary Appendix 2.

Examples
Here we analyze three datasets found in the literature and some simulated data. The first dataset is the "8-schools example" presented in Gelman (2006), where it is shown that the SBeta2 behaves sensibly in the sense that it does not promote very small variances and large shrinkages. In the second example, we use data from Normand and Shahian (2007) to illustrate that the SBeta2 promotes robustness in the hierarchical model. In the third example we revisit the famous baseball dataset in Efron and Morris (1972) and we robustly predict the batting averages of 18 baseball players, protecting the exceptional players from too much shrinkage to the mean (the so-called "Clemente Problem", Efron, 2010) and at the same time improving the mean squared error (MSE). Finally, we use simulated data to compare the SBeta2 with the half-Cauchy proposed by Gelman for the schools example and find that the SBeta2(1, 1, b) seems preferable in this setting.

A normal hierarchical model
In this section we will consider the normal hierarchical model described by Gelman (2006) in the so called "8-schools example". We wish to compare the changes in the posterior distribution of the precision when using either the Inverse-Gamma or the SBeta2 as prior distributions for the random effects variance.
Gelman works with a simple two-level normal model of data $y_{ij}$ with group-level effects $\alpha_j$:
$$y_{ij} \sim N(\mu + \alpha_j, \sigma_y^2), \quad \alpha_j \sim N(0, \sigma_\alpha^2), \quad i = 1, \ldots, n_j, \; j = 1, \ldots, J, \tag{6}$$
where the parameters $\alpha_1, \ldots, \alpha_8$ represent the relative effects of Scholastic Aptitude Test coaching programs in eight different schools, and $\sigma_\alpha$ stands for the between-school standard deviation of these effects. The effects are measured in points within a range between 200 and 800. The approximate average and standard deviation are 500 and 100 respectively. The model has three hyperparameters $\mu$, $\sigma_y$, and $\sigma_\alpha$. Here we will only study the effect of the prior distribution for the variance of the random effects, $\sigma_\alpha^2$. Gelman proposes a half-Cauchy(25) as prior distribution for $\sigma_\alpha$, which corresponds to a SBeta2(0.5, 0.5, 625) prior distribution for $\sigma_\alpha^2$. Therefore, for Bayesian estimation, the model is fitted with three different prior distributions for $\sigma_\alpha^2$: SBeta2(1, 1, 625), SBeta2(0.5, 0.5, 625) and Inverse-Gamma(0.001, 0.001).
Results in Figure 4 are based on 6000 iterations from a model fit using OpenBUGS (Thomas et al., 2006), and correspond to the posterior distribution for $\sigma_\alpha$ obtained with each of these three prior distributions. The left and middle histograms show the posterior distributions for $\sigma_\alpha$ using priors SBeta2(0.5,0.5,625) and SBeta2(1,1,625), respectively. We can observe that the range of values is mainly between 0 and 20 with a light tail after this last value.
The histogram on the right shows the posterior distribution for the same parameter using an Inverse-Gamma(0.001, 0.001) prior distribution. We can see that the range of values for $\sigma_\alpha$ is concentrated in a short interval near 0 (0 to 5), and the posterior has a sharp peak near zero. This is the anomalous behavior highlighted by Gelman (2006), which is not present when the SBeta2 distributions are used as priors.
It can be seen that the SBeta2 distribution works properly when the analysis is performed for all 8 schools. Gelman (2006) comments that some problems could arise when the number of groups J is small because the data give little information about the variance between groups. In the analysis of the schools example Gelman includes only data for the first three schools, and uses the uniform and the half-Cauchy as prior distributions for $\sigma_\alpha$. He concludes that the half-Cauchy gives good results in this example with plausible posterior values for $\sigma_\alpha < 50$. However, when a uniform prior distribution is used, the posterior distribution for $\sigma_\alpha$ presents an extremely long right tail with values for $\sigma_\alpha$ too high to be reasonable for this example, and therefore its use could result in an "under-shrinking" of the estimates for the effects $\alpha_j$. Figure 5 shows the histograms of the posterior distributions for $\sigma_\alpha$ with prior distributions SBeta2(0.5,0.5,625) and SBeta2(1,1,625) when only data from the first three schools are used. We see that they present a range of plausible values for $\sigma_\alpha$ between 0 and 50; after this last value, the posterior densities decrease rapidly. Like the half-Cauchy, the SBeta2 prior distribution with p = q = 1 performs well in this example because it shows plausible values for $\sigma_\alpha$ without the presence of a heavy right tail.
Figure 5: Histograms of posterior simulations of the between-school standard deviation, $\sigma_\alpha$, from models with two different prior distributions: (i) SBeta2(0.5,0.5,625), (ii) SBeta2(1,1,50) and data for the first three schools.

A tale of unwanted attraction, or how "naïve" hierarchical Bayes pulls up perfect hospitals
In an application of Bayesian hierarchical models to hospital profiling, Normand and Shahian (2007) analyze data for 30-day mortality following isolated coronary artery bypass grafting (CABG) surgery in 13 non-governmental hospitals in Massachusetts, USA. In their work, it can be seen that, unfortunately, random effects of hospitals with no deaths are subject to large shrinkage with non-robust hierarchical models. This is an instance of too much shrinkage: a perfect hospital (no deaths) is pulled strongly towards the general mean regardless of its exceptional quality. The assumption of a vague Inverse-Gamma produces a hierarchical model that does not predict outliers, very bad or very good hospitals, and thus implies large corrections for outlying values.
In that sense the selected model is non-robust. Then we should change the assumptions if a robust behavior is desired. Notice also that a robust model is fairer with exceptional individuals: the amount of shrinkage is not constant, but depends on performance.
We revisit the data shown in Table 1 without using explanatory data (which is not available). Additionally, we enlarge the sample size of one of the hospitals with no deaths to the average size of all hospitals, as an alternative scenario to explore the differences between different models.
We will focus on the probability of death for each hospital, $\theta_i$, and its corresponding log-odds, $\beta_i$. We also calculate the predictive probability of 0 deaths for the following 100 patients for each hospital.
Table 1 (Normand and Shahian, 2007). Hospital 5 was changed from 26 patients to the approximate mean number of patients, 350.
• Model 3: A Student-t distribution with 2 degrees of freedom is used as a prior for the log-odds instead of a Normal.
In Models 2 and 3, we choose a conventional SBeta2(1, 1, 1) as a plausible nearly objective model, since it is symmetric in the information of the scale and its reciprocal. We may add that this choice also makes sense from an Empirical Bayes approach, since the observed variance of the log-odds (from the modified hospital data) is around 0.8, close to the assessed median of 1. Figure 6 shows 95% posterior probability intervals for the log-odds and the probabilities of death for each hospital, and the posterior probabilities of 0 deaths for 100 patients for each hospital are shown in Figure 7. For the fully robust Model 3, the inference for the "perfect" hospital 5 is the most reasonable, followed by Model 2. The assumption of a flat-tailed distribution for locations, such as the Cauchy (which is widely accepted as a more robust model in a setting like this), is made even more robust by the assumption of a Scaled Beta2 for the scale.

The Clemente problem
Efron and Morris (1972) obtained a sample of batting averages for 18 baseball players for the 1970 season. They used the average obtained during the first 45 at bats for predicting the batting average of each player for the rest of the season. The initial assumption about the data is  where $Y_i$ is the batting average for the first 45 at bats, and $p_i$ depends on each player's ability.
The batting average for the rest of the season, $R_i$, can be modeled as where $n_i$ is the number of at bats for player i during the remainder of the season.
Efron and Morris applied a variance stabilizing transformation to $Y_i$, so that $X_i \sim N(\mu_i, 1)$, with $\mu_i$ approximately equal to the transformed value of $p_i$. In the sequel, we will use this transformed variable.
In his talk "The Future of Indirect Evidence" at the Objective Bayes Conference in June 2009, Professor Bradley Efron posed a fundamental problem: "The Clemente Problem: How to protect atypical cases from too much indirect evidence?". Professor Efron is referring to the Puerto Rican sportsman Roberto Clemente, an outstanding batter and human being, who had the highest batting average of the list of 18 players. After the first 45 at bats, Clemente had a score of 400 (40% of hits). Even though shrinking to a general mean improves the overall prediction of the 18 batters, for Clemente under the conjugate prior a batting average of 282 is predicted (see Table 2). The atypical Clemente was not protected from "too much of a good thing", and his personal prediction was very poor: he finished with a score of 346, much higher than predicted. The problem lies in the fact that the usual method shrinks all players equally. This problem extends beyond location parameters: formula (2.10) in Diaconis and Ylvisaker (1979) shows that if the prior for an Exponential Family parameter is chosen in the usual conjugate family, its posterior mean is a linear combination of the prior expectation and the arithmetic mean. This implies a serious lack of robustness, since the shrinkage rate is constant regardless of the conflict between prior expectations and sample means.
As an illustration, assume that h is the precision parameter of Normal data with known mean. Applying (2.3) in Diaconis and Ylvisaker (1979), the conjugate prior of the precision is a Gamma distribution with prior location $W_0$ and "prior sample size" $n_0$. It turns out that the posterior mean of the mean parameter W can be written as: It is clear then that the rate of shrinkage $n_0/(n_0+n)$ is constant, regardless of the conflict $\bar{W} - W_0$. So any Exponential Family parameter shares the same behavior when conjugate priors are employed, regardless of whether the parameter is a location, a scale, etc. The goal, then, is a model that shrinks the exceptional cases less, without inflating the mean square error of prediction. We fit the model for different hyperparameters (p, q, b). We used one of the assessment strategies discussed in Section 2, selecting b such that $P(\sigma^2 > 1.5) = 0.1$ (the value 1.5 was chosen using the empirical variance for the transformed data, $s^2 = 1.116$). Under this condition, the values b = 0.17 for p = q = 1 and b = 0.038 for p = q = 1/2 were elicited. The former prior leads to a reduction of MSE of 8.4% over the conjugate, while the SBeta2(1/2,1/2,0.038) prior leads to a lesser reduction of 5%, so here again the SBeta2(1,1,b) performed slightly better than the half-Cauchy (though it can be argued that in both cases the shrinkage of Clemente is still excessive). However, assigning higher values to b, as in a SBeta2(1,1,1), arguably leads to a satisfactory reduction in mean square error of almost 7% simultaneously with less shrinkage of the extreme values, Clemente and Alvis (see Table 2). It should be noted that the SBeta2 is somewhat sensitive to the assessed values of the scale b. Also note that the improvement in the overall mean squared error does not imply that all the individual predictions are better.
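The elicited scales can be reproduced with a short calculation. The sketch below uses the scaled-odds identity P(ψ > a) = P(θ > a/(a + b)) with θ ∼ Beta(p, q) and solves for b by bisection. It is an illustration only: the function names are hypothetical, and the survival function is coded only for the two cases used in the text, where the Beta CDF has an elementary form (uniform for p = q = 1, arcsine law for p = q = 1/2).

```python
import math


def sbeta2_survival(a, p, q, b):
    """P(psi > a) for psi ~ SBeta2(p, q, b); closed forms for two cases only."""
    x = a / (a + b)  # psi > a is equivalent to theta > a / (a + b)
    if p == 1 and q == 1:
        return 1.0 - x  # Beta(1, 1) is uniform on (0, 1)
    if p == 0.5 and q == 0.5:
        return 1.0 - (2.0 / math.pi) * math.asin(math.sqrt(x))  # arcsine law
    raise NotImplementedError("only p = q = 1 and p = q = 1/2 are covered")


def solve_scale(a, c, p, q, lo=1e-12, hi=1e12, iters=200):
    """Bisection on log(b): the survival probability increases with b."""
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if sbeta2_survival(a, p, q, mid) < c:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


b_one = solve_scale(1.5, 0.1, 1, 1)
b_half = solve_scale(1.5, 0.1, 0.5, 0.5)
print(round(b_one, 2), round(b_half, 3))  # 0.17 0.038, the values quoted above
```

For other values of (p, q) the same bisection applies once the survival function is replaced by a general Beta CDF from statistical software.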

Simulation study
A simulation study was performed in order to analyze the performance of the SBeta2 as a prior distribution for scales in hierarchical models in different scenarios. Back in the setting of the 8-schools example, we generated data according to the model in equation (6). Values for the effects $\alpha_j$ were fixed according to three scenarios: all similar in magnitude, a few medium outliers and one large outlier. We simulated data for J = 3, 4, 5, 6, 7, 8, 9 and 10. The value for $\sigma_y^2$ is known and we want to determine the effect of the prior distribution of $\sigma_\alpha^2$ on the estimation error of the effects $\alpha_j$. For each fixed value of $\alpha_j$ and $\sigma_y^2$ we generated 1000 samples from a normal distribution for the error terms. We fitted the model using five different prior distributions for $\sigma_\alpha^2$: SBeta2(1,1,625), SBeta2(0.5,0.5,625), Inverse-Gamma(0.001,0.001), SBeta2(1,1,b) and SBeta2(0.5,0.5,b), where values for b were assigned such that $P(\sigma_\alpha^2 > \text{Var}(\alpha_j)) = 0.5$ (and therefore $b = \text{Var}(\alpha_j)$). We carried out 10000 MCMC simulations with a burn-in of 2000 for each case.
In order to study the estimation error we computed the global estimation error where $\hat{\alpha}_{kj}$ is the posterior mean of $\alpha_j$ calculated for simulated dataset k.
Table 3 shows the results for the first scenario, where all $\alpha_j$'s are similar in magnitude. The values for these effects were set within a range of ±0.5 with $\text{Var}(\alpha_i)$ = 0.093, 0.067, 0.150, 0.135, 0.130, 0.156, 0.138, 0.178 for J from 3 to 10, respectively. We observe that for each J the global estimation error is smallest when the model is fitted using SBeta2(1, 1, $\text{Var}(\alpha_j)$) as prior distribution for $\sigma_\alpha^2$, followed by the fit with SBeta2(0.5, 0.5, $\text{Var}(\alpha_j)$). The distributions that exhibit the largest errors are the SBeta2 with b = 625; this seems reasonable since we are assigning a prior distribution with a large scale parameter in a situation where the effects are similar in magnitude, and therefore their variance is small.
Table 4 shows the results for the second scenario: a few medium outliers. The majority of values for the effects were set within the range ±0.5, and a few in the range ±2 and ±3. One medium outlier was assigned for J = 3, 4, 5, two medium outliers for J = 6, 7 and three medium outliers for J ≥ 8, with $\text{Var}(\alpha_j)$ = 1.803, 1.200, 1.332, 2.647, 2.246, 2.876, 2.520, 2.299 for J from 3 to 10, respectively. The global estimation error is smallest with the SBeta2(1, 1, $\text{Var}(\alpha_j)$) prior, except in the case J = 10. For J ≥ 6 (with more than one medium outlier) using the Inverse-Gamma prior leads to larger global estimation errors.
Table 5 shows the results for the third scenario: one large outlier. Again the values for all effects were set within a range ±0.5 except one, which was assigned an absolute value greater than 5. The variance for the random effects is $\text{Var}(\alpha_i)$ = 9.403, 6.650, 6.400, 5.180, 3.878, 4.039, 3.436, 3.290 for J from 3 to 10, respectively.
The largest global error corresponds to the Inverse-Gamma(0.001, 0.001)
We calculated 95% highest posterior density intervals for the effects in each of the simulations corresponding to the three scenarios. With these intervals we calculated the coverage rate. When all effects are similar in magnitude the coverage rates are almost equal regardless of the prior employed. For the second scenario (a few medium outliers) the coverage rates for medium outliers change with the prior, and they are smallest when the Inverse-Gamma(0.001,0.001) is used. For example, Table 6 shows the coverage rates when J = 8, where $\alpha_j$ = −0.3, 0.1, 0.3, −0.2, 0.5, 2.5, −2.6, 2.8. It can be seen that when the effects are similar in magnitude the coverage rates are almost equal but for $\alpha_6$, $\alpha_7$ and $\alpha_8$ the coverage rates are smaller. The biggest coverage rate for these three medium outliers is obtained with the SBeta2(1,1,625)
Consider the case when J = 3 and the scenario is one large outlier. In this situation we selected $\alpha_j$ = −0.3, 0.1, and 5.2, as commented before. Table 7 shows that the coverage rate for the large outlier obtained with any of the SBeta2 distributions as prior on $\sigma_\alpha^2$ is greater than the one obtained using the Inverse-Gamma(0.001,0.001). Similar results are obtained for other values of J: the coverage rates are almost equal when the effects are similar in magnitude and smaller when the model is fitted using the Inverse-Gamma(0.001,0.001) prior. Even though the main intention in this subsection is to compare SBeta2 with different parameters and Inverse-Gamma, it is to be expected that assuming t-priors for the random effects would improve the robustness of the methodology, as in the previous subsections.
Table 7: Simulation study: Coverage rate for J = 3 in the scenario with one large outlier. Columns give the prior distribution on $\sigma_\alpha^2$.
j | Inverse-Gamma(0.001,0.001) | SBeta2(0.5,0.5,625) | SBeta2(1,1,625) | SBeta2(0.5,0.5,V(α)) | SBeta2(1,1,V(α))
1 | 100 | 100 | 100 | 100 | 100
2 | 100 | 100 | 100 | 100 | 100
3 | 87.0 | 99.4 | 100 | 92.5 | 94.9

Final remarks
In this article we put forward the idea of the Scaled Beta2 as a standard family for prior distributions of scales for both hierarchical and non-hierarchical models. The Scaled Beta2 distribution is naturally motivated, flexible and amazingly tractable. For specific values of hyperparameters, it leads to closed form results, and generalizes previous proposals in the literature like a half-Cauchy for standard deviations. For ranges of