Confidence intervals for standardized effect sizes: Theory, application, and implementation

The behavioral, educational, and social sciences are undergoing a paradigmatic shift in methodology, from disciplines that focus on the dichotomous outcome of null hypothesis significance tests to disciplines that report and interpret effect sizes and their corresponding confidence intervals. Due to the arbitrariness of many measurement instruments used in the behavioral, educational, and social sciences, some of the most widely reported effect sizes are standardized. Although forming confidence intervals for standardized effect sizes can be very beneficial, such confidence interval procedures are generally difficult to implement because they depend on noncentral t, F, and χ² distributions. At present, no mainstream statistical package provides exact confidence intervals for standardized effects without the use of specialized programming scripts. Methods for the Behavioral, Educational, and Social Sciences (MBESS) is an R package that has routines for calculating confidence intervals for noncentrality parameters of t, F, and χ² distributions, which are then used in the calculation of exact confidence intervals for standardized effect sizes by means of the confidence interval transformation and inversion principles. The present article discusses the way in which confidence intervals are formed for standardized effect sizes and illustrates how such confidence intervals can be easily formed using MBESS in R.


Introduction
In the behavioral, educational, and social sciences (BESS), units of measurement are many times arbitrary, in the sense that there is no necessary reason why the measurement instrument is based on a particular scaling. Many, but certainly not all, constructs dealt with in the BESS are not directly observable, and the instruments used to measure such constructs do not generally have a natural scaling metric, as do many measures in, for example, the physical sciences. In the BESS, a large debate has been underway for some time about the importance of reporting effect sizes and confidence intervals (e.g., Schmidt 1996; Meehl 1997; Thompson 2002; Cohen 1994; Kline 2004, and the references contained therein) rather than only the dichotomous reject or fail-to-reject decision from a null hypothesis significance test (see Krantz 1999, for a review of the tension that sometimes exists between statisticians and methodologists regarding this debate). On the surface, it seems there is no reason not to report effect sizes and their corresponding confidence intervals. However, effect sizes based on raw scores are not always helpful or generalizable, due to the lack of natural scaling metrics and to multiple scales existing for the same phenomenon in the BESS. A common methodological suggestion in the BESS is to report standardized effect sizes in order to facilitate the interpretation of results and the cumulation of scientific knowledge across studies, which is the goal of meta-analysis (e.g., Hunter and Schmidt 2004; Glass, McGaw, and Smith 1981; Hedges and Olkin 1985). A standardized effect size is an effect size that describes the magnitude of an effect but does not depend on any particular measurement scale. A standardized effect size thus represents a pure number, in the sense that the magnitude of the effect is not wedded to a particular scale. The act of standardization is not, however, generally based on a set of known population parameters; rather, the standardization process is based on sample statistics. This is the case for two reasons: (a) many measures have not been normed in order to determine the population parameters of interest for the population in general or for a subpopulation of interest, and (b) it is generally desirable to base the standardized effect size on the particular sample rather than mixing population parameters with sample statistics.

CI construction
Seminal work by Neyman (1935; 1937) laid the foundation for confidence interval formation. Recall that the general procedure for a confidence interval yields an upper and lower limit, such that the probability that the fixed parameter is contained within a random interval is 1 − α, where α is the Type I error rate and 1 − α is the confidence level coverage. The general form of the confidence interval is given as

p(θ_L(X) ≤ θ ≤ θ_U(X)) = 1 − α,    (1)

where θ is some parameter of interest, θ_L(X) and θ_U(X) are the lower and upper random confidence limits, respectively, which are based on the observed data X, and p denotes probability. For notational ease θ_L(X) and θ_U(X) will be denoted θ_L and θ_U, respectively, with the understanding that the lower and upper confidence limits are random because they depend on the random data.
Two sections follow that discuss the ways in which θ_L and θ_U can be calculated for confidence intervals for different types of effect sizes commonly used in the BESS. The first approach is the standard approach that is generally given in applied texts and implemented in statistical software programs. The second approach, however, is more difficult conceptually and computationally than the first and is not generally discussed in applied texts nor implemented in statistical software without specialized programming scripts. The second approach is ultimately what is of interest in the present work, with the first approach given to provide a context for the second.

CIs for pivotal quantities
Suppose the population standard deviation is known for a normally distributed population with some unknown mean. For a sample of size N, the following inequality holds with 1 − α probability:

p(z_(α/2) ≤ (M − µ)/σ_M ≤ z_(1−α/2)) = 1 − α,    (2)

where M is the sample mean, σ_M is the population standard deviation of the sampling distribution of the mean, defined as σ/√N, and z_(•) represents the quantile from the standard normal distribution at the subscripted probability value. The inequality contained within the brackets can be manipulated by first multiplying the inequality by σ_M, which removes the value in the denominator of the inequality's center:

p(z_(α/2) σ_M ≤ M − µ ≤ z_(1−α/2) σ_M) = 1 − α.    (3)

The value of M from the center of the inequality can be removed by subtracting M from the center and both sides:

p(z_(α/2) σ_M − M ≤ −µ ≤ z_(1−α/2) σ_M − M) = 1 − α.    (4)

Multiplying the inequality by −1 to make −µ positive (requiring the inequalities to be reversed, as is the case when inequalities are multiplied by a negative value), the resultant equation can be given as

p(M − z_(α/2) σ_M ≥ µ ≥ M − z_(1−α/2) σ_M) = 1 − α.    (5)

Further manipulation of the inequality yields

p(M − z_(1−α/2) σ_M ≤ µ ≤ M − z_(α/2) σ_M) = 1 − α.    (6)

Because z_(α/2) is always negative and the normal distribution is symmetric, −1 × z_(α/2) is equivalent to z_(1−α/2), and the right-hand side of the inequality thus reduces to M + z_(1−α/2) σ_M, which implies that Equation 6 can be written as

p(M − z_(1−α/2) σ_M ≤ µ ≤ M + z_(1−α/2) σ_M) = 1 − α.    (7)

Although often optimal, in the sense that the confidence interval is as narrow as possible (Casella and Berger 2002), there is no reason to restrict confidence intervals to those where the lower and upper rejection regions are equal. The specified α can be conceptualized as consisting of two parts, α_L and α_U, where α_L is the rejection region for the lower limit and α_U the rejection region for the upper limit (i.e., the proportion of the time that the parameter will be less than the lower confidence limit or greater than the upper confidence limit, respectively). Thus, more generally, the confidence interval formation given in Equation 7 can be written as

p(M − z_(1−α_L) σ_M ≤ µ ≤ M + z_(1−α_U) σ_M) = 1 − (α_L + α_U),    (8)

or for convenience

CI_(1−α) = [M − z_(1−α_L) σ_M ≤ µ ≤ M + z_(1−α_U) σ_M],    (9)

where α_L + α_U = α. The confidence interval method given in Equation 9, or a closely related rewriting, is what is typically given in applied texts. Generally discussed is only one special case of nonequal rejection regions where α_L ≠ α_U: one-sided confidence intervals, where α_L or α_U is set to zero.
The equations above each assumed that σ was known. In almost all applications σ is unknown, and it is necessary to use a (central) t-distribution with the appropriate degrees of freedom instead of basing the confidence interval on critical values from the standard normal distribution. With unknown σ, the analog of Equation 2 would be

p(t_(α/2;ν) ≤ (M − µ)/s_M ≤ t_(1−α/2;ν)) = 1 − α,    (10)

where s_M is the estimated standard deviation of the sampling distribution of the mean, defined as s/√N with s being the square root of the unbiased estimate of the variance, ν are the degrees of freedom, which are N − 1 in the context of single sample designs when σ is unknown, and t_(α/2;ν) is the α/2 quantile from a t-distribution with ν degrees of freedom. Through a set of manipulations and reductions analogous to Equations 2 through 8, a (1 − α)100% confidence interval can be obtained for µ when σ is unknown, which is given as

p(M − t_(1−α_L;ν) s_M ≤ µ ≤ M + t_(1−α_U;ν) s_M) = 1 − (α_L + α_U),    (11)

or for convenience

CI_(1−α) = [M − t_(1−α_L;ν) s_M ≤ µ ≤ M + t_(1−α_U;ν) s_M].    (12)

Notice that in Equation 2 the quantity in the center of the inequality is simply a z-test statistic, whereas in Equation 10 the quantity in the center of the inequality is simply a t-test statistic. The logic of transforming a probabilistic statement for a particular z-test or t-test statistic into a confidence interval for µ was possible in the manner done because the center of the inequality could be reduced to only the population parameter of interest (i.e., µ) and the interval did not depend on any unknown parameters. This procedure used to transform the probability statement into a confidence interval is known as inverting the test statistic (Casella and Berger 2002, Section 9.2.1; see also Kempthorne and Folks 1971, Section 13.3), because the α100% region of implausible parameter values (i.e., where p < α under the null hypothesis) is inverted to form the (1 − α)100% region of plausible parameter values (i.e., where p > α under the null hypothesis). Confidence intervals can be formed by inverting the test statistic when the quantity of interest is a pivotal quantity. A pivotal quantity, sometimes termed a pivot, is a quantity whose probability distribution does not depend on any unknown parameters (e.g., Casella and Berger 2002, Chapter 9; Stuart, Ord, and Arnold 1999, Chapter 19). When the test statistic cannot be transformed into a confidence interval by reducing the probabilistic statement concerning the test statistic into a probabilistic statement concerning the parameter, implying the quantity is not pivotal, a more general approach to confidence interval formation is required. This more general approach is discussed in the next section.
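The pivotal-quantity interval just described is straightforward to compute. The following sketch does so in Python with SciPy; this is purely illustrative (MBESS itself is an R package), and the function name ci_mean and its argument names are the sketch's own conventions, not part of any package.

```python
from math import sqrt
from scipy.stats import t

def ci_mean(M, s, N, alpha_lower=0.025, alpha_upper=0.025):
    """Confidence interval for the population mean by inverting the
    (central) t statistic; allows unequal rejection regions, where
    alpha_lower + alpha_upper = alpha."""
    nu = N - 1                # degrees of freedom in a single sample design
    s_M = s / sqrt(N)         # estimated standard error of the mean
    lower = M - t.ppf(1 - alpha_lower, nu) * s_M
    upper = M + t.ppf(1 - alpha_upper, nu) * s_M
    return lower, upper
```

For M = 50, s = 10, and N = 25, the half-width of the equal-tails 95% interval is t_(.975;24) × s_M, about 2.06 × 2.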

CI formation for nonpivotal quantities
Although confidence intervals based on commonly used pivotal quantities (e.g., for the mean, mean difference, variance, regression coefficients, etc.) are well known, many effects of interest in the BESS are not pivotal. In particular, standardized effect sizes (e.g., standardized mean differences, standardized regression coefficients, coefficients of variation, etc.) and effect sizes that are bounded (e.g., correlation coefficients, squared multiple correlation coefficients, proportions, etc.), which are considered standardized since they do not depend on the particular measurement scale, are not generally pivotal quantities. Thus, confidence intervals for such effects cannot be obtained by inverting their corresponding test statistics, which was the method discussed in the previous section for a quantity that was pivotal. Many applied statistics texts either do not mention confidence intervals for such nonpivotal quantities or provide only approximate methods of confidence interval formation, sometimes without mentioning that the methods are approximations.
Confidence interval formation by inverting the test statistic requires that the effect of interest be a pivotal quantity. If not, another, more general approach is required. This more general approach to confidence interval formation has been termed pivoting the cumulative distribution function (Casella and Berger 2002, Section 9.2.3; see also Kempthorne and Folks 1971, Section 13.4). When a quantity related to a test statistic is not pivotal, the sampling distribution of the estimate depends on an outside parameter that is almost certainly not known, which implies that the test statistic cannot be inverted. The solution to such a problem is to find the value of the unknown parameter that leads to the observed test statistic having cumulative probability 1 − α_L, which becomes the lower confidence limit of the parameter, and to find the value of the unknown parameter that leads to the observed test statistic having cumulative probability α_U, which becomes the upper confidence limit of the parameter. For effects of interest in the BESS, these unknown parameter values are generally noncentrality parameters (e.g., Steiger and Fouladi 1997; Cumming and Finch 2001; Smithson 2003; Steiger 2004).
For example, forming a confidence interval for µ when σ is unknown was shown in Equations 11 and 12. However, suppose what is of interest is forming a confidence interval for the population standardized mean,

ℳ = µ/σ,    (13)

which is estimated by replacing the parameters with their sample analogs,

m = M/s,    (14)

where m is the sample estimate of ℳ, the population standardized mean. Because the quantity (M − µ)/s cannot be pivoted, as s is necessary to standardize M, rewriting Equation 10 yields

p(t_(α/2;ν)/√N ≤ (M − µ)/s ≤ t_(1−α/2;ν)/√N) = 1 − α,    (15)

where the √N was removed from the center of the inequality by multiplying the inequality by 1/√N. The lack of pivotability of the quantity is due to s necessarily being involved in the center of the inequality, since it is what standardizes the mean. Furthermore, the population effect size ℳ is not contained within the center of the inequality. Because the test statistic cannot be pivoted, it is necessary to pivot the cumulative distribution function of the test statistic itself. Before pivoting the cumulative distribution function, a discussion of noncentral distributions is necessary.
The widely used (central) t-distribution is a special case of the more general noncentral t-distribution, where the noncentrality parameter equals zero. Johnson, Kotz, and Balakrishnan (1995) provide a modern overview of noncentral t-distributions (see also Johnson and Welch 1940). From Johnson et al. (1995), in the single sample situation when the distribution is normal and σ unknown, the population noncentrality parameter from a noncentral t-distribution can be written as

λ = (µ − µ0)√N/σ,    (16)

where µ0 is the specified value of the null hypothesis, which will be set to zero in the present context without loss of generality. Thus, there is a one-to-one relationship between the nonpivotable quantity in Equation 10 and the noncentrality parameter from a t-distribution. Although a known probability distribution does not literally exist for (M − µ)/s, it is indirectly available via the noncentral t-distribution, which is denoted t_(ν;λ), where ν are the degrees of freedom and λ is the noncentrality parameter. The noncentrality parameter, which can be conceptualized as an index of the magnitude of the difference between the null and alternative hypotheses, can be estimated as

λ̂ = M√N/s,    (17)

or equivalently as

λ̂ = m√N,    (18)

where Equations 17 and 18 are equivalent to the observed t-test statistic:

t = M√N/s = λ̂.    (19)

Given the linkage between the test statistic, the standardized mean, and the noncentral t-distribution, it is helpful to discuss two important principles when forming confidence intervals for standardized effect sizes. These principles are the confidence interval transformation principle and the inversion confidence interval principle, both of which will be discussed momentarily. The names of these principles were coined and the concepts discussed in Steiger and Fouladi (1997), with a review given in Steiger (2004). Steiger and Fouladi (1997) did not literally develop the theory behind these principles; rather, their important work combined disparate statistical theory into a formal context for forming confidence intervals. When these principles are combined, a set of powerful tools for confidence interval formation becomes available for standardized effect sizes that are related to a noncentrality parameter. Almost all of the effects commonly used in the BESS can be linked to a noncentrality parameter from t, F, or χ² distributions. Methods for forming confidence intervals for noncentrality parameters from t, F, and χ² distributions have been implemented in the Methods for the Behavioral, Educational, and Social Sciences (MBESS; Kelley 2007a,b) R package (R Development Core Team 2007).
The confidence interval transformation principle is beneficial for forming a confidence interval on a parameter that is monotonically related to another parameter, when the latter has a tractable method of obtaining the confidence interval and the former might not. Let f(θ) be a monotonic (increasing) transformation of θ. Then

p(f(θ_L) ≤ f(θ) ≤ f(θ_U)) = 1 − α.    (20)

Thus, for monotonically related parameters, the confidence interval for the transformed population quantity is obtained by applying the same transformation to the limits of the confidence interval as was applied to the population quantity itself (Steiger and Fouladi 1997; Steiger 2004).
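The transformation principle can be demonstrated with a case where the exact interval is classical: a confidence interval for σ² follows from the χ² pivot (N − 1)s²/σ², and because the square root is monotonic, taking the square root of those limits yields an exact interval for σ. The sketch below is in Python with SciPy, purely for illustration; the function name ci_variance is the sketch's own.

```python
from math import sqrt
from scipy.stats import chi2

def ci_variance(s2, N, conf_level=0.95):
    """CI for the population variance via the chi-square pivot:
    (N - 1) s^2 / sigma^2 follows a chi-square distribution with N - 1 df."""
    nu = N - 1
    alpha = 1 - conf_level
    lower = nu * s2 / chi2.ppf(1 - alpha / 2, nu)
    upper = nu * s2 / chi2.ppf(alpha / 2, nu)
    return lower, upper

# Transformation principle: sigma = sqrt(sigma^2) is monotonic, so the CI
# for sigma is simply the square root of the CI limits for sigma^2.
lo2, hi2 = ci_variance(100.0, 25)   # s^2 = 100 (s = 10), N = 25
lo, hi = sqrt(lo2), sqrt(hi2)
```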
The inversion confidence interval principle states that if θ̂ is an estimate of θ with a cumulative distribution that depends on ς, some necessary outside parameter, then the probability of observing an estimate smaller than that obtained is given as p(θ̂|ς) (i.e., it is the cumulative probability). Calculation of a confidence interval for θ based on the inversion confidence interval principle involves finding θ_L such that p(θ̂|θ_L) = 1 − α_L for the lower limit and θ_U such that p(θ̂|θ_U) = α_U for the upper limit. The confidence interval for θ then has coverage of 1 − (α_L + α_U) and is given as

CI_(1−α) = [θ_L ≤ θ ≤ θ_U],    (21)

where θ is some parameter of interest with θ_L and θ_U being the lower and upper confidence interval limits, such that θ_L will be greater than θ α_L100% of the time and θ_U will be less than θ α_U100% of the time. The confidence interval procedure is general and need not have equal rejection regions. For example, the most common confidence interval without equal rejection regions is obtained by setting α_L or α_U (whichever is appropriate for the specific situation) to zero, which is simply a one-sided confidence interval.
Returning to the example of confidence interval formation for the standardized mean, it now becomes apparent that a λ_L value can indeed be found such that p(t|λ_L) = 1 − α_L and a λ_U value can be found such that p(t|λ_U) = α_U. Given the values of λ_L and λ_U, these noncentral values can be transformed into the metric of the standardized mean. Manipulation of Equation 18 shows that

m = t/√N,    (22)

so λ_L and λ_U can be substituted for t in Equation 22, and the lowest and highest plausible values of the standardized mean can be obtained given the specified values of α_L and α_U. Thus, the confidence interval for the standardized mean is given as

CI_(1−α) = [λ_L/√N ≤ ℳ ≤ λ_U/√N].    (23)

This confidence interval is realized by first computing the confidence interval on the noncentrality parameter from a t-distribution and then transforming the limits of the confidence interval into the metric of the standardized mean.
For example, suppose M = 50, s = 10, and N = 25. The estimated noncentrality parameter (i.e., the estimated t-test statistic for the test of the null hypothesis that µ = 0) from Equation 18 is λ̂ = 25. A 95% confidence interval for the population noncentrality parameter is

CI_.95 = [λ_L ≤ λ ≤ λ_U],    (24)

where CI_.95 represents the 95% confidence interval limits, λ_L satisfies p(t = 25|λ_L) = .975, and λ_U satisfies p(t = 25|λ_U) = .025. Dividing the limits of the confidence interval by √N will then transform the confidence limits for λ into confidence limits for ℳ, which is given as

CI_.95 = [λ_L/√N ≤ ℳ ≤ λ_U/√N].    (25)

Although there is no cumulative distribution function developed specifically for the standardized mean, by pivoting the cumulative distribution function of the noncentral t-distribution, limits of the confidence interval for ℳ can be calculated. These limits are those that correspond to the theoretical cumulative probability distribution of the standardized mean for the specified level of confidence.
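The pivoting described above is easy to carry out numerically: a root finder locates the noncentrality parameter at which the noncentral-t cumulative probability of the observed statistic equals 1 − α_L (or α_U). The sketch below reproduces the M = 50, s = 10, N = 25 example in Python with SciPy (MBESS performs the equivalent computation in R); the function name ci_noncentrality_t and the bracketing constants are the sketch's own choices.

```python
from math import sqrt
from scipy.optimize import brentq
from scipy.stats import nct

def ci_noncentrality_t(t_obs, df, conf_level=0.95):
    """Pivot the noncentral-t CDF: find lambda_L such that
    P(T <= t_obs | lambda_L) = 1 - alpha/2 and lambda_U such that
    P(T <= t_obs | lambda_U) = alpha/2.  The CDF at a fixed t_obs
    decreases as the noncentrality parameter increases."""
    alpha = 1 - conf_level
    lower = brentq(lambda nc: nct.cdf(t_obs, df, nc) - (1 - alpha / 2),
                   t_obs - 50, t_obs)   # root lies below the observed statistic
    upper = brentq(lambda nc: nct.cdf(t_obs, df, nc) - alpha / 2,
                   t_obs, t_obs + 50)   # root lies above it
    return lower, upper

# Example from the text: M = 50, s = 10, N = 25, so t = 50*sqrt(25)/10 = 25.
N = 25
t_obs = 50 * sqrt(N) / 10
lam_lo, lam_hi = ci_noncentrality_t(t_obs, N - 1)
# Dividing by sqrt(N) transforms the limits into the standardized-mean metric.
sm_lo, sm_hi = lam_lo / sqrt(N), lam_hi / sqrt(N)
```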
The present section has discussed a method of obtaining an exact confidence interval for a standardized mean, a quantity that itself has no known probability distribution function. Although the logic and method are not overly complex, determining θ_L and θ_U has been a serious issue for some time. The difficulty in finding the necessary confidence limits, by finding noncentrality parameters that have the obtained statistic at the specified quantile, has led to tri-entry (probability, degrees of freedom, and noncentrality parameter) tables (see a review of such tables in Johnson et al. 1995, Chapter 31) that will almost certainly yield approximate results in any applied situation. As will be discussed in the following section, MBESS has functions that return the exact values of θ_L and θ_U for the most general cases of noncentral t, F, and χ² distributions. Thus, a confidence interval for any effect size (e.g., those that are standardized) that has a monotonic relation to the noncentrality parameter from one of these distributions can be formed with MBESS. Within MBESS, many commonly used standardized effect sizes have functions that compute their respective confidence intervals directly (instead of the user forming a confidence interval for the corresponding noncentrality parameter and then transforming to the scale of the effect size). Some of the most popular standardized effect sizes will be discussed, and examples using MBESS given, in the remainder of the article in the analysis of variance (ANOVA) and regression contexts.

CIs for the standardized mean in MBESS
The ci.sm() function from MBESS can be used to form confidence intervals for the population standardized mean (i.e., ℳ). Although other optional specifications exist, a basic specification of the function ci.sm() would be of the form ci.sm(sm=m, N=N, conf.level=1 − α), where sm is the observed standardized mean, N is the sample size, and conf.level is the confidence level coverage.

CIs for standardized effect sizes in an ANOVA context
The comparison of means is a commonly used technique in the BESS, as research questions are many times related to issues involving means and mean differences. This section is organized with three subsections that deal with standardized effect sizes: one concerning the mean difference of two groups, one concerning omnibus effects when several group means are involved, and one concerning targeted effects (i.e., comparisons) when several group means are involved.

The standardized mean difference for two independent groups
One of the most commonly used effect sizes in the BESS is the standardized mean difference. The population standardized mean difference is defined as

δ = (µ1 − µ2)/σ,    (26)

and is generally estimated by

d = (M1 − M2)/s,    (27)

where µ1 and µ2 are the population means of group 1 and group 2, respectively, with M1 and M2 as their respective estimates, and s is the square root of the unbiased estimate of the within group variance, which is estimated as

s = √[((n1 − 1)s1² + (n2 − 1)s2²)/(n1 + n2 − 2)],    (28)

with per group sample sizes of n1 and n2 for group 1 and group 2, respectively, and with s1² and s2² being the unbiased estimates of the variance for group 1 and group 2, respectively, assuming σ1² = σ2². The typical two-group t-test is defined as

t = (M1 − M2)/s_(M1−M2),    (29)

where s_(M1−M2) is the standard error of the mean difference and is given as

s_(M1−M2) = s√(1/n1 + 1/n2).    (30)

Notice that the difference between d from Equation 27 and the two-group t-test statistic from Equation 29 is the quantity √(1/n1 + 1/n2), which is multiplied by s to estimate the standard error of the mean difference. Because √(1/n1 + 1/n2) can be rewritten as √((n2 + n1)/(n1 n2)), multiplying the inverse of this quantity by d leads to an equivalent representation of the t-test statistic:

t = d√((n1 n2)/(n1 + n2)).    (31)

Given Equation 31, it can be seen that Equation 27 can be written as

d = t√((n1 + n2)/(n1 n2)).    (32)

The population noncentrality parameter for the two-group t-test is given as

λ = δ√((n1 n2)/(n1 + n2)),    (33)

which is estimated, as in the single sample situation, with the observed t-test statistic. Thus,

λ̂ = t.    (34)

The analog of Equation 10 in the two group situation is

p(t_(α/2;ν) ≤ ((M1 − M2) − (µ1 − µ2))/s_(M1−M2) ≤ t_(1−α/2;ν)) = 1 − α,    (38)

where ν = n1 + n2 − 2. The test statistic can be pivoted such that the confidence interval for the (unstandardized) mean difference, which is discussed in many sources, is given as

CI_(1−α) = [(M1 − M2) − t_(1−α_L;ν) s_(M1−M2) ≤ µ1 − µ2 ≤ (M1 − M2) + t_(1−α_U;ν) s_(M1−M2)].    (39)

However, when what is of interest is the standardized mean difference, Equation 38 cannot be pivoted as was done in Equation 39. As discussed in the single sample situation, in order to form a confidence interval for δ, a confidence interval for λ is found and then the limits are transformed into the scale of δ by using Equation 32. Thus, the value of λ_L is found such that p(λ̂|λ_L) = 1 − α_L and the value λ_U is found such that p(λ̂|λ_U) = α_U, in exactly the same manner as they were found in the single sample situation, the difference being that the noncentrality parameter is from the two-group context with corresponding degrees of freedom N − 2. Given λ_L and λ_U, the confidence interval for λ is given as

CI_(1−α) = [λ_L ≤ λ ≤ λ_U],    (40)

which is generally of interest only insomuch as its transformation allows for a confidence interval to be formed for δ, as is given by the following:

CI_(1−α) = [λ_L√((n1 + n2)/(n1 n2)) ≤ δ ≤ λ_U√((n1 + n2)/(n1 n2))].    (41)

Confidence intervals for the standardized mean difference were detailed in Steiger and Fouladi (1997), and various aspects of such confidence intervals have been discussed in Cumming and Finch (2001), Kelley (2005), Smithson (2003), Algina, Keselman, and Penfield (2006), and Steiger (2004).
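The same computation for δ only changes the degrees of freedom and the back-transformation d = t√((n1 + n2)/(n1 n2)). A sketch in Python with SciPy, again purely illustrative of what ci.smd() computes in MBESS (the function name ci_smd here is the sketch's own):

```python
from math import sqrt
from scipy.optimize import brentq
from scipy.stats import nct

def ci_smd(d, n1, n2, conf_level=0.95):
    """CI for the population standardized mean difference delta by pivoting
    the noncentral-t CDF with df = n1 + n2 - 2."""
    df = n1 + n2 - 2
    mult = sqrt(n1 * n2 / (n1 + n2))   # t = d * mult and lambda = delta * mult
    t_obs = d * mult
    alpha = 1 - conf_level
    lam_lo = brentq(lambda nc: nct.cdf(t_obs, df, nc) - (1 - alpha / 2),
                    t_obs - 50, t_obs)
    lam_hi = brentq(lambda nc: nct.cdf(t_obs, df, nc) - alpha / 2,
                    t_obs, t_obs + 50)
    return lam_lo / mult, lam_hi / mult   # back-transform to the delta metric
```

For d = 1 with n1 = n2 = 20, the exact interval is noticeably wider below the estimate than above it, reflecting the skew of the noncentral t-distribution.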

CIs for the standardized mean difference in MBESS
The ci.smd() function from MBESS can be used to form confidence intervals for the population standardized mean difference (i.e., δ). Although other optional specifications exist, a basic specification of the function ci.smd() would be of the form ci.smd(smd=d, n.1=n1, n.2=n2, conf.level=1 − α), where smd is the observed standardized mean difference, n.1 and n.2 are the sample sizes for group 1 and group 2, respectively, and conf.level is the desired confidence level coverage.

Standardized effect sizes for omnibus effects in an ANOVA context
Examining the differences among several group means is commonly done in the BESS. For example, the depression level of mildly depressed individuals might be examined as a function of whether the individuals were randomly assigned to the control group, the counseling-only group, the drug-only group, or the combined counseling-and-drug group. Both targeted effects, such as the mean difference between the counseling-only and the drug-only groups, and omnibus effects, such as the proportion of variance in the depression scores that is accounted for by the grouping factor, have standardized effect sizes with corresponding confidence intervals for their population values. Whereas a targeted effect of interest might be a follow-up comparison, an omnibus effect of interest might be the ratio of the group sums of squares to the total sums of squares. So as to have measures of effect that are not wedded to a particular measurement scale, this section reviews selected standardized effect sizes in the ANOVA context, shows their relation to the noncentrality parameter, and illustrates how confidence intervals for population standardized effect sizes can be easily obtained with MBESS.
In a precursor to suggestions that were to be emphasized nearly a generation later in the BESS, Fleishman (1980) discussed omnibus effect sizes and their corresponding confidence intervals in an ANOVA context. Fleishman (1980) showed the relationship between certain ANOVA effect sizes and their corresponding noncentrality parameters from noncentral F-distributions. Given what was akin to the confidence interval transformation principle and the inversion confidence interval principle, confidence intervals for these effect sizes can be formed given confidence intervals for the noncentrality parameters of noncentral F-distributions. Such confidence intervals can easily be obtained in MBESS, as will be illustrated momentarily for selected effect sizes.
Let Λ_p be the noncentrality parameter for the pth factor of a noncentral F-distribution in a multi-factor (or single factor) fixed effects ANOVA, which is given as

Λ_p = (Σ_{j=1}^{J} n_pj τ_pj²)/σ_ε²,    (42)

where n_pj is the sample size of the jth group of factor p (j = 1, . . ., J; N = Σ_{j=1}^{J} n_pj), τ_pj is the effect associated with being in the jth level of factor p, which is defined as

τ_pj = µ_pj − µ,    (43)

with µ_pj being the population mean of the jth level of factor p, µ the overall population mean, and σ_ε² the mean square error (Fleishman 1980). Alternatively, Equation 42 can be written as

Λ_p = N(σ_p²/σ_ε²),    (44)

where σ_p² is the variance due to factor p:

σ_p² = (Σ_{j=1}^{J} n_pj τ_pj²)/N.    (45)

Notice that in a single factor design the equations reduce; specifically, the p subscript in each of the equations is ignored, and J is simply the number of groups.
Two effect sizes closely related to Λ_p, and to one another, are the signal-to-noise ratio and the proportion of variance in the dependent variable that is accounted for by knowing the level of the factor (or group status in a single factor design; e.g., Fleishman 1980; Hays 1994; Cohen 1988). Formally, the signal-to-noise ratio is defined as

φ_p² = σ_p²/σ_ε²,    (46)

and the proportion of variance in Y accounted for by knowing the level of the factor (or group status in a single factor design) is defined as

η_p² = σ_p²/σ_T²,    (47)

where σ_T² is the total variance of the dependent variable. There is also a close relation between φ_p² and η_p²:

η_p² = φ_p²/(1 + φ_p²).    (48)

Just as in the situation of the noncentral t, a Λ_L value can be found such that p(F|Λ_L) = 1 − α_L and a Λ_U value can be found such that p(F|Λ_U) = α_U, where F is the value of the F-test statistic for factor p from the factorial ANOVA procedure (or simply the F-test statistic from a single factor ANOVA). Given Λ_L and Λ_U, a confidence interval for Λ_p can be formed,

CI_(1−α) = [Λ_L ≤ Λ_p ≤ Λ_U],    (53)

which is generally of interest only insomuch as its transformation allows for a confidence interval to be formed for φ_p², η_p², and/or possibly other effects from an ANOVA context. The confidence intervals of interest are transformations of Equation 53, obtained by manipulating Equations 46 and 48 in order to transform the noncentrality parameter into the effect size of interest. Thus, a confidence interval for φ_p² is given as

CI_(1−α) = [Λ_L/N ≤ φ_p² ≤ Λ_U/N],    (54)

and a confidence interval for η_p² is given as

CI_(1−α) = [Λ_L/(Λ_L + N) ≤ η_p² ≤ Λ_U/(Λ_U + N)].    (55)

The square root of the signal-to-noise ratio also has a substantively appealing interpretation, as the standard deviation of the standardized means (e.g., p. 275 of Cohen 1988; Steiger 2004). Such a measure can easily be obtained, due to the confidence interval transformation principle, simply by taking the square root of the confidence limits for φ_p²:

CI_(1−α) = [√(Λ_L/N) ≤ φ_p ≤ √(Λ_U/N)].    (56)


CIs for standardized omnibus effects in ANOVA with MBESS

The ci.snr() function from MBESS can be used to form confidence intervals for the population signal-to-noise ratio (i.e., φ_p²) for the pth fixed effects factor in an ANOVA setting. Although other optional specifications exist, a basic specification of the function ci.snr() would be of the form ci.snr(F.value=F, df.1, df.2, N=N, conf.level=1 − α), where F.value is the observed F value, df.1 is the numerator degrees of freedom for the particular F-test statistic, df.2 is the denominator degrees of freedom of the F-test statistic, N is the total sample size, and conf.level is the desired confidence level coverage.

The ci.pvaf() function from MBESS can be used to form confidence intervals for the population proportion of variance accounted for in the dependent variable by knowing group status (i.e., η_p²) for the pth fixed effects factor in an ANOVA setting. Although other optional specifications exist, a basic specification of the function ci.pvaf() would be of the form ci.pvaf(F.value=F, df.1, df.2, N=N, conf.level=1 − α), where all of the input parameters are the same as for the ci.snr() function.

The ci.srsnr() function from MBESS can be used to form confidence intervals for the square root of the signal-to-noise ratio for the pth fixed effects factor (i.e., φ_p) in an ANOVA setting. Although other optional specifications exist, a basic specification of the function ci.srsnr() would be of the form ci.srsnr(F.value=F, df.1, df.2, N=N, conf.level=1 − α), where all of the arguments are the same as for the ci.snr() and ci.pvaf() functions.
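The computation underlying these three MBESS functions is a noncentral-F version of the same pivot: find Λ_L and Λ_U from the observed F-test statistic, then transform via φ² = Λ/N and η² = Λ/(Λ + N). A Python/SciPy sketch with hypothetical function names follows; one caveat is that when the observed F is small enough, a limit is set to zero because no positive noncentrality value satisfies the defining equation.

```python
from scipy.optimize import brentq
from scipy.stats import ncf

def ci_lambda_F(F_obs, df1, df2, conf_level=0.95):
    """CI for the noncentrality parameter Lambda of a noncentral F
    distribution, found by pivoting its CDF; ncf.cdf(F_obs, df1, df2, nc)
    decreases as nc increases, so each limit is a one-dimensional root."""
    alpha = 1 - conf_level
    hi_bracket = 10 * (F_obs + 1) * df1           # generous upper search bound
    if ncf.cdf(F_obs, df1, df2, 0) < 1 - alpha / 2:
        lam_lo = 0.0                              # no positive lower limit exists
    else:
        lam_lo = brentq(lambda nc: ncf.cdf(F_obs, df1, df2, nc) - (1 - alpha / 2),
                        0, hi_bracket)
    if ncf.cdf(F_obs, df1, df2, 0) < alpha / 2:
        lam_hi = 0.0                              # F_obs in the far lower tail even at Lambda = 0
    else:
        lam_hi = brentq(lambda nc: ncf.cdf(F_obs, df1, df2, nc) - alpha / 2,
                        0, hi_bracket)
    return lam_lo, lam_hi

def ci_pvaf(F_obs, df1, df2, N, conf_level=0.95):
    """Transform the limits for Lambda into limits for the proportion of
    variance accounted for, using eta^2 = Lambda/(Lambda + N)."""
    lam_lo, lam_hi = ci_lambda_F(F_obs, df1, df2, conf_level)
    return lam_lo / (lam_lo + N), lam_hi / (lam_hi + N)
```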

Standardized effect sizes for targeted effects in an ANOVA context
The methods discussed in the previous section have been for omnibus measures of effect.However, targeted effects that have been standardized in an ANOVA context can also be easily implemented with the methods that have been discussed.Like many research design texts, Maxwell and Delaney (2004) discuss methods of forming comparisons among means to determine if a null hypothesis that a particular contrast (e.g., µ 1 −µ 2 = 0, (µ 1 +µ 2 )/2−µ 3 = 0, (µ 1 + µ 2 )/2 − (µ 3 + µ 4 )/2 = 0, etc.) equals zero can be rejected.The test statistic can be given as where Ψ is given as and s 2 is the mean square error note that J j=1 c j = 0, as is the case with all comparisons .
Notice that this is simply a t-test for a targeted contrast. The t-test in Equation 57 can be pivoted in order to form a confidence interval for Ψ.
As detailed in Steiger (2004), hypothesis tests for comparisons of the form of Equation 57 can be standardized. As before, the process of standardization leads to a test statistic that cannot be pivoted, implying that the confidence interval for the standardized effect (ψ) is not directly available from the confidence limits for the unstandardized effect (Ψ). This issue is literally just an extension of the methods discussed in the context of two independent groups, where a confidence interval for δ was formed. The way in which Ψ is standardized involves only division by s, the root mean square error, which yields a population standardized comparison given as

ψ = (Σ_{j=1}^{J} c_j μ_j) / σ_ε,

where σ_ε is the population error standard deviation (the population analogue of s). The noncentrality parameter of the t-distribution in this context is thus given as

λ = ψ / sqrt(Σ_{j=1}^{J} c_j²/n_j),

which is simply the t-test statistic with the population values replacing the sample estimates, as in Equation 57. Thus, when a confidence interval is found for λ, the limits can be transformed into a confidence interval for ψ (which is equal to δ when J = 2, because the ANOVA reduces to an independent-groups t-test) by solving Equation 60 for ψ. Thus, a confidence interval for ψ is given by multiplying the confidence limits for λ by sqrt(Σ_{j=1}^{J} c_j²/n_j).

CIs for standardized targeted effects in ANOVA with MBESS

The ci.sc() function from MBESS can be used to form confidence intervals for the population standardized comparison (i.e., ψ) in an ANOVA setting. Although other optional specifications exist, a basic specification of the function would be of the form ci.sc(means=c(M_1, M_2, ..., M_J), error.variance=s², c.weights=c(c_1, c_2, ..., c_J), n=c(n_1, n_2, ..., n_J), N=N, conf.level=1 − α), where means is a vector of the J group means, error.variance is the mean square error, c.weights is a vector of the J contrast weights (which must sum to zero), n is the vector of the J group sample sizes (or cell sample sizes in a single-factor design), and N is the total sample size (which need not be specified in single-factor designs).
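The pivot-and-transform logic underlying ci.sc() can be sketched in base R with pt() and uniroot(). The code below is an illustrative sketch, not the MBESS implementation; the group means, contrast weights, sample sizes, and mean square error are all hypothetical.

```r
# Illustrative base-R sketch of the logic behind ci.sc() (hypothetical data,
# not the MBESS implementation): pivot the noncentral t for lambda, then
# rescale the limits by sqrt(sum(c_j^2 / n_j)) to get an interval for psi.
means <- c(2, 4, 9)       # hypothetical group means
w     <- c(0.5, 0.5, -1)  # contrast weights (must sum to zero)
n     <- c(20, 20, 20)    # group sample sizes
mse   <- 4                # hypothetical mean square error
df.error <- sum(n) - length(n)

Psi.hat <- sum(w * means)            # observed comparison
scale.c <- sqrt(sum(w^2 / n))
t.obs   <- Psi.hat / (sqrt(mse) * scale.c)

alpha <- 0.05
# Confidence limits for the noncentrality parameter lambda
lam.L <- uniroot(function(L) pt(t.obs, df.error, ncp = L) - (1 - alpha/2),
                 c(-50, 50), tol = 1e-9)$root
lam.U <- uniroot(function(L) pt(t.obs, df.error, ncp = L) - alpha/2,
                 c(-50, 50), tol = 1e-9)$root
psi.ci <- c(lam.L, lam.U) * scale.c  # transformed to the psi metric
```

With J = 2 and weights c(1, −1), the same sketch reduces to a confidence interval for δ in the two-group case.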

CIs for standardized effect sizes in a regression context
Multiple and simple regression are very popular methods in the BESS, especially for observational research, as research questions often involve how a set of regressor (predictor/independent/explanatory) variables influences or explains a criterion (predicted/outcome/dependent) variable. This section is organized in two parts: one concerning the omnibus effect (i.e., the squared multiple correlation coefficient) and one concerning targeted effects (i.e., individual regression coefficients).

Standardized effect sizes for omnibus effects in multiple regression
In the special case of fixed regressor variables, where the values of the regressors are selected a priori as part of the research design, the population proportion of variance in Y that is accounted for by the K regressors is the squared multiple correlation coefficient, denoted P². Notice that P² from a regression context is equivalent to the η² discussed in the ANOVA context. This comes as no surprise, as both ANOVA and multiple regression are special cases of the general linear model. The population proportion of variance accounted for (in the regression context) can be written as

P² = σ_Ŷ² / σ_Y² = (σ′_XY Σ_X⁻¹ σ_XY) / σ_Y²,

where σ_Ŷ² is the variance of Y as predicted from the K X variables (i.e., the variance of Ŷ), σ² is the error variance (i.e., the variance of Y − Ŷ), Σ_X is the population covariance matrix of the K X variables, and σ_XY is the vector of population covariances between the K X variables and Y. Of course, in addition to the omnibus effect P², the targeted effects provided by the regression coefficients are also of interest. The vector of K population regression coefficients, excluding the intercept, is obtained as

β = Σ_X⁻¹ σ_XY.

Given the relationship between the F-test statistic and Λ, as well as the relationship between P² and Λ, forming confidence intervals for P² is simply a matter of solving for Λ_L and Λ_U so that p(F | Λ_L) = 1 − α_L and p(F | Λ_U) = α_U. The confidence limits for Λ can then be transformed to the units of P² with Equation 75, in accord with the confidence interval transformation principle. Of course, the square root of the confidence limits can be taken so that the confidence interval is for the multiple correlation coefficient, P. Although not often discussed in applied texts, there are differences in the sampling distribution of R² when predictors are fixed compared to when predictors are random (e.g., Sampson 1974; Gatsonis and Sampson 1989; Rencher 2000). Even though the sampling distributions of R² in the fixed
and random predictor cases are very similar when the null hypothesis that P² = 0 is true, they can be quite different when the null hypothesis is false. The method discussed thus far for confidence interval formation assumes that the predictors are fixed. However, in most applications of multiple regression in the BESS, regressor variables are random. Basing the confidence interval formation procedure on fixed regressors when regressors are in fact random will thus lead to a nominal (specified) Type I error rate that differs from the empirical (actual) Type I error rate, something that is never desirable.
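Before turning to the random-regressor case, the fixed-regressor procedure described above can be sketched in base R. The values R² = .40, N = 100, and K = 5 are hypothetical, and the transformation P² = Λ/(Λ + N), which corresponds to assuming Λ = N P²/(1 − P²), is an assumption of this sketch rather than a quotation of the article's Equation 75.

```r
# Illustrative sketch of the fixed-regressor CI for P^2 (hypothetical values,
# not the MBESS implementation). The mapping P^2 = Lambda/(Lambda + N) is an
# assumed parameterization (Lambda = N * P^2 / (1 - P^2)).
R2 <- 0.40; N <- 100; K <- 5
F.obs <- (R2 / K) / ((1 - R2) / (N - K - 1))  # omnibus F for R^2
alpha <- 0.05
# Lower noncentrality limit (0 if the interval must be truncated at zero)
Lam.L <- if (pf(F.obs, K, N - K - 1, ncp = 0) < 1 - alpha/2) 0 else
  uniroot(function(L) pf(F.obs, K, N - K - 1, ncp = L) - (1 - alpha/2),
          c(0, 5000), tol = 1e-9)$root
# Upper noncentrality limit
Lam.U <- uniroot(function(L) pf(F.obs, K, N - K - 1, ncp = L) - alpha/2,
                 c(0, 5000), tol = 1e-9)$root
P2.ci <- c(Lam.L, Lam.U) / (c(Lam.L, Lam.U) + N)  # transform to P^2 metric
P.ci  <- sqrt(P2.ci)  # square-root transformation gives the CI for P
```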
Confidence intervals for P² under the case of random regressors have not often been considered. Lee (1971) and Ding (1996) provide algorithms for computing various properties of the distribution of R² for random regressors, which can be used when forming confidence intervals. The method of Lee (1971) was implemented by Steiger and Fouladi (1992) in an early stand-alone program, R2. Algina and Olejnik (2000) provide a summary of the approach suggested by Lee (1971), as well as a SAS script that can be used to implement it. MBESS also allows for confidence intervals for P² in the context of random predictor variables based on Lee (1971) and Algina and Olejnik (2000), which gives results that are consistent with the R2 program of Steiger and Fouladi (1992). The confidence interval procedure for P² when regressors are random is conceptually similar to that discussed in the fixed case; however, the sampling distribution of the F-test statistic does not follow a noncentral F-distribution. Rather, Fisher (1928) showed that the sampling distribution of R² follows a Gauss hypergeometric distribution when predictors are random. Lee (1971) used an approximation to the sampling distribution of R²/(1 − R²) (i.e., the sample estimate of φ² from Equation 74), which is monotonically related to the sampling distribution of R² when predictors are random. The sampling distribution of the observed signal-to-noise ratio is estimated with a three-moment approximation using noncentral F-distributions based on an iterative scheme. The method of Lee (1971) is quite accurate in terms of the empirical and nominal levels of confidence interval coverage. Although not many details have been included here, the technical underpinnings of confidence intervals for P² when regressors are random are quite involved. As will be shown, MBESS allows the regressors to be regarded as fixed or random depending on the specifications given. As might be expected due to the
increased randomness, confidence intervals for P² based on random regressors are generally wider than those based on fixed regressors.

Standardized effect sizes for targeted effects in multiple regression

The kth standardized regression coefficient, b_k, can be tested with the t-statistic

t = b_k / s_{b_k},

where s_{b_k} is the standard error of b_k, defined as

s_{b_k} = sqrt[(1 − R²) / ((1 − R²_{Xk·X−k})(N − K − 1))],

with R²_{Xk·X−k} being the squared multiple correlation coefficient when regressor X_k is the dependent variable predicted from the remaining K − 1 regressors. Alternatively, R²_{Xk·X−k} can be obtained indirectly from the covariance matrix of the regressors, S_XX, as

R²_{Xk·X−k} = 1 − 1/(c_kk s²_{Xk}),

where c_kk is the kth diagonal element of S_XX⁻¹ and s²_{Xk} is the variance of X_k (Harris 2001). Since the noncentrality parameter of a t-distribution is the value of the t-test statistic if population values were substituted for the sample quantities, the noncentrality parameter for a standardized regression coefficient can be written as

λ_k = β_k sqrt[((1 − R²_{Xk·X−k})(N − K − 1)) / (1 − P²)],

where β_k is the population standardized regression coefficient. Because β_k can be written (e.g., Hays 1994) as β_k = B_k (σ_{Xk}/σ_Y), with B_k the unstandardized regression coefficient, φ_k can be written as

φ_k = β_k sqrt[(1 − R²_{Xk·X−k}) / (1 − P²)].

Notice that φ_k is the square root of the signal-to-noise ratio for a targeted effect, which shows the contribution of the kth effect to the overall signal-to-noise ratio, φ².
Given the representation of λ_k in Equation 82, β_k can be solved for, such that

β_k = λ_k sqrt[(1 − P²) / ((1 − R²_{Xk·X−k})(N − K − 1))].

Thus, forming a confidence interval for β_k involves transforming the limits of the confidence interval for λ_k by way of Equation 86; that is, each confidence limit for λ_k is multiplied by sqrt[(1 − P²) / ((1 − R²_{Xk·X−k})(N − K − 1))] to yield the corresponding confidence limit for β_k.

CIs for targeted effects in multiple regression with MBESS
The function ci.src() from MBESS can be used to form confidence intervals for the population standardized regression coefficient (i.e., β_k). Although many optional specifications exist, a basic specification would be of the form ci.src(beta.k=b_k, SE.beta.k=s_{b_k}, N=N, k=k, conf.level=1 − α), where beta.k is the observed standardized regression coefficient, SE.beta.k is the observed standard error of the standardized regression coefficient, N is the sample size, and k is the number of regressor variables.
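The pivot-and-rescale logic behind this function can be sketched in base R. The sketch below is illustrative, not the ci.src() source code: the observed coefficient, standard error, sample size, and number of regressors are hypothetical, and rescaling the λ_k limits by the observed standard error is assumed as the transformation step.

```r
# Illustrative sketch (hypothetical values, not the MBESS implementation):
# pivot the noncentral t-distribution for lambda_k, then rescale the limits
# by the observed standard error of the standardized coefficient.
b.k    <- 0.35   # hypothetical observed standardized regression coefficient
se.b.k <- 0.10   # hypothetical standard error of b.k
N <- 100; K <- 5 # hypothetical sample size and number of regressors
df.res <- N - K - 1
t.obs  <- b.k / se.b.k

alpha <- 0.05
# Lower limit: lambda such that P(T <= t.obs | lambda) = 1 - alpha/2
lam.L <- uniroot(function(L) pt(t.obs, df.res, ncp = L) - (1 - alpha/2),
                 c(-50, 50), tol = 1e-9)$root
# Upper limit: lambda such that P(T <= t.obs | lambda) = alpha/2
lam.U <- uniroot(function(L) pt(t.obs, df.res, ncp = L) - alpha/2,
                 c(-50, 50), tol = 1e-9)$root
beta.ci <- c(lam.L, lam.U) * se.b.k  # assumed rescaling to the beta_k metric
```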

General CI procedures for standardized effects
Although other standardized effects that are beneficial in the BESS could have been discussed, the ideas presented generalize to a wide variety of standardized effects (e.g., the coefficient of variation, the Mahalanobis distance, the change in the squared multiple correlation coefficient when one or more predictors is added to (or removed from) a model, the root mean square error of approximation in the context of a structural equation model, etc.). In accord with the confidence interval inversion and transformation principles, exact confidence intervals for many such effects can be formed. The specific R functions from MBESS used to form confidence intervals for noncentrality parameters are conf.limits.nct() and conf.limits.ncf(). These functions provide confidence limits for noncentral t and F parameters, respectively. The confidence interval functions discussed in this article thus use these general functions and apply the necessary transformation of the confidence limits so that the scale of the confidence interval is in terms of the particular standardized effect size. Relatedly, the function conf.limits.nc.chisq() is also part of MBESS, where the function is used to form confidence intervals for noncentral χ² parameters. The ability to form confidence intervals for noncentrality parameters is one of the most important features of MBESS.
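The same inversion can be written generically for any noncentral family. As an illustrative base-R analogue of conf.limits.nc.chisq() (not the MBESS implementation; the observed statistic of 30 on 10 degrees of freedom is hypothetical):

```r
# Illustrative base-R analogue of confidence limits for a noncentral
# chi-square noncentrality parameter (hypothetical values, not the MBESS
# implementation).
conf.limits.ncp.chisq <- function(chi2, df, conf.level = 0.95) {
  alpha <- (1 - conf.level) / 2
  # Lower limit: ncp such that P(X <= chi2 | ncp) = 1 - alpha
  # (truncated at 0 when chi2 is too small for that to be possible)
  lower <- if (pchisq(chi2, df, ncp = 0) < 1 - alpha) 0 else
    uniroot(function(L) pchisq(chi2, df, ncp = L) - (1 - alpha),
            c(0, 1000), tol = 1e-9)$root
  # Upper limit: ncp such that P(X <= chi2 | ncp) = alpha
  upper <- uniroot(function(L) pchisq(chi2, df, ncp = L) - alpha,
                   c(0, 1000), tol = 1e-9)$root
  c(lower, upper)
}

chisq.ncp.ci <- conf.limits.ncp.chisq(30, 10)
```

Replacing pchisq() with pt() or pf() yields the analogous inversion for the noncentral t and F cases that conf.limits.nct() and conf.limits.ncf() handle.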

Discussion
The topic of confidence intervals for the general case of standardized effect sizes has not often been considered in the general statistics literature, where parameters of interest are almost always in an unstandardized form. Confidence intervals for standardized effects have, however, recently generated much interest in the BESS (e.g., Algina et al. 2006; Cumming and Finch 2001; Kelley 2005; Kelley and Rausch 2006; Steiger and Fouladi 1997; Smithson 2001, 2003; Steiger 2004), where confidence intervals for effect sizes of primary importance are very strongly encouraged (e.g., Wilkinson and the American Psychological Association Task Force on Statistical Inference 1999; Task Force on Reporting of Research Methods in AERA Publications 2006). However, software to implement confidence intervals for standardized effect sizes has, for the most part, been in the form of specialized scripts for specific effect sizes or stand-alone software packages. MBESS, however, is a package within the R statistical language and environment and thus can be seamlessly incorporated into data analysis in R.
Furthermore, the functions contained within MBESS are designed to be user-friendly and accept the necessary sufficient statistics as input. Thus, for those who prefer other data analysis software programs, the MBESS functions can be used within R with only the summary statistics provided by other software programs (as was shown in the examples). One need not be a skilled R user in order to use R for confidence interval formation with MBESS, as the necessary R commands are very simple and rely only on summary statistics that are easily obtainable in R or from other programs. It is hoped that this article and the Methods for the Behavioral, Educational, and Social Sciences (MBESS; Kelley 2007a,b) R package will be helpful resources as the future of quantitative research (Thompson 2002) unfolds in the behavioral, educational, and social sciences, where inferences based on dichotomous outcomes from null hypothesis significance tests (reject or fail-to-reject) are replaced by effect sizes and their corresponding confidence intervals.