Numerical Computing and Graphics for the Power Method Transformation Using Mathematica

This paper provides the requisite information and description of software that perform numerical computations and graphics for the power method polynomial transformation. The software developed is written in the Mathematica 5.2 package PowerMethod.m and is associated with ﬁfth-order polynomials that are used for simulating univariate and multivariate non-normal distributions. The package is ﬂexible enough to allow a user the choice to model theoretical pdfs, empirical data, or a user’s own selected distribution(s). The primary functions perform the following (a) compute standardized cumulants and polynomial coeﬃcients, (b) ensure that polynomial transformations yield valid pdfs, and (c) graph power method pdfs and cdfs. Other functions compute cumulative probabilities, modes, trimmed means, intermediate correlations, or perform the graphics associated with ﬁtting power method pdfs to either empirical or theoretical distributions. Numerical examples and Monte Carlo results are provided to demonstrate and validate the use of the software package. The notebook Demo.nb is also provided as a guide for user of the power method.


Introduction
The power method polynomial transformation (Fleishman,1978, Eq.1;Headrick, 2002, Eq.16) is a popular moment-matching technique used for simulating continuous non-normal distributions in the context of Monte Carlo or simulation studies.The primary advantage of this transformation is that it provides computationally efficient algorithms for generating univari-Until recently (Headrick and Kowalchuk 2007), two problems associated with the power method were that its pdf (probability density function) and cdf (cumulative distribution function) were unknown (Tadikamalla 1980;Kotz, Balakrishnan, and Johnson 2000, p. 37).Thus, it was difficult (or impossible) to determine a power method distribution's percentiles, peakedness, tailweight, or mode (Headrick and Sawilowsky 2000b).However, these problems were resolved in an article by Headrick and Kowalchuk (2007) where the power method's pdf and cdf were derived in general form.For a detailed discussion on the properties and theory underlying the power method transformation see Headrick and Kowalchuk (2007).
In view of the above, the objective is not to discuss the theory underlying the power method but to provide a Mathematica 5.2 (Wolfram 2003) package, executed under <<PowerMethod`, that performs numerical computations and graphics associated with the power method transformation.In Section 2, we present the essential requisite information for the user of the power method in the context of polynomials of order five.In Section 3, some Mathematica functions are described and numerical examples and simulation results are provided to demonstrate and validate the use of the source code.For more detail, the user can inspect the source code and its implementation in the files PowerMethod.nband Demo.nb.

Univariate non-normal data generation
The power method transformation is summarized by the polynomial where Z ∼ i.i.d.N (0, 1).Setting r = 6 in (1) gives the Headrick (2002) class of distributions associated with polynomials of order five.We note that setting r = 4 in (1) would give the smaller Fleishman (1978) class of distributions.
The shape of Y in (1) is contingent on the values of the constant coefficients c =1,...,6 .These coefficients are computed by solving the system of equations in Appendix A of Headrick and Kowalchuk (2007) for a set of specified standardized cumulants γ =3,...,6 .Note that the mean (γ 1 ) and variance (γ 2 ) are arbitrarily set to zero and one in A1 and A2.Thus, in terms of univariate non-normal data generation, the transformation in ( 1) is computationally efficient because it only requires an algorithm that generates standard normal random deviates and the knowledge of six coefficients.
The pdf and cdf associated with Y in (1) are given in parametric form ( 2 ) as in Headrick and Kowalchuk (2007) where −∞ < z < +∞, the derivative Y (z) > 0 (i.e.Y is a strictly increasing monotonic function in Z), and where f Z (z) and F Z (z) are the standard normal pdf and cdf.
To illustrate, depicted in Figure 1 are the graphs of fifth-order power method pdfs and cdfs.These graphs were obtained using (2) and (3) and the numerical and graphing techniques for symmetric and asymmetric distributions described and demonstrated in PowerMethod.nband Demo.nb.The standardized cumulants γ =3,...,6 listed in Panels A and B are associated with Student's t df =7 and χ 2 df =3 distributions, respectively.One of the limitations associated with fifth-order polynomial transformations is that some combinations of cumulants in this class of power method distributions will not produce valid power method pdfs.For example, consider a logistic distribution which has standardized cumulants of γ 3 = 0, γ 4 = 6/5, γ 5 = 0, and γ 6 = 48/7.These cumulants will yield coefficients c for (1) but will not produce a valid power method pdf because Y (z) in (2) is not a strictly increasing monotonic function for all z ∈ (−∞, +∞) i.e., Y (z) = 0 at z = ±8.1813for the logistic pdf.However, a technique that can often be used to mitigate this limitation is to increase γ 6 , ceteris paribus.For example, the values of γ 3 = 0, γ 4 = 6/5, γ 5 = 0, and γ 6 = 62/7 will produce a valid power method pdf and thus allow for the more interpretable values of skew (γ 3 ) and kurtosis (γ 4 ) to be preserved.

Multivariate non-normal data generation
The power method can be extended from univariate to multivariate non-normal data generation by specifying k equations of the form in (1) as where i = j.A controlled correlation between two non-normal distributions Y i and Y j is accomplished by making use of equation ( 17) in Appendix B. More specifically, the left-hand side of ( 17) is set to a specified correlation ρ Y i Y j , the coefficients c i and c j are substituted into the right-hand side, and then ( 17) is numerically solved for the intermediate correlation The solved intermediate correlations ρ Z i Z j are assembled into a k × k matrix that is subsequently decomposed (e.g., a Cholesky decomposition).The results from the decomposition are used to generate standard normal deviates Z i and Z j , correlated at the intermediate levels, that are then transformed by k polynomials of the form in ( 4) and ( 5) such that Y i and Y j have their specified shapes and correlation.

Univariate distributions
Two Mathematica functions available for computing theoretical (empirical) standardized cumulants γ =3,...,6 (γ =3,...,6 ) are based on the two sets of equations in Appendix A. There is also a function available for computing values of γ =3,...,6 based on Fisher's k-statistics.These three functions require the user to specify a constant denoted as SixCon.Specifically, SixCon is initially set equal to zero and ideally the computed standardized cumulants associated with a theoretical density or an empirical data set will yield a valid power method pdf such as a chi-square distribution (df > 1).There are cases where a constant will have to be added to the sixth cumulant in order to produce a valid power method pdf e.g., SixCon=2 for the logistic distribution as noted at the end of Section 1 and in Demo.nb.
Given a set of standardized cumulants, the coefficients c associated with (1) can be computed by one of three Mathematica functions depending on a user's need.More specifically, the coefficients can be computed using PowerMethodX [cumulants_List] where X=1 for theoretical asymmetric pdfs or empirical data, X=2 for theoretical symmetric pdfs, or PowerMethod3[gamma3_,..., gamma6_] for the case where the user may want to freely load the cumulants.On solving for a set of coefficients, these values can then be used by functions to determine if they will also yield a valid power method pdf.That is, test the condition that the coefficients satisfy that Y (z) > 0 for all z ∈ (−∞, +∞) in (2).A number of examples are provided in Demo.nb for the user's perusal.
To demonstrate the use of the PowerMethod.mpackage, comparisons were made between various theoretical pdfs and their power method analogs using the functions that perform graphics and compute cumulative probabilities (or percentiles) and trimmed means.Specifically, depicted in Figure 2 are the graphs of the exponential, Beta, and Gamma pdfs with their power method analogs superimposed on these theoretical pdfs.Inspection of these graphs, the percentiles in Table 1, and the trimmed means in Table 2 indicate that the power method pdfs provide good approximations to these theoretical pdfs.
In terms of empirical pdfs, presented in Figure 3 are power method pdfs superimposed on measures of body density, weight, height, and percent body fat taken from n = 252 adult males (http://lib.stat.cmu.edu/datasets/bodyfat).Inspection of Figure 3 indicates that the power method pdfs provide good approximations to the empirical data.Further, the trimmed means listed in Table 3 are all within the 95% bootstrap confidence intervals based on the data.The confidence intervals are based on 25000 bootstrap samples.
One way of determining how well a power method pdf models a set of data is to compute a chi-square goodness of fit statistic.For example, listed in Table 4 are the cumulative percentages and class intervals based on the power method's pdf for the body density data.The asymptotic value of p = .438indicates the power method pdf provides a good fit to the data.It is noted that the degrees of freedom for this test were computed as df = 3 = 10(class intervals)−6(parameter estimates)−1(sample size).

Simulating multivariate non-normal distributions
Presented in Table 5 is a specified correlation matrix ρ Y i Y j between the power method distributions depicted in Figure 2 where Y 1 , . . ., Y 4 have the standardized cumulants associated with Panels A,. ..,D, respectively.3.  Table 7: Cholesky decomposition on the intermediate correlation matrix in Table 6.  2. The estimates are based on 10000 replications of samples of size n = 1000000.Each cell contains the parameter γ =3,...,6 enclosed in parentheses and is rounded to three digits.
use of the formulae where V 1 , . . ., V 4 are independent standard normal random deviates.The values of Z 1 , . . ., Z 4 are then used in equations of the form in ( 4) and ( 5) to produce Y 1 , . . ., Y 4 with their specified shapes in Figure 2 and the specified correlation structure in Table 5.
To empirically demonstrate, the four power method distributions depicted in Figure 2 were simulated in accordance to the specified correlation matrix in Table 5 using an algorithm coded in FORTRAN 77.The algorithm employed the use of subroutines UNI1 and NORMB1 (Blair 1987) to generate pseudo-random uniform and standard normal deviates.Single draws of size n = 1000000 were drawn from each of the four distributions and the sample correlation coefficients ρY i Y j were computed.These values are reported in Table 8.
The estimates of the standardized cumulants were obtained by applying equations ( 13), ( 14), (15), and (16) in Appendix A to samples of size n = 1000000 for each of the four distributions.
The empirical estimates γ =3,...,6 for each distribution were computed by taking the overall average across the 10000 replications.Thus, each estimate γ =3,...,6 was based on ten billion random deviates and the results are reported in Table 9. Inspection of Table 8 and Table 9 indicate that the procedure produces excellent agreement between the empirical estimates and parameters.

Comments
The advantages and limitations of the power method transformation (Fleishman 1978;Headrick 2002;Headrick and Kowalchuk 2007) are similar to the transformation associated with the generalized lambda distribution (e.g., Headrick and Mudgadi 2006;Karian and Dudewicz 2000;Ramberg, Tadikamalla, Dudewicz, and Mykytka 1979).Specifically, both procedures can generate a variety of univariate pdfs as well as simulate correlated data sets in a computationally efficient manner.However, both classes of pdfs are limited to the extent that they do not span the entire space in the plane defined by the inequality for skew (γ 3 ) and kurtosis (γ 4 ) i.e. γ 4 > γ 2 3 − 2 where γ 4 = 0 for the normal distribution.Nevertheless, as demonstrated in Demo.nb, a user of the power method has the flexibility to alter the sixth cumulant γ 6 (or γ6 ) if needed to create a valid pdf e.g., the Extreme Value pdf where SixCon=1.Such small alterations to γ 6 (or γ6 ) should have little impact on an approximation of a pdf and yet preserve the more important interpretable values of skew and kurtosis.In terms of γ6 , we would note that it has high variance (see equation six in the PowerMethodX functions) and that errors in its estimation have the potential to be magnified.It is also worth pointing out that the amount of increase to SixCon required to create a valid power method pdf is positively correlated with the original estimate γ6 from the data.That is, the larger the estimate of γ6 usually requires a larger increase in SixCon to create a valid pdf.See, for example, the estimates of γ6 for Weight and Percent Body Fat given in Panels B and D of Figure 3.We would also note that increasing γ 6 (or γ6 ) in order to create a valid pdf is not a panacea.For example, a theoretical density where the power method will not yield valid pdfs are the values associated with the Beta[a,b] distribution when either a or b is equal to one.
Finally, we note that the initial starting values (int1,...,int6) that are set for the command FindRoot in the PowerMethodX functions obtained all solutions for the coefficients in Demo.nb, the examples in this manuscript, as well as for a large set of Tables (available on request from the first author) that provides a range of possibilities for the power method.As such, we recommend that the user do not alter the initial starting values.

Figure 3 :
Figure 3: Power method approximations to empirical pdfs based on measures taken from n = 252 men.† The values of γ6 associated with Weight and Percent Body Fat had to be increased to 234.431 and 12.735 to ensure valid power method pdfs.

a
11 = 1 a 12 = 0.444 a 13 = 0.583 a 14 = 0.643 0 a 22 = 0.896 a 23 = 0.502 a 24 = 0 Table 6 gives the required intermediate correlation matrix which was created by separately solving each of the six bivariate cases using the function InterCorr as demonstrated in Demo.nb for ρ Z i Z j .Table7gives the results of a Cholesky decomposition on the intermediate correlation matrix.These results are subsequently used in an algorithm to create Z 1 , . . ., Z 4 having the specified intermediate correlations by making

Table 1 :
Percentiles of distributions (x i ) and their power method analogs in Figure2.

Table 2 :
Power method approximations of trimmed means from theoretical distributions.

Table 3 :
Power method approximations of trimmed means from empirical distributions.Each empirical trimmed mean is based on a sample size of n = 152 and has a 95% bootstrap confidence interval enclosed in parentheses.

Table 4 :
Observed and expected frequencies and χ 2 test based on the power method approximation to the body density data in Figure

Table 8 :
Empirical estimates of the population correlations (ρY i Y j ) inTable 5.The estimates are based on single draws of size n = 1000000 from each distribution.