Computer algebra and algorithms for unbiased moment estimation of arbitrary order


Abstract: While unbiased central moment estimators of lower orders (such as a sample variance) are easily obtainable and often used in practice, derivation of unbiased estimators of higher orders might be more challenging due to long math and tricky combinatorics. Moreover, higher orders necessitate calculation of estimators of powers and products that also amount to these orders. We develop a software algorithm that allows the user to obtain unbiased estimators of an arbitrary order and provide results up to the sixth order, including powers and products of lower orders. The method also extends to finding pooled estimates of higher central moments of several different populations (e.g. for two-sample tests). We introduce an R package Umoments that calculates one- and two-sample estimates and generates intermediate results used to obtain these estimators.

ABOUT THE AUTHORS
Inna Gerlovina is a postdoctoral scholar at the University of California, San Francisco. She has worked on small sample inference and error rate control as well as higher-order inferential approaches, developing software packages for high-dimensional data analysis. Her current interests include development, implementation, and application of statistical methods that contribute to the understanding of malaria epidemiology and transmission. Inna completed her MA and PhD in Biostatistics at the University of California, Berkeley.
Alan Hubbard, Professor of Biostatistics at the University of California, Berkeley, is director of the Biomedical Big Data pre-doctoral training program, co-director of the Center for Targeted Learning, head of the computational biology Core E of the SuperFund Center at UC Berkeley (NIH/EPA), and a consulting statistician on several federally funded and foundation projects. He has worked on projects ranging from the molecular biology of aging to epidemiology and infectious disease modeling, but most of his work has focused on semiparametric estimation in high-dimensional data. His current methods research focuses on precision medicine, variable importance, statistical inference for data-adaptive parameters, and statistical software implementing targeted learning methods. He is currently working in several areas of applied research, including early childhood development in developing countries, patient outcomes from acute trauma, environmental genomics, and comparative effectiveness research in diabetes care.

PUBLIC INTEREST STATEMENT
Higher-order statistics are increasingly used in various research fields and data analysis, and central moment estimates are useful for many approaches. Derivation of higher-order unbiased central moment estimators has long been a challenging task; software made the general order solution possible. This paper describes a direct approach to obtaining estimators of any order, including multi-sample pooled estimators. It also introduces an open source R package Umoments, which calculates one- and two-sample estimates up to sixth order and contains machinery to obtain even higher-order estimates, including a combinatorial algorithm that can be used for solving other problems and assist in long derivations (e.g. Edgeworth expansions).

Introduction
Most data analysis methods rely on estimating unknown quantities such as characteristics of an underlying distribution or an effect of a treatment. From a variety of possible estimators of an unknown true parameter, the ones that are typically chosen have certain desirable properties, e.g. consistency, efficiency, or unbiasedness. When the sample size is moderate or small, finite sample behavior of an estimator, such as bias, variability, and mean squared error, is particularly relevant and is therefore often given special consideration. In addition, when estimation is conducted across multiple samples or studies (pooled estimators), bias may become an important issue.
Moments of a distribution are the most basic building blocks of statistical analysis and their estimates are present in some form in virtually any practical application. A sample average is an estimate of the mean (first moment). Estimates of the variance (second central moment) are routinely used in statistical inference; they are present in all studentized statistics, of which the most common example is an ordinary t-statistic. Higher moments and their estimates, while not as widely used, can be important in various statistical applications and inferential procedures; they also comprise cumulants and their scaled versions (skewness, kurtosis). They are often used in signal processing, financial modeling, and many other areas (for a list of applications, see Pebay, Terriberry, Kolla, et al., 2014). Methods that employ higher-order statistics might utilize data in a more efficient way and offer greater insight into the distribution of interest, thus providing additional refinement in inference, for example through the use of higher-order approximations to the distribution of a test statistic, such as empirical Edgeworth expansions (Bickel, 1974; Hall, 1987; Putter & van Zwet, 1998). These methods require higher-order moment estimation and warrant consideration of estimators' finite sample properties; since moderate or small sample analysis would benefit from such higher-order approaches, unbiased estimates could prove particularly useful. Naïve estimators $m_k = n^{-1} \sum_{i=1}^n (X_i - \bar{X})^k$, $k = 2, 3, \ldots$, of central moments $\mu_k$ are biased, that is, $E(m_k) \neq \mu_k$. The first unbiased estimator was introduced for variance by Friedrich Bessel; it is obtained by multiplying $m_2$ by a factor $n/(n-1)$ and is thus often called Bessel's correction.
That estimator is a part of an ordinary t-statistic and therefore plays a role in Student's t-distribution; it corresponds to the degrees of freedom in the chi-squared distribution that arises from $\sum_{i=1}^n (X_i - \bar{X})^2 \sim \chi^2_{n-1}$ when $X$ is a standard normal random variable. The corresponding standard deviation estimator, however, is still biased (though the bias is reduced) and underestimates the true parameter. Interest in unbiased moment and cumulant estimation has a long history, which led to theoretical advances and various strategies for obtaining higher-order estimators. The work of R.A. Fisher (Fisher, 1930) provided the basis for much of this research, particularly on cumulants; for central moments, unbiased estimators up to fourth order (or "weight" in some literature) were published by Harald Cramér in 1946 (Cramér, 2016); the results were later expanded for more complex settings (e.g. including weights (Rimoldini, 2014)).
Whereas derivation of unbiased moment estimators is in general straightforward, higher-order calculations involve long algebra and require obtaining nontrivial coefficients, and brute-force calculations of these coefficients become infeasible fairly quickly. Having observations from different populations or categories, requiring pooled estimators, compounds the problem. Unlike second and third central moments, where naïve biased moment estimators differ from unbiased ones by a constant factor that does not depend on data, subsequent orders require calculation of combinations (integer powers and products) of lower moments that amount to the same order, which in turn creates systems of equations to be solved. With computer algebra, manipulating long algebraic expressions and solving reasonably large systems of linear equations is no longer an issue; the challenge can then be condensed to finding an expectation of the form $E\big(\bar{X}^{j_1}\,\overline{X^2}^{\,j_2}\,\overline{X^3}^{\,j_3} \cdots\big)$, of an arbitrary order and length, written in terms of sample size and true central moments of the distribution of $X$. Thus, a software algorithm that solves this problem, together with computer algebra, can provide the machinery needed to obtain one-sample and multi-sample pooled estimates of any order, limited only by available processing power.
General order solutions for many problems formulated in the course of unbiased estimation history, including cumulant and moment estimation, are provided in mathStatica (Rose & Smith, 2002), an add-on package for the proprietary computer algebra system Mathematica. Still, given many potential uses for such estimates, there is a need for open-source software and easy-to-use tools, accessible to a wide range of researchers, that could be seamlessly incorporated into data analysis. Multi-sample pooled estimation, which has not received much attention in the pursuit of higher-order statistics (and is not included in mathStatica), can have many practical applications, especially in two-sample settings (e.g. comparing treatment and control groups). In addition, open access to the code and algorithms used in generating arbitrary order estimates can assist in obtaining other statistical results, e.g. Edgeworth expansions. We introduce an R package Umoments, which provides pre-programmed functions that calculate one- and two-sample estimates up to sixth order, either from data or from naïve biased estimates, as well as algorithms and tools for generating general order estimators.
In this paper, we break down the procedure of obtaining unbiased moment estimators of an arbitrary order, as well as estimators of products and powers of moments (also referred to as generalized h-statistics (Tracy & Gupta, 1974)) such as $\mu_i^k \mu_j^l \cdots$; an analogous procedure is provided for multi-sample pooled estimators. Additionally, this direct approach is illustrated in the Sage and Jupyter templates available at https://github.com/innager/unbiasedMoments. Next, we describe the algorithm that generates an expression for the expectation of raw (non-central) sample moments and their powers and products, thus automating the challenging part of the derivation. The Results section provides a full set of one-sample unbiased estimators up to sixth order; two-sample pooled estimators can be found in the Umoments package, but orders four and higher are too long to include in the paper. Results are followed by a quick overview of the Umoments package functions; we conclude with a discussion of practical applications of these estimators.

One-sample estimates
For simplicity, we can consider a mean-zero random variable without any loss of generality. Let $X_1, \ldots, X_n$ be a sample of independent identically distributed random variables with $E(X) = 0$ and central moments $\mu_k$ (mean $\mu_1 = 0$, variance $\mu_2$), in this case equal to raw moments; $\bar{X} = \frac{1}{n} \sum_{i=1}^n X_i$. We also adopt the following useful notation: $M(\cdot)$ denotes an unbiased estimator of the expression inside the parentheses (for quantities such as central moments and their powers and products), e.g. $E[M(\mu_3^2)] = \mu_3^2$. The steps to obtain unbiased estimators of a general order are straightforward: (1) for a desired order, list all the moment combinations for that order (example provided below); (2) expand their naïve biased estimators (remove the brackets); (3) take expectations and represent the results in terms of moments $\mu_k$ and sample size $n$; this will produce an equation or a system of equations; (4) solve this equation or system of equations for the true moments.
As an illustration, we go through these steps for $M(\mu_3)$, an unbiased estimator of the third central moment. Expanding $m_3$ and taking the expectation yields
$$E(m_3) = \frac{(n-1)(n-2)}{n^2}\,\mu_3, \qquad (1)$$
so that
$$M(\mu_3) = \frac{n^2}{(n-1)(n-2)}\, m_3.$$
Steps 2 and 4 are trivial and are performed using computer algebra. Calculation of any unbiased moment estimator of a given order involves all the combinations (powers and products of moments) of that order, which means that for fourth and higher orders there will be a system of equations rather than a single equation (recall that $\mu_1 = 0$). For example, estimators of seventh order will include $M(\mu_7)$, $M(\mu_2 \mu_5)$, $M(\mu_2^2 \mu_3)$, and $M(\mu_3 \mu_4)$ (Step 1); Step 2 will correspondingly expand $m_7$, $m_2 m_5$, $m_2^2 m_3$, and $m_3 m_4$, producing four equations. Since the equations need to be solved for a given order's combinations of true moments, not individual moments, and all the equations in the system are linear in that order, it makes sense to treat these combinations as single variables, thus solving a system of linear equations.
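The roadmap above can be checked numerically with exact rational arithmetic. The sketch below is an illustration of ours (not part of the Umoments package); the small discrete distribution is an arbitrary choice. It verifies $E(m_3) = \frac{(n-1)(n-2)}{n^2}\mu_3$ and the unbiasedness of $M(\mu_3)$ by enumerating every possible sample of size $n = 4$:

```python
from fractions import Fraction as F
from itertools import product

# A small skewed distribution with exact rational probabilities (our choice)
support = [F(-1), F(0), F(2)]
probs = [F(1, 2), F(1, 4), F(1, 4)]
mean = sum(p * x for x, p in zip(support, probs))
mu3 = sum(p * (x - mean) ** 3 for x, p in zip(support, probs))

n = 4
E_m3 = F(0)
for sample in product(range(3), repeat=n):     # all 3^n possible samples
    xs = [support[i] for i in sample]
    pr = F(1)
    for i in sample:
        pr *= probs[i]
    xbar = sum(xs) / n
    m3 = sum((x - xbar) ** 3 for x in xs) / n  # naive biased estimator
    E_m3 += pr * m3                            # exact expectation

# Step 3 result: E(m3) = (n-1)(n-2)/n^2 * mu3
assert E_m3 == F((n - 1) * (n - 2), n ** 2) * mu3
# Step 4: M(mu3) = n^2/((n-1)(n-2)) * m3 is unbiased
assert F(n ** 2, (n - 1) * (n - 2)) * E_m3 == mu3
```

Since all quantities are exact fractions, the assertions confirm the identities with no numerical tolerance.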
Step 3 is more challenging, but the problem can be reduced to finding an expression for $E\big(\bar{X}^{j_1}\,\overline{X^2}^{\,j_2}\,\overline{X^3}^{\,j_3} \cdots\big)$, since any term from the right-hand side of the Step 2 equations can be written in that form. A general solution to this problem is provided in the Umoments package, which generates expressions for these expectations using combinatorics. This algorithm is explained in detail in Section 3.

Pooled estimates
A simple extension of the method can be used to obtain unbiased estimators of central moments for samples that contain observations from several populations or categories. We demonstrate the procedure on a two-category estimation, which extends trivially to any number of categories.
For a sample $X_1, \ldots, X_{n_x}, Y_1, \ldots, Y_{n_y}$, $X \perp Y$, let
$$m_k = \frac{1}{n_x + n_y}\left(\sum_{i=1}^{n_x} (X_i - \bar{X})^k + \sum_{j=1}^{n_y} (Y_j - \bar{Y})^k\right) = \frac{n_x m_{xk} + n_y m_{yk}}{n_x + n_y},$$
where $m_{xk}$ and $m_{yk}$ are one-sample naïve biased estimators as described previously, so that $m_k$ is their two-sample analog. Note that pooled estimation implies an assumption of equality of the estimated central moments between the distributions of $X$ and $Y$: $\mu_{xk} = \mu_{yk} = \mu_k$, $k = 2, 3, \ldots$. Using this assumption, independence of $X$ and $Y$, and one-sample results from Step 3 in Section 2.1, we extend Step 3 of the roadmap to incorporate two variables and obtain expectations.
Example: obtain the two-sample pooled estimate of the third central moment. Using one-sample result (1), we get
$$E(m_3) = \frac{n_x E(m_{x3}) + n_y E(m_{y3})}{n_x + n_y} = \frac{n_x^2 n_y + n_x n_y^2 - 6 n_x n_y + 2 n_x + 2 n_y}{n_x n_y (n_x + n_y)}\,\mu_3 = \left[1 - \frac{2\,(3 n_x n_y - n_x - n_y)}{n_x n_y (n_x + n_y)}\right]\mu_3,$$

which yields
$$M(\mu_3) = \frac{n_x n_y (n_x + n_y)}{n_x^2 n_y + n_x n_y^2 - 6 n_x n_y + 2 n_x + 2 n_y}\, m_3.$$
For this particular example, the result matches the one-sample case if $n_x = n_y$. That is not true in general, however. All the higher orders involve powers and products of lower moments that need to be expanded before taking expectations, affecting the systems of equations. For example,
$$E(m_2^2) = E\left[\left(\frac{n_x m_{x2} + n_y m_{y2}}{n_x + n_y}\right)^2\right] = \frac{n_x^2\, E(m_{x2}^2) + 2 n_x n_y\, E(m_{x2})\, E(m_{y2}) + n_y^2\, E(m_{y2}^2)}{(n_x + n_y)^2}.$$
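The same exhaustive-enumeration check extends to the pooled case. A minimal sketch of ours (not package code) confirms unbiasedness of the pooled third-moment estimator for $n_x = 3$, $n_y = 2$, with both samples drawn from the same small discrete distribution:

```python
from fractions import Fraction as F
from itertools import product

# Common distribution for X and Y (pooled estimation assumes equal moments)
support = [F(-1), F(0), F(2)]
probs = [F(1, 2), F(1, 4), F(1, 4)]
mean = sum(p * x for x, p in zip(support, probs))
mu3 = sum(p * (x - mean) ** 3 for x, p in zip(support, probs))

nx, ny = 3, 2
E_m3 = F(0)
for sample in product(range(3), repeat=nx + ny):   # all joint outcomes
    pr = F(1)
    for i in sample:
        pr *= probs[i]
    xs = [support[i] for i in sample[:nx]]
    ys = [support[i] for i in sample[nx:]]
    xbar, ybar = sum(xs) / nx, sum(ys) / ny
    mx3 = sum((x - xbar) ** 3 for x in xs) / nx
    my3 = sum((y - ybar) ** 3 for y in ys) / ny
    E_m3 += pr * (nx * mx3 + ny * my3) / (nx + ny)  # pooled naive estimator

# Coefficient of the pooled M(mu3) from the display above
coef = F(nx * ny * (nx + ny),
         nx ** 2 * ny + nx * ny ** 2 - 6 * nx * ny + 2 * nx + 2 * ny)
assert coef * E_m3 == mu3   # pooled M(mu3) = coef * m3 is unbiased
```

The exact-arithmetic assertion holds for any choice of a common distribution, in line with the distribution-free nature of the estimator.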

Generating expressions for expectations
To derive general order expectations of naïve moment estimators and their powers and products, one needs to find expectations $E\big(\bar{X}^{j_1}\,\overline{X^2}^{\,j_2}\,\overline{X^3}^{\,j_3} \cdots\big)$, where $\overline{X^i} = \frac{1}{n}\sum_{t=1}^n X_t^i$. To build up to this, we first describe the procedure for generating $E(\bar{X}^k)$, which easily extends to $E\big(\bar{X}^k\,\overline{X^2}^{\,l}\big)$ and then to the general case that involves an arbitrarily long product of $\overline{X^i}^{\,j_i}$.

Generate $E(\bar{X}^k)$
To find Equation (4), we need to consider all the different combinations of ordered indices $i_1, i_2, \ldots, i_k$, $i_j = 1, \ldots, n$ for each $j$. There are $n^k$ such combinations, but many of them yield the same $E(X_{i_1} \cdots X_{i_k})$; for example, $E(X_2 X_2 X_2 X_2 X_2 X_5 X_5 X_1 X_1) = E(X_4 X_3 X_4 X_6 X_6 X_3 X_6 X_6 X_6) = \mu_2^2 \mu_5$. Combinations that produce the same expectation form a set that we will call a grouping (similar to "partitions" and "augmented symmetric functions" in some terminology), and the problem therefore reduces to considering all the groupings (each producing a distinct expectation) and calculating their coefficients, which are the numbers of combinations in each set. Each product $X_{i_1} \cdots X_{i_k}$ can be broken into smaller products, or groups, of $X$'s with the same indices, such as $\{X_{i_j} : i_j = c\}$, $c = 1, \ldots, n$. The number of groups ranges between 1 (when all the indices are the same: $i_1 = i_2 = \cdots = i_k$) and $k$ (when all the indices are different: $i_1 \neq i_2 \neq \cdots \neq i_k$); the sizes of these groups determine $E(X_{i_1} \cdots X_{i_k})$. Thus, each grouping is fully characterized by the number of groups and the group sizes.
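The grouping construction can be made concrete by direct enumeration. The sketch below is our illustration (not the package algorithm itself): it walks over all $n^k$ index combinations for a concrete $n$, tallies them into groupings, and recovers, for $k = 4$, the classical $E(\bar{X}^4) = \big(n\mu_4 + 3n(n-1)\mu_2^2\big)/n^4$:

```python
from fractions import Fraction as F
from itertools import product
from collections import Counter

def E_xbar_power_brute(k, n):
    """E(Xbar^k) for iid mean-zero X's by enumerating all n^k index
    combinations. Returns {grouping: coefficient / n^k}, where a grouping
    is the non-increasing tuple of group sizes (a_1, ..., a_d), standing
    for the moment product mu_{a_1} * ... * mu_{a_d}."""
    counts = Counter()
    for idx in product(range(n), repeat=k):
        sizes = tuple(sorted(Counter(idx).values(), reverse=True))
        if 1 in sizes:
            continue            # a singleton group contributes E(X) = 0
        counts[sizes] += 1
    return {g: F(c, n ** k) for g, c in counts.items()}

# k = 4, n = 5: only the groupings {4} and {2, 2} contribute
res = E_xbar_power_brute(4, 5)
assert res == {(4,): F(5, 5 ** 4), (2, 2): F(3 * 5 * 4, 5 ** 4)}
```

Because indices that pair up identically always yield the same expectation, only the multiset of group sizes matters, which is exactly what the tally records.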
Let $d$ denote the number of groups in one grouping $G$ and $a_1, \ldots, a_d$ the numbers of $X$'s in each group, $\sum_{u=1}^d a_u = k$; the set of group sizes is unordered, so assigning indices to the $a$'s is arbitrary (e.g. decreasing). In the example above: $k = 9$, $d = 3$, $a_1 = 5$, $a_2 = 2$, and $a_3 = 2$. If $\sum_{u=1}^d I(a_u = 1) > 0$ (at least one group is of size 1), then $E(X_{i_1} \cdots X_{i_k}) = 0$ since $E(X) = 0$ and there is no need to calculate a coefficient for this grouping, which is important in terms of computational efficiency; otherwise, $E(X_{i_1} \cdots X_{i_k}) = \prod_{u=1}^d \mu_{a_u}$. By adding a subscript $g$ to indicate a grouping $G$, we get
$$E(\bar{X}^k) = \frac{1}{n^k} \sum_g C_g \prod_{u=1}^{d_g} \mu_{a_{g,u}},$$
where $C_g$ is the coefficient for $G$, i.e. the number of combinations that yield $\{a_{g,u}\}$.
One way of arriving at the expression for $C_g$ could be the following: there are $\frac{(n)_d}{s_{g,1}!\, s_{g,2}! \cdots}$ ways to pick (unordered) indices that satisfy given group sizes (the set $\{a_{g,u}\}$), where $(n)_d$ is a falling factorial and $s_{g,1}, s_{g,2}, \ldots$ are the numbers of groups of equal size, and $\frac{k!}{a_{g,1}!\, a_{g,2}! \cdots a_{g,d}!} = \binom{k}{a_{g,1}} \binom{k - a_{g,1}}{a_{g,2}} \cdots \binom{a_{g,d-1} + a_{g,d}}{a_{g,d-1}}$ ways to place these indices on $k$ positions (a multinomial coefficient).
Our software generates expressions for $E(\bar{X}^k)$ for a given $k$ using the method described above. To find all the possible groupings, we impose an ordering on them and use it to generate each consecutive grouping from the previous one, thus moving through a complete set of groupings from $\{a_1 = k\}$ to $\{a_1 = a_2 = \cdots = a_k = 1\}$. For example, in an agglomerative order, a grouping $\{5, 2, 2\}$ is preceded by $\{5, 2, 1, 1\}$ and followed by $\{5, 3, 1\}$.
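The coefficient formula above can be cross-checked against direct enumeration. The sketch below is ours and uses plain recursion over integer partitions rather than the package's consecutive-grouping scheme, but it computes the same $C_g$ values:

```python
from fractions import Fraction as F
from itertools import product
from collections import Counter
from math import factorial, prod

def partitions(k, max_part=None):
    """Integer partitions of k as non-increasing tuples."""
    max_part = k if max_part is None else max_part
    if k == 0:
        yield ()
        return
    for first in range(min(k, max_part), 0, -1):
        for rest in partitions(k - first, first):
            yield (first,) + rest

def coefficient(grouping, n, k):
    """C_g: the number of index combinations realizing given group sizes."""
    d = len(grouping)
    ways_to_pick = F(prod(n - j for j in range(d)),             # (n)_d
                     prod(factorial(s) for s in Counter(grouping).values()))
    ways_to_place = F(factorial(k), prod(factorial(a) for a in grouping))
    return ways_to_pick * ways_to_place

def E_xbar_power(k, n):
    """E(Xbar^k) as {grouping: C_g / n^k}, skipping singleton groups."""
    return {g: coefficient(g, n, k) / n ** k
            for g in partitions(k) if 1 not in g}

# Cross-check against direct enumeration of all n^k index combinations
n, k = 5, 6
brute = Counter()
for idx in product(range(n), repeat=k):
    sizes = tuple(sorted(Counter(idx).values(), reverse=True))
    if 1 not in sizes:
        brute[sizes] += 1
assert E_xbar_power(k, n) == {g: F(c, n ** k) for g, c in brute.items()}
```

For $k = 6$ the non-vanishing groupings are $\{6\}$, $\{4, 2\}$, $\{3, 3\}$, and $\{2, 2, 2\}$, and the partition-based coefficients match the brute-force counts exactly.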
The smallest number of groups is $d = 1$, which produces an order of $\frac{n}{n^k} = \frac{1}{n^{k-1}}$ (the highest order in the range); the largest $d$ with a non-zero contribution to $E(\bar{X}^k)$ is $\lfloor k/2 \rfloor$ (when the indices of $X$ appear in pairs and there are no unpaired indices; when $k$ is odd, one of the groups is of size 3), and the order it produces is $\frac{1}{n^{\lceil k/2 \rceil}}$.

Generate $E\big(\bar{X}^k\,\overline{X^2}^{\,l}\big)$
To generate expressions for Equation (7), we extend the algorithm described in Section 3.1 for Equation (4). Now groups consist of $X$'s and $X^2$'s with the same indices: $\{X_{i_s}, X^2_{j_t} : i_s = j_t = c\}$, $c = 1, \ldots, n$, and are thus described not by a single number (group size) but by a pair $(a, b)$, where $a$ is the number of $i$'s and $b$ is the number of $j$'s in the group. Consequently, a grouping in this version is characterized by a set of pairs $\{(a_u, b_u)\}$, $u = 1, \ldots, d$; $\sum_{u=1}^d a_u = k$, $\sum_{u=1}^d b_u = l$, and its definition is different from the one in Section 3.1 since for given $k$ and $l$ there can be different groupings that yield the same expectation, e.g. groupings $\{(2, 3), (3, 0), (1, 1)\}$, $\{(4, 2), (1, 1), (1, 1)\}$, and $\{(0, 4), (3, 0), (3, 0)\}$ will all produce $E(X_{i_1} \cdots X_{i_6} X^2_{j_1} \cdots X^2_{j_4}) = \mu_3^2 \mu_8$. Analogously to the original version, if $\sum_{u=1}^d I(a_u = 1, b_u = 0) > 0$ (at least one pair in the grouping is $(1, 0)$), then $E(X_{i_1} \cdots X_{i_k} X^2_{j_1} \cdots X^2_{j_l}) = 0$; otherwise, $E(X_{i_1} \cdots X_{i_k} X^2_{j_1} \cdots X^2_{j_l}) = \prod_{u=1}^d \mu_{a_u + 2 b_u}$. Note that to account for all the possible groupings in this case, permutations need to be used, adding another layer of computational complexity.
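The extended construction can likewise be checked by direct enumeration. In the sketch below (ours, for illustration only), each grouping is a sorted tuple of pairs $(a_u, b_u)$ standing for $\prod_u \mu_{a_u + 2b_u}$:

```python
from fractions import Fraction as F
from itertools import product
from collections import Counter

def E_mixed_power(k, l, n):
    """E(Xbar^k * (n^-1 sum X_t^2)^l) for iid mean-zero X's, by enumerating
    all n^(k+l) index combinations; positions 0..k-1 contribute an X,
    positions k..k+l-1 contribute an X^2."""
    counts = Counter()
    for idx in product(range(n), repeat=k + l):
        ab = {}
        for pos, i in enumerate(idx):
            a, b = ab.get(i, (0, 0))
            ab[i] = (a + 1, b) if pos < k else (a, b + 1)
        grouping = tuple(sorted(ab.values()))
        if (1, 0) in grouping:
            continue                    # a lone X contributes E(X) = 0
        counts[grouping] += 1
    return {g: F(c, n ** (k + l)) for g, c in counts.items()}

# k = 2, l = 1: E(Xbar^2 * mean(X^2)) = (n*mu4 + n(n-1)*mu2^2) / n^3;
# the two groupings are {(2, 1)} -> mu4 and {(2, 0), (0, 1)} -> mu2^2
n = 5
res = E_mixed_power(2, 1, n)
assert res == {((2, 1),): F(n, n ** 3),
               ((0, 1), (2, 0)): F(n * (n - 1), n ** 3)}
```

The pair representation makes the zero rule explicit: only the singleton $(1, 0)$ kills a combination, while $(1, 1)$, for instance, contributes $\mu_3$.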
The coefficient $C_g$ for a grouping $G$ is calculated in a similar way to Section 3.1 (Equation (6)) with a few adjustments:
$$C_g = \frac{(n)_d}{s_{g,1}!\, s_{g,2}! \cdots} \cdot \frac{k!}{a_{g,1}! \cdots a_{g,d}!} \cdot \frac{l!}{b_{g,1}! \cdots b_{g,d}!},$$
where $s_{g,1}, s_{g,2}, \ldots$ are the numbers of groups with the same values of $a$ and $b$.
In this case, the order ranges from $\frac{1}{n^{k+l-1}}$, when all the indices are equal ($d = 1$), to $\frac{1}{n^{\lceil k/2 \rceil}}$, attained when each group is as small as possible: the $X$'s appear in pairs (with one triple when $k$ is odd) and each $X^2$ forms its own group.

Umoments R package
Umoments contains a set of pre-programmed functions that calculate one-sample and pooled two-sample unbiased moment estimates, both up to sixth order. This functionality is primarily useful for data analysis. The estimates can be calculated either directly from the sample or from naïve biased estimates, in which case the sample size $n$ needs to be provided. For two-sample estimation, the input should also include labels indicating which observation belongs to which sample/category, or both sample sizes $n_x$ and $n_y$. Below are some examples.
Two-sample pooled estimates can be calculated from the data up to sixth order; here smp is a data vector, and treatment is a vector of labels that separates it into two categories. Other functions in the package can be used to obtain higher-order estimators, pooled estimators across multiple (three or more) samples, and other statistical results.

Discussion
The difference between unbiased and biased estimators depends on the sample size and might be considerable for small samples; also, for a fixed sample size, it is relatively greater for higher orders. At the same time, variability of the estimators is an important factor to be considered in this bias-variance trade-off, especially in connection with sample size $n$ and the order of the estimators, as variability increases with higher orders (which might be offset by the lower contribution/weight of these orders in certain methods) and smaller samples. Another question is the relationship between $n$ and the maximal order that could reasonably be used in a method; besides purely algebraic restrictions on the sample size for a given order, apparent from the expressions for unbiased estimators ($n \geq k$ for $k$'th order estimators), there might be another underlying stricter relationship that needs to be explored, either theoretically or numerically.
While unbiased estimators of products and integer powers of moments are possible to obtain, that is not the case with ratios and roots. Of course, such biased estimators, like the square root $s$ of a sample variance $s^2 = \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar{X})^2$ or a skewness estimator, are widely used in practice. Adding to the complexity is the fact that since an unbiased estimator of a ratio cannot be obtained, simplifying expressions should also be questioned; consider, for example, the scaled sixth cumulant $\lambda_6 = \kappa_6 / \mu_2^3$. This example also provides an illustration for another important consideration that should factor into a decision of which estimators to use: variability of the denominator in studentized statistics. In $\lambda_6$, the sixth cumulant $\kappa_6$ is scaled by $\mu_2^3$; to substitute for this unknown quantity, a variety of estimators can be used: $m_2^3$, $[M(\mu_2)]^3$, or $M(\mu_2^3)$, to name a few. While the expression for $M(\mu_2)$ (and thus its cube) contains $m_2$ only, the expression for $M(\mu_2^3)$ includes $m_4$ and $m_6$ as well. These higher-order quantities may be highly variable, especially in small samples, and therefore the whole ratio becomes highly sensitive to small values of the estimates in the denominator, which can inflate $\lambda_6$ dramatically, increasing the variability of the ratio to the point of being unusable. Therefore, it might be advisable in certain cases, e.g. with considerably skewed distributions, to perform some numeric exploration to determine whether it might indeed be preferable to use lower-order estimators, naïve biased or unbiased, in place of parameters in denominators because of their relative stability ("power of mean" instead of "mean of power").