Orthogonal contrasts for both balanced and unbalanced designs and both ordered and unordered treatments

We consider designs with t treatments, the ith level of which has n_i observations. Four cases are examined: treatment levels either ordered or not, and the design either balanced, with all n_i equal, or not. A general construction is given that takes observations, typically treatment sums or treatment rank sums, constructs a simple quadratic form and expresses it as a sum of squares of orthogonal contrasts. For the case of ordered treatment levels, the Kruskal–Wallis, Friedman and Durbin tests are recovered by this construction. A dataset from a supplemented balanced design, which is an unbalanced design in our terminology, is analyzed. When treatment levels are not ordered the construction also applies; we then focus on Helmert contrasts.

Thas et al. (2012) decompose, respectively, the Kruskal–Wallis, Friedman, and Durbin test statistics into sums of squares of orthogonal contrasts. However, their approach is presented only for balanced designs, in which each of the ordered treatment levels is observed the same number of times. The construction given subsequently gives orthogonal contrasts for both ordered and unordered treatment levels and for both balanced and unbalanced designs. The sum of squares of these contrasts is a simple quadratic form that may be used as an omnibus test statistic or, for example, to aggregate squared contrasts into a residual. The construction may be such that the quadratic form is a rank test statistic or a treatment sum of squares in an analysis of variance (ANOVA).
For ordered treatments the construction is described in Section 2 and the distribution theory in Section 3. It takes the treatment (rank) sums, standardizes and normalizes them, generates orthogonal contrasts and aggregates them into a simple closed form. Asymptotically the aggregated statistic has a χ² distribution and the contrasts are orthogonal, each with the χ²_1 distribution. Unordered treatment levels are considered in Section 4. Familiarity with the material in Thas et al. (2012) will assist the reader.
For ordered treatments we generally use the orthogonal polynomials, although other options are possible. Our construction does not require the values of the independent variable to be equally spaced, since the orthogonal polynomials are constructed using arbitrary ordinal scores. We also use orthonormal functions; the orthogonality is required for the contrasts to be uncorrelated, while the normalization is required to give unique decompositions of the omnibus test statistics.
To end this introduction we give an example to demonstrate the decompositions given here. The RL test used in this example was proposed in Rayner and Livingston Jr (2023). It is a rank sum statistic to analyze Latin square data and, as will be shown in the Appendix, is consistent with the construction described in the next section. Since the Latin square design is balanced, the orthogonal contrasts could also have been derived using the approach in Thas et al. (2012). The recommended test first aligns and then ranks the raw data. We now comment briefly on alignment.
The Latin square is an orthogonal design, so the parametric F test statistic is not "contaminated" by the block effects. However, the RL test statistic is affected by block effects, and alignment enables the removal of those block effects. Alignment (Hodges & Lehmann, 1962) is a technique often used in analyzing data from factorial designs to obtain nonparametric tests of interaction effects that would otherwise be virtually inaccessible. Here the idea is to "strip" away the row and column effects by aligning and then analyzing the aligned data; see Rayner and Livingston Jr (2023, section 9.4). Given the raw data {Y_ijk} from a t × t Latin square, define X_ijk = Y_ijk − Y_{·j·}/t − Y_{··k}/t + Y_{···}/t², in which the dot subscript indicates summation and i refers to treatments, j to rows and k to columns. The aligned data are {X_ijk}.
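Mean-based alignment of this kind is easy to sketch numerically. The helper below is hypothetical (the name `align_latin_square` and the toy data are ours, not from the source); it subtracts row and column means and restores the grand mean, which is equivalent to the summation form above.

```python
import numpy as np

def align_latin_square(y):
    """Strip row and column effects from a t x t Latin square layout.

    y[j, k] is the response in row j, column k. Mean-based alignment:
    subtract row and column means and add back the grand mean.
    """
    row_means = y.mean(axis=1, keepdims=True)
    col_means = y.mean(axis=0, keepdims=True)
    return y - row_means - col_means + y.mean()

# Toy 3 x 3 example: aligned data have zero row and column sums,
# so block (row and column) effects have been removed.
y = np.array([[1.0, 2.0, 3.0],
              [4.0, 6.0, 5.0],
              [7.0, 9.0, 8.0]])
x = align_latin_square(y)
print(np.allclose(x.sum(axis=0), 0), np.allclose(x.sum(axis=1), 0))
```

The aligned values are then ranked and the rank sums analyzed as described subsequently.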

1.1 Traffic example

Kuehl (2000, p. 301) considers the scenario in which a traffic engineer conducts a study to compare the total unused red-light time for five different traffic light signal sequences. The experiment was conducted with a Latin square design in which the blocking factors were (1) five intersections and (2) five time-of-day periods. In Table 1 the five signal sequence treatments are shown in parentheses as A, B, C, D, E and the numerical values are the unused red-light times in minutes. Although Kuehl does not give details of the blocking factors, we assume that both are ordinal and use natural scores. For example, time period may refer to 10-minute periods throughout the day, beginning, say, at 8:00 a.m., 9:00 a.m., 1:00 p.m., 4:00 p.m. and 5:00 p.m. Best and Rayner (2011) report different analyses, both parametric and nonparametric, with treatment p-values both above and below .05. They also suggest the value 19.2 at intersection 2 and time period 3 might be an outlier.
A parametric analysis of the raw data gives a p-value of .0498 for treatments. The residuals are consistent with normality. Nevertheless the possible outlier suggests it may be informative to consider the ranks.
The ANOVA F-test on the ranks gives a p-value of .0328 for treatments and a corresponding permutation test p-value of .0167. For the Latin square design we find χ² p-values unreliable and hence use permutation test p-values, here based on 1,000,000 permutations.
The aligned rank sums are 76, 94, 22, 54, and 79 for A to E, respectively. Plots of these sums against treatments, ordered A to E, show a shape that is possibly cubic. Incidentally, the rank sums for the nonaligned data are 72, 74, 47, 62, and 70; alignment has made quite a difference, especially for the second and third sums.
Orthonormal contrasts were constructed using the orthonormal polynomials. The permutation test p-values of the RL test and its linear, quadratic, cubic, and quartic orthonormal contrasts were .0780, .5971, .1172, .1948, and .0680, respectively. The detailed analysis here shows that not only is the omnibus RL test not significant at the 0.05 level, but neither are any of the focused contrasts. Sometimes omnibus tests mask focused effects, but that is not happening here.
As noted previously, Best and Rayner (2011) reported analyses with p-values both below and above .05. The same is happening here with the F-test on the ranks and the RL test. Just as different people may perceive the same object similarly but differently, so different tests may find similar but different p-values for the analysis of the same data.
Calculating the contrasts here, and subsequently for the strawberry example, requires the calculation of multiple orthonormal polynomials. We use the Emerson recurrence; see Emerson (1968) and Rayner, Thas, and De Boeck (2008).
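A minimal sketch of such a recurrence follows. The function name and the exact normalization are ours (not Emerson's notation); it builds polynomials orthonormal with respect to arbitrary scores and weights via a three-term recurrence, then checks orthonormality numerically.

```python
import numpy as np

def orthonormal_polynomials(x, p, degree):
    """Orthonormal polynomials on scores x with weights p (summing to 1),
    via a three-term recurrence in the spirit of Emerson (1968).

    Returns h with h[r] the degree-r polynomial evaluated at x, so that
    sum_j h[r, j] h[s, j] p[j] = delta_rs.
    """
    x = np.asarray(x, dtype=float)
    p = np.asarray(p, dtype=float)
    h = np.zeros((degree + 1, len(x)))
    h[0] = 1.0                                   # degree zero: identically 1
    for r in range(1, degree + 1):
        a = np.sum(p * x * h[r - 1] ** 2)        # component on h_{r-1}
        g = (x - a) * h[r - 1]
        if r >= 2:
            b = np.sum(p * x * h[r - 1] * h[r - 2])  # component on h_{r-2}
            g -= b * h[r - 2]
        h[r] = g / np.sqrt(np.sum(p * g ** 2))   # normalize
    return h

# Equal weights on scores 1..5; verify the weighted Gram matrix is I.
p = np.full(5, 0.2)
h = orthonormal_polynomials(np.arange(1, 6), p, 4)
gram = h @ np.diag(p) @ h.T
print(np.allclose(gram, np.eye(5)))
```

Unequal weights, as arise for unbalanced designs, are handled by the same code with a different p.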

THE CONSTRUCTION
In the following no constraints are imposed on the design, except that the definitions are meaningful. There may be several factors present, but we focus on one, which will be called treatments.
Initially the treatment levels are assumed to be ordered. There are t treatment levels, with n_i observations of the ith, and n observations in all. The construction is developed for ranked data; modifications for unranked ordered data are immediate. The expression developed for G_O below is appropriate when ranking is overall. A slightly modified version, G_WB, is subsequently developed for when there is a single blocking factor and ranking is within blocks.
The raw (unranked) data are represented as y_ij, j = 1, …, n_i and i = 1, …, t, with y_ij the jth observation on the ith treatment. Initially we take r_ij to be the overall rank of y_ij. Suppose the sum of the ranks for treatment i is R_i = Σ_{j=1}^{n_i} r_ij and that the sum of all the ranks is T = Σ_{i=1}^{t} R_i. If ties occur, mid-ranks will normally be used. Provided the rank sum is preserved at n(n + 1)/2, what it would have been with untied data, other ranking options may be used.
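The mid-rank convention preserves the rank sum automatically. A small sketch with hypothetical data (the helper `midranks` and the two samples are ours) illustrates this:

```python
import numpy as np

def midranks(values):
    """Overall mid-ranks: tied observations share the average of the
    ranks they would jointly occupy, so the rank sum stays n(n+1)/2."""
    values = np.asarray(values, dtype=float)
    return np.array([(np.sum(values < v) + 1 + np.sum(values <= v)) / 2
                     for v in values])

# Hypothetical data for t = 2 treatments with one tie across treatments.
y_A = [1.2, 3.4, 3.4]
y_B = [2.0, 5.1]
r = midranks(np.concatenate([y_A, y_B]))
R_A, R_B = r[:3].sum(), r[3:].sum()      # treatment rank sums
n = len(r)
print(R_A, R_B, R_A + R_B == n * (n + 1) / 2)
```

Here the two tied values both receive rank 3.5, and the total T = R_A + R_B is still n(n + 1)/2 = 15.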
We now define a contrast. Suppose c = (c_1, …, c_t)^T is a vector of contrast coefficients and that w = (w_1, …, w_t)^T is a vector of variables or parameters. Then for balanced designs Thas et al. (2012) say c^T w is a contrast if c_1 + … + c_t = 0. However, more generally we need to account for treatment levels with unequal numbers of observations. For i = 1, …, t put p_i = n_i/n and redefine c^T w to be a contrast if p_1 c_1 + … + p_t c_t = 0. As an example, suppose we have t samples with means ȳ_1, …, ȳ_t; then Σ_i n_i c_i ȳ_i/n is a contrast in the sample means. For example, take c_1 = n_2 and c_2 = −n_1, so that a contrast involving the first two sample means is n_1 n_2 (ȳ_1 − ȳ_2)/n.

Write var = Σ_{i,j} r_ij²/n − {(n + 1)/2}², the variance of the possibly tied ranks. Now for i = 1, …, t define the centered and normalized rank sums Z_i = (R_i − E[R_i])/√(n var), in which E[R_i] = T n_i/n, n_i/n being the proportion of the rank sum attributable to the ith treatment. Note that Σ_i Z_i = 0.

Next suppose q_1, …, q_t are positive and sum to one: q_i > 0 for i = 1, …, t and Σ_i q_i = 1. Vectors h_r = (h_{r1}, …, h_{rt})^T, r = 1, …, t, are orthonormal with weight function {q_j} if Σ_j h_{rj} h_{sj} q_j = δ_rs. Here δ_rs is the Kronecker delta: δ_rs = 1 for r = s and zero otherwise. The expectation implicit in these sums is with respect to the distribution defined by {q_j}. It is customary to take the orthogonal function of degree zero to be identically 1. For subsequent notational comfort we designate h_t to be this function: h_t = 1_t, the t × 1 vector of ones. As before, if for r = 1, …, t − 1 the h_r^T Z are orthonormal contrasts and f is a constant, then the f h_r^T Z are also orthogonal contrasts. Note that, as is common in the literature, we frequently say 'orthogonal contrasts' when the contrasts are, in fact, orthonormal.

To proceed a lemma is needed. We previously defined p_i = n_i/n, a particular case of the more general weights q_i.

Lemma. Suppose that h_1, …, h_{t−1}, h_t = 1_t are t × 1 vectors orthonormal with weight function q = (q_1, …, q_t)^T. Define D = diag(q_1, …, q_t), H^T = (h_1, …, h_{t−1}) and H*^T = (h_1, …, h_t). Then D^{−1} = H^T H + 1_t 1_t^T.

Proof. The orthonormality means that for r, s = 1, …, t, h_r^T D h_s = δ_rs, and hence that H* D H*^T = I_t. Put K = H* D^{0.5}. Then K K^T = I_t, so that K is an orthogonal matrix. Hence K^T K = I_t = D^{0.5} H*^T H* D^{0.5}, so that D^{−1} = H*^T H* = H^T H + 1_t 1_t^T and the stated result follows.

Now from the lemma, if Y is any t × 1 vector with elements Y_i, then premultiplying by Y^T and postmultiplying by Y gives Σ_i Y_i²/q_i = Σ_{r=1}^{t−1} (h_r^T Y)² + (Σ_i Y_i)².
This result requires no distributional assumptions.
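The lemma is easy to verify numerically. The sketch below builds vectors orthonormal with weight q by taking an orthogonal matrix whose first row is (√q_1, …, √q_t) (the placement of the ones vector first rather than last is immaterial) and checks both the matrix identity and the quadratic-form identity for an arbitrary Y; the construction via QR is our own device, not from the source.

```python
import numpy as np

rng = np.random.default_rng(0)
t = 4
q = np.array([0.1, 0.2, 0.3, 0.4])        # positive weights summing to one
D = np.diag(q)

# Orthogonal V with first row sqrt(q): QR on a basis starting with sqrt(q).
M = np.column_stack([np.sqrt(q), rng.standard_normal((t, t - 1))])
Q, _ = np.linalg.qr(M)
V = Q.T * np.sign(Q[0, 0])                # make the first row +sqrt(q)
H_star = V @ np.diag(1 / np.sqrt(q))      # rows orthonormal with weight q;
                                          # first row is the vector of ones
# The lemma: D^{-1} = H*^T H* = H^T H + 1 1^T (H drops the ones row).
H = H_star[1:]
lhs = np.linalg.inv(D)
rhs = H.T @ H + np.ones((t, t))
print(np.allclose(lhs, rhs))

# The quadratic-form identity for an arbitrary Y.
Y = rng.standard_normal(t)
print(np.allclose(Y @ lhs @ Y, ((H @ Y) ** 2).sum() + Y.sum() ** 2))
```

When Y has elements summing to zero, as the Z_i do, the last term vanishes and the quadratic form is exactly the sum of the squared contrasts.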
We now apply the lemma to {Z_i} and {p_i} defined previously. Since Σ_i Z_i = 0, G_O = Z^T (D^{−1} − 1_t 1_t^T) Z = Σ_i Z_i²/p_i = Σ_{r=1}^{t−1} (h_r^T Z)². The subscript "O" indicates that ranking is overall rather than within blocks.
The development for balanced designs in Thas et al. (2012) required, for their Y, 1_t^T Y = 0 to construct their contrasts. They then use the singular value decomposition theorem to show that their test statistics are the sum of squares of t − 1 contrasts. We can achieve the same result by putting q_i = 1/t in the lemma. ▪

We now turn to ranking within blocks. First note that if, as in the Latin square design, there is more than one blocking factor, then ranking within blocks is ambiguous. However, ranking overall is still available. The following assumes there is only one blocking factor as, for example, in the randomized block design. For such designs ranking both within blocks and overall is available.
When ranking is overall the construction decomposes var = Σ_{i,j} r_ij²/n − {(n + 1)/2}², the variance of the possibly tied ranks. When ranking is within blocks we now define r_ij as the rank for treatment i in block j, so that R_i = Σ_j r_ij, the rank sum over blocks for treatment i. For E[R_i], instead of the proportion of the overall rank sum attributable to the ith treatment, we need to aggregate the proportions of the block rank sums attributable to the ith treatment. Suppose there are b blocks in all, with b_j observations in block j. The construction is applied to the correspondingly centered and normalized rank sums and gives G_WB = Z^T (D^{−1} − 1_t 1_t^T) Z. One advantage of defining G_O and G_WB as we have is that by doing so we recover tests such as the Kruskal–Wallis and Friedman. This is shown in the Appendix.

DISTRIBUTION THEORY
We now show that Z^T (D^{−1} − 1_t 1_t^T) Z = G, say, asymptotically has the χ²_{t−1} distribution and that the contrasts are asymptotically uncorrelated. This holds whichever ranking method, overall or within blocks, is used; hence the notation, dropping the subscripts on G_O and G_WB. Put W = D^{−0.5} Z and A = I_t − D^{0.5} 1_t 1_t^T D^{0.5}, so that G = W^T A W; A is idempotent. By the central limit theorem the W_i are asymptotically N(0, 1). The eigenvalues of A are one t − 1 times and zero once. It follows that the rank of A is t − 1, that the asymptotic distribution of G is χ²_{t−1}, and that the (h_r^T Z)² are asymptotically χ²_1 distributed.

In Thas et al. (2012) the objective was to decompose statistics known, at least asymptotically, to have the χ² distribution. Here that is not necessarily the case. As in the supplemented balanced example following, it is the construction that gives the omnibus test statistic G and the central limit theorem that gives its distribution. In practice the χ² approximation to this sampling distribution may be poor, the conditions for the CLT not having been sufficiently met; see the discussions of the Durbin and RL statistics in Rayner and Livingston Jr (2023). It may also be the case that while G is well approximated by the χ²_{t−1} distribution, the squared contrasts (h_r^T Z)² are not well approximated by the χ²_1 distribution. A conservative way forward is not to rely on the χ² sampling distributions and to use permutation testing instead. However, this forgoes the distributional assumptions of the parametric model, and ultimately the independence of the different squared contrasts cannot be relied upon; thus a significant linear effect may induce a significant quadratic effect. Nevertheless the squared contrasts are still appropriate test statistics, and the contrasts are still orthonormal. The contrasts are therefore assessing different, disjoint aspects of the null hypothesis of equal treatment means.
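The permutation approach for the overall-ranking omnibus statistic can be sketched as follows. The data, the helper names and the Monte Carlo settings are ours; `G_overall` implements Σ_i Z_i²/p_i from the construction of Section 2, and the reference distribution is obtained by re-randomizing treatment labels.

```python
import numpy as np

def midranks(v):
    v = np.asarray(v, dtype=float)
    return np.array([(np.sum(v < x) + 1 + np.sum(v <= x)) / 2 for x in v])

def G_overall(r, labels, treatments):
    """Omnibus statistic G_O = sum_i Z_i^2 / p_i from overall ranks r."""
    n = len(r)
    var = np.mean(r ** 2) - ((n + 1) / 2) ** 2   # variance of the ranks
    G = 0.0
    for t in treatments:
        n_i = np.sum(labels == t)
        R_i = r[labels == t].sum()
        Z_i = (R_i - n_i * (n + 1) / 2) / np.sqrt(n * var)
        G += Z_i ** 2 / (n_i / n)
    return G

rng = np.random.default_rng(1)
y = np.array([3.1, 4.0, 3.7, 5.2, 6.1, 5.8, 2.2, 2.9, 3.0])
labels = np.array(["A"] * 3 + ["B"] * 3 + ["C"] * 3)
r = midranks(y)
observed = G_overall(r, labels, "ABC")
# Monte Carlo permutation p-value: re-randomize the treatment labels.
perms = [G_overall(r, rng.permutation(labels), "ABC") for _ in range(2000)]
p_value = (1 + sum(g >= observed for g in perms)) / (1 + len(perms))
print(0 < p_value < 1)
```

The same permutations simultaneously give reference distributions for each squared contrast, although, as noted above, their independence cannot be relied upon.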
Often the contrasts are calculated more as exploratory data-analytic tools. As in the strawberry example following, formal conclusions from the use of orthogonal contrasts will augment subjective conclusions from tools such as data plots.

3.1 Strawberry example

Pearce (1960) considered the supplemented balanced design and gave an example using the strawberry dataset in Table 2, in which there are five treatments and four blocks. The control occurs twice on each block, while each of the other treatments occurs twice on one block and once on each of the other blocks. Thus there are eight observations of the control and five of each of the other treatments.
The experiment is unbalanced in our sense, but since there are seven observations on each block there is also a sense of balance that justifies Pearce's use of the term. The data can be ranked overall after first aligning to remove the block effects, or ranked within blocks. We do both. We also give permutation test p-values.
Pesticides are applied to strawberry plants to inhibit the growth of weeds. The response represents the total spread in inches of 12 plants per plot approximately 2 months after the application of the weedkillers. The question is: do the pesticides also inhibit the growth of the strawberries? There is an assumed treatment order. Analysis in Rayner and Livingston Jr (2023, chapters 8 and 10) suggests a relationship between the means of the raw data that is predominantly quadratic with an element of linearity. (Note to Table 2: each cell gives the pesticide, the response and the corresponding overall mid-rank.)
If the data are aligned before ranking overall, then using the chi-squared distributions, with permutation test p-values based on 10,000,000 permutations in parentheses, we find that the omnibus statistic has p-value .0001 (.0000), while the p-values for the four orthogonal contrasts based on the orthogonal polynomials are .0433 (.0416), .0005 (.0002), .0278 (.0254), and .1352 (.1376), respectively. The aggregated effect is highly significant; the most dominant of the effects is the quadratic.
If instead the data are ranked within blocks the analysis yields very similar results. Using the chi-squared distributions, with permutation test p-values based on 10,000,000 permutations in parentheses, the omnibus statistic has p-value .0002 (.0000), while the p-values for the four orthogonal contrasts are .0401 (.0501), .0017 (.0017), .0128 (.0149), and .1497 (.1721), respectively. While the quadratic contrast is strong, so is the cubic, suggesting a complex relationship between the mean ranks.
For both methods of ranking the chi-squared p-values are generally similar to those based on the permutation distribution.

ORTHOGONAL CONTRASTS WHEN TREATMENT LEVELS ARE UNORDERED
Although we could give a more general exposition, suppose we have a one factor ANOVA with observations y_ij, j = 1, …, n_i and i = 1, …, t. The factor sum of squares is SSF = Σ_i n_i (ȳ_i − ȳ)², where ȳ_i is the ith treatment mean and ȳ the grand mean; the total and error sums of squares are defined as usual. Put Z_i = n_i (ȳ_i − ȳ) and, as previously, p_i = n_i/n, both for i = 1, …, t. Then Σ_i Z_i = 0 and Σ_i Z_i²/p_i = nSSF. If {h_r} are orthonormal functions with weight function {p_i} and if h_1 = 1_t, then, as before, the h_r^T Z, r = 2, …, t, are contrasts. The procedure from Section 2 gives Σ_{r=2}^{t} (h_r^T Z)² = Σ_i Z_i²/p_i, and we also have that this is nSSF.
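The identity Σ_i Z_i²/p_i = nSSF is easy to check numerically for an unbalanced layout. A sketch with simulated data (group sizes and means are arbitrary choices of ours):

```python
import numpy as np

rng = np.random.default_rng(2)
# Unbalanced one-factor layout: group sizes n_i, shifted group means.
n_i = np.array([3, 5, 2])
y = [rng.standard_normal(k) + mu for k, mu in zip(n_i, [0.0, 1.0, 2.0])]
n = n_i.sum()

ybar_i = np.array([g.mean() for g in y])
ybar = np.concatenate(y).mean()
SSF = np.sum(n_i * (ybar_i - ybar) ** 2)   # factor sum of squares
Z = n_i * (ybar_i - ybar)                  # centered: Z sums to zero
p = n_i / n

# Since sum_i Z_i = 0, the quadratic form collapses to sum_i Z_i^2 / p_i,
# which equals n * SSF.
print(np.isclose(np.sum(Z ** 2 / p), n * SSF))
```

Any set of functions orthonormal with weight {p_i} then splits nSSF into t − 1 squared contrasts.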
A useful approach is to construct orthonormal functions from orthogonal matrices. Suppose p_i > 0 for i = 1, …, t, that V is an orthogonal t × t matrix whose first row is (√p_1, …, √p_t), and that H = V D^{−0.5}, where D = diag(p_1, …, p_t) and h_r^T is the rth row of H. The orthogonality of V gives I_t = V V^T = H D H^T, so that h_r^T D h_s = δ_rs: the {h_r} are orthonormal with weight function (p_1, …, p_t)^T. Moreover the first row of H is h_1^T = (√p_1, …, √p_t) D^{−0.5} = 1_t^T. The construction of Section 2 now applies, with the h_r^T Z the orthonormal contrasts. One option in statistical analysis is to use the Helmert matrices (Lancaster, 1965) to construct Helmert contrasts. For positive p_i write P_k = p_1 + … + p_k, k = 1, …, t, so that P_t = 1. Then the Helmert matrix V has first row (√p_1, …, √p_t) and, for r = 2, …, t, rth row with jth element √(p_j p_r/(P_{r−1} P_r)) for j < r, −√(P_{r−1}/P_r) for j = r, and zero for j > r.
As above put H = V D^{−0.5}. Then h_1j = 1 for j = 1, …, t. For r = 2, …, t the h_r are orthonormal functions that lead to contrasts comparing the first r of the Z_i; writing h_r = (m_{r,1}, …, m_{r,t})^T we have m_{r,j} = 0 for j > r > 1.
Thus for r = 2, …, t the rth Helmert contrast is h_r^T Z. In the one factor ANOVA, since SSF = Σ_{r=2}^{t} (h_r^T Z)²/n, the contrasts decompose the treatment sum of squares. The rth contrast test statistic, r = 2, …, t, is {(h_r^T Z)²/n}/{SSE/(n − t)} and has the F_{1,n−t} distribution. For many datasets the Helmert contrasts may not be relevant. Usually the context determines which contrasts are of interest, and this leads to a set of linearly independent vectors to which the Gram–Schmidt orthogonalization process can be applied to construct the orthonormal vectors, starting with (√p_1, …, √p_t)^T.
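The weighted Helmert construction can be checked numerically. The sketch below builds V with P_k = p_1 + … + p_k and verifies that V is orthogonal and that H = V D^{−0.5} has first row 1_t; the weights mimic a supplemented layout with eight controls and four treatments of five observations each (our choice, for illustration).

```python
import numpy as np

def helmert_matrix(p):
    """Weighted Helmert matrix V: first row sqrt(p); for r >= 2 the rth
    row compares the first r - 1 cells with the rth and is zero after."""
    p = np.asarray(p, dtype=float)
    t = len(p)
    P = np.cumsum(p)                           # partial sums P_k
    V = np.zeros((t, t))
    V[0] = np.sqrt(p)
    for r in range(1, t):                      # rows 2..t (0-indexed r)
        V[r, :r] = np.sqrt(p[:r] * p[r] / (P[r - 1] * P[r]))
        V[r, r] = -np.sqrt(P[r - 1] / P[r])
    return V

p = np.array([8, 5, 5, 5, 5]) / 28.0           # weights p_i = n_i / n
V = helmert_matrix(p)
print(np.allclose(V @ V.T, np.eye(5)))         # V is orthogonal
H = V @ np.diag(1 / np.sqrt(p))
print(np.allclose(H[0], 1.0))                  # first row of H is all ones
```

The remaining rows of H supply the Helmert contrast coefficients h_r^T applied to Z.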

The ordered effect of alcohol on anxiety example
Five groups of 50-year-old adults were administered between 0 and 4 oz of pure alcohol per day over a 1-month period. At the end of the experiment, their anxiety scores were measured with a well-known anxiety scale. At the time of writing, the data, given here in Table 3, were available from https://www.marekrychlik.com/sites/default/files/05_contrasts1.pdf.
An ordered analysis using orthogonal polynomials gives χ² p-values (with permutation test p-values in parentheses) for the contrasts corresponding to r = 1, 2, 3 and 4 of .0015 (.0000), .0749 (.1901), .3963 (.3200), and .1609 (.2398), respectively, with overall .0032 (.0004). The mean anxiety scores for the different alcohol levels are 120.75, 103.8, 100.0, 82.6, and 95.8. There is a strong linear (downward) effect, with an upturn for the final treatment inducing a weak quadratic effect. The significant overall effect is barely diluted by the nonlinear effects.
When the ordering of the treatments is ignored, the Helmert contrast χ² p-values (with permutation test p-values in parentheses) were .0015 (.0154), .0749 (.0284), .3962 (.0005), and .1609 (.3132), with overall .0032 (.0007). The first Helmert contrast compares the first two treatments, the second compares the first two treatments with the third, and so on. Overall there is a significant difference in means, with the second, third and fourth means significantly different from their predecessors. Only the mean of the fifth treatment is not different from the means of its predecessors.
Here the agreement between the χ² and permutation test p-values is not as good as for the supplemented balanced design. That being the case, more emphasis should be given to the permutation test p-values. Of course, how good the agreement between χ² and permutation test p-values is will depend on several factors, such as the sample size, the design and, as the last two examples here suggest, how unbalanced the design is.
APPENDIX

We now show that particular cases of the construction yield the Kruskal–Wallis test statistic, even when the design is not balanced, and the Friedman, Durbin and RL test statistics.

A.1. Completely randomized design
The Kruskal–Wallis test statistic, adjusted for ties, is KW_A = (n − 1) [Σ_i {R_i − n_i(n + 1)/2}²/n_i] / [Σ_{i,j} {r_ij − (n + 1)/2}²]. Since ranking is overall, preserving the rank sum however ties occur, Σ_{i,j} r_ij = 1 + … + n = n(n + 1)/2 = T. Thus E[R_i] = (n_i/n) n(n + 1)/2 = n_i(n + 1)/2 and, in the construction, as var = Σ_{i,j} r_ij²/n − {(n + 1)/2}², G_O = Σ_i {R_i − n_i(n + 1)/2}²/(n_i var) = n KW_A/(n − 1). As previously mentioned, multiplying orthonormal contrasts by the same constant results in another set of orthogonal contrasts. Since the h_u^T Z, u = 1, …, t − 1, are orthonormal contrasts whose sum of squares is G_O, the h_u^T Z √((n − 1)/n), u = 1, …, t − 1, are orthogonal contrasts whose sum of squares is KW_A.
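The scaling KW_A = (n − 1) G_O/n can be checked numerically. A sketch using tie-free toy data of ours and scipy's implementation of the Kruskal–Wallis statistic:

```python
import numpy as np
from scipy import stats

# Tie-free toy data for three unequal groups.
a = [1.3, 2.7, 4.1, 6.0]
b = [3.5, 5.2, 8.8]
c = [0.4, 7.6]
pooled = np.concatenate([a, b, c])
n = len(pooled)
ranks = stats.rankdata(pooled)
sizes = [len(a), len(b), len(c)]
splits = np.split(ranks, np.cumsum(sizes)[:-1])   # per-group ranks

var = np.mean(ranks ** 2) - ((n + 1) / 2) ** 2    # variance of the ranks
G_O = sum(
    (R.sum() - k * (n + 1) / 2) ** 2 / (k * var)
    for R, k in zip(splits, sizes)
)
KW = (n - 1) / n * G_O                            # the scaling above
print(np.isclose(KW, stats.kruskal(a, b, c).statistic))
```

With no ties the adjusted statistic reduces to the familiar 12/{n(n + 1)} Σ_i R_i²/n_i − 3(n + 1) form, which is what scipy computes.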

A.2. Balanced incomplete block design
The adjusted Durbin statistic is given by D_A = (t − 1) [Σ_i {R_i − r(k + 1)/2}²] / [Σ_{i,j} {r_ij − (k + 1)/2}²], where each of the t treatments is observed r times, each block contains k observations, and R_i is the within-block rank sum for treatment i.

TABLE 1 Unused red-light time in minutes.
TABLE 2 Growth of strawberry plants after applying pesticides.
TABLE 3 Anxiety scores for different alcohol consumptions.