All about the ⊥ with its applications in the linear statistical models

Abstract. For an $n \times m$ real matrix $A$, the matrix $A^\perp$ is defined as a matrix spanning the orthocomplement of the column space of $A$, when the orthogonality is defined with respect to the standard inner product $\langle x, y\rangle = x'y$. In this paper we collect together various properties of the $\perp$ operation and its applications in linear statistical models. Results covering the more general inner products are also considered. We also provide a rather extensive list of references.


Introduction
Consider a columnwise partitioned matrix $A = (a_1 : \dots : a_m) \in \mathbb{R}^{n \times m}$ (the set of $n \times m$ matrices with real elements). Then the column space of $A$ is defined as $\mathcal{C}(A) = \{\, z \in \mathbb{R}^n : z = At = a_1t_1 + \dots + a_mt_m \text{ for some } t \in \mathbb{R}^m \,\}$. The notation $\mathcal{C}(A)^\perp$ refers to the orthocomplement of $\mathcal{C}(A)$, i.e., the set of vectors which are orthogonal (with respect to the standard inner product) to every vector of $\mathcal{C}(A)$: $\mathcal{C}(A)^\perp = \{\, u \in \mathbb{R}^n : u'At = 0 \text{ for all } t \in \mathbb{R}^m \,\}$. Now $A^\perp$ is defined as a matrix whose column space is $\mathcal{C}(A^\perp) = \mathcal{C}(A)^\perp = \mathcal{N}(A')$. In view of the decomposition $\mathbb{R}^n = \mathcal{C}(A) \oplus \mathcal{C}(A)^\perp$, where $\oplus$ refers to the direct sum, the rank of $A^\perp$ is $\operatorname{rank}(A^\perp) = n - \operatorname{rank}(A)$. The set of all matrices $A^\perp$ is denoted $\{A^\perp\}$, and hence
$$Z \in \{A^\perp\} \iff \text{(a) } A'Z = 0 \ \text{ and } \ \text{(b) } \operatorname{rank}(Z) = n - \operatorname{rank}(A). \tag{1}$$
We immediately observe that $Z \in \{A^\perp\} \iff A \in \{Z^\perp\}$. Trivially, $A^\perp$ is unique only when $A$ is a nonsingular square matrix, in which case $A^\perp = 0$. Notice that $A \in \mathbb{R}^{n \times m} \implies A^\perp \in \mathbb{R}^{n \times s}$, where $s \ge n - \operatorname{rank}(A)$. In this paper our purpose is to review various features of the $\perp$ operation, the "perp operation", say, and in particular to present several useful applications related to linear statistical models. Results covering more general inner products are also considered. We believe that our review provides a useful summary of the $\perp$ operation and thereby increases insight into, and appreciation of, this seemingly simple operation.
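As a small numerical illustration (a sketch only, using NumPy and SciPy on a randomly generated matrix that is not part of the paper's development), one choice of $A^\perp$ is an orthonormal basis of $\mathcal{N}(A')$, and the two conditions in (1) can be checked directly:

```python
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3))      # an example 6 x 3 matrix of full column rank

# One choice of A-perp: an orthonormal basis of N(A') = C(A)^perp.
Z = null_space(A.T)

print(np.allclose(A.T @ Z, 0))                                            # (a) A'Z = 0
print(np.linalg.matrix_rank(Z) == A.shape[0] - np.linalg.matrix_rank(A))  # (b)
```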

$A^\perp$ in terms of generalised inverses
The generalised inverses offer a very handy tool for explicit expressions of $A^\perp$, and in this section we give a short tour of such possibilities. A matrix $G \in \mathbb{R}^{m \times n}$ is a generalised inverse of $A \in \mathbb{R}^{n \times m}$ if it satisfies (mp1) $AGA = A$, and it is the Moore-Penrose inverse, denoted $A^+$, if it also satisfies the following three conditions: (mp2) $GAG = G$; (mp3) $(AG)' = AG$; (mp4) $(GA)' = GA$. If $G$ satisfies the condition $AGA = A$, we may denote $G = A^-$, or $G \in \{A^-\}$. As excellent references for the generalised inverses, see Ben-Israel & Greville [10] and Rao & Mitra [46]. In particular, for more about the Moore (of Moore & Penrose), see Ben-Israel [11]. It is well known that the nullspace $\mathcal{N}(A)$ can be expressed as $\mathcal{N}(A) = \mathcal{C}(I_m - A^-A)$, where $A^-$ can be any generalized inverse of $A$. Hence we can express $\mathcal{C}(A)^\perp$ in terms of $A^-$:
$$\mathcal{C}(A)^\perp = \mathcal{N}(A') = \mathcal{C}\big(I_n - (A')^-A'\big). \tag{2}$$
The last equality above follows from the fact
$$\{(A^-)'\} = \{(A')^-\}. \tag{3}$$
Notice that it is a bit questionable to write $(A^-)' = (A')^-$, because (3) means an equality between two sets. However, for the (unique) Moore-Penrose inverse we always have $(A^+)' = (A')^+$. In light of (2), we have, for example, the following choice for $A^\perp$ (recalling that $A \in \mathbb{R}^{n \times m}$):
$$I_n - A(A'A)^-A' \in \{A^\perp\}, \tag{4}$$
where we have used the fact $A(A'A)^- \in \{(A')^-\}$. By replacing $(A'A)^-$ with $(A'A)^+$ in (4), and using $A(A'A)^+A' = AA^+$, we get
$$I_n - AA^+ := I_n - P_A := Q_A \in \{A^\perp\}. \tag{5}$$
It can be shown that if $G$ satisfies the conditions (mp1) and (mp3), i.e., $G \in \{A^{13}\}$, then $AG$ is unique, and thereby $AA^{13} = AA^+$; hence $I_n - AA^{13}$ is one choice for $A^\perp$.
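The choice (5) is equally easy to verify numerically; the following sketch (again with a made-up, rank-deficient matrix) confirms that $Q_A = I_n - AA^+$ is a symmetric idempotent member of $\{A^\perp\}$:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 3))  # 5 x 3 with rank 2

P_A = A @ np.linalg.pinv(A)        # orthogonal projector onto C(A), via A^+
Q_A = np.eye(5) - P_A              # Q_A = I_5 - A A^+, cf. (5)

print(np.allclose(A.T @ Q_A, 0))                                   # A'Q_A = 0
print(np.linalg.matrix_rank(Q_A) == 5 - np.linalg.matrix_rank(A))  # rank = n - rank(A)
print(np.allclose(Q_A, Q_A.T) and np.allclose(Q_A, Q_A @ Q_A))     # symmetric, idempotent
```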
The notations $P_A$ and $Q_A$ in (5) refer to the orthogonal projectors onto $\mathcal{C}(A)$ (with respect to the standard inner product) and onto $\mathcal{C}(A)^\perp$, respectively. A matrix $P$ is defined as the orthogonal projector onto $\mathcal{C}(A)$ if it satisfies the conditions $P = P' = P^2$ and $\mathcal{C}(P) = \mathcal{C}(A)$; in that case $P$ is unique, $P = AA^+ = A(A'A)^-A'$, where the invariance of $A(A'A)^-A'$ with respect to the choice of $(A'A)^-$ follows from [46, Lemma 2.2.4], which says that for nonnull $A$ and $C$ the matrix product $AB^-C$ is invariant with respect to the choice of the generalized inverse $B^-$ if and only if $\mathcal{C}(C) \subseteq \mathcal{C}(B)$ and $\mathcal{C}(A') \subseteq \mathcal{C}(B')$. Notice that $AA^-$ is not necessarily an orthogonal projector: it is idempotent and it satisfies $\mathcal{C}(AA^-) = \mathcal{C}(A)$, but it is not necessarily symmetric.
Below is a summary of some of the expressions for $A^\perp$, with obvious extensions to $(A')^\perp$, in terms of generalised inverses.
Obviously the orthogonal projector $Q_A = I_n - AA^+$ is often a convenient choice for $A^\perp$, because it is symmetric and idempotent.

Some specific formulas
Suppose that $Z$ is a choice for $A^\perp$. Then, for a conformable matrix $B$, we have $ZB \in \{A^\perp\}$ whenever $\operatorname{rank}(ZB) = \operatorname{rank}(Z)$. According to Marsaglia & Styan, the rank of a partitioned matrix satisfies $\operatorname{rank}(A : B) = \operatorname{rank}(A) + \operatorname{rank}[(I_n - AA^-)B]$. In particular, choosing $Q_A$ as $A^\perp$ yields
$$\operatorname{rank}(A : B) = \operatorname{rank}(A) + \operatorname{rank}(Q_AB),$$
where $(A : B)$ denotes the partitioned $n \times (m + q)$ matrix.
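The rank rule just quoted can be tested numerically; in this sketch (random matrices, for illustration only) the identity $\operatorname{rank}(A : B) = \operatorname{rank}(A) + \operatorname{rank}(Q_AB)$ is verified directly:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((7, 3))
B = rng.standard_normal((7, 4))

Q_A = np.eye(7) - A @ np.linalg.pinv(A)   # Q_A = I_7 - P_A, one choice of A-perp
rank = np.linalg.matrix_rank

# rank(A : B) = rank(A) + rank(Q_A B)
print(rank(np.hstack([A, B])) == rank(A) + rank(Q_A @ B))
```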
In the next theorem we take a look at the perps of some particular partitioned matrices. To prove (b), we observe that … Thus (b) is confirmed. Part (c) can be proved in the corresponding way.
However, an expression like (8) is obviously problematic, and the meaning of the above notation should be clarified. One interpretation for (8) might be to agree that it means that … (9), in other words, that the corresponding sets of matrices are identical. However, the statement (9) is incorrect, as can be concluded from Theorem 3.5 below.
Let us ask the following: which matrices $B \in \mathbb{R}^{n \times p}$ and $D \in \mathbb{R}^{q \times p}$ satisfy the following: … On the other hand, because …, we immediately obtain the following: …

Two rank formulas and a decomposition of orthogonal projector
Two particular rank formulas in terms of the orthocomplement are worth special praise owing to their numerous applications, particularly when dealing with linear statistical models: the rank of the product $A_{n \times a}B_{a \times m}$ and the rank of the partitioned matrix $(A_{n \times a} : B_{n \times b})$.
Theorem 4.1. The rank of the partitioned matrix $(A_{n \times a} : B_{n \times b})$ can be expressed as
$$\operatorname{rank}(A : B) = \operatorname{rank}(A) + \operatorname{rank}[(A^\perp)'B], \tag{10}$$
and the rank of the matrix product $A_{n \times a}B_{a \times m}$ is
$$\operatorname{rank}(AB) = \operatorname{rank}(B) - \dim \mathcal{C}(B) \cap \mathcal{C}[(A')^\perp]. \tag{11}$$
In terms of an arbitrary generalized inverse $A^-$, (10) can be expressed as
$$\operatorname{rank}(A : B) = \operatorname{rank}(A) + \operatorname{rank}[(I_n - AA^-)B]. \tag{12}$$

As a reference to (10) and (12), [8] provides several expressions for the ranks of a product of two matrices and of a column-wise partitioned matrix, as well as an extensive list of related references. Several applications of (10) and (11) appear in Puntanen, Styan & Isotalo [39, Ch. 5]. One example concerns the decomposition of the column space $\mathcal{C}(X : VX^\perp)$, where $X \in \mathbb{R}^{n \times p}$ and $V$ is an $n \times n$ (symmetric) nonnegative definite matrix. Such a situation occurs when we consider the general linear model
$$y = X\beta + \varepsilon, \quad \text{denoted as } \mathcal{M} = \{y, X\beta, V\}, \tag{13}$$
where $X$ is a known $n \times p$ model matrix, the vector $y$ is an observable $n$-dimensional random vector, $\beta$ is a $p \times 1$ vector of unknown parameters, and $\varepsilon$ is an unobservable vector of random errors with expectation $\mathrm{E}(\varepsilon) = 0$ and covariance matrix $\operatorname{cov}(\varepsilon) = V$. Then we have the following (see, e.g., Rao):

Theorem 4.2. Consider the linear model $\mathcal{M} = \{y, X\beta, V\}$ defined in (13). Then
$$\mathcal{C}(X : V) = \mathcal{C}(X : VX^\perp) = \mathcal{C}(X) \oplus \mathcal{C}(VX^\perp).$$
Moreover, if the model is correct, in which case it is called consistent, then the observed (realized) value of the random vector $y$ satisfies
$$y \in \mathcal{C}(X : V) = \mathcal{C}(X : VX^\perp). \tag{14}$$

For a discussion concerning the consistency concept, see, e.g., Puntanen & Styan [38], J.K. Baksalary, Rao & Markiewicz [5], Groß [16, p. 314], and Tian et al. [53]. In this paper, we assume that the corresponding consistency holds whatever model we have. When working with linear models, we often need to consider the orthogonal projector onto the column space of a partitioned matrix. Then the following theorem appears to be very convenient in various connections; see, e.g., Puntanen, Styan & Isotalo [39].

Theorem 4.3. The orthogonal projector (with respect to the standard inner product) onto the column space $\mathcal{C}(A_{n \times a} : B_{n \times b})$ can be decomposed as
$$P_{(A:B)} = P_A + P_{(I_n - P_A)B} = P_A + Q_AB(B'Q_AB)^-B'Q_A. \tag{15}$$
In particular, if $X \in \mathbb{R}^{n \times p}$ and $V_{n \times n}$ is nonnegative definite, then
$$P_{(X:V)} = P_X + P_{MV}, \tag{16}$$
where $M = I_n - P_X$. Notice also that according to Theorem 4.3 we have $P_{(X:V)} = P_X + P_{MV}$ and thereby $\mathcal{C}(X : V) = \mathcal{C}(X) \oplus \mathcal{C}(MV)$.
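Both Theorem 4.2 and the decomposition (16) can be checked numerically. The sketch below (a random $X$ and a deliberately singular nonnegative definite $V$; the helper `proj` is introduced only for this illustration) compares the relevant orthogonal projectors:

```python
import numpy as np

def proj(A):
    """Orthogonal projector onto C(A), standard inner product."""
    return A @ np.linalg.pinv(A)

rng = np.random.default_rng(3)
n = 6
X = rng.standard_normal((n, 2))
L = rng.standard_normal((n, 4))
V = L @ L.T                              # singular nonnegative definite, rank 4

M = np.eye(n) - proj(X)                  # M = I_n - P_X, one choice of X-perp

# (16): P_(X:V) = P_X + P_MV
print(np.allclose(proj(np.hstack([X, V])), proj(X) + proj(M @ V)))
# Theorem 4.2: C(X : V) = C(X : VX-perp), checked via the projectors
print(np.allclose(proj(np.hstack([X, V])), proj(np.hstack([X, V @ M]))))
```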
Orthocomplement when the inner product matrix is V
V is positive definite
Consider now the inner product in $\mathbb{R}^n$ defined as $\langle x, y\rangle_V = x'Vy$, where $V$ is a positive definite symmetric matrix.
The orthocomplement of $\mathcal{C}(A_{n \times m})$ with respect to this inner product is
$$\mathcal{C}(A)^{\perp_V} = \{\, u \in \mathbb{R}^n : u'VAt = 0 \text{ for all } t \in \mathbb{R}^m \,\} = \mathcal{N}(A'V).$$
By $A^{\perp_V}$ we will denote any matrix whose column space is $\mathcal{C}(A)^{\perp_V}$. Recall that $A^{\perp_I}$ is shortly denoted as $A^\perp$. We have
$$\mathcal{C}(A)^{\perp_V} = \mathcal{C}[(VA)^\perp] = \mathcal{C}(V^{-1}A^\perp),$$
where the last equality can be concluded from $A'V \cdot V^{-1}A^\perp = A'A^\perp = 0$ together with $\operatorname{rank}(V^{-1}A^\perp) = n - \operatorname{rank}(A)$. Notice that, corresponding to (1),
$$Z \in \{A^{\perp_V}\} \iff \text{(a) } A'VZ = 0 \ \text{ and } \ \text{(b) } \operatorname{rank}(Z) = n - \operatorname{rank}(A).$$
One may ask whether $\mathcal{C}(A)$ and $\mathcal{C}(A)^{\perp_V}$ still yield a direct-sum decomposition of $\mathbb{R}^n$; it is easy to confirm that the answer is positive.
Now we have the following decomposition:
$$\mathbb{R}^n = \mathcal{C}(A) \oplus \mathcal{C}(A)^{\perp_V},$$
and hence every $y \in \mathbb{R}^n$ has a unique representation as a sum $y = Ab + V^{-1}A^\perp c$ for some $b$ and $c$. The vector $y_* = Ab$ is the orthogonal projection of $y$ onto $\mathcal{C}(A)$ along $\mathcal{C}(A)^{\perp_V}$. The orthogonal projector $P_{A;V}$ is the matrix which transforms $y$ into its projection $y_*$, i.e., $P_{A;V}y = y_* = Ab$. Its explicit unique representation is
$$P_{A;V} = A(A'VA)^-A'V.$$
We may mention that part (a) of Theorem 3.2 holds even if the inner product matrix is $V$; similarly, Theorems 3.3 and 3.5 hold also when all orthocomplements are taken with respect to the inner product matrix $V$.
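A numerical sketch of $P_{A;V}$ (a random full-column-rank $A$ and a positive definite $V$, so that an ordinary linear solve can stand in for the generalized inverse) illustrates the idempotency and the $V$-orthogonality of the resulting decomposition:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5
A = rng.standard_normal((n, 2))
L = rng.standard_normal((n, n))
V = L @ L.T + np.eye(n)                  # positive definite inner-product matrix

# P_{A;V} = A(A'VA)^- A'V; here A has full column rank, so a plain solve suffices
P = A @ np.linalg.solve(A.T @ V @ A, A.T @ V)

y = rng.standard_normal(n)
print(np.allclose(P @ P, P))                       # idempotent (not symmetric in general)
print(np.allclose(P @ A, A))                       # P fixes C(A)
print(np.isclose((P @ y) @ V @ (y - P @ y), 0))    # the two components are V-orthogonal
```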

V is nonnegative definite, possibly singular
Let $V$ be a singular nonnegative definite matrix. Then $\langle t, u\rangle_V = t'Vu$ is a semi-inner product and the corresponding seminorm (squared) is $\|t\|_V^2 = t'Vt$. For a singular nonnegative definite matrix $V$ we can define the matrix $A^{\perp_V}$ again as any matrix spanning $\mathcal{C}(A)^{\perp_V}$, and so
$$\mathcal{C}(A^{\perp_V}) = \{\, u \in \mathbb{R}^n : A'Vu = 0 \,\} = \mathcal{N}(A'V),$$
which is now a subspace of dimension $n - \operatorname{rank}(VA)$.

Some further considerations
Consider the linear model $\mathcal{M} = \{y, X\beta, V\}$, defined as in (13), and let $V$ be positive definite. Then we have observed that the following sets are identical: … For (a), …, (f) above, see also Puntanen, Styan & Isotalo [39, §5.13]. When $V$ is singular, the above considerations become more complicated. A very convenient tool appears to be the following class of matrices:
$$\mathcal{W} = \{\, W \in \mathbb{R}^{n \times n} : W = V + XUX', \ \mathcal{C}(W) = \mathcal{C}(X : V) \,\}. \tag{17}$$
In (17), $U$ can be any $p \times p$ matrix as long as $\mathcal{C}(W) = \mathcal{C}(X : V)$ is satisfied. Of course, $U$ can be chosen as $0$ if $\mathcal{C}(X) \subseteq \mathcal{C}(V)$, which happens, for example, when $V$ is positive definite. The set $\mathcal{W}$ of matrices has an important role in the theory of linear models. Among the useful equivalent statements concerning $W \in \mathcal{W}$, collected in (18), is the following:
$$X(X'W^-X)^-X'W^-X = X \quad \text{for any choices of the generalized inverses involved.} \tag{18}$$
Moreover, each of these statements is equivalent to $\mathcal{C}(X) \subseteq \mathcal{C}(W')$, and hence to the statements (18b')-(18e') obtained from (18b)-(18e) by setting $W'$ in place of $W$. Related to (18), the column space $\mathcal{C}(VX^\perp)$ can be expressed as
$$\mathcal{C}(VX^\perp) = \mathcal{C}\big[(W^-)'X : I_n - (W^-)'W'\big]^\perp,$$
where $W^-$ is an arbitrary (but fixed) generalized inverse of $W$. Moreover, let $V$ be possibly singular and assume that $\mathcal{C}(X) \subseteq \mathcal{C}(V)$. Then
$$\mathcal{C}(V^+X^\perp) \subseteq \mathcal{C}(X)^{\perp_V},$$
where the inclusion becomes equality if and only if $V$ is positive definite.
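The class (17) and condition (18) are easy to illustrate; in the sketch below $U = I_p$ is one admissible (made-up) choice, and Moore-Penrose inverses stand in for the generalized inverses:

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 6, 2
X = rng.standard_normal((n, p))
L = rng.standard_normal((n, 3))
V = L @ L.T                                  # singular nonnegative definite

W = V + X @ X.T                              # W = V + XUX' with U = I_p
rank = np.linalg.matrix_rank
print(rank(W) == rank(np.hstack([X, V])))    # C(W) = C(X : V), so W belongs to the class

# condition (18), with Moore-Penrose inverses as the g-inverses
Wm = np.linalg.pinv(W)
print(np.allclose(X @ np.linalg.pinv(X.T @ Wm @ X) @ X.T @ Wm @ X, X))
```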
Remark 5.3. It is of interest to note that the perp symbol $\perp$ drops down, so to say, very "nicely" when $V$ is positive definite:
$$\mathcal{C}(X)^{\perp_V} = \mathcal{C}(V^{-1}X^\perp);$$
but when $V$ is singular we have to use a much more complicated rule, involving a matrix $W \in \mathcal{W}$, to drop down the $\perp$ symbol.
Remark 5.4. Let us next prove the following: if $W \in \mathcal{W}$, where $\mathcal{W}$ is defined as in (17), then … We first observe that …

Next we shortly consider a typical $n \times p$ model matrix $X$ partitioned as $X = (1 : x_1 : \dots : x_k) = (1 : X_0)$, and so $p = k + 1$. The sample covariance matrix of the $x$-variables is $S_{xx} = \frac{1}{n-1}X_0'CX_0$, where $C = I_n - P_1 = I_n - \frac{1}{n}11'$ is the centering matrix, and the sample correlation matrix is $R_{xx} = [\operatorname{diag}(S_{xx})]^{-1/2}S_{xx}[\operatorname{diag}(S_{xx})]^{-1/2}$. While calculating the correlations, we assume that all $x$-variables have nonzero variances, that is, the matrix $\operatorname{diag}(S_{xx})$ is positive definite, or in other words, $x_i \notin \mathcal{C}(1)$, $i = 1, \dots, k$. Theorem 4.1 then implies the following result:

Theorem 6.2. The rank of the model matrix $X = (1 : X_0)$ can be expressed as
$$\operatorname{rank}(X) = 1 + \operatorname{rank}(CX_0) = 1 + \operatorname{rank}(S_{xx}).$$
If all $x$-variables have nonzero variances, i.e., the correlation matrix $R_{xx}$ is properly defined, then $\operatorname{rank}(R_{xx}) = \operatorname{rank}(S_{xx})$. Moreover, the following statements are equivalent: …

For the rank of the sample covariance matrix, see Trenkler [56]. As regards the geometry and linear models, the reader may take a look at Margolis [31], Herr [22], and Seber [49].
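The following sketch illustrates Theorem 6.2 on simulated data, with a linearly dependent column appended on purpose so that $S_{xx}$ becomes singular:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 10
X0 = rng.standard_normal((n, 3))
X0 = np.hstack([X0, X0[:, [0]] + X0[:, [1]]])    # append a linearly dependent column

one = np.ones((n, 1))
C = np.eye(n) - one @ one.T / n                  # centering matrix C = I_n - P_1
S_xx = X0.T @ C @ X0 / (n - 1)                   # sample covariance matrix
d = np.sqrt(np.diag(S_xx))
R_xx = S_xx / np.outer(d, d)                     # sample correlation matrix

rank = np.linalg.matrix_rank
X = np.hstack([one, X0])                         # model matrix (1 : X0)
print(rank(X) == 1 + rank(C @ X0))               # Theorem 6.2
print(rank(R_xx) == rank(S_xx) == rank(C @ X0))  # here all three equal 3
```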

Estimability in a simple ANOVA
Consider the one-way ANOVA model $\mathcal{A} = \{y, X\beta, \sigma^2I_n\}$, in which $\beta = (\mu, \alpha_1, \dots, \alpha_g)'$ and the model matrix is $X = (1_n : T)$, where $T \in \mathbb{R}^{n \times g}$ comprises the group-membership indicator columns corresponding to the group sizes $n_1, \dots, n_g$, and $n = n_1 + \dots + n_g$. As the rank of the $n \times (g + 1)$ model matrix $X$ is $g$, we know that $\beta$ is not estimable under $\mathcal{A}$. Which parametric functions of $\beta$ are estimable?
We recall that $K'\beta$ is estimable if it has an unbiased linear estimator, say $Ay$, with the property $\mathrm{E}(Ay) = AX\beta = K'\beta$ for all $\beta \in \mathbb{R}^p$, i.e., $AX = K'$. Hence the parametric function $k'\beta$ is estimable under $\mathcal{A}$ if and only if
$$k \in \mathcal{C}(X'). \tag{20}$$
In view of part (c) of Theorem 3.2, one choice for $(X')^\perp$ is the vector $(1, -1_g')'$. Hence, according to (20), the parametric function $k'\beta = k_0\mu + \ell'\alpha$, with $k = (k_0, \ell')'$, is estimable if and only if $k'(1, -1_g')' = 0$, i.e.,
$$k_0 = \ell'1_g. \tag{21}$$
We can also study the estimability of a parametric function of $\alpha_1, \dots, \alpha_g$ (dropping off the parameter $\mu$); denote this function as $\ell'\alpha$. Then $k_0 = 0$, and on account of (21), the estimability condition for $\ell'\alpha$ becomes $\ell'1_g = 0$; that is, $\ell'\alpha$ must be a contrast.
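Estimability is a purely column-space condition, so (20) and (21) can be checked mechanically. In the sketch below the group sizes, and the helper `estimable`, are only illustrative:

```python
import numpy as np

# One-way ANOVA with g = 3 groups of (made-up) sizes 2, 3, 2.
sizes = [2, 3, 2]
n, g = sum(sizes), len(sizes)
T = np.zeros((n, g))
row = 0
for i, ni in enumerate(sizes):
    T[row:row + ni, i] = 1.0                 # indicator column of group i
    row += ni
X = np.hstack([np.ones((n, 1)), T])          # model matrix X = (1 : T), n x (g+1)

def estimable(k, X):
    """k'beta is estimable iff k lies in C(X'), cf. (20)."""
    P = X.T @ np.linalg.pinv(X.T)            # orthogonal projector onto C(X')
    return bool(np.allclose(P @ k, k))

print(estimable(np.array([1.0, 1, 0, 0]), X))   # mu + alpha_1: estimable
print(estimable(np.array([0.0, 1, -1, 0]), X))  # contrast alpha_1 - alpha_2: estimable
print(estimable(np.array([0.0, 1, 0, 0]), X))   # alpha_1 alone: not estimable
```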

Best linear unbiased estimator, BLUE
An unbiased linear estimator $Gy$ of $X\beta$ is defined to be the best linear unbiased estimator, BLUE, of $X\beta$ under the model $\mathcal{M} = \{y, X\beta, V\}$ if
$$\operatorname{cov}(Gy) \le_L \operatorname{cov}(Ly) \quad \text{for all } L \text{ such that } LX = X,$$
where "$\le_L$" refers to the Löwner partial ordering. In other words, $Gy$ has the smallest covariance matrix, in the Löwner sense, among all linear unbiased estimators. The following theorem gives the "fundamental BLUE equation"; see, e.g., Rao [40], Zyskind [58], J.K. Baksalary [1], and O.M. Baksalary & Trenkler [6,7].

Theorem 6.3. The estimator $Gy$ is the BLUE for $X\beta$ under $\mathcal{M} = \{y, X\beta, V\}$ if and only if
$$G(X : VX^\perp) = (X : 0). \tag{22}$$
Notice also that even though $G$ in (22) may not be unique, the numerical observed value of $Gy$ is unique (with probability 1) once the random vector $y$ has obtained its value in the space $\mathcal{C}(X : VX^\perp)$. The set of matrices $G$ satisfying (22) is sometimes denoted as $\{P_{X|VX^\perp}\}$.
Remark 6.4. At this point we may take the liberty of making a short side trip to the notation $P_{A|B}$, in the spirit of Rao [45] and Kala [25]. Supposing that $\mathcal{C}(A)$ and $\mathcal{C}(B)$ are (virtually) disjoint, then every $y \in \mathcal{C}(A : B)$ has a unique representation as a sum $y = y_A + y_B$, where $y_A \in \mathcal{C}(A)$, $y_B \in \mathcal{C}(B)$. A matrix $P$ which transforms every $y \in \mathcal{C}(A : B)$ into its projection $y_A$ is called a projector onto $\mathcal{C}(A)$ along $\mathcal{C}(B)$. It appears that the projector $P := P_{A|B}$ onto $\mathcal{C}(A)$ along $\mathcal{C}(B)$ may be defined by the equation
$$P_{A|B}(A : B) = (A : 0).$$
Moreover, Rao [45] showed that
$$(P_{VA|A^\perp} + P_{A^\perp|VA})z = z, \quad (P_{VA^\perp|A} + P_{A|VA^\perp})y = y, \quad P_{A|VA^\perp}y = (I_n - P'_{A^\perp;V})y.$$
We shall use the short notation $H = P_X$, $M = I_n - H$, and thereby the ordinary least squares estimator (OLSE) of $X\beta$ is $Hy$; we will denote $Hy = X\hat\beta$, where $\hat\beta$ is any solution to the normal equation $X'X\beta = X'y$. If $X$ has full column rank, then $\beta$ is estimable and its OLSE is $\hat\beta = (X'X)^{-1}X'y = X^+y$.
Characterizing the equality of the OLSE and the BLUE of $X\beta$ has received a lot of attention in the statistical literature, the major breakthroughs being made by Rao [40], Zyskind [58], and Kruskal [27]; for a review, see Puntanen & Styan [37], and for some special remarks, Markiewicz, Puntanen & Styan [33], and O.M. Baksalary, Trenkler & Liski [9]. Theorem 6.3 immediately gives several equivalent characterizations for the OLSE and the BLUE to be equal; some of them are collected in Theorem 6.5. Notice that the equality between the OLSE and the BLUE then occurs with probability 1, but in what follows we drop the phrase "with probability 1".

Theorem 6.6. The general solution for $G$ satisfying $G(X : VX^\perp) = (X : 0)$ can be expressed, for example, as
$$G = X(X'W^-X)^-X'W^- + F_1Q_W$$
(and in further analogous forms involving $F_2, \dots, F_4$), where $F_1, \dots, F_4$ are arbitrary matrices, $Q_W = I_n - P_W$, and $W \in \mathcal{W}$, with $\mathcal{W}$ defined as in (17).
In view of the consistency condition (14), we have $y \in \mathcal{C}(W)$, and hence the terms $F_iQ_Wy$ disappear with probability 1. We observe, for example, that
$$\mathrm{BLUE}(X\beta) = X\tilde\beta = X(X'W^-X)^-X'W^-y. \tag{23}$$
When $X$ has full column rank and $V$ is positive definite, then $\hat\beta = (X'X)^{-1}X'y$ and $\tilde\beta = (X'V^{-1}X)^{-1}X'V^{-1}y$, while the corresponding covariance matrices are
$$\operatorname{cov}(\hat\beta) = (X'X)^{-1}X'VX(X'X)^{-1}, \quad \operatorname{cov}(\tilde\beta) = (X'V^{-1}X)^{-1}. \tag{24}$$
On the other hand, in light of (23) we have
$$\operatorname{cov}(X\tilde\beta) = X(X'W^-X)^-X' - XUX'. \tag{25}$$
It is interesting to note that in (25) the covariance matrix $V$ need not be positive definite. If $V$ is positive definite, then combining (24) and (25) yields the following:
$$(X'V^{-1}X)^{-1} \le_L (X'X)^{-1}X'VX(X'X)^{-1}.$$
The matrix $M(MVM)^-M$ is very handy in many connections related to the linear model $\mathcal{M} = \{y, X\beta, V\}$. For example, the ordinary, unweighted sum of squares of errors is $\mathrm{SSE} = y'My$, while the weighted $\mathrm{SSE}$ is (when $V$ is positive definite) $\min_\beta\,(y - X\beta)'V^{-1}(y - X\beta)$. In the general case, the weighted $\mathrm{SSE}$ can be defined as $(y - X\tilde\beta)'W^-(y - X\tilde\beta)$, where $W = V + XUX'$, with $\mathcal{C}(W) = \mathcal{C}(X : V)$; see, e.g., [54,55] and Hauke, Markiewicz & Puntanen [21].
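The following sketch (a random positive definite $V$, so that $W$ may be taken as $V$ itself) verifies the fundamental BLUE equation (22) and the Löwner comparison of the covariance matrices in (24):

```python
import numpy as np

rng = np.random.default_rng(7)
n, p = 8, 2
X = rng.standard_normal((n, p))
L = rng.standard_normal((n, n))
V = L @ L.T + 0.1 * np.eye(n)                # positive definite covariance matrix

Vi = np.linalg.inv(V)
G = X @ np.linalg.solve(X.T @ Vi @ X, X.T @ Vi)   # BLUE(X beta) = G y

# fundamental BLUE equation (22): G(X : VX-perp) = (X : 0)
M = np.eye(n) - X @ np.linalg.pinv(X)             # a choice of X-perp
print(np.allclose(G @ X, X), np.allclose(G @ V @ M, 0))

# (24): cov(beta-hat) - cov(beta-tilde) is nonnegative definite
cov_ols = np.linalg.inv(X.T @ X) @ X.T @ V @ X @ np.linalg.inv(X.T @ X)
cov_blue = np.linalg.inv(X.T @ Vi @ X)
print(bool(np.all(np.linalg.eigvalsh(cov_ols - cov_blue) > -1e-8)))
```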
… $V_2 = aV_1 + XN_1X' + V_1X^\perp N_2(V_1X^\perp)'$ for some $a \in \mathbb{R}$, and matrices $N_1$ and $N_2$ such that $V_2$ is nonnegative definite.

The reduced model
Let us consider the partitioned linear model $\mathcal{M}_{12} = \{y, X_1\beta_1 + X_2\beta_2, I_n\}$, where $X = (X_1 : X_2)$ has full column rank, $X_1 \in \mathbb{R}^{n \times p_1}$, $X_2 \in \mathbb{R}^{n \times p_2}$, $p = p_1 + p_2$. In light of the projector decomposition (15), we have $H = P_{(X_1 : X_2)} = P_{X_1} + P_{M_1X_2}$, where $M_1 = I_n - P_{X_1}$, and thereby
$$Hy = X_1\hat\beta_1 + X_2\hat\beta_2 = P_{X_1}y + M_1X_2(X_2'M_1X_2)^{-1}X_2'M_1y. \tag{27}$$
Premultiplying (27) by $M_1$ gives
$$M_1X_2\hat\beta_2 = M_1X_2(X_2'M_1X_2)^{-1}X_2'M_1y. \tag{28}$$
In view of (11), $\operatorname{rank}(M_1X_2) = \operatorname{rank}(X_2) = p_2$, and hence the left-most $M_1X_2$ can be cancelled from (28); thus we obtain
$$\hat\beta_2 = (X_2'M_1X_2)^{-1}X_2'M_1y. \tag{29}$$
Premultiplying the model $\mathcal{M}_{12}$ by the orthogonal projector $M_1$ yields the reduced model
$$\mathcal{M}_{12\cdot 1} = \{M_1y, M_1X_2\beta_2, M_1\}.$$
Taking a look at the models, we can immediately make an important conclusion: the OLS estimators of $\beta_2$ under the models $\mathcal{M}_{12}$ and $\mathcal{M}_{12\cdot 1}$ coincide:
$$\hat\beta_2(\mathcal{M}_{12}) = \hat\beta_2(\mathcal{M}_{12\cdot 1}). \tag{30}$$
The equality (30) is, in essence, the well-known Frisch-Waugh-Lovell theorem. Let $X = (X_1 : X_2)$ have full column rank and $\mathcal{C}(X) \subseteq \mathcal{C}(V)$, with $V$ possibly singular. Then it appears that, corresponding to (29), we have
$$\tilde\beta_2 = [X_2'M_1(M_1VM_1)^-M_1X_2]^{-1}X_2'M_1(M_1VM_1)^-M_1y.$$
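The coincidence (30), the Frisch-Waugh-Lovell phenomenon, can be seen numerically as follows (simulated data; `lstsq` computes the OLSE under the full model):

```python
import numpy as np

rng = np.random.default_rng(8)
n, p1, p2 = 12, 2, 2
X1 = rng.standard_normal((n, p1))
X2 = rng.standard_normal((n, p2))
y = rng.standard_normal(n)

beta_full = np.linalg.lstsq(np.hstack([X1, X2]), y, rcond=None)[0]  # OLSE under M_12

M1 = np.eye(n) - X1 @ np.linalg.pinv(X1)                    # M_1 = I_n - P_X1
beta2_red = np.linalg.solve(X2.T @ M1 @ X2, X2.T @ M1 @ y)  # (29), reduced model

print(np.allclose(beta_full[p1:], beta2_red))               # the coincidence (30)
```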

Best linear unbiased predictor, BLUP
Let $y_f$ denote a $q \times 1$ unobservable random vector containing new future observations. The new observations are assumed to follow the linear model $y_f = X_f\beta + \varepsilon_f$, where $X_f$ is a known $q \times p$ matrix, $\beta$ is the same vector of unknown parameters as in $\mathcal{M} = \{y, X\beta, V\}$, and $\varepsilon_f$ is a $q$-dimensional random error vector associated with the new observations. Then
$$\operatorname{cov}\begin{pmatrix} y \\ y_f \end{pmatrix} = \begin{pmatrix} V & V_{12} \\ V_{21} & V_{22} \end{pmatrix}.$$
For brevity, we denote $V_{21} = \operatorname{cov}(y_f, y)$ and $V_{22} = \operatorname{cov}(y_f)$. The linear predictor $By$ is said to be unbiased for $y_f$ if $\mathrm{E}(y_f - By) = 0$ for all $\beta \in \mathbb{R}^p$. This is equivalent to $BX = X_f$. Now a linear unbiased predictor $By$ is the best linear unbiased predictor, BLUP, for $y_f$ if the Löwner ordering
$$\operatorname{cov}(y_f - By) \le_L \operatorname{cov}(y_f - Fy)$$
holds for all $F$ such that $Fy$ is an unbiased linear predictor for $y_f$.
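The BLUP characterization given in Theorem 6.10 below can be verified numerically. The sketch uses the familiar representation $\mathrm{BLUP}(y_f) = X_f\tilde\beta + V_{21}V^{-1}(y - X\tilde\beta)$, which is an assumption of this illustration rather than a formula stated above, together with a made-up joint covariance matrix:

```python
import numpy as np

rng = np.random.default_rng(9)
n, p, q = 9, 2, 3
X = rng.standard_normal((n, p))
Xf = rng.standard_normal((q, p))
L = rng.standard_normal((n + q, n + q))
Vbig = L @ L.T + np.eye(n + q)             # joint covariance of (y, y_f), positive definite
V, V21 = Vbig[:n, :n], Vbig[n:, :n]        # V = cov(y), V21 = cov(y_f, y)

Vi = np.linalg.inv(V)
A = np.linalg.solve(X.T @ Vi @ X, X.T @ Vi)       # beta-tilde = A y, the BLUE of beta
B = Xf @ A + V21 @ Vi @ (np.eye(n) - X @ A)       # BLUP(y_f) = B y

# fundamental BLUP equation: B(X : VX-perp) = (X_f : V21 X-perp)
M = np.eye(n) - X @ np.linalg.pinv(X)             # a choice of X-perp
print(np.allclose(B @ X, Xf))
print(np.allclose(B @ V @ M, V21 @ M))
```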
The following theorem characterizes the BLUP; see, e.g., Christensen.

Theorem 6.10. The linear predictor $By$ is the BLUP for $y_f$ if and only if
$$B(X : VX^\perp) = (X_f : V_{21}X^\perp).$$

Consider then the mixed linear model
$$\mathcal{L}: \quad y = X\beta + Z\gamma + \varepsilon,$$
where $\beta$ is a vector of fixed parameters and $\gamma$ a vector of random ones, with the known covariance matrices $\operatorname{cov}(\varepsilon) = R$ and $\operatorname{cov}(\gamma) = D$, and expectations $\mathrm{E}(\varepsilon) = 0$, $\mathrm{E}(\gamma) = 0$. We assume that the random effect $\gamma$ and the error term $\varepsilon$ are uncorrelated, and thereby $\operatorname{cov}(y) = ZDZ' + R = \Sigma$, say. Taking $\gamma$ as the "new observation", with $\operatorname{cov}(\gamma, y) = DZ'$, it is easy to conclude, in view of Theorem 6.10, that $Gy$ is the BLUP for $\gamma$ if and only if
$$G(X : \Sigma X^\perp) = (0 : DZ'X^\perp).$$
Consider then the fixed-effects version
$$\mathcal{F}: \quad y = X\beta + Z\gamma + \varepsilon,$$
where both $\beta$ and $\gamma$ are fixed (but unknown) coefficients, and supplement $\mathcal{F}$ with the stochastic restrictions $y_0 = \gamma + \varepsilon_0$, where $\operatorname{cov}(\varepsilon_0) = D$. This supplement can be expressed as the partitioned (augmented) model
$$\begin{pmatrix} y \\ y_0 \end{pmatrix} = \begin{pmatrix} X & Z \\ 0 & I_q \end{pmatrix}\begin{pmatrix} \beta \\ \gamma \end{pmatrix} + \begin{pmatrix} \varepsilon \\ \varepsilon_0 \end{pmatrix} =: X_*\pi + \varepsilon_*.$$
We will need the matrix $X_*^\perp$, for which, according to part (b) of Theorem 3.2, one choice is
$$X_*^\perp = \begin{pmatrix} I_n \\ -Z' \end{pmatrix}M, \quad \text{where } M = I_n - P_X. \tag{35}$$
Using (35), Haslett & Puntanen [20, Th. 1] show that all properties of BLUEs and BLUPs in the mixed model $\mathcal{L}$ can be considered using the augmented model $\mathcal{F}$, where both $\beta$ and $\gamma$ are fixed parameters. Using the connection between the mixed model $\mathcal{L}$ and the augmented model $\mathcal{F}$, the following result follows from Theorem 6.8 immediately.
Theorem 6.12. Consider two mixed models $\mathcal{L}_i = \{y, X\beta + Z\gamma, D_i, R_i\}$, and denote
$$\Sigma_i = ZD_iZ' + R_i, \quad V_i = \begin{pmatrix} R_i & 0 \\ 0 & D_i \end{pmatrix}, \quad i = 1, 2.$$
Then every representation of the BLUE for $X\beta$ under $\mathcal{L}_1$ remains the BLUE for $X\beta$ under $\mathcal{L}_2$, and every representation of the BLUP for $\gamma$ under $\mathcal{L}_1$ remains the BLUP for $\gamma$ under $\mathcal{L}_2$, if and only if any of the following equivalent conditions holds: (a) Every representation of the BLUE for $X_*\pi$ under $\mathcal{F}_1$ remains the BLUE for $X_*\pi$ under $\mathcal{F}_2$.
(e) The matrix $V_2$ can be expressed as
$$V_2 = aV_1 + X_*N_1X_*' + V_1X_*^\perp N_2(V_1X_*^\perp)'$$
for some $a \in \mathbb{R}$ and matrices $N_1$ and $N_2$ such that $V_2$ is nonnegative definite.