Multivariate Normal Distribution

These lecture notes accompany video lectures by the author. Follow the links:
Lecture 1: https://www.youtube.com/watch?v=HBDNlqYdkjk 
Lecture 2: https://youtu.be/-KjA-MFeZ2A 
Lecture 3: https://youtu.be/krkQLyo7tRc 
Lecture 4: https://youtu.be/1xLiN9KWtEM 
Lecture 5: https://youtu.be/qk3is5w9ORk


Review of univariate normal distribution
We say that Y follows a normal distribution, that is, Y ∼ N(µ, σ²), if the pdf of Y is

f(y) = (1/(σ√(2π))) exp{−(y − µ)²/(2σ²)},  −∞ < y < ∞.

We can show that E(Y) = µ and Var(Y) = σ². The PDF and CDF of the normal distribution N(µ, σ²) for different values of µ and σ² are shown in Figure 1.

Some properties
There are some basic properties of the normal distribution:
• Standard normal distribution: Z ∼ N(0, 1). Any normal random variable Y ∼ N(µ, σ²) can be standardized using Z = (Y − µ)/σ.
• The function φ(·) is often used to denote the pdf of the standard normal distribution: φ(z) = (1/√(2π)) exp{−z²/2}.
• Any normal distribution can be created from a standard normal distribution using Y = µ + Zσ. Specifically, if Z ∼ N(0, 1), then µ + Zσ ∼ N(µ, σ²).
• Each interval has an associated probability; see Figure 3 for some examples.
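The standardization and the interval probabilities (the familiar 68–95–99.7 rule) can be checked numerically. A minimal sketch in Python using scipy (the notes themselves use R); the values µ = 10 and σ = 2 are illustrative assumptions:

```python
from scipy.stats import norm

# Standardize Y ~ N(mu, sigma^2) via Z = (Y - mu) / sigma
mu, sigma = 10.0, 2.0   # assumed values for illustration
y = 13.0
z = (y - mu) / sigma    # -> 1.5

# Probability content of the intervals mu +/- k*sigma, k = 1, 2, 3
for k in (1, 2, 3):
    p = norm.cdf(k) - norm.cdf(-k)
    print(f"P(|Z| <= {k}) = {p:.4f}")
```

Running this prints approximately 0.6827, 0.9545, and 0.9973, matching the usual rule of thumb.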

Assessing univariate normality
We can use graphical as well as hypothesis-testing techniques to assess whether the normality assumption is reasonable for a dataset. A common graphical technique to check for normality is the normal quantile-quantile plot (Q-Q plot).

Normal quantile-quantile plot
A normal Q-Q plot is a scatterplot of the sorted data y(1) ≤ · · · ≤ y(n) against the theoretical quantiles of a N(0, 1) distribution, which are plotted on the x-axis. Since the plot is fairly linear, the normality assumption seems reasonable in this case.
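The Q-Q plot coordinates can be computed directly. A sketch in Python (the notes use R), with simulated data and the common plotting position (i − 0.5)/n, both assumptions for illustration; for normal data the points fall near a line with slope ≈ σ and intercept ≈ µ:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(437)          # arbitrary seed
y = rng.normal(loc=5.0, scale=2.0, size=30)

# Sorted data on the y-axis, theoretical N(0,1) quantiles on the x-axis
y_sorted = np.sort(y)
i = np.arange(1, len(y) + 1)
theo_q = norm.ppf((i - 0.5) / len(y))     # one common plotting-position choice

# A roughly straight line indicates normality; its slope and intercept
# estimate sigma and mu, respectively
slope, intercept = np.polyfit(theo_q, y_sorted, 1)
print(f"slope ~ {slope:.2f}, intercept ~ {intercept:.2f}")
```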
We can also employ formal statistical tests to check for normality.
• Shapiro-Wilk test and Shapiro-Francia test: the latter is a simplification of the former, and the two show similar power. These are among the more powerful normality tests.
• Since the p-value is large (e.g., larger than 5%), we can say that the normality assumption for the data is plausible.
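As a sketch of such a formal test (in Python with scipy rather than R; the simulated data are an assumption for illustration):

```python
import numpy as np
from scipy.stats import shapiro

rng = np.random.default_rng(537)   # arbitrary seed
x = rng.normal(size=50)            # data that really are normal

# Shapiro-Wilk test: H0 is that the data come from a normal distribution
stat, pvalue = shapiro(x)
print(f"W = {stat:.3f}, p-value = {pvalue:.3f}")
# A large p-value (say > 0.05) means normality is plausible for these data
```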

Bivariate and Multivariate normal distributions
The random vector X = (X₁, X₂)ᵀ follows a bivariate normal (Gaussian) distribution with mean vector µ = (µ₁, µ₂)ᵀ and (positive definite) variance-covariance matrix Σ, denoted X ∼ N₂(µ, Σ), if its probability density function is

f(x) = (2π)^{-1} |Σ|^{-1/2} exp{−(x − µ)ᵀ Σ⁻¹ (x − µ)/2}.

The shape of the PDF (and that of the scatterplot of a random sample generated from the distribution) is determined by Σ, the variance-covariance matrix of X. An easy way to visualize the PDF of a bivariate distribution is to plot the constant probability density contours.

Constant probability density contours
We define the constant probability density contour (also called constant-density contour) of a bivariate normal PDF to be the set of vectors x such that f(x) is constant, that is, {x : (x − µ)ᵀ Σ⁻¹ (x − µ) = c²} for a specific c. These sets are ellipses centered at µ, whose major and minor axes point along ±c√λᵢ eᵢ, where λᵢ are the eigenvalues and eᵢ the corresponding eigenvectors of Σ.
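The axes of such an ellipse can be computed from the eigendecomposition of Σ. A sketch in Python (the notes use R); the covariance matrix below is an assumed illustrative value:

```python
import numpy as np

# Illustrative covariance matrix (assumed values)
Sigma = np.array([[2.0, 1.0],
                  [1.0, 2.0]])

# Eigendecomposition: the contour axes point along the eigenvectors e_i,
# with half-lengths c * sqrt(lambda_i)
lam, E = np.linalg.eigh(Sigma)   # eigh, since Sigma is symmetric
c = 1.0
half_lengths = c * np.sqrt(lam)
print(lam)            # eigenvalues 1 and 3 for this Sigma
print(half_lengths)   # 1 and sqrt(3)
```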
More generally, a random vector X = (X₁, . . . , Xₚ)ᵀ is said to follow a multivariate normal distribution Nₚ(µ, Σ), where µ is a p × 1 vector and Σ is a positive definite matrix, if the PDF of X is

f(x) = (2π)^{-p/2} |Σ|^{-1/2} exp{−(x − µ)ᵀ Σ⁻¹ (x − µ)/2}.

We can show that E(X) = µ and that cov(X) = Σ.
• Zero covariance implies the components of X are independent (ONLY when X is multivariate normal).
• When µ = 0ₚ and Σ = Iₚ, we say that we have a standard multivariate normal distribution, Z ∼ Nₚ(0ₚ, Iₚ). (Compare with the univariate standard normal distribution Z ∼ N(0, 1).)
• All subsets of X also follow multivariate normal distribution.
• If X follows a multivariate normal distribution, then any linear combination aᵀX of its components follows a (univariate) normal distribution.

Mahalanobis distance
The quantity d² = (x − µ)ᵀ Σ⁻¹ (x − µ) is called the Mahalanobis squared distance between x and µ. If X ∼ Nₚ(µ, Σ), then (X − µ)ᵀ Σ⁻¹ (X − µ) follows a χ²ₚ distribution.
Using the last property, we can compute the probability of observing data within any constant-density contour. Specifically, the constant-density ellipse {x : (x − µ)ᵀ Σ⁻¹ (x − µ) ≤ c²} contains probability Gₚ(c²), where Gₚ(·) is the CDF of a χ²ₚ distribution. Figure 8 shows 50% and 90% contours for two bivariate normal distributions.
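The Mahalanobis distance and the contour probability can be computed directly. A sketch in Python with scipy (the notes use R); µ, Σ, and x below are assumed illustrative values:

```python
import numpy as np
from scipy.stats import chi2

mu = np.array([0.0, 0.0])
Sigma = np.array([[2.0, 1.0],
                  [1.0, 2.0]])
x = np.array([1.0, 2.0])

# Mahalanobis squared distance d^2 = (x - mu)^T Sigma^{-1} (x - mu)
diff = x - mu
d2 = diff @ np.linalg.solve(Sigma, diff)

# Probability that X falls inside the contour passing through x: G_p(d^2), p = 2
p_inside = chi2.cdf(d2, df=2)
print(round(d2, 3), round(p_inside, 3))   # -> 2.0 0.632
```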

Sampling distribution of X̄ and S
Recall that for the univariate normal distribution, if X₁, . . . , Xₙ form a random sample from N(µ, σ²), then X̄ ∼ N(µ, σ²/n) and (n − 1)S²/σ² ∼ χ²ₙ₋₁, where S² is the sample variance. We also know that X̄ and S² are independent.
We have similar results for the multivariate normal distribution.

Exact distribution of X̄ and S
Suppose X₁, . . . , Xₙ form a random sample from a Nₚ(µ, Σ) distribution. Then:
• X̄ has a Nₚ(µ, Σ/n) distribution.
• (n − 1)S has a Wishart distribution with n − 1 degrees of freedom (a generalization of the χ² distribution).
• X̄ and S are independent.
Large-sample results analogous to the univariate case also exist. Recall that if X₁, . . . , Xₙ form a random sample from a distribution with mean µ and variance σ², the Central Limit Theorem (CLT) says that when n is large enough, X̄ approximately has a N(µ, σ²/n) distribution.
Similar results hold in the multivariate setting.

Large sample results
Suppose X₁, . . . , Xₙ form a random sample from a population (not necessarily normal) with mean µ and covariance matrix Σ. When the sample size n is large:
• X̄ has an approximate Nₚ(µ, Σ/n) distribution (multivariate CLT).
• n(X̄ − µ)ᵀ S⁻¹ (X̄ − µ) has an approximate χ²ₚ distribution (we also need n − p large; note that we replaced Σ with S).
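The large-sample statistic above can be checked by simulation. A minimal sketch in Python (the notes use R); the mean vector, covariance matrix, seed, and sample size are all assumed for illustration:

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(2024)   # arbitrary seed
mu = np.array([1.0, 2.0, 3.0])
Sigma = np.array([[1.0, 0.3, 0.0],
                  [0.3, 1.0, 0.2],
                  [0.0, 0.2, 1.0]])
n, p = 200, 3

X = rng.multivariate_normal(mu, Sigma, size=n)
xbar = X.mean(axis=0)
S = np.cov(X, rowvar=False)

# n (xbar - mu)^T S^{-1} (xbar - mu) is approximately chi^2_p for large n,
# so it should typically fall below a high chi^2_p quantile
t = n * (xbar - mu) @ np.linalg.solve(S, xbar - mu)
print(f"statistic = {t:.2f}, 99% quantile = {chi2.ppf(0.99, df=p):.2f}")
```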

Checking multivariate normality
Many of the techniques typically used in multivariate statistics assume that the parent distribution is multivariate normal or that the sample size is sufficiently large (in which case the normality assumption is less crucial). However, the quality of the inferences relies on how close the parent distribution is to the multivariate normal. Thus it is essential to validate the normality assumption.
It is difficult to assess multivariate normality. In practice, we investigate the univariate and bivariate distributions to determine how close they are to normality. We describe a few steps for checking multivariate normality below.

Check univariate normality
The usual univariate analyses for each variable, such as normal Q-Q plots and statistical tests for normality, can be done. Recall that if X is multivariate normal, then each component is univariate normal as well. If we reject normality for one of the variables, then X cannot be multivariate normal.
Let us consider the lumber stiffness dataset, where four measures of stiffness x₁, . . . , x₄ are recorded for each of the n = 30 boards.
In general, just checking univariate plots is not enough. Even if individual variables are normally distributed, their joint distribution may not be multivariate normal.

Check scatterplots
If the data indeed are generated from a normal distribution, the constant-density contours must be ellipses. Thus, the scatterplots should also conform to this structure. Creating scatterplots and pairs-plot (pairwise scatterplots) of the variables will also reveal any unusual shape (or outliers) in the data set.
The R function pairs() can be used to create pairwise scatterplots. The pairs-plot of the dataset is shown in Figure 11. Overlaying "data ellipses" (constant-density contours estimated from the data assuming normality) on top of scatterplots is useful in this situation. The data ellipses can be drawn using the dataEllipse() function in the car package. Figure 12 shows the 50% and 90% data ellipses overlaid on the scatterplot of X₂ vs. X₁. By default, the 50% and 95% ellipses are drawn; see the documentation using ?dataEllipse for more customization options. We can see from Figure 12 that the data cloud does have an elliptical shape. However, there is one point that might be an outlier.

Construct a chi-square plot
Given sample data x₁, . . . , xₙ, the chi-square plot is constructed using the following steps:
• For each i, compute the Mahalanobis squared distance d²ᵢ = (xᵢ − x̄)ᵀ S⁻¹ (xᵢ − x̄), where x̄ and S are the observed values of the sample mean vector and covariance matrix, respectively.
• If the data are indeed generated from a normal distribution, then the d²ᵢ values should approximately follow a χ²ₚ (in our example, p = 4) distribution. Thus, we plot the ordered d²ᵢ values against the theoretical quantiles of the χ²ₚ distribution. If the multivariate normality assumption is correct, then the points should follow a straight line. A systematic curved pattern suggests a departure from normality. One or two points that show large deviations from the linear trend might be outliers and would warrant further investigation.
A function to create such a chi-square plot is shown below. For the most part, the chi-square plot indeed shows a linear pattern. However, one or two points (upper right corner; marked by red circles) show deviation from the linear trend. These points may indicate outliers.
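The computation behind such a plot can be sketched as follows (in Python with numpy/scipy rather than the R function in the notes; the simulated 4-variable data mimic the lumber setting and are an assumption for illustration):

```python
import numpy as np
from scipy.stats import chi2

def chisq_plot_coords(X):
    """Return (theoretical chi^2_p quantiles, ordered Mahalanobis d^2)
    for a chi-square plot of the rows of X (an n x p data matrix)."""
    n, p = X.shape
    xbar = X.mean(axis=0)
    S = np.cov(X, rowvar=False)
    diff = X - xbar
    # d_i^2 = (x_i - xbar)^T S^{-1} (x_i - xbar), computed for every row
    d2 = np.einsum("ij,ij->i", diff, np.linalg.solve(S, diff.T).T)
    d2_ordered = np.sort(d2)
    # Theoretical quantiles at plotting positions (i - 0.5)/n
    q = chi2.ppf((np.arange(1, n + 1) - 0.5) / n, df=p)
    return q, d2_ordered

# Illustrative use with simulated data: p = 4 variables, n = 30 observations
rng = np.random.default_rng(30)
X = rng.multivariate_normal(np.zeros(4), np.eye(4), size=30)
q, d2 = chisq_plot_coords(X)
# Plot d2 against q; near-linear points support multivariate normality
print(q.shape, d2.shape)
```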

Outlier detection
Outliers can be viewed as unusual data points that do not seem to follow the pattern of variability produced by the other observations. Univariate outliers can be detected using a dot plot or boxplot; the task is more complicated for multivariate data. The chi-square plot described above can also be used for outlier detection.
If there are suspected outliers, we should inspect the data points corresponding to the top few distance values to see in what manner the outliers differ from the rest of the dataset. Along with the actual data points, it is also useful to inspect the z-scores for each variable. Recall that if the assumption of multivariate normality is reasonable, then the z-scores of each variable should approximately follow a standard normal distribution, so only a small fraction of them should be large in magnitude. For example, consider the scatterplot of X₁ and X₂ in Figure 16. Observation 16 is hidden within the data cloud and is only visible in the chi-square plot. In contrast, observation 9 is easy to notice, since it is visible in scatterplots as well: even though the observation follows the overall pattern of the plot (there seems to be a linear relationship between X₁ and X₂, and observation 9 does conform to that relationship), its z-scores are very large in magnitude for all four measures of stiffness.
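The z-score inspection described above can be sketched as follows (in Python rather than R; the simulated data and the 2.5 threshold are assumptions for illustration):

```python
import numpy as np

# Illustrative n x p data matrix; in practice, the stiffness measurements
rng = np.random.default_rng(9)
X = rng.multivariate_normal(np.zeros(2), [[1.0, 0.5], [0.5, 1.0]], size=30)

# z-score each variable: (value - column mean) / column standard deviation
z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Flag observations with any |z| > 2.5 for closer inspection
suspects = np.where((np.abs(z) > 2.5).any(axis=1))[0]
print(suspects)
```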
Once we find an outlier, we should try to access the original specimens and re-examine them whenever possible to determine the reason behind the unusual observations.

Bivariate boxplot
We discuss two extensions of the univariate boxplot to the bivariate setting. A bivariate analogue of the usual boxplot was proposed by Goldberg and Iglewicz (1992); an R package implements this method.
Let us look at the variables x₁ and x₂ from the lumber stiffness data discussed before. A bivariate boxplot is shown in Figure 17. The bivariate boxplot consists of the following:
• Two concentric ellipses: the inner ellipse (called the "hinge") contains 50% of the data, and the outer ellipse (called the "fence") determines potential outliers. These ellipses are drawn based on robust measures of location, scale, and correlation, together with a constant D that determines the distance of the fence from the hinge. Goldberg and Iglewicz (1992) propose D = 7, so that the outer ellipse forms an approximate 99% confidence bound.
• Resistant (robust) regression lines of both y on x and x on y are drawn. Their intersection shows the location estimator.
It seems observation 9 is an outlier. However, observation 16 is on the fence.

Bagplot
Another bivariate extension of the usual boxplot, called the bagplot, was suggested by Rousseeuw, Ruts, and Tukey (1999); the bagplot() function in the aplpack package implements this method. Figure 18 shows a bagplot of X₁ and X₂. The bagplot is based on the concept of the halfspace location depth of a point relative to a bivariate dataset, which extends the univariate concept of rank. The plot consists of the following:
• An inner convex polygon, called the "bag," containing the 50% of the data points with the largest depth.
• The outer polygon, called the "fence," is created by magnifying the bag by a factor of three. The fence separates inliers from outliers. The fence itself is not plotted, but the outliers are plotted in red, and the observations between the bag and the fence are shown in a lighter color.
The bagplot visualizes the location, spread, correlation, skewness, and tails of the data. It is not limited to elliptical (e.g., multivariate normal) distributions.