Transl Clin Pharmacol. 2016 Jun;24(2):66-73. English.
Published online Jun 14, 2016.
Copyright © 2016 Translational and Clinical Pharmacology
Review

Statistical basis for pharmacometrics: random variables and their distribution functions, expected values, and correlation coefficient

Kyungmee Choi
    • Division of Mathematics, College of Science and Technology, Hongik University at Sejong, Jochiwon, Sejong 30016, South Korea.
Received February 22, 2016; Accepted April 29, 2016.

This is an open-access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/).

Abstract

For pharmacometricians, probability theory is often the first obstacle on the way to statistics because it is founded entirely on mathematics. The purpose of this tutorial is to provide a simple introduction to a univariate random variable, its mean and variance, and the correlation coefficient of two random variables, using mathematics that is as simple as possible. The definitions and theorems in this tutorial appear in most standard statistics textbooks. Most examples are small and subject-free, involving coins, dice, and binary signals, so that readers can understand them intuitively.

Keywords
Probability; Expected values; Moment generating function; Correlation coefficient

Introduction

When a population is too large to study in its entirety, it is necessary to randomly sample data of an appropriate size in order to describe its distribution, including its center and dispersion (Fig. 1). Let us define two important parameters, µ and σ², as the true mean and variance of the population; they are unknown unless the whole population is studied. When µ and σ² are estimated from the sample, probability theory provides the theoretical background for their estimators, such as how close the estimators are to the true values.

Figure 1
The probability density function of the population and the histogram of a sample.

Suppose X1, X2, ..., Xn are a random sample of size n. Each Xi, i = 1, 2, ..., n, is assumed to be independent and to identically represent the population, even though each observation varies. The histogram of a sample is supposed to approximate the distribution curve of the population. By analogy with the population mean and variance, we can define the sample mean X̄ and the sample variance S² as follows:

$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$
$S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$
For estimators we often use hat notation, such as µ̂ = X̄ and σ̂² = S². Naturally we expect X̄ to converge to µ and S² to converge to σ² as n grows large. Probability theory provides the mathematical basis for these asymptotic properties.
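For readers who would like to see this convergence numerically, here is a minimal sketch in Python (assuming NumPy is available; the normal population with µ = 10 and σ² = 4 is chosen only for illustration and is not part of the text):

```python
import numpy as np

# Draw increasingly large samples and watch the sample mean and variance
# approach the (illustrative) population values mu = 10 and sigma^2 = 4.
rng = np.random.default_rng(seed=1)
mu, sigma2 = 10.0, 4.0

for n in (10, 100, 10_000):
    x = rng.normal(mu, np.sqrt(sigma2), size=n)   # random sample X_1, ..., X_n
    xbar = x.mean()                               # sample mean
    s2 = x.var(ddof=1)                            # sample variance with the 1/(n-1) factor
    print(f"n={n:>6}  sample mean={xbar:.3f}  sample variance={s2:.3f}")
```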

Whenever the readers want more details, they can refer to the following excellent references on mathematical statistics: Hogg and Craig 4th ed. (pp 1-149),[1] Hogg and Tanis 3rd ed. (pp 1-60),[2] Kim et al. 4th ed. (pp 1-93),[3] Mood 3rd ed. (pp 1-84),[4] Peebles 4th ed. (pp 1-202),[5] Rice 3rd ed. (pp 1-176),[6] Rosner 6th ed. (pp 1-121),[7] and Ross 8th ed.[8]

In this tutorial, we start by defining probability. The sections cover probability, expectations, the mean and the variance, the moment generating function, and finally the correlation coefficient. Readers are encouraged to review derivatives and integrals of polynomial and exponential functions if they want to follow the examples with continuous cases; otherwise, they can skip those examples. The contents are suitable for a lecture of two or three hours. Readers who want more subject-related examples can look into the biostatistics book by Rosner.[7] This tutorial focuses on mathematics and logic.

Probability

Let us first define some words in view of experiments as follows [2]:

  • Experimental unit: an object such as a person, thing, or event about which we collect data.

  • Population: a set of units that we want to study

  • Sample: a subset of a population

  • Trial: one observation in an experiment

  • Outcome: the result of a trial

  • Sample space S: a set of all possible outcomes

  • Event A: A ⊆ S, a subset of the sample space S

  • A and B are exclusive events if A ∩ B = ∅

Here are some examples.

Example 1 Sample space and events

Suppose we toss a die and observe the number on the upper face. "Toss a die" is a trial, and "the number on the upper face" is an outcome. The sample space is S = {1, 2, 3, 4, 5, 6}. Let the event A be "the outcome is a multiple of 3" and the event B be "the outcome is the minimum value." Then A = {3, 6}, B = {1}, and the two events A and B are exclusive.

Let us now define the probability of an event A, P(A).

Definition 1 For a given event A the probability is defined by

$P(A) = \frac{n(A)}{n(S)}$
where n(A) is the number of outcomes in the event A and n(S) is the number of outcomes in the sample space S.

Then the three axioms of probability hold for P.

Axiom 1 Three axioms of probability

  • P(A) ≥ 0

  • P(S) = 1

  • P(A ∪ B) = P(A) + P(B) if A ∩ B = ∅

The associated properties are as follows:

  • P(Aᶜ) = 1 - P(A)

  • P(A ∪ B) = P(A) + P(B) - P(A ∩ B)

Definition 2 For A, B ⊆ S, the conditional probability P(A|B) is said to be the probability of the event A given that the event B has occurred and is defined by

$P(A|B) = \frac{P(A \cap B)}{P(B)}$
if P(B) > 0.

Example 2 Probability and conditional probability

Suppose we roll a die and observe the number of the upface. Then S = {1, 2, 3, 4, 5, 6}. Let A = {1}, B = {1, 3, 5} (the set of odd numbers), and C = {2, 3}. Then P(A) = 1/6, P(B) = 1/2, P(C) = 1/3, and P(B ∩ C) = 1/6.

$P(B \cup C) = P(B) + P(C) - P(B \cap C) = \frac{2}{3}$
If we have the information that the outcome is odd, then P({1}|odd) is one out of three. Thus
$P(A|B) = \frac{n(A \cap B)}{n(B)} = \frac{n(\{1\})}{n(\{1, 3, 5\})} = \frac{1}{3}$
Note that we divide n(A ∩ B) by n(B) because the sample space S effectively becomes B. Here,[2]
$P(A|B) = \frac{n(A \cap B)}{n(B)} = \frac{n(A \cap B)/n(S)}{n(B)/n(S)} = \frac{P(A \cap B)}{P(B)}$
Therefore,
$P(A|B) = \frac{P(A \cap B)}{P(B)}$
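These numbers can be checked by brute force, simply counting outcomes in the sample space; the short Python sketch below (a numerical check only, with names chosen here for illustration) follows Definition 1 directly:

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}                    # sample space of one die roll
A, B, C = {1}, {1, 3, 5}, {2, 3}

def P(event):                             # Definition 1: P(E) = n(E)/n(S)
    return Fraction(len(event), len(S))

print(P(A), P(B), P(C), P(B & C))         # 1/6, 1/2, 1/3, 1/6
print(P(B | C))                           # P(B ∪ C) = 2/3 (set union here, not conditioning)
print(Fraction(len(A & B), len(B)))       # P(A|B) = n(A ∩ B)/n(B) = 1/3
```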

From the definition of the conditional probability, the multiplication rule is obtained.

Theorem 1 Multiplication rule

P(A ∩ B) = P(A)P(B|A) = P(B)P(A|B)

Example 3 Sensitivity and specificity [7]

Conditional probability is very useful in screening tests. Suppose a new test for a disease gives the following results for 100 subjects: among the 14 subjects with the disease (D), 11 test positive (TP) and 3 test negative (FN); among the 86 subjects without the disease (No D), 2 test positive (FP) and 84 test negative (TN). There are two widely used conditional probabilities to measure test abilities.

  • Sensitivity = True Positive (TP) rate = P(+|D) = TP/D = 11/14

  • Specificity = True Negative (TN) rate = P(-|No D) = TN/No D = 84/86

There are two types of error probabilities, the False Positive (FP) rate and the False Negative (FN) rate:

  • FP rate = P(+|No D) = FP/No D = 2/86

  • FN rate = P(-|D) = FN/D = 3/14

Let us also think about the conditional probability of having the disease given the + test result.

$P(D|+) = \frac{P(TP)}{P(+)} = \frac{11}{13}$
Similarly, the conditional probability of having the disease given the - test result is as follows:
$P(D|-) = \frac{P(FN)}{P(-)} = \frac{3}{87}$
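All four rates above can be reproduced from the cell counts implied by the example (TP = 11, FN = 3, FP = 2, TN = 84); the following Python sketch assumes exactly those counts:

```python
from fractions import Fraction

TP, FN, FP, TN = 11, 3, 2, 84             # 2x2 table cell counts implied by Example 3

sensitivity = Fraction(TP, TP + FN)       # P(+|D)    = 11/14
specificity = Fraction(TN, TN + FP)       # P(-|No D) = 84/86
p_d_given_pos = Fraction(TP, TP + FP)     # P(D|+)    = 11/13
p_d_given_neg = Fraction(FN, FN + TN)     # P(D|-)    = 3/87
print(sensitivity, specificity, p_d_given_pos, p_d_given_neg)
```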
The last two probabilities are related to the famous Bayes' formula, whose derivation we skip. Let us still take a look at an interesting small example of a signal transmission problem through a channel.

Example 4 Bayes' formula. [4]

Let us transmit a binary signal through a channel. There are four cases: (1) send 0 and receive 0, (2) send 0 and receive 1, (3) send 1 and receive 0, (4) send 1 and receive 1. Let us define the events S0 as send 0, S1 as send 1, R0 as receive 0, and R1 as receive 1. Suppose that P(S0) = 0.3, P(S1) = 0.7, and P(R0|S0) = P(R1|S1) = 0.99, P(R1|S0) = P(R0|S1) = 0.01. What is the probability that the received signal 0 is the true signal? We need to calculate

$P(S_0|R_0) = \frac{P(S_0 \cap R_0)}{P(R_0)} = \frac{P(S_0)P(R_0|S_0)}{P(S_0)P(R_0|S_0) + P(S_1)P(R_0|S_1)} = \frac{(0.3)(0.99)}{(0.3)(0.99) + (0.7)(0.01)} = 0.9769737$
Note that the denominator covers the only two ways of receiving 0: (1) 0 is sent and 0 is received, and (2) 1 is sent and 0 is received.
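The same arithmetic, written as a tiny Python sketch (only a numerical check; the variable names are chosen here for clarity and are not from the text):

```python
p_s0, p_s1 = 0.3, 0.7                       # P(S0), P(S1): how often each bit is sent
p_r0_given_s0, p_r0_given_s1 = 0.99, 0.01   # P(R0|S0), P(R0|S1): channel behaviour

numerator = p_s0 * p_r0_given_s0                   # P(S0)P(R0|S0)
denominator = numerator + p_s1 * p_r0_given_s1     # P(R0) from the two cases above
print(numerator / denominator)                     # P(S0|R0) = 0.97697...
```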

If events A and B are independent of each other, then the conditional probability of A given B does not depend on whether B has occurred. In other words, the occurrence of B does not change the probability of the occurrence of A. The concept of independence is very important in statistics because the assumption of independence makes calculations much simpler and easier.

Definition 3 Events A and B are independent if P(A|B) = P(A) or P(B|A) = P(B).

From the definition of conditional probability, the following important theorem holds right away.

Theorem 2 Events A and B are independent if and only if P(A ∩ B) = P(A)P(B).

Proof. P(A|B) = P(A ∩ B)/P(B) = P(A), so P(A ∩ B) = P(A)P(B).

Example 5 independent events [4]

Suppose we have a signal transmission system composed of two parts connected in parallel: an upper path (UP) and a lower path (LP). The UP has one router and the LP has two routers connected in series: router 1 in the UP and routers 2 and 3 in the LP (Fig. 2). Let Ri be the event that the ith router fails, and assume that the failures are independent with probabilities P(R1) = 0.005 and P(R2) = P(R3) = 0.008. What is the probability that transmission of a signal from a to b fails?

Figure 2
The three routers in a communication system.

    P(transmission failure of a signal)
  = P(UP fails ∩ LP fails)
  = P(UP fails)P(LP fails)
    by independence of UP and LP
  = P(R1)P(R2 ∪ R3)
  = P(R1)(P(R2) + P(R3) - P(R2 ∩ R3))
  = P(R1)(P(R2) + P(R3) - P(R2)P(R3))
    by independence of the three routers' failures
  = (0.005)(0.008 + 0.008 - (0.008)²)
  = 0.00007968
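The same calculation as a short Python sketch (a numerical check only; the variable names are not from the text):

```python
p1, p2, p3 = 0.005, 0.008, 0.008     # independent failure probabilities of routers 1, 2, 3

p_lp = p2 + p3 - p2 * p3             # P(R2 ∪ R3): the lower path fails if router 2 or 3 fails
p_fail = p1 * p_lp                   # both paths must fail, and they fail independently
print(p_fail)                        # 0.00007968
```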
  

A random variable and its probability density function

Let us now define a random variable (rv) X as a function from the sample space S to the set of real numbers ℝ, together with its corresponding probability density function (pdf).

Definition 4 X(s) : S → ℝ for s ∈ S. The probability density function (pdf) of a discrete rv X is defined by

f(x) = P(X = x)

Example 6 a rv and its pdf

Let X be the number of heads when two fair coins are tossed. Then

S = {HH, HT, TH, TT}
X(HH) = 2, X(HT) = X(TH) = 1, X(TT) = 0
and the pdf is f(0) = P(X = 0) = 1/4, f(1) = 1/2, f(2) = 1/4.

The pdf is uniquely defined for a rv. Note that the difference between a variable and a random variable is whether or not it has a corresponding pdf. For example, in the equation of a simple line y = a + bx, both x and y are variables since their values change, but they are not random variables since they do not have corresponding pdfs. A rv is called discrete if its possible values are countable and continuous if they are not.

Definition 5 For f(x) = P(X = x) to be the pdf of a discrete rv, it should satisfy the following two conditions:

$f(x) \ge 0, \quad \sum_x f(x) = 1$
The probability of an event A is calculated as follows:
$P(X \in A) = \sum_{x \in A} f(x)$
For a discrete rv, the term probability mass function is more widely used than pdf. In this tutorial, we will stick to pdf just for convenience.

Example 7 the pdf of a discrete rv

Let f(x) = x/6 for x = 1, 2, 3 and 0 otherwise. Then P(X = 3) = 1/2 and P(2/3 < X < 9/4) = f(1) + f(2) = 1/6 + 2/6 = 1/2.

Definition 6 For f(x) to be a pdf of a continuous rv it should satisfy the following two conditions:

$f(x) \ge 0, \quad \int_{-\infty}^{\infty} f(x)\,dx = 1$
The probability of a set A is calculated as follows:
$P(X \in A) = \int_A f(x)\,dx$
Note that the area under the curve f(x) is 1 and P(A) is the area under the curve f(x) for x ∈ A. Also P(X = x) = 0 since the area at a single point is zero, so that
$P(X < x) = P(X \le x) = \int_{-\infty}^{x} f(v)\,dv$

Example 8 the pdf of a continuous rv

For f(x) = cx, 0 < x < 1 to be a pdf, what is c?

$\int_0^1 cx\,dx = \frac{c}{2}x^2\Big|_0^1 = \frac{c}{2} = 1,$
therefore c = 2.
$P(1/3 < X < 1/2) = \int_{1/3}^{1/2} 2x\,dx = x^2\Big|_{1/3}^{1/2} = \frac{1}{4} - \frac{1}{9} = \frac{5}{36}$
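Readers who want to verify such integrals symbolically can do so with a computer algebra system; the sketch below assumes the SymPy package is installed:

```python
import sympy as sp

x = sp.symbols('x')
c = sp.symbols('c', positive=True)

# Solve the normalization condition: the integral of c*x over (0, 1) must equal 1.
c_val = sp.solve(sp.integrate(c * x, (x, 0, 1)) - 1, c)[0]                 # c/2 = 1  ->  c = 2
prob = sp.integrate(c_val * x, (x, sp.Rational(1, 3), sp.Rational(1, 2)))  # P(1/3 < X < 1/2)
print(c_val, prob)                                                         # 2, 5/36
```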

Definition 7 The cumulative distribution function (cdf) of a rv X is defined by

F(x) = P(X ≤ x)
The term distribution function is often used instead of cdf.

Example 9 the cdf of a discrete rv

Suppose a rv X has a pdf P(X = 0) = 0.3, P(X = 1) = 0.2, P(X = 2) = 0.5.

Then its cdf is as follows (Fig. 3):

$F(x) = \begin{cases} 0, & x < 0 \\ 0.3, & 0 \le x < 1 \\ 0.5, & 1 \le x < 2 \\ 1, & 2 \le x \end{cases}$

Figure 3
(A) the pdf of Example 9, (B) the cdf of Example 9, (C) the pdf of Example 10, (D) the cdf of Example 10.

Example 10 the cdf of a continuous rv

Suppose a rv X has a pdf f(x) = 1, 0 < x < 1. Then its cdf is as follows (Fig. 3):

$F(x) = \begin{cases} 0, & x < 0 \\ x, & 0 \le x < 1 \\ 1, & 1 \le x \end{cases}$

From the two examples, we can easily derive the following properties of the cdf F(x):

  • F(-∞) = 0, F(∞) = 1, 0 ≤ F(x) ≤ 1

  • If x1 < x2 then F(x1) ≤ F(x2). That is, F(x) is increasing, but not strictly increasing. We say the cdf is non-decreasing.

  • P(X = x) = F(x+) - F(x-), where F(x+) is the right-hand limit and F(x-) is the left-hand limit of F(x). A single value of F(x) can correspond to a whole interval of X values. We say the distribution function is right-continuous.

  • P(X > x) = 1 - F(x)

  • P(x1 < X ≤ x2) = P(X ≤ x2) - P(X ≤ x1) = F(x2) - F(x1)

  • Let X be a continuous rv with pdf f(x) and differentiable cdf F(x). Then $F(x) = \int_{-\infty}^{x} f(v)\,dv$ and $f(x) = \frac{d}{dx}F(x)$, which follows from the fundamental theorem of calculus and the definitions of the pdf and the cdf.

Let us extend it to the two dimensional sample space, where we have two random variables X and Y.

Definition 8 The joint pdf of the two discrete random variables X and Y is defined by

f(x, y) = P(X = x, Y = y),
which is nonnegative and should satisfy
$\sum_x \sum_y f(x, y) = 1$
For the event A, P(A) can be evaluated by
$P((X, Y) \in A) = \sum_{(x, y) \in A} f(x, y)$
For the discrete random variables X and Y with the given joint pdf f(x, y), the marginal pdfs fX(x) and fY(y) are defined by
$f_X(x) = \sum_y f(x, y), \quad f_Y(y) = \sum_x f(x, y)$
If a vector (x, y) is given, then the value of the joint pdf f(x, y) is determined since it is a bivariate function. In other words, one joint pdf f(x, y) corresponds to one random vector (X, Y), where both X and Y are random variables.

For continuous random variables we replace ∑ with ∫.

Definition 9 For the function f(x, y) to be the joint pdf of two continuous random variables X and Y, it should satisfy the following two conditions:

$f(x, y) \ge 0, \quad \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,dx\,dy = 1$
For calculation of the probability of the set A, use
$P((X, Y) \in A) = \iint_A f(x, y)\,dx\,dy$
Its marginal pdfs are defined by
$f_X(x) = \int_{-\infty}^{\infty} f(x, y)\,dy, \quad f_Y(y) = \int_{-\infty}^{\infty} f(x, y)\,dx$
More details about the joint distribution function F(x, y) = P(X ≤ x, Y ≤ y) are found in Peebles.[5]

Theorem 3 Two random variables X and Y are independent if and only if

f(x, y) = fX(x)fY(y)
Proof. Let us consider the discrete case only. X and Y are independent if and only if f(x, y) = P(X = x, Y = y) = P(X = x)P(Y = y) = fX(x)fY(y).

Example 11 independence of discrete random variables

Suppose we flip a coin twice and define the random variables X and Y as follows:

X = the number of heads in the first flip
Y = the number of heads in the two flips
Then
S = {HH, HT, TH, TT}
 X(TT) = X(TH) = 0, X(HT) = X(HH) = 1
Y(TT) = 0, Y(HT) = Y(TH) = 1, Y(HH) = 2
Therefore the joint pdf is f(0, 0) = 1/4, f(0, 1) = 1/4, f(0, 2) = 0, f(1, 0) = 0, f(1, 1) = 1/4, and f(1, 2) = 1/4.

Let us calculate some probabilities.

$P(X \ge 1, Y \ge 1) = f(1, 1) + f(1, 2) = \frac{1}{4} + \frac{1}{4} = \frac{1}{2}$
$P(X < Y) = f(0, 1) + f(0, 2) + f(1, 2) = \frac{1}{2}$
To check the independence of X and Y, we calculate P(X = 0, Y = 0) and P(X = 0)P(Y = 0) and compare them.
$P(X = 0, Y = 0) = \frac{1}{4}, \quad P(X = 0)P(Y = 0) = \frac{1}{2}\cdot\frac{1}{4} = \frac{1}{8}$
Thus, X and Y are not independent.
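The joint pdf and the independence check can be reproduced by enumerating the four equally likely outcomes; the Python sketch below is only an illustration of the counting:

```python
from fractions import Fraction
from collections import Counter

outcomes = ["HH", "HT", "TH", "TT"]              # equally likely, each with probability 1/4
joint = Counter()
for s in outcomes:
    x = 1 if s[0] == "H" else 0                  # X: heads in the first flip
    y = s.count("H")                             # Y: heads in the two flips
    joint[(x, y)] += Fraction(1, 4)

p_x0 = sum(p for (x, _), p in joint.items() if x == 0)    # P(X = 0) = 1/2
p_y0 = sum(p for (_, y), p in joint.items() if y == 0)    # P(Y = 0) = 1/4
print(joint[(0, 0)], p_x0 * p_y0)                         # 1/4 vs 1/8 -> not independent
```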

Example 12 independence of continuous rv's

The joint pdf f(x, y) of two continuous random variables X and Y is given by

$f(x, y) = \frac{1}{2}e^{-x - y/2}, \quad x > 0,\ y > 0$
Then the marginal pdfs are obtained as follows:
$f_X(x) = \frac{1}{2}\int_0^\infty e^{-x}e^{-y/2}\,dy = \frac{1}{2}e^{-x}\left[-2e^{-y/2}\right]_0^\infty = e^{-x}, \quad x > 0$
$f_Y(y) = \frac{1}{2}\int_0^\infty e^{-x}e^{-y/2}\,dx = \frac{1}{2}e^{-y/2}\int_0^\infty e^{-x}\,dx = \frac{1}{2}e^{-y/2}, \quad y > 0$
Since f(x, y) = fX(x)fY(y), X and Y are independent.

The mean and the variance

Let us now consider how to estimate the center of the population distribution. The first quantity of interest is the population mean µ mentioned at the beginning. The mean µ is the simple average of all observations of a rv X in the population, and it is also called the expected value of X. Since we usually cannot observe the whole population and instead assume a pdf for X, we define the expected value of X based on its pdf.

Definition 10 The expected value of a rv X is defined by

$\mu = E[X] = \begin{cases} \sum_x x f(x) & \text{for a discrete rv} \\ \int_{-\infty}^{\infty} x f(x)\,dx & \text{for a continuous rv} \end{cases}$
If X is a rv, then a function g(X) is also a rv. Thus the following theorem holds immediately.

Theorem 4 For a rv X with its pdf f(x), the expected value of g(X) is obtained by

$E[g(X)] = \begin{cases} \sum_x g(x) f(x) & \text{for a discrete rv} \\ \int_{-\infty}^{\infty} g(x) f(x)\,dx & \text{for a continuous rv} \end{cases}$

Let us look at a very simple example.

Example 13 µ of a rv

Let a rv X have the pdf f(-1) = 0.2, f(0) = 0.6, f(1) = 0.2.

Then

µ = E[X] = (-1)(0.2) + (0)(0.6) + (1)(0.2) = 0.
Now let us define another rv Y which is a function of X. Let
Y = X²
Then
$E[Y] = E[X^2] = \sum_x x^2 f(x) = (-1)^2(0.2) + 0^2(0.6) + 1^2(0.2) = 0.4$

Due to the linearity of summation and integration, the following properties can be derived.

Theorem 5 Let a rv X have the pdf f(x).

  1. E[a] = ∑x af(x) = a

  2. E[a + bX] = a + bE[X] (Linearity of the expectation)

Proof of 2.

$E[a + bX] = \sum_x (a + bx) f(x) = a\sum_x f(x) + b\sum_x x f(x) = a + bE[X]$

As a measure of the data dispersion from the center µ, the variance σ2 is used.

Definition 11 The variance of a rv X with its pdf f(x) is defined by

$\sigma^2 = Var(X) = E[(X - \mu)^2] = \begin{cases} \sum_x (x - \mu)^2 f(x) & \text{for a discrete rv} \\ \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx & \text{for a continuous rv} \end{cases}$
One easy way of calculating σ² is
σ² = E[(X - µ)²] = E[X² - 2µX + µ²] = E[X²] - 2µE[X] + E[µ²] = E[X²] - µ²
The standard deviation σ is defined by √σ². Some useful properties of Var(X) are as follows:

  • Var(a) = 0 since µ = E[a] = a and E[a²] = a².

  • Var(aX + b) = E[((aX + b) - (aµ + b))²] = E[a²(X - µ)²] = a²Var(X)

  • For a standardized rv Z = (X - µ)/σ, E[Z] = 0 and Var(Z) = 1

Example 14 the mean and the variance of a rv

Let a rv X have the same distribution as in Example 13: f(-1) = 0.2, f(0) = 0.6, f(1) = 0.2.

Then, µ = E[X] = 0, E[X²] = 0.4, and σ² = E[X²] - (E[X])² = 0.4
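These moments are easy to verify with a few lines of Python; the sketch below simply codes the pdf of Examples 13 and 14:

```python
pdf = {-1: 0.2, 0: 0.6, 1: 0.2}                  # pdf of Examples 13 and 14

mu = sum(x * p for x, p in pdf.items())          # E[X] = 0
ex2 = sum(x**2 * p for x, p in pdf.items())      # E[X^2] = 0.4
var = ex2 - mu**2                                # sigma^2 = E[X^2] - mu^2 = 0.4
print(mu, ex2, var)
```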

Example 15 the expected values of a transformed rv

Let E[X] = µ, Var(X) = σ², Z = (X - µ)/σ, and T = 5Z + 1. Then E[T] = 1 and Var(T) = 5² = 25.

Let us now define the expectation of two random variables.

Definition 12 Let the rv's X and Y have a joint pdf f(x, y). Then, for the function r(X, Y),

$E[r(X, Y)] = \begin{cases} \sum_x \sum_y r(x, y) f(x, y) & \text{for discrete rvs} \\ \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} r(x, y) f(x, y)\,dx\,dy & \text{for continuous rvs} \end{cases}$
As before, the linearity of the expectation holds because of the linearity of summation and integration.

Theorem 6 Let the rv's X and Y have the joint pdf f(x, y). Then

  1. E[ag(X) + bh(Y)] = aE[g(X)] + bE[h(Y)] (Linearity of the expectation)

  2. If X and Y are independent, then E[g(X)h(Y)] = E[g(X)] E[h(Y)]

Proof of 1.

$E[ag(X) + bh(Y)] = \sum_x \sum_y \big(a g(x) + b h(y)\big) f(x, y) = a\sum_x \sum_y g(x) f(x, y) + b\sum_x \sum_y h(y) f(x, y) = aE[g(X)] + bE[h(Y)]$
The expectation of the linear combination of g(X) and h(Y) is the linear combination of E[g(X)] and E[h(Y)].

Proof of 2.

$E[g(X)h(Y)] = \sum_x \sum_y g(x) h(y) f(x, y) = \sum_x \sum_y g(x) h(y) f_X(x) f_Y(y) \text{ (by independence)} = \sum_x g(x) f_X(x) \sum_y h(y) f_Y(y) = E[g(X)]E[h(Y)]$
The expectation of the product of g(X) and h(Y) is the product of the expectations E[g(X)] and E[h(Y)]. We will take a look at examples in the last section.

Moment generating function

In order to measure the center µ and the dispersion σ² of the distribution, we need E[X] and E[X²], which we call the first moment and the second moment of the distribution.

Definition 13 The two types of moments of a rv X are defined by

$m_n = E[X^n],$
where m₁ = µ, the population mean, and
$\mu_n = E[(X - \mu)^n],$
where the variance is σ² = µ₂, the skewness of the distribution is µ₃/σ³, and the kurtosis, the thickness of the distribution tails, is µ₄/σ⁴ - 3.

We define a moment generating function from which we can generate moments of the distribution.

Definition 14 The moment generating function (MGF) of a rv X is given by

$M_X(t) = E[e^{tX}]$
Then
$M_X'(t) = E[(e^{tX})'] = E[Xe^{tX}], \quad M_X'(0) = E[X]$
Similarly, the nth derivative of the MGF at 0 generates the nth moment of X:
$M_X^{(n)}(0) = E[X^n]$
Since an MGF corresponds to a rv just as a pdf does, the MGF uniquely determines the distribution of the rv.

Example 16 the MGF of a discrete rv

Let us consider the previous rv X with pdf f(-1) = 0.2, f(0) = 0.6, f(1) = 0.2.

Then its MGF is

$M_X(t) = 0.2e^{-t} + 0.6 + 0.2e^{t}$
and
$M_X'(t) = -0.2e^{-t} + 0.2e^{t}, \quad M_X'(0) = E[X] = 0$
$M_X''(t) = 0.2e^{-t} + 0.2e^{t}, \quad M_X''(0) = E[X^2] = 0.4, \quad Var(X) = 0.4$
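The same moments can be generated by differentiating the MGF symbolically; the sketch below assumes the SymPy package is available:

```python
import sympy as sp

t = sp.symbols('t')
M = 0.2 * sp.exp(-t) + 0.6 + 0.2 * sp.exp(t)     # MGF of Example 16

m1 = sp.diff(M, t).subs(t, 0)                    # M'(0)  = E[X]   = 0
m2 = sp.diff(M, t, 2).subs(t, 0)                 # M''(0) = E[X^2] = 0.4
print(m1, m2, m2 - m1**2)                        # mean, second moment, variance
```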

Example 17 the MGF of a continuous rv

Suppose a rv X has the pdf f(x) = $e^{-x}$, x > 0. Then, for t < 1, its MGF is

$M_X(t) = \int_0^\infty e^{-x}e^{tx}\,dx = \int_0^\infty e^{-(1-t)x}\,dx = \left[-\frac{e^{-(1-t)x}}{1-t}\right]_0^\infty = -\lim_{x \to \infty}\frac{e^{-(1-t)x}}{1-t} + \frac{1}{1-t} = \frac{1}{1-t}$
since $\lim_{x \to \infty} e^{-(1-t)x} = 0$ for t < 1. After differentiating $M_X(t)$, we get
$M_X'(t) = \frac{1}{(1-t)^2}, \quad M_X''(t) = \frac{2}{(1-t)^3}$
$M_X'(0) = E[X] = 1, \quad M_X''(0) = E[X^2] = 2$
Therefore,
µ = E[X] = 1, σ² = Var(X) = E[X²] - (E[X])² = 1
For the calculation, the readers should be familiar with the following integral and derivative: $\int e^{ax}\,dx = \frac{1}{a}e^{ax} + C$ and $(f/g)' = (f'g - fg')/g^2$.

Correlation coefficient

Two random variables X and Y can be correlated. For example, Y may increase as X increases, or vice versa. As measures of their relationship, the covariance and the correlation coefficient are often used.

Definition 15 The covariance of rvs X and Y is defined by

Cov(X, Y) = E[(X - E[X])(Y - E[Y])] = E[XY] - E[X]E[Y]
If Y tends to increase as X increases, then the covariance is positive. If Y tends to decrease as X increases, then the covariance is negative. If X and Y are independent, then E[XY] = E[X]E[Y], and thus Cov(X, Y) = 0.

Theorem 7 Let X and Y be two random variables. Then

Var(X + Y) = Var(X) + Var(Y) + 2Cov(X, Y)
The readers are recommended to prove this theorem. If X and Y are independent, then
Var(X + Y) = Var(X) + Var(Y)

Note that the magnitude of the covariance depends on the measurement units of X and Y; the same data yield a numerically larger covariance when measured in cm than in km. In order to make the covariance free of measurement units, we divide it by the standard deviations of the two random variables.

Definition 16 The correlation coefficient of the two random variables X and Y is defined by

$\rho(X, Y) = \frac{Cov(X, Y)}{\sigma_X \sigma_Y}, \quad -1 \le \rho \le 1$
Then ρ does not depend on the measurement units and ρ = 0 if X and Y are independent.

Example 18 The correlation coefficient between two discrete random variables

Let us go back to Example 11 of flipping two coins.

X = the number of heads in the first flip
Y = the number of heads in the two flips

Then,

$E[X] = 0\cdot\tfrac{1}{2} + 1\cdot\tfrac{1}{2} = \tfrac{1}{2}, \quad E[X^2] = \tfrac{1}{2}, \quad Var(X) = \tfrac{1}{4}$
$E[Y] = 0\cdot\tfrac{1}{4} + 1\cdot\tfrac{1}{2} + 2\cdot\tfrac{1}{4} = 1, \quad E[Y^2] = \tfrac{3}{2}, \quad Var(Y) = \tfrac{1}{2}$
$E[XY] = 1\cdot 1\cdot\tfrac{1}{4} + 1\cdot 2\cdot\tfrac{1}{4} = \tfrac{3}{4}$
$Cov(X, Y) = \tfrac{3}{4} - \tfrac{1}{2}\cdot 1 = \tfrac{1}{4}$
Therefore
$\rho(X, Y) = \frac{1/4}{\sqrt{1/4}\sqrt{1/2}} = \frac{\sqrt{2}}{2} = \frac{1.4142}{2} \approx 0.707$
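Reusing the joint pdf of Example 11, the covariance and the correlation coefficient can be verified numerically; the Python sketch below is only a check of the arithmetic:

```python
from fractions import Fraction
import math

# joint pdf of (X, Y) from Example 11 (two coin flips); missing cells have probability 0
joint = {(0, 0): Fraction(1, 4), (0, 1): Fraction(1, 4),
         (1, 1): Fraction(1, 4), (1, 2): Fraction(1, 4)}

ex = sum(x * p for (x, _), p in joint.items())               # E[X]  = 1/2
ey = sum(y * p for (_, y), p in joint.items())               # E[Y]  = 1
exy = sum(x * y * p for (x, y), p in joint.items())          # E[XY] = 3/4
vx = sum(x**2 * p for (x, _), p in joint.items()) - ex**2    # Var(X) = 1/4
vy = sum(y**2 * p for (_, y), p in joint.items()) - ey**2    # Var(Y) = 1/2
cov = exy - ex * ey                                          # Cov(X, Y) = 1/4
print(cov, cov / math.sqrt(vx * vy))                         # 1/4, 0.7071...
```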
The following example may be challenging and is intended for readers who are familiar with double integrals.

Example 19 The correlation coefficient between two continuous random variables

Let f(x, y) = 2 for x ≥ 0, y ≥ 0, and x + y ≤ 1. Then,

P(0 ≤ X ≤ 1/2, 0 ≤ Y ≤ 1/2) = 1/2
$f_X(x) = \int_0^{1-x} 2\,dy = 2(1 - x), \quad 0 < x < 1$
$E[X] = \int_0^1 2x(1 - x)\,dx = 1/3, \quad E[X^2] = \int_0^1 2x^2(1 - x)\,dx = 1/6$
Thus
Var(X) = 1/18
Similarly,
fY(y) = 2(1 - y), 0 < y < 1, E[Y] = 1/3, E[Y²] = 1/6, Var(Y) = 1/18
Also,
$E[XY] = \int_0^1 \int_0^{1-y} 2xy\,dx\,dy = 1/12$
Cov(X, Y) = E[XY] - E[X]E[Y] = -1/36
Therefore,
$\rho = \frac{-1/36}{\sqrt{1/18}\sqrt{1/18}} = -\frac{1}{2}$
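For readers who want to double-check these double integrals, the sketch below does the same computation symbolically (assuming the SymPy package; E[Y] and Var(Y) equal their X counterparts by symmetry):

```python
import sympy as sp

x, y = sp.symbols('x y')
f = 2                                                # joint pdf on the triangle x >= 0, y >= 0, x + y <= 1

fX = sp.integrate(f, (y, 0, 1 - x))                  # marginal f_X(x) = 2(1 - x)
EX = sp.integrate(x * fX, (x, 0, 1))                 # E[X] = 1/3
VarX = sp.integrate(x**2 * fX, (x, 0, 1)) - EX**2    # Var(X) = 1/18
EXY = sp.integrate(x * y * f, (x, 0, 1 - y), (y, 0, 1))   # E[XY] = 1/12
Cov = EXY - EX * EX                                  # Cov(X, Y) = -1/36, using E[Y] = E[X]
rho = Cov / sp.sqrt(VarX * VarX)                     # rho = -1/2, using Var(Y) = Var(X)
print(EX, VarX, EXY, Cov, rho)
```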

Discussion

Since the contents of this tutorial are limited and more like a lecture note, readers are strongly encouraged to read the references for further material. Hopefully, pharmacometricians who read this tutorial will become more familiar with statistical notation and probability theory.

Notes

This work was supported by the 2016 Hongik University Academic Research Support Fund.

Conflict of interest: The author has no conflict of interest.

References

    1. Hogg RV, Craig AT. Introduction to Mathematical Statistics. 4th ed. Macmillan Publishing Company; 1978.
    2. Hogg RV, Tanis EA. Probability and Statistical Inference. 3rd ed. Macmillan Publishing Company; 1983.
    3. Kim WC, et al. Introduction to Statistics. 4th ed. Seoul: Youngchi; 2005.
    4. Mood AM, Graybill FA, Boes DC. Introduction to the Theory of Statistics. 3rd ed. New York: McGraw-Hill; 1974.
    5. Peebles PZ Jr. Probability, Random Variables, and Random Signal Principles. 4th ed. Korea: McGraw-Hill; 2001.
    6. Rice JA. Mathematical Statistics and Data Analysis. 3rd ed. Brooks/Cole Cengage Learning; 2007.
    7. Rosner B. Fundamentals of Biostatistics. 6th ed. 2006.
    8. Ross S. A First Course in Probability and Statistical Inference. 8th ed. New Jersey: Pearson Education; 2010.
