1 Introduction

In probability theory and statistics, Gaussian processes are used in connection with problems such as estimation, detection, stochastic analysis, and modern statistical and machine learning. These problems are often effectively formulated in terms of Gaussian measures on an appropriate real-valued function space. As an important extension, vector-valued Gaussian processes have not been adequately explained in the literature, and the restriction to real-valued function spaces limits the applications of classical Gaussian processes, since the correlation between multiple random responses is difficult to take into account in many problems. This paper therefore consolidates the fundamentals of vector-valued stochastic processes, extending the classical Gaussian process to the multivariate Gaussian process by introducing Gaussian measures on vector-valued function spaces. The main contribution of this paper is a concise proof of existence and a proper explanation of the multivariate Gaussian process, followed by examples and applications such as the multivariate Brownian motion and multivariate Gaussian process regression.

The paper is organised as follows. Section 2 introduces some preliminaries, including classical Gaussian measures, the matrix-variate Gaussian distribution, stationary processes, and the classical Gaussian process. Section 3 presents the theoretical definition of the multivariate Gaussian process together with a proof of its existence. Two examples and one application of multivariate Gaussian processes, which show their usefulness, are presented in Sect. 4 and Sect. 5. Conclusions and a discussion are given in Sect. 6.

2 Preliminary

2.1 Gaussian measure

Definition 1

(Gaussian measure on \({\mathbb {R}}\) [1]) Let \({\mathcal {B}}({\mathbb {R}})\) denote the completion of the Borel \(\sigma \)-algebra on \({\mathbb {R}}\). Let \(\lambda : {\mathcal {B}}({\mathbb {R}}) \mapsto [0, + \infty ]\) denote the usual Lebesgue measure. Then a Borel probability measure \(\gamma : {\mathcal {B}}({\mathbb {R}}) \mapsto [0, 1] \) is Gaussian with mean \(\mu \in {\mathbb {R}}\) and variance \(\sigma ^2 >0\) if

$$\begin{aligned} \gamma (A) = \int _{A} \frac{1}{\sqrt{2\pi \sigma ^2}}\exp \left( -\frac{(x - \mu )^2}{2\sigma ^2} \right) \textrm{d}\lambda (x), \end{aligned}$$

for any measurable set \(A \in {\mathcal {B}}({\mathbb {R}})\).

A random variable X on a probability space \((\Omega , {\mathcal {B}}, {\mathcal {P}})\) is Gaussian with mean \(\mu \) and variance \(\sigma ^2\) if its distribution measure is Gaussian, i.e.

$$\begin{aligned} {\mathcal {P}}(X \in A) = \gamma (A). \end{aligned}$$

In terms of random vectors, we have the following definition of a Gaussian random vector.

Definition 2

An n-dimensional random vector \(\varvec{X} = (X_1, \cdots , X_n)\) is Gaussian if and only if \(\langle \varvec{a}, \varvec{X} \rangle : = \varvec{a}^{{\textsf{T}}}\varvec{X} = \sum a_i X_i\) is a Gaussian random variable for all \(\varvec{a} = (a_1, \cdots , a_n) \in {\mathbb {R}}^n\).

In terms of measures, we naturally obtain the definition of a Gaussian measure on \({\mathbb {R}}^n\).

Definition 3

(Gaussian measure on \({\mathbb {R}}^n\) [1]) Let \(\gamma \) be a Borel probability measure on \({\mathbb {R}}^n\). For each \(\varvec{a} \in {\mathbb {R}}^n\), denote a random variable \(Y(\varvec{x}), \varvec{x} \in {\mathbb {R}}^n \) as a mapping \(\varvec{x} \mapsto \langle \varvec{a}, \varvec{x} \rangle \in {\mathbb {R}}\) on the probability space \(({\mathbb {R}}^n, {\mathcal {B}}({\mathbb {R}}^n), \gamma )\). The Borel probability measure \(\gamma \) is a Gaussian measure on \({\mathbb {R}}^n\) if and only if the random variable Y is Gaussian for each \(\varvec{a}\).

2.2 Matrix-variate Gaussian distribution

In statistics, the matrix-variate Gaussian distribution is a probability distribution that is a generalization of the multivariate normal distribution to matrix-valued random variables.

Definition 4

(Matrix-variate Gaussian distribution [2]) A random matrix \(\varvec{X} \in {\mathbb {R}}^{n \times d}\) is said to follow the matrix-variate Gaussian distribution with mean matrix \(M \in {\mathbb {R}}^{n \times d}\), column covariance matrix \(U \in {\mathbb {R}}^{n \times n}\) and row covariance matrix \(V \in {\mathbb {R}}^{d \times d}\), denoted

$$\begin{aligned} \varvec{X} \sim \mathcal{M}\mathcal{N}_{n, d}(M, U, V), \end{aligned}$$

if and only if

$$\begin{aligned} \textrm{vec}(\varvec{X}) \sim {\mathcal {N}}_{nd}(\textrm{vec}(M), V \otimes U), \end{aligned}$$

where \(\otimes \) denotes the Kronecker product and \(\textrm{vec}(\varvec{X})\) denotes the vectorisation of \(\varvec{X}\).
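To make the vec/Kronecker characterisation concrete, the following minimal sketch (our own illustration in Python/NumPy; the helper name `sample_matrix_normal` and the parameter values are ours, not from the source) draws a sample \(\varvec{X} \sim \mathcal{M}\mathcal{N}_{n, d}(M, U, V)\) by sampling \(\textrm{vec}(\varvec{X}) \sim {\mathcal {N}}_{nd}(\textrm{vec}(M), V \otimes U)\) with column-stacking vectorisation.

```python
import numpy as np

def sample_matrix_normal(M, U, V, rng):
    """Draw X ~ MN_{n,d}(M, U, V) via vec(X) ~ N(vec(M), V kron U), with column-stacking vec."""
    n, d = M.shape
    cov = np.kron(V, U)                                      # (nd) x (nd) covariance of vec(X)
    vec_x = rng.multivariate_normal(M.flatten(order="F"), cov)
    return vec_x.reshape((n, d), order="F")                  # undo the column-stacking

rng = np.random.default_rng(0)
n, d = 4, 2
M = np.zeros((n, d))
U = np.eye(n)                                                # n x n covariance over the rows of X
V = np.array([[1.0, 0.6], [0.6, 1.0]])                       # d x d covariance over the columns of X
X = sample_matrix_normal(M, U, V, rng)
print(X.shape)                                               # (4, 2)
```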

2.3 Stationary process

Definition 5

(Stationary process [3]) Let \(X = \{ X_n(\omega ) \}\) be a stochastic process on a probability space \((\Omega , {\mathcal {B}}, {\mathcal {P}})\). If for any integers n, \(k \ge 1\) and h, the two random vectors

$$\begin{aligned} (X_n, X_{n+1}, \ldots , X_{n+k-1}) \quad \text {and} \quad (X_{n+h}, X_{n+h+1}, \ldots , X_{n+h+k-1}) \end{aligned}$$

have the same probability distribution, then X is said to be a stationary process.

2.4 Gaussian process

Let \({\mathbb {R}}_T\) be the function space of all \({\mathbb {R}}\)-valued functions on T. Consider the product topology on \({\mathbb {R}}_T\), which is defined as the smallest topology that makes the projection maps \(\Pi _{t_{1}, \ldots , t_{n}}(f)=\left[ f\left( t_{1}\right) , \ldots , f\left( t_{n}\right) \right] \) from \({\mathbb {R}}_T\) to \({\mathbb {R}}^n\) continuous, and define \({\mathcal {F}}\) as the Borel \(\sigma \)-algebra of this topology.

Definition 6

(Gaussian measure on \(({\mathbb {R}}_T, {\mathcal {F}})\)) A measure \(\gamma \) on \(({\mathbb {R}}_T, {\mathcal {F}})\) is called a Gaussian measure if for any \(n \ge 1\) and \(t_1, \cdots , t_n \in T\), the push-forward measure \(\gamma \circ \Pi _{t_1, \cdots , t_n}^{-1}\) on \({\mathbb {R}}^n\) is a Gaussian measure.

Briefly speaking, (multivariate) Gaussian distributions are Gaussian measures on \({\mathbb {R}}^n\), and Gaussian processes are Gaussian measures on the function space \(({\mathbb {R}}_T, {\mathcal {F}})\) due to the following theorem.

Theorem 1

(Relationship between Gaussian process and Gaussian measure) If \(X=(X_t)_{t \in T}\) is a Gaussian process, then the push-forward measure \(\gamma = {\mathcal {P}} \circ X^{-1}\) with \(X: \Omega \mapsto {\mathbb {R}}_{T}\) is Gaussian on \({\mathbb {R}}_{T}\), namely, \(\gamma \) is a Gaussian measure on \(({\mathbb {R}}_T, {\mathcal {F}})\). Conversely, if \(\gamma \) is a Gaussian measure on \(({\mathbb {R}}_T, {\mathcal {F}})\), then on the probability space \(({\mathbb {R}}_T, {\mathcal {F}}, \gamma )\), the co-ordinate random variable \(\Pi = (\Pi _t)_{t \in T}\) is from a Gaussian process.

The proof of the relationship between Gaussian process and Gaussian measure can be found in [4]. Motivated by the relationship between Gaussian measures and Gaussian processes, we properly define multivariate Gaussian processes by extending Gaussian measures on real-valued function space to vector-valued function space.

3 Multivariate Gaussian process

Gaussian processes (GPs) have been proven to be an effective statistical learning method for nonlinear problems due to many desirable properties, such as a clear structure with a Bayesian interpretation, a simple integrated approach to obtaining and expressing uncertainty in predictions, and the capability of capturing a wide variety of data features through hyper-parameters [5, 6]. With the development of machine learning algorithms related to Gaussian processes, Gaussian process applications face a conspicuous limitation. Classical GP models can only be used to deal with a single output or single response, because the process itself is defined on \({\mathbb {R}}\), and as a result the correlation between multiple tasks or responses cannot be taken into consideration [6, 7]. In order to overcome this drawback, [8] proposed a framework to perform multi-output prediction using multivariate Gaussian processes (MV-GPs). Although [8] showed the usefulness of the proposed methods via data-driven examples, some theoretical issues of multivariate Gaussian processes, such as existence, the independence condition, and the stationarity condition, are still not clear.

3.1 Definitions

Following the classical theory of Gaussian measure, matrix-variate Gaussian distribution, and Gaussian process, we introduce Gaussian measure on matrix space and Gaussian measure on \({\mathbb {R}}^d\)-valued function space, and finally define the multivariate Gaussian process.

According to Definition 3 (the equivalence between a Gaussian measure on \({\mathbb {R}}^n\) and a Gaussian random vector on \({\mathbb {R}}^n\)) and Definition 4 (the equivalence between a random vector on \({\mathbb {R}}^{nd}\) and a random matrix on \({\mathbb {R}}^{n \times d}\)), we can give a definition of the Gaussian measure on \({\mathbb {R}}^{n \times d}\).

Definition 7

(Gaussian measure on \({\mathbb {R}}^{n \times d}\)) Let \(\gamma \) be a Borel probability measure on \({\mathbb {R}}^{n \times d}\). For each \(\varvec{a} \in {\mathbb {R}}^{nd}\), denote a random variable \(Y(\varvec{x}), \varvec{x} \in {\mathbb {R}}^{n \times d}\) as a mapping \(\varvec{x} \mapsto \langle \varvec{a}, \textrm{vec}(\varvec{x}) \rangle \in {\mathbb {R}}\) on the probability space \(({\mathbb {R}}^{n \times d}, {\mathcal {B}}({\mathbb {R}}^{n \times d}), \gamma )\). The Borel probability measure \(\gamma \) is a Gaussian measure on \({\mathbb {R}}^{n \times d}\) if and only if the random variable Y is Gaussian for each \(\varvec{a}\).

Let \(({\mathbb {R}}^{d})_T\) be the space of all \({\mathbb {R}}^{d}\)-valued functions on T and consider the smallest topology on \(({\mathbb {R}}^{d})_T\) that makes the projection mappings \(\Xi _{t_{1}, \ldots , t_{n}}(\varvec{f})=\left[ \varvec{f}\left( t_{1}\right) , \ldots , \varvec{f}\left( t_{n}\right) \right] \) from \(({\mathbb {R}}^{d})_T\) to \({\mathbb {R}}^{n \times d}\) continuous, and define \({\mathcal {G}}\) as the Borel \(\sigma \)-algebra of this topology. Thus we can give a definition of the Gaussian measure on an \({\mathbb {R}}^d\)-valued function space.

Definition 8

(Gaussian measure on \((({\mathbb {R}}^d)_T, {\mathcal {G}})\)) A measure \(\gamma \) on \((({\mathbb {R}}^d)_T, {\mathcal {G}})\) is called as a Gaussian measure if for any \(n \ge 1\) and \(t_1, \cdots , t_n \in T\), the push-forward measure \(\gamma \circ \Xi _{t_1, \cdots , t_n}^{-1}\) on \({\mathbb {R}}^{n \times d}\) is a Gaussian measure.

Inspired by the relationship between Gaussian process and Gaussian measure in Theorem 1, we can appropriately define multivariate Gaussian processes (MV-GPs).

Definition 9

(d-variate Gaussian process) Given a Gaussian measure \(\gamma \) on \((({\mathbb {R}}^d)_T, {\mathcal {G}})\), \(d \ge 1\), the co-ordinate random vector \(\Xi = (\Xi _t)_{t \in T}\) on the probability space \((({\mathbb {R}}^d)_T, {\mathcal {G}}, \gamma )\) is said to be from a d-variate Gaussian process.

Given the definition of MV-GPs, it is essential to show their existence before proceeding with further research. Inspired by the fact that a GP can be specified by its mean function and covariance function (and thus denoted \(\mathcal{G}\mathcal{P}(\mu , k)\)), we prove the existence of MV-GPs by applying the Daniell-Kolmogorov theorem.

Theorem 2

(Existence of d-variate Gaussian process) For any index set T, any vector-valued mean function \(\varvec{u} : T \mapsto {\mathbb {R}}^{1 \times d}\), any covariance function \(k: T \times T \mapsto {\mathbb {R}}\) and any positive semi-definite parameter matrix \(\Lambda \in {\mathbb {R}}^{d \times d}\), there exists a probability space \((\Omega , {\mathcal {G}}, {\mathcal {P}})\) and a d-variate Gaussian process \(\varvec{f}(t) \in {\mathbb {R}}^{1 \times d}, t \in T\), on this space, whose mean function is \(\varvec{u}\), covariance function is k and parameter matrix is \(\Lambda \), such that,

  • \(\mathbb {E}[\varvec{f}(t)] = \varvec{u}(t), \quad \forall t \in T\),

  • \(\mathbb {E}\left[ (\varvec{f}(t_s)- \varvec{u}(t_s))(\varvec{f}(t_l)- \varvec{u}(t_l))^{{\textsf{T}}}\right] = \textrm{tr}(\Lambda )k(t_s,t_l),\quad \forall t_s,t_l \in T\),

  • \(\mathbb {E}[({\textbf {F}}_{t_1, \cdots , t_n} - M_{t_1, \cdots , t_n})^{\textsf{T}}({\textbf {F}}_{t_1, \cdots , t_n} - M_{t_1, \cdots , t_n}) ] = \textrm{tr}(K_{t_1, \cdots , t_n}) \Lambda , \quad \forall n\ge 1, t_1, \cdots , t_n \in T\), where

    $$\begin{aligned} M_{t_1, \cdots , t_n}&= [\varvec{u}(t_1)^{\textsf{T}}, \cdots , \varvec{u}(t_n)^{\textsf{T}}]^{\textsf{T}}\\ F_{t_1, \cdots , t_n}&= [\varvec{f}(t_1)^{\textsf{T}}, \cdots , \varvec{f}(t_n)^{\textsf{T}}]^{\textsf{T}}\\ K_{t_1, \cdots , t_n}&= \begin{bmatrix} k(t_1, t_1) &{} \cdots &{} k(t_1, t_n) \\ \vdots &{} \ddots &{} \vdots \\ k(t_n, t_1) &{} \cdots &{} k(t_n, t_n) \end{bmatrix}. \end{aligned}$$

We denote \(\varvec{f} \sim \mathcal {MGP}_d(\varvec{u}, k, \Lambda )\).

Proof

Given \(n > 1\), for every \(t_1, \cdots , t_n \in T\), the Gaussian measure \(\gamma _{t_1, \cdots , t_n}\) on \({\mathbb {R}}^{n \times d}\) satisfies the consistency assumptions of the Daniell-Kolmogorov theorem: the projection of the matrix Gaussian distribution on \({\mathbb {R}}^{n \times d}\) with mean \(\left[ \varvec{u}(t_1)^{{\textsf{T}}}, \cdots , \varvec{u}(t_n)^{{\textsf{T}}}\right] ^{{\textsf{T}}} \in {\mathbb {R}}^{n \times d}\), \(n \times n\) column covariance matrix \(K = (k(t_i,t_j)) \in {\mathbb {R}}^{n \times n}\), and \(d \times d\) row covariance matrix \(\Lambda \in {\mathbb {R}}^{d \times d}\), onto the first \(n-1\) co-ordinates is precisely the matrix Gaussian distribution with mean \(\left[ \varvec{u}(t_1)^{{\textsf{T}}}, \cdots , \varvec{u}(t_{n-1})^{{\textsf{T}}} \right] ^{{\textsf{T}}} \in {\mathbb {R}}^{(n-1) \times d}\), \((n-1) \times (n-1)\) column covariance matrix \((k(t_i,t_j)) \in {\mathbb {R}}^{(n-1) \times (n-1)}\), and row covariance matrix \(\Lambda \in {\mathbb {R}}^{d \times d}\), due to the marginalisation property of the matrix Gaussian distribution [2, 8]; consistency under permutations of \(t_1, \cdots , t_n\) is checked similarly. By the Daniell-Kolmogorov theorem, there exists a probability space \((\Omega , {\mathcal {G}}, {\mathcal {P}})\) as well as a d-variate Gaussian process \(X = (X_t)_{t \in T} \sim \mathcal {MGP}_d(\varvec{u}, k, \Lambda )\) defined on this space such that every finite dimensional distribution of \([X_{t_1}, \cdots , X_{t_n}]\) is given by the measure \(\gamma _{t_1, \cdots , t_n}\). \(\square \)
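To make the moment identities of Theorem 2 concrete, the following Monte Carlo sketch (our own illustration; the kernel and the parameter matrix \(\Lambda \) are arbitrary choices) draws finite-dimensional samples \({\textbf {F}} \sim \mathcal{M}\mathcal{N}(M, K, \Lambda )\) via the factorisation \({\textbf {F}} = M + A Z B^{{\textsf{T}}}\) with \(AA^{{\textsf{T}}} = K\), \(BB^{{\textsf{T}}} = \Lambda \) and Z standard normal, and compares the empirical moments with \(\textrm{tr}(K)\Lambda \) and \(\textrm{tr}(\Lambda )k(t_s,t_l)\).

```python
import numpy as np

# Monte Carlo check of the moment identities in Theorem 2 (illustrative values only).
rng = np.random.default_rng(1)
t = np.array([0.0, 0.5, 1.2])                                    # index points t_1, ..., t_n
n, d = len(t), 2
k = lambda s, l: 1.5 * np.exp(-(s - l) ** 2 / (2 * 0.5 ** 2))    # covariance function
K = k(t[:, None], t[None, :]) + 1e-10 * np.eye(n)                # n x n column covariance
Lam = np.array([[1.0, 0.4], [0.4, 0.5]])                         # d x d parameter matrix
A, B = np.linalg.cholesky(K), np.linalg.cholesky(Lam)

# F = A Z B^T with Z standard normal gives F ~ MN(0, K, Lambda); the mean is zero here.
samples = np.stack([A @ rng.standard_normal((n, d)) @ B.T for _ in range(20000)])

# E[(F - M)^T (F - M)] = tr(K) * Lambda
emp = np.einsum("sia,sib->ab", samples, samples) / len(samples)
print(np.round(emp, 2), np.round(np.trace(K) * Lam, 2))

# E[(f(t_s) - u(t_s)) (f(t_l) - u(t_l))^T] = tr(Lambda) * k(t_s, t_l)   (a scalar)
s_idx, l_idx = 0, 2
emp_sl = np.mean(np.sum(samples[:, s_idx, :] * samples[:, l_idx, :], axis=1))
print(round(emp_sl, 3), round(np.trace(Lam) * K[s_idx, l_idx], 3))
```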

3.2 Properties

Following the existence of d-variate GPs, we also establish some properties as follows.

Proposition 3

(Stationarity) A d-variate Gaussian process \(\mathcal {MGP}_d(\varvec{u}, k, \Lambda )\) is stationary if

$$\begin{aligned} \varvec{u}(t) = \varvec{u}(t + h), \quad k(t_s + h, t_l + h) = k(t_s, t_l), \quad \forall t, t_s, t_l, h \in T. \end{aligned}$$

Proof

Let \(\varvec{f} \sim \mathcal {MGP}_d(\varvec{u}, k, \Lambda )\); then for all \(n\ge 1\) and \(t_1, \cdots , t_n \in T\),

$$\begin{aligned}{}[\varvec{f}(t_1)^{{\textsf{T}}},\ldots ,\varvec{f}(t_n)^{{\textsf{T}}}]^{{\textsf{T}}} \sim \mathcal{M}\mathcal{N}(\varvec{u}_{t_1, \cdots , t_n},K_{t_1, \cdots , t_n},\Lambda ), \end{aligned}$$

where \(\mathcal{M}\mathcal{N}(\cdot , \cdot , \cdot )\) is matrix-variate Gaussian distribution,

$$\begin{aligned} \varvec{u}_{t_1, \cdots , t_n} = \begin{bmatrix} \varvec{u} (t_1) \\ \vdots \\ \varvec{u} (t_n) \end{bmatrix} , \quad K_{t_1, \cdots , t_n} = \begin{bmatrix} k(t_1, t_1) &{} \cdots &{} k(t_1, t_n) \\ \vdots &{} \ddots &{} \vdots \\ k(t_n, t_1) &{} \cdots &{} k(t_n, t_n) \end{bmatrix}. \end{aligned}$$

Given any time increment \(h \in T\), we also have

$$\begin{aligned}{}[\varvec{f}(t_1 + h)^{{\textsf{T}}},\ldots ,\varvec{f}(t_n + h)^{{\textsf{T}}}]^{{\textsf{T}}} \sim \mathcal{M}\mathcal{N}(\varvec{u}_{t_1+h, \cdots , t_n+h},K_{t_1+h, \cdots , t_n+h},\Lambda ), \end{aligned}$$

where

$$\begin{aligned} \varvec{u}_{t_1+h, \cdots , t_n + h}&= \begin{bmatrix} \varvec{u}(t_1 +h ) \\ \vdots \\ \varvec{u} (t_n +h ) \end{bmatrix} ,\\ K_{t_1 + h, \cdots , t_n + h}&= \begin{bmatrix} k(t_1 + h, t_1 + h) &{} \cdots &{} k(t_1 + h, t_n + h) \\ \vdots &{} \ddots &{} \vdots \\ k(t_n + h, t_1 + h) &{} \cdots &{} k(t_n + h, t_n + h) \end{bmatrix}. \end{aligned}$$

Since \(\varvec{u}(t) = \varvec{u}(t + h)\) and \(k(t_s + h, t_l + h) = k(t_s, t_l)\) for all \(t, t_s, t_l, h \in T\),

$$\begin{aligned} \varvec{u}_{t_1+h, \cdots , t_n + h}&= \begin{bmatrix} \varvec{u}(t_1 +h) \\ \vdots \\ \varvec{u} (t_n +h) \end{bmatrix} =\begin{bmatrix} \varvec{u} (t_1) \\ \vdots \\ \varvec{u} (t_n) \end{bmatrix} = \varvec{u}_{t_1, \cdots , t_n }, \\ K_{t_1 + h, \cdots , t_n + h}&= \begin{bmatrix} k(t_1 + h, t_1 + h) &{} \cdots &{} k(t_1 + h, t_n + h) \\ \vdots &{} \ddots &{} \vdots \\ k(t_n + h, t_1 + h) &{} \cdots &{} k(t_n + h, t_n + h) \end{bmatrix} =\begin{bmatrix} k(t_1 , t_1) &{} \cdots &{} k(t_1, t_n) \\ \vdots &{} \ddots &{} \vdots \\ k(t_n, t_1) &{} \cdots &{} k(t_n, t_n) \end{bmatrix} \\&= K_{t_1, \cdots , t_n}. \end{aligned}$$

Therefore, \([\varvec{f}(t_1 + h)^{{\textsf{T}}},\ldots ,\varvec{f}(t_n + h)^{{\textsf{T}}}]^{{\textsf{T}}} \) has the same probability distribution as \([\varvec{f}(t_1)^{{\textsf{T}}},\ldots ,\varvec{f}(t_n)^{{\textsf{T}}}]^{{\textsf{T}}} \). Since the choice of \(n \ge 1\) and \(t_1, \cdots , t_n \in T\) was arbitrary, \(\varvec{f} \sim \mathcal {MGP}_d(\varvec{u}, k, \Lambda )\) is a stationary process according to Definition 5. \(\square \)

Proposition 4

(Independence) A collection of d functions \(\{\varvec{f}_i\}_{i = 1,2, \cdots , d}\) independently and identically follows a Gaussian process \(\mathcal{G}\mathcal{P}(\mu , k)\) if and only if

$$\begin{aligned} \varvec{f} = [\varvec{f}_1, \varvec{f}_2, \cdots , \varvec{f}_d] \sim \mathcal {MGP}_d(\varvec{u}, k, \Lambda ), \end{aligned}$$

where \(\varvec{u} = [\mu , \cdots , \mu ]: T \mapsto {\mathbb {R}}^{1 \times d}\) and \(\Lambda \) is any diagonal positive semi-definite matrix.

Proof

Necessity: if \(\varvec{f} \sim \mathcal {MGP}_d(\varvec{u}, k, \Lambda )\) with diagonal \(\Lambda \), then for all \(n\ge 1\) and \(t_1, \cdots , t_n \in T\),

$$\begin{aligned}{}[\varvec{f}(t_1)^{{\textsf{T}}},\ldots ,\varvec{f}(t_n)^{{\textsf{T}}}]^{{\textsf{T}}} \sim \mathcal{M}\mathcal{N}(\varvec{u}_{t_1, \cdots , t_n},K_{t_1, \cdots , t_n},\Lambda ), \end{aligned}$$

where,

$$\begin{aligned} \varvec{u}_{t_1, \cdots , t_n} = \begin{bmatrix} \varvec{u} (t_1) \\ \vdots \\ \varvec{u} (t_n) \end{bmatrix} , \quad K_{t_1, \cdots , t_n} = \begin{bmatrix} k(t_1, t_1) &{} \cdots &{} k(t_1, t_n) \\ \vdots &{} \ddots &{} \vdots \\ k(t_n, t_1) &{} \cdots &{} k(t_n, t_n) \end{bmatrix}. \end{aligned}$$

Rewriting the left-hand side, we obtain

$$\begin{aligned}{}[\varvec{\xi }_1, \varvec{\xi }_2, \cdots , \varvec{\xi }_d] \sim \mathcal{M}\mathcal{N}(\varvec{u}_{t_1, \cdots , t_n},K_{t_1, \cdots , t_n},\Lambda ), \end{aligned}$$

where \(\varvec{\xi }_i = [f_i(t_1), f_i(t_2), \cdots , f_i(t_n)]^{{\textsf{T}}}\). Since \(\Lambda \) is a diagonal matrix, for any \(i \ne j\)

$$\begin{aligned} {\mathbb {E}}[(\varvec{\xi }_i - \varvec{m})^{{\textsf{T}}} (\varvec{\xi }_{j} - \varvec{m})] = \textrm{tr}( K_{t_1, \cdots , t_n}) \Lambda _{ij} = \textrm{tr}( K_{t_1, \cdots , t_n}) \cdot 0 = 0, \end{aligned}$$

where \(\varvec{m} = [\mu (t_1), \cdots , \mu (t_n)]^{{\textsf{T}}}\). Because \(\varvec{\xi }_i\) and \(\varvec{\xi }_j\) are arbitrary finite collections of realisations of \(\varvec{f}_i\) and \(\varvec{f}_j\) respectively, \(\varvec{f}_i\) and \(\varvec{f}_j\) are uncorrelated. Since the joint finite-dimensional distributions of \(\varvec{f}_i\) and \(\varvec{f}_j\) are Gaussian, non-correlation implies independence.

Sufficiency: if \(\{\varvec{f}_i\}_{i = 1,2, \cdots , d} \sim \mathcal{G}\mathcal{P}(\mu , k)\) are independent, then for all \(n\ge 1\), \(t_1, \cdots , t_n \in T\) and any \(i \ne j\),

$$\begin{aligned} 0 = {\mathbb {E}}[(\varvec{\xi }_i - \varvec{m})^{{\textsf{T}}} (\varvec{\xi }_{j} - \varvec{m})] = \textrm{tr}( K_{t_1, \cdots , t_n}) \Lambda _{ij}. \end{aligned}$$

Since \(\textrm{tr}( K_{t_1, \cdots , t_n})\) is non-zero, \(\Lambda _{ij}\) must be 0. Due to the arbitrary choice of i and j, \(\Lambda \) must be diagonal. That is to say, \([\varvec{\xi }_1, \cdots , \varvec{\xi }_d]\) follows a matrix Gaussian distribution \(\mathcal{M}\mathcal{N}(\varvec{u}_{t_1, \cdots , t_n}, K_{t_1, \cdots , t_n}, \Lambda )\) where \(\Lambda \) is a diagonal positive semi-definite matrix. Since the above holds for all \(n\ge 1\) and \(t_1, \cdots , t_n \in T\), \(\varvec{f} = [\varvec{f}_1, \cdots , \varvec{f}_d] \sim \mathcal {MGP}_d(\varvec{u}, k, \Lambda )\) with diagonal \(\Lambda \). \(\square \)
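As an empirical illustration of Proposition 4 (a minimal sketch with parameter values of our own choosing), a diagonal \(\Lambda \) makes the component processes uncorrelated, hence, being jointly Gaussian, independent, while an off-diagonal \(\Lambda \) induces cross-correlation.

```python
import numpy as np

# Empirical illustration of Proposition 4 (illustrative values only): estimate
# E[xi_1^T xi_2] for centred draws F = A Z B^T ~ MN(0, K, Lambda), which equals tr(K) * Lambda_12.
rng = np.random.default_rng(3)
t = np.linspace(0.0, 2.0, 5)
K = np.exp(-(t[:, None] - t[None, :]) ** 2) + 1e-8 * np.eye(len(t))
A = np.linalg.cholesky(K)

def mean_cross_product(Lam, n_samples=20000):
    B = np.linalg.cholesky(Lam)
    draws = np.stack([A @ rng.standard_normal((len(t), 2)) @ B.T for _ in range(n_samples)])
    return np.mean(np.sum(draws[:, :, 0] * draws[:, :, 1], axis=1))

print(mean_cross_product(np.diag([1.0, 0.5])))            # close to 0:  tr(K) * 0
print(mean_cross_product(np.array([[1.0, 0.8],
                                   [0.8, 1.0]])))         # close to tr(K) * 0.8
```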

4 Example: special cases

A natural special case is the centred multivariate Gaussian process, where the vector-valued mean function is \(\varvec{u} = \varvec{0}\). Fifty realisation samples generated from a centred multivariate Gaussian process are shown in Fig. 1 (left). Furthermore, we can derive the multivariate Gaussian white noise and the multivariate Brownian motion.

Fig. 1

Fifty realisation samples generated from a 2-variate Gaussian process. Left: a centred 2-variate Gaussian process with Gaussian covariance function \(k(t_s, t_l) = 1.5 \exp (-(t_s - t_l)^2/2/0.5^2)\). Right: a 2-variate Gaussian white noise, i.e. a 2-variate Gaussian process with covariance function \(k(t_s, t_l) = 1.5\delta (t_s, t_l)\)
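A minimal sketch of how sample paths like those in Fig. 1 (left) can be generated is given below; the grid, the parameter matrix \(\Lambda \) and the random seed are illustrative choices of our own.

```python
import numpy as np

# Generate 50 centred 2-variate sample paths with the Gaussian kernel of Fig. 1 (left).
rng = np.random.default_rng(2)
t = np.linspace(0.0, 5.0, 100)
n, d = len(t), 2
K = 1.5 * np.exp(-(t[:, None] - t[None, :]) ** 2 / (2 * 0.5 ** 2))   # k(t_s, t_l) from Fig. 1
K += 1e-6 * np.eye(n)                                                # jitter for a stable Cholesky
Lam = np.array([[1.0, 0.7], [0.7, 1.0]])                             # correlation between the two variates
A, B = np.linalg.cholesky(K), np.linalg.cholesky(Lam)

# Each draw A Z B^T (Z standard normal, n x d) is one centred 2-variate sample path.
paths = [A @ rng.standard_normal((n, d)) @ B.T for _ in range(50)]
print(len(paths), paths[0].shape)    # 50 paths, each of shape (100, 2)
```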

4.1 Multivariate Gaussian white noise

Let \(\varvec{f} = [\varvec{f}_1,\cdots , \varvec{f}_d ] \sim \mathcal {MGP}_d(\varvec{0}, \sigma ^2 \mathbbm {1}(t_s, t_l), \Lambda )\), where \(\mathbbm {1}(\cdot , \cdot )\) is the indicator function that equals 1 if \(t_s = t_l\) and 0 otherwise. Then

$$\begin{aligned}&{\mathbb {E}}[\varvec{f}(t)] = \varvec{0}, \quad \forall t \in T, \\&{\mathbb {E}}\left[ \varvec{f}(t_s)\varvec{f}(t_l)^{{\textsf{T}}}\right] = \textrm{tr}(\Lambda )\sigma ^2 \mathbbm {1}(t_s, t_l) = {\left\{ \begin{array}{ll} 0 &{} \text{ if } t_s \ne t_l \\ \sigma ^2\textrm{tr}(\Lambda ) &{} \text{ if } t_s = t_l \end{array}\right. } ,\quad \forall t_s,t_l \in T. \end{aligned}$$

Furthermore, for all \(n\ge 1\) and \(t_1, \cdots , t_n \in T\), we have

$$\begin{aligned} {\mathbb {E}}[{\textbf {F}}_{t_1, \cdots , t_n}^{\textsf{T}}{\textbf {F}}_{t_1, \cdots , t_n}] = \textrm{tr}(K_{t_1, \cdots , t_n}) \Lambda = \textrm{tr}(\sigma ^2 \textrm{I}_{n \times n})\Lambda = n \sigma ^2 \Lambda , \end{aligned}$$

where \(F_{t_1, \cdots , t_n} = [\varvec{f}(t_1)^{\textsf{T}}, \cdots , \varvec{f}(t_n)^{\textsf{T}}]^{\textsf{T}}\).
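As a quick numerical sanity check of the identity \({\mathbb {E}}[{\textbf {F}}^{\textsf{T}}{\textbf {F}}] = n\sigma ^2 \Lambda \) (a minimal sketch with arbitrary \(\sigma \) and \(\Lambda \) of our own choosing):

```python
import numpy as np

# Monte Carlo check of E[F^T F] = n * sigma^2 * Lambda for d-variate Gaussian white noise,
# using the factorised sampler F = A Z B^T with A A^T = K = sigma^2 I_n and B B^T = Lambda.
rng = np.random.default_rng(7)
n, d, sigma = 5, 2, 1.3
K = sigma ** 2 * np.eye(n)                                    # white-noise kernel on n distinct points
Lam = np.array([[1.0, 0.4], [0.4, 0.5]])
A, B = np.linalg.cholesky(K), np.linalg.cholesky(Lam)

samples = np.stack([A @ rng.standard_normal((n, d)) @ B.T for _ in range(20000)])
emp = np.einsum("sia,sib->ab", samples, samples) / len(samples)
print(np.round(emp, 2))
print(np.round(n * sigma ** 2 * Lam, 2))                      # the two matrices should be close
```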

Therefore, it is natural to give a definition of the d-variate Gaussian white noise.

Definition 10

(d-variate Gaussian white noise) A d-variate Gaussian process \(\mathcal {MGP}_d (\varvec{u}, k, \Lambda )\) is said to be a d-variate Gaussian white noise if \(\varvec{u} = \varvec{0}\) and \(k(t_s, t_l) = \sigma ^2 \mathbbm {1}(t_s, t_l)\).

Remark 1

We observe that the d-variate Gaussian white noise has the independence property of white noise along the index set T, but it exhibits correlation across the d variates. Therefore, d-variate Gaussian white noise is also called variate-dependent Gaussian white noise or variate-correlated Gaussian white noise, which is distinct from the traditional d-dimensional independent Gaussian white noise. Fifty realisation samples generated from a multivariate Gaussian white noise are shown in Fig. 1 (right).

4.2 Multivariate Brownian motion

Chapter 2 of [9] defines Brownian motion as a Gaussian white noise whose intensity is the Lebesgue measure. Since Brownian motion is a special case of a Gaussian process with continuous sample paths, mean function \(u = 0\) and covariance function \(k(s,t) = \min (s,t)\), we propose the d-variate Brownian motion as a special case of the d-variate Gaussian process with vector-valued mean function \(\varvec{u} = \varvec{0}\), covariance function \(k(s,t) = \min (s,t)\) and parameter matrix \(\Lambda \). Based on Theorem 2, we extend some properties of the traditional Brownian motion to the more general vector-valued case.

Definition 11

(d-variate Brownian motion) A d-variate Gaussian process \(\mathcal {MGP}_d(\varvec{u}, k, \Lambda )\) is said to be a d-variate Brownian motion if all sample paths are continuous, \(\varvec{u} = \varvec{0}\) and \(k(t_s, t_l) =\min (t_s, t_l)\).

Let \(B_{t}\) be a d-variate Brownian motion, which means that for all \(0 \le t_1 \le \cdots \le t_n\) the random matrix \(Z = (B_{t_1}^{{\textsf{T}}}, \ldots , B_{t_n}^{{\textsf{T}}})^{{\textsf{T}}} \in {\mathbb {R}}^{n \times d}\) has a matrix Gaussian distribution on the probability space \((\Omega , {\mathcal {G}}, {\mathcal {P}})\) mentioned in Theorem 2. There exist a matrix \(M \in {\mathbb {R}}^{n \times d}\) and two non-negative definite matrices \(C = [c_{jm}] \in {\mathbb {R}}^{n \times n}\) and \(\Lambda = [\lambda _{ab}] \in {\mathbb {R}}^{d \times d}\) such that

$$\begin{aligned} {\mathbb {E}} \left[ \exp \left( \text {i} \sum _{j = 1}^{n} W_{j, \cdot } Z_{j,\cdot }^{{\textsf{T}}} \right) \right]= & {} \exp \left( - \dfrac{1}{2} \sum _{j,m} W_{j, \cdot } c_{jm} W_{m, \cdot }^{{\textsf{T}}} + \text {i} \sum _j W_{j, \cdot } M_{j,\cdot }^{{\textsf{T}}} \right) ,\\ {\mathbb {E}} \left[ \exp \left( \text {i} \sum _{a = 1}^{d} W_{\cdot , a} Z_{\cdot ,a}^{{\textsf{T}}} \right) \right]= & {} \exp \left( - \dfrac{1}{2} \sum _{a,b} W_{\cdot , a} \lambda _{ab} W_{\cdot , b}^{{\textsf{T}}} + \text {i} \sum _a W_{\cdot , a} M_{\cdot ,a}^{{\textsf{T}}} \right) , \end{aligned}$$

where \(W = [w]_{ja} \in {\mathbb {R}}^{n \times d}\) and \(\text {i}\) is the imaginary unit. Moreover, we also have the mean value \(M = {\mathbb {E}}[Z]\) and two covariance matrices

$$\begin{aligned} c_{jm}= & {} {\mathbb {E}} [(Z_{j,\cdot } - M_{j,\cdot }) (Z_{m,\cdot } - M_{m,\cdot })^{{\textsf{T}}}],\\ \lambda _{ab}= & {} {\mathbb {E}} [(Z_{\cdot ,a} - M_{\cdot ,a})^{{\textsf{T}}} (Z_{\cdot ,b} - M_{\cdot ,b})]. \end{aligned}$$

Assume that the mean matrix M is a zero matrix, i.e. \({\mathbb {E}}[Z] = {\mathbb {E}}[Z \vert t = 0] = 0\), that \(\Lambda = I_d\), and that

$$\begin{aligned} C = \begin{bmatrix} t_1 &{} t_1 &{} \cdots &{} t_1 \\ t_1 &{} t_2 &{} \cdots &{} t_2 \\ \vdots &{} \vdots &{} &{} \vdots \\ t_1 &{} t_2 &{} \cdots &{} t_n \end{bmatrix}. \end{aligned}$$

Hence, \({\mathbb {E}}[B_t]=0\) for all \(t \ge 0\) and

$$\begin{aligned} {\mathbb {E}}[B_t B_t^{{\textsf{T}}}] = d t, \; {\mathbb {E}}[B_t B_s^{{\textsf{T}}}] = d \min (s,t), \\{\mathbb {E}}[B_t^{{\textsf{T}}} B_t] = t \Lambda , \; {\mathbb {E}}[B_t^{{\textsf{T}}} B_s] = \min (s,t) \Lambda . \end{aligned}$$

Moreover, we have

$$\begin{aligned} {\mathbb {E}} [(B_t - B_s) (B_t - B_s)^{{\textsf{T}}}] = {\mathbb {E}} [B_t B_t^{{\textsf{T}}} - 2 B_s B_t^{{\textsf{T}}} + B_s B_s^{{\textsf{T}}}] = d |t - s |,\\ {\mathbb {E}} [(B_t - B_s)^{{\textsf{T}}} (B_t - B_s)] = {\mathbb {E}} [ B_t^{{\textsf{T}}} B_t - 2 B_s^{{\textsf{T}}} B_t + B_s^{{\textsf{T}}} B_s] = |t - s |\Lambda . \end{aligned}$$

Note that this d-variate Brownian motion \(B_t\) still has independent increments, since \({\mathbb {E}} [(B_{t_i} - B_{t_{i-1}}) (B_{t_j} - B_{t_{j-1}})^{{\textsf{T}}}] = 0\) and \({\mathbb {E}} [(B_{t_i} - B_{t_{i-1}})^{{\textsf{T}}} (B_{t_j} - B_{t_{j-1}})] = 0\) for any \(0< t_1< \cdots < t_n\) and \(i < j\).
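A simulation sketch of the d-variate Brownian motion on a uniform time grid is given below (our own construction, with illustrative parameter values): independent increments with covariance \(\Delta t \, \Lambda \), accumulated over time, are consistent with \(k(s,t) = \min (s,t)\) and the second-moment identities above.

```python
import numpy as np

# Simulate a d-variate Brownian motion path and check E[B_t^T B_t] = t * Lambda empirically.
rng = np.random.default_rng(4)
d, n_steps, dt = 2, 1000, 1e-3
Lam = np.array([[1.0, 0.6], [0.6, 1.0]])
L = np.linalg.cholesky(Lam)

def simulate_path():
    increments = rng.standard_normal((n_steps, d)) @ L.T * np.sqrt(dt)    # dB ~ N(0, dt * Lambda)
    return np.vstack([np.zeros((1, d)), np.cumsum(increments, axis=0)])   # B_0 = 0

B = simulate_path()                              # one path, shape (n_steps + 1, d)

# Empirical check of E[B_t^T B_t] = t * Lambda at t = n_steps * dt = 1:
ends = np.stack([simulate_path()[-1] for _ in range(5000)])
print(np.round(ends.T @ ends / len(ends), 2))    # approximately 1.0 * Lambda
```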

Remark 2

Similar to the d-variate Gaussian white noise, the d-variate Brownian motion also has the independence property along the index set T (independent increments), but it exhibits correlation across the d variates. Therefore, d-variate Brownian motion is also called variate-dependent Brownian motion or variate-correlated Brownian motion, which is distinct from the "traditional" d-dimensional Brownian motion. In fact, the "traditional" d-dimensional Brownian motion is a special case of the d-variate Brownian motion with diagonal matrix \(\Lambda \).

We then introduce an Itô lemma for the d-variate Brownian motion.

Let \(B_t = [B_1(t), \cdots , B_d(t)]\) be the d-variate Brownian motion defined above. Then, we have the following lemma.

Lemma 5

(Itô lemma for the d-variate Brownian motion) Let F be a twice continuously differentiable real function on \({\mathbb {R}}^{d+1}\) and let \(\Lambda = [\lambda ]_{i,j} \in {\mathbb {R}}^{d \times d}\) be the covariance matrix for the d-variate dimension. Then,

$$\begin{aligned} F (t, B_1(t), \cdots , B_d(t)) = F (0, B_1(0), \cdots , B_d(0)) + \sum _{i=1}^d \int _0^t \dfrac{\partial F}{\partial B_i} (s, B_1(s), \cdots , B_d(s)) \textrm{d}B_i(s)\\ + \int _0^t \left\{ \dfrac{\partial F}{\partial s} (s, B_1(s), \cdots , B_d(s)) + \dfrac{1}{2} \sum _{i,j=1}^{d} \dfrac{\partial ^2 F}{\partial B_i \partial B_j} (s, B_1(s), \cdots , B_d(s)) \lambda _{i,j} \right\} \textrm{d}s. \end{aligned}$$

Proof

By Itô lemma and the definition of the d-variate Brownian motion, we obtain

$$\begin{aligned} F (t, B_1(t), \cdots , B_d(t))&= F (0, B_1(0), \cdots , B_d(0)) + \int _0^t \dfrac{\partial F}{\partial s} (s, B_1(s), \cdots , B_d(s)) \textrm{d}s \\&\quad + \sum _{i=1}^d \int _0^t \dfrac{\partial F}{\partial B_i} (s, B_1(s), \cdots , B_d(s)) \textrm{d}B_i(s) \\&\quad + \dfrac{1}{2} \sum _{i,j=1}^{d} \int _0^t \dfrac{\partial ^2 F}{\partial B_i \partial B_j} (s, B_1(s), \cdots , B_d(s)) \textrm{d}\langle B_i, B_j \rangle (s). \end{aligned}$$

The proof is complete by \(d\langle B_i, B_j \rangle (s) = \lambda _{i,j} \textrm{d}s\). \(\square \)
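A small numerical illustration of the cross-variation rule \(\textrm{d}\langle B_i, B_j \rangle (s) = \lambda _{i,j} \textrm{d}s\) used in the proof (our own example, with an illustrative \(\lambda _{12}\)): for \(F(t, x_1, x_2) = x_1 x_2\), Lemma 5 gives \(\textrm{d}(B_1 B_2) = B_2 \textrm{d}B_1 + B_1 \textrm{d}B_2 + \lambda _{12} \textrm{d}t\), hence \({\mathbb {E}}[B_1(t)B_2(t)] = \lambda _{12} t\).

```python
import numpy as np

# Check E[B_1(t) B_2(t)] = lambda_12 * t by simulating correlated increments dB ~ N(0, dt * Lambda).
rng = np.random.default_rng(5)
lam12, t_end, n_steps, n_paths = 0.6, 1.0, 200, 10000
Lam = np.array([[1.0, lam12], [lam12, 1.0]])
L = np.linalg.cholesky(Lam)
dt = t_end / n_steps

dB = rng.standard_normal((n_paths, n_steps, 2)) @ L.T * np.sqrt(dt)   # correlated increments
B_end = dB.sum(axis=1)                                                # B(t_end) for each path
print(round(float(np.mean(B_end[:, 0] * B_end[:, 1])), 3))            # close to lambda_12 * t_end = 0.6
```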

5 Application: multivariate Gaussian process regression

Multi-output prediction is a good example of a practical application of multivariate Gaussian processes. Multivariate Gaussian process modelling provides a solid and unified framework for predicting multiple responses or tasks by exploiting their correlations. As a regression problem, multivariate Gaussian process regression (MV-GPR) has closed-form expressions for the marginal likelihood and the predictive distribution, so parameter estimation can employ the same optimisation techniques as conventional Gaussian process modelling [8].

As a summary of MV-GPR in [8], the noise-free multi-output regression model is considered and the noise term is incorporated into the kernel function. Given n pairs of observations \(\{(x_i,\varvec{y}_i)\}_{i=1}^n, x_i \in {\mathbb {R}}^p, \varvec{y}_i \in {\mathbb {R}}^{d}\), we assume the following model

$$\begin{aligned} \varvec{f} \sim \mathcal {MGP}_{d}(\varvec{0},k',\Lambda ), \quad \varvec{y}_i = \varvec{f}(x_i), \text{ for } \; i = 1,\cdots ,n, \end{aligned}$$

where \(\Lambda \) is an undetermined covariance matrix (capturing the relationship between different outputs), \(k'(x_i,x_j) = k(x_i,x_j) + \delta _{ij}\sigma _n^2\), and \(\delta _{ij}\) is the Kronecker delta. According to the definition of the multivariate Gaussian process, the collection of functions \([\varvec{f}(x_1),\ldots ,\varvec{f}(x_n)]\) follows a matrix-variate Gaussian distribution

$$\begin{aligned}{}[\varvec{f}(x_1)^{{\textsf{T}}},\ldots ,\varvec{f}(x_n)^{{\textsf{T}}}]^{{\textsf{T}}} \sim \mathcal{M}\mathcal{N}(\varvec{0},K',\Lambda ), \end{aligned}$$

where \(K'\) is the \(n \times n\) covariance matrix of which the (i,j)-th element is \([K']_{ij} = k'(x_i,x_j)\). Therefore, the predictive targets \(\varvec{f}_* = [f_{*1},\ldots ,f_{*m}]^{\textsf{T}}\) at the test locations \(X_* = [x_{n+1},\ldots ,x_{n+m}]^{\textsf{T}}\) are given by

$$\begin{aligned} p(\varvec{f}_* \mid X, Y, X_*) = \mathcal{M}\mathcal{N}({\hat{M}},{\hat{\Sigma }},{\hat{\Lambda }}), \end{aligned}$$

where

$$\begin{aligned} {\hat{M}} = K'(X_*,X)^{{\textsf{T}}}K'(X,X)^{-1}Y,\\ {\hat{\Sigma }} = K'(X_*,X_*) - K'(X_*,X)^{{\textsf{T}}}K'(X,X)^{-1}K'(X_*,X), \end{aligned}$$

and

$$\begin{aligned} {\hat{\Lambda }} = \Lambda . \end{aligned}$$

Here \(K'(X,X)\) is an \(n \times n\) matrix of which the (i,j)-th element is \([K'(X,X)]_{ij} = k'(x_{i},x_j)\), \(K'(X_*,X)\) is an \(n \times m\) matrix of which the (i,j)-th element is \([K'(X_*,X)]_{ij} = k'(x_{i},x_{n+j})\), and \(K'(X_*,X_*)\) is an \(m \times m\) matrix with the (i,j)-th element \([K'(X_*,X_*)]_{ij} = k'(x_{n+i},x_{n+j})\). In addition, the expectation and the covariance are obtained,

$$\begin{aligned} {\mathbb {E}}[\varvec{f}_*]= & {} {\hat{M}}=K'(X_*,X)^{{\textsf{T}}}K'(X,X)^{-1}Y,\\ \textrm{cov}(\textrm{vec}(\varvec{f}^{{\textsf{T}}}_*))= & {} {\hat{\Sigma }}\otimes {\hat{\Lambda }} = [K'(X_*,X_*) - K'(X_*,X)^{{\textsf{T}}}K'(X,X)^{-1}K'(X_*,X)] \otimes \Lambda . \end{aligned}$$
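The predictive equations above translate directly into a few lines of linear algebra. The following sketch is a minimal implementation of our own (not the code of [8]); the squared-exponential kernel, the hyperparameter values and the helper name `mvgp_predict` are illustrative assumptions.

```python
import numpy as np

def kernel(A, B, length=0.5, signal=1.5):
    """Squared-exponential covariance k on one-dimensional inputs (illustrative choice)."""
    return signal * np.exp(-(A[:, None] - B[None, :]) ** 2 / (2 * length ** 2))

def mvgp_predict(X, Y, X_star, Lam, noise=0.1):
    """Predictive mean M_hat, column covariance Sigma_hat and row covariance Lambda_hat."""
    K_xx = kernel(X, X) + noise ** 2 * np.eye(len(X))         # k'(x_i, x_j) = k(x_i, x_j) + delta_ij sigma_n^2
    K_sx = kernel(X_star, X)                                  # m x n cross-covariance
    K_ss = kernel(X_star, X_star) + noise ** 2 * np.eye(len(X_star))
    M_hat = K_sx @ np.linalg.solve(K_xx, Y)                   # predictive mean, m x d
    Sigma_hat = K_ss - K_sx @ np.linalg.solve(K_xx, K_sx.T)   # predictive column covariance, m x m
    return M_hat, Sigma_hat, Lam                              # Lambda_hat = Lambda

# Toy usage with d = 2 outputs on one-dimensional inputs.
rng = np.random.default_rng(6)
X = np.linspace(0.0, 5.0, 30)
Y = np.column_stack([np.sin(X), np.cos(X)]) + 0.05 * rng.standard_normal((30, 2))
X_star = np.linspace(0.0, 5.0, 100)
Lam = np.array([[1.0, 0.3], [0.3, 1.0]])
M_hat, Sigma_hat, Lam_hat = mvgp_predict(X, Y, X_star, Lam)
print(M_hat.shape, Sigma_hat.shape)    # (100, 2) (100, 100)
```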

From a data science perspective, the hyperparameters involved in the covariance function (kernel) \(k'( \cdot , \cdot )\) and the row covariance matrix \(\Lambda \) of MV-GPR need to be estimated from the training data using a variety of methods [10], including maximum likelihood estimation, maximum a posteriori estimation and Markov Chain Monte Carlo [11].

6 Conclusion

In this paper, we provide a formal definition of the multivariate Gaussian process as well as several related properties, including existence, stationarity and independence. Additionally, we present some special cases and examples of multivariate Gaussian processes, such as multivariate Brownian motion and multivariate Gaussian white noise. Finally, we present a useful application of multivariate Gaussian process regression in statistical learning.