1 Introduction

In probability theory and statistics, Gaussian processes are used in connection with problems such as estimation, detection, stochastic analysis, and modern statistical and machine learning. These problems are often effectively formulated in terms of Gaussian measures on an appropriate real-valued function space. As an important extension, vector-valued Gaussian processes have not been adequately explained in the literature, and the restriction to real-valued function spaces limits the applications of classical Gaussian processes, since the correlation between multiple random responses is difficult to take into account in many problems. This paper therefore consolidates the fundamentals of vector-valued stochastic processes, extending the classical Gaussian process to the multivariate Gaussian process by introducing Gaussian measures on vector-valued function spaces. The main contribution of this paper is a concise proof of existence and a proper explanation of the multivariate Gaussian process, followed by examples and applications such as the multivariate Brownian motion and multivariate Gaussian process regression.

The paper is organised as follows. Section 2 introduces some preliminaries, including classical Gaussian measures, the matrix-variate Gaussian distribution, stationary processes, and the classical Gaussian process. Section 3 presents the theoretical definition of the multivariate Gaussian process together with a proof of its existence. Two examples and one application of multivariate Gaussian processes, which show their usefulness, are presented in Sect. 4 and Sect. 5. Conclusions and a discussion are given in Sect. 6.

2 Preliminary

2.1 Gaussian measure

Definition 1

(Gaussian measure on \({\mathbb {R}}\) [1]) Let \({\mathcal {B}}({\mathbb {R}})\) denote the completion of the Borel \(\sigma \)-algebra on \({\mathbb {R}}\). Let \(\lambda : {\mathcal {B}}({\mathbb {R}}) \mapsto [0, + \infty ]\) denote the usual Lebesgue measure. Then a Borel probability measure \(\gamma : {\mathcal {B}}({\mathbb {R}}) \mapsto [0, 1] \) is Gaussian with mean \(\mu \in {\mathbb {R}}\) and variance \(\sigma ^2 >0\) if

$$\begin{aligned} \gamma (A) = \int _{A} \frac{1}{\sqrt{2\pi \sigma ^2}}\exp \left( -\frac{(x - \mu )^2}{2\sigma ^2} \right) \textrm{d}\lambda (x), \end{aligned}$$

for any measurable set \(A \in {\mathcal {B}}({\mathbb {R}})\).

A random variable X on a probability space \((\Omega , {\mathcal {B}}, {\mathcal {P}})\) is Gaussian with mean \(\mu \) and variance \(\sigma ^2\) if its distribution measure is Gaussian, i.e.

$$\begin{aligned} {\mathcal {P}}(X \in A) = \gamma (A). \end{aligned}$$

In terms of random vectors, we have the following definition of a Gaussian random vector.

Definition 2

An n-dimensional random vector \(\varvec{X} = (X_1, \cdots , X_n)\) is Gaussian if and only if \(\langle \varvec{a}, \varvec{X} \rangle : = \varvec{a}^{{\textsf{T}}}\varvec{X} = \sum a_i X_i\) is a Gaussian random variable for all \(\varvec{a} = (a_1, \cdots , a_n) \in {\mathbb {R}}^n\).

In terms of measures, we naturally obtain the definition of a Gaussian measure on \({\mathbb {R}}^n\).

Definition 3

(Gaussian measure on \({\mathbb {R}}^n\) [1]) Let \(\gamma \) be a Borel probability measure on \({\mathbb {R}}^n\). For each \(\varvec{a} \in {\mathbb {R}}^n\), denote a random variable \(Y(\varvec{x}), \varvec{x} \in {\mathbb {R}}^n \) as a mapping \(\varvec{x} \mapsto \langle \varvec{a}, \varvec{x} \rangle \in {\mathbb {R}}\) on the probability space \(({\mathbb {R}}^n, {\mathcal {B}}({\mathbb {R}}^n), \gamma )\). The Borel probability measure \(\gamma \) is a Gaussian measure on \({\mathbb {R}}^n\) if and only if the random variable Y is Gaussian for each \(\varvec{a}\).

2.2 Matrix-variate Gaussian distribution

In statistics, the matrix-variate Gaussian distribution is a probability distribution that is a generalization of the multivariate normal distribution to matrix-valued random variables.

Definition 4

(Matrix-variate Gaussian distribution [2]) A random matrix \(\varvec{X} \in {\mathbb {R}}^{n \times d}\) is said to follow the matrix-variate Gaussian distribution with mean matrix \(M \in {\mathbb {R}}^{n \times d}\), column covariance matrix \(U \in {\mathbb {R}}^{n \times n}\) and row covariance matrix \(V \in {\mathbb {R}}^{d \times d}\), denoted

$$\begin{aligned} \varvec{X} \sim \mathcal{M}\mathcal{N}_{n, d}(M, U, V), \end{aligned}$$

if and only if

$$\begin{aligned} \textrm{vec}(\varvec{X}) \sim {\mathcal {N}}_{nd}(\textrm{vec}(M), V \otimes U), \end{aligned}$$

where \(\otimes \) denotes the Kronecker product and \(\textrm{vec}(\varvec{X})\) denotes the vectorisation of \(\varvec{X}\).
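To make the vec/Kronecker characterisation concrete, the following minimal sketch (our own illustration in Python/NumPy; the helper name `sample_matrix_normal` and the parameter values are ours, not from the source) draws a sample \(\varvec{X} \sim \mathcal{M}\mathcal{N}_{n, d}(M, U, V)\) by sampling \(\textrm{vec}(\varvec{X}) \sim {\mathcal {N}}_{nd}(\textrm{vec}(M), V \otimes U)\) with column-stacking vectorisation.

```python
import numpy as np

def sample_matrix_normal(M, U, V, rng):
    """Draw X ~ MN_{n,d}(M, U, V) via vec(X) ~ N(vec(M), V kron U), with column-stacking vec."""
    n, d = M.shape
    cov = np.kron(V, U)                                      # (nd) x (nd) covariance of vec(X)
    vec_x = rng.multivariate_normal(M.flatten(order="F"), cov)
    return vec_x.reshape((n, d), order="F")                  # undo the column-stacking

rng = np.random.default_rng(0)
n, d = 4, 2
M = np.zeros((n, d))
U = np.eye(n)                                                # n x n covariance over the rows of X
V = np.array([[1.0, 0.6], [0.6, 1.0]])                       # d x d covariance over the columns of X
X = sample_matrix_normal(M, U, V, rng)
print(X.shape)                                               # (4, 2)
```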

2.3 Stationary process

Definition 5

(Stationary process [3]) Let \(X = \{ X_n(\omega ) \}\) be a stochastic process on a probability space \((\Omega , {\mathcal {B}}, {\mathcal {P}})\). If for any integers n, \(k \ge 1\) and h, the two random vectors

$$\begin{aligned} (X_n, X_{n+1}, \ldots , X_{n+k-1}) \quad \text {and} \quad (X_{n+h}, X_{n+h+1}, \ldots , X_{n+h+k-1}) \end{aligned}$$

have the same probability distribution, then X is said to be a stationary process.

2.4 Gaussian process

Let \({\mathbb {R}}_T\) be the function space of all \({\mathbb {R}}\)-valued functions on T. Consider the product topology on \({\mathbb {R}}_T\), which is defined as the smallest topology that makes the projection maps \(\Pi _{t_{1}, \ldots , t_{n}}(f)=\left[ f\left( t_{1}\right) , \ldots , f\left( t_{n}\right) \right] \) from \({\mathbb {R}}_T\) to \({\mathbb {R}}^n\) continuous, and define \({\mathcal {F}}\) as the Borel \(\sigma \)-algebra of this topology.

Definition 6

(Gaussian measure on \(({\mathbb {R}}_T, {\mathcal {F}})\)) A measure \(\gamma \) on \(({\mathbb {R}}_T, {\mathcal {F}})\) is called a Gaussian measure if for any \(n \ge 1\) and \(t_1, \cdots , t_n \in T\), the push-forward measure \(\gamma \circ \Pi _{t_1, \cdots , t_n}^{-1}\) on \({\mathbb {R}}^n\) is a Gaussian measure.

Briefly speaking, (multivariate) Gaussian distributions are Gaussian measures on \({\mathbb {R}}^n\), and Gaussian processes are Gaussian measures on the function space \(({\mathbb {R}}_T, {\mathcal {F}})\) due to the following theorem.

Theorem 1

(Relationship between Gaussian process and Gaussian measure) If \(X=(X_t)_{t \in T}\) is a Gaussian process, then the push-forward measure \(\gamma = {\mathcal {P}} \circ X^{-1}\) with \(X: \Omega \mapsto {\mathbb {R}}_{T}\) is Gaussian on \({\mathbb {R}}_{T}\), namely, \(\gamma \) is a Gaussian measure on \(({\mathbb {R}}_T, {\mathcal {F}})\). Conversely, if \(\gamma \) is a Gaussian measure on \(({\mathbb {R}}_T, {\mathcal {F}})\), then on the probability space \(({\mathbb {R}}_T, {\mathcal {F}}, \gamma )\), the co-ordinate random variable \(\Pi = (\Pi _t)_{t \in T}\) is from a Gaussian process.

The proof of the relationship between Gaussian process and Gaussian measure can be found in [4]. Motivated by the relationship between Gaussian measures and Gaussian processes, we properly define multivariate Gaussian processes by extending Gaussian measures on real-valued function space to vector-valued function space.

3 Multivariate Gaussian process

Gaussian processes (GPs) have been proven to be an effective statistical learning method for nonlinear problems due to many desirable properties, such as a clear structure with a Bayesian interpretation, a simple integrated approach to obtaining and expressing uncertainty in predictions, and the capability of capturing a wide variety of data features through hyper-parameters [5, 6]. With the development of machine learning algorithms related to Gaussian processes, Gaussian process applications face a conspicuous limitation. Classical GP models can only be used to deal with a single output or single response, because the process itself is defined on \({\mathbb {R}}\), and as a result the correlation between multiple tasks or responses cannot be taken into consideration [6, 7]. In order to overcome this drawback, [8] proposed a framework to perform multi-output prediction using multivariate Gaussian processes (MV-GPs). Although [8] showed the usefulness of the proposed methods via data-driven examples, some theoretical issues of multivariate Gaussian processes, such as existence, the independence condition, and the stationarity condition, are still not clear.

3.1 Definitions

Following the classical theory of Gaussian measure, matrix-variate Gaussian distribution, and Gaussian process, we introduce Gaussian measure on matrix space and Gaussian measure on \({\mathbb {R}}^d\)-valued function space, and finally define the multivariate Gaussian process.

According to Definition 3 (the equivalence between a Gaussian measure on \({\mathbb {R}}^n\) and a Gaussian random vector on \({\mathbb {R}}^n\)) and Definition 4 (the equivalence between a random vector on \({\mathbb {R}}^{nd}\) and a random matrix on \({\mathbb {R}}^{n \times d}\)), we can give a definition of the Gaussian measure on \({\mathbb {R}}^{n \times d}\).

Definition 7

(Gaussian measure on \({\mathbb {R}}^{n \times d}\)) Let \(\gamma \) be a Borel probability measure on \({\mathbb {R}}^{n \times d}\). For each \(\varvec{a} \in {\mathbb {R}}^{nd}\), denote a random variable \(Y(\varvec{x}), \varvec{x} \in {\mathbb {R}}^{n \times d}\) as a mapping \(\varvec{x} \mapsto \langle \varvec{a}, \textrm{vec}(\varvec{x}) \rangle \in {\mathbb {R}}\) on the probability space \(({\mathbb {R}}^{n \times d}, {\mathcal {B}}({\mathbb {R}}^{n \times d}), \gamma )\). The Borel probability measure \(\gamma \) is a Gaussian measure on \({\mathbb {R}}^{n \times d}\) if and only if the random variable Y is Gaussian for each \(\varvec{a}\).

Let \(({\mathbb {R}}^{d})_T\) be the space of all \({\mathbb {R}}^{d}\)-valued functions on T and consider the smallest topology on \(({\mathbb {R}}^{d})_T\) that makes the projection mappings \(\Xi _{t_{1}, \ldots , t_{n}}(\varvec{f})=\left[ \varvec{f}\left( t_{1}\right) , \ldots , \varvec{f}\left( t_{n}\right) \right] \) from \(({\mathbb {R}}^{d})_T\) to \({\mathbb {R}}^{n \times d}\) continuous, and define \({\mathcal {G}}\) as the Borel \(\sigma \)-algebra of this topology. Thus we can give a definition of the Gaussian measure on an \({\mathbb {R}}^d\)-valued function space.

Definition 8

(Gaussian measure on \((({\mathbb {R}}^d)_T, {\mathcal {G}})\)) A measure \(\gamma \) on \((({\mathbb {R}}^d)_T, {\mathcal {G}})\) is called as a Gaussian measure if for any \(n \ge 1\) and \(t_1, \cdots , t_n \in T\), the push-forward measure \(\gamma \circ \Xi _{t_1, \cdots , t_n}^{-1}\) on \({\mathbb {R}}^{n \times d}\) is a Gaussian measure.

Inspired by the relationship between Gaussian process and Gaussian measure in Theorem 1, we can appropriately define multivariate Gaussian processes (MV-GPs).

Definition 9

(d-variate Gaussian process) Given a Gaussian measure \(\gamma \) on \((({\mathbb {R}}^d)_T, {\mathcal {G}})\), \(d \ge 1\), the co-ordinate random vector \(\Xi = (\Xi _t)_{t \in T}\) on the probability space \((({\mathbb {R}}^d)_T, {\mathcal {G}}, \gamma )\) is said to be from a d-variate Gaussian process.

Given the definition of MV-GPs, it is essential to show their existence before proceeding with further research. Inspired by the fact that a GP can be specified by its mean function and covariance function (and thus denoted \(\mathcal{G}\mathcal{P}(\mu , k)\)), we prove the existence of MV-GPs by applying the Daniell-Kolmogorov theorem.

Theorem 2

(Existence of d-variate Gaussian process) For any index set T, any vector-valued mean function \(\varvec{u} : T \mapsto {\mathbb {R}}^{1 \times d}\), any covariance function \(k: T \times T \mapsto {\mathbb {R}}\) and any positive semi-definite parameter matrix \(\Lambda \in {\mathbb {R}}^{d \times d}\), there exists a probability space \((\Omega , {\mathcal {G}}, {\mathcal {P}})\) and a d-variate Gaussian process \(\varvec{f}(t) \in {\mathbb {R}}^{1 \times d}, t \in T\), on this space, whose mean function is \(\varvec{u}\), covariance function is k and parameter matrix is \(\Lambda \), such that,

  • \(\mathbb {E}[\varvec{f}(t)] = \varvec{u}(t), \quad \forall t \in T\),

  • \(\mathbb {E}\left[ (\varvec{f}(t_s)- \varvec{u}(t_s))(\varvec{f}(t_l)- \varvec{u}(t_l))^{{\textsf{T}}}\right] = \textrm{tr}(\Lambda )k(t_s,t_l),\quad \forall t_s,t_l \in T\),

  • \(\mathbb {E}[({\textbf {F}}_{t_1, \cdots , t_n} - M_{t_1, \cdots , t_n})^{\textsf{T}}({\textbf {F}}_{t_1, \cdots , t_n} - M_{t_1, \cdots , t_n}) ] = \textrm{tr}(K_{t_1, \cdots , t_n}) \Lambda , \quad \forall n\ge 1, t_1, \cdots , t_n \in T\), where

    $$\begin{aligned} M_{t_1, \cdots , t_n}&= [\varvec{u}(t_1)^{\textsf{T}}, \cdots , \varvec{u}(t_n)^{\textsf{T}}]^{\textsf{T}}\\ F_{t_1, \cdots , t_n}&= [\varvec{f}(t_1)^{\textsf{T}}, \cdots , \varvec{f}(t_n)^{\textsf{T}}]^{\textsf{T}}\\ K_{t_1, \cdots , t_n}&= \begin{bmatrix} k(t_1, t_1) &{} \cdots &{} k(t_1, t_n) \\ \vdots &{} \ddots &{} \vdots \\ k(t_n, t_1) &{} \cdots &{} k(t_n, t_n) \end{bmatrix}. \end{aligned}$$

We denote \(\varvec{f} \sim \mathcal {MGP}_d(\varvec{u}, k, \Lambda )\).

Proof

Given \(n > 1\), for every \(t_1, \cdots , t_n \in T\), the Gaussian measure \(\gamma _{t_1, \cdots , t_n}\) on \({\mathbb {R}}^{n \times d}\) satisfies the consistency assumptions of the Daniell-Kolmogorov theorem: the projection of the matrix Gaussian distribution on \({\mathbb {R}}^{n \times d}\) with mean \(\left[ \varvec{u}(t_1)^{{\textsf{T}}}, \cdots , \varvec{u}(t_n)^{{\textsf{T}}}\right] ^{{\textsf{T}}} \in {\mathbb {R}}^{n \times d}\), \(n \times n\) column covariance matrix \(K = (k(t_i,t_j)) \in {\mathbb {R}}^{n \times n}\), and \(d \times d\) row covariance matrix \(\Lambda \in {\mathbb {R}}^{d \times d}\), onto the first \(n-1\) co-ordinates is precisely the matrix Gaussian distribution with mean \(\left[ \varvec{u}(t_1)^{{\textsf{T}}}, \cdots , \varvec{u}(t_{n-1})^{{\textsf{T}}} \right] ^{{\textsf{T}}} \in {\mathbb {R}}^{(n-1) \times d}\), \((n-1) \times (n-1)\) column covariance matrix \((k(t_i,t_j)) \in {\mathbb {R}}^{(n-1) \times (n-1)}\), and row covariance matrix \(\Lambda \in {\mathbb {R}}^{d \times d}\), due to the marginalisation property of the matrix Gaussian distribution [2, 8]; consistency under permutations of \(t_1, \cdots , t_n\) is checked similarly. By the Daniell-Kolmogorov theorem, there exists a probability space \((\Omega , {\mathcal {G}}, {\mathcal {P}})\) as well as a d-variate Gaussian process \(X = (X_t)_{t \in T} \sim \mathcal {MGP}_d(\varvec{u}, k, \Lambda )\) defined on this space such that every finite dimensional distribution of \([X_{t_1}, \cdots , X_{t_n}]\) is given by the measure \(\gamma _{t_1, \cdots , t_n}\). \(\square \)
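To make the moment identities of Theorem 2 concrete, the following Monte Carlo sketch (our own illustration; the kernel and the parameter matrix \(\Lambda \) are arbitrary choices) draws finite-dimensional samples \({\textbf {F}} \sim \mathcal{M}\mathcal{N}(M, K, \Lambda )\) via the factorisation \({\textbf {F}} = M + A Z B^{{\textsf{T}}}\) with \(AA^{{\textsf{T}}} = K\), \(BB^{{\textsf{T}}} = \Lambda \) and Z standard normal, and compares the empirical moments with \(\textrm{tr}(K)\Lambda \) and \(\textrm{tr}(\Lambda )k(t_s,t_l)\).

```python
import numpy as np

# Monte Carlo check of the moment identities in Theorem 2 (illustrative values only).
rng = np.random.default_rng(1)
t = np.array([0.0, 0.5, 1.2])                                    # index points t_1, ..., t_n
n, d = len(t), 2
k = lambda s, l: 1.5 * np.exp(-(s - l) ** 2 / (2 * 0.5 ** 2))    # covariance function
K = k(t[:, None], t[None, :]) + 1e-10 * np.eye(n)                # n x n column covariance
Lam = np.array([[1.0, 0.4], [0.4, 0.5]])                         # d x d parameter matrix
A, B = np.linalg.cholesky(K), np.linalg.cholesky(Lam)

# F = A Z B^T with Z standard normal gives F ~ MN(0, K, Lambda); the mean is zero here.
samples = np.stack([A @ rng.standard_normal((n, d)) @ B.T for _ in range(20000)])

# E[(F - M)^T (F - M)] = tr(K) * Lambda
emp = np.einsum("sia,sib->ab", samples, samples) / len(samples)
print(np.round(emp, 2), np.round(np.trace(K) * Lam, 2))

# E[(f(t_s) - u(t_s)) (f(t_l) - u(t_l))^T] = tr(Lambda) * k(t_s, t_l)   (a scalar)
s_idx, l_idx = 0, 2
emp_sl = np.mean(np.sum(samples[:, s_idx, :] * samples[:, l_idx, :], axis=1))
print(round(emp_sl, 3), round(np.trace(Lam) * K[s_idx, l_idx], 3))
```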

3.2 Properties

Following the existence of d-variate GPs, we also establish some properties as follows.

Proposition 3

(Stationarity) A d-variate Gaussian process \(\mathcal {MGP}_d(\varvec{u}, k, \Lambda )\) is stationary if

$$\begin{aligned} \varvec{u}(t) = \varvec{u}(t + h), \quad k(t_s + h, t_l + h) = k(t_s, t_l), \quad \forall t, t_s, t_l, h \in T. \end{aligned}$$

Proof

Let \(\varvec{f} \sim \mathcal {MGP}_d(\varvec{u}, k, \Lambda )\); then for all \(n\ge 1\) and \(t_1, \cdots , t_n \in T\),

$$\begin{aligned}{}[\varvec{f}(t_1)^{{\textsf{T}}},\ldots ,\varvec{f}(t_n)^{{\textsf{T}}}]^{{\textsf{T}}} \sim \mathcal{M}\mathcal{N}(\varvec{u}_{t_1, \cdots , t_n},K_{t_1, \cdots , t_n},\Lambda ), \end{aligned}$$

where \(\mathcal{M}\mathcal{N}(\cdot , \cdot , \cdot )\) is matrix-variate Gaussian distribution,

$$\begin{aligned} \varvec{u}_{t_1, \cdots , t_n} = \begin{bmatrix} \varvec{u} (t_1) \\ \vdots \\ \varvec{u} (t_n) \end{bmatrix} , \quad K_{t_1, \cdots , t_n} = \begin{bmatrix} k(t_1, t_1) &{} \cdots &{} k(t_1, t_n) \\ \vdots &{} \ddots &{} \vdots \\ k(t_n, t_1) &{} \cdots &{} k(t_n, t_n) \end{bmatrix}. \end{aligned}$$

Given any time increment \(h \in T\), we also have

$$\begin{aligned}{}[\varvec{f}(t_1 + h)^{{\textsf{T}}},\ldots ,\varvec{f}(t_n + h)^{{\textsf{T}}}]^{{\textsf{T}}} \sim \mathcal{M}\mathcal{N}(\varvec{u}_{t_1+h, \cdots , t_n+h},K_{t_1+h, \cdots , t_n+h},\Lambda ), \end{aligned}$$

where

$$\begin{aligned} \varvec{u}_{t_1+h, \cdots , t_n + h}&= \begin{bmatrix} \varvec{u}(t_1 +h ) \\ \vdots \\ \varvec{u} (t_n +h ) \end{bmatrix} ,\\ K_{t_1 + h, \cdots , t_n + h}&= \begin{bmatrix} k(t_1 + h, t_1 + h) &{} \cdots &{} k(t_1 + h, t_n + h) \\ \vdots &{} \ddots &{} \vdots \\ k(t_n + h, t_1 + h) &{} \cdots &{} k(t_n + h, t_n + h) \end{bmatrix}. \end{aligned}$$

Since \(\varvec{u}(t) = \varvec{u}(t + h)\) and \(k(t_s + h, t_l + h) = k(t_s, t_l)\) for all \(t, t_s, t_l, h \in T\),

$$\begin{aligned} \varvec{u}_{t_1+h, \cdots , t_n + h}&= \begin{bmatrix} \varvec{u}(t_1 +h) \\ \vdots \\ \varvec{u} (t_n +h) \end{bmatrix} =\begin{bmatrix} \varvec{u} (t_1) \\ \vdots \\ \varvec{u} (t_n) \end{bmatrix} = \varvec{u}_{t_1, \cdots , t_n }, \\ K_{t_1 + h, \cdots , t_n + h}&= \begin{bmatrix} k(t_1 + h, t_1 + h) &{} \cdots &{} k(t_1 + h, t_n + h) \\ \vdots &{} \ddots &{} \vdots \\ k(t_n + h, t_1 + h) &{} \cdots &{} k(t_n + h, t_n + h) \end{bmatrix} =\begin{bmatrix} k(t_1 , t_1) &{} \cdots &{} k(t_1, t_n) \\ \vdots &{} \ddots &{} \vdots \\ k(t_n, t_1) &{} \cdots &{} k(t_n, t_n) \end{bmatrix} \\&= K_{t_1, \cdots , t_n}. \end{aligned}$$

Therefore, \([\varvec{f}(t_1 + h)^{{\textsf{T}}},\ldots ,\varvec{f}(t_n + h)^{{\textsf{T}}}]^{{\textsf{T}}} \) has the same probability distribution as \([\varvec{f}(t_1)^{{\textsf{T}}},\ldots ,\varvec{f}(t_n)^{{\textsf{T}}}]^{{\textsf{T}}} \). Since the choice of \(n \ge 1\) and \(t_1, \cdots , t_n \in T\) was arbitrary, \(\varvec{f} \sim \mathcal {MGP}_d(\varvec{u}, k, \Lambda )\) is a stationary process according to Definition 5. \(\square \)

Proposition 4

(Independence) A collection of d functions \(\{\varvec{f}_i\}_{i = 1,2, \cdots , d}\) independently and identically follows a Gaussian process \(\mathcal{G}\mathcal{P}(\mu , k)\) if and only if

$$\begin{aligned} \varvec{f} = [\varvec{f}_1, \varvec{f}_2, \cdots , \varvec{f}_d] \sim \mathcal {MGP}_d(\varvec{u}, k, \Lambda ), \end{aligned}$$

where \(\varvec{u} = [\mu , \cdots , \mu ]: T \mapsto {\mathbb {R}}^{1 \times d}\) and \(\Lambda \) is any diagonal positive semi-definite matrix.

Proof

Necessity: if \(\varvec{f} \sim \mathcal {MGP}_d(\varvec{u}, k, \Lambda )\) with diagonal \(\Lambda \), then for all \(n\ge 1\) and \(t_1, \cdots , t_n \in T\),

$$\begin{aligned}{}[\varvec{f}(t_1)^{{\textsf{T}}},\ldots ,\varvec{f}(t_n)^{{\textsf{T}}}]^{{\textsf{T}}} \sim \mathcal{M}\mathcal{N}(\varvec{u}_{t_1, \cdots , t_n},K_{t_1, \cdots , t_n},\Lambda ), \end{aligned}$$

where,

$$\begin{aligned} \varvec{u}_{t_1, \cdots , t_n} = \begin{bmatrix} \varvec{u} (t_1) \\ \vdots \\ \varvec{u} (t_n) \end{bmatrix} , \quad K_{t_1, \cdots , t_n} = \begin{bmatrix} k(t_1, t_1) &{} \cdots &{} k(t_1, t_n) \\ \vdots &{} \ddots &{} \vdots \\ k(t_n, t_1) &{} \cdots &{} k(t_n, t_n) \end{bmatrix}. \end{aligned}$$

Rewriting the left-hand side, we obtain

$$\begin{aligned}{}[\varvec{\xi }_1, \varvec{\xi }_2, \cdots , \varvec{\xi }_d] \sim \mathcal{M}\mathcal{N}(\varvec{u}_{t_1, \cdots , t_n},K_{t_1, \cdots , t_n},\Lambda ), \end{aligned}$$

where \(\varvec{\xi }_i = [f_i(t_1), f_i(t_2), \cdots , f_i(t_n)]^{{\textsf{T}}}\). Since \(\Lambda \) is a diagonal matrix, for any \(i \ne j\)

$$\begin{aligned} {\mathbb {E}}[(\varvec{\xi }_i - \varvec{m})^{{\textsf{T}}} (\varvec{\xi }_{j} - \varvec{m})] = \textrm{tr}( K_{t_1, \cdots , t_n}) \Lambda _{ij} = \textrm{tr}( K_{t_1, \cdots , t_n}) \cdot 0 = 0, \end{aligned}$$

where \(\varvec{m} = [\mu (t_1), \cdots , \mu (t_n)]^{{\textsf{T}}}\). Because \(\varvec{\xi }_i\) and \(\varvec{\xi }_j\) are arbitrary finite collections of realisations of \(\varvec{f}_i\) and \(\varvec{f}_j\) respectively, \(\varvec{f}_i\) and \(\varvec{f}_j\) are uncorrelated. Since the joint finite-dimensional distributions of \(\varvec{f}_i\) and \(\varvec{f}_j\) are Gaussian, non-correlation implies independence.

Sufficiency: if \(\{\varvec{f}_i\}_{i = 1,2, \cdots , d} \sim \mathcal{G}\mathcal{P}(\mu , k)\) are independent, then for all \(n\ge 1\), \(t_1, \cdots , t_n \in T\) and any \(i \ne j\),

$$\begin{aligned} 0 = {\mathbb {E}}[(\varvec{\xi }_i - \varvec{m})^{{\textsf{T}}} (\varvec{\xi }_{j} - \varvec{m})] = \textrm{tr}( K_{t_1, \cdots , t_n}) \Lambda _{ij}. \end{aligned}$$

Since \(\textrm{tr}( K_{t_1, \cdots , t_n})\) is non-zero, \(\Lambda _{ij}\) must be 0. Due to the arbitrary choice of i and j, \(\Lambda \) must be diagonal. That is to say, \([\varvec{\xi }_1, \cdots , \varvec{\xi }_d]\) follows a matrix Gaussian distribution \(\mathcal{M}\mathcal{N}(\varvec{u}_{t_1, \cdots , t_n}, K_{t_1, \cdots , t_n}, \Lambda )\) where \(\Lambda \) is a diagonal positive semi-definite matrix. Since the above holds for all \(n\ge 1\) and \(t_1, \cdots , t_n \in T\), \(\varvec{f} = [\varvec{f}_1, \cdots , \varvec{f}_d] \sim \mathcal {MGP}_d(\varvec{u}, k, \Lambda )\) with diagonal \(\Lambda \). \(\square \)
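As an empirical illustration of Proposition 4 (a minimal sketch with parameter values of our own choosing), a diagonal \(\Lambda \) makes the component processes uncorrelated, hence, being jointly Gaussian, independent, while an off-diagonal \(\Lambda \) induces cross-correlation.

```python
import numpy as np

# Empirical illustration of Proposition 4 (illustrative values only): estimate
# E[xi_1^T xi_2] for centred draws F = A Z B^T ~ MN(0, K, Lambda), which equals tr(K) * Lambda_12.
rng = np.random.default_rng(3)
t = np.linspace(0.0, 2.0, 5)
K = np.exp(-(t[:, None] - t[None, :]) ** 2) + 1e-8 * np.eye(len(t))
A = np.linalg.cholesky(K)

def mean_cross_product(Lam, n_samples=20000):
    B = np.linalg.cholesky(Lam)
    draws = np.stack([A @ rng.standard_normal((len(t), 2)) @ B.T for _ in range(n_samples)])
    return np.mean(np.sum(draws[:, :, 0] * draws[:, :, 1], axis=1))

print(mean_cross_product(np.diag([1.0, 0.5])))            # close to 0:  tr(K) * 0
print(mean_cross_product(np.array([[1.0, 0.8],
                                   [0.8, 1.0]])))         # close to tr(K) * 0.8
```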

4 Example: special cases

A natural special case is the centred multivariate Gaussian process, where the vector-valued mean function is \(\varvec{u} = \varvec{0}\). Fifty realisation samples generated from a centred multivariate Gaussian process are shown in Fig. 1 (left). Furthermore, we can derive the multivariate Gaussian white noise and the multivariate Brownian motion.

Fig. 1

Fifty realisation samples generated from a 2-variate Gaussian process. Left: a centred 2-variate Gaussian process with Gaussian covariance function \(k(t_s, t_l) = 1.5 \exp (-(t_s - t_l)^2/2/0.5^2)\). Right: a 2-variate Gaussian white noise, i.e. a 2-variate Gaussian process with covariance function \(k(t_s, t_l) = 1.5\delta (t_s, t_l)\)
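A minimal sketch of how sample paths like those in Fig. 1 (left) can be generated is given below; the grid, the parameter matrix \(\Lambda \) and the random seed are illustrative choices of our own.

```python
import numpy as np

# Generate 50 centred 2-variate sample paths with the Gaussian kernel of Fig. 1 (left).
rng = np.random.default_rng(2)
t = np.linspace(0.0, 5.0, 100)
n, d = len(t), 2
K = 1.5 * np.exp(-(t[:, None] - t[None, :]) ** 2 / (2 * 0.5 ** 2))   # k(t_s, t_l) from Fig. 1
K += 1e-6 * np.eye(n)                                                # jitter for a stable Cholesky
Lam = np.array([[1.0, 0.7], [0.7, 1.0]])                             # correlation between the two variates
A, B = np.linalg.cholesky(K), np.linalg.cholesky(Lam)

# Each draw A Z B^T (Z standard normal, n x d) is one centred 2-variate sample path.
paths = [A @ rng.standard_normal((n, d)) @ B.T for _ in range(50)]
print(len(paths), paths[0].shape)    # 50 paths, each of shape (100, 2)
```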

4.1 Multivariate Gaussian white noise

Let \(\varvec{f} = [\varvec{f}_1,\cdots , \varvec{f}_d ] \sim \mathcal {MGP}_d(\varvec{0}, \sigma ^2 \mathbbm {1}(t_s, t_l), \Lambda )\), where \(\mathbbm {1}(\cdot , \cdot )\) is the indicator function that equals 1 if \(t_s = t_l\) and 0 otherwise. Then

$$\begin{aligned}&{\mathbb {E}}[\varvec{f}(t)] = \varvec{0}, \quad \forall t \in T, \\&{\mathbb {E}}\left[ \varvec{f}(t_s)\varvec{f}(t_l)^{{\textsf{T}}}\right] = \textrm{tr}(\Lambda )\sigma ^2 \mathbbm {1}(t_s, t_l) = {\left\{ \begin{array}{ll} 0 &{} \text{ if } t_s \ne t_l \\ \sigma ^2\textrm{tr}(\Lambda ) &{} \text{ if } t_s = t_l \end{array}\right. } ,\quad \forall t_s,t_l \in T. \end{aligned}$$

Furthermore, for all \(n\ge 1\) and \(t_1, \cdots , t_n \in T\), we have

$$\begin{aligned} {\mathbb {E}}[{\textbf {F}}_{t_1, \cdots , t_n}^{\textsf{T}}{\textbf {F}}_{t_1, \cdots , t_n}] = \textrm{tr}(K_{t_1, \cdots , t_n}) \Lambda = \textrm{tr}(\sigma ^2 \textrm{I}_{n \times n})\Lambda = n \sigma ^2 \Lambda , \end{aligned}$$

where \(F_{t_1, \cdots , t_n} = [\varvec{f}(t_1)^{\textsf{T}}, \cdots , \varvec{f}(t_n)^{\textsf{T}}]^{\textsf{T}}\).
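As a quick numerical sanity check of the identity \({\mathbb {E}}[{\textbf {F}}^{\textsf{T}}{\textbf {F}}] = n\sigma ^2 \Lambda \) (a minimal sketch with arbitrary \(\sigma \) and \(\Lambda \) of our own choosing):

```python
import numpy as np

# Monte Carlo check of E[F^T F] = n * sigma^2 * Lambda for d-variate Gaussian white noise,
# using the factorised sampler F = A Z B^T with A A^T = K = sigma^2 I_n and B B^T = Lambda.
rng = np.random.default_rng(7)
n, d, sigma = 5, 2, 1.3
K = sigma ** 2 * np.eye(n)                                    # white-noise kernel on n distinct points
Lam = np.array([[1.0, 0.4], [0.4, 0.5]])
A, B = np.linalg.cholesky(K), np.linalg.cholesky(Lam)

samples = np.stack([A @ rng.standard_normal((n, d)) @ B.T for _ in range(20000)])
emp = np.einsum("sia,sib->ab", samples, samples) / len(samples)
print(np.round(emp, 2))
print(np.round(n * sigma ** 2 * Lam, 2))                      # the two matrices should be close
```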

Therefore, it is natural to give a definition of the d-variate Gaussian white noise.

Definition 10

(d-variate Gaussian white noise) A d-variate Gaussian process \(\mathcal {MGP}_d (\varvec{u}, k, \Lambda )\) is said to be a d-variate Gaussian white noise if \(\varvec{u} = \varvec{0}\) and \(k(t_s, t_l) = \sigma ^2 \mathbbm {1}(t_s, t_l)\).

Remark 1

We observe that the d-variate Gaussian white noise has the independence property of white noise along the index set T, but it exhibits correlation across the d variates. Therefore, d-variate Gaussian white noise is also called variate-dependent Gaussian white noise or variate-correlated Gaussian white noise, which is distinct from the traditional d-dimensional independent Gaussian white noise. Fifty realisation samples generated from a multivariate Gaussian white noise are shown in Fig. 1 (right).

4.2 Multivariate Brownian motion

Chapter 2 of [9] defines Brownian motion as a Gaussian white noise whose intensity is the Lebesgue measure. Since Brownian motion is a special case of a Gaussian process with continuous sample paths, mean function \(u = 0\) and covariance function \(k(s,t) = \min (s,t)\), we propose the d-variate Brownian motion as a special case of the d-variate Gaussian process with vector-valued mean function \(\varvec{u} = \varvec{0}\), covariance function \(k(s,t) = \min (s,t)\) and parameter matrix \(\Lambda \). Based on Theorem 2, we extend some properties of the traditional Brownian motion to the more general vector-valued case.

Definition 11

(d-variate Brownian motion) A d-variate Gaussian process \(\mathcal {MGP}_d(\varvec{u}, k, \Lambda )\) is said to be a d-variate Brownian motion if all sample paths are continuous, \(\varvec{u} = \varvec{0}\) and \(k(t_s, t_l) =\min (t_s, t_l)\).

Let \(B_{t}\) be a d-variate Brownian motion, which means that for all \(0 \le t_1 \le \cdots \le t_n\) the random matrix \(Z = (B_{t_1}^{{\textsf{T}}}, \ldots , B_{t_n}^{{\textsf{T}}})^{{\textsf{T}}} \in {\mathbb {R}}^{n \times d}\) has a matrix Gaussian distribution on the probability space \((\Omega , {\mathcal {G}}, {\mathcal {P}})\) mentioned in Theorem 2. There exist a matrix \(M \in {\mathbb {R}}^{n \times d}\) and two non-negative definite matrices \(C = [c_{jm}] \in {\mathbb {R}}^{n \times n}\) and \(\Lambda = [\lambda _{ab}] \in {\mathbb {R}}^{d \times d}\) such that

$$\begin{aligned} {\mathbb {E}} \left[ \exp \left( \text {i} \sum _{j = 1}^{n} W_{j, \cdot } Z_{j,\cdot }^{{\textsf{T}}} \right) \right]= & {} \exp \left( - \dfrac{1}{2} \sum _{j,m} W_{j, \cdot } c_{jm} W_{m, \cdot }^{{\textsf{T}}} + \text {i} \sum _j W_{j, \cdot } M_{j,\cdot }^{{\textsf{T}}} \right) ,\\ {\mathbb {E}} \left[ \exp \left( \text {i} \sum _{a = 1}^{d} W_{\cdot , a} Z_{\cdot ,a}^{{\textsf{T}}} \right) \right]= & {} \exp \left( - \dfrac{1}{2} \sum _{a,b} W_{\cdot , a} \lambda _{ab} W_{\cdot , b}^{{\textsf{T}}} + \text {i} \sum _a W_{\cdot , a} M_{\cdot ,a}^{{\textsf{T}}} \right) , \end{aligned}$$

where \(W = [w]_{ja} \in {\mathbb {R}}^{n \times d}\) and \(\text {i}\) is the imaginary unit. Moreover, we also have the mean value \(M = {\mathbb {E}}[Z]\) and two covariance matrices

$$\begin{aligned} c_{jm}= & {} {\mathbb {E}} [(Z_{j,\cdot } - M_{j,\cdot }) (Z_{m,\cdot } - M_{m,\cdot })^{{\textsf{T}}}],\\ \lambda _{ab}= & {} {\mathbb {E}} [(Z_{\cdot ,a} - M_{\cdot ,a})^{{\textsf{T}}} (Z_{\cdot ,b} - M_{\cdot ,b})]. \end{aligned}$$

Assume that the mean matrix M is a zero matrix, i.e. \({\mathbb {E}}[Z] = {\mathbb {E}}[Z \vert t = 0] = 0\), that \(\Lambda = I_d\), and that

$$\begin{aligned} C = \begin{bmatrix} t_1 &{} t_1 &{} \cdots &{} t_1 \\ t_1 &{} t_2 &{} \cdots &{} t_2 \\ \vdots &{} \vdots &{} &{} \vdots \\ t_1 &{} t_2 &{} \cdots &{} t_n \end{bmatrix}. \end{aligned}$$

Hence, \({\mathbb {E}}[B_t]=0\) for all \(t \ge 0\) and

$$\begin{aligned} {\mathbb {E}}[B_t B_t^{{\textsf{T}}}] = d t, \; {\mathbb {E}}[B_t B_s^{{\textsf{T}}}] = d \min (s,t), \\{\mathbb {E}}[B_t^{{\textsf{T}}} B_t] = t \Lambda , \; {\mathbb {E}}[B_t^{{\textsf{T}}} B_s] = \min (s,t) \Lambda . \end{aligned}$$

Moreover, we have

$$\begin{aligned} {\mathbb {E}} [(B_t - B_s) (B_t - B_s)^{{\textsf{T}}}] = {\mathbb {E}} [B_t B_t^{{\textsf{T}}} - 2 B_s B_t^{{\textsf{T}}} + B_s B_s^{{\textsf{T}}}] = d |t - s |,\\ {\mathbb {E}} [(B_t - B_s)^{{\textsf{T}}} (B_t - B_s)] = {\mathbb {E}} [ B_t^{{\textsf{T}}} B_t - 2 B_s^{{\textsf{T}}} B_t + B_s^{{\textsf{T}}} B_s] = |t - s |\Lambda . \end{aligned}$$

Note that this d-variate Brownian motion \(B_t\) still has independent increments, since \({\mathbb {E}} [(B_{t_i} - B_{t_{i-1}}) (B_{t_j} - B_{t_{j-1}})^{{\textsf{T}}}] = 0\) and \({\mathbb {E}} [(B_{t_i} - B_{t_{i-1}})^{{\textsf{T}}} (B_{t_j} - B_{t_{j-1}})] = 0\) for any \(0< t_1< \cdots < t_n\) and \(i < j\).
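A simulation sketch of the d-variate Brownian motion on a uniform time grid is given below (our own construction, with illustrative parameter values): independent increments with covariance \(\Delta t \, \Lambda \), accumulated over time, are consistent with \(k(s,t) = \min (s,t)\) and the second-moment identities above.

```python
import numpy as np

# Simulate a d-variate Brownian motion path and check E[B_t^T B_t] = t * Lambda empirically.
rng = np.random.default_rng(4)
d, n_steps, dt = 2, 1000, 1e-3
Lam = np.array([[1.0, 0.6], [0.6, 1.0]])
L = np.linalg.cholesky(Lam)

def simulate_path():
    increments = rng.standard_normal((n_steps, d)) @ L.T * np.sqrt(dt)    # dB ~ N(0, dt * Lambda)
    return np.vstack([np.zeros((1, d)), np.cumsum(increments, axis=0)])   # B_0 = 0

B = simulate_path()                              # one path, shape (n_steps + 1, d)

# Empirical check of E[B_t^T B_t] = t * Lambda at t = n_steps * dt = 1:
ends = np.stack([simulate_path()[-1] for _ in range(5000)])
print(np.round(ends.T @ ends / len(ends), 2))    # approximately 1.0 * Lambda
```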

Remark 2

Similar to the d-variate Gaussian white noise, the d-variate Brownian motion also has the independence property along the index set T (independent increments), but it exhibits correlation across the d variates. Therefore, d-variate Brownian motion is also called variate-dependent Brownian motion or variate-correlated Brownian motion, which is distinct from the "traditional" d-dimensional Brownian motion. In fact, the "traditional" d-dimensional Brownian motion is a special case of the d-variate Brownian motion with diagonal matrix \(\Lambda \).

We then introduce an Itô lemma for the d-variate Brownian motion.

Let \(B_t = [B_1(t), \cdots , B_d(t)]\) be the d-variate Brownian motion defined above. Then, we have the following lemma.

Lemma 5

(Itô lemma for the d-variate Brownian motion) Let F be a twice continuously differentiable real function on \({\mathbb {R}}^{d+1}\) and let \(\Lambda = [\lambda ]_{i,j} \in {\mathbb {R}}^{d \times d}\) be the covariance matrix for the d-variate dimension. Then,

$$\begin{aligned} F (t, B_1(t), \cdots , B_d(t)) = F (0, B_1(0), \cdots , B_d(0)) + \sum _{i=1}^d \int _0^t \dfrac{\partial F}{\partial B_i} (s, B_1(s), \cdots , B_d(s)) \textrm{d}B_i(s)\\ + \int _0^t \left\{ \dfrac{\partial F}{\partial s} (s, B_1(s), \cdots , B_d(s)) + \dfrac{1}{2} \sum _{i,j=1}^{d} \dfrac{\partial ^2 F}{\partial B_i \partial B_j} (s, B_1(s), \cdots , B_d(s)) \lambda _{i,j} \right\} \textrm{d}s. \end{aligned}$$

Proof

By Itô lemma and the definition of the d-variate Brownian motion, we obtain

$$\begin{aligned} F (t, B_1(t), \cdots , B_d(t))&= F (0, B_1(0), \cdots , B_d(0)) + \int _0^t \dfrac{\partial F}{\partial s} (s, B_1(s), \cdots , B_d(s)) \textrm{d}s \\&\quad + \sum _{i=1}^d \int _0^t \dfrac{\partial F}{\partial B_i} (s, B_1(s), \cdots , B_d(s)) \textrm{d}B_i(s) \\&\quad + \dfrac{1}{2} \sum _{i,j=1}^{d} \int _0^t \dfrac{\partial ^2 F}{\partial B_i \partial B_j} (s, B_1(s), \cdots , B_d(s)) \textrm{d}\langle B_i, B_j \rangle (s). \end{aligned}$$

The proof is complete by \(d\langle B_i, B_j \rangle (s) = \lambda _{i,j} \textrm{d}s\). \(\square \)
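A small numerical illustration of the cross-variation rule \(\textrm{d}\langle B_i, B_j \rangle (s) = \lambda _{i,j} \textrm{d}s\) used in the proof (our own example, with an illustrative \(\lambda _{12}\)): for \(F(t, x_1, x_2) = x_1 x_2\), Lemma 5 gives \(\textrm{d}(B_1 B_2) = B_2 \textrm{d}B_1 + B_1 \textrm{d}B_2 + \lambda _{12} \textrm{d}t\), hence \({\mathbb {E}}[B_1(t)B_2(t)] = \lambda _{12} t\).

```python
import numpy as np

# Check E[B_1(t) B_2(t)] = lambda_12 * t by simulating correlated increments dB ~ N(0, dt * Lambda).
rng = np.random.default_rng(5)
lam12, t_end, n_steps, n_paths = 0.6, 1.0, 200, 10000
Lam = np.array([[1.0, lam12], [lam12, 1.0]])
L = np.linalg.cholesky(Lam)
dt = t_end / n_steps

dB = rng.standard_normal((n_paths, n_steps, 2)) @ L.T * np.sqrt(dt)   # correlated increments
B_end = dB.sum(axis=1)                                                # B(t_end) for each path
print(round(float(np.mean(B_end[:, 0] * B_end[:, 1])), 3))            # close to lambda_12 * t_end = 0.6
```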

5 Application: multivariate Gaussian process regression

Multi-output prediction is a good example of a practical application of multivariate Gaussian processes. Multivariate Gaussian process modelling provides a solid and unified framework for predicting multiple responses or tasks by exploiting their correlations. As a regression problem, multivariate Gaussian process regression (MV-GPR) has closed-form expressions for the marginal likelihood and the predictive distribution, so parameter estimation can employ the same optimisation techniques as conventional Gaussian process modelling [8].

As a summary of MV-GPR in [8], the noise-free multi-output regression model is considered and the noise term is incorporated into the kernel function. Given n pairs of observations \(\{(x_i,\varvec{y}_i)\}_{i=1}^n, x_i \in {\mathbb {R}}^p, \varvec{y}_i \in {\mathbb {R}}^{d}\), we assume the following model

$$\begin{aligned} \varvec{f} \sim \mathcal {MGP}_{d}(\varvec{0},k',\Lambda ), \quad \varvec{y}_i = \varvec{f}(x_i), \text{ for } \; i = 1,\cdots ,n, \end{aligned}$$

where \(\Lambda \) is an undetermined covariance matrix (capturing the relationship between different outputs), \(k'(x_i,x_j) = k(x_i,x_j) + \delta _{ij}\sigma _n^2\), and \(\delta _{ij}\) is the Kronecker delta. According to the definition of the multivariate Gaussian process, the collection of functions \([\varvec{f}(x_1),\ldots ,\varvec{f}(x_n)]\) follows a matrix-variate Gaussian distribution

$$\begin{aligned}{}[\varvec{f}(x_1)^{{\textsf{T}}},\ldots ,\varvec{f}(x_n)^{{\textsf{T}}}]^{{\textsf{T}}} \sim \mathcal{M}\mathcal{N}(\varvec{0},K',\Lambda ), \end{aligned}$$

where \(K'\) is the \(n \times n\) covariance matrix of which the (i,j)-th element is \([K']_{ij} = k'(x_i,x_j)\). Therefore, the predictive targets \(\varvec{f}_* = [f_{*1},\ldots ,f_{*m}]^{\textsf{T}}\) at the test locations \(X_* = [x_{n+1},\ldots ,x_{n+m}]^{\textsf{T}}\) are given by

$$\begin{aligned} p(\varvec{f}_* \mid X, Y, X_*) = \mathcal{M}\mathcal{N}({\hat{M}},{\hat{\Sigma }},{\hat{\Lambda }}), \end{aligned}$$

where

$$\begin{aligned} {\hat{M}} = K'(X_*,X)^{{\textsf{T}}}K'(X,X)^{-1}Y,\\ {\hat{\Sigma }} = K'(X_*,X_*) - K'(X_*,X)^{{\textsf{T}}}K'(X,X)^{-1}K'(X_*,X), \end{aligned}$$

and

$$\begin{aligned} {\hat{\Lambda }} = \Lambda . \end{aligned}$$

Here \(K'(X,X)\) is an \(n \times n\) matrix of which the (i,j)-th element is \([K'(X,X)]_{ij} = k'(x_{i},x_j)\), \(K'(X_*,X)\) is an \(n \times m\) matrix of which the (i,j)-th element is \([K'(X_*,X)]_{ij} = k'(x_{i},x_{n+j})\), and \(K'(X_*,X_*)\) is an \(m \times m\) matrix with the (i,j)-th element \([K'(X_*,X_*)]_{ij} = k'(x_{n+i},x_{n+j})\). In addition, the expectation and the covariance are obtained,

$$\begin{aligned} {\mathbb {E}}[\varvec{f}_*]= & {} {\hat{M}}=K'(X_*,X)^{{\textsf{T}}}K'(X,X)^{-1}Y,\\ \textrm{cov}(\textrm{vec}(\varvec{f}^{{\textsf{T}}}_*))= & {} {\hat{\Sigma }}\otimes {\hat{\Lambda }} = [K'(X_*,X_*) - K'(X_*,X)^{{\textsf{T}}}K'(X,X)^{-1}K'(X_*,X)] \otimes \Lambda . \end{aligned}$$
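The predictive equations above translate directly into a few lines of linear algebra. The following sketch is a minimal implementation of our own (not the code of [8]); the squared-exponential kernel, the hyperparameter values and the helper name `mvgp_predict` are illustrative assumptions.

```python
import numpy as np

def kernel(A, B, length=0.5, signal=1.5):
    """Squared-exponential covariance k on one-dimensional inputs (illustrative choice)."""
    return signal * np.exp(-(A[:, None] - B[None, :]) ** 2 / (2 * length ** 2))

def mvgp_predict(X, Y, X_star, Lam, noise=0.1):
    """Predictive mean M_hat, column covariance Sigma_hat and row covariance Lambda_hat."""
    K_xx = kernel(X, X) + noise ** 2 * np.eye(len(X))         # k'(x_i, x_j) = k(x_i, x_j) + delta_ij sigma_n^2
    K_sx = kernel(X_star, X)                                  # m x n cross-covariance
    K_ss = kernel(X_star, X_star) + noise ** 2 * np.eye(len(X_star))
    M_hat = K_sx @ np.linalg.solve(K_xx, Y)                   # predictive mean, m x d
    Sigma_hat = K_ss - K_sx @ np.linalg.solve(K_xx, K_sx.T)   # predictive column covariance, m x m
    return M_hat, Sigma_hat, Lam                              # Lambda_hat = Lambda

# Toy usage with d = 2 outputs on one-dimensional inputs.
rng = np.random.default_rng(6)
X = np.linspace(0.0, 5.0, 30)
Y = np.column_stack([np.sin(X), np.cos(X)]) + 0.05 * rng.standard_normal((30, 2))
X_star = np.linspace(0.0, 5.0, 100)
Lam = np.array([[1.0, 0.3], [0.3, 1.0]])
M_hat, Sigma_hat, Lam_hat = mvgp_predict(X, Y, X_star, Lam)
print(M_hat.shape, Sigma_hat.shape)    # (100, 2) (100, 100)
```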

From a data science perspective, the hyperparameters involved in the covariance function (kernel) \(k'( \cdot , \cdot )\) and the row covariance matrix \(\Lambda \) of MV-GPR need to be estimated from the training data using a variety of methods [10], including maximum likelihood estimation, maximum a posteriori estimation and Markov Chain Monte Carlo [11].

6 Conclusion

In this paper, we provide a formal definition of the multivariate Gaussian process as well as several related properties, including existence, stationarity and independence. Additionally, we present some special cases and examples of multivariate Gaussian processes, such as multivariate Brownian motion and multivariate Gaussian white noise. Finally, we present a useful application of multivariate Gaussian process regression in statistical learning.