1 Introduction

For each natural number n satisfying \(n\ge 2\), let

$$\begin{aligned} {\mathcal S}_{n-1} :=\left\{ p:{\Omega }_n\rightarrow {\mathbb R}_{++}\;\left| \; \sum _{{\omega }\in {\Omega }_n} p({\omega })=1\right. \right\} \end{aligned}$$

be the manifold of probability distributions on a finite sample space \({\Omega }_n=\{1,2,\dots ,n\}\), where \({\mathbb R}_{++}\) denotes the set of strictly positive real numbers. The manifold \({\mathcal S}_{n-1}\) is sometimes called the \((n-1)\)-dimensional probability simplex. In what follows, we identify each point \(p\in {\mathcal S}_{n-1}\) with the numerical vector \((p(1),p(2),\dots ,p(n) )\in {\mathbb R}_{++}^n\).

In his seminal book [6], Chentsov characterised the Riemannian metrics g and affine connections \(\nabla \) on \({\mathcal S}_{n-1}\) that fulfil a certain invariance property, now usually referred to as the Markov invariance. Given natural numbers n and \(\ell \) satisfying \(2\le n\le \ell \), let

$$\begin{aligned} {\Omega }_\ell =\bigsqcup _{i=1}^n C_{(i)} \end{aligned}$$
(1)

be a direct sum decomposition of the index set \({\Omega }_\ell =\{1,\dots ,\ell \}\) into n mutually disjoint nonempty subsets \(C_{(1)},\dots ,C_{(n)}\). A map

$$\begin{aligned} f:\, {\mathcal S}_{n-1}\longrightarrow {\mathcal S}_{\ell -1}:\, (x^1,\dots ,x^n)\longmapsto (y^1,\dots ,y^\ell ) \end{aligned}$$

is called a Markov embedding associated with the partition (1) if it takes the form

$$\begin{aligned} y^j:=\sum _{i=1}^n x^i Q_{(i)}^j \qquad (j=1,\dots , \ell ), \end{aligned}$$
(2)

where \(Q_{(i)}=(Q_{(i)}^1,Q_{(i)}^2,\dots ,Q_{(i)}^\ell )\) is a probability distribution on \(\Omega _\ell \) whose support is \(C_{(i)}\) for each i (\(1\le i \le n\)). In other words, the image \(f({\mathcal S}_{n-1})\) is (the interior of) the convex hull of the n extreme points \(\{ Q_{(i)} \}_{1\le i\le n}\) in \({\mathcal S}_{\ell -1}\). A simple example of a Markov embedding is illustrated in Fig. 1, where \(n=2\) and \(\ell =3\).
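To make (2) concrete, here is a minimal numerical sketch of ours (in Python with NumPy; the helper names are illustrative, not from the original): the rows of the matrix Q are the distributions \(Q_{(i)}\), each supported exactly on its block \(C_{(i)}\), and the embedding is simply the linear map \(x\mapsto xQ\).

```python
import numpy as np

# Blocks C_(1) = {1} and C_(2) = {2, 3} of Omega_3, as in Fig. 1 below;
# row i of Q is the distribution Q_(i), supported exactly on C_(i).
Q = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.3, 0.7]])

def markov_embedding(x, Q):
    """Eq. (2): y^j = sum_i x^i Q_(i)^j, i.e. the linear map x -> x Q."""
    return x @ Q

x = np.array([0.4, 0.6])            # a point of S_1
y = markov_embedding(x, Q)          # (0.4, 0.18, 0.42), a point of S_2
assert np.isclose(y.sum(), 1.0)     # the image is again a distribution
```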

Fig. 1: A Markov embedding \(f: {\mathcal S}_{n-1}\rightarrow {\mathcal S}_{\ell -1}\) for \(n=2\) and \(\ell =3\), where \(C_{(1)}=\{1\}\), \(C_{(2)}=\{2, 3\}\), and \(Q_{(1)}=(1,0,0)\), \(Q_{(2)}=(0,Q^2,Q^3)\)

Since the image \(f({\mathcal S}_{n-1})\) of a Markov embedding \(f: {\mathcal S}_{n-1}\rightarrow {\mathcal S}_{\ell -1}\) is statistically isomorphic to the preimage \({\mathcal S}_{n-1}\) (due to the existence of a sufficient statistic), Chentsov claimed that the geometry of the submanifold \(f({\mathcal S}_{n-1})\) of \({\mathcal S}_{\ell -1}\) must be equivalent to the geometry of \({\mathcal S}_{n-1}\).

Based on these observations, Chentsov introduced the notion of invariance/equivariance, now usually referred to as the Markov invariance, as follows. A series \(\{g^{[n]}\}_{n\in {\mathbb N}}\) of Riemannian metrics, each on \({\mathcal S}_{n-1}\), is said to be invariant [6, p. 157] if

$$\begin{aligned} g_p^{[n]}(X,Y)=g_{f(p)}^{[\ell ]} (f_* X, f_* Y) \end{aligned}$$
(3)

holds for all \(n,\ell \in {\mathbb N}\) satisfying \(2\le n\le \ell \), Markov embeddings \(f:{\mathcal S}_{n-1}\rightarrow {\mathcal S}_{\ell -1}\), points \(p\in {\mathcal S}_{n-1}\), and vector fields \(X, Y\in \Gamma (T{\mathcal S}_{n-1})\), where \(f_*\) denotes the differential of f. Chentsov proved that, up to a constant factor, the only invariant metric satisfying (3) is the Fisher metric [6, Theorem 11.1]. For an accessible proof, see [5] (cf. [8]).
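As a numerical sanity check of (3) (an illustration of ours, not part of Chentsov's argument), write tangent vectors of \({\mathcal S}_{n-1}\) in the ambient coordinates of \({\mathbb R}^n\) as vectors whose components sum to zero; the Fisher metric then reads \(g_p(u,v)=\sum _{\omega } u({\omega })v({\omega })/p({\omega })\), and a Markov embedding, being linear, pushes u forward to uQ:

```python
import numpy as np

def fisher(p, u, v):
    """Fisher metric in ambient coordinates: g_p(u, v) = sum_w u_w v_w / p_w."""
    return np.sum(u * v / p)

Q = np.array([[1.0, 0.0, 0.0],     # Q_(1), supported on C_(1) = {1}
              [0.0, 0.3, 0.7]])    # Q_(2), supported on C_(2) = {2, 3}

p = np.array([0.4, 0.6])           # a point of S_1
u = np.array([1.0, -1.0])          # tangent vectors: components sum to zero
v = np.array([2.0, -2.0])

# f is linear, so f(p) = p Q and f_* u = u Q.
assert np.isclose(fisher(p, u, v), fisher(p @ Q, u @ Q, v @ Q))
```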

On the other hand, a series \(\{\nabla ^{[n]}\}_{n\in {\mathbb N}}\) of affine connections, each on \({\mathcal S}_{n-1}\), is said to be equivariant [6, p. 62] if

$$\begin{aligned} f_*\left( \nabla _X^{[n]}Y \right) _p= \left( \nabla ^{[\ell ]}_{f_* X} f_* Y \right) _{f(p)} \end{aligned}$$
(4)

holds for all \(n,\ell \in {\mathbb N}\) satisfying \(2\le n\le \ell \), Markov embeddings \(f:{\mathcal S}_{n-1}\rightarrow {\mathcal S}_{\ell -1}\), points \(p\in {\mathcal S}_{n-1}\), and vector fields \(X, Y\in \Gamma (T{\mathcal S}_{n-1})\). Chentsov proved that the only equivariant affine connections satisfying (4) are the \(\alpha \)-connections [6, Theorem 12.2]. For the reader’s convenience, we will give a proof, following the pattern of Chentsov’s original argument, in the Appendix. (An alternative proof, based on the weaker condition (5) below, can be found in [8].)

Chentsov’s theorem characterises all Markov invariant geometrical structures of the probability simplex \({\mathcal S}_{n-1}\), and is thus regarded as a cornerstone of information geometry. Nevertheless, it is natural to seek characterisations of Markov invariant tensor fields of generic type, as well as of Markov invariant geometrical structures of probability distributions on infinite sample spaces, both of which lie beyond the scope of Chentsov’s theorem.

Motivated by these considerations, this article aims to give two variations of Chentsov’s theorem. In Sect. 2, we extend the notion of Markov invariance to generic (r, s)-type tensor fields, and characterise all Markov invariant tensor fields on \({\mathcal S}_{n-1}\). This section also serves as a brief overview of the paper [7]. In Sect. 3, following Amari and Nagaoka’s idea sketched out in [2], we demonstrate that the Fisher metric and the \(\alpha \)-connections are the only natural Markov invariant geometrical structures of parametric models comprising continuous probability densities on the infinite sample space \({\mathbb R}^k\). Here, we employ only elementary calculus.

2 Markov invariant tensor fields of generic type

Since the image \(f({\mathcal S}_{n-1})\) of a Markov embedding \(f:{\mathcal S}_{n-1}\rightarrow {\mathcal S}_{\ell -1}\) is a submanifold of \({\mathcal S}_{\ell -1}\), its geometrical structure is canonically induced from the geometrical structure \((g^{[\ell ]}, \nabla ^{[\ell ]})\) of the ambient manifold \({\mathcal S}_{\ell -1}\). Specifically, the metric of \(f({\mathcal S}_{n-1})\) is induced from the metric \(g^{[\ell ]}\) by restricting it to the subspace \(T_{f(p)} f({\mathcal S}_{n-1}) \, (\subset T_{f(p)}{\mathcal S}_{\ell -1})\), and the connection of \(f({\mathcal S}_{n-1})\) is induced by projecting \((\nabla ^{[\ell ]}_{f_*X} f_*Y)_{f(p)}\) onto the subspace \(T_{f(p)} f({\mathcal S}_{n-1})\) with respect to \(g^{[\ell ]}\). Therefore, at first sight, the equivariance requirement (4) for a connection seems too strong, and one may instead define the Markov invariance for a connection in a weaker form as follows:

$$\begin{aligned} g^{[n]}_p(\nabla _X^{[n]}Y, Z)= g^{[\ell ]}_{f(p)}(\nabla ^{[\ell ]}_{f_* X} f_* Y, f_*Z). \end{aligned}$$
(5)

Nevertheless, it turns out that any sequence \(\{\nabla ^{[n]}\}_n\) of affine connections satisfying (5) enjoys the property \(\left( \nabla ^{[\ell ]}_{f_*X} f_*Y\right) _{f(p)}\in T_{f(p)} f({\mathcal S}_{n-1})\), and thus the requirements (4) and (5) are actually equivalent.

Since the sequence \(\overline{\nabla }^{[n]}\) of the Levi-Civita connections with respect to the Markov invariant Fisher metrics \(g^{[n]}\) automatically fulfils the requirement (5), the problem of characterising the Markov invariant connections is reduced to characterising the Markov invariant (0, 3)-type tensor fields

$$\begin{aligned} (X,Y,Z)\longmapsto g\left( (\nabla _XY-\overline{\nabla }_XY), Z \right) . \end{aligned}$$

Now, in much the same way as in the derivation of the Fisher metric:

$$\begin{aligned} g_p(X,Y):=E_p[(X\log p) (Y\log p)], \end{aligned}$$

where \(E_p[\,\cdot \,]\) denotes the expectation with respect to p, one can prove that, up to a constant factor, the only Markov invariant (0, 3)-type tensor field is given by

$$\begin{aligned} S_p(X,Y,Z):=E_p[(X\log p) (Y\log p) (Z\log p)], \end{aligned}$$
(6)

which is usually referred to as the Amari-Chentsov tensor. In this way, the \(\alpha \)-connection \(\nabla ^{(\alpha )}\) can also be defined by the formula

$$\begin{aligned} g(\nabla ^{(\alpha )}_X Y, Z) :=g(\overline{\nabla }_X Y, Z)-\frac{\alpha }{2} S(X,Y,Z) \qquad (\alpha \in {\mathbb R}). \end{aligned}$$

Note that the above argument naturally leads to a characterisation of Markov invariant (1, 2)-type tensor fields F(X, Y) through the relation \(g(F(X,Y),Z)=S(X,Y,Z)\). One may naturally generalise this idea to characterising Markov invariant (1, s)-type tensor fields in terms of Markov invariant \((0,s+1)\)-type tensor fields. However, one cannot simply extend this argument to generic (r, s)-type tensor fields. A question thus naturally arises: how can one characterise Markov invariant (r, s)-type tensor fields? The purpose of this section is to answer this question.
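The Markov invariance of the tensor (6) can likewise be checked numerically. In the same ambient coordinates as in the sketch after (3), \(S_p(u,v,w)=\sum _{\omega } u({\omega })v({\omega })w({\omega })/p({\omega })^2\); the following is again a sketch of ours, not from the paper:

```python
import numpy as np

def amari_chentsov(p, u, v, w):
    """S_p = E_p[(X log p)(Y log p)(Z log p)] = sum_w u_w v_w w_w / p_w^2."""
    return np.sum(u * v * w / p**2)

Q = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.3, 0.7]])    # the embedding of Fig. 1 again

p = np.array([0.4, 0.6])
u = np.array([1.0, -1.0])

assert np.isclose(amari_chentsov(p, u, u, u),
                  amari_chentsov(p @ Q, u @ Q, u @ Q, u @ Q))
```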

Associated with each Markov embedding \(f:\, {\mathcal S}_{n-1}\rightarrow {\mathcal S}_{\ell -1}\) is a unique affine map

$$\begin{aligned} \varphi _f: {\mathcal S}_{\ell -1}\longrightarrow {\mathcal S}_{n-1} :\, (y^1,\dots ,y^\ell )\longmapsto (x^1,\dots ,x^n) \end{aligned}$$

that satisfies

$$\begin{aligned} \varphi _f \circ f=\textrm{id}. \end{aligned}$$

In fact, it is explicitly given by the following relations

$$\begin{aligned} x^i=\sum _{j\in C_{(i)}} y^j \qquad (i=1,\dots ,n) \end{aligned}$$

that allocate each event \(C_{(i)} \,(\subset {\Omega }_\ell )\) to the singleton \(\{i\} \,(\subset {\Omega }_n)\). (For a proof, see [7].) We shall call the map \(\varphi _f\) the coarse-graining associated with the Markov embedding f. Note that the coarse-graining \(\varphi _f\) is determined only by the partition (1), and is independent of the internal ratios \(\{Q_{(i)}^j\}_{i,j}\) that specify f as in (2).

For example, let us consider a Markov embedding

$$\begin{aligned} f:{\mathcal S}_1\longrightarrow {\mathcal S}_3: (p_1,p_2)\longmapsto ({\lambda }p_1, (1-{\lambda }) p_1, {\mu }p_2, (1-{\mu }) p_2),\quad (0<{\lambda }, {\mu }<1) \end{aligned}$$

associated with the partition \({\Omega }_4=C_{(1)} \sqcup C_{(2)}\), where

$$\begin{aligned} C_{(1)}=\{1,2\},\quad C_{(2)}=\{3,4\}. \end{aligned}$$

Then, the coarse-graining \(\varphi _f: {\mathcal S}_3\rightarrow {\mathcal S}_1\) associated with f is given by

$$\begin{aligned} \varphi _f: (q_1,q_2,q_3, q_4)\longmapsto (q_1+q_2, q_3+q_4). \end{aligned}$$
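For this example, the identity \(\varphi _f\circ f=\textrm{id}\) is immediate to verify numerically (a small sketch of ours, with arbitrarily chosen \({\lambda }\) and \({\mu }\)):

```python
import numpy as np

lam, mu = 0.25, 0.8                  # arbitrary internal ratios, 0 < lam, mu < 1

def f(p):
    """The Markov embedding S_1 -> S_3 above, with C_(1) = {1,2}, C_(2) = {3,4}."""
    p1, p2 = p
    return np.array([lam*p1, (1 - lam)*p1, mu*p2, (1 - mu)*p2])

def phi_f(q):
    """The coarse-graining: sum the coordinates over each block C_(i)."""
    return np.array([q[0] + q[1], q[2] + q[3]])

p = np.array([0.3, 0.7])
assert np.allclose(phi_f(f(p)), p)   # phi_f o f = id, independently of lam, mu
```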

Now we introduce a generalised Markov invariance. A series \(\{F^{[n]}\}_{n\in {\mathbb N}}\) of (r, s)-type tensor fields, each on \({\mathcal S}_{n-1}\), is said to be Markov invariant if

$$\begin{aligned} F^{[n]}_p({\omega }^1,\dots , {\omega }^r, X_1,\dots , X_s) =F^{[\ell ]}_{f(p)}(\varphi ^*_f {\omega }^1,\dots , \varphi ^*_f{\omega }^r, f_* X_1,\dots , f_* X_s) \end{aligned}$$
(7)

holds for all \(n,\ell \in {\mathbb N}\) satisfying \(2\le n\le \ell \), Markov embeddings \(f:{\mathcal S}_{n-1}\rightarrow {\mathcal S}_{\ell -1}\), points \(p\in {\mathcal S}_{n-1}\), cotangent vectors \({\omega }^1,\dots , {\omega }^r\in T^*_p\,{\mathcal S}_{n-1}\), and tangent vectors \(X_1,\dots , X_s\in T_p\,{\mathcal S}_{n-1}\). When no confusion arises, we simply use the abridged notation F for \(F^{[n]}\).

The main result of this section is the following.

Theorem 1

Markov invariant tensor fields are closed under the operations of raising and lowering indices with respect to the Fisher metric g.

In order to prove Theorem 1, we need some preliminary considerations. Suppose we want to know whether the (1, 2)-type tensor field \(F^{i}_{\;\; jk}:=g^{im}S_{mjk}\) is Markov invariant in the sense of (7), where S is the Markov invariant (0, 3)-type tensor field defined by (6). Put differently, we want to investigate whether, for some (and then any) local coordinate system \(({\xi }^a)\) of \({\mathcal S}_{n-1}\), the (1, 2)-type tensor field F defined by \(\displaystyle F\left( d{\xi }^a,\frac{\partial }{\partial {\xi }^b}, \frac{\partial }{\partial {\xi }^c}\right) :=g^{ae} S_{ebc}\) satisfies

$$\begin{aligned} F_p\left( d{\xi }^a,\frac{\partial }{\partial {\xi }^b}, \frac{\partial }{\partial {\xi }^c} \right) =F_{f(p)}\left( \varphi ^*_f d{\xi }^a, f_*\frac{\partial }{\partial {\xi }^b}, f_*\frac{\partial }{\partial {\xi }^c} \right) . \end{aligned}$$
(8)

In order to handle such a relation, it is useful to identify the Fisher metric g on the manifold \({\mathcal S}_{n-1}\) and its inverse \(g^{-1}\) with the following linear maps:

$$\begin{aligned} g&:&T{\mathcal S}_{n-1} \longrightarrow T^*{\mathcal S}_{n-1}:\; \frac{\partial }{\partial {\xi }^a} \longmapsto g_{ab} \,d{\xi }^b, \\ g^{-1}&:&T^*{\mathcal S}_{n-1} \longrightarrow T{\mathcal S}_{n-1}:\; d{\xi }^a \longmapsto g^{ab} \frac{\partial }{\partial {\xi }^b}. \end{aligned}$$

Note that these maps do not depend on the choice of a local coordinate system \(({\xi }^a)\) of \({\mathcal S}_{n-1}\).
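Concretely, in the chart \({\xi }^a = p(a)\) (\(a=1,\dots ,n-1\)), a direct computation gives \(g_{ab}={\delta }_{ab}/{\xi }^a + 1/{\xi }^n\) with \({\xi }^n:=1-\sum _a {\xi }^a\), so the two maps above are simply multiplication by this matrix and by its inverse. A minimal numerical sketch of ours (the chart is our choice, for illustration):

```python
import numpy as np

def fisher_matrix(xi):
    """[g_ab] in the chart xi^a = p(a), a = 1, ..., n-1 (assumed chart)."""
    xi_n = 1.0 - xi.sum()
    return np.diag(1.0 / xi) + 1.0 / xi_n   # delta_ab / xi_a + 1 / xi_n

xi = np.array([0.2, 0.5])            # the point p = (0.2, 0.5, 0.3) of S_2
g = fisher_matrix(xi)                # matrix of g: lowers indices
g_inv = np.linalg.inv(g)             # matrix of g^{-1}: raises indices

omega = np.array([1.0, 0.0])         # components of the cotangent vector d(xi^1)
X = g_inv @ omega                    # the raised tangent vector g^{-1}(omega)
assert np.allclose(g @ X, omega)     # lowering the index brings it back
```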

Now, observe that

$$\begin{aligned} \text {the left-hand side of (8)}&= S_p\circ (g^{-1}_p\otimes I\otimes I)\left( d{\xi }^a,\frac{\partial }{\partial {\xi }^b}, \frac{\partial }{\partial {\xi }^c} \right) \\&= S_p \left( g^{ae}_p\frac{\partial }{\partial {\xi }^e},\frac{\partial }{\partial {\xi }^b}, \frac{\partial }{\partial {\xi }^c} \right) \end{aligned}$$

and

$$\begin{aligned} \text {the right-hand side of (8)}&= S_{f(p)}\circ (g^{-1}_{f(p)}\otimes I\otimes I)\left( \varphi ^*_f d{\xi }^a, f_*\frac{\partial }{\partial {\xi }^b}, f_*\frac{\partial }{\partial {\xi }^c} \right) \\&= S_{f(p)}\left( g^{-1}_{f(p)}(\varphi ^*_f d{\xi }^a), f_*\frac{\partial }{\partial {\xi }^b}, f_*\frac{\partial }{\partial {\xi }^c} \right) . \end{aligned}$$

Since the (0, 3)-type tensor field S is Markov invariant, the following Lemma establishes (8).

Lemma 2

For any Markov embedding \(f: {\mathcal S}_{n-1}\rightarrow {\mathcal S}_{\ell -1}\), it holds that

$$\begin{aligned} f_*\left( g^{ae}_p\,\frac{\partial }{\partial {\xi }^e} \right) =g^{-1}_{f(p)}(\varphi ^*_f d{\xi }^a). \end{aligned}$$
(9)

In other words, the diagram

$$\begin{CD} T^*_p\,{\mathcal S}_{n-1} @>{g^{-1}_p}>> T_p\,{\mathcal S}_{n-1} \\ @V{\varphi ^*_f}VV @VV{f_*}V \\ T^*_{f(p)}\,{\mathcal S}_{\ell -1} @>{g^{-1}_{f(p)}}>> T_{f(p)}\,{\mathcal S}_{\ell -1} \end{CD}$$

is commutative.

For the proof of Lemma 2, consult the original paper [7]. Lemma 2 has the following implication: raising indices with respect to the Fisher metric preserves Markov invariance. Note that this result is consistent with the observation in the opening paragraphs of this section, where the Markov invariance of the (1, 2)-type tensor field F was connected with the Markov invariance of the (0, 3)-type tensor field S.
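Identity (9) can also be verified numerically for the embedding of Fig. 1. In the charts \({\xi }=p(1)\) on \({\mathcal S}_1\) and \({\zeta }=(q(1),q(2))\) on \({\mathcal S}_2\) (our choice, for illustration), f reads \({\xi }\mapsto ({\xi }, 0.3(1-{\xi }))\), \(\varphi _f\) reads \({\zeta }\mapsto {\zeta }^1\), and hence \(\varphi ^*_f\, d{\xi }=d{\zeta }^1\). A sketch of ours:

```python
import numpy as np

def fisher_matrix(xi):
    """[g_ab] in the chart xi^a = p(a); cf. the earlier sketch."""
    xi = np.atleast_1d(np.asarray(xi, dtype=float))
    return np.diag(1.0 / xi) + 1.0 / (1.0 - xi.sum())

# The embedding of Fig. 1 in coordinates: f(xi) = (xi, 0.3*(1 - xi)).
xi = 0.4
J_f = np.array([1.0, -0.3])              # Jacobian (column) of f
pullback = np.array([1.0, 0.0])          # phi_f^* d(xi) = d(zeta^1)

g_inv_p = 1.0 / fisher_matrix(xi)[0, 0]  # g_p^{-1} on S_1 is a scalar
lhs = g_inv_p * J_f                      # f_*(g_p^{-1}(d xi)): the LHS of (9)

zeta = np.array([xi, 0.3 * (1.0 - xi)])  # coordinates of f(p) in S_2
rhs = np.linalg.inv(fisher_matrix(zeta)) @ pullback
assert np.allclose(lhs, rhs)             # identity (9) holds
```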

Let us proceed to the issue of lowering indices. Suppose that, given a Markov invariant (3, 0)-type tensor field T, we want to know whether the (2, 1)-type tensor field F defined by

$$\begin{aligned} F\left( \frac{\partial }{\partial {\xi }^a}, d{\xi }^b, d{\xi }^c \right) :=g_{ae} T^{ebc} \end{aligned}$$

satisfies Markov invariance:

$$\begin{aligned} F_p\left( \frac{\partial }{\partial {\xi }^a}, d{\xi }^b, d{\xi }^c\right) =F_{f(p)}\left( f_*\frac{\partial }{\partial {\xi }^a}, \varphi ^*_f d{\xi }^b, \varphi ^*_f d{\xi }^c\right) \end{aligned}$$

or equivalently

$$\begin{aligned} T_p\left( (g_p)_{ae} d{\xi }^e, d{\xi }^b, d{\xi }^c\right) =T_{f(p)}\left( g_{f(p)}\left( f_*\frac{\partial }{\partial {\xi }^a}\right) , \varphi ^*_f d{\xi }^b, \varphi ^*_f d{\xi }^c\right) . \end{aligned}$$

This question is resolved affirmatively by the following

Lemma 3

For any Markov embedding \(f: {\mathcal S}_{n-1}\rightarrow {\mathcal S}_{\ell -1}\), it holds that

$$\begin{aligned} \varphi ^*_f \left( (g_p)_{ae} \, d{\xi }^e\right) =g_{f(p)}\left( f_*\frac{\partial }{\partial {\xi }^a}\right) . \end{aligned}$$
(10)

In other words, the diagram

$$\begin{CD} T_p\,{\mathcal S}_{n-1} @>{g_p}>> T^*_p\,{\mathcal S}_{n-1} \\ @V{f_*}VV @VV{\varphi ^*_f}V \\ T_{f(p)}\,{\mathcal S}_{\ell -1} @>{g_{f(p)}}>> T^*_{f(p)}\,{\mathcal S}_{\ell -1} \end{CD}$$

is commutative.

Proof

Since g is an isomorphism, the identity (10) is an immediate consequence of Lemma 2. In fact, it follows from (9) that

$$\begin{aligned} f_*\left( \frac{\partial }{\partial {\xi }^e} \right) =g^{-1}_{f(p)} \left( \varphi ^*_f \left( (g_p)_{ea}\,d{\xi }^a \right) \right) , \end{aligned}$$

and thus

$$\begin{aligned} g_{f(p)}\left( f_* \frac{\partial }{\partial {\xi }^e} \right) =\varphi ^*_f \left( (g_p)_{ea}\,d{\xi }^a \right) , \end{aligned}$$

proving the claim. \(\square \)

Lemma 3 has the following implication: lowering indices with respect to the Fisher metric preserves Markov invariance. Theorem 1 is now an immediate consequence of Lemmas 2 and 3, together with the line of argument that precedes those lemmas.

Theorem 1 has a remarkable consequence: every (r, s)-type Markov invariant tensor field can be obtained by raising indices of some \((0,r+s)\)-type Markov invariant tensor field. For example, \(g^{ij}\) is, up to a constant factor, the only (2, 0)-type Markov invariant tensor field.

It may be worthwhile to mention that not every operation that is standard in tensor calculus preserves Markov invariance. The following example is due to Amari [1].

Example 4

With a \(\nabla ^{(+1)}\)-affine coordinate system \({\theta }=({\theta }^1,\dots ,{\theta }^{n-1})\) of \({\mathcal S}_{n-1}\) defined by

$$\begin{aligned} \log p({\omega })=\sum _{i=1}^{n-1} {\theta }^i{\delta }_i({\omega })-\log \left( 1+\sum _{k=1}^{n-1} \exp {\theta }^k \right) \qquad (\omega \in \Omega _n), \end{aligned}$$

the Amari-Chentsov tensor field (6) has the following components:

$$\begin{aligned} S_{ijk}=\begin{cases} {\eta }_i(1-{\eta }_i)(1-2{\eta }_i) &{}\quad (i=j=k) \\ -{\eta }_i (1-2{\eta }_i) {\eta }_k &{}\quad (i=j\ne k) \\ -{\eta }_j (1-2{\eta }_j) {\eta }_i &{}\quad (j=k\ne i) \\ -{\eta }_k (1-2{\eta }_k) {\eta }_j &{}\quad (k=i\ne j) \\ 2{\eta }_i {\eta }_j {\eta }_k &{}\quad (i\ne j\ne k\ne i). \end{cases} \end{aligned}$$

Here, \({\eta }=({\eta }_1,\dots ,{\eta }_{n-1})\) is a \(\nabla ^{(-1)}\)-affine coordinate system of \({\mathcal S}_{n-1}\) that is dual to \({\theta }\); more succinctly, \(\eta _i=p(i)\). By using the formula

$$\begin{aligned} g^{ij}=\frac{1}{{\eta }_n}+\frac{{\delta }^{ij}}{{\eta }_i} \qquad \left( {\eta }_n:=1-\sum _{i=1}^{n-1} {\eta }_i \right) , \end{aligned}$$

the (1, 2)-type tensor field \(T^{i}_{\;\; jk}:=g^{im}S_{mjk}\) is readily calculated as

$$\begin{aligned} T^{i}_{~ jk} =\begin{cases} 1-2{\eta }_i &{}\quad (i=j=k) \\ -{\eta }_k &{}\quad (i=j\ne k) \\ -{\eta }_j &{}\quad (i=k\ne j) \\ 0 &{}\quad (i\ne j,\,i\ne k). \end{cases} \end{aligned}$$
(11)

We know that T is Markov invariant (either from Theorem 1 or from the discussion in the opening paragraphs of this section). However, the following contracted (0, 1)-type tensor field

$$\begin{aligned} {\tilde{F}}_k:=T^{i}_{~ ik}=1-n{\eta }_k \end{aligned}$$

is non-zero; since the only Markov invariant (0, 1)-type tensor field is the zero field, \({\tilde{F}}\) is not Markov invariant. This demonstrates that contraction, a standard operation in tensor calculus, does not always preserve Markov invariance.
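The computations of this example are easy to replicate numerically. The following sketch of ours builds \(S_{ijk}\) and \(g^{ij}\) from the closed forms above and confirms the contraction \({\tilde{F}}_k=1-n{\eta }_k\), here for \(n=4\) at an arbitrarily chosen point:

```python
import numpy as np

n = 4                                    # work on S_3, so eta has n - 1 entries
eta = np.array([0.1, 0.2, 0.3])          # arbitrarily chosen point
eta_n = 1.0 - eta.sum()
d = len(eta)

# Components S_ijk and g^ij in the theta coordinates, as displayed above.
S = np.zeros((d, d, d))
for i in range(d):
    for j in range(d):
        for k in range(d):
            S[i, j, k] = (eta[i]*(1 - eta[i])*(1 - 2*eta[i]) if i == j == k
                          else -eta[i]*(1 - 2*eta[i])*eta[k] if i == j
                          else -eta[j]*(1 - 2*eta[j])*eta[i] if j == k
                          else -eta[k]*(1 - 2*eta[k])*eta[j] if k == i
                          else 2*eta[i]*eta[j]*eta[k])
g_inv = 1.0/eta_n + np.diag(1.0/eta)

T = np.einsum('im,mjk->ijk', g_inv, S)    # raise the first index, cf. (11)
F_tilde = np.einsum('iik->k', T)          # the contraction of (11)
assert np.allclose(F_tilde, 1.0 - n*eta)  # matches 1 - n*eta_k
```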

Chentsov’s idea of imposing the invariance of geometrical structures under Markov embeddings \(f: {\mathcal S}_{n-1}\rightarrow {\mathcal S}_{\ell -1}\) is based on the fact that \({\mathcal S}_{n-1}\) is statistically isomorphic to \(f({\mathcal S}_{n-1})\). Put differently, the Markov invariance only involves a direct comparison between \({\mathcal S}_{n-1}\) and its image \(f({\mathcal S}_{n-1})\), and has nothing to do with the complement of \(f({\mathcal S}_{n-1})\) in the ambient space \({\mathcal S}_{\ell -1}\). On the other hand, the partial trace operation \(T^{i}_{~ jk}\mapsto T^{i}_{~ ik}\) on \({\mathcal S}_{\ell -1}\) (more precisely, on \(T_{f(p)}{\mathcal S}_{\ell -1}\otimes T_{f(p)}^*{\mathcal S}_{\ell -1}\)) makes the output \(T^{i}_{~ ik}\) ‘contaminated’ with information from outside the submanifold \(f({\mathcal S}_{n-1})\). It is thus no wonder that such an influx of extra information manifests itself as the non-preservation of Markov invariance. In this respect, a distinctive characteristic of Lemmas 2 and 3 lies in the fact that raising and lowering indices preserve Markov invariance although they are represented in the form of contractions such as \(g^{i\ell }S_{mjk}\mapsto g^{im}S_{mjk}\) or \(g_{i\ell }T^{mjk}\mapsto g_{im}T^{mjk}\).

Remark 5

A related intriguing instance arises in curvature tensors: the Ricci curvature of the \(\alpha \)-connection \(\nabla ^{(\alpha )}\) of \({\mathcal S}_{n-1}\) is calculated to be

$$\begin{aligned} \textrm{Ric}^{\nabla ^{(\alpha )}}=(n-2)\,\frac{1-\alpha ^2}{4}\,g. \end{aligned}$$

In particular, the manifold \({\mathcal S}_{n-1}\) is an Einstein manifold for all \(\alpha \in {\mathbb R}\). Moreover, for \(n\ge 3\), the Ricci curvature divided by \(n-2\) is Markov invariant for all \(\alpha \in {\mathbb R}\). Note that when \(n=2\), the manifold \({\mathcal S}_{n-1}\) is one-dimensional and thus is flat for all \(\alpha \).

3 Geometry of manifolds of continuous probability densities

In their celebrated book, Amari and Nagaoka stated that it is not so easy to extend Chentsov’s theorem to the case when the underlying set \({\mathcal {X}}\) of outcomes is infinite [2, p. 38]. There have been several attempts to deal with infinite outcome spaces and/or general measure spaces, such as [3, 4, 9], but they are all technically demanding. Amari and Nagaoka also suggested a completely different approach to understanding the Fisher metric and the \(\alpha \)-connections of a parametric model \(M=\{p_\theta (x): \theta \in \Theta \subset {\mathbb R}^d,\, x\in {\mathcal {X}}\}\) from the viewpoint of Chentsov’s theorem, as follows [2, p. 39]:

First, let us finitely partition \({\mathcal {X}}\) into the regions \(\Delta _1,\Delta _2,\dots ,\Delta _n\). In other words, each \(\Delta _i\) is a subset of \({\mathcal {X}}\), \(\Delta _i\cap \Delta _j=\varnothing \) (\(i\ne j\)), and \(\bigcup _{i=1}^n \Delta _i={\mathcal {X}}\). Now fix a particular partition \(\Delta =\{\Delta _1, \Delta _2,\dots ,\Delta _n\}\) and let

$$\begin{aligned} P_\theta ^\Delta (i):=\int _{\Delta _i} p_\theta (x) dx. \end{aligned}$$

Then \(M^\Delta :=\{P_\theta ^\Delta (i)\}\) forms a model on \(\Delta \). Since \(\Delta \) is a finite set, from Chentsov’s theorem we know that the Fisher metric and the \(\alpha \)-connections are introduced on \(M^\Delta \) by the invariance requirement. Now we may consider M to be the limit of \(M^\Delta \) as \(\Delta \) becomes finer and finer. Hence, if we require that the desired metrics and connections on models should be “continuous” with respect to such a limit, it is concluded that the metric and the connections on M should be given by the limit of the Fisher metric and the \(\alpha \)-connections on \(M^\Delta \), and under some regularity condition they coincide with the Fisher metric and the \(\alpha \)-connections on M.

It is crucial to notice that the coarse-graining \(p_\theta \mapsto P_\theta ^\Delta \) does not in general admit a sufficient statistic. This is in striking contrast to the situation of the previous sections, where we treated Markov embeddings that warranted the existence of sufficient statistics. In order to realise the above programme, therefore, it is important to scrutinise the limiting procedure. However, the meaning of “the limit of \(M^\Delta \) as \(\Delta \) becomes finer and finer” is mathematically unclear, and to the best of the author’s knowledge, this limiting procedure has not been treated explicitly in the literature. The purpose of this section is to demonstrate Amari and Nagaoka’s programme when the underlying sample space is \({\mathcal {X}}={\mathbb R}^k\) and the density function \(p_\theta (x)\) is continuous in \(x\in {\mathbb R}^k\), a simple yet typical situation in statistics.

Let \(M=\{p_\theta (x) : \theta \in \Theta \subset {\mathbb R}^d,\,x\in {\mathbb R}^k\}\) be a d-dimensional parametric family of probability density functions on \({\mathbb R}^k\). We assume the following regularity conditions:

  (i) the support of \(p_\theta \) does not depend on \(\theta \);

  (ii) \(p_\theta (x)\) is differentiable in \(\theta \), and both \(p_\theta (x)\) and its derivative \(X p_\theta (x)\) are continuous in x for all \(\theta \in \Theta \) and \(X\in T_{p_\theta }M\);

  (iii) for all Jordan measurable domains \(A\subset {\mathbb R}^k\), \(\theta \in \Theta \), and \(X\in T_{p_\theta }M\),

  $$\begin{aligned} X \int _A p_\theta (x) dx=\int _A X p_\theta (x)dx; \end{aligned}$$

  (iv) for all \(\theta \in \Theta \) and \(X\in T_{p_\theta }M\), the Amari-Chentsov tensor

  $$\begin{aligned} S_\theta (X,X,X) = \int _{{\mathbb R}^k} p_\theta (x) \left( \frac{X p_\theta (x)}{p_\theta (x)} \right) ^3 dx \end{aligned}$$

  is absolutely convergent.

In condition (i), the support of \(p_\theta \) can be arbitrary; however, we assume in what follows that the support is \({\mathbb R}^k\) for concreteness. Let \(\Delta =\{\Delta _1, \Delta _2, \dots , \Delta _n\}\) be a Jordan measurable finite partition of \({\mathbb R}^k\) such that the interior of each \(\Delta _i\) is connected. We denote the totality of such finite partitions of \({\mathbb R}^k\) by \({\mathcal {I}}\). Note that \({\mathcal {I}}\) is a directed system endowed with the partial ordering \(\Delta \prec \Delta '\), which has the interpretation that \(\Delta '\) is finer than \(\Delta \).

Associated with a finite partition \(\Delta =\{\Delta _1, \Delta _2, \dots , \Delta _n\}\in {\mathcal {I}}\) is a parametric model \(M^\Delta =\{P_\theta ^\Delta \}_\theta \) on the finite set \(\Omega _n=\{1, 2, \dots , n\}\) defined by

$$\begin{aligned} P_\theta ^\Delta (i):=\int _{\Delta _i} p_\theta (x) dx \qquad (i\in \Omega _n). \end{aligned}$$

We are interested in the relationship between the original model \(M=\{p_\theta \}\) and the induced model \(M^\Delta =\{P_\theta ^\Delta \}\). Specifically, we want to know if the nets of the Fisher metrics \(\{g_\theta ^\Delta \}_\Delta \) and the Amari-Chentsov tensors \(\{S_\theta ^\Delta \}_\Delta \) on \(M^\Delta \) converge to the Fisher metric \(g_\theta \) and the Amari-Chentsov tensor \(S_\theta \) on M, respectively. The next theorem gives an affirmative answer to this question.

Theorem 6

Under regularity conditions (i)–(iv),

$$\begin{aligned} \lim _{\Delta \in {\mathcal {I}}}\, g_\theta ^\Delta (X,Y)=g_\theta (X,Y) \quad \text{ and }\quad \lim _{\Delta \in {\mathcal {I}}}\, S_\theta ^\Delta (X,Y,Z)=S_\theta (X,Y,Z) \end{aligned}$$

hold for all \(X,Y,Z\in T_{p_\theta }M\).

Theorem 6 could be paraphrased by saying that the Fisher metric and the \(\alpha \)-connections are the only natural Markov invariant geometrical structures of parametric models comprising continuous probability densities on \({\mathbb R}^k\).
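To illustrate Theorem 6 numerically (an illustration of ours, not part of the proof), take the Gaussian location model \(p_\theta ={\mathcal N}(\theta ,1)\) on \({\mathbb R}\), whose Fisher information is exactly 1, and refine a binning of the line. Since coarse-graining cannot increase Fisher information, each discretised value is at most 1, and the values approach 1 as the partition refines:

```python
import numpy as np
from scipy.stats import norm

theta = 0.0                    # N(theta, 1): the exact Fisher information is 1

def discretised_fisher(edges):
    """g^Delta(d/dtheta, d/dtheta) = sum_i (X P_i)^2 / P_i for the given bins."""
    P = np.diff(norm.cdf(edges - theta))     # P_i = integral of p_theta over bin i
    dP = -np.diff(norm.pdf(edges - theta))   # d/dtheta P_i, since dp/dtheta = -dp/dx
    return np.sum(dP**2 / P)

for n_bins in [2, 4, 16, 64, 256]:
    interior = np.linspace(-8.0, 8.0, n_bins - 1)
    edges = np.concatenate(([-np.inf], interior, [np.inf]))
    print(n_bins, discretised_fisher(edges))  # approaches 1 from below
```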

Before proceeding to the proof, we introduce some notation that is used throughout the proof. For \(R>0\), let \(B^R\) denote the closed ball of radius R in \({\mathbb R}^k\) centred at the origin, i.e.,

$$\begin{aligned} B^R:=\{x\in {\mathbb R}^k: |x|\le R\}. \end{aligned}$$

Given a finite partition \(\Delta \in {\mathcal {I}}\), let

$$\begin{aligned} \Delta ^R=\{\Delta _j^R\}_{j=1}^{n_1+n_2} \end{aligned}$$

denote a refinement of \(\Delta \) in \({\mathcal {I}}\) such that \(\{\Delta _j^R\}_{j=1}^{n_1}\) and \(\{\Delta _j^R\}_{j=n_1+1}^{n_1+n_2}\) are partitions of \(B^R\) and its complement \({\mathbb R}^k{\setminus } B^R\), respectively. Note that \(n_1\) and \(n_2\) may depend both on \(\Delta \) and R.

Now we proceed to the proof of Theorem 6. By virtue of the standard polarisation argument using the identity

$$\begin{aligned} g(X,Y)=\frac{1}{2}\left\{ g(X+Y, X+Y)-g(X,X)-g(Y,Y) \right\} \end{aligned}$$

and its analogue

$$\begin{aligned} S(X,Y,Z)&= \frac{1}{6}\bigl \{ S(X+Y+Z,X+Y+Z,X+Y+Z) \\&\quad -S(X+Y,X+Y,X+Y)-S(X+Z,X+Z,X+Z) \\&\quad -S(Y+Z,Y+Z,Y+Z) \\&\quad +S(X,X,X)+S(Y,Y,Y)+S(Z,Z,Z)\bigr \}, \end{aligned}$$

which are valid for symmetric tensors g and S, we see that Theorem 6 is proved simply by showing that

$$\begin{aligned} \lim _{\Delta \in {\mathcal {I}}}\, g_\theta ^\Delta (X,X)=g_\theta (X,X) \end{aligned}$$
(12)

and

$$\begin{aligned} \lim _{\Delta \in {\mathcal {I}}}\, S_\theta ^\Delta (X,X,X)=S_\theta (X,X,X) \end{aligned}$$
(13)

hold for all \(X\in T_{p_\theta }M\). Since the proof of (12) is almost the same as that of (13), we shall present only the latter here.
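Incidentally, the reduction above rests on the scalar identity \((x+y+z)^3-(x+y)^3-(x+z)^3-(y+z)^3+x^3+y^3+z^3=6xyz\) applied to the diagonal of the symmetric form S; a one-line symbolic check of ours, using SymPy:

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
# Cubic polarisation identity underlying the reduction of S(X,Y,Z)
# to diagonal values S(W,W,W):
expr = ((x + y + z)**3 - (x + y)**3 - (x + z)**3 - (y + z)**3
        + x**3 + y**3 + z**3)
assert sp.expand(expr) == 6*x*y*z
```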

The Amari–Chentsov tensor \(S_\theta ^{\Delta ^R}(X,X,X)\) of the induced model \(M^{\Delta ^R}=\{P_\theta ^{\Delta ^R}\}\) is decomposed into two parts:

$$\begin{aligned} S_\theta ^{\Delta ^R}(X,X,X) =\sum _{i=1}^{n_1} P_\theta ^{\Delta ^R}(i) \left( \frac{X P_\theta ^{\Delta ^R}(i)}{P_\theta ^{\Delta ^R}(i)} \right) ^3 +\sum _{i=n_1+1}^{n_1+n_2}P_\theta ^{\Delta ^R}(i) \left( \frac{X P_\theta ^{\Delta ^R}(i)}{P_\theta ^{\Delta ^R}(i)} \right) ^3. \end{aligned}$$
(14)

First, let us evaluate the first term on the right-hand side of (14).

Lemma 7

$$\begin{aligned} \lim _{\Delta \in {\mathcal {I}}}\, \sum _{i=1}^{n_1} P_\theta ^{\Delta ^R}(i) \left( \frac{X P_\theta ^{\Delta ^R}(i)}{P_\theta ^{\Delta ^R}(i)}\right) ^3 =\int _{B^R} p_\theta (x) \left( \frac{X p_\theta (x)}{p_\theta (x)} \right) ^3 dx. \end{aligned}$$

Proof

Due to the mean-value theorem, for each \(i=1,\dots , n_1\), there is an \(x_i \in \Delta _i^R\) such that

$$\begin{aligned} P_\theta ^{\Delta ^R}(i)=\int _{\Delta _i^R} p_\theta (x) dx=p_\theta (x_i)\, \mu (\Delta _i^R), \end{aligned}$$

where \(\mu (\Delta _i^R)\) is the Jordan measure of the region \(\Delta _i^R\). Similarly, for each \(i=1,\dots , n_1\), there is a \(\xi _i\in \Delta _i^R\) such that

$$\begin{aligned} X P_\theta ^{\Delta ^R}(i) =\int _{\Delta _i^R} X p_\theta (x) dx =X p_\theta (\xi _i) \, \mu (\Delta _i^R). \end{aligned}$$

Thus,

$$\begin{aligned} \sum _{i=1}^{n_1} P_\theta ^{\Delta ^R}(i) \left( \frac{X P_\theta ^{\Delta ^R}(i)}{P_\theta ^{\Delta ^R}(i)}\right) ^3 = \sum _{i=1}^{n_1} p_\theta (x_i) \left( \frac{Xp_\theta (\xi _i)}{p_\theta (x_i)}\right) ^3 \mu (\Delta _i^R), \end{aligned}$$

and Lemma 8 below proves the claim. \(\square \)

Lemma 8

Let f and g be continuous functions on a Jordan measurable bounded closed domain \(D \,(\subset {\mathbb R}^k)\). Given a Jordan measurable finite partition \(\Delta =\{\Delta _1,\dots ,\Delta _n\}\) of D, take arbitrary points \(x_i\) and \(\xi _i\) in \(\Delta _i\) for each \(i=1,\dots ,n\). Then

$$\begin{aligned} \lim _{\Delta }\, \sum _{i=1}^n f(x_i)g(\xi _i) \mu (\Delta _i)=\int _D f(x)g(x)dx, \end{aligned}$$

where the limit is taken over all Jordan measurable finite partitions \(\Delta \) of D.

Proof

Since

$$\begin{aligned} \sum _{i=1}^n f(x_i)g(\xi _i) \mu (\Delta _i) = \sum _{i=1}^n f(x_i)g(x_i) \mu (\Delta _i)+ \sum _{i=1}^n f(x_i)\{ g(\xi _i) -g(x_i)\}\mu (\Delta _i), \end{aligned}$$

it suffices to prove that

$$\begin{aligned} \lim _{\Delta }\, \sum _{i=1}^n f(x_i)\{ g(\xi _i) -g(x_i)\}\mu (\Delta _i)=0. \end{aligned}$$

We see from the Cauchy-Schwarz inequality that

$$\begin{aligned} \left( \sum _{i=1}^n f(x_i)\{ g(\xi _i) -g(x_i)\}\mu (\Delta _i) \right) ^2 \le \left( \sum _{i=1}^n f(x_i)^2 \mu (\Delta _i)\right) \left( \sum _{i=1}^n \{ g(\xi _i) -g(x_i)\}^2 \mu (\Delta _i) \right) . \end{aligned}$$

For \(\Delta =\{\Delta _1,\dots ,\Delta _n\}\), let \(|\Delta |:=\max _{1\le i\le n} |\Delta _i|\), where \( |\Delta _i|\) is the diameter of \(\Delta _i\). Since g is uniformly continuous on D, for any \(\varepsilon >0\), there exists \(\delta >0\) so that \(|\Delta |<\delta \) implies \(|g(\xi _i)-g(x_i)|<\varepsilon \) for all \(i=1,\dots ,n\). As a consequence,

$$\begin{aligned} \sum _{i=1}^n \{ g(\xi _i) -g(x_i)\}^2 \mu (\Delta _i)<\varepsilon ^2 \mu (D). \end{aligned}$$

Since

$$\begin{aligned} \lim _{\Delta }\, \sum _{i=1}^n f(x_i)^2\mu (\Delta _i)=\int _D f(x)^2 dx<\infty , \end{aligned}$$

the claim is verified. \(\square \)
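As a quick numerical sanity check of Lemma 8 (ours, not part of the original argument), mismatched tag points \(x_i\ne \xi _i\) still yield Riemann-type sums converging to \(\int _D fg\); here \(D=[0,1]\), \(f=\cos \), and \(g=\exp \):

```python
import numpy as np

# D = [0, 1], f = cos, g = exp; the exact value of int_D f g is
# (e*(cos 1 + sin 1) - 1) / 2.
exact = (np.e * (np.cos(1.0) + np.sin(1.0)) - 1.0) / 2.0

rng = np.random.default_rng(0)
for n in [10, 100, 1000, 10000]:
    edges = np.linspace(0.0, 1.0, n + 1)
    x_tag = rng.uniform(edges[:-1], edges[1:])    # tags x_i in Delta_i
    xi_tag = rng.uniform(edges[:-1], edges[1:])   # independent tags xi_i
    approx = np.sum(np.cos(x_tag) * np.exp(xi_tag) * np.diff(edges))
    print(n, abs(approx - exact))                 # error -> 0 as |Delta| -> 0
```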

We next evaluate the second term on the right-hand side of (14).

Lemma 9

$$\begin{aligned} \lim _{R\rightarrow \infty } \sum _{i=n_1+1}^{n_1+n_2} P_\theta ^{\Delta ^R}(i) \left| \frac{X P_\theta ^{\Delta ^R}(i)}{P_\theta ^{\Delta ^R}(i)}\right| ^3 =0. \end{aligned}$$

Proof

Since \({p_\theta (x)}/{P_\theta ^{\Delta ^R}(i)}\) is a probability density on the region \(\Delta _i^R\), we apply Jensen’s inequality to the convex function \(t\mapsto |t|^3\), to obtain

$$\begin{aligned} \frac{1}{P_\theta ^{\Delta ^R}(i)} \int _{\Delta _i^R} p_\theta (x) \left| \frac{X p_\theta (x)}{p_\theta (x)}\right| ^3 dx&\ge \left| \frac{1}{P_\theta ^{\Delta ^R}(i)} \int _{\Delta _i^R} p_\theta (x) \frac{X p_\theta (x)}{p_\theta (x)} dx\right| ^3 \\&= \left| \frac{1}{P_\theta ^{\Delta ^R}(i)} \int _{\Delta _i^R} X p_\theta (x) dx \right| ^3 = \left| \frac{X P_\theta ^{\Delta ^R}(i)}{P_\theta ^{\Delta ^R}(i)}\right| ^3. \end{aligned}$$

Consequently,

$$\begin{aligned} \int _{\Delta _i^R} p_\theta (x) \left| \frac{X p_\theta (x)}{p_\theta (x)}\right| ^3 dx \ge P_\theta ^{\Delta ^R}(i)\left| \frac{X P_\theta ^{\Delta ^R}(i) }{P_\theta ^{\Delta ^R}(i)}\right| ^3. \end{aligned}$$

Taking the sum over \(i=n_1+1,\dots , n_1+n_2\), we have

$$\begin{aligned} \int _{{\mathbb R}^k\backslash B^R}\, p_\theta (x) \left| \frac{X p_\theta (x)}{p_\theta (x)}\right| ^3 dx \ge \sum _{i=n_1+1}^{n_1+n_2} P_\theta ^{\Delta ^R}(i)\left| \frac{X P_\theta ^{\Delta ^R}(i) }{P_\theta ^{\Delta ^R}(i)}\right| ^3. \end{aligned}$$

Since regularity condition (iv) implies

$$\begin{aligned} \lim _{R\rightarrow \infty } \int _{{\mathbb R}^k{\setminus } B^R}\, p_\theta (x) \left| \frac{X p_\theta (x)}{p_\theta (x)}\right| ^3 dx=0, \end{aligned}$$

we have the claim. \(\square \)

Applying Lemmas 7 and 9 to (14), we conclude that for any \(\varepsilon > 0\), there exist \(\Delta \in {\mathcal {I}}\) and \(R>0\) such that \(\Delta ^R \prec \Delta '\) implies

$$\begin{aligned} \left| S_\theta ^{\Delta '}(X,X,X)-S_\theta (X,X,X) \right| <\varepsilon . \end{aligned}$$

This completes the proof of (13).

Remark 10

The continuity of the density \(p_\theta (x)\) and its derivative \(Xp_\theta (x)\) in regularity condition (ii) is introduced solely for the sake of simplicity, and can be loosened depending on the situation. For example, Theorem 6 is still valid even if \(p_\theta (x)\) and \(Xp_\theta (x)\) have finitely many discontinuity points.