Abstract
Chentsov’s theorem, which characterises Markov invariant Riemannian metrics and affine connections on manifolds of probability distributions on finite sample spaces, is undoubtedly a cornerstone of information geometry. This article aims at providing a comprehensible survey of Chentsov’s theorem as well as its modest extensions to generic tensor fields and to parametric models comprising continuous probability densities on \({\mathbb R}^k\).
1 Introduction
For each natural number n satisfying \(n\ge 2\), let
$$\begin{aligned} {\mathcal S}_{n-1} := \left\{ p: {\Omega }_n \rightarrow {\mathbb R}_{++} \; \Big | \; \sum _{i=1}^n p(i) = 1 \right\} \end{aligned}$$
be the manifold of probability distributions on a finite sample space \({\Omega }_n=\{1,2,\dots ,n\}\), where \({\mathbb R}_{++}\) denotes the set of strictly positive real numbers. The manifold \({\mathcal S}_{n-1}\) is sometimes called the \((n-1)\)-dimensional probability simplex. In what follows, we identify each point \(p\in {\mathcal S}_{n-1}\) with the numerical vector \((p(1),p(2),\dots ,p(n) )\in {\mathbb R}_{++}^n\).
In his seminal book [6], Chentsov characterised Riemannian metrics g and affine connections \(\nabla \) on \({\mathcal S}_{n-1}\) that fulfil a certain invariance property, now usually referred to as the Markov invariance. Given natural numbers n and \(\ell \) satisfying \(2\le n\le \ell \), let
$$\begin{aligned} {\Omega }_\ell = C_{(1)} \sqcup C_{(2)} \sqcup \dots \sqcup C_{(n)} \end{aligned}$$
(1)
be a direct sum decomposition of the index set \({\Omega }_\ell =\{1,\dots ,\ell \}\) into n mutually disjoint nonempty subsets \(C_{(1)},\dots ,C_{(n)}\). A map
$$\begin{aligned} f:\, {\mathcal S}_{n-1} \longrightarrow {\mathcal S}_{\ell -1} \end{aligned}$$
is called a Markov embeddingFootnote 1 associated with the partition (1) if it takes the form
$$\begin{aligned} f(p) = \sum _{i=1}^{n} p(i)\, Q_{(i)}, \end{aligned}$$
(2)
where \(Q_{(i)}=(Q_{(i)}^1,Q_{(i)}^2,\dots ,Q_{(i)}^\ell )\) is a probability distribution on \(\Omega _\ell \) whose support is \(C_{(i)}\) for each i (\(1\le i \le n\)). In other words, the image \(f({\mathcal S}_{n-1})\) is (the interior of) the convex hull of the n extreme points \(\{ Q_{(i)} \}_{1\le i\le n}\) in \({\mathcal S}_{\ell -1}\). A simple example of a Markov embedding is illustrated in Fig. 1, where \(n=2\) and \(\ell =3\).
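The algebraic content of the definition is just multiplication by a row-stochastic matrix whose rows have disjoint supports. The following minimal numerical sketch (not part of the original argument; it assumes NumPy, and the partition and the internal weights \(Q_{(i)}^j\) are arbitrary illustrative choices) checks that the image of a Markov embedding is again a probability distribution and that each cell \(C_{(i)}\) carries exactly the mass p(i):

```python
import numpy as np

rng = np.random.default_rng(0)

n, ell = 3, 7
# Illustrative partition of Omega_7 (0-based indices): C_(1), C_(2), C_(3).
blocks = [[0, 1], [2, 3, 4], [5, 6]]

# Row-stochastic n x ell matrix Q: row i is a distribution Q_(i) supported on C_(i).
Q = np.zeros((n, ell))
for i, block in enumerate(blocks):
    w = rng.random(len(block)) + 0.1          # strictly positive weights
    Q[i, block] = w / w.sum()

def markov_embed(p, Q):
    """Markov embedding: f(p) = sum_i p(i) Q_(i), i.e. p -> p @ Q."""
    return p @ Q

p = np.array([0.2, 0.5, 0.3])
q = markov_embed(p, Q)

assert np.isclose(q.sum(), 1.0) and np.all(q > 0)   # q is again a distribution
# Each cell of the partition carries total mass p(i).
for i, block in enumerate(blocks):
    assert np.isclose(q[block].sum(), p[i])
```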
Since the image \(f({\mathcal S}_{n-1})\) of a Markov embedding \(f: {\mathcal S}_{n-1}\rightarrow {\mathcal S}_{\ell -1}\) is statistically isomorphic to the preimage \({\mathcal S}_{n-1}\) (due to the existence of a sufficient statistic), Chentsov claimed that the geometry of the submanifold \(f({\mathcal S}_{n-1})\) of \({\mathcal S}_{\ell -1}\) must be equivalent to the geometry of \({\mathcal S}_{n-1}\).
Based on these observations, Chentsov introduced the notion of invariance/equivariance, now usually referred to as the Markov invariance, as follows. A series \(\{g^{[n]}\}_{n\in {\mathbb N}}\) of Riemannian metrics, each on \({\mathcal S}_{n-1}\), is said to be invariant [6, p. 157] if
$$\begin{aligned} g^{[\ell ]}_{f(p)} ( f_*X,\, f_*Y ) = g^{[n]}_p (X, Y) \end{aligned}$$
(3)
holds for all \(n,\ell \in {\mathbb N}\) satisfying \(2\le n\le \ell \), Markov embeddings \(f:{\mathcal S}_{n-1}\rightarrow {\mathcal S}_{\ell -1}\), points \(p\in {\mathcal S}_{n-1}\), and vector fields \(X, Y\in \Gamma (T{\mathcal S}_{n-1})\), where \(f_*\) denotes the differential of f. Chentsov proved that, up to a constant factor, the only invariant metric satisfying (3) is the Fisher metric [6, Theorem 11.1]. For an accessible proof, see [5] (cf., [8]).
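The invariance of the Fisher metric can also be verified numerically: in the ambient coordinates, \(g_p(u,v)=\sum _i u_i v_i/p(i)\) for tangent vectors u, v whose entries sum to zero, and the differential of the affine map f acts as \(u\mapsto uQ\). A small sketch (assuming NumPy; the embedding is an arbitrary illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(1)

n, ell = 3, 7
blocks = [[0, 1], [2, 3, 4], [5, 6]]
Q = np.zeros((n, ell))
for i, b in enumerate(blocks):
    w = rng.random(len(b)) + 0.1
    Q[i, b] = w / w.sum()

def fisher(p, u, v):
    """Fisher metric on the simplex: g_p(u, v) = sum_i u_i v_i / p_i,
    for tangent vectors u, v (numerical vectors summing to zero)."""
    return np.sum(u * v / p)

p = np.array([0.2, 0.5, 0.3])
u = np.array([0.1, -0.06, -0.04])     # tangent vector: entries sum to 0
v = np.array([-0.02, 0.05, -0.03])

# Since f is affine, its differential acts as u -> u @ Q.
lhs = fisher(p @ Q, u @ Q, v @ Q)     # metric of S_6 evaluated at f(p)
rhs = fisher(p, u, v)                 # metric of S_2 evaluated at p
assert np.isclose(lhs, rhs)           # the invariance holds exactly
```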
On the other hand, a series \(\{\nabla ^{[n]}\}_{n\in {\mathbb N}}\) of affine connections, each on \({\mathcal S}_{n-1}\), is said to be equivariant [6, p. 62] if
$$\begin{aligned} f_*\bigl ( (\nabla ^{[n]}_X Y)_p \bigr ) = \bigl ( \nabla ^{[\ell ]}_{f_*X}\, f_*Y \bigr )_{f(p)} \end{aligned}$$
(4)
holds for all \(n,\ell \in {\mathbb N}\) satisfying \(2\le n\le \ell \), Markov embeddings \(f:{\mathcal S}_{n-1}\rightarrow {\mathcal S}_{\ell -1}\), points \(p\in {\mathcal S}_{n-1}\), and vector fields \(X, Y\in \Gamma (T{\mathcal S}_{n-1})\). Chentsov proved that the only equivariant affine connections satisfying (4) are the \(\alpha \)-connections [6, Theorem 12.2]. For the reader’s convenience, we will give a proof, following the pattern of Chentsov’s original argument, in Appendix. (An alternative proof based on a weaker condition (5) below is found in [8].)
Chentsov’s theorem characterises all Markov invariant geometrical structures of the probability simplex \({\mathcal S}_{n-1}\), and thus is regarded as a cornerstone of information geometry. Nevertheless, it is natural to seek characterisations of Markov invariant tensor fields of generic types and/or geometrical structures of probability spaces on infinite sample spaces, both of which are beyond the scope of Chentsov’s theorem.
Motivated by these considerations, this article aims to give two variations of Chentsov’s theorem. In Sect. 2, we extend the notion of Markov invariance to generic (r, s)-type tensor fields, and characterise all Markov invariant tensor fields on \({\mathcal S}_{n-1}\). This section also serves as a brief overview of the paper [7]. In Sect. 3, following Amari and Nagaoka’s idea sketched out in [2], we demonstrate that the Fisher metric and the \(\alpha \)-connections are the only natural Markov invariant geometrical structures of parametric models comprising continuous probability densities on infinite sample spaces \({\mathbb R}^k\). Here, we employ only elementary calculus.
2 Markov invariant tensor fields of generic types
Since the image \(f({\mathcal S}_{n-1})\) of a Markov embedding \(f:{\mathcal S}_{n-1}\rightarrow {\mathcal S}_{\ell -1}\) is a submanifold of \({\mathcal S}_{\ell -1}\), its geometrical structure is canonically induced from the geometrical structure \((g^{[\ell ]}, \nabla ^{[\ell ]})\) of the ambient manifold \({\mathcal S}_{\ell -1}\). Specifically, the metric of \(f({\mathcal S}_{n-1})\) is induced from the metric \(g^{[\ell ]}\) by restricting it to the subspace \(T_{f(p)} f({\mathcal S}_{n-1}) \, (\subset T_{f(p)}{\mathcal S}_{\ell -1})\), and the connection of \(f({\mathcal S}_{n-1})\) is induced by projecting \((\nabla ^{[\ell ]}_{f_*X} f_*Y)_{f(p)}\) onto the subspace \(T_{f(p)} f({\mathcal S}_{n-1})\) with respect to \(g^{[\ell ]}\). Therefore, at first sight, the equivariance requirement (4) for a connection seems too strong, and one may instead define the Markov invariance for a connection in a weaker form as follows:
$$\begin{aligned} f_*\bigl ( (\nabla ^{[n]}_X Y)_p \bigr ) = \Pi _{f(p)} \bigl ( \nabla ^{[\ell ]}_{f_*X}\, f_*Y \bigr )_{f(p)}, \end{aligned}$$
(5)
where \(\Pi _{f(p)}\) denotes the orthogonal projection of \(T_{f(p)}{\mathcal S}_{\ell -1}\) onto the subspace \(T_{f(p)} f({\mathcal S}_{n-1})\) with respect to \(g^{[\ell ]}\).
Nevertheless, it turns out that any sequence \(\{\nabla ^{[n]}\}_n\) of affine connections satisfying (5) enjoys the property \(\bigl ( \nabla ^{[\ell ]}_{f_*X} f_*Y \bigr )_{f(p)}\in T_{f(p)} f({\mathcal S}_{n-1})\), and thus the requirements (4) and (5) are actually equivalent.
Since the sequence \(\{\overline{\nabla }^{[n]}\}_n\) of the Levi-Civita connections with respect to the Markov invariant Fisher metrics \(\{g^{[n]}\}_n\) automatically fulfils the requirement (5), the problem of characterising the Markov invariant connections is reduced to characterising the Markov invariant (0, 3)-type tensor fields
$$\begin{aligned} S^{[n]}(X,Y,Z) := g^{[n]}\bigl ( \nabla ^{[n]}_X Y - \overline{\nabla }^{[n]}_X Y,\; Z \bigr ). \end{aligned}$$
Now, in a quite similar way to the derivation of the Fisher metric:
$$\begin{aligned} g_p(X,Y) = E_p\bigl [ (X\log p)(Y\log p) \bigr ], \end{aligned}$$
where \(E_p[\,\cdot \,]\) denotes the expectation with respect to p, one can prove that, up to a constant factor, the only Markov invariant (0, 3)-type tensor field is given by
$$\begin{aligned} S_p(X,Y,Z) = E_p\bigl [ (X\log p)(Y\log p)(Z\log p) \bigr ], \end{aligned}$$
(6)
which is usually referred to as the Amari-Chentsov tensor. In this way, the \(\alpha \)-connection \(\nabla ^{(\alpha )}\) can also be defined by the formula
$$\begin{aligned} g\bigl ( \nabla ^{(\alpha )}_X Y,\, Z \bigr ) = g\bigl ( \overline{\nabla }_X Y,\, Z \bigr ) - \frac{\alpha }{2}\, S(X,Y,Z). \end{aligned}$$
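In the ambient coordinates, the Amari-Chentsov tensor reads \(S_p(u,v,w)=\sum _i u_i v_i w_i/p(i)^2\), and its Markov invariance can be checked numerically in the same way as for the Fisher metric (a sketch assuming NumPy, with an arbitrary illustrative embedding):

```python
import numpy as np

rng = np.random.default_rng(2)

n, ell = 3, 7
blocks = [[0, 1], [2, 3, 4], [5, 6]]
Q = np.zeros((n, ell))
for i, b in enumerate(blocks):
    w = rng.random(len(b)) + 0.1
    Q[i, b] = w / w.sum()

def ac_tensor(p, u, v, w):
    """Amari-Chentsov tensor on the simplex: S_p(u,v,w) = sum_i u_i v_i w_i / p_i^2,
    the coordinate form of E_p[(X log p)(Y log p)(Z log p)]."""
    return np.sum(u * v * w / p**2)

p = np.array([0.2, 0.5, 0.3])
u = np.array([0.1, -0.06, -0.04])     # tangent vectors: entries sum to 0
v = np.array([-0.02, 0.05, -0.03])
w = np.array([0.03, -0.01, -0.02])

sc = ac_tensor(p, u, v, w)
sl = ac_tensor(p @ Q, u @ Q, v @ Q, w @ Q)
assert np.isclose(sl, sc)             # Markov invariance of S
```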
Note that the above argument naturally leads to a characterisation of Markov invariant (1, 2)-type tensor fields F(X, Y) through the relation \(g(F(X,Y),Z)=S(X,Y,Z)\). One may naturally generalise this idea to characterising Markov invariant (1, s)-type tensor fields in terms of Markov invariant \((0,s+1)\)-type tensor fields. However, one cannot simply extend this argument to generic (r, s)-type tensor fields. Now, a question naturally arises: how can one characterise Markov invariant (r, s)-type tensor fields? The purpose of this section is to answer this question.
Associated with each Markov embedding \(f:\, {\mathcal S}_{n-1}\rightarrow {\mathcal S}_{\ell -1}\) is a unique affine map
$$\begin{aligned} \varphi _f:\, {\mathcal S}_{\ell -1} \longrightarrow {\mathcal S}_{n-1} \end{aligned}$$
that satisfies
$$\begin{aligned} \varphi _f \circ f = \textrm{id}_{{\mathcal S}_{n-1}}. \end{aligned}$$
In fact, it is explicitly given by the following relations
$$\begin{aligned} \varphi _f(q)(i) = \sum _{j\in C_{(i)}} q(j), \qquad i=1,\dots ,n, \end{aligned}$$
that allocate each event \(C_{(i)} \,(\subset {\Omega }_\ell )\) to the singleton \(\{i\} \,(\subset {\Omega }_n)\). (For a proof, see [7].) We shall call the map \(\varphi _f\) the coarse-graining associated with a Markov embedding f. Note that the coarse-graining \(\varphi _f\) is determined only by the partition (1), and is independent of the internal ratios \(\{Q_{(i)}^j\}_{i,j}\) that specify f as in (2).
For example, let us consider a Markov embedding
associated with the partition \({\Omega }_4=C_{(1)} \sqcup C_{(2)}\), where
Then, the coarse-graining \(\varphi _f: {\mathcal S}_3\rightarrow {\mathcal S}_1\) associated with f is given by
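Since the concrete cells of this example are not reproduced here, the following sketch (assuming NumPy) uses the hypothetical partition \(C_{(1)}=\{1,2\}\), \(C_{(2)}=\{3,4\}\) to verify both \(\varphi _f\circ f=\textrm{id}\) and the independence of \(\varphi _f\) from the internal ratios:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical partition of Omega_4 into two cells (0-based indices);
# an illustrative choice, not the paper's concrete example.
blocks = [[0, 1], [2, 3]]

def random_embedding(blocks, ell):
    """Random Markov embedding compatible with the given partition."""
    Q = np.zeros((len(blocks), ell))
    for i, b in enumerate(blocks):
        w = rng.random(len(b)) + 0.1
        Q[i, b] = w / w.sum()
    return Q

def coarse_grain(q, blocks):
    """phi_f: sum the coordinates of q over each cell C_(i) of the partition."""
    return np.array([q[b].sum() for b in blocks])

p = np.array([0.35, 0.65])
Q1 = random_embedding(blocks, 4)
Q2 = random_embedding(blocks, 4)        # same partition, different internal ratios

assert np.allclose(coarse_grain(p @ Q1, blocks), p)   # phi_f o f = id
assert np.allclose(coarse_grain(p @ Q2, blocks), p)   # phi_f depends only on the partition
```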
Now we introduce a generalised Markov invariance. A series \(\{F^{[n]}\}_{n\in {\mathbb N}}\) of (r, s)-type tensor fields, each on \({\mathcal S}_{n-1}\), is said to be Markov invariant if
$$\begin{aligned} F^{[\ell ]}_{f(p)}\bigl ( \varphi _f^{\,*}{\omega }^1,\dots ,\varphi _f^{\,*}{\omega }^r,\, f_*X_1,\dots ,f_*X_s \bigr ) = F^{[n]}_p\bigl ( {\omega }^1,\dots ,{\omega }^r,\, X_1,\dots ,X_s \bigr ) \end{aligned}$$
(7)
holds for all \(n,\ell \in {\mathbb N}\) satisfying \(2\le n\le \ell \), Markov embeddings \(f:{\mathcal S}_{n-1}\rightarrow {\mathcal S}_{\ell -1}\), points \(p\in {\mathcal S}_{n-1}\), cotangent vectors \({\omega }^1,\dots , {\omega }^r\in T^*_p\,{\mathcal S}_{n-1}\), and tangent vectors \(X_1,\dots , X_s\in T_p\,{\mathcal S}_{n-1}\). When no confusion arises, we simply use an abridged notation F for \(F^{[n]}\).
The main result of this section is the following.
Theorem 1
Markov invariant tensor fields are closed under the operations of raising and lowering indices with respect to the Fisher metric g.
In order to prove Theorem 1, we need some preliminary considerations. Suppose we want to know whether the (1, 2)-type tensor field \(F^{i}_{\;\; jk}:=g^{im}S_{mjk}\) is Markov invariant in the sense of (7), where S is the Markov invariant (0, 3)-type tensor field defined by (6). Put differently, we want to investigate if, for some (then any) local coordinate system \(({\xi }^a)\) of \({\mathcal S}_{n-1}\), the (1, 2)-type tensor field F defined by \(\displaystyle F\left( d{\xi }^a,\frac{\partial }{\partial {\xi }^b}, \frac{\partial }{\partial {\xi }^c}\right) :=g^{ae} S_{ebc}\) exhibits
$$\begin{aligned} F^{[\ell ]}_{f(p)}\bigl ( \varphi _f^{\,*}{\omega },\, f_*X,\, f_*Y \bigr ) = F^{[n]}_p ( {\omega }, X, Y ). \end{aligned}$$
(8)
In order to handle such a relation, it is useful to identify the Fisher metric g on the manifold \({\mathcal S}_{n-1}\) and its inverse \(g^{-1}\) with the following linear maps:
$$\begin{aligned} g_p:\, T_p{\mathcal S}_{n-1} \rightarrow T_p^*{\mathcal S}_{n-1}:\, X \mapsto g_p(X, \cdot \,) \end{aligned}$$
and its inverse \(g_p^{-1}:\, T_p^*{\mathcal S}_{n-1} \rightarrow T_p{\mathcal S}_{n-1}\).
Note that these maps do not depend on the choice of a local coordinate system \(({\xi }^a)\) of \({\mathcal S}_{n-1}\).
Now, observe that
$$\begin{aligned} F^{[n]}_p({\omega }, X, Y) = S^{[n]}_p\bigl ( g_p^{-1}{\omega },\, X,\, Y \bigr ) \end{aligned}$$
and
$$\begin{aligned} F^{[\ell ]}_{f(p)}\bigl ( \varphi _f^{\,*}{\omega },\, f_*X,\, f_*Y \bigr ) = S^{[\ell ]}_{f(p)}\bigl ( g_{f(p)}^{-1}\varphi _f^{\,*}{\omega },\, f_*X,\, f_*Y \bigr ). \end{aligned}$$
Since the (0, 3)-type tensor field S is Markov invariant, the following Lemma establishes (8).
Lemma 2
For any Markov embedding \(f: {\mathcal S}_{n-1}\rightarrow {\mathcal S}_{\ell -1}\), it holds that
$$\begin{aligned} g_{f(p)} \circ f_* = \varphi _f^{\,*} \circ g_p. \end{aligned}$$
(9)
In other words, the diagram
is commutative.
For the proof of Lemma 2, consult the original paper [7]. Lemma 2 has the following implication: raising indices with respect to the Fisher metric preserves Markov invariance. Note that this result is consistent with the observation in the opening paragraphs of this section where the Markov invariance of a (1, 2)-type tensor field F was connected with the Markov invariance of the (0, 3)-type tensor field S.
Let us proceed to the issue of lowering indices. Suppose that, given a Markov invariant (3, 0)-type tensor field T, we want to know whether the (2, 1)-type tensor field F defined by
$$\begin{aligned} F({\omega }^1, {\omega }^2, X) := T\bigl ( {\omega }^1,\, {\omega }^2,\, g_p(X) \bigr ) \end{aligned}$$
satisfies Markov invariance:
$$\begin{aligned} F^{[\ell ]}_{f(p)}\bigl ( \varphi _f^{\,*}{\omega }^1,\, \varphi _f^{\,*}{\omega }^2,\, f_*X \bigr ) = F^{[n]}_p\bigl ( {\omega }^1, {\omega }^2, X \bigr ), \end{aligned}$$
or equivalently
$$\begin{aligned} T^{[\ell ]}\bigl ( \varphi _f^{\,*}{\omega }^1,\, \varphi _f^{\,*}{\omega }^2,\, g_{f(p)}(f_*X) \bigr ) = T^{[n]}\bigl ( {\omega }^1,\, {\omega }^2,\, g_p(X) \bigr ). \end{aligned}$$
This question is resolved affirmatively by the following
Lemma 3
For any Markov embedding \(f: {\mathcal S}_{n-1}\rightarrow {\mathcal S}_{\ell -1}\), it holds that
$$\begin{aligned} f_* \circ g_p^{-1} = g_{f(p)}^{-1} \circ \varphi _f^{\,*}. \end{aligned}$$
(10)
In other words, the diagram
is commutative.
Proof
Since g is an isomorphism, the identity (10) is an immediate consequence of Lemma 2. In fact, it follows from (9) that
$$\begin{aligned} f_* = g_{f(p)}^{-1} \circ \varphi _f^{\,*} \circ g_p, \end{aligned}$$
and thus
$$\begin{aligned} f_* \circ g_p^{-1} = g_{f(p)}^{-1} \circ \varphi _f^{\,*}, \end{aligned}$$
proving the claim. \(\square \)
Lemma 3 has the following implication: lowering indices with respect to the Fisher metric preserves Markov invariance. Theorem 1 is now an immediate consequence of Lemmas 2 and 3, as well as the line of arguments that precede those lemmas.
Theorem 1 has a remarkable consequence: every (r, s)-type Markov invariant tensor field can be obtained by raising indices of some \((0,r+s)\)-type Markov invariant tensor field. For example, \(g^{ij}\) is, up to scaling, the only (2, 0)-type Markov invariant tensor field.
It may be worthwhile to mention that not every operation that is standard in tensor calculus preserves Markov invariance. The following example is due to Amari [1].
Example 4
With a \(\nabla ^{(+1)}\)-affine coordinate system \({\theta }=({\theta }^1,\dots ,{\theta }^{n-1})\) of \({\mathcal S}_{n-1}\) defined by
$$\begin{aligned} \theta ^i := \log \frac{p(i)}{p(n)}, \qquad i=1,\dots ,n-1, \end{aligned}$$
the Amari-Chentsov tensor field (6) has the following components:
$$\begin{aligned} S_{ijk} = \delta _{ij}\delta _{jk}\,\eta _i - \delta _{ij}\,\eta _i\eta _k - \delta _{jk}\,\eta _j\eta _i - \delta _{ki}\,\eta _k\eta _j + 2\,\eta _i\eta _j\eta _k. \end{aligned}$$
Here, \({\eta }=({\eta }_1,\dots ,{\eta }_{n-1})\) is a \(\nabla ^{(-1)}\)-affine coordinate system of \({\mathcal S}_{n-1}\) that is dual to \({\theta }\); more succinctly, \(\eta _i=p(i)\). By using the formula
$$\begin{aligned} g^{ij} = \frac{\delta ^{ij}}{\eta _i} + \frac{1}{\eta _n}, \end{aligned}$$
the (1, 2)-type tensor field \(T^{i}_{\;\; jk}:=g^{im}S_{mjk}\) is readily calculated asFootnote 2
$$\begin{aligned} T^{i}_{\;\; jk} = \delta ^i_j\delta ^i_k - \delta ^i_j\,\eta _k - \delta ^i_k\,\eta _j. \end{aligned}$$
(11)
We know that T is Markov invariant (either from Theorem 1 or from the discussion in the opening paragraphs of this section). However, the following contracted (0, 1)-type tensor field
$$\begin{aligned} T^{i}_{\;\; ik} = 1 - n\,\eta _k \end{aligned}$$
is non-zero, and hence is not Markov invariant.Footnote 3 This demonstrates that the contraction, which is a standard operation in tensor calculus, does not always preserve Markov invariance.
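The computations of this example can be reproduced numerically from the expectation formulas alone, since \(\partial _i \log p(\omega ) = \delta _i(\omega ) - \eta _i\) in the coordinate system \(\theta \). The sketch below (assuming NumPy; the point \(\eta \) is an arbitrary illustrative choice) recovers the contraction \(T^{i}_{\;\; ik} = 1 - n\eta _k\):

```python
import numpy as np

n = 4                                   # S_3: distributions on {1, 2, 3, 4}
eta = np.array([0.1, 0.2, 0.3])         # eta_i = p(i), i = 1, ..., n-1
p = np.append(eta, 1.0 - eta.sum())     # full distribution, p(n) last

# Score functions d_i(w) = d log p(w) / d theta^i = delta_i(w) - eta_i
# in the exponential coordinates theta^i = log(p(i)/p(n)).
d = np.zeros((n - 1, n))
for i in range(n - 1):
    d[i] = -eta[i]
    d[i, i] += 1.0

g = np.einsum('w,iw,jw->ij', p, d, d)               # Fisher metric g_ij
S = np.einsum('w,iw,jw,kw->ijk', p, d, d, d)        # Amari-Chentsov S_ijk
T = np.einsum('im,mjk->ijk', np.linalg.inv(g), S)   # T^i_jk = g^{im} S_mjk

contraction = np.einsum('iik->k', T)    # partial trace T^i_ik
# The contraction equals 1 - n*eta_k, hence is non-zero: it cannot be
# the (unique, identically zero) Markov invariant (0,1)-type tensor field.
assert np.allclose(contraction, 1.0 - n * eta)
```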
Chentsov’s idea of imposing the invariance of geometrical structures under Markov embeddings \(f: {\mathcal S}_{n-1}\rightarrow {\mathcal S}_{\ell -1}\) is based on the fact that \({\mathcal S}_{n-1}\) is statistically isomorphic to \(f({\mathcal S}_{n-1})\). Put differently, the Markov invariance only involves a direct comparison between \({\mathcal S}_{n-1}\) and its image \(f({\mathcal S}_{n-1})\), and has nothing to do with the complement of \(f({\mathcal S}_{n-1})\) in the ambient space \({\mathcal S}_{\ell -1}\). On the other hand, the partial trace operation \(T^{i}_{~ jk}\mapsto T^{i}_{~ ik}\) on \({\mathcal S}_{\ell -1}\) (more precisely, on \(T_{f(p)}{\mathcal S}_{\ell -1}\otimes T_{f(p)}^*{\mathcal S}_{\ell -1}\)) makes the output \(T^{i}_{~ ik}\) ‘contaminated’ with information from outside the submanifold \(f({\mathcal S}_{n-1})\). It is thus no wonder that such an influx of extra information manifests itself as the non-preservation of Markov invariance. In this respect, a distinctive characteristic of Lemmas 2 and 3 lies in the fact that raising and lowering indices preserve Markov invariance although they are represented in the form of contractions such as \(S_{mjk}\mapsto g^{im}S_{mjk}\) or \(T^{mjk}\mapsto g_{im}T^{mjk}\).
Remark 5
A related intriguing instance arises in curvature tensors: the Ricci curvature of the \(\alpha \)-connection \(\nabla ^{(\alpha )}\) of \({\mathcal S}_{n-1}\) is calculated to be
$$\begin{aligned} \textrm{Ric}^{(\alpha )} = \frac{(n-2)(1-\alpha ^2)}{4}\, g. \end{aligned}$$
Specifically, the manifold \({\mathcal S}_{n-1}\) is Einstein for all \(\alpha \in {\mathbb R}\). Moreover, for \(n\ge 3\), the Ricci curvature divided by \(n-2\) is Markov invariant for all \(\alpha \in {\mathbb R}\). Note that when \(n=2\), the manifold \({\mathcal S}_{n-1}\) is one-dimensional and thus is flat for all \(\alpha \).
3 Geometry of manifolds of continuous probability densities
In their celebrated book, Amari and Nagaoka stated that it is not so easy to extend Chentsov’s theorem to the case when the underlying set \({\mathcal {X}}\) of outcomes is infinite [2, p. 38]. There have been several attempts to deal with infinite outcome spaces and/or general measure spaces such as [3, 4, 9], but they are all technically demanding. Amari and Nagaoka also suggested a completely different approach to comprehend the Fisher metric and the \(\alpha \)-connections of a parametric model \(M=\{p_\theta (x): \theta \in \Theta \subset {\mathbb R}^d,\, x\in {\mathcal {X}}\}\) from the viewpoint of Chentsov’s theorem as followsFootnote 4 [2, p. 39]:
First, let us finitely partition \({\mathcal {X}}\) into the regions \(\Delta _1,\Delta _2,\dots ,\Delta _n\). In other words, each \(\Delta _i\) is a subset of \({\mathcal {X}}\), \(\Delta _i\cap \Delta _j=\varnothing \) (\(i\ne j\)), and \(\bigcup _{i=1}^n \Delta _i={\mathcal {X}}\). Now fix a particular partition \(\Delta =\{\Delta _1, \Delta _2,\dots ,\Delta _n\}\) and let
$$\begin{aligned} P_\theta ^\Delta (i) := \int _{\Delta _i} p_\theta (x)\, dx \qquad (i=1,\dots ,n). \end{aligned}$$
Then \(M^\Delta :=\{P_\theta ^\Delta (i)\}\) forms a model on \(\Delta \). Since \(\Delta \) is a finite set, from Chentsov’s theorem we know that the Fisher metric and the \(\alpha \)-connections are introduced on \(M^\Delta \) by the invariance requirement. Now we may consider M to be the limit of \(M^\Delta \) as \(\Delta \) becomes finer and finer. Hence, if we require that the desired metrics and connections on models should be “continuous” with respect to such a limit, it is concluded that the metric and the connections on M should be given by the limit of the Fisher metric and the \(\alpha \)-connections on \(M^\Delta \), and under some regularity condition they coincide with the Fisher metric and the \(\alpha \)-connections on M.
It is crucial to notice that the coarse-graining \(p_\theta \mapsto P_\theta ^\Delta \) does not in general have a sufficient statistic.Footnote 5 This is in striking contrast to the situation of the previous sections, where we treated Markov embeddings that warranted the existence of sufficient statistics. In order to realise the above programme, therefore, it is important to scrutinise the limiting procedure. However, the meaning of “the limit of \(M^\Delta \) as \(\Delta \) becomes finer and finer” is mathematically unclear, and to the best of the author’s knowledge, this limiting procedure has not been treated explicitly in the literature. The purpose of this section is to demonstrate Amari and Nagaoka’s programme when the underlying sample space is \({\mathcal {X}}={\mathbb R}^k\) and the density function \(p_\theta (x)\) is continuous in \(x\in {\mathbb R}^k\), a simple yet typical situation in statistics.
Let \(M=\{p_\theta (x) : \theta \in \Theta \subset {\mathbb R}^d,\,x\in {\mathbb R}^k\}\) be a d-dimensional parametric family of probability density functions on \({\mathbb R}^k\). We assume the following regularity conditions:
(i) the support of \(p_\theta \) does not depend on \(\theta \);

(ii) \(p_\theta (x)\) is differentiable in \(\theta \), and both \(p_\theta (x)\) and its derivative \(X p_\theta (x)\) are continuous in x for all \(\theta \in \Theta \) and \(X\in T_{p_\theta }M\);

(iii) for all Jordan measurableFootnote 6 domains \(A\subset {\mathbb R}^k\), \(\theta \in \Theta \), and \(X\in T_{p_\theta }M\),
$$\begin{aligned} X \int _A p_\theta (x)\, dx = \int _A X p_\theta (x)\, dx; \end{aligned}$$

(iv) for all \(\theta \in \Theta \) and \(X\in T_{p_\theta }M\), the Amari-Chentsov tensor
$$\begin{aligned} S_\theta (X,X,X) = \int _{{\mathbb R}^k} p_\theta (x) \left( \frac{X p_\theta (x)}{p_\theta (x)} \right) ^3 dx \end{aligned}$$
is absolutely convergent.
In condition (i), the support of \(p_\theta \) can be arbitrary; however, we assume in what follows that the support is \({\mathbb R}^k\) for concreteness. Let \(\Delta =\{\Delta _1, \Delta _2, \dots , \Delta _n\}\) be a Jordan measurable finite partition of \({\mathbb R}^k\) such that the interior of each \(\Delta _i\) is open and connected. We denote the totality of such finite partitions of \({\mathbb R}^k\) by \({\mathcal {I}}\). Note that \({\mathcal {I}}\) is a directed systemFootnote 7 endowed with the partial ordering \(\Delta \prec \Delta '\), meaning that \(\Delta '\) is finer than \(\Delta \).
Associated with a finite partition \(\Delta =\{\Delta _1, \Delta _2, \dots , \Delta _n\}\in {\mathcal {I}}\) is a parametric model \(M^\Delta =\{P_\theta ^\Delta \}_\theta \) on the finite set \(\Omega _n=\{1, 2, \dots , n\}\) defined by
$$\begin{aligned} P_\theta ^\Delta (i) := \int _{\Delta _i} p_\theta (x)\, dx \qquad (i=1,\dots ,n). \end{aligned}$$
We are interested in the relationship between the original model \(M=\{p_\theta \}\) and the induced model \(M^\Delta =\{P_\theta ^\Delta \}\). Specifically, we want to know if the nets of the Fisher metrics \(\{g_\theta ^\Delta \}_\Delta \) and the Amari-Chentsov tensors \(\{S_\theta ^\Delta \}_\Delta \) on \(M^\Delta \) converge to the Fisher metric \(g_\theta \) and the Amari-Chentsov tensor \(S_\theta \) on M, respectively. The next theorem gives an affirmative answer to this question.
Theorem 6
Under regularity conditions (i)–(iv),
$$\begin{aligned} \lim _{\Delta \in {\mathcal {I}}}\, g_\theta ^{\Delta }(X,Y) = g_\theta (X,Y) \quad \text{ and }\quad \lim _{\Delta \in {\mathcal {I}}}\, S_\theta ^{\Delta }(X,Y,Z) = S_\theta (X,Y,Z) \end{aligned}$$
hold for all \(X,Y,Z\in T_{p_\theta }M\).
Theorem 6 could be paraphrased by saying that the Fisher metric and the \(\alpha \)-connections are the only natural Markov invariant geometrical structures of parametric models comprising continuous probability densities on \({\mathbb R}^k\).
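Theorem 6 can be illustrated on the Gaussian location model \(p_\theta (x)=N(\theta ,1)\) on \({\mathbb R}\), whose Fisher information is identically 1: the Fisher information of the discretised model \(M^\Delta \) increases towards 1 as the partition is refined. A minimal sketch (not part of the proof; the partition of \([-R,R]\) into equal cells plus two tails is an illustrative choice):

```python
import math

def Phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def phi(x):
    """Standard normal density."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def discretised_fisher(theta, R, n_cells):
    """Fisher information of the induced model M^Delta, where Delta partitions
    [-R, R] into n_cells equal cells plus the two tails."""
    edges = [-math.inf] + [-R + 2.0 * R * i / n_cells for i in range(n_cells + 1)] + [math.inf]
    info = 0.0
    for a, b in zip(edges[:-1], edges[1:]):
        P = Phi(b - theta) - Phi(a - theta)      # P_theta^Delta(i)
        dP = phi(a - theta) - phi(b - theta)     # derivative of P in theta
        if P > 0.0:
            info += dP * dP / P
    return info

coarse = discretised_fisher(0.3, R=2.0, n_cells=4)
fine = discretised_fisher(0.3, R=8.0, n_cells=2000)

# Refining the partition increases the information, which approaches
# the continuous Fisher information 1 from below (data processing).
assert coarse < fine < 1.0
assert abs(fine - 1.0) < 1e-3
```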
Before proceeding to the proof, we introduce some notation that is used throughout the proof. For \(R>0\), let \(B^R\) denote the closed ball of radius R in \({\mathbb R}^k\) centred at the origin, i.e.,
$$\begin{aligned} B^R := \{ x\in {\mathbb R}^k : |x|\le R \}. \end{aligned}$$
Given a finite partition \(\Delta \in {\mathcal {I}}\), let
$$\begin{aligned} \Delta ^R = \{ \Delta _1^R,\dots ,\Delta _{n_1}^R,\ \Delta _{n_1+1}^R,\dots ,\Delta _{n_1+n_2}^R \} \end{aligned}$$
denote a refinement of \(\Delta \) in \({\mathcal {I}}\) such that \(\{\Delta _j^R\}_{j=1}^{n_1}\) and \(\{\Delta _j^R\}_{j=n_1+1}^{n_1+n_2}\) are partitions of \(B^R\) and its complement \({\mathbb R}^k{\setminus } B^R\), respectively. Note that \(n_1\) and \(n_2\) may depend both on \(\Delta \) and R.
Now we proceed to the proof of Theorem 6. By virtue of the standard polarisation argument using the identity
$$\begin{aligned} g(X,Y) = \frac{1}{4}\bigl \{ g(X+Y, X+Y) - g(X-Y, X-Y) \bigr \} \end{aligned}$$
and its analogue
which are valid for symmetric tensors g and S, we see that Theorem 6 is proved simply by showing that
$$\begin{aligned} \lim _{\Delta \in {\mathcal {I}}}\, g_\theta ^{\Delta }(X,X) = g_\theta (X,X) \end{aligned}$$
(12)
and
$$\begin{aligned} \lim _{\Delta \in {\mathcal {I}}}\, S_\theta ^{\Delta }(X,X,X) = S_\theta (X,X,X) \end{aligned}$$
(13)
for all \(X\in T_{p_\theta }M\). Since the proof of (12) is almost the same as that of (13), we shall present only the latter here.
The Amari-Chentsov tensor \(S_\theta ^{\Delta ^R}(X,X,X)\) of the induced model \(M^{\Delta ^R}=\{P_\theta ^{\Delta ^R}\}\) is decomposed into two parts:
$$\begin{aligned} S_\theta ^{\Delta ^R}(X,X,X) = \sum _{i=1}^{n_1} \frac{\bigl ( X P_\theta ^{\Delta ^R}(i) \bigr )^3}{\bigl ( P_\theta ^{\Delta ^R}(i) \bigr )^2} + \sum _{i=n_1+1}^{n_1+n_2} \frac{\bigl ( X P_\theta ^{\Delta ^R}(i) \bigr )^3}{\bigl ( P_\theta ^{\Delta ^R}(i) \bigr )^2}. \end{aligned}$$
(14)
Firstly, let us evaluate the first term of the right-hand side of (14).
Lemma 7

$$\begin{aligned} \lim _{\Delta \in {\mathcal {I}}}\, \sum _{i=1}^{n_1} \frac{\bigl ( X P_\theta ^{\Delta ^R}(i) \bigr )^3}{\bigl ( P_\theta ^{\Delta ^R}(i) \bigr )^2} = \int _{B^R} p_\theta (x) \left( \frac{X p_\theta (x)}{p_\theta (x)} \right) ^3 dx, \end{aligned}$$
where the limit is taken over refinements of the partition of \(B^R\).
Proof
Due to the mean-value theorem, for each \(i=1,\dots , n_1\), there is an \(x_i \in \Delta _i^R\) such that
$$\begin{aligned} X P_\theta ^{\Delta ^R}(i) = X p_\theta (x_i)\, \mu (\Delta _i^R), \end{aligned}$$
where \(\mu (\Delta _i^R)\) is the Jordan measure of the region \(\Delta _i^R\). Similarly, for each \(i=1,\dots , n_1\), there is a \(\xi _i\in \Delta _i^R\) such that
$$\begin{aligned} P_\theta ^{\Delta ^R}(i) = p_\theta (\xi _i)\, \mu (\Delta _i^R). \end{aligned}$$
Thus,
$$\begin{aligned} \sum _{i=1}^{n_1} \frac{\bigl ( X P_\theta ^{\Delta ^R}(i) \bigr )^3}{\bigl ( P_\theta ^{\Delta ^R}(i) \bigr )^2} = \sum _{i=1}^{n_1} \frac{\bigl ( X p_\theta (x_i) \bigr )^3}{p_\theta (\xi _i)^2}\, \mu (\Delta _i^R), \end{aligned}$$
and the next Lemma 8 proves the claim. \(\square \)
Lemma 8
Let f and g be continuous functions on a Jordan measurable bounded closed domain \(D \,(\subset {\mathbb R}^k)\). Given a Jordan measurable finite partition \(\Delta =\{\Delta _1,\dots ,\Delta _n\}\) of D, take arbitrary points \(x_i\) and \(\xi _i\) in \(\Delta _i\) for each \(i=1,\dots ,n\). Then
$$\begin{aligned} \lim _{\Delta }\, \sum _{i=1}^n f(x_i)\, g(\xi _i)\, \mu (\Delta _i) = \int _D f(x)\, g(x)\, dx, \end{aligned}$$
where the limit is taken over all Jordan measurable finite partitions \(\Delta \) of D.
Proof
Since
$$\begin{aligned} \lim _{\Delta }\, \sum _{i=1}^n f(x_i)\, g(x_i)\, \mu (\Delta _i) = \int _D f(x)\, g(x)\, dx, \end{aligned}$$
it suffices to prove that
$$\begin{aligned} \lim _{\Delta }\, \sum _{i=1}^n f(x_i)\, \bigl \{ g(\xi _i) - g(x_i) \bigr \}\, \mu (\Delta _i) = 0. \end{aligned}$$
We see from the Cauchy-Schwarz inequality that
$$\begin{aligned} \left| \sum _{i=1}^n f(x_i) \bigl \{ g(\xi _i) - g(x_i) \bigr \} \mu (\Delta _i) \right| \le \sqrt{ \sum _{i=1}^n f(x_i)^2\, \mu (\Delta _i) }\, \sqrt{ \sum _{i=1}^n \bigl \{ g(\xi _i) - g(x_i) \bigr \}^2\, \mu (\Delta _i) }. \end{aligned}$$
For \(\Delta =\{\Delta _1,\dots ,\Delta _n\}\), let \(|\Delta |:=\max _{1\le i\le n} |\Delta _i|\), where \( |\Delta _i|\) is the diameter of \(\Delta _i\). Since g is uniformly continuous on D, for any \(\varepsilon >0\), there exists \(\delta >0\) so that \(|\Delta |<\delta \) implies \(|g(\xi _i)-g(x_i)|<\varepsilon \) for all \(i=1,\dots ,n\). As a consequence,
$$\begin{aligned} \sqrt{ \sum _{i=1}^n \bigl \{ g(\xi _i) - g(x_i) \bigr \}^2\, \mu (\Delta _i) } \le \varepsilon \sqrt{\mu (D)}. \end{aligned}$$
Since
$$\begin{aligned} \lim _{\Delta }\, \sum _{i=1}^n f(x_i)^2\, \mu (\Delta _i) = \int _D f(x)^2\, dx < \infty , \end{aligned}$$
the claim is verified. \(\square \)
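Lemma 8 is a statement about Riemann-type sums in which the two factors are sampled at different tags in each cell. A one-dimensional numerical illustration (the choice \(f=\sin \), \(g=\cos \) on [0, 1] is arbitrary):

```python
import math
import random

random.seed(4)

f = math.sin
g = math.cos     # the product f*g has exact integral (sin 1)^2 / 2 on [0, 1]

def two_tag_sum(n):
    """Riemann-type sum with *different* tags x_i, xi_i in each cell, as in Lemma 8."""
    total = 0.0
    for i in range(n):
        a, b = i / n, (i + 1) / n
        x = random.uniform(a, b)     # tag for f
        xi = random.uniform(a, b)    # an independent tag for g
        total += f(x) * g(xi) * (b - a)
    return total

exact = math.sin(1.0) ** 2 / 2.0
# The two-tag sum converges to the integral of the product f*g.
assert abs(two_tag_sum(20000) - exact) < 1e-2
```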
We next evaluate the second term of the right-hand side of (14).
Lemma 9

For any \(R>0\),
$$\begin{aligned} \left| \sum _{i=n_1+1}^{n_1+n_2} \frac{\bigl ( X P_\theta ^{\Delta ^R}(i) \bigr )^3}{\bigl ( P_\theta ^{\Delta ^R}(i) \bigr )^2} \right| \le \int _{{\mathbb R}^k{\setminus } B^R} p_\theta (x) \left| \frac{X p_\theta (x)}{p_\theta (x)} \right| ^3 dx, \end{aligned}$$
and the right-hand side tends to zero as \(R\rightarrow \infty \).
Proof
Since \({p_\theta (x)}/{P_\theta ^{\Delta ^R}(i)}\) is a probability density on the region \(\Delta _i^R\), we apply Jensen’s inequality to the convex function \(t\mapsto |t|^3\) to obtain
$$\begin{aligned} \left| \frac{X P_\theta ^{\Delta ^R}(i)}{P_\theta ^{\Delta ^R}(i)} \right| ^3 \le \frac{1}{P_\theta ^{\Delta ^R}(i)} \int _{\Delta _i^R} \left| \frac{X p_\theta (x)}{p_\theta (x)} \right| ^3 p_\theta (x)\, dx. \end{aligned}$$
Consequently,
$$\begin{aligned} \frac{\bigl | X P_\theta ^{\Delta ^R}(i) \bigr |^3}{\bigl ( P_\theta ^{\Delta ^R}(i) \bigr )^2} \le \int _{\Delta _i^R} p_\theta (x) \left| \frac{X p_\theta (x)}{p_\theta (x)} \right| ^3 dx. \end{aligned}$$
Taking the sum over \(i=n_1+1,\dots , n_1+n_2\), we have
$$\begin{aligned} \left| \sum _{i=n_1+1}^{n_1+n_2} \frac{\bigl ( X P_\theta ^{\Delta ^R}(i) \bigr )^3}{\bigl ( P_\theta ^{\Delta ^R}(i) \bigr )^2} \right| \le \int _{{\mathbb R}^k{\setminus } B^R} p_\theta (x) \left| \frac{X p_\theta (x)}{p_\theta (x)} \right| ^3 dx. \end{aligned}$$
Since regularity condition (iv) implies
$$\begin{aligned} \lim _{R\rightarrow \infty } \int _{{\mathbb R}^k{\setminus } B^R} p_\theta (x) \left| \frac{X p_\theta (x)}{p_\theta (x)} \right| ^3 dx = 0, \end{aligned}$$
we have the claim. \(\square \)
Applying Lemmas 7 and 9 to (14), we conclude that for any \(\varepsilon > 0\), there exist \(\Delta \in {\mathcal {I}}\) and \(R>0\) such that \(\Delta ^R \prec \Delta '\) implies
$$\begin{aligned} \bigl | S_\theta ^{\Delta '}(X,X,X) - S_\theta (X,X,X) \bigr | < \varepsilon . \end{aligned}$$
This completes the proof of (13).
Remark 10
The continuity of the density \(p_\theta (x)\) and its derivative \(Xp_\theta (x)\) in regularity condition (ii) is introduced solely for the sake of simplicity, and can be loosened depending on the situation. For example, Theorem 6 is still valid even if \(p_\theta (x)\) and \(Xp_\theta (x)\) have finitely many discontinuity points.
Data availability
Data sharing is not applicable to this article as no datasets were generated or analysed during the current study.
Notes
Chentsov called such an embedding a congruent embedding [6, Lemma 9.5].
Incidentally, the coefficients of \(\alpha \)-connections in the coordinate system \(\theta \) are related to \(S_{ijk}\) and \(T^{i}_{~ jk}\) as follows:
$$\begin{aligned} \displaystyle \Gamma _{ij,k}^{(\alpha )}=\frac{1-\alpha }{2} S_{ijk} \quad \text{ and }\quad \Gamma _{ij}^{(\alpha )k}=\frac{1-\alpha }{2} T^{k}_{~ ij}. \end{aligned}$$
Recall that the only Markov invariant (0, 1)-type tensor field on \({\mathcal S}_{n-1}\) is zero.
Notations are slightly changed according to the context of the present article.
Amari and Nagaoka mentioned this fact in the original Japanese edition of [2].
A bounded set \(A\,(\subset {\mathbb R}^k)\) is called Jordan measurable if the inner Jordan measure of A (the supremum of volumes of nonoverlapping left-closed rectangles that belong to A) equals the outer Jordan measure of A (the infimum of volumes of nonoverlapping left-closed rectangles that cover A). In the present article, we further make an extended use of this terminology: a set \(A\,(\subset {\mathbb R}^k)\), which can be unbounded, is called Jordan measurable if \(A\cap \{x\in {\mathbb R}^k : |x|\le R\}\) is Jordan measurable for all \(R>0\).
A directed system is an index set \({\mathcal {I}}\) together with a partial ordering \(\prec \) which satisfies the condition that if \(\alpha , \beta \in {\mathcal {I}}\), then there exists \(\gamma \in {\mathcal {I}}\) so that \(\alpha \prec \gamma \) and \(\beta \prec \gamma \). A net \(\{x_\alpha \}_{\alpha \in {\mathcal {I}}}\) in a topological space X (i.e., a mapping from a directed system \({\mathcal {I}}\) to X) is said to converge to a point \(x\in X\), written \(\lim _{\alpha \in {\mathcal {I}}}\,x_\alpha =x\), if for any neighbourhood \({\mathcal {N}}_x\) of x, there exists a \(\beta \in {\mathcal {I}}\) so that \(\beta \prec \alpha \) implies \(x_\alpha \in {\mathcal {N}}_x\). See [10] for more information.
References
Amari, S.-I.: Private communication (2015)
Amari, S.-I., Nagaoka, H.: Methods of Information Geometry, Translations of Mathematical Monographs 191 (AMS and Oxford, Providence, 2000); Originally Published in Japanese. Iwanami Shoten, Tokyo (1993)
Ay, N., Jost, J., Lê, H.V., Schwachhöfer, L.: Information geometry and sufficient statistics. Probab. Theory Relat. Fields 162, 327–364 (2015)
Bauer, M., Bruveris, M., Michor, P.W.: Uniqueness of the Fisher–Rao metric on the space of smooth densities. Bull. Lond. Math. Soc. 48, 499–506 (2016)
Campbell, L.L.: An extended Čencov characterization of the information metric. Proc. Am. Math. Soc. 98, 135–141 (1986)
Čencov, N.N.: Statistical Decision Rules and Optimal Inference, Translations of Mathematical Monographs 53 (AMS, Providence, 1982); Originally Published in Russian. Nauka, Moscow (1972)
Fujiwara, A.: Complementing Chentsov’s characterization. In: Ay, N. (ed.) Information Geometry and Its Applications, Springer Proceedings in Mathematics & Statistics, vol. 252. Springer, pp. 335–347 (2018)
Fujiwara, A.: Foundations of Information Geometry. Kyoritsu Shuppan, Tokyo (2021) (in Japanese)
Pistone, G., Sempi, C.: An infinite-dimensional geometric structure on the space of all the probability measures equivalent to a given one. Ann. Stat. 23, 1543–1561 (1995)
Reed, M., Simon, B.: Methods of Modern Mathematical Physics I: Functional Analysis. Academic Press, San Diego (1980)
Acknowledgements
The author would like to express his sincere gratitude to Professor Shun-ichi Amari for all his encouragement and inspiring discussions. He is also grateful to Professor Hiroshi Nagaoka for many insightful comments. The present study was supported by JSPS KAKENHI Grant no. JP17H02861.
Ethics declarations
Conflict of interest
The author states that there is no conflict of interest.
Additional information
Communicated by Hiroshi Matsuzoe.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix
A Chentsov’s argument characterising affine connections
Introduce a coordinate system \(\theta =(\theta ^i)_{1\le i\le n-1}\) of \({\mathcal S}_{n-1}\) as in Example 4, that is,
$$\begin{aligned} \theta ^i = \log \frac{p(i)}{p(n)}, \qquad i=1,\dots ,n-1. \end{aligned}$$
Then, Chentsov’s theorem [6, Theorem 12.2] is restated as follows.
Theorem 11
The affine connections \(\nabla \) on \({\mathcal S}_{n-1}\) satisfying the equivariance condition
$$\begin{aligned} f_*\bigl ( (\nabla ^{[n]}_X Y)_p \bigr ) = \bigl ( \nabla ^{[\ell ]}_{f_*X}\, f_*Y \bigr )_{f(p)} \end{aligned}$$
(15)
are all described by formulas
where \(\partial _i:=\partial /\partial \theta ^i\), \(\eta _i:=p(i)\), and \(\gamma \) is a real parameter.
Note that (16) and (17) are rewritten as
$$\begin{aligned} \nabla _{\partial _i}\, \partial _j = \gamma \, T^{k}_{\;\; ij}\, \partial _k, \end{aligned}$$
where \(T^k_{\;\;\, ij}\) are defined by (11). Thus, by setting \(\gamma =(1-\alpha )/2\), we restore the \(\alpha \)-connections as demonstrated in the footnote of Example 4.
Proof of Theorem 11
We divide the proof into four steps.
Step 1. For \(i=1,\dots , n-1\), let \(X_i:=\partial /\partial \theta ^i\) be the vector fields associated with the coordinate system \(\theta =(\theta ^i)_{1\le i\le n-1}\). In order to comprehend these vector fields in terms of elementary geometry, let us represent the tangent vector \((X_i)_p\) at \(p\in {\mathcal S}_{n-1}\) by a numerical vector \((\overrightarrow{X}_i)_p\in {\mathbb R}^n\) whose \(\omega \)th entry (\(1\le \omega \le n\)) is given by
$$\begin{aligned} (\overrightarrow{X}_i)_p(\omega ) = p(i)\, \bigl ( \delta _i(\omega ) - p(\omega ) \bigr ), \qquad \delta _i(\omega ) := {\left\{ \begin{array}{ll} 1 &{} (\omega = i) \\ 0 &{} (\omega \ne i). \end{array}\right. } \end{aligned}$$
Note that the numerical vector \(\{\delta _i(\omega )-p(\omega )\}_{1\le \omega \le n}\) corresponds to the arrow connecting the initial point p with the terminal point \(e_i\), the ith vertex of the probability simplex \({\mathcal S}_{n-1}\). In this way, the tangent vector \((X_i)_p\) is interpreted as the geometrical vector \(\overrightarrow{pe_i}=\overrightarrow{e_i}-\overrightarrow{p}\) multiplied by p(i). Further, following Chentsov, we introduce another vector field \(X_n\) that has a geometrical vector interpretation \(\overrightarrow{pe_n}=\overrightarrow{e_n}-\overrightarrow{p}\) multiplied by p(n), i.e., whose numerical vector representation \((\overrightarrow{X}_n)_p\) has the form
$$\begin{aligned} (\overrightarrow{X}_n)_p(\omega ) = p(n)\, \bigl ( \delta _n(\omega ) - p(\omega ) \bigr ). \end{aligned}$$
In what follows, these representations are used interchangeably for tangent vectors \((X_i)_p\) with \(i=1,\dots ,n\). Note that the vector fields \(X_1,\dots , X_n\) satisfy the identity:
$$\begin{aligned} \sum _{i=1}^{n} X_i = 0. \end{aligned}$$
(18)
Similarly, we introduce a set of vector fields \(Y_1,\dots , Y_\ell \) on \({\mathcal S}_{\ell -1}\). A crucial observation is that for a Markov embedding \(f:{\mathcal S}_{n-1}\rightarrow {\mathcal S}_{\ell -1}\) associated with the partition (1), we have
$$\begin{aligned} f_*\, (X_i)_p = \sum _{k\in C_{(i)}} (Y_k)_{f(p)}. \end{aligned}$$
(19)
In fact, due to (2),
$$\begin{aligned} f(p) = \sum _{\omega =1}^{n} p(\omega )\, Q_{(\omega )} = \sum _{\omega =1}^{n} p(\omega ) \sum _{k\in C_{(\omega )}} Q_{(\omega )}^k\, \epsilon _k, \end{aligned}$$
where \(\epsilon _k\) is the kth vertex of \({\mathcal S}_{\ell -1}\). As a consequence,
$$\begin{aligned} f(p) = \sum _{k=1}^{\ell } q(k)\, \epsilon _k, \end{aligned}$$
where \(q(k):=p(\omega ) Q^k_{(\omega )}\) with the index \(\omega \) being the one satisfying \(k\in C_{(\omega )}\). By using this equality, we have
proving (19).
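The identities (18) and (19) are finite-dimensional linear-algebra facts and can be checked numerically (a sketch assuming NumPy, with an arbitrary illustrative partition and weights):

```python
import numpy as np

rng = np.random.default_rng(5)

n, ell = 3, 6
blocks = [[0, 1], [2, 3], [4, 5]]
Q = np.zeros((n, ell))
for i, b in enumerate(blocks):
    w = rng.random(len(b)) + 0.1
    Q[i, b] = w / w.sum()

def X_vec(p, i):
    """Numerical representation of (X_i)_p: the w-th entry is p(i)(delta_i(w) - p(w))."""
    e = np.zeros(len(p))
    e[i] = 1.0
    return p[i] * (e - p)

p = np.array([0.2, 0.5, 0.3])
q = p @ Q                                     # image f(p) of the Markov embedding

# Identity (19): f_* (X_i)_p equals the sum of (Y_k)_{f(p)} over k in C_(i).
for i, block in enumerate(blocks):
    lhs = X_vec(p, i) @ Q                     # f is affine, so f_* acts as u -> u @ Q
    rhs = sum(X_vec(q, k) for k in block)
    assert np.allclose(lhs, rhs)

# Identity (18): the n vector fields sum to zero.
assert np.allclose(sum(X_vec(p, i) for i in range(n)), 0.0)
```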
Step 2. Let us deal with the case when \(\ell =n\). In this case, a Markov embedding f is reduced to a permutation of indices of events. Consider the barycentre
$$\begin{aligned} p_0 := \left( \frac{1}{n}, \frac{1}{n}, \dots , \frac{1}{n} \right) \end{aligned}$$
of \({\mathcal S}_{n-1}\). Due to the symmetry of \({\mathcal S}_{n-1}\) over permutations of indices at the barycentre, the tangent vector \((\nabla ^{[n]}_{X_i} X_i)_{p_0}\) must be parallel to \((X_i)_{p_0}\), so that there is a constant \(\lambda ^{[n]}\) such that
$$\begin{aligned} \bigl ( \nabla ^{[n]}_{X_i} X_i \bigr )_{p_0} = \lambda ^{[n]}\, (X_i)_{p_0}. \end{aligned}$$
Similarly, there is a constant \(\mu ^{[n]}\) such that for any distinct pair (i, j) of indices,
Now, using (18), we have
which leads to
Step 3. Suppose that \(\ell = N n\) for some \(N\in {\mathbb N}\), and consider the Markov embedding
$$\begin{aligned} f:\, {\mathcal S}_{n-1}\rightarrow {\mathcal S}_{\ell -1}:\quad f(p)(k) = \frac{p(j)}{N} \qquad (k\in C_{(j)}). \end{aligned}$$
This map corresponds to the partition
$$\begin{aligned} {\Omega }_{Nn} = C_{(1)}\sqcup \dots \sqcup C_{(n)}, \qquad C_{(j)} = \{ (j-1)N+1, (j-1)N+2, \dots , jN \}. \end{aligned}$$
Since f maps the barycentre \(p_0\) of \({\mathcal S}_{n-1}\) to the barycentre of \({\mathcal S}_{\ell -1}\), it follows from the equivariance condition (15) as well as (19) that
Since \(f_*\) is injective, this leads to
Since \(\ell =N n\), this is further equivalent to
Consequently, there exists a constant \(\gamma \), independent of n, such that
Step 4. Take a rational point p in \({\mathcal S}_{n-1}\), and represent it by a common denominator as
$$\begin{aligned} p = \left( \frac{m_1}{\ell }, \frac{m_2}{\ell }, \dots , \frac{m_n}{\ell } \right) , \qquad m_i\in {\mathbb N}, \quad \sum _{i=1}^n m_i = \ell . \end{aligned}$$
Further, let us consider the Markov embedding
$$\begin{aligned} f:\, {\mathcal S}_{n-1}\rightarrow {\mathcal S}_{\ell -1}:\quad f(p)(k) = \frac{p(j)}{m_j} \qquad (k\in C_{(j)}). \end{aligned}$$
This map corresponds to the partition
$$\begin{aligned} {\Omega }_\ell = C_{(1)}\sqcup \dots \sqcup C_{(n)}, \qquad C_{(j)} = \{ M_j+1, M_j+2, \dots , M_j+m_j \}, \end{aligned}$$
where \(M_1=0\) and \(M_{j+1}=M_j+m_j\) for \(1\le j\le n-1\).
Since the image f(p) is the barycentre of \({\mathcal S}_{\ell -1}\), it follows from the equivariance condition (15) as well as (19) that for \(i\ne j\),
Since \(f_*\) is injective, this leads to
Further, by using (18),
Finally, the relations (20) and (21), which are valid for all rational points \(p\in {\mathcal S}_{n-1}\), are uniquely extended to all \(p\in {\mathcal S}_{n-1}\) by continuity. This completes the proof. \(\square \)
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Fujiwara, A. Hommage to Chentsov’s theorem. Info. Geo. 7 (Suppl 1), 79–98 (2024). https://doi.org/10.1007/s41884-022-00077-7