Information-Geometric Approach for a One-Sided Truncated Exponential Family

In information geometry, there has been extensive research on the deep connections between differential geometric structures, such as the Fisher metric and the α-connection, and the statistical theory for statistical models satisfying regularity conditions. However, the study of information geometry for non-regular statistical models is insufficient, and a one-sided truncated exponential family (oTEF) is one example of these models. In this paper, based on the asymptotic properties of maximum likelihood estimators, we provide a Riemannian metric for the oTEF. Furthermore, we demonstrate that the oTEF has an α = 1 parallel prior distribution and that the scalar curvature of a certain submodel, including the Pareto family, is a negative constant.


Introduction
Information geometry is the study of the structure of statistical models using differential geometry. From the standpoint of geometry, a statistical model consisting of a collection of parameterized probability distributions can be regarded as a manifold. Then, when the statistical model satisfies a certain regularity condition, Chentsov's theorem leads to a natural differential geometric structure [1]. This structure consists of a Riemannian metric defined from the Fisher information matrix and a one-parameter family of affine connections, called the Fisher metric and the α-connection, respectively.
Information geometry of regular models has been studied for a long time, and deep relationships between the geometric structures and statistical models have been revealed (Amari [2] and Amari and Nagaoka [3]).
However, the geometric properties of statistical models which do not satisfy regularity conditions are not sufficiently investigated. One reason is that Chentsov's theorem cannot be applied to non-regular models; thus, a natural geometric structure is not established. For example, in regular models, it is possible to define the Fisher information matrix in two forms, but in statistical models where the support of the probability density function depends on the parameters, they do not coincide. In previous work, Amari [4] discussed the relationship between the Finsler geometry and non-regular models especially for the translation family.
In the present study, we discuss information geometry for a one-sided truncated exponential family (oTEF) [5], a typical non-regular model. An oTEF is a statistical model with two parameters: the natural parameter θ and the truncation parameter γ. The support of the probability density function depends on the truncation parameter γ, which is what makes such a model non-regular. However, similar to the exponential family, the derivatives of the log-likelihood function, moments, and KL divergence can be calculated explicitly.
Let P = {P θ : θ ∈ Θ} be a statistical model with parameter θ ∈ Θ ⊂ R n whose distributions are absolutely continuous with respect to a dominating measure µ. We denote the probability density function of distribution P θ with respect to µ as p(x, θ).
In information geometry, we assume certain conditions on P.
Definition 1 (Regular statistical models). A statistical model P is said to be regular if it satisfies the following conditions.
1. The support of the density function is independent of θ ∈ Θ.

4. For any function f(x, θ) considered in the following, partial differentiation ∂ i and integration with respect to the measure µ are interchangeable as
\[ \partial_i \int f(x,\theta)\,d\mu(x) = \int \partial_i f(x,\theta)\,d\mu(x). \]
The above four conditions are referred to as regularity conditions. We will treat non-regular statistical models in later sections. In preparation, this subsection discusses the geometric structure of regular statistical models.
Next, we introduce the Fisher metric and α-connection. Let P be a regular statistical model and X be a random variable distributed according to P θ . We usually use the following definition of the Riemannian metric and affine connections on P.
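Definition 2 (Fisher metric). The Fisher metric is the Riemannian metric on P defined by
\[ g_{ij}(\theta) = E\left[\partial_i l(X,\theta)\,\partial_j l(X,\theta)\right] \]
for i, j = 1, . . . , n, where l(x, θ) = log p(x, θ) is the log-likelihood function and ∂ i = ∂/∂θ i . Here, the symbol E[·] denotes the expectation with respect to the observation X.
Definition 3 (α-connection). The α-connection is an affine connection defined by the coefficients
\[ \Gamma^{(\alpha)}_{ij,k}(\theta) = E\left[\left(\partial_i\partial_j l(X,\theta) + \frac{1-\alpha}{2}\,\partial_i l(X,\theta)\,\partial_j l(X,\theta)\right)\partial_k l(X,\theta)\right] \tag{4} \]
for i, j, k = 1, . . . , n and α ∈ R on the coordinate system θ.
If P satisfies the regularity conditions, the Fisher metric and the α-connection have several properties. For instance, there is a formula [16] convenient for calculating the Fisher metric:
\[ g_{ij}(\theta) = -E\left[\partial_i\partial_j l(X,\theta)\right]. \]
Furthermore, the α-connection and the −α-connection are dual. In other words,
\[ A\langle B, C\rangle = \langle \nabla^{(\alpha)}_A B,\, C\rangle + \langle B,\, \nabla^{(-\alpha)}_A C\rangle \]
holds for all tangent vectors A, B, and C, where ⟨·, ·⟩ denotes the inner product given by the Fisher metric. In particular, the 0-connection is self-dual. Furthermore, the 0-connection corresponds to the Levi-Civita connection, defined by the coefficients
\[ \Gamma_{ij,k} = \frac{1}{2}\left(\partial_i g_{jk} + \partial_j g_{ik} - \partial_k g_{ij}\right). \]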
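The agreement of the two forms of the Fisher information in a regular model, and its failure when the support depends on the parameter, can be checked numerically. The following is a minimal sketch, not taken from the paper: the two example densities (an exponential distribution with rate θ, and the same distribution shifted by a location γ) and all function names are illustrative assumptions, and the integrals are approximated with a simple midpoint rule.

```python
import math

def integrate(f, a, b, n=20000):
    """Composite midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

def fisher_two_forms_rate(theta, upper=60.0):
    """Regular case: p(x) = theta * exp(-theta * x) on x > 0, parameter theta.

    Returns (E[(d l)^2], -E[d^2 l]); the infinite range is cut at `upper`,
    which is adequate only when theta * upper is large.
    """
    p = lambda x: theta * math.exp(-theta * x)
    score = lambda x: 1.0 / theta - x          # d/dtheta log p
    second = -1.0 / theta ** 2                 # d^2/dtheta^2 log p (constant)
    outer = integrate(lambda x: score(x) ** 2 * p(x), 0.0, upper)
    neg_second = integrate(lambda x: -second * p(x), 0.0, upper)
    return outer, neg_second

def fisher_two_forms_location(theta, gamma, upper=60.0):
    """Non-regular case: p(x) = theta * exp(-theta * (x - gamma)) on x > gamma.

    The parameter is the truncation point gamma; on x > gamma the
    log-density has d/dgamma = theta and d^2/dgamma^2 = 0.
    """
    p = lambda x: theta * math.exp(-theta * (x - gamma))
    outer = integrate(lambda x: theta ** 2 * p(x), gamma, gamma + upper)
    neg_second = integrate(lambda x: 0.0 * p(x), gamma, gamma + upper)
    return outer, neg_second
```

For θ = 2, the first function returns two values close to 1/θ² = 0.25, while the second returns values close to θ² = 4 and 0, so the two candidate definitions disagree in the truncation direction.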

Prior Distributions and Volume Elements
Next, we introduce volume elements on a regular statistical model and its relation to Bayesian statistics [17].
In Bayesian statistics, for a given statistical model P, we need a probability distribution over the model parameter space, which is called a prior distribution or simply a prior. We often denote a prior density as π (π(θ) ≥ 0 and ∫ Θ π(θ)dθ = 1). A volume element on an n-dimensional oriented model manifold corresponds to a prior density function over the parameter space (θ ∈ Θ ⊂ R n ) in a one-to-one manner. For a prior π(θ), its corresponding volume element ω is an n-form (differential form of degree n) and is written as
\[ \omega = \pi(\theta)\, d\theta^1 \wedge \cdots \wedge d\theta^n \]
in the local coordinates. For example, in the two-dimensional Euclidean space (n = 2), the volume element is given by ω = dx ∧ dy in the Cartesian coordinates (x, y). In the polar coordinates (r, ψ), it is written as ω = r dr ∧ dψ.
Now, let us explain noninformative priors (see, e.g., work by Robert [18] for details). If we have specific information on the parameter in advance, then the prior should reflect it, in which case it is often called a subjective prior. If not, we adopt a certain criterion and use a prior obtained through the criterion. Such priors are called noninformative priors or objective priors. In particular, the Jeffreys prior, whose density is proportional to √det(g), is the standard noninformative prior [19].
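As a small worked illustration (not part of the original text), consider the normal model N(µ, σ²) in the coordinates (µ, σ), a standard regular example chosen here only for concreteness. Its Fisher metric is diagonal, so the Jeffreys density and the corresponding volume element can be written down directly:

```latex
\[
  (g_{ij}) =
  \begin{pmatrix}
    1/\sigma^{2} & 0 \\
    0 & 2/\sigma^{2}
  \end{pmatrix},
  \qquad
  \sqrt{\det(g)} = \frac{\sqrt{2}}{\sigma^{2}},
  \qquad
  \omega = \frac{\sqrt{2}}{\sigma^{2}}\, d\mu \wedge d\sigma .
\]
```

This density is not integrable over µ ∈ R, σ > 0, so in this example the Jeffreys prior is improper and is determined only up to a positive constant.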
In information geometry, the Jeffreys prior is regarded as a 0-parallel volume element and was extended to an α-parallel volume element by Takeuchi and Amari [17]. The extensions are called α-parallel priors. They showed that the geometrical properties of regular models are deeply related to the existence of α-parallel priors.
When a suitable geometric structure is defined on a non-regular statistical model, it is interesting to see the relationship between the geometrical properties and the existence of α-parallel priors. Later sections will discuss this topic. In preparation, in this subsection, we briefly summarize some definitions and facts of α-parallel priors for regular models.
To define α-parallel priors, we introduce a geometric property of affine connections.
Definition 4 (equiaffine). Let P be an n-dimensional manifold with an affine connection induced by a covariant derivative ∇. An affine connection ∇ is equiaffine if there exists a volume element ω such that
\[ \nabla \omega = 0 \]
holds everywhere in P. Furthermore, such a volume element ω is said to be a parallel volume element with respect to ∇.
The necessary and sufficient condition for an affine connection ∇ to be equiaffine is described by its curvature. The following proposition holds for a manifold with an affine connection ∇. Let R ijk l be the components of the Riemannian curvature tensor [3] of ∇, defined as
\[ R_{ijk}{}^{l} = \partial_i \Gamma_{jk}{}^{l} - \partial_j \Gamma_{ik}{}^{l} + \Gamma_{im}{}^{l}\,\Gamma_{jk}{}^{m} - \Gamma_{jm}{}^{l}\,\Gamma_{ik}{}^{m}, \]
where Γ ij k denotes the connection coefficients of ∇.
Proposition 1. A torsion-free affine connection ∇ is equiaffine if and only if R ijk k = 0.
Returning to the statistical model, we define the α-parallel prior distribution. Let P be an n-dimensional regular statistical manifold. It is shown that ∇ (α) is equiaffine for some α ∈ R \ {0} if and only if it is equiaffine for all α ∈ R. Such a statistical manifold P is said to be statistically equiaffine. In a statistically equiaffine manifold, we may represent the α-parallel volume element ω (α) as
\[ \omega^{(\alpha)} = \pi^{(\alpha)}(\theta)\, d\theta^1 \wedge \cdots \wedge d\theta^n \]
for the coordinates θ, where π (α) ∈ C ∞ (P). We take π (α) (θ) as a prior distribution on the parameter space Θ.
Since the 0-connection (Levi-Civita connection) is equiaffine, there always exists a 0-parallel prior, known as the Jeffreys prior.
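Takeuchi and Amari [17] give a necessary and sufficient condition for α-parallel priors to exist.

Proposition 2 (Takeuchi and Amari [17]). For a model manifold P, if
\[ \partial_i T_j - \partial_j T_i = 0, \qquad T_i = T_{ijk}\, g^{jk}, \]
where T ijk = E[∂ i l ∂ j l ∂ k l] and (g jk ) is the inverse of the Fisher metric, then the α-parallel prior exists for any α ∈ R. Otherwise, only the 0-parallel prior exists.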
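Throughout, an oTEF density with respect to the Lebesgue measure is taken to have the form
\[ p(x,\theta,\gamma) = \exp\Big\{\sum_{i=1}^{n} \theta^i F_i(x) + C(x) - \psi(\theta,\gamma)\Big\}, \qquad \gamma \le x < I_2, \tag{13} \]
and p(x, θ, γ) = 0 otherwise, with log-likelihood l(x, θ, γ) = log p(x, θ, γ). We say θ = (θ 1 , . . . , θ n ) is the natural parameter and γ is the truncation parameter. We also denote the interval (I 1 , I 2 ) as I.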
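In an oTEF P, the support of the density function depends on the truncation parameter γ, so the oTEF does not satisfy the regularity conditions. This is the point at which an oTEF differs from an exponential family, whose densities have a support that is independent of the parameters. Furthermore, the truncation parameter γ does not allow the interchange of partial differentiation ∂ γ and integration with respect to the Lebesgue measure. For instance, this means
\[ E\left[\partial_\gamma l(X,\theta,\gamma)\right] = -\partial_\gamma \psi(\theta,\gamma) \neq 0 \]
instead of E[∂ γ l(X, θ, γ)] = 0.
Here, we introduce two properties of the function ψ. First, the partial derivative −∂ γ ψ(θ, γ) coincides with p(x, θ, γ)| x=γ and is always positive. This fact can be verified as follows. Since p is a probability density function,
\[ \int_{\gamma}^{I_2} p(x,\theta,\gamma)\,dx = 1. \]
Therefore, by differentiating both sides with respect to γ, and noting that p depends on γ only through ψ for x > γ so that ∂ γ p = −(∂ γ ψ) p there, we obtain
\[ -p(x,\theta,\gamma)\big|_{x=\gamma} - \partial_\gamma\psi(\theta,\gamma)\int_{\gamma}^{I_2} p(x,\theta,\gamma)\,dx = 0 \]
and
\[ -\partial_\gamma\psi(\theta,\gamma) = p(x,\theta,\gamma)\big|_{x=\gamma} > 0. \]
Second, the following lemma holds.
Lemma 1. For an oTEF with densities of the form (13), there always exist θ and γ such that ∂ γ ∂ i ψ(θ, γ) ≠ 0 for some i.
Proof. If ∂ γ ∂ i ψ(θ, γ) ≡ 0 for all i, the function ψ(θ, γ) can be expressed as
\[ \psi(\theta,\gamma) = \psi_1(\theta) + \psi_2(\gamma), \]
where ψ 1 depends only on θ and ψ 2 depends only on γ. Then, by the normalization of the density,
\[ \int_{\gamma}^{I_2} \exp\Big\{\sum_{i} \theta^i F_i(x) + C(x)\Big\}\,dx = \exp\{\psi_1(\theta) + \psi_2(\gamma)\}. \]
Differentiating both sides with respect to γ, we obtain
\[ -\exp\Big\{\sum_{i} \theta^i F_i(\gamma) + C(\gamma)\Big\} = \psi_2'(\gamma)\,\exp\{\psi_1(\theta) + \psi_2(\gamma)\}. \]
Further differentiating both sides with respect to θ i and dividing by the previous equation, we have
\[ F_i(\gamma) = \partial_i \psi_1(\theta). \]
The left-hand side depends only on γ and the right-hand side only on θ, so both are constant, and all F i are constants. Therefore, the density would not depend on θ, which is a contradiction.
The family of Pareto distributions is an example of the oTEF. Pareto distributions have the following density functions,
\[ p(x,\theta,\gamma) = \frac{\theta\,\gamma^{\theta}}{x^{\theta+1}}, \qquad x \ge \gamma, \]
with the natural parameter θ ∈ (0, ∞) and the truncation parameter γ ∈ (0, ∞). This family is used to describe various natural and social phenomena [21].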
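The two properties of ψ can be checked numerically for the Pareto family. The sketch below is illustrative only: it assumes the parameterization p(x, θ, γ) = θγ^θ / x^(θ+1) on x ≥ γ, for which ψ(θ, γ) = −log θ − θ log γ, and the function names and finite-difference step sizes are choices made here rather than anything given in the paper.

```python
import math

def psi(theta, gamma):
    """Assumed Pareto log-normalizer: psi = -log(theta) - theta * log(gamma)."""
    return -math.log(theta) - theta * math.log(gamma)

def pareto_density(x, theta, gamma):
    """Pareto density theta * gamma**theta / x**(theta + 1) on x >= gamma."""
    if x < gamma:
        return 0.0
    return theta * gamma ** theta / x ** (theta + 1)

def d_gamma_psi(theta, gamma, h=1e-6):
    """Central finite difference for the partial derivative of psi in gamma."""
    return (psi(theta, gamma + h) - psi(theta, gamma - h)) / (2 * h)

def d_gamma_d_theta_psi(theta, gamma, h=1e-4):
    """Central finite difference for the mixed derivative of psi."""
    return (psi(theta + h, gamma + h) - psi(theta + h, gamma - h)
            - psi(theta - h, gamma + h) + psi(theta - h, gamma - h)) / (4 * h * h)
```

At θ = 2 and γ = 3, `-d_gamma_psi(2.0, 3.0)` agrees with `pareto_density(3.0, 2.0, 3.0)` (both equal θ/γ), and the mixed derivative is close to −1/γ, which is nonzero as Lemma 1 requires.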
To discuss geometric structures in subsequent sections, we focus on the asymptotic behavior of maximum likelihood estimators in the oTEF. Our discussion of these estimators follows previous works. Consider random variables X 1 , . . . , X N independent and identically distributed according to P θ,γ , and let X (1) ≤ · · · ≤ X (N) be the order statistics of the sample. Let θ̂ and γ̂ denote the maximum likelihood estimators for θ and γ, respectively. Bar-Lev [5] showed the existence and uniqueness of θ̂ and γ̂. Here, γ̂ = X (1) and θ̂ is a root of the maximum likelihood equation ∂ i l(X, θ̂, γ̂) = 0 for i = 1, . . . , n.
The first-order asymptotic variances of θ̂ and γ̂ are given by
\[ V[\hat\theta] = \frac{1}{N}\,(g_{ij})^{-1} + O\!\left(\frac{1}{N^2}\right), \qquad V[\hat\gamma] = \frac{1}{N^2\,\{\partial_\gamma\psi(\theta,\gamma)\}^2} + O\!\left(\frac{1}{N^3}\right), \]
where (g ij ) is the n × n matrix with entries g ij = ∂ i ∂ j ψ(θ, γ). These are essential to our argument in the next section. Additionally, Akahira [6] and Akahira and Ohyauchi [22] obtained the second-order asymptotic loss.
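The different rates at which the two estimators concentrate can be seen in a small Monte Carlo experiment for the Pareto family. This is a hedged sketch rather than the authors' procedure: it assumes the Pareto parameterization with ψ(θ, γ) = −log θ − θ log γ, for which the first-order variances are θ²/N for the natural parameter and γ²/(N²θ²) for the truncation parameter, and the function names, sample sizes, and seed are arbitrary choices.

```python
import math
import random

def pareto_mle(sample):
    """Maximum likelihood estimates (theta_hat, gamma_hat) for a Pareto sample."""
    gamma_hat = min(sample)                      # the smallest order statistic
    n = len(sample)
    theta_hat = n / sum(math.log(x / gamma_hat) for x in sample)
    return theta_hat, gamma_hat

def simulate_variances(theta, gamma, n, reps, seed=0):
    """Monte Carlo variances of theta_hat and gamma_hat over `reps` samples."""
    rng = random.Random(seed)
    thetas, gammas = [], []
    for _ in range(reps):
        # paretovariate(theta) draws from a Pareto law with scale 1.
        sample = [gamma * rng.paretovariate(theta) for _ in range(n)]
        t_hat, g_hat = pareto_mle(sample)
        thetas.append(t_hat)
        gammas.append(g_hat)

    def variance(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

    return variance(thetas), variance(gammas)

def first_order_variances(theta, gamma, n):
    """First-order terms: theta^2 / n and 1 / (n^2 * (d_gamma psi)^2)."""
    return theta ** 2 / n, gamma ** 2 / (n ** 2 * theta ** 2)
```

With θ = 2, γ = 3, and N = 200, the simulated variances should be of the same order as the two first-order terms, with the variance of the truncation estimator smaller by roughly a factor of N.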

Geometric Structure on One-Sided Exponential Families
This section gives definitions of a geometric structure of the oTEF. We take P as an (n + 1)-dimensional manifold with coordinates (θ 1 , . . . , θ n , γ). Since P does not satisfy regularity conditions, its geometric structure is not determined in a natural way.

Riemannian Metric in the oTEF
In this subsection, we define a Riemannian metric on oTEF P as follows.
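Definition 7 (Riemannian metric in the oTEF). Let P = {P θ,γ : θ ∈ Θ, γ ∈ I} belong to the oTEF. The Riemannian metric of the oTEF is defined by
\[ g_{ij}(\theta,\gamma) = E\left[\partial_i l\,\partial_j l\right], \qquad g_{i\gamma}(\theta,\gamma) = g_{\gamma i}(\theta,\gamma) = 0, \qquad g_{\gamma\gamma}(\theta,\gamma) = \{\partial_\gamma\psi(\theta,\gamma)\}^2 \]
for i, j = 1, . . . , n. This metric is also represented as
\[ g = \begin{pmatrix} (g_{ij}) & 0 \\ 0 & \{\partial_\gamma\psi(\theta,\gamma)\}^2 \end{pmatrix}. \]
Note that g ij can be expressed in terms of the function ψ as follows:
\[ g_{ij}(\theta,\gamma) = \partial_i\partial_j\psi(\theta,\gamma). \]
This is similar to the case of exponential family distributions.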
We will now explain how we came to the above definition.
Consider an exponential family E γ = P θ,γ : θ ∈ Θ for γ ∈ I and a statistical model F θ = P θ,γ : γ ∈ I for θ ∈ Θ. E γ is an n-dimensional submanifold of P obtained by fixing the truncation parameter γ and satisfies the regularity conditions. Additionally, F θ is a one-dimensional submanifold of P obtained by fixing the natural parameter θ. Since E γ is regular, the Riemannian metric on E γ should be the Fisher metric defined in Definition 2. This idea induces the components g ij to be the components of the Fisher metric. The remaining task is to define g iγ , the inner products of ∂ i and ∂ γ , and g γγ , the metric on F θ .
In this context, we review the statistical interpretation of the Fisher metric in regular models. Let P 0 be a regular model with parameters θ ∈ R n . As mentioned in Section 2.1, the Fisher metric is a Riemannian metric defined by the Fisher information matrix. Expanding the variance of the maximum likelihood estimator θ̂, we have
\[ V[\hat\theta] = \frac{1}{N}\,(g^{F}_{ij})^{-1} + O\!\left(\frac{1}{N^2}\right), \]
where the first-order coefficient is the inverse of the Fisher information matrix (g F ij ). On the other hand, in the oTEF, the Riemannian metric is determined from the coefficient of the variance of the maximum likelihood estimator. As shown in Section 2.3, the variances of θ̂ and γ̂ are expressed as
\[ V[\hat\theta] = \frac{1}{N}\,(g_{ij})^{-1} + O\!\left(\frac{1}{N^2}\right), \qquad V[\hat\gamma] = \frac{1}{N^2\,\{\partial_\gamma\psi(\theta,\gamma)\}^2} + O\!\left(\frac{1}{N^3}\right), \]
where (g ij ) is the matrix (g ij ) n i,j=1 . Then, similarly to the Fisher information matrix, {∂ γ ψ(θ, γ)} 2 appears as the reciprocal of the first-term coefficient. From this, the Riemannian metric of F θ is defined as
\[ g_{\gamma\gamma}(\theta,\gamma) = \{\partial_\gamma\psi(\theta,\gamma)\}^2. \]
Furthermore, it should be noted that Cov[θ̂, γ̂] is negligible up to O(1/N 2 ). Moreover, whether θ is known does not affect the first-order term of V[γ̂] [22]. This is also true for the estimation of θ [23]. Based on these facts, we assume that ∂ i and ∂ γ are orthogonal, and define g iγ = 0 (i = 1, . . . , n).
As a result, the Riemannian metric in Definition 7 is obtained. This Riemannian metric is equal to the formally defined Fisher metric on the oTEF. In other words, the equality
\[ g_{ab}(\theta,\gamma) = E\left[\partial_a l(X,\theta,\gamma)\,\partial_b l(X,\theta,\gamma)\right] \tag{42} \]
holds for a, b = 1, . . . , n, γ. The right-hand side is the same as the definition of the Fisher metric on regular statistical models.
However, the Riemannian metric does not satisfy the equation
\[ g_{ab}(\theta,\gamma) = -E\left[\partial_a\partial_b l(X,\theta,\gamma)\right]. \]
For i = 1, . . . , n, we have
\[ -E\left[\partial_i\partial_\gamma l(X,\theta,\gamma)\right] = \partial_i\partial_\gamma\psi(\theta,\gamma), \]
but ∂ i ∂ γ ψ(θ, γ) ≠ 0 for some θ and γ, by Lemma 1, whereas g iγ = 0. This discrepancy is a consequence of the failure of the regularity conditions.
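For concreteness, the metric can be written out for the Pareto family. The following worked example is a sketch under the assumption that the Pareto density is parameterized as θγ^θ / x^(θ+1) on x ≥ γ, so that ψ(θ, γ) = −log θ − θ log γ, and that the metric components are g_θθ = ∂_θ²ψ, g_θγ = 0, and g_γγ = (∂_γψ)², as described in the text:

```latex
\[
  \partial_\theta^{2}\psi = \frac{1}{\theta^{2}}, \qquad
  \partial_\gamma\psi = -\frac{\theta}{\gamma}, \qquad
  \partial_\theta\partial_\gamma\psi = -\frac{1}{\gamma},
\]
\[
  g =
  \begin{pmatrix}
    1/\theta^{2} & 0 \\
    0 & \theta^{2}/\gamma^{2}
  \end{pmatrix},
  \qquad
  -E\!\left[\partial_\theta\partial_\gamma l\right]
  = \partial_\theta\partial_\gamma\psi
  = -\frac{1}{\gamma} \neq 0 = g_{\theta\gamma}.
\]
```

The off-diagonal entry shows explicitly that the second-derivative form of the Fisher information disagrees with the metric for every θ and γ in this family.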

Affine Connections in the oTEF
Next, we define an affine connection in the oTEF. In this study, we adopt two types of connections: the Levi-Civita connection and the α-connection.
The first affine connection is the Levi-Civita connection,
\[ \Gamma_{ab,c} = \frac{1}{2}\left(\partial_a g_{bc} + \partial_b g_{ac} - \partial_c g_{ab}\right), \qquad a, b, c \in \{1, \ldots, n, \gamma\}, \]
induced by the Riemannian metric. This Levi-Civita connection is metric and self-dual, the same as in the regular case. Rylov [10] and Li et al. [12] previously adopted this same affine connection. Second, we define an affine connection in the oTEF as an analogy for the α-connection in the regular statistical model defined in Definition 3.

Definition 8. For a given α ∈ R, the α-connection in the oTEF is defined by the connection coefficients
\[ \Gamma^{(\alpha)}_{ab,c}(\theta,\gamma) = E\left[\left(\partial_a\partial_b l + \frac{1-\alpha}{2}\,\partial_a l\,\partial_b l\right)\partial_c l\right], \qquad a, b, c \in \{1, \ldots, n, \gamma\}, \]
where l = l(X, θ, γ) is a log-likelihood function.
The above definition is obtained by substituting the oTEF probability density function into Equation (4) for the α-connection in the regular model. In particular, for the Pareto distribution family, it coincides with the α-connection given by Sun et al. [13] (see the equation in their paper). Note that the log-likelihood function is not differentiable with respect to γ at γ = x. We therefore calculate the expectation over the open interval (γ, I 2 ) instead of [γ, I 2 ).
The α-connection in the oTEF differs from the one in the regular model in several aspects.
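First, the 0-connection does not correspond to the Levi-Civita connection. This is due to the inability to express the α-connection coefficients in the following form:
\[ \Gamma^{(\alpha)}_{ab,c} = \Gamma_{ab,c} - \frac{\alpha}{2}\,T_{abc}, \qquad T_{abc} = E\left[\partial_a l\,\partial_b l\,\partial_c l\right], \tag{55} \]
where Γ ab,c denotes the coefficients of the Levi-Civita connection. In regular models, this transformation involves an interchange of the order of differentiation and integration, which is one of the regularity conditions. Therefore, in the non-regular model oTEF, the two sides do not match.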
Additionally, the dual connection of the α-connection in the oTEF does not become the −α-connection. This can be verified as follows.
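One way to see the failure of duality is to look at the single component a = b = c = γ. The calculation below is a sketch that assumes the α-connection coefficients have the standard form E[(∂_a∂_b l + ((1 − α)/2) ∂_a l ∂_b l) ∂_c l], with the expectation taken over (γ, I_2), and uses ∂_γ l = −∂_γψ and ∂_γ² l = −∂_γ²ψ on that interval together with g_γγ = (∂_γψ)²:

```latex
\[
  \Gamma^{(\alpha)}_{\gamma\gamma,\gamma}
  = \Big( -\partial_\gamma^{2}\psi + \frac{1-\alpha}{2}\,(\partial_\gamma\psi)^{2} \Big)
    \big( -\partial_\gamma\psi \big)
  = \partial_\gamma\psi\,\partial_\gamma^{2}\psi
    - \frac{1-\alpha}{2}\,(\partial_\gamma\psi)^{3},
\]
\[
  \Gamma^{(\alpha)}_{\gamma\gamma,\gamma} + \Gamma^{(-\alpha)}_{\gamma\gamma,\gamma}
  = 2\,\partial_\gamma\psi\,\partial_\gamma^{2}\psi - (\partial_\gamma\psi)^{3},
  \qquad
  \partial_\gamma g_{\gamma\gamma}
  = 2\,\partial_\gamma\psi\,\partial_\gamma^{2}\psi .
\]
```

Duality of the α- and −α-connections would require the two right-hand sides to agree, but they differ by (−∂_γψ)³, which is strictly positive because −∂_γψ equals the density at x = γ. The same component with α = 0 also shows that the 0-connection differs from the Levi-Civita connection, whose coefficient is ∂_γψ ∂_γ²ψ.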

Existence of an α-Parallel Prior in the oTEF
In this section, as part of investigating the properties of the α-connection in the oTEF, we deal with the α-parallel priors. We show that there exists an α-parallel prior distribution for α = 1.
The existence of α-parallel priors depends on the geometric properties of statistical models, and they are not guaranteed to exist in general. Therefore, it is necessary to investigate the existence of α-parallel priors in the case of the oTEF. Note that the Levi-Civita connection is always equiaffine, and it is known to have the Jeffreys prior as a parallel prior distribution. Therefore, we do not deal with it in this section.
In the oTEF, care is needed regarding the conditions for the existence of α-parallel prior distributions. In the case of regular models, Proposition 2 provides a necessary and sufficient condition for the existence of α-parallel priors. Sun et al. [13] found that the Pareto distribution family does not satisfy this condition and claimed that it does not have α-parallel priors. However, this deduction is incorrect, since Proposition 2 does not hold for an oTEF distribution. In the proof of Proposition 2 [17], the connection coefficients of the α-connection are written in the same form as (55) by the Levi-Civita connection and the cubic tensor T ijk = E[∂ i l ∂ j l ∂ k l]. However, this form cannot represent the α-connection in Definition 8. Hence, in the oTEF, Proposition 2 does not establish that its condition is necessary and sufficient for the existence of α-parallel priors. Therefore, in this study, we use the necessary and sufficient conditions for general affine connections to investigate the existence of the α-parallel prior.
The following theorem reveals the existence of α-parallel priors in the oTEF.
Theorem 1. Consider P as belonging to the oTEF with densities of the form (13) with the natural parameter θ ∈ Θ and the truncation parameter γ. If α = 1, then the connection ∇ (α) is equiaffine and there exists a one-parallel prior. Moreover, this one-parallel prior π (1) can be represented as
\[ \pi^{(1)}(\theta,\gamma) \propto -\partial_\gamma\psi(\theta,\gamma) \]
for θ ∈ Θ, γ ∈ I.
Proof. First, we prove that ∇ (1) is equiaffine. By Proposition 1, the necessary and sufficient condition for ∇ (α) to be equiaffine is R (α) abc c = 0, where R (α) denotes the α-Riemannian curvature tensor. This condition can be represented as
\[ \partial_a \Gamma^{(\alpha)}{}_{bc}{}^{c} - \partial_b \Gamma^{(\alpha)}{}_{ac}{}^{c} = 0. \]
By Definitions 7 and 8, and since ∂ a ∂ b l = −∂ a ∂ b ψ is non-random on (γ, I 2 ) with E[∂ i l] = 0 and E[∂ γ l] = −∂ γ ψ, we have
\[ \Gamma^{(1)}_{ab,i} = 0, \qquad \Gamma^{(1)}_{ab,\gamma} = \partial_a\partial_b\psi\;\partial_\gamma\psi . \]
Thus,
\[ \Gamma^{(1)}{}_{ac}{}^{c} = g^{\gamma\gamma}\,\Gamma^{(1)}_{a\gamma,\gamma} = \frac{\partial_a\partial_\gamma\psi}{\partial_\gamma\psi} = \partial_a \log\left(-\partial_\gamma\psi\right). \]
Therefore, the formula ∂ a Γ (α) bc c − ∂ b Γ (α) ac c = 0 holds when α = 1, and ∇ (1) is equiaffine.
Second, we find the one-parallel prior. Let π (1) be the density of the one-parallel volume element. According to Proposition 1 in the paper by Takeuchi and Amari [17], we have
\[ \partial_a \log \pi^{(\alpha)} = \Gamma^{(\alpha)}{}_{ac}{}^{c}. \]
In the case of the oTEF, its representation is
\[ \partial_a \log \pi^{(1)}(\theta,\gamma) = \partial_a \log\left(-\partial_\gamma\psi(\theta,\gamma)\right). \]
Therefore, the one-parallel prior for P is given as
\[ \pi^{(1)}(\theta,\gamma) \propto -\partial_\gamma\psi(\theta,\gamma). \]
Moreover, the above one-parallel prior coincides with a certain reference prior. A reference prior, proposed by Bernardo [24], is a noninformative prior distribution derived from an information-theoretic perspective. Specifically, it is defined as a prior distribution that maximizes the expectation of the KL divergence between the posterior and prior distributions, and in the case of no nuisance parameters, it coincides with the Jeffreys prior [24]. However, this is not necessarily the case when nuisance parameters are present. Ghosh and Mukerjee [25] provided a new formulation for reference priors with nuisance parameters by considering the maximization of a functional with an appropriate penalty term. Furthermore, Ghosal [26] extended this reference prior to non-regular models where the support of the density depends on the parameters. When applied to an oTEF model with γ as the parameter of interest and θ as the nuisance parameter, the reference prior of Ghosal [26] is given by
\[ \pi(\theta,\gamma) \propto -\partial_\gamma\psi(\theta,\gamma). \]
Thus, it coincides with the one-parallel prior in Theorem 1.
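The defining relation of the one-parallel prior can be checked numerically for the Pareto family. The sketch below is illustrative and rests on several assumptions made here: the Pareto parameterization with ψ(θ, γ) = −log θ − θ log γ, the connection coefficients Γ^(1)_{ab,c} = E[∂_a∂_b l ∂_c l], and the diagonal metric with entries 1/θ² and θ²/γ². Expectations are approximated through the quantile transform X = γ u^(−1/θ) with u uniform on (0, 1); all function names are hypothetical.

```python
import math

def expect(f, theta, gamma, n=20000):
    """Approximate E[f(X)] for Pareto X by a midpoint rule in u = (gamma / X)**theta."""
    total = 0.0
    for k in range(n):
        u = (k + 0.5) / n
        total += f(gamma * u ** (-1.0 / theta))
    return total / n

def one_connection_trace(theta, gamma):
    """Contracted coefficients of the 1-connection, indexed as [theta, gamma].

    Uses l = log(theta) + theta * log(gamma) - (theta + 1) * log(x) on x > gamma.
    """
    first = [
        lambda x: 1.0 / theta + math.log(gamma) - math.log(x),  # d l / d theta
        lambda x: theta / gamma,                                # d l / d gamma
    ]
    # Second derivatives of l do not depend on x; index 0 = theta, 1 = gamma.
    second = [[-1.0 / theta ** 2, 1.0 / gamma],
              [1.0 / gamma, -theta / gamma ** 2]]
    g_inv = [theta ** 2, gamma ** 2 / theta ** 2]   # inverse of the diagonal metric
    trace = []
    for a in range(2):
        s = 0.0
        for c in range(2):
            coeff = expect(lambda x: second[a][c] * first[c](x), theta, gamma)
            s += g_inv[c] * coeff
        trace.append(s)
    return trace

def grad_log_prior(theta, gamma):
    """Gradient of log(theta / gamma), the log of the candidate prior -d_gamma psi."""
    return [1.0 / theta, -1.0 / gamma]
```

At θ = 2 and γ = 3, `one_connection_trace` returns values close to 0.5 and −1/3, matching `grad_log_prior`, which is consistent with a prior density proportional to θ/γ in this family.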

Scalar Curvature on a Submodel of the oTEF
This section finds a submodel of the oTEF with constant scalar curvature for the Levi-Civita connection. We do not use the α-connection in Definition 8.
Previous works about the information geometry of non-regular cases have mainly studied the geometric structure of Pareto distributions for the Levi-Civita connection. They adopted the formally defined Fisher metric as in (42), which is consistent with our Riemannian metric. Rylov [10] found that the family of Pareto distributions has a constant curvature with respect to the Levi-Civita connection. Li et al. [12] showed that its geometrical structure is isometric to the Poincaré upper half-plane and applied this geometrical structure to Bayesian inference by considering the Jeffreys prior.
We extend these previous works on Pareto distributions to n dimensions. For i = 1, . . . , n, let F i be a smooth function on the interval I and let X i be a random variable following a two-parameter truncated exponential distribution, which has density function
\[ p(x,\theta_i,\gamma) = \theta_i \exp\{-\theta_i (x - \gamma)\}, \qquad x \ge \gamma, \]
with respect to the Lebesgue measure, where θ i ∈ (0, ∞), γ ∈ R. An oTEF includes this distribution.
Consider Q θ,γ , a joint distribution of independent random variables F 1 (X 1 ), . . . , F n (X n ) with the common redefined truncation parameter γ and the natural parameter θ = (θ 1 , . . . , θ n ). Q θ,γ has the density function with respect to the Lebesgue measure on R n . Here, I n is a rectangle ∏ n i=1 F i (R) and I(γ) is I n ∩[γ, ∞) n . Q is a family of distributions Q θ,γ with parameters θ and γ.
Q includes practical examples such as a family of several Pareto distributions with a common scale parameter (Rohatgi and Saleh [14]) and a family of truncated exponential distributions with a common location parameter (Ghosh and Razmpour [15]). Truncated exponential distributions with a common γ are sometimes applied to reliability and life testing. Consider the case where the first failure of n products can occur only after a common minimum time γ has elapsed, and these products have unknown and possibly unequal failure rates θ 1 , . . . , θ n . This truncation parameter γ plays the role of a "guarantee time", so estimation of γ is vital to determining the warranty period.
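The guarantee-time setting can be illustrated with a short estimation sketch. It assumes, as an illustration only, that product i has failure-time density θ_i exp{−θ_i (x − γ)} on x ≥ γ with a shared γ; under that assumption the likelihood is increasing in γ, so the maximum likelihood estimate of γ is the smallest failure time over all products, and each rate estimate is the number of observations divided by the total excess time. The function name and the data are hypothetical.

```python
def common_guarantee_mle(groups):
    """MLEs for failure rates and a shared guarantee time.

    `groups` is a list of lists; groups[i] holds the observed failure times
    of product i. Each group needs at least one time above the overall minimum,
    otherwise its rate estimate is undefined (division by zero).
    """
    gamma_hat = min(min(times) for times in groups)
    theta_hat = [len(times) / sum(t - gamma_hat for t in times)
                 for times in groups]
    return theta_hat, gamma_hat

# Hypothetical failure times (in hours) for two products.
failure_times = [[2.5, 3.0, 4.5], [2.0, 2.5]]
rates, guarantee = common_guarantee_mle(failure_times)
```

For the data above, the estimated guarantee time is 2.0 and the estimated rates are 0.75 and 4.0.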
Note that the geometric structure of the above common truncation parameter model is equivalent to that of the family of Pareto distributions when n = 1.

Theorem 2. The Riemannian manifold Q has a constant scalar curvature of −2.
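For n = 1 the statement can be checked numerically on the Pareto family. The sketch below is not the proof of the theorem: it assumes the Pareto metric g = diag(1/θ², θ²/γ²) and uses the classical formula for the Gaussian curvature K of a two-dimensional diagonal metric E du² + G dv², namely K = −(1/(2√(EG))) [∂_u(∂_u G/√(EG)) + ∂_v(∂_v E/√(EG))], with the scalar curvature equal to 2K. Derivatives are approximated by central finite differences, and the function names are hypothetical.

```python
import math

def scalar_curvature_2d(E, G, u, v, h=1e-4):
    """Scalar curvature of the diagonal metric E du^2 + G dv^2 at the point (u, v)."""
    def w(a, b):
        return math.sqrt(E(a, b) * G(a, b))

    def p(a, b):
        # (dG/du) / sqrt(EG), by a central difference in u
        return (G(a + h, b) - G(a - h, b)) / (2 * h) / w(a, b)

    def q(a, b):
        # (dE/dv) / sqrt(EG), by a central difference in v
        return (E(a, b + h) - E(a, b - h)) / (2 * h) / w(a, b)

    dp_du = (p(u + h, v) - p(u - h, v)) / (2 * h)
    dq_dv = (q(u, v + h) - q(u, v - h)) / (2 * h)
    gaussian = -(dp_du + dq_dv) / (2 * w(u, v))
    return 2 * gaussian

# Assumed Pareto metric components in the coordinates (theta, gamma).
def pareto_E(theta, gamma):
    return 1.0 / theta ** 2

def pareto_G(theta, gamma):
    return theta ** 2 / gamma ** 2
```

Evaluating `scalar_curvature_2d(pareto_E, pareto_G, theta, gamma)` at several points gives values close to −2, in line with the isometry to the Poincaré upper half-plane mentioned above.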

Concluding Remarks
This paper considered the geometric structure of a one-sided truncated exponential family (oTEF) with parameters θ and γ. We constructed a Riemannian metric based on the asymptotic properties of the maximum likelihood estimators. Under this, we showed that the formally defined α-connection admits α-parallel priors when α = 1. Our result gives geometric meaning to a specific reference prior. Furthermore, we proved that the scalar curvature of some submodels of the oTEF obtained by making γ common across multiple distributions is constant.
It is essential to discuss suitable affine connections for the oTEF. First, we need to reveal the statistical meaning of the α-connection in the oTEF. The α-connection coefficients are expected to appear in higher-order terms of the variance of maximum likelihood estimators. Additionally, instead of the α-connection, we can construct a family of equiaffine affine connections by combining the α-connection and the Levi-Civita connection. It is also interesting to consider affine connections induced by the third derivatives of divergences.