A Characterization of the Compound Multiparameter Hermite Gamma Distribution via Gauss's Principle

We consider the class of those distributions that satisfy Gauss's principle (the maximum likelihood estimator of the mean is the sample mean) and have a parameter orthogonal to the mean. It is shown that this so-called “mean orthogonal class” is closed under convolution. A previous compound gamma characterization of random sums is revisited and clarified. A new characterization of the compound distribution with multiparameter Hermite count distribution and gamma severity distribution is obtained.


Introduction
The topic of maximum likelihood characterizations of distributions has a long history and remains an active field of research in the mathematical sciences. It concerns the characterization of a (class of) probability distribution(s) through the structure of the maximum likelihood estimator (MLE) of one or more parameters of interest (e.g., location, scale). The starting point is a famous result by Gauss [1] on the foundation of least squares theory (see, e.g., [2], [3, afterword, pages 208 and 215]): given a location family whose density has a continuous derivative, if the maximum likelihood estimator of the location parameter is the sample mean, then the distribution is normal. This important result has been discussed by Poincaré [4, Chapter 10], Teicher [5], Ferguson [6, 7], Marshall and Olkin [8], Bondesson [9], and Azzalini and Genton [10], among many other authors. The property that the MLE of the mean is the sample mean has been called Gauss's principle by Campbell [11] (see also [12, 13] and references therein). A brief account of the present contribution follows.
Within the framework of multiparameter distributions, consider the "mean orthogonal class" of those distributions that, besides satisfying Gauss's principle, have a parameterization in which the mean is orthogonal to some parameter vector, a property that can always be satisfied (see Amari [14], Section 8). This class has been considered by Sprott [15]. A characterization of the mean orthogonal class through the cumulant generating function (cgf) has been formulated by Hürlimann [16]. Extending a result by Hudson [17], we show in Theorem 3 that this class is closed under convolution. Section 3 is devoted to a characterization of random sums through the mean orthogonal class. Hürlimann [18] has established that the mean scaled severity in a compound model is necessarily gamma distributed provided that the count distribution and the distribution of the random sum belong to the mean orthogonal class and some additional partial differential parameter equation can be solved. A follow-up to this construction for the individual model of risk theory is Hürlimann [19]. We clarify and simplify the original proof to obtain a characterization that is used further in Section 4. Based on a result by Puig and Valero [20], we derive in Theorem 17 a most stringent characterization, which allows compounding of the gamma distribution under a single count data family, namely, the multiparameter Hermite distribution. This characterization requires that the count distribution be closed under convolution and binomial subsampling.

Distributions with the Mean Orthogonal Property
The Scientific World Journal
Let $X$ be a random variable whose distribution depends upon a vector $(\mu, \theta) = (\mu, \theta_1, \ldots, \theta_p)$ of $p + 1$ parameters, where the mean $\mu$ is functionally independent of $\theta$, that is, $\partial\mu/\partial\theta_i = 0$, $i = 1, \ldots, p$. The log likelihood of $X$ is denoted by $\ell(x; \mu, \theta)$. We assume throughout that the cumulant generating function (cgf) $\kappa(t; \mu, \theta) = \ln\{E[\exp(tX)]\}$ exists, and we denote the variance by $\sigma^2 = \sigma^2(\mu, \theta)$. The standard regularity conditions for maximum likelihood estimation are assumed to hold. The vector $\mathbf{X} = (X_1, \ldots, X_n)$ denotes a random sample of size $n$ from the distribution of $X$, and $\bar{X}$ denotes the sample mean. We are interested in the class $\mathcal{G}$ of distributions that satisfy Gauss's principle (the maximum likelihood estimator of the mean is the sample mean), that is, such that $\hat{\mu} = \bar{X}$. A distribution belongs to this class if and only if there are functions $g_i = g_i(\mu, \theta)$, $i = 1, \ldots, p$, and $h = h(\mu, \theta)$ such that the following equivalent partial differential equations hold (e.g., [15–17]):
The original motivation for parameter orthogonality is the improvement of maximum likelihood estimation by reparameterization. In the class $\mathcal{G}$ the number of maximum likelihood equations is reduced by one, and parameter orthogonality decreases the often high correlation between the MLEs of the parameters, since the MLEs of orthogonal parameters are asymptotically uncorrelated. Indeed, the expectations in (2) are elements of the (expected) Fisher information matrix, whose inverse determines the asymptotic covariance matrix of $(\hat{\mu}, \hat{\theta})$. In this respect, one is interested in the subclass $\mathcal{G}_\perp$ of $\mathcal{G}$ of all distributions that satisfy, besides $\hat{\mu} = \bar{X}$, the mean orthogonal property $\mu \perp \theta$. This so-called mean orthogonal class is characterized as follows.
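As a concrete illustration of Gauss's principle and of parameter orthogonality, consider the gamma law parameterized by its mean $\mu$ and its shape $\alpha$ (an example chosen here for illustration, not taken from the text). Its score in $\mu$ is $\alpha(x - \mu)/\mu^2$, so $\hat{\mu} = \bar{X}$ for every fixed $\alpha$, and the cross derivative $\partial^2\ell/\partial\mu\,\partial\alpha = (x - \mu)/\mu^2$ has expectation zero, so $\alpha \perp \mu$. The sketch below checks both facts numerically with the Python standard library only; all function names are hypothetical.

```python
import math
import random


def gamma_loglik(x, mu, alpha):
    """Log density of a gamma law with mean mu and shape alpha (rate alpha / mu).

    Assumption: mean/shape parameterization chosen for illustration."""
    return (alpha * math.log(alpha / mu) + (alpha - 1.0) * math.log(x)
            - alpha * x / mu - math.lgamma(alpha))


def mle_mean(sample, alpha, tol=1e-9):
    """Maximize the log likelihood in mu for a fixed shape alpha (golden-section search)."""
    def total(mu):
        return sum(gamma_loglik(x, mu, alpha) for x in sample)

    lo, hi = 1e-6, 2.0 * max(sample)  # the log likelihood is unimodal in mu here
    g = (math.sqrt(5.0) - 1.0) / 2.0
    while hi - lo > tol:
        a, b = hi - g * (hi - lo), lo + g * (hi - lo)
        if total(a) < total(b):
            lo = a  # the maximum lies to the right of a
        else:
            hi = b  # the maximum lies to the left of b
    return 0.5 * (lo + hi)


def cross_partial(x, mu, alpha, h=1e-4):
    """Central finite-difference estimate of d^2 loglik / (d mu d alpha) at one point."""
    f = gamma_loglik
    return (f(x, mu + h, alpha + h) - f(x, mu + h, alpha - h)
            - f(x, mu - h, alpha + h) + f(x, mu - h, alpha - h)) / (4.0 * h * h)


if __name__ == "__main__":
    rng = random.Random(1)
    mu, alpha = 2.0, 3.0
    # random.gammavariate(shape, scale) has mean shape * scale, so scale = mu / alpha
    sample = [rng.gammavariate(alpha, mu / alpha) for _ in range(20)]
    xbar = sum(sample) / len(sample)
    for a in (0.5, 3.0, 10.0):  # the MLE of mu does not depend on the shape
        print(f"alpha={a}: mle={mle_mean(sample, a):.6f}, xbar={xbar:.6f}")
    draws = [rng.gammavariate(alpha, mu / alpha) for _ in range(50000)]
    info = sum(cross_partial(x, mu, alpha) for x in draws) / len(draws)
    print(f"Monte Carlo E[d2l/dmu dalpha] = {info:.4f} (should be near 0)")
```

The golden-section search is used instead of the closed-form score so that the check relies only on the log likelihood itself.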

Theorem 2 (Characterization of the mean orthogonal class).
Let $X$ be a random variable with cgf $\kappa(t; \mu, \theta)$ satisfying the above assumptions. Then $X \in \mathcal{G}_\perp$ if and only if the following quasi-linear partial differential equation is satisfied:
Proof. This is shown in Hürlimann [16].
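One way to see how a quasi-linear equation for the cgf can arise is the following sketch. It assumes that the score of a single observation has the form $\partial\ell/\partial\mu = (x - \mu)/\sigma^2(\mu, \theta)$, which holds for the normal, the Poisson, and the mean-parameterized gamma families; this is an assumption made here for illustration, and the sketch is not claimed to reproduce the exact form used in Theorem 2. Differentiating $E[e^{tX}]$ under the expectation sign, as the regularity conditions permit, gives

```latex
\frac{\partial}{\partial\mu} E\!\left[e^{tX}\right]
  = E\!\left[e^{tX}\,\frac{\partial \ell}{\partial \mu}\right]
  = \frac{1}{\sigma^2}\Bigl(E\!\left[X e^{tX}\right] - \mu\,E\!\left[e^{tX}\right]\Bigr)
\quad\Longrightarrow\quad
\sigma^2(\mu,\theta)\,\frac{\partial \kappa}{\partial \mu}(t;\mu,\theta)
  = \frac{\partial \kappa}{\partial t}(t;\mu,\theta) - \mu .
```

For the gamma law with mean $\mu$ and shape $\alpha$, one has $\kappa = -\alpha\ln(1 - \mu t/\alpha)$ and $\sigma^2 = \mu^2/\alpha$, and both sides equal $\mu^2 t/(\alpha - \mu t)$.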
Hudson [17, Theorem 1] has shown that the class $\mathcal{G}$ is closed under convolution. In fact, convolution invariance also holds for the subclass $\mathcal{G}_\perp$ defined by the more stringent mean orthogonal property.
Theorem 3 (Convolution invariance of the mean orthogonal class).
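As a hedged numerical illustration of convolution invariance in the simplest case, the sketch below takes $n$ independent copies of a gamma law with mean $\mu$ and shape $\alpha$, whose sum is gamma with mean $n\mu$ and shape $n\alpha$, and checks by finite differences that both the single law and the sum satisfy $\sigma^2\,\partial\kappa/\partial\mu = \partial\kappa/\partial t - \mu$, with the sum written in terms of its own mean. This equation is the form obtained when the score in $\mu$ equals $(x - \mu)/\sigma^2$; using it as the characterizing equation is an assumption, and only the i.i.d. gamma case is covered. The helper names are hypothetical.

```python
import math


def gamma_cgf(t, mean, shape):
    """cgf of a gamma law with the given mean and shape (valid for t < shape / mean)."""
    return -shape * math.log(1.0 - mean * t / shape)


def pde_residual(cgf, t, mean, variance, h=1e-5):
    """Residual of variance * dK/dmean - (dK/dt - mean), by central differences.

    Assumption: cgf(t, mean) is the cgf written as a function of t and the mean."""
    dk_dmean = (cgf(t, mean + h) - cgf(t, mean - h)) / (2.0 * h)
    dk_dt = (cgf(t + h, mean) - cgf(t - h, mean)) / (2.0 * h)
    return variance * dk_dmean - (dk_dt - mean)


def sum_cgf(n, shape):
    """cgf of a sum of n iid gamma(mean m / n, shape) variables, as a function of its mean m."""
    return lambda t, m: n * gamma_cgf(t, m / n, shape)


if __name__ == "__main__":
    mu, alpha, n = 2.0, 3.0, 5
    single = lambda t, m: gamma_cgf(t, m, alpha)
    for t in (-1.0, 0.2, 0.8):  # all satisfy t < alpha / mu = 1.5
        r1 = pde_residual(single, t, mu, mu**2 / alpha)
        r2 = pde_residual(sum_cgf(n, alpha), t, n * mu, n * mu**2 / alpha)
        print(f"t={t}: single residual {r1:.2e}, sum residual {r2:.2e}")
```

Both residuals are at the level of finite-difference error, reflecting that $n\kappa(t; m/n)$ inherits the equation from $\kappa$ once the variance is taken as $n\sigma^2$.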

Mean Orthogonal Characterization of the Compound Gamma Distribution
Consider random sums of the type $S = X_1 + \cdots + X_N$ (4), where the $X_i$'s are independent and identically distributed nonnegative random variables, and $N$ is a counting random variable defined on the nonnegative integers, which is independent of the $X_i$'s. The mean and variance of $S$, $N$, and $X \sim X_i$ are denoted, respectively, by $\mu_S, \sigma_S^2$, $\mu_N, \sigma_N^2$, and $\mu_X, \sigma_X^2$. The coefficient of variation of $X$ is denoted by $c = \sigma_X/\mu_X$. In some applications, it is convenient to scale the severity by the mean, such that the mean scaled severity $Y = X/\mu_S \sim Y_i = X_i/\mu_S$ has mean $1/\mu_N$. The resulting sum $Y_1 + \cdots + Y_N = S/\mu_S$, which has mean 1, is called the mean scaled compound random sum. The mean scaled compound model has important insurance risk applications. It has been studied in Hürlimann [18], which establishes that the mean scaled severity is necessarily gamma distributed provided that the random variables $N$ and $S$ belong to the mean orthogonal class and some additional partial differential parameter equation can be solved. A follow-up to this construction for the individual model of risk theory is Hürlimann [19]. We clarify and simplify the original proof to obtain a characterization of (4), which will be used in Section 4. In particular, (3.20) in Hürlimann [18] is not a consequence but an assumption. Since this equation is satisfied in the examples given there, the error does not affect the results obtained, but it must be rectified from a logical point of view. Also, the proof of Lemma 7 in [18] is simplified here (see the proof of Lemma 8 below).
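The moment notation above can be checked on a simulated example. The sketch below assumes, purely for illustration, a Poisson count $N$ with mean $\lambda$ and gamma severities with mean $\mu_X$ and shape $\alpha$. It compares the simulated variance of the mean scaled sum $S/\mu_S$ with the standard random-sum formula $\sigma_S^2 = \mu_N\sigma_X^2 + \sigma_N^2\mu_X^2$ (divided by $\mu_S^2$, with $\mu_S = \mu_N\mu_X$) and confirms that $S/\mu_S$ has mean 1. Only the Python standard library is used; the function names are hypothetical.

```python
import math
import random


def poisson(rng, lam):
    """Poisson variate by Knuth's multiplication method (adequate for small lam)."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_scaled_sums(rng, lam, mu_x, shape, size):
    """Draw S / mu_S with N ~ Poisson(lam) and X ~ gamma(mean mu_x, shape) (assumed laws)."""
    mu_s = lam * mu_x
    sums = []
    for _ in range(size):
        n = poisson(rng, lam)
        s = sum(rng.gammavariate(shape, mu_x / shape) for _ in range(n))
        sums.append(s / mu_s)  # equals Y_1 + ... + Y_N with Y_i = X_i / mu_S
    return sums


def mean_var(values):
    """Sample mean and unbiased sample variance."""
    m = sum(values) / len(values)
    return m, sum((v - m) ** 2 for v in values) / (len(values) - 1)


if __name__ == "__main__":
    lam, mu_x, shape = 4.0, 3.0, 2.0
    var_x = mu_x**2 / shape
    mu_s = lam * mu_x
    var_s = lam * var_x + lam * mu_x**2  # sigma_N^2 = mu_N = lam for Poisson
    m, v = mean_var(simulate_scaled_sums(random.Random(3), lam, mu_x, shape, 50000))
    print(f"scaled mean {m:.4f} (theory 1), scaled variance {v:.4f} "
          f"(theory {var_s / mu_s**2:.4f})")
```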
Applying the chain rule of differential calculus, this condition transforms into an equivalent condition on the cgfs. Now, by Lemma 8 below and the chain rule, one obtains a further identity, and inserting it into (12) yields the required relation. The statements (10) and (11) then follow by using the representation (7).

Lemma 8.
If ∈ ⊥ , then the partial differential parameter Proof. The representation (7) implies that ( ) = . Now, using (7) one sees that (9) Now, by Lemma 7 and (10), one has the identity (use the differential chain rule) which, together with 2 ⋅ ( / ) = , implies that Inserted into the above expression one obtains the ordinary differential equation: whose unique solution is ( ) = (1 − 2 ) − . Since 2 = / , one sees that is the cgf of a gamma-distributed random variable. The proof is complete.
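For reference when reading the lemma, the following standard identities may help; they are general facts, not reconstructions of the lost displays. The first gives the cgf of the random sum (4) in terms of the cgf $\kappa_N$ of $N$ and the cgf $\kappa_X$ of the severity (conditioning on $N$). The second is the cgf of a gamma law written through its mean $\mu_Y$ and coefficient of variation $c$, that is, with shape $1/c^2$ and scale $c^2\mu_Y$; scaling $X$ to $Y = X/\mu_S$ leaves $c$ unchanged and gives $\mu_Y = 1/\mu_N$.

```latex
\kappa_S(t) = \ln E\!\left[e^{tS}\right]
  = \ln E\!\left[\bigl(E[e^{tX}]\bigr)^{N}\right]
  = \kappa_N\bigl(\kappa_X(t)\bigr),
\qquad
\kappa_Y(t) = -\frac{1}{c^2}\,\ln\bigl(1 - c^2\mu_Y t\bigr),
\quad t < \frac{1}{c^2\mu_Y},
```

so that $\kappa_Y'(0) = \mu_Y$ and $\kappa_Y''(0) = c^2\mu_Y^2$, as required for a law with mean $\mu_Y$ and coefficient of variation $c$.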
The proof uses the so-called natural parameterization of the compound gamma distribution. It is interesting to obtain explicit parameters orthogonal to the means of $S$, $N$, and $Y$. By the assumption $N \in \mathcal{G}_\perp$, one has a parameter vector $\theta_N \perp \mu_N$, and since $Y$ is gamma distributed, one has $Y \in \mathcal{G}_\perp$ with $c \perp \mu_Y$ (the coefficient of variation depends only on the shape). It remains to construct a parameter vector orthogonal to the mean of $S$ as a function of the natural parameters, and this function must be determined. This task can be solved in a unified way for many counting distributions (see [18, Section 4]). To illustrate the method, it suffices to consider a single example here.

Mean Orthogonal Characterization of the Compound Multiparameter Hermite Gamma Distribution
The mean orthogonal characterization of the compound gamma distribution allows for a wide variety of count data distributions in the mean orthogonal class. To further reduce the set of count distributions that can be used, one can ask for characterizations under additional assumptions. For example, Puig [25] and Puig and Valero [26] characterize count data distributions satisfying Gauss's principle and several notions of additivity, which, via Theorem 5, can be translated into characterizations of compound gamma distributions. Based on a result by Puig and Valero [20], we derive a most stringent characterization, which allows compounding of the gamma distribution under a single count data family, namely, the multiparameter Hermite distribution. To show this, some additional preliminaries are required.
Definition 11. Let F be a family of count distributions. It is called closed under binomial subsampling if, for any random variable $N$ with distribution in F, all its independent $p$-thinnings, for all $p \in (0, 1]$, have distributions in F.
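A $p$-thinning of $N$ can be written as $N_p = B_1 + \cdots + B_N$ with independent Bernoulli($p$) indicators $B_i$ that are also independent of $N$ (the standard construction; the notation here is ours). The sketch below simulates a $p$-thinning of a Poisson($\lambda$) variable and compares its empirical frequencies with the Poisson($p\lambda$) probabilities, illustrating that the Poisson family is closed under binomial subsampling. The function names are hypothetical; only the standard library is used.

```python
import math
import random
from collections import Counter


def poisson(rng, lam):
    """Poisson variate by Knuth's multiplication method (adequate for small lam)."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def thin(rng, n, p):
    """Binomial p-thinning: keep each of the n counted units independently with probability p."""
    return sum(1 for _ in range(n) if rng.random() < p)


def poisson_pmf(k, lam):
    """Poisson probability P(K = k), computed on the log scale."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


if __name__ == "__main__":
    rng, lam, p, size = random.Random(2), 5.0, 0.3, 100000
    counts = Counter(thin(rng, poisson(rng, lam), p) for _ in range(size))
    for k in range(6):  # empirical frequency versus Poisson(p * lam)
        print(k, round(counts[k] / size, 4), round(poisson_pmf(k, p * lam), 4))
```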

Definition 12.
Let F be a family of distributions. It is called closed under convolution if, for any two independent random variables $X$, $Y$ with distributions in F, the distribution of the sum $X + Y$ also belongs to F. There is only one count distribution family that is closed under both convolution and binomial subsampling.
Some comments are in order. The case $m = 1$ corresponds to the Poisson distribution, and $m = 2$ is the Hermite distribution (e.g., [27]). For arbitrary $m$, this distribution is called the multiparameter Hermite distribution of order $m$ by Milne and Westcott [28]. In terms of the cumulant pgf (6), the representation (24) can be rewritten as (25), where $a(j)$, $j = 1, 2, \ldots, m$, solves the system in (8), that is, (26). The case $m = 2$ of (26) is already in A. W. Kemp and C. D. Kemp [29], and for arbitrary $m$ this assertion is equivalent to Lemma 2 in Puig and Valero [20]. The special case $a(1) > 0$, $a(m) \geq 0$, $a(j) = 0$, $j = 2, \ldots, m - 1$, is the generalized Hermite distribution of Gupta and Jain [30]. The multiparameter Hermite distribution also belongs to the Kumar [31] family of distributions. In general, the conditions on the sequence $a(j)$, $j = 1, 2, \ldots, m$, under which (25) defines a true probability distribution have been identified by Lévy [32]. According to Lukacs [33, page 252] and Johnson et al. [34, page 356], this is the case provided that a negative value $a(j) < 0$ is preceded by a positive value and followed by at least two positive values. In particular, if at least
Together, this shows that (10) is satisfied. The result follows by Lemma 7.
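To make the two closure properties concrete, the sketch below assumes the pgf representation $G(z) = \exp\bigl(\sum_{j=1}^{m} a_j (z^j - 1)\bigr)$ of a multiparameter Hermite law of order $m$ with nonnegative $a_j$ (only the nonnegative case, and the notation $a_j$ is ours). Convolution adds the coefficient vectors, and a $p$-thinning replaces $z$ by $1 - p + pz$, which gives new coefficients $b_k = \sum_{j \geq k} a_j \binom{j}{k} p^k (1-p)^{j-k}$ of the same order. The pmf is computed from $G'(z) = A'(z)G(z)$, which yields the recursion $k\,p_k = \sum_j j\,a_j\,p_{k-j}$. Function names are hypothetical.

```python
import math


def hermite_pmf(a, kmax):
    """pmf p_0..p_kmax of the law with pgf exp(sum_j a[j-1] * (z**j - 1)), a_j >= 0 assumed.

    Uses k * p_k = sum_j j * a_j * p_{k-j}, obtained from G'(z) = A'(z) G(z)."""
    p = [math.exp(-sum(a))]
    for k in range(1, kmax + 1):
        s = sum(j * a[j - 1] * p[k - j] for j in range(1, min(k, len(a)) + 1))
        p.append(s / k)
    return p


def convolve(p, q):
    """pmf of the sum of two independent counts, truncated to len(p)."""
    return [sum(p[i] * q[k - i] for i in range(k + 1)) for k in range(len(p))]


def thin_pmf(p, prob):
    """pmf of the binomial prob-thinning of a count with (truncated) pmf p."""
    return [sum(p[n] * math.comb(n, k) * prob**k * (1 - prob) ** (n - k)
                for n in range(k, len(p))) for k in range(len(p))]


def thinned_coefficients(a, prob):
    """b_k = sum_j a_j C(j, k) prob^k (1 - prob)^(j - k), k = 1..m, of the thinned law."""
    m = len(a)
    return [sum(a[j - 1] * math.comb(j, k) * prob**k * (1 - prob) ** (j - k)
                for j in range(k, m + 1)) for k in range(1, m + 1)]


if __name__ == "__main__":
    a, b, kmax = [1.0, 0.5, 0.2], [0.3, 0.0, 0.4], 60
    pa, pb = hermite_pmf(a, kmax), hermite_pmf(b, kmax)
    summed = hermite_pmf([x + y for x, y in zip(a, b)], kmax)
    print("convolution error:", max(abs(u - v) for u, v in zip(convolve(pa, pb), summed)))
    thinned = hermite_pmf(thinned_coefficients(a, 0.4), kmax)
    print("thinning error:", max(abs(u - v) for u, v in zip(thin_pmf(pa, 0.4), thinned)))
```

For $m = 2$ the thinned coefficients reduce to $b_1 = p\,a_1 + 2p(1-p)\,a_2$ and $b_2 = p^2 a_2$.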
We are ready for the following new characterization result.
Proof. The result follows by combining Theorems 5 and 14, together with the observation that a multiparameter Hermite distribution can always be put in the form of Lemma 15 (a generalization of Example 16). The assertion about the parameters orthogonal to the means $\mu_S$, $\mu_N$, $\mu_Y$ follows by the same arguments as in Example 9, using (27).
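A compound multiparameter Hermite gamma variable is easy to simulate when the coefficients are nonnegative. Assuming the pgf $\exp\bigl(\sum_j a_j (z^j - 1)\bigr)$ (our notation), the count $N$ has the same law as $\sum_j j\,P_j$ with independent $P_j \sim$ Poisson($a_j$), so that $\mu_N = \sum_j j\,a_j$ and $\sigma_N^2 = \sum_j j^2 a_j$. The sketch below, with hypothetical names and parameters chosen for illustration, draws $S = X_1 + \cdots + X_N$ with gamma severities and compares the sample mean and variance with $\mu_S = \mu_N\mu_X$ and $\sigma_S^2 = \mu_N\sigma_X^2 + \sigma_N^2\mu_X^2$.

```python
import math
import random


def poisson(rng, lam):
    """Poisson variate by Knuth's multiplication method (adequate for small lam)."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def hermite_count(rng, a):
    """Draw N = sum_j j * P_j with independent P_j ~ Poisson(a_j); assumes all a_j >= 0."""
    return sum(j * poisson(rng, aj) for j, aj in enumerate(a, start=1) if aj > 0)


def compound_hermite_gamma(rng, a, mu_x, shape):
    """One draw of S = X_1 + ... + X_N with gamma severities of mean mu_x and given shape."""
    n = hermite_count(rng, a)
    return sum(rng.gammavariate(shape, mu_x / shape) for _ in range(n))


def theoretical_moments(a, mu_x, shape):
    """(mu_S, sigma_S^2) from the random-sum formulas."""
    mu_n = sum(j * aj for j, aj in enumerate(a, start=1))
    var_n = sum(j * j * aj for j, aj in enumerate(a, start=1))
    var_x = mu_x**2 / shape
    return mu_n * mu_x, mu_n * var_x + var_n * mu_x**2


if __name__ == "__main__":
    rng, a, mu_x, shape, size = random.Random(4), [1.0, 0.5, 0.2], 2.0, 1.5, 50000
    draws = [compound_hermite_gamma(rng, a, mu_x, shape) for _ in range(size)]
    m = sum(draws) / size
    v = sum((d - m) ** 2 for d in draws) / (size - 1)
    print("simulated:", round(m, 3), round(v, 3))
    print("theory:   ", theoretical_moments(a, mu_x, shape))
```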