Smoothed Dirichlet Distribution

When the cells of a multinomial distribution are ordinal, i.e., when the cells have a natural ordering, borrowing information among neighboring cells makes sense conceptually. In this paper, we introduce a novel probability distribution for borrowing information among neighboring cells in order to provide reliable estimates for cell probabilities. The proposed smoothed Dirichlet distribution forces the probabilities of neighboring cells to be closer to each other than under the standard Dirichlet distribution. Basic properties of the proposed distribution, including the normalizing constant, moments, and marginal distributions, are developed. Sample generation from the smoothed Dirichlet distribution is discussed using the acceptance-rejection algorithm. We demonstrate the performance of the proposed smoothed Dirichlet distribution using 2018 Major League Baseball (MLB) batting data.


Introduction
The smoothed Dirichlet distribution has the same parametric form as the Dirichlet distribution but forces neighboring cells to be closer to each other by adding a penalty function. Many researchers currently force neighboring cells to be closer to each other by choosing suitable parameter values (α) for the Dirichlet distribution. There is a clear need for a distribution tailored to this issue, and our proposed distribution fills that gap, allowing for efficient modeling of such data.
Several parametric families have been introduced as extensions of the Dirichlet distribution to suit different purposes. [11] introduced the generalized Dirichlet distribution, which has a more general covariance structure than the Dirichlet distribution. The grouped Dirichlet distribution (GDD), a multivariate generalization of the Dirichlet distribution used to model incomplete categorical data, was first described by [5]. [6] introduced the nested Dirichlet distribution, also for modeling incomplete categorical data. The spherical-Dirichlet distribution, introduced by [1], is obtained by transforming the Dirichlet distribution from the simplex to the corresponding space on the hypersphere. [4] proposed a different version of the smoothed Dirichlet distribution in the context of smoothed language-model representations of documents; that distribution is the same as the Dirichlet distribution except that its domain is restricted to include only smoothed language models. [3] used that smoothed Dirichlet distribution for image categorization on a smoothed simplex. That construction relies on a smoothed domain and focuses mainly on multimedia data, whereas our proposed smoothed Dirichlet distribution can apply to data from any context. [7] proposed the modified Dirichlet distribution, which simultaneously performs smoothing by allowing the parameters (α) to be negative.

Basic Properties
In this section, we introduce the proposed smoothed Dirichlet distribution and its basic properties.

Probability Density Function
The smoothed Dirichlet distribution was suggested by [2] as a variation of the Dirichlet distribution that forces successive cell categories to be closer to each other. Let x = (x_1, x_2, …, x_K)^t be a vector with K components, where x_j ≥ 0 for j = 1, 2, …, K and ∑_{j=1}^K x_j = 1. Also, let α = (α_1, α_2, …, α_K)^t, where α_j > 0 for each j, and let λ > 0. The smoothed Dirichlet (SD) probability density function is

f(x | α, λ) = (1/C(α, λ)) ∏_{j=1}^K x_j^{α_j − 1} exp(−λΔ(x)),   (1)

where exp(−λΔ(x)) is a penalty function that forces successive x_j's to be close to each other with higher probability than under the standard Dirichlet distribution satisfying ∑_{j=1}^K x_j = 1. The parameter λ dictates the extent to which neighboring x_j's are forced to be close; for instance, a large value of λ forces realizations of x to have small values of Δ(x). The normalizing constant C(α, λ) can be written as

C(α, λ) = B(α) E_{Dir(α)}[exp(−λΔ(X))],

where B(α) = ∏_{j=1}^K Γ(α_j)/Γ(∑_{j=1}^K α_j) and the expectation is taken with respect to the Dirichlet(α) distribution. From now on, we write X = (X_1, X_2, …, X_K)^t ∼ SDD(α, λ, Δ) for the smoothed Dirichlet distribution (SDD). Here Δ = Δ(X), and its importance is discussed later.
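As a minimal sketch, the unnormalized log-density above can be evaluated directly; the quadratic penalty and the function names below are our illustrative choices, not fixed by the paper.

```python
import numpy as np

def penalty(x):
    """Delta(x): sum of squared successive differences."""
    x = np.asarray(x, dtype=float)
    return float(np.sum(np.diff(x) ** 2))

def sd_log_kernel(x, alpha, lam):
    """Log of the SD kernel: sum_j (alpha_j - 1) log x_j - lam * Delta(x).
    The normalizing term -log C(alpha, lam) is omitted."""
    x = np.asarray(x, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    return float(np.sum((alpha - 1.0) * np.log(x)) - lam * penalty(x))
```

With α = (1, 1, 1)^t the kernel reduces to −λΔ(x), so the density is largest where successive components are closest.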

Moments
In this sub-section, we compute the first, second, and nth-order moments, variance, and covariance of the smoothed Dirichlet distribution. Let δ_il = 0 for all l ≠ i, δ_ii = 1, and δ_i = (δ_i1, δ_i2, …, δ_iK)^t. Integrating against the density in (1), the mean of the smoothed Dirichlet distribution is

E(X_i) = C(α + δ_i, λ) / C(α, λ).   (2)

Writing each normalizing constant as C(α, λ) = B(α) E_{Dir(α)}[exp(−λΔ(X))] and using B(α + δ_i)/B(α) = α_i/A, where A = ∑_{j=1}^K α_j, gives

E(X_i) = (α_i/A) E_{Dir(α+δ_i)}[exp(−λΔ(X))] / E_{Dir(α)}[exp(−λΔ(X))].   (3)

Similarly, we compute the nth moment of X_i as

E(X_i^n) = C(α + nδ_i, λ) / C(α, λ).   (4)

More generally, the product moments of smoothed Dirichlet random variables can be expressed as

E(∏_i X_i^{n_i}) = C(α + n, λ) / C(α, λ), where n = (n_1, n_2, …, n_K)^t.   (5)

We can easily compute the second moment by plugging n = 2 into (5),

E(X_i^2) = C(α + 2δ_i, λ) / C(α, λ).   (6)

Then the variance of the smoothed Dirichlet distribution is

Var(X_i) = E(X_i^2) − (E(X_i))^2 = C(α + 2δ_i, λ)/C(α, λ) − (C(α + δ_i, λ)/C(α, λ))^2.   (7)

Next we compute the covariance of X_i and X_l (Cov(X_i, X_l)), i, l = 1, 2, …, K and i ≠ l. Using (5), E(X_i X_l) = C(α + δ_i + δ_l, λ)/C(α, λ), so

Cov(X_i, X_l) = C(α + δ_i + δ_l, λ)/C(α, λ) − C(α + δ_i, λ) C(α + δ_l, λ)/C(α, λ)^2.   (8)

Journal of Statistical Theory and Applications (2023) 22:237-261
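Because each moment is a ratio of Dirichlet expectations of the penalty term, it can be approximated by self-normalized importance sampling on ordinary Dirichlet draws. A minimal numpy sketch (the quadratic penalty and parameter values are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def penalty(x):
    # Delta(x) along the last axis: sum of squared successive differences
    return np.sum(np.diff(x, axis=-1) ** 2, axis=-1)

def sd_mean(alpha, lam, n=200_000):
    """Self-normalized importance sampling: draw y ~ Dirichlet(alpha),
    weight by w = exp(-lam * Delta(y)); the weighted average of y
    estimates E(X) under SDD(alpha, lam, Delta)."""
    y = rng.dirichlet(alpha, size=n)
    w = np.exp(-lam * penalty(y))
    return (w[:, None] * y).sum(axis=0) / w.sum()

alpha = np.array([10.0, 5.0, 1.0])
m0 = sd_mean(alpha, 0.0)    # lam = 0 recovers the Dirichlet mean alpha/A
m50 = sd_mean(alpha, 50.0)  # a large lam pulls neighboring cells together
```

At λ = 0 the estimate matches α_i/A, and increasing λ visibly shrinks the spread between neighboring cell means, consistent with the role of the penalty.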

Sample Generation
We use the acceptance-rejection method to generate a random sample from the smoothed Dirichlet distribution with parameters α, λ, and Δ. In acceptance-rejection sampling, we generate candidate values from a proposal distribution Y in order to obtain draws from a target distribution X: a candidate y is accepted with probability f(y)/(C g(y)), where f and g are the probability density functions of X and Y, and C is a constant satisfying f(y) ≤ C g(y) for all y. We consider the smoothed Dirichlet distribution as the target and the Dirichlet distribution as the proposal; then f(y)/g(y) ∝ exp(−λΔ(y)) ≤ 1, so a Dirichlet draw y can be accepted whenever u ≤ exp(−λΔ(y)) for u ∼ Uniform(0, 1).
Here are the steps of the acceptance-rejection algorithm.
1. Generate a set of random samples y_1, y_2, …, y_M from the Dirichlet distribution with the given α.
2. Compute E[exp(−λΔ(y))] using all the generated random samples.
3. Generate a random number u from Uniform(0, 1) and set i = 1.
4. If u ≤ exp(−λΔ(y_i)), accept y_i; otherwise reject y_i. Set i = i + 1 and repeat from step 3 until i = M.
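The steps above can be sketched as follows; this is a minimal implementation under the assumption of the quadratic penalty, looping until the requested number of accepted draws is reached.

```python
import numpy as np

rng = np.random.default_rng(1)

def penalty(x):
    return float(np.sum(np.diff(x) ** 2))

def sample_sd(alpha, lam, size):
    """Acceptance-rejection draws from SDD(alpha, lam, Delta) with a
    Dirichlet(alpha) proposal. Because exp(-lam * Delta(y)) <= 1, a
    proposal y is accepted with probability exp(-lam * Delta(y))."""
    out = []
    while len(out) < size:
        y = rng.dirichlet(alpha)
        if rng.uniform() <= np.exp(-lam * penalty(y)):
            out.append(y)
    return np.array(out)

draws = sample_sd([5.0, 5.0, 5.0], lam=10.0, size=2000)
```

For a symmetric α the accepted draws remain centered at the middle of the simplex, but they concentrate more tightly there than the Dirichlet proposals do.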

Role of the Penalty Term exp(−λΔ(x))
As mentioned before, the penalty function exp(−λΔ(x)) forces successive x_j's to be close to each other with higher probability than under the standard Dirichlet distribution satisfying ∑_{j=1}^K x_j = 1. The roles of λ and Δ are very important when constructing the smoothed Dirichlet distribution. The penalty parameter λ dictates the extent to which cell probabilities of neighboring categories have to be similar. Examples of Δ functions that we consider include Δ_1 = ∑_{j=1}^{K−1} (log x_{j+1} − log x_j)^2 and Δ = ∑_{j=1}^{K−1} (x_{j+1} − x_j)^2. From now on, for our analysis, we use Δ = ∑_{j=1}^{K−1} (x_{j+1} − x_j)^2. The maximum value of this Δ is attained when one of the x_j = 1 for 2 ≤ j ≤ K − 1; the maximum value is then 2. Fig. 1 shows the effect of different α and λ for the same penalty function Δ = ∑_{j=1}^{K−1} (x_{j+1} − x_j)^2 after simulating data from the smoothed Dirichlet distribution with K = 3. The plots in each row of Fig. 1 are for the same α, and the plots in each column are for the same λ value. We consider α = (1, 1, 1)^t, (5, 5, 5)^t, (10, 10, 10)^t, (10, 5, 1)^t, (1, 5, 10)^t, and (1, 10, 1)^t for the rows of Fig. 1. The color scale runs from yellow (lowest value) to red (highest value).
As α_j increases, the distribution becomes more tightly concentrated around the center of the simplex for a given λ value. Also, for a given α, the distribution becomes more tightly concentrated around the center of the simplex as λ increases. If the α_j values differ, we obtain an asymmetric (non-central) distribution with higher values toward the largest α_j. Also, the highest penalty occurs at x = (x_1 = 0, x_2 = 1, x_3 = 0)^t for this penalty function.
Figure 2 shows the effect of different λ and penalty functions for α = (5, 5, 5)^t after simulating data from the smoothed Dirichlet distribution with K = 3. The first row of Fig. 2 is for the penalty function Δ_1 = ∑_{j=1}^{K−1} (log x_{j+1} − log x_j)^2; the second and third rows are for the other penalty functions considered. It is clear that the distribution becomes more tightly concentrated around the center of the simplex for the penalty function Δ_1 = ∑_{j=1}^{K−1} (log x_{j+1} − log x_j)^2 than for the other penalty functions. Also, the highest penalty occurs at x = (x_1 = 0, x_2 = 1, x_3 = 0)^t for all of these penalty functions.
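The behavior of the two penalty functions named above can be checked directly; the vectors below are illustrative (the spike vector only approximates (0, 1, 0)^t, since the log penalty is undefined at exact zeros).

```python
import numpy as np

def delta_sq(x):
    # Delta(x) = sum of squared successive differences
    return float(np.sum(np.diff(x) ** 2))

def delta_log(x):
    # Delta_1(x) = sum of squared successive log-differences
    return float(np.sum(np.diff(np.log(x)) ** 2))

flat = np.array([1/3, 1/3, 1/3])             # no penalty: all cells equal
spike = np.array([1e-9, 1.0 - 2e-9, 1e-9])   # near the worst case (0, 1, 0)^t
```

The flat vector incurs zero penalty under both functions, while the interior spike drives delta_sq toward its maximum value of 2.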

The Upper and Lower Limits for E(X_i) and Var(X_i)
Next, we compute upper and lower limits for E(X_i) and Var(X_i); these limits identify the possible range of each quantity. First, consider E(X_i). Recall that

E(X_i) = (α_i/A) E_{Dir(α+δ_i)}[exp(−λΔ(X))] / E_{Dir(α)}[exp(−λΔ(X))], where A = ∑_{j=1}^K α_j,

so that when λ → 0, E(X_i) → α_i/A, which is the expected value under the Dirichlet distribution. We also know that exp(t) = ∑_{n=0}^∞ t^n/n!; this is the Maclaurin series of the exponential function. Since 0 ≤ Δ(x) ≤ 2 for our penalty function, exp(−2λ) ≤ exp(−λΔ(x)) ≤ 1 for every x on the simplex. Bounding the numerator and denominator of the ratio above separately yields

E_LL(X_i) = (α_i/A) exp(−2λ) ≤ E(X_i) ≤ (α_i/A) exp(2λ) = E_UL(X_i).

Similarly,

E(X_i^2) = (α_i(α_i + 1)/(A(A + 1))) E_{Dir(α+2δ_i)}[exp(−λΔ(X))] / E_{Dir(α)}[exp(−λΔ(X))],

so

(α_i(α_i + 1)/(A(A + 1))) exp(−2λ) ≤ E(X_i^2) ≤ (α_i(α_i + 1)/(A(A + 1))) exp(2λ).

We know that Var(X_i) = E(X_i^2) − (E(X_i))^2. We therefore compute the lower limit of Var(X_i) (Var_LL(X_i)) and the upper limit of Var(X_i) (Var_UL(X_i)) separately, pairing the lower limit of E(X_i^2) with the upper limit of (E(X_i))^2 and vice versa:

Var_LL(X_i) = (α_i(α_i + 1)/(A(A + 1))) exp(−2λ) − (α_i/A)^2 exp(4λ),
Var_UL(X_i) = (α_i(α_i + 1)/(A(A + 1))) exp(2λ) − (α_i/A)^2 exp(−4λ).

Note that when λ → 0, both Var_LL(X_i) and Var_UL(X_i) converge to α_i(A − α_i)/(A^2(A + 1)), which is the variance of the Dirichlet distribution. Next, we discuss the marginal distributions of the smoothed Dirichlet distribution.

Fig. 1 The effect of λ for different α; row 1 with α = (1, 1, 1)^t, row 2 with (5, 5, 5)^t, row 3 with (10, 10, 10)^t, row 4 with (10, 5, 1)^t, row 5 with (1, 5, 10)^t, and row 6 with (1, 10, 1)^t
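The mean bounds can be spot-checked numerically. This is a sketch under our assumptions (quadratic penalty with maximum 2, so exp(−2λ) ≤ exp(−λΔ(x)) ≤ 1), with E(X) under the SDD approximated by self-normalized importance sampling on Dirichlet draws.

```python
import numpy as np

rng = np.random.default_rng(2)

def penalty(x):
    return np.sum(np.diff(x, axis=-1) ** 2, axis=-1)

alpha = np.array([2.0, 3.0, 4.0])
A = alpha.sum()
lam = 0.5

# E(X) under the SDD via weighted Dirichlet draws
y = rng.dirichlet(alpha, size=400_000)
w = np.exp(-lam * penalty(y))
mean_sd = (w[:, None] * y).sum(axis=0) / w.sum()

# Bounds (alpha_i/A) * exp(-2*lam) <= E(X_i) <= (alpha_i/A) * exp(2*lam)
lower = alpha / A * np.exp(-2.0 * lam)
upper = alpha / A * np.exp(2.0 * lam)
```

The Monte Carlo estimate of each E(X_i) should fall between the corresponding lower and upper limits, and the bounds collapse onto the Dirichlet mean as λ → 0.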

Marginal Distributions
We know that the marginal distributions of the standard Dirichlet distribution are beta distributions. Starting from the SD probability density function in (1) and integrating out x_2, …, x_K, the marginal probability density function of X_1 is the product of the Beta(α_1, A − α_1) probability density function and a ratio of expectations of the penalty term exp(−λΔ(x)). The marginal distribution of X_2 is obtained analogously: beginning with the joint distribution of X_1 and X_2 and integrating out X_1, the marginal of X_2 is again a Beta(α_2, A − α_2) density multiplied by the corresponding penalty ratio.

Estimation of Parameters and Bayesian Inference
We now outline the estimation of the parameters of the smoothed Dirichlet distribution. We first derive estimators for α_j, j = 1, 2, …, K, and λ using the method of moments (MOM).

Method of Moments (MOM)
Suppose we have a random sample of n random vectors X_1, X_2, …, X_n from SDD(α, λ, Δ). The first and second population moments are E(X_j) = C(α + δ_j, λ)/C(α, λ) and E(X_j^2) = C(α + 2δ_j, λ)/C(α, λ). We define the corresponding first and second sample moments as m_j = (1/n) ∑_{i=1}^n X_ij and m'_j = (1/n) ∑_{i=1}^n X_ij^2 for j = 1, 2, …, K. We have K − 1 first-order moment equations and K − 1 second-order moment equations to solve for the K unknown α_j and λ:

C(α + δ_j, λ)/C(α, λ) = m_j and C(α + 2δ_j, λ)/C(α, λ) = m'_j, j = 1, 2, …, K.

There is no closed-form solution for α_j and λ when solving these equations simultaneously, so we must solve them numerically to obtain the corresponding method-of-moments estimators for the parameters.
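Since the moment equations must be solved numerically, a solver needs sensible starting values. One practical sketch (our assumption, not the paper's stated procedure) is to initialize at the λ = 0 special case, where the SDD reduces to the Dirichlet and the method-of-moments solution is closed form:

```python
import numpy as np

rng = np.random.default_rng(3)

def dirichlet_mom(X):
    """Closed-form Dirichlet method of moments (the lam = 0 special case):
    A = (m1 - m2) / (m2 - m1^2) from the first component, then
    alpha_j = m1_j * A."""
    m1 = X.mean(axis=0)
    m2 = (X ** 2).mean(axis=0)
    A = (m1[0] - m2[0]) / (m2[0] - m1[0] ** 2)
    return m1 * A

# simulated check on Dirichlet data with known alpha
X = rng.dirichlet([4.0, 2.0, 6.0], size=50_000)
alpha0 = dirichlet_mom(X)   # starting values for the joint (alpha, lam) solve
```

These starting values can then be refined by any root-finding routine applied to the full set of moment equations.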

Bayesian Inference
In the Bayesian paradigm, if the posterior distribution is in the same probability distribution family as the prior distribution, then the prior is called a conjugate prior distribution. Like the Dirichlet distribution, the smoothed Dirichlet distribution is a conjugate prior for the multinomial cell-probability vector. The distribution of the cell counts x is Multinomial(n, p), where p = (p_1, p_2, …, p_K)^t denotes the vector of cell probabilities for the multinomial population. We suggest using the smoothed Dirichlet distribution as the prior distribution for p, p ∼ SD(α, λ, Δ). Then the posterior distribution of p, given the observed counts x, is

p | x ∼ SD(α + x, λ, Δ),

since the multinomial likelihood contributes ∏_j p_j^{x_j} to the smoothed Dirichlet kernel. We also introduce the smoothed Dirichlet-multinomial distribution, a compound distribution of a multinomial distribution and a smoothed Dirichlet distribution; the marginal distribution of x is given by the multinomial coefficient times the ratio C(α + x, λ)/C(α, λ).
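The conjugate update and its smoothing effect can be sketched as follows; the penalty, the λ value, and the importance-sampling posterior-mean approximation are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)

def penalty(x):
    return np.sum(np.diff(x, axis=-1) ** 2, axis=-1)

def sd_posterior_params(alpha, counts, lam):
    """Conjugate update: SD(alpha, lam, Delta) prior plus Multinomial
    counts yields an SD(alpha + counts, lam, Delta) posterior."""
    return np.asarray(alpha, float) + np.asarray(counts, float), lam

def sd_mean(alpha, lam, n=200_000):
    """Self-normalized importance-sampling estimate of E(p) under the SDD."""
    y = rng.dirichlet(alpha, size=n)
    w = np.exp(-lam * penalty(y))
    return (w[:, None] * y).sum(axis=0) / w.sum()

counts = np.array([3.0, 7.0, 2.0])
alpha_post, lam = sd_posterior_params([1.0, 1.0, 1.0], counts, 50.0)
mle = counts / counts.sum()
post_mean = sd_mean(alpha_post, lam)
```

Relative to the MLE, the posterior mean under the smoothed Dirichlet prior pulls neighboring cell probabilities toward each other, which is exactly the borrowing of information the prior is designed to provide.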

Data Analysis

A Simulation Study
An in-depth simulation study would be useful to demonstrate and compare how the proposed and existing estimators perform over simulated datasets under different scenarios. Here we performed a brief simulation study using 10,000 Monte Carlo simulations with known true cell probabilities. We report the mean squared error (MSE) and compare the estimators below. We also varied the number of populations (N = 100, 200, and 500) and the number of categories (K = 3, 6, and 9) to explore the performance of the estimators. We generated data from multinomial distributions with sample sizes ranging from 15 to 75 and considered two scenarios for the true cell probabilities. We compare our proposed method with the Dirichlet process method proposed by [10] and the weighted likelihood method of [8]. Note that for the empirical Bayes method, we considered the standard Dirichlet distribution. In the proposed method we used the penalty function Δ = ∑_{j=1}^{K−1} (x_{j+1} − x_j)^2 for the smoothed Dirichlet distribution. Also, λ changes with the scenario, N, and K, ranging from 100 to 300.

Scenario 1
In this scenario, the true cell probabilities are strictly decreasing and the differences between successive probabilities are equal. For example, when K = 3, the true cell probabilities are p = (3/6, 2/6, 1/6)^t. Table 1 provides the mean squared error values for each estimator under Scenario 1. When N is fixed and K increases, the MSE decreases. The MSE also decreases when K is fixed and N increases. The Bayesian shrinkage estimator based on a smoothed Dirichlet prior is the best estimator in terms of MSE.

Scenario 2
In this scenario, the true cell probabilities alternately increase and decrease (a zig-zag pattern). For example, when K = 3, the true cell probabilities are p = (1/12, 10/12, 1/12)^t. Table 2 provides the mean squared error values for each estimator under Scenario 2. As before, when N is fixed and K increases, the MSE decreases, and the MSE also decreases when K is fixed and N increases. The estimator based on the weighted likelihood approach is the best estimator in terms of MSE for this scenario. The Bayes estimator based on the smoothed Dirichlet prior does not do well in this case, which is not surprising given that it is not designed for settings where successive cell probabilities differ sharply.

Real Data Analysis
Real-world situations often arise where outcomes in certain categories are not observed because of limited sample size and small, but non-zero, cell probabilities. In such cases, maximum likelihood estimation (MLE) may yield poor results by underestimating the actual cell probabilities. The proposed approach is highly valuable in such scenarios, particularly when dealing with ordinal categories that possess a natural ordering: it ensures that the borrowing of information among neighboring cell categories is conceptually meaningful, making it an effective methodology for analyzing data applications with unobserved outcomes in the presence of small probabilities.
Let us now consider the estimation of p using a smoothed Dirichlet prior. For the data application, we consider 2018 Major League Baseball (MLB) batting data from the Baseball-Reference website (www.baseball-reference.com). We consider data for all regular-season games played between March 29, 2018, and October 12, 2018. Our analysis includes m = 556 players with at least 25 plate appearances. [8] proposed an estimator based on the weighted likelihood approach to produce a good batting metric for each batter, especially for batters with few plate appearances; this estimator borrows information from other, similar batters to make inferences about a target batter. Our proposed Bayesian estimator using the smoothed Dirichlet prior distribution borrows information across other batters but, more importantly, also across neighboring ordinal categories (batting outcomes) to improve the estimation of cell probabilities. [9] used this Bayesian estimator with a smoothed Dirichlet prior to estimate the distribution of positive COVID-19 cases across age groups for Canadian health regions.
In our analysis, we consider K = 11 possible batting outcomes: SO (strikeout), GO (ground out), AO (air out), SH (sacrifice hit), SF (sacrifice fly), HBP (hit by pitch), BB (base on balls/walk), S (single), D (double), T (triple), and HR (home run). The outcome of batting in baseball can be divided into discrete categories; this is the basis for constructing metrics that evaluate the batting performance of players, and it is also the basis of our analysis. Let x_ij be the number of plate appearances in which batting outcome j occurs for the ith batter (j = 1, 2, …, K), and denote the number of plate appearances for the ith batter by n_i. The joint distribution of the counts for these 11 discrete categories for batter i is

x_i = (x_i1, x_i2, …, x_i11)^t ∼ Multinomial(n_i, p_i),

where p_i = (p_i1, p_i2, …, p_i11)^t represents the vector of outcome-specific probabilities satisfying ∑_j p_ij = 1. Taking a Bayesian approach, we assume

p_i ∼ SD(α = (α_1, α_2, …, α_11)^t, λ, Δ).

Then the posterior distribution for the ith batter is SD(α + x_i, λ, Δ). There are two main groups of outcomes: strikeout, ground out, and air out are outs/dismissals, while single, double, triple, and home run are hits. For our analysis we start from Δ = ∑_{j=1}^{K−1} (p_i(j+1) − p_ij)^2 and modify this penalty function slightly so that the borrowing of information across cell categories is done within each group only. Assuming the batting outcomes are arranged in the order given above (SO, GO, …, HR), the modified penalty function sums the squared successive differences within each group only.
Figure 3 shows the estimates of the cell probabilities for the top 10 batters using the smoothed Dirichlet prior with different λ; these estimates are very close to the MLEs. This behavior is expected, given that the top 10 batters have large numbers of plate appearances; in their case, as λ increases, we see only small fluctuations from the MLEs.
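The within-group borrowing can be sketched in code. The exact group index ranges below (outs SO–AO and hits S–HR, with the remaining outcomes left unsmoothed) are our illustrative assumption, since the paper's displayed formula is not reproduced here.

```python
import numpy as np

# 11 batting outcomes in the stated order
OUTCOMES = ["SO", "GO", "AO", "SH", "SF", "HBP", "BB", "S", "D", "T", "HR"]
# (start, stop) index ranges: outs = SO, GO, AO ; hits = S, D, T, HR
GROUPS = [(0, 3), (7, 11)]

def grouped_penalty(p, groups=GROUPS):
    """Sum squared successive differences only within each group, so no
    information is borrowed across group boundaries."""
    p = np.asarray(p, dtype=float)
    return float(sum(np.sum(np.diff(p[a:b]) ** 2) for a, b in groups))
```

A probability vector that is flat within each group incurs zero penalty even if the groups sit at different levels, which is precisely the boundary behavior the modification is meant to achieve.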

Conclusions
The proposed smoothed Dirichlet distribution constitutes a superior alternative for borrowing information among neighboring cells. It forces the probabilities of neighboring cells to be closer to each other than under the standard Dirichlet distribution, yielding more reliable estimates of cell probabilities when the cells have a natural ordering.

Table 1
MSE values for scenario 1

Table 2
MSE values for scenario 2