Forecasting Using Information and Entropy Based on Belief Functions

This paper introduces an entropy-based belief function to the forecasting problem. While the likelihood-based belief function requires the distribution of the objective function to be known for prediction, the entropy-based belief function does not. This matters because the likelihood of the observed data is often complex in practice. We therefore replace the likelihood function with the entropy; that is, we propose an approach in which a belief function is built from the entropy function. As an illustration, the proposed method is compared to the likelihood-based belief function in simulation and empirical studies. According to the results, our approach performs well under a wide array of simulated data models and distributions. The evidence indicates that the frequentist method yields much narrower prediction intervals, while our entropy-based method yields the widest. However, our entropy-based belief function still produces an acceptable range for prediction intervals, as the true value always lies within them.


Introduction
Analysts are concerned with the uncertainty of the future. Reliable and timely information about future conditions is important for policy makers and planners to propose an appropriate plan and make a correct decision in the current situation. In addition, the assessment of future conditions is essential to reach reliable results and rational decisions. For example, in the economic research field, predicting the future of the economy has become a common practice (see Mitchell [1] and Lahiri and Monokroussos [2]). In recent years, the need to forecast in economics has become even more important due to high fluctuation in the global economy. Whatever the approach used, a forecast cannot be trusted unless it is accompanied by some measure of uncertainty. Thus, in this context, it is highly desirable to have subjective probabilities or prediction intervals for the prediction [3]. Beyond economics, prediction is also important in finance, medicine, engineering, and biology in order to provide efficient solutions to current problems.
Since the introduction of the Dempster-Shafer theory of belief functions [4,5], new formal frameworks for handling uncertainty have been proposed and have attracted increasing interest in various areas. In this approach, a piece of evidence can be represented by a belief function, which is mathematically equivalent to a random set [6]. Although the approach is conceptually simple and easy to apply in various problems and models, statistical methods based on belief functions have not gained much acceptance. Kanjanatarakul et al. [7] mentioned that this approach cannot be applied easily to the complex statistical models in many areas, such as machine learning or econometrics. Thus, they suggested Shafer's approach to statistical inference using belief functions, based on the concept of likelihood. Shafer [5] extended Dempster's seminal work and proposed a belief function that is constructed from the likelihood function. It was recently justified by Denoeux [8] from the following three basic principles: the likelihood principle, compatibility with Bayesian inference, and the least commitment principle. The likelihood-based belief function has gained increasing interest in the last few years. Previous works using the likelihood-based belief function in forecasting models include Kanjanatarakul et al. [3], Kanjanatarakul et al. [7], Thianpaen et al. [9], and Chakpitak et al. [10]. However, it is often very difficult to specify the appropriate likelihood distribution for such models. If the distributional assumption is not specified properly, it might lead to model misspecification and bring about a prediction bias. To overcome this problem, this study proposes an approach based on information theory, namely the classical maximum entropy (ME) principle introduced by Jaynes [11]. The ME principle has opened the way for a whole new class of estimation methods.
The estimation allows us to extract all the available information from the observed data with minimum assumptions and to avoid specifying a likelihood function. In this study, we consider the Generalized Maximum Entropy (GME). We modify the likelihood-based belief function by using an entropy function in place of the likelihood function. Therefore, we propose an approach in which a belief function $Bel^{\Theta}_y$ on $\Theta$ is built from the entropy function. The advantage of this method is that the statistical inference does not assume any distribution in the plausibility $pl^{\Theta}_y$, which refers to the relative entropy. Hence, the method becomes more flexible in explaining a wide range of variation in the observed data.
Although the likelihood-based belief function has been shown to be a reliable interval prediction method (Denoeux [8]), its applications to forecasting have remained limited. The reason might be that the likelihood-based belief function requires the statistician to specify the likelihood function, which is problematic when no knowledge of the distribution, or only weak information, is available. The purpose of this paper is to introduce a new approach for forecasting future observations and providing prediction intervals. Unlike the likelihood-based belief function, we replace the parametric likelihood function with the entropy function to derive the entropy-based belief function; model misspecification is thus avoided since no distributional assumption is made.
In this paper, we further explore the proposed approach by proving that, under weak information about the distribution function, the predictive entropy-based belief function converges to the true probability distribution of the not yet observed data. Several simulation studies are conducted to investigate the performance of our proposed method. Finally, we apply our proposed method to an autoregressive (AR) model for predicting the growth rate of Thailand's GDP, where data are scarce and involve high uncertainty. We also show that the method can be used for predicting future observations in the linear regression framework, using data from the operation of a plant for the oxidation of ammonia to nitric acid, measured on 21 consecutive days. Our method is intended to support decision-making based on reliable forecasts in various research fields.
The remainder of the paper is organized as follows. Section 2 provides the background on the entropy-based belief function concept and describes estimation and prediction using the entropy-based belief function. Section 3 presents a simulation study and real applications that evaluate the performance of our method through autoregressive and regression models. Section 4 summarizes the paper.

Methodology
This section briefly explains the necessary background notions related to the entropy approach and introduces the connection between entropy and the belief function. The Generalized Maximum Entropy approach and parameter estimation are first presented in Sections 2.1 and 2.2, respectively. The entropy-based belief function is then introduced in Section 2.3. Finally, the basic concept of the predictive belief function is presented in Section 2.4.

Generalized Maximum Entropy Approach.
In this study, we propose using a maximum entropy estimator to estimate the unknown parameters $\beta = (\beta_1, \ldots, \beta_K)$ in econometric models such as the AR and linear regression models. Before we discuss this estimator for regression and its statistical properties, we briefly summarize the entropy approach. The maximum entropy concept consists of inferring the probability distribution that maximizes information entropy given a set of constraints. Let $p_{km}$ be a proper probability of variable $k$ on support point $m$, and let the vector $z_k = (z_{k1}, \ldots, z_{kM})$ be the $M$-dimensional support of coefficient $k$, $k = 1, \ldots, K$. Following Shannon's entropy (1948), the summation of the $K$ entropies is

$$H(P) = -\sum_{k=1}^{K} \sum_{m=1}^{M} p_{km} \log p_{km}, \quad (1)$$

where the probabilities $P$ are unknown and unobservable and $\sum_{m=1}^{M} p_{km} = 1$. The entropy measures the uncertainty of a distribution and reaches its maximum when $p_{k1} = p_{k2} = \cdots = p_{kM} = 1/M$; in other words, if no constraints (data) are imposed, $H(P)$ reaches its maximum value for the uniform distribution.
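The claim that entropy peaks at the uniform distribution can be checked numerically. The following is a minimal sketch for a single coefficient with $M = 3$ support points; the function name and the probability vectors are illustrative assumptions, not taken from the paper.

```python
import math

def shannon_entropy(p):
    """Shannon entropy -sum p*log(p), with the convention 0*log(0) = 0."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

M = 3
uniform = [1.0 / M] * M      # no information: equal weight on each support point
skewed = [0.7, 0.2, 0.1]     # hypothetical non-uniform probabilities

h_uniform = shannon_entropy(uniform)   # equals log(M)
h_skewed = shannon_entropy(skewed)     # strictly smaller than log(M)
```

Any departure from equal weights lowers the entropy, which is why the estimator only moves away from the uniform distribution when the data constraints require it.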

Parameter Estimation.
To fix ideas, let us consider the autoregressive (AR) model. Let $y_t$, $t = 1, \ldots, T$, be observed data which are expressed as a linear combination of past observations $y_{t-k}$. In the AR($K$) model, we have

$$y_t = \sum_{k=1}^{K} \beta_k y_{t-k} + \varepsilon_t, \quad (2)$$

where $K$ and $\beta_k$ are the autoregressive order and the autoregressive coefficients, respectively, and $\varepsilon_t$ is independent and identically distributed white noise that is not assumed to follow any particular distribution and is assumed to be uncorrelated with $y_{t-k}$.
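To use this concept as an estimator of the regression model, we generalize maximum entropy to the inverse problem in the regression framework. Rather than searching for the point estimates $\beta$ directly, we view these unknown parameters as expectations of random variables with $M$ support values:

$$\beta_k = \sum_{m=1}^{M} p_{km} z_{km}, \quad (3)$$

where $p_{km}$ are the $M$-dimensional estimated probabilities defined on the support vector $z_{km}$. Similarly, $\varepsilon_t$ is constructed as the mean value of some random variable $v_t$. Each $\varepsilon_t$ is assumed to be a finite, discrete random variable with $M$ support values, $v_t = [v_{t1}, \ldots, v_{tM}]$. Let $w_t$ be an $M$-dimensional vector of proper probability weights defined on the set $v_{tm}$ such that

$$\varepsilon_t = \sum_{m=1}^{M} w_{tm} v_{tm}. \quad (4)$$

Using the reparameterized unknowns $\beta_k$ and $\varepsilon_t$, we can rewrite equation (2) as

$$y_t = \sum_{k=1}^{K} \sum_{m=1}^{M} p_{km} z_{km} y_{t-k} + \sum_{m=1}^{M} w_{tm} v_{tm}, \quad (5)$$

where the support vectors $z_{km}$ and $v_{tm}$ are convex sets that are symmetric around zero. Then, we can construct our Generalized Maximum Entropy (GME) estimator as

$$\max_{P, W} H(P, W) = -\sum_{k=1}^{K} \sum_{m=1}^{M} p_{km} \log p_{km} - \sum_{t=1}^{T} \sum_{m=1}^{M} w_{tm} \log w_{tm} \quad (6)$$

subject to

$$y_t = \sum_{k=1}^{K} \sum_{m=1}^{M} p_{km} z_{km} y_{t-k} + \sum_{m=1}^{M} w_{tm} v_{tm}, \quad (7)$$

$$\sum_{m=1}^{M} p_{km} = 1, \qquad \sum_{m=1}^{M} w_{tm} = 1, \quad (8)$$

where $p_{km}$ and $w_{tm}$ lie in the interval $[0, 1]$. This optimization problem can be solved using the Lagrangian method, which takes the following form:

$$L = H(P, W) + \sum_{t=1}^{T} \lambda_{1t} \Big( y_t - \sum_{k=1}^{K} \sum_{m=1}^{M} p_{km} z_{km} y_{t-k} - \sum_{m=1}^{M} w_{tm} v_{tm} \Big) + \sum_{k=1}^{K} \lambda_{2k} \Big( 1 - \sum_{m=1}^{M} p_{km} \Big) + \sum_{t=1}^{T} \lambda_{3t} \Big( 1 - \sum_{m=1}^{M} w_{tm} \Big), \quad (9)$$

where the unknown multipliers $\lambda_1$, $\lambda_2$, $\lambda_3$ are $T \times 1$, $K \times 1$, and $T \times 1$ vectors of Lagrangian multipliers. Thus, the resulting first-order conditions are

$$\frac{\partial L}{\partial p_{km}} = -\log p_{km} - 1 - z_{km} \sum_{t=1}^{T} \lambda_{1t} y_{t-k} - \lambda_{2k} = 0, \qquad \frac{\partial L}{\partial w_{tm}} = -\log w_{tm} - 1 - \lambda_{1t} v_{tm} - \lambda_{3t} = 0. \quad (10)$$

Thus, we have

$$p_{km} = \exp\Big( -1 - \lambda_{2k} - z_{km} \sum_{t=1}^{T} \lambda_{1t} y_{t-k} \Big), \quad (11)$$

$$w_{tm} = \exp\big( -1 - \lambda_{3t} - \lambda_{1t} v_{tm} \big). \quad (12)$$

Since the constraints require that $\sum_{m=1}^{M} p_{km} = 1$ and $\sum_{m=1}^{M} w_{tm} = 1$, we sum both $p_{km}$ and $w_{tm}$ over $m$, and we have

$$\hat{p}_{km} = \frac{\exp\big( -z_{km} \sum_{t} \hat{\lambda}_{1t} y_{t-k} \big)}{\sum_{m'=1}^{M} \exp\big( -z_{km'} \sum_{t} \hat{\lambda}_{1t} y_{t-k} \big)}, \quad (13)$$

$$\hat{w}_{tm} = \frac{\exp\big( -\hat{\lambda}_{1t} v_{tm} \big)}{\sum_{m'=1}^{M} \exp\big( -\hat{\lambda}_{1t} v_{tm'} \big)}, \quad (14)$$

where $\hat{\lambda}$ is the estimated multiplier. To estimate the Lagrangian multipliers, we substitute the candidate probabilities $p_{km}$ and $w_{tm}$ into the original Lagrangian (equation (9)). Then, the estimated $\hat{\lambda}$ is obtained by differentiating the concentrated Lagrangian (conditional on the candidates $p_{km}$ and $w_{tm}$) with respect to $\lambda$ and setting the derivative equal to zero. These estimated multipliers are then substituted into equations (13)-(14).
The factors $\exp(-1 - \lambda_{2k})$ and $\exp(-1 - \lambda_{3t})$ are constants for a given parameter, so the optimal probabilities in equations (11) and (12) can be rewritten as

$$p_{km} \propto \exp\Big( -z_{km} \sum_{t=1}^{T} \lambda_{1t} y_{t-k} \Big), \qquad w_{tm} \propto \exp\big( -\lambda_{1t} v_{tm} \big). \quad (15)$$

In summary, we maximize the joint-entropy objective in equation (6) subject to the regression constraint in equation (7) and the adding-up restrictions in equation (8). The solution to this maximization problem is unique; it is found by forming the Lagrangian and solving the first-order conditions to obtain the optimal solutions $\hat{p}_{km}$, $\hat{w}_{tm}$, and $\hat{\lambda}_1$ (estimated values). Then, these estimated probabilities are used to derive the point estimates of the regression coefficients and error terms; see equations (3) and (4).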

Example 1.
To better understand the entropy estimation, we give a simple example. Let us consider the first-order linear autoregressive AR(1) model defined by

$$y_t = \beta_0 + \beta_1 y_{t-1} + \varepsilon_t. \quad (16)$$

The true values of $\beta_0$ and $\beta_1$ are set to 1 and −1, respectively. The error is drawn randomly from a normal distribution with mean 0 and variance 1. We generate data from equation (16) using sample size $T = 10$. The support spaces of $\beta_0$ and $\beta_1$ are defined as $\{-5, 0, 5\}$. The simulated data are shown in Figure 1, and the results are reported in Tables 1 and 2.
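To estimate equation (16), we need to construct our GME problem as

$$\max_{P, W} H(P, W) = -\sum_{k=0}^{1} \sum_{m=1}^{M} p_{km} \log p_{km} - \sum_{t=1}^{T} \sum_{m=1}^{M} w_{tm} \log w_{tm} \quad (17)$$

subject to

$$y_t = \sum_{m=1}^{M} p_{0m} z_{0m} + \Big( \sum_{m=1}^{M} p_{1m} z_{1m} \Big) y_{t-1} + \sum_{m=1}^{M} w_{tm} v_{tm}, \qquad \sum_{m=1}^{M} p_{km} = 1, \qquad \sum_{m=1}^{M} w_{tm} = 1,$$

where $p_{km}$ and $w_{tm}$ are the parameters to be estimated in this problem; see equations (3)-(4).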
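A GME problem of this kind can be solved numerically in several ways. The sketch below is one possible route, not necessarily the one used to produce the paper's tables: it minimizes the unconstrained dual of the problem over the multipliers of the data constraint by plain gradient descent, using the softmax form of the optimal probabilities. The data series, the error support (−3, 0, 3) (which assumes $\sigma = 1$), the function names, and the iteration count are all illustrative assumptions.

```python
import math

def softmax_neg(support, a):
    """Probabilities proportional to exp(-a * z) on the support points z."""
    scores = [-a * z for z in support]
    top = max(scores)  # shift by the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [wgt / total for wgt in weights]

def mean_on(support, probs):
    """Expected value of the support under the given probabilities."""
    return sum(z * p for z, p in zip(support, probs))

def gme_ar1(y, z=(-5.0, 0.0, 5.0), v=(-3.0, 0.0, 3.0), iters=5000):
    """GME fit of y_t = b0 + b1*y_{t-1} + e_t by gradient descent on the dual."""
    X = [(1.0, y[t - 1]) for t in range(1, len(y))]  # intercept and lagged value
    Y = list(y[1:])
    T, K = len(Y), 2
    # Conservative upper bound on the dual's curvature; 1/bound is a safe step.
    bound = max(s * s for s in v)
    for k in range(K):
        bound += max(s * s for s in z) * sum(x[k] ** 2 for x in X)

    def evaluate(lam):
        a = [sum(lam[t] * X[t][k] for t in range(T)) for k in range(K)]
        p = [softmax_neg(z, a[k]) for k in range(K)]      # coefficient probabilities
        w = [softmax_neg(v, lam[t]) for t in range(T)]    # error probabilities
        beta = [mean_on(z, pk) for pk in p]               # point estimates of b0, b1
        eps = [mean_on(v, wt) for wt in w]                # point estimates of errors
        resid = [Y[t] - beta[0] * X[t][0] - beta[1] * X[t][1] - eps[t]
                 for t in range(T)]                       # data-constraint violation
        return p, w, beta, resid

    lam = [0.0] * T
    for _ in range(iters):
        resid = evaluate(lam)[3]  # the dual gradient is the constraint violation
        lam = [lam[t] - resid[t] / bound for t in range(T)]
    return evaluate(lam)

# Hypothetical series, roughly following y_t = 1 - y_{t-1} + noise.
y = [0.5, 0.9, -0.2, 1.6, -0.3, 0.8, 0.6, 0.1, 1.2, -0.5, 1.9]
p, w, beta, resid = gme_ar1(y)
```

At convergence the data constraint holds and the coefficient estimates are the probability-weighted means of the supports. Because maximum entropy pulls the probabilities toward uniform, the estimates are shrunk toward the center of the support, so the choice of support bounds matters in small samples.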

The Entropy-Based Belief Function.
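Following the three basic principles of the likelihood-based belief function (the likelihood principle, compatibility with Bayesian inference, and the least commitment principle), which are justified by Denoeux [8], we modify this approach by using an entropy function as the substitute for the likelihood function. Let $P \in \Theta$ be the parameter of interest. We propose an approach in which a belief function $Bel^{\Theta}_y$ on $\Theta$ is built from the entropy function. In this case, the entropy ratio is interpreted as a "relative plausibility," and the contour function can thus be readily calculated as

$$pl_y(P) = \frac{H(P)}{\sup_{P' \in \Theta} H(P')}, \quad (18)$$

where $P = (p_{km}, w_{tm})$, in which $p_{km}$ and $w_{tm}$ are proper probability mass functions on the interval $[0, 1]$, and $H(P)$ is subject to

$$y_t = \sum_{k=1}^{K} \sum_{m=1}^{M} p_{km} z_{km} y_{t-k} + \sum_{m=1}^{M} w_{tm} v_{tm}, \quad (19)$$

$$\sum_{m=1}^{M} p_{km} = 1, \quad (20)$$

$$\sum_{m=1}^{M} w_{tm} = 1. \quad (21)$$

Meanwhile, $H(P)$ is defined as

$$H(P) = -\sum_{k=1}^{K} \sum_{m=1}^{M} p_{km} \log p_{km} - \sum_{t=1}^{T} \sum_{m=1}^{M} w_{tm} \log w_{tm}, \quad (22)$$

and the marginal contour functions on each individual parameter are

$$pl_y(p_{km}) = \sup_{P_{-km}} pl_y(p_{km}, P_{-km}), \quad (23)$$

where $P_{-km}$ denotes all components of $P$ other than $p_{km}$. Subject to the additional constraints in equations (20)-(21), this belief function is called the entropy-based belief function on $\Theta$ induced by $y$.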
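The corresponding plausibility function can be computed from $pl_y(P)$ as

$$Pl^{\Theta}_y(C) = \sup_{P \in C} pl_y(P) \quad (24)$$

for all $C \subseteq \Theta$. The focal sets of $Bel^{\Theta}_y$ are the level sets of $pl_y(P)$, defined as

$$\Gamma_y(\omega) = \{ P \in \Theta \mid pl_y(P) \geq \omega \}, \quad (25)$$

where $\omega$ is uniformly distributed in $[0, 1]$. This belief function $Bel^{\Theta}_y$ is equivalent to the random set induced by the Lebesgue measure $\lambda$ on $[0, 1]$ and the multivalued mapping $\omega \mapsto \Gamma_y(\omega)$.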
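To make the contour function and its level sets concrete, here is a minimal sketch on a finite candidate set. It assumes that the listed probability vectors are the only candidates satisfying the data constraints, so the supremum reduces to a maximum over the list; the candidate values and function names are hypothetical.

```python
import math

def entropy(p):
    """Shannon entropy of a probability vector (0*log(0) treated as 0)."""
    return -sum(x * math.log(x) for x in p if x > 0)

def contour(candidates):
    """Relative plausibility pl(P) = H(P) / max H over the candidate set."""
    h = [entropy(p) for p in candidates]
    h_max = max(h)
    return [x / h_max for x in h]

def level_set(candidates, pl, omega):
    """Focal set: all candidates whose plausibility is at least omega."""
    return [p for p, value in zip(candidates, pl) if value >= omega]

# Hypothetical feasible probability vectors for one coefficient (M = 3).
candidates = [(0.2, 0.5, 0.3), (0.3, 0.4, 0.3), (0.6, 0.3, 0.1), (0.8, 0.1, 0.1)]
pl = contour(candidates)
focal_high = level_set(candidates, pl, 0.9)   # small set of highly plausible values
focal_low = level_set(candidates, pl, 0.5)    # larger, nested set
```

Drawing $\omega$ uniformly on $[0, 1]$ and taking the corresponding level set produces the nested random sets that define the belief function.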

The Basic Method for Predictive Belief Function.
To forecast future data $Y_f = (Y_{T+1}, \ldots, Y_{T+h})$, the sampling model used by Dempster [12] is introduced here. In this model, the forecast data $Y_{T+1}$ is expressed as a function of the proper probability mass function $P$, which is obtained from the past observed data $y = (y_1, \ldots, y_T)$, and an unobserved auxiliary variable $S \in \mathbb{S}$ with known probability measure $\mu$ not depending on $P$, where $\mathbb{S}$ is the sample space of $s$:

$$Y_f = \varphi(P, S), \quad (26)$$

where $\varphi$ is defined in such a way that the distribution of $Y_f$ is conditional on fixed $P$. When $Y_f$ is a continuous random variable, equation (26) can be computed by

$$Y_f = G^{-1}_{y,P}(s), \quad (27)$$

where $G^{-1}_{y,P}$ is the inverse conditional cumulative distribution function (cdf) of $Y_f \mid y$ and $s$ is uniformly distributed in $[0, 1]$. We note that, in the likelihood-based belief function case, $G^{-1}_{y,P}$ becomes $G^{-1}_{y,\beta}$, which is the inverse of some parametric conditional cdf, such as the normal or Student-t. However, in the entropy-based belief function case, we do not need any distributional assumption on $G^{-1}_{y,P}$. Let us compose $\Gamma_y$ with $\varphi$:

$$\Gamma'_y(\omega, s) = \varphi(\Gamma_y(\omega), s). \quad (28)$$

The predictive belief function $Bel_y$ and plausibility $Pl_y$ are then induced by the multivalued mapping $\Gamma'_y$ and $\lambda \otimes \mu$ on $[0, 1] \times \mathbb{S}$ as follows:

$$Bel_y(C) = (\lambda \otimes \mu)\big( \{ (\omega, s) \mid \varphi(\Gamma_y(\omega), s) \subseteq C \} \big), \qquad Pl_y(C) = (\lambda \otimes \mu)\big( \{ (\omega, s) \mid \varphi(\Gamma_y(\omega), s) \cap C \neq \emptyset \} \big) \quad (29)$$

for all $C \subseteq \mathbb{R}$.
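The distribution-free form of the sampling model can be illustrated with an empirical quantile function. The sketch below assumes an AR(1) mean and uses the inverse empirical cdf of a set of residuals in place of a parametric inverse cdf; the residual values, coefficient values, and function names are hypothetical.

```python
import math

def empirical_quantile(sample, s):
    """Inverse empirical cdf: smallest x with F_n(x) >= s, for 0 < s <= 1."""
    xs = sorted(sample)
    idx = max(math.ceil(s * len(xs)) - 1, 0)
    return xs[idx]

def phi(beta0, beta1, y_last, residuals, s):
    """Sampling model: forecast = fitted mean + s-quantile of the residuals."""
    return beta0 + beta1 * y_last + empirical_quantile(residuals, s)

# Hypothetical estimated residuals and coefficients.
residuals = [-0.4, 0.1, 0.3, -0.2, 0.2]
median_forecast = phi(1.0, -1.0, 0.5, residuals, 0.5)
upper_forecast = phi(1.0, -1.0, 0.5, residuals, 1.0)
```

Feeding uniformly distributed values of `s` through `phi` reproduces the empirical error distribution around the fitted mean, with no parametric family imposed.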

Simulation and Application: Entropy-Based Belief Function
Let us now turn our attention to examining the performance of the entropy-based belief function. In this section, the inference and forecasting methodology outlined in the previous section will be applied to the simulated data and real data. We apply the entropy-based belief function to the forecasting AR and regression models which will be presented in Sections 3.1 and 3.2, respectively.

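The Entropy-Based Belief Function in Autoregressive Model Problem

Prediction.
As we have seen in Section 2, the estimation problem is to make statements about some probability after observing some data. To describe the prediction method for the AR model, let us consider the case $K = 1$; thus, the first-order linear autoregressive AR(1) model is defined by

$$Y_{T+1} = \beta_0 + \beta_1 y_T + \varepsilon_{T+1}, \quad (30)$$

where $Y_{T+1}$ is the value at time $T + 1$, which is not yet observed, and $y_T$ is the observed value at time $T$. The main idea is to express $Y_{T+1}$ as a function of $\beta$, which is viewed as an expectation over $M$ support points, and some pivotal variable $S$ whose distribution does not depend on $\beta$. We can write equivalently

$$Y_{T+1} = \beta_0 + \beta_1 y_T + F^{-1}(S), \quad (31)$$

where $F^{-1}(\cdot)$ is the quantile function of the empirical distribution based on the estimated residuals $\hat{\varepsilon}_1, \ldots, \hat{\varepsilon}_T$. Then, in the case of the entropy approach, we rewrite equation (30) as

$$Y_{T+1} = \sum_{m=1}^{M} p_{0m} z_{0m} + \Big( \sum_{m=1}^{M} p_{1m} z_{1m} \Big) y_T + F^{-1}(S),$$

where $M$ is the number of support points of $z$ and $v$. Consequently, in the entropy estimation, we rewrite the focal set equation (31) as

$$\varphi_y(\Gamma_y(\omega), s) = \Big\{ \sum_{m=1}^{M} p_{0m} z_{0m} + \Big( \sum_{m=1}^{M} p_{1m} z_{1m} \Big) y_T + F^{-1}(s) \;\Big|\; pl_y(P) \geq \omega \Big\}.$$

To forecast $Y_{T+1} \mid y_T$, the predictive belief function and plausibility function can then be approximated using Monte Carlo simulation with the following procedure (Kanjanatarakul et al. [3] and Kanjanatarakul et al. [7]):

(1) Draw $N$ pairs $(s_i, \omega_i)$ independently from the uniform distribution on $[0, 1]^2$ and compute the focal sets $[\underline{y}_i(s_i, \omega_i), \overline{y}_i(s_i, \omega_i)]$, where $\underline{y}_i(s_i, \omega_i)$ and $\overline{y}_i(s_i, \omega_i)$ are the lower and upper bounds. These bounds can be computed using a constrained nonlinear optimization algorithm:

$$\underline{y}_i = \min_{P} \varphi_y(P, s_i), \qquad \overline{y}_i = \max_{P} \varphi_y(P, s_i), \qquad \text{subject to } pl_y(P) \geq \omega_i.$$

(2) Approximate the predictive belief ($Bel_y$) and plausibility ($Pl_y$) functions on $Y_f$ as

$$Bel_y(C) \approx \frac{1}{N} \#\{ i \mid [\underline{y}_i, \overline{y}_i] \subseteq C \}, \qquad Pl_y(C) \approx \frac{1}{N} \#\{ i \mid [\underline{y}_i, \overline{y}_i] \cap C \neq \emptyset \}$$

for all $C \subseteq \mathbb{R}$. These lower and upper cdfs of the predictive belief function are approximated using $N = 5000$ randomly generated focal sets (Algorithm 1).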
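The counting step of the Monte Carlo procedure can be sketched as follows. To keep the example self-contained, the constrained optimization that yields each focal interval is replaced by a toy stand-in: the level set of the fitted mean is assumed to be an interval whose half-width shrinks linearly as $\omega$ grows. That stand-in, the numerical inputs, and the function names are assumptions for illustration only.

```python
import random

def predictive_intervals(m_hat, radius, residuals, n, seed=0):
    """Draw n focal intervals [lower, upper] for the one-step-ahead forecast."""
    rng = random.Random(seed)
    xs = sorted(residuals)
    out = []
    for _ in range(n):
        s, omega = rng.random(), rng.random()
        e = xs[min(int(s * len(xs)), len(xs) - 1)]   # empirical quantile of the error
        half = radius * (1.0 - omega)                 # toy level set {pl >= omega}
        out.append((m_hat - half + e, m_hat + half + e))
    return out

def bel(intervals, a, b):
    """Share of focal intervals entirely contained in [a, b]."""
    return sum(1 for lo, hi in intervals if a <= lo and hi <= b) / len(intervals)

def pl(intervals, a, b):
    """Share of focal intervals that intersect [a, b]."""
    return sum(1 for lo, hi in intervals if lo <= b and hi >= a) / len(intervals)

# Hypothetical inputs: fitted mean 0.6, level-set radius 0.5, five residuals.
intervals = predictive_intervals(0.6, 0.5, [-0.4, 0.1, 0.3, -0.2, 0.2], n=5000)
belief = bel(intervals, 0.0, 1.0)
plausibility = pl(intervals, 0.0, 1.0)
```

Evaluating `bel` and `pl` on half-lines $(-\infty, y]$ gives the lower and upper cdfs; the gap between them reflects estimation uncertainty and narrows as the level sets shrink.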

Experiment Study.
In this section, we conduct simulation and experiment studies to investigate the finite sample performance of the proposed Shannon entropy-based belief function. The simulated data are constructed from the AR model using different error distributions: normal, Student-t, skewed Student-t, and uniform. We note that the normal and Student-t are selected as symmetric distributions, while the skewed Student-t and uniform are typical examples of a highly skewed distribution and a flat distribution, respectively. In these experiments, we consider three competing likelihood functions: normal, Student-t, and skewed Student-t likelihoods.
To compare these likelihood-based belief functions, the one-step-ahead forecast is computed from simulated data and compared with the true prediction value. We note that, as the function $pl(y) = Pl(\{y\})$ has a unique maximum $\hat{y}$, it may be taken as a point prediction of the AR(1) model under the belief function approach. That is to say, $\hat{y}$ can be viewed as the most plausible value of $Y$ given the past information. We also need a criterion to gauge the performance of the method. Here, the mean square error (MSE) is employed, defined as

$$MSE = \frac{1}{R} \sum_{r=1}^{R} \big( Y_{true,r} - Y_{f,r} \big)^2,$$

where $R = 20$ is the number of repetitions, $Y_{true}$ is the true value, and $Y_f = \hat{y}$ is the predicted value.
In this Monte Carlo simulation, we consider sample sizes $T = 20$ and $T = 40$, and $R = 20$ samples are generated for each case. To make a fair comparison, the error terms are generated from each of the four distributions listed above (normal, Student-t, skewed Student-t, and uniform). Then, the sampling experiments are based on an AR(1) model. In this experiment, the first 20 observations are used to predict the 21st observation. Likewise, the first 40 observations are used to predict the 41st observation. The most plausible predicted values of the 21st and 41st observations are then computed. In the case of the Shannon entropy, the support $z_{1m}$ is initially set to (−5, 0, 5) and the supports for $v_{tm}$ to $(-3\sigma, 0, 3\sigma)$, where $\sigma$ is computed from the conventional AR model. Table 3 reports the prediction MSE of the considered estimators. When the errors are generated from a uniform distribution, we find that the error of the Shannon entropy-based belief function is smaller than that of the other methods for the one-step-ahead forecast. In addition, the performance of this method improves when the sample size increases from 20 to 40.
Next, we investigate the performance of the proposed estimator when the errors are generated from normal, Student-t, and skewed Student-t distributions and compare it with the parametric likelihood-based belief function. We observe that the overall result is different in Table 3. We find that the Shannon entropy-based belief function outperforms the likelihood-based belief function in terms of the lowest prediction MSE when the $U(-2, 2)$ error distribution is assumed. When the errors are generated from the other parametric distributions, we find that the likelihood-based belief function with the correct distribution function always has the best performance. However, we notice that our Shannon entropy-based belief function gives the second-best prediction over the wide range of error distributions. Thus, in this simulation study, we can conclude that our method performs well, particularly when the error distribution is unknown or uniform. Nonetheless, when the error is generated from a known distribution, the parametric likelihood-based belief function outperforms the entropy-based one.
This means that if the distribution is known and we estimate the model based on the true distribution, an accurate result is obtained. However, the error distribution is often unknown in practice, and these results confirm the advantage of the Shannon entropy-based belief function, which performs uniformly well over a wide range of error distributions. It is also of interest to consider other entropy measures such as Renyi entropy [13] and Tsallis entropy [14]; we thus introduce these two entropy-based belief functions to investigate the performance of various entropy measures. We note that Renyi entropy is a generalization of Shannon entropy, while Tsallis entropy is nonlogarithmic and is obtained through the joint generalization of the averaging procedures and the concept of information gain. These two entropies are indexed by an order $d$. The Renyi entropy is formulated as

$$H^{R}_{d}(P) = \frac{1}{1 - d} \sum_{k=1}^{K} \log \sum_{m=1}^{M} p_{km}^{d},$$

while Tsallis entropy can be defined as

$$H^{T}_{d}(P) = c \sum_{k=1}^{K} \frac{1 - \sum_{m=1}^{M} p_{km}^{d}}{d - 1},$$

where $c$ is a positive constant that depends on the particular units used. For simplicity, we set $c = 1$ and fix the order $d$ (see [15]). Note that both Renyi and Tsallis entropies recover the Shannon entropy as a special case in the limit $d \to 1$. To construct the Renyi entropy-based belief function and the Tsallis entropy-based belief function, we replace Shannon entropy in the relative plausibility (equation (18)) by Renyi and Tsallis entropy, respectively. The results of these three entropies are shown for comparison in the last three columns of Table 1, and it is evident that the Shannon entropy-based belief function still gives the best prediction performance on these simulated data. However, the performances of the three entropies do not differ much in terms of MSE.

Algorithm 1: Monte Carlo algorithm for approximating a predictive entropy-based belief function (random interval case).
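The relation between the three entropy measures can be verified numerically. This sketch evaluates the standard single-distribution forms of the Renyi and Tsallis entropies with $c = 1$; the probability vector and the orders tried are illustrative choices, not values from the paper.

```python
import math

def shannon(p):
    """Shannon entropy of a probability vector."""
    return -sum(x * math.log(x) for x in p if x > 0)

def renyi(p, d):
    """Renyi entropy of order d (d > 0, d != 1)."""
    return math.log(sum(x ** d for x in p if x > 0)) / (1.0 - d)

def tsallis(p, d, c=1.0):
    """Tsallis entropy of order d (d != 1) with unit constant c."""
    return c * (1.0 - sum(x ** d for x in p if x > 0)) / (d - 1.0)

p = [0.5, 0.3, 0.2]               # hypothetical probabilities
h_shannon = shannon(p)
h_renyi_near_1 = renyi(p, 1.0001)     # close to the Shannon value
h_tsallis_near_1 = tsallis(p, 1.0001) # close to the Shannon value
h_renyi_2 = renyi(p, 2.0)             # visibly different at order 2
```

Both generalized entropies approach the Shannon value as the order approaches 1 and move away from it at other orders, which is consistent with the three belief functions giving similar but not identical predictions.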
Example 2. Application to predicting GDP growth by the AR model.
In this example, real data are used to investigate the performance of our method. The dataset, derived from Thomson Reuters Datastream and the World Bank database, consists of yearly Gross Domestic Product (GDP) data from the end of 1995 to 2018. The data series is transformed into the growth rate and plotted in Figure 2. Then, we perform the Augmented Dickey-Fuller (ADF) unit root test on GDP growth and determine the lag order of the AR model according to the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC). The ADF statistic is −4.337, which is more negative than the critical value at the 10% significance level, so the unit-root null hypothesis is rejected and GDP growth is treated as stationary. The lag selection for the AR model is shown in Table 4. The result shows that the AR model with lag 1 provides the lowest AIC and BIC.
To evaluate the prediction performance, the forecast evaluation period is from 2014 to 2018. We perform recursive forecasting to update GDP growth year by year, for a total of 5 years, which gives us data points to evaluate the out-of-sample forecast performance. For this evaluation period, we compute the root mean square forecast error (RMSE) and mean absolute error (MAE) of the forecasts at each forecast horizon (from one to five years ahead). We compute the RMSE and MAE as follows:

$$RMSE = \sqrt{\frac{1}{T} \sum_{t=1}^{T} e_t^2}, \qquad MAE = \frac{1}{T} \sum_{t=1}^{T} |e_t|,$$

where $T = 5$ and $e = Y_{true} - Y_f$. The out-of-sample forecast comparison of the AR(1) model under the likelihood-based approach, the entropy-based approach, and the conventional frequentist method is reported in Table 5.
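For reference, the two accuracy measures can be computed as in the following minimal sketch. The numbers are made up for illustration and are not the GDP growth data or forecasts from Table 5.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error of the forecasts."""
    errors = [a - b for a, b in zip(y_true, y_pred)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def mae(y_true, y_pred):
    """Mean absolute error of the forecasts."""
    errors = [a - b for a, b in zip(y_true, y_pred)]
    return sum(abs(e) for e in errors) / len(errors)

# Hypothetical true values and forecasts over a five-period evaluation window.
y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [1.0, 2.0, 3.0, 4.0, 7.0]
rmse_value = rmse(y_true, y_pred)
mae_value = mae(y_true, y_pred)
```

Because RMSE squares the errors, a single large miss raises it more than it raises MAE, so reporting both helps distinguish one bad forecast from uniformly mediocre ones.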
The out-of-sample forecasts made for the 2014-2018 period are reflected by the most plausible prediction values and their large upper-lower prediction intervals. The true prediction values are shown in the first column. The estimated conditional expectation as well as the frequentist 95% prediction intervals are shown in column 2. Columns 3-7 show the most plausible predictions and quantile intervals for the normal, Student-t, skewed Student-t, and the three entropy-based belief functions, respectively.
We can now consider the performance of different methods. First, we evaluate the entropy-based method regarding the prediction value. Second, we compare the performance regarding the range of the prediction intervals.
In the first evaluation, the smaller the MAE and RMSE, the better the corresponding model's performance. According to Table 5, all three entropy-based belief functions outperform the likelihood-based methods. We can observe that the RMSE and MAE values of the entropy-based belief functions are slightly smaller than those of the likelihood-based belief functions. Again, the Shannon entropy provides lower RMSE and MAE than the other two entropies. We expect that if the real data follow some distribution and the assumption of the likelihood-based belief function is not correctly specified, a less accurate prediction is obtained. In addition, a small sample size may affect the accuracy of the estimated parameters. Hence, in this study, we conclude that the entropy-based belief function for AR(1) performs well in the out-of-sample forecasts relative to the competing methods.
In the second evaluation, concerning the prediction intervals for Thai GDP growth, the methods suggest reliable prediction intervals around the predicted value. We can observe that all prediction interval methods for the AR(1) model are reliable, as the true values lie within the intervals, except for the 95% frequentist approach. The 95% frequentist prediction interval for Thai GDP growth in 2014 does not cover the true value. This result reveals that the conventional frequentist approach may not provide an appropriate prediction interval.
Regarding the prediction intervals, we calculate the average width of the prediction intervals for comparison. The results show that the methods differ. The frequentist method has a very narrow mean prediction interval, while our entropy-based method has the widest. We suspect that this is because our approach relies on the empirical distribution of the errors, which leads to a larger simulated error in the Monte Carlo algorithm for approximating the predictive entropy-based belief function. However, this experiment shows that our method produces an acceptable range for prediction intervals, as the true values always lie within the intervals.
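The two interval criteria used in this comparison, average width and coverage of the true value, can be computed as in the sketch below. The intervals and true values are hypothetical and do not reproduce Table 5.

```python
def mean_width(intervals):
    """Average width of a list of (lower, upper) prediction intervals."""
    return sum(hi - lo for lo, hi in intervals) / len(intervals)

def coverage(intervals, truths):
    """Share of true values that fall inside their prediction interval."""
    hits = sum(1 for (lo, hi), y in zip(intervals, truths) if lo <= y <= hi)
    return hits / len(intervals)

# Hypothetical intervals from a narrow method and a wide method.
truths = [1.0, 6.0, 2.5]
narrow = [(0.5, 1.5), (4.0, 5.0), (2.0, 3.0)]
wide = [(0.0, 2.0), (3.0, 7.0), (1.0, 4.0)]
```

A narrow interval is only useful if it still contains the true value, so width and coverage should be read together, as the comparison in the text does.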
We now use the entropy-based belief function method to predict Thai economic growth in the future. The prediction is very important because of its relevance to the current policy debate on whether Thai GDP growth will go up or down. Furthermore, to show the accuracy of our estimation in this study, we show the contour plots of plausibility $pl_y(\theta)$ in two-dimensional parameter subspaces for each method in the Appendix.
These contour plots show two-dimensional slices of the contour function, where one of the three parameters is fixed at its maximum normal likelihood value, two of the four parameters are fixed at their maximum Student-t likelihood values, three of the five parameters are fixed at their maximum skewed Student-t likelihood values, and $((z + v) \times m) - 2$ of the $((z + v) \times m)$ probability parameters are fixed at their maximum entropy values. The contour plots show that the estimated parameters of each method reach the maximum plausibility, as marked by the red cross. For our proposed entropy-based belief functions, the process can be summarized graphically by plotting the optimization on a contour plot of the plausibility $pl_y(P)$. We can observe that, for example, the estimated probabilities $p_{11}$ and $p_{12}$ reach the maximum plausibility $pl_y(P)$ at the red cross.

Regression Problem.
Consider the simple linear regression model:

$$y_t = \beta_0 + \sum_{k=1}^{K} \beta_k x_{kt} + \varepsilon_t,$$

where $y_t$ is the observed dependent variable at time $t$, $x_{kt}$ is independent variable $k$ at time $t$, and $\varepsilon_t$ is the error, which is not assumed to follow any distribution when the entropy-based approach is used.

Prediction with Certain Inputs.
Similar to the AR model, the prediction with linear regression can be computed using the method described in Section 3.1.2. Let $Y_{T+1}$ be the future value, which cannot be observed at time $T + 1$, and $x_{kT}$ be the observed inputs or covariates at time $T$:

$$Y_{T+1} = \beta_0 + \sum_{k=1}^{K} \beta_k x_{kT} + \varepsilon_{T+1}, \quad (45)$$

where $Y_{T+1}$ is a predicted value which is not yet observed. We can write equivalently,

$$Y_{T+1} = \beta_0 + \sum_{k=1}^{K} \beta_k x_{kT} + F^{-1}(S). \quad (46)$$

Under the entropy approach, we rewrite equations (45)-(46) as

$$Y_{T+1} = \sum_{m=1}^{M} p_{0m} z_{0m} + \sum_{k=1}^{K} \Big( \sum_{m=1}^{M} p_{km} z_{km} \Big) x_{kT} + F^{-1}(S). \quad (47)$$

To forecast the future observation conditional on all information in the observed inputs at time $T$, $Y_{T+1} \mid x_{kT}$, the predictive belief function and plausibility function can be approximated using Monte Carlo simulation as described in Section 3.1.2. We then compute the minimum and maximum of the linear regression function $\varphi_y(P, s)$ subject to $pl_y(P) \geq \omega_i$. The additional constraints are

$$\sum_{m=1}^{M} p_{km} = 1, \qquad \sum_{m=1}^{M} w_{tm} = 1. \quad (48)$$

Example 3.
As an example, we illustrate the proposed method by applying it to data from the operation of a plant for the oxidation of ammonia to nitric acid, measured on 21 consecutive days. The data were analyzed by Brownlee [16] in a simple regression setting and are available in the R software. The dataset consists of 4 variables, namely, air flow to the plant (AIR), cooling water inlet temperature (WATER), acid concentration (percentage minus 50, times 10) (ACID), and ammonia lost (percentage times 10) (LOSS). We consider LOSS as the dependent variable and AIR, WATER, and ACID as independent variables. Thus, we consider the following linear regression model:

$$LOSS_t = \beta_0 + \beta_1 AIR_t + \beta_2 WATER_t + \beta_3 ACID_t + \varepsilon_t.$$

Using maximum likelihood (ML) and GME estimation, we obtain the estimated parameters, standard errors, and p values. Table 6 provides the ML and GME estimates together with some basic statistics (standard errors and p values) and the plausibilities $pl(\beta_k = 0)$ (see [7]). However, in the case of GME estimation, the computation of the plausibilities $pl(\beta_k = 0)$ is different.
Recall that, in the entropy approach, the estimated parameter of the model is not obtained directly from the estimation; it is computed as the probability-weighted sum of the support points. Thus, it is reasonable to consider $pl(\sum_{m=1}^{M} p_{km} z_{km} = 0)$. We know that if $p_{k1} = p_{k2} = \cdots = p_{kM} = 1/M$, then $\sum_{m=1}^{M} p_{km} z_{km} = 0$ for a support that is symmetric around zero. Hence, we turn our attention to testing $pl(p_{km} = 1/M)$. In our approach, suppose $M = 3$; then $\beta_k = 0$ whenever $p_{k1} = p_{k3}$. To compute this, we need to maximize the entropy under the usual constraints and the additional constraint $p_{k1} = p_{k3} = 1/3$. Therefore, we can compute $pl(\beta_k = 0) = pl(p_{k1} = p_{k3})$. In this test, we examine whether the restricted $p_{km}$ remain the optimal probabilities of variable $k$ at support $m$. If $pl(p_{k1} = p_{k3}) < pl(\hat{p}_{k1}, \hat{p}_{k2}, \hat{p}_{k3})$, the hypothesis that the coefficient $\beta_k$ equals zero is rejected, indicating that this coefficient is significant. Figure 4 shows the marginal contour function $pl(\beta_3 = 0)$ for the parameter of ACID obtained from the likelihood-based approach; we can observe that $pl(\beta_3 = 0)$ is greater than 0.10, indicating that the acid concentration variable has an insignificant effect on ammonia lost. In Figure 5, the marginal contour functions $pl(\beta_k = 0) = pl(p_{k1} = p_{k3})$ are plotted for the parameters of AIR ($k = 1$), WATER ($k = 2$), and ACID ($k = 3$). This confirms that there is no significant effect of ACID on LOSS.

Example 4.
Continuing Example 3, we now consider the prediction problem. We consider the ammonia lost as the dependent variable and only AIR and WATER as the inputs, since they show a significant effect on the LOSS variable. We have re-estimated the model, and the result is provided in Table 7 and Figure 6. We can observe that similar significance results for the estimated parameters are obtained for all methods.
We also predict LOSS when the inputs are known at time $T$; the prediction equation is based on the re-estimated model, with the error $\varepsilon_T$ assumed to follow the empirical distribution. As before, this equation can be written in an equivalent form.
Note (entropy case): the support $z_{1m}$ is initially set to $(-50, 0, 50)$ and the supports for $v_{tm}$ to $(-3\sigma, 0, 3\sigma)$, where $\sigma$ is computed from the conventional LS estimation.
Note (entropy case): the support $z_{1m}$ is initially set to $(-100, 0, 100)$ and the supports for $v_{tm}$ to $(-3\sigma, 0, 3\sigma)$, where $\sigma$ is computed from the conventional LS estimation.
The predictive belief function on $\mathrm{LOSS}^{*}_{T=21}$ can then be approximated using the methods described in Section 3.2.2. Figure 6 displays the lower and upper cdfs of the various predictive belief functions, approximated using $N = 5000$ randomly generated focal sets. The predicted value $\mathrm{LOSS}^{*}_{T=21}$ is marked by the vertical blue dotted line, together with the $\alpha$-quantile intervals for $\alpha = 0.05$ and $\alpha = 0.95$. We then compare prediction performance in terms of prediction bias. The true value of $\mathrm{LOSS}_{T=21}$ is 15; thus, the prediction bias is $|\mathrm{LOSS}^{*}_{T=21} - 15|$. According to the last row of Table 7, the Shannon entropy-based belief function method outperforms the others, as it attains the lowest bias. Moreover, the prediction interval of the Shannon entropy method is reliable: although the 0.05 quantile interval does not cover the true value, its lower bound is close to the true value of 15. Therefore, to confirm the reliability of the method, we extend the lower quantile of the Shannon entropy-based method to $\alpha = 0.01$ and observe that the true observed value is contained in the resulting interval (green dashed line).
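The construction of lower and upper cdfs from randomly generated focal sets can be sketched as follows. This is a minimal illustration, not the procedure of Section 3.2.2: it assumes each focal set is a closed interval, and the intervals are synthetic (Gaussian centres around 15 with random half-widths), standing in for the focal sets that the constrained optimization would produce. The function names are ours.

```python
import math
import random

def lower_upper_cdf(focal_sets, y):
    """Return (Bel(Y <= y), Pl(Y <= y)) for a list of interval focal sets."""
    n = len(focal_sets)
    # Belief: the whole interval lies at or below y.
    bel = sum(1 for _, hi in focal_sets if hi <= y) / n
    # Plausibility: the interval intersects (-inf, y].
    pl = sum(1 for lo, _ in focal_sets if lo <= y) / n
    return bel, pl

def quantile_interval(focal_sets, alpha):
    """Lower and upper alpha-quantiles: empirical quantiles of the
    left endpoints (upper cdf) and of the right endpoints (lower cdf)."""
    n = len(focal_sets)
    idx = min(n - 1, max(0, math.ceil(alpha * n) - 1))
    lows = sorted(lo for lo, _ in focal_sets)
    highs = sorted(hi for _, hi in focal_sets)
    return lows[idx], highs[idx]

# Synthetic focal sets (illustrative only, not the paper's results).
random.seed(0)
N = 5000
focal_sets = []
for _ in range(N):
    centre = random.gauss(15.0, 3.0)
    half_width = random.uniform(0.5, 2.0)
    focal_sets.append((centre - half_width, centre + half_width))

bel_15, pl_15 = lower_upper_cdf(focal_sets, 15.0)
q05 = quantile_interval(focal_sets, 0.05)
q95 = quantile_interval(focal_sets, 0.95)
print(bel_15, pl_15, q05, q95)
```

Because every interval has its left endpoint below its right endpoint, the upper cdf always lies on or above the lower cdf, and each quantile is itself an interval rather than a single number.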

Conclusion
Forecasting is an important tool for decision-making and strategic planning, and describing the uncertainty of forecasts accurately is thus an important issue. The approach developed in this paper models estimation uncertainty using a belief function. The belief function $Bel_y$ is just another piece of evidence: it can be combined with the likelihood belief function and the joint belief function by Dempster's rule. The combined belief function is then, as before, marginalized on $\varphi_y(p, s)$ to obtain the lower and upper predictive belief functions.
As the distribution function is generally unknown, estimation and prediction carried out in the likelihood-based belief function framework may be biased if the observed data are not normally distributed. Therefore, this study uses the entropy approach to construct a belief function and illustrates the solution in the autoregressive (AR) and regression contexts. Specifically, we replace the parametric likelihood function with an entropy measure (Shannon, Renyi, or Tsallis) to derive the entropy-based belief function; model misspecification is thus avoided, since no distributional assumption is made.
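For reference, the three entropy measures can be computed as in the short Python sketch below. It uses the standard textbook forms for a discrete probability vector, with $d$ denoting the order of the Renyi and Tsallis entropies; the notation and normalization used earlier in the paper may differ, so this should be read as an assumption rather than a restatement of the paper's definitions.

```python
import math

def shannon(p):
    """Shannon entropy: -sum(p * ln p), with 0 * ln 0 taken as 0."""
    return -sum(x * math.log(x) for x in p if x > 0)

def renyi(p, d):
    """Renyi entropy of order d: ln(sum(p**d)) / (1 - d); Shannon at d = 1."""
    if d == 1:
        return shannon(p)
    return math.log(sum(x ** d for x in p if x > 0)) / (1 - d)

def tsallis(p, d):
    """Tsallis entropy of order d: (1 - sum(p**d)) / (d - 1); Shannon at d = 1."""
    if d == 1:
        return shannon(p)
    return (1 - sum(x ** d for x in p if x > 0)) / (d - 1)

# Uniform weights on three support points (illustrative).
uniform = (1 / 3, 1 / 3, 1 / 3)
print(shannon(uniform), renyi(uniform, 2), tsallis(uniform, 2))
```

For uniform weights the Shannon and Renyi entropies coincide at $\ln M$, while the Tsallis entropy depends on the order $d$, which is one reason the choice of $d$ can affect the resulting belief function.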
To validate the performance of the proposed method, a simulation study is conducted. The results show that our method performs well in comparison with a host of competing methods. We note that if the distribution is known and the model is estimated under the true distribution, accurate results are obtained. However, since the error distribution is often unknown in practice, these results confirm the advantage of the Shannon entropy-based belief function, which performs uniformly well over a wide range of error distributions.
Furthermore, our entropy-based belief function is also applied to real data. In the first application, we conduct out-of-sample forecasts for the years 2014 to 2018; the results are displayed in Table 5. We find that the entropy-based belief function for the AR(1) model exhibits good out-of-sample forecasting performance relative to the competing methods. We then apply our method to predict the growth rate of GDP in 2019 and find that the growth rate of Thai GDP is around 3.961%. In the second application, we apply our method to predict a future value using linear regression with given inputs. The estimates under the likelihood-based and entropy-based methods differ slightly, as expected. The forecasting bias of our entropy-based belief function, in particular the one based on the Shannon entropy, is lower than that of the likelihood-based belief function. This suggests that the entropy-based belief function produces more accurate predictions in the regression context as well. These results are encouraging: the entropy-based belief function offers a better alternative for the prediction problem.
In brief, this study proposes the entropy-based belief function as an alternative to likelihood-based predictive distributions. The method is very general and can be used with any parametric distribution. However, as its computational complexity is quite high, the computational cost may become substantial for large datasets. We leave the study of the computational performance of our method to future work. In addition, to make more accurate predictions, this approach may be extended to combine a random interval encoding expert assessments of the prediction from several government agencies, investors, business enterprises, and private or public institutions. Furthermore, the forecasting performance of the entropy-based belief function could be improved by increasing the order d in the Renyi and Tsallis entropies. Therefore, we suggest that future studies consider higher orders of the Renyi and Tsallis entropies. Finally, it would be worthwhile to compare our approach with the posterior distribution-based belief function in the Bayesian context and with the Deng entropy-based belief function to confirm the performance of our entropy-based method.