2 Ill-posed Inverse Problem Solution and the Maximum Entropy Principle

As explained in the introduction, many economic relationships are characterized by indeterminacy. This may be because of long-range feedback and complex correlations between source and targets, thus rendering causal relationships more difficult to investigate. In this part of the work, the formal definition of the inverse problem will be discussed. A Moore-Penrose approach will be presented for solving this kind of problem and its limits will be stressed. The next step will be to present the concept of the maximum entropy principle in the context of the Gibbs-Shannon model. Extensions of the model by Jaynes and Kullback-Leibler will be presented and a generalisation of the model will be implemented to take into account random disturbance. The next step will concern the non-ergodic form of entropy known in the literature of thermodynamics as non-extensive entropy or non-additive statistics. There will be a focus on Tsallis entropy, and its main properties will be presented in the context of information theory. To establish a footing in the context of real world problems, non-extensive entropy will be generalized and then random disturbances will be introduced into the model. This part of the work will be concluded with the proposition of a statistical inference in the context of information theory.



The Inverse Problem and Socio-Economic Phenomena
An inverse problem (e.g., Tikhonov et al. (1977), Bwanakare (2015), Golan et al. (1996)) describes a situation where one tries to capture the causes of phenomena for which experimental observations represent the effect.
The essence of the inverse problem is conveyed by the expression:

Y = B(X) + ζ (2.12)

or its equivalent in continuous form:

Y(t) = ∫ B(t, s) X(s) ds + ζ(t) (2.13)

In classical econometrics, when given a state X, an operator B and, as happens most of the time, a disturbance term ζ, what is Y? This is referred to as a forward problem. In social science, one must often cope with the above random (Gaussian or not) disturbance term, and this usually complicates matters in spite of significant recent developments in econometrics, particularly concerning stochastic time-series analysis (Engle & Granger, 1987). Furthermore, the inverse question is more profound: given Y and a specific B, what is the true state X?
If B should also be a functional of X, the problem becomes arbitrarily complex. Correlation between (ζ) and X will be at the base of such additional complexity.
Every day, psychologists cope with such inferential problems: patients may display identical symptoms arising from different illnesses, and health practitioners need additional historical (a priori) information on their patients to find the solution.
In economics, the same national output growth rate may result from different combinations of factors. One of the main problems encountered by practicing economists is isolating the causes of economic phenomena once they have occurred. In most cases, the economist must be inventive in finding an appropriate hypothesis before trying to solve the problem. For example, in the case of a recession or financial turbulence, it is usually difficult to point to the principal causes and address them. Schools of economics suggest different, even contradictory, solutions, a legacy of the inverse problem nature of the discipline.
In empirical research, many techniques exist to try to solve the inverse problem. In the context of the present work, the presentation will be limited to those more applicable to matrix inversion, like the Moore-Penrose pseudo-inverse approach, and, naturally, maximum entropy based approaches. The approach better known in economics for updating national accounts on the basis of bi-proportionalities will then be added to these two techniques.

Moore-Penrose Pseudo-Inverse
Let us consider the discrete and determinist case and rewrite (2.12) as follows:

Y = BX or Y = Bρ (2.14)

where, in the right equality, ρ reflects the case in which we deal with a ratio or probability parameter, for example after reparametrizing B. We then have:

X = VY (2.15)

with V representing the generalized inverse matrix (Golan, 1996), (Kalman, 1960), that is, a matrix with the symbol B+ that satisfies the following requirements: BB+B = B, B+BB+ = B+, (BB+)′ = BB+ and (B+B)′ = B+B. Following Theil (1967), a unique B+ can be found for any matrix, square and nonsingular or not. When the matrix B happens to be simultaneously square and nonsingular, then the generalized inverse is the ordinary inverse B⁻¹. The problem that interests us is the over-determined system of equations where B has n rows, K < n columns and column rank equal to R ≤ K.
If we retain the particular case when R equals K, which ensures the existence of (B′B)⁻¹, then the generalised inverse of B is B+ = (B′B)⁻¹B′, as can be easily verified. A solution to the system of equations can then be presented as X̂ = B+Y = (B′B)⁻¹B′Y. Following Greene (2003, p. 833), we note in this case that this vector minimizes the distance between Y and BX, according to the least squares method. This distance will naturally remain equal to zero if Y lies in the column space of B.
If we now retain the more general case where B does not have full rank, the above solution is no longer valid, and a spectral decomposition using the reciprocals of the non-zero characteristic roots of B′B is involved; the inverse becomes B+ = C₁Λ₁⁻¹C₁′B′, where C₁ holds the R characteristic vectors corresponding to the non-zero roots arrayed in the diagonal matrix Λ₁. The next and last case is the one where B is symmetric and singular, that is, with rank R ≤ K. In such a case, the Moore-Penrose inverse is computed as in the preceding case but without pre-multiplying by B′. Thus, for such a symmetric matrix,

B+ = C₁Λ₁⁻¹C₁′ (2.16)

with Λ₁⁻¹ being the diagonal matrix of the reciprocals of the non-zero roots of B. It is important to note that only a matrix B with full rank ensures a minimum distance between Y and BX. In other cases, there may exist an infinite number of combinations of elements of matrix B or ρ̂ which satisfy (2.14).
To conclude, in spite of the strong advantages of the Moore-Penrose generalised inverse, its outputs will not always reflect an optimal solution.
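The pseudo-inverse mechanics above can be checked numerically. The sketch below (NumPy, with a small hypothetical over-determined system) verifies that B+ = (B′B)⁻¹B′ in the full-column-rank case coincides with the SVD-based pseudo-inverse and satisfies the Moore-Penrose conditions; the data are illustrative only.

```python
import numpy as np

# Hypothetical over-determined system: n = 4 observations, K = 2 unknowns,
# full column rank R = K.
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0],
              [1.0, 2.0]])
y = np.array([1.0, 2.0, 2.5, 5.5])

# Full-column-rank generalized inverse: B+ = (B'B)^{-1} B'.
B_plus = np.linalg.inv(B.T @ B) @ B.T
x_hat = B_plus @ y          # least squares solution X = B+ Y

# B+ agrees with the SVD-based pseudo-inverse and obeys the
# Moore-Penrose conditions.
assert np.allclose(B_plus, np.linalg.pinv(B))
assert np.allclose(B @ B_plus @ B, B)
assert np.allclose(B_plus @ B @ B_plus, B_plus)
```

When B loses full column rank, `np.linalg.pinv` still returns the minimum-norm least squares solution, which is exactly the non-uniqueness issue raised above: infinitely many X fit the data equally well, and the pseudo-inverse silently picks one of them.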

The Gibbs-Shannon Maximum Entropy Principle and the Inverse Problem
Let us introduce the concept of Shannon entropy by continuing with the case of pure linear inverse problem solution discussed above. The simplest (one dimensional case) example is the Jaynes dice inverse problem.
If a die is fair and we throw it a large number of times n, with k different output modalities9 (k = 1,..., K), the expected value will be 3.5, as from a uniform distribution with each probability f k equal to 1/6. How can one infer the p k if we have a 'loaded' (unfair) die and the expected value of the trial becomes

Σ k k⋅p k = y, (2.17)

where the frequencies are p k = n k /n?
In this case, the central question is: which estimate of the set of frequencies would most likely yield this number? The problem is underdetermined, since many sets of f k can be found to fit the single datum of Equation (2.17). Here we have to deal with a multinomial distribution whose multinomial coefficient W is given by:

W = N! / (n₁! n₂! … n K !)

Taking logarithms and using the Stirling approximation ln x! ≅ x ln x - x for a large number N, we get the Shannon entropy formulation:

(1/N) ln W ≅ -Σ k p k ln p k = H(p) (2.18)

In the case of a die, the parameter K equals 6, and W is the multinomial coefficient, i.e., the number of ways of obtaining a particular set of frequencies among 6^N possible outcomes.

9 Generally, if the number of trials is equal to n, we will have n k outputs corresponding to each modality k, with Σ k n k = n. Thus, the frequency p k = n k /n is related to each modality k.
We need only find the set of frequencies maximizing W in order to find the set that can be realized in the greatest number of ways. This is the most plausible combination in the case of fair dice.
This turns out to convey the same logic as maximizing Shannon-Gibbs entropy. Thus, starting from two pieces of information, that is, the number K equal to six and N, a large number of trials, we are able to derive the six probabilities of the die distribution.
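The counting argument can be verified numerically. The sketch below (standard-library Python; the fair-die counts are an illustrative assumption) computes ln W exactly via the log-gamma function and checks that (1/N) ln W approaches the Shannon entropy, as Stirling's approximation predicts.

```python
import math

def ln_W(counts):
    """Exact log of the multinomial coefficient N!/(n_1!...n_K!) via lgamma."""
    N = sum(counts)
    return math.lgamma(N + 1) - sum(math.lgamma(n + 1) for n in counts)

def shannon(p):
    """Shannon entropy -sum p_k ln p_k."""
    return -sum(pk * math.log(pk) for pk in p if pk > 0)

# Fair-die frequencies over N = 600,000 throws (illustrative numbers).
counts = [100_000] * 6
N = sum(counts)
p = [n / N for n in counts]

# Stirling's approximation: (1/N) ln W converges to the Shannon entropy.
assert abs(ln_W(counts) / N - shannon(p)) < 1e-4
```

Maximizing W over the counts is therefore equivalent, for large N, to maximizing H(p): the most plausible frequency set is the one realizable in the greatest number of ways.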
Next, Jaynes (1994) maximized the Shannon function subject to the restriction of the consistent information at hand. This opened the application of entropy theory to many scientific fields, including the social sciences.
Thus, if we add to the formulation (2.18) the moment-consistency and adding-up (normalization) constraints, we then get:

max p H(p) = -Σ k p k ln p k (2.19)
subject to:
Σ k p k f t (x k ) = y t , t = 1,..., T,
Σ k p k = 1,

where {y 1 , y 2 ,..., y T } denotes a set of observations (e.g., aggregate accounts or their averages) consistent with a function f t (x k ) of explicative variables weighted by a corresponding distribution of probabilities {p 1 , p 2 ,..., p K }. As usually happens, T is less than K, and the problem is ill-posed (underdetermined).
Two main results emerge from the above formulation. First, if all events are independent or quasi-independent (locally dependent) and equally probable, then the above entropy is a linear function of the number of possible system states and is then extensive.
A second fundamental result is connected with information theory and suggests that a Gaussian variable has the largest entropy among all random variables of equal variance (see Papoulis, 1991 for proof). In the next chapter on non-extensive entropy, a measure to assess the divergence of a given distribution from Gaussian distribution will be presented.
Coming back to the dice case, maximization of Shannon entropy in (2.19), that is H(P) = -P′ln P, under Jaynes consistency leads to the distribution presented in Table 1. To solve this inverse problem of six unknowns, the only two pieces of information available are the expected value (from the experiments in this example) assumed to be equal to 4.5, and the fact that the probabilities of the different possibilities add up to one. However, since we are dealing with an unbalanced die, we have no a priori idea about the distribution.
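The loaded-die solution can be reproduced with a short numerical routine. The sketch below (standard-library Python) solves the maximum entropy problem for faces 1-6 under the mean constraint, using the exponential-family form p_k ∝ exp(λk) and finding λ by bisection; the mean 4.5 matches the example in the text, while the implementation details are our own.

```python
import math

FACES = range(1, 7)

def maxent_die(mean, lo=-10.0, hi=10.0, iters=200):
    """Maximum-entropy pmf on faces 1..6 with a given mean.

    The solution has the exponential form p_k ∝ exp(lam*k); lam is found
    by bisection, since the mean is monotone increasing in lam.
    """
    def mean_of(lam):
        w = [math.exp(lam * k) for k in FACES]
        z = sum(w)
        return sum(k * wk for k, wk in zip(FACES, w)) / z
    for _ in range(iters):
        mid = (lo + hi) / 2
        if mean_of(mid) < mean:
            lo = mid
        else:
            hi = mid
    lam = (lo + hi) / 2
    w = [math.exp(lam * k) for k in FACES]
    z = sum(w)
    return [wk / z for wk in w]

p = maxent_die(4.5)
```

With a mean above the fair value 3.5, λ is positive and the recovered probabilities rise monotonically toward the high faces, which is the qualitative pattern reported in Table 1.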
The next chapters extend the Shannon-Gibbs-Jaynes maximum entropy principle with Kullback-Leibler relative entropy. The penultimate targeted presentation will deal with the general linear entropy model, that is, the one with a stochastic component. To conclude, the Tsallis power law distribution generalizing Kullback-Leibler cross-entropy will be considered. Kullback (1959) and Good (1963) extended the Jaynes-Shannon-Gibbs model by formulating the principle of minimum (cross or relative) entropy. Using an a priori piece of information q about the unknown parameter p, the resulting formulation is as follows:

min p I(p; q) = Σ k p k ln(p k /q k ) (2.22)
subject to:
Σ k p k f t (x k ) = y t , t = 1,..., T, (2.23)
Σ k p k = 1. (2.24)
These restrictions are the same as those presented earlier. In the criterion function (2.22), the a posteriori and a priori vectors (or matrices) p and q are confronted with the purpose of measuring the entropy reduction resulting from the exclusively new content of the data information. One should note that when q is fully consistent with the moments, then p = q; when the prior distribution is uniform, with q k = 1/K, the cross-entropy solution coincides with that of the maximum entropy principle. Thus, the cross-entropy principle stands for a certain form of generalization of maximum entropy. Relation (2.22) above is an illustration of the previous Kullback formulation in (2.8), as a mean information from (2.23) and (2.24) for discrimination in favour of p against q.
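The p = q property of the cross-entropy solution is straightforward to check numerically. In the sketch below (standard-library Python, with hypothetical priors), the minimum cross-entropy solution takes the tilted form p_k ∝ q_k exp(λk); when the uniform prior already satisfies the moment constraint, the solver returns p = q unchanged.

```python
import math

FACES = range(1, 7)

def min_xent_die(mean, prior, lo=-20.0, hi=20.0, iters=200):
    """Minimum cross-entropy pmf on faces 1..6: p_k ∝ q_k * exp(lam*k)."""
    def mean_of(lam):
        w = [qk * math.exp(lam * k) for k, qk in zip(FACES, prior)]
        z = sum(w)
        return sum(k * wk for k, wk in zip(FACES, w)) / z
    for _ in range(iters):
        mid = (lo + hi) / 2
        if mean_of(mid) < mean:
            lo = mid
        else:
            hi = mid
    lam = (lo + hi) / 2
    w = [qk * math.exp(lam * k) for k, qk in zip(FACES, prior)]
    z = sum(w)
    return [wk / z for wk in w]

uniform = [1 / 6] * 6
p_consistent = min_xent_die(3.5, uniform)   # prior already fits the moment

# The prior is returned unchanged: p = q when q satisfies the constraints.
assert all(abs(pk - 1 / 6) < 1e-6 for pk in p_consistent)
```

With a non-uniform prior, the solution stays as close to the prior as the moment constraint allows, which is exactly the entropy-reduction reading of (2.22).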

General Linear Entropy Econometrics
In social science, it is rare to encounter the situation described by relation (2.14), where the random term is negligible, as often happens in the experimental sciences. Social phenomena are particularly affected by stochastic components. Let us rewrite the model below in its generalized form:

Y i = Σ k x ik B k + ζ i (2.12′)

with the random term ζ i ∈ e and i = (1,..., I) (I being the number of observations); K is the number of model parameters to be estimated.
We treat each B j (j = 1,…, K) as a discrete random variable within a compact support and 2 < M < ∞ possible outcomes. So we can express B j as:

B j = Σ m v jm p jm

where p jm is the probability of outcome v jm , and the probabilities must be non-negative and sum up to one. Similarly, let us treat each element ζ i of e as a finite and discrete random variable with compact support and 2 < M < ∞ possible outcomes centred on zero. We can express ζ i as:

ζ i = Σ n w n r n

where r n is the probability of outcome w n . The term ζ i , like any prior value in the model, reflects Bayesian properties and is not a fixed value as in the case of classical econometric models. In practice, support sets with three or more points12 are used to take higher moments of the distribution into account during the process of information recovery.
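The reparametrization can be illustrated with a small numerical sketch (NumPy; the support points and probabilities below are hypothetical): each parameter is recovered as the expectation of its support points under the estimated probabilities.

```python
import numpy as np

# Each parameter B_j is the expectation of M support points:
# B_j = sum_m v_jm * p_jm.  Support points and probabilities are hypothetical.
V = np.array([[0.0, 0.5, 1.0],        # support for B_1, e.g. an elasticity in [0, 1]
              [-2.0, 0.0, 2.0]])      # support for B_2, a sign-unrestricted parameter
P = np.array([[0.2, 0.5, 0.3],
              [0.1, 0.6, 0.3]])

assert np.allclose(P.sum(axis=1), 1.0)   # each probability row adds up to one
B = (V * P).sum(axis=1)                  # recovered parameter point estimates
assert np.allclose(B, [0.55, 0.4])
```

Estimation then amounts to choosing the probability rows of P (and the analogous error probabilities r) rather than the parameters themselves, which is what turns the ill-posed problem into an entropy maximization over probabilities.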

Definition and Shannon-Tsallis Entropy Relationships
This relatively new form of entropy is finding an immense area of applications in social science, including economics. One of the fields of interest is modelling and predicting financial market returns (Drożdż & Kwapień, 2012), (Grech & Pamula, 2013). Moreover, due to the high frequency nature of Big Data in Official Statistics (e.g., Braaksma & Zeelenberg, 2015), power-law-based non-extensive entropy econometrics should be seen as a potential and natural estimation device in this new statistical area. As in statistical physics, socioeconomic random events display two types of stochastic behaviour: ergodic and non-ergodic systems. Whenever isolated in a closed space, ergodic systems dynamically visit all the allowed micro-states with equal probability (Gell-Mann & Tsallis, 2004). However, it seems logical to imagine systems visiting the allowed micro-states in a much more complex way than defined by ergodicity. The financial market is a well-known example of such complex systems, as characterized by multifractal dimensions (Drożdż & Kwapień, 2012), (Grech & Pamula, 2013). Other examples include income distribution inside a given region, the evolution of a given disease inside a region, the size of cities, and cellular structure. These forms seem to display an organized structure owing to long-range correlation between micro-elements, heavy tails with respect to the Gaussian distribution, scale-invariant structures, and criticality. Such phenomena are better described by a stable-law-based Lévy process, such as a power law distribution.
The Shannon-Kullback-Leibler equations (2.22-2.24) are generalized by the Tsallis relative entropy formulation. To emphasize consistency among the principal formulations, it is worthwhile to reiterate the statistical theory connection between the above relations and the Kullback relation presented in (2.8) or, to some extent, (2.9), which measures the divergence between two hypotheses H 1 and H 2 . A similar concept will be introduced in the case of non-extensive entropy, which will constitute the final step of the Shannon entropy extensions.

12 Golan, Judge, and Miller (1996) suggest the Chebyshev inequality as a good starting point to define the error support set: Pr[|x| < vσ] ≥ 1 - v⁻², where v is a positive real and x a random variable such that E(x) = 0 and var(x) = σ². This inequality leads to the three-sigma rule (Pukelsheim, 1994) for v = 3, i.e., to the probability Pr[-3σ < x < 3σ] being at least 0.88, and higher when x displays a standard normal distribution. Let us remember that this inequality has the additional advantage of being independent of distribution laws.
Let us generalize the Shannon-Gibbs inverse problem through an ordinary differential equation characterization (Tsallis, 2009). First, we need to introduce the three simplest (in terms of dynamic complexity) differential equations and their inverse functions. The first is

dy/dx = 0, y(0) = 1. (2.27)

Its solution is y = 1 (∀x), and its inverse function is x = 1 (∀y). The next simplest differential equation is

dy/dx = 1, y(0) = 1. (2.28)

Its solution is y = 1 + x, and its inverse is y = x - 1. The next higher step in increasing complexity is the differential equation

dy/dx = y, y(0) = 1. (2.29)

Its solution is y = e^x, and its inverse is y = ln x. Note that the latter inverse equation satisfies the additive property:

ln(x a ⋅ x b ) = ln x a + ln x b . (2.30)

Following Gell-Mann & Tsallis (2004) and trying to unify the three cases (without preserving linearity), we get:

dy/dx = y^q, y(0) = 1. (2.31a)

We observe that this expression displays a power-law distribution form. Its solution is

y = [1 + (1 - q)x]^(1/(1-q)) ≡ e q ^x, (2.31b)

whose inverse, the q-logarithm ln q x = (x^(1-q) - 1)/(1 - q), underlies the non-extensive (Tsallis) entropy formula. Though it will be discussed in the next section, let us immediately show here the relationship between Shannon and Tsallis entropies through the pseudo-additive property:

ln q (x a ⋅ x b ) = ln q x a + ln q x b + (1 - q) ⋅ ln q x a ⋅ ln q x b .

For q → −∞, q = 0 and q = 1 we obtain the three initial cases (2.27-2.29), respectively. In particular, for q = 1 we obtain (after applying l'Hôpital's rule) the solution of (2.29), the case of Shannon-Gibbs entropy. The expression (2.30) states that if two systems x a and x b are multiplied, the output in logarithmic scale is the additive sum of the two systems' logarithms. This explains why Shannon entropy is sometimes referred to as additive entropy. This observation has been taken from (2.21) to emphasize that Shannon entropy is a direct function of data. The term q is referred to as the q-Tsallis index. When it is equal to unity, we recover Shannon entropy as the limiting case.
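The q-deformed functions just introduced are easy to implement and test. The sketch below defines the q-logarithm ln_q(x) = (x^(1-q) - 1)/(1 - q) and the q-exponential (the solution of dy/dx = y^q with y(0) = 1), then verifies the pseudo-additive property and the q → 1 Shannon-Gibbs limit numerically.

```python
import math

def ln_q(x, q):
    """q-logarithm: ln_q(x) = (x**(1 - q) - 1)/(1 - q); ln_q -> ln as q -> 1."""
    if abs(q - 1) < 1e-12:
        return math.log(x)
    return (x ** (1 - q) - 1) / (1 - q)

def exp_q(x, q):
    """q-exponential, the solution of dy/dx = y**q with y(0) = 1."""
    if abs(q - 1) < 1e-12:
        return math.exp(x)
    return (1 + (1 - q) * x) ** (1 / (1 - q))

q = 1.4
a, b = 2.0, 3.0

# Pseudo-additivity: ln_q(ab) = ln_q a + ln_q b + (1 - q) ln_q a ln_q b.
lhs = ln_q(a * b, q)
rhs = ln_q(a, q) + ln_q(b, q) + (1 - q) * ln_q(a, q) * ln_q(b, q)
assert abs(lhs - rhs) < 1e-12

# exp_q inverts ln_q, and q -> 1 recovers the ordinary (Shannon-Gibbs) log.
assert abs(exp_q(ln_q(a, q), q) - a) < 1e-9
assert abs(ln_q(5.0, 1.000001) - math.log(5.0)) < 1e-4
```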
Tsallis entropy should now be described and compared with other forms of entropy. This description indirectly answers the question of why Tsallis or Shannon entropy, rather than Renyi entropy or another form, is appropriate for a given problem.

Characterization of Non-Extensive Entropy
Consider two systems A and B with joint probabilities p ij (A, B) and marginal probabilities p i (A) and p j (B); if the joint probabilities happen to equal the products p i (A)⋅p j (B), then A and B are said to be probabilistically independent. Otherwise, they are dependent or correlated. Let us then define the entropies S q (A), S q (B) and S q (A, B) over these distributions. More interestingly, the conditional entropies definition, that is, S q (A|B), may deserve closer attention, as it can intervene in defining the estimation precision of a model, whether or not the hypothesis of independence between the model variables and its random terms has been accepted. If and only if A and B are independent, S q (A|B) = S q (A). Finally, to be more explicit than in the previous section, S q is said to be non-extensive in the sense that, given two independent random systems A and B, i.e., P(A, B) = P(A)P(B), then:

S q (A, B) = S q (A) + S q (B) + (1 - q) ⋅ S q (A) ⋅ S q (B).

In the next, empirical part of this book, for inferential purposes and for optimal simplification of numerical computations, this formula will play a key role in determining the level of entropy of a complex system under the hypothesis of independence of subsystems, i.e., between the model and the random term.
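The non-extensivity property can be verified directly. The sketch below (NumPy, with two small hypothetical distributions) builds a joint distribution from independent marginals and checks the pseudo-additivity relation S_q(A, B) = S_q(A) + S_q(B) + (1 - q) S_q(A) S_q(B).

```python
import numpy as np

def S_q(p, q):
    """Tsallis entropy S_q = (1 - sum p_i**q)/(q - 1)."""
    p = np.asarray(p)
    return (1 - np.sum(p ** q)) / (q - 1)

q = 1.5
pA = np.array([0.2, 0.3, 0.5])
pB = np.array([0.6, 0.4])
pAB = np.outer(pA, pB).ravel()   # independence: p(A, B) = p(A) p(B)

lhs = S_q(pAB, q)
rhs = S_q(pA, q) + S_q(pB, q) + (1 - q) * S_q(pA, q) * S_q(pB, q)
assert abs(lhs - rhs) < 1e-10
```

For q = 1 the interaction term vanishes and the usual Shannon additivity for independent systems is recovered.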

Concavity
The concept of concavity is important since, among other things, it allows us to determine whether or not a system is stable. Stability is a meaningful concept in econometrics since it implies stationarity of a process in a given system. Testing for stationarity and cointegration using entropy distribution thus seems to be an open area for further research (Gell-Mann & Tsallis, 2004).

S q is concave (convex) for all probability distributions and all q > 0 (q < 0).
By concavity we mean that it can be proven that, for all λ ∈ (0, 1) and for any two probability distributions p and p′ over the same states,

S q (λp + (1 - λ)p′) ≥ λ S q (p) + (1 - λ) S q (p′).
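The concavity property can be probed numerically. The sketch below (NumPy) samples random probability vectors from a Dirichlet distribution and checks the concavity inequality for S_q with q = 1.4, a value in the range relevant for the applications discussed later.

```python
import numpy as np

def S_q(p, q):
    """Tsallis entropy S_q = (1 - sum p_i**q)/(q - 1)."""
    return (1 - np.sum(np.asarray(p) ** q)) / (q - 1)

q = 1.4                                   # q > 0, so S_q should be concave
rng = np.random.default_rng(0)
for _ in range(100):
    p1 = rng.dirichlet(np.ones(6))        # random probability vectors
    p2 = rng.dirichlet(np.ones(6))
    for lam in (0.25, 0.5, 0.75):
        mix = lam * p1 + (1 - lam) * p2
        assert S_q(mix, q) >= lam * S_q(p1, q) + (1 - lam) * S_q(p2, q) - 1e-12
```

The check succeeds because each term p^q is convex in p for q > 0, so their sum is convex and S_q is concave, with its maximum at the uniform distribution.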

Tsallis Entropy and Other Forms of Entropy
Let us first review the main mathematical forms of entropy before presenting their most important distinctive properties.
A key element deserves attention here. We see from the first mathematical relation in (2.36) above that Shannon-Gibbs entropy may also be generalized by Renyi entropy (2.38) or by the normalized non-extensive form (2.39), independently introduced by Landsberg & Vedral (1998) and by Rajagopal and Abe (2000). Both forms of entropy are monotonically increasing functions of S q . Tsallis (Gell-Mann & Tsallis, 2004, p. 11) poses and explains a relevant question concerning the relationships between these forms of entropy. In fact, after pointing out that monotonicity makes S q , S q R, and S q N extreme for the same probability distribution, he asks why not base thermodynamics on S q R or S q N rather than only on Tsallis entropy. The response lies in the disadvantages of these two competing forms of entropy. In fact, it happens that they are not concave for all positive values of q, but only for 0 < q ≤ 1. Since many physically meaningful phenomena exist for which q is higher than unity, this becomes a serious drawback of both competing entropies. As far as economic, financial, or social phenomena are concerned, the problem does not allow for any ambiguity since, as we will see in the next section, 1 ≤ q < 5/3. For the majority of them15, extreme events are on average more frequent (with persistence) than predicted by Gaussian law, and not the reverse (i.e., less frequent, with persistence, than predicted by Gaussian law). Tsallis entropy thus remains the one form that not only generalizes SG entropy but also ensures concavity (stability) inside the whole finite interval where the probability distribution is defined. The reader should thus far understand why non-extensive Tsallis entropy has recently been used to generalize all other forms of entropy, at least in many fields where entropy is applied.

Characterization
In the following table, we illustrate different links between the commonly used forms of entropy with respect to the characterization in Table 2. "Yes" and "No" correspond, respectively, to what, according to recent thermodynamics literature (Gell-Mann & Tsallis, 2004), are thermodynamically allowed and forbidden violations of the Boltzmann-Gibbs (BG) entropy properties.
15 For example, for stock market returns, q is around 1.4, far enough from the unity which characterises Gaussian distribution.

Scale of q-Tsallis Index and its Interpretation
Following the thermodynamics literature built on Lévy-like anomalous diffusion, it has been shown that 1 < q < 3. The index γ L of the Lévy distribution is related to q as follows:

q = (3 + γ L ) / (1 + γ L ).

Thus, in empirical applications, the value of q should vary inside an interval from unity to 5/3, which corresponds to the cases of finite variance for phenomena dwelling within the Gaussian basin of attraction.
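Assuming the Lévy-Tsallis correspondence q = (3 + γ)/(1 + γ) from the anomalous-diffusion literature (the exact functional form stated above is our reconstruction), the quoted bounds follow directly, as the small sketch below shows.

```python
def q_from_levy(gamma):
    """Map the Levy index gamma (0 < gamma <= 2) to the q-Tsallis index,
    assuming q = (3 + gamma)/(1 + gamma)."""
    return (3 + gamma) / (1 + gamma)

# gamma = 2 marks the Gaussian boundary: q = 5/3, the finite-variance edge;
# as gamma falls toward 0, q rises toward its upper bound of 3.
assert abs(q_from_levy(2.0) - 5 / 3) < 1e-12
assert 1 < q_from_levy(1.5) < q_from_levy(0.5) < 3
```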

The q-Generalization of the Kullback-Leibler Relative Entropy
Kullback-Leibler-Tsallis cross-entropy is known in the literature as the q-generalization of the Kullback-Leibler relative entropy. The Kullback-Leibler-Shannon entropy introduced in Part II can be q-generalized (Tsallis, 2009) in a straightforward manner. The discrete version becomes:

I q (p‖p⁰) = Σ i p i [ (p i /p⁰ i )^(q-1) - 1 ] / (q - 1). (2.40)

For q > 0, one has the following properties: I q (p‖p⁰) ≥ 0, with equality if and only if p = p⁰. In a continuous case, we have:

I q (p‖p⁰) = ∫ p(x) [ (p(x)/p⁰(x))^(q-1) - 1 ] / (q - 1) dx. (2.41)

Therefore, coming back again to the generalized K-Ld cross-entropy, in the limit q → 1 we recover the standard Kullback-Leibler form:

I 1 (p‖p⁰) = Σ i p i ln(p i /p⁰ i ). (2.42)

Thus, as Tsallis (2009) has made us aware, the above q-Kullback-Leibler index has the same basic properties as the standard Kullback-Leibler entropy and can be used for the same purposes, while having the additional advantage of a q adaptive to the system with which we are dealing.
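A short sketch of the q-generalized divergence (using the standard Tsallis form I_q(p‖p0) = Σ p_i[(p_i/p0_i)^(q-1) - 1]/(q - 1); the example distributions are hypothetical) verifies non-negativity, the vanishing at p = p0, and the q → 1 Kullback-Leibler limit.

```python
import numpy as np

def kl_q(p, p0, q):
    """q-generalized Kullback-Leibler divergence; reduces to standard KL as q -> 1."""
    p, p0 = np.asarray(p), np.asarray(p0)
    if abs(q - 1) < 1e-12:
        return float(np.sum(p * np.log(p / p0)))
    return float(np.sum(p * ((p / p0) ** (q - 1) - 1)) / (q - 1))

p = np.array([0.5, 0.3, 0.2])
p0 = np.array([1 / 3, 1 / 3, 1 / 3])

assert kl_q(p, p0, 1.4) > 0                      # non-negative for q > 0, p != p0
assert abs(kl_q(p, p, 1.4)) < 1e-12              # zero when p = p0
assert abs(kl_q(p, p0, 1.0000001) - kl_q(p, p0, 1.0)) < 1e-5   # q -> 1 limit
```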
There exist two different versions of the Kullback-Leibler divergence (K-Ld) in Tsallis statistics: the usual generalized K-Ld shown above and the generalized Bregman K-Ld. According to Plastino & Venkatesan (2011), problems have been encountered in empirical thermodynamics when trying to reconcile these two versions. Unfortunately, or perhaps fortunately, the same problems seem to reappear when applying this theory in social science, since each version of the generalized K-Ld leads to different outputs. Let us try to synthesize what the recent literature says about this problem.

Tsallis Versions of the Kullback-Leibler Divergence in Constraining Problems
This short section represents the final bridge between theory and the applications in the last parts of this work. In a recent study, Plastino & Venkatesan (2011) lay out interesting aspects of empirical research when the q-generalized K-Ld cross-entropy is associated with constraining information. Since, in the social sciences, we particularly need the discrete forms of these relative entropies, let us first recall them before commenting on their conditions of applicability. The form (2.43) is the one derived directly from the Kullback-Leibler formalism and presented in (2.40). The second form, (2.44), is referred to as the generalized Bregman form of the K-Ld cross-entropy; it is more appealing than (2.43) from an information-geometric viewpoint (Plastino & Venkatesan, 2011), even if it does contain certain inherent drawbacks.
A study by Abe and Bagci (2005) has demonstrated that the generalized K-Ld defined by (2.44) is jointly convex in terms of both p i and p 0i while the form defined by (2.43) is convex only in terms of p i . A further distinction between the two forms of the generalized K-Ld concerns the property of composability. While the form defined by (2.44) is composable, the form defined by (2.43) does not exhibit this property.
The second interesting aspect for practitioners concerns the manner in which mean values are computed. Non-extensive statistics has employed a number of forms in which expectations may be defined. The first among these are the linear constraints initially used by Tsallis (2009), also known as normal averages, that is, <A> = Σ i p i A i . The second are the unnormalized q-averages, <A> q = Σ i p i ^q A i . The third are the normalized q-averages of Tsallis, Mendes, and Plastino (TMP), computed with the escort probabilities P i = p i ^q / Σ j p j ^q. A fourth constraining procedure, less applied by practitioners, is the optimal Lagrange multiplier approach.
Among these four methods of describing expectations, the one most commonly employed by Tsallis practitioners is the TMP form, based on the escort distribution.
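The competing expectation forms can be compared in a few lines (NumPy; the probabilities and observable below are hypothetical). Note how, for q > 1, the escort weights of the TMP form accentuate the most probable states.

```python
import numpy as np

p = np.array([0.5, 0.3, 0.2])      # hypothetical state probabilities
A = np.array([1.0, 2.0, 3.0])      # hypothetical observable values
q = 1.4

normal_avg = np.sum(p * A)                   # linear "normal averages"
q_avg = np.sum(p ** q * A)                   # unnormalized q-averages
escort = p ** q / np.sum(p ** q)             # TMP escort distribution
escort_avg = np.sum(escort * A)              # normalized q-expectation (TMP)

assert abs(np.sum(escort) - 1) < 1e-12       # escort weights renormalize to one
# For q > 1 the escort weights accentuate high-probability states, so the
# escort average shifts toward the most probable outcome A_1 = 1.
assert escort_avg < normal_avg
```

The unnormalized q-average is not even a proper expectation (its weights do not sum to one), which is one practical reason the normalized TMP form came to dominate applications.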
Recent work by Abe (2009) suggests that, in generalized statistics, expectations defined in terms of normal averages, in contrast to those defined by q-averages, seem to display higher consistency with the hypotheses of material chaos. The recent reformulation of the variational perturbation approximations in non-extensive statistical physics followed from these findings. To my knowledge, no application in the social sciences has yet been carried out to assess the universality of this finding.
Finally, there is the issue of consistency: the form of the generalized K-Ld defined by (2.43) is consistent with expectations and constraints defined by q-averages (prominently the TMP), while the generalized Bregman K-Ld defined by (2.44) is consistent with expectations defined by normal averages.
Thus, in the reformulation of an empirical inverse problem, this last point may play a key role, since inappropriate constraints lead to a non-optimal solution in the best case, or to computational problems, as is often the case.

A General Model
This section presents a generalized linear non-extensive entropy econometric approach for estimating econometric models. Following Golan et al. (1996), we first reparametrize the generalized linear model of equation (2.12′), rewritten below:

Y i = Σ k x ik B k + ζ i (2.12′)

with, once again, the random term ζ i ∈ e and i = (1,..., I) (I being the number of observations); K is the number of model parameters to be estimated. Here the B values are not necessarily constrained between 0 and 1, and ζ is an unobservable disturbance term with finite variance, owing to the nature of economic data, which exhibit observation errors from empirical measurement or random shocks. If we treat each B k (k = 1,..., K) as a discrete random variable with compact support and 2 < M < ∞ possible outcomes, we can express B k as B k = Σ m v km p km , where p km is the probability of the outcome v km . The probabilities must be non-negative and add up to one. Similarly, by treating each element ζ i of ζ as a finite and discrete random variable with compact support and 2 < J < ∞ possible outcomes centred around zero, we can express ζ i as ζ i = Σ j w ij r ij , where r ij is the probability of outcome w ij on the support space j, with j ∈ {1,..., J} and i ∈ {1,..., N}. Note that the term e (an estimator of ζ) can be fixed as a percentage of the explained variable, as an a priori Bayesian hypothesis. Posterior probabilities within the support space may display a non-Gaussian distribution. The element v km constitutes a priori information provided by the researcher, while p km is an unknown probability whose value must be determined by solving a maximum entropy problem. In matrix notation, let us rewrite β = V⋅P, with p km ≥ 0, and maximize the dual criterion function

H q (p, r) = α ⋅ (1 - Σ k Σ m p km ^q)/(q - 1) + (1 - α) ⋅ (1 - Σ i Σ j r ij ^q)/(q - 1)

subject to the data-consistency and adding-up constraints, where the real q, as previously stated, stands for the Tsallis parameter. The dual criterion function H q (p, r), weighted by α, is nonlinear and measures the entropy in the model. The estimates of the parameters and residuals are sensitive to the length and position of the support intervals of the β parameters.
When the parameters of the proposed model concern elasticities or error-correction coefficients, the values of which lie between 0 and 1, then the support space should be defined inside the interval between zero and one. In other cases, the support space may be defined between minus and plus infinity, according to the intuitive evaluation of the modeller. Additionally, within the same support interval, the model estimates and their variances will be affected by the number of support values (Golan et al., 1996). Increasing the number of point values inside the support space improves the a priori information about the system. A few years of modelling with the maximum entropy approach seem to show that a well-defined support space is crucial to obtaining better results. The weights α and (1 - α) are introduced into the above dual objective function. The first term, "of precision," accounts for deviations of the estimated parameters from the prior (defined on the support space). The second, "ex post prediction," accounts for the empirical error term as a difference between the predicted and observed data values of the model.
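The estimation problem can be sketched end to end. The program below (NumPy + SciPy, with simulated data, illustrative support spaces, and α = 0.5) maximizes an α-weighted Tsallis dual criterion subject to the data-consistency and adding-up constraints. It is a minimal sketch under our own assumptions (sample size, supports, solver settings), not the book's production code.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(42)

# Simulated data: y = X @ beta + noise, with true beta inside [0, 1].
n, K = 10, 2
X = rng.normal(size=(n, K))
beta_true = np.array([0.3, 0.7])
y = X @ beta_true + rng.normal(scale=0.1, size=n)

M, J = 3, 3
V = np.tile([0.0, 0.5, 1.0], (K, 1))       # parameter support inside [0, 1]
W = np.tile([-0.5, 0.0, 0.5], (n, 1))      # error support centred on zero

q, alpha = 1.5, 0.5

def unpack(z):
    P = z[:K * M].reshape(K, M)            # parameter probabilities
    R = z[K * M:].reshape(n, J)            # error probabilities
    return P, R

def neg_criterion(z):
    # alpha-weighted sum of Tsallis entropies of parameter and error probs.
    P, R = unpack(z)
    S_p = (1 - np.sum(P ** q)) / (q - 1)
    S_r = (1 - np.sum(R ** q)) / (q - 1)
    return -(alpha * S_p + (1 - alpha) * S_r)

def data_gap(z):
    # Data consistency: y_i = x_i' (V.P) + w_i' r_i must hold exactly.
    P, R = unpack(z)
    beta = np.sum(V * P, axis=1)
    return X @ beta + np.sum(W * R, axis=1) - y

constraints = [
    {"type": "eq", "fun": data_gap},
    {"type": "eq", "fun": lambda z: unpack(z)[0].sum(axis=1) - 1},
    {"type": "eq", "fun": lambda z: unpack(z)[1].sum(axis=1) - 1},
]
z0 = np.concatenate([np.full(K * M, 1 / M), np.full(n * J, 1 / J)])
res = minimize(neg_criterion, z0, method="SLSQP",
               bounds=[(1e-9, 1.0)] * (K * M + n * J),
               constraints=constraints, options={"maxiter": 1000})

P_hat, _ = unpack(res.x)
beta_hat = np.sum(V * P_hat, axis=1)       # point estimates beta = V.P
```

The entropy term pulls the estimates toward the centre of the support, while the data constraints pull them toward the least squares fit; tightening or widening the error support W shifts this balance, which is the sensitivity to the support space discussed above.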

Parameter Confidence Interval Area
In this section, we propose the normalized Tsallis entropy coefficient S(â k ) as an equivalent of the standard error measure of classical econometrics. An equivalent of the determination coefficient R² will also be introduced, under the entropy symbol S(Pr). The point of departure is that the maximum level of entropy-uncertainty is reached when no significant information-moment constraints are enforced. This leads to a uniform distribution of probabilities over the K states of the system. As we add each piece of informative data in the form of a constraint, a departure from the uniform distribution results, which means a lowering of uncertainty. Thus, the value of the proposed S(Pr) below reflects a global departure from maximum uncertainty for the whole model. Without giving superfluous theoretical details, we follow formulations in, e.g., Bwanakare (2014) and propose normalized non-extensive entropy measures S(â k ) and S(Pr).
From the Tsallis entropy definition, S q vanishes (for all q) in the case M = 1; for M > 1 and q > 0, whenever one of the p i (i = 1,..., M) equals unity, the remaining probabilities, of course, vanish and S q is again zero. We get a global, absolute maximum of S q (for all q) in the case of a uniform distribution, i.e., when all p i = 1/M. Recall that the distribution P m = p m ^q / Σ m p m ^q is referred to as the escort probabilities; for q = 1, P m reduces to p m (normalized to unity), that is, to the case of Gaussian distribution (Gell-Mann & Tsallis, 2004), (Tsallis, 2009). For our economic analysis, we are interested in q values lying inside the interval (1, 5/3). In such an instance, we have for our two systems:

S q (p) = (M^(1-q) - 1)⋅(1 - q)⁻¹ (2.51)

and

S q (r) = (N^(1-q) - 1)⋅(1 - q)⁻¹ (2.52)

In the limit q = 1, relation (2.51) or (2.52) leads to the Boltzmann-Shannon expression (Gell-Mann & Tsallis, 2004). Below, a normalized entropy index is suggested, in which the numerator stands for the calculated entropy of the system while the denominator displays the maximum entropy of the system, owing to the equiprobability property:

S(â k ) = [(1 - Σ m p̂ km ^q)/(q - 1)] / [(M^(1-q) - 1)/(1 - q)] (2.53)

with k varying from 1 to K (the number of parameters of the system) and m belonging to M (the number of support space points), M > 2. S(â k ) then reports the accuracy of the estimated parameters. Equation (2.54) reflects the non-additivity property of Tsallis entropy for two (plausibly) independent systems: the first, the parameter probability distribution, and the second, the error disturbance probability distribution (plausibly with quasi-Gaussian properties):

S(Pr) = S(p̂) + S(r̂) + (1 − q)·S(p̂)·S(r̂) (2.54)

where S(Pr) is then the sum of the normalized entropy related to the parameters of the model, S(p̂), and to the disturbance term, S(r̂). Likewise, the latter value S(r̂) is derived over all observations n, with J the number of points on the support space of the estimated probabilities r related to the error term. The values of these normalized entropy indexes S(â_k) and S(Pr) vary between zero and one. Values near one indicate a poorly informative variable, while lower values indicate a better informative parameter estimate â_k with respect to the model. The next part of the book will present in detail the national accounts tables used for building or forecasting macroeconomic models. The statistical theory will be applied particularly in the case of the inverse problem, in keeping with this work's objective.
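To make these indexes concrete, the sketch below computes the Tsallis entropy, its uniform-distribution maximum (2.51-2.52), the normalized index (2.53), and the pseudo-additive composition (2.54). The q value and the two probability vectors are purely illustrative, and normalizing the composed entropy by the composed maximum is one plausible reading of how S(Pr) stays inside [0, 1]:

```python
import numpy as np

def tsallis_entropy(p, q):
    """S_q(p) = (1 - sum p_i^q)/(q - 1); the q -> 1 limit is Shannon entropy."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                       # drop zero cells (0 * log 0 convention)
    if abs(q - 1.0) < 1e-12:
        return -np.sum(p * np.log(p))  # Boltzmann-Shannon limit
    return (1.0 - np.sum(p ** q)) / (q - 1.0)

def max_entropy(M, q):
    """Uniform-distribution maximum (M**(1-q) - 1)/(1 - q), cf. (2.51)-(2.52)."""
    return (M ** (1.0 - q) - 1.0) / (1.0 - q)

q = 1.5                                   # inside the interval (1, 5/3)
p = np.array([0.5, 0.3, 0.15, 0.05])      # parameter probabilities (invented)
r = np.array([0.25, 0.5, 0.25])           # error probabilities (invented)

# Normalized indexes, Eq. (2.53): calculated entropy over its maximum
S_p = tsallis_entropy(p, q) / max_entropy(len(p), q)
S_r = tsallis_entropy(r, q) / max_entropy(len(r), q)

# Pseudo-additive composition, Eq. (2.54), applied to the raw entropies,
# then normalized by the composed maximum (an assumed normalization):
def compose(a, b, q):
    return a + b + (1.0 - q) * a * b

S_pr = compose(tsallis_entropy(p, q), tsallis_entropy(r, q), q) \
       / compose(max_entropy(len(p), q), max_entropy(len(r), q), q)
print(S_p, S_r, S_pr)  # each lies in [0, 1]; values near 1 mean high uncertainty
```

At the uniform distribution the normalized index equals one exactly, and it shrinks as the distribution concentrates, which is the behaviour the text assigns to S(â_k).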

An Application Example: A Maximum Tsallis Entropy Econometrics Model for Labour Demand
This example presents, through Monte Carlo simulations, a model of labour demand adjustment for the Polish private sector. It constitutes an extension of an initial model presented by Bwanakare (2010) for labour demand adjustment by the private sector of the Subcarpathian province in Poland. The model aims at displaying short-run and long-run relationships between labour demand determinants through an error-correction process. Owing to the relatively short period of the sample (fourteen annual data points) and the autoregressive nature of the model, we may have to deal with limited possibilities of statistical inference in the absence of convergence properties or, in the worst case, with an ill-behaved inverse problem. Thus, traditional methods of parameter estimation may fail to be effective. We therefore propose to apply the generalized maximum Tsallis entropy econometric approach as an extension of the Jaynes-Shannon-Gibbs information-theoretic entropy formalism already applied in econometrics (Golan, Judge & Miller, 1996). Given the annual frequency of the sample, the approach proves applicable to cases of classical econometrics where only a small, low-frequency data sample is available. Such a small data sample should display a distribution with quasi-Gaussian tails. Through this application, Monte Carlo experiment outputs seem to confirm the reliability of the Tsallis entropy econometrics approach, which in this particular case performs as well as the generalized least squares technique.

Theoretical Expectation Model
In the short run, managers decide on the number of employees to be hired (or dismissed) in accordance with the expected long-run optimal level of production. However, because of institutional or economic reasons, that optimal number is not hired (or fired) at once. First, uncertainty remains a predominant characteristic of business. For this reason, employers naturally prefer a moderate and progressive adjustment of recruited workers to the targeted optimal level. Recruitment in some economic sectors could be time-consuming as well, especially when searching for good specialists. Second, relatively well organized trade unions could prevent employers from abrupt, large-scale layoffs, or the cost of dismissing a worker may become high, depending on prevailing labour laws at a given period. In both cases, the process of shock correction will be more or less long, depending on its origin and magnitude.
Under classical assumptions of constant returns to scale, ex ante and ex post complementarity of factors, and a constant long-run rate of labour productivity, the desired level of labour demand L*_t is a function of the output Y_t and of technical progress t (Equation 2.55). Assuming that labour demand adjusts to its targeted level through an error-correction model:

log(L_t/L_{t-1}) = λ·log(L*_t/L*_{t-1}) + μ·log(L*_{t-1}/L_{t-1}), (2.56)

combining (2.55) and (2.56) leads to Equation (2.57). The parameter λ measures the impact of output on labour demand and is thus a short-run elasticity of labour demand with respect to output Y_t, μ being the error-correction parameter. Since the relation −1 ≤ μ ≤ 0 should prevail, the equilibrium error is only partly adjusted in each period. In other words, this parameter synthesizes employers' determinants of labour demand adjustment once a shock in sales for the coming period is expected.
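As a sketch of how rule (2.56) governs adjustment, the short simulation below applies the error-correction equation in logs. All numbers are invented; the adjustment coefficient 0.4 on the lagged gap corresponds to μ = −0.4 in the equivalent form μ·(log L_{t−1} − log L*_{t−1}):

```python
import numpy as np

lam, adj = 0.7, 0.4      # illustrative: short-run elasticity, correction speed
T = 30
lstar = np.log(100.0) + 0.02 * np.arange(T)   # target log-employment, 2% trend
l = np.empty(T)
l[0] = np.log(80.0)                           # start below the targeted level
for t in range(1, T):
    # Eq. (2.56): log(L_t/L_{t-1}) =
    #   lam*log(L*_t/L*_{t-1}) + adj*log(L*_{t-1}/L_{t-1})
    l[t] = l[t - 1] + lam * (lstar[t] - lstar[t - 1]) \
                    + adj * (lstar[t - 1] - l[t - 1])

gap = lstar - l          # equilibrium error, only partly closed each period
print(gap[0], gap[-1])   # the gap shrinks toward a small steady-state value
```

Because |μ| < 1, the initial disequilibrium decays geometrically at rate (1 − |μ|) per period, while a growing target leaves a small permanent gap of (1 − λ)·g/|μ| when output trends at rate g.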

General Model
We are presently interested in estimating the parameters of a Podkarpacki labour demand model, applying the generalized non-extensive entropy econometric approach. Following Golan, Judge & Miller (1996) and Bwanakare (2014a, 2014b), we reparametrize, in the first step, the generalized linear model before fitting it to Equation (2.48). This step allows the moment-constraint equations to include the same probability variables as those optimized in the criterion function.
To reparametrize the model, we follow each equation in (2.45-2.46), where each β_k (k = 1,…,K) is treated as a discrete random variable with compact support and 2 < M < ∞ possible outcomes. Next, to estimate the model, we maximize the entropy criterion function in (2.47) under the moment and normalization restrictions presented in (2.48-2.50). For confidence-area analysis, we apply Equations (2.53-2.54).
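The reparametrization-and-maximization steps just described can be sketched numerically. The following is a minimal Shannon-Gibbs (q → 1) version on invented toy data, with illustrative supports and scipy's SLSQP solver standing in for the book's GAMS/PATHNLP setup, not the actual labour demand model:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Toy data: y = X @ beta + noise, with true beta inside the supports below
N, K = 10, 2
X = np.column_stack([np.ones(N), rng.uniform(0, 1, N)])
beta_true = np.array([0.3, 0.7])
y = X @ beta_true + rng.normal(0, 0.05, N)

z = np.array([0.0, 0.5, 1.0])        # parameter support (elasticities in [0, 1])
v = np.array([-0.3, 0.0, 0.3])       # error support (illustrative)
M, J = len(z), len(v)

def unpack(x):
    p = x[:K * M].reshape(K, M)      # parameter probabilities, one row per beta_k
    w = x[K * M:].reshape(N, J)      # error probabilities, one row per observation
    return p, w

def neg_entropy(x):                  # Shannon-Gibbs criterion (q -> 1 case)
    return np.sum(x * np.log(x))

cons = [  # moment restrictions: y_n = sum_k x_nk (z @ p_k) + v @ w_n
    {"type": "eq",
     "fun": lambda x: (lambda p, w: X @ (p @ z) + w @ v - y)(*unpack(x))},
    # normalization restrictions: probabilities sum to one
    {"type": "eq", "fun": lambda x: unpack(x)[0].sum(axis=1) - 1.0},
    {"type": "eq", "fun": lambda x: unpack(x)[1].sum(axis=1) - 1.0}]

x0 = np.full(K * M + N * J, 1.0 / M)   # start near the uniform distribution
res = minimize(neg_entropy, x0, constraints=cons,
               bounds=[(1e-8, 1.0)] * len(x0), method="SLSQP")
p_hat, w_hat = unpack(res.x)
beta_hat = p_hat @ z                   # recovered parameters
print(beta_hat)
```

Since each β_k is a convex combination of its support points, the estimates stay inside [0, 1] by construction; that is the sense in which the support space encodes a priori information.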
To improve the quality of the estimated parameters, one can add further a priori restrictions to (2.48-2.50): we constrain the error term e to sum to zero (2.58), which adds the desirable property of an unbiased parameter estimator.
The property of efficiency mainly depends upon the informative quality of both the prior (the support space) and the model (the econometric equation). When this quality is poor, the estimated values p̂_i from the model tend to be equal for all i, i.e., the case of a uniform distribution.
Following economic theory, we constrain the elasticity parameters within a point support space between zero and one. As is known (e.g., Golan, Judge & Miller, 1996), sharper support-space points for a parameter act as higher-quality a priori information. Furthermore, this allows computations for this nonlinear model to converge promptly to a global optimum solution. Likewise, we may add further economic restrictions on the parameters of model (2.57), which leads to the following formulations:

Estimated Confidence Area Of Parameters
In classical econometrics, we usually combine the variance of the random model error with the collinearity level of the explanatory variables to determine the standard error of the estimated parameters and to infer their confidence area, assuming a normal distribution law for the random errors. This is particularly true in the case of the least squares approach for a linear model. In entropy econometrics, the approach is very different. We use the normalized entropy S(â_k) (Equation 2.53) as an equivalent of the estimated standard error measure of the classical linear econometric model. Likewise, the equivalent of the coefficient of determination R² is S(Pr) (Equation 2.54). Following Golan et al. (1996a, 1996b, 1996c, 2002) and Soofi (1992, 1994), in the maximum entropy formulation the maximum level of entropy-uncertainty results when the information-moment constraints are not enforced. This leads to a uniform distribution of probabilities over the k states of the system. As each piece of informative data is added in the form of a constraint, a departure from the uniform distribution results, which reflects a reduction in uncertainty. Thus, the value of S(P) reflects a global departure from maximum uncertainty for the whole model. A related measure, 1 − S(P), called the information index, expresses the informative content of the model. For theoretical details, we refer the reader to the formulations presented above in Equations (2.51-2.54) or, e.g., to Golan et al. (1996).
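The mechanism described here, that each enforced moment constraint pulls the solution away from uniformity and raises the information index, can be sketched with a single Shannon-case moment constraint (the five-point support and the target mean are invented; the exponential-tilting form of the constrained maximum entropy solution is standard):

```python
import numpy as np
from scipy.optimize import brentq

# Sketch: enforcing one moment constraint moves the maximum entropy
# distribution away from uniformity and lowers normalized entropy S(P).
x = np.arange(1.0, 6.0)          # five states (illustrative support)
target_mean = 2.0                # an informative moment (the uniform mean is 3.0)

def mean_given_lam(lam):
    # Constrained max-entropy solution has the tilted form p_i ~ exp(lam * x_i)
    p = np.exp(lam * x)
    p /= p.sum()
    return p @ x

lam = brentq(lambda l: mean_given_lam(l) - target_mean, -10.0, 10.0)
p = np.exp(lam * x)
p /= p.sum()

H = -np.sum(p * np.log(p))       # Shannon entropy of the constrained solution
S_norm = H / np.log(len(x))      # normalized entropy S(P) in [0, 1]
print(S_norm, 1 - S_norm)        # information index 1 - S(P) is now positive
```

With no constraint the solution is uniform, S(P) = 1 and the information index is zero; the binding moment constraint makes 1 − S(P) strictly positive.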

Data and Model Outputs
In this section, the output parameters of Tsallis entropy, Shannon entropy, and least squares econometric models are presented. Next, the obtained results will be compared to those from a Monte Carlo simulation using the same data.
Data used in the model (Equation 2.57) come from the Polish Central Statistical Office (GUS) and cover the period 1997-2010. Parameters of the model have been computed in GAMS (General Algebraic Modelling System) code with the incorporated solver PATHNLP. We have noticed, through different simulations, that the Shannon-Gibbs entropy model seems more sensitive to initial conditions (the support space of the parameters in particular) than the Tsallis entropy model. This is a useful property, particularly when no economic theory exists to suggest the starting parameters with which to begin. Parameter estimation by the robust-standard-errors least squares (LS) approach has been carried out using the free Gretl software (http://gretl.sourceforge.net/); the HAC estimator is used for heteroscedasticity and autocorrelation correction.

a) Parameter outputs of the Tsallis entropy model: Throughout the many experiments conducted, we have observed the coefficient S(Pr) to be very sensitive to the weighting parameters α in the objective function. The Tsallis-q value is itself influenced by these weights; its values close to or above 5/3 correspond to meaningless information index coefficients, for which S(Pr) vanishes. In empirical research, the Tsallis-q coefficient may take much higher values as a consequence of model linearity attributes or when the sample is small. In the present case, we have noticed a high sensitivity of the Tsallis-q parameter to changes in the weight α_i in the criterion function: the higher the weight α_i, the higher the value of the Tsallis-q parameter. We have retained the value of this weight for which I[S(Pr)] is the highest.
b) Parameter outputs of the Shannon-Gibbs entropy model: Three parameters are different from zero at the 1% level, and the one on the variable a_0 is significant at the 10% level. This precision of the estimated parameters, from such a small data sample of an autoregressive model, suggests the presence of co-integrated (at the same order) variables L_t and Y_t. Such a situation leads to super-consistency of the estimated parameters. For comparative purposes, Table 3 presents outputs from Monte Carlo experiments (computed with Matlab 7.3.0 software).
The above outputs have been derived under the hypothesis of a random normal law. The empirical standard error initially computed for random value generation is 0.02035, which constitutes 50% of the standard error of the observed endogenous variable. We observe that the Shannon and Tsallis entropy outputs and the least squares outputs are similar and almost reflect the Monte Carlo convergence outputs. The initial Student's t values related to the parameters on the variables y_t/y_{t-1} and y_{t-1}/L_{t-1} decrease when we carry out 5,000 simulations and remain practically unchanged up to the 25,000-simulation experiment. Nevertheless, we observe that the parameter estimates of the model remain unchanged irrespective of the number of simulations.
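The stability claim can be illustrated with a toy Monte Carlo experiment of the same shape: regenerate the error term many times and check that the mean estimate barely moves between 5,000 and 25,000 replications. Everything here is invented except the 0.02035 error scale and the 14-observation sample size quoted above, and plain OLS stands in for the estimators compared in the text:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical Monte Carlo check of estimate stability across replication counts
N = 14                                    # small annual sample, as in the text
X = np.column_stack([np.ones(N), np.linspace(0.0, 1.0, N)])
beta_true = np.array([0.1, 0.71])         # illustrative "true" parameters
sigma = 0.02035                           # empirical error scale from the text

def mc_mean(reps):
    """Mean OLS estimate over `reps` regenerated samples."""
    est = np.empty((reps, 2))
    for r in range(reps):
        y = X @ beta_true + rng.normal(0.0, sigma, N)
        est[r] = np.linalg.lstsq(X, y, rcond=None)[0]
    return est.mean(axis=0)

print(mc_mean(5_000))    # mean estimates already close to beta_true
print(mc_mean(25_000))   # ...and practically unchanged with more replications
```

Because the Monte Carlo standard error of the mean shrinks like 1/sqrt(reps), once it is far below the per-replication sampling error, adding replications changes essentially nothing, which matches the behaviour reported in the text.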
To conclude, we note the close similarity of the outputs from the three models. This suggests that we are dealing with a case of power-law convergence to the Gaussian distribution. The Tsallis-q parameter, if too high, cannot be interpreted in a model where its nonlinearity and the small sample size (in this case, 14 annual observations) should have a significant impact on the value of that parameter (Grech & Pamula, 2013). The impact parameter is around 0.71; this means, on average, a 0.71% growth of labour demand when gross profits rise by 1%. As has been indicated, these outputs relate to a period (1997-2010) during which Poland was undergoing structural post-communist reforms; as such, they should be interpreted carefully. As far as exogenous technical progress is concerned, we observe a negative sign on the estimated parameter β on the symptomatic variable t, which indicates the expected adverse impact of technical progress on labour demand.

The discrimination criterion for independence concerns the comparison of p(x, y) with p_o(x, y) ≡ h_1(x)h_2(y). Once again, the one-dimensional random variables x and y are independent if and only if p(x, y) = p_o(x, y). The criterion therefore generalizes the usual discrimination measure; when q → 1, we recover:

∫dx dy p(x, y) ln p(x, y) − ∫dx h_1(x) ln h_1(x) − ∫dy h_2(y) ln h_2(y) ≥ 0.
An interesting case arises when q → 2. The value of this quantity, useful in economics, may signal independence between x and y when it vanishes.
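A discrete analogue of this discrimination criterion can be sketched as follows; the joint distributions are invented. The q-generalized divergence used here, (Σ p^q · p_o^(1−q) − 1)/(q − 1), is assumed as the discrete counterpart of the criterion above: it recovers the Kullback-Leibler form as q → 1 and a quadratic quantity at q = 2, and it vanishes exactly when p(x, y) = h_1(x)h_2(y):

```python
import numpy as np

def tsallis_divergence(p, p0, q):
    """Generalized discrimination D_q(p||p0) = (sum p^q * p0^(1-q) - 1)/(q - 1);
    the q -> 1 limit is the Kullback-Leibler criterion."""
    p, p0 = np.asarray(p, float).ravel(), np.asarray(p0, float).ravel()
    mask = p > 0                      # 0-probability cells contribute nothing
    if abs(q - 1.0) < 1e-12:
        return np.sum(p[mask] * np.log(p[mask] / p0[mask]))
    return (np.sum(p[mask] ** q * p0[mask] ** (1.0 - q)) - 1.0) / (q - 1.0)

# Joint distributions of two binary variables (invented for illustration)
p_dep = np.array([[0.4, 0.1],
                  [0.1, 0.4]])             # dependent case
p_ind = np.outer([0.5, 0.5], [0.3, 0.7])   # independent by construction

for p in (p_dep, p_ind):
    hx, hy = p.sum(axis=1), p.sum(axis=0)  # marginals h1(x), h2(y)
    p0 = np.outer(hx, hy)                  # product distribution p_o(x, y)
    print(tsallis_divergence(p, p0, 1.0), tsallis_divergence(p, p0, 2.0))
# The criterion is positive for the dependent table and vanishes (for any q)
# exactly when p(x, y) = h1(x) * h2(y).
```

In the dependent case both the q = 1 and q = 2 values are strictly positive, while the product-form table gives zero for both, matching the independence characterization stated in the text.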