Shannon entropy and particle decays

We apply Shannon's information entropy to the distribution of branching fractions in a particle decay. This serves to quantify how important a newly reported decay channel is, from the point of view of the information that it adds to the already known ones. Because the entropy is additive, one can subdivide the set of channels and discuss, for example, how much information the discovery of a new decay branching would add; or subdivide the decay distribution down to the level of individual quantum states (which can be quickly counted through the phase space). We illustrate the concept with some examples of experimentally known particle decay distributions.


Introduction
Shannon entropy [1] has found applications in all data-intensive fields of science; a recent review [2] with focus on heavy ion collisions provides an ample reference list and we refer the interested reader there. This information entropy measures the uncertainty associated with a random variable or, equivalently, the average information that is missing while the value taken by the variable remains unknown.
The decay width of an unstable particle can be decomposed into a sum over the partial widths for each of its possible decay channels, $\Gamma = \sum_j \Gamma_j$. We can also characterize the decay by the branching ratios $BR_j = \Gamma_j/\Gamma$, and as their sum is unity, $\sum_j BR_j := \sum_j \Gamma_j/\Gamma = 1$, they provide a probability distribution for the various decay channels ($j = 1, \dots, N$).
This makes the information entropy of particle decay distributions,
$$S = -\sum_{j=1}^{N} BR_j \log BR_j , \qquad (1)$$
a well-posed observable to compute (see footnote 1), and we will evaluate it with actual data from meson and gauge boson decays, all taken from [3]. The maximum value of the entropy (for a decay distribution with a fixed number of channels) is reached when they are all equally likely, that is, $BR_j = 1/N$ (because $\sum_{j=1}^{N} BR_j = 1$ and $BR_i = BR_j$ for all $i, j$ in this case). Then,
$$S_{\max} = -\sum_{j=1}^{N} \frac{1}{N} \log\frac{1}{N} = \log N . \qquad (2)$$
The minimum value is simply 0 and is reached when one channel concentrates all the probability, $BR_1 \to 1$, $BR_j \to 0$ for $j > 1$. Thus, we are looking at a variable $S \in [0, \log N]$ that characterises how disordered the decay products are, that is, how difficult it is to predict the particular outcome of one decay event.
Footnote 1: The quantity of Eq. (1) is named entropy in analogy to the Gibbs mixing entropy, the increase in thermodynamic entropy $S = k_B \ln \Omega$ obtained upon mixing two gases totalling $N$ molecules, with partial molar fractions $x_1$ and $x_2$, $\Delta S = -N k_B (x_1 \ln x_1 + x_2 \ln x_2)$.
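As a minimal numerical sketch of the definition, the following snippet evaluates $S = -\sum_j BR_j \log BR_j$ (natural logarithm) and checks its two limiting values, $\log N$ for equally likely channels and 0 for a single dominant channel. The function name `shannon_entropy` and the three distributions are illustrative assumptions, not measured branching fractions.

```python
import math

def shannon_entropy(branching_ratios):
    """Shannon entropy S = -sum_j BR_j ln BR_j, in nats.

    Channels with BR_j = 0 contribute nothing (the limit x ln x -> 0).
    """
    total = sum(branching_ratios)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("branching ratios must sum to 1")
    return sum(-br * math.log(br) for br in branching_ratios if br > 0.0)

# Illustrative (not measured) distributions with N = 4 channels.
uniform = [0.25, 0.25, 0.25, 0.25]   # all channels equally likely
peaked = [0.97, 0.01, 0.01, 0.01]    # one dominant channel
single = [1.0, 0.0, 0.0, 0.0]        # one channel takes everything

for name, dist in [("uniform", uniform), ("peaked", peaked), ("single", single)]:
    print(f"{name:8s} S = {shannon_entropy(dist):.4f}  (max = {math.log(len(dist)):.4f})")
```

The uniform case saturates the bound $\log N$, while the peaked distribution stays close to zero, in line with the interpretation of $S$ as the difficulty of predicting the outcome of one decay event.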
One can define the "information" function as the negative of the logarithm of the branching ratio, that is, $I_j := -\log BR_j$. The entropy is then the average of that function over the distribution of decay channels,
$$S = \langle I \rangle = \sum_{j=1}^{N} BR_j I_j = -\sum_{j=1}^{N} BR_j \log BR_j .$$
Interpreting $I_j$ as the information obtained when a given particle decay proceeds through channel $j$ (a quantity associated to a given decay), $S$ is then the average information in the distribution of the random decay process (a quantity associated to all the decays, that is, to the decaying particle itself).
Shannon entropy has been used before in other contexts in particle physics. Early works concentrated on the information entropy produced upon parton splittings (e.g. in jet emission) [4,5]. The concept has also been applied to study various fragment ratios after a heavy ion collision [6].
Soon after the Higgs boson discovery, d'Enterria [7] observed that the Higgs sits in the window of maximum entropy of its decay distribution: were it heavier, around 200 GeV, above the $WW$ and $ZZ$ threshold, these two vector-boson channels would dominate the decay, with the rest of the Standard Model having much smaller branching fractions (and thus the entropy would be much smaller). As it is, at 125 GeV these boson channels are kinematically closed, and this makes the width small but the entropy large (as all the allowed Standard Model decays have to share the small total width).
Alves, Dias and da Silva [8] then introduced a "Maximum entropy principle" elevating that observation about the Higgs to a more general principle, to try to predict the hypothetical axion mass [9,10].
Even if the principle does not hold as a law of nature, the observation about the Higgs is sound and intriguing, and helps understand why its discovery happened so late in the development of high energy physics.
In this article and in two companion proceedings publications [11,12] we present numerous examples of computing the Shannon entropy of decaying mesons of multiple quark-flavor compositions, and of decaying electroweak bosons, and explore several new features of this variable and two related ones.

Unknown decay channels
In practice, many particles have complicated, multibody decays, so one does not always know the entire decay distribution. In that case, $\sum_{j=1}^{N} BR_j < 1$ (with $BR_j < 1$). The discovery of new channels brings the sum closer to one, and the entropy increases. Nevertheless, if additional channels have a small branching fraction, their contribution to the entropy turns out to be negligible, and the entropy saturates. This can be seen in figures 1 and 2. The error bars are computed from the experimental uncertainties $\Delta BR_j$, which have been added linearly and not in quadrature, as determinations of different branching fractions of the same particle are often strongly correlated measurements.
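The saturation described above can be sketched numerically by accumulating the partial entropy $-\sum_{j \le N} BR_j \log BR_j$ channel by channel, in order of decreasing branching fraction. The linear error estimate $\Delta S = \sum_j |\log BR_j + 1|\, \Delta BR_j$ used below is a first-order propagation assumed for this illustration (the derivative of $-x \log x$ is $-(\log x + 1)$); the exact procedure behind the error bars in the figures may differ. The function name and the input values are hypothetical.

```python
import math

def partial_entropy_curve(channels):
    """Accumulate entropy and a linearly added uncertainty channel by channel.

    channels: list of (BR, dBR) pairs; they are sorted by decreasing BR.
    Returns a list of (N, S_N, dS_N) tuples.
    """
    ordered = sorted(channels, key=lambda c: c[0], reverse=True)
    curve = []
    s, ds = 0.0, 0.0
    for n, (br, dbr) in enumerate(ordered, start=1):
        s += -br * math.log(br)
        # Linear (not quadrature) sum of first-order error contributions.
        ds += abs(math.log(br) + 1.0) * dbr
        curve.append((n, s, ds))
    return curve

# Hypothetical branching fractions with uncertainties; they sum to less than 1.
channels = [(0.02, 0.004), (0.55, 0.03), (0.30, 0.02), (0.005, 0.002), (0.08, 0.01)]
curve = partial_entropy_curve(channels)
for n, s, ds in curve:
    print(f"N = {n}: S = {s:.4f} +/- {ds:.4f}")
```

With these inputs the last, smallest channel changes the entropy only marginally, which is the saturation behaviour discussed in the text.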
The first obvious approximation that we can perform is to bunch all unknown decay channels into just one, with branching ratio equal to the part missing to reach 1 from the already known branching ratios at hand.
Figure 1: Shannon entropy $S$ for the decay distribution of the $\chi_{c1}(1P)$ charmonium as a function of the number of channels included, $N$. Channels are added from left to right in order of decreasing branching fractions. (A further plot that assesses the effect of removing radiative decays from the distribution can be found in [11].)
Figure 2: Shannon entropy $S$ for the decay distribution of the mesons $\omega(782)$ and $\Upsilon(3S)$ (that can decay through the strong force), and for the $\Lambda$ baryon (that decays weakly). The OX axis shows the number of channels included, $N$; the OY axis, the resulting entropy. (A further plot showing the effect of removing the $\Upsilon(3S)$ radiative decay channels can be found in [11].)
The entropy is then, as a particular case of Eq. (1),
$$S \simeq -\sum_{j=1}^{N} BR_j \log BR_j - \Big(1 - \sum_{j=1}^{N} BR_j\Big) \log\Big(1 - \sum_{j=1}^{N} BR_j\Big) .$$
In view of Eq. (6) below, this formula must underestimate the true entropy. As a way to estimate the error incurred, one could perhaps use the Kullback-Leibler divergence. This states that the difference between the entropy and its estimate is given by
$$\Delta S = -\sum_{k=N+1}^{M} BR_k \log\frac{BR_k}{BR_u} ,$$
where $N$ of the $M$ channels are known and the number $M$ can be obtained from other considerations (for example, by studying which channels are open given phase space and conservation laws). The $BR_k$, $N+1 \le k \le M$, are the unknown branching ratios and $BR_u := \sum_{k=N+1}^{M} BR_k = 1 - \sum_{j=1}^{N} BR_j$. For the purpose of the uncertainty estimate, they can be taken equal to each other, say $BR_k = BR_u/(M-N)$. We will not pursue these theoretical error estimates any further, but content ourselves with propagating the experimental uncertainty $BR_j \pm \Delta BR_j$ to the entropy.
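A short sketch of this approximation follows. It lumps the missing probability $1 - \sum_j BR_j$ into a single extra channel and, under the equal-split assumption for the unknown channels, evaluates the amount by which the lumped estimate falls short, which then reduces to $(1 - \sum_j BR_j) \log(M - N)$. The function names, the known branching ratios and the value of $M$ are hypothetical choices made for the example.

```python
import math

def entropy_with_lumped_remainder(known):
    """Entropy of the known channels plus one channel carrying the remainder."""
    br_unknown = 1.0 - sum(known)
    if br_unknown < -1e-12:
        raise ValueError("known branching ratios exceed 1")
    s = sum(-br * math.log(br) for br in known if br > 0.0)
    if br_unknown > 0.0:
        s += -br_unknown * math.log(br_unknown)
    return s

def equal_split_correction(known, total_channels):
    """Entropy missed by lumping, if the remainder is shared equally
    among the M - N unknown channels: BR_u * ln(M - N)."""
    n_unknown = total_channels - len(known)
    if n_unknown < 1:
        raise ValueError("total_channels must exceed the number of known channels")
    br_unknown = 1.0 - sum(known)
    return br_unknown * math.log(n_unknown)

# Hypothetical example: N = 3 known channels out of an assumed M = 8.
known = [0.6, 0.25, 0.05]
estimate = entropy_with_lumped_remainder(known)
correction = equal_split_correction(known, total_channels=8)
print(f"lumped estimate   S = {estimate:.4f}")
print(f"equal-split bound S = {estimate + correction:.4f}")
```

The correction is never negative, so the lumped formula underestimates the entropy, as stated in the text.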
As seen in figures 1, 2 and 3, the entropy increases monotonically upon adding more channels, but saturates into what seems at most logarithmic growth (which matches expectations from $\max(S) = \log N$). The second plot of figure 3 displays the entropy of certain kaon resonance decays not in terms of the number of channels included, but in terms of the sum of the branching ratios accounted for up to the given channel, so the axis of abscissae ends at precisely 1.

Entropy additivity and phase space
There is a difficulty in analysing a given hadron decay chain: at which stage to count the final products. Does one have to descend all the way to stable particles, p, e and ν? Is it sufficient to stay at the level of particles unable to decay strongly, so that π, K, etc. are considered final products? Or should one stop right away at the level of unstable hadron resonances with the short lifetimes characteristic of the strong force, of order $10^{-23}$ seconds? Figure 4 shows an exercise where we study the decay of several excited kaons through another intermediate kaon and down to final products that are stable under the strong force, considering as an example the various subchannels into which the $K^*$ resonance decays.
Figure 4: Shannon entropy for the decay distribution of several kaon excitations, counting only primary decays to particles that are themselves unstable, or following the decay chain one more step to include secondary decays.
The entropy of the secondary decay chain (lines shown with hollow symbols) is quite different from that of the primary decay chain alone (plain lines), since the secondary particles can decay through several channels, so a decision needs to be taken as to how far to descend in the decay tree. For most applications in the strong interactions, when clear resonances can be identified one should stay at the first level (e.g. discount ρπ from a given πππ branching fraction).
Nevertheless, it is worth recalling a basic property of Shannon entropy, namely its additivity. If each of the branching ratios at the first level, $BR_j$, counts the joint probability of routing the decay into certain subchannels $j_1, j_2, \dots, j_{m_j}$, the entropy at the second level, which includes all subchannels, can be obtained from the entropy at the first level in terms of the $BR_j$ and the entropy of each subdivision as follows,
$$S(BR_{1_1}, \dots, BR_{N_{m_N}}) = S(BR_1, \dots, BR_N) + \sum_{j=1}^{N} BR_j \, S\Big(\frac{BR_{j_1}}{BR_j}, \dots, \frac{BR_{j_{m_j}}}{BR_j}\Big) . \qquad (6)$$
The first function is the entropy at the higher level (where the subchannels are all bunched in one) and the second is a sum, over each of the primary channels, of the entropy within each of them weighted with its overall probability $BR_j$. That second term explains the difference between the lines with and without symbols in figure 4.
Quite surprisingly, the decay entropies are not always so different. This is highlighted by the decay entropy of the $J/\psi$ in figure 5. In that case, the entropy accumulated when describing the decay in terms of intermediate quarks and gluons is similar to that accumulated when employing the identified final-state hadrons. The latter is a bit larger, consistent with Eq. (6), but the experimental errors propagated from the uncertainty in the branching fractions are much larger than the difference. As a matter of principle, the lowest level to which one can descend in a decay chain is that of individual quantum states, to which we turn next.
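The additivity property can be verified numerically: the entropy of the fully subdivided distribution equals the first-level entropy plus the sum, weighted by each $BR_j$, of the entropies of the normalised subchannel distributions. The decay tree below is hypothetical (labels and numbers are illustrative only); each primary channel lists the absolute branching ratios of its subchannels.

```python
import math

def entropy(probs):
    """Shannon entropy in nats; zero-probability entries are skipped."""
    return sum(-p * math.log(p) for p in probs if p > 0.0)

# Hypothetical two-level decay tree (absolute branching ratios of subchannels).
tree = {
    "A": [0.40, 0.20],
    "B": [0.25],
    "C": [0.05, 0.05, 0.05],
}

primary = [sum(subs) for subs in tree.values()]       # first-level BR_j
flat = [p for subs in tree.values() for p in subs]    # all subchannels

s_first = entropy(primary)
# Internal entropy of each primary channel, weighted by its probability BR_j.
s_internal = sum(
    br_j * entropy([p / br_j for p in subs])
    for br_j, subs in zip(primary, tree.values())
)
s_second = entropy(flat)

print(f"first level        : {s_first:.6f}")
print(f"weighted internal  : {s_internal:.6f}")
print(f"second level       : {s_second:.6f}")
print(f"first + internal   : {s_first + s_internal:.6f}")
```

Since the internal term is non-negative, descending one level in the decay tree can only increase the entropy, or leave it unchanged when no primary channel is subdivided.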

Entropy in terms of phase space
The partial decay width can be written in terms of the invariant Feynman amplitude and the Lorentz-invariant phase space, which counts the number of available quantum states (thus, the decay distribution cannot be subdivided any further). Restricting ourselves to two-body decay channels, the integrated phase space $\rho_j$ of each channel follows, with center-of-mass kinematics, from a well-known closed-form relation.
Figure 6: Entropy $S$ as a function of the accumulated phase space $\sum_{j=1}^{N} \rho_j$ upon including successive two-body decay channels of the electroweak Z boson, ordered from larger to smaller branching fraction.
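For orientation, a two-body phase-space factor can be computed as sketched below. The center-of-mass momentum follows from the Källén function, $|\mathbf{p}| = \sqrt{\lambda(M^2, m_1^2, m_2^2)}/(2M)$, and the integrated phase space is taken here as $|\mathbf{p}|/(4\pi M)$. That normalisation is one common convention and is an assumption of this sketch, as are the function names and the daughter masses used in the example; the normalisation actually used for $\rho_j$ in figure 6 is not specified in the text.

```python
import itertools
import math

def kallen(a, b, c):
    """Kallen triangle function lambda(a, b, c)."""
    return a * a + b * b + c * c - 2.0 * (a * b + a * c + b * c)

def two_body_momentum(parent_mass, m1, m2):
    """Momentum of either daughter in the rest frame of the parent."""
    if parent_mass < m1 + m2:
        raise ValueError("channel is kinematically closed")
    lam = kallen(parent_mass**2, m1**2, m2**2)
    return math.sqrt(max(lam, 0.0)) / (2.0 * parent_mass)

def two_body_phase_space(parent_mass, m1, m2):
    """Integrated two-body phase space, |p| / (4 pi M) (assumed normalisation)."""
    return two_body_momentum(parent_mass, m1, m2) / (4.0 * math.pi * parent_mass)

# Illustrative masses in GeV: a parent of 91.19 GeV and three daughter pairs.
parent = 91.19
daughter_pairs = [(0.0, 0.0), (1.78, 1.78), (4.18, 4.18)]
rho = [two_body_phase_space(parent, m1, m2) for m1, m2 in daughter_pairs]
accumulated = list(itertools.accumulate(rho))
for pair, r, acc in zip(daughter_pairs, rho, accumulated):
    print(f"daughters {pair}: rho = {r:.5f}, accumulated = {acc:.5f}")
```

For daughters much lighter than the parent, the factor is nearly the same for every channel, so the accumulated phase space grows almost in proportion to the number of channels included.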
We plot the entropy for the decay distribution of the electroweak Z boson against the accumulated phase-space in figure 6 (the case of the W boson has also been analysed and reported in [11]).
The OY axis should now be considered as arbitrarily normalised, as we are still plotting the entropy $S(BR_1, \dots, BR_N)$ without terms correcting for the internal entropy of each channel. Nevertheless, the OX axis is now scaled with the correct phase space for each of the included two-body decays. There is not much qualitative difference at this point (a vertical stretching of the entropy function if we took into account the internal entropy), so we will continue plotting entropy at an aggregate level, channel by channel.

Information value of discovering a new decay channel
We wish to propose a simple criterion to quantify what information the discovery of a new branching fraction adds to the knowledge of a particle decay. The criterion is purely statistical: it does not judge whether that decay may signal the violation of an approximate symmetry, be a golden mode for certain observables, or have any other qualitative effect that needs to be assessed on a case-by-case basis.
The first obvious effect is that of Eq. (2). Having observed that the maximum entropy, reached if all channels were equally weighted, grows as the logarithm of the number of channels, the actual importance of a new channel can be obtained by studying the separation of the entropy from this maximum value. Therefore, we propose two possible measures of this added information. One is the normalized entropy increment, plotted in figure 7, where we show how it would evolve upon sequentially including (eventually, discovering) the represented decay channels of the Z boson. ($N$ increases by one every time a new channel is added to the list.) In the figure we see that the discovery of decay channels 5, 6 and 7 would then be less significant, from the point of view of information theory, than the discovery of any one of the channels 1 through 4.
Another possibility is to employ a certain "degree of likeness" (to the maximum possible entropy), which can be simply defined by $S(N)/\log(N) \in (0, 1)$. Its increment upon adding one new channel would then be
$$\Theta := \frac{S(N+1)}{\log(N+1)} - \frac{S(N)}{\log N} .$$
A positive $\Theta$ means that the entropy of the distribution steps closer to the maximum possible value of $S$ upon introducing the new channel; this can happen when the new channel has a branching fraction similar to the ones already known. If $\Theta$ is negative, the entropy decreases relative to its maximum possible value, and the new channel is very dissimilar from the others (typically smaller). This function, applied to the same decay distribution of the Z boson as in figure 7, is plotted in figure 8.
Figure 8: Change in the degree of likeness $\Theta$ upon discovering each new channel (so that the last unknown channel splits off part of its probability to that new known one, and $N$ increases by one unit) for the Z boson.
We can ascertain once more in this figure that channels 5, 6 and 7 contribute less to the entropy because their branching fractions are much smaller than those of the channels included earlier.
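The change in the degree of likeness can be sketched as follows. At each step the still-unknown probability is kept in one lumped channel, part of which is split off when a new channel is discovered, so that $N$ grows by one; $\Theta$ is then the difference of $S/\log N$ after and before the step. Counting the lumped channel as one of the $N$ channels is an assumption of this sketch, and the function names and branching ratios are hypothetical rather than Z-boson data.

```python
import math

def entropy(probs):
    """Shannon entropy in nats; zero-probability entries are skipped."""
    return sum(-p * math.log(p) for p in probs if p > 0.0)

def with_lumped_remainder(known):
    """Known channels plus one channel holding the missing probability."""
    remainder = 1.0 - sum(known)
    return list(known) + ([remainder] if remainder > 1e-12 else [])

def degree_of_likeness(probs):
    """S / log N for a distribution over N >= 2 channels."""
    n = len(probs)
    if n < 2:
        raise ValueError("need at least two channels so that log N > 0")
    return entropy(probs) / math.log(n)

def theta_steps(ordered_brs):
    """Theta for each newly added channel, starting with the second one."""
    likeness = [
        degree_of_likeness(with_lumped_remainder(ordered_brs[:k]))
        for k in range(1, len(ordered_brs) + 1)
    ]
    return [after - before for before, after in zip(likeness, likeness[1:])]

# Hypothetical branching ratios, ordered from larger to smaller.
illustrative_brs = [0.30, 0.25, 0.20, 0.15, 0.004, 0.003, 0.002]
thetas = theta_steps(illustrative_brs)
for k, theta in enumerate(thetas, start=2):
    print(f"adding channel {k}: Theta = {theta:+.4f}")
```

In this example the small channels at the end of the list give a negative $\Theta$, since they raise $\log N$ while adding very little entropy.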

Base of the logarithm
The base of the logarithm in Eq. (1) provides a means to compare different decay chains. Very often in computer science, $\log_k$ is taken with $k = 2$ so that information is measured in bits. Our entropy here has rather been presented in nats, by employing the natural logarithm. An interesting additional choice is to use $k = N$, the actual number of channels needed to describe the particle's decay. (For small $N$ it is worth noting that, since we are usually packaging an unspecified number of unknown channels into an additional one, we will rather use $k = N + 1$.)
Scaling the base with $N$ has the advantage that the entropies of decay distributions for different particles can be compared in a way that reflects not the number of channels, but rather the inhomogeneity of the decay product distribution among them. This comes about because the maximum possible value of the entropy is then $\log_N N = 1$ (or, being precise, $\log_{N+1}(N+1) = 1$), and one can then compare two particles on the same scale.
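A brief sketch of this normalisation follows: taking the logarithm in a base equal to the number of channels in the distribution (including the lumped unknown channel, if present) bounds the entropy by 1, so that two distributions can be compared on the same scale. The function name and the two distributions below are hypothetical, chosen only to contrast a dominated decay with a more evenly spread one.

```python
import math

def normalised_entropy(probs):
    """Entropy with the logarithm taken in base len(probs), so that max = 1."""
    n = len(probs)
    if n < 2:
        raise ValueError("need at least two channels")
    return sum(-p * math.log(p, n) for p in probs if p > 0.0)

# Hypothetical distributions with the same number of channels.
dominated = [0.85, 0.07, 0.05, 0.02, 0.01]   # one channel carries most decays
spread = [0.35, 0.30, 0.20, 0.10, 0.05]      # probability shared more evenly

print(f"dominated: S = {normalised_entropy(dominated):.4f}")
print(f"spread   : S = {normalised_entropy(spread):.4f}")
print(f"uniform  : S = {normalised_entropy([0.2] * 5):.4f}")
```

The more evenly spread distribution sits closer to the common maximum of 1, whatever the number of channels.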
The comparison is even more telling if both particles have the same number of relevant decay channels. This is approximately the case, for instance, for the pair formed by the $f_1(1285)$, an axial $J^{PC} = 1^{++}$ meson, and its multiplet partner $f_2(1270)$, a tensor $2^{++}$ meson. The latter has a decay very much dominated by $\pi\pi$ (85%), while the former has the probability more evenly distributed among the $\eta\pi\pi$, $4\pi$, $K\bar{K}\pi$ and $a_0(980)\pi$ channels, the rest being minor. Figure 9 shows the data; in both cases the entropy has its maximum possible value at 1.
Figure 9: Shannon entropy $S$ for the decay product distribution of two example flavor-singlet $f$ mesons against the number of channels $N$ included in the decay (analogous to figure 2). The base of the logarithm is here chosen to be $N + 1$, the number of decay channels included.
As a second example, figure 10 shows the entropy of the decay distribution of the $\phi$ meson, also normalised to 1 by choosing $\log_N BR_j$ instead of the natural logarithm. Figure 11 plots, for the light, unflavored mesons ($\eta$, $\omega$, $\phi$, etc.), the entropy against the maximum of the branching fractions $BR_j$ for the various decay channels of each.

Further observations and outlook
There is a clear anticorrelation between the two variables: the entropy (lack of predictivity about any one particular decay) is much larger when there is no dominant decay branching fraction, as should be evident to the reader. Thus, it is more informative to discover a new decay channel when none carries a fraction close to unity of the total decays.
A further observation is that, not uncommonly, the branching fractions are ordered in a geometric hierarchy $BR_0$, $BR_1 = f\, BR_0$, …, $BR_N = f^N BR_0$ with $f < 1$ (this is typically seen in entropy functions that grow quickly for the very first channels and saturate almost immediately; a plot for such a distribution with $f = 1/2$ has been relegated to the proceedings in [11]). The resulting approximation to the entropy (see footnote 4) converges quickly for many particles (typically those with few open strong-decay channels), but $f$ does depend on the particle in question.
Mesons that contain heavy quarks but have a low excitation number do not fall in this category. Instead, they possess many channels (with light valence quarks only) that have similar branching ratios. Then the entropy function grows linearly with the number of channels, and many of them are required to start saturating it. This is best visible in figures 12 and 13, especially the second one (entropy against the number of channels). For those two low-lying charmonia, about half of the total width is accounted for. Each new channel, of a size similar to those previously known, increases the entropy practically in proportion to its branching fraction (figure 12). The characteristic $\log N$ growth of the entropy is however visible if we plot the same data against the number of channels instead of the branching fraction accounted for (figure 13).
Footnote 4: In an extreme, idealized case, $N \to \infty$ and $BR_0$ …
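The quick saturation produced by a geometric hierarchy can be illustrated with a short sketch. The branching ratios are taken as $BR_n \propto f^n$ and normalised to sum to one, and the entropy is accumulated channel by channel. The function names are hypothetical, and the closed form quoted in the comments is the standard entropy of an infinite geometric distribution, used here only as a cross-check; it is not claimed to be the approximation referred to in the text.

```python
import math

def geometric_branching_ratios(f, n_channels):
    """Normalised branching ratios BR_n proportional to f**n, n = 0 .. n_channels - 1."""
    weights = [f**n for n in range(n_channels)]
    norm = sum(weights)
    return [w / norm for w in weights]

def cumulative_entropy(probs):
    """Running sum of -p ln p as channels are added one by one."""
    out, s = [], 0.0
    for p in probs:
        s += -p * math.log(p)
        out.append(s)
    return out

f = 0.5
brs = geometric_branching_ratios(f, 40)
curve = cumulative_entropy(brs)

# Cross-check: for infinitely many channels the entropy of a geometric
# distribution is -ln(1 - f) - f ln(f) / (1 - f), i.e. 2 ln 2 for f = 1/2.
s_infinite = -math.log(1.0 - f) - f * math.log(f) / (1.0 - f)

for n in (1, 2, 4, 6, 10, 40):
    print(f"N = {n:2d}: S = {curve[n - 1]:.4f}  ({curve[n - 1] / s_infinite:.1%} of the limit)")
```

With $f = 1/2$ the first handful of channels already accounts for most of the asymptotic entropy, which is the saturation pattern described above.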
To conclude, we have found that Shannon's entropy is an interesting tool to ascertain the relative importance of different decays. Taking into account the sheer size of the particle physics decay data collected by the community and ordered by the Particle Data Group, this and other methods of information theory find a rich field of applicability.
As we have seen in numerous examples, the generic behaviour of the entropy of the distribution against the number of channels is a linear increase for the first few, larger ones, followed by a saturation well below the entropy's maximum for N channels, log N .
We have discussed how to compare different particles: using the logarithm of base $N$ is fair, as it normalises the maximum entropy to unity. We have also discussed simple derived functions that help quantify the amount of entropy that a given decay channel adds to the distribution after its discovery.
And finally, we have shown the anticorrelation between the entropy and the maximum branching fraction of any decay channel. Shannon's entropy is maximized by particles that decay more or less equally through their decay channels (perhaps because the decaying particle is below the threshold of the channel it couples more strongly to).
Figure 13: Same as in figure 12 but plotted as a function of the number of channels instead of the accumulated branching fraction. Because many channels have comparable branching fractions, $S$ grows approximately linearly with the number of channels until $\sum_i BR_i$ starts being a sizeable fraction of 1. The characteristic $\log N$ behavior is visible following that linear regime.