Bayesian model comparison for one-dimensional azimuthal correlations in 200GeV AuAu collisions

In the context of data modeling and comparisons between different fit models, Bayesian analysis calls that model best which has the largest evidence, the prior-weighted integral over model parameters of the likelihood function. Evidence calculations automatically take into account both the usual chi-squared measure and an Occam factor which quantifies the price for adding extra parameters. Applying Bayesian analysis to projections onto azimuth of 2D angular correlations from 200 GeV AuAu collisions, we consider typical model choices including Fourier series and a Gaussian plus combinations of individual cosine components. We find that models including a Gaussian component are consistently preferred over pure Fourier-series parametrizations, sometimes strongly so. For 0-5% central collisions the Gaussian-plus-dipole model performs better than Fourier Series models or any other combination of Gaussian-plus-multipoles.


Introduction
A significant problem has emerged in the interpretation of high-energy collision data from the relativistic heavy ion collider (RHIC) and the large hadron collider (LHC). Two classes of data models with divergent physics implications are invoked to support competing narratives. One narrative interprets results with concepts from high-energy physics in which the essential phenomenon is dijet production; the other uses quark-gluon plasma and collective flow concepts in which the essential phenomenon is a flowing, dense medium while dijets play only a subordinate role.
To place evaluation and comparison of competing models on a scientific footing we require a formally consistent and statistically sound framework. We suggest that Bayesian inference provides the formal consistency and the necessary mathematical framework for representing prior knowledge, evaluating candidate models for different parameter values and thereby establishing value judgments on models as a whole. In this contribution, we show by example how the Bayesian framework successfully evaluates and compares models for that type of one-dimensional correlation data which has played a large role in claimed support for these narratives. The exercise also serves to pinpoint the weaknesses of the conventional minimum-χ 2 and maximum-likelihood approaches.

Data and parametrization
The data analyzed below were published by the STAR Collaboration [1] in the form of twodimensional binned histograms derived from 1.2 million 200GeV AuAu collision events in eleven centrality classes. The relevant autocorrelations are measured on azimuthal and pseudorapidity difference variables φ ∆ = φ 1 − φ 2 and η ∆ = η 1 − η 2 , pair densities ρ(φ ∆ , η ∆ ) and ρ ref (φ ∆ , η ∆ ) and the one-particle differential cross section, ρ 0 = d 2 n ch /dηdφ. Here, we consider one-dimensional projections of the 2D autocorrelations onto azimuth difference φ ∆ with 24 bins of size π/12. As the data are symmetrized on both η ∆ and φ ∆ , there are only 13 unique points. These are reduced to N = 11 data points by excluding from fits the φ ∆ = 0 bin containing the Bose-Einstein peak and constraining the 1D projections to sum to zero.
Since the 11-parameter 2D parametrization (or data model) described in Ref. [1] describes the corresponding 2D data rather well, we adapted it to the present 1D projection. Taking into account the zero-sum constraint, this results in what we call the Basic Model; abbreviating φ ≡ φ ∆ , with three parameters w = (A 1D , σ φ , A D ) reflecting the basic same-side (SS) peak and away-side (AS) dipole structure. The Basic Model H BM is supplemented in Augmented Models H AM by a quadrupole term −A Q 2 cos(2φ), a sextupole −A S 2 cos(3φ), and an octupole −A O 2 cos(4φ) in various combinations. The factor 2 entering the cosine terms is designed to reflect a corresponding Fourier Series (FS) parametrization popular within the collective-flow paradigm [2], In our analysis, the number of free parameters K varies from 1 to 11 in the FS case, while in the Basic and Augmented Models K varies from 3 to 6.

Model comparison
The two goals of fitting a parametrization to data are model comparison and, given a particular model H, finding the best-fit values of its parameters. In the present case, the data D = (y 1 , . . . , y N ) are given by binwise autocorrelations A(φ) at mid-bin sample points φ n while the competing models include H BM , H AM and H FS with various values of K.
In the usual frequentist approach, model comparison is effected by searching the space of parameters w to find the global minimum χ 2 0 of with σ n the experimental standard errors. Model H 1 with K 1 parameters is deemed to be better than Model . Underlying the χ 2 criterion is the assumption of independent Gaussian fluctuations of deviations between data and parametrization, which results in a likelihood When parametrizations f (φ, w | H) are linear in w, the Hessian is w-independent and the likelihood takes on a Gaussian form in parameter space around the modẽ w in terms of H or equivalently the covariance matrix C = H −1 .
Bayesian model comparison utilizes the same likelihood (6) and hence also χ 2 and the covariance matrix. Crucially, however, Bayesian model comparison is based not on the maximum likelihood or χ 2 0 but on the so-called evidence for data D within model H, which averages the likelihood over all values of w, weighted by the prior p(w | H) for the parameters. The reason why evidence is a good measure for model comparison lies within Bayes' Theorem itself. Starting with the joint data-parameter probability p(D, Bayes' Theorem states that, once data D is known, the posterior probability for parameters w can be found from the likelihood, prior and evidence by Similarly on a higher level p(D, implies that, given data D, the probability for the entire model H including all possible parameter values is which, used twice for competing models H 1 and H 2 , says that model comparison given data D can be expressed as a ratio of evidences, on setting equal a priori probabilities for models, p(H 1 ) = p(H 2 ) = 1/2. The choice of parameter prior p(w | H) is governed by available information or lack thereof. We use uniform priors, implying indifference for any value of w within the bounds set by cutoffs or prior widths ∆ k , We assign the same widths across different models for corresponding quantities. In particular, we set ∆ k = 1/3 for all k = 1, . . . , K cosine coefficients, based on knowledge of typical coefficient magnitude. The interval for σ φ is set to [0, π/2] in accordance with its definition as a same-side peak width. To the prior for the Gaussian amplitude parameter A D we assign width ∆ = 1 since correlation-structure amplitudes such as peak-to-peak excursions are generally < O(1). For uniform priors, the evidence integrals (8) can be carried out analytically in terms of the Laplace approximation which neglects third-and higher-order terms in the Taylor expansion of the log likelihood and is exact for linear systems. In our case, the log likelihood is nonlinear only in the parameter σ φ . The Laplace Approximation also requires that the effect of finite integration limits set by the ∆ k be negligible. Within these constraints, the evidence takes on a simple form in terms of the maximum likelihood, prior and covariance matrix, Since χ 2 0 is related to the maximum likelihood by χ 2 0 = −2 log p(D |w, H), we can write the corresponding log evidence in terms of χ 2 0 and an information I, The smaller −2LE for a given model, the larger its evidence. As shown in Eq. (11), −2LE for different models have no absolute meaning but must always be interpreted in terms of evidence ratios (also called odds in the literature) or of log differences.

Results
In this conference summary, we can show results only for the STAR "Bin-10" data covering the most-central 0-5% collision events; further results for Bin-0 peripheral collisions (83-94%) and Bin-8 intermediate collisions (9-18%) are presented and discussed in Ref. [3].
In Figure 1, the projected 1D data along with the best-evidence fit for the Basic Model (2) is shown. Experimental error bars were scaled up by a factor 2 to make them visible in the plot. As stated, the point at φ = 0 was excluded since it contains the Bose-Einstein peak and chargeneutral electron pairs from photoconversions. The best-fit parameter values are A 1D = 0.57±0.007, A D = 0.115 ± 0.002 and σ φ = 0.635 ± 0.007 while χ 2 0 = 12.5 for 11 − 3 degrees of freedom. A fit of the Fourier Series model (3) with K ≥ 4 is indistinguishable to the eye on the scale of the plot. That does not mean that the BM and FS models have the same −2LE. Figure 2, illustrates for the example of Fourier-Series models why model comparison based on minimum χ 2 alone is misleading when the number of data points is small, as is the case here since N = 11. Plotted are various quantities as a function of the number of free parameters K in Eq. (3). The solid blue line traces the evolution of χ 2 0 as more cosine terms are added. As expected, there is a dramatic decrease in χ 2 0 up to K = 4 but little benefit for adding further terms beyond that as K ≥ 5 Fourier Series models increasingly fit the noise represented by the lower solid curve showing fits to the residuals (data minus Basic Model). The naive criterion of minimum χ 2 0 is oblivous to this effect: χ 2 0 keeps decreasing and purely χ 2 0 -based model comparison would hence select K = 9 or higher. Note that the equivalent plot for χ 2 0 /(N − K) (not shown) barely changes the general decrease with K.
The contribution of information 2I of Eq. (15) is shown as the red dashed curve. The shaded band shows the sensitivity to the ∆ k = 1/3 prior width, using ∆ k = 1 and ∆ k = 1/5 as extremes. The resulting log-evidence −2LE of Eq. (14), shown as the black dashed curve, has a clear minimum at K = 4, confirming the above conclusion that FS models with K ≥ 5 are not preferred. Note that −2LE is a logarithmic scale.
In Figure 3(a), the scope of comparison between pure χ 2 0 and evidence-based model comparison is widened to include FS, Basic Model and Augmented Models with various combinations of higherorder multipoles. Consideration of χ 2 0 alone as in Fig. 3(a) would make the K ≥ 6 FS models the winner. By contrast, the Occam penalty of 2I forming part of −2LE in Figure 3(b) effects a dramatic re-ordering: the 3-parameter Basic Model has substantially smaller −2LE (larger evidence) than all other combinations. In addition, FS models are universally rejected in favour of Gaussian-plus-multipole models with the same number of parameters.

Conclusions
Physics interpretation: As shown, the evidence for a simple same-side Gaussian plus away-side dipole is larger than for any Fourier Series-based model. While our approach and results must naturally be tested and confirmed with other data, measurement quantities and models, our results do cast serious doubt on the interpretation of azimuthal correlations as indicators of collective flow.
Lessons for data analysis: Physics conclusions must rely on the integrity of the underlying data analysis methods and assumptions. Below, we provide in tabular form a motivation why Bayesian evidence-based model comparison is superior to simple χ 2 fitting. The sometimes very different conclusions shown above suggest that any physical interpretation based on simple χ 2 fitting of onedimensional correlation data may be debatable or outright wrong. We secondly emphasize that no model or "goodness of fit" has any meaning unless it is compared to another competing model.
Traditional use of χ 2 Bayesian Inference Parameter estimation: Find maximum likelihoods max p(D | w 1 , H) and max p(D | w 2 , H 2 ) which determine single best fit parametersw 1 ,w 2 . This is equivalent to Estimating uncertainties: σ(w) is estimated from parameter covariance matrix C.
Estimating uncertainties: Posterior p(w | D, H) provides comprehensive information on parameters, including σ(w).