DAMPE squib? Significance of the 1.4 TeV DAMPE excess

We present a Bayesian and frequentist analysis of the DAMPE charged cosmic ray spectrum. The spectrum, by eye, contained a spectral break at about 1 TeV and a monochromatic excess at about 1.4 TeV. The break was supported by a Bayes factor of about $10^{10}$ and we argue that the statistical significance was resounding. We investigated whether we should attribute the excess to dark matter annihilation into electrons in a nearby subhalo. We found a local significance of about $3.6\sigma$ and a global significance of about $2.3\sigma$, including a two-dimensional look-elsewhere effect by simulating 1000 pseudo-experiments. The Bayes factor was sensitive to our choices of priors, but favoured the excess by about 2 for our choices. Thus, whilst intriguing, the evidence for a signal is not currently compelling.


I. INTRODUCTION
The Dark Matter Particle Explorer (DAMPE) experiment recently published the energy spectrum of electrons and positrons from about 10 GeV to about 4 TeV [1]. The spectrum, by eye, contained two interesting features: a break at about 1 TeV and a monochromatic excess at about 1.4 TeV. The DAMPE analysis itself contained no statistical analysis of the excess, which nevertheless stirred much interest. In particular, dark matter (DM) was invoked to explain the excess: DM with a mass of about 1.4 TeV could annihilate into electrons in a subhalo within about a kpc, resulting in a narrow spike in the spectrum.
It is thus important to estimate the statistical significance of the excess. We do so with frequentist statistics in Sec. II and Bayesian statistics in Sec. III. In each case, we fit the spectrum by three toy models:
• A single power-law (PL), described by a normalisation $\Phi_0$ and a power $p$.
• A smoothly-broken power-law (SBPL), described by a normalisation $\Phi_b$, powers $p_1$ and $p_2$, a break $E_b$ and a smoothing parameter $\Delta$. This approximately equals two power-laws, which are matched at the break $E_b$ with a smoothness governed by $\Delta$.
• An SBPL plus a half-normal distribution (signal) that is non-zero for $E \le m_\chi$ and zero elsewhere. This template is motivated by DM particles of mass $m_\chi$ annihilating into electrons in a nearby subhalo, resulting in a signal of amplitude $A$ and width $\sigma$.
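As a minimal sketch of the first two toy models, the functions below use one common parameterisation of a smoothly-broken power-law. The text does not spell out the functional form, so the formula in `sbpl`, the pivot energy `E0`, the function names and the parameter values are all assumptions chosen for illustration; only the parameter list ($\Phi_b$, $p_1$, $p_2$, $E_b$, $\Delta$) follows the text.

```python
import math

def pl(E, phi0, p, E0=100.0):
    """Single power-law flux, phi0 * (E / E0)**(-p); E0 is an assumed pivot energy in GeV."""
    return phi0 * (E / E0) ** (-p)

def sbpl(E, phi_b, p1, p2, E_b, delta):
    """Smoothly-broken power-law (assumed functional form).

    Behaves as E**(-p1) well below E_b and as E**(-p2) well above it (for p2 > p1).
    delta controls the sharpness of the transition and phi_b is the flux at the
    break, i.e. sbpl(E_b, ...) == phi_b.
    """
    x = E / E_b
    return phi_b * x ** (-p1) * (0.5 * (1.0 + x ** ((p2 - p1) / delta))) ** (-delta)

def log_slope(f, E, eps=1e-4):
    """Numerical d ln f / d ln E, i.e. the local spectral index."""
    num = math.log(f(E * (1.0 + eps))) - math.log(f(E * (1.0 - eps)))
    den = math.log(1.0 + eps) - math.log(1.0 - eps)
    return num / den

# Illustrative (not fitted) parameter values; energies in GeV.
params = dict(phi_b=1.0, p1=3.1, p2=3.9, E_b=900.0, delta=0.1)
flux = lambda E: sbpl(E, **params)
low, high = log_slope(flux, 50.0), log_slope(flux, 4000.0)
print(f"index below break: {low:.2f}, above break: {high:.2f}")
```

Far from the break the local index tends to $-p_1$ and $-p_2$, which is the sense in which the SBPL approximately equals two matched power-laws.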
The PL, SBPL and signal models have 2, 5 and 8 parameters, respectively. The toy models capture the behaviour of possible spectra from underlying physical processes. The unknown relationships between fundamental and toy model parameters cannot impact our frequentist analysis; however, they could influence suitable choices of prior in our Bayesian analysis. This is especially so for the width and amplitude of the signal, which could, in principle, be related to the DM annihilation cross section, subhalo properties and diffusion equations governing the propagation of charged cosmic rays.

DAMPE measured the average flux in 38 energy bins. We may predict the average flux in the $i$-th bin by
$$\bar\Phi_i = \frac{1}{b_i - a_i} \int_{a_i}^{b_i} \Phi(E)\, dE,$$
where the bin spans energies $a_i$ to $b_i$. DAMPE associated their measurement in the $i$-th bin with the energy $E_i$ at which the predicted flux equals the predicted average flux in that bin for the best-fit SBPL model [36], i.e., $E_i$ is defined by
$$\Phi(E_i) = \bar\Phi_i \quad \text{for the best-fit SBPL model}.$$
The SBPL and PL fluxes are approximately linear on scales similar to the bin width such that $\Phi(E_i) \approx \bar\Phi_i$ for the SBPL and PL models. The signal model, however, contains a peak that may be narrower than the bin width and we must explicitly calculate $\bar\Phi_i$, as it is not approximated by $\Phi(E_i)$. This subtlety means that previous calculations of the required amplitude of a DM signal are underestimates by a factor of approximately the bin width divided by the signal width, $\Delta E/\sigma \approx 5$ to $20$.
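The bin-averaging subtlety can be illustrated with a short sketch. It assumes a half-normal signal normalised so that its integral over energy equals the amplitude $A$ (the text does not state the normalisation), and the bin edges, mass and width below are hypothetical round numbers rather than DAMPE's actual binning.

```python
import math

def half_normal(E, A, m, sigma):
    """Half-normal signal: non-zero for E <= m, integrating to A (assumed normalisation)."""
    if E > m:
        return 0.0
    norm = 2.0 * A / (math.sqrt(2.0 * math.pi) * sigma)
    return norm * math.exp(-0.5 * ((E - m) / sigma) ** 2)

def bin_average_signal(a, b, A, m, sigma):
    """Exact average of the half-normal over the bin [a, b], via the error function."""
    hi = min(b, m)
    if hi <= a:
        return 0.0  # the bin lies entirely above the cut-off at m
    s = math.sqrt(2.0) * sigma
    integral = A * (math.erf((m - a) / s) - math.erf((m - hi) / s))
    return integral / (b - a)

# Hypothetical bin and signal parameters (GeV); A = 1 in arbitrary units.
a, b, m, sigma, A = 1300.0, 1500.0, 1400.0, 10.0, 1.0
average = bin_average_signal(a, b, A, m, sigma)
peak = half_normal(m, A, m, sigma)
ratio = peak / average
print(f"bin width / signal width = {(b - a) / sigma:.0f}")
print(f"peak flux / bin-averaged flux = {ratio:.1f}")
```

With these illustrative numbers the peak flux exceeds the bin-averaged flux by a factor of order $\Delta E/\sigma$, so equating the peak of a narrow signal with the measured bin average would understate the required amplitude.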

II. FREQUENTIST ANALYSIS
arXiv:1712.05089v1 [hep-ph] 14 Dec 2017

We performed two hypothesis tests: an SBPL versus a single PL under the hypothesis of a single PL, and an SBPL versus a signal under the hypothesis of an SBPL. We performed the former to validate our methodology against a result published by DAMPE. We used chi-squared test-statistics,
$$\Delta\chi^2 = \min \chi^2_\text{PL} - \min \chi^2_\text{SBPL} \quad \text{and} \quad \Delta\chi^2 = \min \chi^2_\text{SBPL} - \min \chi^2_\text{signal}.$$
We minimised the chi-squared with respect to each model's parameters with a CMA-ES evolutionary algorithm [37] implemented in stochopy [38]. The chi-squared itself was
$$\chi^2 = \sum_i \frac{(\bar\Phi_i - \mu_i)^2}{\sigma_i^2},$$
where $\bar\Phi_i$ and $\mu_i$ were the predicted and measured average flux in the $i$-th bin, $\sigma_i$ was the error in the $i$-th bin, for which we added statistical and systematic errors in quadrature, and we summed over bins from 55 GeV to 2.63 TeV (matching the DAMPE analysis).

We found the distributions of our test-statistics by Monte Carlo. To do so, we generated 1000 pseudo-datasets from each of the best-fit single PL and best-fit SBPL models and reminimised the chi-squared for each dataset and model. We thus estimated the p-value by the fraction of pseudo-experiments in which the test-statistic exceeded that observed. We furthermore calculated 68% Clopper-Pearson intervals for the p-value (see e.g., Ref. [39]).

In 1000 pseudo-experiments under the PL hypothesis, we found no differences in chi-squared between the PL and SBPL models as extreme as that observed. This resulted in a p-value associated with the PL model of at most 0.002, which is equivalent to at least $2.9\sigma$. DAMPE applied Wilks' theorem to estimate the significance, finding $6.6\sigma$; however, in the limit $p_1 \to p_2$ the SBPL reduces to the single PL, in which the remaining SBPL parameters have no effect, and, thus, Wilks' theorem cannot strictly apply. We found about $7\sigma$ with a similar procedure. Although we could not populate the tail of the distribution by Monte Carlo, the observed test-statistic of about 56 lies in the extreme tail of the distribution, so we expected that the p-value was negligible.
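The p-value bookkeeping can be sketched with the standard library alone. The function names are hypothetical, the interval is taken to be central (equal tail probabilities), and the conversion to a number of standard deviations assumes a one-tailed convention. Neither convention is stated in the text, although together they approximately reproduce the bound quoted for zero exceedances in 1000 pseudo-experiments.

```python
import math
from statistics import NormalDist

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p); adequate for the small k used here."""
    return sum(math.comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k + 1))

def _solve_p(k, n, target, iters=100):
    """Bisection for p such that binom_cdf(k, n, p) == target (the CDF falls as p rises)."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if binom_cdf(k, n, mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def clopper_pearson(k, n, level=0.68):
    """Central Clopper-Pearson interval for a binomial proportion k / n."""
    tail = 0.5 * (1.0 - level)
    lower = 0.0 if k == 0 else _solve_p(k - 1, n, 1.0 - tail)
    upper = 1.0 if k == n else _solve_p(k, n, tail)
    return lower, upper

def z_score(p):
    """One-tailed Gaussian significance equivalent to a p-value."""
    return NormalDist().inv_cdf(1.0 - p)

# No pseudo-experiment out of 1000 as extreme as observed (PL hypothesis).
_, p_max = clopper_pearson(0, 1000)
print(f"p <= {p_max:.4f}, Z >= {z_score(p_max):.1f}")
```

The same two functions apply unchanged to any other count of exceedances, in which case both ends of the interval are non-trivial.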
Only 11 of our 1000 pseudo-experiments under the SBPL hypothesis had differences in chi-squared between the SBPL and signal models as extreme as that observed, resulting in a global significance of about $2.2\sigma$ to $2.4\sigma$. This includes a two-dimensional look-elsewhere effect in the mass and width of the excess and corresponds to a p-value of about 1%. The local significance was about $3.6\sigma$, assuming a $\tfrac12\chi^2_1$ distribution for the test-statistic. To validate our methodology, we checked that our Monte Carlo reproduced a $\tfrac12\chi^2_1$ distribution from a model with a fixed mass and width.
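The $\tfrac12\chi^2_1$ distribution can be checked in a toy setting. The sketch below is not the DAMPE fit: it assumes a single Gaussian-distributed estimate of a signal amplitude that is constrained to be non-negative, for which the test-statistic under the background-only hypothesis is $\max(0, z)^2$ with $z$ a standard normal variate. The function names, seed and threshold are illustrative.

```python
import math
import random

def local_p_value(t):
    """Tail probability of t under a half-chi-squared distribution with 1 dof."""
    return 0.5 * math.erfc(math.sqrt(0.5 * t))

def toy_test_statistics(n, seed=0):
    """Test-statistics from n background-only toys with a non-negative amplitude."""
    rng = random.Random(seed)
    return [max(0.0, rng.gauss(0.0, 1.0)) ** 2 for _ in range(n)]

toys = toy_test_statistics(100_000)
threshold = 2.0
mc_fraction = sum(t > threshold for t in toys) / len(toys)
print(f"Monte Carlo: {mc_fraction:.4f}, half-chi-squared: {local_p_value(threshold):.4f}")
```

Under this distribution a test-statistic $t$ corresponds to a one-tailed significance of $\sqrt{t}$ standard deviations, so a local significance of about $3.6\sigma$ corresponds to $t \approx 13$.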
We show best-fit spectra for our three models in Fig. 1. There were degeneracies in the fits, especially in the amplitude and width of the signal. The amplitude of the narrow excess demonstrates that previous analyses underestimated the amplitude required to fit the anomalous bin. We show in Fig. 2, furthermore, confidence regions for the DM mass and width of the signal. The DM signal must have a mass of about 1300 GeV to 1500 GeV, a width of less than about 100 GeV, and an amplitude of about $10^{-5}$ /s/sr/m$^2$. This amplitude corresponds to a peak flux of about $10^{-7}$ /GeV/s/sr/m$^2$ for a signal width of $\sigma = 10$ GeV.

III. BAYESIAN ANALYSIS
We considered Bayes factors between the three competing models of the spectrum. Bayes factors update the relative plausibility of two hypotheses with experimental data (see Ref. [40]):
$$\text{Posterior odds} = \text{Bayes factor} \times \text{Prior odds}.$$
The Bayes factor itself may be written
$$B = \frac{p(D \mid M_1)}{p(D \mid M_2)}$$
for data $D$ and models $M_1$ and $M_2$. This is a ratio of evidences,
$$p(D \mid M) = \int p(D \mid M, x)\, p(x \mid M)\, dx,$$
where $x$ represents a model's parameters, $p(D \mid M, x) = e^{-\frac12\chi^2}$ is our likelihood function and $p(x \mid M)$ are our priors for the model's parameters. We calculated evidences with (Py-)MultiNest-3.10 [41][42][43][44].

We list our priors in Table I. We picked flat priors for the exponents in the PL and SBPL models and logarithmic priors for all other parameters. Since we a priori knew the order of magnitude of the exponents, the choice of flat or logarithmic prior was moot. We found that, as anticipated, the SBPL model was favoured against the single PL model by about $10^{10}$. Since this was resounding and agreed with our frequentist analysis, we considered the matter settled and did not investigate prior sensitivity.
We found that the signal model was favoured versus an SBPL by a Bayes factor of about 2. We anticipate that changes in priors for the SBPL parameters, which are present in each model, could not substantially modify the Bayes factor. We found that the Bayes factor increased to 4 with linear rather than logarithmic priors for the mass, amplitude and width of the DM signal. Our prior range for the amplitude spanned only three orders of magnitude about that favoured by the 1.4 TeV excess and for the width spanned fewer than two orders of magnitude; arguably, they should have been more diffuse, which would decrease the Bayes factor. Our prior for the mass spanned the range searched by DAMPE, 55 GeV to 2.63 TeV; shrinking it to between 1 TeV and 2.63 TeV could increase the Bayes factor to about 4. The maximum Bayes factor achievable with any priors is about 500, which is obtained for Dirac delta functions at the best-fit mass, width and amplitude of a DM signal. Nevertheless, it seems difficult to make a reasonable case that the Bayes factor is compelling, especially since the narrow signal and substantial amplitude preferred by DAMPE were, if anything, a priori implausible, as such a signal must originate from a nearby subhalo with a substantial DM density.
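The dependence of a Bayes factor on the prior width can be illustrated with a one-parameter toy, which is a stand-in under strong assumptions rather than the analysis above: a Gaussian likelihood for a signal amplitude with best-fit value `a_hat` and uncertainty `s`, a flat prior on $[0, W]$, and a null model with the amplitude fixed to zero. All names and numbers are hypothetical.

```python
import math

def bayes_factor_flat_prior(a_hat, s, width):
    """Bayes factor, signal versus null, for a flat amplitude prior on [0, width].

    The likelihood ratio is
        L(a) / L(0) = exp(-(a - a_hat)**2 / (2 s**2) + a_hat**2 / (2 s**2)),
    and the Bayes factor is its average over the prior.
    """
    r = math.sqrt(2.0) * s
    # Integral of exp(-(a - a_hat)**2 / (2 s**2)) over a in [0, width].
    integral = s * math.sqrt(0.5 * math.pi) * (
        math.erf((width - a_hat) / r) + math.erf(a_hat / r)
    )
    return math.exp(0.5 * (a_hat / s) ** 2) * integral / width

def max_bayes_factor(a_hat, s):
    """Upper bound, reached by a Dirac delta prior at the best-fit amplitude."""
    return math.exp(0.5 * (a_hat / s) ** 2)

# Hypothetical toy: amplitude preferred at three standard deviations.
a_hat, s = 3.0, 1.0
for width in (5.0, 50.0, 500.0):
    print(f"prior width {width:5.0f}: B = {bayes_factor_flat_prior(a_hat, s, width):.2f}")
print(f"delta-function prior: B = {max_bayes_factor(a_hat, s):.1f}")
```

Once the prior extends well beyond the region supported by the data, the Bayes factor in this toy falls roughly in inverse proportion to the prior width, and it never exceeds the value obtained with a delta-function prior at the best fit.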

IV. CONCLUSIONS
The DAMPE energy spectrum of electrons and positrons contained two interesting features: a spectral break and a monochromatic excess. We performed a Bayesian and frequentist analysis of the features by testing three models: a single power-law, a smoothly-broken power-law, and a smoothly-broken power-law with a signal feature motivated by dark matter annihilation in a nearby subhalo. We found global p-values through 1000 pseudo-experiments, including refits of models with 2, 5 and 8 parameters with evolutionary algorithms. We found Bayesian evidences by nested sampling. The break in the spectrum was significant with frequentist and Bayesian statistics: we bounded the p-value at about 0.1% and the Bayes factor was about $10^{10}$. We expect in fact that the p-value is much smaller than 0.1%; our Monte Carlo may be unsuitable and specialised techniques such as Gross-Vitells [45] may be more appropriate. The excess, on the other hand, was present at $3.6\sigma$ local and $2.3\sigma$ global significance. The Bayes factor was sensitive to our choices of priors for the mass, amplitude and width of the signal, but for our choices favoured a signal by about 2. Thus, whilst intriguing, the excess is not currently compelling. We hope that this serves as an example of using frequentist and Bayesian methods for analysing anomalies in high-energy physics [46].