Examining the time dependence of DAMA's modulation amplitude

If dark matter is composed of weakly interacting particles, Earth's orbital motion may induce a small annual variation in the rate at which these particles interact in a terrestrial detector. The DAMA collaboration has identified at a 9.3$\sigma$ confidence level such an annual modulation in their event rate over two detector iterations, DAMA/NaI and DAMA/LIBRA, each with $\sim7$ years of observations. We statistically examine the time dependence of the modulation amplitudes, which"by eye"appear to be decreasing with time in certain energy ranges. We perform a chi-squared goodness of fit test of the average modulation amplitudes measured\ by the two detector iterations which rejects the hypothesis of a consistent modulation amplitude at greater than 80\%, 96\%, and 99.6\% for the 2--4~keVee, 2--5~keVee and 2--6~keVee energy ranges, respectively. We also find that among the 14 annual cycles there are three $\gtrsim 3\sigma$ departures from the average in the 5-6~keVee energy range. In addition, we examined several phenomenological models for the time dependence of the modulation amplitude. Using a maximum likelihood test, we find that descriptions of the modulation amplitude as decreasing with time are preferred over a constant modulation amplitude at anywhere between 1$\sigma$ and 3$\sigma$, depending on the phenomenological model for the time dependence and the signal energy range considered. A time dependent modulation amplitude is not expected for a dark matter signal, at least for dark matter halo morphologies consistent with the DAMA signal. New data from DAMA/LIBRA--phase2 will certainly aid in determining whether any apparent time dependence is a real effect or a statistical fluctuation.

If dark matter is composed of weakly interacting particles, Earth's orbital motion may induce a small annual variation in the rate at which these particles interact in a terrestrial detector. The DAMA collaboration has identified at a 9.3σ confidence level such an annual modulation in their event rate over two detector iterations, DAMA/NaI and DAMA/LIBRA, each with ∼ 7 years of observations. We statistically examine the time dependence of the modulation amplitudes, which "by eye" appear to be decreasing with time in certain energy ranges. We perform a chi-squared goodness of fit test of the average modulation amplitudes measured by the two detector iterations which rejects the hypothesis of a consistent modulation amplitude at greater than 80%, 96%, and 99.6% for the 2-4 keVee, 2-5 keVee and 2-6 keVee energy ranges, respectively. We also find that among the 14 annual cycles there are three > ∼ 3σ departures from the average in the 5-6 keVee energy range. In addition, we examined several phenomenological models for the time dependence of the modulation amplitude. Using a maximum likelihood test, we find that descriptions of the modulation amplitude as decreasing with time are preferred over a constant modulation amplitude at anywhere between 1σ and 3σ, depending on the phenomenological model for the time dependence and the signal energy range considered. A time dependent modulation amplitude is not expected for a dark matter signal, at least for dark matter halo morphologies consistent with the DAMA signal. New data from DAMA/LIBRA-phase2 will certainly aid in determining whether any apparent time dependence is a real effect or a statistical fluctuation.

I. INTRODUCTION
The nature of dark matter is one of the most compelling mysteries in both cosmology and particle physics. One of the foremost possibilities is that dark matter is a new fundamental particle that is yet to be discovered. There is currently an extensive experimental effort to try to detect the particle nature of dark matter through its interactions with Standard Model particles [1,2]. Among the leading dark matter candidates are Weakly Interacting Massive Particles (WIMPs), a generic class of particles that includes, for example, the supersymmetric neutralino. These particles interact gravitationally and through the weak force, and their expected masses range from O(1) GeV to O(10) TeV. More than thirty years ago, Refs. [3,4] proposed a mechanism for detecting weakly interacting particles, including WIMPs, via coherent elastic scattering with nuclei. Soon after, the dark matter detection rates in the context of a Galactic Halo of WIMPs were computed for the first time, and it was proposed that they could be differentiated from background by looking for an annual modulation of the signal [5].
The DAMA collaboration has constructed one of these "direct detection" experiments to search for the annual modulation of nuclear recoils due to dark matter scattering within NaI scintillation detectors. The original detectors were deployed in 1995 and consisted of 100 kg of NaI [6]. The experiment went through an upgrade in 2002 that included increasing the detector mass to 250 kg and improving the photo-multiplier tubes that detect the scintillation light from the nuclear recoils. Through the seven years that the original DAMA/NaI experiment ran, the total exposure was 1.08 · 10 5 kg-days [7,8]. The upgraded DAMA/LIBRA experiment collected a larger exposure of 3.80 · 10 5 kg-days over the seven years it took data [9][10][11]. The DAMA collaboration has identified at a 9.3σ confidence level an annual modulation in their event rate that is consistent with a dark matter signal. Thus far, no plausible alternative explanation for the modulation signal observed in the DAMA detectors has been universally accepted [12]. Meanwhile, there are three upcoming NaI experiments that will test the DAMA modulation results: SABRE [13], which will consist of two identical experiments located in the northern and southern hemispheres; ANAIS-112 [14,15], which, as of April 2017, is commissioning at the Canfranc Underground Laboratory; and COSINE-100 [16], a joint effort between the DM-Ice [17] and KIMS [18] collaborations, which has been running at Yangyang Underground Laboratory since September 2016. Furthermore, DAMA/LIBRA has been running in an upgraded phase 2 configuration since January 2011 [19], and may also shed light on past DAMA modulation results [20].
In this work we examine the amplitude of the modulation signal observed by the DAMA collaboration, focusing on the possibility of any time dependence of the amplitude over the 14 years of operation of the DAMA/NaI and DAMA/LIBRA experiments. We find that descriptions of the modulation amplitude as decreasing with time are preferred over a constant modulation amplitude at anywhere between 1σ and 3σ, depending on the phenomenological model for the time dependence and the signal energy range considered.
In Section II we remind the reader of the expected form for the modulation signal. In Section III we present our analysis of the modulation amplitudes and their potential time dependence. We discuss the results and draw conclusions in Section IV.

II. ANNUAL MODULATION
If dark matter is composed of a new, as-yet-unknown particle with weak interactions with ordinary matter, then dark matter particles in the galactic halo will pass through the Earth and occasionally scatter on nuclei, causing a nucleus to recoil. Dark matter direct detection experiments aim to observe these rare, low-energy nuclear recoils induced by collisions with a dark matter particle [4]. The diffuse dark matter halo has little bulk rotation so, from the perspective of the Earth as it orbits about the center of the galaxy (along with the rest of the galactic disk) and through the halo, there is a wind of dark matter approaching from the direction of the disk rotation. The form of the expected signal (measured in counts per mass per time per energy) in a direct detection experiment is often written as where ω ≡ 2π/year, S 0 is the time-averaged differential recoil rate, S m is the annual modulation amplitude, and higher-order terms in the expansion are suppressed. The phase of the expansion t 0 is approximately the time of year at which the Earth is moving fastest into the dark matter wind, around the beginning of June for the Standard Halo Model (SHM), a common first-order approximation to the dark matter halo [5,21]. For a recent review of status of the search for the annual modulation signal from dark matter including alternatives to the SHM, see Ref. [22].

III. ANALYSIS
A. DAMA/NaI vs. DAMA/LIBRA In this subsection, we gather the best fit modulation amplitudes for the DAMA/NaI and DAMA/LIBRA exposures, as well as the combined best fit amplitudes for the entire 14 years of operation for which the modulation amplitudes have been published. The DAMA collaboration provides the best fit modulation amplitudes for the 2-4, 2-5, and 2-6 keVee energy ranges. However, we would like to explore the signal in more detail, specifically the higher end of the energy range.
An estimation for the modulation amplitudes in the 4-6, 4-5, and 5-6 keVee energy ranges, which are not provided in the DAMA collaboration publications, can be obtained from the known modulation amplitudes available from the collaboration using the following procedure. For each energy interval, there is a modulation amplitude (S m ), as well as an average efficiency for the detector to detect a recoil (ε). If this interval is split into two ranges, one can write with the subscripts 1 and 2 referring to the two energy ranges, and f 1,2 representing the fraction of the energy interval that each of the two ranges covers. For example, S m and S m 1 could be the published 2-5 keVee and 2-4 keVee amplitudes, and S m 2 could be estimated using Eqn. 2. This relationship comes from the fact that the DAMA experiment is fundamentally a counting experiment with the number of events in each range proportional to the total exposure. Similarly, the uncertainty (σ) can be expressed as 0.0200 ± 0.0032 0.0097 ± 0.0013 0.0112 ± 0.0012 4-6 keVee † 0.0167 ± 0.0042 0.0034 ± 0.0016 0.0054 ± 0.0015 4-5 keVee † 0.0161 ± 0.0062 0.0043 ± 0.0022 0.0062 ± 0.0022 5-6 keVee † 0.0173 ± 0.0056 0.0026 ± 0.0022 0.0047 ± 0.0019 TABLE I: Best-fit modulation amplitudes for the DAMA/NaI, DAMA/LIBRA, and combined exposures. Energy-interval-averaged efficiencies are calculated from the overall efficiencies in Fig. 17 of Ref. [6] and Fig. 26 in Ref. [23]. Modulation amplitudes for the 2-4, 2-5, & 2-6 keVee energy ranges are fits from the DAMA collaboration, taken from Refs. [8,11,23]. ( †) Modulation amplitudes for the 4-6, 4-5, & 5-6 keVee energy ranges are estimates derived from the existing analysis intervals, as described in the text (caveats apply).
with σ 1,2 representing the uncertainty in each of the relevant energy ranges. This procedure assumes the data for the different energy ranges modulate with the same period and phase but statistically independent, i.e. the 2-5 keVee fit is the appropriately weighted average of fits to the 2-4 keVee and 4-5 keVee ranges. In the case where the phase and period are fixed, this assumption is valid. When the phase and period are allowed to vary, however, the assumption is not strictly correct. In looking at the fits provided by the collaboration in Tables 3 and 4 of Ref. [11], we see that the largest difference in modulation amplitude between the fixed versus free fits is at the ∼ 0.5σ level. We thus conclude that the assumption that the energy bins modulate with the smae period and phase is approximately correct to quite good accuracy. This caveat should be kept in mind, however, when examining our results for the derived data in the higher energy ranges.   Table I, under the assumption of a common modulation amplitude. The minimum χ 2 and corresponding p-value for a χ 2 goodness-of-fit are shown, as well as the best-fit common amplitude. ( †) Data for these energy intervals are derived as discussed in the text. Table I presents the results that have been released to date by the DAMA collaboration along with our estimated data for the higher energy ranges. The best-fit modulation amplitudes are listed under the assumption of a fixed period and phase, as well as allowing for a free period and phase. The modulation amplitudes displayed for the 2-4, 2-5, and 2-6 keVee energy ranges are the fits performed by the collaboration as found in Refs. [8,11,23], while the modulation amplitudes for the 4-6, 4-5, and 5-6 keVee energy ranges, as well as the efficiencies, are calculated as explained above, and we use the data in Refs. [6,23] to calculate the average efficiencies.
In comparing the amplitudes for DAMA/NaI to DAMA/LIBRA as shown in Table I, we find that the amplitudes decrease in every energy range. This points towards a possible inconsistency in the results between the two incarnations of the experiment. To test the hypothesis of a time-varying modulation amplitude, we begin by examining the compatibility of the DAMA/NaI and DAMA/LIBRA fit results of Table I, under the assumption of a common modulation amplitude. In Table II, we present the minimum χ 2 and corresponding p-value for a χ 2 goodness-of-fit, as well as the best-fit common amplitude. We find the 2-6 keVee energy range is discrepant at the ∼3σ level for both the fixed and free period and phase. Our estimated data indicates that the majority of the cause of the poor fit is due to events with energies > ∼ 4 keVee, as the 4-6 keVee interval is similarly discrepant. Note that the best fit modulation amplitudes in Table II match very well the values presented by the  collaboration for the combined fits in the third column of Table I, as expected.
We also collect the fits for the per-cycle modulation amplitudes as performed by the DAMA collaboration for the free period and phase in Table III. Unfortunately the fits to the data under the assumption of a fixed period and phase have not been released to the   Figure 3 of Ref. [11] and exposures are taken from Refs. [8,11]. The mean time for each cycle is given relative to January 1, 1995, the first year of DAMA/NaI's operation.
public, and we are thus unable to analyze the data in that case. The mean cycle times and amplitudes are taken from Figure 3 of Ref. [11] and exposures are given in Refs. [8,11]. The mean time for each cycle is relative to January 1, 1995, the first year of DAMA/NaI's operation.
There are many different possible run tests that can be performed on a given set of data to check for consistency. In this context, a run is a group of consecutive identical elements in a two-valued random sequence constructed from the data. A run test checks for runs in the data, in terms of a two-valued characteristic one can assign to each data point. The collaboration has performed run tests to determine whether all data are consistent with the best fit value for the amplitude, and found lower tail probabilities of 41%, 29%, and 23% for the 2-4, 2-5, and 2-6 keVee energy ranges, respectively. We have repeated the DAMA collaboration run tests and reproduced their results (see Table IV; Above and Below Best Fit). Here, we are interested in the consistency between the two iterations of the DAMA experiment, NaI and LIBRA, and we perform several other tests of interest as described below.
For the DAMA data, there are a total of 7 data points for each of the two iterations of the experiment (for each energy bin). To check whether the NaI and LIBRA data are consistent with each other, we rank the full 14 measurements from low to high, keeping track of which experiment made the given measurement. The number of runs is then one plus the number of times in this list the subsequent element changes from one version of the experiment to the other. As an example, when the amplitude values for the 2-6 keVee energy range are sorted in this way, the list would be (LIBRA, LIBRA, LIBRA, LIBRA, LIBRA, NaI, LIBRA, LIBRA, NaI, NaI, NaI, NaI, NaI, NaI), giving a total of four runs in this list. In this version of the run test, the null hypothesis would be that the two distributions are equal, i.e. that the DAMA/NaI and DAMA/LIBRA experiments are measuring the same amplitude in a given energy bin. The alternative hypothesis is that the two distributions are not equal.
Under the null hypothesis, i.e. that the two populations of data are drawn from the same distribution, the probability of obtaining a certain number of runs can be calculated using P (R = 2k) = 2 where R is the number of observed runs, n 1 is the number of values from one population or experiment, and n 2 is the number of values from the other population or experiment, and k is an integer that gives the appropriate number of runs. Similarly, if the number of runs is odd, the probability of obtaining that number of runs can be written as The lower tail probability can then be computed by summing the probability of two runs (there must be at least two runs) up to the actual number of runs observed. To be concrete, there are four runs in the case of the 2-6 keVee energy bin, so the probability would then be summed for obtaining, two, three, and four runs. The results for this run test applied to the data in Table III are collected in Table IV under the heading of Ranked by Experiment. We find that the null hypothesis for the 2-6 keVee energy range would be rejected at the 2σ level, while the 2-4 keVee and 2-5 keVee energy ranges have p-values of 30% and 21%, respectively.
Other possible run tests focus on fluctuations above and below a reference point, i.e. the median, mean, best fit point, etc. In this case, the measurements are put in an array in the order in which they were measured. Then a new list is formed where at each element, a '+' sign or '-' is placed depending on if the measurement is above or below the reference point. The number of runs are then counted in this list and the probabilities calculated exactly as in the previous case with n 1 and n 2 now representing the number of points above and below the reference value. The null hypothesis for this run test is that the data points are randomly fluctuating about the reference value. A run test of this type with reference to the best fit point is employed by the DAMA collaboration, and we reproduce their p-values. The results for these run tests are presented in Table IV.
We also utilized the Kolmogorov-Smirnov (KS) test for the two data sets. The KS test is another non-parametric test in which the null hypothesis is that two data sets are drawn from the same distribution. As can be seen in Table IV, we find that the null hypothesis has p-values of 21% and 5% in the 2-4 keVee and 2-5 keVee energy ranges, respectively, and that  it is rejected at the 2.4σ level in the 2-6 keVee energy range. One possible explanation for these results, which will be explored in the remainder of this paper, is that the modulation amplitude is changing (seemingly decaying) with time.

B. Annual cycles
Given the apparent decrease in the modulation amplitude from DAMA/NaI to DAMA/LIBRA, it is interesting to investigate in detail the time dependence of the modulation amplitude. Here, we postulate several phenomenological models for the time dependence of the modulation amplitude, and perform likelihood fits to investigate whether or not any model is favored by the data.
A visual representation of the deviations of the amplitude in each annual cycle from the average is shown in Figure 1. For each energy interval considered, we plot the deviation in each annual cycle as where S m,k and σ k are the amplitude and uncertainty in the kth annual cycle, respectively, and S m is the uncertainty-weighted average amplitude in the particular energy range over the entire 14 year period. We observe that all the published data (in the 2-4 keVee, 2-5 keVee, and 2-6 keVee energy ranges) fall within 2σ of the average. the 14 annual cycles, there are three > 2.8σ departures from the average in the 5-6 keVee energy range. If the data in this range were gaussian distributed, then the probability of this occurring is 2 · 10 −5 (4.2σ). This seemingly unlikely situation is an indication that there may be a problem with the data in this energy range.
To describe a potential time dependence of the modulation amplitude, we have performed likelihood tests for the following models: a constant modulation amplitude, separate constants for NaI and LIBRA, linear in time, exponential in time, and a broken exponential (a common exponential scale, but independent normalizations for NaI and LIBRA). We estimate the per-cycle modulation amplitudes for the 4-6, 4-5, and 5-6 keVee energy ranges using Eqn. (2) and Eqn. (3) and these values are shown in Table V. Fits for these models to the per-cycle DAMA modulation amplitude data from Table III and Table V are presented  in Table VI. The minimum χ 2 and corresponding p-value for a χ 2 goodness-of-fit are shown. Best-fit parameters are found in the final column and parameters not shown are profiled over. We have highlighted in red any hypotheses that are discrepant at ≥ 2.5σ.
Based on the goodness of fit test, we find that all the models we investigated (including the single constant amplitude) provide reasonable fits to each of the energy ranges, with the exception of the 5-6 keVee range. This range is excluded at nearly 4σ by the goodness of fit test for every model we studied. This is the quantitative conclusion regarding the odd  The per-cycle modulation amplitudes for the 4-6, 4-5, and 5-6 keVee energy ranges, derived from the data in Table III as described in the text. Exposures and mean times for each cycle are the same as in Table III.
behavior in the 5-6 keVee range observed in Figure 1. Finally, we compared models (hypothesis tests) against a single, constant modulation amplitude taken as the null hypothesis. The results are displayed in Table VI with the improvement in fit given by the ∆χ 2 along with the corresponding p-value. The improvements in fit for the 2-4 keVee energy range are marginal (1.0σ to 1.6σ), indicating consistency in this energy range. We determine, however, that all time-varying models we investigated are preferred over a constant amplitude in the 2-6 keVee energy range at levels from 2.3σ to 2.7σ. As the 5-6 keVee data may be somewhat suspect, we find that the alternative models are still preferred over a constant amplitude in the 2-5 keVee range at the level of 2.0σ to 2.7σ. Although the data clearly indicate that some form of time dependence is preferred, none of the alternative models is clearly better than the others. Unfortunately, the data is not capable of distinguishing between the different functional forms proposed here to model the time dependence of the modulation amplitude.

IV. DISCUSSION AND CONCLUSIONS
In this study, we have analyzed the annual modulation data released by the DAMA collaboration that they collected over 14 annual cycles and performed likelihood analyses for several phenomenological models to explore a possible time dependence in the modulation amplitude. Our results indicate that all of the models we examined are preferred over a  Table III and Table V. The minimum χ 2 and corresponding p-value for a χ 2 goodness-of-fit are shown, followed by the ∆χ 2 and corresponding p-value for a hypothesis test where the conventional constant modulation amplitude is taken as the null hypothesis. Best-fit parameters are shown in the final column; parameters not shown are profiled over. We have highlighted in red any hypotheses that are discrepant at ≥ 2.5σ. A differential rate unit (dru) is equal to 1 count/kg/day/keVee. ( †) Data for these energy intervals are derived as discussed in the text.
constant amplitude at up to ∼3σ. Although the data clearly prefer some form of time dependence, they are currently incapable of distinguishing between the different functional forms for the time dependence we investigated. Though we have identified the fact that the modulation amplitude is discrepant at the 2-3σ level for each of the two versions of the experiment in all but the 2-4 keVee energy range, we do not propose a physical explanation for the phenomenon. More data will certainly aid in determining whether this is a real effect or a statistical fluctuation. The DAMA/LIBRA experiment has undergone a significant upgrade, and has been taking data in the DAMA/LIBRA-phase 2 configuration since January 2011 [19]. In November of 2010, all of the photo-multiplier tubes (PMT's) in the DAMA/LIBRA experiment were replaced by high quantum efficiency PMT's [24]. The anticipated lower thresholds (∼1 keVee) will give access to a new signal range that should help to clarify the dark matter interpretation of the DAMA signal, as described in Ref [20].
It is also interesting to note that the half life for the exponential decay model in the 2-5 keVee energy range is about 11 years. If the collaboration releases data again in the near future, both the linear and exponential decay models would predict a noticeable decrease in the modulation amplitude, somewhere in the neighborhood of 50%. If this trend continues, this would have serious implications for the dark matter interpretation of the DAMA modulation.
In this study we have attempted to answer the question of whether the DAMA collaboration data demonstrates any time dependence, and the answer to that question appears to be "yes," at the 2-3σ significance level in all but the 2-4 keVee energy range. A more conclusive answer will of course require additional data, which we look forward to in the near future, from the DAMA collaboration as well as from ANAIS-112, COSINE-100, and SABRE, each of which will test the DAMA modulation results. Furthermore, given the questions raised here as well as the tension of the DAMA results with results from other dark matter direct detection experiments, we urge the DAMA collaboration and other experimental collaborations to make enough of their data public (i.e. providing time-stamped events) in order for researchers to repeat their analyses to encourage scrutiny and, with any luck, a consistent explanation for the observed phenomena, dark matter or otherwise.