Quantifying the evidence for dark matter in CoGeNT data

We perform an independent analysis of data from the CoGeNT direct detection experiment to quantify the evidence for dark matter recoils. We critically re-examine the assumptions that enter the analysis, focusing specifically on the separation of bulk and surface events, the latter of which constitute a large background. This separation is performed using the event rise-time, with the surface events being slower on average. We fit the rise-time distributions for the bulk and surface events with a log-normal and Pareto distribution (which gives a better fit to the tail in the bulk population at high rise-times) and account for the energy-dependence of the bulk fraction using a cubic spline. Using Bayesian and frequentist techniques and additionally investigating the effect of varying the rise-time cut, the bulk background spectrum and bin-sizes, we conclude that the CoGeNT data show a preference for light dark matter recoils at less than 1σ.


Introduction
In many theories of particle dark matter (DM), the DM particle has a small interaction with the Standard Model particles.Direct detection experiments are searching for evidence of this interaction.Of particular interest in the past few years have been the results from the DAMA/LIBRA [1], CoGeNT [2][3][4][5], CRESST-II [6,7] and CDMS-Si [8] experiments, which have observed an excess number of events above naive background expectations.These experiments have been interpreted as providing evidence for DM elastically scattering on nucleons with a mass around 10 GeV.However, this interpretation is not supported by the results of CDEX [9], CDMS-II [10], EDELWEISS-II [11], LUX [12], SuperCDMS [13], XENON10 [14,15] and XENON100 [16], for example, thus calling for a thorough investigation of available data.
The CoGeNT collaboration have publicly released their 1129 live-days dataset, which contains events spanning the period between 4th December 2009 and 23rd April 2013 [5].
This dataset shows an exponential rise in the number of events at energies below 1 keV ee , where a low-mass DM recoil signal is expected to occur.The CoGeNT dataset also shows ∼ 2.2σ evidence for the expected ∼ 10% annual modulation in the event rate induced by DM recoils [17,18].This may also be consistent with a DM recoil signal, but appears to require significant departures from a Maxwellian velocity distribution [5,[19][20][21][22][23].In light of this, in this paper we focus only on the unmodulated data, which exhibits the low-energy exponential rise, and leave the analysis of the much smaller annual modulation signal to future work.Our aim here is to quantify the evidence for a DM signal in the unmodulated dataset.
A limitation of the CoGeNT experiment is that the data contains both bulk and surface events; the surface events constitute a background that mimics a DM recoil signal at low energy so should be removed before searching for a DM signal.In section 2 we discuss how the bulk and surface events can be distinguished by their 'rise-time' since surface events typically have a longer rise-time than bulk events.Unfortunately, the populations of bulk and surface events overlap significantly at low-energy because the difference between the bulk and surface events rise-time is small.This means that in the low-energy region of interest for DM searches, only a fraction of the observed events are bulk events; the fraction of bulk events (the 'bulk fraction') must be determined before the evidence for DM can be quantified.
In section 3 we characterise the bulk fraction as a function of energy using cubic splines (owing to the absence of a theoretically motivated function).This allows us to robustly incorporate the bulk fraction uncertainties in our Bayesian and frequentist analyses.Following previous analyses [4,5,24] we first model the bulk event population with a log-normal distribution.We then investigate fits with a Pareto distribution that gives a better fit to pulser data (which mimic bulk events) at larger values of the rise-time.We also test the robustness of our conclusions by considering variations in the DM search region (by investigating alternative rise-time cuts), the bulk background spectrum and bin-sizes.
As our conclusions differ from the results of the CoGeNT analysis in ref. [4], in section 4 we reanalyse the older 807 live-days dataset used in that analysis.We summarise all of our findings in section 5 and conclude in section 6.A number of appendices give cross-checks of our results and technical details related to our analysis.

Surface and bulk events
The CoGeNT experiment employs a detector based on a p-type germanium semiconductor.The bulk of the detector is a pure p-type semiconductor while the surface is an inert 'dead layer' (this is the location of the electrical contact for the detector).Between the dead layer and the bulk is a ∼ 1 mm thick 'transition layer'.As the transition region is on the outside of the detector module, we follow the CoGeNT collaboration and define events occurring in the transition region as 'surface events'.
In this transition region, the charge collection efficiency is less than one.Because of this, the reconstructed energy of surface events will be smaller than the true energy of the incoming particle.Specifically, this implies that high energy gamma-rays (or other backgrounds) which scatter in the transition region will be reconstructed as low energy events, resulting in a buildup of low-energy events [24].This is a problem because the signal from elastically scattering DM particles also has a characteristic rise at low energies so surface events can mimic a DM signal.
The transition region is only ∼ 1 mm thick compared to the cm scale bulk region so the volume of the transition region is much smaller than that of the bulk.Since DM particles . Bulk (red solid line) and surface (blue solid line) event distributions as a function of rise-time for energies between 0.5 keV ee and 0.9 keV ee .The pulser data (cyan points) mimics the behaviour of bulk events.In the upper panel both populations are fit with log-normal distributions, while in the lower panel the bulk population is fit with a Pareto distribution.Both the log-normal and Pareto bulk distributions (when summed with the surface distribution) give good fits to the CoGeNT data (black points) but the slower fall off of the Pareto distribution at large rise-time is better able to fit the pulser data.
are weakly-interacting, they are significantly more likely to scatter in the larger bulk region.Hence, the removal of the surface events will reduce the number of background events that mimic a DM signal while leaving the DM rate essentially unaffected.For these reasons, to search for a DM signal it is imperative that the surface event population is separated and removed from the bulk event population.This separation is possible because surface events are characterised by a longer duration (the 'rise-time') than those in the bulk.Previous analyses [4,5,24] assumed that the bulk and surface event populations both follow a log-normal distribution.This assumption is motivated by the fact that the build up of charge carriers in the detector after an energy deposit is geometric i.e. more charge carriers implies a faster rate of increase.The overlap of the two population is energy dependent and is clearer at higher energies (see [4,24]).The black data points in figure 1 show the rise-time distribution of events in the low-energy range between 0.5 keV ee and 0.9 keV ee (the range of interest for DM searches).Here the separation between bulk and surface events is not as clear.The upper panel of figure 1 shows that the sum of the log-normal distributions from the bulk events (red solid line) and the surface events (blue solid line) give a good fit to the CoGeNT data points.
Also shown in figure 1 is the rise-time distribution from fast electronic pulser events, which mimic the behaviour of bulk events [25].The electronic pulser data is shown by the cyan data points (after a suitable renormalisation).The upper panel of figure 1 shows that these events are poorly fitted by the log-normal distribution because of the large tail of events at high rise-times.This motivates us to consider other models of the bulk event distribution.Specifically, we consider a generalised Pareto distribution whose PDF we give in appendix A. This distribution has a parameter that allows a better fit to the tail of the pulser data, as shown in the lower panel of figure 1.
Both the log-normal and Pareto fits for the bulk events (when summed with the lognormal fit to the surface events) give a comparable quality of fit to the CoGeNT data.For example, the fit with the bulk events modelled with a Pareto distribution gives a χ 2 = 37.3, while with the log-normal gives χ 2 = 36.6.
In light of this, in the remainder of our analysis, we consider both the log-normal and Pareto distribution for the bulk rise-time distribution.We always fit the surface events with a log-normal distribution, as we are primarily concerned with the overlap of the bulk and surface event distributions; this occurs at smaller rise-times when the difference between the log-normal and Pareto distributions is small.Finally, the pulser data is used only as a motivation to consider the Pareto distribution: we do not fit to it in our analysis, fitting only to the CoGeNT data itself.

Analysis of the full 1129 live-days dataset
Having demonstrated how to separate bulk events from surface events (which are dominantly background events that rise at low-energy), in this section we discuss our analysis of the full 1129 live-days CoGeNT dataset.Initially, we describe our procedure to remove the surface events leaving the bulk-only event spectrum.Performing a fit to this spectrum will allow us to quantify the evidence for DM in the CoGeNT data.We begin by assuming that the bulk population is modelled by a log-normal distribution and take an energy-independent rise-time cut τ cut = 5 µs.This value is conservative as it captures the majority of the total number of events in the 0.5 to 0.9 keV ee energy bin, where the DM signal is maximal (cf.figure 1).Our analysis procedure with these assumptions is presented in sections 3.1 to 3.3.We then consider variations in these assumptions to test the robustness of our results: in section 3.4 we model the bulk distribution with a Pareto distribution; in sections 3.5 and 3.6 we take an energy-independent rise-time cut at τ cut = 0.4 µs and an energy-dependent rise-time cut respectively.

Fitting the bulk fraction
As figure 1 demonstrates, in a given energy range only a fraction of the events below the rise-time cut τ cut = 5 µs are bulk events.We refer to this fraction of events as the 'bulk fraction' R. To remove the surface events contamination from the measured spectrum, we need to determine the bulk fraction as a function of energy.To do this, we first determine R j in equally spaced energy bins E j between 0.5 and 2.9 keV ee and then fit a function R(E) to these values.In each bin j, we find the best-fit log-normal distributions for the surface 0.5  Figure 2. The fraction of bulk events ('bulk fraction') obtained from log-normal fits to the rise-time data.The red solid line is the best-fit cubic spline for the knot positions indicated by the black stars.The green band represents the 68% uncertainty.At low energies only around 20% of the events are from the bulk of the detector.This rises to ∼ 40% at higher energies.and bulk populations (as in section 2) and integrate up to the rise-time cut τ cut to calculate the bulk fraction Number of bulk events < τ cut Total number of events < τ cut j .
The value of R j and its 1σ uncertainty σ j for each bin are shown in figure 2. We have used a bin width of 0.4 keV ee to obtain six values of R j .The error-bars on R j arise from the uncertainty in the six parameters used to fit the two log-normal distributions.In appendix C we have checked that our choice of 0.4 keV ee for the bin width does not bias our results.
Given that there is no theoretically motivated function for R(E), we determine it with a cubic spline fit.This is similar to the method used by the XENON100 collaboration to parameterise the functional form of the relative scintillation efficiency L eff [16,26,27].The result of our cubic spline fit is shown in figure 2. The red solid curve indicates the best-fit cubic spline along with the associated 1σ uncertainty band (containing all splines within 1σ of the best-fit) for the choice of knots shown by the black stars.We have performed this fit using a Gaussian likelihood function where d R refers to the bulk fraction data, R j is the bulk fraction and its uncertainty σ j for each energy bin E j and R(E j ) is the value of the spline at that point.The best-fit spline is the one which maximises this likelihood.The more a spline R(E) deviates from the data, the smaller its likelihood will be.
The cubic spline fit shown in figure 2 is for fixed knot positions.In the remainder of our analysis we also allow the energy of the two lowest-energy knots to vary.The numerical implementation of this is discussed in appendix E. The energy position of the other knots makes little difference to our analysis as they are beyond the energy region of interest for DM recoils.Therefore we do not consider variations in their position.Figure 3.A graphical demonstration of the procedure to remove the surface events.Upon multiplying the total spectrum (green solid) by the bulk fraction R(E), we are left with the red dashed spectrum, which contains only bulk events.After removing the L-shell peak at ∼ 1.3 keV ee , the resulting blue dashed spectrum is used to search for a DM signal.

Removing the surface events
We now use the function R(E) to remove the surface event contamination, leaving a spectrum of purely bulk events.This procedure is shown in figure 3 using the best-fit spline from figure 2 for R(E).The spectrum of surface and bulk events (green solid line) is multiplied by the bulk-fraction R(E) to obtain the bulk-only spectrum (red dashed line).We then remove the L-shell peak at E ∼ 1.3 keV ee , as discussed in appendix B.1, to leave the blue dotted spectrum.This is the spectrum which can be analysed for a DM signal.However, to robustly quantify the evidence for a DM signal, we should not only consider the best-fit spline but also all the other splines within e.g. the 1σ or 2σ uncertainty bands.Since each spline implies a different bulk fraction, the resulting bulk-only spectrum associated with all these splines are different from that obtained using the best-fit spline.Hence, the evidence for a DM recoil signal depends strongly the choice of spline fit for R(E) (especially at energies around 0.5 keV ee ).We therefore need to marginalise over all splines to obtain a robust unbiased result.

Marginalising over the bulk fraction
A robust (unbiased) analysis should be independent of the choice of spline for R(E).Therefore, we consider all possible R(E) splines by marginalising over these different choices.We start by defining the posterior distribution P(m, σ|d), which tells us how much the data prefer a given DM recoil signal over background for all choices of R(E).This is defined using Bayes' theorem, P(m, σ|d)P(d) = P(d|m, σ)P(m, σ) where d = d E + d R refers to the sum of the data from the bulk energy spectrum and bulk fraction respectively, P(m, σ) ≡ P(σ)P(m) is the prior for the theoretical parameters m and σ, the mass and cross-section of the DM particle, and P(R(E)) is the prior for R(E).The form of the prior P(R(E)) is determined from the likelihood from the spline fits of section 3.1.
The function P(d E |m, σ, R) is the likelihood for the bulk-only data and it tells us how much the data prefer a DM recoil signal compared to the background-only scenario.If the data prefer a DM+background fit over the background-only scenario, then this should result in a peak in this likelihood at the preferred value of the cross-section σ.This likelihood takes the form of a Poisson function, where i runs over the N energy bins (of width 0.05 keV ee ), n i is the number of events in each bin and λ i = f i (m, σ) + b i is the sum of signal f (m, σ) and background b in each bin.For the signal, we use the standard DM elastic recoil spectrum (described in appendix B.2) and the background refers to that from bulk events only; we assume that all surface events are removed by the bulk fraction function R(E).In our analysis we use the spectral shape of CoGeNT's background estimate given in [4], corrected for the increased exposure.Appendix B.1 shows that we obtain similar results with an alternate background estimate from [24] (cf.figures 4 and 15).Our likelihood analysis is performed over the energy range 0.5 to 3.0 keV ee , the same range we considered when determining R(E) (cf.figure 2).We investigate variations in this energy range in appendix D and find that the results are robust against these variations.For practical purposes, our marginalisation reduces to the following discrete sum, where we sum over all splines.Appendix E describes in detail our numerical implementation of this procedure.For a particular choice of spline, P(d E |m, σ, R(E)) gives the quality of the DM recoil signal fit to the bulk energy spectrum.This is weighted by P(d R |R(E)), which tells us how well this spline fits the bulk fraction R from our rise-time fits.We assume that P(m, σ) is flat so that all values of m and σ are considered equally.The process used to calculate P(m, σ|d) for m = 10 GeV is shown graphically in figure 4 for three individual splines choices.Each spline is the result of a fit to the bulk fraction shown in the left panel and leads to a different spectrum of bulk only events, shown in the central panel.One immediate observation is that choosing spline 3 (pink dotted line) implies a considerably larger number of bulk events at low energy than the choice of spline 2 (blue solid line) or spline 1 (red dashed line).As a result, spline 3 leads to a low energy excess of bulk events over the background estimate, shown by the solid black line.This excess is compatible with a DM signal that has a cross-section σ 9 × 10 −42 cm 2 (central panel).
For each of these splines, we show the product in the right panel.Spline 3 results in a large excess above the background estimate at low-energy (central panel) so its likelihood peaks around σ 9 × 10 −42 cm 2 (right panel) indicating a preference for a DM signal with this cross-section.However, the significance of this result is down-weighted by the prior from the spline fit, since spline 3 gives a poor fit to the risetime data (left panel).In contrast, spline 2 gives no low-energy excess above background (central panel) and the likelihood, which is largest for vanishingly small cross-sections, shows no preference for a DM signal.Spline 2 provides a good fit to the rise-time data in the left panel resulting in the large posterior probability.Spline 1 results in a small low-energy excess (central panel) but the shallow peak in the right panel indicates that the preference for a DM recoil signal with σ 3 × 10 −42 cm 2 is weak.The posterior distribution is smaller in comparison to spline 2 because the fit to the rise-time data is poorer (left panel).
Marginalising over all splines (including those not shown in the left panel) results in the marginalised posterior (black curve) in the right panel.The absence of any peak (for non-zero values of σ) shows that there is no preferred value for the DM cross-section after marginalisation.This demonstrates that although a specific choice of R(E) may give a preference for a DM recoil signal, robustly accounting for all reasonable choices via marginalisation may washout the preferred cross-section picked out by a specific choice.
The results above are for m = 10 GeV.We can easily extend our analysis to cover all values of mass m.After doing so and extending our marginalisation to over 40, 000 splines, our marginalised posterior gives a Bayes factor of ln B = −0.69.This implies a weak preference for the background-only scenario (the interpretation of ln B is discussed further in appendix F).In addition, we have performed an equivalent frequentist analysis, replacing the marginalisation by a profile likelihood procedure.This results in a p-value of p = 0.36, implying that the data prefers the DM signal over the background only hypothesis at less than 1σ.

Analysis with the Pareto distribution
As discussed in section 2, the bulk population of events may be better described by a Pareto distribution as it gives a better fit to the pulser data at large rise-times (cf.figure 1).We  now perform the same analysis described in sections 3.1 to 3.3 except we model the bulk population with a Pareto distribution rather than a log-normal distribution.This modifies the values of R j for each energy bin E j and so affects the energy-dependence of R(E).
The values of R along with the results of a cubic spline fit are shown in figure 5 (analogous to figure 2 for the log-normal distribution).The slower fall-off of the Pareto distribution means that there are a larger number of bulk events at large rise-times compared to the log-normal distribution.This means that the values of R are larger than their log-normal Corrected bulk fraction Figure 7. Spectrum of CoGeNT events with rise-times below 0.4 µs.We also show the corrected bulk fraction R c (E), which, when applied to the data accounts for surface events as before, and also for bulk events which have been removed by the rise-time cut.
counterparts.However, the overall shape of R(E), including the drop at low-energy, is similar to that found in figure 2.
The result of our analysis when a Pareto distribution is used is shown in figure 6.Again, although a specific form of R(E) (such as spline 3) may result in a low-energy spectrum that favours a DM recoil signal, others (such as spline 1) do not.The marginalised posterior function (black solid line in right panel) gives the result including the sum over all splines.As before, the marginalised posterior is flat indicating vanishing evidence for a DM signal.Scanning over all masses results in a Bayes factor of ln B = −0.94 and a p-value of p = 0.33.The Bayes factor implies a weak preference for the background only model while the p-value implies that the data prefers the DM signal over the background only hypothesis at less than 1σ.

A smaller rise-time cut
In the previous analysis, we assumed a conservative rise-time cut τ cut = 5 µs that meant that the majority of all events were included in our analysis (cf.figure 1).We now investigate the effect of choosing a smaller value for τ cut in order to directly remove some of the surface events that have a rise-time above this value.This is the approach taken by the CDEX collaboration in their analysis; they choose τ cut = 0.8 µs [9].
In the analysis presented in this section we choose τ cut = 0.4 µs as this cuts away around 99.5% of the surface events while leaving around 40% of the bulk events, assuming the bulk is modelled by a log-normal distribution.The choice for τ cut is somewhat arbitrary but we have tested our analysis with other cuts and reach similar conclusions.
The resulting spectrum after applying the cut τ cut = 0.4 µs is shown by the solid green curve in figure 7.As before, we must correct this spectrum to account for surface events that fall below the timing cut.Additionally, we must also include a cut-efficiency term to correct Figure 8. Analysis procedure for a 10 GeV DM particle after removing all events with rise-time above 0.4 µs.This cut removes over 99% of the surface events.The left panel shows the corrected bulk fraction data (in green) together with three example spline fits.As can be seen in the central panel, there are no plausible forms of the corrected bulk fraction which fully restore the low-energy excess, giving a featureless marginalised posterior in the right-panel.
for the fact that we have removed bulk events.Hence, we define the corrected bulk fraction = Number of bulk events Total number of events < τ cut , where the first term in equation (3.7) is the same as equation (3.1), while the second term is the cut-efficiency term.We obtain a functional form for R c (E) with a cubic spline fit, analogous to the method used to determine R(E).An example of applying R c (E) to the data is shown in figure 7.In this case, R c (E) (black dashed line) is larger than unity because of the cut-efficiency term.The blue dashed line in figure 7 shows the corrected spectrum (after also subtracting the L-shell peak), which is comparable to the corrected spectrum shown in figure 3 for τ cut = 5 µs.
Figure 8 shows the results of our analysis when τ cut = 0.4 µs, for m = 10 GeV.We assume that the bulk spectrum is modelled by a log-normal distribution.The green data points in the left panel are our results for R c and the lines show three example spline fits.
The lines in the central panel show the corresponding number of bulk events for each spline choice.The dips at ∼ 0.5 keV ee and ∼ 0.7 keV ee are simply a reflection of the reduced number of events measured by CoGeNT at these energies (they are clearly seen in the total spectrum (green line) in figure 7).This rise-time cut has effectively removed the vast majority of the surface events and the resulting corrected spectrum shows no clear low energy excess.This is suggestive that the low energy excess arises from surface events.In this case, only the extreme splines (e.g.spline 2) results in a preference for a DM recoil signal while the marginalised posterior (black line in right panel) is flat, indicating no preference for non-zero σ.
After scanning over all DM masses, we obtain a Bayes factor ln B = −1.02and a p-value of p = 0.63.As before, the Bayes factor implies a weak preference for the background only model while the p-factor implies that the data prefers the DM signal over the background at less than 1σ.

Energy [keV ee ]
Rise-time cut τ cut [µs] 0.5 -0.9 0.68 0.9 -1.3 0.84 1.3 -1.7 0.62 1.7 -2.1 0.64 2.1 -2.5 0.59 Table 1.The rise-time cut τ cut for each energy bin.The cut is defined as the time where the best-fit surface and bulk distributions are equal (assuming a log-normal fit to the bulk events).All events with rise-times above this value are removed, leaving mostly bulk events only.

An energy-dependent rise-time cut
Instead of the constant rise-time cut used in the previous section, we now use a rise-time cut τ cut which varies with energy.This is closer to the procedure followed by the CoGeNT collaboration in [2][3][4][5].We define our rise-time cut by the time at which the best-fit bulk and surface populations cross.Since the surface events are present predominantly above this cut-off, we will be left with mostly bulk events.For example, for the energy-bin between 0.5 keV ee and 0.9 keV ee , we see from figure 1 that the distributions cross at 0.68 µs (assuming a log-normal for the bulk distribution).This cut removes around 96% of the surface events while maintaining around 80% of the bulk events.Table 1 lists all of the rise-time cuts for each energy bin used in our analysis.The result of our analysis using the energy-dependent rise-time cuts in table 1 are shown in figure 9.The left panel shows the values of R c along with three example spline-fits.These results are in full agreement with our previous findings: splines which give a poor fit to the corrected bulk fraction result in some preference for a DM signal (spline 1 and spline 3) 0.5 CoGeNT Exponential (α =1.10) Figure 10.Data for the bulk-fraction (from [4]).The purple lines show the best-fit exponential used by CoGeNT and its 1σ error range.Along with this, we show the best-fit cubic spline (red solid) and its associated 68% uncertainty region (green band).
while the better fitting spline results in no preference (spline 2).The absence of any peak in the marginalised posterior indicates that there is no preference for a DM signal when all splines are marginalised over.After including all values of m in our analysis, we obtain a Bayes factor of ln B = −0.83and a p-value of p = 0.52, implying that the data has a weak preference for the background only hypothesis (from the Bayes factor) or that the preference for the DM signal over the background is less than 1σ (from the p-value).

Analysis of the 807 live-days dataset
All of the analyses in the previous section showed that the preference for a DM signal in the full CoGeNT dataset is non-existent or small; the Bayes factors indicated a slight preference for the background only model while the p-value indicated a preference for DM at less than 1σ significance.This seems to be at odds with previous results presented by the CoGeNT collaboration, where an irreducible low-energy excess was found even after correcting for the surface event contamination below the rise-time cut.This low-energy excess was interpreted as evidence for DM with mass 7-10 GeV and cross-section ∼ 3 × 10 −41 cm 2 [4].In order to show why our results differs from theirs, we also analyse the CoGeNT 807 live-days dataset.Unlike in previous sections, we now use the bulk fraction R determined directly by the CoGeNT collaboration in ref. [4].We have reproduced these values in figure 10.In their analysis, the CoGeNT collaboration employed an energy-dependent rise-time cut that results in a smoother drop-off in the bulk fraction at low energy (cf.figures 2 and 10).
To derive their 'region of interest', the CoGeNT collaboration fitted a one-parameter exponential of the form R CoGeNT (E) = 1 − exp[−α • E] to the bulk-fraction data R, obtaining α = 1.21 ± 0.11.The purple dashed and dotted lines show the best-fit exponential and its error bars in figure 10.The one-parameter fit is determined by a fit to 12 data points but the fit is dominated by the small error bars at higher-energy.This means that the range of The lines in the central panel show the resulting bulk-only spectrum for each of these functions, together with the background estimate (black) and the number of events expected from a DM signal+background (green).A 'low-energy excess' is produced for a limited choice of the fits consistent with R. For the cubic splines, the excess is either reduced or has vanished relative to the CoGeNT exponential (central panel).The right-hand panel shows that other functional choices give much lower evidence for a DM signal, than the exponential used by CoGeNT.When marginalising over all possible spline choices (black solid line), the signal is washed out leaving vanishing evidence for DM recoils in the CoGeNT 807 live-days data.
allowed values from the exponential fit is several factors smaller than the errors on the data at the lowest energy, precisely the energy range relevant for an 8 GeV DM particle.As there is no a priori reason to fit the data with an exponential function over another function, we instead use a cubic spline.The result of this fit together with the 1σ uncertainty is indicated by the green-band in figure 10.The cubic spline fit allows the error on R(E) to grow at lower energies, matching more closely the behaviour of the data points.
To examine the extent to which this choice of exponential function inclines the analysis towards a DM recoil fit, we repeat the analysis of section 3.3 using the 807 live-days data using the bulk-fraction data shown in figure 10.We follow the analysis cut scheme described in [4].The results of the analysis for CoGeNT's exponential fit and two alternative cubic spline fits are shown in figure 11.
The central panel shows that the CoGeNT exponential fit results in a low energy excess above the background estimate.This excess is consistent with a DM particle with a mass 8 GeV and cross-section σ = 2.8 × 10 −41 cm 2 .The strongly peaked likelihood in the right panel indicates that there is a strong preference for a DM signal and we obtain the same 'region of interest' as the CoGeNT collaboration found in [4] (red region in figure 12).Now, the blue solid spline shown in the left panel of figure 11 is just as plausible a function for R(E) (it gives a slightly better fit compared to the CoGeNT exponential).In contrast, this results in no low-energy excess in the central panel and therefore no preference for a DM recoil signal (right panel).As with the 1129 live-days data, we should marginalise over the form of R(E) so as not to bias our result by a particular functional form.This result is shown by the solid black line in the right panel of figure 11.It shows that there is no strong preference for a DM recoil over the bulk background, leading to a Bayes factor of  ln B = −0.76,consistent with the values we found in section 3. Hence, we conclude that the CoGeNT 'region of interest' is the result of a bias from the choice of the exponential function for R(E); equally valid functional forms of R(E) (since there is no theoretically motivated function) result in no excess.The uncertainties from the surface event contamination were not fully accounted for at low-energy when deriving the 'region of interest' in [4].

Summary of results
Table 2 summarises the results of our Bayesian and frequentist analyses of the 1129 live-days dataset performed in section 3.In these analyses, we modelled the bulk population by a lognormal and Pareto distribution and considered three variations in the rise-time cut τ cut .The results in the 'Log-normal' column correspond to the analyses in sections 3.3, 3.5 and 3.6, where τ cut = 5 µs, τ cut = 0.4 µs and was energy-dependent respectively.The first result in the 'Pareto' column corresponds to the analysis in section 3.4, where τ cut = 5 µs.For completeness, we also list the results for the Pareto distribution when τ cut = 0.4 µs and when it is energy-dependent.These analyses were not shown in section 3 but the methodology is the same as the other analyses.
All of these analyses give consistent results for the p-values and Bayes factor B from the CoGeNT 1129 live-days dataset.The Bayes factor indicates a slight preference for the background-only model while the p-value indicates a preference for DM at less than 1σ.These results are robust against changes in the bulk background estimate and the bin-size used to calculate R(E), as we demonstrate in appendices B.1 and C respectively.
Given we find no strong preference for a DM signal above background from the CoGeNT data, we proceed to derive a 90% upper limit on the DM-nucleon cross-section.Our limits are defined by integrating under the normalised posterior from σ = 0 up to the limiting value σ L , such that 90% of the posterior is contained between these two values.
We first reproduce the earlier CoGeNT 'region of interest' from the 807 live-days data, found using their exponential fit for R(E) (we also marginalise over their quoted uncertainties, which broadens the region slightly).This is shown as the red contour in figure 12.The purple

CoGeNT exponential (this work) 807 days CoGeNT limit (this work) 1136 days CoGeNT limit (this work) LUX SuperCDMS
Figure 12.The red contour shows the 90% best-fit contour from the 807 live-days data, obtained using CoGeNT's exponential fit to the bulk-fraction, which biases the analysis.The purple line is the 90% limit from the 807 live-days data after marginalising over all cubic spline fits.The blue line is the 90% limit from the full 1129 live-days dataset after marginalising over all cubic splines, and assuming a log-normal distribution for the bulk population and τ cut = 5 µs.Also shown for comparison are the 90% limits from LUX and SuperCDMS.CoGeNT sets the strongest limit in the range 4.5 GeV to 6.5 GeV.
dashed line shows the 90% upper limit from this data when marginalising over all cubic spline fits to R(E).The solid blue line shows the 90% upper limit from our analysis of the 1129 live-days data after marginalising over all cubic spline fits to R(E).We model the bulk population with a log-normal distribution and impose τ cut = 5 µs.This limit is competitive with the low-mass limits from the SuperCDMS and LUX experiments (orange and green curves respectively).The 1129 live-days CoGeNT limit is the strongest limit in the narrow window between 4.5 GeV and 6.5 GeV.

Conclusions
We have presented an independent analysis of the most recent 1129 live-days dataset from the CoGeNT experiment.The data contain both bulk and surface events; DM events are dominantly bulk events while the surface events constitute a background which mimics a DM recoil signal at low energy, so should be removed.
Surface and bulk events can be discriminated by their rise-time (see figure 1).This allows the surface events to be subtracted by using the bulk fraction R(E) (see figure 3).Owing to the absence of any theoretically motivated functional form for R(E), we parameterised the energy dependence of R(E) with cubic splines (see figure 2).Using these cubic splines, we found that an excess above background arises only for specific choices of the bulk fraction R(E), while other equally plausible spline choices give no excess.We demonstrated this repeatedly in section 3 (see e.g.figure 4).A robust (unbiased) analysis should therefore be independent of the choice for R(E).We accomplished this by marginalising over all splines; we integrated out R(E) using Bayesian and frequentist methods and found less than 1σ evidence for a DM recoil signal in CoGeNT data, above the known backgrounds.
To test the robustness of our results, we demonstrated that we obtain the same conclusions under different assumptions.We first considered an alternative model for the rise-time of the bulk events.The Pareto distribution provides a better fit to pulser data at high risetimes (see figure 1) but the resulting preference for DM from the data is still less than 1σ.This is shown graphically in figure 6.Additionally, we considered variations in the rise-time cut: an energy-independent cut at 0.4 µs that removes the vast majority of surface events and an energy-dependent cut.In both cases, we found that DM is preferred over background at less than 1σ significance (see figures 8 and 9).
Furthermore, we have analysed the 807 live-days data that was used by the CoGeNT collaboration to define a 'region of interest' in the plane of DM mass and cross-section [4].We showed that this region of interest is the result of a bias coming from the choice of the exponential functional form for R(E).We demonstrated in figure 11 that other equally plausible functional choices give a reduced (or zero) preference for DM recoils.We found that the region of interest vanishes for the marginalised result, which accounts for all reasonable choices of R(E).
Thus far, we have not commented on the 2.2σ significance for an annual modulation also present in this dataset [5].We note only that if this annual modulation is attributed to DM recoils, then the apparent lack of consistency with the unmodulated data must be explained (see [23]).
In conclusion, we find that there is at most a 1σ evidence for a DM recoil signal in CoGeNT data.Using the same dataset to derive a limit on the DM-nucleon scattering crosssection, we found that the CoGeNT limit is competitive with those from other experiments (see figure 12) and sets the strongest limit in the mass range 4.5 GeV to 6.5 GeV.This means that spline 2 and 3 make a greater contribution to the marginalised posterior than in the analyses in section 3, which results in a very small peak in the marginalised posterior.Despite this, the data prefer a DM signal at less than 1.2σ above the background-only scenario.
Alternatively, the bulk events can be modelled with a Type IV Pareto distribution (with location parameter µ = 0).In that case, the bulk population is modelled with In section 3 we fixed α b = 0.2 so that the bulk distribution has three free parameters: the amplitude A b and the variables γ b and κ b .The distribution shown in figure 1 has {A b , γ b , κ b } = {138, 0.249, 0.300}.We constrained the parameter α b so that the log-normal and Pareto fits were performed with the same number of free parameters.To show that the choice for α b does not bias our result, we here show the result when α b is also allowed to vary.We call this distribution Pareto 4 as there are now four free-parameters.In this case the fit to the rise-time data has seven free parameters: four from the Pareto 4 distribution for the bulk events and three from the log-normal distribution for the surface events.
Values for the bulk fraction R are shown in figure 13, along with the best-fit cubic spline and its 1σ uncertainty band.Owing to the extra parameter, the uncertainties on R are larger for each bin.This is reflected in the 1σ band from our cubic spline fits, which is wider.We show in figure 14 the analysis with the Pareto 4 distribution and τ cut = 5 µs.From this analysis we obtain a p-value of p = 0.23 and a Bayes factor of ln B = −0.49,after marginalising over all splines (including allowing the energy values of the first two knots to vary).These results are consistent with those found earlier: DM is preferred above background at less than 1.2σ.

B Sources of bulk events
In this section, we discuss the spectra used in our analysis for the bulk event background and the DM recoil signal.

B.1 Cosmogenic and radioactive backgrounds
In addition to the surface events, there are a few known backgrounds expected in the bulk of the detector.The L and K shell peaks, which appear prominently in the CoGeNT spectrum, arise from activated cosmogenic isotopes which decay to give lines at various (known) energies.By fitting to the K-shell, we can subtract the L-shell events at lower energy [4,24].
The K-shell peaks are all present above 4 keV ee .The fit proceeds as follows: we fit a gaussian to the K-shell peak for each isotope given in [3,5] with the mean and width fixed and the amplitude allowed to vary freely.There is additionally a roughly constant component in the fit, which is mostly flat but falls gradually towards higher energies [24].This gives us the amplitudes of each K-shell peak.We use the ratios of the amplitudes of the L and K shell peaks for each isotope [5] to build their L-shell peaks, which are all present around 1.3 keV ee .We then subtract these from the spectrum, as shown in figure 3. 1CoGeNT also discuss other backgrounds.The dominant contributions for bulk events arise from muons in the shielding of the detector, radioactive isotopes in the resistors and α and neutron emission from the cavern walls [4].For these events, we use the bulk background spectrum derived in [4], corrected for the increased exposure.It is clear that there is a flat plateau in this spectrum, but these events also extend into the signal region, with a small rise at low energy.For our analysis, the amplitude of the bulk background is allowed to float, which in practice means that its amplitude is fixed to match the (flat) bulk spectrum at energies above the L-shell peak.
We have also cross-checked our results using the bulk backgrounds discussed in [24], and using a flat background.For example, we show in figure 15 the result of our analysis performed using only the flat component from [24] for the bulk background estimate.Comparing with figure 4, we see that in this case the solid blue spline now gives strong evidence for a low-energy excess, consistent with a light-DM recoil.However, the purple dashed spline, which is a better fit to the data for R in the left panel, gives no low-energy excess and has no preference for a DM recoil signal.Figure 15.Analysis of 1129 live-days of CoGeNT data with the bulk background estimate from [24].In this case, the blue solid spline now gives strong evidence for a light DM recoil signal.However, as can be seen in the right panel, the result after marginalising is similar to that in figure 4, since the purple dashed spline is fully consistent with the bulk background estimate.Hence, our marginalised result is robust to changes in the bulk background.
Changing the bulk background estimate has altered the fit for each individual spline, but the result is the similar to that found in section 3.3 after marginalising over all possible splines (solid black line in right panel).This demonstrates that our result is robust against reasonable changes in the bulk background spectrum.

B.2 Dark matter
We generate the DM recoil spectrum f (m, σ) using the following expression, where σ(E nr ) is the DM-nucleus cross-section as a function of nuclear-recoil energy E nr , is an efficiency factor accounting for the CoGeNT cut-acceptance [25,28] (for the 807 live-days dataset we follow the cut scheme in [4]), µ is the DM-nucleus reduced mass, ρ = 0.3 GeVcm −3 is the local dark matter density and η( is the DM mean velocity.The mean velocity is integrated over the distribution of DM particle velocities in the galaxy f (v) boosted into the reference frame of the Earth by u e [29,30].The lower limit of the integration is v min (E nr ), which is the minimum DM velocity required to induce a recoil of energy E nr .We assume the standard halo model such that f (v) is given by a Maxwell-Boltzmann distribution cut off at an escape velocity of v esc = 537 kms −1 [31]; in this case, there is an analytic form for η(E nr ) [32].
We assume that DM particles interact identically with protons and neutrons so that σ(E nr ) = σ (µ/µ p ) 2 A 2 F 2 (E nr ), where σ is the zero-momentum DM-nucleon cross-section, A is the atomic mass of germanium, µ p is the DM-proton reduced mass and F (E nr ) is the Helm nuclear form factor [33].We convert nuclear-recoil energy E nr into electron-equivalent energy E using the relation E = 0.2E 1.12  nr [4].The upper panels show that the energy range 0.5 to 1.5 keV ee gives similar results to the full range 0.5 to 3.0 keV ee used in section 3 (cf.figure 4).The lower panels show that there is no preference for DM in the range 1.5 to 3.0 keV ee for any choice of spline.
interaction and has a mass around 10 GeV.The results of this analysis are shown in the upper panels in figure 17.In this case, we still use the range 0.5 to 3.0 keV ee to determine the bulk fraction R(E) but only consider the range 0.5 to 1.5 keV ee when determining the bulk likelihood.In this instance, splines 1 and 3 result in slightly higher likelihoods (cf.figure 4) but the marginalised result is similar to previous results; after marginalising over all spines we obtain a p-value of p = 0.43 and a Bayes factor of ln B = −0.99,consistent with the values in table 2.
Secondly, we consider the high energy range 1.5 to 3.0 keV ee .While a DM signal would not be present in this range if it scatters elastically with the usual spin-independent interaction, it is possible that more exotic interactions could result in a DM signal at higher energies.This range is also of interest since a significant component of CoGeNT's modulating signal is at higher energies [19].The results from this analysis are shown in the lower panels of figure 17.As the three splines are similar at between 1.5 and 3.0 keV ee (lower left panel) the resulting bulk-only spectra in the lower central panel are similar.Furthermore, these spectra are in good agreement with the background estimate (in black) so it is not surprising that the likelihoods do not peak at non-zero values of the cross-section (lower right panel).The reason why the likelihoods for the three splines are different in the lower right panel is because the spline prior is different in each case (the bulk likelihood is similar), reflecting the fact that spline 1 and spine 3 give progressively worse fits to the bulk fraction at low energy.The marginalised result is again similar to previous results; after marginalising over all spines we obtain a p-value of p = 0.93 and a Bayes factor of ln B = −0.80,consistent with other values (cf.table 2).

E Marginalisation procedure and other technical details
This section provides details of our Bayesian method to marginalise over the bulk fraction R(E).We start with equation (3.4) for the marginalised posterior.This is calculated by comparing CoGeNT data to the bulk background plus a potential DM recoil for every possible form of the bulk fraction R(E).The marginalisation is then achieved by summing up all of these likelihoods, multiplied by a prior for m, σ and R(E).The latter prior provides a weight for each spline in the marginalisation integral.
We now describe how we incorporated the information from the values of R into our marginalisation procedure.We model R(E) as a cubic spline, parameterised by a set of knots with positions x i = (E i , R i ) on the E-R axis, as shown for example in figure 2. Parameterising R(E) with cubic splines is a good choice in the absence of any theoretically motivated function.The quality of the fit of the spline to data for R is measured by the posterior where the priors for each of the knot positions P(x i ) are assumed to be constant.We use this directly for our prior on R(E), i. where the integral over R has been converted into an integral over the positions of the knots.So, for example, i runs between one and four for a 4-knot spline and the integral over R becomes a four-dimensional integral.In practice, this integral is evaluated as a discrete sum.Setting all of the priors to be constant within some finite space reduces equation (E.2) to the discrete sum in equation (3.6).This means that we prepare a set of splines which we wish to marginalise over, defined by their knot positions, and then calculate P(d R |R(E)) and P(d E |m, σ, R(E)) using each spline one-by-one.We then sum the products of these likelihoods and the constant prior P(m, σ) to obtain the marginalised posterior.It is important that the range of splines (or other suitable functions) scanned over in this sum is large enough to cover the range of possible choices, otherwise the result will be biased.
As discussed in section 3, we allow the amplitude of the knots together with the energy of the two lowest energy-knots to float.When a log-normal distribution is used to model the bulk events and τ cut = 5 µs, the E-axis position is varied between 0.5 and 1.2 keV ee for the knot at lowest energy, and between 1.2 and 2.0 keV ee for the knot at second lowest energy.In both cases, the positions are varied in units of 0.1 keV ee .The R value of the two lowest energy knots are varied between 0 and 1, within 50 evenly-spaced bins.A similar procedure is followed for the Pareto distribution when τ cut = 5 µs.The difference is that the lowest energy knot is varied between 0.5 and 1.0 keV ee while the next knot is varied between 1.0 and 1.6 keV ee .
In the cases where we vary the rise-time cut, R(E) is replaced with the corrected bulk fraction R c (E), which incorporates an efficiency factor (see equation (3.7)).Since R c can take values greater than unity, we scan between 0.5 and 4.0 in 50 separate bins, while also allowing the positions of the two lowest-energy knots to vary on the E-axis, between 0.5 and 1.1 keV ee for the lowest knot and between 1.1 and 1.8 keV ee for the second-lowest.The other knots do not vary with E.

F Bayes factors and p-values
Our analysis includes both Bayesian and frequentist statistical tests.For the Bayesian case, we can determine to what extent the data prefer a DM+background fit over the backgroundonly scenario using the Bayes factor [34,35].This is defined as where all terms are as defined in section 3.3, and P(d E |σ = 0, R(E)) is the likelihood function in the case of zero DM signal (i.e. the background-only scenario).We have marginalised over the bulk-fraction R(E) using the prior from the spline fit to the R data points.In all cases, we define our prior P(m, σ) to be linearly flat in mass between 2 GeV and 20 GeV (we focus on low-mass DM) and logarithmically flat in σ between 10 −46 cm 2 and 10 −34 cm 2 (the priors are zero outside these ranges).We have tested our conclusions with different priors, and our results do not change significantly.If B > 1 then the fit to the data is improved when adding a DM recoil signal to the bulk background spectrum (after marginalising over R(E)), while B < 1 implies that the data disfavours a DM recoil signal [34].It is generally more convenient to work with the logarithm of B, in which case −1 < ln B < 0 implies weak/inconclusive preference towards the background-only scenario, and −3 < ln B < −1 implies substantial preference.For positive values of ln B, the DM recoil scenario is preferred over background.
Our discussion can also be placed in a frequentist context.We use a profile likelihood method to profile out the bulk fraction R(E), analogous to marginalisation in the Bayesian method.After this, we use a likelihood ratio test to compare the DM+background with the background-only scenario, and define p-values from this quantity.In full, this profiled likelihood ratio is We assume that λ follows a χ 2 distribution with two degrees of freedom (mass and crosssection), which allows us to convert λ to a p-value.Larger values of λ imply smaller values of p, with for example p = 0.32 indicating a 1σ preference for DM over the bulk background only, p = 0.05 indicating a 2σ preference and so on.

ee
Surface and Bulk events (SB) SB ×R(E) = Bulk events only (B)

Figure 4 .
Figure 4. Analysis procedure for a 10 GeV DM particle when the bulk events are modelled with a log-normal distribution and τ cut = 5 µs.The left panel shows three spline choices for the bulk fraction R(E).The central panel shows the resulting bulk-only data for each spline together with CoGeNT's background estimate (black) and the number of events expected from signal+background (green and blue).Spline 1 and spline 3 result in an excess above the background estimate at low energy.The right panel shows the likelihoods from the fits in the left and central panels.The peaks for spline 1 and spline 3 indicate a preference for non-zero values of σ with different significances.Spine 2 results in no excess (central panel) and no preference for non-zero σ (right panel).The solid black line in the right panel is the marginalised posterior, obtained by marginalising over all spline choices.It does not posses a maximum value, indicating a vanishing preference for a DM recoil signal over the background estimate.

Figure 5 .
Figure 5.The fraction of bulk events R when fitting the bulk event population with a Pareto distribution.Shown also is the best-fit cubic spline fit (for the knot positions indicated by the black stars) and the 1σ uncertainty.

Figure 6 .
Figure 6.Analysis procedure for a 10 GeV DM particle using a Pareto distribution to model the rise-time of the bulk population.The left panel shows three spline choices for the bulk fraction R(E).The central panel shows the corresponding bulk spectrum for each spline, together with CoGeNT's background estimate (black) and the number of events expected from signal+background (green and blue).Spline 2 and spline 3 result in a low energy excess in the central panel so a non-zero value for σ is preferred, indicated by the peak in the likelihood in the right panel.In contrast, spline 1 results in no low energy excess and no peak in the likelihood.The marginalised posterior (black line, right panel) is flat indicating a vanishing preference for a DM recoil signal over the background estimate.

Figure 9 .
Figure 9. Analysis procedure for a 10 GeV DM particles after imposing an energy dependent rise-time cut τ cut (E), whose values are listed in table 1.The left-panel shows the corrected bulk fraction R c together with three different cubic spline fits.These spline fits lead to three different forms for the bulk-only spectrum in the central panel.The presence of an excess of events above background at low energy depends on the choice of spline.After marginalising over all splines, the marginalised posterior (black solid line in right panel) is flat, indicating that the data is fully consistent with known backgrounds.

Figure 11 .
Figure 11.Analysis of 807 live-days data for an 8 GeV DM particle.The dashed purple line in the left panel shows the exponential fit used by the CoGeNT collaboration, along with two alternative cubic spline fits to the same data.Both splines fit as well (or better) to this data compared to the exponential.The lines in the central panel show the resulting bulk-only spectrum for each of these functions, together with the background estimate (black) and the number of events expected from a DM signal+background (green).A 'low-energy excess' is produced for a limited choice of the fits consistent with R. For the cubic splines, the excess is either reduced or has vanished relative to the CoGeNT exponential (central panel).The right-hand panel shows that other functional choices give much lower evidence for a DM signal, than the exponential used by CoGeNT.When marginalising over all possible spline choices (black solid line), the signal is washed out leaving vanishing evidence for DM recoils in the CoGeNT 807 live-days data.

Figure 13 .Figure 14 .
Figure13.Values of the bulk fraction R obtained by fitting the bulk population to the Pareto 4 distribution.This distribution has an additional free parameter compared to the analyses in section 3. The best-fit spline and its 1σ uncertainties are also shown.

Figure 17 .
Figure 17.Analysis of 1129 live-days of CoGeNT data for a 10 GeV DM particle for different energy ranges in the likelihood analysis.The upper (lower) panels show the results for the range 0.5 to 1.5 keV ee (1.5 to 3.0 keV ee ).The upper panels show that the energy range 0.5 to 1.5 keV ee gives similar results to the full range 0.5 to 3.0 keV ee used in section 3 (cf.figure4).The lower panels show that there is no preference for DM in the range 1.5 to 3.0 keV ee for any choice of spline.

λ = − 2
ln max P(d E |m, σ, R)P(d R |R) max P(d E |σ = 0, R)P(d R |R) , (F.2)where 'max' refers to the largest value of the numerator for all values of R, m and σ, and the maximum value of the background-only likelihood P(d E |σ = 0, R)P(d R |R) for all splines.

Table 2 .
The Bayes factors B and p-values p from our Bayesian and frequentist profile likelihood marginalisation analyses respectively.These analyses fully incorporate the uncertainties on the bulk fraction R(E).Lower p-values correspond to a greater preference for a DM signal with p < 0.32 (0.05) indicating (at least) a 1σ (2σ) significance.Bayes factors for which ln B < 0 imply a preference for the background-only model (i.e.no DM).Further details are given in appendix F.