A likelihood framework for cryogenic scintillating calorimeters used in the CRESST dark matter search

Cryogenic scintillating calorimeters are ultra-sensitive particle detectors for rare event searches, particularly for the search for dark matter and the measurement of neutrino properties. These detectors are made from scintillating target crystals that generate two signals for each particle interaction. The phonon (heat) signal precisely measures the deposited energy independent of the type of interacting particle. The scintillation light signal yields particle discrimination on an event-by-event basis. This paper presents a likelihood framework modeling backgrounds and a potential dark matter signal in the two-dimensional plane spanned by phonon and scintillation light energies. We apply the framework to data from CaWO4-based detectors operated in the CRESST dark matter search. For the first time, a single likelihood framework is used in CRESST to model the data and extract results on dark matter in one step by using a profile likelihood ratio test. Our framework simultaneously


Introduction
Modern precision astrophysics and cosmology leave hardly any doubt that dark matter (DM) exists and that it is five times more abundant in the Universe than ordinary, baryonic matter [1]. Unraveling the nature of DM is a major priority of particle physics. However, despite the tremendous progress made in the last decades, no unambiguous signal has been observed yet, neither at particle colliders nor through indirect or direct detection. The properties of DM, such as the mass of the DM particle(s) and their interaction cross-section(s) with ordinary matter, are only loosely constrained so far. Thus, various direct DM detection experiments featuring different technologies are mandatory for a broad search for interactions of DM particles in earth-bound detectors.
Due to their low energy thresholds, cryogenic DM detectors have leading sensitivity for light DM particles. One of them is the CRESST experiment (cresst-experiment.org), currently in its third stage: CRESST-III [2]. CRESST uses scintillating crystals (typically made from CaWO4) that simultaneously record a phonon (heat) signal and a scintillation light signal from a particle interaction.
The most established and recommended tool for statistical inference in the field of direct dark matter detection (and rare event searches in general) is the likelihood formalism [3]. This paper presents a profile maximum likelihood analysis modeling all known event types and a potential DM signal in the two-dimensional plane spanned by the phonon and light energies. We show that our likelihood framework is capable of characterizing the discrimination capabilities between the nuclear recoil signal and electromagnetic backgrounds and may also be used to combine data from different detectors. The framework presented here evolved from the one presented in [4]. The most crucial update is that a single likelihood fit now models the data in the light-phonon plane and extracts results on a potential dark matter signal, yielding a proper treatment of uncertainties.
We begin with a short introduction to CRESST-II/III detectors and the data used in this work in section 2, followed by an in-depth description of our likelihood framework in section 3. Results are presented in section 4, conclusion and outlook in section 5. The presented likelihood framework is tailored to CRESST but might, with modifications, also be applied to related experiments, in particular COSINUS (cosinus.it) [5] and NUCLEUS (nucleus-experiment.org) [6], which are based on the same technology.

Detector layout and used data
This paper uses data from CRESST-II [7,8,9] and CRESST-III [2], all using CaWO4 target crystals. The CaWO4 crystals are equipped with a transition edge sensor (TES) made from tungsten to read the primary phonon signal from a particle interaction. We call the ensemble of target crystal and TES the phonon detector. The target crystal is paired with a cryogenic light detector made from a wafer-shaped silicon-on-sapphire piece equipped with a second, separate TES. The TES thermometers are produced at the Max Planck Institute for Physics in Munich/Garching.
Phonon and light detector constitute a detector module. A schematic drawing of a CRESST-III detector module is depicted in figure 1. Apart from the two TESs for phonon and light detector, the CaWO4 holding sticks are also instrumented and thus denoted iSticks. This iStick system vetoes potential background events originating from the holding; it was not yet available in CRESST-II. The iStick veto is applied during the event selection in the raw data analysis; for the likelihood framework (working on processed data) it is irrelevant. A further difference to CRESST-II are the roughly ten times lighter crystals used in CRESST-III, which were introduced to lower the energy threshold and, thus, enhance the sensitivity for light DM. Table 1 summarizes the crucial detector parameters and the data used for this work. We simultaneously fit background data (no calibration source) and neutron calibration data (AmBe neutron source) for all detectors. We use the CRESST-II detector module TUM40 to illustrate the likelihood method; afterward, we will discuss the combination of data from multiple detectors and calculate exclusion limits on the DM-nucleon cross-section using a profile likelihood ratio test.

Fig. 1 The block-shaped target (absorber) crystal has a mass of ∼24 g, and its dimensions are (20 × 20 × 10) mm³. It is held by three instrumented CaWO4 holding sticks (iSticks), two at the bottom and one on top. Three non-instrumented CaWO4 holding sticks keep the square-shaped silicon-on-sapphire light detector in place. Its dimensions are (20 × 20 × 0.4) mm³. Figure and caption from [2].

Energy, light and light yield information
Before we present the construction of the likelihood framework, we introduce the relevant observables. For every particle interaction, a CRESST detector simultaneously measures the energies deposited in the phonon channel (E_p) and in the light channel (L). We define the light yield LY as the ratio of these two quantities, LY = L/E_p, which is set to one for events from a predetermined calibration line⁴. This calibration results in an electron-equivalent unit, keVee, for E_p and L. However, other event types are characterized by different light yields. In particular, the sought-for nuclear recoils exhibit lower light yields than the dominant β/γ-backgrounds. Since we measure the energy sharing between phonon energy E_p and light energy L, we can obtain the total event-type-independent energy, in keV, using the following equation:

E = (1 − η) E_p + η L   (1)

The scintillation efficiency η quantifies the amount of energy converted to scintillation light for a γ-calibration event.
As one can see, the above formula affects all events with a light yield not equal to one and allows us to correctly obtain the total deposited energy E for any event in the detector [7,9]. In this work, the scintillation efficiency η is a free parameter in the maximum likelihood fit. The obtained values are compatible with previous results [7,9], where η was extracted from the tilt of γ-lines in the light-phonon energy plane caused by statistical fluctuations of the light production.
Throughout the paper, we will stick to the following notation:

L: energy measured in the light detector (keVee)
E_p: energy measured in the phonon detector (keVee)
E: event-type-independent total deposited energy (keV)
LY: light yield, LY = L/E_p
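The relations above can be sketched in a few lines (a minimal illustration; the function names are ours, and the η value in the check below is a placeholder, not a fitted one):

```python
def total_energy(E_p, L, eta):
    """Event-type-independent total energy, E = (1 - eta) * E_p + eta * L.

    E_p: phonon energy (keVee), L: light energy (keVee),
    eta: scintillation efficiency for a gamma-calibration event.
    """
    return (1.0 - eta) * E_p + eta * L

def light_yield(E_p, L):
    """Light yield LY = L / E_p."""
    return L / E_p

# For a calibration gamma (LY = 1, i.e. L = E_p) the correction vanishes,
# so E equals E_p for any eta; quenched events (LY < 1) get E < E_p for eta > 0.
```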

Likelihood framework
We use the extended maximum likelihood framework (explained in more detail in subsection 3.8) to model backgrounds and a potential signal in the L-E-plane using non-normalized density functions ρ_x(E, L, θ_x) (with x standing for the different density functions). They depend on E, L and the parameter vector θ_x. Most density functions, except for the excess light events (see subsection 3.4), can be written as the product of a band describing the position in the L-E-plane and an energy spectrum (dN/dE)_x(θ_x) accounting for the content of the band. A band is given by its mean line and its energy-dependent width.

⁴ 122 keV γs from 57Co for CRESST-II and 63.2 keV γs from the Kα escape of tungsten for CRESST-III when illuminated with 57Co.
For illustration, we show exemplary bands in the L-E-plane in figure 2 and the same bands in the LY-E-plane in figure 3. Both figures show the mean lines (dashed) and the upper and lower 90 % boundaries (solid), which are the ±1.28 σ one-sided quantiles around the mean line. Within the two solid boundaries, 80 % of the events of the respective event class are expected (with respect to their distribution in L/LY for a given energy E).
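The 90 % boundaries can be reproduced with a one-sided Gaussian quantile (a self-contained sketch, assuming Gaussian band widths as described in subsection 3.2; the bisection inversion is only there to avoid external dependencies):

```python
import math

def normal_quantile(p, lo=-10.0, hi=10.0):
    """Invert the standard normal CDF by bisection; Phi(z) = p."""
    def cdf(z):
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def band_boundaries(mean_L, sigma_L):
    """Upper/lower 90 % one-sided boundaries of a Gaussian band at one energy;
    80 % of the events of this class are expected between the two."""
    z = normal_quantile(0.9)  # ~1.2816
    return mean_L - z * sigma_L, mean_L + z * sigma_L
```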
Fig. 2 Schematic illustration of the electron band (blue) and the nuclear recoil bands for oxygen (red) and tungsten (green) in the light (L) versus energy (E) plane. The calcium band is not shown for clarity; it would be located between the oxygen and the tungsten bands.

Mean of the bands
In this subsection we describe the mean of the bands in the L-E-plane for each of the respective event classes.The width of the bands will be discussed in the following subsection 3.2 and the corresponding energy spectra (the content of the bands) in subsection 3.3.

Electron band
All bands are derived from the electron band (or β-band), which describes energy depositions stemming from interactions with electrons in the crystal. The electron band is the band with the highest light yield, and its mean line L_e(E) is modeled as:

L_e(E) = (L_0 E + L_1 E²) · [1 − L_2 exp(−E/L_3)]   (2)

L_0 accounts for the linear light output, while L_1 is an empirical parameter allowing small deviations from linearity. The term in square brackets is called non-proportionality, arising from the saturation of the light production per unit path length at low energies; see [10,11] for details. The non-proportionality effect varies for different CaWO4 crystals and is typically more pronounced for crystals grown at TU Munich (TUM40 and Detector A) than for commercially produced ones (Lise).

γ-band
γs interacting in the target crystal transfer their energy to multiple electrons. For the energy E this makes no difference, as only the total deposited energy matters. However, because of the aforementioned non-proportionality effect, the total amount of produced light L is smaller if multiple electrons share the energy. Using data and a Monte Carlo simulation, we found that the mean line of the γ-band L_γ(E) can be modeled by evaluating the electron band at a reduced energy, with two detector-specific free fit parameters Q_γ,1 and Q_γ,2 (equation 3).

β/γ band(s)
We also observe mixtures of β and γ events in the case of a β-decay accompanied by the emission of a γ from the subsequent de-excitation of the nucleus. The total deposited energy of such a mixed event is E = E_γ + E_β, and the total light consequently combines the electron-band light of the β and the γ-band light of the γ (equation 4), with

i: index of the β/γ-decay in the respective detector

This description is specific for each excited state of each β-decaying nuclide, thus we added the index i.

Elastic nuclear recoil band(s)
Nuclear recoils exhibit a significantly lower light output than electron events, an effect known as quenching. The quenching depends on the nuclide, resulting in one band per nuclide in the target material. The reduction in light output of a nuclear recoil compared to an electron recoil of the same energy is quantified by the quenching factor, which may depend on the energy. Nuclear recoils do not suffer from the saturation effects; thus, they are not affected by the non-proportionality effect. The mean lines of the elastic nuclear recoil (ENR) bands for x ∈ {O, Ca, W} are then obtained from equation 5, in which L_0E + L_1E² describes the electron band without non-proportionality, multiplied by a parametrization of the quenching factors measured for the three nuclei in CaWO4 with a neutron beam; see [12] for the measurement and details on the parametrization. The corresponding parameter values are listed in table 2. Since the neutron beam measurement determined these parameters with high precision, they are fixed in our likelihood fit. However, in [12] the authors also found that different CaWO4 crystals may have shifted nuclear recoil bands, where all three bands are commonly affected. Thus, we add the detector-specific free fit parameter ε to our model. We obtain values for ε that are compatible with [12].

Inelastic nuclear recoil band(s)
We find inelastic nuclear recoils (INRs) in the neutron calibration data. These are interactions of neutrons with atomic nuclei where the nucleus is left in an excited state and de-excites with the emission of a γ-ray or an electron. Since the de-excitation happens quickly, our detectors measure the total deposited energy in the crystal, which is the sum of the kinetic energy transferred to the nucleus and the energy released by the de-excitation. The light output of an INR is thus a mixture of the light output of a nuclear recoil (off the respective nucleus) and the output of a γ or electron. Consequently, the INRs are modeled analogously to the β/γ-events (see equation 4), with

i: index of the INR band in the respective detector
z ∈ {β, γ}: β (electron) or γ from the de-excitation
x ∈ {O, Ca, W}

Constant light yield or no-light band
Although trivial, we explicitly mention that a band with a constant light yield LY_c can be defined via L(E) = LY_c · E (equation 7). LY_c = 0 corresponds to a no-light band for events where no scintillation light is produced.

Width of the bands
The distribution of the events in a band at a specific energy (the width of the band) is modeled as a Gaussian around the mean line with an energy-dependent width. It results from the finite resolutions of the light and phonon detectors, where the impact of the latter depends on the slope of the band in the L-E-plane (equation 10).

Resolution of the light detector
The resolution of the light detector σ_L(L) is parametrized with three terms (equation 8). σ_L,0 is the baseline resolution (= resolution at zero light, σ_L(0)) of the light detector. This parameter is determined precisely from the raw data analyses [2,8,9] and therefore fixed in the likelihood fit. S_1 and S_2, in contrast, are free fit parameters. The term S_1L results from the Poissonian light production and detection process and consequently enters σ_L proportionally to √L. S_1 is the amount of energy that has to be deposited in the target crystal to measure one photon in the light detector (on average and for an electron event). S_2 is a generic parameter that was empirically found to improve the fit result for certain detectors. However, S_2 is typically small, and the total resolution σ_L(L) is dominated by σ_L,0 close to the threshold and by the S_1L term otherwise.

Resolution of the phonon detector
The resolution of the phonon detector σ_P(E) is parametrized by equation 9. It is dominated by the baseline resolution σ_P,0, which again is determined externally and fixed in the fit, while σ_P,1 remains free. σ_P,1 is typically small and accounts for a potential energy dependence of the resolution. E_thr is the energy of the trigger threshold.

Total width of the bands
The width of each band is then calculated by:

σ_x(E) = sqrt( σ_L(L_x(E))² + (dL_x/dE · σ_P(E))² )   (10)

where L_x may be any band; see equations 2, 3, 4, 5, 6, 7.
The slopes dL_x/dE are calculated analytically for maximum speed and accuracy. Figure 2 intuitively illustrates that steeper bands (higher slope dL_x/dE) are more affected by the resolution of the phonon detector σ_P(E).
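The band-width calculation can be sketched as follows. The specific parametrization of σ_L below is an assumption consistent with the text (a baseline term plus a Poissonian S1·L term and a generic S2 term), not necessarily the exact fitted form:

```python
import math

def sigma_light(L, sigma_L0, S1, S2=0.0):
    # Assumed form: baseline term, Poissonian S1*L term, generic S2 term.
    return math.sqrt(sigma_L0**2 + S1 * L + S2 * L**2)

def band_width(E, mean_line, slope, sigma_L_of, sigma_P_of):
    """sigma_x(E) = sqrt(sigma_L(L_x(E))^2 + (dL_x/dE * sigma_P(E))^2):
    the phonon resolution enters scaled by the slope of the band."""
    L = mean_line(E)
    return math.sqrt(sigma_L_of(L)**2 + (slope(E) * sigma_P_of(E))**2)

# Example: a linear band L(E) = E (slope 1) versus a flat band (slope 0);
# only the steeper band picks up the phonon resolution.
w_steep = band_width(10.0, lambda E: E, lambda E: 1.0,
                     lambda L: sigma_light(L, 0.1, 0.01), lambda E: 0.05)
w_flat = band_width(10.0, lambda E: 5.0, lambda E: 0.0,
                    lambda L: sigma_light(L, 0.1, 0.01), lambda E: 0.05)
```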

Energy spectra
While the two previous subsections (3.1 and 3.2) described the positions of the various bands, this subsection presents the modeling of their content, i.e., their energy spectra. For this work, we use an empirical model. For future work, however, we may also include results from the Geant4 Monte Carlo simulations used for the CRESST background model [13,14].

Electron spectrum
Electrons from Compton scattering dominate the electron band. Their energy spectrum in the energy regime of interest is mostly constant with a small linear slope and consequently given by:

dN_e/dE = P_0 + P_1 E   (11)

γ-peaks
We find several γ-peaks in our physics data. Each of them is modeled with a Gaussian of free mean µ_i and amplitude C_i, whose width is given by the resolution of the phonon detector σ_P(E) (see equation 9); i is the index of the γ-peak in the respective detector (equation 12). The determination of the γ-peaks to be included in our model is an iterative process. We start by fitting a likelihood framework including all known peaks to the data, simultaneously determining the position of the γ-band. All events located inside the 90 % upper and lower boundaries of this band are then selected, and a density function is created either through binning or Gaussian kernel density estimation (KDE). The width of the Gaussian kernels, as well as the width of the bins, is set according to the resolution of the phonon detector. We find that the Gaussian KDE method usually yields better results: it had a higher ratio of known peaks to overall found peaks. Therefore, we use it for all the data presented in this paper. Peaks are then localized in that spectrum based on their topographic prominence [15,16]. The possible origins of additional, a priori unknown peaks are determined by comparing their energies to known γ-sources. We include any peak of plausible origin in the final model.
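The peak-search step can be sketched with a resolution-width Gaussian KDE and `scipy.signal.find_peaks`, which implements a topographic-prominence criterion. The event energies, line positions, and prominence cut below are invented toy values for illustration only:

```python
import numpy as np
from scipy.signal import find_peaks

def gamma_band_kde(energies, sigma_P, grid):
    """Gaussian KDE of the selected events; the kernel width of each event is
    set by the phonon-detector resolution sigma_P (a function of energy)."""
    widths = sigma_P(energies)  # one kernel width per event
    dens = np.exp(-0.5 * ((grid[:, None] - energies[None, :]) / widths) ** 2)
    dens /= widths * np.sqrt(2.0 * np.pi)
    return dens.sum(axis=1)

# Toy data: two gamma lines at 2.6 and 8.0 keV on a flat background.
rng = np.random.default_rng(0)
ev = np.concatenate([rng.normal(2.6, 0.05, 300),
                     rng.normal(8.0, 0.05, 200),
                     rng.uniform(0.5, 12.0, 400)])
grid = np.linspace(0.5, 12.0, 2000)
dens = gamma_band_kde(ev, lambda E: np.full_like(E, 0.05), grid)
# Peaks are localized by their topographic prominence:
idx, props = find_peaks(dens, prominence=200.0)
peak_energies = grid[idx]
```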

β/γ spectra
The β/γ spectra consist of a γ-photon with energy E_γ,i and a decaying tail down to the Q-value of the respective β-decay. Therefore, we model the β/γ-spectra as a triangle starting at E_γ,i and approaching zero at the Q-value. To account for the resolution, this triangle is convolved with a Gaussian of width σ_P(E) (equation 9); the resulting closed form of equation 13 contains error functions. Typically, the pre-factor C_i and the energy of the γ particle E_γ,i are free fit parameters, while the distance between γ-energy and Q-value, D_i = Q_i − E_γ,i, is fixed in the fit to the literature value. This parametrization allows scaling the amplitude and shifting the convolved triangle, but not shrinking or stretching it.
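A numerical version of the convolved triangle can be sketched as follows (the analytic error-function expression is what the fit uses; here we simply convolve on a grid for illustration):

```python
import numpy as np

def beta_gamma_spectrum(E, C, E_gamma, D, sigma_P):
    """Triangle with amplitude C starting at E_gamma and falling linearly to
    zero at the Q-value Q = E_gamma + D, convolved with a Gaussian of width
    sigma_P on the (uniform) energy grid E."""
    dE = E[1] - E[0]
    Q = E_gamma + D
    tri = np.where((E >= E_gamma) & (E <= Q), C * (Q - E) / D, 0.0)
    kx = np.arange(-5.0 * sigma_P, 5.0 * sigma_P + dE, dE)
    kern = np.exp(-0.5 * (kx / sigma_P) ** 2)
    kern /= kern.sum()  # unit-area kernel preserves the spectrum's integral
    return np.convolve(tri, kern, mode="same")

E = np.linspace(0.0, 20.0, 2001)
spec = beta_gamma_spectrum(E, 1.0, 5.0, 10.0, 0.2)
```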

Bremsstrahlung spectrum
γs are also created as secondary particles from high-energy charged particles, particularly muons, interacting in the copper surrounding the detectors. Our understanding is that the high-energy particles create δ-electrons that emit Bremsstrahlung when slowed down [17]. The result is a characteristic bump peaking at about 150 keV [18]. Although the CRESST setup is equipped with a muon veto, a faint bump may still be observable in the background data. The neutron data, on the other hand, show a clear bump, which is confirmed by Geant4 Monte Carlo simulations [19]. To model the bump, we use a semi-empirical description of an exponential decay multiplied with a third-order polynomial (equation 15). We use the same shape parametrization for background and neutron calibration data (parameter values p_gb,x). The amplitudes A_gb differ for background and neutron data to account for the much higher rate of these events during the neutron calibration.

Elastic nuclear recoil spectra
In the neutron calibration data, we model the energy distributions of neutron-induced ENRs as exponential functions,

dN_x/dE = A_x exp(−E/l_x),  x ∈ {O, Ca, W}

with per-nuclide fit parameters A_x and l_x. For the background data, the neutron background is very small and thus set to zero in the likelihood fit.

Inelastic nuclear recoil spectra
The INR energy spectra are modeled with an approximation in which we neglect subdominant terms to enhance computing speed at practically no loss of accuracy. As for the ENRs, the number of INR events in the background data is negligible and, consequently, set to zero in the likelihood fit.

Low-energy-excess spectrum
We observe an exponentially rising event population close to the threshold of some detectors, typically denoted low-energy excess (LEE) [20,21]. The origin of these events is not fully understood so far, and they appear at very low energies where hardly any scintillation light is produced. We model the LEE with a no-light or constant-light-yield band (equation 7) and one or two exponential functions for the energy spectrum; N_i is the number of exponentials used to model the LEE. Under standard assumptions, the expected DM-nucleus recoil spectrum features an approximately exponential shape. Therefore, any potential background with an exponentially shaped energy spectrum must be treated with special caution, since it can be misinterpreted as a signal and vice versa. While a discussion of the LEE is not the goal of this work, there are strong indications for a non-DM origin of the LEE in CRESST-III [21]. Many other rare event searches see a similar excess, which led to a community effort in the form of a workshop series denoted "EXCESS" [20,22,23,24]. Furthermore, due to the large abundance of LEE events in many detectors, a description of the data in a maximum likelihood framework is not reasonably possible without accounting for this population.
We find that the LEE in TUM40 is reasonably modeled with one exponential, while Detector A benefits from two exponentials to describe the LEE.We show the effect of the LEE and its modeling on the DM exclusion limits in the appendix 6.3.
It should be noted that, in the profile likelihood calculation of an exclusion limit, the signal contribution is gradually increased from the best-fit point (while the remaining parameters of the model are kept free) until the model is no longer compatible with the data. In this procedure, the contribution assigned by the likelihood to the LEE gradually decreases if there is any ambiguity between DM signal and LEE background.

Potential dark matter signal
We follow the so-called standard scenario to model a potential DM signal, assuming elastic, coherent DM-nucleus scattering and an isothermal, Maxwellian dark matter halo. The expected recoil spectrum for each target nuclide x ∈ {O, Ca, W} depends on the DM-nucleus scattering cross-section σ_0, the DM mass m_χ, the reduced DM-nucleus mass µ_x, the local DM density ρ_χ, and the form factor F(E) (equation 19). To be able to compare the results of different experiments, not the DM-nucleus cross-section is reported, but the DM-nucleon cross-section σ_χ, where m_x is the mass of the nucleus and m_p is the proton mass (equation 20). The integral over the velocity distribution f(v⃗) starts at the minimal velocity required to produce a nuclear recoil above E, v_min(E) = sqrt(E m_x / (2 µ_x²)), and runs to the galactic escape velocity v_esc. For f(v⃗) we follow the implementation in [25], with v_esc = 544 km/s, an (azimuthal) solar velocity of v_sun = 231 km/s, a velocity at infinite distance v_inf = 220 km/s, and a dark matter density at Earth's position of ρ_χ = 0.3 (GeV/c²)/cm³. We average over the annual modulation effect caused by the Earth's orbit around the Sun. This implementation of the expected DM signal is identical to the one previously used by CRESST [2,7,8] and follows to a large extent the recommendation of PHYSTAT-DM [3]. As recommended by PHYSTAT-DM, we do not profile over the uncertainties of the DM halo, although this is technically possible.
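The kinematic threshold can be sketched as follows (natural units with c = 1 internally; the oxygen mass below is an approximate illustrative value, not the one used in the fit):

```python
import math

C_KM_S = 299_792.458  # speed of light in km/s

def reduced_mass(m1, m2):
    return m1 * m2 / (m1 + m2)

def v_min_km_s(E_keV, m_chi_GeV, m_N_GeV):
    """Minimal DM velocity to induce a nuclear recoil of energy E:
    v_min = sqrt(E * m_N / 2) / mu_N, converted from units of c to km/s."""
    E_GeV = E_keV * 1e-6
    mu = reduced_mass(m_chi_GeV, m_N_GeV)
    return math.sqrt(E_GeV * m_N_GeV / 2.0) / mu * C_KM_S

# A 1 GeV/c^2 DM particle on oxygen (m_O ~ 14.9 GeV/c^2): already a 1 keV
# recoil requires v_min above v_esc plus the solar motion (~775 km/s), which
# illustrates why low thresholds are essential for light DM.
v = v_min_km_s(1.0, 1.0, 14.9)
```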
For high momentum transfer, coherence is lost, and the form factor F(E) of the nucleus has to be considered. For oxygen and calcium, we use the model-independent parametrization of the form factors presented in [26]. For tungsten, a Woods-Saxon approximation is made, whose parameters may also be found in [26]. For other materials, which are, however, not discussed in this work, we rely on the Helm form factor [27] in the parametrization of Lewin/Smith [28].
The total DM signal results from the weighted sum over all target nuclides, where the weight for each nuclide is its mass fraction.

Excess light events
Excess light events are events with an additional light component, which may, e.g., be created by a particle traversing the scintillating polymeric foil surrounding our detector modules (see figure 1) and depositing energy in both the foil and the target crystal. Here, we follow the empirical parametrization of [11], modeling the density of the excess light events as exponentially decreasing with decay constants El_dec (in energy) and El_width (in the distance to the mean of the electron band). Convolving with the resolution then yields equation 21, where El_amp is the amplitude of the density.

Exposure and efficiency
Table 1 lists the gross exposures of the respective data sets, which are given by the live time of the measurement times the mass of the target crystal. In the raw data analysis, we apply certain cuts to the data, removing time periods of unstable detector operation, events where the correct determination of the energy might not be guaranteed (e.g. pile-up, tilted baselines, etc.), and events coincident between multiple detectors and/or the muon and/or iStick veto (see [2,8,9]). The efficiency of this selection process, defined as the probability of a valid event to survive, is measured with artificially created pulses. These pulses are generated by scaling a pulse template to the desired injected energy and adding it to an empty baseline (a noise trace). Then the selection criteria are applied, and the fraction of surviving to totally generated events yields the efficiency eff(E) for a certain energy E. The cuts are designed such that eff(E) only depends on the energy. Its shape is the same for background data and neutron calibration data; the scaling of the latter is a free fit parameter, as will be discussed in subsection 3.6. Figure 4 shows the efficiencies as a function of energy in double-logarithmic scale.

8 Including time information for signal and background is the subject of future work and beyond the scope of this article.
9 Details on the form factors for CaWO4 are also summarized in [11].
Although we obtain and plot the efficiencies as a function of the simulated (true) energy, we apply them to the reconstructed (measured) energy, that is, after convolution with the resolution. The efficiencies as a function of simulated energy already account for the finite resolution of the detector; thus, they have to be applied to the reconstructed energy. This is conservative and was, e.g., justified in [2] with a more detailed study.
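The application of the efficiency can be sketched as a simple interpolation of the measured survival fractions (the energies and fractions below are invented toy values, not measured ones):

```python
import numpy as np

# Survival fractions of injected pulses at a set of injected energies (toy values).
sim_energy = np.array([0.1, 0.5, 1.0, 5.0, 10.0, 40.0])     # keV
sim_eff    = np.array([0.05, 0.40, 0.70, 0.92, 0.95, 0.96])  # surviving fraction

def eff(E_reconstructed):
    """Selection efficiency evaluated at the reconstructed event energy;
    np.interp clamps to the first/last value outside the measured range."""
    return np.interp(E_reconstructed, sim_energy, sim_eff)
```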

Background and neutron calibration data
For all detectors analyzed in this work, we make use of the neutron calibration data, acquired by illuminating the experiment with an AmBe source. We always fit background data and neutron data simultaneously; the total negative log-likelihood is then given by equation 22, in which N denotes a Gaussian function. N_bck/ncal is the number of events in the acceptance region for background and neutron calibration data, respectively. Accordingly, N_ρ,bck/ncal is the number of densities to consider for the respective data.
The likelihood function in equation 22 consists of four terms, which we briefly describe starting with the first one in the uppermost line. For each event with energy E_i and light L_i, we evaluate the densities ρ_j(E_i, L_i), sum the results, and multiply with the efficiency eff(E_i). For the total likelihood, these values are then multiplied over all events. Since we are using the log-likelihood log L, this multiplication simplifies to a sum of logarithms. In the first term, we consider all events in the background data (index i) and all densities relevant to the background data (index j). The second term, in line two, does the same for the neutron calibration data (indices k and l, respectively).
The last two terms are needed since the densities ρ_j,l are not normalized, allowing the number of observed events to differ from the number of expected events. The numbers of expected events are calculated in the third and fourth terms for background and neutron calibration data, respectively. This formalism is known as extended maximum likelihood; see [29] for further details.
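The structure of the extended likelihood for a single data set can be sketched as follows (our naming; the full likelihood of equation 22 adds the analogous neutron-calibration terms and, where applicable, Gaussian constraint terms):

```python
import math

def extended_nll(events, densities, eff):
    """Extended negative log-likelihood for one data set.

    events:    list of (E, L) pairs inside the acceptance region
    densities: list of (rho, nu) pairs, with rho(E, L) a non-normalized
               density and nu its expected number of events
    eff:       energy-dependent selection efficiency eff(E)
    """
    nll = sum(nu for _, nu in densities)  # expected-counts term
    for E, L in events:                   # per-event log-density term
        nll -= math.log(eff(E) * sum(rho(E, L) for rho, _ in densities))
    return nll

# Toy check: one flat density of height 2 with 4 expected events,
# full efficiency, and two observed events.
nll = extended_nll([(1.0, 0.5), (2.0, 1.0)],
                   [(lambda E, L: 2.0, 4.0)],
                   lambda E: 1.0)
```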
The acceptance region is the area in the light versus energy plane accessible to the likelihood framework. It starts at the energy threshold of the respective detector and ends at an a priori chosen upper energy, typically corresponding to the linear range of the detector (for the data analyzed in this work, see section 4). In the light direction, we choose very loose bounds for the acceptance region, ensuring that all valid particle events are inside.
The density functions ρ_ncal describing the neutron calibration data are constructed using the same band functions, consisting of mean lines and widths, as the background density functions ρ_bck. Since all background contributions are also present during the neutron calibration, we add them to ρ_ncal weighted by the relative exposure of neutron calibration to background data. However, in the neutron calibration, additional contributions may arise from the neutrons themselves, from secondary particles created by the neutrons or the neutron source, and from a not fully closed shielding during the calibration. Although external measurements of the quenching factors in CaWO4 exist, only fitting the neutron calibration data of each detector allows us to determine the detector-specific factor ε precisely. For this reason, the use of the neutron calibration data in the likelihood fit is crucial to precisely determine the position of the nuclear recoil bands, which in turn is critical to calculate the expected distribution of a potential DM signal. Another benefit of using neutron calibration data is the additional statistics (also in the electron and γ-bands), which helps to fit the band parameters more accurately.

Combination of detectors
Similar to the combined description of background and neutron calibration data for a single detector, it is also possible to combine multiple detectors into a single likelihood description. In this case, the total log-likelihood is the sum of the log-likelihoods of the individual data sets. Unlike for the combination of background and neutron calibration data, the different detectors do not share the same band parameters; also, the background and neutron calibration spectra are different. The only common parameters are those of the possible DM signal and, for detectors with the same target material, the parameters describing the energy-dependent nuclear recoil quenching factors. Even though the properties of potential DM particles are the same, the shape and height of the dark matter spectrum can vary greatly between detectors due to different exposures, resolutions, and efficiencies.
The common dark matter signal also highlights the possible advantage of combining data sets from different detectors into a single description. The additional data provide more exposure, which may result in stronger limits. For detectors with different target materials, the distinct shapes of the expected DM recoil spectra can yield a better signal-to-background separation, especially if a low-energy excess is present. Additionally, a combination of detectors could take advantage of the complementary properties of multiple detectors, e.g. two detectors, one with a low threshold and one with a low background.

Profile likelihood ratio
To set exclusion limits on the DM-nucleon cross-section, we use the profile likelihood ratio (PLR), a standard statistical method of the field, which we briefly describe here following the notation of [30].
The PLR is defined as:

λ(µ) = L(µ, θ̂_µ) / L(µ̂, θ̂)   (24)

where µ is the primary parameter of interest, in our case the DM-nucleus cross-section µ = σ_0 (see equation 19). The nuisance parameters are collected in the vector θ.
In simple words, two maximizations of the likelihood L are performed in the PLR. In the numerator, θ̂_µ is the set of nuisance parameters that maximizes L for a given, fixed value of µ (a conditional maximization); the denominator is the maximum likelihood for free µ and θ. The PLR can now be used to either claim a discovery or set an exclusion limit. For a discovery, one would use λ(µ = σ_0 = 0), i.e., compare the null hypothesis (no DM signal) to the best fit (including a potential dark matter signal) using the test statistic

q_0 = −2 ln λ(0)   (25)

According to Wilks' theorem, q_0 follows a χ² distribution with one degree of freedom [31]. The statistical significance Z is approximately given by Z = √q_0. For limit setting, the test statistic is very similar:

q_µ = −2 ln λ(µ)   (26)

To set a limit, we solve equation 26 for the value of µ = σ_0 corresponding to the desired, predefined confidence level (conventionally 90 % in direct DM detection).
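The discovery and limit-setting steps can be sketched as follows (a minimal illustration on a toy parabolic NLL; a real application would re-profile the full likelihood at each scanned cross-section):

```python
import math

def q0(nll_null, nll_best):
    """Discovery test statistic: q0 = 2 * (NLL(sigma0 = 0) - NLL(best fit))."""
    return max(0.0, 2.0 * (nll_null - nll_best))

def significance(nll_null, nll_best):
    """Approximate significance Z = sqrt(q0) (Wilks' theorem, 1 dof)."""
    return math.sqrt(q0(nll_null, nll_best))

def upper_limit(nll_profile, nll_best, scan, z=1.2816):
    """Increase sigma0 along `scan` until the profiled NLL has risen by
    z^2 / 2 above the best fit (z = 1.2816 for a one-sided 90 % CL)."""
    threshold = nll_best + 0.5 * z * z
    for s in scan:
        if nll_profile(s) > threshold:
            return s
    return None

# Toy parabolic profile NLL with best fit at sigma0 = 1.0 and curvature 0.5:
nll = lambda s: 0.5 * ((s - 1.0) / 0.5) ** 2
limit = upper_limit(nll, 0.0, [1.0 + 0.001 * i for i in range(2000)])
# For a parabola the limit sits at best fit + z * 0.5.
```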

Results
In this paper, we apply the likelihood framework to three detectors; details on the detectors and data sets are given in section 2 and table 1, and the efficiencies are plotted in figure 4. For reasons of clarity, we only show the fit results for TUM40 here. However, all fit results are available in appendix 6 and the ancillary files. TUM40 has a low background level with clearly identifiable peaks and a well-performing light detector, but also a pronounced non-proportionality effect and a low-energy excess. These features make it the most challenging to fit of the three detector modules considered in this work.
The energy range for the acceptance region (see equation 22) always starts at the threshold. The upper energy boundary is chosen in accordance with previous work: 16 keV for Detector A [2], and 40 keV for TUM40 [7] and Lise [8]. For TUM40, we also give results (in the ancillary files) for a fit up to 400 keV to demonstrate that our likelihood framework provides an accurate description of the data not only in the energy range relevant for dark matter but throughout the whole linear range of the detectors. The acceptance region boundaries in the light energy are set generously to ensure that all relevant event contributions are fully contained in the acceptance region. The same acceptance region is used for background and neutron calibration data.

Band fits
For all band fits, various minimization methods, as well as multiple reruns, were used to ensure that a global minimum (the most likely set of parameters) is found. All relevant parameters are free in the fit, except for the energy-dependent nuclear recoil quenching factors and the baseline resolutions of the phonon (σ_P,0) and light detector (σ_L,0). While the likelihood value itself does not provide a way to judge the quality of the fit, a comparison of the resulting bands and spectra to the data can help evaluate the fit.
Figure 5 shows the background data for TUM40 together with the resulting bands in the LY-E plane. Figure 6 shows the same bands but for the neutron calibration data. As one can see, the model accurately describes the positions of the electron and the nuclear recoil bands. Figures 7 and 8 show the energy histograms for the background and neutron calibration data, respectively, together with the fit results. Also these plots show that the likelihood framework is capable of accurately modeling the γ-peaks, the β-decays, the ENR in the ncal data, and the low-energy excess.
The resulting fit values, including uncertainties, are compiled in table 3. Plots for the remaining detectors (Lise and Detector A) can be found in appendix 6; CSV files with the fit results are attached as ancillary files.

Dark matter exclusion limits
Exclusion limits at 90 % confidence level for the DM particle-nucleon cross-section are calculated using the profile likelihood method. All fits in this routine are performed using a global (particle swarm) and a local (Nelder-Mead) minimization. The same free parameters as for the band fit are used, except for the high-energy γ and β/γ peaks. They do not overlap with the possible signal region and can therefore be fixed in the limit calculation to increase the convergence speed of the fits.
To evaluate the performance of the profile likelihood, figure 9 compares the likelihood limits to published limits which were obtained using the Yellin optimum interval method [32]. As one can see, the likelihood and Yellin approaches generally lead to comparable limits for our data sets. One exception is the limit for Lise (red), which is considerably stronger in the likelihood case. This outcome is expected, as Lise had a rather poorly performing light detector, which leads to significant leakage from the β/γ-bands into the nuclear recoil bands. The likelihood framework is able to model this background contribution, while the Yellin method treats it as a potential signal contribution, weakening the resulting exclusion limit. It should be stressed that the likelihood limits for TUM40 and Detector A only achieve comparable performance to the Yellin limits if (one or two, respectively) LEE contributions are allowed in the framework. Including the LEE is delicate, as its origin is not completely understood and its shape is similar (but not identical) to a potential DM signal. However, our knowledge of the LEE strongly disfavors a DM origin, justifying the inclusion of an LEE background. In appendix 6, the reader may find a more detailed discussion including an illustration of the influence of the LEE on the exclusion limits.

Fig. 9 Exclusion limits for the detectors TUM40 (blue), Lise (red), and Detector A (green). Dotted lines are limits previously published by the CRESST collaboration for the same detectors (TUM40 [9], Lise [8], and Detector A [2]), calculated with Yellin's optimum interval method [32]. Likelihood limits for the respective detectors from this work are depicted as solid, dashed, and dashed-dotted lines. Same-colored lines are thus based on the same data (from the same detector) but calculated with different methods: dotted for Yellin's optimum interval (previous works) and solid, dashed, or dashed-dotted for the likelihood limits (this work).
The combination of detectors (see subsection 3.7) in the likelihood framework is evaluated using the TUM40 and Lise data. The strong differences between the two detectors should help to identify the strengths and weaknesses of a combined calculation. For low DM masses, where only Lise contributes to the sensitivity, the combined limit is identical to the Lise limit. For the remaining mass range, the combined limit typically ranges between the limits of the two single detectors. This might not seem intuitive at first glance, but it is a direct result of the profile likelihood ratio test, see equation 24. In summary, we show that the likelihood framework allows the combination of data from different detectors, but also that this does not automatically lead to a gain in sensitivity, in particular if data from detectors with very different features and performances are combined.
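How a shared signal parameter and detector-specific data interact in a combined limit can be sketched with a toy counting model; all detector numbers are invented, and the asymptotic one-sided 90 % threshold qµ = (1.2816)² replaces the full analysis:

```python
import math

PHI_INV_90 = 1.2816  # one-sided 90% CL: exclude mu where q_mu > PHI_INV_90**2

def nll(mu, dets):
    """Combined toy NLL: each detector contributes Poisson counts with
    expectation mu * sig_eff + bkg; only mu is shared between detectors."""
    return sum(mu * s + b - n * math.log(mu * s + b) for n, s, b in dets)

def upper_limit(dets, hi=100.0):
    """90% CL upper limit on mu from the asymptotic q_mu test statistic."""
    # unconditional best fit: ternary search on the convex NLL, mu >= 0
    a, b = 0.0, hi
    for _ in range(200):
        m1, m2 = a + (b - a) / 3.0, b - (b - a) / 3.0
        if nll(m1, dets) < nll(m2, dets):
            b = m2
        else:
            a = m1
    mu_hat = 0.5 * (a + b)
    f_hat = nll(mu_hat, dets)
    # q_mu = 2*(NLL(mu) - NLL(mu_hat)) rises monotonically above mu_hat
    a, b = mu_hat, hi
    for _ in range(200):
        m = 0.5 * (a + b)
        if 2.0 * (nll(m, dets) - f_hat) < PHI_INV_90 ** 2:
            a = m
        else:
            b = m
    return 0.5 * (a + b)

# two invented detectors: (observed counts, signal efficiency, background)
limit_1 = upper_limit([(12, 3.0, 10.0)])
limit_combined = upper_limit([(12, 3.0, 10.0), (4, 1.5, 2.5)])
```

Depending on the relative best-fit signal strengths of the individual data sets, the combined limit can be stronger than either single limit or lie between them, mirroring the behavior discussed above.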

Conclusion and outlook
In this paper, we present a profile maximum likelihood framework for the description of data from scintillating cryogenic calorimeters in the scintillation light versus energy plane. We apply this model to data from three detectors of the CRESST DM experiment and find that the model accurately describes the data. The likelihood allows extracting precise information on the particle processes happening in the detector and on the detector response, thereby making it a framework of very high value for the understanding and modeling of our detectors. Furthermore, it is the basis for the test of a potential DM signal in the data, allowing for a discovery analysis as well as for setting limits on the DM-nucleon interaction cross-section. Since CRESST does not observe a potential DM signal in their data, we set exclusion limits in this paper and find comparable, or stronger, limits than those obtained previously using the Yellin optimum interval method. In addition, the likelihood framework enables the combination of data from different detectors in a straightforward way, which we also showcase here.

Fig. 10 Likelihood limits obtained for Lise (no LEE, red) and TUM40 (one LEE, dashed blue) and a limit obtained from the combined likelihood function (dashed cyan).
The likelihood framework discussed in this paper may lay the ground for many future developments, such as, in particular, combining data from many detectors (O(100)) for the planned upgrade of the CRESST readout system. In addition, it may be used to test alternative DM models and to directly include background estimates obtained with Monte Carlo simulations [13,14]. We also plan to extend the likelihood to include the time information as a third event observable to allow for properly evaluating time-dependent signals (e.g. the DM modulation signature) and backgrounds (e.g. radioactive decays, or the low-energy excess [21]).

Acknowledgements
[…] 05A20VTA, and by the Austrian science fund (FWF): I5420-N and P 33026-N AnaCONDa. FW was supported through the Austrian research promotion agency (FFG), project ML4CPD. SG was supported through the FWF project STRONG-DM (FG1). JB and HK were funded through the FWF project P 34778-N ELOISE. The Bratislava group acknowledges partial support provided by the Slovak Research and Development Agency (projects APVV-15-0576 and APVV-21-0377). We are grateful to the Laboratori Nazionali del Gran Sasso - INFN for their generous support of CRESST.

Numerical minimization tools
The large number of parameters and their strong correlation present a significant challenge when maximizing the likelihood.
The efficiency is not an analytical function, which prevents the use of analytical derivatives. Since the log-likelihood is a sum over many data points, a numerical estimation of the gradient is neither accurate nor fast. Therefore, most of the algorithms used are gradient-free methods. Additionally, to ensure the validity of the determined parameters as well as of the profile likelihood ratio, precise convergence to the global maximum is important. No single algorithm is capable of handling such a diverse and challenging use case while being sufficiently fast. For this reason, a multitude of algorithms is used in this work. Since most optimization problems are formulated as minimizations, optimization methods are usually implemented as such; the negative log-likelihood is used as the objective function for the minimization algorithms. The most important algorithms for this work and their implementations are presented in the following. For each, a brief introduction is given; we refer to the corresponding reference for a more detailed explanation.
Two packages implementing various minimization algorithms are used. The first one is the Optim.jl package [33], which contains various algorithms mainly focused on unconstrained optimization of uni- and multivariate functions. The second one, BlackBoxOptim.jl [34], specializes in gradient-free, global optimization using meta-heuristic and stochastic algorithms.

Differential evolution
Differential Evolution optimization was first proposed by Storn and Price [35]. Similar to Particle Swarm optimization, Differential Evolution algorithms are population-based methods. The difference to Particle Swarm optimization lies in the way candidate solutions are created: in Differential Evolution strategies, new candidate solutions are created by adding the weighted difference vector of two candidate solutions to a third one. The Differential Evolution algorithm used here is part of the BlackBoxOptim.jl package [34]; the implementation is called "Adaptive DE/rand/1/bin with radius limited sampling". This implementation does not take any starting values; instead, it probes the whole search space. Due to its fast and robust global convergence, as well as not introducing any bias through starting values, it is extremely well suited for an initial estimate of the parameters for a new data set.
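The mutation-and-crossover scheme can be illustrated with a minimal plain DE/rand/1/bin sketch (the actual analysis uses the adaptive, radius-limited BlackBoxOptim.jl variant; the sphere function here merely stands in for the negative log-likelihood):

```python
import random

def differential_evolution(f, bounds, pop_size=20, F=0.8, CR=0.9,
                           generations=200, seed=1):
    """Minimal DE/rand/1/bin: mutate with the weighted difference of two
    random members added to a third, then binomial crossover."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    costs = [f(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            jrand = rng.randrange(dim)  # guarantee at least one mutated component
            trial = []
            for j in range(dim):
                if rng.random() < CR or j == jrand:
                    v = pop[a][j] + F * (pop[b][j] - pop[c][j])
                    lo, hi = bounds[j]
                    v = min(max(v, lo), hi)  # clip to the search space
                else:
                    v = pop[i][j]
                trial.append(v)
            cost = f(trial)
            if cost <= costs[i]:  # greedy selection
                pop[i], costs[i] = trial, cost
    best = min(range(pop_size), key=costs.__getitem__)
    return pop[best], costs[best]

sphere = lambda x: sum(v * v for v in x)  # toy stand-in objective
x_best, f_best = differential_evolution(sphere, [(-5, 5)] * 3)
```

Note that, as in the implementation used in this work, no starting values are required; the initial population samples the whole bounded search space.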

Particle swarm
This method uses a population of candidate solutions, each denoted a "particle" in this context. This population moves through the entire search space to find a global minimum. The velocity and direction of each particle are influenced by the best solution found by the particle itself, by the best solution found by the nearest particles, and by the best global solution. The implementation used here is called Adaptive Particle Swarm optimization and was first proposed in [36]. It improves global convergence compared to classical Particle Swarm implementations. The algorithm implements four evolutionary states, one of which is called 'jumping out' and improves global convergence at the cost of speed by moving particles away from current minima in search of better ones. The implemented Particle Swarm optimization takes starting values and is, therefore, well suited if a rough estimate of the parameters exists. In addition, due to its global convergence properties, it is more robust against convergence to local minima than the Nelder-Mead algorithm. While convergence for a small number of dimensions tends to be slower compared to the Nelder-Mead method, for high dimensions or strong correlations between parameters the Particle Swarm optimization converges faster.
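The basic velocity update can be sketched with a minimal global-best PSO (the neighborhood term and the adaptive evolutionary states of the variant used in this work are omitted; the quadratic objective is an invented stand-in):

```python
import random

def particle_swarm(f, bounds, n_particles=30, iters=300,
                   w=0.72, c1=1.49, c2=1.49, seed=1):
    """Minimal global-best PSO: velocities are pulled toward each
    particle's personal best and the swarm's global best."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_f = [f(p) for p in pos]
    g = min(range(n_particles), key=pbest_f.__getitem__)
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    for _ in range(iters):
        for i in range(n_particles):
            for j in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][j] = (w * vel[i][j]
                             + c1 * r1 * (pbest[i][j] - pos[i][j])
                             + c2 * r2 * (gbest[j] - pos[i][j]))
                lo, hi = bounds[j]
                pos[i][j] = min(max(pos[i][j] + vel[i][j], lo), hi)
            fi = f(pos[i])
            if fi < pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], fi
                if fi < gbest_f:
                    gbest, gbest_f = pos[i][:], fi
    return gbest, gbest_f

bowl = lambda x: (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2  # toy objective
x_best, f_best = particle_swarm(bowl, [(-5, 5), (-5, 5)])
```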

Nelder-Mead
The Nelder-Mead method [37], often called the downhill simplex method, is a gradient-free direct search method. A simplex is constructed, containing the function values at a set of points. From this simplex, the algorithm performs one of four possible operations: reflection, expansion, contraction, or shrinking. The purpose of these operations is to gradually replace points with high function values by points with lower values. The behavior of the algorithm can be tuned by tweaking the parameters associated with the four operations. The implementation used here utilizes the adaptive parameters introduced by Gao and Han [38], which provide better convergence characteristics in high dimensions. The algorithm usually provides fast and precise convergence to a local minimum. In this application, it is therefore usually used as the final step of the minimization, refining the minimum found by the global optimization routines.
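The four simplex operations can be sketched with a minimal classical Nelder-Mead (fixed parameters, not the Gao-Han adaptive variant used in this work; the Rosenbrock function is a standard toy objective):

```python
def nelder_mead(f, x0, step=0.5, iters=600,
                alpha=1.0, gamma=2.0, rho=0.5, sigma=0.5):
    """Minimal Nelder-Mead with the classical reflection (alpha),
    expansion (gamma), contraction (rho), and shrink (sigma) parameters."""
    dim = len(x0)
    # initial simplex: x0 plus one point perturbed per dimension
    simplex = [list(x0)] + [
        [x0[j] + (step if j == i else 0.0) for j in range(dim)]
        for i in range(dim)
    ]
    fvals = [f(x) for x in simplex]
    for _ in range(iters):
        order = sorted(range(dim + 1), key=fvals.__getitem__)
        simplex = [simplex[i] for i in order]
        fvals = [fvals[i] for i in order]
        centroid = [sum(p[j] for p in simplex[:-1]) / dim for j in range(dim)]
        worst = simplex[-1]
        refl = [centroid[j] + alpha * (centroid[j] - worst[j]) for j in range(dim)]
        f_r = f(refl)
        if f_r < fvals[0]:        # best so far: try expansion
            expd = [centroid[j] + gamma * (refl[j] - centroid[j]) for j in range(dim)]
            f_e = f(expd)
            simplex[-1], fvals[-1] = (expd, f_e) if f_e < f_r else (refl, f_r)
        elif f_r < fvals[-2]:     # better than second worst: accept reflection
            simplex[-1], fvals[-1] = refl, f_r
        else:                     # contraction (outside or inside)
            if f_r < fvals[-1]:
                con = [centroid[j] + rho * (refl[j] - centroid[j]) for j in range(dim)]
                f_c, ok = f(con), None
                ok = f_c <= f_r
            else:
                con = [centroid[j] + rho * (worst[j] - centroid[j]) for j in range(dim)]
                f_c = f(con)
                ok = f_c < fvals[-1]
            if ok:
                simplex[-1], fvals[-1] = con, f_c
            else:                 # shrink all points toward the best one
                for i in range(1, dim + 1):
                    simplex[i] = [simplex[0][j] + sigma * (simplex[i][j] - simplex[0][j])
                                  for j in range(dim)]
                    fvals[i] = f(simplex[i])
    best = min(range(dim + 1), key=fvals.__getitem__)
    return simplex[best], fvals[best]

rosen = lambda x: (1 - x[0]) ** 2 + 100 * (x[1] - x[0] ** 2) ** 2
x_best, f_best = nelder_mead(rosen, [-1.2, 1.0])
```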

Minuit
Since the Minuit package is well established in the particle physics community, the reader is referred to [39] for a detailed description. We use MIGRAD as an alternative or in addition to Nelder-Mead for the minimization of the negative log-likelihood function. We also use MINUIT's built-in output of parameter uncertainties, where the symmetric uncertainties are based on the error (covariance) matrix and the asymmetric uncertainties are based on MINOS.

Other algorithms
Various other minimization algorithms are implemented, but in most cases they are outperformed by the ones mentioned above. An implementation of a conjugate gradient descent method [40] as well as a simulated annealing algorithm [41] from the Optim.jl package can be used. From the BlackBoxOptim.jl package, a few natural evolution-based algorithms [42] and a generating set search method [43] can be employed.

Implementations
The results of this work were obtained with a non-public software named Romeo. Romeo is programmed in the Julia language [44]. To cross-check Romeo, a second program, named lxx, was developed independently. lxx is based on C++ and uses ROOT (version 6.18, [45]) for plotting and as the interface to the MINUIT minimizer [39]. lxx and Romeo are both in the ballpark of 10,000 lines of code and reach similar speeds, with Romeo being about twice as fast as lxx. Their results on the best fit L(µ̂, θ̂) agree within numerical accuracy.

Effect of the low energy excess (LEE)
We observe an excess of events at low energies for the modules TUM40 and Detector A, as discussed in section 3.3.7. In figure 11 we show a comparison of limits obtained with likelihood frameworks with no LEE contribution (solid), one LEE contribution (dashed), and two LEE contributions (dashed-dotted, only Detector A). For both detectors, including one or two LEE contribution(s) significantly improves the limit, in particular for DM particle masses larger than 1 GeV/c². This is not surprising, as the LEE is similar, but not identical, in shape to the DM spectrum. Therefore, if no LEE is allowed, the likelihood fit is forced to increase the DM contribution to make model and data compatible, even though the shapes of the LEE and the DM spectrum differ slightly.
It should be noted that the inclusion of a background similar in shape to the expected signal has to be done with special care. For TUM40, a DM origin of the LEE can be safely excluded, as Lise, with an even lower threshold, does not show any LEE. For Detector A, a DM origin can also be quite robustly rejected due to several observations, in particular its rate decaying with time, as shown in [21,20].

Fit results
Table 3 lists the results of the maximum likelihood fit (best fit) for the detector TUM40 with one LEE contribution for the energy range [E_thr = 0.6 keV, 40 keV]. The uncertainties were calculated with MINUIT (see section 6.1.4); the symmetric uncertainty is calculated via the covariance matrix and represents a symmetric uncertainty interval of the parameter around the best-fit value. The MINOS errors are also obtained with MINUIT and give an asymmetric uncertainty interval. The MINOS uncertainties are found by increasing/decreasing the parameter in question while simultaneously minimizing the negative log-likelihood over all other (N−1) parameters. The lower/upper uncertainty for a parameter then corresponds to an increase of the negative log-likelihood of 0.5, which in turn corresponds to one standard deviation. The main advantage of the computation-intensive MINOS procedure is that it fully takes into account correlations between parameters and non-linearities [39]. MINUIT also provides a validity criterion (evaluated on maximal function calls, improvement on parameters, and convergence) for the uncertainty interval, which is listed in the last column. It should be noted that the scintillation efficiency η (parameter 28 in table 3) had to be fixed in the calculation of the MINOS errors to reach convergence. The value listed in the table, however, corresponds to the value obtained by the maximum likelihood fit.
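The Δ(−ln L) = 0.5 scan behind the MINOS intervals can be illustrated with a toy one-parameter Poisson likelihood (not the TUM40 fit; with a single parameter, the re-minimization over the other N−1 parameters is trivially absent):

```python
import math

def nll(p, n_obs=10):
    """Toy one-parameter negative log-likelihood (Poisson, constant dropped).
    The minimum is at p = n_obs."""
    return p - n_obs * math.log(p)

def minos_interval(nll, p_hat, lo=1e-3, hi=100.0, dnll=0.5, tol=1e-8):
    """Find the points where the NLL rises by dnll above its minimum,
    i.e. a MINOS-style one-standard-deviation interval. In the real MINOS
    procedure, all other parameters are re-minimized at each scan point."""
    f0 = nll(p_hat)

    def crossing(a, b):
        # bisection on g(p) = nll(p) - f0 - dnll over a sign change in [a, b]
        for _ in range(200):
            m = 0.5 * (a + b)
            if (nll(a) - f0 - dnll) * (nll(m) - f0 - dnll) <= 0.0:
                b = m
            else:
                a = m
            if b - a < tol:
                break
        return 0.5 * (a + b)

    return crossing(lo, p_hat), crossing(p_hat, hi)

p_lo, p_hi = minos_interval(nll, p_hat=10.0)
```

For this Poisson example, the interval is visibly asymmetric around the best-fit value, which is exactly the behavior the MINOS errors are designed to capture.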
As already discussed, we fit the background data (bck) and the neutron calibration data (ncal) simultaneously. All background components must also be present, with the same or a higher rate, in the ncal data. With the ratio of the exposures between ncal and bck (par. 26), one can turn the rate into an activity in the ncal data. This ratio is a free parameter in the fit, as it can be precisely determined by e.g. the cosmogenically activated γ-lines. We assume the same energy dependence of the efficiency (see figure 4) for bck and ncal data, but by leaving this parameter free in the fit we can account for an overall lower efficiency in the ncal data, which is expected due to the higher overall rate in the ncal data creating more pile-up and impacting detector stability. The value obtained for par. 26 is, however, in very good agreement with plausibility checks done on the data. For some background processes, we find a higher rate in the ncal data; thus we add a second component of the respective process, only increasing the rate but not changing the position/shape, and mark these parameters with "ncal" in table 3. An example is the Cu-fluorescence line at ∼8 keV (pars. 47-49), where the rate in ncal is roughly five times higher than the bck rate. This is expected, as the fluorescence is triggered by the overall background level. For background components with the same rate in bck and ncal, we set the ncal contribution to zero. It should be stressed that the likelihood framework allows extracting the rates in bck and ncal directly from the data, which is very valuable information for the identification of the background components.

Table 3: Maximum likelihood fit results for TUM40 including uncertainties. The column "Fixed" indicates whether a parameter was free in the fit (Fixed = False), and the lower and upper boundaries mark the fit boundaries for the parameters. The parameter η was free for the minimization but had to be fixed for the MINOS error calculation. A false MINOS validity
indicates a potential issue in the calculation of the asymmetric MINOS uncertainty. In case MINOS could not find a minimum at all, it returns the symmetric uncertainty; otherwise, it returns its best estimate. For more details, the reader is referred to [46].

Fig. 1
Fig. 1 Schematic of a CRESST-III detector module (not to scale). Parts in blue are made of CaWO 4; the TESs are sketched in red. The block-shaped target (absorber) crystal has a mass of ∼24 g, and its dimensions are (20×20×10) mm³. It is held by three instrumented CaWO 4 holding sticks (iSticks), two at the bottom and one on top. Three non-instrumented CaWO 4 holding sticks keep the square-shaped silicon-on-sapphire light detector in place. Its dimensions are (20×20×0.4) mm³. Figure and caption from [2].

Fig. 3
Fig. 3 Schematic illustration of the same bands (electron band (blue), oxygen band (red) and tungsten band (blue)) as in figure 2, but in the light yield (LY) versus energy (E) plane. The calcium band is not shown for clarity; it would be located between the oxygen and the tungsten band.

Fig. 4
Fig. 4 Efficiencies (= survival probabilities) for the detectors analyzed in this work as a function of injected (simulated) energy.The dashed vertical lines mark the energy threshold of the respective detector.The efficiencies were determined in the analyses in [2, 8, 9].

Fig. 5
Fig. 5 Result of the band fit along with the background data of TUM40. The mean lines of the bands are shown as solid lines, while the dashed lines represent the lower and upper 90 % lines. The population of events above the electron band is mostly attributed to excess light events.

Fig. 6
Fig. 6 Result of the band fit along with the neutron calibration data of TUM40. The mean lines of the bands are shown as solid lines, while the dashed lines represent the lower and upper 90 % lines.

Fig. 7
Fig. 7 Energy histogram of the background data of TUM40 including fit results.

Fig. 8
Fig. 8 Energy histogram of the neutron calibration data of TUM40 including fit results.

Fig. 11
Fig. 11 Likelihood limits for TUM40 and Detector A without low-energy excess (LEE) in solid, with one LEE component in dashed, and with two LEE components (only for Detector A) in dashed-dotted.

Table 1
Overview of the detector modules used in this work. All target crystals were made from CaWO 4. TUM = Technical University Munich.

The energy spectrum of INRs is modeled as a linear function with the constant P_0,INR and the parameter P_1,INR varying linearly with the distance to the excitation energy S_INR. To account for the resolution, we convolve this function with the resolutions at S_INR and the Q-value (denoted E_INR here), resulting in the error functions in equation 17.