1 Introduction

Double beta decay (DBD) is an extremely rare nuclear process in which the simultaneous transmutation of a pair of neutrons into protons converts a nucleus (A, Z) into an isobar (A, Z+2), with the emission of two electrons and two anti-neutrinos. This two-neutrino decay mode (\(2\nu \beta \beta \)) is predicted in the Standard Model and has been detected in several nuclei. The neutrinoless mode of the decay (\(0\nu \beta \beta \)) is a posited Beyond Standard Model process that could shed light on many open aspects of modern particle physics and cosmology, such as the existence of lepton number violation and elementary Majorana fermions, the neutrino mass scale, and the baryon asymmetry in the Universe [1,2,3,4,5]. Both DBD modes can proceed through transitions to the ground state as well as to various excited states of the daughter nucleus. While the former are easier to detect owing to their shorter half-lives, the latter leave a unique signature which may be detected with significantly reduced backgrounds. The excited state decays also provide powerful tests of the nuclear physics of DBD and can shed light on nuclear matrix element calculations as well as the ongoing discussion on the quenching of the effective axial coupling constant \(g_A\); eventually, they could even be used to disentangle the mechanism of \(0\nu \beta \beta \) decay [6].

So far, \(2\nu \beta \beta \) decay to the first \(0^+\) excited state has been observed in only two isotopes: \(^{100}\mathrm {Mo}\) [7] and \(^{150}\mathrm {Nd}\) [8], with half-lives of \((\mathrm {T}_{1/2})^{2\nu }_{0^+} = 6.1^{+1.8}_{-1.1} \times 10^{20} \, \mathrm {yr}\) and \((\mathrm {T}_{1/2})^{2\nu }_{0^+} = 1.4^{+0.4}_{-0.2} \mathrm {(stat.)} \pm 0.3 \mathrm {(syst.)} \times 10^{20} \, \mathrm {yr}\), respectively. Searches for the same process in other isotopes have yielded lower limits ranging from \(3.1 \times 10^{20} \, \mathrm {yr}\) to \(8.3 \times 10^{23} \, \mathrm {yr}\) at \(90 \, \% \) Confidence Level (C.L.) (see Ref. [9] for a review).

In this work, we focus on the search for \(0\nu \beta \beta \) and \(2\nu \beta \beta \) decays of \(\mathrm {^{130}Te}\) to the first \(0^{+}\) excited state of \(\mathrm {^{130}Xe}\) with the CUORE experiment. Presently, the strongest limits on the half-life of \(^{130}\mathrm {Te}\) decay to excited states come from a combination of Cuoricino [10] and CUORE-0 [11] data: the latter, published recently and not included in Ref. [9], includes the combination with its predecessor's results. The obtained limits are:

$$\begin{aligned} (\mathrm {T}_{1/2})_{0^+_2}^{0\nu }&> 1.4 \times 10^{24} \, \mathrm {yr} \; (90\%\, \mathrm {C.L.}) \end{aligned}$$
(1a)
$$\begin{aligned} (\mathrm {T}_{1/2})_{0^+_2}^{2\nu }&> 2.5 \times 10^{23} \, \mathrm {yr} \; (90\%\, \mathrm {C.L.}) \end{aligned}$$
(1b)

for the \(0\nu \) and \(2\nu \) process, respectively. Theoretical predictions [12] for the \(\mathrm { (T_{1/2}) }^{2\nu }_{0^+_2}\) observable in the \(2\nu \beta \beta \) decay channel are based on the QRPA approach and favor the following range:

$$\begin{aligned} ^{th}(\mathrm {T}_{1/2})^{2\nu }_{0^+_2} = (7.2 - 16) \times 10^{24} \, \mathrm {yr} \end{aligned}$$
(2)

where the range depends on the precise treatment of \(g_A\). The lower bound assumes \(g_A\) to be constant as a function of the mass number A, while the upper bound assumes a quenched value of \(g_A = 0.6\) [12,13,14].

A data-driven estimate of the \(2\nu \beta \beta \) ground state to excited state decay rate in the IBM-II framework, based on Refs. [15,16,17], is reported in Ref. [18] as

$$\begin{aligned} ^{th}( \mathrm {T}_{1/2})^{2\nu }_{0^+_2} = 2.2 \times 10^{25} \, \mathrm {yr}. \end{aligned}$$
(3)

In this regard, as stated before, either a measurement or a more stringent limit with respect to Ref. [11] would be informative from the point of view of refining and validating the theoretical computations.

The decay to excited states has a unique signature. The double-beta decay emits two electrons, which share a total kinetic energy of up to 734 keV. The subsequent decay of the excited daughter nucleus typically emits two or three high-energy gamma rays in cascade. Due to the emission of such coincident de-excitation \(\gamma \) rays, both the \(0\nu \beta \beta \) and \(2\nu \beta \beta \) decay channels allow a significant background reduction with respect to the corresponding transitions to the ground state. This holds especially in an experimental setup that exploits a high detector granularity, such as the CUORE experiment.

Fig. 1 The decay scheme of \(^{130}\mathrm {Te}\) is shown with details about the involved excited states of \(^{130}\mathrm {Xe}\) up to its first \(0^+\) excited state. The nomenclature \(0^+_1, \ldots , 0^+_n\) indicates states with the same angular momentum in increasing order of excitation energy. An energy scale is shown (right) where the \(^{130}\mathrm {Xe}\) ground state is taken as reference [39]

2 Detector and data production

The Cryogenic Underground Observatory for Rare Events (CUORE) [19, 20] is a ton-scale cryogenic detector located at the underground Laboratori Nazionali del Gran Sasso (LNGS) in Italy. CUORE is designed to search for the \(0\nu \beta \beta \) decay of \(\mathrm {^{130}Te}\) to the ground state of \(\mathrm {^{130}Xe}\) [21, 22], and features a low background rate near the \(0\nu \beta \beta \) decay region of interest (ROI), an excellent energy resolution, and a high detection efficiency. The CUORE detector consists of a close-packed array of 988 TeO\(_2\) crystals operating as cryogenic bolometers [23,24,25] at a temperature of \(\sim \)10 mK. The CUORE crystals are \(5 \times 5 \times 5\) cm\(^3\) cubes weighing 750 g each, arranged in 19 towers, each consisting of a copper structure with 13 floors and 4 crystals per floor. A custom-made \(^3\)He/\(^4\)He dilution refrigerator, which represents the state of the art for this cryogenic technique, is used to cool down the CUORE cryostat, where the entire array is contained and shielded [26,27,28,29,30,31,32].

Each CUORE crystal records thermal pulses via a neutron-transmutation-doped (NTD) germanium thermistor [33] glued to its surface. Any energy deposition in the crystal causes a sudden rise in temperature, which can indicate a particle interaction inside the crystal, a particle crossing through it, or an environmental thermal instability (e.g. earthquakes).

The data acquisition and the production of the CUORE event data used in this work closely follow the procedure of [26], and are described in detail in [34]. We briefly review the basic process here and highlight the differences.

The NTD converts the thermal signal to a voltage output, which is amplified, filtered through a 6-pole Bessel anti-aliasing filter, and sampled continuously at 1 kHz. The data are stored to disk and triggered offline with an algorithm based on the optimum filter (OF) [35,36,37].

For each triggered pulse, a 10 s window around the trigger (3 s before and 7 s after) is processed through a series of analysis steps, with the aim of extracting the physical quantities associated with the pulse. The waveform is filtered using an OF built from a pulse template and the measured noise power spectrum.
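
As an illustration of this step, the following is a minimal sketch of a frequency-domain optimum-filter amplitude estimator built from a pulse template and a measured noise power spectrum. The function and variable names are ours, and the CUORE production implementation differs in detail.

```python
import numpy as np

def of_amplitude(waveform, template, noise_psd):
    """Minimal frequency-domain optimum filter (OF) amplitude estimator.

    waveform, template : equal-length time-domain arrays (10 s at 1 kHz)
    noise_psd          : noise power spectrum on the same rFFT binning
    """
    v = np.fft.rfft(waveform)
    s = np.fft.rfft(template)
    h = np.conj(s) / noise_psd          # OF weights: template over noise
    # Normalized so that a noiseless pulse identical to the template
    # returns an amplitude of exactly 1
    return np.real(np.sum(h * v)) / np.real(np.sum(h * s))
```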

The signal amplitudes are then evaluated from the OF-filtered waveforms and corrected for small changes in the detector gain due to temperature drifts. We calibrate each bolometer individually using dedicated calibration runs with \(\mathrm {^{232}Th}\) and \(\mathrm {^{60}Co}\) gamma sources deployed around the detector array. These calibration runs typically last a few days every two months.

We impose a pulse shape analysis (PSA) selection based on 6 pulse shape parameters. This cut removes noisy events, pile-up events, and non-physical events.

Unlike the decay to ground state search described in Ref. [26], the physics search described in the present work focuses on coincident energy depositions in multiple crystals; in particular, on events where energy is deposited in either two or three bolometers. As the reconstructed time difference between events on nearby bolometers is affected by differences in pulse rise times, a bolometer-by-bolometer correction is applied. Sets of coincident energy releases in M bolometers within a \(\pm 5\,\)ms time window are grouped together as multiplets of multiplicity M, as sketched below.
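
As an aid to the reader, here is a schematic sketch of such time-window clustering. It assumes rise-time-corrected timestamps and chains consecutive events closer than the window, which is a simplification of the actual CUORE coincidence algorithm.

```python
def build_multiplets(events, window_ms=5.0):
    """Group time-sorted events into coincidence multiplets.

    events : list of dicts with keys 'time_ms', 'channel', 'energy_keV',
             where 'time_ms' is already rise-time corrected
    Returns a list of multiplets; multiplicity M = len(multiplet).
    """
    events = sorted(events, key=lambda e: e['time_ms'])
    multiplets, current = [], []
    for ev in events:
        # Start a new multiplet if this event falls outside the +/- 5 ms
        # window of the previous one
        if current and ev['time_ms'] - current[-1]['time_ms'] > window_ms:
            multiplets.append(current)
            current = []
        current.append(ev)
    if current:
        multiplets.append(current)
    return multiplets
```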

CUORE started its data taking in May 2017 and, after two significant interruptions for important maintenance of the cryogenic system, is now seeing its exposure grow at an average rate of \(\sim \)50 kg\(\cdot \)yr/month. The CUORE data collection is organized in datasets: a dataset begins with a gamma calibration campaign that typically lasts 2–3 days, followed by 6–8 weeks of uninterrupted background data taking, and ends with another gamma calibration.

Recently, the CUORE collaboration released the results of the search for \(0\nu \beta \beta \) decay to the ground state based on the accumulated exposure of 372.5 kg\(\cdot \)yr, setting an improved limit on the half-life of \(\mathrm {^{130}Te}\) of \((\mathrm {T_{1/2}})^{0\nu }_{0^+_1} > 3.2\times 10^{25}\) yr [22].

Table 1 The de-excitation \(\gamma \) rays emitted by \(^{130}\mathrm {Xe}^*\) in the transition from the \(0^+_2\) to the ground state. Each row corresponds to a different path through intermediate states. The energies of the emitted \(\gamma \)s are listed, in order of energy, along with the branching ratio (BR) of each pattern [39]

3 Analysis

In this section we describe the analysis steps that are specific to the search for \(^{130}\mathrm {Te}\) decay to the excited states of \(^{130}\mathrm {Xe}\). The de-excitation of the \(^{130}\mathrm {Xe}\) nucleus follows one of three possible patterns, i.e. paths through states of decreasing energy from the \(0^+_2\) state to the \(0^+_1\) ground state (Fig. 1). Details about the probability of each de-excitation pattern, referred to in the following as A, B and C (in decreasing order of probability), and the energy of the emitted \(\gamma \) rays are reported in Table 1.

The simultaneous emission of DBD betas and de-excitation gammas produces coincidence multiplets, i.e. sets of simultaneous pulses in M bolometers, grouped by the coincidence algorithm. We search for events with full containment of the final state gammas in the crystals: more specifically, we try to avoid multiplets where one or more of the final state \(\gamma \)s escape the source crystal and are absorbed by some non-active part of the experimental apparatus, as well as Compton scattering events, where the energy of a single de-excitation gamma is split among two or more detectors. We apply energy selection cuts to select such fully contained events; these cuts are listed in Table 3 and described in more detail in Sect. 3.4.

Partitions are defined as unique groupings of energy depositions that pass a particular set of energy selection cuts. For a fixed multiplicity M and a source pattern, they are identified by all possible ways of partitioning the final state particles among M different crystals. Finally, we define signatures as sets of partitions from different patterns that are indistinguishable. Single-site (\(M = 1\)) signatures are not taken into account: the \(0\nu \beta \beta \) decay channel would be indistinguishable from the same decay to the ground state of \(^{130}\mathrm {Xe}\), while the \(2\nu \beta \beta \) decay channel would suffer from a high background from the decay to the ground state. Therefore, there remain 8 partitions for patterns A and B, and 14 for pattern C. Each of the partitions is labelled with a 3-character string with the following convention

$$\begin{aligned}{}[\textit{Multiplicity}] [\textit{Pattern}] [\textit{Index}] \end{aligned}$$

where Multiplicity = 2, 3, 4 indicates the number of involved crystals, Pattern = A, B, C indicates the originating de-excitation pattern, and Index is a unique integer counter that distinguishes the various combinations of energy groupings for that pattern and multiplicity. Partitions sharing the same expected energy release are indistinguishable and are merged into signatures. For this reason, instead of handling a total of 22 partitions we are left with 15 signatures [38]. An example of indistinguishable partitions is given in Table 3 by the 2A0–2B1 signature: in one of the crystals a gamma energy release of 1257 keV is expected, which can be due either to \(\gamma _1\) from pattern A or to the simultaneous absorption of \(\gamma _1 + \gamma _2\) from pattern B (see Table 1).
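
The counting of partitions and their merging into signatures can be illustrated with a short sketch. The gamma energies below are assumptions we quote for illustration, consistent with the 1257 keV example above; the betas, which always deposit in the source crystal, are omitted for brevity.

```python
from itertools import product

def crystal_partitions(deposits_keV, M):
    """Enumerate the ways of distributing energy deposits among M crystals
    (each crystal absorbing at least one deposit). Returns the set of
    ordered tuples of per-crystal summed energies, rounded to 1 keV,
    which is what makes partitions (in)distinguishable."""
    seen = set()
    for assign in product(range(M), repeat=len(deposits_keV)):
        if len(set(assign)) != M:
            continue  # require all M crystals to be hit
        sums = tuple(sorted(
            round(sum(e for e, a in zip(deposits_keV, assign) if a == i))
            for i in range(M)))
        seen.add(sums)
    return seen

# Assumed de-excitation gamma energies (keV): pattern A -> 1257.4 + 536.1,
# pattern B -> 671.3 + 586.0 + 536.1 (cf. Table 1)
common = crystal_partitions([1257.4, 536.1], M=2) & \
         crystal_partitions([671.3, 586.0, 536.1], M=2)
print(common)  # {(536, 1257)}: a grouping shared by A and B (cf. 2A0-2B1)
```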

3.1 Monte Carlo simulations

We use Monte Carlo (MC) simulations to compute the detection efficiency (Sect. 3.2) and the expected background (Sect. 3.3) associated with each signature, to rank experimental signatures, and eventually to fine-tune the selection cuts on the most sensitive ones (Sect. 3.4).

CUORE uses a Geant4-based MC to simulate energy depositions in the detector. The Geant4 software [40] simulates particle interactions in the various volumes and materials of a modeled detector geometry. A separate post-processing step converts the resulting energy depositions into an output as close as possible to that of the data production. We refer to Ref. [41] for further details about the CUORE MC simulations.

Signal simulations, i.e. simulations of the double beta decay to excited states and the subsequent de-excitation gammas, are produced separately for each process (\(0\nu \), \(2\nu \)) and pattern (A, B, C). Gamma energies are generated as monochromatic. Angular correlations induce a negligible effect on the containment efficiency of the experimental signatures listed in Table 3, with respect to both the isotropic gamma emission assumption and the dominant systematic uncertainty described in Sect. 5.1. Beta energies are randomly extracted from the beta spectrum of the corresponding decay [16, 42,43,44] in the HSD hypothesis.Footnote 1

Background simulations take as input the CUORE background model [45], and include contaminations in the crystals and several other parts of the CUORE setup,Footnote 2 such as the copper tower structure, the closest copper vessel enclosing the detector, the Roman lead, the internal and external modern lead shields, and the internal lead suspension system. The contaminants include bulk and surface \(^{238}\)U and \(^{232}\)Th chains with different hypotheses on secular equilibrium breaks, bulk \(^{60}\)Co, \(^{40}\)K, and a few other long-lived isotopes. Additional background sources are the cosmic muon flux and the \(2\nu \beta \beta \) decay of \(^{130}\)Te to the ground state. Both signal and background simulated energy spectra are convolved with a Gaussian resolution with a width of \(5~\mathrm {keV}\) full width at half maximum, a standard choice for our simulation studies [41].
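
As a minimal sketch of this last step, the smearing can be applied per deposit by sampling a Gaussian whose sigma corresponds to a 5 keV FWHM; names are illustrative.

```python
import numpy as np

FWHM_KEV = 5.0
SIGMA_KEV = FWHM_KEV / (2.0 * np.sqrt(2.0 * np.log(2.0)))  # ~2.12 keV

def smear(energies_keV, rng=None):
    """Convolve simulated energy deposits with the assumed detector
    resolution: a Gaussian of 5 keV full width at half maximum."""
    if rng is None:
        rng = np.random.default_rng()
    energies_keV = np.asarray(energies_keV, dtype=float)
    return energies_keV + rng.normal(0.0, SIGMA_KEV, size=energies_keV.size)
```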

3.2 Efficiency evaluation

The detection efficiency of a given signature consists of two components: the containment efficiency and the analysis efficiency. Given a signature s and a set of energy selection cuts on the involved bolometers, the corresponding containment efficiency \(\varepsilon _s\) represents the probability that the energy released by a nuclear decay of \(^{130}\mathrm {Te}\) to the \(0^+_2\) state of \(^{130}\mathrm {Xe}\) matches the topology of the signature. We evaluate this efficiency component from the signal MC simulations described in Sect. 3.1, by summing over the contributions of all patterns p populating the signature s

$$\begin{aligned} \varepsilon _s = \left[ \sum _{p} \mathrm {BR}_p \cdot \frac{ \big [ N^{(sel)}_{MC} \big ]^{(s)}_p }{ \big [N^{(tot)}_{MC}\big ]_p\,\,\, } \right] \end{aligned}$$
(4)

where \(\mathrm {BR}_p\) is the branching ratio of pattern p, and \(\big [ N^{(sel)}_{MC} \big ]^{(s)}_p\) and \(\big [ N^{(tot)}_{MC} \big ]_p\) are respectively the selected and total number of simulated decays in the de-excitation pattern of interest. For \(0\nu \beta \beta \) decay signatures the signal is monochromatic in all the involved crystals, so the signal region is expected to lie around a specific point in the M-dimensional space of coincident energy releases. A selection is enforced, in simulations, with a box cut, i.e. a selection interval for the energy release in each crystal, defined as

$$\begin{aligned} | E_i - Q_i | < 5 \, \mathrm {keV} \quad \mathrm {where} \quad i = 1 \cdots M, \end{aligned}$$
(5)

where \(E_i\) is the reconstructed energy release in the ordered energy spaceFootnote 3 and \(Q_i\) is the corresponding expected energy release. For \(2\nu \beta \beta \) decay signatures the same selections apply, except for the one crystal where the energy release from the \(\beta \beta \) is expected. Since the emitted neutrinos carry away an unknown (on an event-by-event basis) amount of undetected energy, the expected energy release is not monochromatic. It is instead expected to vary from \(Q_{j}^{min}\) to \(Q_{j}^{max}\), where j indicates the bolometer where the \(\beta \beta \) electrons release their energy. For that bolometer, in each multiplet the following selection is applied

$$\begin{aligned} Q_{j}^{min} - 5 \, \mathrm {keV}< E_j < Q_{j}^{max} + 5 \, \mathrm {keV}. \end{aligned}$$
(6)
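
A compact sketch of the selections of Eqs. 5 and 6, applied to the ordered energies of a multiplet, could read as follows (names are ours):

```python
def passes_0v_box_cut(E, Q, half_width=5.0):
    """Eq. (5): for 0vbb signatures every crystal i of the ordered
    multiplet must reconstruct within 5 keV of its expectation Q_i."""
    return all(abs(e - q) < half_width for e, q in zip(E, Q))

def passes_2v_cut(E, Q, j, Q_min, Q_max, half_width=5.0):
    """Eq. (6): for 2vbb signatures crystal j, which collects the
    continuous beta-beta energy, is instead allowed the widened range
    [Q_min - 5 keV, Q_max + 5 keV]."""
    for i, (e, q) in enumerate(zip(E, Q)):
        if i == j:
            if not (Q_min - half_width < e < Q_max + half_width):
                return False
        elif abs(e - q) >= half_width:
            return False
    return True
```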

Selection cuts need to be further tuned at a later stage to optimize the sensitivity to signal peaks. We do this by including the widest possible sidebands around each signal peak, in order to best constrain the underlying continuous background. We avoid including background peaks in the fit range, in order to minimize systematics due to their modeling. This process yields the selections listed in Table 3 (see Sect. 3.4). We then update our computation of the signal efficiencies using Eq. 4, where \(N^{sel}_{MC}\) is replaced by the result of a Gaussian fit to the distribution of selected MC signal events.

The second efficiency contribution, namely the analysis efficiency, is the combination of the probability of correctly detecting and reconstructing the energy deposited in each bolometer (cut efficiency, \(\varepsilon ^{cut}\)), and the probability of assigning the correct multiplicity and avoiding an accidental coincidence (accidentals efficiency, \(\varepsilon ^{acc}\)).

The cut efficiency term accounts for the data processing cuts needed to select triggered events, namely the base and PSA cuts (see Sect. 2). The method used to calculate this efficiency closely follows that used in [22]. We measure the efficiency of correctly triggering a pulse, reconstructing its energy, and identifying its pile-up contribution (base cuts) from heater pulses. The base cut efficiency is computed on each bolometer-dataset pair, given the large number of available heater events, and then averaged to obtain a per-dataset value. The PSA cut efficiency is extracted from two independent samples of events: either coincident double-site events where the total energy released is compatible with known prominent \(\gamma \) lines, or single-site events due to fully absorbed \(\gamma \) lines. The first sample includes events whose energy spans a wide range, and allows the determination of the energy dependence of the PSA cut efficiency. The second sample has higher statistics but provides a measurement only at the energies of the selected \(\gamma \) peaks, rather than on a continuum. For each dataset we evaluate the PSA cut efficiency as the average of the two efficiencies obtained from these samples, and we treat their difference as a systematic effect. The \(\varepsilon ^{cut}\) term must be raised to the \(M^{th}\) power because it models bolometer-related efficiencies, and a multiplet is selected if and only if all of the involved bolometers pass the selection cuts.

The accidentals efficiency term \(\varepsilon ^{acc}\) is obtained, separately for each dataset, as the survival probability of the \(^{40}\)K \(\gamma \) line at 1460 keV. A fully absorbed \(^{40}\)K line in CUORE is uncorrelated with any other physical event, because it follows an electron capture whose atomic de-excitation (\(\sim 3\) keV) is below threshold.

Summarizing, the total signal detection efficiency for signature s is

$$\begin{aligned} \varepsilon ^{tot} = \varepsilon _s\times (\varepsilon ^{cut})^M \times \varepsilon ^{acc}. \end{aligned}$$
(7)

Since the cut efficiency and accidentals terms are evaluated separately for each dataset, the total efficiency in Eq. 7 must be thought of as the signal efficiency for signature s in a specific dataset. A summary of the relevant efficiency values is provided in Table 2, where the per-dataset values are exposure-weighted averages over all datasets. The containment efficiency is the dominant term.
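
Schematically, and with illustrative names of our choosing, the bookkeeping of Eq. 7 and the exposure weighting used in Table 2 amount to:

```python
def total_efficiency(eps_cont, eps_cut, eps_acc, M):
    """Eq. (7): total signal efficiency for a signature of multiplicity M
    in a given dataset; the cut term enters once per involved bolometer."""
    return eps_cont * eps_cut**M * eps_acc

def exposure_weighted_mean(values, exposures):
    """Effective per-dataset efficiency quoted in Table 2: the mean of
    the per-dataset values weighted by the dataset exposures."""
    return sum(v * w for v, w in zip(values, exposures)) / sum(exposures)
```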

Table 2 We report the efficiency terms that appear in Eq. 7 separately for the \(0\nu \beta \beta \) (top) and \(2\nu \beta \beta \) (bottom) analyses. The containment term dominates the efficiency. We report the cut efficiency raised to the power M corresponding to the signature it refers to. For the cut and accidentals efficiency terms we quote effective values computed as exposure-weighted means. All values are percentages; the uncertainty on the last digit is given in round brackets

3.3 Background contributions

Radioactive decays and particle interactions other than the \(\mathrm {^{130}Te}\) decay to the \(\mathrm {^{130}Xe}\) excited state may mimic the process we search for. We estimate this background contribution by means of the background MC simulations described in Sect. 3.1. We combine background simulations of the different sources according to the CUORE background model, and from the simulated background spectra we compute the expected number of background counts \(B_s\) for each signature, by counting the expected events from each source included in the background model and summing the contributions from all sources. We apply the same tight selection cuts around the signal region defined in Eqs. 5 and 6.

We use \(B_s\) to evaluate an approximate sensitivity for each signature and ultimately select the ones that will enter the analysis (see Sect. 3.4). Once the signatures that enter the analysis are selected, we optimize the selection cuts around the signal region in order to reject background structures while leaving the widest possible sidebands around the expected signal position. In this way we can parameterize the background with an appropriate analytical function, whose shape is dictated by background simulations, and use that to perform the final analysis (see Sect. 4.1). With this method we infer the number of reconstructed background events in each signature from data, rather than relying just on simulations.

Table 3 Selected experimental signatures for the DBD search on the \(0^+_2\) excited state of \(^{130}\mathrm {Xe}\) in the \(0\nu \beta \beta \) (top) and \(2\nu \beta \beta \) (bottom) channels. For each signature the corresponding Region Of Interest (ROI, i.e. the applied selection cuts) is listed in terms of the ordered energy releases \(E_{1} \ge E_{2} \ge E_{3}\). The component used for the fit is highlighted with a \(^*\) superscript. For each signature the partitions of the secondaries expected to contribute are listed; for each secondary we report in round brackets the expected energy release in keV. The last row reports the corresponding relative score (Eq. 10)

3.4 Experimental signature ranking

The 15 unique signatures under analysis have different signal efficiencies and backgrounds, and thus different detection sensitivities to the signal. In this section we evaluate an approximate sensitivity for each signature and reduce the 15 signatures to the most sensitive subset.

We analytically evaluate the discovery sensitivity of signature s starting from a background-only model for the total number of counts observed in a single bin centered at the expected signal position. In background-free signatures (\(B_s \ll 1\)) we assume an exponentially decaying prior \(\mathrm{P}(\mu ) = e^{-\mu }\), where \(\mu \) is the true value of the number of background counts. In background-limited ones (\(B_s \gg 1\)) we assume a Gaussian prior whose mean and variance are \(B_s\). We define the discovery sensitivity as the minimum number of observed counts \(N_s\) such that the probability of observing \(N > N_s\) counts in the background-only model is smaller than a given threshold \(p_{th}\). Then, from \(N_s\), we extract the corresponding half-life sensitivity

$$\begin{aligned} {\tilde{S}}_{1/2}(\varepsilon _s,B_s) = \bigg [ \frac{ \ln (2) \, M\Delta t \, N_A \, \eta (^{130}\mathrm {Te})}{m(\mathrm {TeO}_2)} \bigg ] S(\varepsilon _s,B_s) \end{aligned}$$
(8)

where M is the detector mass, \(\Delta t\) its live time, \(N_A\) the Avogadro constant, \(\eta (^{130}\mathrm {Te}) = (34.167 \pm 0.002) \%\) [46] the isotopic abundance of \(^{130}\mathrm {Te}\) in natural tellurium, \(m(\mathrm {TeO}_2) = 159.6\) g/mol the molar mass of tellurium dioxide [46], and \(S(\varepsilon _s,B_s)\) is a score function

$$\begin{aligned} S(\varepsilon _s,B_s) = {\left\{ \begin{array}{ll} \frac{\varepsilon _s}{-\ln (p_{th})} &{} B_s < B_{th} \\ \frac{\varepsilon _s}{n_{\sigma }(p_{th})\sqrt{B}_s} &{} B_s \ge B_{th} \end{array}\right. } \end{aligned}$$
(9)

where \(n_{\sigma }(p_{th})\) is the number of Gaussian standard deviations corresponding to \(p_{th}\), and \(B_{th}\) sets the transition from the background-free approximation to the background-limited approximation, making \(S(\varepsilon _s,B_s)\) continuous. For \(n_\sigma = 5\), \(p_{th} \sim 3 \times 10^{-7}\) and \(B_{th} \sim 9\). We note, though, that all signatures have a number of expected background counts either \(<1\) or \(>10\), so their ranking would not be affected by a different choice of the \(B_{th}\) threshold.

We compute the relative score of each signature s as

$$\begin{aligned} R_s \doteq \frac{S(\varepsilon _s,B_s)}{\sum _{s'} S(\varepsilon _{s'},B_{s'})} \end{aligned}$$
(10)

where \(s'\) is an index running over all experimental signatures. The efficiency term \(\epsilon _s\) includes only the MC-based containment efficiency. This is acceptable for the computation of an approximate analytical score function, since the containment term by far dominates the overall efficiency (Table 2).
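
For concreteness, the score and ranking of Eqs. 9 and 10 can be sketched as follows; the constants reproduce the \(n_\sigma = 5\) case quoted above, and all names are ours.

```python
import numpy as np

P_TH = 3e-7      # discovery threshold, ~5 sigma one-sided
N_SIGMA = 5.0
B_TH = (np.log(P_TH) / N_SIGMA) ** 2   # ~9: makes the score continuous

def score(eps, B):
    """Eq. (9): background-free vs background-limited score."""
    if B < B_TH:
        return eps / (-np.log(P_TH))
    return eps / (N_SIGMA * np.sqrt(B))

def relative_scores(signatures):
    """Eq. (10): signatures maps name -> (containment eff., expected
    background counts); returns the relative scores R_s (unit sum)."""
    raw = {name: score(eps, B) for name, (eps, B) in signatures.items()}
    total = sum(raw.values())
    return {name: value / total for name, value in raw.items()}
```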

We set a threshold of \(R_s > 5 \%\) and identify three signatures for both the \(0\nu \beta \beta \) and \(2\nu \beta \beta \) decay searches, namely 2A0–2B1, 2A2–2B3 and 3A0, listed in Table 3. The selected signatures account for the majority of the available sensitivity: the sum of their scores amounts to \(84 \% \, (87 \%)\) of the total score of all signatures in the \(0\nu \beta \beta \, (2\nu \beta \beta )\) search, respectively.

4 Physics extraction

We use a phenomenological parameterization of the background in the fitting regions (as opposed to using the predicted spectra from the MC), hence real data are required to tune the fit. To avoid biasing our results, we build the fit (Sect. 4.1) on blinded data using the blinding procedure described in Sect. 4.2.

4.1 Fitting technique

We extract the \(0\nu \beta \beta \) and \(2\nu \beta \beta \) decay rates of \(^{130}\mathrm {Te}\) to the \(0^+_2\) excited state of \(^{130}\mathrm {Xe}\) using two separate Bayesian fits. For each process (\(0\nu \) or \(2\nu \)) the fit is run simultaneously on all the involved signatures. Every multiplet of multiplicity M can be represented as a point \(\mathbf {E}_{ev}\) in an M-dimensional space of reconstructed energies. The energy releases are ordered so that \(E_i \ge E_{i+1} \,\, \forall i = 1, \ldots ,M-1\). For each signature, one of the components of the \(\mathbf {E}_{ev}\) vector is selected to perform the fit. This is referred to as the projected energy, and is indicated with a \(^{*}\) superscript in Table 3. In the following we denote this energy as \(E_{ev}\).

An unbinned Bayesian fit is implemented with the BAT software package [47]. It allows simultaneous sampling and maximization of the posterior probability density function (pdf) via Markov Chain Monte Carlo. The likelihood can be decomposed, for each signature and dataset, as follows:

$$\begin{aligned} \begin{aligned} \log {\mathcal {L}}_{s,ds}&= - ( \lambda _{S \, s,ds} + \lambda _{B \, s,ds} ) \\&\quad + \sum _{ev \in (s,ds)} \log \bigg [ \lambda _{S \, s,ds} \xi _{bo,ds} f_S( E_{ev} ) \\&\quad + \lambda _{B \, s,ds} \xi _{bo,ds} f_B( E_{ev} ) \bigg ] \end{aligned} \end{aligned}$$
(11)

where the subscripts s, bo, ds refer to a specific signature, bolometer, or dataset respectively. The form of Eq. 11 is the same for \(0\nu \beta \beta \) and \(2\nu \beta \beta \): the \(\lambda _S\) and \(\lambda _B\) terms are the expected numbers of signal and background events respectively, \(\xi _{bo,ds}\) is the ratio of the exposure of bolometer bo to the exposure of dataset ds, and \(f_S\) and \(f_B\) are the normalized signal and background pdfs. The latter depend only on the projected energy variable \(E_{ev}\).
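
The structure of Eq. 11 may be easier to parse in code form. The actual fit is performed with BAT via Markov Chain Monte Carlo, so the following is only a schematic restatement with names of our choosing.

```python
import numpy as np

def log_likelihood(events, xi, lam_S, lam_B, f_S, f_B):
    """Eq. (11): extended unbinned log-likelihood for one
    signature-dataset pair.

    events : projected energies E_ev of the selected multiplets
    xi     : per-event exposure fractions xi_{bo,ds} of the bolometer
             the projected energy was released in
    lam_S, lam_B : expected signal and background counts
    f_S, f_B     : normalized signal and background pdfs
    """
    ll = -(lam_S + lam_B)  # extended-likelihood normalization term
    for E, x in zip(events, xi):
        ll += np.log(lam_S * x * f_S(E) + lam_B * x * f_B(E))
    return ll
```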

The response function of the CUORE bolometers to monochromatic energy releases has a functional form defined phenomenologically for each bolometer-dataset pair [48, 49] as the superposition of 3 Gaussian components, to account for non-Gaussian tails. A correction for the bias in the energy scale reconstruction is implemented, together with the energy dependence of the resolution (see Ref. [21] for more details). The signal term \(f_S(E_{ev})\) models this shape in the bolometer-dataset pair in which the projected energy \(E_{ev}\) was released. The expected number of signal counts can be written as

$$\begin{aligned} \begin{aligned} \lambda _{S \, s,ds}&= \Gamma ^{(p)}_{\beta \beta } [\mathrm {yr}^{-1}] \bigg [ \frac{N_A \, 10^3 \, \eta (^{130}\mathrm {Te})}{m(\mathrm {TeO}_2)\,[\mathrm {g/mol}]} \bigg ] \epsilon _{s} \cdot \\&\quad \cdot (M\Delta t)_{ds}\,\mathrm {[kg\cdot yr]} \, (\epsilon ^{cut})_{ds}^M \,(\epsilon ^{acc})_{ds} \end{aligned} \end{aligned}$$
(12)

where \(\Gamma ^{(p)}_{\beta \beta }\) is the decay rate of process p and the other parameters were introduced following Eq. 8. The \(\Gamma _{\beta \beta }^{(p)}\) parameter describes the rate of the process \(p = 0\nu , 2\nu \) under investigation and is given in both cases a uniform physical prior, \(\Gamma _{\beta \beta } > 0\).

The background term \(f_B(E_{ev})\) is parameterized as

$$\begin{aligned} f_B(E_{ev}) = \frac{1}{\Delta E} \bigg [ 1 + m_{s} \big ( E_{ev} - E^{(s)}_0 \big ) \bigg ] \end{aligned}$$
(13)

where \(\Delta E = E^{max}_{s} - E^{min}_{s}\) is the width of the region of interest, \(E^{(s)}_0\) its center, and \(m_s\) the slope of the background for signature s. The normalization of the background term represents the number of expected background counts

$$\begin{aligned} \lambda _{B \, s,ds} = \mathrm {BI}_{s} \, (M\Delta t)_{ds} \, ( E^{max}_{s} - E^{min}_{s} ) \end{aligned}$$
(14)

where BI\(_s\) is the background index for signature s. Background simulations suggest that a uniform event distribution is sufficient to describe the continuous background in all signatures except the \(2\nu \beta \beta \) 2A0–2B1. For this reason the \(m_s\) parameter is included only when necessary. The background is fully described by the BI\(_s\) and \(m_s\) parameters which, together, amount to 4 (3) nuisance parameters in the \(2\nu \beta \beta \) \((0\nu \beta \beta )\) case respectively, and are marginalized over. The priors for the background indices BI\(_s\) and slopes \(m_s\) are uniform.

Table 4 Results of the blinded fit to \(0\nu \beta \beta \) (top) and \(2\nu \beta \beta \) (bottom) candidate events in the signatures of Table 3. For each parameter the mean and standard deviation of the corresponding marginalized posterior distribution are reported. These values are used only as input to the sensitivity studies and fit validation. Final results are reported in Table 5

The combined log-likelihood reads

$$\begin{aligned} \log {\mathcal {L}}({\mathcal {D}}|\mathrm {H}_{S+B}) = \sum _{s,ds} \log {\mathcal {L}}_{s,ds} \end{aligned}$$
(15)

where \(\mathrm {H}_{S+B}\) indicates that the likelihood is written in the signal-plus-background model hypothesis H\(_{S+B}\), i.e. that the existence of the process of interest is assumed. The background-only hypothesis H\(_B\) is a particular case that can be obtained by setting \(\Gamma _{\beta \beta } = 0\).

4.2 Blinding and sensitivity

We blind the data by injecting simulated signal events into the experimental spectrum. We inject a random and unknown number of fake signal events that would correspond to an event rate larger than the current 90% upper limit [10, 11]. Then we compute the expected number of counts in each signature, according to known efficiencies and exposures, for each dataset. Each generated signal event is randomly assigned a bolometer, according to its exposure within the considered dataset. Finally the projected energy of the signal event is generated according to the detector response function \(f_S(E_{ev}|Q_s)\) centered at the expected position \(Q_s\) of the monochromatic energy release in the projected energy space.

The injection rate of the simulated signal events lies between

$$\begin{aligned} \Gamma ^{min}_p = 1 \cdot 10^{-23} \, [\mathrm {1/yr}] \quad \mathrm {and} \quad \Gamma ^{max}_p = 5 \cdot 10^{-23} \, [\mathrm {1/yr}] \end{aligned}$$
(16)
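
A minimal sketch of the injection for one signature-dataset pair, assuming a single-Gaussian response in place of the full 3-Gaussian detector response function and with names of our choosing, could be:

```python
import numpy as np

rng = np.random.default_rng()

def inject_fake_signal(gamma_inj, n_nuclei_yr, eff_tot, Q_s, sigma_keV):
    """Blinding step: draw a Poisson number of fake signal events from
    the (hidden) injected rate and place them at the expected position
    Q_s in the projected energy space.

    gamma_inj   : injected decay rate in 1/yr, kept blind to the analyzers
    n_nuclei_yr : number of 130Te nuclei times the dataset live time [yr]
    eff_tot     : total signal efficiency (Eq. 7) for signature/dataset
    """
    mu = gamma_inj * n_nuclei_yr * eff_tot      # expected fake counts
    return rng.normal(Q_s, sigma_keV, size=rng.poisson(mu))
```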

We then fit the blinded datasets to obtain data-driven estimates of the background levels in each fitting window. These background estimates are used as inputs to the sensitivity studies described below.

The results of the fits to the blinded data are reported in Table 4. We observe a non-zero background for both the 2A0–2B1 and 2A2–2B3 signatures, while no background is expected for the 3A0 signature.

To extract the median half-life sensitivity for each decay, we generate \(10^4\) background-only Toy Monte Carlo simulations (ToyMC), using the numbers in Table 4. A background-only ToyMC simulation is an ensemble of simulated datasets, produced according to the following procedure, which is iterated \(N_{toy}\) times to obtain the same number of ToyMC ensembles. We define a set of signatures, together with the multiplicity and the cuts in the ordered energy variables that identify candidate events. For each signature, we set a functional form for the background pdf, either constant or linear, and sample a value from the posterior pdf of the corresponding blinded fit. We compute the number of expected background events for each signature and dataset according to Eq. 14, and sample the actual number of background events from a Poisson distribution with expectation value equal to the number of expected counts.

We store each simulated ToyMC event as a vector of ordered energy releases \(\mathbf {E}_{ev}\) and related bolometers \(\mathbf {\mathrm {ch}}_{ev}\). The bolometers are randomly extracted from the active bolometers of each dataset according to their exposure in the data, while the energies are generated according to the selected shape of the background pdf, with parameters (e.g. the background index) drawn from the posterior pdfs obtained with the blinded fit to the data.
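
A sketch of this generation step for one signature-dataset pair, following Eqs. 13 and 14 and using accept-reject sampling for the linear case (all names ours), is given below.

```python
import numpy as np

rng = np.random.default_rng()

def generate_toy_energies(BI, exposure, E_min, E_max, slope=None):
    """One background-only toy: Poisson counts with expectation from
    Eq. (14), energies uniform or drawn from the linear pdf of Eq. (13)."""
    n = rng.poisson(BI * exposure * (E_max - E_min))
    if slope is None:
        return rng.uniform(E_min, E_max, size=n)
    E0, dE = 0.5 * (E_min + E_max), E_max - E_min
    f_max = (1.0 + abs(slope) * dE / 2.0) / dE   # pdf maximum at an edge
    out = []
    while len(out) < n:                          # accept-reject sampling
        E = rng.uniform(E_min, E_max)
        if rng.uniform(0.0, f_max) < (1.0 + slope * (E - E0)) / dE:
            out.append(E)
    return np.array(out)
```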

We then fit each ToyMC with the signal-plus-background model \(\mathrm {H}_{S+B}\) and compute the lower limit on the decay half-life from the \(90 \%\) quantile of the marginalized posterior pdf for the decay rate parameter. We show the distribution of such limits in Fig. 2 for both the \(0\nu \beta \beta \) and \(2\nu \beta \beta \) decay processes. We quote the half-life sensitivity as the median limit of the ToyMCs (Table 4): \(\mathrm {S^{0\nu }_{1/2} = (5.6 \pm 1.4) \times 10^{24} \, \mathrm {yr}}\) for the \({0\nu }\) decay and \(\mathrm {S^{2\nu }_{1/2} = (2.1 \pm 0.5) \times 10^{24} \, \mathrm {yr}}\) for the \({2\nu }\) decay, where the uncertainty is the median absolute deviation (MAD) of the corresponding distribution.

Fig. 2 Distribution of \(90 \%\) C.I. marginalized upper limits on \(\mathrm {T_{1/2}} = \log 2 / \Gamma \) for \(0\nu \beta \beta \) decay (top) and \(2\nu \beta \beta \) decay (bottom) obtained from Toy MC simulations. We obtain a median sensitivity of \(S^{0\nu }_{1/2} = 5.6 \times 10^{24}\) yr and \(S^{2\nu }_{1/2} = 2.1 \times 10^{24}\) yr (black dashed line), compared to the 90% C.I. limit from this analysis (red solid line)

Fig. 3 Result of the unbinned fit plotted on binned data. Error bars are just a visual aid and correspond to the square root of the bin contents. We show the best fit curve (blue solid), its signal component (blue dashed) at the global mode of the posterior for \(0\nu \beta \beta \) (left) and \(2\nu \beta \beta \) (right), and the 90% C.I. marginalized limit on the decay rate (red solid)

5 Results

We show in Fig. 3 and Table 5 the results of the fit to unblinded data for both \(0\nu \beta \beta \) and \(2\nu \beta \beta \). Though the data are binned for graphical reasons, the analysis is unbinned. Including contributions from all sources of systematic uncertainty listed in Table 6, no significant signal is observed in either decay mode. The global modes of the joint posterior pdfs for the rate parameters are

$$\begin{aligned} {\hat{\mathrm{\Gamma }}}^{0\nu }_{\beta \beta } = 4.0_{-4.0}^{+3.0} \times 10^{-26} \mathrm {yr}^{-1} \end{aligned}$$
(17a)
$$\begin{aligned} {\hat{\mathrm{\Gamma }}}^{2\nu }_{\beta \beta } = 2.2_{-1.9}^{+1.7} \times 10^{-25} \, \mathrm {yr}^{-1} \end{aligned}$$
(17b)

where we quote the uncertainty as the marginalized \(68~\%\) smallest interval. We report the marginalized posterior pdfs and the \(90\%\) C.I. in Fig. 4. We include the marginalized posterior pdfs for the individual background parameters in Fig. 5 \((0\nu \beta \beta )\) and Fig. 6 \((2\nu \beta \beta )\). Their means agree with the corresponding results from blinded data within one standard deviation. We observe a slight negative background fluctuation (i.e. a limit stronger than expected) in the \(0\nu \beta \beta \) decay analysis and a positive one (i.e. a limit looser than expected) in the \(2\nu \beta \beta \) decay analysis with respect to the median \(90\%\) C.I. limit. The probability of setting an even stronger (looser) limit in the \(0\nu \beta \beta \) (\(2\nu \beta \beta \)) decay analysis is \(45.1\%\) (\(10.4\%\)). Null signal rates are included in the \(1\sigma \) (\(2\sigma \)) smallest C.I.Footnote 4 of the marginalized pdf for the \(\mathrm {\Gamma }^{0\nu }_{\beta \beta }\) (\(\mathrm {\Gamma }^{2\nu }_{\beta \beta }\)) rate parameter. The following Bayesian lower bounds on the corresponding half-life parameters are set:

$$\begin{aligned} \big ( \mathrm {T_{1/2}} \big )^{0\nu }_{0^+_2}&> 5.9 \times 10^{24} \, \mathrm {yr} \quad (\mathrm {90\% \, \, C.I.}) \end{aligned}$$
(18a)
$$\begin{aligned} \big ( \mathrm {T_{1/2}} \big )^{2\nu }_{0^+_2}&> 1.3 \times 10^{24} \, \mathrm {yr} \quad (\mathrm {90\% \, \, C.I.}) \end{aligned}$$
(18b)
Table 5 We report the mean and standard deviation of the marginalized posterior distributions for the decay rate and background parameters of each signature, derived from unblinded data. The S\(_{1/2}\) parameter indicates the median expected sensitivity for limit setting at \(90\%\) C.I. on the T\(_{1/2}\) parameter, together with the MAD of its distribution. The last row reports the marginalized \(90\%\) C.I. Bayesian lower limit on the decay half-life. All results come from the combined fit with systematics
Table 6 Systematic uncertainties. We report the effect of each source separately, and of their combination, on the marginalized \(90\%\) C.I. limit T\(_{1/2}^{90\%}\) on the \(0\nu \beta \beta \) and \(2\nu \beta \beta \) decay half-lives

The results reported in this article represent the most stringent limits on the DBD of \(^{130}\)Te to excited states, improving on the previous results for this process by a factor of \(\sim 5\).

5.1 Systematic uncertainties

The major sources of systematic uncertainty are: the signal efficiencies, the detector response function, the energy calibration, and the isotopic abundance of \(^{130}\)Te (see Table 6). Each systematic uncertainty can be introduced as a set of nuisance parameters in the fit with a specified prior distribution. Each set of nuisance parameters can be activated independently, with the priors listed in Table 6. We monitor the effect of activating each source of systematic uncertainty individually, by repeating the fit and comparing the 90% C.I. Bayesian limit on the half-life with that of the minimal model, in which all sources of systematics are described by constants rather than fit parameters. Finally, we repeat the fit activating all additional nuisance parameters at once.

We include, for each dataset, two separate parameters to model different sources of uncertainty in the cut efficiency evaluation, and replace the \(\varepsilon ^{cut}\) constant with the sum \(\varepsilon ^{cut\,(I)} + \varepsilon ^{cut \, (II)}\). Cut efficiency I parameterizes the uncertainty due to the finite size of the samples of pulser events and \(\gamma \) decays used to extract the cut efficiency. Its prior is Gaussian, with mean equal to \(\varepsilon ^{cut}\) and width equal to the corresponding uncertainty (see Table 2). The additional cut efficiency II is uniformly distributed, with zero mean. It models the systematic uncertainty due to the PSA efficiency, shifting the cut efficiency by at most \(0.7\%\).

The accidentals efficiency contributes to the systematic uncertainty only through the limited statistics of the \(^{40}\)K peak. We add one nuisance parameter per dataset with a Gaussian prior to model this effect. The containment efficiency is instead affected by the uncertainty in the simulation of Compton scattering events at low energy; the uncertainty due to the number of simulated signal events is negligible. We take the ratio of the Compton scattering attenuation coefficient to reference data [50] as a measure of the relative uncertainty on this efficiency term, and account for it by introducing, for each signature, a nuisance containment efficiency parameter with a Gaussian prior. The isotopic abundance of \(^{130}\)Te in natural tellurium is modeled with a single global nuisance parameter with a Gaussian prior, \(\eta = (34.167 \pm 0.002) \%\).

Both the detector response function shape and the energy scale bias are evaluated from data, as anticipated in Sect. 4.1. Each effect is separately parameterized with a 2nd-order polynomial as a function of energy, whose coefficients are evaluated with a fit to the 5–7 most visible peaks in each dataset. The uncertainties of and correlations among such parameters are themselves a source of systematic uncertainty, and are included as 2 independent sets of correlated parameters per dataset, with a multivariate normal prior distribution. In this way, the detector response function width and position are allowed to float within their uncertainties.

The uncertainty in modeling the detector response function leads to the dominant systematic effect on the limit, amounting to a shift below 1%. Sub-dominant effects come from the energy scale bias and the containment efficiency (Table 6).

Fig. 4 Marginalized decay rate posterior pdf for \(0\nu \beta \beta \) (top) and \(2\nu \beta \beta \) (bottom) from the combined fit with all systematics. We show the \(90\%\) C.I. in gray

Fig. 5 Marginalized posterior pdf for the background parameters of the \(0\nu \beta \beta \) model from the combined fit with all systematics. We show the \(68.3\%, 95.5\%, 99.7\%\) smallest C.I. in green, yellow and red respectively

Fig. 6 Marginalized posterior pdf for the background parameters of the \(2\nu \beta \beta \) model from the combined fit with all systematics. We show the \(68.3\%, 95.5\%, 99.7\%\) smallest C.I. in green, yellow and red respectively

6 Conclusions

We have presented the latest search for the DBD of \(^{130}\)Te to the first \(0^+\) excited state of \(^{130}\)Xe with CUORE, based on a 372.5 kg\(\cdot \)yr TeO\(_2\) exposure. We find no evidence for either \(0\nu \beta \beta \) or \(2\nu \beta \beta \) decay, and we place Bayesian lower bounds at \(90\%\) C.I. on the decay half-lives of \(\mathrm {(T_{1/2})^{0\nu }_{0^+_2} > 5.9 \times 10^{24} \, \mathrm {yr}}\) for the \(0\nu \beta \beta \) mode and \(\mathrm {(T_{1/2})^{2\nu }_{0^+_2} > 1.3 \times 10^{24} \, \mathrm {yr}}\) for the \(2\nu \beta \beta \) mode.

The median limit-setting sensitivity for the \(2\nu \beta \beta \) decay, \(2.1 \times 10^{24}\) yr, is starting to approach the \(7.2 \times 10^{24}\) yr lower bound of the QRPA-predicted half-life range for this decay mode. The CUORE experiment is steadily taking data at an average rate of \(\sim 50\) kg\(\cdot \)yr/month, and by the end of its data taking the collected exposure is expected to increase by an order of magnitude. Work is ongoing to improve the sensitivity by extending the analysis to not-fully-contained events, leveraging the information from the topology of higher-multiplicity coincident signal multiplets to further reduce the background, and improving the signal efficiency by lowering the threshold of the pulse shape discrimination algorithm.