Probability Densities of the effective neutrino masses $m_{\beta }$ and $m_{\beta \beta}$

We compute the probability densities of the effective neutrino masses $m_{\beta }$ and $m_{\beta \beta}$ using the Kernel Density Estimate (KDE) approach applied to a distribution of points in the $(m_{\min}, m_{\beta\beta })$ and $(m_{\beta }, m_{\beta\beta })$ planes, obtained using the available Probability Distribution Functions (PDFs) of the neutrino mixing and mass differences, with the additional constraints coming from cosmological data on the sum of the neutrino masses. We show that the reconstructed probability densities strongly depend on the assumed set of cosmological data: for $\sum_j m_j \leq 0.68\ @\ 95\% \ \mathrm{CL}$ a sensitive portion of the allowed values are already excluded by null results of experiments searching for $m_{\beta \beta}$ and $m_{\beta }$, whereas in the case $\sum_j m_j \leq 0.23\ @\ 95\% \ \mathrm{CL}$ the bulk of the probability densities are below the current bounds.


Introduction
Although the physics of neutrino oscillation is entering a precision era, with all mixing angles and absolute values of the mass differences measured at the level of some percent, there are still questions related to the nature of neutrinos that need to be answered. Among these, we are interested in whether neutrinos are Majorana or Dirac particles and in the absolute value of their masses. As it is well known, experiments on neutrinoless double beta decays (0νββ) consider the possibility that the reaction (A, Z) −→ (A, Z + 2) + e − + e − (1) really occurs; in the case of positive signal, we could conclude that the total lepton number is violated by two units, although the process behind the conversion of two down quarks into two up quarks would not be uniquely determined [1,2,3]. The 0νββ-decay amplitude has the form A(0νββ) = m ββ M(A, Z), where M(A, Z) is the nuclear matrix element of the decay in Eq. (1) that does not depend on the neutrino masses and mixing parameters, and m ββ is the effective mass which, in the case of three lepton families, is given by m ββ ≡ j m j U 2 ej = cos 2 θ 13 m 1 cos 2 θ 12 + m 2 sin 2 θ 12 e iα + m 3 sin 2 θ 13 e iβ .
In the previous relation, U ej are the elements of the Pontecorvo-Maki-Nakagawa-Sakata (PMNS) mixing matrix U PMNS that encodes the leptonic mixing angles θ ij , whereas the phases α and β are the so-called Majorana phases (one of which eventually absorbs the CP violating phase δ). As it is usually done, two of the three neutrino masses m j in Eq. (2) can be expressed in terms of the lightest one m min in a way that dependents on the supposed neutrino mass hierarchy; for Normal Ordering (NO) we have: m 1 = m min m 2 = m 2 min + ∆m 2 21 m 3 = m 2 min + ∆m 2 31 , whereas for the Inverted Ordering (IO) we set: so m ββ effectively depends on the seven independent parameters θ 12 , θ 13 , ∆m 2 21 , ∆m 2 31 , α, β and m min . The study of the electron spectrum near the end point in the nuclear reaction 3 H → 3 He+e+ν e allows, in presence of neutrino mixing, to get information on the other effective mass largely studied in the literature, m β , defined by Since absolute values of the PMNS matrix are taken, complex phases play no role and m β only depends on three independent observables and it is somehow correlated, although not in a simple form, with m ββ . It is customary to present such a correlation varying all mixing parameters inside their 1, 2 or 3σ range ([0, 2π] for the Majorana phases in any case) and computing the maximum and minimum allowed value. While this procedure certainly gives insights on the possible outcomes of an experimental search, no information whatsoever can be drawn on the probability distribution of the observable itself. So, inspired by the work of [4] and [5], we computed the distributions of m β and m ββ and the Credible Regions (CR) as obtained using the available PDFs of θ 12 , θ 13 , ∆m 2 21 and ∆m 2 31 , with the additional constraints coming from cosmological data on the sum of the neutrino masses (see also Refs. [6,7,8] and [9]). However, unlike the procedure followed in [4] and [5], we use the KDE approach to compute PDFs of the observables in the 2D planes (m min , m ββ ) and (m β , m ββ ). The use of such a procedure also allows us to save computation time which could become a critical aspect in this sort of simulations.
A short summary of the numerical procedure and the data set we used to get the PDFs from the available data is done in Sect. 2 whereas a short description of the KDE method and the obtained results is done in Sect. 3. Finally in Sect. 4 we compare the PDFs derived from several choices of the Kernel function.

Numerical procedure and datasets
The construction of the PDFs for m β and m ββ passes through the extraction of the observables p = {sin 2 θ 12 , sin 2 θ 13 , ∆m 2 21 , ∆m 2 3 } (with = 1 for NO and = 2 for IO) from which they depend; the sampling is based on the knowledge of the likelihoods L(p) which in turn are functions to the single ∆χ 2 (p): For the observables p (which are only midly correlated), this information is available online at the address http://www.nu-fit.org, where the ∆χ 2 for the November 2016 data, based on the procedure discussed in Ref. [10], are given. Notice that a Bayesian analysis on the 2014 data set is available in Ref. [11]; the authors found that the results generally agree (at the level of one standard deviation) with those of the frequentest method, with some differences involving the atmospheric angle θ 23 and the Dirac CP violating phase. However, θ 23 does not enter into the expressions of the effective masses and the information on the CP phase is hidden by the presence of the Majorana phases. We then decided to use the most recent data set. For the sake of completeness, we report in Tab. 1 the central values and 3σ errors for all the observables relevant in neutrino oscillation for both orderings; similar values are also obtained in Ref. [12]. In addition to the oscillation data, our estimate of the PDFs also takes into account the cosmological constraints on the sum of the neutrino masses j m j coming from the Planck experiment [13].
The Planck Collaboration provides several likelihoods based on different assumptions among which we decide to use the following ones: • a conservative estimate (set-1) based on the set of data given by PLANCK TT + lowP + Lensing, which has j m j ≤ 0.68 eV at 95% CL;  Table 1: Central values ± the 1σ errors and 3σ ranges for the neutrino mixing parameters as obtained in Ref. [10] (available at the website http://www.nu-fit.org). Note that in the last line = 1 for NO and = 2 for IO. The analysis prefers a global minimum for NO with respect to the local minimum of IO, • a more aggressive one (set-2) based on PLANCK TT + lowP + Lensing + Ext, which has j m j ≤ 0.23 eV at 95% CL, with a maximum of the likelihood for j m j ∼ 0.05 eV.
The acronyms used above refer to the data on the temperature power spectrum (PLANCK TT), to the Planck polarization data in the low-temperature (lowP), to the data on Cosmic Microwave Background lensing reconstruction (Lensing); with Ext the constraints from Baryon Acoustic Oscillations, Joint Lightcurve Analysis of supernovae and the Hubble constant are indicated. For comparison purposes, we show in Tab. 2 the upper limits at 95% CL on the sum of the neutrino masses for different datasets which also include the data from the temperature-polarization cross spectrum (TE) and those from the polarization power spectrum (EE).  Table 2: Upper bound at 95% confidence level on the sum of the neutrino masses (in eV) using the data of Ref. [13].
With these likelihoods at our disposal, we employed the following procedure to accept or reject a given extraction of the set of observables p and Majorana phases (notice that δ is not relevant because the Majorana phase β hides any information on the Dirac CP phase): • we first extract sin 2 θ 12 , sin 2 θ 13 , ∆m 2 21 and ∆m 2 3 according to (6); the Majorana phases α and β are extracted according to a flat distribution in the interval [0, 2π]; • we then extract the value of M = j m j using the Planck data obtained from Fig. 30 in Ref. [13]; for NO, if M ≤ ∆m 2 21 + ∆m 2 31 (or M ≤ −(∆m 2 21 + ∆m 2 32 ) + −∆m 2 32 for IO), we reject such an M and extract a new value for the sum of the neutrino masses; • once the value of j m j is accepted, we compute the lightest neutrino mass m min using the relations Thus Eqs. (2) and (5) are used to get the numerical values of m β and m ββ . We generate O(10 6 ) realizations that satisfy the constraints discussed above. This order of magnitude is necessary to guarantee a 5σ coverage for the input parameters, as discussed in Ref. [4].

PDF analysis
The procedure outlined above produces two-dimensional histograms in the planes (m min , m ββ ) and (m β , m ββ ). In order to compute from them the PDF and CRs, we used the Kernel Density Estimate (KDE) approach [14]. Suppose we have a d-dimensional vector x of observables of which we want to know the PDF, f (x), and suppose also that we have N different realizations of the same observables {t j } N j=1 obtained according to the procedure described above; thus f is estimated from where h k is the bandwidth of the k-th component of the vector x, whose estimate according to the Scott's rule of thumb [15] is given byĥ σ k being the standard deviation of the k-th observable x k . The Scott's rule reduces the asymptotic expected value of the integrated square errors between the actual distribution f and the estimatedf . The positive function K is called kernel and must satisfy the normalization condition A simple but equally suited kernel is the Gaussian kernel, defined as: that we estimate using the same algorithm of Ref. [16] 1 , based on the modified SciPy function gaussian kde described in Ref. [17].
The results for the PDFs as a function of log 10 m min (m β ) and log 10 m ββ at the 68%, 95% and 99% CRs obtained with the analysis performed using set-1 for the sum of the neutrino masses are shown in Fig. 1 in the (m min , m ββ ) plane and in Fig. 2 in the (m β , m ββ ) plane. The analogous results for set-2 are shown in Figs. 3 and 4. In all planes, the excluded region for m ββ is the area above the horizontal magenta dashed line, around m ββ ≥ 0.19 eV [18] obtained using the 90% CL limit on the half-life of 76 Ge, T 0ν 1/2 ( 76 Ge) > 5.2 × 10 25 years, in the preliminary analysis of GERDA phase II [19]. A recent result using the 136 Xe, T 0ν 1/2 ( 136 Xe) > 1.07 × 10 26 years at 90% CL obtained by the KamLAND-ZEN experiment [20], gives the lower bound indicated with green dashed lines, which excludes the region m ββ ≥ 0.083 eV [18]. The bounds we quote for m ββ are obtained according to Ref. [21], where the Authors used the results of Ref. [22] for the phase-space factor and those of Ref. [23] for the nuclear matrix elements. In our analysis we fixed the axial coupling constant of the nucleon g A = 1.269. We also outline that the large uncertainties associated to the nuclear matrix elements M(A, Z) can modify the prediction for decay amplitude A(0νββ); however, the impact of such effects are beyond the scope of this paper and will not be analyzed in the following. For the other observable, m β , the red vertical dashed line indicates the expected sensitivity of the KATRIN experiment (0.2 eV at 90% CL [24], see https://www.katrin.kit.edu/128.php) and the grey vertical dashed line the expected sensitivity of the Project 8 experiment (4 × 10 −2 eV at 90% CL [25]) which has been especially designed to probe the whole IO parameter space. Notice that the most stringent upper limit on m β has been obtained by the Mainz and Troitzk experiments, m β ≤ 2.05 eV at 95% CL [26,27]. To better compare the different cosmological datasets we use the same scale for the PDF densities (which are normalized to one). In the (m min , m ββ ) plane the maximum is fixed to be 10, while in the (m β , m ββ ) plane it is 30.     A close inspection at Fig. 1 shows that a large portion of the 68% CR for m ββ , the one corresponding to large m min , is already excluded by the KamLAND-ZEN data, for both hierarchies. In practice, this is the   consequence of having a non-negligible probability that j m j 0.5 eV, therefore the value of m min can be sufficiently large to approach O(0.1). On the other hand, set-2 relaxes this constraint and the probability density is centered around smaller m min (and consequently smaller m ββ ). For both cases, the cosmological bounds on the sum of neutrino masses implies that for NO and IO the low mass region for m ββ is strongly disfavoured.
The interesting features of Figs. 2 and 4 is that, for both assumptions on the sum of the neutrino masses, the Project 8 experiment would be able to probe almost the whole allowed regions for m β at 99% level whereas KATRIN shows only a modest ability to probe the largest possible values of m β , around O(10 −2 − 10 −1 ) eV.
Instead of discussing the effects of the cosmological bounds on the effective masses, one can also adopt an opposite point of view, asking what would be the effect on j m j of a possible measure of m ββ at the new generation of experiments, see Refs. [28,29,21]. As an example, we can explore the situation that a signal for the 0νββ-decay is observed at the (near future) CUORE or (next-to-near future) nEXO experiments.
Following the discussion of Ref. [21] we assume an optimistic scenario where a signal is in the expected 90% experimental sensitivity region, that is m ββ = 0.073 ± 0.008 eV (assuming g A = 1.269) for CUORE [30]. Similar values can also be achieved by GERDA Phase-II [31], MAJORANA-D [32] and NEXT [33] experiments, so our discussion applies equally well to a large number of possible future experiments. In the case of nEXO experiment [34], we set m ββ = 0.011 ± 0.001 eV, which is below the IO region.
The results of our finding are shown in Fig. 5 where the frequency of the sum of light neutrino masses (histograms normalized to 1), after the constraints coming from m ββ , is displayed. With black dashed lines we also show the Planck PDFs for set-1 (upper panels) and set-2 (lower panels). In the first column of the plot, which refers to the case m ββ = 0.073 ± 0.008 eV, we clearly see that there exists a cutoff in the distribution in the low mass region due to the fact that m ββ cannot be arbitrarily small, with maxima around j m j ∼ O(0.2 − 0.3) eV for both set-1 and set-2 and for both hierarchies (NO in blue and IO in red). On the other hand, in the high mass region the distributions essentially follow the shape of the Planck priors since the assumed values of m ββ do not impose strong constraints on m min .
If we assume a positive signal at the nEXO experiment m ββ = 0.011 ± 0.001 eV, second column of Fig.  5, we see that we cannot distinguish among different Planck datasets since the bound on the 0νββ-decay effective mass constraints j m j to be of O(0.1) eV.

Discussion and conclusions
At first sight, the results described above seem to depend on the choice of the kernel used to estimate the PDFs. However, we have checked that adopting different functions K the CL regions are not altered in a significant manner. We test different kernels, provided by the scikit-learn package [35]: The check is performed adopting the k-fold cross-validation approach, proposed in Refs. [36,37]; in few words, the sample of extracted points is split into k smaller sets; of them, k −1 sets are used to estimate f (x) according to a given kernel and the resulting model is then validated on the remaining part of the dataset. In Tab. 3 we show our result for the cross-validation analysis with m min and m ββ as independent variables (similar results can be achieved for m β and m ββ ): we analyze ten subsets with N = {1000, 5000, 10000, 50000} points, then we average the results. In order to investigate possible overfitting effects, each subset has been divided into two parts: a train (N train ≈ 0.6N ) and a test (N test = N − N train ) set. We estimate the best bandwidthĥ using twenty k-folds in the train dataset. The error E set between the actual distribution and the kernel estimate is defined as: where set can be train or test-set. The actual distribution can be obtained from the two dimensional density histogram. We assume for the histogram N  (11) is necessary to compare datasets with different dimensions. In Fig. 6 we show our results off (x) for the set-1 prior on the sum of neutrino masses and for all kernels introduced above (green shaded area). The PDFs are superimposed on a subset of 5 × 10 3 points. As we can see, the Gaussian kernel as well as the exponential one correctly reproduce the testing dataset for both orderings (for these two cases, the green areas are concentrated below the points and they do not appear in the graphs). For the other kernels, the agreement does not appear to be as good as for the previous ones, since the PDFs extend over regions outside the subset of points. In particular, in Tab. 3 we observe that the errors E set of the Gaussian and the exponential kernels are roughly one half those of the other kernels.
The cross-validation procedure is also useful to compute the best bandwidthĥ that minimizes the residual error between the predictions and the actual values of the sample points. Our findings are compatibles with the Scott's rule defined in (8), see Tab. 4 for a summary of the bandwidths computed using the same data of the cross-validation analysis. Notice that in the cross-validation a single bandwidth is estimated for each kernel. For the set-2 our conclusions remain unaltered: the Gaussian and the exponential kernels reproduce the training dataset with a good accuracy.
In conclusions, we have shown that the KDE method is an efficient tool to evaluate the PDFs of interesting physical observables. We have concentrated our efforts on two observables related to neutrino physics, namely the effective neutrino masses m ββ and m β which will help to reveal the true nature of neutrinos and the values of their absolute masses. For them, we have computed the Credible Regions using the available PDFs on the mixing angles and mass differences, with the additional constraints coming from cosmological data on the sum of the neutrino masses. We found that the reconstructed probability densities strongly depend on the assumed set of cosmological data and, in particular, for j m j ≤ 0.23 eV at 95% CL the bulk of the probability densities are below the current bounds on the analyzed observables. This conclusion remains qualitatively unaffected if one uses a different choice of the kernel function.