Sterile Neutrino Fits to Short Baseline Data

Neutrino oscillation models involving extra mass eigenstates beyond the standard three (3+N) are fit to global short baseline experimental data. We find that 3+1 has a best fit of Delta m^2_41 = 1.75 eV^2 with a Delta chi^2 [null-min] (dof) of 52.34 (3). The 3+2 fit has a Delta chi^2 [null-min] (dof) of 56.99 (7). Bayesian credible intervals are shown for the first time for a 3+1 model. These are found to be in agreement with frequentist intervals. The results of these new fits favor a higher Delta m^2 value than previous studies, which may have an impact on future sterile neutrino searches such as the Fermilab SBN program.


Introduction
The well-established discoveries of neutrino mass and three-active-flavor mixing can be phenomenologically incorporated into the Standard Model [1], resulting in a model that we can call the "νSM". This model successfully predicts neutrino oscillations in many experiments. However,the masses and mixings must be incorporated in an ad hoc manner. This leads one to ask if there is more "new physics" in the neutrino sector that is yet to be discovered that can give us a clearer picture of the underlying theory.
A set of 2σ to 4σ anomalies have been observed in short baseline (SBL) oscillation experiments that may indicate new physics. SBL experiments have L/E ∼ 1 m/MeV, where L is the distance from the source to the detector and E is the neutrino energy. Anomalies are observed from the Liquid Scintillator 1 email: gabrielc@mit.edu Neutrino Detector (LSND) experiment [2], the Mini Booster Neutrino Experiment (MiniBooNE) [3,4], the collection of SBL reactor experiments (often called the "reactor Anomaly") [5,6], and the source calibration data from the gallium-based experiments, SAGE and GALLEX [7,8]. Any interpretation must also consider similar SBL experiments that have seen no anomalous oscillations (called "null experiments") [9,10,11,12,13,14,15,16,17,18].
Oscillations between active and light sterile neutrinos represent a possible explanation for the combination of anomalous and null SBL data sets. Sterile neutrinos are beyond-Standard Model, non-weakly-interacting additions to the neutrino family. Introducing these new particles extends the number of mass states and expands the mixing matrix [19] in the νSM. This allows oscillations with squared mass splittings, ∆m 2 , that are large compared to those in the νSM. Experimental anomalies suggest a mass scale ∼ 1 eV 2 . Models with one (3 + 1), two (3 + 2), and three (3 + 3) additional sterile neutrino states are generically called "3 + N " models.
This paper explores the viable parameter space for oscillation models involving sterile neutrinos. The most obvious signature of oscillation to sterile neutrinos is disappearance of an active flavor. Potential ν e → ν s signals have been observed in neutrino and antineutrino mode by the reactor and Galliumbased experiments. A ν µ → ν s at a compatible ∆m 2 is yet to be observed, and we will show that this places strong constraints on the phenomenology. If disappearance occurs, then the model also predicts appearance, ν µ → ν e at the same ∆m 2 value(s). This could be consistent with the LSND and MiniBooNE results, which are seen for both neutrinos and antineutrinos.
This global fit does not make use of the limits from cosmology. This is because reasonable mechanisms can be put forward within cosmology reduce or remove the constraint, as discussed in Ref. [20].

3 + N Fits to Short Baseline Data
The νSM model has three massive neutrinos leading to two distinct differences between the squared masses, ∆m 2 21 and ∆m 2 32 . The 3 × 3 lepton mixing matrix, called the Pontecorvo-Maki-Nakagawa-Sakata (PMNS) matrix, connects the mass eigenstates to the weak interaction eigenstates.
For vacuum oscillations in a 3 + N model, the probability for finding a neutrino in flavor state β after propagating a distance L and being produced as a flavor state α is given [21] by where E is the neutrino energy and ∆m 2 ji = m 2 j − m 2 i . Furthermore, the corresponding antineutrino oscillation probability can be obtained by replacing U → U † .

Incorporating Sterile Neutrinos into the Model
The incorporation of one additional neutrino mass state, in order to extend to a 3 + 1 model, introduces a third squared mass splitting. This also requires an extension of the PMNS matrix to a unitary 4 × 4 matrix: This introduces seven new matrix elements, four of which (U s1 , . . . , U s4 ) cannot be directly constrained by experiment due to the non-interacting nature of the fourth 'sterile' flavor state. The matrix is assumed to be unitary, and the magnitude of the new elements can be constrained by the current measurements of unitarity of the PMNS matrix [22]. The new degrees of freedom can be parameterized by introducing three new neutrino mixing angles θ i4 and two new CP violating phases. Eq. (1) still holds in describing oscillations, but now the indices i, j run up to 4. Although the 3 + 1 model has three independent squared mass splittings, data indicates that two are small compared to the third. The anomalies described in the introduction are all consistent with oscillations corresponding with a squared mass splitting on the order of 1 eV 2 . The two splittings associated with the νSM are are on the order of 10 −5 eV 2 and 10 −3 eV 2 . The effect of the two small splittings on an experiment designed to look for O(1 eV 2 ) scale oscillations will be negligible. Therefore, we use the short baseline (SBL) approximation, where we assume that the mass eigenstates that participate in the standard oscillations are degenerate (i.e. ∆m 2 21 = ∆m 2 32 = 0).
The oscillation probability formula for ν α → ν β in the 3 + 1 model then reduces to: With any particular selection of α and β this can be seen to be equivalent to a simple two neutrino model with a mixing amplitude of sin 2 2θ αβ = |4(δ αβ − U α4 U * β4 )U * α4 U β4 |. More generally, for a 3 + N model incorporating N sterile neutrinos, the complex phases of U must be taken into account. Let A transformation of ν →ν causes Φ → −Φ allowing a difference between neutrino and anti-neutrino oscillations. These are the CP -violating phases. The probability of oscillation for a 3 + N model can then be written as For N > 1 sterile neutrinos, the SBL experiments are sensitive to the mass hierarchy through the non-squared sine term. In the global fit, we assume that the degenerate mass states have the lightest mass, i.e. they follow a normal mass hierarchy.

Improved 3 + N Global Fitting Algorithm
For this analysis, we have rewritten our previous fitting software [19]. Along with converting from Fortran to C++, this package has been designed to make the addition of new data sets easier, as well as to allow the testing of models beyond the 3 + N presented in this article. Also and importantly, we have improved the method of searching the parameter space, which, in our previous fits, did not use a standard Markov chain Monte Carlo (MCMC) algorithm. The new algorithm for searching the parameter space is based on the affine invariant parallel tempering MCMC method used in the Emcee Fitting Package [23]. An MCMC efficiently samples the most likely regions of parameter space, whereas a comprehensive scan would be cost-prohibitive. Technical details of the new approach appear in the appendix to this paper.
The MCMC explores the parameter space by incremental movements governed by the specifics of the algorithm. At each step, a χ 2 value is calculated using the standard definition for normally distributed data: and a likelihood for Poisson distributed data [24]: where n is the number of bins, d the observed data, p( θ) the model prediction for parameters θ, and Σ is the covariance. These χ 2 values are saved along with their respective θ. The algorithm continues until a predetermined number of steps have been executed. From this list, the minimum χ 2 is found. The quantity is found for each saved χ 2 . These ∆χ 2 values are used to draw the confidence intervals in plots. All points that satisfy are drawn inside the interval with probability p. Where CDF −1 χ 2 is the inverse χ 2 distribution CDF and k is the number of degrees of freedom. Where there are multiple intervals, they are drawn on the plot in descending order of probability. The plot is effectively a marginalization via minimization. For a 2D plot, the number of degrees of freedom is thus k = 2.

3 + 1 Frequentist vs. Bayesian Results
In the frequentist treatment (Sec. 2.2), confidence intervals are drawn from the value of the ∆χ 2 statistic. For the intervals to be meaningful, the statistic must be correctly χ 2 distributed. This may not necessarily be true, especially in the case of neutrino oscillations where the model predictions use sinusoidal functions.
Feldman-Cousins [25] provides a technique for drawing meaningful confidence intervals in these conditions. However, the method is far too computationally expensive to be used in a global fit. Thus, the frequentist intervals in this paper assume that the χ 2 statistic is correctly distributed.
It would be advantageous to side-step the issue entirely by avoiding the use of a χ 2 statistic. This can be done using Bayesian credible intervals.
For experiments with normally distributed data the log-likelihood is defined using the normal distribution and experiments with Poisson distributed data we use, The density of the explored points in parameter space reflects the underlying posterior distribution π( θ). An estimate of this posterior is generated from the distribution of walkers with temperature β = 1. Typically a certain number of steps at the beginning of each walker chain contains information about the walkers starting position. As the ensemble begins to equilibrate, this information is lost. The estimate of the posterior should not be polluted by the starting values, so a certain number of steps from the beginning of the chain is typically ignored. These ignored steps are called the "burn sample." The α probability credible interval C(α) must satisfy While there are multiple definitions for C, the most useful when comparing best fits is the highest posterior density interval. Here, the interval is the (possibly disjoint) set of points whose posterior probability meets a threshold t: where t is constrained by Eq. 12. Intuitively this can be seen as an interval, which starting at the mode (i.e. the best fit point), grows to include an area whose integrated probability is exactly α and where all points inside the interval have higher probability density than all points outside the interval. In order to present the Bayesian credible intervals, we plot the highest posterior density interval for a probability α, by drawing the samples whose posterior is greater than a threshold value t. The value of t is chosen so that the number of samples meeting this criteria is a fraction α of the total number of samples [26].
In both the frequentist and Bayesian cases, the MCMC algorithm was run with a uniform prior on log 10 |U ai |, log 10 ∆m 2 4i and log 10 Φ. The positions of the walkers in parameter space was limited as follows: The matrix elements were required to lie within the space of unitary matrices and be larger than 10 −6 . The phases were required to be less than 2π. Large ∆m 2 parameters require much more computing time to evaluate, which slows down the entire ensemble. Therefore, the ∆m 2 parameters are required to be between 10 −4 eV 2 and 10 4 eV 2 for 3 + 1. In the case of additional sterile neutrinos, this was narrowed to 10 −3 eV 2 and 10 3 eV 2 . Proposed steps outside these listed boundaries are penalized with a log-likelihood of −∞.

The Experimental Data Sets
The full list of experiments included in this study is provided in Tab. 1. Most data sets used in our past analysis [19] have been incorporated into this analysis, however the atmospheric data set and a MiniBooNE disappearance data set that were used previously have been replaced by the Mini-BooNE/SciBooNE joint disappearance analyses, which are more restrictive. A second reason to drop the atmospheric constraint was that it assumed no oscillations of electron neutrinos in order to obtain the limit, and this is inconsistent with a global fit. Also, the description of the LSND experimental result was improved in the code to better represent the published result [2].
The MiniBooNE/SciBooNE data sets in neutrino mode [15] and antineutrino mode [16] were taken from the public release for each analysis. However, for the neutrino data set, an updated covariance matrix was used, along with a cosmic background data set omitted from the data release [27].
The best fit has moved to the ∼ 1.7 eV 2 region in the new result. This was caused by the addition of the SciBooNE/MiniBooNE disappearance analyses. The changes made to the datasets is discussed in Sec 2.4.
The credible intervals of the Bayesian fit are shown in the bottom right of  The χ 2 values, degrees of freedom (dof) and probabilities associated with the best-fit and null hypothesis in each scenario. P best is the χ 2 -probability at the best fit point and P null is the χ 2 -probability at null (no oscillation). Fig. 1. The 90% Bayesian credible intervals are compatible with the 90% frequentist confidence intervals shown in the bottom left plot. We note slightly worse agreement in the 99% credible and confidence intervals of these plots, where the ∆m 2 ≈ 5 interval is substantially smaller in the Bayesian result.

Updated Fits: 3 + 2
To relieve the tension in the 3 + 1 model, one can move to a 3 + 2 model. The frequentist global fit for this result is shown in Fig. 2. This model has 7 parameters, and so we select some examples to illustrate the allowed parameter space. Fig. 2, left, shows the space of the two mass splittings. The best fit is for the solution where both splittings are less than 1 eV 2 (see Tab. 3). However, one can see that in the region of ∆m 2 41 ∼ 1 eV 2 , there are multiple high ∆m 2 solutions that have roughly the same χ 2 value. Thus, while our new fit appears at first glance to be a dramatic change from Ref. [19], which found best fit values of 3 + 2 of ∆m 2 41 = 0.92 eV 2 and ∆m 2 51 = 17 eV 2 , in fact this is actually a small shift of χ 2 . The previous best fit from Ref. [19] remains within the allowed region. Fig. 2, right, shows the value of ∆m 2 51 as a function of the CP violating parameter. This shows that the CP violation parameter can shift over a wide range to accommodate many (∆m 2 41 , ∆m 2 51 ) pairs of solutions. Introducing the CP parameter does not greatly improve the fit, however. As can be seen from Tab. 2, the difference in ∆χ 2 null−min for 3 + 1 versus 3 + 2 models is about four, while four degrees of freedom were added.

Summary and discussion
Using the improved software package, we have presented two new results. First, in a global analysis of the SBL data, we find that a 3 + 1 model has a best fit of ∆m 2 41 = 1.75 eV 2 with a ∆χ 2 (dof) of 52.34 (3) with respect to the null hypothesis. Second, for the first time we have demonstrated that our fit results are stable if one uses a frequentist or a Bayesian approach.
The fact that our new fits favor a ∼ 2 eV 2 solution has interesting implications for the immediate future of sterile neutrino studies. MicroBooNE [30], which has just begun to take data, is located on the Booster Neutrino Beamline (BNB) with a peak ν µ energy of 700 MeV. The 170 t detector is located at 470 m from the BNB target. MicroBooNE is directly upstream of the 800 t MiniBooNE experiment, which is at 540 m from the BNB target. If the 2 eV 2 solution of a 3 + 1 model is correct, then MicroBooNE sits closer to oscillation maximum than MiniBooNE, thus predicting a higher signal in MicroBooNE than simple scaling for solid angle and tonnage assumes. On the other hand, the ICARUS T600 detector, planned for 600 m from the BNB target [30], may be poorly located to address this 2 eV 2 solution. However, the combination of the three SBN detectors [30] including SBND, MicroBooNE, and ICARUS should be able to cover the full range of interest for a 3+1 sterile neutrino signal, given sufficient statistics.

Appendix: Implementation of MCMC
The fitting algorithm used in this study is based on the affine invariant parallel tempering Markov chain Monte Carlo (MCMC) method used in the Emcee fitting package [23]. An MCMC moves randomly in the parameter space. Each movement is called a step. Before a new step is entered into the history of the Markov chain, it must first pass a probabilistic test. The acceptance probability is based on a Boltzmann distribution: where E is the energy of a position θ in parameter space. This energy is a function of the log-likelihood of the posterior π( θ) With suitable definitions, the log-likelihood can be related to the χ 2 by A set of N seed points are selected randomly in logarithmic parameter space according to a uniform distribution. Each seed is the beginning of an independent Markov chain called a 'walker'. Collections of these walkers are arranged in groups called ensembles.
The walkers are evolved in a step-wise fashion. At each step, the affine invariant movement algorithm is performed on each walker, followed by the parallel tempering swap. In traditional Metropolis-Hastings movement the new walker location is chosen based on a multi-variate normal distribution. The parameters of this distribution need to be chosen in advance. If the shape of the distribution does not resemble the underlying posterior then inefficient sampling will result. In comparison, the affine invariant method [31] only requires the affine scale a to be chosen in advance. The movement of the walkers is based on the current ensemble. Hence, any affine transformation of a normal distribution will be efficiently sampled.
For a given walker (i), the affine invariant movement randomly selects another walker (j) in the ensemble and attempts to move toward it. The proposed new set of parameters at step n + 1 is Where θ i (n) is the parameters of walker i at step n and z is a step distance which is randomly selected according to the distribution Here a is called the affine scale and is set to 2. The new set of parameters are then accepted according to the probability min 1, z k−1 e −E i ( θ i (proposed)) e −E i ( θ i (n)) , where k the number of parameters in the model. The affine invariant method has problems sampling multi-modal distributions. Parallel tempering is a well known MCMC method for sampling multi-modal posterior distributions [32]. Multiple ensembles of walkers are evolved in parallel. Each of these ensembles has its own temperate parameter T = 1/β. The energy function for the walker is then defined as This "flattens" the posterior distribution for ensembles with a large temperate parameter. Walkers in these ensembles have an easier time moving out of a local maximum and exploring the space for other potential maxima.
The information from these high temperature ensembles needs to be communicated back to the low temperature ensembles so that they can be sampled. This is achieved by occasionally swapping the positions of walkers between ensembles. On each step, a swap is performed with probability θ = 0.1. Random pairs (i, j) of walkers are selected, with walkers in different ensembles. The walkers then swap position with probability min 1, e −E j ( θ j (n)) .