Combining neutrino oscillation experiments with the Feldman–Cousins method

In this paper we describe how two or more experimental results can be combined within the procedure of Feldman and Cousins, to provide combined confidence limits on the physical parameters of interest. We demonstrate the technique by combining the recent electron neutrino appearance results from T2K and MINOS. Our best fit point is sin²2θ₁₃ = 0.08 (0.11) and δ = 1.1 (2.0)π; in addition we exclude sin²2θ₁₃ = 0 at 2.7σ (2.8σ) for the normal (inverted) neutrino mass hierarchy.


Introduction
In order to obtain global constraints on physical parameters, it is often necessary to combine the results of two or more experiments with sensitivities to the same parameters. Such combinations can be performed with different levels of sophistication, depending on what information is available about the original measurements.
In the absence of detailed information, relatively crude combinations, e.g. by simply summing log-likelihood values, can be useful, but these results will always be subject to caveats.
In this work, we describe a method to combine results which requires somewhat detailed experimental information, but which focuses on obtaining correct coverage of the parameter space under study, by producing a combined likelihood curve from the joint data set of the experiments and using the Feldman-Cousins method [1] to define acceptance regions for each point in the parameter space. We will describe the inputs that would be required from an experiment to enable its inclusion in such an analysis in § 2.1. We also present, as an example of the technique, the combination of the recent electron neutrino appearance results from the long baseline experiments MINOS and T2K. This example is particularly pertinent, since the parameter θ₁₃ is known to be near a physical bound; the Feldman-Cousins technique was developed to deal with such cases, since conventional methods for determining acceptance regions can produce incorrect coverage or null contours. A comparison will also be made between our method and one where fixed log-likelihood differences are used to produce the contours.

The example: Constraints on θ₁₃ from MINOS and T2K
Neutrino oscillations, a now well-established phenomenon [6][7][8][9][10][11], can be parametrised by the PMNS mixing matrix [12,13], assuming a minimal 3-flavour mixing model. All of the mixing angles in this matrix have now been measured to be non-zero, including the angle θ₁₃ [4,5], which governs muon to electron neutrino oscillations in the atmospheric (L/E) regime. The CP-violation parameter δ is as yet unknown. In the appearance channel, T2K has found an excess of electron neutrino events in a muon neutrino beam, suggesting that θ₁₃ is non-zero, with a significance of 2.5σ [2]. MINOS has also found an excess of events, excluding θ₁₃ = 0 at 89% CL [3]. We will combine the results of these two experiments to get improved constraints on θ₁₃ from electron neutrino appearance measurements, as well as on the CP-violation parameter δ. In the disappearance channel, θ₁₃ has recently been measured to be non-zero by two different reactor experiments, Daya Bay [4] and RENO [5].
MINOS and T2K are both long baseline accelerator neutrino oscillation experiments, in which a predominantly ν_μ beam is created by colliding protons with a target, then magnetically focusing the resultant charged mesons and allowing them to decay. The neutrino beam thus produced is sampled at source by a near detector, and several hundred kilometres away by a far detector. Details of the individual experiments' setups are described elsewhere [14,15]. Both experiments are at an atmospheric L/E, but their different energies and baselines (MINOS is at 735 km and has a beam peaked at 3 GeV, whereas T2K is at 295 km with a beam peak of 0.6 GeV), along with different detector technologies and analysis techniques, lead to different systematic errors, as well as different sensitivities to the oscillation parameters. Both experiments constrain θ₁₃ using the difference between the number of electron-neutrino-like events observed in their far detectors and the number predicted. Both analyses are performed with the Feldman-Cousins method.

Method to Combine Fits
In order to combine the data from two or more experiments, the binned data from all experiments are considered together, as a single larger experiment. The Feldman-Cousins technique is then used to produce confidence contours based on the combined data.

Inputs to the analysis
In order to perform our analysis, the data from each experiment are required, along with Monte Carlo expectation values. Specifically, these consist of:
(i) The number of data events in each analysis bin for each experiment.
(ii) The expected number of events in each analysis bin, as a function of the parameters θ to be fitted over, in our example (sin²2θ₁₃, δ). The number of events varies smoothly with the oscillation parameters, so a set of values on a grid, which can be numerically interpolated for intermediate points, is suitable for this purpose.
(iii) The correlated bin-by-bin systematic errors for each experiment, encoded in covariance matrices, as a function of the parameters θ. We therefore require that the calculation of a covariance matrix from the underlying sources of systematic uncertainty (cross-section errors, beam-line uncertainties etc.) has been performed by each experiment. We assume that both experimental and theoretical uncertainties are included in the matrix.
Note that although we assume that the bin-to-bin correlations within each experiment will be provided, in general there will also be correlations between experiments, which may be important for the final result. This detail is discussed in § 2.6. It should be noted further that, in order to be combined consistently, the original analyses must make essentially the same physics assumptions; for example, they must assume the same values for the mixing parameters other than those in the fit.
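As a concrete illustration, the inputs listed above might be packaged per experiment as in the following Python sketch. The container name, grid shapes and numbers are invented for illustration; they are not taken from either experiment.

```python
from dataclasses import dataclass
import numpy as np
from scipy.interpolate import RegularGridInterpolator

@dataclass
class ExperimentInputs:
    # hypothetical container for inputs (i)-(iii)
    n_obs: np.ndarray        # (n_bins,) observed events per analysis bin
    s2_grid: np.ndarray      # grid of sin^2 2theta_13 values
    delta_grid: np.ndarray   # grid of delta values
    n_exp_grid: np.ndarray   # (len(s2), len(delta), n_bins) expected events
    cov_grid: np.ndarray     # (len(s2), len(delta), n_bins, n_bins) covariances

    def expectation(self, s2, delta):
        """Interpolate the expected bin counts at an intermediate (s2, delta)."""
        interp = RegularGridInterpolator((self.s2_grid, self.delta_grid),
                                         self.n_exp_grid)
        return interp([[s2, delta]])[0]

# toy example: one experiment with 2 bins on a 3x3 parameter grid
s2 = np.array([0.0, 0.1, 0.2])
d = np.array([0.0, np.pi, 2 * np.pi])
n_exp = np.fromfunction(lambda i, j, k: 5 + 10 * i + k, (3, 3, 2))
exp1 = ExperimentInputs(np.array([7, 6]), s2, d, n_exp,
                        np.zeros((3, 3, 2, 2)))
print(exp1.expectation(0.05, np.pi))  # halfway between the i=0 and i=1 slices
```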

The Feldman-Cousins technique
The Feldman-Cousins technique is a method for generating acceptance regions within the classical framework for interval calculation [17]. Its distinctive feature is that when choosing which data values n to include in the acceptance region for a given θ, the candidates are ranked by the relative likelihood of observing n at θ, with respect to the likelihood of observing n at the best fit point for that data. The use of relative rather than absolute likelihoods means that data values which are unlikely for all points in the parameter space will still be included in the acceptance region for some θ, avoiding the problem of null contours for some data values. The method also automatically produces a smooth transition between one- and two-sided intervals depending on the observed data, with correct coverage throughout ‡.
The key procedural step in the technique is the generation of a large number of "toy" Monte Carlo experiments at many positions in the parameter space, in order to identify the relative likelihood limit for each point which will give the correct coverage. For our example, 10⁴ toy experiments were generated at each value of θ considered. For each toy experiment result n, a fit is performed for θ_best using a likelihood function ln L. The difference, ∆(ln L), in the log likelihood between the best fit point θ_best and the value θ actually used to generate the toy experiment is calculated.
Using the ∆(ln L) from all toy experiments, a value ∆(ln L)_crit is calculated, such that some fixed proportion, say 90%, of toy experiments satisfy the condition

∆(ln L) ≤ ∆(ln L)_crit .    (1)

The condition (1) defines an acceptance region with 90% probability for the value of θ in question. Repeating the procedure for all values of θ, a surface ∆(ln L)_crit(θ) is calculated. A fit is then made to the real data using our likelihood function, obtaining a best fit point θ_best and a corresponding log-likelihood value (ln L)_best. A log-likelihood surface (ln L)(θ) is also calculated on a grid in θ. We then draw our confidence interval by applying the condition (1), with ∆(ln L) = (ln L)_best − (ln L)(θ), at each point in θ-space to decide whether the point should be included in the contour.
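The critical-value calculation can be sketched for the simplest possible case: a single Poisson bin with no systematics. The function name, the grids and the numbers below are illustrative assumptions, not the analysis code.

```python
import numpy as np

rng = np.random.default_rng(1)

def ln_like(n, mu):
    # Poisson log-likelihood up to a constant; the ln n! term drops out
    # of likelihood differences.
    return n * np.log(np.maximum(mu, 1e-10)) - mu

def delta_lnl_crit(mu_true, mu_grid, n_toys=10_000, cl=0.90):
    """Critical Delta(lnL) at a single parameter point mu_true."""
    n = rng.poisson(mu_true, size=n_toys)              # toy data sets
    lnl_true = ln_like(n, mu_true)                     # lnL at the generating point
    # lnL at the best fit, found by scanning the parameter grid per toy
    lnl_best = ln_like(n[:, None], mu_grid[None, :]).max(axis=1)
    d = lnl_best - lnl_true                            # Delta(lnL) per toy
    return np.quantile(d, cl)                          # cl of toys fall below this

mu_grid = np.linspace(0.0, 20.0, 201)
crit = delta_lnl_crit(5.0, mu_grid)
print(crit)
```

Repeating this at every grid point yields the surface ∆(ln L)_crit(θ); note that near a physical boundary the resulting critical values differ from the fixed chi-squared ones, which is the point of the method.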

Toy experiment generation
The toy Monte Carlo (MC) is required to give a number of events for each analysis bin, n_i^obs, based on the expected number of events n_i^exp for a given value of θ, allowing for fluctuations due to systematic § and statistical uncertainties. The expected number of events used is the total (i.e. signal plus background) expectation, so that both signal and background counts fluctuate between our toy experiments. The main complication in the MC is to ensure that correlations between the bin values are taken into account.
Expressing the systematic uncertainties as absolute shifts ("tweaks") x_i in the expected number of events in each bin, we can write

n_i^exp → n_i^exp(θ) + x_i .    (2)

Our method makes the common assumption that the systematic errors follow a multivariate normal distribution; that is, their joint pdf f_syst(x) follows

f_syst(x) ∝ exp(−½ xᵀ V⁻¹ x),    (3)

where V is the covariance matrix for the x_i. Since V is a symmetric, positive-definite matrix, we can use the method of Cholesky decomposition [16] to find an upper-triangular matrix L such that

V = Lᵀ L .    (4)

‡ Where the number of events is very small, as with the T2K data, Feldman-Cousins will give some over-coverage due to the discrete likelihood distribution generated from toy Monte Carlo events. This effect will be negligible for the combination example presented here.
§ Note that this method of treating systematics is Bayesian. In a frequentist prescription, systematics would be included as extra dimensions in θ; however this is not computationally feasible when using the Feldman-Cousins method.
We can then use the matrix Lᵀ as a transformation to enable us to generate the vector x from another vector of random variables y, such that

x = Lᵀ y .    (5)

One can show by trivial matrix algebra that we can rewrite the factor in the exponential in (3) as

xᵀ V⁻¹ x = yᵀ y,    (6)

and so we deduce that if we generate the y_i independently on a normal distribution with unit standard deviation, then the vector x generated using (5) will have the desired covariance properties. From now on, unless otherwise stated, we assume that n_i^exp has been modified for systematics using this prescription, so that it is now a function n_i^exp(θ, y) of both θ and y. We will also use f_syst(y) to refer to the probability density function (pdf) of the random vector y.
Having taken account of systematics, a number of events for each bin can be generated using Poisson statistics, i.e. on a pdf

f(n_i^obs; θ, y) = [n_i^exp(θ, y)]^{n_i^obs} e^{−n_i^exp(θ, y)} / n_i^obs! ,    (7)

where the y_i must be generated separately for each toy experiment.
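The generation scheme above can be sketched in a few lines of numpy; the three-bin expectation and covariance matrix here are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

n_exp = np.array([20.0, 35.0, 15.0])     # expected events per bin at this theta
V = np.array([[4.0, 1.0, 0.5],           # systematic covariance matrix
              [1.0, 9.0, 1.5],
              [0.5, 1.5, 2.0]])

# numpy returns a lower-triangular C with V = C C^T; its transpose plays the
# role of the upper-triangular L in the text, since then V = L^T L and C = L^T.
C = np.linalg.cholesky(V)

def make_toy(n_exp, C, rng):
    y = rng.standard_normal(len(n_exp))  # independent N(0,1) variables
    x = C @ y                            # correlated tweaks with cov(x) = V
    mu = np.clip(n_exp + x, 1e-5, None)  # keep expectations positive
    return rng.poisson(mu)               # statistical fluctuation

toys = np.array([make_toy(n_exp, C, rng) for _ in range(20_000)])
print(toys.mean(axis=0))                 # close to n_exp
print(np.cov(toys.T))                    # close to V + diag(n_exp)
```

The sample covariance of the toys approaches V plus the diagonal Poisson variance, confirming that the Cholesky transformation induces the intended bin-to-bin correlations.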

The likelihood function ln L
When performing our fits, the value of the systematic "tweak" parameters is allowed to vary, so our likelihood function must include penalty terms to account for the finite uncertainty in these parameters. We define a likelihood function including both systematic and statistical terms, as a function of θ and y, using the same pdfs we used to generate the toy experiments:

L(n^obs; y, θ) = exp(−½ yᵀ y) ∏_i [n_i^exp(θ, y)]^{n_i^obs} e^{−n_i^exp(θ, y)} ,    (8)

where we have dropped factors independent of (θ, y). As is standard, minimisation is actually performed on the logarithm of the likelihood function, since this is a simpler process and is equivalent because ln L is a monotonic function of L. This function is given by

ln L = −½ Σ_i y_i² + Σ_i [n_i^obs ln n_i^exp(θ, y) − n_i^exp(θ, y)] .    (9)

In the case that we do not wish to fit for systematics (e.g. for direct comparison with other analyses), we use the simpler function

ln L = Σ_i [n_i^obs ln n_i^exp(θ) − n_i^exp(θ)] ,    (10)

where the n_i^exp are no longer functions of y.
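The statistical-plus-penalty structure of this likelihood can be sketched as follows; the model, its parameters and the diagonal Cholesky factor are all invented for illustration, and constant ln n! terms are dropped as in the text.

```python
import numpy as np

def neg2_lnl(n_obs, n_exp_fn, theta, y):
    """-2 lnL: Poisson statistical terms plus the Gaussian penalty y^T y
    on the systematic tweak parameters (constant ln n! terms dropped)."""
    mu = np.maximum(n_exp_fn(theta, y), 1e-5)
    return -2.0 * np.sum(n_obs * np.log(mu) - mu) + np.sum(y**2)

# invented model: linear signal plus background, with tweaks applied
# through a (diagonal, for simplicity) Cholesky factor
signal = np.array([4.0, 2.0, 1.0])
background = np.array([10.0, 8.0, 5.0])
C = np.diag([1.0, 1.5, 0.8])
model = lambda theta, y: theta * signal + background + C @ y

n_obs = model(0.5, np.zeros(3))          # data matching theta = 0.5, y = 0
print(neg2_lnl(n_obs, model, 0.5, np.zeros(3)) <
      neg2_lnl(n_obs, model, 0.5, np.ones(3)))   # prints True
```

When the data match the untweaked expectation, y = 0 minimises both the statistical term and the penalty, so any non-zero tweak is disfavoured.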

Performing the fit
We perform the minimisation of (−2 ln L) using the MINUIT minimiser via the interface provided by the ROOT framework [18]. The derivatives ∂(ln L)/∂y_i are calculated analytically to improve performance. The fit requires as inputs the vector n^exp, and the Cholesky decomposition L of its covariance matrix V, at every point in θ space. These inputs are generated on a grid in θ in advance and then picked out by interpolation.
The fit is split into two nested components: a "top-level fit" over the parameters θ, and another fit performed over y which is called by the top-level fit to find the best (ln L) value for a given θ. sin²2θ₁₃ is constrained to lie in the physical region [0, 1]. The y are allowed to float freely, but all n_i^exp are constrained to remain above zero (in practice, above 10⁻⁵) regardless of the y.
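The nested structure might look like the following sketch, with scipy standing in for the MINUIT/ROOT machinery, a one-parameter θ, and invented inputs throughout.

```python
import numpy as np
from scipy.optimize import minimize

signal = np.array([6.0, 3.0])
background = np.array([10.0, 8.0])
C = np.diag([1.0, 1.5])                  # Cholesky factor of a diagonal V

def n_exp(theta, y):
    # expected counts with systematic tweaks, kept above zero
    return np.maximum(theta * signal + background + C @ y, 1e-5)

def neg2_lnl(y, theta, n_obs):
    mu = n_exp(theta, y)
    return -2.0 * np.sum(n_obs * np.log(mu) - mu) + np.sum(y**2)

def profile(theta, n_obs):
    """Inner fit: minimise over the tweak parameters y at fixed theta."""
    res = minimize(neg2_lnl, x0=np.zeros(2), args=(theta, n_obs))
    return res.fun

n_obs = np.array([14.0, 10.0])
thetas = np.linspace(0.0, 1.0, 21)       # outer scan, bounded to [0, 1]
curve = [profile(t, n_obs) for t in thetas]
best = thetas[int(np.argmin(curve))]
print(best)
```

Scanning the outer parameter on a bounded grid enforces the physical-region constraint directly, while the inner minimiser floats the y freely.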

Correlated systematic errors
When combining experiments with some common systematic error sources, it is necessary to calculate the correlation coefficients between the bins of the two experiments. In order to do this, we must identify the common systematic uncertainties and obtain from each experiment the contributions of these error sources to the total errors on each bin.
Once this information is available, the cross-terms in the covariance matrix can be calculated. Assuming Gaussian errors, we express the number of events in each bin i as

n_i = n_i^exp + σ_i^uncorr y_i + Σ_α c_iα z_α ,    (11)

where the y_i and z_α are independent normally distributed random variables with (µ = 0, σ = 1), α indexes the correlated error sources, and the c_iα give the contribution of the error source α to the error on bin i. σ_i^uncorr is the uncorrelated component of the systematic error on bin i. From (11), it is easy to show that

V_ij = cov(n_i, n_j) = δ_ij (σ_i^uncorr)² + Σ_α c_iα c_jα .    (12)
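A short numpy check of the construction in (11): with one invented shared error source across three bins (two of which might belong to one experiment and one to the other), the sample covariance of the generated shifts reproduces the diagonal-plus-outer-product form of the covariance matrix.

```python
import numpy as np

sigma_uncorr = np.array([1.0, 2.0, 1.5])   # uncorrelated error per bin
c = np.array([[0.5],                       # c[i, alpha]: effect of shared
              [0.8],                       # source alpha on bin i
              [0.4]])

# covariance matrix: diagonal uncorrelated part plus shared-source cross-terms
V = np.diag(sigma_uncorr**2) + c @ c.T

# Monte Carlo cross-check of (11): generate bin shifts from independent
# unit normals and confirm their sample covariance reproduces V
rng = np.random.default_rng(0)
N = 200_000
y = rng.standard_normal((N, 3))            # per-bin uncorrelated throws
z = rng.standard_normal((N, 1))            # shared-source throws
shifts = sigma_uncorr * y + z @ c.T
print(np.cov(shifts.T))                    # approaches V as N grows
```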

Our Combination Example
In our example we combine the electron neutrino appearance results of two long baseline experiments: MINOS [3] and T2K [2]. The inputs we take from each experiment are the expected number of electron neutrino events (as a function of θ₁₃, δ) in each bin, along with the covariance matrices of the systematic errors between the bins in each experiment. The T2K result consists of a single bin, and the MINOS result of fifteen bins (five of energy times three of the Library Event Matching particle identification parameter).

Validation
To validate our method, we have reproduced the results of both the T2K and MINOS analyses individually. There are some slight differences in the ways in which the two experiments perform their analyses, which must be considered when validating our method, and when making a combination of their results. One difference is that MINOS minimise over the systematic errors in their likelihood function, whereas T2K do not. Since in general it may be advantageous to minimise over the systematic errors, we do minimise over systematics in our example fit. Another difference is that T2K show their results for sin²2θ₁₃ at a fixed value of 2 sin²θ₂₃ = 1, whereas MINOS show the combination 2 sin²θ₂₃ sin²2θ₁₃, by throwing θ₂₃ within its errors and then choosing θ₁₃ to keep this quantity fixed in each toy experiment. The systematic errors that MINOS have provided us with, however, do not include any uncertainty in the oscillation parameters.

Figure 1. Reproduction of the T2K results: we find good agreement with the published contours [2]. In addition, we exclude sin²2θ₁₃ = 0 at 2.5σ.
The validation results are shown in Figures 1 and 2, for T2K and MINOS respectively. In both cases, we find good agreement with the published contours.

Possibly Correlated Systematic Errors
In order to combine the errors from both experiments into a single covariance matrix, we need to consider the systematic errors that may be correlated between them. In the case of our example, most of the sources of error cited in [2,3] can be assumed to be uncorrelated between experiments, due to the different experimental set-ups and different neutrino energies involved. However, cross-section uncertainties between the single T2K bin (energies up to 1.25 GeV) and the MINOS low-energy (1-2 GeV) bins are potentially correlated.
To assess the necessity of evaluating this correlation, we performed a worst-case estimate of its effect, by assuming complete correlation between the T2K cross-section error, and the total MINOS cross-section plus flux error [19]. These numbers were chosen as they will overestimate the effect and are available from the cited publications. Running our analysis with and without these worst-case correlations included, we found negligible difference in the results. We therefore conclude that we can neglect correlated errors for the MINOS-T2K combination at this point, though this may not be true for later data sets, if cross-section errors assume a greater relative importance. It should be noted for the future that whilst one reactor experiment could be combined with MINOS and T2K without needing to account for correlations (very different energies, flux, baseline etc.), if multiple reactor experiments were to be included some effort would be needed to calculate the correlations between their errors. The same would be true if another long-baseline neutrino experiment with similar peak neutrino energy as either MINOS or T2K were to be included.

Results
As previously mentioned, we take the profile likelihood, minimising over systematic errors, in our final fit. We also minimise over δ when calculating the minimum value of our likelihood function, since the combination of the two experiments gives slight sensitivity to CP violation. The normal and inverted neutrino mass hierarchies are treated separately. We find allowed regions of 0.02 (0.03) < sin²2θ₁₃ < 0.16 (0.21) at 95% C.L., 0.03 (0.04) < sin²2θ₁₃ < 0.15 (0.19) at 90% C.L., and 0.04 (0.05) < sin²2θ₁₃ < 0.12 (0.16) at 68% C.L., for the normal (inverted) neutrino mass hierarchy. To obtain these one-dimensional confidence intervals, we take the profile likelihood, minimising over δ, for each value of sin²2θ₁₃, both for the data and during the calculation of the critical values of ∆(2 ln L); δ was thrown uniformly across [0, 2π) in the generation of the toy experiments. Two-dimensional confidence intervals are shown in Figure 3. No values of δ can be ruled out at 1σ. The significance of the neutrino mass hierarchy preference, calculated by generating toy experiments about the global best fit point (which happens to be in the inverted neutrino mass hierarchy) and finding the fraction of toy experiments whose global best fit point lies in the inverted mass hierarchy, is negligible. The best fit values of the oscillation parameters are sin²2θ₁₃ = 0.08 (0.11) and δ = 1.1 (2.0)π for the normal (inverted) neutrino mass hierarchy. Our best fit value is compatible with the results presented in [20]. We exclude sin²2θ₁₃ = 0 at 2.7σ (2.8σ) for the normal (inverted) neutrino mass hierarchy.
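The profiling step used for the one-dimensional intervals can be sketched on an invented −2 ln L surface; here a fixed chi-squared cut stands in for the per-point Feldman-Cousins critical values, purely for illustration.

```python
import numpy as np

s2 = np.linspace(0.0, 0.3, 61)                         # sin^2 2theta_13 grid
delta = np.linspace(0.0, 2 * np.pi, 50, endpoint=False)
S2, D = np.meshgrid(s2, delta, indexing="ij")
# illustrative surface with a delta-dependent minimum near s2 = 0.09
neg2lnl = ((S2 - 0.09 - 0.02 * np.cos(D)) / 0.03) ** 2

profiled = neg2lnl.min(axis=1)                         # minimise over delta per s2
profiled -= profiled.min()                             # Delta(2 lnL) vs best fit
# in the full method, each s2 would be compared to its own Feldman-Cousins
# critical value; a fixed 2.71 is used here only to illustrate the cut
allowed = s2[profiled < 2.71]
print(allowed.min(), allowed.max())
```

Because the minimum over δ is taken before the cut, the interval covers the full range of sin²2θ₁₃ values that are allowed for at least one value of δ.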
For comparison, in Figure 4 we show the contours that would be obtained by selecting an allowed region using a fixed value of ∆(2 ln L). The regions obtained using the Feldman-Cousins approach are significantly narrower than those from the fixed log-likelihood contours.

Figure 3. The Feldman-Cousins 68%, 90% and 95% confidence level contours for the combined electron neutrino appearance measurement of T2K and MINOS, for normal (top) and inverted neutrino mass hierarchies. We exclude sin²2θ₁₃ = 0 at 2.7σ in the normal hierarchy.

Figure 4. The Feldman-Cousins confidence intervals and those obtained by placing the contour at a fixed ∆(2 ln L) elevation. Shown here are the 95% CL allowed region (and corresponding ∆(2 ln L) = 5.99 contour) and the 68% CL allowed region (and corresponding ∆(2 ln L) = 2.30 contour). The Feldman-Cousins allowed regions are significantly narrower than those formed using fixed ∆(2 ln L) elevations.