Compatibility Analysis of the LSND Evidence and the KARMEN Exclusion for numubar->nuebar Oscillations

A combined statistical analysis of the experimental results of the LSND and KARMEN numubar->nuebar oscillation search is presented. LSND has evidence for neutrino oscillations that is not confirmed by the KARMEN experiment. However, there is a region in the (sin^2(2theta),Dm^2) parameter space where the results of both experiments are statistically compatible. This joint analysis is based on likelihood functions for both data sets. A frequentist approach creating Monte Carlo samples analogous to the experimental outcome is applied to deduce correct confidence limits. Different schemes of combination can be chosen to provide correct coverage which lead to slightly different confidence regions in (sin^2(2theta),Dm^2).


Introduction
Over the last years, the controversial results of the two experiments LSND (Liquid Scintillator Neutrino Detector at LANSCE, Los Alamos, USA) and KARMEN (KArlsruhe Rutherford Medium Energy Neutrino experiment at ISIS, Rutherford, UK) both searching for neutrino oscillationsν µ →ν e have led to intense discussions. The two experiments are similar as they useν µ beams from the π + -µ + decay at rest (DAR) chain π + → µ + + ν µ followed by µ + → e + + ν e +ν µ with energies up to 52 MeV. Furthermore, both experiments are looking forν e fromν µ →ν e oscillations via the reaction p (ν e , e + ) n providing a spatially correlated delayed coincidence signature of a prompt e + and a subsequent neutron capture signal.
However, the detection techniques are significantly different: LSND uses a homogenous detector volume of mineral oil with a low concentration of scintillator viewed by 1220 phototubes, thereby giving excellent particle identification by detecting a directional Cherenkov cone as well as isotropic scintillation light with a characteristic pattern of hit photomultipliers [1]. KARMEN is a segmented liquid scintillation calorimeter with excellent time and energy resolution exploiting the distinct time structure of the ISIS neutrino source [2]. Thus, at KARMEN aν e excess fromν µ →ν e would be identified by requiring its time distribution to follow the 2.2 µs slope from the parent µ + decay.
In two data sets taken during the periods 1993-95 and 1996-98 LSND has observed a clear beam-on minus beam-off excess of events withν e signature, i.e. (e + ,n) sequences. These have been interpreted as evidence forν µ →ν e oscillations (see [3] for the first data set). The analysis of the sum of these data sets, although slightly different in their spectral shape, results in corresponding favored areas of the mixing parameter sin 2 (2Θ) and ∆m 2 [4].
KARMEN has found no excess events above the expected background. For all events, potentialν e signal and measured background, the energy, time and spatial distributions for both the prompt and delayed events are precisely known. Using this spectral information also leads to no hint for oscillations. Therefore, KARMEN cannot confirm the LSND result. Furthermore, 90% CL exclusion limits are deduced cutting into the LSND evidence region in the (sin 2 (2Θ),∆m 2 ) parameter space forν µ →ν e oscillations [5], [6].
The statistical analysis of the data has become a showcase of how to determine statistical significance and upper limits. KARMEN with no apparentν µ →ν e signal and very low background has the problem of treating a result in a low statistics regime near the physical boundary sin 2 (2Θ) = 0. In LSND, the maximum likelihood analysis of the data clearly indicates an oscillation signal. A problem arises when determining a region of correct confidence, i.e. statistical significance, in the (sin 2 (2Θ),∆m 2 ) plane having a likelihood function in two parameters, which shows a pathological behavior, namely an oscillatory dependence in ∆m 2 with numerous local maxima.
In 1998, the discussion was intensified by a paper of Feldman and Cousins [7], who described a method of dealing with the problems described above. Their approach to extract upper limits in the case of a small number of measured events is itself highly controversial, though recently adapted by the Particle Data Group [8].
This report describes the individual evaluation of both data sets with maximum likelihood methods. The statistical interpretation of the likelihood functions and confidence regions is based on a frequentist approach and follows closely the analysis suggested by Feldman and Cousins. The main purpose of such an approach is to determine correct regions of confidence in (sin 2 (2Θ),∆m 2 ) §. A correct coverage is defined in terms of frequency, i.e. fraction of occurrence for future experiments. Probability or confidence in this context does not mean "degree of belief" as defined in a Bayesian statistics. For a detailed introduction into Bayesian and frequentist approaches we refer to [10].
Although the central statements of LSND and KARMEN are contradicting there can be a region in the (sin 2 (2Θ),∆m 2 ) parameter space where the results are compatible. Combining the two experiments is done in different ways of constructing statistical distributions, pointing out that there is no unique way of determining regions of specific confidence. However, as we will see, the regions of compatibility in (sin 2 (2Θ),∆m 2 ) are very similar.
The method described below is a complete analysis of the two experiments. However, the actual result is preliminary for various reasons: A new analysis of the LSND data is under way with a new reconstruction algorithm. In addition, flux calculations and efficiencies for 1996-1998 used here are still preliminary. The KARMEN2 data used for this analysis is taken from February 1997 to February 1999 and represents about 50% of the envisaged total accumulated neutrino flux for the upgraded experiment. It is therefore an intermediate data set updated by the ongoing experiment.
A statistical analysis combining two experimental results which apparently disagree is a delicate and controversial approach. It is not the task nor the purpose of this analysis to overcome this disagreement. However, assuming that there is no serious systematical error in either of the experiments and the interpretation of their results with respect to oscillationsν µ →ν e , the question of statistical compatibility of the individual results is well justified and should be addressed quantitatively. This is the objective of the analysis presented in this paper.

KARMEN2 data evaluation
In 1996, the KARMEN experiment underwent a substantial upgrade. An additional veto counter with 300 m 2 surface surrounding the central detector on all sides was the main improvement. This veto counter reduces cosmic induced background for theν µ →ν e search by a factor of 40, which consists of energetic neutrons produced in the iron of the blockhouse by deep inelastic scattering of cosmic muons. With the new configuration and increased neutron detection efficiency, KARMEN is running as KARMEN2 since February 1997. Starting as a simple counting experiment [11], the evaluation method was changed last summer to a more sophisticated maximum likelihood analysis of the data, making use of detailed event information in energy, time and spatial position.

The data set
The data collected through February 1999 correspond to 4670 C accumulated proton charge on the ISIS target. Veto cuts for all veto components up to 24 µs before a potential oscillation event were applied and a spatial coincidence between the initial e + and the neutron capture of 1.3 m 3 was required. Figure 1 summarizes the remaining 8 event sequences in the appropriate energy and time windows. The background components are also given with their distributions. All components except the intrinsicν e contamination are measured online in different time and energy windows (see table 1). In total, the background expectation amounts to 7.8 ± 0.5 events. Therefore, the 8 extracted sequences show no hint for an oscillation signal. For oscillations with full mixing and large ∆m 2 , i.e. (sin 2 (2Θ) = 1, ∆m 2 ≥ 100 eV 2 ), a signal of 1605 ± 176 coincidences were expected. Taking the results from section 4.1 for the LSND signal in the region ∆m 2 < 2 eV 2 , one would have expected an oscillation signal of about 2 to 6ν µ →ν e events added to the background within the data set. In order to extract more information from the 8 events about any potentially small oscillation signal a detailed maximum likelihood analysis was performed.

Data analysis
This likelihood function analyses 5 event parameters: the energy E p and E d , the prompt time t p and the delayed coincidence ∆t = t d − t p as well as the spatial correlation ∆ x = x d − x p . The likelihood is calculated varying the oscillation signal r osc as well as the background components relative to the overall data sample: r CC for charged current events 12 C ( ν e , e − ) 12 N g.s. , r cos for cosmic background, r ran for random coincidences with a ν-induced prompt event and r con for the intrinsicν e contamination. With the condition 5 j=1 r j = 1 and ρ = (r osc , r CC , r cos , r ran , r con ) the likelihood function for the M = 8 events can be written as The density functions f ji contain the spectral information of all components, and as the positron energy spectrum depends on ∆m 2 , the dependence of L on ∆m 2 enters via the density function f 11 . The parameter sin 2 (2Θ) is determined by the ratio of oscillation events N osc = M · r osc divided by the expected number of events for maximal mixing N exp (∆m 2 , sin 2 (2Θ) = 1): sin 2 (2Θ) = N osc /N exp . The second line in (1) is the combined Poisson probability P for the background contributions r j calculated with the expectation values r expected j . As mentioned before, all background components but the simulated intrinsic contamination are measured online with their spectral information f ji and the expectation values r expected j . Since the event sample is still very small, the different backgrounds are summarized into one component with fixed relative contributions: f bi = 5 j=2 r expected j · f ji , and the combined Poisson probability in (1) reduces to P (r b |r expected b ) with r expected b = 7.8/8. Hence, the likelihood function effectively has two free parameters, r osc or sin 2 (2Θ) and ∆m 2 .
For technical reasons, it is more convenient to optimize the logarithmic likelihood function lnL. Taking the 8 events of KARMEN2 so far, this function has its maximum at a value of sin 2 (2Θ) < 0. This can be explained by the measured energy spectrum of the prompt events (figure 1a). Since there are all events below 36 MeV but 1.0 ± 0.1 background events expected above E p = 36 MeV, the best fit consists in increasing slightly the overall background contribution. Since an oscillation signal for large ∆m 2 has a higher contribution for E p > 36 MeV, a negative oscillation signal can then compensate the background to account for no events in that energy region. Figure 2 shows lnL where the maximum in the physically allowed range sin 2 (2Θ) ≥ 0 has been renormalized to a value of lnL(sin 2 (2Θ) m = 0, ∆m 2 ) = 100. Note the sharp fall with increasing sin 2 (2Θ). From the likelihood function it is obvious that there is no oscillation signal in the data. The task at this point is how to extract upper limits in (sin 2 (2Θ),∆m 2 ) for a given confidence level.
The method of extracting correct confidence regions in (sin 2 (2Θ),∆m 2 ) is based on a frequentist approach and will be discussed in detail in section 4.1: The basic idea is to create a large number of event samples analogous to the experiment. These samples are created by Monte Carlo using the full event information for the likelihood procedure. The samples also contain oscillation events based on a hypothesis H with (sin 2 (2Θ) H , ∆m 2 H ). A statistic is constructed by comparing each sample's maximum of lnL with the value lnL(sin 2 (2Θ) H , ∆m 2 H ) from which confidence regions are extracted.

LSND data evaluation
The data analysed in this context have been reduced by requiring the following criteria: They are so-called 'electron-like' events surviving a χ L cut, they have energies of 20 ≤ E ≤ 60 MeV and their reconstructed distance to the tank photomultiplier surface is d ≥ 35 cm. In the following, all data from 1993 through 1998 are analysed in one data set, where the flux calculations, efficiencies, and background treatment of the 1996-1998 data are preliminary extrapolations of the older data subset. The information about a delayed event is compressed into a likelihood ratio R. If within one millisecond after the initial event another event is recorded at distance ∆r ≤ 250 cm, the ratio of likelihood R in energy (PMT hits), time and distance of being a correlated p ( n,γ ) over an accidental coincidence is calculated, otherwise R = 0. Details of the event reconstruction and the definition of R can be found in [1] and [12].

Event samples
Requiring a high likelihood ratio R selects (e + ,n) correlated events with low uncorrelated background. Applying a cut of R ≥ 30 reduces the data to 70 events with 20 ≤ E ≤ 60MeV. After subtracting beam unrelated (17.7 ± 1.0) and beam related (12.8 ± 1.7) background, a net excess of 39.5±8.8 events remains. Its energy distribution is shown in figure 3 and clearly demonstrates the significance of the excess. This sample is referred positron energy (MeV) beam excess events to by LSND as "gold-plated" events. Taken the excess as oscillation signal, the signal to background ratio for this sample is better than 1.
To determine the oscillation parameters sin 2 (2Θ) and ∆m 2 , an event sample with no cut in R is used, leading to higher efficiency for the oscillation channel, but naturally increasing the background. Figure 4 shows this event sample comprising 3049 beam-on events. Four variables are used to categorize the events: Electron energy, likelihood ratio R, spatial distribution in the detector expressed in distance L to the LANSCE neutrino source A6 and the angle cos θ between the direction of the incident neutrino and the reconstructed electron path. Note that these are projections of a 4-dim space of correlated parameters for each event.
The evaluation method uses these 4 correlated parameters to extract the oscillation signal, i.e. sin 2 (2Θ) and ∆m 2 , from other background sources by calculating the overall likelihood function L for all 3049 events. There are 2 ways of setting up this function  described below.

Maximum likelihood analysis of bins
Up to the evaluation of the 1997 data, the likelihood function was defined as the product of the Poisson probability of all bins in the 4-dimensional space made up by the parameters (E, R, L, cos θ). Although this definition will not be used in the further context, the analysis of the 1993-95 data in [12] was based on this definition of the likelihood function. It is therefore described here for completeness and to allow a comparison to the results of the new likelihood method given in section 3.3.
The bin-based likelihood function was set up as with N = 198000 bins, n i the number of beam-on events in bin i and ν i the expected event number: BUB indicates beam unrelated background measured in the beam off time window with high precision, BRB stands for beam related background made up of various components: 12 C ( ν e , e − ) 12 N g.s. (making up 50.5% of the total BRB), 12 C ( ν e , e − ) 12 N * (33.6%), ν e -e − scattering (7%), 13 C ( ν e , e − ) 13 N (5.7%), ν µ -andν µ -induced events on protons and 12 C (1.9%) and intrinsicν e contamination (1.1%). The relative contributions are subject to change due to the 1996-1998 preliminary evaluation. The definition of the likelihood function was straight forward, but the treatment of the errors on the expected background components was somewhat arbitrary. The best fit resulted in an oscillation signal which, added up to the expected backgrounds, didn't explain the total beam excess, i.e. there were events in the sample which were not attributed to either BUB, BRB or oscillations. In a second step, the likelihood function was calculated again with a beam related background expectation enhanced (lowered) by one standard deviation. All three resulting lnL-functions were put in .OR., which means added with the same relative weight. The problem with this approach is that although the likelihood analysis clearly indicates an upward fluctuation of the BRB, the case of lowering the expectation of BRB is taken with the same weight as BRB + 1σ therefore artificially increasing the potential oscillation signal.
The favored region in (sin 2 (2Θ),∆m 2 ) was then defined by taking values greater than max(lnL)-2.3 and max(lnL)-4.6 as one would expect to determine 90% CL and 99% CL regions in a two dimensional Gaussian likelihood function. Given the oscillatory behavior of lnL as a function of ∆m 2 this gives obviously only favored regions but no correct coverage in terms of confidence.

Event-based likelihood analysis
To overcome the problem of background fluctuations, an alternative construction of the likelihood function based directly on the events was introduced . In the following, all results are based on this new method unless otherwise indicated explicitly.
The likelihood function is the product of all M individual event likelihoods to fit a combination of 4-dim density distributions f (E, R, L, cos θ) where the relative strengths r of the contributions are the parameters to be optimized with the side condition r j = 1. In an approximation, all beam related backgrounds are added up to one This method, in addition, has the technical advantage of being less CPU time consuming if the number of events investigated is less than the number of bins in the parameter space.
contribution ¶. Treating all beam related backgrounds as one component, the likelihood function is defined as There are effectively three free parameters: r osc or sin 2 (2Θ), ∆m 2 and r brb . The Gaussian terms account for the background expectation values and their systematic and statistical uncertainties BUB = 1467 ± 38 and BRB = 1349 ± 148. If the shape analysis of the M events (first two lines of equation 4) favors values of r brb or r bub corresponding to BRB or BUB far from the expectation, the overall likelihood value is reduced by these Gaussian expressions.
The oscillation parameter sin 2 (2Θ) is determined as a function of ∆m 2 according to where N ∆m 2 (sin 2 (2Θ) = 1) indicates the number of oscillation events expected for a given ∆m 2 and full mixing sin 2 (2Θ) = 1 in the detector, taking all resolution functions and cuts into account. The absolute event numbers corresponding to the maximal value of L(r osc , r brb ) are 73 oscillation events, 1495 BRB and 1481 BUB events for a 3049 total event sample. As already indicated in the bin-based likelihood analysis, the best fit favors an upward fluctuation of the beam related background of one standard deviation. In a next step, the original likelihood function (4) is then integrated along the axis of the parameter r brb which is of no further interest. This is a standard procedure of reducing free parameters of a likelihood function described in [13]. The logarithmic likelihood lnL is therefore a function of the 2 free oscillation parameters lnL(sin 2 (2Θ), ∆m 2 ) which is shown in figure 5. It reaches its maximum in the physically allowed range of sin 2 (2Θ) ≤ 1 at (sin 2 (2Θ) m = 0.93, ∆m 2 m = 0.056eV 2 ) which is set to lnL(sin 2 (2Θ) m , ∆m 2 m ) = 100. However, the position of this maximum in (sin 2 (2Θ),∆m 2 ) is not significant due to the flatness of the likelihood function along its 'ridge' for small values of ∆m 2 . Allowing values of sin 2 (2Θ) > 1 in the maximum likelihood analysis, the maximum of lnL increases by only 0.02 units along a line of constant values sin 2 (2Θ) · (∆m 2 ) 2 which can be understood by developing the oscillation probability for small ∆m 2 : In the limit of small ∆m 2 , the energy distribution of oscillation events does not depend any longer on ∆m 2 . This is underlined by the best fit result of 73 oscillation events which is stable along the 'ridge' of lnL. Note, however, that the maximum likelihood analysis favors the lowest value of ∆m 2 possible within the fit, or, in other words, thē ν e energy spectrum with the lowest mean value possible by oscillationsν µ →ν e .

Checks of the event-based likelihood results
To test the dependence of the likelihood function on the expectation values of the background, the Gaussian terms in (4) were omitted. The result of the likelihood analysis is nearly unchanged underlining the stability of the shape analysis.
In a second test, only events with 36 ≤ E ≤ 60 MeV were analysed. The event sample consisted of 476 events with expectations of BRB = 119 ± 13 and BUB = 274 ± 17 for the background. The maximum of lnL is then reached at (sin 2 (2Θ) = 0.69, ∆m 2 = 0.063eV 2 ) with 46 oscillation events, BRB = 148 and BUB = 281 using the full likelihood function as defined in (4). This result is in good statistical agreement with the analysis of the larger event sample concerning the extracted oscillation signal. However, it is obvious that with the stringent energy cut there is a certain loss of discrimination power between low and high ∆m 2 solutions (see for example figure 3). This can be seen by comparing the best fit values for ∆m 2 = 100 eV 2 . For both energy windows the favored mixing at this ∆m 2 is sin 2 (2Θ) = 4.0 · 10 −3 . The distance in logarithmic likelihood units to the maximum is different, however: ∆lnL = lnL(sin 2 (2Θ) m , ∆m 2 m ) − lnL(4.0 · 10 −3 , 100eV 2 ) = 2.6 for 20 ≤ E ≤ 60 MeV compared to ∆lnL = 0.25 for 36 ≤ E ≤ 60 MeV.
Applying the stringent energy cut of 36 ≤ E ≤ 60 MeV and analysing only the spectral shape of the events (i.e. discarding the Gaussian expectation terms in equation 4) leads to a less consistent maximum likelihood result. Again, there is almost no discrimination between different ∆m 2 values: The values of lnL along the best fit values of sin 2 (2Θ) as a function of ∆m 2 are almost constant. But the oscillation signal is significantly reduced from about 46 to some 21 events, with background contributions to the data sample of BRB = 267 and BUB = 188 far from their expectation values. This may indicate some unexplained distortion of the high energy part of the event spectrum but could also be due to the already mentioned naturally decreased discrimination ability of the maximum likelihood method in a very narrow energy window. Another useful check is the comparison of the two likelihood functions defined in (2) and (4). Figure 6 shows the regions in (sin 2 (2Θ),∆m 2 ) obtained by taking the contours of both logarithmic likelihood functions at values of ∆lnL = 2.3 and 4.6 units below their global maximum. As explained above, the treatment of the background uncertainty

Combining both experiments
In this section, we will first construct the correct confidence regions in the parameter space (sin 2 (2Θ),∆m 2 ) ofν µ →ν e oscillations for each experiment individually. In the second part (4.2), the logarithmic likelihood functions of both experiments are added and discussed qualitatively. In section 4.3, methods of using and combining the obtained experiment statistics are defined. These will then allow the extraction of combined confidence regions. The results of these combination methods are presented and discussed in part 4.4.

Construction of individual confidence regions
The basic idea of getting correct confidence regions using the logarithmic likelihood function lnL(sin 2 (2Θ), ∆m 2 ) is to create a statistic based on a frequentist approach. A high number of event samples is created by Monte Carlo using all experimental information on the event parameters. Different hypotheses are tested by including in the generated event samples oscillation events according to the oscillation parameters (sin 2 (2Θ),∆m 2 ). In this section we will describe this method in detail for the LSND experiment and then show the representative results for both KARMEN and LSND. The analogous statistical approach for the KARMEN data can be found in detail in [14]. For a preselectedν µ →ν e oscillation hypothesis H with oscillation parameters (sin 2 (2Θ) H , ∆m 2 H ) the creation of a LSND-like event sample is done in two steps. First, the number of oscillation events, BRB and BUB are thrown on the basis of the corresponding expectation values + . In a second step, for each event, parameters (E,R,L,cos θ) are generated from the density functions f j (E, R, L, cos θ). The index j stands for the 3 different contributions.
After an event sample is generated, the sample is analysed in exactly the same way as the experimental sample, i.e. the logarithm of the likelihood function (4) is calculated as a function of (sin 2 (2Θ),∆m 2 ). Figure 7 shows the distribution of the maxima (sin 2 (2Θ) m , ∆m 2 m ) of 1000 MC generated samples with (sin 2 (2Θ) H , ∆m 2 H ) = (4.2 · 10 −3 , ∆m 2 = 1eV 2 ). The maxima are spread over a wide range in (sin 2 (2Θ),∆m 2 ) indicating already the limited capability to determine a small area in (sin 2 (2Θ),∆m 2 ) on the basis of the LSND event sample.
To construct confidence regions, the distribution shown in figure 8 is central and should be read in the following way: To include the oscillation hypothesis (sin 2 (2Θ) H , ∆m 2 H ) with a probability (frequency of occurrence) of 90%, the area in (sin 2 (2Θ),∆m 2 ) has to be defined by cutting lnL at a value of ∆lnL(90%) = 3.25 for each individual likelihood function. This statistic as a function of ∆lnL shows the spreading of the maximal value of lnL compared to a given pair of oscillation parameters. If, for a given experiment, the value ∆lnL exp is smaller than ∆lnL obtained for a specific + The overall event number per sample is fixed to be 3049, the experimental event number.  hypothesis, such a parameter combination (sin 2 (2Θ) H , ∆m 2 H ) would be included in the region of 90% confidence. For the LSND logarithmic likelihood function, the difference is ∆lnL = lnL(sin 2 (2Θ) m , ∆m 2 m ) − lnL(4.2 · 10 −3 , 1eV 2 ) = 1.4 As shown in figure 8, in 52% of all MC samples ∆lnL is expected to be larger than 1.4 demonstrating that (4.2 · 10 −3 , 1eV 2 ) is clearly within the 90% confidence region of the LSND experimental result.
As ∆lnL(90%) is itself a function of the parameters (sin 2 (2Θ) H , ∆m 2 H ), the generation of MC samples has to be repeated for all possible parameter combinations (sin 2 (2Θ),∆m 2 ) under consideration. This task, however, is almost impossible due to the large computing time necessary to construct the statistic in ∆lnL for each point. The creation and likelihood evaluation of 1000 samples as described above takes about 500 hours CPU time on the SGI origin 200 with R10000 processor available within the LSND computer cluster. The strategy was therefore to construct these statistics for representative pairs (sin 2 (2Θ),∆m 2 ) and interpolate the obtained ∆lnL(90%) values.
The normalized distribution in figure 8 is named C ′ (∆lnL) and the variable Plotting the normalized integration of C ′ as function of ∆ defined as ∆lnL ∆lnL confidence C Figure 9.
Cumulative distributions C(∆lnL L ) for various starting points (sin 2 (2Θ) H , ∆m 2 H ). The left plot shows the distributions C for hypotheses with high likelihood for the LSND sample whereas the right figure is based on 'unlikely' starting hypotheses. The intersection of C with the dotted lines can be used to extract the ∆ 90 and ∆ 99 values.
4.2 · 10 −3 , ∆m 2 H = 1eV 2 ) for the LSND analysis. Note that these C distributions could be quite different. There are two major conclusions to be drawn from figure 9: It is obvious that the approximations of constant ∆ 90 = 2.3 and ∆ 99 = 4.6 under the assumption of a Gaussian likelihood function with 2 independent parameters do not hold anymore * . On the other hand, although there are differences in the shape of C(∆ L ) depending on (sin 2 (2Θ) H , ∆m 2 H ) the variations are not too large, even by comparing extreme hypotheses like (sin 2 (2Θ) H = 0.1, ∆m 2 H = 0.1eV 2 ) or (sin 2 (2Θ) H = 1 · 10 −2 , ∆m 2 H = 1eV 2 ) with 25 or 170 oscillation events in average within their MC samples, respectively. This fact is mainly due to the relative high statistics of the event samples and their individual components which ensure relatively stable results of the likelihood analysis.
The situation for KARMEN concerning the C(∆ K ) distribution is different for two reasons. KARMEN2 has seen 8 events so far. The samples created by MC consequently also consist of this small statistic. Therefore, the distributions C and the values ∆ 90 K as functions of sin 2 (2Θ) and ∆m 2 have larger variations. This is demonstrated in figure 10 for the same starting points as for LSND. On the other side, the small event sample  Figure 11 shows regions of 4 different confidence levels for both experiments individually. For LSND, the innermost contour represents the area of lowest confidence. For KARMEN the confidence level of excluded areas increases with the curves from left to right. The lack of smoothness of the lines reflects the limited number of grid points as well as the statistics of samples generated per grid point (sin 2 (2Θ) H , ∆m 2 H ). This second limitation is also the reason why no 99% CL region is plotted. Checking the distribution shown in figure 7 of 1000 MC samples demonstrates that ∆ 99 L for LSND is determined by the tail of 10 MC samples and has therefore a large uncertainty. The highest confidence level deduced by these distributions which will be used in this context is therefore 95% confidence.
At 90% CL, each individual experimental outcome was compared with other experiments. Figure 12 shows the oscillation parameters inside the 90% CL LSND region and the 90% CL limits from KARMEN2 and other experiments. Notice that the limits of the Bugeyν e →ν x search [15], the CCFR combined ν µ → ν e andν µ →ν e search [16] and the preliminary results from the NOMAD ν µ → ν e search [17] are not based on this unified frequentist approach by Feldman and Cousins. Comparing the LSND 90% CL region with the region defined by a constant ∆lnL = 2.3 (see figure 6 for 1993-98 data), solutions with high ∆m 2 'reappear'. However, due to the relative steepness of the logarithmic likelihood function of the 1993-98 LSND evaluation, the changes from ∆lnL = 2.3 to the correct ∆lnL(sin 2 (2Θ), ∆m 2 ) do not lead to a dramatically different confidence region in (sin 2 (2Θ),∆m 2 ). One of the most misleading but nevertheless very frequently used interpretation of the LSND and KARMEN results is to take the LSND region left of the KARMEN exclusion curve as area of (sin 2 (2Θ),∆m 2 ) 'left over'. Such an interpretation, though appealingly straight forward, completely ignores the information of both likelihood functions and reduces them to two discrete levels of individual 90% confidence. To be able to correctly combine the two experimental results and extract the combined confidence regions, we have to go some steps back to the original information of the distributions C ′ K (sin 2 (2Θ), ∆m 2 ) for KARMEN and C ′ L (sin 2 (2Θ), ∆m 2 ) for LSND. This is the task for section 4.3.

Combining likelihood functions
It is a well known procedure to multiply the likelihood functions of two independent experiments in order to combine the experimental results. Instead of multiplying the  likelihood functions, an equivalent way is to add the logarithms. As already indicated in figures 2 and 5, there is some freedom in choosing the absolute scale of lnL. A convenient presentation of lnL is to normalize the individual functions lnL K and lnL L to a point in (sin 2 (2Θ),∆m 2 ) where they are equally sensitive to a potential signal. In our case of the oscillation search this corresponds to values of sin 2 (2Θ) = 0. A stringent exclusion would then lead to only negative values of lnL whereas a strong signal leads to a significant maximum with a positive value of lnL ♯. Hence, the combined logarithmic likelihood function can be expressed as lnL(sin 2 (2Θ), ∆m 2 ) = {lnL K (sin 2 (2Θ), ∆m 2 ) − lnL K (sin 2 (2Θ) = 0)} + {lnL L (sin 2 (2Θ), ∆m 2 ) − lnL L (sin 2 (2Θ) = 0)} (9) Figure 13 shows the combined function lnL(sin 2 (2Θ), ∆m 2 ) with a maximum of lnL(sin 2 (2Θ) = 0.78, ∆m 2 = 0.058) = 14.1 on a long flat 'ridge' of low ∆m 2 values. Figure 14 shows slices for some values of ∆m 2 for the three ♯ Note, however, that the absolute values of lnL have no direct meaning. Information can be obtained only by comparing the values within the (sin 2 (2Θ),∆m 2 ) parameter space or with the individual experimental likelihood functions.
functions lnL K (sin 2 (2Θ), ∆m 2 ) − lnL K (sin 2 (2Θ) = 0) (leftmost or green curves), lnL L (sin 2 (2Θ), ∆m 2 ) − lnL L (sin 2 (2Θ) = 0) (rightmost or blue curves) and lnL as defined in equation 9. The function lnL(sin 2 (2Θ), ∆m 2 ) allows a direct qualitative interpretation of the experiments: There is a clear maximum of the combined likelihood function with a positive value of lnL favoring overall the evidence for oscillations given by LSND. On the other hand, compared to the individual LSND maximum, lnL L , the negative KARMEN result reduces the maximal value by 1.6 units (see figure 14 for ∆m 2 = 0.1 eV 2 ) which corresponds to a reduction to only 20% of the original maximal likelihood. This reduction of the global maximum is a direct reflection of the general disagreement of the two experimental results. From figure 14 it is seen that for low ∆m 2 the position in sin 2 (2Θ) of the maximum is not substantially shifted. In contrast, for larger ∆m 2 the negative influence of the KARMEN result clearly shifts the maximum in sin 2 (2Θ) and strongly reduces the LSND likelihood value. It also increases the difference ∆lnL to the global maximum which is an important fact in terms of the statistics C ′ (∆) and demonstrates that values of ∆m 2 > 2 eV 2 have a much smaller likelihood than some combinations (sin 2 (2Θ),∆m 2 ) in the low ∆m 2 region. Although these observations help in assessing the combination of the two experiments, probability statements cannot be deduced from the above arguments. However, an evaluation of quantitative confidence regions can be based on the distributions C ′ (∆), which is shown below.

Methods to combine both experiments
In this section we describe 4 different methods to extract areas in (sin 2 (2Θ),∆m 2 ) of a certain confidence level CL. Though they can be derived analytically we follow a more phenomenological approach. The methods are based on different ways of ordering in a two dimensional space created by the individual statistics of the two experiments, C ′ L and C ′ K . The assumption that the two experiments LSND and KARMEN are independent is well justified. Therefore, a two dimensional distribution C ′ (∆ L , ∆ K ) can be constructed from the one dimensional normalized distributions C ′ (∆ L ) and C ′ (∆ K ) by an inverse projection. A box plot of C ′ (∆ L , ∆ K ) and its original functions C ′ are shown in figure 15 for an example of a chosen parameter combination of (sin 2 (2Θ) = 4·10 −3 , ∆m 2 = 2eV 2 ). The different lines in figure 15 correspond to the limits for 90% CL of the different methods described below.
This corresponds to a rectangle in (∆ L , ∆ K ) defined by the side lengths ∆ CL K and ∆ CL L . The combined confidence is then CL comb = (CL) 2 . To obtain a confidence level of CL comb = 0.9 we therefore have to determine ∆ 95 i . The lines in figure 15 labeled (a) show these values ∆ 95 i and the resulting rectangle in (∆ L , ∆ K ). If the experimental value (∆ exp L , ∆ exp K ) lies within this rectangle the parameter combination (4 · 10 −3 , 2eV 2 ) is included in the combined 90% CL region. This method can be expressed also by taking the overlap of the √ CL confidence regions of both experiments to deduce the combined CL confidence region.

Method b
The second method is based on the combined statistic C ′ (∆) with ∆ = ∆ L + ∆ K defined as the convolution of the individual ones The confidence value ∆ CL is then defined by integration of C ′ : For a given CL, the limit corresponds to a diagonal line in figure 15, where (b) indicates ∆ 90 for this specific (sin 2 (2Θ),∆m 2 ). The value ∆ exp = ∆ exp L + ∆ exp K is then compared with this ∆ 90 . If ∆ exp ≤ ∆ 90 the combination (4 · 10 −3 , 2eV 2 ) is accepted at a 90% confidence level. Such an approach in (∆ L , ∆ K ) corresponds to an ordering along lines of constant combined likelihood, ∆ below the two maxima of the likelihood functions.

Method c
This method is based on an ordering principle of the elements C ′ (∆ L , ∆ K ), i.e. the frequency or probability of occurrence of (∆ L , ∆ K ). This differs to integrating starting at ∆ = 0 as it is done in the previously described approaches. For a given confidence level CL, combinations (∆ L , ∆ K ) are added up in descending order starting with the highest probability of occurrence C ′ until a fraction of CL of the total C ′ (∆ L , ∆ K )d∆ L d∆ K is reached. In figure 15 this subset S of all (∆ L , ∆ K ) is shown in blue. If (∆ exp L , ∆ exp K ) ∈ S, the combination (sin 2 (2Θ),∆m 2 ) under consideration is included in the confidence region.

Discussion
It is instructive to discuss the differences of the methods by comparing the corresponding areas of the (∆ L , ∆ K ) plane (see figure 15) by each method. The triangle defined by (b) and the rectangle defined by (a) have almost the same area. In their corners with high values of ∆ i they allow experimental outcomes which are very unlikely, at least for one experiment. This drawback is overcome by the method (c) of ordering along probability of occurrence which has the disadvantage of principally disfavoring the unlikely, but very best fits of very small ∆ i . On the other side, the convolution method integrates along contours of constant likelihood for the combined likelihood function which is a very plausible procedure. The easiest and most straight forward method may be method (a), and as we will see, leads to confidence regions very similar to those obtained by the convolution or ordering method.
In section 4.4 we will show the resulting confidence regions in (sin 2 (2Θ),∆m 2 ) for all these methods and provide further discussion and interpretation of the approaches.

Combined regions of confidence
The combined regions of 90% and 95% confidence are shown in figure 16 as green and yellow areas in (sin 2 (2Θ),∆m 2 ). The figures (a) through (d) correspond to the methods (a) through (d) described in section 4.3. Also shown for comparison are the individual experimental results: The KARMEN 90% CL exclusion curve (K) and the LSND 90% CL region (L) according to the frequentist approach (see figure 12) as well as the exclusion curves of the two experiments Bugeyν e →ν x (B) and NOMAD ν µ → ν e (N). Comparing the results of methods (a) to (c) (in short called overlap, convolution and ordering), the confidence regions have only minor differences. High ∆m 2 solutions are not excluded at 95% confidence, although the convolution and ordering methods clearly favor ∆m 2 < 10 eV 2 . The confidence region for ∆m 2 < 2 eV 2 is almost identical for all combinations. At first sight, these regions are even similar to the 90% CL region of LSND only (see lines indicated with L in figure 16), however the combined 90% CL region extends to smaller values of sin 2 (2Θ) in the low ∆m 2 region. For large ∆m 2 , the combined region is reduced and shifted to smaller mixing values. Although there are regions at ∆m 2 > 2 eV 2 within a 90% CL these solutions have considerably smaller likelihood than along the 'ridge' at low ∆m 2 , as was discussed in section 4.2. This argument is underlined if regarding regions of combined confidence at an 80% confidence level. At such a level, none of the methods (a) through (c) include solutions above ∆m 2 = 2 eV 2 .
As mentioned before, the limited statistics of the LSND frequentist analysis creating the distributions C ′ does not allow to calculate 99% CL regions with the needed accuracy. However, the only minor extension of the 90% CL area by the extracted combined 95% confidence region indicates the statistical significance of the positive LSND result. An oscillation scenario according to this combined statistical analysis is clearly compatible with both experiments. Of course, this does not account for a potential systematic error or misinterpretation of the beam excess seen in LSND. Figure 16(d) shows a very distinct region of 90% confidence. It is a very controversial way of combining two experiments if their central statements are different. By selecting relative high confidence (70%) regions of each experiment and building a common area by adding these regions, the underlying interpretation tends to choose one experiment over the other. As discussed in section 4.3 for the other methods, this approach allows combinations of (∆ L , ∆ K ) where one value is not restricted at all, i.e. extremely unlikely points in the constructed two dimensional statistics space of C ′ (∆ L , ∆ K ). This method will, with better individual experimental statistics, ultimately lead to two distinct areas forming a combined 90% CL which does not help in making a decision about the statistical compatibility of two experiments. Although this method results in a correct coverage, it is therefore obviously disfavored.

Conclusion and outlook
The data sets of both the LSND and KARMEN experiment were analysed with a maximum likelihood method. The definition of the LSND likelihood function was changed from a combined likelihood of bin contents to a product of event based likelihoods allowing the backgrounds to float according to their expectation value and its uncertainty. This improvement led to slightly lower values of sin 2 (2Θ) for a given ∆m 2 .
For the first time, a frequentist approach based on [7] was applied to determine confidence regions of correct coverage for the LSND experiment. It is shown that in the case of a likelihood function depending on the oscillation parameters sin 2 (2Θ) and ∆m 2 , the approach assuming a two dimensional Gaussian likelihood function is only a rough approximation and does not lead to correct coverage.
As both the KARMEN and LSND experimental data were analysed with a likelihood function and the statistics to deduce confidence regions were built in the same manner, it is possible to combine the likelihood functions and extract combined confidence regions based on a combination of the individual statistics created by Monte Carlo procedures. These regions are regions of correct coverage in terms of a frequentist approach.
The combined confidence regions in (sin 2 (2Θ),∆m 2 ) extracted from different methods to combine the experiments are very similar (if we exclude the controversial and not really convincing method (d)). Though graphically not very different from what one would expect in a naive approach using the individual 90% CL regions, a more detailed look shows that there are subtle differences. In figure 12 there are no LSND areas left to the KARMEN exclusion curve for ∆m 2 > 2 eV 2 . The combined 90% CL regions in figure 16(a-c) however include some areas of larger ∆m 2 although the likelihood value lnL is much smaller than for ∆m 2 < 2 eV 2 therefore favoring the low ∆m 2 solutions. Requiring a more stringent confidence level of 80% or less, only solutions with ∆m 2 < 2 eV 2 remain.
The results of this analysis remain preliminary, as stated in the introduction. This is not due to the statistical analysis itself but to the data sets used. However, the work described here is the first statistical analysis combining both the LSND and KARMEN experimental outcomes and shows the feasability and results of such a method. As there are other experiments like NOMAD, CCFR and Bugey sensitive in part to the confidence region in (sin 2 (2Θ),∆m 2 ), a complete analysis should also include these results on the basis of the same statistical analysis. This implies, however, the detailed knowledge of experimental data of these experiments not accessible to the author. In addition, the exclusion curve from the Bugey experiment is based on the disappearance searchν e →ν x . Combining this experiment correctly with the appearance results ofν µ →ν e or ν µ → ν e in terms of mixing angles would therefore also require a full three or four dimensional (with a sterile neutrino) mixing scheme.
KARMEN2 has no evidence for oscillations and sets the most stringent experimental limits so far on the mixing sin 2 (2Θ) for a range of 0.2 ≤ ∆m 2 ≤ 20 eV 2 . However, the sensitivity of its result is not sufficient to completely cover the parameter region given by the LSND signal. It was shown quantitatively that there are areas of confidence in (sin 2 (2Θ),∆m 2 ) which are compatible with both experiments, especially for ∆m 2 ≤ 2 eV 2 . Such a statistical analysis is necessary to assess the two experimental outcomes in terms of (sin 2 (2Θ),∆m 2 ), however, further experimental investigations are needed. The Booster Neutrino Experiment BooNE [18] at Fermilab will be built to check this controversial region of the oscillation parameters with high statistics and different systematics. Another experiment is proposed at the CERN proton synchrotron [19]. These experiments may then resolve the issue of the KARMEN and LSND results on ν µ →ν e oscillations.

Acknowledgments
Many people have substantially contributed to this work. First of all, I want to thank both collaborations in general for allowing the use of the full experimental data for this analysis. In particular, Markus Steidl provided me with the most recent data analyses of KARMEN2. In numerous discussions the track for the analysis was set up. Especially helpful were discussions and clarifications with Geoffrey B. Mills and Steven J. Yellin's proposals on how to combine the individual statistics, and I gratefully acknowledge their help.
Last but not least I want to thank the Alexander von Humboldt-Foundation who, by granting me a Feodor-Lynen fellowship to visit Gerald T. Garvey, made possible a very stimulating and fruitful stay with my colleagues in Los Alamos.