toise: a framework to describe the performance of high-energy neutrino detectors

Neutrinos offer a unique window to the distant, high-energy universe. Several next-generation instruments are being designed and proposed to characterize the flux of TeV--EeV neutrinos. The projected physics reach of these detectors is often quantified with simulation studies. However, a complete Monte Carlo estimate of detector performance is computationally costly, restricting the number of detector configurations considered when designing the instruments. In this paper, we present a new Python-based software framework, toise, which forecasts the performance of a high-energy neutrino detector using parameterizations of the detector performance, such as the effective areas, angular and energy resolutions, etc. The framework can be used to forecast the performance of a variety of physics analyses, including sensitivity to diffuse neutrino fluxes and to both transient and steady-state point sources. This parameterized approach reduces the need for extensive simulation studies to estimate detector performance, and allows the user to study the influence of single performance metrics, like the angular resolution, in isolation. The framework is designed to allow for multiple detector components, each with different responses and exposure times, and supports parameterization of both optical- and radio-Cherenkov (Askaryan) neutrino telescopes. In this paper, we describe the mathematical concepts behind toise and provide detailed instructive examples to introduce the reader to the use of the framework.


Introduction
High-energy neutrinos are key messengers from the distant, high-energy universe. To this end, several high-energy neutrino experiments have been built in the last decade with the goal of discovering and characterizing the flux of TeV-EeV neutrinos. These include experiments relying on the optical-Cherenkov light produced in neutrino-nucleon interactions, such as IceCube [1], ANTARES [2], KM3NeT [3], and Baikal-GVD [4], as well as experiments that rely on the radio-Cherenkov light produced by super-PeV neutrinos, such as RICE [5], ARIANNA [6], and ARA [7]. Analysis of data from these high-energy neutrino detectors requires a detailed model of the detector response that can be used to predict observations under some physics scenario. This model typically has such a large number of dimensions that it is infeasible to evaluate directly, and instead must be estimated via Monte Carlo (MC) simulation. This can be quite computationally expensive, requiring hundreds to thousands of CPU- or GPU-hours per hour of detector livetime. For example, the IceCube collaboration reports that to achieve 10 years of simulated detector livetime with adequate statistics, they require 6 million CPU-hours for signal simulation and over ∼30k CPU-years for background simulation [8].
When designing a future high-energy neutrino detector, like IceCube-Gen2 [9] or P-ONE [10], it is infeasible to undertake these enormous simulation efforts while the designs are still rapidly evolving. Further, it can be helpful to estimate how a detector's projected sensitivity would compare to that of existing, well-understood analyses. In this paper, we present a framework for producing such sensitivity estimates, using a simplified and factorized model of the detector response based on targeted MC simulations. This makes it possible to efficiently compare different detector designs without repeating the entire simulation chain, as well as to study the influence of various factors on the detector's sensitivity in isolation.
The toise framework is written in Python [11], installable with conda [12][13][14] and pip [15], and utilizes Jupyter notebooks to illustrate its features and capabilities. We provide comprehensive tutorials that also form the basis of this article.

Describing a neutrino detector
A neutrino detector can be fully described by a few functions. While accurately determining each of these may be difficult, they are conceptually straightforward. The first is a model for the neutrino flux and its relevant interactions in the sensitive volume. In toise, this is parameterized as a transfer tensor between initial neutrino flavor states and observable final states like muons, and is described in Sec. 2.1. The second is the acceptance of the detector to signals and backgrounds, i.e. the analysis/selection efficiencies and the effective areas; this is described in Sec. 2.2. The third and fourth are the detector's angular and energy resolution with respect to the observable final states, discussed in Sec. 2.3.
In the toise framework, the full information on an instrument's response is encoded in a five-dimensional response tensor R(ν_i, E_ν, cos(θ), E_rec, ΔΨ). The dimensions of this response tensor are the six neutrino types (3 (anti-)neutrino flavors) ν_i, the true neutrino energy E_ν and direction cos(θ), the reconstructed energy E_rec, and the angular reconstruction uncertainty ΔΨ. This section describes both the general approach and how the parameterizations are treated within the framework.
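As a sketch of the bookkeeping, the response tensor can be represented as a five-dimensional array; contracting it with a flux binned in the true quantities yields the expected distribution of observables. The binning below is hypothetical, chosen only for illustration:

```python
import numpy as np

# Hypothetical binning, for illustration only: 6 neutrino types,
# 100 true-energy bins, 40 cos(zenith) bins, 100 reconstructed-energy
# bins, and 20 bins in angular error dpsi.
n_nu, n_e, n_ct, n_erec, n_psi = 6, 100, 40, 100, 20

# Detector response R[nu_type, E_true, cos_theta, E_rec, dpsi]
R = np.zeros((n_nu, n_e, n_ct, n_erec, n_psi))

# Contracting R with a flux binned in (nu_type, E_true, cos_theta)
# yields the expected observable distribution in (E_rec, dpsi).
flux = np.ones((n_nu, n_e, n_ct))
observed = np.einsum('ijk,ijklm->lm', flux, R)
print(observed.shape)  # (100, 20)
```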

Neutrino Physics
Fluxes of atmospheric and astrophysical neutrinos are typically parameterized at the surface of the Earth, but a neutrino detector below the surface is sensitive only to the number of neutrino-induced leptons that reach the instrumented volume and the number of neutrino-induced particle showers (also referred to as cascades) that occur inside the sensitive volume. In order to separate the part of the detector performance that can be influenced by design choices from the limitations imposed by the physics of neutrino interactions, we split the event rate calculation into two stages: neutrino physics and detection. In the neutrino physics stage described in this section, we convert neutrino fluxes at the surface of the Earth to area or volume densities of final states at the detector. These densities serve as input to the detection stage described in Sec. 2.2 and do not depend on the detector configuration.
In the neutrino energy range relevant for optical neutrino detection, two general classes of neutrino-induced events are relevant. The first are incoming muon events, where a neutrino interaction far from the detector produces a high-energy muon that reaches the instrumented volume. These are almost entirely due to charged-current (CC) interactions. The second are starting events, where the neutrino vertex is close enough to the instrumented volume that the entire final state is observable, making flavor identification, in principle, possible. All neutrino flavors, and both charged-current (CC) and neutral-current (NC) interactions contribute to this event class.
The transfer tensor for muon events is computed as follows. The flux of neutrino-induced muons arriving at the instrumented volume with energy E_μ from colatitude angle θ is given by Φ_μ(E_μ, θ) = Σ_i ∫ dE_ν Φ_{ν_i}(E_ν, θ) ∂P_i(E_μ | E_ν)/∂E_μ, where the sum runs over neutrino flavors i, and P_i(E_μ | E_ν) is the probability that a neutrino of flavor i and energy E_ν at the surface of the Earth produces a muon that reaches the instrumented volume with energy less than E_μ. This is evaluated by Monte Carlo, propagating neutrinos to the target volume with NeutrinoGenerator [16] and propagating muons produced in charged-current interactions with PROPOSAL [17]. The final product is a 6 × 100 × 40 × 100 transfer tensor. Each entry gives the probability that a neutrino of type i, incident on the Earth with energy E_ν and angle θ with respect to the zenith at the detector site, would produce a muon that reaches the instrumented volume with energy E_μ. This approach treats the details of muon range straggling correctly [18], at the expense of fixing the neutrino-nucleon cross-section.
The transfer tensor for starting events is calculated using a method adapted from Vincent et al. [19]. First, we solve the cascade equations for a test flux to obtain a transfer matrix T with entries P(ν_j, E_f | ν_i, E_0): the probability that a neutrino ν_i, incident on the Earth with energy E_0, either reaches the detector with energy E_f, or produces a secondary neutrino ν_j that reaches the detector with energy E_f. For each zenith angle, we produce one matrix for each pair of initial and final neutrino types, resulting in 36 matrices per zenith angle. Of these, only 10 have nonzero entries: ν_α → ν_α and ν̄_α → ν̄_α for α = {e, μ, τ} due to CC losses and NC down-scattering, as well as ν_τ → ν̄_e, ν_τ → ν̄_μ, ν̄_τ → ν_e, and ν̄_τ → ν_μ due to secondary τ decay. We neglect the energy loss of τ leptons before decay, as well as the contribution of secondary neutrinos from leptonic W⁻ decays. Fig. 1 shows columns of T for different initial and final neutrino types at a moderate column depth. Next, we use the differential neutrino-nucleon cross-sections to approximate the production rate of neutrino-induced cascades as a function of neutrino and cascade energy: C(E_ν, E_casc) ≈ ρ N_A (dσ/dE_casc)(E_ν, E_casc) ΔE_casc, where ρ is the mass density of the detector medium, N_A is Avogadro's number, and ΔE_casc is the width of each energy bin. For CC ν_e and ν̄_e interactions, both the outgoing electron and the hadronic cascade contribute; for resonant ν̄_e e⁻ → W⁻ scattering, there is an additional contribution from hadronic decays of the resulting W⁻. As in the transmission calculation, we neglect contributions from leptonic W⁻ decays. For CC ν_μ and ν̄_μ interactions, and for NC interactions of all flavors, only the hadronic cascade contributes. For CC ν_τ and ν̄_τ cascades, we consider only cascades from the decay of the secondary τ±. The end product is a transfer matrix C for each neutrino type whose entries give the number of cascades produced per unit path length. Fig. 2 shows columns of C for select neutrino types.
Finally, we form the inner product of the transmission matrix and the cascade-production matrix to obtain a combined transfer tensor, whose entries give the average number of cascades that a neutrino of type i, incident on the Earth with energy E_0 and angle θ with respect to the zenith at the detector site, would produce per unit length of detector medium at cascade energy E_casc. Fig. 3 shows slices of this tensor for select initial neutrino types at a moderate column depth.
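The inner product above is an ordinary matrix product over the intermediate (at-detector) neutrino energy. A minimal numerical sketch, with toy matrices standing in for the actual transmission matrix T and cascade-production matrix C (binning and values are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(42)
n_e = 50  # number of energy bins (illustrative)

# T[i, j]: probability that a neutrino entering the Earth in energy bin i
# arrives at the detector in energy bin j. Down-scattering means j <= i,
# i.e. a lower-triangular matrix. One such matrix exists per zenith angle
# and per (initial, final) neutrino-type pair.
T = np.tril(rng.random((n_e, n_e)))
T /= T.sum(axis=1, keepdims=True)  # toy normalization only

# C[j, k]: cascades produced per unit path length at cascade energy k by
# a neutrino arriving with energy j (cascade energy cannot exceed it).
C = np.tril(rng.random((n_e, n_e))) * 1e-5  # arbitrary units of 1/length

# The inner product over the intermediate neutrino energy yields cascades
# per unit length at energy k, per neutrino entering the Earth in bin i.
TC = T @ C
print(TC.shape)  # (50, 50)
```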
This transfer tensor is then multiplied into the final-state effective area described in Sec. 2.2 to form a neutrino effective area whose first 3 dimensions are the same as those of the transfer tensor (neutrino type, energy, and incidence angle). The transfer tensor is calculated on demand given a parameterization of the neutrino-nucleon cross-section. One such parameterization, created from B-splines fit to cross-sections obtained from nusigma [20] using the CTEQ6 parton distribution [21], is distributed with toise, but other parameterizations may be used in its place.
At the higher energies relevant for radio detection, the transfer matrices described above can still be used as a reasonable proxy. However, since the event signatures of the charged-current interactions of ν_e (elongated showers from the LPM effect [22]) and ν_μ (long muon track) change, and because proposed future detectors are sparsely instrumented, the toise framework allows custom transfer matrices to be used instead. This has the advantage that the deposited shower-energy distribution at trigger level for the considered detector can be taken into account directly, at the expense of not using the cross-sections and inelasticity distributions implemented within toise.

Effective Area and Volume
For a neutrino telescope, a few factors control the expected sensitivity to astrophysical phenomena. The most critical of these is the acceptance of the detector to signals and backgrounds. The acceptance is usually expressed as an energy- and zenith-dependent effective area A_eff(E, θ). These acceptances are convolved with the neutrino-to-final-state transfer tensor from Sec. 2.1 to calculate event rates. In this section, we describe the process of calculating and parameterizing the effective area for optical and the effective volume for radio neutrino telescopes. The toise framework implements and manages effective areas through the effective_areas module. We show details of its use in the examples in Sec. 3.3.

Optical Neutrino Telescopes
To calculate the effective area A_eff(E, θ) of the detector, the geometric area A_geo(θ) must be multiplied by an energy- and zenith-dependent efficiency ε(E, θ) for detecting and selecting events of interest, as in Eq. 2.3: A_eff(E, θ) = A_geo(θ) · ε(E, θ). (2.3) For an optical neutrino telescope like IceCube or KM3NeT, the instrumentation is usually deployed in a relatively compact footprint on the scale of a few kilometers. A convex hull can be placed around the geometric extent of the instrument, defining a solid. Assuming the detector is roughly azimuthally symmetric, the projection of the solid along a given zenith angle defines the geometric area of the detector A_geo(θ). The selection efficiency ε(E, θ) characterizes the triggering efficiency of the detector, as well as the probability with which an event passes a set of analysis criteria. In practice, the selection efficiency is usually estimated using Monte Carlo techniques, where the detector is exposed to an isotropic flux of signal or background events, and trigger simulation and analysis selections are performed. The ratio of the number of events passing these cuts to the number of events thrown defines the selection efficiency. However, simple approximations can also be entered into the framework.
For the geometric area A_geo, the framework contains a utility module, surfaces, to create and manage the geometric solids, their projections, and azimuthal averages. To construct an object, the user provides a 2D footprint of the detector and an extent, or depth, over which to extrude the footprint to define the bounding volume.
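For intuition, the projection of a simple extruded solid can be written in closed form. The sketch below uses a vertical cylinder as a stand-in for the convex hull (the surfaces module handles general footprints; the radius and height here are illustrative, not those of any real detector):

```python
import numpy as np

def cylinder_projected_area(radius, height, cos_zenith):
    """Area of a vertical cylinder projected along a direction with the
    given cosine of the zenith angle (azimuthally symmetric): the end
    caps contribute pi*r^2*|cos(theta)|, the side wall 2*r*h*sin(theta)."""
    sin_zenith = np.sqrt(1.0 - cos_zenith**2)
    return (np.pi * radius**2 * np.abs(cos_zenith)
            + 2.0 * radius * height * sin_zenith)

# Illustrative scale: r = 500 m, h = 1000 m
area_vertical = cylinder_projected_area(500.0, 1000.0, 1.0)    # caps only
area_horizontal = cylinder_projected_area(500.0, 1000.0, 0.0)  # wall only
print(area_vertical / 1e6, area_horizontal / 1e6)  # ~0.785 and 1.0 km^2
```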
For the selection efficiency ε, the framework expects the energy- and zenith-dependent efficiency to be provided as multidimensional B-splines formulated with the photospline package [23] and stored as FITS files [24]. An example of how to prepare these splines is provided in the code repository [25]. Two types of optical selection efficiencies are already supported in the framework. The first is a selection efficiency for tracks, which are typically created by through-going muons produced in charged-current interactions, and traditionally drive the sensitivity of searches for the sources of astrophysical neutrinos [26][27][28]. In the framework's effective_areas module, the efficiency for tracks is handled by the ZenithDependentMuonSelectionEfficiency class. The second is a selection efficiency for cascades, which are created by neutral-current interactions of all neutrino flavors and charged-current interactions of electron and tau neutrinos [29,30]. These are handled by the HECascadeSelectionEfficiency class.
Combining the geometric area and the selection efficiency creates the effective area. The framework currently has one provided class for tracks: the MuonEffectiveArea class. Users can create their own effective area classes by following the example therein, including creating the associated selection efficiency parameterization.

Radio Neutrino Telescopes
In contrast to optical neutrino telescopes, radio detectors are only sensitive to cascades above ∼10^16 eV. Since radio neutrino detectors are sparsely instrumented, and since the possibility of detecting a cascade depends more on the orientation of the Cherenkov cone than on the vertex position, using a geometric area as in Eq. 2.3 is impractical for radio telescopes.
Instead, it is more convenient to simulate neutrino interactions for each neutrino flavor in bins of neutrino energy and direction in a simulation volume V_sim. The neutrinos are forced to interact in this simulation volume and are then weighted by their transmission probability through the Earth, w_Earth. From these, tabulated neutrino effective volumes V_eff(E_ν, θ) can be evaluated by taking the ratio of the sum of weights of triggering events to the number of simulated neutrino interactions N_sim: V_eff(E_ν, θ) = V_sim Σ_trig w_Earth / N_sim. The tables of V_eff can be passed to the radio_effective_areas module, where they are interpolated to match the shape of the transfer tensor (cf. Sec. 2.1) and converted to an effective area through the thin-target approximation A_eff(E_ν, θ) = V_eff(E_ν, θ) / L_int(E_ν), where L_int(E_ν) = 1/(n_N σ_{ν−N}(E_ν)) is the interaction length of a neutrino in the Earth, with n_N the nucleon number density, and the cross-section σ_{ν−N} contains the energy dependence.
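The two steps above can be sketched in a few lines. The densities, cross-section, and event weights below are illustrative stand-ins, not toise defaults:

```python
import numpy as np

N_A = 6.022e23  # Avogadro's number: nucleons per gram for A ~ 1

def effective_volume(weights_triggered, n_simulated, v_sim):
    """V_eff = V_sim * sum(Earth-transmission weights of triggered events)
    divided by the number of simulated, forced neutrino interactions."""
    return v_sim * np.sum(weights_triggered) / n_simulated

def effective_area(v_eff_cm3, sigma_cm2, rho_g_cm3=0.92):
    """Thin-target approximation A_eff = V_eff / L_int, with interaction
    length L_int = 1 / (n_N * sigma) and nucleon number density
    n_N = rho * N_A (ice density assumed here; illustrative)."""
    return v_eff_cm3 * rho_g_cm3 * N_A * sigma_cm2

# Toy example: 10 triggered events, each with Earth-transmission weight
# 0.3, out of 1000 simulated interactions in a 1 km^3 = 1e15 cm^3 volume,
# at an illustrative cross-section of 1e-33 cm^2.
v_eff = effective_volume(np.full(10, 0.3), 1000, 1e15)
print(effective_area(v_eff, 1e-33), 'cm^2')
```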
Instead of using the default transfer tensor shipped with toise, the radio_effective_areas module allows the user to include a custom transfer tensor, which holds the neutrino-to-shower-energy probability distribution for detected events as a function of neutrino flavor, energy, and direction.
As with the optical array, there are additional selection, or analysis, cuts. These are typically implemented by requiring that the signal be strong enough to be identifiable above thermal noise and fulfill certain reconstruction requirements. Radio neutrino detectors typically rely on single-station reconstruction of the neutrino properties to maximize the chance of seeing ultra-high-energy neutrinos. A practical requirement for reconstruction is, for example, detecting a signal in at least three antennas. Similarly, observing both the direct radio signal and a secondary signal reflected/refracted in the firn provides a good estimate of the vertex distance and thus the energy. A third requirement could be to observe a signal in a specific antenna type, which can help in determining the neutrino direction.
These additional requirements can be parameterized using a sigmoid analysis-efficiency curve in logarithmic energy, as shown in Fig. 4. This sigmoid/logistic curve shape is typical of analysis efficiencies in the radio community [6,7]. We have designed the radio_analysis_efficiency module such that the passed parameters correspond to physical quantities, namely the efficiencies in the low- and high-energy limits (ε_0, ε_∞), a turn-on energy E_turn where 50% of the maximum is reached, and the logarithmic width of the turn-on w_lhw (more precisely, the width of the region covering 25% to 75% of the transition from ε_0 to ε_∞): ε(E) = ε_0 + (ε_∞ − ε_0) / (1 + exp(−2 ln(3) (log₁₀E − log₁₀E_turn)/w_lhw)). (2.6) The analysis efficiency may be chosen such that it also effectively reduces the contribution of non-physics backgrounds, which can originate from high-wind periods, human activity near the detector, or thermal noise surpassing the trigger threshold.
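A logistic turn-on with these parameters can be sketched as follows. This is an illustrative implementation under the assumption that the width parameter measures the 25%-to-75% region of the transition; it is not the exact toise code:

```python
import numpy as np

def analysis_efficiency(energy, eff_low, eff_high, e_turn, log_width):
    """Logistic turn-on in log10(energy). eff_low/eff_high are the low-
    and high-energy limits, e_turn the energy at the midpoint of the
    transition, and log_width the width (in decades of energy) of the
    region covering 25% to 75% of the transition. Illustrative sketch."""
    x = 2.0 * np.log(3.0) * (np.log10(energy) - np.log10(e_turn)) / log_width
    return eff_low + (eff_high - eff_low) / (1.0 + np.exp(-x))

e = np.logspace(15, 20, 6)  # eV
print(np.round(analysis_efficiency(e, 0.0, 0.9, 1e17, 0.5), 3))
```

With this convention the efficiency reaches exactly 75% of the transition half a `log_width` above `e_turn`, which makes the fitted parameter directly interpretable.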
By choosing different parameterizations for Eq. 2.6, we can use toise to evaluate how optimistic or conservative assumptions about the analysis efficiency impact the instrument's sensitivity. In particular, if the zenith dependence of the analysis efficiency is known, a user can define a more sophisticated parameterization that takes it into account.

Angular and Energy Resolution
In addition to the effective area and selection efficiencies, the detector sensitivity depends on performance metrics such as the angular and energy resolution. In this section, we discuss how the angular and energy resolutions are parameterized and utilized in the framework. The angular resolutions are handled in the angular_resolution module, while the energy resolutions are managed by the energy_resolution module.

Optical Neutrino Telescopes
Like the effective areas and selection efficiencies, the framework expects the optical angular resolution to be provided as energy- and zenith-dependent B-splines in photospline [23]. Again under the assumption of azimuthal symmetry, the angular resolution, described through the point spread function (PSF), reduces to an energy- and zenith-dependent quantity. The PSF is defined by the distribution of opening angles ΔΨ, where ΔΨ is the angle between the reconstructed event direction and the true direction of the neutrino. For the optical array, the framework currently supports parameterizing the PSF using the Moffat/King function, as is done e.g. for the Fermi gamma-ray telescope [31][32][33]. For a given energy and zenith bin, the Moffat/King function takes the form of Eq. 2.7: K(ΔΨ; σ, γ) = (1/(2πσ²)) (1 − 1/γ) (1 + ΔΨ²/(2γσ²))^(−γ). (2.7) The function has two parameters, σ and γ. Qualitatively, σ describes the overall scale of the PSF (it is analogous to the width of a Gaussian distribution), while γ controls the slope of the tail of the distribution. An example of a PSF and the fitted Moffat/King function for a set of public IceCube data [34] is shown in Fig. 5, specifically the ΔΨ distribution from IceCube's latest search for point sources using ten years of data [27] for 5 ≤ log₁₀(E) ≤ 5.5 and declinations > −10° (corresponding to the Northern sky). We observe that this parameterization works well for analyses focusing on tracks, which typically have good, degree-scale median angular errors, but a user could implement other parameterizations as necessary. In practice, to calculate the PSF, one applies an analysis event selection to a Monte Carlo signal or background simulation and builds the multidimensional distribution of ΔΨ; this can then be fit with a multidimensional B-spline and saved to a FITS file. An example of building the multidimensional spline is provided in the code repository [25]. Construction of the energy resolution parameterization proceeds analogously: a user applies cuts and then builds distributions of reconstructed energy as a function of true energy. The framework specifically works with the final-state energy, e.g. the energy of the muon at the detector boundary, or the energy deposited in a cascade. The transfer function between this observable final state and the true flux of atmospheric or astrophysical neutrinos is discussed in Sec. 2.1. The framework currently supports a generic EnergySmearingMatrix class, where for each true final-state energy bin, a user provides a bias and width for the reconstructed energy. We observe that for optical neutrino telescopes, e.g. IceCube [35] and ANTARES [36], a parameterization in energy alone is sufficient, as the energy resolution tends not to have a strong zenith or azimuth dependence. Because this parameterization is comparatively simple and requires only a univariate spline, there are minimal memory and computational savings to storing splines instead of the binned resolution directly, and so the framework currently uses the latter. The default class expects the bias and width parameters to be supplied as compressed archive files, specifically NumPy npz files, and a spline is built when an instance of the class is created. One implementation is already provided in the MuonEnergyResolution subclass, which focuses on tracks. A user could build their own classes with more sophisticated parameterizations.
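The Moffat/King function described above can be written down and checked numerically in a few lines. The parameter values are illustrative; the normalization check uses the small-angle approximation in which the PSF integrates to one over 2πΔΨ dΔΨ:

```python
import numpy as np

def king_psf(dpsi, sigma, gamma):
    """Moffat/King function: sigma sets the angular scale, gamma the
    slope of the tail. Normalized such that the integral of
    king_psf(x) * 2*pi*x over x is 1 (small-angle approximation), as in
    the Fermi-LAT parameterization."""
    norm = (1.0 - 1.0 / gamma) / (2.0 * np.pi * sigma**2)
    return norm * (1.0 + dpsi**2 / (2.0 * gamma * sigma**2)) ** -gamma

# Numerical check of the normalization with illustrative parameters
x = np.linspace(0.0, 60.0, 600001)
dx = x[1] - x[0]
total = np.sum(king_psf(x, sigma=0.5, gamma=3.0) * 2.0 * np.pi * x) * dx
print(total)  # ~ 1
```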

Radio Neutrino Telescopes
Reconstruction methods for radio neutrino detectors are under development but are currently less mature than for optical telescopes [37,38]. In particular, large, detailed sets of Monte Carlo simulations including reconstruction are often not yet available. Therefore, more general smearing functions for the angular error and energy smearing are currently provided in the radio_responses module. The currently implemented functions for the radio energy and angular resolution are shown in Fig. 6.
As opposed to optical telescopes, radio detectors are often designed to reconstruct a neutrino event with only a single station, in order to maximize the effective volume. Whether a neutrino event is detected depends primarily on the vertex distance [37] and is therefore, to first order, independent of the neutrino energy. The RadioEnergyResolution class uses a Cauchy PDF to account for the resolution on the shower energy, which, compared to a Gaussian, has larger tails and therefore better matches the distribution of reconstructed events.
Accurate pointing for the neutrino direction is typically only possible for events for which the shower vertex position and the signal polarization can be determined. The RadioPointSpreadFunction class describes the angular resolution in terms of ΔΨ using a double-Gaussian PDF with a constant term, P(ΔΨ) = f₁ G(ΔΨ; σ₁) + f₂ G(ΔΨ; σ₂) + const., whose parameters are the widths of the Gaussians (σ₁, σ₂) and the fraction of events in each component. While the first Gaussian accounts for events that can be accurately reconstructed, the remaining events are absorbed by the second Gaussian and the constant term. With future, more detailed analyses based on larger Monte Carlo sets, a more realistic description of the angular resolution will give some guidance on how to incorporate the dependence on the true zenith direction into the angular resolution.
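Both radio smearing functions are standard distributions and can be sketched directly with SciPy. All parameter values below are hypothetical placeholders, not toise defaults; the Gaussians are folded to ΔΨ ≥ 0 (half-normal) so the components integrate to their stated fractions:

```python
import numpy as np
from scipy.stats import cauchy, norm

def radio_energy_smearing(log10_ratio, scale=0.2):
    """Cauchy PDF in log10(E_rec / E_true); its heavy tails accommodate
    poorly reconstructed vertex distances. Scale is illustrative."""
    return cauchy.pdf(log10_ratio, loc=0.0, scale=scale)

def radio_psf(dpsi_deg, f1=0.7, sigma1=3.0, f2=0.2, sigma2=10.0,
              flat_fraction=0.1, dpsi_max=90.0):
    """Double Gaussian plus constant floor in dpsi (degrees), folded to
    dpsi >= 0. f1 + f2 + flat_fraction = 1; hypothetical parameters."""
    return (2.0 * f1 * norm.pdf(dpsi_deg, scale=sigma1)
            + 2.0 * f2 * norm.pdf(dpsi_deg, scale=sigma2)
            + flat_fraction / dpsi_max)

print(radio_energy_smearing(0.0))  # peak density at perfect reconstruction
```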

Background
When estimating sensitivities, we must account for backgrounds that may mimic the signal by contributing to the event rate in a region of interest. Which categories of background are important depends on the kind of detector, the event selection, and the target science case. For a measurement of the diffuse astrophysical neutrino flux with an optical detector, for example, the primary backgrounds are penetrating atmospheric muons and, to a lesser degree, atmospheric neutrinos. In a search for point sources of neutrinos, on the other hand, the overall flux of high-energy astrophysical neutrinos is itself a background. If this were not accounted for, a single upward-going muon of sufficiently high energy could be considered evidence of point emission. Backgrounds can be accounted for either by adding their contributions to the event rate or by ignoring regions where they are expected to contribute. The framework takes both approaches in different cases. For all detectors and science cases, the flux of atmospheric neutrinos (as parameterized with nuflux [39]) is added as a background contribution, using the same effective area as for astrophysical neutrinos. The atmospheric neutrino flux can optionally be reduced in the downward-going region to account for the fact that atmospheric neutrinos can be vetoed by the detection of accompanying muons from the same extensive air shower [40], following the calculation of Gaisser et al. [41].
For the detection of incoming muons with an optical detector, an approximate flux of single, penetrating muons from extensive air showers is added as a background contribution in the downward-going region. This can also be reduced to account for an ability to veto penetrating muons by detecting the electromagnetic component of the extensive air shower at the surface over a limited solid angle, e.g. with IceCube and IceTop [42]. For starting events with an optical detector, an energy threshold is applied such that the penetrating muon background is negligible compared to the rate of high-energy astrophysical neutrinos, mimicking the event selection of e.g. IceCube's "High Energy Starting Events" sample [29].
A potential irreducible background to neutrino detection with under-ice radio antennas is possible from extensive air showers penetrating the ice or from ultra-high-energy atmospheric muons producing detectable showers deep in the ice. The rate at which these events occur is subject to large uncertainties and actively debated, but roughly follows the zenith angle distribution of the neutrino signal [43]. If specified, the framework expects the assumed air-shower background rate as a two-dimensional grid binned in in-ice shower energy and zenith angle (cos ). We assume that all other reducible backgrounds, such as triggers induced by noise fluctuations, wind [44], or human activity, are effectively suppressed by the applied selection efficiency.

From detector parameters to physics performance
With a full description of the detector and the expected backgrounds, the physics performance can be evaluated. We first describe the methods and approach before discussing the general set-up of the examples in Sec. 3.3, which are each described in Appendix A.

Framework Overview
Once the response of a detector is fully described in the response tensor R(ν_i, E_ν, cos(θ), E_rec, ΔΨ), it can be stored in the toise.factory. In addition to the response tensor, the factory allows the user to pass a distribution of non-neutrino backgrounds, as described in Sec. 2.4, for each specified detector. The factory is used to collect the responses of multiple instruments (e.g. optical and radio) or disjoint analysis samples of one instrument (e.g. cascades and (un)shadowed tracks for an optical detector). The detectors stored in the factory can later be retrieved for chosen livetimes. This allows the user to retrieve several stored effective area tensors and join them to produce combined sensitivity estimates. In addition, the same (set of) detector(s) can be retrieved for a series of livetimes in order to evaluate how the sensitivity evolves with longer operation. Examples of retrieving a combination of detectors from the factory are given in Sec. 3.3. As noted throughout the previous sections, the optical and radio components of a detector are conceptually different, so the radio components of the detector response are often isolated in radio-specific modules, such as radio_aeff_generation. Parameters for the radio-related modules can be defined in corresponding YAML files and then manipulated by the framework; this is further discussed and demonstrated in Sec. A.4.
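Conceptually, the factory boils down to simple bookkeeping: each component's expected event rate scales linearly with its livetime, and disjoint samples add. The class below is an illustrative stand-in for this idea, not the actual toise.factory API:

```python
import numpy as np

class DetectorFactory:
    """Toy stand-in for the factory concept: registered components hold
    per-year event rates; expectations scale with livetime and sum over
    disjoint samples."""

    def __init__(self):
        self._rates = {}  # name -> expected events per year, per bin

    def register(self, name, rate_per_year):
        self._rates[name] = np.asarray(rate_per_year, dtype=float)

    def expectations(self, livetimes):
        """livetimes: dict of component name -> years of operation."""
        return sum(self._rates[k] * t for k, t in livetimes.items())

factory = DetectorFactory()
factory.register('optical_tracks', [10.0, 5.0, 1.0])  # toy rates per bin
factory.register('radio', [0.0, 0.1, 0.5])
print(factory.expectations({'optical_tracks': 10, 'radio': 5}))
# [100.   50.5  12.5]
```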
Neutrino fluxes are multiplied with the effective area tensors and their requested livetimes to obtain the expected event rates. Frequently used neutrino flux models for diffuse (atmospheric and astrophysical) neutrino fluxes and point source flux predictions are shipped together with the framework in the diffuse and pointsource modules, respectively.
The user may define source and background fluxes contributing to a particular analysis, for example, a diffuse astrophysical neutrino spectrum with a background of atmospheric neutrinos; or a point source spectrum with the sum of diffuse astrophysical and atmospheric neutrinos as backgrounds. The atmospheric muon background described in Sec. 2.4 adds to the neutrino background. The event rates for signal and background can be used in an Asimov approach [45] to estimate the sensitivities for the set of detectors and livetimes obtained from the factory; see the following section for a more thorough discussion.

Method to estimate sensitivities
In estimating the sensitivity of a detector to astrophysical phenomena, a few quantities are typically used. For example, it is common for a detector to quote the discovery potential for a flux of neutrinos from a source or a class of sources. The discovery potential is usually defined as the minimum value required of a parameter of the astrophysical flux, e.g. the normalization, such that an experimental search would exclude the parameter being zero at the 5σ level. We can also define a median 90% confidence level (CL) upper limit on a parameter, which is sometimes referred to as the sensitivity. It should be noted that discovery potential and sensitivity are distinct: the discovery potential quantifies the ability of an experiment to discover a real signal, while the sensitivity quantifies the ability to exclude one.
Experimental searches and measurements are usually implemented as Poisson maximum-likelihood problems, where a likelihood function is constructed of terms that model the backgrounds (such as the atmospheric neutrino flux) and terms that model a signal (such as the spectral index of the astrophysical neutrino flux). The framework contains a module, multillh, and a class therein, LLHEval, for facilitating these log-likelihood-type analyses. It contains utilities for specifying fixed and free (nuisance) parameters, and for evaluating, fitting, and profiling over the likelihood space. The framework uses the SciPy limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS-B) routine for the fitting [46]. Several examples of how to use the likelihood tool to estimate performance for both diffuse and point-source-type analyses are provided in the examples in Sec. 3.3.
For example, the Poisson likelihood L in the search for and characterization of a diffuse flux of neutrinos takes the form L(θ, ξ) = Π_i λ_i^{n_i} e^{−λ_i} / n_i!, where n_i is the measured number of events in bin i, and λ_i = λ_i(θ, ξ) is the expected number of events given the parameters of the astrophysical flux θ and nuisance parameters of the backgrounds (e.g. the atmospheric flux) ξ. The best-fit signal parameters θ̂ can be found by maximizing the likelihood. Twice the logarithm of the ratio of the likelihood at the best-fit point to the likelihood under the null hypothesis (typically θ = 0, meaning no astrophysical component is present) defines the test statistic TS = 2 log[L(θ = θ̂)/L(θ = 0)]. To evaluate the significance of an observation, one typically performs pseudo-experiments (PEs) to establish the expected distribution of TS under the null hypothesis. The observed test statistic can then be compared to this background-only distribution to determine the significance of the observation. A PE is usually formed by drawing realizations of the data, Poisson-distributed around the mean event rate in each observable bin, and then performing the analysis on the pseudodata. Through signal injection, i.e. artificially adding simulated signal events to the data sample, one can also derive the sensitivity and discovery potential. For example, one can use this process to identify the largest possible flux that would still not be detected at the 90% CL. By running many PEs, one builds a distribution of 90% CL upper limits, and the median of this distribution becomes the quoted sensitivity.
Unfortunately, the process of generating many pseudo-experiments is too computationally expensive for the purpose of rapidly evaluating sensitivities in the framework. To make the estimates computationally feasible, the framework makes two simplifying assumptions. First, the framework builds an Asimov dataset, which replaces the observed event rates with their exact means [45]. Second, the framework assumes that the null test statistic is $\chi^2$-distributed, as predicted by Wilks' theorem [47]. These assumptions are justifiable because the test statistic derived from the Asimov set is asymptotically equivalent to the median test statistic derived from Poisson-sampled realizations [45, 48], and because the event rates in the bins providing most of the test statistic are large (≳ 10).
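A minimal sketch of the Asimov shortcut, assuming a simple three-bin counting model with invented expectations: the observed counts are replaced by their exact means (which need not be integers), and the median significance is read off a $\chi^2$ distribution instead of being derived from pseudo-experiments.

```python
import numpy as np
from scipy.special import gammaln
from scipy.stats import chi2

def logl(n, mu):
    """Poisson log-likelihood, generalized to non-integer (Asimov) counts."""
    return n * np.log(mu) - mu - gammaln(n + 1.0)

b = np.array([40.0, 20.0, 5.0])  # assumed background means per bin
s = np.array([6.0, 4.0, 3.0])    # assumed signal means per bin

# Asimov dataset: the observed counts are replaced by their exact means
n_asimov = b + s

# TS between the signal hypothesis (which, by construction, is also the
# best fit to the Asimov data) and the null hypothesis
ts_asimov = 2.0 * np.sum(logl(n_asimov, b + s) - logl(n_asimov, b))

# Wilks' theorem: the null TS is chi^2-distributed with 1 dof, so the
# median p-value follows directly, without pseudo-experiments
median_p = chi2(1).sf(ts_asimov)
```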

Application examples and fictive detector
Since this framework is intended to be of practical use, it ships with four examples of obtaining figures and quantities that can be used to evaluate the performance of a neutrino detector. Each example is contained in an IPython notebook available in the GitHub repository [49]. The examples are also described in Appendix A.
The numerical characteristics provided in these examples are those of a fictive detector. While the framework can describe the performance of any existing detector, the default values in the examples should not be taken as a description of an actual detector. We look forward to a data release from the IceCube-Gen2 collaboration, as well as from other collaborations, providing parameterizations for specific instruments.
In order to quantify the sensitivity of a given detector configuration in detail, the effective areas and instrument responses (cf. Sec. 2.2 and Sec. 2.3) need to be specified. As a starting point and example, the software package contains simplified detector response curves for an under-ice optical array and a radio station. These are instructive to discuss, as they illustrate the various aspects of detector performance that are required beyond effective areas alone.
The fictive optical array is based on a cylindrical instrumented volume 500 m in height and 700 m in radius. Its performance metrics have features that could be expected from a detector that is wider than it is tall, and instrumented more densely in the vertical direction than in the horizontal. The effective area for muons shown in Fig. 7 is largest for vertical directions, but has a higher energy threshold there, accounting for the smaller chance of a muon passing close to a sensor. The angular resolution for muon tracks shown in Fig. 8 degrades sharply for vertical directions for the same reason. The effective volume and angular resolution for starting events, shown in Fig. 7 and 9, are independent of zenith angle, as they depend only on the instrumentation density near the neutrino interaction vertex. The energy resolution (Fig. 10) depends mostly on the characteristics of typical energy estimators, and so is assumed to be entirely a function of muon or cascade energy. The fictive detector also includes a surface component with an area of 5 km² that can detect extensive air showers passing through the under-ice instrumented volume at zenith angles of less than 47 degrees. Fig. 11 shows the area of the under-ice detector that is shadowed by the surface detector. As a fictive radio detector, we consider a 30-station radio array, where each station is a single dipole antenna at a depth of 100 m. This was simulated using the NuRadioMC framework [50]. This is a reasonable proxy for a trigger at this depth when not including overlap between stations, i.e. full-array effects.

Figure 10. Energy resolution of the fictive optical detector. The (logarithmic) energy resolution for muons is nearly constant at low energies and increases above 100 TeV, modeling the performance of an energy estimator that is primarily sensitive to the mean energy loss rate but does not account for stochasticity. The energy resolution for starting events is inversely proportional to the square root of energy, modeling an estimator whose performance is dominated by the statistics of detected photons. Energy resolutions are assumed to be independent of zenith angle.

Figure 11. Surface veto coverage of the fictive optical detector. Trajectories from zenith angles less than 47 degrees may pass through both the footprint of the surface detector and the deep instrumented volume, creating a "shadowed" area with reduced penetrating-muon background in the throughgoing-track channel. In the remaining, "unshadowed" area, the penetrating-muon background is unaffected.
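The energy-resolution behavior described in the Figure 10 caption can be sketched with two toy parameterizations. The coefficients below (0.3 in log10-energy for tracks, a 100 TeV break, and so on) are illustrative assumptions, not the values shipped with toise.

```python
import numpy as np

def track_logE_resolution(e_gev, sigma0=0.3, e_break=1e5, slope=0.1):
    """Toy sigma(log10 E) for muon tracks: flat below a break energy
    (here 100 TeV = 1e5 GeV), rising logarithmically above it."""
    e = np.asarray(e_gev, dtype=float)
    return sigma0 + slope * np.log10(np.maximum(e / e_break, 1.0))

def cascade_logE_resolution(e_gev, sigma_ref=0.2, e_ref=1e4):
    """Toy photon-statistics-limited resolution for starting events:
    sigma proportional to 1/sqrt(E)."""
    e = np.asarray(e_gev, dtype=float)
    return sigma_ref * np.sqrt(e_ref / e)
```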

Summary and Outlook
In this paper, we have presented the toise framework, which is used to calculate the physics performance of neutrino detectors based on parameterized performance characteristics. The framework code and the related examples are available on GitHub [51]. Since the framework relies on parameterizations of detector performance, toise allows for a fast comparison of different detector designs, which is particularly helpful in the early stages of experiment planning and design. Since next-generation instruments like IceCube-Gen2 are foreseen as a combination of optical and radio technologies, the toise framework has been extended to combine different detectors and evaluate their performance together. This feature may also prove useful in the broader context of exploring joint analyses between different experiments.
The toise framework has been used by the IceCube-Gen2 collaboration to describe the science capabilities of the detector [9]. It will continue to be used for the Technical Design Report (TDR), which is currently in preparation. We encourage IceCube-Gen2 to issue a data release of the detector quantities used by toise in parallel to the TDR. This will allow the community to evaluate the envisioned instrument performance for different science scenarios. Because the framework is general, we also encourage other projects under development, such as P-ONE [10] or RNO-G [52], to consider releasing detector performance parameterizations in a way that can be utilized by toise.

Acknowledgments
toise was first developed in the context of IceCube and with an eye towards IceCube-Gen2. We thank our collaborators for adopting this framework and making suggestions for its improvement. We particularly thank Tianlu Yuan for first introducing radio detectors in toise and Daniel García-Fernández for adding NuRadioMC support. We are thankful to the broader scientific computing community for developing and maintaining the software on which toise depends, including Matplotlib [53], Numpy [54], SciPy [55], healpy/HEALPix [56], and the conda-forge packaging infrastructure [14].

A Application Examples
We describe four examples that are provided in toise that best illustrate the capabilities of the framework. Please see the notebooks in the framework for further details [49].

A.1 Ultra-High Energy Differential Sensitivity and Upper Limits
This first example describes how to obtain a differential sensitivity for the fictive detector configuration. The tutorial notebook walks through the basics of toise, starting with generating effective areas from preset detector configurations.
The easiest way to use pre-made instrument responses is through the toise.factory module. This module keeps track of known instrument configurations and creates effective areas for them on demand. As discussed in Sec. 2, the last dimension of the toise effective area tensor is the direction resolution binning. For a diffuse analysis, where we are interested in events across the whole sky regardless of their angular resolution, we will not constrain this part of the tensor for now, and so choose only a single bin in psi, which spans the entire sky from zenith angles of 0 to π:

    import numpy as np
    from toise import factory

    factory.set_kwargs(psi_bins={k: [0, np.pi] for k in ('tracks', 'cascades', 'radio')})

From this we can explicitly call the factory to construct the effective areas for various event classes. This returns a dictionary whose keys are the event classes. Currently there are:

• shadowed_tracks: tracks entering the detector from the outside that pass through the footprint of the surface veto. These have a reduced penetrating-muon background.
• cascades: neutrino interactions inside the fiducial volume (1/2 string spacing inside the outer strings) where the outgoing lepton is a) not a muon, and b) not a tau with a decay length > 300 m.
Behind each of these keys is a pair of effective areas, one for neutrinos, and one for penetrating atmospheric muons. Cascades are a special case, as we currently assume that the outer-layer veto completely removes penetrating muons (see Sec. 2.4).
To predict the sensitivity of a given detector configuration (exposure, i.e. effective area and livetime) to a neutrino emission scenario, we have to calculate event rates, which in turn requires a flux model. There are a number of flux models available in toise.diffuse and toise.pointsource. The diffuse versions predict event rates integrated over zenith angle bins, while event rates from point source fluxes are divided into rings centered on the putative source position.
A real analysis will use multiple detection channels in different detectors (e.g. years of optical array plus years of radio array), each of which is represented by a different effective area. We can create event-rate predictions for such a collection of effective areas by writing a factory function and using factory.component_bundle() to construct components that predict event rates in all detectors and detection channels for a given combination of livetimes (here, 15 years). We can use these to construct a likelihood and predict sensitivities and discovery potentials. As an illustration, we calculate the sensitivity (the median 90% upper limit assuming there is no actual signal) for the normalization of an E⁻² isotropic equal-flavor flux. To calculate the sensitivity, we create pseudodata with no contribution at all from the E⁻² component, calculate the test statistic (TS) between the best fit (astrophysical normalization of zero) and a given normalization, and find the point where the TS crosses 2.705 (the 90% CL for 1 degree of freedom):

    llh = multillh.asimov_llh(bundle.get_components(), astro=0)

    from scipy.optimize import bisect
    from scipy import stats

    # test statistic between astro = f and astro = 0
    ts = lambda f: -2*(llh.llh(**llh.fit(astro=f)) - llh.llh(**llh.fit(astro=0)))
    # fit for \Delta TS = 2.705 (90% CL for 1 degree of freedom)
    critical_ts = stats.chi2(1).ppf(0.9)
    limit = bisect(lambda f: ts(f) - critical_ts, 0, 1)
    print(limit)
    > 0.0079

This result indicates that if the flux of astrophysical neutrinos were actually zero, 15 years of observing with the fictive detector would constrain the flux of astrophysical neutrinos to E²Φ < 0.0079 × 10⁻⁸ GeV cm⁻² sr⁻¹ s⁻¹ per neutrino flavor. This is a median upper limit; depending on the fluctuations in the number of observed atmospheric background events, the actual flux limit could be higher or lower, each with probability 0.5.
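The limit-setting recipe above can be reproduced outside toise with a toy single-bin Asimov likelihood, which may help clarify what the snippet computes. The background and per-unit-flux signal expectations below are invented for illustration.

```python
import numpy as np
from scipy.optimize import bisect
from scipy.stats import chi2

b = 100.0       # toy background expectation (single bin)
s_unit = 400.0  # toy signal expectation per unit of flux normalization

def logl(n, mu):
    # Poisson log-likelihood up to an n-dependent constant
    return n * np.log(mu) - mu

# Asimov pseudodata with zero injected signal: the "observed" count is
# exactly the background mean, so the best fit has zero signal
ts = lambda f: -2.0 * (logl(b, b + f * s_unit) - logl(b, b))

critical_ts = chi2(1).ppf(0.9)  # ~2.705
limit = bisect(lambda f: ts(f) - critical_ts, 0.0, 1.0)
```

The bisection finds the flux normalization at which the TS relative to the best fit crosses the 90% CL threshold, exactly as in the toise example.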
This component-construction and limit-setting procedure is automated for a few common cases in the figures_of_merit module, including the upper limit (sensitivity), the discovery potential, and the Feldman-Cousins upper limit, which is more useful in the nearly background-free regime typical of searches for neutrinos at ultra-high energies. These can either be integrated over the entire energy range of the signal or divided into energy bins. For example, we can make a figure-of-merit object for testing sensitivity to the Ahlers & Halzen cosmogenic neutrino flux [57], or calculate a model-dependent upper limit using the full energy range. The framework also makes it straightforward to construct a differential limit, where the neutrino flux is only considered in smaller energy ranges. For example, we can construct a limit over single- or half-decade bins in energy. Note that the limit is returned in units of the Ahlers & Halzen flux [57].

    fulldecade = fom.benchmark(figures_of_merit.DIFF.ul, decades=1)
    halfdecade = fom.benchmark(figures_of_merit.DIFF.ul, decades=0.5)

This results in the detector performance shown in Fig. 12, where versions both with and without a radio array are shown.

A.2 Transient (Stacking Search) and Time Integrated Point Source Sensitivity
In this tutorial, we calculate the sensitivity to a population of transient point sources. In contrast to the diffuse tutorial, this uses the full angular and time information, reducing background levels significantly. To start, we build effective areas for the fictive detector configuration.

Figure 12. Differential sensitivity to the cosmogenic neutrino flux of Ahlers & Halzen [57], calculated for both half-decade and decade bin widths. The solid lines are for the optical detector alone, while the dashed lines also include a fictive radio detector. We can see that, especially at high energies, the radio detector contributes dramatically to the sensitivity. A major advantage of our framework is that it can easily generate such comparisons and releases the data to be visualized by the end user.

For a point-source search, there is more than the signal to consider. In the Northern sky the background is dominated by atmospheric neutrinos, while in the Southern sky penetrating muons must be considered. The following snippet leads to Fig. 14, which illustrates the expected background and signal rates. A stacking search is a weighted sum of these sources and backgrounds over all zenith bands. The background expectations are scaled by the same factor, and all event-count expectations are multiplied by the assumed livetime. A stacked transient search is similar in toise, though it works directly with fluences instead of integrating fluxes over an assumed livetime. To model the varying observation time, we can either assume that each observation lasts a certain duration (in a mock study of GRBs, this might be T90 [58], the duration over which a GRB's counts rise from 5% to 95% of the total) and integrate the backgrounds for that duration, or assume a distribution of integration times and scale the signal event counts to account for the fraction of the total fluence expected during each observation window. For simplicity, in this example we choose the first option.
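The stacking arithmetic described above can be sketched with toy numbers: under one reading of the weighting, each source contributes its zenith band's signal and background expectations scaled by its weight, and everything is multiplied by the livetime. All values and array names below are invented for illustration and are not part of the toise API.

```python
import numpy as np

# Hypothetical per-zenith-band expectations, in events per year, for a
# source of unit weight and for the corresponding background region.
signal_per_band = np.array([3.0, 1.5, 0.4])
background_per_band = np.array([20.0, 35.0, 80.0])

weights = np.array([1.0, 0.5, 2.0, 0.25])  # four hypothetical sources
source_band = np.array([0, 0, 1, 2])       # zenith band of each source
livetime = 10.0                            # assumed livetime in years

# Stacked expectations: weighted sum over sources, with each source's
# background scaled by the same weight, all multiplied by the livetime.
stacked_signal = (weights * signal_per_band[source_band]).sum() * livetime
stacked_background = (weights * background_per_band[source_band]).sum() * livetime
```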

A.3 Effective Areas
The third notebook shows how to directly plot the effective areas for varying detector configurations, using the fictive radio and optical detectors as an example. We initialize the toise factory and use the provided fictive detectors.

A.4 Radio Example
The fourth example presents a radio-specific analysis, in which we evaluate the distribution of detected events for different assumptions about the fictive instrument's performance. For the analysis, we choose a livetime of nyears years and a fictive array of nstations stations. We also define the direction resolution binning; this is the last dimension of the toise effective area tuples. Setting a lower psi_max_rad will exclude the fraction of the angular resolution CDF above psi_max_rad from the analysis. For a diffuse analysis we will not constrain it for now, and again choose only a single bin.

    nyears = 1
    nstations = 30
    psi_max_rad = np.pi
    psi_bins = np.sqrt(np.linspace(0, psi_max_rad**2, 2))

We then generate a radio effective area with the resolution parameters for our fictive detector defined in a .yaml file. The parameters can also be updated by hand.

    from toise import radio_aeff_generation

    radio_array = radio_aeff_generation.radio_aeff('radio_config.yaml', psi_bins=psi_bins)
    radio_array.set_parameter('detector_setup', 'nstations', nstations)
    radio_array.switch_analysis_efficiency(True)
    radio_array.switch_energy_resolution(True)

We now create the toise effective area tuples. An atmospheric muon background may also be created. As discussed in Sec. 2.4, the normalization of this background at high energies is still actively debated in the community, so only a simple version of it is provided in toise.

    radio_aeff = radio_array.create()
    background_muons_aeff = radio_array.create_muon_background()
    # Adding to toise framework
    factory.add_aeffs("radio_aeff", (radio_aeff, background_muons_aeff))

Next, we generate additional effective areas at trigger level (i.e. we do not account for a tighter analysis selection) and also assume perfect energy resolution, to compare with the more realistic default radio_aeff. This also demonstrates various functions of the radio classes, such as switching analysis efficiencies on and off and adding effective areas to the factory.
    radio_array.switch_analysis_efficiency(False)
    radio_array.switch_energy_resolution(False)
    radio_aeff_perfect_E_eff = radio_array.create()
    factory.add_aeffs("radio_aeff_perfect_E_eff", (radio_aeff_perfect_E_eff, None))

The effective areas generated above can now be used to evaluate and compare the distributions of events under different flux assumptions. Within the framework, diffuse neutrino fluxes are provided in toise.diffuse. For convenience, the flux plots are provided as functions in the online tutorial and may be fed with any set of previously defined effective area tensors. The essence is provided here [59].