Identifying the Theory of Dark Matter with Direct Detection

Identifying the true theory of dark matter depends crucially on accurately characterizing interactions of dark matter (DM) with other species. In the context of DM direct detection, we present a study of the prospects for correctly identifying the low-energy effective DM-nucleus scattering operators connected to UV-complete models of DM-quark interactions. We take a census of plausible UV-complete interaction models with different low-energy leading-order DM-nuclear responses. For each model (corresponding to different spin-, momentum-, and velocity-dependent responses), we create a large number of realizations of recoil-energy spectra, and use Bayesian methods to investigate the probability that experiments will be able to select the correct scattering model within a broad set of competing scattering hypotheses. We conclude that agnostic analysis of a strong signal (such as Generation-2 would see if cross sections are just below the current limits) seen on xenon and germanium experiments is likely to correctly identify momentum dependence of the dominant response, ruling out models with either"heavy"or"light"mediators, and enabling downselection of allowed models. However, a unique determination of the correct UV completion will critically depend on the availability of measurements from a wider variety of nuclear targets, including iodine or fluorine. We investigate how model-selection prospects depend on the energy window available for the analysis. In addition, we discuss accuracy of the DM particle mass determination under a wide variety of scattering models, and investigate impact of the specific types of particle-physics uncertainties on prospects for model selection.

Abstract. Identifying the true theory of dark matter depends crucially on accurately characterizing interactions of dark matter (DM) with other species. In the context of DM direct detection, we present a study of the prospects for correctly identifying the low-energy effective DM-nucleus scattering operators connected to UV-complete models of DM-quark interactions. We take a census of plausible UV-complete interaction models with different low-energy leading-order DM-nuclear responses. For each model (corresponding to different spin-, momentum-, and velocity-dependent responses), we create a large number of realizations of recoil-energy spectra, and use Bayesian methods to investigate the probability that experiments will be able to select the correct scattering model within a broad set of competing scattering hypotheses. We conclude that agnostic analysis of a strong signal (such as Generation-2 would see if cross sections are just below the current limits) seen on xenon and germanium experiments is likely to correctly identify momentum dependence of the dominant response, ruling out models with either "heavy" or "light" mediators, and enabling downselection of allowed models. However, a unique determination of the correct UV completion will critically depend on the availability of measurements from a wider variety of nuclear targets, including iodine or fluorine. We investigate how model-selection prospects depend on the energy window available for the analysis. In addition, we discuss accuracy of the DM particle mass determination under a wide variety of scattering models, and investigate impact of the specific types of particle-physics uncertainties on prospects for model selection.

Introduction
Identifying the particle nature of dark matter (DM) is one of the most important open problems in modern physics. There is a world-wide effort to build increasingly large experiments to search for scattering of the weakly interacting massive particle (WIMP) off nucleons, with the next-generation experiments cutting into theoretically favored WIMP parameter space. Direct detection experiments of Generation 2 (G2) feature increased exposures and sensitivities as compared to current experiments, and their aim is to establish a first confirmed detection of DM particles in the coming decade [1]. Once DM is detected, the immediate goal will be to infer details about DM properties, cross-compare those results between searches, and confront various theories of DM-nucleon scattering with data. Direct detection data analysis has so far focused on the simplest scattering scenarios, in which the cross section is momentum-(i.e. energy-) and velocity-independent. In these scenarios, DM effectively couples to target nuclei through a coherent spin-independent (SI) interaction (so that the scattering rate scales as the size of the nucleus squared), or through a spin-dependent (SD) interaction (where the scattering rate scales as the square of the average spin of the unpaired nucleons in the target). However, a number of studies have pointed out that direct detection can access a richer phenomenology, manifesting as a nontrivial momentum or velocity dependence of the scattering cross section [2][3][4][5][6][7][8][9][10][11][12], and potentially triggering new types of nuclear responses [12][13][14].
Though the standard SI or SD interactions often naturally dominate scattering rates, it is possible for other kinds of low-energy interactions to contribute at comparable levels, or even to dominate if the standard responses are suppressed or forbidden. For example, if the DM has a magnetic dipole moment, couplings dependent on the spin and orbital angular momentum contribute the observed event rate. These new types of interactions give rise to different nuclear responses than those appearing in the standard SI or SD cases [12,13], and have been invoked in the literature to try to reconcile apparently inconsistent results reported by some experiments [5][6][7][8][9][10][11]15].
With the complexity of direct detection phenomenology in mind, an effective field theory (EFT) approach has recently been proposed to categorize the entirety of available theories accessible through direct detection measurements [16,17]. These studies give a complete basis of non-relativistic operators allowed by the symmetries of the scattering process that may describe the scattering of DM through mediators of spin 1 or less. They also provide the correct mapping of these operators onto the appropriate nuclear responses for a wide variety of targets, introducing some novel responses that have previously been neglected in the literature and data analyses.
This study addresses future extraction of the particle nature of DM from direct detection experiments in the context of the rich array of possible scattering phenomenologies. Several studies have previously explored different aspects of the direct detection inverse problem: Ref. [18] focused on a subset of UV completions and limited to standard nuclear responses; Refs. [19,20] investigated an incomplete set of EFT operators that provoke only the standard nuclear response; Ref. [14] looked at a broad set of operators, turning on one operator at a time; Ref. [21] looked at a complete EFT but isolated a single nuclear response at a time; and Ref. [22] explored "simplified models". In this study, we focus only on elastic scattering of a fermion, and for the choice of UV completions to consider, follow Ref. [12]. We then investigate phenomenology from a wide variety of realistic models, including models that are are either theoretically well motivated (have natural UV completions), plausible (have viable UV completions), or otherwise phenomenologically distinct.
Specifically, we investigate how different DM scattering theories can be distinguished with direct detection data in the presence of significant Poisson noise, as this is the regime in which the first signals would arise. The first study to analyze distinguishability of scattering phenomenologies in a statistical way was presented in Ref. [20]; we adopt the methods presented there and apply them to a wide range of scattering scenarios. Concretely, we choose a set of competing scattering "hypotheses", compute the expected signals for each hypothesis on a variety of targets, and analyze such simulations to establish whether the underlying model can be correctly identified in light of future data. We thereby address two main questions: (a) How likely is direct detection to correctly identify the underlying theory of DM-nucleon interactions? (b) What experimental strategies maximize chances for success of such a pursuit?
To answer these and related questions, three prerequisites are necessary:

DM-nucleus scattering
The nuclear recoil energy spectrum is the number count of nuclear recoil events observed per recoil energy E R , per unit time, per unit target mass, This quantity is the observable output of most direct detection experiments. It is a function of the experimental parameters, astrophysics inputs, particle properties of DM, and nuclear properties of target material. In the above Equation, ρ χ is the local DM density; m χ and m T are the DM particle and target-nucleus mass, respectively 1 ; v min = m T E R /2µ 2 T is the minimum velocity a DM particle of mass m χ needs to produce a nuclear recoil of energy E R , where µ T = m T mχ m T +mχ is the DM-nucleus reduced mass; v esc,lab is the escape velocity from the Galactic halo in the lab frame; v is the relative velocity of the DM in the lab frame; and f (v) is the local DM velocity distribution. For the purposes of this study, we set the astrophysical parameters to the following values [23]: ρ χ = 0.3 GeV/cm 3 ; v esc = 544 km/sec in the Galactic frame; we assume that f (v) is a Maxwellian distribution, with an rms speed of 155 km/sec, and a mean speed equal to the Sun's rotational velocity around the Galactic center, v lag = 220 km/sec. The choice of scattering model enters the calculation of the recoil rate through the differential scattering cross section dσ T /dE R , as discussed in §3. The ultimate goal of direct detection analysis is to reconstruct both the normalization and functional form of dσ T /dE R from nuclear recoil data.
The total rate R of events (per target mass, per time) is an integral of the differential rate within the nuclear-recoil energy window of a given experiment 2 . The total expected number of events for exposure T obs (typically in kilogram-years) is N ≡ RT obs .

Scattering operators and responses
In the following, we restrict our attention to spin-1/2 DM. This allows us to sample DMnucleus interactions that depend on spin, but introduces (minimal) model-building limitations. Furthermore, we focus only on t-channel elastic scattering, and assume that scattering changes neither the mass nor the flavor of the incoming DM particle χ, or nucleon N .
The SI and SD cross sections are standard choices for interpreting experimental data because, given similar coupling strengths for most interactions, these responses dominate at low energy. These cross sections arise if any of the contact operators χχN N , (3.1) χγ µ χN γ µ N , or (3.2) χγ µ γ 5χN γ µ γ 5 N , (3.3) are generated by the theory. The interactions of Eqs. (3.1) through (3.3) make simple predictions for experiments: cross sections produced by Eqs. (3.1) and (3.2) scale coherently with nucleon number, and so nuclei of larger atomic mass are always more effective in direct detection searches; the scattering cross section produced by Eq. (3.3) is most effectively probed by a nucleus with large spin. Despite these generic expectations, a key point is that these leading interactions may be suppressed or only co-leading. For example, the operator in Eq. (3.2) does not arise if the DM is a Majorana particle. Exploring nonstandard nuclear responses can thus be crucial for understanding the full range of possible particle physics interactions and extracting the correct one.
As a first step away from the conventional case, we consider models with scattering mediated by the photon of electromagnetism, or by kinetic mixing of a massive gauge field with the photon [4,10,[24][25][26][27]. These models are well motivated and provide a low-mass degree of freedom that can couple the DM to the SM. Interestingly, (dark) photon-mediated models give rise to momentum-and velocity-dependent detection rates. Likewise, scattering mediated by a pseudoscalar particle is characterized by pronounced momentum suppression in the direct detection rate at low recoil energies. Ensuring the dominance of such interactions, however, requires suppressing the operators in Eqs. (3.1) through (3.3), which may indicate a tuning of UV parameters, or that the pseudoscalar is much lighter than any scalar or vector particle that can mediate the processes in Eqs. (3.1) through (3.3). If the scattering is mediated by either the (dark) photon or a pseudoscalar, the associated nonstandard nuclear responses could upend notions of which target element is most sensitive to scattering. One purpose of this work is to explore what sets of target nuclei are most effective in distinguishing models when both standard and nonstandard interactions are considered.
To evaluate prospects for direct detection experiments to distinguish different models of DM-nucleus scattering, we first choose competing hypotheses. The full list of models considered in this work is given in Table 1. It represents a balanced subset of the allowed interactions, involving a compromise between exhausting the set of operators allowed by the symmetries of the theory and restricting to the set of operators produced by relatively simple UV completions. The set of operators mostly follows the discussion of [12]. In addition to representing a diversity of models in terms of UV completions, the operators chosen in Table 1 form a representative (if not exhaustive) class of models in terms of momentum and velocity dependence, and triggered nuclear response. In this Section, we sketch examples of each of these categories of models and demonstrate how they might fall out of a UV complete theory.
The models we list in Table 1 are motivated by their UV completions. We write them explicitly in terms of Lorentz-invariant operators of a relativistic quantum field theory. To map onto low-energy (nuclear) physics appropriately, we need to relate these relativistic operators to the appropriate non-relativistic expansion. Completing this map allows one to apply the correct nuclear response, which is encoded in a form factor for each type of response. In the case of standard SI or SD scattering, this is normally written where Z, A, and J are the atomic number, atomic mass, and spin of the nucleus, respectively; µ T and µ p are the reduced DM-target and DM-proton masses, respectively; f n /f p is the ratio of DM-neutron and DM-proton coupling strengths; S p and S n are the proton and neutron spin content of the nucleus, respectively; and σ p is the cross section for scattering off a proton. The form factors F describe the dependence of the scattering on energy, defined such that F 2 SI,T (0) = F 2 SD,T (0) = 1. While F 2 SI,T (E R ) could in principle strongly depend on the nucleus in question, a Helm form factor [28] F SI, Model name Table 1. Scattering models and corresponding relativistic operators considered in this work are listed here. In model names, "MD" stands for "magnetic dipole", and "ED" for "electric dipole"; "heavy med." and "light med." refers to the mediator mass, as compared to the characteristic value of momentum transfer. SI q 2 , SD q 2 , and SD q 4 represent the pseudoscalar-mediated models; note that they do not simply correspond to q 2 (q 4 )× SI(SD); also note that SD q 2 and SD q 4 taken with two different values of f n /f p (denoted as Higgs-like and flavor-universal; values are listed in the last column) are treated as two separate models each for the purposes of our analysis in later sections. N is a nucleon field; A µ is the photon field; F µν is the EM field strength; v ⊥ is the transverse velocity; q 2 = 2m T E R is the threemomentum transfer; and Λ is a heavy-mass or compositeness scale appearing in the dipole models with a heavy mediator. The leading momentum and velocity dependence of the response corresponding to the given operator is listed (schematically) in the third column, and the corresponding response functions in the fourth column. The last column contains a list of benchmark values for f n /f p used for simulations in this work. "Photon-like" refers to the fact that ratios analogous to f n /f p are completely determined by the EM properties of the target nucleus when the mediator is a photon.
is found to be a good approximation across many nuclei [29]. Here, j 1 is a spherical Bessel function of the first kind; q 2 = 2m T E R ; s 0.9 fm; and r 2 T = c 2 + 7 3 π 2 a 2 − 5s 2 is an effective nuclear radius, with a 0.52 fm and c 1.23A 1/3 − 0.60 fm. No such universal form suffices for F SD,T , although measured values for these functions are available [30].
For more general interactions, universal form factors are insufficient, but a suitable basis that can describe all nuclear responses compatible with the low-energy symmetries has recently been found [16]. Instead of two responses (M for SI and Σ + Σ for SD), the Response Colloquial name Table 2. Some nuclear response functions relevant for DM direct detection. The E R → 0 limit for (N, N ) = (p, p) is shown in the right-most column; in this limit, we also have 4π 2J+1 W generalized scattering can trigger five responses (M, Σ , Σ , ∆, Φ ) along with the relevant interference terms. Refs. [16,17] parameterized the generalized responses as follows 3 The functions R X in Eq. (3.6) have mass dimension negative four since they arise from dimensionsix operators in the Lagrangian. The R X encode the momentum and velocity dependence arising from the interaction(s) between DM and the SM. Through the functions R X , the particle physics content is carried through in a target-independent way. Explicit forms of the R X in terms of the low-energy DM-nucleon operator coefficients c i are given by Eq. (38) of Ref. [17]. These c i are mass dimension negative two parameters that control which nonrelativistic operators O i enter the cross section. The target-specific information is carried by the W (N,N ) X (y), which we call the nuclear response functions. These can be obtained from Ref. [17] 4 ; we summarize some low-energy limits of these responses in Table 2, which we have reproduced from Ref. [12]. The dimensionless variable y ≡ m T E R b 2 /2 stands in for the nuclear recoil, where b ≡ 41.467/(45A −1/3 − 25A −2/3 ) fm is the harmonic oscillator parameter for an atom with mass number A. Of the five different responses that can arise in elastic scattering mediated by the exchange of spin-0 or spin-1 particles (M, Σ , Σ , Φ , and ∆) only the pairs M, Φ and Σ , ∆ can interfere. The response entering into "standard" SI scattering is M , while that entering into "standard" SD scattering is Σ + Σ . The Φ and ∆ responses (and their interference terms with the others) are "novel" in the sense that they do not appear in the conventional scenarios.
We now break Eq. (3.6) into expressions for each of the operators from Table 1.

Standard spin-independent and spin-dependent scattering
Given heavy mediators and order one coefficients for all Lorentz-invariant UV operators, standard SI or SD scattering as presented in Eq. (3.4) will dominate the behavior of the scattering cross section at low energies. This happens because these are the only terms not suppressed by the DM velocity or momentum. At the level of the DM-nucleon interaction, these standard scattering cross sections are produced by the Lagrangians 5 where we assume that the particle mediating the interaction has mass m med ∼ M/ √ f N , and we reserve the symbol N = n, p to represent the nucleon involved in the interaction. The corresponding cross sections for scattering off a nucleus are The SI scattering acts coherently on the entire nucleus, and consequently scales quadratically with nucleon number A, such that N N W (N,N ) M | y=0 ∝ A 2 , as in Eq. (3.4). In contrast, the SD scattering response depends primarily on the average spin of unpaired nucleons, so the resulting cross section does not scale with nuclear size. Assuming heavy mediators, if any Lagrangian gives rise to one of these responses (without momentum suppression), the cross section will be well approximated by Eq. (3.8) at low momentum transfer.
When reporting limits on the strength of these standard interactions, and performing corresponding model fits, we refer to the cross section off the proton, defined as where µ p = m p m χ /(m p + m χ ) is the reduced mass of the proton-DM system; the factor of 3 in σ SD p is included to agree with conventions. Instead of parametrizing cross sections in terms of f p , f n , and M , we rescale the DM-neutron cross section to the DM-proton cross section. This leaves two free parameters to describe the scattering strength: σ SI p and f n /f p .

Photon-mediated scattering
Some of the best-motivated models that produce nonstandard scattering are those where the mediator of the scattering is the SM photon. Electrically neutral DM with an anapole, electric dipole, or magnetic dipole moment can arise in composite DM models (where the constituents carry electromagnetic charges) [2], models with EW-neutral DM that couples to heavy, charged messengers [31], or from models where the photon kinetically mixes with a dark photon [10,32]. DM with a small electric charge can arise when the SM photon kinetically mixes with a heavy U (1) gauge boson [33] and its mass is stabilized by the Stückelberg mechansim [34]. For each of these cases, we recap simple models and their resulting momentum and nuclear response dependence. In all of these cases, the couplings to the SM nucleons are entirely fixed. This leaves no freedom to fine-tune operators against each other, but it still allows non-standard nuclear responses (for which the relative sensitivities between elements differ from the standard cases) to dominate the recoil spectrum. All photon-mediated scattering involves an interaction with the nucleon-level electromagnetic current, where we explicitly pull out the elementary charge e > 0,μ N = magnetic moment nuclear magneton is the dimensionless magnetic moment, and K µ is the sum of incoming and outgoing nucleon momentum. The scattering behavior is determined by the DM bilinear. For neutral DM, the bilinears are restricted by gauge invariance to be The magnetic and electric dipoles can arise when the DM is a Dirac fermion which is a singlet under the SM gauge group but which couples to the SM through heavy fermion and scalar messenger states [31]; they may also arise if DM is a fermionic SM singlet composite particle [2]. The scale of the heavy fermion or scalar messenger states, or alternately the confinement scale, is Λ. If CP is nearly conserved in the dark sector, Λ ED Λ MD . If certain conditions on the mass and kinetic mixing of the gauge bosons hold, the DM may pick up an electric charge [33,34], typically parametrized by a small number multiplied by the electron charge, q χ = e. Such millicharged DM interacts with J EM µ via the bilinear O µ χ,Millicharge = eχγ µ χ, (3.14) and is necessarily coupled through the SM photon. In these cases, the mediator is naturally the photon, which is exactly massless. The contact operator for direct detection scattering is where | q| = √ −q µ q µ is the magnitude of the momentum transfer in the event. The corresponding scattering is dominated by a SI term that is enhanced rather than suppressed at low momentum transfer. In general, other photon-mediated interactions of DM (such as through the nucleon charge radius [2,31] or polarizability [35,36]) can be important, but we neglect them in this work since they arise from higher-dimension operators. If the DM is a Majorana fermion, the anapole is the only operator that can give rise to a spin-one-mediated interaction with SM matter. In this case, kinetic mixing with the gauge boson of a broken U (1) plays an irreducible role. A gauge boson of mass M that kinetically mixes with the photon gives rise to an operator The gauge boson may in principle have a very small mass, M 2 q 2 , in which case the scattering would appear to be through a light mediator (compared to the momentum transfer). However, masses this small are constrained by astrophysical and collider searches [37], and we focus on the M 2 q 2 regime instead. Of course, kinetic mixing with a heavy mediator may also play a role in the preceding cases; in general, q 2 in Eq. 3.15 should be replaced with The scattering cross sections for these operators are [12,16] σ Anapole where the transverse velocity v ⊥2 T = v 2 − q 2 /4µ 2 T generates nonstandard velocity dependence. The difference between a velocity-and momentum-dependent rate is seen in the differential energy spectrum: a velocity-dependent rate has a suppressed normalization but a relatively unchanged differential energy spectrum compared to an unsuppressed rate. In contrast, a momentum-dependent rate will contain the novel feature of a characteristic energy below which the event rate rises with E R . The various rates are compared in Figure 2 below.
In addition to generating momentum and velocity dependence in the cross section, these interactions stimulate different nuclear responses than the ones probed by standard scattering. The average orbital angular momentum, L, of the constituent nucleons (via the ∆ response) is important for photon-mediated interactions. In Figure 1, we illustrate the relative size of the standard SI versus SD responses for the case of DM interacting through the anapole interaction. To highlight the fact that the responses diverge from the standard case (and, crucially, diverge in different ways for different detector elements) we show the rates on xenon and iodine targets. These elements have isotopes with an unpaired neutron and proton, respectively. The SI contribution dominates the rate for xenon because of its unpaired neutron: since the angular momentum coupling is dominantly through the proton, the overall angular momentum response of xenon is small. In contrast, iodine has an unpaired proton and so has a significant angular momentum response. This simple, well-motivated example illustrates the importance of considering non-standard interactions at direct detection experiments. No tuning was necessary to bring the novel nuclear response to the forefront. On the other hand, the novel nuclear responses play an important role only for certain targets. In particular, a xenon target experiment is still rather sensitive to the underlying interaction due to its large M response.
As in the case of standard interactions, for the purposes of our numerical analysis, we define quantities analogous to the cross section for scattering off the proton, for the heavy-mediator cases; analogously, for the light-mediator cases, We choose a reference momentum typical of direct detection experiments, q 2 ref = (100 MeV) 2 . In terms of q ref , the characteristic turn-over energy described above is E turn = q 2 ref /2m T .

Pseudoscalar-mediated DM
We next consider interactions mediated by the exchange of a spin-0 particle with at least one CP-odd vertex. There is more parameter freedom in these pseudoscalar-mediated models than in the photon-mediated examples, which allows us to isolate additional phenomenologically nontrivial responses. However, this freedom comes at the cost of having a somewhat fine-tuned UV theory. The symmetries that protect the novel responses in the photonmediated case are absent here and some sort of tuning, or hierarchy of mass scales between scalars and pseudoscalars (with the pseudoscalars being much lighter), is required so the novel responses are not dominated by the standard responses. We begin with the Lagrangian The terms involvingN γ 5 N give rise to a longitudinal SD (Σ ) response; depending on the CP properties of the DM vertex, one additionally expects either a q 2 or q 4 suppression in the rate. The cross section for scattering with at least one CP-odd vertex is given by [12,16] σ pseudoscalar The SD part of the interaction depends on only the longitudinal SD response rather than the longitudinal plus transverse SD response that is conventionally considered for SD DM interactions. For this reason, it is inappropriate to treat the cross section as ( q 2 / q 2 ref ) n × σ SD T . Nonetheless, we will use the (potentially misleading) nomenclature SD q 2 and SD q 4 when referring to these models in the context of model selection.
From the model building perspective, there is an immediate challenge with models that rely on pseudoscalar mediators to suppress a SIχχN N term: generally, pseudoscalars are accompanied by scalars in a complete theory, and CP violation at either vertex can mimic this effect. If present, a scalar interaction (from a CP-even particle or from CP violation at a vertex) will mediate a purely SI, non-momentum-suppressed interaction with nuclei, which will dominate the scattering rate. An explicit example of a model in which the f SDq 2 term can naturally compete with a SI term is given in Sec. 6 of [16].
Analogous to the case of standard interactions and photon-mediated models, when discussing current constraints and fitting pseudoscalar-mediated models to simulated data in later sections, we parameterize coupling strength via the cross section for scattering off a proton, defined as To make contact with earlier literature we write the low-momentum-transfer expansion for the Σ response, (3.26) The corresponding cross section σ p and f i n /f i p are, in principle, the two free parameters of each of the pseudoscalar-mediated scattering models.
The values for f SDq 2,4 n /f SDq 2,4 p required to generate events for our mock simulations arise as a consequence of the couplings to quarks. The quark couplings, f SDq 2,4 q , are the free parameters. Starting with a UV theory that describes these interactions, we then construct the couplings to nucleons. We define the nucleon coupling to be The pseudoscalar content of the nucleons is defined by f N (q,0) 5 ≡ N |im qq γ 5 q|N , for which we use quark masses m q from current measurements. Finally, we calculate the ratio of the nucleon couplings as [38] f SDq 2,4 For a given set of quark couplings f SDq 2,4 q , the ratio f SDq 2,4 n /f SDq 2,4 p is fixed by nuclear measurements [39]. Unfortunately, such measurements are beset by large uncertainties [38]. We emphasize that for some of the best-motivated choices of f for Higgs-like (flavor-universal) scenarios should be understood to carry significant error bars. With couplings to heavier quarks or the inclusion of two Higgs doublets, the bounds on this ratio become even less constrained. In the Type-II two Higgs doublets Models [40], for example, couplings to leptons and down-type quarks are enhanced by tan β while couplings to up-type quarks are suppressed by cot β, where tan β = v u /v d > 1 is a free parameter in the model. Thus, while |f p | |f n | is expected at the effective operator level [41], it is clear that |f n | |f p | is equally well motivated in UV-complete models. As a choice of representative benchmark values for the purposes of simulating data under these models in §4, we use:

L · S Scattering
In some models, the interaction arises, which leads to a DM-nucleus interaction in which the Φ response competes with the M response [16]. For computational tractability, we reduce the number of free parameters by making the arbitrary (though, in this case, untuned) choice f N The scattering rate is then given by [12,16] To parameterize the overall coupling strength we will use Since the Φ response is related to products of orbital and spin angular momenta of constituent nucleons (see Tab. 2), we refer to this model as " L· S-like" and note that it is another example of a model in which a novel nuclear response contributes significantly to the scattering rate.

Simulations
We have described all theory inputs necessary to compute scattering rates and simulate nuclear recoil-energy spectra for a wide range of scattering theories on a variety of nuclear targets. To simulate future data, we use a total of 14 scattering models, listed in Table 1.
Each scattering model in this Table is defined by the choice of the interaction operator (corresponding to specific nuclear responses), and the choice of f n /f p (note that some models in this list only differ by their value of f n /f p ), leaving us with two parameters that can be freely chosen for each model: the cross section for scattering off protons σ p (defined for different models in Eqs. (3.9), (3.21), (3.22), (3.25), and (3.31)), and the DM particle mass m χ . Our benchmark values for m χ used for most simulations are 15 GeV, 50 GeV, and 500 GeV, but we also explore a 7 GeV DM in a small subset of simulations in §6.2.2. The choices of σ p values used to simulate future experiments are detailed in §6.1; overall, they cover both the optimistic scenarios where the DM signal is close to the current upper limit, and the scenarios where it is just below the reach of the G2 experiments. The nuclear recoil-energy spectra produced by our scattering models are shown in Figures 2 and 3. From these Figures, it is apparent how the momentum dependence of the leading response for each model gives rise to a distinct phenomenology. In particular, for germanium and xenon targets, a positive power of momentum transfer in the cross section (e.g. q 2 and q 4 dependence for ED and MD models with heavy mediators, respectively) produces a suppression (turnover) of the rate going to smaller nuclear-recoil energy. On the other hand, models with a negative power of momentum transfer (such as those with a mediator much lighter than momentum transfer, i.e. the Millicharge, and the ED and MD with light mediators) show a sharp rise in the event rate for the lowest-energy recoils. Another striking feature is the target dependence of the spectra: as emphasized in previous literature [12,[18][19][20], the nuclear-target dependence of DM spectra will be essential for breaking degeneracies between different interaction models. We quantify this statement in detail in §6.
We consider a wide range of experimental setups, with a particular focus on G2 experiments. For the baseline analysis (results shown in §6.2), we focus on xenon (labeled as "Xe"), germanium ("Ge"), iodine ("I"), and fluorine ("F") targets, with an outlook towards some existing and proposed experiments (LZ [42], SuperCDMS Snolab [43], sodium-iodide experiments [44], and fluorine-based bubble-chamber experiments [45], respectively). Later in §6, we explore prospects for argon ("Ar"), sodium ("Na") 6 , and helium ("He") 7 targets (inspired by proposals of Refs. [1], [44,46], and [47], respectively). Furthermore, we consider somewhat modified versions of Xe and I, to quantify the impact of changes to the nuclear recoil-energy window on the results of model selection: "Xe(lo)" is equivalent to Xe with a lower energy threshold; Xe(hi) is equivalent to Xe with a higher upper end of the energy window; "Xe(wide)" is Xe with a wider energy coverage; and "I(lo)" is equivalent to I with a lower threshold. And finally, we consider a next-stage xenon experiment we name "XeG3" with exposure that reaches the irreducible neutrino background, and explore futuristic fluorineand iodine-based experiments "F+" and "I+" (with larger energy windows and exposures than those of I and F), in order to gain a sense of the ultimate reach of direct detection.
We assume perfect energy resolution for all experiments, except for F, for which we assume no resolution within the analysis window. We leave careful consideration of efficiency and experimental backgrounds for future analysis, and assume that the signal is entirely DMdominated. We thus adjust the exposures of Xe and Ge to reproduce the exclusion curves presented in Ref. [1], assuming zero background events.  Figure 2. Theoretical nuclear recoil energy spectra for the first 8 models of Table 1 ("set-I" models used for our baseline analysis; see also discussion of models in §5), shown for a variety of G2-like experiments (described in Table 3). Spectra are calculated for a 50 GeV DM particle, with scattering cross sections set close to their current upper limits.  Figure 3. Same as Figure 2, but for models of "set II" discussed in §5 (note that the MD and ED with heavy mediators are omitted from this Figure, for clarity, even though they are treated as members of set II). All of the models here have a suppressed scattering rate at the lowest nuclear-recoil energies, and a corresponding turnover in the event-energy spectrum.
In order to statistically capture the impact of Poisson noise on future data analyses, we create a large number of simulations under each of the scattering hypotheses (allowing for independent realizations of the noise), following Ref. [20]. We repeat this procedure for all experiments in Table 3, for all benchmark DM particle masses and σ p values (set to the current upper limit for our baseline analysis). Examples of simulated spectra for three scattering models are shown in Figure 4 to get a visual sense for the relevant level of noise.  (9) 3-100 1200 Table 3. Mock experiments considered in this work. The efficiency and the fiducialization of the target mass are included in the exposure. The first group of experiments is used for most of the simulations in this work and is chosen such to be representative of the reach of G2 experiments for Xe, Ge, I, and F. The exposure for Xe and Ge is chosen to agree with the projected exclusion curves for LZ and SuperCDMS presented in Ref. [1]. The second group of experiments is used to test impact of the energy window on prospects for model selection; note that only the energy window differs from the corresponding experiments of the first group. The last group represents futuristic experiments, where XeG3 reaches the level of atmospheric neutrino backgrounds.   Table  1, on a Xe experiment described in Table 3, for a 50 GeV DM particle, with a cross section set to current upper limit, calculated in §6.1. Error bars include only the Poisson noise. For this work, we create a large number of simulated spectra such as the ones shown here. For illustration purposes only, we bin the events according to their energy; we perform all analyses on unbinned data.
To simulate a recoil-energy spectrum observed with a single experiment under a chosen scattering model M, given a set of its parameter values Θ (m χ , σ p , and f n /f p ), we use the following procedure. For each simulation, we first draw a number N from a Poisson distribution whose mean is the expected number of events N . We then assign energies to each of these N events following the probability distribution for observing a single event at a given energy, This way, a single "data set" (list of nuclear-recoil energies {E R }) is obtained, including Poisson noise. Repeating this procedure, we create a large number of simulations under scattering model M (defined by the choice of dR/dE R ), for a fixed choice of its parameter values Θ, and thus obtain a statistical representation of possible outcomes of future observations. Before moving on to describing the analysis of the simulated data sets, we again emphasize the importance of this statistical approach: given the absence of a confirmed DM signal from present-day experiments, we already know that the limitations from Poisson noise may severely impact the prospects for identifying the right underlying interaction [20]. Making a statement about probable outcomes of future analyses from a single data realization is thus not informative; it is an essential feature of our approach to take this into account and evaluate possible experimental outcomes in a probabilistic sense.

Analysis
The choice of analysis framework adopted here is Bayesian inference, and in particular, Bayesian model selection. We analyze each simulated energy spectrum either by itself or in combination with spectra from other experiments. The main computational step is evaluation of the posterior probability P(Θ|{E R }, M ), as a function of parameters Θ of the model (hypothesis) M, given simulated nuclear recoil energies {E R }, In our case, Θ is a vector of the model parameters: m χ , σ p , and, optionally, f n /f p ; note that f n /f p is held fixed in most of our analyses, unless otherwise noted (specifically, it is only a free parameter for the purposes of §6.4). The nuclear response functions are fixed. All astrophysical parameters are fixed to the values in §2. 8 The posterior is reconstructed as a product of the likelihood L({E R }|Θ, M) (the probability of data 9 , given theory) and the prior p(Θ|M). We choose wide priors on all parameters, in order to remain as agnostic as possible 10 . The normalization in Eq. (5.1) is the evidence of model M, i.e. the integral of the likelihood over the entire prior parameter space, which can be thought of as a measure of how well model M fits the data overall. The likelihood in Eq. (5.1) takes into account the energy distribution of all recoil events, where P and P 1 are defined in Eqs. (4.1) and (4.2), and the product is over all observed events.
(In the case of the F experiment that has no energy resolution, the likelihood is taken to be only the Poisson factor.) To reconstruct the posterior probability on the relevant parameter space, we use PyMultiNest [48] (a Python wrapper of the MultiNest multi-modal nested sampler [49]), with the following parameters: efr= 0.3, tol= 0.01, and 2000 live points.
Within the Bayesian framework, the probability the data {E R } assigns to the model M j (where M j is one of the models listed in Table 1) is given by the ratio of its evidence to the sum of evidences of all the competing hypotheses, Data are said to "prefer" a given model if the corresponding evidence ratio evaluated from Eq. (5.4) is high. To summarize our results in a succinct manner in the following Section, we introduce a (somewhat arbitrary) label of "successful model selection" for the case where the right underlying model is assigned more than 90% probability; in this case, we say that the right model was "confidently selected." Conversely, data is "indecisive" if it assigns about equal probability to several competing models.
For the purposes of model selection, we divide the 14 scattering models listed in Table 1 into two groups of competing hypotheses, "set I": SI, SD, Anapole, Millicharge, MD (light med.), MD (heavy med.), ED (light med.), ED (heavy med.); and "set II": SI q 2 , SD q 2 (Higgs-like), SD q 2 (flavor univ.), SD q 4 (Higgs-like), SD q 4 (flavor univ.), L · S-like, MD (heavy med.), and ED (heavy med.). We compare evidences of models in set I against each other (for nuclear recoil spectra simulated under those models), and do the same separately for set II. This division of models is physically motivated: models of set I include the two standard interactions, supplemented by six photon-mediated scattering scenarios, and are representative of the theoretically best-motivated spectral features that may arise in recoil spectra (driven by specific momentum dependence of the scattering interaction at hand). Moreover, set I comprises a smaller number of hypotheses than the entire Table 1, and its statistical analysis is more computationally tractable. Most of the model-selection results presented in §6 focus on set I. On the other hand, models of set II incorporate all models of Table  1 that show a turnover in recoil rate at low recoil energies, and additionally isolate nuclear responses that are subdominant or present in different linear combinations in set I. In addition to MD (heavy med.) and ED (heavy med.) which overlap with set I, they also include the pseudoscalar-mediated and L · S-like scattering. This subset expands the coverage of modeling space and lets us investigate distinguishability amongst a pool of models that give rise to qualitatively similar spectral features.

Results
As a first step, in §6.1 we calculate current upper limits of the scattering cross section for each model considered in this work, and make predictions for the maximum number of recoil events G2 experiments can hope to see for each model. We then present the results of model selection in §6.2. In §6.3, we examine the projected quality of DM mass measurements under different models. In §6.4, we discuss the impact of uncertainties on the f n /f p parameter on both mass reconstruction and on the success of model selection. In order to provide guidance for future efforts, we investigate the dependence of the recoil signal and the prospects for model selection on the choice of recoil energy window used for data analysis, in §6.5.

Exclusions and expectations
In order to simulate DM signals just beyond the reach of current experiments, we first estimate current upper limits of the scattering cross section for each model considered in Table 1. The corresponding exclusion curves are shown in Figure 5. They are calculated under the assumption that the number of DM-induced events has not exceeded the number of expected background events in LUX [50], SuperCDMS [51], CDMSlite [52], PICO [53], and KIMS 11 [54]. In Figure 5, exclusion limits from individual experiments are combined into a single curve by setting the upper limit equal to the most stringent constraint from the individual experiments. For the models where limits are available, we have checked that our limits qualitatively reproduce the more exact ones available from individual collaborations. However, even a rough estimation of upper limits suffices for our purposes: we simply use it to inform a choice of benchmark cross sections for simulating the data that upcoming and future experiments might hope to see. Set II SI q 2 SD q 2 (Higgs-like) SD q 2 (flavor-univ.) SD q 4 (Higgs-like) SD q 4 (flavor-univ.) L · S ED (heavy med.) MD (heavy med.) ; models on the right, "set II", are those that show a turnover in the recoil spectrum (from top to bottom, on the far right: MD (heavy med.), SD q 2 (Higgs-like) and SD q 2 (flavor-univ.) (these two overlap at this point, but SD q 2 (Higgs-like) is the higher line at lower masses), SD q 4 (Higgs-like) and SD q 4 (flavor-univ.) (the last two overlap at this point, but SD q 4 (flavor-univ.) is the lower line at lower masses), L · S-like, ED (heavy med.), and SI q 2 ). Vertical dotted lines mark the three main DM-mass benchmarks used for the simulations of our baseline model-selection analysis presented in §6.2.1.
We use the upper limits for σ p at the three benchmark DM masses denoted with dotted lines in Figure 5 (15 GeV, 50 GeV, and 500 GeV) to create simulations used for our baseline model-selection analysis in §6.2.1; the same benchmarks are also used when exploring the 11 For KIMS, which is not near-background-free like the other experiments, we obtain the "exclusions" by setting the number of DM-induced recoils to be less than √ N observed , i.e. we assume the signal is not larger than the Poisson noise of the background events.  Table 4. Expected number of nuclear-recoil events for our mock G2 experiments (as defined in Table  3), for each of the scattering models discussed in this work. Cross sections for scattering are set to their current upper limits, presented in Figure 5. The three entries in parentheses correspond to 15, 50, and 500 GeV DM masses, in order.

Model
accuracy of mass reconstruction in §6.3, and the uncertainty on f n /f p in §6.4. In §6.2.2 where we explore targets that favor detection of low-mass DM particles, we introduce another benchmark of a 7 GeV DM particle mass, and use appropriate upper-limit values for σ p . In §6.2.3, we analogously compute projected upper limits assuming that G2 Xe, Ge, I, and F experiments do not see a signal, and simulate data using those limits in order to investigate the ultimate reach of futuristic experiments. To illustrate the statistical sample that represents the most optimistic G2 output under a particular model, we list the number of expected events for our main mock experiments in Table 4; this is representative of the typical numbers of events in simulations used for the baseline model-selection analysis of §6.2.1. To show that the three main benchmark DM masses we choose for our baseline simulations are representative of the statistical sample that might arise for a wide range of DM particle masses, we show the total number of events each one of the experiments in Table 3

Model selection
In this Section, we present the key results of this work: we address the question of how likely it is that direct detection experiments will correctly identify the right underlying scattering theory from a wide variety of competing hypotheses. In addition, we try to identify the particle physics information that we can robustly extract from the same data.  Table 1) on a variety of G2 targets with experimental parameters listed in Table 3. Cross sections are set to their respective upper limits; see Figure  5. This Section is organized as follows. In §6.2.1, we consider Xe, Ge, I, and F experiments from Table 3 for our baseline model selection. This choice covers all major direct detection technologies and considers target elements that would contribute the majority of exposure in a given experiment 12 . In §6.2.2, we additionally evaluate model selection prospects for other targets: Ar, Na, and He, for a specific subset of scenarios. Finally, in §6.2.3, we explore the reach of futuristic experiments, some of which are projected to tap into the irreducible neutrino background. In all simulations used in §6.2.1 and §6.2.2 the signal is set to the current upper limit (cross sections are chosen to match exclusion values of Figure 5), and those in §6.2.3 have a signal at the projected exclusion limit derived assuming a non-detection on G2 Xe and Ge.

Baseline model selection
First, we perform model selection taking simulations created under set I models (discussed in §5) and treating all models from that set as competing hypotheses. After that, we perform analogous hypothesis testing on set II. In order to gain a sense for the complementarity of different targets, we analyze simulated data corresponding to individual experiments followed by combined analyses of several experiments at a time. The main results of this Section are shown in Figures 8, 9, and 10. These Figures only show the most representative cases, for a 50 GeV DM particle; a more complete set of baseline model-selection plots can be found in Appendix A. Each panel in Figure 8 corresponds to simulations under a single underlying scattering model (where σ p is set to its current upper limit for a chosen DM mass). Each column represents an experiment or a combination of experiments (denoted on the x-axis). Each horizontal colored line represents a single realization of the data under that same model (created for the same value of mass, cross section, and f n /f p ). The posterior probability distribution and the corresponding evidence was reconstructed eight times for each simulation (once for each of the competing hypotheses in a given model set); these results are then used to calculate the probability of the right underlying model by evaluating Eq. (5.4). The probability the data set assigns to the right underlying model (given a pool of 8 alternatives) is shown on the y-axis; the spread along the y-axis is due solely to Poisson noise. In Figure 8, we show only the results obtained from simulations of standard SI scattering, millicharged DM, and electric-dipole scattering through a heavy and a light mediator. These constitute distinct cases in the following sense: anapole (on a target with low intrinsic spin) and SD spectra are similar to SI, while the dipole models with a mediator of the same mass have a similar nuclear recoil spectrum. Millicharge is the only model that stands alone in a phenomenological sense, since it has the steepest recoil-energy dependence.
By examining these plots (and the corresponding model probabilities), we can immediately see that the probability for successful model selection is not a monotonic function of the total number of observed events. Instead, model selection depends on a complex interplay of several factors: the type of underlying model, DM mass, and the complementarity of targets used to measure the scattering signal. For example, the only scattering model that is confidently identified regardless of the DM mass, when any two or more experiments are jointly analyzed, is millicharged DM. For all others, the outcome can be significantly different. Similarly, different targets show a varied level of success for different models and masses.
We make two additional observations from this Figure. Firstly, by considering the 3 leftmost columns in each panel of Figure 8, we see that Xe and Ge alone (as realized in experiments like the LZ, Xenon1T, and SuperCDMS) are generically not able to pick out the right underlying scattering process when analyzing data in an agnostic way (i.e. assuming equal prior probabilities on each scattering model); this statement holds for intermediate and high DM masses (see also Figure 23 of Appendix A). For lighter DM particles, prospects are more optimistic only for ED and MD models, especially with a light mediator ( Figure  23). Even in these cases, however, it is only when Ge and Xe data are jointly analyzed that the probability of successful model selection becomes high. Secondly, and crucially, addition of either an iodine or a fluorine target with relatively modest exposure (corresponding to about 200 kilogram-years for our I experiment, and about 90 liter-years on C 3 F 8 for our F experiment) improves the results dramatically, even despite the absence of energy resolution on fluorine. Fluorine shows poor chances for model selection on its own, and iodine alone is only successful for some of the models (only in the case of large DM mass; see Figure 23 of  Figure. Each of the columns represents analysis of simulations from either a single experiment, or a joint analysis of several experiments (denoted on the x-axis). Each panel corresponds to simulations for a single underlying scattering model (indicated in the header) for 50 GeV DM mass; σ p is set to its current upper limit. Each horizontal colored line represents a single realization of the data under that model, and the spread of the lines on the y-axis is due to the Poisson noise. Evidences for all 8 models from set I are compared to compute the probability of the right underlying model, shown on the y-axis. A pile-up of many simulations at more than 90% probability signifies a high chance of selecting the correct underlying model. This chance is generally low if only Xe and Ge are considered, but greatly improved if F or I targets of modest exposure are added to the analysis. See Figures 21, 22, and 23 for related results for other models of set I and for other DM particle masses.
Appendix A). However, these targets are highly complementary to Xe and Ge for the purposes of model selection-the probability of selecting the correct model when data from Xe, Ge, and either I or F is analyzed jointly is much greater than when Xe and Ge are considered on their own. In other words, , such that model i and j are indistinguishable using only Xe and Ge, we find that including a F or I experiment can break the degeneracy, so that . We now discuss the first conclusion in more detail. If we look at the shapes of the recoil spectra for set I, as shown in Figure 2, we see that certain subsets of models look similar on these targets within the relevant energy windows. These are the models with a similar energy (momentum) dependence of the dominant scattering response R(E R , v) of Eq. (3.6) (see also the middle column of Table 1). For SI, SD, and Anapole, the dominant response on a target with a small net spin (such as Xe and Ge) does not have any additional energy dependence, so R does not vary strongly with E R . For the light-mediator dipole models, there  Figure 9. Same as Figure 8, except that the vertical axis shows the sum of probabilities for all models that have the same momentum dependence. In contrast with prospects for selecting the right underlying model (shown in Figure 8), Ge and Xe targets are able to confidently identify momentum dependence of the underlying interaction. The simulations used here are for DM particle mass of 50 GeV.
is an additional ∼ 1/E R enhancement of the recoil spectrum at low energies. For the heavymediator dipole models, there is a turnover feature due to an additional E R suppression. Millicharge DM has a steep enhancement that goes like ∼ 1/E 2 R . These dependences dictate the level of model degeneracy we observe in results for Xe and Ge. Combining the data from these two experiments does not alleviate this degeneracy. For example, the probabilities of the right model pile up around 33% for SI, SD, and Anapole (as expected from this three-fold degeneracy), and around 50% for the dipole models with the light mediator, and for the dipole models with a heavy mediator. When we examine the probabilities of individual models, we confirm this picture; for example, electric-and magnetic-dipole models with a light mediator always appear to be "false positives" to each other in the sense that the model probability in simulations created under one of the two models is roughly evenly distributed between the correct model and its counterpart. These observations motivate a slightly different presentation of the results. In Figure  9, the simulations and fits are the same as those used for Ge and Xe in Figure 8, with one difference in the presentation: the y-axis now represents the sum of probabilities for the several models that share the same energy dependence of the scattering rate. In other words, we sum the probability for SI + SD + Anapole, and also for light-mediator models (including dipoles and the Millicharge), and finally for the heavy-mediator dipoles. This Figure shows that Xe and Ge with G2 capability are able to confidently select the right momentum dependence "class" of the model at hand, for a direct detection signal close to the current upper limit. If data from the two experiments are jointly analyzed, this outcome is close to 100% likely, despite appreciable Poisson noise in the data. Therefore, even without considering iodine or fluorine targets, it is likely that Xe and Ge experiments will correctly select the mediator mass and rule out models with "heavy" or "light" mediators, depending on the results of model selection, if the signal is confidently detected on both targets. This would allow us to narrow the set of operators under consideration, but would not provide sufficient information to identify the correct UV completion.
Let us now examine the "complementarity" of I and F experiments with Ge and Xe in more detail. Ge and Xe can provide a handle on the momentum dependence of the underlying model owing to their energy resolution. However, germanium and xenon are dominated by isotopes with relatively simple nuclear structure, and signals arising from phenomenologically similar models (for instance, SI, SD, and Anapole) are usually indistinguishable on these targets. However, both fluorine and iodine are dominated by isotopes with large spin. As a result, they yield widely different (total) number counts for models that appear to have the same spectral shapes on xenon and germanium (see, for instance, SI, SD, and Anapole in Table 4, and in Figure 6), breaking degeneracies between individual models within the same momentum-dependence "class". In some cases, even a small event count (or a null result) on these targets provides enough complementary information to result in highly successful model selection, when considered in combination with other data.
Finally, since we have concluded that it is likely that the energy dependence of the response can be deduced in a robust manner using future data sets, we now examine a set of models that all display a similar feature in the recoil spectrum. In particular, we examine the 8 models with a turnover in the spectrum at low recoil energies; such models comprise set II, as discussed in §5. The results of this analysis are shown in Figure 10.  Figure 25 of Appendix A), we see that the prospects for selecting out the correct model from amongst 8 possibilities with the same spectral feature is excellent, given a signal close to the current upper limit, and if data from germanium, xenon, and either fluorine, or iodine targets are jointly analyzed. The only exceptions are the SI q 2 and the electric-dipole scattering through a heavy mediator, which remain degenerate with each other. Since momentum suppression is a common feature of a wide class of nonstandard models (indeed, the momentum suppression is typically what ensures their subdominance compared to the standard models), being able to identify the single correct model from an agnostic analysis of many alternatives is an impressive and valuable capability of G2-like experiments.

Other targets
In this Section, we briefly consider the model selection capability of additional nuclear targets. We start by examining an argon target with G2 capability, as defined in Table 3. Due to its high energy threshold as compared to most other experiments (30 keV), Ar would not capture most of the telltale spectral features that would give a unique handle on the underlying interaction. However, because of the wide energy window such a target can achieve (up to 200 keV or more), Ar may contribute statistical information about the high-energy end of  Figure 10. Same as Figure 8, but for models of set II (discussed in §5) compared against each other. Model selection prospects for identifying the right interaction from amongst many alternatives that all produce a turnover in the recoil energy spectrum are excellent, for a signal just below the current limit, when G2 xenon and germanium targets are combined with a modest exposure on fluorine or iodine. See Figure 25 of Appendix A for the rest of the related plots.  Figure 11. Results of model selection preformed amongst set-I models, analogous to Figure 8, but for argon-based G2 experiment defined in Table 3, for a 500 GeV DM particle. the recoil spectrum for large DM masses. For this reason, we examine a mock version of this experiment on simulations for 500 GeV DM, and only perform model selection amongst set-I models, in analogy with our baseline results in §6.2.1. The results forming a representative set of simulations are shown in Figure 11. From this Figure we see that Ar produces a moderate improvement when combined with Xe and Ge. On its own, it has model-selection power comparable to that of Xe, with slightly better prospects for heavy-mediator dipole model, and slightly worse for the light-mediator model. As expected, Ar performs best for models with  Figure 12. Results of model selection preformed amongst set-I models, analogous to Figure 8, but for He, Ge, and Na experiment defined in Table 3, for a 7 GeV DM particle.  Figure), where the number of expected events at high energies is maximal. In these cases, the highenergy lever arm compensates for the high energy threshold. However, the complementarity of Ar with Xe and Ge, despite its superior exposure and high-energy capabilities, does not match that of F or I (see Figure 23 in Appendix A for comparison).
We also examine the model-selection capability of targets that are kinematically favorable for detection of very low-mass DM particles that are otherwise below the sensitivity of most targets. For this purpose, we create simulations for a 7 GeV DM particle, the lowest DM mass considered in this work. We simulate spectra on helium, sodium, and germanium targets as represented in Table 3, and we once again repeat the model selection exercise using set-I models as competing hypotheses. Results for a representative subset of simulations is shown in Figure 12. We also draw attention to Figure 7, showing the number of expected events for low-mass DM particles on these targets. This gives a sense of the statistical sample on hand for a single simulated data set. Figure 12 shows that the single most successful experiment is Ge, partly owing to its low energy threshold. He and Na do not seem to have much model selection power by themselves for the model simulations examined in this Figure. However, there is once again a degree of complementarity amongst these three targets that shows as a visible improvement in model selection success probability when several data sets are combined.

Ultimate prospects
To gain a sense for the ultimate model-selection capability of direct detection, we generate a new set of simulations with experiments that reach the irreducible background of atmospheric neutrinos. For these simulations, the signal is assumed to be just below the thresholds of G2 Ge and Xe (described in Table 3). The following experiments are considered: a nextgeneration xenon-based experiment "XeG3", a large-exposure fluorine experiment "F+", and a low-threshold next-generation iodine experiment "I+" (all described in Table 3). We analyze these simulations in the same way as for our baseline analysis of §6.2.1, for set-I scattering models only. The results are shown in Figure 13. The takeaway point from this Figure  confirms results reported in previous studies [20]: if a signal is not seen at G2 experiments, prospects for model selection with future experiments are slim. Addition of fluorine and iodine targets helps, but a large exposure on these targets, combined with data from liquid-noble (and other) experiments might be necessary to have a chance of identifying the interaction producing any putative recoil signals.

Reconstruction of DM particle mass
Since the DM particle mass is one of the key DM properties of interest, and likely to be the first measurement compared amongst different DM searches, we examine in more detail the problem of mass reconstruction under different underlying scattering models. First, we analyze the accuracy of mass measurements in an optimistic scenario where a signal is just below the current limit, assuming data from germanium, xenon, and fluorine targets are jointly analyzed. We show marginalized posterior probability distributions for the mass in Figure 14, derived from a joint analysis of data from Xe, Ge, and F experiments of Table 3. The simulations used for this Figure constitute a phenomenologically representative subset of the simulations of our baseline analysis discussed in §6.2. We show the posteriors obtained assuming the correct scattering hypothesis only. From this Figure, we see that the mass reconstruction accuracy for all models we consider is similar to that expected from SI and SD interactions, if the correct model is fit to the data. The accuracy is degraded for higher masses, as usual, due to the degeneracy between mass and cross section in this regime.
On the other hand, we confirm results from Ref. [20]: if a wrong model is fit to data, mass estimates can be severely biased, inconsistent amongst different experiments, and have poor accuracy. To illustrate this, we examine the scenario where the standard SI model is fit to recoil-energy spectra generated by another ("true") model. This reflects a likely choice for parameter estimation in the DM direct detection discovery phase, that data will be fit with SI scattering; however, the key points we make based on this scenario are more general in scope.
In Figure 15, we again select a representative subset of our baseline simulations; this time, we only examine fits of the standard SI interaction to scattering signals arising from non-standard operators (specifically, magnetic-dipole scattering through a light mediator, and electric-dipole scattering through a heavy mediator, to cover models without and with the turnover at low-energy recoil rates). The input DM mass is 50 GeV, denoted with a red vertical line. Plots in the LHS column of that Figure correspond to simulations on Xe, and those on the RHS to Ge. We conclude that, when a steep recoil-energy spectrum (from a lightmediator model) is fit with a standard SI interaction, the posteriors are narrow but biased towards low DM masses. This happens because low masses drive a steeper energy spectrum, making the SI model a better fit to the "enhanced" rate at low recoil energies produced by light-mediator scattering (see the top row of Figure 15). Even though inconsistency amongst measurements with different targets might be a hallmark of a wrong scattering hypothesis (compare LHS and RHS panels of the top row in this Figure: biases are different on different targets), it is important to keep in mind that such a cross check may not be available if the signal is only strong enough to be seen with a large exposure (the case of Xe here). On the other hand, when a simulation with a turnover feature (here: from a heavy-mediator dipole model) is fit by the standard SI interaction, the posteriors are artificially wide, with a smaller bias (see the middle row of plots in Figure 15). By eye, however, it is not possible to determine that the SI model was a bad fit to data; see the bottom row of Figure 15, while model selection with Ge and Xe is able to pick out only the right momentum-dependence class (see bottom right panel of Figure 9). In conclusion, model selection is an important step that should ideally precede parameter estimation in future data analyses, as fitting of the wrong scattering model can severely impact both the accuracy and precision of key DM parameter measurements.
6.4 f n /f p uncertainty As discussed in §3 and emphasized in, for example, Refs. [38,55], there are large modeling uncertainties in the choice of the ratio of neutron to proton coupling for some of the models Lmax (fit SI Higgs) Simulated data Figure 15. Top and middle row: Examples of marginalized posteriors for the DM mass, for a subset of our baseline simulations. The fitting model in all four panels is the standard SI interaction, while the model used to create simulations is magnetic-dipole scattering with a light mediator (upper row), and electric-dipole scattering with a heavy mediator (middle row). The LHS and RHS panels correspond to simulations for a xenon and germanium targets, respectively. These posteriors demonstrate massestimation biases when a wrong model is fit to data; the input DM mass is 50 GeV, shown with a vertical red line. Bottom row: an example of simulated recoil-energy spectrum, shown together with the true underlying model (blue solid), and the fitting model (red dashed) for maximum-likelihood values of the fitting parameters. These simulations correspond to the posteriors shown in the middle row of this Figure. By eye, SI does not look like a bad fit to these data, for either xenon (LHS) or germanium (RHS). considered here. It has been pointed out in the literature that direct detection data are not very likely to be able to reconstruct this parameter [56], or even that fine-tuning such a parameter can upset expectations of relative experimental sensitivities [57]. 13 We thus investigate the impact of this uncertainty on the DM mass measurements and on model selection in this Section.
First, we investigate how the uncertainty on f n /f p impacts the extraction of the DM particle mass. In Figure 16, we show the marginalized 2D posterior for DM mass versus f n /f p for our baseline simulations of several scattering models, where data from Xe, Ge, and F experiments is jointly analyzed. When performing fits to recover these posteriors, we let f n /f p vary as a free parameter between -10 and 10 and fit the right scattering model (with  Figure 16. Examples of the posteriors for a joint analysis of Ge+Xe+F simulations; the right scattering operator was used for the fits. The simulations were created using (left to right): the standard SI model, the standard SD model, and the Millicharge model (f n /f p values used for these simulations are listed in Table 1). The red X denotes the input values. When performing posterior analysis, the f n /f p was left as a free parameter with a wide prior (from -10 to 10), in addition to m χ , and σ p (set to the current upper limits in the simulations). In the SI case, f n /f p is completely unconstrained, in SD case, the posterior is bimodal and there is a degeneracy with the sign of this parameter, and in the Millicharge case, the right sign is picked out, but the posterior is still broad.
the right operator and scattering rate, described in Table 1, but with a free f n /f p ) to the simulated data. For simulations created with the SI model, f n /f p is completely unconstrained; for the SD simulations, the posterior is bimodal and there is a degeneracy of the sign of this parameter; and for the Millicharge simulations, the right sign is picked out, but the posterior is fairly broad. 14 However, in spite of introducing this additional degree of freedom, there does not seem to be a significant impact on the measurement of the DM mass; f n /f p and m χ thus do not appear to be degenerate with each other, if the right scattering hypothesis is chosen. Second, we examine how freedom in f n /f p might affect model selection. For this purpose, we take a small subset of our baseline simulations described in §6.2.1-simulations created under the standard SI model, and under the Anapole model-and analyze them in a new way. For this analysis, we pick the following 10 models from Table 1 as competing hypotheses to fit to the simulations: SI, SD, the four dipole models, Anapole, SI q 2 , SD q 2 , and SD q 4 . However, unlike in previous analyses where we held f n /f p fixed to the correct value when performing model fits, this time, we allow f n /f p to be an additional free parameter in the range between -100 and 100 (note that this is an even wider prior range than that used above to investigate impact on mass determination), for all models where the choice of this parameter is not fixed. 15 The presentation of the results in Figure 17 is as before. We jointly analyze data from Xe, Ge, and F experiments for a 50 GeV DM particle and a signal at the current upper limit, a combination that has close to 100% success rate in our baseline analysis. From Figure 17 and by examining the associated model probabilities, we see the following. First, model selection is significantly degraded for simulations under SI. However, most of the probability in this case is actually only distributed between SI and SD models. For Anapole, the situation is almost unchanged from the baseline analysis: this model is confidently identified even allowing such a large modeling uncertainty.
The results presented in this Section only cover a small subset of possible underlying models and DM masses. However, their implications are very optimistic for future data  Table 1 (SI, SD, four dipole models, Anapole, SI q 2 , SD q 2 , and SD q 4 ) treated as competing hypotheses, with f n /f p fit as a free parameter, for all but the photon-mediated models. As compared to our baseline results, model selection still seems robust to allowing this additional degree of freedom, at least for SI and Anapole shown here.
analysis-they indicate that selection of the right underlying model may be robust to the uncertainty on the ratio of nucleon couplings. We emphasize that this conclusion holds true even if there are values (or localized portions of the parameter space) for f n /f p for which data can be well fit by a "wrong" model. The evidence calculation takes into account the entire prior space, and the existence of such "special islands" of f n /f p values does not affect the results of Bayesian model selection.

Experimental designs
So far, our analysis has demonstrated the importance of combining a variety of nuclear targets (including germanium, xenon, fluorine, and iodine) to correctly identify the type of DMnucleon scattering interaction using direct detection. In this Section, we examine how the statistical sample (i.e. the number of observed recoil events) and the quality of model selection depend on the recoil-energy window available for data analysis, for different targets. Figures 18 and 19 show how the number of expected events changes as a function of the lower and upper energy threshold, respectively, for selected models and DM masses. Firstly, because the number of events flattens out at high energy (even for 500 GeV DM particles, and for models with a long higher-recoil-energy tail presented in Figure 19), little is gained by looking for recoils in the range > 200 keVnr, regardless of the scattering model. Secondly, for most targets there are a variety of models whose strongest signature is at low recoil energies, and especially for models where the mediator is lighter than the momentum transfer; this feature is also more pronounced for low DM masses (as illustrated in Figure 18). Thus, we conclude that a wide energy coverage (up to ∼200 keVnr) and especially low-energy thresholds (below ∼1 keVnr) are generally beneficial for recovering particle-physics content from direct detection data.
While these two Figures provide us with a sense of how the event counts change as a function of the energy windows, they do not directly translate into implications for the success of model selection; as we have seen in 6.2, success of model selection is not just a function of the number counts of events, but depends on the interplay of several factors. For this reason, in order to provide guidance for future experimental designs, we investigate the  Table  3 except E min R . Models shown here belong to set I discussed in 5; statistical gain of extending the analysis window to lower thresholds is most evident for low masses, so here, we show results for 15 GeV DM particle.
impact that changes to the energy window have on the results for model selection for two of our baseline experiments: Xe and I. For Xe, we consider three modified versions of this experiment: "Xe(lo)" with a lowered low-energy threshold, "Xe(hi)" with a higher high-energy threshold, and "Xe(wide)" with a wider energy window "Xe(wide)". For I, we only investigate lowering the low-energy threshold in "I(lo)" (see Table 3 for other parameters). We then repeat the model selection of 6.2.1.
We again perform model selection amongst models of set I, for a selected representative subset of our baseline simulations for a 50 GeV DM mass. We compare these results to the results (presented previously) for our baseline versions of Xe and I, in Figure 20. From this Figure, the following conclusions may be drawn. For an intermediate DM mass, expanding the energy window either towards lower or higher recoil energies produces a slight improvement in model selection prospects for Xe. However, only the simultaneous widening of the energy window enables a (slight) chance that Xe alone could confidently identify the underlying scattering model. Unsurprisingly, a larger improvement is visible when the low energy threshold is lowered for I (which was previously ∼22 keVnr), and is most pronounced for light mediator models. So, adjustment of the energy windows is undoubtedly beneficial for model selection, but it is not sufficient for success: complementarity of available targets plays a more  Table  3, except E max R . These plots are for set-II scattering models of §5, and a 500 GeV DM particle, for which the statistical gain of extending to higher energies is most evident.
important role.
Finally, we make one more comment about target complementarity. From Figures 6 and  7, previously emphasized superiority of fluorine targets for SD scattering is well illustrated in the context of a variety of models considered in this work. Namely, several pseudoscalarmediated scattering models that produce a very poor statistic, or are otherwise practically invisible on both G2 germanium and G2 xenon targets, make a strong signal on 600 kilogramyears of exposure on fluorine. The availability of nuclear targets with spin structure different from that of Xe and Ge may thus be of pivotal importance for establishing the first detection of a DM signal and understanding the physics that gives rise to this scattering.  Table 3, with versions of the same experiment where the energy window only was changed. In the top row, Xe(lo), Xe(hi), and Xe(wide) denote Xe experiments with the following modifications, respectively: low-energy threshold lowered to 1 keV; high energy threshold increased to 100 keV; and energy window expanded to 1 keV-100 keV. In the bottom row, I(lo) represents an I experiment with an energy threshold lowered to 1 keV. In the case of Xe, expansion of the energy window does not significantly affect the ability to distinguish different scattering scenarios. For I, the impact is more significant, but not large enough to enable reliable model selection when only data from I target is analyzed.

Summary and conclusions
We have investigated the distinguishability of a wide variety of DM-nucleon scattering models in the context of noisy data from Generation-2 and futuristic direct detection experiments. For this purpose, we simulated over 8000 recoil energy spectra for a range of nuclear targets (with exposures and energy windows representative of many existing and proposed experiments), under models that give rise to a variety of scattering phenomenologies, including nontrivial momentum and velocity dependence of the scattering rate. We then analyzed these simulations agnostically, using Bayesian posterior analysis and model selection (in over 130000 MultiNest runs, requiring ∼ 2 × 10 4 CPU hours) to establish how likely direct detection is to identify the right underlying interaction given a wide set of competing hypotheses. We also investigated a range of questions related to the extraction of dark matter parameters concurrently with performing the model selection.
The key results of this study are shown in Figures 8, 9, and 10. We demonstrated that, for a signal just below the current upper limit, xenon and germanium targets alone, as realized in LUX/LZ and SuperCDMS, respectively, are likely to identify the momentum dependence of the scattering and thus exclude large classes of scattering models, if a strong signal is seen on all of them. However, they are not likely, except in special cases, to single out the correct underlying contact operator describing the scattering. The addition of a relatively modest exposure on either fluorine or iodine targets drastically improves these prospects. Target complementarity emerges as essential for extracting the underlying physics of DM-nucleon interactions in light of significant Poisson noise, which is the regime in which the first signals would appear.
In addition to model selection, we also investigated the quality of DM mass estimation, and concluded that accuracy of mass estimation is driven by what the mass is (marginalized posteriors are wide for larger masses, and narrower for smaller) rather than by the type of the interaction at hand, assuming that the correct scattering model is fit to data. However, we also showed that, if a wrong model is fit to data, the corresponding mass posteriors can be severely biased, inconsistent amongst different experiments, or unnaturally wide for a given mass. We therefore emphasized the need for model selection as an integral part of future data analysis. Furthermore, we explored the impact of modeling and measurement uncertainty on the ratio of nucleon couplings for some of the models considered here. We showed how this affects mass estimation and model selection. This uncertainty, though very large (i.e. data are typically unable to constrain this parameter well, if at all), does not significantly impact the recovery of the mass, and the uncertainty also shows a mild impact on model selection. Such freedom only confuses certain models (like the standard SI and SD), while others (like Anapole) remain distinct when data from germanium, xenon, and fluorine targets are analyzed jointly. This gives us confidence that model selection results discussed in this work are robust to this uncertainty. Finally, in order to inform future experimental designs, and given the variety of scattering phenomenologies accessible to direct detection, we quantified how changing the recoil-energy analysis windows affects the event counts on different targets; we also investigated the impact of such changes on prospects for model selection. We concluded that widening the energy window for individual targets is generally beneficial, although target complementarity still emerges as the main lever arm for identifying the interaction that governs scattering.
At the end, we outline several caveats left for future study. Firstly, in the analysis presented here, we omitted consideration of backgrounds in all experiments. Although this assumption holds for most scenarios we focus on, it may break down in some cases (for example, for extremely low-energy threshold experiments), and it is an important and worthwhile exercise to further investigate model-selection prospects in the presence of significant backgrounds. A similar argument holds for the assumption of perfect energy resolution-while it is expected to hold well for most experiments and most recoil energies considered here, it is a natural next step to expand the analysis presented here to account for finite energy resolution. Additionally, nuclear response functions we used in this study include a degree of nuclear-physics uncertainty that should be accounted for when comparing scattering models and performing parameter estimation with future data. Even though it is currently unclear how to quantify and marginalize over this uncertainty, we make an initial qualitative investigation of their impact in Appendix B. As a result, we are optimistic that nuclear uncertainties will not be a significant obstacle in performing model selection, given a strong signal, but a further detailed investigation is warranted to precisely quantify the validity of this expectation. And finally, the uncertainty in the local dark-matter velocity distribution may also complicate parameter reconstruction and model selection. However, it is also not likely to significantly degrade prospects for model selection. Two key points go in support of this prediction: i) the hierarchy of the expected numbers of events on different targets, shown in this study to be crucial for successful model selection, is unchanged when the velocity parameters of the Maxwellian distribution are varied within their error bars, and ii) the velocity integral that factors into the recoil-energy spectrum, does not show much variation in its value as a function of the minimum velocity, for plausible halo models; thus, it is reasonable to expect that the uncertainty on the exact shape of the halo model will have little degeneracy with the shape of the recoil-energy spectrum, crucial for distinguishing scattering models. For more details, see Appendix C.
For a direct detection signal just below the current thresholds, prospects for understanding dark matter properties using data from Generation-2 experiments are good. In the event of a detection, a comprehensive analysis following the roadmap of this study, including an exhaustive coverage of the modeling space and a joint-analysis capability for multiple experiments, with consideration of the caveats above, will be essential to identifying the theory of dark matter using direct detection.
To obtain the numerical results of this study, we have developed dmdd, a Python package for direct-detection simulation and analysis, now available to the community.

B Appendix: Nuclear uncertainty
Here we briefly consider the uncertainties in the nuclear response functions, i.e. the uncertainties of embedding DM-nucleon interaction into the nucleus, in the context of model selection. We do not consider the (challenging) complications of going from a quark-level description of DM interactions to the nucleon level, since we are primarily interested in effective nuclear interactions.
We quantify the effects of the uncertainty in the normalization (values at zero momentum transfer) of the nuclear responses on the total number of expected events for different scattering hypotheses in Figure 26. This Figure presents a visualization of some of the information in Table 4, with added content. The solid lines show the number of events expected in our mock experiments as a function of DM mass and interaction type, for our fiducial nuclear response functions (from Ref. [16]); the dashed lines represent the expected events given an alternative normalization of the spin-dependent response functions so that they match the zero-momentum-transfer values as calculated in Ref. [59], which employed a more sophisticated calculation than Ref. [16].
The combined spin and orbital angular momentum response on which the Anapole and MD rates depend reduce to the nuclear magnetic moment at zero momentum transfer (see e.g. Ref. [12]). The dashed lines in the Anapole and MD plots were produced by normalizing the spin and orbital angular momentum response to match the measured values of the relevant nuclear magnetic moment at zero momentum transfer (see Ref. [12] and Ref. [16]). Since the spin-independent response dominates the Anapole and MD rates for Xe, Ge, and I, the effect of the alternative normalization is essentially negligible. Since F is much lighter and has a significant magnetic moment, the effect of the alternative normalization is more apparent.
Finally, the biggest difference shows up in the SD case for Xe on the lower left (a factor 2 change in the expected events). Most importantly, however, the hierarchy of event counts on different targets for the different masses remains the same. In this study, we show that a F-based experiment dramatically enhances model selection probabilities when combined with Xe and Ge, even though it is modeled as having no energy resolution above threshold. The pattern of events among targets is therefore critical to distinguishing among operators, and this pattern is unchanged by the shifts in the normalization.
Going beyond the values of the responses at zero momentum transfer, there is also some uncertainty on the spectral shape of the form factors [60]. However, most of the previous work that explored nuclear uncertainty impacts on direct-detection measurements implies that the impact would be fairly modest, at least from a model-selection perspective [60,61]. This is largely driven by the fact that the spectral shape uncertainties are smallest for the smallest momentum transfers, the most relevant range for direct detection.
In summary, we have reasons to be optimistic about model selection being fairly robust to nuclear uncertainties. However, it will be important for future precision DM studies that these uncertainties are quantified and reduced [62].

Millicharged
Elec Dip Heavy Elec Dip Light 15GeV 50GeV 500GeV 15GeV 50GeV 500GeV 15GeV 50GeV 500GeV 15GeV 50GeV 500GeV  Figure 26. Expected events for set-I models at G2 Xe, Ge, I, and F experiments given DM masses of 15 GeV, 50 GeV, and 500 GeV. Solid lines show the number of expected events given the nuclear response functions used in this study [16], while the expected events given alternate normalizations of the spin and magnetic form factors (as described in the text) are shown as dashed lines.

C Appendix: Astrophysical uncertainty
Unlike the nuclear response functions, the astrophysical uncertainties will propagate equally to all targets-all targets will be subject to the same systematic uncertainty. This makes the problem of handling these uncertainties more tractable than in the case of nuclear uncertainties, where either marginalization over the systematics or full recovery of the shape of the speed distribution directly from data [19] might be possible.
Repeating the approach we took in Appendix B to investigate the impact of nuclear uncertainties, here we again evaluate how the total number of expected events might change when astrophysical parameters are let to vary within their respective observational ranges, focusing on the isotropic Maxwell-Boltzmann distribution of §2. If we vary the three main parameters of the model-v esc , v lag , and the rms speed v rms -within their observational uncertainties (or implied uncertainty for v rms in the context of an isothermal halo), we find that uncertainties in v lag drive the variations in the total number of events; see Figure 27. While we see more spread in the expected number of events for each experiment for a fixed model and DM mass (especially for low DM masses) than when we consider nuclear uncertainties, the pattern of the relative number of events in each experiment does not change. Each response and each DM mass we investigated still has a unique hierarchy of events in each experiment. For example, there are always ∼ 100 times more fluorine than xenon events, and ∼ 5 times more xenon than germanium events, for a 15 GeV particle with a dominant spin-dependent interaction. That same particle would still always have close to equal numbers of events in the xenon, fluorine, and germanium experiments in case of an electric-dipole interaction with a light mediator. 17 While the exact number of events in each experiment, and indeed the spectral shape seen in different experiments, will have significant consequences for parameter estimation, the fact that the hierarchy of events remains almost unchanged for the v lag uncertainties suggests that model selection, and in particular selection of the right momentum dependence of the dominant response, should be robust to uncertainties in the DM halo model velocity distribution. To exactly quantify the level to which this expectation holds as a function of target nucleus, DM masses, and halo model at hand, a comprehensive new analysis that follows our approach will be necessary.