Detection of Domestication Signals through the Analysis of the Full 1 Distribution of Fitness Effects using Simulations 2

ABSTRACT


INTRODUCTION
The increase in human population and the emergence of modern society are linked to the domestication of plants and animals (Purugganan and Fuller 2009;Driscoll et al. 2009;Larson and Burger 2013;Amills et al. 2017).Human civilization it was made possible by the domestication of surrounding life forms, where humans began to domesticate plants and animals such as wheat, dogs, pigs, or chickens (Dayan 1994;Zeder et al. 2006;Zeder 2012Zeder , 2015;;Redding 2015;Avni et al. 2017).Domestication is a process that allows humans and other species to establish a long-term mutualistic relationship that provides benefits to both species (Zeder et al. 2006).Human domestication of fauna and flora began about 10-15 thousand years ago and is still ongoing (Larson and Burger 2013;Zeder 2015).Although human civilization is based on domestication, we still lack a complete genomic and evolutionary understanding of domestication.Domestication is a rapid process on the evolutionary time scale, but it is not a discrete event and implies the gradual improvement of domesticated traits.It is believed that human-induced artificial selection during domestication can be expected to be relatively stronger and therefore faster than natural selection.However, in plants it has been shown that the evolutionary rate of domesticated varieties is similar to that of wild plants, suggesting a process similar to natural selection (Purugannan and Fuller 2010).
In addition, domestication tends to be associated with bottlenecks; only a small number of individuals from the wild population are domesticated, which is expected to reduce the efficiency of natural selection (Wright et al. 2005).Another important difference between natural and artificial selection is that modern breeders typically use truncation selection, which is the selection of the top percentage of individuals for the desired trait (Granleese et al. 2019).
The prevalence of truncation selection in nature, or prior to industrialization, is unknown.
Truncation selection is known to be a simple and efficient form of directional selection (Crow and Kimura 1979), and no severe accumulation of genetic load is expected in outcrossing species (Kondrashov 1988;Ohta 1989) if the population size remains large enough (Marsden et al. 2016).A recent comprehensive meta-analysis of the genetic costs of domestication (Moyers et al. 2018) found that deleterious variants are more numerous (or segregate at higher frequencies) in domesticated populations compared to their wild relatives However, this pattern may not be general, as seen in sorghum (Lozano et al. 2021).This pattern is likely driven by a number of processes that collectively reduce the effectiveness of selection in domesticated populations, as first suggested in rice genomes (Lu et al. 2006).
Selection, both natural and artificial, can occur through either a few loci with strong effects or many loci with small effects (Jain and Stephan 2017a;b), depending on the genetic architecture of the trait and the strength of selection on it.Different patterns of genetic diversity around selected loci are expected in response to these two models of selection (Stephan and John 2020).Classic hard selective sweeps have been reported at a few candidate loci among important domesticated traits (Andersson 2012), such as the IGF2 gene region associated with lean domestic pigs (Van Laere et al. 2003), the thyroid-stimulating hormone receptor (TSHR) in domestic chickens (Rubin et al. 2010), or the sh4 and qSW5 loci (Shomura et al. 2008;Li et al. 2018) involved in the traits of seed shattering and grain width, respectively, in domestic rice (Huang et al. 2012).These examples are consistent with a simple Mendelian genetic architecture, in which a few loci account for most of the variance in the domesticated trait (for a comprehensive list of genes important for domestication, see Courtier-Orgogozo and Martin 2020).
Polygenic adaptation, on the other hand, describes a process in which a constellation of small changes in allele frequencies modify differences in the trait under selection, where a trait is a phenotypic trait but also can be the fitness itself.A wide range of population genetic models and simulations have been examined to describe polygenic adaptation (de Vladar and Barton 2014;Stephan 2016).Some models analyze the polygenic response of a trait in the presence of mutation and stabilizing selection (de Vladar and Barton 2014; Stephan and John 2020), while others capture the response of a trait under mutation and stabilizing or directional selection following an environmental change in a finite size population (Jain and Stephan 2017a;Devi and Jain 2023).In practice, polygenic adaptation using genetic data is harder to detect than classic selective sweeps (Pritchard et al. 2010), but the combined use of phenotypic data together with genetic data can detect the selective effect of quantitative traits (e.g., Chen et al. 2021, Berg andCoop 2014).Polygenic adaptation has been detected in some specific studies in wild, domesticated and experimental evolution populations (Barghi et al. 2019;Reid et al. 2023;He et al. 2023).
In this study, we ask to what extent we can detect a genomic signal of domestication using a different approach: comparing the full distribution of fitness effects (DFE) on new and standing variation.To do this, we inferred the full DFE of new non-synonymous mutations in wild and domesticated populations.The DFE of new deleterious mutations has previously been estimated by contrasting the 1D-SFS of synonymous and non-synonymous mutations from a variety of species, assuming that beneficial mutations contribute only to divergence but not to polymorphism due to their rapid fixation in the population (Keightley and Eyre-Walker 2007;Boyko et al. 2009;Kim et al. 2017;Tataru et al. 2017;Barton and Zeng 2018).Tataru et al. (2017) proposed a model, polyDFE, to infer the full DFE and the proportion of adaptive substitutions (α) using only polymorphism data.Castellano et al. (2019) used polyDFE to compare the full DFE of new amino acid mutations across great apes, and found that the shape of the deleterious DFE is constant across this set of closely related species.More recently, a new method using 2D-SFS has been proposed to jointly estimate the full DFE between two populations that recently diverged and share many polymorphisms (Huang et al. 2021).
However, few studies have compared the DFE between domesticated and wild populations (Leno-Colorado et al. 2022).First, the analysis of differences between wild and domesticated individuals has been focused on finding large effects that determine the phenotypic differences between these two groups.Therefore, DFE does not appear to be an adequate method for detecting candidate regions, as DFE considers the entire distribution of selective effects and appears to be blind to single events.Second, the split between wild and domestic populations is very recent (expected to be no more than 10K years), suggesting that the DFE of these two populations would be very similar or identical.Third, methods to detect subtle differences in DFE between two closely related populations have not been developed.
On the other hand, the study of the demographic processes involved in domestication has been addressed in several species (e.g., Miranda et al. 2023, Arnoux et al. 2020, Murray et al. 2010), where the split between wild and domestic populations, bottlenecks, and gene flow events between these two populations have been inferred, also considering multiple selective sweeps (Caicedo et al. 2007).It has also been shown (Torres et al. 2020) that ignoring deleterious selective effects in the study of demographic patterns, can lead to biased estimates of silent variability patterns (see also Comeron 2017, Beissinger et al 2016).Thus, it is suggested that the DFE and demography be inferred together to avoid deviations from the true value.
Here, we use forward-in-time simulations that consider the domestication process under different demographic and selective models.Several combinations of genetic architectures and selective effects have been simulated, ranging from one that considers a relatively small number of loci that change their selective effects to another that considers polygenic adaptation (considering fitness as a trait) where many loci have divergent selective effects.In all cases, the selective effects of a fraction of the existing variants can change (from deleterious to beneficial and vice versa) in the domesticated population.We also describe how linked selection and changing the DFE deform the recovered demographic histories.

Simulation of the Domestication Process
A simulation analysis of the domestication process is developed using the forward-in-time simulator SLiM4 (Haller and Messer 2023) neutral synonymous positions and two-thirds (0-fold) selected non-synonymous positions scattered along the locus.
The simulation parameters for each scenario (Figure 1A) are as follows: the initial blank population at time 0 run for 10*N e generations to reach mutation-selection-drift equilibrium, then splits into the domesticated and wild populations.Hereafter we refer to the Wild and Domestic populations.We aim to mimic a realistic but still general domestication process in large livestock mammals where ancestral N e estimates are on average around 10,000 (Murray et al. 2010;Groenen et al. 2012;Larson et al. 2014;Frantz et al. 2015;Yang et al. 2016;Librado et al. 2021;Todd et al. 2022) and the domestication process, according to archeological records, started around 10,000 years ago (Ahmad et al. 2020).The average generation time in these large domestic mammals is about 5 years per generation (Pacifici et al. 2013).Note that in this study we had to reduce the population size and related population parameters below from 10,000 diploid individuals to N e =5,000 diploid individuals for computational reasons.
Genomic parameters: The mutation rate per site (μ) and generation is 2.5x10 -7 , and the population size (N e ) is 5,000 diploid individuals, thus the expected θ under neutrality is 0.005.
Each locus is separated from its neighbors by 3x10 -6 recombination events per generation.The recombination rate per site and generation within the loci is fixed to a rate of 1.5x10 -7 recombination events per site.Note that the higher recombination between loci aims to mimic their real genetic distance separation (assuming a functional site density of 5%) -this greatly speeds up the simulation as non-coding sites do not need to be simulated.In other words, we simulate 120 Kb of coding sequence in each run, which is equivalent to simulating a 2.4 Mb chromosome window with 5% coding sites.We perform 100 independent runs for each of the eighteen scenarios.
Demographic parameters: The Domestic populations of 5,000 diploid individuals suffer a bottleneck, reducing their population size temporarily to 200 diploid individuals, to recover again to 5,000 diploid individuals after the bottleneck.The bottleneck lasted 100 generations.
The simulation finishes 900 generations after the bottleneck.In nine of the eighteen simulated scenarios we allow that 1% of the Wild individuals migrate to the Domestic population during the 100 generations of the bottleneck.Thus, during the bottleneck 25% of the domesticated population is wild.In the other nine combinations there is no exchange of individuals between the Wild and Domestic populations.This demographic history is equivalent to a 1,000 years long bottleneck followed by a 9,000 years long recovery in an ancestral population with 10,000 diploid individuals and a generation time of 5 years.

DFE parameters:
The selective effects produced by domestication are modeled by changing the fitness values of a proportion of the existing and new mutations in the domestic population (at the time of the split) (Table 1).We call this probability of change p c and it can be 0% (our negative control), 5% or 25%.Domesticated and Wild populations show different proportions of beneficial and deleterious new mutations depending on the scenario.The negative effects in all scenarios and populations follow a gamma distribution with a shape value of 0.  1).

Types of Sites
The sites are initially divided into seven different types (named m 1 to m7), being m 1 neutral (synonymous) and m 2 to m 7 functional (non-synonymous) sites having a different selective effect when mutated (see Table 2 and Figure 1C and unfolded SFS data are fitted using a DFE model comprising both deleterious (gamma distributed) and beneficial (exponentially distributed) mutations.The DFE of each Wild-Domestic population pair is inferred using the 1D-SFS of each population.polyDFE assumes that new mutations in a genomic region arise as a Poisson process with an intensity that is proportional to the length of the region and the mutation rate per nucleotide (μ).We assume that μ remains constant across simulations (as it is the case).Both an ancestral SNP misidentification error (ε) and distortion parameters (r i ) are estimated.However, we notice that the exclusion of ε does not affect the rest of estimated parameters because under the simulation conditions used here no sites are expected to be misidentified.

dadi: 2D-SFS and 2D-DFE
dadi (Gutenkunst et al. 2009) is employed to infer the joint distribution of fitness effects (Jerison et al. 2014;Ragsdale et al. 2016;Huang et al. 2021) and the demographic history of all simulated population pairs.Our new model for the joint DFE between the two populations is a mixture of multiple components designed to mimic the selected mutation types in the simulations (Table 2; Figure 1D).The major exception is that beneficial mutations are

Inference units, and confidence intervals in demographic and DFE parameters
To obtain the sampling variance of parameter estimates and approximate confidence intervals, we use a bootstrap approach.We resample with replacement 100 times 20 independent simulation runs or chromosomal "chunks" (from a pool of 100 "chunks") and concatenate them.Hence, each concatenated unit (or inference unit) is made of 24 Mb of coding sequence (as comparison, the human genome contains ~26 Mb of coding sequence).Uncertainties of DFE parameter inferences in polyDFE and dadi are calculated by this conventional bootstrapping, but in dadi we hold the demographic model fixed.In polyDFE the distortion introduced by demography (and linked selection) is not estimated but corrected with the r i parameters.Note that our procedure with dadi does not propagate uncertainty in demographic parameters through to the DFE parameters.To obtain the sampling variance of demographic parameter estimates with dadi we use the Godambe approach as described in Coffman et al.

2016.
A final consideration on the factor of two differences across simulation and inference tools.In polyDFE, s is defined as the selection coefficient on the heterozygote (as in dadi), and the scaled selection coefficient is defined as 4N e s, while in dadi it is defined as 2N a s (in the ancestral population).In SLiM2, s is defined as the selection coefficient on the homozygote.

RESULTS AND DISCUSSION
Studying the effect of domestication on the DFE of natural populations can be very challenging, especially when available methods for inferring and comparing the DFE have not been compared using exactly the same dataset.In the present work, we conduct a simulation study using different combinations of parameters relevant to the domestication process.Note that the main difference between domestication demographic model used here and demographic models used in speciation studies is the time scale since the split occurred.Our simulated domesticated populations experience a large or small change in the number and selective effects of loci under domestication after a bottleneck period, with or without migration.
Hereafter, we refer to the Wild and Domestic populations. The

Estimation of demographic parameters in Wild and Domestic populations
In this study, we investigate the effects of natural selection, both in a general sense and specifically the change in selection coefficient in shared variation due to artificial selection, on our ability to reconstruct the demographic history of domestication and the DFE.We do this using two commonly used inference tools (polyDFE and dadi) that assume free recombination across loci.Note that dadi first infers the demographic history and then infers the DFE assuming those inferred demographic parameters, whereas polyDFE operates independently of specific demographic histories and is designed to correct for distortions that affect both synonymous and non-synonymous site frequency spectra equally (Tataru and Bataillon 2019).
Figure 1 A and B show the simulated joint demographic model and the joint demographic model used in the dadi inferences, respectively.We have increased the complexity of the inference model by introducing additional parameters, allowing it to account not only for "simulated" or true demographic changes, but also for more complex and unknown demographic histories and the potential influence of linked selection on synonymous SFS.The diagnostic plots can be found in Supplementary Figure 1; there is good agreement between the model fits and the data.
Our findings indicate that when positive selection is relatively weak (S b = 1 or S b = 10), the estimated onset of domestication tends to be approximately twice as old as the actual simulated starting point.Additionally, the bottleneck appears slightly shallower but considerably longer than the simulated value (see Figure 2 and Supplementary Table 1 for the confidence intervals).
This suggests that the influence of linked selection, likely driven primarily by background selection when S b <= 10, has the effect of elongating the timeline.Consequently, it makes the domestication divergence and bottleneck appear more ancient and extended, respectively.For the Wild populations we always detect a larger population expansion than for the Domestic populations, but without a bottleneck.This signal of a recent expansion in the Wild population is expected because when we consider how linked selection affects the SFS, there are more rare synonymous polymorphisms compared to what we would expect if there was free recombination under a constant population size (Nielsen 2005).Remarkably, when positive selection is strong (S b = 100), the temporal stretch becomes even more pronounced and the demographic history of both populations overlap extensively.The domestication divergence shifts to approximately 50,000 years ago, whereas the actual simulated split occurred 10,000 years ago.Additionally, the bottleneck appears significantly longer and less severe, while there is an inferred large population expansion in both Wild and Domestic populations.Although in Figure 2 there appears to be a change in population size before the domestication split, only three scenarios (with IDs 5, 6, and 7) are statistically significant (Supplementary Table 1 and Supplementary Figure 2).Interestingly, we find the migration rate from Wild to Domestic (m w2d ) and from Domestic to Wild (m d2w ) are overestimated in all scenarios (Supplementary Table 1 and Supplementary Figure 3).We observe that neither migration nor an increase in p c appears to significantly change the inferred demographic histories that we have just described.
Torres et al. ( 2020) detected a distortion in the patterns of diversity due to deleterious effects.
Instead, we detected a distortion in the demographic inference that differs depending on positive selective effects.Our findings indicate that when linked selection is at play, the reconstructed demographic history captures certain elements of the actual simulated history.
For instance, if positive selection is not strong, it successfully identifies a bottleneck in Domestic populations compared to Wild populations.However, when positive selection is strong (S b = 100), it tends to "erase" the demographic history through indirect selection effects and recreate large recent population expansions in both populations.Nevertheless, and more importantly, in all scenarios it is not possible to accurately determine the timing of the onset of domestication, the duration of the domestication bottleneck, or to distinguish between the presence and absence of migration between populations.We believe that these aspects are crucial for contextualizing the role of domestication in human history.Unfortunately, either the 2D-SFS or our modeling assumptions (or both) do not seem to be useful in this context.
Thus, the next question is to what extent can the nuisance r i parameters from polyDFE or this distorted inferred demography from dadi help to recover the simulated DFE parameters?
Can domestication be detected as an artificial change in the marginal full DFE between populations?No.
Next, we investigate whether polyDFE captures differences in the marginal (or 1D) full DFE of Domestic and Wild populations across the eighteen domestication scenarios (Table 1).We run five nested models (Table 3) and compare them using likelihood ratio tests (LRTs) (Supplementary Table 2).It is important to note that in all our simulations, the marginal full DFE for new mutations in both Domestic and Wild populations is the same within a given scenario (as detailed in Table 1).This means that the selection coefficients for sites, whether they are monomorphic or polymorphic, are drawn from the same full DFE.In simpler terms, the proportion of new mutations that are advantageous or detrimental is identical for both Domestic and Wild populations within a given scenario.The key distinction lies in the fact that when p c > 0%, Domestic populations might have a higher number of advantageous mutations as polymorphisms.This is because some of these beneficial mutations were already present at intermediate or high frequencies as nearly neutral polymorphisms in the ancestral population, and we expect that migration after the domestication split can also re-introduce beneficial mutations from the Wild to the Domestic population.
LRTs between different nested models allow us to address important questions about the DFE, without assuming any prior knowledge of our datasets.First, we assess whether the inferred shape of the negative DFE is similar in both populations while also examining if the estimation of the shape parameter is influenced by the presence of advantageous mutations.
When comparing models that do not consider beneficial mutations (models M1 versus M10 in the second column of Supplementary Table 2), the model with a distinct shape for Domestic and Wild populations is accepted only in two, rather unrelated, scenarios (scenarios 7 and 14).This indicates that an artificial alteration in the shape of the deleterious DFE between Domestic and Wild populations can be inferred.Fortunately, when comparing models that take into account beneficial mutations (models M2 vs M20, third column in Supplementary Table 2), all scenarios show a shared shape of the deleterious DFE, which is expected based on the simulation parameters.These findings suggest that disregarding beneficial mutations can cause an artificial change in the inferred shape of the marginal deleterious DFE between populations, as noted previously by Tataru et al. in 2017. Second, incorporating the positive DFE yields statistically significant results in all scenarios (see Supplementary Table 2, fourth and fifth columns).Hence, polyDFE appears to effectively detect beneficial mutations, regardless of the strength of positive selection.Third, we investigate whether Domestic and Wild populations could exhibit an artificial change in the beneficial DFEs as a consequence of domestication.When comparing the M20 and M30 models (refer to the last column in Supplementary Table 2), polyDFE invokes changes in the positive DFE between populations in some scenarios.This is applicable when there is no migration between the populations and only a minimal amount (5%) or when none of the sites change their selection coefficient, regardless of the mean strength of positive selection.
These artificial changes occur in scenarios 2, 7, 8, 13, and 14, with a marginal p-value in scenario 1.It is noteworthy that the polyDFE analysis shows no significant difference in the positive DFE between populations with the presence of migration and when p c equals 25%.
Our initial expectation was that in scenarios with a large fraction of sites changing selection coefficients and migration from the Wild population, would result in an increase in the load of advantageous polymorphisms in the Domestic population (due to the re-introduction of beneficial mutations from the Wild to the Domestic population), leading to a higher inferred rate of new advantageous mutations (p b ) compared to the Wild population.We do not observe this result.We suspect that the absence of this result could be due to linkage between selected mutations and synonymous mutations, which may lead to an overcorrection of the excess of non-synonymous polymorphisms at high frequency via the r i parameters.
polyDFE suggests that specific domestication scenarios, particularly those lacking migration and featuring minimal changes in selection coefficients, may artificially alter the marginal full DFE between populations, particularly in its positive side.In the next section, we find that the artificial change in the marginal full DFE is due to the detection of a higher proportion of new, effectively neutral, advantageous mutations (>10%) in Domestic populations compared to Wild populations.Hence, this finding has no significant impact on the marginal full DFE differences between the populations when the DFE is represented in discrete intervals.We demonstrate that domestication does not significantly impact polyDFE's ability to detect a false difference in the marginal full DFE among populations.We conclude that if a significant change is detected in the discretized marginal full DFE, it must be considered valid.

Estimation of DFE parameters in Wild and Domestic populations
Under the polyDFE framework, we begin by extracting the Akaike Information Criterion (AIC) from every model (Table 3) and then computing the AIC-weighted parameters for all models (Tataru and Bataillon 2019;Castellano et al. 2019).This approach is used because the true model generating real data in both Wild and Domestic populations is unknown.Instead, under dadi's framework, we adopt an alternative methodology that utilizes very general, parameter rich and versatile joint demographic and DFE models to fit the 2D-SFS.The diagnostic plots of the new joint DFE model is shown in Supplementary Figure 1, again there is good agreement between the model fits and the data.
Figure 3 depicts the distribution of parameters related to the deleterious DFE that are estimated by performing bootstrap analysis using polyDFE and dadi.We observe that both tools have a tendency to marginally overestimate the shape parameter of the gamma distribution employed to model the deleterious DFE.The overestimation is particularly significant in polyDFE, when positive selection is rare and strong.In such scenarios, dadi's shape estimation is sometimes rather noisy.Regarding the mean of the deleterious (s d ), we observe quite accurate inferences with both tools across domestication scenarios.This finding indicates that regardless of the inference method used, the estimation of the deleterious DFE is resilient to demographic and selective changes, as well as the pervasive impact of linked selection.In contrast, our previous inference with dadi on the demographic parameters concluded that linked selection significantly complicates the process of obtaining accurate demographic histories.Therefore, although it is generally believed that demographic changes should be considered to infer the underlying DFE, we found that inferring the deleterious DFE is "easier" than inferring the true demographic history.We conclude that correcting the non-synonymous SFS through nuisance r i parameters or using a demographic history that fits the data well, even if it is incorrect, appears sufficient for obtaining an accurate depiction of the deleterious DFE.
The distribution of parameters associated with the beneficial DFE, estimated by bootstrap analysis using polyDFE and dadi, is shown in Supplementary Figure 4 and Supplementary Consequently, polyDFE seems to have limited power in identifying effectively beneficial mutations on the 1D-SFS (under these simulation conditions).As suggested before, the r i parameters might be overcorrecting for the increase in high-frequency non-synonymous polymorphisms in the SFS expected from beneficial mutations.By the same logic, dadi's inferred demographic history might be operating in a similar way to explain the uptick of synonymous polymorphisms at high frequency due to linkage to beneficial mutations.Thus, we conclude that both tools struggle to infer the positive DFE and tend to be overconservative and identify weaker positive selection than what has been simulated.However, it is noteworthy that both tools typically yield comparable (and reasonably accurate) discretized deleterious DFEs upon considering the tendency of polyDFE to infer a peak of effectively neutral beneficial mutations.

Domestic and Wild populations
One of the main goals of this study is to determine the proportion of new and standing non-synonymous mutations with differing selection coefficients in Wild and Domestic populations.The usage of our new joint DFE model is not limited to the current study.Our new model, created by mixing multiple distributions to mimic mutation types in our simulations (Table 2; Figure 1C-D), is suitable for usage in any recently diverged populations.
Hence, while we acknowledge that our simulation and inference pipeline has the potential to provide insights into recent parapatric and allopatric speciation events, our primary focus in this work is on assessing our ability to identify the impact of domestication on the full DFE within domesticated populations.This emphasis is due to the availability of independent archaeological evidence that can be used to determine the timing of domestication onset.

Implications for empirical analysis of populations
Although the time of separation between wild and domestic is recent, the external environments in which they live are very different.This means that an unspecified proportion of environmentally influenced variants may have changed their fitness effect in the domestic population, which may have altered their frequencies.In this work, we simulated differential effects between wild and domestic populations, and we observed that selective effects affect the inference of demographic parameters by linked selection, but to different extents depending on the DFE.Linked selection caused by positive effects is responsible for strong differences from the true parameters, even if they are very rare but strong.Nevertheless, the DFE can be estimated quite accurately, suggesting that methods that infer the entire DFE can be used first and then estimate the demographic parameters using this information.
Another point of interest for empirical geneticists is the development of a new method to jointly infer the DFE between wild and domestic and their differences in the positive part of the distribution.The 2D dadi extension algorithm allows to infer differences in p+w (the fraction of mutations that are positively selected in the wild population), pc (the fraction of mutations that change the coefficient of selection in the domestic population), pc+ (the fraction of those mutations that become beneficial in the domestic population).

CONCLUSIONS
In summary, our use of forward-in-time simulations has provided valuable insights into the complex genetic demographic history and distribution of fitness effects (DFE) for both new and standing amino acid mutations in the context of domestication.Through a comparative analysis of two methods, polyDFE and dadi, we have uncovered the impact of linked selection on the reconstructed demographic history of both wild and domesticated populations.Despite biases in the timelines of domestication events and bottleneck characteristics, the estimation of deleterious DFE remains remarkably reliable, demonstrating the robustness of these analytical approaches.In particular, the underestimation of effectively beneficial mutations in the DFE highlights the influence of linkage between beneficial and neutral mutations, which requires careful consideration in model interpretation.In addition, our results shed light on distinguishing scenarios of divergent selective effects between populations under weak and strong positive selection, providing a nuanced understanding of the interplay of evolutionary forces.As we navigate the complex landscape of domestication, these methodological approaches contribute significantly to unraveling the evolutionary dynamics and adaptive processes that shape the genomes of domesticated organisms, and provide a foundation for future research in this critical area of study.     is not shown but it is a constant population size with relative Ne = 1.The 95% confidence intervals calculated using the Godambe approximation can be found in Supplementary Table 1.
3 and a mean of S d = -100 (2N e s d in the heterozygote, N e = 5,000 diploid individuals and s d = -1%), which is in the range of values inferred using empirical data (Boyko et al.PLoS Genet 2008, Galtier PLoS Genet 2016), while variants with positive effects follow an exponential distribution.We investigate three combinations of parameters for the positive DFE: pervasive & nearly neutral (with mean selective effects S b of 1 and probability of being beneficial p b of 10%), common & weak (S b = 10 and p b = 1%) and rare & strong (S b = 100 and p b = 0.1%) (Table ). Mutations at m 5 , m 6 and m 7 sites generate deleterious variants in the Wild population, and mutations at m 2 , m 3 and m 4 sites generate beneficial mutations in the Wild population.The selection coefficient of mutations generated at m 2 (beneficial) or at m 5 (deleterious) sites are invariant for the Wild and Domestic populations.However, the mutations at m 3 , m 4 , m 6 and m 7 sites will change their selective effect in the Domestic populations relative to the Wild populations.That is, the new selective effect is drawn from the corresponding DFE section (positive or negative), independently of their value in the wild population.Those can be understood as sites with divergent selective effects.The selection coefficient of a given beneficial mutation at m 3 sites will remain beneficial in the Domestic population, but it will be different from the original beneficial effect at Wild.A mutation at m 4 sites will change its selection coefficient from beneficial in the Wild to deleterious in the Domestic population.Equivalently, the selection coefficient of a deleterious mutation at m 6 sites will remain negative in the Domestic population but it will be different from that found at Wild.A mutation at m 7 sites will change its selection coefficient from deleterious in the Wild to beneficial in the Domestic population.For each of the 18x100 independent simulation runs, we randomly precalculate the location of each site type (except for the permanent disposition of two non-synonymous sites followed by a synonymous site within codons) using an ad hoc R script.This hard-coding of selective effects on different sites allows us to gain insight into the relative importance of each mutation type for the domestication process.Distribution of fitness effects (DFE): Two complementary approaches polyDFE: 1D-SFS and 1D-DFEWe use the polyDFEv2.0framework(Tataru and Bataillon 2019) to estimate and compare the DFE across Wild-Domestic population pairs by means of likelihood ratio tests (LRTs)github.com/paulatataru/polyDFE/blob/master/postprocessing.R) to compare pairs of models.The inference is performed only on the unfolded SFS data (divergence counts to the outgroup are not fitted), modelled to have a single fixed selection coefficient, rather than arising from an exponential distribution.Let p +w be the fraction of mutations that are positively selected in the Wild population, p c be the fraction of mutations that change selection coefficient in the Domestic population, and p c+ be the fraction of those mutations that become beneficial in the Domestic population (note in our simulations p +w = p c+ ).To model mutation types m 2 and m 3 , a proportion p +w x (1-p c ) + p +w x p c x p c+ of mutations are assumed to have the same fixed positive selection coefficient in both populations.To model m 4 , a proportion p +w x p c x (1p c+ ) is assumed to have a fixed positive selection coefficient in the Wild population and a gamma-distributed negative selection coefficient in the Domestic population.To model m 5 , a proportion (1-p +w ) x (1-p c ) of mutations are assumed to have equal negative gammadistributed selection coefficients in the two populations.To model m 6 , a proportion (1-p +w ) x p c x (1-p c+ ) is assumed to have independent gamma-distributed selection coefficients in the two populations.To model m 7 , a proportion (1-p +w ) x p c x p c+ mutations is assumed to have a gamma-distributed negative selection coefficient in the Wild population and a fixed positive selection coefficient in the Domestic population.All gamma distributions are assumed to have the same shape and scale.This model is implemented in dadi as the function dadi.DFE.Vourlaki_mixture.Note in our simulations the marginal 1D-DFEs of Wild and Domesticated populations are exactly the same; the difference is that in Domesticated populations a given fraction of sites (some already polymorphic, some still monomorphic) can change their selection coefficient relative to the Wild population.For inference, a new, more general demographic model with branch-independent population size changes is first fit to the synonymous mutations from each simulation, and then the newly proposed joint DFE model is fit to the non-synonymous mutations.The parameters of the demographic model (Figure1B) are estimated by running 100 optimizations per inference unit.The 2D-SFS for selected sites are precomputed conditional on the demography for 104 2 values of γ 1 and γ 2 (2N a s, a population scaled selection coefficient for the heterozygote where N a is the ancestral population size), 102 negative and 2 positives.For the negative part of the DFE, γ values were logarithmically equally spaced between -2000 and -10 -4 .The expected DFE for selected sites can then be computed as a weighted sum over these cached spectra(Kim et al. 2017).The DFE parameters shape, scale, p +w , p c , and p c+ are then estimated by maximising the Poisson likelihood of the simulated data, with the non-synonymous rate of mutation influx fixed to twice that inferred for neutral sites in the demographic history fit.For the DFE inference, optimization is repeated until the best three results are within 0.5 log-likelihood units.Ancestral state misidentification is modelled, however in our simulations no sites are expected to be misidentified.For the purpose of this work, dadi software is downloaded and installed according to the instructions provided at the following link:https://bitbucket.org/gutenkunstlab/dadi/src/master/.Since dadi operates as a module of Python, the Anaconda3 and Spyder (Python 3.7,Rossum and Drake 2009;Anaconda 2016;Raybaut 2009) versions are used in this study.
Wild populations have a constant DFE and constant population size, but limited recombination across loci to mimic a realistic recombination landscape.Beneficial mutations arise at Wild populations following an exponential distribution, while deleterious mutations are drawn from a gamma distribution with shape 0.3 and mean S d = -100 (where S d = 2N a s d , the selection coefficient s d in the heterozygote is -1%, and N a = 5,000 diploid individuals is the ancestral effective population size, see Material and Methods: Simulating the Domestication Process).All mutations, beneficial and deleterious, are co-dominant.The Domestic population originates from the Wild population through a bottleneck and a concomitant change in selective effects at a fraction of non-synonymous sites (Figure 1; Table1).The recombination and mutation landscapes are drawn from the same distribution in the Domestic and Wild populations.The change in selective effects affects both new mutations that arise within the Domestic population and existing variants that existed before the domestication event.Put simply, not only can mutations that were deleterious (or beneficial) before the population split become beneficial (or deleterious) within the domestic population, but even if the direction of the selective effect remains the same, the intensity of selection can change.Table2shows all the combinations of changes in selective effects between Wild and Domestic populations.Our simulated scenarios aim to cover a variety of possible changes in the genetic architecture (number of loci) and the strength of selection (selection coefficients) of the trait/s under domestication.Three DFEs for beneficial mutations are assumed: (i) pervasive and nearly neutral, where a large fraction of new mutations (10%) are on average nearly neutral (S b = 1), (ii) common and weak, where beneficial mutations are still fairly common (1%) but weakly selected (S b = 10) and (iii) rare and strong, where very few mutations (0.1%) are strongly beneficial (S b = 100).Depending on the scenario, a selective change occurs only at a small (5%) or at a substantial proportion (25%) of sites in the Domestic population (Table1, "p c " column).We leave six scenarios as negative controls; the selection coefficients of new and standing variation in the Domestic and Wild populations are exactly the same.Finally, demographic changes affect only the Domestic population; the Wild population evolves under a constant population size.Two versions of the same demographic model (Figure1A) are simulated: (i) one with migration, and (ii) another without migration.When there is migration, it only occurs from the Wild to the Domestic population during the domestication bottleneck.

Figure 4
Figure 4 displays the distribution of the inferred p c for three different positive DFEs, along with simulated p c values.When positive selection is not strong, it becomes apparent that scenarios with a significant fraction of mutations with dissimilar selective effects (p c = 25%) can readily be differentiated from those where a small (p c = 5%) or nonexistent (p c = 0%)

*
The eighteen simulated combinations of parameters in this study.The first column refers to the DFE of new beneficial mutations, the second column represents the migration rate from the Wild to the Domestic population and the third column shows the percentage of sites that change their selection coefficients in the Domestic population (pc).Last column shows the ID we use to quickly label scenarios along the manuscript.

Figure 2 .
Figure 2. Solid lines showing the inferred demographic histories for the eighteen simulated scenarios.In salmon the Wild population and in light-blue the Domestic population.The dashed line shows the true simulated demography in Domestic populations.The true Wild population

Figure 3 .
Figure 3. Sampling distributions of estimated parameters for the deleterious DFE are obtained using 100 bootstrap replicates.Dotted vertical lines indicate the actual simulated parameter values.A) Shape parameter estimated with polyDFE in dark gray the Wild population and in light gray the Domestic population, B) shape parameter estimated with dadi, C) mean sd estimated with polyDFE in dark gray the Wild population and in light gray the Domestic

Figure 4 .
Figure 4. Sampling distributions of inferred pc are obtained using 100 bootstrap replicates.In light green scenarios without migration and in dark green scenarios with migration.
The r i parameters are fitted independently for each frequency bin (from n = 1 to n = 19), and they are able to correct any distortion that affects equally the SFS of synonymous and non-synonymous variants (such as, in principle, demography or linked selection).Model averaging provides a way to obtain honest estimates that account for model uncertainty.To produce the model average estimates of the full DFE we weight each competing model according to their AIC following the equation 6.1 shown in the polyDFEv2 tutorial ("polyDFE/tutorial.pdfat master • paula-tataru/polyDFE"). tataru/polyDFE/blob/master/postprocessing.R to do the model averaging R) to obtain the AIC values.

Table 3 .
Depending on the scenario, we simulate an average increase in relative fitness (s b ) of 1%, 0.1%, and 0.01%.Positive selection's strength is usually substantially underestimated by polyDFE and dadi, but only polyDFE consistently overestimates the proportion of new advantageous mutations (p b ), regardless of the true simulated value.Supplementary Figure5displays the discretized full DFE.Given the distribution of inferred values of p b and s b , we reason that a peak of effectively neutral advantageous mutations is being measured by polyDFE.The overall excess of effectively neutral advantageous mutations measured by polyDFE is generally balanced by the defect of effectively neutral deleterious mutations.