Host genetic factors associated with the range limit of a European hantavirus

Abstract The natural host ranges of many viruses are restricted to very specific taxa. Little is known about the molecular barriers between species that lead to the establishment of this restriction or generally prevent virus emergence in new hosts. Here, we identify genomic polymorphisms in a natural rodent host associated with a strong genetic barrier to the transmission of European Tula orthohantavirus (TULV). We analysed the very abrupt spatial transition between two major phylogenetic clades in TULV across the comparatively much wider natural hybrid zone between evolutionary lineages of their reservoir host, the common vole (Microtus arvalis). Genomic scans of 79,225 single nucleotide polymorphisms (SNPs) in 323 TULV‐infected host individuals detected 30 SNPs that were consistently associated with the TULV clades CEN.S or EST.S in two replicate sampling transects. Focusing the analysis on 199 voles with evidence of genomic admixture at the individual level (0.1–0.9) supported statistical significance for all 30 loci. Host genomic variation at these SNPs explained up to 37.6% of clade‐specific TULV infections. Genes in the vicinity of associated SNPs include SAHH, ITCH and two members of the Syngr gene family, which are involved in functions related to immune response or membrane transport. This study demonstrates the relevance of natural hybrid zones as systems not only for studying processes of evolutionary divergence and speciation, but also for the detection of evolving genetic barriers for specialized parasites.

| 253 SAXENHOFER Et Al. maximize efficiency for a single host (specialists) and infecting multiple host species (generalists; Ciota & Kramer, 2010;Deardorff et al., 2011). In both cases the capability of infecting additional host species is a major opportunity for a virus to increase its infection range (Ciota & Kramer, 2010).
Such opportunities, however, may be limited as virus spillover infections often lead to evolutionary dead ends and transmissions are confined in the new host Webby et al., 2004). Transmissions to new hosts that allow the virus to proliferate and continue spreading are overall infrequent and mediated by a variety of factors, most importantly the degree of relatedness between the hosts and the frequency of host contacts (Luis et al., 2015;Parrish et al., 2008;Streicker et al., 2010). In recent decades, cross-species transmission, particularly of RNA viruses, has resulted in a large number of disease emergences in humans and livestock, representing a major issue for health systems and economies (Hu et al., 2020;Jones et al., 2008;Lloyd-Smith et al., 2009;Mayer et al., 2017). Understanding the patterns of viral diversity in reservoir species and the identification of factors that affect cross-species virus emergence have become the focus of surveillance and prevention programmes (Lam et al., 2020;Olival et al., 2017).
The constitution of species barriers defining the infection range for a virus has been analysed mostly in the context of zoonotic disease emergence (Longdon et al., 2014;Olival et al., 2017;Plowright et al., 2017;Sironi et al., 2015). Phylogenetic analyses of viruses and their corresponding hosts have suggested that genetic factors play a primary role in determining the infection range of viruses (Faria et al., 2013;Longdon et al., 2014;Streicker et al., 2010). Overcoming a species barrier and establishing an efficient transmission chain in a new host may require multiple adaptive changes in the viral genome, representing a major evolutionary challenge (Kuiken et al., 2006;Parrish et al., 2008;Reperant et al., 2012;Simmonds et al., 2019).
The natural infection ranges of some viruses are restricted to closely related species (Streicker et al., 2010) or even to evolutionary lineages of the same host species (Drewes et al., 2017;Gryseels et al., 2017;Saxenhofer et al., 2019). Experimental studies comparing the infection efficiency of viruses in cells of different host species have allowed the identification of single host genes limiting the species range that certain viruses can infect, such as coronaviruses (van Doremalen et al., 2014), HIV-1 (Stremlau et al., 2004) or influenza A virus (Long et al., 2016). This suggests that relatively few intrinsic differences between otherwise genetically similar hosts can mediate strong barriers to virus emergence. The order of evolutionary events and the relative contribution of adaptive vs. neutral changes is difficult to determine in natural virus-host systems but it is possible that divergence of key genes involved in virus-host interactions could occur early in the process of host speciation (Saxenhofer et al., 2019). Still, it remains generally unclear how a host's genetic factors shape the selective environment of associated viruses and determine their infection range in natural populations.
Here we investigate intrinsic genetic factors delimiting the distribution ranges of Tula orthohantavirus (TULV) among evolutionary lineages of its reservoir host, the common vole (Microtus arvalis), using a genome-wide association study (GWAS) approach.
TULV is a single-stranded RNA virus with a three-segmented genome that is horizontally transmitted and causes asymptomatic chronic infections in the rodent reservoir host (Forbes et al., 2018;Vaheri et al., 2013). In Europe, the large-scale geographical distribution of highly diverged phylogenetic clades in TULV is partially associated with the distribution of morphologically cryptic evolutionary lineages in M. arvalis (Heckel et al., 2005;Schmidt et al., 2016). A detailed study of an intraspecific hybrid zone in the common vole demonstrated a very tight spatial association between the host lineages Central and Eastern and the virus clades Central South (TULV-CEN.S) and Eastern South (TULV-EST.S; Saxenhofer et al., 2019). A strong evolutionary barrier to effective TULV transmission in the hybrid zone operates at distances that individual voles can travel in very short time (Hahne et al., 2011;Saxenhofer et al., 2019;Schweizer et al., 2007) and where gene flow between host lineages is ongoing (Beysard & Heckel, 2014). Deep genomic divergence (17% sequence difference) and the absence of recombination or reassortment between TULV-CEN.S and TULV-EST.S indicate that these viral clades exceed the stage of speciation of the host lineages (Saxenhofer et al., 2019). The zone of contact between these TULV clades contains no physical barriers, such as rivers or changes in altitude, which might impede host movements or TULV transmission and is thus likely to be driven by intrinsic genetic factors that probably arose in the host lineages before secondary contact after post-glacial recolonization formed the hybrid zone (see Saxenhofer et al., 2019).
In this study, we take advantage of this natural system to interrogate the host genome for genetic polymorphisms associated with the sharp and probably selectively maintained distribution range limit of TULV clades that behave effectively as distinct viral species (see Saxenhofer et al., 2019). We focused our GWAS on the zone of natural admixture where the cosegregation of genetic polymorphisms distinguishing host lineages with the TULV clades is partially broken up by many generations of hybridization after secondary contact (Beysard & Heckel, 2014). Employing replicate sampling transects served to limit the potential of detecting geographically restricted associations and enabled the identification of genomic polymorphisms and genes in M. arvalis that may contribute to confining the range of this European hantavirus.

| Sample collection
We sampled 547 common voles mostly from the direct contact area between the TULV-CEN.S and the TULV-EST.S clades and combined them for our analyses with TULV-infected samples available from Saxenhofer et al. (2019) (Table 1 and Table S1). Sampling was per- DNA concentration was measured for each sample using the Qubit dsDNA BR Assay Kit (Invitrogen) and DNA quality was evaluated on an agarose gel. Voles of less than 20 g bodyweight were not assessed for TULV infection as young individuals are typically protected by maternal antibodies (Kallio et al., 2006;Schmidt et al., 2021). TULV infection was detected by amplifying a 540-nucleotide fragment of the nucleocapsid gene on the small TULV genome segment (S-segment) following the RT-PCR (reverse transcriptase PCR) assay described in Essbauer et al. (2006). RNA was extracted from lung tissue, and S-segment fragments of TULV-positive samples were sequenced as described by Schmidt et al. (2016) and Saxenhofer et al. (2019).

| TULV screening and clade assignment
A phylogenetic analysis was performed with mrbayes version 3.2.6 (Ronquist et al., 2012) on the CIPRES platform (Miller et al., 2010) to assign TULV sequences to major evolutionary clades using reference sequences from Schmidt et al. (2016) and Saxenhofer et al. (2019) ( sampling was performed for 10 7 generations in four independent runs comprising four chains. We implemented reversible-jump sampling over the entire general time-reversible substitution model space (Huelsenbeck et al., 2004) and samples were recorded every 10 3 generations after discarding a burn-in fraction of 25%.

| Host genotyping
Genotyping by Sequencing (GBS; Elshire et al., 2011)  used for calling genotypes, except that a minimum of five reads were required to identify a unique tag. Only bi-allelic SNPs were retained, and genotypes were only called if individuals had a read depth for that locus of at least five. After SNP calling, individuals with more than 50% of loci with missing data were removed. Then, loci were filtered out if they had a minor allele frequency (MAF) of less than 5%, more than 20% missing data or observed heterozygosity greater than 50%, which may indicate multiple paralogues merged into single loci (White et al., 2013). Three loci with complex indels were also removed. This filtering resulted in a data set of 323 TULV-infected individuals genotyped at 79,225 SNPs, which corresponds to an approximate density of 1 SNP per 50 kb of vole genome.

| Host population structure
Previous population structure analyses based on microsatellite markers in the hybrid zone within the Bavaria and Porcelain transects have already shown genetic admixture between the common vole lineages in these regions, supporting K = 2 genetic clusters (Beysard & Heckel, 2014;Saxenhofer et al., 2019). To examine the extent to which these patterns of genetic admixture were also present in the infected individuals of our genomic data set, we analysed the genetic structure in the hybrid zone with the admixture software (Alexander et al., 2009;Zhou et al., 2011). The analyses were performed for both transects separately using only TULV-infected individuals. We used initial cross-validation implemented in admixture for K = 1-10

TA B L E 1 Number of common vole
individuals from two sampling transects across the hybrid zone screened for Tula orthohantavirus which supported K = 2 for both transects. We then performed admixture bootstrapping with 1000 replicates for each transect to establish genetic cluster membership of each infected individual.

| Genome-wide association analysis
A Genome-wide Efficient Mixed Model Association (gemma) analysis was conducted with version 0.98.1 of the software (Zhou & Stephens, 2012). As gemma requires a data set without missing data, we used ld-knni as implemented in tassel 5 (Glaubitz et al., 2014) for the imputation of missing SNPs based on the mean of the 10 closest related neighbours. Neighbours were determined by the 30 physically closest SNPs for each missing site and individual. Imputation was performed independently for the whole data set, as well as for each transect separately, and any SNPs still possessing missing data or a MAF of less than 5% were removed. Correcting for population structure within the data set was initially performed using covariates (principal components 1 and 2) and/or relatedness matrix (calculated via gemma version 0.98.1). However, this led to an overcorrection of the GWAS results due to the transition of virus phenotypes (TULV clades) within the host hybrid zone (Saxenhofer et al., 2019).  (Wald, 1945) and corrected for multiple testing with the Bonferroni method (Bonferroni, 1936). Only SNPs which were significantly (p < .05) associated with clade-specific TULV infections across all three GWAS (Full data set, Bavaria transect only, Porcelain transect only) were considered for further analyses. To estimate the effect size of loci associated with TULV clades across both transects, a probit model was fitted on the genotype information of individual SNPs using the glm function followed by calculating Nagelkerke's R-squared (Nagelkerke, 1991) implemented in the fmsb package in R.

| Candidate genes
We analysed a flanking region of 100 kb up-and downstream (based We used Fisher's exact test to find significantly enriched GO terms for both the biological processes and molecular functions of our candidate genes and corrected for multiple testing with the Bonferroni method (Bonferroni, 1936). For all genes with a GO term relevant for virus infection (i.e., related to immune response or membrane/vesicle transport), we examined further if there was published evidence that either they were involved in virus-related pathways, or a family member of a gene involved these pathways.

| TULV infections of common voles in the sampling area
The  Table 1 and Table S1; GenBank accession nos.: where TULV sequences from both clades were detected.  (Beysard & Heckel, 2014;Heckel et al., 2005;Lischer et al., 2013) in both transects, with a clear transition between mostly pure populations at the transect ends ( Figure 2).

|
Of 323 infected individuals, 199 showed evidence of admixture with cluster memberships of 0.1-0.9 to either cluster.

| Genome-wide association across two replicate transects
Filtering of SNPs with a MAF of less than 5% and imputation of missing data for the GWAS provided a total of 60,471 SNPs for the full data set containing both transects, 57,461 SNPs for the Bavaria transect and 64,490 SNPs for the Porcelain transect. In total, 38,715 SNPs were shared between all three data sets. Among those, we found a total of 32 SNPs that showed a significant association with TULV clades in separate GWAS of the three individual data sets (Figure 3). We observed a particularly strong association on Chromosome 6 across a region of ~1 MB, containing five of the 32 significant SNPs and the two SNPs with the lowest p-values.
To assess the potential impact of imputation on the GWAS re-  (Table S3). We detected no significant enrichment of GO-terms among these genes (all p > .3).
Among our candidate genes we identified nine with published evidence of involvement in viral response or infection pathways or that are a family member of such a gene (

| Clade-specific TULV infections and associated host genotypes
Individual-level and spatial patterns of the SNPs at the candidate

| DISCUSS ION
Our analyses demonstrate very clear spatial patterns of local genomic admixture between common vole lineages and corroborate the extremely abrupt transition between parapatric TULV clades within the hybrid zone. Relatively few SNPs in the rodent host were consistently associated with infection by specific TULV clades across replicate sampling transects, explaining up to 37.6% of the variance.
Even at the very fine geographical scale of our analyses, the effects of genetic polymorphisms appear to limit the range of the two TULV clades, providing strong evidence for the evolution of an infection barrier within a host species.
The geographical distribution of pathogens within a host species is impacted by many factors, including landscape features (Guivier et al., 2011) or polymorphisms in resistance against pathogen infections (Alves et al., 2019;Magwire et al., 2012;Rohfritsch et al., 2018;White et al., 2011). Spatially narrow contacts between TULV-CEN.S and TULV-EST.S clades (Figure 1c,d) and an absence of potential dis- of these host genes along the transects represent a major barrier for transmission of differentially adapted TULV clades (Saxenhofer et al., 2019). Single host factors that restrict the infection range and impede virus emergence have been described so far predominantly between relatively distantly related host species with highly divergent genetic and ecological backgrounds (Long et al., 2016;Stremlau et al., 2004;van Doremalen et al., 2014). In hantaviruses, studies on the interactions with the host immune system have revealed several host factors involved in virus infection, persistence and replication (Charbonnel et al., 2014;Easterbrook et al., 2007;Guivier, Galan, Male, et al., 2010;Martínez-Valdebenito et al., 2019;Müller et al., 2019). However, infection barriers for hantaviruses have been characterized mostly in the context of human infections (Liu et al., 2009;Mäkelä et al., 2001;Martínez-Valdebenito et al., 2019;Mustonen et al., 1996;Wang et al., 2009). The rodent host genes identified in this study may present new candidates involved in restricting viral infection ranges and in limiting host shifts of hantaviruses between closely related species.
Among the candidate genes, SAHH, ITCH and Syngr03 as well as its close neighbour TSC2 stand out as particularly promising candidates (Table 2). SAHH enables methylation of a variety of both DNA and RNA motifs, which is crucial for the replication of several virus species such as vaccinia virus, yellow fever and vesicular stomatitis (Borchardt et al., 1984;Tseng et al., 1989). ITCH is an E3 ubiquitin ligase and a common target for viral hijacking to recruit host proteins necessary for virus development and budding in multiple viruses including Ebola (Han et al., 2016), Herpes (Koshizuka et al., 2018) and Influenza A. Syngr03 is a member of the Synaptogyrin gene family, which is involved in vesicular transport and endo-and exocytosis crucial for virus replication (Sessions et al., 2009;Sun et al., 2016;Walker et al., 2018). Among our candidate genes is also Syngr02 with a spatial allele distribution resembling that of Syngr03 (Table 2 and Figure S2).
Finally, TSC2 is involved in anabolic metabolism in cells and a crucial target of the Human Cytomegalovirus UL38 protein, facilitating efficient viral replication (Bai et al., 2015;Moorman et al., 2008). None of these genes has been implicated in hantavirus-host interactions so far, but comparable genomic analyses at similar scale are lacking for other natural systems except for an investigation into the genetics of bank vole tolerance to Puumala hantavirus (PUUV; Rohfritsch et al., 2018). and others have shown the difficulty in identifying key genes or proteins conferring resistance or differences in susceptibility to infection in dead-end hosts such as humans but also in reservoir hosts (Charbonnel et al., 2014;Martínez-Valdebenito et al., 2019;Müller et al., 2019;Rohfritsch et al., 2018). The well-defined ecological, evolutionary and spatial context of this natural system analysed here holds the potential for using even more refined genomic approaches for complementing and prioritizing the list of candidate host factors (see Atkinson et al., 2021;Kwok et al., 2021), which may then feed back into research on pathogenic systems.
Cross-species transmission of viruses is related to ecological and evolutionary diversity in many mammalian and avian host taxa (Allen et al., 2017;, and intrinsic species barriers for viruses may arise mostly as a by-product of host diversification (Cuypers et al., 2020;de Bellocq et al., 2015;Gryseels et al., 2017;Martinů et al., 2020). In the TULV system, Central and Eastern are the evolutionarily closest lineages in the common vole (Beysard & Heckel, 2014;Lischer et al., 2013;Sutter et al., 2013) and the re-

2012). Studying the genomic barriers to virus transmission across
TULV contact zones of different evolutionary divergence analogous to their hosts (Beysard & Heckel, 2014;Beysard et al., 2015) may offer insights into the evolutionary mechanics that drive the divergence of hantaviruses and potentially even the generation of new virus species. For closely related and human-pathogenic PUUV, many virus clades have been described across Europe and partially associated with particular evolutionary lineages in the bank vole host and the regional absence of the disease in humans (Drewes et al., 2017), but the direct link to genomic polymorphisms in the rodent host has not been established.

| CON CLUS IONS
Extant pathogen populations in animal reservoirs are the most common source of outbreaks of infectious diseases in humans and livestock (Jones et al., 2008;. Understanding the factors that affect cross-species virus emergence is the focus of research and prevention programmes but the combat against outbreaks is often impeded by very limited knowledge about the reservoir hosts (Groseth et al., 2007;Shi et al., 2018). As RNA virus evolution is mostly driven by patterns of co-evolution and co-divergence with host taxa (Lin et al., 2012;Mélade et al., 2016;Switzer et al., 2005), combining evolutionary analyses of host and virus divergence directly may allow insights that are difficult to obtain in the nonequilibrium situation of disease outbreaks (Cuypers et al., 2020;Schneider et al., 2021).
The explicit consideration of the spatial context of the association of many hantaviruses with their host taxa may be particularly beneficial for clarifying the relationships and succession of events in evolutionary adaptation or host-species switches (see also de Bellocq et al., 2015Bellocq et al., ,2018Bennett et al., 2014;Cuypers et al., 2020;Gryseels et al., 2017;Guo et al., 2013;Martinů et al., 2020;Saxenhofer et al., 2017Saxenhofer et al., , 2019Worobey et al., 2010). Our analyses demonstrate that detailed examination of natural hybrid zones between host taxa-or more generally admixture between hostshas the potential to aid in identifying not only genetic polymorphisms relevant for developing and maintaining species barriers among the hosts but also those loci that contribute to these processes in tightly associated parasites.

ACK N OWLED G EM ENTS
We thank Susanne Tellenbach for laboratory support, and Melanie

CO N FLI C T S O F I NTE R E S T
The authors declare no conflicts of interest.