EPAS1 Gain-of-Function Mutation Contributes to High-Altitude Adaptation in Tibetan Horses

Abstract High altitude represents some of the most extreme environments worldwide. The genetic changes underlying adaptation to such environments have been recently identified in multiple animals but remain unknown in horses. Here, we sequence the complete genome of 138 domestic horses encompassing a whole altitudinal range across China to uncover the genetic basis for adaptation to high-altitude hypoxia. Our genome data set includes 65 lowland animals across ten Chinese native breeds, 61 horses living at least 3,300 m above sea level across seven locations along Qinghai-Tibetan Plateau, as well as 7 Thoroughbred and 5 Przewalski’s horses added for comparison. We find that Tibetan horses do not descend from Przewalski’s horses but were most likely introduced from a distinct horse lineage, following the emergence of pastoral nomadism in Northwestern China ∼3,700 years ago. We identify that the endothelial PAS domain protein 1 gene (EPAS1, also HIF2A) shows the strongest signature for positive selection in the Tibetan horse genome. Two missense mutations at this locus appear strongly associated with blood physiological parameters facilitating blood circulation as well as oxygen transportation and consumption in hypoxic conditions. Functional validation through protein mutagenesis shows that these mutations increase EPAS1 stability and its hetero dimerization affinity to ARNT (HIF1B). Our study demonstrates that missense mutations in the EPAS1 gene provided key evolutionary molecular adaptation to Tibetan horses living in high-altitude hypoxic environments. It reveals possible targets for genomic selection programs aimed at increasing hypoxia tolerance in livestock and provides a textbook example of evolutionary convergence across independent mammal lineages.


Introduction
Hypoxia and ultraviolet exposure are key characteristics of high-altitude environments. Recent functional genomic studies have revealed the genetic basis of adaption to highaltitude hypoxia in Tibetan people (Simonson et al. 2010;Yi et al. 2014), Tibetan dogs , antelopes (Ge et al. 2013) and gray wolves (Zhang et al. 2014). Although other selection targets have been reported, the endothelial PAS domain protein 1 (EPAS1) gene often appears as a common selection target, providing a striking example of convergent evolution across a range of mammals exposed to similar environmental pressure in the Qinghai-Tibetan (QT) Plateau. The EPAS1 gene encodes one subunit of the hypoxiainducible factor (HIF) and shows multifarious effects, including the regulation of angiogenesis, hemoglobin concentration (HMG) and erythrocytosis (Beall et al. 2010). Other mechanisms, such as the higher levels of nitric oxide (NO) enhancing the pulmonary blood flow and oxygen delivery (Hoit et al. 2005), are also known to facilitate life at high altitude. Additionally, other convergent modifications of metabolic pathways and the composition of rumen microbiota  have been described in Tibetan Sherpas (Horscroft et al. 2017), the yak (Qiu et al. 2012), and other high-altitude mammals (Ge et al. 2013).
Many livestock species other than the dogs and the yak live in the QT Plateau (above 3,000 m), including chicken , sheep (Wei et al. 2016), and goats (Song et al. 2016). They are all known to exhibit higher hemoglobin levels than their lowland relatives. Domestic horses have also inhabited the QT Plateau for thousand years where they have significantly contributed to the development of human societies (Chen et al. 2015). The biological basis for their adaptation to life at high altitude has, however, not yet been studied.
In contrast to Tibetan horses, previous work has studied the adaptive response to high altitude in Andean horses and identified selection signatures within a genomic block encompassing the EPAS1 gene (Hendrickson 2013). However, the density of the genotyping data collected was limited to 50 k SNPs, and the causative mutation(s) underlying horse adaptation to life at high altitude in the Andes could not be identified.
In this study, we undertook the first genomic analysis of selection signatures in Tibetan horses, aiming at identifying the genetic basis for adaptation to high-altitude hypoxia in the QT Plateau. To achieve this, we sequenced the genome of 138 Chinese horses spanning a wide range of altitudes, from the lowland regions of the Northeastern Plain to the mountains of the QT Plateau. We identified that the EPAS1 gene and the hemoglobin subunit epsilon 1 (HBE1) gene showed some of the strongest signatures of positive selection in the Tibetan horse genome. Further exploration of EPAS1genotypes in a larger panel of 908 horses and functional validation assays suggested that at least one missense mutation in one of the PAS domains is a key driver of the physiological response of Tibetan horses to high-altitude hypoxia.

Genomic Variation
We used Illumina HiSeq instruments to generate wholegenome sequencing data of 138 horses sampled from China (supplementary table S1, Supplementary Material online and fig. 1A). This provided a total of 25.87 billion of read pairs across Chinese domestic breeds (5-10 individuals per breed). Our data set also included seven Thoroughbreds and five Przewalski's horses for comparison. The median genome coverage achieved across the full data set was $7.5X (min ¼ 3. 1X (Alexander et al. 2009) was largely consistent with the PCA results (supplementary fig. S5, Supplementary Material online) and supported both the similar genetic makeup of the Tibetan and Qinghai horses, as well as the presence of three major clusters, consisting of the NC, SW, and QT Chinese native horses.
We next investigated the phylogenetic relationships amongst the clusters identified within Chinese native horses. Both the ML-TreeMix tree (Pickrell and Pritchard 2012) and the distance-based Neighbor-Joining tree (Saitou and Nei 1987) were in line with the results of the PCA and Admixture analyses and supported a three-clade population structure, separating the southwestern montane horses, northern lowland horses and Qinghai/Tibetan horses ( fig. 1C and D). The reliability of the Neighbor-Joining tree was estimated by 100 bootstrap pseudoreplicates. BEAST (Drummond and Rambaut 2007) phylogenetic inference based on the 4-fold degenerate (synonymous) nucleotide sites also supported the genetic structure around three main geographic areas of NC, SW and QT (supplementary fig. S6, Supplementary Material online). Interestingly, important migration weights that showed the magnitude of migration events, were inferred by TreeMix between breeds from the QT Plateau and the southwestern montane area ( fig. 1C,  supplementary fig. S7, Supplementary Material online). Whether this reflects the influence of the so-called "Ancient Tea-Horse Road," which connected Tibet and Yunnan Province over the last thousand years (Yang 2004), remains to be tested. Yunnan, situated in Southwest China and bordering Tibet in the north, is famous for its Pu-erh tea. Interestingly, none of the six migration edges considered in TreeMix involved Przewalski's horses ( fig. 1C, supplementary  fig. S7, Supplementary Material online), ruling out their significant (if any) genetic influence on Chinese horses. This is in line with ancient genome evidence showing that Przewalski's horses represent the feral descent of Eneolithic horses that did Liu et al. . doi:10.1093/molbev/msz158 MBE not contribute to the genetic makeup of modern domestic horses (Gaunitz et al. 2018).
We next investigated the demographic history underlying the evolutionary origins of the three main population clusters identified. To achieve this, we compared three possible population scenarios showing each one of the three main clusters branching out first, and a fourth scenario in which all three clusters differentiated at the same time. The latter scenario was best supported by the oaoi algorithm (Gutenkunst et al. 2009 (CDM, Chaidamu; HeQu, Hequ; DaTo, Datong) living at an altitude of at least 3,000 m. The orange color represents the Southwestern horses (DeBa, Debao pony from Guangxi; JiCh, Jianchang pony from Sichuan; NiQi, Ningqiang pony from Shannxi). The green represents the Northern horses (MoGo, Mongolian horse from Inner Mongolia; WMG, Mongolian horse from Mongolia; ELC, ErlunChun horse from Heilongjiang; YaQi, Yanqi horse from Xinjiang; Yili horse from Xinjiang). The colors of symbols that indicate the geographic regions are the same as those in the PCA plots and phylogenetic trees. (B) PCA plots of the first two components of all horse samples (inner plot) and all Chinese native horses (outer plot). The fraction of the total variance explained is reported on each individual axis between parentheses. (C) The ML-TreeMix tree of all horses, with PrZ as outgroup, assuming four migration events. Four migration events that are most consistent with known events, because they increased the explained variance to 99.8% (Pickrell and Pritchard 2012), and does not increase afterwards (supplementary table S25, Supplementary Material online). Migration arrows are colored according to their weights. Horizontal branch lengths are proportional to the amount of genetic drift parameter that has occurred on the branch. The drift parameter measures the variance in allele frequency estimated along each branch of the tree. The yellow and orange lines indicate the instantaneous admixtures, whereas arrows denote continuous (unidirectional) gene flow. (D) The Neighbor-Joining tree of the horse breeds, with PrZ as outgroup. Bootstrap reported was close to 100%. (E) @a@i bestsupported population model depicting the evolutionary trajectories of the main three clusters of Chinese native horses. The light blue, green, dark blue and orange rectangles represent the ancestral, Northern China (NC), Qinghai-Tibetan (QT) and Southwestern (SW) populations, respectively. The numbers within the rectangles represent the effective size (individual horses) for the corresponding population. The average number of migrants per year between the different groups is shown between the black arrows. Ne ¼ effective size (individuals). T ¼ time of divergence (years).
The four scenarios tested above considered that the three main lineages of Chinese native horses evolved in isolation after divergence. To accommodate the possibility of gene flow suggested by the TreeMix analyses, we further refined the fourth scenario, considering asymmetric migrations. This further improved the likelihood of the model ( fig. 1E and supplementary fig. S8, Supplementary Material online) and indicated that the three lineages of NC, SW and QT horses diverged $3,700 years ago (average ¼ 3,676 years BP, bootstrap 95% CI ¼ 3,672-3,712 years ago; supplementary table S8, Supplementary Material online). This divergence time is consistent with 1) current time estimates for the expansion of the ancestors to all modern domesticated horses (Gaunitz et al. 2018), 2) the first archaeological evidence of draught horses in China during the early Bronze Age ($4,300 BP to $3,800 BP) and of horse carriage in late Shang dynasty ($3,300 BP to $3,000 BP, supplementary table S9, Supplementary Material online), 3) archaeological findings suggesting the presence of horses in Tibet in the second millennium BCE (Chen et al. 2015;Hu et al. 2019), and 4) the emergence of pastoral nomads in northwestern China (Chen et al. 2015). The best-supported demographic model also indicated asymmetric gene flow among the three clusters, with the largest migration contributions from QT and NC into SW, respectively (log-likelihood ¼ -351,986, migration ¼ 22.03 and 8.33 individuals/year, respectively, fig. 1E). This is consistent with the migration edges inferred by TreeMix. F3-statistics showed the gene flow between NC and SW and also between NC and QT. Furthermore, D-statistics supported the significant gene flow between SW and Tibetan (D [PrZ, SW; NC, Tibetan] ¼ 0.0073, Z ¼ 4.54), as well as between NC and QT (D [PrZ, NC; QT, SW] ¼ À0.0031, Z ¼ -2.81) (supplementary tables S10 and S11, Supplementary Material online).

Genome-Wide Selection Scans for Adaptation to Life at High Altitude
We next scanned the Tibetan horse genomes for signatures of positive selection, leveraging allele frequency differences and patterns of molecular diversity within 100 kb sliding windows (heterozygosity and nucleotide diversity) relative to lowland horse breeds ( fig. 2). The top-5% selection candidates that were common to all three tests carried out represented a total of 159 windows spanning 440 protein-coding genes (supplementary tables S12-S15, Supplementary Material online). Enrichment analyses for Gene Ontology terms revealed that the hemoglobin complex (GO: 0005833), the oxygen transport (GO: 0015671), and the oxygen transport activity (GO: 0005344) categories were significantly overrepresented (P-values ¼ 0.02-0.05, supplementary table S16, Supplementary Material online). This is consistent with the extremely hypoxic environmental conditions in the QT Plateau. Importantly, the genomic window containing the EPAS1 (HIF2a) locus was the top selection candidate identified (chromosome 15, fig. 2) whereas the second candidate region included the HBE1 gene (chromosome 7, fig. 2). Both genes are known to be involved in oxygen transport and hypoxia adaptation ).

Annotation of Variants under Positive Selection in EPAS1
We next sought to further refine the selection targets within the two top-candidate regions (EPAS1 and HBE1), possibly at the individual SNP level by using three different methods, h p ratio of the low-altitude horse breeds to the high-altitude horse breeds (h pLL /h pQT ), Tajima's D and F ST . We noticed that two SNPs within the most significant selection region represent missense variants of the EPAS1 gene ( Material online). The EPAS1 protein is highly conserved among mammals, with two heterodimerization PAS domains showing close to 100% sequence similarity amongst 22 different species, including humans and horses living at low altitude. The missense variants (hereafter referred to as SNP1 and SNP2, respectively) are not present in Przewalski's horses, suggesting that they arose in an independent genetic background.
To further investigate the possible functional consequences of SNP1 and SNP2 variants, we investigated whether they showed any association with high altitude, extending our analysis to a more comprehensive panel of 908 horses originating from 29 populations (min. 20 animals per population, supplementary table S17, Supplementary Material online). This analysis supported a strong correlation between the frequencies of both missense variants and altitude (P-value ¼ 6.02E-05 and 2.00E-07, respectively). Tibetan horse breeds living >4,000 m showed the highest allelic frequencies of both variants (almost 0.8) whereas horses living at low-altitude (below 1,000 m) showed the lowest ( 0.05). Allelic frequencies decreased to 0.6-0.7 in the QT breeds (3,000-3,500 m) and to 0. Using the same approach, we also explored the possible association of one missense SNP present in the HBE1 gene (SNP3). While the association was significant, it was not as strong as those detected at the SNP1 and SNP2 of EPAS1 gene (P-value ¼ 5.5E-04) (supplementary figs. S11 and S12, Supplementary Material online). This suggests that the two variants of EPAS1 likely have larger biological effects than HBE1. The PolyPhen-2 score of SNP1 (R144C) is larger than that of SNP2 (E263D) (supplementary fig. S13, Supplementary Material online), suggesting larger functional effect for the former. It may thus represent the major genetic mutation underlying high-altitude adaptation in Tibetan horses.

Functional Effects of the Two Selected EPAS1 Missense Mutations
We next sought to functionally assess the effects of the R144C and E263D missense mutations in adenocarcinoma human Liu et al. . doi:10.1093/molbev/msz158 MBE alveolar basal epithelial cells (A549). This cell line was selected as it shows high expression levels of the EPAS1 protein and its target genes (Sato et al. 2002). Western blot analyses revealed that the expression levels of the two GFP tagged-mutant EPAS1 proteins carrying each individual variant were higher than wild-type ( fig. 4C and supplementary fig. S14, Supplementary Material online). This suggests that both mutations can stabilize the EPAS1 protein by altering the structure of PAS domains.
Under hypoxic conditions, the EPAS1 protein is known to be translocated into the nucleus, where it binds the aryl hydrocarbon receptor nuclear translocator ARNT (HIF-1b) via PAS domains (Luo and Shibuya 2001). The resulting heterodimer can then bind to DNA hypoxia response elements (HRE) where it initiates the transcription of key genes involved in the response to hypoxia, such as erythropoietin (EPO), lactate dehydrogenase A (LDHA), endothelin (EDN1), vascular endothelial growth factor A (VEGFA), von Hippel-Lindau (VHL) and prolyl pydroxylase domaincontaining protein 2 (PHD2, also EGLN1; Lee et al. 2011). We thus hypothesized that the two mutants identified may influence the structure of the PAS domains and the heterodimerization affinity of EPAS1 for ARNT.
To test this hypothesis, we co-expressed myc-tagged EPAS1 and flag-tagged ARNT proteins in A549 cells and carried out a coimmunoprecipitation assay using antimyc antibody. This was aimed to isolate the heterodimer formed and assess whether any of the two mutations could affect the interaction ( fig. 4D). Coimmunoprecipitation assays confirmed the strong protein-protein interaction between the wild-type EPAS1 and ARNT proteins. This interaction was greatly enhanced in the EPAS1 R144C mutant but did not significantly change for the E263D mutant. This suggests that the R144C missense mutation, but not the E263D missense mutation, increases the heterodimerization affinity of EPAS1 for ARNT, in line with the PolyPhen-2 scores of the variants. Genetic Adaptation to High-Altitude in Tibetan Horses . doi:10.1093/molbev/msz158 MBE We further examined whether the EPAS1missense mutations could alter the transcriptional response to hypoxia using RT-qPCR assays ( fig. 4E). The mRNA expression levels of four out of the seven downstream genes tested (EDN1, LDHA, VHL and EPO) were significantly increased in R144C transfected cells ( fig. 4E). No significant changes in RNA expression levels were observed in the E263D transfected cells or in the three other genes tested (VEGFA, EGNL2 and EGNL3). This confirmed that the R144C missense mutation as a gain-of-function mutation, which amplifies the transcriptional activity mediated by EPAS1 in response to hypoxia.

EPAS1 Missense Mutations Correlate with Improved Oxygen Carrying Traits and High Anaerobic Metabolic Capacity in Tibetan Horses
We next measured a total of 23 physiological blood characteristics in the Tibetan (N ¼ 88) and lowland horses (N ¼ 85) that were also genotyped at SNP1 (the G allele results in the presence of an arginine, R at amino acid position 144 whereas the A allele introduces a cysteine, C at that position). We found a significant statistical association between the EPAS1 SNP1 (R144C) mutation and hemoglobin levels (HMG, P-value ¼ 1.11E-17), the volume of red blood cells (RBCV, P-value ¼ 1.20E-21), the mean corpuscular hemoglobin concentration (MCHC, P-value ¼ 5.12E-13) and three other hematological parameters (HBDH, P-value ¼ 4.79E-16; LDH, P-value ¼ 2.80E-9; CK, P-value ¼ 6.33E-5). Interestingly, the different hematological parameters investigated were found to show different associations with particular EPAS1 genotypes, with A homozygous carriers showing lower RBCV and HMG but larger MCHC (and a-HBDH3, LDH and CK Genetic Adaptation to High-Altitude in Tibetan Horses . doi:10.1093/molbev/msz158 MBE levels; fig. 5, supplementary tables S18 and S19, Supplementary Material online). Additionally, Tibetan horses showed $15% higher CK levels than lowland horses, possibly reflecting the absence of oxidative stress observed in response to hypoxia ( fig. 5 and supplementary

Discussion
In this study, we sequenced the genomes of 138 horses from 17 Chinese breeds spanning a wide range of altitudes, from the lowlands of the NC plains to the QT Plateau. Our extensive genome data set allowed us to identify a total of $10.3 million SNP variants (plus $2.3 M indels and CNVs) that helped unveiling the population structure of Chinese horses at unprecedented levels. In line with previous work based on 27 microsatellites (Ling et al. 2011), we found that the genetic landscape matched broad geographic subdivisions, with three main clusters in the NC, SW and the QT regions ( fig. 1A-E). Statistical modeling revealed that the three main clusters split simultaneously $3,700 years ago. This follows in $500 years the early demographic expansion of the lineage ancestral to all modern domestic horses (Sanchez-Mazas 2012) and predates the earliest archaeological evidence of horse chariotry in the early Shang dynasty by $500 years (Sanchez-Mazas 2012). Altogether, this indicates that domestic horses radiated throughout the southern, northern and montane areas of China soon after their first arrival in the country. Determining whether domestic horses first arrived to China harnessed to spoke-wheel chariots or mounted purpose will require further archaeological work.
Scanning the genome of high-altitude Tibetan horse breeds for signatures of positive selection revealed EPAS1 and HBE1 genes within the top-candidates ( fig. 2). The same selection targets have been described in a number of other organisms, including Sherpa humans (Horscroft et al. 2017), Tibetan dogs ) and yak (Qiu et al. 2012). This repeated selection of similar physiological and metabolic pathways in response to high-altitude hypoxia provides a textbook example of parallel evolution (Beall et al. 2010;Wang et al. 2014;Zhang et al. 2014). Previous studies have established a key role for EPAS1 in mediating the blood physiology of Tibetan Sherpas (Horscroft et al. 2017). EPAS1 SNP variants unique to native Tibetans as well as two gain-offunction mutations in the EGLN1/PHD2 negative regulator of HIF, which include EPAS1 (Lorenzo et al. 2014), have indeed been found to be significantly associated with lower HMGs (Beall et al. 2002). Similar to human studies on large Tibetan cohorts (Yang et al. 2017), we found that both HMG and HCT of Tibetan horses were lower than those of lowland horses ( fig. 5B and C). In contrast to humans, the MCHC of Tibetan horses was, however, higher, a pattern reminiscent of what is also observed in Tibetan dogs (Gou et al. 2014), Tibetan sheep (Wei et al. 2016) and Tibetan Cashmere goats (Song et al. 2016). This reduced number of red blood cells (RBCs) found in the Tibetan horses likely facilitates blood circulation, while the increased HMG per cell ensures that more oxygen can be carried, delivered and ultimately consumed. The increased LDL and a-HBDH3 levels measured in Tibetan horses suggest indeed greater capacity for anaerobic glucose fermentation and ATP production relative to lowland horses, similar to what is observed in human Sherpas (Horscroft et al. 2017).
At the molecular level, our work reveals that two nonsynonymous mutations located in the otherwise conserved PAS-A and PAS-B domains of the EPAS1 gene are key adaptations to hypoxia in Tibetan horses. These mutations are associated with higher stability and/or activity of the EPAS1 protein in transfected A549 cells, in line with previous observations (Prabhakar and Semenza 2012). This ultimately results in increased transcription of key downstream gene targets within the HIF signaling pathway involved in the response to hypoxia, such as VEGFA, LDHA and VHL (Bigham and Lee 2014). Interestingly, the SNP variants underlying hypoxia adaptation in Tibetan horses were found to segregate at low but nonzero frequencies in a number of lowland breeds. These mutations may have different effects on the regulation of erythropoiesis as well as on metabolism, which is another major target of HIF regulation. Future genomic selection programs in commercial breeds carrying these variants might help overcome the physiological deficiency of lowland animals when transported to high altitude.

Conclusions
Our study provides the first in-depth whole-genome sequence analysis of Chinese native horse breeds. It includes a total of 138 individual genomes spread across 17 breeds and spans the entire altitudinal gradient, from the lowland region of northeast China to the high-altitude QT Plateau. Patterns of neutral molecular diversity reveal that China was populated by the ancestral lineage of modern domestic horses in the early Bronze Age, $3,700 years ago, and that an early radiation in the southern, northern and montane QT range formed the three main clusters of present-day diversity. Genomic regions showing signatures of positive selection in QT horses revealed EPAS1 as the top candidate for adaptation to life at high altitude. Genotyping data across a large panel of 908 lowland and high-altitude horses revealed that two missense variants in two PAS domains of the EPAS1 gene are strongly associated with altitude, and reach close-to-fixation ($80%) frequencies in high-altitude Tibetan horses. Functional assays demonstrate that these variants result in higher stability and heterodimerization activity of the EPAS1 protein, as well as low RBCVs, high HMGs and increased anaerobic metabolism capacities. Our approach showcases the power of population genomic tools and selection scans over more classical genome-wide association studies to shortlist a number of selection candidates that can be further functionally validated in a cost-effective manner. Our study adds the horse to the list of other Tibetan mammals showing EPAS1 as a key adaptive driver to life at high altitude. This not only provides a textbook example of convergent evolution but Liu et al. . doi:10.1093/molbev/msz158 MBE also offers new opportunities to the breeding industry in the form of novel selection targets for improving lowland horse welfare following exposure to hypoxic environments.

Ethics Statement
Handling and sampling of horses were carried out in full respect to animal welfare. All procedures involving the handling of horses were approved by the Animal Care and Use Committee of the Chinese Academy of Agricultural Sciences and the Ministry of Agriculture of the People's Republic of China (IASCAAS-AE-03).

Samples
A total of 138 horses representing 126 Chinese native horses, five Przewalski's horses and seven Thoroughbred horses were selected for whole-genome sequencing (supplementary table S1, Supplementary Material online and fig. 1A). Half of the 126 Chinese native horses live above 3,300 m in the QT Plateau, with 35 animals from Tibet (10 Jiangzi, JiZi; 10 Langkazi, LKZ; 6 Mozhu, MoZh; and 9 NiMu, NiMu) and 26 from Qinghai (6 HeQu; 10 Datong, DaTo; and 10 Chaidamu, CDM). All other native horses investigated originated from low-altitude areas (below 1,000 m), mainly from the diverse geographic areas of China, including NC, with 17 horses from Xinjiang (YiLi and YaQi), 7 from Jilin (ELC), 8 from Inner Mongolia (MoGo) and 7 from Mongolia (WMG); and SW China, with 8 horses from Sichuan (JiCh), 8 from Shaanxi (NiQi) and 10 from Guangxi (DeBa). A minimum of two separate flocks were sampled for each breed or location, and parent/offspring pairs were excluded. Closely related individuals were also disregarded based on available microsatellite data (Ling et al. 2011). This large collection of horse samples was further used in the current study to validate the identified variants (supplementary table S17, Supplementary Material online).

Whole Genome Resequencing Analysis
The extracted DNA from each collected sample was sequenced on Illumina HiSeq XTen/2500 instruments at BerryGenomics Company (Beijing, China). High-quality trimmed read pairs were aligned against the reference horse genome assembly EquCab 2.0 genome (Wade et al. 2009) using BWA (version: 0.7.12) with default parameters (Li et al. 2009). Only read pairs that showed mapping qualities superior to 20 and that were identified as uniquely aligned by the Picard Mark Duplicates (picard.sourceforge.net, version 1.86) were further considered for calling genotypes. This was carried out using two methods, including the Genome Analysis Toolkit version 2.4 (McCormick et al. 2015) and SAMtools (Zhou et al. 2015), and the set of genotypes common to both methods was retained to minimize the false positive SNP calls. Specifically, SNPs were retained if 1) their Genetic Adaptation to High-Altitude in Tibetan Horses . doi:10.1093/molbev/msz158 MBE confidence scores (QD) was superior or equal to 20; 2) their Phred-scaled P-values of the Fisher's exact test aimed at detecting strand bias (FS) was inferior or equal to 10; 3) the Z-scores of the Wilcoxon rank sum test of Alt versus Ref read position bias was superior or equal to -8; 4) the SNP quality scores of each individual SNP were greater than average; and 5) they were biallelic and found at a minimal allele frequency of 5%. The detailed assembly for SNP, indel and CNV calling are described in the supplementary methods, Supplementary Material online.
Population Structure and Phylogenetic Analysis All SNPs were pruned using PLINK (Version: 1.90 b), considering window sizes of 1,000 variants, a step size of 5 and a pairwise r 2 threshold of 0.5 (-indep-pairwise 1000 5 0.5). This retained a total of 1,449,645 independent SNPs for all subsequent analyses. Principal component analysis (PCA) was conducted using the GCTA 1.91 software (Xu et al. 2011). For the first PCA plot, all 138 animals were used, with the first three principal components cumulatively explaining 11.95% of the total variance. Population structure was evaluated using ADMIXTURE (Version: 1.3.0; Alexander et al. 2009), considering a total of 10,000 iterations and two to nine genetic clusters (K). The Neighbor-Joining tree was constructed using PHYLIP 3.68 (evolution.genetics.washington.edu/phylip.html). MEGA5 (Tamura et al. 2011) and FigTree software (tree.bio.ed.ac.uk/software/figtree/) were used to visualize the phylogenetic trees. BEAST v1.10 (Drummond and Rambaut 2007) was run on the data set using a log-normal relaxed clock model, for 100,000,000 states to generate the maximum clade credibility (MCC) tree (sampling frequency ¼ 1/1,000, burnin ¼ 10%). We used the HEK_GAMMA model of nucleotide substitution, with four rate categories for gammadistributed rates across sites. The MCC tree returned from BEAST was then visualized using SpreaD3 (Bielejec et al. 2011). SNPs at 4-fold degenerate synonymous sites were used in the BEAST analyses in order to reduce the data size and make them computationally manageable instead of SNPs in noncoding regions used in oaoi. For further information, see the supplementary methods, Supplementary Material online.
Demographic History Inference Using @a@i We reconstructed the Chinese native horse demographic history using the diffusion approximation method for the allele frequency spectrum (SFS) implemented in oaoi. This program can also assess the statistical support of isolation models versus models including migration. oaoi analyses were restricted to SNPs located in noncoding regions to mitigate potential effects of selection that could interfere with demographic inference (Zhang et al. 2018). We considered the three major phylogenetic clusters, consisting of NC, SW and QT, and tested four possible demographic scenarios, each depicting different divergence orders of the three horse clusters. We selected the model showing the highest log-likelihood value as the optimal model (Model 4). We also used fastsimicoal2 (Excoffier and Foll 2011) to confirm our model selection. The detailed models, statistics and python scripts to implement the simulations are shown in supplementary tables S7, S8 and supplementary methods, Supplementary Material online. After model selection, scaled parameters for the bestsupported model were transformed into the real values using the same l ¼ 7.242e-9 (per site per generation) and g ¼ 8 (year) as described in the study (Orlando et al. 2013). Gene flow was modeled as discrete migration events at a certain time after population divergence.

Migration Events
Admixture analysis was conducted by TreeMix (Version: 1.12; Pickrell and Pritchard 2012) at the population level. After converting the PLINK SNP matrix to TreeMixformat by the plink2treemix.py python script, we constructed the ML tree using Przewalski's horses as an outgroup, both in the absence (m ¼ 0) and the presence (1 m 6) of migration.

Genomic Selection Signatures
We scanned the Tibetan horse genomes for signatures of positive selection using the population-differentiation value (F ST ; Weir and Cockerham 1984), the nucleotide diversity (h p ) ratio (h p-LL /h p-QT ) (Nei and Li 1979) of the lowland (LL) group to QT and the transformed heterozygosity score (ZH P ). The window-based ZH P approach was calculated as follows: ZH P ¼ (Hp -lHp)/rHp, where l is the overall average heterozygosity and r is the standard deviation of all windows within each group (Rubin et al. 2012; supplementary tables S12-S14 and supplementary methods, Supplementary Material online). Evidence for positive selection in response to high-altitude hypoxia was evaluated by contrasting differentiation indices between the Chinese native horses from the QT Plateau (>3,300 m) and the LL Chinese horses (<1,000 m). All diversity indices (F ST , ZH P , and h p ) were calculated using 100 kb sliding window (step size ¼ 15 kb) using VCFTools (-fst-window-size, -fst-window-step, and -weirfst-pop;Danecek et al. 2011;Cavadas et al. 2019). The significance threshold was set to the top 5% for each individual test. Only those windows that were returned positive in all three tests were further considered as potential selection candidates, and the corresponding genes were used in Gene Ontology enrichment analysis (supplementary table  S15

Physiological Association
We collected two sets of jugular venous blood samples from 88 adult Hequ horses (HeQu) living at an altitude of 3,500 m and 85 adult Guanzhong horses (GZg) living at an altitude below 100 m. The first set was collected in coagulant tubes, whereas the other was collected in anticoagulation tubes (BD,3678124). We used the IDEXX Vet Autoread Blood Analyzer (IDEXX, America) on-site to measure six hematological parameters on the former set. These included RBC counts, HMG, HCT, white blood cell (WBC) and MCHC. All on-site measurements were completed within 2 h after blood collection. The other functional assays were conducted in the laboratory on the second blood set, following serum preparation within 12 h after blood collection. The serum was stored at -20 C until further blood biochemical parameters were measured using a HITACHI 7080 Autoread Biochemical Analyzer (HITACHI, Japan) following the manufacturer's protocol. The blood parameters measured included glutamic-pyruvic transaminase (ALT), glutamic oxaloacetic transaminase (AST), total protein (TP), A/G, alkaline phosphatase (ALP), serum lactate dehydrogenase (LDH), creatine kinase (CK), a-hydroxybutyrate dehydrogenase (HBDH), blood glucose (CLU), total cholesterol (CHOL), triacyl glycerol (TG) and calcium (Ca). Additionally, all samples were further genotyped for EPAS1 R144C (SNP1), E263D (SNP2) and HBE1V76A (SNP3) to test for possible genetic association with any of the physiological parameters measured.

Statistical Analysis
The statistical test was performed by using ANOVA methods (R packages, stats). Further information pertaining to the blood tests conducted here is provided in supplementary tables S18 and S19, Supplementary Material online. The fixed and random model Circulating Probability Unification (FarmCPU; Liu et al. 2016) were used to calculate the effective values (effB). This approach handles confounding issues between covariates and test markers using both the fixed effect model (FEM) and a random effect model (REM). Sex correction was used as FEM in this study. Then, the effB in the FarmCPU result was regarded as the allele substitution effect, and the proportion of phenotypic variance explained by each significant SNP was estimated as follows: where p and q are the allele frequencies, b is the estimated allele substitution effect, and S 2 is the sample phenotypic variance (Velie et al. 2018).

Constructs and Mutagenesis
The pcDNA3.1 vector with a GFP flag was purchased from Invitrogen Company, whereas the myc-tagged and flagtagged pcDNA3 vectors were provided by Dr Qinghe Li . The wild-type EPAS1 gene was amplified from the cDNA of lowland humans. The mutagenesis of the EPAS1 sequence (R144C and E263D) was achieved through site-directed mutagenesis (Genewiz) and verified by Sanger sequencing. Briefly, the complementary oligo deoxyribonucleotide primers were designed at the mutation sites, and then polymerase chain reaction (PCR) was used to generate fragments including the mutations. After double DNA digestion with NheI and BamHI (Beyotime), the EasyGeno Rapid Recombinant Clone Kit (VI201, TIANGEN) was used to subclone the wildtype EPAS1 cDNA, the SNP1R144C sequence (G to A) or the SNP2 E263D sequence (T to A) into the pcDNA3.1 plasmid (Invitrogen) for the overexpression analysis. For the subsequent co-immunoprecipitation experiment, these three sequences of the EPAS1 gene were subcloned into the myc-tagged plasmid, and the human ARNT gene was subcloned into the flag-tagged plasmid. All the recombinant plasmids were purified using the EndoFree Plasmid Mega or Giga Kits following the manufacturer's instructions (Qiagen). Transfection, Western blotting and quantitative PCR analysis were performed according to the supplementary methods, Supplementary Material online. As the gene Erythropoietin (EPO) is only expressed in HepG2 cell, we also conducted transfection and RT-PCR in HepG2 cells to monitor EPO levels in the wild type, R144C and E263D EPAS1 backgrounds.

Co-Immunoprecipitation
Cells co-expressing myc-tagged EPAS1 mutants and flagtagged ARNT proteins were lysed with RIPA buffer (50 mM Tris-HCl, pH 7.4, 150 mM NaCl, 0.25% deoxycholic acid, 1% NP-40, 1 mM EDTA, and 0.5% SDS supplemented with protease inhibitor) after two consecutive washing steps with PBS and centrifuged at 4 C for 20 min (14,000 rpm). The supernatant was incubated under agitation at 4 C for 2 h with 20 ml Protein A/G bead slurry (Abmart) that was prewashed with RIPA buffer 4 times. The beads were then centrifuged at 4 C for 2 min to obtain a precleared supernatant. A fraction of the latter was frozen at -80 C and served as the "Input" for Co-IP experiments, whereas the remainder was immunoprecipitated with 20 ml prewashed antimyc linked affinity gel beads (Abmart) at 4 C for 2 h or overnight with rotation.
Genetic Adaptation to High-Altitude in Tibetan Horses . doi:10.1093/molbev/msz158 MBE After several washing steps (i.e., rinsing twice and washing twice for 30 min at 4 C with rotation), the gel beads together with the linked EPAS1 complexes were denatured for WB analysis with an antiflag antibody and antimyc mouse monoclonal antibody (Abmart) to detect the protein interaction between the EPAS1 mutants and ARNT ).

Data Availability
All the whole genome Illumina sequencing reads have been deposited in the Sequence Read Archive (https://www.ncbi. nlm.nih.gov/sra; last accessed July 8, 2019) with the accession codes (BioProject ID: PRJNA515160

Supplementary Material
Supplementary data are available at Molecular Biology and Evolution online.