High-Quality Chromosome-Level Genome Assembly of the Corsac Fox (Vulpes corsac) Reveals Adaptation to Semiarid and Harsh Environments

The Corsac fox (Vulpes corsac) is a species of fox distributed in the arid prairie regions of Central and Northern Asia, with distinct adaptations to dry environments. Here, we applied Oxford-Nanopore sequencing and a chromosome structure capture technique to assemble the first Corsac fox genome, which was then assembled into chromosome fragments. The genome assembly has a total length of 2.2 Gb with a contig N50 of 41.62 Mb and a scaffold N50 of 132.2 Mb over 18 pseudo-chromosomal scaffolds. The genome contained approximately 32.67% of repeat sequences. A total of 20,511 protein-coding genes were predicted, of which 88.9% were functionally annotated. Phylogenetic analyses indicated a close relation to the Red fox (Vulpes vulpes) with an estimated divergence time of ~3.7 million years ago (MYA). We performed separate enrichment analyses of species-unique genes, the expanded and contracted gene families, and positively selected genes. The results suggest an enrichment of pathways related to protein synthesis and response and an evolutionary mechanism by which cells respond to protein denaturation in response to heat stress. The enrichment of pathways related to lipid and glucose metabolism, potentially preventing stress from dehydration, and positive selection of genes related to vision, as well as stress responses in harsh environments, may reveal adaptive evolutionary mechanisms in the Corsac fox under harsh drought conditions. Additional detection of positive selection for genes associated with gustatory receptors may reveal a unique desert diet strategy for the species. This high-quality genome provides a valuable resource for studying mammalian drought adaptation and evolution in the genus Vulpes.


Introduction
The dry climate in the desert area of central Eurasia is mainly characterized by hot and dry summers, cold and dry winters, and the gradual desertification of natural landscapes [1]. Extreme temperatures, food deprivation, and excessive solar radiation present formidable challenges for local species [2]. However, a wide variety of organisms have evolved the ability to adapt to heat and drought stress. For example, camels exhibit adaptive features in fat and water metabolism, as well as in response to heat, strong UV light, and asphyxiating stress [3]. Maintenance of homeostasis under acute dehydration in cactus mice in the desert zone of North America and in livestock, such as desert sheep and Liangzhou donkeys, responds as an adaptation to hot drought [4][5][6]. Although such adaptive features have been described in several species, little is known about the genetic basis of the Corsac fox drought adaptation [7]. Therefore, we investigated the genetic basis of drought adaptation in Corsac foxes at the genomic level. nome ( Figure S2). The result conclusively indicates that the chromosomal genome assembly of the Corsac fox was of high quality and accuracy (Table S8).
We performed an interspecies synteny analysis of the assembled Corsac fox genome with that of the Arctic fox to assess the accuracy of the genome assembly [23]. There was a high degree of co-linearity between the two genomes, consistent with the close phylogenetic relationship between these species ( Figure 1D).  To evaluate the completeness of the genome, we first performed BUSCO v2.3.1 analysis using the mammalia_odb10 database [22]. We found that 92.6% of the 9226 BUSCO genes in the genome were complete ( Figure S1. The 99.68% read-mapping rate, 99.82% genome coverage rate, 0.106% heterozygous SNP rate, and 0.000118% homologous SNP rate of the final assembled genome served to verify its consistency and completeness. The GC (guanine-cytosine) concentration was approximately 41.24%, and the scatter plot indicated no significant separation of GC and no other exogenous contamination in the genome ( Figure S2). The result conclusively indicates that the chromosomal genome assembly of the Corsac fox was of high quality and accuracy (Table S8).
We performed an interspecies synteny analysis of the assembled Corsac fox genome with that of the Arctic fox to assess the accuracy of the genome assembly [23]. There was a high degree of co-linearity between the two genomes, consistent with the close phylogenetic relationship between these species ( Figure 1D).

Genome Annotation
The Repeat elements were annotated based on homology and De novo prediction. We first identified approximately~767 Mb of repetitive elements in the Corsac fox genome, representing 32.67% of the total genome size (Table S9, Figure S3). Gene structure predictions were made using Augustus v3.2.3, Glimmer v3.0.4, SNAP v2013.11.29, GeneID v1.4, and GenScan v1.0 software, annotating Ailuropoda melanoleuca, Canis familiaris, Homo sapiens, Mus musculus, and Vulpes vulpes (Table S10, Figure S4). The annotated protein-coding genes in the V. corsac genome were identified using De novo prediction, homology-based prediction, and transcriptome sequencing-based methods. The above results were combined using the gene prediction integration software EVidenceModeler. A total of 22,501 protein-coding genes were annotated in the V. corsac genome (Table S11, Figure S5). Functional annotation revealed 20,511 genes, up to 91.2% of the V. corsac genes, with annotated functions in several databases, including TrEMBL, GO, KOG, SwissProt, InterPro, and NR (Table S12, Figure S6). Non-coding RNAs (ncRNAs) were annotated using various methods. Prognostic results showed that the ncRNAs in the V. corsac genome included 143,320 transfer RNAs (tRNAs). The ncRNAs in the Corsac genome included 16,395 tRNAs, 21,907 miRNAs, 481 ribosomal rRNAs, and 2448 snRNAs (Table S13). These results indicate that the quality of our annotated Corsac fox genome is reliable.

Unique Genes and Molecular Phylogenetic Analysis
The homologous relationships between the Corsac fox and the other assessed species were confirmed by aligning the protein sequences using OrthoFinder. A total of 20,070 gene families were identified in these species, 12 of which shared 13,379 core gene families. The Corsac fox genome contained 124 unique genes that were not shared by other species ( Figure S7).
Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses in the species' specific gene families were performed to identify their functions. The GO enrichment analysis showed that a large proportion of significantly enriched GO terms (p < 0.05) were closely related to protein reactions (Table S14, Figure 2A). An additional seven unique genes were significantly enriched in olfactory receptor activity. KEGG enrichment analysis of these specific genes showed that a large proportion of significantly enriched KEGG pathways (p < 0.05) were closely related to substance synthesis, including ko00400 (phenylalanine, tyrosine, and tryptophan biosynthesis, p = 0.03218), ko00524 (neomycin, kanamycin, and gentamicin biosynthesis, p = 0.026889), ko00541 (O-Antigen nucleotide sugar biosynthesis, p = 0.037442), ko00960 (tropane, piperidine, and pyridine alkaloid biosynthesis, p = 0.037442) and substance metabolism, including phosphonate and phosphinate metabolism, cyanoamino acid metabolism, riboflavin metabolism, vitamin B6 metabolism, and sulfur metabolism (Table S15, Figure 2B).
To explore the evolutionary status of the Corsac fox, we inferred its phylogenetic relationships with four species of canids, five other representative carnivores (Ursidae, Felidae, and Mustelidae), Homo sapiens, and Mus musculus. A phylogenetic tree was constructed using 12,620 single-copy orthologs of the Corsac fox and other mammals, which indicated that the Corsac fox clustered with the Red fox (Vulpes vulpes). The divergence time estimate indicates that the Corsac fox was separated from the closest species by approximately 3.7 MYA ( Figure 2C).

Gene Family Analysis
It is possible that gene family expansion and contraction are one of the reasons for the adaptive differences between the investigated species [24]. This was present in all analyzed species to varying degrees. Of the three Vulpes genera, the Corsac fox had a greater number of gene family expansions and contractions than the Red and Arctic fox using CAFÉ5 [25]. There were 652 expansions and 1694 contractions in the Corsac fox, 208 expansions and 744 contractions in the Red fox, and 372 expansions and 468 contractions in the Arctic fox. We performed functional enrichment analyses of the gene families of these three species and found that the expanded gene families of the Corsac fox and Arctic fox were enriched for more pathways associated with energy metabolism and stress responses. Compared to the Red fox and Arctic fox, the Corsac fox had more pathways associated with protein responses in the expanded gene families. In the Red fox, more pathways related to energy metabolism and protein response were present in the

Gene Family Analysis
It is possible that gene family expansion and contraction are one of the reasons for the adaptive differences between the investigated species [24]. This was present in all analyzed species to varying degrees. Of the three Vulpes genera, the Corsac fox had a greater number of gene family expansions and contractions than the Red and Arctic fox using CAFÉ5 [25]. There were 652 expansions and 1694 contractions in the Corsac fox, 208 expansions and 744 contractions in the Red fox, and 372 expansions and 468 contractions in the Arctic fox. We performed functional enrichment analyses of the gene families of these three species and found that the expanded gene families of the Corsac fox and Arctic fox were enriched for more pathways associated with energy metabolism and stress responses. Compared to the Red fox and Arctic fox, the Corsac fox had more pathways associated with protein responses in the expanded gene families. In the Red fox, more pathways related to energy metabolism and protein response were present in the contracted gene families. The gene families associated with olfactory receptor activity were contracted to varying degrees in the Corsac fox and Arctic fox gene families (Tables S16 and S17). We focus here more on the contracted and expanded gene families of the Corsac fox.

Positive Selection of Genes
In the analysis of the positive selection of genes, we identified 1328 positively selected genes in the Corsac fox genome by comparison with the 11 species noted above, using the branch-site model in PAML (Table S22). Through enrichment analysis of these genes, we obtained 30 significantly enriched GO terms (corrected p < 0.05) and 30 KEGG pathways (corrected p < 0.05). A large proportion of the significantly enriched GO terms and KEGG pathways were closely related to energy metabolism and protein reactions (Tables S23 and S24, Figure 4A). Each of the three Vulpes species was used as a foreground branch to screen for their positively selected genes. More functional pathways related to energy metabolism, protein response, and stress response were present in the Corsac fox compared to the Arctic fox and Red fox (Tables S25 and S26).

Positive Selection of Genes
In the analysis of the positive selection of genes, we identified 1328 positively selected genes in the Corsac fox genome by comparison with the 11 species noted above, using the branch-site model in PAML (Table S22). Through enrichment analysis of these genes, we obtained 30 significantly enriched GO terms (corrected p < 0.05) and 30 KEGG pathways (corrected p < 0.05). A large proportion of the significantly enriched GO terms and KEGG pathways were closely related to energy metabolism and protein reactions (Tables S23 and S24, Figure 4A). Each of the three Vulpes species was used as a foreground branch to screen for their positively selected genes. More functional pathways related to energy metabolism, protein response, and stress response were present in the Corsac fox compared to the Arctic fox and Red fox (Tables S25 and S26).

Discussion
We took advantage of Illumina short-reads and ONT long-reads in combination with Hi-C to produce a chromosome-level assembly of the Corsac fox with high contiguity, completeness, and accuracy. Theia high-quality chromosome-level assembly enabled us to study patterns of genomic variation, differentiation, and other genomic features (number and size of genes, repetitive elements, chromosomes) and to characterize regions of the genome that may be relevant to desert adaptation in the species. To better understand the Corsac fox genome assembly, basic genomic information was compared with that of two other Vulpes species [23,27]. Among the three species, the Corsac fox genome had the largest contig N50 (Table 1). The tree topology obtained for the phylogenetic relationships between foxes was consistent with that of earlier studies, although there were differences in the timing of differentiation. The Red and Corsac fox are closely related, diverging from a common ancestor approximately 3.7 MYA before the present. The time of differentiation between the Corsac and Red fox was longer than that estimated based on mitochondrial genomic and nuclear DNA estimates [28][29][30][31]. This difference may be due to differences in technology. We also found that the divergence of the Arctic and Red fox occurred at 4.3 MYA, which is much more recent than the time estimated by Peng et al. based on the Arctic fox genome [23].
The Arctic fox is a circumpolar inhabitant that lives in the Arctic; the Corsac fox is mainly found in the arid and semiarid regions of central Eurasia, while the Red fox lives

Discussion
We took advantage of Illumina short-reads and ONT long-reads in combination with Hi-C to produce a chromosome-level assembly of the Corsac fox with high contiguity, completeness, and accuracy. Theia high-quality chromosome-level assembly enabled us to study patterns of genomic variation, differentiation, and other genomic features (number and size of genes, repetitive elements, chromosomes) and to characterize regions of the genome that may be relevant to desert adaptation in the species. To better understand the Corsac fox genome assembly, basic genomic information was compared with that of two other Vulpes species [23,27]. Among the three species, the Corsac fox genome had the largest contig N50 (Table 1). The tree topology obtained for the phylogenetic relationships between foxes was consistent with that of earlier studies, although there were differences in the timing of differentiation. The Red and Corsac fox are closely related, diverging from a common ancestor approximately 3.7 MYA before the present. The time of differentiation between the Corsac and Red fox was longer than that estimated based on mitochondrial genomic and nuclear DNA estimates [28][29][30][31]. This difference may be due to differences in technology. We also found that the divergence of the Arctic and Red fox occurred at 4.3 MYA, which is much more recent than the time estimated by Peng et al. based on the Arctic fox genome [23].
The Arctic fox is a circumpolar inhabitant that lives in the Arctic; the Corsac fox is mainly found in the arid and semiarid regions of central Eurasia, while the Red fox lives mainly in most of Eurasia and North Africa [11,23]. The three fox species show different 9 of 19 adaptive traits. The gene families associated with protein responses are differentially altered in Corsac foxes living in hot environments compared to Red foxes and Arctic foxes. For both the Corsac fox and the Arctic fox, which also live in extreme conditions, it is important to use energy efficiently to cope with adverse temperatures and food shortages. Changes in the size of gene families associated with energy and material metabolism and stress reaction may help them adapt to such extreme environments. The Red fox is subject to fewer adverse effects and has easier access to food and a suitable ecological environment. The gene family associated with energy metabolism in the Red fox genome is in a contracted state. Although it is intuitive that gene family contraction is often maladaptive, it may provide an evolutionary mechanism for phenotypic adaptation [26]. This may be related to the ability of the Red fox to adapt to a wide range of ecological niches.
Protein synthesis and metabolism: Analysis of gene families showed that genes associated with physiological processes, such as protein synthesis and response, were affected by natural selection. Proteins are mainly structural and functional substances [32,33]. The main function of ribosomes is to convert the genetic code into amino acid sequences and build protein polymers from amino acid monomers, which are involved in the biological processes of translation and protein folding [34,35]. The Corsac fox genes are heavily enriched for ribosome-related gene ontology terms, including "structural constituent of ribosome," "ribosome," and "mitochondrial ribosome." We also identified the biological processes associated with ubiquitin proteins, whose main function is to label proteins that need to be broken down and to act as regulators of protein interactions [36]. In the expansion of the gene family, "Hsp70 protein binding" and "heat shock protein binding" were commonly enriched gene ontology terms. Heat shock protein 70 (Hsp70) is a powerful chaperone whose expression is induced in response to a wide variety of physiological and environmental impacts, including anticancer chemotherapy, thus allowing cells to survive under lethal conditions [37][38][39][40]. Heat stress, which occurs in the cells of organisms exposed to high temperatures and water deprivation, affects transcriptional and translational processes, increasing the chances of DNA breakage and protein oxidation, ultimately leading to apoptosis or cell death [41,42]. Protein denaturation under heat stress is challenging for many organisms. Previous studies of the response of different mammals to thermal environments revealed that gene ontology terms such as "ribosome" and "translation" were most frequently enriched, which is also consistent with our findings [43]. These analyses suggest that genes associated with protein responses play a role in heat-stressed environments in the Corsac fox.
Energy metabolism: Sugar is one of the primary energy sources for living organisms and is a much more potent source of short-term energy in short periods than proteins and fats [44,45]. Glucose is metabolized in the body via three main pathways: glycolysis, aerobic oxidation, and the pentose phosphate pathway [46]. The expanded gene families and positively selected genes were enriched in glycolysis/gluconeogenesis, glucose metabolism, and glycolysis. Maintaining high levels of glucose metabolism provides the Corsac fox with sufficient energy to participate in a range of energy-intensive life activities such as predation. Studies in camels found that the evolution of genes related to sugar metabolism had occurred more rapidly than in cattle [3]. The adaptive evolution of sugar metabolism genes may be a key to a species' ability to persist in extreme environments. Fat storage plays an important role in energy metabolism when food is scarce [47]. Camels living in desert areas and sheep under extreme conditions have a high capacity for fat metabolism [3,5]. Three enriched pathways related to fat metabolism were closely linked to the adaptation of the Corsac foxes to harsh environments. These genes may enhance the energy storage and production capacity of the species in desert environments.
Visual protection: Intense solar radiation is another challenge faced by Corsac foxes. One risk factor is that organisms exposed to UV radiation for long periods are at risk of developing eye diseases [3]. UV exposure was found to significantly increase the likelihood of cataract development in mice [48]. We investigated the genome of the Corsac fox for orthologous genes that may be involved in the adaptation of the Corsac fox's eyes to solar radiation. BBS9 and BBS10 have been reported as causative genes in Bardet-Biedl syndrome, and their proteins are thought to play a role in protein trafficking and the function of photoreceptors that connect cilia to outer segments [49,50]. Mutations in the GPR143 gene cause severe ocular albinism [51]. IMPG2 plays a role in the retina, and its deletion causes functional degeneration of mouse optic rods and cone cells [52]. AIPL1, ATF6, and CNNM4 function in human and mouse retinal cone cells [53][54][55]. CNGA1, CNGB1, CACNA1F, PAX6, OPN1LW, and LRIT3 function in the retina and are associated with retina-related diseases [56][57][58][59][60]. The ARR3 gene is interlinked with the X chromosome and is associated with high myopia in humans [61]. UNC119 and CABP4 interact with photoreceptor synapses [62]. The gene family associated with phototransduction is in a contracted state, and prolonged exposure to intense UV light may have induced adaptive features in the photoreceptors. We suggest that these genes play a role in visual conservation in the Corsac fox and thus should be used as candidates to validate the genetic basis of vision in this species.
Dietary habits: The Corsac fox is not a specialist predator among mammals. Rodents constitute its main food source, but it has opportunistic habits and will take insects and seeds if food supplies are low [11]. We identified positively selected genes in its genome that mediate taste receptors and gustatory transmission [63], which is also reported in the omnivorous raccoon dog (Nytereutes procyonoides) and not reported as a common trait in canids. Moreover, a contracted gene family associated with olfactory receptors was found in the genome of the Corsac fox. We speculate that this is the genetic basis for the gradual adaptation of the Corsac fox diet to the harsh environment. Bitter-taste perception (related to the ability to taste benzothiamine) is mediated by the TAS2R38 gene [64,65]. The acid taste receptor PKD2L1 expresses proteins that accumulate in the gustatory region, where taste chemicals are detected [66]. TRPM5 and GNG13 are essential for odor transmission, with the TRPM5 gene playing a role in the transmission of bitter, fresh, and sweet tastes [67,68]. Bitter-taste receptors play an important role in avoiding toxic food consumption in the Corsac fox. We identified substance metabolism and degradation-related KEGG pathways (phosphonate and phosphinate metabolism) in the gene enrichment specific to the Corsac fox, with phosphorus being the most enriched in plant seeds. We suggest that these pathways and genes will help the Corsac fox adapt to a wide range of dietary strategies.
Stress reaction: The primary physiological response to environmental changes is cellular stress, which is counteracted by a range of cellular physiological responses [69]. To study the adaptation of the Corsac fox to drought and heat stress, we analyzed the genes involved in stress response [70]. Categories associated with DNA damage and repair showed strong positive selection in the Corsac fox genome. Organisms are chronically exposed to DNA damage factors that affect normal physiological functions, and repair mechanisms ensure overall survival by protecting DNA [71]. Genes related to oxidative stress were enriched in the expanded gene family. Accelerated evolution of related genes was also reported in the llama genome compared to that in cattle and alpacas. Our results were consistent with the finding that accelerated evolution of related genes was also reported in the llama genome compared to that in cattle and alpacas. The FACIT collagen family (fibril-associated collagens with interrupted helices) plays a role in maintaining the integrity of various tissues. Of the 46 genes in this family, we detected nine positively selected genes in the Corsac fox genome [72,73]. According to an earlier study, these nine genes play roles in the muscle, cartilage, eye, blood vessels, heart, and other tissues. These genes may play essential roles in the adaptation of the Corsac fox to harsh environments.
In this study, we have assembled the first chromosome-scale Corsac fox genome. This allowed us to use comparative genomics analysis to gain some insight into the molecular basis of adaptation to drought and extreme environments. Our findings will be further expanded in future research on drought adaptation as genomics technologies develop and more sequencing data become publicly available. Future functional validation analyses are needed to test extended gene families and positively selected genes to validate the genes associated with drought adaptation.

Sample, Library Construction, and Sequencing
An adult male Vulpes corsac was collected from the Hulunbuir Grassland, Inner Mongolia Autonomous Region, China, for genome sequencing. Genomic DNA was extracted from the muscle tissues using the SDS (sodium dodecyl benzene sulfonate) method. The quality and concentration of the extracted DNA were determined using 1% agarose gel electrophoresis and a Qubit fluorometer (Invitrogen, ThermoScientific, Waltham, MA, USA). For gene annotation, transcriptome sequencing was performed using several tissues of Vulpes corsac, including the pancreas, heart, stomach, kidney, spleen, brain, muscle, liver, and lungs. Animals in this study were handled according to the Guide for Care and Use of Laboratory Animals and in conformance with the guidelines established by the Ethics Committee for the Care and Use of Laboratory Animals of Qufu Normal University (permit number: QFNU2019-012).
First, the quality of the isolated genomic DNA was verified. A total amount of 0.2 µg DNA per sample was used as input material for the DNA library preparations. A sequencing library was generated using an NEB Next ® Ultra™ DNA Library Prep Kit for Illumina (NEB, BEVERLY, MA, USA) following the manufacturer's recommendations, and index codes were added to each sample [74]. Briefly, genomic DNA samples were fragmented to a size of 350 bp using sonication. The DNA fragments were end-polished, A-tailed, and ligated using a full-length adapter for Illumina sequencing, followed by further PCR amplification. After PCR products were purified using an AMPure XP system (Beckman Coulter, California, USA), DNA concentration was measured with Qubit ® 3.0 Fluorometer (Invitrogen). Libraries were analyzed for size distribution using a 2100 Bioanalyzer (Agilent, California, USA) and quantified with real-time PCR (>2 nM). Clustering of the index-coded samples was performed on a cBot Cluster Generation System using an Illumina PE Cluster Kit (Illumina, San Diego, CA, USA) according to the manufacturer's instructions. After cluster generation, the DNA libraries were sequenced on an Illumina platform, and 150 bp paired-end reads were generated [75].
DNA extracted from the same individual was used for long-read nanopore sequencing. According to the manufacturer's protocol, 10 µg of Vulpes corsac genomic DNA was used for a 30 kb template library preparation using the Blue Pippin system. A nanopore library was prepared using a Ligation Sequencing Kit (SQK-LSK109), following the manufacturer's instructions, and sequenced on the flow cells of a PromethION sequencer (Oxford Nanopore, Oxford, UK) [76].

Genome Survey
Illumina sequencing data were used for genome survey analysis [77]. K-mer-based analysis was used to estimate genome size and heterozygosity. The genome size calculated by dividing K-mer-number by depth was around 2412.09 Mbp, and the corrected genome size was 2389.14 Mbp. The genome heterozygosity rate was 0.38%, and the percentage of duplicate sequences was 53.90%. Based on the results of k-mer analysis using Soap-denovo2software, the contig N50 was 9,025 bp, and the scaffold N50 was 12,749 bp [78]. Analysis of Contig distribution and GC content led to the conclusion that the Corsac fox genome was generally complete and could be assembled using appropriate strategies.

RNA Extraction and Sequencing
RNA was extracted using the TRIzol Reagent (Invitrogen ThermoScientific, Waltham, MA, USA). RNA purity and integrity were analyzed using agarose gel electrophoresis, and a Nanodrop spectrophotometer was used to determine the RNA quality of each tissue [79].
After the RNA samples were tested, eukaryotic mRNA was enriched using magnetic beads containing oligo(dT). The mRNA was then broken into short fragments by adding a fragmentation buffer. One-stranded cDNA was synthesized using six-base random hexamers with mRNA as the template, followed by buffer, dNTPs, DNA polymerase I, and RNase H. Two-stranded cDNA was subsequently purified using purified double-stranded cDNA, which was first end-repaired, A-tailed, ligated to sequencing junctions, and fragmented using AMPure XP beads for fragment size selection. PCR amplification was performed, and the final library was obtained by purifying the PCR products with AMPure XP beads [80]. After library construction, initial quantification was carried out using Qubit 2.0 to dilute the library, followed by testing the insert size of the library using the 2100 Bioanalyzer. After the insert met expectations, the effective concentration of the library was accurately quantified using the Q-PCR method to ensure its quality. Different libraries were then pooled into a flow cell according to the effective concentration and target downstream data volume. The cBOT was formed into clusters and sequenced using the NovaSeq 6000 high-throughput sequencing platform (Illumina) to produce 150-bp paired-end readings. Quality control was performed on the sequencing data by taking all sequencing reads for image identification, decontamination, and de-functioning. Quality control statistics included the number of sequencing reads, data yield, sequencing error rate, Q20 content, Q30 content, and GC content.

De Novo Assembly and Assembly Results Assessment
After quality control, the filtered reads were used for pure third-generation assembly, and the genome was assembled using NextDenovo v2.3.1 (read_cutoff = 1k, seed_cutoff = 46,333). This was carried out in three principal steps. First, the NextCorrect module was used to correct the original data and obtain a consistent sequence (CNS sequence) after the correction [81,82]. Then, the NextGraph module was used for the genome assembly of the corrected reads to obtain the preliminary assembly of the genome. Finally, the original third-and second-generation data were successively used for genome correction using Nextpolish v1.0.5 (lgs_options = -min_read_len 1k -max_read_len 100k -max_depth 75), and the polished genome was obtained. BUSCO assessment refers to the assessment of genomic integrity at the genetic level [22]. For genome assembly evaluation, the tblastn was compared with the corresponding BUSCO database sequences to identify candidate regions, Augustus v4.1 software was used for gene structure prediction, and HMMER was used for comparison to assess its integrity. CEGMA evaluation is based on a conserved protein family set (248 core genes) that exists in a large number of eukaryotes. The approach evaluates the assembled genome and the accuracy and integrity of the core genes within this genome [83]. Using CEGMA v2.5 (default), we employed information from the core genes of six model organisms to identify candidate regions in the new genome using the tblastn, genewise, and geneid. The system uses the profile of each core gene to ensure the reliability of the final prediction of the gene structure. Sequence consistency assessments use high-quality second-generation sequencing data to assess the accuracy of third-generation sequencing data assembly results at the single-base level. BWA 0.7.8 and samtools v0.1.19 were used to detect single-base variations [84].

Chromosome Assembly Using Hi-C Technology
Two Hi-C libraries were prepared from the muscle tissue [85]. Following the standard protocol described earlier with certain modifications, we constructed Hi-C libraries using the original sample as the input [86]. After grinding with liquid nitrogen, the sample was cross-linked with 4% formaldehyde solution at room temperature in a vacuum for 30 min. 2.5 M glycine was added to quench the cross-linking reaction for 5 min, then the sample was put on ice for 15 min. It was then centrifuged at 2500 rpm at 4 • C for 10 min, and the pellet was washed with 500 µL PBS and then centrifuged for 5 min at 2500 rpm. The pellet was re-suspended with 20 µL of lysis buffer (1 M Tris-HCl, pH 8, 1 M NaCl, 10% CA-630, and 13 units protease inhibitor), and the supernatant was centrifuged at 5000 rpm at room temperature for 10 min. The pellet was washed twice in 100 µL ice-cold 1× NEB buffer and then centrifuged for 5 min at 5000 rpm. The nuclei were resuspended in 100 µL NEB buffer and solubilized with dilute SDS, followed by incubation at 65 • C for 10 min. After quenching SDS with Triton X-100, overnight digestion was applied to the samples with the 4-cutter restriction enzyme MboI (400 units) at 37 • C on a rocking platform. The following steps involved marking the DNA ends with biotin-14-dCTP and blunt-end ligation of the cross-linked fragments. Proximal chromatin DNA was re-ligated using a ligation enzyme. The nuclear complexes were reverse cross-linked by incubation with proteinase K at 65 • C. DNA was purified by phenol-chloroform extraction. Biotin was removed from nonligated fragment ends using T4 DNA polymerase. The ends of the sheared fragments obtained by sonication (200-600 base pairs) were repaired using a mixture of T4 DNA polymerase, T4 polynucleotide kinase, and Klenow DNA polymerase. Biotin-labeled Hi-C samples were enriched using streptavidin C1 magnetic beads. After adding A-tails to the fragment ends and ligation with Illumina paired-end (PE) sequencing adapters, Hi-C sequencing libraries were amplified using PCR (12-14 cycles) and sequenced on an Illumina PE150 [87]. Hi-C uses special experimental techniques to obtain information on the interactions between spatially linked and physically distant DNA fragments. Different contigs or scaffolds are sorted into different chromosomes based on the fact that the probability of interactions within chromosomes is significantly higher than that between chromosomes. Contigs or scaffolds of the same chromosome are sorted and oriented based on the probability of interactions decreasing with increasing interaction distance on the same chromosome. Sequenced Hi-C data were used to mount the assembled contigs/scaffolds at the chromosomal level using AllHic version 0.9.8 [88].

Genome Annotation
A combined strategy based on homology alignment and de novo search to identify whole-genome repeats was applied to our repeat annotation pipeline [89]. Tandem repeats were extracted using TRF via ab initio predictions. Homolog prediction commonly uses the RepBase database employing RepeatMasker v4.0.5 software and its in-house scripts (RepeatProteinMask) with default parameters to extract repeat regions [90]. Ab initio prediction built de novo repetitive elements database using LTR_FINDER, RepeatScout, and RepeatModeler with default parameters, then all repeat sequences with lengths > 100 bp and gap 'N' less than 5% constituted the raw transposable element (TE) library [91]. A custom library (a combination of Repbase and our de novo TE library, which was processed using UCLUST to yield a non-redundant library) was supplied to RepeatMasker for DNA-level repeat identification. Structural annotation of the genome incorporated ab initio, homology-based, and RNA-Seq-assisted predictions and was used to annotate gene models [92]. Sequences of homologous proteins were downloaded from Ensembl, the National Center for Biotechnology Information, and other sources. Protein sequences were aligned to the genome using TBLASTN (v2.2.26; E-value ≤ 1 × 10 −5 ), and then the matching proteins were aligned to the homologous genome sequences for accurate spliced alignments with GeneWise v2.4.1 software, which was used to predict gene structure contained in each protein region [93][94][95]. For gene prediction based on Ab initio, Augustus v3.2.3, GeneID v1.4, Genescan v1.0, GlimmerHMM v3.04, and SNAP (29 November 2013) were used in our automated gene Prediction pipeline for RNA-seq data [96]. Transcriptome reads assemblies were generated using Trinity v2.1.1 for the genome annotation. To optimize genome annotation, the RNA-Seq reads from different tissues were aligned to genome FASTA using TopHat v2.0.11 with default parameters to identify exon regions and splice positions [97,98]. The alignment results were then used as inputs for Cufflinks v2.2.1 with default parameters for genome-based transcript assembly. The non-redundant reference gene set was generated by merging genes predicted by the three methods with Evidence-Modeler v1.1.1, using the Program to Assemble Spliced Alignment (PASA) terminal exon support and including masked transposable elements as input for gene prediction [99]. Individual families of interest were selected for further manual curation by experts. Gene functions were assigned according to the best match by aligning the protein sequences to the Swiss-Prot using BLASTP (with a threshold of E-value ≤ 1 × 10 −5 ) [100]. The motifs and domains were annotated using InterProScan v4.8 by searching against publicly available databases, including ProDom, PRINTS, Pfam, SMRT, PANTHER, and PROSITE. Gene Ontology (GO) IDs for each gene were assigned according to the corresponding InterPro entries. We predicted protein function by transferring annotations from the closest BLAST hit (E-value < 10 −5 ) in the SwissProt database and BLAST hit (E-value < 10 −5 ) in the NR database. We also mapped the gene set to KEGG pathways and identified the best match for each gene [101]. tRNAs were predicted using the tRNAscan-SE program. As rRNAs are highly conserved, we chose related species' rRNA sequences as references and predicted the rRNA sequences using BLAST. Other ncRNAs, including miRNAs and snRNAs, were identified by searching the Rfam database with default parameters using the internal software.
Then, the four-fold degenerate codon sites were extracted from the single-copy orthologs to estimate the divergence time between the Corsac fox and other species. The MCMCtree program in PAML was used to perform this process. Calibration time was obtained from Timetree (http://www.timetree.org, accessed on 10 February 2023), with longer calibration time leading to more reliable divergence time estimation. These data included values for V. lagopus and V. vulpes (2.9-7.4 MYA), M. musculus and H. sapiens The gene families identified by OrthoFinder, of which a total of 20,070 were identified in the genomes of the 12 species, included 12,957 in the sand fox genome. We used CAFE5 to examine contracted and expanded gene families [25]. A random birth-and-death model was used to estimate the size of the gene families at the ancestral nodes. Explicit modeling of rate variation between families was carried out using gamma distribution rate category modeling. A p-value < 0.05 was set as the cut-off. To obtain a better fit to the data, we used the GAMMA model, compared the Model Gamma Final Likelihood, and determined that −k = 3 was a better fit to the data by selecting the highest likelihood converging model [25]. We performed KEGG and GO enrichment analysis on the expansion gene family in the context of annotated genes to determine the functional implications of the expansion and contraction genes (p < 0.05).
Selection analysis was conducted based on phylogenetic relationships, and 9793 singlecopy genes were identified. Positive selection at the DNA sequence level was tested by estimating the ratio of nonsynonymous (dN) to synonymous nucleotide substitutions (dS) between orthologous genes. The branch-site model of CODEML in PAML was used to search for positively selected genes, with V. corsac as the foreground branch and other