Comparative transcriptome provides insights into the selection adaptation between wild and farmed foxes

Abstract The silver fox and blue fox are economically important fur species and were domesticated by humans from their wild counterparts, the arctic fox and red fox, respectively. Farmed foxes show obvious differences from their wild counterparts, including differences in physiology, body size, energy metabolism, and immunity. However, the molecular mechanisms underlying these differences are presently unclear. In this study, we built transcriptome libraries from multiple pooled tissues for each species of farmed fox, used RNA‐seq to obtain a comprehensive dataset, and performed selection analysis and sequence‐level analyses of orthologous genes to identify the genes that may be influenced by human domestication. More than 153.3, 248.0, 81.6, and 65.8 million clean reads were obtained and assembled into a total of 118,577, 401,520, 79,900, and 186,988 unigenes with an average length range from 521 to 667 bp for AF, BF, RF, and SF, respectively. Selective pressure analysis showed that 11 and 14 positively selected genes were identified, respectively, in the two groups (AF vs. BF and RF vs. SF). Several of these genes were associated with natural immunity (CFI and LRRFIP1), protein synthesis (GOLGA4, CEP19 and SLC35A2), and DNA damage repair (MDC1). Further functional enrichment analyses demonstrated that two positively selected genes (ACO1 and ACAD10) were involved in metabolic process (GO:0008152, p‐value = .032), representing a significant enrichment. Sequence analysis of 117 orthologous genes shared by the two groups showed that the LEMD2, RRBP1, and IGBP1 genes might be affected by artificial selection in farmed foxes, with mutation sites located within sequences that are otherwise highly conserved across most mammals. Our results provide a valuable transcriptomic resource for future genetic studies and improvement in the assisted breeding of foxes and other farmed animals.


| INTRODUC TI ON
There is a long history of domestication of wild animals. Humans choose animals with certain characteristics for breeding to satisfy their needs (Li et al., 2020;Rimbault & Ostrander, 2012), and this evolutionary process, during which organisms adapt to living closely with humans, is accompanied by pronounced morphological, physiological, and behavioral changes (Sato et al., 2020). The domestication characteristics of many species have been accumulated and continuously maintained over years of selection, and the related molecular mechanisms have been studied with an abundance of genomic and transcriptomic data resources (Cui et al., 2014;Jiang et al., 2015). As a part of human civilization, the exploration of the genetic mechanism of animal domestication could be used not only to further select traits that are useful to humans but also to improve domestication efficiency (Peng et al., 2014;Song et al., 2019). To date, numerous studies have explored the molecular mechanisms involved in domestication and genetic improvements in several domesticated animals, such as dogs, sheep, cattle, rabbits, and mink (Alberto et al., 2018;Daetwyler et al., 2014;Li et al., 2020;Morris et al., 2020;Sato et al., 2020).
Vulpes is one of the world's most widely naturally distributed groups of terrestrial carnivores, and members of this genus play important roles in maintaining ecosystems (IUCN, 2018). They occupy a wide variety of ecosystems, including deserts, grasslands, forests and agricultural and human-dominated environments (Statham et al., 2014). Two Vulpes species were domesticated 270 years ago in Canada: the blue fox, a variant of the arctic fox, and the silver fox, a variant of the red fox (Trut, 1999(Trut, , 2001. Foxes are important fur species that are widely raised around the world, and the fox fur trade has become one of the three pillar industries of the world fur industry (Zhou, 1987). At present, research on these two farmed fox species focuses mainly on their physiology, biochemistry, breeding, and reproduction (Cao et al., 2019;Zhong et al., 2019). Recently, a genomic study of farmed silver fox focused on the genetic mechanisms underlying behavioral differences identified a few strong positional candidate genes for tame, aggressive, and conventional behavior (Kukekova et al., 2018). Under long-term human management, wild and farmed foxes have major differences in behaviors (Zhang, 2004), body size, and feeding habits (Kukekova et al., 2008(Kukekova et al., , 2011. However, the molecular mechanisms of these differences remain unclear. Developments in sequencing technology provide an opportunity to reveal the genetic mechanism underlying the rapid behavioral evolution that occurs during domestication. Transcriptome analysis is an economic strategy to assemble a large number of protein-coding genes in nonmodel organisms for which genome sequences are not yet available and can be used to identify adaptive genetic mechanisms. Earlier, to characterize the genetic mechanism of differences between wild and domestic animals, comparative transcriptome studies have been performed on a wide range of organisms, such as dogs and wolves, wild and domestic rabbits, and wild and domestic silkworms (Fang et al., 2015;Sato et al., 2020;Yang et al., 2018). Here, we sequenced the pooled transcriptome of multiple tissues of three blue foxes and one silver fox and compared them with those of their wild counterparts to explore genes that might be influenced by domestication and associated with morphological and physiological differences. Furthermore, we aimed to identify gene-associated simple sequence repeats (SSRs) to provide new markers for the future construction of high-resolution genetic linkage maps and reliable genetic markers for the selection of superior fox varieties.

| Sample collection
Farmed foxes, including one silver fox (SF) and three blue foxes (BF1, BF2, and BF3), were collected from the Xingzeyu fox breeding base in Cangzhou city, Hebei Province, China. All of the selected individuals were typical of the breeds and were unrelated. Kidney, brain, and liver tissue was soaked in RNA stabilization reagent (RNAlater, QI-AGEN, USA) and stored at 80℃ in ultradeep-freeze equipment.
The wild fox sequences used in this study included data from one Vulpes vulpes (red fox, named RF) and two Vulpes lagopus (arctic fox, named AF1 and AF2) originally from Kronoby, Finland. These data were downloaded from the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) database with accession numbers ERR687855, ERR687853, and ERR687854 (Kumar et al., 2015). These data were obtained by sequencing a mixture of brain, liver, and kidney tissue. The arctic fox and blue fox groups were referred to as AF versus BF, and the red fox and silver fox groups were referred to as RF versus SF.

| RNA extraction, library construction, and transcriptome sequencing
Total RNA was extracted using an RNeasy Mini Kit (QIAGEN, USA) according to the manufacturer's protocol. Total RNA was monitored for quality on 1% agarose gels, and its purity was checked using a NanoPhotometer ® spectrophotometer (IMPLEN, USA). The concentration and integrity of RNA were measured and assessed using the Qubit ® RNA Assay Kit (Invitrogen, USA) in Qubit ® 2.0 Fluorometer (Life Technologies, USA) and the RNA Nano 6000 Assay Kit of the Agilent Bioanalyzer 2100 system (Agilent Technologies, USA), respectively.
RNA extracts from different tissues (liver, brain, and kidney) of the same individual were pooled. Total RNA (3 μg per individual) was used to construct sequencing libraries with the NEBNext ® Ultra™ RNA Library Prep Kit for Illumina ® (NEB, USA) following the manufacturer's protocol. The library quality was assessed on the Agilent Bioanalyzer 2100 system, and the libraries were sequenced on the Illumina HiSeq 2500 platform; 125 bp paired-end reads were generated.

| Transcriptome assembly, CDS identification, and gene functional annotation
Raw reads in FASTQ format (Cock et al., 2010) were first filtered for adapter or Poly-N and low-quality reads using in-house Perl scripts to obtain clean reads. The error rate (Q20 and Q30 scores) and GCcontent of the clean reads were calculated at the same time, and the subsequent analysis was based on high-quality clean reads (Hansen et al., 2010;Jiang et al., 2011). Reads from the three BF samples were combined for transcriptome assembly. High-quality clean reads were de novo assembled by Trinity 2.9.0 (Grabherr et al., 2011) with min_kmer_cov set to 2 to obtain transcripts. After assembly, the longest transcript sequence from each locus was taken and defined as a unigene, and further analysis was based on these unigenes. The CDS of each putative unigene was extracted using two steps: First, unigenes were subjected to BLAST search in the NR and SwissProt databases with a cutoff of 1e-5. Second, Estscan (Iseli et al., 1999) software was used to identify the unigenes that did not produce any alignment results and to translate the resultant CDSs into amino acid sequences . Gene function was annotated based on seven databases (Nt, Nr, KEGG, SwissProt, PFAM, GO, and KOG/ COG) to understand the function of transcriptionally expressed genes in detail. The website and parameters of the seven databases are listed Table S1 (Altschul et al., 1997;Finn et al., 2008;Stefan et al., 2008). MISA (Miller et al., 2012) was used to identify SSRs in the transcriptome. To verify the SSRs, Primer 3 (http://prime r3.sourc eforge.net/ relea ses.php) was used to design primers for each SSR in batches.

| Simple sequence repeat detection
Three pairs of primers of length 18-22 were designed for each site, and the length of the amplified target fragment was 100-400 bp.
Primers were randomly selected from the SSR of 2/3/4/5/6 nucleotide repeats of BF and SF for PCR amplification and electrophoretic imaging verification. Detailed information regarding the PCR system and amplification conditions can be found in the attachment file of Table S2.

| Identification of gene orthologous groups and calculation of Ka/Ks
OrthoMCL (Li, 2003) software was used to construct orthologous groups from the resultant CDS using BLAST-based approach with dog (Canis lupus familiaris, cfa) and giant panda (Ailuropoda melanoleuca, aml) downloaded from NCBI as the internal and external reference genomes, respectively. Three methods, including DAVID (Dennis et al., 2003), KOBAS (Xie et al., 2011), and the R package ClusterProfiler (Yu et al., 2012), were used for GO and KEGG enrichment analysis. Muscle 3.8 (Edgar, 2004) was used for sequence alignment and for optimizing the protein alignment results. PAUP 3.0 (Guindon et al., 2005) was used to construct evolutionary trees using the sequence alignment results by Muscle.
In genetics, the value of ω (Ka/Ks) can be used as an indicator of the selective pressure acting on a protein-coding gene. The value of ω is the ratio of the number of nonsynonymous substitutions per nonsynonymous site (Ka) to the number of synonymous substitutions per synonymous site (Ks). Sites with ω > 1 are usually said to be evolving under positive selection, and those with ω < 1 are usually said to be under purifying selection. The Paml 4.7 (Yang, 2007) package was used to identify orthologous genes that showed a significantly higher Ka/Ks ratio in the wild and farmed groups.

| Identification of specific mutations in genes of farmed foxes
In the last section of this article, we focus on the orthologous genes shared by two groups (AF vs. BF and RF vs. SF) to identify the gene mutations specific to farmed foxes. First, we performed specific mutation analysis to identify genes in which the two farmed species had the same amino acid mutation relative to wild species. Then, considering that different famed species may have different strategies to adapt to the artificial breeding environment, we also screened for genes with specific mutations in each of the two farmed species.
On this basis, to better understand the conservation of these loci at the mammalian level, we increased the number of target species in the analysis.

Sample
Raw reads Clean reads

| Overview of transcriptome sequencing data
Four transcriptome libraries (SF, BF1, BF2, and BF3) were generated from pooled RNA extracts from different tissues (liver, brain, and kidney) of one SF and three BF. After filtering the raw data, a total of 313.8 million clean reads were remained for further transcriptomic assembly (Table 1). The raw data of RF, AF1, and AF2 were downloaded from the SRA database, and a total of 234.9 million clean reads were remained for further transcriptomic assembly after filtering.

| De novo assembly and functional annotation
Three BF individuals were used to generate a pooled assembly, which we referred to as BF. Two AF individuals were used to generate a pooled assembly, which we referred to as AF. Statistics related to the de novo assemblies of each of the four samples are shown in  Figure S1 and Table S3.
All the unigenes were then searched against seven public data- contributed the largest number of matches ( Figure S2). The percentage of unigenes annotated in at least one database ranged from 39.26% (BF) to 71.23% (RF). Information about the E-value distribution, similarity distribution, and species distributions in these BLAST analyses is displayed in Table S5 and Figure S3. were retained and used for future analysis.  Figure 1 and Table S6.

| SSR detection and primer design
We then used Primer 3.0 to design three primers for each SSR, and this design process was successful for a total of 59,010 (AF: 9,905, BF: 27,471, RF: 6,547, SF: 15,087) SSRs. We randomly selected two pairs of primers to amplify and agarose gel electrophoresis from Di-, Tri-, Tetra-, Penta-and Hexa-nucleotide SSRs, and the results showed 90% and 50% success rates in BF and SF, respectively (Table S7).

| Identification of gene orthologous groups and phylogenomic analyses
We identified a total of 14,750 homologous genes in AF versus BF and Note: N50/N90: The transcript obtained by splicing was arranged from long to short and then accumulated. When the cumulative length>=50%/90% of the total length, then the transcript length is considered N50/N90. the results, only orthologous genes were selected for the following analysis. After filtering, 812 (AF vs. BF) and 710 (RF vs. SF) single-copy genes were retained. We then annotated these single-copy genes according to the GO and KEGG databases to determine their functions.
Different algorithms have been applied in research on the function of orthologous genes, and we used three methods for GO and KEGG enrichment analysis. Ultimately, 1,522 genes in the two groups were annotated with 1,084 GO terms and 37 KEGG pathways (Table 4) showed that the orthologous gene set constructed in this study could satisfactorily cover the genes related to phenotypic differences.
In total, 171 single-copy orthologous genes shared by two groups (AF vs. BF & RF vs. SF) were aligned and then used to construct evolutionary trees in PAUP (Figure 3).

| Calculation of Ka/Ks to Identify Positively Selected Genes
Next, we performed selective stress analysis on the identified single-copy orthologous genes with the goal of revealing the genetic mechanism of phenotypic adaptation in response to domestication.
The identified 812 (AF vs. BF) and 710 (RF vs. SF) single-copy orthologous genes were used to calculate the Ka/Ks ratio. Of these, 11 (AF vs. BF) and 14 (RF vs. SF) candidate orthologous genes were identified to be under positive selection with ω > 1 (Table 5). IGBP1, were identified. To verify the uniqueness of these mutations, the sequences of the three genes in 11 species covering the major mammalian lineages were downloaded from NCBI (Table S8).

| Identification of specific mutations in genes of farmed foxes
Among Carnivores (seven species), Artiodactyla (two species), Perissodactyla, and Cetacea, the specific mutations were found to occur exclusively in fox ( Figure 4).
The LEMD2 (LEM domain-containing protein 2) gene in the wild foxes (AF and RF) was mutated from T to A at site 235, resulting in a change in the encoded amino acid from tryptophan (Trp, W) to arginine (Arg, R). The sequence alignment with the other 11 mammalian species also showed that this change occurred only in wild foxes (RF and AF). Moreover, the RRBP1 (Ribosome binding protein 1) gene in the SF was mutated from A to T at site 527, resulting in a change in the encoded amino acid from leucine (Leu, L) to glutamine (Gln, Q), which was different from the sequence in RF. The IGBP1 (Immunoglobulin binding protein 1) gene showed a patterns similar to that of RRBP1; in SF, it was mutated from G to C at site 153, resulting in a change in the encoded amino acid from lysine (Lys, K) to asparagine (Asn, N), and across the 11 species analyzed, this change occurred exclusively in SF.

| D ISCUSS I ON
Deciphering the genetic basis of animal domestication is an active research area. The availability of transcriptome sequences provides an efficient and economical opportunity to study this issue at the level of individual genes (Li et al., 2020). In our study, comparative transcriptome analysis of mixed RNA from three tissues (liver, brain, and kidney) was used to identify the selected genes and pathways related to the feeding adaptive evolution characteristics, such as changes in physiology, body size, energy metabolism and immunity, between wild and farmed foxes.

| Positively selected genes might play a role in adaptation to strong seasonal fluctuations and energy requirements
Positive selection analyses identified eleven positively selected genes in AF versus BF and fourteen positively selected genes in RF and SF. Among those genes, several were related to protein anabolism (GOLGA4, SLC35A2, and CEP19), innate immunity (CFI, LRRFIP1, and TMEM41B) and DNA damage repair (MDC1). Of these, the protein encoded by the CEP19 gene localizes mainly to centrosomes and primary cilia (Nishijima et al., 2017). Knockout experiments in mice showed that the deletion of the CEP19 gene could lead to obesity in animals (Dayyeh et al., 2011;Adel et al., 2013). We speculate that this gene might play an important role in controlling obesity in fox.
The protein encoded by the CFI gene is a complement regulator protein that may play a role in the immune system through the alterative pathway (AP) in the complement system in the absence of antibodies or the inability of antibodies to effectively bind. Current studies in several species (e.g., shark, rat, carp, and rainbow trout; Anastasiou et al., 2011;Nakao et al., 2003;Schlaf et al., 2010;Shin et al., 2009) show that this gene is conserved and is mainly expressed in the liver.
In contrast to previous studies, this study detected the rapid evolution of this gene, suggesting that CF1 may play an important role in the immune system of Vulpes. The LRRFIP1 gene encodes a DNA recognition receptor that can recognize the transcription products or genomic DNA of pathogenic microorganisms entering the cell and plays a vital role in natural immunity (Yang et al., 2010). The Wild foxes (including the AF and RF) consume considerable energy to hunt and avoid predators, have high energy demands during the reproductive season (Prestrud et al., 1992), need to respond to starvation periods during winter, and react to extremely low temperatures through thermoregulation (Prestrud, 1991). As farmed foxes (including BF and SF) do not have to cope with seasonal fluctuations in food availability, this fluctuation is less pronounced in these species. Hence, these positively selected genes involved in protein synthesis, immunity, and DNA damage repair might play a role in adaptation to strong seasonal fluctuations and energy requirements as experienced by wild foxes. More interestingly, the selected genes vary between the two types of farmed foxes, suggesting that while common targets of selection related to domestication and improvement exist, different evolutionary solutions have arisen to achieve similar end-points within these closely related domesticated species.

| Specific gene mutations might be affected by artificial selection in farmed foxes
A total of 117 orthologous genes shared by the two groups were used to identify the gene mutations specific to farmed foxes. Sequence analysis of these genes showed that the LEMD2, RRBP1, and IGBP1 genes might be affected by artificial selection in farmed foxes. The protein encoded by the RRBP1 gene is a ribosomal binding protein located on the endoplasmic reticulum and is a key gene for ribosomal binding, transport, and secretion of newborn proteins in mammalian cells (Benyamini et al., 2009). Previous studies have shown that this gene can mediate the interaction between the endoplasmic reticulum and microtubules (Cui et al., 2012) and can bind to actin (Diefenbach et al., 2004). Christopher et al. showed that RRBP1 can affect the stability and translation of collagen mRNA after muscle overload (Fry et al., 2017). This result indicated that the expression of this gene was of great significance for the control of muscle fiber hypertrophy in animals. The LEMD2 gene is a nuclear membrane protein gene in the LEM domain. Knockout of this gene in mice caused a significant increase in the activity of MAP kinase and AKT kinase, resulting in embryo death (Brachner et al., 2005;Tapia et al., 2015). The IGBP1 gene is named immunoglobulin binding protein 1 (Jiang et al., 2013;Kong et al., 2009).
Previous studies have shown that the gene is associated with the regulation of protein phosphatase activity, activation of B cells, and apoptosis (Nieradka et al., 2007;Prickett & Brautigan, 2007). Mutation of this gene can lead to developmental delays in intelligence and behavior (Graham et al., 2003). Because of the pleiotropic nature of the gene,

| SSR screening provides reliable genetic markers for the selection of superior fox varieties
Fox is an important fur-bearing animal, and its characteristics, es- Moreover, the SSR dataset will provide new markers for future population genetic and species identification studies.
Although we used transcriptome data to screen out some positive selection genes and mutation sites that might be related to domestication, we had to face the limitations of the transcriptome and the small sample size. We plan to improve these questions by increasing the sample size and using genomic data in the following studies.

| CON CLUS IONS
Understanding the genetic changes that underlie genetic variation in fox may expand our understanding of the genetic modifications F I G U R E 4 Specific mutation sites in the RRBP1, IGBP1, and LEMD2 genes. Comparison of amino acid substitutions of three genes among the gene fragments of 11 Mammalia species. The shared mutation in the LEMD2 gene is T235A, and the amino acid R occurs only in wild fox (arctic fox and red fox). The specific mutation in the RRBP1 gene is A527T and amino acid Q occurs only in silver fox. The specific mutation in the IGBP1 gene is G153C, and the amino acid N occurs only in silver fox that are enriched upon domestication by humans. This study represents the first comparative transcriptome analysis of wild and farmed fox and reveals a number of candidate genes that may be involved in phenotypic differences in foxes in domesticated versus wild environments. Protein synthesis and DNA damage repair genes were under positive selection in farmed species, and the three genes with specific mutations screened from farmed species might play a role in adaptation to strong seasonal fluctuations and energy expenditure as experienced by wild foxes.
Furthermore, a large number of microsatellite markers were discovered and verified, providing new markers for the construction of high-resolution genetic linkage maps and for gene-based association analyses in foxes. which is gratefully acknowledged.

CO N FLI C T O F I NTE R E S T
The authors declare that they have no conflicts of interests.

E TH I C A L A PPROVA L
All sample procedures and experimental methods were approved by the Qufu Normal University Institutional Animal Care and Use Committee (Permit Number: QFNU2018-013).

DATA AVA I L A B I L I T Y S TAT E M E N T
The raw RNA-sequencing data are deposited in the Sequence Read Archive (SRA) database with accession number of SAMN15015145, SAMN15015146 and SAMN15015147 (Blue fox) and SAMN15015148 and SAMN15015149 (Silver fox).