Genome-Wide Identification and Analysis of Variants in Domestic and Wild Bactrian Camels Using Whole-Genome Sequencing Data

The population size of Bactrian camels is smaller than dromedary, and they are distributed in cold and mountain regions and are also at the risk of extinction in some countries such as Iran. To identify and investigate the genome-wide variations, whole-genome sequencing of two Iranian Bactrian camels were performed with 37.4- and 42.6-fold coverage for the first time. Along with Iranian Bactrian camels, sequencing data from two Mongolian domestic and two wild Bactrian camels deposited in the NCBI were reanalyzed. The analysis eventuated to the identification of 4,908,998, 4,485,725, and 4,706,654 SNPs for Iranian, Mongolian domestic, and wild Bactrian camels, respectively. Also, INDEL variations ranged from 358,311 to 533,188 in all six camels. Results of variants annotation in all samples revealed that more than 88 percent of SNPs and INDELs were located in the intergenic and intronic regions. We found that 800,530 SNPs were common among all studied camels, containing 4,046 missense variants that affected 2,428 genes. Investigation of common genes among all camels containing the missense SNPs showed that there are 98 zinc finger and 4 fertility-related genes (ZP1, ZP2, ZP4, and ZPBP) in this set.


Introduction
Since the Camelidae family has been evolved in North America during Eocene [1], its members have gone a long way to evolution to date. Migration of camel's ancestors in different paths divided them into the Old World (migrated to east) and New World camels (migrated to South America) [2]. The Old World camels include three species, Camelus dromedarius (dromedary camel or onehumped camel), Camelus bactrianus (Bactrian camel or two-humped camel), and Camelus ferus (wild Bactrian camel or wild two-humped camel). Dissimilar to dromedaries living in hot and dry deserts of northern and eastern Africa, the Arabian Peninsula, and southwest Asia [3], the Bactrian camels are distributed in the cold and mountainous regions of Iran, China, Russia, Mongolia, and Kazakhstan [4]. The wild Bactrian camel, known as the only wild survivor of Old World camels, is living in northwestern China and southwestern Mongolia, especially the Outer Altai Gobi Desert [5]. Bactrian population in Iran (less than 150 camels) is in alarming status [6] and a suitable strategy is needed to enhance the population situation. They have been distributed in the Ardabil province located in the northwest of Iran in the border of Azerbaijan.
Camel is the most important animal to provide food in desert areas where other animals are not capable to live and produce. According to FAO, hunger has increased in the world in recent years, so that 821 million people (one out of every nine people in the world) are undernourished [7]. Camels support a significant part of food security in arid and semi-arid regions of Africa and Asia [6]. The use of new technologies in domesticated animal breeding, as an important source of food for humans, will help us raise the quality and quantity of animal source foods and enhance food safety especially in areas with the harsh conditions. Whole-genome sequencing (WGS), as an efficient technology in livestock breeding, provides an opportunity to the better understanding of the genomic structure of desired farm animals. WGS is capable of producing more complete structural genetic information on individuals in comparison with technologies such as SNP chips [8]. Also, genomewide studies as a key tool in the biology era can expand our information about endangered species to monitor and management of their biodiversity [9] and for designing practical strategies to save them from extinction.
The main purpose of the present study was to identify the genome-wide variations (SNPs and INDELs) of Iranian Bactrian camels and to compare them with domestic and wild Bactrian camels from other geographical regions. Functional and regional classifications of identified variants and gene ontology (GO) analysis on affected genes by some classes of variants were among the other goals of this work.

Materials and Methods
2.1. Animals, Sampling, and Total DNA Extraction. Wholegenome sequencing of one 9-(Bac1) and one 11-year old (Bac2) female Bactrian camels was performed to variants identification, variants annotation, and gene ontology analysis of genes containing variants. Bactrian camels in Iran have some characteristics such as the neck, front legs, and humps covered with wool; also, shoulder height and body length in adult camels are 153 and 123 centimetres, respectively (Supplementary 2). All animal care and experiments were approved by the animal science committee of the University of Mohaghegh Ardabili, Iran. Also, all experiments were performed in accordance with a routine guideline which is acceptable by this committee.
Genomic DNA was extracted using an RBC mini kit for mammalian blood with respect to the manufacturer's protocols (Real Biotech; South Korea). The quality of the extracted DNA for each camel was monitored using gel electrophoresis in 1% agarose gel. One paired-end library comprising two lanes was constructed for each camel. The length of reads was 2 × 100 bp with an average insert size of~365 and 360 bp for Bac1 and Bac2, respectively. The sequencing of constructed libraries was carried out using the Illumina HiSeq 2000 system (Illumina, San Diego, CA). To compare our samples with other breeds of Bactrian camels, we downloaded publicly deposited sequenced short reads of two domestic (accessions: SRX973830 and SRX973833) as well as two wild Bactrian camels (accessions: SRX973840 and SRX973843) from SRA-NCBI (short reads archive).

FastQC
(http://www.bioinformatics.bbsrc.ac.uk/projects/ fastqc/) was used to perform a general overview of short reads quality status; in fact, this program allows us to get general information about our data and make better decisions for the downstream analysis. Trimmomatic v.0.36 (MaxInfo approach) [10] was applied to quality filtration of raw sequencing reads for all individuals. After quality filtration, the reads less than 40 bp in length were removed. Supple-mentary 2 presents the output of quality checking of one of the studied samples, before and after trimming.

Postalignment Processes and Variant
Calling. In many large-scale sequencing projects, several processes are performed as postalignment processes between the mapping of reads to reference genome to variant calling for decreasing erroneous calls [12]. Marking duplication was performed using Picard (http://broadinstitute.github.io/picard), and GATK [13] was used for local realignment and base quality score recalibration. We used SAMtools [14] to call SNPs and INDELs for individuals and called variants filtered as described by Khalkhali-Evrigh et al. [15].

Annotation of Called Variants.
Due to the flexibility of SnpEff [16] to annotate genomic variants of animals without a database, we selected it for annotation of identified variants in the present work. Similar to our previous project on dromedary camels [15], due to the lack of database for Bactrian camels among the prebuilt databases for SnpEff, we built a database for Bactrian camels using the Bactrian reference genome and its GFF file using the database building guidelines (http://snpeff.sourceforge.net/ SnpEff_manual.html#databases) in SnpEff.

Gene Ontology (GO) Analysis of Genes Containing
Variants. DAVID v.6.8 [17] was used to the functional enrichment analysis of genes containing SNPs and INDELs. For this purpose, we extracted the genes affected by missense SNPs from shared SNPs among all samples, shared only between Iranian Bactrian camels (only_B1B2), shared only between Mongolian domestic Bactrian camels (only_DC), and shared only between wild Bactrian camels (only_WC). Also genes affected by frameshift INDELs were classified using gene ontology analysis. After uploading desired genes in DAVID, due to incomplete information of camels in mentioned program, we selected cattle (Bos taurus) as species and background species. Cattle was chosen based on Wu et al.'s [18] study suggested that the estimated divergence time between camelids and cattle is closer compared to other mammalian. The calculated p values were corrected using the Benjamini correction for multiple testing, and enriched terms were considered statistically significant at p adjusted < 0.1.

Massively Parallel Sequencing of the Iranian Bactrian
Camels. Whole-genome sequencing of two female mature Bactrian camels sampled from the Jahad Abad station located in Meshkin-Shahr (Ardabil province) yielded 813,158,076 and 933,365,664 short reads of 100 bp for Bac1 and Bac2, respectively. After removing of error-prone and low-quality bases and reads from initial raw reads, approximately 98.09% (797,679,174 reads) and 97.66% (911,654,292 reads) were remained as clean reads for Bac1 and Bac2, respectively. The percentage of retained reads for downloaded samples 2 International Journal of Genomics was higher than our samples (  (Figure 2(b)).

Annotation of SNPs.
Annotation of identified SNPs revealed that more than 88.2% of all SNPs are located in intergenic and intron regions (Table 3). Only 0.008 (for DC408) to 0.01 (for Bac1 and Bac2) percent of SNPs have a high impact (stop lost, stop gained, and start lost variants) on products of genes. The number of missense variants ranged from 14,208 for DC408 to 20,003 for Bac1. We extracted affected genes by missense SNPs in shared SNPs among all individuals (All_ind), only_B1B2, only_DC, and only_WC sets and found 2,428, 1,945, 329, and 1,786 genes for the mentioned sets, respectively (Supplementary 1). Some of the SNPs were categorized into more than one functional classes; therefore, we considered them as multi_annotated SNPs with unknown impact and avoided discussing them anymore. Investigation of all affected genes by missense SNPs revealed that 3,246 genes (Supplementary 1) are common among all individuals. These numbers for individuals ranged from 6,158 for DC408 to 7,919 for Bac1 ( Figure 3).

Annotation of INDELs.
Detected INDELs were classified into seven regional (e.g., intergenic, intron) and seven functional (e.g., frameshift, conservative inframe insertion) variants, as well as some INDELs considered as multi_ annotated variants (Table 4). According to the results for SNPs, most of the INDELs occurred in intergenic and introns, and more than 88.6% were common in all samples. We found that~0.07 percent (ranged from 248 for DC408 to 371 for Bac1) of the INDELs eventuate to frameshift in related genes. We extracted the affected genes by frameshift variants for all individuals, common genes among Iranian and non-Iranian domestic (112 genes), and also those that were common between wild Bactrian camels (174 genes). Eighty genes containing frameshift variants were common among all studied individuals (Supplementary 2).

GO Enrichment of Affected Genes by Missense SNPs and Frameshift
INDELs. GO analysis was performed on affected genes by shared missense SNPs in All_ind, only_B1B2, only_DC, and only_WC sets. Results revealed that significantly enriched terms for molecular function are 5 (ATP binding, ATPase activity, calcium ion binding, microtubule motor activity, and motor activity), 2 (Rho guanyl-nucleotide exchange factor activity and transcriptional activator activity and RNA polymerase II core promoter proximal region sequence-specific binding), and 1 (ATP binding) for All_ind, only_B1B2, and only_WC sets, respectively. We found no significant terms in the biological process for mentioned sets; however, one enriched term in cellular component for All_ind set (myosin complex) was found (Supplementary 1). No significant enriched GO terms were observed in biological Also, we found that only one GO term under molecular function (neurotransmitter : sodium symporter activity; GO : 0005328) was significantly enriched (p adjusted = 0:031) for 80 common genes among all samples, containing a frameshift variant. Enrichment analysis results for common genes among Iranian and non-Iranian domestic Bactrian camels were similar to the mentioned 80 genes; however, no signifi-cantly GO term was found for common genes between wild Bactrian camels.

Discussion
In the present study, we performed the first whole-genome resequencing on two Iranian Bactrian camels to genomewide variant identification and detailed analysis of them. Obtained short reads after quality filtration were covered       Figure 3: Venn diagram created by InteractiVenn [19], showing the overlap of genes containing missense variants in all studied Bactrian camel. 5 International Journal of Genomics wild Bactrian camels, respectively. The lower number of detected SNPs in non-Iranian domestic Bactrian camels (belonged to Mongolia), might be resulted from their higher similarity to reference genome (belonged to Alxa breed, a domestic Bactrian camel distributed in China). The heterozygote to homozygote SNP ratio in DC399 (1.75), DC408 (1.79), and Bac2 (1.82) was close, while this value was 3.77 for Bac1. This result was consistent with our previous work on the Iranian dromedary camel which showed higher heterozygosity in them [15]. Based on the microsatellite data, there is introgression between Bactrian and dromedary camels in Iran [6]; therefore, observing higher heterozygosity in a portion of the Iranian camel's population is not unexpected. Also, based on archaeological evidence, Iran is a possible place of origin for the domestic Bactrian camels [21]. Therefore, another probable reason for the high genetic diversity in Iranian Bactrian camels with a small population size [22] can be attributed to the mentioned issue.
Despite the high number of detected SNPs in wild Bactrian camels (WC247 and WC305), heterozygote SNPs in these samples were the lowest in comparison with four domestic camels, so that 52.2 and 46.1% of SNPs were heterozygote in WC247 and WC305, respectively. In many cases, wild species have higher genetic diversity compared to their domesticated breeds, such as pig [23] and dog [24]; however, in some cases such as wild Bactrian camels and for some reasons, this fact can be reversed. One of the reasons for the low heterozygosity in wild Bactrian camels can probably result from their small population size. According to available statistics, there are less than 1,000 wild Bactrian camels in the world [25], which are distributed in China and Mongolia [26].
Annotation results revealed that most portion of the discovered variants throughout the genome of Bactrian camels (more than 88 percent) are located in the intergenic and intronic regions. Our results were consistent with these values for Iranian (87.7% for SNPs, 88.9% for INDELs) and African dromedary camels (88% for SNPs, 89.1% for INDELs; [15]).
The distribution pattern of SNPs in the different regions for common SNPs set was similar to all identified SNPs, so that approximately 89% of shared SNPs among samples were classified as intergenic and intronic variants, and only 0.5% (4,046) of them as missense variants led to amino acid changes (Supplementary 1). The eighty-eight missense SNPs from this set were found in the 30 solute carrier (SLC) gene superfamily. SLC genes encode membrane-bound transporter proteins to transport substrates such as glucose, amino acids, lipids, cation, anion, and many other essential ingredients for the body [27]. SLC27A4 (FATP4) with a missense SNP was one of the important genes in this list. The expression level of the FATP4 gene in the duodenum is higher in comparison with other organs [28]. Encoded protein by this gene plays an important role in the uptake of fatty acids from the small intestine [29]. Lobo et al. [30] reported a reduction in triacylglycerol deposition, diacylglycerol, and monoacylglycerol levels and also lipid droplet sizes in FATP4 knockdown adipocytes. It seems that FATP4 is a key gene for camels because of the importance of fat metabolism in this species. Also, we found four genes (SLC2A4, SLC2A9, SLC5A4, and SLC37A1) related to glucose metabolism and hemostasis in the above-mentioned list. SLC2A4 (GLUT4) is used by insulin as a tool for glucose uptake; in fact, insulin simulates translocation of GLUT4 to the cell surface and, as a result, to glucose uptake [31]. Evidences showed that decreasing in GLUT4 gene expression is associated with insulin resistance and type 2 diabetes in human [32]. SLC2A9 (GLUT9) is another facilitative glucose transporter with one missense SNP in this study. The expression of this gene in the insulin secreting β-cells may reflect its function as a glucose sensor in β-cells pancreatic islets [33]; however, another important role of this International Journal of Genomics gene is participation in urate homeostasis [34]. SLC5A4 (SGLT3) and SLC37A1 (G3PP) genes that are highly expressed in the digestive system [28] were another affected gene by missense SNPs that play important roles in the absorption and hemostasis of glucose. Considering that camels do not develop diabetes [26], presumably common missense SNPs in glucose metabolism-related genes found in this study have positive effects on camels in dealing with diabetes. Although this is a theory, and to prove it, we need more studies in this field.
Here, we considered common homozygote SNPs among domestic Bactrian camels as fixed SNPs for them, if these variants were absent in the genome of both wild Bactrian camels. It is likely that human selection after domestication is the cause of the fixation of these alleles in domestic Bactrian camels [15]. According to the mentioned criteria, we found 43,944 SNPs, and 16,201 of them were located in the genic region and affected 3,405 genes (Supplementary 1). One of the significantly enriched biological process terms for fixed SNPs set was intracellular signal transduction (GO : 0035556; Supplementary 1), which can be defined as a chain of reactions to transmit chemical signals from the cell surface to their intracellular targets. One of the main targets of these signaling reactions could be transcription factors that ultimately lead to a change in the expression level of genes in response to extracellular stimuli [35].
Investigation in common genes among all individuals that affected by missense variants revealed that 98 of these genes belonged to the zinc finger (ZNF) genes. ZNFs are one member of the large gene family with numerous members and various functions in eukaryotes. Some of the known functions of zinc finger proteins include cell proliferation, differentiation, and apoptosis. Also, zinc finger proteins are involved in skin homeostasis, intestinal epithelium biology, muscle differentiation, adipogenesis, and regulation of cellular stemness [36]. Another interesting issue was the existence of four fertility-related genes (ZP1, ZP2, and ZP4 for females and ZPBP for males) in the list of common genes. The zona pellucida (ZP) is a glycoprotein layer that is synthesized and secreted by mammalian oocytes and surrounds it [37]. ZP in species like human [38], hamster [39], and cat [40] is formed by four glycoproteins (ZP1, ZP2, ZP3, and ZP4) and plays important roles in species-specific spermegg binding, preventing polyspermy, ZP-induced acrosome reaction, and protecting the embryo [38]. Zona pellucida binding protein (ZPBP) that is expressed in testis [28] is involved in male fertility, and defects in this gene are associated with abnormal sperm morphology [41]. These results can probably give us some indication about the evolution in the Bactrian camels and by performing more studies in this field, we can better understand the concept of these results. Identification of genetic variation in the wholegenome scale gives us the opportunity of development of approaches like genome-wide association studies and genomic selection [15]. As an achievement, these tools can help in selecting a base population and increasing the number of Iranian Bactrian camels, as well as deepening our knowledge to designing appropriate breeding strategies for this valuable livestock [42].

Data Availability
All short paired-end reads that used in present study were deposited in SRA database at NCBI with accession numbers SRX3763563 and SRX3774562 for Bac1 and also SRX3773669 and SRX3916437 for Bac2.

Conflicts of Interest
The authors declare that there is no conflict of interest regarding the publication of this paper.