Background & Summary

Organisms encounter constant environmental changes, which occurred throughout their lifetime and across generations. As organisms move into a new habitat, natural selection helps facilitating evolution of phenotypes that are better equipped to survive in the new environment. As a classic example of adaptation to a novel environment, high-altitude adaptation involves multifaceted adaptations, including at the level of morphology1,2, genetics3,4 and physiology5,6,7. Previous studies investigated genetic mechanisms of high elevation adaptation, and identified numerous genes that are under selection. It was reported that persistent changes in gene expression may be induced by long-term environmental changes, and can respond rapidly to sudden external challenges8.

microRNAs (miRNAs) are a class of ~22 nucleotide (nt) RNAs that post-transcriptionally regulate gene expression through translational inhibition or mRNA degradation9. It was reported that miRNAs play important roles in cold acclimation10,11, hypoxia stress12,13 and ultraviolet radiation stress14,15. For example, miR-32 is induced by cold, and directly targets transducer of ERBB2, 1 (Tob1) to activate p38/MAPK signaling to promote brown adipocyte function and trans-activates white fat browning through fibroblast growth factor 21 (FGF21) in mice16. miR-210, a so-called ‘hypoxamir’ regulated by HIF-1α, has been demonstrated to target a variety of genes involved in the cell cycle, mitochondrial metabolism, angiogenesis, DNA damage repair, and cell survival17. Thus, it is reasonable to suppose the important roles of microRNAs in high-altitude adaptation.

To reveal the potential roles of miRNAs in high-altitude adaptation, we generated high-quality small RNA sequencing data for six tissues (heart, liver, spleen, lung, kidney, muscle) from high- and low-altitude populations of three species: chicken, sheep and pig. In addition, we also generated small RNA sequencing data from low-altitude cattle and high-altitude yak populations (Fig. 1). Detailed sample information is listed in Table 1. In total, 141 samples were analyzed in this study, including 11 samples that were published18. Each population was represented by two or three deep-sequenced biological replicates.

Fig. 1
figure 1

Graphical representation of animal distribution, sample collection and the process of RNA extraction, small RNA sequencing and data analysis.

Table 1 Basic sample collection information.

Using a detection threshold of ≥4 reads across >40% samples in each species, we detected 2,036 mature miRNAs. Briefly, we identified comparable amounts of mature miRNA genes and precursor miRNAs for each species (Table 2). The unbiased miRNA annotation for each species prominently improved the uniformity of microRNA count in domesticated animals compared with miRbase21.0, which facilitated comparative analysis of miRNA evolution in domesticated animals.

Table 2 The number of miRNAs identified in this study.

In addition, one-to-one orthologous miRNA genes were also identified based on evolutionary relationships. Overall, we identified 53 orthologues among vertebrates, 116 orthologues in Artiodactyla and 171 orthologues in ruminants (Fig. 2), consistent with a previous study which reported ~100 miRNA orthologues in mammals9. The detailed information of one-to-one orthologous miRNA genes for each pair of species is listed in figshare19 (The detailed information about one to one miRNA orthologue, figshare https://doi.org/10.6084/m9.figshare.c.4584113).

Fig. 2
figure 2

The number of one to one miRNA orthologues identification across four species. (a) Number of miRNA orthologues identified in each pair of species. (b) Number of miRNA orthologues across three species or four species.

Methods

Sample collection and RNA isolation

Three adult females for each population of the five species from distinct altitudes were used (Table 1). These populations have been confirmed to live at corresponding altitudes for their lifetime. Animals were humanely killed to ameliorate suffering. Upon excision from the corpse, a piece of tissue fragments from six organs including heart, liver, spleen, lung, kidney and skeletal muscle(longissimus muscle for pig, cattle, yak and sheep, and pectoral muscle for chicken) were immediately placed frozen in liquid nitrogen and then stored at −80 °C until use. Total RNA was extracted using the standard Trizol (Takara, Japan) protocol.

Small RNA library preparation, sequencing, and miRNA annotation

Small RNA libraries were constructed using the Illumina TruSeq Small RNA Sample Prep kit. Libraries were assessed using the Agilent 2200 TapeStation and sequenced on the Illumina HiSeq. 2500 platforms. Initial bioinformatics analysis (base calling) was performed with CASAVA 1.8 (Illumina, USA) to generate raw reads (in FASTQ form). Next, raw reads were subjected to a series of stringent filters (such as removing low-quality reads, repeated sequences, and adaptor sequences) and processed sequences were then mapped to the corresponding reference genome [i.e., chicken (Galgal4), pig (Suscrofa 10.2), cattle (UMD3.1), goat (CHIR 1.0), and sheep (Oar v3.1)] for each species with stringent criteria (<1 mismatch along the whole sequence) using Bowtie20. Then, mapped reads were submitted to miRDeep2.021 to detect miRNAs for each species with default parameters. Mature miRNA sequences of chicken and all annotated mammalian species in miRbase21 were selected as reference. To identify miRNAs with ≥4 reads across >40% samples were retained for further analysis. Notably, our previous study identified comparable numbers of miRNA species and similar expressional patterns in bovine genome and yak genomes18, therefore, we selected the bovine genome as the reference genome in this study.

Orthologous miRNA identifications

To identify miRNA orthologues, we first performed an all-against-all BLATST22 analysis of miRNA precursors for a specific evolutionary taxonomy. All BLAT matches with >75% of both precursors aligned, an identity of ≥70% and an e-value < 10−5 were retained, retained matches were clustered using hcluster (http://treesoft.svn.sourceforge.net/viewrc/treesoft/trunk/hcluster). Clustered families with only one member for each species were first defined as “natural one-to-one” orthologues. For those families with more than one member in any species, we performed liftOver23 of each reference genome on human miRNA coordinates to assistant with extraction of one-to-one orthologues.

Data Records

Small RNA-seq data of heart and lung for yak/cattle which we published before18 were available in the NCBI Gene Expression Omnibus (GEO) under accession number GSE8783324, and the small RNA-seq data of 126 remaining samples (excluded four samples with mapping rate <65%) were under accession number GSE12441825. The detailed information of 126 samples is uploaded to figshare19 (Metadata of samples submitted to the NCBI Gene Expression Omnibus, figshare https://doi.org/10.6084/m9.figshare.c.4584113).

Technical Validation

Sequencing quality control

As shown in Fig. 2, the quality of the small RNA sequencing was checked by analyzing the raw reads, clean reads, and mapping rate for each sample. Briefly, total of 1.66 Gb raw data were obtained. After stringent filtering, 1.60 Gb high-quality reads were remained and an average proportion of 80.89% high-quality reads can be mapped to the respective genomes (Fig. 3 and Sequencing quality control for 141 transcriptomes, figshare https://doi.org/10.6084/m9.figshare.c.4584113). Notably, four samples with mapping rate <65% were excluded to ensure data quality.

Fig. 3
figure 3

Overview of small RNA-Seq data. (a) Distribution of raw reads. (b) Distribution of clean reads, and (c) mapping rate of each sample for each species.

Reproducibility validation

To examine the reproducibility of biological replicates, we calculated the Pearson’s correlation for each pair of biological replicates. The majority of biological replicates showed high Pearson’s correlation coefficient (Fig. 4a). Meanwhile, we evaluated between-replicate variation with MvA plots for all pairs. Consistent with Pearson’s correlation analysis, most of biological replicates showed minor variation (Fig. 4b and MvA plots for examining between-replicates variation, figshare https://doi.org/10.6084/m9.figshare.c.4584113). Principal component analysis also showed that the majority biological samples could be grouped together (Principal component analysis of each tissue for each species, figshare https://doi.org/10.6084/m9.figshare.c.4584113). These results highlight the strong experimental confidence of this dataset.

Fig. 4
figure 4

Assessment of reproducibility across biological replicates. (a) Pearson’s correlation coefficient of biological replicates for each species. (b) MvA plots for examining between-variation of heart in yak/cattle group. The Y-axis represents a log (base 2) fold-change in expression changes and the X-axis indicates the log of the average gene expression level.

Usage Notes

Because of its ability to identify novel miRNAs and accurately determine quantitative expression, small RNA sequencing technique has made it possible to obtain large datasets26. High-altitude adaptation has recently become a topic of interest, and attracted many specialists in fields such as genetics and molecular biology. We believe that this dataset will provide valuable groundwork for understanding the molecular mechanisms of high-altitude adaptation in vertebrates. These data could also provide an opportunity to comprehensively compare the similarity and difference of the mechanisms underlying high-altitude adaptation across multiple domesticated animals. In addition, our laboratory profiled the mRNA transcriptome for corresponding samples, which facilitates more precise investigation of the interaction between miRNAs and mRNA genes associated with vertebrate high-altitude adaptation27 Epigenetic modification vary from one individual to another and could be affected by sex, age, weight, habitat, nutrition and so on, thus it is necessary to be cautious to interpret the mechanism of high altitude adaptation using this dataset.