Adaptations to a Subterranean Environment and Longevity Revealed by the Analysis of Mole Rat Genomes

associated with improved oxygen delivery in hypoxia-tolerant seals (Schneuer et 2012) and shellﬁsh (Kraus Colacino, 1986), which is suggestive of a common adaptation with subterranean rodents. Overall, our data suggest that globins are consti-tutively and highly expressed in the brain of hypoxia-resistant rodents and play a major role in their ability to adapt to an oxy-gen-poor subterranean environment. Experiments are in prog-ress to further dissect the expression and function of globins in African mole rats.


INTRODUCTION
Subterranean rodents comprise approximately 250 species that spend their lives in dark, unventilated environments and are found on all continents except Australia and Antarctica . African mole rats (hystricognath rodent family Bathyergidae) are long-lived, strictly subterranean rodents that feed on underground roots and tubers. They are able to flourish in habitats that are poor in oxygen and rich in carbon dioxide and ammonia , conditions that are harmful to mice and rats. It is hypothesized that the African Rift Valley acted as a geographical barrier in shaping the adaptive radiation of African mole rats into southern Africa and from there to other regions (Faulkes et al., 2004). Until now, the only African mole rat genome has been that of the 35 g naked mole rat (NMR, Heterocephalus glaber), which resides in northeast Africa and is the most basal African mole rat lineage . The lack of genomic information for closely related species thus far has prevented detailed analyses of African mole rat traits. To better understand the molecular mechanisms underlying the traits of African mole rats, we developed an improved NMR genome assembly, determined the genome sequence of the related 160 g Damaraland mole rat (DMR, Fukomys damarensis) found in the arid regions of southwest Africa (Figures 1A and 1B), and sequenced the transcriptomes of additional subterranean rodents. Further analyses allowed us to decipher molecular adaptations consistent with subterranean life and shed light on unique traits of a most unusual mammal, the NMR.
heterozygous SNPs and estimated a nucleotide diversity (heterozygosity) of 0.06%, which is comparable to that in the NMR, but lower than that in rodents such as mouse and rat . The low level of nucleotide diversity in the DMR and NMR may reflect their unique social system, which involves a single breeding ''queen'' per colony, and the low effective size of their populations . The number of repeat elements in the DMR genome was also lower (28%) than in other mammals but comparable to that of the NMR (Table 1; . We employed homology and de novo methods as well as RNA sequencing (RNA-seq) data to predict 22,179 protein-coding genes in the DMR genome (Table 1; Figure S1C), which is comparable to what is predicted for other mammals. Our analysis revealed that the common ancestor of the DMR and NMR lived approximately 26 million years ago (Mya) ( Figures 1C and S1D), which is similar to the distance between mice and rats, or between humans and macaques.
We further prepared a version of the NMR genome based on the original genome sequence , additional sequencing, and data generated by the Broad Institute (Table 1). The new NMR assembly had a genome size of 2.7 Gbp (92-fold coverage), with a scaffold N50 of 21 Mb compared with 1.6 Mb in the previously published assembly . The resulting DMR and NMR genomes, gene models, and transcriptome data for these and related rodents were used to reveal both common and unique features of these animals. We primarily focused on genes that are likely to be involved in the ecophysi- (B) Species range map of African mole rats. DMR and NMR occurrence is shown in blue and green, respectively. (C) Phylogenetic tree constructed using 4-fold degenerate sites from single-copy orthologs, with branch lengths scaled to estimated divergence time (with error range shown in parentheses). Distances are shown in millions of years (Myr). See also Figure S1. ology and exceptional longevity of underground-dwelling African mole rats.

Sensory Cues
Analyses of gene family contractions and expansions provide insights into the evolutionary forces that have shaped genomes. Among the 19,839 gene families that are inferred to be present in the most recent common ancestor of mammals, we found that 212 gene families were gained and 59 were lost from the DMR genome ( Figure S1E; Table S1). Over the same period, the NMR gained 378 gene families and lost 29. The gene families gained included olfaction (sense of smell) genes that likely play an important role in social interaction and locating food in complete darkness (Heth and Todrank, 2007). The NMR and DMR live exclusively in the dark and display small eyes and poor visual acuity . However, their eyes can still serve to alert the colony to invasion by predators by detecting light entering their tunnels (Nemec et al., 2008;Kott et al., 2010). The visual perception category was enriched in both the DMR (Table S2) (Gene Ontology [GO]: 0007601, p < 0.001, Fisher's exact test) and NMR pseudogene lists. We found that one visual perception gene (AOC2) was lost and 13 were pseudogenized in the DMR (Table S2). Three visual genes (CRB1, GRK7, and GJA10) were inactivated or missing in the DMR and the new NMR genome assembly (Table S2). Positive selection of the rhodopsin gene RHO, which enables dim-light vision, was found in the lineage leading to the common ancestor of the DMR and NMR (Table S3). This is consistent with evidence showing that African mole rat RHO underwent accelerated evolution while preserving sites critical for spectral tuning (Zhao et al., 2009). Interestingly, we observed cataracts in all examined NMRs ranging from 4 to 20 years of age ( Figure S2A). This phenotype may be a consequence of captive life under atmospheric oxygen levels, but could also highlight an inadequate antioxidant defense. Low glutathione peroxidase 1 (GPx1) levels may contribute to a decreased protection of the lens against oxidative stress (Kasaikina et al., 2011). A premature stop codon occurs in the gene encoding GPx1 (GPX1) in both the NMR and DMR ( Figure S2B), and knockout of this gene in mice results in cataract formation (Reddy et al., 2001;Wang et al., 2009;Wolf et al., 2005).

Adaptations to Hypoxia and a High Carbon Dioxide and Ammonia Environment
The DMR, NMR, and other subterranean rodents rest with conspecifics in underground environments low in oxygen and high in carbon dioxide and ammonia-conditions that would evoke cellular damage and behavioral stress responses in other mammals . Ammonia is a potent irritant that arises from nitrogen and methane accumulation in latrines and nests LaVinka et al., 2009). We found that arginase 1 (ARG1), which catalyzes the final step of the hepatic urea cycle and removes ammonia from the body, has a radical residue change in both the NMR and DMR: His254 replaces Leu/Tyr, which is present in 38 other vertebrate species ( Figure 2A). This amino acid change was also detected in the distantly related subterranean coruro (Spalacopus cyanus) and the semi-subterranean degu (Octodon degus) of South America. The common ancestor of Octodontoidea (coruro and degu) and Cavioidea (guinea pig) diverged 35 Mya, while African and South American rodents diverged 41 Mya (Antoine et al., 2012;Meredith et al., 2011; Figure 2B). His254 is located immediately downstream of a conserved motif required for binding manganese and ARG1 function (Dowling et al., 2008; Figure 2C). Moreover, ARG1 is a homotrimer, with the salt bridges formed by Arg255 and Glu256 being critical for its assembly (Lavulo et al., 2001;Sabio et al., 2001). The charged residue flanking the ARG1 core may improve ammonia removal efficiency by interacting with the acidic Glu256 or by strengthening the Arg255-Glu256 salt bridge. In addition, several genes in the urea cycle were expressed at higher levels in NMR and DMR livers compared with mouse and rat (Table S4; Figure 2D). This included arginase 2 (ARG2), the second arginase gene that normally is not expressed in rodent liver. Moreover, expression of the mitochondrial ornithine transporter ORNT1 (SLC25A15), which is essential for the urea cycle (Fiermonte et al., 2003), was elevated in the NMR and DMR. Taken together, these data indicate that subterranean hystricognath rodents present enhanced ammonia detoxification.
The buildup of CO 2 in underground habitats evokes pain, as CO 2 is converted into acid that stimulates pain receptors in the upper respiratory tract, nose, and eyes (Brand et al., 2010). A recent study found that a negatively charged motif in the sodium channel Na(V)1.7 protein (SCN9A), which is highly expressed in nociceptor neurons, prevents acid-induced pain signaling to the NMR brain (Smith et al., 2011). We compared 44 vertebrate sequences and found that the motif is also present in the DMR, two African mole rats in the same genus as the DMR (the Ansell's mole rat [Fukomys anselli, FA] and the Mashona mole rat [Fukomys darlingi, FD]), the South American subterranean coruro and semi-subterranean degu, the cave-roosting little brown bat (Myotis lucifugus), and the European hedgehog (Erinaceus europaeus). The distantly related subterranean blind mole rat Spalax galili also harbors the same amino acid changes . These animals are exposed to chronic hypoxia and hypercapnia in burrows or caves ( Figure 2E), suggesting that convergent evolution resulted in similar amino acid changes in Na(V)1.7 and adaptation to high CO 2 levels.
Changes in both gene expression and gene sequences contribute to adaptive mechanisms in subterranean rodents (Avivi et al., 2010). We compared the normoxic brain transcriptomes of subterranean rodents with those of rodents living primarily ''aboveground'' (surface dwelling). In addition to the NMR and DMR, we generated the transcriptomes of three subterranean hystricognath rodents: the FA, the FD, and the coruro of South America. We further compared them with rat and two guinea pig subspecies (Table S5).
Several genes associated with DNA damage repair and responses to stress showed higher expression in subterranean rodents even during normoxia (Table S5; Figure S2C). Hypoxia induces DNA damage, and in agreement with recent reports on the blind mole rat Shams et al., 2013), our data suggest that improved DNA repair is an intrinsic mechanism of adaptation to an underground environment. The most obvious adaptation to a hypoxic subterranean environment is improved oxygen uptake to highly oxygen-demanding tissues, such as the brain. The globin family comprises proteins that are responsible for the delivery and storage of oxygen in cells and tissues. We found that hemoglobin a (HBA1 and HBA2, Coding DNA sequence identical coding sequences) and neuroglobin (NGB) displayed elevated expression in the brains of subterranean rodents during normoxia ( Figure 2F; Table S5), and western blot analysis verified higher hemoglobin a protein expression in the normoxic brain of the NMR compared with that of several surface-dwelling rodents ( Figure 2G). We next compared the gene expression of NMR and DMR with the hypoxia-sensitive rat after 8 hr at oxygen levels comparable to those found in NMR burrows (8% O 2 ) . Similar to what was observed under normoxia, hemoglobin a and neuroglobin expression in the hypoxic brain was higher in the NMR and DMR than in the rat ( Figure 2H). We observed a 3.4-fold decrease of hemoglobin a mRNA in the DMR under hypoxia, whereas expression in the NMR did not change significantly. Higher NGB expression was observed in the hypoxic rat and NMR brain, but not in the DMR brain, and there was a trend toward higher cytoglobin (CYGB) expression in the DMR. Species-specific expression of globins in response to hypoxia was previously reported in subterranean blind mole rat species (family Spalacidae) that are distantly related to African mole rats (family Bathyergidae) (Avivi et al., 2010). Hemoglobin has a higher affinity for oxygen and is able to unload oxygen more efficiently in the NMR than in the mouse (Johansen et al., 1976), and hemoglobin a plays a neuroprotective role in the brain of rodents during hypoxia (Schelshorn et al., 2009). A unique amino acid change (Pro44His) in hemoglobin a has recently been hypothesized to convey the adaption to the subterranean and high-altitude habitats of the NMR and guinea pig, respectively . Neuroglobin and cytoglobin mRNA expression is elevated in two blind mole rat species compared with the rat under normoxia (Avivi et al., 2010). Importantly, neuroglobin is expressed in blind mole rat glial cells and neurons, whereas its expression is limited to neurons in surface-dwelling rodents, such as the rat, indicating an organwide protective function. Glial globin expression has also been   Figure S2. associated with improved oxygen delivery in hypoxia-tolerant seals (Schneuer et al., 2012) and shellfish (Kraus and Colacino, 1986), which is suggestive of a common adaptation with subterranean rodents. Overall, our data suggest that globins are constitutively and highly expressed in the brain of hypoxia-resistant rodents and play a major role in their ability to adapt to an oxygen-poor subterranean environment. Experiments are in progress to further dissect the expression and function of globins in African mole rats.
Potential Longevity-Associated Adaptations in the NMR and DMR Subterranean rodents have the highest maximum lifespans for their body weight, with species in both the Bathyergidae (e.g., the NMR and DMR) and Spalacidae (e.g., the blind mole rat) families living for over 20 years (Dammann and Burda, 2007). The NMR is the longest-lived rodent known, with a lifespan exceeding 30 years, while the longest-lived DMRs in our laboratories survived for 20 years. These rodents have a longevity quotient similar to that of humans and may show a comparable age-related disease pattern (Edrey et al., 2011). We compared the transcriptomes of the liver (a relatively homogeneous organ) of the NMR and DMR (Hystricognathi) with those of the short-lived rat and mouse (Muridae) ( Table S4). Compared with the mouse and rat, which spend considerable time aboveground, the NMR and DMR showed differential expression and enrichment of several genes associated with oxidoreduction. Two out of six peroxiredoxins (PRDX2 and PRDX5) were expressed at lower levels (Table S4) in NMR and DMR livers, which, together with reduced GPx1 activity, may result in increased levels of reactive oxygen species (ROS). These observations are consistent with reports of oxidative stress in the NMR (Andziak et al., 2006), and suggest that the long-lived NMR and DMR can thrive despite elevated oxidative stress.

Loss of FASTK, a Sensor of Mitochondrial Stress, in African Mole Rats
We found that the Fas-activated serine/threonine kinase gene (FASTK) is inactivated in both the NMR and DMR ( Figure S2D). FASTK encodes a kinase that serves as a regulator of Fasmediated apoptosis and is located at the inner mitochondrial membrane. FASTK is associated with cell survival and is overexpressed in tumors and immune-mediated inflammatory diseases such as asthma and AIDS, where it can delay the onset of apoptosis and contribute to pathogenesis. Knockdown of this gene results in reduced lung inflammation in mice (Simarro et al., 2010) and reduced oncogenic potential of cultured human cancer cells (Zhi et al., 2013). Chronic inflammation, cancer, and cellular senescence are intertwined in the pathogenesis of premature aging (Campisi et al., 2011). Furthermore, knockdown of FASTK is also associated with improved neuron elongation and regeneration (Loh et al., 2008). Both the neurons' ability to regenerate and their rate of elongation decrease with age. Loss of FASTK may help maintain neuronal integrity in long-lived mole rats, keeping their brains ''younger.'' Thus, the loss of FASTK in the NMR and DMR suggests a role for FASTK in the aging phenotype of somatic cells as well as in cancer resistance.

Divergent Insulin in African Mole Rats
It has been reported that NMR insulin cannot be detected using rodent antibody-based assays, similar to what was found in the guinea pig several decades ago (Chan et al., 1984;Kramer and Buffenstein, 2004). We found that the NMR, DMR, and other hystricognath rodents harbor a divergent insulin b-chain sequence ( Figure 3A). This finding is consistent with the observation that insulin in the South American hystricognath is rapidly evolving (Opazo et al., 2005). In the guinea pig and other South American hystricognaths, the regions encoding the a chain and, in particular, the b-chain are highly divergent, with concomitant alterations in insulin structure and reduced activity compared with most other mammals, and possibly an alternative receptor (King et al., 1983;Opazo et al., 2004). Mutations in the human b-chain result in reduced insulin processing, misfolding, and less effective insulin (based on receptor binding) (Liu et al., 2010). Interestingly, residue 22 of the b-chain, whose mutation (Arg22Gln) is associated with misfolding of insulin and diabetes (Liu et al., 2010), is uniquely changed in both African and South American hystricognaths ( Figure 3A). In the African crested porcupine, this residue was previously linked to an altered insulin structure with reduced affinity for insulin receptors (Horuk et al., 1980). We hypothesize that NMR and DMR insulin exists as a monomer with low insulin receptor activity that targets alternative receptor(s) outside classic insulin-responsive tissues such as liver, muscle, and adipose tissue.
Surprisingly, the NMR (Edrey et al., 2011) and South American hystricognaths (Opazo et al., 2004) are able to handle glucose in the absence of conventional insulin, suggesting that these animals have evolved compensatory mechanisms. In mammals, insulin is not secreted from the pancreas until after birth, and mice lacking insulin die a few days after birth due to acute diabetes mellitus (Duvillié et al., 1997). Until recently, it was unknown how glucose handling in the liver was achieved before birth. It has now been established that insulin growth factor 2 (IGF2), which has high homology to insulin, is abundantly expressed in the fetal liver and signals exclusively via the insulin receptor (IR) to maintain glycemia (Liang et al., 2010a). In most mammals, including mice and rats, IGF2 expression is downregulated after birth in the liver; however, primates and guinea pigs harbor residual IGF2 expression (Lui and Baron, 2013). We found that the NMR and DMR also express IGF2 and its binding protein, IGF2BP2, in the liver ( Figure 3B; Table S4). We hypothesize that autocrine/paracrine production of IGF2 in the liver substitutes for insulin and may partly mediate a fetal-like mode of glucose handling in hystricognath rodents ( Figure 3C).
Reduced levels of insulin are observed during calorie restriction and inhibition of the growth hormone/IGF1 axis, two manipulations that extend lifespan in various species (Blagosklonny, 2012). Interestingly, molecular innovations of this axis may contribute to the lifespan of the long-lived Brandt's bat (Seim et al., 2013). In addition to induction of IGF2 expression in NMR and DMR livers, we observed differential expression of genes associated with insulin signaling: decreased IGF1 and insulin induced gene 2 (INSIG2), and increased IGF1R and resistin (RETN) ( Table S4). Taken together, these results suggest that a less bioactive insulin and altered downstream signaling may partly explain the enhanced longevity of African mole rats and possibly other hystricognaths (e.g., porcupine and guinea pig). Our findings support the hypothesis that hystricognath rodents have evolved a distinct insulin peptide.
A recent study suggested that one potential explanation for mole rats' cancer resistance lies in the enzyme hyaluronan synthase 2 (HAS2) (Tian et al., 2013). Two amino acid residues in the HAS2 active site were reported to be unique to the NMR and hypothesized to result in the synthesis of high-molecular-mass hyaluronan (HMM-HA), an extracellular matrix polysaccharide. HMM-HA serves as an extracellular signal that results in induction of the tumor suppressor p16INK4A, early contact inhibition, and cancer resistance (Tian et al., 2013). We found that one of the unique amino acid changes in the NMR HAS2 sequence (Asn301Ser) is shared by the DMR, whereas Asn178Ser is unique to the NMR ( Figure S2E). In contrast to residue 178, Asn301Ser is present in a highly conserved region. Interestingly, the blind mole rat also secretes HMM-HA (Tian et al., 2013). Taken together, these data suggest that the DMR and the blind mole rat produce HMM-HA that confers cancer resistance. Surprisingly, a recent study found that HMM-HA does not influence the anticancer properties of blind mole rat fibroblasts (Manov et al., 2013), which supports the current evidence showing independent paths to cancer resistance in the blind mole rat (Azpurua and Seluanov, 2012). Future functional studies in mole rats are required to corroborate these observations. Unique Features of the NMR Although the NMR and DMR share a relatively recent common ancestor (26 Mya), the NMR has several exceptional features and is considered a most unusual mammal. Accelerated gene evolution among lineages could indicate an association between genetic changes and the evolution of traits (Qiu et al., 2012). Analysis of nonsynonymous-to-synonymous substitution (Ka/Ks) ratios of 9,367 1:1 orthologs of ten mammalian species revealed that the NMR was significantly enriched for several GO categories, including the respiratory electron transport chain, cell redox homeostasis, and response to oxidative stress ( Figure 4A; Table S6). To test whether genes in the rapidly evolving GO categories were under positive selection, we used a branch likelihood ratio test to identify positively selected genes in the NMR and DMR lineages (Table S3). Body Temperature Regulation Like other mammals, the DMR tightly controls its body temperature (stable at 35 C). The NMR, in contrast, lacks an insulating layer of fur and cannot maintain thermal homeostasis if it is housed on its own away from the warm confines of its humid burrows (Buffenstein and Yahav, 1991). Over the normal range of ambient temperatures encountered in their natural milieu, they are able to maintain body temperature and employ endothermic mechanisms to fine-tune body temperature. To accomplish this, it employs nonshivering thermogenesis, using large pads of brown adipose tissue interspersed between muscle (Hislop and Buffenstein, 1994). Thermogenin (uncoupling protein 1 [UCP1]) is the major protein used in this kind of heat generation. The NMR UCP1 harbors amino acid changes at the site regulated by fatty acids and nucleotides , whereas we find that the DMR sequence is typical of other mammals (Figure S3A). Thus, the altered UCP1 is an adaptation of the NMR rather than of mole rats in general, and it is strongly linked to ineffective thermogenesis.
Melatonin is a regulator of circadian rhythm and body temperature (Cagnacci et al., 1992). In rodents, there are two high-affinity receptors for melatonin: melatonin receptor 1a (MTNR1a) and MTNR1b. We found that the NMR is the only known ''natural'' MTNR1a and MTNR1b knockout animal (Figures S3B and S3C). Both the DMR and NMR lost MTNR1b, although the inactivating mutations are located in different positions ( Figure S3B). MTNR1b is also a pseudogene in the distantly related Siberian hamster (Phodopus sungorus) (Prendergast, 2010) and the Syrian hamster (Mesocricetus auratus; GenBank accession number AY145849). However, MTNR1a alone is sufficient to maintain photoperiod and melatonin responses in the Siberian hamster (Prendergast, 2010). Interestingly, MTNR1a is intact in the DMR but inactivated in the NMR ( Figure S3C). The lack of cognate melatonin receptors could contribute to the inability of NMR to adequately respond to fluctuating temperature. Pain Insensitivity C-fibers are small, unmyelinated axons associated with slow pain signaling in response to a range of external stimuli, which can be thermal, mechanical, or chemical. The NMR has fewer Cfibers than other rodents, including the DMR (St John Smith et al., 2012), and the C-fibers of the NMR's skin, eyes, and nose do not produce the pain-relaying neuropeptides substance P (SP) and calcitonin gene-related peptide (CGRP) (Park et al., 2003). It should be noted that there is not a complete loss of expression, as low levels of these neuropeptides can be found in internal organs (Park et al., 2003). It is currently not known how the expression of SP and CGRP is repressed in the sensory neurons of the NMR, but it was shown that NMRs receiving gene therapy with the SP-encoding preprotachykinin gene (TAC1) responded to pain induced by peripheral inflammation (Park et al., 2008). These observations suggest that the SP-encoding gene itself is altered in the NMR. The NMR TAC1 gene harbors an 8 bp deletion in its proximal promoter . The presence of the 8 bp region in the DMR ( Figure S3D) suggests that this region is associated with pain insensitivity.
The calcitonin gene CALCA is responsible for the synthesis of two distinct preprohormones by means of alternative splicing (Rosenfeld et al., 1981). Splicing into exon 4 results in calcitonin (CT), while splicing into exon 5 and the 3 0 untranslated exon 6 encodes the sensory neuropeptide CGRP ( Figure S3E). The splicing of CALCA is under tight endocrine control, and the regulation of CT and CGRP is complex and involves distal and proximal elements in the CALCA promoter, as well as a range of regulatory elements within and flanking exon 4 (van Oers et al., 1994). Our analysis revealed that there are unique deletions in NMR CALCA, including a conserved 6 bp region, in the 3 0 untranslated part of exon 4 ( Figure S3F). Given that splicing factors are known to be composite and context dependent (Wang and Burge, 2008), and that elements within exon 4 of CALCA regulate CT and CGRP isoform switching, the deleted region in exon 4 may disrupt the mutually exclusive tissue-specific splicing of CT exon 4 and CGRP exons 5-6, resulting in the observed lack of CGRP expression in sensory neurons of the NMR. In addition to unique changes in the NMR genes encoding SP and CGRP, genes associated with neurotransmission of pain were under positive selection in the NMR. This included the NMDA receptor NR2B (GRIN2B), the TRP channel TRPC5, and proenkephalin (PENK) ( Table S3). b-actin May Mediate Enhanced Oxidative Stress Resistance in the NMR Actins are highly conserved proteins involved in cell structure, motility, and integrity. The vertebrate actin family contains six genes, of which only the cytoplasmic actins, b-actin (ACTB) and g-actin (ACTG1), are ubiquitously expressed (Herman, 1993). The b-actin protein is highly conserved throughout evolution (Vandekerckhove and Weber, 1978). During oxidative stress, cysteine residues of actins can be oxidized, which is associated with depolymerization and altered regulatory protein interactions (Terman and Kashina, 2013). Increased ROS levels and actin overoxidation are symptomatic of senescence and diseases such as Alzheimer's (Aksenov et al., 2001). We observed that b-actin (ACTB) is under positive selection in the NMR (Table  S3). RNA-seq and synteny analysis ( Figure 4B) confirmed the identity of NMR ACTB. We found that both Cys272 and Ala230 of b-actin are converted to serine in the NMR (Figures 4C and  4D). Cys272 is highly redox sensitive and may serve as a ''redox sensor'' (Lassing et al., 2007). These observations suggest that NMR b-actin is more resistant to oxidation and may contribute to the longevity of the NMR, which can live at least 10 years longer than the DMR despite its exposure to high ROS levels and lower body mass (Lewis et al., 2013). The potential involvement of ACTB Cys272 in senescence and disease can now be Cell Reports 8, 1-11, September 11, 2014 ª2014 The Authors 7 evaluated more extensively in the NMR, an animal model that lacks this residue.

28S rRNA Processing in Evolutionary and Geographically Distant Hystricognath Rodents
We discovered that NMR rRNA did not display the typical banding pattern, i.e., 28S at 4.4 kb and 18S at 1.8 kb, during denaturing gel electrophoresis ( Figures 5A and 5B). In contrast, the DMR had the standard pattern ( Figure 5B). The unusual NMR pattern occurred in every tissue tested (ovary, kidney, liver, and brain), from separate animals and at any age tested (from 1 to 23 years old). A similar phenomenon, in which 28S rRNA is split into two subunits held together as a single 28S rRNA molecule by hydrogen bonding under native conditions, has been described in insects and plants (Winnebeck et al., 2010). The only vertebrates reported to produce shorter 28S rRNA are South America's tuco-tucos (Ctenomys) and the degu (Octodontomys gliroides) (Melen et al., 1999). We found that the ''break'' region is in the D6 domain of 28S. This NMR region corresponds to a cryptic GC-and simple repeat-rich intron in the Talas tuco-tuco (Ctenomys talarum) (Melen et al., 1999), and there is also a high degree of sequence conservation between these species ( Figure 5C). In tuco-tuco, an unknown site within this cryptic intron results in ''breakage'' of 28S rRNA molecules (Melen et al., 1999). The cryptic intron in the NMR may explain the banding pattern observed ( Figure 5D). Two other hystricognaths, the South American guinea pig and the African DMR, do not harbor the cryptic intron ( Figure S4). The data suggest that the cryptic intron and the resulting ''broken'' 28S rRNA were present in a common ancestor prior to the cross-Atlantic migration of small African hystricognath rodents to South America 41 Mya (Antoine et al., 2012). This proposition is consistent with the predicted ancestor of the NMR, DMR, and guinea pig ( Figure 1C). Following submission of our manuscript, Azpurua et al. (2013) also found the unusual 28S processing in NMR tissues and proposed that the unique NMR 28S may result in improved protein synthesis fidelity and a more stable proteome. The NMR thus holds promise as a model organism for investigating the mechanism and function of ''hidden breaks'' in rRNAs in a laboratory setting.

Conclusions
We performed genome sequencing and de novo assembly of two related subterranean rodents, the DMR and NMR. The transcriptomes of five subterranean rodents were also sequenced. African mole rats share a unique ecology and physiology. These animals are also remarkably long-lived for their size and are characterized by similar traits of cancer resistance, maintenance of neuronal integrity, altered insulin structure, and elevated brain globin. However, in many other traits, the use of the DMR genome pinpointed features responsible for the truly unusual characteristics of the NMR, including unusual thermogenesis, an aberrant melatonin system, pain insensitivity, and processing of rRNA. The genomes and transcriptomes can be further mined to provide insights into the fascinating biology of these animals.

EXPERIMENTAL PROCEDURES
See Supplemental Experimental Procedures for additional protocols.

Animals
A breeding colony of DMRs (Fukomys damarensis) was housed at the University of Illinois at Chicago. The DMR was known as Cryptomys damarensis prior to a recent subclassification into a new genus, Fukomys (Kock et al., 2006). The animals were sacrificed and DNA and RNA were isolated for subsequent sequencing. The genome sequenced was that of a 5-year-old male DMR. Liver and brain transcriptomes were obtained by sequencing individuals from the same colony. Animal experiments were approved by the University of Illinois at Chicago Institutional Animal Care and Use Committee. Species range maps were obtained and adapted from the World Wildlife Fund's WildFinder database (http://www.worldwildlife.org/pages/wildfinder).

DMR Genome Sequencing and Assembly
We employed a whole-genome shotgun strategy and next-generation sequencing technologies, using the Illumina HiSeq 2000 as the platform, to sequence the genome of a captive male DMR. We constructed 16 pairedend (PE) libraries with insert sizes of 250 bp, 500 bp, 800 bp, 2 kbp, 5 kbp, 10 kbp, and 20 kbp. In total, 229 Gbp (or 763) high-quality data, including 151 Gbp (or 503) short insert size reads, were generated (Table 1). The genome was de novo assembled by SOAPdenovo . Then, 151 Gpb (or 503) data from short-insert-size libraries (250-800 bp) were split into 63-mers and contigs, with unambiguous connections in de Bruijn graphs retained. All reads were aligned onto contigs for scaffold building using PE information. We used k-mer analysis  to estimate the genome size of the DMR. In this study, K was 17, K_num was 61,533,145,821, and K_depth was 22.5. Therefore, the DMR genome size was estimated to be 2.73 Gbp ( Figure S1A).

Assembly of the NMR Genome
To develop an improved assembly of the NMR genome, we used lastz (Harris, 2007), with the parameter ''M=60 Y=9400 T=2 --format=axt'', to align previously generated genome sequences  and additional sequence data to a recently released genome assembly from the Broad Institute (GenBank accession number AHKG00000000). ChainNet (Kent et al., 2003) was used to combine traditional alignments into larger structures. After the primary alignment was obtained, PE reads with insert sizes from 2 to 20 kb were mapped to the ''newly formed'' genome. A new NMR assembly with a scaffold N50 of 21.3 Mb was generated (Table 1).

Whole-Genome Heterozygosity Analysis
We aligned all high-quality, short-insert-size reads to the genome assembly using BWA ). Since the alignment results were stored in BAM/SAM format, we selected SAMtools, which is based on the Bayesian model, for variation analysis . After sorting alignments by the leftmost coordinates and removing potential PCR duplicates, we used SAMtools mpileup to call SNPs and short InDels. We rejected SNPs and InDels within reads with a depth that was either much lower or much higher than expected, since a large copy-number variation might lead to miscalling of SNPs. The sequencing depth ranged from 4 to 100 and the upper limit was approximately triple the sequencing depth. SNP miscalling due to alignment around short InDels and low-quality sequences was removed. We applied samtools.pl var-Filter, which can be found in the SAMtools package, as the filter tool with parameters -Q 20 -q 20 -d 4 -D 100 -S 20 -i 20 -N 5 -l 5 -W 5 -N 1.

Repeat Annotation
RepeatProteinMask and RepeatMasker (Tarailo-Graovac and Chen, 2009) were used to identify and classify transposable elements by aligning the DMR genome sequences against a library of known repeats, Repbase, with default parameters. The repeats obtained were combined together to form a list of nonredundant repeats of DMR. The same approach was used to identify repeats in related mammals, including the NMR.

ACCESSION NUMBERS
The DMR whole-genome shotgun project has been deposited in DDBJ/EMBL/ GenBank under accession code AYUG00000000. The version described in this paper is the first version, AYUG01000000. All short-read data have been deposited in the Short Read Archive under accession code SRA099445. Raw sequencing data of the transcriptome have been deposited in the Gene Expression Omnibus under accession code GSE50726. The red curve refers to the k-mer frequency distribution, and the green curve refers to the cumulative distribution of k-mer frequency. More than 25% high depth kmer(>255) may due to repeats in the genome. An unexpected peak at depth of 50 may result from segmental duplication or other repetitive elements. (B) Sequencing depth distribution of the DMR genome. High quality short insert size reads were mapped to the associated genome with an average depth of 34, and approximately 91.64% of the genome was covered by more than 10 reads. The red curve denotes the proportion of the genome in a given sequencing depth, and the blue curve shows the cumulative coverage of the genome.

DMR and NMR genome features and evolution
Prediction of protein-coding genes To predict genes in the DMR genome, we used both homology-based and de novo methods, and in addition, RNA-seq data were incorporated. For the homology-based prediction, human, mouse and rat proteins were downloaded from Ensembl (release 64) and mapped onto the genome using tBLASTn (Kent, 2002). Then, homologous genome sequences were aligned against the matching proteins using GeneWise (Birney et al., 2004) to define gene models. For de novo prediction, Augustus (Stanke, 2003) and GENSCAN (Salamov and Solovyev, 2000) were employed to predict coding genes, using appropriate parameters. RNA-seq data were mapped to genome using TopHat (Trapnell et al., 2009), and transcriptome-based gene structures were obtained by cufflinks (http://cufflinks.cbcb.umd.edu). Finally, homology-based, de novo derived and transcript gene sets were merged to form a comprehensive and nonredundant reference gene set using GLEAN (http://sourceforge.net/projects/gleangene), removing all genes with sequences less than 50 amino acid as well as those that only had the de novo support. We obtained a reference gene set that contained 22,179 DMR genes, and the parameter of DMR genes is comparable to other published mammals, including the related naked mole rat   (Table 1).
Functional annotation of genes Gene functions were assigned according to the best match of the alignments using BLASTp against SWISS-PROT databases (Bairoch and Apweiler, 1997). Gene Ontology IDs for each gene were obtained from the corresponding InterPro entry (Ashburner et al., 2000;Mulder and Apweiler, 2007). All genes were aligned against KEGG proteins, and the pathway in which the gene might be involved was derived from the matching genes in KEGG (Kanehisa and Goto, 2000).

Phylogenetic analysis
A gene family is a group of similar genes descended from a single gene in the last common ancestor of the targeted species. We used the TreeFam methodology (Li et al., 2006) to define gene families in 10 mammalian genomes (human, rhesus macaque, rabbit, mouse, rat, Chinese hamster, dog, guinea pig, DMR and NMR). We carried out the same pipeline and parameters that we used in our previously published work . In total, we obtained 19,839 gene families of which 6,133 are single-copy orthologous families. CDS sequences from each singlecopy family were aligned guided by MUSCLE (Edgar, 2004) alignments of protein sequences and concatenated to one super gene for each species. Next, RAxML (Stamatakis et al., 2005) was applied to build phylogenetic trees under JTT+gamma model. 1,000 bootstrap replicates were employed to assess the branch reliability in RAxML. Divergence time was estimated by PAML MCMCTree (Yang and Rannala, 2006) using 4d site sequences of 5,838 single copy genes ( Figure S1D).

Gene family expansion and contraction
To examine expansion and contraction of gene families in the DMR, we inferred the rate and direction of change of gene family size for DMR and a group of other mammals (human, rhesus, dog, NMR, rabbit, Chinese hamster, guinea pig, mouse and rat) using CAFÉ (De Bie et al., 2006), which is based on a stochastic birth-death model.

DMR gained and lost genes
To determine the orthologous relationship between DMR and human proteins, sequences of the human protein dataset were downloaded from Ensembl (release 64). The longest transcript was chosen to represent each gene with alternative splicing variants. We next subjected human and DMR proteins to BLASTp analysis with a similarity cutoff threshold E-value=1e-5. With the human protein set as a reference, we found the best hit for each DMR protein, with the criteria that more than 30% of the aligned sequence showed an identity above 30%. Reciprocal best-match pairs were defined as orthologs. Then, gene order information was used to filter out false positive orthologs caused by incorrect draft genome assembly and annotation. Orthologs not in gene synteny blocks were removed from further analysis. For example, in the case of 3 continuous genes in the human genome A, B and C, even if all three orthologs could be identified in both humans and DMR based on the cutoff threshold described above, but the B gene in the DMR genome was not located between A and C genes, it was filtered out, as it could be located in another scaffold or another place within the same scaffold. Using this method, we identified gene synteny relationships for human and DMR lineages.
Orthology information was obtained as described above. Since it showed synteny information at the protein level, it could be used to analyze gene-gain and -loss between human and DMR lineages. In the protein synteny blocks, if a human protein had no DMR ortholog, and excluding false positive predictions that could be caused by annotation or genome assembly (gap ≥ 5%), a protein could be defined as either being lost in the DMR lineage or gained in the human lineage. Using DMR as a reference to generate the orthology relationship, we applied this procedure to identify genes gained in the DMR lineage compared to the human lineage.

DMR pseudogenes
To detect homozygous pseudogenes in the DMR genome in silico, we first aligned all human genes (protein sequences downloaded from Ensembl (release 64) onto the DMR genome using BLASTp (with parameters -F F -e 1e-5). SOLAR (Sorting Out Local Alignment Results) ( Yu et al., 2006) was used to conjoin the fragmental alignments for each gene. Best hit regions of each gene with 5 kb flanking sequence were cut down and re-aligned with their corresponding human orthologous protein sequences using GeneWise (with the parameters "-genesf -tfor -quiet"), which helped define the detailed exon-intron structure of each gene (Birney et al., 2004). Genes with frameshifts and premature stop codons, as reported by GeneWise, were considered candidate pseudogenes. We further carried out a series of filtering processes: (i) To avoid the frameshifts and premature stop codons incorrectly reported due to flaws in the GeneWise algorithm, we also aligned all human proteins to their corresponding loci in the human genome using GeneWise. Genes with frameshifts and premature stop codons in the human-to-human alignment were filtered out; (ii) Using the results of the human-to-human alignment from GeneWise, candidate pseudogenes with obvious splicing errors near their frameshifts and premature stop codons were filtered out; (iii) Candidate pseudogenes with a low number of reads covering their frameshift and premature stop codon sites were considered assembly errors. In addition, cases with a considerable number of reads resulting from different genotypes at these sites were treated as heterozygous. Cases of assembly error and heterozygosis were filtered out.

Assessment for visual perception pseudogenes
To identify visual perception genes lost or pseudogenised in the DMR genome, we first employed GeneWise (Birney et al., 2004). To examine the visual perception pseudogenes, we estimated the rate ratio (ω) of non-synonymous to synonymous substitutions (Yang, 2007). The coding sequences of DMR visual perception genes were aligned with their human, mouse, rat and dog homologs using MUSCLE (Edgar, 2004). We used two branch models: Model H0, which considers all branches with the same ω, was used to compute the average ω; Model H1, which examined the DMR and other branches with different ω, was used to compute the ω of the DMR branch (ω2) and other branches (ω1). Then, the likelihood ratio test (LRT) was used to compute the p-value that rejected the model H0. If the p-value was less than 0.05, we could reject the model H0 and accept model H1.
Among the 13 identified visual perception genes, two were absent in the rat (RP1L1 and GRK7), but other genes were present in all examined mammals (human, mouse, rat and dog). Visual genes inactivated or lost in the DMR are indicated in Table S2).
Identification of proteins with unique amino acid changes in African mole rats To associate predicted DMR and NMR peptide sequences with human RefSeq IDs, as presented in UCSC multiway alignments, NCBI tBLASTn v2.2.26+ of the BLAST+ suite (Sayers et al., 2012) with an E-value cut-off set at 1e-5 was employed. The best match (>50% overall amino acid sequence identity along the entire sequence and spanning >75% of the length of the query sequence) was used to annotate the sequences. African mole rat proteins were aligned to orthologs from 38 vertebrate species using ClustalW v2.1 (Larkin et al., 2007). In addition to the Damaraland mole rat (Fukomys damarensis) and the and the naked mole rat (Heterocephalus glaber), the following organisms obtained via the UCSC multiway track (Miller et al., 2007)  In-house Perl scripts were used to parse the ClustalW output and identify unique amino acids. The Perl script scanned orthologous proteins for sites where types/groups of residues are shared by African mole rats only. The script groups residues into four groups: acidic (ED), basic (KHR), cysteine (C) and "other" (STYNQGAVLIFPMW). For example, it identifies cases where the NMR and DMR harbor E or D, whilst the other organisms contain basic, cysteine, or "other" residues at that particular site. Candidate proteins were manually examined for alignment quality and conservation of the region with amino acid changes.
The false positive rate of the detection of unique amino acids was estimated as follows: the error rate for SOAPdenovo assembly is assumed to be 1 nucleotide per 81,025 base pairs . Since only coding regions were used for amino acid conservation analysis, and the total predicted size of coding sequences (CDS) of the DMR and NMR is approximately 33 Mb, then 418 nucleotides could be considered as candidates for false positives. Potential tBLASTn misalignments are of minor importance since the parameters used were quite strict (E-value 1e-5). The unique amino acid method calls for a particular amino acid to be conserved among the tested vertebrate genomes, but differ in the genome-in-question (e.g. the DMR). In a test dataset, 3,287 amino acids were conserved among 36 genomes, out of 1,018,988 tested cases; therefore, only 0.32% of all sequences fit our search criteria. Taken together, with 418 candidates, one would expect a false positive rate of 1.33 per analysis; or in other words, one false positive per 315 unique amino acid residue candidates.
To expand on results pertaining to specific genes, we acquired genomic or transcriptome data of two additional African and two South-American hystricognath rodents: we generated brain RNA-seq data from the African mole rats Ansell's mole rat (Fukomys anselli, FA) and Mashona mole rat (Fukomys darlingi, FD). For comparison, we obtained the recently released genome sequence of the semisubterranean degu (Octodon degus) (GenBank accession code AJSA00000000.1), as well as brain transcriptome data from the subterranean coruro (Spalacopus cyanus) of South-America. We acknowledge the efforts of the Broad Institute in assembling the genome sequence of the degu.
Orthologous gene set for evolutionary analysis using PAML The orthologous sets of the 10 species from the phylogenetic tree were collected. For genes with alternative splice variants, the longest transcript was selected to represent the gene. Orthologs were identified using the method described in our previous study . MUSCLE (Edgar, 2004) was used for multi-protein sequence alignments of orthologs, using the human proteins as the reference. A series of filtering steps were performed on the alignments to assess alignment quality and conservation of exon-intron structure. Briefly, an alignment with more than 20% gaps or less than 50% identity between each protein and its human ortholog was identified as false positive and discarded. Next, the alignment was scanned exon-by-exon, and genes with more than 10% gaps or less than 60% identity, were manually checked against the genome sequences to validate whether these polymorphisms were real or caused by annotation errors. After these filtering processes, high quality alignments of protein sequences were retained and their alignments of associated codon sequences were used for further analysis. In total 9,367 1:1 ortholog alignments were obtained.

Identification of gene categories under accelerated evolution
The orthologous sets of the 10 mammals from the phylogenetic tree were used to identify proteins in each lineage that show accelerated evolution using a branch model implemented in the PAML (Phylogenetic Analysis by Maximum Likelihood) program. We aligned GO categories containing no fewer than 10 genes in the 10-species ortholog data set. PAML's free-ratio model (Yang, 2007) was used to calculate different ω ratios (also known as Ka/Ks) across every branch in the phylogenetic tree. To investigate whether fast-evolving genes in the NMR or DMR lineage were enriched for specific biological processes, a binomial test (P ≤ 0.05) was used to identify GO categories with putatively accelerated non-synonymous divergence (increased dN/dS ratios) in either the DMR lineage or the NMR lineage.
Identification of genes under positive selection To detect genes under positive selection in either the NMR or the DMR lineage, a series of evolutionary models were tested using PAML's branch model (Zhao et al., 2010). Briefly, the average ω across the tree (ω0), ω of the branch tested (ω2) and ω of all other branches (ω1) were determined with the following parameter settings: Codonfreq=2, kappa=2.5, initial omega=0.2. A chi-square test were used to determine if ω2 was significantly higher than ω1 and ω0, which implies that these genes may be fast-changed or fast-evolved in the appointed branch. After obtaining potential positively selected genes (ω2 > ω1 and p value ≤ 0.05 adjusted by the FDR method for multiple testing) (Benjamini and Yekutieli, 2001), we performed a final manual check of the alignment to confirm our results.

Expression analysis
Survey of gene expression in the brain To identify genes that could play a role in the adaptation to an underground environment, we compared the brain transcriptomes of five subterranean and three surface-dwelling ('aboveground') hystricognath rodents. Total RNA was isolated from brain tissue (frontal lobe region) of Fukomys damarensis (DMR, n=2), Fukomys anselli (FA, n=1), Fukomys darlingi (FD, n=1), and Spalacopus cyanus (coruro, n=1). All animals were from captive colonies and the experiments were approved by the respective Animal Care and Use Committees (the University of Illinois at Chicago in the case of DMR; the University of South Bohemia in the case of FA, FD and coruro).
RNA sequencing libraries were constructed using the Illumina mRNA-Seq Prep Kit as per the manufacturer's instructions. Briefly, oligo(dT) magnetic beads were used to purify mRNA molecules. Next, mRNA was fragmented and randomly primed during the first strand synthesis by reverse transcription. Double-stranded cDNA fragments were then obtained by second-strand synthesis with DNA polymerase I, and the double stranded cDNA was subjected to end-repair by Klenow and T4 DNA polymerases, and finally A-tailed by Klenow lacking exonuclease activity. Ligation to Illumina Paired-End Sequencing adapters, size selection by gel electrophoresis and then PCR amplification completed the library preparation procedure. 200 bp pairedend libraries were sequenced using Illumina HiSeq 2000 (90 bp at each end).
Because no reference genomes are available for FA, FD and the coruro, the brain transcriptomes were assembled de novo. We also assembled paired-end reads from published RNA-seq data from the naked mole rat Heterocephalus glaber (NMR, n=2)  (NCBI GEO accession no. GSE30337) and the South-American hystricognath rodents the domestic guinea pig Cavia porcellus (n=3) and the Brazilian guinea pig Cavia aperea (n=3), as well as a tame line of the brown rat (Rattus norvegicus; n=3) in the rodent family Muridae (Albert et al., 2012) (ArrayExpress accession no. E-MTAB-1249). With the exception of the NMR, all samples examined were males. Since many genes would be expected to change their expression between the surface-dwelling rodents the guinea pig (Rodentia, Hystricognathi) and the rat (Rodentia, Muridae) (Konopka et al., 2012), the present study included the rat as an outgroup in order to further narrow down the candidate driver genes for adaptation to a subterranean environment.
Reads were preprocessed to remove adapters and overrepresented sequences identified by FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc). The software package Trimmomatic (Lohse et al., 2012) was used to trim Illumina reads falling below the established quality threshold (default settings) as well as to trim adapter sequences and remove unpaired reads. Paired reads were concatenated, keeping only reads with ≥36 bp length (after merging and adapter trimming). Next, transcriptome assembly was completed using the short-read de novo assembler Trinity (Grabherr et al., 2011). Abundance estimates were obtained by running RSEM (RNAseq by Expectation Maximization) separately for the left and right reads from each paired-end run (sample) (Haas et al., 2013;Li and Dewey, 2011). A database of mouse UniRef90 protein sequences (Wu et al., 2006) was interrogated using transcripts predicted by Trinity as a query (BLASTx E-value cut-off 1e-20) (Grabherr et al., 2011). Next, The RSEM output files (RSEM.genes.results) were parsed using a custom Perl script, retaining Trinity transcripts that aligned across at least 80% of the corresponding mouse protein sequence (Grabherr et al., 2011). We did not exclude any genes from the annotations during quantification (e.g. paralogs were kept), and did not modify the gene annotations or attempt to identify new genes using our data.
Differentially expressed genes were detected using the method described by Chen and colleagues (Chen et al., 2010), which is based on the Poisson distribution (Audic and Claverie, 1997) and normalization for differences in the RNA output size and sequencing depth between samples, as well as accounting for different gene lengths. Genes with at least a two-fold difference in expression between the subterranean rodents (DMR, FD, FD, NMR and coruro) compared to the mouse and guinea pigs, with a false discovery rate (FDR) ≤ 0.05, were defined as differentially expressed genes. The resulting gene set was manually curated to remove likely false positive calls (e.g., if FA has a very low RPKM value compared the other subterranean rodents and is similar to guinea pigs and the rat). While we appreciate that, unlike techniques such as in situ hybridization and laser microdissection-coupled RNA-seq, transcriptome analysis of a tissue pieces from a complex organ such as the brain may not correspond to a uniform cell population, we believe that this work provides a foundation for the study of genes that that could play a role in the adaptations to a subterranean environment.
Survey of globin gene expression response to short-term hypoxia DMR, NMR and rat (n=3) were exposed to short-term hypoxia (8 hours) at 8% O 2, the estimated oxygen condition in NMR tunnels (Bennet and Faulkes, 2000). Samples were sequenced, processed and analyzed exactly as detailed above. HIF3A, which displays rapid elevated systemic mRNA expression upon short-term hypoxia in rats (Heidbrederet al., 2003), was used to confirm a hypoxia response.
Survey of gene expression in the liver Total RNA was isolated from the liver from 3-to 7-year-old female and male DMRs. Transcriptome sequencing was performed as outlined above. Briefly, in addition to the DMR, liver RNA-seq data generated from three additional species were obtained: NRM  (NCBI GEO accession no. GSE30337), and the mouse and rat (Merkin et al., 2012) (NCBI GEO accession no. GSE41637). The RNA-seq data analyzed comprised at least three individuals from each species. Transcriptome reads were mapped to reference genomes using TopHat (Trapnell et al., 2009), and mapped reads were analyzed using in-house Perl scripts. We identified orthologs and obtained expression data using gene annotations spanning coding regions. Differentially expressed genes were detected as outlined above. Genes with at least a two-fold difference in expression between the NMR and DMR compared to the mouse and rat, with a false discovery rate (FDR) ≤ 0.05, were defined as differentially expressed genes.
Gene enrichment analysis Gene Ontology (GO) term analyses were performed using DAVID (Database for Annotation, Visualization and Integrated Discovery) (Huang da et al., 2009). Briefly, to test for enrichment we compared genes that were significantly differentially expressed in the subterranean rodents against DAVID's GO FAT database. The DAVID functional annotation tool categorizes GO terms and calculates an "enrichment score' or EASE score (a modified Fisher's exact test-derived p-value). Categories with smaller p-values (P < 0.01) (Kosiol et al., 2008;Qiu et al., 2012) and larger fold-enrichments (≥2.0) were considered interesting and most likely to convey biological meaning (Huang da et al., 2009).
Western blot analysis for hemoglobin α in rodent brain tissue NMR, mouse, rat and guinea pig brains (frontal lobes) were used for preparation of protein lysate. Tissues were rinsed in cold PBS to minimize blood contamination, homogenized in RIPA buffer (Abcam) with protease inhibitor (Sigma), centrifuged 10 min at 150,000 rpm and total protein concentration in soluble fraction was determined by Coomassie Protein Assay Reagent (Thermo Scientific). 50 µg of total protein from each tissue samples was resolved by SDS-PAGE on 10% Bis-Tris gel in non-reducing conditions and transferred onto a PVDF membrane (Invitrogen). Goat polyclonal antibody against mouse hemoglobin α-chain (sc-31333, Santa-Cruz) was used as a primary antibody in dilution 1:500 and donkey anti-goat IgG-HRP (horseradish peroxidase) (sc-2020, Santa-Cruz) was used as a secondary in dilution 1:1000. Monoclonal anti-β-actin antibody (A2228, Sigma) was used for loading control. Western blots were developed with Clarity Western ECL blot substrate (Bio-Rad) and visualized with ChemiDoc XRS+ imaging system (Bio-Rad).

Identification of the novel 28S RNA in the NMR
Denaturing agarose gel electrophoresis Total RNA from the NMR, DMR and mouse was isolated with a RNAqueous Midi total RNA isolation kit (Invitrogen) using 50-100 mg of frozen tissue. The resulting RNA samples (1 µg of total RNA) were electrophoresed on 1% denaturing agarose gels and stained with ethidium bromide. RNA Millennium Marker (Invitrogen) was used as a molecular weight marker.
Characterization of the 5` end of the 28S NMR fragment Total RNA from the NMR liver was extracted with TRIzol reagent (Invitrogen) according to the manufacturer`s protocol. A 5` RACE RNA adapter (5`-GCUGAUGGCGAUGAAUGAACACUGCGUUUGCUGGCUUUGAUGAAA-3`) was ligated with T4 RNA ligase 1 (NEB) in the following reaction mixture, which was incubated for 2 hours at 37 °C: 300 ng total RNA, 50 pmol 5` RACE adapter 1 µl PEG 8000, 1 µl buffer, 1 µl T4 RNA ligase 1 enzyme, and water to 10 µl. The reaction product was purified by phenol-chloroform method and precipitated by ethanol, and the resulting RNA was used for PCR using AccuPrime polymerase (Invitrogen) with primers 5RACEouter (5`-GCTGATGGCGATGAATGAACACTG-3`) and D6R (5`-GACTGACCCATGTTCAACTGCTGT-3`). Cycling conditions included 25 cycles at 60° for 30 sec, with elongation at 72° over 20 sec. PCR products were purified using a QIAGEN purification kit, subcloned into pGEM-T Easy (Promega) and transformed into DH5α E.coli strain (Invitrogen). Colonies were grown in LB media, and the plasmids purified with a QIAGEN miniprep kit and sequenced by capillary electrophoresis.
Characterization of the 3` end of the 28S NMR RNA fragment 300 ng of total RNA from the NMR liver was treated with polyA polymerase (NEB) with 1 mM ATP to attach a polyA tail to ribosomal RNA fragments. After ethanol precipitation, reverse transcription and second strand synthesis was set up using Invitrogen`s SuperScript Double-Stranded cDNA Synthesis kit. DNA was purified by phenol-chloroform and precipitated with ethanol. DNA ends were blunted with Klenow Polymerase (Fermentas) according to the manufacturer`s suggestions. After blunting, DNA fragments were phosphorylated by T4 kinase to enable subcloning into blunted pGEM-T Easy vector (Promega). Plasmids were prepared for sequencing as described above.