Original research
The genome-wide landscape of small insertion and deletion mutations in Monopterus albus

https://doi.org/10.1016/j.jgg.2019.02.002

Abstract

Insertion and deletion (indel) mutations, which can trigger single nucleotide substitutions in the flanking regions of genes, may generate abundant raw material for disease defense, reproduction, species survival and evolution. However, the genetic and evolutionary mechanisms of indels remain elusive. We established a comparative genome-transcriptome-alignment approach for large-scale identification of indels in a Monopterus population. Over 2000 indels in 1738 indel genes, including 1–21 bp deletions and 1–15 bp insertions, were detected. Each indel gene had ∼1.1 deletions/insertions and 2–4 alleles in the population. Deletions were markedly more frequent than insertions at both the genome and population levels. Most indels had lengths that were multiples of three, producing in-frame mutations, and occurred mainly in non-domain regions, indicating functional constraint on, or tolerance of, the indels. Indel genes showed higher expression levels than non-indel genes during sex reversal. Sliding-window analysis showed that global expression levels in gonads correlated significantly and positively with indel density in the genome. Moreover, indel genes were evolutionarily conserved and evolved slowly compared with non-indel genes. Notably, the population genetic structure of indels revealed divergent evolution of the Monopterus population, as a bottleneck effect of biogeographic isolation by the Taiwan Strait, China.

Introduction

The swamp eel (Monopterus albus), a freshwater teleost fish, belongs taxonomically to the family Synbranchidae of the order Synbranchiformes. The fish is distributed mainly in tropical and subtropical freshwater areas of Asia, the Indo-Malayan archipelago and northern Australia. It is also present in western Africa, Mexico, and Central and South America (Collins et al., 2002; Joseph, 2006). The swamp eel is not only an economically important species for fish production because of its high nutritional and potential medicinal value, but also an emerging model species in development, genetics and evolution (Cheng et al., 2003). It characteristically undergoes sex reversal from female to male via an intersex stage during its life (Liu, 1944; Bullough, 1947), and it has a small genome (806 Mb) (Zhao et al., 2018). Given the wide geographic distribution and attractive biological features of the species, its speciation process may provide insights into vertebrate evolution. However, the population genetic structure of the species remains unknown.

Genetic variation often arises as species adapt to biogeographically diverse and changeable ecosystems. Such variations are mainly divided into structural variations (SVs) (>100 bp) and simple nucleotide variations (SNVs) (<100 bp) (Gonzaga-Jauregui et al., 2012). SVs include small copy number variants (CNVs) (100 bp–1 kb), large CNVs (>1 kb), and inversions, while SNVs consist of SNPs and small insertions and deletions (indels). SNPs have high polymorphic information content and have mainly been used as a genome-wide tool in GWAS (Lee et al., 2017), QTL mapping of quantitative traits (El-Soda et al., 2014) and selective sweep analysis (Li et al., 2013). Small indels are involved in duplicated gene evolution (Guo et al., 2012), tRNA origination (Zuo et al., 2013), genome structure variations (Tian et al., 2008), cellular lineage and gene expression (Imielinski et al., 2017). All these variations are usually detected by aligning different genomes or transcriptomes to a reference genome. In contrast, a microsatellite, which is a tract of tandemly repeated nucleotide motifs (one to six or up to ten nucleotides, typically repeated 5–50 times) (Toth et al., 2000), can be identified easily from the sequence of a single genome. Although microsatellite expansions occur mainly in noncoding regions, microsatellites have important biological roles, such as expression regulation (Bilgin Sonay et al., 2015). Genes that contain microsatellites in their promoters have higher expression divergence than genes with fixed or no microsatellites in their promoters (Bilgin Sonay et al., 2015). Thus, identification of genetic variations is a fundamental task for understanding genome structure, function and evolution.
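To illustrate why microsatellites can be found from a single genome sequence, the following minimal Python sketch scans one sequence for tandem repeats using a regular expression with a backreference. It is not the method used in this study; the function name `find_microsatellites` and its default thresholds (motifs of 1–6 nucleotides, at least 5 copies) are assumptions chosen to match the definition given above.

```python
import re


def find_microsatellites(seq, min_unit=1, max_unit=6, min_repeats=5):
    """Return (start, motif, copies) for each tandem repeat found in seq.

    Hypothetical helper for illustration only. The thresholds are
    assumptions based on the textual definition of a microsatellite.
    """
    # Lazy quantifier: try the shortest motif first, then require the
    # same motif to follow at least (min_repeats - 1) more times.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        copies = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, copies))
    return hits


# Toy sequence containing six copies of the dinucleotide motif "AC".
example_hits = find_microsatellites("TTG" + "AC" * 6 + "GGT")
```

No alignment or second genome is needed here, which is the contrast the paragraph draws with SNP and indel detection.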

Mutations mostly arise from replication errors, transposition and DNA damage. Transcription is another source of mutation (Datta and Jinks-Robertson, 1995; Aguilera, 2002; Svejstrup, 2002; Hanawalt and Spivak, 2008; Kim and Jinks-Robertson, 2012; Park et al., 2012). During transcription, the non-template strand is exposed as single-stranded DNA (ssDNA), which is prone to mutation (Kim and Jinks-Robertson, 2012). Transcription-associated supercoiling and collisions between replication forks and the transcription machinery may also lead to mutations (Kim and Jinks-Robertson, 2012). Because these mutations were detected mainly on the basis of SNPs, it remains unclear whether transcription has a genome-wide role in indel formation. In addition, indels could probably be generated by unequal crossover during DNA duplication. Nevertheless, the genetic and evolutionary mechanisms of indels remain elusive.

In this study, we applied a comparative genome-transcriptome-alignment approach and obtained thousands of short indels in coding genes in the genome of the swamp eel. We observed more deletions than insertions at both the genome and population levels. Most of the indels occurred in non-domain regions. Indel sizes were mainly multiples of three, which led to in-frame mutations. Compared with non-indel genes, indel genes had higher expression levels and lower evolutionary rates. We further showed that divergent evolution has occurred in the swamp eel populations. These data not only reveal the genome-wide landscape of indels in the genome of the swamp eel, but also provide new genetic markers for further analysis of the population genetic structure and evolution of the species.
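The link between indel size and reading frame can be made concrete with a short Python sketch. It assumes the indel lies within a coding sequence and that alleles are given as reference and alternate strings (a VCF-like convention adopted here only for illustration); the function name `indel_frame_effect` is hypothetical and does not come from the study.

```python
def indel_frame_effect(ref_allele, alt_allele):
    """Classify a coding indel as in-frame or frameshift.

    Illustrative assumption: both alleles are plain nucleotide strings
    and the length difference equals the indel size in base pairs.
    """
    size = len(alt_allele) - len(ref_allele)
    if size == 0:
        raise ValueError("alleles have equal length; not an indel")
    kind = "insertion" if size > 0 else "deletion"
    # A length change that is a multiple of three adds or removes whole
    # codons, so the downstream reading frame is preserved.
    effect = "in-frame" if abs(size) % 3 == 0 else "frameshift"
    return kind, abs(size), effect


# A 3 bp deletion keeps the frame; a 2 bp insertion shifts it.
three_bp_deletion = indel_frame_effect("ACGTA", "AC")
two_bp_insertion = indel_frame_effect("A", "ACG")
```

Under this rule, the reported 1–21 bp deletions and 1–15 bp insertions would be in-frame only at sizes 3, 6, 9 and so on.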

Landscape of indel genes in the Monopterus genome

To mine indel genes involved in developmental processes at a genome-wide level, we developed a comparative genome-transcriptome-alignment approach and analyzed genome-wide indels. Three gonadal transcriptomes were sequenced using a conventional non-strand-specific method and assembled de novo with Trinity (Grabherr et al., 2011; Haas et al., 2013). We finally obtained 74,181 (ovary), 82,497 (ovotestis) and 146,468 (testis) contigs (Table S1). All contigs were aligned to the reference genome (Fig. 1A),

Discussion

Genetic polymorphisms can generate various genotypes/phenotypes and thus benefit disease defense, reproduction, species survival and evolution. However, the genetic and evolutionary mechanisms of genetic polymorphisms are not well understood. In the present study, we have established a comparative genome-transcriptome-alignment approach for large-scale identification of indels in the genome of the swamp eel. We displayed a genome-wide landscape of indel mutations, and characterized features of indel

Samples and DNA extraction

Muscle samples from swamp eel (n = 515) were collected from different geographic locations of China. Samples came from seven subpopulations: Dazhou (35 females, 2 intersexes and 21 males), Chongqing (37 females, 3 intersexes, and 31 males), Wuhan (81 females, 8 intersexes, and 45 males), Hefei (35 females, 19 intersexes, and 48 males), Nanjing (11 females, 4 intersexes, and 30 males), Sanming (28 females, 6 intersexes, and 16 males), and Taiwan (19 females, 24 intersexes, 2 males, and 10

Acknowledgments

This work was supported by the National Natural Science Foundation of China (31571280 and 31771370), National Key Technologies R&D Program and Hubei Province Science and Technology project.

References (56)

  • M. El-Soda et al.

    Genotype × environment interaction QTL mapping in plants: lessons from Arabidopsis

    Trends Plant Sci.

    (2014)
  • D. Weber et al.

    An empirical genetic assessment of the severity of the northern elephant seal population bottleneck

    Curr. Biol.

    (2000)
  • A. Aguilera

    The connection between transcription and genomic instability

    EMBO J.

    (2002)
  • M.J. Benton et al.

    Paleontological evidence to date the tree of life

    Mol. Biol. Evol.

    (2007)
  • T. Bilgin Sonay et al.

    Tandem repeat variation in human and great ape populations and its impact on gene expression divergence

    Genome Res.

    (2015)
  • W.S. Bullough

    Hermaphroditism in the lower vertebrates

    Nature

    (1947)
  • H.H. Cheng et al.

    The rice field eel as a model system for vertebrate sexual development

    Cytogenet. Genome Res.

    (2003)
  • T.M. Collins et al.

    Genetic diversity in a morphologically conservative invasive taxon: multiple introductions of swamp eels to the Southeastern United States

    Conserv. Biol.

    (2002)
  • A. Datta et al.

    Association of increased spontaneous mutation-rates with high-levels of transcription in yeast

    Science

    (1995)
  • C.A. Driscoll et al.

    From wild animals to domestic pets, an evolutionary view of domestication

    Proc. Natl. Acad. Sci. U. S. A.

    (2009)
  • D.A. Earl et al.

    STRUCTURE HARVESTER: a website and program for visualizing STRUCTURE output and implementing the Evanno method

    Conserv. Genet. Resour.

    (2012)
  • G. Evanno et al.

    Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study

    Mol. Ecol.

    (2005)
  • G.Y. Fan et al.

    Detecting a hierarchical genetic population structure via Multi-InDel markers on the X chromosome

    Sci. Rep.

    (2016)
  • R.D. Finn et al.

    The Pfam protein families database: towards a more sustainable future

    Nucleic Acids Res.

    (2016)
  • C. Gonzaga-Jauregui et al.

    Human genome sequencing in health and disease

    Annu. Rev. Med.

    (2012)
  • J.C. Gower

    Some distance properties of latent root and vector methods used in multivariate analysis

    Biometrika

    (1966)
  • M.G. Grabherr et al.

    Full-length transcriptome assembly from RNA-Seq data without a reference genome

    Nat. Biotechnol.

    (2011)
  • B.C. Guo et al.

    Pervasive indels and their evolutionary dynamics after the fish-specific genome duplication

    Mol. Biol. Evol.

    (2012)
  • J. Guo et al.

    MutSbeta promotes trinucleotide repeat expansion by recruiting DNA polymerase beta to nascent (CAG)n or (CTG)n hairpins for error-prone DNA synthesis

    Cell Res.

    (2016)
  • B.J. Haas et al.

    De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis

    Nat. Protoc.

    (2013)
  • P.C. Hanawalt et al.

    Transcription-coupled DNA repair: two decades of progress and surprises

    Nat. Rev. Mol. Cell Biol.

    (2008)
  • Y. He et al.

    Gonadal apoptosis during sex reversal of the rice field eel: implications for an evolutionarily conserved role of the molecular chaperone heat shock protein 10

    J. Exp. Zool. Part B.

    (2010)
  • D.W. Huang et al.

    Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources

    Nat. Protoc.

    (2009)
  • J.P. Huelsenbeck et al.

    MRBAYES: Bayesian inference of phylogenetic trees

    Bioinformatics

    (2001)
  • M. Imielinski et al.

    Insertions and deletions target lineage-defining genes in human cancers

    Cell

    (2017)
  • S.N. Joseph

    Fishes of the World

    (2006)
  • S.T. Kalinowski et al.

    Revising how the computer program CERVUS accommodates genotyping error increases success in paternity assignment

    Mol. Ecol.

    (2010)
  • W.J. Kent

    BLAT - the BLAST-like alignment tool

    Genome Res.

    (2002)