Elsevier

Gene

Volume 643, 15 February 2018, Pages 124-132

Research paper
Genome-wide mining of perfect microsatellites and tetranucleotide orthologous microsatellites estimates in six primate species

https://doi.org/10.1016/j.gene.2017.12.008

Highlights

  • The distribution characteristics and composition patterns of perfect microsatellites in six primate species were compared for the first time.

  • Correlations among microsatellite parameters (SSR number, relative abundance, relative density and GC content) were analyzed using biometrical methods.

  • We reconstructed the phylogenetic tree of six primate species and the tree shrew based on 13 mitochondrial protein-coding genes, and analyzed the evolutionary patterns of single-copy orthologous tetranucleotide microsatellites based on divergence times.

Abstract

Advances in genome sequencing and in silico mining tools have provided new opportunities for comparative primate genomics of microsatellites. The number of SSRs (simple sequence repeats) was not correlated with genome size (Pearson, r = 0.310, p = 0.550), but was positively correlated with the total length of SSRs (Pearson, r = 0.992, p = 0.00). A total of 224,289 tetranucleotide orthologous microsatellite families and 367 single-copy orthologous SSR loci were found in six primate species by homologous alignment. The internal mutation types of single-copy orthologous SSR loci included copy number variation, point mutation, and chromosomal translocation. The accumulated repeat number and average length of tetranucleotide orthologous microsatellites were greater in Rhinopithecus roxellana, Papio anubis and Macaca mulatta than in Homo sapiens and Pan troglodytes, which shows that tetranucleotide orthologous SSR loci had more repeats and greater average length on branches with earlier divergence times. One exception may be Microcebus murinus, a primitive primate from Madagascar with a very small body size. Our results indicate that single-copy tetranucleotide orthologous SSR sequences accumulated individual mutations more slowly through time in H. sapiens and P. troglodytes than in R. roxellana, P. anubis and M. mulatta. However, such divergence would not arise uniformly in all branches of the primate tree. Comparison of genome sequence assemblies would offer valuable insights into the similarities and differences among microsatellites in human and nonhuman primate species, and into the evolutionary processes acting on them.

Introduction

Microsatellites, also called simple sequence repeats (SSRs) or short tandem repeats (STRs), are DNA or RNA sequences consisting of tandemly repeated units of 1–6 nucleotides, found in prokaryotic and eukaryotic genomes (Tautz, 1993, Field and Wills, 1996). SSRs have been shown to play an active role in chromatin organization, gene expression and regulation, and transcription and protein function (Mrázek et al., 2007, Kashi and King, 2006). Strand slippage and unequal recombination lead to insertions or deletions of one to several repeat units (Deback et al., 2009), and this high level of instability makes SSRs attractive polymorphic molecular markers. Microsatellite markers have a variety of applications, including analyses of genetic diversity, linkage map construction, and quantitative trait locus (QTL) mapping. This is due to their codominant inheritance, reproducibility, multiallelic nature, relative abundance, and good genome coverage (Ellegren, 2004, Varshney et al., 2005).

However, the various protocols for microsatellite isolation, such as traditional magnetic bead enrichment, RAPD (random amplification of polymorphic DNA), primer extension, selective hybridization, FIASCO and EST-SSR, are time-consuming, labor-intensive, or costly (Zane et al., 2002). They are now being replaced by in silico mining of SSR loci from DNA sequence databases (Qi et al., 2015). Advanced technologies for large-scale genome sequencing and the development of in silico microsatellite mining tools (e.g., SciRoKo, Krait, and MSDB) have opened new avenues for genome-wide scans of large numbers of microsatellites (Rogers and Gibbs, 2014, Sharma et al., 2007, Du et al., 2017). The availability of whole genome sequences across a diversity of taxa now makes it possible to explore basic distribution patterns and diversity across entire genomes, and to begin addressing fundamental aspects of microsatellite evolution in a comparative setting (Pannebakker et al., 2010).
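To make the idea of in silico mining concrete, the following is a minimal sketch of a perfect-microsatellite scan over a DNA string using regular expressions. It is not the algorithm of SciRoKo, Krait, or MSDB; the names `MIN_REPEATS`, `SSR`, `is_primitive` and `find_perfect_ssrs` are hypothetical, and the minimum repeat thresholds are illustrative assumptions rather than the defaults of any named tool.

```python
import re
from typing import List, NamedTuple

# Hypothetical minimum repeat counts for motif lengths 1-6 bp.
# Illustrative values only; not the defaults of any named tool.
MIN_REPEATS = {1: 12, 2: 7, 3: 5, 4: 4, 5: 4, 6: 4}


class SSR(NamedTuple):
    motif: str
    repeats: int
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def is_primitive(motif: str) -> bool:
    """Return True if the motif is not a repeat of a shorter unit (e.g. 'ATAT')."""
    n = len(motif)
    return not any(
        n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n)
    )


def find_perfect_ssrs(seq: str) -> List[SSR]:
    """Find uninterrupted tandem repeats of 1-6 bp motifs in a DNA string."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # One motif of length k followed by at least (min_rep - 1) exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs such as 'AAAA', which are really mononucleotide runs.
            if is_primitive(motif):
                repeats = (m.end() - m.start()) // k
                hits.append(SSR(motif, repeats, m.start(), m.end()))
    return sorted(hits, key=lambda s: s.start)
```

For a whole genome, a function like this would be applied per FASTA record, and a production tool would additionally handle ambiguous bases and repeats at record boundaries.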

Genome-wide microsatellite surveys have been carried out in different species, and the results suggest that SSRs are less numerous in protein-coding sequences than in intronic and intergenic regions (Wang et al., 2016, Huang et al., 2016). Interestingly, microsatellites are more abundant in coding regions than in non-coding regions in some prokaryotic and viral genomes because of their higher coding density (Alam et al., 2013). Expansion of microsatellite length may affect gene regulation, transcription, and the protein function of coding sequences (CDS), especially for trinucleotide repeats, which are associated with human diseases, e.g., FRA X loci (CGG repeats), Hdh loci (CAG repeats), AR loci (CAG repeats), DMPK loci (CTG repeats), MJD1 loci (CAG repeats), FRA IIB loci (CGG repeats) and SCA1 loci (CAG repeats) (Xu et al., 2016).

Comparative genomic microsatellite studies mostly focus on perfect microsatellites, and have found that composition characteristics and distribution patterns differ among mammals, invertebrates, and plants (Grover et al., 2007, Sonah et al., 2011). Depending on the presence of interruptions, microsatellites may also be compound (two or more microsatellites adjacent to each other), interrupted, or interrupted compound. Previous studies found that compound microsatellites constituted ~11.1% and ~11.4% of SSRs in the H. sapiens and M. mulatta genomes, respectively. In other eukaryotic genomes, such as D. melanogaster, G. gallus, O. anatinus, R. norvegicus, M. musculus and D. rerio, compound microsatellites make up 3.3% to 25.3% of SSRs (Kofler et al., 2008). Next-generation sequencing (NGS) is a rapid and efficient method of searching for microsatellites in transcriptomes. Transcriptome-derived microsatellite markers can readily be linked to their corresponding genes as putative functional markers, which play an important role in biomedical studies and genetic analysis (Richards et al., 2013, Song et al., 2016).
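The notion of compound microsatellites as adjacent SSRs can be illustrated with a short sketch that clusters SSR intervals lying close together. This is an assumption-laden illustration, not the procedure used in the cited surveys: the function names, the distance threshold `dmax` and its default of 10 bp are hypothetical, and any cluster of two or more SSRs is treated here as compound.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), 0-based, end exclusive


def group_compound(ssrs: List[Interval], dmax: int = 10) -> List[List[Interval]]:
    """Cluster SSRs whose gap to the previous SSR is at most dmax bp.

    dmax is a hypothetical threshold chosen for illustration.
    """
    clusters: List[List[Interval]] = []
    for iv in sorted(ssrs):
        # Gap between this SSR's start and the previous SSR's end.
        if clusters and iv[0] - clusters[-1][-1][1] <= dmax:
            clusters[-1].append(iv)
        else:
            clusters.append([iv])
    return clusters


def compound_fraction(ssrs: List[Interval], dmax: int = 10) -> float:
    """Fraction of SSRs that belong to a cluster of two or more SSRs."""
    if not ssrs:
        return 0.0
    clusters = group_compound(ssrs, dmax)
    in_compound = sum(len(c) for c in clusters if len(c) >= 2)
    return in_compound / len(ssrs)
```

The resulting fraction depends strongly on the chosen `dmax`, so values computed this way are only comparable between genomes when the same threshold is used throughout.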

Microsatellite variation within species can be assessed by allele size and the number of repeats (Forbes et al., 1995). Generally, homologous microsatellite loci can be taken as evidence for directional microsatellite evolution and show differences in evolutionary rate between species. The length of repeats is weakly correlated with SSR number in several species, indicating that the rate of variation may be correlated with repeat size, even though the structure of the repeat may also be important (Ostrander et al., 1993, Pépin et al., 1995). A significant excess of longer human alleles was observed in comparisons with gorillas (Gorilla gorilla), orangutans (Pongo pygmaeus), chimpanzees (Pan troglodytes), and macaques (Macaca mulatta) using trinucleotide and dinucleotide repeats randomly chosen from the primer bank (Rubinsztein et al., 1995). In a constructed data set, tetranucleotide repeats were longer in humans than at their orthologous loci in chimpanzees, although interspecific length differences may also result from biased selection of loci associated with cloning procedures (Webster et al., 2002). Comparisons of microsatellite evolution between species could also be based on reciprocal analyses or on genuinely random selection of loci (Ellegren et al., 1997).
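The kind of interspecific comparison described above can be sketched as a tally over orthologous loci. The example below is illustrative only: the function name, the locus identifiers and the repeat counts are hypothetical, and it does not correct for the ascertainment bias discussed above.

```python
from typing import Dict, Tuple


def compare_orthologs(
    repeats_a: Dict[str, int], repeats_b: Dict[str, int]
) -> Tuple[int, int, int, float]:
    """Compare repeat counts at loci present in both species.

    Returns (loci longer in A, loci longer in B, loci of equal length,
    mean difference A minus B in repeat units).
    """
    shared = sorted(set(repeats_a) & set(repeats_b))
    if not shared:
        raise ValueError("no orthologous loci shared between the two species")
    diffs = [repeats_a[locus] - repeats_b[locus] for locus in shared]
    longer_a = sum(d > 0 for d in diffs)
    longer_b = sum(d < 0 for d in diffs)
    equal = len(diffs) - longer_a - longer_b
    return longer_a, longer_b, equal, sum(diffs) / len(diffs)
```

A reciprocal design would run the same tally twice, once on loci first identified in species A and once on loci first identified in species B, and compare the two results.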

This study examined the distribution of microsatellites and tetranucleotide orthologous SSRs in six primate genomes. Six primate species with completely sequenced genomes (Rhinopithecus roxellana, Papio anubis, Macaca mulatta, Homo sapiens, Pan troglodytes and Microcebus murinus) were used to compare SSR number, relative abundance, relative density and GC content. This may provide better insight into microsatellite distribution across a range of taxa (Qi et al., 2015, Huang et al., 2016). The preliminary estimates and analysis of tetranucleotide orthologous SSR loci provide fundamental data for further analysis of microsatellite evolution in primate species.
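The summary statistics named above can be sketched as follows. The definitions are assumptions, since this excerpt does not spell them out: relative abundance is taken as the number of SSRs per Mb of genome, relative density as the total SSR length in bp per Mb of genome, and GC content as the fraction of G and C among unambiguous bases. The function names and dictionary keys are hypothetical.

```python
from typing import Dict, List


def ssr_summary(genome_size_bp: int, ssr_lengths_bp: List[int]) -> Dict[str, float]:
    """Summarize SSR counts and lengths relative to genome size."""
    if genome_size_bp <= 0:
        raise ValueError("genome_size_bp must be positive")
    genome_mb = genome_size_bp / 1_000_000
    count = len(ssr_lengths_bp)
    total_len = sum(ssr_lengths_bp)
    return {
        "count": float(count),
        # Assumed definition: number of SSR loci per Mb of genome.
        "relative_abundance_per_mb": count / genome_mb,
        # Assumed definition: total SSR length (bp) per Mb of genome.
        "relative_density_bp_per_mb": total_len / genome_mb,
        # Percentage of the genome covered by SSRs.
        "genome_coverage_pct": 100.0 * total_len / genome_size_bp,
    }


def gc_content(seq: str) -> float:
    """Fraction of G and C among A, C, G, T bases; other symbols are ignored."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt
```

Under these definitions, the average spacing between loci in kb is simply 1000 divided by the relative abundance per Mb.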

Sequences sources

The available genome sequences of six primate species, R. roxellana (2n = 44), P. anubis (2n = 42), M. mulatta (2n = 42), H. sapiens (2n = 46), P. troglodytes (2n = 48) and M. murinus, were downloaded from NCBI and saved in FASTA format. The genome accession numbers are as follows: GCA_000769185.1, GCA_000264685.1, GCA_000230795.1, GCA_000306695.2, GCA_000090855.1 and GCA_000165445.3. The accession numbers of the mitochondrial genomes are KM504390.1, KC757406.1, KJ567051.1, KX675270.1,

Overview of the six primate genomes

We examined the distribution of perfect 1–6 bp microsatellites in six primate species using the default parameters of MSDB, which eliminates errors resulting from different search criteria. The total number of microsatellites was highly variable, ranging from 594,127 to 1,368,292, and total SSR length covered 0.4% to 0.98% of the six primate genomes. The maximum number and highest relative abundance of SSRs were found in R. roxellana (471.91 No./Mb), followed by P. anubis (440.71

Diversity of SSRs distribution in six primate species genomes

We found that, on average, one SSR locus occurred every 2.11 kb to 4.88 kb in the six primate genomes. In particular, SSR loci in M. murinus were much sparser, even though its genome size is similar to that of related species. All of the primate species except P. troglodytes showed similar distribution patterns in SSR number and relative abundance: mono- > di- > tetra- > tri- > penta- > hexanucleotide repeats, which was inconsistent with the bovid species (Qi et

Acknowledgments

This work was supported by the National Key Programme of Research and Development, Ministry of Science and Technology (2016YFC0503200) and National Natural Science Foundation of China (No. 31530068). We would like to acknowledge the anonymous reviewers for their valuable comments and suggestions to improve the manuscript. We also thank Dr Castiglioni for the critical reading and English editing.

References (58)

  • C. Deback et al.

    Utilization of microsatellite polymorphism for differentiating herpes simplex virus type 1 strains

    J. Clin. Microbiol.

    (2009)
  • L.M. Du et al.

    MSDB: a user-friendly program for reporting distribution and building databases of microsatellites from genome sequences

    J. Hered.

    (2013)
  • L.M. Du et al.

    Krait: an ultrafast tool for genome-wide survey of microsatellites and primer design

    Bioinformatics

    (2017)
  • H. Ellegren

    Microsatellites: simple sequences with complex evolution

    Nat. Rev. Genet.

    (2004)
  • H. Ellegren et al.

    Microsatellite evolution—a reciprocal study of repeat lengths at homologous loci in cattle and sheep

    Mol. Biol. Evol.

    (1997)
  • D. Field et al.

    Long, polymorphic microsatellites in simple organisms

    Proc. R. Soc. Lond. B Biol. Sci.

    (1996)
  • S.H. Forbes et al.

    Microsatellite evolution in congeneric mammals: domestic and bighorn sheep

    Mol. Biol. Evol.

    (1995)
  • A. Grover et al.

    Biased distribution of microsatellite motifs in the rice genome

    Mol. Gen. Genomics.

    (2007)
  • J. Huang et al.

    Distribution regularities of microsatellites in the Gallus gallus genome

    Sichuan J. Zool.

    (2012)
  • J. Huang et al.

    Genome-wide survey and analysis of microsatellites in giant panda (Ailuropoda melanoleuca), with a focus on the applications of a novel microsatellite marker system

    BMC Genomics

    (2015)
  • J. Huang et al.

    Characteristics of microsatellites in Arborophila rufipectus genome sequences using 454 GS FLX

    Sichuan J. Zool.

    (2015)
  • J.E. Janečka et al.

    Molecular and genomic data identify the closest living relative of primates

    Science

    (2007)
  • M.V. Katti et al.

    Differential distribution of simple sequence repeats in eukaryotic genome sequences

    Mol. Biol. Evol.

    (2001)
  • Y.D. Kelkar et al.

    The genome-wide determinants of human and chimpanzee microsatellite evolution

    Genome Res.

    (2008)
  • R. Kofler et al.

    Survey of microsatellite clustering in eight fully sequenced species sheds light on the origin of compound microsatellites

    BMC Genomics

    (2008)
  • S.P. Kumpatla et al.

    Mining and survey of simple sequence repeats in expressed sequence tags of dicotyledonous species

    Genome Res.

    (2005)
  • L. Li et al.

    OrthoMCL: identification of ortholog groups for eukaryotic genomes

    Genome Res.

    (2003)
  • W.J. Li et al.

    Comparative analysis of microsatellite sequences distribution in the genome of giant panda and polar bear

    Sichuan J. Zool.

    (2014)
  • K. Lindblad-Toh et al.

    A high-resolution map of human evolutionary constraint using 29 mammals

    Nature

    (2011)

    1

    Co-first author.