Research paper
Genome-wide mining of perfect microsatellites and tetranucleotide orthologous microsatellites estimates in six primate species
Introduction
Microsatellites, also called simple sequence repeats (SSRs) or short tandem repeats (STRs), are tandemly repeated DNA or RNA sequences with repeat units of 1–6 nucleotides that occur in prokaryotic and eukaryotic genomes (Tautz, 1993, Field and Wills, 1996). SSRs have been reported to play an active role in chromatin organization, gene expression and regulation, transcription, and protein function (Mrázek et al., 2007, Kashi and King, 2006). Strand slippage and unequal recombination cause insertions or deletions of one to several repeat units (Deback et al., 2009), and this high level of instability makes microsatellites attractive polymorphic molecular markers. Microsatellite markers are used in analyses of genetic diversity, linkage map construction, and quantitative trait locus (QTL) mapping, owing to their codominant inheritance, reproducibility, multiallelic nature, relative abundance, and good genome coverage (Ellegren, 2004, Varshney et al., 2005).
However, conventional protocols for microsatellite isolation, such as magnetic bead enrichment, RAPD (random amplification of polymorphic DNA), primer extension, selective hybridization, FIASCO, and EST-SSR, are time-consuming, labor-intensive, or costly (Zane et al., 2002). They are now being replaced by in silico mining of SSR loci from DNA sequence databases (Qi et al., 2015). Advances in large-scale genome sequencing and the development of in silico microsatellite mining tools (e.g., SciRoko, Krait, and MSDB) have opened new avenues for genome-wide scans of large numbers of microsatellites (Rogers and Gibbs, 2014, Sharma et al., 2007, Du et al., 2017). The availability of whole genome sequences across a diversity of taxa now makes it possible to explore basic distribution patterns and diversity across entire genomes, and to begin addressing fundamental aspects of microsatellite evolution in a comparative setting (Pannebakker et al., 2010).
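To illustrate what in silico mining of perfect SSRs involves, the following minimal Python sketch scans a sequence for perfect repeats with motif lengths of 1–6 bp. It is not the algorithm of SciRoko, Krait, or MSDB. The function names and the minimum repeat thresholds (12, 7, 5, 4, 4, 4 for mono- to hexanucleotide motifs) are assumptions chosen for illustration only.

```python
# Minimum repeat count per motif length (1-6 bp). Illustrative assumptions,
# not the documented defaults of any specific tool.
MIN_REPEATS = {1: 12, 2: 7, 3: 5, 4: 4, 5: 4, 6: 4}
BASES = set("ACGT")


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'ATAT')."""
    k = len(motif)
    return all(motif != motif[:d] * (k // d) for d in range(1, k) if k % d == 0)


def find_perfect_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return non-overlapping perfect SSRs as (start, end, motif, repeats).

    Coordinates are 0-based and end-exclusive. The scan is greedy from left
    to right and tries the shortest motif first at each position.
    """
    seq = seq.upper()
    hits = []
    i = 0
    while i < len(seq):
        for k in sorted(min_repeats):
            motif = seq[i:i + k]
            # Skip truncated motifs, ambiguous bases (e.g. N), and motifs
            # that are repeats of a shorter unit.
            if len(motif) < k or not set(motif) <= BASES or not is_primitive(motif):
                continue
            j = i + k
            while seq[j:j + k] == motif:
                j += k
            repeats = (j - i) // k
            if repeats >= min_repeats[k]:
                hits.append((i, j, motif, repeats))
                i = j  # resume scanning after this repeat
                break
        else:
            i += 1  # no SSR starts at this position
    return hits


if __name__ == "__main__":
    # Hypothetical toy sequence containing three perfect SSRs.
    toy = "GGT" + "A" * 12 + "CGT" + "AC" * 7 + "TTG" + "AGAT" * 4 + "C"
    for hit in find_perfect_ssrs(toy):
        print(hit)
```

Because the scan is greedy and non-overlapping, a repeat that begins inside an already reported SSR is not reported separately. Dedicated tools may resolve such cases differently.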
Genome-wide microsatellite surveys have been carried out in different species, and the results suggest that SSRs are less numerous in protein-coding sequences than in intronic and intergenic regions (Wang et al., 2016, Huang et al., 2016). Interestingly, in some prokaryotic and viral genomes microsatellites are more abundant in coding regions than in non-coding regions because of their higher coding density (Alam et al., 2013). Expansion of microsatellite length may affect gene regulation, transcription, and protein function in coding sequences (CDS), especially for trinucleotide repeats, which are associated with human diseases at loci such as FRA X (CGG repeats), Hdh (CAG repeats), AR (CAG repeats), DMPK (CTG repeats), MJD1 (CAG repeats), FRA IIB (CGG repeats), and SCA1 (CAG repeats) (Xu et al., 2016).
Comparative genomic studies of microsatellites mostly focus on perfect microsatellites and have found that composition and distribution patterns differ among mammals, invertebrates, and plants (Grover et al., 2007, Sonah et al., 2011). Depending on the presence of interruptions, microsatellites may also be compound (more than two microsatellites adjacent to each other), interrupted, or interrupted compound. Past studies found that compound microsatellites constituted ~11.1% and ~11.4% of SSRs in the H. sapiens and M. mulatta genomes, respectively. In other eukaryotic genomes, such as D. melanogaster, G. gallus, O. anatinus, R. norvegicus, M. musculus, and D. rerio, compound microsatellites range from 3.3% to 25.3% (Kofler et al., 2008). Next-generation sequencing (NGS) is a rapid and efficient method for identifying microsatellites in transcriptomes. Transcriptome-derived microsatellite markers can easily be assigned to their corresponding genes as putative functional markers, which play an important role in biomedical studies and genetic analysis (Richards et al., 2013, Song et al., 2016).
Microsatellite variation within species can be assessed by allele size and repeat number (Forbes et al., 1995). Generally, comparisons of homologous microsatellite loci have been taken as evidence for directional microsatellite evolution and show differences in evolutionary rate between species. Repeat length is weakly correlated with SSR number in several species, indicating that variation rate may be correlated with repeat size, although the structure of the repeat may also be important (Ostrander et al., 1993, Pépin et al., 1995). A significant excess of longer human alleles was observed in comparisons with gorillas (Gorilla gorilla), orangutans (Pongo pygmaeus), chimpanzees (Pan troglodytes), and macaques (Macaca mulatta) using trinucleotide and dinucleotide repeats randomly chosen from the primer bank (Rubinsztein et al., 1995). Tetranucleotide repeats were also found to be longer in humans than at their orthologous loci in chimpanzees in a constructed data set, although such interspecific length differences may partly result from biased selection of loci associated with cloning procedures (Webster et al., 2002). Comparisons of microsatellite evolution between species can also be based on reciprocal analyses or on genuinely random selection of loci (Ellegren et al., 1997).
This study examined microsatellite distribution and tetranucleotide orthologous SSRs in six primate genomes. Six primate species with sequenced genomes, Rhinopithecus roxellanae, Papio anubis, Macaca mulatta, Homo sapiens, Pan troglodytes, and Microcebus murinus, were compared in terms of SSR number, relative abundance, relative density, and GC content. This may provide better insight into microsatellite distribution across a range of taxa (Qi et al., 2015, Huang et al., 2016). The preliminary estimates and analysis of tetranucleotide orthologous SSR loci provide fundamental data for further analysis of microsatellite evolution in primates.
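The summary statistics named above can be illustrated with a short Python sketch. It assumes the definitions commonly used in SSR surveys, which the text here does not spell out: relative abundance as the number of SSRs per Mb of sequence, relative density as total SSR length (bp) per Mb, and GC content as the G+C fraction of all SSR bases. The function name ssr_summary and the toy input are hypothetical.

```python
def ssr_summary(ssrs, genome_length_bp):
    """Summarize SSRs given as (motif, repeat_count) pairs.

    Assumed definitions (not taken from the source text):
      relative_abundance = SSR count per Mb of analyzed sequence
      relative_density   = total SSR length in bp per Mb
      gc_content         = fraction of SSR bases that are G or C
    """
    mb = genome_length_bp / 1_000_000
    count = 0
    total_len = 0
    gc_bases = 0
    for motif, repeats in ssrs:
        motif = motif.upper()
        count += 1
        total_len += len(motif) * repeats
        gc_bases += (motif.count("G") + motif.count("C")) * repeats
    return {
        "count": count,
        "relative_abundance": count / mb,
        "relative_density": total_len / mb,
        "gc_content": gc_bases / total_len if total_len else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical toy input: three SSRs in a 2 Mb sequence.
    toy = [("A", 12), ("AC", 7), ("AGAT", 4)]
    print(ssr_summary(toy, genome_length_bp=2_000_000))
```

Normalizing by sequence length in this way is what allows genomes of different sizes to be compared on the same scale.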
Section snippets
Sequence sources
The available genome sequences of six primate species, R. roxellanae (2n = 44), P. anubis (2n = 42), M. mulatta (2n = 42), H. sapiens (2n = 46), P. troglodytes (2n = 48), and M. murinus, were downloaded from NCBI and saved in FASTA format. The genome accession numbers are as follows: GCA_000769185.1, GCA_000264685.1, GCA_000230795.1, GCA_000306695.2, GCA_000090855.1, and GCA_000165445.3. The accession numbers of the mitochondrial genomes are KM504390.1, KC757406.1, KJ567051.1, KX675270.1,
Overview of the six primate genomes
We examined the distribution of perfect 1–6 bp microsatellites in six primate species using the default parameters of MSDB, which eliminates errors resulting from different search criteria. The total number of microsatellites was highly variable, ranging from 594,127 to 1,368,292, and total SSR length covered 0.4% to 0.98% of the six primate genomes. The maximum number and highest relative abundance of SSRs were found in R. roxellanae (471.91 No./Mb), followed by P. anubis (440.71
Diversity of SSR distribution in the genomes of six primate species
We found that, on average, one SSR locus occurred every 2.11 kb to 4.88 kb in the six primate genomes. In particular, SSR loci in M. murinus were much sparser, even though its genome size is similar to that of related species. All of the primate species except P. troglodytes showed a similar pattern of SSR number and relative abundance: mono- > di- > tetra- > tri- > penta- > hexanucleotide repeats, which was inconsistent with the bovid species (Qi et
Acknowledgments
This work was supported by the National Key Programme of Research and Development, Ministry of Science and Technology (2016YFC0503200) and National Natural Science Foundation of China (No. 31530068). We would like to acknowledge the anonymous reviewers for their valuable comments and suggestions to improve the manuscript. We also thank Dr Castiglioni for the critical reading and English editing.
References (58)
- et al. In-silico analysis of simple and imperfect microsatellites in diverse tobamovirus genomes. Gene (2013)
- et al. Genome-wide distribution and organization of microsatellites in six species of birds. Biochem. Syst. Ecol. (2016)
- et al. Simple sequence repeats as advantageous mutators in evolution. Trends Genet. (2006)
- et al. Identification and characterization of dinucleotide repeat (CA)n markers for genetic mapping in dog. Genomics (1993)
- et al. Mining microsatellites in eukaryotic genomes. Trends Biotechnol. (2007)
- et al. Genic microsatellite markers in plants: features and applications. Trends Biotechnol. (2005)
- et al. Comparison of microsatellite distribution in genomes of Centruroides exilicauda and Mesobuthus martensii. Gene (2016)
- et al. Characterization of perfect microsatellite based on genome-wide and chromosome level in Rhesus monkey (Macaca mulatta). Gene (2016)
- et al. Genome-wide survey and analysis of microsatellites in nematodes, with a focus on the plant-parasitic species Meloidogyne incognita. BMC Genomics (2010)
- et al. The Analysis of Simple Sequence Repeats in Takifugu rubripes Genome (2006)
- Utilization of microsatellite polymorphism for differentiating herpes simplex virus type 1 strains. J. Clin. Microbiol.
- MSDB: a user-friendly program for reporting distribution and building databases of microsatellites from genome sequences. J. Hered.
- Krait: an ultrafast tool for genome-wide survey of microsatellites and primer design. Bioinformatics
- Microsatellites: simple sequences with complex evolution. Nat. Rev. Genet.
- Microsatellite evolution: a reciprocal study of repeat lengths at homologous loci in cattle and sheep. Mol. Biol. Evol.
- Long, polymorphic microsatellites in simple organisms. Proc. R. Soc. Lond. B Biol. Sci.
- Microsatellite evolution in congeneric mammals: domestic and bighorn sheep. Mol. Biol. Evol.
- Biased distribution of microsatellite motifs in the rice genome. Mol. Gen. Genomics.
- Distribution regularities of microsatellites in the Gallus gallus genome. Sichuan J. Zool.
- Genome-wide survey and analysis of microsatellites in giant panda (Ailuropoda melanoleuca), with a focus on the applications of a novel microsatellite marker system. BMC Genomics
- Characteristics of microsatellites in Arborophila rufipectus genome sequences using 454 GS FLX. Sichuan J. Zool.
- Molecular and genomic data identify the closest living relative of primates. Science
- Differential distribution of simple sequence repeats in eukaryotic genome sequences. Mol. Biol. Evol.
- The genome-wide determinants of human and chimpanzee microsatellite evolution. Genome Res.
- Survey of microsatellite clustering in eight fully sequenced species sheds light on the origin of compound microsatellites. BMC Genomics
- Mining and survey of simple sequence repeats in expressed sequence tags of dicotyledonous species. Genome Res.
- OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res.
- Comparative analysis of microsatellite sequences distribution in the genome of giant panda and polar bear. Sichuan J. Zool.
- A high-resolution map of human evolutionary constraint using 29 mammals. Nature
1 Co-first author.