Next-generation sequencing of mixed genomic DNA allows efficient assembly of rearranged mitochondrial genomes in Amolops chunganensis and Quasipaa boulengeri

Recent improvements in next-generation sequencing (NGS) technologies can facilitate the obtainment of mitochondrial genomes. However, it is not clear whether NGS could be effectively used to reconstruct the mitogenome with high gene rearrangement. These high rearrangements would cause amplification failure, and/or assembly and alignment errors. Here, we choose two frogs with rearranged gene order, Amolops chunganensis and Quasipaa boulengeri, to test whether gene rearrangements affect the mitogenome assembly and alignment by using NGS. The mitogenomes with gene rearrangements are sequenced through Illumina MiSeq genomic sequencing and assembled effectively by Trinity v2.1.0 and SOAPdenovo2. Gene order and contents in the mitogenome of A. chunganensis and Q. boulengeri are typical neobatrachian pattern except for rearrangements at the position of “WANCY” tRNA genes cluster. Further, the mitogenome of Q. boulengeri is characterized with a tandem duplication of trnM. Moreover, we utilize 13 protein-coding genes of A. chunganensis, Q. boulengeri and other neobatrachians to reconstruct the phylogenetic tree for evaluating mitochondrial sequence authenticity of A. chunganensis and Q. boulengeri. In this work, we provide nearly complete mitochondrial genomes of A. chunganensis and Q. boulengeri.

Some frogs with gene order rearrangements provide the opportunity to test the validity of mitogenome assembly by NGS.Gene order reorganization was reported in the ''WANCY'' gene cluster of some Amolops mitogenomes (Xia et al., 2014;Xue et al., 2015;Zhang, Xia & Zeng, 2016;Shan et al., 2016).The ''WANCY'' gene cluster was also rearranged in the spiny-bellied frog, Quasipaa boulengeri (Shan et al., 2014).Furthermore, the intraspecific diversity of gene rearrangements, which contains four kinds of rearrangements, within Q. boulengeri has been identified in 290 samples from 28 populations (Xia et al., 2016).The discovery of intermediate states and alternative losses types in the spiny-bellied frog provided direct evidence of tandem duplication and random loss model for mitochondrial gene rearrangements.However, such partial duplications and deletions of mtDNA fragments would cause inconvenience to assemble mitogenomes by NGS.
In this study, we choose two frogs with gene rearrangements, Amolops chunganensis and Quasipaa boulengeri, to test whether NGS could effectively obtain the mitogenome with high gene rearrangement.The mitogenomes are assembled and annotated from next generation sequencing reads through Illumina MiSeq genomic sequencing.The nearly-complete mitochondrial DNA sequence of A. chunganensis and Q. boulengeri were recovered, and compared with other Amolops and Quasipaa species, respectively.In order to evaluate mitochondrial sequence authenticity of A. chunganensis and Q. boulengeri, we performed a phylogenetic analysis.The phylogenetic tree was constructed by using Bayesian Inference (BI) and Maximum Likelihood (ML) methods for the two newly obtained and other published neobatrachian mitogenomes from GenBank.

Library preparation and Illumina sequencing
The sample of Amolops chunganensis was collected in Gansu Province, China (Voucher No. XM5526, ♀); and Quasipaa boulengeri was collected in Sichuan Province, China (Voucher No. XM3632, ♀).Chengdu Institute of Biology issued permit number CIB#2014-36 and CIB#2014-110 for field work.All work with animals was conducted according to relevant national and international guide-lines on the Protection of Wildlife.All animal care and experimental procedures were approved by the Animal Care and Use Committee (Permit Number: CIB-20121220A), Chengdu Institute of Biology, Chinese Academy of Sciences.
Genomic DNA was extracted from muscle tissue through SDS-proteinase K/phenolchloroform protocols.DNA samples were shipped to Novogene Bioinformatics Technology (Beijing, China) for library construction and sequencing on Illumina MiSeq platform with 300 bp paired-end reads (PE300).Briefly, the paired-end library was performed with TruSeq kit (insert size ∼500 bp), following the protocols of Illumina DNA sample preparation.These libraries were pooled together to sequence on Illumina MiSeq platform at the Novogene Bioinformatics Technology (Beijing, China).Sequence files for genomic DNA from A. chunganensis and Q. boulengeri were generated in FASTQ format (sequence read and quality information in Phred format).The raw sequencing data were deposited in the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA, http://www.ncbi.nlm.nih.gov/Traces/sra)under accession numbers SRR3929655 and SRR3929656 for Q. boulengeri and A. chunganensis, respectively.

Mitogenome assembly
Contaminant sequences were first removed.Then, low-quality regions (Phred quality score < 20) and sequence adapters in the libraries were trimmed by using Trimmomatic (Bolger, Lohse & Usadel, 2014).De novo assembly of clean reads for each datasets were performed using SOAPdenovo2 (Luo et al., 2012) and Trinity v2.1.0(Haas et al., 2013).Reads preprocess and assembly parameters followed the individual program guidelines.For SOAPdenovo2, we used 7 k-mer sizes between 50 and 80 with a step size of 5 and manually modified the assembly parameter values.Trinity assembler was used with the inchworm k-mer method, following authors' recommendations.For each assembly, we monitored the change of total number of contigs and N50 size over the assessed parameter range.
After assembly, we used the published mitogenomes of Amolops as queries against the contigs dataset of A. chunganensis.Amolops loloensis (GeneBank No. KT750963) and A. mantzorum (KJ546429) were used as queries for the reference mitogenome.The reference mitogenomes were BLASTed against assembly using BLASTn (BLAST + v2.2.30) to search for contigs with mitochondrial protein-coding and RNA genes.For Q. boulengeri, the same strategy has been performed.Quasipaa boulengeri (KC686711) and Q. spinosa (NC_013270) were chose as reference mitogenomes.

Mitogenome annotation and analysis
The mitogenome sequence was annotated using both tRNAscan-SE v.1.21(http://lowelab.ucsc.edu/tRNAscan-SE;Schattner, Brooks & Lowe, 2005;Lowe & Eddy, 1997) and the MI-TOS web server (http://mitos.bioinf.uni-leipzig.de/index.py;Bernt et al., 2013).These programs also used to predict the potential cloverleaf secondary structures of tRNAs genes.The sequences of candidate tRNAs were recognized and aligned with homologous tRNAs genes.Thirteen PCGs of the two mitogenomes were identified by comparing inferred open reading frames with published Amolops and Quasipaa species.Both translation of open reading frames and annotation of the N-and C-terminal ends of each PCG were done in MEGA v6.06 (Tamura et al., 2013).In addition, the boundaries of the rrnL and rrnS genes were predicted by the flanking tRNA genes.

Phylogenetic analysis
To validate the phylogenetic relationship of A. chunganensis and Q. boulengeri among neobatrachians, we aligned 46 mitogenomes of neobatrachians to confirm the phylogenetic relationships, and selected Pelobates cultripes (NC_008144), Xenopus laevis (NC_001573) and X. tropicalis (NC_006839) as outgroup taxa.Details of the GenBank accession numbers of anuran mitochondrial genomes analyzed were listed in Table S1.All sequences were edited manually in BioEdit Sequence Alignment Editor v.7.0.5 (Hall, 1999) and MEGA v.6.06 (Tamura et al., 2013) with default parameters.The alignments of 13 PCGs were determined in software MEGA v.6.06, with the default settings (align codons).The PCGs were translated to amino acids sequences and were manually concatenated all sequences into a single nucleotide dataset (total 11,502 bp).
We constructed the phylogenies with the concatenated dataset using BI and ML methods, which were conducted by MrBayes v.3.1 (Ronquist & Huelsenbeck, 2003;Huelsenbeck & Ronquist, 2001) and RAxML v.8.2.x (Stamatakis, 2014), respectively.Nucleotide substitution model selection was estimated under Bayesian Information Criterion (BIC) scores in jModeltest v.0.85 (Posada, 2008), and model GTR + I + G was selected as the best-fitting model for BI and ML analyses.BI partitioning analysis followed the programs with calculating a majority rule consensus tree with 10,000,000 generations of Markov chain Monte Carlo (MCMC), with frequency of tree sampling every 1,000 generations and the first 25,000 trees discarding as burn-in, and starting from a random tree.After performing two independent runs, the output trees were combined to estimate the Bayesian posterior probabilities (BPP) in 50% majority rule for each node.For ML analysis, program RAxML were performed with model GTRGAMMA under the similar partitioning parameters as BI analysis, and with 1,000 bootstrap replicates to calculate the bootstrap (BS) of the topology.In addition, the significant nodes' supports were considered with 95% BPP (Felsenstein, 2004) and 75% BS (Hillis & Bull, 1993) in BI and ML analysis, respectively.

Sequencing data from Illumina MiSeq
A total of 15.29 million raw PE reads were produced on Illumina MiSeq (PE300) systems (4.9 Gb and 4.2 Gb raw data for Q. boulengeri and A. chunganensis, respectively).After removal of adaptors and low-quality reads, the clean reads were used for the subsequent assembling.All assembly statistics of each assembler were shown in Table 1.Although total numbers of assembled contigs in Trinity were less than SOAPdenovo2, the Trinity assembler collected larger N50 and longer contigs (Table 1).More than two hundred contigs were aligned to reference mitogenome for each K-mer assembly of SOAP denovo2; however, most of them less than 500 bp.Further, few gaps existed when such contigs were used to assemble for the final mitogenome construction.By contrast, the best aligned contigs to reference mitogenome in Trinity are longer than 16 kb without gaps.It implies that Trinity is better than SOAPdenovo2 to recover mitogenome for low coverage metagenomic skimming data.Notes.
a The k-mer sizes used in SOAP denovo2.
In A. chunganensis mitogenome, ten of 13 PCGs started with the common initiation codon ATG, while nad2 began with ATT, cox1 and atp6 initiated with ATA (Table S2).
For the terminal codon, atp8, cox1 and nad6 genes stop with TAG, AGG and AGA, respectively; nad4l, nad4 and cytb ended with TAA; while nad2 and atp6 ended with TA.Five PCGs (cox2, cox3 and nad1, nad3, nad5) ended with an incomplete stop codon (T-).In addition, we had checked two repeated units 5 -CCGTATGGTTTAATT -ATATACCATCTAATAGGTGATATATATACA-3 (8 times) and 5 -CCTATATATGCC -CGATATATACTATACTAAGTATTAATCA-3 (6 times) located at the upstream of trnL and the downstream of cytb, respectively.Because of the tandem repeat units, only 530 bp had been recovered in the D-loop region for A. chunganensis.
In A. chunganensis, however, we found O L was located between the trnW and trnA.It is similar to A. mantzorum and A. loloensis gene arrangement pattern (Shan et al., 2016;Xue et al., 2015).Interestingly, a trnK pseudogene was located between cox2 and atp8 genes (Fig. 1A; Table S2).Compared with the trnK gene, the pseudogene has a nucleotide deletion in the anticodon loop, and has some mutations in the acceptor stem of its secondary structure (Table S2 and Fig. S1).Additionally, the W-O L -ANCY structure from assembled NGS reads was consistent with previous reported gene rearrangement pattern of another individual of A. chunganensis (KF771328; Xia et al., 2014).
In the mitogenome of Q. boulengeri, two trnM genes were derived from a tandem duplication, which was consistent with other Quasipaa species (Chen et al., 2015;Shan et al., 2014;Zhou et al., 2009).Moreover, pseudogene or residues of O L and trnN were observed in the tRNA genes cluster ''WANCY'' (between trnW and trnY ), resulting to rearranged pattern: ''WAN-O L -N'-O L '-CY'' in this individual Furthermore, the gene order of the two available Q. boulengeri individuals (GenBank no.KF199152, KC686711; Table S1) were quite different in the ''WANCY'' region.This indicated that the intraspecific diversity of gene rearrangements existed in Q. boulengeri mitogenomes.In addition, 576 non-coding nucleotides were observed between trnN and trnC (Fig. 1B), which is similar to other individuals of Q. boulengeri by Sanger sequencing (Xia et al., 2016).These observations supported that mitogenome with gene order rearrangement can be obtained effectively and correctly from assembled NGS reads.

Phylogenetic analysis
The concatenated PCGs data of mitogenome sequences contained 11,502 nucleotide positions, including 3,630 conserved sites, 7,806 variable sites and 7,068 potentially parsimony-informative sites.Bayesian inference and Maximum likelihood methods with GTR + I + G model consistently supported the same topology.

Efficient assembly of mitochondrial genomes
The length of the best fitted contigs for A. chunganensis and Q. boulengeri was 16,795 bp and 16,672 bp, respectively.We failed to recover the complete mitogenome of A. chunganensis and Q. boulengeri, due to the highly repetitive sequences in the D-loop region.Except the gap in the D-loop region, the whole mitogenomes were assembled successfully even if there are gene rearrangement in both species.Repeated regions are a well-known problem for sequence assembly algorithms, and it was hard to assemble the D-loop region with extensive repeated regions by NGS (Hahn, Bachmann & Chevreux, 2013;Tang et al., 2014).In the D-loop region, different repeated units, repetitive sequence and repetitions resulted in different sequence length (length polymorphism).For example, in Amolops species, the D-loop region ranged from 2,211 bp (A.mantzorum) to 3,391 bp (A.loloensis) (Zhang, Xia & Zeng, 2016;Shan et al., 2014;Xue et al., 2015).Actually, even if long tandem repeat regions can be successfully amplified by long PCR, they also represent a problem for Sanger sequencing (Xia et al., 2014).
Gene order rearrangement is not a big problem for the assembly of mitogenomes by NGS.In the assembly process, the highly covered regions may be removed if uniform coverage of reads are assumed by a de novo genome assembler (Rubinstein et al., 2013).This  The Bayesian Inference (BI) and Maximum Likelihood (ML) tree based on nucleotide sequences of the combined 13 protein-coding genes (11,502 in size).For the BI tree, the GTR + I + G model was selected, and two independent runs were performed for 1,000,000 generations with sampling frequency 0.001.The GenBank accession numbers of all species are shown.Numbers beside the nodes are Bayesian posterior probabilities and ML Bootstrap, respectively (showed in BPP/BS).between cox2 and atp8 genes.Each gene is shown as an arrow indicating the transcription direction.The arrows on top of the black line correspond to genes coded on the H-strand, and those below show genes on the L-strand.
may be disadvantageous to the assembly of mitogenomes with gene order rearrangement.Tandem duplication followed by random loss (TDRL) is the most frequently invoked model to explain the diversity of gene rearrangements in metazoan mitogenomes (Boore, 2000).According to this model, two copies of mitochondrial gene fragment would exist after tandem duplication.These tandem duplications would cause errors when using short reads of NGS to assemble mitogenomes.Nevertheless, deletion of a redundant gene-copy may happen rapidly due to replication and/or transcription efficiency, facilitating the formation of pseudogenes or the complete deletion of redundant genes Such pseudogenes or residues of tandem repeats could be efficiently distinguished from the original copy.
By aligning published mitogenomes of close species and individuals in rRNAs, tRNAs and PCGs, we assessed the quality of our assembled data.The alignment results suggested that the assembled sequences were consistent with majority of other anurans mitogenomes The non-conservative regions were located between nad2 and cox1, which includes gene rearrangements in Amolops and Quasipaa reported previously (Xia et al., 2014;Shan et al., 2014;Shan et al., 2016).
Our results illustrated that mitochondrial sequences can be successfully assembled from the NGS raw data.Compared with traditional methods, the strategy of mitogenomes assembly from NGS raw data has two distinct advantages.First, it neither depends on specific primers nor be affected by gene rearrangement.When gene rearrangement region include primers, the PCR-based method could not be successfully amplified.Second, this approach is fast, timesaving, relatively cheap, and does not require great effort to recover the mitochondrial genomes (Gan, Schultz & Austin, 2014;Tang et al., 2014;Rubinstein et al., 2013;Haiminen et al., 2011).However, the disadvantage of such an approach is that it requires constructing separate genomic libraries for each sample.

Mitochondrial gene order rearrangements
Some pseudogenes or residues of O L trnK and trnN were observed in A. chunganensis and Q. boulengeri.These gene rearrangements may be explained by the theory of TDRL (Mueller & Boore, 2005;Boore, 2000).In A. chunganensis, O L was located between trnW and trnA.For Q boulengeri, the intraspecific gene rearrangements were checked in the WANCY region.Our results supported the observation that the hotspot of gene rearrangement was adjacent to the origin of light-strand replication (San Mauro et al., 2006).The widespread occurrence of gene rearrangement among different vertebrate groups has been examined in relation to variability in the thermodynamic stability of the light-strand replication origin (Fonseca & Harris, 2008).
Mitochondrial metagenomic skimming by NGS could be a useful approach to recover intraspecific rearrangements of mitogenomes.By sequencing the hotspot of gene rearrangement of mtDNA for Q boulengeri (from nad2 to cox1, which includes the WANCY region), four kinds of gene rearrangements were identified in spiny-bellied frog populations (Xia et al., 2016).Yet, mitochondrial genome analysis may provide more valuable evidences for interpreting the intraspecific gene rearrangements.Intraspecific rearranged variations were not common by comparison of all mtDNA records of amphibians and reptiles in GenBank.

Phylogenetic reconstruction of mitogenomic data
In this study, both the phylogenetic relationships and mitogenome organization suggest that A. wuyiensis and A. ricketti were significantly distinguished from other Amolops species (Fei et al., 2009).Within Amolops clade, the highly supported sister relationship of A. wuyiensis and A. richetti was consistent with the results of Cai et al. (2007).Within Quasipaa clade, the phylogenetic inferences based on mtDNA sequences revealed that all individuals of Q. boulengeri formed a monophyly with high supports, sister to Q. verruspinosa.This result is congruent with Che et al. (2009), but different from the result of Qing et al. (2012).The phylogenetic reconstruction using whole mitogenome provides more credible result than single gene.
As previous demonstration (Yan et al., 2013;Qing et al., 2012;Cai et al., 2007;Frost et al., 2006), the mitogenomic approach proved useful tools for inferring phylogenetic relationships within Neobatrachia.In the present study, all clades were well resolved, with a few exceptions where Bayesian posterior probabilities were no less than 0.90.Despite their fast evolutionary rate, mitochondrial genomes contained a phylogenetic signal that could be efficiently recovered with reasonable taxon sampling (Rubinstein et al., 2013).

KFigure 1
Figure 1 Annotation and map of Amolops chunganensis and Quasipaa boulengeri.Annotation of Amolops chunganensis (A) and Quasipaa boulengeri (B) mitogenome.PCGs are colored in red, tRNA-coding genes are in blue, rrnL and rrnS are in green.A pseudogene of trnK is located between cox2 and atp8 genes.Each gene is shown as an arrow indicating the transcription direction.The arrows on top of the black line correspond to genes coded on the H-strand, and those below show genes on the L-strand.

Figure 2
Figure2The Bayesian Inference (BI) and Maximum Likelyhood (ML) tree (combined 13 PCGs, 11,502 bp).The Bayesian Inference (BI) and Maximum Likelihood (ML) tree based on nucleotide sequences of the combined 13 protein-coding genes (11,502 in size).For the BI tree, the GTR