The revised complete mitogenome sequence of the tree frog Polypedatesmegacephalus (Anura, Rhacophoridae) by next-generation sequencing and phylogenetic analysis

The mitochondrial genome (mitogenome) sequence of the tree frog Polypedates megacephalus (16,473 bp) was previously reported as having the unusual characteristic of lacking the ND5 gene. In this study, a new mitogenome of P. megacephalus (19,952 bp) was resequenced using the next-generation sequencing (NGS) and standard Sanger sequencing technologies. It was discovered that the ND5 gene was not lost but translocated to the control region (CR) from its canonical location between the ND4 and ND6 genes. In addition, a duplicated control region was found in the new mitogenome of this species. Conservative region identification of the ND5 gene and phylogenetic analysis confirmed that the ND5 gene was located between two control regions. The phylogenetic relationship among 20 related species of anura revealed a rearrangement of the ND5 gene during the evolutionary process. These results also highlighted the advantages of next-generation sequencing. It will not only decrease the time and cost of sequencing, but also will eliminate the errors in published mitogenome databases.

With the rapid development of technology, an increasing number of species' genomes have been sequenced using the next-generation sequencing (NGS) technology (Lloyd et al., 2012;Yong et al., 2016;Yuan et al., 2016;Jiang et al., 2017). This new method allows us to more accurately explore the details of gene rearrangements in frogs and compare the results with the mitochondrial genome obtained by the conventional sequencing method (Sanger sequencing).
There are approximately 421 species of Rhacophoridae frogs around the world including two subfamilies (Rhacophorinae and Buergeriinae) and 18 genera (Frost, 2018). These frogs are widely distributed in the tropical and subtropical regions of Asia, the southern part of Africa, India, Sri Lanka, Japan, the Philippines, and the Greater Sunda Islands. Although recent studies have illustrated that mitochondrial gene rearrangements have been detected in quite a few species of anura, there is limited data available on the genome of tree frogs and what is available hardly represents the main lineages of Rhacophoridae (Sano et al., 2004;Sano et al., 2005;Huang et al., 2016).
The genus Polypedates is widely distributed across Southeast Asia. Due to its morphologically cryptic lineages, the taxonomic status of genus Polypedates is disputed (Brown et al., 2010;Blair et al., 2013;Kuraishi et al., 2013;Pan et al., 2013). Brown et al. (2010) sequenced samples of 16S rRNA to investigate the theory that Polypedates is actually complex of lineages. Their results showed that the frogs are highly divergent, especially in the mainland areas, but they did not resolve the phylogenetic and taxonomic issues of the complex. Kuraishi et al. (2013) reported that in China Polypedates is a monophyletic group that encompasses three distinct taxa. Pan et al. (2013) defined five different lineages in Polypedates and four clades, which are distributed in the southern region of China. Most of the studies on this genus focus on phylogenetic analysis of only a few gene fragments (12S rRNA and 16S rRNA). Few scientists have considered the mitochondrial genome in relation to the phylogeny of Polypedates's species.
A previous study has shown that the length of the complete mitogenome of P. megacephalus is 16,473 bp with one control region, and it has been reported that the ND5 gene has been lost (Zhang et al., 2005b). Until recently, however, few attempts have been made to verify this unprecedented gene loss. The alleged absence of the ND5 gene has not been reported in any vertebrate mitogenome aside from that of P. megacephalus (Zhang et al., 2005b) and tuatara (Rest et al., 2003). NADH-ubiquinone oxidoreductase chain 5 is a protein subunit of NADH dehydrogenase located in the mitochondrial inner membrane. The ND5 protein is the largest of the five complexes of the electron transport chain encoded by the ND5 gene (Voet, Voet & Pratt, 2013). A mutant ND5 gene can damage the function of COI and impair any brain or muscle tissue that requires energy input (Charlotte et al., 2010). It is unlikely, therefore, that the mitochondria would lose such an essential gene.
In the present study, we resequenced the complete mitogenome of Polypedates megacephalus using Illumina MiSeq sequencing and standard Sanger sequencing. We discovered that the ''missing'' ND5 gene was not lost but has been translocated to the control region from its canonical location between the tRNA Glu and ND6 genes. Accounting for the ND5 gene and an additional control region, the correct mitogenome of P. megacephalus should be 19,952 bp in length instead of 16,473 bp in length. In combination with previously published data from 19 other anuran species, phylogenetic analysis of the newly sequenced mitogenome of P. megacephalus suggests that similar ND5 gene rearrangements might have occurred in two distinct ranoid lineages: Dicroglossidae and Rhacophoridae.

Taxon sampling and DNA extraction
The specimen, P. megacephalus, was captured in the Shiwandashan National Forest Park, Guangxi Province of China. A field permit (S#2017-42) was granted by the Nature Reserve Administration of Shiwandashan National Forest Park. All animal care and experimental procedures were approved by the Committee of the Ethics on Animal Care and Experiments at Sichuan Agricultural University (CEACE) (Permit Number: S20171001) and carried out in accordance with the guidelines stated by CEACE. Muscle tissue taken from the specimen was preserved in 99% ethanol immediately after collection. Total genomic DNA was extracted from the tissue using an Ezup-type animal genomic DNA extraction kit (Sangon, Shanghai) following the operation manual. The extract was prepared for both Sanger sequencing and next-generation sequencing (NGS).

Library preparation and mitogenome sequencing
One part of the total DNA sample was sent to Personal Biotechnology (Shanghai, China) for WGS (Whole Genome Shotgun) library construction. Sequencing libraries were generated using the TruSeq DNA Sample Preparation Kit (Illumina, San Diego, CA, USA) and the Template Prep Kit (Pacific Biosciences, Menlo Park, CA, USA). Average insert sizes of approximately 400 bp were prepared with sequencing completed on the Illumina MiSeq sequencing platform producing 251 bp paired-end reads.
To link the gaps where some regions (reads) were not assembled by NGS, the remaining mtDNA was used to perform a long-range PCR (LA-PCR) amplification using two sets of primers: F2 (CytbFow1 and FND512800H) and F3 (ND5F2 and R16M1) (Fig. 1, Table 1). Additionally, in order to verify the authenticity of the non-coding sequence between tRNA Lys and ATP6, we used a set of primers, F1 (COIIF and ATP6R). The primer sets were designed based on the mitogenome sequences obtained from the newly sequenced P. megacephalus. The PCR products were preliminarily confirmed on a 1.0% agarose gel ( Fig. 1) and cloned using the pEASY-T5 Zero Cloning Kit (TransGen Biotech, Beijing, China). The recombinant plasmids were then sequenced with Sanger sequencing using  M13 universal primers. The sequence obtained by Sanger sequencing was then spliced with contigs, or scaffolds, using the ContigExpress software package.

Sequence assembly and annotation
The quality of the raw Illumina data (in FASTQ format) was evaluated using FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). The raw data was cleaned up, filtered, and assembled. The workflow is described as follows: AdapterRemoval v2 (Schubert, Lindgreen & Orlando, 2016) was used to remove low quality bases and discard the contaminated adapters from the 3 end of the reads. Then, the short reads were locally corrected by the module SOAPec in SOAP de novo2 (https://sourceforge.net/projects/ soapdenovo2/) using the k-mer strategy. We mapped the short reads using BWA software (Li & Durbin, 2009) and SAMtools (Li et al., 2009) (Fig. 1). The sequencing reads have been uploaded to NCBI SRA database (SRA accession number PRJNA516153). The high quality data was then selected for de novo assembly by A5-miseq v20150522 (Coil, Jospin & Darling, 2015) and SPAdes v3.9.0 (Bankevich et al., 2012). The contig of mitogenome sequences was identified by the NT library on NCBI using BLASTn (BLAST C v2.2.31+) and the draft assemblies were corrected by Pilon v1.18 (Walker et al., 2014) to evaluate their accuracy and completeness. The gaps between the ND5 and 12S rRNA by NGS were filled using Sanger sequences of the amplicon generated by the two sets of primers, F2 and F3 ( Fig. 1; Table 1).
With the exception of the ATP8 gene, the sequences from the 12 concatenated PCGs and 2 rRNAs were chosen for phylogenetic analysis. The analysis was performed using the Bayesian inference (BI) and maximum likelihood (ML) methods. The nucleotide sequences of the 12 PCGs and 2 rRNAs were aligned using MAFFT version 7 (Karen, Marc & John, 2008) and all termination codons were manually deleted. The sequences were edited and trimmed using BioEdit v.7.0.5.3 (Hall, 1999). The best-fit nucleotide substitution models were calculated using PartitionFinder version 2.1.1 (Lanfear et al., 2017) according to the Bayesian information criterion (BIC).
With the appropriate models and partitioning schemes selected in each case (Table S2), a BI phylogenetic tree was constructed using MrBayes v3.2 (Ronquist et al., 2012). In the BI analysis, two independent runs were conducted for 1,000,000 Markov chain Monte Carlo (MCMC) generations with 4 chains (one cold and three heated) and sampling trees occurring every 1,000 generations. The first 25% of trees were discarded as burn-in samples and the remaining trees were used to generate Bayesian consensus trees. For ML analysis, the branch support of ML phylogeny was obtained with 1,000 bootstrap replications in RAxML software under the GTRGAMMA model (Alexandros, 2014), with partitioning parameters similar to those in the BI analysis. The same methods were used for the single gene tree. FigTree 1.4.2 (https://beast.community/figtree) was used to visualize and edit the results of the Bayesian and ML trees.

Genome composition and gene arrangement
The raw data obtained from the Illumina Pipeline was 524.75 Mb. The total length was 18,328 bp by NGS. However, the NGS results could not be aligned into a complete genome. In order to link the gaps in the NGS sequence, we used the primers Fragment 2 (13,510-16,867) and Fragment 3 (15,877-701) according to the P. megacephalus reference mtDNA (GenBank accession no. MH936677) to amplify the remaining sequence segments ( Fig. 1; Table 1). LA-PCR amplification and Sanger sequencing with two sets of primers resulted in two products with 3,500-bp and 5,000-bp ( Fig. 1; Table 1). Finally, we obtained the complete mitogenome of the P. megacephalus with 19,952 bp by merging the sequences with overlapping alignments. The complete mitogenome sequence was deposited in the GenBank databases under the accession number MH936677.
The complete mitogenome of P. megacephalus was 19,952 bp in total length and contained 22 tRNAs, two rRNAs, 12 PCGs and two control regions ( Fig. 2; Table S3). Consistent with other typical amphibian mitogenomes, most of the genes of P. megacephalus were coded on the H-strand except for ND6 and 8 tRNAs ( Fig. 2; Table S3). 22 tRNAs with sizes ranging from 65 to 74 bp were interspersed across the mitogenome. Aside from tRNA Cys and tRNA Ser(AGY) , which cannot possess a perfect cloverleaf structure (Fig. S1), these tRNAs (20 of 22) can fold into a complete secondary structure. In addition, there was a notable noncoding region that was 862 bp in length between tRNA Lys and ATP6 genes in the mitogenome. This was consistent with the results from a previous study (Zhang et al., 2005b) (Table S3). The base compositions of the whole mitochondrial genome, 2 rRNA genes, PCGs, and control regions of P. megacephalus are shown in Table S4. The A + T content for the PCGs ranged from 54.4% (ND3) to 63.1% (ND2). Four PCGs (ND1, ND2, ATP6, ND4,) had an A+T content of over 60%. The A +T content of the CR1 and CR2 were 65.7% and 68.8%, respectively. The AT-skew value was negative for the entire mitogenome The gene arrangement of the P. megacephalus mitogenome diverged from that of typical vertebrates ( Fig. 2; Table S3). Impressively, the ND5 gene (1,779 bp) was detected and subsequently located between two control regions (CRs) after the gene had been thought to be lost in this species (Zhang et al., 2005b). The ND5 gene and two CRs (CR1 and CR2) accompanied by the tRNA gene cluster of tRNA Leu(CUN) /tRNA Thr /tRNA Pro /tRNA Phe (LTPF cluster) were also identified in the Schlegel's tree frog, Rhacophorus schlegelii (Sano et al., 2005).

Protein-coding genes
The total length of all 12 PCGs was 11,121 bp and the overall A+T content was 59.2%. Most genes were encoded by the H-strand except for ND6. The initiation codons of most of the PCGs were ATG, except ND2 began with ATT, ND4 with GTG, and COI, COII, and ATP6 with ATA. Two genes (COI and ND6) ended with AGG as the stop codon, three genes (COII, ND4L and ND5) used the TAA codon, and the ND2 gene terminated with the TAG codon. The remaining six PCGs (ND1, ATP6, COIII, ND3, ND4 and Cytb) ended with an incomplete stop codon, T-. This incomplete codon can be transformed into a complete one (TAA) through transcription (Ojala et al., 1981).
The relative synonymous codon usage (RSCU) values are shown in Fig. S2. Compared to the other synonymous codons, the RSCU analysis indicated that codons including A or T at the third position were always overused. The most frequently encoded amino acids were Leu (CUN), Ile, and Ala (>300), while the least frequently used amino acid was Cys (<45).
In this study, we discovered that the previously reported ''absent'' ND5 gene was not actually absent but was instead translocated to the position between the CR1 and CR2. To further confirm our results, the ND5 gene of four other Polypedates frogs was also obtained using Sanger sequencing (Dataset S3). The translocated ND5 is an intact gene that has both a normal initiation and stop codon and can be translated into proteins. The open reading frame was also identified by ORF Finder on the NCBI website and the membrane transport proteins of NADH-ubquinone oxidoreductase 5 were identified by Interproscan. Among the aligned sequences of the ND5 gene from eight tree frogs, the functionally conserved region was the most prominent with respect to amino acid sequence similarity (Fig. 3A). The phylogenetic results of the ND5 gene from eight tree frogs also revealed that the closely related species were clustered into the correct clades (Fig. 3B).
Gene rearrangement is generally attributed to a slipped-strand mispairing caused by the stem loop structure of mitochondria that leads to gene duplication. Due to natural selection and the accumulation of natural mutations, the repeat gene sequences were excised in subsequent random loss (Macey et al., 1997). The hypothetical gene rearrangement processes of the tRNAs and ND5 as proposed by Sano (Sano et al., 2005) are shown in Fig. S3: the first tandem duplication occurs in the ND5-CR. After the repeat genes are deleted, the ND5 gene is transferred downstream of the CR. The secondary tandem duplication occurs in the CR-ND5; two CRs appear after the deletion of the repeat ND5 gene. The third tandem duplication occurs in the LTPF cluster (tRNA Leu(CUN) /tRNA Thr / tRNA Pro / tRNA Phe ).
In some species of Dicroglossidae and Rhacophoridae, the position of the ND5 gene shifts to a different locality. For instance, (1) in Fejervarya limnocharis (Liu, Wang & Su, 2005) and F. cancrivora (Ren et al., 2009), the ND5 gene shifts to the 3 end of the CR. Ren et al. (2009) deduced that after a duplication of the ND5-CR region, the ND5 gene, which was located upstream of the CR, was deleted and the downstream ND5 gene is preserved as the CR-ND5 arrangement in the genus Fejervarya.
(2) The ND5 gene of Rhacophorus schlegelii (Sano et al., 2005) is translocated between two copied CRs, confirming the evolutionary trend of this unique gene order. Therefore, we infer that the ND5 gene of P. megacephalus might have experienced a similar gene rearrangement event.
The ATP8 gene is a gene that encodes a subunit of mitochondrial ATP synthase (Doering, Ermentrout & Oster, 1995). The published gene sequences of the P. megacephalus (Zhang et al., 2005b) and P. braueri (Liu et al., 2015) reflected a loss of the ATP8 gene in the Polypedates species. Our results also detected no ATP8 gene or ATP8-like sequence in P. megacephalus.
The ATP8 gene has a high mutation rate, but still possesses the MPQL amino acids as the conserved motif at the N-terminus of the typical metazoan ATP8 (Gissi, Iannelli & Pesole, 2008;Ulianosilva et al., 2015). However, the putative sequence of P. megacephalus was unable to be translated into its corresponding protein suggesting this gene might have lost its function. The absence of ATP8 has been detected in several phylogenetically distant metazoan species: Nematoda (Lavrov & Brown, 2001), Mollusca (Gissi, Iannelli & Pesole, 2008;Ulianosilva et al., 2015), and Rotifera (Steinauer et al., 2005;Suga et al., 2008).
The majority of these are invertebrates. Among the vertebrates, however, the absence was only found in the Polypedates species (Zhang et al., 2005b;Liu et al., 2015). To verify the authenticity of the lost ATP8 (in case of sequencing errors), we also sequenced this region in four other species of the genus Polypedates, finding that there was indeed a noncoding sequence between the tRNA Lys and ATP6 genes (Dataset S3). It is likely that the ATP8 gene has become a pseudogene as Zhang et al. (2005b) inferred (Fig. S4).

Noncoding regions
The noncoding regions in P. megacephalus mtDNA included the control regions and a few intergenic spacers.
Two major noncoding regions (CR1 and CR2) were found in the P. megacephalus mitogenome ( Fig. 2; Table S3). CR1, which had a total length of 1,574 bp, was located between the Cytb and ND5 genes in a position homologous to that of the CR in the Buergeria buergeri mitogenome (Sano et al., 2004) and CR1 in the Rhacophorus schlegelii (Sano et al., 2005). CR2, which was 2,330 bp, was located between the ND5 and tRNA Thr genes (Fig. 2). Two CRs were separated by the ND5 gene with a nucleotide sequence similarity of 99% over 1,571bp. Six tandem repeat units of 38 bp were detected in the 5 side of CR1 and CR2, but the 3 side of CR2 had more 760 bp repeat sequences than CR1 (Fig. 4). These regions are characterized by lower conservation, a high degree of variation, transcription regulation, and the replication of mtDNA. Duplicated control regions have been reported in several metazoan lineages in species such as the tree frog Rhacophorus schlegelii (Sano et al., 2005), several major snake families (Kumazawa et al., 1998;Dong & Kumazawa, 2005), Reptile tuatara (Rest et al., 2003), Thalassarche albatrosses (Abbott et al., 2005), Australasian ixodes ticks (Shao et al., 2005), and Antarctic notothenioids (Zhuang & Cheng, 2010). Two CR copies can improve the transcription and translation efficiency of mitochondrial encoded respiratory chain proteins (Zhuang & Cheng, 2010). Some studies report that the two CR sequences are quite similar, suggesting that they may be undergoing concerted evolution (Kumazawa et al., 1998;Forcada et al., 2003;Peng, Nie & Pu, 2006). However, consistent with Zhang's experimental results (Zhang et al., 2005b), an 862 bp noncoding sequence (NC) was observed between the tRNA Lys and ATP6 genes. This NC might have replaced the original position of the ATP8 gene ( Fig. 2 and Fig. S4).

Phylogenetic analyses
The BI and ML methods of phylogenetic reconstruction yielded fully congruent tree topologies which supported the previous classification (Fig. 5) (Frost et al., 2006;Kakehashi et al., 2013;Kurabayashi & Sumida, 2013;Zhang et al., 2013;Li et al., 2014a;Xia et al., 2014;Yuan et al., 2016). 20 species of anura in the phylogenetic trees were clustered into 6 branches. In this study, phylogenetic trees based on 12 PCGs and 2 rRNAs strongly supported monophyly of Rhacophoridae and Dicroglossidae (BPP = 1 and bootstrap value = 100). Additionally, the relationships among the three ranid families (Mantellidae, Rhacophoridae, and Ranidae) were well resolved in BI analysis (BPP = 1). Some previous studies have suggested the family Ranidae is a paraphyletic group with respect to Mantellidae and Rhacophoridae ( Igawa et al., 2008;Kurabayashi et al., 2008;Ren et al., 2009) and supported Dicroglossidae as being in a sister clade relationship with (Ranidae, (Mantellidae, Rhacophoridae)) (Zhang et al., 2013;Li et al., 2014a;Chen et al., 2017). The tRNA Leu(CUN) /tRNA Thr / tRNA Pro / tRNA Phe (LTPF cluster) was arranged like the typical gene sequence of Babina in Ranidae, but in two Rana species (Rana amurensis and R. kunyuensis), the tRNA Leu(CUN) moved downstream of one of the CRs (Fig. 5). The ND5 genes of the species of these two genera had both moved. Kurabayashi et al. (2008) sequenced the complete mitogenome of Mantella madagascariensis and partial fragments of the Cytb-ND2 region in other Mantellidae-related frogs, and also determined that the ND5 gene had shifted downstream of the CR. The LTPF clusters of some species of Rhacophoridae are arranged in the order of TLPF (tRNA Thr / tRNA Leu(CUN) / tRNA Pro / tRNA Phe ); moreover, the ND5 gene is again located downstream of the CR (Fig. 5).
Four tRNA genes were arranged as TPLF (tRNA Thr /tRNA Pro / tRNA Leu(CUN) / tRNA Phe ) in some species of Dicroglossidae (F. cancrivora, F. limnocharis and F. multistriata), and here the ND5 gene was also translocated downstream of the CR (Fig. 5). Notably, the duplicated ND5 gene was detected in two species (H. rugulosus and H. tigerinus), further confirming the hypothesis of mitochondrial tandem duplication. Therefore it seems that the translocation of the ND5 gene took place in the common lineage of these Dicroglossidae genera ancestors (Alam et al., 2010;Chen et al., 2017). Based on the phylogenetic results it appears that the same translocation of the ND5 gene occurred in the common ancestor of Mantellidae as well as Rhacophoridae, as well as the common ancestral lineage of four genera of Dicroglossidae (Occidozyga, Fejervarya, Hoplobatrachus and Euphlyctis) (Chen et al., 2017). As discussed above, the same gene rearrangements probably occurred in two distinct ranoid lineages as convergent genetic evolutionary events (Ren et al., 2009;Chen et al., 2017).

Possible causes for misdiagnosis of an "absent" ND5 gene
It was previously reported that the ND5 gene was absent in the mitogenome of P. megacephalus ( showed that the fragment size should be 14 kb rather than 9 kb. (2) It was previously inferred that the ND5 gene was absent based on the amplified sequence of the mtDNA region from tRNA Lys to Cytb from P. megacephalus and other related frogs (Zhang et al., 2005b). It was also reported that the amplified fragments from these tree frogs were apparently shorter than those of other frogs, suggesting the absence of the ND5 gene from its original position. However, this study indicates that this evidence could be attributed to translocation instead of gene loss.
(3) In addition, it was found that the copied CRs in P. megacephalus's mitogenome exhibited highly similar sequences to one another at the 5 region (100% across 38 × 6 = 228 bp) ( Fig. 4 and Fig. S5), which may have resulted in assembly error (i.e., the 5 region of CR1 and CR2 were misassembled or combined as a single CR). Thus, only one CR was observed and the absence of the ND5 gene was reported (Zhang et al., 2005b). The misdiagnosis of an ''absent'' ND5 gene could be attributed to the errors in the estimation of long-range PCR fragment size, assembly, and alignment.

CONCLUSION
We successfully resequenced and revised the entire mitochondrial genome sequence of Polypedates megacephalus and found the ND5 gene located between two control regions, although a previous study had suggested this gene to be absent. Our experiments indicated that more accurate results can be obtained using Sanger sequencing in combination with next-generation sequencing. Our subsequent experiments suggest that all species of genus Polypedates might have the same mitochondrial gene arrangement. Further investigations focusing on the mitochondrial gene arrangement of other tree frogs will enhance the understanding of the molecular mechanisms and evolutionary history behind the phylogenetic pathway.
• Song Li conceived and designed the experiments, authored or reviewed drafts of the paper, approved the final draft.
• Mingwang Zhang authored or reviewed drafts of the paper, approved the final draft.

Animal Ethics
The following information was supplied relating to ethical approvals (i.e., approving body and any reference numbers): This study about P. megacephalus was approved by the Committee of the Ethics on Animal Care and Experiments at Sichuan Agricultural University (CEACE) and was carried out in accordance with the guidelines stated by CEACE (S20171001).

Field Study Permissions
The following information was supplied relating to field study approvals (i.e., approving body and any reference numbers): Field permit S#2017-42 was granted by the Nature Reserve Administration of Shiwandashan National Forest Park.

Data Availability
The following information was supplied regarding the deposition of DNA sequences: The mitochondrial sequence of Polypedates megacephalus described here are accessible via GenBank accession number MH936677 and the Sequence Read Archive (SRA) is PRJNA516153.
Raw data of the complete mitogenome sequence of tree frog Polypedates megacephalus is available at Figshare: Huang, An (2019)

Supplemental Information
Supplemental information for this article can be found online at http://dx.doi.org/10.7717/ peerj.7415#supplemental-information.