Rapid evolutionary divergence of diploid and allotetraploid Gossypium mitochondrial genomes

Cotton (Gossypium spp.) is commonly grouped into eight diploid genomic groups and an allotetraploid genomic group, AD. The mitochondrial genomes supply new information to understand both the evolution process and the mechanism of cytoplasmic male sterility. Based on previously released mitochondrial genomes of G. hirsutum (AD1), G. barbadense (AD2), G. raimondii (D5) and G. arboreum (A2), together with data of six other mitochondrial genomes, to elucidate the evolution and diversity of mitochondrial genomes within Gossypium. Six Gossypium mitochondrial genomes, including three diploid species from D and three allotetraploid species from AD genome groups (G. thurberi D1, G. davidsonii D3-d and G. trilobum D8; G. tomentosum AD3, G. mustelinum AD4 and G. darwinii AD5), were assembled as the single circular molecules of lengths about 644 kb in diploid species and 677 kb in allotetraploid species, respectively. The genomic structures of mitochondrial in D group species were identical but differed from the mitogenome of G. arboreum (A2), as well as from the mitogenomes of five species of the AD group. There mainly existed four or six large repeats in the mitogenomes of the A + AD or D group species, respectively. These variations in repeat sequences caused the major inversions and translocations within the mitochondrial genome. The mitochondrial genome complexity in Gossypium presented eight unique segments in D group species, three specific fragments in A + AD group species and a large segment (more than 11 kb) in diploid species. These insertions or deletions were most probably generated from crossovers between repetitive or homologous regions. Unlike the highly variable genome structure, evolutionary distance of mitochondrial genes was 1/6th the frequency of that in chloroplast genes of Gossypium. RNA editing events were conserved in cotton mitochondrial genes. We confirmed two near full length of the integration of the mitochondrial genome into chromosome 1 of G. raimondii and chromosome A03 of G. hirsutum, respectively, with insertion time less than 1.03 MYA. Ten Gossypium mitochondrial sequences highlight the insights to the evolution of cotton mitogenomes.

In terms of structure, angiosperm mitochondrial genomes are typically mapped as circular molecules with one or more larger (>1 kb) repetitive sequences, which promote active homologous inter-and intra-genomic recombination [4,18,19]. However, it is not clear how plant mitochondrial genomes rearrange so frequently or how the genome sizes can vary dramatically over relatively short evolutionary period. This dynamic organization of the angiosperm mitochondrial genome provides unique information as well as an appropriate model system for studying genome structure and evolution. More syntenic sequences will be helpful to interpret the evolutionary processes for diverse angiosperm mitochondrial structures.
Cytoplasmic male sterility (CMS) is a maternallyconferred reproductive trait that relies on the expression of CMS-inducing mitochondrial sequences [40]. Many examples of CMS stem from the consequences of recombination [40][41][42]. Often, these chimeric CMS genes exhibit co-transcription with upstream or downstream functional genes, which typically affect the mitochondrial electron transfer chain pathways to fail to produce functional pollen [43]. Rearrangements in the mitochondrial DNA involving known mitochondrial genes as well as unknown sequences result in the creation of new chimeric open reading frames, which encode proteins containing transmembrane and lead to cytoplasmic male sterility by interacting with nuclear-encoded genes [43][44][45].
Here, six Gossypium mitochondrial genomes are reported, including three diploid species from D genome groups (G. thurberi D 1 , G. davidsonii D 3-d and G. trilobum D 8 ) and three allotetraploid species from AD genome groups (G. tomentosum AD 3 , G. mustelinum AD 4 and G. darwinii AD 5 ). Comparative mitochondrial genome analysis then revealed rapid mitochondrial genome rearrangement and evolution between diploid and allotetraploid Gossypium. In addition, one of the most surprising outcomes of comparative analyses is how rapidly mitochondrial sequence segments altered within a single subspecies. Finally, the four mitogenomes of D group species provided the useful data resources for interpreting the CMS-related genes in G. trilobum D 8 cotton.

Plant materials and mitochondrial DNA extraction
Seeds of diploid and allotetraploid Gossypium species were acquired from the nursery on the China National Wild Cotton Plantation in Sanya, Hainan, China. Mitochondria were isolated from week-old etiolated seedlings, and the mitochondrial DNA samples were extracted from an organelle-enriched fraction isolated by differential and sucrose gradient centrifugation, essentially as described earlier [37][38][39]46].

Mitochondrial genome sequencing and primary data processing
A total of~5 million clean paired-end reads were sequenced from a~500 bp library for each of three diploid species, respectively. We produced 300 bp read length with paired-end sequencing, using MiSeq sequencing method on Illumina platform at Beijing Biomarker Technologies Co, LTD. A total of~11 million clean paired-end reads were sequenced from a~500 bp library with paired-end, 300 bp read length, for each of three allotetraploid species, respectively, using the same method. Raw sequences were first evaluated by two quality control tools, Trimmomatic [47] and FilterReads module in Kmernator ([https://github. com/JGI-Bioinformatics/Kmernator]) to remove any potential undesirable artifacts in the data such as adapters or low quality or "N" bases and so on.

Genomes assembly and sequence verification
Six Gossypium draft mitogenomes were assembled de novo from the clean reads with velvet 1.2.10 [48] or combining FLASH [49] and Newbler (Version 2.53) methods, respectively. For the first assembly method, i.e., the 300-bp paired-end reads from six Gossypium species, we performed multiple velvet runs with different combinations of kmer values (for (kmer = 75; kmer <=209; kmer = kmer +2), (42 in total)). Three Kmer values (193,195,197), owning larger N50 values, less contig number, were used to assemble the mitogenomes. For each velvet run, the minimum coverage parameter was set to 10× and scaffolding was turned off when the data sets contained paired-end reads. For each of assembly, mitochondrial contigs were identified by blastn [50] searches with known Gossypium mitochondrial genomes for scaffolding and gap filling [37][38][39]. The best draft assemblies for six Gossypium were chosen as the assembly that maximized total length of mitochondrial contigs after combining three Kmer values assembly. In another assembly method, we combined FLASH [49] and Newbler (Version 2.53) softwares together. First, FLASH provides the use of paired-end libraries with a fragment size (500 bp) shorter than twice the read length (300 bp) an opportunity to generate much longer reads (500 bp) by overlapping and merging read pairs [49]. The merging file was then assembled using Newbler (Version 2.53) software. Finally, the assembled mitochondrial scaffolds were aligned with known Gossypium mitochondrial genomes [37][38][39] for anchoring scaffold directions and gap filling. Thus, we combined two types of the assembly results to complete the six Gossypium mitogenomes. The final remaining gaps were filled by aligning individual pair-end sequence reads that overlapped the scaffolds or contig ends using Burrows-Wheeler Aligner (BWA 0.7.10-r789) software [51].
To evaluate six mitogenomes sequence assembly quality and accuracy, pair-end reads were mapped onto their respective consensus sequences with BWA 0.7.10-r789 [51]. The BWA mapping resulting SAM files were transformed into BAM files using samtools view program [52]. The BWA mapping results for these pair-end reads in BAM files were then used to calculate depth of sequencing coverage through samtools depth program [52]. For all six Gossypium species, the Illumina reads covered all parts of the genome consistently, with the average coverage ranging from 50× to 200 × .

Genome annotations and sequence analyses
Gossypium mitochondrial genes from the six species were annotated using G. hirsutum and G. barbadense mitogenomes as references. Functional genes (other than tRNA genes) were identified by local blast searches against the database, whereas tRNA genes were predicted de novo using tRNA scan-SE [53]. Repeat-match program in MUMmer [54] was used to identify repeated sequences within six Gossypium mitogenomes. Their genome maps were generated using OGDRAW [55] and the repeat map was drawn by Circos [56].
Collinear blocks were generated among the ten mitochondrial genomes of Gossypium using the progressiveMauve program [57]. To determine the amount of Gossypium mitochondrial genome complexity shared between species, each pair of mitogenomes was aligned using blastn [50] with an e-value cutoff of 1 × 10 −5 . Using these parameters, the blastn searches should be able to detect homologous sequences as short as 30 bp. The unique segments in Gossypium mitogenomes identified in this study were summarized as follows: i) Paired-end reads were mapped onto their respective consensus sequences using the Burrows-Wheeler Aligner (BWA 0.7.10-r789) software [51]; ii) The BWA mapping resulting SAM files were transformed into BAM files using the samtools program [52] set to the default parameters; and iii) Structure variations (SVs) and InDels reported in this work were manually visualized using the Integrative Genomics Viewer (IGV) software [58].

RNA editing identification
RNA edit sites were computationally predicted using the batch version of the PREP-Mt. online server [59], with a cutoff value of 0.2.

Phylogenetic analyses and estimation of evolutionary divergence
For phylogenetic analyses, 36 protein-coding genes were extracted from 10 Gossypium species and two outgroups: C. papaya and A. thaliana. Sequence alignments for 36 concatenated genes, each chloroplast and mitochondrial coding exons were carried out by MAFFT [60]. Phylogenetic analyses were performed with the same methods to our previous studies [35,36,39]. P-distances for chloroplast and mitochondrial coding genes were calculated with MEGA5.05 [61].
Identifying nuclear mtDNAs in Gossypium, estimation of evolutionary divergence and divergence time between mitochondrial sequences and numts Dot matrix comparisons were generated between the mitochondrial and nuclear chromosomes of four Gossypium species using the nucmer program of MUMmer with the parameters 100-bp minimal size for exact match and 500-bp minimal interval between every two matches [54]. The detailed comparison results were shown in Fig. 7: G. raimondii mitochondrial [39] and nuclear chromosomes [27] in Fig. 7a, G. arboreum mitochondrial [39] and nuclear chromosomes [28] in Fig. 7b, G. hirsutum mitochondrial [37] and nuclear chromosomes [30] in Fig. 7c, G. barbadense mitochondrial [38] and nuclear chromosomes [32] in Fig. 7d. Sequence alignments for each coding, intronic, and intergenic spacer regions were carried out by MAFFT [60] software. P-distances between mitochondrial sequences and numts were calculated with MEGA5.05 [61]. In order to estimate how old these insertions are, p-distance rates and some estimates of rate/million years were studied here. The divergence time of between mitochondrial native sequences and numts was calculated by the following Formula: T = p-distance/(r nu + r mt ) [62]. Based on Gaut et al. (1996) and Muse et al. (2000), the r nu and r mt values were estimate as r nu = 6.5 × 10 −9 and r mt = 2 × 10 −10 , respectively [63,64]. It also has to be made clear that the underlying assumption is homogeneity in rate since their divergence from a common ancestor.

Results
Gossypium mitochondrial genomes from diploid and allotetraploid species Six Gossypium mitochondrial genomes were obtained in present study, including three diploid D species and three allotetraploid AD species. Complete mitochondrial DNA sequences were deposited in the GenBank database respectively: Table 1). The six Gossypium mitogenomes were all assembled as single circular molecules of lengths about 644 kb in diploid (Additional file 1: Figure S1A) and 677 kb in allotetraploid (Additional file 1: Figure S1B), respectively. The genomic structures were identical within diploid group (Fig. 1) and allotetraploid group (Fig. 2), respectively, but differed between the two groups. The diploid group had six large repeats (>1 kb) whereas the allotetraploid group had four large repeats ( Fig. 1; Fig. 2), which may be involved in the rearranged mitogenome organizations in diploid and allotetraploid Gossypium.
Comparison of mitochondrial gene content among Gossypium species reveals a conserved pattern of evolutionary stasis for diploid and allotetraploid species, respectively ( Table 1). The Gossypium mitogenomes contain 36 protein coding genes with five genes (rps1, rps2, rps11, rps13 and rps19) being lost during coevolution with nucleus, compared to the common ancestor of seed plants [65,66]. The repeat sequences confer some redundant gene copies (nad4, nad9 and mttB) in three allotetraploid species with uncertain functions (Additional file 2: Table S1). These mitochondrial genomes (Table 1) show high identity in gene content but no similarity in genome organization ( Fig. 1; Fig. 2) with each other or with previously published cotton mitochondrial genomes [37][38][39], with apparent major differences in genome organization and size.

Syntenic regions and rearrangement
After combining four other Gossypium mitochondrial genomes [37][38][39], totally, ten species of five allotetraploid and five diploid were used for analyses: one from A genome, four from D genome and five from AD genome groups, respectively (Table 1). Syntenic regions were identified between ten Gossypium mitochondrial genomes with eight large major syntenic blocks (Fig. 3). The genomic structures were totally identical in four species of D group, indicating that the mitochondrial genome structures may be highly conserved in D genome species. G. trilobum (D 8 ) contributed the cytoplasmic male sterility (CMS) cytoplasm in cotton [67][68][69], however, no genome rearrangement or large indel segments variations compared with mito-genomes of other D species, implying that the mitochondrial CMS-associated gene in cotton may function with different mechanism.
In addition, compared to D group species, the mitochondrial genomic structure in A group (A 2 ) was highly rearranged (Fig. 3). Interestingly, genome rearrangements also occurred among the five allotetraploid species, as was already reported in the G. hirsutum -G. barbadense comparison [38]. Despite the fact that three allotetraploid species (G. tomentosum AD 3 , G. mustelinum AD 4 and G. darwinii AD 5 ) exhibited the same genome organization, a disorder existed between the mitogenomes of G. hirsutum AD 1 and G. barbadense AD 2 (Fig. 3).

Gene order and repeat sequences
To uncover the formation mechanism of recombination generating multiple genomic arrangements in Gossypium, we presented the gene order with five major linear models and genes located in repeat regions shown in bold (Additional file 3: Figure S2). The gene orders in the D genome species are highly conserved but not identical to that in either G. arboreum (A 2 ) or the AD groups with six-seven gene clusters scattered. Though there exists a few changes in mitochondrial gene order within each of five models in the three Gossypium lineages as shown by ten released mitogenomes, a minimum of two and three changes (inversions and translocations) need to be invoked to explain the differences of gene order among diploid and allotetraploid Gossypium, respectively (Fig. 3), and how these genomic rearrangements events happened are difficult to reconstruct. Repeat sequences have been suggested to serve as sites of homologous recombination, resulting in gene order changes in mitochondrial genomes [19]. The repeat sequences detected in the Gossypium mitogenomes in present and earlier studies [37][38][39] may be responsible for mitochondrial gene order changes between diploids and allotetraploids ( Table 2). There mainly existed six or four large repeats in D group or (A + AD) groups, respectively. The repeat sizes were almost identical in D group but differed in (A + AD) groups. Despite a big Genes exhibited on the inside of outer circles are transcribed in a clockwise direction, while genes on the outside of outer circles are transcribed in a reverse direction. The inner circle reveals the distribution of repeats in two mitogenomes with curved lines and ribbons connecting pairs of repeats and width proportional to repeat size. The red ribbons represent > = 1 Kb repeats, the very deep green lines represent repeats between 100 bp to 1 Kb and the very light grey lines represent repeats <100 bp. The numbers give genome coordinates in kilobase deletion, about 50 kb that occurred in R1 of G. hirsutum (AD 1 ), a 27 kb repeat was unique to the AD group. In addition, the repeat diverged considerately between the two diploid Gossypium groups (Table 2). In addition, genes in the border of the gene clusters in Gossypium were almost located in or close to the repeat sequences (Additional file 3: Figure S2). These variations in repeat sequences may perhaps cause the major inversions and translocations within the mitochondrial genome of the common ancestor shared by D-A and A-AD species after Gossypium had diverged. Evolution of gene order in diploid D group mitogenomes of Gossypium is overall quite conservative, but exists divergence between different diploid and allotetraploid lineages.

Conservation and variants in Gossypium mitochondrial genomes
Considering that all the Gossypium mitogenomes have similar genome complexity, comparative analysis were conducted to determine the proportion of the sequences that each shared in common with the others ( Table 3). One of the most surprising outcomes is how rapidly sequence segments were gained or lost. Genome specific fragments is not present in any two genomes of the D or AD groups, respectively (Table 3). While reciprocity is generally not seen in any other comparisons, even between the two diploid mitogenome groups: G. arboreum (A 2 ) lost 2.97% of the sequences present in D group, but D group lost 0.77% of the A species' sequences. G. arboreum (A 2 ) is attributed to be the putative maternal contributor to the progenitor of AD group [21,34,70], however, each of the five AD group genomes has lost substantial amounts of sequence that is present in the G. arboreum (A 2 ) genomes and vice versa ( Table 3). The difference is more striking when comparing mitogenomes of D group and AD group: D group species lost only 0.64% of the AD group mitogenomes, but AD group species lost 4.41% of the D group mito-genomes. Reciprocal differences were more apparent in the comparisons between male-fertile and CMS (cytoplasmic male sterility) mitochondrial genomes [71][72][73]. In fact, the genome complexity in Gossypium presented eight unique segments ranging from 108 bp to 7888 bp in length in D group mitochondrial genomes, comprising a total of 18,194 bp (Indels of <100 bp were not included) (Fig. 4a, showing one of the unique segments); while three specific fragments detected in (A + AD) group mitochondrial genomes with the largest size 3876 bp in length, 4315 bp in total (Fig. 4b). In addition, a large segment more than 11 kb in length that is present in the diploid mitogenome is not present in any of the other five allotetraploid mitogenomes (Fig. 4c). Despite the fact that the ancestor of A-genome group is the maternal source of extant allotetraploid species [20][21][22][23]34], unique presence/absence variations existed as well ( Fig. 4c; Additional file 4: Figure S3).

RNA editing in Gossypium mitochondrial genes
Post-transcriptional RNA-editing of mitochondrial genes is both ubiquitous and important for regulation [74]. Typically, RNA editing of mitochondrial transcripts in flowering plants occurs in coding regions of mitochondrial transcripts to convert specific cytosine residues to uracil (C → U) [75,76]. For ten Gossypium species, we predicted sites of C-to-U editing using the PREP-Mt. online tool [59] with a cutoff of 0.2. The number of predicted C-to-U edits across the entire coding regions of their shared 36 protein genes is almost similar for ten Gossypium species (451), with one editing site lower in G. hirsutum (450 sites) caused in nad3 gene ( Table 4). The simplest interpretation of these results is that the     whole of edit sites in ten Gossypium species were present in their common ancestor, while the species-specific sites are less derived in cotton.

Mitochondrial genome evolution in Gossypium
Phylogenetic relationships among 10 Gossypium species with two outgroups, was generated using a concatenated analysis of 36 mitochondrial protein-coding genes (Fig. 5).
The topology of the resulting tree supports G. arboreum as the maternal donor to polyploid cotton species, which further supports our former result [39]. We mapped these specific indels into the phylogenetic clades, as shown in Fig. 5, which implied an ongoing dynamic divergence process. First, eight mitogenome fragments (U1-U8) were involved in loss events after G. arboreum (A 2 ) diverged from a common ancestor shared with the D genome group. Subsequently, three genome fragments (U9-U11) were transferred from the nucleus to the mitochondrial genome in an A-genome ancestor or contributor/donator before the formation of the allopolyploidization event.
Finally, a large genome fragment (U12), about 11 kb, was lost during the divergence process of allotetraploid ancestor species (Fig. 5). This 11 kb deletion (corresponding to U12 in diploid mitogenomes) was adjacent to the specific repeat sequence R2 in AD group, which might lead to the formation of R2 repeat sequences (unique to AD group species) during evolution. In addition, variations in repeat and Indels lengths also cause the great difference of Gossypium mitochondrial genome sizes ( Fig. 3 and Fig. 5). MtDNA intergenic regions are known to possess more unique segments than genic regions, however, shorter repeats account for the relatively small size of the D-group mitochondrial genomes. Interestingly, a large deletioñ 50 kb in length in R1 may lead to the small size of G. hirsutum (AD 1 ) [37], compared to the other four allotetraploid genomes. Comparative mitochondrial genome analysis revealed rapid mitochondrial genome rearrangement and evolution even within a single subspecies.
In addition, we calculated the p-distances representing evolutionary divergence from 78 chloroplast and 36 mitochondrial protein-coding exons among 10 Gossypium species, as shown in Fig. 6. Here, the average evolutionary divergence was 0.0031 in chloroplast genes but only 0.0005 in mitochondrial genes among 10 Gossypium species. The mitochondrial genes were highly conserved with low evolutionary divergence, however, their genome structures displayed the extremely rapid evolution of various changes, including repeat and large indels variations. Based on these results, the evolutionary distance of mitochondrial genomes are much lower than the chloroplast genomes in Gossypium, however, rapid varying mitogenome structures evolves much faster than the highly conserved chloroplast genomes [35][36][37][38][39].

MtDNAs insert into the nuclear chromosomes in Gossypium
In this study, four sets of mitochondrial and nuclear genomes of Gossypium species (two diploids and two allotetraploid) were analyzed. Numts in four Gossypium nuclear genomes were detected by whole-genome alignment. Dot matrix analysis of mitochondrial vs nuclear genomes in G. raimondii (D 5 ) show that there is a stretch of~598 kb (92.91%) of sequence that is nearly identical to that of the G. raimondii mitochondrial genome (Fig. 7a) in chromosome 1. This insertion is at least 99.80% identical to the mitochondrial genome, suggesting that the transfer event was very recent. The organization of the assembled mitochondrial genome differs from that of the mitochondrial DNA in the nucleus with an internal deletion (Fig. 7a), which might occur during or after transferring and represent an alternate isoform of the G. raimondii mitochondrial genome. In addition, G. hirsutum also has a nearly complete NUMT on chromosome A03 (Fig. 7c), and small to median-large fragments of mitochondrial DNA have been identified in three Gossypium species nuclear genomes (Fig. 7b-d), showing apparently sporadic fragmentation compared to G. raimondii. So much noise in Gossypium nuclear chromosomes of ( Fig. 7b-d) are just repetitive derived elements. These results may be caused by the insertion of retrotransposon elements into mitochondrial DNA insertions that may contribute significantly to their fragmentation process in the other three nuclear genomes.
In addition, most numts had >99% nucleotide identity to the homologous organelle sequences, so the lack of divergence in G. raimondii indicates that they must have been transferred to the nucleus recently. In order to estimate how old these insertions are, p-distance rates and some estimates of rate/million years between mitochondrial sequences and numts were studied here. We have dated 20 larger NUMTs in G. raimondii (Additional file 5: Table S2), 16 larger NUMTs in G. arboreum (Additional file 6: Table S3), 15 larger NUMTs in G. hirsutum (Additional file 7: Table S4) and 12 larger NUMTs in G. barbadense (Additional file 8: Table S5). These data showed that the insertion time of NUMTs was close among one chromosome, but with big divergence between different chromosomes. For example, the different insertion time for five larger NUMTs in chromosome A03 of G. hirsutum (ranging from 0.33-1.03 MYA), with other chromosomes (insertion time ranging from 0.91-11.43 MYA) (Additional file 7: Table S4).

Discussion
From the perspective of divergence, Gossypium originated from a common ancestor approximately ten million years ago and an allopolyploidization event occurred approximately 1.5 million years ago [35,36]. Plant mitochondrial genomes have experienced myriad synteny-disrupting rearrangements even over a very short evolutionary timescale. Like most angiosperm mitogenomes abundant in repeat sequences with larger repeats mediating recombination at moderate to high frequency [19,77], these recombination events generated multiple mito-genomic arrangements differed in Gossypium genome groups, which may be largely caused by both larger repeats and some key InDels or SVs during evolution, and quickly eroded synteny even among closely related plants [8,72,73,78]. These cotton mitochondrial genomes diverged much, as indicated by the InDels events unique to A genome species, D and AD groups, respectively. All these structural variants (SVs) are located in intergenic regions of mitogenomes. Some of them overlapped with their breakpoints and junctions occurring in repetitive and homologous genomic regions. And insertions or deletions were mostly generated from crossovers of repetitive regions or homologous regions [79].
There existed apparent inversions and translocations, which can offer clues to explain gene order differences of mitogenomes between different Gossypium groups. For examples, the mode of gene order changed by inversions and/or translocations was presented in early land plant mitochondrial genomes evolution of bryophytes [80] as well as rapidly rearranged mitochondrial genomes of vascular plants [71-73, 81, 82]. Apart from apparent rearranged mitogenome organizations in diploid and allotetraploid Gossypium, mitochondrial genome rearrangements have also been detected in diploid and allotetraploid species of Brassica [74,83]. Generally, apparent variations in the mitogenome structures were always tested to be associated with cytoplasmic male sterility (CMS) and its maintainer lines [71][72][73], thus a new mitochondrial gene was produced by recombination and conferred CMS with its encoded protein interacted with the nuclear encoded mitochondrial protein to cause a detrimental interaction [43]. However, no genome rearrangement or large indel segments variations compared with mitogenomes of other D species, implying that the mitochondrial CMS-associated gene in cotton may function with different mechanism.
In addition, RNA-editing sites in Gossypium may not be in charge of cytoplasmic male sterility in D 8 cotton. RNA editing events have been compared in eight mitochondrial genes (atp1, atp4, atp6, atp8, atp9, and cox1,   [75]. Although the frequencies of RNA editing events between mtDNA genes were different, no differences between cotton cytoplasms that could account for the CMS phenotype or restoration. In view of these results, the complete mitogenome sequences will provide the useful data resources for targeting the CMS-related genes in G. trilobum D 8 cotton in further studies. As for MtDNAs insert into the nuclear chromosomes in Gossypium, Lin et al., (1999) and Stupar et al., (2001) also identified an intact mtDNA copy on chromosome 2 in the nucleus of Arabidopsis with more than 99% identity, which proved this type of mitochondrion-to-nucleus migration event [84,85]. Second, these mitochondrion-to-nucleus migrations proved to be the independent events after the divergence of the Gossypium progenitors. These genome changes within the diploid and allotetraploid Gossypium species is worthy of more attention in future studies.

Conclusions
Plants mitochondrial genomes are evolutionarily intriguing because of the highly conserved genic content and slow rates of genic sequence evolution [18,82]. These features contrasted sharply with the highly labile genomic structure, genome size, DNA repair mechanism and recombination induced by different types and origins of repeated sequences [82,[86][87][88]. Whole mitogenome sequences have been released in an ongoing process [9,11,38,81,89], which provide information for dissecting the evolutionary modifications in these genomes, such as gene loss [88], sequence acquisitions or loss [9], multiple sequence rearrangements [73] and dynamic structure evolution [38,39]. Here, we presented six more cotton mitochondrial genomes, which showed apparently distinct divergence. Despite the short divergence time separating diploid and allotetraploid cotton species [35,36], many of the hallmark features of mitochondrial genome evolution are evident, Fig. 7 Mitochondrial DNAs insertions into four Gossypium nuclear genomes detected by whole-genome alignment. The results were filtered to select only those alignments which comprise the one-to-one mapping between reference and query, and then display a dotplot of the selected alignments. The red and blue lines refer positive and reverse matches, respectively. a: Dot matrix analysis of numts in Gossypium raimondii (D 5 ) nuclear genome performed using MUMmer (Delcher et al., 2002). b: Dot matrix analysis of numts in G. arboreum (A 2 ) nuclear genome. c: Dot matrix analysis of numts in G. hirsutum (AD 1 ) nuclear genome. d: Dot matrix analysis of numts in G. barbadense (AD 2 ) nuclear genome including differential genic content, genome rearrangements, inversion and translocation, gains/losses of multiple small and large repeats, presence/absence variations, and the mitogenome of G. trilobum D 8 cotton for targeting CMS-associated gene. Comparative analyses illustrated that four of the outcomes are quite surprising, including: 1) how rapidly mitochondrial genome rearrangements occur within a single subspecies (diverged~10 mya), 2) how rapidly mitochondrial sequence segments are gained or lost, 3) RNA editing events were almost conserved in ten Gossypium mitogenomes, and 4) a previous unusual report of the integration of 93% of the mitochondrial genome of G. raimondii into chromosome 1 is confirmed with an estimation of insertion time 0.05 MYA. Increasing insight into the mechanisms and functional consequences of plant mitochondrial genome variation are expected to be helpful to elucidate the process of rapid evolutionary divergence mechanism between closely related mitochondrial genomes.