Insights from the Complete Chloroplast Genome into the Evolution of Sesamum indicum L.

Sesame (Sesamum indicum L.) is one of the oldest oilseed crops. To investigate its evolutionary characteristics as part of the Sesame Genome Project, we sequenced, in addition to the nuclear genome, the complete chloroplast genome of S. indicum cv. Yuzhi 11 (white seeded) using Illumina and 454 sequencing. The chloroplast genome of S. indicum was then compared with those of 18 other higher plants. The chloroplast genome of cv. Yuzhi 11 contains 153,338 bp and a total of 114 unique genes (KC569603). The number of chloroplast genes in sesame is the same as that in Nicotiana tabacum, Vitis vinifera and Platanus occidentalis. Variation in the length of the large single-copy (LSC) regions and inverted repeats (IR) between sesame and the 18 other higher plant species was the main contributor to size variation in the cp genome in these species. The 77 functional chloroplast genes, except for ycf1 and ycf2, were highly conserved. The deletion of the cp ycf1 gene sequence in cp genomes may be due either to its transfer to the nuclear genome, as has occurred in sesame, or to direct deletion, as has occurred in Panax ginseng and Cucumis sativus. The sesame ycf2 gene is only 5,721 bp in length and has lost about 1,179 bp. Nucleotides 1-585 of ycf2, when queried in BLAST, had hits in the sesame draft genome. Five repeats (R10, R12, R13, R14 and R17) were unique to the sesame chloroplast genome. We also found that IR contraction/expansion in the cp genome alters its rate of evolution. Chloroplast genes and repeats display the signature of convergent evolution in sesame and other species. These findings provide a foundation for further investigation of cp genome evolution in Sesamum and other higher plants.


Introduction
Sesame (Sesamum indicum L., 2n = 26), which belongs to the Pedaliaceae family, is one of the oldest and most important oilseed crops [1]. The history of its cultivation can be traced back to 3050-3500 BC in the Harappa Valley of the Indian subcontinent [2]. Currently sesame is grown worldwide in tropical and subtropical regions with a total area of about 7.8 million hectares, and annual production of 3.84 million tons (2010, FAO). Sesame as an oilseed crop has one of the highest oil-contents at 50-60% [1,3] and is mainly used for oil and food [4,5].
S. indicum is in the asterids clade of the core eudicotyledons in Angiosperm Phylogeny Group 2 (APG 2) (Angiosperm Phylogeny Group, 2003). In a comparison with 36 plant species from 19 families using publicly available genomic datasets (NCBI), Sesamum is closely related to members of the Solanaceae and Phrymaceae families, but distant from other oil crops such as soybean (Glycine max), castor (Ricinus communis) and rape (Brassica rapa) [6]. The chloroplast (cp) genome sequence of S. indicum cv. Ansanggae (a black-seeded cultivar) was published recently [7]. Its phylogenetic position suggests that Sesamum is a sister genus to Olea and Jasminum (Oleaceae family) and is located in the core lineage of the order Lamiales [7]. However, the origin and phylogeny of Sesamum still require clarification [1,8]. The evolutionary process and the relationship between sesame and other oil crops have not been explored using genomic data.
The chloroplast is a vital plastid in plants and algae, containing all the enzymatic machinery required for plant photosynthesis and related genomic information [9,10]. It is regarded as one of the most important indices for comparative evolutionary analysis and molecular taxonomy, as the cp genome is relatively conserved and independent of the nuclear genome [11]. In most plants, the circular cp genome is 120-160 Kb, and usually contains about 4 rRNAs, 30 tRNAs and 80 protein-coding genes related to photosynthesis or gene expression [12,13]. The cp genome is present at high copy number and has been used in genetic modification and crop breeding studies [14-18]. To date, more than one hundred cp genomes have been sequenced (Chloroplast Genome DB, http://chloroplast.cbio.psu.edu).
As part of the ongoing Sesame Genome Project (www.sesamum.org), we have used Illumina and 454 sequencing to sequence and assemble the complete cp genome of cv. Yuzhi 11. We have also performed a comparative evolutionary analysis of cp genomes between sesame and other major crops using publicly available genomic datasets, thus revealing some features of sesame evolution.

Materials and Methods
Plant material and isolation of sesame cp genome DNA
Yuzhi 11 (white-seeded), a major Chinese domestic cultivar and the cultivar we used to sequence the sesame genome [6,19], was grown at Yuanyang Experimental Station, Henan Academy of Agricultural Sciences (HAAS) in 2011. Approximately 100 g of young leaf tissue was harvested for extraction of cp genome DNA.
Intact chloroplasts of S. indicum were collected by sucrose density gradient centrifugation [20]. Fresh leaves were fully homogenized in chloroplast isolation buffer (0.3 M Sorbitol, 5 mM EDTA, 5 mM MgCl2, 1 mM DTT, 5 mM KH2PO4, 5 mM K2HPO4, 10 mM 2-Mercaptoethanol and 2 mM Ascorbic acid) at 0°C. In order to remove the cell wall debris and unbroken cells, the homogenate was gently filtered through 8 layers of cheesecloth and then centrifuged at 200 g for 5 min at 4°C before resuspending in chloroplast isolation buffer. Intact chloroplasts were purified by further sucrose density gradient centrifugation at 2,500 g for 15 min and then at 3,500 g for 30 min. Chloroplast genome DNA was isolated using chloroplast lysis buffer (10 mM Tris, 2% sodium dodecyl sulphate and 0.4% sodium N-lauroylsarcosine). Proteinase K and RNase were added to remove all protein and RNA from the cp DNA solution. The quality of cp genome DNA was analyzed by pulsed-field gel electrophoresis (PFGE). 20 μg of cp genome DNA was prepared for constructing a Solexa library, and an equal amount of DNA was reserved for 454 sequencing and gap-filling.

High-throughput sequencing
We sequenced the sesame cp genome using both Illumina and 454 sequencing. High-throughput sequencing of the S. indicum cp genome was first carried out on an Illumina GA IIx platform. Paired-end and mate-pair libraries with insert sizes of 500 bp and 3 Kb, respectively, were constructed using proprietary reagents according to the manufacturer's recommended protocols (https://icom.illumina.com/). The libraries were denatured and then diluted in hybridization buffer before loading into an Illumina GA flowcell. 101×2 cycle sequencing was performed according to the manufacturer's instructions. To accurately assign the repeat regions to the cp genome, Roche 454 reads from paired-end (PE) libraries with an insert size of 8 Kb were also used. The Roche 454 reads were generated in the Sesame Genome Project (www.sesamum.org) [6], and ranged from 64 to 1,199 bp in length. Construction of the 8 Kb PE library and 454 sequencing were performed according to the protocols described by Jarvie and Harkins [21].

Cp genome assembly
Raw reads generated by Illumina-Solexa GAIIx were preprocessed using SolexaQA [22]. Low-quality bases (Q<13) were trimmed and all reads shorter than 25 bp were discarded. Trimmed reads were re-paired with an in-house Perl script. To assemble the cp genome efficiently, the following procedure was used (see Figure S1). All quality-filtered paired reads were mapped against the cp genomes of Ageratina adenophora (NCBI: NC_015621.1) and Olea europaea (NCBI: NC_015623) using the BWA-SW algorithm with default parameters [23]. The reads recovered in this way were considered to derive from the sesame cp genome. All mapped reads and their mates were then assembled de novo using Velvet [24]. Subsequently, Roche 454 raw reads from the sesame nuclear and cp genomes were aligned to the contigs generated by Velvet using GS Reference Mapper (454 Life Sciences). The mapped Roche 454 reads were likewise considered to derive from the sesame cp genome. GS De Novo Assembler (v2.6) was used to assemble the extracted Roche 454 reads into a draft genome. Potential gaps and the IR (inverted repeat, a collapsed consensus of IRa and IRb), LSC (large single-copy) and SSC (small single-copy) regions of the draft genome were identified by aligning it to the cp genome of A. adenophora with BLAST. PCR walking and capillary electrophoresis sequencing (ABI 3730xl sequencer) were performed to fill the gaps and to verify the junctions between the single-copy and IR regions. The primers used in this step were designed using Consed (v 20.0). After gap-filling, the Illumina-Solexa reads were mapped with BWA to verify the bases and to correct potential assembly errors.
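The quality-filtering step described above can be sketched in a few lines of Python. This is only an illustration of the stated thresholds (Q<13 trimmed, reads shorter than 25 bp discarded) and not the SolexaQA implementation; the function name `trim_read` and the choice to keep the longest contiguous run of bases at or above the threshold are assumptions made for the example.

```python
def trim_read(seq, quals, min_q=13, min_len=25):
    """Keep the longest contiguous run of bases with quality >= min_q.

    Returns the trimmed sequence, or None when the surviving run is
    shorter than min_len (the read would be discarded).
    Hypothetical helper; the trimming rule is an assumption.
    """
    best_start, best_len = 0, 0
    run_start = None
    for i, q in enumerate(quals):
        if q >= min_q:
            if run_start is None:
                run_start = i          # a new high-quality run begins
            run_len = i - run_start + 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_start = None           # low-quality base ends the run
    if best_len < min_len:
        return None
    return seq[best_start:best_start + best_len]


# Illustrative read: 5 poor bases, 30 good bases, 5 poor bases
read = "ACGT" * 10
quals = [2] * 5 + [30] * 30 + [5] * 5
print(trim_read(read, quals))
```

Reads for which the function returns None would lose their mate pairing, which is why a re-pairing step is needed afterwards.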

Bioinformatics analysis
The S. indicum cp genome was annotated with DOGMA (Dual Organellar GenoMe Annotator) [25]. A circular map of the sesame cp genome was drawn using Circos [26]. Repeats and Inverted Repeats (IR) within the sesame cp genome were identified using REPuter, with a length cutoff of ≥30 bp and a sequence identity of ≥90% [27]. Protein-coding and noncoding sequences from S. indicum and the 18 species were aligned using MEGA 5.0 with the MUSCLE-codon (Multiple Sequence Comparison by Log-Expectation) and MUSCLE models, respectively [28]. Sequence alignments at the whole cp genome level were also performed in MEGA 5.0 with the MUSCLE model. Default settings were used in all of the above sequence alignments. Ka (nonsynonymous substitution rates), Ks (synonymous substitution rates) and their ratio were calculated by the KaKs_Calculator program [29] using MA (Model Averaging).
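To make the repeat criteria concrete, the following minimal sketch finds forward and inverted (reverse-complement) repeats of a fixed length by exact k-mer matching. It is not REPuter: it ignores the ≥90% identity allowance and reports only exact matches of length k, and the function names are hypothetical.

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_exact_repeats(genome, k=30):
    """Return (forward, inverted) lists of start-position pairs (i, j), i < j.

    forward:  genome[i:i+k] == genome[j:j+k]
    inverted: genome[i:i+k] == revcomp(genome[j:j+k])
    Exact matches only; a simplification of identity-based repeat search.
    """
    index = defaultdict(list)
    for i in range(len(genome) - k + 1):
        index[genome[i:i + k]].append(i)

    forward, inverted = [], []
    for kmer, positions in index.items():
        # every pair of occurrences of the same k-mer is a forward repeat
        for a in range(len(positions)):
            for b in range(a + 1, len(positions)):
                forward.append((positions[a], positions[b]))
        # occurrences of the reverse complement are inverted repeats
        rc = revcomp(kmer)
        if rc in index:
            for i in positions:
                for j in index[rc]:
                    if i < j:
                        inverted.append((i, j))
    return sorted(forward), sorted(inverted)


# Toy sequence with k=4; a real cp genome would use k=30
print(find_exact_repeats("ACGGTACGG", k=4))
```

On a full cp genome the two IR copies would dominate the inverted list, so in practice one IR copy is usually masked or its hits filtered before reporting dispersed repeats.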

Results
Sequencing of the complete cp genome and its structure in sesame
After Illumina and Roche 454 sequencing, the Illumina and 454 raw reads were mapped using Velvet and GS De Novo Assembler, respectively. The mapped reads gave a coverage of approximately 218× of the cp genome. Using the 454 mapped reads, the draft genome was assembled into four scaffolds ranging from 10,567 to 65,797 bp in length. The draft genome covered 99% of the genome and contained the LSC, SSC and one IR region. After gap filling, identification of the single-copy and IR regions, and sequence verification, the complete cp genome of S. indicum cv. Yuzhi 11 was obtained. The sesame cp genome is a circular molecule containing a total of 153,338 base pairs (GenBank accession no. KC569603) (Figure 1). The three scaffolds of this cp genome were found to contain the inverted repeat (IR, a collapsed consensus of IRa and IRb, 25,142 bp), large single-copy (LSC, 85,180 bp) and small single-copy (SSC, 17,874 bp) regions. A total of 114 unique genes, encoding 80 proteins, 30 tRNAs and 4 rRNAs, were identified in the cp genome (Yuzhi 11 genotype). As shown in Figure 1, two copies of 8 protein-coding genes, 7 tRNA genes and 4 rRNA genes are present in the IR region. Of the 153,338 bp, protein-coding genes, tRNA genes and rRNA genes occupy 50.44%, 1.84% and 5.90%, respectively. There are 18 intron-containing genes in the cp genome, of which 16 contain one intron, and 2 (ycf3 and clpP) have two introns. The overall AT content is 61.8%. The AT content of protein-coding genes, tRNA and rRNA sequences is 61.78%, 47.34% and 44.73%, respectively.
To evaluate the degree of conservation of the sesame cp genome, we compared the cp genome of cv. Yuzhi 11 with that of cv. Ansanggae (NC_016433.2) (Table 1). The results showed only 14 differences, all within the nucleotide sequences of homopolymers. Each of the 14 homopolymers in the cv. Ansanggae cp genome was uniformly one base shorter than the corresponding homopolymer in the cv. Yuzhi 11 cp genome.
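The region lengths reported above can be checked against the total genome size with a short calculation, since the circular cp genome consists of one LSC, one SSC and two IR copies. The function name below is hypothetical; the numbers are those given in the text.

```python
def quadripartite_length(lsc, ssc, ir):
    """Length of a circular cp genome with one LSC, one SSC and two IR copies."""
    return lsc + ssc + 2 * ir


# Region lengths reported for cv. Yuzhi 11 (bp)
LSC, SSC, IR = 85_180, 17_874, 25_142

total = quadripartite_length(LSC, SSC, IR)
print(total)          # 153338, matching the reported genome size
print(80 + 30 + 4)    # 114 unique genes (protein-coding + tRNA + rRNA)

# Share of the genome occupied by the two IR copies, in percent
print(round(100 * 2 * IR / total, 1))
```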
The sequence and structure of the cp genome were then compared between S. indicum and 18 species with available nuclear genome sequences (listed in Table 2). The number of cp genes in sesame is the same as that in Nicotiana tabacum, Vitis vinifera and Platanus occidentalis (NCBI data). Among the 19 cp genomes examined, infA was missing in Arabidopsis thaliana, Brassica napus and Mangifera indica, and both infA and rpl22 were missing in Glycine max. Gene order in the sesame cp genome is highly conserved, being similar to that of N. tabacum, A. thaliana, P. occidentalis and B. napus, but different from that of G. max, Helianthus annuus and Gossypium hirsutum (NCBI data).

Variation in the length of cp genomes between sesame and 18 other plant species
In order to clarify the evolutionary position of the sesame cp genome among higher plants, we conducted a phylogenetic analysis using data from 19 cp genomes (Table 2). Results were consistent with previous reports (Figure S2) [6]. The sesame cp genome was smaller than that of 11 species such as V. vinifera, A. thaliana, G. hirsutum and N. tabacum, but larger than that of 7 species, i.e., B. napus, G. max, H. annuus and four Poaceae species (Table 2). The lengths of the LSC and IRs in sesame differed from those in the other 18 species and contributed to the variation in cp genome size. For example, the difference in size between the sesame and Panax ginseng cp genomes was 2,980 bp, with differences in the lengths of the LSC and IR sequences contributing 926 bp and 1,858 bp, respectively. The difference in size between the sesame and N. tabacum cp genomes was 2,605 bp, with the LSC and IR sequences contributing 1,506 bp and 402 bp, respectively. Variation in the length of IRs had a large effect on cp genomic evolution in A. thaliana, Coffea arabica, P. ginseng and the Poaceae.
We also compared the protein-coding sequences of the 19 cp genomes. The length of all 77 functional cp genes, except for ycf1 and ycf2, was highly conserved. Multiple sequence alignments were performed to examine the length variation of ycf1 and ycf2. While ycf2 genes were lost in the cp genomes of four grass species, the length of the ycf2 gene in the 14 species varied between 5,967 bp (Cucumis sativus) and 6,903 bp (V. vinifera) (Figure 2A and Figure S2). However, the ycf2 gene in the sesame cp genome was only 5,721 bp in length, and a fragment of about 1,179 bp was lost. In order to trace the missing sequence, the 1-1,179 bp region of the ycf2 gene of O. europaea was selected as a reference to screen the sesame genome sequence database (about 10× coverage) (www.sesamum.org) (Figure 2B and Figure S3). BLAST results showed that nucleotides 1-585 in the query had hits in the sesame draft genome, while nucleotides 586-1,179 could not be found. Multiple sequence alignments from the 15 species showed that the sesame ycf1 gene was shorter than those from 11 species, and the same fragments in three species, i.e., A. thaliana, H. annuus and B. napus, were evidently lost (Figure S4). Screening with the sesame ycf1 gene as a query indicated that the 1-1,000 bp fragment had highly similar hits (identities >90%) in the sesame draft genome.
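The two size comparisons above can be worked through numerically. The sketch below assumes that the region contributions are additive and that each IR figure already counts both IR copies, so that whatever is left after subtracting the LSC and IR contributions falls to the SSC region; that attribution is an assumption of this example, and the function name is hypothetical.

```python
def residual_difference(total_diff, lsc_diff, ir_diff):
    """Part of a genome-size difference not explained by LSC and IR (bp)."""
    return total_diff - lsc_diff - ir_diff


# (total difference, LSC contribution, IR contribution) in bp, from the text
comparisons = {
    "Panax ginseng": (2_980, 926, 1_858),
    "Nicotiana tabacum": (2_605, 1_506, 402),
}

for species, (total, lsc, ir) in comparisons.items():
    rest = residual_difference(total, lsc, ir)
    share = 100 * (lsc + ir) / total
    print(f"{species}: LSC+IR explain {share:.1f}% of the difference; "
          f"{rest} bp remain")
```

Under these assumptions the IR accounts for most of the difference with P. ginseng, whereas the LSC accounts for most of the difference with N. tabacum.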

Comparisons of repeats in cp genome between sesame and 18 other plant species
Most repeats in the cp genome are present in the introns or exons of genes. Seventeen forward and inverted repeats (≥30 bp) were identified in the sesame cp genome (Table 3). Of these repeats, R3, R13 and R14 were over 40 bp in length, while the other repeats were 30-40 bp in length. To determine their evolutionary characteristics, we used BLAST to compare the 17 repeats in the sesame cp genome with 18 other species (Table 4). The 17 repeats were roughly divided into 4 groups according to their level of conservation. Group 1 consisted of five highly conserved repeats that were present in nearly all monocots and dicots, while group 4 consisted of seven repeats that were detected only in one or a few species and had low conservation. Notably, repeats R10, R12, R13, R14 and R17 were unique to the sesame cp genome, and had no hits in other species. Furthermore, multiple sequence alignment showed that the specificity of R13 and R14 in sesame is due to extension of shorter ancestral repeats (Figure 3).

IR expansion and contraction
The locations of the LSC/IR and SSC/IR junctions are regarded as an index of cp genome evolution. To identify the impact of these junctions on sesame evolution, we screened the structures of IR expansions and contractions in sesame and 14 other species (Figure S5). In sesame and 12 other cp genomes, the border of the LSC/IR junction was located within the rps19 gene, resulting in the formation of an rps19 pseudogene. In the O. europaea and C. sativus cp genomes, however, rps19 pseudogenes were not present since the LSC/IR junction border was located downstream of the rps19 gene. The length of rps19 pseudogenes in the 13 species ranged from 24 bp to 113 bp; in sesame, as in Ricinus communis and P. occidentalis, it was 30 bp. The border of the SSC/IR junction in sesame was located within the ycf1 gene, resulting in the formation of a ycf1 pseudogene. The length of ycf1 pseudogenes varied between 345 bp and 1,679 bp in the 14 species. The ycf1 pseudogene in sesame was 1,010 bp, a similar length to that in N. tabacum, O. europaea, R. communis, C. arabica, A. thaliana, B. napus and C. sativus. In addition, we investigated the evolutionary rate of the part of the ycf1 gene sequence located in the IR region, with the ycf1 gene of P. occidentalis chosen as the reference for Ka and Ks estimation (Table 5). The Ka/Ks values of the IR region-located fragments of the ycf1 gene were significantly lower among the 13 species than those of the full sequences.

Comparison of evolutionary rates of the 77 genes in the cp genomes between sesame and 13 other plant species
Before examining variation in the evolutionary rates of cp genes, we calculated the Ka, Ks and Ka/Ks ratio of 77 protein-coding genes in sesame and 13 other dicot species from the asterid and rosid clades (the corresponding genes of P. occidentalis were chosen as reference genes) (Figure S6). Results showed that evolutionary rates of cp genes were not uniform. Genes involved in photosynthesis, such as atpH, psaA and petN, evolved more slowly and usually presented low Ka/Ks values, while other genes, including psaI, involved in photosynthesis, rpl23, involved in replication, and ycf2 and ycf15 genes with unclear functions, evolved more quickly and had high Ka/Ks values (≥0.5).
Comparisons of evolutionary rates of the 77 cp genes between sesame and the other 13 dicot species (Table S1, Figure 4) indicated that nine genes in the sesame cp genome, i.e., the ndhB, ndhD and ndhI genes encoding subunits of NADH dehydrogenase, the rpl2, rpl22, rpl32 and rpl33 genes encoding proteins of the large subunit of the ribosome, the rps12 gene encoding a protein of the small subunit of the ribosome, and the rbcL gene encoding the large subunit of Rubisco, all evolved rapidly. Genes with low evolutionary rates included the ndhK gene encoding a subunit of NADH dehydrogenase, the atpI gene encoding a subunit of ATP synthase, and the cemA gene encoding the envelope membrane protein.
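The way Ka/Ks values are read in this section can be sketched as a small classification routine. The 0.5 cutoff for fast-evolving genes comes from the text, and the treatment of ratios above 2 as not credible follows the note to Table S1; the label names, the handling of Ks = 0 and the example numbers are assumptions for illustration only and are not values from the study.

```python
def classify_ka_ks(ka, ks, fast_cutoff=0.5, unreliable_cutoff=2.0):
    """Return (ratio, label) for one gene comparison.

    Labels (hypothetical names):
      'not estimable' - Ks is zero, so the ratio is undefined
      'unreliable'    - ratio above unreliable_cutoff (typically very low Ks)
      'fast'          - ratio at or above fast_cutoff
      'slow'          - everything else
    """
    if ks == 0:
        return None, "not estimable"
    ratio = ka / ks
    if ratio > unreliable_cutoff:
        return ratio, "unreliable"
    if ratio >= fast_cutoff:
        return ratio, "fast"
    return ratio, "slow"


# Placeholder (Ka, Ks) pairs, NOT measurements from Table S1
examples = {
    "geneA": (1.0, 4.0),
    "geneB": (1.0, 2.0),
    "geneC": (3.0, 1.0),
    "geneD": (0.1, 0.0),
}
for gene, (ka, ks) in examples.items():
    print(gene, classify_ka_ks(ka, ks))
```

The actual Ka and Ks estimates in the study come from KaKs_Calculator with model averaging; this routine only illustrates how the resulting ratios are binned.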

Discussion
In this article, the chloroplast genome of the Chinese cultivar Yuzhi 11 (white-seeded) was sequenced, and the evolutionary characteristics of cp genome structure and genes were compared between sesame and 18 other species. The sesame cp genome is markedly conserved, and the characteristics of convergent evolution are evident in cp genes in sesame and some other species. To date, more than one hundred cp genomes have been sequenced and studied. Chloroplast genome sequences and basic genomic structures, e.g., gene content, repeat characteristics, and indel and SSR marker locations, have been analyzed in many important crops [18,30-32]. The conservation of the cp genome suggests a universal evolutionary selection pressure; evolutionary changes in the cp genome do not happen randomly [33]. However, in order to clarify plant phylogenetic relationships, evolutionary changes in individual species require further exploration.

Characteristics of the sesame cp genome
With the aid of sesame nuclear genomic data, we have sequenced the cp genome of sesame cv. Yuzhi 11 using Illumina and 454 sequencing and explored its species-specific structure. Although recent studies have suggested that the genetic diversity and cytological differences between black-seeded and white-seeded germplasm are significant [34,35], the cp genome sequence of cv. Yuzhi 11 (white-seeded) has high similarity to that of cv. Ansanggae (black-seeded) (NC_016433.2), with only slight variation in the number of nucleotide repeats in 14 homopolymers, which may be due to the use of different sequencing platforms in these two studies. The sesame cp genome has a similar number of genes to species such as Nicotiana tabacum, Vitis vinifera and Platanus occidentalis. The order of genes in the sesame cp genome is highly conserved and is similar to that of N. tabacum, A. thaliana, P. occidentalis and B. napus, but different from that of G. max, H. annuus and G. hirsutum, in which there are large inversions [30,31,36]. While gene loss events were not detected, the sesame cp genome has a shortened ycf2 gene. In addition, some unique repeat sequences, e.g., R13 and R14 (Figure 3), were found, with the number of repeats being lower in sesame than in A. thaliana, G. max and G. hirsutum [30,31,37]. IR/SC junctions are located in the rps19 and ycf1 genes, respectively, as in some other species.

Variation in the ycf1 and ycf2 gene
Ycf genes have proved useful for analyzing cp genome variation in higher plants and algae, even though their function is not thoroughly known [38]. There are 7-8 ycf genes (including pseudogenes) in the cp genomes of higher plants. Of these, ycf1 and ycf2 are the two largest genes and are located in the IR/SC junction and the IR region, respectively. Biolistic chloroplast transformation studies in N. tabacum have indicated that these genes are essential for plant survival [39] and are likely the targets of positive evolutionary selection [40]. The ycf2 gene is regarded as having one of the fastest evolutionary rates within the cp genome, since one copy of the ycf2 gene in ginkgo is lost and both copies of the ycf2 gene in the grasses are lost [40-42]. The ycf2 gene in sesame is transcribed as its mRNA is present in the [43]. In this study, we found that an approximately 1,179 bp fragment of the ycf2 gene was missing in the sesame cp genome (Figure S3). Moreover, BLAST results showed that querying a 1-585 bp fragment of the ycf2 gene yielded a hit in the sesame draft genome, whereas the remaining 586-1,179 bp fragment was not found (Figure 2B).
Interestingly, multiple sequence alignments showed that a 580-1,179 bp fragment of ycf2 in P. ginseng and a 439-1,155 bp fragment of ycf2 in C. sativus, counterparts of the 586-1,179 bp query fragment of the sesame ycf2 gene, are also missing (Figure 2A, B). We thus propose that this sequence deletion may have occurred in at least one of two ways, i.e., by transfer to the nuclear genome, as in the case of sesame, or by direct deletion, as in the case of P. ginseng and C. sativus. The evolutionary characteristics of the ycf2 gene in these species are similar, even though a close phylogenetic relationship was not found between sesame, P. ginseng and C. sativus, presenting an evident signature of convergent evolution (Figure S2). Similarly, the co-occurring loss of a ycf1 fragment in sesame, C. arabica and B. napus is likely a consequence of convergent evolution (Figure S3).
Table 5. Evolutionary rate of full length ycf1 and IR region-located fragments of the ycf1 gene.

Convergent evolution of repeat sequences
Chloroplast genomes in most plants contain repeat sequences other than the Inverted Repeats (IR), with the repeat number ranging from tens to hundreds [30,44]. Repeat sequences often maintain high conservation of sequence identity and location, and thus may play functional roles in cp genomes [30,45]. The detailed functions of the repeats are not well understood, though the number of repeats has been shown to be correlated with the degree of rearrangement of the cp genome [46,47].
R13 and R14, located within the exons of the ycf2 gene, were found to be unique to the sesame cp genome. Repeats of shorter length are present in the same locations as R13 and R14 in other species such as O. europaea, V. vinifera and G. hirsutum (Figure 3). The uniqueness of these repeats in sesame is likely to be a consequence of extension of shorter ancestral repeats. Moreover, such conservation of repeats in species that are not phylogenetically closely related should be regarded as an instance of convergent evolution.

Consequences of IR expansion/contraction
IR expansion and contraction are common evolutionary events in plant species and have been well verified in many species such as A. thaliana, N. tabacum, and oil palm [37,48-50]. LSC/IR and SSC/IR junctions have different features in the cp genomes of different species. IR expansion/contraction has had two main consequences for cp genome evolution in almost all publicly available cp genomes, i.e., alteration of cp genome size [30,51] and formation of pseudogenes at IR/SC junctions. In higher plants, IR expansion/contraction has a major effect on genome size [52], which was also the case in our study (Table 2). In previous studies, sequences located in IRs showed slower rates of evolution compared with those located in SSC or LSC regions [7]. Here, we also found that the evolutionary rate of the part of the ycf1 sequences located in the IR region was significantly lower than that of the full sequences in the 13 species (Table 5). Accordingly, we propose that one consequence of IR contraction/expansion is a change in the rate of evolution.

Evolutionary rates suggest convergent evolution
Compared with genes from nuclear genomes, cp genes evolve at a slow rate, making them useful for plant phylogenetic and taxonomic research [53]. Previous studies have suggested that the evolutionary rate of cp genes is lineage-specific, locus-specific and region-specific [7,40,54]. For example, some cp genes in grass lineages have evolved at a faster rate than those from N. tabacum [54]; IRs have a slower nucleotide substitution rate compared with SSC and LSC regions [7]. In addition, it has been shown that the rate of evolution of a gene correlates with relaxed or positive selection, gene function, and gene expression level [55,56]. In the sesame cp genome, the rapid or slow evolution of some genes is species-specific. The evolutionary rate of rps12 in sesame and G. hirsutum is the highest among sesame and the 13 other species (Table S1). Multiple sequence alignment results suggested that co-variation of two sites in the rps12 amino acid sequence occurs only in sesame and G. hirsutum (Figure 4). Similarly, convergent evolution was also detected in the clpP genes of sesame and C. sativus (Figure S6).

Conclusion
The cp genome sequence of cv. Yuzhi 11 (white-seeded) has high similarity to that of cv. Ansanggae (black-seeded). Deletion of cp gene sequences occurs in cp genomes in at least one of two ways, i.e., transfer to the nuclear genome, as has occurred in sesame, and direct deletion, as has occurred in P. ginseng and C. sativus. The uniqueness of repeats in sesame is likely due to extension of shorter ancestral repeats. Apart from changing the cp genome size and forming pseudogenes at IR/SC junctions, changing the rate of evolution is regarded as another consequence of IR contraction/expansion. The characteristics of convergent evolution are evident in cp genes in sesame and some other species. These findings provide a foundation for further understanding of cp genome evolution in Sesamum and other higher plants.
The accession number for the sesame chloroplast genome sequence (cv. Yuzhi 11) is KC569603 (NCBI). The accession numbers of the Roche 454, 500 bp PE Illumina and 3 Kb MP Illumina raw sequencing data for the sesame cp genome are SRR949053, SRR949054 and SRR949055, respectively. The Illumina and Roche 454 raw reads of the sesame nuclear genome sequence have been deposited in the sesame genome database and can be downloaded from the website of the Sesame Genome Project (http://www.sesamum.org).
Table S1 Comparisons of the evolutionary rates of 77 genes between the cp genomes of S. indicum L. and 13 other species. All 77 genes were re-annotated using DOGMA;indicates that no such gene exists in that species, or the gene cannot be estimated using the MA method; * indicates Ka/Ks values larger than 2 which are not credible due to their low Ks values.

Author Contributions
Conceived and designed the experiments: HYZ. Performed the experiments: SJX. Analyzed the data: CL SJX. Contributed reagents/materials/analysis tools: HMM HYZ. Wrote the paper: HMM CL HYZ.
Evolution of the Sesame Chloroplast Genome PLOS ONE | www.plosone.org