Extensive plastome reduction and loss of photosynthesis genes in Diphelypaea coccinea, a holoparasitic plant of the family Orobanchaceae

Background. Parasitic plants obtain nutrients from their hosts and are therefore less dependent on their own photosynthesis, or have lost this capacity completely. The reduction in plastid genome size and gene content in parasitic plants predominantly results from the loss of photosynthetic genes. Plants from the family Orobanchaceae are used as models for studying plastid genome evolution in the transition from an autotrophic to a parasitic lifestyle. Diphelypaea is a poorly studied genus of the Orobanchaceae, comprising two species of non-photosynthetic root holoparasites. In this study, we sequenced the plastid genome of Diphelypaea coccinea and compared it with those of other Orobanchaceae to elucidate patterns of plastid genome evolution. In addition, we used plastid genome data to define the phylogenetic position of Diphelypaea spp.
Methods. The complete nucleotide sequence of the plastid genome of D. coccinea was obtained from total plant DNA using pyrosequencing technology.
Results. The D. coccinea plastome is only 66,616 bp in length and is highly rearranged; however, it retains a quadripartite structure. It contains only four rRNA genes, 25 tRNA genes and 25 protein-coding genes, making it one of the most highly reduced plastomes among the parasitic Orobanchaceae. All genes related to photosynthesis, including the ATP synthase genes, have been lost, whereas most housekeeping genes remain intact. The plastome contains two divergent, but probably intact, clpP genes. Intron loss has occurred in some protein-coding and tRNA genes. Phylogenetic analysis yielded a fully resolved tree for the Orobanchaceae, with Diphelypaea being a sister group to Orobanche sect. Orobanche.


INTRODUCTION
About 1% of all angiosperm species can parasitize other flowering plants or mycorrhizal fungi (Nickrent et al., 1997). Facultative or obligatory hemiparasites still carry out photosynthesis to some extent, while holoparasites have completely lost the ability to perform photosynthesis, and obtain nutrients from their host. Transition from an autotrophic to heterotrophic lifestyle is associated with a relaxation of selection pressure on photosynthesis-related genes, both in the nuclear and the plastid genomes. Most evident is a functional and physical reduction of the plastid genome (plastome), which correlates with a loss of genes encoding the photosynthetic machinery and related functions, increased substitution rates, and structural rearrangements (DePamphilis & Palmer, 1990; Wolfe, Morden & Palmer, 1992; Barrett et al., 2014). Since the plastome contains both photosynthesis-related and housekeeping genes, gene loss is not random and follows a particular pattern (Barrett & Davis, 2012; Barrett et al., 2014; Graham, Lam & Merckx, 2017). The NAD(P)H dehydrogenase (ndh) genes are usually lost first, followed by photosynthesis-related genes (psa, psb, pet, rbcL and atp) and plastid-encoded RNA polymerase. Housekeeping genes encoding rRNAs, ribosomal proteins and tRNAs are the last to be lost. Extensive studies of parasitic plants have revealed different levels of plastome degradation and their correlation with the types of parasitism, ranging from minimal in hemiparasitic members of Orobanchaceae (Wicke et al., 2013) and Viscaceae (Petersen, Cuenca & Seberg, 2015) to extreme in some holoparasitic species, such as Pilostyles aethiopica (Bellot & Renner, 2015), and even possibly complete loss of the plastome, as in Rafflesia lagascae (Molina et al., 2014).
The genus Diphelypaea (Nicolson, 1975), also known as Phelypaea, is phylogenetically close to Orobanche sect. Orobanche (Schneeweiss et al., 2004a) and comprises two species, Diphelypaea coccinea (M.Bieb.) Nicolson and Diphelypaea tournefortii (Desf.) Nicolson, which occur in the Caucasus, Crimea and Western Asia (Turkey and Iran). D. coccinea is an achlorophyllous, obligately parasitic perennial herbaceous plant up to 30-50 cm in height. The stem is unbranched, red to reddish brown, ending in a single flower of a bright red color (Fig. 1). D. coccinea parasitizes the roots of plants of the genus Psephellus. In order to further explore plastome evolution in the course of transition to holoparasitism, and to clarify the phylogenetic position of Diphelypaea, we determined the complete sequence of the D. coccinea plastid genome.

DNA extraction, genome sequencing and sequence assembly
The aboveground portion of a single D. coccinea plant growing in a mountain region near the town of Kislovodsk, North Caucasus, Russia (43°50′59.8″N, 42°38′38.7″E) was used for the extraction of total genomic DNA by a CTAB-NaCl method (Murray & Thompson, 1980). The extracted DNA was sheared using a g-TUBE device (Covaris Ltd., Brighton, UK) to obtain an average fragment size of 8 kb. The sheared DNA was then electrophoresed on a 0.5% agarose gel, and a portion of gel containing fragments of 6-9 kb was excised. The DNA was purified with a QIAquick Gel Extraction Kit (Qiagen, Hilden, Germany), and then used to prepare a long paired-end library according to the manufacturer's instructions (Roche, Risch-Rotkreuz, Switzerland). The library was sequenced with a Roche GS FLX Genome Sequencer, using the Titanium XL+ protocol. About 329 Mb of sequence, with an average read length of 414 nt, was generated. De novo assembly was performed with Newbler Assembler v.2.9 (454 Life Sciences, Branford, CT, USA) with default settings, which yielded six plastid DNA contigs with an average coverage of 56-fold, ordered in a single scaffold. These contigs were identified based on sequence similarity to plastid genomes of Orobanchaceae and high coverage. The complete plastid genome sequence was obtained by generating polymerase chain reaction (PCR) fragments spanning the junctions of the contigs and sequencing them by the Sanger method, using an ABI PRISM 3730 analyzer (Applied Biosystems, Waltham, MA, USA). The list of primers is available in Table S1. Reads spanning junctions between single copy regions and inverted repeats were used to infer contiguous sequences. To verify the correct assembly of the reconstructed plastid genome, raw reads were mapped against the reconstructed sequence with GS Reference Mapper (454 Life Sciences, Branford, CT, USA). The raw reads were deposited in the Sequence Read Archive (SRA) under the accession number SRR9665263.
The sequence of the plastid genome of D. coccinea was submitted to GenBank under accession number MK922354.

Plastid genome annotation and analysis tools
Plastid genome annotation was performed using the Dual Organellar GenoMe Annotator (DOGMA; Wyman, Jansen & Boore, 2004), with further manual correction using similarity searches against previously annotated plastid genomes. The tRNAscan-SE server was also used to locate tRNA genes (Lowe & Chan, 2016). A circular map of the plastome was drawn using OrganellarGenomeDRAW software (Lohse et al., 2013).

Verification of the presence of two copies of clpP
To verify the presence of two copies of the clpP gene, we designed primer pairs flanking each copy (Table S1). Appropriate PCR fragments obtained using DNA samples extracted from two individual D. coccinea plants were analyzed by agarose gel electrophoresis (Fig. S1) and sequenced by the Sanger method.

Plastid genome structure and gene content
The plastome of D. coccinea was assembled into a circular sequence of 66,616 bp from approximately 1.3 million paired-end reads (∼6 kb fragments). It has a typical quadripartite structure with a 37,964 bp large single copy (LSC) region, a 5,220 bp small single copy (SSC) region and a pair of inverted repeats (IRs), each of 11,716 bp (Fig. 2).
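The reported region sizes can be checked against the total length, since a quadripartite plastome consists of the LSC, the SSC and two IR copies. The following is a minimal Python sketch of that arithmetic; it uses only the figures given above, and the function and variable names are illustrative rather than part of the original analysis.

```python
# Region lengths (bp) as reported in the text for the D. coccinea plastome.
LSC = 37_964  # large single copy region
SSC = 5_220   # small single copy region
IR = 11_716   # one inverted repeat copy (the plastome carries two)


def plastome_length(lsc: int, ssc: int, ir: int) -> int:
    """Total length of a quadripartite plastome: LSC + SSC + two IR copies."""
    return lsc + ssc + 2 * ir


total = plastome_length(LSC, SSC, IR)
ir_fraction = 2 * IR / total  # share of the genome occupied by both IR copies

print(total)                  # 66616
print(round(ir_fraction, 3))  # 0.352
```

The sum reproduces the 66,616 bp reported for the assembled circle, and the two IR copies together account for roughly a third of the genome.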
The D. coccinea plastome was predicted to contain 54 presumably intact unique genes (Table 1), fewer than in its fully autotrophic relative Lindenbergia philippensis (113 genes), but comparable to the numbers found in other holoparasitic Orobanchaceae (42-74; Wicke et al., 2013). Consistent with the inability to photosynthesize and the holoparasitic lifestyle of D. coccinea, its plastome lacks all genes coding for the NAD(P)H dehydrogenase complex and photosynthesis-related proteins. In particular, the D. coccinea plastome lacks the ATP synthase genes that are retained intact in most parasitic Orobanchaceae (Wicke et al., 2013). The genes for plastid-encoded RNA polymerase are also missing. Most of the retained genes are involved in protein synthesis: four rRNA genes, 25 tRNA genes, seven genes coding for the small subunit ribosomal proteins and 11 for the large subunit ribosomal proteins. Most ribosomal protein genes usually found in plastid genomes of photosynthetic angiosperms were also present in the plastome of D. coccinea (Table 1). Exceptions are rps15 and rpl32, which are retained as truncated pseudogenes, and rpl23, which could not be identified. The loss of these genes has also been reported in other holoparasitic Orobanchaceae (Wicke et al., 2013). While the plastid genome of D. coccinea contains genes for 25 tRNA species (Table 1), it lacks the tRNA genes trnA-UGC, trnI-GAU, trnK-UUU, trnR-ACG and trnV-UAC, which are usually present in the plastomes of photosynthetic flowering plants. The loss of essential tRNA genes has been observed in parasitic plant plastomes showing an advanced stage of degradation (DePamphilis & Palmer, 1990; Delannoy et al., 2011; Wicke et al., 2013; Ravin et al., 2016). Analysis of the plastomes of holoparasitic Orobanchaceae revealed that up to 13 of 30 conserved tRNA genes, including those mentioned above, could be lost or pseudogenized (Wicke et al., 2013).
Besides genes involved in protein synthesis, the D. coccinea plastid genome contains the genes infA, matK, accD, clpP, ycf1 and ycf2, all of which have been lost in some lineages of angiosperms, both parasitic and autotrophic. The most frequently lost gene is infA, encoding a translation initiation factor, the loss of which has been described in at least 24 separate lineages of angiosperms (Millen et al., 2001). The AccD protein, the beta subunit of acetyl-CoA carboxylase involved in fatty acid synthesis and leaf development (Kode et al., 2005), is essential for plastome maintenance (Krause, 2012). The gene for this protein is preserved even in the plastomes of most parasitic plants, although loss from the plastome and functional relocation to the nucleus occurs in some photosynthetic species (Rousseau-Gueutin et al., 2013). Among the Orobanchaceae, it has a 5′ truncation in the hemiparasite Schwalbea americana and the holoparasitic species Phelipanche purpurea and Phelipanche ramosa (Wicke et al., 2013). Although ycf1 and ycf2 are considered to be essential for plastid maintenance (Drescher et al., 2000), multiple instances of loss of these genes, without transfer to the nuclear genome, have been reported in plant plastids (e.g., Wakasugi, Tsudzuki & Sugiura, 2001; Cai et al., 2008), but not in parasitic Orobanchaceae (Wicke et al., 2013). The functionality of the ycf1 and ycf2 genes in the D. coccinea plastome might be questioned, since their deduced protein products have long repeat-containing internal insertions, in contrast to typical Ycf1 and Ycf2 proteins, e.g., from N. tabacum. However, these insertions did not interrupt the open reading frames, suggesting that the proteins could retain functionality.
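The gene counts reported for the D. coccinea plastome can be reconciled with the total of 54 by a simple tally. The Python sketch below is illustrative only: the category labels are hypothetical, and it assumes that the two divergent clpP copies found in this plastome are each counted as a separate gene, a breakdown that the text does not state explicitly.

```python
# Gene counts taken from the text; the category labels are illustrative.
gene_counts = {
    "rRNA": 4,
    "tRNA": 25,
    "small-subunit ribosomal proteins": 7,
    "large-subunit ribosomal proteins": 11,
    # infA, matK, accD, ycf1 and ycf2
    "other single-copy protein genes": 5,
    # ASSUMPTION: the two divergent clpP copies are counted separately.
    "clpP copies": 2,
}

rna_categories = {"rRNA", "tRNA"}
protein_coding = sum(
    n for name, n in gene_counts.items() if name not in rna_categories
)
total = sum(gene_counts.values())

print(protein_coding)  # 25
print(total)           # 54
```

Under this assumption the tally matches both the 25 protein-coding genes and the 54 presumably intact genes reported for the plastome.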
An interesting finding was the detection of two likely functional copies of the clpP gene in the D. coccinea plastome. clpP encodes a proteolytic subunit of Clp protease involved in protein metabolism within the plastid (Krause, 2012), and was proposed to be essential, being present even in highly reduced plastomes of parasitic plants (Delannoy et al., 2011; Ravin et al., 2016). Among the Orobanchaceae, a presumably functional clpP was found in all species (Wicke et al., 2013). In the D. coccinea plastome, two copies of clpP with 86% nucleotide sequence identity were found (Fig. 2). This duplication of the clpP gene was confirmed by PCR for two individual D. coccinea plants (Fig. S1). Notably, both genes lacked the introns that are usually present in clpP of photosynthetic angiosperms but often absent in parasitic species. Their deduced protein products showed 75% amino acid sequence identity, but <35% identity with other plastidial ClpP proteins. It should be noted that fast evolution of ClpP has been observed in several parasitic and photosynthetic lineages (Wicke et al., 2013; Sloan et al., 2014). Both gene copies contained intact reading frames; therefore, they probably remain functional.
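Identity values such as those quoted for the two clpP copies are derived from a pairwise alignment. The Python sketch below shows one common way to compute percent identity from two already aligned sequences, skipping columns that contain a gap. The text does not specify how identity was calculated here, so the gap treatment is an assumption, and the toy sequences are hypothetical rather than real clpP data.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # ASSUMPTION: gap columns are excluded from the comparison
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical toy alignment, not real clpP sequence.
copy_1 = "ATGCCTATTGGT-GTACCA"
copy_2 = "ATGCCAATTGGTCGTTCCA"

print(round(percent_identity(copy_1, copy_2), 1))  # 88.9
```

Other conventions (for example, counting gap columns as mismatches) give lower values for the same alignment, so the definition should be stated whenever identities are compared across studies.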
Reduction of the size and gene content of the D. coccinea plastome is also reflected in the loss of introns in the remaining genes. Introns are only present in rpl2, rps12 (trans-spliced), trnG-UCC and trnL-UAA, and appeared to be lost in clpP, rpl16, rps16 and rps12 (cis-spliced intron). The presence of matK correlates with the retention of a group IIA intron in the rpl2 gene, which requires maturase activity for splicing (Zoschke et al., 2010).
As in most angiosperms, the D. coccinea plastome contained an rrn gene cluster within an IR region. Although the gene order is typical (rrn16-rrn23-rrn4-rrn5), two tRNA genes located in the rrn16-rrn23 spacer were lost (trnA-UGC) or truncated as a pseudogene (trnI-GAU). Interestingly, the rrn23 gene contains a 259 bp intervening sequence, absent from any other plastidial rrn23 sequence available in the GenBank database. A BLASTn search against GenBank found no sequences with high similarity to this insert. The insert occurred in the side hairpin of the H38 helix region of domain II of the rrn23 gene (Fig. S2). Introns in rrn23 genes have been found in the plastomes of the chlorophytes Chlorella (Wakasugi et al., 1997) and Chlamydomonas (Turmel et al., 1993), and the hornwort Anthoceros formosae (Kugita et al., 2003); however, they have not been reported in other land plants. The site of insertion in the D. coccinea rrn23 gene did not match the positions of intron insertions in the chlorophytes and Anthoceros formosae. It is possible that this intervening sequence is not an intron but an insertion that is still compatible with the final structure of the ribosome. RNA-seq analysis would help to clarify this issue.

Structural rearrangements and duplications in the D. coccinea plastome
Plastid genomes of most angiosperms are highly conserved, not only in terms of overall quadripartite structure and gene content, but also in the gene order. Although deviation from a conserved gene order due to plastid genome rearrangements occurred in some photosynthetic lineages, e.g., cereals, geranium and clover (Chumley et al., 2006; Cai et al., 2008), numerous translocations, duplications, inversions and deletions are most frequently observed in the plastomes of parasitic species (Wicke et al., 2013). Among Orobanchaceae, the plastome of autotrophic Lindenbergia philippensis is colinear with that of tobacco, and limited deviations are observed in the hemiparasitic species Schwalbea americana (Wicke et al., 2013). In addition, no major rearrangements are observed in the plastome of Lathraea squamaria, belonging to the Rhinantheae clade of Orobanchaceae (Samigullin et al., 2016). However, gene deletions, duplications, inversions, shifts of IR boundaries and even complete loss of one IR are observed in the plastomes of holoparasitic species of the Orobancheae clade (Wicke et al., 2013; Schneider et al., 2018).
Comparison of the order of genes in the plastome of D. coccinea with the standard for angiosperms showed that the D. coccinea plastome, in addition to gene losses, experienced multiple rearrangements, including inversion, translocation and duplication of genes (Fig. 2). The gene order differs from that in other species of Orobanchaceae reported by Wicke et al. (2016). However, the D. coccinea plastome retains the highly conserved S10 operon (rpl2, rps19, rpl22, rps3, rpl16, rpl14, rps8, infA, rpl36, rps11) and the rrn gene cluster. These operons are conserved in all plastomes of Orobanchaceae, but appear to be deconstructed in some of the most highly reduced genomes of parasitic plants (Bellot & Renner, 2015; Ravin et al., 2016).

Phylogenetic position of D. coccinea
Relatively little is known about the phylogenetic position of the genus Diphelypaea, and only two ∼600 bp-long nuclear internal transcribed spacer (ITS) sequences of D. coccinea and D. tournefortii are available in GenBank (accessed on May 3, 2019). Phylogenetic analysis of the nuclear ITS region revealed that the genus Orobanche, as defined by Beck-Mannagetta (1930), is divided into two genus-level groups, the Orobanche group (Orobanche sect. Orobanche and Diphelypaea) and the Phelipanche group (Schneeweiss et al., 2004a). This split is also supported by karyological features, since the chromosome base number in the Orobanche/Diphelypaea group is x = 19, while in the Phelipanche group it is x = 12 (Schneeweiss et al., 2004b). In the ITS phylogeny, the two Diphelypaea species formed a basal lineage in the Orobanche group. However, these molecular phylogenetic inferences were based on a limited sequence dataset and need to be clarified. In the current study, we took advantage of the availability of the complete plastid genome sequence of D. coccinea to define the phylogenetic position of this genus in the Orobanchaceae. Analysis of the concatenated nucleotide sequences of 17 conserved genes from 30 species of Orobanchaceae yielded a fully resolved phylogenetic tree (Fig. 3). D. coccinea was placed in a cluster that also comprises Boulardia latisquama and species of Orobanche sect. Orobanche. By contrast, the Phelipanche group, comprising Phelipanche and Aphyllon, formed a distinct lineage.
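Concatenating per-gene alignments into a single matrix, as was done for the 17 conserved genes, can be sketched as follows. This Python example is a generic illustration and not the pipeline used in this study: the alignment and tree-building tools are not described in this section, the taxon and gene names are hypothetical, and padding taxa that lack a gene with gaps is an assumed convention.

```python
from typing import Dict, List

Alignment = Dict[str, str]  # taxon name -> aligned sequence for one gene


def concatenate(alignments: List[Alignment]) -> Alignment:
    """Join per-gene alignments end to end, padding absent taxa with gaps."""
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    supermatrix = {taxon: "" for taxon in taxa}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences in one gene alignment must share a length")
        width = lengths.pop()
        for taxon in taxa:
            # ASSUMPTION: a taxon missing from a gene alignment is filled with gaps.
            supermatrix[taxon] += aln.get(taxon, "-" * width)
    return supermatrix


# Hypothetical toy alignments for two genes and three taxa.
gene_a = {"taxon1": "ATGC", "taxon2": "ATGA", "taxon3": "ATGG"}
gene_b = {"taxon1": "CCT", "taxon3": "CGT"}

matrix = concatenate([gene_a, gene_b])
for taxon, seq in matrix.items():
    print(taxon, seq)
```

Every row of the resulting matrix has the same length, which is the input format expected by most phylogenetic inference programs.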

CONCLUSIONS
Being only 66,616 bp in size and containing 54 presumably intact unique genes, the plastome of D. coccinea is one of the most highly reduced among the parasitic Orobanchaceae. Plastome rearrangements, gene duplications and the loss of introns are associated with gene loss and genome reduction. More pronounced gene loss has only been reported in the plastomes of Conopholis americana (45,673 bp, 42 genes), Epifagus virginiana (70,028 bp, 42 genes) and Boulardia latisquama (80,361 bp, 49 genes). In particular, the D. coccinea plastome lacks all genes of the photosynthetic apparatus, including ATP synthase genes that are retained intact in most Orobanchaceae. However, all 16 protein-coding genes, 14 tRNA genes and 4 rRNA genes commonly present in the plastid genomes of all hemi- and holoparasitic Orobanchaceae species (Table 1) are present in D. coccinea, suggesting that further gene loss is unlikely in this lineage. Phylogenetic analysis confirmed that D. coccinea belongs to the Orobanche group of the family Orobanchaceae.
• Andrey V. Mardanov conceived and designed the experiments, performed the experiments, authored or reviewed drafts of the paper, approved the final draft.
• Nikolai V. Ravin conceived and designed the experiments, performed the experiments, analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the paper, approved the final draft.

Data Availability
The following information was supplied regarding data availability: The raw reads are available in the Sequence Read Archive (SRA): SRR9665263 and the sequence of the plastid genome of D. coccinea is available at GenBank: MK922354.

Supplemental Information
Supplemental information for this article can be found online at http://dx.doi.org/10.7717/peerj.7830#supplemental-information.