Gene and genome duplications and the origin of C4 photosynthesis: Birth of a trait in the Cleomaceae

C4 photosynthesis is a trait that has evolved in 66 independent plant lineages and increases the efficiency of carbon fixation. The shift from C3 to C4 photosynthesis requires substantial changes to genes and gene functions, effecting phenotypic, physiological and enzymatic changes. We investigate the role of ancient whole genome duplications (WGD) as a source of new genes in the development of this trait and compare expression between paralog copies. We compare Gynandropsis gynandra, the closest relative of Arabidopsis that uses C4 photosynthesis, with its C3 relative Tarenaya hassleriana, which underwent a WGD named Th-α. We establish through comparison of paralog synonymous substitution rates that both species share this paleohexaploidy. Homologous clusters of photosynthetic gene families show that gene copy numbers are similar to what would be expected given their duplication history and that there is no significant difference in gene copy number between the C3 and C4 species. This is further confirmed by syntenic analysis of T. hassleriana, Arabidopsis thaliana and Aethionema arabicum, where syntenic region copy number ratios lie close to what could be theoretically expected. Expression levels of C4 photosynthesis orthologs show that transcript abundance in T. hassleriana is much less strictly controlled than in G. gynandra, where orthologs have extremely similar expression patterns in different organs, seedlings and seeds. We conclude that the Th-α and older paleopolyploidy events have had a significant influence on the specific genetic makeup of Cleomaceae versus Brassicaceae. Because the copy number of various essential genes involved in C4 photosynthesis is not significantly influenced by polyploidy, and because transcript abundance in G. gynandra is more strictly controlled, we also conclude that recruitment of existing genes through regulatory changes is more likely to have played a role in the shift to C4 than the neofunctionalization of duplicated genes.
DATA: The data deposited at NCBI represent raw RNA reads for each data series mentioned: 5 leaf stages, root, stem, stamen, petal, carpel, sepal, 3 seedling stages and 3 seed stages of Tarenaya hassleriana and Gynandropsis gynandra. The assembled reads were used for all analyses in this paper where RNA was used. http://www.ncbi.nlm.nih.gov/Traces/sra/?study=SRP036637, http://www.ncbi.nlm.nih.gov/Traces/sra/?study=SRP036837
© 2014 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/3.0/).
∗ Corresponding author. Tel.: +31 317483160. E-mail addresses: erik.vandenbergh@wur.nl (E. van den Bergh), anan.kuelahoglu@uni-duesseldorf.de (C. Külahoglu), ndrea.braeutigam@uni-duesseldorf.de (A. Bräutigam), jmh65@cam.ac.uk (J.M. Hibberd), andreas.weber@uni-duesseldorf.de (A.P.M. Weber), huxinguang@picb.ac.cn (X.-G. Zhu), eric.schranz@wur.nl (M. Eric Schranz).
http://dx.doi.org/10.1016/j.cpb.2014.08.001


Introduction
Over sixty lineages of both monocot and eudicot angiosperms have evolved a remarkable solution to maximize photosynthetic efficiency under low CO2 levels, high temperatures and/or drought: C4 photosynthesis [1]. The evolution of this modified photosynthetic pathway is a striking example of convergent evolution. Although the changes necessary for the transition from C3 to C4 photosynthesis are numerous, the trait has a wide phylogenetic distribution across angiosperms, with 19 families across the globe known to contain one or more members capable of C4 photosynthesis [2]. Much research on eudicot C4 has focused on Flaveria (Asteraceae), which contains not only C4 species but also a number of C3/C4 intermediates [3]. With the emergence of genomics and the choice of Arabidopsis thaliana as the standard genomic model organism, species in the Cleomaceae, a sister family to the Brassicaceae (which contains Arabidopsis and the Brassica crops), have been proposed for genetic studies of C4 [4,5]. C4 plants spatially separate the initial fixation of carbon from the RuBisCO active site by using phosphoenolpyruvate carboxylase (PEPC), an alternative carboxylase that does not react with oxygen. As a consequence they are more efficient under permissive conditions [6]. The typical C4 system is characterized by a morphological change: so-called Kranz anatomy [7]. In this anatomy, specialized mesophyll (M) cells surround enlarged bundle sheath (BS) cells, with the leaf veins internal to the BS. Generally, venation is increased in C4 leaves [8]. This internal leaf architecture physically partitions the biochemical events of the C4 pathway into two main phases. In the first phase, dissolved HCO3− is assimilated into C4 acids by PEPC in the mesophyll cells. In the second phase, these acids diffuse into the chloroplast-loaded BS cells, where they are decarboxylated and the released CO2 is fixed by RuBisCO.
The increased CO2 concentration in the BS cells makes carbon fixation by RuBisCO much more efficient by reducing photorespiration. Two subtypes of the C4 biochemical pathway are defined, based on the most active C4 acid decarboxylase that liberates CO2 from C4 acids in the bundle sheath: NADP-malic enzyme (NADP-ME) and NAD-malic enzyme (NAD-ME). A facultative addition of phosphoenolpyruvate carboxykinase (PEPCK) activity can be present in either subtype [9]. The subtypes are used as a classification scheme for C4. The process of carboxylation and decarboxylation costs more energy than the simpler C3 form of photosynthesis, but it diminishes photorespiration. Under low atmospheric CO2 pressure, photorespiration causes a major loss in photosynthetic output, and the elaborate concentrating mechanisms of C4 photosynthesis circumvent this [10].
All genes important for the C4 pathway are expressed at relatively low levels in C3 leaves [11]. The mechanism for recruitment of these genes into the C4 pathway remains to be elucidated. For some ancestral C3 genes, changes in cis-regulatory elements generate M and BS cell specificity, while for others changes in trans do so [12][13][14], indicating variation in the mechanisms underlying gene recruitment into the C4 pathway. It has been proposed that gene duplication and subsequent neofunctionalization of one gene copy has facilitated the alterations in gene expression that underlie the evolution of C4 photosynthesis [15,16]. Gene duplication is proposed to be a (pre)condition for the evolution of C4 because it allows the organism to maintain the original gene while a duplicate copy acquires beneficial changes. This can lead to significant changes in metabolism without the deleterious effect of modifications to essential genes. A recent study that compared convergent evolution of photosynthetic pathways with parallel evolution concluded that duplications are not essential for the development of C4 biochemistry, and that changes in expression and localization of specific genes are what matter [11,17]. However, this study considered only the number of C4 genes and did not take into account the age and mechanism of gene duplications.
The modifications necessary for the anatomical changes from C3 to C4 photosynthesis are not well established. Recent work has shown that the SCARECROW (SCR) gene, which is responsible for vein formation in roots, can produce proliferated bundle sheath cells as well as other changes that can be coupled to the shift to Kranz anatomy [18]. Further work supports this relation by describing the role that SHORT-ROOT (SHR), the upstream interacting partner of SCR, plays in the variations in anatomy seen in various C4 species [19,20].
Gene duplicates must be further distinguished by the mechanism by which they arise: single-gene tandem duplication or whole genome duplication (WGD). First, tandem duplications occur frequently, but the duplicates are often lost again, resulting in a constant birth-death cycle of duplicate genes [21]. Second, in WGD or polyploidy, all genes are duplicated simultaneously. After duplication there are often dramatic changes in the genomic structure of the plant, a process referred to as diploidization, in which most genes return to single copy. However, the genes that are maintained in duplicate after WGD often have important functions in enzyme complexes (e.g. to maintain proper gene balance [22]) or can diversify and evolve new gene functions (e.g. neofunctionalization).
The contribution of WGD to photosynthesis-related genes has been studied in soybean, barrel medic, Arabidopsis and sorghum [23,24]. Retention of polyploid and non-polyploid gene duplicates was compared in Glycine max, Medicago truncatula and Arabidopsis for four classes of photosynthesis-related genes: the Calvin-Benson-Bassham cycle (CBBC), the light-harvesting complex (LHC), photosystem I (PSI) and photosystem II (PSII). Photosystem genes were found to be more dosage sensitive, with more duplicates derived only from WGD, whereas CBBC gene families were often larger, with more non-polyploid duplicates retained. In Sorghum bicolor, a recent WGD was reported to be an important origin of C4-specific genes. Several key C4 genes of this crop were found to be collinear with genes that function in C3 photosynthesis when compared to maize and rice. Here, we combine the approaches of these two studies to examine the evolution of photosynthesis and C4-related genes in C3 and C4 Cleomaceae species.
Gynandropsis gynandra (Fig. 1, blue clade) belongs to the NAD-ME C4 photosynthesis subtype [25,26] and is an important South-East Asian and African dry-season leafy vegetable (sometimes referred to as Phak-sian or African cabbage). It is closely related to the horticultural C3 species Tarenaya hassleriana (Fig. 1, pink clade). Both species are easily cultivated in the greenhouse, and a robust phylogenetic framework for Cleomaceae species is emerging [4,5,27]. There are two other independent origins of C4 within the Cleomaceae, Cleome angustifolia and Cleome oxalidea (Fig. 1, yellow clade), identified by carbon isotope discrimination [5,25]. Because of the economic importance and ease of growth of these species, the C4-C3 contrast between G. gynandra and T. hassleriana makes this system particularly attractive and tractable. Both species also have relatively small genomes (T. hassleriana = 292 Mb and G. gynandra ≈ 1 Gb). T. hassleriana underwent a WGD named Th-α [28], but it is not yet known whether this event is shared with all or a subset of other Cleomaceae.
In this study we compare C3 T. hassleriana of the Cleomaceae with C4 G. gynandra of the same family. We use the knowledge of Brassicaceae gene functions to identify the important photosynthetic genes in both species and address the following questions: Does G. gynandra share the Th-α event? What is the contribution of duplicate genes to photosynthesis and C4-related gene families? And finally, what is the role of gene duplicates from WGD compared to continuous small-scale duplications?

Transcriptome sequencing and assembly
All transcriptome data were taken directly from the Cleomaceae transcript atlas [17]. In the atlas, T. hassleriana genes were used as a reference to map transcripts from both species to Cleomaceae "unigenes", each indicated by the gene name coined in the published T. hassleriana genome [29]. For gene quantification, reads were mapped in protein space using BLAT v35 with default parameters [30], and each read was counted once, for its best-matching hit as ranked by e-value.
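The best-hit counting step can be sketched as follows. This is a minimal illustration, not the atlas pipeline itself: the tuple layout `(read_id, gene_id, evalue)`, the function name `count_best_hits` and the gene names are hypothetical, and ties are resolved by keeping the first hit seen, which is an assumption.

```python
from collections import Counter

def count_best_hits(hits):
    """Count reads per gene, assigning each read to its lowest e-value hit.

    `hits` is an iterable of (read_id, gene_id, evalue) tuples, a
    hypothetical layout assumed for this sketch. On ties, the first hit
    encountered is kept (an assumption, not stated in the text).
    """
    best = {}  # read_id -> (gene_id, evalue) of the best hit so far
    for read_id, gene_id, evalue in hits:
        if read_id not in best or evalue < best[read_id][1]:
            best[read_id] = (gene_id, evalue)
    # Each read contributes exactly one count, to its best-matching gene.
    return Counter(gene_id for gene_id, _ in best.values())

# Hypothetical reads and gene names, for illustration only.
example_hits = [
    ("read1", "geneA", 1e-30),
    ("read1", "geneB", 1e-10),
    ("read2", "geneB", 1e-5),
    ("read3", "geneA", 1e-8),
    ("read3", "geneB", 1e-20),
]
example_counts = count_best_hits(example_hits)
```

Counting each read only once keeps multi-mapping reads from inflating the abundance of members of the same gene family.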

Homolog selection
A TBlastX [31] search of the transcriptomes of T. hassleriana and G. gynandra was performed with default parameters (no e-value cutoff) to obtain a maximum number of hits for subsequent filtering. To filter paralogs and orthologs from these results, CIP/CALP filtering was used [32]. Cumulative identity percentage (CIP) is defined as the sum of the number of matching nucleotides over all high-scoring segment pairs (HSPs) of a pair of genes, divided by the total length of those HSPs. Cumulative alignment length percentage (CALP) is defined as the sum of the alignment lengths of all HSPs of a matching gene pair, divided by the total length of the query sequence. Together these values give a reliable estimate of the similarity of two genes and are a more accurate filter than an e-value or bit score threshold. A CIP/CALP threshold of 50/50 was chosen as a suitable cutoff for orthology and/or paralogy.
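The two definitions above can be written out as a short sketch. The `HSP` record, the function names and the use of "greater than or equal to" at the 50/50 cutoff are assumptions made for illustration; overlapping HSPs are not merged here, so CALP can exceed 100% in this simplified form.

```python
from dataclasses import dataclass

@dataclass
class HSP:
    identities: int    # matching positions in this high-scoring segment pair
    align_length: int  # aligned length of this HSP

def cip(hsps):
    """Cumulative identity percentage: matches summed over HSPs / total HSP length."""
    total_length = sum(h.align_length for h in hsps)
    if total_length == 0:
        return 0.0
    return 100.0 * sum(h.identities for h in hsps) / total_length

def calp(hsps, query_length):
    """Cumulative alignment length percentage: total HSP length / query length."""
    if query_length == 0:
        return 0.0
    return 100.0 * sum(h.align_length for h in hsps) / query_length

def passes_cip_calp(hsps, query_length, cip_min=50.0, calp_min=50.0):
    """Apply the 50/50 threshold (inclusive comparison is an assumption)."""
    return cip(hsps) >= cip_min and calp(hsps, query_length) >= calp_min

# Hypothetical gene pair with two HSPs against a 200 nt query.
example_hsps = [HSP(identities=80, align_length=100), HSP(identities=30, align_length=50)]
example_keep = passes_cip_calp(example_hsps, query_length=200)
```

For the hypothetical pair above, CIP is 110/150 and CALP is 150/200, so the pair passes; the same HSPs against a 400 nt query would fail on CALP.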

Ks/4dtv calculation of paralog pairs
Paralogs identified with CIP/CALP filtering were aligned using Exonerate [33] with the coding2coding model parameter, using a custom output format through the "roll your own" parameter. The exact command line used was: exonerate -m c2c seq1.fasta seq2.fasta -ryo "%Pqs %Pts\n" -showalignment false -verbose 0. The output from this command was fed into CodeML from the PAML package [34] using standard parameters (CodonFreq = 2, kappa = 2, omega = 0.4). Output from PAML was parsed using custom Perl scripts to read the synonymous substitution rate (Ks) and the transversion rate at fourfold degenerate sites (4dtv). This workflow follows the established paralog identification pipeline Duppipe [35], but uses updated tools and more stringent selection through CIP/CALP.

Evidence of WGD in both species confirming a shared event
Using the transcript sets of G. gynandra and T. hassleriana, paralogs were matched to each other by BLAST search and CIP/CALP filtering. In total, 55,014 paralogs were found: 26,883 in T. hassleriana, covering 49% of transcript space, and 28,131 in G. gynandra, covering 48% of transcript space. For all paralog pairs, Ks and fourfold transversion substitutions (4dtv) were determined and binned to establish an evolutionary time distribution (Fig. 2). In both species a large gene birth event has taken place around Ks = 0.4 (Fig. 2, between Ks = 0.25 and Ks = 0.5), which corresponds to the Ks window established earlier for the Th-α hexaploidy event [28]. The same analysis was performed using 4dtv values, and the results were extremely similar. Enumerating the paralogs that fall within the Th-α peak, we see that 15,785 gene pairs in T. hassleriana are retained from the Th-α paleohexaploidy, or ~29% of the total transcriptome. For G. gynandra, 16,096 gene pairs fall within the Th-α window, or around 27% of all transcripts.
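The binning and window count described above can be illustrated with a small sketch. The Ks window of 0.25 to 0.5 comes from the text; the bin width of 0.05, the upper Ks limit of 3.0, the function names and the example values are assumptions chosen only for illustration.

```python
def bin_ks(ks_values, step=0.05, ks_max=3.0):
    """Histogram of Ks values with fixed-width bins (step and ks_max are assumed)."""
    n_bins = int(round(ks_max / step))
    counts = [0] * n_bins
    for ks in ks_values:
        index = int(ks // step)  # bin index by floor division
        if 0 <= index < n_bins:
            counts[index] += 1
    return counts

def fraction_in_window(ks_values, low=0.25, high=0.5):
    """Fraction of paralog pairs whose Ks falls inside the given window."""
    if not ks_values:
        return 0.0
    inside = sum(1 for ks in ks_values if low <= ks <= high)
    return inside / len(ks_values)

# Hypothetical Ks values for six paralog pairs.
example_ks = [0.02, 0.31, 0.42, 0.44, 1.23, 2.41]
example_hist = bin_ks(example_ks)
example_fraction = fraction_in_window(example_ks)
```

A peak in the histogram bins that cover the window, together with a large fraction of pairs inside it, is the kind of signal the text interprets as a shared duplication event. Values that sit exactly on a bin edge may land in either neighboring bin because of floating-point rounding.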

Duplicate loss and retention in essential C4 families
We examined six gene families that are essential in C4 photosynthesis in detail: NAD malic enzyme (NAD-ME), NADP malic enzyme (NADP-ME), β carbonic anhydrase (βCA), malate dehydrogenase (MDH), phosphoenolpyruvate carboxylase (PEPC) and phosphoenolpyruvate carboxykinase (PEPCK). Using Arabidopsis genes as a reference, homologous clusters were created using a [...] In most cases both Cleomaceae species have around 1.5 times the number of genes of A. thaliana except, interestingly, the NADP-ME family, where numbers are almost the same in all species. Also of note is that T. hassleriana has 16% more C4-related genes in total than G. gynandra (57 versus 49).
All genes of one species in a cluster were then aligned to each other, and the Ks value of each pairing was established and subsequently binned with a step size of Ks = 0.15 (Fig. 3). At the Ks corresponding to the Th-α hexaploidy, both T. hassleriana and G. gynandra show a relative increase of gene pairs with this number of synonymous substitutions. A. thaliana shows a similar, if slightly lower, increase at the Ks of its older At-α event. Even longer ago in evolutionary time, at the Ks corresponding to the β event, T. hassleriana has retained ~20% of C4-related genes, whereas G. gynandra and A. thaliana show 2% and 0% retention, respectively. The final confirmed paleohexaploidy that all three species share, the ancient γ event at Ks = 2.4, has contributed substantially to the genetic makeup of all three species. In A. thaliana the proportion of relations that stem from the γ paleohexaploidy is 23%, with the Cleomaceae at 15% and 21% for T. hassleriana and G. gynandra, respectively.
Table 1. C4 photosynthesis homolog cluster sizes in A. thaliana, T. hassleriana and G. gynandra. Both Cleomaceae species have around 1.5 times the number of genes of A. thaliana, except in the NADP-ME and NAD-ME families, where numbers are lower than average in the Cleomaceae species, resulting in a similar number of homologs in each species for these two gene groups.

Syntenic copy number variation
Syntenic analysis of the previously mentioned gene families was performed using CoGe SynFind [36]. Each C4-related ortholog of T. hassleriana was used as a query against T. hassleriana, Arabidopsis thaliana and A. arabicum [37], the latter as a basal representative of Brassicaceae. Thus for the T. hassleriana : A. thaliana : A. arabicum ortholog ratio we would theoretically expect 3 (Th-α) : 2 (At-α) : 2. Query results were enumerated and the average number of regions per family was determined (Fig. 4). For many families, the average is comparable to the 3:2:2 ratio, as is the overall average ratio of 3.6:2.1:2.5 (Fig. 4, rightmost set of bars). The exception is the NAD-ME family, which shows more retention than expected, with an ortholog ratio of 4.3:3.3:4.3. The PEPC family also seems slightly under-retained in Brassicaceae, with a ratio of 3.3:1.3:1.6. Unfortunately, syntenic data cannot be obtained without a sequenced genome, so data on syntenic regions of G. gynandra will have to be obtained in future work.
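One way to read the ratios above is to divide each observed average by its theoretical expectation, so that values above 1 suggest more syntenic regions than expected and values below 1 suggest fewer. The sketch below does only that arithmetic on the numbers quoted in the text; the function name and the rounding to two decimals are illustrative choices, not part of the original analysis.

```python
# Expected copy numbers for T. hassleriana : A. thaliana : A. arabicum,
# as stated in the text (3 from Th-alpha, 2 from At-alpha, 2).
EXPECTED_RATIO = (3, 2, 2)

# Observed average numbers of syntenic regions quoted in the text.
OBSERVED_RATIOS = {
    "average": (3.6, 2.1, 2.5),
    "NAD-ME": (4.3, 3.3, 4.3),
    "PEPC": (3.3, 1.3, 1.6),
}

def relative_to_expected(observed, expected=EXPECTED_RATIO):
    """Observed copy number divided by expected copy number, per species."""
    return tuple(round(o / e, 2) for o, e in zip(observed, expected))

relative = {family: relative_to_expected(obs) for family, obs in OBSERVED_RATIOS.items()}
```

On these numbers, NAD-ME sits above expectation in all three species, while PEPC falls below expectation only in the two Brassicaceae, which matches the description in the text.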

Regulation of photosynthetic homolog expression
Both Cleomaceae have substantially more copies of photosynthetic genes (Fig. 4). Using the Cleomaceae expression atlases [17], the expression of separate copies was compared in the C3 and the C4 species. In the expression atlas, the T. hassleriana coding sequence was used as a reference to map expression in both T. hassleriana and G. gynandra to a single Cleomaceae "unigene". Expression was quantified in nine different tissues, including three developmental series: development from young to mature leaf (six stages), root, stem, stamen, petal, carpel, sepal, a seedling developmental series (three stages) and a seed time series (three stages).
For the photosynthetic gene families (NAD-ME, NADP-ME, PEPCK, PEPC, MDH, CA), homolog selection resulted in a data set of 43 unigenes with expression data for both Cleomaceae species.
Table 2. List of Arabidopsis genes used as representatives of C4 photosynthesis families, by gene family and ATG identifier. ATG identifiers follow the ATG system from the Arabidopsis Information Resource [43].
Expression levels were normalized and compared among photosynthetic gene families, examples of which are plotted for NAD-ME and βCA (Fig. 5). Immediately noticeable are the highly similar expression profiles of G. gynandra when compared to the more chaotic profiles of T. hassleriana. This is observed in all except one gene family. G. gynandra has 176 expressed unigenes with a highly correlated expression pattern (Pearson correlation > 0.95), whereas in T. hassleriana 87 unigenes share a highly correlated expression pattern (Pearson correlation > 0.95). The expression clusters observed in G. gynandra in the βCA family also correspond to their highest-ranking A. thaliana match (Table 2). For example, the members of the cluster consisting of C.spinosa 00253, C.spinosa 13896, C.spinosa 18526 and C.spinosa 10164 all match best to the A. thaliana gene β carbonic anhydrase 4 (AT1G70410). The members of the cluster consisting of C.spinosa 07642 and C.spinosa 13410 both map to carbonic anhydrase 1 (AT3G01500). A similar pattern is present in NAD-ME, where the cluster of C.spinosa 03046 and C.spinosa 09126 maps to NAD-ME1 (AT2G13560) and the C.spinosa 12536 singleton maps to NAD-ME2 (AT4G00570).
In Fig. 5, leaf0-leaf5, seedling2-seedling6 and seed1-seed3 are time series of the same organ, with consecutive leaf and seedling stages separated by two days. Transcript levels in G. gynandra (lower graphs) are more strictly regulated across organs, seeds and seedlings. The chaotic patterns in T. hassleriana (upper graphs) result in about half as many genes having a Pearson correlation > 0.95 as in G. gynandra.
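A count of highly correlated unigenes, such as the one reported above, could be obtained along the following lines. This is a sketch under stated assumptions: a unigene is counted when its expression profile has a Pearson correlation above 0.95 with at least one other unigene, which is one possible reading of the text, and the function names, unigene labels and expression values are hypothetical.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile has no defined correlation; treated as 0 here
    return cov / math.sqrt(var_x * var_y)

def highly_correlated_unigenes(profiles, threshold=0.95):
    """Unigenes whose profile correlates above the threshold with at least one other.

    `profiles` maps a unigene name to its expression values across tissues.
    The "at least one partner" criterion is an assumption of this sketch.
    """
    hits = set()
    for (name_a, prof_a), (name_b, prof_b) in combinations(profiles.items(), 2):
        if pearson(prof_a, prof_b) > threshold:
            hits.update((name_a, name_b))
    return hits

# Hypothetical normalized expression over four tissues.
example_profiles = {
    "unigene1": [1.0, 2.0, 3.0, 4.0],
    "unigene2": [2.0, 4.0, 6.0, 8.1],
    "unigene3": [4.0, 1.0, 3.0, 2.0],
}
example_hits = highly_correlated_unigenes(example_profiles)
```

Running the same function on the profiles of each species separately and comparing the sizes of the returned sets would give a comparison of the kind reported in the text.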

Discussion and conclusions
In this study, we have analyzed the transcriptomes of the C3 T. hassleriana and C4 G. gynandra to address the potential contribution of WGD and recent gene duplicates to the evolution of photosynthesis and C4-pathway related genes. The initial comparison of T. hassleriana and G. gynandra was performed to identify the differential expression of key genes involved in the NAD-ME C4 biochemical pathway. However, it did not consider the role of gene duplicates. We show that distinct patterns emerge when the duplication history is taken into account.
We could confirm the Th-α hexaploidy that has been found in T. hassleriana using an independent transcriptome dataset. We also find that G. gynandra shares this WGD with T. hassleriana, further establishing the occurrence of WGD in this lineage. Based on the phylogenetic position of both species in Cleomaceae, the Th-α duplication took place before the divergence of the two species, which means that it is shared across Cleomaceae lineages 8-15 according to the latest phylogeny of the family [25]. Dating this polyploidy event in terms of absolute age is a difficult task. Here, however, we find that the Ks rate of G. gynandra is extremely similar, if not identical, to that of T. hassleriana. Assuming that mutation rates in these two species are the same, we can reaffirm the previous date estimate for Th-α of 13.7 mya [38].
The influence of the Th-α WGD event on photosynthetic gene composition is apparent both in ortholog number and in syntenic region copy number for both species. From absolute ortholog numbers we can see that there is no increased retention between Cleomaceae species, and even a slightly lower rate of retention in G. gynandra. This indicates that both species have experienced similar evolutionary constraints for a significant amount of time. We also need to consider that genes sharing a similar sequence do not necessarily share the same function. Even under strict CIP/CALP filtering, which has been shown to be an accurate measure for the prediction of true orthologs [32], differential expression in time, localization or regulation can substantially change the function of a gene. This is especially the case for genes in the core C4 photosynthesis pathway, where many C3 genes have been recruited into new functions [13,39].
When establishing Ks values of deeper ortholog nodes of photosynthesis genes, a large proportion of genes seem to have been retained from the γ duplication. For a trait that is likely to be highly dosage sensitive [23], we expect that gene loss will be rare and that remnants from this old paleohexaploidy are still present. However, considering the time that has passed since the γ paleohexaploidy event, and on the basis of absolute gene copy numbers, some gene loss has taken place that predates the transition from C3 to C4.
The evolutionary importance of WGD events is made clear by the dominant presence of retained Th-α genes in both Cleomaceae species. However, a question remains: can we couple this importance to the evolution of specific traits, in this case C4 photosynthesis? This is an old discussion, dating back to the work of Ohno, who was the first to suggest that the massive radiation of vertebrates was caused by a whole genome duplication in their ancestor [40]. An earlier study on the evolution of photosynthesis in soybean showed that the Calvin-Benson-Bassham cycle (CBBC) and the light-harvesting complex (LHC) gene families show a greater expansion from single gene duplications than both photosystem groups. This is explained by the increased dosage sensitivity of photosystem genes: if some subunits are expressed differently due to duplications while others are not, this is deleterious for the system as a whole [23]. This acts as a conservation mechanism for gene copy number that does not affect the more loosely connected enzyme collection of the CBBC and LHC genes.
In G. gynandra, where the expression of C4 genes is tightly linked in clusters, we would expect a high retention of orthologs. However, this dependency on transcriptional regulation has not led to an increased retention of photosynthetic genes, as evidenced by lower copy numbers for all C4 gene families when compared to T. hassleriana. It is therefore unlikely that neofunctionalization of genes after polyploidy has played a major role in the shift to C4 photosynthesis. The much more stringent transcriptional regulation of C4 cycle genes in G. gynandra compared to T. hassleriana, as evidenced in this study, is in accordance with the alternative hypothesis, which states that this process was mainly due to recruitment of existing genes in transcriptional space, as suggested by several authors [12,14,41,42].
We still have much to learn regarding the development of C4 photosynthesis. When studying this exceptional trait, we must always consider the genetic history of the species in question. Here, we give evidence that duplications, on both large and small scales, contribute to trait evolution. The exact mechanisms behind the recruitment of these genes into new biochemical pathways, however, are still largely unknown. Current sequencing efforts for G. gynandra will significantly aid in finding the detailed mechanisms of gene and C4 photosynthesis evolution. The Cleome genus provides an excellent model system for unraveling the evolutionary origin and workings of C4 photosynthesis and will hopefully enable us to harvest the fruits of our knowledge on this remarkable form of plant energy conversion.