More than 20 years after it was proposed that plant genomes must contain genes derived from the cyanobacterial ancestor of the plastid (Weeden, 1981), the full impact of endosymbiotic gene transfer is only now being revealed. In a recent study published in the Proceedings of the National Academy of Sciences, Martin et al (2002) show that the contribution of cyanobacterial genes to the nuclear genome of the flowering plant Arabidopsis extends far beyond genes associated with photosynthesis or the plastid. Cyanobacterial-derived genes appear to make up a large fraction of the plant genome, and they encode not only proteins that service the plastid but also proteins involved in a wide range of other cellular functions. These results underscore the importance of endosymbiosis in shaping the biochemistry and cell biology of eukaryotic cells.

In 1981 Weeden crystallized a major component of the theory of endosymbiosis when he proposed that plant nuclear genomes contained genes originating from the cyanobacterium that gave rise to the plastid. Weeden knew that plastids contained far more proteins than their reduced genomes could possibly encode. He suggested that these proteins were originally encoded in the endosymbiont genome but were transferred to the host nucleus during the early stages of endosymbiosis. He also proposed that the protein products of these genes were somehow targeted to the plastid after translation, so that the proteins ended up where they had always been. At the time this idea was galvanizing; today it is taken for granted. Hundreds of genes for plastid-targeted proteins have been described in plant nuclei, and the process by which these proteins are targeted to the plastid is now largely understood (McFadden, 1999).

While Martin et al (2002) now elucidate the full impact of endosymbiosis in shaping the plant nuclear genome, the idea that the cyanobacterial endosymbiont contributed more genes to the nucleus than strictly necessary for plastid function is not new. For example, higher plants contain two nuclear-encoded isoforms of the enzyme phosphoglycerate kinase (PGK), one functioning in the plastid, the other in the cytosol. Despite the different evolutionary histories of the cellular compartments in which they function, both PGK isoforms are cyanobacterial in origin. Apparently the cyanobacterial PGK gene was duplicated during plant evolution, with one copy servicing the plastid and the other taking over the role of the pre-existing (noncyanobacterial) cytosolic protein (Brinkmann and Martin, 1996). This phenomenon is known as endosymbiotic gene replacement, and while a few cases have been well documented, its overall contribution to the nuclear genome of plants has not been clear. Armed with the complete set of proteins encoded in the nuclear genome of Arabidopsis (The Arabidopsis Genome Initiative, 2000), Martin and co-workers were able to tackle this question on a large scale.

The researchers compared 24 990 Arabidopsis proteins to those encoded in a set of completely sequenced archaeal, bacterial, and cyanobacterial genomes, as well as that of yeast. Of the 9368 proteins that produced a significant match in at least one reference genome, about 1700 (18%) were most similar to a cyanobacterial homologue. Extrapolating to the genome as a whole, the authors estimated that about 4500 Arabidopsis nuclear genes are of cyanobacterial origin. Whether this is an overestimate or an underestimate (there are arguments for both), it is an unexpectedly large number. Indeed, the estimated 4500 cyanobacterial genes in the Arabidopsis nucleus exceed the total gene complement of the cyanobacterium Synechocystis (Kaneko et al, 1996) by more than 1000 and amount to over 60% of the number of genes encoded in the largest sequenced cyanobacterial genome, that of Nostoc (Meeks et al, 2001). While subsequent analysis and new data are certain to revise this estimate somewhat (eg, see Rujan and Martin, 2001), it is clear that the cyanobacterial endosymbiont gave vastly more of its genome to the host than previously appreciated.
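The arithmetic behind the genome-wide estimate can be reproduced as a rough check. The sketch below assumes the simplest possible extrapolation: the cyanobacterial fraction observed among proteins with a significant match is applied unchanged to the full set of 24 990 proteins. The exact procedure used by Martin et al (2002) may differ, and the constant and function names are hypothetical, chosen only for illustration.

```python
# Back-of-the-envelope reconstruction of the extrapolation described above.
# Assumption (hypothetical): the cyanobacterial fraction seen among matched
# proteins applies unchanged to the whole predicted proteome. This is an
# illustrative sketch, not necessarily the authors' exact procedure.

TOTAL_PROTEINS = 24_990    # Arabidopsis proteins compared
MATCHED_PROTEINS = 9_368   # proteins with a significant match in >= 1 reference genome
CYANO_BEST_HITS = 1_700    # approx. proteins whose best match is cyanobacterial


def extrapolate(total: int, matched: int, cyano_hits: int) -> tuple[float, float]:
    """Return (cyanobacterial fraction among matched proteins, genome-wide estimate)."""
    fraction = cyano_hits / matched
    return fraction, fraction * total


fraction, estimate = extrapolate(TOTAL_PROTEINS, MATCHED_PROTEINS, CYANO_BEST_HITS)
print(f"fraction among matched proteins: {fraction:.1%}")          # 18.1%
print(f"genome-wide estimate: ~{round(estimate, -2):.0f} genes")   # ~4500 genes
```

Using the rounded figure of 18% instead gives roughly 4498 genes, so either way the estimate lands near 4500. The calculation also makes the key assumption explicit: the proteins with no significant match are treated as if they had the same origin profile as the matched ones.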

However, the significance of this observation lies not so much in the sheer number of genes involved as in the diversity of cellular functions predicted for the proteins they encode. Metabolism, cell growth and division, intracellular transport, cell organization, and transcription are all implicated. Even more remarkably, fewer than half of the cyanobacterial-like proteins in Arabidopsis are predicted to be targeted to the plastid, leading the authors to conclude that the impact of plastid endosymbiosis on the host was far greater than the mere acquisition of an organelle (Martin et al, 2002). One of the steps in the textbook explanation of endosymbiotic organelle origins is the severe reduction of the endosymbiont and its genome. This may still hold in a sense, but at least in the case of plastids it appears that much of the endosymbiont genome survived this reduction by relocating to the nucleus and finding new roles in the cell. Apparently endosymbiosis creates an influx of raw genetic material, and the mixing and matching of this material with existing host genes fosters a period of invention for the host.

Decades after the general acceptance of an endosymbiotic origin for plastids, various aspects of the process and its implications remain to be fully understood. One aspect of plastid evolution that is worth considering in light of these new findings is secondary endosymbiosis. While all plastids are ultimately derived from the original endosymbiosis between a eukaryote and a cyanobacterium, plastids have also spread laterally among eukaryotes. Secondary endosymbiosis occurs when a eukaryotic alga is engulfed by a second, heterotrophic eukaryote and the two integrate to form a new algal lineage (Archibald and Keeling, 2002). This phenomenon accounts for much of algal diversity, and the genetic contribution of these endosymbionts to their hosts is particularly interesting because the endosymbiont brings with it a large eukaryotic genome. The integration of endosymbiont nuclear genes into the secondary host nucleus may well be easier, because eukaryotic genes and the proteins they encode are likely to be incorporated into a new eukaryotic background more readily than prokaryotic genes would be. However, such replacements will be far more difficult to detect, since the host and the endosymbiotic alga are both eukaryotes and therefore much more closely related to each other than either is to cyanobacteria. To date, little sequence information exists from the nuclear genomes of most of these organisms, but in time these genomes should provide another glimpse into the effects of endosymbiotic mergers at the molecular level.