Nicholas A. Stover, André R. O. Cavalcanti, Anya J. Li, Brian C. Richardson, Laura F. Landweber, Reciprocal Fusions of Two Genes in the Formaldehyde Detoxification Pathway in Ciliates and Diatoms, Molecular Biology and Evolution, Volume 22, Issue 7, July 2005, Pages 1539–1542, https://doi.org/10.1093/molbev/msi151
Abstract
During a pilot genome project for the ciliate Oxytricha trifallax, we discovered a fusion gene not previously described in any taxon. This gene, FSF1, encodes a putative fusion protein comprising an entire formaldehyde dehydrogenase (FALDH) homolog at one end and an S-formylglutathione hydrolase (SFGH) homolog at the other, two proteins that catalyze serial steps in the formaldehyde detoxification pathway. We confirmed the presence of the Oxytricha fusion gene in vivo and detected transcripts of the full-length fusion gene. A survey of other large-scale sequencing projects revealed a similar fusion protein in a distantly related ciliate, Tetrahymena thermophila, and a possible fusion of these two genes in the diatoms Phaeodactylum tricornutum and Thalassiosira pseudonana, but in the reverse order, with the SFGH domain encoded upstream of the FALDH domain. Orthologs of these fusion proteins may be widespread within the ciliates and diatoms.
Introduction
Development of the ciliate macronucleus involves widespread chromosome breakage followed by telomere addition (reviewed in Prescott 1994; Jahn and Klobutcher 2002). Following these processing events, most macronuclear chromosomes sequenced to date from spirotrichous ciliates contain the coding region of a single gene flanked by telomeres and short untranslated regions (Riley and Katz 2001; D. M. Prescott, J. D. Prescott, and R. M. Prescott 2002; Chang et al. 2004). Recently, 1,356 macronuclear chromosomes from the spirotrichous ciliate Oxytricha trifallax (Sterkiella histriomuscorum) were cloned and end sequenced as part of a pilot genome project (Doak et al. 2003). During annotation of these sequences, we identified a small set of putative multigene chromosomes by searching for chromosomes that contained different genes at opposing ends (Cavalcanti et al. 2004). One of these cloned chromosomes appeared to encode a putative formaldehyde dehydrogenase (FALDH) homolog at one end and a putative S-formylglutathione hydrolase (SFGH) at the other. We sequenced the entire chromosome and found open reading frames (ORFs) capable of encoding both full-length proteins. Intriguingly, both ORFs were oriented in the same direction on the chromosome and were separated by only 30 bp, equivalent to 10 amino acids. We could find neither a stop codon at the 3′ end of the upstream (FALDH) ORF nor a start methionine codon at the beginning of the downstream (SFGH) ORF. These observations, together with the preservation of the reading frame, suggested that the two ORFs have merged to encode a bifunctional fusion protein. Genomic DNA polymerase chain reaction and 3′ rapid amplification of cDNA ends (RACE) confirmed the presence and transcription (data not shown), respectively, of this O. trifallax gene, which we have called FSF1 (FALDH/S-formylglutathione hydrolase fusion 1).
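The three observations above (a linker whose length is a multiple of three, no stop codon before the downstream domain, and no start methionine codon on the downstream ORF) can be expressed as a simple check. The following is a minimal sketch, not the procedure used in this study. The function name, its arguments, and the toy sequence are hypothetical, and the stop codon set assumes the ciliate nuclear code (NCBI translation table 6), in which only TGA terminates translation.

```python
# Assumption: ciliate nuclear code (NCBI translation table 6), where TAA and
# TAG encode glutamine and TGA is the only stop codon.
CILIATE_STOPS = frozenset({"TGA"})


def check_fusion(seq, upstream_end, downstream_start, stops=CILIATE_STOPS):
    """Test whether two same-strand ORFs could share one reading frame.

    seq              -- DNA string whose index 0 is the first base of the
                        upstream ORF
    upstream_end     -- index just past the last codon of the upstream ORF
    downstream_start -- index of the first codon of the downstream ORF
    """
    seq = seq.upper()
    linker_len = downstream_start - upstream_end
    # Both ORFs and the linker must sit in the same frame.
    in_frame = (
        upstream_end % 3 == 0 and linker_len >= 0 and linker_len % 3 == 0
    )
    # Every codon from the upstream start to the downstream domain.
    codons = [seq[i:i + 3] for i in range(0, downstream_start, 3)]
    stop_free = in_frame and not any(codon in stops for codon in codons)
    return {
        "in_frame": in_frame,
        "linker_codons": linker_len // 3 if in_frame else None,
        "stop_free": stop_free,
        "downstream_has_start_atg":
            seq[downstream_start:downstream_start + 3] == "ATG",
    }


# Toy sequence (hypothetical, not the FSF1 sequence): a 9-nt upstream ORF
# stub, a 30-nt linker, and a downstream stub that lacks a start ATG.
toy = "ATGGCTAAA" + "GGT" * 10 + "GCTTTTTGA"
print(check_fusion(toy, upstream_end=9, downstream_start=39))
```

A result with `in_frame` and `stop_free` both true and `downstream_has_start_atg` false matches the pattern described for FSF1, although only transcript and protein evidence can confirm that a fusion product is actually made.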
We subsequently found both a genomic sequence and an expressed sequence tag (EST) containing a gene encoding a similar fusion protein in a distantly related ciliate, Tetrahymena thermophila (fig. 1), suggesting that this fusion gene may be widespread among ciliate taxa. A further search of genome sequences available online revealed a fusion of the FALDH and SFGH genes in another protist group, the diatoms. There the two genes are found in the opposite arrangement, with the SFGH domain located N-terminal to the FALDH domain (fig. 2), suggesting an independent fusion event in this clade.
FALDH and SFGH, the two putative proteins that these gene fusions merge, are both members of evolutionarily ancient protein families, with homologs of each protein present in a wide variety of prokaryotes and eukaryotes. While enzymes in both families are known to act on a broad range of substrates, much research has been devoted to their mutual involvement in the detoxification of intracellular formaldehyde (Harms et al. 1996). FALDH, also known as alcohol dehydrogenase III (Holmquist and Vallee 1991), catalyzes the formation of S-formylglutathione from S-hydroxymethylglutathione, which forms spontaneously upon the interaction of formaldehyde and glutathione. SFGH catalyzes the second step in the detoxification reaction, in which glutathione is restored and the formyl group is released as formate. Though ours is the first report of a genetic linkage between these two genes in eukaryotes, prokaryotic FALDH and SFGH genes are adjacent in the genomes of diverse bacterial species (Blattner et al. 1997; Shaw, Arioli, and Plazinski 1998) or separated by only a few genes (Harms et al. 1996). Physical linkage of these functionally related proteins may provide a selective advantage to the protists described here; however, isolation of these fusion proteins from cells or their expression in a heterologous system would be needed to determine how the fusion affects the activity of either half of the protein.
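The pathway described above can be summarized as a reaction scheme. This is a sketch based on the text, with two additions that the text does not state: NAD+ is shown as the FALDH cofactor, as is generally reported for glutathione-dependent formaldehyde dehydrogenases, and water is shown as the second substrate of the hydrolase. GSH denotes glutathione.

```latex
\begin{align*}
\mathrm{HCHO} + \mathrm{GSH}
  &\rightleftharpoons \text{$S$-hydroxymethylglutathione}
  && \text{(spontaneous)} \\
\text{$S$-hydroxymethylglutathione} + \mathrm{NAD^{+}}
  &\xrightarrow{\text{FALDH}} \text{$S$-formylglutathione} + \mathrm{NADH} + \mathrm{H^{+}} \\
\text{$S$-formylglutathione} + \mathrm{H_2O}
  &\xrightarrow{\text{SFGH}} \mathrm{GSH} + \mathrm{HCOO^{-}} + \mathrm{H^{+}}
\end{align*}
```

Glutathione consumed in the first step is regenerated in the third, so the net effect is the oxidation of formaldehyde to formate.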
While no naturally occurring fusion of the FALDH and SFGH genes has been described prior to this report, a number of other protein-coding gene fusions have been observed in both prokaryotes (Suhre and Claverie 2004) and eukaryotes. Fusions of genes involved in pyrimidine production and modification have recently been used to aid in studies of eukaryote evolution. The first three enzymes of the six-step pyrimidine biosynthetic pathway are fused at the genetic level in unikonts (animals, fungi, and amoebozoans) and exist as a fusion protein in these species (Nara, Hashimoto, and Aoki 2000). The fifth and sixth genes in the same pathway, which code for orotate phosphoribosyltransferase (OPRT) and orotidine-5′-monophosphate decarboxylase (OMPDC), have fused into a separate multidomain protein (OPRT-OMPDC) in many eukaryotes (Nara, Hashimoto, and Aoki 2000). The fusion of these two genes appears to have occurred independently in trypanosomatids, where OPRT comprises the C-terminal half of the protein (OMPDC-OPRT). These combinatorial fusions are akin to the reciprocal arrangements we report here for the FALDH and SFGH genes of ciliates and diatoms. While in both cases these arrangements most likely indicate independent fusion of coexpressed, functionally related proteins, it is possible that the constituent domains swapped positions following their initial fusions. Further analysis at the base of the ciliate and diatom trees may help determine whether the diatom and ciliate genes fused independently or were rearranged in one or the other lineage.
In later steps of pyrimidine synthesis, thymidylate synthase (TS) catalyzes the methylation of deoxyuridine monophosphate to form deoxythymidine monophosphate and dihydrofolate reductase (DHFR) catalyzes the reduction of 7,8-dihydrofolate, a by-product of the methylation reaction (Myllykallio et al. 2003). The TS and DHFR genes are transcribed separately in unikonts and prokaryotes but have fused to encode a DHFR-TS protein in bikonts (plants and many protist species, including ciliates and diatoms). This gene fusion has provided evidence that bikonts form a single clade, which diverged early in eukaryote evolution (Stechmann and Cavalier-Smith 2002; Stechmann and Cavalier-Smith 2003).
With the help of the above fusion genes, the root of the eukaryotic tree has recently been suggested to be between unikonts and bikonts (reviewed in Baldauf 2003). However, determining the evolutionary relationships among the many clades within these two divisions still remains a major challenge for evolutionary biologists. A thorough investigation of FALDH-SFGH and SFGH-FALDH gene fusions in a variety of protists may help define the origins of two major bikont clades.
Methods
We amplified and sequenced the FSF1 gene from O. trifallax (S. histriomuscorum) DNA (Chang et al. 2004) using primers corresponding to portions of GenBank accession numbers CC819739 and CC819368. The complete sequence of the macronuclear chromosome containing the O. trifallax FALDH-SFGH fusion (FSF1) gene has been deposited in GenBank under accession number AY63987. The T. thermophila FSF1 gene was identified in a TBlastN search of the T. thermophila genome sequence scaffolds on the Blast server at The Institute for Genomic Research, with the predicted O. trifallax Fsf1p protein as the query. The coding sequence of this gene is located between base pairs 542481 and 544951 of scaffold 8254688 of T. thermophila Genome Assembly 2, November_2003, and contains two introns in the FALDH region. These preliminary sequence data were obtained from The Institute for Genomic Research Web site at http://www.tigr.org. The T. thermophila EST clone 50072-2-7-H06 contains sequences corresponding to the genomic clone in the areas encoding FALDH (GenBank accession number BM395174, base pairs 99–494) and SFGH (BM395175, base pairs 149–724) (Turkewitz, A. P., K. M. Karrer, C. L. Jahn, E. Orias, K. E. Kirk, J. Frankel, and L. A. Klobutcher, personal communication).
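A comparable protein-versus-translated-genome search can be run locally with the NCBI BLAST+ `tblastn` program. The sketch below only builds the command line; it is not the web-server search described above. The file and database names are hypothetical, the E-value cutoff is an arbitrary illustration, and translating the database with genetic code 6 (ciliate nuclear) is an assumption that suits T. thermophila.

```python
import shlex


def build_tblastn_command(query_fasta, database, genetic_code=6, evalue=1e-5):
    """Return the argument list for a local BLAST+ tblastn search.

    genetic_code is passed to -db_gencode, which selects the table used to
    translate the nucleotide database (6 = ciliate nuclear code).
    """
    return [
        "tblastn",
        "-query", str(query_fasta),
        "-db", str(database),
        "-db_gencode", str(genetic_code),
        "-evalue", str(evalue),
        "-outfmt", "6",  # tabular output, one hit per line
    ]


# Hypothetical inputs: a FASTA file holding the Fsf1p protein sequence and a
# nucleotide BLAST database built from the genome scaffolds.
cmd = build_tblastn_command("fsf1p.faa", "tthermophila_scaffolds")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where BLAST+ is installed and the database has been built with `makeblastdb`.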
Searches of the Thalassiosira pseudonana genome were performed using the Joint Genome Institute T. pseudonana Blast server at http://aluminum.jgi-psf.org/prod/bin/runBlast.pl?db=thaps1. The T. pseudonana SFGH-FALDH fusion (SFF1) gene is located between base pairs 60683 and 63183 of scaffold 8 (Release Version 1) (Armbrust et al. 2004). Online database searches with the T. pseudonana SFF1 gene as the query, run on the National Center for Biotechnology Information Blast server (Altschul et al. 1997), identified two overlapping EST clones from Phaeodactylum tricornutum. These sequences are listed in GenBank under accession numbers CD378851 and CD382924 (Scala et al. 2002).
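The coordinates reported in this section translate into genomic span lengths as follows. This is a small worked sketch that assumes the positions are 1-based and inclusive, which the text does not state. The class and label names are illustrative.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    label: str
    start: int
    end: int

    @property
    def length(self):
        # Assumption: 1-based, inclusive coordinates.
        return self.end - self.start + 1


# Coordinates as given in the Methods.
regions = [
    Interval("T. thermophila FSF1, scaffold 8254688", 542481, 544951),
    Interval("T. pseudonana SFF1, scaffold 8", 60683, 63183),
    Interval("EST BM395174, FALDH region", 99, 494),
    Interval("EST BM395175, SFGH region", 149, 724),
]

for region in regions:
    print(f"{region.label}: {region.length} bp")
```

Under that assumption the two genes span 2,471 bp and 2,501 bp of genomic sequence. The T. thermophila span includes its two introns, so it overstates the length of the coding sequence itself.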
Present address: Department of Genetics, Stanford University School of Medicine.
Geoffrey McFadden, Associate Editor
We thank Wei-Jen Chang for his gift of the O. trifallax RNA and Aaron Turkewitz for assistance with the T. thermophila EST clone. Preliminary genomic sequence data for T. thermophila were obtained from The Institute for Genomic Research Web site at http://www.tigr.org. Thalassiosira pseudonana genome sequence data were produced by the U.S. Department of Energy Joint Genome Institute, http://www.jgi.doe.gov. This work was supported by National Institute of General Medical Sciences Grant GM59708 and National Science Foundation Grant EIA0121422 to L.F.L.
References
Altschul, S. F., T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman.
Armbrust, E. V., J. A. Berges, C. Bowler et al. (42 co-authors).
Blattner, F. R., G. Plunkett III, C. A. Bloch et al. (14 co-authors).
Cavalcanti, A. R., N. A. Stover, L. Orecchia, T. G. Doak, and L. F. Landweber.
Chang, W. J., N. A. Stover, V. M. Addis, and L. F. Landweber.
Doak, T. G., A. R. Cavalcanti, N. A. Stover, D. M. Dunn, R. Weiss, G. Herrick, and L. F. Landweber.
Harms, N., J. Ras, W. N. Reijnders, R. J. van Spanning, and A. H. Stouthamer.
Holmquist, B., and B. L. Vallee.
Jahn, C. L., and L. A. Klobutcher.
Marchler-Bauer, A., J. B. Anderson, C. DeWeese-Scott et al. (24 co-authors).
Myllykallio, H., D. Leduc, J. Filee, and U. Liebl.
Nara, T., T. Hashimoto, and T. Aoki.
Prescott, D. M., J. D. Prescott, and R. M. Prescott.
Riley, J. L., and L. A. Katz.
Scala, S., N. Carels, A. Falciatore, M. L. Chiusano, and C. Bowler.
Shaw, W. H., T. Arioli, and J. Plazinski.
Stechmann, A., and T. Cavalier-Smith.
Suhre, K., and J. M. Claverie.