Abstract

During a pilot genome project for the ciliate Oxytricha trifallax, we discovered a fusion gene not previously described in any taxon. This gene, FSF1, encodes a putative fusion protein comprising an entire formaldehyde dehydrogenase (FALDH) homolog at one end and an S-formylglutathione hydrolase (SFGH) homolog at the other, two proteins that catalyze serial steps in the formaldehyde detoxification pathway. We confirmed the presence of the Oxytricha fusion gene in vivo and detected transcripts of the full-length fusion gene. A survey of other large-scale sequencing projects revealed a similar fusion protein in a distantly related ciliate, Tetrahymena thermophila, and a possible fusion of these two genes in the diatoms Phaeodactylum tricornutum and Thalassiosira pseudonana, but in the reverse order, with the SFGH domain encoded upstream of the FALDH domain. Orthologs of these fusion proteins may be widespread within the ciliates and diatoms.

Introduction

Development of the ciliate macronucleus involves widespread chromosome breakage followed by telomere addition (reviewed in Prescott 1994; Jahn and Klobutcher 2002). Following these processing events, most macronuclear chromosomes sequenced to date from spirotrichous ciliates contain the coding region of a single gene flanked by telomeres and short untranslated regions (Riley and Katz 2001; D. M. Prescott, J. D. Prescott, and R. M. Prescott 2002; Chang et al. 2004). Recently, 1,356 macronuclear chromosomes from the spirotrichous ciliate Oxytricha trifallax (Sterkiella histriomuscorum) were cloned and end sequenced as part of a pilot genome project (Doak et al. 2003). During annotation of these sequences, we identified a small set of putative multigene chromosomes by searching for chromosomes that contained different genes at opposing ends (Cavalcanti et al. 2004). One of these cloned chromosomes appeared to encode a putative formaldehyde dehydrogenase (FALDH) homolog at one end and a putative S-formylglutathione hydrolase (SFGH) at the other. We sequenced the entire chromosome and found open reading frames (ORFs) capable of encoding both full-length proteins. Intriguingly, both ORFs were oriented in the same direction on the chromosome and were separated by only 30 bp, corresponding to 10 amino acids. We could find neither a stop codon at the 3′ end of the upstream (FALDH) ORF nor a start methionine codon at the beginning of the downstream (SFGH) ORF. These observations, together with the preservation of the reading frame, suggested that the two ORFs have merged to encode a bifunctional fusion protein. Genomic DNA polymerase chain reaction and 3′ rapid amplification of cDNA ends (RACE) confirmed the presence and transcription, respectively, of this O. trifallax gene (data not shown), which we have called FSF1 (FALDH/S-formylglutathione hydrolase fusion 1).
We subsequently found both a genomic sequence and an expressed sequence tag (EST) containing a gene encoding a similar fusion protein in a distantly related ciliate, Tetrahymena thermophila (fig. 1), suggesting that this fusion gene may be widespread among ciliate taxa. A further search of genome sequences available online revealed a fusion of the FALDH and SFGH genes in another protist group, the diatoms. There the two genes are found in the opposite arrangement, with the SFGH domain located N-terminal to the FALDH domain (fig. 2), suggesting an independent fusion event in this clade.
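The reading-frame argument used above to infer a fusion can be expressed as a short check. The following is a minimal sketch, not the procedure used in this study: the function name, the coordinate convention, and the toy sequence are illustrative assumptions. It also assumes the standard ciliate nuclear genetic code (NCBI translation table 6), in which TGA is the only stop codon; the genetic code is not specified in the text.

```python
# Stop codons under NCBI translation table 6 (ciliate nuclear code), where
# TAA and TAG encode glutamine and only TGA terminates translation.
# Using this table is an assumption; the text does not state it.
CILIATE_STOPS = frozenset({"TGA"})


def linker_keeps_frame(seq, upstream_end, downstream_start, stops=CILIATE_STOPS):
    """Return True if the linker between two ORFs preserves the reading frame.

    seq              -- DNA sequence of the sense strand
    upstream_end     -- 0-based index just past the last codon of the upstream ORF
    downstream_start -- 0-based index of the first codon of the downstream ORF
    """
    linker = seq[upstream_end:downstream_start].upper()
    if len(linker) % 3 != 0:
        return False  # a length not divisible by 3 shifts the downstream frame
    codons = [linker[i:i + 3] for i in range(0, len(linker), 3)]
    # The linker must be free of in-frame stop codons to be read through.
    return not any(codon in stops for codon in codons)


# Toy example (invented sequence, NOT the FSF1 gene): two upstream codons,
# a 30-bp linker of 10 codons, and three downstream codons.
toy = "ATGGCT" + "GGTTCTGCAGGATCCGCTGGTTCAGCAGGT" + "CCTAAAGAA"
print(linker_keeps_frame(toy, 6, 36))
```

A linker that passes this check is consistent with a single continuous ORF, but, as in the text, transcript evidence such as 3′ RACE is still needed to show that the fusion is expressed.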

FIG. 1.—

Putative Fsf1p proteins translated from the FSF1 genes of the ciliates Oxytricha trifallax (Sterkiella histriomuscorum) and Tetrahymena thermophila, aligned using the Blast 2 sequences tool at National Center for Biotechnology Information (Tatusova and Madden 1999), using default values. Amino acids linking the FALDH (N-terminal) and SFGH domains (C-terminal) are shaded.

FIG. 2.—

Conserved protein domains encoded by the fusion genes described in this paper, identified by comparisons to the Conserved Domain Database (CDD) at National Center for Biotechnology Information (Marchler-Bauer et al. 2003). (A) The Fsf1p fusion proteins encoded by the ciliates Oxytricha trifallax and Tetrahymena thermophila. (B) The Sff1p fusion proteins encoded by the diatoms Thalassiosira pseudonana and Phaeodactylum tricornutum. Top-scoring CDD domains are ADH_zinc_N (gray bars) and esterase (black bars), except for the SFGH region of the P. tricornutum gene, in which KOG3101 had the second highest score after COG0627. COG0627 is also an esterase domain. Numbers represent amino acid positions in the first three predicted proteins and in the peptide predicted by the overlapping P. tricornutum ESTs (see Methods).

FALDH and SFGH, the two putative proteins that these gene fusions merge, are both members of evolutionarily ancient protein families, with homologs of each protein present in a wide variety of prokaryotes and eukaryotes. While enzymes in both families are known to act on a broad range of substrates, much research has been devoted to their mutual involvement in the detoxification of intracellular formaldehyde (Harms et al. 1996). FALDH, also known as alcohol dehydrogenase III (Holmquist and Vallee 1991), catalyzes the formation of S-formylglutathione from S-hydroxymethylglutathione, which forms spontaneously upon the interaction of formaldehyde and glutathione. SFGH catalyzes the second step in the detoxification reaction, in which glutathione is restored and the formyl group is released as formate. Though ours is the first report of a genetic linkage between these two genes in eukaryotes, prokaryotic FALDH and SFGH genes are adjacent in the genomes of diverse bacterial species (Blattner et al. 1997; Shaw, Arioli, and Plazinski 1998) or separated by only a few genes (Harms et al. 1996). Physical linkage of these functionally related proteins may provide a selective advantage to the protists described here; however, isolation of these fusion proteins from cells or their expression in a heterologous system would be needed to determine how the fusion affects the activity of either half of the protein.
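The two enzymatic steps described above, preceded by the spontaneous adduct formation, can be summarized as a reaction scheme. This is a sketch for orientation only: the NAD+ cofactor and the proton bookkeeping come from general biochemistry of glutathione-dependent formaldehyde dehydrogenases and are not stated in the text. GSH denotes glutathione, GS-CH2OH is S-hydroxymethylglutathione, and GS-CHO is S-formylglutathione.

```latex
\begin{align*}
\mathrm{HCHO} + \mathrm{GSH}
  &\rightleftharpoons \mathrm{GS{-}CH_2OH}
  && \text{(spontaneous)} \\
\mathrm{GS{-}CH_2OH} + \mathrm{NAD^+}
  &\xrightarrow{\text{FALDH}} \mathrm{GS{-}CHO} + \mathrm{NADH} + \mathrm{H^+}
  && \text{(step 1)} \\
\mathrm{GS{-}CHO} + \mathrm{H_2O}
  &\xrightarrow{\text{SFGH}} \mathrm{GSH} + \mathrm{HCOO^-} + \mathrm{H^+}
  && \text{(step 2)}
\end{align*}
```

Written this way, the product of the FALDH step is the substrate of the SFGH step, which is the sense in which the two fused domains catalyze serial reactions.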

While no naturally occurring fusion of the FALDH and SFGH genes has been described prior to this report, a number of other protein-coding gene fusions have been observed in both prokaryotes (Suhre and Claverie 2004) and eukaryotes. Fusions of genes involved in pyrimidine production and modification have recently been used to aid in studies of eukaryote evolution. The first three enzymes of the six-step pyrimidine biosynthetic pathway are fused at the genetic level in unikonts (animals, fungi, and amoebozoans) and exist as a fusion protein in these species (Nara, Hashimoto, and Aoki 2000). The fifth and sixth genes in the same pathway, which code for orotate phosphoribosyltransferase (OPRT) and orotidine-5′-monophosphate decarboxylase (OMPDC), have fused into a separate multidomain protein (OPRT-OMPDC) in many eukaryotes (Nara, Hashimoto, and Aoki 2000). The fusion of these two genes appears to have occurred independently in trypanosomatids, where OPRT comprises the C-terminal half of the protein (OMPDC-OPRT). These combinatorial fusions are akin to the reciprocal arrangements we report here for the FALDH and SFGH genes of ciliates and diatoms. While in both cases these arrangements most likely indicate independent fusion of coexpressed, functionally related proteins, it is possible that the constituent domains swapped positions following their initial fusions. Further analysis at the base of the ciliate and diatom trees may help determine whether the diatom and ciliate genes fused independently or were rearranged in one or the other lineage.
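The reciprocal arrangements discussed above amount to a difference in domain order along the protein. The following minimal sketch shows one way to derive an architecture label from conserved-domain hits; the function name and the coordinates are invented placeholders, not values taken from figure 2.

```python
def domain_order(hits):
    """Return domain names joined from N-terminus to C-terminus.

    hits -- iterable of (domain_name, start, end) tuples in amino acid
            coordinates, e.g. parsed from a conserved-domain search.
    """
    ordered = sorted(hits, key=lambda hit: hit[1])  # sort by start position
    return "-".join(name for name, _start, _end in ordered)


# Hypothetical coordinates for illustration only.
ciliate_like = [("FALDH", 1, 380), ("SFGH", 391, 670)]
diatom_like = [("FALDH", 300, 680), ("SFGH", 1, 290)]

print(domain_order(ciliate_like))
print(domain_order(diatom_like))
```

A label of this kind records only the order of the domains; it cannot by itself distinguish independent fusion events from a later rearrangement within one lineage.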

In later steps of pyrimidine synthesis, thymidylate synthase (TS) catalyzes the methylation of deoxyuridine monophosphate to form deoxythymidine monophosphate and dihydrofolate reductase (DHFR) catalyzes the reduction of 7,8-dihydrofolate, a by-product of the methylation reaction (Myllykallio et al. 2003). The TS and DHFR genes are transcribed separately in unikonts and prokaryotes but have fused to encode a DHFR-TS protein in bikonts (plants and many protist species, including ciliates and diatoms). This gene fusion has provided evidence that bikonts form a single clade, which diverged early in eukaryote evolution (Stechmann and Cavalier-Smith 2002; Stechmann and Cavalier-Smith 2003).

With the help of the above fusion genes, the root of the eukaryotic tree has recently been suggested to be between unikonts and bikonts (reviewed in Baldauf 2003). However, determining the evolutionary relationships among the many clades within these two divisions still remains a major challenge for evolutionary biologists. A thorough investigation of FALDH-SFGH and SFGH-FALDH gene fusions in a variety of protists may help define the origins of two major bikont clades.

Methods

We amplified and sequenced the FSF1 gene from O. trifallax (S. histriomuscorum) DNA (Chang et al. 2004) using primers corresponding to portions of GenBank accession numbers CC819739 and CC819368. The complete sequence of the macronuclear chromosome containing the O. trifallax FALDH-SFGH fusion (FSF1) gene has been deposited in GenBank under accession number AY63987. The T. thermophila FSF1 gene was identified in a TBlastN search of the T. thermophila genome sequence scaffolds on the Blast server at The Institute for Genomic Research, with the predicted O. trifallax Fsf1p protein as the query. The coding sequence of this gene is located between base pairs 542481 and 544951 of scaffold 8254688 of T. thermophila Genome Assembly 2, November_2003, and contains two introns in the FALDH region. These preliminary sequence data were obtained from The Institute for Genomic Research Web site at http://www.tigr.org. The T. thermophila EST clone 50072-2-7-H06 contains sequences corresponding to the genomic clone in the areas encoding FALDH (GenBank accession number BM395174, base pairs 99–494) and SFGH (BM395175, base pairs 149–724) (Turkewitz, A. P., K. M. Karrer, C. L. Jahn, E. Orias, K. E. Kirk, J. Frankel, and L. A. Klobutcher, personal communication).

Searches of the Thalassiosira pseudonana genome were performed using the Joint Genome Institute T. pseudonana Blast server at http://aluminum.jgi-psf.org/prod/bin/runBlast.pl?db=thaps1. The T. pseudonana SFGH-FALDH fusion (SFF1) gene is located between base pairs 60683 and 63183 of scaffold 8 (Release Version 1) (Armbrust et al. 2004). Online database searches with the T. pseudonana SFF1 gene, performed using the National Center for Biotechnology Information Blast server (Altschul et al. 1997), identified two overlapping EST clones from Phaeodactylum tricornutum. These sequences are listed in GenBank under accession numbers CD378851 and CD382924 (Scala et al. 2002).
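The coordinate ranges reported in this section can be converted to lengths with simple arithmetic. The sketch below assumes that all coordinates are 1-based and inclusive, which is the usual convention for scaffold and GenBank positions but is not stated explicitly in the text; the function name and labels are illustrative.

```python
def span_length(start, end):
    """Length in bp of a 1-based, inclusive coordinate range (assumed convention)."""
    return end - start + 1


# Coordinates as reported in Methods; the genomic spans include any introns.
regions = {
    "T. thermophila FSF1 (genomic span)": (542481, 544951),
    "T. pseudonana SFF1 (genomic span)": (60683, 63183),
    "BM395174 match to FALDH region": (99, 494),
    "BM395175 match to SFGH region": (149, 724),
}

for label, (start, end) in regions.items():
    print(f"{label}: {span_length(start, end)} bp")
```

Because the T. thermophila span contains two introns, its length overestimates the length of the coding sequence.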

1 Present address: Department of Genetics, Stanford University School of Medicine.

Geoffrey McFadden, Associate Editor

We thank Wei-Jen Chang for his gift of the O. trifallax RNA and Aaron Turkewitz for assistance with the T. thermophila EST clone. Preliminary genomic sequence data for T. thermophila were obtained from The Institute for Genomic Research Web site at http://www.tigr.org. Thalassiosira pseudonana genome sequence data were produced by the U.S. Department of Energy Joint Genome Institute, http://www.jgi.doe.gov. This work was supported by National Institute of General Medical Sciences Grant GM59708 and National Science Foundation Grant EIA0121422 to L.F.L.

References

Altschul, S. F., T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25:3389–3402.

Armbrust, E. V., J. A. Berges, C. Bowler et al. (42 co-authors). 2004. The genome of the diatom Thalassiosira pseudonana: ecology, evolution, and metabolism. Science 306:79–86.

Baldauf, S. L. 2003. The deep roots of eukaryotes. Science 300:1703–1706.

Blattner, F. R., G. Plunkett III, C. A. Bloch et al. (14 co-authors). 1997. The complete genome sequence of Escherichia coli K-12. Science 277:1453–1474.

Cavalcanti, A. R., N. A. Stover, L. Orecchia, T. G. Doak, and L. F. Landweber. 2004. Coding properties of Oxytricha trifallax (Sterkiella histriomuscorum) macronuclear chromosomes: analysis of a pilot genome project. Chromosoma 113:69–76.

Chang, W. J., N. A. Stover, V. M. Addis, and L. F. Landweber. 2004. A micronuclear locus containing three protein-coding genes remains linked during macronuclear development in the spirotrichous ciliate Holosticha. Protist 155:245–255.

Doak, T. G., A. R. Cavalcanti, N. A. Stover, D. M. Dunn, R. Weiss, G. Herrick, and L. F. Landweber. 2003. Sequencing the Oxytricha trifallax macronuclear genome: a pilot project. Trends Genet. 19:603–607.

Harms, N., J. Ras, W. N. Reijnders, R. J. van Spanning, and A. H. Stouthamer. 1996. S-Formylglutathione hydrolase of Paracoccus denitrificans is homologous to human esterase D: a universal pathway for formaldehyde detoxification? J. Bacteriol. 178:6296–6299.

Holmquist, B., and B. L. Vallee. 1991. Human liver class III alcohol and glutathione dependent formaldehyde dehydrogenase are the same enzyme. Biochem. Biophys. Res. Commun. 178:1371–1377.

Jahn, C. L., and L. A. Klobutcher. 2002. Genome remodeling in ciliated protozoa. Annu. Rev. Microbiol. 56:489–520.

Marchler-Bauer, A., J. B. Anderson, C. DeWeese-Scott et al. (24 co-authors). 2003. CDD: a curated Entrez database of conserved domain alignments. Nucleic Acids Res. 31:383–387.

Myllykallio, H., D. Leduc, J. Filee, and U. Liebl. 2003. Life without dihydrofolate reductase FolA. Trends Microbiol. 11:220–223.

Nara, T., T. Hashimoto, and T. Aoki. 2000. Evolutionary implications of the mosaic pyrimidine-biosynthetic pathway in eukaryotes. Gene 257:209–222.

Prescott, D. M. 1994. The DNA of ciliated protozoa. Microbiol. Rev. 58:233–267.

Prescott, D. M., J. D. Prescott, and R. M. Prescott. 2002. Coding properties of macronuclear DNA molecules in Sterkiella nova (Oxytricha nova). Protist 153:71–77.

Riley, J. L., and L. A. Katz. 2001. Widespread distribution of extensive chromosomal fragmentation in ciliates. Mol. Biol. Evol. 18:1372–1377.

Scala, S., N. Carels, A. Falciatore, M. L. Chiusano, and C. Bowler. 2002. Genome properties of the diatom Phaeodactylum tricornutum. Plant Physiol. 129:993–1002.

Shaw, W. H., T. Arioli, and J. Plazinski. 1998. Cloning and sequencing of a S-formylglutathione hydrolase (FGH) gene from the cyanobacterium Anabaena azollae (Accession No AF035558) (PGR98-024). Plant Physiol. 116:868.

Stechmann, A., and T. Cavalier-Smith. 2002. Rooting the eukaryote tree by using a derived gene fusion. Science 297:89–91.

Stechmann, A., and T. Cavalier-Smith. 2003. The root of the eukaryote tree pinpointed. Curr. Biol. 13:R665–R666.

Suhre, K., and J. M. Claverie. 2004. FusionDB: a database for in-depth analysis of prokaryotic gene fusion events. Nucleic Acids Res. 32:D273–D276.

Tatusova, T. A., and T. L. Madden. 1999. BLAST 2 sequences, a new tool for comparing protein and nucleotide sequences. FEMS Microbiol. Lett. 174:247–250.
