Analysis of Functional Genes in the Carbohydrate Metabolic Pathway of the Anaerobic Rumen Fungus Neocallimastix frontalis PMA02

Anaerobic rumen fungi have been regarded as good genetic resources for enzyme production, which might be useful for feed supplements, bio-energy production, bio-remediation and other industrial purposes. In this study, an expressed sequence tag (EST) library of the rumen anaerobic fungus Neocallimastix frontalis was constructed, and functional genes from the EST library were analyzed to elucidate the carbohydrate metabolism of anaerobic fungi. Of 10,080 sequenced clones, 9,569 clones with an average size of 628 bp were selected for analysis. After assembly, 1,410 contigs were obtained and 1,369 sequences remained as singletons. A total of 1,192 sequences matched proteins of known function in the public database, and 693 of them matched proteins from fungi. One hundred fifty-four sequences were classified as genes related to biological processes and 328 sequences as genes related to cellular components. Sequences for most of the enzymes in the glucose metabolic pathway were isolated from the 10,080 ESTs. Four kinds of hemicellulase were isolated: mannanase, xylose isomerase, xylan esterase and xylanase. Five β-glucosidases with at least three different conserved domain structures and ten cellulases with at least five different conserved domain structures were also isolated. These are the first solid data supporting the expression of a multiple enzyme system for polysaccharide hydrolysis in N. frontalis.


INTRODUCTION
Ruminant animals use fibrous plant materials as feed by degrading insoluble polysaccharides during microbial fermentation. Bacteria, fungi and protozoa are the main microorganisms in the rumen ecosystem. Since the first identification of the anaerobic fungus Neocallimastix frontalis (Orpin, 1975), six genera, Anaeromyces, Caecomyces, Cyllamyces, Neocallimastix, Orpinomyces and Piromyces, have been isolated and identified from the gut of herbivorous animals. The anaerobic fungi contribute to fiber digestion in the rumen mainly through their effective cellulolytic enzymes and the physical penetration of fungal rhizoids into the fiber matrix (Ho and Abdullah, 1988). They secrete various carbohydrate-degrading enzymes, including endoglucanase (Barichievich and Calza, 1990), exoglucanase (Mountfort and Asher, 1985), xylanase (Teunissen et al., 1993) and β-glucosidase (Li and Calza, 1991). For this reason, anaerobic rumen fungi have been regarded as good genetic resources for enzyme production, which might be useful for animal production, bio-energy production, bio-remediation and other industrial purposes. However, because few research groups worldwide study anaerobic fungi, genetic information about their carbohydrate metabolism is limited. In the public database, 1,430 nucleotide sequences, including 384 core nucleotide sequences and 996 ESTs (expressed sequence tags), and 247 protein sequences from the family Neocallimastigaceae are currently available (http://www.ncbi.nlm.nih.gov/). Among the 247 protein sequences, 70 are glucose-hydrolyzing enzyme sequences and 33 are xylose-hydrolyzing enzyme sequences. For N. frontalis, 68 nucleotide sequences and 44 protein sequences for basic carbohydrate metabolism, respiration and housekeeping proteins are available in the public database (http://www.ncbi.nlm.nih.gov/). Of these 44 protein sequences, only 4 encode glucose-hydrolyzing enzymes (http://www.ncbi.nlm.nih.gov/).
When a nucleotide sequence encoding cellobiohydrolase (celB, AY328465.1) was searched using the BLAST algorithm, more than 8 nucleotide sequences encoding cellobiohydrolase from Orpinomyces, Piromyces and other Neocallimastix species were matched with more than 79% identity and an E-value of 0.0. The genera Neocallimastix, Orpinomyces and Piromyces are members of the family Neocallimastigaceae; they are genetically very close, and the genes encoding glucose-hydrolyzing enzymes are highly homologous among anaerobic fungi (Harhangi et al., 2003). These genes are even homologous to those of anaerobic bacteria owing to horizontal gene transfer (Garcia-Vallve et al., 2000). For this reason, N. frontalis may well secrete more glucose-hydrolyzing enzymes than the 4 reported so far.
Many different experimental techniques can be applied to find functional genes in an organism. One of them, high-throughput expressed sequence tag (EST) analysis of complementary DNA (cDNA), provides important information about functional genes at reasonable cost (Nagaraj et al., 2006). EST analysis also allows the homology of multiple genes to be compared among organisms in an evolutionary context (Brinkmann et al., 2005). The purpose of this study was to isolate functional genes related to carbohydrate metabolism from the anaerobic rumen fungus N. frontalis using high-throughput EST analysis, as a basis for further research.

Strains and culture condition
Neocallimastix frontalis PMA02 was isolated from the rumen of a Holstein steer using Hungate's roll tube method (Hungate, 1966). The isolate was identified from its morphological characteristics and its rRNA ITS1 sequence (Brookman et al., 2000). The 680 bp rRNA ITS1 sequence of N. frontalis PMA02 showed 99% identity to that of N. frontalis SR4 (Fliegerova et al., 2004; AY429664). The fungal strain was cultivated in modified Lowe's medium (Lowe et al., 1985) containing 2% of a glucose, cellobiose and starch mixture (2:1:1) as the carbohydrate source. After incubation at 39°C for 72 h under anaerobic conditions, fungal cells were harvested and stored at -80°C until use.
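The carbohydrate load can be converted into per-component concentrations with simple arithmetic. The sketch below assumes that "2%" means 2% (w/v), i.e. 20 g/L, and that the 2:1:1 ratio is by mass; neither is stated explicitly above, and the function name is illustrative only.

```python
def carbohydrate_concentrations(total_percent_wv, ratio):
    """Split a total carbohydrate load into g/L per component.

    Assumptions: the percentage is w/v (g per 100 ml), so 1% == 10 g/L,
    and the ratio is a mass ratio.
    """
    total_g_per_l = total_percent_wv * 10.0
    parts = sum(ratio.values())
    return {name: total_g_per_l * share / parts for name, share in ratio.items()}


if __name__ == "__main__":
    mix = carbohydrate_concentrations(2.0, {"glucose": 2, "cellobiose": 1, "starch": 1})
    print(mix)  # {'glucose': 10.0, 'cellobiose': 5.0, 'starch': 5.0}
```

Under these assumptions, one litre of medium would contain 10 g glucose, 5 g cellobiose and 5 g starch.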

RNA extraction and cDNA library construction
Fungal cells were homogenized under liquid nitrogen, and total RNA was extracted using TRIzol™ reagent (Invitrogen, USA) according to the manufacturer's instructions. The quantity and quality of the extracted RNA were determined using a spectrophotometer (Nanodrop Technologies Inc., USA) and agarose gel electrophoresis, respectively. mRNA was further purified from total RNA using the Absolute mRNA Purification Kit (Stratagene, USA) according to the manufacturer's instructions. cDNA was synthesized from 5 μg of mRNA using the ZAP Express® cDNA Synthesis Kit (Stratagene, USA). In detail, a mixture of 5 μl of 10× first-strand buffer, 3 μl of first-strand methyl nucleotide mixture, 2 μl of linker-primer, 1 μl of RNase Block Ribonuclease Inhibitor (40 U/μl), 5 μg of poly(A) RNA (25 μl), 1.5 μl of MMLV-RT (50 U/μl) and 11 μl of diethylpyrocarbonate (DEPC)-treated distilled water was incubated for 1 h at 42°C for first-strand cDNA synthesis. For second-strand synthesis, 50 μl of the ice-cooled first-strand reaction was mixed with 20 μl of 10× second-strand buffer, 6 μl of second-strand dNTP mixture, 111 μl of sterile distilled water, 2 μl of E. coli RNase H (1.5 U/μl) and 1 μl of E. coli DNA polymerase I (9.0 U/μl), and incubated for 2.5 h at 16°C. The second-strand cDNA was then mixed with 23 μl of blunting dNTP mix and 2 μl of cloned Pfu DNA polymerase (2.5 U/μl) and incubated for 30 min at 72°C. After phenol-chloroform (1:1) and chloroform extraction, the cDNA was precipitated with 20 μl of 3 M sodium acetate and 400 μl of ethanol at -20°C. The blunt-ended cDNA pellet was dried, resuspended in 8 μl of EcoRI adapters and incubated at 4°C for 30 min. For ligation of the EcoRI adapters, the blunted cDNA was mixed with 1 μl of 10× ligase buffer, 1 μl of 10 mM rATP and 1 μl of T4 DNA ligase (4 U/μl) and incubated overnight at 8°C. The reaction was terminated by heating at 70°C for 30 min. After ligation, the reaction mixture was mixed with 1 μl of 10× ligase buffer, 2 μl of 10 mM rATP, 5 μl of distilled water and 2 μl of T4 polynucleotide kinase (5 U/μl) and incubated for 30 min at 37°C for phosphorylation. The reaction was inactivated by heating at 70°C for 30 min.
Phosphorylated cDNA was mixed with 28 μl of XhoI buffer and 3 μl of XhoI (40 U/μl), digested for 1.5 h at 37°C, and fractionated by size on a Sepharose CL-2B gel filtration column. The cDNA sizes were checked on a 5% non-denaturing acrylamide gel, and cDNA fragments larger than 400 bp were collected. The cDNA was ligated into the ZAP Express vector (Stratagene, USA) according to the manufacturer's instructions. The ligated DNA was packaged using Gigapack III Gold packaging extract (Stratagene, USA) and used to infect E. coli XL1-Blue MRF'. After incubation for 2 h at 22°C, cell debris was removed and the phage-containing supernatant was collected. The primary library was titered, adjusted to 10^7 pfu of lambda phage with SM buffer, and stored at 4°C. For in vivo excision, 10^8 XL1-Blue MRF' cells and 10^9 pfu of ExAssist helper phage were mixed with the lambda phage. After incubation at 37°C for 15 min, the phagemids were titered using 1× NZY broth. The excised phagemids were plated on LB agar, and individual colonies were picked for DNA analysis.
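Titering a lambda library relies on the standard plaque-count calculation, titer = plaques × dilution factor / volume plated. A minimal sketch follows; the plate counts are hypothetical and do not come from this study, and the text above does not state the volume unit behind the 10^7 pfu figure.

```python
def titer_pfu_per_ml(plaques, dilution_factor, volume_plated_ml):
    """Estimate the titer of an undiluted phage stock in pfu/ml.

    plaques:          plaques counted on a single plate
    dilution_factor:  fold dilution of the plated sample (1e5 for a 10^-5 dilution)
    volume_plated_ml: volume of the diluted sample added to the host cells
    """
    if plaques < 0 or dilution_factor <= 0 or volume_plated_ml <= 0:
        raise ValueError("plaques must be >= 0; dilution and volume must be > 0")
    return plaques * dilution_factor / volume_plated_ml


if __name__ == "__main__":
    # Hypothetical plate: 50 plaques from 0.5 ml of a 10^-5 dilution
    print(titer_pfu_per_ml(50, 1e5, 0.5))  # 10000000.0
```

In practice several dilutions are plated and only plates with a countable number of plaques are used.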

EST sequencing
The purified DNA was sequenced from the 5' end with a vector-specific universal primer using the PRISM™ BigDye™ Terminator Cycle Sequencing Ready Reaction Kit (Applied Biosystems, USA). In detail, 200-250 ng of plasmid DNA, 0.5 μl of 3 pmol primer, 0.87 μl of 5× sequencing buffer, 1.38 μl of distilled water and 0.25 μl of BigDye were mixed and amplified using a GeneAmp PCR System 9700 (Applied Biosystems, USA). The cycling program consisted of 36 cycles of denaturation at 96°C for 10 s, annealing at 50°C for 5 s and extension at 60°C for 4 min. The PCR products were purified and sequenced on an ABI 3730XL DNA analyzer (Applied Biosystems, USA).
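The cycling program above implies a minimum run time that can be checked by hand. This sketch counts only the listed hold times; ramp times and any initial denaturation or final hold (not described above) are ignored, so an actual run would take longer.

```python
def cycling_hold_minutes(cycles, step_seconds):
    """Total hold time of a cycling program in minutes (ramp times ignored)."""
    return cycles * sum(step_seconds) / 60


if __name__ == "__main__":
    # 96 °C for 10 s, 50 °C for 5 s, 60 °C for 4 min, repeated 36 times
    print(cycling_hold_minutes(36, [10, 5, 4 * 60]))  # 153.0
```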

Sequence processing and analysis
Chromatogram data from the sequence analyzer were converted to base calls using the Phred base-calling software (http://www.phrap.org/PhredPhrap/phred.html). After base calling, vector sequences were trimmed using cross_match (http://www.phrap.org), and repeated and E. coli sequences were then masked using RepeatMasker (http://www.repeatmasker.org). The TGICL (http://tigr.org/tdb/tgi/software) and Megablast (http://blast.ncbi.nlm.nih.gov) programs were used to cluster the cDNA sequences, and the clustered sequences were assembled using CAP3 (http://genome.cs.mtu.edu/cap/cap3.html). Nucleotide similarity searches were performed for each sequence using BLASTN (http://blast.ncbi.nlm.nih.gov). Translated nucleotide sequences were compared with protein sequences using BLASTX (http://blast.ncbi.nlm.nih.gov) and the UniProt (http://www.uniprot.org) and KEGG (http://www.genome.jp) databases. Acquired sequences were classified by biological function using the Gene Ontology database (GO, http://www.geneontology.org). BLAST hits with a score greater than 150, a p value less than 0.005 and an identity of 40% or more were classified as strong similarities.
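The three cut-offs for a strong similarity can be expressed as a single predicate over BLAST hits. In this sketch the `BlastHit` record and its field names are hypothetical, and "score" is assumed to be the bit score; the text gives no criteria for marginal matches, so the function only separates strong hits from all others.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlastHit:
    subject: str
    score: float         # BLAST score as reported (assumed: bit score)
    p_value: float
    identity_pct: float  # percent identity over the alignment, 0-100


def is_strong_similarity(hit: BlastHit,
                         min_score: float = 150.0,
                         max_p: float = 0.005,
                         min_identity_pct: float = 40.0) -> bool:
    """True if score > 150, p < 0.005 and identity >= 40%."""
    return (hit.score > min_score
            and hit.p_value < max_p
            and hit.identity_pct >= min_identity_pct)


if __name__ == "__main__":
    hits = [
        BlastHit("hypothetical_A", 212.0, 1e-40, 63.5),
        BlastHit("hypothetical_B", 150.0, 1e-20, 55.0),  # score not > 150
        BlastHit("hypothetical_C", 180.0, 1e-25, 38.0),  # identity < 40%
    ]
    for h in hits:
        print(h.subject, is_strong_similarity(h))
```

Keeping the thresholds as keyword arguments makes it easy to test how sensitive the annotation counts are to each cut-off.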

cDNA library and EST sequencing
From the constructed cDNA library, 10,080 clones were sequenced, and 9,811 clones passed the base-calling step. After vector trimming, 9,633 clones remained, and 9,569 clones with an average size of 628 bp were finally selected for further sequence analysis. Clustering and assembly yielded 1,410 contigs, while 1,369 sequences remained as singletons. As a result, 2,779 unique partial sequences of N. frontalis genes were obtained from the 9,569 selected clones. The EST database was deposited in the EST knowledge integration system of the Genome Research Center, Korea Research Institute of Bioscience and Biotechnology (http://www.ekis.kr), for public use.
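The clone counts above can be tracked as a simple filtering funnel. The sketch reproduces the retention at each step and the 2,779 unique sequences; the mean number of ESTs per contig relies on an assumption not stated in the text, namely that each singleton is one EST and every other selected EST falls into exactly one contig.

```python
def retention(steps):
    """Return (label, count, percent of the first step) for each filtering step."""
    start = steps[0][1]
    return [(label, n, 100.0 * n / start) for label, n in steps]


def assembly_summary(final_ests, contigs, singletons):
    """Unique sequences, ESTs placed in contigs, and mean ESTs per contig.

    Assumes every singleton is exactly one EST and all remaining ESTs
    were placed into exactly one contig.
    """
    unique = contigs + singletons
    ests_in_contigs = final_ests - singletons
    return unique, ests_in_contigs, ests_in_contigs / contigs


if __name__ == "__main__":
    steps = [("sequenced", 10080), ("after base calling", 9811),
             ("after vector trimming", 9633), ("final selection", 9569)]
    for label, n, pct in retention(steps):
        print(f"{label:<22}{n:>6}{pct:8.2f}%")
    unique, in_contigs, per_contig = assembly_summary(9569, 1410, 1369)
    print(unique, in_contigs, round(per_contig, 2))  # 2779 8200 5.82
```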
In the BLASTN search of nucleotide sequences, 37 contigs were classified as strong similarity and 418 contigs as marginal similarity. Thirteen singletons were classified as strong similarity and 217 singletons as marginal similarity (Table 1). However, 955 contigs and 1,139 singletons could not be annotated using BLASTN. Using BLASTX, translated contig and singleton sequences were compared with protein sequences in the GenBank database. As a result, 733 contigs and 459 singletons were classified as strong similarity, and 212 contigs and 247 singletons as marginal similarity. However, 465 contigs and 633 singletons could not be matched with any protein sequence. BLASTX therefore annotated more sequences than BLASTN. The UniProt search results did not differ from those of BLASTX. Using KEGG, 707 contigs and 435 singletons were classified as strong similarity, and the total numbers of annotated contigs and singletons were 918 and 682, respectively. These totals were 27 (contigs) and 24 (singletons) fewer than those obtained with BLASTX or UniProt, which might reflect differences among the sequences deposited in each database.
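The BLASTN versus BLASTX comparison can be made explicit by tallying the strong and marginal matches reported above against the 1,410 contigs and 1,369 singletons. This minimal sketch assumes that "annotated" means strong plus marginal; KEGG is omitted because only its strong counts and totals are given.

```python
TOTALS = {"contigs": 1410, "singletons": 1369}

# (strong, marginal) counts per database and sequence type, from the text above
RESULTS = {
    ("BLASTN", "contigs"): (37, 418),
    ("BLASTN", "singletons"): (13, 217),
    ("BLASTX", "contigs"): (733, 212),
    ("BLASTX", "singletons"): (459, 247),
}


def annotated(db, kind):
    """Number and percentage of sequences with a strong or marginal match."""
    strong, marginal = RESULTS[(db, kind)]
    n = strong + marginal
    return n, 100.0 * n / TOTALS[kind]


if __name__ == "__main__":
    for db, kind in RESULTS:
        n, pct = annotated(db, kind)
        print(f"{db:<7}{kind:<11}{n:>5}{pct:7.1f}%")
    strong_blastx = sum(RESULTS[("BLASTX", k)][0] for k in TOTALS)
    print("strong BLASTX matches:", strong_blastx)  # 1192
```

The strong BLASTX matches (733 contigs plus 459 singletons) add up to the 1,192 sequences classified by origin.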
Of the 2,779 assembled sequences, the 1,192 sequences matched by BLASTX were classified by the origin of the matched sequence (Figure 1). Most matched sequences from fungi (693) or bacteria (98). Some matched sequences from animals, including fish (60), insects (56), mammals (51) and amphibians (54). Interestingly, 6 translated sequences matched human sequences, and 39 translated sequences matched sequences of unknown origin. Basic functional genes required for maintaining life might have originated in primitive organisms, with their main functional domains conserved during evolution. In addition, horizontal and vertical gene transfer might explain the detection of basic functional genes of different origins, as supported by previous reports (Garcia-Vallve et al., 2000; Van der Giezen et al., 2003).
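Expressing the origin counts as shares of the 1,192 BLASTX matches makes the dominance of fungal hits explicit. The sketch deliberately does not sum the categories: the text does not say whether the 6 human matches are included in the mammalian count, nor which origins make up the remainder.

```python
MATCHED_BLASTX = 1192

ORIGIN_COUNTS = {
    "fungi": 693, "bacteria": 98, "fish": 60, "insect": 56,
    "amphibian": 54, "mammalian": 51, "unknown": 39, "human": 6,
}


def origin_shares(counts, total):
    """Percentage of matched sequences per origin, largest first."""
    shares = [(origin, 100.0 * n / total) for origin, n in counts.items()]
    return sorted(shares, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for origin, pct in origin_shares(ORIGIN_COUNTS, MATCHED_BLASTX):
        print(f"{origin:<10}{pct:6.1f}%")
```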

Classification of functional genes
The EST sequences with strong similarity were further analyzed using Gene Ontology (GO) annotation to classify them by biological process, cellular component and molecular function. One hundred fifty-four sequences were classified as genes related to biological processes such as metabolism, communication, regulation of enzyme activity, secretion and others (Table 2). Of these, 97 sequences were assigned to metabolic process (genes for cell growth) and 48 to cell communication. Because growth is essential for the survival of every cell, proteins for cell growth might be conserved, which might be one reason why most annotated sequences belonged to metabolic process and cell communication. Among the 97 metabolic process sequences, eleven were related to amino acid synthesis, such as phosphoserine aminotransferase, carbamoyl-phosphate synthase and tryptophan synthase (data not shown). Sequences for fatty acid metabolism, such as short-chain fatty acid synthase, were also obtained. However, 25 sequences in the metabolic process category were classified as hypothetical metabolic proteins.
The 48 sequences in the cell communication category encoded GTP-binding proteins, G-proteins and nucleotide-binding proteins. Most of these genes were homologous to those of aerobic fungi such as Ustilago, Aspergillus or Schizosaccharomyces. Because no sequences for cell communication proteins of anaerobic fungi are currently available in the public database (http://www.ncbi.nlm.nih.gov), the 48 annotated sequences in this category could not be compared with those of anaerobic fungi. This might be the first report of genes encoding cell communication proteins in anaerobic fungi. In contrast, sequences related to secretion, organismal physiological processes, interactions between organisms and responses to stimuli were rarely detected.
When the assembled genes were analyzed by cellular location, most proteins were predicted to be intracellular, with minor fractions located in the membrane, the extracellular space or elsewhere. Like other anaerobic fungi, Neocallimastix is reported to secrete extracellular proteolytic enzymes (Wallace and Joblin, 1985); however, no sequences for proteolytic enzymes have previously been reported from the genus Neocallimastix (http://www.ncbi.nlm.nih.gov). In this study, 10 assembled sequences matched sequences for proteolytic enzymes such as protease, dipeptidase and chitin deacetylase. This is the first report of proteolytic enzyme genes expressed during carbohydrate degradation and metabolism in the genus Neocallimastix.
Pyruvate is then further metabolized to either lactate or ethanol in the cytosol. Sequences for lactate dehydrogenase (LD), pyruvate formate-lyase (PFL), pyruvate formate-lyase activating enzyme (PFLA) and aldehyde/alcohol dehydrogenase (ADHE) were isolated. However, acetaldehyde dehydrogenase (EC 1.2.1.10), which converts acetyl-CoA to acetaldehyde, was not isolated. Unlike other eukaryotes, anaerobic fungi possess hydrogenosomes instead of mitochondria. Pyruvate is transported into the hydrogenosome and metabolized to formate and acetate (Akhmanova et al., 1999). Sequences for hydrogenosomal proteins, including malic enzyme (ME) and succinyl-CoA synthetase beta subunit (SCSB), were also isolated in this study. The hydrogenosomal pyruvate:ferredoxin oxidoreductase, which converts pyruvate into acetyl-CoA, was not isolated, although its activity has been reported in the culture supernatant of Neocallimastix patriciarum (Yarlett et al., 1986). Phosphoenolpyruvate is also metabolized to succinate in the cytosol (Boxma et al., 2004), and sequences for phosphoenolpyruvate carboxykinase (PEPCK), malate dehydrogenase (MD) and fumarate reductase (FR) were obtained. Anaerobic fungi are amitochondrial and rely on the hydrogenosome for energy production (Yarlett et al., 1986); nevertheless, partial sequences for the mitochondrial TCA cycle enzymes aconitase (AT) and isocitrate dehydrogenase (ICD) were isolated. AT and ICD have previously been isolated from the cytosol of the anaerobic fungus Piromyces sp. E2 (Akhmanova et al., 1998). Given the absence of a mitochondrial Krebs cycle, the detection of AT and ICD might suggest that α-ketoglutarate is formed as a metabolite of aconitate and isocitrate. The biochemical pathway of citrate production in anaerobic fungi is not yet clearly understood, although a citrate synthase that converts oxaloacetate to citrate might exist. Further research is required to elucidate the biological pathway for citrate production in anaerobic fungi.
In conclusion, polysaccharides such as starch, hemicellulose and cellulose are hydrolyzed into monosaccharides such as glucose and xylose for cellular uptake. Extracellular monosaccharides are transported into the cytosol and metabolized through multiple sequential enzymatic reactions (Figure 2). In the pathway from xylose to fructose-6-phosphate, sequences for xylose isomerase were isolated, but sequences for xylulokinase and transketolase were not. In contrast, most enzymes of the glycolytic pathway from glucose to phosphoenolpyruvate were isolated in this study. The enzymes for further metabolism of malate, i.e., fumarase, fumarate reductase and citrate synthase, were not isolated; however, aconitase and isocitrate dehydrogenase were. The presence of aconitase and isocitrate dehydrogenase might imply an alternative pathway for malate metabolism. Further biochemical and molecular characterization of the isolated genes is required for a better understanding of carbohydrate metabolism in the anaerobic fungus N. frontalis.
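The detected and missing steps can be recorded as a small pathway table and queried for gaps. In this hypothetical sketch, a step counts as covered if any candidate enzyme for it was isolated. Detection flags follow the text (xylose isomerase and pyruvate formate-lyase isolated; xylulokinase, transketolase and pyruvate:ferredoxin oxidoreductase not isolated), and the xylose branch is simplified.

```python
from typing import NamedTuple


class Step(NamedTuple):
    substrate: str
    product: str
    enzymes: dict  # candidate enzyme -> True if a matching EST was isolated


# Simplified: the non-oxidative pentose phosphate steps from xylulose
# 5-phosphate to fructose 6-phosphate are collapsed into one step.
# Pyruvate formate-lyase and pyruvate:ferredoxin oxidoreductase are two
# alternative routes from pyruvate to acetyl-CoA.
STEPS = [
    Step("xylose", "xylulose", {"xylose isomerase": True}),
    Step("xylulose", "xylulose 5-phosphate", {"xylulokinase": False}),
    Step("xylulose 5-phosphate", "fructose 6-phosphate", {"transketolase": False}),
    Step("pyruvate", "acetyl-CoA", {"pyruvate formate-lyase": True,
                                    "pyruvate:ferredoxin oxidoreductase": False}),
]


def uncovered(steps):
    """Steps for which none of the candidate enzymes was isolated."""
    return [s for s in steps if not any(s.enzymes.values())]


if __name__ == "__main__":
    for s in uncovered(STEPS):
        print(f"{s.substrate} -> {s.product}: no EST for {', '.join(s.enzymes)}")
```

Allowing several candidate enzymes per step keeps a missing enzyme from being reported as a pathway gap when an alternative enzyme for the same conversion was detected.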

Figure 1. Classification of matched sequences by origin.

Table 1. Annotation results of 2,779 EST sequences using different database systems
* Matched (S) = strongly matched sequences; Matched (W) = weakly matched sequences; Matched (T) = total matched sequences.

Table 2. Results of gene ontology (GO) annotation based on biological process, cellular component and molecular function

Table 3. Isolated sequences for glucose metabolism

Table 4. Isolated sequences for polysaccharide hydrolysis