Cloning and Characterization of the Glycoside Hydrolases That Remove Xylosyl Groups from 7-β-xylosyl-10-deacetyltaxol and Its Analogues*

Paclitaxel, a natural antitumor compound, is produced by yew trees at very low concentrations, causing a worldwide shortage of this important anticancer medicine. These plants also produce significant amounts of 7-β-xylosyl-10-deacetyltaxol, which can be bio-converted into 10-deacetyltaxol for the semi-synthesis of paclitaxel. Some microorganisms can convert 7-β-xylosyl-10-deacetyltaxol into 10-deacetyltaxol, but the bioconversion yield needs to be drastically improved for industrial applications. In addition, the related β-xylosidases of these organisms have not yet been defined. We set out to discover an efficient enzyme for 10-deacetyltaxol production. By combining the de novo sequencing of β-xylosidase isolated from Lentinula edodes with RT-PCR and the rapid amplification of cDNA ends, we cloned two cDNA variants, Lxyl-p1-1 and Lxyl-p1-2, which were previously unknown at the gene and protein levels. Both variants encode a specific bifunctional β-D-xylosidase/β-D-glucosidase with an identical ORF length of 2412 bp (97% identity). The enzymes were characterized, and their 3.6-kb genomic DNAs (G-Lxyl-p1-1, G-Lxyl-p1-2), each harboring 18 introns, were also obtained. Putative substrate binding motifs, the catalytic nucleophile, the catalytic acid/base, and potential N-glycosylation sites of the enzymes were predicted. Kinetic analysis of both enzymes showed kcat/Km of up to 1.07 s−1 mM−1 against 7-β-xylosyl-10-deacetyltaxol. Importantly, at substrate concentrations of up to 10 mg/ml (oversaturated), the engineered yeast could still robustly convert 7-β-xylosyl-10-deacetyltaxol into 10-deacetyltaxol with a conversion rate of over 85% and a highest yield of 8.42 mg/ml within 24 h, which is much higher than those reported previously. Therefore, our discovery might lead to significant progress in the development of new 7-β-xylosyl-10-deacetyltaxol-converting enzymes for more efficient use of 7-β-xylosyltaxanes to semi-synthesize paclitaxel and its analogues. This work also might lead to further studies on how these enzymes act on 7-β-xylosyltaxanes and contribute to the growing database of glycoside hydrolases.

The protection and sustainable utilization of natural resources are among the most important global problems of the 21st century. Paclitaxel (Taxol®) is mainly isolated from slow-growing yew trees (genus Taxus, family Taxaceae) and is known as a "blockbuster drug" with a unique mechanism of action (1) and prominent activity against various cancers (including ovarian, breast, lung, head, and neck carcinomas and the AIDS-related Kaposi sarcoma) (2). However, the source of paclitaxel has always been a top concern, because its content in the plant is extremely low, and it is isolated in "large" amounts (~0.02%) only from the bark of the tree (3). A 100-year-old tree might yield 3 kg of bark, which provides enough paclitaxel for one 300-mg dose (4). To preserve the Taxus resource and alleviate some of the pressure on the source, several approaches have been employed to prepare paclitaxel or its analog Taxotere, including chemical semisynthesis from the precursor 10-deacetylbaccatin III (DB), which is readily available from the twigs of yew trees such as Taxus baccata (5,6); isolation from the twigs of nursery trees including T. chinensis var. mairei and T. media (hybrid); paclitaxel-producing endophytic strain fermentation (7,8); and Taxus cell and tissue culture (9). The first two approaches might partially relieve this pressure, but they still cannot meet the growing market demand. 7-β-xylosyltaxanes are much more abundant and are extracted simultaneously with paclitaxel and DB from various species of yew (10-12), but generally they are dealt with as byproducts. Among these analogues, 7-β-xylosyl-10-deacetyltaxol (XDT) can be obtained with a yield of as much as 0.5% (from dried stem bark) (13). These 7-β-xylosyltaxanes can be hydrolyzed via chemical or biological methods to give the corresponding 7-hydroxyltaxanes, including 10-deacetyltaxol (DT) and DB, for the semi-synthesis of paclitaxel.
In contrast to the chemical approach, which utilizes periodate or other oxidizing agents and a substituted hydrazine in the reactions to remove the sugar, the biological approach is an enzymatic process that releases the D-xylose from 7-xylosyltaxanes through the specific β-xylosidase and is therefore considered to be environmentally friendly.

EXPERIMENTAL PROCEDURES
Fungal Culture-The strain L. edodes M95.33 was grown in 100 ml wheat bran medium (contents per liter of distilled water: 50.00 g wheat bran (mixed with the appropriate amount of water, boiled for 30 min, and filtered to remove the solid residue), 20.00 g peptone, 1.50 g KH2PO4, 0.75 g MgSO4, final pH ~6.3; inoculum amount: ~1 cm2 of lawn picked from a mycelial slant) for 6 to 8 days at 25°C to 26°C at 160 rpm in an orbital shaker.
Enzyme Purification-For the natural enzyme purification, mycelium from 5 l culture was harvested via filtration, washed with sterile water, and then homogenized in liquid nitrogen. This was followed by suspension in three to five volumes of 50 mM Tris-HCl cell lysis buffer (pH 8.0) and sonication for 5 min (130 W, 10 s/10 s). The supernatant of the lysate underwent anion exchange column chromatography (DEAE Sepharose Fast Flow, 1.6 cm × 20 cm, GE Healthcare). Proteins were eluted with a stepwise gradient of 0, 0.1, 0.25, and 2.0 M NaCl (at a flow rate of 2 ml/min). Fractions with β-xylosidase activity at 0.1-0.25 M NaCl were collected, pooled, and added to 1 M (NH4)2SO4 for subsequent hydrophobic column chromatography (Phenyl Sepharose Fast Flow, 1.6 cm × 20 cm, GE Healthcare). Proteins were eluted with a linear gradient of 1.0-0 M (NH4)2SO4 in 1000 ml 50 mM Tris-HCl buffer, pH 8.0 (at a flow rate of 2 ml/min). Fractions with β-xylosidase activity were collected, pooled, dialyzed against 50 mM Tris-HCl buffer, pH 8.0, and applied to an anion exchange column (DEAE Sepharose Fast Flow, 1.6 cm × 20 cm, GE Healthcare) as previously described. Proteins were eluted with a linear gradient of 0.1-0.25 M NaCl in 1000 ml 50 mM Tris-HCl buffer, pH 8.0 (at a flow rate of 2 ml/min). Peak fractions with β-xylosidase activity were collected, pooled, concentrated, and subjected to gel filtration column chromatography (Sephacryl S200 High Resolution, 1.6 cm × 60 cm, GE Healthcare). Elution was performed with 50 mM Tris-HCl buffer containing 0.1 M NaCl, pH 8.0 (at a flow rate of 0.5 ml/min). Peak fractions with β-xylosidase activity were collected, pooled, and concentrated. During the purification process, the β-xylosidase activity was monitored with the substrate p-nitrophenyl-β-D-xylopyranoside (PNP-Xyl, Sigma) and examined with the substrate XDT. Protein concentration was determined using BCA (Thermo) or Bio-Rad protein assay kits.
For the recombinant enzyme purification, 500 mg of freeze-dried recombinant yeast cells was mixed with 10 ml Yeast Protein Extraction Reagent (Merck) and 20 µl DNase I and incubated at room temperature for 30 min. The suspension was centrifuged for 10 min at 12,000 rpm, yielding about 8 ml of the supernatant. The recombinant protein was purified from the supernatant using an Amersham Biosciences HiTrap Chelating HP Kit (GE Healthcare) and concentrated via ultrafiltration (Millipore, Billerica, MA).
Deglycosylation of the Enzyme-To remove the carbohydrates from the enzyme, 45 µl of the concentrated enzyme solution was mixed with 5 µl of 10× denatured glycoprotein buffer and boiled at 100°C for 10 min. Then 5.5 µl of 10× G5 buffer and 2 µl Endo Hf enzyme (NEB) were added, and the mixture was incubated at 50°C for 30 min. To preserve the activity of the deglycosylated enzyme, the glycoprotein was instead mixed with 5 µl 10× G5 buffer and 1 µl Endo Hf enzyme and incubated at 37°C overnight.
Enzymatic Activity Assay-The β-xylosidase activity was assayed by measuring the amount of p-nitrophenol released from the substrate p-nitrophenyl-β-D-xylopyranoside (PNP-Xyl, Sigma) using spectrophotometry (NanoPhotometer® P300, IMPLEN, Munich, Germany) based on the absorbance at 405 nm. Assays were performed in a total volume of 125 µl (containing 25 µl of the enzyme solution) at 50°C in 50 mM sodium acetate buffer, pH 5.0, containing 5 mM substrate for 20 min. Reactions were terminated by adding 2 ml saturated sodium tetraborate (Na2B4O7). One unit of activity was defined as the amount of enzyme that catalyzed the release of 1 nM p-nitrophenol per minute at 50°C and pH 5.0. To observe the glycoside specificity of the enzyme, three other p-nitrophenyl glycosides (Sigma), namely p-nitrophenyl-β-D-glucopyranoside (PNP-Glc), p-nitrophenyl-β-D-galactopyranoside (PNP-Gal), and p-nitrophenyl-α-L-arabinopyranoside (PNP-Ara), were used as substrates with PNP-Xyl as the control using the same method as described above.
To examine the enzyme activity against XDT (prepared in this lab), 1 ml of the enzyme solution was mixed with 10 µl XDT solution (dissolved in dimethyl sulfoxide, 20 mg/ml), and the reaction was carried out in a 45°C water bath for 24 h. The suspension was extracted with ethyl acetate and assayed by means of thin layer chromatography (TLC) (mobile phase: petroleum ether:dichloromethane:methanol = 1.5:3.5:0.33, v/v/v), and the color developing agent was 10% sulfuric acid and ethanol solution.
SDS-PAGE and Protein Determination-The protein samples were analyzed via SDS-PAGE on 10% (w/v) polyacrylamide gels and stained with silver or Coomassie Brilliant Blue R-250. For the reductive treatment of the samples, 5× loading buffer (0.2 M Tris-HCl, pH 6.8, 10% SDS, 10 mM β-mercaptoethanol, 20% glycerol, 0.05% bromphenol blue) was mixed with the sample and boiled for 10 min before SDS-PAGE. For the non-reductive treatment of the samples, 5× loading buffer without β-mercaptoethanol was added to the sample, which was directly subjected to SDS-PAGE without boiling. To determine the active protein band on the gel, the non-reductive samples were loaded into two neighboring wells. After SDS-PAGE, one lane of the gel was cut for staining, and the other was cut for in situ xylosidase activity detection. To restore the enzyme activity, the latter was washed three times (10 min each time) with Triton X-100 (2.5%) in 50 mM sodium acetate buffer, pH 5.0, to replace the ionic detergent SDS with the non-ionic detergent Triton X-100 and to lower the pH (from 8.8 to 5.0). The SDS-free gel was submerged in 5 mM PNP-Xyl solution and incubated at 50°C for 30 min; the process was then stopped by the addition of saturated sodium tetraborate solution. The targeted protein band was used for LC-MS/MS de novo peptide sequencing.
Characterization and Kinetics of the Enzyme-First, the optimal temperature and optimal pH were determined using PNP-Xyl as a substrate. To measure the optimal temperature, 100 µl of 5 mM PNP-Xyl solution in 50 mM sodium acetate buffer, pH 5.0, was mixed with 25 µl of the enzyme solution (0.01 mg/ml) at a reaction temperature in the range of 30°C-65°C. The reaction assay was performed as described above and in duplicate or triplicate. The optimal pH assay was conducted in a similar fashion, except that the temperature was held constant at 50°C. The pH was in the range of 3.0-5.5 (sodium acetate buffer) or 6.0-9.0 (potassium phosphate buffer). Additionally, the reaction time curve was plotted with a constant pH of 5.0 and a constant temperature of 50°C in the time range of 6-84 min. The kinetic parameters of the enzyme were determined against the substrate PNP-Xyl in a concentration range of 0.39-50.0 mM. The kinetic parameters of the recombinant enzyme against PNP-Xyl and PNP-Glc were also examined under the same conditions.
To measure the optimal temperature of the recombinant enzyme against XDT, 200 µl of 2.12 mM XDT solution in 50 mM sodium acetate buffer, pH 4.5, was mixed with 10 µl of the enzyme solution (0.1 mg/ml) at a reaction temperature in the range of 30°C-65°C for 1.5 h. The optimal pH assay was conducted in a similar fashion in the range from 3.0 to 8.0, except that the temperature was held constant at 45°C. The reaction time curve was plotted at a constant pH of 4.5 and a constant temperature of 45°C in a time range of 1-7 h. The kinetic parameters of the recombinant enzyme against XDT were determined in the XDT concentration range of 0.039-5.0 mM (XDT stock solution: 10 mg/ml or 10.6 mM, dissolved in dimethyl sulfoxide; the stock solution was diluted with 50 mM sodium acetate buffer, pH 4.5). The XDT conversion reaction was carried out in a volume of 300 µl (containing 100 µl of 0.01 mg/ml enzyme solution in 50 mM sodium acetate buffer, plus 200 µl XDT solution) at pH 4.5 and 45°C for 40 min. Then 700 µl methanol was mixed with 300 µl of each reaction solution, and the mixtures were analyzed via HPLC for DT formation.
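The concentration bookkeeping in this paragraph can be sketched in a few lines of Python. The stock values (10 mg/ml, 10.6 mM) and the 0.039-5.0 mM range come from the text; the twofold spacing of the series and the helper names are assumptions made for illustration only.

```python
# Stock values quoted in the text for the XDT solution in dimethyl sulfoxide.
XDT_STOCK_MG_PER_ML = 10.0
XDT_STOCK_MM = 10.6


def implied_molar_mass(mg_per_ml, millimolar):
    """Molar mass (g/mol) implied by a mass and a molar concentration.

    mg/ml equals g/l, and mM equals mmol/l, so g/mol = (g/l) / (mol/l).
    """
    return mg_per_ml / (millimolar / 1000.0)


def twofold_series(top_mm, n_points):
    """Serial twofold dilution starting at top_mm (assumed spacing)."""
    return [top_mm / (2 ** i) for i in range(n_points)]


def stock_volume_ul(target_mm, final_volume_ul, stock_mm=XDT_STOCK_MM):
    """Volume of stock (in microliters) needed for a target concentration."""
    return target_mm * final_volume_ul / stock_mm


if __name__ == "__main__":
    print(round(implied_molar_mass(XDT_STOCK_MG_PER_ML, XDT_STOCK_MM), 1))
    for conc in twofold_series(5.0, 8):
        print(f"{conc:.3f} mM needs {stock_volume_ul(conc, 200.0):.2f} ul stock per 200 ul")
```

Under the twofold assumption, eight points starting at 5.0 mM end at about 0.039 mM, which matches the lower bound of the stated range.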
The kinetic data were processed via a proportional weighted fit using a nonlinear regression analysis program based on Michaelis-Menten enzyme kinetics (35,36).
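As a minimal sketch of what a proportionally weighted Michaelis-Menten fit does, the Python below minimizes the sum of squared relative residuals, ((v − v_fit)/v)². It is not the regression program cited above: the search strategy (a closed-form Vmax for each trial Km, followed by a grid scan and golden-section refinement over log Km) is a substitute chosen to keep the example dependency-free, and the rate values in the usage example are synthetic.

```python
import math


def mm_rate(s, vmax, km):
    """Michaelis-Menten rate at substrate concentration s."""
    return vmax * s / (km + s)


def _best_vmax_and_error(km, substrate, rates):
    """For a fixed Km, return the weighted-least-squares Vmax and its error.

    Weights are 1/v^2 (proportional weighting), so the objective is the
    sum of squared relative residuals.
    """
    num = den = 0.0
    for s, v in zip(substrate, rates):
        f = s / (km + s)
        num += f / v           # w * v * f with w = 1 / v^2
        den += (f / v) ** 2    # w * f^2
    vmax = num / den
    err = sum(((v - mm_rate(s, vmax, km)) / v) ** 2
              for s, v in zip(substrate, rates))
    return vmax, err


def fit_michaelis_menten(substrate, rates, grid_points=400, refinements=100):
    """Return (Vmax, Km) from positive rates measured at given substrate levels."""
    def err_at(log_km):
        return _best_vmax_and_error(math.exp(log_km), substrate, rates)[1]

    # Coarse scan over log(Km) to bracket the minimum.
    lo = math.log(min(substrate) / 100.0)
    hi = math.log(max(substrate) * 100.0)
    step = (hi - lo) / (grid_points - 1)
    grid = [lo + i * step for i in range(grid_points)]
    errors = [err_at(x) for x in grid]
    best = errors.index(min(errors))
    a = grid[max(best - 1, 0)]
    b = grid[min(best + 1, grid_points - 1)]

    # Golden-section refinement inside the bracket.
    phi = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(refinements):
        c = b - phi * (b - a)
        d = a + phi * (b - a)
        if err_at(c) < err_at(d):
            b = d
        else:
            a = c
    km = math.exp((a + b) / 2.0)
    vmax, _ = _best_vmax_and_error(km, substrate, rates)
    return vmax, km


if __name__ == "__main__":
    # Twofold series spanning the 0.39-50.0 mM PNP-Xyl range (spacing assumed).
    conc = [0.390625 * 2 ** i for i in range(8)]
    synthetic = [mm_rate(s, 2.0, 1.5) for s in conc]  # synthetic, not measured
    print(fit_michaelis_menten(conc, synthetic))
```

Once Vmax is known, kcat follows as Vmax divided by the molar enzyme concentration in the assay, and kcat/Km gives the catalytic efficiency reported in the kinetic tables.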
The stability of the recombinant enzyme was determined by measuring the released p-nitrophenol from PNP-Xyl as described previously. For determining the pH stability, the enzyme was properly diluted with various 50 mM buffers including sodium acetate (HAc, pH 2.0-5.0), potassium phosphate (PBS, pH 6.0-7.0), and Tris-HCl (pH 8.0-12.0) and incubated at 4°C for 24 h. For testing the thermal stability, the enzyme was properly diluted with 50 mM Tris-HCl buffer (pH 8.0) and incubated at 4°C, 25°C, 37°C, 45°C, 60°C, and 85°C for 24, 48, and 72 h or at 25°C for 1, 2, 3, 7, and 14 d and then cooled to 4°C before analysis. For detecting the stability in the presence of metal ions (K+, Mn2+, Ag+, Cu2+, Fe2+, Co3+, Zn2+, Ca2+, Mg2+) and other reagents (EDTA and urea), the enzyme was properly diluted with 50 mM Tris-HCl buffer (pH 8.0) containing 5 nM of KCl, MnSO4, AgNO3, CuSO4, FeSO4, CoCl3, ZnCl2, CaCl2, MgSO4, EDTA, or urea and incubated at 4°C for 24 h. The enzyme solution without metal ions or agents was used as the control, and the activity was recorded as 100%.
LC-MS/MS de Novo Sequencing of the Enzyme-The active protein bands (designated as LXYL-P1 and LXYL-P2) on the gel were cut and digested with trypsin. The digested samples were used for LC-MS/MS analysis. Accurate mass LC-MS and MS/MS data were collected in high-definition DDA (data-dependent) mode. LC-MS/MS data were processed using ProteinLynx Global Server Version 2.3 (Waters, Milford, MA), and the resulting peaklists were subjected to searches against the NCBInr protein database with the Mascot search engine. If the identical hit of the peaklists was not found in the NCBInr database, the spectra with the highest ion intensities were selected for de novo sequencing using Masslynx Pepseq 4.1 software.
Isolation of the Fungal DNA and RNA-The fungal strain was cultured in the liquid wheat bran medium as mentioned above for 4 days, and the mycelia were filtered and ground into fine powder in liquid nitrogen. The genomic DNA was extracted via the genomic DNA isolation mini-prep method (37), and the total RNA was isolated using the RNeasy plant mini kit (Qiagen, Shanghai, China).
Cloning of the Lxyl-p1 cDNA-The first strand of the cDNA was synthesized from the total RNA of the fungus M95.33 using a Clontech SMART-RACE cDNA Amplification Kit and was used as a template for Lxyl-p1 gene cloning.
The five tryptic oligopeptides of LXYL-P1 decoded by LC-MS/MS de novo sequencing were aligned with the two putative proteins found by the Mascot search engine to preliminarily define their relative arrangement in LXYL-P1 (supplemental Fig. S2). A series of degenerate oligonucleotide primers were designed based on the tryptic oligopeptide sequences. Nested PCR was performed using combinations of forward and reverse primers (3F1/5R1, 3F2/5R2, and 3F3/5R3) (supplemental Table S1) to amplify the cDNA fragment that involved three regions encoding the corresponding oligopeptides (supplemental Fig. S3A). PCR conditions were as follows: initial denaturation at 95°C for 10 min, 35 cycles of 95°C for 40 s, 55°C to 70°C (gradient) for 1 min, 72°C for 2 min, and a final extension at 72°C for 10 min, in a 50-µl reaction volume.
Based on the information from the cDNA fragment, the SMART-RACE technique was applied to elongate the cDNA sequence using a Clontech SMART-RACE cDNA Amplification Kit according to the manufacturer's instructions. The 3′- and 5′-terminal sequences of Lxyl-p1 had more than 200 nucleotides overlapping to generate a full-length cDNA that involved the putative ORF (supplemental Fig. S3A).
Nucleotide Sequence Accession Number-The fungal nucleotide sequences obtained in this study were deposited in GenBank under accession numbers JN167168-JN167171.
Bioinformatics and Phylogenetic Analysis-The cloned Lxyl-p1 cDNA sequence and its encoded amino acid sequence were first analyzed via the Basic Local Alignment Search Tool (BLAST) at the National Center for Biotechnology Information (NCBI) to aid in the selection of the most closely related reference sequences from GenBank. The Lxyl-p1 cDNA sequence and its encoded amino acid sequence were also blasted with those from other databases, including JGI, to search for closely related sequences. The selected sequences of β-xylosidases and β-glucosidases from different species were aligned with LXYL-P1 using ClustalX 1.83 (38). The resultant alignment file was imported into MEGA5 (39). The phylogenetic trees were constructed using the software package and the neighbor-joining and maximum parsimony methods with 1000 bootstrap replicates. Gaps and missing data were eliminated. Bootstrap values (>70%) were shown next to the branches. The α-glucosidase (NP_001031247) from Arabidopsis thaliana was used as an outgroup.
The conserved W(N)GR and KH motifs, which are probably involved in substrate binding, and the putative catalytic nucleophile and catalytic acid/base of LXYL-P1 were predicted based on multiple sequence alignments with some GH family 3 enzymes, including that from barley (36), whose active sites have been determined. Calculation of the theoretical isoelectric point (pI) and molecular weight was performed with the Compute pI/Mw tool. The glycosylation sites were studied using NetNGlyc software.
Lxyl-p1 Heterologous Expression-Lxyl-p1 was inserted into the yeast expression vectors pPIC9K or pPIC3.5K to construct the recombinant expression plasmid pPIC9K-P1 or pPIC3.5K-P1. The recombinant expression plasmid was transformed into Pichia pastoris GS115 via electroporation transformation, resulting in recombinant yeast cells that carried the Lxyl-p1 gene. Meanwhile, the vector pPIC9K or pPIC3.5K was introduced into the GS115 competent cells as the control.
Both the recombinant yeast harboring the Lxyl-p1 gene and the mock transformant (control) were inoculated into 20 ml BMGY medium (contents per liter: 10 g yeast extract, 20 g peptone, 100 mM potassium phosphate buffer, pH 6.0, 10 ml glycerol) and incubated at 30°C and 200 rpm for 24 h. The cultures were washed twice via centrifugation, and the cell pellet was resuspended in 20 ml BMMY medium, which was similar to BMGY but contained 10 ml/l methanol instead of 10 ml/l glycerol. Cells were cultured at 30°C and 200 rpm for a couple of days, and methanol was added every day to maintain 1% (v/v) for the induction of the Lxyl-p1 gene expression. Meanwhile, the β-xylosidase activity was analyzed on the basis of periodic sampling. The samples were washed twice with dH2O via centrifugation, and the cell pellet was resuspended with dH2O in the same volume as the culture broth. Then 100 µl of 5 mM PNP-Xyl was added to 50 µl of the cell suspension and incubated for 20 min at 50°C for the catalytic activity analysis.
To set up the standard curves and linear regression equations, different concentrations of XDT and DT were prepared: 0.2000, 0.1500, 0.1000, 0.0750, 0.0500, 0.0250, and 0.0125 (mg/ml). The HPLC conditions were as described above. Other compounds were roughly estimated on the basis of these equations.
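A standard curve of this kind can be sketched as an ordinary least-squares line through the calibration points, which is then inverted to read an unknown concentration from its peak area. The seven concentrations below are those listed in the text; the peak areas, slope, and function names are placeholders invented for illustration and are not measurements from this study.

```python
# Standard concentrations (mg/ml) listed in the text for XDT and DT.
STANDARDS_MG_PER_ML = [0.2000, 0.1500, 0.1000, 0.0750, 0.0500, 0.0250, 0.0125]


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def concentration_from_area(area, slope, intercept):
    """Invert the calibration line to estimate a concentration (mg/ml)."""
    return (area - intercept) / slope


if __name__ == "__main__":
    # Hypothetical peak areas generated from an assumed linear detector response.
    areas = [5000.0 * c + 12.0 for c in STANDARDS_MG_PER_ML]
    slope, intercept = fit_line(STANDARDS_MG_PER_ML, areas)
    print(slope, intercept)
    print(concentration_from_area(412.0, slope, intercept))
```

Samples that fall above the highest standard (0.2 mg/ml) would need to be diluted into the calibrated range before the line is applied.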

Purification of the Fungal Glycoside Hydrolase (LXYL)-A total of 510 ml crude enzyme solution was obtained from 240 g of the homogenized mycelium (wet weight). From this volume, the enzyme was purified over 432-fold with 3.94% recovery in four steps of column chromatography, yielding 0.4 ml purified enzyme (see Table I). After Phenyl Sepharose column chromatography, two active peaks were resolved (Fig. 1A), which were designated as P1 (or LXYL-P1) and P2 (or LXYL-P2). The two fractions were examined for their activity against the substrate XDT (Fig. 1B). It was demonstrated via (Fig. 1C, lane 1) were displayed in the reductive treatment, and both of them were cut for LC-MS/MS de novo sequencing. According to the Mascot search results, the band with the smaller size (~44.3 kDa) was mostly affiliated with agmatinase, which cannot be excluded as a contaminant. As the peaklists of the other band (~110 kDa) had no identical hit in the NCBInr database, several spectra of the peaklists with the highest ion intensities were selected for de novo sequencing using Masslynx Pepseq 4.1 software, and five different oligopeptide sequences were decoded (Table II, supplemental Appendix S2).
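For readers less familiar with purification tables, the fold-purification and recovery figures quoted above are conventionally derived from total activity and total protein at each step. The sketch below shows that arithmetic; the input numbers are round placeholders, not the values in Table I, and the function name is illustrative.

```python
def purification_summary(crude_activity_u, crude_protein_mg,
                         step_activity_u, step_protein_mg):
    """Return (fold purification, recovery in percent) for one step.

    Fold purification is the ratio of specific activities (U per mg protein);
    recovery is the share of the starting total activity that remains.
    """
    crude_specific = crude_activity_u / crude_protein_mg
    step_specific = step_activity_u / step_protein_mg
    fold = step_specific / crude_specific
    recovery_percent = 100.0 * step_activity_u / crude_activity_u
    return fold, recovery_percent


if __name__ == "__main__":
    # Placeholder inputs: 1000 U in 500 mg crude protein, 50 U in 0.05 mg final.
    fold, recovery = purification_summary(1000.0, 500.0, 50.0, 0.05)
    print(f"{fold:.0f}-fold, {recovery:.1f}% recovery")
```

A high fold value combined with a low recovery, as reported here, is typical of multi-column schemes in which each step discards side fractions that still contain some activity.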
Although the attempt to purify LXYL-P2 alone was unsuccessful, a distinct band with an apparent molecular weight between 66 kDa and 97.2 kDa was defined in the in situ gel activity analysis after deglycosylation (Fig. 1D, lane 4 and lane 7). This band was cut for LC-MS/MS de novo sequencing, and two different oligopeptide sequences were acquired (Table II, supplemental Appendix S3), of which Oligop2-1 and Oligop2-2 are the same as Oligop1-3 and Oligop1-4 of LXYL-P1, respectively. Based on the information from the LC-MS/MS analysis and the Phenyl Sepharose chromatographic results, we infer that LXYL-P1 and LXYL-P2 are either the same protein with different glycosylation patterns or two variants with high identity to each other. Our results (Fig. 1D) also show that both glycosylated and deglycosylated forms were active against the β-xyloside substrate.
Cloning of the Lxyl-p1 cDNA and Genomic DNA-To predict the order of the oligopeptides in LXYL-P1, the five oligopeptides were aligned with the two putative proteins (XP_386781, hypothetical protein FG06605.1 from Gibberella zeae PH-1; and XP_956104, probable β-glucosidase 1 precursor from Neurospora crassa OR74A, both retrieved from the NCBInr protein database) that had relatively high similarity scores against the MS peaklists mentioned above. As shown in supplemental Fig. S2, the relative arrangement of the five oligopeptides in LXYL-P1 was preliminarily defined in the order of Oligop1-3, Oligop1-2, Oligop1-5, Oligop1-1, and Oligop1-4 (simplified as 3, 2, 5, 1, 4). The gene cloning strategy was put forward as shown in supplemental Fig. S3A.
A large cDNA fragment was acquired that harbored all five coding regions and a putative ORF (2412 bp) (designated as Lxyl-p1) encoding a putative protein of 803 amino acids. Further analysis found two types of cDNA sequences (designated as Lxyl-p1-1 and Lxyl-p1-2) with the same length of ORF and 97% identity in the coding regions, and a total of 21 amino acids were different between the two putative polypeptides (also with 97% identity) (supplemental Fig. S4).
The corresponding structural genes (designated as G-Lxyl-p1-1 and G-Lxyl-p1-2) were obtained by means of PCR and Genome Walking using the genomic DNA from the fungus M95.33 as the template (supplemental Fig. S3B). At the genomic level, G-Lxyl-p1-1 consists of 19 exons and 18 introns, with a length of 3601 bp from the start codon ATG to the stop codon TGA. The structure of G-Lxyl-p1-2 is the same as that of G-Lxyl-p1-1, except that the length is 3608 bp from the start to the stop codons. Nucleotide replacements, insertions, or deletions are mainly present in the introns of the two sequences.
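The sequence statistics reported in the preceding two paragraphs can be cross-checked with simple arithmetic, sketched below in Python. All input numbers are taken from the text; the calculation assumes that the 2412-bp ORF and the genomic lengths both include the stop codon and that the two polypeptides align without gaps, neither of which is stated explicitly.

```python
ORF_BP = 2412                      # ORF length of both cDNA variants
PROTEIN_AA = 803                   # residues in each putative protein
DIFFERING_RESIDUES = 21            # amino acid differences between variants
N_INTRONS = 18
GENOMIC_BP = {"G-Lxyl-p1-1": 3601, "G-Lxyl-p1-2": 3608}  # ATG to TGA


def encoded_residues(orf_bp):
    """Residues encoded by an ORF whose length includes the stop codon."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    return orf_bp // 3 - 1


def percent_identity(length_aa, differences):
    """Identity of two equal-length sequences in an ungapped alignment."""
    return 100.0 * (length_aa - differences) / length_aa


def total_intron_bp(genomic_bp, orf_bp):
    """Combined intron length, assuming both spans run from ATG to stop."""
    return genomic_bp - orf_bp


if __name__ == "__main__":
    print(encoded_residues(ORF_BP))
    print(round(percent_identity(PROTEIN_AA, DIFFERING_RESIDUES), 1))
    for name, length in GENOMIC_BP.items():
        introns = total_intron_bp(length, ORF_BP)
        print(name, introns, round(introns / N_INTRONS, 1))
```

Under these assumptions the 2412-bp ORF accounts for 803 residues plus a stop codon, 21 substitutions correspond to roughly 97% identity, and the 18 introns average under 70 bp each.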
Bioinformatics and Phylogenetic Analysis-We continuously monitored GenBank and other databases, including JGI, for genes and proteins with high identity to Lxyl-p1-1 and Lxyl-p1-2 and their corresponding polypeptides. At the time this paper was prepared, both Lxyl-p1-1 and Lxyl-p1-2 exhibited extremely low identity to any other deposited nucleotide sequence. This might well explain why the targeted gene was not obtained when we tried to apply the conserved gene sequences of other β-xylosidases to mine it from the genome of L. edodes (data not shown). Similar amino acid sequences (GenBank accession: XP_760179, XP_401647, CBQ68654, CCF48873, EIN14278, EIN14265, EIM20217, EIM22253, EIM82192, EIM88797, EIM88796; most of them were deposited in 2012) were found with 43% to 59% identity to LXYL-P1-1 and LXYL-P1-2. The sequence of EIM20217 was also deposited in JGI (designated as jgi Walse1 21012 e_ gw1. 19.159.1) and was the most similar sequence to LXYL-P1-1 and LXYL-P1-2 present in this database. However, all of these deposited sequences are putative proteins. In order to preliminarily classify LXYL-P1-1 and LXYL-P1-2, dozens of GH3 family enzymes summarized by Fincher and co-workers (40) and some other GH family proteins were chosen for the phylogenetic analysis. Several closely related hypothetical proteins (XP_760179, ID 43%; EIM82192, ID 57%; EIM88796, ID 59%) were also chosen as references for constructing the phylogenetic tree. The phylogenetic tree also involved some of the GH3 proteins (YbbD, NagZ, Exo1, ExoP, TnBgl3B, and KmBgl1) whose three-dimensional structures have been defined. Both LXYL-P1-1 and LXYL-P1-2 belong to subgroup 4 of the GH3 family (supplemental Fig. S5). NCBI Blastp analysis revealed that, as bifunctional β-xylosidases/β-glucosidases (see below), the two proteins were more closely related to β-glucosidases than to β-xylosidases.
Theoretically, both LXYL-P1-1 and LXYL-P1-2 have the same molecular weight of 86 kDa with pI of 4.61 and 4.67, respectively, calculated from the putative amino acid sequences. The putative substrate binding motifs (Trp172(Asn)-Gly173-Arg174 and Lys207-His208), catalytic nucleophile (Asp300), and catalytic acid/base (Glu529), as well as the potential N-glycosylation sites, were predicted, and the matched positions of the five oligopeptide sequences obtained via LC-MS/MS have been confirmed (supplemental Fig. S4).
Lxyl-p1 Heterologous Expression and Catalytic Activity Analysis-The ORFs of Lxyl-p1-1 and Lxyl-p1-2 with six histidine residues at the C-terminal end were introduced into the yeast Pichia pastoris GS115 by the vector pPIC9K, which carries a signal peptide coding sequence that can be fused in-frame with the cloned gene to form a fusion protein. The recombinant protein should be secreted out of the cells via the protein secretion mechanism.
However, both culture filtrates had relatively low or nearly no β-xylosidase activity (Fig. 2A), whereas the recombinant cell pellet was much more active (Fig. 2B), suggesting that the recombinant enzyme was mainly kept inside the cells. Moreover, the activity of LXYL-P1-2 was over 2-fold higher than that of LXYL-P1-1 (Figs. 2A-2C).
Then, the ORF of Lxyl-p1-2 was introduced into the same host, but by the non-secreted vector pPIC3.5K. As expected, leakage of the recombinant enzyme from the engineered GS115-3.5K-P1-2 was completely avoided, leading to a further increase in enzyme activity in the GS115-3.5K-P1-2 cell pellet relative to that in the GS115-9K-P1-2 cell pellet under the same culture conditions (Fig. 2D), suggesting that the engineered yeast GS115-3.5K-P1-2 can be chosen for industrial purposes.
Glycoside Specificity and Characterization of the Enzyme-First, the natural LXYL-P1 was chosen for analyses of glycoside specificity and characteristics. In addition to β-xylosidase activity, LXYL-P1 could also hydrolyze PNP-Glc, exhibiting the bifunctional β-D-xylosidase/β-D-glucosidase property, but it showed no activity against PNP-Gal and PNP-Ara. The bifunctional enzyme was even more efficient in hydrolyzing PNP-Glc than in hydrolyzing PNP-Xyl. The optimum pH and optimum temperature of the natural LXYL-P1 were 4.5 and 50°C, respectively.
Similar to its natural LXYL-P1, the optimum temperature of LXYL-P1-2 against PNP-Xyl was 50°C. The suitable reactive pH ranged from 4.0 to 6.0, with an optimum pH of 4.0 (Fig. 3). The optimum temperature and optimum pH of the same enzyme against XDT were 45°C and 4.5, respectively (Fig. 4). We may draw some conclusions from these results: (i) the activity of LXYL-P1-2 is higher than that of LXYL-P1-1, which is in accordance with the results shown in Fig. 3; (ii) the activities of both LXYL-P1-1 and LXYL-P1-2 against the glucoside are higher than those of the enzymes against xyloside; and (iii) because of the complicated taxane structure, the activities of both LXYL-P1-1 and LXYL-P1-2 against XDT are lower than those of the enzymes against the chromogenic substrate PNP-Xyl.
Then, the properties and kinetic parameters of the recombinant LXYL-P1-1 and LXYL-P1-2 against PNP-Xyl, PNP-Glc, and XDT were further investigated, and the results are summarized in Table III. The characterization of LXYL-P1-2 against PNP-Xyl, PNP-Glc, and XDT, including the optimum temperature and optimum pH of the β-xylosidase, is also shown in Figs. 3 and 4, respectively.
The stability of LXYL-P1-2 was further examined (Fig. 5). The enzyme was most stable at pH 8.0 and was thermostable up to 37°C for 72 h, under which conditions it kept 71% activity relative to the control. It retained 76% activity after incubation at 25°C and pH 8.0 for 14 d. The stability was not influenced by 5 nM of K+, Ag+, or EDTA after incubation at 4°C for 24 h. The enzyme also showed good tolerance to the same concentration of Mn2+, Zn2+, Ca2+, Co3+, or Mg2+, keeping over 90% activity relative to the control. However, the activity declined to 87% to 88% when the enzyme was treated with the same concentration of Fe2+, Cu2+, or urea.
Both LXYL-P1-1 and LXYL-P1-2 (glycosylated and deglycosylated forms) were subjected to SDS-PAGE analysis, and the results are shown in supplemental Fig. S6. Because of incomplete deglycosylation, the deglycosylated forms are nearly 97 kDa in size (LXYL-P1-2 was then determined to be 92 kDa via MALDI-TOF-MS detection).
Bioconversion of 7-β-xylosyltaxanes by the Engineered Yeast GS115-3.5K-P1-2-Under laboratory conditions, buffered methanol-complex medium was chosen as the production medium for accumulating the biomass. The methanol-induced recombinant cells were harvested after 6 days of cultivation and immediately used to convert 7-β-xylosyltaxanes, or they were freeze-dried and kept at −20°C before use. Based on the characteristics of the purified recombinant enzyme, the catalytic conditions of the engineered yeast were further optimized, resulting in an optimum pH of 4.0, optimum temperature of 45°C, optimum substrate solvent of dimethyl sulfoxide, reaction period of ≤24 h, and final substrate concentration of 10 mg/ml (this concentration is quite oversaturated). The cell amount was 16 to 32 mg/ml (dry weight). Partial results are summarized in Table IV and Fig. 6. At this oversaturated XDT concentration, a maximum DT yield of 8.42 mg/ml with an XDT conversion rate of 85.6% could be achieved.

DISCUSSION
In this work, we demonstrate the feasibility of combining protein de novo sequencing with the RT-PCR/RACE strategy for mining the targeted β-xylosidase gene(s) from the fungus L. edodes. The approach omitted the process of constructing a cDNA library, which would be followed by a great deal of screening. This method is particularly practical when the targeted gene shares extremely low identity with known DNA sequences. Using this strategy, we were able to discover two highly homologous cDNA variants (designated as Lxyl-p1-1 and Lxyl-p1-2) from the fungus. Both cDNA sequences encode 803 amino acids (supplemental Fig. S4) with the bifunctional β-D-xylosidase/β-D-glucosidase property. The kcat/Km ratios of LXYL-P1-1 for PNP-Xyl, PNP-Glc, and XDT were lower than those of LXYL-P1-2 (0.77 s−1 mM−1, 1.65 s−1 mM−1, and 0.30 s−1 mM−1 versus 1.67 s−1 mM−1, 2.85 s−1 mM−1, and 1.07 s−1 mM−1). The corresponding genomic DNAs (G-Lxyl-p1-1, G-Lxyl-p1-2), both containing 18 introns, were also obtained. To our knowledge, this is the first report on gene cloning and characterization of β-D-xylosidase that can specifically release xylose from 7-β-xylosyltaxanes.
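To make the comparison of catalytic efficiencies concrete, the short Python sketch below divides the kcat/Km values quoted above for LXYL-P1-2 by those for LXYL-P1-1, substrate by substrate. The six numbers are taken from the text; the dictionary layout and function name are illustrative.

```python
# kcat/Km values (s^-1 mM^-1) quoted in the text for each recombinant enzyme.
KCAT_OVER_KM = {
    "LXYL-P1-1": {"PNP-Xyl": 0.77, "PNP-Glc": 1.65, "XDT": 0.30},
    "LXYL-P1-2": {"PNP-Xyl": 1.67, "PNP-Glc": 2.85, "XDT": 1.07},
}


def efficiency_ratios(table, numerator="LXYL-P1-2", denominator="LXYL-P1-1"):
    """Ratio of catalytic efficiencies for every substrate in the table."""
    return {
        substrate: table[numerator][substrate] / table[denominator][substrate]
        for substrate in table[numerator]
    }


if __name__ == "__main__":
    for substrate, ratio in efficiency_ratios(KCAT_OVER_KM).items():
        print(f"{substrate}: {ratio:.2f}-fold")
```

On these figures the advantage of LXYL-P1-2 is largest for the taxane substrate XDT (about 3.6-fold) and smallest for PNP-Glc (about 1.7-fold).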
Bioinformatics analysis suggested that Lxyl-p1-1 and Lxyl-p1-2 (97% identity) are new GH genes. The encoded proteins (LXYL-P1-1 and LXYL-P1-2) belong to subgroup 4 of GH family 3 (supplemental Fig. S5). Our findings may lead to further studies on the structure-activity relationships of the proteins that will disclose more biochemical properties, or on the detailed mechanism of these proteins' action on 7-β-xylosyltaxanes and hydrolysis of xylose from the aglycons.
The potential substrate binding motifs (Trp172 (Asn)Gly173-Arg174 and Lys207 His208) and the putative catalytic sites (Asp300, Glu529) were also predicted (supplemental Fig. S4). Recently we have demonstrated by means of site-directed mutagenesis that all of the predicted active sites are responsible for the enzyme activity. Interestingly, in this assay the point mutation H208A significantly reduced the activity of the enzyme against the substrate PNP-Xyl or XDT, but it had little effect on the activity of the enzyme against PNP-Glc (data not shown). The activity of LXYL-P1-2 was over 2-fold higher than that of LXYL-P1-1 (Figs. 2A-2C). Comparison between the two putative amino acid sequences showed that only 21 amino acids differ within the first 368 residues of the two polypeptides (supplemental Fig. S4), implying that in addition to the possible residues of the active sites, some of these amino acids might affect the enzyme activity. We envision that the properties of these enzymes could be further improved through protein engineering.
Additionally, we determined the partial N-terminal sequence of LXYL-P1-2 using a PROCISE/PROCISE cLC Protein Sequencer (ABI, New York) (data not shown) and found that the first 35 amino acids were missing from the putative 803 amino acids. Theoretically, the mature peptide should have a molecular weight of 82 kDa. Further analysis revealed the sequence Ile-Phe-Arg-Arg at the C-terminal end of the first 35 amino acids, which is a typical Kex2 site ((Ali/Arg)-Xaa-(Lys/Arg)-Arg↓, where Ali and Xaa represent an aliphatic amino acid and an arbitrary amino acid, respectively) (41, 42). The recombinant enzyme was mainly retained inside the recombinant yeast cells, even when the secretory expression vector pPIC9K was applied (Figs. 2A, 2B). The reason for this is probably that the high molecular mass and the glycosylated structure hinder the secretion of the protein.
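The Kex2 consensus described above lends itself to a simple pattern search. The following is a minimal sketch under stated assumptions: the set of residues counted as aliphatic (Ala, Val, Leu, Ile) is our assumption because the text does not list them, the function name is hypothetical, and the demonstration sequence is a placeholder rather than the actual LXYL-P1-2 sequence.

```python
import re

# Assumption: "aliphatic" is taken as Ala, Val, Leu or Ile (not specified in the text).
ALIPHATIC = "AVLI"

# (Ali/Arg)-Xaa-(Lys/Arg)-Arg in one-letter code; the lookahead allows overlapping hits.
KEX2_PATTERN = re.compile(rf"(?=([{ALIPHATIC}R].[KR]R))")

def find_kex2_sites(sequence):
    """Return (n_removed, motif) pairs for each candidate Kex2 site.

    n_removed is the number of N-terminal residues that would be released
    by cleavage immediately after the motif.
    """
    seq = sequence.upper()
    return [(m.start(1) + 4, m.group(1)) for m in KEX2_PATTERN.finditer(seq)]

# Placeholder sequence: 31 filler residues, then Ile-Phe-Arg-Arg, then a short tail.
demo = "M" + "G" * 30 + "IFRR" + "SGSG"
print(find_kex2_sites(demo))
```

For the placeholder sequence, the single hit is IFRR with 35 residues preceding the cleavage point, mirroring the 35-residue N-terminal segment described above. A match only flags a candidate site; whether cleavage occurs in vivo has to be confirmed experimentally, as was done here by N-terminal sequencing.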
Although large amounts of the secreted protein were not observed, recombinant cells carrying the heterologous protein still offer advantages for manipulation, especially in industrial production, because cell cultivation and substrate bioconversion, whose optimal conditions are generally quite different, can be optimized separately. Moreover, the recombinant cells can be freeze-dried and kept in a cold room or refrigerator for long periods while retaining enzyme activity, so they can be used conveniently as a general catalyst. Because living cells are not required for the catalytic reaction, the cytotoxic effects of the substrates and products on the bioconversion can be neglected.
The β-xylosidase activity of the enzymes against XDT is lower than that against the chromogenic substrate PNP-Xyl (Table III), which is probably due to the complicated taxane structure. The lower solubility of XDT might also affect the detection results. However, at the oversaturated taxane concentrations (8-10 mg/ml), the recombinant cells still exhibited high catalytic efficiency (XDT conversion rate: 85.60% to 92.45%), yielding as much as 8.42 mg/ml of the product (DT) within 24 h (Table IV), which is over 10- to 20-fold more than the best results of XDT bioconversion that have ever been reported (31-34). Our results indicate that as long as the substrates are mixed thoroughly with the recombinant cells (in a reaction volume from 1.5 to 200 ml) under proper conditions, a high conversion yield can be easily achieved. We optimistically estimate that the raw material cost of producing 1 kg of DT could be negligible when high-density fermentation is applied to the engineered yeast. Thus, the total production cost from XDT to paclitaxel could be greatly reduced. Currently, pilot scale studies on fermentation (>10 l fermenters) and bioconversion (>5 l) are in progress, and some results from early stage research have been published elsewhere (43). It has been demonstrated that the conversion efficacy in a reaction volume of 5 l is similar to that at the lab scale. This work provides a fast and efficient biocatalytic route for preparing 10-deacetyltaxol for the semi-synthesis of paclitaxel. Additionally, because of their bifunctional properties, the enzymes might have the potential to release glucose and/or xylose from other substrates.
In summary, two new GH genes (Lxyl-p1-1 and Lxyl-p1-2) have been obtained from L. edodes, each of which encodes a bifunctional β-D-xylosidase/β-D-glucosidase that specifically releases xylose from 7-β-xylosyltaxanes. Engineered yeast with such a gene could convert 7-β-xylosyltaxanes into 7-hydroxyltaxanes with high yields. These findings might greatly improve the utilization of 7-β-xylosyltaxanes for the semi-synthesis of paclitaxel or its analogues. This work might lead to further studies on the detailed mechanisms of these enzymes' action on 7-β-xylosyltaxanes and hydrolysis of xylose from the substrates.