Combination of MS Protein Identification and Bioassay of Chromatographic Fractions to Identify Biologically Active Substances from Complex Protein Sources

Purification of biologically active proteins from complex biological sources is a difficult task, usually requiring large amounts of sample and many separation steps. We found an active substance in a serum response element-dependent luciferase reporter gene bioassay in interstitial cystitis urine that we attempted to purify with column chromatography and the bioassay. With anion-exchange Mono Q and C4 reversed-phase columns, apparently sharp active peaks were obtained. However, more than 20 kinds of proteins were identified from the active fractions with MS, indicating that the purification was not complete. As further purification was difficult, we chose a candidate molecule by means of studying the correlation between MS protein identification scores and bioassay responses of chromatographic fractions near the active peaks. As a result, epidermal growth factor (EGF) was nominated as a candidate molecule among the identified proteins because the elution profile of EGF was consistent with that of the bioassay, and the correlation coefficient of EGF between MS protein identification scores and bioassay responses was the highest among all the identified proteins. With recombinant EGF and anti-EGF and anti-EGF receptor antibodies, EGF was confirmed to be the desired substance in interstitial cystitis urine. This approach required only 20 ml of urine sample and two column chromatographic steps. The combination of MS protein identification and bioassay of chromatographic fractions may be useful for identifying biologically active substances from complex protein sources.

Nevertheless, recent progress in MS has dramatically changed protein analysis (4). With MS, protein identification requires much smaller samples than classical methods such as N-terminal peptide sequencing.
Interstitial cystitis (IC) is a chronic inflammatory disease characterized by frequency and urgency and/or severe pelvic pain (5). The International Continence Society also selected the term "painful bladder syndrome" for IC (6). The quality of life of IC patients is extremely low because of their severe symptoms. The pathogenesis of IC is unclear, and effective treatments have not been established. To elucidate the mechanism of IC pathogenesis, we attempted to find characteristic proteins in IC urine using proteomics techniques and have already reported active neutrophil elastase as an IC urinary marker (7). We had also performed gene expression analysis of IC bladder tissues using GeneChip technology and found that mRNA expression of GPR18, a member of the G-protein-coupled receptors, was higher in IC bladder than in the control. We tried to confirm whether an endogenous GPR18 ligand existed in IC urine by using a bioassay with GPR18 transfectant cells.
In the present study, the existence of an active substance in IC urine was suggested in the bioassay using the serum response element (SRE)-dependent luciferase reporter gene with the stable recombinant HEK293 cell line expressing GPR18. We thought that the response was derived from GPR18 and tried to purify the active substance from a small volume of IC urine using chromatographic techniques. Among the many proteins identified from partially purified samples, we clearly nominated epidermal growth factor (EGF) as a candidate molecule judging from the correlation between MS protein identification and the bioassay of chromatographic fractions. With recombinant EGF and anti-EGF antibody, EGF was confirmed to be the desired substance found in IC urine.
The complete inhibition of the bioassay response by anti-EGF receptor antibody also indicated that the response was based on the EGF receptor, not GPR18, suggesting that GPR18 overexpression enhanced the EGF signal via the endogenous EGF receptor of the HEK293 cell line.
IC Patients-The 31 IC patients satisfied the National Institute of Diabetes and Digestive and Kidney Diseases criteria (9). The mean age of the IC patients was 58 years (range, 19-82 years); 84% were female. The mean age of the 11 healthy volunteers without complaints of lower urinary symptoms was 42 years (range, 20-60 years); 82% were female. The patients and healthy volunteers were enrolled in accordance with the research plan approved by the ethics committees of both the National Hospital Organization Sagamihara National Hospital and Astellas Pharma Inc. and gave written informed consent.
Purification of an Active Substance from IC Urine-Random midstream urine samples were collected in 15- or 50-ml polypropylene conical tubes and immediately centrifuged at 3,000 rpm for 10 min at 4°C. The supernatant was immediately stored in the same kind of tube without any additives at −20°C. The maximum storage period and the maximum number of freeze-thaw cycles of the urine samples were 6 years and 10 cycles, respectively. Just before use, the urine was thawed and immediately centrifuged at 3,000 rpm for 10 min at 4°C. For purification, 20 ml of the supernatant was applied to a reversed-phase Sep-Pak C18 cartridge prewashed sequentially with 90:10 (v/v) acetonitrile/water and with water. The cartridge was washed with water and then eluted with 50:50 (v/v) acetonitrile/water. This fraction was concentrated with a SpeedVac concentrator (Thermo Electron, Waltham, MA), cold ethanol was added, and the mixture was stored at 4°C for more than 30 min. After centrifugation at 21,000 × g for 5 min, the precipitate was dried with the SpeedVac concentrator, dissolved with 0.25 ml of 20 mM Tris-HCl (pH 8.0), and then injected on an anion-exchange column (Mono Q HR 5/5). The mobile phases consisted of Solvent A (20 mM Tris-HCl, pH 8.0) and Solvent B (20 mM Tris-HCl, pH 8.0, 1 M NaCl). The column was initially equilibrated at 0% B at a flow rate of 0.5 ml/min. The separation was performed by a linear gradient of 0% B to 100% B over 30 min at a flow rate of 1 ml/min at room temperature. Detection was by UV absorbance at 280 nm. The fractions were manually collected at 1-min intervals into polypropylene microtubes and were used for the bioassay and MS protein identification.
The Mono Q fractions were further separated with a reversed-phase column (Vydac C4). The mobile phases consisted of Solvent A (0.1% TFA) and Solvent B (0.1% TFA in 90:10 (v/v) acetonitrile/water). The column was initially equilibrated at 0% B at a flow rate of 0.3 ml/min. The separation was performed by a linear gradient of 0% B to 50% B over 30 min at a flow rate of 1 ml/min at room temperature. Detection was by UV absorbance at 280 nm. The fractions were manually collected at 1-min intervals into polypropylene microtubes and were used for the bioassay and MS protein identification. The HPLC experiments were performed using an HP1090M liquid chromatograph (Agilent Technologies, Palo Alto, CA).
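The two linear gradients described above map elution time to mobile-phase composition. The following is a minimal sketch of that mapping, assuming the gradient starts at time zero and ignoring any delay volume between the pump and the column (the text does not state one). The function names are hypothetical and are not part of any instrument software.

```python
def percent_b(t_min, start_b, end_b, duration_min):
    """%B at time t_min for one linear gradient segment, clamped at both ends."""
    frac = min(max(t_min / duration_min, 0.0), 1.0)
    return start_b + (end_b - start_b) * frac


def mono_q_nacl_molar(t_min):
    # Solvent B contains 1 M NaCl and Solvent A none, so NaCl (M) = %B / 100.
    # Gradient from the text: 0% B to 100% B over 30 min.
    return percent_b(t_min, 0.0, 100.0, 30.0) / 100.0


def c4_acetonitrile_percent(t_min):
    # Solvent B is 90% acetonitrile and Solvent A contains none.
    # Gradient from the text: 0% B to 50% B over 30 min.
    return percent_b(t_min, 0.0, 50.0, 30.0) * 0.90


if __name__ == "__main__":
    for t in (0, 10, 20, 30):
        print(t, mono_q_nacl_molar(t), c4_acetonitrile_percent(t))
```

Under these assumptions the nominal composition at the pump can be read off for any 1-min fraction; the composition actually reaching the column at that time lags by the system delay volume, which would have to be measured.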
MS Analysis-The collected fraction was dried with the SpeedVac concentrator and dissolved with 50 µl of 50 mM NH4HCO3 and 0.1% Rapigest SF. Reduction and alkylation were performed with 5 mM DTT at 60°C for 30 min and with 15 mM iodoacetamide at ambient temperature for 20 min. After 1 µg of trypsin was added, the reaction mixture was incubated at 37°C for 1 h, then HCl was added at a final concentration of 0.1 M, and the mixture was centrifuged at 13,000 rpm for 10 min to remove Rapigest SF. The supernatant, containing the digested peptide mixture, was concentrated on an on-line trap column (PepMap C18) and then separated with a reversed-phase microcapillary column (nano-HPLC capillary column) using a CapLC system (Waters). The mobile phases consisted of Solvent A (2% acetonitrile, 0.1% formic acid) and Solvent B (90% acetonitrile, 0.1% formic acid). The column was initially equilibrated at 10% B at a flow rate of 200 nl/min. The separation was performed by a linear gradient of 10% B to 50% B over 35 min, 90% B over 10 min, and 10% B over 15 min at a flow rate of 200 nl/min at room temperature. The LC effluent was directly interfaced with the nanoelectrospray ion source on a Q-TOF Ultima API mass spectrometer (Waters). An MS survey scan was obtained for the m/z range of 400-1500, and MS/MS spectra were acquired for the two most intense ions from the survey scan with precursor charge limitation (2 or greater). Dynamic exclusion of 2-min duration was used to acquire MS/MS spectra from low intensity ions. All the MS/MS spectra were converted into text files (.pkl) using ProteinLynx software (Waters; purchased in February 2003), combining sequential scans with the same precursor, smoothing the spectrum with Savitzky-Golay smoothing, and measuring the peak top with a centroid top of 80%.

TABLE I Correlation between protein identification and bioassay of active fractions separated with a C4 reversed-phase column from AEX-12 and AEX-14
Proteins identified from only one of AEX-12 and AEX-14 are not listed in this table. The number of unique peptides detected and sequence coverage (%) are the maximum values among those of the protein in the fractions listed. The data of single peptide-based identifications are shown in supplemental Table S1 and Fig. S1. The sequence coverage (%) is calculated by dividing the number of amino acid residues of the identified peptides by that of the whole protein in the sequence database. Although the sequence coverage of EGF in this table is calculated as 1% by dividing the 13 residues of the identified peptide (YACNCVVGYIGER) by the 1030 residues of EGF in the database, the actual sequence coverage of EGF is 25% when the same 13 residues are divided by the 53 residues of the mature EGF present in urine. To calculate the correlation coefficient, zero was used as the MASCOT score in the fraction where the protein was not identified. GAPDH, glyceraldehyde-3-phosphate dehydrogenase; PTGDS, prostaglandin D2 synthase; HGFL, hepatocyte growth factor-like protein.
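The sequence-coverage arithmetic given in the Table I legend for EGF can be reproduced with a few lines of Python. This is a minimal sketch that assumes the identified peptides do not overlap, which holds trivially for the single EGF peptide. The helper name is hypothetical; the lengths 1030 and 53 are the values quoted in the legend.

```python
def sequence_coverage(peptides, protein_length):
    """Percent of residues covered, assuming the peptides do not overlap."""
    covered = sum(len(p) for p in peptides)
    return 100.0 * covered / protein_length


peptide = "YACNCVVGYIGER"  # the EGF peptide named in the Table I legend

# Length of the database entry as quoted in the legend.
database_cov = sequence_coverage([peptide], 1030)
# Length of the mature EGF present in urine.
mature_cov = sequence_coverage([peptide], 53)

if __name__ == "__main__":
    print(f"database entry: {database_cov:.1f}%  mature EGF: {mature_cov:.1f}%")
```

The same peptide therefore gives a coverage of about 1% or about 25% depending only on which sequence length is used as the denominator, which is why the table value understates how much of the circulating molecule was observed.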
Protein identification was performed using MASCOT software (Version 2.1.0, Matrix Science Inc., Boston, MA). An in-house customized database (built in September 2006; 224,925 sequences) based on the NCBI non-redundant protein sequence database with species limitation (only human, mouse, rat, and bovine proteins were retained) and with locus redundancy removal by NCBI EntrezGene was used. The MASCOT search parameters were as follows. Peptide tolerance was 0.45 Da, and MS/MS tolerance was 0.15 Da (monoisotopic mass). Fixed modification of carbamidomethylation (Cys) and variable modifications of oxidation (Met), acetylation (N terminus), and pyro-Glu (Glu and Gln) were selected, and up to four missed trypsin cleavages were allowed. A protein score of 40 (p < 0.05) and a peptide score of 25 were the minimum identification criteria, and manual examination was conducted for all proteins identified with a protein score below 80 or with fewer than four unique peptides. The criteria used for manual validation included (a) peptide fragment ions being clearly above base-line noise, (b) intense y or b ions corresponding to the N-terminal cleavage of proline or glycine and absence of C-terminal cleavage of proline or glycine (except where proline follows), (c) the five major fragment ions being interpretable, and (d) not matching a common contaminant (such as silicone or polyethylene glycol) or trypsin-derived signals. In addition, each MS/MS spectrum of proteins identified with a small number (one or two) of peptides was compared with the same spectrum of the same proteins (if it existed) identified using a large set (three or more) of peptides for further manual confirmation. If peptides matched multiple database entries, the one RefSeq accession number belonging to the species Homo sapiens was assigned to the identified protein.
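The score thresholds above amount to a three-way triage of each protein hit. The sketch below illustrates that logic only; the `ProteinHit` record is a hypothetical structure rather than a MASCOT export format, and it assumes that each entry in `peptide_scores` belongs to a distinct unique peptide. The manual validation criteria (a) to (d) require inspection of spectra and are not modeled.

```python
from dataclasses import dataclass

PROTEIN_SCORE_MIN = 40     # minimum protein score (p < 0.05), from the text
PEPTIDE_SCORE_MIN = 25     # minimum peptide score, from the text
MANUAL_SCORE_BELOW = 80    # protein scores below this were examined manually
MANUAL_PEPTIDES_BELOW = 4  # fewer unique peptides than this: manual examination


@dataclass
class ProteinHit:
    """Hypothetical record for one identified protein in one fraction."""
    name: str
    protein_score: float
    peptide_scores: list  # one score per unique peptide (assumption)


def triage(hit):
    """Return 'reject', 'manual', or 'accept' for one protein hit."""
    passing = [s for s in hit.peptide_scores if s >= PEPTIDE_SCORE_MIN]
    if hit.protein_score < PROTEIN_SCORE_MIN or not passing:
        return "reject"
    if (hit.protein_score < MANUAL_SCORE_BELOW
            or len(passing) < MANUAL_PEPTIDES_BELOW):
        return "manual"
    return "accept"


if __name__ == "__main__":
    print(triage(ProteinHit("single_peptide_hit", 60, [45])))
```

A protein supported by a single peptide always lands in the "manual" bin under these rules, however high its score.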
Bioassay-The full-length cDNA of GPR18 (GenBank™ accession number NM_005292) was amplified by PCR with forward primer 5′-GGGTCTAGAATGATCACCCTGAACAATC-3′ and reverse primer 5′-GGGTCTAGATCATAACATTTCACTGTTTATA-3′ from human spleen cDNA and ligated into pEF-BOS-neo vector treated with XbaI. The stable recombinant HEK293 cell lines expressing GPR18 and vector control were established by selecting G418-resistant cells after transfection with the pEF-BOS-neo-GPR18 vector and the pEF-BOS-neo vector, respectively. The cells were seeded onto collagen-coated 96-well plates (10,000 cells/well) in Dulbecco's modified Eagle's medium supplemented with 10% fetal bovine serum, 100 units/ml penicillin, and 100 µg/ml streptomycin and incubated at 37°C in a 5% CO2 humidified atmosphere for 24 h and then transfected with pSRE-luciferase reporter plasmid (100 ng/well) using Lipofectamine 2000. After 24 h, 10 µl of urine samples were added into the wells and incubated for 16 h. After discarding the medium, the cells were lysed in 20 µl of solubilization buffer (0.5 mM MgCl2, 10 mM dithiothreitol, 0.1% Triton X-100, 10 mM Tris, pH 7.8), and then 100 µl of substrate solution (5 mM luciferin, 2 mM coenzyme A, 2 mM ATP, 0.5 mM MgCl2, 2 mM Mg(OH)2, 10 mM Tris, pH 7.8) were added into the wells. Luciferase activity was measured with a Dynatech ML3000 luminometer (Dynatech Laboratories, Chantilly, VA) and is expressed as relative light units.

RESULTS
We found an active substance in IC urine in the bioassay using the SRE-dependent luciferase reporter gene with the stable recombinant HEK293 cell line expressing GPR18. The results are shown in Fig. 1. IC urine showed higher activity than healthy volunteer urine (Fig. 1A). The signal with the HEK293 cell line carrying the GPR18 gene-expressing vector was higher than that with the vector control (Fig. 1B), suggesting that the activity was related to the GPR18 gene. The activity was lost with protease digestion or heat treatment (data not shown), indicating that this substance is a polypeptide.
We tried to purify it using column chromatography. The scheme of purification is shown in Fig. 2. With Mono Q anion-exchange column chromatography, two main active peaks were obtained (Fig. 3A). We collected the two active fractions, designated AEX-12 and AEX-14, and separated each further with C4 reversed-phase chromatography. From the AEX-12 fraction, one active peak was obtained (Fig. 3B). The fraction was dried and digested with trypsin, and its proteins were then identified with MS. Although the peak seemed very sharp, more than 10 kinds of proteins were detected in the active fraction, number 34, from AEX-12 (Table I). From the AEX-14 fraction, in contrast, one relatively broad active peak spanning three fractions was obtained (Fig. 3C). More than 20 kinds of proteins were identified in the three active fractions, numbers 34-36, from AEX-14 (Table I). We attempted to purify the active fractions using additional size exclusion, cation-exchange, or lectin affinity chromatography but failed to achieve satisfactory purification.
As further purification was difficult, we tried another strategy. The proteins in the fractions eluted near the active peaks on reversed-phase chromatography were identified with MS, and then the elution profile of each protein was compared with that of the bioassay responses. The results are shown in Table I. EGF was identified only in the active fractions: number 34 of AEX-12 and numbers 34-36 of AEX-14 (Table I). The product ion spectrum of the identified EGF was identical to that of authentic recombinant EGF (Fig. 4). The correlation coefficient of EGF between MASCOT scores and bioassay responses was the highest among those of the identified proteins (Table I). Therefore, we nominated EGF as the most prominent candidate molecule.
To confirm whether the desired active substance in IC urine was EGF, we studied the effects of authentic recombinant EGF protein and of anti-EGF and anti-EGF receptor antibodies on the bioassay. The recombinant EGF showed a dose-dependent response in the bioassay, as did IC urine (Fig. 5, A and B). The concentration of EGF in the IC urine sample used for this purification was estimated to be approximately 10 ng/ml because the signals of 1 and 10 µl of IC urine corresponded to those of 0.01 and 0.1 ng of EGF, respectively. The activities of recombinant EGF and IC urine were completely inhibited with anti-EGF and anti-EGF receptor antibodies (Fig. 5, C-F). These results show that the active substance in IC urine was EGF.

DISCUSSION
EGF is a 53-amino acid peptide with three internal disulfide bonds and has a variety of biological activities (2,3). EGF in urine is reported to have C-terminal heterogeneity such as the desarginyl form (2,3), which is the likely cause of the multiple peaks observed with Mono Q chromatography.
It has also been reported that the average EGF concentration in IC urine is higher than in the control at 20 versus 2 ng/ml, respectively (10); this was consistent with our data that the activity derived from EGF in IC urine was higher than that in the control. In the previous reports, EGF was purified from a large volume of human urine with many chromatographic steps such as 330 liters with four steps (2) and 500 liters with six steps (3). In this report, EGF was identified as the active substance from 20 ml of IC urine with only two chromatographic steps.
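For comparison with these literature values, the estimate of roughly 10 ng/ml given in RESULTS follows from matching urine volumes to recombinant EGF amounts that gave the same bioassay signal. A minimal sketch of that unit conversion, using only the two matched pairs stated in RESULTS (the function name is hypothetical):

```python
def ng_per_ml(egf_ng, urine_ul):
    """Concentration implied when urine_ul of urine matches egf_ng of EGF."""
    return egf_ng / urine_ul * 1000.0  # 1 ml = 1000 µl


# Matched pairs from RESULTS: (ng of recombinant EGF, µl of IC urine).
matched_pairs = [(0.01, 1.0), (0.1, 10.0)]
estimates = [ng_per_ml(ng, ul) for ng, ul in matched_pairs]

if __name__ == "__main__":
    print(estimates)
```

Both pairs give the same value, which places the sample used for purification between the reported averages of 2 ng/ml for control urine and 20 ng/ml for IC urine.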
MS has reduced the amount of sample needed for protein identification, but at the same time many kinds of proteins are identified from a partially purified sample, making it difficult to choose which candidate molecule to validate. This report shows that it is possible to select a suitable candidate molecule among many proteins identified with MS by judging from the correlation between MS protein identification and bioassay of chromatographic fractions. This approach does not need complete purification of the molecule. Although this report did not test the methodology on other samples such as serum, and the methodology is unlikely to be directly applicable when multiple components are necessary for bioassay activation, MS protein identification for monitoring chromatographic fractions alongside the bioassay may be useful in other similar cases.
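The candidate selection step can be sketched as a ranking of proteins by the correlation between their per-fraction identification scores and the bioassay responses. The sketch below assumes a Pearson correlation coefficient, since the text does not name the type of coefficient used, and it follows the Table I legend in scoring a fraction as zero when the protein was not identified there. The protein names and all numbers are made up for illustration and are not data from Table I.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def rank_candidates(scores_by_protein, bioassay):
    """Rank proteins by correlation of score with bioassay, highest first.

    scores_by_protein maps a protein name to one score per fraction,
    with None where the protein was not identified (treated as zero).
    """
    ranked = []
    for name, scores in scores_by_protein.items():
        filled = [0.0 if s is None else float(s) for s in scores]
        ranked.append((pearson(filled, bioassay), name))
    return sorted(ranked, reverse=True)


# Illustrative values for five consecutive fractions (not from the paper).
bioassay = [1.0, 1.2, 9.5, 4.0, 1.1]
scores = {
    "protein_A": [None, None, 62, 41, None],  # present only in active fractions
    "protein_B": [120, 135, 140, 128, 110],   # abundant in every fraction
    "protein_C": [55, None, None, 48, 70],    # absent from the active peak
}
ranking = rank_candidates(scores, bioassay)

if __name__ == "__main__":
    for r, name in ranking:
        print(f"{name}: r = {r:.2f}")
```

In this toy example the protein found only in the active fractions ranks first, while an abundant protein with a high score in every fraction ranks lower. Because identification scores are only semiquantitative, such a ranking nominates a candidate for validation and does not replace confirmation with the recombinant protein and antibodies.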
The SRE signal of EGF and IC urine was also obtained with the HEK293 cell line with vector control to a lesser extent than that with the GPR18 gene, which was completely inhibited by the anti-EGF receptor antibody (data not shown). Before identification of EGF, we believed that the response with control transfectant was due to endogenous GPR18 of HEK293 cells. However, with recombinant EGF and anti-EGF antibody, EGF was confirmed to be the desired substance found in IC urine, and the complete inhibition of the bioassay response by anti-EGF receptor antibody indicated that the response was based on the EGF receptor, not GPR18. The reason why the response with the GPR18-expressing HEK293 cell line was higher than that with the control is unclear. The SRE signal of EGF via the endogenous EGF receptor of the HEK293 cell line might be enhanced by overexpressing the GPR18 gene; this remains to be elucidated.
In this report, the elution profile of proteins identified with MS was evaluated with semiquantitative MASCOT identification scores. Recently, however, methods of quantitative MS analysis such as isobaric tags for relative and absolute quantitation (iTRAQ) have been reported (11). The use of quantitative MS protein analysis with bioassay in column chromatography will make it much easier to nominate candidate proteins from small amounts of crude protein samples with a minimum of purification steps.