Identification of MST1/STK4 and SULF1 Proteins as Autoantibody Targets for the Diagnosis of Colorectal Cancer by Using Phage Microarrays*

The characterization of the humoral response in cancer patients is becoming a practical alternative for improving early detection. We prepared phage microarrays containing colorectal cancer cDNA libraries to identify phage-expressed peptides recognized by tumor-specific autoantibodies in patient sera. Of a total of 1536 printed phages, 128 gave statistically significant values for discriminating cancer patients from control samples. Of these, 43 peptide sequences were unique after DNA sequencing. Six phages containing sequences homologous to STK4/MST1, SULF1, NHSL1, SREBF2, GRN, and GTF2I were selected to build a predictor panel. A previous study with high-density protein microarrays had identified STK4/MST1 as a candidate biomarker. An independent collection of 153 serum samples (50 colorectal cancer sera and 103 reference samples, including healthy donors and sera from other related pathologies) was used as a validation set to study prediction capability. A combination of four phages and two recombinant proteins, corresponding to MST1 and SULF1, achieved an area under the curve of 0.86 for discriminating cancer from healthy sera. Inclusion of sera from other neoplasias did not significantly change this value. For early stages (A+B), the bias-corrected area under the curve was 0.786. Moreover, we have demonstrated that the MST1 and SULF1 proteins, homologous to the phage-peptide sequences, can replace the original phages in the predictor panel, improving its diagnostic accuracy.

Colorectal cancer (CRC) is the main cause of cancer-associated mortality in Spain and other developed countries (1). People over 50 years of age constitute the main risk group and should be screened periodically using one of the available detection methods, such as faecal occult blood testing (FOBT), sigmoidoscopy, colonoscopy, or CT colonography (2).
Carcinoembryonic antigen (CEA), the only available noninvasive protein marker, is mainly suited to detecting late stages and recurrence (3). Alternative serum protein markers are needed to cover the entire progression of the disease, and new clinically useful markers must be defined for the accurate diagnosis of colorectal cancer (for a review, see (4)).

Molecular & Cellular
Humoral response profiling in cancer patients is increasingly used to discover tumor-associated antigens (TAAs) as new biomarkers (5)(6)(7)(8)(9)(10). This new area, called "cancer immunomics" or "seromics," uses autoantibody signatures to classify neoplastic diseases and to find new targets for diagnosis and immunotherapy (11). Two microarray formats are available for TAA detection: recombinant protein microarrays and phage-display microarrays (12,13). The use of protein microarrays has led to the identification of TAAs with higher prevalences than previously reported (5). Peptide-containing phage microarrays constitute an interesting alternative to commercial protein arrays. They are usually homemade and are cheaper to produce than full-length recombinant protein microarrays. They require the construction of phage libraries, usually from T7 phages (8,14), consisting of cDNA fragments representative of the genes expressed in cancer tissues. The peptides encoded by these cDNA fragments are displayed on the phage surface, fused to the C-terminal end of the phage capsid protein 10B. The phage libraries are then selected through biopanning procedures involving normal and patient sera (8). Once constructed, the libraries are screened against a panel of positive and reference sera to identify phages reactive with patients' autoantibodies. Some initial reports used nitrocellulose lifts for plaque screening of phages (14), a process not amenable to high-throughput screening. Combining phage display with microarray technologies considerably improved the objectivity and throughput of the assays, allowing thousands of phages to be tested with only a few microliters of serum (6,8,15).
This strategy, however, has some limitations, such as restrictions on the peptide sequences that can be displayed on the surface of the phage capsid (16), the presence of mimotopes (6), and batch-to-batch reproducibility in microarray production, a problem shared with other protein microarray formats.
Previously, our group identified PIM1, MAPKAPK3, MST1/STK4, SRC, FGFR4, and ACVR2B as autoantibody targets in colorectal cancer using high-density protein microarrays (5). Here, we screened colorectal cancer patients' sera for autoantibodies using CRC cDNA libraries displayed on T7 phages in a microarray format. Combining both proteomic strategies should increase the number of candidate biomarkers and the diagnostic accuracy. Although screening of colorectal cancer sera with phage display libraries grown in Petri dishes was reported by Ran et al. (17), that screening was based on visual interpretation of antibody binding to nitrocellulose lifts of phage plaques using pooled sera, which made objective quantification difficult.
In this report, we used a T7 phage display system in combination with a microarray format to survey the humoral response in colorectal cancer patients, and we discovered and validated a new set of TAAs. One of the TAA candidates, MST1/STK4, had previously been identified with commercial high-density full-length protein microarrays, indicating concordance between the two assays. By ELISA, we tested either the phages or the homologous recombinant proteins against cancer and reference sera, including controls and sera from different types of cancer, to validate the diagnostic assays in CRC patients. The final TAA candidates showed significant accuracy for CRC diagnosis.

EXPERIMENTAL PROCEDURES
CRC and Reference Control Serum-The Institutional Ethical Review Boards of the Centro de Investigaciones Biológicas (CIB) and the Spanish National Research Council (CSIC) approved this study on biomarker discovery in colorectal cancer. Written informed consent was obtained from all patients. Serum samples for microarray analysis and validation were obtained from patients at the Bellvitge University Hospital and the Institut Català d'Oncologia, Barcelona, the Hospital Puerta de Hierro (Madrid), and the Hospital of Cabueñes (Gijón), Spain. Sample collection was approved by the Ethical Review Boards of these institutions. For selection of CRC-specific T7 phage libraries, serum samples from three CRC patients with Dukes' stage B, three with stage C, and six with stage D (three with metastasis to liver and three with metastasis to lung) were used. For microarray analysis, serum samples from 15 patients with CRC at different stages were selected. The median age of the CRC patients was 66.3 years (range 54-82). Fifteen serum samples were obtained from control subjects selected to match the median age and gender proportion of the CRC cohort. For validation, an independent cohort was used (5): 50 CRC serum samples representative of the different Dukes' stages (A-D), 46 control samples, 10 asymptomatic subjects with a family history of CRC, 2 hyperplastic polyps, 2 ulcerative colitis, and 43 sera from other types of cancer (bladder, breast, lung, pancreas, and stomach). A scheme of the training and validation analysis is shown in Fig. 1. Clinical data from all patients are shown in Table I. Samples were handled anonymously according to ethical and legal guidelines at the Spanish National Research Council (CSIC).
Serum samples were processed according to an identical protocol in the different hospitals. Blood samples were left at room temperature for a minimum of 30 min (and a maximum of 60 min) to allow clot formation and then centrifuged at 3000 × g at 4°C for 10 min. The sera were frozen and stored at −80°C until use.
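The protocol gives the serum spin as relative centrifugal force (3000 × g), whereas the slide-drying step reports rotor speed (1200 rpm). The two are linked through the rotor radius by the standard approximation RCF ≈ 1.118 × 10⁻⁵ × r × rpm², with r in cm. The sketch below converts between them; the 10 cm radius is a hypothetical value, since the rotors used are not stated.

```python
import math

# Standard approximation: RCF (multiples of g) = 1.118e-5 * radius_cm * rpm**2
RCF_CONSTANT = 1.118e-5


def rpm_to_rcf(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force for a rotor speed and radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rcf_to_rpm(rcf: float, radius_cm: float) -> float:
    """Rotor speed needed to reach a given RCF at a given radius."""
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))


if __name__ == "__main__":
    radius = 10.0  # hypothetical rotor radius (cm); not given in the protocol
    print(f"3000 x g at r = {radius} cm needs about {rcf_to_rpm(3000, radius):.0f} rpm")
    print(f"1200 rpm at r = {radius} cm gives about {rpm_to_rcf(1200, radius):.0f} x g")
```

Because the radius differs between rotors, reporting × g is what makes a spin reproducible across the participating hospitals.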
T7 Phage Display cDNA Library Synthesis and Biopanning-Phage library construction and biopanning were performed essentially as previously described (8). The full methodology is given in the supplemental data.
Printing and Use of Phage Microarrays-Following amplification, bacterial lysates were centrifuged, and the phage-containing supernatants were diluted 1:2 in phosphate-buffered saline (PBS) containing 0.1% Tween 20 (PBST) and printed in duplicate onto nitrocellulose-coated slides (Whatman/Schleicher and Schuell) using an OmniGrid Spotter (GeneMachines, San Carlos, CA). Negative controls consisted of BSA (Sigma-Aldrich), buffer alone, or empty spots. Human IgG (Sigma-Aldrich) and T7 protein were also spotted as positive controls to verify array quality.
Serum samples (15 from CRC patients and 15 from healthy individuals) were probed on the phage-peptide microarrays as previously described (6), with minor modifications. Briefly, slides were equilibrated in PBS at room temperature for 5 min and then blocked with 3% skimmed milk in PBS (MPBS) for 1 h at room temperature with agitation. Then, 6.6 µl of human serum (dilution 1:300), 120 µg of E. coli lysate, and 0.3 µg of anti-T7-Tag monoclonal antibody (Novagen, Madison, WI) in 2 ml of 3% MPBS were incubated with the slides for 90 min at room temperature. Slides were washed three times with PBST for 10 min. To detect human antibodies and T7 phages, slides were incubated with Alexa Fluor 647-labeled goat anti-human IgG (Invitrogen, Carlsbad, CA) diluted 1:2000 in 3% MPBS and Alexa Fluor 555-labeled goat anti-mouse IgG (Invitrogen) diluted 1:40,000 in MPBS, respectively. Arrays were washed three times with PBST and once with PBS, and dried by centrifugation at 1200 rpm for 3 min. Finally, slides were read on a ScanArray™ 5000 (Packard BioChip Technologies). GenePix Pro 7 (Axon Laboratories, Boston, MA) image analysis software was used for spot intensity quantification.

FIG. 1. Overview of the process followed for the identification and validation of potential biomarkers to diagnose colorectal cancer using phage microarrays.
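The reagent amounts in the array incubation can be checked against the stated dilution. Assuming 2 ml is the final incubation volume, 6.6 µl of serum gives about 1:303, consistent with the nominal 1:300, and the E. coli lysate and anti-T7-Tag antibody end up at 60 µg/ml and 0.15 µg/ml. The same arithmetic gives the serum volume per well in the ELISA described below (100 µl at 1:50). The helper names are illustrative only.

```python
def dilution_factor(component_volume: float, final_volume: float) -> float:
    """N in a 1:N dilution (both volumes in the same unit)."""
    return final_volume / component_volume


def component_volume(final_volume: float, factor: float) -> float:
    """Volume of stock needed for a 1:factor dilution in final_volume."""
    return final_volume / factor


def final_concentration(amount_ug: float, final_volume_ml: float) -> float:
    """Concentration in ug/ml of an amount brought to a final volume."""
    return amount_ug / final_volume_ml


if __name__ == "__main__":
    # Array incubation; assumes 2 ml (2000 ul) is the final volume.
    print(f"serum on arrays: 1:{dilution_factor(6.6, 2000):.0f}")
    print(f"E. coli lysate: {final_concentration(120, 2.0):.0f} ug/ml")
    print(f"anti-T7-Tag antibody: {final_concentration(0.3, 2.0):.2f} ug/ml")
    # ELISA: 100 ul of serum diluted 1:50 per well
    print(f"serum per ELISA well: {component_volume(100, 50):.1f} ul")
```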
Immunohistochemistry Analysis-All CRC tumor resection specimens (usually hemicolectomies) were fixed in buffered formaldehyde and paraffin-embedded. We selected well-preserved representative areas of the tumor and of distant normal mucosa for immunohistochemical analysis. Immunohistochemistry was performed on 6-µm sections of the blocks using an automated method (Dako autostainer). The primary antibodies for MST1/STK4 (Atlas Antibodies, Stockholm, Sweden) and SULF1 (Sigma) were used at 1:100 and 1:25 dilutions, respectively. Slides were counterstained with hematoxylin. Immunoreactivity was graded as 0, absent; 1, mild staining; 2, moderate staining; or 3, intense staining. Cases were classified according to both the staining intensity and the percentage of the area showing reaction. Because inflammatory cells were positive for MST1 (intense) and SULF1 (mild), they were used as an internal control. In all cases, an external negative control was included.
ELISA Tests-T7 Phage Capture Plates (Novagen) were blocked for 2 h at 37°C with 3% MPBS and then coated overnight with 100 µl of selected phage lysates in 3% MPBS. After three washes with PBST, plates were blocked with MPBS for 1 h at 37°C. Then, 100 µl of human serum (diluted 1:50 in 3% MPBS) was incubated for 1 h at 37°C. After washing, peroxidase-labeled anti-human IgG (1:3000 in 3% MPBS) was added for 2 h at room temperature. The signal was developed with 3,3′,5,5′-tetramethylbenzidine substrate (Sigma) for 10 min. The reaction was stopped with 1 M HCl, and absorbance was measured at 450 nm.
For competition analysis between phage peptides and proteins, T7 Phage Capture Plates were used as above, except that the human sera were pre-incubated overnight with serial dilutions of the MST1, SULF1, or GST proteins. In addition, the pre-incubated sera were tested in ELISA plates (Maxisorp, Nunc) coated with EBNA1 to verify that the competition between each phage and its respective full-length protein for IgG was specific. EBNA1, the Epstein-Barr nuclear antigen 1 of the Epstein-Barr virus, was used as a positive control: more than 90% of the human population has been infected with this virus at some point in life and carries antibodies to this protein (18). ELISA experiments with the full-length proteins MST1, SULF1, and EBNA1 were performed as described before (5). Serum CEA concentration was determined using a specific immunoassay kit (MP Biomedicals, Santa Ana, CA), following the manufacturer's recommendations.
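A common way to summarize such a competition ELISA, assumed here rather than taken from the paper, is percent inhibition relative to the same serum pre-incubated without competitor: 100 × (1 − OD_with / OD_without). A specific competitor should raise inhibition on the phage in a dose-dependent way while leaving the EBNA1 signal essentially unchanged. All OD values in the usage lines are hypothetical.

```python
def percent_inhibition(od_with_competitor: float, od_without: float) -> float:
    """Percent drop in OD450 relative to the serum without competitor."""
    if od_without <= 0:
        raise ValueError("reference OD must be positive")
    return 100.0 * (1.0 - od_with_competitor / od_without)


if __name__ == "__main__":
    # Hypothetical OD450 readings for one serum; keys are competitor amounts (ug).
    phage_reference, ebna1_reference = 1.20, 1.20
    phage_od = {0.5: 0.95, 1.0: 0.70, 2.0: 0.42}
    ebna1_od = {0.5: 1.18, 1.0: 1.21, 2.0: 1.17}
    for dose in sorted(phage_od):
        print(
            f"{dose} ug: phage {percent_inhibition(phage_od[dose], phage_reference):.0f}% "
            f"| EBNA1 {percent_inhibition(ebna1_od[dose], ebna1_reference):.0f}%"
        )
```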
Statistical Analysis-Microarray data were normalized and processed using the Asterias applications (http://asterias.bioinfo.cnio.es/), a web interface to the limma and marrayNorm Bioconductor packages. After background correction and global loess normalization (http://dnmad.bioinfo.cnio.es/), the data were processed to filter out missing values and spots with excessive variance, to merge replicates into a single value for each phage clone, and to transform the values to base-2 logarithms (http://prep.bioinfo.cnio.es/). To compare the CRC patient and healthy individual groups, we performed a t test using Pomelo II (http://pomelo2.bioinfo.cnio.es/), with p values obtained by permutation testing (200,000 permutations). Pomelo II generated a heatmap showing the phages with an FDR below 0.15 and an unadjusted p value below 0.05.
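Two of these steps can be illustrated in a few lines: merging duplicate spots into one log2 value per clone, and a two-group t statistic whose p value comes from permuting the group labels rather than from the t distribution. This is a minimal sketch, not the Asterias/Pomelo II code; it assumes Welch's t statistic, averages duplicates before the log2 transform, and defaults to far fewer permutations than the 200,000 used in the study.

```python
import math
import random


def merge_duplicates_log2(spot_a: float, spot_b: float) -> float:
    """Average two duplicate spot intensities, then take log2."""
    return math.log2((spot_a + spot_b) / 2.0)


def _mean(v):
    return sum(v) / len(v)


def _sample_var(v):
    m = _mean(v)
    return sum((a - m) ** 2 for a in v) / (len(v) - 1)


def welch_t(x, y):
    """Welch's two-sample t statistic; 0.0 if both groups have zero variance."""
    se = math.sqrt(_sample_var(x) / len(x) + _sample_var(y) / len(y))
    return 0.0 if se == 0 else (_mean(x) - _mean(y)) / se


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided p value: share of label permutations with |t| >= observed |t|."""
    rng = random.Random(seed)
    observed = abs(welch_t(x, y))
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(welch_t(pooled[:len(x)], pooled[len(x):])) >= observed:
            hits += 1
    # add-one correction: the observed labelling counts as one permutation
    return (hits + 1) / (n_perm + 1)
```

For one phage clone, `permutation_p_value(crc_log2_values, control_log2_values, n_perm=200_000)` would mirror the permutation count used here, at a correspondingly higher run time.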
For bootstrapping analysis, we fitted a logistic regression model in which the probability of being tumoral versus normal is modeled as a function of the variables (phages and proteins). We also included the age and sex of the patients in the model to correct for possible effects of these variables. Model adequacy, including the need for nonlinear transformations, was assessed using the usual residual plots. To assess predictive ability, we computed the area under the ROC curve (AUC). However, the AUC computed directly with the original model and the complete data set is biased toward high values, because the same data are used both to fit and to evaluate the model. Thus, we used the bootstrap, with 1000 replicate samples, to obtain a bias-corrected AUC (19). With the bootstrap, we repeatedly sampled with replacement from our original data, fitted the model to each sample, and tested it on the left-out samples. From the 1000 bootstrap samples we thus obtained 1000 estimates of the AUC, each computed on samples that were not used to fit the model; we refer to the resulting estimate as the bias-corrected AUC. It is, therefore, an estimate of the AUC we would obtain in a future independent validation. All models were fitted using Harrell's Design library (20) with the R statistical computing system (21).
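The resampling scheme described above can be sketched as follows: fit a logistic regression on each bootstrap resample, score the samples that were not drawn, and average those out-of-bag AUCs. This is a minimal illustration under stated assumptions, not a reimplementation of the Design library: it fits by plain gradient ascent rather than maximum-likelihood Newton steps, uses the Mann-Whitney form of the AUC, and skips replicates in which either subset lacks one of the two classes. Covariates such as age and sex would simply be further columns of `X`.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def auc(pos_scores, neg_scores):
    """Mann-Whitney AUC: P(case score > control score), ties counted as 1/2."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))


def fit_logistic(X, y, lr=0.1, n_iter=200):
    """Logistic regression by batch gradient ascent; returns [b0, b1, ...]."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(n_iter):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = yi - sigmoid(w[0] + sum(b * x for b, x in zip(w[1:], xi)))
            grad[0] += err
            for j, xj in enumerate(xi, start=1):
                grad[j] += err * xj
        w = [b + lr * g / len(X) for b, g in zip(w, grad)]
    return w


def predict(w, X):
    return [sigmoid(w[0] + sum(b * x for b, x in zip(w[1:], xi))) for xi in X]


def out_of_bag_auc(X, y, n_boot=1000, seed=1):
    """Average AUC over the samples left out of each bootstrap resample."""
    rng = random.Random(seed)
    n = len(y)
    estimates = []
    for _ in range(n_boot):
        drawn = [rng.randrange(n) for _ in range(n)]
        left_out = sorted(set(range(n)) - set(drawn))
        y_fit = [y[i] for i in drawn]
        y_oob = [y[i] for i in left_out]
        if len(set(y_fit)) < 2 or len(set(y_oob)) < 2:
            continue  # both classes are needed to fit and to compute an AUC
        w = fit_logistic([X[i] for i in drawn], y_fit)
        scores = predict(w, [X[i] for i in left_out])
        pos = [s for s, t in zip(scores, y_oob) if t == 1]
        neg = [s for s, t in zip(scores, y_oob) if t == 0]
        estimates.append(auc(pos, neg))
    if not estimates:
        raise ValueError("no usable bootstrap replicates")
    return sum(estimates) / len(estimates)
```

The apparent AUC of the model fitted on all samples will usually be higher than this out-of-bag average, which is the optimism the bootstrap is meant to remove.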

RESULTS

Profiling of Colorectal Cancer Sera with T7 CRC Phage Microarrays-RNA from six patients (three in Dukes' stage A and three in stage C) was used to construct phage cDNA libraries in two vectors (T7Select 415-1 or T7Select 10-3b). Following removal of nonspecific phages and selection of cancer-specific phages, we obtained eight different tumor-specific enriched phage libraries, according to the vector and the serum pool (B, C, Li, and Lu) used during biopanning (see supplemental data). A total of 1536 individual phages were selected (192 individual phages from each selection) and printed in duplicate onto nitrocellulose slides. The amount of phage printed on the slides was tested using anti-T7 and anti-human IgG as controls (supplemental Fig. S1A): a homogeneous signal was observed for anti-T7, whereas anti-human IgG gave no signal. To determine the intra- and inter-array reproducibility, we plotted the intensities of the two spots corresponding to the same phage clone and compared the data from two different microarrays. Both intra- and inter-array reproducibility were good (R² = 0.9703 and 0.9091, respectively) (supplemental Fig. S1B). Slides were then probed with 30 sera (15 from patients at different stages and 15 from healthy controls). Following image quantification and normalization, we compared cancer and normal sera using a t test with 200,000 permutations. One hundred and twenty-eight phage clones showed different reactivity between the two groups (FDR < 0.22): 78 showed increased and 50 showed decreased reactivity in CRC sera. A supervised clustering analysis of the 50 phage clones with the lowest FDR (< 0.15) clearly discriminated CRC patients from healthy individuals (supplemental Fig. S2).
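The FDR thresholds quoted here (0.22 for the 128 clones, 0.15 for the clustering set) are false discovery rate estimates attached to each clone's permutation p value. The text does not state how Pomelo II computes them; the sketch below assumes the Benjamini-Hochberg step-up adjustment, a common choice, and then splits the retained clones by the sign of the t statistic into increased and decreased reactivity in CRC sera. Clone names and values are made up.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p values (q values) in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # walk from the largest p value down, keeping q values monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def split_by_direction(clones, t_stats, q_values, fdr_cutoff):
    """Clones passing the FDR cutoff, split into increased/decreased in cases."""
    up, down = [], []
    for name, t, q in zip(clones, t_stats, q_values):
        if q < fdr_cutoff:
            (up if t > 0 else down).append(name)
    return up, down
```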
Identification of Phage-inserted Sequences-Of the 78 phages showing increased reactivity with CRC patients' sera, we obtained 43 unique amino acid sequences fused to the T7 10B capsid protein (supplemental Table S1). From these 43 phages, those with (i) inserts of 8 to 20 residues with the highest possible homology to predicted protein sequences, (ii) the highest number of phages sharing the same sequence, and (iii) the lowest FDR or p value were selected for further studies. Although most of the inserted sequences corresponded to nonassigned genomic regions, peptides homologous to the proteins MST1/STK4, SULF1, NHSL1, SREBF2, GRN, and GTF2i were identified in the reading frame of the 10B capsid protein. All of them gave a higher microarray signal with tumor sera than with control sera (Fig. 2A). As expected, reactivity varied considerably between patients. Remarkably, the MST1/STK4 protein had previously been identified as a TAA using Protoarrays (5), and the SULF1 gene was up-regulated in a CRC transcriptomic analysis (22). Fig. 2B shows a heatmap of the results for the six-phage predictor in the training set.
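The three selection criteria can be read as a filter followed by a ranking. The sketch below is one hypothetical encoding: keep inserts of 8 to 20 residues that have a protein homology hit, then rank by how many clones share the sequence (descending) and by FDR (ascending). The record fields, the placeholder sequences, and the tuple sort used to combine criteria (ii) and (iii) are assumptions; the text does not say how the criteria were weighed against each other.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PhageInsert:
    sequence: str             # peptide fused to the 10B capsid protein
    homolog: Optional[str]    # best protein hit; None for nonassigned regions
    n_clones: int             # clones carrying the same sequence
    fdr: float


def select_candidates(inserts, min_len=8, max_len=20):
    """Filter by insert length and homology, then rank by clone count and FDR."""
    kept = [
        p for p in inserts
        if p.homolog is not None and min_len <= len(p.sequence) <= max_len
    ]
    return sorted(kept, key=lambda p: (-p.n_clones, p.fdr))
```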
To confirm that the peptides expressed in the phages were homologous to the MST1 and SULF1 proteins, phages expressing both peptides were subjected to competition analysis with MST1 and SULF1 recombinant proteins. Binding of human cancer sera to both phages was inhibited in a dose-dependent, specific manner by the MST1 and SULF1 recombinant proteins (Fig. 3A). Antibody binding was almost unaffected when GST was used as a negative control. By contrast, antibody binding to the EBNA1 protein was not affected by pre-incubation with MST1 or SULF1, showing that the inhibition was specific. The phage-inserted sequences were located at the C-terminal region of MST1 and at the N-terminal region of SULF1 (Fig. 3B). Collectively, these experiments confirm that the displayed peptides correspond to the MST1 and SULF1 proteins.
Phage-homologous Proteins are Overexpressed in Colorectal Cancer-Tumor antigens recognized by autoantibodies are generally overexpressed in tumor cells and cancer tissues (5,8). A meta-analysis of the mRNA expression levels of the proteins homologous to the six selected phages was carried out with Oncomine (23), a public cancer microarray database (Fig. 4A). SULF1 was the most overexpressed gene in different types of colon cancer, followed by GTF2i, MST1, GRN, NHSL1, and SREBF2. In addition, we carried out Western blot analysis with MST1 and SULF1 antibodies on a panel of 11 colorectal cancer cell lines and CRC tumors representing different progression stages (Fig. 4B). MST1 and SULF1 were expressed in most of the colon cancer cell lines. The highest SULF1 expression was observed mainly in metastatic cell lines (SW48, HT29, or COLO205) and in late-stage tumor samples. The cellular expression patterns of the identified proteins were characterized by immunohistochemistry on independent series of CRC tumors in custom-made tissue microarrays (MST1/STK4, SULF1) or by meta-analysis of data retrieved from the Human Protein Atlas in the case of GRN and GTF2i (24) (Fig. 4C). Significantly higher expression of GRN and GTF2i was reported in neoplastic tissue than in paired normal tissue. For MST1/STK4, most tumor tissues showed intense or moderate positivity, whereas normal mucosa was negative or mildly positive. Tumors were moderately positive for SULF1, whereas normal mucosa displayed weak staining (Fig. 4D). On the staining scale applied to the TMA (0, low, to 3, high), MST1 had mean scores of 1.96 ± 0.98 and 0.04 ± 0.2 in tumoral and normal tissue, respectively (p = 5.0E-10), confirming significantly higher expression of MST1 in tumoral tissue (Fig. 4E).
For SULF1, the mean scores were 1.91 ± 0.30 and 0.55 ± 0.52 in tumoral and normal tissue, respectively (p = 1.2E-6) (Fig. 4E). Collectively, these data indicate a good correlation between autoantibody targeting, protein abundance, and gene expression.

Validation of the Phage-Peptide Detector and Associated Proteins-An independent cohort of 153 samples (50 colorectal cancer, 46 control samples, 10 asymptomatic subjects with a family history of CRC, 2 hyperplastic polyps, 2 ulcerative colitis, and 43 sera from other types of cancer: bladder, breast, lung, pancreas, and stomach) (Table I) was used for validation; 19 of the CRC samples came from early stages (A+B). We tested the MST1-, SULF1-, NHSL1-, SREBF2-, GRN-, and GTF2i-like phage lysates for their ability to discriminate cancer from control sera in individual ELISA assays, and generated an ROC curve for each. The sensitivity of the individual phages was relatively low, ranging from 46 to 58%, whereas the specificity was somewhat higher, ranging from 52.2 to 73.9% (Table II). To investigate whether combinations of phages would give higher accuracy, we fitted logistic regression models using different combinations of phages. With the combination of all six phages as predictor, the area under the curve (AUC) increased to 0.78, with a sensitivity of 72% and a specificity of 73.9% (Table II). This specificity supported further analysis of the clinical relevance of the homologous proteins.
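Each single-marker ROC curve is traced by sliding a cutoff across the ELISA signal and recording sensitivity (fraction of CRC sera above the cutoff) and specificity (fraction of reference sera at or below it). The text does not say how the reported operating points were chosen; the sketch assumes the cutoff that maximizes Youden's J = sensitivity + specificity − 1, a common convention. The scores in the test are hypothetical.

```python
def sens_spec(cases, controls, cutoff):
    """Sensitivity and specificity when 'positive' means score > cutoff."""
    sens = sum(s > cutoff for s in cases) / len(cases)
    spec = sum(s <= cutoff for s in controls) / len(controls)
    return sens, spec


def youden_cutoff(cases, controls):
    """(J, cutoff, sensitivity, specificity) maximizing Youden's J over observed scores."""
    best = None
    for cutoff in sorted(set(cases) | set(controls)):
        sens, spec = sens_spec(cases, controls, cutoff)
        j = sens + spec - 1.0
        if best is None or j > best[0]:
            best = (j, cutoff, sens, spec)
    return best
```

A different rule (for example, fixing specificity at a target level) would pick a different point on the same curve; the AUC itself does not depend on this choice.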
We next tested whether replacing the phages with the recombinant proteins MST1 and SULF1 could improve diagnostic accuracy. The recombinant proteins improved prediction, with AUCs of 0.71 and 0.74 for the SULF1 and MST1 proteins versus 0.63 and 0.58 for the respective phages (supplemental Fig. S3; Table II). Combining the two proteins and four phages increased the AUC to 0.86, with a sensitivity of 82.6% and a specificity of 70% (Fig. 5A). CEA alone performed worse (AUC 0.81), and adding it to the predictor only marginally improved the model (AUC 0.89) (supplemental Fig. S4). In the validation step, we also estimated the AUC for CRC versus all reference sera and for healthy versus other tumors, in addition to CRC versus healthy (Fig. 5). The most relevant result was the ability of our model to discriminate not only CRC from healthy sera (AUC 0.86) (Fig. 5A) but also CRC from all reference sera, including other related colon pathologies (AUC 0.85) (Fig. 5B). Notably, the panel did not properly discriminate healthy sera from sera of other tumors (AUC 0.63) (Fig. 5C). Moreover, the panel seemed to discriminate healthy controls from asymptomatic subjects with a family history of CRC (AUC 0.78) (data not shown), although the small sample size will require further verification.
Bootstrapping Analysis and Final Prediction Model-In addition, we performed bootstrapping to obtain a bias-corrected AUC. The initial model included linear terms for all phages and proteins, together with two other variables: gender and age. With this model, the bias-corrected AUC was 0.86. This model was probably more complex than the data justified.

FIG. 3. Competition analysis between phage-peptides and homologous proteins. A, A competition ELISA was performed between phages displaying peptides with homology to SULF1 and MST1 and the corresponding full-length proteins. GST was used as a negative control. Increasing amounts of the recombinant proteins were pre-incubated with the sera and then tested for antibody binding to the phage (vertical bars: black, recombinant protein; white, GST). The scatter plot shows IgG binding to EBNA1 of the same sera, pre-incubated with increasing amounts of recombinant proteins. EBNA1 was used as a control to demonstrate that the inhibition was protein-specific and that no bias was introduced in the experiment (black squares, recombinant protein; white triangles, GST). The optical density (OD) at 450 nm of both assays is shown. Error bars represent the standard deviation of three separate experiments. B, Localization of the peptides with homology to SULF1 and MST1 in the full-length proteins. The phage-displayed peptide is shown as a black box. White bars correspond to potential phosphorylation sites. Amino acids that differ between the phage-peptide and the wild-type protein are shown in lowercase letters.

FIG. 4. Analysis of SULF1, MST1, GTF2i, NHSL1, GRN, and SREBF2 expression in CRC tissues. A, Meta-analysis of gene expression levels corresponding to the proteins homologous to the phage-displayed peptides, assessed using the Oncomine database; p values are indicated. Relative gene expression levels were found for NHSL1, SREBF2, GTF2i, SULF1, MST1, and GRN. B, Western blot analysis of SULF1 and MST1 overexpression in tumoral cell lines and paired cancer tissues corresponding to stages A (I), B (II), and C (III). Tubulin was used as a control. C, Tissue microarray data on GTF2i and GRN expression retrieved from the Human Protein Atlas. D, MST1/STK4 and SULF1 showed intense cytoplasmic staining in well-differentiated enteroid adenocarcinoma of the right colon, whereas normal colonic mucosa far from the tumor was not stained. As an internal control, we used the positivity of inflammatory cells in the lamina propria (MST1/STK4 intense staining and SULF1 mild staining). Images were taken at 200× magnification. E, Immunohistochemistry results for MST1/STK4 and SULF1 in CRC tissue and normal mucosa of 25 CRC patients were quantified by two pathologists according to the following criteria: 0, no staining; 1, weak staining; 2, normal staining; 3, strong staining. Error bars represent the S.D. of the assay. p values are indicated.

We therefore carried out variable selection, using backward selection with Akaike's Information Criterion (AIC) as the stopping rule. The final model retained the GRN phage and the MST1 and SULF1 proteins, plus the age of the patients (supplemental Table S2). However, to avoid overestimating the predictive capacity of the model, we obtained bias-corrected estimates of the AUC by bootstrapping the complete variable-selection process (i.e., for each bootstrap sample, we started with the complete model with eight variables and used AIC as the stopping rule). The bias-corrected AUC was 0.84. Bootstrapping also provided information on the stability of the selection procedure: most bootstrapped models contained four, five, six, or seven variables (171, 262, 329, and 172 of the 1000 bootstrap replicates, respectively). Some variables appeared in most of the models: GRN phage in 976, SULF1 protein in 954, age in 952, and MST1 protein in 833.
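Backward selection with AIC as the stopping rule starts from the full model and repeatedly drops the variable whose removal lowers AIC = 2k − 2 ln L the most, stopping when no single removal lowers it. In the sketch, the model-fitting step is a caller-supplied function returning the maximized log-likelihood for a subset of variables; the toy log-likelihood in the test is invented for illustration and is not derived from the study's data. Repeating the selection inside every bootstrap resample and counting how often each variable survives yields selection frequencies of the kind reported above.

```python
from typing import Callable, FrozenSet, Iterable


def aic(log_lik: float, n_vars: int) -> float:
    """AIC for a model with n_vars predictors plus an intercept."""
    return 2 * (n_vars + 1) - 2 * log_lik


def backward_aic(variables: Iterable[str],
                 log_lik: Callable[[FrozenSet[str]], float]) -> FrozenSet[str]:
    """Drop variables one at a time while doing so lowers AIC."""
    current = frozenset(variables)
    current_aic = aic(log_lik(current), len(current))
    while current:
        candidates = [(aic(log_lik(current - {v}), len(current) - 1), v)
                      for v in current]
        best_aic, best_var = min(candidates)
        if best_aic >= current_aic:
            break  # no single removal improves AIC: stop
        current, current_aic = current - {best_var}, best_aic
    return current
```

Because each retained variable costs 2 AIC units, a variable survives only if it raises the log-likelihood by more than 1.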
Moreover, we used this model to predict the probability of CRC for the set of 57 sera from subjects with diverse pathologies. The dotplot in Fig. 5D shows the individual probability for each subject. Probabilities vary widely within each group, but the medians are well below 0.5, indicating a low probability of CRC.
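Each point in such a dotplot is the logistic model's predicted probability p = 1 / (1 + exp(−η)), where η is the intercept plus the weighted sum of the retained variables; a reference serum would be called CRC if p ≥ 0.5. The sketch computes these probabilities and the per-group median. The coefficients, variable names, and group labels are placeholders, not the fitted model in supplemental Table S2.

```python
import math
from statistics import median


def predicted_probability(intercept, coefs, values):
    """Logistic-model probability of CRC for one subject."""
    eta = intercept + sum(coefs[name] * values[name] for name in coefs)
    return 1.0 / (1.0 + math.exp(-eta))


def group_medians(subjects, intercept, coefs):
    """Median predicted probability for each group label."""
    by_group = {}
    for group, values in subjects:
        by_group.setdefault(group, []).append(
            predicted_probability(intercept, coefs, values))
    return {g: median(p) for g, p in by_group.items()}
```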
We then assessed the value of the prediction according to stage, for early diagnosis purposes. Starting from the model with six markers (four phages + two proteins) plus age, the bias-corrected AUC from bootstrapping was 0.786 for stages A+B, 0.857 for stage C, and 0.849 for stage D. Applying the same analysis to CEA values gave bias-corrected AUCs of 0.742 for A+B, 0.770 for C, and 0.973 for stage D.
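Stage-specific AUCs compare the scores of the CRC patients in one stage group against the same set of reference sera. A minimal sketch, using the Mann-Whitney form of the AUC and hypothetical scores; it gives apparent rather than bootstrap-corrected values, and small groups such as the 19 early-stage sera give correspondingly noisy estimates.

```python
def auc(pos, neg):
    """Probability that a case scores above a control (ties count one half)."""
    total = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return total / (len(pos) * len(neg))


def auc_by_stage(case_scores, case_stages, control_scores, groups):
    """AUC of each stage group (e.g. {'A+B': {'A', 'B'}}) versus all controls."""
    result = {}
    for label, stages in groups.items():
        pos = [s for s, st in zip(case_scores, case_stages) if st in stages]
        if pos:
            result[label] = auc(pos, control_scores)
    return result
```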
These results indicate that our predictor outperforms CEA for stages A, B, and C, whereas CEA is superior only for stage D, as expected.

DISCUSSION
The use of the microarray format for phage display cancer peptide libraries in autoantibody screening permits objective identification and quantification not possible by other means. Still, the approach is rather cumbersome and labor-intensive compared with recombinant protein microarrays, and it requires considerable effort. Moreover, identification of the inserted sequences led in most cases to mimotopes with no clear protein assignment. All these factors hinder its widespread use and may explain the relatively low number of reports that have applied this strategy so far. However, protein and phage microarrays have enabled the discovery of relatively large panels of proteins recognized by autoantibodies in colorectal cancer. These TAAs vastly increase the number and prevalence of antigens previously identified in cancer patients by other approaches (10). Still, we do not yet know how many proteins become autoantibody targets in cancer patients, nor the molecular basis of this autoimmunity.
This report demonstrates, for the first time, the correspondence between phage-inserted peptides and the corresponding recombinant proteins. Recombinant MST1 and SULF1 proteins competed with and displaced antibody binding to the phages in ELISA assays, and they also increased the predictive accuracy of the assay. This is an important step in supporting the reliability of this technology. The classifier combining four phages and two proteins achieved a specificity of 70% and a sensitivity of 82.6% for CRC sera, outperforming CEA. Combination with CEA did not significantly improve the diagnostic accuracy of the panel (supplemental Fig. S4). Adding sera from other tumors to the validation step did not change the predictive power of the panel, underscoring the value of this approach. The specificity of the test was confirmed by the low AUC obtained when healthy sera were compared with sera from other (non-CRC) cancers. The aim of this study was to develop a diagnostic assay useful for identifying early adenocarcinomas in CRC, and the panel reached a bias-corrected AUC of 0.786 for stages A+B. Moreover, preliminary data seem to support that this panel could also discriminate very early stages or asymptomatic patients. Therefore, this panel of biomarkers might be helpful in defining high-risk populations that should undergo enhanced screening procedures such as colonoscopy, as an alternative to FOBT. Although FOBT is relatively inexpensive and non-invasive, it has high false-positive rates (3) and leads to unnecessary colonoscopies (25).

Figure legend (ROC analysis of the predictor panel). Receiver-operating-characteristic (ROC) curves are based on multiplex analyses of the four phages and two proteins in a total of 153 samples (50 samples from CRC patients, 46 healthy controls, 10 samples from controls with a family history of CRC, 2 from ulcerative colitis patients, 2 from patients with hyperplastic polyps, and 43 samples from patients with bladder, breast, lung, pancreatic, or stomach cancer). A, Performance of CRC samples versus healthy controls. B, Performance of CRC samples versus all reference sera. C, Performance of healthy sera versus sera from other tumors. D, Dot plot showing, for each subject with a different pathology, the individual probability of being classified as a CRC patient. The predicted probability is taken from the final logistic regression model (differentiating CRC from reference subjects) after variable selection. Most of the samples were classified below the 0.5 threshold probability (gray line). Therefore, the model did not detect general markers of cancer or inflammatory disease, but markers specific to CRC.
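Sensitivity, specificity, and AUC, the performance figures quoted above, can all be computed from per-sample predicted probabilities. The sketch below is illustrative only: it assumes a sample is called CRC when its probability is at least 0.5 (the threshold drawn in the dot plot; the cut-off behind the reported 70% and 82.6% is not stated here) and uses the rank-based (Mann-Whitney) estimate of the empirical AUC. The function names and toy values are hypothetical and are not data from this study.

```python
from typing import Sequence, Tuple


def empirical_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Rank-based (Mann-Whitney) AUC: the probability that a randomly chosen
    case (label 1) scores higher than a randomly chosen control (label 0).
    Ties contribute one half."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for h in controls:
            if c > h:
                wins += 1.0
            elif c == h:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def sensitivity_specificity(probs: Sequence[float], labels: Sequence[int],
                            threshold: float = 0.5) -> Tuple[float, float]:
    """Call a sample CRC when its predicted probability >= threshold
    (an assumed cut-off) and return (sensitivity, specificity)."""
    tp = sum(1 for p, y in zip(probs, labels) if y == 1 and p >= threshold)
    fn = sum(1 for p, y in zip(probs, labels) if y == 1 and p < threshold)
    tn = sum(1 for p, y in zip(probs, labels) if y == 0 and p < threshold)
    fp = sum(1 for p, y in zip(probs, labels) if y == 0 and p >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Hypothetical toy values: 3 cases (label 1) and 5 reference sera (label 0).
    probs = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.6, 0.1]
    labels = [1, 1, 1, 0, 0, 0, 0, 0]
    print(round(empirical_auc(probs, labels), 3))  # 0.867 (13 of 15 case-control pairs ranked correctly)
    print(sensitivity_specificity(probs, labels))  # sensitivity 2/3, specificity 0.6
```

The rank-based estimate equals the area under the empirical ROC curve traced by sweeping the threshold, so no explicit curve needs to be built to obtain the AUC.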
We observed coincidences between the proteins identified with these phage arrays and those identified with commercial protein arrays (Protoarrays®). At least two proteins, MST1/STK4 and DNAJ (data not shown), were identified with both types of arrays. DNAJ-specific autoantibodies were previously reported in lung cancer (14). Together with MST1/STK4 and SULF1, four other phages (NHSL1, SREBF2, GRN, and GTF2I) were used to validate the predictive and diagnostic capacity of the panel. These four sequences will require further verification, through the synthesis of the identified peptides or the expression of the full-length recombinant proteins, to prove that they correspond to those hypothetical proteins. The display of only short homologous peptides on the phage surface seems to result from the random cloning of cDNA fragments. Many cDNA inserts correspond to antisense mRNAs, aberrantly spliced regions, and other variants. This resulted in phages carrying peptide sequences with weak or no homology to known proteins (supplemental Table S1). Such peptides are generally described as mimotopes: peptides that mimic conformational epitopes and therefore have no significant homology to any known protein. Our results suggest that the concordance between peptides and proteins might be fortuitous rather than the result of inserting cDNA-specific encoding sequences. Thus, the use of random-peptide libraries (26) would be almost equivalent to this approach.
As mentioned before, individual phages offer lower sensitivity and specificity than the corresponding recombinant proteins, probably because only a single peptide/epitope is involved in the binding. As previously reported (8), we also found it necessary to combine multiple phages, or phages and proteins (MST1, SULF1), to obtain a satisfactory diagnostic value.
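A multivariable logistic regression combines several weak markers into one score: each serum's signals are weighted and summed, and the logistic function converts the sum into a probability of CRC. The sketch below is a minimal, hypothetical illustration in plain Python (gradient-descent fitting, no variable selection, no regularization). It also shows one common way to obtain a bias-corrected AUC, Harrell's bootstrap optimism correction. That this is the correction behind the reported 0.786 is an assumption, and a faithful version would repeat any variable-selection step inside every bootstrap resample. All names, parameters, and data are illustrative.

```python
import math
import random
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w: Sequence[float], x: Sequence[float]) -> float:
    """Probability of CRC for one serum; w = [intercept, w1, ..., wk]."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(X: List[List[float]], y: List[int],
                 lr: float = 0.1, epochs: int = 300) -> List[float]:
    """Batch gradient descent on the mean log-loss (hypothetical settings)."""
    k, n = len(X[0]), len(X)
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            err = predict(w, xi) - yi  # log-loss gradient w.r.t. the linear score
            grad[0] += err
            for j in range(k):
                grad[j + 1] += err * xi[j]
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Rank-based AUC; ties count one half."""
    cases = [s for s, t in zip(scores, labels) if t == 1]
    ctrls = [s for s, t in zip(scores, labels) if t == 0]
    wins = sum(1.0 if c > h else 0.5 if c == h else 0.0 for c in cases for h in ctrls)
    return wins / (len(cases) * len(ctrls))


def optimism_corrected_auc(X: List[List[float]], y: List[int],
                           n_boot: int = 30, seed: int = 0) -> Tuple[float, float]:
    """Return (apparent AUC, bootstrap optimism-corrected AUC)."""
    rng = random.Random(seed)
    w_full = fit_logistic(X, y)
    apparent = auc([predict(w_full, x) for x in X], y)
    n, optimism = len(y), []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        if len(set(yb)) < 2:
            continue  # a resample containing only one class cannot be scored
        wb = fit_logistic(Xb, yb)
        boot_auc = auc([predict(wb, x) for x in Xb], yb)  # on its own resample
        test_auc = auc([predict(wb, x) for x in X], y)    # on the original data
        optimism.append(boot_auc - test_auc)
    if not optimism:
        raise ValueError("no usable bootstrap resamples")
    return apparent, apparent - sum(optimism) / len(optimism)


if __name__ == "__main__":
    # Hypothetical synthetic data: two marker signals per serum, shifted upward in cases.
    rng = random.Random(1)
    y = [1] * 30 + [0] * 30
    X = [[rng.gauss(1.0 if t else 0.0, 1.0), rng.gauss(0.8 if t else 0.0, 1.0)] for t in y]
    apparent, corrected = optimism_corrected_auc(X, y)
    print(f"apparent AUC {apparent:.3f}, optimism-corrected AUC {corrected:.3f}")
```

Refitting on each resample and scoring on the original data estimates how much the apparent AUC flatters a model evaluated on its own training sera; subtracting the mean of that gap gives the corrected value.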
Expression analysis of MST1 and SULF1 at the tissue level indicated a potential association of SULF1 with late stages of cancer progression and a significant diagnostic value of these two biomarkers for CRC (Fig. 4E). Western blot protein expression data were concordant with the high SULF1 mRNA levels in advanced carcinomas found in the meta-analysis of gene expression in tumor tissues. In agreement with previous results, there was a good correlation between the presence of autoantibodies against a protein and elevated mRNA and protein expression. Functionally, SULF1 reduces the sulfation of heparan sulfate proteoglycans (HSPGs), inhibits signaling by heparin-dependent growth factors, reduces proliferation, and facilitates apoptosis in response to exogenous stimulation (27,28). SULF1 mRNA down-regulation has been observed in ovarian, breast, pancreatic, renal, and hepatocellular carcinoma cell lines. In contrast, SULF1 has been reported to be up-regulated in CRC tumors (29). This difference in expression between CRC and other tumors might explain the specificity of SULF1 as a CRC biomarker.
An association between MST1 expression and improved survival in colon cancer patients has been reported previously (30). This prognostic value might be related to the functional activity of STK4/MST1, a stress-activated, pro-apoptotic kinase. MST kinases play important roles in diverse biological processes, including cellular responses to oxidative stress and longevity (31).
In summary, we have generated a novel CRC detector based on phages and associated homologous proteins, yielding a diagnostic assay with greater predictive capacity than CEA, especially for early stages, and capable of distinguishing patients with CRC from control subjects or from patients with other cancer types. MST1 and SULF1 are candidate biomarkers for CRC diagnosis. The identification of the same TAA (MST1) with two different array platforms supports the robustness of the approach and the significance of autoantibody detection for the early diagnosis of colorectal cancer.
Acknowledgments: RB is the recipient of a JAE-DOC contract from the CSIC. We thank Dr Felix Bonilla (H. Puerta de Hierro) for kindly supplying CRC samples. * This research was supported by grants from the Spanish Ministry of Education and Science BIO2009-08818, "Proyecto Intramural de Incorporación-CSIC", the Colomics Programme of the regional government of Madrid, and grants from the Fundación Médica Mutua Madrileña, Instituto de Salud Carlos III (FIS 05/1006 and 08/1635), the CIBERESP G55, the "Acción transversal del cancer", and the Proteored platform.
This article contains supplemental material, supplemental Figs. S1 to S4, and supplemental Tables S1 and S2.