9 Targeted High-Throughput Glycoproteomics for Glyco-Biomarker Discovery


Introduction
Biomarker discovery has become a major research area in proteomics, as protein markers are more readily developed into clinical diagnostic tests than nucleic acid biomarkers. This is reflected by the fact that all United States Food and Drug Administration (US FDA)-approved biomarkers currently available for clinical use are protein molecules (Srivastava, Verma, and Gopal-Srivastava 2005). Proteomic technologies for the global study of proteins have evolved in the past decade in response to the growing demand for body fluid biomarker development (Anderson and Hunter 2006; Wang, Whiteaker, and Paulovich 2009). While mass spectrometry technology is improving in sensitivity and speed, several technical challenges in protein biomarker discovery still require optimization. These include maximizing sample throughput to process an adequate number of samples, reaching the high sensitivity, specificity and reproducibility required for FDA approval, and managing the costs of biomarker discovery and assay development. This chapter will discuss the application of a targeted proteomics approach using lectins as affinity reagents throughout the biomarker discovery pipeline, and automation with magnetic beads to increase throughput.

Biomarkers
Biomarkers are biological molecules that correlate with a disease condition or phenotype. The search for cancer biomarkers has increased because the traditional tumor node metastases (TNM) system, a morphological pathology-based system used to determine the treatment strategy and prognosis in cancer patients, cannot correlate cancer subtypes with clinical outcomes (Ludwig and Weinstein 2005). Many studies using gene expression profiling have been published in the past decade, contributing to a detailed molecular classification of each tumor subtype (Srivastava and Gopal-Srivastava 2002). Genomic profiling of tumor samples gives access to individualized genomic data for determining the appropriate treatment method or prognosis. For example, non-small cell lung cancer patients with mutated epidermal growth factor receptor (EGFR) can receive gefitinib, an inhibitor of EGFR tyrosine kinase activity (Belda-Iniesta, de Castro, and Perona 2011). The availability of specific non-invasive biomarkers will facilitate this type of tailored or personalized medicine to improve therapy and patient outcomes.
The process of developing such a test is difficult and uncertain, as reflected by the declining number of biomarker tests newly approved by the FDA. Despite this, a growing number of articles are published on potential biomarker candidates (Anderson and Anderson 2002; Polanski and Anderson 2007; Rifai, Gillette, and Carr 2006). Depending on the purpose of the biomarker and its application in the clinic, the criteria and developmental approach for each biomarker vary. The conventional biomarker discovery pipeline involves five stages. Clearly defined issues should be addressed at each stage to guide the process through to success (Fig. 1) (Surinova et al. 2011; Pepe et al. 2001).
Fig. 1. Biomarker discovery workflow and study objectives for each phase. Modified from Pepe et al. (2001).
Phase 1 - Preclinical discovery. Phase 1 is dedicated to hypothesis-driven identification of candidate biomarkers, and to ranking and/or finding suitable combinations of potential biomarkers. The clinical question is defined, and a small number of samples are obtained and analyzed to generate a list of candidates with their fold changes (Pepe et al. 2001).
Phase 2 - Preclinical verification. Phase 2 evaluates the (ranked) list of potential biomarkers generated in phase 1 using clinical samples from cases with known diagnosis. The end point of the assay may be the mean concentration of candidate protein(s) or a unique signature associated with either one of the groups (Alonzo, Pepe, and Moskowitz 2002). The reproducibility, dynamic range and limit of detection (sensitivity) are determined in a relatively small cohort of patients, but with more patients than in phase 1 (Rifai, Gillette, and Carr 2006). Another aim of the verification phase is to determine the sample size required in the preclinical validation phase to achieve statistical significance.
Phase 3 - Preclinical validation. The third phase is still within the scope of preclinical assessment, but the aim is to generate a disease signature to determine whether the study objective can be met by the platform. The control and patient groups are designed retrospectively, and the numbers used depend on the sensitivity and specificity of the biomarker determined in the previous phase and on the prevalence of the cancer in the population. The results are evaluated for analytical performance, including test accuracy and precision, and for clinical performance (Gutman and Kessler 2006); the assay must achieve single-digit coefficients of variation (CVs) across measurements of thousands of patient samples. If the performance of the optimized assay meets the clinical objective, the process proceeds to the next phase, clinical evaluation.
Phase 4 - Clinical evaluation. Phase 4 is the development of a clinical assay and clinical evaluation of the biomarker as an in vitro diagnostic test. This phase is prospective and involves new control subjects and patients who are yet to be diagnosed (Manolio, Bailey-Wilson, and Collins 2006). The patient group sizes increase again based on the results from phase 3. The aim of phase 4 is to fulfil the clinical requirements and determine the true positive and false positive rates.
Phase 5 - Disease control. The last phase aims to determine the effect of the biomarker on disease management in the target population. The biomarker therefore proceeds into phase 5 once it is approved and accepted for clinical use. Phase 5 involves the largest sample size and thus takes many years to complete. Data on the cost of the test, as well as the consequences of using the biomarker, are determined.
Biomarker development has had limited progress due to the lack of effective technology, of established guidelines for designing clinical sample groups in each phase, of standardized procedures for the development of the biomarker pipeline, and of quality assessment of the studies published (Mischak et al. 2007; Surinova et al. 2011). Therefore, by addressing the study objective clearly and by applying the considerations for each phase, biomarker research should lead to more translatable candidates in the clinical context.
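The dependence of phases 3 and 4 on sensitivity, specificity and disease prevalence can be illustrated with a short calculation. The following is a minimal sketch, not a method taken from the cited studies: the function names and the example counts are hypothetical, and the positive predictive value is obtained from Bayes' rule.

```python
def diagnostic_rates(tp, fp, tn, fn):
    """Return sensitivity (true positive rate), specificity and
    false positive rate from the counts of a 2x2 confusion table."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "false_positive_rate": 1.0 - specificity,
    }


def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability that a positive test result is a true case (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical counts: 100 cases and 1000 controls.
rates = diagnostic_rates(tp=90, fp=50, tn=950, fn=10)

# The same assay applied to populations with different disease prevalence.
for prevalence in (0.01, 0.10, 0.50):
    ppv = positive_predictive_value(
        rates["sensitivity"], rates["specificity"], prevalence
    )
    print(f"prevalence={prevalence:.2f}  PPV={ppv:.3f}")
```

With these illustrative numbers, an assay with 90% sensitivity and 95% specificity yields a positive predictive value of only about 15% at 1% prevalence, which is one reason why the prevalence of the disease influences how the control and patient groups are sized.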

Proteomics for biomarker discovery
As described above, the road to biomarker discovery is long and uncertain, consisting of different stages and multiple validation steps. The decisions made, especially in the first few phases, on the ranking of candidates or on the best combination of candidates to maximize sensitivity and specificity have enormous effects on whether a biomarker assay succeeds. Consistency in the proteomics techniques and sample type used for each phase is crucial to successful biomarker discovery and validation.

Choice of sample type
The choice of sample type may be determined by availability, as well as by the complexity of the sample type relative to the available technology. Although the preferred final outcome is a body fluid (commonly blood) test, plasma or serum is a technically challenging sample for proteomics due to the dilution of potential biomarkers and the presence of high abundance proteins that mask the lower abundance disease-associated proteins. Estimates suggest that there are more than 10^6 proteins in the blood proteome, while one protein (albumin) accounts for more than half of all blood proteins (Zhang, Faca, and Hanash 2011). Approximately 22 proteins, including globulins, transferrins and fibrinogen, make up 99% of the total blood proteins. Additionally, the concentration of a blood protein can range from less than 1-5 pg/ml to more than 55 billion pg/ml, stretching across seven logs (Zhang, Faca, and Hanash 2011). Immunodepletion columns have been developed to remove the top 6, 7, 12, 14, or 20 proteins from plasma/serum prior to proteome profiling (Smith et al. 2011; Gong et al. 2006; Tu et al. 2010). However, this procedure may also deplete potential proteins of interest that are bound to albumin in the blood stream, as well as low abundance proteins, due to non-specific binding (Gong et al. 2006). Because of these technical difficulties, many studies choose to use tissue samples in the discovery phase; however, it is difficult to predict which proteins will be easily detected in the blood, as data derived from tissue are not always translatable to blood (Abbott and Pierce 2010). Therefore, direct analysis of plasma or serum rather than tissue may be useful in the initial discovery phase (Rifai, Gillette, and Carr 2006; Kulasingam and Diamandis 2008).

Choice of technology
Ideally, similar or compatible techniques are used throughout the biomarker discovery and validation pipeline. However, no single technique can fulfill the requirements of all 5 phases with sufficient throughput, sensitivity and accuracy. Phase 1 requires the measurement of thousands of analytes in a few samples, while phases 2-4 require the (simultaneous) measurement of fewer analytes in an increasing number of samples. Furthermore, clinical assays (phases 4-5) ideally require minimal sample handling. Current proteomic profiling methods used in the discovery phase are not suitable for later phases, since techniques such as two-dimensional difference gel electrophoresis (2D-DIGE) and multidimensional protein identification technology (MuDPIT) can only analyze one sample at a time and require days of processing. Current technologies for preclinical and clinical phases, such as radioimmunoassay (RIA), enzyme-linked immunosorbent assay (ELISA) and multiplex fluorescent detection technology, are antibody-based assays that require an identified target and are hence not applicable to the discovery phase. The development and use of Selected Reaction Monitoring mass spectrometry (SRM-MS) as pre-clinical and potentially clinical assays not only provides a link between discovery, validation and clinical techniques, but also avoids the significant cost outlay for antibody development. Hence SRM-MS technology is fast becoming the method of choice for pre-clinical phases, and is set to make it into the clinical arena.

Improving throughput
Due to the high cost and low sample throughput of proteomics technology, biomarker discovery workflows have commonly suffered from a lack of sufficient technical and biological replicates. To address these shortcomings, significant effort has been spent on sample preparation and separation using automation on robotic liquid handlers, and on the introduction of nanomaterials for nanoproteomics (Ray et al. 2011). Increased throughput in mass spectrometry can be achieved by multiplexing samples (Boersema et al. 2009; Chen et al. 2007) and/or by shortening bioinformatic analysis time after the generation of mass spectrometry data (Martens 2011a, 2011b).

Targeted proteomics
Discovery proteomics workflows generally require multiple steps of separation due to high sample complexity. One strategy to reduce extensive separation steps is to enrich for a subset of proteins that are disease-relevant. In this chapter, we focus on the potential of targeted glycoproteomics as an all-encompassing technology for the phases of (glyco-)biomarker discovery.

Glycoproteomics
Glycoproteomics, an area of proteomics with biological and clinical significance, is an emerging field in biomarker research (Pan et al. 2011;Meany and Chan 2011).
Glycoproteins are a group of proteins in which one or more glycans (sugars) are covalently bonded to the protein through a process called glycosylation. There are two main types of protein glycosylation: (i) N-linked glycosylation, whereby the glycan is attached to the amide nitrogen of asparagine in a consensus Asparagine-X-Serine/Threonine (Asn-X-Ser/Thr) sequence, where X can be any amino acid except proline, and (ii) O-linked glycosylation, in which the glycan is attached to the hydroxyl oxygen of serine or threonine in the protein. Glycosylation is the most abundant post-translational modification and the most structurally diverse. There are at least 14 different monosaccharides and 8 different amino acids involved in this process, with at least 41 different chemical bonds in glycan-protein linkage. Glycoproteins are important targets in the search for biomarkers for the following reasons: (i) more than 50% of secreted proteins are glycoproteins, (ii) glycosylation changes in tissues, blood and serum from patients with disease have been implicated in pathogenesis, (iii) changes in glycosylation can be more distinctive than changes in protein expression, as specific glycan structures are generally not present normally but increase in disease states, (iv) changes in glycosylation occur in many proteins, including abundant proteins, thus increasing the likelihood of early detection, (v) the glycosylated form of a particular protein site is generally stable for a given cell type and physiological state, and (vi) as one of the important functions of glycans is in cell-cell interactions and consequently the control of cell function, alterations of protein glycosylation can be diagnostic for a disease (Pan et al. 2011; Packer et al. 2008). Altered glycosylation can be seen in diseases as hypo-, hyper- or newly glycosylated sites, and/or altered carbohydrate moieties (Pan et al. 2011). Although advances in the technologies used in glycoprotein research have been slow due to the complicated nature and vast variety of changes in glycosylation, advances in proteomic technologies have facilitated glycoproteomics research. An excellent example of a glycobiomarker is alpha-fetoprotein (AFP), a marker for hepatocellular carcinoma (HCC) (Sturgeon et al. 2010). The specificity of AFP for HCC is low, limiting its use in the clinic (Meany, Sokoll, and Chan 2009). However, recent studies have shown that the fucosylated form of AFP, which is highly reactive with the Lens culinaris agglutinin and is also known as AFP-L3, improves the specificity (Masuda and Miyoshi 2011), demonstrating the utility of glycobiomarkers.
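The N-linked consensus sequence described above (asparagine, then any residue except proline, then serine or threonine) lends itself to a simple pattern search over a protein sequence in one-letter code. The sketch below is illustrative only: the function name and the test sequence are hypothetical, and the presence of a sequon indicates a potential site, not that the site is actually glycosylated.

```python
import re

# N, then any residue except P, then S or T. The lookahead makes the match
# zero-width so that overlapping sequons (e.g. "NNTS") are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_n_glycosylation_sequons(sequence):
    """Return the 1-based positions of asparagine residues that sit in an
    N-X-S/T motif (X is not proline) in a one-letter protein sequence."""
    return [match.start() + 1 for match in SEQUON.finditer(sequence.upper())]


# Hypothetical peptide: NGT matches, NPS is excluded, NNT and NTS overlap.
print(find_n_glycosylation_sequons("MNGTANPSQNNTS"))  # [2, 10, 11]
```

O-linked glycosylation has no comparable consensus sequence, so an equivalent sequence-only search is not available for serine and threonine sites.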

Glycoproteomic approaches for biomarker discovery
A typical glycoproteomics pipeline consists of glycoprotein enrichment techniques, followed by multidimensional chromatographic separation and mass spectrometry with bioinformatic data analysis. Glycoproteomics approaches can be divided into glycoprotein-based and glycopeptide-based methods (Fig. 2). Glycoprotein-based enrichment methods, also known as the top-down workflow, enrich for the glycoproteins prior to proteolytic digestion with enzymes such as trypsin. Glycan cleavage is performed before or after proteolytic digestion. In glycopeptide enrichment methods, proteolytic digestion is performed before enrichment. This is also known as the bottom-up workflow. The bottom-up workflow is more popular as it provides detailed information on a glycoprotein profile, as well as specific mapping of glycosylation sites. However, the bottom-up workflow can result in very low sample throughput, and current technology is not capable of determining the detailed glycan structure of glycoproteins in one analysis (Pan et al. 2011). On the other hand, the top-down workflow may not accurately map glycosylation sites, although it results in greater glycoprotein sequence coverage. Therefore, the technique used will depend on the specific research question asked.
Fig. 2. Glycoproteomic approaches for glycan, deglycosylated and intact glycopeptide analysis. In the top-down workflow, glycoprotein enrichment is performed which may or may not follow deglycosylation. In the bottom-up workflow, proteins are digested then glycopeptides are enriched for further analysis.

Glycoproteome enrichment techniques
Several techniques have been used for enrichment of glycans, glycopeptides and glycoproteins (Tousi, Hancock, and Hincapie 2011; Rakus and Mahal 2011; Pan et al. 2011), including hydrazide chemistry-based solid phase extraction methods, boronic acid-based solid phase extraction, size exclusion chromatography, hydrophilic interaction liquid chromatography (HILIC), activated graphitized carbon and lectin affinity-based methods (Table 1). This chapter will discuss the potential of lectins as a universal enrichment tool in all phases of the glyco-biomarker discovery workflow. Lectins are naturally occurring sugar-binding proteins which are highly specific for their sugar moieties. Their ability to recognize and bind to specific glycans makes them ideal for glycan structure-specific glycoprotein enrichment. Lectins have been used in biological research as affinity reagents for the past few decades, with applications such as lectin histochemistry (Brooks et al. 1996; Carter and Brooks 2006), lectin blotting (Welinder et al. 2009), lectin-affinity chromatography in combination with mass spectrometry (Abbott and Pierce 2010; Yang et al. 2006; Zhao et al. 2006; Xu et al. 2007; Qiu et al. 2008; Jung, Cho, and Regnier 2009) and lectin microarray (Gupta, Surolia, and Sampathkumar 2010; Katrlik et al. 2010) to examine the glycoproteome of serum and plasma.

Use of lectins in glyco-biomarker discovery
The potential of a lectin-enrichment step to be coupled to different downstream assay techniques is attractive in glyco-biomarker discovery, as it reduces the variation that can be introduced by changing enrichment methods from one phase to another (Fig. 3). For example, in the discovery workflow of phase 1, lectin enrichment can be followed by glycoprotein or glycopeptide separation and identification by tandem mass spectrometry (MS/MS) to measure hundreds of analytes. In the preclinical stages (phases 2 and 3), lectin affinity isolation may be coupled to SRM-MS for targeted quantification of a reduced number of candidates. Although SRM-MS assays may have the desired sensitivity and reproducibility, routine use in clinical pathology laboratories will need additional technology optimization. Lectin affinity can also be incorporated into other preclinical verification technologies, such as multiplexed immunoassays incorporating fluorescence-labeled microspheres with specific antibodies (Li et al. 2011), multiplexed protein analysis using antibody-conjugated microbead arrays (Theilacker et al. 2011), and multiplex protein assays using magnetic nanotag sensing (Osterfeld et al. 2008). For clinical phases 3-5, existing antibodies may be used, or antibodies may be developed, for use in lectin microarrays or lectin-immunosorbent assays.
Fig. 3. Biomarker discovery pipeline using lectins.

Lectin affinity chromatography for glyco-biomarker discovery
Lectin affinity chromatography is a technique that employs one or more lectins to enrich for structurally similar subset(s) of glycoproteins or glycopeptides (Jung, Cho, and Regnier 2009; Durham and Regnier 2006; Yang et al. 2006). By coupling this technique to mass spectrometry analysis, bound and unbound fractions can be analysed to identify proteins in the two fractions. Lectin affinity chromatography can be performed in different formats, including tubes, packed columns, microfluidic channels and high pressure liquid chromatography (HPLC) (Mechref, Madera, and Novotny 2008). Different types of support matrices can be used to immobilize the lectins, such as sepharose/agarose beads (Kobata and Endo 1992; Mechref, Madera, and Novotny 2008), magnetic beads (Lin et al. 2008), silica, or styrene-divinylbenzene co-polymers coated with a cross-linked polyhydroxylated polymer (POROS) (Tousi, Hancock, and Hincapie 2011).
Commonly used lectins include the mannose- and glucose-binding concanavalin A (ConA) and the N-acetylglucosamine-binding wheat germ agglutinin (WGA), chosen for their broad binding specificities and affinity for most N-linked glycans in biological material. For O-linked glycans, jacalin (JAC) is added to these two lectins for a global range of glycoprotein enrichment. For more specific enrichment, sialic acid- and/or fucose-binding lectins can be used, such as Sambucus nigra agglutinin (SNA) and Maackia amurensis agglutinin (MAA) for sialic acid and Aleuria aurantia lectin (AAL) for fucose. A wide range of sample types has been used, including soluble and membrane-derived glycoconjugates from serum/plasma, cell lysates and tissue homogenates. Elution of bound glycoproteins/peptides is commonly achieved using a competing sugar at relatively low concentration (5-100 mM) (West and Goldring 1996) or at low pH using acidic solutions (Green, Brodbeck, and Baenziger 1987).
Lectin affinity chromatography can be incorporated into top-down or bottom-up proteomics workflows, where the glycoproteins or the glycopeptides, respectively, are identified by LC-MS/MS. Top-down workflows identify lectin-reactive glycoproteins primarily by the non-glycosylated peptides in the isolated glycoproteins. The advantages are high sensitivity and ease of use, but the top-down approach does not identify the actual glycopeptide(s) that bound to the lectins. Bottom-up workflows directly identify the captured glycopeptides, but are technically more challenging due to the lower amount of targets. Top-down and bottom-up approaches generate complementary data and have both been successfully applied in glyco-biomarker discovery (see 5.1.2).
Modified versions of lectin affinity chromatography have been reported, including Serial Lectin Affinity Chromatography (S-LAC), which uses a series of sequential lectin affinity steps (Durham and Regnier 2006), and Multi-lectin Affinity Chromatography (M-LAC), which combines 3 or more different lectins for one-step isolation (Yang and Hancock 2004; Ahn et al. 2010; Na et al. 2009). Both methods can be incorporated into the top-down and bottom-up workflows. However, the bottom-up workflow is preferred for S-LAC, as proteins with more than one glycosylation site and binding affinity for both lectins may not be identified by the second lectin. S-LAC using ConA and JAC was shown to be efficient for enriching O-linked glycopeptides, since ConA removes most N-linked glycopeptides containing mannose, which facilitates the binding of O-linked glycopeptides to jacalin (Durham and Regnier 2006). M-LAC is also an effective system for simplifying complex samples, allowing enrichment of approximately 50% of the plasma proteome in one step (Dayarathna, Hancock, and Hincapie 2008).
The bound fraction of M-LAC using ConA, WGA and JAC has been used by Zeng and others for the initial identification of candidate biomarkers in serum from breast cancer patients (Zeng et al. 2011). M-LAC was coupled with 1D SDS-PAGE, isoelectric focusing and lectin-overlay antibody microarray to identify several glycoproteins, such as alpha-1B-glycoprotein and complement C3, as potential candidates (Zeng et al. 2011). Kullolli et al. further developed M-LAC into high performance multi-lectin affinity chromatography (HP-MLAC), involving targeted albumin and immunoglobulin depletion in-line with glycoprotein affinity isolation using M-LAC (Kullolli, Hancock, and Hincapie 2010). This method has shown reproducibility and consistency of the bound and unbound fractions over 200 runs, which promises to provide quality plasma glycoproteome data for clinical proteomics.

Technical aspects of lectin affinity enrichment
Although lectin affinity enrichment is widely used, significant binding of non-glycosylated proteins during the procedure has been reported (Lee et al. 2009). Potential causes of this non-specific binding include the presence of protein complexes and prolonged incubation leading to non-specific binding to the support beads. To optimize binding conditions, we investigated glycoprotein capture using concanavalin A (ConA)-magnetic beads with a range of mild to stringent binding buffers and a short incubation time of 30 minutes (Loo, Jones, and Hill 2010). In order to disrupt protein-protein complexes, which may result in binding of non-glycosylated proteins to lectin beads, we included a reducing agent (1 mM DTT) and a strong detergent (0.2% SDS) in the binding and washing steps. Although this resulted in ~20% loss of protein binding compared with a previously used lectin-affinity buffer (Yang et al. 2006), we still observed strong affinity between lectins and their cognate glycans (Loo, Jones, and Hill 2010). Using the most stringent buffer condition, we have shown reproducibility of lectin-glycoprotein binding, confirming that this buffer condition helps to avoid non-specific binding to lectins while enriching for the glycoproteins with the highest affinity for the individual lectins (Loo, Jones, and Hill 2010).

Application of lectin affinity enrichment in biomarker discovery
Top-down workflows that incorporate lectin affinity chromatography have been used to identify potential biomarkers in diseases including psoriasis (Plavina et al. 2007), hepatocellular carcinoma (Na et al. 2009), diabetic nephropathy (Ahn et al. 2010) and bladder cancer (Yang et al. 2011). Plavina et al. depleted the two most abundant plasma proteins, albumin and immunoglobulin, and performed M-LAC consisting of ConA, WGA and JAC to identify numerous tissue leakage proteins present in plasma at low ng/mL concentrations, such as galectin-binding protein 3, which was subsequently verified by ELISA (Plavina et al. 2007). Na et al. used M-LAC consisting of ConA, WGA, JAC, SNA, and AAL and 2D-DIGE with liver tissue samples to identify human plasma carboxylesterase 1 as a potential biomarker for hepatocellular carcinoma (Na et al. 2009). Ahn et al. used M-LAC to capture plasma glycoproteins and found 13 up-regulated and 14 down-regulated glycoproteins in diabetic nephropathy (Ahn et al. 2010). Yang et al. used ConA and WGA for dual-lectin affinity chromatography to enrich for glycoproteins in urine in order to identify biomarker candidates for bladder cancer, and identified 265 glycoproteins with higher abundance in the cancer group compared to the control group (Yang et al. 2011). While there was an overlap of the proteins identified, 240 glycoproteins were uniquely identified by each of the methods. Furthermore, lectin affinity chromatography of glycoproteins has been used in a cell cycle study that combined MAA-affinity chromatography of glycoproteins from lysates of the cervical cancer cell line HeLa with periodate labeling of membrane proteins of intact cells coupled to hydrazide chemistry. This study identified distinct expression patterns during the cell cycle and demonstrated a 4-fold change in membrane protein expression during different cell cycles (McDonald et al. 2009).
Bottom-up lectin affinity has also been successfully applied in glyco-biomarker discovery. For example, Drake et al. utilized immunoaffinity depletion and subsequent M-LAC with SNA and AAL to identify 122 human plasma glycoproteins with 247 unique glycosites (Drake et al. 2011). Alvarez-Manilla et al. used ConA-sepharose to identify 18 glycoproteins unique to mouse embryonic stem cells and 45 proteins exclusively found in cells of differentiated embryoid bodies (Alvarez-Manilla et al. 2010). Furthermore, the bottom-up method coupled with filter-aided sample preparation (FASP) was shown to detect 6367 N-glycosites on 2352 proteins, which accounts for 74% of known mouse N-glycosites and 5753 unique sites in four mouse tissues and blood plasma, demonstrating the ability of lectin affinity chromatography techniques to enrich for glycopeptides (Zielinska et al. 2010).

Lectin magnetic bead array for high-throughput glyco-biomarker discovery and preclinical verification
Differential binding to a panel of lectins (a lectin signature) can be used as a disease biomarker. This is the principle behind lectin microarrays (see section 5.3) for known target proteins. However, there is a lack of high-throughput methodology for de novo discovery of lectin signatures for potential glyco-biomarkers. To this end, we introduced the concept of a high-throughput lectin-magnetic bead array (LeMBA), consisting of a panel of individual lectin-magnetic beads arrayed in a microplate (Loo, Jones, and Hill 2010). The use of magnetic beads allows liquid handler-assisted automation to increase throughput while assessing individual lectin-binding sub-glycoproteomes. Direct coupling to LC-MS/MS for glycoprotein (top-down) or glycopeptide (bottom-up) analysis enables the simultaneous identification of a glyco-biomarker and its lectin signature.
While most (glyco-)biomarker discovery workflows focus on low abundance proteins in the serum/plasma, LeMBA-MS screens for specific glycan structure changes by determining the lectin signatures of the glycoproteome. Hence, instead of identifying new, low abundance proteins secreted or leaked by the diseased cells, the LeMBA approach focuses on alterations in the glycosylation structure of medium- to high-abundance secreted proteins. Since altered glycosylation of secreted and/or cell surface proteins reflects cell function and hence disease progression (Pan et al. 2011; Packer et al. 2008), this approach is likely to discover disease-relevant glyco-biomarkers. Previous studies aimed at finding glyco-biomarkers have identified high abundance proteins in the blood as potential biomarker candidates, such as haptoglobin (Yoon et al. 2010; Fujimura et al. 2008), hemopexin (Comunale et al. 2009), transferrin (Zeng et al. 2011; Bones et al. 2010) and alpha-1B-glycoprotein (Zeng et al. 2011).
LeMBA trades low abundance for high specificity, as glycosylation changes detected by multiple lectins will be unique to the altered glycan structure. This approach also holds promise for early diagnostic biomarkers, since detection of low abundance early diagnostic markers is extremely difficult to achieve with any throughput using current detection systems and workflows. If glycosylation changes are identified in medium- to high-abundance proteins in the early stages of disease, these changes can be developed into biomarkers with reasonable sensitivity and specificity, as the proteins carrying the altered glycan will be easy to detect. Taken together, candidate biomarkers resulting from a LeMBA-MS screen are expected to increase the sensitivity and specificity of glyco-biomarkers, owing to the ability of lectin signatures to identify both overall and subtle changes. For biomarker discovery phase 2, the combinations of lectin signatures that show the biggest changes between normal and disease will yield a panel of potential biomarker candidates that can be verified using LeMBA coupled to SRM-MS, and further validated using antibody-overlay lectin microarrays (Boja and Rodriguez 2011).
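One way to picture a lectin signature is as a vector of binding changes, one value per lectin, for a given glycoprotein. The sketch below is an assumption for illustration only and is not the scoring used in the LeMBA studies cited here: it takes hypothetical intensity readings for a single protein captured by three lectins, computes the log2 fold change between disease and control for each lectin, and ranks the lectins by the size of the change.

```python
import math
from statistics import mean


def lectin_signature(control, disease):
    """Return {lectin: log2 fold change (disease / control)} for one protein.

    `control` and `disease` map a lectin name to a list of replicate
    intensity readings (hypothetical units, all values must be positive)."""
    return {
        lectin: math.log2(mean(disease[lectin]) / mean(control[lectin]))
        for lectin in control
    }


def rank_lectins(signature):
    """Order lectins from largest to smallest absolute log2 fold change."""
    return sorted(signature, key=lambda lectin: abs(signature[lectin]), reverse=True)


# Hypothetical replicate readings for one candidate glycoprotein.
control = {"ConA": [100, 110, 90], "SNA": [50, 50, 50], "AAL": [20, 20, 20]}
disease = {"ConA": [100, 100, 100], "SNA": [100, 100, 100], "AAL": [160, 160, 160]}

signature = lectin_signature(control, disease)
print(signature)                # {'ConA': 0.0, 'SNA': 1.0, 'AAL': 3.0}
print(rank_lectins(signature))  # ['AAL', 'SNA', 'ConA']
```

In this made-up example the total amount of protein captured by ConA is unchanged, while binding to the fucose-binding AAL rises eight-fold, which is the kind of glycan-specific change on an otherwise unchanged protein that the approach is designed to detect.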

Lectin microarray as high-throughput glyco-biomarker validation assay
Since their introduction in 2005, lectin microarrays have emerged as a new technology that utilizes lectins as a glyco-profiling tool. A typical microarray contains 6 to 43 lectins immobilized on a solid surface, and binding of glycoproteins to lectins is, in most cases, detected by standard fluorescence microarray scanners (Gemeiner et al. 2009). Lectin microarrays are a rapid, sensitive and high-throughput screening tool, highly suitable for all phases of glyco-biomarker discovery, depending on the type used.

Types of lectin microarrays and their use in biomarker discovery
Generally, there are two types of lectin microarrays: the direct assay and the reverse-phase dot-blot lectin array (Gemeiner et al. 2009; Gupta, Surolia, and Sampathkumar 2010). The direct assay format immobilizes lectins on a solid surface and applies prelabeled sample over the surface. The reverse-phase dot-blot lectin array, on the other hand, immobilizes glycoproteins on a solid surface and applies prelabeled lectins. These two types have been used in biomarker discovery phase 1 for pancreatic cancer (Li et al. 2009; Patwa et al. 2006; Liu et al. 2010), glioblastoma (He et al. 2010), HCC (Zhao et al. 2007) and colorectal cancer (Qiu et al. 2008) to investigate differential glycosylation between control and disease. The direct assay can also be modified into a sandwich assay called the antibody-overlay lectin microarray (ALM) or the lectin-overlay antibody microarray (LAM). In ALM, lectins are immobilized on a solid surface; glycoproteins are added, followed by a biotinylated antibody overlay that binds to the protein. Streptavidin with an attached fluorophore is then added, and the fluorescence is detected. In LAM, by contrast, the antibody is attached to the solid surface and biotinylated lectins are overlaid to bind to the glycan structure (Fig. 4). These types of lectin microarrays may be used for biomarker discovery phase 3 and higher and can be developed into clinical assays, provided that they are reproducible with a coefficient of variation (CV) of less than 10% (Fung 2010).
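The reproducibility criterion mentioned above (less than 10% CV) can be made concrete with a short calculation. The Python sketch below computes the percent coefficient of variation (sample standard deviation divided by the mean) for replicate spot intensities and compares it against a threshold. The function names, the default threshold argument, and the replicate values are hypothetical and serve only to illustrate the arithmetic.

```python
from statistics import mean, stdev


def percent_cv(values):
    """Percent coefficient of variation: 100 * sample SD / mean."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100.0 * stdev(values) / m


def passes_cv_threshold(values, max_cv=10.0):
    """True if replicate measurements vary by less than max_cv percent."""
    return percent_cv(values) < max_cv


# Invented replicate intensities for two spots (not experimental data).
tight_replicates = [95, 100, 105]   # SD = 5, mean = 100
loose_replicates = [80, 100, 120]   # SD = 20, mean = 100

cv_tight = percent_cv(tight_replicates)
cv_loose = percent_cv(loose_replicates)
```

With these toy values the first spot has a CV of 5% and would pass a 10% criterion, while the second has a CV of 20% and would not.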

Technological aspects of lectin microarrays for phase 3 and above biomarker assay development
Preserving the carbohydrate recognition domain (CRD) is important for the reproducibility of assays with immobilized lectins. Popular methods of lectin immobilization include adsorption on nitrocellulose, attachment of amine functional groups of the lectin protein backbone to epoxy- or N-hydroxysuccinimidyl-derived ester-coated glass slides (Kuno et al. 2005), and use of self-assembled monolayers of thiols on gold-coated surfaces (Zheng, Peelen, and Smith 2005). Other methods include biotinylated lectin-neutravidin bridging (Angeloni et al. 2005), DNA-driven immobilization of lectins on polystyrene latex particles (Fromell et al. 2005), and binding to hydrogel-based surfaces (Koshi et al. 2006). Unfortunately, no method can control the orientation of the CRD of lectins so as to maximize lectin binding ability and assay reproducibility. Techniques such as covalent bonding of lectins by carbenes have been shown to immobilize the lectins but failed to preserve the carbohydrate binding activity (Angeloni et al. 2005), indicating the importance of preserving the CRD of lectins when lectin arrays are generated. The lack of control over lectin immobilization may lead to increased variation of assays. The variation of spotting has been reported to be 10-20% (Kuno et al. 2005) and the variation of a reverse-phase dot-blot assay, 10% (Patwa et al. 2006), which may be too high to qualify for FDA approval. To preserve the CRD, it has been suggested that the glycans of glycosylated lectins may be used as an anchor point for attachment to a hydroxylamine- or hydrazine-containing solid surface (Gupta, Surolia, and Sampathkumar 2010). Not all lectins are glycosylated, but this may help lower the variation of a biomarker assay. Additionally, the LAM type may be more suitable for phase 3 and above biomarker assays, as it avoids this issue.
As in most protein arrays, binding is, in most cases, detected by fluorescence (Pilobello and Mahal 2007; Gemeiner et al. 2009) using fluorophores such as Cy3/Cy5, Alexa Fluor 555, and phycoerythrin. A number of different technologies have been introduced to increase the sensitivity of detection and salvage weak lectin-glycan bonds. Kuno et al. have introduced the use of evanescent-field fluorescence, which allows in situ detection without a washing step to remove unbound material (Kuno et al. 2005). However, this technique requires a specialized evanescent-field fluorescence scanner. Other proposed methods include a modified fluorescence resonance energy transfer (FRET) method, which demonstrated that a biomolecular fluorescence quenching and recovery (BFQR) technique can be used together with a supramolecular hydrogel matrix for the selective recognition of lectin-glycan bonds in reverse-phase dot-blot assays (Koshi et al. 2006). Tyramide signal amplification (TSA), a horseradish peroxidase (HRP)-mediated signal amplification method, has also been shown to enhance the signal in ALM, increasing sensitivity more than 100-fold and allowing the detection of weak lectin-glycan interactions, as demonstrated with as little as 20 ng of prostate specific antigen from seminal fluid (Meany et al. 2011).

Conclusions
There is no doubt that advances in proteomics have contributed and will continue to contribute to protein biomarker discovery. In particular, technological advances have enabled glyco-biomarker research. Medium to high abundance blood glycoproteins with disease-specific glycosylation structures are attractive as glyco-biomarkers, with greater potential for development into robust clinical assays than low abundance blood proteins. However, there is still a general lack of high-throughput glycoproteomics platforms to facilitate the discovery and validation of candidate glyco-biomarkers. The technologies and sample types used in the phases of glyco-biomarker discovery are critical to the final outcome, that is, development of a clinical assay. In this chapter, we highlight the potential of lectins as a unifying glycan affinity tool for glyco-biomarker discovery. Lectin-based glycoprotein enrichment methods such as lectin affinity chromatography and high-throughput LeMBA can be coupled with LC-MS/MS to generate candidate biomarkers (phase 1 biomarker discovery). After the discovery of potential biomarkers, lectin affinity techniques such as LeMBA can be coupled to SRM-MS for high-throughput verification in a large number of patient samples. Finally, for phase 3 and onwards, ALM or LAM type lectin microarrays or lectin-coupled immunosorbent assays can be used for further validation of the biomarker assay to ensure high clinical and analytical performance. Having a unifying affinity reagent will improve the consistency and, therefore, the success rate of transfer between the phases of biomarker discovery.
Combined with the appropriate bioinformatics tools, such as the recently developed serum glycopeptide SRM atlas (Schiess, Wollscheid, and Aebersold 2009) and glycan databases (reviewed in Frank and Schloissnig 2010), glyco-biomarker discovery and validation will surely contribute to biomarker research.