Comparative analysis of the human serum N-glycome in lung cancer, COPD and their comorbidity using capillary electrophoresis

Lung cancer and chronic obstructive pulmonary disease (COPD) are prevalent ailments that are very difficult to distinguish based on symptoms alone. Since they require different treatments, it is important to find non-invasive methods capable of readily diagnosing them. Moreover, COPD increases the risk of lung cancer development, leading to their comorbidity. In this pilot study, the N-glycosylation profiles of pooled human serum samples (90 patients each) from lung cancer, COPD and comorbidity (LC with COPD) patients were investigated in comparison to healthy individuals (control) by capillary gel electrophoresis with high sensitivity laser-induced fluorescence detection. Sample preparation was optimized for human serum samples by introducing a new temperature adjusted denaturation protocol to prevent precipitation and by increasing the endoglycosidase digestion time to assure complete removal of the N-linked carbohydrates. The reproducibility of the optimized method was < 3.5%. Sixty-one N-glycan structures were identified in the pooled control human serum sample and the profile was compared to those of the pooled lung cancer, COPD and comorbidity (COPD with lung cancer) patient samples. One important finding was that no other sugar structures were detected in any of the patient groups; only quantitative differences were observed. Based on this comparative exercise, a panel of 13 N-glycan structures was identified as potential glycobiomarkers revealing significant changes (>33% in relative peak areas) between the pathological and control samples. In addition to N-glycan profile changes, alterations in the individual N-glycan subclasses, such as total fucosylation, degree of sialylation and branching, may also hold important glycobiomarker value.


Introduction
Lung cancer (LC) is one of the most frequently diagnosed cancers (11.6% of the total cases) and was the main cause of cancer deaths (18.4% of the total cancer deaths) in 2018 in both genders combined [1]. Another prevalent lung disease is chronic obstructive pulmonary disease (COPD), which is mostly caused by cigarette smoking, a great but certainly preventable risk factor [2]. COPD has recently come into the spotlight due to its rising incidence rate, morbidity and mortality, and because it increases the risk of developing lung cancer, in many cases creating formidable challenges for global healthcare systems [3]. Simultaneous diagnoses of LC and COPD (i.e., comorbidity) indicate poor prognosis [2], thus treatments should be planned accordingly [4]. The high mortality rate of COPD and lung cancer could be due to their asymptomatic nature at early stages and the lack of appropriate distinguishing diagnostic tools. While biopsy can differentiate malignant (LC) and benign (inflammation) lesions, which imaging techniques cannot do reliably, it is an invasive process with a number of risk factors including infections, pneumonia, pneumothorax, bleeding and haemoptysis, just to list the most frequent ones [5]. Furthermore, another potential drawback of biopsy is the risk of tumor spreading, i.e., during the biopsy process, circulating tumor cells that may initiate metastasis can escape from the primary tumor [6,7]. Therefore, there is an urgent need to develop non-invasive molecular diagnostic tools capable of predicting the presence and prognosis of the actual disease (LC, COPD or comorbidity) with adequate specificity and sensitivity.
Ever-growing evidence shows the importance of protein N-glycosylation in biological systems, demonstrating that this post-translational modification is as essential as the polypeptide backbone itself, playing significant roles in shaping the higher order structure, biochemical properties and function of proteins [8]. Suzuki et al. [9] studied the importance and roles of N-glycosylation in COPD and found that the reduction of FUT8 activity is closely related to the progression of the disease. Phillips and coworkers [10] reviewed the glycosylation aspects of lung cancer and the mechanisms of post-synthetic glycan modification during malignant transformation, suggesting promising biomarkers and therapeutic possibilities based on the associated N-glycosylation alterations. The significance of these changes suggests that identifying N-glycan biomarkers for early detection and for monitoring the treatment of lung diseases [2] can be of high importance. Human serum contains a plethora of proteins, the majority of which are glycosylated [11]. Protein glycosylation is the consequence of very complex biochemical processes, regulated by a number of glycosidases and glycosyltransferases [9], resulting in diverse but protein specific glycan profiles, which affect several cellular properties such as signaling, adhesion, motility and half-life, just to mention a few important ones [12]. Genetic and environmental factors can both affect the activity of this glycosylation machinery, which may lead to altered glycan structures, possibly specific for pathological changes. Comprehensive analysis of complex carbohydrates requires high sensitivity methods with enhanced resolution because of their great structural diversity. The most frequently applied techniques for N-glycan analysis are high-performance liquid chromatography (HPLC), capillary electrophoresis (CE), mass spectrometry (MS) and NMR, or combinations of those such as LC-MS or CE-MS [13].
In several cases, N-glycosylation alterations affect the high abundance and acute phase proteins, including immunoglobulin G (IgG) [14,15], immunoglobulin A (IgA) [11,16,17], alpha-1-antitrypsin (AAT) [18], haptoglobin (HP) [19,20] and transferrin (TR) [21,22]. Pavić et al. [15] analyzed the human plasma and its IgG subset of COPD patients and healthy controls by UPLC and provided new insights into plasma protein and IgG N-glycome changes. In COPD patients, the plasma protein N-glycome showed a significant decrease in low branched glycans and an increase in more complex glycan structures. Ito et al. investigated the N-glycans of a lung-specific protein, surfactant protein D, using matrix-assisted laser desorption/ionization quadrupole ion trap time-of-flight mass spectrometry and found that the fucosylation level was greatly elevated in COPD patients compared to controls [23]. Rudd and coworkers analyzed the serum of lung cancer patients and healthy controls by high performance hydrophilic interaction liquid chromatography (HILIC) and anion exchange chromatography [24]. They found that the levels of bi- (A2), tri- (A3) and tetra-antennary (A4) glycans were significantly increased in lung cancer compared to healthy controls and a reduction was observed in the amounts of core-fucosylated bi-antennary structures. In addition to total human serum, one of the abundant proteins, haptoglobin, was analyzed and similar alterations were found in both sample types. Váradi et al. also studied human serum haptoglobin in lung diseases and emphasized the importance of measuring the core- to arm-fucosylation ratios [20]. The slight decrease in the total fucosylation level of haptoglobin in the serum of COPD and pneumonia patients compared to the control group was the result of a significant decrease in arm-fucosylation and a slight increase in core-fucosylation. Elevated amounts of core- (FA4G4) and antennary-fucosylated tetra-antennary glycans (A4FG4) in haptoglobin were also observed when comparing lung cancer and COPD patient groups. Ruhaak et al. investigated the N-glycans of several high abundance glycoproteins enriched by affinity capture from plasma samples of lung adenocarcinoma patients as well as healthy controls and analyzed them with nano-HPLC-chip-TOF-MS to search for lung cancer biomarkers [25]. They found that, while the N-glycan profiles of blood-derived glycoproteins could be adequate biomarkers for lung cancer, protein enrichment did not improve specificity, but made the method more complicated. Liang et al. [18] examined different types of pulmonary diseases including lung adenocarcinoma, squamous cell lung cancer and small-cell lung cancer, as well as several benign ones such as pneumonia, pulmonary nodule and tuberculous pleuritis, and suggested that the N-glycosylation patterns of α-1-antitrypsin could be a potential lung cancer biomarker.
In this paper we report on a pilot study of the N-glycosylation profiles of pooled human serum samples from patients with COPD, lung cancer and their comorbidities (COPD with LC) compared to control healthy subjects (pool of 90 patients each) using capillary electrophoresis with laser-induced fluorescent detection (CE-LIF). A new temperature adjusted denaturation protocol was used to prevent precipitation and the endoglycosidase digestion time was increased to assure full removal of the N-linked oligosaccharides. Sixty-one N-glycans were identified in the pooled, control human serum samples based on data in publicly available GU databases and exoglycosidase digestion based carbohydrate sequencing. The relative peak areas of the identified N-glycans were used in a comparative quantitative evaluation to find a potential preliminary glycobiomarker panel to differentiate lung cancer, COPD and their comorbidity from each other and from the control.

Sample preparation
All serum samples were collected with the appropriate Ethical Permissions (approval number: 23580-1/2015/EKU (0180/15)) and Informed Patient Consents at the Department of Pulmonology in the Semmelweis Hospital (Miskolc, Hungary). For this pilot study, the samples were pooled in order to efficiently determine all possible glycan structures in the pooled samples [26]. Serum samples from 90 healthy individuals (control), 90 lung cancer patients, 90 COPD patients and 90 patients with comorbidity of COPD with lung cancer were separately pooled.
Preparation of human serum samples included denaturation, glycan release, fluorophore labeling and magnetic bead mediated cleanup. First, 2 µL of each serum sample was diluted with HPLC grade water to 10 µL. Since the Fast Glycan Sample Preparation and Analysis protocol was optimized for purified IgG samples, a modified denaturation protocol was used to avoid possible precipitation issues with the more complex serum samples: 5 µL of denaturation solution was added and the mixture was held at 40°C for 10 min, followed by 70°C for 10 min. The glycan release process was also modified: 1.0 µL of PNGase F enzyme (200 mU) was added to the reaction mixture, which was incubated at 60°C for 1 h instead of 20 min to ensure complete removal of the serum N-glycome. The endoglycosidase digestion reaction was stopped by the addition of 120 µL of ice-cold acetonitrile to precipitate all remaining protein/polypeptide content. This was followed by centrifugation for 5 min at 13,500 RPM (BioSan, Biocenter Ltd, Hungary). The supernatant that contained all released sugars was dried under reduced pressure at 60°C for 1 h in a SpeedVac (Jouan RC 10.10 Vacuum Concentrator Centrifugal System, Jouan, San Francisco, CA, USA). The dry samples were reconstituted in the labeling solution containing 4.0 µL of 40 mM 8-aminopyrene-1,3,6-trisulfonic acid (APTS) in 20% acetic acid, 2.0 µL of NaBH3CN (1 M in THF) and 4 µL of 20% acetic acid. The reaction mixture was incubated in a heating block using a modified evaporative labeling protocol, with closed vial cap at 50°C for 60 min, followed by open cap at 55°C for 80 min [27]. The labeling reagent was added in great excess to avoid competition with any possible remaining amine containing molecules. After the labeling step, the samples were purified by magnetic beads following the Fast Glycan Sample Preparation and Analysis protocol and analyzed by CE-LIF.
Exoglycosidase digestions were performed by consecutive additions of sialidase A to remove all α2-3,6,8-linked sialic acids, Jack bean galactosidase to remove β1-4,6-linked galactose residues and Jack bean hexosaminidase to remove the β1-2,4,6-linked N-acetyl-glucosamines, each with overnight incubation at 37°C as described earlier [28].

Capillary electrophoresis
Capillary electrophoresis analyses with laser induced fluorescent detection (CE-LIF) were performed using a PA800 Plus Pharmaceutical Analysis System (SCIEX). All CE measurements were accomplished in 40 cm effective length (50 cm total length), 50 µm ID bare fused silica capillaries filled with the HR-NCHO separation gel buffer (SCIEX). 30 kV electric potential was applied during the separation steps in reversed polarity mode (cathode at the injection side, anode at the detection side) at 30°C. To increase detection sensitivity and reproducibility, a three-stage sample injection procedure was used: Step 1) 1.0 psi for 5.0 sec water pre-injection, Step 2) 3.0 kV for 3.0 sec sample injection and Step 3) 2.0 kV for 2.0 sec bracketing standard injection. This latter was used for high precision GU value determination by the GUcal software (www.GUcal.hu) [29]. Data collection and analysis were implemented by the 32Karat (version 10.1) software package (SCIEX). Relative percentage area values of the separated peaks were calculated by the PeakFit v4.12 Software (SeaSolve Software Inc., San Jose, CA).
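The conversion of migration times to GU values mentioned above can be illustrated with a minimal sketch. It assumes a conventional maltooligosaccharide ladder with known migration times and uses piecewise-linear interpolation between neighbouring ladder peaks. This is a generic illustration and not the GUcal algorithm, which is not described in this text and relies on a bracketing standard; the function name `time_to_gu` and all ladder values are invented for the example.

```python
from bisect import bisect_right


def time_to_gu(t, ladder_times, ladder_gu):
    """Convert a migration time to a GU value by piecewise-linear
    interpolation between neighbouring ladder peaks.

    Assumption: ladder_times is sorted ascending and ladder_gu holds the
    glucose unit value assigned to each ladder peak (hypothetical data).
    """
    if len(ladder_times) != len(ladder_gu) or len(ladder_times) < 2:
        raise ValueError("need at least two matched ladder points")
    if not (ladder_times[0] <= t <= ladder_times[-1]):
        raise ValueError("migration time outside the calibrated range")
    # Index of the ladder peak at or just before t, capped so that
    # i + 1 is always a valid index.
    i = min(bisect_right(ladder_times, t) - 1, len(ladder_times) - 2)
    t0, t1 = ladder_times[i], ladder_times[i + 1]
    g0, g1 = ladder_gu[i], ladder_gu[i + 1]
    return g0 + (g1 - g0) * (t - t0) / (t1 - t0)


# Invented ladder: migration times (min) of oligomers with GU 2 to 5.
ladder_times = [5.0, 6.0, 7.5, 9.5]
ladder_gu = [2.0, 3.0, 4.0, 5.0]

for peak_time in (5.0, 6.75, 9.5):
    print(peak_time, round(time_to_gu(peak_time, ladder_times, ladder_gu), 3))
```

Linear interpolation is chosen here only for simplicity; a polynomial fit over the whole ladder is another common option.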

Results and discussion
In this pilot study, a new and improved sample preparation method was applied to accommodate the high complexity of the human serum samples, including temperature adjusted denaturation, extended glycan release and an evaporative labeling protocol. Capillary electrophoresis-laser induced fluorescence detection was used to analyze and compare the N-glycosylation patterns of pooled human serum samples from 90 patients each with chronic obstructive pulmonary disease (COPD), lung cancer (LC), and their comorbidity (COPD with LC) to those of healthy individuals (as control). The inter- and intraday variability of the optimized method was under 3.5% for the pooled human serum samples. For better comparability and consequently easier structural identification, the timescale of the acquired electropherograms was converted from migration time to GU values using the GUcal software (freely available from GUcal.hu) [29]. A representative electropherogram of the pooled healthy human serum sample N-glycome is shown in Fig. 1, featuring the separation of 61 peaks. Structural elucidation of all separated N-glycans utilized direct mining of GU database entries (GlycoStore.org), exoglycosidase digestion based carbohydrate sequencing [28], comparison to oligosaccharides released from carefully chosen glycoprotein standards (ribonuclease B, immunoglobulin G, α-1-antitrypsin), and some earlier published literature data on the same subject matter [30-32]. The exoglycosidase based glycan sequencing process is shown in Fig. 2, utilizing sialidase A, galactosidase and hexosaminidase, depicted by the corresponding traces. Sequence information was derived from the GU value shifts of the individual peaks resulting from the consecutive exoglycosidase treatments [33]. Table 1 lists all identified N-glycan structures in the pooled human serum sample as numbered in Fig. 1.
The first level of structural elucidation of the separated glycans was accomplished by using their GU values to search the GlycoStore database (glycostore.org). Glycans denoted by # were identified considering the results of the sequential exoglycosidase digestion process shown in Fig. 2. Structure identification of entries marked with asterisks was accomplished based on a comparative exercise utilizing the N-glycan profiles of IgG (*), RNase B (**) and AAT (***) [32,34,35]. It is important to note that besides these 61 carbohydrates, no other glycan structures were detected in the pathological samples compared to the control.
After the identification of the separated N-glycan structures in the pooled healthy human control serum sample, the relative peak areas of all peaks were computed and quantitatively compared to the lung cancer, COPD and comorbidity (COPD with LC) pooled sample data, with their respective SD values based on triplicate runs. Before the integration step for peak area determination, all electropherograms were normalized to peak 61 (FA2BG2), i.e., the RFU values were divided by the RFU value of peak 61, because it was apparently stable in size in all runs and well-separated from the other peaks. This normalization step was necessary to adequately calculate the peak areas of some of the not completely separated peaks. After the normalization step, the relative peak area % values were calculated by dividing each integrated peak area by the sum of all integrated areas and multiplying by 100. Table 1 lists the suggested glycan structures corresponding to the separated peaks in Fig. 1, the calculated CE-LIF GU values and the relative peak areas with their SDs. Only peaks with > 1% relative peak areas were taken into consideration in this comparative pilot study (highlighted in bold in Table 1). All runs were done in triplicate and the average RSD of the relative peak areas was 3.46%. Please note that rows 62-78 in Table 1 represent the peaks that appeared during exoglycosidase digestion after each sequential analysis step.
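The normalization and relative peak area calculation described above can be sketched as follows. This is an illustrative outline only: the actual integration was done in PeakFit, the function names (`normalize_trace`, `relative_peak_areas`, `replicate_stats`) are hypothetical, and the numbers are invented rather than taken from Table 1.

```python
from statistics import mean, stdev


def normalize_trace(rfu, ref_index):
    """Divide every RFU value by the RFU value at the reference peak
    (peak 61 in the text); ref_index is its position in the trace."""
    ref = rfu[ref_index]
    if ref == 0:
        raise ValueError("reference peak has zero signal")
    return [value / ref for value in rfu]


def relative_peak_areas(areas):
    """Relative peak area % = integrated area / sum of all areas * 100."""
    total = sum(areas)
    return [100.0 * area / total for area in areas]


def replicate_stats(runs):
    """Per-peak mean, SD and RSD % across replicate runs.

    runs: list of per-run lists of relative peak areas (same peak order).
    """
    stats = []
    for values in zip(*runs):
        m = mean(values)
        sd = stdev(values)  # sample SD, n - 1 in the denominator
        stats.append((m, sd, 100.0 * sd / m))
    return stats


# Invented integrated areas for three peaks in triplicate runs.
runs_areas = [
    [12.0, 30.0, 58.0],
    [11.5, 30.5, 58.0],
    [12.5, 29.5, 58.0],
]
runs_relative = [relative_peak_areas(run) for run in runs_areas]

for peak, (m, sd, rsd) in enumerate(replicate_stats(runs_relative), start=1):
    print(f"peak {peak}: mean {m:.2f}%  SD {sd:.2f}  RSD {rsd:.2f}%")
```

The sample standard deviation is used here as an assumption; the text does not state which SD estimator was applied.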
Changes in relative peak areas of the pooled lung cancer, COPD and comorbidity (COPD with LC) samples were compared to the pooled healthy control, based on their capillary electrophoresis analysis results. As shown in Table 2, only glycans with significant alterations were taken into consideration during the evaluation process, i.e., N-glycan structures satisfying the following two criteria: 1) the relative peak area of the N-linked glycan structure was > 1% in at least one of the sample groups (healthy control, COPD, lung cancer, or their comorbidity; bold structures in Table 1); and 2) at least one of the observed relative peak area differences between any of the disease groups and the control was > 33%.
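The two selection criteria can be expressed as a simple filter. The sketch below assumes that the > 33% difference is calculated as the percent change of a disease group's relative peak area with respect to the control value; the exact formula is not given in the text, so this is an assumption. The function names, group labels and data are hypothetical.

```python
def percent_change(disease_area, control_area):
    """Percent change of a disease group relative to the control
    (assumed definition of the 'difference' in criterion 2)."""
    return 100.0 * (disease_area - control_area) / control_area


def select_candidates(table, min_area=1.0, min_change=33.0):
    """Return peaks that satisfy both criteria.

    table maps a peak label to a dict of relative peak areas (%) per
    group; the key 'control' must be present.
    """
    selected = []
    for peak, areas in table.items():
        control = areas["control"]
        # Criterion 1: > 1% relative area in at least one sample group.
        abundant = any(area > min_area for area in areas.values())
        # Criterion 2: > 33% difference between any disease group and control.
        if control > 0:
            changed = any(
                abs(percent_change(area, control)) > min_change
                for group, area in areas.items()
                if group != "control"
            )
        else:
            # Undefined percent change; treat any nonzero disease signal as a change.
            changed = any(a > 0 for g, a in areas.items() if g != "control")
        if abundant and changed:
            selected.append(peak)
    return selected


# Invented relative peak areas (%), not values from Table 1 or Table 2.
example = {
    "peak_A": {"control": 2.0, "LC": 3.0, "COPD": 2.1, "COPD_LC": 2.2},
    "peak_B": {"control": 0.4, "LC": 0.8, "COPD": 0.5, "COPD_LC": 0.6},
    "peak_C": {"control": 10.0, "LC": 11.0, "COPD": 9.0, "COPD_LC": 10.5},
}
print(select_candidates(example))
```

In this invented example, peak_B changes strongly but never exceeds 1% relative area, and peak_C is abundant but changes by 10% at most, so only peak_A passes both criteria.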

Conclusion
In this pilot study, the N-glycosylation profiles of patient samples of chronic inflammatory (COPD) and malignant (LC) pulmonary diseases as well as their comorbidity (COPD with LC) were quantitatively studied and compared to healthy controls to gain better insight into the glycan structures/ratios that can be expected in each pool. A novel temperature adjusted denaturation protocol as well as extended enzymatic release and evaporative derivatization times were used for the asparagine linked oligosaccharides from the complex serum samples, which were then analyzed by capillary electrophoresis with high sensitivity laser induced fluorescence detection. Sixty-one N-glycan structures were identified in the control human serum samples and, since no other glycans appeared in any of the three disease categories, these 61 structures were quantitatively monitored in this study. Our results suggested that certain serum N-glycans could be used as potential markers for the different types of pulmonary diseases. Therefore, the panel of the 13 glycans listed in Table 2 could be considered to differentiate lung cancer, COPD and their comorbidity from the control as well as LC from COPD. In addition, alterations in the N-glycan subclasses, such as fucosylated, mono-, bi-, tri- and tetra-sialo, as well as mono-, bi-, tri- and tetra-antennary glycans could also carry interesting diagnostic information. The glycan panel in Table 2 and the corresponding subclasses may provide even more reliable information as they represent the sum of multiple structural changes caused by a given disease. This is especially applicable for the highly branched sialylated structures, as our recent genotyping data suggested a significant increase of MGAT5 activity, i.e., increased branching, in lung cancer [37]. Currently we are in the process of collecting 300 samples from each disease group, which will be individually analyzed in view of our preliminary pooled sample based results.

Table 2. Comparison of the relative peak areas of lung cancer (LC), COPD and their comorbidity (COPD with LC) to the control sample, and between LC and COPD, along with their SDs, where at least one observed difference was > 33% (bold numbers) for any peak with > 1% relative area between the disease and control samples.

Fig. 3. Alterations among the relative peak areas of specific N-glycan subclasses (Sialoforms: mono-, bi-, tri- and tetra-sialo; Fucosylated; and Branching: mono-, bi-, tri- and tetra-antennary) of lung cancer (black), COPD (gray) and their comorbidity (dark gray) with their corresponding RSDs. The results were calculated based on the data in supplementary Table 1.