CD5 molecule-like and transthyretin as putative biomarkers of chronic myeloid leukemia - an insight from the proteomic analysis of human plasma

Better and more sensitive biomarkers are needed to help understand the mechanism of disease onset and progression, to inform prognosis and to monitor the therapeutic response. The aim of this study was to identify candidate circulating markers of chronic-phase chronic myeloid leukemia (CP-CML) manifestations with the potential to develop into predictive or monitoring biomarkers. A proteomic approach, two-dimensional gel electrophoresis in conjunction with mass spectrometry (2DE-MS), was employed for this purpose. Based on the spot intensity measurements, six proteins were found to be consistently dysregulated in CP-CML subjects compared to the healthy controls [false discovery rate (FDR) threshold ≤0.05]. These were identified as α-1-antichymotrypsin, α-1-antitrypsin, CD5 molecule-like, stress-induced phosphoprotein 1, vitamin D binding protein isoform 1 and transthyretin by MS analysis [PMF score ≥79; data accessible via ProteomeXchange with identifier PXD002757]. Quantitative ELISA, used for validation of candidate proteins in both the pre-treated and nilotinib-treated CP-CML cases, demonstrated that CD5 molecule-like, transthyretin and α-1-antitrypsin may serve as useful predictive markers and aid in monitoring the response to TKI-based therapy (ANOVA p < 0.0001). Two of the circulating marker proteins identified in this study had not previously been associated with chronic- or acute-phase myeloid leukemia. Exploration of their probable association with CP-CML in a larger study cohort may add to our understanding of the disease mechanism, besides yielding clinically useful biomarkers in future.

Chronic myeloid leukemia (CML) is a myeloproliferative neoplasm that results from a reciprocal translocation between chromosomes 9 and 22, t(9;22)(q34;q11) [Philadelphia chromosome], generating BCR-ABL, a tyrosine kinase-encoding oncogene 4. During the course of CML progression (chronic, accelerated and blast-crisis phases), the gradual amplification of BCR-ABL-driven genomic instability and secondary modifications at the genetic/epigenetic levels are believed to have a major knock-on effect in altering and activating the expression of different mitogenic, anti-differentiating and anti-apoptotic modulators and mediators, with a resultant profound influence on the proteome profiles [4][5][6]. Analysis of these altered protein profiles in patients and their healthy counterparts is likely to assist in keeping track of the underlying concealed perturbations while expediting the search for novel diagnostic biomarkers and therapeutic targets of the disease.
This study was designed for comparative plasma proteome analysis of chronic-phase chronic myeloid leukemia (CP-CML) patients to meet two objectives: 1) to identify novel, differentially expressed proteins in peripheral blood plasma with the potential to develop into predictive or therapy-associated biomarkers, and 2) to make a modest yet significant contribution to the international efforts to build a precise, accurately annotated universal plasma proteome map by providing protein data from the South Asian region, in particular Pakistan. The findings of this work present two novel, potential candidate biomarkers of myeloid leukemia.

Results
2D proteome profiling and mass spectrometric analysis. Plasma proteins of the healthy and CP-CML subjects enrolled in this study were individually resolved by 2DE in three independent experiments, over a pH range of 4 to 7 (Fig. 1, Tables 1 and 2). On average, 198 ± 76 spots in the healthy and 172 ± 83 spots in the CP-CML plasma appeared when the 2D-gels were stained with colloidal Coomassie. Altogether, 68 ± 11 gel spots showed at least one-fold difference in intensities [as assessed by Dymension software (v 3.0.1.2)] and were therefore considered for MS analysis; those exhibiting minor or inconsistent changes were ignored. From the pooled control and the individual CP-CML samples, ~1300 gel spots were subjected to MALDI-TOF MS analysis, which led to the identification of 33 distinct proteins and/or their respective isoforms/subunits (Table 3). To address the reliability of the peptide mass fingerprinting identifications, the PMF score for each identification was calculated using 79 as the cut-off value for positive hits 7. The proteomics data were deposited to the ProteomeXchange Consortium 8 via the PRIDE partner repository (http://www.ebi.ac.uk/pride/) with the dataset identifier PXD002757. When analyzed, the majority of the identified proteins were represented by multiple spots in the CP-CML and the control groups (Fig. 1, Table 3).

Figure 1. Representative gel images of three independent experiments were merged to have a composite map in the pH range 4 to 7. Protein spots were visualized by staining with colloidal Coomassie brilliant blue G250 and are numbered with N- and L-labels for normal and CP-CML samples, respectively. Only the protein spots showing at least one-fold difference were considered for identification by MS analysis; those exhibiting minor or inconsistent changes were ignored and are therefore unlabeled.
For instance, alpha-1-antitrypsin (AAT) and alpha-1-antichymotrypsin (AACT) are represented by 4 and 7 spots, respectively, in the CP-CML samples while the same proteins were represented by 3 and 4 spots in the healthy counterparts (Fig. 2). Slight to moderate pI or mass shifts between theoretical and experimentally-calculated values were also noticed. These observations seem to be the result of post-translational modifications such as phosphorylation, glycosylation and/or proteolytic cleavage that are likely to affect the electrophoretic mobility, stability, folding and interactions of the proteins and may be responsible for different protein isoforms. To substantiate that the discrepancy observed in the mass or pI is due to the glycosylation (a widely observed, structurally diverse event), the peptide mass fingerprinting data was subjected to the N-linked glycosylation analysis using NetNGlyc 1.0 (http://www.cbs.dtu.dk/services/NetNGlyc/) webserver. The glycosylation was predicted in many of the identified protein sequences namely AACT, AAT, VDBP, HP, etc., with very high scores (threshold ≥ 0.5) suggesting that mass and pI shifts in these proteins may be attributed to post-translational glycosylation (Supplementary data 1).
Prior to identifying the differentially represented proteins, we therefore summed the intensities of the multiple spots of the same protein and applied the paired t-test followed by false discovery rate (FDR) determination, as described previously 9. The cut-off value for FDR (the expected proportion of type I errors among the findings declared significant) was set at ≤ 0.05, so that on average 95% of the reported findings are expected to be true. Only six proteins met the three-tier criteria that were set for screening potential candidate biomarkers [p-value ≤ 0.05, FDR ≤ 0.05, PMF score ≥ 79; Table 3]: AAT, AACT, stress-induced phosphoprotein 1 (STIP1), CD5 molecule-like (CD5L), transthyretin (TTR) and vitamin-D binding protein (VDBP). The former four proteins were found at higher abundance, while the latter two showed decreased levels in colloidal Coomassie-stained gels of CP-CML in comparison with the control group. We selected all six differentially represented proteins for further validation, along with two statistically insignificant/invariable proteins [haptoglobin (HP) and fibrinogen γ (FGG)] as controls (Table 3).
Immunological validation of candidate marker proteins. Validation of candidate proteins in the pre- and post-treatment CP-CML patients was performed using quantitative ELISA. Blood samples were redrawn from 17 of the 32 initially enrolled patients, who had undergone tyrosine kinase inhibitor (TKI)-based therapy (nilotinib) for one year. The other patients (n = 15) could not be included in this follow-up study because of either their demise or non-traceability.
Except for VDBP, all the candidate proteins showed differential expression patterns as a manifestation of the disease (Fig. 3). More importantly, the mean plasma concentration of CD5L in the pre-treated CP-CML subjects was 16.60 ± 7.99 ng/ml, which is about seven times higher than in the control group (2.29 ± 1.23 ng/ml). In the nilotinib-treated CP-CML (PT) cases, however, the normal levels of CD5L were restored [(2.77 ± 1.37 ng/ml), Fig. 3B]. Likewise, prior to treatment, the patient group showed down-regulated expression of plasma TTR, but regained normal levels following nilotinib therapy (Fig. 3C). The other candidate proteins, viz. AAT, AACT and STIP1, responded similarly to the above two markers in pre- and post-treatment CP-CML subjects; their ANOVA p-value was, however, higher than 0.0001 (Fig. 3D-F).
In-silico characterization and pathway analysis. The molecular functions and biological processes in which the MS-identified proteins are involved, according to the Gene Ontology database, were analyzed (Fig. 4A,B). As shown, the majority of the proteins belong to the categories of enzymes (9%), enzyme modulators (18%), transfer/carrier proteins (9%), immunity/defense proteins (15%), receptors (6%) and/or signaling molecules (15%). Interactive links between 11 such proteins could be traced using the STRING and MetaCore™ programs and are illustrated in the form of a curated pathway (Fig. 4C). This curated pathway was used as scaffolding to establish the association of the elevated levels of plasma CD5L, AAT, AACT, STIP1, etc. in Philadelphia-positive CP-CML cases.
Although it is difficult to speculate on the exact correlation of each protein or node, our results nonetheless reinforce the earlier findings that the constitutive tyrosine kinase activity of BCR-ABL exerts a strong influence on the apoptotic and immunity/defense-related biofunctions 10,11. This oncoprotein activates many signaling cascades, including the Janus kinase (JAK)-signal transducers and activators of transcription (STAT) pathway, which is frequently triggered in both acute and chronic forms of myeloproliferative diseases. Besides activating JAK-STAT, BCR-ABL induces the production of JAK2-activating cytokines, viz. interleukin-3 (IL-3), IL-6, granulocyte macrophage colony stimulating factor (GM-CSF), G-CSF, etc. This cytokine-enriched microenvironment is capable of activating the STAT3 and STAT5 signaling pathways via JAK2, in a BCR-ABL independent fashion 10-12. Thus, the elevated levels of circulating STIP1, AACT, AAT and CD5L in the present study are likely to be the consequence of aberrant STAT signaling and constitutive activation of STAT3 and STAT5.
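The three-tier screen described above can be sketched in Python as follows. The text cites reference 9 for FDR determination without naming the exact procedure, so this sketch assumes the widely used Benjamini-Hochberg step-up adjustment; that choice, the function names and any p-values passed in are illustrative assumptions, not the authors' code or data.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Assumption: BH is used here only as a common FDR procedure; the paper
    does not state which procedure from its cited reference was applied.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def passes_screen(p_value, fdr_value, pmf_score,
                  alpha=0.05, fdr_cutoff=0.05, pmf_cutoff=79):
    """Three-tier criterion: p-value, FDR and PMF score must all pass."""
    return (p_value <= alpha
            and fdr_value <= fdr_cutoff
            and pmf_score >= pmf_cutoff)
```

In use, the per-protein p-values from the paired t-test would be passed to `benjamini_hochberg`, and each protein's raw p-value, adjusted value and PMF score would then be checked with `passes_screen`.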

Discussion
The myeloproliferative neoplasm CML is clinically diagnosed using a combination of complete blood cell count, molecular/cytogenetic testing, and bone marrow aspiration and biopsy; blood-based protein biomarkers for screening or for monitoring the therapeutic response are, however, lacking. During the past decade, proteomic and metabolomic approaches encompassing comparative analysis of proteins, peptides or small metabolites in healthy and diseased states have aided the discovery of several hundred candidate biomarkers of cancer diagnosis and/or prognosis and hence provided better insight into the disease mechanisms [13][14][15]. With the objective of identifying robust, clinically applicable, blood-based protein biomarkers of CML, we compared the plasma proteome profiles of CP-CML subjects and their healthy counterparts using 2DE in conjunction with MALDI-TOF MS.
During the initial screening, eighteen proteins were found to be differentially represented at an FDR value ≤ 0.1 (90% confidence for accuracy), and amongst these, six proteins displayed differential staining at the more stringent FDR value ≤ 0.05 (95% confidence for accuracy). These six proteins (Table 3, shown in bold) were selected for further validation, wherein all candidate biomarkers except VDBP showed potential to discriminate the healthy control group from the patients and the pre-treatment cases from the post-treatment CP-CML group. The AAT, TTR and CD5L proteins, with ANOVA p-value ≤ 0.0001, appear to be of particular interest as they were better able to predict the patients' clinical behavior and therapeutic response.
AAT, a 54 kDa glycoprotein, is a serine protease inhibitor that has earlier been described as associated with tumor progression and metastasis in a wide spectrum of cancers, including CML 16,17. There is, however, no direct evidence in the literature showing an association of myeloid leukemia (either CML or AML) with differential abundance of TTR and/or CD5L. Thus, TTR and CD5L, but not AAT, appear to be novel. Of these, TTR is an extracellular protein that is synthesized in the liver and is involved in the transport of thyroxine from blood to brain, besides acting as a carrier of retinol 18. In comparison with Chinese healthy subjects, in whom plasma TTR levels have been reported as 129 ± 15.6 μg/ml 19, our healthy control group showed significantly lower levels in both males (108 ± 31.99 μg/ml) and females (63.48 ± 24.29 μg/ml). Pronounced gender-associated differences in circulating TTR concentrations are also evident. This is not surprising, as many proteins, including haemoglobin, have shown gender-related differences in clinical settings. The much lower TTR levels in our control group are, however, noteworthy because Liu et al. 19 proposed optimal cut-off values of 115 and 88.5 μg/ml, respectively, to discriminate healthy subjects from those suffering from benign lung diseases and lung cancer. A similar cut-off, if applied to our population, where even healthy females have circulating TTR lower than the threshold value set for lung cancer diagnosis, is likely to result in a large number of false positives. This calls for plasma proteome profiling of diverse population groups of variable ethnicity to ensure the discovery of better and universally acceptable biomarkers.

Table 3. List of proteins identified in the plasma samples of controls and the CP-CML subjects by MALDI-TOF MS. [Column headers: Sr. No; Protein Name] Six proteins that met the three-tier criteria, i.e., p-value ≤ 0.05, FDR ≤ 0.05 and PMF score ≥ 79, are shown in bold. * FDR determination (the expected proportion of type I errors among the findings declared significant) was performed according to the method of Diz et al. 9. A value ≤ 0.05 indicates that 95% of the findings are expected to be true. ** The PMF score for each identification was calculated as described by Stead et al. 7 using 79 as the cut-off value for positive hits.
Another interesting biomarker identified in this study is CD5L, a 347 amino acid long soluble, secreted protein. It is a member of SRCR superfamily, which is characterized by the presence of scavenger receptor cysteine rich (SRCR) domains with critical roles in lipid homeostasis, inflammation and immune responses 20 . In the validation study, the plasma concentration of CD5L was found significantly elevated in the CP-CML group, which dropped to the normal levels following TKI-based therapy (ANOVA p-value ≤ 0.0001 and F-value = 110.6), suggesting the effectiveness of candidate marker in monitoring the therapeutic-response as well.
In the Human Protein Atlas (http://www.proteinatlas.org), most cancer types, such as breast, colorectal, head and neck, cervical, lung, liver, prostate and ovarian cancer, were found negative for the presence of CD5L, making it a specific biomarker of leukemia. However, it is relevant that CD5L is a secretory protein and tissue analysis may not accurately portray its expression profile. Moreover, significantly high levels of circulating CD5L have been reported in patients suffering from pulmonary tuberculosis 21, liver cirrhosis with HCV infection 22,23 and hepatocellular carcinoma with non-alcoholic fatty liver disease 24. We have noted that CML patients are generally immune-compromised and the majority of them suffer from hepatomegaly and/or splenomegaly [Table 2]. The question of whether elevated levels of circulating CD5L in CP-CML reflect a coordinated response to infection and inflammation or relate to myeloid leukemia through its function as an anti-apoptotic factor calls for large-scale trials enrolling lymphoid and myeloid leukemia (ALL, CLL, AML, CML) patients from diverse population groups.
Taken together, in complex diseases such as cancers/leukemia, a single protein or peptide is unlikely to serve as a disease biomarker in all population groups. AAT, TTR and CD5L have, however, shown potential to serve as predictive or therapy-associated CP-CML biomarkers. Further investigation of their specific roles and the cross-talk amongst the repertoires of immune and apoptotic effectors is likely to provide new clues about the cellular biology of myeloid leukemia.

Methods
Study population. The study population comprised 82 subjects in total, including healthy controls (n = 50, Table 1) and BCR-ABL positive CP-CML subjects (n = 32, Table 2); the post-treatment CP-CML cases (n = 17; received nilotinib therapy for a period of one year) were drawn from the latter group. Informed consent was obtained from all subjects prior to their enrolment in the research project. The study design was duly approved by the Ethical Review Committee of the School of Biological Sciences, University of the Punjab, Lahore, Pakistan [Ref. No. 873/12] and was in accordance with the principles of the Declaration of Helsinki for research involving human beings. The peripheral blood samples (3 cc) from healthy donors and the CP-CML patients were collected in EDTA-coated tubes, centrifuged at 2,000 × g for 10 minutes to separate plasma and then stored at −80 °C, in 250 μl aliquots, until analysis. The samples were processed within 30 minutes of collection.
Fractionation of proteins by two-dimensional gel electrophoresis. Total protein contents in the collected plasma samples were estimated by Bradford assay 25 using bovine serum albumin (BSA) as the standard. Protein fractionation by 2D-gel electrophoresis was performed according to the procedure described previously, with minor modifications 26. Briefly, the plasma sample was diluted with rehydration solution [7 M urea, 2 M thiourea, 2% CHAPS, 65 mM DTT and 0.25% Servalyte] to a concentration of 1 μg/μl and applied onto a Servalyte 18 cm long, linear immobilized pH gradient (pH 4-7) strip (Serva Electrophoresis, Heidelberg, Germany). The dried strip was subjected to passive rehydration overnight at 20 °C and then focused on an IEF flatbed (IEF-SYS, SciePlas, UK) for a total of 60 kVh.
Following first dimension IEF, strips were successively equilibrated with equilibration buffer-I [6 M urea, 2% SDS, 30% glycerol and 1% DTT in 1.5 mM Tris-Cl (pH 8.8)] and buffer-II [6 M urea, 2% SDS, 30% glycerol, 5% iodoacetamide in 1.5 mM Tris-Cl (pH 8.8)], each for 15 minutes. Equilibrated strips were aligned on 12% SDS-gel and electrophoresed at 80 V initially for 1 hour and then at 160 V until the bromophenol blue tracking dye reached the bottom of the gel. After electrophoresis, the gel was placed in fixative solution (30% ethanol, 10% acetic acid) overnight, stained with Coomassie colloidal blue dye and then destained with deionized water to a clear background. 2D gel images were scanned using Syngene gel documentation system and the individual protein spots were analyzed for pI and molecular weight, followed by their quantification and matching using the Dymension v.3.0.1.2 (Syngene, UK) software program.
In-gel digestion and mass spectrometric analysis. After matching the proteins of healthy and CP-CML subjects, individual gel spots were excised under sterile conditions, washed twice with deionized water, and then destained completely by incubation with 100 μl of 0.2 M ammonium bicarbonate (AB) and 50% acetonitrile solution (1:1) at 37 °C for 30 min. Proteins in the gel spots were thereafter reduced and alkylated by successive incubations with 100 μl of 20 mM tris(2-carboxyethyl)phosphine containing 25 mM AB and 40 mM iodoacetamide containing 25 mM AB, each at 37 °C for 30 min in the dark. The gel pieces were washed with 100 μl of 5 mM β-mercaptoethanol containing 25 mM AB for 15 min at 37 °C and dried completely in a speed vac. For tryptic digestion, the gel slices were rehydrated with 20 μl of 0.02 mg/ml sequencing grade trypsin (Promega, V511A) and left for overnight digestion at 37 °C. The resulting peptides were extracted from the gel by centrifugation, washed with 40 mM AB/acetic acid (incubation at 37 °C for 30 min) and spotted on the target plate for mass spectrometric analysis.
For MALDI analysis, 1 μl of the digested peptides was mixed with an equal volume of a freshly prepared saturated solution of α-cyano-4-hydroxycinnamic acid in 0.1% trifluoroacetic acid/acetonitrile. 1 μl of this mixture was then spotted onto the target plate, air dried until the solvent had evaporated, and then analyzed using MALDI-TOF-TOF MS (Ultraflex III, Bruker Daltonics, Germany). A 337 nm nitrogen laser and a 2 GHz digitizer were used at a laser frequency of 100 Hz and an intensity of 60-70%. Spectra were obtained in linear positive ion mode with an accelerating voltage of 25 kV and a lens potential of 6 kV. Delayed extraction was performed at 100 ns, the detector gain was set to 7.5 and the sample rate to 0.5 GS/s. Spectra were obtained in the mass to charge (m/z) range of 1000-5000.
Functional and pathway analysis of proteins using bioinformatics tools. Peptide mass spectra obtained from MS analysis were searched against the SWISS-PROT and NCBInr databases using MASCOT Wizard 1.1.2 from Matrix Science (www.matrixscience.com) and further confirmed by the MS-Fit program from Protein Prospector (www.prospector.ucsf.edu). Proteins identified with a PMF score of 79 or higher were considered acceptable 27. Search parameters included Homo sapiens as the species, methionine oxidation and carboxymethylation of cysteine residues as the variable and the fixed modifications, respectively, and an allowable peptide mass tolerance of 50-100 ppm. Enzymatic digestion of proteins was performed with trypsin (V115, Sigma Aldrich, USA). The identified proteins were categorized according to their GO (www.geneontology.com) annotations based on molecular functions. Network construction analyses and canonical pathways were generated through the use of MetaCore™ (Functional Genomics Centre, University of Zurich, Switzerland) and the protein functional analysis software STRING v.9.1 (string.db.org).
Statistical analysis and validation of biomarker candidates. The statistical analyses were performed using the SPSS ver. 20 and/or GraphPad Prism 6.01 software programs. The spot intensity differences obtained from the 2D-gel images of at least 3 different sets of independent plasma samples were analyzed by the non-parametric Mann-Whitney U test. Proteins with high fold-changes (≥ 2.5) were considered for validation studies by enzyme-linked immunosorbent assays (ELISA). To minimize sample handling bias or pipetting errors, pre-coated ELISA plates against human AAT (also called SERPIN A1), AACT (also known as SERPIN A3), TTR (also called pre-albumin), CD5L, STIP1, VDBP, HP and FGG were obtained from GenWay Biotech Inc., CA, USA and Biomatik Corporation, USA, and used in the validation of candidate biomarkers. All the assays were performed in triplicate according to the instructions of the supplier. One-way analysis of variance (ANOVA) with an unpaired Student's t-test (used to check the significance of differences amongst the group mean values) was applied to compare the antibody titer between the test and the control groups. Associations with a p-value ≤ 0.05 were considered statistically significant, while those with a p-value ≤ 0.0001 were considered clinically important.
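To make the search parameters above more concrete, the following Python sketch shows two building blocks of a peptide mass fingerprinting search: an in-silico tryptic digest and a ppm mass-tolerance check. It assumes the commonly used trypsin rule (cleavage C-terminal to Lys or Arg, except when the next residue is Pro), ignores missed cleavages and modifications, and does not reproduce the MASCOT or MS-Fit scoring; all function names and example values are hypothetical.

```python
def tryptic_digest(sequence):
    """Split a protein sequence after K or R, unless the next residue is P.

    Assumption: no missed cleavages are considered in this sketch.
    """
    seq = sequence.upper()
    peptides = []
    start = 0
    for i, residue in enumerate(seq):
        next_residue = seq[i + 1] if i + 1 < len(seq) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        peptides.append(seq[start:])  # C-terminal peptide without K/R
    return peptides


def ppm_error(observed_mass, theoretical_mass):
    """Signed mass error in parts per million."""
    return (observed_mass - theoretical_mass) / theoretical_mass * 1e6


def within_tolerance(observed_mass, theoretical_mass, tolerance_ppm=100.0):
    """True if the observed mass matches within the given ppm tolerance."""
    return abs(ppm_error(observed_mass, theoretical_mass)) <= tolerance_ppm
```

For a peptide with a theoretical mass of 1000 Da, a 100 ppm tolerance corresponds to a window of ± 0.1 Da, which shows how the permitted absolute error scales with peptide mass.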
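For readers unfamiliar with the test, the following Python sketch computes the F statistic of a one-way ANOVA across an arbitrary number of groups, as would apply to the control, pre-treatment and post-treatment ELISA measurements. The analyses in this study were run in SPSS and GraphPad Prism; this hand-written function is only an illustration of the underlying calculation, and any values passed to it here are made up.

```python
def one_way_anova_f(*groups):
    """Return the one-way ANOVA F statistic for two or more groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Variation of individual values around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within
```

A large F value, such as the 110.6 reported for CD5L in the Discussion, indicates that the differences between group means are large relative to the scatter within the groups.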