Glycan biomarkers of autoimmunity and bile acid-associated alterations of the human glycome: Primary biliary cirrhosis and primary sclerosing cholangitis-specific glycans

We have recently introduced multiple reaction monitoring (MRM) mass spectrometry as a novel tool for glycan biomarker research and discovery. Herein, we employ this technique to characterize the site-specific glycan alterations associated with primary biliary cirrhosis (PBC) and primary sclerosing cholangitis (PSC). Glycopeptides associated with disease severity were also identified. Multinomial regression modelling was employed to construct and validate multi-analyte diagnostic models capable of accurately distinguishing PBC, PSC, and healthy controls from one another (AUC = 0.93 ± 0.03). Finally, to investigate how disease-relevant environmental factors can influence glycosylation, we characterized the ability of bile acids known to be differentially expressed in PBC to alter glycosylation. We hypothesize that this could be a mechanism by which altered self-antigens are generated and become targets for immune attack. This work demonstrates the utility of the MRM method to identify diagnostic site-specific glycan classifiers capable of distinguishing even related autoimmune diseases from one another.


Introduction
Abbreviations: MRM, multiple reaction monitoring; PBC, primary biliary cirrhosis; PSC, primary sclerosing cholangitis; CID, collision induced dissociation; VIF, variance inflation factor; CDCA, chenodeoxycholic acid; OCA, obeticholic acid; GCDCA, glycochenodeoxycholate; LCA, lithocholic acid; DCA, deoxycholic acid; ConA, Concanavalin A; JAC, Jacalin from Artocarpus integrifolia; SNA, Sambucus nigra agglutinin; AAL, Aleuria aurantia lectin.
Glycans (i.e., oligosaccharides or sugars) are one of the four fundamental molecules that make up all living systems [1]. The totality of glycans within an organism is the glycome. The process that synthesizes and enzymatically attaches glycans to organic molecules is called glycosylation [2]. Cell surface and extracellular proteins are commonly post-translationally modified with glycans, which can fine-tune protein function by acting as "on and off" switches or as "analog regulators" [3]. However, because there is no template for glycan synthesis, it is extremely difficult to predict the composition of the human glycome from gene expression data. In fact, when one considers the massive 3-dimensional structural diversity of glycans combined with their variation in attachment sites, the complexity of the glycome parallels that of the genome [3].
The National Research Council of the U.S. National Academies has highlighted the importance of glycans as biomarkers of human disease [3]. It is believed that glycans play a key role in the pathophysiology of all human diseases. With respect to autoimmune disease, we put forth the Altered Glycan Theory of Autoimmunity, which argues that each autoimmune disease will have a unique glycan signature [4]. Most prior glycan biomarker studies in autoimmunity have used labor-intensive methodologies to characterize glycans released from purified proteins and, perhaps for this reason, detailed analyses have only been conducted on a relatively small number of patients. Lower resolution techniques, which yield limited structural information and no site-specific information, have been used to characterize larger patient cohorts, but such untargeted analyses are not ideally suited for biomarker discovery and reproducibility. As a result, multi-analyte glycan biomarkers capable of distinguishing one autoimmune disease from another have not yet been developed.
With the goal of deploying glycan biomarkers clinically, we have developed Multiple Reaction Monitoring (MRM) to characterize the human plasma glycome in a rapid, site-specific, and reproducible fashion [5]. Although MRM mass spectrometry (MS) is mainly used in the fields of metabolomics and proteomics [6][7][8][9], its high sensitivity and linear response over a wide dynamic range make it especially suited for glycan detection [10]. Herein, we employ MRM MS to characterize the glycan alterations associated with PBC (primary biliary cirrhosis; an autoimmune disease of the intrahepatic bile ducts) and PSC (primary sclerosing cholangitis; an autoimmune disease of the intrahepatic and extrahepatic bile ducts). Using multinomial logistic regression, we construct a multi-analyte classifier model capable of distinguishing PBC, PSC, and healthy controls from one another. Finally, as a demonstration that disease-relevant environmental factors can alter glycosylation, the ability of bile acids to alter B-cell glycosylation was assessed. This study represents a viable alternative to existing diagnostic tools, with the potential to define specific glycan-based markers of autoimmune disease using minimal patient sample and reliable quantification.

Study design
The objective of this study was to identify the relative abundance of site-specific glycosylations within the most abundant plasma proteins and then to use this information to construct multi-analyte classifiers capable of differentiating autoimmune liver diseases. PBC and PSC patients were recruited by the University of California (UC) Davis, Department of Medicine, Division of Gastroenterology and Hepatology. All patients met the clinical diagnostic criteria of PBC or PSC as defined by the American Association for the Study of Liver Diseases (AASLD) [11,12]. As per the AASLD guidelines, a biopsy diagnosis was not required if other diagnostic criteria were met. There were no established upper or lower age limits for enrollment. Healthy individuals were recruited from the UC Davis Medical Center as controls. The UC Davis institutional review board approved this study. All participants provided their written informed consent. Demographics are listed in Table S1.

Sample preparation
For each individual enrolled, plasma was separated from whole blood using a Ficoll gradient. From each plasma preparation, a 2 μL aliquot was reduced, alkylated, and then subjected to trypsin digestion at 37 °C as previously described [13]. To allow for absolute protein quantification, 100 μg of IgG, IgA, and IgM (all from Sigma-Aldrich, St. Louis, MO) was digested according to the same protocol and a dilution series was made prior to sample injection.

UPLC-ESI-QqQ-MS analysis
The neat enzymatically-prepared samples containing both peptides and glycopeptides were then directly analyzed without further hands-on sample cleanup or dilution using an Agilent 1290 Infinity liquid chromatography (LC) system coupled to an Agilent 6490 triple quadrupole (QqQ) mass spectrometer (Agilent Technologies, Santa Clara, CA), as previously described [13,14]. Briefly, an Agilent Eclipse Plus C18 precolumn (RRHD 1.8 μm, 2.1 × 5 mm) was connected directly to an Agilent Eclipse Plus C18 column (RRHD 1.8 μm, 2.1 × 100 mm), which was used for UPLC separation. A 1.0 μL volume of each digested plasma sample was injected and analyzed using a 25-min binary gradient consisting of solvent A (3% acetonitrile, 0.1% formic acid) and solvent B (90% acetonitrile, 0.1% formic acid) in nano-pure water (v/v) at a flow rate of 0.5 mL/min.
The MRM method used for this study requires predetermined knowledge of each peptide's or glycoform's LC retention time, electrospray ionization, and collision induced dissociation (CID) behavior, which we have previously determined for all the non-glycosylated peptides and glycopeptides used in this study [5,13]. Results were integrated using Agilent MassHunter Quantitative Analysis B.5.0 software. Protein concentrations were determined from calibration curves, and glycopeptide relative responses were calculated as the ratio of the area under the curve of the glycopeptide to that of a non-glycosylated reference peptide from the same protein. Relative IgM is IgM adjusted for high affinity IgG.
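The two quantities described above can be illustrated with a minimal Python sketch. It is not the MassHunter workflow: the straight-line calibration fit, the function names, and all peak areas and concentrations below are assumptions chosen only to show the arithmetic.

```python
def relative_response(glycopeptide_area, reference_area):
    """Glycopeptide peak area divided by the peak area of a
    non-glycosylated reference peptide from the same protein."""
    if reference_area <= 0:
        raise ValueError("reference peptide area must be positive")
    return glycopeptide_area / reference_area


def fit_calibration(concentrations, areas):
    """Ordinary least-squares line: area = slope * concentration + intercept.
    A linear calibration is an assumption of this sketch."""
    n = len(concentrations)
    mean_c = sum(concentrations) / n
    mean_a = sum(areas) / n
    sxx = sum((c - mean_c) ** 2 for c in concentrations)
    sxy = sum((c - mean_c) * (a - mean_a)
              for c, a in zip(concentrations, areas))
    slope = sxy / sxx
    return slope, mean_a - slope * mean_c


def concentration_from_area(area, slope, intercept):
    """Invert the calibration line to estimate a protein concentration."""
    return (area - intercept) / slope


# Hypothetical dilution series of a digested protein standard.
standard_conc = [0.5, 1.0, 2.0, 4.0]
standard_area = [1050.0, 2050.0, 4050.0, 8050.0]

slope, intercept = fit_calibration(standard_conc, standard_area)
sample_conc = concentration_from_area(5050.0, slope, intercept)
sample_relative = relative_response(1200.0, 4800.0)
```

Because the glycopeptide and the reference peptide come from the same protein, the ratio reflects a change in glycoform occupancy rather than a change in the amount of protein present.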

Bile acid-induced glycan alterations

Raji cell culture, bile acid treatment, and flow cytometry
For this analysis, Raji B cells were cultured in vitro in triplicate with different dilutions of the following bile acids, representing both disease-relevant and physiologic concentrations: chenodeoxycholic acid (CDCA), obeticholic acid (OCA), glycochenodeoxycholate (GCDCA), lithocholic acid (LCA), and deoxycholic acid (DCA). After incubation with the different bile acids, Raji B cells were incubated with Fc-block (BD Biosciences, San Jose, CA) on ice for 15 min and stained with Aqua-LIVE/DEAD (Invitrogen, Carlsbad, CA). Cells were then stained with or without (FMO, Fluorescence Minus One) the fluorescein-labeled lectins Concanavalin A (ConA), Jacalin from Artocarpus integrifolia (JAC), Sambucus nigra agglutinin (SNA), and Aleuria aurantia lectin (AAL) (Vector Laboratories, Burlingame, CA) for 30 min at room temperature. Cells were washed after each step and before being analyzed on a BD Fortessa flow cytometer (BD, Franklin Lakes, NJ). Data were analyzed using FlowJo (Tree Star, Ashland, OR). After gating on single, live cells, the geometric mean fluorescence of each fluorescein-labeled lectin was retrieved and plotted in a heat map using R (R Core Team, Vienna, Austria) to represent bile acid-induced alterations of the B-cell glycocalyx. Standardized mean differences (SMD) were calculated from the mean fluorescence intensities of the bile acid-cultured B cells and normalized to control cultured cells.
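As an illustration of the SMD calculation, the Python sketch below divides the difference between treated and control mean fluorescence by a pooled standard deviation. The text does not state the exact formula used, so the pooled-SD form is an assumption, and the triplicate values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def standardized_mean_difference(treated, control):
    """(mean(treated) - mean(control)) / pooled sample standard deviation.
    The pooled-SD denominator is an assumption of this sketch."""
    n_t, n_c = len(treated), len(control)
    s_t, s_c = stdev(treated), stdev(control)
    pooled = sqrt(((n_t - 1) * s_t ** 2 + (n_c - 1) * s_c ** 2)
                  / (n_t + n_c - 2))
    return (mean(treated) - mean(control)) / pooled


# Hypothetical lectin fluorescence for triplicate cultures.
bile_acid_mfi = [12.0, 14.0, 16.0]
control_mfi = [8.0, 10.0, 12.0]

smd = standardized_mean_difference(bile_acid_mfi, control_mfi)
```

A positive value indicates stronger lectin binding after bile acid exposure than in control cultures, and a negative value indicates weaker binding.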

RNA isolation and qRT-PCR
Raji B cells were harvested, washed twice with PBS, and resuspended in RNAlater (Life Technologies, Carlsbad, CA). Total RNA was extracted using the RNeasy Plus Mini Kit (Qiagen, Germantown, MD), and the quantity and quality of RNA were determined using a Qubit Fluorometer (Life Technologies) and TapeStation 2200 (Agilent Technologies, Santa Clara, CA) following the manufacturer's protocol. Total RNA was reverse transcribed to cDNA using iScript Reverse Transcription Supermix (Bio-Rad Laboratories, Hercules, CA) following the manufacturer's protocol. Predesigned human glycosylation and extracellular matrix and cytoskeleton PrimePCR plates (Bio-Rad Laboratories) were used for real-time PCR on the CFX96 Touch Real-Time PCR detection system (Bio-Rad Laboratories), and the analysis was performed using CFX Manager 3.1 (Bio-Rad Laboratories). Gene expression was normalized to reference genes and presented as fold changes.
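A common way to express reference-normalized qPCR data as fold changes is the 2^(-ΔΔCt) calculation, sketched below in Python. Whether CFX Manager was configured this way is not stated here, so the formula, the assumption of 100% amplification efficiency, and the Ct values are all illustrative.

```python
from statistics import mean


def fold_change(ct_target_treated, ct_refs_treated,
                ct_target_control, ct_refs_control):
    """Fold change by the 2^(-ddCt) method.
    Assumes perfect doubling per cycle; averaging reference-gene Ct values
    is equivalent to a geometric mean of their quantities."""
    d_ct_treated = ct_target_treated - mean(ct_refs_treated)
    d_ct_control = ct_target_control - mean(ct_refs_control)
    return 2.0 ** -(d_ct_treated - d_ct_control)


# Hypothetical Ct values: one target gene and two reference genes.
fc = fold_change(26.0, [18.0, 20.0], 24.0, [18.0, 20.0])
```

In this made-up example the target gene crosses threshold two cycles later in treated cells than in controls, which corresponds to a four-fold reduction in expression.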

Statistical analysis
All statistical analyses were done using R software [15]. For each analyte, skewness was assessed and data were log transformed to make the distribution approximately normal. Outliers were identified using the R package "extremevalues" [16] and, when present, were winsorized so that each outlier was set equal to the nearest non-outlier value. Differential analysis was carried out to identify analytes whose concentration was significantly different between disease groups (PBC vs. PSC vs. control) using Analysis of Covariance (ANCOVA) with age and gender as covariates, followed by Tukey's honest significant difference (HSD) tests. ANCOVA and linear regression assumptions about the normality of residuals were examined with the Shapiro-Wilk test. Collinearity of variables in the multivariate models was examined by calculating the variance inflation factor (VIF, excessive if >2.5) with the R package "car" [17]. Nonlinear relationships between the analytes and the outcome were evaluated with the R package "mfp" using a multiple fractional polynomial (MFP) method [18]. Variable selection in the multiple linear regression analyses was performed by forward stepwise exhaustive search using the "leaps" R package [19]. The algorithm searched for the best models of all sizes up to the specified maximum number of variables. Each model's performance was evaluated by leave-one-out cross validation using the "caret" R package [20], and the optimal number of variables included in the model was selected using the minimum root-mean-square error (RMSE). Logistic regression models were fitted using Firth's bias reduction method with the R package "logistf" [21]. This package also used penalized likelihood ratio tests for variable selection. The association between analytes and disease status (PBC, PSC, and control) was analyzed using multinomial logistic regression models with the R package "nnet" [22]. Model performance was estimated with the 5-fold cross validated (CV) multiple class area under the ROC curve using the R package "HandTill2001" [23]. The association between analytes and disease stages was examined using an ordinal logistic regression implemented in the R package "MASS" [22]. The proportional odds assumption for this regression was checked with the Brant test in the R package "brant" [24]. Meta-analyses were conducted to assess findings across the multiple datasets using the R package "metafor" [25]. A weighted random-effects model was used to estimate a summary effect size. The restricted maximum-likelihood estimator was selected to estimate between-study variance. Weighted estimation with inverse-variance weights was used to fit the model.
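The winsorization step can be sketched in Python as follows. The outlier rule here is a simple Tukey interquartile-range fence, which stands in for (and is not the same as) the distribution-based detection performed by the "extremevalues" package; the function name, the fence multiplier, and the data are assumptions for illustration.

```python
from math import log
from statistics import quantiles


def winsorize_to_nearest(values, k=1.5):
    """Set each outlier equal to the nearest non-outlier value.
    Outliers are flagged with Tukey fences (quartiles -/+ k * IQR),
    an assumed stand-in for the outlier detection used in the study."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - k * iqr, q3 + k * iqr
    inliers = [v for v in values if lower_fence <= v <= upper_fence]
    low, high = min(inliers), max(inliers)
    # Clamp every value into the range spanned by the non-outliers.
    return [min(max(v, low), high) for v in values]


# Hypothetical relative abundances: log transform first, then winsorize.
raw = [0.8, 1.0, 1.1, 1.3, 1.5, 250.0]
log_values = [log(v) for v in raw]
cleaned = winsorize_to_nearest(log_values)
```

Unlike trimming, this keeps every patient in the analysis while limiting the leverage of extreme measurements.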

Significant difference of glycoforms among PBC, PSC, and healthy controls
As a demonstration of the utility of the plasma glycome to identify biomarkers of human disease, accounting for age and gender as covariates, ANCOVA and Tukey's HSD test were performed to identify analytes that were significantly differentially expressed among three groups (PBC, PSC, and healthy control). Group demographics are summarized in Table S1 and Fig. S1. PBC patients differed significantly from controls with respect to 61 glycoforms (66 total analytes). PSC patients significantly differed from controls with respect to 56 glycoforms (60 total analytes), and there were 47 glycoforms (54 total analytes) that differed significantly between PBC and PSC (Fig. 1 and Table S2). This analysis also revealed that the relative abundances of IgG3 and IgM were significantly elevated (P = 6.43e-14 and 3.61e-14, respectively) in patients with PBC when compared to controls (0.1 ± 0.1 vs. 0.041 ± 0.02 and 0.29 ± 0.2 vs. 0.086 ± 0.07, respectively) (Table S2).

Altered glycosylation at different stages of PBC
Given that there were several glycoforms significantly differentially expressed in patients with PBC, it was of interest to determine if disease stage was also associated with alterations in glycosylation. A number of patients (15) lacking stage information were dropped from the analysis, leaving 46 patients: 23 with Stage I disease, 13 with Stage II, 5 with Stage III, and 5 with Stage IV disease (Table S1). For this analysis, glycoforms that were differentially expressed in the setting of PBC were graphed against stage, and Spearman's correlation coefficients (r_s) and corresponding P values for significance were calculated. Glycoforms whose abundance was associated with stage are shown in Fig. 2.
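Because disease stage takes only four values, the rank correlation has to handle many ties. The Python sketch below computes Spearman's coefficient as the Pearson correlation of average ranks; it omits the P value, and the stage and abundance values are hypothetical rather than taken from the study data.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical PBC stages and log relative abundances of one glycoform.
stage = [1, 1, 1, 2, 2, 3, 4]
log_abundance = [-2.1, -1.9, -2.3, -1.7, -1.8, -1.2, -0.9]
r_s = spearman(stage, log_abundance)
```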

Differential expression of glycoforms in PBC, PSC, and healthy controls: multi-analyte classifier performance
Fig. 3A presents receiver operating characteristic (ROC) curves and areas under the curve (AUCs) of the differentially expressed glycoforms that performed best as single-analyte multinomial classifiers capable of distinguishing between the PBC, PSC, and control groups. Specifically, seven glycoforms and three plasma proteins (relative IgM, normalized IgG3, and absolute A2HSG) performed well as single-analyte classifiers (AUCs > 0.72) (Fig. 3A, Fig. S4A, and Table S6).
Multi-analyte classifiers were then constructed from analytes which demonstrated low Pearson's Product-Moment Correlation Coefficient (PPMCC) r values in their pairwise comparisons (|r| < 0.2). For each classification model, ROC curves were constructed and AUCs calculated to illustrate the classifier's diagnostic ability. Pairwise classifiers designed to distinguish PBC from controls, PSC from controls, and PBC from PSC performed their designated tasks well (5-fold CV AUCs = 0.98 ± 0.03, 0.96 ± 0.06, and 0.94 ± 0.06, respectively) (Fig. 3B to D and Table S7). Final multinomial models (capable of differentiating all diagnostic groups from each other) with differing numbers of analytes (n = 1-13) were then assessed for accuracy. As additional analytes were added to the multinomial model, meaningful increases in accuracy were noted until four analytes were reached, above which only small additional increases in accuracy were observed (Fig. S4B). Thus, four analytes were chosen for the final multinomial model: Hp 207-11904, A1AT 70-5402, Hp 241-5511, and relative IgM (Fig. 3E, Table S8). The AUC of this four-analyte multinomial model was 0.93 ± 0.03 (5-fold CV) (Fig. 3E). Alternative models are presented in Fig. 3E and Fig. S4 for comparison. For all models described above, collinearity among analytes was evaluated by calculating their variance inflation factor (VIF) and found to be low (Table S7). Nonlinear relationships were evaluated by the MFP method, which revealed no concerning nonlinearity. Lastly, all constructed models were validated using the 5-fold CV technique.
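The multiple class AUC reported for the three-group model is the Hand and Till M measure, which averages pairwise AUCs over every pair of classes. The Python sketch below shows the calculation on predicted class probabilities; it is a plain illustration rather than the "HandTill2001" implementation, it omits cross-validation, and the input format and example probabilities are assumptions.

```python
from itertools import combinations


def pairwise_auc(positive_scores, negative_scores):
    """Probability that a positive case scores above a negative case,
    counting ties as one half."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


def hand_till_auc(labels, probabilities):
    """Hand and Till M measure.
    labels: true class of each sample.
    probabilities: one dict per sample mapping class -> predicted probability."""
    classes = sorted(set(labels))
    total = 0.0
    for a, b in combinations(classes, 2):
        prob_a_in_a = [p[a] for l, p in zip(labels, probabilities) if l == a]
        prob_a_in_b = [p[a] for l, p in zip(labels, probabilities) if l == b]
        prob_b_in_b = [p[b] for l, p in zip(labels, probabilities) if l == b]
        prob_b_in_a = [p[b] for l, p in zip(labels, probabilities) if l == a]
        # Average the two directed separabilities A(a|b) and A(b|a).
        total += 0.5 * (pairwise_auc(prob_a_in_a, prob_a_in_b)
                        + pairwise_auc(prob_b_in_b, prob_b_in_a))
    c = len(classes)
    return 2.0 * total / (c * (c - 1))


# Hypothetical predictions from a three-class model.
labels = ["PBC", "PSC", "CTRL"]
probabilities = [
    {"PBC": 0.8, "PSC": 0.1, "CTRL": 0.1},
    {"PBC": 0.1, "PSC": 0.7, "CTRL": 0.2},
    {"PBC": 0.2, "PSC": 0.2, "CTRL": 0.6},
]
m_auc = hand_till_auc(labels, probabilities)
```

A value of 1.0 means every pair of classes is perfectly separated, while 0.5 corresponds to uninformative predictions.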

Environmental influences on glycosylation
It has been stated that the plasma glycome is an expression of the overall health of an individual [4]. This is in part due to the multitude of environmental factors that impact glycosylation. Thus, we sought to identify environmental influences that could alter glycosylation in the setting of PBC and PSC. Given that PBC and PSC have differential elevation of bile acids compared to each other as well as healthy controls [26,27], we characterized the influence of bile acids on the expression of glycogenes in Raji B cells (Fig. 4A and Fig. S5). qPCR of the Raji B cells under different culture conditions demonstrated that some glycogenes (e.g., GALNT1 and GALNT9) were consistently altered in response to coculture with bile acids (Fig. 4A and Fig. S5). Reduced expression of GALNT9, which encodes a GalNAc transferase responsible for the initial transfer of a GalNAc residue to a serine or threonine residue on nascent proteins, in all bile acid conditions suggests that bile acids may shape the O-glycosylation pattern of B cells. Altered expression of MAN1C1, encoding a mannosidase that trims mannose residues during the maturation of Asn-linked oligosaccharides, also appeared in the majority of bile acid conditions, potentially indicating that N-glycosylation of B cells is also affected. The majority of glycogenes decreased in expression with LCA addition, including GALNT1, GALNT9, and GALNT12. Addition of GCDCA (5 μM) and OCA (0.5 μM) resulted in the least pronounced changes in glycogene expression, while addition of CDCA (0.3 μM) and LCA (0.03 μM) resulted in the most varied transcripts among all bile acid conditions (Fig. 4A and Fig. S5).
In addition to changes in gene expression, we also sought to characterize the bile acid-induced alterations to the B cell glycocalyx. Thus, the same bile acid-B cell cultures described above were also analyzed by flow cytometry using an array of fluorescein-labeled lectins, which are glycan-binding proteins with structural specificity. These experiments demonstrated that, when compared to the bile acids LCA and CDCA, the bile acids OCA, DCA, and GCDCA produced opposing glycan alterations in Raji B cells. After exposure to OCA, DCA, or GCDCA, SNA binding to B cells was reduced, indicating less sialic acid attached in α-2,6 or α-2,3 linkage to terminal galactose (Fig. 4B). In contrast, incubation of Raji B cells with LCA and CDCA increased binding of AAL, ConA, and JAC (Fig. 4B). Together, these results demonstrate the differential expression of glycogenes and glycan structures following exposure to bile acids. Thus, alterations in glycoforms in the setting of PBC and PSC may be influenced by both intra- and extracellular factors, the latter being the result of disease-associated environmental changes (Fig. 4C).
To characterize (CH2)-84.4 and other glycosylations in a site-specific fashion, we have adopted MRM MS for glycan biomarker research and discovery. MRM MS allows for rapid characterization of site-specific glycosylations but requires predetermined knowledge of the glycopeptides' electrospray ionization behavior, their collision induced dissociation patterns, and their retention times. These were established in our prior studies. For example, as a prelude to the current study, we mapped out the relative abundances of the 159 most common glycopeptides in the plasma of 97 healthy volunteers [46]. We have also extensively characterized the glycan alterations associated with age and gender [46], which allowed these factors to be accounted for in the current study.
In the current study, we characterized the site-specific glycan alterations that distinguish primary biliary cirrhosis and primary sclerosing cholangitis from each other and from healthy controls. According to the Altered Glycan Theory of Autoimmunity, each autoimmune disease will have a unique glycan signature [4]. Focusing on two related autoimmune diseases, PBC and PSC, allowed us to evaluate the utility of glycans as biomarkers of human disease. PBC was chosen as the prototypical disease for our initial autoimmune glycan biomarker study because it is not usually treated with immunosuppressive medications. Such medications may confound results if they have the ability to alter the glycosylation of plasma proteins [47]. PSC was chosen as a comparison group because it is also a liver-specific autoimmune disease but, unlike PBC, a PSC-specific blood test does not exist, making the discovery of a novel PSC serum biomarker of significant clinical value.
After adjusting for covariates and accounting for FDR, 61 of the monitored glycoforms were found to be significantly associated with PBC, 56 with PSC, and 47 were differentially expressed between PBC and PSC. Thus, while the levels of some glycans appeared to be disease-specific, others were more indicative of an autoimmune disease state (either PBC or PSC). For example, the 5411 IgG1 glycan was significantly decreased in both PBC and PSC patients, a finding we predicted a priori given that this glycan contains a terminal galactose residue and has both sialic acid and fucose decorations, which are thought to be anti-inflammatory [4]. Although there was some overlap, the overall glycosylation profiles associated with PBC and PSC were distinct from each other and from the profile that was associated with age. For example, the IgM Asn-209 glycosylation 5411 was significantly (FDR = 7.2e-06) altered by age but was not associated with either PBC or PSC (FDR > 0.05) [46]. Another important finding was that the relative abundance of some glycans was linked to disease stage. Thus, in the setting of autoimmunity there are many forces driving glycan alterations. At one end are the soluble cytokines and other inflammatory mediators pushing differential glyco-enzyme gene expression. At the other end are the influences resulting from the aftermath of autoimmune tissue destruction. These destructive forces can alter glycosylation by several means. For example, if the cells secreting the glycoproteins are damaged, this may be reflected in the glycovariants that they produce. Another possibility is that the autoimmune-mediated tissue destruction may release environmental elements that in turn induce changes in glycosylation. This possibility is supported by the ability of different bile acids (identical to those released by a damaged liver) to induce alterations in the glycosylation profile of Raji B cells. In addition to altering the plasma glycome, it is likely that elevations in bile acid levels will alter the glycosylation of epithelial cell integral membrane proteins, creating an altered-self, potentially recognizable by the immune system as foreign, thereby initiating an autoreactive immune response.
Apart from changes in glycosylation, it is becoming increasingly apparent that some autoimmune diseases are strongly associated with a particular Ig class or subclass. Prototypic examples include the IgG4-mediated diseases, pemphigus foliaceus and autoimmune pancreatitis [48,49]. Of relevance to our study, PBC has been previously reported to be associated with mitochondrial-specific IgM and IgG3 autoantibodies [50]. Our results demonstrate that relative IgM and IgG3 are significantly elevated in patients with PBC, which matches the antimitochondrial Igs' class/subclass and makes sense from an immunological perspective, as IgM and IgG3 are thought to work in concert during inflammatory immune responses [51]. Of note, IgM is negatively associated with age [46], which is the direct opposite of that observed in PBC, a disease mainly presenting in middle-aged to elderly individuals. The ability to monitor relative abundances is unique to our novel MRM mass spectrometry approach, which uses robustly quantified non-glycosylated common peptides to normalize glycoform abundances. This technique also normalizes Ig abundances across different isotypes (e.g., IgG1-4), and our results demonstrate the superior sensitivity of relative as compared to total Ig concentrations, which was an a priori prediction based upon the large variation in "normal" Ig concentrations [4].
In addition to being the first report of glycan alterations occurring in the setting of PBC and PSC, our study offers a number of advantages over prior analyses: 1) the glycan quantification was site-specific across multiple plasma proteins including different Ig classes and subclasses, not just a single protein or released glycans, as is the case in other publications; 2) the MRM approach eliminated the need for additional protein purification or chemical processing, which allowed for large patient cohorts to be rapidly characterized; 3) the analysis was precise, rapid, and automated for high throughput; 4) it required only 2 μL of serum or plasma and little sample preparation, while current techniques require several mL of blood to quantitate Ig levels; and 5) in addition to total protein quantification, the technique provided the relative abundance of each glycoform, making it more suitable for biomarker research and discovery. For these reasons, the development of this approach as a clinical diagnostic tool is very appealing, especially when compared to its more labor-intensive alternatives. However, the most distinctive aspect of our study is that it successfully established multiple-analyte classifiers capable of differentiating one autoimmune disease from another and patients with autoimmunity from healthy controls. To date, several studies have demonstrated glycan alterations in the setting of autoimmunity, but none have investigated the usefulness of site-specific glycosylations and multi-analyte classifiers as disease-specific biomarkers. The MRM technology that we employed in this current study is rapidly evolving, and more site-specific glycosylations are being incorporated every month. This is expected to increase the accuracy of our disease-specific classifiers. We anticipate that in the near future glycan analysis will become integral to the diagnosis and management of human diseases, especially diseases of the immune system and cancer.
helped in writing of the manuscript and analyzed data. G.L. performed experiments and wrote the manuscript. C.B.L. designed the study, supervised the study and wrote the manuscript. All authors reviewed, edited and approved the manuscript prior to submission.

Fig. 1. Analysis of Covariance (ANCOVA) of glycan or protein relative abundances in PBC, PSC, and CTRL groups. Y axes represent the log of the relative abundance. Examples of the significant analytes that were differently expressed between PBC, PSC, and control groups (CTRL) are depicted. A full list of all monitored analytes including P values, relative concentrations, and standard deviations can be found in Table S2. The upper and lower bars connected to each box indicate the boundaries of the normal distribution and the upper and lower box edges mark the first and third quartile boundaries within each distribution. The bold line within the box indicates the median value of the distribution. FDR values are displayed for each analyte, with analytes organized in order of increasing value going across each row left to right. Glycan nomenclature is as follows: p: site on protein, g: glycan code. Glycan conversion from code (e.g., g:3300) to chemical composition and structures (e.g., Hex(3)HexNAc(3)) can be seen in Fig. S3.

Fig. 2. Analytes altered by PBC stages. Log relative glycan abundance is graphed versus PBC stage, and Spearman's correlation coefficients (r_s) and significance are calculated. For each stage the relative glycan abundance is represented as a box-and-whisker plot. The upper and lower bars connected to each box indicate the boundaries of the normal distribution and the upper and lower box edges mark the first and third quartile boundaries within each distribution. The bold line within the box indicates the median value of the distribution.

Fig. 3. Classifiers for PBC vs PSC vs CTRL. (A) Receiver operating characteristic (ROC) curves of the 4 best single analyte classifiers for distinguishing between all diagnostic groups (PBC, PSC and CTRL). AUC includes 5-fold CV. A comprehensive list of the single analyte classifiers can be found in Table S2. (B) ROC curve for binomial multi-analyte classifier for distinguishing PBC from CTRL. Classifier is comprised of five glycoforms (ApoC3 p:74 g:1102, Hp p:207 g:11904, IgG2 g:5510, TF p:432 g:6502, and A1AT p:70 g:5402) and 1 plasma protein (relative IgM). (C) ROC curve for binomial multi-analyte classifier for distinguishing PSC from CTRL. Classifier is comprised of four glycoforms (Hp p:207 g:11904, A1AT p:70 g:5402, Hp p:241 g:5511, Hp p:184 g:6512). (D) ROC curve for binomial multi-analyte classifier for distinguishing PBC from PSC. Classifier is comprised of two glycoforms (AGP1 p:33 g:6501, AGP1/2 p:72MC g:7614), 1 plasma protein (relative IgM), and age as variables. (B to D) Confusion matrices representing classifier performance against the entire dataset are listed below their corresponding ROC curves. Five-fold cross validation results are displayed within the ROC curves. (E) Confusion matrices for multi-analyte classifiers designed to distinguish PBC, PSC, and controls from one another. The final model is represented by n = 4, made up of three glycoforms (Hp p:207 g:11904, A1AT p:70 g:5402, Hp p:241 g:5511) and 1 plasma protein (relative IgM), as increasing n further leads to only modest increases in classifier accuracy. A list of the remaining ROC curves with different numbers of predictors from n = 1 to n = 13 can be found in Fig. S4B.

Fig. 4. Bile acids alter the glycosylations on B cells. (A) Raji B cells were incubated in triplicate with bile acids (CDCA, OCA, GCDCA, LCA, and DCA) at indicated concentrations chosen to represent normal and disease-relevant elevations in bile acids. The cells were harvested 48 h later and RNA extracted. A glycogene quantitative real-time PCR array was then used to characterize glycogene expression, which revealed differential expression of several genes including GALNT1 and GALNT9, which were consistently downregulated. The dotted lines present in each panel represent the 3-fold threshold. The remainder of the plots are shown in Fig. S5. (B) Raji B cells were treated in an identical fashion to part A. After 48 h, cells were harvested and stained with a panel of fluorescently labeled lectins (AAL, ConA, JAC, SNA) and analyzed by flow cytometry. The strength of staining (MFI) of the bile acid cultured B cells and control cultured cells was used to calculate SMD values, which are presented graphically as a heatmap. (C) Schematic of glycan alterations in response to bile acids.