Protein glycosylation and glycoinformatics for novel biomarker discovery in neurodegenerative diseases

Glycosylation is a common post-translational modification of brain proteins, including cell surface adhesion molecules, synaptic proteins, receptors and channels, as well as intracellular proteins, with implications for brain development and function. Using state-of-the-art glycomics and glycoproteomics technologies in conjunction with glycoinformatics resources, characteristic glycosylation profiles in brain tissues are increasingly reported in the literature, and growing evidence shows deregulation of glycosylation in central nervous system disorders, including aging-associated neurodegenerative diseases. Glycan signatures characteristic of brain tissue are also frequently described in cerebrospinal fluid due to its enrichment in brain-derived molecules. A detailed structural analysis of brain and cerebrospinal fluid glycans reported in publications on healthy and neurodegenerative conditions was undertaken, and the data were compiled to create a browsable dedicated set in the GlyConnect database of glycoproteins (https://glyconnect.expasy.org/brain). The shared molecular composition of cerebrospinal fluid and brain enhances the likelihood of novel glycobiomarker discovery for neurodegeneration, which may aid in unveiling disease mechanisms and thereby provide novel therapeutic targets as well as diagnostic and progression monitoring tools.


Introduction
The cell surface is covered with a layer of glycans associated with different glycoconjugates, including glycoproteins, designated the glycocalyx. These glycans play important roles in cell-cell, cell-matrix and cell-pathogen interactions. Brain glycoproteins, including cell adhesion molecules from the plasma membrane, e.g., NCAM (Wuhrer et al., 2003) or L1CAM (Faissner et al., 1985), are heavily glycosylated and are crucial for cell differentiation and development in the central nervous system (CNS), for example, due to their importance in synaptic connectivity (Sytnyk et al., 2021;Yuzaki, 2018), neuron differentiation (Gouveia et al., 2008), or neuron-astrocyte interactions (Tan and Eroglu, 2021). Likewise, plasma membrane receptors, e.g., N-methyl-D-aspartate receptors (Sinitskiy et al., 2017), and transporters, e.g., the glutamate transporters EAAT1 and EAAT2 (Parkin et al., 2018), are in general heavily glycosylated, with glycans playing important roles in their regulation.
Glycosylation is a common post-translational modification of proteins, with N- and O-glycosylation being the most studied types.
Other glycoconjugates relevant in brain and brain disease but outside the scope of this review, are proteoglycans and glycolipids. Proteoglycans and their contribution in neurological diseases have been reviewed recently (Downs et al., 2022). Glycosphingolipids are very abundant in the brain where they play important roles, such as in myelination, and are related to neurological diseases such as multiple sclerosis or Alzheimer's disease (AD). This has also been reviewed in recent years (Alaamery et al., 2021;Giussani et al., 2021;Schnaar, 2016).
Cells and tissues show characteristic glycosylation profiles that reflect the set of expressed glycosyltransferases, glycosidases and sugar nucleotide transporters as well as their localization along the secretory pathway, substrate availability, metabolism and physiological conditions. One glycoprotein can exist as a set of glycoforms where the protein displays biosynthetically related, but distinct glycan structures on the same glycosylation site (microheterogeneity). Organs display specific glycosignatures (Otaki et al., 2022) and, particularly, brain has distinguishing glycosylation patterns (Sytnyk et al., 2021).
Due to the important role glycans play in glycoconjugate functions as well as cellular and organism homeostasis, mutations in genes coding for glycan biosynthesis are either lethal during embryonic development or cause congenital disorders of glycosylation (Freeze et al., 2015;Reily et al., 2019). Furthermore, changes in glycosylation have been described in diseases of the CNS, including neurodegenerative diseases, such as AD or Parkinson's disease (PD) (Balana and Pratt, 2021;Gaunitz et al., 2021b). The characterization of brain glycosylation patterns and specific glycan epitopes, as well as their deregulation in disease, will be useful for understanding disease etiopathogenesis and for providing disease biomarkers and therapy targets. To investigate molecular signatures as biomarkers in CNS diseases, many studies have explored the cerebrospinal fluid (CSF) (Dreger et al., 2022;Khalil et al., 2018;Shaw et al., 2020;Swift et al., 2021) due to its proximity to the CNS.
In this review we focus on protein glycosylation of N- and O-glycoproteins of brain tissues in health and disease. Experimental strategies as well as glycoinformatic tools for protein glycosylation analysis are presented. Tools associated with the glycoprotein platform GlyConnect, including the dataset dedicated to brain and CSF glycans, are used in comparative studies. We further address human CSF glycosylation properties, particularly of CSF brain-derived proteins, as a basis for the future development of biomarkers for neurodegenerative diseases associated with aging.

Structural diversity of N- and O-glycans
Major classes of mammalian glycans consist of N- and O-glycans. N-glycans are covalently linked to asparagine (Asn) residues located in the consensus sequence Asn-X-Ser/Thr of the polypeptide chain, thereby forming an amide bond with a GlcNAc. A common core, composed of two GlcNAc and three mannose (Man) residues (IUPAC: Manα1-3(Manα1-6)Manβ1-4GlcNAcβ1-4GlcNAcβ1-Asn), is present in N-glycans and is extended further to form three main categories designated as oligomannose, complex and hybrid (Fig. 1A). In each of these categories, several properties are used as distinctive features to group structures together. For example, in mammals, the addition of fucose (Fuc) to the core GlcNAc is described as core fucosylation, whereas a GlcNAc attached to the branching Man of the core is called bisecting GlcNAc.
Fig. 1.
[...] Table 1 and Table 2 and references therein. Selection criteria included abundance, functional relevance and representation of different types. Nomenclature of glycans was according to SNFG. Composition abbreviation is as follows: H, hexose; N, HexNAc; F, Fuc; S, sialic acid. PSA, polysialic acid. Created with BioRender.com.
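The Asn-X-Ser/Thr consensus sequence described above lends itself to a simple sequence scan. The following is a minimal sketch, not a glycosylation site predictor: it only lists candidate positions, since the presence of a sequon does not imply occupancy. The exclusion of proline at position X is an assumption added here (it is a widely used convention but is not stated in the text), and the function name and demo sequence are hypothetical.

```python
import re

# Candidate N-glycosylation sequon: Asn-X-Ser/Thr.
# Assumption (not stated in the text): X may be any residue except Pro.
# The lookahead lets overlapping sequons (e.g. "NNTS") all be reported.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def find_sequons(sequence: str) -> list[tuple[int, str]]:
    """Return (1-based Asn position, tripeptide) for each candidate sequon."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(seq)]


if __name__ == "__main__":
    demo = "MKNGSAANPTLNNTSQ"  # hypothetical sequence, for illustration only
    for position, motif in find_sequons(demo):
        print(position, motif)
```

In the demo sequence, the tripeptide NPT is skipped because of the proline assumption, while the overlapping NNT and NTS sites are both listed.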
O-Glycans are bound to the hydroxyl group of the side chain of amino acid residues (Fig. 1B). A common type of O-glycosylation consists of the binding of GalNAc to Ser or Thr from the polypeptide chain, forming the Tn antigen, which can be further elongated to form different cores numbered from 1 to 8 (Wang et al., 2021). GalNAc glycans are commonly found in high abundance on mucins and other glycoproteins on the surface of cells. Ser and Thr residues may also be modified by O-linked Man, which can be further elongated to form M1 to M3 cores (Larsen et al., 2019;Sheikh et al., 2017). Note that O-GlcNAcylation, which also acts on Ser or Thr residues (Fig. 1B), is a common post-translational modification of nuclear or cytoplasmic proteins (Fehl and Hanover, 2022).
Mammalian N- and O-glycans are elongated and branched through the addition of a variety of monosaccharide residues, including sialic acids (Neu5Ac and Neu5Gc as most common), Fuc, galactose (Gal), and GalNAc. Glycan diversity is further expanded by other modifications, e.g., sulfation or acetylation. Note that sialic acid is by far the most variable monosaccharide, taking over one hundred forms (Lewis et al., 2023). In the end, masses can reach up to 4 or 5 kDa in the sub-category of tri- or tetra-antennary glycans. Another important characteristic feature of glycan structure is designated as a motif, which corresponds to previously identified ligands. Blood group antigens are the best-known examples of such motifs. In addition, and particularly in the brain, polysialic acid (PSA), human natural killer-1 (HNK-1), Lewis x, sialyl-Lewis x and LacdiNAc have been observed (Fig. 1C).
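To make the link between monosaccharide composition and mass concrete, the sketch below computes the monoisotopic mass of a free, underivatized glycan from a composition code in the H/N/F/S abbreviation used in this review. It rests on several assumptions that are not taken from the text: "S" is treated as Neu5Ac only, no sulfate, acetyl or labelling groups are considered, and the residue masses are standard monoisotopic values rounded to five decimals. The function names are hypothetical.

```python
import re

# Monoisotopic residue masses in Da (standard values, rounded).
# Assumption: "S" is Neu5Ac; Neu5Gc and other sialic acid forms are ignored.
RESIDUE_MASS = {
    "H": 162.05282,  # hexose
    "N": 203.07937,  # HexNAc
    "F": 146.05791,  # fucose (deoxyhexose)
    "S": 291.09542,  # Neu5Ac
}
WATER = 18.01056  # added once for the free reducing-end glycan

TOKEN = re.compile(r"([HNFS])(\d+)")


def parse_composition(code: str) -> dict[str, int]:
    """Parse a code such as 'H4N5F2' into residue counts."""
    counts = dict.fromkeys(RESIDUE_MASS, 0)
    for residue, number in TOKEN.findall(code):
        counts[residue] += int(number)
    return counts


def glycan_mass(code: str) -> float:
    """Monoisotopic mass of the underivatized glycan for a composition code."""
    counts = parse_composition(code)
    return sum(RESIDUE_MASS[r] * n for r, n in counts.items()) + WATER


if __name__ == "__main__":
    for code in ("H3N4F1", "H4N5F2", "H5N4S2"):
        print(code, round(glycan_mass(code), 4))
```

Permethylation or fluorescent labelling, as used in many of the studies discussed below, changes these masses, so the values would need adjusting before being compared with experimental spectra.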

Analytical strategies for brain and CSF glycoproteins
In view of the high structural complexity, advanced technologies and complementary approaches are required for glycan analysis.
Glycomics and glycoproteomics address the systematic characterization of protein glycosylation from proteins or cells, organisms and biofluids in specific physiological conditions. In glycomics glycan profiles are provided as well-defined structures with poor information on attachment sites, while in glycoproteomics structural resolution is poor (only compositions) but information on the specific amino acid site of attachment is precise. Concerning brain tissues and CSF, major analytical procedures that have been used in recent glyco(proteo)mics studies are summarized in Table 1 and Table 2, together with the major results obtained. A global diagrammatic representation of the major analytical workflow steps is presented in Fig. 2.
A crucial issue in determining the glycan profiles is sample extraction in view of the potential action of degradative enzymes, so ideally brain tissues should be homogenized on ice immediately after their collection or, if not possible, be snap-frozen in liquid nitrogen and conserved at −80 °C. Since the brain is highly enriched in lipids (O'Brien and Sampson, 1965), several extraction protocols involved a step of delipidation with chloroform/methanol mixtures (Klaric et al., 2021;Stalnaker et al., 2011). Analysis of CSF is comparatively easier since it is a biofluid. Nevertheless, to ensure molecular stability, CSF should ideally also be stored at −80 °C immediately after collection. Detailed guidelines for CSF collection and storage aiming at biochemical characterization and biomarker identification have been published (Teunissen, 2009) and widely followed in studies aiming at biomarker identification from the CSF in CNS diseases, including Alzheimer's disease (Massa et al., 2022), Parkinson's disease (Mackmull et al., 2022), depression (Sorensen et al., 2022) and amyotrophic lateral sclerosis (Otto et al., 2012).
For global analysis of glycosylation, periodic acid-Schiff (PAS) staining is performed (Frenkel-Pinter et al., 2017). On the other hand, to detect specific glycan epitopes, lectin blotting or lectin arrays are applied (Fang et al., 2020;Frenkel-Pinter et al., 2017;Kizuka et al., 2015;Murrey et al., 2009;Nakano et al., 2019;Williams et al., 2022). When aiming at detailed structural analysis in N-glycomic studies, N-glycans are released enzymatically from brain or CSF glycoproteins/glycopeptides using peptide N-glycosidase F, whereas in O-glycomics studies, O-glycans are released by β-elimination. In most studies glycans have been derivatized to facilitate their subsequent analysis, either by permethylation for MS or by fluorescent labelling for UPLC. For detailed structural elucidation, there are several approaches, including exoglycosidase digestion in combination with HILIC-UPLC and/or MS, and MS fragmentation analysis. For glycoproteomics studies, peptides are first obtained by protease digestion, and glycopeptide enrichment is usually done to avoid signal suppression of glycopeptides due to the presence of other more abundant non-glycosylated peptides. Enrichment strategies may introduce bias into the results (see example below in Section 2.3, Fig. 3A).
There are numerous LC-MS and MALDI MS approaches to glyco(proteo)mics analysis that have been reviewed extensively (Chau et al., 2023;Kellman and Lewis, 2021;Lageveen-Kammeijer et al., 2022;Peng et al., 2022;Thaysen-Andersen et al., 2021). Many of these reviews point to the same two main issues hindering fast progress in the field. The first is the still poor statistical validation of data analysis. In the past two decades, advances in proteomics and mass spectrometry analysis have led to a consensus regarding the calculation of a False Discovery Rate (FDR) that expresses confidence in protein identification. Solutions to this problem have very recently been proposed for glycopeptide identification in StrucGP (Shen et al., 2021) as well as in version 3 of pGlyco (Zeng et al., 2021), though a consensus has not yet been reached. Second, despite stronger recommendations for including metadata on both the sample preparation and MS analysis of the experiments in on-line repositories, published glyco(proteo)mics results often remain unnoticed. In contrast with the visibility and accessibility offered by current good practices in many -omics fields, e.g., transcriptomics data submitted to the GEO repository (https://www.ncbi.nlm.nih.gov/geo/) and proteomics data submitted to any of the ProteomeXchange associated repositories (http://www.proteomexchange.org/), glycomics is still lagging behind despite budding initiatives such as GlycoPOST (Watanabe et al., 2021). A recent review of bioinformatics tools for glycomics (Rojas-Macias et al., 2019) also surveys some of the tools, with the prospect of implementing an automated pipeline for processing MS data. A major obstacle to automation remains the lack of standards and the poor reproducibility of results. This is tied to the first issue mentioned, namely that glycoproteomics lacks a consensus definition of confidence.
This situation is being monitored via community challenges that are efficient in assessing the strengths and weaknesses of the various tools (Kawahara et al., 2021).
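As an illustration of what an FDR estimate involves, the sketch below uses the target-decoy idea that is common in proteomics: spectra are searched against real (target) and shuffled or reversed (decoy) entries, and the fraction of decoy hits above a score threshold estimates the error rate among target hits. This is a generic simplification under stated assumptions. It is not the procedure implemented in StrucGP or pGlyco, and the function names and scores are hypothetical.

```python
def fdr_at_threshold(target_scores, decoy_scores, threshold):
    """Estimate FDR as (# decoy hits) / (# target hits) at or above threshold."""
    n_target = sum(score >= threshold for score in target_scores)
    n_decoy = sum(score >= threshold for score in decoy_scores)
    if n_target == 0:
        return 0.0  # no accepted identifications, so no false discoveries
    return n_decoy / n_target


def lowest_threshold_for_fdr(target_scores, decoy_scores, max_fdr=0.01):
    """Return the lowest observed target score whose estimated FDR <= max_fdr."""
    for threshold in sorted(set(target_scores)):
        if fdr_at_threshold(target_scores, decoy_scores, threshold) <= max_fdr:
            return threshold
    return None  # no threshold satisfies the requested FDR


if __name__ == "__main__":
    targets = [10, 9, 8, 7, 3]  # hypothetical identification scores
    decoys = [8, 2, 1]          # hypothetical decoy scores
    print(fdr_at_threshold(targets, decoys, threshold=7))
    print(lowest_threshold_for_fdr(targets, decoys, max_fdr=0.0))
```

For glycopeptides the difficulty lies in defining suitable decoys for both the peptide and the glycan moiety, which is one reason a consensus is harder to reach than for unmodified peptides.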

Bioinformatic resources for glyco(proteo)mic analysis in brain and CSF
Glycoinformatics was first used to refer to "the informatics tools available for assessing 'primary data' (covalent and three-dimensional structures of glycans and glycoconjugates) and organizing these primary data into databases that can be used for speeding up the production of primary data, predicting new features and characterizing structure/activity or structure/function relationships" (Perez and Mulloy, 2005). In relation to the wider field of bioinformatics, glycobioinformatics puts an emphasis on carbohydrate-related proteins and genes.
For efficacy, bioinformatics tools need well populated, current, maintained and quality datasets, and ideally the data should adhere to FAIR principles (Wilkinson et al., 2016): findability, accessibility, interoperability, and reusability. This applies not only to the data but also to the algorithms, tools and workflows that led to the data. FAIR principles should be applied to both human-driven and machine-driven activities. At a time when machine learning and data mining strategies have become common practice in bioinformatics, it is necessary to prepare the ground for glycoinformatics (Bojar and Lisacek, 2022). To ensure the implementation of the FAIR principles, reporting experimental glycomics metadata in a unified manner is essential, as pointed out earlier. This question is being addressed by the gradual development of the MIRAGE guidelines (York et al., 2014).
Table 1. Selected glyco(proteo)mic studies from brain tissues of human, mouse and rat.
Recently, there has been an increase in interest in brain glycosylation, especially in different disease states. A few tools are routinely used in the field. A frequently used software package in a glycomics/glycoproteomics workflow remains GlycoWorkBench (Damerell et al., 2015) and its associated drawing module (available as a standalone tool), GlycanBuilder (Damerell et al., 2012). Developed as part of a larger glycoinformatics project (von der Lieth et al., 2011), GlycoWorkBench was specifically designed for mass spectrometry-based glycomics analysis and has been used in brain as well as CSF studies, to assign glycan compositions from MS spectra as well as the possible fragment ions generated by tandem MS experiments (Table 1, Table 2) (Bleckmann et al., 2009b;Costa et al., 2019;Fogli et al., 2012;Gaunitz et al., 2021a;Goncalves et al., 2015;Goyallon et al., 2015;Palmigiano et al., 2016;Williams et al., 2022). It is also commonly used to generate cartoons that adhere to the Symbol Nomenclature for Glycan Structure (SNFG) format (Neelamegham et al., 2019) for inclusion in publications irrespective of the experimental technique used (Klaric et al., 2021;Matthies et al., 2021). Another module of GlycoWorkBench that has been used in glycomics analysis is Glyco-peakfinder (Maass et al., 2007), specifically for mass spectrometry peak analysis. It can be used to aid the calculation of glycan compositions and to evaluate spectra resulting from glycomics experiments (Bleckmann et al., 2009a;Bleckmann et al., 2009b).
As MS is one of the most used analytical techniques, the informatic tools are varied and often quite project-specific. For example, Xiao et al. (Xiao et al., 2018) developed an N-glycan database search engine, GlySeeker, that allows for large scale analysis of glycans. It was subsequently used to analyse mouse brain (Shen et al., 2019). In glycoproteomics, processing may involve different tools (Chau et al., 2023). For example, Fang et al. (Fang et al., 2020) identified non-glycosylated peptides and de-glycosylated peptides using classic proteomics data analysis tools including Proteome Discoverer (Orsburn, 2021) and Mascot, along with a targeted mouse dataset extracted from UniProt/SwissProt (Poux et al., 2017), and the Scaffold validation software (Proteome Software, Inc., Portland, OR; version 4.4.6). Then, the identification of site-specific glycopeptides was performed with pGlyco 2.0 and data from GlycomeDB (Ranzinger and York, 2015), now fully included in the universal GlyTouCan glycan structure repository (Fujita et al., 2021). GlyTouCan (https://glytoucan.org), the international glycan repository, is a key resource in glycoinformatics; it includes non-curated glycan structures identified by unique accession numbers (Fujita et al., 2021).
Glycans have also been identified by HILIC and UPLC chromatography run in parallel with glycan standards and often with exoglycosidase sequencing. Again, there are specialized databases and tools for this field, such as GlycoDigest (DOI: 10.1093/bioinformatics/btu425), a tool that simulates the activity of exoglycosidases on glycan structures. One of the most recent is GlycoStore (Zhao et al., 2018), a database of retention properties for glycans that builds on the former GlycoBase (Campbell et al., 2008). Using this database, Klaric et al. (Klaric et al., 2021) verified the identification of rat brain N-glycans from HILIC UPLC coupled with mass spectrometry. This database was also used by Samal et al. (Samal et al., 2020) in their analysis of rat brain using UPLC.
In addition to GlycoWorkBench to generate SNFG compliant cartoons of glycan structures, a broad range of similar tools is available (Lal et al., 2020). For example, GlycoGlyph (Mehta and Cummings, 2020), adapted from SugarSketcher (Alocci et al., 2019), has been used in a recent study of mouse brain glycans. The advantage of this tool is that it is online and does not require downloading or installation. It also allows output of the glycan structures in standard string formats such as GlycoCT (Herget et al., 2008).

GlyConnect for brain and CSF glycosylation studies
As mentioned earlier in this article, the monosaccharides and linkages of glycan structures generated in glycomics experiments are either partially or fully determined, but their protein attachment sites are poorly defined, if characterised at all. In contrast, glycoproteomics experiments generate precise site attachment information but with a poor determination of glycan structures, usually limited to monosaccharide compositions (Moh et al., 2022). The two sources of information are brought together in GlyConnect, a curated glycoprotein and glycan database (Alocci et al., 2019), cross-linked to related glycoinformatics or proteomics resources for the annotation of metadata with standard identifiers. GlyConnect also includes dedicated datasets of biologically relevant data which are subsets of the entire database, for example, a COVID-19 specific dataset of reported glycan structures attached to the SARS-CoV-2 spike protein (https://glyconnect.expasy.org/covid-19) and its binding receptors (e.g., ACE2). Due to the increasing interest in brain glycosylation, and as a complement to this review, a brain dataset accessible at https://glyconnect.expasy.org/brain has recently been added. This dataset covers multiple mammalian species, brain subsections, CSF and disease states and includes a number of references mentioned in this review.
GlyConnect combines a collection of in-house tools developed under the Glyco@Expasy initiative (Mariethoz et al., 2018), in particular GlyConnect Compozitor (Robin et al., 2020) specifically designed to visualise and compare glycome and glycoproteome interactive maps. Given a set of glycan compositions representing a glycome (associated with a protein, a cell line, a disease or a tissue) Compozitor plots them in a graph that connects compositions sharing all but one monosaccharide. Additional linked information on potential structure(s) is provided for each composition/node in the graph. Furthermore, each composition is implicitly related to structural properties reflecting a generalized breakdown of structure type (oligomannose, neutral, fucosylated, sialylated, fucosialylated). This information is summarized in a colour-coded bar chart.
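The graph construction described above can be sketched in a few lines. This is an illustrative reading of the description, not Compozitor's actual implementation: "sharing all but one monosaccharide" is interpreted here as two compositions differing by exactly one residue, and compositions are written in the H/N/F/S abbreviation used in this review. The function names are hypothetical, and H4N5F1 is included in the example only to bridge the other compositions.

```python
import re
from itertools import combinations

RESIDUES = ("H", "N", "F", "S")  # hexose, HexNAc, Fuc, sialic acid
TOKEN = re.compile(r"([HNFS])(\d+)")


def parse_composition(code: str) -> dict[str, int]:
    """Parse a code such as 'H3N5F1' into residue counts."""
    counts = dict.fromkeys(RESIDUES, 0)
    for residue, number in TOKEN.findall(code):
        counts[residue] += int(number)
    return counts


def differ_by_one(a: dict[str, int], b: dict[str, int]) -> bool:
    """True if the two compositions differ by exactly one monosaccharide."""
    return sum(abs(a[r] - b[r]) for r in RESIDUES) == 1


def build_edges(codes: list[str]) -> list[tuple[str, str]]:
    """Connect every pair of compositions that differ by a single residue."""
    parsed = {code: parse_composition(code) for code in codes}
    return [
        (x, y)
        for x, y in combinations(codes, 2)
        if differ_by_one(parsed[x], parsed[y])
    ]


if __name__ == "__main__":
    glycome = ["H3N4F1", "H3N5F1", "H4N5F1", "H4N5F2", "H5N4S2"]
    for edge in build_edges(glycome):
        print(edge)
```

In this example the first four compositions form a chain, each step adding one residue, while H5N4S2 remains an isolated node because it differs from every other composition by more than one residue.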

Fig. 2.
Schematic representation of the analytical strategies used for brain and CSF protein glycosylation analysis. The information was collected from the studies presented in Table 1 and Table 2. Created with BioRender.com.

Fig. 3.
[...] (Lee et al., 2020b), mouse frontal cortex and rat (R. norvegicus) neocortex (Klaric et al., 2021) glycomes. C. Comparison of human brain (Lee et al., 2020b), CSF (Moh et al., 2022;Palmigiano et al., 2016) and plasma (Clerc et al., 2016) glycomes. The Compozitor resource was used following the instructions provided in Supplementary File 1. Created with BioRender.com.
Individual glycomes can also be compared with Compozitor. Graphs can be superimposed and additional property bar charts representing shared properties (intersection of sets) are displayed. Fig. 3 shows an example of three relevant glycome comparisons. The description of the steps involved in these analyses is presented in Supplementary File 1. In the first example, the glycan profiles of mouse brain in a glycomics study (glycome A) and in a glycoproteomics study (glycome B) (Riley et al., 2019) were compared (Fig. 3A). The glycomics study shows higher levels of fucosylation and sialylation than the glycoproteomics study. A number of causes can explain the bias towards less sialylation in the glycoproteomics study, including the reduced efficacy of certain proteases due to the sialic acid moiety, the loss of sialic acid in collisional activation methods used in MS, or the glycopeptide enrichment method based on concanavalin A, which has higher affinity for oligomannose glycans than for complex glycans, as discussed by the authors (Riley et al., 2019). One cannot exclude that differences are also due to distinct experimental set-ups. Two other applications of Compozitor analysis are discussed below in Section 3 and Section 4.2.

N- and O-glycans in brain tissue
The characterization of protein glycosylation in the mammalian CNS, particularly in brain, has been a subject of research for decades (Krusius and Finne, 1977;Zamze et al., 1998), and more recent studies are shown in Table 1.
More recent studies also reported the predominance of complex N-linked asialylated biantennary type structures with one or two Gal (Barboza et al., 2021;Fang et al., 2020;Gaunitz et al., 2021a;Klaric et al., 2021;Lee et al., 2020b;Mealer et al., 2022;Samal et al., 2020;Williams et al., 2022), in agreement with early reports (Krusius and Finne, 1977;Zamze et al., 1998). Abundant structures are presented in Fig. 1A and included fucosylated bisected monoantennary glycan without Gal (H3N4F1), fucosylated bisected biantennary glycan without Gal (H3N5F1) and difucosylated bisected biantennary glycan with one Gal (H4N5F2), as well as a high overall proportion of bisected N-glycans (Table 1). The presence of bisecting GlcNAc hinders processing of N-glycans to higher antennarities and also affects glycan conformation (reviewed in Kizuka and Taniguchi, 2018), which could explain the low amounts of tri- and tetraantennary glycans found in the brain. Monoantennary complex glycans, such as difucosylated bisected monoantennary glycan with one Gal (H4N4F2), found in several studies (Klaric et al., 2021;Williams et al., 2022), have been suggested to result from degradation of the corresponding biantennary glycan due to the action of hexosaminidase B (Okamoto et al., 1999); this enzyme could also produce paucimannose structures, with only a few Man residues. Hybrid-type structures (Helm et al., 2022;Klaric et al., 2021;Lee et al., 2020b;Mealer et al., 2022;Rebelo et al., 2021;Samal et al., 2020;Shen et al., 2019;Williams et al., 2022;Zamze et al., 1998) correspond to intermediate biosynthetic forms. By contrast, another study (Tena et al., 2022) reported that N-glycans in the human brain were generally highly branched and highly sialofucosylated, which could be due to the fact that membrane fractions were analysed or to other experimental variations.
The extent of N-glycan sialylation in the brain is low (Barboza et al., 2021;Gaunitz et al., 2021a;Otaki et al., 2022;Riley et al., 2019;Suttapitugsakul et al., 2022;Williams et al., 2022) (Table 1). This is at least in part a consequence of the predominant brain complex-type structures being truncated, and, therefore, not providing a terminal Gal substrate for the addition of a sialic acid residue. Functionally relevant in the CNS is the elongation of Neu5Ac with successive Neu5Ac residues, linearly, in α2,8-linkage, to create di-, oligo-and PSA (or polySia, where n ≥ 8); this event is restricted to specific brain proteins, including NCAM, SynCAM, neuropilin-1 and E-selectin ligand 1 (Thiesler et al., 2022). PSA plays different physiological roles in brain and most importantly is crucial for brain development with decreases being observed one week after birth in mouse (Schnaar et al., 2014). The negative charge of the Neu5Ac residue has also been reported to play important physical/biological roles with implications in mental diseases (Sato and Kitajima, 2013).
Another negatively charged structure found in brain is the sulfated HNK-1 epitope (Fig. 1C). This motif has been reported in N-glycans in several studies (Helm et al., 2022;Klaric et al., 2021;Lee et al., 2020b;Wilkinson et al., 2021). It is carried by cell adhesion molecules including NCAM, which is also a PSA carrier (Sytnyk et al., 2021).
It is interesting that Neu5Gc, which is abundant in mouse tissues but not synthesized by humans (Okerblom and Varki, 2017), has not been detected on mouse brain N-glycans (Lee et al., 2020b;Williams et al., 2022) and has only been detected at very low levels (1-2%) on O-glycans, although these are extensively sialylated with Neu5Ac in contrast to N-glycans. Underlying this finding is the fact that CMP-NeuAc hydroxylase, the enzyme that catalyzes the conversion of CMP-N-acetylneuraminic acid (CMP-Neu5Ac) into its hydroxylated derivative CMP-N-glycolylneuraminic acid (CMP-Neu5Gc), is not expressed in the brain (Suzuki, 2006). O-GlcNAcylation is a common PTM in the brain, where it has different roles including contributing to synapse function and maturation (Hart, 2019). For example, in rat a high number of synaptosome proteins, such as synapsin I, are O-GlcNAcylated (Cole and Hart, 2001).
Compozitor was used again to compare the glycomes (in glycomics studies) of the human prefrontal cortex (152 structures) (Lee et al., 2020b), the mouse frontal cortex (95 structures) and the rat neocortex (41 structures) (Klaric et al., 2021) (Fig. 3B). While not exactly the same, the prefrontal cortex makes up most of the frontal cortex, which is itself part of the neocortex, and the comparison was therefore regarded as fair. The structural properties bar chart showed a very similar distribution and an overlap of 30 structures among the three species. Furthermore, comparatively low levels of sialylated structures and high levels of fucosylated structures are evident as major features of brain glycosylation shared by the three species, which supports the use of mouse or rat as model systems for studying brain diseases. In particular, the structural properties bar chart shows a very similar distribution and an overlap of 77 structures between the human and mouse data, supporting the similarity between these two species, in agreement with Lee et al. (Lee et al., 2020b), who reported the similarity of glycan classes between mouse and human brains. Curiously, the relative amount of fucosylated structures appears to increase from rat to mouse to human, which would require further experimental validation. One cannot exclude that the differences observed were due to different experimental set-ups and not species related.
Some interesting examples of structural motifs that may be functionally relevant include: the LacdiNAc structure (see Fig. 1C) reported recently by two groups (Helm et al., 2022;Lee et al., 2020b) but not by others; fucosylated LacdiNAc (LNDF) recently detected in human brain (Helm et al., 2022); and the branched disialylated motif detected in older studies (Torii et al., 2014;Zamze et al., 1998) but not in a more recent one (Klaric et al., 2021). Curiously, LacdiNAc has been detected in human cells and biofluids, including amniotic fluid glycodelin (Dell et al., 1995) or human embryonic kidney cells used for recombinant protein production (Castro et al., 2021); it is also relevant in cancer cells (Machado et al., 2011) and it was found in helminths (together with LNDF) (van Die and Cummings, 2010), where it mediates binding to galectin-3 from macrophages with a potential role in immune response (van den Berg et al., 2004). The role of LacdiNAc in the brain is still unknown.

The origins of CSF molecules
CSF is a biofluid that surrounds the brain and the spinal cord providing buoyance support to the brain, protecting against physical injury, contributing to the regulation of intracranial pressure, maintaining the homeostasis of molecules including hormones in the CNS, and allowing the clearing of toxic waste products. The major site of CSF production is the choroid plexus, in cerebral ventricles, where epithelial cells connected by tight junctions form the blood-CSF barrier, which allows the selective transfer of molecules from the blood into the CSF (Engelhardt and Sorokin, 2009). The exchange of molecules between the brain and the blood is also accomplished via the blood-brain barrier, which consists of the endothelial cells of brain capillaries that provide a separation between blood and the interstitial fluid (Ghersi-Egea et al., 2018). In general, larger molecular weight proteins are excluded during CSF formation from the plasma, and whether specific glycoforms can traverse the blood-CSF or blood-brain barrier selectively is not clear at present. In addition to blood-derived molecules, some main CSF proteins such as transthyretin (Schreiber et al., 1990) or β-trace protein (β-TP, prostaglandin-H2 D-isomerase) (Hoffmann et al., 1996) are locally expressed by epithelial cells of the choroid plexus. Furthermore, molecules secreted by brain parenchymal cells to the interstitial fluid may be transported to the CSF via a paravascular or glymphatic route (Da Mesquita et al., 2018). The CSF gets reabsorbed into the blood mainly via arachnoid granulations into the venous system, the lymphatics of the nasal mucosa and meninges and the choroid plexus epithelium (Da Mesquita et al., 2018). Protein concentration is lower in CSF (200-400 mg/l) (Schilde et al., 2018) than in blood (60-80 mg/ml) (Leeman et al., 2018).
In this way, the CSF is a complex biofluid in close contact with CNS cells, thus enriched in intrathecally synthesized molecules, but with extensive interchange with the blood. Therefore, CSF has been largely used for the identification of deregulated molecules in CNS diseases that may provide useful biomarkers for diagnosis and prognosis, to monitor disease progression, to evaluate the effect of potential therapies, to use as benchmarks, and to unveil disease mechanisms. Indeed, there is a large body of work aiming at identifying molecular markers from the CSF for neurological diseases (Dreger et al., 2022;Khalil et al., 2018;Shaw et al., 2020;Swift et al., 2021). For biochemical characterization, CSF collection is usually done by lumbar puncture (Teunissen et al., 2009). Once identified in the CSF, potential biomarkers may be specifically screened for within the blood, whose collection is less invasive for the patient.

Brain glycans detected in the CSF
CSF glycosignatures mirror brain glycosylation to some extent but also contain glycans derived from blood glycoproteins, as presented below.
Oligomannose glycans, which are abundantly detected in brain tissues, are only observed at low levels below 5% in human CSF (Goncalves et al., 2015;Palmigiano et al., 2016;Stanta et al., 2010). This is expected since CSF protein composition includes the secretome of brain cells and selected blood components. Though in low abundance, extracellular vesicles are another cellular-derived component of the CSF (Costa et al., 2020;Sandau et al., 2022). They were found to be enriched in complex glycans but only with low amounts of oligomannose glycans in human cell supernatants (Costa et al., 2018) and mouse serum (Otaki et al., 2022). As such, they seem to rather contribute to the complex glycan pool in the CSF.
Comparison of the glycomes of human CSF (Moh et al., 2022;Palmigiano et al., 2016), brain (Lee et al., 2020b) and plasma (Clerc et al., 2016) with Compozitor reveals a lower proportion of sialylated + sialofucosylated glycoforms in the brain with respect to CSF, which in turn shows a lower proportion than plasma (Fig. 3C). On the other hand, levels of fucosylation are much higher in brain than in CSF and plasma. Fuc is predominantly bound to the core, but several structures common to brain and CSF and absent from plasma displayed more than one Fuc, indicating the presence of peripheral Fuc, such as H4N5F2 (Hex:4 HexNAc:5 dHex:2), which is compatible with a core fucosylated bisected biantennary glycan with one Gal and one Lewis x motif (Fig. 1). Therefore, Compozitor analysis corroborated that several glycoforms were shared between CSF and brain. Curiously, sulfated glycoforms were found in the brain and CSF but not in plasma. On the other hand, structures common to CSF and plasma but absent from brain were also detected, including the disialylated biantennary glycan H5N4S2. Note that GlyConnect displays all identified compositions within the analysed tissue, even the low abundant ones, and does not yet process information on the actual percentage of each glycoform.
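The condensed composition notation and the tissue comparison described above can be sketched in a few lines. This is a minimal illustration, not the Compozitor or GlyConnect implementation: the function names are hypothetical, the mapping of S to NeuAc is an assumption inferred from the description of H5N4S2 as disialylated, and the three tissue sets are toy examples holding only the two compositions named in the text.

```python
import re

# One-letter codes as used in the text (e.g. H4N5F2 = Hex:4 HexNAc:5 dHex:2).
# Mapping S to NeuAc is an assumption inferred from "disialylated ... H5N4S2".
CODES = {"H": "Hex", "N": "HexNAc", "F": "dHex", "S": "NeuAc"}
TOKEN = re.compile(r"([A-Z])(\d+)")


def parse_composition(code):
    """Turn a condensed composition such as 'H4N5F2' into residue counts."""
    counts = {}
    pos = 0
    for match in TOKEN.finditer(code):
        letter, number = match.groups()
        # Reject gaps between tokens and letters outside the known codes.
        if match.start() != pos or letter not in CODES:
            raise ValueError(f"Unrecognised composition: {code!r}")
        name = CODES[letter]
        counts[name] = counts.get(name, 0) + int(number)
        pos = match.end()
    if pos != len(code) or not counts:
        raise ValueError(f"Unrecognised composition: {code!r}")
    return counts


def may_carry_peripheral_fucose(code):
    """More than one dHex suggests Fuc beyond the core, as argued in the text."""
    return parse_composition(code).get("dHex", 0) > 1


# Toy sets (hypothetical, not the curated data sets).
brain = {"H4N5F2"}
csf = {"H4N5F2", "H5N4S2"}
plasma = {"H5N4S2"}

brain_csf_only = (brain & csf) - plasma    # shared by brain and CSF, absent from plasma
csf_plasma_only = (csf & plasma) - brain   # shared by CSF and plasma, absent from brain

print(parse_composition("H4N5F2"))
print(sorted(brain_csf_only), sorted(csf_plasma_only))
```

Because only presence or absence of a composition is compared, such a sketch shares the limitation noted above: it carries no information on the relative abundance of each glycoform.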
O-Glycans from the human CSF have been comparatively less studied than N-glycans. However, a predominance of core-1 GalNAc initiated glycans was observed in CSF glycoproteins (Chen et al., 2021a;Halim et al., 2013;Moh et al., 2022;Nilsson et al., 2009), with a large proportion of these structures being sialylated (72.9%) (Chen et al., 2021a). O-glycans also contain abundant levels of Fuc (29.1%) (Chen et al., 2021a) and the presence of two or three Fuc residues tends to indicate the presence of the Lewis x epitope. Mono- and di-sialylated core-1 structures (Flowers et al., 2020) as well as core-2 GalNAc initiated structures (Halim et al., 2013) were detected in apolipoprotein E. O-glycosylation site occupancy of apolipoprotein E is distinct between CSF and plasma. CSF apolipoprotein E holds almost 10-fold more abundant C-terminal glycosylation in contrast to the plasma protein, which had comparatively higher N-terminal glycosylation. These differences may have functional implications in high-density lipoprotein (HDL) binding (Flowers et al., 2020). In contrast with brain tissues, the mono-sialylated O-Man core M1 structure was reported but only at very low levels (Moh et al., 2022).
Overall, the glycan structures detected in brain tissues, including, bisecting GlcNAc-containing N-glycans, high level of sialylated core-1 GalNAc glycans, fucosylated Lewis x motif, oligosialic acid and PSA, sulfated structures, are also found in CSF glycoproteins supporting their brain origin. On the other hand, glycoforms highly abundant in blood glycoproteins, such as disialylated biantennary glycans, are abundant in CSF. These observations are compatible with the mechanisms of CSF formation and homeostasis, with molecules derived from local synthesis and others originating from blood.

Conditions of altered CNS glycosylation
Changes in glycosylation have been described in several disorders affecting the CNS. For example, Congenital Disorders of Glycosylation (CDGs), which result from mutations in genes responsible for glycan biosynthesis, are generally associated with developmental alterations and neurological phenotypes in both the central and peripheral nervous system concomitant to changes in cellular glycosylation (reviewed in (Freeze et al., 2015)). A particular type of CDGs consists of dystroglycanopathies, which are congenital muscular dystrophies, and result from mutations in genes affecting O-mannosylation primarily of α-dystroglycan (Jahncke and Wright, 2023).
Chemical therapies are available for CDGs (Sosicka et al., 2022) and several clinical trials are ongoing, for example, to investigate their treatment with monosaccharide dietary supplementation (e.g., ClinicalTrials.gov NCT04198987).
A recently investigated type II CDG with severe symptoms, including developmental problems that also affect the brain, displayed mutations in the SLC39A8 gene encoding the ZIP8 manganese transporter (Park et al., 2015). Interestingly, this gene also had variations associated with schizophrenia (SCZ) (Schizophrenia Working Group of the Psychiatric Genomics Consortium, 2014). ZIP8, though not participating directly in glycan biosynthesis, mediates the transport of Mn2+, which is a cofactor for many glycosyltransferases (containing the DXD motif) (Taujale et al., 2020). In the SLC39A8-A391T mouse, changes in brain glycosylation were observed distinctively across brain regions and were more evident in the cortex of males. Global decreases of N-glycosylation and sialic acid content in N- and O-glycans were observed in the cortex, in contrast with cerebellum where total sialic acid level was increased. A recent report on human carriers of the SLC39A8 missense allele also showed changes in the plasma protein glycosylation profile of two patients, which were normalized after Mn2+ supplementation (Mealer et al., 2020a).
SCZ is a severe psychiatric illness where functional changes occur in different subregions of the frontal cortex with neurochemical imbalance in dopamine and glutamatergic N-methyl-D-aspartate receptor functionality (Jauhar et al., 2022). Alterations in post-translational modifications, including deregulation of protein glycosylation, have been described in SCZ, e.g., glutamate- and GABA-associated protein glycosylation or decreased polysialylation of NCAM (Mueller and Meador-Woodruff, 2020;Williams et al., 2020). Genetic studies indicated that SCZ is a polygenic disease (Jauhar et al., 2022;Schizophrenia Working Group of the Psychiatric Genomics Consortium, 2014) and variants in the genes associated with glycan biosynthesis FUT9, MAN2A1, TMTC1, GALNT10, and B3GAT1 were identified in association with SCZ (Mealer et al., 2020b). The analysis of post-mortem tissues of SCZ patients also showed changes in the expression of glycosyltransferases (Kippe et al., 2015;Narayan et al., 2009). In CSF, a general down-regulation of bisected and sialylated glycans from SCZ patients was reported (Stanta et al., 2010).
In neurodegeneration, defects in glycosyltransferases had several functional consequences, including alterations in cell surface signaling, deregulation of ganglioside biosynthesis and changes in O-GlcNAcylation associated with AD, PD, ALS and Huntington's disease, as reviewed in (Moll et al., 2020).
In neurodegenerative diseases associated with aging, impairment of glucose metabolism has been described in the brain; the effect has been more investigated in AD, where it occurs presymptomatically (Cunnane et al., 2020). Glucose metabolism also affects the production of UDP-GlcNAc, with reduced levels of glucose leading to reduction of protein O-GlcNAcylation (Bukke et al., 2020;Hart, 2019;Huang et al., 2023;Pinho et al., 2018;Schubert, 2005), which is known to play an active regulatory role in mechanisms of neurodegeneration as reviewed in (Balana and Pratt, 2021;Lee et al., 2021).
Also interesting are the recent findings showing that glucosamine constitutes up to 25% of glycogen monosaccharide composition in the brain instead of the normal glucose component. Diseases that impair glucosamine flux through glycogen, for example, glycogen storage diseases that impact brain function, have been shown to lead to a decrease in cytosolic pools of GlcNAc and reduced N-glycosylation. This topic has been recently reviewed.
Neuroinflammation, which is characterized by the reaction of glial cells (astrocytes, microglia) and infiltrating peripheral immune cells, occurs in neurodegenerative diseases such as AD, PD and ALS, in SCZ and in brain cancer. Many changes in glycosylation have been described during neuroinflammation and this topic has recently been reviewed (Rebelo et al., 2022). For example, decreases in sialylation, core and peripheral fucosylation, concomitant to increases in oligomannose and bisected N-glycans, were observed in rat brain striatum of a neuroinflammation model.
Recent studies addressing protein glycosylation deregulation in brain or CSF in the neurodegenerative disorders AD, PD and ALS are summarized in Table 1 and Table 2 and are further explored in the following sections. Furthermore, a schematic representation of deregulated structures in neurodegenerative diseases is presented in Fig. 4.

Alzheimer's disease
AD is the most common cause of dementia; it affects primarily cholinergic neurons, with degenerative changes occurring especially in the neocortex and hippocampus, and is characterized by amyloid β (Aβ) and tau pathology. Promising CSF molecular biomarkers consist of the Aβ42 / Aβ40 ratio and phosphorylated tau (p-tau181, p-tau217) for AD pathology and neurofilament light chain NfL for neurodegeneration (Leuzy et al., 2022;Shaw et al., 2020).
In AD, decreased levels of O-GlcNAcylation were observed in frontal cerebral cortices of patients, particularly of tau protein (Liu et al., 2009). Concomitant to decreased O-GlcNAcylation of tau protein, increased phosphorylation is observed, since there is reciprocity between these post-translational modifications: Ser and Thr amino acid residues may either be phosphorylated or O-GlcNAcylated depending on cellular physiological conditions (Hart, 2019;Liu et al., 2004;Liu et al., 2009;Zhu et al., 2014). Hyperphosphorylated tau protein forms aggregates in tangles that are a hallmark of AD and other tauopathies (Khanna et al., 2016). Supporting the beneficial effect of O-GlcNAcylation was the observation that inhibition of O-GlcNAcase, the enzyme that cleaves O-GlcNAc, led to increased levels of O-GlcNAc tau and reduction of tauopathy as well as reduction of tau in the CSF of the mouse model rTg4510 (Hastings et al., 2017). Variations of O-GlcNAc levels in human AD brains showed distinct trends depending on the cellular fraction and brain region; for example, O-GlcNAcylation was decreased in the cytoplasmic fraction in the frontal lobe but not in the hippocampus (Frenkel-Pinter et al., 2017). Concerning global N-glycosylation in AD, a recent study by MALDI MS imaging showed increased N-glycan levels in the gray matter of human frontal cortex but not in hippocampus (Hawkinson et al., 2022). In support of a cross-talk between O-GlcNAcylation and N-glycosylation was a recent study where OGT knock-down cells showed increased levels of diantennary and 1,6-branched triantennary N-glycans concomitant to decreased levels of 1,4-branched triantennary and tetraantennary glycans (Song et al., 2022). This regulation occurs via the axis OGT / SLC35A3 UDP-GlcNAc transporter / GlcNAc-transferase IV.
Therefore, metabolic alterations in AD and how they affect global glycosylation as well as specific glycan classes and types is a particularly interesting topic that requires further clarification. A relevant study showed that monosaccharide utilization for glycan synthesis depended on cell type and physiological conditions (Wong et al., 2020).
Since O-GlcNAc constitutes a promising therapeutic target, inhibitors of the enzymes involved in its regulation, O-GlcNAc transferase and O-GlcNAcase, have been obtained (Permanne et al., 2022). Some have undergone Phase I trials aiming at Phase II trials for tauopathies (Alteen et al., 2021).
Another potentially deregulated structure that has been largely investigated in AD is bisecting GlcNAc. Early findings in human AD brains reported increased levels of mRNA expression of the MGAT3 gene that encodes the GlcNAc-transferase III enzyme responsible for the synthesis of bisecting GlcNAc (Akasaka-Manya et al., 2010). In an MGAT3 knockout mouse, decreased pathological Aβ accumulation was observed (Kizuka et al., 2015), indicating a role of bisecting GlcNAc in this pathology. A recent study also indicated increases in some bisected glycans in mouse AD frontal cortex but a decrease in bisecting LacNAc-containing glycan in human AD hippocampal tissue (Hawkinson et al., 2022). On the other hand, in AD frontal cortex and hippocampus, individual glycans potentially bisected showed distinct trends between controls and AD; particularly, a confirmed bisected glycan showed a decrease in AD relative to the control (Gaunitz et al., 2021a).
Fig. 4. Major deregulated glycan structures from glycoproteins as potential biomarker targets in neurodegenerative diseases. References in green, blue, pink, brown and black correspond to studies in human CSF, human brain tissues, human blood, disease models and review articles, respectively. Created with BioRender.com.
Concerning the CSF, several studies aiming at quantifying bisecting GlcNAc in AD have been performed and recently reviewed (Gaunitz et al., 2021b) (Table 2). In one study, an increase in the levels of several bisecting GlcNAc glycans was reported in a subgroup of AD patients (Palmigiano et al., 2016). Another study reported an increase exclusively for the AD female population (Cho et al., 2019). A limitation with these studies is the low number of patients analysed. Interestingly, a study with a higher number of patients (242) using the enzyme-linked lectin assay (ELLA) with the lectin PHA-E that recognizes bisecting GlcNAc, showed higher levels of PHA-E binding in AD and MCI than SCI; there was also significant correlation with phosphorylated tau and total tau, which are AD markers (Schedin-Weiss et al., 2020). More recently, a model was developed based on a retrospective cohort of 233 individuals, which included tau/bisecting GlcNAc ratio from serum, apolipoprotein E status, and Mini-Mental State Examination score, which could predict future AD (Zhou et al., 2023). To strengthen the potential for bisected N-glycans to provide AD markers, it would be crucial to validate these findings in larger independent cohorts of patients.
Other structural changes have been assessed. In N-glycans, a decrease of oligomannose was observed in an AD mouse model (Fang et al., 2020), and in human CSF (Cho et al., 2019). However, a Man5-transferrin glycoform was found increased in the CSF of AD and MCI patients (Hoshi et al., 2021). Other trends involve fucosylation, which was found to decrease in an AD mouse model using lectin arrays and glycoproteomic analysis (Fang et al., 2020). However, whereas this observation was confirmed for AD patients (Chen et al., 2021b), it was contradicted for an AD female population (Cho et al., 2019) in human CSF. Note that a heterogeneous increase in Lewis x was also detected in AD CSF (Schedin-Weiss et al., 2020). As to N-glycan sialylation, decreased levels were detected in human CSF (Palmigiano et al., 2016).
In O-GalNAc type O-glycans, a glycoproteomics study of human AD CSF (Chen et al., 2021a), showed a high amount of core 1 structures (48.3%). A trend towards decreased fucosylation during disease progression from control to MCI to AD was reported. On the other hand, O-glycosylation increased in endogenous peptides.
A recent N-glycoproteomics study of AD asymptomatic and symptomatic brains and controls, showed differences in glycosylation at individual site level that included changes in frequency of bisection, fucosylation, galactosylation, and the number of antennae (Suttapitugsakul et al., 2022).

Parkinson's disease
PD is caused by degeneration of the dopaminergic pathway from substantia nigra in the midbrain to the corpus striatum in the basal ganglia. α-Synuclein aggregation is a hallmark of PD pathology and other synucleinopathies. The potential of α- and β-synucleins as CSF biomarkers for synucleinopathies has been discussed recently.
α-Synuclein is O-GlcNAcylated in PD, and evidence showed that upregulation of O-GlcNAcylation was protective of dopamine neurons in different aspects of PD pathology, including neurodegeneration in a PD mouse model (Lee et al., 2020a). Recently, it was reported that O-GlcNAc modified α-synuclein results in an amyloid-strain with decreased seeding activity and pathology (Balana et al., 2023).
In another study (Wilkinson et al., 2021), an increase in sialylation in the substantia nigra and striatum and a decrease in sulfation in the striatum were observed on O-glycans of PD patients. Furthermore, incidental Lewy body disease patients (occurring at initial stages of parkinsonism) showed decreased levels of O-mannose-core and glucuronylated structures in the striatum.
Unfortunately, for CSF glyco(proteo)mics in PD the information is scarce. In one study, the ratio of serum-type / brain-type Tf (H5N4S2 and H3N5F1 glycoforms, respectively) in the CSF was higher in PD patients than in controls, as calculated from immunoblotting (Yoshihara et al., 2016). In order to corroborate the potential of this ratio as marker for neurological diseases it would be important to use a more rigorous method of quantification with a higher number of patients and controls.

Amyotrophic lateral sclerosis
Amyotrophic lateral sclerosis (ALS) is a neurodegenerative disease caused by the progressive death of motor neurons. Neurofilaments, including neurofilament light chain (NFL) and phosphoneurofilament heavy chain (pNFH), from the CSF and the plasma constitute promising biomarkers for the disease and have been largely investigated (Feldman et al., 2022;Goncalves et al., 2015;Lehnert et al., 2014;Oeckl et al., 2016). However, since neurofilaments are neurodegeneration markers they are also deregulated in other neurodegenerative diseases and it would be useful to identify additional and more specific targets.
Neurofilaments are post-translationally modified by O-GlcNAcylation and phosphorylation (Khalil et al., 2018). In the transgenic mouse model SOD1G93A of ALS, a decrease in O-GlcNAcylation in the spinal cord (Shan et al., 2012) including of neurofilament medium chain (Ludemann et al., 2005) was observed. Recently, the ratio of O-GlcNAcylated to total NFL from the serum appeared as a promising marker to differentiate healthy patients from patients with brain disease (cerebral thrombosis, AD and PD) and also to differentiate non-neurodegenerative cerebral thrombosis from neurodegenerative diseases AD and PD (Zhou et al., 2022). It will be interesting to investigate this ratio in other neurodegenerative diseases such as ALS.
A potential link between glycosylation and ALS has recently appeared, with the finding of mutations in the GLT8D1 gene in families with ALS (Cooper-Knock et al., 2019). Sequence analysis of the corresponding protein assigned it to the GT8 glycosyltransferase family described in the CAZy database of Carbohydrate-Active Enzymes (http://www.cazy.org/GT8.html), which opened interesting perspectives of novel disease mechanisms. GLT8D1 has also been identified as a SCZ risk gene (Yang et al., 2018), which is in line with evidence from the literature supporting a genetic relationship between ALS and SCZ (McLaughlin et al., 2017).
NP-HPLC data show that monosialylated diantennary glycans in CSF are more highly expressed in patients relative to neurological controls (Goncalves et al., 2015). Furthermore, higher levels of galactosylated species in CSF IgG from ALS patients compared to neurological controls (predominantly ALS-mimicking disorders) with a predictive value comparable to that of phosphoneurofilament heavy chain were found (Costa et al., 2019). These findings were compatible with inflammation and autoimmunity not being primary events in ALS, by contrast with chronic inflammation diseases where decreased levels of IgG sialylation and galactosylation were observed, for example, in chronic inflammatory demyelinating polyneuropathy (Wong et al., 2016) or multiple sclerosis (Wuhrer et al., 2015).

General conclusions and perspectives
Proteins from the CNS are extensively glycosylated, with glycans playing important roles, for instance in the modulation of receptor function or in cell adhesion. Recent progress using sensitive and high-resolution analytical techniques allows the collection of large data sets containing thorough information about the mammalian brain glycome. Yet, comparison of trends is often hindered by variations associated with different experimental set-ups; in this context, the publication of standardized protocols with the inclusion of relevant standards would contribute to a better comparability of studies, thus allowing more robust physiological and translational conclusions. Furthermore, the development of glycoinformatics resources, including curated databases as presented in this work as well as additional tools accounting for quantitative information, is of crucial importance for comparing studies. Such tools would help to model variations associated with brain diseases, unveil new associations and new trends and, consequently, identify new therapeutic targets.
The brain has long-recognized, distinguishing patterns of protein glycosylation that are also reflected in CSF molecular composition. Since deregulation of glycosylation occurs in several CNS diseases, monitoring glycosylation changes in the CSF appears as a particularly promising field of research to identify novel disease biomarkers. For example, in AD bisecting GlcNAc has appeared as a potential but still controversial target, as has increased global glycosylation. On the other hand, O-GlcNAcylation has been studied for a longer time in AD and also in other neurodegenerative diseases, such as PD or ALS, and not only is it a potential marker but it also appears to have a protective role. The high level of fucosylation, a marked feature of brain glycoproteins, deserves further investigation, not only core fucosylation but also peripheral fucosylation, namely the functionally relevant Lewis x motif. Other structures, such as PSA, sulfated motifs or LacdiNAc, could also provide interesting targets.
In conclusion, protein glycosylation and its relevance in brain disease is currently seen in a new light due to the development of powerful analytical techniques complemented by innovative glycoinformatics tools. The urge to understand disease mechanisms to unveil novel therapeutic targets also pushed forward this field of research. Furthermore, alterations in protein glycosylation, particularly from the CSF, constitute promising biomarker candidates towards a precision medicine approach.

CRediT authorship contribution statement
Conceptualization and literature search by JC. Data curation and initial draft preparation by JC and CH. Artwork done by JC, CH and FL. Brain dataset implementation by CH and FL. Writing and revision by JC, CH and FL.

Declaration of Competing Interest
none.