Reading the glyco-code: New approaches to studying protein–carbohydrate interactions

The surface of all living cells is decorated with carbohydrate molecules. Hundreds of functional proteins bind to these glycosylated ligands; such binding events subsequently modulate many aspects of protein and cell function. Identifying ligands for glycan-binding proteins (GBPs) is a defining challenge of glycoscience research. Here, we review recent advances that are allowing protein-carbohydrate interactions to be dissected with an unprecedented level of precision. We specifically highlight how cell-based glycan arrays and glycogenomic profiling are being used to define the structural determinants of glycan-protein interactions in living cells. Going forward, these methods create exciting new opportunities for the study of glycans in physiology and disease.


This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

Introduction
Cellular glycosylation is a molecular language of staggering complexity. The "letters" of the human glyco-code consist of 10 monosaccharides that can be linked together in many different ways [1]. The resulting cell surface glycans are produced through the combined activity of hundreds of writer enzymes, principally glycosyltransferases. Writer enzymes build oligosaccharide chains that are covalently attached to thousands of protein and lipid scaffolds [1]. The resulting collection of glycoprotein and glycolipid antigens encodes biological information that is subsequently translated by glycan-binding proteins (GBPs). These factors can be thought of as readers of the cellular glyco-code. Binding between a reader protein and a glycosylated ligand can trigger cell signalling pathways, drive cell-cell adhesion and nucleate new protein-protein interactions (Figure 1) [1]. Recent advances in translational glycoscience (reviewed in Ref. [2]) have underlined the significance of these protein-glycan interactions to the pathogenesis and treatment of disease. For example, the Siglec family of glycan-binding immune receptors has become a hot target for cancer immunotherapy in recent years [3]. Siglecs are inhibitory receptors that bind to sialic acid-containing glycan ligands, which are frequently overexpressed on the surface of cancer cells [3]. Blockade of Siglec signalling with targeted antibodies, enzymatic removal of cell surface sialoglycans, and targeted suppression of sialic acid synthesis have all recently been shown to induce potent antitumour immune responses in vivo [4][5][6]. Other GBPs, such as the soluble galectins and the functionally diverse C-type lectin family, are similarly emerging as targets for various diseases [7][8][9][10].
Every GBP possesses at least one cognate ligand that regulates its biological function. Characterization of these ligands can produce key insights into the physiological role of a specific GBP. Ligands, once identified, can also provide new molecular targets for therapeutic intervention [4,11]. Identification of ligands for GBPs, however, is not a simple task. Glycan-protein interactions are influenced by a combination of several factors. Firstly, glycans vary both in their monosaccharide composition and in the stereochemistry of the linkages between sugar subunits. Most GBPs exhibit both monosaccharide and linkage-based binding preferences [1,3]. Secondly, while some GBPs display little specificity beyond recognition of the sugar itself, many other GBPs can interact with non-carbohydrate elements of protein and lipid scaffolds in ways that strengthen both the affinity and specificity of binding [12,13]. Finally, glycan-protein interactions are frequently influenced by the valency and topology of glycan presentation on the cell surface. Prior work has convincingly demonstrated that a GBP can show markedly different affinities for the same glycan structure depending on how closely glycan units are clustered together on the cell membrane [14,15].
As a result of this complexity, it is first important to clearly define what constitutes a ligand for a GBP. In our view, ligands should not always be thought of as isolated glycan structures that can be attached to many redundant scaffolds. Numerous studies have demonstrated that the nature of the underlying scaffold can be critical for understanding glycan-protein binding events [12,16]. It is also inadequate merely to identify glycosylated proteins that happen to be bound by a GBP. Many proteins exist in multiple glycosylation states, and often only one of these glycoforms will actually serve as a functional ligand [17][18][19]. It is instead best to conceptualize a ligand as a three-dimensional macromolecular motif that exhibits a distinguishing set of carbohydrate, proteinaceous and topological features. How do we define what these features are for any given ligand? In this Mini-Review, we highlight recent methodological advances that are providing new options for researchers interested in this question.

[...] glycosylation states of protein interactors. While these methods are as old as glycoscience itself, technical advances in mass spectrometry-based glycoproteomics continue to improve their sensitivity and resolution [20]. In recent years, for example, this basic strategy has been applied to identify novel interactors of immune-regulatory GBPs [21][22][23][24] and microbial glycoproteases [25], among many other examples. New platforms for engineering soluble, multimeric GBPs with higher target-binding affinities also provide improved tools for conducting these studies [26].
In some cases, glycan-protein interactions cannot be preserved under the denaturing conditions required for cell lysis and affinity capture. A number of key studies have shown that such transient glycan-protein contacts can be effectively captured through the use of proximity labeling approaches [27][28][29][30][31]. In one version of this strategy, recombinant GBPs are genetically fused to an enzyme such as ascorbate peroxidase (APEX) or horseradish peroxidase (HRP) [27,29]. These reagents are then incubated with intact cells. An enzyme-catalyzed reaction is subsequently performed that covalently biotinylates proteins in close proximity to the GBP. Affinity capture and mass spectrometry can then be used to identify novel scaffolding proteins that might interact with the GBP [27][28][29][30][31]. Notably, proximity labeling methods vary significantly in the range (the "diffusion radius") within which interacting proteins are labelled. Different methods can thus be expected to capture different interacting partners, which may bind the GBP either directly or indirectly [27][28][29][30][31].
Another notable advance has been the development of photocrosslinking monosaccharide analogues, which can be used to chemically fix glycan-protein interactions. In this approach, intact cells are incubated with a synthetic monosaccharide bearing a caged cross-linking group, which is subsequently incorporated into a broad range of cell surface glycans (Figure 2). Cells can then be incubated with recombinant GBPs, which will selectively bind to a subset of these glycans. Following exposure to light, the cross-linking moiety reacts to form covalent links between the GBP and any glycoprotein scaffolds to which the GBP has bound. Simple affinity purification protocols can then be used to identify these crosslinked scaffolding proteins and their appended glycans. These techniques were recently reviewed in Ref. [32].
Photocrosslinking sialic acid analogues represent one class of tools that have been developed and used to study glycan-protein interactions in recent years [33][34][35]. More recently, the applicability of this technique has been greatly expanded by the generation of the first photocrosslinking N-acetylglucosamine analogue, which can be used to broadly identify N-glycosylated ligands for various GBPs [36]. The development of these tools has been paralleled by the arrival of new methods for engineering cellular glycosylation pathways. Recent reports have shown how "bump-hole" enzyme engineering strategies can expand the scope of chemically modified sugars accepted by the cellular glycan biosynthetic machinery, further expanding possible applications of future photocrosslinkers [37][38][39][40]. Methods development in this space has been rapid, and it will be exciting to see whether these techniques can be integrated and applied to a wider range of biological questions.
It is not possible, however, to profile glycan-protein binding through biochemical approaches alone. In pulldown experiments, for example, cell lysis removes proteins from their native cell membrane context, which can be a crucial factor influencing GBP-ligand interactions. Proteins that interact with recombinant GBPs in a pulldown experiment will not necessarily act as functional ligands for the GBP in situ. More specialized approaches to studying these binding events in intact cells are thus also required.

The future of glycan arrays
Unlike proteins, which can be easily recombinantly expressed, glycans must either be chemically synthesized or isolated from natural sources [41]. If enough distinct glycans can be accessed and purified, these molecules can be coupled to a solid support, generating a synthetic "glycan microarray" [41]. These arrays can then be probed with recombinant GBPs, yielding basic insights into oligosaccharide-binding specificity [41]. Recent leaps in the automation and accessibility of high-throughput glycan synthesis have greatly expanded the range of glycan structures that can be interrogated using this method [42,43]. While synthetic glycan arrays are now a mature technology (see Ref. [41] for a more comprehensive review), these tools continue to generate important findings. One recent study, for example, used glycan arrays to discover that the co-stimulatory immune receptor CD28 binds in vitro to glycans containing sialic acid. This initial finding sparked further biological investigation that revealed a whole new role for sialoglycans in regulation of T-cell activation [44]. Additionally, a huge amount of glycan array data is now easily accessible to the glycobiology community. These results can thus be leveraged to generate new discoveries through the application of modern data analysis pipelines. One recent work, for example, systematically analyzed a range of publicly available glycan microarray results with a novel machine learning algorithm [45]. This approach correctly predicted several aspects of GBP binding specificities that had been missed by prior studies [45].
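To make the idea of re-analysing public glycan array data concrete, the following is a minimal sketch of a motif-level comparison. It is not the machine learning algorithm of Ref. [45]; it is a much simpler stand-in that asks, for each annotated motif, whether array glycans carrying that motif give a higher mean log signal than glycans lacking it. The motif labels and fluorescence values are invented for illustration, and the function name `motif_effects` is hypothetical.

```python
import math
from statistics import mean


def motif_effects(array_data):
    """Score each motif by the difference in mean log10 signal between
    glycans that contain the motif and glycans that do not.

    array_data: list of (set_of_motif_labels, fluorescence_signal) pairs.
    """
    all_motifs = set().union(*(motifs for motifs, _ in array_data))
    effects = {}
    for motif in all_motifs:
        with_motif = [math.log10(signal + 1)
                      for motifs, signal in array_data if motif in motifs]
        without_motif = [math.log10(signal + 1)
                         for motifs, signal in array_data if motif not in motifs]
        # A motif present on every glycan (or on none) cannot be scored.
        if with_motif and without_motif:
            effects[motif] = mean(with_motif) - mean(without_motif)
    return effects


# Invented example: one GBP probed against five array glycans.
array_data = [
    ({"Neu5Ac(a2-3)Gal", "Gal(b1-4)GlcNAc"}, 12000),
    ({"Neu5Ac(a2-6)Gal", "Gal(b1-4)GlcNAc"}, 300),
    ({"Gal(b1-4)GlcNAc"}, 150),
    ({"Neu5Ac(a2-3)Gal", "Fuc(a1-3)GlcNAc"}, 9000),
    ({"Fuc(a1-3)GlcNAc"}, 200),
]

effects = motif_effects(array_data)
best_motif = max(effects, key=effects.get)
print(best_motif, round(effects[best_motif], 2))
```

In this toy data set the a2-3-linked sialic acid motif receives the highest score. Published pipelines go considerably further, for example by accounting for motif context and for the linker and density effects discussed below.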
Synthetic glycan arrays have some important limitations, however. Glycans are typically attached to an inert solid support via a non-natural linker; biologically relevant factors like glycan valency and clustering are not well recapitulated (Figure 3) [41]. There has thus long been interest in developing new chemical approaches (such as the synthesis of bio-mimetic glycopolymers) that allow glycans to be presented in a more biologically relevant context [46,47]. Several recent papers have reported new breakthroughs in this area, using a variety of innovative approaches.
In one study, researchers used Chinese hamster ovary (CHO) cells as a platform to construct "cell-based glycan arrays" [48]. As these cells express a limited repertoire of glycan structures on their surface, in vitro treatment with a small panel of recombinant glycosyltransferases allowed for installation of fucose and sialic acid into cell surface glycans in a linkage-specific manner [48]. The investigators successfully used this platform to identify a high-affinity ligand for the immune receptor Siglec-15 [48]. Another study recently reported the coupling of synthetic glycans to the surface of bacteriophage particles [49]. These efforts yielded a "liquid glycan array": a library of glycosylated phages where every glycan structure was associated with a specific DNA barcode. Crucially, this platform allows for tunable control of the valency with which glycans are displayed on the virus capsid. Pulldown of phage particles with GBPs, followed by deep sequencing of barcoded phage libraries, revealed optimal glycan structures and densities required to enable protein-glycan interactions [49]. In future, encoding glycan structural information into easily sequenced DNA libraries will significantly augment the ease and sensitivity of glycan array experiments [49].
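The readout of a DNA-barcoded array of this kind reduces to counting barcodes before and after pulldown. The sketch below illustrates that logic under simplifying assumptions: each sequencing read has already been trimmed to its barcode, each barcode maps to one (glycan, density) pair, and enrichment is scored as a log2 ratio of read fractions with a pseudocount. The barcodes, labels, read counts and the function name `barcode_enrichment` are hypothetical, and this is not the analysis pipeline used in Ref. [49].

```python
import math
from collections import Counter


def barcode_enrichment(input_reads, pulldown_reads, barcode_to_label, pseudocount=1.0):
    """Return the log2 enrichment of each barcoded conjugate in the GBP
    pulldown relative to the input library."""
    # Reads that do not match a known barcode are ignored.
    input_counts = Counter(r for r in input_reads if r in barcode_to_label)
    pulldown_counts = Counter(r for r in pulldown_reads if r in barcode_to_label)
    n_barcodes = len(barcode_to_label)
    input_total = sum(input_counts.values()) + pseudocount * n_barcodes
    pulldown_total = sum(pulldown_counts.values()) + pseudocount * n_barcodes

    scores = {}
    for barcode, label in barcode_to_label.items():
        input_fraction = (input_counts[barcode] + pseudocount) / input_total
        pulldown_fraction = (pulldown_counts[barcode] + pseudocount) / pulldown_total
        scores[label] = math.log2(pulldown_fraction / input_fraction)
    return scores


# Invented library: the same glycan at two display densities, plus a bare phage.
barcode_to_label = {
    "ACGT": ("glycan_A", 50),    # (glycan name, copies per phage)
    "TTGA": ("glycan_A", 500),
    "GGCC": ("no_glycan", 0),
}
input_reads = ["ACGT"] * 100 + ["TTGA"] * 100 + ["GGCC"] * 100
pulldown_reads = ["ACGT"] * 20 + ["TTGA"] * 160 + ["GGCC"] * 20

scores = barcode_enrichment(input_reads, pulldown_reads, barcode_to_label)
for label, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(label, round(score, 2))
```

In this toy example only the high-density conjugate is enriched, which is the kind of valency dependence that a barcoded array with tunable density is designed to detect.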
CRISPR-Cas9 genome engineering can also be used to edit glycan biosynthesis in ways that allow dissection of GBP binding specificities [50][51][52]. In one recent study, researchers used CRISPR-Cas9 gene editing to generate a panel of HEK293 cell lines bearing knockouts and knock-ins of key glycosyltransferase enzymes. This effort produced a glycosylation "atlas": a collection of cell lines exhibiting a wide range of cell surface glycosylation patterns [50,51]. In a subsequent work, these glyco-engineered cell lines were transfected with a panel of O-glycoprotein reporter constructs. Binding of transfected cells to recombinant Siglecs was then assessed, allowing for systematic dissection of both glycan and scaffolding protein-binding specificities. Intriguingly, the authors found that some Siglecs exhibited specificity for certain O-glycoprotein reporters. Additionally, these specificities changed upon perturbation of O-glycan biosynthesis [53]. This effort highlighted the complex interplay between carbohydrate and protein structural elements in determining Siglec ligand affinities. In another study, researchers used the same technology to profile the carbohydrate binding and cleavage specificity of glycoprotein-specific proteases, demonstrating the wide applicability of this approach [54]. Going forward, these biological glycan arrays are ripe for application towards studying a huge range of GBPs that have not been fully characterized using in vitro approaches.

Applying functional genomics to discover novel ligands for GBPs
Expression of a ligand for a GBP can require co-expression and repression of many genes (glycosyltransferases, glycosidases and scaffolding proteins) within a discrete biosynthetic circuit. Recent work has highlighted how cutting-edge functional genomics screening methods can be applied towards mapping these pathways (Figure 4). In one recent study, researchers transduced Cas9-expressing cells with a genome-wide library of short guide RNAs (sgRNAs) and isolated cells (via FACS) that showed reduced binding to recombinant Siglecs. sgRNAs were then sequenced to identify genes whose knockdown reduces expression of glycan ligands for these GBPs [55]. This screening study identified a set of glycosyltransferases that synthesize a specific glycan (the disialyl core 1 tetrasaccharide), as well as a specific O-linked glycoprotein called CD43. Extensive biochemical characterization revealed that Siglec-7 binds to a specific epitope at the N-terminus of CD43, which is decorated with clusters of adjacent disialyl core 1 residues [55]. Both the structure of the glycan and the nature of the scaffold are critical for enabling binding, exemplifying how composite glycopeptide motifs often act as ligands for GBPs.
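The core computation behind a sort-based screen of this type can be sketched in a few lines. The example below assumes that sgRNA read counts are available for the unsorted population and for the FACS gate with reduced GBP binding, computes a log2 ratio of read fractions for each sgRNA, and summarizes each gene by the median across its sgRNAs. Gene names, guide names, counts and the function name `gene_scores` are placeholders; dedicated screen-analysis tools add statistical testing and normalization that are omitted here, and this is not the pipeline of Ref. [55].

```python
import math
from collections import defaultdict
from statistics import median


def gene_scores(unsorted_counts, sorted_counts, guide_to_gene, pseudocount=1.0):
    """Median log2 enrichment of each gene's sgRNAs in the low-binding gate.

    A high score means that cells carrying sgRNAs against the gene are
    over-represented among cells that lost GBP binding.
    """
    total_unsorted = sum(unsorted_counts.values())
    total_sorted = sum(sorted_counts.values())
    log_ratios = defaultdict(list)
    for guide, gene in guide_to_gene.items():
        freq_unsorted = (unsorted_counts.get(guide, 0) + pseudocount) / total_unsorted
        freq_sorted = (sorted_counts.get(guide, 0) + pseudocount) / total_sorted
        log_ratios[gene].append(math.log2(freq_sorted / freq_unsorted))
    return {gene: median(values) for gene, values in log_ratios.items()}


# Placeholder library: two sgRNAs per gene plus non-targeting controls.
guide_to_gene = {
    "sgA_1": "GENE_A", "sgA_2": "GENE_A",
    "sgB_1": "GENE_B", "sgB_2": "GENE_B",
    "sgNT_1": "NON_TARGETING", "sgNT_2": "NON_TARGETING",
}
unsorted_counts = {guide: 1000 for guide in guide_to_gene}
sorted_counts = {
    "sgA_1": 4000, "sgA_2": 3000,   # enriched among low-binding cells
    "sgB_1": 500, "sgB_2": 600,
    "sgNT_1": 900, "sgNT_2": 1000,
}

scores = gene_scores(unsorted_counts, sorted_counts, guide_to_gene)
top_hit = max(scores, key=scores.get)
print(top_hit, {gene: round(score, 2) for gene, score in scores.items()})
```

In a real screen, hits of this kind would be expected to include both glycosyltransferases and scaffold proteins, which is what allows the structure of a composite ligand to be inferred from genetic data.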
Functional genomics can also be used to characterize the upstream factors that regulate these biosynthetic pathways. One work recently used CRISPR screening to show that the transcription factor KDM2B regulates ligand expression for heparan sulfate-binding proteins by coordinating expression of multiple sulfotransferase enzymes [56]. A related study used bioinformatic analysis to identify ZNF263 as another key transcription factor directing heparan sulfate biosynthesis [57]. Coupling of RNA-seq with lectin microarray technology has revealed that miRNAs act as master regulators of glycan biosynthetic pathways ([58]; recently reviewed in Ref. [59]). Finally, recent ground-breaking studies have applied scRNA-seq to correlate specific transcriptomic states with glycan structure and abundance in both primary T-cells [60] and human stem cells [61]. Taken together, these strategies provide new avenues for profiling the genetic circuits that underlie distinct cellular "glyco-states", with exciting future implications for characterizing the drivers of glycan-protein interactions.

Conclusions & future perspectives
CRISPR functional genomics and single cell profiling methods will continue to become more closely integrated with glycoscience in future years. In particular, we expect newer CRISPR-based technologies, such as programmable epigenetic silencing and base editing, to generate further opportunities at the genomics/glycomics interface [62,63]. It is also notable that attempts to study the glycome with functional genetics have so far used only single gene knockout methods. Cellular glycostates, however, frequently emerge from the concerted activity of many functionally redundant genes [1]. High-throughput screening methods that allow multiplexed, simultaneous disruption of several genes in combination are ideally suited to the study of these complex glycostates. The continuous refinement of such techniques [64,65] will enable the study of glycan-protein interactions in future.
The closely timed development of cell-based glycan arrays and glyco-genomic profiling methods has been another major theme in recent years. There are some parallels between these approaches. Recent application of these two techniques to discovery of Siglec ligands, for example, has produced some strikingly similar (although not identical) findings [53,55]. These strategies are not redundant, however, but complementary. Genome-wide screening studies cannot achieve the precise control over cellular glycosylation that is possible with targeted genetic or chemical engineering. On the other hand, the ability to broadly assess the function of thousands of genes in a single experiment, rather than a more limited array of glycosyltransferases and scaffolds, may uncover unexpected biology that would be missed by array-based studies. Interestingly, resources are now being developed that may offer a productive synthesis of these two approaches. In one recent work, for example, investigators constructed a pooled sgRNA library targeting all known glycosyltransferase enzymes and applied this library in high-throughput screens to rapidly identify genetic determinants of Selectin binding to living cells [66]. We anticipate these types of glycobiology-specific screening tools will continue to be developed and will be of great use to the research community.
Lastly, we highlight that new research is challenging fundamental assumptions in glycoscience. It has long been thought that glycans can only be attached to protein and lipid scaffolds. Genetic approaches to studying glycosylation have historically rested on the assumption that ligands for GBPs will be generated by the protein-coding genome. The recent discovery of "glyco-RNA" conjugates, and the related finding that these glycosylated nucleic acids may bind to certain Siglecs, has turned this assumption on its head [67].
These studies remind us that much remains to be discovered about how glycans regulate macromolecular interactions, and hint at the exciting work that remains to be done.

Figure 1. The cell surface glyco-code. Scaffolds are the proteins or lipids to which carbohydrate chains are attached. These molecules play a key role in the presentation of glycans on the cell surface. Writers are enzymes (principally glycosyltransferases) that build complex carbohydrate chains. These enzymes can directly link a glycan to a protein/lipid. They can also extend glycans by covalently linking new monosaccharide building blocks to existing oligosaccharide chains. Readers are glycan-binding proteins that bind to specific glycan-containing molecular motifs, which are termed ligands. Often, binding between a reader and its ligand will initiate transduction of downstream biological signaling pathways. UDP = Uridine diphosphate, CMP = Cytidine-5'-monophosphate.

Figure 2. Affinity capture of GBP-binding ligands. In one approach, cells are incubated with a synthetic monosaccharide modified with a photocrosslinking group. After incorporation of this sugar into cell surface glycans, cells are incubated with a recombinant GBP. Exposure of cells to light induces the formation of covalent cross-links between the GBP and its ligands. Proteins can then be affinity purified and subjected to glycoproteomic analysis to identify glycoproteins bound by the GBP.

Figure 3. Synthetic and biological glycan arrays. Synthetic arrays may contain glycans that form components of a physiological ligand for a GBP, but these glycans may not be presented in the correct valency and orientation to mediate high-affinity binding events. Synthetic glycans may also be anchored to a solid support via flexible linkers that do not recapitulate physiological interactions with a protein scaffold. Engineering of cell surface glycans by genetic or chemical approaches, conversely, allows for dissection of glycan-binding interactions in their correct biochemical context.

Figure 4. Glyco-genomic profiling of GBP-ligand interactions. A genetically heterogeneous cell population is either generated through CRISPR genome-wide mutagenesis or isolated from physiological sources. Cells exhibiting altered (either diminished or elevated) binding to a glycan-binding protein are isolated (via FACS) and genetically characterized by high-throughput sequencing. This method allows unbiased identification of genes required for expression of ligands for the given glycan-binding protein, information that can be used to infer structural characteristics of the ligand.

Wisnovsky Curr Opin Struct Biol. Author manuscript; available in PMC 2023 January 04.