Evolutional and clinical implications of the epigenetic regulation of protein glycosylation

Protein N glycosylation is an ancient posttranslational modification that enriches protein structure and function. The addition of one or more complex oligosaccharides (glycans) to the backbones of the majority of eukaryotic proteins makes the glycoproteome several orders of magnitude more complex than the proteome itself. Contrary to polypeptides, which are defined by a sequence of nucleotides in the corresponding genes, glycan parts of glycoproteins are synthesized by the activity of hundreds of factors forming a complex dynamic network. These are defined by both the DNA sequence and the modes of regulating gene expression levels of all the genes involved in N glycosylation. Due to the absence of a direct genetic template, glycans are particularly versatile and apparently a large part of human variation derives from differences in protein glycosylation. However, composition of the individual glycome is temporally very constant, indicating the existence of stable regulatory mechanisms. Studies of epigenetic mechanisms involved in protein glycosylation are still scarce, but the results suggest that they might not only be important for the maintenance of a particular glycophenotype through cell division and potentially across generations but also for the introduction of changes during the adaptive evolution.


Protein glycosylation is a ubiquitous posttranslational modification
Glycosylation is an evolutionarily ancient protein modification found in all three domains of life: Archaea, Bacteria and Eukarya (Calo et al. 2010; Weerapana and Imperiali 2006). Over half of all known eukaryotic proteins are N-glycosylated in a process that starts with the sequential addition of sugar molecules to dolichol pyrophosphate and continues with the transfer of the resulting branched structure (glycan) to asparagine residues within the sequence Asn-X-Ser/Thr of a target protein (Apweiler et al. 1999). The sequential assembly of glycan structures occurs in the endoplasmic reticulum through the enzymatic activity of transferases belonging to a gene family termed ALG, whereas the transfer to a target protein is catalyzed by oligosaccharyl transferase (Helenius and Aebi 2001) (Fig. 1). During their passage through the Golgi, N-glycans can be further modified by core fucosylation, which primarily affects integrins involved in cell-cell adhesion and motility (Potapenko et al. 2010). Additionally, following proper protein folding, glycan moieties can be modified by branching, which has been related to receptor signalling (Partridge et al. 2004) and cancer development (Dube and Bertozzi 2005).
Alternatively, glycans can be added to serine or threonine residues in a process called O glycosylation. Rather than proceeding through a dolichol-based core structure, O glycosylation involves the direct transfer of N-acetylgalactosamine (GalNAc) to a protein, creating a Tn antigen (Ten Hagen et al. 2003), which can be further converted into several different O-glycan structures including sialyl Tn, T antigen, sialyl T and disialyl T (Dalziel et al. 2001; Potapenko et al. 2010). O glycosylation is an essential modification of glycoconjugates called mucins, which play a role in the control of the immune response (Varki and Angata 2006) and in carcinogenesis (Hollingsworth and Swanson 2004).
In this review, we will mainly focus on N glycosylation. Its importance was indicated by a study in which germline deletion of the gene coding for the enzyme GPT (UDP-GlcNAc: dolichol phosphate N-acetylglucosamine-1-phosphate transferase), which catalyzes the first step in glycan precursor biosynthesis, led to embryonic lethality in mice (Marek et al. 1999). In unicellular organisms, glycans generally function only as structural components of the cell membrane, while in multicellular organisms they acquired various complex functions needed to integrate numerous cells into a single functional unit (Drickamer and Taylor 1998; Varki 1993).
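The Asn-X-Ser/Thr acceptor sequence described above lends itself to a simple sequence scan. The following is a minimal sketch, not a predictor of actual glycosylation: it only lists candidate sites. It assumes the common convention that X can be any residue except proline (the text above specifies only Asn-X-Ser/Thr), and the function name and toy sequence are hypothetical.

```python
import re

# Assumption: X is any residue except proline. This is a widely used
# convention, not something stated in the text above.
# A lookahead is used so that overlapping candidate sites are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(protein_sequence: str) -> list[int]:
    """Return 1-based positions of Asn residues in candidate N-glycosylation sequons."""
    sequence = protein_sequence.upper()
    return [match.start() + 1 for match in SEQUON.finditer(sequence)]


if __name__ == "__main__":
    toy_sequence = "MNGSAANPTLNVTK"  # hypothetical peptide, for illustration only
    print(find_sequons(toy_sequence))  # [2, 11]; the Asn in N-P-T is skipped
```

A matching sequon is necessary but not sufficient: whether a given site is occupied, and by which glycan, depends on the cellular machinery discussed in the following sections.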

Glycans are promising disease biomarkers
At least 2,000 different glycan determinants have been found to exist in mammalian glycoproteins (Cummings 2009) and between two and five glycans are attached to a single glycoprotein. This results in an exceedingly complex glycoproteome (defined as the complete set of all glycoproteins in an organism), estimated to be at least several orders of magnitude more complex than the proteome itself (Lee et al. 2005). A typical glycan is a complex molecule containing between 10 and 15 monosaccharides. Contrary to proteins and DNA which are linear molecules, glycans are nonlinear branched structures that are characterized not only by the sequence of monomeric units but also by the exact position of the glycosidic bond, its anomeric configuration (α or β), the number of branches and the position of branching.
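To illustrate why a branched glycan cannot be written as a plain linear string, the sketch below models each monosaccharide as a tree node that records the anomeric configuration and the position of its glycosidic bond to the parent residue. The class and function names are hypothetical, and the example encodes only the common N-glycan core, Manα1-6(Manα1-3)Manβ1-4GlcNAcβ1-4GlcNAc.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Residue:
    """One monosaccharide plus the bond linking it to its parent residue."""
    sugar: str                    # e.g. "Man", "GlcNAc"
    anomer: str | None = None     # "alpha" or "beta"; None for the root residue
    position: int | None = None   # parent carbon carrying the glycosidic bond
    children: list[Residue] = field(default_factory=list)


def count_residues(residue: Residue) -> int:
    """Total number of monosaccharides in the tree rooted at `residue`."""
    return 1 + sum(count_residues(child) for child in residue.children)


def count_branch_points(residue: Residue) -> int:
    """Number of residues that carry more than one substituent."""
    here = 1 if len(residue.children) > 1 else 0
    return here + sum(count_branch_points(child) for child in residue.children)


# The common N-glycan core: two alpha-linked mannoses on a beta-linked mannose,
# attached to a chitobiose (GlcNAc-GlcNAc) stem.
core = Residue("GlcNAc", children=[
    Residue("GlcNAc", "beta", 4, children=[
        Residue("Man", "beta", 4, children=[
            Residue("Man", "alpha", 3),
            Residue("Man", "alpha", 6),
        ]),
    ]),
])

if __name__ == "__main__":
    print(count_residues(core), count_branch_points(core))  # 5 1
```

Two glycans with identical monosaccharide composition differ in this representation as soon as a single `anomer`, `position` or branching pattern differs, which is the source of the structural diversity described above.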
The genealogy and biosynthetic pathways of the glycan and the polypeptide parts of a glycoprotein are different. Nevertheless, once synthesized, a glycoprotein functions as a single unit with defined structural and functional properties, which originate from both the protein and the glycan moiety of the molecule (Skropeta 2009). The prominent peculiarity of the glycan moieties of glycoproteins, in contrast to the polypeptide moieties, is that our genome does not contain genetic templates that code for them. Instead, glycan structures result from the activity of a dynamic network of over 600 "glycogenes" (Fig. 1) (Taniguchi et al. 2002) that code for various glycosyltransferases, glycosidases, enzymes for sugar nucleotide biosynthesis, transporters, etc. (Abbott et al. 2008; Nairn et al. 2008). While the production of a novel polypeptide structure requires a change in the DNA sequence, which could have unfavourable effects on genome stability, novel glycan structures can also result from a modulation of gene expression or from a change in the activity and/or localization of any of the enzymes involved in their production. Moreover, the expression of "glycogenes" can be affected by various transcription factors, Golgi organizers, proton pumps, etc., which further increases the number of genes and protein products that are directly or indirectly involved in the synthesis of glycans. Since these changes are subject to environmental influences (Lauc and Zoldos 2010), they are less predetermined than classical Mendelian mutations and can lead to an altered repertoire of glycan structures, as has been found in virtually all diseases (Alavi and Axford 2008; Axford 1999; Brockhausen et al. 1998; Lebrilla and An 2009).
Even though epigenetic effects add to the complexity of glycan structures, "classical" mutations in DNA also lead to variability in glycan synthesis. These rarely affect genes coding for the core N-glycan structure, but are more common in genes involved in the production of glycan antennae, apparently causing a large part of the individual phenotypic variation that exists in humans and other higher organisms. The majority of human variability originates from single nucleotide polymorphisms (SNPs). Individually, they do not manifest visible phenotypes, but if present in specific combinations within the same individual, they can have significant phenotypic effects (Brown 2009; Dhiman et al. 2010; Ovsyannikova et al. 2010). Due to the involvement of hundreds of genes in glycan synthesis, glycosylation is particularly prone to this type of variability. Some combinations of individual SNPs can be manifested as specific glycophenotypes, which might represent potential evolutionary advantages or disadvantages. For example, various forms of congenital disorders of glycosylation have been linked to combinations of mutations (Freeze 2006). Each mutation alone results in a decrease of enzyme activity, but when combined in the same individual, they can result in a complex phenotype associated with significant mortality and extensive motor, immunological, digestive and neurological symptoms (Freeze 2002, 2006). Many of these genes have been identified as important in dolichol-linked oligosaccharide biosynthesis (Haeuptle and Hennet 2009). Taking this into consideration, protein glycosylation may represent by far the most complex and the most expensive metabolic pathway in eukaryotes. In addition, since their synthesis depends on both the presence of genetic polymorphisms and possible environmental changes, glycans are very promising biomarkers.

Glycans are essential regulators of a number of different biological processes
Variations in glycosylation are of great physiological significance since an altered glycan can significantly change the structure and function of the whole glycoprotein (Skropeta 2009). For example, proper glycosylation of Asn 297 is essential for the pro-inflammatory activity of immunoglobulin G (IgG) antibodies during the humoral immune response. The glycan moiety maintains the IgG heavy chains in the proper conformation to bind the Fc receptor FcγR (Arnold et al. 2007). Consequently, the enzymatic removal of this glycan significantly reduces binding to FcγR and the pro-inflammatory activity in vivo. Interestingly, its modification by sialylation converts the pro-inflammatory effect of IgG into an anti-inflammatory effect through a still unknown mechanism (Anthony et al. 2008; Anthony and Ravetch 2010). Therefore, knowledge of the exact glycan structure is important both for understanding normal immunoglobulin function in the immune response and for the intravenous application of highly purified, polyclonal IgG antibodies administered in several autoimmune diseases (Nimmerjahn and Ravetch 2008). However, glycan modification can also have undesirable effects, as in the case of fucose addition to the glycan core of IgG. Its presence there interferes with IgG binding to FcγRIIIa, leading to an attenuated capacity to destroy invading cells through antibody-dependent cellular cytotoxicity (ADCC). The exact glycan structure seems to be essential in this process, since IgG molecules lacking core fucose were found to have ADCC activity enhanced by up to 50- to 100-fold (Shields et al. 2002). Importantly, endogenous serum IgG seems to inhibit the ADCC induced by therapeutic antibodies by competing for FcγRIIIa-binding sites (Preithner et al. 2006). However, the removal of fucose from the therapeutic IgG resulted in a higher affinity for FcγRIIIa and a better response (Iida et al. 2006). Still, therapeutic success will depend on the ratio of core fucosylation of host versus administered IgGs. In a recent large population study, we observed that the proportion of non-core-fucosylated IgG in the normal population ranges between 1.5% and 21% (manuscript in preparation).
The biological activities of many other signalling and receptor proteins (including Notch, GLUT4, NMDA receptor, etc.) are also modulated by glycosylation, which appears to be one of the main mechanisms for the adaptive regulation of the cell surface in cell-cell adhesion and cellular communication. Proper glycosylation of membrane receptors is particularly important as it modulates adaptive properties of the cell membrane and affects communication between the cell and its environment (Dennis et al. 2009). Deregulation of glycosylation is associated with a wide range of diseases including cancer, diabetes, cardiovascular, congenital, immunological and infectious disorders (Crocker et al. 2007;Marth and Grewal 2008;Ohtsubo and Marth 2006).

Epigenetic regulation affects glycan biosynthesis
Recently, we performed the first large-scale analysis of the human plasma glycome (defined as the complete set of glycan structures in an organism), which revealed unusually high variability in the levels of individual glycans in human plasma. The median difference between minimal and maximal levels of individual plasma glycans was found to be over sixfold, much more than recorded for any other class of macromolecules. The measured personal and lifestyle parameters (age, body mass index, smoking, etc.) were found to explain only a small part of the observed variability in the analyzed glycans. In addition, the individual plasma glycome changes very little even after a prolonged period of time. All this argues in favour of the genetic predetermination of the individual glycome composition but also emphasizes the necessity to validate the importance of epigenetic factors that could play a role in establishing a person's glycome profile. Several clearly identifiable glycophenotypes, which differed significantly from the normal glycophenotype in the levels of one or more glycans, were observed in both European and Chinese populations. Some of these phenotypes were associated with specific pathological conditions, while others apparently did not have any identifiable adverse consequences for health. The heritability of glycans is generally below 50%, indicating that the temporal stability of the glycome is the result of regulatory mechanisms that maintain stable regulation of protein glycosylation over time but are not necessarily heritable.
Since glycosylation is a very complex metabolic process involving hundreds of factors encoded by a large group of "glycogenes" (Nairn et al. 2008), the structures of the final products, glycans, are not solely genetically predefined (Fig. 2). Along with the contribution of genetic polymorphisms to the variability of glycan structures, much of the variance results from a complex network of interactions between competing enzyme activities, the supply of activated monosaccharide donors (nucleotide sugars), transport through the Golgi and many other factors. In addition, the regulation of the activity and subcellular localization of hundreds of glycosyltransferases, glycosidases and other enzymes is involved in the process. Stable epigenetic regulation of the expression of these "glycogenes" is an obvious mechanism that can explain both the temporal stability of the glycome in "normal" conditions and the specific changes reported to appear in various diseases (Alavi and Axford 2008; Gornik and Lauc 2008). The number of studies on epigenetic modifications involved in protein glycosylation is still limited, but their results indicate that this process is very important for the regulation of protein glycosylation (Zoldoš et al. 2010).
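The variability measure quoted above can be made concrete with a small calculation. This is a sketch under stated assumptions: it interprets the "difference between minimal and maximal levels" as the ratio of the highest to the lowest level of each glycan across individuals, and then takes the median of those ratios across glycans. The function names and the toy values are hypothetical and are not data from the study.

```python
from statistics import median


def fold_range(levels: list[float]) -> float:
    """Ratio of the highest to the lowest level of one glycan across individuals."""
    if len(levels) < 2:
        raise ValueError("at least two measurements are needed")
    if min(levels) <= 0:
        raise ValueError("levels must be positive to form a ratio")
    return max(levels) / min(levels)


def median_fold_range(glycan_levels: dict[str, list[float]]) -> float:
    """Median, across glycans, of each glycan's max/min ratio."""
    return median(fold_range(levels) for levels in glycan_levels.values())


if __name__ == "__main__":
    # Hypothetical relative abundances for three glycan peaks in three people.
    toy_levels = {
        "peak_1": [1.0, 2.0, 4.0],   # 4-fold range
        "peak_2": [0.5, 1.0, 3.0],   # 6-fold range
        "peak_3": [2.0, 3.0, 20.0],  # 10-fold range
    }
    print(median_fold_range(toy_levels))  # 6.0
```

A max/min ratio is sensitive to outliers, so in practice a percentile-based range may be preferable; the ratio is used here only because it mirrors the wording of the text.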

Functional implications of epigenetic regulation of protein glycosylation in cancer
Glycans participate in all major physiological events during the various stages of tumour progression, from tumour cell proliferation, dissociation from the primary tumour and dissemination through the bloodstream to metastasis and angiogenesis (Fuster and Esko 2005; Potapenko et al. 2010; Reis et al. 2010). Drastic changes in the carbohydrate determinants facilitating these events are often caused by altered expression levels of glycosyltransferase genes. Consistent with Knudson's two-hit hypothesis, promoter hypermethylation of a glycogene could, along with allelic loss or loss of heterozygosity, be one of the events causing this alteration (Kim et al. 2008). Indeed, both occurrences were responsible for the downregulation of A/B glycosyltransferases, which in turn resulted in a decreased expression of blood group A and B antigens in tumours (Dabelsteen and Gao 2005). The role of promoter methylation was further confirmed by treating gastric and colon cancer cell lines with the methylation inhibitor 5-aza-2'-deoxycytidine (5-aza-dC), which resulted in the upregulation of α1,3-N-acetylgalactosaminyltransferase, the enzyme responsible for the expression of the A determinant (Kawamura et al. 2008). Other examples of the regulation of glycogene expression by promoter methylation include the N-acetylglucosaminyltransferases GnT-IVa and GnT-IVb in pancreas (Ide et al. 2006), the α-1,3/4 fucosyltransferase (FUT) 3 in gastric carcinoma cell lines (Serpa et al. 2006) and FUT7 in leukocytes (Syrbe et al. 2004).
Fig. 2 Glycan structures integrate the activity of hundreds of genes. An example of structural variations in IgG glycans is presented. The initial GlcNAc2Man3GlcNAc2 structure (red square) can be modified by the addition of bisecting GlcNAc (GnTIII), fucose (FUT8) or galactose (GalT). The resulting structures can be further modified by the activity of the same enzymes or by the addition of sialic acid (SiaT)
Around half of the known human genes, including many glycogenes, contain CpG-rich sites within their promoter regions. One of the more recent studies undertook a more extensive analysis of the involvement of DNA methylation in the regulation of fucosylation in several cancer cell lines with relatively low fucosylation levels. Interestingly, treatment with the methyltransferase inhibitor zebularine resulted in the upregulation of the expression of fucosylation-related genes, including Fut4, GDP-fucose transporter and FX genes, leading to an increase in the global fucosylation level (Moriwaki et al. 2010). Still, the presence of a CpG island in a gene promoter does not necessarily imply control of gene expression by DNA methylation, as shown for the fucosyltransferase genes FUT1 and FUT2, whose expression levels were not recovered following treatment with 5-aza-dC (Kawamura et al. 2008).
Another interesting observation came from studying the increased expression of the sialyl Lewis a antigen in cancers of the digestive organs, where it serves as a ligand for E-selectin, thus mediating metastasis (Kannagi 1997). A very similar glycan epitope, disialyl Lewis a, is preferentially synthesized in normal epithelial cells by the action of an α2→6 sialyltransferase (ST6GalNAc6), which adds an additional sialyl group to a sialyl Lewis a glycan (Tsuchida et al. 2003). Previously, the common strategy to explain an enhanced synthesis of cancer-related glycans was to look for a potential increase in the transcription levels of the genes involved in their synthesis (Ito et al. 1997). Surprisingly, in this case, it was the downregulation of ST6GalNAc6 that resulted in the overexpression of sialyl Lewis a. The decrease in expression level was achieved by epigenetic mechanisms, as shown in cultured human colon cells, where treatment with butyrate, an inhibitor of histone deacetylation, or 5-azacytidine, an inhibitor of DNA methylation, induced the reexpression of disialyl Lewis a (Miyazaki et al. 2004). This example argues in favour of the dominance of the "incomplete synthesis" over the "neosynthesis" concept in the production of some complex carbohydrate determinants. Accordingly, it was shown that the downregulation in cancer cell lines of various glycogenes involved in the synthesis of normally expressed determinants occurs primarily by promoter methylation (Kawamura et al. 2008), which could potentially result in the upregulation of other cancer-associated carbohydrate antigens.
Nevertheless, complex glycans are often found in malignant cells with high metastatic potential. These cells are characterized by an increased expression of β1,6-branched N-linked glycans, resulting from the upregulated expression of GnT-V (also known as Mgat5). These complex glycan structures are found on different proteins, including the adhesion molecule E-cadherin, where they promote cell detachment and invasion (Dennis et al. 2002), and tissue inhibitor of metalloproteinase-1 in human colon cancer cells, where they increase metastatic potential (Kim et al. 2008). The observed upregulation of GnT-V expression was positively correlated with the expression of the transcription factor ets-1 in various cancer cell lines. On the other hand, glycan branching was shown to be inhibited by the overexpression of GnT-III, thus reducing the metastatic potential of cancer cells. Recently, we found a correlation between promoter methylation of the transcription factor hepatocyte nuclear factor 1A (HNF1A) and the expression of complex branched glycans (unpublished data). Thus, an intricate network of various factors seems to be involved in the synthesis of complex glycan structures. To elucidate a cell's capacity to synthesize a particular glycan, one needs to understand the modes of regulating the expression not only of glycogenes but also of transcription factors and all other glycan-related genes involved in the synthesis of that glycan structure. Therefore, we believe future efforts will be directed towards understanding the epigenetic control of the expression of all factors involved in glycan synthesis, primarily through DNA methylation of promoter sequences, but also in coordination with histone modification marks of particular genomic regions.

Evolutional implications of the epigenetic regulation of protein glycosylation
The variability of glycan structures within a particular species and between different species by far exceeds the variability of proteins and other macromolecules. From the evolutionary perspective, this seems very logical since proteins perform specific tasks (enzymatic reaction, transport, recognition, etc.) and a change in their structure or activity might result in losing the ability to perform their dedicated task. On the other hand, since glycans primarily modulate protein function, they can vary much more freely. The observed large variety of glycan structures might result from small alterations in the structure and/or regulation of the different factors that are combined into the complex pathway regulating glycosylation.
Therefore, while small differences between individual genes can hardly account for the versatility of life forms, a slight modification in a pathway that involves the synergistic action of various factors could have very significant consequences for the physiology of a cell. A good comparison might be drawn with how traffic functions in a big city: a slight modification in the traffic light timing at several main crossings could disrupt smooth traffic and eventually lead to complete chaos. On the other hand, it could also lead to the creation of novel, alternative passageways. Therefore, in the context of living cells, there is a need for precise coordination of the activity of all the factors involved in cellular processes. However, an alteration introduced by environmental factors might prove to be beneficial for a cell or an organism in a given environment. While changes in protein structures may arise only as a consequence of mutations that irreversibly alter genetic information and can be validated only in the second generation, changes in glycan structures are reversible and can be repeatedly evaluated within the same organism; an example comes from a study of the crosstalk between intestinal glycans and commensal intestinal bacteria (Hooper and Gordon 2001).
While on an evolutionary scale higher organisms are subject to genetic mutations that improve their phenotypes in response to a given environment, on a lifetime scale they use much more sophisticated tools to adapt to environmental changes, and these include the epigenetic regulation of gene expression. Nutritional factors such as folate methyl donors, inorganic contaminants, drugs, endocrine disruptors, phytoestrogens, chemicals used in agriculture, etc., are environmental factors that can quickly influence cellular enzymatic processes. This occurs through alterations in the expression status of relevant genes in a specific cell type, which is then inherited through numerous cell cycles. It is easily envisioned that an epigenetic modulation of glycogene expression levels resulting in the synthesis of better-adapted glycan structures could render the organism more fit for survival in a given environment (food sources, specific commensal or pathogenic microorganisms, etc.).
The evolutionary changes in glycan structures have been studied particularly in the context of host-pathogen interactions. It seems that host organisms induce structural changes in their glycan repertoire in order to evade invasion by pathogenic microbes (Varki 2006). This often occurs through a mutation that irreversibly inactivates one or more genes along the glycan assembly line. Advantageously for the host, this might prevent recognition by the pathogen. Nevertheless, it can potentially result in a complete loss of glycan(s) with important endogenous function(s) (Bishop and Gagneux 2007). In order to avoid potentially detrimental consequences for the host organism, alternative mechanisms might have evolved that allow a more subtle control, shutting down glycogene expression in case of a pathogen invasion and reestablishing it afterwards. Here, the reversibility and plasticity of the epigenetic modes of controlling gene expression may prove to be more advantageous than genetic mechanisms involving changes in the DNA sequence itself. Favouring this hypothesis is the finding that many glycosyltransferase genes are conserved between different species (Esko and Selleck 2002; Harduin-Lepers et al. 2005), even though the same conserved glycoprotein can carry different glycan structures (Gagneux and Varki 1999).
Jean-Baptiste Lamarck's idea that characteristics acquired during an individual's lifetime can be passed on to the offspring has recently been reconsidered in the light of epigenetics. Over the past decade, it has become increasingly clear that environmental factors, such as diet or stress, can have biological consequences that are transmitted to the offspring without a single change in the gene sequence. Epigenetics, by definition, involves various modes of reading the genetic information. When a new mode arises in a population, it would also be subject to evolutionary forces, including natural selection, genetic drift, etc., leading to its potential heritability. Even though epigenetic transgenerational inheritance is still an insufficiently understood process (Rakyan and Beck 2006; Richards 2006), there are examples favouring its existence. It has been shown that some epialleles can be transmitted to the offspring due to the incomplete erasure of cytosine methylation marks during the genome-wide resetting that normally occurs during early development in mammals (Morgan et al. 2005). It is tempting to speculate that the transmission of an acquired epigenetic regulatory status of glycogenes to a germ cell could be an essential evolutionary mechanism, which would enable the adaptation of complex organisms to environmental changes while preserving the precious genetic heritage. Whether this would entail skipping the epigenetic resetting during early development and gametogenesis, or whether some other mechanisms might be at hand, remains to be elucidated.