Post-Translational Modifications of Proteins Exacerbate Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2)

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is severely affecting the worldwide population. It belongs to the coronavirus family, whose members are enveloped viruses with a protein-associated, single-stranded RNA genome. Viral proteins undergo post-translational modifications (PTMs), covalent changes to the polypeptide that in turn modulate protein function. Because the virus relies on the host cell machinery to replicate and make copies of itself, its proteins are also subject to PTMs. Glycosylation and palmitoylation of the spike and envelope proteins and phosphorylation of the nucleocapsid protein are among the major PTMs implicated in pathogenesis during the infection phase. Current knowledge of CoV protein PTMs is limited and needs to be expanded to understand the mechanism of viral pathogenesis and the effect of PTMs on the infection phase.


Introduction
SARS-CoV-2, the causative agent of the COVID-19 pandemic, had caused 168,599,045 infections and 3,507,477 deaths worldwide as of 28 May 2021, according to the World Health Organization (WHO). Despite protective immunity and vaccination, the virus continues to spread and affect the worldwide population, with increased severity of illness attributed to the absence of pre-existing immunity against SARS-CoV-2. The virus belongs to the coronavirus (CoV) family in the order Nidovirales, whose members are enveloped, positive-strand RNA viruses that cause disease in both humans and animals. SARS-CoV-2 has a single-stranded RNA genome of 30 kb. The genome comprises 6-11 open reading frames (ORFs) flanked by 5′ and 3′ untranslated regions (UTRs).
Coronaviruses are spherical, with an average diameter of 80-120 nm, and carry the trimeric S glycoprotein and, in some cases, the homodimeric HE protein [1,2]. The M glycoprotein is the most abundant protein of the virion and provides it with structural support. The E protein is also essential for virion assembly and release [3,4]. The nucleocapsid of the virus is composed of the N protein. In virus replication, the S protein (180-200 kDa) plays a major role in recognizing and binding the host cell via its cognate receptor(s). The trimeric S protein comprises the S1 and S2 subunits, with the HR1 and HR2 regions located in S2. It has an extracellular N-terminal region, a transmembrane domain and a short intracellular C-terminal segment [5]. The S protein of SARS-CoV-2 is 1273 amino acids (aa) long and consists of a signal peptide (1-13 aa) at the N-terminus, the S1 subunit (14-685 aa) and the S2 subunit (686-1273 aa). The latter two regions are responsible for receptor binding and membrane fusion, respectively. The S1 subunit comprises an N-terminal domain (14-305 aa) and the receptor-binding domain (319-541 aa). The S2 subunit comprises the fusion peptide (788-806 aa), heptad repeat 1 (HR1) (912-984 aa), HR2 (1163-1213 aa), the transmembrane domain (1213-1237 aa) and the cytoplasmic domain (1237-1273 aa) [6].
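The domain boundaries listed above can be captured in a small lookup table. The following Python sketch is illustrative only: the names `S_REGIONS` and `regions_at` are hypothetical, and the residue ranges are copied from the text as given [6], including the shared boundary residues 1213 and 1237.

```python
# Residue ranges (1-based, inclusive) as stated in the text [6].
# Names and structure are illustrative assumptions, not an established API.
S_LENGTH = 1273
S_REGIONS = [
    ("signal peptide", 1, 13),
    ("S1 subunit", 14, 685),
    ("S2 subunit", 686, 1273),
    ("N-terminal domain", 14, 305),
    ("receptor-binding domain", 319, 541),
    ("fusion peptide", 788, 806),
    ("HR1", 912, 984),
    ("HR2", 1163, 1213),
    ("transmembrane domain", 1213, 1237),
    ("cytoplasmic domain", 1237, 1273),
]


def regions_at(position):
    """Return the names of all listed regions that contain a residue position."""
    if not 1 <= position <= S_LENGTH:
        raise ValueError(f"position must be between 1 and {S_LENGTH}")
    return [name for name, start, end in S_REGIONS if start <= position <= end]


if __name__ == "__main__":
    for pos in (5, 330, 800, 1213):
        print(pos, regions_at(pos))
```

Returning a list rather than a single name keeps the subunit and the finer-grained domain together, and makes the overlapping boundary residues in the published ranges visible instead of silently picking one region.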
Post-translational modifications (PTMs) are covalent modifications of proteins that occur after their release from the ribosome. PTMs add new functional groups, such as phosphate and carbohydrates, along with other biological molecules. They are a naturally occurring process that regulates protein folding, stability, enzymatic activity, and protein-protein and cell-protein interactions. The most common PTMs include glycosylation, phosphorylation and lipidation (such as palmitoylation and myristoylation), as well as proteolytic cleavage and disulfide bond formation. Proteins can also be modified through other covalent modifications such as ubiquitination, sumoylation, glycation and neddylation.
PTMs are catalyzed by enzymes. For instance, N-linked glycosylation requires a series of enzymatic reactions that generate the dolichol-linked precursor oligosaccharide, an oligosaccharyltransferase that transfers the glycan to a specific consensus sequence (N-X-S/T, where X is any amino acid except proline), and glycosidases and glycosyltransferases that process the N-linked glycan. As another example, protein ubiquitination requires three enzymes acting sequentially: ubiquitin-activating enzymes (E1), ubiquitin-conjugating enzymes (E2) and ubiquitin ligases (E3). Because a virus uses the host machinery to replicate, viral proteins are susceptible to PTMs. Many viral proteins, including structural, non-structural and accessory proteins, are modified by PTMs that affect viral replication and pathogenesis.
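The N-X-S/T consensus described above lends itself to a simple sequence scan. What follows is a minimal Python sketch, assuming a one-letter amino acid string as input; `find_sequons` is a hypothetical helper, and the example peptides are made-up toy sequences, not real viral proteins. A match only marks a candidate site; whether it is actually glycosylated has to be determined experimentally.

```python
import re

# N, then any residue except proline, then S or T.
# The lookahead keeps matches zero-width past the N, so overlapping
# sequons (e.g. "NNTS") are all reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(sequence):
    """Return 1-based positions of Asn residues in N-X-S/T motifs (X != Pro)."""
    seq = sequence.upper()
    return [match.start() + 1 for match in SEQUON.finditer(seq)]


if __name__ == "__main__":
    toy_peptide = "MNGTNPSANAS"  # toy example: N2 and N9 qualify, N5 is followed by P
    print(find_sequons(toy_peptide))
```

A lookahead is used instead of a plain three-residue pattern because consecutive asparagines can each start a valid motif, and a consuming match would skip the second one.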

N-linked glycosylation
N-linked glycosylation of the coronavirus S protein was first reported in the 1980s [7]. The S protein of the murine coronavirus MHV acquires several mannose residues in the rough endoplasmic reticulum (ER). The S proteins of infectious bronchitis virus (IBV), transmissible gastroenteritis coronavirus (TGEV) and bovine coronavirus were also shown to be modified by N-linked glycosylation [8]. The S protein is further thought to acquire mannose oligosaccharides and undergo trimerization after entering the ER and before reaching the Golgi complex [9]. Published predictions put the number of glycosylation sites in the MHV S protein at 20 or 21 [10].
N-linked glycosylation plays an important role in maintaining the conformation of the coronavirus S protein and can affect both its binding to the host cell receptor and its antigenicity. In a study of IBV, mutation of N-linked glycosylation sites caused a significant shift in the antigenicity of the virus [11]. In another study, TGEV-infected cells incubated with tunicamycin (an inhibitor of N-linked glycosylation) showed reduced antigenicity of both the S and M proteins [12]. These findings suggest that N-linked glycosylation plays a vital role in maintaining the structure and conformation of the S protein, and that mutations at these glycosylation sites contribute to reduced antigenicity. It may therefore be proposed that mutations at the N-linked glycosylation sites of the SARS-CoV-2 S protein could reduce its antigenicity and be beneficial in controlling infection. In another study on IBV, N-D or N-Q mutations at the N-linked glycosylation sites N212 or N276 inhibited the function of the S protein and in turn hampered cell-to-cell fusion and recognition [13].
Moreover, N-linked glycosylation of dipeptidyl peptidase 4 (DPP4), the receptor of Middle East respiratory syndrome coronavirus (MERS-CoV), significantly affects binding of the MERS-CoV S protein. The E protein is another important constituent of the virus: the SARS-CoV E protein contains two N-linked glycosylation sites, N48 and N66, while the IBV E protein contains one site at N5. In a study of SARS-CoV, transfected E protein carrying an N-terminal Myc tag was shown to be glycosylated co-translationally [14]. However, two transmembrane domains are required for interaction with the SARS-CoV M protein and the hydrophilic region [14].
Previously published work showed that the M proteins of the alphacoronaviruses transmissible gastroenteritis virus (TGEV) and porcine epidemic diarrhea virus (PEDV), the gammacoronavirus IBV and turkey enteric coronavirus undergo N-linked glycosylation, which can be inhibited by tunicamycin [15][16][17]. The SARS-CoV-2 S glycoprotein possesses 22 N-linked glycosylation sites, as confirmed by recent literature [18][19][20]. N-linked glycosylation sites in the S2 subunit of the SARS-CoV-2 S glycoprotein are conserved, and the protein shows a low tendency for O-linked glycosylation. N-linked glycosylation in SARS-CoV-2 is characterized by attachment of GlcNAc to the asparagine (Asn) residue of the Asn-X-Ser/Thr consensus sequence, in which X is any amino acid except proline.
Knowledge of N-linked glycosylation is essential for understanding the localization, structure and infectivity of viruses and their interaction with host cells. These glycoproteins play an important role in immune responses, but more research is needed [21,22]. The S glycoprotein presents unique pathogen-associated molecular patterns (PAMPs) that are recognized by host pattern recognition receptors (PRRs). These PRRs include Toll-like receptors 3, 4, 7, 8 and 9 along with C-type lectins and collectins [23,24]. SARS-CoV is recognized by Toll-like receptors 3 and 4 via MyD88 and TRIF, and the same process may be proposed for the pathogenesis and infectivity of SARS-CoV-2 [22,25].

O-linked glycosylation
O-linked glycosylation contributes to the structural and functional stability of proteins and is believed to play an important role in maintaining viral integrity and the biological activities of viral proteins [26]. A previous study identified Ser673, Thr678 and Ser686 as conserved O-linked glycosylation sites in the S protein of human SARS-CoV-2 and other coronaviruses; these three sites were predicted with the NetOGlyc 4.0 server [26]. Another study predicted Thr323 and Ser325 of the S1 glycoprotein as possible O-linked glycosylation sites in SARS-CoV-2 [27].
O-linked glycosylation at Thr323 is supported by the presence of a proline at position 322, in line with the higher likelihood of finding proline residues adjacent to O-linked glycosylation sites [28]. Cryo-electron microscopy of SARS-CoV-2 indicates that binding of the S protein to the human angiotensin I-converting enzyme 2 (hACE2) receptor involves an association between the receptor-binding domain (RBD) and the hACE2 peptidase domain [29,30]. The RBD, located in the S1 subunit of the S protein, undergoes a hinge-like dynamic movement that promotes capture of the RBD by hACE2, with a 10-20-fold increase in affinity for the hACE2 receptor [31,32].
Another study found that the M protein was glycosylated normally in the presence of tunicamycin, even though N-linked glycosylation of the S protein was inhibited [33]. Analysis of the glycans attached to the M protein during O-linked glycosylation showed that they are added in a two-step process: GalNAc is added first, followed by galactose and sialic acid. After sequential acquisition of GalNAc, galactose and sialic acid, the M protein undergoes further modification in the trans-Golgi apparatus [34].
Low levels of O-linked glycosylation were recently reported in the S protein of SARS-CoV-2 [35]. These glycans regulate antibody recognition and influence priming by host proteases. Mucin-type O-linked glycosylation is characterized by GalNAc attached to the hydroxyl group of serine and threonine residues. Mucins contain a significant number of O-GalNAc glycans [36]. The presence of O-linked glycans on viral proteins suggests a vital role in their biological activity. In the SARS-CoV-2 S1 protein, O-linked glycosylation in the form of O-GalNAc and O-GlcNAc appears to be involved in the structural and functional stability of the protein. Current vaccine efforts make use of S protein glycosylation as a target.

Palmitoylation
Palmitoylation is the attachment of palmitic acid to cysteine residues (S-palmitoylation) and, less frequently, to serine and threonine residues (O-palmitoylation). In coronavirus studies, the S protein was palmitoylated in infected cells but was not palmitoylated in the presence of tunicamycin [37]. In another study, treatment with the palmitoyl acyltransferase inhibitor 2-bromopalmitate reduced the infectivity of MHV [38]. The cytoplasmic part of the SARS-CoV S protein contains four cysteine-rich clusters, two of which are modified by palmitoylation. However, cell surface expression of the SARS-CoV S protein was unaffected by this palmitoylation. A previously published study found that nitric oxide treatment significantly reduced palmitoylation of the SARS-CoV S protein [39].
One study found that three cysteines, at positions 40, 43 and 44, undergo palmitoylation in the SARS-CoV E protein [40]. In another study, the homologous cysteines C40, C44 and C47 of the MHV-A59 E protein were mutated to alanine, and infectivity decreased as a result [41]. It was therefore concluded that palmitoylation of the MHV E protein contributes to the stability and biological activity of mature virions. In contrast, palmitoylation of the SARS-CoV E protein is not required for its interaction with the N protein.

Phosphorylation
In SARS-CoV-2, the N protein is the most abundant protein encoded by the viral genome and is translated at significantly higher levels in the early stage of infection. The N protein is conserved across coronaviruses and contains two globular domains, the N-terminal domain (NTD) and the C-terminal domain (CTD), flanked by intrinsically disordered regions. The N protein is dimeric and has multiple RNA-binding sites, including one major RNA-binding groove created by the two CTDs stacking against each other on the NTD [42]. The disordered region is rich in serine-arginine residues that are essential for the function and regulation of the N protein [42]. Cytoplasmic kinases phosphorylate the N protein in the early phase of infection. As a constituent of the helical nucleocapsid of SARS-CoV-2, the N protein binds RNA in a beads-on-a-string arrangement, and both the NTD and the CTD contribute to binding of the viral genome. A recent truncation analysis showed that an L/Q-rich region within the intrinsically disordered region of the SARS-CoV-2 N protein plays a vital role in RNA-mediated phase separation; this region lies adjacent to the phosphorylated SR-rich region (residues 176-206) [43]. The same study concluded that the central intrinsically disordered region of the N protein is involved in protein-protein interactions mediated by a putative hydrophobic α-helix spanning residues 213-225 [43].

Prospects
The functional role of PTMs in SARS-CoV-2 proteins has not been fully explored, and many more studies are needed to define the roles of N-linked and O-linked glycosylation, palmitoylation and phosphorylation in the initial phase of infection. The multiple modification sites on SARS-CoV-2 proteins nevertheless provide opportunities to learn more about replication and pathogenesis of the virus in host cells. Newer techniques are also needed to detect modifications at multiple sites in a dynamically changing virus structure, and the molecular mechanisms of these PTMs need to be better understood. Finally, the PTMs of coronavirus proteins might be attractive therapeutic targets and prospective targets for vaccine development.