Protein post-translational modifications: In silico prediction tools and molecular modeling

Post-translational modifications (PTMs) occur in almost all proteins and play an important role in numerous biological processes by significantly affecting proteins' structure and dynamics. Several computational approaches have been developed to study PTMs (e.g., phosphorylation, sumoylation or palmitoylation) showing the importance of these techniques in predicting modified sites that can be further investigated with experimental approaches. In this review, we summarize some of the available online platforms and their contribution in the study of PTMs. Moreover, we discuss the emerging capabilities of molecular modeling and simulation that are able to complement these bioinformatics methods, providing deeper molecular insights into the biological function of post-translational modified proteins.


Introduction
Post-translational modifications (PTMs) occur on a large number of proteins de facto increasing the actual complexity of the proteome. PTMs consist in a covalent modification of amino acids of the primary protein sequence [1] and have the effect to create a much larger array of possible protein species. In response to specific physiological requirements, PTMs play a crucial role in regulating many biological functions [2], such as protein localization in the cell [3,4], protein stability [5], and regulation of enzymatic activity [6]. To date more than 90,000 individual PTMs were detected using biochemical and biophysical analyses [7]. In particular, it was observed that almost 5% of the human genome encodes enzymes in charge of catalyzing reactions leading to PTMs [8], highlighting once more the importance of these chemical modifications of the proteome.
Enzymes often are responsible for regulating these chemical modifications in proteins, as in the case of phosphorylation, acetylation, methylation, carboxylation or hydroxylation [9]. For instance, protein kinases can phosphorylate a given protein target to induce a signaling cascade, while this PTM can be further removed by specific protein phosphatases. These enzymes are found indeed in important signaling pathways, like G-protein [9,10] and Wnt signaling [11,12]. On the other side, PTMs not induced by specific enzymes (e.g., carbonylation or oxidations) were observed to be responsible of non-specific protein damage involved in neurodegenerative diseases, cancer and diabetes [13][14][15].
During the past 30 years, experimental techniques used for mapping and quantifying PTMs have seen an impressive progress. In particular, liquid chromatography (LC) with mass-spectrometry (MS) proteinbased analysis allowed the detection of thousands of PTMs across entire proteomes [16]. The study of PTMs in their biological context was achieved thanks to advancements in fluorophore chemistry, fluorescence spectrometry, and peptide and antibody synthesis [17]. However, the identification and characterization of PTMs are still limited by the poor knowledge of the underlying enzymatic reactions and their final effects on protein stability and dynamics. In this context, in silico methods, often based on the current knowledge of PTMs, are a promising strategy to perform preliminary analysis and prediction that can guide further in vivo and in vitro experiments, leading to expand our understanding of the role of PTMs in cellular processes.
In this review, we provide an overview of some of the existing computational approaches used to study the most common PTMs, which we classified based on the covalent attachment of (i) small chemical groups, (ii) lipids or (iii) small proteins to the main peptide chain. Most of these tools are presented as online webservers, providing a user-friendly interface for PTM site identification. Although in this review we mainly focus our attention on this kind of resources, standalone software, like for instance PEAKS PTM [18], GlycoMaster [19] or MODa [20], are also available but won't be covered here. reaction, which consists in the attachment of a phosphate group to the side chain of an arginine, lysine, histidine, tyrosine, serine or threonine residue [22] (Fig. 1A). It plays a key role in almost every cellular process, including metabolism, division, organelle trafficking, membrane transport, immunity, learning and memory [23,24], and function of target proteins [25]. It can activate [26,27] and inhibit [28,29] enzyme activity through allosteric conformational changes, facilitate the recognition of other proteins [30][31][32][33], promote protein-protein association [34][35][36] or dissociation [37] and also induce order-to-disorder transition [38,39] (Table 2).
It was estimated that 30% of the total proteome is phosphorylated at least at one residue [40,41]. However, this simple switch mechanism is in reality more complex since multiple enzymes can act on multiple sites of the same protein creating a highly connected network of interactions and modifications. For example, it was shown by highresolution mass spectrometry that 37,248 phosphorylation sites are present on 5705 proteins in adipocyte cells [42]. Other phospho-proteomic analyses demonstrated that proteins, on average, could be phosphorylated on at least five different sites, although these results could suffer from biases coming from high stoichiometry of the complexes [43][44][45].
Advances in mass spectrometry, both in terms of speed and sensitivity, allowed identifying and quantifying thousands of phosphorylation sites in different species [42,45]. The conservation of the functional phosphorylation sites in species like mice, rats and flies is a feature used by biologists for selecting specific sites of interest for functional characterization. Therefore, mapping the phosphorylation sites on proteins is an important step in order to understand the catalytic process and the effects of signal transduction events. However, it is still in general difficult to identify specific phosphorylation sites. In silico predictions play an important role in this field. Several methods (Table 1) were implemented in order to predict the target phosphorylation sites from the sequenceand structure-based analysis of the specific protein kinases' catalytic domain, such as KinasePhos2.0 [46] or GPS [47]. In particular, GPS is a group-based phosphorylation algorithm, which predicts kinase-specific  Table 1 PTM prediction webservers. Abbreviations: artificial neuronal network (ANN); support vector machine (SVM); random forest method (RFM); Hidden Markov model (HMM); weight matrix (WM); group based phosphorylation scoring method (GPS); binary profile of patterns (BPP); composition profile of patterns (CPP); PSSM profile of patterns (PPM); average surface accessibility (ASA); neuronal network (NN); knowledge-based (KB); conditional random field (CRF); group-based prediction (GBP); binary profile bayesian (BPB); information gain (IG); Bayesian discriminant (BD); enrichment based method (EBM); binary-relative adaptive binomial score Bayesian (Bi-BSP); logistic regression model (LRM); synthetic minority oversampling technique (SMOT); Markov chain clustering (MCC); particle swarm optimization (PSO); genetic variability (GV); position frequency matrix (PFM); covariance discriminant algorithm (CD): machine learning (ML phosphorylation sites in 71 protein kinase groups, such as Aurora-A, Aurora-B and NimA-like protein kinases. Other methods were instead implemented in a way to predict the phosphorylation sites simply from the substrate primary sequences. For example, Scansite [48] is built on combined experimental binding and/or substrate information to derive a weighted matrix-based scoring that predicts protein-protein and protein-phospholipid interactions, as well as phosphorylation sites. NetPhos [49] instead is based on an artificial neuronal network that allows the users to choose between a generic predictions based only on the substrate protein sequence or kinase-specific predictions. Recently, in order to overcome the limitations due to a training set based only on the same type of kinases, two general predictors were developed: PPRED [50], which incorporates evolutionary information, and PhosphOrtholog [51] that enables cross-species comparison of large-scale phosphorylation sites. Finally, several online databases are also available in order to curate and organize information about phosphorylation sites studied in vivo and in vitro in human and mouse proteomes (PhosphositePlus [52]), as well as rat, fly, yeast and worm (PhosphoELM [53]).

Glycosylation
Protein glycosylation is one of the most relevant and complex posttranslational modifications in the cell [54,55], which is thought to influence almost half of all proteins in nature [56]. It consists of a covalent interaction between a glycosyl donor of a glycan and a glycosyl acceptor amino acid side chain of a protein [57] (Fig. 1B). Protein glycosylation can be divided in four main categories based on the linkage between the amino acid and sugar: N-linked glycans, O-linked glycans, GPI anchors and C-mannosylation. In N-glycosylation, a sugar is attached to an amino group of an asparagine [58], while Oglycosylation is characterized by the interaction of a sugar with the hydroxyl group of a serine or threonine [59]. GPI anchors consist of the attachment of glycophosphatidyl-inositol near to the C-terminal of a protein chain anchoring the protein to the membrane [60]. Cmannosylation occurs when an α-mannopyranosyl moiety is attached to the indole of the tryptophan via C\ \C link [61].
Glycosylation modulates several protein biophysical properties influencing their native functions [2]. In particular, it was observed that it could alter not only protein thermodynamic and kinetic properties, but also influence the structural features of the proteins [62]. The covalent attachment of large hydrophilic carbohydrates modulates protein stability, oligomerization and aggregation [62][63][64], host cell-surface interactions [65], enzyme activity [66] and protein trafficking [67] ( Table 2).
Several analytical tools were developed over the past 2-3 decades facilitating glycan analysis. In particular, capillary electrophoresis, liquid chromatography, mass spectrometry and microarray-based are extensively used in grycoproteomics [68][69][70]. None of these tools can produce, however, a detailed molecular characterization of glycosylated proteins. The high heterogeneity of glycans and the difficulty of obtaining them in large amounts still preclude investigating the role of glycosylation at the molecular level. In the past years, the number of glycoconjugates' crystallographic protein structures have increased [71][72][73][74][75][76], nevertheless, a complete chemical and structural description of a glycan structure is still challenging. Mass spectrometry as well as different web-servers (Table 1) currently provide information about existing glycosylation sites. Indeed, in the last decade, several algorithms, trained with sequences or sequence-based information, have been developed to improve prediction of glycosylation sites. Some of these resources are based on neuronal network algorithms, such as NetNGlyc, NetOGlyc [77] for prokaryotes, or YingOYang [78] for eukaryotes. Other useful tools are the GlycoMod [79] server for prediction of glycans' structure based on experimental determined masses, and the NGlyPred [80] server, which incorporates both structure and residue pattern information. More recent developments include prediction of glycosylation sites based on machine learning algorithms (i.e., GlycoMine [81]), an approach that has produced a significant improvement with respect to prediction performances of NetNGlyc [82] and NectOGlyc [77].

S-nitrosylation
S-nitrosylation (SNO) consists in the covalent attachment of a nitric oxide (NO) to cysteine thiol moieties (Fig. 1C). Compared to phosphorylation, SNO is not catalyzed by an enzyme, but it depends on the chemical reactivity between the nitrosylation agent and the target, thus the specific residues' environment influences the reactivity of the target protein. Concentration of the nitrosylation agent and the protein, as well as the stability of the S\ \NO bond under physiological conditions, influences in turn the specificity of this reaction.
Although S\ \NO bonds are highly labile and redox-sensitive, several techniques managed to detect SNO in cells. There are methods for the direct detection of S-nitrosylated sites, such as the measurement of S\ \NO characteristic absorbance at 340 nm, electrospray ionization mass spectrometry (ESI-MS) [102] and NMR with 15 N [103]. Ozone chemiluminescence [104] and specific reduction with Cu + /cysteine [105] at pH 6 are indirect chemical methods that are instead based on the analysis of the cleavage products of SNO. Biotin switch assays and chemical reduction/chemiluminescence assays are specific and sensitive methods for measuring low levels of intracellular S-nitrosylated proteins. These experiments are laborious and low-throughput due to the labile nature and low abundance of SNO. Therefore, computational methods represent again a valid alternative to timely and reliably identifying SNO protein sites for further experimental verification. Several benchmark datasets were developed during the past years. SNOSID [106] tests the prediction performance on 65 positive and 65 negative samples, while GPS-NO [107] was developed based on 549 experimentally verified SNO sites. A support vector algorithm machine (SVM) [108] and a nearest neighbor algorithm (NNA) [109] were also proposed to predict SNO sites. However, no web server was later developed for any of these methods, so that their current usage is quite limited. Alternative web-servers are iSNO-PseAAc [110], which identifies nitrosylated proteins on an independent dataset, predicting sites with 90% accuracy [110], and GPS-SNO [107], which also represents a valid tool for an experimentalist providing information for hundreds of potentially S-nitrosylated substrates that have not been yet experimentally determined [110] (Table 1).

Methylation
Protein methylation is a reversible PTM that modifies the nitrogen atoms of either the backbone or side-chain of several types of amino acids, such as lysine, arginine, histidine, alanine and asparagine [111][112][113][114][115][116][117]; methylation has been also reported at cysteine residues (S-methylation) [118] (Fig. 1D). Despite this variability, most studies have been predominantly focused on lysine and arginine modifications. Methylation research dates back to 1939, but just recently has attracted more and more attention [111] with the identification of new methyltransferases, like protein arginine methyltransferases (PRMTs) [119][120][121] or histone lysine methyltransferases (HKMTs) [122][123][124], which catalyze mono [125] or double [111,126] methylation. In particular, the methylation of the N-terminal tails of the histone plays an important role in gene expression regulation [127], genome stability [128] and nuclear architecture [129] influencing several biological processes such as transcription [130,131] and chromosome maintenance [132] (Table 2). Methylation can also occur on the C-5 position of the cytosine ring of the DNA (DNA methylation) resulting in its association with several human diseases such as cancer, mental retardation (Angelman syndrome) or diabetes mellitus [133]. Although different biological processes are linked to DNA and histone methylation, there seems to be a mutual relationship between these processes, which could play an important role in gene expression [134].
Methylated proteins, as well as methylation regulatory enzymes, are involved in several human diseases such as cancer [126,135,136], cardiovascular diseases [137], multiple sclerosis [138] and neurodegenerative disorders [139]. Thus, the inhibition of these enzymes with small molecules could be an effective therapeutic means of intervention [140]. Moreover, as it is key to identifying methylation sites, understanding methylation mechanistic and dynamic features is as important. In the past years several experimental methods were developed to study the molecular mechanism of methylation. Mutagenesis of potential methylated residues, methylation of a specific antibody [141], as well as Chip-Chip [142] were extensively used for this purpose. Recently, mass spectrometry experiments have been also applied allowing the identification of 249 arginine methylated protein sites in 131 proteins from T cells [143]. However, these techniques are usually very expensive and laborious limiting the research of potential methylation sites. Computational predictions of methylation sites have helped handle these limitations providing an important resource for reducing the number of experiments needed to determine protein methylation sites. Eight web-servers for prediction of methylation sites are currently available (Table 1). MeMo [144] is one of the first online tools to become available. It uses a support vector machine (SVM) as a prediction algorithm. Its dataset is based on a curated selection of all methylated residues annotated in SWISS-PROT [145], 264 experimentally manually verified methyl-lysine and 107 methyl-arginine extracted from roughly 1700 scientific articles. MeMo [144] appears to be a powerful tool for predicting methylated-arginine sites when compared to methylatedlysine. However, its accuracy is affected by the lack of training data available at the time of development. Lately, the reliability of the prediction was improved by BPB-PPMS [146], where a Bi-Profile Bayesian approach was used to define methylated and non-methylated sites based on known experimental data [147,148]. The data set was increased to 363 candidates containing methylated arginines and 977 methylated lysine proteins. The combination of Bi-Profile Bayesian features with a larger data set improved the methylation prediction accuracy up to 92% for methylated lysine proteins and 88% for methylated arginine proteins [149]. It was observed that protein methylation mainly occurs in regions that are easily accessible and intrinsically disordered, thus MASA [149] used Solvent Accessible Surface Area (SASA) and secondary structure information for predicting methylated sites. This web-server allows the prediction not only of methylated lysines and methylated arginines, but also methyl-glutamates. However, most of these methods use only primary sequence information without taking into account any physicochemical property of residues. With the aim of improving the quality of the prediction, a novel approach called PMes [150] was introduced, which considers physiochemical properties of amino acids surrounding methylation sites. A specific lysine-methylation prediction tool for histones was also proposed: METhK [151] uses amino acids' composition, SASA, amino acid pair composition (i.e., the frequency of amino acid pairs in the primary sequence), amino acid index and protein disorder regions for discriminating between methylated lysine sites in histones and in non-histone proteins. More recently, another web-server has been introduced for in vivo or in vitro species-specific methylation sites' identification: PSSMe [152] was tested on a largescale experimental methylated site dataset originated from different species, revealing that methylation patterns are indeed species dependent.
Experimentally several techniques were applied to explore N-acetylation, such as radioactivity detection [180], immunity affinity detection and chromatin immunoprecipitation [181]. The development of high-throughput technologies like immune-precipitation combined to mass spectrometry increased also the number of detected acetylated proteins [182]. However, the experimental detection of acetylated sites is inefficient, expensive and have implicitly low throughput [183]. Therefore, computational tools represent alternative methods for studying the acetylation modifications and provide information for further experiments. Some web-servers (Table 1) dealt only with one specific type of acetylation such as NetAcet [184] for instance. NetAcet [184] attempted to predict only N α -acetylation sites using a neuronal network trained on yeast data and extendable only to mammalian acetylated substrates. However, NetAcet [184] suffered from the limited size of the training dataset available at that time of development. Several web-servers aimed to predict acetylated lysine. PAIL [185] was the first in silico tool for N ε -lysine acetylation sites' prediction. The Bayesian discriminant algorithm [186] was employed on a training set of 246 experimentally verified acetylated sites. Despite a small data set, PAIL [185] is able to achieved an accuracy of 85%. BRABSB-PHKA [187] is a human-specific lysine acetylation predictor, which combines a bi-relative adaptive binomial score Bayesian algorithm with a support vector machine. Another method in lysine acetylation prediction is PSKace-Pred [188], where a position-specific view was considered for the characterization of acetylated proteins.
Protein sequences' information, evolution similarity and physiochemical properties can help in discriminating between acetyl-lysines and non-acetyl-lysines, improving lysine sites' evaluation. LAceP [189] is based on a logistic regression model, where the physiochemical property of the amino acids and the transition probability of adjacent amino acids were considered during the prediction process. It also allows predicting acetylated sites not only for lysines, but also for glycine, methionine serine and threonine residues. This is actually done by N-Ace [190], where physiochemical properties (e.g., non-bonded energy, absolute entropy) and solvent accessibility were included in the original prediction code.
The status of lysine acetylation can also be influenced by the enzymes that catalyze the reaction. Although lysine acetyltransferases (KATs) act usually on a multiple-subunit complex, it is still difficult to determine which KATs are responsible for the acetylation of a given protein. ASEB [191] was the first server for KAT-specific human acetylated lysine prediction that not only evaluates possible lysine acetylation sites, but also provides information about the responsible KAT enzyme.

Covalent attachment of acyl chains
Protein lipidation is a unique post-translational modification, which has the result of directly controlling the interaction of soluble protein with biological membranes affecting in turn cellular organization and trafficking. In this section we give an overview of several types of lipidation, their mechanism, involvement in diseases and the computational resources used for predicting lipidation sites.

Palmitoylation
Palmitoylation consists in the attachment of a 16-carbon acyl chain to cysteine residues via a thioesteric bond [192,193] (Fig. 1F). Among all PTM lipidations, palmitoylation is the only reversible one and can dynamically regulate protein function, as in the case of H-Ras and N-Ras [3,194]. Two families of enzymes regulate the palmitoylation/ depalmitoylation process: palmitoyltransferases (PATs), which catalyze the attachment of a palmitate from CoA to specific cysteines, and Acyl Protein Thioesterases (APTs), which remove the palmitate acyl chain. Palmitoylation occurs both in soluble and membrane proteins playing a critical role in the regulation of key biological processes, such as protein membrane trafficking, signaling, cell growth and development [195] (Table 2). Aberrant palmitoylation is associated to a variety of human diseases including neurological disorders (e.g., Huntington disease's [196] or Alzheimer's disease [197]) and cancer [198][199][200][201][202][203].
However, the S-palmitoylated proteome is not yet well defined and little is known about the mechanism that regulates S-palmitoylation and its consequences. In fact, the identification of palmitoylation sites is not simple due to the lack of a distinct sequence motif on the substrates [204]. Mass spectrometry allows the identification of several palmitoylated proteins in cells and tissues, which can be further experimentally characterized using Acyl Biotin Exchange (ABE) or Acyl Resin Assisted Capture (Acyl-RAC) techniques [205][206][207]. Metabolic labeling and click chemistry probes [208,209] were developed to recognize palmitoylation sites in order to shed light on the molecular mechanism and dynamics of palmitoylation. All these experimental techniques are time and money consuming, thus computer-aided methods are a necessary alternative for predicting palmitoylation sites (Table 1). CSS-palm [210] was one of the first methods to be developed for searching novel palmitoylated proteins in budding yeast. It is based on a clustering and scoring algorithm, where 263 experimentally verified palmitoylation sites are used as a training set, manually collected from the scientific literature. CKSAAP-PALM [211] is another computational method to predict palmitoylation sites based on protein sequences. An encoding scheme composed by k-spaced amino acid pairs is at the basis of this approach [211], which improved accuracy compared to former strategies. SwissPalm [212] has been recently introduced, which provides information from the comparison of different palmitoylproteomic studies and allows the users to easily search for the protein of interest, determine the predicted S-palmitoylation sites, identify orthologues and compare them across palmitoyl-proteomes. SeqPalm [213] has been recently developed in order to get insights into the correlation between the disruption of palmitoylation sites and diseases. This new computational method allows for the identification of palmitoylation sites based on amino acid compositions, autocorrelation of amino acid physicochemical properties and amino acid positionweighted matrices.

N-myristoylation
Myristoylation is a covalent and irreversible attachment of a 14-carbon fatty acid to N-terminal Gly residues [214] of eukaryotic or viral proteins (Fig. 1G). This PTM facilitates in turn the interaction with membranes or a hydrophobic protein domain [215][216][217][218][219]. The substrates involved in myristoylation are generally characterized by the consensus motif Met-Gly-X-X-X-Ser/Thr at the N-terminus. This PTM acts predominantly by removal of the main methionine residues in order to expose the subsequent glycine [220]. Less frequently, it can also expose an internal glycine by proteases' cleavage [221]. These mechanisms are both catalyzed by the N-myristoyl transferase (NMT), a 50 kDa enzyme expressed in most organisms [222,223].
Myristoylation is involved in several critical cellular processes, such as signaling pathways, apoptosis [221] and extracellular protein export [224] (Table 2). Usually myristoylation acts with other posttranslational modifications like palmitoylation [225][226][227], or in combination with positively charged residues [228] in order to enhance membrane-protein interactions. Several diseases are linked to Nmyristoylation like cancer, epilepsy, Alzheimer's disease and viral and bacterial infections [229]. The experimental detection of Nmyristoylation includes radioactive techniques like the use of 3 H or 14 C radioactive myristate that requires a long exposure period (weeks to months). To the best of our knowledge only two online web-servers are available to predict myristoylation sites (Table 1). NMT [230] uses a trial set that combines experimentally proved myristoylated proteins with potential myristoylated candidates. Based on structural and biochemical characterization of the N-myristoyltransferase, a set of descriptors was suggested for better predicting myristoylated sites. This protocol improved the previous pattern suggested in PROSITE [231] (pattern code: PDOC00008), which gave numerous false negative predictions.
Another N-myristoylated site predictor is called Myristoylator [232], which is based on a machine learning model that uses several combined neuronal networks and a test set of positive and negative sequences. Although this predictor seems to increase the specificity, it was trained to predict myristoylation only on terminal glycines, thus a priori knowledge of the proteolytic scission site is necessary when using this web-server.
The most common approach for detecting prenylation is to use expensive radiolabeling techniques [203,204]. Initially, the prenylation motif was suggested to be CaaX, i.e. consisting of a cysteine (C) followed by two aliphatic residues (aa) and a terminal residue X. However, further kinetic studies and mutation experiments showed a more flexible and complex recognition motif for prenylation [247]. PrePS [248] is the only online tool available, which is based on modeling of the substrate-enzyme interactions for each prenyltransferase.

Small proteins' modifications
An important field in cell signaling is the characterization of the covalent and reversible attachment of ubiquitin (ubiquitylation) and small ubiquitin-related modifiers (sumoylation). This peculiar class of PTMs provides new protein-protein interfaces remodeling the target proteins [249]. In this section we review the latest findings on sumoylation and ubiquitylation with a particular attention on the in silico tools recently developed.

Ubiquitylation
Ubiquitylation is a three step process where, first, the ubiquitin is activated by a ubiquitin-activating enzyme (E1), then conjugated to a ubiquitin-conjugating enzyme (E2), and finally transferred by a ubiquitin-ligase enzyme (E3) to a substrate molecule via an isopeptide bond with an internal lysine (Fig. 1I). This reversible modification is implicated in the regulation of several cellular processes, like protein degradation [250][251][252], cell cycle division, the immune response [253], lysosomal trafficking [254] and control of insulin [255] ( Table 2). The aberration of ubiquitylation is linked to human pathologies varying from inflammatory neurodegenerative diseases to different forms of cancers [253,256].
Despite the availability of several ubiquitin-protein ligase complex structures [257][258][259][260][261], the ubiquitylation reaction mechanism is still poorly understood. It has been recently hypothesized that structural disorders of the substrate could actually facilitate this process. Analysis of sequences by mutant yeast strain experiments [262] showed that most of the ubiquitylation sites are in the disordered and flexible regions of a protein. On the basis of this observation UbPred was developed [262] (Table 1): a ubiquitylation site predictor based on a support vector machine algorithm (SVM), which allows studying the correlation between ubiquitylation and protein half-life. In order to overcome the lack of accuracy and training data deficiency, UbiProber [263] and iUbiq [264] were designed (Table 1). UbiProber predicts both general and species-specific ubiquitylation sites using large-scale experimental data as training set, while iUbiq is based on evolutionary information incorporated into the general form of pseudo-amino acid composition. However, all these in silico tools do not account for E3 binding/ recognition sites, although it was shown to be an important feature for ubiquitylation. UbiNet [265] (Table 1) is the first server that allows studying the regulatory network among E3 and ubiquitylated proteins.

Sumoylation
Sumoylation is a PTM characterized by a covalent attachment of the Small Ubiquitin-like Modifier (SUMO) to specific lysine residues via an enzymatic reaction (Fig. 1L). Sumoylation sites are identified by a canonical consensus sequence Ψ-K-X-E (where Ψ is a hydrophobic amino acid, such as A, I, L, M, P, F, V or W; X any amino acid residue) [266,267], and by SUMO-interacting motifs (3-4 aliphatic residues linked to acid and/or phosphorylatable amino acids) called SIM [268]. Both these features are essential for characterizing the biological significance of sumoylation. This modification is involved in several cellular processes, like protein binding, subcellular transport, gene expression, DNA repair, chromosome assembly and cellular signaling [269][270][271][272] ( Table 2). Aberrant sumoylation is correlated not only to Alzheimer's and Parkinson's diseases [273], but also to cancer [274] and diabetes [275], highlighting the importance of detecting sumoylation sites. Mass spectrometry-based proteomic studies allow mapping hundreds of proteins identifying different sumoylation and SIM sites [276][277][278][279]. However, the limitations due to the reversibility of this modification and the difficult identification of peptides from trypsin digestion impose some limitations to the use of this technique. Computer-aided prediction represents a good alternative to reduce the number of potential targets to explore for further experimental verifications (Table 1).
While web-servers available for SIM prediction are only GPS-SUMO [280] and JASSA [281], several online methods for sumoylation sites' prediction are currently available. SUMOhydro [282] is based on a support vector machine (SVM) combined with amino acid hydrophobicity, while SUMOAMVR [283] considers also other structural features, like average accessible surface area (AASA), secondary structure and evolutionary information of amino acids. Recently, a new in silico tool based on the covariance discriminant (CD) algorithm was developed in order to avoid errors caused by disparity in training data sets [284].

PTMs cross-talk
The hypothesis of a correlation between PTMs within the same protein (PTMs cross-talk) [285] has emerged in the proteomic field in recent years. For instance, the regulatory interplay among PTMs was observed for histones [286] and other proteins like p53 [287,288], RNA polymerase II [288] or β-tubulin [289]. In particular, the importance of PTM cross-talk was recognized in several biological pathways (e.g. DNA damage response [290] and protein stability regulation [291][292][293]) pointing to a strong relationship between PTM cross-talk and protein functions.
While the structural and functional understanding of combinatorial PTMs has been initially limited by technological limitations, recent advances in proteomics have allowed integrating information on different types of modifications [294,295]. However, with the latest experimental methods it is also difficult to identify the whole set of PTM sites in the proteins. In this emerging context, computational methods are poised to support the study of PTM cross-talk. The first unified tool for a simultaneous prediction of PTM sites was ModPred [296], which predicts and analyses simultaneously multiple types of PTM sites in order to gain structural and functional information on the protein regulatory mechanism of multiple PTMs. Recently, a new webserver, PTM-X [297], allows the prediction of PTM cross-talk based on experimentally published data. The difference compared to ModPred is represented by the necessity to know a priori the PTM candidate sites.

Structural and dynamical characterization of PTMs
Despite the important role played by PTMs, their structural and dynamics effects of protein function remain poorly understood from a molecular point of view, due to the labile transient nature of most of these modifications and the lack of adequate experimental techniques able to detect and characterize them and the underlying chemical mechanism of formation. The online tools previously discussed are valid methods to overcome some of these experimental limitations and predict putative PTMed sites, but they do not usually provide any information about the impact of post-translational modifications from a mechanistic point of view.
Molecular modeling and molecular simulation (such as molecular dynamics, MD), which are based on empirical atomistic force fields [298][299][300][301], are a powerful strategy for studying biological systems at single molecule resolution and nanosecond-to-millisecond time scales. Although this computational approach allows nowadays the study of protein processes and properties that are not easily accessible experimentally, there are still some apparent limitations regarding the availability of accurate parameters that would allow the investigation of PTMed proteins. In the past years several improvements have been made in order to expand this approach also to non-standard biomolecules. Within the AMBER force field atomic charges and parameters were developed for phosphorylated residues as phosphoserine, phosphothreonine, phosphotyrosine, phosphohistidine [302], and S-nitrosylated residues (S-nitrocysteine [303]) and methylation (trimethyllysine [304,305]). Similarly, within the CHARMM force field there are parameters for methylated lysines and arginines, as well as acetylated lysines and palmitoylated cysteines [306]. There are also ad hoc comprehensive atomistic force field parameters for treating the description of the link between carbohydrates and proteins such as in GLYCAM for AMBER [307]/CHARMM [308] and a modified version of GROMOS [309,310]. In theory, within these schemes, there are existing strategies to parameterize virtually any kind of non-standard amino acids, as for the case of PTMs; in practice, the development of new force field models always involves time-consuming parameterization protocols and rigorous a posteriori validations of the quality and robustness of the new models.
Moving to lower resolution, coarse-grained force fields can be also very useful for studying the impact of PTMs on protein function. In this domain there are no specific parameters for the description of PTMed residues. The Martini force field [311], for instance, provides parameters for treating non-covalently bound sugar molecules or phosphate groups but a complete general representation of modified residues is not yet available. However, a recent work described new parameters for modeling palmitoylated cysteines [312] that were used to study H-Ras, and contributed to show the influence of this PTM in regulating the partition of the protein with the membrane.
While for a long time PTMs were not usually considered in modeling and molecular simulation works, the recent availability of more comprehensive experimental data along with accurate force field parameters have thus allowed investigating protein properties taking also into account the effect of PTMs on their structure and stability. Recent examples of this approach have revealed the impact of PTMs for the HIV-1 fusion peptide structure [313], rhodopsin [314], calnexyn [315] and phosphatidylinositol 4-kinase [316].
Answering to the need of new and better molecular models to more realistically describe proteins, some automatic tools for generating force field parameters for new molecular species have become available, such as ParaChem or SwissParam [317] compatible with the CHARMM force field, q4md-forcefieldtools for AMBER [318] and ATB for GROMOS [319]. However, none of them directly focus on the parameterization of PTMs, likely because of the complexity of the development of parameters required for most PTMs. Therefore, the necessity of having computational tools allowing an automatic parameterization of PTMed protein structures to be used in MD simulations resulted in the development of some new web-servers, such as FF_PTM (http://selene. princeton.edu/FFPTM/) and Vienna-PTM (http://vienna-ptm.univie.ac. at). FF_PTM focuses on expanding the existing AMBER force field (i.e., ff03) including parameters for 32 PTMs. In particular, it is characterized by parameters that describe the attachment of small molecules (e.g., phosphorylation, methylation or acetylation) and the covalent interaction with acyl chains such as palmitic acid (palmitoylation) and geranylgeranyl pyrophosphate (geranylgeranylation). On the other hand, Vienna-PTM is a web platform designed for introducing PTMs on PDB structures to run simulations using the GROMOS 54A7 and 45A3 force fields.

Summary and outlook
The chemical modification of amino acids plays an important role in a myriad of cellular processes that range from protein localization to disease development and aging. Enormous efforts, which combine the development of both experimental and computational methods, have been put in the past 2 decades in order to understand PTM mechanisms and effects for protein structure, dynamics and function.
In this review we summarized the main in silico tools mainly available as an online webserver for studying PTMs (Table 1). Recently, an integrative platform (dbPTM: http://dbptm.mbc.nctu.edu.tw/) has also become available. Originally developed as a comprehensive database of experimentally verified PTMs, dbPTM collects all available databases and webserver resources considering also other PTMs, like succinylation and S-glutathionylation. Although this platform does not provide an exhaustive description for the case of lipidation, it includes also standalone software (not discussed here), offering thus a complementary source of information to this review.
With an increasing amount of experimental data available every day, we think that, as the existing ones will keep improving their performance, many other methods will emerge in the future. Although most of the web-servers available are based on a sequence-based analysis of training data sets, some of them have also started to take into account other interesting properties, such as evolutionary information (e.g., PhosphoOrtholog, iUbiq), SASA (e.g., MASA, METhK, SUMOAMVR) and physiochemical properties (e.g., Pmes, SUMOhydro).
Altogether, these approaches are only rarely considering the molecular features associated with PTMs and the molecular impact they have for protein function in general. Within this context, molecular modeling and simulations have the capability to complement experimental and bioinformatics techniques providing a molecular description of the effect of PTMs on protein structures and stability. However, the lack of suitable tools and parameters for treating PTMs in proteins has limited so far the characterization of these covalent modifications. As some automatic tools (e.g., FF_PTM and Vienna-PTM) have recently appeared providing an online platform to parameterize post-translational modified proteins suitable for running atomistic MD simulations with AMBER or GROMOS force fields, for most PTMs ad hoc parameterizations still need to be developed. Similarly, coarse-grained force fields still lack reliable and robust models for dealing with PTMs, as well as systematic protocols to produce accurate parameters.
Nowadays, with the constant increment of computing power and the availability of always more accurate force fields for biomolecules, which accompany the tireless advances on the experimental side, it is possible to achieve a more precise and faithful description of biological systems in their physiological conditions using molecular modeling and simulation. For instance, the advances in lipidomic analyses have provided a much more detailed view of membrane composition, allowing the construction of more realistic membrane models [320,321] to better investigate membrane biophysical properties and their interplay with integral and peripheral membrane proteins [322,323]. Several computational tools have been developed with this aim in mind, such as CHARMM-GUI [308] and LipidBuilder [324].
Along the same lines, it is also clear that protein PTMs are another important layer of complexity that integrally defines and modulates protein function and, for this reason, needs to be considered at all levels of both experimental and computational investigation. Therefore, the computational advances of bioinformatics and physics-based molecular modeling and simulation techniques appear as a fundamental requirement to complement the experimental investigation of PTMs' impact on cellular processes.