From Synthesis to Characterization of Site-Selective PEGylated Proteins

Covalent attachment of therapeutic proteins to polyethylene glycol (PEG) is widely used to improve their pharmacokinetic and pharmacological properties and to reduce reactogenicity and related side effects. This technique, named PEGylation, has been successfully employed in several approved drugs to treat various diseases, including cancer. Several methods have been developed to obtain PEGylated proteins, with PEG attached either at multiple protein sites or at a single selected amino acid residue. This review focuses mainly on traditional and novel chemical and enzymatic methods for site-selective PEGylation, with emphasis on N-terminal PEGylation, that yield products with a high degree of homogeneity and preserved bioactivity. In addition, the main assay methods that can be applied to characterize PEGylated molecules in complex biological samples are summarized.


INTRODUCTION
The binding of proteins, peptides, enzymes, antibody fragments, oligonucleotides, or small synthetic drugs to polymers has become a very useful method for improving the therapeutic activity or decreasing the toxicity of these agents (Mishra et al., 2016). Among polymeric materials, polyethylene glycol (PEG) is the most widely used for these purposes, mainly due to its high biocompatibility, low toxicity, and limited side effects (Gauthier and Klok, 2008). PEGs are water-soluble polymers approved by the Food and Drug Administration for use in oral, topical, and intravenous formulations (D'souza and Shegokar, 2016). PEG is a polyether diol (linear or branched) with the structure HO-(CH₂CH₂O)ₙ-CH₂CH₂-OH (Figure 1A), in which each ethylene oxide unit contributes a molecular weight (MW) of 44 Da (Roberts et al., 2012). PEGylation refers to the covalent or non-covalent attachment of PEG to different molecules, such as proteins, macromolecular carriers, oligonucleotides, and vesicles, to improve their pharmacokinetic (Milton Harris et al., 2001;Lee et al., 2013) and pharmacodynamic properties (Abbina and Parambath, 2018). Conjugation to PEG increases the hydrodynamic volume of the biomolecule of interest and creates a shield around it (Gokarn et al., 2012). This reduces renal clearance and therefore prolongs the half-life in the bloodstream (Milton Harris and Chess, 2003), an effect that increases with PEG molecular weight (Hamidi et al., 2006). Additionally, this approach has been used to improve the stability of some proteins (Yang et al., 2007;Jevševar et al., 2010;Lawrence and Price, 2016;Santos et al., 2019) and to decrease the immune response against several biomolecules (Soares et al., 2002;Zheng et al., 2012;Sathyamoorthy and Magharla, 2017;Wu et al., 2017).
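As a worked illustration of the 44-Da repeat unit, the short Python sketch below estimates how many ethylene oxide units a PEG of a given nominal mass contains. It assumes an idealized, monodisperse diol with average masses of 44.05 Da per unit and 18.02 Da for the H and OH end groups; real PEGs are polydisperse, and methoxy-capped reagents (mPEG) have slightly different end-group masses, so the numbers are approximate.

```python
"""Rough PEG chain-length arithmetic (idealized, monodisperse sketch)."""

ETHYLENE_OXIDE_DA = 44.05  # average mass of one -CH2CH2O- repeat unit
END_GROUPS_DA = 18.02      # H + OH termini of a PEG diol (assumption; mPEG differs)


def peg_diol_mass(n: int) -> float:
    """Mass of HO-(CH2CH2O)n-CH2CH2-OH, i.e. n + 1 ethylene oxide units."""
    return (n + 1) * ETHYLENE_OXIDE_DA + END_GROUPS_DA


def repeat_units_for_mass(nominal_da: float) -> int:
    """Nearest total number of ethylene oxide units for a nominal PEG mass."""
    return round((nominal_da - END_GROUPS_DA) / ETHYLENE_OXIDE_DA)


if __name__ == "__main__":
    for nominal in (5_000, 20_000, 40_000):
        units = repeat_units_for_mass(nominal)
        print(f"{nominal} Da PEG ~ {units} ethylene oxide units")
```

Under these assumptions, a nominal 20-kDa PEG such as the one in Neulasta® corresponds to roughly 454 ethylene oxide units.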
Since the 1990s, several PEGylated biopharmaceuticals (see Table 1) have been approved by the FDA, and more are currently undergoing clinical trials (information available at https://clinicaltrials.gov/ct2/results?term=Pegylated&Search=Apply&recrs=d&age_v=&gndr=&type=&rslt=). Most approved PEGylated proteins were synthesized by non-site-specific chemical conjugation strategies, which produce heterogeneous mixtures of multi-PEGylated (polydisperse) proteins because the protein surface carries several reactive sites (Alconcel et al., 2011); these mixtures require complex separation steps. In addition, PEGylation can cause loss of protein activity through several mechanisms, including direct PEGylation of the active site or receptor-binding site (Schiavon et al., 2000), steric entanglement by PEG chains that restricts protein movement (Kubetzko, 2005;Xiaojiao et al., 2016), and conformational changes in the protein (Chiu et al., 2010). Recent research has also revealed shortcomings of highly PEGylated forms, such as activation of the immune system, non-degradability, and possible accumulation of high-molecular-weight PEGs. These are strong reasons to develop site-selective PEGylation techniques that yield homogeneous mono-PEGylated products, a field that has attracted considerable attention in recent years. Although PEGylation can reduce the in vivo potency of therapeutic proteins, this decrease in activity can be largely offset by their prolonged circulating half-life (Oclon et al., 2018). Site-selective PEGylation has proven very useful for introducing PEG at specific amino acid sites in various proteins.
Methods such as pH-controlled N-terminal selective acylation (Chan et al., 2006;Chan et al., 2012) or reductive alkylation (Kinstler et al., 1996;Marsac et al., 2006), the use of oxidizing agents (Kung et al., 2013;Obermeyer et al., 2014), the chemoselectivity of catechol (Song et al., 2016), and transamination (Gilmore et al., 2006) have been used to PEGylate the N-terminus of proteins. In recent years, there has also been considerable work on "grafting from" approaches, in which PEG-like polymers (for example, poly(oligo(ethylene glycol) methacrylate)) are grown from the protein surface by ATRP or RAFT polymerization (Quémener et al., 2006;Ameringer et al., 2013;Gody et al., 2015;Tucker et al., 2017). These approaches generate conjugates containing high-molecular-weight polymers directly on the protein (Wallat et al., 2014;Obermeyer and Olsen, 2015). An illustrative success of chemical site-selective PEGylation is Neulasta®, an N-terminally mono-PEGylated granulocyte colony-stimulating factor bearing a 20-kDa PEG (Molineux, 2004). Its improved pharmacokinetic behavior allows administration only once per chemotherapy cycle, compared with daily administration of the first-generation product, Neupogen® (Cesaro et al., 2013;Zhang et al., 2015).
Despite their robustness, chemical methods usually require large excesses of reagents and carefully controlled conditions. Site-specific PEGylation of peptides and proteins has been achieved not only chemically but also enzymatically. Several studies report the use of enzymes to conjugate PEG to peptides, proteins, and oligonucleotides (Sato, 2002;Mero et al., 2009;Da Silva Freitas et al., 2013;Sosic et al., 2014). These enzymes usually catalyze the reaction between the biomolecule of interest and a substrate analog carrying a functional group (Dozier and Distefano, 2015), such as a PEG chain. More recent approaches to site-selective modification include the SpyTag/SpyCatcher system (Schoene et al., 2014;Reddington and Howarth, 2015;Cayetano-Cruz et al., 2018;Kim et al., 2018); however, information on these methods is still limited, and further studies are under way.
The structural changes that PEG attachment introduces into proteins complicate the subsequent characterization of PEGylated proteins. The heterogeneity of PEGylation products and of the degree of PEGylation, combined with the complexity of protein structure, makes this an analytical challenge (Hutanu, 2014). Several studies have reported analytical techniques of differing complexity, from colorimetric methods to computational approaches, for the characterization of PEGylated peptides and proteins.
In the present review, we summarize both classic and novel chemical and enzymatic tools for the covalent attachment of PEG to specific sites of peptides and proteins, as well as the main analytical methods for characterizing PEGylated molecules.

Chemical Approaches for Site-Selective PEGylation
For the selective modification of specific amino acids in peptides and proteins, some knowledge of their primary structure is needed. An important physicochemical feature of proteins is the difference in pKa between the α-amino group of the N-terminal residue (~7.6) and the side chains of lysine (~10.5) and arginine (~12) (Roberts et al., 2012). This difference allows selective N-terminal modification based on pH control, for example by reductive alkylation with a PEG-aldehyde in the presence of a reducing agent such as sodium cyanoborohydride. Amino acid composition and surface exposure also vary among proteins, which can be exploited for specific conjugation. Moelbert et al. reported a surface accessibility index for the 20 standard amino acids, which indicates how often each residue is exposed in different regions of proteins relative to its natural abundance (Moelbert, 2004). It has also been reported that short peptides and proteins (fewer than 50 residues) tend to over-represent glutamine and cysteine in the N-terminal region (Villar and Koehler, 2000). Single-chain proteins possess only one N-terminal residue, which provides a unique reactive site for chemical modification (Rosen and Francis, 2017). Because virtually all proteins present this functional group, a number of valuable reactions have been developed for its selective modification (Boutureira and Bernardes, 2015).

Figure 1 (caption excerpt): The square represents the PEG functional group that covalently binds to the terminal amine of the protein. The PEG chain is end-capped with a methoxy group to prevent reactivity and enzymatic attack upon administration in mammals.
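The pH argument can be made quantitative with the Henderson-Hasselbalch equation. The Python sketch below, using the approximate pKa values quoted above (~7.6 for the N-terminal α-amine, ~10.5 for the lysine ε-amine), computes the fraction of each amine present in the reactive, unprotonated form. It is a simplification: it ignores local environment effects that shift pKa values in folded proteins, as well as differences in intrinsic nucleophilicity.

```python
"""Henderson-Hasselbalch sketch of pH-controlled N-terminal selectivity."""

PKA_N_TERMINAL = 7.6  # alpha-amine, approximate value from the text
PKA_LYSINE = 10.5     # epsilon-amine, approximate value from the text


def fraction_unprotonated(pka: float, ph: float) -> float:
    """Fraction of an amine present as the free base (R-NH2) at a given pH."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


def n_terminal_preference(ph: float) -> float:
    """Ratio of reactive N-terminal to reactive lysine amine fractions."""
    return fraction_unprotonated(PKA_N_TERMINAL, ph) / fraction_unprotonated(PKA_LYSINE, ph)


if __name__ == "__main__":
    for ph in (6.0, 7.0, 8.0, 9.0):
        print(f"pH {ph:.1f}: N-terminus/lysine free-base ratio = {n_terminal_preference(ph):.0f}")
```

At pH 7 the ratio is over 600-fold, and it falls steeply as the pH approaches the lysine pKa, which is consistent with running N-terminal acylation and alkylation at low-to-neutral pH.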
Potassium ferricyanide has also been used as an oxidizing agent to activate o-aminophenol reagents for N-terminal PEGylation (Obermeyer et al., 2014). In 2016, Song et al. described an alternative strategy for PEGylating the N-terminus of several proteins and two peptides based on the chemoselectivity of catechol (Song et al., 2016). More recently, Rosen and Francis described classical methods for the selective modification of the N-terminal amino group under pH control. These include selective acylation and alkylation of N-terminal amines at low-to-neutral pH, as well as transamination with pyridoxal-5′-phosphate (PLP), an aldehyde that condenses with both lysine ε-amines and N-terminal α-amines to form imines; only the N-terminal imine can go on to transaminate, which makes the reaction N-terminus-selective (Gilmore et al., 2006;Rosen and Francis, 2017). Chen et al. demonstrated that benzaldehyde derivatives can selectively modify native peptides and proteins at their N-termini. Preserving the positive charge on the N-terminus of the human insulin A-chain through reductive alkylation instead of acylation led to a 5-fold increase in bioactivity. They showed that, under mild conditions, aldehyde derivatives and carbohydrates can react site-specifically with peptide and protein N-termini, providing a general strategy for site-selective N-terminal functionalization of native peptides and proteins. PEG-isocyanate belongs to the group of PEG reagents used for site-specific modification of different proteins (Berberich et al., 2005;Sharma et al., 2017). Isocyanates react with amine groups to form urea linkages, whereas the related isothiocyanates form stable thiourea linkages (Ganesan et al., 2015). For example, Cabrales et al. generated PEGylated human serum albumin (PEG-HSA) by conjugating 3- and 5-kDa PEG-phenylisothiocyanate to primary amine groups of HSA, which increased the hydrodynamic volume of the protein and restored intravascular volume after hemorrhagic shock resuscitation (Cabrales et al., 2008).
Furthermore, Chen and He reported in 2015 the preparation of nanophosphors coated with PEG-isocyanate and polylactic acid (PLA) for paclitaxel delivery, which resulted in a significant improvement and could serve as a platform for drug development (Chen and He, 2015). Lee et al. synthesized a dual-functional cyclic peptide gatekeeper attached to the surface of nanocontainers, using PEG-isocyanate as a linker to enhance dispersion stability and biocompatibility. This allowed active targeting of cancer cells with high CD44 expression together with triggered drug release. Note that some PEG reagents, such as isocyanates, have a short half-life in aqueous solution (Erfani-Jabarian et al., 2012); a stoichiometric excess is therefore necessary, which complicates the removal of unreacted PEG.
A relevant one-step method for N-terminus-specific protein modification uses 2-pyridinecarboxaldehyde (2PCA) derivatives, which form a stable imidazolidinone product selectively at the N-terminus (Macdonald et al., 2015). The reaction relies on nucleophilic attack by the neighboring amide nitrogen on the electrophilic carbon of the initially formed N-terminal imine (Koniev and Wagner, 2015). For example, a 2PCA-functionalized polyacrylamide-based hydrogel has been developed to immobilize extracellular matrix proteins through their N-termini and study their biochemical and mechanical influence on cells.
The following subsections provide an overview of reactions that can be used to selectively modify specific amino acids. In some cases, the modification described is not PEGylation itself, but the same concept could be applied to introduce PEG reagents. A mechanism of N-terminal PEGylation is illustrated in Figure 1B, and general mechanisms of site-selective chemical reactions are shown in Figure 2.

Targeting Cysteine
Cysteine residues are attractive targets for residue-specific modification of peptides and proteins because they occur at low frequency (Harvey et al., 2000). They are often partially or fully buried within the protein structure, which limits their accessibility to chemical reagents (Thordarson et al., 2006). Proteins with an N-terminal cysteine have been successfully modified through native chemical ligation (NCL), in which a reversible first step forms a thioester intermediate with the cysteine thiol; this intermediate then undergoes a spontaneous S-to-N acyl shift to yield an amide bond (Johnson and Kent, 2006;Rosen and Francis, 2017). This methodology has been useful for preparing highly complex protein-polymer conjugates. Free cysteine thiols can also be targeted directly. For example, Zhao et al. PEGylated human serum albumin (HSA) site-specifically by exploiting the unusual reactivity of its single free cysteine, Cys34, and the high specificity of PEG-maleimide for protein sulfhydryl (-SH) groups. Targeting Cys34 in this way can generate a chemically well-defined, molecularly homogeneous product and may also help prevent dimerization (Zhao et al., 2012). Another technique that plays a major role in modern chemical biology and has been used in many applications is expressed protein ligation (EPL), an extension of NCL to recombinantly expressed proteins (Mitchell and Lorsch, 2014;David et al., 2015;Liu et al., 2017). In a related acyl-transfer approach, selectivity over lysine acylation was achieved through pH control by using benzaldehyde derivatives bearing selenoesters to acylate N-terminal positions (Raj et al., 2015). As N-terminal cysteines are rare in nature, they frequently need to be introduced by genetic engineering (Nguyen et al., 2014a;Uprety et al., 2014;Gunnoo and Madder, 2016).
Methionine aminopeptidase can remove the initiator methionine to expose an N-terminal cysteine (Gentle et al., 2004), and proteases whose recognition sequence places a cysteine at the new N-terminus after cleavage (Busch et al., 2008;Wissner et al., 2013) have also been used to expose N-terminal cysteines for subsequent bioconjugation.
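Whether a recombinant protein will present an N-terminal cysteine (or serine, threonine, or tryptophan, discussed below) after initiator-methionine removal can be screened from its sequence. The Python sketch below encodes a simplified form of the commonly cited methionine aminopeptidase rule, namely that the initiator Met is removed only when the next residue is small (Gly, Ala, Ser, Cys, Thr, Pro, or Val), and maps the exposed residue to the chemistries described in this section. The rule set and the function names are illustrative assumptions, not a validated predictor; actual processing depends on the expression host and sequence context.

```python
"""Sketch: predict the exposed N-terminal residue and a matching chemistry."""

# Simplified heuristic (assumption): MetAP cleaves only before small residues.
METAP_PERMISSIVE = set("GASCTPV")

# Residue-specific options discussed in the text; default is alpha-amine chemistry.
N_TERMINAL_CHEMISTRY = {
    "C": "native chemical ligation or thiol-selective reagents",
    "S": "periodate oxidation to a glyoxylyl group, then oxime/hydrazone ligation",
    "T": "periodate oxidation to a glyoxylyl group, then oxime/hydrazone ligation",
    "W": "Pictet-Spengler reaction with an aldehyde",
}
DEFAULT_CHEMISTRY = "pH-controlled acylation or reductive alkylation of the alpha-amine"


def mature_sequence(sequence: str) -> str:
    """Return the sequence after (predicted) initiator-Met removal."""
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("empty sequence")
    if seq[0] == "M" and len(seq) > 1 and seq[1] in METAP_PERMISSIVE:
        return seq[1:]
    return seq


def n_terminal_option(sequence: str) -> tuple[str, str]:
    """Exposed N-terminal residue and a chemistry that could target it."""
    first = mature_sequence(sequence)[0]
    return first, N_TERMINAL_CHEMISTRY.get(first, DEFAULT_CHEMISTRY)


if __name__ == "__main__":
    # Hypothetical constructs: Met-Cys, Met-Ser, and Met-Lys starts.
    for construct in ("MCKLVDE", "MSKGEEL", "MKLVDEG"):
        print(construct, "->", n_terminal_option(construct))
```

In the third hypothetical construct the Met is retained because Lys is not a small residue, so only generic α-amine chemistry applies.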

Targeting Serine and Threonine
An N-terminal serine or threonine offers unique opportunities because 1,2-amino alcohols are highly susceptible to periodate oxidation, which forms a glyoxylyl group that can be used to create several linkages (Xiao et al., 2005). This periodate-based approach has been the classical way to target N-terminal serines and threonines. However, excess periodate can also oxidize other residues, such as cysteines and methionines, and cause unwanted oxidative cleavage of protein glycosyl groups. Gaertner and Offord performed site-selective PEGylation of an N-terminal serine residue by oxidizing it with sodium periodate and then performing oxime or hydrazone ligation with aminooxy- or hydrazide-functionalized PEG derivatives (Gaertner and Offord, 1996). The modified proteins, interleukin (IL)-8, granulocyte colony-stimulating factor (G-CSF), and IL-1rα, fully retained their activity after PEGylation (Krall et al., 2016).

Targeting Tyrosine
Francis et al. have reported a number of efficient strategies in which tyrosine residues were modified via a three-component Mannich-type reaction, alkylation of the residue, or coupling with diazonium reagents (Tilley and Francis, 2006). Jones et al. were the first to describe direct polymer conjugation, including PEGylation, to tyrosine residues, developing a general route to polymer-peptide biohybrid materials by preferentially targeting peptide tyrosine residues with diazonium salt-terminated polymers. Aniline derivatives such as 4-aminobenzoyl-N-PEG2000-OMe are also attractive reagents for tyrosine-targeted protein modification through either diazonium coupling or three-component Mannich-type reactions (Jones et al., 2012). Recently, Mannich-type modification was applied for the first time to the reactive coloration of fibrous proteins, with promising applications in the reactive dyeing of silk (Chen et al., 2019).

Targeting Tryptophan
Peptides containing N-terminal tryptophan residues can be modified through the Pictet-Spengler reaction with aldehydes in glacial acetic acid. In this reaction, the aldehyde condenses with the N-terminal α-amine to form an imine, which then undergoes intramolecular cyclization with the indole side chain of tryptophan, forming a new stable C-C bond (Agrawal et al., 2013;Mittal et al., 2014). Li et al. applied the Pictet-Spengler reaction to peptide ligation using peptide segments bearing a C-terminal aldehyde and an N-terminal Trp; the main advantage of this reaction is that it forms a product with a stable C-C bond in a single step (Li et al., 2000). Sasaki et al. also applied the Pictet-Spengler reaction to the N-terminal labeling of horse heart myoglobin, which has an N-terminal glycine, using tryptophan methyl ester and tryptamine as coupling partners (Sasaki et al., 2008).
As an alternative to chemoselective modification, recombinant methods have also been used to incorporate unnatural amino acids (UAA) into proteins as chemical handles for bio-orthogonal conjugation (Liu and Schultz, 2010). Leucyl/phenylalanyl (L/F)-tRNA-protein transferase has efficiently transferred non-natural amino acids bearing azide and ketone groups to the N-terminus of proteins with N-terminal arginine residues, both in the presence of other peptides and in crude protein mixtures (Taki and Sisido, 2007). Although considerable progress has been made, existing N-terminal strategies still need improvement, as none of the methods reported to date offers universal sequence compatibility.

ENZYMATIC TOOLS FOR SELECTIVE PEGYLATION OF PROTEINS
Enzyme-mediated bioconjugation has gained considerable attention in recent years because biocatalysts can modify specific molecular tags under mild conditions. In this section, we briefly explore some enzymatic tools used for selective PEGylation. Among these, sortase A (SrtA) from Staphylococcus aureus has been the most widely applied enzyme for protein bioconjugation in academic research (Tsukiji and Nagamune, 2009;Popp and Ploegh, 2011;Schmidt et al., 2017;Wang et al., 2017). It catalyzes a transpeptidation between an N-terminal glycine amino group and a specific recognition sequence on a protein, usually LPXTG (where X can be any amino acid): the enzyme cleaves between Thr and Gly and links the threonine to the incoming glycine (Rosen and Francis, 2017) (Figure 3A). Although sortase A is mostly used to ligate peptides and proteins to one another, sortase-mediated ligation has also been used to attach large particles carrying PEG-stabilized proteins to the surface of cells (Tomita et al., 2013). More recently, a mutated sortase A that can ligate a broad range of α-amino acids to a C-terminally tagged protein was used to specifically PEGylate the C-terminus of human growth hormone (hGH); this site-specifically PEGylated hGH showed efficacy similar to that of wild-type hGH (Shi et al., 2018). Although no approved PEGylated drugs have yet been derived from sortagging, it could be a promising way to improve the performance of traditional PEGylated drugs.
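The sortase reaction can be summarized as a simple operation on sequence strings. The Python sketch below predicts the product of a wild-type SrtA ligation: it locates the last LPXTG motif in the substrate, keeps the sequence up to the threonine, discards the glycine and any downstream tag, and appends an N-terminal glycine nucleophile. The "GGG-PEG" label in the usage line is a hypothetical placeholder for an oligoglycine-functionalized PEG, and the glycine check reflects the wild-type enzyme only; engineered variants such as the mutant mentioned above relax it.

```python
"""Sketch of sortase A (SrtA) product prediction on sequence strings."""
import re

# LPXTG, where X is any residue; SrtA cleaves the Thr-Gly bond.
LPXTG = re.compile(r"LP[A-Z]TG")


def sortase_product(substrate: str, nucleophile: str, require_glycine: bool = True) -> str:
    """Return the predicted ligation product of substrate-LPXT + nucleophile."""
    matches = list(LPXTG.finditer(substrate))
    if not matches:
        raise ValueError("substrate has no LPXTG motif")
    if require_glycine and not nucleophile.startswith("G"):
        raise ValueError("wild-type SrtA expects an N-terminal glycine nucleophile")
    site = matches[-1]  # use the most C-terminal motif
    # Keep L-P-X-T (4 residues from the motif start); drop G and the C-terminal tag.
    return substrate[: site.start() + 4] + nucleophile


if __name__ == "__main__":
    # Hypothetical protein with a C-terminal LPETG sorting tag followed by His6.
    print(sortase_product("MAEQKLPETGGHHHHHH", "GGG-PEG"))
```

Passing `require_glycine=False` mimics, very loosely, an engineered sortase with broader nucleophile scope; it does not model any kinetic differences.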

Microbial Transglutaminases
Microbial transglutaminases (mTGases) are another class of enzymes frequently used for protein conjugation (Figure 3B). Several excellent reviews covering applications of microbial transglutaminase have been published (Mariniello and Porta, 2005;Rachel and Pelletier, 2013;Adrio and Demain, 2014;Strop, 2014). In general terms, TGases catalyze an acyl transfer reaction between the γ-carboxamide group of a protein-bound Gln residue and a variety of linear primary amines, such as the amino group of Lys (Griffin et al., 2002). For site-selective PEGylation, this approach could be ineffective because these enzymes are promiscuous toward amine substrates (Rachel and Pelletier, 2013). Nevertheless, one study examined how the properties of PEGylated human growth hormone (hGH) changed depending on whether it was generated chemically at the N-terminus or enzymatically, using TGase and a PEG reagent bearing a primary amine. Although hGH carries 13 glutamine residues, 63.3% of the reaction product was mono-PEGylated at position 141, showing a certain degree of site selectivity (Da Silva Freitas et al., 2013). Spolaore et al. studied the reactivity of IFN α-2b toward mTGase to obtain site-specific conjugates of this biopharmaceutical. Mass spectrometry showed that, among the 10 Lys and 12 Gln residues of the protein, only Gln101 and Lys164 were conjugated, using PEG-NH₂ for Gln101 and a PEG modified with carbobenzoxy-L-glutaminyl-glycine for Lys164, with retained activity and improved pharmacokinetics (Spolaore et al., 2016). A mono-PEGylated derivative of filgrastim (granulocyte colony-stimulating factor) was also prepared using mTGase.
The conjugation yielded an active protein with a single conjugation site (Gln135) and good in vivo stability (Scaramuzza et al., 2012). Although the PEGylation sites in these examples do not correspond to the N-terminal amino acid, they illustrate the partial selectivity of mTGase despite its tendency toward substrate promiscuity, and they indicate its potential for specific PEGylation and the development of innovative biopharmaceuticals. More recently, Braun et al. obtained an insulin-like growth factor 1-PEG (IGF1-PEG) conjugate for release in diseased tissue by combining enzymatic and chemical bioorthogonal coupling strategies. In this example, mTGase was used to ligate the N-terminal lysine of IGF1 to a protease-sensitive linker modified with a 30-kDa PEG (Braun et al., 2018).

Subtiligase
Subtiligase is an engineered peptide ligase derived from subtilisin by modifying its active site. The catalytic Ser221 was converted to Cys, which increases ligase activity relative to amidase activity, and Pro225 was converted to Ala to reduce steric crowding (Haridas et al., 2014). Subtiligase catalyzes ligation between a peptide C-terminal ester and a peptide N-terminal α-amine without requiring a recognition motif at the termini of either reaction partner (Lin and He, 2018) (Figure 3C). Selective modification of α-amines with subtiligase is a powerful proteomic approach for enriching new N-termini generated by protease recognition and cleavage (Wiita et al., 2014): because 80-90% of wild-type eukaryotic proteins are acetylated at the N-terminus (Polevoda and Sherman, 2003), most free α-amines belong to newly generated N-termini. This property could be exploited to attach PEG-modified peptides, improving conjugation efficiency or enabling novel therapeutic designs.

Butelase 1
Butelase 1, an enzyme isolated from the medicinal and ornamental plant Clitoria ternatea, is a high-yielding asparagine/aspartate (Asx)-specific cysteine ligase (Nguyen et al., 2014b) (Figure 3D). Although it is specific for a C-terminal Asx, it accepts most N-terminal amino acids in the acceptor and thus mediates intermolecular peptide and protein ligation (Nguyen et al., 2016). Despite its recent discovery, butelase 1 has been used for protein modification and engineering, peptide and protein ligation and labeling, peptide and protein macrocyclization, and living-cell surface labeling (Lin and He, 2018). No work has yet reported the use of butelase 1 for PEGylation. However, Nguyen et al. developed a butelase-mediated ligation method using thiodepsipeptides and applied it to the N-terminal labeling of ubiquitin and green fluorescent protein (GFP), achieving ligation yields above 95% for the model peptide and ubiquitin with a small substrate excess. This result suggests broad applicability of butelase 1 for the N-terminal modification of peptides and proteins (Nguyen et al., 2015).
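In the same spirit, a butelase 1 ligation can be sketched as a sequence operation. This Python example assumes the commonly reported butelase recognition signal, a C-terminal Asx-His-Val (N/D-H-V) tripeptide on the acyl donor: the His-Val dipeptide is released and the Asx residue is joined to the α-amine of the acceptor. The motif, the function name, and the example sequences are illustrative assumptions, and the sketch ignores acceptor-side preferences and reaction reversibility.

```python
"""Sketch of butelase 1 product prediction (assumed Asx-His-Val signal)."""
import re

# Assumption: the donor must end with Asn or Asp followed by His-Val.
BUTELASE_SIGNAL = re.compile(r"[ND]HV$")


def butelase_product(donor: str, acceptor: str) -> str:
    """Join the donor (minus the released His-Val) to the acceptor N-terminus."""
    if not BUTELASE_SIGNAL.search(donor):
        raise ValueError("donor must end with N/D-H-V")
    if not acceptor:
        raise ValueError("acceptor must provide an N-terminal residue")
    return donor[:-2] + acceptor


if __name__ == "__main__":
    # Hypothetical donor peptide and acceptor sequence.
    print(butelase_product("GKALKNHV", "GIRRK"))
```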

Lipoic Acid Ligase
Lipoic acid ligase (LplA) is another enzyme that has been exploited for protein bioconjugation. It recognizes a specific LplA acceptor peptide (LAP, GFEIDKVWYDLDA) and catalyzes the ATP-dependent attachment of a lipoate moiety to the lysine residue within LAP (Puthenveetil et al., 2009;Zhang et al., 2018) (Figure 3E). Regarding PEGylation, Plaks et al. used LplA for multisite clickable modification by incorporating azide moieties into GFP at the N-terminus and at two internal sites. Modifying the ligated azide groups with PEG, α-D-mannopyranoside, and palmitic acid yielded highly homogeneous conjugate populations, suggesting a potential route to site-specific multipoint protein PEGylation, among other modifications (Plaks et al., 2015). Other studies have combined LplA-mediated enzymatic labeling with subsequent bio-orthogonal reactions (Hauke et al., 2014;Drake et al., 2016;Gray et al., 2016), allowing site-specific labeling at the N- or C-terminus, or even at internal regions of a target protein.
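Because LplA acts on the lysine inside the LAP sequence rather than on other lysines, the attachment sites in a tagged construct can be located directly from its sequence. The Python sketch below returns the 1-based position of that lysine for every LAP copy, which is one way a multisite design (an N-terminal tag plus internal tags) might be checked. The construct in the usage lines is a hypothetical fusion, and the sketch assumes the 13-residue LAP given in the text.

```python
"""Sketch: locate LplA acceptor lysines in a LAP-tagged construct."""

LAP = "GFEIDKVWYDLDA"               # LplA acceptor peptide from the text
LAP_LYSINE_OFFSET = LAP.index("K")  # position of the lipoylated Lys within LAP


def lap_acceptor_positions(sequence: str) -> list[int]:
    """1-based positions of the LAP lysine for each LAP copy in the sequence."""
    positions = []
    start = sequence.find(LAP)
    while start != -1:
        positions.append(start + LAP_LYSINE_OFFSET + 1)
        start = sequence.find(LAP, start + 1)
    return positions


if __name__ == "__main__":
    # Hypothetical construct: N-terminal LAP, linker, protein segment, internal LAP.
    construct = LAP + "GS" + "MSKGEELF" + LAP + "GS"
    print(lap_acceptor_positions(construct))
```

The lysine inside the hypothetical "MSKGEELF" segment is not reported, reflecting the enzyme's restriction to the LAP lysine.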
Other enzymes have also been exploited for protein bioconjugation, including tubulin tyrosine ligase, which catalyzes the ATP-dependent addition of a tyrosine residue to the C-terminus of α-tubulin through a peptide bond (Schumacher et al., 2015;Zhang et al., 2018); N-myristoyltransferase, which transfers myristate from myristoyl-CoA to the N-terminal glycine of protein substrates, forming an amide linkage (Wright et al., 2010;Zhang et al., 2018); and biotin ligase, another ATP-dependent enzyme, which conjugates biotin derivatives to proteins (Howarth and Ting, 2008;Fairhead and Howarth, 2015).

ANALYTICAL METHODS FOR CHARACTERIZATION OF PEGYLATED PROTEINS
The evidence indicates that the use of PEG to improve the properties of biopharmaceuticals and diagnostic agents will continue to increase, as supported by the growing number of candidates entering clinical evaluation each year. Achieving high-quality products requires accurate methods for analyzing the parameters that characterize the molecule under study. No single technique provides complete characterization of PEGylated proteins; in many cases, a combination of techniques is needed to obtain accurate results. This section provides an overview of the analytical methods most frequently used to characterize PEGylated peptides and proteins.

High-Performance Liquid Chromatography-Mass Spectrometry
High-performance liquid chromatography (HPLC) has been used to separate and quantify free PEG and its PEGylated-protein form (PEG-conjugate). Features of a PEGylated protein such as conjugate molecular weight, polymer mass distribution, and the degree and sites of PEGylation can be measured by HPLC methods. Lee et al. used size-exclusion chromatography (SEC) and reversed-phase HPLC (RP-HPLC) mapping to assess N-terminally PEGylated EGF, demonstrating, respectively, the formation of a PEGylated macromolecule and that PEGylation occurred at the N-terminal position (Lee et al., 2003). Brand et al. separated N-terminally PEGylated retargeted tissue factor tTF-NGR by HPLC-based gel filtration, obtaining pure elution fractions of the mono-PEGylated protein, which appeared as a single clear band in SDS-PAGE and Western blotting (Brand et al., 2015). Although generally useful, HPLC conditions and detection methods must be optimized for each compound according to the specific properties of the conjugated protein.
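A practical consequence of PEG's large hydrodynamic volume is that SEC separates by size rather than by mass, so PEGylated proteins elute earlier than globular proteins of the same molecular weight, and an SEC calibration built from protein standards tends to overestimate their apparent molecular weight. The Python sketch below shows the usual log-linear calibration, log10(MW) = a + b·Ve, fitted by least squares; the standards and the elution volume are hypothetical numbers chosen only for illustration.

```python
"""Sketch of an SEC log-linear calibration (hypothetical standards)."""
import math


def fit_calibration(standards: list[tuple[float, float]]) -> tuple[float, float]:
    """Least-squares fit of log10(MW) = a + b * Ve from (Ve, MW) pairs."""
    xs = [ve for ve, _ in standards]
    ys = [math.log10(mw) for _, mw in standards]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def apparent_mw(elution_volume: float, intercept: float, slope: float) -> float:
    """Globular-protein-equivalent MW for a peak at the given elution volume."""
    return 10 ** (intercept + slope * elution_volume)


if __name__ == "__main__":
    # Hypothetical protein standards: (elution volume in mL, MW in Da).
    standards = [(10.0, 316_228.0), (12.0, 100_000.0), (14.0, 31_623.0), (16.0, 10_000.0)]
    a, b = fit_calibration(standards)
    print(f"apparent MW at 13.0 mL: {apparent_mw(13.0, a, b):.0f} Da")
```

Comparing such an apparent value with the mass expected from the protein sequence plus the PEG size gives a rough sense of how much the PEG chain inflates the hydrodynamic size.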
To improve HPLC performance and obtain a more detailed characterization of PEGylated proteins, liquid chromatography has been coupled to mass spectrometry. For decades, mass spectrometry (MS) has been the technique of choice for determining the accurate average molecular weight and degree of PEGylation of PEGylated proteins (Hutanu, 2014). Comparing PEGylated and un-PEGylated counterparts by MS and peptide mapping is used to identify and quantify PEGylation sites and to characterize impurities that can go undetected by simpler techniques (Caserman et al., 2009). Collins et al. performed N-terminal amine PEGylation to stabilize oxytocin formulations for prolonged storage; conjugation was confirmed by matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) MS, which showed a clear molecular weight shift from the NHS ester polymer to the polymer-peptide conjugate (Collins et al., 2016). In another study, Qin et al. used MALDI-TOF MS and comparative peptide mapping of rhGH (recombinant human growth hormone) and PEG-rhGH to determine PEG modification sites. MS made it possible to discriminate positional isomers, with potential PEGylation sites at the N-terminus and at nine lysine residues of rhGH (Qin et al., 2017). However, exact determination of PEG attachment sites remains highly challenging, especially in mixtures of products with differing degrees of PEGylation (Gerislioglu et al., 2018). ESI-TOF approaches have overcome some of the difficulties caused by polydispersity and the overlapping charge-state patterns of PEGylated proteins (Forstenlehner et al., 2014), and ESI-MS is preferred to MALDI because of its automated workflow and shorter sample preparation time (Hutanu, 2014).
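Assigning the degree of PEGylation from an MS spectrum reduces to simple arithmetic: each attached chain adds roughly the nominal PEG mass. The Python sketch below computes expected masses for 0 to k chains and assigns an observed peak to the nearest integer degree, rejecting peaks that fall too far from any integer. The masses in the usage lines (a ~22.1-kDa protein and a 20-kDa PEG) are hypothetical, the linker mass is ignored, and because real PEG reagents are polydisperse, observed peaks are broad and the tolerance must be chosen accordingly.

```python
"""Sketch: assign the degree of PEGylation from an observed average mass."""


def expected_masses(protein_da: float, peg_da: float, max_degree: int) -> dict[int, float]:
    """Average masses for 0..max_degree attached PEG chains (linker mass ignored)."""
    return {k: protein_da + k * peg_da for k in range(max_degree + 1)}


def degree_of_pegylation(observed_da: float, protein_da: float, peg_da: float,
                         tolerance: float = 0.1) -> int:
    """Nearest integer number of PEG chains; raise if the peak fits none."""
    estimate = (observed_da - protein_da) / peg_da
    degree = round(estimate)
    if degree < 0 or abs(estimate - degree) > tolerance:
        raise ValueError(f"mass {observed_da} Da does not match an integer degree")
    return degree


if __name__ == "__main__":
    protein, peg = 22_125.0, 20_000.0  # hypothetical values
    print(expected_masses(protein, peg, 3))
    for peak in (22_130.0, 42_150.0, 62_100.0):
        print(peak, "->", degree_of_pegylation(peak, protein, peg))
```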
Several studies have reported coupling liquid chromatography to MS (LC-MS) for the sensitive quantitation of free PEG in biological fluids (Pelham et al., 2008;Yin et al., 2017) or tissues (Gong et al., 2014), for the clear detection and identification of positional isomers formed upon PEGylation (Gerislioglu et al., 2018;Shekhawat et al., 2019), and for obtaining significant structural information from heterogeneous samples of PEGylated proteins (Muneeruddin et al., 2017), among other applications.
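Free PEG appears in LC-MS as a ladder of oligomer ions spaced by one ethylene oxide unit, and for multiply charged ions the spacing shrinks to 44/z, which is one reason overlapping charge states complicate spectra of polydisperse PEG species. The sketch below computes theoretical m/z values for protonated [M + zH]z+ ions of HO-(CH2CH2O)n-H from monoisotopic masses. It ignores the sodium and ammonium adducts that PEG also commonly forms, and the function names are illustrative only.

```python
PROTON_DA = 1.007276       # mass of a proton
EO_MONO_DA = 44.026215     # monoisotopic mass of the C2H4O repeat unit
WATER_MONO_DA = 18.010565  # monoisotopic mass of H2O (HO- and -H end groups)


def peg_oligomer_mass(n):
    """Monoisotopic neutral mass of HO-(CH2CH2O)n-H with n repeat units."""
    return n * EO_MONO_DA + WATER_MONO_DA


def mz(neutral_mass, z):
    """m/z of the protonated [M + zH]z+ ion."""
    if z < 1:
        raise ValueError("charge state must be a positive integer")
    return (neutral_mass + z * PROTON_DA) / z


def ladder(n_min, n_max, z=1):
    """(n, m/z) pairs for a run of PEG oligomers at a single charge state."""
    return [(n, round(mz(peg_oligomer_mass(n), z), 4)) for n in range(n_min, n_max + 1)]


# Singly and doubly protonated ions of the 9- to 12-mers
for n, value in ladder(9, 12, z=1):
    print(f"n={n:2d}  [M+H]+   m/z {value}")
for n, value in ladder(9, 12, z=2):
    print(f"n={n:2d}  [M+2H]2+ m/z {value}")
```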

Dynamic Light Scattering
Dynamic light scattering (DLS) is an additional technique useful for evaluating the size, and indirectly the molecular weight, of PEGylated proteins, as it measures the hydrodynamic radii of the molecules in a sample (Kusterle et al., 2008;Gokarn et al., 2012) and can discriminate between linear and branched PEGs (Wan et al., 2017). Vernet et al. used this method, among others, in 2016 in the first large-scale study of site-specific mono-PEGylation, covering 15 different proteins and the characterization of 61 entities in total (Vernet et al., 2016). In addition, Khameneh et al. prepared site-specifically PEGylated hGH using microbial transglutaminase and evaluated the size and zeta potential of native and PEGylated hGH by DLS; PEGylation increased the size and decreased the zeta potential of the protein, enhancing its stability (Khameneh et al., 2016). Recently, Meneguetti et al. applied DLS to characterize a novel N-terminally PEGylated asparaginase (ASNase), showing that PEGylation increased the hydrodynamic diameter of the protein and that this increase grew with the amount of PEG attached (Meneguetti et al., 2019). The DLS approach has been used to characterize not only PEGylated proteins but also PEGylated organic nanotubes, revealing that PEGylation dramatically improves the dispersibility of the nanotubes in saline buffer (Ding et al., 2014). Despite its wide use in determining the hydrodynamic radius of PEGylated proteins, this method has certain disadvantages: large particles present in the sample are also detected and can dominate the signal; resolution is low when populations are close in size or the sample is highly polydisperse; and light absorption by the dispersant can interfere with detection, while the viscosity of the dispersant and the density of the particles also affect the measurement.
These are important parameters to take into account when carrying out this type of analysis.
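The hydrodynamic sizes discussed above are not measured directly: DLS obtains a translational diffusion coefficient D from the decay of the intensity autocorrelation function and converts it to a hydrodynamic radius with the Stokes-Einstein equation, R_h = k_B T / (6 π η D). In the simplest (first-cumulant) treatment, the decay rate Γ relates to D through Γ = D q², where q = (4π n / λ) sin(θ/2). The sketch below assumes a dilute sample of spherical particles in water at 25 °C (η ≈ 0.89 mPa·s, n ≈ 1.33), a 633 nm laser with backscatter detection, and an invented decay rate. It also makes the dispersant-viscosity caveat concrete, since R_h scales as 1/η.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23  # exact SI value


def scattering_vector(refractive_index, wavelength_m, angle_deg):
    """Magnitude of the scattering vector q in 1/m."""
    return 4 * math.pi * refractive_index / wavelength_m * math.sin(math.radians(angle_deg) / 2)


def diffusion_coefficient(decay_rate_per_s, q):
    """First-cumulant relation Gamma = D * q**2, so D = Gamma / q**2 (m^2/s)."""
    return decay_rate_per_s / q ** 2


def hydrodynamic_radius_nm(diffusion_m2_per_s, temperature_k=298.15, viscosity_pa_s=0.00089):
    """Stokes-Einstein: R_h = k_B T / (6 pi eta D), returned in nanometres."""
    r_m = BOLTZMANN_J_PER_K * temperature_k / (6 * math.pi * viscosity_pa_s * diffusion_m2_per_s)
    return r_m * 1e9


# Illustrative measurement: 633 nm laser, 173 degree detection, aqueous buffer
q = scattering_vector(1.33, 633e-9, 173.0)
d = diffusion_coefficient(2.0e4, q)  # hypothetical decay rate of 2.0e4 1/s
radius = hydrodynamic_radius_nm(d)
```

Commercial instruments also correct the viscosity for temperature and use more elaborate fits for polydisperse samples; this sketch only shows the core conversion.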

Nuclear Magnetic Resonance
¹H nuclear magnetic resonance (NMR) spectroscopy is useful for quantifying PEGylated species in complex biological fluids, with the advantages of short analysis time and simple sample preparation (Alvares et al., 2016). Applied to the structural characterization of PEG conjugates (Kiss et al., 2018), this technique has proven useful for the quantitative determination of the degree of PEGylation (Zaghmi et al., 2019), the assessment of the higher-order structure of PEGylated therapeutic proteins (Cattani et al., 2015;Hodgson and Aubin, 2017), and even the study of the behavior of free PEG in serum samples (Khandelwal et al., 2019). More recently, solid-state NMR has been used for the structural characterization of large PEGylated proteins such as asparaginase (Giuntini et al., 2017;Cerofolini et al., 2019). The combination of NMR with other techniques such as LC-MS/MS has enabled the accurate quantification of isobaric glycan structures, even at picomolar levels (Wiegandt and Meyer, 2014), an approach that could be used to better characterize highly complex PEGylated molecules.
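One common way to obtain a degree of PEGylation from a ¹H spectrum is to compare the integral of the intense PEG backbone singlet (near 3.6-3.7 ppm, four protons per ethylene oxide unit) with that of a protein or internal-standard signal of known proton count. The sketch below assumes fully relaxed, baseline-corrected integrals, a known number of repeat units per PEG chain, and a hypothetical reference signal. It illustrates the ratio only and is not the specific procedure of the studies cited above.

```python
def nmr_degree_of_pegylation(i_peg, eo_units_per_chain, i_ref, h_ref):
    """Moles of PEG chains per mole of protein from 1H NMR integrals.

    i_peg: integral of the PEG -CH2CH2O- singlet (4 H per ethylene oxide unit)
    eo_units_per_chain: repeat units in one PEG chain (about 454 for 20 kDa PEG)
    i_ref: integral of a reference signal assigned to the protein
    h_ref: number of protons giving rise to that reference signal
    """
    if eo_units_per_chain <= 0 or h_ref <= 0:
        raise ValueError("proton counts must be positive")
    peg_chains = i_peg / (4 * eo_units_per_chain)  # relative moles of PEG chains
    protein = i_ref / h_ref                        # relative moles of protein
    return peg_chains / protein


# Hypothetical integrals for a mono-PEGylated protein carrying a 20 kDa PEG
dop = nmr_degree_of_pegylation(i_peg=1816.0, eo_units_per_chain=454, i_ref=3.0, h_ref=3)
```

Because the PEG signal is so much larger than any single protein resonance, small integration errors on the reference peak dominate the uncertainty of this ratio.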

Immunoenzymatic Assays
Enzyme-linked immunosorbent assay (ELISA) is a powerful tool for measuring the concentration of PEGylated proteins in serum samples. This technique permits the study of the effects of PEGylation on protein immunogenicity as well as of the anti-PEG immune response (Wan et al., 2017). While direct ELISA has the advantage of requiring only one specific antibody for compound detection, it cannot distinguish between PEGylated and un-PEGylated proteins (Cao et al., 2009). Sandwich ELISA, on the other hand, employs two antibodies: one to capture the analyte on a solid surface and a second to determine the concentration of the captured analyte (Gan and Patel, 2013). Bruno et al. used a quantitative sandwich ELISA to analyze the pharmacokinetics of Pegasys and PEG-Intron with two mouse monoclonal anti-human IFN-α antibodies that recognize different epitopes of IFN-α (Bruno et al., 2004), and similar ELISAs were used to measure Neulasta® (Roskos et al., 2006) and Mircera® (Macdougall et al., 2006). Su et al. produced second-generation anti-PEG monoclonal antibodies (AGP4/IgM and 3.3/IgG) that also bind to the repeating subunits of the PEG backbone, but with greater affinity than the first-generation AGP3 and E11 (Su et al., 2010). Since then, they have produced a range of specific anti-PEG IgG and IgM monoclonal antibodies for use in ELISA, flow cytometry (FACS), and immunohistochemistry (IHC), which are listed under anti-PEG by the Institute of Biomedical Sciences at Academia Sinica, Taiwan.
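Sandwich ELISA readouts such as those described above are usually converted into concentrations by interpolation on a standard curve, commonly a four-parameter logistic (4PL) model y = d + (a - d) / (1 + (x/c)^b). The sketch below assumes the curve parameters have already been fitted to the calibrators (the values are invented), back-calculates the concentration of a diluted serum sample, and refuses signals outside the asymptotes, where the inverse is undefined.

```python
def four_pl(x, a, b, c, d):
    """4PL response: a = signal at zero dose, d = signal at infinite dose,
    c = inflection-point concentration, b = slope factor."""
    return d + (a - d) / (1 + (x / c) ** b)


def inverse_four_pl(y, a, b, c, d):
    """Concentration that produces signal y on the fitted 4PL curve."""
    low, high = min(a, d), max(a, d)
    if not low < y < high:
        raise ValueError("signal lies outside the asymptotes of the standard curve")
    return c * ((a - d) / (y - d) - 1) ** (1 / b)


# Hypothetical fitted parameters (absorbance vs ng/mL) and a 1:100 diluted serum sample
params = dict(a=0.05, b=1.2, c=10.0, d=2.5)
measured_od = 1.275
dilution_factor = 100
serum_conc_ng_ml = inverse_four_pl(measured_od, **params) * dilution_factor
```

In practice, only signals within the validated range of the curve (between the lower and upper limits of quantification) would be reported, which is stricter than the asymptote check used here.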

Bioinformatics Methods
With the advent of bioinformatics, computational methods have been effectively employed to facilitate the design, engineering, and characterization of proteins, supporting experimental methodologies and, in many cases, saving time and materials. At present, computational analysis is highly recommended for selecting the proper position on the protein for site-selective PEGylation (Rouhani et al., 2018). In 2013, Mu et al. conducted a bioinformatics study in which four forms of PEGylated staphylokinase (SK), obtained by site-specific conjugation of PEG to its N- or C-terminus, were structurally evaluated to provide greater molecular insight into the interaction between the PEGylated protein and its receptor (Mu et al., 2013). The results suggested that the PEG polymer wraps around the protein, providing a steric shield, and that this effect depends on the PEG chain length and the PEGylation site (Rouhani et al., 2018b). Mirzaei et al. (2016) combined computational analysis with non-glycosylated systems to define a straightforward methodology for site-selective (cysteine) PEGylation of erythropoietin analogs. The results showed that combining an in silico approach with experimental methodologies can be a strategy for optimizing PEGylation parameters (Mirzaei et al., 2016). More recently, Xu et al. (2018) used interferon (IFN) as a model system to characterize, through molecular dynamics simulations, the molecular-level changes introduced in IFN by different degrees of PEGylation. The simulations provided molecular evidence directly linked to the improved stability, bioavailability, and retention time of the protein, as well as to the decrease in its bioactivity upon PEG conjugation, establishing an important computational approach for improving PEGylated protein drug conjugates and their clinical performance (Xu et al., 2018a).
Despite these advances, some drawbacks remain to be solved, such as the computational cost in terms of infrastructure and the frequent difficulty of interpreting the biological or clinical meaning of features identified by bioinformatics analysis.
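Molecular dynamics studies like those above often summarize how compact a conjugate is, or how far the PEG chain extends around the protein, with the radius of gyration, R_g = sqrt(Σ m_i |r_i - r_cm|² / Σ m_i). The following sketch computes it for a single frame from plain coordinate lists. The toy coordinates are invented, and a real analysis would read trajectories with a dedicated MD analysis package and average over many frames.

```python
import math


def radius_of_gyration(coords, masses=None):
    """Mass-weighted radius of gyration of one frame.

    coords: list of (x, y, z) tuples, e.g. in nanometres
    masses: optional list of atomic masses; equal weights are used if omitted
    """
    if masses is None:
        masses = [1.0] * len(coords)
    if len(masses) != len(coords) or not coords:
        raise ValueError("need one mass per coordinate and at least one atom")
    total = sum(masses)
    # Mass-weighted centre of the selection
    center = [sum(m * p[k] for m, p in zip(masses, coords)) / total for k in range(3)]
    # Mass-weighted mean squared distance from that centre
    sq = sum(m * sum((p[k] - center[k]) ** 2 for k in range(3))
             for m, p in zip(masses, coords))
    return math.sqrt(sq / total)


# Toy frames: a compact arrangement versus the same four atoms spread out
compact = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
extended = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 3.0, 0.0), (3.0, 3.0, 0.0)]
print(radius_of_gyration(compact), radius_of_gyration(extended))
```

Tracking R_g of the PEG atoms and of the protein separately over a trajectory is one simple way to see whether the polymer stays extended or collapses onto the protein surface.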

RECENT APPROACHES IN THE SITE-SELECTIVE CONJUGATION OF PROTEINS
The chemistry of natural amino acids has been widely exploited for protein bioconjugation. However, it often offers poor control over the modification site, can yield multiple modifications, and may be incompatible with complex mixtures or living systems (Reddington and Howarth, 2015). Since the manipulation of proteins is at the core of biochemical research, the search for new strategies for efficient and specific bioconjugation has become an important objective of the scientific community, pursued largely through protein engineering. These strategies include, for example, the SpyTag/SpyCatcher system.

The SpyTag/SpyCatcher System
The SpyTag/SpyCatcher system allows the specific, covalent conjugation of proteins through two genetically encodable partners: a short peptide, SpyTag, and a larger protein domain, SpyCatcher (Zakeri et al., 2012). SpyCatcher adopts an immunoglobulin-like conformation that specifically binds SpyTag and spontaneously forms an extremely resistant intermolecular isopeptide bond between two amino acid side chains, Lys-31 of SpyCatcher and the γ-carbon of Asp-117 of SpyTag (Gilbert et al., 2017). This extremely fast reaction requires no exogenous enzymes to be added or removed (Fisher et al., 2017), and despite its recent description, the system has already been used to produce synthetic vaccines (Brune et al., 2016) and thermostable enzymes (Schoene et al., 2016;Wang et al., 2016), among other applications (Dovala et al., 2016;Lakshmanan et al., 2016). Taking advantage of this system, Gilbert et al. described how the XynA enzyme was genetically encoded for covalent conjugation in the culture medium, providing a novel and flexible strategy for protein conjugation that exploits the substantial advantages of extracellular self-assembly (Gilbert et al., 2017). Recently, Cayetano-Cruz et al. published a study in which the α-glucosidase Ima1p of Saccharomyces cerevisiae was attached to the surface of virus-like particles (VLPs) of parvovirus B19 using the SpyTag/SpyCatcher system. This approach yielded a more thermostable enzyme, and the modified VLPs were also able to act on glycogen. Hence, these particles may be developed in the future as part of therapies for diseases caused by defects in human acid α-glucosidase (Cayetano-Cruz et al., 2018). Because SpyCatcher is large, it may be difficult to attach to polymers, and the final product necessarily retains the SpyCatcher protein sequence (Zakeri et al., 2012). This could explain why no study to date has reported using this system to modify proteins with PEG.
Nevertheless, it is a promising mechanism for creating PEGylated proteins, given that SpyTag can be placed at the N-terminus, at the C-terminus, or at internal positions of a protein (Zakeri et al., 2012), and that one partner could first be bound to the polymer (e.g., PEG) to be conjugated.

Ring Opening Polymerization
Ring-opening polymerization (ROP) is a reaction in which the terminal end of a polymer chain acts as a reactive center where additional cyclic monomers react by opening their ring system, forming a longer polymer chain (Jenkins et al., 1996); it involves two main reactions, initiation and growth, or propagation (Penczek and Pretula, 2016). In 2013, Spears et al. were the first to use ROP for the in situ controlled branching of polyglycidol, forming BSA-glycidol bioconjugates with "PEG-like" arms (Spears et al., 2013;Qi and Chilkoti, 2015). Since then, ROP has been used to modify various molecules and to obtain a variety of polymers. Ma et al. prepared a cross-linked fluorescent polymer through ROP and performed a subsequent ring-opening PEGylation with 4-arm PEG-amine, yielding polymeric nanoparticles in aqueous solution with hydrophilic PEG groups covering their surface (Ma et al., 2015). Tian et al. (2018) developed smart polymeric materials based on biomimetic PEGylated polypeptoids by combining ring-opening polymerization with a post-modification strategy (Tian et al., 2018). Furthermore, the usefulness of this approach has been established in the preparation of PEGylated fluorescent nanoprobes for biomedical applications (Wan et al., 2015;Xu et al., 2018b) and in the development of polymeric gene vectors with high transfection efficiency and improved biocompatibility (Xiao et al., 2018). All this demonstrates the potential of ROP for designing PEGylated proteins of biopharmaceutical interest or other molecules used in the diagnosis of different diseases.
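For a controlled ("living") ring-opening polymerization, chain length follows directly from stoichiometry: the number-average degree of polymerization is DP_n = ([M]0/[I]0) × conversion per initiating site, and M_n ≈ M_initiator + DP_n × M_monomer. The sketch below applies these textbook relations assuming ideal living behavior, one growing chain per initiating site, and no chain transfer; branched polyglycidol grown from a protein, as in the studies above, will deviate from this idealization. The monomer molar masses are for glycidol (C3H6O2) and ethylene oxide (C2H4O); the reaction amounts and the methanol-derived initiator are hypothetical.

```python
GLYCIDOL_G_PER_MOL = 74.08        # C3H6O2
ETHYLENE_OXIDE_G_PER_MOL = 44.05  # C2H4O, the PEG repeat unit


def degree_of_polymerization(monomer_mol, initiator_sites_mol, conversion):
    """Ideal living ROP: DP_n = ([M]0 / [I]0) * conversion per initiating site."""
    if not 0.0 <= conversion <= 1.0:
        raise ValueError("conversion must be between 0 and 1")
    if initiator_sites_mol <= 0:
        raise ValueError("initiator amount must be positive")
    return monomer_mol / initiator_sites_mol * conversion


def number_average_mn(dp, monomer_g_per_mol, initiator_g_per_mol=0.0, arms=1):
    """M_n of the whole molecule: initiator core plus `arms` chains of length dp."""
    return initiator_g_per_mol + arms * dp * monomer_g_per_mol


# Hypothetical run: 100 mmol ethylene oxide, 1 mmol monofunctional initiator,
# 90 % conversion; 32.04 g/mol accounts for CH3O- and -H end groups (methanol).
dp = degree_of_polymerization(100.0, 1.0, 0.9)
mn = number_average_mn(dp, ETHYLENE_OXIDE_G_PER_MOL, initiator_g_per_mol=32.04)
```

The same relations explain why the monomer-to-initiator ratio is the main handle for targeting a given polymer length in grafting-from approaches.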

Click Chemistry
Click chemistry is another method widely used to attach PEG to proteins for different purposes (Jølck et al., 2010;Leung et al., 2012;Li et al., 2012;Xu et al., 2015;Huang et al., 2018;Lou et al., 2018). In the copper(I)-catalyzed azide-alkyne cycloaddition, azide and alkyne groups react selectively with each other in the presence of Cu(I) as the catalyst (Rostovtsev et al., 2002). In one workflow, reduced thiols are first reacted with a maleimide compound containing a click-reactive alkyne moiety; a large PEG molecule containing a complementary click-reactive azide moiety is then selectively conjugated to the click-tagged thiols (van Leeuwen et al., 2017). This method is versatile, fast, simple to use, and site-specific, and it gives high product yields that are easy to purify (Hein et al., 2008); however, its drawback is the toxicity of copper, even in small amounts. This could limit the development of pharmaceuticals using this methodology, and as a result, PEGylation via copper-free click reactions has attracted increasing attention (Debets et al., 2010;Koo et al., 2012;Lou et al., 2018). In these copper-free reactions, the conditions are extremely mild and do not cause protein denaturation, and no metals, reducing agents, or ligands are required.

Non-Covalent PEGylation
Non-covalent PEGylation is an innovative approach that avoids a chemical reaction between protein and PEG (Reichert and Borchard, 2016). It is based on hydrophobic interactions (Mueller et al., 2011a;Mueller et al., 2011b;Mueller et al., 2012), ionic interactions (Khondee et al., 2011), protein-polyelectrolyte complexes (Kurinomaru and Shiraki, 2015;Kurinomaru et al., 2017), or chelation (Mero et al., 2011). The main advantage of this technique is that it eliminates potential product loss during additional purification steps (Reichert and Borchard, 2016). However, release of the protein during storage is an important shortcoming of this approach (Santos et al., 2018).

CONCLUDING REMARKS
The covalent attachment of peptides and proteins to polyethylene glycol remains a preferred method for modifying the pharmacokinetic and immunological properties of therapeutic molecules, as supported not only by the PEGylated drugs already on the market but also by the increasing number of ongoing clinical studies. The chemical versatility of polyethylene glycol derivatives enables the synthesis of various PEGylated protein structures, with a trend toward targeting specific amino acid residues located at the termini (N- or C-terminus) of the peptide or protein of interest, which helps to obtain homogeneous, well-defined conjugates. These site-selective modifications must preserve the biological activity of the PEGylated molecule. As the science of PEGylation develops, new methods continue to be implemented based on new approaches and on faster, more efficient techniques, such as enzymatic ligation and bio-orthogonal chemistry. Because the number and location of PEG chains attached to a protein can affect its activity, it is critical to uncover these structural details. Thus, analytical methods must be developed that allow qualitative and quantitative characterization with greater robustness and accuracy. In this sense, computational tools (predictive models based on molecular dynamics) are of great help in clarifying the interactions, binding sites, and stability of PEGylated proteins in the ongoing search for, and design of, new and more effective biopharmaceuticals.

AUTHOR CONTRIBUTIONS
LB and CR-Y: writing of the topic related to the chemical reactions of site-specific PEGylation. JBL and ML-E: writing of the topic related to enzymatic PEGylation. BE and RC: writing of the topic related to the characterization of PEGylated proteins. AP and JF: critical revision and correction of the manuscript.