The E4 protein; structure, function and patterns of expression

The papillomavirus E4 open reading frame (ORF) is contained within the E2 ORF, with the primary E4 gene-product (E1 ∧ E4) being translated from a spliced mRNA that includes the E1 initiation codon and adjacent sequences. E4 is located centrally within the E2 gene, in a region that encodes the E2 protein's flexible hinge domain. Although a number of minor E4 transcripts have been reported, it is the product of the abundant E1 ∧ E4 mRNA that has been most extensively analysed. During the papillomavirus life cycle, the E1 ∧ E4 gene products generally become detectable at the onset of vegetative viral genome amplification as the late stages of infection begin. E4 contributes to genome amplification success and virus synthesis, with its high level of expression suggesting additional roles in virus release and/or transmission. In general, E4 is easily visualised in biopsy material by immunostaining, and can be detected in lesions caused by diverse papillomavirus types, including those of dogs, rabbits and cattle as well as humans. The E4 protein can serve as a biomarker of active virus infection, and in the case of high-risk human types also disease severity. In some cutaneous lesions, E4 can be expressed at higher levels than the virion coat proteins, and can account for as much as 30% of total lesional protein content. The E4 proteins of the Beta, Gamma and Mu HPV types assemble into distinctive cytoplasmic, and sometimes nuclear, inclusion granules. In general, the E4 proteins are expressed before L2 and L1, with their structure and function being modified, first by kinases as the infected cell progresses through the S and G2 cell cycle phases, but also by proteases as the cell exits the cell cycle and undergoes true terminal differentiation. The kinases that regulate E4 also affect other viral proteins simultaneously, and include protein kinase A, Cyclin-dependent kinase, members of the MAP Kinase family and protein kinase C.
For HPV16 E1 ∧ E4, these kinases regulate one of the E1 ∧ E4 protein's main functions, the association with the cellular keratin network, and eventually also its cleavage by the protease calpain, which allows assembly into amyloid-like fibres and reorganisation of the keratin network. Although the E4 proteins of different HPV types appear divergent at the level of their primary amino acid sequence, they share a recognisable modular organisation and pattern of expression, which may underlie conserved functions and regulation. Assembly into higher-order multimers and suppression of cell proliferation are common to all E4 proteins examined. Although not yet formally demonstrated, a role in virus release and transmission remains a likely function for E4.

breakdown products of the abundant skin keratins, but were subsequently shown to be the products of the viral E4 ORF (Doorbar et al., 1986, 1988). Because of the great abundance of E4 in HPV1 warts (see Fig. 1), much of the early characterisation of E4 expression was carried out on this HPV type (Doorbar et al., 1986; Grand et al., 1989; Rogel-Gaillard et al., 1992; Roberts et al., 1994), with studies on the high-risk HPV types following (Roberts et al., 1997; Pray and Laimins, 1995; Brown et al., 1994). Indeed, the abundance of E4 in lesions has suggested a role as a biomarker of active HPV disease (Borgogna et al., 2012), and at the cervix, a marker of disease severity (Griffin et al., 2012; Doorbar and Cubie, 2005). The E4 proteins of all HPVs, and probably most animal papillomavirus types, have some noticeable structural similarities (see detail provided in Tables 1-3; Katoh et al., 2002), but have diverged considerably in their primary amino acid sequences (see Figs. S1 and S2), and probably also in the subtleties of their function.

E4 expression and its contribution to life-cycle success
Most molecular studies on E4 have focused on HPV1 from the Mu Genus (see Fig. 1), and HPV16 from the Alpha Genus (Species Group 9; Fig. 2, Table 1), with additional analysis coming from the study of other Alpha papillomaviruses including HPV11, 18 and 31 (Pray and Laimins, 1995; Brown et al., 1991; Frattini et al., 1997). The E4 proteins have however been detected in lesions caused by a much wider range of papillomavirus types, including HPV2 (Alpha), HPV3 (Alpha), HPV4 (Gamma), HPV5 (Beta), HPV6 (low risk (LR) Alpha), HPV8 (Beta), HPV11 (LR Alpha), HPV58 (HR Alpha), HPV59 (HR Alpha), HPV63 (Mu) and HPV65 (Gamma) (Fig. 3; Borgogna et al., 2012; Brown et al., 2004), as well as several animal papillomaviruses including Canine Oral Papillomavirus (COPV), Rabbit Oral Papillomavirus (ROPV; Maglennon et al., 2011), Cottontail Rabbit Papillomavirus (CRPV; Peh et al., 2004) and Bovine Papillomavirus (BPV; Peh et al., 2002). In each case, the detection of E4 by immunofluorescence is simple and straightforward, suggesting that these papillomaviruses resemble HPV1 to some extent in their ability to express their E4 proteins at high level. Such comparative analysis has also revealed broad similarities in the timing of expression, with all papillomaviruses that have been examined showing prominent E4 accumulation in the mid and upper epithelial layers during productive infection (Maglennon et al., 2011; Peh et al., 2002; Doorbar et al., 1997) (see protein distribution images in Figs. 1 and 2). A close association is apparent between the first detection of E4 by immunofluorescence and the onset of vegetative viral DNA replication or genome amplification, which when considered alongside its abundance, points to an important role for E4 in the late stages of the virus life cycle (Peh et al., 2002).
Historically, the E4 ORF was classified as an early viral gene (Chen et al., 1982; Danos et al., 1982), because it lies in the early region of the viral genome and is embedded amongst viral genes that regulate cell-cycle entry and genome maintenance. However, no obvious function for E4 during the early stages of the virus life cycle has yet been convincingly described. Early studies on BPV showed that it was not involved in cell transformation by this virus (Neary et al., 1987), while E4 null mutants of CRPV were not apparently compromised in their ability to produce papillomas in either domestic or cottontail rabbits (Peh et al., 2004), which is consistent with a primary role during the late rather than the early stages of infection. Amongst HPVs, only the high-risk HPV types have been extensively studied, partly because of the availability of convenient model systems for these viruses (Frattini et al., 1997; Flores et al., 1999; Hummel et al., 1992) (Fig. 2). In experimental systems, E4 null mutants of HPV16, 18 and 31 are not significantly perturbed in their ability to be maintained in proliferating 'basal-like' keratinocytes in monolayer culture, and do not show consistent differences in their ability to drive cell proliferation (Nakahara et al., 2005; Wilson et al., 2005, 2007).
While these studies do not rule out an early function for E4, they have not yet allowed such a function to be identified. By contrast, both animal studies using CRPV (Peh et al., 2001) and organotypic raft studies on HR HPV types (Nakahara et al., 2005; Wilson et al., 2005, 2007) have suggested a role for E4 in modulating genome amplification and virus synthesis. A single study that applied the organotypic raft approach to HPV11 (a low-risk HPV type) failed to report a dramatic effect on these functions. This may reflect a less significant involvement of the low-risk E4 proteins in these late events and/or differences in genome maintenance and amplification requirements between the high and low-risk HPVs. LR HPV types such as HPV11 do not replicate well in standard keratinocyte monolayer culture, however, which makes analysis of life-cycle events in organotypic culture difficult (Thomas et al., 2001; Oh et al., 2004). The analysis of HPV11 E4 in genital warts has however suggested that this protein can become cross-linked by endogenous transglutaminase to the cornified envelope and that the integrity of the cornified squame is compromised (Brown et al., 2006; Bryan and Brown, 2000). It is thought that such 'fragile' squames may facilitate efficient person to person transmission (Bryan and Brown, 2001).
Fig. 1. E4 inclusion granules are typically found in cutaneous HPV types (Mu, Beta, Gamma Genera). Cutaneous HPV types from the Mu, Beta and Gamma Genera typically accumulate their E4 proteins to high level as inclusion granules visible by Haematoxylin & Eosin staining (granular structures arrowed in (A)). The precise composition of these inclusion granules is not known, but they are thought to be made up primarily of E4 (Rogel-Gaillard et al., 1992). The appearance of such E4 inclusion granules varies depending on HPV type (Egawa, 1994; Gissmann et al., 1977; Gross et al., 1982). The lesion shown in (A) is caused by a Mu HPV type (HPV63), but E4 inclusion granules are also apparent in lesions caused by Beta and Gamma HPV types. The E4 inclusion granules are predominantly cytoplasmic, but can also be found in the nucleus. The individual electron-dense E4 granule shown in (B(i)) is imaged from the granular layer of a related Mu HPV type (HPV1) and is stained using a primary antibody to E4 and detected using a gold-conjugated secondary antibody. E4-staining is visualised as dots on the E4 granule (arrowed). It appears that in cells with intact nuclei, there is little association between E4 and the virus particles, which form arrays around the intranuclear E4 inclusions. Following nuclear degeneration in the upper epithelial layers, virus particles enter the cytoplasm of the terminally differentiating epithelial cells and appear by immunostaining to be closely associated with the abundant cytoplasmic E4 proteins (Doorbar et al., 1997). In the case of the Mu HPV types, E4 can be detected in purified virus preparations (Doorbar and Gallimore, 1987), with virions often being associated with blebs (possibly E4) after density gradient centrifugation (B(ii)). In naturally-occurring papillomas caused by Mu HPV types, E4 expression begins in the lower epithelial layers as visualised by indirect immunofluorescence (C(i)) and (C(ii)). The green stain is E4, and the cytoplasmic inclusion granules are arrowed in (C(i)) (nuclei shown in blue after DAPI counterstain). The red stain in (C(i)) reveals the distribution of amplified viral DNA following Fluorescent In Situ Hybridisation (FISH) using a viral DNA probe. Genome amplification is generally confined to E4-positive cells, which in the lower layers are also positive for cellular replication proteins such as MCM (red stain in (C(ii))). It is generally thought that viral genome amplification occurs predominantly in E4-positive/replication-competent cells, which (in HPV1) are also positive for the G2 cell cycle marker CyclinB/Cdk1 (Davy et al., 2005).
Table 1. The 16 E1 ∧ E4 protein is structurally and functionally modified by cellular kinases and proteases as the infected cell undergoes terminal differentiation. These changes are mediated partly by the viral gene products that drive cell cycle entry and affect normal cell signalling (e.g. E6, E7 and E5), and partly by the differentiating environment within the epithelium. Changes in the cellular environment are thought to modulate the functions of viral and cellular proteins simultaneously in order to ensure virus genome amplification, virus assembly and virus release.
Table 2.
Cytokeratin (HPV1 and HPV16)
Binding: Direct binding to type 1 keratins (e.g. keratin 18) but not type 2 keratins or vimentin (Wang et al., 2004). Binding is dependent on the 16 amino acids at the N-terminus of the E1 ∧ E4 protein, with the leucine cluster being absolutely required (Roberts et al., 1997).
Significance: 1. Interaction has been shown in vitro and in vivo (Wang et al., 2004) and triggers a cellular stress-response with activation of stress-associated kinases (e.g. p38 and pJNK (McIntosh et al., 2010)). 2. Interaction leads to the reorganisation and hyperphosphorylation of the cytokeratin IF network (McIntosh et al., 2010). 3. Interaction is thought to contribute to virus escape and cell destruction in the upper epithelial layers.
References: Roberts et al. (1994), Roberts et al. (1997).
CyclinB/Cdk1
Binding: In HPV18, the putative cyclin binding motif 43RRLL46 (rather than T23) has been reported to be important for binding (Knight et al., 2011; Ding et al., 2013). For HPV1 E4, G2 arrest function appears to reside with the 16K N-terminally-truncated form of the protein and requires residue T27 (E1 ∧ E4 position), which is also referred to as T13 (its position in the 16K E4 species) (Knight et al., 2004). HPV1 E4-mediated G2 arrest is dependent on sustained Wee1 kinase activity, which phosphorylates Cdk1 at T15.
Significance: 1. For HPV16, the sequestration of CyclinB/Cdk1 in the cytoplasm arrests proliferating cells (such as are found in neoplasia) in G2 (Davy et al., 2005), and contributes to genome amplification efficiency (Wang, Jackson, Doorbar; manuscript in preparation). 2. In HPV18, CyclinB/Cdk1 sequestration is similarly important for G2 arrest. In productive HPV18 rafts, no role for 18 E1 ∧ E4 G2 arrest was reported (Knight et al., 2011). This may reflect a more limited suprabasal cell proliferation seen in this system when compared to HPV16. 3. In HPV1, CyclinB/Cdk1 activity is thought to be inhibited rather than being sequestered. It has been suggested that the full length E1 ∧ E4 protein contributes to a role for 16K-mediated G2 arrest in genome amplification by inhibiting re-replication of cellular DNA in the arrested cells (Knight et al., 2004). Cdk1 inhibition is mediated by Wee1. The significance for the HPV1 life cycle is thought to be in optimising genome amplification.
References: Davy et al. (2005).
CyclinA/Cdk2 (HPV16 and HPV18)
Binding: For HPV18, association requires the 43RRLL46 motif (Knight et al., 2011) that also mediates CyclinB/Cdk1 binding. In the case of HPV16 E1 ∧ E4, residues T22 and T23 are essential for both CyclinA/Cdk2 binding and CyclinB/Cdk1 association (Davy et al., 2006). CyclinA/Cdk2 phosphorylates 16 E1 ∧ E4 at Serine 32 (S32).
Significance: 1. For HPV16, CyclinA/Cdk2 is sequestered in the cytoplasm in G2 but not S-phase. It is thought that this may augment the G2 arrest activity mediated through CyclinB/Cdk1 association. 2. For HPV18, CyclinA/Cdk2 sequestration does not occur in all cell types and does not appear essential for the G2 arrest phenotype (Knight et al., 2011). 3. The HPV1 E4 16K protein appears to stimulate an elevation in CyclinA levels on G2 arrest (Knight et al., 2004).
References: Knight et al. (2011), Ding et al. (2013), Davy et al. (2006).
CyclinE/Cdk2 (HPV18)
Binding: Requires the 43RRLL46 motif in 18 E1 ∧ E4. Interaction seen so far only in in vitro binding assays (Ding et al., 2013).
Significance: Interaction with CyclinE/Cdk2 not yet seen in mammalian cells. 18 E1 ∧ E4 does not appear to regulate S-phase entry. Significance uncertain.
References: Ding et al. (2013).
p42 MAPK (HPV16, HPV11)
Binding: Transient association and phosphorylation of E1 ∧ E4 at Threonine (HPV16 T57; HPV11 T53) around the onset of genome amplification (Wang et al., 2009). HPV18 E1 ∧ E4 is reported not to be a p42 MAPK target (Knight et al., 2011; Bell et al., 2007).
Significance: HPV16 E1 ∧ E4 is phosphorylated and functionally modified to enhance keratin binding by p42 MAPK (Wang et al., 2009). It is thought that this helps initiate E4 accumulation following activation of the differentiation-dependent promoter (Wang et al., 2009).
Protein kinase A (HPV16, HPV18)
Binding: Phosphorylation site not mapped. S43 predicted in HPV16 (Wang et al., 2009).
Significance: Possible regulator of E1 ∧ E4 structure. PKA sites reported in many E4 proteins (Grand et al., 1989; Wang et al., 2009; Bryan et al., 2000).
PML
Binding: E4 expression causes the re-organisation of PML from ND10 domains to nuclear E4 inclusion granules. Interaction not thought to be direct (Roberts et al., 2003).
Significance: Redistribution dependent on the presence of nuclear E4 inclusions, rather than the more prominent cytoplasmic E4 inclusions. Suggested involvement in viral genome amplification, perhaps by modulating anti-viral responses (Roberts et al., 2003).

E4 processing and functional regulation in infected epithelium
Cutaneous infections (Mu, Gamma, Beta HPV types)
One of the key differences between the E4 proteins of different papillomaviruses lies in the structures that they form in the infected cell, and in the timing at which these structures appear (see Tables 1 and 3, Figs. 1 and 2). Many cutaneous HPV types such as HPV1 and 63 (Mu Genus), HPV4 and 65 (Gamma Genus) and HPV5 and 8 (Beta Genus) form productive papillomas that are characterised by the appearance of cytoplasmic inclusion granules (see Fig. 1; Croissant et al., 1985; Egawa, 1993, 1994; Gissmann et al., 1977; Jablonska et al., 1985; Laurent et al., 1982). These inclusion granules are typically first noticeable in the lower to mid epithelial layers (i.e. the spinous layers), and are, in general, clearly visible in Haematoxylin and Eosin stains, although for some HPV types (e.g. HPV4) staining is poor and can give rise to a 'clear cell' appearance (Gross et al., 1982). These inclusion granules are the primary location of the HPV E4 proteins in such cutaneous lesions, with E4 antibodies staining these structures prominently (Fig. 1; Croissant et al., 1985; Rogel-Gaillard et al., 1993; Peh et al., 2002). The precise composition of these cytoplasmic structures is not known, but analysis using chemical cross-linking agents that introduce reversible covalent links between proteins that are in close proximity to each other suggests that they may be composed predominantly of E4, and that the great abundance of E4 in cutaneous lesions is likely to be attributable largely to the presence of these structures. E4 inclusion granules are a major contributor to the 'cytopathic effect' described by pathologists, with different HPV types showing different cytopathic morphologies (Rogel-Gaillard et al., 1993; Egawa et al., 1993; Gross et al., 1982).
In HPV1, these structures have been extensively examined and are present as cytoplasmic (and nuclear) inclusion granules in cells supporting viral genome amplification immediately above the infected basal layer (see Fig. 1). The prominent HPV1 cytoplasmic inclusions probably associate transiently with the cytokeratin network during their formation (Rogel-Gaillard et al., 1992; Roberts et al., 2003). The less prominent nuclear HPV1 E4 inclusions (see Fig. 1) have been reported to be associated with promyelocytic leukemia protein (PML) at their periphery (Roberts et al., 2003). The E4 inclusion granules typically coalesce and increase in size as the infected cells are pushed towards the epithelial surface, eventually becoming less distinct as nuclear degeneration occurs in the upper cornified layers (Fig. 1; Rogel-Gaillard et al., 1992). Studies on the HPV1 E4 protein have shown that it becomes multiply phosphorylated during differentiation (Grand et al., 1989), and also becomes progressively cleaved from a 17 K full length species in the parabasal cell layers, to shorter 16 K (truncated before tyrosine 15), 11 K and 10 K species (truncated before alanine 59) that have lost sequences from their N-terminal region (Table 3; Doorbar et al., 1988). Interestingly, the 16 K HPV1 E4 protein has been reported to retard cell cycle progression at the G2 phase of the cell cycle by inhibiting the activity of Cdk1, with the 17 and 16 K proteins working together to also inhibit cellular DNA replication (Knight et al., 2004; Table 2). In cell-free culture systems, the HPV1 E4 proteins inhibit loading of the cellular MCM helicases onto chromatin, and it has been suggested that cellular DNA replication may be similarly inhibited in vivo (Roberts et al., 2008). It is not yet clear however whether inhibition of cellular DNA replication occurs as part of any HPV life cycle, although other viruses are known to use this strategy to favour replication of their genomes over that of the cell (Roberts et al., 2008). The full length HPV1 E1 ∧ E4 protein is not readily detectable once cells have exited the cell cycle post genome-amplification, and in these cells the truncated forms of E4 predominate (Table 3). Mutagenesis and structural analysis have localised E4's multimerisation capability to the C-terminus of the protein (Roberts et al., 1994, 1997; Ashmole et al., 1998), and in the case of Alpha HPV types, have suggested that interactions are mediated through beta-aggregation (McIntosh et al., 2008). A second key function of E1 ∧ E4 that is thought to extend to all HPV types is its ability to associate with cytokeratin filaments and to reorganise the cytokeratin filament network in the cell (Table 2 and Fig. 3). For HPV1 E1 ∧ E4, this interaction requires the N-terminal E1 ∧ E4 amino acids that are lost in the truncated E4 proteins (Table 3). This region is predicted to form an amphipathic alpha helix that usually contains a 'leucine-cluster' domain (LLXLL or a motif derived from this) that is essential for keratin association (Table 3). In productive papillomas caused by HPV1, normal cytokeratin organisation is abolished, with differentiation-dependent keratins being almost undetectable in cells expressing E4 (Rogel-Gaillard et al., 1993). This dramatic reduction in keratin staining is a characteristic of infection by many if not all HPV types, including cutaneous and mucosal types from the Alpha genus (Doorbar et al., 1997; McIntosh et al., 2010; Wang et al., 2004). Indeed, E1 ∧ E4 processing during differentiation is best understood for HPV16, for which both structural information and the nature of regulatory phosphorylation are well characterised.
Fig. 2. High-risk Alpha HPV E4 expression as amyloid fibres in low-grade Cervical Neoplasia. The E4 proteins of HPV16 have been extensively studied because of the association of HPV16 with cervical cancer. A haematoxylin and eosin stained image showing the junction between normal uninfected epithelium (labelled normal) and a region of low-grade cervical neoplasia (labelled CIN1) is shown in (A(i)). The same piece of tissue is shown in (A(ii)) after staining for HPV16 E4 (green) and MCM (red). The HPV16 E4 protein is abundant in the upper epithelial layers in cells supporting viral genome amplification (Doorbar et al., 1997). Although the E4 protein of HPV16 does not assemble into prominent cytoplasmic inclusion granules as seen for the Mu, Gamma and Beta HPV types, the HPV16 protein is predominantly cytoplasmic with a low level of nuclear staining, and associates with keratin filaments. This intracellular distribution is seen in lesions caused by other Alpha HPV types including HPV2, HPV11 and HPV18. The E4 proteins of many HPV types and the majority of Alpha types contain a predicted beta-aggregation motif at their C-terminus (B). Current thinking is that this multimerisation/aggregation motif is carefully regulated in the context of the full-length protein, and that the capacity to multimerise is controlled by phosphorylation and proteolytic cleavage (see Table 1). C-terminal homology is shown in (B(i)), with aggregation potential for key E4 proteins shown in (B(ii)). The assembly of 16 E1 ∧ E4 into amyloid-like fibres occurs slowly after expression and purification of WT E1 ∧ E4 in bacteria, and more quickly after expression of N-terminal E1 ∧ E4 deletions as shown in (C(i)) (McIntosh et al., 2008). Soluble WT 16 E1 ∧ E4 can be converted into amyloid-like fibres following cleavage with calpain, which removes the N-terminal 17 amino acids (C(ii)). It appears that individual E4 fibres can aggregate laterally to produce larger structures. The widths of individual filaments are indicated with arrows in (C(ii)).
(Roberts et al., 1997; McIntosh et al., 2010; Wang et al., 2004). (B) The E4 proteins encoded by two Gamma HPV types (HPV4 and 65) show a partial association with keratins (McIntosh et al., 2010). The E4 proteins of these HPV types can form large cytoplasmic inclusions during natural infection (Egawa, 1994; Egawa et al., 1993). (C) Although the E4 protein of the Mu HPV type HPV1 can associate with keratins in some systems, it appears to do this with a lower affinity, and assembles into cytoplasmic and nuclear inclusion granules (arrowed) both in vitro (Rogel-Gaillard et al., 1992) and in vivo (Breitburd and Croissant Orth).
Fig. 4. Model of Keratin Reorganisation mediated by E1 ∧ E4. (A) Reorganisation of the cellular keratin network is a well-characterised consequence of HPV E1 ∧ E4 expression in epithelial cells. In the cell, cytokeratins exist primarily as insoluble keratin filaments as well as soluble monomers (i). Keratin filaments are converted to keratin monomers as a result of phosphorylation and association with the cellular 'solubility factor' 14-3-3, with soluble and insoluble keratins existing in a dynamic equilibrium (ii). In the presence of E1 ∧ E4 multimers, the keratin network becomes cross-linked, and although the 14-3-3 solubility factor can still bind, phosphorylated keratin monomers cannot be properly solubilised (iii) and eventually the filament network becomes disorganised and can collapse to the nuclear periphery. In the absence of E1 ∧ E4's C-terminal multimerisation region, some degree of keratin association is still apparent, but the keratin network is not cross-linked and the dynamics of keratin movement within the cell are not significantly compromised (iv). (B) During the papillomavirus life cycle, keratin association and other functions of E4 are carefully regulated. This has been most thoroughly investigated for the HPV16 E1 ∧ E4 protein (McIntosh et al., 2008, 2010; Wang et al., 2004, 2009; Khan et al., 2011). Our current thinking is that the unphosphorylated form of E4 binds keratins poorly, but that in the S-phase environment that precedes genome amplification, the E1 ∧ E4 protein is phosphorylated by MAPK and that keratin association is enhanced (i). In the upper epithelial layers the E4 protein is cleaved at its N-terminus by calpain, which favours the assembly into amyloid-like fibres (ii). It is thought that these E4 structures participate in keratin cross-linking, virus release and possibly also transmission (Khan et al., 2011).

Mucosal Infections (Alpha HPV types)
Detailed analysis of the high-risk E4 proteins has focused primarily on HPV16, 18 and 31 (Tables 1-3 and Fig. 4), and it is assumed that the E4 proteins of related HPV types will have similar mechanisms of regulation. Amongst low-risk Alpha types, the HPV11 E1 ∧ E4 protein has been most extensively studied (Table 3) (Brown et al., 1994, 2006; Bryan and Brown, 2000; Brown et al., 1996). E1 ∧ E4-mediated disruption of the cellular cytokeratin network (see Figs. 3 and 4) was first shown in epithelial cells grown in monolayer culture, with later work confirming the importance of this interaction in HPV-associated tissue biopsies from patients (Doorbar et al., 1997; Wang et al., 2004, 2009). In such infected tissue, the effect of E1 ∧ E4 on cytokeratins is a reorganisation, generally to the cell periphery, rather than collapse to a perinuclear bundle as seen in dividing epithelial cells in culture (Roberts et al., 1997). These differences underline the limitations of using monolayer culture to model events during epithelial differentiation, and indeed, most recent studies have made use of organotypic raft culture and the analysis of clinical material to support basic cell culture experiments (Doorbar et al., 1997; Wilson et al., 2005, 2007; Wang et al., 2004, 2009; Davy et al., 2005; Khan et al., 2011). As with the E1 ∧ E4 protein of HPV1, the HPV16, 18 and 31 E1 ∧ E4 proteins all contain a 'leucine-cluster' motif close to their N-terminus, which is important (along with upstream amino acids) for keratin association (see Tables 2 and 3). In the constrained structure identified for the full length 16 E1 ∧ E4, this motif is thought to be largely unavailable for cytokeratin binding, and associates instead with the E4 multimerisation motif located at the C-terminus of the protein (McIntosh et al., 2008; Table 1, Fig. 4).
The C-terminus of the HPV11 E1 ∧ E4 protein is similarly important for multimerisation (Bryan et al., 1998) and a common structural organisation may be widely conserved amongst E4 proteins (Fig. 2, Table 3). In the constrained form, the 16 E1 ∧ E4 protein (and very likely other E1 ∧ E4 proteins as well) has only a limited ability to bind keratins and to self-associate, and does not accumulate to any significant extent within the cell. Indeed, during natural infection, the E1 ∧ E4 protein is not readily apparent prior to the onset of viral genome amplification, but may be present at very low level in this constrained form. The dramatic accumulation of E1 ∧ E4 at or around the onset of genome amplification coincides with the activation of the differentiation-dependent promoter (i.e. p670 in HPV16) and an elevation in E1 ∧ E4 transcript levels. It also coincides with regulatory post-translational modification as cell-cycle activity declines and the infected cell progresses through its final S-phase and G2-phase-like stages before exiting the cell cycle completely and succumbing to true terminal differentiation (Table 1, Fig. 4). The basis of how these post-translational modifications affect E4 function during productive infection (see Fig. 4) can be understood to some extent by considering the positions of the post-translationally modified amino acids within the HPV16 E1 ∧ E4 amino acid sequence (Table 1). In low-grade neoplasia (such as CIN1), late events such as genome amplification begin as E6/E7-mediated proliferative activity wanes (Isaacson Wechsler et al., 2012). In such cells, it appears that the S-phase kinase p42 ERK (Wang et al., 2009) (and possibly other members of the MAP kinase family (McIntosh et al., 2010)) phosphorylates 16 E1 ∧ E4 at amino acid Threonine 57, which adds a negative charge within a region of the protein that is already rich in negatively charged amino acids (McIntosh et al., 2008; Wang et al., 2009).
A similar MAP kinase phosphorylation site has been identified in the HPV11 E1 ∧ E4 protein at position Threonine 53, and its loss compromises the normal distribution of the protein within the cell (Table 3). For HPV16 E1 ∧ E4, the consequence of Threonine 57 phosphorylation, as revealed by FRET analysis, is a tightening of the central loop of the protein, which facilitates keratin binding and the initial multimerisation of the E4 protein (McIntosh et al., 2008; Table 1, Fig. 4), and it is likely that the HPV11 MAP kinase site regulates this protein similarly. The similar amino acid charge distribution seen across divergent E1 ∧ E4 sequences suggests a similar kinase-mediated regulation of structure, although for HPV18, the kinase responsible appears not to be p42 MAPK, which was unable to phosphorylate the 18 E1 ∧ E4 protein in in vitro assays (Knight et al., 2011; Ding et al., 2013; see Tables 2 and 3). Experiments using antibodies that specifically recognise the HPV16 phospho T57 epitope suggest that this modification is however sustained only transiently during the productive virus life cycle, and is lost as the infected cell slips into the G2-like phase of the cell cycle where viral genome amplification is thought to occur (Table 1 and Fig. 4; Wang et al., 2009). For 16 E1 ∧ E4, a second well-characterised phosphorylation event accompanies this S to G2-phase-like transition, and is mediated by the cytoplasmic kinase Cdk1 (Davy et al., 2005, 2006; Table 1, Fig. 4). Although several Cdk sites are predicted in the 16 E1 ∧ E4 protein, the most significant appears to be at position 32, which is located amongst a generally positively-charged stretch of amino acids in the N-terminal half of the protein (McIntosh et al., 2008, 2010; Wang et al., 2009). Although not located at precisely the same position, a Cdk1 site (at threonine 23) has been revealed experimentally in the HPV18 E1 ∧ E4 protein (Table 3; Knight et al., 2011; Ding et al., 2013).
For HPV16 (and probably also HPV18), the electrostatic interactions in this region are important for maintaining the constrained loop-structure of the E4 protein, which is thought to become more relaxed following CyclinB/Cdk1 (and possibly also CyclinA/Cdk2) phosphorylation (see Table 1 and Fig. 4). It is during this G2-like phase that E4 begins to accumulate to very high levels, facilitated by cleavage of sequences from its N-terminus in a manner akin to that described above for the HPV1 E4 proteins (see Table 1). For HPV16 E1 ∧ E4, this is mediated in organotypic rafts (and probably also in vivo) by the cysteine protease calpain, which cleaves the 16 E1 ∧ E4 protein between amino acids 17 and 18 (Khan et al., 2011). The HPV18 E1 ∧ E4 protein shows a similar susceptibility to calpain cleavage, which removes the protein's N-terminus and abolishes its constrained structure (McIntosh et al., 2008). As a result of this, the C-terminal multimerisation motifs identified in Alpha HPV types (Figs. 2 and 4, Tables 1 and 3) become available for self-association, allowing the cleaved E1 ∧ E4 protein to assemble into amyloid-like fibres which affect cell structure and cytokeratin organisation. Indeed, beta-aggregation potential is predicted in all Alpha HPV E1 ∧ E4 proteins that we have examined (see Figs. 2 and 4). Although further analysis of post-translational modifications may reveal additional layers of complexity, it appears that, as with other papillomavirus proteins (such as E1; Ma et al., 1999; Deng et al., 2004; Yu et al., 2007), E1 ∧ E4 function is regulated by the specific kinases and proteases that become activated as the infected cell migrates from the basal layer to the epithelial surface. Furthermore, the recognisable modular structure of E1 ∧ E4 that is apparent across different papillomavirus types suggests that the broad principles of E4 regulation may in fact be conserved, with differences (e.g. in kinases and phosphorylation sites) being driven by individual epithelial tropisms and routes of transmission.

Genome amplification
Initial studies on the HPV1 E4 protein suggested involvement in virus assembly and/or release, and led to the proposition that the protein might eventually be reclassified as the third late protein (L3; Doorbar et al., 1986). Although this classification was not adopted, it has since become clear that the major role for E4 lies in the late stages of the virus life cycle, and that these initial predictions may have been largely accurate. Although the gross effects on cytokeratin architecture clearly suggest a role in virus release, only a few studies have so far attempted to examine how cell disruption could facilitate the escape of infectious virions (Brown et al., 2006; Bryan and Brown, 2001), with none yet addressing the possible effects on transmission success. So far, our understanding of E4 function has come from standard molecular analysis carried out in cells in monolayer culture, and from the analysis of E4's role during the virus life cycle using model systems and biopsy material. A role for E4 in viral genome amplification, but not in the early stages of infection, was first shown using mutant Cottontail Rabbit papillomavirus (CRPV) genomes that contain translation-termination linkers in the E4 ORF (Peh et al., 2004). Such genomes were not markedly compromised in their ability to produce papillomas, but when propagated in cottontail rabbits (which support the full CRPV productive cycle), the loss of E4 reduced the efficiency of genome amplification and the synthesis of capsid proteins (Peh et al., 2004). E4's role in genome amplification and the expression of capsid proteins was subsequently shown for HPV16 (Nakahara et al., 2005), 31 (Wilson et al., 2005) and 18 (Wilson et al., 2007) using organotypic raft culture systems, and although our current work indicates that the protein is not in fact essential for these events, its presence certainly appears to optimise life-cycle completion.
Such a role has not yet been demonstrated for low-risk HPV types, however, although relevant work on these viruses has so far been limited (Thomas et al., 2001; Oh et al., 2004).
The precise mechanisms by which E4 contributes to genome amplification success and capsid protein synthesis have not yet been clearly reported in the literature. However, from the analysis of known E4 functions, some important insights can be gleaned. Expression of both HPV16 and 18 E1 ∧ E4 in proliferating epithelial cells in monolayer culture causes a dramatic arrest in G2 (Davy et al., 2005), with similar findings also being reported for the HPV1 E1 ∧ E4 proteins (Knight et al., 2004). In the case of HPV16, G2-arrest appears to involve the sequestration of CyclinB/Cdk1 in the cytoplasm (through amino acids Threonine 22 and 23 in 16 E1 ∧ E4; amino acids 43 RRLL 46 in HPV18 E1 ∧ E4 (see Tables 2 and 3)), which is thought in itself to be sufficient to prevent nuclear translocation and the phosphorylation of proteins involved in mitotic progression. Sequestration of CyclinB/Cdk1 is not always apparent in neoplasia caused by high-risk Alpha types where life-cycle events are de-regulated, but is typically seen in benign papillomas caused by diverse low-risk types, in good agreement with the idea that E4 accumulation and genome amplification are G2-specific (Deng et al., 2004; Yu et al., 2007; Davy and Doorbar, 2007). For high-risk types, the G2 arrest function of E1 ∧ E4 may be expected to potently inhibit E6/E7-mediated cell proliferation in the mid epithelial layers, and in this way contribute to the timely onset of vegetative viral genome amplification. Indeed, mutant HPV16 genomes expressing an 'arrest-deficient' form of E4 show lower levels of genome amplification in organotypic raft culture, supporting this hypothesis (our unpublished data). For HPV18, the contribution of E4's G2-arrest function is apparently less significant (Knight et al., 2011), which may reflect differences in the extent of suprabasal E6/E7-driven cell proliferation in organotypic rafts induced by these two HPV types.
Indeed, the ability of HPV11 to drive cell cycle entry, but not suprabasal cell proliferation, may partially explain why for this HPV type, the loss of E4 (and any potential G2-arrest capability) has only minimal effects on vegetative genome amplification. In addition to its dramatic effects on the cell cycle, it is clear that, for HPV16 E1 ∧ E4 at least, the protein may have other functions that are important for genome amplification but which are less well understood. These include its association with cytoplasmic CyclinA/Cdk2 (via amino acids T22 and T23 in HPV16 and 43 RRLL 46 in HPV18 (Table 2)), which is thought to enhance 16 E1 ∧ E4's ability to inhibit mitotic progression (see Table 2; Davy et al., 2006), and the association with E2 (Davy et al., 2009), which is directly required for genome amplification (Table 2). The significance of the E2/E4 association during the HPV16 life cycle is as yet unknown, but the association is likely to be carefully regulated during the productive cycle of the virus. In cell monolayer experiments, however, co-expression of E2 and E4 leads to an elevation in the levels of both proteins, and to a partial sequestration of E2 in the cytoplasm (Davy et al., 2009). A final component of E4 function that may contribute to the optimisation of genome amplification success is its effect on cellular kinases (Table 2, Fig. 4). The key kinases that modulate 16 E1 ∧ E4 function are the same as those that regulate E1 accumulation in the nucleus, and it is likely that E4's ability to sequester certain members of the MAP (McIntosh et al., 2010; Wang et al., 2009) and Cyclin-dependent kinase families (Davy et al., 2005) will result in the functional modification of diverse viral and cellular gene products including E1.
Differences in kinase site position and susceptibility to phosphorylation seen in the E1 ∧ E4 protein of HPV18 (Knight et al., 2011; Ding et al., 2013) may reflect subtle differences in the epithelial tropism of HPV18 and related types. When these functions are considered together, it is perhaps not surprising that E4 loss in the context of the viral genome acts to compromise genome amplification success. In our hands, however, E4 is certainly not essential for either genome amplification or virus assembly, prompting us to suspect that the E4 protein may have evolved primarily to fulfil other important functions.

Virus assembly, virus release and transmission
Although the E4 proteins of papillomaviruses are always primarily cytoplasmic, they can clearly be found in the nuclei too, and can sometimes associate with recognisable structures (Fig. 2; Roberts et al., 2003). It is not yet clear whether the presence of nuclear E4 is an unavoidable consequence of its high abundance in the cell, or whether the nuclear E4 fraction represents a particular functional form of the protein that is necessary for a particular activity such as association with E2 (Davy et al., 2009), or with viral or cellular proteins necessary for virus assembly and/or maturation. Of significant importance, however, when considering why E4 accumulates late in infection, is the observation that the expression of capsid proteins appears to occur only in cells that are already expressing E4 (Brown et al., 1994, 1995), and that during the productive life cycle, E4 expression precedes L2 expression, which in itself appears to slightly precede the expression of L1 (Florin et al., 2002). Thus the order of expression of viral gene products is arranged in such a way that infectious virus particles are only ever produced in E4-positive cells. Although the role of E4 in virus assembly may initially suggest a need for nuclear E4, we should also remember that L1 and L2 are expressed in the cytoplasm before locating to the nucleus, and that the initial assembly of L1 molecules into capsomeres is a cytoplasmic event (Bird et al., 2008). Although associations between E4, L1 and L2 can be demonstrated in vitro, it appears that E4 loss does not necessarily compromise virus assembly significantly in the organotypic raft system (our unpublished observations). Importantly, however, the nuclear assembly of virus particles is followed eventually by nuclear degeneration and the release of assembled virions into the terminally-differentiating cell, which contains abundant E4 and a reduced level of cellular cytokeratins.
Work carried out by Brown and co-workers has revealed important defects in the cornified envelope in cells expressing the HPV11 E1 ∧ E4 protein (Brown et al., 2006; Bryan and Brown, 2000; Lehr et al., 2002, 2004), and has clearly highlighted the point that the major L1 capsid protein is expressed from a bicistronic mRNA that contains E1 ∧ E4 as the first cistron (Brown et al., 1996). The natural follow-on from this is that the E1 ∧ E4 proteins, which in the upper epithelial layers are present in their N-terminally truncated form as amyloid fibres, in some way serve to optimise the transmission success of virions shed from the epithelial surface in virus-laden squames. Driven largely by the need to fully understand how prophylactic vaccination offers protection against infection, a considerable effort has already been put into the study of virus attachment and the mechanisms of virus entry, but only a few studies have yet made use of virions produced in differentiating epithelial cells, and none have yet addressed in any significant way the contribution of E1 ∧ E4 to this process. The absence of convenient epithelial systems for the study of virus production and maturation in the presence or absence of E4 has so far limited such important studies.
The modular structure of the E4 proteins suggests conserved functions
While most of the above analysis focuses on the E4 proteins of the prominent high-risk HPV Alpha types, the comparison with HPV1 E1 ∧ E4 at a functional level, and the analysis of sequence similarities between these E1 ∧ E4 proteins and those of other papillomaviruses, suggest that while E4 proteins are highly divergent at the primary amino acid sequence level, they are much more obviously conserved when broad 'structural' modules are compared (Tables 2 and 3). When large-scale alignment is carried out, it is clear that the most significant regions of similarity lie at the N-terminus, where a predicted amphipathic alpha helix and the leucine-cluster motifs reside (see alignment in Figs. S1 and S2), and at the C-terminus, where sequences involved in E4/E4 self-association are primarily located (Table 3). These C-terminal sequences of the Alpha HPV types typically have high amyloidogenic/Beta aggregation potential and can be identified as such using amyloid peptide prediction programs such as Tango (McIntosh et al., 2008). For HPV16 and HPV1 E1 ∧ E4, the loss of the C-terminal region and the binding-avidity that this provides (McIntosh et al., 2008; Wang et al., 2004) prevents prominent E1 ∧ E4 accumulation in the cytoplasm (Roberts et al., 1994, 1997), and can result in a redistribution of E1 ∧ E4 to the nucleus, despite the retention of N-terminal motifs that mediate direct keratin binding (Fig. 4; Wang et al., 2009). Although the affinity of E1 ∧ E4/keratin binding has not yet been measured, the leucine-cluster alone is insufficient to mediate specific cytokeratin-binding, but instead directs reporter molecules such as GFP to the mitochondria (Raj et al., 2004). For HPV16 E1 ∧ E4, cytokeratin-association requires the leucine-cluster as well as upstream amino acids, and it is likely that specificity is controlled in a similar way amongst other HPV types (Table 3).
Thus it appears that despite variation at the level of primary amino acid sequence, the E1 ∧ E4 proteins of all HPV types contain clearly recognisable 'functional motifs' at their N and C-termini. Variation between E4 sequences lies primarily in their length, which is mediated for the most part by an extension of the central portion of the E1 ∧ E4 molecule. In the case of the HPV16 E1 ∧ E4 protein, this central region (which is surrounded by proline-rich segments) comprises a generally positively-charged region, a loop-region, and a generally negatively-charged region which form a folded structure (McIntosh et al., 2008; Fig. 3). These defined structural elements are recognisable not only in the putative E1 ∧ E4 proteins of related HPV types, but also in the E1 ∧ E4 proteins from other genera, suggesting that E1 ∧ E4 proteins from diverse HPVs may have similar mechanisms of regulation. Interestingly, the loop region (where the amino acid sequence undergoes a turn) contains the major immunodominant epitopes within the protein, and is the site to which many of the most useful E1 ∧ E4 monoclonal antibodies bind (Fig. 3). It is possible that this region in particular is available for interaction with cellular targets in the context of the full-length protein.
Other E4 species generated by splicing and proteolytic cleavage
In general, sequence alignment carried out using the predicted products of the full-length E4 ORF shows recognisable conservation only downstream of the E1 ∧ E4 splice acceptor site (Doorbar and Myers, 1996). Such observations suggest that sequences upstream of the splice acceptor are not in fact E4 coding sequences, and indeed, there is no generally conserved ATG codon at the start of all E4 reading frames to suggest this. It seems instead that in most cases, the abundant E4 proteins are expressed from a spliced E1 ∧ E4 mRNA transcript, and that the initiation codon for E4 gene expression comes from the E1 ORF. It also appears, for both HPV16 and HPV1 E4, that smaller E4 species with modified functions can be generated by proteolytic cleavage as the infected cell migrates towards the epithelial surface, and that these can be further modified by phosphorylation (Doorbar et al., 1988; Roberts et al., 1997; McIntosh et al., 2008; Wang et al., 2004; Khan et al., 2011). Although an abundant E1 ∧ E4 transcript has been identified in most papillomaviruses that have so far been examined (Chow et al., 1987a,b; Doorbar et al., 1990; Milligan et al., 2007; Wang et al., 2011; Ozbun and Meyers, 1997; Nasseri et al., 1987; Baker and Howley, 1987; Barksdale and Baker, 1995; Palermo-Dilts et al., 1990), recent work has suggested that there may in fact be additional E4 gene-products expressed from distinct transcripts. These include a putative E6* ∧ E4 gene product in HPV16 (Milligan et al., 2007), as well as two distinct E2 ∧ E4 polypeptides (E2 ∧ E4S and E2 ∧ E4L) (Tan et al., 2012) and an alternative form of E1 ∧ E4 (E1 ∧ E4S) in HPV18 (Kho et al., 2013). Such transcripts are generally of low abundance when compared to the primary E1 ∧ E4 species, and their pattern of expression and significance during productive infection are not yet known.
Interestingly, the E4 ORF encodes not only the hinge domain of E2, but in the case of HPV16, it also contains a series of ASF/SF2 sites that regulate usage of the splice acceptor at position 3358 (SA3358; Somberg and Schwartz, 2010). SA3358 is in itself a poor splice acceptor, despite being efficiently used during the HPV life cycle to direct the production of the abundant E1 ∧ E4 mRNA. Current thinking suggests that increasing ASF/SF2 acts to downregulate the activity of SA2709 (which is required for E2 expression) in order to favour the use of SA3358, which allows expression of the abundant E1 ∧ E4 transcripts. It has also been suggested that the eventual decline in ASF/SF2 binding in the upper layers of the epidermis may facilitate the switch to SA5639 usage and allow the expression of capsid proteins (Somberg and Schwartz, 2010; Rush et al., 2005; Johansson et al., 2012).

Summary
The functions of the papillomavirus E4 proteins in the virus life cycle have been difficult to resolve. Although located in the early part of the viral genome, the E4 proteins are primarily expressed during the late stages of infection, at or around the time that genome amplification is initiated. In general, the E4 proteins of different papillomaviruses share a recognisable modular organisation despite showing divergence at the primary amino acid sequence level (Table 3), and can accumulate to very high levels in the upper epithelial layers where virus particles assemble. Any consideration of E4 function must take this into account, and it has been suggested that E4's ability to disrupt the cellular keratin network and the formation of the cornified envelope may facilitate virus release and/or transmission (Figs. 1-4; Bryan and Brown, 2000, 2001). Formal validation of the extent to which E4 facilitates virus transmission and infection remains to be carried out. In addition to this potentially significant role, the E4 proteins of several HPV types have been shown to inhibit cell proliferation in G2 (Davy and Doorbar, 2007), and to participate in efficient genome amplification, and perhaps as a consequence of this, virus synthesis as well (Peh et al., 2004; Nakahara et al., 2005; Wilson et al., 2005). The contribution of E4 to genome amplification has not yet been fully worked out, but may involve E4's interaction with kinases, its association with E2, and its cell cycle-arrest capabilities. The abundance of E4 in lesional tissue has recently suggested a useful role as an HPV-specific biomarker, particularly when used in combination with cellular markers of deregulated viral gene expression (Griffin et al., 2012). The preservation of the E4 ORF across diverse papillomavirus types, coupled with its abundance, suggests an evolutionarily important role in improving viral fitness.
It appears that the E4 proteins may have multiple roles in the virus life cycle, but that they play an important role in ensuring efficient virus release and transmission.