VP22 core domain from Herpes simplex virus 1 reveals a surprising structural conservation in both the Alpha- and Gammaherpesvirinae subfamilies

The viral tegument is a layer of proteins between the herpesvirus capsid and its outer envelope. According to phylogenetic studies, only a third of these proteins are conserved amongst the three subfamilies (Alpha-, Beta- and Gammaherpesvirinae) of the family Herpesviridae. Although some of these tegument proteins have been studied in more detail, the structure and function of the majority of them are still poorly characterized. VP22 from Herpes simplex virus 1 (subfamily Alphaherpesvirinae) is a highly interacting tegument protein that has been associated with tegument assembly. We have determined the crystal structure of the conserved core domain of VP22, which reveals an elongated dimer with several potential protein–protein interaction regions and a peptide-binding site. The structure provides a structural basis for interpreting the numerous functional mutagenesis studies of VP22 found in the literature. It also establishes an unexpected structural homology to the tegument protein ORF52 from Murid herpesvirus 68 (subfamily Gammaherpesvirinae). Homologues for both VP22 and ORF52 have been identified in their respective subfamilies. Although there is no obvious sequence overlap between the two subfamilies, this structural conservation provides compelling evidence for shared ancestry and functional conservation.


INTRODUCTION
Human herpesviruses cause vastly different diseases, ranging from mild oral-facial blisters and chickenpox to fatal conditions such as Burkitt's lymphoma and Kaposi's sarcoma (Antman & Chang, 2000; Davison, 2007; Whitley & Roizman, 2001). Herpesviruses are large DNA viruses that share an overall common virion structure. The virion consists of a dsDNA genome encapsidated within an icosahedral capsid (Davison, 2007). Between the capsid and the outer membrane lies a layer of proteins, collectively known as the tegument (Guo et al., 2010). Twenty-four different tegument proteins have been identified in Herpes simplex virus 1 (HSV-1; Human herpesvirus 1) but, judging from sequence alignments, only a third of them are conserved across all the subfamilies (Alpha-, Beta- and Gammaherpesvirinae) of the family Herpesviridae (Kelly et al., 2009). Some tegument proteins occur in several thousand copies inside the virion, whilst others are less abundant (Elliott & Meredith, 1992). Some tegument proteins have been found to contribute greatly to viral entry, assembly and egress, whilst others play vital roles in viral immune evasion and regulation of viral gene expression (Kalejta, 2008; Kelly et al., 2009; Sathish et al., 2012).
Sequence analysis and secondary structure predictions reveal that VP22 consists of a non-conserved N-terminal domain and a conserved C-terminal domain with clearly predicted secondary structure (O'Regan et al., 2007a). Deletion and functional studies have shown that the conserved C-terminal domain in VP22 is important for binding to VP16 and gE (O'Regan et al., 2007a, b). To gain insight into the structure and function of VP22, we crystallized and solved the structure of the conserved C-terminal domain of this protein, hereafter referred to as VP22 core, to a resolution of 1.9 Å. VP22 core exists as a dimer with a highly conserved dimerization site. Although sequence homology of VP22 has only been established within the alphaherpesviruses, the crystal structure reveals that it shares extensive structural similarity with ORF52 from Murid herpesvirus 68 (MHV-68) (subfamily Gammaherpesvirinae). ORF52 MHV-68 has been found to be essential for replication of MHV-68 in vitro (Song et al., 2005). Similar to VP22 core, ORF52 MHV-68 is also a highly expressed tegument protein that exists as a dimer made up of two identical monomers (Benach et al., 2007; Bortz et al., 2007). It is well conserved within the gammaherpesviruses, and has been implicated in tegument association and interactions (Bortz et al., 2007; Fossum et al., 2009; Rozen et al., 2008; Uetz et al., 2006). These functions are similar to some of those proposed for VP22 (Brignati et al., 2003; Farnsworth et al., 2007). With the VP22 core structure in hand, we have been able to compare the two protein structures, revisit the outcome of reported mutational studies and identify completely conserved residues that might be important for function.

RESULTS AND DISCUSSION
Structure of VP22 core
VP22 core crystallized in the space group P6₁22 and the crystal structure [Protein Data Bank (PDB) ID: 4XAL] was determined at a resolution of 1.87 Å using single isomorphous replacement with anomalous scattering (SIRAS). Each asymmetric unit contains one molecule of VP22 core, with visible electron density for residues 174-260, together with three amino acids from the N-terminal purification tag. The crystallographic data statistics are summarized in Table 1. The structure of VP22 core consists of a long central α-helix (α1) flanked by a long random coil (L1) at the N terminus, and two shorter α-helices (α2 and α3) and one β-strand (β1) at the C terminus (Fig. 1). Two VP22 core monomers are related by the crystallographic twofold axis and are slightly twisted around each other, creating an elongated dimer (Fig. 1) where the α1 helices and the β1 strands interact in an anti-parallel fashion. The dimeric state of VP22 core has been proposed previously (Mouzakitis et al., 2005) and our light-scattering results show that VP22 core is monodisperse with a mean molar mass of ~26 500 g mol⁻¹. This is roughly twice the theoretical molecular mass of the monomeric VP22 core including the purification tag and the tobacco etch virus (TEV) protease site (14 551 g mol⁻¹), further confirming that VP22 core is a dimer in solution (Fig. 2).
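The dimer assignment rests on a simple ratio of the measured molar mass to the theoretical monomer mass. The following is a minimal Python sketch of that arithmetic, using only the two masses quoted above and the ±5000 g mol⁻¹ spread given in the Fig. 2 legend. The function name and the nearest-integer rule are illustrative assumptions, not part of the original analysis.

```python
def oligomeric_state(measured_mass, monomer_mass):
    """Return the mass ratio and the nearest whole number of subunits."""
    ratio = measured_mass / monomer_mass
    return ratio, round(ratio)


# Values quoted in the text and the Fig. 2 legend (g/mol).
MONOMER_MASS = 14551.0    # theoretical monomer, including tag and TEV site
MEASURED_MASS = 26500.0   # mean molar mass from light scattering
UNCERTAINTY = 5000.0      # reported spread of the measurement

ratio, subunits = oligomeric_state(MEASURED_MASS, MONOMER_MASS)

# Ratio at the lower and upper ends of the reported spread.
low = (MEASURED_MASS - UNCERTAINTY) / MONOMER_MASS
high = (MEASURED_MASS + UNCERTAINTY) / MONOMER_MASS

print(f"ratio = {ratio:.2f}, nearest integer = {subunits}")
print(f"range given the uncertainty: {low:.2f} to {high:.2f}")
```

Under these numbers the ratio stays above one and below three across the whole reported spread, which is why a dimer is the most natural reading of the light-scattering result.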
To be able to orientate ourselves in the structure, we have dubbed one side of the dimer the 'peak' (Figs 1a, b and 3a) and the other side the 'groove' (Figs 1c, d and 3b). On the peak side, the dimerization of β1 creates a flat plateau where two conserved arginines (Arg242) create a positively charged peak in the middle of a less charged area (Figs 1a and 3a, c). Flanking the sides of this peak are two identical negatively charged patches. The residues that contribute to these two patches are Asp186 from L1 of one VP22 core monomer and a cluster of negatively charged residues, Glu230, Asp231 and Glu234, from α2 of the other monomer (Figs 1c and 3a, d). The electrostatic potential surface map of the groove, which is created by L1 and α1 from both monomers, shows two large and positively charged patches. In general, distinctly charged patches on a protein surface might indicate potential sites for protein-protein interactions and any of these described areas in VP22 core could serve this purpose.
Interestingly, we observed a stretch of unaccounted-for electron density next to β1 (Fig. 4). The β1 strand forms a small β-sheet through interactions with β1 from the other monomer and contributes to the overall dimerization of VP22 core. We managed to model a six-amino-acid peptide into this density. This peptide forms a perfect β-strand, expanding the β1 β-sheet to four strands. It corresponds to the sequence SSGSVD, which is a part of the linker region between the N-terminal His6-tag and the TEV protease cleavage site. This peptide is most likely contributed in trans from a neighbouring subunit in the crystal lattice, which is not part of the crystallographic dimer. The peptide is held tightly in place by backbone interactions and the coordination of the hydroxyl group on the N-terminal serine. Although this particular peptide sequence is most likely not of biological relevance, it indicates directly that this peptide-binding cavity could constitute a real site for protein interaction with VP22. A motif similar to the peptide was not identified at the N terminus of VP22, but it is plausible that some as-yet unidentified part of the N terminus could form a β-strand and bind in this location.
Conserved residues in VP22 core contribute to its fold, oligomerization and interactions
VP22 has many proposed interaction partners. In order to evaluate and differentiate between these interactions, various deletion mutants have been created, described and discussed (Brignati et al., 2003; Elliott et al., 2005; Hafezi et al., 2005; Martin et al., 2002; O'Regan et al., 2007a, b, 2010; Stylianou et al., 2009) (summarized in Table S1, available in the online Supplementary Material). Whilst these studies have laid a foundation for the VP22 protein interaction network, the crystal structure of VP22 core can now aid in understanding these interactions at the atomic level. We mapped several of the published mutations onto the VP22 core structure to gain more insight into their structure-function relationship.
Upon mapping these deletions and truncations, we can now see that most of the mutated residues that resulted in a loss of protein function are located in L1, α1 or α2 (Table S1). In most cases, the reported deletions would have removed parts of the long central helix α1, a key secondary structure element along the dimerization interface. Most of the described point mutations that seem to have an effect on VP22 interactions are also focused on this helix (O'Regan et al., 2007b, 2010; Tanaka et al., 2012). In particular, Trp189, Phe201 and Trp221, mutations of which have been found to disrupt the binding between VP22 and gE/VP16, are located along the dimerization interface of α1 (Fig. 5a, b). It is possible that most effects observed in these studies are the result of a distortion of VP22 dimerization, rather than specific functional effects.
A residue of particular interest is the conserved and solvent-exposed Phe196. The electrostatic surface potential of VP22 core reveals that this hydrophobic Phe196 is located in the middle of the two large and highly positively charged patches at the groove side (Fig. 5c). With a single point mutation of this amino acid, O'Regan et al. (2010) were able to remove the binding between gE and VP22, but not between VP16 and VP22. Moreover, conserved aromatic residues on the surface of a protein have often been shown to be important for protein interactions (Albiston et al., 2010; Cao et al., 2008; Chouljenko et al., 2012; Ferrandon et al., 2003). The functional evidence from O'Regan et al. (2010), in combination with the high degree of conservation and strategic location/orientation of Phe196, suggests its importance in protein interactions. Given that the point mutation on Phe196 only removed the binding between gE and VP22, Phe196 and the surrounding amino acid residues may also play a key role in discriminating between the different interacting proteins of VP22 (O'Regan et al., 2010).
Table 1 footnotes. [...] where the sum is calculated over all observations of a measured reflection ($I_j$) and $\langle I \rangle$ is the mean intensity of all the measured observations ($I_j$). ‡$R_{\mathrm{factor}} = 100 \times \sum(|F_o| - |F_c|)/\sum(|F_o|)$, where $F_o$ and $F_c$ are the observed and calculated structure factors, respectively. §$R_{\mathrm{free}}$ is equivalent to $R_{\mathrm{factor}}$, but where 5 % of the measured reflections have been excluded from refinement and set aside for cross-validation.
Similarly, the binding between VP22 and VP16 was disrupted when a pair of conserved leucines along α2 VP22 (Leu235 and Leu236) was mutated into alanines (O'Regan et al., 2007b). These mutations also altered the localization sites of several HSV-1 proteins, including ICP0, gE, gD, VP16 and vhs, in the host cell (Tanaka et al., 2012). However, these leucines face the hydrophobic core and do not appear to be able to participate in any direct protein-protein interactions (Fig. 5d). Thus, the loss of protein function is likely to have arisen from either the collapse of the global VP22 core structure or local distortions of α-helical stability/positions. If the observed effects are indeed a result of local structural distortions, these mutations highlight the significance of the entire α2 for making interactions with its binding partners.
Fig. 2. Light-scattering curve of VP22 core in solution as a function of its elution volume. The monomeric molar mass of VP22 core is 14 551 g mol⁻¹ and the light-scattering results show that VP22 core is monodisperse with an estimated mean molar mass of 26 500±5000 g mol⁻¹. This shows that VP22 core is dimeric in solution. The SDS-PAGE gel of the injected VP22 core sample and the protein ladder (Mark12 Unstained Standard; kDa) is displayed on the left of the elution peak.
VP22 core is structurally homologous to ORF52 from MHV-68
VP22 consists of two domains, of which only the C-terminal domain is highly conserved in the alphaherpesviruses. The N-terminal domain is more variable and is completely absent in some alphaherpesviruses (O'Regan et al., 2007a). A structural homology search with the structure of VP22 core on the Dali server identified another herpesvirus protein, ORF52 from MHV-68 (PDB ID: 2R3H and 2OA5), with a mean Z score of 5.6 (Holm & Rosenström, 2010). ORF52 from MHV-68 (ORF52 MHV-68) is a small viral protein of 21 kDa, making it substantially smaller than the full-length VP22 (35 kDa). As with VP22 core, ORF52 MHV-68 is also a highly expressed tegument protein that exists as a dimer made up of two identical monomers (Benach et al., 2007; Bortz et al., 2007). Both VP22 and ORF52 MHV-68 are well conserved within the alpha- and gammaherpesviruses, respectively, and both proteins seem to share similar functions, such as tegument association and interactions (Bortz et al., 2007; Brignati et al., 2003; Fossum et al., 2009; Rozen et al., 2008; Uetz et al., 2006). For clarity, we use the subscripts 'VP22' and 'ORF52' to differentiate between the secondary structural elements in the respective proteins.
Fig. 3. [...] Fig. 1(d). (c) The positively charged patch at the peak side is created by Arg242, whilst (d) the negatively charged patch is created by Asp186 from L1 of one monomer and a cluster of negatively charged residues, Glu230, Asp231 and Glu234, from α2 of the second monomer. These distinctively charged patches on VP22 core might be potential molecular interaction sites.
Fig. 4. Peptide-binding site of VP22 core with the electrostatic potential surface map of the peak side. A peptide consisting of six amino acids was traced from the stretch of unmodelled electron density next to β1. The interaction between the peptide (yellow) and β1 (white) is magnified and displayed below. The peptide fits well into the electron density and the sequence was traced to be SSGSVD. Hydrogen bonds hold the peptide to β1 and these interactions are illustrated by yellow dotted lines.
To analyse the structural similarities in detail, the VP22 core structure was compared with the published dimer of ORF52 MHV-68 (PDB ID: 2OA5) using Coot (Emsley et al., 2010). The α carbons of each VP22 core monomer and the individual ORF52 MHV-68 monomer align well with a mean root-mean-square deviation (RMSD) of 2.1 Å (Fig. 6a, b). Both VP22 core and ORF52 MHV-68 have long central α-helices (α1 VP22 and α2 ORF52) that constitute the core of the dimer interactions. The anti-parallel β-strands (β1 VP22 and β1 ORF52) also contribute to this dimerization. The helices α2 VP22 and α3 ORF52, located on the surface of the proteins, align well with each other. There is a slight difference at the C terminus of this superposition: whilst ORF52 MHV-68 has an extended loop, HSV-1 VP22 core has an α-helix denoted α3 VP22. However, the major differences between VP22 core and ORF52 MHV-68 lie at the N terminus (Fig. 6a, b). At the N terminus, ORF52 MHV-68 has an additional helix (α1 ORF52), whilst VP22 core has a long extended loop (L1 VP22). In ORF52 MHV-68, this particular helix displays two different conformations by extending in different directions in the dimer structure, suggestive of a flexible N terminus in ORF52 MHV-68. L1 VP22 stretches in the same direction as α1 ORF52 in chain A and in the opposite direction from α1 ORF52 of ORF52 MHV-68 chain B. Full-length VP22 has an additional N-terminal domain, not present in our structure, and secondary structure predictions also indicate a low α-helical propensity along L1 VP22 (not shown) (Cole et al., 2008). Thus, there is a possibility that L1 VP22 exists as a part of a long and flexible connection between VP22 core and its N-terminal domain.
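For readers unfamiliar with the metric, the RMSD quoted above is the square root of the mean squared distance between matched α-carbon pairs after superposition. The sketch below shows only that final calculation in plain Python. It assumes the two coordinate sets are already superposed and paired one-to-one (the superposition itself was done in Coot and is not reproduced here), and the coordinates are hypothetical toy values, not taken from 4XAL or 2OA5.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) points.

    Assumes the points are already superposed and matched pairwise.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equally long")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(coords_a))


# Hypothetical matched C-alpha positions in angstroms (illustration only).
toy_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
toy_b = [(0.0, 1.0, 0.0), (3.8, 0.0, 2.0), (7.6, 2.0, 1.0)]

toy_rmsd = rmsd(toy_a, toy_b)
print(f"toy RMSD = {toy_rmsd:.2f} A")
```

Because the deviations are squared before averaging, a few poorly matching regions (such as the divergent N termini described above) can dominate the value even when the core helices superpose closely.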
Based on the structural similarity, we generated a structure-based sequence alignment between the monomeric VP22 core and ORF52 MHV-68, yielding a sequence identity of 13 % (Fig. 6c) (Pettersen et al., 2004). As with the conserved residues within the VP22 homologues, most of these residues are clustered throughout the dimerization interface and the hydrophobic core. The structure-based sequence alignment prompted us to try to identify a possible homologue in the betaherpesviruses. However, no homologue could be identified.
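A sequence identity figure of this kind depends on how aligned positions are counted. The sketch below shows one common convention in Python: identical residue pairs divided by the number of columns in which both sequences have a residue. Which denominator underlies the 13 % value is not stated in the text, so this convention is an assumption, and the two aligned strings are hypothetical toy sequences rather than VP22 or ORF52.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    aligned = 0
    identical = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip columns with a gap in either sequence
        aligned += 1
        if res_a == res_b:
            identical += 1
    if aligned == 0:
        raise ValueError("no aligned residue pairs")
    return 100.0 * identical / aligned


# Hypothetical toy alignment (illustration only).
toy_a = "MKR-VLVA"
toy_b = "MQRTV-IA"

toy_identity = percent_identity(toy_a, toy_b)
print(f"toy identity = {toy_identity:.1f} %")
```

Dividing instead by the length of the shorter sequence or by the full alignment length would give a lower number for the same alignment, which is worth keeping in mind when comparing identity values across studies.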
To further understand the sequence conservation between the alpha- and gammaherpesviruses, we generated an additional alignment with most homologues from the alpha- and gammaherpesviruses (Fig. S1). Although this sequence alignment displays very low sequence similarity, it does reveal four amino acids that are particularly conserved in both subfamilies. In particular, along α2 VP22 and α3 ORF52, a leucine (Leu236 VP22/Leu89 ORF52) is conserved in both the alpha- and gammaherpesviruses. As in VP22, this conserved leucine in ORF52 MHV-68 (coloured red at α3 ORF52 in Fig. 6b) also faces the hydrophobic core of the protein, supporting the importance of oligomerization of this protein for proper function.
The remaining three amino acids that are conserved in both the alpha- and gammaherpesviruses are Arg242 VP22/Arg95 ORF52, Val243 VP22/Val96 ORF52 and Val245 VP22/Val98 ORF52 (Fig. S1). These amino acids are located along β1, where the side chains of the valines stretch into the core of the structure, whilst the side chain of the arginine is solvent-exposed (Figs 4 and 6). The two conserved valines seem to contribute to the fold, but the highly conserved arginine (Arg242 VP22/Arg95 ORF52) along β1 appears to be important for protein binding. This conserved residue is found next to our proposed peptide-binding site and is what creates the distinct peak of VP22 core (Figs 3a, c and 4). The importance of this completely conserved arginine is underlined by the fact that Wang et al. (2012) could disrupt the binding between ORF52 MHV-68 and ORF42 MHV-68 with a single amino acid substitution (Arg→Ala) in this position. Hence, although there is no determined homologue to ORF42 MHV-68 in HSV-1, it is likely that the corresponding mutation in VP22 could also disrupt the interaction with one or several of its (un)known binding partners. It would be interesting to see how a mutation of this conserved Arg242 VP22 would affect this protein in vivo.
Fig. 5. [...] show one VP22 core monomer displayed as a cartoon and the other monomer displayed as the electrostatic potential surface map. (a) Trp189/Phe201 and (b) Trp221 are buried in the hydrophobic dimerization interface, rendering them unlikely to participate in any specific protein-protein interactions. Instead, they seem very important for dimerization. However, the surface electrostatic potential map shows that the conserved Phe196 (c) is found on the surface of the VP22 core and is likely to participate in protein-protein interactions. In contrast, Leu235/Leu236 (d) are buried in the hydrophobic interface, indicating that the leucine pair is not likely to participate in specific protein-protein interactions.
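A quick consistency check can be made on the four conserved positions listed in this section (Leu236/Leu89, Arg242/Arg95, Val243/Val96 and Val245/Val98): the residue numbers in VP22 and ORF52 differ by the same amount in every pair, as would be expected if the four pairs sit in a stretch of the alignment without insertions or deletions between them. The Python sketch below simply performs that subtraction; the variable and function names are illustrative, and the gap-free interpretation is an inference from the numbering rather than a statement from the original analysis.

```python
# (residue type, position in HSV-1 VP22, position in MHV-68 ORF52),
# exactly as listed in the text.
CONSERVED_PAIRS = [
    ("Leu", 236, 89),
    ("Arg", 242, 95),
    ("Val", 243, 96),
    ("Val", 245, 98),
]


def numbering_offsets(pairs):
    """Difference between VP22 and ORF52 residue numbers for each pair."""
    return [vp22 - orf52 for _, vp22, orf52 in pairs]


offsets = numbering_offsets(CONSERVED_PAIRS)
is_constant = len(set(offsets)) == 1

print("offsets:", offsets)
print("constant offset:", is_constant)
```

A constant offset makes it straightforward to translate any residue in this C-terminal stretch of VP22 to its ORF52 counterpart, for example when planning the equivalent of the Arg→Ala substitution discussed above.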
The described conserved structural features and functions of VP22 and ORF52 MHV-68 suggest that both proteins could act as protein adaptors to which different proteins are bound. Moreover, being a major tegument protein in HSV-1, VP22 has been associated with multiple protein-protein interactions, the localization of several HSV-1 proteins and protein transport along the microtubules (Chi et al., 2005; Elliott et al., 1995, 2005; Elliott & O'Hare, 1998; Farnsworth et al., 2007; Hafezi et al., 2005; Kotsakis et al., 2001; Maringer & Elliott, 2010; Maringer et al., 2012; Martin et al., 2002; O'Regan et al., 2007a, 2010; Potel & Elliott, 2005; Stylianou et al., 2009; Tanaka et al., 2012; Yedowitz et al., 2005). It is likely that VP22 and the structural homologue ORF52 MHV-68 could be involved in assembling a protein scaffold consisting of other tegument proteins, thereby creating a protein bridge between the capsid and the lipid envelope. This assembly may be important for the intracellular transport of proteins along the microtubules.
In conclusion, with a three-dimensional structure of a well-studied protein like VP22, we can now start connecting functional data with structural information. We hope that the data presented in this paper might help to spur new and directed efforts to elucidate this protein's function.
Fig. 6. Structural and sequence alignment of VP22 core and ORF52 MHV-68. The dimer structures of (a) VP22 core and (b) ORF52 MHV-68 are shown as cartoons in the same orientation. (c) The structural alignment of VP22 core and ORF52 MHV-68 is reproduced in a sequence alignment. The completely conserved amino acids are highlighted in red, whilst the other conserved residues are highlighted in pink. The conserved amino acids are mainly concentrated along the hydrophobic dimerization interface at α1 VP22, α2 VP22 and β1 VP22.
Crystallization and data collection. Native crystals of VP22 core were obtained from a sitting-drop experiment in which drops containing 1.5 µl purified VP22 core protein (12 mg VP22 core ml⁻¹) and 1.5 µl reservoir solution (40 % PEG 300 and 0.1 M phosphate citrate, pH 5) were incubated with 300 µl reservoir solution in a 24-well sitting-drop Intelli-plate (Art Robbins) at 20 °C. Native crystals were transferred to a fresh drop of reservoir solution containing 1 mM PbCl₂ for 45 min to obtain derivative crystals. No additional cryoprotectant was added to the native and the derivative crystals before flash freezing them in liquid nitrogen.
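Because the drop is made from equal volumes of protein and reservoir solution, every component is diluted twofold at the moment of mixing, before vapour diffusion concentrates the drop again towards the reservoir composition. The Python sketch below works through that starting composition from the values given above. The helper name is hypothetical, and the calculation ignores any components of the protein buffer, which is not described in this section.

```python
def mix(stock_conc, stock_volume, other_volume):
    """Concentration of a component after mixing its stock with another solution.

    Volumes only need to share the same unit; the result keeps the unit of
    stock_conc.
    """
    return stock_conc * stock_volume / (stock_volume + other_volume)


PROTEIN_VOLUME = 1.5      # protein solution per drop
RESERVOIR_VOLUME = 1.5    # reservoir solution per drop (same unit)

protein_start = mix(12.0, PROTEIN_VOLUME, RESERVOIR_VOLUME)   # mg/ml
peg_start = mix(40.0, RESERVOIR_VOLUME, PROTEIN_VOLUME)       # % PEG 300
buffer_start = mix(0.1, RESERVOIR_VOLUME, PROTEIN_VOLUME)     # M phosphate citrate

print(f"protein: {protein_start:.1f} mg/ml")
print(f"PEG 300: {peg_start:.1f} %")
print(f"phosphate citrate: {buffer_start:.2f} M")
```

The drop therefore starts at roughly half the reservoir precipitant concentration, which is the usual starting point for a 1:1 vapour-diffusion setup.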
Diffraction datasets were collected at beamline BL13C1 at the National Synchrotron Radiation Research Center (Taiwan, ROC) with an ADSC Quantum-315r CCD detector. Datasets were collected at a wavelength of 0.97 Å, and integrated and scaled with HKL-2000 (Otwinowski & Minor, 1997).
Structural determination. The initial crystallographic model of VP22 core was obtained with SIRAS using AutoSol wizard and AutoBuild from the PHENIX suite (Adams et al., 2010). The final structure was obtained after many cycles of automatic and manual structural refinement with REFMAC (Murshudov et al., 2011) and Coot (Emsley et al., 2010). The structure refinement was validated with SFCHECK (Vaguine et al., 1999) and the geometry of the final structure was analysed with RAMPAGE (Lovell et al., 2003).
The figures of the final VP22 core structure were created and displayed with PyMOL (http://www.PyMOL.org/). The electrostatic potential of the solvent-accessible surfaces of the protein was calculated using PDB2PQR (Dolinsky et al., 2004) and the APBS plugin (Baker et al., 2001) in PyMOL. The electrostatic potential contour levels were set at ±3 kT/e and the surface maps were displayed with PyMOL.
Structure-based sequence alignment. The sequence alignment between VP22 core and ORF52 MHV-68 was generated with a pair-wise structure-based alignment between the monomers using Chimera (Pettersen et al., 2004). Sequences of the VP22 core and ORF52 MHV-68 homologues from the alpha-and gammaherpesviruses were aligned by adding their amino acid sequences to the structure-based alignment. The amino acid conservation was mapped and displayed with PyMOL.
Multi-angle light scattering. Light-scattering data were obtained with analytical size-exclusion chromatography (Superdex 200 5/150 GL; GE Healthcare) coupled with a multi-angle light-scattering detector (MiniDAWN TREOS; Wyatt Technology) and a refractive index detector (Optilab rEX; Wyatt Technology) on an ÄKTAmicro (GE Healthcare). An aliquot of 20 µl VP22 core (6 mg VP22 core ml⁻¹) was injected onto the pre-equilibrated column (20 mM HEPES, pH 7.5, 300 mM NaCl, 10 % glycerol and 2 mM TCEP) at a flow rate of 0.3 ml min⁻¹. ASTRA 6 (Wyatt Technology) was used to determine the experimental protein molecular mass from the light-scattering data.