Polyhedra structures and the evolution of the insect viruses

Polyhedra represent an ancient system used by a number of insect viruses to protect virions during long periods of environmental exposure. We present high-resolution crystal structures of polyhedra for seven previously uncharacterised types of cypoviruses, four using ab initio selenomethionine phasing (two of these required over 100 selenomethionine crystals each). Approximately 80% of residues are structurally equivalent between all polyhedrins (pairwise rmsd ≤ 1.5 Å), whilst pairwise sequence identities, based on structural alignment, are as little as 12%. These structures illustrate the effect of 400 million years of evolution on a system where the crystal lattice is the functionally conserved feature in the face of massive sequence variability. The conservation of crystal contacts is maintained across most of the molecular surface, except for a dispensable virus recognition domain. By spreading the contacts over so much of the protein surface the lattice remains robust in the face of many individual changes. Overall these unusual structural constraints seem to have skewed the molecule's evolution so that surface residues are only slightly less conserved than the internal residues.

Cypoviruses have been classified into 21 distinct types (Attoui et al., 2012). Within a type the amino acid sequences of polyhedrins are highly conserved, whilst between types there is little conservation. The structure of BmCPV-1 polyhedrin (CPV1), determined by micro-focus crystallography, showed the polyhedra to be built from rigid trimeric building blocks, which assemble with the aid of ribonucleoside triphosphates (NTPs) to form a lattice containing very little bulk solvent (Coulibaly et al., 2007). To ensure the delivery of virus particles to the target intestinal cells, allowing the transmission of packages of infectious virus between hosts by oral-faecal routes, polyhedra are disrupted only when they are exposed to the very alkaline pH environment found in insect midguts (Vago and Croissant, 1959). Region-specific ion transport has been found to facilitate the alkalinisation of the anterior midgut lumen, with major roles for an H+ V-ATPase-energised Cl−/HCO3− exchanger and carbonic anhydrase (Boudko et al., 2001), consistent with carbonate ions being used to modulate the pH of the midgut.
We recently reported structures for CPV18, a close relative of CPV1, determined from undisrupted insect cells and from purified polyhedra (Axford et al., 2014), and for CPV17 at room temperature and at 100K, using an X-ray free electron laser and synchrotron respectively (Ginn et al., 2015). Here we report polyhedrin structures for all other types of CPV for which polyhedra could be obtained, giving us a database of nine structures from which we derive rules for crystal formation and dissolution and investigate how preservation of this unusual phenotype (the crystal lattice) has impacted on the variation of the protein over geological time scales.

Materials and methods
For readability we refer to the cypovirus type and strain by number alone, e.g. CPV14 for Lymantria dispar cypovirus 14, as defined in Table S1. For completeness we include CPV1 (Coulibaly et al., 2007) as a reference in our analyses.

Expression and purification of polyhedra
The cypovirus types studied in this paper are abbreviated as detailed in Supplementary Table S1. cDNA pools of cypovirus genomes for types CPV1, CPV5, CPV14 and CPV15 were a gift from Dr. Shujing Rao (Institute for Animal Health, Pirbright, Surrey, UK). Polyhedrin genes of CPV4, CPV17, CPV18, CPV19 and CPV20 were synthesized by GeneArt (Life Technologies) using sequences obtained from GenBank.
Spodoptera frugiperda (Sf9) cells were passaged at 27 °C in Sf-900 II serum-free medium (SFM, Life Technologies) using standard procedures (O'Reilly et al., 1994). The coding regions of genome segment 10 from CPV1, CPV5, CPV14 and CPV15 were amplified and inserted in the multi-cloning site of vector pFastBac1 (Life Technologies). The recombinant plasmids were transformed into competent DH10Bac cells (Life Technologies) to produce bacmid DNA for the generation of recombinant baculoviruses, following the protocol of the Bac-to-Bac® baculovirus expression system (Life Technologies). The polyhedrin genes from CPV4, CPV17, CPV18, CPV19 and CPV20 were sub-cloned into the transfer vector pBacPAK9 (Clontech). Base domain deletion mutants of CPV1 (CPV1Δ) and CPV4 (CPV4Δ) were produced by three steps of PCR using primers encoding a Gly-Ser-Gly linker (as described in Strachan and Read, 1999).
The recombinant virus was produced by co-transfection of linearized baculovirus DNA and the transfer vector following a standard protocol (Zhao et al., 2003). Polyhedra were expressed and purified as described earlier (Anduleit et al., 2005). SeMet-labelled polyhedra were generated as published previously (Ji et al., 2010) and the incorporation of SeMet was confirmed to be at least 80% using mass spectrometry. Purified polyhedra were observed and characterised using a Zeiss Axiovert 200 M inverted microscope with fluorescence/phase or DIC (Nomarski) imaging.

Data collection
Polyhedra were spread on a MicroMesh mount (Mitegen, Ithaca, USA) in a solution of 10 mM HEPES, pH 7.5 and 50% ethylene glycol and flash-frozen in a stream of nitrogen gas at 100 K.
The typical 10 µm × 10 µm focal spot at the sample position of beamline I24 was reduced with the use of beam-defining slits. For CPV17, slits close to the sample position produced a beam size of ~4 µm × ~4 µm with ~1 × 10¹¹ ph/s; for all other CPVs, slits at the focal point of the first of the two pairs of KB mirrors that form the optical configuration of the beamline produced a beam size of ~6 µm × ~6 µm with ~1.5 × 10¹¹ ph/s.
Data were processed using various programs including Denzo/Scalepack (Otwinowski et al., 1997), XDS (Kabsch, 1993), Mosflm/Scala (Leslie and Powell, 2007) and XIA2 (Winter, 2010) according to the detector types. The current in the storage ring was maintained around 200-250 mA at Diamond. Each crystal was exposed to 100% transmission of the X-ray beam with a fine-sliced oscillation range of 0.05-0.1° for best performance of the PILATUS 6M detectors. For the smallest crystal only 10-20 frames could be collected. Data from a number of crystals of each cypovirus were merged, separately for native and SeMet-substituted crystals, to give a series of datasets with adequate redundancy (Table 1). Details of the data collection strategy for CPV17 at the Linac Coherent Light Source X-FEL (Stanford, U.S.A.) are described in Ginn et al. (2015).
The structures of other cypovirus polyhedrins were solved using a method which combined molecular replacement and SIRAS. The molecular replacement was performed using PHASER (McCoy et al., 2007) with the mainchain atoms of known cypovirus polyhedrin structures as search models against the native data. As we solved additional polyhedrins more starting models could be used for molecular replacement. We also optimised the process by selecting those cypovirus types which bore the highest sequence similarity. The resulting phases were used to calculate anomalous and isomorphous difference Fourier maps using FFT (Read and Schierbeek, 1988; Winn et al., 2011). Potential heavy atom sites were found and their positions refined by SHARP (Bricogne et al., 2003), with the phases from molecular replacement serving both as a restraint and as an aid to the refinement. The phases from SIRAS and molecular replacement were finally combined and used as restraint phases in BUSTER (Bricogne et al., 2011) during refinement. Model building was performed in Coot (Emsley and Cowtan, 2004) and final refinement used PHENIX (Adams et al., 2010). All the structures were validated by MolProbity. Details of the refinement are in Table 2.
Molecular interfaces and oligomerization were evaluated using the PISA server at the European Bioinformatics Institute (Krissinel and Henrick, 2007). Three-dimensional structure alignment was performed and analyzed with SHP (Stuart et al., 1979). Illustrations were generated with PyMOL (Schrodinger, 2010).
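The pairwise rmsd values quoted for structurally equivalent Cα atoms can be illustrated with a minimal sketch. This is not the SHP algorithm: it assumes the two structures have already been superposed and the equivalent residues paired, and the function name and toy coordinates are hypothetical.

```python
import math


def pairwise_rmsd(coords_a, coords_b):
    """RMSD (in Å) between two equal-length lists of (x, y, z) Cα positions.

    Assumes the structures are already superposed and that the i-th entry of
    each list is a structurally equivalent pair.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(coords_a))


# Hypothetical toy data: three Cα atoms, the second set shifted by 1 Å along z.
set_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
set_b = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0), (7.6, 0.0, 1.0)]
print(f"rmsd = {pairwise_rmsd(set_a, set_b):.2f} Å")
```

In practice the superposition and the choice of equivalent residues determine the result, so values from this sketch are only comparable with published ones when the same pairing is used.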

Alkaline disruption
Initially each of eleven polyhedra samples (CPV1, 4, 5, 14, 15, 17, 18, 19, 20, 1Δ and 4Δ) was suspended in 500 μl alkaline buffer (50 mM NaHCO3, pH 10.5) at room temperature. The optical density of the sample was measured at regular time intervals with a spectrophotometer at a wavelength of 600 nm (OD600). The measurements were compared with the starting value to estimate the fraction dissolved. In a follow-up experiment aliquots (20 μl) of CPV1, 4, 1Δ and 4Δ polyhedra suspended in water were taken and centrifuged at 20000 ×g for 1 min.
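The comparison of OD600 readings with the starting value can be sketched as follows. The sketch assumes that OD600 scales linearly with the amount of undissolved polyhedra, which the text does not state; the function name and the readings are hypothetical.

```python
def dissolved_fraction(od_start, od_now):
    """Fraction of polyhedra dissolved, estimated from the drop in OD600.

    Assumption: turbidity is proportional to the amount of intact polyhedra.
    The result is clamped to the range 0..1 to absorb measurement noise.
    """
    if od_start <= 0:
        raise ValueError("starting OD600 must be positive")
    fraction = 1.0 - od_now / od_start
    return min(1.0, max(0.0, fraction))


# Hypothetical time course: (minutes, OD600)
readings = [(0, 0.80), (10, 0.60), (30, 0.40), (60, 0.20)]
od0 = readings[0][1]
for minutes, od in readings:
    print(f"{minutes:>3} min: {dissolved_fraction(od0, od):.0%} dissolved")
```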

Structure determination
The polyhedrin genes from eight cypovirus types (4, 5, 14, 15, 17, 18, 19 and 20) were cloned and recombinant baculoviruses generated (see Materials and methods). The resultant recombinant cypovirus polyhedra varied in size from 6-8 μm for CPV4, to ~1 μm for CPV17 (Table 1). The use of a small (~5 μm × 5 μm) beam at the tuneable micro-focus I24 beamline at Diamond Light Source allowed data collection at 100 K and structure solution at 2.2 Å resolution or beyond for all of these polyhedra (except that the initial structure determination of CPV17 required the use of an X-FEL; see below and Table 1). Due to the tiny size of the crystals, data from a number of crystals were merged to attain a complete dataset with sufficient signal-to-noise. In four cases the structures were solved by SeMet labelling, and for two of these over 100 crystals were required to achieve sufficient anomalous signal to solve the structure (Table 1). The final structures were all reliable (Table 2). Data for CPV17 were also collected at room temperature on the CXI beamline at the Linac Coherent Light Source, to a resolution higher than could be achieved on I24. The small differences between the CPV17 100 K and room temperature structures have been described (Ginn et al., 2015).

General description of the fold
All polyhedra belong to the same I23 space group and the native crystals (at 100 K) mostly possess a unit cell of 102-103 Å (102.8±0.4 Å). The exceptions were CPV4, which forms crystals with 2 distinct unit cell dimensions (101.7 and 104.8 Å; the structure of the larger unit cell is an outlier and is discussed below), and CPV17, which has larger cell dimensions (106.1 Å at room temperature and 104.9 Å at 100 K). The fold of all the polyhedrins is highly similar: a β-barrel core sitting on a base domain, surrounded by a common array of five helices (H1-H5, Fig. 1a). Fig. 1b shows the arrangement of the major secondary structure elements along the polypeptide chain. The β-barrel core is the most conserved part of the structure: 90% of Cαs align with rmsd 1.1 Å (Fig. S1), and non-aligned residues lie at the periphery of the core and the termini. The β-core is reminiscent of the jelly-roll fold seen in the capsids of many icosahedral viruses (Koonin, 2008) (Fig. 1a); however, the arrangement of the nine β-strands in the two β-sheets, IBADGF and CHE, is novel. The base domain lies between H2 and βD, comprises H3 and the loops connecting it to the core and varies in length from 33 to 41 residues between the different cypovirus types (Fig. S2). The base caps an exposed side of the β-sheets and interacts with the loops connecting the β-strands. Despite little amino acid sequence identity ( connections between secondary structures (H2 and H3, βE and βF, βF and βG, βG and βH), and the C-terminal loop, which we designate V1-V5 respectively (Fig. 1). Differences in these regions give each polyhedrin its characteristic appearance. CPV4 features long V1 and V5, which loop back to interact. CPV5 has insertions at V3 and V2c which form two hairpins resembling a clamp. CPV14 essentially lacks a V1 extension whereas a long V5 loop forms a hook and an intra-molecular disulphide bridge helps to stabilise the base domain.
CPV15 is like CPV5, but with a smaller clamp, and has a dramatic conformational change in V4. CPV17 is unique in having an incompletely (~50%) formed inter-chain disulphide bond at room temperature, which may confer mechanical stability (Ginn et al., 2015). Baculovirus polyhedra also contain a disulphide bond, involving a single cysteine, which links to form a dodecamer (Ji et al., 2010). As expected from the 83% amino acid sequence identity, CPV18 looks very similar to CPV1. CPV19 possesses a combination of a long V1 and V2c hairpin. CPV20 has elongated V1 and V5 but unlike CPV4 they do not interact. The V5  There is no obvious significance to this change.
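The cubic I23 cell edge quoted at the start of this section gives a rough feel for how densely the lattice is packed. The sketch below estimates the Matthews coefficient and the corresponding solvent fraction using the standard approximation (solvent fraction ≈ 1 − 1.23/Vm). It assumes one polyhedrin chain per asymmetric unit (24 chains per I23 cell) and a chain mass of 28 kDa; the mass is a round illustrative figure and is not taken from the text.

```python
def matthews_coefficient(cell_edge, chains_per_cell, chain_mass_da):
    """Crystal volume per unit of protein mass (Å^3/Da) for a cubic cell."""
    return cell_edge ** 3 / (chains_per_cell * chain_mass_da)


def solvent_fraction(vm):
    """Matthews' approximation, assuming a typical protein density."""
    return 1.0 - 1.23 / vm


CELL_EDGE = 102.8        # Å, typical native cell edge quoted in the text
CHAINS_PER_CELL = 24     # I23 has 24 general positions (assumes 1 chain per asymmetric unit)
CHAIN_MASS_DA = 28_000   # hypothetical polyhedrin mass, for illustration only

vm = matthews_coefficient(CELL_EDGE, CHAINS_PER_CELL, CHAIN_MASS_DA)
print(f"Vm = {vm:.2f} Å^3/Da, solvent fraction = {solvent_fraction(vm):.0%}")
```

Under these assumptions the estimate falls well below the solvent content of typical protein crystals, in line with the description of a lattice containing very little bulk solvent.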

Very few amino acids are conserved
Only two residues are conserved across the nine structurally aligned polyhedrins. When CPV17 is excluded this increases to five (highlighted in Fig. S2: Q17, G95, P106, E217 and Y232, using CPV1 numbering), all located in non-variable regions. Q17 lies in H1 and its side chain engages the main chain of a distal part of the molecule: NE2 with the carbonyl group of residue 149, and OE1 with the amino group of residue 165. E217 and Y232 are towards the C-terminal region, outside the core. E217 OE2 forms a hydrogen bond with Y232 OH, and an ionic interaction with R13 in a symmetry-related molecule, except in CPV5 and CPV20. E217 OE1 forms a hydrogen bond with the amino group of residue 235. P106 bridges the base domain and strand βD, and its limited flexibility likely helps direct the correct domain organisation. G95 is in the base domain close to H3; indeed, only a glycine residue would allow the required packing of H3.
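Counting fully conserved positions, with and without an outlier such as CPV17, amounts to scanning the columns of the structure-based alignment. A minimal sketch follows; the toy alignment and sequence names are hypothetical and far shorter than real polyhedrin sequences.

```python
def invariant_columns(alignment, exclude=()):
    """Return 0-based indices of columns where every included sequence
    carries the same residue (gap characters '-' never count as conserved)."""
    names = [name for name in alignment if name not in exclude]
    if not names:
        raise ValueError("no sequences left after exclusion")
    lengths = {len(alignment[name]) for name in names}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must have equal length")
    n_cols = lengths.pop()
    invariant = []
    for i in range(n_cols):
        column = {alignment[name][i] for name in names}
        if len(column) == 1 and "-" not in column:
            invariant.append(i)
    return invariant


# Hypothetical toy alignment (one-letter residue codes)
toy_alignment = {
    "typeA": "QGPEY",
    "typeB": "QAPEY",
    "typeC": "QGPDY",
    "outlier": "QGAEF",
}
print(invariant_columns(toy_alignment))
print(invariant_columns(toy_alignment, exclude=("outlier",)))
```

Excluding the divergent sequence raises the number of invariant columns, mirroring the effect described for CPV17.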

Crystal contacts and assembly
To investigate whether the conservation of the molecular structure and crystal packing arises from conserved interactions, the inter- and intramolecular interactions made by residues were plotted against residue number (Fig. S4). This confirms that both the inter- and intramolecular interactions are broadly conserved across all nine polyhedra. The strongest intermolecular interactions define a trimer composed of a cluster of three β-barrels sandwiched by helices top and bottom (Fig. 2a). Strong hydrophobic interactions and large numbers of hydrogen bonds, mainly involving H3, H4, elements within V3 and sheet-CHE, bind the trimer around the body diagonal three-fold axes. Some 38% of the amino acids in each subunit participate in the interface, which buries ~3500 Å² of surface area (Fig. S4a). Three outstretched arms, formed mainly by H1, in part mediate a complex higher-level assembly, with each trimer contacting eight others. These arms stretch to contact another trimer on the far side of the unit cell (translated in the direction of a cell edge). The strongest trimer-trimer interactions are also lateral, involving trimers tightly packed along a cell edge, forming contacts of ~8000 Å² per trimer (involving ~55% of amino acids, Fig. 2b and S4b). A further interaction of ~28% of the polyhedrin amino acids, burying ~1500 Å² of surface per trimer (Fig. 2b and S4b) is also formed.
The net effect of these lateral interactions is to form layers of molecules in all three directions, stacked alternately (Fig. 2c). As the network of trimers builds up, four trimers are brought close together around (0, 0, 0) and (½, ½, ½), to form a dodecameric arrangement and there is interaction mainly between the base domains (Fig. 2b and S4b) (involving around 9% of amino acids, and burying ~900-1800 Å² per trimer). The sum of such interactions, from which the whole network of a polyhedron crystal can be constructed, involves the majority of the surface of the trimer (Fig. S4b).
Figure caption (fragment): the 'core' structure shown in the panels is that of CPV5, using just the residues which are aligned between all the 9 polyhedrin structures. Structural alignments were performed with SHP (Stuart et al., 1979). The surface maps have been progressively smoothed, which results in a slightly coarse and stylised rendering but helps to visualise the complicated interactions.
Overall there is remarkable strength and redundancy in the interactions, with no single weak point. Ambiguity in the assembly pathway may be key to the molecule's function, with flexibility in assembly allowing the lattice to remain intact whilst building around virus particles.

The base domain is dispensable
Native crystals of CPV4 polyhedra grouped into two distinct unit cell dimensions, 101.7 (±0.27) Å and 104.8 (±0.18) Å, whilst all SeMet crystals had the larger size (Table 1). Crystals with both unit cell dimensions were observed on each of several frozen meshes, suggesting that this was not an artefact of sample preparation. In the larger-cell structure, residues 1-12, 75-108, 189-192 and 254 were not visible; however, SDS-PAGE analysis of crystals showed only a single protein band, corresponding to the full-length polyhedrin, suggesting that this is not due to cleavage.
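Because the cell is cubic, the difference between the two CPV4 cell edges translates directly into a volume change. The short calculation below uses only the two cell edges quoted above; the function name is hypothetical.

```python
def cubic_expansion(small_edge, large_edge):
    """Return (relative change in edge length, relative change in volume)
    between two cubic unit cells."""
    ratio = large_edge / small_edge
    return ratio - 1.0, ratio ** 3 - 1.0


linear, volume = cubic_expansion(101.7, 104.8)  # Å, the two CPV4 cell edges
print(f"edge: +{linear:.1%}, volume: +{volume:.1%}")
```

A change of roughly 3% in edge length therefore corresponds to roughly 9% in cell volume, which gives a sense of how much extra space appears when the base domains are disordered.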
Residues 75-108 and 189-192 are clustered around the centre of the dodecameric assembly described above. In the absence of these trimer-trimer interactions the trimers of the dodecamer slide away from each other, enlarging the unit cell. vs. 103.0 Å). The absence of the base domain did not influence the overall structure of the trimeric building block but increased flexibility such that the electron density for 40-50 residues in the inter-trimer contact areas could not be observed. These disordered regions include the areas in CPV1 which previously had been shown to bind NTPs, so that NTPs are absent from CPV1Δ. The base domain is therefore a region that is neither required for proper folding of the protein, nor for crystal assembly, but fine-tunes the crystal, 'locking-down' the structure, often in conjunction with NTPs. This region is also possibly important for virion recognition and packaging, as discussed below. There are no contiguous channels in the wild-type polyhedra (Fig. S5), although there are significant voids where trimers come together to form dodecamers, and in the heart of the trimers. In CPV1, 14, 15, 17, 18 and 20 solvent molecules can move between these two voids, whereas in CPV4, 5 and 19 they are blocked by tyrosine residues. In contrast, in the base domain deletion mutants all major cavities are connected, resulting in contiguous solvent channels running through the crystal lattice (Fig. S5).

Alkali disruption
Eleven different cypovirus polyhedra (nine wild-type and two deletion mutants) were incubated in carbonate at pH 10.5 at room temperature for an hour, and the rate of disruption monitored by measurement of OD600 of the insoluble fraction (Fig. 3a). CPV1, CPV19 and CPV20 were resistant such that only a small portion was dissolved in an hour while the other six cypovirus types plus the two deletion mutants dissolved dramatically. To investigate if hindrance of access or buffer chemistry might be a factor in these differences we investigated a subset of polyhedra, CPV1, CPV4 and their deletion mutants, in a series of alkaline buffers at pH 10.0 for 30 minutes (Fig. 3b). Wild-type polyhedra behaved very differently in these buffers. Carbonate buffer is far more effective at dissolving polyhedra than other buffers, suggesting that the buffer structure as well as pH is important. The deletion mutants are much more sensitive to alkaline buffers, even those such as glycine and borate which do not disrupt crystals of full-length polyhedrin, although CAPS could not dissolve any of the samples effectively. Overall, effectiveness is inversely correlated with the size of the buffer molecule (Table S2). The three CPVs (CPV1, 19 and 20) most resistant to alkali buffers have more extensive interactions between the three-fold symmetry-related subunits at the CTP binding site, either strengthened by CTP (CPV1) and UTP (CPV20) binding (see below) or achieved by a large number of protein interactions, mainly through hydrogen bonding. Together these observations suggest that physical access is a major factor in allowing dissolution, and that crystals tend to be resistant to non-physiological alkalis.
Fig. 3 caption (fragment): (b) Effect of five alkali pH buffers on polyhedra of 4 cypovirus types. Release of protein from insoluble polyhedra was monitored by A280 after 20 min. As a reference, aliquots of polyhedra were completely dissolved in carbonate buffer pH 12.0, and this was used to convert to % disruption.
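The conversion of A280 readings to % disruption against a fully dissolved reference, as described for Fig. 3b, can be sketched as below. The buffer blank subtraction is an assumption added for illustration, and the function name and absorbance values are hypothetical.

```python
def percent_disruption(a280_sample, a280_fully_dissolved, a280_blank=0.0):
    """Protein released from polyhedra as a percentage of the fully
    dissolved reference (e.g. an aliquot dissolved at pH 12.0).

    a280_blank is an assumed buffer-only reading; it defaults to zero.
    """
    reference = a280_fully_dissolved - a280_blank
    if reference <= 0:
        raise ValueError("reference absorbance must exceed the blank")
    released = (a280_sample - a280_blank) / reference
    return 100.0 * min(1.0, max(0.0, released))


# Hypothetical readings for one sample in one buffer
print(f"{percent_disruption(0.30, 1.20):.0f}% disrupted")
```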
It has been proposed that a cluster of tyrosines triggers alkali disruption of both cypovirus (Coulibaly et al., 2007) and baculovirus (Chiu et al., 2012; Coulibaly et al., 2009; Ji et al., 2010) polyhedra, with some supporting mutagenesis data for CPV1 (Ohtsuka et al., 2010). However, of the eight new structures, only CPV18 has a full tyrosine cluster of 8 residues as found in CPV1; the others have between 2 and 5. Nevertheless, in all types except CPV17 the hydroxyl group of conserved tyrosine 232 (CPV1/18 numbering) hydrogen bonds to another conserved residue, glutamate 217 (CPV1/18 numbering) (Fig. S2). We suggest that upon raising the pH, the tyrosine acquires a negative charge, repelling the glutamate, moving H5 and the C-terminal region and initiating disruption of the crystal lattice. This tyrosine is also vital for polyhedra assembly since mutating it ablates crystal formation (Ohtsuka et al., 2010).

NTPs and protein chains can interchange
In CPV1 three NTPs per polyhedrin subunit are specifically trapped within the protein matrix (Coulibaly et al., 2007); however, some of the nine types of polyhedra we have compared harbour none (Table 2). CPV1 in fact bears the most NTPs and so we will relate our results to that type, where the NTPs lie in two areas: ATP and GTP bind close to each other to form a purine cluster, whilst CTPs amass at the centre of the dodecamer, the pyrimidine region.
Purine cluster: The purines are found at three 2-fold axes on the facets such that there are eight symmetry equivalent purine clusters per facet (Fig. 4). Each has complex interactions with the neighbouring protein, including variable regions V1, V3, V4, V2c and V5 contributed by four different polypeptide chains, forming diverse interactions between the facets of unit cells in different polyhedra (Fig. S6). In CPV18 purine binding is almost identical to CPV1 in that GTP hydrogen bonds with V2c and V1 while ATP is supported by two V4 regions from different polypeptide chains. The ATP and GTP bases stack with each other and with the phenol ring of Y172. There are two substitutions in the binding pocket, K154H and R155V. Residue 155 interacts via the main chain nitrogen, and is thus sequence independent, whilst at residue 154 both the lysine and histidine side chain NH groups hydrogen bond with ATP phosphates.
CPV4, 5 and 20 lack both ATP and GTP, whilst the other cypovirus types have varying combinations; in all cases the V1, V3, V4, V2c or V5 regions of the protein adopt alternate polypeptide conformations (Fig. 4). Thus in CPV4, the purines are replaced by greatly elongated V1 and V5 regions, whilst the shortened V4 loop would no longer be able to contact the ATP. In CPV20 the V4 loop is similarly shortened whilst V1, V2c, and V5 infiltrate the region. In CPV5 V3 (179-187) and V2c (135-140) form clamp-like protrusions which fill the GTP and ATP pockets, whilst in the SeMet form a calcium ion links four polyhedrin molecules (Fig. S3). In summary the two purines are shrouded by amino acids, leaving almost no cavity (Fig. 4) and bridge at least four protein molecules to reinforce interactions between adjacent unit cells. Where one or both are missing, the polypeptide is altered to substitute for the missing moieties.
Pyrimidine region: NTP binding in this area is only observed in CPV1 and CPV20. In CPV1 a ball of 12 CTPs lies at the point of 23 symmetry at the heart of the dodecamer (Fig. 5). The pyrimidine in CPV20 is not well ordered, but can be identified as UTP based on the limited space and the hydrogen bonding environment provided by the main chain nitrogen of residue 101 and the side chain nitrogen of H79, whilst the triphosphates are supported by two lysines, two histidines and an arginine. The pyrimidines stabilise the dodecamer, although in CPV1 the CTPs also interact with each other, whilst the UTPs in CPV20 are more deeply buried in the trimer interface (Fig. 5 and S7).
In the cypovirus types which contain no pyrimidine the proteins take up a number of different configurations. In CPV14 and CPV19 the V2n loops move to occupy some of the central space. CPV4 has limited numbers of molecular interactions in this area, while in CPV5 a few strong trimer-trimer interactions are present. In CPV15, 13 disordered residues are situated around this region (indicated by dashed lines in Fig. 5) leaving the trimer contacts apparently weakened.
Despite a strong overall similarity with CPV1 there is no pyrimidine bound in CPV18. Many of the key residues that bind the CTP in CPV1 (H76, N77, D78, S79, Y80, D81, D96 and R98) are substituted, including H76N and S79P, which interact with the phosphates, and R98T, which stacks with the base. Clearly these changes are sufficient to disrupt CTP binding (Fig. 5).
Fig. 5 caption (fragment): The view for the nine cypovirus types is from the centre out along a body diagonal. Each trimer is roughly denoted by lines. The 9 subunit protein chains visible in the view are coloured separately with the shading grouped according to which trimer they belong to (blue, green and red-yellow). Nucleotide moieties are shown as sticks along with side chains important in this region. The 13 residues missing from the CPV15 model are shown by dashed lines.

Discussion
Structures for a number of previously uncharacterised types of cypovirus polyhedrins have revealed a strongly conserved structural core. A phylogenetic tree constructed based on the structures (Fig. 6) shows, as expected, that CPV1 and CPV18 are closely related structurally, but otherwise the tree is rather star-like, with CPV17 being perhaps rather an outlier and closest to the markedly different baculovirus polyhedrin. It is conceivable that CPV17 retains some features of an ancestral insect virus polyhedrin. Given the evolutionary explosion of insects some 400 million years ago, the considerable divergence of the polyhedra suggests that much of the substantial evolution of the CPVs took place alongside these events, perhaps explaining why the build-up of variation appears almost saturated, with only two amino acids fully conserved across all nine types. Away from the core of the polyhedrin we identify five regions of structural variability and show that NTPs are used in the assembly of some but not all proteins, so that they impose little constraint on the polyhedrin structure. We also find that the whole base domain is dispensable for polyhedra formation, although the biochemical features are altered dramatically when it is removed. The base domains include the most variable region of the protein and nestle around a point of 23 symmetry. The bases likely attach to the virus to be encapsidated and are therefore either packed together inside the crystal or exposed on the outside. Furthermore, since they do not drive assembly they will tend to remain accessible on the surface of the growing crystal, ready to engage a protein-only, icosahedral-shaped CPV particle. The outer faces of the base domains are positively charged across nearly all the types; at present there is little structural information on the charge status of the virion component that is likely to attach to the polyhedrin. It seems that in the final stage of assembly the base domains pull the lattice into a closer knit form, presumably allowing useful flexibility in the lattice in the vicinity of the viral inclusions (note that the presumed ancestral CPV17 does not fully lock down the lattice, retaining a larger unit cell). The various NTPs seem to act as small molecule 'fillers' in the absence of virus particles, allowing lattice completion and the avoidance of crystal voids whilst permitting appropriate specificity in the virus recognition motifs between CPV types. Overall this domain is likely a late addition to the fold (it is dispensable for protein folding) and probably was a feature acquired to aid viral recognition (it is distal to the point of attack by ions which initiate the dissolution process by modifying a point critical for assembly). There may be potential for exploiting the variation in the base domain for nanotechnology applications.
Fig. 6 caption (fragment): ... were performed using SHP (Stuart et al., 1979) and the phylogenetic tree calculated with PHYLIP (Felsenstein, 1989). The length of the branch to baculovirus polyhedrin (BV) has been reduced to 20% of its actual length. Structures of the proteins are shown coloured from N-terminus to C-terminus, blue to red. NTPs are represented in thickened stick style with a black outline to aid visualization. Cysteines involved in disulphide bridges are shown as black spheres, and the calcium ion in CPV5 as a pink sphere.
Structure-based sequence alignments of the polyhedrins show that whilst approximately 80% of the Cαs are structurally equivalent (pairwise rmsd from 0.2 to 1.5 Å), the corresponding sequence identities are only 12-32% (Fig. S2). The exception is the polyhedrin of CPV18, which has >80% sequence identity with that of CPV1, consistent with their relatively recent divergence (Graham et al., 2007). Although the two molecules are very similar (rmsd of 0.23 Å for 247 structurally equivalent Cαs; Fig. S1), they respond very differently to the alkali buffers, CPV18 being far more sensitive to dissolution. Clearly functional differences have been locked into these molecules, probably arising from the structure around the CTP binding site. In fact the three CPVs (CPV1, 19 and 20) most resistant to alkali buffers all display more extensive interactions between the 3-fold symmetry-related subunits of the trimeric building block, strengthened by either pyrimidine binding (CPV1 and CPV20) or numerous protein-protein hydrogen bonds.
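A structure-based sequence identity is simply the share of structurally equivalent positions that carry the same amino acid in both proteins. The sketch below assumes the equivalences have already been derived from a superposition (for instance by a program such as SHP); the function name and the toy residue pairs are hypothetical.

```python
def structure_based_identity(equivalent_pairs):
    """Fraction of structurally equivalent positions with identical residues.

    equivalent_pairs: list of (residue_in_A, residue_in_B) one-letter codes,
    one tuple per structurally equivalent Cα pair.
    """
    if not equivalent_pairs:
        raise ValueError("no structurally equivalent positions supplied")
    matches = sum(1 for res_a, res_b in equivalent_pairs if res_a == res_b)
    return matches / len(equivalent_pairs)


# Hypothetical toy data: eight equivalent positions, three identical
pairs = list(zip("QGPEYKLM", "QAPDYRIV"))
print(f"identity = {structure_based_identity(pairs):.1%}")
```

Because the denominator counts only structurally equivalent positions, this measure can differ from identities computed on a purely sequence-based alignment.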
Overall polyhedra present a fascinating picture of the effect of 400 million years of evolution on a system where crystal lattice formation is conserved in the face of massive sequence variability.
By spreading the crystal contacts over the majority of the surface of the molecule, lattice formation remains robust in the face of many individual changes, in the same way that the tertiary structure is robust. Indeed, whilst most proteins harbour the greatest fraction of conserved residues in the protein core (e.g. 63% for picornavirus 3C proteases), only a minority (43%) of the conserved residues in CPV polyhedrins are buried in the core.
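The comparison of buried versus exposed conserved residues can be expressed as a simple ratio. In the sketch below, a residue counts as buried when its relative solvent accessibility falls under a cutoff; the 0.2 cutoff, the function name and the toy data are all assumptions for illustration, since the text does not state how burial was defined.

```python
def buried_share_of_conserved(residues, accessibility_cutoff=0.2):
    """Fraction of conserved residues that are buried.

    residues: list of (is_conserved, relative_accessibility) tuples.
    A residue is treated as buried when its relative solvent accessibility
    is below accessibility_cutoff (an assumed, adjustable threshold).
    """
    conserved = [acc for is_conserved, acc in residues if is_conserved]
    if not conserved:
        raise ValueError("no conserved residues supplied")
    buried = [acc for acc in conserved if acc < accessibility_cutoff]
    return len(buried) / len(conserved)


# Hypothetical toy data: four conserved residues, two of them buried
toy_residues = [
    (True, 0.05), (True, 0.60), (True, 0.10),
    (False, 0.02), (True, 0.45), (False, 0.70),
]
print(f"{buried_share_of_conserved(toy_residues):.0%} of conserved residues are buried")
```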

Accession numbers
Coordinates and structure factors have been deposited in the PDB with accession codes:

Appendix A. Supplementary data
Supplementary data associated with this article can be found, in the online version, at YYYY.