Crystal structure of the 3C protease from Southern African Territories type 2 foot-and-mouth disease virus.

The replication of foot-and-mouth disease virus (FMDV) is dependent on the virus-encoded 3C protease (3C(pro)). As in other picornaviruses, 3C(pro) performs most of the proteolytic processing of the polyprotein expressed from the large open reading frame in the RNA genome of the virus. Previous work revealed that the 3C(pro) from serotype A, one of the seven serotypes of FMDV, adopts a trypsin-like fold. On the basis of capsid sequence comparisons, the FMDV serotypes are grouped into two phylogenetic clusters, with O, A, C, and Asia 1 in one, and the three Southern African Territories serotypes (SAT-1, SAT-2 and SAT-3) in another, a grouping pattern that is broadly, but not rigidly, reflected in 3C(pro) amino acid sequences. We report here the cloning, expression and purification of 3C proteases from four SAT serotype viruses (SAT2/GHA/8/91, SAT1/NIG/5/81, SAT1/UGA/1/97, and SAT2/ZIM/7/83) and the crystal structure at 3.2 Å resolution of 3C(pro) from SAT2/GHA/8/91.


INTRODUCTION
Diseases caused by RNA viruses are often difficult to control because of the high mutation rate and the continual emergence of novel genetic and antigenic variants that escape from immune surveillance. The degree to which immunity induced by one virus is effective against another is largely dependent on the antigenic differences between them. Foot-and-mouth disease virus (FMDV) is an example of an antigenically variable pathogen that infects many species of cloven-hoofed animals, such as cattle, sheep, pigs and goats, and remains a potent threat to agricultural livestock (Sutmoller et al., 2003). Although FMD vaccines made from chemically inactivated virus particles are in widespread use, control of the disease remains difficult. This is because the vaccines provide only short-lived protection and the virus occurs as seven clinically indistinguishable serotypes (O, A, C, Asia1 and three Southern African Territories serotypes: SAT1, SAT2 and SAT3), each of which has multiple, constantly evolving sub-types (Knowles & Samuel, 2003). Viruses belonging to the SAT serotypes display appreciably greater genomic and antigenic variation in their capsid proteins compared to serotype A and O viruses (Bastos et al., 2001; Bastos et al., 2003; Maree et al., 2011), possibly due to their long-term maintenance within African buffalo (Syncerus caffer). Constant surveillance of circulating strains is required to ensure that vaccine stocks remain effective.
In common with other members of the picornavirus family, FMDV has a single-stranded, positive-sense RNA genome. Cell entry in infected hosts is followed immediately by translation of a large open reading frame in the viral RNA. This yields a polyprotein precursor of over 2,000 amino acids that is processed into fourteen distinct capsid and non-structural proteins for virus replication. The majority of this processing is done by the virus-encoded 3C protease (3C pro), which cleaves the precursor at ten distinct sites. FMDV 3C pro may also assist infection by proteolysis of host cell proteins and has RNA-binding activity that is important for initiation of replication of the viral RNA (reviewed in Curry et al., 2007b).
Crystallographic analysis of the 3C pro from a type A FMDV (sub-type A10 61) showed that, similar to other picornavirus 3C proteases, it adopts a trypsin-like fold consisting of two β-barrels that pack together to create a centrally-located Cys-His-Asp/Glu catalytic triad in the active site (Allaire et al., 1994; Matthews et al., 1994; Mosimann et al., 1997; Birtley & Curry, 2005; Yin et al., 2005). Subsequent studies on FMDV 3C pro complexed with peptides derived from the viral polyprotein revealed that substrate recognition is achieved by conformational changes primarily involving the movement of a β-ribbon (residues 138-150) that helps to secure the position of cognate peptides in relation to the active site of the protein (Sweeney et al., 2007; Zunszain et al., 2010).
Sequence analysis has shown that while variation within FMDV 3C pro does not rigidly reflect that observed with capsid proteins, the SAT-type 3C proteases generally form a distinct cluster (Van Rensburg et al., 2002). Mapping of the sequence variation between different FMDV serotypes onto the structure of A10 61 3C pro indicated that the peptide-binding face of the protease is completely conserved among the non-SAT serotypes (which are 91-97% conserved in amino-acid sequence), supporting the notion that identification of inhibitors of the protease might aid the development of broad spectrum antiviral drugs (Birtley & Curry, 2005; Curry et al., 2007a). This structure should therefore serve as a useful model for the 3C protease from this group of viruses. However, the same comparison suggested the presence of at least two amino acid differences on the peptide-binding surfaces between A10 61 3C pro and the corresponding 3C sequences from SAT serotype viruses.
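The percentage conservation figures quoted above come from pairwise comparison of aligned sequences. A minimal sketch of such a calculation is given here, assuming the two sequences have already been aligned (gaps written as "-") and that identity is counted only over columns where neither sequence has a gap; other conventions for the denominator exist, and the function name and toy sequences are illustrative rather than taken from this study.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy ten-residue fragments (hypothetical, not real 3C sequences).
    print(percent_identity("ACDEFGHIKL", "ACDEFGHIKV"))
```
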
To provide a more complete picture of the structural variation between FMDV 3C proteases from different serotypes, we set out to determine the crystal structure of 3C pro from at least one SAT serotype virus. We report here the cloning and expression of 3C pro from four distinct SAT1 and SAT2 viruses and the crystal structure of the 3C pro from a SAT2 serotype virus (SAT2/GHA/8/91).

Cloning and mutagenesis
We used the polymerase chain reaction (PCR) to amplify the coding regions for the FMDV 3C proteases of sub-types SAT2/GHA/8/91 (Accession No. AY884136), SAT1/NIG/5/81, SAT1/UGA/1/97 and SAT2/ZIM/7/83. Site-directed mutagenesis was performed with the QuikChange method (Stratagene), using KOD polymerase (Novagen). All DNA constructs were verified by sequencing.
Details of the particular modifications made to expressed proteins are given in the Results and Discussion section.

Protein expression and purification
All SAT-type 3C proteases were expressed in cultures of BL21 (DE3) pLysS E. coli (Invitrogen) grown in lysogeny broth (LB) at 37 °C with shaking at 225 rpm.
Protein expression was induced for 5 h by the addition of 1 mM isopropyl β-D-1-thiogalactopyranoside (IPTG) once the optical density at 600 nm reached 0.8-1.0. Cells were harvested by centrifugation at 4550 g for 15 min at 4 °C and frozen at −80 °C.
The volumes given below are appropriate for processing the pellet from 1 L of bacterial culture. Cell pellets were thawed on ice and re-suspended in 30 mL Buffer A (50 mM HEPES pH 7.1, 400 mM NaCl, 1 mM β-mercaptoethanol) supplemented with 0.1% Triton X-100 and 1 mM phenylmethylsulfonyl fluoride (PMSF) protease inhibitor. Cells were lysed by sonication on ice and lysates clarified by centrifugation at 29,000 g for 20 min at 4 °C. Protamine sulphate (Sigma) was added to 1 mg/mL final concentration to precipitate nucleic acids, and lysates were then centrifuged again at 29,000 g for 20 min. The supernatant was filtered using a 1.2 µm syringe filter and incubated for 90 min at 4 °C with slow rotation in 1 mL bed volume of TALON metal affinity resin (Clontech) pre-equilibrated with Buffer A. This slurry was applied to a gravity-flow column and the TALON beads washed three times with 50 mL of Buffer A supplemented with 0, 5 and 10 mM imidazole respectively. His-tagged 3C proteins were eluted in 20 mL of Buffer A containing 100 mM imidazole, followed by a final wash with 10 mL of Buffer A containing 250 mM imidazole. To remove the His tag the eluted protein was mixed with 100 units of bovine thrombin (Sigma) and dialysed for 16 h at 4 °C in 4 L of Buffer A supplemented with 2 mM CaCl2. Cleaved protein was then re-applied to TALON resin to remove the cleaved His tag and other contaminants. The untagged protease was recovered in the flow-through, concentrated using Vivaspin concentrators (3 kDa MWCO) (Sartorius Stedim Biotech) and further purified by gel filtration using a HiLoad 16/60 Superdex 75 gel filtration column (Amersham Bioscience) in Buffer A supplemented with 1 mM EDTA and 0.01% sodium azide at a flow rate of 0.5 mL/min. Peak fractions were pooled, concentrated and stored at −80 °C. Protein concentrations were determined from absorbance measurements at 280 nm using extinction coefficients calculated with the ProtParam tool (Gasteiger et al., 2005).
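The concentration estimate in the last step follows from the Beer-Lambert law. The sketch below is an illustration only: it assumes the composition-based estimate used by ProtParam (5500 M⁻¹ cm⁻¹ per Trp, 1490 per Tyr and 125 per cystine at 280 nm), a 1 cm path length by default, and a molar mass supplied by the user. The function names and the example numbers are hypothetical and are not values from this study.

```python
def extinction_coefficient_280(sequence: str, n_cystines: int = 0) -> int:
    """Estimate the molar extinction coefficient at 280 nm (M^-1 cm^-1).

    Uses per-residue contributions of 5500 (Trp), 1490 (Tyr) and 125 (cystine).
    With reduced cysteines (e.g. in a buffer containing a reducing agent)
    n_cystines is 0.
    """
    seq = sequence.upper()
    return 5500 * seq.count("W") + 1490 * seq.count("Y") + 125 * n_cystines


def concentration_mg_per_ml(a280: float, epsilon: float, molar_mass_da: float,
                            path_cm: float = 1.0, dilution: float = 1.0) -> float:
    """Convert an A280 reading to mg/mL via the Beer-Lambert law (A = epsilon * c * l)."""
    if epsilon <= 0 or path_cm <= 0:
        raise ValueError("epsilon and path length must be positive")
    molar = a280 * dilution / (epsilon * path_cm)  # mol/L
    return molar * molar_mass_da                   # g/L, numerically equal to mg/mL


if __name__ == "__main__":
    # Hypothetical peptide and molar mass, for illustration only.
    eps = extinction_coefficient_280("MWYYC")
    print(eps, concentration_mg_per_ml(0.848, eps, 23000.0))
```

Proteins lacking Trp give less reliable estimates with this approach, so the result is best treated as approximate.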

Crystallisation and structure determination
Crystallisation trials with purified SAT-type 3C pro were performed at 4 °C and 18 °C using protein concentrations in the range 5-10 mg/mL. Initial screens were done by sitting drop vapour diffusion using a Mosquito crystallisation robot (TTP Labtech). Typically in each drop 100 nL of protein was mixed with 100 nL taken from the 100 µL reservoir solution. Trials were performed with the following commercial screens: Crystal Screen 1 and 2, and PEG/Ion (Hampton Research); MemStart, MemSys, JCSG+, and PACT (Molecular Dimensions); Wizard 1 and 2 (Rigaku Reagents).
Crystals of g3C-SAT2-G(1-208) for data collection were washed in the mother liquor (15% (w/v) PEG-8000, 0.09 M Na-cacodylate pH 7.0, 0.27 M Ca-acetate, 0.01 M Tris pH 8.5, 0.08 M Na-thiocyanate) supplemented with 20% (v/v) glycerol, and immediately frozen in liquid nitrogen in a nylon loop. X-ray diffraction data were processed and scaled with the CCP4 program suite (Collaborative Computational Project, Number 4, 1994), and phased by molecular replacement using the coordinates of type A10 61 FMDV 3C pro (PDB ID 2J92; Sweeney et al., 2007) as a search model in Phaser (McCoy et al., 2007). The search model was edited to truncate side-chains (to the Cβ atom) for all residues that differed from g3C-SAT2-G(1-208) and to remove all the atoms in the β-ribbon (residues 138-150), since these have been observed to vary in structure between different crystal forms (Sweeney et al., 2007). Model building and adjustments were done using Coot (Emsley et al., 2010); crystallographic refinement was performed initially with CNS (Brünger et al., 1998) and completed using Phenix (Adams et al., 2010).
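The search-model editing described above can be expressed as a simple filter over PDB ATOM records. The following is a minimal sketch, not the procedure actually used by the authors (which tool they used is not stated): it assumes standard fixed-column PDB formatting, a single chain, and a caller-supplied set of residue numbers that differ between the two sequences. The function name and its arguments are hypothetical.

```python
# Atoms retained when a side chain is truncated to the C-beta atom.
BACKBONE_PLUS_CB = {"N", "CA", "C", "O", "CB"}


def edit_search_model(pdb_lines, differing_residues, removed_range=(138, 150)):
    """Return PDB lines with a residue range deleted and selected side chains truncated.

    pdb_lines          -- iterable of PDB-format text lines
    differing_residues -- residue numbers whose side chains are cut back to CB
    removed_range      -- inclusive residue range to delete entirely
    """
    first, last = removed_range
    kept = []
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            kept.append(line)  # pass non-ATOM records through unchanged
            continue
        atom_name = line[12:16].strip()  # PDB columns 13-16
        res_num = int(line[22:26])       # PDB columns 23-26
        if first <= res_num <= last:
            continue  # drop every atom of the excluded loop
        if res_num in differing_residues and atom_name not in BACKBONE_PLUS_CB:
            continue  # drop side-chain atoms beyond CB
        kept.append(line)
    return kept
```

Gly residues have no Cβ atom, so for a differing Gly only the backbone atoms remain, which is the intended behaviour.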

Protein expression and crystallisation
We engineered bacterial expression plasmids for FMDV 3C proteases from four SAT sub-types: SAT2/GHA/8/91, SAT1/NIG/5/81, SAT1/UGA/1/97, and SAT2/ZIM/7/83 (see Materials and Methods) which have 80%, 92%, 82% and 85% amino acid sequence identity respectively with the 3C pro from FMDV A10 61 (Fig. 1). In doing so we were guided by the lessons learned from work to express and crystallise subtype A10 61 FMDV 3C pro , which suggested that preserving the N terminus of the protein but truncating the C terminus by up to six residues would be optimal for solubility and crystallisation (Birtley & Curry, 2005). Accordingly, for each SAT sub-type we generated expression constructs that add a thrombin-cleavable His tag to the N terminus of residues 1-208 of the 213 amino acid 3C protease; following thrombin cleavage there is a single additional Gly residue appended to the N terminus of the protease polypeptide. To ensure the solubility of the SAT-type 3C proteins, we introduced to all constructs a C142A substitution to remove a surface-exposed Cys that had been shown previously to be responsible for protein aggregation (Birtley & Curry, 2005;Birtley et al., 2005). (The C95K mutation also introduced to eliminate aggregation of A10 61 FMDV 3C pro (Birtley & Curry, 2005) was not needed here because residue 95 is an Arg in the SAT 3C proteases used in this study). In addition, the active site nucleophile was eliminated from all constructs by incorporation of a C163A substitution to prevent adventitious proteolysis in highly concentrated samples of purified 3C pro . For consistency with our earlier naming scheme these SAT2/GHA/8/91, SAT1/NIG/5/81, SAT1/UGA/1/97, and SAT2/ZIM/7/83 3C constructs will be referred to as SAT2/G-g3C pro (1-208), SAT1/N-g3C pro (1-208), SAT1/U-g3C pro (1-208), and SAT2/Z-g3C pro (1-208) respectively.
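The construct design described above (point substitutions, C-terminal truncation and an extra N-terminal Gly left after thrombin cleavage) can be written down as a small sequence transformation. This is an illustrative sketch only: the function name is hypothetical, residue numbering is assumed to start at 1 on the wild-type protease, and the example uses a toy sequence rather than a real 3C sequence.

```python
def build_construct(wild_type: str, substitutions, last_residue: int,
                    n_terminal_extension: str = "") -> str:
    """Apply substitutions such as 'C142A', truncate after last_residue,
    and prepend any residues left by tag cleavage."""
    residues = list(wild_type.upper())
    for sub in substitutions:
        old, position, new = sub[0], int(sub[1:-1]), sub[-1]
        if not 1 <= position <= len(residues):
            raise ValueError(f"{sub}: position outside the sequence")
        if residues[position - 1] != old:
            raise ValueError(
                f"{sub}: expected {old} at position {position}, "
                f"found {residues[position - 1]}"
            )
        residues[position - 1] = new
    if not 1 <= last_residue <= len(residues):
        raise ValueError("last_residue outside the sequence")
    return n_terminal_extension + "".join(residues[:last_residue])


if __name__ == "__main__":
    # Toy 7-residue sequence (hypothetical). For the constructs in the text the
    # call would be build_construct(seq_213, ["C142A", "C163A"], 208, "G"),
    # where seq_213 is the 213-residue wild-type sequence (not reproduced here).
    print(build_construct("MACDCEF", ["C3A", "C5A"], 6, "G"))
```

Checking the expected wild-type residue before substituting guards against numbering mistakes, for example applying C95K to a sequence that has Arg at position 95.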
The 3C pro proteins from all four SAT sub-types yielded soluble protein that was purified first by metal-affinity chromatography and then, following thrombin cleavage of the N-terminal His tag, on a gel filtration column (see Materials and Methods). Of the four, SAT1/N-g3C pro (1-208) appeared to be the most soluble and could be concentrated to 20 mg/mL (see Table 2). The other three variants exhibited some precipitation during gel filtration, indicated by a void peak containing aggregated 3C pro , which was about one-third of the area of the monomeric peak. They also had lower apparent solubility limits and could be concentrated to ∼6 mg/mL (SAT2/G-g3C pro (1-208)) or ∼11 mg/mL (SAT1/U-g3C pro (1-208), and SAT2/Z-g3C pro (1-208)).
In crystallisation trials we only obtained crystals from the 3C pro of a single sub-type: SAT2/G-g3C pro (1-208). These exhibited a variety of habits but the largest were needle-shaped and were typically 10 µm wide and up to 300 µm long. Initial diffraction tests on beamline ID23-2 at the European Synchrotron Radiation Facility (ESRF) showed that the crystals belonged to a trigonal space group and diffracted to a resolution limit of 2 Å. Unfortunately, for reasons that remain unclear, efforts to reproduce these crystals proved unsuccessful. In subsequent trials diffraction was limited to ∼3 Å.

Figure 2. (A) Section of the 3.2 Å resolution electron density map (blue chicken wire) calculated with phases from the final refined model, which is shown as sticks coloured by atom type: grey, carbon; red, oxygen; blue, nitrogen; yellow, sulphur. (B) Overall structure of SAT2/G-g3C pro (1-208), with secondary structure elements indicated. The N- and C-terminal β-barrels are coloured green and blue, respectively. (C) Superposition of the five molecules of SAT2/G-g3C pro (1-208) in the asymmetric unit of the crystal, shown in ribbon representation. (D) Comparative superposition of SAT2/G-g3C pro (1-208) (teal) with A10 61 3C pro in the absence (purple; PDB 2J92) and presence (orange; PDB 2WV4) of a peptide substrate (shown in stick representation).

We used mutagenesis to engineer modifications to the SAT2/G-g3C pro (1-208) construct in the search for better crystals. We made alterations to trim the C-terminus by one residue (in SAT2/G-g3C pro (1-207)), or to add back a single His residue (in SAT2/G-g3C pro (1-207 h)), strategies that had been useful when working with type A10 61 3C pro (Birtley & Curry, 2005). Both yielded soluble protein (Table 2) and SAT2/G-g3C pro (1-207 h) produced crystals, but there was no improvement in the resolution of the diffraction.
In a further effort to enhance crystal quality, we used the Surface Entropy Reduction prediction server (Goldschmidt et al., 2007) to design additional SAT2/G-g3C pro (1-208) mutants. We made four different mutants, each containing the following pairs of
The crystals belong to space group P3₂ and have a long c-axis (318.5 Å). The diffraction data were phased by molecular replacement using a search model based on the crystal structure of type A10 61 FMDV 3C pro , which is 80% identical in amino-acid sequence to SAT2/G-g3C pro (1-208) (see Materials and Methods). This gave an unambiguous solution with a log likelihood gain of 1495 (McCoy et al., 2007), revealing five molecules in the asymmetric unit. Though of modest resolution, the initial electron density maps (Fig. 2A) were of sufficient quality to guide adjustment of the initial molecular replacement model prior to multiple interleaved rounds of refinement and model building. Because of the limited resolution and non-crystallographic symmetry, refinement was performed using group B-factors and non-crystallographic restraints. Model building was done conservatively: amino acid side-chains were truncated to the Cβ atom in cases where there was no indicative electron density. The final model of SAT2/G-g3C pro (1-208) contains residues 7-207 for all five chains and has an Rfree of 27.2% and good stereochemistry; full data collection and refinement statistics are given in Table 3. As expected, given the high level of amino acid sequence identity with A10 61 3C pro (80%), FMDV SAT2/G-g3C pro (1-208) adopts the same trypsin-like fold (Fig. 2B), which has been described in detail elsewhere (Birtley & Curry, 2005; Sweeney et al., 2007). Superposition of the five molecules in the asymmetric unit shows that they are highly similar to one another (Figs. 1 and 2C); the pair-wise root mean square deviation in Cα positions between chains is 0.2-0.3 Å. The largest differences are observed in the longest surface-exposed loops, the E1-F1 loop in the N-terminal β-barrel and the B2-C2 loop known as the β-ribbon in the C-terminal β-barrel (Fig. 2C).
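The pair-wise comparison of chains quoted above rests on the root mean square deviation between matched Cα positions. A minimal sketch of that calculation follows, assuming the chains have already been superposed and that each chain is given as a list of (x, y, z) coordinates in the same residue order; the superposition step itself is not shown, and the function names are illustrative.

```python
import math
from itertools import combinations


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two matched lists of (x, y, z) points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


def pairwise_rmsd(chains):
    """Map each unordered pair of chain IDs to the RMSD between those chains.

    chains -- dict of chain ID -> list of superposed C-alpha coordinates
    """
    return {
        (a, b): rmsd(chains[a], chains[b])
        for a, b in combinations(sorted(chains), 2)
    }
```

For five molecules in the asymmetric unit this yields ten pair-wise values, from which a range can be reported.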
The E1-F1 loop and the β-ribbon are also the regions of greatest difference between SAT2/G-g3C pro (1-208) and A10 61 3C pro (overlay of the two structures yields an overall rms deviation in Cα positions of ∼0.6 Å) (Fig. 2D). The flexibility of the β-ribbon, which shifts in position to aid peptide binding, has been noted before (Zunszain et al., 2010) and clearly it plays a similar role in SAT-type 3C proteases.

Notes to Table 3.
a Values for highest resolution shell given in parentheses.
b $R_{\mathrm{merge}} = 100 \times \sum_{hkl} |I_j(hkl) - \langle I_j(hkl) \rangle| / \sum_{hkl} \sum_j I(hkl)$, where $I_j(hkl)$ and $\langle I_j(hkl) \rangle$ are the intensity of measurement j and the mean intensity for the reflection with indices hkl, respectively.
c $R_{\mathrm{work}} = 100 \times \sum_{hkl} ||F_{\mathrm{obs}}| - |F_{\mathrm{calc}}|| / \sum_{hkl} |F_{\mathrm{obs}}|$.
d $R_{\mathrm{free}}$ is the $R_{\mathrm{model}}$ calculated using a randomly selected 5% sample of reflection data that were omitted from the refinement.
e RMS, root-mean-square; deviations are from the ideal geometry defined by the Engh and Huber parameters (Engh & Huber, 1991).
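The R work definition in the notes to Table 3, together with the 5% test set used for R free, can be illustrated with a short sketch. It assumes structure-factor amplitudes are available as plain lists of numbers in matching order; the function names, the fixed random seed and the example values are hypothetical, and real refinement programs handle scaling and test-set selection in more elaborate ways.

```python
import random


def r_factor(f_obs, f_calc):
    """100 * sum(| |F_obs| - |F_calc| |) / sum(|F_obs|) over the given reflections."""
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("amplitude lists must be non-empty and of equal length")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    if denominator == 0:
        raise ValueError("sum of observed amplitudes is zero")
    return 100.0 * numerator / denominator


def split_free_set(n_reflections, free_fraction=0.05, seed=0):
    """Randomly choose reflection indices to omit from refinement (the free set)."""
    rng = random.Random(seed)
    n_free = round(n_reflections * free_fraction)
    return set(rng.sample(range(n_reflections), n_free))


if __name__ == "__main__":
    # Two made-up reflections, for illustration only.
    print(r_factor([100.0, 200.0], [90.0, 220.0]))
```

Applying `r_factor` to the reflections in the free set gives the analogue of R free, and applying it to the remainder gives the analogue of R work.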

CONCLUDING REMARKS
The results reported here provide a template structure of a SAT-type FMDV 3C protease that should be of value in directing molecular investigations of this group of proteases. Although it is frustrating that higher-resolution diffraction data were not obtained, given that initial crystals of SAT2/G-g3C pro (1-208) diffracted to 2 Å, this should be possible with further optimisation. Likewise, since soluble 3C pro was purified from three other SAT-type viruses, notably SAT1/NIG/5/81, crystal structures for these proteases may well also be achievable.