Structure and Stability of FlgD from the Pathogenic 26695 Strain of Helicobacter pylori

Flagellin component D (FlgD) is a hook-capping protein involved in the proper structural assembly of bacterial flagella, but it is not present in the mature flagellum. We report here the crystal structure of a truncated form of FlgD from strain 26695 of Helicobacter pylori (HpFlgD_26695) in the P2 space group at 2.8 Å resolution, along with data on protein stability. The structure includes only the central globular part of the protein, composed of the conserved Fn-III and tudor domains, while the N- and C-terminal regions are absent. This structure, together with the tetragonal crystal structure of HpFlgD_26695 and the monoclinic crystal structure of HpFlgD_G27 reported previously, reveals a different mutual orientation of the two domains in the HpFlgD structures with respect to FlgD from Xanthomonas campestris and Pseudomonas aeruginosa. In addition, investigation of protein stability in solution showed higher stability than in the other bacterial orthologues reported so far.


INTRODUCTION
Helicobacter pylori (H. pylori) is a small (3.5 μm × 0.5 μm), slow-growing, microaerophilic, non-spore-forming, spiral-shaped Gram-negative rod bacterium. The bacterium became an important target of research once it was established that it can colonize human gastric cells and lead to the development of severe gastrointestinal diseases such as chronic gastritis, peptic ulcer disease, gastric mucosa-associated lymphoid tissue (MALT) lymphoma, and gastric cancer. [1,2] The H. pylori flagellum is a complex structure composed of about 50 different proteins needed for its proper regulation and assembly. It gives the bacterial cell the chance to move to a more favorable environment and to avoid stressful conditions. [3] The major sections that define the flagellum are the filament, the hook and the basal body. [4] FlgD plays an important role in H. pylori virulence, owing to its role in forming the hook cap. [5] Moreover, FlgD interacts with FliK and FlhB and regulates the appropriate assembly of FlgE monomers in the hook structure. [6] Until now, crystal structures of two other FlgD homologues, from Xanthomonas campestris (XcFlgD, PDB code: 3C12) [5] and Pseudomonas aeruginosa (PaFlgD, PDB code: 3OSV), [7] have been determined, as well as two crystal structures of H. pylori FlgD from two different strains and in two different space groups (HpFlgD_26695_t, tetragonal space group I422, and HpFlgD_G27_m, monoclinic space group P2, with PDB codes 4ZZF and 4ZZK, respectively). [8] In this paper we describe the crystal structure of truncated FlgD from the H. pylori strain 26695 in a new space group, and discuss the stability of the full-length protein.

Cloning, Expression and Purification of HpFlgD_26695
Cloning of the HP0907 gene from Helicobacter pylori (strain 26695, Figure 1) was performed as described in Pulić et al. (2016). [8] Finally, elution of the protein was performed in buffer A with a stepwise gradient of imidazole from 20 to 500 mmol L⁻¹. Size-exclusion chromatography was conducted on an ÄKTA Purifier (GE Healthcare) using a Superdex 200 10/300 column (GE Healthcare). The purity of the obtained fractions was monitored by electrophoresis under denaturing conditions (SDS-PAGE), and the fractions with the highest protein concentration were further concentrated by ultrafiltration in the Centricon system.

Protein Crystallization and X-ray Data Collection
Purified HpFlgD_26695 at a concentration of 17.5 mg mL⁻¹ was used in the crystallization experiment. Crystallization was performed by the sitting-drop vapor diffusion method with the Oryx 8 crystallization robot (Douglas Instruments), using 1 μL of protein mixed with 1 μL of precipitant solution and equilibrated against 75 μL of the mother liquor in the reservoir. Monoclinic crystals of HpFlgD_26695 were grown from screening solution 6 of the PACT suite (Qiagen) (0.1 mol L⁻¹ SPG pH 9.0, 25 % PEG 1500) at 293 K.
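As a rough illustration of the drop set-up described above, the following sketch computes the starting concentrations in the drop immediately after equal volumes of protein and precipitant solution are mixed. It assumes ideal mixing and that the protein stock contains no PEG 1500 or SPG; the function name is introduced here for illustration only and is not part of any crystallization software.

```python
def initial_drop_concentration(stock, v_carrier, v_other):
    """Concentration of a component right after two solutions are mixed.

    stock     -- concentration in the solution that carries the component
    v_carrier -- volume of that solution added to the drop
    v_other   -- volume of the other solution (assumed free of the component)
    """
    return stock * v_carrier / (v_carrier + v_other)


# Equal volumes of protein and precipitant solution, as in the text.
# Only the volume ratio matters for the result.
v_protein = 1.0
v_precipitant = 1.0

protein = initial_drop_concentration(17.5, v_protein, v_precipitant)  # mg/mL
peg = initial_drop_concentration(25.0, v_precipitant, v_protein)      # % PEG 1500
spg = initial_drop_concentration(0.1, v_precipitant, v_protein)       # mol/L SPG

print(f"protein: {protein:.2f} mg/mL, PEG 1500: {peg:.1f} %, SPG: {spg:.2f} mol/L")
```

These are starting values only: during vapor diffusion the drop loses water to the reservoir, so both protein and precipitant concentrations rise over time.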
A crystal of approximate size 0.15 × 0.05 × 0.02 mm³ was frozen directly in liquid nitrogen with no cryoprotectant. Diffraction data were collected at 100 K at the Elettra synchrotron, beamline XRD1, on a Pilatus 2M detector. In total, 360 frames at a rotation range of 0.5° were collected. Diffraction data were indexed and integrated with iMOSFLM [9] and scaled with Scala, [10] contained in the CCP4i crystallographic package. [11]

Structure Solution and Refinement
The structure of HpFlgD_26695 was solved by molecular replacement using AMoRe, [12] implemented in the CCP4i suite, with the structure of the tetragonal crystal form of HpFlgD_26695 (PDB entry 4ZZF) as the search model. Iterative model building (using Coot) [13] and refinement cycles (Refmac5, [14] implemented in CCP4i) were used to complete the structure. MolProbity [15] was used to evaluate the refinement quality. Figures of the protein structure were prepared with ccp4mg. [16]

Figure 1. Sequence alignment of HpFlgD_26695 (HP, O25565_HELPY) and HpFlgD_G27 (HP, B5Z7R3_HELPG) using CLUSTAL OMEGA. [20] The alignment figure was generated using ESPript 3.0. [21] The numbering system is based on the HpFlgD_26695 sequence. Residues visible in the monoclinic crystal structures of HpFlgD_26695 and HpFlgD_G27 are labelled in pink, while the remaining residues are labelled in orange. Similar residues are highlighted in blue, while residues of low or no similarity are labelled in black. Labels of the secondary structure elements are shown as well. Possible cleavage sites accessible to trypsin in both sequences are marked with black triangles.

Stability Studies
To investigate the stability of HpFlgD_26695, a limited proteolysis experiment was carried out. Each of the four samples of HpFlgD_26695 (3 mg mL⁻¹) in buffer A was digested with a selected protease (trypsin, chymotrypsin, thermolysin or thrombin) at ratios of 1 : 1000 and 1 : 500 at 298 K, and incubated for one hour. The extent of digestion was monitored by SDS-PAGE. The product obtained by trypsin digestion was further purified by size-exclusion chromatography (SEC) using the Superdex 200 10/300 column (GE Healthcare).
The influence of the components of the crystallization solution on protein stability was also examined. HpFlgD_26695 was mixed with PEG 1500 and SPG buffer to reach final values of 25 % (w/v) and 0.1 mol L⁻¹, respectively, at different pH values (pH 4 and pH 6-9), and left at room temperature (298 K). In addition, a time-course stability experiment with the protein in buffer A without any additives was performed to monitor the cleavage kinetics. The stability of the protein over time was followed by SDS-PAGE.
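To make the digestion ratios above concrete, the sketch below estimates how much protease each ratio corresponds to. Two assumptions are not stated in the text and are made here only for illustration: the ratios are taken to be mass ratios of protease to protein, and the sample volume of 100 μL is hypothetical.

```python
def protease_mass_ug(protein_conc_mg_ml, sample_volume_ul, ratio_denominator):
    """Protease mass (in micrograms) for a 1 : ratio_denominator digest.

    Assumes the ratio is protease : protein by mass.
    """
    # mg/mL multiplied by microlitres gives micrograms of protein.
    protein_ug = protein_conc_mg_ml * sample_volume_ul
    return protein_ug / ratio_denominator


PROTEIN_CONC = 3.0      # mg/mL, from the text
SAMPLE_VOLUME = 100.0   # microlitres, hypothetical

for denominator in (1000, 500):
    mass = protease_mass_ug(PROTEIN_CONC, SAMPLE_VOLUME, denominator)
    print(f"1 : {denominator} -> {mass:.2f} ug protease")
```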

Crystal Structure of the Monoclinic Form of HpFlgD_26695 and Its Comparison with Other FlgD Structures
Data collection and processing statistics for the monoclinic form of the HpFlgD_26695 protein (HpFlgD_26695_m) are given in Table 1. HpFlgD_26695_m crystallizes in the monoclinic space group P2, a new crystal form in addition to the previously reported tetragonal form of HpFlgD_26695 (HpFlgD_26695_t). [8] As in the previously described structures (PDB codes: 4ZZK, 4ZZF), [8] the structure is truncated, with the C- and N-termini missing (Figure 1). From the electron density map, 146 amino acids (Asn127-Lys272) could be built. Refinement statistics are given in Table 1. The overall B-factor of the monoclinic form of HpFlgD_26695 (69.2 Å²) is slightly higher than that of the previously solved monoclinic structure of HpFlgD_G27 (62.1 Å²). Such high B-factor values are a consequence of the low-resolution data.
The asymmetric unit contains 4 protein molecules and 17 water molecules. HpFlgD_26695_m has a Matthews coefficient (Vm) of 2.64 Å³ Da⁻¹ and a solvent content of around 53 %. As in the previously reported HpFlgD structures, [8] the monomer consists of two domains, the fibronectin type III (Fn-III) domain and the tudor domain (Figure 2). Superposition of the monomers of the two published structures, the monoclinic (HpFlgD_G27_m) and tetragonal (HpFlgD_26695_t) forms, on the present monoclinic form (HpFlgD_26695_m) gives r.m.s.d. values of 0.27 Å and 0.44 Å, respectively, for 146 equivalent Cα atoms (Figure 3). As shown in Figure 3, the largest structural difference among the three superposed HpFlgD structures lies in the loop region of the tudor domain (Asp245 to Lys248). As in the previously reported crystal structures of HpFlgD_26695_t and HpFlgD_G27_m, the quaternary structure of HpFlgD_26695_m is a tetramer. [8] Figure 2 shows the tetramer assembly in the structure of HpFlgD_26695_m, formed, as in the structure of HpFlgD_G27_m, by monomers A and B together with their pair related by a two-fold symmetry axis. Monomers C and D form another tetramer in the same way. The two tudor domains of neighbouring monomers within the tetramer are connected through a network of hydrogen bonds and salt bridges (Table 2). The PISA program [17] was used to validate the biological assembly from the crystal packing, and the analysis suggests that the tetramer observed in the crystal corresponds to the molecule in solution, as described in Pulić et al. (2016). [8]
The largest differences in hydrogen bond length between the monoclinic structures of HpFlgD_26695_m and HpFlgD_G27_m are found for the hydrogen bonds between the [Oε2] atom of Glu265 and the [O] atom of Ser241, connecting monomers C[-x, y, -z] and D, B and A, and D[x-1, y, z-1] and C, with values of 0.18 Å, 0.20 Å and 0.35 Å, respectively; however, these differences are not significant. Moreover, when the tetragonal and monoclinic crystal structures of HpFlgD_26695 are compared, the differences in hydrogen bond and salt bridge lengths are even greater, particularly for the Glu265 [Oε2]•••Arg252 [NH2] and Glu265 [Oε2]•••Ser241 [O] contacts, where the differences range over 0.69-0.88 Å and 0.40-0.59 Å, respectively. Dimerization of monomers D and C[-x, y, -z] through the β12 and β9 strands, respectively, creates the largest interface area, 517 Å², and involves 10 hydrogen bonds and 1 salt bridge (Table 3). In the case of HpFlgD_G27_m, the symmetrically equivalent monomers form a smaller interface area of 492.5 Å², which involves 6 hydrogen bonds and 2 salt bridges. [18] The tetramerization interface in the tetragonal structure of HpFlgD_26695 has the smallest area, 478.8 Å², but involves the highest number of hydrogen bonds and salt bridges (12 H-bonds and 2 salt bridges). [18] In addition to the main interface (labelled t), which is involved in tetramerization, other types of interfaces (labelled a-c) are also present in the structure of HpFlgD_26695_m, but they form smaller interface areas (Table 3).
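The solvent content quoted above for Vm = 2.64 Å³ Da⁻¹ can be checked with the standard Matthews approximation, in which the protein fraction of the crystal is about 1.23 / Vm. This is a minimal sketch: the constant 1.23 assumes a typical protein partial specific volume of roughly 0.74 cm³ g⁻¹, and the function name is chosen here for illustration.

```python
def solvent_fraction(vm):
    """Approximate crystal solvent fraction from the Matthews coefficient.

    vm is given in cubic angstroms per dalton. The factor 1.23 assumes a
    protein partial specific volume of about 0.74 cm^3/g.
    """
    return 1.0 - 1.23 / vm


vm = 2.64  # value reported for HpFlgD_26695_m
print(f"solvent content: {100 * solvent_fraction(vm):.0f} %")
```

The result agrees with the value of around 53 % given in the text.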
Other bacterial orthologues of the FlgD protein whose crystal structures have been solved (XcFlgD, PaFlgD) [5,7] show the same tendency to lose the N-terminal part during crystallization. The HpFlgD sequence contains an extra 44 and 29 amino acids at the C-terminus in the G27 and 26695 strains, respectively, in comparison with XcFlgD and PaFlgD. These amino acids are not present in the crystal structures of HpFlgD (Figure 1).
In all FlgD structures, the monomer is composed of the conserved Fn-III and tudor domains. The most important difference is the mutual orientation of the Fn-III and tudor domains in HpFlgD with respect to the same domains in XcFlgD and PaFlgD. When the Fn-III domain of PaFlgD is superposed on the same domain in HpFlgD_26695_m, the tudor domains do not superpose but are rotated with respect to each other by approx. 115° (Figure 4a). The tudor domains themselves superimpose well in these two structures (Figure 4b). For a more detailed description of the differences between HpFlgD and its orthologues, see Pulić et al. (2016) [8] and Ref. [18].
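The caption of Figure 4 refers to a torsion angle calculated from selected residues, but the exact procedure is not given in the text. The following is therefore only a generic sketch of how a torsion (dihedral) angle can be computed from four points, for example four Cα positions. The coordinates are made up for illustration and do not come from any FlgD structure.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def torsion_angle_deg(p0, p1, p2, p3):
    """Signed torsion angle (degrees) about the p1-p2 axis."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)

    # Unit vector along the central axis.
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(c / norm for c in b1)

    # Components of the outer vectors perpendicular to the axis.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))

    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Hypothetical coordinates (in angstroms) forming a right-angle torsion.
points = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
angle = torsion_angle_deg(*points)
print(f"torsion angle: {angle:.1f} degrees")
```

Using atan2 rather than acos keeps the sign of the angle and avoids numerical problems near 0° and 180°.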

Limited Proteolysis Studies
Since the previously reported crystal structures of HpFlgD were truncated and crystal growth took 2 months, [8] we wanted to test whether limited proteolysis could eliminate the flexible regions present at the surface of the protein and thus speed up the crystallization process. As described above, HpFlgD_26695 was cleaved with four proteases. Three of the four enzymes gave similar results, enriching the pool of species at about 14 kDa, while thrombin did not cut the protein (Figure 5a). The most homogeneous and discrete digestion pattern was obtained with trypsin, and this sample was used for further experiments. The protein digested with trypsin eluted from the SEC column at a volume of 17.11 mL (Figure 6), showing a reduction of molecular weight in comparison with the full-length HpFlgD_26695. [8] A solution containing a monomer of about 15 kDa was obtained. The size of the protein estimated by MALS measurement was in agreement with that estimated by size-exclusion calibration with protein standards (MW of digested HpFlgD_26695 = 15 kDa).

Table 2. Hydrogen bonds (Å) between monomers that are relevant for their assembly into a tetramer in the monoclinic crystal structure of HpFlgD_26695. * Denotes the salt bridge.

Table 3. Different types of interfaces between the monomers in the monoclinic crystal structure of HpFlgD_26695. NHB and NSB correspond to the number of hydrogen bonds and salt bridges, respectively. The interface type marked t is involved in tetramerization. The interface areas were calculated using the PISA software. [17]

Analysis of the HpFlgD sequences with PeptideCutter (http://web.expasy.org/peptide_cutter/) showed 2 potential cleavage sites for trypsin, at sequence positions 121 (LR|EV) and 246 (DK|GK) (Figure 1). A 3D model of the full-length HpFlgD_26695 was built with the I-TASSER software. [19] Inspection of the modelled and truncated structures of HpFlgD_26695 implies that the potential cleavage sites (Arg121, Lys246) are positioned in loop regions at the surface of the protein and could be accessible for trypsin cleavage (Figure 7). In that case, cleavage would generate a truncated version of the protein whose size is compatible with the crystallized fragment. Such in vitro conditions reproduce a result similar to what occurs under the crystallization conditions, but they give rise to a protein fragment 2.4 kDa shorter than the one visible in the previously described HpFlgD crystal structures (16.4 kDa, PDB codes: 4ZZK, 4ZZF). [8] The distinct processing levels encountered under the different experimental conditions most likely produced the multiple oligomerization states observed in our study. Indeed, unlike the full-length protein (which forms dimers) and the crystallized species (which associate as tetramers), [8] the 14 kDa proteolysis product behaves as a monomer in solution. These results can be explained by looking at the tetramers observed in the crystal structures: a 14 kDa product can easily be obtained by further degradation of the crystallized fragment. Indeed, a loss of 26 amino acids at the C-terminus implies the removal of the last three β-strands (β10-β12) of the crystallized species. This includes the loss of β-strand 12, which is directly involved in the main interaction (between the β12 and β9 strands) that enables protein tetramerization (Figure 2).
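The fragment sizes discussed above can be checked with simple residue counting. The sketch below assumes that trypsin cuts on the C-terminal side of Arg121 and Lys246, and it uses an average residue mass of about 110 Da, which is a common rule of thumb and not a value from this study; the masses are therefore only rough estimates.

```python
AVERAGE_RESIDUE_MASS_DA = 110.0  # assumption: rule-of-thumb average, not from the paper


def fragment(first_residue, last_residue):
    """Return (number of residues, approximate mass in kDa) for a fragment."""
    n_residues = last_residue - first_residue + 1
    return n_residues, n_residues * AVERAGE_RESIDUE_MASS_DA / 1000.0


# Fragment visible in the crystal structure (Asn127-Lys272).
crystal = fragment(127, 272)

# Fragment expected if trypsin cleaves after Arg121 and after Lys246.
tryptic = fragment(122, 246)

# Residues of the crystallized fragment lost at the C-terminus.
lost_c_terminal = 272 - 246

print(f"crystallized fragment: {crystal[0]} residues, ~{crystal[1]:.1f} kDa")
print(f"tryptic fragment:      {tryptic[0]} residues, ~{tryptic[1]:.1f} kDa")
print(f"residues lost at the C-terminus: {lost_c_terminal}")
```

Under these assumptions the tryptic fragment comes out close to the 14 kDa species seen on SDS-PAGE, and the 26 residues removed after Lys246 match the C-terminal loss described in the text.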

Stability of HpFlgD_26695 in Solution
All flagellin components D whose crystal structures are known (PDB codes: 3C12, 3OSV, 4ZZK, 4ZZF) were isolated as full-length proteins; however, in the crystal structures they lack the C- and N-termini. [5,7,8] H. pylori survives in the stomach environment at pH 4-6.5. The FlgD protein is a hook-capping protein and, similarly to other H. pylori flagellar proteins, is predicted to have a pI value of around 5. SDS-PAGE confirmed that the full-length protein left in buffer A for 3 weeks at room temperature was intact, and no lower molecular weight degradation products were obtained (Figure 5b). Addition of PEG 1500 (8.5 % and 25 % w/v) to the solution of the full-length protein in buffer A or in the SPG buffer at pH values from 6 to 9 did not provoke any protein degradation after 3 weeks either. A lower molecular weight product at around 14 kDa was observed only in the sample left in the SPG buffer at pH 4 (Figure 5b). For XcFlgD, a time-course auto-cleavage experiment was performed at 298 K. [5] XcFlgD showed a tendency to degrade to 20 kDa and 15 kDa products after 5 days, while after 10 days the 20 kDa band disappeared and only a band at approximately 15 kDa remained. All these findings suggest that the components of the mother liquor do not affect protein stability in solution and that HpFlgD_26695 shows greater stability over the 3-week period than its ortholog from X. campestris. The cleavage events that occurred during crystallization were hard to mimic in solution without any protease treatment.

CONCLUSIONS
HpFlgD_26695 was crystallized in a new crystal form in the monoclinic space group P2. As in the previously reported crystal structures of truncated HpFlgD, the protein is missing the N- and C-terminal portions and is composed of a globular region divided into the two conserved domains, Fn-III and tudor. The main structural difference among the HpFlgD structures occurs in the loop region (Asp245 to Lys248) of the tudor domain. In comparison with other flagellin D orthologues, the most significant discrepancy is found in the mutual orientation of the two domains. Limited proteolysis experiments with trypsin showed that a protein fragment of a size similar to that in the crystal could be obtained, but it probably lacks the last three β-strands of the tudor domain and hence is unable to form the tetramer. Time-course stability experiments produced a lower molecular weight (15 kDa) product only when the SPG buffer at pH 4 was used, suggesting higher protein stability in solution than in other flagellin D orthologues.

Figure 2. HpFlgD_26695_m tetramer with monomer A shown in purple and monomer B in pink. The β-strands (β9 and β12) of the two tudor domains from neighbouring monomers responsible for tetramerization are labelled in white (in monomer A) and black (in monomer B).

Figure 3. Superposition of the monomers from the crystal structures of HpFlgD_26695_m_A (purple), HpFlgD_26695_t (gold, PDB code: 4ZZF) and HpFlgD_G27_m_A (light blue, PDB code: 4ZZK). [8] The r.m.s.d. values for the superposition of 146 aligned Cα atoms of HpFlgD_26695_t on HpFlgD_26695_m_A and of 146 aligned Cα atoms of HpFlgD_G27_m_A on HpFlgD_26695_m_A are 0.44 Å and 0.27 Å, respectively. A black arrow shows the region with the major structural differences among the three HpFlgD structures.

Figure 4. (a) Different spatial orientation of the two domains in HpFlgD_26695_m (gold) and PaFlgD (light blue, PDB code: 3OSV_A). [7] Amino acid residues used for the torsion angle calculation are highlighted in dark purple and dark blue in HpFlgD_26695_m and PaFlgD, respectively. (b) Overlaid tudor domains of PaFlgD and HpFlgD_26695_m. The r.m.s.d. for the superposition of 43 aligned Cα atoms of PaFlgD on HpFlgD_26695_m is 2.06 Å.

Figure 6. Analytical gel filtration chromatogram of the trypsin-digested HpFlgD_26695 (green curve; MW of the monomer ~14 kDa), which elutes as a single peak at a volume of 17.11 mL, corresponding to a molecular weight of ~15 kDa.

Figure 7. Overlaid structures of the monomer from the crystal structure of HpFlgD_26695 (purple) and the modelled full-length HpFlgD_26695 (orange). Black arrows indicate the possible positions for trypsin cleavage (Arg121, Lys246) in both structures and the first and last residues visible in the crystal structure of the monoclinic form of HpFlgD_26695 (Asn127, Lys272). The r.m.s.d. for the superposition of 112 Cα atoms of the modelled full-length HpFlgD_26695 on HpFlgD_26695_m_A is 2.13 Å.

Table 1. Statistics of data collection, processing and refinement. Values for the outer shell are given in parentheses.