Structural and binding characterization of the LacdiNAc-specific adhesin (LabA; HopD) exodomain from Helicobacter pylori

Helicobacter pylori (H. pylori) uses several outer membrane proteins for adhering to its host's gastric mucosa, an important step in establishing and preserving colonization. Several adhesins (SabA, BabA, HopQ) have been characterized in terms of their three-dimensional structure. A recent addition to the growing list of outer membrane porins is LabA (LacdiNAc-binding adhesin), which is thought to bind specifically to GalNAcβ1-4GlcNAc, occurring in the gastric mucosa. LabA47-496, expressed as a His-tagged protein in the periplasm of E. coli and purified by subtractive IMAC after TEV cleavage followed by size exclusion chromatography, yielded bipyramidal crystals with good diffraction properties. Here, we describe the 2.06 Å resolution structure of the exodomain of LabA from H. pylori strain J99 (PDB ID: 6GMM). Strikingly, despite the relatively low levels of sequence identity with the other three structurally characterized adhesins (20–49%), LabA shares an L-shaped fold with SabA and BabA. The ‘head’ region contains a 4 + 3 α-helix bundle, with a small insertion domain consisting of a short antiparallel β-sheet and an unstructured region that is not resolved in the crystal structure. Sequence alignment of LabA from different strains shows a high level of conservation in the N- and C-termini, and identifies two main types based on the length of the insertion domain (‘crown’ region): the ‘J99-type’ (insertion ~31 amino acids) and the H. pylori ‘26695 type’ (insertion ~46 amino acids). Analysis of ligand binding using native electrospray ionization mass spectrometry (ESI-MS) together with solid phase-bound, ELISA-type assays could not confirm the originally described binding of GalNAcβ1-4GlcNAc-containing oligosaccharides, in line with other recent reports, which also failed to confirm LacdiNAc binding.


Introduction
Helicobacter pylori (H. pylori) is a Gram-negative microaerophilic bacterium with a strict tropism for the human gastric mucosa. H. pylori establishes chronic infection in the stomach despite the hostile environment with its acidic conditions and the abundance of proteolytic enzymes. As a result of this adaptation, approximately half the world's population is infected with the bacterium (Brown, 2000;Odenbreit, 2005). Among other complex adaptive mechanisms, H. pylori, using its helical shape and flagella, penetrates the mucus layer and adheres to the underlying gastric epithelium. The adherence is mediated by highly specific interactions between outer membrane proteins of the bacterium, called adhesins, and carbohydrate or other structures present on the surface of the epithelial cells (Salama et al., 2013).
It is estimated that H. pylori expresses approximately 60 outer membrane porins, of which 21 share extended sequence similarity in their N- and C-termini (Odenbreit, 2005) and at least seven are thought to mediate the attachment of the bacterium to the gastric surface epithelium (Ilver et al., 1998; Odenbreit et al., 1999; Peck et al., 1999; Mahdavi et al., 2002; Rossez et al., 2014; Javaheri et al., 2016). The contribution of some of these proteins to bacterial adhesion and to the establishment of infection and chronic colonization is better understood than that of others.
The best characterized H. pylori adhesin is the blood group antigen-binding adhesin, or BabA (Boren et al., 1993; Ilver et al., 1998). However, not all H. pylori strains express BabA, and not all strains expressing BabA demonstrate affinity for Lewis b (Leᵇ) (Hennig et al., 2004). This has been attributed to the high sequence variability observed in the extracellular domain of the protein (Nell et al., 2014). Two different groups have described the crystal structure of BabA from different H. pylori strains (Naim Hage et al., 2015a; Moonens et al., 2016). BabA from strain J99 is known to have affinity for Leᵇ. Recently, indications of additional, previously unknown binding sites on the protein have been reported (El-Hawiet et al., 2018).
Although BabA is thought to be responsible for the initial infection by H. pylori due to its affinity for Leᵇ receptors found on healthy stomach epithelium, persistent colonization of the stomach by the bacterium induces chronic inflammation, leading to the expression of sialylated glycan receptors, such as sialyl-Leˣ and sialyl-Leᵃ. The loss of Leᵇ receptors is accompanied by a loss in babA expression and induction of the expression of a different adhesin, the sialic acid-binding adhesin or SabA, which was identified to bind the sialyl-Leˣ receptor. This strategy allows H. pylori to maintain chronic infection (Magalhães and Reis, 2010; Mahdavi et al., 2002). SabA from H. pylori strain 26695 was the first H. pylori adhesin to be crystallized and structurally described (Pang et al., 2014), and its structure was later refined by Coppens and co-authors (Coppens et al., 2018).
The third structurally characterized H. pylori adhesin was HopQ from the strain G27 (Javaheri et al., 2016). In contrast to BabA and SabA, this adhesin does not have affinity for glycan receptors, but instead interacts with certain carcinoembryonic antigen-related cell adhesion molecules (CEACAMs). As in the previous examples, only the extracellular domain of the protein was expressed and crystallized. The crystal structure revealed similarity of the 'head' region among the three proteins, however HopQ lacked the prominent angle between the 'head' and 'handle' region seen in the other two, giving the exodomains of the latter two an L-shaped appearance. The 'crown' region, which was absent in SabA, was present in HopQ, but contained only two β-sheets instead of four as in the case of BabA. It was found that although the HopQ 'crown' region of the protein played a role in the binding, it did not constitute the full binding site (Javaheri et al., 2016). While SabA, BabA and HopQ are increasingly well understood, this is not the case for other H. pylori adhesins. For example, we lack a complete understanding of the contribution of the paralog proteins BabB (Matteo et al., 2011) and SabB (Talarico et al., 2012) to bacterial adhesion. For some other adhesins, including LabA (Rossez et al., 2014), AlpA, AlpB (Odenbreit et al., 1999) and HopZ (Peck et al., 1999), there are no crystal structures, and in the case of AlpA, AlpB and HopZ, the receptors have not yet been identified. Generation of protein crystals and analysis of their three-dimensional conformation could prove particularly useful in the investigation of their function and the identification of ligands.
Here, we describe the crystal structure of the extracellular domain of LacdiNAc-binding adhesin, or LabA, from H. pylori strain J99. LacdiNAc (GalNAcβ1-4GlcNAc) is a carbohydrate structure presented on the gastric mucous cells, carried by the mucin MUC5AC, which has been suggested as a receptor for H. pylori adherence (Rossez et al., 2014). We also present ligand binding studies which suggest that LacdiNAc may not be the physiological ligand for this adhesin.

Constructs and periplasmic protein expression
The pOPE101 plasmid (PROGEN Biotechnik GmbH, Germany) construct and the protocol for periplasmic expression in E. coli XL10-Gold, which had previously been developed and optimized for BabA J99 (Naim Hage et al., 2015b), were also used for the expression of different versions of LabA J99 (GenBank Acc. No. AAD05605.1), as described in (Paraskevopoulou et al., 2019), and of LabA 26695 (GenBank Acc. No. AE000511.1) (for constructs, see Fig. 1A and B). The following primers were used for cloning LabA 21-517K 26695 into pOPE101: FOR (PvuII): 5′-CAGTAGCAGCTGGAAGACAACGGCTTTTTTGTG-3′ and REV (BamHI): 5′-GCTGCTGGATCCCTTCTTCTTCTTCTTCTTGAGTTCTTGACTCCTAGATTG-3′ (restriction sites: CAGCTG for PvuII and GGATCC for BamHI). The reverse primer introduces a hexalysine tag at the C-terminus of the recombinant protein. All oligonucleotide primers were from Merck, UK. The sequences of all recombinant plasmids were confirmed by Sanger sequencing (Source BioScience, Nottingham). The only modification applied to the protocol was the use of 0.2 mM isopropyl-β-D-thiogalactopyranoside (IPTG) for induction of protein expression, instead of the 0.1 mM IPTG used previously. The proteins were expressed in 6 L of bacterial culture prior to harvesting and reconstituted with 600 mL of each of the two different cell lysis buffers as described in (Paraskevopoulou et al., 2019). In total, 1.2 L of combined periplasmic extracts were collected for each protein preparation. The redesigned construct (Fig. 1B) with a TEV cleavage site for LabA 47-496 J99 was obtained by gene synthesis from Thermo Fisher Scientific (GeneArt) (sequence available in supplementary data), and subcloned into pOPE101.

Immobilized metal-ion affinity chromatography (IMAC)
The combined periplasmic extracts were incubated for a maximum of 2 h with 5 mL of Ni Sepharose 6 Fast Flow resin (Cytiva, USA) at 4 °C and then loaded on a gravity Econo-Column® chromatography column. The flowthrough was collected and the column was washed with ten column volumes (CV) of washing buffer, consisting of 20 mM Tris-Cl, pH 7.4 and 300 mM NaCl. Finally, the protein was consecutively eluted from the column with 3-5 CV each of 20, 40, 100 and 200 mM imidazole and 10 CV of 500 mM imidazole in washing buffer. The protein content of the different fractions was analyzed by electrophoresis; the proteins were separated on NuPAGE 4-12% Bis-Tris protein gels (Thermo Fisher, USA) and the gels were stained with InstantBlue™ (Expedeon, UK).

TEV cleavage
Approximately 30 mg of IMAC-purified protein, at a concentration of 1 mg/mL, was mixed with 100 μL of 3 mg/mL Tobacco Etch Virus (TEV) protease (in-house AstraZeneca product). The cleavage reaction proceeded overnight at 4 °C in a dialysis setup. The reaction mixture was placed in Spectrum™ Spectra/Por™ 1 RC dialysis tubing (regenerated cellulose) with a MWCO of 6000-8000 Da (Fisher Scientific, USA) and dialysed against 5 L of 20 mM Tris-Cl, pH 7.4, 300 mM NaCl, in order to remove any residual imidazole from the IMAC elution fractions.

Subtractive IMAC
The overnight reaction mixture was incubated for 1 h at 4 °C with 1 mL of Ni Sepharose 6 Fast Flow resin and then loaded on a gravity column. The flowthrough was collected and the column was washed again with 10 CV of washing buffer. The contents of the column were then eluted with an increasing imidazole step gradient of 20, 40, 250 and 500 mM imidazole. Each eluted fraction was 5 CV. The collected fractions were analyzed by electrophoresis followed by InstantBlue staining, as previously.

Size exclusion chromatography
The purest IMAC fractions were concentrated to 5 mL, using a Vivaspin sample concentrator with a molecular weight cut-off of 30,000 Da (Merck, UK). The concentrated protein samples were loaded onto a HiLoad 16/60 Superdex 75 (120 mL) gel filtration column (Cytiva, USA), previously equilibrated with buffer containing 25 mM Bicine, pH 8.4, and 150 mM NaCl, connected to an ÄKTA purifier system (Cytiva, USA). The buffer was chosen based on buffer optimization results (data not shown). The flow rate was set at 1 mL/min. The fractions containing protein were analyzed by electrophoresis and InstantBlue staining, as previously.

Liquid chromatography-mass spectrometry
The molecular weight of purified proteins was determined by liquid chromatography (LC)-time-of-flight (ToF) mass spectrometry (MS).
Approximately 5 μg of purified protein sample was loaded onto an Agilent 1100 Series LC (Agilent Technologies, USA) coupled to a Q-ToF Premier time-of-flight mass spectrometer (Waters, USA), equipped with an electrospray ionization source for acquisition in positive ionization mode. The software MassLynx (Waters, USA) was used to analyse the data.

Crystallization
Protein samples were concentrated to 20 mg/mL and centrifuged for the removal of aggregated protein, before dispensing the crystal plates.
Crystallization was performed using the sitting drop vapour diffusion method in 96-well MRC crystallization plates (Molecular Dimensions, UK), with drops dispensed by a Mosquito® robot (TTP Labtech, UK). Crystallization trials used commercial and proprietary sparse-matrix screens. Each droplet contained protein sample in 25 mM Bicine, pH 8.4, and 150 mM NaCl mixed with a precipitant solution at a volume ratio of 1:1, and was equilibrated against 50 μL of the precipitant solution at 4 and 20 °C. Bipyramidal crystals of LabA 47-496 J99 grew in 28% poly(ethylene glycol) methyl ether 2000 and 0.1 M Bis-Tris (pH 6.5). Crystals appeared within one week of incubation at 20 °C.

Data collection and structure determination
Crystals were loop-mounted and briefly transferred to a drop of crystallization buffer supplemented with 20% ethylene glycol for cryoprotection, before flash-freezing in liquid nitrogen. X-ray diffraction data were collected at a temperature of 100 K at Diamond Light Source beamline I04 (wavelength 0.9795 Å) and were indexed and integrated using the XDS package (Kabsch, 2010). Anisotropy correction was applied to the unmerged data using the Staraniso server (Burg et al., 2008), which resulted in a resolution cut-off of 2.06 Å after scaling with AIMLESS (Evans and Murshudov, 2013). Phases were obtained by molecular replacement using the structure of BabA J99 (PDB ID 4ZH7) as a search model in the program Phaser (McCoy et al., 2007). The model was completed and refined in iterative cycles of manual rebuilding in the graphics programme Coot (Emsley et al., 2010) and reciprocal space refinement using Refmac5 (Murshudov et al., 2011) and Buster (Bricogne et al., 2011). RMSD values between LabA, BabA and HopQ were determined using UCSF Chimera (v 1.14), developed by the Resource for Biocomputing, Visualization, and Informatics at the University of California, San Francisco, with support from NIH P41-GM103311. The MatchMaker function was used with a 2 Å cutoff for pruning and default settings (Needleman-Wunsch algorithm for alignment, BLOSUM-62 alignment matrix, and a gap extension penalty of 1).

[...] and incubated with LabA 21-496 and LabA 21-496-6K at a concentration range from 5 to 20 μg/mL. After washing off unbound glycans and proteins, the wells were sequentially incubated, with three washes between the steps, with a biotinylated mouse anti-c-Myc antibody (AbD Serotec, USA) and the conjugate streptavidin-HRP (AbD Serotec, USA), both at a 1:2000 dilution, in order to complex with bound recombinant LabA and enable tetramethylbenzidine chromogenic detection. Absorbance was measured at 450 nm on a Spark® 10M multimode microplate reader (Tecan, Switzerland).

Electrospray ionization mass spectrometry (ESI-MS) affinity measurements
LabA 26695 and a single chain antibody (which served as P_ref) were each buffer-exchanged into 200 mM aqueous ammonium acetate (pH 6.8) using a 10 kDa MW cutoff Amicon Ultra-4 centrifugal filter (Millipore Corp, Bedford, MA). Protein concentrations were estimated by UV absorption (280 nm). These stock solutions were stored at −20 °C until used.
All ESI-MS measurements were performed in positive ion mode using a Synapt G2 ESI-quadrupole-ion mobility separation-time-of-flight mass spectrometer (Waters UK Ltd., Manchester, UK), equipped with a nanoflow ESI source. The nanoESI tips were produced from borosilicate capillaries (1.0 mm o.d., 0.78 mm i.d.) pulled to ~5 μm outer diameter using a P-1000 micropipette puller (Sutter Instruments, Novato, CA). For each measurement, approximately 10 μL of sample solution (containing LabA 26695, glycan and P_ref) was loaded into the nanoESI tip. To perform ESI, a voltage of ~1 kV was applied to a platinum wire in contact with the solution. Mass spectra were acquired using a sampling cone voltage of 30 V and an extraction cone voltage of 2 V. The source pressure was 3.2 mbar and the temperature was 60 °C. The source wave velocity and wave height were 200 m s⁻¹ and 0.2 V, respectively. Gas flow rates were 2 mL min⁻¹ in the Trap, 180 mL min⁻¹ in the helium cell and 90 mL min⁻¹ in the ion mobility cell. Ions were transmitted through the Trap and Transfer ion guides using voltages of 5 V and 2 V, respectively. At least 150 scans were collected for every acquisition. Data acquisition and processing were carried out using MassLynx (v 4.1).
Association constants (K_a) for the interactions between LabA 26695 and the glycan ligands were quantified using the direct ESI-MS assay (Kitova et al., 2012a). Briefly, K_a was calculated from the abundance ratio (R) of the ligand-bound (PL) to free protein (P) ions, after correction for nonspecific ligand binding (El-Hawiet et al., 2012), and the initial concentrations of protein ([P]_0) and ligand ([L]_0), eq (1):

$$K_a = \frac{R}{[L]_0 - \frac{R}{1+R}\,[P]_0} \tag{1}$$

where R is taken to be equal to the corresponding concentration ratio ([PL]/[P]) in solution, eq (2):

$$R = \frac{\sum Ab(PL)}{\sum Ab(P)} = \frac{[PL]}{[P]} \tag{2}$$

The reported affinities are average values from four replicate measurements performed at three different protein/glycan concentrations.
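The calculation described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' processing script: it assumes simple 1:1 binding, an abundance ratio R that has already been corrected for nonspecific binding, and the mass balance [PL] = R/(1+R)·[P]_0 with [L] = [L]_0 − [PL]. The function names are hypothetical.

```python
import math


def association_constant(r, p0, l0):
    """Return K_a (M^-1) from the corrected abundance ratio R = Ab(PL)/Ab(P).

    Assumes 1:1 binding; p0 and l0 are the initial protein and ligand
    concentrations in mol/L.
    """
    if r < 0 or p0 <= 0 or l0 <= 0:
        raise ValueError("r must be >= 0 and concentrations must be > 0")
    bound = r / (1.0 + r) * p0        # [PL] from the protein mass balance
    free_ligand = l0 - bound          # [L] from the ligand mass balance
    if free_ligand <= 0:
        raise ValueError("R is too large for the given concentrations")
    return r / free_ligand            # K_a = [PL] / ([P][L]) = R / [L]


def expected_ratio(ka, p0, l0):
    """Inverse problem: the ratio R expected for a given K_a (1:1 binding)."""
    s = p0 + l0 + 1.0 / ka
    bound = (s - math.sqrt(s * s - 4.0 * p0 * l0)) / 2.0  # physical root
    return bound / (p0 - bound)


# Round trip with concentrations of the order used in this work
# (2.5 uM protein, 15 uM ligand) and an illustrative K_a of 3088 M^-1.
r = expected_ratio(3088.0, 2.5e-6, 15e-6)
ka = association_constant(r, 2.5e-6, 15e-6)
```

The round trip is only a consistency check of the algebra; with real spectra, R would come from summed ion abundances over all charge states.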

Data availability
The structure presented in this paper has been deposited in the Protein Data Bank (PDB) with the following codes: 6GMM. All remaining data are contained within the article or the supporting information.

Expression and purification of LabA J99
LabA 21-496 from H. pylori strain J99 was produced by periplasmic expression in E. coli as recently described (Paraskevopoulou et al., 2019), with C-terminal tags for solubility enhancement, immunological detection and affinity purification. The resulting protein was purified by immobilized metal-ion affinity chromatography (IMAC, Fig. S1A) and size exclusion chromatography (SEC, Figs. S1B and C). Although the IMAC and SEC protein fractions appeared pure after analysis by electrophoresis and the SEC chromatogram contained one main peak, mass spectrometry revealed multiple lengths of LabA 21-496 J99 (Fig. S1D). The main species of 54,580 Da corresponds to the expected molecular weight of the protein including four cloning-derived amino acids (QVQL-) at the N-terminus of the protein. In addition, two smaller secondary products were observed, with masses of 53,144 Da and 51,509 Da, corresponding to 9- and 24-amino acid N-terminally truncated products, respectively. Initial sparse-matrix crystallization screening yielded microcrystals in a range of different conditions. Despite optimization of the conditions and micro-seeding, the crystals remained very thin needles and did not produce an X-ray diffraction pattern suitable for structural analysis (Fig. S1 E and F). To obtain more suitable protein crystals by addressing the heterogeneous processing of the N-terminal sequence during expression, a Tobacco Etch Virus (TEV) protease cleavage site was introduced downstream of a His tag and of the amino acids previously found to be cleaved (Fig. 1 A and B), and all C-terminal tags were removed. This new variant results in a protein missing the first 46 amino acids (LabA 47-496) after TEV cleavage.
Following recombinant expression, the protein was cleaved with TEV-protease. The two-step purification was repeated for the removal of the enzyme and the protein was analyzed (Fig. 1C-E). TEV and uncleaved protein were removed by subtractive IMAC, as shown in Fig. 1C.
Although the SEC chromatogram of the protein sample indicated the presence of two different products (Fig. 1F), analysis of the protein fractions by SDS-PAGE revealed relatively good purity (Fig. 1E). The mass spectrum of LabA 47-496 (Fig. 1F) revealed two products in the protein sample; the molecular mass of 48,922 Da is consistent with the calculated 48,760, corresponding to LabA 47-496 with an additional N-terminal GS (Gly-Ser) added by the cloning and left after cleavage with TEV (ENLYFQ*G, cleavage site indicated by *). However, the ratio of the main (48,739 Da) product to the secondary product (48,922 Da) was increased and the purity of the protein sample used for protein crystallization screening was enhanced.

Crystallization screening of improved construct
Crystallization trials of the LabA 47-496 construct were performed in 384 different sets of conditions in sparse-matrix screens. Within seven days, protein crystals were obtained in a crystallization drop containing 28% poly(ethylene glycol) methyl ether 2000 and 100 mM Bis-Tris at pH 6.5 and in the absence of salt, on a plate stored at 20 °C. The crystals displayed a bipyramidal morphology with maximum dimensions of 60 × 150 μm (Fig. 1G) and further optimization was not required. A single crystal (Fig. 1H) was subjected to X-ray diffraction data collection (Fig. 1I) and produced a dataset to 2.06 Å resolution (Fig. 1J).
The crystals belonged to the space group P4₁2₁2, with unit cell dimensions of a = 60 Å, b = 60 Å and c = 265 Å, and contained one molecule of protein per asymmetric unit. The protein structure was solved by molecular replacement, using the structure of BabA J99 as the search model, since the two amino acid sequences have 58.62% identity (53.85% for the exodomains). The crystal parameters, as well as data processing and structure refinement statistics, are shown in Table 1. The initial model was iteratively improved by rigid body and restrained refinement interspersed with real space model building with Coot (Emsley et al., 2010) (Fig. 2). The final atomic model ran from residues Q47 to L496, with the addition of the two residues added downstream of the C-terminus of the protein, K497 and G498. However, there were two disordered loops not visible in the electron density map; these were between N211-E224 and G380-D388 (Fig. 2A). Both stretches are also predicted as unstructured by intrinsic disorder prediction algorithms, such as Globplot2.3 (Linding et al., 2003), IUPRED2 (Mészáros et al., 2018) (using 'short disorder' prediction type) and PONDR (Romero et al., 1997) (only N211-E224). The crystallographic model revealed that LabA J99 contains two main regions comprising mostly α-helices, called 'handle' and 'head' regions, and a smaller third region comprising two β-strands, termed the 'crown' region, in analogy to the structure of BabA J99 (Naim Hage et al., 2015a). The 'handle' region contained both the N- and C-termini of the protein; in more detail, one N-terminal (α-N) and one C-terminal (α-C1) α-helix of similar lengths formed a two-helix antiparallel coiled-coil bundle. This C-terminal α-helix was followed by a two-stranded antiparallel β-sheet (β-C) before ending with a short α-helix (α-C2), which had an almost antiparallel orientation to the N-terminal α-helix.
The two terminal helices protruded from the head region at right angles and were connected to the predicted transmembrane β-barrel domain. The 'head' region comprised a 4 + 3 α-helix bundle, also found in other H. pylori adhesins. Within this bundle, four α-helices (α-1, α-5, α-6 and α-8) formed an antiparallel coil bundle, similar to a tetratricopeptide repeat motif, at a near perpendicular angle to the 'handle' region, creating a kinked (or L-shaped) tertiary structure. The connecting features between the α-helices were: (i) a 48-amino acid segment between the α-2 and α-3 helices; this connecting segment, which extended out of the core of the head region, contained two small antiparallel β-strands (β-1 and β-2); (ii) a 42-amino acid segment between the α-3 and α-4 helices; this segment contained the crown region, consisting of a two-strand antiparallel β-sheet (β-3 and β-4) between T204 and N230, and a disordered loop within the crown, between N211 and E224; (iii) a 28-amino acid loop between the α-4 and α-5 helices, with two short α-helices; (iv) a 21-amino acid loop between the α-5 and α-6 helices; (v) a 28-amino acid segment between the α-7 and α-8 helices, containing a short α-helix and a disordered loop between G380 and D388. LabA has three disulphide bonds (C127-C157, C250-C279, C370-C395), all three located in the head region (Fig. 2A).
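As a plausibility check on the reported crystal packing (one molecule per asymmetric unit), the Matthews coefficient can be estimated from the unit cell. The sketch below is illustrative only and rests on several assumptions: the rounded cell dimensions quoted in the text (60, 60 and 265 Å), eight asymmetric units per cell for space group P4₁2₁2, the measured mass of 48,739 Da for the main LabA 47-496 species, and the usual approximation that solvent fraction is 1 − 1.23/V_M. The function names are hypothetical.

```python
def matthews_coefficient(a, b, c, asu_per_cell, molecules_per_asu, mass_da):
    """Matthews coefficient V_M in cubic angstrom per dalton.

    Valid only for cells with all angles equal to 90 degrees
    (volume = a * b * c), as in a tetragonal cell.
    """
    cell_volume = a * b * c
    return cell_volume / (asu_per_cell * molecules_per_asu * mass_da)


def solvent_fraction(vm):
    """Approximate solvent fraction from V_M (assumes typical protein density)."""
    return 1.0 - 1.23 / vm


# Rounded cell dimensions and measured mass taken from the text.
vm_one = matthews_coefficient(60.0, 60.0, 265.0, 8, 1, 48739.0)
vm_two = matthews_coefficient(60.0, 60.0, 265.0, 8, 2, 48739.0)
print(f"1 molecule/ASU:  V_M = {vm_one:.2f}, solvent = {solvent_fraction(vm_one):.0%}")
print(f"2 molecules/ASU: V_M = {vm_two:.2f}, solvent = {solvent_fraction(vm_two):.0%}")
```

Under these assumptions a single molecule per asymmetric unit gives a V_M of roughly 2.4 Å³/Da and about 50% solvent, whereas two molecules would leave essentially no room for solvent, which is consistent with the one-molecule assignment.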
When comparing the full-length structures for LabA and BabA, the RMSD between 217 pruned atom pairs (cutoff used 2 Å) is 0.921 Å; without pruning, the RMSD across all 402 pairs rises to 5.493 Å. Using the central ('head') part of the molecule (ranging from S57 to V462 in BabA) results in a low RMSD of 0.874 Å for 208 pruned atom pairs (without pruning: 4.959 Å). The pruned atoms in the 'head' region correspond to the inserted 'crown' region (G170-D257, RMSD 16.832 Å), as the crown is much smaller and contains an unstructured region in LabA. Without pruning, the RMSD for AA 57-462 rises to 4.780 Å. Matching the handle regions D27-P56 and V461-K528 gives RMSD values of 0.465 and 1.147 Å, respectively. Overall, these results show that the handle and head regions are very similar between LabA and BabA, but are connected at slightly different angles, with the largest discrepancies between the two structures found in the crown region.
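The difference between "pruned" and "all-pairs" RMSD values quoted above can be illustrated with a small sketch. It assumes the two sets of coordinates are already superimposed and paired, and it applies the distance cutoff once; MatchMaker in UCSF Chimera instead prunes and refits iteratively, so this is not a reimplementation of that tool. The coordinates are toy values, not LabA or BabA atoms.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation over paired 3D points (same units as input)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


def prune_pairs(coords_a, coords_b, cutoff=2.0):
    """Keep only pairs whose separation is within the cutoff (single pass)."""
    kept = [(a, b) for a, b in zip(coords_a, coords_b)
            if math.dist(a, b) <= cutoff]
    return [a for a, _ in kept], [b for _, b in kept]


# Toy example: three well-matched pairs and one outlier 10 units apart.
model_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
model_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (2.0, 0.0, 1.0), (3.0, 0.0, 10.0)]

pruned_a, pruned_b = prune_pairs(model_a, model_b, cutoff=2.0)
print(len(pruned_a), rmsd(pruned_a, pruned_b))   # pruned pairs and their RMSD
print(len(model_a), rmsd(model_a, model_b))      # all pairs and their RMSD
```

A single distant pair dominates the all-pairs value, which is why a flexible or divergent region such as the crown can raise the unpruned RMSD by several ångström while the pruned core remains below 1 Å.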
Despite the relatively low sequence identity, all these proteins share high conformational similarity in their head regions, where a 4 + 3 α-helix bundle was present; however, significant differences were observed in the crown region, which may be related to each protein's ligand binding specificity. The 'crown' regions of LabA and HopQ (called 'insertion domain' in (Bonsor et al., 2018)) comprise a short and a long β-sheet, respectively, while a crown structure is altogether absent in SabA. Differences were also observed in the handle region of the adhesins. The original structure of SabA 26695 (4O5J) lacked the β-sheet (β-C) present in LabA and BabA, as the last 61 C-terminal amino acid residues were not visible in the structure (Pang et al., 2014). The more recent structure of SabA J99 (6GW5) by Coppens et al. (2018), however, includes a short hairpin (residues 393-438) matching the longer β-C domain in BabA J99 (4ZH0).

Rossez et al. (2014) indicated that LabA is a LacdiNAc-specific adhesin, which mediates the adhesion of H. pylori to human gastric mucins. To verify this binding ability, a sandwich-type ELISA was carried out in order to test the binding of the recombinantly expressed LabA 21-496 and LabA 21-496-6K (both from the J99 strain; the 6K variant has an additional hexalysine tag, see Fig. 1A) to LacdiNAc and to the ligands that are well known to be H. pylori receptors, Lewis b and sialylated Lewis x. However, in contrast to BabA, which was used as a positive control, neither of the two proteins showed binding to any of the ligands at concentrations up to 20 μg/mL (Fig. 4).

ELISA-type and ESI-MS analysis of ligand binding by recombinant LabA
Native electrospray ionization mass spectrometry (ESI-MS) was initially performed on aqueous ammonium acetate (50 mM, pH 6.8, 25 °C) solutions containing LabA 21-496 26695 (4.6 μM) in the absence and presence of LacdiNAc. A reference protein (P_ref, 0.4 μM) was added to the latter solution to correct the mass spectrum for the occurrence of nonspecific ligand-protein binding during the ESI process (Wang et al., 2003). However, due to the presence of multiple N-terminally truncated forms of LabA 26695 occurring in different amounts, interpretation of the data was difficult (data not shown). Because of this difficulty, the new variant LabA 47-496, equivalent to that which had enabled crystallization, was examined.
Binding of this newly produced LabA 47-496 26695 to LacdiNAc was measured. Fig. 5A shows a mass spectrum of an aqueous ammonium acetate (100 mM, pH 6.8, 25 °C) solution of LabA 47-496 26695 (2.5 μM), P_ref (3.2 μM) and LacdiNAc (15 μM). Only one isoform of LabA was detected, with a MW of 51,277 Da. After correction for nonspecific binding using the reference protein method (Kitova et al., 2012b; Sun et al., 2006), the affinity of LacdiNAc for LabA was found to be 3088 ± 252 M⁻¹ (mean ± s.d.). We next decided to investigate binding to chito-oligosaccharides due to their structural similarities to LacdiNAc. The affinities of chitotriose, chitotetraose and chitohexaose (for structures, see Fig. S2) were also measured. Shown in Fig. 5B is a mass spectrum obtained from an aqueous ammonium acetate (100 mM, pH 6.8, 25 °C) solution with LabA 26695 (2.5 μM), P_ref (3.2 μM) and three chito-glycans (each 48 μM). The affinities for the three oligosaccharides ranged from 1000 M⁻¹ to 2000 M⁻¹.
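To put the association constant into more familiar terms, it can be converted to a dissociation constant and an approximate occupancy. The sketch below is an illustration under stated assumptions: K_d = 1/K_a, first-order propagation of the standard deviation, and free ligand approximated by the total ligand concentration (reasonable here because the ligand is in excess and binding is weak). The function names are hypothetical.

```python
def ka_to_kd_micromolar(ka, ka_sd=0.0):
    """Convert K_a (M^-1) and its s.d. to K_d and s.d. in micromolar.

    Uses K_d = 1 / K_a and first-order error propagation,
    sd(K_d) = sd(K_a) / K_a**2.
    """
    kd = 1e6 / ka
    kd_sd = 1e6 * ka_sd / ka ** 2
    return kd, kd_sd


def fraction_bound(ka, free_ligand_molar):
    """Fraction of protein carrying ligand for 1:1 binding at a given free [L]."""
    x = ka * free_ligand_molar
    return x / (1.0 + x)


kd, kd_sd = ka_to_kd_micromolar(3088.0, 252.0)
occupancy = fraction_bound(3088.0, 15e-6)
print(f"K_d = {kd:.0f} +/- {kd_sd:.0f} uM; about {occupancy:.1%} of LabA bound at 15 uM ligand")
```

With the values reported for LacdiNAc this corresponds to a K_d in the range of a few hundred micromolar, with only a few percent of the protein occupied at 15 μM ligand.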
Having established only weak binding affinity of LabA 47-496 26695 and no binding at all of LabA 21-496 J99 in ELISA-like format (Fig. 4) for LacdiNAc, we next decided to screen a set of 35 human milk oligosaccharides (HMO1-HMO35, see Table S1), all of which contain Lac and the majority of which contain LacNAc, in an attempt to identify potential binders with higher affinity (for schematic structures, see Fig. S2). A representative example for HMO1 is shown in Fig. 5C, and the results of this screening are summarized in Fig. 5D and Table S1. Notably, LabA was found to bind all 35 HMOs, although with uniformly weak affinity (ranging from 1700 to 5000 M⁻¹).

Table 1. Crystallographic data processing and refinement statistics. Values in parentheses are for the highest resolution shell. $R_\mathrm{factor} = \sum_{hkl} \big| |F_\mathrm{obs}| - |F_\mathrm{calc}| \big| / \sum_{hkl} |F_\mathrm{obs}|$. $R_\mathrm{free}$ is the cross-validation R factor computed for the test set of 5% of unique reflections. CC1/2 is the Pearson correlation coefficient between the average intensities of two subsets containing randomly selected halves of the measurements for each unique reflection (Karplus and Diederichs, 2012). Ramachandran analysis was performed with MolProbity (Lovell et al., 2003) and the Z-score was calculated by WHAT_CHECK (Hooft et al., 1996).

Discussion
While our initial attempts to crystallize LabA protein were unsuccessful, removal of the C-terminal tags and addition of an N-terminal TEV cleavage site resulted in suitable crystals. It has been reported that excessive solubility can play a role in making protein crystallization more challenging. In particular, highly soluble protein versions bearing a C-terminal pentalysine tag have been found to yield small needle-shaped crystals, similar to those obtained by us here (Fig. S1) and unsuitable for X-ray diffraction, even when high protein concentrations were achieved (Islam et al., 2015). Homogeneity of the recombinant protein was improved by introducing a TEV cleavage site in the N-terminus. However, truncation within the TEV cleavage site by periplasmic proteases had to be prevented; for this reason, the TEV cleavage site was introduced after the first nine amino acids in the N-terminus of the protein, which was the main area of non-specific proteolytic cleavage during periplasmic expression.
At the same time, limiting the proportion of unstructured regions in the protein was also considered during the design of the new expression construct. It was known from the crystal structure of BabA J99 that the three C-terminal tags, used for the enhancement of solubility, detection by Western blotting and purification with IMAC, were not visible in the crystal structure. This confirmed the conformational flexibility of these sequences. We removed the C-terminal hexalysine and c-Myc tags and moved the hexahistidine tag needed for IMAC to the N-terminus, upstream of the TEV cleavage site. This approach resulted in the new recombinant LabA variant, which yielded protein crystals sufficient for X-ray crystallography already during initial screening, without the requirement for further optimization.

Figure caption (partial): [...] (Bond and Schüttelkopf, 2009). Dotted areas indicate amino acids missing from the crystal structure, coinciding with predicted areas of intrinsic disorder. Cysteines and corresponding disulphide bonds are highlighted in green. The two C-terminal amino acids (KG) are cloning-derived.
Overall, the protein adopts a similar structure to the previously described adhesins of H. pylori.
Despite low amino acid sequence similarity, all four crystallized H. pylori adhesins (LabA, BabA, SabA, and to a lesser extent HopQ) adopt a three-dimensional L-shape, suggesting that this structure may fulfil some yet to be discovered functional role. The highest similarity of the LabA amino acid sequence to that of BabA was corroborated by the superimposition of their crystal structures and the obtained low RMSD values. The similarity among all adhesins was most pronounced in the handle and head and particularly low in the crown region, where the glycan binding site of BabA is found. While this explains the lack of binding affinity of LabA for Le b , it does not provide any information about the actual binding site of this protein. In the absence of a ligand, the crown appears partially disordered in most adhesin structures.
A multiple alignment of HopD/LabA sequences from 84 different strains shows that LabA is highly conserved at both N- and C-termini, with the exception of a central region (amino acids 201-226) (Fig. S4) in which most of the inter-strain variability is found. This region overlaps with the unstructured region in our J99 LabA (N211-E224) and coincides with the position of the insertion domains in BabA and HopQ. With regard to this variable region, HopD/LabA sequences appear to fall into two separate categories: the H. pylori 'J99-type', with an overall total length of approx. 686-691 amino acids and a shorter insertion domain of approx. 31 amino acids, and the H. pylori '26695-type', with an approximate total length of 702-711 amino acids and a longer insertion domain of approx. 46 amino acids. In the absence of a ligand co-crystal structure, the exact significance of these structural variations between strains remains unclear. While BabA also shows strain variation in the crown region/insertion domain, the variations in this region are stronger in BabA, which has led to the identification of two hypervariable/diversity loops (Hage et al., 2015). In contrast, in LabA, there is almost no variation within the 'J99-type' or '26695-type' insertion domains. It is interesting to note in this context that in the original paper by Rossez and co-authors (Rossez et al., 2014), two strains suggested to bind to LacdiNAc (B128 and 26695) had the longer insertion domain ('26695-type'), while the J99 strain, with the shorter insertion domain, was also thought to bind to LacdiNAc, as suggested by the strong inhibition in the presence of 0.5 mM soluble disaccharide. This would appear to suggest that the ability to bind to LacdiNAc is not related to the type of insertion domain present in LabA, or that the shared properties of short vs. longer insertion domains are sufficient to mediate ligand binding.
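The two-category split described above can be illustrated with a minimal sketch that assigns a HopD/LabA sequence to a type from its total length alone. The function name and the "unclassified" fallback are hypothetical and introduced only for illustration; the length ranges are the approximate values quoted in the text, and a real classification would be based on the alignment of the insertion domain rather than on length.

```python
# Illustrative sketch only: the ranges are the approximate total lengths
# quoted in the text, not exact cut-offs derived from the alignment.
J99_TYPE_LENGTHS = range(686, 692)      # approx. 686-691 amino acids
TYPE_26695_LENGTHS = range(702, 712)    # approx. 702-711 amino acids


def classify_laba_by_length(total_length: int) -> str:
    """Assign a HopD/LabA sequence to a type based on its total length."""
    if total_length in J99_TYPE_LENGTHS:
        return "J99-type"        # shorter insertion domain (approx. 31 aa)
    if total_length in TYPE_26695_LENGTHS:
        return "26695-type"      # longer insertion domain (approx. 46 aa)
    return "unclassified"        # hypothetical fallback for other lengths
```

Sequences whose length falls between the two ranges are deliberately left unclassified in this sketch, since the text gives only approximate values.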
However, our ESI-MS screening indicated that LabA binds only weakly to Lac/LacNAc-containing oligosaccharides, without any clear specificity. Furthermore, we were not able to detect any binding to LacdiNAc-HSA in an ELISA-type assay. We do not believe that the omission of the first 46 N-terminal amino acids in the improved construct can be responsible for the lack of LacdiNAc binding, for two reasons. First, the longer LabA 21-427 variant, which lacks only the signal peptide (the first 20 amino acids), also did not bind to the putative ligand when tested in the ELISA-type format. Second, as known from the existing crystal structures of BabA (4ZH7) and SabA (6GW5, 4O5J), and in line with the obtained structure for LabA (6GMM), the N-terminal region is located close to the membrane insertion domain, which is buried deep in the outer membrane of H. pylori. It is therefore a very unlikely binding site, as it would not be able to access its putative ligand on epithelial cells.
Because of our inability to detect any binding to LacdiNAc, we extended our study to the binding of 35 HMOs with similar structures and to three chito-oligosaccharides of different lengths (Fig. 5). The latter only differ from LacdiNAc in the axial vs. equatorial orientation of the hydroxy group in the C4 position (Fig. S2). Binding of LabA to all the tested oligosaccharides was invariably found to be non-specific and of low affinity (Ka values summarized in Table S1).
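For orientation, the relationship between a measured ion abundance ratio and Ka in a direct ESI-MS binding measurement can be sketched as follows. This is a generic derivation under stated assumptions and not necessarily the exact procedure used here: it assumes a single binding site (1:1 stoichiometry), equal response factors for free and ligand-bound protein, and negligible non-specific binding during the ESI process. The symbols R, [P]_0 and [L]_0 (abundance ratio, initial protein and initial ligand concentration) are introduced for illustration only.

```latex
% Assumed 1:1 model: P + L <=> PL
R = \frac{Ab(PL)}{Ab(P)} = \frac{[PL]}{[P]}, \qquad
[PL] = \frac{R}{1+R}\,[P]_0, \qquad
[L] = [L]_0 - [PL]

K_a = \frac{[PL]}{[P]\,[L]} = \frac{R}{[L]_0 - \dfrac{R}{1+R}\,[P]_0}
```

Under this model, a small abundance ratio R at a given ligand concentration translates directly into a low Ka.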
This lack of binding to LacdiNAc is in good agreement with the recent observation by Mthembu and co-workers (Mthembu et al., 2020). The authors of this recent study were not able to detect any binding to LacdiNAc of the H. pylori J99 or 26695 strains (or of 15 other strains tested, representing different world populations) using a variety of methods, including whole bacteria binding assays in microtiter plates. This was not due to a lack of expression or translation of LabA protein, as this could be identified by LC-MS/MS. The complete lack of binding of H. pylori bacteria to LacdiNAc, together with our inability to detect specific and high affinity binding of the recombinant LabA protein, suggests that LacdiNAc might not be the physiological ligand for this adhesin, and that the glycan preference of LabA remains to be elucidated. Alternatively, the function of LabA in promoting binding may be accessory, rather than direct, supporting the structure and/or binding activity of another adhesin with specificity for LacdiNAc as determined by Rossez et al. (2014), or specific for another epithelial ligand, as suggested by the work from Mthembu and co-workers (Mthembu et al., 2020). From this point of view, reintroducing the original term HopD for this protein instead of LabA is an option worth considering.

Funding and additional information
This research has been supported through joint funding from EPSRC (Grant EP/L01646X), AstraZeneca R&D and the University of Nottingham. Y.C., L.N. and J.S.K. acknowledge funding from the Alberta Glycomics Centre. Ross Overman is currently employed by Leaf Expression Systems. FHF is currently fully funded by a grant from the LOEWE DRUID (Novel Drug Targets against Poverty-Related and Neglected Tropical Infectious Diseases).

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.