An internal thioester in a pathogen surface protein mediates covalent host binding

To cause disease and persist in a host, pathogenic and commensal microbes must adhere to tissues. Colonization and infection depend on specific molecular interactions at the host-microbe interface that involve microbial surface proteins, or adhesins. To date, adhesins are only known to bind to host receptors non-covalently. Here we show that the streptococcal surface protein SfbI mediates covalent interaction with the host protein fibrinogen using an unusual internal thioester bond as a ‘chemical harpoon’. This cross-linking reaction allows bacterial attachment to fibrin and SfbI binding to human cells in a model of inflammation. Thioester-containing domains are unexpectedly prevalent in Gram-positive bacteria, including many clinically relevant pathogens. Our findings support bacterial-encoded covalent binding as a new molecular principle in host-microbe interactions. This represents an as yet unexploited target to treat bacterial infection and may also offer novel opportunities for engineering beneficial interactions. DOI: http://dx.doi.org/10.7554/eLife.06638.001


Introduction
For commensal and pathogenic bacteria, adhesion to host surfaces is a prerequisite for colonization and infection, and is mediated by surface-presented adhesins (Pizarro-Cerdá and Cossart, 2006). Through specific interactions, these proteins can define host and tissue tropism, providing niche environments and a competitive advantage in the search for nutrients. Bacterial adhesins bind either directly to integral host cell surface components, such as integrins or carbohydrates, or they interact with components of the extracellular matrix, resulting in indirect binding to receptors on the host cell surface (Kline et al., 2009). The molecular interactions that define the host-microbe interface are generally non-covalent in nature and frequently involve extensive intermolecular interfaces and multivalent binding. The surprising discovery of internal thioester bonds in the pilus tip adhesin Cpa from the Gram-positive human pathogen Streptococcus pyogenes raised the possibility of pathogen-encoded covalent adhesion (Pointon et al., 2010; Linke-Winnebeck et al., 2014). Internal thioester bonds are formed between the side chains of Cys and Gln residues, most likely self-generated in a favorable environment during protein folding. Internal thioesters had previously been observed only in the mammalian complement proteins C3 and C4 (Law and Dodds, 1997) and related proteins (Dodds and Law, 1998; Lin et al., 2002; Cherry and Silverman, 2006; Wong and Dessen, 2014). Complement thioester proteins are large, multi-domain constructs that, upon proteolytic activation, undergo a conformational change that exposes the reactive thioester (Janssen et al., 2006). The exposed thioester is thought to react with nucleophiles on the surface of pathogens, thus mediating irreversible, host-encoded covalent tagging of the pathogens for elimination by the host immune system.
However, to the best of our knowledge, definitive evidence that internal thioesters in complement proteins form an intermolecular bond with a pathogen target is lacking, and it is not clear whether target binding is specific or non-specific. Cpa and complement proteins share no identifiable sequence or structural relationship and therefore appear to be evolutionarily distinct. The thioester domains of Cpa are found in a much less complex structural context and do not require proteolytic activation. The only similarity to complement appears to be that the same strategy, a 'chemical harpoon', has evolved independently in bacteria and complement-related proteins for the purpose of irreversible binding.
Internal thioesters are one of three unexpected self-generating cross-links between amino acid side chains found in subunits of pili and other adhesins of Gram-positive bacteria (Schwarz-Linek and Banfield, 2014), the others being intramolecular isopeptide bonds (Kang et al., 2007; Kang and Baker, 2011) and ester bonds (Kwon et al., 2014). Unlike intramolecular isopeptide or ester bonds, thioesters do not appear to serve protein stabilization (Walden et al., 2014). Instead, by analogy to complement, thioesters presented on bacterial surfaces may react with nucleophilic groups on host tissue targets, thus establishing pathogen-encoded covalent adhesion. Indeed, binding of S. pyogenes to mammalian cells in vitro is severely impaired when an engineered Cpa variant lacking the thioester is expressed (Pointon et al., 2010). While this does not confirm covalent attachment, it supports a role for the reactive bond in bacterial adhesion. Interestingly, a recent study showed covalent bond formation between a Cpa thioester domain and the small-molecule nucleophile spermidine, confirming the accessibility and reactivity of the thioester (Linke-Winnebeck et al., 2014). However, the molecular mechanisms of targeting and covalent binding to a relevant host receptor have yet to be discovered.
Here we experimentally demonstrate that thioester domains are prevalent in Gram-positive surface proteins, and that they share a conserved three-dimensional structure despite being divergent in protein sequence. This suggests that thioester domains may target distinct receptors and have a widespread but currently unappreciated role in mediating pathogen adhesion to hosts. Further, we identify host fibrinogen as a covalently bound target of the streptococcal surface protein SfbI. In a thioester-dependent mechanism, SfbI reacts with one specific lysine residue in fibrinogen, forming a very stable intermolecular amide bond. Using a combination of computational, biochemical, structural and cell-based assays, we reveal a novel mechanism for host adhesion mediated by a pathogen-encoded covalent interaction.

eLife digest
The human body is home to many trillions of microbes; most are harmless, but some may cause disease. To live inside a host, microbes must first attach to host tissues. This process involves multiple proteins on each microbe's surface, called adhesins, which interact with the molecules that make up these tissues.
Like all proteins, adhesins are long chains of simpler building blocks called amino acids, and each amino acid is connected to the next via a strong 'covalent' bond. Adhesins, however, typically attach bacteria to host molecules through the combined strength of many weak 'non-covalent' interactions.
It was recently discovered that one adhesin from a bacterium called Streptococcus pyogenes contains a rare, extra covalent bond, called a thioester, in an unusual location between two of its amino acids. S. pyogenes is a common cause of throat infections in humans, and can also cause the life-threatening 'flesh-eating disease'. Walden, Edwards et al. have now used a range of computational, biochemical, structural biology and cell-based techniques to study other adhesins that have thioester bonds in more detail. Computational searches identified hundreds of bacterial proteins containing similar bonds. These included many from bacteria that infect humans, such as Streptococcus pneumoniae, which is the most common cause of pneumonia in adults, and Clostridium difficile, which is notorious for causing severe gut infections in hospital patients. Closer examination of the three-dimensional structures of three of these proteins (including one called SfbI from S. pyogenes) revealed that each had a clear thioester bond. Biochemical tests of an additional nine of the identified proteins strongly suggested that they too contained thioester bonds. Walden, Edwards et al. then showed that SfbI was able not only to attach to tissues like conventional adhesins, but also to react chemically with fibrinogen, a human protein that is essential for blood clotting and commonly found in inflamed tissues and healing wounds. This chemical reaction results in the formation of a covalent bond between SfbI and fibrinogen, which is as stable as the bonds that link the amino acids in a protein chain. Further experiments revealed that SfbI strongly binds to human cells grown in the lab under conditions that mimic tissue inflammation. Finally, Walden, Edwards et al. made a mutant version of SfbI that did not contain a thioester, and found that it could neither interact with fibrinogen nor bind to human cells.
Together, these findings suggest that thioesters in bacterial adhesins act like 'chemical harpoons', which microbes can use to irreversibly attach themselves to molecules within their host's tissues. This attachment mechanism has not been seen before in host-microbe interactions, and further research is now needed to explore whether interfering with this process could represent a new way to treat bacterial infections.

Results and discussion
Identification of diverse, putative thioester-containing proteins in Gram-positive bacteria
The discovery of thioester-containing domains (TEDs) in Cpa prompted us to look for similar domains in other proteins. Using a TED of Cpa (Cpa-TED2) as a template, extensive PSI-BLAST similarity searches (Altschul et al., 1997) … the N-termini of proteins containing secretion signals and C-terminal LPXTG cell wall-anchoring motifs (Schneewind and Missiakas, 2014). Most TED-containing proteins are also predicted to contain intramolecular isopeptide and/or ester domains, usually present in tandem repeat arrays. Like isopeptide domains in pilus assemblies (Kang et al., 2007), these repeats probably act as 'stalks' that present the TED away from the bacterial surface. We refer to these as TIE (thioester, isopeptide, ester domain) proteins. Other domains commonly associated with putative TEDs are fibronectin-binding repeats and proline-rich regions.
Multiple sequence alignment suggests high diversity among putative TEDs, which can be divided into two classes (class I and class II) based on two indels (Figure 1-figure supplement 1). Secondary structure predictions suggest conservation of TED topology, with a central helical region comprising three core helices lying between predicted β-sheet regions. Sequence similarities are mainly limited to three short regions (Figure 1). The only fully conserved residues, the thioester-forming Cys and Gln, are found in a [YFL]CΦζ motif (where Φ is any hydrophobic and ζ any hydrophilic residue) and a weak ΦQζΦΦ motif, respectively. Both motifs are consistently predicted to reside in a β-sheet secondary structure context. A TQxxΦWΦxζ α-helical motif, where x is any residue (previously TQxA(I/V)W
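The motif notation used above maps directly onto regular expressions, which offers one simple way to scan a sequence for candidate thioester-forming Cys and Gln residues. The sketch below is hypothetical and is not the search procedure used here (which relied on PSI-BLAST and multiple sequence alignment): the residue sets standing in for Φ (hydrophobic) and ζ (hydrophilic) are assumptions, because the text does not enumerate them, and the toy sequence is invented.

```python
import re

# ASSUMPTION: the text defines Φ as "any hydrophobic" and ζ as "any hydrophilic"
# residue without listing them; these residue sets are illustrative choices.
HYDROPHOBIC = "AVILMFWC"
HYDROPHILIC = "STNQDEKRH"
SYMBOLS = {"Φ": f"[{HYDROPHOBIC}]", "ζ": f"[{HYDROPHILIC}]", "x": "."}


def motif_to_regex(motif: str) -> str:
    """Translate motif notation such as '[YFL]CΦζ' into a regular expression."""
    return "".join(SYMBOLS.get(ch, ch) for ch in motif)


def find_motif(sequence: str, motif: str) -> list[tuple[int, str]]:
    """Return (1-based start position, matched residues) for every match."""
    # Wrapping the pattern in a zero-width lookahead allows overlapping matches.
    pattern = re.compile(f"(?=({motif_to_regex(motif)}))")
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(sequence)]


# Hypothetical toy sequence, not a real TED.
toy = "MKTAYCLSGGAVQEIVDD"
print(find_motif(toy, "[YFL]CΦζ"))  # candidate thioester-forming Cys
print(find_motif(toy, "ΦQζΦΦ"))     # candidate thioester-forming Gln
```

A regex hit only flags a candidate. As the class-II TEDs show, the thioester-forming Gln often lacks an obvious ΦQζΦΦ motif, so pattern scanning of this kind cannot replace experimental validation.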

Experimental validation of putative TEDs
To obtain experimental evidence for the apparent abundance and high sequence diversity of TEDs, twelve domains from eight significant human pathogens (Figure 1) were recombinantly expressed in Escherichia coli. In addition to Cpa-TED2, three allelic variants of the S. pyogenes fibronectin-binding protein SfbI (SfbI-A40, SfbI-A346 and SfbI-A20) and its Streptococcus dysgalactiae ortholog GfbA were chosen, since their N-terminal domains, now annotated as TEDs, exert an unexplained differential effect on the mechanism by which mammalian cells take up streptococci (Rohde et al., 2011). The other class-I TEDs expressed originate from the fibronectin-binding protein FbaB of S. pyogenes, and from TIE proteins of Clostridium perfringens, Corynebacterium diphtheriae and Streptococcus pneumoniae. Finally, three class-II TEDs from Bacillus anthracis, vancomycin-resistant Staphylococcus aureus isolate VRS11b (Kos et al., 2012) and multidrug-resistant Peptoclostridium difficile CD630 (formerly Clostridium difficile) (Sebaihia et al., 2006) were also expressed.
These twelve TEDs were purified to homogeneity (Figure 1-figure supplement 2). Each protein displayed an experimental molecular mass, as determined by liquid chromatography-mass spectrometry (LC-MS), ∼17 Da lower than predicted (Table 1). This is consistent with the loss of one molecule of ammonia, as would occur upon internal thioester formation. Seven TEDs were produced as Cys to Ala variants, targeting the Cys of the [YFL]CΦζ motif. For each of these variants, the experimentally determined molecular masses conformed to predicted values (Table 1), confirming that the −17 Da differences observed for the native proteins are attributable to internal thioesters. For class-II TEDs (BaTIE, SaTIE and CdTEP), the presence of internal thioesters and the identities of the thioester-forming Gln were established by tryptic-digest LC-MS/MS of the proteins after reacting the thioester bond with the small nucleophile methylamine (Figure 1-figure supplement 3). This analysis lends confidence to our definition of the domain boundaries of class-II TEDs, which often lack an obvious ΦQζΦΦ motif that would identify the thioester-forming Gln from sequence alone (… Table 2).
Despite pairwise sequence identities as low as 12% (Table 3), a level comparable to that obtained when aligning randomized sequences, the overall structures of all TEDs solved to date are remarkably similar (Figure 2-figure supplement 1A). The three native TED structures determined here show continuous electron density between the Cys and Gln side chains predicted to form the thioesters (SfbI-A40-TED: Cys109-Gln261; PnTIE-TED: Cys94-Gln247; CpTIE-TED: Cys138-Gln267; Figure 2B). Consistent with previous data (Pointon et al., 2010; Linke-Winnebeck et al., 2014), the thioesters are largely buried at the interface between the α-helical and β-barrel subdomains.
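The −17 Da criterion is simple arithmetic on intact masses. The following minimal sketch assumes average masses (NH3 ≈ 17.031 Da) and an arbitrary ±1 Da tolerance; neither is stated in the text, and the example masses are hypothetical rather than values from Table 1. Each Cys-to-Ala variant is compared with its own predicted mass, because the substitution by itself already lowers the predicted mass by about 32 Da.

```python
# ASSUMPTION: intact masses are compared as average masses.
NH3_AVERAGE = 17.031  # Da, average mass of ammonia


def mass_shift(predicted_da: float, observed_da: float) -> float:
    """Observed minus predicted intact mass, in Da."""
    return observed_da - predicted_da


def consistent_with_thioester(predicted_da: float, observed_da: float,
                              tolerance_da: float = 1.0) -> bool:
    """True if the observed mass is about one NH3 lighter than predicted.

    ASSUMPTION: the +/-1 Da tolerance is illustrative; the text gives none.
    """
    return abs(mass_shift(predicted_da, observed_da) + NH3_AVERAGE) <= tolerance_da


# Hypothetical masses for illustration only.
print(consistent_with_thioester(20000.0, 19983.1))  # native TED
print(consistent_with_thioester(19968.0, 19968.2))  # Cys-to-Ala variant
```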
The backbone carbonyl and amide groups of the thioester Cys form hydrogen bonds to the side chains of the Gln, and in some cases the Trp, of the TQxxΦWΦxζ motif in the central helix, but the structures do not reveal an obvious role for this motif in thioester bond formation. One notable difference between the TED structures is the position of the loop between the first two strands of the β-barrel, which lies adjacent to the thioester and results in very different protein surfaces around this region (Figure 2A,C). Given its proximity to the thioester, we suggest that this loop may be involved in interactions that define thioester target specificity, although this has yet to be tested. Furthermore, the α-helical subdomain shows a larger degree of structural variation than the β-barrel subdomain. The structure of CpTIE-TED:Cys138Ala is essentially identical to that of the native protein (Figure 2-figure supplement 1B,C), confirming that internal thioesters are not structural determinants (Walden et al., 2014).

Thioester-dependent adduct formation of SfbI and FbaB with the Aα subunit of fibrinogen
We next tested the hypothesis that TEDs can target host proteins, forming intermolecular covalent bonds in a thioester-dependent manner. Of our panel of twelve TED proteins, the streptococcal protein SfbI is the most thoroughly characterized. SfbI mediates internalization of bacteria by host cells through binding to the extracellular protein fibronectin (Schwarz-Linek et al., 2006). The N-terminal domain of SfbI (revealed here to be a TED) is not strictly required for this process, but defines the uptake mechanism and is a determinant of intracellular bacterial survival (Rohde et al., 2011). It has also been reported to interact with fibrinogen (Katerov et al., 1998). We investigated whether this interaction involves thioester-dependent covalent bond formation by mixing purified TEDs with fibrinogen and analyzing the resulting samples by SDS-PAGE. For all three SfbI-TED variants (SfbI-A40-TED, SfbI-A346-TED, SfbI-A20-TED) and for FbaB-TED, a new band is present at a molecular mass consistent with an adduct of each TED with one fibrinogen subunit (Figure 3A). Interestingly, a group of protein bands corresponding to the heterogeneous fibrinogen Aα chain (Mosesson et al., 1972) is depleted in the adduct samples, most clearly for SfbI-A20-TED under the conditions used here. This suggested that SfbI-TED and FbaB-TED were covalently binding to the Aα chain of fibrinogen. When fibrinogen is incubated with the Cys/Ala variants of the SfbI-TEDs and FbaB-TED, no adduct bands are observed (Figure 3B), suggesting that their formation critically depends on the thioesters. Other TEDs did not form a fibrinogen adduct under the conditions of the assay, indicating that this activity is not a generic, non-specific property of TEDs. Tryptic-digest nanoLC-MS^E of excised adduct gel bands revealed the respective TEDs and the fibrinogen Aα chain as top hits following database searches (Supplementary file 1). Peptides from the … (Figure 3C).

SfbI-TED specifically forms an adduct with fibrinogen Aα in blood plasma
To assess the specificity of SfbI-TED and FbaB-TED binding to fibrinogen, pull-down assays using human blood plasma were performed. TEDs were immobilized using isopeptide domain (IPD) complementation, a strategy that allowed a toolbox approach for the many constructs used in both pull-down and cell binding experiments (described below). IPD complementation relies on spontaneous amide bond formation between a truncated, or split, IPD of the streptococcal FbaB protein that lacks the C-terminal β-strand (sIPD), and a peptide representing this missing strand (Zakeri et al., 2012). The peptide, named here isopep-tag (iPT), is used as an expression tag fused as bait to the C-termini of TEDs (TED-iPT), while the sIPD is immobilized on sepharose beads (Figure 4A). The advantage of this covalent pull-down strategy is that it greatly reduces nonspecific binding. Each of the SfbI-TEDs pulled down fibrinogen from this complex, biologically relevant sample by forming an adduct with fibrinogen Aα, although to differing degrees (Figure 4B; Figure 4-figure supplement 1). For FbaB-TED, little fibrinogen was detectable, and the protein appeared to bind non-covalently to albumin. All TED Cys/Ala variants failed to pull down fibrinogen from plasma. There is no evidence for covalent binding of TEDs to any other protein in plasma pull-downs. Gel bands present in samples eluted from the beads were analyzed by MS and contained sequences matching fibrinogen or the TED-sIPD pull-down constructs. For FbaB, the albumin band contained no TED peptides.

SfbI-TEDs and FbaB-TED specifically target fibrinogen Aα-Lys100
The fibrinogen Aα Lys residue involved in covalent bond formation was identified by searching both low and high trap collision energy nanoLC-MS^E spectra for precursor and fragment ion masses consistent with values calculated for theoretical cross-links. This analysis was carried out for gel bands obtained from experiments using isolated fibrinogen and, for SfbI-A40-TED and SfbI-A20-TED, also for plasma pull-down experiments. Multiple precursor ions and peptide fragments were recovered, all of which were consistent with a single candidate nucleophilic residue, fibrinogen Aα-Lys100 (Lys81 in the mature protein) (Figure 5; Supplementary file 1), which formed covalent links with the TED Gln residues of the ΦQζΦΦ motifs.
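The expected precursor mass of a Gln-Lys cross-linked peptide pair follows from the chemistry described above: thioester formation releases NH3 from the Gln side chain, and aminolysis by the Lys ε-amine then releases the Cys thiol, so the product weighs one NH3 less than the two free peptides. The sketch below computes that mass and the corresponding m/z values. It is an illustration under stated assumptions (monoisotopic masses, no other modifications, hypothetical peptide sequences), not the search software used in this study.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
AMMONIA = 17.026549
PROTON = 1.007276


def peptide_mass(seq: str) -> float:
    """Monoisotopic neutral mass of an unmodified linear peptide."""
    return sum(RESIDUE[aa] for aa in seq) + WATER


def crosslink_mass(gln_peptide: str, lys_peptide: str) -> float:
    """Neutral mass of two peptides joined by a Gln-Lys isopeptide (amide) bond.

    Net change relative to the two unmodified peptides: loss of one NH3.
    """
    return peptide_mass(gln_peptide) + peptide_mass(lys_peptide) - AMMONIA


def mz(neutral_mass: float, charge: int) -> float:
    """m/z of the [M + zH]z+ ion."""
    return (neutral_mass + charge * PROTON) / charge


# Hypothetical tryptic peptides. The cross-linked Lys sits inside its peptide
# because trypsin is not expected to cleave after a modified Lys.
m = crosslink_mass("VAQEIVK", "GSKAR")
for z in (2, 3, 4):
    print(z, round(mz(m, z), 4))
```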
To further support specific targeting of Aα-Lys100 by SfbI-A40-TED, we subjected fibrinogen to acetylation in the absence or presence of SfbI-A40-TED. Acetylation specifically modifies solvent-accessible Lys ε-amine groups. Following proteolytic digestion of bands corresponding to fibrinogen Aα and the SfbI-A40-TED:fibrinogen Aα covalent adduct, we observed near-complete coverage of the Aα sequence (Figure 5-figure supplement 1). Of all Aα Lys residues, only Lys100 was acetylated in the free but not in the bound form of fibrinogen. One other Lys residue, Lys446, was not covered by the analyses of either free or bound fibrinogen. NanoLC-MS^E spectra were scrutinized for potential cross-links between the tryptic peptide containing this Lys residue and TEDs, but no matching precursor peptide masses could be found.
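The acetylation experiment works as a protection assay: acetylation adds 42.0106 Da to each modified Lys, and a Lys that is acetylated in free fibrinogen but not in the adduct is a candidate attachment site, whereas a Lys covered in neither digest (such as Lys446) cannot be assessed. A minimal sketch of this bookkeeping follows; the function name and the toy Lys positions are hypothetical and do not represent the measured data.

```python
def classify_lysines(lysines, covered_free, covered_bound,
                     acetylated_free, acetylated_bound):
    """Classify Lys positions from a differential acetylation experiment.

    protected:     covered in both digests, acetylated in free fibrinogen but
                   not in the TED adduct (candidate covalent attachment site)
    not_protected: covered in both digests but not protected
    unassessed:    not covered in both digests, so no conclusion is possible
    """
    lysines = set(lysines)
    assessable = lysines & set(covered_free) & set(covered_bound)
    protected = assessable & (set(acetylated_free) - set(acetylated_bound))
    return {
        "protected": sorted(protected),
        "not_protected": sorted(assessable - protected),
        "unassessed": sorted(lysines - assessable),
    }


# Hypothetical Lys positions and observations, for illustration only.
result = classify_lysines(
    lysines={5, 12, 40, 77},
    covered_free={5, 12, 40},
    covered_bound={5, 12, 40},
    acetylated_free={5, 12, 40},
    acetylated_bound={5, 40},
)
print(result)
```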
All available evidence supports reaction of SfbI and FbaB TEDs exclusively with fibrinogen Aα-Lys100. If binding were not specific for a single Lys, we would expect multiple TEDs to bind to different Lys residues on a single fibrinogen molecule. We do not see evidence for such higher-order complexes in SDS-PAGE gels. Further, if only one TED could bind covalently to each fibrinogen molecule, but through different Lys residues, this would result in a mixed population of cross-linked species with the same molecular mass but different acetylation and tryptic digest patterns. Our exhaustive MS analyses provide no evidence for such events and support modification of Aα-Lys100 only.
Table 3 footnote: Bold values correspond to pairs of TEDs with known structures. Values highlighted in grey indicate that a pairwise alignment was not meaningful; the values given correspond to pairwise identities calculated from the alignment of 54 TEDs (Figure 1-figure supplement 1). Alignments of randomized sequences commonly resulted in pairwise identities of 10-20%. Pairwise alignments were produced with BioEdit using a GONNET similarity matrix. DOI: 10.7554/eLife.06638.011
Aα-Lys100 is presented on the surface of the α-helical coiled-coil region of fibrinogen (Figure 5-figure supplement 2). It does not participate in the cross-linking of fibrinogen that results in fibrin formation (Sobel and Gawinowicz, 1996), but is a plasmin cleavage site (Kirschbaum and Budzynski, 1990). This suggests that SfbI-TEDs may also bind to fibrin, with possible implications for fibrinolysis. Aα-Lys100 also lies in the vicinity of the integrin-binding RGDF motif of fibrinogen involved in platelet interactions (Ugarova et al., 1993).

Thioester-dependent bacterial binding to fibrin
To determine whether thioester-mediated binding to fibrinogen plays a role for bacteria, strains of the model Gram-positive bacterium Lactococcus lactis expressing either SfbI-A40 or the corresponding SfbI-A40:Cys109Ala variant were produced. Immunogold labeling confirmed the presence of wild-type and variant SfbI at similar levels on the surface of L. lactis (Figure 6-figure supplement 1). Since fibrinogen Aα-Lys100 is not involved in fibrin formation, it was possible to use a fibrin-based assay to visualize bacterial binding by electron microscopy. L. lactis expressing SfbI-A40, but not bacteria expressing the Cys109Ala variant, showed intimate adherence to both single fibrin fibrils (Figure 6) and fibrin clots, the latter reminiscent of biofilms (Figure 6-figure supplement 2). Similar results were obtained for L. lactis expressing SfbI-A20 and SfbI-A20:Cys97Ala (Figure 6-figure supplement 2).

SfbI-TED binding to human cell surfaces under conditions mimicking inflammation
Fibrinogen is an abundant plasma protein produced mainly by hepatocytes and required for hemostasis. It is, however, also produced by extrahepatic epithelial cells, and is present in the extracellular matrix (Pereira et al., 2002), in the provisional matrix of wound healing (Clark et al., 1982) and in inflamed tissues (Lawrence and Simpson-Haidaris, 2004). Fibrinogen expression is upregulated during the inflammatory acute phase response in hepatocytes, and also in epithelial cells by the synergistic action of corticosteroids and interleukin-6 (Snyers et al., 1990; Haidaris, 1997). This inflammatory response can be induced in cell culture by incubation with dexamethasone and interleukin-6. Under these conditions, human A549 lung epithelial cells express and secrete fibrinogen, which remains associated with cell surfaces (Guadiz et al., 1997). We investigated whether SfbI-TEDs interact with such induced A549 cells. SfbI-TED binding to unfixed, viable, adherent cells was visualized by conjugating TED-iPT constructs via IPD complementation to GFP fused to sIPD (Zakeri et al., 2012) (Figure 7A). SfbI-A40-TED bound to A549 cells only after they had been pre-incubated with dexamethasone and interleukin-6 (Figure 7B). SfbI-A40-TED:Cys109Ala binding to A549 cells was not detectable with or without induction. To obtain direct evidence for fibrinogen binding, Western blot analyses of cell homogenates using an anti-fibrinogen α antibody were performed. These confirmed upregulation of fibrinogen expression following exposure of cells to dexamethasone/interleukin-6, in agreement with published data (Snyers et al., 1990; Haidaris, 1997). Critically, in homogenates from induced cells incubated with SfbI-A40-TED, a second band is detected at a molecular mass consistent with an SfbI-A40-TED:fibrinogen Aα adduct (Figure 7C). The absence of this band in homogenates from induced cells incubated with SfbI-A40-TED:Cys109Ala confirms that formation of this adduct depends on the thioester.
These data show that SfbI-A40-TED adherence to A549 cells under conditions mimicking inflammation depends on thioester-mediated covalent binding to cell surface-associated fibrinogen.
Pathogenic bacteria disseminated in the blood stream directly or indirectly target fibrinogen and fibrin, resulting in immune evasion, induction of an inflammatory response, or platelet activation (Sun, 2006; Rivera et al., 2007). Two well-studied examples of bacterial fibrinogen-binding proteins involved in invasive infections are clumping factor A of S. aureus (Ganesh et al., 2008) and the M1 protein of S. pyogenes (Macheboeuf et al., 2011). In this study we have not investigated the role of SfbI binding to fibrinogen during bacterial infection. However, S. pyogenes is known to target epithelial and endothelial cells for adhesion and invasion, and to cause invasive infections involving dissemination in the blood stream. Our experiments using fibrin and epithelial cells therefore represent meaningful biological models to probe the role of thioester-dependent covalent interaction in infections. Perhaps most significantly, our data suggest that fibrinogen may have an unappreciated role as a host cell surface-associated adhesion target for bacteria. Through its fibrinogen-binding activity, SfbI may confer a tropism for inflamed tissue or the provisional matrix of healing wounds.
To mediate strong attachment, bacterial adhesion complexes typically present extensive, multivalent binding interfaces. Our results show that a wide and diverse range of Gram-positive bacteria have evolved a mechanism for covalent bond formation with specific host factors via a pathogen-encoded reactive internal thioester. Thioester proteins could form a continuous covalent bridge between the bacterial cell envelope and host adhesion targets. This would provide rapid, mechanically very strong attachment that may be particularly advantageous under conditions of shear stress. The lack of similarity between TED amino acid sequences, together with the observation that only a small subset of the TEDs tested here bind fibrinogen, suggests that these proteins have undiscovered targets. Thioester-dependent covalent adhesion may be a common molecular mechanism in host-microbe interactions, playing a key role in host colonization by both commensal and pathogenic Gram-positive bacteria. As such, TEDs are potentially attractive targets for the design of small molecules to inhibit infection, but they also present an opportunity to engineer beneficial interactions. TEDs hold promise as reactive protein fusion tags for use in tissue engineering, diagnostics or as tools for cell and molecular biology. While we think it is likely that many thioester-containing bacterial proteins will play a role in host-microbe interactions, it is also possible that they function in other processes, such as bacteria-bacteria interactions and biofilm formation.

Expression vectors for protein production
Five E. coli expression vectors were used to encode TED proteins: pOPIN-F, pOPIN-E, pHisTEV, pDEST and pDEST-iPT. Expression from pOPIN-F results in an N-terminal His6-tag cleavable by 3C protease, and expression from pOPIN-E in a non-cleavable C-terminal His6-tag (Berrow et al., 2007). pHisTEV is derived from pET30a (Liu and Naismith, 2009) and pDEST from pDEST14. Both incorporate an N-terminal His6-tag cleavable by TEV protease. The pDEST-iPT vector is derived from pDEST with a C-terminal extension consisting of a trypsin cleavage site followed by the iPT sequence AHIVMVDAYK, representing the C-terminal β-strand of the FbaB CnaB2 domain (Zakeri et al., 2012).
The sIPD construct was derived from pDEST-Cna114 (residues 1-113 of the FbaB CnaB2 in pDEST14) (Zakeri et al., 2012). This was shortened to residues 22-107 by deletion of residues 1-21 and mutation of Lys108 to a stop codon. A Cys residue was added N-terminal to the His6-tag, allowing immobilization on a solid support. The sIPD-GFP construct was generated by insertion of a BamHI site between the His6-tag and sIPD, and cloning of GFP (residues 2-238) between BamHI and NcoI. All primers used for this study are detailed in Supplementary file 2.
For proteins expressed with a cleavable N-terminal His6-tag (pDEST, pHisTEV, pOPIN-F), clarified lysate was applied to a Ni2+-IMAC column (GE Healthcare, UK) and bound proteins were step eluted with buffer (as above) supplemented with 500 mM imidazole. The eluate was loaded onto a HiLoad 26/60 Superdex 75 gel filtration column (GE Healthcare) pre-equilibrated with 20 mM HEPES (pH 7.5), 150 mM NaCl. Fractions containing TED proteins were incubated with the appropriate protease (1:50, wt/wt) at 4°C for 16-20 hr. Imidazole was added to 40 mM and the sample applied to a Ni2+-immobilized column. Cleaved TED protein was collected in the flow-through, which was concentrated and injected onto the Superdex 75 column pre-equilibrated with either 20 mM HEPES (pH 7.5), 150 mM NaCl, or 20 mM bis-tris (pH 6.0), 150 mM NaCl. Fractions containing purified TED proteins were concentrated to ∼5-10 mg/ml for binding assays and to ∼10-30 mg/ml for crystallization. Protein concentration was determined by A280.
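Concentration from A280 follows the Beer-Lambert law, c = A280/(ε·l). As a hedged sketch, the block below estimates ε from the sequence using the widely used Trp/Tyr/cystine coefficients of Pace et al. (1995); the text does not state how ε was obtained, so this choice, the 1 cm path length and the example values are assumptions. TEDs are assumed here to lack disulfides (the thioester is not a cystine), and any absorbance contributed by the thioester bond itself is ignored.

```python
# Molar absorption coefficients at 280 nm (M^-1 cm^-1), after Pace et al. (1995).
EPS_TRP = 5500
EPS_TYR = 1490
EPS_CYSTINE = 125  # per disulfide bond, not per free Cys


def extinction_coefficient(sequence: str, n_cystines: int = 0) -> int:
    """Estimate epsilon(280 nm) from Trp and Tyr counts plus any disulfides."""
    return (EPS_TRP * sequence.count("W")
            + EPS_TYR * sequence.count("Y")
            + EPS_CYSTINE * n_cystines)


def concentration_mg_per_ml(a280: float, sequence: str, mw_da: float,
                            path_cm: float = 1.0, n_cystines: int = 0) -> float:
    """Beer-Lambert: c = A / (epsilon * l) in mol/l; times MW (g/mol) gives g/l = mg/ml."""
    molar = a280 / (extinction_coefficient(sequence, n_cystines) * path_cm)
    return molar * mw_da


# Hypothetical example: a 20 kDa domain containing 2 Trp and 5 Tyr, A280 = 0.9.
seq = "W" * 2 + "Y" * 5  # stand-in for a real sequence; only W/Y counts matter here
print(round(concentration_mg_per_ml(0.9, seq, 20000.0), 2))
```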
For proteins expressed with a non-cleavable C-terminal His6-tag (pOPIN-E), only the initial two-step purification by Ni2+-IMAC and gel filtration chromatography was carried out. The gel filtration buffer was 20 mM bis-tris (pH 6.0), 150 mM NaCl.
TED proteins for cell binding assays and pull-downs from plasma were expressed from pDEST-iPT. Cell pellets were resuspended in phosphate-buffered saline (PBS) (pH 6.0), 10 mM imidazole, with one EDTA-free protease inhibitor cocktail tablet (Roche, UK) per 50 ml of buffer. Cells were lysed by sonication. Clarified lysate was applied to a Ni2+-IMAC column and bound proteins were eluted with PBS (pH 6.0), 250 mM imidazole. The eluate was dialyzed against PBS (pH 6.0) at 4°C to remove imidazole. The His6-tag was cleaved with His6-tagged TEV protease (1:50, wt/wt) for 16 hr at 4°C. Imidazole was added to 20 mM and the sample applied to a Ni2+-immobilized column. Purified TED proteins were collected in the flow-through and concentrated to 2-4 mg/ml, as determined by A280.
The sIPD and sIPD-GFP proteins were expressed from pDEST and purified as above, except that PBS (pH 7.2) was used throughout and the His6-tag was not removed.

Intact mass spectrometry analyses
Protein intact masses were determined by LC-MS on a Synapt G2 mass spectrometer coupled to an Acquity UPLC system (Waters, Milford, MA, USA). 50-100 pmol of protein were injected onto an Aeris WIDEPORE 3.6 μm C4 column (Phenomenex, UK) and eluted with a 10-90% acetonitrile gradient over 13 min (0.4 ml/min). The spectrometer was controlled by MassLynx 4.1 software (Waters) and operated in positive MS-TOF and resolution mode with a capillary voltage of 2 kV and a cone voltage of 40 V. Leu-enkephalin peptide (2 ng/ml, Waters) was infused at 10 μl/min as a lock mass and measured every 30 s. Spectra were generated in MassLynx 4.1 by combining scans and were deconvoluted using the MaxEnt1 tool (Waters).
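Deconvolution converts a series of multiply charged ions into a single neutral mass. MaxEnt1 uses a maximum-entropy algorithm; the sketch below does not reproduce it, but shows the underlying charge-state arithmetic for two adjacent peaks of an [M + zH]z+ envelope, assuming a proton mass of 1.007276 Da and hypothetical peak values.

```python
PROTON = 1.007276  # mass of a proton, Da


def neutral_mass(mz: float, charge: int) -> float:
    """Neutral mass M of an ion observed as [M + zH]z+ at the given m/z."""
    return charge * (mz - PROTON)


def charge_from_adjacent_peaks(mz_z: float, mz_z_plus_1: float) -> int:
    """Charge z of the higher-m/z peak of two neighbouring charge states.

    M = z * (m1 - p) = (z + 1) * (m2 - p)  =>  z = (m2 - p) / (m1 - m2)
    """
    return round((mz_z_plus_1 - PROTON) / (mz_z - mz_z_plus_1))


# Hypothetical peaks of a 20,000 Da protein at charges 15+ and 16+.
m1 = 20000.0 / 15 + PROTON
m2 = 20000.0 / 16 + PROTON
z = charge_from_adjacent_peaks(m1, m2)
print(z, round(neutral_mass(m1, z), 3))
```

Real spectra contain many charge states and noise, which is why dedicated deconvolution tools combine evidence across the whole envelope rather than relying on a single peak pair.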

Identification of Gln residues forming a thioester bond
BaTIE-TED, SaTIE-TED and CdTEP-TED (40 μl at 1 mg/ml in PBS) were incubated with 6 μl of 2 M methylamine (in PBS), pH 7.6-8.0, at 25°C for 15 min. The pH was adjusted to 7.0, 4 μl of 200 mM iodoacetamide (IAA) was added, and the mixture was incubated for 60 min at 25°C in darkness. 4 μl of 200 mM DTT was added to quench excess IAA.
Protein solutions (5 μl) were dialyzed into 50 mM ammonium bicarbonate, trypsin was added, and the samples were incubated at 37°C overnight. The acidified peptides were separated on an Acclaim PepMap 100 C18 trap and RSLC C18 column (Thermo Fisher Scientific, UK) using a nanoLC Ultra 2D plus loading pump and nanoLC as-2 autosampler (Eksigent, UK), and were eluted with a gradient of increasing acetonitrile. The eluent was sprayed into a TripleTOF 5600 electrospray tandem mass spectrometer (ABSciex, UK) and analyzed in Information Dependent Acquisition mode, performing 250 ms of MS followed by 100 ms of MS/MS analyses on the 20 most intense peaks. The MS/MS data file generated in PeakView (ABSciex) was analyzed with the Mascot algorithm (Matrix Science, UK) against an internal database containing the TED sequences, using an error-tolerant search that considers all modifications in Unimod.
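The peptides searched against the TED sequences come from tryptic digestion. A minimal in-silico digest can be sketched with a simplified cleavage rule (C-terminal to Lys or Arg, but not before Pro). This is an illustration only: it does not reproduce Mascot's enzyme definitions or scoring, and the example sequence is hypothetical.

```python
# Sketch: in-silico tryptic digest with a simplified cleavage rule.
# Cleaves after K/R unless the next residue is P; not Mascot's definition.


def tryptic_peptides(sequence: str, missed_cleavages: int = 0) -> list[str]:
    """Return tryptic peptides, allowing up to `missed_cleavages` internal sites."""
    if not sequence:
        return []
    sites = [
        i + 1
        for i in range(len(sequence) - 1)
        if sequence[i] in "KR" and sequence[i + 1] != "P"
    ]
    bounds = [0] + sites + [len(sequence)]
    peptides = []
    for start in range(len(bounds) - 1):
        # Each peptide may span up to 1 + missed_cleavages consecutive segments.
        last = min(start + 1 + missed_cleavages, len(bounds) - 1)
        for end in range(start + 1, last + 1):
            peptides.append(sequence[bounds[start]:bounds[end]])
    return peptides


if __name__ == "__main__":
    # Hypothetical sequence: the K-P bond is not cleaved.
    print(tryptic_peptides("AKPGRSTKLR"))
    print(tryptic_peptides("AKPGRSTKLR", missed_cleavages=1))
```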
Final models were produced through iterative rounds of refinement using REFMAC5 and manual rebuilding with COOT (Emsley et al., 2010). Translation-Libration-Screw (TLS) and non-crystallographic symmetry restraints were used for CpTIE-TED, whereas only TLS was used for CpTIE-TED:Cys138Ala. TLS groups were based on the α-helical and β-strand subdomains: the β-strand domain comprises Ser100-Gly162 and Pro251-Thr276, and the α-helical domain comprises Ser163-Ile250. In chain C of the CpTIE-TED:Cys138Ala model (PDB: 5a0d), the electron density showed that the α-helical domain is somewhat flexible and can be modeled in two distinct positions. For the SfbI-A40-TED and PnTIE-TED data, anisotropic B-factor refinement was used. Data collection, phasing and refinement statistics are shown in Table 2.
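The subdomain boundaries used for the CpTIE-TED TLS groups can be written down as residue ranges, which makes it easy to check which group a given residue (for example Cys138) falls into. The dictionary layout and function name below are illustrative only and are not REFMAC5 input syntax.

```python
# Sketch: CpTIE-TED subdomain boundaries used for TLS groups.
# Layout and names are illustrative, not REFMAC5 TLS input format.
from typing import Optional

SUBDOMAINS = {
    "beta-strand": [(100, 162), (251, 276)],  # Ser100-Gly162, Pro251-Thr276
    "alpha-helical": [(163, 250)],            # Ser163-Ile250
}


def subdomain_of(residue_number: int) -> Optional[str]:
    """Return the subdomain containing a residue, or None if outside both."""
    for name, ranges in SUBDOMAINS.items():
        if any(lo <= residue_number <= hi for lo, hi in ranges):
            return name
    return None


if __name__ == "__main__":
    print(subdomain_of(138))  # thioester Cys position in CpTIE-TED
```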

Fibrinogen binding assays
Lyophilized human fibrinogen (Sigma, UK) was reconstituted to 2 mg/ml in reaction buffer (20 mM HEPES [pH 7.5], 150 mM NaCl). TED protein purified in 20 mM bis-tris (pH 6.0), 150 mM NaCl was diluted to 10 μM in reaction buffer. Equal volumes of fibrinogen and TED were mixed and left at room temperature for 60 min. In control reactions, either fibrinogen or TED was replaced with reaction buffer. Samples were analyzed by SDS-PAGE, and bands corresponding to the TED-fibrinogen Aα adduct were excised for analysis.
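The stock concentrations above translate into the following molar amounts in the binding reaction. This sketch assumes an approximate fibrinogen molecular weight of 340 kDa and uses the fact that each fibrinogen molecule contains two Aα chains; the molecular weight is an assumption, not a value reported here.

```python
# Sketch: molar concentrations in the fibrinogen binding reaction.
# Assumption: fibrinogen MW ~340 kDa (approximate value).

FIBRINOGEN_MW_DA = 340_000
AALPHA_CHAINS_PER_FIBRINOGEN = 2


def mg_per_ml_to_um(conc_mg_per_ml: float, mw_da: float) -> float:
    """mg/ml equals g/l; dividing by g/mol gives mol/l; x1e6 gives uM."""
    return conc_mg_per_ml / mw_da * 1e6


fibrinogen_stock_um = mg_per_ml_to_um(2.0, FIBRINOGEN_MW_DA)
ted_stock_um = 10.0

# Mixing equal volumes halves both concentrations.
fibrinogen_final_um = fibrinogen_stock_um / 2
ted_final_um = ted_stock_um / 2
aalpha_final_um = AALPHA_CHAINS_PER_FIBRINOGEN * fibrinogen_final_um
ted_to_fibrinogen = ted_final_um / fibrinogen_final_um

if __name__ == "__main__":
    print(f"fibrinogen {fibrinogen_final_um:.2f} uM, Aa chains {aalpha_final_um:.2f} uM, "
          f"TED {ted_final_um:.2f} uM, TED:fibrinogen {ted_to_fibrinogen:.2f}")
```

Under these assumptions, TED and fibrinogen Aα chains are present at comparable molar concentrations in the mixture.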

Pulldown of TED-fibrinogen complexes from plasma
sIPD protein was immobilized on SulfoLink Coupling Resin (Thermo Fisher Scientific) equilibrated in 50 mM Tris, 5 mM EDTA, pH 8.5, by reaction at 37°C for 45 min. Unreacted iodoacetyl groups were quenched with 50 mM L-cysteine in PBS (pH 7.2) at 37°C for 30 min. After washing (PBS, pH 7.2), the resin was incubated with iPT-TED at 37°C for 30 min. Excess iPT-TED was removed by washing (PBS). The resin was incubated with 1 ml plasma (TCS Biosciences, UK; human plasma mixed pool, citrated) at 37°C for 60 min. For SfbI-A20-TED, prior to this step, the plasma was partially depleted of albumin by applying it to a 1 ml HiTrap Blue Sepharose column (GE Healthcare) pre-equilibrated in PBS (pH 7.2). Non-covalently bound proteins were removed by washing (PBS). Protein complexes were cleaved from the resin using TEV protease in PBS (pH 7.2), 1 mM DTT, 0.5 mM EDTA, at 37°C for 60 min. Samples were analyzed by SDS-PAGE, and bands corresponding to the TED-fibrinogen Aα adduct were excised for analysis.

Covalent TED-fibrinogen binding mapped by mass spectrometry
Excised bands from TED-fibrinogen and TED-plasma binding reactions were digested with trypsin according to standard procedures. Peptides were extracted and analyzed by nanoLC-MSᴱ on a Synapt G2 mass spectrometer coupled to a nanoAcquity UPLC system. Peptides were trapped on a precolumn (Symmetry C18, 5 μm, 180 μm × 20 mm, Waters), which was switched in-line to an analytical column (BEH C18, 1.7 μm, 75 μm × 250 mm, Waters) for separation. Peptides were eluted with an 8-50% acetonitrile gradient in water/0.1% formic acid at 0.75% per minute (250 nl/min). The column was connected to a 10 μm SilicaTip nanospray emitter (New Objective, Woburn, MA, USA) for infusion into the mass spectrometer. [Glu¹]-fibrinopeptide B (1 pmol/μl, Sigma) was infused at 0.5 μl/min as a lock mass and measured every 30 s. The spectrometer was controlled by MassLynx 4.1 software (Waters) and operated in positive-ion MSᴱ mode (sensitivity mode) with a capillary voltage of 3 kV and a cone voltage of 40 V. The scan time was 1 s over an m/z range of 50-2000. For the low-energy scan the trap collision energy was off, and for the high-energy scan the trap collision energy (CE) was ramped from 20 to 60 V. For de novo identification of proteins, raw files were processed in ProteinLynx Global Server 2.5.2 (Waters), including a search against a Homo sapiens protein database to which the TED protein sequences had been added.
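The gradient settings above imply the ramp duration and eluent volume computed below. This short sketch covers only the 8-50% ramp; trapping, wash and re-equilibration steps are not described in the text and are ignored.

```python
# Sketch: duration and volume of the 8-50% acetonitrile ramp.
# Ignores trapping, wash and re-equilibration (not described above).


def gradient_minutes(start_pct: float, end_pct: float,
                     slope_pct_per_min: float) -> float:
    """Time for a linear gradient to go from start_pct to end_pct."""
    return (end_pct - start_pct) / slope_pct_per_min


minutes = gradient_minutes(8.0, 50.0, 0.75)
flow_nl_per_min = 250
volume_ul = minutes * flow_nl_per_min / 1000  # nl -> ul

if __name__ == "__main__":
    print(f"{minutes:.0f} min ramp, {volume_ul:.1f} ul eluent")
```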
Acquisition in MSᴱ mode uses alternating low- and high-CE scans and generates two separate chromatograms. To detect the TED-fibrinogen Aα cross-linked peptides, extracted ion chromatograms were generated from the high-CE trace in MassLynx 4.1 and inspected for characteristic y-ions expected from C-terminal TED-peptide fragments starting with proline. Peptides preferentially fragment N-terminal to proline residues, and the sequences searched for here were Pro (116.071 Da, SfbI-A40-TED-fibrinogen), Pro-Lys (244.166 Da, SfbI-A346-TED- and SfbI-A20-TED-fibrinogen) and Pro-Gly-Ser-Arg (416.226 Da, SfbI-A40-TED- and SfbI-A20-TED-plasma). Detected peaks for these y-ions were aligned with the corresponding low-CE traces, which reliably led to the detection of masses consistent with the cross-linked precursor peptides. In some cases we also searched the low-CE trace for masses consistent with all possible combinations of tryptic fibrinogen Aα peptides cross-linked with the relevant TED peptides, but found no evidence for additional combinations. Precursor spectra and high-CE fragment spectra were inspected for the presence of characteristic fragment ions of the identified cross-linked peptides.
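The diagnostic y-ion values listed above follow from summing monoisotopic residue masses, adding water and one proton. The sketch below reproduces them; computed with the proton mass, they agree with the quoted values to within about 0.001 Da. The 0.01 Da matching tolerance in `matches` is a hypothetical choice for illustration, not a setting used in this study.

```python
# Sketch: singly charged y-ion m/z values for the Pro-initiated
# C-terminal TED fragments, from monoisotopic residue masses.

RESIDUE_MONO = {
    "G": 57.02146,
    "S": 87.03203,
    "P": 97.05276,
    "K": 128.09496,
    "R": 156.10111,
}
WATER = 18.010565
PROTON = 1.007276


def y_ion_mz(fragment: str, charge: int = 1) -> float:
    """m/z of a y-ion: residue masses + H2O + charge protons, over charge."""
    neutral = sum(RESIDUE_MONO[aa] for aa in fragment) + WATER
    return (neutral + charge * PROTON) / charge


def matches(observed: float, theoretical: float, tol_da: float = 0.01) -> bool:
    """Absolute-tolerance check (hypothetical tolerance) for a diagnostic ion."""
    return abs(observed - theoretical) <= tol_da


if __name__ == "__main__":
    for frag in ("P", "PK", "PGSR"):
        print(frag, round(y_ion_mz(frag), 4))
```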
The lysine-acetylated TED-fibrinogen complex was generated by incubating 20 μM fibrinogen with a fivefold molar excess of SfbI-A40-TED at 37°C for 1 hr. Sulfo-NHS-acetate (Thermo Fisher Scientific) was then added to a concentration of 10 mM and the reaction mixture was incubated at 37°C for 2 hr. The sample components were separated by SDS-PAGE, and the TED-fibrinogen Aα adduct bands were excised for analysis by MS. For comparison, lysine-acetylated fibrinogen bands were produced by the same protocol, omitting the TED reaction step.
Excised gel bands were digested with either trypsin or endoproteinase GluC or, in the case of the TED-fibrinogen adduct, chymotrypsin. The extracted peptides were analyzed by nanoLC-MS/MS using the same protocol as for the identification of Gln residues involved in thioester bond formation. Peptides with charges of +2 to +5 and an ion count above 150 were selected for MS/MS and then excluded from further selection for 15 s. A rolling CE was applied to fragment the peptides. The data were searched using the Mascot algorithm against an in-house database containing fibrinogen sequences. Carbamidomethylation of cysteines was set as a fixed modification; acetylation of lysine and oxidation of methionine were set as variable modifications. The MS tolerance was ±20 ppm and the MS/MS tolerance ±0.1 Da.
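The search settings above combine a relative precursor tolerance (ppm) with an absolute fragment tolerance (Da). The sketch below shows the difference and lists the monoisotopic masses of the three modifications; the function names are illustrative and do not correspond to any Mascot API.

```python
# Sketch: precursor (ppm) and fragment (Da) tolerance checks, plus the
# modification mass deltas named in the search settings.

CARBAMIDOMETHYL_C = 57.021464  # fixed, on Cys
ACETYL_K = 42.010565           # variable, on Lys (C2H2O)
OXIDATION_M = 15.994915        # variable, on Met (O)


def ppm_error(observed: float, theoretical: float) -> float:
    """Relative mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def precursor_matches(observed: float, theoretical: float,
                      tol_ppm: float = 20.0) -> bool:
    return abs(ppm_error(observed, theoretical)) <= tol_ppm


def fragment_matches(observed: float, theoretical: float,
                     tol_da: float = 0.1) -> bool:
    return abs(observed - theoretical) <= tol_da


if __name__ == "__main__":
    # At m/z 1000, a 20 ppm window corresponds to +/-0.02.
    print(precursor_matches(1000.015, 1000.0), precursor_matches(1000.025, 1000.0))
```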

Lactococcus heterologous protein expression
L. lactis was grown at 30°C in M17 medium plus 0.5% glucose (GM17). S. pyogenes was grown at 37°C in Todd-Hewitt broth plus 0.5% yeast extract. Where appropriate, antibiotics were added: erythromycin at 3 mg/l for L. lactis and 400 mg/l for E. coli.
Chromosomal DNA from S. pyogenes A40 was prepared as described previously (Bergmann et al., 2014) as a template for amplification of the sfbI-A40 gene. Amplified DNA was inserted into the shuttle vector pOri23 (Que et al., 2000) and transformed into E. coli XL1-Blue. Site-directed mutagenesis was used to create SfbI-A40:Cys109Ala plasmid DNA. 1 μg of plasmid was used to transform L. lactis by electroporation (Holo and Nes, 1989).
L. lactis carrying SfbI-A40, SfbI-A40:Cys109Ala or the empty vector pOri23 was grown in GM17 and fixed in the growth medium with 2% formaldehyde. After free aldehydes were quenched with 10 mM glycine, samples were incubated with purified polyclonal anti-SfbI IgG antibodies for 60 min at 37°C, washed, and incubated with protein A-gold nanoparticles (15 nm) for 30 min at 37°C. After washing, samples were fixed in 2% glutaraldehyde, adsorbed onto Butvar-coated 300-mesh grids, washed in distilled water, air-dried and examined in a Zeiss Merlin scanning electron microscope at an acceleration voltage of 10 kV using the high-efficiency Everhart-Thornley SE detector.

Lung epithelial cell-binding assays
Human alveolar basal epithelial (A549) cells were propagated in Dulbecco's modified Eagle medium (DMEM; Life Technologies, Thermo Fisher Scientific) supplemented with 10% (vol/vol) fetal bovine serum (FBS, Thermo Fisher Scientific), 100 U/ml penicillin and 50 μg/ml streptomycin at 37°C in a humidified atmosphere of 5% CO₂. Cells were seeded at a density of 4 × 10⁴ cells/well and grown to approximately 60% confluency.
An inflammatory response was induced in half of the cell cultures by incubating cell monolayers with 0.39 mg/ml dexamethasone (Sigma) and 50 ng/ml human recombinant interleukin-6 (Cambridge Bioscience, UK) in DMEM supplemented with 10% FBS. After 48 hr, cells were chilled at 4°C for 30 min and then incubated for 30 min at 4°C with 1 mg/ml SfbI-A40-TED-iPT or SfbI-A40-TED:Cys109Ala-iPT in PBS, or with PBS alone. Cells were fixed with pre-chilled (−20°C) methanol for 20 min at −20°C and then stained with 1 mg/ml sIPD-GFP in PBS for 30 min at 25°C. Nuclei were stained with 0.5 μg/ml DAPI (15 min, 25°C). Cell monolayers were washed extensively with sterile PBS between incubations. After 24 hr at 4°C, cells were imaged on an EVOS Digital Inverted Microscope (Life Technologies, Thermo Fisher Scientific).
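For comparison with protocols that state dexamethasone doses in molar units, the mass concentration above can be converted as sketched below. The sketch assumes a dexamethasone molecular weight of 392.46 g/mol (C₂₂H₂₉FO₅); it is a unit conversion only.

```python
# Sketch: convert the dexamethasone mass concentration to molarity.
# Assumption: dexamethasone MW 392.46 g/mol (C22H29FO5).

DEXAMETHASONE_MW = 392.46  # g/mol


def mg_per_ml_to_mm(conc_mg_per_ml: float, mw_g_per_mol: float) -> float:
    """mg/ml equals g/l; dividing by g/mol gives mol/l; x1000 gives mM."""
    return conc_mg_per_ml / mw_g_per_mol * 1000


dex_mm = mg_per_ml_to_mm(0.39, DEXAMETHASONE_MW)

if __name__ == "__main__":
    print(f"{dex_mm:.3f} mM dexamethasone")
```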

Western blot
A549 cells were grown, induced and incubated with wild-type or Cys109Ala mutant TEDs as described above. Cells were homogenized in lysis buffer (8 M urea, 5% SDS, 10% β-mercaptoethanol) by sonication in an ultrasonic water bath at 4°C for 45 min. Cell homogenate (10 μl) was separated by SDS-PAGE on a 12% acrylamide gel (Bio-Rad) in Tris-glycine buffer (2.5 mM Tris, 19.2 mM glycine, 0.01% SDS, pH 8.3). The proteins were then transferred to a BioTrace PVDF membrane (PALL Gelman Laboratory) at 100 V for 1 hr in Tris-glycine buffer containing 20% (vol/vol) methanol. The membrane was subsequently washed with PBS containing 0.1% (vol/vol) Tween 20 (Sigma), blocked for 1 hr with 5% (wt/vol) BSA in PBS-Tween, and incubated for 12 hr at 4°C with primary anti-fibrinogen α antibody (C-7, mouse IgG; Santa Cruz Biotechnology) diluted 1:10,000. The membrane was washed three times for 10 min with PBS-Tween and incubated for 1 hr at room temperature with secondary IRDye 800CW goat anti-mouse IgG (LI-COR Biosciences) diluted 1:20,000. After three 10-min washes with PBS-Tween, the membrane was imaged (Odyssey CLx imaging system, LI-COR Biosciences). The same membrane was subsequently incubated for 1 hr at room temperature with primary anti-β-actin mouse antibody (Sigma) diluted 1:10,000, washed three times for 10 min with PBS-Tween, incubated for 1 hr with secondary IRDye 800CW goat anti-mouse IgG diluted 1:20,000, and re-imaged.