Enhanced functionalisation of major facilitator superfamily transporters via fusion of C-terminal protein domains is both extensive and varied in bacteria

The evolution of gene fusions that result in covalently linked protein domains is widespread in bacteria, where spatially coupling domain functionalities can have functional advantages in vivo. Fusions to integral membrane proteins are less widely studied but could provide routes to enhance membrane function in synthetic biology. We studied the major facilitator superfamily (MFS), as the largest family of transporter proteins in bacteria, to examine the extent and nature of fusions to these proteins. A remarkably diverse variety of fusions are identified and the eight most abundant examples are described, including additional enzymatic domains and a range of sensory and regulatory domains, many not previously described. Significantly, these fusions are found almost exclusively as C-terminal fusions, revealing that the usually cytoplasmic C-terminal end of MFS proteins would be the permissive end for engineering synthetic fusions to other cytoplasmic proteins.
Received 31 August 2018; Accepted 21 December 2018; Published 18 January 2019
Author affiliation: Department of Biology, University of York, Wentworth Way, York YO10 5DD, UK.
*Correspondence: Gavin H. Thomas, gavin.thomas@york.ac.uk

In nature, proteins commonly consist of multiple functional domains. This allows the combination of multiple functions into a single protein, with fusions of catalytic domains potentially facilitating catalysis via substrate channeling [1], or fusions of regulatory domains allowing post-translational control of activity [2]. Identification of these fusions is of interest as they may allow the identification of functional relationships between proteins through the Rosetta Stone method [3] or provide insights that enable optimisation of the development of synthetic fusions [4].
Membrane transporters are integral membrane proteins involved in the active transport of chemicals across the lipid bilayer. Many transporter families have been identified, and the strategies used for transport are diverse; transport can be linked to ATP hydrolysis or to utilisation of proton or Na+-ion gradients across the membrane [5]. A number of transport proteins are fused to other domains; these can be catalytic or regulatory, and can be located at the N-terminus, C-terminus, or internally [6]. The value of studying these fusions to elucidate the function of transporters is well-established, and has enabled discoveries such as the LplT lysophospholipid transporter [7]. This transporter was identified through its fusion to domains involved in lysophospholipid repair, and follow-up experiments confirmed a role in lysophospholipid transport. A subsequent analysis by Barabote et al. [6] in 2006 identified several novel fusions from a range of transporter families and made several novel predictions of function. Since then, thousands of bacterial and archaeal genomes have been sequenced, providing a rich resource to mine for additional examples of fusions and assess systematically how widespread these are in biology.
For this study, we wished to identify the range and type of fusions seen in the major facilitator superfamily (MFS), the largest family of secondary transporters in bacteria. This is an ideal starting point as the vast majority of MFS transporters are single proteins [8], although they can sometimes be found in complexes such as the EmrAB-TolC efflux pump [9], are widespread, being found in all living organisms [10], and are identifiable in numerous architectures in the InterPro database [11, 12].
The InterPro database was searched for MFS domains with the identifier 'IPR020846', identifying 3527 architectures. To survey the most commonly found, we selected architectures with more than 100 examples, which also removed any chance of them being false positives from sequencing errors. Furthermore, as the study focussed on bacterial MFS transporters, we only included architectures with bacterial representatives. Architectures containing particular domain fusions were assigned to groups; we added additional architectures containing these fused domains as part of multi-domain fusions when more than 30 representatives were identified.
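The selection criteria described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure actually used: the `Architecture` record, the `select_architectures` function and the example counts are hypothetical, and the sketch assumes that architectures have already been exported from InterPro as ordered lists of domain accessions, each with a protein count and a flag for bacterial representatives.

```python
from dataclasses import dataclass

MFS = "IPR020846"  # InterPro MFS domain identifier given in the text


@dataclass(frozen=True)
class Architecture:
    """Hypothetical record for one InterPro domain architecture."""
    domains: tuple       # domain accessions, ordered N- to C-terminus
    n_proteins: int      # number of proteins with this architecture
    has_bacterial: bool  # True if at least one bacterial representative


def select_architectures(archs, primary_min=100, multi_min=30):
    """Apply the two thresholds described in the text.

    Returns (primary, extra): architectures with more than `primary_min`
    examples, and further architectures that contain an already-selected
    fused domain and have more than `multi_min` examples.
    """
    # Only MFS-containing architectures with bacterial representatives.
    candidates = [a for a in archs if MFS in a.domains and a.has_bacterial]
    primary = [a for a in candidates if a.n_proteins > primary_min]
    # Non-MFS domains seen in the abundant architectures.
    fused = {d for a in primary for d in a.domains if d != MFS}
    extra = [
        a for a in candidates
        if a not in primary
        and a.n_proteins > multi_min
        and fused.intersection(a.domains)
    ]
    return primary, extra


# Illustrative input only; the counts are invented for the example.
example = [
    Architecture((MFS, "IPR002123"), 500, True),
    Architecture((MFS, "IPR002123", "IPR000873"), 40, True),
    Architecture((MFS, "IPR006016"), 200, False),
    Architecture((MFS, "IPR000014"), 40, True),
]
primary, extra = select_architectures(example)
```

In this toy input, only the first architecture passes the 100-example threshold; the second is added because it contains a fused domain already selected and has more than 30 examples, whereas the last two are excluded (no bacterial representative, and no previously selected fused domain, respectively).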
From this conservative analysis, we identified a broad range of fusions (Fig. 1 and Table 1), several of which are as yet uncharacterised. To obtain further data, gene/protein identifiers were used to search UniProt [13], providing the predicted function, gene name, host species, taxonomic distribution, and length (Table S1, available in the online version of this article). Finally, the NCBI BLAST server [14] was used to identify homology between proteins and specific domains. By combining these analyses with available literature, it was possible to predict functions for some of the unidentified fusions.
The three most prevalent fusions we identified are all well-studied. The largest apparent group (Group 1.) consists of fusions of two full-length MFS transporters, which includes the fused nitrate/nitrite transporter NarK, as characterised in Paracoccus pantotrophus [15] and Paracoccus denitrificans [16]. NarK1 encodes a nitrate/proton symporter, and NarK2 a nitrate/nitrite antiporter; it has been proposed that, during denitrification, NarK1 allows the import of nitrate, which is reduced to nitrite and then exchanged with nitrate via NarK2 [15]. The fusion is not essential for function, but may allow cross-talk between the two domains, and maintains the stoichiometry of the two transporters at 1 : 1 [16]. However, a significant limitation of our strategy is that this category of fusions includes all proteins that are annotated as having two separate MFS domains by InterPro, regardless of the size of the domains. This means that this group is likely to have a high rate of false positives; for example, 12-transmembrane helix (TM) proteins with extended loops between helices 6 and 7 (e.g. UniProt: A0A031GPQ9) may be falsely considered as two separate MFS domains.
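One way to triage this group would be to compare the two-domain annotation with an independent count of predicted TM helices. The sketch below is illustrative only: the function name, the input format, the protein labels and the threshold of 20 helices are assumptions made here and are not part of the analysis described. It relies on the observation above that a single full-length MFS transporter has 12 TM helices, so a genuine fusion of two full-length transporters would be expected to have roughly twice that number.

```python
def triage_tandem_mfs(n_tm_helices, min_tm_for_tandem=20):
    """Label a protein that is annotated with two MFS domains.

    n_tm_helices: number of predicted TM helices (from any predictor).
    min_tm_for_tandem: assumed cut-off; a single MFS transporter has
    about 12 TM helices, so far fewer than 24 suggests one transporter
    whose long central loop has split the domain annotation in two.
    """
    if n_tm_helices < 0:
        raise ValueError("number of TM helices cannot be negative")
    if n_tm_helices >= min_tm_for_tandem:
        return "likely tandem fusion"
    return "possible single transporter"


# Hypothetical proteins with invented helix counts.
predicted_tm = {"protein_A": 24, "protein_B": 12}
labels = {name: triage_tandem_mfs(n) for name, n in predicted_tm.items()}
```

Proteins flagged as possible single transporters would still need manual inspection, since the cut-off is arbitrary and TM predictions are themselves imperfect.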
The second group contains fusions of MFS to a phospholipid/glycerol acyltransferase domain (PlsC, IPR002123), with three architectures: fusion to PlsC alone (Group 2a.), with an additional AMP-dependent synthetase/ligase domain (ACS, IPR000873) (Group 2b.) or with an ACS domain and an AMP-binding enzyme, C-terminal domain (IPR025110) (Group 2c.). These represent different degrees of fusion of the flippase LplT to the lysophospholipid repair system; LplT transfers the lysophospholipid 2-acylglycerophosphoethanolamine to the internal leaflet of the inner membrane, where it is acylated by PlsC/ACS to form phosphatidylethanolamine [7]. The two-domain fusion was previously only identified in γ-proteobacteria, whereas the three-domain fusion was identified in α-, δ-, and ε-proteobacteria and the γ-proteobacterium Microbulbifer degradans. Our analysis reveals a significantly wider distribution, with the two-domain fusion also being widespread within the β-proteobacterial Burkholderiales order, whereas the three-domain fusion has examples in all groups of proteobacteria, the PVC superphylum, and even some Gram-positives.
The third group consists of a fusion to an 'osmosensory transporter coiled coil' (Osmo_CC, IPR015041) domain (Group 3.), a short sequence found specifically at the C-terminus of MFS transporters. This fusion is found almost exclusively in Gram-negative bacteria, primarily Enterobacterales and Pseudomonadales (γ-proteobacteria), but with some examples in α- and β-proteobacteria. Observed substrates of these transporters are compatible solutes such as proline, glycine betaine, taurine, ectoine, and pipecolate [17,18]. The Osmo_CC domain is a regulatory domain that allosterically activates the transporter under osmotic stress; E. coli ProP loses function when the Osmo_CC domain is deleted [19], but can still respond to high levels of osmotic stress if the domain is truncated [20]. However, the activity of OusA from Erwinia chrysanthemi, a ProP homologue [18], is not affected by osmolarity, suggesting that the domain is not necessarily indicative of osmoregulation.
Several variants of polyamine biosynthesis (PABS) domain (IPR030374)-MFS fusions were observed. The most prevalent has an MFS domain immediately followed by the PABS domain (Group 4a.). This group contains MFS domains with 13 TM helices and 'short' variants with seven TM helices. Another variant (Group 4b.) consists of the 13TM MFS and PABS domain followed by a tetratricopeptide repeat-containing domain (TPR, IPR013026). There are 32 examples of a PABS domain with a C-terminal fused 7TM MFS transporter (Group 4c.); these proteins also typically contain seven TM helices at the N-terminus (as identified by TMHMM [21]) which are not recognised as a domain. In the Transporter Classification Database [22], both 7TM and 13TM variants are considered to be MFS of the Uncharacterised Major Facilitator-30 (UMF30) family. Whether or not the 7TM MFS forms a functional transporter is uncertain. The Neisseria meningitidis protein NMB0240 appears to be a 7TM member of this group, and is immediately preceded on the genome by a 6TM protein recognised as an MFS [23]. Given that homologues of these two proteins are fused in some organisms (e.g. C9PRR1 from Pasteurella dagmatis), it is possible that these two proteins could function as a split MFS transporter, as MFS transporters can be 'split' and expressed as two discrete proteins while retaining function [24,25]. This genomic arrangement seems to be unusual in the wider group, although some 7TM examples have small 2TM or 4TM proteins immediately upstream. NMB0240 has been suggested to be involved in arginine uptake [23]; furthermore, knockout of this protein leads to a significant growth impairment on minimal medium, which can be rescued by addition of leucine or isoleucine. It is possible that the proteins of Group 4c constitute functional 14TM transporters with the PABS domain located in the linker region that joins the two helical bundles. While several MFS transporters have an additional two TM helices at this location [26], to our knowledge, there are no examples with a functional soluble domain inserted in this region.
In bacteria, PABS domain-containing proteins are typically involved in the production of spermidine from putrescine or thermospermine from spermidine [27]. However, as these transporters have 'extra' helices, the PABS domain is unusually located in the periplasm. An interesting alternative explanation could be that the PABS domain in this group functions as a sensory domain; the transporter could then be modulated by PABS substrate binding, or simply serve to transmit the signal into the cytoplasm.
MFS can also be fused to one (Group 5a.) or two (Group 5b.) cyclic nucleotide binding domains (CNBD, IPR000595). Some examples contain an un-annotated region with an Armadillo-type fold signature between the MFS domain and CNBD. These represent HEAT repeat-containing nucleotide transporters (NTTs) [28] involved in uptake of ATP from the environment; NTT proteins are part of the AAA family, which is a distant member of the MFS superfamily [26]. The CNBD is proposed to modulate transporter activity in response to levels of cAMP, which is produced from ATP. Of the other MFS-CNBD fusions, the majority are from Actinobacteria such as Nocardioides, Microbacterium and Agromyces; the two-CNBD variant is found only in Mycobacterium triplex, Mycobacterium lentiflavum and Mycobacterium iranicum. Another architecture (Group 5c.) contains a CNBD and a patatin-like phospholipase (PLP) domain (IPR002641) at the C-terminus. This architecture is primarily found in mycobacteria, although some are found in Nocardioides and Knoellia isolates. One example has been identified in Mycobacterium tuberculosis [29], but no assessment has been made of its function.
Two architectures contain PAS (Per/Arnt/Sim, IPR000014) domains, one (Group 6a.) with an associated PAS-associated C-terminal domain (IPR000700), and one without (Group 6b.). Almost all are found in the order Bacillales, primarily in the families Bacillaceae and Paenibacillaceae. Many are annotated as nitrate/nitrite transporters. The MFS domains of these proteins are homologous to the Bacillus subtilis nitrite efflux protein NarK; the closest is Bacillus vireti NarK (UniProt: W1SEL7), the MFS domain of which shares 60 % identity with B. subtilis NarK. In B. subtilis, NarK prevents the toxic accumulation of nitrite, which is produced from nitrate during denitrification under anaerobic conditions for use as an electron acceptor [30]; under anaerobic conditions, NarK is significantly upregulated [31]. Denitrification has also been observed in other Bacillales [32]. PAS domains are involved in the sensing of a range of signals [33], including redox states, oxygen, and light. PAS domains have been observed to play a role in regulation of nitrogen metabolism. In some nitrogen-fixing bacteria, the nif nitrogen fixation operon is repressed under aerobic conditions by the NifL/NifA system; under anaerobic conditions, the flavin mononucleotide bound by the PAS domain of NifL is reduced, leading to derepression [34]. Given the role of PAS domains in oxygen sensing, and the oxygen-sensitive regulation of B. subtilis NarK, it is tempting to speculate that the PAS domain of MFS-PAS proteins may provide post-translational regulation of transporter activity based on oxygen and/or redox levels.
Fusions of MFS to two C-terminal cystathionine β-synthase (CBS, IPR000644) domains (Group 7.) are found primarily in Actinobacteria, but also a handful of Firmicutes, α-proteobacteria, and even archaea. The presence of two CBS domains is consistent with the usual arrangement found in CBS-containing proteins [35]. These domains are able to bind nucleotides, typically ATP, allowing regulation of enzymatic activity and/or oligomerisation state in relation to the internal concentration of ATP (and potentially also inhibitory nucleotides, such as GTP in eukaryotic IMPDH enzymes) [36]. Two homologous CBS-containing MFS transporters, Bifidobacterium breve BBr_0838 and Bifidobacterium longum BL0920, have been characterised. Both were shown to be bile salt efflux pumps involved in bile resistance, although the role of the CBS domains was not investigated [37,38]. CBS domains have been identified in other transporters, where they play a regulatory role. A group of osmoprotectant ABC transporters, found in B. subtilis and Lactococcus lactis as well as in pseudomonads, relies on CBS domains for osmoregulation [39]. Additionally, in the Mg2+ channel MgtE, ATP binding to the CBS domain seems to increase the sensitivity of the channel to internal Mg2+, promoting channel closure at physiological Mg2+ levels [40]. In the human CNNM2 transporter, CBS domains bind ATP and Mg2+ and are required for Mg2+ efflux [41].
Finally, there are 122 examples (Group 8.) of MFS transporters fused to UspA (IPR006016) domains. Almost all of these are derived from Actinobacteria, primarily Mycobacterium and Streptomyces species, although a few are found in proteobacteria and archaea. To our knowledge, no MFS-UspA fusion has previously been identified; while some fusions to APC transporters have been identified [6], the role of the UspA domains in these proteins is unknown. Indeed, while Usp proteins are known to be involved in stress response and tolerance, the underlying mechanism is not fully understood [42]. Thus, it is difficult to predict the role of these fusions.
Almost all of the fusions identified according to our parameters are C-terminal, with the exception of the internal PABS domain-MFS fusion, and the MFS-MFS fusions of Group 1, which can be considered as both N- and C-terminal [...] [45]. The fusion of domains seems to be highly permissive with regard to size, with fusions ranging from around 30-40 amino acids for the Osmo_CC domain fusions, up to around 600 amino acids for the PlsC_ACS fusions, and even to around 1000 amino acids for the HEAT-containing MFS_CNBD_CNBD fusions.
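The distinction between N-terminal, C-terminal and internal fusions can be expressed in terms of domain coordinates. The following Python sketch is a hedged illustration rather than the method used here: the function `fusion_position`, the tuple format and all residue coordinates are hypothetical. A fused domain is labelled relative to the annotated MFS domain(s) only, so TM regions that are not recognised as MFS would not be taken into account.

```python
MFS = "IPR020846"  # InterPro MFS domain identifier given in the text


def fusion_position(domains, mfs=MFS):
    """Label each non-MFS domain relative to the annotated MFS domain(s).

    domains: list of (accession, start, end) tuples in residue numbering.
    Returns a list of (accession, label) pairs in input order, where the
    label is 'N-terminal', 'C-terminal' or 'internal'.
    """
    mfs_spans = [(start, end) for acc, start, end in domains if acc == mfs]
    if not mfs_spans:
        raise ValueError("no MFS domain annotated")
    first_mfs_start = min(start for start, _ in mfs_spans)
    last_mfs_end = max(end for _, end in mfs_spans)

    labels = []
    for acc, start, end in domains:
        if acc == mfs:
            continue
        if end < first_mfs_start:
            labels.append((acc, "N-terminal"))
        elif start > last_mfs_end:
            labels.append((acc, "C-terminal"))
        else:
            # Overlaps or lies between annotated MFS segments.
            labels.append((acc, "internal"))
    return labels


# Invented coordinates for an MFS-Osmo_CC style architecture.
c_terminal_example = fusion_position(
    [(MFS, 10, 420), ("IPR015041", 450, 490)]
)
```

Returning a list rather than a dictionary keeps repeated domains, such as the two CNBDs of Group 5b, as separate entries.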
In order to obtain a greater understanding of linker structure, we analysed a small selection of linkers from Groups 2 and 7 (Tables S2 and S3). In Group 2, the analysed linkers were typically around 35 amino acids in length, although some were as short as 29 amino acids, and one example was as long as 61 amino acids. In the γ-proteobacterial examples, these linkers typically consisted of a conserved α-helix and area of extended secondary structure, followed by a poorly conserved unstructured region (Table S2, Fig. S1).
The linkers of Group 7 (Table S3) show greater variability, with some being around 50 amino acids long, and others being around 15 amino acids. Secondary structure in these linkers is less apparent, although the linkers from Bifidobacterium proteins show an α-helical region towards the C-terminal end of the linker.
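Linker lengths of this kind can be derived from annotated domain boundaries. The sketch below assumes, purely for illustration, that a linker is defined as the residues lying strictly between the end of the annotated MFS domain and the start of the fused domain; this definition, the function name and the coordinates are assumptions and may differ from how the linkers in Tables S2 and S3 were delimited.

```python
from statistics import median


def linker_length(mfs_end, fused_start):
    """Number of residues strictly between two annotated domains.

    mfs_end: last residue of the MFS domain (1-based numbering).
    fused_start: first residue of the fused C-terminal domain.
    """
    if fused_start <= mfs_end:
        raise ValueError("fused domain must start after the MFS domain ends")
    return fused_start - mfs_end - 1


# Invented (mfs_end, fused_start) boundaries, chosen so that the lengths
# fall in the range reported in the text for the Group 2 linkers.
boundaries = [(420, 456), (410, 440), (400, 462)]
lengths = [linker_length(m, f) for m, f in boundaries]
summary = {
    "min": min(lengths),
    "median": median(lengths),
    "max": max(lengths),
}
```

Because domain boundaries in sequence databases are themselves predictions, lengths obtained this way are best treated as approximate.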
The Pfam database [46] also allows the identification of domain architectures, and it is thus possible to carry out a similar analysis using the MFS clan (CL0015). Pfam divides the MFS clan into 24 members, which makes the dataset more complex, but allows some fusions to be examined in more detail; for example, while [...] Thus, it may be harder to distinguish less common fusions from false positives using Pfam. We also identified fusions that were overlooked in the InterPro analysis; for example, the MFS domain of the MFS_2-phosphoenolpyruvate-dependent sugar phosphotransferase system, EIIA 1 fusion, with 79 examples in Pfam, is annotated as a sodium:galactoside symporter (IPR001927) by InterPro and was not included in our analysis despite having 479 examples. An MFS_3-thymidylate kinase fusion, with 47 examples in Pfam, is not included as InterPro does not recognise thymidylate kinase as a 'domain'. Searching for thymidylate kinase (IPR018094) 'family' architectures revealed 553 MFS transporters annotated as single MFS domains.
We also examined the Protein Data Bank [47] for crystal structures of MFS fusions. However, we were unable to find structures for any of our identified fusions. This makes it difficult to analyse other properties of the fused domains such as surface charge. The only structure available of a fused MFS protein is for the E. coli transporter YajR [48]; however, the YAM domain of this protein is not considered as a 'domain' in InterPro, and hence it has not been included in our analysis. Future analyses would benefit from integrating analyses from multiple databases, including structural databases, in order to further develop and refine the dataset.
The primary aim of this research was specifically to assess the suitability of using InterPro for identification of transporter fusions, rather than the identification of novel fusions, and many of the fusions that we have identified systematically have been previously characterised. Nevertheless, we have also been able to identify previously undescribed fusions, such as the MFS-PABS and MFS-PAS fusions, and identifying the roles of these fusions could be an interesting avenue for further study.

Fig. 1. Schematic diagram of the eight most prevalent groups of MFS fusion proteins in bacteria. Dashed lines indicate domain boundaries where multiple architectures are observed. Numbers in parentheses indicate the numbers of results in InterPro for each domain architecture. For group 5, numbers of HEAT repeat containing and non-HEAT repeat containing architectures are combined (annotated as † and ‡) as these are not separated in InterPro. Blue highlights represent regulatory domains, and red highlights represent catalytic domains. The results presented here are accurate as of 07/03/18.

Table 1. Most abundant fused MFS architectures identified in this study. Architectures have been organised into groups according to fused domain identities. Individual architectures have been characterised as Regulatory fusions (R) or Catalytic fusions (C). Notable examples are those that have been characterised or identified previously, and are discussed in the main text. 7TM: unclassified 7-transmembrane helix region.