Multiple classes and isoforms of the RNA polymerase recycling motor protein HelD

Abstract Efficient control of transcription is essential in all organisms. In bacteria, where DNA replication and transcription occur simultaneously, the replication machinery is at risk of colliding with highly abundant transcription complexes. This can be exacerbated by the fact that transcription complexes pause frequently. When pauses are long‐lasting, the stalled complexes must be removed to prevent collisions with either another transcription complex or the replication machinery. HelD is a protein that represents a new class of ATP‐dependent motor proteins distantly related to helicases. It was first identified in the model Gram‐positive bacterium Bacillus subtilis and is involved in removing and recycling stalled transcription complexes. To date, two classes of HelD have been identified: one in the low G+C and the other in the high G+C Gram‐positive bacteria. In this work, we have undertaken the first comprehensive investigation of the phylogenetic diversity of HelD proteins. We show that genes in certain bacterial classes have been inherited by horizontal gene transfer, many organisms contain multiple expressed isoforms of HelD, some of which are associated with antibiotic resistance, and that there is a third class of HelD protein found in Gram‐negative bacteria. In summary, HelD proteins represent an important new class of transcription factors associated with genome maintenance and antibiotic resistance that are conserved across the Eubacterial kingdom.

complexes (either a stochastic pause during transcription of structured RNA or at a site of DNA damage), physically removing it from the DNA or restarting it via a RecG-like ATPase motor domain (Ghodke et al., 2020;Ho et al., 2018;Kang et al., 2021;Le et al., 2018;Ragheb et al., 2021;Shi et al., 2020;Westblade et al., 2010).
In B. subtilis, RNase J1 clears stalled RNAP using a torpedo mechanism (5′-3′ exonuclease activity followed by RNAP displacement), and in Escherichia coli the helicase protein RapA is important in recycling RNAP (Liu et al., 2015). UvrD/PcrA in concert with Gre factors has been reported to act on RNAP stalled at a DNA lesion, binding to the complex and using the energy of ATP hydrolysis to backtrack away from the lesion to allow repair systems access to the damaged DNA (Epshtein et al., 2014;Hawkins et al., 2019), although it now appears that the role of these helicases is in preventing the formation of, and resolving, R-loops (RNA-DNA hybrids) that can have a detrimental effect on DNA replication (Urrutia-Irazabal et al., 2021).
An additional system identified in Gram-positive bacteria required for recycling stalled transcription complexes involves the action of the motor protein HelD. The designation HelD (also called helicase IV) was originally made for a protein identified in E. coli as a weakly processive 3′-5′ DNA helicase (Wood & Matson, 1987). To avoid confusion with the separate classes of HelD proteins that are the focus of this work, the E. coli protein will be referred to as helicase IV. Based on conserved sequence motifs, helicase IV is a superfamily 1 (SF1) helicase related to the housekeeping helicase UvrD/PcrA (Figure 1). The B. subtilis gene yvgS was assigned the name helD based on limited protein sequence conservation with helicase IV, although the proteins differ with respect to domain organization (Koval et al., 2019;Wiedermannova et al., 2014) (Figure 1). Little functional, and no structural, information is available for helicase IV, although a model generated by AlphaFold2 (Jumper et al., 2021) enables tentative comparison of UvrD/PcrA, helicase IV, and B. subtilis HelD (Figure 1). Helicase IV and HelD show similarity with UvrD/PcrA around the well-defined 1A and 2A helicase domains (blue and orange, respectively, Figure 1a), but not in other structural motifs associated with helicase activity (UvrD/PcrA domains 1B and 2B). Both helicase IV and HelD have N-terminal domains not present in UvrD/PcrA helicases. Helicase IV has a putative 1B domain, which may account for its reported helicase activity, whilst in the equivalent 1B domain position HelD contains an unrelated sequence that folds into a novel clamp-arm (CA) structure important in transcription recycling (Newing et al., 2020;Wiedermannova et al., 2014).
Whilst UvrD/PcrA and helicase IV have helicase activity, HelD shows none, suggesting that it has evolved from an SF1-type helicase into a transcription recycling factor that utilises the energy from ATP hydrolysis catalysed by its helicase motifs for its transcription-related activity.
Studies on HelD from low G+C (Bacillus subtilis) and high G+C (Mycobacterium smegmatis) Gram-positives revealed that there are two distinct classes of the enzyme, confirmed by phylogenetic and structural analyses (Kouba et al., 2020;Newing et al., 2020;Pei et al., 2020). Class I HelD was described from B. subtilis, whilst the structurally distinct Class II enzyme was identified in M. smegmatis (Kouba et al., 2020;Newing et al., 2020;Pei et al., 2020). Class I and II HelDs have similar motor domains but differ in the structure of their arms and the mechanism by which these arms perform the mechanical activity of removing nucleic acids and recycling RNAP (Kouba et al., 2020;Newing et al., 2020;Pei et al., 2020). The recent structures of HelD from B. subtilis and M. smegmatis bound to core RNAP (α2ββ′ω) (Kouba et al., 2020;Newing et al., 2020) are shown in Figure 2a  RNA exit channels are simultaneously opened, leading to the release of the stalled RNAP (Newing et al., 2020). This recycling activity is powered by ATP hydrolysis and the mechanical action of the two arms that flank the motor domain. In the Class I HelD, the long SCA (Figure 2a and c) can physically remove nucleic acids from the active site (dotted circle in Figure 2a), whereas in the Class II HelD the SCA is too short, and instead nucleic acid removal is performed by a CA insert called the PCh-loop (Figure 2b and d) (Kouba et al., 2020;Newing et al., 2020). Recent reports also suggest that some Class II HelDs (from M. abscessus and Streptomyces venezuelae) can confer rifampicin resistance through removal of rifampicin by the PCh-loop (Hurst-Hess et al., 2021;Surette et al., 2021).
In this work, we take advantage of the recent structural information to compile a detailed phylogenetic analysis of HelD showing that many organisms contain more than one (up to 5) different versions of HelD, that the genes encoding these enzymes are all expressed, that HelD is likely to have been acquired by horizontal gene transfer in Gram-negative Bacteroides and Gram-positive Coriobacteria and Acidimicrobiia, and that there is a third Class of HelD found in the Gram-negative Deltaproteobacteria.
2 | EXPERIMENTAL PROCEDURES

| Sequence retrieval and analysis
The sequence of B. subtilis 168 HelD (UniProtKB ID: O32215) was used to search for homologues on 11/11/2020 using the NCBI Conserved Domain Architecture Retrieval Tool (Geer et al., 2002), which identified 13,781 sequences, which were trimmed to 11,821 to remove partial sequences (<600 aa). To aid subsequent analyses, particularly for the study of multiple copies of helD genes, the original sequences were used to search complete reference genomes from the KEGG (https://www.kegg.jp) and JGI (https://jgi.doe.gov) databases. HelD and RpoB sequences retrieved from these complete genomes were used for subsequent phylogenetic studies.
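The length-based trimming described above (13,781 hits reduced to 11,821, i.e. 1,960 records shorter than 600 aa removed) can be illustrated with a minimal sketch. This is not the authors' pipeline: the FASTA parser, the function names, and the toy records are all assumptions introduced here for illustration. Only the 600-residue threshold comes from the text.

```python
from typing import Dict, Iterable

MIN_LENGTH = 600  # residues; partial-sequence cut-off stated in the text


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {header: sequence} mapping."""
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            # Sequence lines are concatenated until the next header
            records[header] += line
    return records


def drop_partial(records: Dict[str, str], min_length: int = MIN_LENGTH) -> Dict[str, str]:
    """Keep only sequences with at least min_length residues."""
    return {name: seq for name, seq in records.items() if len(seq) >= min_length}


# Toy input: hypothetical identifiers and artificial sequences, not real HelD data
toy = [
    ">full_length_example", "M" * 400, "A" * 374,
    ">partial_example", "M" * 250,
]
kept = drop_partial(parse_fasta(toy))
print(sorted(kept))
```

In practice a dedicated sequence library would normally replace the hand-written parser; it is written out here only to keep the example self-contained.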

| Distribution and phylogeny of HelD
Searching for HelD-like sequences using the conserved domain architecture retrieval tool (CDART; NCBI) portal identified >13,000 hits. Additional searches using NCBI BLASTP suggest that there are substantially more sequences in the database, but many of these are from incomplete genomes and/or metagenomic sequencing projects, making systematic identification and classification of sequences unfeasible, particularly in cases where an organism carries more than one helD gene (see below  (Newing et al., 2020) showed that HelD sequences fall into two classes, which was confirmed at the structural and functional level in comparing HelD proteins from the Firmicutes and Actinobacteria (Kouba et al., 2020;Newing et al., 2020;Pei et al., 2020). Using a wider range of carefully curated sequences from complete genomes identified from the initial CDART search, an unrooted phylogenetic tree was constructed to enable a more detailed understanding of HelD distribution and phylogeny which was compared against the RNAP RpoB (β) subunit

Overall, the tree contains three major branches: Class I HelD sequences originating mainly from the low G+C Gram-positives and Bacteroidia, Class II HelD sequences from the high G+C Gram-positives, and a novel Class III identified in Deltaproteobacteria.
Interestingly, the HelD sequences from the Actinobacterial   (56) and Nocardia farcinica IFM10152 (60) contain two copies of the rpoB gene (numbered x.a and x.b in panel B). Copy 1 is the housekeeping rpoB and copy 2 is a rifampicin-resistant rpoB expressed during antibiotic production in those organisms. and XIVa (Lachnospiraceae) that are abundant gut microbes associated with many aspects of good health, and the cluster XI gut pathogen C. difficile (Lopetuso et al., 2013;Lozupone et al., 2012;Milani et al., 2017). Since the Bacteroides and Parabacteroides are also abundant obligate gut anaerobes, this clustering suggested that helD was horizontally transferred from an anaerobic gut Firmicute, most likely from the order Clostridiales (Appendix 1; Figure A5).
Analysis of the genome context of helD genes indicated they were not (or are no longer) located in mobile genetic elements, except for B. thetaiotaomicron, and along with their widespread distribution in

FIGURE 4 Three classes of HelD. Panel A shows a focused unrooted phylogenetic tree constructed using HelD sequences, with numbers (#) as used in Figure 1A:

| A novel HelD class in Gram-negative bacteria
The analysis presented in this work also shows that there is a third class of HelD proteins encoded by the Deltaproteobacteria (Class III, Figures 3 and 4; see below). Newing et al. (2020) identified Class I and II HelD proteins based on the conservation of twelve sequence motifs. These motifs (labeled I-XII, Appendix 1; Figure A6) are all conserved in Class III proteins (exemplified by Myxococcus xanthus HelD), despite the low overall levels of sequence similarity found in HelD proteins (Newing et al., 2020). A model of M. xanthus HelD was also generated from an unbiased screen of the protein structure database (Figure 4; see Materials and Methods). As seen with Class I and II proteins, there is a HelD-specific N-terminal domain of ~50-150 amino acids that has a long antiparallel α-helical structure (secondary channel arm, SCA, Figure 4b) that is required to anchor HelD in the secondary channel of its cognate RNAP (Kouba et al., 2020;Newing et al., 2020;Pei et al., 2020), and the 1A helicase domain is split by the insertion of an arm-like structure (clamp arm, CA, Figure 4b and S6) that is used to bind within the primary channel of RNAP, forcing it open to aid the release of bound nucleic acids (Kouba et al., 2020;Newing et al., 2020;Pei et al., 2020).

shown as a small green sphere (within the dotted circles). The arrows in panels C and E denote the view of the respective RNAP-HelD complex in panels E and F. The view in panels C and D is into the primary channel to which the clamp arm (CA) of HelD binds. The view in panels E and F is into the secondary channel (dotted circle) into which the secondary channel arm (SCA) is inserted

HelD (Appendix 1; Figure A6). This additional amino acid does not appear to be highly conserved,

| RNAP δ subunit and HelD
The Firmicutes have the smallest multi-subunit RNAPs currently known (Lane & Darst, 2010a, 2010b), as well as auxiliary subunits δ and ε that are not found in other bacteria (Keller et al., 2014;Weiss & Shaw, 2015). In the original work characterizing the function of HelD as a transcription complex recycling factor, it was shown that although δ or HelD on their own enhanced recycling, there was a synergistic relationship between them in B. subtilis transcription recycling assays. Structural analysis of RNAP recycling complexes shows that δ and HelD interact, and provides clues as to how δ could enhance the recycling activity of HelD by augmenting clamp opening (Pei et al., 2020). These structural studies also provided insights into how δ could facilitate transcription recycling in the absence of HelD (Miller et al., 2021).

Genome searches indicated that not all Firmicutes contained both helD and rpoE (encoding the δ subunit) genes, and an analysis was performed based on the rpoB gene to establish whether there is segregation of genes amongst orders and/or based on the natural environment (Figure 6).
In the bulk of cases, the Bacilli, Lactobacilli, Leuconostoc, and Enterococci contained genes for both HelD and δ, and if the gene for one protein was missing, the other was present (Figure 6). The Staphylococci were heterogeneous, with species such as S. rostri containing both helD and rpoE genes, whereas S. aureus only contained the gene for the δ subunit. There is a segregation of species containing both helD and rpoE cf. rpoE only, with rpoE only present in the S. saprophyticus and S. aureus clusters (Takahashi et al., 1999).
Species that fall within the S. hyicus-intermedius cluster (e.g., S. rostri) contained both helD and rpoE, but there were exceptions such as S. felis, which only contained rpoE (Figure 6). The Streptococci (order Lactobacillales) only contained the rpoE gene (Figure 6), whereas the Clostridia, except for C. (Erysipelatoclostridium) cocleatum and innocuum, only contained helD genes (Figure 6). Thus, it appears that in the Firmicutes, especially class Bacilli, the default situation is for both rpoE and helD to be present, but the absence of one gene is compensated for by the presence of the other to ensure the ability to recycle stalled transcription complexes is retained.
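The presence/absence pattern described above can be tabulated with a short sketch. The dictionary holds only the examples named in the text. The data structure, the category labels, and the function name are assumptions made here for illustration and do not reproduce the analysis behind Figure 6.

```python
from collections import Counter

# (helD present, rpoE present) for taxa named in the text.
# "Clostridia (most species)" excludes the two exceptions noted above.
presence = {
    "Staphylococcus rostri": (True, True),
    "Staphylococcus aureus": (False, True),
    "Staphylococcus felis": (False, True),
    "Streptococci": (False, True),
    "Clostridia (most species)": (True, False),
}


def classify(has_held: bool, has_rpoe: bool) -> str:
    """Assign a co-occurrence category to one taxon."""
    if has_held and has_rpoe:
        return "both"
    if has_held:
        return "helD only"
    if has_rpoe:
        return "rpoE only"
    return "neither"


categories = {taxon: classify(*flags) for taxon, flags in presence.items()}
counts = Counter(categories.values())

for taxon, category in categories.items():
    print(f"{taxon}: {category}")
print(dict(counts))
```

None of the listed taxa falls into the "neither" category, which matches the compensation pattern proposed in the text. The five entries are illustrative and not a representative sample.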

| Many bacteria contain multiple helD genes
A striking observation made in the preliminary phylogenetic analysis of HelD was that some organisms contain more than one helD gene (Newing et al., 2020). This preliminary analysis has now been extended and it is clear that the presence of >1 helD is common and is found in both Gram-positive and -negative organisms (Figure 3a). and showed the level of helD expression was not influenced by amphotericin B and was ~3% that of rpoB (Figure 7a). This is also consistent with proteomics analysis indicating HelD is present at ~6% the level of RNAP (Delumeau et al., 2011). B. cereus contains two helD genes, and the data set from strain F837/76 (Jessberger et al., 2019), grown in the presence and absence of mucin, which can influence toxin production, shows that both copies (one large, one small variant) are expressed, albeit at low levels, and that expression is not significantly affected on exposure to mucin (Figure 7b). C. perfringens also contains two Class I helD genes, labeled CPE_0599 (small; 706 aa) and CPE_1619 (large; 763 aa) in strain 13, and expression levels were determined from datasets of cells grown in brain heart infusion (BHI) and a rich medium developed for the optimal growth of fastidious anaerobes, fastidious anaerobe broth +2% glucose (FABG) medium (Soncini et al., 2020). Both genes were expressed at levels comparable to helD in B. subtilis, and to their cognate pcrA/uvrD, although CPE_0599 expression increased ~3-fold and CPE_1619 expression decreased in FABG medium compared to BHI medium (Figure 7c).
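The two comparisons used above (expression of helD as a percentage of rpoB, and fold change between growth media) are simple ratios, sketched here. The abundance values are hypothetical placeholders chosen to reproduce the approximate figures quoted in the text. They are not taken from the cited datasets, and the function names are assumptions.

```python
def percent_of_reference(target: float, reference: float) -> float:
    """Express target abundance as a percentage of a reference gene."""
    if reference <= 0:
        raise ValueError("reference abundance must be positive")
    return 100.0 * target / reference


def fold_change(condition: float, baseline: float) -> float:
    """Ratio of abundance in a test condition to a baseline condition."""
    if baseline <= 0:
        raise ValueError("baseline abundance must be positive")
    return condition / baseline


# Hypothetical normalized read counts (e.g. TPM); illustrative only
held_tpm, rpob_tpm = 30.0, 1000.0
print(f"helD is {percent_of_reference(held_tpm, rpob_tpm):.1f}% of rpoB")

# Hypothetical values for one gene in two media
bhi_tpm, fabg_tpm = 20.0, 60.0
print(f"fold change (FABG vs BHI): {fold_change(fabg_tpm, bhi_tpm):.1f}")
```

Normalizing to a housekeeping gene such as rpoB allows comparison across datasets with different sequencing depths, but the result depends on the reference gene itself being stably expressed under the conditions compared.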
S. coelicolor A3(2) contains four Class II helD genes, two encoding large (SCO_2952 744 aa, and SCO_5439 755 aa) and two encoding small (SCO_4195 680 aa, and SCO_4316 681 aa) variants. Data from a study on growth phase-dependent changes in gene expression (Jeong et al., 2016) were obtained from the SRA for analysis of helD expression and compared with rpoB and pcrA. All four helD genes Coriobacteria are all common in the gut microbiome, it appears A. equolifaciens has acquired helD genes from gut microorganisms on two separate occasions. Indeed, an unusual feature of helD genes is that many organisms contain multiple paralogues and that all versions are expressed. Why some organisms have a single gene for helD while a closely related species has multiple expressed copies is unclear, and this will make a fascinating avenue for future research. It is interesting to note that actinobacteria, such as Streptomyces, Frankia, and Nonomuraea (numbers 50, 51, 54, and 55; Figure 3) that are known producers of valuable bioactive compounds used as antibiotics and anti-cancer drugs contained the largest number of helD genes (4-5). The 5 helD genes in Nonomuraea (number 55, Figure 3), which is a known producer of DNA-intercalating agents (Sungthong & Nakaew, 2015)  is induced in the presence of rifampicin and has an upstream RAE (Surette et al., 2021). It is interesting to note that despite encoding a rifampicin-resistant RNAP β subunit, Nonomuraea also has an RAE located directly upstream of helD NOA_42280 (#55.3; Appendix 1; Figures A7 and A8).
Investigation of the distribution of helD genes with upstream RAEs revealed they were clustered in two sub-branches of the Actinobacteria (Appendix 1; Figure A8) that may be considered the HelR grouping, based on the nomenclature of these proteins by Hurst-Hess et al. (2021) and Surette et al. (2021). It should be noted that clearly identifiable RAEs could not be found upstream of all the genes in the HelR group, including those of Frankia alni, Nocardia brasiliensis, and Mycolicibacterium phlei (54.2, 56.2, and 64, respectively; Figure 3 and Appendix 1 Figure A2). Rifampicin has also been observed to induce helD expression in the low G+C Gram-positive B. subtilis, but this induction does not confer resistance to the drug (Hutter et al., 2004). Nevertheless, the ability of naturally Finally, it is important that genome annotation databases are updated as helD genes are often classified as pcrA, uvrD, or helicase IV-ATPase. Correct annotation of helD genes will enable a more detailed understanding of the distribution, evolution, and function of this fascinating new category of transcription factors.

ETHICS STATEMENT
None required.

ACKNOWLEDGEMENTS
The authors appreciate the constructive comments from Brett Neilan, Leanne Pearson-Neilan, Caitlin Romanis, and Karl Hassan during the preparation of this article. This work was funded by the Australian Research Council grant DP210100365 (PL, ND, and AO).

CONFLICT OF INTEREST
None declared.

DATA AVAILABILITY STATEMENT
All data are provided in full in the results section of this paper and all sequences used are available from the NCBI at https://www.ncbi.nlm.nih.gov

APPENDIX 1

FIGURE A1 Acquisition of helD genes by Coriobacteria from Firmicutes and Clostridia. The phylogenetic tree from Figure 1 is shown on the left side with the region boxed expanded on the right side. Bacterial classes are colored, species numbered, number of helD genes colored as in Figure 1

FIGURE A3 Distribution of helD genes in the phylum Bacteroidota. HelD sequences from Bacteroidota RefSeq genomes were retrieved from a BLASTP search and mapped to individual species within the phylum Bacteroidota using Annotree (Mendler et al., 2019). Bacteroidotal classes are shown in the colored outer ring with Bacteroidia in pink, Rhodothermia in grey, Chlorobia in light grey, UBA10030 in lime green, Kryptonia in pale green, Ignavibacteria in cyan, Kapabacteria in pale blue, and SZUA-365 in blue. Individual species are shown as lines radiating out from the circular dendrogram with species containing HelD sequences highlighted in bright blue

FIGURE A4 Acquisition of helD genes by Bacteroides from Clostridia. The phylogenetic tree from Figure 1 is shown on the left side with the region boxed expanded on the right side. Bacterial classes are colored, species numbered, number of helD genes colored as in Figure 1

(Newing et al., 2020), with panel B showing the equivalent sequence motifs from M. xanthus HelD. Appendix Table A1 shows the conserved sequence motifs with sequence numbers referring to the B. subtilis HelD sequence. X corresponds to a poorly conserved sequence (any amino acid) and h to a conserved hydrophobic residue. Residues colored red are specific to class I and green to class II sequences. The HelD motifs from the Class III M. xanthus HelD (Class IIIMX) are shown in the right column with absolutely conserved motif residues shown in purple (blue for the ATP binding motifs) and the Class III defining residue (F in the case of M. xanthus) that is inserted in the DWRAP motif shown in grey (see text for more details)