Article

Evaluation of the Abundance of DNA-Binding Transcription Factors in Prokaryotes

by Israel Sanchez 1, Rafael Hernandez-Guerrero 1, Paul Erick Mendez-Monroy 1, Mario Alberto Martinez-Nuñez 2, Jose Antonio Ibarra 3 and Ernesto Pérez-Rueda 1,4,*

1 Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de México, Unidad Académica Yucatán, Mérida C.P. 97302, Yucatán, Mexico
2 Unidad Académica de Ciencias y Tecnología de Yucatán, UMDI-Sisal. Facultad de Ciencias, UNAM, Mérida C.P. 97302, Yucatán, Mexico
3 Laboratorio de Genética Microbiana, Departamento de Microbiología, Escuela Nacional de Ciencias Biológicas, Instituto Politécnico Nacional, Ciudad de México C.P. 11340, Mexico
4 Centro de Genómica y Bioinformática, Facultad de Ciencias, Universidad Mayor, Santiago C.P. 7500000, Chile
* Author to whom correspondence should be addressed.
Genes 2020, 11(1), 52; https://doi.org/10.3390/genes11010052
Submission received: 11 November 2019 / Revised: 13 December 2019 / Accepted: 25 December 2019 / Published: 3 January 2020
(This article belongs to the Section Molecular Genetics and Genomics)

Abstract

The ability of bacteria and archaea to modulate metabolic processes, defensive responses, and pathogenic capabilities depends on their repertoire of genes and their capacity to regulate the expression of those genes. Transcription factors (TFs) have fundamental roles in controlling these processes. TFs are proteins dedicated to favoring and/or impeding the activity of the RNA polymerase. In prokaryotes, these proteins have been grouped into families that can be found in most of the different taxonomic divisions. In this work, the association between the expansion patterns of 111 protein regulatory families was systematically evaluated in 1351 non-redundant prokaryotic genomes. This analysis provides insights into the functional and evolutionary constraints imposed on different classes of regulatory factors in bacterial and archaeal organisms. Based on their distribution, we found a relationship between the contents of some TF families and genome size. For example, nine TF families that represent 43.7% of the complete collection of TFs are closely associated with genome size; i.e., in large genomes, members of these families are also abundant, but when a genome is small, these TF families are correspondingly smaller. In contrast, 102 families (56.3% of the collection) show no correlation, or only a low correlation, with genome size, suggesting that a large proportion of duplication or gene loss events occur independently of genome size and that various yet-unexplored questions about the evolution of these TF families remain. In addition, we identified a group of families that have a similar distribution pattern across Bacteria and Archaea, suggesting common functions and probable coevolution processes, and a group of families universally distributed among all the genomes.
Finally, a specific association between the TF families and their additional domains was identified, suggesting that the families sense specific signals or make specific protein-protein contacts to achieve the regulatory roles.

1. Introduction

Gene regulation is crucial for optimal cellular processes and is the first step in adjusting gene expression when metabolic responses adapt to environmental conditions. In this context, regulation of gene expression at the transcriptional level, where DNA-binding transcription factors (TFs) play a fundamental role, allows organisms to modulate the expression of specific genes depending on metabolic requirements, stress responses, or food availability, among others. TFs interact with their DNA-binding sites around or overlapping the promoter-binding site [1,2], and in consequence allow or block access to the RNA polymerase, i.e., activating or repressing gene expression. In general, TFs are two-domain proteins, with a DNA-binding domain (DBD) in either the amino or carboxy terminus, which is involved in specific contacts with the regulatory region of the corresponding cognate genes, and an additional domain associated with diverse functions such as ligand binding or protein-protein interactions [3,4]. To date, diverse studies have shown that some TF families are common to bacteria and archaea, suggesting that the mechanisms affecting gene expression could be similar in both of these groups of prokaryotes [5,6].
A variety of factors, such as lifestyle, are involved in the diversity of TFs and their families. For instance, bacteria that have free-living lifestyles, such as Pseudomonas aeruginosa or Escherichia coli, bear a much larger number and variety of genes encoding transcriptional proteins than do intracellular pathogens that thrive in more stable biotopes [7,8]. In contrast, archaeal organisms seem to have a lower proportion of TFs than bacteria, suggesting the existence in archaea of alternative mechanisms to compensate for the apparent deficit of protein regulators, including conformations of diverse protein complexes as a function of metabolic status [6,9].
Hence, to understand the association between the expansion patterns of different protein regulatory families, 1351 completely sequenced bacterial and archaeal genomes, which represent adaptive designs for evolutionary classification, were analyzed. This analysis is important to understand the contribution of these families to gene regulation in different lineages and to provide insights into the functional and evolutionary constraints imposed on different classes of regulatory factors in bacterial and archaeal organisms. In this context, abundant families are not widely distributed across all bacteria and archaea. In contrast, certain small families are the most widely distributed. This difference might be associated with different phenomena, such as evolutionary constraints imposed by regulatory mechanisms, as in the case of the LexA and LysR families. Our results also suggest that in larger genomes, regulatory complexity may increase as a result of the increasing number of members of some TF families.

2. Materials and Methods

2.1. Bacterial and Archaeal Genomes Analyzed

A total of 5321 prokaryotic genomes from the NCBI Refseq genome database [10] were downloaded, and to exclude any bias associated with the overrepresentation of bacterial or archaeal genomes of one genus or species, we employed a web-based tool and a genome similarity score (GCCa) of ≥0.95 [11] as the limit to consider a genome non-redundant; this method resulted in a set of 1321 representative genomes.
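The redundancy filter described above can be illustrated with a minimal Python sketch. It is not the web-based tool the authors used (that tool relies on clustering of similarity scores [11]); instead it swaps in a simple greedy filter and assumes that a genome scoring ≥0.95 against an already selected genome is treated as redundant. The function name, genome labels, and scores are hypothetical.

```python
def select_non_redundant(genomes, similarity, threshold=0.95):
    """Greedy filter (an assumed stand-in for the clustering tool):
    keep a genome only if its similarity to every genome already
    kept is below `threshold`."""
    kept = []
    for genome in genomes:
        if all(similarity(genome, other) < threshold for other in kept):
            kept.append(genome)
    return kept


# Toy pairwise similarity scores (hypothetical values, symmetric).
scores = {
    frozenset({"g1", "g2"}): 0.98,  # near-duplicate genomes
    frozenset({"g1", "g3"}): 0.40,
    frozenset({"g2", "g3"}): 0.42,
}


def toy_similarity(a, b):
    return scores[frozenset({a, b})]


representatives = select_non_redundant(["g1", "g2", "g3"], toy_similarity)
```

A greedy pass depends on input order, which is one reason a clustering-based selection may be preferred in practice.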

2.2. Identification of DNA-Binding Domains Associated with TFs

We retrieved 16,712 hidden Markov models (HMMs) from the PFAM database and used them to scan 5321 genomes, using the program pfam_scan.pl with an E-value of ≤10⁻³ and with the option of clan_overlap “activated” (to show overlapping hits within clan member families; this step only applies to Pfam-A families). In a subsequent step, 111 PFAMs associated with DNA-binding TFs were retrieved from diverse databases containing information on regulatory proteins, such as the DBD (DNA-binding domain) database, Regulon Database, and the Database of transcriptional regulation in Bacillus subtilis (DBTBS). We also manually curated the relevant information to identify proteins devoted to gene regulation. (The complete list of PFAM IDs is included as supplementary material Table S1).
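As a rough illustration of how domain hits can be turned into per-genome family counts, the following Python sketch filters hits by the E-value cutoff stated above and keeps only TF-associated Pfam accessions. The tuple layout of `hits`, the three-accession subset in `TF_PFAMS`, and the function name are assumptions made for the example; the raw pfam_scan.pl output format is not reproduced here.

```python
from collections import Counter, defaultdict

# Hypothetical subset of the 111 TF-associated Pfam accessions
# (the full list is in Table S1 of the article).
TF_PFAMS = {"PF00126", "PF00440", "PF00486"}
E_VALUE_CUTOFF = 1e-3  # threshold stated in the text


def count_tf_families(hits, tf_pfams=TF_PFAMS, cutoff=E_VALUE_CUTOFF):
    """Return {genome: Counter({pfam: n_proteins})} for TF-associated domains.

    `hits` is an iterable of (genome_id, protein_id, pfam_acc, e_value)
    tuples, an assumed intermediate format.
    """
    seen = set()  # count each (genome, protein, family) only once
    counts = defaultdict(Counter)
    for genome, protein, pfam, e_value in hits:
        if e_value > cutoff or pfam not in tf_pfams:
            continue
        key = (genome, protein, pfam)
        if key in seen:
            continue
        seen.add(key)
        counts[genome][pfam] += 1
    return counts


hits = [
    ("genomeA", "p1", "PF00126", 1e-20),
    ("genomeA", "p2", "PF00126", 5e-4),
    ("genomeA", "p3", "PF00126", 0.5),    # fails the E-value cutoff
    ("genomeA", "p4", "PF00005", 1e-30),  # not in the TF-associated set
    ("genomeB", "p5", "PF00440", 1e-8),
    ("genomeB", "p5", "PF00440", 1e-6),   # same protein and family, counted once
]
tf_counts = count_tf_families(hits)
```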

2.3. Protein Domain Enrichment Analysis

To evaluate the content of protein domains associated with the 111 families of TFs, their structural domains were determined by considering the PFAM assignments and an enrichment analysis for each group. To this end, we used a one-tailed Fisher’s exact test (FET) to perform the enrichment analysis, because it is related to the hypergeometric probability and can be used to calculate the significance (p-value) of the overlap between two independent datasets. We set statistical significance at a p-value of <10⁻¹⁰. Together with the FET, we also determined the false discovery rate (FDR) of the tests in order to account for type I errors. Corrections for multiple testing were performed using the Benjamini and Hochberg step-up FDR-controlling procedure to calculate adjusted p-values. All analyses were performed using the R software environment for statistical computing and graphics and the package multtest [12].
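The two statistical steps can be sketched in Python using only the standard library. This is an illustration of the general technique, not the authors' R/multtest code: the one-tailed test is computed as the upper tail of the hypergeometric distribution, and the adjustment follows the Benjamini and Hochberg step-up rule. Function and argument names are hypothetical.

```python
from math import comb


def fisher_one_tailed(k, n_family, n_domain, n_total):
    """P(X >= k) under the hypergeometric null (enrichment tail).

    k         proteins in the family that carry the domain
    n_family  proteins in the family
    n_domain  proteins in the whole set that carry the domain
    n_total   proteins in the whole set
    """
    upper = min(n_family, n_domain)
    denom = comb(n_total, n_family)
    tail = sum(
        comb(n_domain, i) * comb(n_total - n_domain, n_family - i)
        for i in range(k, upper + 1)
    )
    return tail / denom


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg step-up adjusted p-values, in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy example: 3 of 4 family members carry a domain found in 4 of 8 proteins.
p_toy = fisher_one_tailed(3, 4, 4, 8)
adj_toy = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

In the toy case the tail probability is 17/70, far from the strict threshold used in the article, so such an association would not be reported as enriched.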

3. Results

3.1. The Repertoires of TF Families Correlate with Genome Sizes

In order to identify how TF families are distributed as a function of genome size, the Pearson correlation (R-value) was calculated between the abundance of each family and the number of open reading frames (ORFs) in each genome. From this, nine TF families were identified as correlating with genome size (R ≥ 0.6): the Trans_reg_C (PF00486) and GerE (PF00196) families, which include two-component systems, and seven families associated with one-component systems [GntR (PF00392), TetR_N (PF00440), MarR (PF01047), HxlR (PF01638), MerR_1 (PF13411), CSD, and HTH_AraC families] (Figure 1 and Table 1). These families followed a similar trend of duplication and loss events as a function of genome dynamics, reinforcing the notion that increased gene complexity also requires the development of mechanisms for gene regulation at the transcription level [13]; i.e., when the genome is duplicated, members of these families are also duplicated, but when gene loss occurs, these families are affected and decrease in size. An interesting observation about these families is that they regulate central functions in the organisms, such as carbon source uptake (HTH_AraC) and resistance to multiple drugs, such as antibiotics or heavy metals (MarR and MerR_1), among others. Conversely, 102 families did not exhibit an evident correlation with genome size, suggesting that a large proportion of duplication or gene loss events occur independently of genome size. These include the highly abundant families LysR (PF00126), associated with the regulation of amino acid biosynthesis, and HTH_3 (PF01381), a hypothetical family widely distributed among the organisms.
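A minimal Python sketch of this correlation step is given below. It computes the Pearson R-value between the per-genome abundance of one family and the ORF count of each genome, then applies the R ≥ 0.6 cutoff from the text. The toy ORF counts and family profiles are hypothetical, as are the function and variable names.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


R_CUTOFF = 0.6  # threshold stated in the text

# Hypothetical data: ORFs per genome and copies of two families per genome.
orfs = [1000, 2000, 4000, 6000]
family_scaling = [2, 5, 9, 14]   # grows with genome size
family_flat = [4, 1, 5, 2]       # no clear trend

r_scaling = pearson_r(orfs, family_scaling)
r_flat = pearson_r(orfs, family_flat)
correlated = [name for name, r in
              [("family_scaling", r_scaling), ("family_flat", r_flat)]
              if r >= R_CUTOFF]
```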

3.2. The Abundance of Families is Not Homogeneous Across the Genomes

In order to determine how abundant regulatory proteins are among the prokaryotic genomes, a total of 225,999 TFs were identified in 1321 bacterial and archaeal genomes. These proteins were clustered into 111 different families, and their abundances and distributions among the genomes were evaluated. The families are quite heterogeneous in terms of their abundances; for instance, 34 of these families include fewer than 100 proteins per group, such as the transcriptional repressor of hyc and hyp operons, HycA_repressor (PF11046), or the PerC transcriptional activator (PF06069), whereas two of them [HTH_1 (LysR) or PF00126, and TetR_N or PF00440] include more than 20,000 members per group (see Table 1). To evaluate how the abundance of TF families varies among the bacterial and archaeal genomes, we calculated the coefficient of variation (CV), a measure of the dispersion of data points in a data series around the mean. In this regard, large families showed a minor variation among the genomes, whereas small families exhibited a wide variation (presence and abundance) among the organisms analyzed in this work. For instance, the large family Trans_reg_C (PF00486), with 14,446 members, exhibits an average of 10.7 ± 10.91 proteins per genome (CV = 1.01); GntR (PF00392), with 13,263 members, has 9.90 proteins per genome (CV = 1.33); and TetR_N (PF00440) has 12.02 proteins per genome (CV = 2.6). In contrast, the Histone_HNS family (PF00816) has 434 members (CV = 3.67), the AP2 (PF00847) family has 77 members (CV = 5.49), and the CitT (PF12431) family has 82 proteins (CV = 5.05). These results suggest that small families, such as AP2 (PF00847), CitT, or Histone_HNS, have a heterogeneous distribution and abundance among bacterial and archaeal genomes, whereas large families have a small CV, i.e., they have a more homogeneous distribution of size among the genomes (see Table 1).
When the abundance of the families was analyzed in detail, 12 of them were identified with more than 10 members per phylum (on average), whereas the rest of the families contained a low number of copies per phylum. In Figure 2 we show that seven families with more than 10 members were identified in Actinobacteria, six in Proteobacteria and Acidobacteria, and five in Firmicutes. There are three families in Verrucomicrobia, and two families each in Chlorobi, Cyanobacteria, Nitrospirae, Planctomycetes, and Bacteroidetes. The family HTH_24 [or AsnC (PF13412)] was found to be abundant in Euryarchaeota, and the family Trans_reg_C (PF00486) is abundant in Deinococcus. Indeed, the family AsnC (HTH_24), abundant in Euryarchaeota, has been described as a group with global regulators in bacteria and archaea, suggesting a role in global regulation [14].
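The CV is the standard deviation divided by the mean. For Trans_reg_C, the reported 10.7 ± 10.91 proteins per genome gives 10.91 / 10.7 ≈ 1.02, in line with the reported CV of 1.01 (the small difference presumably comes from rounding). A minimal Python sketch follows; it assumes the sample standard deviation and assumes that genomes lacking the family are included as zeros, neither of which is stated in the text. The per-genome counts are hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(counts):
    """CV = standard deviation / mean of per-genome family counts.

    Uses the sample standard deviation (an assumption; the text does
    not say whether the sample or population form was used).
    """
    mu = mean(counts)
    if mu == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return stdev(counts) / mu


# Check against the summary statistics reported for Trans_reg_C.
cv_from_reported = 10.91 / 10.7

# Hypothetical per-genome counts: an evenly spread family versus a
# patchy one that is absent from most genomes.
cv_even = coefficient_of_variation([9, 10, 11, 10, 12, 8])
cv_patchy = coefficient_of_variation([0, 0, 0, 6, 0, 0])
```

The patchy family yields the larger CV, which mirrors the contrast drawn in the text between widely present families and those restricted to a few genomes.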

3.3. Correlation of TF Families among Bacteria and Archaeal Genomes

In order to evaluate whether the families show common distribution patterns that would allow us to hypothesize about families working together, we calculated their correlation coefficients and created an all-versus-all matrix. From this matrix, a hierarchical clustering (HCL) analysis was performed with a Manhattan distance and a support tree with the average linkage algorithm, with uncentered correlation as the similarity measure [15]. From this analysis, a total of 90 families were included in 15 clusters, whereas 21 families were not included in any evident cluster. The clustering of families with similar distribution patterns suggests that these patterns arise from regulation via similar mechanisms. One example is the cluster containing members of the YoeB_toxin (PF06769), PhdYeFM_antitox (PF02604), and ParE_toxin (PF05016) families, all of which belong to toxin-antitoxin systems and are associated with the clan CL0136. Indeed, proteins of these families interact among themselves to regulate postsegregation cell killing systems that might function as regulatory switches under stress conditions [16], and are involved in initiating cell death in bacterial and archaeal cultures to contend against infection by phages or to regulate subpopulations [17].
Another interesting cluster comprises the flagellar regulation families (PF05247 FlhD and PF05280 FlhC); histone-like proteins such as Histone_HNS (PF00816); the Phage_AlpA (PF05930), BolA (PF01722), and Arc (PF03869) families; and the Pro_dh-DNA_bdg (PF14850) family. All these families exhibit a similar correlation pattern of distribution. In this regard, proteins of the bacterial flagellar transcriptional activator (FlhC) combine with FlhD to form a regulatory complex in E. coli [18], whereas members of the histone-like nucleoid-structuring (H-NS) protein family play a role in the formation of nucleoid structure. In addition, AlpA is in a family that consists of several short bacterial and phage proteins that are related to the E. coli protein AlpA, whereas BolA causes round morphology and may be involved in switching the cell between elongation and septation systems during cell division [19]. It has also been suggested that BolA induces the transcription of penicillin-binding proteins 6 and 5 [20]. In summary, these findings suggest that families work together to regulate common functions in bacterial and archaeal genomes, such as the FlhD and FlhC families, or the AlpA and BolA families, and they open diverse correlations to be further analyzed in functional and structural terms to identify potential protein-protein contacts or similar regulatory mechanisms.
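The clustering procedure in this section can be sketched in a few lines of Python. The sketch builds an all-versus-all Pearson correlation matrix from per-genome abundance profiles and then applies a naive average-linkage agglomerative clustering with Manhattan distance between matrix rows. It is not the TM4 implementation cited in [15]: the support tree and the uncentered correlation option are omitted, the stopping rule (a fixed number of clusters) is an assumption, and the toy profiles are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def manhattan(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))


def average_linkage(vectors, n_clusters):
    """Naive agglomerative clustering; returns a list of index lists."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average pairwise distance between members of two clusters.
                d = sum(manhattan(vectors[i], vectors[j])
                        for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical abundance profiles (rows: families, columns: genomes).
profiles = [
    [1, 2, 3, 4],
    [2, 4, 6, 8],
    [4, 3, 2, 1],
    [8, 6, 4, 2],
]
# All-versus-all correlation matrix; each row is one family's
# correlation pattern against every family.
corr = [[pearson_r(p, q) for q in profiles] for p in profiles]
clusters = average_linkage(corr, n_clusters=2)
```

Families 0 and 1 rise together across genomes, while families 2 and 3 fall together, so the two pairs end up in separate clusters.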

3.4. Distributions of Families among All the Genomes

To determine how the families are distributed among the complete collection of bacterial and archaeal genomes and to determine if there are families that are universally distributed, the distributions of the 111 PFAMs were traced among the 1321 genomes, and their rates of occurrence were calculated as the number of organisms in which a PFAM is present divided by the total number of organisms. A value close to 1 therefore indicates that the family is present in nearly 100% of the organisms, whereas a value near 0 indicates that the family is absent from nearly all the organisms (Table 1 and Figure 3). From this distribution, a set of 12 families were considered universally distributed, because they were found in at least 80% of the total organisms, and they could be considered the basic core of regulators associated with prokaryotes. In this dataset, the following families were identified: HTH_3 (PF01381), Bac_DnaA_C (PF08299), Bac_DNA_binding (PF00216), Fur (PF01475), HTH_5 (PF01022), MerR_1 (PF13411), HTH_Crp_2 (PF13545), HTH_24 (PF13412), MarR (PF01047), TetR_N (PF00440), HTH_1 (PF00126), and Trans_reg_C (PF00486). The rest of the families identified in all the genomes can be interchanged or lost among the bacteria and archaea as a consequence of their lifestyles. In general, the set of universal families comprises highly abundant groups, such as TetR, and groups that are not necessarily the most abundant ones, suggesting that families with a small number of members are also fundamental for the regulation of basic processes. One example is the ferric uptake regulator Fur, which connects iron transport and utilization enzymes with negative-feedback loop pairs for iron homeostasis [21]; Fur has been identified across a large diversity of organisms [22,23,24].
Similarly, DnaA is involved in the initiation of chromosomal replication [25], a fundamental process in all organisms, and LexA-like proteins have a wide distribution among the bacterial and archaeal genomes, suggesting that the SOS response might be a universal adaptation of bacteria to DNA damage [26]. This distribution, together with the probable coevolution of the LexA recognition domain and its binding motif [27], indicates that the regulated genes must also contain a conserved binding motif in the upstream regions. In summary, we suggest a probable scenario concerning the distribution of universal or widely distributed families, with a conservation of the regulated genes, such as the SOS regulon and LexA, and with few probable recruitments of additional TFs to the regulons, such as occurs in the evolution of regulatory networks [28].
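The occurrence rate used in this section is the fraction of genomes in which a family has at least one member, and a family is labeled universal when that fraction reaches 0.8. A minimal Python sketch, with hypothetical family names and counts:

```python
UNIVERSAL_THRESHOLD = 0.8  # "at least 80% of the total organisms"


def occurrence_rate(counts_per_genome):
    """Fraction of genomes in which the family has at least one member."""
    if not counts_per_genome:
        raise ValueError("no genomes supplied")
    present = sum(1 for count in counts_per_genome if count > 0)
    return present / len(counts_per_genome)


def universal_families(abundance, threshold=UNIVERSAL_THRESHOLD):
    """Names of families whose occurrence rate meets the threshold."""
    return sorted(
        family for family, counts in abundance.items()
        if occurrence_rate(counts) >= threshold
    )


# Hypothetical copies per genome for two families across five genomes.
abundance = {
    "family_X": [3, 0, 1, 2, 5],  # present in 4 of 5 genomes
    "family_Y": [0, 0, 7, 0, 0],  # abundant in one genome only
}
core = universal_families(abundance)
```

The toy data also shows why abundance and universality are separate properties: family_Y has as many copies in a single genome as family_X has in several, yet it is far from universal.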

3.5. Structural Domains Associated with Families

To gain insights into the association between structural domains and the families, these groups of proteins were analyzed in terms of their domains. In total, 2694 different domains were identified as associated with the 111 families. In Figure 4, the association between those domains and families shows that abundance does not imply a high diversity of structural domains; i.e., small families constrained to specific phyla can contain a large proportion of additional domains, such as Fez1, which has few members associated with more than 300 different domains. Similarly, large and universal families also contain a large number of domains, such as Trans_reg_C with around 160. Therefore, to determine if there is a specific association between the domains identified and the families, an enrichment analysis using a one-tailed FET was performed, and a p-value of less than 10⁻¹⁰ was considered significant. Table 1 shows the number of domains identified in all the families, and in Figure 5 (and supplementary material S1), a network representation of all associations between TFs and additional domains (enriched domains) is shown. From this network representation, we evaluated the most important nodes (TF families) by using the Maximal Clique Centrality (MCC) method, because it has been described to show excellent performance and precision in predicting essential proteins from networks [29]. Based on this approach, we identified a set of 10 families as highly important and specifically connected with their respective domains, including Fez1 (PF06818), bZIP_1 (PF00170), and bZIP_2 (PF07716), which share domains among them, suggesting that they could be associated with similar regulatory processes. In contrast, HTH_3 (PF01381) is strongly associated with Methyltransf_22 (PF13383), suggesting that one of the most common architectures of this family contains those domains.
In addition, the HTH_23 (PF13384), HTH_38 (PF13936), GerE (PF00196), MerR_1 (PF13411), HTH_24 (PF13412), and HTH_11 (PF08279) are described in the dataset as regulators with a large diversity of domains (more than 60 different domains), and are not linked among them, suggesting that they contain specific domains with few domains that are shared with other families of TFs. Therefore, we suggest that combinations between the DNA-binding domains and their associated domains significantly increase the sensing of diverse signal compounds, decreasing signaling cross talk and making the response to environmental stimuli in bacterial and archaeal organisms more efficient.
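For readers unfamiliar with MCC, the score of a node v as defined for cytoHubba [29] is the sum of (|C| − 1)! over all maximal cliques C that contain v. The Python sketch below enumerates maximal cliques with a basic Bron-Kerbosch recursion and accumulates that score. It is an illustration on a hypothetical four-node graph, not the cytoHubba implementation; cliques of size 1 (isolated nodes) are skipped here so that such nodes score 0, which is an assumption of this sketch.

```python
from math import factorial


def build_adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def maximal_cliques(adj):
    """Yield every maximal clique (Bron-Kerbosch without pivoting)."""
    def expand(r, p, x):
        if not p and not x:
            yield r
            return
        for v in list(p):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}
    yield from expand(set(), set(adj), set())


def mcc_scores(adj):
    """MCC(v) = sum of (|C| - 1)! over maximal cliques C containing v."""
    scores = {v: 0 for v in adj}
    for clique in maximal_cliques(adj):
        if len(clique) < 2:  # isolated node: assumed to score 0
            continue
        weight = factorial(len(clique) - 1)
        for v in clique:
            scores[v] += weight
    return scores


# Hypothetical graph: a triangle A-B-C with one extra node D attached to C.
adj = build_adjacency([("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")])
scores = mcc_scores(adj)
```

Node C belongs to both the triangle (weight 2! = 2) and the edge C-D (weight 1! = 1), so it receives the highest score in this toy graph.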

4. Discussions

In this work, we evaluated 111 families of DNA-binding TFs in bacterial and archaeal genomes, how abundant they are in prokaryotic genomes, and how they are distributed according to genome size. Among the examined families, we found a set of nine families whose distribution correlates with genome size and that represent more than 40% of the total TFs identified by HMM profiles. These families have been intimately associated with diverse and central functions in the organisms, such as the two-component systems (Trans_reg_C and GerE), multiple antibiotic resistance responses (TetR_N and MarR), or carbon source uptake, virulence, and nitrogen assimilation (HTH_AraC), among others. The correlation between the abundance of these families and genome size reinforces the notion that increased gene complexity also requires the development of mechanisms for gene regulation at the transcription level [13]. In contrast, 56.3% of the collection does not exhibit a clear correlation with genome size, suggesting that a large proportion of families have undergone independent evolutionary events, such as duplications or gene losses. This opens questions to be further explored, such as how many families in a genome are the product of lateral gene transfer, what occurs with the regulated genes, or how many families with a similar distribution pattern across Bacteria and Archaea are the product of coevolution processes. In addition, we found a specific association between the DNA-binding domains and their associated companion domains, as has been previously described [3], suggesting that the scaffold for protein-protein interactions could be conserved among members of the same family, as occurs in the Crp family, and that their association in diverse bacterial and archaeal genomes could increase the ability of the organisms to recognize and respond to diverse environmental stimuli [30].
This result opens the opportunity to predict and modify the probable ligands to understand the diversity of signals that modulate the activity of transcription factors, as has been identified for E. coli [31].
Finally, based on a correlation matrix of all families, we identified probable coevolution of families devoted to regulating similar processes. For example, the members of the YoeB_toxin (PF06769), PhdYeFM_antitox (PF02604), and ParE_toxin (PF05016) families exhibited similar distribution patterns among all the bacterial and archaeal genomes, suggesting that the regulation that initiates cell death in bacterial and archaeal cultures is widely distributed, to contend against infection by phages or to regulate subpopulations [17].

5. Conclusions

In conclusion, diverse scenarios might occur, depending notably on the TF family involved, such as abundant and universal families devoted to regulating amino acid biosynthesis (LysR) or antibiotic resistance (TetR and MarR), or less abundant ones such as the LexA family, whose distribution suggests that the SOS response might be a universal adaptation of prokaryotic organisms to DNA damage [26].

Supplementary Materials

The following are available online at https://www.mdpi.com/2073-4425/11/1/52/s1, Table S1: Description of all protein families identified in the bacterial and archaea genomes.

Author Contributions

Conceptualization, J.A.I. and E.P.-R.; Formal analysis, I.S., R.H.-G. and E.P.-R.; Funding acquisition, M.A.M.-N. and E.P.-R.; Investigation, J.A.I. and E.P.-R.; Methodology, I.S., R.H.-G., M.A.M.-N., P.E.M.-M., and E.P.-R.; Resources, E.P.R.; Writing—original draft, J.A.I. and E.P.-R. All authors have read and agreed to the published version of the manuscript.

Funding

E.P.-R. and M.A.M.-N. were funded by DGAPA-UNAM (IN-201117 and IA-205417, respectively).

Acknowledgments

The authors would like to thank Joaquin Morales and Sandra Sauza for their computational support.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Browning, D.F.; Busby, S.J. The regulation of bacterial transcription initiation. Nat. Rev. Microbiol. 2004, 2, 57–65.
  2. Browning, D.F.; Busby, S.J. Local and global regulation of transcription initiation in bacteria. Nat. Rev. Microbiol. 2016, 14, 638–650.
  3. Rivera-Gomez, N.; Martinez-Nunez, M.A.; Pastor, N.; Rodriguez-Vazquez, K.; Perez-Rueda, E. Dissecting the protein architecture of DNA-binding transcription factors in bacteria and archaea. Microbiology 2017, 163, 1167–1178.
  4. Rivera-Gomez, N.; Segovia, L.; Perez-Rueda, E. Diversity and distribution of transcription factors: Their partner domains play an important role in regulatory plasticity in bacteria. Microbiology 2011, 157, 2308–2318.
  5. Lemmens, L.; Maklad, H.R.; Bervoets, I.; Peeters, E. Transcription Regulators in Archaea: Homologies and Differences with Bacterial Regulators. J. Mol. Biol. 2019, 431, 4132–4146.
  6. Perez-Rueda, E.; Janga, S.C. Identification and genomic analysis of transcription factors in archaeal genomes exemplifies their functional architecture and evolutionary origin. Mol. Biol. Evol. 2010, 27, 1449–1459.
  7. Martinez-Nunez, M.A.; Poot-Hernandez, A.C.; Rodriguez-Vazquez, K.; Perez-Rueda, E. Increments and duplication events of enzymes and transcription factors influence metabolic and regulatory diversity in prokaryotes. PLoS ONE 2013, 8, e69707.
  8. Perez-Rueda, E.; Hernandez-Guerrero, R.; Martinez-Nunez, M.A.; Armenta-Medina, D.; Sanchez, I.; Ibarra, J.A. Abundance, diversity and domain architecture variability in prokaryotic DNA-binding transcription factors. PLoS ONE 2018, 13, e0195332.
  9. Tenorio-Salgado, S.; Huerta-Saquero, A.; Perez-Rueda, E. New insights on gene regulation in archaea. Comput. Biol. Chem. 2011, 35, 341–346.
  10. Haft, D.H.; DiCuccio, M.; Badretdin, A.; Brover, V.; Chetvernin, V.; O’Neill, K.; Li, W.; Chitsaz, F.; Derbyshire, M.K.; Gonzales, N.R.; et al. RefSeq: An update on prokaryotic genome annotation and curation. Nucleic Acids Res. 2018, 46, D851–D860.
  11. Moreno-Hagelsieb, G.; Wang, Z.; Walsh, S.; ElSherbiny, A. Phylogenomic clustering for selecting non-redundant genomes for comparative genomics. Bioinformatics 2013, 29, 947–949.
  12. R-programming: Development Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2011.
  13. Ranea, J.A.; Buchan, D.W.; Thornton, J.M.; Orengo, C.A. Evolution of protein superfamilies and bacterial genome size. J. Mol. Biol. 2004, 336, 871–887.
  14. Peeters, E.; Charlier, D. The Lrp family of transcription regulators in archaea. Archaea 2010, 2010, 750457.
  15. Saeed, A.I.; Sharov, V.; White, J.; Li, J.; Liang, W.; Bhagabati, N.; Braisted, J.; Klapa, M.; Currier, T.; Thiagarajan, M.; et al. TM4: A free, open-source system for microarray data management and analysis. BioTechniques 2003, 34, 374–378.
  16. Anantharaman, V.; Aravind, L. New connections in the prokaryotic toxin-antitoxin network: Relationship with the eukaryotic nonsense-mediated RNA decay system. Genome Biol. 2003, 4, R81.
  17. Engelberg-Kulka, H.; Amitai, S.; Kolodkin-Gal, I.; Hazan, R. Bacterial programmed cell death and multicellular behavior in bacteria. PLoS Genet. 2006, 2, e135.
  18. Pruss, B.M.; Liu, X.; Hendrickson, W.; Matsumura, P. FlhD/FlhC-regulated promoters analyzed by gene array and lacZ gene fusions. FEMS Microbiol. Lett. 2001, 197, 91–97.
  19. Moreira, R.N.; Dressaire, C.; Barahona, S.; Galego, L.; Kaever, V.; Jenal, U.; Arraiano, C.M. BolA Is Required for the Accurate Regulation of c-di-GMP, a Central Player in Biofilm Formation. MBio 2017, 8, e00443-17.
  20. Guinote, I.B.; Matos, R.G.; Freire, P.; Arraiano, C.M. BolA affects cell growth, and binds to the promoters of penicillin-binding proteins 5 and 6 and regulates their expression. J. Microbiol. Biotechnol. 2011, 21, 243–251.
  21. Seo, S.W.; Kim, D.; Latif, H.; O’Brien, E.J.; Szubin, R.; Palsson, B.O. Deciphering Fur transcriptional regulatory network highlights its complex role beyond iron metabolism in Escherichia coli. Nat. Commun. 2014, 5, 4910.
  22. Gonzalez, A.; Angarica, V.E.; Sancho, J.; Fillat, M.F. The FurA regulon in Anabaena sp. PCC 7120: In silico prediction and experimental validation of novel target genes. Nucleic Acids Res. 2014, 42, 4833–4846.
  23. Quatrini, R.; Lefimil, C.; Veloso, F.A.; Pedroso, I.; Holmes, D.S.; Jedlicki, E. Bioinformatic prediction and experimental verification of Fur-regulated genes in the extreme acidophile Acidithiobacillus ferrooxidans. Nucleic Acids Res. 2007, 35, 2153–2166.
  24. Grifantini, R.; Sebastian, S.; Frigimelica, E.; Draghi, M.; Bartolini, E.; Muzzi, A.; Rappuoli, R.; Grandi, G.; Genco, C.A. Identification of iron-activated and -repressed Fur-dependent genes by transcriptome analysis of Neisseria meningitidis group B. Proc. Natl. Acad. Sci. USA 2003, 100, 9542–9547.
  25. Zawilak-Pawlik, A.; Nowaczyk, M.; Zakrzewska-Czerwinska, J. The Role of the N-Terminal Domains of Bacterial Initiator DnaA in the Assembly and Regulation of the Bacterial Replication Initiation Complex. Genes (Basel) 2017, 8, 136.
  26. Wojciechowski, M.F.; Peterson, K.R.; Love, P.E. Regulation of the SOS response in Bacillus subtilis: Evidence for a LexA repressor homolog. J. Bacteriol. 1991, 173, 6489–6498.
  27. Sanchez-Alberola, N.; Campoy, S.; Emerson, D.; Barbe, J.; Erill, I. An SOS Regulon under Control of a Noncanonical LexA-Binding Motif in the Betaproteobacteria. J. Bacteriol. 2015, 197, 2622–2630.
  28. Martinez-Nunez, M.A.; Perez-Rueda, E.; Gutierrez-Rios, R.M.; Merino, E. New insights into the regulatory networks of paralogous genes in bacteria. Microbiology 2010, 156, 14–22.
  29. Chin, C.H.; Chen, S.H.; Wu, H.H.; Ho, C.W.; Ko, M.T.; Lin, C.Y. cytoHubba: Identifying hub objects and sub-networks from complex interactome. BMC Syst. Biol. 2014, 8 (Suppl. 4), S11.
  30. Korner, H.; Sofia, H.J.; Zumft, W.G. Phylogeny of the bacterial superfamily of Crp-Fnr transcription regulators: Exploiting the metabolic spectrum by controlling alternative gene programs. FEMS Microbiol. Rev. 2003, 27, 559–592.
  31. Balderas-Martinez, Y.I.; Savageau, M.; Salgado, H.; Perez-Rueda, E.; Morett, E.; Collado-Vides, J. Transcription factors in Escherichia coli prefer the holo conformation. PLoS ONE 2013, 8, e65723.
Figure 1. Proportion of TFs (transcription factors) as a function of R-values. The X-axis indicates the R-value (Pearson correlation) between the genome size and the number of proteins per TF family. The Y-axis indicates the proportion of TFs in each family relative to the whole dataset. An R-value ≥ 0.6 was considered significant (vertical dotted lines at 0.6). A proportion ≥ 0.050 was considered abundant (horizontal dotted lines). Circle size denotes the proportion of each family, and color denotes the R-value.
Figure 2. Families identified with more than 10 members (on average) per cellular division. The Y-axis shows the number of TFs, and the X- and Z-axes indicate the cellular division and family ID, respectively.
Figure 3. Distribution of TF families among all the genomes. The X-axis indicates the fraction of genomes with members of a given family, where 0 represents the absence of a family and 1 means that the family is found in all the organisms. The Y-axis indicates the proportion of each family relative to the total number of TFs identified. The vertical dotted line marks the families identified in more than 80% of organisms.
Figure 4. Number of non-redundant (NR) domains associated with TF families. The X-axis indicates the abundance of a given family as a percentage of the total number of TFs. The Y-axis indicates the total number of non-redundant domains per family. The dotted line indicates the most abundant families.
Figure 5. Top 10 families of TFs identified by MCC (Maximal Clique Centrality). Each box represents a protein family (see Section 3.5), and lines between boxes indicate that the connected families share additional domains.
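The MCC ranking used for Figure 5 comes from cytoHubba [29]. As a minimal sketch, assuming the usual definition in which the score of a node is the sum of (|C| − 1)! over every maximal clique C that contains it, the calculation could look like the following. The function names and the four-node toy graph (nodes A to D) are hypothetical and serve only as an illustration; they are not the family network analyzed in this work.

```python
from math import factorial


def maximal_cliques(adj):
    """Yield every maximal clique of an undirected graph.

    adj maps each node to the set of its neighbours.
    Uses the Bron-Kerbosch algorithm with pivoting.
    """
    def expand(r, p, x):
        if not p and not x:
            yield frozenset(r)
            return
        # Pivot on the node with the most neighbours in p to prune branches.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    yield from expand(set(), set(adj), set())


def mcc_scores(edges):
    """Return {node: MCC score} for an undirected edge list (assumed definition)."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    scores = {node: 0 for node in adj}
    for clique in maximal_cliques(adj):
        weight = factorial(len(clique) - 1)
        for node in clique:
            scores[node] += weight
    return scores


# Hypothetical toy graph: a triangle A-B-C plus a pendant node D attached to C.
toy_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
toy_scores = mcc_scores(toy_edges)
top_nodes = sorted(toy_scores, key=toy_scores.get, reverse=True)
```

In the toy graph, node C belongs to the triangle and to the edge C-D, so it receives the highest score. A node whose neighbours are not connected to each other simply scores its degree, because each edge is a maximal clique of size two.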
Table 1. Families evaluated in this work. Columns are as follows: PFAM ID, PFAM Name, and PFAM Description; R-value (correlation between genome size and PFAM distribution); total number of members in the bacterial and archaeal genomes; proportion of the PFAM (%) relative to the complete collection; PFAM coefficient of variation (CV); total number of domains; number of different (non-redundant, NR) domains; PFAM distribution (%) among all the organisms.
| PFAM ID | PFAM Name | PFAM Description | Correlation (R value) | Total No. of Proteins | Proportion | CV | Domains (Total) | Domains (NR) | Genomic Distribution (%) |
|---|---|---|---|---|---|---|---|---|---|
| PF00096 | zf-C2H2 | Zinc finger, C2H2 type | −0.04 | 59 | 0.00 | 6.82 | 105 | 24 | 0.03 |
| PF00126 | HTH_1 | Bacterial regulatory helix-turn-helix protein, lysR family | 0.57 | 21320 | 0.09 | 1.70 | 42400 | 98 | 0.87 |
| PF00165 | HTH_AraC | Bacterial regulatory helix-turn-helix proteins, AraC family | 0.05 | 47 | 0.00 | 6.85 | 86 | 27 | 0.03 |
| PF00170 | bZIP_1 | bZIP transcription factor | 0.03 | 248 | 0.00 | 3.11 | 4587 | 341 | 0.12 |
| PF00196 | GerE | Bacterial regulatory proteins, luxR family | 0.71 | 12932 | 0.06 | 1.53 | 25020 | 290 | 0.79 |
| PF00216 | Bac_DNA_binding | Bacterial DNA-binding protein | 0.24 | 3276 | 0.01 | 0.95 | 4731 | 40 | 0.88 |
| PF00313 | CSD | 'Cold-shock' DNA-binding domain | 0.62 | 3651 | 0.02 | 0.91 | 3866 | 39 | 0.79 |
| PF00325 | Crp | Bacterial regulatory proteins, crp family | 0.18 | 186 | 0.00 | 2.70 | 377 | 24 | 0.13 |
| PF00356 | LacI | Bacterial regulatory proteins, lacI family | 0.43 | 7083 | 0.03 | 1.71 | 14525 | 108 | 0.61 |
| PF00376 | MerR | MerR family regulatory protein | 0.54 | 1506 | 0.01 | 1.65 | 2885 | 66 | 0.43 |
| PF00392 | GntR | Bacterial regulatory proteins, gntR family | 0.70 | 13263 | 0.06 | 1.33 | 25478 | 205 | 0.79 |
| PF00440 | TetR_N | Bacterial regulatory proteins, tetR family | 0.67 | 23839 | 0.11 | 2.62 | 33009 | 197 | 0.87 |
| PF00486 | Trans_reg_C | Transcriptional regulatory protein, C terminal | 0.74 | 14446 | 0.06 | 1.01 | 31282 | 163 | 0.87 |
| PF00816 | Histone_HNS | H-NS histone family | 0.24 | 434 | 0.00 | 3.68 | 537 | 37 | 0.16 |
| PF00847 | AP2 | AP2 domain | 0.08 | 77 | 0.00 | 5.50 | 105 | 13 | 0.04 |
| PF01022 | HTH_5 | Bacterial regulatory protein, arsR family | 0.43 | 3940 | 0.02 | 0.93 | 4683 | 138 | 0.84 |
| PF01047 | MarR | MarR family | 0.66 | 5880 | 0.03 | 1.15 | 6384 | 142 | 0.81 |
| PF01258 | zf-dskA_traR | Prokaryotic dksA/traR C4-type zinc finger | 0.31 | 1564 | 0.01 | 1.02 | 1744 | 25 | 0.67 |
| PF01316 | Arg_repressor | Arginine repressor, DNA binding domain | 0.10 | 631 | 0.00 | 1.33 | 1287 | 26 | 0.41 |
| PF01325 | Fe_dep_repress | Iron dependent repressor, N-terminal DNA binding domain | 0.11 | 1076 | 0.00 | 1.23 | 2675 | 43 | 0.53 |
| PF01340 | MetJ | Met Apo-repressor, MetJ | 0.05 | 72 | 0.00 | 4.27 | 73 | 2 | 0.05 |
| PF01371 | Trp_repressor | Trp repressor protein | 0.04 | 299 | 0.00 | 2.00 | 330 | 20 | 0.21 |
| PF01381 | HTH_3 | Helix-turn-helix | 0.50 | 12084 | 0.05 | 0.82 | 16741 | 394 | 0.94 |
| PF01402 | RHH_1 | Ribbon-helix-helix protein, copG family | 0.17 | 1714 | 0.01 | 1.56 | 2228 | 93 | 0.52 |
| PF01418 | HTH_6 | Helix-turn-helix domain, rpiR family | 0.27 | 1802 | 0.01 | 1.74 | 3574 | 48 | 0.43 |
| PF01475 | FUR | Ferric uptake regulator family | 0.51 | 2891 | 0.01 | 0.64 | 2992 | 45 | 0.87 |
| PF01638 | HxlR | HxlR-like helix-turn-helix | 0.64 | 3255 | 0.01 | 1.65 | 3389 | 37 | 0.64 |
| PF01722 | BolA | BolA-like protein | 0.20 | 898 | 0.00 | 1.36 | 905 | 5 | 0.37 |
| PF01726 | LexA_DNA_bind | LexA DNA binding domain | 0.44 | 1015 | 0.00 | 0.91 | 1988 | 33 | 0.64 |
| PF01978 | TrmB | Sugar-specific transcriptional regulator TrmB | 0.04 | 1436 | 0.01 | 2.47 | 2475 | 164 | 0.38 |
| PF02082 | Rrf2 | Transcriptional regulator | 0.48 | 2531 | 0.01 | 0.85 | 2644 | 51 | 0.78 |
| PF02257 | RFX_DNA_binding | RFX DNA-binding domain | 0.03 | 12 | 0.00 | 13.56 | 28 | 7 | 0.01 |
| PF02467 | Whib | Transcription factor WhiB | 0.38 | 947 | 0.00 | 2.92 | 1000 | 18 | 0.14 |
| PF02604 | PhdYeFM_antitox | Antitoxin Phd_YefM, type II toxin-antitoxin system | 0.29 | 2020 | 0.01 | 1.58 | 2104 | 30 | 0.52 |
| PF02892 | zf-BED | BED zinc finger | −0.01 | 17 | 0.00 | 11.11 | 30 | 11 | 0.01 |
| PF02954 | HTH_8 | Bacterial regulatory protein, Fis family | 0.42 | 7162 | 0.03 | 1.51 | 20550 | 157 | 0.61 |
| PF03333 | PapB | Adhesin biosynthesis transcription regulatory protein | 0.05 | 14 | 0.00 | 11.01 | 22 | 5 | 0.01 |
| PF03551 | PadR | Transcriptional regulator PadR-like family | 0.49 | 3098 | 0.01 | 1.62 | 3782 | 52 | 0.62 |
| PF03749 | SfsA | Sugar fermentation stimulation protein | 0.07 | 458 | 0.00 | 1.46 | 475 | 7 | 0.33 |
| PF03869 | Arc | Arc-like DNA binding domain | 0.20 | 171 | 0.00 | 3.50 | 174 | 3 | 0.10 |
| PF03965 | Penicillinase_R | Penicillinase repressor | 0.39 | 1013 | 0.00 | 1.88 | 1081 | 39 | 0.37 |
| PF04014 | MazE_antitoxin | Antidote-toxin recognition MazE, bacterial antitoxin | 0.07 | 1798 | 0.01 | 1.72 | 2321 | 59 | 0.46 |
| PF04024 | PspC | PspC domain | 0.34 | 1004 | 0.00 | 1.47 | 1300 | 34 | 0.41 |
| PF04221 | RelB | RelB antitoxin | −0.08 | 466 | 0.00 | 2.96 | 499 | 18 | 0.17 |
| PF04247 | SirB | Invasion gene expression up-regulator, SirB | 0.12 | 213 | 0.00 | 2.56 | 291 | 26 | 0.14 |
| PF04299 | FMN_bind_2 | Putative FMN-binding domain | 0.41 | 325 | 0.00 | 2.04 | 325 | 1 | 0.22 |
| PF04353 | Rsd_AlgQ | Regulator of RNA polymerase sigma(70) subunit, Rsd/AlgQ | 0.11 | 98 | 0.00 | 3.74 | 112 | 4 | 0.07 |
| PF04383 | KilA-N | KilA-N domain | 0.06 | 136 | 0.00 | 4.19 | 160 | 14 | 0.08 |
| PF04397 | LytTR | LytTr DNA-binding domain | 0.25 | 2649 | 0.01 | 2.13 | 4836 | 47 | 0.53 |
| PF04606 | Ogr_Delta | Ogr/Delta-like zinc finger | 0.11 | 80 | 0.00 | 4.94 | 94 | 10 | 0.05 |
| PF04761 | Phage_Treg | Lactococcus bacteriophage putative transcription regulator | −0.01 | 1 | 0.00 | 36.47 | 1 | 1 | 0.00 |
| PF04947 | Pox_VLTF3 | Poxvirus Late Transcription Factor VLTF3 like | −0.02 | 14 | 0.00 | 12.86 | 25 | 12 | 0.01 |
| PF04967 | HTH_10 | HTH DNA binding domain | 0.02 | 704 | 0.00 | 7.47 | 1610 | 61 | 0.07 |
| PF05016 | ParE_toxin | ParE toxin of type II toxin-antitoxin system, parDE | 0.22 | 1808 | 0.01 | 1.63 | 1839 | 10 | 0.48 |
| PF05043 | Mga | Mga helix-turn-helix domain | −0.01 | 695 | 0.00 | 5.32 | 2544 | 41 | 0.13 |
| PF05068 | MtlR | Mannitol repressor | 0.06 | 37 | 0.00 | 6.54 | 38 | 2 | 0.02 |
| PF05225 | HTH_psq | helix-turn-helix, Psq domain | 0.08 | 50 | 0.00 | 6.45 | 139 | 45 | 0.03 |
| PF05247 | FlhD | Flagellar transcriptional activator (FlhD) | 0.18 | 114 | 0.00 | 4.44 | 167 | 2 | 0.06 |
| PF05280 | FlhC | Flagellar transcriptional activator (FlhC) | 0.19 | 116 | 0.00 | 4.12 | 118 | 3 | 0.07 |
| PF05321 | HHA | Haemolysin expression modulating protein | 0.03 | 33 | 0.00 | 8.72 | 34 | 2 | 0.02 |
| PF05443 | ROS_MUCR | ROS/MUCR transcriptional regulator protein | 0.16 | 362 | 0.00 | 6.11 | 416 | 20 | 0.10 |
| PF05764 | YL1 | YL1 nuclear protein | 0.08 | 12 | 0.00 | 10.48 | 41 | 20 | 0.01 |
| PF05848 | CtsR | Firmicute transcriptional repressor of class III stress genes (CtsR) | −0.04 | 191 | 0.00 | 2.50 | 202 | 8 | 0.14 |
| PF05930 | Phage_AlpA | Prophage CP4-57 regulatory protein (AlpA) | 0.18 | 431 | 0.00 | 2.74 | 482 | 11 | 0.18 |
| PF06018 | CodY | CodY GAF-like domain | 0.01 | 188 | 0.00 | 2.85 | 398 | 17 | 0.12 |
| PF06054 | CoiA | Competence protein CoiA-like family | 0.04 | 182 | 0.00 | 2.81 | 198 | 9 | 0.12 |
| PF06069 | PerC | PerC transcriptional activator | 0.02 | 7 | 0.00 | 14.86 | 9 | 3 | 0.01 |
| PF06116 | RinB | Transcriptional activator RinB | −0.02 | 9 | 0.00 | 19.41 | 9 | 1 | 0.00 |
| PF06320 | GCN5L1 | GCN5-like protein 1 (GCN5L1) | −0.03 | 34 | 0.00 | 8.39 | 214 | 71 | 0.02 |
| PF06338 | ComK | ComK protein | 0.01 | 93 | 0.00 | 4.85 | 96 | 4 | 0.05 |
| PF06769 | YoeB_toxin | YoeB-like toxin of bacterial type II toxin-antitoxin system | 0.10 | 350 | 0.00 | 2.49 | 356 | 4 | 0.18 |
| PF06818 | Fez1 | Fez1 | 0.07 | 399 | 0.00 | 2.77 | 8316 | 400 | 0.16 |
| PF06839 | zf-GRF | GRF zinc finger | 0.05 | 4 | 0.00 | 28.82 | 7 | 4 | 0.00 |
| PF06923 | GutM | Glucitol operon activator protein (GutM) | 0.02 | 76 | 0.00 | 4.71 | 81 | 6 | 0.05 |
| PF06943 | zf-LSD1 | LSD1 zinc finger | 0.02 | 5 | 0.00 | 16.28 | 16 | 6 | 0.00 |
| PF07180 | CaiF_GrlA | CaiF/GrlA transcriptional regulator | 0.00 | 6 | 0.00 | 17.17 | 9 | 3 | 0.00 |
| PF07417 | Crl | Transcriptional regulator Crl | 0.08 | 31 | 0.00 | 6.48 | 31 | 1 | 0.02 |
| PF07704 | PSK_trans_fac | Rv0623-like transcription factor | 0.24 | 142 | 0.00 | 4.38 | 293 | 16 | 0.07 |
| PF07716 | bZIP_2 | Basic region leucine zipper | 0.00 | 35 | 0.00 | 7.38 | 901 | 152 | 0.02 |
| PF07750 | GcrA | GcrA cell cycle regulator | 0.19 | 211 | 0.00 | 3.58 | 232 | 12 | 0.10 |
| PF07764 | Omega_Repress | Omega Transcriptional Repressor | −0.03 | 4 | 0.00 | 18.21 | 4 | 1 | 0.00 |
| PF07804 | HipA_C | HipA-like C-terminal domain | 0.29 | 814 | 0.00 | 1.85 | 1436 | 10 | 0.32 |
| PF07848 | PaaX | PaaX-like protein | 0.48 | 272 | 0.00 | 2.68 | 530 | 19 | 0.15 |
| PF07879 | PHB_acc_N | PHB/PHA accumulation regulator DNA-binding domain | 0.31 | 251 | 0.00 | 2.13 | 553 | 17 | 0.18 |
| PF08220 | HTH_DeoR | DeoR-like helix-turn-helix domain | 0.45 | 2567 | 0.01 | 1.35 | 5238 | 87 | 0.60 |
| PF08222 | HTH_CodY | CodY helix-turn-helix domain | −0.01 | 173 | 0.00 | 2.76 | 340 | 3 | 0.12 |
| PF08270 | PRD_Mga | M protein trans-acting positive regulator (MGA) PRD domain | −0.06 | 20 | 0.00 | 10.59 | 57 | 5 | 0.01 |
| PF08279 | HTH_11 | HTH domain | 0.35 | 3290 | 0.01 | 1.36 | 8331 | 183 | 0.72 |
| PF08280 | HTH_Mga | M protein trans-acting positive regulator (MGA) HTH domain | −0.02 | 98 | 0.00 | 7.17 | 325 | 13 | 0.04 |
| PF08299 | Bac_DnaA_C | Bacterial dnaA protein helix-turn-helix | 0.20 | 1385 | 0.01 | 0.56 | 3716 | 23 | 0.90 |
| PF09278 | MerR-DNA-bind | MerR, DNA binding | 0.44 | 981 | 0.00 | 1.75 | 1988 | 39 | 0.35 |
| PF09339 | HTH_IclR | IclR helix-turn-helix domain | 0.56 | 4383 | 0.02 | 1.85 | 8834 | 158 | 0.58 |
| PF11046 | HycA_repressor | Transcriptional repressor of hyc and hyp operons | 0.06 | 5 | 0.00 | 16.28 | 5 | 1 | 0.00 |
| PF12324 | HTH_15 | Helix-turn-helix domain of alkylmercury lyase | 0.17 | 42 | 0.00 | 7.61 | 83 | 20 | 0.03 |
| PF12431 | CitT | Transcriptional regulator | 0.17 | 82 | 0.00 | 5.06 | 186 | 8 | 0.05 |
| PF12793 | SgrR_N | Sugar transport-related sRNA regulator N-term | 0.14 | 102 | 0.00 | 5.68 | 215 | 5 | 0.04 |
| PF12833 | HTH_18 | Helix-turn-helix domain | 0.62 | 16065 | 0.07 | 1.63 | 32766 | 236 | 0.75 |
| PF13384 | HTH_23 | Homeodomain-like domain | 0.37 | 2261 | 0.01 | 1.55 | 4836 | 274 | 0.63 |
| PF13404 | HTH_AsnC-type | AsnC-type helix-turn-helix domain | 0.59 | 2324 | 0.01 | 1.68 | 4460 | 88 | 0.52 |
| PF13411 | MerR_1 | MerR HTH family regulatory protein | 0.64 | 5468 | 0.02 | 1.23 | 7255 | 206 | 0.82 |
| PF13412 | HTH_24 | Winged helix-turn-helix DNA-binding | 0.41 | 4567 | 0.02 | 1.12 | 8874 | 260 | 0.81 |
| PF13413 | HTH_25 | Helix-turn-helix domain | 0.23 | 817 | 0.00 | 1.05 | 1559 | 60 | 0.54 |
| PF13545 | HTH_Crp_2 | Crp-like helix-turn-helix domain | 0.57 | 4228 | 0.02 | 1.02 | 8248 | 122 | 0.82 |
| PF13556 | HTH_30 | PucR C-terminal helix-turn-helix domain | 0.47 | 2065 | 0.01 | 2.50 | 3180 | 69 | 0.36 |
| PF13693 | HTH_35 | Winged helix-turn-helix DNA-binding | 0.04 | 111 | 0.00 | 6.64 | 135 | 17 | 0.05 |
| PF13936 | HTH_38 | Helix-turn-helix domain | 0.13 | 1350 | 0.01 | 2.40 | 2872 | 165 | 0.40 |
| PF14549 | P22_Cro | DNA-binding transcriptional regulator Cro | 0.06 | 41 | 0.00 | 6.64 | 51 | 10 | 0.03 |
| PF14850 | Pro_dh-DNA_bdg | DNA-binding domain of Proline dehydrogenase | 0.24 | 315 | 0.00 | 1.87 | 967 | 4 | 0.23 |
| PF15723 | MqsR_toxin | Motility quorum-sensing regulator, toxin of MqsA | 0.11 | 72 | 0.00 | 4.54 | 103 | 7 | 0.05 |
| PF15731 | MqsA_antitoxin | Antitoxin component of bacterial toxin-antitoxin system, MqsA | 0.06 | 199 | 0.00 | 3.13 | 271 | 20 | 0.12 |
| PF15943 | YdaS_antitoxin | Putative antitoxin of bacterial toxin-antitoxin system, YdaS/YdaT | 0.08 | 171 | 0.00 | 3.87 | 231 | 29 | 0.09 |
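The per-family statistics listed in Table 1 can be illustrated with a short sketch. It assumes that each family is described by a list of member counts, one per genome, aligned with a list of genome sizes. The function names, the toy numbers, and the use of the population standard deviation for the CV are assumptions made for illustration only and are not taken from the authors' pipeline. The two cutoffs are those stated in the caption of Figure 1 (R ≥ 0.6 and proportion ≥ 0.050).

```python
from math import sqrt
from statistics import mean, pstdev


def pearson_r(xs, ys):
    """Pearson correlation between two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    # Return 0.0 when either variable is constant (correlation undefined).
    return cov / (sx * sy) if sx and sy else 0.0


def family_summary(genome_sizes, counts, total_tfs):
    """Summarize one TF family across genomes (hypothetical helper)."""
    total = sum(counts)
    return {
        "r": pearson_r(genome_sizes, counts),         # correlation with genome size
        "total": total,                               # members over all genomes
        "proportion": total / total_tfs,              # share of the whole TF collection
        "cv": pstdev(counts) / mean(counts) if total else 0.0,
        "distribution": sum(1 for c in counts if c > 0) / len(counts),
    }


def classify(summary, r_cutoff=0.6, abundance_cutoff=0.05):
    """Apply the Figure 1 cutoffs to one family summary."""
    return {
        "size_associated": summary["r"] >= r_cutoff,
        "abundant": summary["proportion"] >= abundance_cutoff,
    }


# Toy input: four genomes (sizes in Mb) and one family's counts per genome.
toy_sizes = [1.0, 2.0, 4.0, 8.0]
toy_counts = [0, 2, 4, 10]
toy_summary = family_summary(toy_sizes, toy_counts, total_tfs=100)
toy_flags = classify(toy_summary)
```

With these toy values the family is absent from one of the four genomes, so its distribution is 0.75, and its counts rise with genome size, so it passes both cutoffs.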

Share and Cite

MDPI and ACS Style

Sanchez, I.; Hernandez-Guerrero, R.; Mendez-Monroy, P.E.; Martinez-Nuñez, M.A.; Ibarra, J.A.; Pérez-Rueda, E. Evaluation of the Abundance of DNA-Binding Transcription Factors in Prokaryotes. Genes 2020, 11, 52. https://doi.org/10.3390/genes11010052
