Abasy Atlas v2.2: The most comprehensive and up-to-date inventory of meta-curated, historical, bacterial regulatory networks, their completeness and system-level characterization

Graphical abstract


Background
Regulation at the gene transcription level is a fundamental process for bacteria to adapt to different media conditions and to cope with adverse environments. Transcription factors (TFs) mainly mediate this process. They are proteins capable to promote or hinder the transcription of their target genes (TGs). A TF-coding gene and its TGs conform a regulon, multiple regulons can be assembled to construct a gene regulatory network (GRN) where nodes and edges depict genes and interactions, respectively. Given the different specificity across TFs, they can contribute to organism adaptation in different levels which provides hierarchical and modular properties to GRNs in bacteria [1].
The increasing number of experimental strategies to study the transcriptional machinery [2] has allowed the community to unveil novel regulatory interactions. Despite curation efforts, many interactions remain buried in publications and are not integrated into a GRN yet. Organism-specific databases offer expertise and often are the primary resource for further research on the organism of interest. Such databases include RegulonDB [3] for Escherichia coli, DBTBS [4] and SubtiWiki [5] for Bacillus subtilis, CoryRegNet [6] for Corynebacterium glutamicum and MtbRegList [7] for Mycobacterium tuberculosis. Nonetheless, many of those databases are no longer updated or accessible [8]. Besides, the availability of multiple organism-specific databases gives rise to heterogeneity, which could bias results when cross-organisms analyses are performed. Such heterogeneity encompasses different names for the same gene and different network representations. This is even a problem for a single organism when complementary databases are integrated.
The analysis of global properties through multiple bacteria have revealed similarities among them [9][10][11][12][13][14]. Nonetheless, those studies have been limited to only a few organisms and results need to be validated with the most complete GRNs [15]. Besides, the study of the effect of network incompleteness on network structural analyses has been hindered by the limitations in databases to identify when a set of novel interactions is reported, and the experimental evidence supporting those interactions. Since no GRN curation model has been developed, works to study this phenomenon have been limited to simulate the curation process by decomposition or reconstruction of the GRNs by different random models [16,17].
Diverse databases cope with information inconsistency, such as CollecTF [18] for experimentally-validated TF binding sites in bacteria, and GSDB [19] for 3D chromosome and genome topological structures. Other resources integrating and homogenizing experimentally-validated data with computational predictions include STRING [20] for protein-protein interaction networks, SwissRegulon [21] for regulatory sites in prokaryotes and eukaryotes organisms, PRODORIC [22] for DNA binding sites for prokaryotic TFs, RegNetwork [23] for transcriptional and posttranscriptional regulatory relationships for human and mouse, and Network Portal (http://networks.systemsbiology.net/) for coregulation networks. But, poor efforts have been carried out to provide consolidated, disambiguated, homogenized high-quality GRNs on a global scale, their structural properties, system-level components, and their historical snapshots to trace their curation process.
Abasy Atlas v1.0 was originally conceived to fill this gap by making a cartography of the functional architectures of a wide range of bacteria [12]. Our database provides a comprehensive atlas of annotated functional systems (hereinafter also referred to as modules), statistical and structural network properties, and system-level elements for reconstructed and meta-curated (homogeneous and disambiguated) GRNs across 42 bacteria, including pathogenically and biotechnologically relevant organisms. Abasy Atlas is the first database in providing predictions of global regulators, basal machinery genes, members of functional modules, and intermodular genes based on the systemlevel elements predicted for the natural decomposition approach (NDA) in several bacteria [9,[11][12][13]. The NDA is a biologically motivated mathematical approach leveraging the global structural properties of a GRN to derive its architecture and classify its genes into one of the four above-mentioned categories of system-level elements. Abasy Atlas was also designed to provide statistical and structural properties characterizing the GRNs, such as their associated power laws, percentage of regulators, network density and giant component size, and the number of feedforward and feedback motifs among others.
In this work, we present the expanded version of Abasy (Acrossbacteria systems) Atlas, which consolidates information from different sources into historical snapshots of meta-curated GRNs in bacteria. Each historical snapshot represents the integrated knowledge we had about a GRN at a given time point. The new Abasy Atlas v2.2 makes possible to study the effect of network incompleteness across bacteria on diverse GRNs analyses, to identify potential bias and improvements, and to predict future results with more complete GRNs. Besides, Abasy Atlas GRNs integrates regulation mediated by regulatory proteins, small RNAs, sigma factors and regulatory complexes to better understand the biological systems [24]. This global representation of the GRNs eases their use because the organism-specific databases usually represent each network in a different file and different format, which can convolute the parsing of the network flat files and the integration of information.
While most proteins regulate gene transcription as homodimeric complexes, the regulation of gene expression can also be achieved by heteromeric complexes, whose subunits are encoded by different genes. Despite previous integrative approaches merging different level components [25][26][27], heterodimeric complexes have not been properly represented in most of them nor databases. One of the most common representations is to assign the regulations to each subunit, leading to a duplicated representation of the interaction in the GRNs. The new Abasy Atlas v2.2 provides a homogeneous representation for heteromeric complexes, when information is available, preserving the regulatory information and avoiding duplicated, misleading interactions.
In summary, Abasy Atlas v2.2 provides historical snapshots of reconstructed and meta-curated GRNs across bacteria, their completeness level, topological properties, and system-level components, enabling network completeness-dependent analyses for multiple organisms. Besides, the homogeneity of gene symbols, interactions confidence level, and network representation allow Abasy Atlas GRNs to be used as gold standards for benchmarking purposes, such as those to assess GRN predictions and theoretical models. In the section ''Functionality", we describe studies that would be benefited from the functionality of Abasy Atlas v2.2 [28][29][30][31][32][33][34][35].
Abasy Atlas does not intend to replace organism-specific databases containing regulatory interactions with biological information such as regulatory sites. Conversely, it fills a gap by providing a consolidated version of bacterial GRNs on a global scale, their structural properties, system-level components, and their historical snapshots to trace their curation process. Abasy Atlas is cross-linked to diverse external databases providing biological, genomic, and molecular details. Cross-links to organismspecific databases included as a source for each GRN are also provided. From there, the user can further inquire about biological considerations such as binding sites annotation, TF conformation, genome annotation, and chromosomal conformation. All essential data when studying the molecular mechanisms and evolution of GRNs in bacteria. In this way, Abasy Atlas serves as an acrossorganisms database coping with information inconsistency and providing high-quality GRNs on a global scale.
Remarkable uses of previous versions of Abasy Atlas [12] comprise the characterization of C. glutamicum GRN [13], the integration of gene regulatory interactions to metabolism to identify the relevant TGs suitable for strain improvement [36], and comparative genomic analyses to characterize the transcriptome profile of Corynebacterium pseudotuberculosis in response to iron limitation [37]. Abasy Atlas v2.0 was used to identify evolutionary constraints on the complexity of GRNs enabling the study of three models to predict the total number of genetic interactions [14]. The latter allowed to compute an interaction coverage as a proxy of network completeness, which improves the biased network genomic coverage (fraction of the genome in the network). Abasy Atlas V2.2 could be useful to improve these works since more complete GRNs provide more information regarding transcriptional regulation in medically and biotechnologically relevant organisms such as M. tuberculosis and C. glutamicum. Also, to improve models developed with the previous version of Abasy, such as the novel network completeness model presented in the section ''Estimating GRNs completeness by leveraging their constrained complexity".

2.
A primer on the natural decomposition approach: Predicting global regulators, modular genes shaping functional systems, basal machinery genes, and intermodular genes Abasy Atlas was designed to provide annotations of the modules and system-level elements integrating each GRN. These predictions are computed by using the NDA. The NDA is a largescale modeling approach characterizing the circuit wiring and its global architecture. It defines a mathematical-biological framework providing criteria to identify the four classes of systemlevel elements shaping GRNs: global regulators, modular genes shaping functional systems, basal machinery genes, and intermodular genes. Studies have shown that regulatory networks are highly plastic [38]. Despite this plasticity, by applying the NDA our group has found that there are organizational principles conserved by convergent evolution in the GRNs of phylogenetically distant bacteria [11]. The high predictive power of the NDA has been proven in previous studies by applying it to the phylogenetically distant E. coli [9], B. subtilis [11], and C. glutamicum [13], and by comparing it with other methods to identify modules [39].
The NDA defines objective criteria (e.g., the j-value to identify global regulators) to expose functional systems and system-level elements in a GRN, and rules to reveal its functional architecture by controlled decomposition (Supplementary Fig. 1). It is based on two biological premises [10,11]: (1) a module is a set of genes cooperating to carry out a particular physiological function, thus conferring different phenotypic traits to the cell. (2) Given the pleiotropic effect of global regulators, they must not belong to modules but rather coordinate them in response to generalinterest environmental cues.
According to the NDA, every gene in a GRN is predicted to belong to one out of four possible classes of system-level elements, which interrelate in a non-pyramidal, three-tier, hierarchy shaping the functional architecture [10][11][12][13] as follows (Supplementary Fig. 2): (1) Global regulators are responsible for coordinating both the (2) basal cell machinery, composed of strictly globally regulated genes and (3) locally autonomous modules (shaped by modular genes), whereas (4) intermodular genes integrate, at the promoter level, physiologically disparate module responses eliciting combinatorial processing of environmental cues.
These 76 GRNs comprise 204282 regulatory interactions (+160%) organized into 8459 (+128%) regulons and 4335 modules (+144%). We homogenized the representation of heteromeric TFs and their subunits and obtained a total of 12 heteromeric TFs, all of them in the GRN of E. coli K-12. However, this paves the way for a homogeneous representation of GRNs that will be propagated to more organisms in a future version of Abasy Atlas, when information regarding heteromeric TFs for these organisms is available. A total of 20 historical snapshots for the model organisms M. tuberculosis, B. subtilis, E. coli, and C. glutamicum were also included in the Abasy Atlas v2.2.

Unique machine-readable, user-friendly identifiers for each GRN reconstruction
Studies using GRNs from organism-specific databases usually cite the source database. However, while some articles specify the GRNs used [28,39], others do not [9,40]. This drives to a reproducibility problem when the database updates the GRN and does not provide the historical snapshots. To cope with this problem, a machine-readable and user-friendly identifier was assigned to each network to ease reporting and identification when using the database.
Network identifiers are constructed as follows: Five fields are separated by an underscore, three are mandatory and two are optional. The first field represents the NCBI taxonomy ID of the organism (mandatory). The second field, preceded by a ''v", which stands for version, is the year when the network was reconstructed (mandatory). The field starting with an ''s" provides information about the sources from which the network was reconstructed (mandatory). The confidence level of the evidence supporting the regulatory interactions is described by an optional field starting with an ''e". When this field is omitted means that the reconstruction contains all the available interactions disregarding the confidence level of evidence, whereas ''strong" is used for those GRNs reconstructed only with interactions validated by direct experimental evidence. An optional description field, preceded by a ''d", enables to include keywords such as ''sRNA" for GRNs containing sRNAs-controlled regulons (Fig. 1B).
The source field, that starting with an ''s", is composed by a database name abbreviation and year when meta-curated from databases, and the last two digits of the publication year when curated from literature (see Supplementary Table 1 for a complete list of data sources abbreviations and references). On the ''Browse" page of Abasy Atlas, the user can identify the source for each GRN, as well as for the subnetworks when the GRN is a meta-curation from different sources.

Historical snapshots of the GRNs
Network theory-based approaches to study the organizing principles governing GRNs have been pointed to be biased by the curation process and incompleteness [16,41]. Nevertheless, those studies have been mainly applied to subnetworks sampled by different random computational algorithms that cannot reproduce faithfully the curation process by the scientific community. To bring an alternative solution to this problem, we have been curating organism-specific databases and literature during the construction of Abasy Atlas in different time points for several organisms (hereinafter referred to as historical snapshots). Namely, nine historical snapshots for E. coli, four for C. glutamicum, four for B. subtilis, and three for M. tuberculosis (Fig. 2).
Each historical snapshot represented in Fig. 2 is the most complete version of the GRNs at that time point. However, individual GRNs are also available. For example, the historical snapshot of the GRN of B. subtilis in 2017 (224308_v2017_sDBTBS08-15-SW18, Fig. 2) integrates regulatory interactions from two organism-specific databases (DBTBS [42] and SubtiWiki [5]) and one article [43] (Fig. 3). The individual GRNs are available with their own network ID (224308_v2008_sDBTBS08_eStrong, 224308_v2017_sSW18, and 224308_v2015_s15, respectively).
Note that the GRN from DBTBS is also the first historical snapshot for B. subtilis (Fig. 2), and GRNs from different sources do not need to be from the same year since a new historical snapshot integrates every previous GRNs. The network integration and homogenization from different sources enables cross-bacteria analyses with the historical snapshots.
We will continue querying organism-specific databases and curating literature periodically to obtain more complete versions of each GRNs. Also, we will extend the historical snapshots to other organisms as information will be available.

Meta-curation of GRNs: Quality control coping with inconsistency and preserving information from the different sources
The heterogeneity in gene symbols and network representations often conduces to redundancy and loss of information. Consequently, this heterogeneity can result in misleading network reconstructions. The meta-curation process mainly consists of homogenizing gene symbols and network representation before merging interactions from different sources. To cope with gene symbols disagreement among regulatory datasets from different sources, we gathered gene name, locus tag, and synonyms for each gene in the GRNs. Then, we developed an algorithm to map gene symbols onto unambiguous canonical gene names and locus tags. This allowed us to remove a total of 223 redundant nodes and 412 redundant interactions from the current set of GRNs (Supplementary Fig. 4). We refer the reader to version 1.0 of Abasy Atlas for further information about the gene symbols disambiguation algorithm [12]. For the graphical network representation, we use the unambiguous canonical gene name when available or locus tag. This eases to identify genes of interest. However, the mapping of gene identifiers allows the user to use the search box with different gene symbols and synonyms mapping to the same gene and navigate through the neighborhood of the gene of interest.
Abasy Atlas also provides the confidence level supporting each interaction since GRNs composed with different confidence-levels may bias their structural properties [14]. Therefore, a ''strong" or ''weak" confidence level is assigned to each interaction according to an expanded scheme based on the one proposed by RegulonDB [44,45]. The basic idea of the confidence level scheme is to label as ''strong" only those interactions with direct, non-ambiguous experimental support such as DNA binding of purified TF [45]. Besides, the meta-curated networks that merge regulons from different sources also integrate the effect and the evidence level. This makes the GRNs from Abasy Atlas the most complete collection of homogenous versions in contrast to those individual GRNs available in organism-specific databases. Historical snapshots for GRNs of model organisms. The completeness of the network can be measured as genomic coverage (fraction of the genome included in the GRN, black triangles) and interaction coverage (fraction of the known interactions relative to the complete network, red circles). It is evident that for some networks genomic coverage overestimates completeness as some networks may be classified as almost completed in terms of genomic coverage whereas many interactions are still missing. For instance, the GRN for C. glutamicum in 2016 is a meta-curation of the network from 2011 and a set of interactions curated in [13] including the sigA housekeeping sigmulon. On the other hand, the GRN for M. tuberculosis in 2016 is the most complete in terms of interaction coverage (97.7%) since it integrates the network from 2015 with novel interactions curated from the literature. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.) Fig. 3. Complementary sources to reconstruct the meta-curated GRN for B. subtilis. A poor overlap is observed between the different sources used to reconstruct the meta-curated GRN for B. subtilis, mainly for interactions. This highlights the need for the meta-curation since the organism-specific databases do not fully cover each other nor the dataset not previously hosted in any database. Abasy provides homogeneous meta-curations integrating all the available information.
One of the main caveats of consolidating networks is the nonmachine readable, heterogeneous way to represent the information about the way a TF regulates a specific TG and the evidence supporting such interaction, mainly for community-updated databases. To tackle this problem, we manually curate those attributes from different sources when available. Thus, Abasy Atlas makes possible to know in a homogenous fashion whether a TF promotes or hinders its TGs transcription even for interactions from a community-updated database such as SubtiWiki. Therefore, if the same interaction from a different source share effect but diverge on evidence, the interaction and the ''strong" evidence is conserved since one directly experimentally validated interaction is enough to classify the edge as ''strong" [45]. On the other hand, in case of different effects and the same evidence level, both effects are conserved in a single dual interaction to avoid redundancy. In the case that both attributes are different, only the ''strong" interaction is conserved (Supplementary Fig. 5). This meta-curation process allows us to reconstruct the most complete GRNs available preserving information from the different complementary sources (Fig. 3).

Meta-curation of GRNs: Quality control filtering spurious interactions by reassessing the confidence level of each interaction
We perform a meta-curation process to reduce the number of spurious interactions, thereby reassessing the confidence level of the interactions. Although networks with ''weak" evidence are a valuable resource to study the transcriptional regulation, only directly experimentally validated interactions offer the reliability needed to use GRNs as gold standards. Abasy Atlas eases the selection of gold standards for benchmarking purposes through readyto-download filtered ''strong" GRNs ( Supplementary Fig. 6).
Using the historical snapshots of the E. coli GRNs, we analyzed how often a regulatory interaction identified by a ''weak" methodology was validated as ''strong" evidence. We found that the number of interactions identified for each methodology varies in a wide range, as well as its fraction of predictions validated as ''strong" (Fig. 4A). Namely, ''inferred computationally without human oversight" (ICWHO) is the evidence with the lowest fraction of validated interactions ( Fig. 4A and Supplementary Fig. 7). On the other hand, ''RNA-polymerase footprinting" (RPF) is the only methodology having a 100% of interactions validated as ''strong" evidence, and >50% of ''gene expression analysis" (GEA) predictions have been validated despite being the ''weak" evidence with the highest number of predictions.
We further analyzed the effect of the interactions with ICWHO as its unique evidence, and found that most of these interactions were present in the 2013 and 2014 time points but no longer in 2015 or later. Being this the reason for the outstanding completeness of these network reconstructions and its unusual system-level elements proportions (Fig. 4B). For this reason, we decided to exclude predictions being supported only by the ICWHO evidence in Abasy Atlas. This analysis highlights the capability of the system-level properties to assess GRNs quality. It is important to note that despite the small fraction of validated interactions inferred by ''non-traceable author statement" (NTAS) (Supplementary Fig. 7), we did not remove interactions supported only by this evidence since the number of predicted interactions is very small (Fig. 4A).

Estimating GRNs completeness by leveraging their constrained complexity
The ability to quantify the total number of interactions in the complete GRN of an organism is a valuable insight that will leverage the daunting task of curation, prediction, and validation by enabling the inclusion of prior information about the network structure. Besides, the ability to track the completeness, quantified as the fraction of the known interactions from the total number in the complete network (interaction coverage), through different historical snapshots could allow to develop models on how new regulatory interactions are discovered and to provide a framework to assess network analysis and network inference tools. But, poor efforts have been directed towards the longstanding problem of how to assess the completeness of these networks. Traditionally, network genomic coverage has been used as a proxy of completeness. The genomic coverage of a regulatory network is the fraction B A Fig. 4. (A) Number of interactions identified by methods described as ''weak" in [3] and how many of these interactions have been validated by ''strong" evidence. IGI (inferred from genetic interaction), TAS (traceable author statement), TASES (traceable author statement to experimental support), NTAS (non-traceable author statement), IC (inferred by curator), IHBCE (inferred by a human based on computational evidence), RFP (RNA-polymerase footprinting), ICA (inferred by computational analysis), IEP (inferred from expression pattern), IMP (inferred from mutant phenotype), BCE (binding of cellular extracts), AIPP (automated inference of promoter position), HIPP (human inference of promoter position), AIBSCS (automated inference based on similarity to consensus sequences), ICWHO (inferred computationally without human oversight), HIBSCS (human inference based on similarity to consensus sequences), GEA (gene expression analysis) [59]. (B) Effect of removing spurious interactions through the meta-curation process. System-level elements (global regulators, modular, intermodular, and basal-machinery genes) values represent its fraction from the total genes in the E. coli GRN historical snapshots before and after removal of interactions supported only by the ICWHO evidence. of genes in the network relative to the genome size. Nevertheless, this measure poses potential biases as it neglects regulatory redundancy and the combinatorial nature of gene regulation, thus potentially overestimating network completeness.
For example, the addition of a global regulon or sigmulon (perhaps discovered by high-throughput methodologies) to a quite incomplete regulatory network could bias the genomic coverage. Assume you have a regulatory network with a genomic coverage of 15% (600/4000) and 700 interactions. You then found a paper reporting the promoter mapping for the corresponding housekeeping sigma factor, whose sigmulon has 3000 genes (400 of which were already in the original network). Next, you found that 100 out of the 3000 interactions in the global sigmulon already exist in our original network. You then integrate all the remaining 2900 new interactions to your original network to found that your resulting network has a new genomic coverage of 80% (3200/4000) and 3600 interactions. This new high genomic coverage may suggest a highly complete network but it is indeed the same quite incomplete original network plus a single global sigmulon. To clarify this, assume that the total number of interactions in the complete network is 10000, then the completeness of this new network is 36% (3600/10000). Whereas the curation of a single housekeeping sigmulon increased the completeness~30% (3600/10000 -700/10000), the new completeness is still low, and the genomic coverage is highly overestimating when is used as a proxy for completeness. Therefore, to state the completeness of a regulatory network correctly, it is fundamental to estimate the total number of interactions. Two recent works have simultaneously provided estimations on the size of GRNs [14,46].
On one hand, the RegulonDB team carried out an exploratory analysis [46]. They used a single version of the E. coli regulatory network and high-throughput datasets of binding experiments for around 15 TFs. By assuming a linear model, they found an upper-bound estimate of 45759 regulatory interactions. They claimed that only one-third of the~46000 would affect gene expression, concluding that the complete network comprises only around 13000 interactions.
Alternatively, our group recently explored the constraints on several structural properties of the 71 regulatory networks deposited in Abasy Atlas v2.0 [14]. We found that the network density (d) as a function of the number of genes (n) follows a power law as d~n Àc with c % 1. Since 1972, a seminal paper by Robert May showed that the frontier between dynamical stability and instability for a complex system follows a power law as d~n À1 , relating complexity quantified via the density of interactions and the number of variables (the size of the system) [47]. The density of interactions (network density) is the fraction of potential interactions that are real interactions, thus a constraint in network density implies a constraint in the total number of interactions in the complete network. As we found that density is constrained in GRNs, we explored three possible models to predict the total number of interactions as a function of the number of genes (see Fig. 4 in [14]): edge regression (assuming linearity, R 2 = 0.90), density invariance (assuming an invariant density, R 2 = 0.86) and density proportionality (assuming an exponential decay, R 2 = 0.91). All the models had a good fit to the data (0.86 R 2 0.91), with small differences between them. These models predicted that the total number of interactions in the complete E. coli regulatory network is~10000, 14000, and~11000, respectively. After publication, we reformulated the problem. As regulatory networks are directed and self-regulations are allowed, the maximum number of possible interactions (I max ) is n 2 as each of the n genes could regulate to other n genes including itself (selfregulation). The density of a regulatory network must be then computed as By introducing this equation into the power law found for the density of the Abasy Atlas networks (d~n Àc ), we derived another power law modeling the total number of interactions in the regulatory network as a function of the number of genes as This model has a better fit to data (Fig. 5, R 2 = 0.98) than the previous three models, and allows us to compute the total number of interactions in the regulatory network of an organism as I total( genome size) 2Àc . We implemented this model in Abasy Atlas v2.2 to provide estimations on the completeness of each regulatory network, including confidence intervals. The power-law model predicts that the complete E. coli regulatory network will have 11656 total regulatory interactions. This model can learn the tendency in the number of interactions, and it improves as more regulatory networks are included in Abasy Atlas. That is one of the reasons motivating us to continue expanding Abasy Atlas by adding new organisms and historical snapshots.

Homogeneous representation for heteromeric transcription factor complexes
Even though heterodimeric regulatory complexes are not overrepresented in regulatory networks, some of them are global regulators and their interactions control up to~10% of the genome and represent a valuable percent of the whole network (~6% in E. coli GRNs). IHF is a global regulator histone-like protein of E. coli that regulates transcription as a heterodimeric complex that is shaped by two different proteins: IhfA and IhfB. Although both subunits can form homodimeric complexes, the affinity for DNA is much lower [48], and no regulation in such fashion has been reported. For this reason, assigning the regulatory activity to each subunit (a gene-gene representation, Fig. 6B) is a misleading representation. Additionally, the RpoS sigma factor allows the transcription of both subunits conforming IHF, which in turn also regulates its subunits (Fig. 6A). Such interesting autoregulation cannot be properly represented in a gene-gene based representation (Fig. 6B). Conversely, a representation of the IHF heteromeric complex regulating ihfA and ihfB is better as it depicts the IHF conformation and links them to the TFs regulating their transcription.
This representation is also useful for subunits of heteromeric regulatory complexes that can exhibit regulation in a homodimeric fashion, such as the relB product regulating relE, hokD, and its transcription both as a homodimer and as part of the RelBE complex with relE (Fig. 6C). This RelE-RelB toxin-antitoxin system in E. coli [49] is not properly represented in a gene-gene network (Fig. 6D) as it shows regulatory activity by the relE product on its own. This representation eases the application of the networks as gold standards for inference methods such as those based on the DNA sequence and TF binding sites prediction. For analysis requiring GRNs composed only by genes, Abasy Atlas provides the required information to identify the classification of each biological entity (Supplementary Fig. 8). Currently, Abasy Atlas comprises 12 heteromeric TFs, all of them in the meta-curated GRN of E coli K-12 obtained from RegulonDB [46]. Future development includes the addition of heteromeric TFs in those organisms where this information is available.

Corynebacterium glutamicum ATCC 13,032
The PubMed database was screened to find papers published between January 2017 and August 2018 and describing new tran-scriptional regulatory interactions of C. glutamicum, in addition to the comprehensive data set previously deposited in Abasy Atlas [13]. Four new regulators of different types have been examined in detail, exerting in total 63 new direct transcriptional interactions. Moreover, the predicted regulatory role of the AraC/XylRtype protein Cg2965 (PheR) has been confirmed by experimental data [50,51]. PheR activates the expression of the phe gene (cg2966) encoding phenol hydroxylase, allowing C. glutamicum to degrade phenol by a meta-cleavage pathway. Electrophoretic mobility shift assays (EMSAs) demonstrated a direct interaction of the purified PheR protein with the phe promoter region [51]. The MarR-type regulator CrtR (Cg0725) is encoded upstream and in divergent orientation of the carotenoid biosynthesis operon crtEcg0722crtBIYEb in C. glutamicum. DNA microarray experiments revealed that CrtR acts as a repressor of the crt operon. Additional EMSAs with purified CrtR showed that CrtR binds to a region overlapping the À10 and À35 promoter sequences of the crt operon [52].
The two-component system EsrSR (Cg0707/Cg0709) controls a regulon involved in the cell envelope stress response of C. glutamicum [53]. Interestingly, the integral membrane protein EsrI (Cg0706) acts as an inhibitor of EsrSR under non-stress conditions. The resulting three-component system EsrISR directly regulates a broad set of genes, including the esrI-esrSR locus itself, and genes encoding heat shock proteins (clpB, dnaK, grpE, dnaJ), ABC transporters and putative membrane-associated or secreted proteins of unknown function. Among the target genes of EsrSR is moreover rosR (cg1324) encoding a hydrogen peroxide-sensitive transcriptional regulator of the MarR family and playing a role in the oxidative stress response of C. glutamicum [53,54].
The extracytoplasmic function sigma factor SigD (Cg0696) is a key regulator of mycolate biosynthesis genes in C. glutamicum [55]. Chromatin immunoprecipitation coupled with DNA microarray (ChIP-chip) analysis detected SigD-binding regions in the genome sequence, thus establishing a consensus promoter sequence for this sigma factor. The conserved DNA sequence motif 5 0 -GTAAC-N 17(16) -CGAT-3 0 was found in all ChIP-chip peak regions and presumably corresponds to the À35 and À10 promoter regions recognized by SigD. The rsdA (cg0697) gene, located immediately downstream of sigD, is under direct control of a SigD-dependent promoter and encodes the corresponding SigD anti-sigma factor [55].
The WhcD protein (Cg0850) interacts with WhiA (Cg1792) to exert jointly an important regulatory effect on cell division genes of C. glutamicum [56]. WhiA is an exceptional transcriptional regulator as it has been classified as a distant homolog of homing endonucleases that retained only DNA binding activity [57]. Binding of the WhcD-WhiA complex to the promoter region of the cell division gene ftsZ was observed by EMSAs using purified fusion proteins, although WhcD alone did not bind to the genomic DNA. RelB can regulate itself as a homomeric-complex, and as a heteromeric-complex with relE (C). Besides, relE can regulate neither its transcription nor RelB transcription on its own, as could be misinterpreted from (E). This same misrepresentation is observed for the IHF complex where neither of the subunits has regulatory activity as a homomeric complex.
The sequence motif 5 0 -GACAC-3 0 was found to be important for binding of the WhcD-WhiA complex to the DNA. Additionally, loss of the DNA-binding activity of WhiA in the presence of an oxidant indicated a regulatory role for this protein to control cell division of C. glutamicum under oxidative stress conditions [56]. We merge these interactions with the previous version of the GRN for C. glutamicum and included as a new historical snapshot (196627_v2018_s17) with 2317 genes (73.8% of genomic coverage) and 3444 interactions (45.8% of interaction coverage) (Fig. 2). The ''strong" version of the network was also included, containing a total of 2237 genes (71.3% of genomic coverage) and 2969 interactions (39.5% of interaction coverage).

Bacillus subtilis subtilis 168
Interactions from the most recent big update of SubtiWiki [5] were merged with the last version of Abasy Atlas including interactions from DBTBS [4] and a non-database hosted publication [43]. The result represents a new time point in the B. subtilis GRN history. Until now, four historical snapshots are available for this representative Gram-positive organism (Fig. 2), being the last one the GRN with the highest genomic coverage in Abasy Atlas.

Escherichia coli K-12 MG1655
RegulonDB [46] is one of the first organism-specific databases for transcriptional regulation data and it continues being updated. This makes E. coli the organism with a higher number of historical snapshots. Meta-curated GRNs from 2003 to 2018 depict the effect of the curation process in this Gram-negative model organism (Fig. 2). The meta-curation of the GRNs in Abasy Atlas reassesses the confidence level of the interactions (see ''Construction and content"), and integrates the regulations by TFs, sRNAs, and sigma factors from RegulonDB into a global regulatory network.

User interface
From the ''Home" page, you can find the description and statistics of Abasy Atlas, as well as links of interest. In the ''Browse", page you can find the species for which a global GRN is deposited in Abasy Atlas, along with the number of items (networks) for such species. Further, you can click on the species to identify the strains available and even the confidence level you need. After the selection of the strain and the confidence level, you will find the historical snapshots available for the GRN of interest, as well as additional information such as the genomic and interaction coverage, data sources, and fraction of the system-level components predicted by the NDA (Supplementary Fig. 9). By clicking on ''Global properties", you will find statistical and structural properties characterizing the GRN of interest. Such properties include the number of transcription factors, net-work density, size of the giant component, number of feedforward and feedback motifs, among others. On the same page, you can find the plots for degree, out-degree and clustering coefficient distributions ( Supplementary Fig. 10). We fitted these distributions to a power-law using robust linear regression of log-log-transformed data with Huber's T for M-estimation. This overcomes the negative effect of outliers, in contrast to ordinary least squares, which is highly sensitive to outliers in data.
You can directly search for a specific gene in the upper-right box from any page. Once you are visualizing the subnetwork of interest, using the interactive panel ( Supplementary Fig. 11) you can customize the visualization with several buttons and download the subnetwork as a high-definition PNG image, as well as the JSON file. Every global network can be downloaded from the ''Downloads" page ( Supplementary Fig. 6). Regulatory networks are provided in JSON data-interchange format, including NDA predictions and, when available, effect and evidence supporting regulatory interactions. JSON is an open standard file format, which is a lightweight, language-independent, widely used, data-interchange format supported by >50 programming languages (e.g., Python, R, Matlab, Perl, Julia, JavaScript, PHP) through a variety of readily available libraries. JSON uses human-readable text to store and transmit data objects consisting of attribute-value pairs and array data types. The JSON data files downloadable from Abasy Atlas are readily importable into Cytoscape for further analyses. Gene information and module annotation flat files in tab-separated-value file formats are also available for download. Information on how to parse the JSON files is available in the ''Downloads" page. The citation policy, and the methodology to identify the system-level elements and to predict the interaction coverage is available in the ''About" page. You can find additional help on the ''Help" page, and contact us on the ''Contact" page for any subject; we will appreciate your feedback.

Functionality
Following, we describe some remarkable cases where this new version of Abasy Atlas could have been applied to improve the studies: The DREAM5 consortium assessed to identify the best methodology to predict GRNs from gene expression data [28] using E. coli and Staphylococcus aureus as prokaryotic models. However, they did not study how its assessment was affected by network incompleteness. This analysis can be carried out by using the set of the historical snapshots for model organisms as gold standards. The same could be applied for other assessments such as identifying the best tools to predict TF binding sites [29], DNA motifs [29,30,59], and functional modules [31].
Further, Abasy Atlas could be used to extend those benchmarking studies to include more organisms. For example, DREAM5 considered only E. coli as a prokaryotic model to compute the overall score because a sufficiently large set of experimentally validated interactions for S. aureus did not exist at that time [28]. Currently, Abasy Atlas provides GRNs for 13 S. aureus strains, being USA300/ TCH1516 the most complete one with 25 and 30.6% of genomic and interaction coverage, respectively.
In addition to benchmarking improvements, the comprehensive atlas of GRNs that Abasy Atlas provides could be applied to study the communication that exists between the regulation of gene transcription with other mechanisms such as protein-protein interactions and metabolism [32][33][34]. Even when only the regulation of gene transcription is studied, across-organisms information provided by Abasy Atlas can be used to trace the evolution of the GRN in bacteria, and compare them using gene orthology and network alignment [35]. Future development of Abasy Atlas includes GRNs comparative analyses based on their structural properties.

Future development
Despite high-throughput strategies to study transcriptional regulation, there is a lack of novel interactions reported in contrast with earlier years (Fig. 2). Besides, only a handful of organisms have been experimentally studied. Computational approaches have been a hopeful option for non-model organisms and a plethora of algorithms to infer GRNs have emerged. Nonetheless, many of them are based solely on statistical approaches lacking biological constraints to filter spurious interactions. Previous assessments of tools to infer GRNs have unveiled their poor performance but also have shed light on the possibility to increase precision by consensus approaches and biological constraints [28].
Future development of Abasy Atlas aims to include inferred non-model organisms GRNs in a conservative fashion by different consensus-based approaches and the application of currently available data to validate predicted networks by using GRN organizing constraints, such as the composition of system-level elements (Fig. 4B) and network structural properties. The addition of heteromeric TFs for more organisms is also considered in the short-term future development. Mainly for the model organisms C. glutamicum and B. subtilis for which more information regarding regulation by heteromeric TFs is available. Besides, historical snapshots for non-model organisms already available in Abasy Atlas, such Streptomyces coelicolor will be included, while continuing including additional historical snapshots for model organisms curated from the literature and organism-specific databases. Finally, a python library providing an API to allow programmatic access to Abasy Atlas, and a REST API are under development.

Conclusions
Beyond the regulon level, Abasy Atlas provides the most complete and reliable set of GRNs for many bacterial organisms, which can be used as the gold standard for benchmarking purposes and training data for modeling and network prediction. Besides, Abasy Atlas provides historical snapshots of regulatory networks. Therefore, network analyses can be performed with GRNs having different completeness levels, making it possible to identify how a methodology is affected by the incompleteness, to pinpoint potential bias and improvements, and to predict future results. Additionally, Abasy Atlas is the first database providing estimations on the completeness of GRNs, their global regulators, modules, and other system-level components. The estimation of the total number of regulatory interactions a GRN could have is a valuable insight that may aid in the daunting task of network curation, prediction, and validation. Furthermore, the prediction of the system-level elements in GRNs has allowed unraveling the complexity of these networks and provides new insights into the organizing principles governing them, such as the diamond-shaped, three-tier, hierarchy unveiled by the NDA. The GRNs in Abasy Atlas have been metacurated to avoid heterogeneity such as inconsistencies in gene symbols and heteromeric regulatory complexes representation. This enables large-scale comparative systems biology studies aimed to understand the common organizing principles and particular lifestyle adaptations of regulatory systems across bacteria and to implement those principles into future work such as the reverse engineering of GRNs.

Availability and requirements
Abasy Atlas is available for web access at https://abasy.ccg. unam.mx. If you use any material from Abasy Atlas please cite properly. Use of Abasy Atlas and each downloaded material is licensed under a Creative Commons Attribution 4.0 International License. Permissions beyond the scope of this license may be available at jfreyre@ccg.unam.mx. Disclaimer: Please note that original data contained in Abasy Atlas may be subject to rights claimed by third parties. It is the responsibility of users of Abasy Atlas to ensure that their exploitation of the data does not infringe any of the rights of such third parties.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.