Distribution of Genes Encoding Nucleoid-Associated Protein Homologs in Plasmids

Bacterial nucleoid-associated proteins (NAPs) form nucleoprotein complexes and influence gene expression. Recent studies have shown that some plasmids carry genes encoding NAP homologs, which play important roles in the transcriptional regulatory networks linking plasmids and host chromosomes. In this study, we determined the distributions of the well-known NAPs Fis, H-NS, HU, IHF, and Lrp and the newly found NAPs MvaT and NdpA among 1382 fully sequenced plasmids from Gram-negative bacteria. We also compared NAP distributions with plasmid features (size, G+C content, and putative transferability). We found that larger plasmids frequently have NAP gene homologs. Plasmids with H-NS gene homologs had a lower G+C content. Notably, plasmids with a NAP gene homolog carried the relaxase gene, which is involved in the conjugative transfer of plasmids, more frequently than did those without one, implying that plasmid-encoded NAP homologs contribute positively to plasmid transmission.


Introduction
Bacterial chromosomal DNA is folded into a compact structure, the nucleoid. The proteins involved in folding the chromosome are known as nucleoid-associated proteins (NAPs) [1,2]. Because of their DNA-binding ability, NAPs can also play an important role in global gene regulation [1,2]. The well-known NAPs in Enterobacteriaceae are "factor for inversion stimulation" (Fis), "histone-like nucleoid structuring protein" (H-NS), "histone-like protein from Escherichia coli strain U93" (HU), "integration host factor" (IHF), and "leucine-responsive regulatory protein" (Lrp) [1]. Fis is one of the most abundant NAPs in exponentially growing E. coli cells, and its role as a transcriptional regulator has been investigated [3]. H-NS binds DNA, especially A+T-rich regions such as promoter regions and horizontally acquired DNA, and acts as a global transcriptional repressor [4]. HU and IHF are similar at the amino acid sequence level, and both are global regulators [5,6], although they have distinct DNA-binding activities: HU binds DNA nonspecifically, whereas IHF binds a consensus sequence [7]. Lrp has a global influence on transcriptional regulation and is also involved in microbial virulence [8]. In addition to these well-known NAPs, many other NAPs are found not only in Enterobacteriaceae but also in other organisms. For instance, NdpA, a NAP of unknown function, has been found in Gram-negative bacteria [9]. The MvaT family protein is the functional homolog of H-NS in Pseudomonas bacteria [10].
International Journal of Evolutionary Biology
Horizontal gene transfer (HGT), which is mediated by transduction, transformation, and conjugation, plays an important role in the evolution of prokaryotic genomes [11,12]. Genes acquired by HGT can confer beneficial traits, such as antibiotic resistance, that give their host an advantage under selective pressure [13]. However, the mechanisms by which newly acquired genes are integrated into host regulatory networks remain unclear. Recent investigations have shown that some plasmids carry genes encoding NAP homologs, which play important roles in the transcriptional regulatory networks linking plasmids and host chromosomes and in maintaining host cell fitness. For example, Doyle et al. [14] reported that a plasmid-encoded H-NS-like protein has a "stealth" function that allows the plasmid to be transferred into host cells without disrupting host regulatory networks, thereby maintaining host cell fitness. Yun et al. [15] reported that a plasmid-encoded H-NS-like protein can also play a key role in optimizing gene transcription both on the plasmid and on the host chromosome.
In this study, we determined the distributions of NAP homologs among plasmids and discussed their roles in the maintenance of plasmid and host cell fitness.

Plasmid Classification.
Plasmids in the database were classified into six groups according to their source organisms: Gram-negative, Gram-positive, archaeal, eukaryotic, viral, and unclassified. The putative transferability of each Gram-negative plasmid was assessed by whether it carried a relaxase gene belonging to any of the MOB families proposed by Garcillán-Barcia et al. [16]. Instead of the local PSI-BLAST program (ver. 2.2.24, ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/) used by Garcillán-Barcia et al. [16], we used the local TBLASTN program.
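As a rough illustration of the relaxase screen, the sketch below assumes that relaxase query proteins were searched against the plasmid sequences with BLAST+ tblastn and that the results were written in the default 12-column tabular format (-outfmt 6). The query naming scheme (a MOB family prefix before "|"), the E-value cutoff of 1e-5, and the file names in the comment are hypothetical; the cutoff actually used is not stated in this section.

```python
import csv
import io

# Hypothetical command producing the input:
#   tblastn -query relaxases.faa -db plasmids -outfmt 6 -out hits.tsv
# Column order of BLAST+ tabular output (-outfmt 6, default fields).
OUTFMT6_FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]

# Hypothetical E-value cutoff (not given in the text).
EVALUE_CUTOFF = 1e-5


def transferable_plasmids(tblastn_tsv: str, cutoff: float = EVALUE_CUTOFF) -> dict[str, set[str]]:
    """Map each plasmid (subject) to the MOB families of its significant relaxase hits.

    Assumes query IDs look like "MOBF|TraI", i.e. the MOB family comes first.
    Plasmids absent from the result have no relaxase hit and would be treated
    as putatively non-transferable.
    """
    families: dict[str, set[str]] = {}
    reader = csv.DictReader(io.StringIO(tblastn_tsv), fieldnames=OUTFMT6_FIELDS, delimiter="\t")
    for row in reader:
        if float(row["evalue"]) <= cutoff:
            mob_family = row["qseqid"].split("|", 1)[0]
            families.setdefault(row["sseqid"], set()).add(mob_family)
    return families


# Two made-up tabular lines: one significant MOBF hit, one weak MOBQ hit.
example = (
    "MOBF|TraI\tplasmidA\t45.2\t300\t150\t3\t1\t300\t100\t1000\t1e-40\t250\n"
    "MOBQ|MobA\tplasmidB\t30.1\t120\t80\t2\t1\t120\t5\t365\t0.5\t30\n"
)
print(transferable_plasmids(example))  # {'plasmidA': {'MOBF'}}
```

A plasmid hitting relaxases from several families (as with the MOBC/MOBF overlap noted later) simply collects several family labels in its set.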

Database Collection and Plasmid Classification by Origin.
We downloaded the complete sequences of 2278 plasmids from the NCBI FTP site in April 2010. After duplicated plasmids were removed manually, the remaining 2260 plasmid sequences were used in this study. To understand what types of plasmids the database contained, we classified them into six groups according to their source organisms. The database included 1382 Gram-negative, 725 Gram-positive, 81 archaeal, 43 eukaryotic, 1 viral, and 28 unclassified plasmids.
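The counts above can be cross-checked with a few lines of arithmetic: the six origin groups should account for all 2260 retained plasmids, and the difference from the 2278 downloaded sequences gives the number of duplicates removed. This is only a consistency check on the reported numbers, not part of the original pipeline.

```python
# Plasmid counts by source organism, as reported for the April 2010 NCBI download.
ORIGIN_COUNTS = {
    "Gram-negative": 1382,
    "Gram-positive": 725,
    "archaeal": 81,
    "eukaryotic": 43,
    "viral": 1,
    "unclassified": 28,
}
DOWNLOADED = 2278  # sequences before duplicate removal
RETAINED = 2260    # sequences after manual duplicate removal


def origin_shares(counts: dict[str, int]) -> dict[str, float]:
    """Return each origin group's share of the retained plasmids, in percent (1 decimal)."""
    total = sum(counts.values())
    return {origin: round(100 * n / total, 1) for origin, n in counts.items()}


# The six origin groups account for every retained plasmid.
assert sum(ORIGIN_COUNTS.values()) == RETAINED
print("duplicates removed:", DOWNLOADED - RETAINED)  # duplicates removed: 18
print(origin_shares(ORIGIN_COUNTS)["Gram-negative"])  # 61.2
```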

Identification of the Plasmids Containing NAP Gene Homologs.
Using the amino acid sequences of well-known NAPs (Fis, H-NS, HU, IHF, and Lrp) and newly found NAPs (MvaT and NdpA) as queries, we surveyed their distributions among plasmids with the TBLASTN program. Some plasmids had ORFs showing sequence similarity to both HU and IHF; in these cases, we adopted the hit with the lower (more significant) E value. Of the 2260 plasmids, 155 (7%) contained a gene encoding a NAP homolog. Of these, 116 (75%) contained only one NAP gene homolog and 39 (25%) contained more than one. No plasmid carried a Fis gene homolog. Twenty-two plasmids carried the H-NS gene homolog, all of Gram-negative origin (Table 1). Sixty-six plasmids had the HU gene homolog; of these, 51 were of Gram-negative origin and 15 of Gram-positive origin (Table 2). Twenty-seven plasmids (25 of Gram-negative and 2 of Gram-positive origin) carried the IHF gene homolog (Table 3). Forty-eight plasmids (46 of Gram-negative, 1 of Gram-positive, and 1 of archaeal origin) carried the Lrp gene homolog (Table 4); of these, 23 (48%) contained more than one Lrp gene homolog. In contrast, MvaT and NdpA homologs were encoded on only 3 plasmids, all of Gram-negative origin (Table 5). The 155 plasmids included those previously reported to carry NAP gene homologs: R27 (NC_002305) and pHCM1 (NC_003384) [18,19] with the H-NS gene homolog; pQBR103 (NC_009444) [20] with the HU and NdpA gene homologs; and pCAR1 (NC_004444) [21,22] with the MvaT, HU, and NdpA gene homologs. Recovery of these known cases supports the adequacy of our search. Because we used NAPs from Gram-negative bacteria as query sequences, it is not surprising that 136 (88%) of the 155 plasmids with NAP gene homologs belonged to the Gram-negative group. We therefore focused subsequent analyses on the Gram-negative plasmids.
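The bookkeeping behind these counts can be sketched as follows, assuming TBLASTN hits have already been parsed into (plasmid, ORF, NAP family, E value) tuples. When one ORF matches several NAP queries, as happens for HU and IHF, this sketch keeps the most significant hit (lowest E value); each plasmid is then labeled as carrying one or more than one NAP gene homolog. How hits are grouped into ORFs, and the plasmid and ORF identifiers in the example, are hypothetical.

```python
from collections import defaultdict
from typing import Iterable

Hit = tuple[str, str, str, float]  # (plasmid_id, orf_id, nap_family, evalue)


def assign_nap_homologs(hits: Iterable[Hit]) -> dict[str, list[str]]:
    """Assign one NAP family per ORF and collect the families found on each plasmid."""
    best: dict[tuple[str, str], tuple[float, str]] = {}
    for plasmid, orf, family, evalue in hits:
        key = (plasmid, orf)
        # Keep the most significant (lowest E value) NAP match per ORF,
        # e.g. when the same ORF hits both the HU and IHF queries.
        if key not in best or evalue < best[key][0]:
            best[key] = (evalue, family)

    per_plasmid: dict[str, list[str]] = defaultdict(list)
    for (plasmid, _orf), (_evalue, family) in best.items():
        per_plasmid[plasmid].append(family)
    return dict(per_plasmid)


def split_by_homolog_count(assignments: dict[str, list[str]]) -> tuple[list[str], list[str]]:
    """Separate plasmids with exactly one NAP gene homolog from those with more than one."""
    single = sorted(p for p, fams in assignments.items() if len(fams) == 1)
    multiple = sorted(p for p, fams in assignments.items() if len(fams) > 1)
    return single, multiple


# Hypothetical hits for illustration only.
example_hits = [
    ("pX", "orf1", "HU", 1e-20),
    ("pX", "orf1", "IHF", 1e-12),   # same ORF; the HU hit is more significant
    ("pY", "orf7", "Lrp", 1e-30),
    ("pY", "orf9", "Lrp", 1e-25),   # a second Lrp gene on the same plasmid
]
assignments = assign_nap_homologs(example_hits)
print(assignments)                          # {'pX': ['HU'], 'pY': ['Lrp', 'Lrp']}
print(split_by_homolog_count(assignments))  # (['pX'], ['pY'])
```

Counting genes rather than distinct families matches the way the text reports plasmids with more than one Lrp gene homolog.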

Relationships between Plasmid Size and NAP Gene Homolog Distributions.
We first compared the sizes of the 136 plasmids with NAP gene homologs with those of all 1382 plasmids in the Gram-negative group. All 1382 plasmids were divided into four size classes: small (<10 kb), intermediate (10 to 100 kb), large (100 kb to 1 Mb), and mega (>1 Mb). The size distribution of the 136 plasmids, each carrying one or more genes encoding NAP homologs, is shown in Figure 1. […] were found in 588 proteobacterial genomes. The frequency of NAP genes was higher in plasmids (1 per 236 kb) than in proteobacterial genomes (1 per 1.8 Mb), which also suggests that larger plasmids frequently carry NAP gene homologs to minimize their negative effects on the host cell.
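The four size classes and the "1 per X kb" frequency can be expressed as two small helpers. This is a sketch: how plasmids lying exactly at 10 kb, 100 kb, or 1 Mb were assigned is not stated, so the boundary handling below (lower bound inclusive, exactly 1 Mb counted as large) is an assumption, and the inputs in the usage lines are hypothetical.

```python
def size_class(length_bp: int) -> str:
    """Assign a plasmid to the small/intermediate/large/mega classes (1 kb = 1000 bp).

    Boundary handling is an assumption: 10 kb and 100 kb start the next class,
    and exactly 1 Mb is still 'large' because 'mega' is defined as >1 Mb.
    """
    if length_bp < 10_000:
        return "small"
    if length_bp < 100_000:
        return "intermediate"
    if length_bp <= 1_000_000:
        return "large"
    return "mega"


def kb_per_nap_gene(total_length_bp: int, n_nap_genes: int) -> float:
    """Average sequence length (kb) per NAP gene homolog, i.e. the X in '1 per X kb'."""
    return total_length_bp / 1_000 / n_nap_genes


# Hypothetical inputs.
print([size_class(n) for n in (5_000, 50_000, 250_000, 2_500_000)])
# ['small', 'intermediate', 'large', 'mega']
print(kb_per_nap_gene(472_000, 2))  # 236.0
```

Expressing gene frequency as kilobases per gene makes plasmids and chromosomes of very different total size directly comparable.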
Of the plasmids with NAP gene homologs, those with the H-NS gene homolog were relatively small on average (132 kb), whereas those with the Lrp gene homolog were relatively large (725 kb). The average sizes of plasmids with the other NAP gene homologs were as follows: HU (301 kb), IHF (230 kb), MvaT (244 kb), and NdpA (235 kb) (Figure 1(b)). H-NS exists in an oligomeric form and binds DNA, especially A+T-rich regions, by bridging it [25]. This function may be important for regulating gene expression on the relatively small plasmids among those with NAP gene homologs. The activity of H-NS can also be modulated by Hha-like proteins [26]. Intriguingly, TBLASTN analysis showed that 12 (55%) of the 22 plasmids with the H-NS gene homolog also carried a gene encoding an Hha-like protein, whereas only 65 (5%) of all 1382 plasmids carried such a gene (Table 6). This suggests a close relationship between H-NS and Hha-like proteins. Lrp, on the other hand, exists in dimeric, octameric, and hexadecameric forms and compacts DNA by wrapping it [27]. This distinctive DNA-binding ability may be essential for maintaining the structure of particularly large plasmids.

Relationships between Plasmid G+C Content and NAP Gene Homolog Distributions.
We next compared the G+C content of plasmids with and without NAP gene homologs. The average G+C content of the 136 plasmids with NAP gene homologs (56.4%) was higher than that of all 1382 plasmids (44.8%) (Figure 2(a)). Note that the average G+C content of large and mega plasmids (55.0% and 62.9%, respectively) was higher than that of small and intermediate plasmids (44.8% and 40.4%, respectively). Because larger plasmids frequently had NAP gene homologs, the higher G+C content of plasmids with NAP gene homologs is to be expected. Nevertheless, plasmids with H-NS gene homologs had a lower G+C content (48.3%) than did those with other NAP gene homologs, including HU (54.2%), IHF (58.7%), Lrp (62.3%), MvaT (55.6%), and NdpA (52.9%) (Figure 2(b)). H-NS family proteins bind A+T-rich regions not only on chromosomes but also on plasmids [15]. Acquisition of a large A+T-rich plasmid with many H-NS binding sites may titrate H-NS away from the host chromosome and thereby reduce host cell fitness [14]. It is therefore possible that large A+T-rich plasmids need to supply their own copy of H-NS to minimize the effect on the host cell. On the other hand, although MvaT-family proteins are functional homologs of H-NS [10,15], plasmids containing the MvaT gene homolog did not have a particularly low G+C content. Because only three plasmids contained the MvaT gene homolog, we cannot examine this interesting phenomenon in detail; the difference between H-NS and MvaT may reflect their different origins or host bacteria.
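G+C content, the quantity compared throughout this section, is simply the fraction of G and C among the unambiguous bases of a sequence. The sketch below computes it per plasmid and averages across a group. Whether the reported group averages are unweighted means of per-plasmid values or pooled over all bases is not stated, so the unweighted per-plasmid mean used here is an assumption; the sequences in the usage lines are toy examples.

```python
def gc_content(seq: str) -> float:
    """Percent G+C of a nucleotide sequence, ignoring ambiguous bases such as N."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    # Guard against sequences made only of ambiguous bases.
    return 100 * gc / (gc + at) if gc + at else 0.0


def mean_gc(sequences: list[str]) -> float:
    """Unweighted mean of per-plasmid G+C values (assumed averaging scheme)."""
    values = [gc_content(s) for s in sequences]
    return sum(values) / len(values)


# Toy sequences for illustration.
print(round(gc_content("GGCCAT"), 1))  # 66.7
print(mean_gc(["GGCC", "ATAT"]))       # 50.0
```

A pooled average would instead weight each plasmid by its length, which would pull the group mean toward the G+C content of the largest plasmids.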

Relationships between Plasmid Transferability and NAP Gene Homolog Distributions.
Conjugative transfer is an essential function of plasmids, through which they play an important role in bacterial evolution and host cell behavior [11,12]. Relaxase is essential for plasmid transmission: it cleaves the transferring DNA at the origin of transfer (oriT), so plasmids carrying relaxase genes are considered transmissible. Garcillán-Barcia et al. [16] proposed that transmissible plasmids can be classified into six MOB families (MOBC, MOBF, MOBH, MOBP, MOBQ, and MOBV) according to the amino acid sequences of six prototype relaxase proteins. The MOBF and MOBH families consist predominantly of conjugative (self-transmissible) plasmids, whereas the other four families include both mobilizable and conjugative plasmids. Recent studies have reported that plasmid-encoded H-NS family proteins have a "stealth" function and aid the horizontal transfer of plasmids [14,15]. Other NAPs also act as global transcriptional regulators and may regulate the expression of genes involved in plasmid transmission. To examine the relationship between NAP gene homolog distribution and plasmid transferability, we determined the distribution of relaxase genes in Gram-negative plasmids according to the classification of Garcillán-Barcia et al. [16]. Of the 1382 Gram-negative plasmids, 409 (30%) carried relaxase genes, and 71 (17%) of those 409 plasmids also carried NAP gene homologs. Conversely, 71 (52%) of the 136 plasmids with NAP gene homologs carried relaxase genes, compared with 338 (27%) of the 1246 plasmids without NAP gene homologs. Thus, plasmids with NAP gene homologs carried relaxase genes more frequently than did those without. This may be related to plasmid size: the average size of the 409 plasmids with relaxase genes (145 kb) was larger than that of all 1382 plasmids (83 kb), consistent with the finding that larger plasmids frequently had NAP gene homologs.
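The comparison between plasmids with and without NAP gene homologs follows from a simple 2 × 2 breakdown of the reported counts: 1382 plasmids in total, 409 with relaxase genes, 136 with NAP gene homologs, and 71 with both. The sketch below derives the three percentages; it only rearranges the numbers given in the text and applies no statistical test.

```python
def relaxase_by_nap_status(total: int, with_relaxase: int, with_nap: int, both: int) -> dict[str, float]:
    """Relaxase-gene frequencies (%) derived from a 2 x 2 breakdown of plasmid counts."""
    relaxase_without_nap = with_relaxase - both  # relaxase+, NAP-
    without_nap = total - with_nap               # all NAP- plasmids
    return {
        "with_NAP": 100 * both / with_nap,                          # relaxase among NAP+
        "without_NAP": 100 * relaxase_without_nap / without_nap,    # relaxase among NAP-
        "NAP_among_relaxase": 100 * both / with_relaxase,           # NAP among relaxase+
    }


shares = relaxase_by_nap_status(total=1382, with_relaxase=409, with_nap=136, both=71)
for label, pct in shares.items():
    print(f"{label}: {pct:.0f}%")
# with_NAP: 52%
# without_NAP: 27%
# NAP_among_relaxase: 17%
```

Comparing 52% with 27%, rather than with the 30% overall rate, keeps the NAP-carrying plasmids out of the baseline they are being compared against.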
The 409 plasmids were assigned to MOB families as follows: 13, MOBC; 128, MOBF; 29, MOBH; 86, MOBP; 131, MOBQ; and 26, MOBV. Plasmid 1 (NC_008545) was classified into both the MOBC and MOBF families, and the MOBP, MOBQ, and MOBV families partially overlapped, as described by Garcillán-Barcia et al. [16]. The 71 plasmids with NAP gene homologs were distributed among the MOB families as follows: 1, MOBC; 11, MOBF; 20, MOBH; 8, MOBP; 30, MOBQ; and 2, MOBV. Intriguingly, 20 (69%) of the 29 MOBH-family plasmids encoded NAP homologs, most of which were H-NS or HU (Table 7). The MOBH family consisted predominantly of large conjugative plasmids, such as the IncHI1 group, suggesting that HU, like H-NS, may contribute to plasmid transmission. Furthermore, 30 (23%) of the 131 MOBQ-family plasmids also contained NAP gene homologs, and 15 (50%) of these carried Lrp gene homologs (Table 8). The MOBQ family included both mobilizable and conjugative plasmids, such as those of Rhizobium and Agrobacterium, implying that Lrp may also affect plasmid conjugation. In the other MOB families, plasmids containing NAP gene homologs accounted for less than 10% (MOBC, 8%; MOBF, 9%; MOBP, 9%; and MOBV, 8%). This pattern may also be related to the average size of the plasmids in each MOB family: plasmids in MOBH (220 kb) and MOBQ (198 kb) were larger on average than those in MOBC (78 kb), MOBF (117 kb), MOBP (87 kb), and MOBV (149 kb). The average G+C content of the plasmids in each MOB family was as follows: MOBC (52%), MOBF (52%), MOBH (51%), MOBP (53%), MOBQ (54%), and MOBV (46%). No relationship was found between the distribution of NAP gene homologs across MOB families and plasmid G+C content.
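The per-family percentages can be recomputed from the counts above. Because the MOB families overlap (one plasmid falls into both MOBC and MOBF, and MOBP, MOBQ, and MOBV partially overlap), the family totals sum to 413 and 72 rather than 409 and 71; the sketch keeps the counts per family and does not try to resolve the overlap.

```python
# (plasmids in family, of which carry NAP gene homologs), as reported in the text.
# Families overlap, so the columns sum to 413 and 72 rather than 409 and 71.
MOB_COUNTS = {
    "MOBC": (13, 1),
    "MOBF": (128, 11),
    "MOBH": (29, 20),
    "MOBP": (86, 8),
    "MOBQ": (131, 30),
    "MOBV": (26, 2),
}


def nap_share_by_family(counts: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Percent of plasmids in each MOB family that carry a NAP gene homolog."""
    return {fam: 100 * nap / n for fam, (n, nap) in counts.items()}


# Print families from the highest to the lowest NAP share.
for fam, pct in sorted(nap_share_by_family(MOB_COUNTS).items(), key=lambda kv: -kv[1]):
    print(f"{fam}: {pct:.0f}%")
```

Sorting by share makes the contrast between MOBH and MOBQ and the remaining four families (all below 10%) immediately visible.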
Conclusions.
We compared the distribution of NAP gene homologs among plasmids with plasmid features. Larger plasmids frequently had NAP gene homologs, possibly to support their own maintenance and host cell fitness. Plasmids with NAP gene homologs also frequently carried relaxase genes. Although this may partly reflect their relatively large size, the fact that NAPs act as global gene regulators makes it likely that plasmid-encoded NAPs contribute to plasmid transmission. Given that NAPs encoded on plasmids actually help the host cell integrate newly acquired genes into host regulatory networks [14,15], large plasmids with NAP gene homologs may generally be more beneficial not only for the host cell but also for their own persistence.
NAP homologs encoded on plasmids can interact with different types of NAPs encoded on the host chromosome and cooperatively regulate host transcriptional networks. Understanding these mechanisms in more detail will shed light on the significance of the distributions of NAPs on plasmids and chromosomes. Comprehensive analysis of NAP binding sites in host and plasmid genomes will help clarify the relationship between G+C content and the presence of NAPs. Such information should help explain how bacteria adapt and evolve by acquiring foreign genes through HGT.