Identification and analysis of genomic islands in Burkholderia cenocepacia AU 1054 with emphasis on pathogenicity islands

Genomic islands (GIs) are genomic regions that reveal evidence of horizontal DNA transfer. They can code for many functions and may augment a bacterium’s adaptation to its host or environment. GIs have been identified in strain J2315 of Burkholderia cenocepacia, whereas in strain AU 1054 there has been no published works on such regions according to our text mining and keyword search in Medline. In this study, we identified 21 GIs in AU 1054 by combining two computational tools. Feature analyses suggested that the predictions are highly reliable and hence illustrated the advantage of joint predictions by two independent methods. Based on putative virulence factors, four GIs were further identified as pathogenicity islands (PAIs). Through experiments of gene deletion mutants in live bacteria, two putative PAIs were confirmed, and the virulence factors involved were identified as lipA and copR. The importance of the genes lipA (from PAI 1) and copR (from PAI 2) for bacterial invasion and replication indicates that they are required for the invasive properties of B. cenocepacia and may function as virulence determinants for bacterial pathogenesis and host infection. This approach of in silico prediction of GIs and subsequent identification of potential virulence factors in the putative island regions with final validation using wet experiments could be used as an effective strategy to rapidly discover novel virulence factors in other bacterial species and strains.


Background
B. cenocepacia is one of the major pathogens infecting patients with cystic fibrosis (CF) as well as some other chronic respiratory diseases such as bronchiectasis [1][2][3]. As an opportunistic pathogen, it can cause chronic lung infections in these patients. B. cenocepacia belongs to the Burkholderia cepacia complex (BCC), which consists of at least 17 genetically distinct but phenotypically similar species [4]. B. cenocepacia was initially the species most commonly isolated from patients with CF, although almost all BCC species have now been isolated from CF populations [5,6]. By using recA gene sequence analysis and multilocus sequence typing, B. cenocepacia may be subdivided into four phylogenetic clusters, IIIA to IIID [7]. However, almost all clinically relevant isolates belong to the IIIA and IIIB groups [8,9]. Epidemiological studies showed that strains ET-12 and several other epidemic dominant in Canada and Europe are part of the IIIA subgroup [10]. In comparison, the dominant epidemic clones in the USA belong to subgroup IIIB [11].
With thousands of sequenced bacterial genomes, there are (at the time of writing: December, 2014) seven assembled genomes from the B. cenocepacia species (ftp:// ftp.ncbi.nlm.nih.gov/genomes/archive/old_refseq/Bacteria/): strains J2315, H111, H2424, MC0-3, AU1054, DDS 22E-1 and DWS 37E-2 [12]. Without exception, all strains possess three chromosomes of unequal sizes. Since sequences of these seven genomes were completely published, they have been extensively used in many comparative genomics and computational genomic studies [13][14][15][16][17][18][19]. For example, we reported that the AU 1054 strain has a distinct gene distribution regarding the most important genes, i.e. essential protein-coding genes and tRNA genes [20]. Its third chromosome contains a higher number of these genes than the larger chromosome II [20]. However, this pattern is absent in the other strains and results from segment translocation between chromosomes I and III [20]. Due to the fact that large-scale translocation has been reported in very few bacteria, this work was often listed as one type of example of chromosome translocation in bacterial genomes [21,22]. In addition, genomic islands (GIs) have been extensively investigated in the strain J2315 [23]. A total of 14 GIs were revealed in this strain and these GIs occupied 9.3% of its 8.06 Mb chromosome. Interestingly, none of them were found as conserved entities in the two available genomes of B. cenocepacia IIIB strains, AU1054 and HI2424 [23].
To further understand genome plasticity and reveal potential pathogenicity islands (PAIs), we identified and analyzed GIs in the strain AU 1054. Consequently, 21 GIs were identified through combining multiple methods. These GIs occupied 7.26% of the complete genome. GIs usually exhibit specific characteristics [24,25]. First, GIs, particularly those recently inserted, tend to have a distinct composition to that of the host genome, and this feature is generally measured by G + C base deviation. Second, transposases and integrases, as mobility genes, may aid host incorporation of the GIs [25,26] and hence many GIs contain high proportions of mobility genes. Third, tRNA genes, as another type of marker gene [27], often flank GI borders [25,26]. Fourth, a recent study found that GIs contain higher ratio of hypothetical proteins (predicted proteins with unknown functions) than the rest of the genome [28]. Furthermore, virulence genes more frequently appear in GI regions. Analyses of these features in the 21 putative GIs indicated that they constituted reliable predictions given that each of them was found to contain multiple typical features. Moreover, four GIs were determined as PAIs since they contain putative or recognized virulence factors.

B. cenocepacia genomes
Eight strains of B. cenocepacia were employed in this work: these are AU 1054, J2315, H2424, HI11, MC0-3, DDS 22E-1, DWS 37E-2 and K56-2. Of these, only the genome of K56-2 has not been sequenced [12]. Sequence data and annotation information of the seven sequenced strains were downloaded from the ftp site of NCBI Refseq in June 2014 [12]. Each genome contains three chromosomes, named I, II and III based on descending order of sequence length. Note that there exist seven other sequenced genomes of this species, but these sequences remain as highly separated fragments and have not been assembled. Therefore, we could not analyze these in a whole chromosome mode as our methods required.

The cumulative GC profile
The cumulative GC profile method proposed by Zhang and Zhang [29] was used to identify GIs in dozens of prokaryotic genomes [30][31][32][33]. Briefly, a chromosomal sequence is projected into a curve called a cumulative GC profile, after the Z transform, linear fitting, and noise filtering [28]. Three basic characteristics of the curve should be indicated: (i) if a region in the curve is almost a straight line, the GC content remains nearly constant within this region. (ii) An elevation (or a decrease) in the profile indicates a reduction (or increase) in GC content. (iii) Any maximum (minimum) point in the curve indicates a turning point, where the GC content undergoes an abrupt change from a relatively GC-poor (GC-rich) region to a relatively GC-rich (GC-poor) region [28].
GIs are typically relatively homogeneous in terms of GC variation, and this fact implies that its curve appears as a straight line [28][29][30][31][32][33]. According to this characteristic, we could roughly identify candidate GIs by inspection using human eye. To ensure accurate results, an additional index named 'h' is employed [28], which describes the homogeneity of GC content of GIs more accurately. If h is significantly less than 1, the variations of GC content of GIs may be considered to be small [33]. In this work, h = 0.1 is taken as the threshold for deciding a potential GI. For more details of the systematic method, please refer to Zhang and Zhang [29].
The IslandViewer web tool 'IslandViewer' is a freely available web tool for predicting GIs [34,35] and has been widely utilized in identification and characterization of bacterial GIs [36]. For thousands of sequenced bacteria, the web site offers detailed information about their GIs. For an anonymous genome, the tool can perform an automatic search for GIs through composition bias-based methods or comparative genomic approaches. In addition to predictions from cumulative GC profiles, we also downloaded GI information for the AU 1054 strain from IslandViewer.
Combining the cumulative GC profile and the IslandViewer web tool to obtain reliable predictions To minimize false-positive predictions, we only retained those GIs predicted by both the cumulative GC profiles and IslandViewer. Such results would have fewer falsepositive predictions, although a few actual GIs may be missed by this combinatorial strategy. That is to say, GIs identified by the convergence of the two methods would be more likely to be authentic GIs than those obtained using the individual methods.

Construction of non-polar deletion mutant strains
Primers used for deletion mutagenesis are listed in Table 2. The unmarked, non-polar mutant strains were constructed as described by Flannagan et al. [37], with slight modifications. Briefly, 5'-and 3'-flanking regions of target genes (copR and lipA) were amplified by PCR from chromosomal DNA, respectively, and the individual PCR products were mixed to generate an in-frame deletion pattern of target genes by an overlapping PCR method, as described previously [38,39]. Then, the overlapping amplicon containing the in-frame deletion pattern of target genes was sub-cloned into pGPI-SceI, resulting in recombinant plasmids pGPI-copR and pGPI-lipA respectively. The resulting plasmids were transferred into B. cenocepacia by tri-parental mating using E. coli HB101 carrying the helper plasmid pRK2013. The single Tpresistant colonies were then selected as candidates with a single recombination, which was confirmed by PCR. The plasmid pDAI-SceI was introduced by tri-parental conjugation, with the help of pRK2013, to obtain doublecrossover event mutants. The pDAI-SceI plasmid was resolved by curing the exconjugants in LB broth. All constructs and mutants were confirmed by PCR analysis and verified by DNA sequencing.

Complementation of mutant strains
The coding regions of copR and lipA genes were amplified from chromosomal DNA of AU1054. The PCR products were digested with NdeI and XbaI and inserted into a similarly digested plasmid, pDAI-SceI [37,40] resulting in the final constructs of pDA-copR and pDA-lipA. The complementing plasmids were introduced into the desired mutant strains by triparental mating as described above.

Growth kinetics
To examine the growth kinetics of wild-type (WT) and mutant strains, overnight bacterial cultures of test strains were diluted 1:100 into LB broth and further cultured with agitation (250 rpm) at 37°C. One ml samples of cell suspension were monitored at different time points by measuring absorbance at 600 nm (OD 600 ) using a spectrophotometer as described previously [39]. Experiments were performed in triplicate and repeated three times.

Cell lines and cell culture
The A549 cell line (American Type Culture Collection) is a human alveolar epithelial carcinoma cell line. A549 cells were grown in F-12 K tissue culture medium supplemented with 10% fetal bovine serum and penicillin/ streptomycin 100 U/ml (Gibco) in a humidified atmosphere at 37°C with 5% CO 2 . For bacterial invasion and replication experiments, A549 cells (2 × 10 5 cells per well) were seeded in 24-well tissue culture plates (BD Falcon). The cultures were grown at 37°C with 5% CO 2 . Prior to infection, cells were incubated overnight in antibiotic-free medium.

Bacterial invasion and intracellular replication assay
The ability of B. cenocepacia to invade A549 epithelial cells was examined. Invasion assays were performed by a modification of the gentamicin protection assay described previously [39]. Briefly, A549 cells were infected with mid-log phase (OD 600 = 0.5) AU1054 at a multiplicity of infection (MOI) of 10. Infected monolayers were centrifuged (300 × g for 5 min) and incubated at 37°C in a humidified atmosphere with 5% CO 2 for 2 h to allow bacterial entry. Media from the wells were then aspirated and washed three times in phosphate-buffered saline (PBS) to remove unbound bacteria. Extracellular bacteria were then killed by incubation for 2 h in medium containing 2 mg/ml Cef and 2 mg/ml Ami. Cells were washed with PBS, trypsinized and lysed with 0.1% Triton X-100. Intracellular bacteria were quantitated by plating serial dilutions of cell lysates. For intracellular replication assays, after extracellular killing and PBS washing, cells were further incubated in F-12 K medium containing Ami and Cef for 24 h, then trypsinized, lysed and plated to determine the abundance of intracellular bacteria. Bacterial CFUs recovered at 24 h were used to calculate the recovery rate of intracellular replication relative to the baseline values of bacterial invasion at 2 h. Experiments were repeated in triplicate.

Bacterial adhesion assay
A bacterial adhesion assay was performed as previously described with slight modification [41,42]. Briefly, A549 cells were seeded into 24-well tissue culture plates at 2 × 10 5 cells per well and incubated at 37°C with 5% CO 2 for 24 h before infection. Bacterial infection of cell lines was as described above for invasion experiments with an MOI of 50:1. Infected cells were incubated for 1 h at 37°C followed by rinsing five times with PBS to remove non-adherent bacteria. Cells were then trypsinized and lysed with 0.1% Triton X-100 and CFUs were counted by plating serial dilutions of cell lysates on LBA plates.

Identified GIs and their characteristics
Compared with non-island regions, GIs tend to exhibit more consistent GC contents, reflected by the fact that GI regions usually appear as straight lines within the cumulative GC profile [29,33]. In addition, an ascension (drop) line within the cumulative GC profile indicates that the region has a lower (higher) GC content than the rest of the host [43]. Through combining the cumulative GC profile method and IslandViewer [29,35], 14 putative GIs on chromosome I and 7 GIs on chromosome II were identified. However, there are no common predictions between the two methods for chromosome III. Details of the 21 GIs on chromosome I and II are listed in Table 3. All the 21 GIs are AT-rich, given that both chromosomes have average G + C contents of 0.669. As an example, the homogeneity and AT-richness of seven GIs on chromosome II could be illustrated by their patterns within the chromosomal cumulative GC profile (Fig. 1), where all GIs show as ascending straight lines. It was noted there are also several other segments approximating straight lines, but these were filtered out by an additional check of the h index and the IslandViewer tool. Here, all the 21 identified GIs are AT-richer than the core genomes, but they have different bias of G + C extents.
As indicated, authentic GIs exhibit certain six specific characteristics. These six characteristics are summarized in Table 3 for all 21 GIs. As can be seen for the feature of G + C bias, GI (1934646..1946682) has the smallest bias   belongs to the same COG group with the virulence gene in J2315 and K56-2, the gene will be regarded as putative virulence gene in AU1054. Since COG annotation has been taken as a routine tool of function annotation in prokaryotes, we think that such type transfer of virulence annotation will be much reliable. By COG match, we obtained a total of 118 putative virulence factors in the strain AU1054 and these are listed in Additional file 1: Table S1. Homologues of these have been experimentally determined to be associated with pathogenesis in another strain of B. cenocepacia [4,44]. Furthermore, 14 genes have been shown to be associated with virulence just in the strain AU1054 by transposon mutagenesis and screening attenuated virulence [45]. Additional file 1: Table S2 lists these confirmed virulence factors. There is only one overlap (Bcen_2776) between the two lists.
With such information from these genes, we could identify which islands are PAIs. Consequently, four GIs are found to contain putative or confirmed virulence factors. Two identified PAIs are located on chromosome I, and they are referred to as PAI 1 (311461..333231) and PAI 2 (1672260..1684139). Regarding the virulence factors, Bcen_0292 is the homologue (identity = 63%) of the experimentally validated VF BCAL3240 in the strain J2315 [4,44], whereas Bcen_1509 is a validated VF just in the strain AU1054 [45]. Therefore we refer the former as a putative PAI because it contains only the putative virulence factor of AU1054, whereas the latter is a confirmed PAI given that it contains one virulence factor validated as in the strain.
In total, among the 118 putative or confirmed virulence factors, 15 are found to be located in island regions and this means island regions contain 12.7% of all putative virulence factors. However, two of the 14 confirmed virulence factors are located in island regions, corresponding to a ratio of 14.3%. Both ratios are significantly higher (U test: U = 2.03, p < 0.05 in the former case and U = 6.80, p = 0.000000001 in the latter case) than the percentage (7.4%) of the total size of all islands divided by the total chromosome size. This result is consistent with a previous investigation which showed that GIs regions disproportionately contain more virulence factors than the remainder of a given genome [46].
The definition of GIs refers to genomic regions present in one strain but absent in closely related strains [24]. Therefore, it is interesting to investigate the appearance of the four PAIs in the other six sequenced strains of B. cenocepacia. As Fig. 2 shows, these four PAIs indeed have abnormal phylogenetic distributions if we only consider the homologues with significant match (or hit). For example, PAI 1 shares homologies only in strain HI2424 (Fig. 2a), indicating this strain has the closest evolutionary distance to AU 1054 among the six B. cenocepacia strains. PAI 2 exhibits homologues in strains HI2424 and MC0-3 (Fig. 2a), but the former has a higher similarity, indicating MC-03 is the next closest strain to AU1054. PAI 3 reveals a similar case with PAI 2.

Identification of novel virulence factor determinants or effectors from putative PAIs
In order to further verify our bioinformatics analyses for the PAIs, two putative PAIs were selected for further confirmation with wet experiments. Gene lipA (Bcen_0285) from PAI 1, encoding a capsule polysaccharide modification protein, and gene copR (Bcen_1505) from PAI 2, encoding a transcriptional regulatory protein, were selected and deleted, respectively. These two genes attracted our attention because that they have not been determined as virulence factors in the species B. cenocepacia according to two comprehensive reviews [4,44], but constitute virulence factors in distant bacteria after our in silico comparison with VFDB [47]. We first examined the growth kinetics of the WT AU1054 strain, compared with mutant strains AU1054ΔlipA and AU1054ΔcopR. As shown in Fig. 3a, the growth rate of each mutant was unaltered in comparison to WT AU1054, indicating that deletion of lipA and copR genes does not affect bacterial growth rate. We then determined bacterial survival and replication properties of WT AU1054 and mutant strains AU1054ΔlipA and AU1054ΔcopR in cell line A549, 24 h post-infection. Results showed that mutants with lipA and copR deletion had a significantly lower intracellular multiplication than that of WT AU1054. When the mutant strains were complemented with pDA-lipA and pDA-copR respectively, their intracellular replication abilities were fully restored (Fig. 3b), That is to say, if a strain has the largest homologous match length, it will be assigned most adjacent with AU 1054. Note that confirmed or putative VFs are marked on the bar of AU 1054 as blue box Fig. 3 Survival in and adherence to A549 cells of B. cenocepacia strains. a Growth of WT AU1054 versus mutant strains AU1054ΔlipA and AU1054ΔcopR cultured in LB. The optical density at OD 600 was measured hourly over 14 h. b Intracellular survival of WT AU1054 and derivative mutant strains in A549 cells. Bacterial infections were performed with MOI of 10, and bacterial survival was represented as recovery rate of CFUs at 24 h relative to that at 2 h. c Bacterial adherence assays with different AU1054 strains in A549 cells. Bacterial infections were performed with an MOI of 50. Adherence values were calculated by determining the percentages of bacterial CFUs after adhesion relative to that of original CFUs added for infection (*P < 0.05; **P < 0.01; ***P < 0.001; ns: no significant difference) confirming that lipA and copR genes play a significant role for bacterial survival and replication in human cell lines. The importance of the genes lipA (from PAI 1) and copR (from PAI 2) for bacterial invasion and replication indicates that they are required for full invasiveness of B. cenocepacia and may function as virulence determinants for bacterial pathogenesis and host infection.
Furthermore, there is evidence that the bacterial capsule could affect adhesion to host cells, influence the elimination by human neutrophils, and modulate the virulence in animal models of infection [48,49]. In order to determine if the observed replication and invasion defect of the mutants was due to reduced binding to A549 cells, we performed adhesion assays employing WT and mutant strains and compared their adherence characteristics. Consistent with the contribution observed for the capsule in other bacteria for intracellular invasion [48], capsule mutant strain AU1054ΔlipA was approximately two-fold higher in adherence capacity to human cells than the encapsulated strains, including the WT and mutant strain AU1054ΔcopR. The adherence of AU1054ΔlipA could be compromised in comparison to that of WT strain after complementation with pDA-lipA (Fig. 3c), indicating that the capsule negatively affects AU1054 adhesion to human epithelial cells. Taken together, although capsule-deficient mutants were internalized much more efficiently than that of encapsulated bacteria, their invasion and replication abilities were much lower in infected cells, suggesting that the capsule contributes to the establishment of bacterial infection.
Two novel virulence-related genes lipA from PAI 1 and copR from PAI 2 were confirmed by wet experiments. Sequences of the two genes were extracted from '.ffn' file. BLASTn search was performed via NCBI blast web server (https://blast.ncbi.nlm.nih.gov/Blast.cgi) with optional organism 'Burkholderia cenocepacia (taxid:95486)' and found that gene lipA has homologous gene in strain HI2424. Using the same method, we identified another six homologous genes of copR in strain J2315, H111, HI2424, MC0-3, DDS 22E-1, DWS 37E-2. In order to verify whether the homologous genes with lipA and copR are located in island region or the core genome in the other six strains, we also identified islands for them. Consequently, the homologous gene of lipA, Bcen2424_0769, is located with genomic island (864830..881195) in strain HI2424. Conversely, all the homologous genes of copR are located in the core genomes of the six strains. Specifically, the PAI 1 located in the chromosome I of strain AU1504 contains genes Bcen_0284-Bcen_0285 and Bcen_0287-Bcen_0298. Among them, Bcen_0285 corresponds to lipA and its function is capsule polysaccharide biosynthesis PAI 1.

Discussion
GIs are genomic regions that reveal evidence of horizontal DNA transfer, particularly in bacteria [10]. GIs can code for many functions, symbiotic or pathogenic, and may augment an organism's adaptation to the host or environment [50]. Two steps were involved with GI integration and formation [51]. Previously, GIs were found to disproportionately contain more virulence factors than the rest of a given genome [46]. Here, the ratios of virulence factors in island regions and in the remaining genome in the strain B. cenocepacia AU 1054 are 12.7% and 7.4%, respectively, consistent with previous observations. Furthermore, GIs tend to contain distinguishing characteristics, such as limited length range, distinct composition, mobility genes, flanking tRNA and higher ratios of hypothetical genes [25,26]. After thorough analysis, all 21 GIs identified here share at least one conserved feature, suggesting that the predictions are highly reliable. However, a high rate of false-positive predictions is a widely reported phenomenon using the existing GI predicting methods. To compare the result of combining the two methods and only retaining the common predictions, we also analyzed features of GIs exclusively predicted by the cumulative GC profile or IsandViewer tool. As shown in Additional file 1: Tables S3 and S4, four of the 16 GIs exclusively predicted by the former method are GC richer than the host, and four also have higher GC contents among the 22 GIs exclusively predicted by the latter method. The property of GC richness decreases the possibility that they constitute genuine predictions. As for the other five typical features, 21 GIs share 2.24 common features on average, whereas 38 exclusive GIs only exhibit 1.29 on average. The difference between the two average numbers is statistically significant (p = 0.0007 by student's t-test). Therefore, the approach of coupling two independent computational methods and selecting common predictions has been shown to be successful, and the final predictions do indeed have lower false-positives.
Four GIs were identified as putative PAIs by combining island-prediction tools and identification of putative virulence factors in strain AU 1054. Although these genes have been validated as enhancing pathogenicity in other strains of B. cenocepacia, they are not validated in AU 1054, hence the islands are termed 'putative' PAIs. With bioinformatics tools, we identified two further potential virulence factors in two of the putative PAIs. These factors were not previously validated in B. cenocepacia but have been validated in distantly related species. Using knock-out experiments in viable host cells, we demonstrated the role of these factors to favor infection. Our strategy of jointing two tools may be used to identifying GIs in other bacterial genomes. We downloaded the 13 GIs of the strain B. cenocepacia J2315 identified by comparative genomics method [23], and use this dataset as the gold standard, 9 of them were also identified by our combining methods of cumulative GC plot and the Island Viewer. We obtained precision value of 42.86% for the Cumulative GC profile [52], whereas IslandViewer [29,35] has the precision of 12.63%. After combing the two methods, the precision increases to 50%. Following this approach, we propose a convenient and rapid pipeline to identify novel virulence factors in certain pathogenic strains. First, GIs can be rapidly identified using computational techniques. Second, DNA sequence homology searches of the genes contained in GI regions can identify possible virulence factors. Finally, gene deletion experiments may validate (or otherwise) the function of the predicted virulence factors. Because these genes contribute to pathogenicity in distantlyrelated species, and that virulence factors are frequently associated with GIs [46], the predicted virulence factors in island regions have much-elevated likelihood to authentically contribute to infection.

Conclusions
In this work, we identified 21 (GIs) in B. cenocepacia strain AU 1054 by combining two computational tools. Feature analyses suggest that the predictions are reliable and hence illustrate the advantage of joint predictions by two independent methods. Four GIs were further identified as PAIs because they contain putative virulence factors. Two PAIs were confirmed by experimental validation of virulence related functions for genes in them. Such approach of theoretically predicting GIs, and then identifying potential virulence factors in the island regions with final validation using wet experiments may be used to discover or validate virulence factors in other bacterial species and strains.

Additional file
Additional file 1: Table S1. List of putative virulence factors in AU 1054 generated by COG ID match with known VFs in the strain J2315, and information of known virulence factors in the J2315 were obtained from Ref. 4