Combining Functional and Structural Genomics to Sample the Essential Burkholderia Structome

Background The genus Burkholderia includes pathogenic gram-negative bacteria that cause melioidosis, glanders, and pulmonary infections of patients with cancer and cystic fibrosis. Drug resistance has made development of new antimicrobials critical. Many approaches to discovering new antimicrobials, such as structure-based drug design and whole cell phenotypic screens followed by lead refinement, require high-resolution structures of proteins essential to the parasite. Methodology/Principal Findings We experimentally identified 406 putative essential genes in B. thailandensis, a low-virulence species phylogenetically similar to B. pseudomallei, the causative agent of melioidosis, using saturation-level transposon mutagenesis and next-generation sequencing (Tn-seq). We selected 315 protein products of these genes based on structure-determination criteria, such as excluding very large and/or integral membrane proteins, and entered them into the Seattle Structural Genomics Center for Infectious Disease (SSGCID) structure determination pipeline. To maximize structural coverage of these targets, we applied an “ortholog rescue” strategy for those producing insoluble or difficult-to-crystallize proteins, resulting in the addition of 387 orthologs (or paralogs) from seven other Burkholderia species into the SSGCID pipeline. This structural genomics approach yielded structures from 31 putative essential targets from B. thailandensis and 25 orthologs from other Burkholderia species, giving an overall structural coverage of 49 of the 406 essential gene families, with a total of 88 depositions into the Protein Data Bank. Of these, 25 proteins have properties of a potential antimicrobial drug target, i.e., no close human homolog, part of an essential metabolic pathway, and a deep binding pocket. We describe the structures of several potential drug targets in detail.
Conclusions/Significance This collection of structures, solubility and experimental essentiality data provides a resource for development of drugs against infections and diseases caused by Burkholderia. All expression clones and proteins created in this study are freely available by request.


Introduction
Gram-negative bacteria of the genus Burkholderia include the pathogenic species B. pseudomallei and B. mallei, potential bioterrorism agents and the causative agents of melioidosis and glanders, respectively, and B. cenocepacia, which causes often-fatal pulmonary infections in patients with cancer and cystic fibrosis [1][2][3].
Treatment of these infections is challenging due to intrinsic and acquired drug resistance [4,5]. New approaches are needed to develop antibiotics less susceptible to drug resistance.
A first step in focusing a search for new antimicrobials is to identify the set of genes required for survival of the pathogen. Methods to determine a minimum set of essential genes include experimental approaches based on genome-wide gene disruption or systematic mutagenesis [6][7][8][9][10], and bioinformatic methods based on comparative analysis of genomes [11,12]. Experimentally determined counts of essential genes in infectious bacteria range from <200 to >600 [10][11][12][13], with estimates for Burkholderia using computational methods ranging from 312 to 649 [11,14]. There have been no whole-genome essentiality studies in the genus Burkholderia. The order Burkholderiales was estimated to have 610 orthologous gene families conserved among all 51 species, using an all-against-all BLAST search of the 51 proteomes and clustering into ortholog groups using OrthoMCL [11]. These 610 ortholog groups corresponded to 649 genes in B. cenocepacia, 454 of which had homologs in the Database of Essential Genes (DEG) [15]. In B. pseudomallei, 312 putative essential genes that lack close human homologs were predicted based on comparison of the B. pseudomallei proteome with the DEG and with the human proteome [14]. A set of 335 putative essential genes was identified experimentally in P. aeruginosa, a pathogen phylogenetically similar to B. cenocepacia, using saturation-level transposon mutagenesis [16], while a different study of P. aeruginosa also using saturation transposon mutagenesis estimated 300-400 essential genes [17].
Together with knowledge of essential functions, another critical resource for developing new antimicrobials is a set of high-resolution three-dimensional structures for the corresponding proteins. Such structures are required for structure-guided drug lead design and refinement. Improvements in high-throughput protein expression and structure determination methods have improved the overall gene-to-structure success rate, but this rate typically remains relatively low (<10%) due to insolubility of a high percentage of proteins in heterologous expression systems [18,19], and intractability of other proteins to crystallization or structure determination. One strategy that has been employed to improve success rates is to "rescue" such proteins by adding orthologs from related species to the pipeline [18], based on the assumptions that many of these will have slightly different physical properties that may improve their solubility or crystallization, and that close orthologs will have structures sufficiently similar to the original target to be useful as surrogates in drug design [18,20,21].
In this study, we apply saturation-level transposon mutagenesis and next-generation sequencing (Tn-seq) to identify putative essential genes in B. thailandensis, a low-virulence species with a genome closely related to that of B. pseudomallei [22] and sharing numerous physiologic and virulence traits [23][24][25]. We then applied high-throughput structure determination with an "ortholog rescue" approach to maximize structural coverage of these essential genes. For each essential gene product with a structure solved, we analyze the protein for properties of a potential antibacterial drug target, such as lacking a close human homolog, being a member of an essential metabolic pathway (having ≥2 essential enzymes), and possessing a binding pocket capable of enveloping a compound of at least six non-hydrogen atoms. We describe five of these potential drug targets in detail. The resulting collection of structures and information about target essentiality and solubility provides a resource for development of new antibiotics to treat Burkholderia-related infectious diseases.

Experimental Determination of Putative Essential Genes in B. thailandensis
The genome of B. thailandensis E264 consists of 6.72 million base pairs and 5712 predicted genes. We used saturation-level transposon mutagenesis followed by next-generation sequencing to identify putative essential genes (see Materials and Methods). Two independent pools of mutants were generated with >30 insertions per gene, and insertion locations were identified by Tn-seq, a technique which uses next-generation sequencing to profile complex pools of insertion mutants [26]. Genes with no, or only a few (<10% of the average per gene density), insertions in both pools were considered putative essential genes. A total of 406 such genes were identified, representing 7.1% of the total predicted gene set of B. thailandensis. These results are summarized in Table 1; the complete set of putative essential genes with number of insertions per kB is listed in Table S1.
We examined these genes by mapping them to metabolic pathways using the Kyoto Encyclopedia of Genes and Genomes (KEGG) [27], and by comparing them with genes previously identified as essential in related organisms (Table S1). Prior to this study, there had been no experimental genome-wide essentiality studies in Burkholderia. We searched for homologs among genes predicted to be essential in B. cenocepacia (from the computationally defined "core genome" in Burkholderiales, based on gene conservation among all 51 species in the order with an available genome sequence [11]); in P. aeruginosa (based on saturation transposon mutagenesis [16]); and in the Database of Essential Genes (DEG), a collection that includes 7430 prokaryotic genes [15]. We used a BLASTP search with an E-value cutoff of 1×10⁻¹⁰ and a minimum 30% sequence identity over at least 50% of the sequence to identify homologs. Of our 406 putative essential genes, 336 (83%) had homologs identified as essential in other bacteria; 241 of 406 had homologs among the core genome of B. cenocepacia; 330 of 406 had homologs in the DEG (including all but three of the 241 B. cenocepacia homologs), and we found 13 additional homologs among genes identified as essential in P. aeruginosa [16]. Table S1 lists the closest homologs (best hits) and their percent sequence identity. We found no homologs for 70 of the 406: 48 of these have homologs in the B. cenocepacia proteome not in the "core genome" [11], while 27 were annotated "protein of unknown function." We also used BLASTP to map 265 of the 406 B. thailandensis genes onto 62 different KEGG [27] metabolic pathways (see Table S1). Several pathways essential for bacterial growth (such as the histidine, purine, and pyrimidine biosynthetic pathways, tRNA charging pathways, and the aspartate pathway) were overrepresented, despite the use of rich growth medium containing amino acids and nucleosides.
Other pathways that are thought to be non-essential for in vitro growth (including those for aromatic compound degradation and UDP sugar interconversion) were under-represented.
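The homolog criteria described above (E-value cutoff of 1×10⁻¹⁰, at least 30% identity, at least 50% coverage) can be expressed as a simple filter over BLASTP hits. The following is a minimal sketch, not the pipeline used in the study: the `BlastHit` record, the function names, and the choice to measure coverage as alignment length divided by query length are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class BlastHit:
    """One BLASTP hit; field names are hypothetical, not a BLAST API."""
    query_id: str
    subject_id: str
    percent_identity: float  # e.g. 42.5 means 42.5%
    alignment_length: int    # aligned positions (approximates covered residues)
    query_length: int        # full length of the query protein
    evalue: float


def is_homolog(hit, max_evalue=1e-10, min_identity=30.0, min_coverage=0.5):
    """Apply the three thresholds quoted in the text to a single hit."""
    # Assumption: coverage is taken relative to the query sequence.
    coverage = hit.alignment_length / hit.query_length
    return (
        hit.evalue <= max_evalue
        and hit.percent_identity >= min_identity
        and coverage >= min_coverage
    )


def best_homologs(hits):
    """Keep, per query, the passing hit with the lowest E-value."""
    best = {}
    for hit in hits:
        if not is_homolog(hit):
            continue
        current = best.get(hit.query_id)
        if current is None or hit.evalue < current.evalue:
            best[hit.query_id] = hit
    return best
```

A query with no entry in the returned dictionary corresponds to a gene for which no homolog was found under these thresholds.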

Selection of Targets for Expression and Structure Determination
The 406 B. thailandensis putative essential gene targets were processed according to normal SSGCID target selection criteria: eliminating proteins with over 750 amino acids, 10 cysteines, or 95% sequence identity with 70% coverage to proteins already in the PDB, targets being worked on by other groups, and targets with transmembrane domains (except where a soluble domain could be expressed separately). Using these criteria, 315 of the 406 B. thailandensis essential genes were selected for cloning.
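The selection criteria listed above amount to a set of exclusion rules applied to each candidate protein. The sketch below illustrates one way to encode them; the function name, argument names, and the exact inequality directions (for example, treating "over 750 amino acids" as strictly greater than 750 and the PDB rule as at least 95% identity over at least 70% coverage) are assumptions, not the SSGCID implementation.

```python
def rejection_reasons(
    sequence,
    pdb_identity=0.0,               # best percent identity to a PDB entry
    pdb_coverage=0.0,               # percent coverage of that PDB match
    has_transmembrane_domain=False,
    soluble_domain_available=False,  # a soluble domain can be expressed alone
    claimed_by_other_group=False,
):
    """Return the list of criteria a target fails; empty means selected."""
    reasons = []
    if len(sequence) > 750:
        reasons.append("longer than 750 residues")
    if sequence.upper().count("C") > 10:
        reasons.append("more than 10 cysteines")
    if pdb_identity >= 95.0 and pdb_coverage >= 70.0:
        reasons.append("close match already in the PDB")
    if claimed_by_other_group:
        reasons.append("being worked on by another group")
    if has_transmembrane_domain and not soluble_domain_available:
        reasons.append("transmembrane protein without a separable soluble domain")
    return reasons
```

Returning the reasons rather than a bare boolean makes it easy to tabulate why targets were dropped between the 406 essential genes and the 315 selected for cloning.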
Since we expected a modest success rate for these 315 targets, we also implemented an "ortholog rescue" strategy to increase the likelihood of solving a structure for each gene product. Orthologs (and paralogs) of the 315 B. thailandensis genes were identified in seven other Burkholderia species (B. pseudomallei, B. cenocepacia, B. ambifaria, B. multivorans, B. phymatum, B. vietnamiensis, and B. xenovorans) selected based on their medical significance and phylogenetic diversity (we sought to maximize the coverage of sequence space). To identify orthologs, we used a BLASTP search of the 315 selected B. thailandensis genes against the proteomes of these species using a cutoff of 40% sequence identity over 70% of the sequence, and clustered the resulting sequences into ortholog groups using OrthoMCL [28,29]. These "ortholog groups" include both orthologs and in-paralogs; we will use "orthologs" to include both. Based on this search, an additional 387 orthologs from these seven Burkholderia species were selected, bringing the total number of targets selected for structure determination to 702.

High-throughput Structure Determination
Target progress by Burkholderia species as of October 1, 2012 is shown in Table 2. We stopped work on any target for which an ortholog structure was solved, except in five cases in which multiple orthologs were too far along in the structure determination process to warrant not completing deposition into the PDB. Out of 702 targets approved by the NIAID, 698 were selected for cloning and 675 were successfully cloned from genomic DNA. In small-scale screening, 450 of the 675 cloned targets (67%) showed soluble expression with an N-terminal His6-tag. Of these 450 soluble proteins, 170 crystallized (38%) and 68 proteins diffracted with sufficient resolution to meet SSGCID quality criteria and were submitted to the PDB. A total of 88 structures were deposited into the PDB, including ligand-bound structures. X-ray crystallography data are summarized in Table S3. As shown in Table 3 (and in the expanded version, Table S2), structures were solved for 31 B. thailandensis targets and 25 targets in other Burkholderia species (56 total Burkholderia proteins), representing 49 B. thailandensis putative essential genes.
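The attrition through the pipeline can be summarized as stepwise yields between consecutive stages. The sketch below recomputes those percentages from the counts given in the paragraph above; the stage labels and the function name are illustrative choices, not terms from the SSGCID tracking system.

```python
# Counts taken from the text; labels are paraphrased for illustration.
PIPELINE_STAGES = [
    ("approved", 702),
    ("selected for cloning", 698),
    ("cloned", 675),
    ("soluble expression", 450),
    ("crystallized", 170),
    ("met quality criteria", 68),
]


def stepwise_yields(stages):
    """Return (stage, count, percent of previous stage) for each transition."""
    results = []
    for (_, previous_count), (name, count) in zip(stages, stages[1:]):
        results.append((name, count, round(100.0 * count / previous_count, 1)))
    return results


if __name__ == "__main__":
    for name, count, percent in stepwise_yields(PIPELINE_STAGES):
        print(f"{name}: {count} ({percent}% of previous stage)")
```

Expressing each stage relative to the previous one separates the solubility bottleneck from the crystallization bottleneck, which a single gene-to-structure rate would hide.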

Analysis of Solved Targets
We analyzed each of the 56 proteins for properties of a potential antimicrobial drug target: having no close human homologs (based on a BLASTP search against the human proteome, using an E-value cutoff of 1×10⁻¹⁰ with >30% sequence identity and 50% coverage) (30/56), being a member of an essential metabolic pathway (having at least two enzymes with homologs in the Database of Essential Genes) (48/56), and possessing a binding pocket capable of enveloping a compound of at least six non-hydrogen atoms (54/56). The closest human homologs (best hits) are shown along with percentage sequence identity and coverage in Table 3. We used KEGG to identify one or more pathways for each protein (Table S2). To determine whether these pathways contained more than one essential enzyme, we obtained a list of all enzymes in each pathway from the KEGG, and performed a BLASTP search of the sequences of these enzymes against the DEG (using an E-value cutoff of 1×10⁻¹⁰ and minimum 30% sequence identity and 50% coverage). Of the 56 proteins, 48 had a pathway listed in the KEGG, and all of these pathways had at least two enzymes with homologs in the DEG (Table S2). Of the 56 Burkholderia proteins with a structure solved, 25 satisfied all three criteria of a potential antimicrobial drug target listed above.
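The three criteria above combine as a logical AND over per-protein annotations. The following sketch shows that combination on a small invented data set; the `SolvedTarget` record, its field names, and the example identifiers are hypothetical and do not correspond to entries in Table 3 or Table S2.

```python
from dataclasses import dataclass


@dataclass
class SolvedTarget:
    """Per-protein annotations; all fields are illustrative."""
    target_id: str
    has_close_human_homolog: bool
    essential_enzymes_in_pathway: int  # pathway enzymes with DEG homologs
    pocket_capacity_heavy_atoms: int   # largest ligand the pocket can envelop


def is_candidate_drug_target(target, min_essential_enzymes=2, min_pocket_atoms=6):
    """True only when all three criteria from the text are met."""
    return (
        not target.has_close_human_homolog
        and target.essential_enzymes_in_pathway >= min_essential_enzymes
        and target.pocket_capacity_heavy_atoms >= min_pocket_atoms
    )


# Invented examples, one failing each criterion and one passing all three.
EXAMPLE_TARGETS = [
    SolvedTarget("example_A", False, 4, 12),
    SolvedTarget("example_B", True, 4, 12),
    SolvedTarget("example_C", False, 0, 12),
    SolvedTarget("example_D", False, 4, 3),
]

candidates = [t.target_id for t in EXAMPLE_TARGETS if is_candidate_drug_target(t)]
```

Because each criterion is evaluated independently, the same records can also be used to count how many proteins pass each criterion on its own, as in the 30/56, 48/56, and 54/56 tallies.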
In five cases, we obtained structures from two or more orthologs of the same essential gene, although none of these cases included the original B. thailandensis target. To assess the structural similarity of orthologs, we calculated overall Cα RMSD values for all seven pairs of ortholog structures (without bound ligand) (Table S4). While these ortholog pairs had a mean amino acid sequence identity of 55±26% (1 standard deviation) with a mean coverage of 97%, in some cases the sequence identity was only 30-38%. Nevertheless, all ortholog pairs showed a high degree of structural similarity, with an average RMSD of 1.5±0.5 Å over all common Cα atoms. In general, pairs with greater sequence identity showed more structural similarity, but there were exceptions. For instance, while BURPS1710b_3264 showed 50% sequence identity to both BamMC406_2018 and BuceA.00102.a, the RMSD was 2.1 Å for the former, but only 1.4 Å for the latter. In contrast, Bxe_A1072 and Bxe_A0096 both showed an RMSD of 1.8 Å from

Structures of Burkholderia Putative Essential Proteins
FabH, which encodes 3-oxoacyl-(acyl-carrier-protein) synthase, is essential in the absence of long chain fatty acids in some species, such as E. coli, but not in others, such as Pseudomonas aeruginosa [30,31], and has been identified as a promising drug target in pathogenic bacteria [32]. The B. thailandensis FabH gene (BTH_I1717) was among the group of genes we identified as essential for in vitro growth using rich medium. We solved structures for orthologs/in-paralogs of this gene in B. pseudomallei (BURPS1710b_0096, PDB: 3GWA and 3GWE) and B. xenovorans (Bxe_A0096, PDB: 4EFI and Bxe_A1072, PDB: 4DFE) (Figure 1). As discussed above, these structures are very similar, with a chain-to-chain RMSD over all common Cα atoms of 1.8 Å. FabH has no close human homolog, so the availability of structures from multiple orthologs may be useful in designing antimicrobial drugs with cross-species reactivity.
Isochorismate is an intermediate in the synthesis of siderophores such as enterobactin and vibriobactin, which are crucial for microorganisms to acquire iron from their surroundings [34,35]. We solved a structure for the putative isochorismatase family protein, BTH_II2229 (PDB: 3TXY) from B. thailandensis. This protein has no close human homolog, but shows sequence and structural similarity to PhzD from P. aeruginosa (PDB: 1NF8, 30% sequence identity, 47% coverage, 1.7 Å overall Cα RMSD) (Figure 3). PhzD catalyzes an intermediate reaction in the formation of phenazine-1-carboxylic acid (PCA). Derivatives of PCA are virulence factors and natural antibiotics in several pathogenic strains of bacteria, including Pseudomonas and Streptomyces [36]. This structure may be useful in selecting compounds to validate isochorismatase as a drug target in Burkholderia and other GNRs.
Thymidylate synthase (TS) is a proven anti-cancer drug target with active ongoing research for its potential as an antibacterial [37][38][39][40][41]. The high sequence and structural homology across TS enzymes from human and many parasite species, particularly within active site residues, creates a challenge for obtaining drug selectivity [42,43]. The B. thailandensis TS protein (BTH_I1680, PDB: 3V8H) has an arginine residue substituted for a canonical active site tryptophan (W83 in E. coli); arginine is also the side chain found in human TS (Figure 4). While the difference in amino acid identity in the active site between human and Burkholderia proteins may be too small to develop a broad-spectrum antibiotic capable of host-parasite selectivity, large subdomain differences between TS enzymes from different species (not shown) may provide an alternate drug development strategy. An additional strategy in targeting TS is to simultaneously target thymidine kinase (TK), since bacteria may circumvent TS inhibition through TK activity [44]. In this regard, we have also solved a structure for TK in B. thailandensis (BTH_I2154, PDB: 3V9P). A therapy targeting both TS and TK enzymes could prolong the lifespan of inhibitors with human-parasite selectivity.
Peptidyl-tRNA hydrolase (PTH) is an enzyme that cleaves the ester bond on peptidyl-tRNAs that are stalled on the ribosome, releasing an N-substituted amino acid and free tRNA [45]. Inhibition of PTH depletes the supply of aminoacyl-tRNA, stopping protein synthesis. We identified PTH as essential in B. thailandensis, and it has been identified previously as essential in other bacteria [46,47]. The structure for PTH in B. thailandensis (BTH_I0472, PDB: 3V2I) has a large, charged binding pocket (Figure 5). Discovery of a ligand that binds the alternately charged (positive/negative/positive) channel could block the reaction and prevent protein synthesis. PTH has a human homolog (Q86Y79 UniProtKB AC, no PDB structure available) with 36% sequence identity and 87% coverage, so further structural comparison using a 3D model of the human protein would be necessary to determine whether drug selectivity is possible. However, achieving selectivity may not be necessary since eukaryotes possess multiple PTH activities [48].

Discussion
Here we report a functional and structural genomics effort that applied saturation-level transposon mutagenesis and next-generation sequencing (Tn-seq) to identify essential genes in B. thailandensis, followed by high-throughput structure determination. We used an "ortholog rescue" approach to maximize structural coverage of these gene families, which are likely to be essential not only in B. thailandensis but also in related, more virulent Burkholderia species such as B. pseudomallei. A large fraction of the genes (83%, 336/406) that we identified have homologs previously identified as essential either in B. cenocepacia [11], in P. aeruginosa [16], or in other prokaryotes listed in the Database of Essential Genes [15]. Of the remaining 70, some are likely to be essential but have not been identified previously, as there had been no experimental genome-wide essentiality studies in Burkholderia prior to this study. A small percentage of our putative essential genes may be false positives, i.e., genes wrongly identified as essential. These are most likely to be small genes, which because of their size are more likely to have eluded mutagenesis, or genes with insertion densities close to the threshold of three insertions per kB in the 5-90% portion of the ORF (in two independent mutant pools) (Table S1). This threshold was chosen based on a survey of genes thought to be essential based on annotated function, in which small numbers of insertions were detected, and was used to reduce false negatives; for example, rare insertions in transiently duplicated genes or within intradomain regions may not fully abrogate essential function. False negatives are still possible, and are most likely to be genes that possess nonessential domains tolerant of transposon insertions.
The number of essential genes identified, 406, falls within the range of values estimated for other bacteria using experimental approaches such as genome-wide gene disruption or mutagenesis [10,12,13]. Experimentally determined estimates of the number of essential genes in pathogenic bacteria range from <200 to >600. By comparing the genomes of all 51 species in the order Burkholderiales and clustering using OrthoMCL, Juhas et al. identified 610 ortholog groups conserved among all 51 species (the "core genome"), corresponding to 649 genes in B. cenocepacia [11]. Of these 649 genes, 454 had homologs in the Database of Essential Genes (DEG). However, both computational gene conservation analysis and experimental methods that use lower mutation rates per gene (upon which much of the DEG is based) are likely to overestimate the number of essential genes.
By using an ortholog rescue strategy for insoluble or difficult-to-crystallize targets, we increased our structural coverage of B. thailandensis essential genes from 31/406 (7.6%) to 49/406 (12.1%) (Table 3, Table S2). Such an approach has been used previously in high-throughput structure determination efforts to similarly improve the overall gene-to-structure efficiency for closely related protein sequences. In Plasmodium, the ortholog rescue approach was able to improve the protein solubility rate to 229/468 target genes (49%) resulting in 32 structures (6.8%) [18]. SSGCID has also improved the gene-to-structure rate from 11% for Mycobacterium tuberculosis targets to 36% by using orthologs from nine other Mycobacterium species [manuscript in preparation]. However, the underlying rationale for this approach (that ortholog structures are sufficiently similar to serve as surrogates in drug design) has rarely been verified with experimental data. For the seven pairs of ortholog structures (with no bound ligand) solved in this study, the average overall Cα RMSD was 1.5±0.5 Å (Table S4), indicating a high degree of structural similarity. This structural similarity suggests that the ortholog approach is an efficient method to obtain useable structures from otherwise intractable targets, thereby lowering the barrier to structure-based drug design targeting infectious organisms. Ortholog structures may also be useful in designing broad-spectrum antibiotics with cross-species activity, and by representing a variety of functionally conservative point mutations in the active site may be useful in developing drugs less susceptible to mutations that cause drug resistance.
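The RMSD values quoted above are root-mean-square distances between matched alpha-carbon atoms of two structures. The sketch below shows only that final calculation and assumes the two chains have already been superposed and their residues paired by a structural alignment; it does not perform the superposition itself, and the function name and coordinate format are illustrative rather than taken from the software used in the study.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """RMSD in the input units between paired (x, y, z) alpha-carbon positions.

    Assumes both coordinate lists are in the same frame (already superposed)
    and that element i of each list refers to the same aligned residue.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must pair one-to-one")
    if not coords_a:
        raise ValueError("at least one atom pair is required")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))
```

In practice the superposition step (for example, a least-squares fit over the aligned atoms) determines the frame in which these distances are measured, so RMSD values are only comparable when computed with the same alignment and fitting procedure.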
Of the 56 Burkholderia protein targets with a structure solved, 25 possess properties of a potential antimicrobial drug target: i.e., they were experimentally identified as an essential gene product or are a close ortholog; they are members of a metabolic pathway containing at least two essential enzymes (as listed in the DEG); they possess a deep, druggable pocket large enough to envelop a compound of at least six non-hydrogen atoms; and they lack a close human homolog, reducing the chance of host toxicity. Thus we have solved structures for 25 Burkholderia proteins that appear worthy of further validation as drug targets, including chemical validation to determine whether blocking the target affects cell growth and viability in vivo.

Conclusions
We have combined an experimental genome-wide essentiality screen in B. thailandensis, using a high rate of insertions per gene, with high-throughput structure determination and an ortholog rescue approach to achieve a significant structural coverage of essential genes. Using only seven Burkholderia species to select orthologs of essential genes, we solved structures for 49/406 essential gene families, and for 56 total Burkholderia protein targets (including seven ortholog replicates). Of these 56 targets, 25 satisfied criteria for being a potential antimicrobial drug target. By increasing the number of species used to select orthologs, future efforts may come closer to complete coverage of the essential structomes of other infectious organisms. The resulting collection of structures and information about target essentiality and solubility provides a resource for development of new antibiotics to treat Burkholderia-related infectious diseases.
Expression clones and proteins created in this study can be freely obtained via BEI Resources (http://www.beiresources.org/StructuralGenomicsCenters.aspx) and through the SSGCID website (http://www.ssgcid.org/home/index.asp). Clones and proteins may be searched for using the SSGCID Target IDs listed in Table S2.

Experimental Identification of Essential Genes
B. thailandensis strain E264 (ATCC 700388) was mutagenized with transposon T23 (ISlacZ_p rhaBout -Tp/FRT) by conjugal delivery from E. coli strain SM10/λpir of suicide plasmid pLG99, which bears the transposon and the transposase gene. Insertion mutants were selected by incubation for 24 h at 37 °C on TYE agar (10 g tryptone, 5 g yeast extract, 8 g sodium chloride and 15 g agar per L) supplemented with 50 µg/mL trimethoprim (to select for insertion mutants) and 100 µg/mL streptomycin (to select against the E. coli donor). Mutants were pooled by scraping cells off the selective media, and DNA from the pools was purified with the DNeasy Blood & Tissue Kit (Qiagen). Tn-seq analysis of the pooled DNA was carried out as described [25] using oligonucleotides specific for transposon T23 (sequences available upon request). Two independent pools were generated and analyzed (Table 1). The numbers of chaste sequence reads obtained for the two pools were 26,398,169 and 11,888,155, of which 24,020,048 and 10,001,776, respectively, mapped to the E264 genome. Since insertions near gene termini may not represent null mutations, insertions within the 5′ 5% or 3′ 10% of each ORF were ignored when assessing essentiality. Additionally, since rare insertions in transiently duplicated genes or within intra-domain regions may not fully abrogate essential functions, genes with fewer than three insertions per kB (in the 5-90% portion of the ORF) were also included in the analysis. The limit of three insertions per kB was determined based on a survey of putatively essential genes (by annotated gene function) in which small numbers of insertions were detected. Thus, for a gene to be assigned as (putatively) "essential", it needed to receive fewer than three hits per kB in the 5-90% region in both mutant pools.
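The essentiality rule described above (fewer than three insertions per kB within the 5-90% portion of the ORF, in both mutant pools) can be sketched as follows. This is an illustration under stated assumptions, not the analysis code used in the study: insertion positions are assumed to be 0-based offsets from the start codon in coding orientation, and the density is assumed to be normalized by the length of the 5-90% window rather than by the full gene length.

```python
def insertions_per_kb(gene_length_bp, insertion_offsets, lower=0.05, upper=0.90):
    """Insertion density within the counted window of one ORF.

    insertion_offsets: positions relative to the first base of the ORF,
    measured along the coding strand (an assumption of this sketch).
    """
    window_start = lower * gene_length_bp
    window_end = upper * gene_length_bp
    counted = sum(1 for pos in insertion_offsets if window_start <= pos < window_end)
    window_kb = (window_end - window_start) / 1000.0
    return counted / window_kb


def is_putative_essential(gene_length_bp, offsets_pool1, offsets_pool2, threshold=3.0):
    """A gene is called putatively essential only if BOTH pools fall below the threshold."""
    return all(
        insertions_per_kb(gene_length_bp, offsets) < threshold
        for offsets in (offsets_pool1, offsets_pool2)
    )
```

Requiring the rule to hold in two independent pools guards against a gene being called essential merely because one pool happened to miss it.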
High-throughput Protein Expression, Purification, Crystallization, and Structure Determination
PCR, cloning, screening, sequencing, expression screening, scale-up, and purification of proteins were performed as described previously [50,51]. DNA templates for PCR amplification were obtained from Joe Mougous (University of Washington, Seattle) for B. thailandensis E264 and B. ambifaria MC40-6, from Jane Burns (Seattle Children's Pediatrics) for B. cenocepacia J2315 and B. multivorans ATCC 17616, from Mary Lidstrom (University of Washington) for B. phymatum STM815 and B. xenovorans LB400, from Eshwar Mahenthiralingam (Cardiff University, UK) for B. vietnamiensis G4, and from the American Type Culture Collection (ATCC) for B. pseudomallei 1710b. Crystal trials, diffraction, and structure solution were performed as described previously [52,53].