BacCapSeq: a Platform for Diagnosis and Characterization of Bacterial Infections

BacCapSeq is a method for the differential diagnosis of bacterial infections and the definition of antimicrobial sensitivity profiles. It has the potential to reduce morbidity, mortality, health care costs, and the inappropriate use of antibiotics that contributes to the development of antimicrobial resistance.


RESULTS
Probe design strategy. We assembled a probe set comprising 4.2 million oligonucleotides based on the Pathosystems Resource Integration Center (PATRIC) database (11), representing 307 bacterial species that include all known human-pathogenic species. The probe set also represents all known antimicrobial resistance (AMR) genes and virulence factors based on sequences in the Comprehensive Antibiotic Resistance Database (CARD) (12) and the Virulence Factor Database (VFDB) (13,14). Probes with an average length of 75 nucleotides (nt) were selected along the coding sequences of the 307 targeted bacteria (see Table S1 in the supplemental material) to maintain a mean probe melting temperature (Tm) of 79°C. The average interval between probes along annotated protein coding sequences targeted for capture was 121 nt (see Materials and Methods for further details). Because the probes capture fragments that include sequences contiguous to their targets, we recovered near-complete protein coding sequences. An example with Klebsiella pneumoniae is shown in Fig. 1a. Probes based on the CARD and VFDB databases ensured coverage of AMR genes and virulence factors, as illustrated by detection of the toxR virulence factor regulator in Vibrio cholerae (Fig. 1b) and the blaKPC AMR gene in K. pneumoniae (Fig. 1c).
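Why spaced probes can still recover near-complete coding sequences can be sketched with a toy tiling model. This is a minimal illustration, not the Roche design pipeline: it assumes the 121 nt interval is the start-to-start spacing between probes, and it uses a hypothetical `flank` parameter to stand in for the contiguous sequence carried along on captured library fragments. The function names are hypothetical.

```python
def tile_probes(cds_len: int, probe_len: int = 75, step: int = 121) -> list:
    """Return half-open (start, end) probe intervals tiled along one coding sequence.

    Assumes `step` is the start-to-start spacing between neighboring probes. If the
    regular tiling stops short of the 3' end, one extra probe is anchored there.
    """
    if cds_len <= probe_len:
        return [(0, cds_len)]  # short target: a single probe spanning it
    starts = list(range(0, cds_len - probe_len + 1, step))
    if starts[-1] + probe_len < cds_len:
        starts.append(cds_len - probe_len)
    return [(s, s + probe_len) for s in starts]


def recoverable_fraction(cds_len: int, probes: list, flank: int) -> float:
    """Fraction of CDS positions lying within `flank` nt of any probe.

    `flank` is a hypothetical stand-in for sequence contiguous to a probe that is
    carried along on a captured fragment (libraries are sheared to ~200 bp).
    """
    covered = [False] * cds_len
    for start, end in probes:
        for i in range(max(0, start - flank), min(cds_len, end + flank)):
            covered[i] = True
    return sum(covered) / cds_len


probes = tile_probes(1200)
print(len(probes), recoverable_fraction(1200, probes, 0), recoverable_fraction(1200, probes, 50))
```

With no flank, the probes themselves span only part of a 1,200 nt sequence; allowing each probe to pull down even 50 nt of flanking sequence closes every gap in this toy example.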
BacCapSeq performance using whole-blood nucleic acid spiked with bacterial nucleic acid. The efficiency of the bacterial capture sequencing (BacCapSeq) system versus conventional unbiased high-throughput sequencing (UHTS) was assessed in side-by-side comparisons of data obtained with 5 million reads per sample. We began with extracts of whole blood spiked with DNA from Bordetella pertussis, Escherichia coli, Neisseria meningitidis, Salmonella enterica serovar Typhi, Streptococcus agalactiae, Streptococcus pneumoniae, V. cholerae, and Campylobacter jejuni at concentrations ranging from 40 to 40,000 copies/ml. For all bacterial targets tested, BacCapSeq yielded up to 100-fold more reads than UHTS, as well as higher genome coverage (Table 1). The enhanced performance of BacCapSeq was particularly pronounced at lower copy numbers.
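Genome coverage in comparisons like these can be expressed as breadth: the fraction of reference positions covered by at least one aligned read. The sketch below is a minimal illustration, assuming read alignments have already been reduced to start/end coordinates on the reference; the function name and the read intervals in the usage line are hypothetical.

```python
def coverage_breadth(genome_len: int, alignments: list, min_depth: int = 1) -> float:
    """Fraction of genome positions covered by at least `min_depth` reads.

    `alignments` holds half-open (start, end) reference intervals of mapped reads.
    A difference array yields per-base depth in O(genome_len + number of reads).
    """
    diff = [0] * (genome_len + 1)
    for start, end in alignments:
        diff[max(0, start)] += 1          # depth rises where a read begins
        diff[min(genome_len, end)] -= 1   # and falls just past where it ends
    depth = covered = 0
    for pos in range(genome_len):
        depth += diff[pos]
        if depth >= min_depth:
            covered += 1
    return covered / genome_len


reads = [(0, 100), (50, 150), (900, 1000)]  # hypothetical alignments on a 1 kb reference
print(coverage_breadth(1000, reads), coverage_breadth(1000, reads, min_depth=2))
```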
BacCapSeq performance using whole blood spiked with bacterial cells. We tested performance with whole blood spiked with K. pneumoniae, B. pertussis, N. meningitidis, S. pneumoniae, and Mycobacterium tuberculosis cells. Nucleic acid was extracted from the spiked sample and processed for BacCapSeq or UHTS. Here too, BacCapSeq yielded more reads and higher genome coverage than UHTS (Table 2 and Fig. 2).
… system (Becton Dickinson). Using BacCapSeq, we recovered near-full genome sequences and identified antimicrobial resistance genes that matched the antimicrobial sensitivity testing (AST) profiles obtained in the standard clinical microbiology laboratory (Table 3; see Table S2 in the supplemental material).
BacCapSeq performance with human blood samples. Blood samples from two immunosuppressed individuals with HIV/AIDS and sepsis of unknown cause were extracted and processed in parallel for BacCapSeq and UHTS analysis. Both methods identified a causative agent; however, BacCapSeq yielded more relevant reads and better genome coverage (Fig. 3). Salmonella enterica was detected in one patient. The other patient had evidence of coinfection with both S. pneumoniae and Gardnerella vaginalis.
BacCapSeq-facilitated discovery of potential AMR biomarkers. The current probe set specifically captures all AMR genes present in the CARD database. However, demonstrating the presence of an AMR gene is not equivalent to finding evidence of its functional expression. To address this challenge, we have begun to use BacCapSeq to pursue potential biomarkers in bacteria exposed to antibiotics. We cultured ampicillin-sensitive and ampicillin-resistant strains of Staphylococcus aureus at an inoculum of 1,000 CFU/ml in the presence or absence of the antibiotic for 45, 90, and 270 min. We then extracted RNA for transcriptomic analysis by BacCapSeq and UHTS to find biomarkers that differentiated ampicillin-sensitive from ampicillin-resistant S. aureus strains exposed to ampicillin. As illustrated in Fig. 4, BacCapSeq enabled discovery of transcripts that were differentially expressed between 90 min and 270 min of antibiotic exposure. These represented constitutive genes that reflect bacterial replication, including classical strain- and species-specific markers such as 16S and 23S rRNA, elongation factors Tu (tuf) and G (fusA), protein A (spa), clumping factor B (clfB), and 30S ribosomal protein S12 (rpsL).
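One simple way to flag transcripts whose abundance changes between two time points is a log2 fold change on read-depth-normalized counts. The source does not state which statistic was used to call differential expression, so this is a minimal sketch under that assumption: it uses a pseudocount of 1 to avoid division by zero, the function names are hypothetical, and the counts in the usage lines are invented for illustration, not data from this study.

```python
import math


def log2_fold_change(before: dict, after: dict, pseudocount: float = 1.0) -> dict:
    """Per-gene log2((after + pseudocount) / (before + pseudocount)).

    Inputs are normalized counts keyed by gene; only genes present in both
    samples are compared. The pseudocount avoids log(0) and division by zero.
    """
    shared = before.keys() & after.keys()
    return {
        gene: math.log2((after[gene] + pseudocount) / (before[gene] + pseudocount))
        for gene in shared
    }


def top_changes(lfc: dict, n: int = 5) -> list:
    """Genes ranked by absolute log2 fold change, largest first."""
    return sorted(lfc.items(), key=lambda item: abs(item[1]), reverse=True)[:n]


# Hypothetical normalized counts at 90 and 270 min of exposure (not study data).
t90 = {"tuf": 15.0, "spa": 7.0, "clfB": 31.0}
t270 = {"tuf": 63.0, "spa": 7.0, "clfB": 7.0}
print(top_changes(log2_fold_change(t90, t270)))
```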

DISCUSSION
In the pre-antibiotic era, puerperal sepsis was a common cause of maternal mortality. Up to 30% of children did not survive their first year of life, and community-acquired pneumonia and meningitis resulted in 30% and 70% mortality, respectively (15). The advent of bacterial diagnostics and antibiotics has not only reduced the burden of naturally occurring infectious diseases but has also enhanced our quality of life by enabling innovations in clinical medicine such as organ transplantation, joint replacement and other invasive surgical procedures, immunosuppressive chemotherapy, and burn management (16). These advances are threatened by the emergence of AMR. In 2013, the World Economic Forum estimated 100,000 annual AMR-related deaths in the United States alone due to hospital-acquired infections (17). The global impact of AMR is estimated at 700,000 deaths annually, with the highest burden in the developing world. The financial implications are also dire. The CDC estimates that the direct annual cost of antibiotic resistance in the United States is 20 to 35 billion U.S. dollars (USD), with an additional 35 billion USD in lost productivity (18). Absent an effective response to limit further growth in AMR, the challenge will continue to increase. In March 2017, the World Bank issued a report estimating that the impact of AMR on gross domestic product (GDP) by 2030 will be between 1.1 trillion and 3.4 trillion USD (19).
Early, accurate differential diagnosis of bacterial infections is critical to reducing morbidity, mortality, and health care costs. It can also enhance antibiotic stewardship by reducing the inappropriate use of antibiotics. The multiplex PCR methods in common use for differential diagnosis of bacterial infections can identify potential pathogens but do not provide insights into the presence or expression of AMR genes. Furthermore, they do not target bacteria that are only rarely associated with significant disease, such as G. vaginalis, implicated here in unexplained sepsis in an individual with HIV/AIDS. Culture-based methods require two or more days to identify pathogens and even longer to provide antibiotic susceptibility profiles (20,21). Accordingly, physicians typically administer broad-spectrum antibiotics pending acquisition of more specific information (22).
We have not subjected BacCapSeq to the rigorous validation of limits of detection and reproducibility required for approval of a diagnostic assay in clinical microbiology. Nonetheless, results obtained with blood samples spiked with known concentrations of bacterial DNA (Table 1 and Fig. 1) or bacterial cells (Table 2 and Fig. 2) show that BacCapSeq yields more reads and higher genome coverage than UHTS. This advantage was also observed in analysis of blood from patients with unexplained sepsis (Fig. 3), where BacCapSeq yielded more reads than UHTS for S. enterica (3,183 versus 132), S. pneumoniae (419,070 versus 130), and G. vaginalis (776,113 versus 2,080). These findings suggest that where levels of bacteria in blood are below 40 cells per ml, BacCapSeq has the potential to indicate the presence of a causal pathogen that might be missed by UHTS.
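The fold differences implied by these patient read counts can be computed directly. This sketch assumes that both methods were sequenced to comparable depth, as in the side-by-side comparisons at 5 million reads per sample described in the Results; the helper name is illustrative.

```python
# Read counts reported for the two sepsis patients: (BacCapSeq, UHTS).
reads = {
    "S. enterica": (3_183, 132),
    "S. pneumoniae": (419_070, 130),
    "G. vaginalis": (776_113, 2_080),
}


def fold_enrichment(capture_reads: int, unbiased_reads: int) -> float:
    """Ratio of capture reads to unbiased reads, assuming equal sequencing depth."""
    return capture_reads / unbiased_reads


for organism, (cap, uhts) in reads.items():
    print(f"{organism}: {fold_enrichment(cap, uhts):.0f}-fold")
```

By this measure the enrichment ranges from roughly 24-fold for S. enterica to more than 3,000-fold for S. pneumoniae.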
Incubation periods in blood culture systems commonly range from 3 to 5 days (23-25). Longer intervals may be required for sensitive detection of some pathogenic species of Neisseria, Rickettsia, Mycobacterium, Leptospira, Ehrlichia, Coxiella, Campylobacter, Burkholderia, Brucella, Bordetella, and Bartonella. An additional challenge is that bacterial loads may be low or intermittent. Cockerill et al. (24) and Lee et al. (26) have suggested that 80 ml of blood, drawn in four separate collections of at least 20 ml each, is required for 99% test sensitivity in detecting viable bacteria. We have not performed head-to-head comparisons of either BacCapSeq or UHTS with culture. We also cannot address the issue of intermittent bacteremia, but we are nonetheless encouraged by the observation that current estimates of BacCapSeq sensitivity (a minimum of 40 copies per ml) compare favorably with the 80 ml sample volume recommended for culture tests (26). The American Society for Microbiology and the Clinical and Laboratory Standards Institute (CLSI) require false-positivity rates below 3% (27,28). Protocols for hygiene in diagnostic microbiology will need to be even more stringent with BacCapSeq than with culture because nucleic acids are not eliminated by common disinfectants. To limit false-positive calls based on spurious read signals, we propose a conservative threshold for bona fide signal of 10 reads per million reads. However, the presence of fewer reads specific for a particular strain or isolate should trigger a search for confirmation or refutation using alternative methods, such as specific PCR assays designed on the basis of BacCapSeq or UHTS findings.
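Why sample volume matters when bacterial loads are low can be illustrated with a simple Poisson model. This is an illustrative assumption, not an analysis from the cited studies: it treats bacteria as randomly dispersed in blood and ignores intermittent bacteremia and losses during processing. The function names and example concentration are hypothetical.

```python
import math


def prob_at_least_one(cells_per_ml: float, volume_ml: float) -> float:
    """P(at least one cell in the drawn volume), assuming Poisson-distributed cells."""
    expected = cells_per_ml * volume_ml  # Poisson mean for this volume
    return 1.0 - math.exp(-expected)


def volume_for_detection(cells_per_ml: float, target_prob: float) -> float:
    """Smallest volume (ml) with at least `target_prob` chance of drawing one cell."""
    return -math.log(1.0 - target_prob) / cells_per_ml


# Hypothetical load of 0.1 cells/ml: one 20 ml draw versus the full 80 ml.
print(prob_at_least_one(0.1, 20), prob_at_least_one(0.1, 80))
print(volume_for_detection(0.1, 0.99))
```

Under these assumptions, quadrupling the drawn volume raises the chance of capturing at least one cell from about 86% to over 99.9% at this load.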
BacCapSeq is designed to detect all AMR genes in the CARD database. Where these genes are located on bacterial chromosomes, we anticipate that flanking sequences will allow them to be assigned to specific bacteria within a sample, even when that sample contains more than one bacterial species. Where AMR genes are extrachromosomal, we may not be able to make this assignment. Even so, we anticipate that BacCapSeq will enable the discovery of constitutively expressed and induced transcripts that reflect the presence of functional bacterium-specific AMR elements.
Our objective in building BacCapSeq was to enable efficient and sensitive detection of pathogenic bacteria, virulence factors, and antimicrobial resistance genes. As with VirCapSeq-VERT, the analogous sequence capture platform established for viral diagnostics and surveillance, BacCapSeq is anticipated to allow focused investment of sequencing capacity. This should facilitate multiplexing and reduce both sequencing costs and the complexity of bioinformatic analysis.
We acknowledge that the system has several constraints. The method is sensitive to as few as 40 to 400 copies/ml; however, BacCapSeq is qualitative, not quantitative. Where measurements of bacterial burden are critical, BacCapSeq findings should be followed by quantitative PCR (qPCR) assays, which would add approximately 3 h to processing time. The system can only detect bacterial sequences that are sufficiently similar to probes for efficient capture. In earlier work, based on probes with similar lengths and melting temperature (Tm) properties, we defined a homology threshold for capture of 60% nucleotide identity (10). Accordingly, a novel bacterium that differs from any known human pathogen by more than 40% over the entire length of its coding sequences would not be detected. At present, the time from sample acquisition to results using the Illumina platform is approximately 70 h, too long to have a major impact on clinical diagnostics. However, the probe set is platform agnostic and should be adaptable to nanopore and other, more rapid sequencing systems that are anticipated to have broader utility in clinical microbiology laboratories and resource-challenged environments, such as those found in the developing world. Bacterial typing and AMR biomarker discoveries enabled by BacCapSeq could also be used to establish qPCR assays that produce results within hours of clinical sample receipt.

MATERIALS AND METHODS
Nucleic acid extraction. Total nucleic acid from bacterial cells, whole blood spiked with bacteria, or bacterial nucleic acids was extracted using an AllPrep DNA/RNA Mini kit (Qiagen, Hilden, Germany) and quantitated by NanoDrop One (Wilmington, DE) or Bioanalyzer 2100 (Agilent, Santa Clara, CA). Bacterial nucleic acid and genome equivalents were quantitated by agent-specific quantitative TaqMan real-time PCR.
Agent-specific quantitative TaqMan real-time PCR and standards. Primers and probes for quantitative PCR (qPCR) were designed within conserved single-copy genes of the investigated bacterial species using Geneious v10.2.3 (https://www.geneious.com) (Table S3). Standards for quantitation were generated by cloning a fragment of the targeted gene spanning the primers into the pGEM-T Easy vector (Promega, Madison, WI). Recombinant plasmid DNA was purified using a Mini Plasmid Prep kit (Qiagen). The concentration of linearized plasmid DNA was determined using NanoDrop One, and copy numbers were adjusted by dilution in Tris-HCl (pH 8) containing 1 ng/ml salmon sperm DNA.
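Converting a measured plasmid concentration into copy number, and then into a dilution, is standard arithmetic. This sketch assumes an average mass of 660 g/mol per base pair of double-stranded DNA; the construct length in the usage line (roughly the pGEM-T Easy backbone plus a hypothetical 200 bp insert) and the function names are assumptions for illustration.

```python
AVOGADRO = 6.022e23   # molecules per mole
MASS_PER_BP = 660.0   # g/mol per base pair of dsDNA (common approximation)


def plasmid_copies_per_ul(ng_per_ul: float, construct_bp: int) -> float:
    """Copies/µl from a mass concentration and the total length of vector plus insert."""
    grams_per_ul = ng_per_ul * 1e-9
    return grams_per_ul * AVOGADRO / (construct_bp * MASS_PER_BP)


def dilution_volume(stock_copies_per_ul: float, target_copies_per_ul: float,
                    final_volume_ul: float) -> float:
    """Volume of stock (µl) to bring to `final_volume_ul` at the target concentration (C1V1 = C2V2)."""
    return target_copies_per_ul * final_volume_ul / stock_copies_per_ul


stock = plasmid_copies_per_ul(10.0, 3_215)   # hypothetical 10 ng/µl linearized construct
print(f"{stock:.3e} copies/µl")
print(dilution_volume(stock, 1e6, 100.0), "µl of stock per 100 µl at 1e6 copies/µl")
```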
Probe design. Our objective was to target all known human bacterial pathogens as well as all known antimicrobial resistance genes and virulence factors. Known human-pathogenic bacteria were selected from the available bacterial genomes in the PATRIC database (11). Included were all species for which at least one strain or isolate is annotated as "human-related" and "pathogenic." One genome was selected per species due to probe number limitations. In consultation with experts in the field, we added other bacterial species that were considered to have high potential to become pathogenic. The final list contained 307 species (Table S1), including all 19 bacterial species listed in the priority list from the Child Health and Mortality Prevention program of the Bill and Melinda Gates Foundation.
The protein coding sequences from the selected genomes of the 307 species were extracted and combined with the full data set of 2,169 antimicrobial resistance gene sequences in the CARD database (12) and the 30,178 virulence factor genes in the VFDB database (13,14). All databases were downloaded in March 2017. The combined target sequence data set (1,196,156 genes) was clustered at 96% sequence identity (resulting in 1,007,426 genes) and sent to the bioinformatics core of Roche Sequencing Solutions (Madison, WI), where sequences were further filtered based on probe-printing considerations. Probe lengths were refined by adjusting their start/stop positions to constrain the melting temperature. The final library comprised 4,220,566 oligonucleotides with an average length of 75 nt. The average distance between adjacent probes along the targeted bacterial protein coding, virulence factor, and AMR sequences was 121 nt.
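Clustering at 96% identity collapses near-duplicate target sequences before probe design. The clustering tool is not named here, so the sketch below is a simplified stand-in rather than the method used: greedy incremental clustering in which the longest sequences become representatives, with Python's `difflib` similarity ratio substituted for true alignment-based identity. Function and sequence names are hypothetical, and a dedicated clustering tool (for example, CD-HIT) would be needed at the scale of a million genes.

```python
from difflib import SequenceMatcher


def greedy_cluster(seqs: dict, threshold: float = 0.96) -> dict:
    """Greedy incremental clustering: {representative name: [member names]}.

    Sequences are visited longest first; each joins the first existing
    representative whose similarity ratio reaches `threshold`, otherwise it
    founds a new cluster. The difflib ratio only approximates alignment identity.
    """
    clusters: dict = {}
    for name in sorted(seqs, key=lambda n: len(seqs[n]), reverse=True):
        for rep in clusters:
            similarity = SequenceMatcher(None, seqs[rep], seqs[name], autojunk=False).ratio()
            if similarity >= threshold:
                clusters[rep].append(name)
                break
        else:
            clusters[name] = [name]  # no close representative: start a new cluster
    return clusters


example = {
    "geneA": "ACGT" * 25,               # 100 nt
    "geneA_variant": "TCGT" + "ACGT" * 24,  # one substitution relative to geneA
    "geneB": "GGGGCCCC" * 12,           # unrelated 96 nt sequence
}
print(greedy_cluster(example))
```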
Unbiased high-throughput sequencing. Double-stranded DNA (for detection of bacteria) or cDNA (for detection of transcripts) was sheared to an average fragment size of 200 bp (E210 focused ultrasonicator; Covaris, Woburn, MA). Sheared products were purified using AxyPrep Mag PCR cleanup beads (Axygen/Corning, Corning, NY), and libraries were constructed using KAPA library preparation kits (Wilmington, MA) with input quantities of 10 to 100 ng DNA. Libraries were purified (AxyPrep), quantitated by Bioanalyzer (Agilent), and then split, with one half processed by BacCapSeq (see below) and the other sequenced directly on an Illumina MiSeq platform (v3 chemistry, 150 cycles; Illumina, San Diego, CA).
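Before libraries are split or pooled, a mass concentration is commonly converted to a molar one using the mean fragment size. This is a minimal sketch, assuming an average mass of 660 g/mol per base pair and a Bioanalyzer-derived mean fragment length; the source does not say how libraries were normalized, so the function names and example values are hypothetical. The pooling helper uses the identity 1 nM = 1 fmol/µl.

```python
def library_nM(ng_per_ul: float, mean_fragment_bp: float, mass_per_bp: float = 660.0) -> float:
    """Convert a dsDNA library concentration from ng/µl to nM.

    nM = (ng/µl * 1e6) / (g/mol per bp * mean fragment length in bp).
    """
    return ng_per_ul * 1e6 / (mass_per_bp * mean_fragment_bp)


def volume_for_fmol(target_fmol: float, conc_nM: float) -> float:
    """Volume (µl) delivering `target_fmol` of library, since 1 nM equals 1 fmol/µl."""
    return target_fmol / conc_nM


conc = library_nM(2.0, 320)  # hypothetical: 2 ng/µl, 320 bp mean size including adapters
print(round(conc, 2), "nM;", round(volume_for_fmol(50, conc), 2), "µl for 50 fmol")
```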
Bacterial capture sequencing. Nucleic acid preparation, shearing, and library construction for BacCapSeq were the same as for UHTS, except that SeqCap EZ indexed adapter kits (Roche Sequencing Solutions, Pleasanton, CA) were used. The quality and quantity of libraries were checked using a Bioanalyzer (Agilent). Libraries were mixed with a SeqCap HE universal oligonucleotide, SeqCap HE index blocking oligonucleotides, and COT DNA and vacuum evaporated at 60°C. Dried samples were mixed with hybridization buffer and hybridization component A (Roche) prior to denaturation at 95°C for 10 min. The BacCapSeq probe library (SeqCap EZ Designs, v4.0; Roche) was added and hybridized at 47°C for 12 h in a standard PCR thermocycler. SeqCap Pure capture beads (Roche) were washed twice, mixed with the hybridization mixture, and kept at 47°C for 45 min with vortexing for 10 s every 10 to 15 min. The streptavidin capture beads complexed with biotinylated BacCapSeq probes were trapped (DynaMag-2 magnet; Thermo Fisher) and washed once at 47°C and then twice more at room temperature with wash buffers of increasing stringency. Finally, beads were suspended in 50 µl water and directly subjected to posthybridization PCR (SeqCap EZ accessory kit V2; Roche). The PCR products were purified (Agencourt AMPure DNA purification beads; Beckman Coulter, Brea, CA) prior to sequencing on an Illumina MiSeq platform (v3 chemistry). The time required for extraction, library construction, hybridization, generation of 150 bp single reads, and bioinformatic analysis is approximately 70 h.
Data analysis and bioinformatics pipeline. Each sample yielded an average of 5 million 100 bp single-end reads. The demultiplexed FastQ files were adapter trimmed using Cutadapt v1.13 (29). Adapter trimming was followed by generation of quality reports using FastQC v0.11.5 (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) and filtering with PRINSEQ v0.20.3 (30). Host background levels were determined by mapping the filtered reads against the human genome using Bowtie2 v2.0.6 (31). The host-subtracted reads were de novo assembled using Megahit v1.0.4-beta (32); contigs and unique singletons were subjected to homology search using MegaBlast against the GenBank nucleotide database (33). The filtered reads were also mapped with Bowtie2 to the genomes of the tested bacteria to visualize depth and genome recovery in IGV (34,35). Results of spiking experiments are presented without any cutoff. Results of clinical blood culture and AMR treatment analyses are presented using an empirical cutoff: targets with read counts above 0.001% (>10 reads per 1 million quality- and host-filtered reads) were rated positive.
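The empirical cutoff, together with the Discussion's recommendation that targets with fewer specific reads prompt confirmation by an alternative method, can be sketched as a three-way call. This is a minimal illustration; the function names and the "confirm" label are hypothetical.

```python
def reads_per_million(target_reads: int, filtered_reads: int) -> float:
    """Target reads per million quality- and host-filtered reads."""
    return target_reads * 1e6 / filtered_reads


def classify_target(target_reads: int, filtered_reads: int, cutoff_rpm: float = 10.0) -> str:
    """Call a target using the > 10 reads per million (0.001%) cutoff.

    Targets with some specific reads at or below the cutoff are flagged for
    confirmation (e.g., by a specific PCR assay) rather than called negative.
    """
    if reads_per_million(target_reads, filtered_reads) > cutoff_rpm:
        return "positive"
    if target_reads > 0:
        return "confirm"
    return "negative"


# With 5 million filtered reads, the cutoff corresponds to 50 reads.
for n in (51, 50, 0):
    print(n, classify_target(n, 5_000_000))
```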
For transcriptional analyses, MiSeq reads were aligned using the STAR read mapping package (36). Expression data were extracted from each sample using featureCounts (37), and the results were compiled into a master data file representing transcript counts for each gene. These data were normalized based on the number of reads sequenced for each sample and sorted by strain (AMR+/AMR−), time point, and antibiotic treatment to identify genes with differences in growth patterns based on these metrics.
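Compiling featureCounts output into per-sample counts and normalizing by reads sequenced can be sketched as follows. The parser relies on the tab-delimited featureCounts table layout ('#' comment lines, then a header of Geneid, Chr, Start, End, Strand, Length, and one column per BAM file); the function names are hypothetical, and the example table in the test is invented.

```python
import csv


def read_featurecounts(text: str) -> dict:
    """Parse a featureCounts count table into {sample: {gene: count}}.

    Lines starting with '#' are comments; the header row holds Geneid, Chr,
    Start, End, Strand, Length, and then one count column per input BAM file.
    """
    rows = [line for line in text.splitlines() if line and not line.startswith("#")]
    reader = csv.reader(rows, delimiter="\t")
    header = next(reader)
    samples = header[6:]
    counts = {sample: {} for sample in samples}
    for row in reader:
        gene = row[0]
        for sample, value in zip(samples, row[6:]):
            counts[sample][gene] = int(value)
    return counts


def counts_per_million(sample_counts: dict, reads_sequenced: int) -> dict:
    """Normalize one sample's gene counts by the number of reads sequenced for it."""
    return {gene: n * 1e6 / reads_sequenced for gene, n in sample_counts.items()}
```

Normalizing by reads sequenced, rather than by the sum of assigned counts, follows the description above; the two choices diverge when the fraction of reads assigned to genes differs between samples.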

ACKNOWLEDGMENTS
The study was supported by the NIH (Center for Research in Diagnostics and Discovery, U19 AI109761) and the Bill and Melinda Gates Foundation (OPP1163230). The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
We thank David Relman (Stanford University) and Scott Hammer (Columbia University) for assistance with the bacterial species selection for probe design. We thank Max O'Donnell and Matthew Cummings for clinical samples and Kelly Harpula for assistance with the manuscript.