Gene capture and random amplification for quantitative recovery of homologous genes

https://doi.org/10.1016/j.mcp.2006.09.003

Abstract

The polymerase chain reaction (PCR) is instrumental in molecular analysis of microorganisms, allowing for the selective amplification of nucleic acids directly from clinical and environmental samples. However, the principles that allow for targeted amplification of DNA become a hindrance when attempting to simultaneously discriminate and quantify complex mixtures of homologous genes. Here we present a simple solution to the quantitative problem by separating the enrichment and amplification aspects of a conventional PCR reaction. In this assay, genes are enriched using a DNA oligonucleotide capture probe and subsequently amplified in a two-step random amplification protocol. In order to evaluate the quantitative aspects of the gene capture assay, we used real-time quantitative-PCR to measure initial and final concentrations of homologous genes from constructed mixtures of genomes. Upon sampling for the universal DNA-dependent RNA polymerase gene, rpoC, we were able to demonstrate quantitative recoveries from a mixed DNA sample despite differences in gene copy number ranging up to 4 orders of magnitude. This suggests that minority populations as low as 0.01% of the total community are represented as accurately as populations at higher abundance. These results offer new possibilities for accurately and quantitatively monitoring diverse mixtures of microorganisms.

Introduction

The polymerase chain reaction (PCR) is a key step in most nucleic acid-based diagnostic techniques, allowing for selective amplification of desired sequences above the complexity of background DNA. In many cases, PCR is used to amplify homologous genes from mixtures of organisms, and downstream detection technologies offer various types of discrimination based on amplicon fragment sizes or sequence characteristics. More recently, hybridization of PCR amplicons to DNA oligonucleotide microarrays has allowed for entire communities to be surveyed in a single reaction [1], [2]. However, the ability to accurately discriminate different organisms in an assay depends on the sequence resolution and characteristics of the genetic target. Likewise, the ability to estimate relative abundance is only as good as the techniques used to extract, prepare and amplify signal from the raw samples. Regardless of DNA extraction method, the current paradigm for comprehensive microbial diagnostics and community analysis is based on PCR amplification of the ubiquitous 16S ribosomal RNA (rRNA) gene. Although the PCR and the 16S rRNA gene serve as the foundation for most nucleic acid-based detection technologies, biases inherent in these tools pose a challenge for truly universal and quantitative analysis of complex samples. This paper describes a novel approach to DNA amplification that overcomes the biases associated with PCR amplification of 16S rRNA genes.

Several basic constraints must be met for PCR amplification of mixed templates: all molecules must be equally accessible; primer and template hybrids should form with equal efficiency; polymerization efficiency should be the same for all; and substrate exhaustion should affect all templates equally [3]. Of these considerations, primer and template interactions deserve much of the attention because priming sequences often differ among various groups of organisms. The efficiency of primer binding depends on nucleotide composition of the template priming site, G+C content, and various chemical and thermal parameters of the PCR reaction. In addition to variations in annealing efficiencies, the exponential increase in template copies over successive PCR cycles may contribute to error. For example, stochastic variation in amplification during the early cycles of the PCR can become exacerbated over successive cycles leading to a situation known as “drift,” although this problem can be mitigated by performing fewer cycles and combining replicate reactions [4]. Another result of exponential amplification is that primer concentrations decrease as templates increase, creating a situation where complementary DNA strands compete with primer for template binding [5]. This poses a problem for quantitative analyses in mixed samples because the amplification of different templates may saturate at different times, and the ratios of the different templates eventually converge after saturation at a common plateau. Taken as a whole, biases related to PCR amplification limit the ability to perform quantitative analyses of mixed samples unless target sequences are identified and quantified one at a time.
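To make the effect of unequal priming efficiency concrete, the following minimal sketch models two templates under idealized exponential amplification, where each template grows by a factor of (1 + efficiency) per cycle. The function name and the efficiency values (0.95 and 0.85) are hypothetical and chosen only for illustration; plateau effects and drift are not modeled.

```python
def amplified_ratio(initial_ratio, eff_a, eff_b, cycles):
    """Ratio A:B after `cycles` of idealized exponential PCR.

    Each template grows by a factor (1 + efficiency) per cycle, so the
    ratio is multiplied by (1 + eff_a) / (1 + eff_b) at every cycle.
    """
    return initial_ratio * ((1 + eff_a) / (1 + eff_b)) ** cycles


# Hypothetical values: templates start at 1:1 with efficiencies 0.95 and 0.85.
for n in (10, 20, 30):
    print(n, round(amplified_ratio(1.0, 0.95, 0.85, n), 2))
```

Under these assumed values, a modest per-cycle difference compounds into a several-fold distortion of the starting ratio by cycle 30, which is why small differences in primer binding matter for quantitative work.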

DNA probing technology offers a solution to the limitations of traditional two-primer PCR. By attaching a single-stranded DNA probe onto the surface of a superparamagnetic particle or other substrate, it is possible to capture desired genes from a sample and eliminate the background of the genome [6]. This approach has the effect of enriching the gene of interest relative to background DNA, and works with a single oligonucleotide capture sequence rather than two priming sequences as required in a conventional PCR reaction. Thus, design parameters and hybridization conditions for capture probes are much less stringent than for PCR primer pairs: there are fewer constraints on the melting temperatures between oligonucleotide and template hybrids; there is no risk of forming heterodimers; and it is possible to accommodate much higher levels of degeneracy in the capture probe pool. The ability to design probes with higher degeneracy allows for comprehensive capture of protein-encoding genes, since these sequences present greater variability in the wobble positions of the nucleic acid code.

Despite the benefits of using a DNA probe to capture genes of interest, one of the limitations is that the copies of captured genes are too few to be visualized, cloned, sorted, or otherwise analyzed. This is where the PCR has an advantage for generating substantial quantities of material for further study, and to date, bead-based sequence capture with DNA probes has been used primarily as a pre-enrichment step for conventional PCR [7], [8]. An alternative approach for amplifying the enriched material is to use a random PCR reaction with fully degenerate hexanucleotide primers [9]. In this case, random hexamers are used to amplify the DNA without regard to primer design or specificity of the target sequences. As a result, signal amplification may proceed with minimal bias and therefore preserve the quantitative ratios of homologous genes in the enriched sample.

The goal of this research was to explore the quantitative aspects of gene capture and random amplification, a technique given the name CAPRA (Fig. 1). Quantitative PCR was used to measure ratios of homologous genes from mixed genomic DNA samples during initial, intermediate and final phases of the assay. In this case, Q-PCR served as a simple analytical tool that allowed for development and refinement of the assay conditions, independently of any specific downstream detection technology. Our intent was to develop an understanding of CAPRA with the longer-term goal of optimizing these steps for sequence identification and detection through DNA clone libraries and DNA oligonucleotide microarrays, respectively.

Much of the current understanding of microbial diversity and phylogeny is based on the study of the 16S rRNA gene [10], [11]. This gene encodes for one of the structural RNA components of the prokaryotic ribosome and has several unique features which make it a valuable target for molecular studies. For example, highly conserved regions of sequence offer universal priming sites for the polymerase chain reaction, allowing for a common set of PCR primers to be used to amplify the 16S rRNA genes from unknown organisms. Other regions of the 16S rRNA gene are more variable, and the sequence differences between organisms can be scored and calculated for determining degrees of relatedness. As a result, large databases of 16S rRNA gene sequences and a variety of molecular analytical tools have been developed that offer deeper insights into the microbial world [12].

However, other aspects of the 16S rRNA gene are less well-suited for accurate and quantitative analysis of microbial communities. For example, the discriminating power of the 16S rRNA gene is relatively coarse and limits the differentiation of more closely related species and strains. Another confounding factor is that microorganisms can have variable numbers of copies of the 16S rRNA gene, and the different copies within one organism can accumulate sequence mutations independently of each other [13], [14]. When sampling 16S rRNA genes from an undefined community, the heterogeneous copies arising from one organism can lead to an overestimation of diversity. Variations in copy number also contribute to quantitative bias in DNA-based molecular assays, since organisms with higher copy number give a stronger signal relative to their population size compared to organisms with lower gene copy number [15]. A final consideration for the use of the 16S rRNA gene in community analysis is the use of so-called “universal” priming sites for the PCR. These are short stretches of conserved sequence which are important for maintaining the structural integrity of the ribosome. However, the term “universal” is a misnomer, because these priming sequences are not strictly conserved across all microbial lineages [16]. This influences the ability of the PCR to successfully amplify 16S rRNA gene sequences from organisms whose priming sites differ from the commonly used primers, and may lead to a significant underestimation of diversity in natural samples [17]. Clearly, the factors involved in over- and underestimating microbial diversity do not compensate for each other: two wrongs do not make a right.
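The copy-number bias described above can be illustrated with a short sketch. The organism names, cell counts, and copy numbers below are hypothetical; the point is only that equal populations give unequal signal when the target gene is present in different numbers of copies per cell.

```python
def apparent_fractions(cells, copies_per_cell):
    """Fraction of total gene copies contributed by each organism."""
    gene_copies = {name: cells[name] * copies_per_cell[name] for name in cells}
    total = sum(gene_copies.values())
    return {name: n / total for name, n in gene_copies.items()}


# Hypothetical community: two organisms present in equal cell numbers.
cells = {"organism_A": 1000, "organism_B": 1000}
copies_16s = {"organism_A": 1, "organism_B": 10}   # assumed 16S rRNA copies per cell
single_copy = {"organism_A": 1, "organism_B": 1}   # a single-copy gene target

print(apparent_fractions(cells, copies_16s))   # organism_B dominates the signal
print(apparent_fractions(cells, single_copy))  # signal matches the cell ratio
```

With a single-copy target, the gene fractions equal the cell fractions, which is the property that motivates the protein-encoding targets discussed next.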

Alternatives to the rRNA genes for molecular analysis include the various single-copy “core” genes that encode proteins, many of which have co-evolved with the ribosome [18]. These protein-encoding genes offer a finer level of sequence resolution, due to the accumulation of silent mutations in the wobble positions of the nucleic acid code. A single-copy gene also contributes to the accuracy of an analysis technique, since one gene represents only one cell. The genes encoding the β and β′ subunits of the DNA-dependent RNA polymerase, rpoB and rpoC, respectively, are two examples of universal core genes that are emerging in prominence in molecular techniques. These genes have been successfully substituted for the 16S rRNA gene in assays that target specific groups of organisms, such as fingerprinting of isolates from marine ecosystems using denaturing gradient gel electrophoresis (DGGE) and the characterization of marine prokaryotes from clone libraries [19], [20]. Although protein-encoding genes do not offer universal priming sites for conventional PCR, the DNA-dependent RNA polymerase genes and other conserved housekeeping genes represent valuable targets for the identification of universal DNA capture probes. For example, a comparison of rpoC genes available in the NCBI comprehensive microbial resource database reveals a short stretch of amino acids with strict sequence conservation. This amino acid sequence, NADFDGD, corresponds to the Mg-chelating center of the RNA polymerase enzyme [21]. Further review of the associated literature suggests that the sequence (Y/F)NADFDGD(E/Q)M(N/A) is universally conserved across all known domains of life [22]. After accounting for the degeneracy in the wobble positions of the nucleic acid code, this set of oligonucleotide capture probes has the capacity to target Eubacteria, Archaea, the Eucaryotic domain and all its kingdoms, as well as viruses containing their own DNA-dependent RNA polymerases. Among cell-based organisms, this allows for the possibility of developing a truly universal assay.

Beads and probes

Streptavidin-coated MagneSphere paramagnetic particles were obtained from Promega in 0.6 ml aliquots and were used in a MagneSphere Technology magnetic separation stand (Promega, Madison, WI). Oligonucleotide probes were synthesized with a 5′-biotin molecule and a polynucleotide A(12) linker with a degenerate nucleic acid sequence accommodating all possible combinations of the amino acid sequence FDGDQMA (5′-TTYGAYGGNGAYCARATGGC-3′). Probes were reconstituted to a final concentration of 10 μM for
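As an illustration of what the degenerate probe sequence represents, the sketch below expands the IUPAC codes in TTYGAYGGNGAYCARATGGC into every concrete oligonucleotide and checks which peptide the complete codons encode. The helper names are hypothetical, and the codon table lists only the standard-code codons needed for this probe.

```python
from itertools import product

# Standard IUPAC nucleotide codes that appear in the probe.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "AG", "Y": "CT", "N": "ACGT"}

# Subset of the standard genetic code covering this probe only.
CODONS = {"TTT": "F", "TTC": "F", "GAT": "D", "GAC": "D",
          "GGA": "G", "GGC": "G", "GGG": "G", "GGT": "G",
          "CAA": "Q", "CAG": "Q", "ATG": "M"}

PROBE = "TTYGAYGGNGAYCARATGGC"  # degenerate sequence quoted in the text


def expand(degenerate):
    """List every concrete sequence encoded by a degenerate oligonucleotide."""
    return ["".join(bases) for bases in product(*(IUPAC[c] for c in degenerate))]


def translate(seq):
    """Translate complete codons only; trailing bases are ignored."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODONS[seq[i:i + 3]] for i in range(0, usable, 3))


variants = expand(PROBE)
print(len(variants))                      # size of the probe pool
print({translate(v) for v in variants})   # peptides encoded by the pool
```

The pool contains 2 × 2 × 4 × 2 × 2 = 64 sequences, all of which translate to FDGDQM. The trailing GC is the first two bases of an alanine codon (GCN), so the probe covers FDGDQMA without needing a degenerate final position.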

Optimization of the CAPRA assay

In order to develop the methodology, CAPRA was performed with sheared genomic DNA from a pure culture of Shewanella oneidensis as a positive control. Conditions were optimized by comparing the enrichment of the rpoC gene relative to a single-copy background gene selected at random, uridine kinase (udk). Given that these two genes are initially present in the S. oneidensis genome at a ratio of 1:1, the efficiency of capture was expressed as a signal-to-noise ratio of rpoC:udk. Under gentle wash
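The signal-to-noise calculation described here can be sketched as follows, assuming Q-PCR has already been converted to gene copy numbers. The function names and all copy numbers are hypothetical and do not come from the study.

```python
def signal_to_noise(rpoc_copies, udk_copies):
    """Ratio of target (rpoC) to background (udk) gene copies."""
    return rpoc_copies / udk_copies


def fold_enrichment(before, after):
    """Change in the rpoC:udk ratio between two (rpoC, udk) measurements."""
    return signal_to_noise(*after) / signal_to_noise(*before)


# Hypothetical copy numbers: 1:1 in the input genome, skewed after capture.
before_capture = (1e6, 1e6)
after_capture = (5e4, 50)

print(signal_to_noise(*after_capture))
print(fold_enrichment(before_capture, after_capture))
```

Because both genes start at 1:1 in the genome, the post-capture signal-to-noise ratio and the fold enrichment are the same number in this example.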

Discussion

CAPRA offers a promising strategy for universal and quantitative recovery of homologous genes from complex mixtures of DNA. Support for this concept was demonstrated with the universal DNA-dependent RNA polymerase (rpoC) gene as a target, where mixtures of genes were recovered within a factor of two compared to their initial concentrations. Five genomes were accurately represented after gene capture, although only three of these could be accurately measured with Q-PCR after random

Acknowledgments

This work was funded by the Office of Science Biological and Environmental Research NABIR Program, U.S. Department of Energy (DOE) under Grant DOEAC05-00OR22725, and by the STC Program of the National Science Foundation under Agreement Number CTS-0120978 to the University of Illinois Urbana-Champaign. The authors wish to thank Dr. Jizhong Zhou for hosting L. C. in his laboratory, and for providing samples of D. radiodurans. Samples of A. tumefaciens, and M. tuberculosis and V. cholerae, were

References (26)

  • G. Mangiapan et al. Sequence capture-PCR improves detection of mycobacterial DNA in clinical specimens. J Clin Microbiol (1996)
  • T. Stinear et al. Identification of Mycobacterium ulcerans in the environment from regions in Southeast Australia in which it is endemic with sequence capture-PCR. Appl Environ Microbiol (2000)
  • C.R. Woese. Bacterial evolution. Microbiol Rev (1987)