Introduction

Despite recent advances in sequencing technologies, present capabilities do not permit routine whole-genome sequencing for mutation detection. In response, enrichment methods have been described that capture specific sequences from genomes and work well for most screening studies.1, 2, 3, 4, 5, 6, 7

However, these capture-based enrichment methods are limited in some situations: they require prior knowledge of the target sequences for array or primer design and are thus restricted to resequencing projects. They are not suitable when rare sequence rearrangements are present, for example, inter-individual differences in highly dynamic structures, such as telomere and subtelomere regions, which might be difficult to capture by these methods but are relevant to ageing, cancer and inherited disease.8, 9, 10, 11 Other regions, such as pericentromeric heterochromatin, were not targeted by the Human Genome Project because they are difficult to clone and to annotate owing to high repeat content and homology.12 However, heterochromatin comprises 20% of the human genome, and seems to have relevance for gene expression and disease.13, 14, 15 In some linkage or association studies, significant results identify regions that contain no known genes.16, 17 Furthermore, even regions with known genes could feature unrecognized rearrangements or insertions of mobile elements with effects on gene regulation and expression.18, 19, 20 Other frequent examples are cryptic rearrangements in promoter regions or fusion genes in cancer development.21, 22 Such changes would be missed or difficult to analyze by the above-mentioned capture methods. Genome-wide paired-end sequencing is extremely sensitive,23 but may not be meaningful or practicable for large-scale screening studies when there already is a localized region of interest. However, next-generation sequencing technologies are developing extremely fast and will probably enable whole-genome sequencing at affordable costs in the near future.

To avoid making a priori assumptions on the sequences in the target region, we developed an approach that could directly start with a patient's chromosomal region linked to a disease. The most direct way is to dissect that suspicious piece of chromosome and sequence it. We have done that by coupling conventional cytogenetics (karyotyping), microdissection and high-throughput sequencing (Figure 1). We present data from three experiments, in which we obtained sequences from as few as six chromosomes and present a proof-of-feasibility protocol.

Figure 1

Microdissection-based enrichment for next-generation sequencing. The figure shows metaphase chromosomes prepared from lymphoblastoid cells and microdissection of chromosomal region 12p. In this experiment, we microdissected and processed 10 short arms of chromosome 12. The microdissected fragments were amplified by DOP-PCR, followed by the processing protocol for 454 sequencing.

Methods

Preparation of metaphase chromosomes, microdissection and degenerate oligonucleotide-primed PCR

We prepared metaphase chromosomes from lymphoblastoid cells (chromosome 12p) and from peripheral blood (chromosome 1). Microdissection was performed as described.24 For amplification, we used an adapted degenerate oligonucleotide-primed PCR (DOP-PCR).24, 25 A detailed protocol for microdissection and DOP-PCR is provided in the supplementary material. Before preparing the 454 library, we verified the specificity of the microdissected material by dye-labeling an aliquot of DOP-PCR product and subsequent hybridization on metaphase chromosomes (reverse painting, reverse fluorescence in situ hybridization, FISH).26

Library preparation

The 454 library was prepared according to the manufacturer's instructions, and included adapter ligation, library immobilization, melting and quantification. We performed an additional reamplification of the 454 library to get a measurable amount of library material. For this purpose, we used the normal Roche 454 amplification primer (20 μM final concentration; Roche, Branford, CT, USA) and performed a standard PCR with 35 cycles (50 μl volume). Because the DOP-PCR products used as starting material for library preparation were short (<200 bp), we did not use paired-end sequencing.

Sequencing runs carried out with 454/Roche FLX genome sequencer

Runs were carried out according to the manufacturer's instructions with the following modification. To increase the number of sequencing reads, we passed the normal AMPure Bead Purification for FLX runs (Agencourt Bioscience Corporation, Beverly, MA, USA). We used 70 × 75 PTPs with 16-region gaskets. To obtain more sequencing information, we loaded the single lanes with more than 70 000 DNA beads. Each experiment was processed in a 1/16 run.

Bioinformatic analyses

We mapped sequences against the genomic reference sequence (hg18, March 2006, build 36.1) using MegaBlast (National Center for Biotechnology Information, http://www.ncbi.nlm.nih.gov/blast/megablast.shtml). Although the usual aim is to maximize the number of mapped reads, in our analysis we put strong emphasis on a stringent discrimination between on- and off-target hits. We determined an optimal e-value threshold that maximizes the number of unique hits, as described in Albert et al.2 Reads with multiple hits and significant e-values were considered non-unique mappings and were excluded from further analysis. Thus, the number of unmapped reads is a direct consequence of the stringent parameter regime, which enables optimal discrimination between on-target and off-target hits, and is not due to other causes such as contamination, gaps or low sequencing quality.
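The unique-hit filter described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the input layout (one tuple of read ID, chromosome and e-value per hit), the function names `unique_hits` and `best_threshold`, and the candidate thresholds are all assumptions made for illustration.

```python
from collections import defaultdict

def unique_hits(hits, evalue_threshold):
    """Return {read_id: chromosome} for reads with exactly one significant hit.

    hits is an iterable of (read_id, chromosome, evalue) tuples (assumed layout).
    Reads with several significant hits are treated as non-unique and dropped;
    reads with no significant hit are treated as unmapped.
    """
    significant = defaultdict(list)
    for read_id, chrom, evalue in hits:
        if evalue <= evalue_threshold:
            significant[read_id].append(chrom)
    return {read: chroms[0]
            for read, chroms in significant.items()
            if len(chroms) == 1}

def best_threshold(hits, candidates):
    """Pick the candidate e-value threshold that yields the most unique reads."""
    return max(candidates, key=lambda t: len(unique_hits(hits, t)))

# Toy data, invented for illustration only.
hits = [
    ("r1", "chr1", 1e-40),
    ("r1", "chr5", 1e-3),
    ("r2", "chr1", 1e-30),
    ("r2", "chr12", 1e-28),
    ("r3", "chr12", 1e-12),
]
chosen = best_threshold(hits, [1e-20, 1e-10, 1e-2])
print(chosen, unique_hits(hits, chosen))
```

In this toy example, a strict threshold loses read r3 as unmapped, whereas a lenient one turns r1 into a non-unique mapping, so the intermediate threshold gives the most unique reads.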

Results

For microdissection, we prepared metaphase chromosomes from human peripheral blood and from lymphoblastoid cells. We targeted human chromosomes 12p and 1. We microdissected 10 short arms of chromosome 12 (experiment Chr 12p) and 6 copies of chromosome 1. The experiment for chromosome 1 was performed twice (experiments Chr 1(A) and Chr 1(B), respectively). The critical step was amplification from small amounts of starting material, which was successfully done by DOP-PCR. We obtained a sufficient amount of DOP-PCR product. Subsequently, we took an aliquot of microdissected and DOP-PCR-amplified material for proof of specificity on control chromosomes (reverse FISH in the left panel of Figure 2).

Figure 2

Specificity of the microdissected material tested in metaphase fluorescence in situ hybridization (FISH) and in 454 sequencing. Left panel: specificity of the microdissected material tested on metaphase FISH. On the left, the hybridization signals of the microdissected material are shown as seen under the microscope; material from chromosome 12p and whole chromosome 1 was purified and processed without major contamination. When signal detection was enhanced (middle panel), we saw minor cross-hybridization on other chromosomes, mainly in pericentromeric regions, which have a high degree of homology among chromosomes. The other faint cross-hybridizations might be due to repeat elements and segmental duplications. The inverted DAPI channel shows a GTG-like chromosome banding to unambiguously identify the metaphase chromosomes. Right panel: mappings of sequence reads show enrichment of the targeted chromosomes. A total of 41.8% of the reads that mapped uniquely to the whole genome aligned to targeted chromosome 12p, and 76.2 and 77.6% to targeted chromosome 1 (experiments A and B, respectively). This result documents the feasibility of the microdissection approach. Mapping on other chromosomes might represent repeat elements, transposons, gene families or annotation problems. The higher proportion of off-target hits from the microdissected material from chromosome 12p might result from the lower ratio of specific sequence to repeat-rich heterochromatin, as the microdissected short arm 12p contains a significant amount of pericentromeric heterochromatin. In addition, 12p is known to have been involved in segmental duplications.30, 31, 32, 33 Also, annotation errors due to unclonable genomic regions with subsequent contig gaps might contribute.29 (For each experiment, we used 1/16 of a 454/Roche FLX run.)

The remaining material was used for the 454 library preparation. Of all obtained sequence reads, about 52, 67 and 55% could be mapped to the human reference genome in experiments chr12p, chr1(A) and chr1(B), respectively (Table 1). For chromosome 12p, 42% of the sequence reads that mapped to the whole genome just once had their primary BLAST hit in the target region (Table 1, Figure 2 upper right panel, Figure 3, Supplementary Figure 1). In both chromosome 1 experiments, more than 75% of uniquely mapped sequence reads had their hit on chromosome 1 (Table 1, Figure 2 middle and lower right panels, Figure 3, Supplementary Figure 1).
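The on-target percentages reported here are simple ratios over the uniquely mapped reads. A minimal Python sketch of that calculation follows; the function name `on_target_percent`, the input layout and all coordinates are hypothetical placeholders, not hg18 values or data from this study.

```python
def on_target_percent(placements, chrom, start=0, end=None):
    """Percentage of uniquely mapped reads placed inside the target interval.

    placements is a list of (chromosome, position) pairs, one per uniquely
    mapped read (assumed layout). The interval is half-open: start <= pos < end.
    With end=None the whole chromosome from start onwards is the target.
    """
    if not placements:
        return 0.0
    on_target = sum(
        1 for c, pos in placements
        if c == chrom and pos >= start and (end is None or pos < end)
    )
    return 100.0 * on_target / len(placements)

# Toy placements; the arm boundary of 30,000,000 is a placeholder, not the
# real position of the chromosome 12 centromere.
placements = [
    ("chr12", 5_000_000),
    ("chr12", 90_000_000),
    ("chr1", 1_000),
    ("chr12", 20_000_000),
]
print(on_target_percent(placements, "chr12", end=30_000_000))  # short-arm target
print(on_target_percent(placements, "chr12"))                  # whole chromosome
```

Restricting the target to a chromosome arm rather than a whole chromosome only changes the interval passed in; the denominator stays the set of uniquely mapped reads.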

Table 1 Analysis of mapped and unmapped sequences regarding the target chromosome and sequence
Figure 3

Distribution of reads mapped to the genome. For each chromosome, we show the unique alignment locations of reads from the three data sets 12p, 1(A) and 1(B) (in olive, red and blue), as well as placements in annotated repeats (gray). The targeted regions (chromosome 1 and the p arm of chromosome 12) are highlighted in white boxes (for a zoom-in of these regions, see Supplementary Figure 1).

The distribution of sequencing coverage (Table 2, Figure 3, Supplementary Figure 1), the number of reads partially containing repeat sequences (Table 3, Figure 3, Supplementary Figure 1) and SNP detection rates (Table 1) were within the expected range for currently available enrichment methods. Although the sequence harvest can be further optimized, we obtained a sufficient number of high-quality reads for a proof of principle. The data show that 454 sequencing, starting from as few as six chromosomes, is feasible.

Table 2 Distribution of multiple sequence coverage (analysis of read clusters)
Table 3 Analysis of sequence repeat patterns

Discussion

We demonstrate the feasibility of a cytogenetics-based approach to capture target regions for next-generation sequencing. Direct microdissection of the target region in metaphase chromosomes with subsequent DOP-PCR amplification yielded material of sufficient quantity and quality for 454 sequencing from as few as six chromosomes.

We analyzed whether some fragments were sequenced several times by clustering reads. In all three experiments, the majority of hit regions were covered by only one read, indicating that the coverage distribution and the degree of preferential amplification were within the expected range. Although the representation of repeat regions in the obtained reads is higher than the average density in the currently available genome annotation, the obtained data seem consistent in light of the many newly detected, probably population-specific or private insertions of repeat elements, as they become available from the Watson and Venter genomes or the 1000 Genomes Project, respectively.23, 27 Although it was not the aim of our experiments, analyzing regions with repeat elements might be facilitated by a microdissection approach. However, considering the fact that even more stable regions of the genome, such as exons, require a high sequencing depth,28 exploiting sequence variations in repeat elements will probably warrant an even higher coverage. Their de novo annotation would be facilitated by longer sequencing reads to include sequences adjacent to the repeat.
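One way to picture the read-cluster analysis is to merge overlapping alignments into hit regions and count the reads in each. The Python sketch below does this under stated assumptions: the text specifies only that reads were clustered, so the overlap criterion, the half-open interval convention and the function names `cluster_reads` and `coverage_histogram` are illustrative choices, not the method used for Table 2.

```python
from collections import Counter

def cluster_reads(alignments):
    """Merge overlapping alignments and return the number of reads per cluster.

    alignments is an iterable of (chromosome, start, end) tuples with
    half-open coordinates (assumed layout). Two reads share a cluster when
    their intervals overlap on the same chromosome.
    """
    sizes = []
    current_chrom, current_end = None, None
    for chrom, start, end in sorted(alignments):
        if chrom == current_chrom and start < current_end:
            sizes[-1] += 1                      # read extends the open cluster
            current_end = max(current_end, end)
        else:
            sizes.append(1)                     # read opens a new cluster
            current_chrom, current_end = chrom, end
    return sizes

def coverage_histogram(alignments):
    """Map 'reads per cluster' to the number of clusters with that count."""
    return Counter(cluster_reads(alignments))

# Toy alignments, invented for illustration only.
alignments = [
    ("chr1", 100, 200),
    ("chr1", 150, 250),
    ("chr1", 300, 400),
    ("chr12", 100, 200),
]
print(coverage_histogram(alignments))
```

A histogram dominated by single-read clusters corresponds to the situation described above, in which most hit regions are covered by only one read.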

Making chromosomes visible requires the chromatin to condense and arrange, which happens mainly when cells replicate and prepare to divide (metaphase of the cell cycle). Accordingly, direct microdissection of the patient's chromosomes is possible only when dividing cells are available. This is a limitation. Another concern is the risk of contamination, especially during the microdissection process and the first cycles of DOP-PCR. However, single-cell techniques are well established in many pre-implantation diagnostic and tumor microdissection laboratories. The number of generated sequences is probably lower than that in other approaches, but can be increased by optimized loading density and compensated for by more runs. Our approach works well for complete chromosomes and partial chromosome arms and regions. Smaller parts can also be microdissected, as was previously shown for microdissection libraries with band resolution that were created for chromosome painting.24 Such adaptation might be useful for specific questions. Here, we wanted to cover complete regions, including centromeric regions and repeats. When repeats are not required, they can be blocked by COT-1 DNA to increase the harvest of unique sequences.

We showed that high-throughput sequencing of microdissected chromosomes is feasible and can be done from a few molecules. The coupling of microdissection and next-generation sequencing is suited for a wide range of applications, including standard mutation detection. Sequencing phase-defined chromosomes allows experimental determination of haplotypes and haplotype blocks. The combination of defined localization information and independence from prior knowledge of sequence composition in the target region might help in solving annotation problems of repeat-rich or non-clonable regions in de novo sequencing.29 The approach might also be relevant in humans when population-specific insertions are suspected, for tracking down small ‘private’ cytogenetic abnormalities in patients or tumor cells, and for resequencing of dynamic chromosomal regions, such as telomeres, subtelomeres or pericentromeric heterochromatin.