Integrating Sequencing Technologies in Personal Genomics: Optimal Low Cost Reconstruction of Structural Variants

Figure 1

Schematic strategy of genome sequencing/assembly.

The orange line represents the target individual genome, the red bars represent the SNPs and small SVs relative to the reference, and the green region represents a large SV. (A) The sequencing experiments generate single and paired-end reads of different lengths (long, medium, and short, shown in different colors), which can be viewed as partial observations of the target genome sequence. The dashed lines represent the links between paired ends. The horizontal positions of the reads indicate their locations in the genome. (B) After error correction, the reads are mapped back to the reference genome, and the short reads are assembled into longer contigs based on their overlaps. The red and green regions represent the mismatches and gaps in the mapping results. (C) The SNPs and small SVs can be inferred directly from the mapping results, and haplotype phasing can also be performed after this step. (D, E) Large SVs can be detected and reconstructed from the reads that lack consistent matches in the reference genome, together with the results from CGH arrays. This step is explained in more detail in the Results section. (F) The final assembly is generated after all the small and large SVs are identified.
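As a rough illustration of step (C), the sketch below infers single-base differences from reads that have already been mapped to the reference. It is a toy example under strong assumptions, not the method used in the article: alignments are ungapped, each read is given as a hypothetical `(start, sequence)` pair, and a variant is called by simple majority vote. The function name `call_snps` and the `min_depth` parameter are invented for this illustration.

```python
from collections import Counter


def call_snps(reference, mapped_reads, min_depth=2):
    """Toy SNP caller over ungapped alignments.

    mapped_reads is a list of (start, sequence) pairs, where start is the
    0-based reference position of the first base of the read.
    Returns a list of (position, reference_base, observed_base) tuples.
    """
    # One counter of observed bases per reference position (a simple pileup).
    pileup = [Counter() for _ in reference]
    for start, seq in mapped_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1

    snps = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads cover this position to make a call
        base, n = counts.most_common(1)[0]
        # Call a SNP when a strict majority of reads disagree with the reference.
        if base != reference[pos] and n > depth / 2:
            snps.append((pos, reference[pos], base))
    return snps


reference = "ACGTACGTAC"
reads = [(0, "ACGTTC"), (2, "GTTCGT"), (4, "TCGTAC")]
print(call_snps(reference, reads))  # [(4, 'A', 'T')]
```

Small indels, base-quality weighting, and heterozygous sites are deliberately left out; handling them would require gapped alignments and a statistical genotype model rather than a majority vote.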

doi: https://doi.org/10.1371/journal.pcbi.1000432.g001