High-resolution mapping of reciprocal translocation breakpoints using long-read sequencing

Graphical abstract


Method details
Genomic DNA of balanced reciprocal translocation carriers was sequenced in a MinION flow cell (R9.4, Oxford Nanopore Technologies, UK) with a 48 -h sequencing protocol [1]. In brief, genomic DNA was extracted from peripheral blood using QIAamp DNA Blood Mini kit (Qiagen, Manchester, UK). The integrity of the DNA was confirmed by gel electrophoresis. One microgram of DNA was used to prepare a sequencing library using the SQK-LSK108 kit (Oxford Nanopore, Oxford, UK). Local base calling, sequence alignment and breakpoints determination were performed, followed by confirmation of breakpoints by Sanger sequencing.
Base calling, sequencing alignment and breakpoints determination 1) Local base calling was performed using Albacore 2.3.1 (Oxford Nanopore, Oxford, UK). FastQ files were sorted into pass and fail folders using a default quality score cutoff of 7. Please note that at the time of manuscript revision, FAST5 files generated by the updated version of MinKNOW (core 3.5.5, Oxford Nanopore, Oxford, UK), are no longer supported by Albacore. Real time basecalling can be performed by the data processing toolkit (Guppy) integrated in MinKNOW, or performed locally using Guppy Basecalling Software (Version 3.3.0).
read_fast5_basecaller.py -flowcell FLO-MIN106 -kit SQK-LSK108 -output_format fastq -input folder_with_fast5 -recursive -save_path fastq/ -worker_threads 12 1) The FastQ files in the "pass" folder were aligned to GRCh37/hg19 using minimap2(2.11) [2] and then converted to sorted bam files by samtools minimap2 -ax map-ont reference.fasta fastq/pass/*.fastq > example.sam samtools view -bS example.sam > example.bam samtools sort example.bam example.sorted samtools index example.sorted.bam 1) The bam files were split according to the translocated chromosomes. Merge of the 2 split bam files was necessary before breakpoint determination. This could significantly cut down the time for breakpoint determination. Take sample #100648 as an example, the time required was reduced 60 % after splitting the bam file. 1) Breakpoints determination was performed using NanoSV1.2.0 [3], and all the putative breakpoints were listed in example.vcf Coordinates of the predicted breakpoints that were correlated with the cytogenetic reports of the patients were extracted from the example.vcf. Table 1 shows the extent of concordance of the predicted breakpoints of 9 samples with Sanger sequencing results. There were discrepancies of a few bases due to microinsertions / microdeletions in all samples. To get more information on the characteristics at the break junctions, positions of the breakpoints were visually checked on IGV. The aligned sequencing reads of chr2 and chr10 of sample #100238 on IGV are depicted in Fig. 1A and B. Microdeletions were easily identified at the regions showing reduced coverage between the 2 breakpoints. Sanger sequencing results of the derivative chromosomes ( Fig. 1C and D) confirmed the microdeletions.
In case where a large deletion was suspected at the breakpoints, it is important to visually check the alignment reads by zooming out on IGV. For instance, the aligned reads on chr8 of sample #100648 indicated a deletion of 216 bases at the breakpoint ( Fig. 2A). This provided clues to design PCR primers that covered the deletions. Fig. 2B shows the aligned reads on chr10. Primers were initially designed based on the predicted breakpoint at chr10:128,448,174. Breakpoint PCR successfully amplified chr8, chr10 and der(10) but not der(8) (Fig. 2C). It was suspected that a large deletion was involved at the breakpoint on chromosome 10 based on the following reasons. First, successful amplification of chr8 and chr10 indicated that the primers 8 F, 8R, 10 F and 10R were working. Second, successful amplification of der(10) confirmed the 5 0 location of the deletion on chr10 and the 3 0 location of the deletion on chr8. Third, IGV view on chromosome 8 clearly showed a deletion of approximately 200 bases between primers 8 F and 8R. Therefore, it was deduced that primer 8 F should be able to bind to der (8), and that the failure in amplification of der(8) was due to the deletion which removed the binding sequence of primer 10R. Fig. 2B shows the chimeric reads (reads aligned to both chromosome 8 and chromosome 10) at 3637 bases downstream from the predicted breakpoint (depicted in the blue box) and the 3 0 location of the deletion on chromosome 10. New PCR primers flanking the 3 0 end of the deletion could amplify the der(8) successfully, confirming the breakpoints on chromosome 8 and chromosome 10.
In summary, we demonstrate a new method for accurately mapping of the reciprocal translocation breakpoints directly on balanced carriers. Splitting the bam files according to the translocated chromosomes before breakpoint prediction can dramatically cut down the time required for the prediction. Visual inspection of the aligned reads on IGV is crucial to accurate identification of the breakpoints. Nonetheless, this method has a few limitations. First, it relies on the previous cytogenetic diagnosis for splitting bam files, therefore it is not applicable for mapping genome-wide de novo structural variations. Second, the method is not applicable to patients with translocation breakpoints Table 1 Concordance of NanoSV prediction and breakpoints confirmed by Sanger sequencing.  located across the highly repetitive centromeric regions of acrocentric chromosomes, such as Robertsonian translocation carriers.

Additional information
Balanced reciprocal translocation carriers are usually phenotypically normal but are at an increased risk of infertility, recurrent pregnancy loss or having children with physical or mental abnormalities. Preimplantation genetic testing on chromosomal structural rearrangement (PGT-SR) offers a way to screen against unbalanced embryos during in vitro fertilization treatment cycle, but the methods cannot distinguish euploid carrier from noncarrier embryos. Although replacing carrier embryos should result in phenotypically normal live births, the offspring will encounter the same problems as their parent in terms of infertility, recurrent pregnancy loss or having affected children. Distinguishing between carrier and noncarrier embryos is possible by SNP haplotyping [4,5] and next generation sequencing (NGS) [6][7][8], however these methods involved complicated analyses and interpretations. SNP haplotyping approach also relies on the presence of an unbalanced embryo, DNA of previously affected pregnancies or DNA of close relatives for phasing. Instead nanopore sequencing can provide simple, accurate and high-resolution breakpoint mapping directly on balanced carriers. This enables the design of breakpoint PCR primers which can be used to distinguish carrier from noncarrier embryos produced in PGT-SR cycles. This method allows patients to prioritize the replacement of euploid noncarrier embryos.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.