Case report: Deep sequencing and long-read genome sequencing refine prior genetic analyses in families with apparent gonadal mosaicism in PIK3CD-related activated PI3K delta syndrome

Gonadal and gonosomal mosaicism describe phenomena in which a seemingly healthy individual carries a genetic variant in a subset of their gonadal tissue or gonadal and somatic tissue(s), respectively, with risk of transmitting the variant to their offspring. In families with one or more affected offspring, occurrence of the same apparently de novo variants can be an indicator of mosaicism in either parent. Panel-based deep sequencing has the capacity to detect low-level mosaic variants with coverage exceeding the typical limit of detection provided by current, readily available sequencing techniques. In this study, we report three families with more than one affected offspring with either confirmed or apparent parental gonosomal or gonadal mosaicism for PIK3CD pathogenic variants. Data from targeted deep sequencing was suggestive of low-level maternal gonosomal mosaicism in Family 1. Through this approach we did not detect pathogenic variants in PIK3CD from parental samples in Family 2 and Family 3. We conclude that mosaicism was likely confined to the maternal gonads in Family 2. Subsequent long-read genome sequencing in Family 3 showed that the paternal chromosome harbored the pathogenic variant in PIK3CD in both affected children, consistent with paternal gonadal mosaicism. Detection of parental mosaic variants enables accurate risk assessment, informs reproductive decision-making, and provides helpful context to inform clinical management in families with PIK3CD pathogenic variants.


Exome Sequencing
Detailed methods for exome sequencing can be found in Similuk et al. (1). Briefly, exome sequencing was performed on the Illumina HiSeq 2500 sequencing system for Family 3 and on the Illumina NovaSeq 6000 instrument for Family 1, with a minimum coverage of 95% > 20X and mean coverage of 100X, for identification of variants related to the clinical presentation of affected subjects (1). Relevant variants were confirmed by Sanger sequencing or other appropriate methods meeting Clinical Laboratory Improvement Amendments/College of American Pathologists (CLIA/CAP) requirements. Confirmation of genomic sex and relationship was based on estimates of identity by descent.
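As an illustration of the two coverage metrics quoted above, the following minimal sketch computes mean depth and the fraction of bases above a depth threshold. It assumes that "95% > 20X" means that at least 95% of targeted bases are covered at more than 20X; the function name and the depth values are hypothetical and are not study data.

```python
def coverage_summary(depths, threshold=20):
    """Return (mean depth, fraction of bases with depth above threshold)."""
    if not depths:
        raise ValueError("depths must be non-empty")
    mean_depth = sum(depths) / len(depths)
    frac_above = sum(d > threshold for d in depths) / len(depths)
    return mean_depth, frac_above


# Hypothetical per-base depths for a tiny target region (illustration only).
example_depths = [120, 98, 105, 15, 110, 101, 99, 130, 97, 125]
mean_depth, frac_above = coverage_summary(example_depths)
print(f"mean = {mean_depth:.1f}X, fraction > 20X = {frac_above:.0%}")
```

In this toy region the mean depth meets a 100X target, but the fraction of bases above 20X does not reach 95%, which shows why both metrics are reported together.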

Genome Sequencing
Research-based genome sequencing was performed on an Illumina sequencing system with a minimum coverage of 95% > 20X and mean coverage of 30X for targeted genomic regions (2). Relevant variants were confirmed by Sanger sequencing meeting CLIA/CAP requirements.

Targeted Deep Sequencing
Targeted deep sequencing based on next-generation sequencing was performed to detect low-level mosaic variants in PIK3CD, with an average coverage of 9,677X.
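To illustrate why this depth matters for low-level mosaicism, the sketch below estimates the probability of observing a minimum number of variant-supporting reads at a given variant allele fraction (VAF), using a simple binomial sampling model. The model ignores sequencing error and capture bias, and the minimum read threshold of 10 is an assumption chosen for illustration, not a parameter reported for this assay.

```python
import math


def log_binom_pmf(k, n, p):
    """Log of the binomial probability of k successes in n trials."""
    return (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log1p(-p)
    )


def detection_power(depth, vaf, min_alt_reads):
    """P(at least min_alt_reads variant reads) under Binomial(depth, vaf)."""
    if not 0.0 < vaf < 1.0:
        raise ValueError("vaf must be strictly between 0 and 1")
    below = sum(
        math.exp(log_binom_pmf(k, depth, vaf)) for k in range(min_alt_reads)
    )
    return max(0.0, 1.0 - below)


DEPTH = 9677          # average coverage reported for the targeted assay
MIN_ALT_READS = 10    # hypothetical calling threshold (assumption)

for vaf in (0.0001, 0.001, 0.005):
    power = detection_power(DEPTH, vaf, MIN_ALT_READS)
    print(f"VAF {vaf:.2%}: detection probability {power:.4f}")
```

Under these assumptions, a variant present in 0.5% of reads is expected to yield roughly 48 supporting reads at this depth, whereas the same variant at 30X would usually yield none.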

PacBio Long-Read Whole Genome Sequencing (LR-WGS)
Long-read whole genome sequencing (LR-WGS) was performed on quad samples (2 siblings and 2 parents) using PacBio Single Molecule Real-Time (SMRT) chemistry and the Revio sequencing platform. The workflow consisted of two processing steps: library construction followed by SMRT sequencing.
Library construction was performed using 3 to 5 µg of high-molecular-weight gDNA (50% ≥30 kb and 90% ≥10 kb) in non-EDTA buffer, following the manufacturer's protocol according to Similuk et al. (1). The gDNA was sheared to generate fragments ranging from 15 to 25 kb. The fragmented gDNA went through multiple library construction steps, including DNA damage repair, end repair/A-tailing, adapter ligation, bead cleanup, and size selection, using the PacBio HiFi SMRTbell Library Prep Kit 3.0 (catalog #: 102-182-700). Library QC to ensure fragment sizes were between 15 and 25 kb was performed using an Agilent Fragment Analyzer. Library concentrations were determined using Qubit to ensure they were between 20 and 60 ng/µL prior to sequencing. The final libraries were prepared for sequencing according to instructions from PacBio SMRT Link v13.0 (a software interface for loading a SMRTbell library and enabling streamlined downstream analysis workflows), at a recommended on-plate loading concentration between 200 and 300 pM. The library was then dispensed into a Revio sequencing plate (catalog #: 102-587-400) and loaded on the Revio system along with a SMRT Cell tray (catalog #: 102-587-400) for sequencing, targeting a depth of 20x. At least 70 Gb of HiFi reads per sample were generated for downstream analysis. Sequencing data were generated with PacBio circular consensus sequencing (CCS) analysis for each of the four samples using PacBio SMRT Link 13.0.
Bioinformatics data analysis for each set of LR-WGS raw sequencing data was performed in the following steps, as shown in Figure 1. First, with raw unaligned PacBio BAM files as input, pbmm2 (version 1.13.1, https://github.com/PacificBiosciences/pbmm2) was used to align long reads against the human genome (GRCh38) to generate aligned BAM files. Next, DeepVariant version 1.5.0 (3) was run to call small variants, and the resulting genomic variants were saved to files. Finally, phased genomic variants were generated using WhatsHap version 2.0 (4). Genome coverage was calculated using mosdepth version 0.3.4 (5), as shown in Table 1 for the four samples. SNP genotypes near the variant of interest present in both sibling samples, "chr1:9726972G>A", are shown in Figure 2A using IGV (Integrative Genomics Viewer). From the phased VCF files, four phased variants around "chr1:9726972G>A" are listed in Figure 2B. From both Figure 2A and Figure 2B, variant "chr1:9726972G>A" was determined to be inherited from the father (blue/grey color indicates father/mother in Figure 2B).
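The parent-of-origin reasoning can be sketched as follows: alleles at nearby SNPs that are phased onto the same haplotype as the variant of interest are compared with the parental genotypes, and a site is informative when the linked allele is present in only one parent. This is a minimal illustration of that logic, not the procedure used to produce Figure 2; the function name, site labels, and genotypes are hypothetical and do not correspond to the four phased variants shown in Figure 2B.

```python
def parent_of_origin(haplotype_alleles, father_genotypes, mother_genotypes):
    """Infer which parent transmitted a haplotype from linked SNP alleles.

    haplotype_alleles: site -> allele phased with the variant of interest.
    father_genotypes / mother_genotypes: site -> tuple of two alleles.
    """
    father_only = 0
    mother_only = 0
    for site, allele in haplotype_alleles.items():
        in_father = allele in father_genotypes[site]
        in_mother = allele in mother_genotypes[site]
        if in_father and not in_mother:
            father_only += 1      # informative site supporting the father
        elif in_mother and not in_father:
            mother_only += 1      # informative site supporting the mother
        # alleles found in both parents (or neither) are uninformative
    if father_only and not mother_only:
        return "father"
    if mother_only and not father_only:
        return "mother"
    return "undetermined"


# Hypothetical phased alleles on the variant-carrying haplotype of a child.
haplotype_alleles = {"snp_1": "T", "snp_2": "C", "snp_3": "G"}
father_genotypes = {"snp_1": ("T", "C"), "snp_2": ("C", "C"), "snp_3": ("G", "A")}
mother_genotypes = {"snp_1": ("C", "C"), "snp_2": ("C", "T"), "snp_3": ("A", "A")}

origin = parent_of_origin(haplotype_alleles, father_genotypes, mother_genotypes)
print(origin)
```

Requiring all informative sites to agree is a deliberately conservative choice for this sketch; conflicting sites return "undetermined" rather than a majority vote.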

Figure 1. Flowchart of analysis pipeline
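The steps in this flowchart could be expressed as a sequence of command lines along the following lines. This is a sketch only: the exact options used in the study are not stated, so the presets, file names, and the helper function below are assumptions. DeepVariant is commonly run from its container image, where `run_deepvariant` is the entry script.

```python
def build_pipeline_commands(sample, reference_fasta, unaligned_bam):
    """Return the four pipeline steps as argument lists (nothing is executed)."""
    aligned_bam = f"{sample}.aligned.bam"      # hypothetical output names
    calls_vcf = f"{sample}.deepvariant.vcf.gz"
    phased_vcf = f"{sample}.phased.vcf.gz"
    return [
        # 1. Align HiFi reads to GRCh38 and sort the output (preset assumed).
        ["pbmm2", "align", reference_fasta, unaligned_bam, aligned_bam,
         "--preset", "CCS", "--sort"],
        # 2. Call small variants with the PacBio model.
        ["run_deepvariant", "--model_type=PACBIO",
         f"--ref={reference_fasta}", f"--reads={aligned_bam}",
         f"--output_vcf={calls_vcf}"],
        # 3. Read-based phasing of the called variants.
        ["whatshap", "phase", "--reference", reference_fasta,
         "-o", phased_vcf, calls_vcf, aligned_bam],
        # 4. Genome coverage; the first argument is an output prefix.
        ["mosdepth", sample, aligned_bam],
    ]


# Hypothetical sample and file names, printed rather than run.
commands = build_pipeline_commands("child1", "GRCh38.fa", "child1.hifi_reads.bam")
for argv in commands:
    print(" ".join(argv))
```

Building argument lists rather than shell strings keeps each step inspectable and lets the same list be passed to `subprocess.run` once the tools are installed.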

Table 1. Genome coverage for 4 samples