Evaluation and Analysis of Absence of Homozygosity (AOH) Using Chromosome Analysis by Medium Coverage Whole Genome Sequencing (CMA-seq) in Prenatal Diagnosis

Objective: Absence of homozygosity (AOH) is a genetic characteristic known to cause human diseases mainly through autosomal recessive or imprinting mechanisms. Accurate AOH detection has become increasingly important in clinical practice in recent years, yet it remains a challenging task for sequencing-based methods. Methods: In this study, we developed and optimized a new bioinformatic algorithm based on the assessment of minimum sequencing coverage, optimal bin size, the Z-score threshold of four types of allele count and the frequency for accurate genotyping using 28 AOH-negative samples, and redefined the AOH detection cutoff value. We evaluated the performance of chromosome analysis by five-fold coverage whole genome sequencing (CMA-seq) for AOH identification in 27 typical prenatal/postnatal AOH-positive samples, which were previously confirmed by chromosomal microarray analysis with single nucleotide polymorphism array (CMA/SNP array). Results: The blinded study indicated that for all three forms of AOH, including whole genomic AOH, single chromosomal AOH and segmental AOH, and for all sample types, including chorionic villus sampling, amniotic fluid, cord blood, peripheral blood and abortive tissue, CMA-seq showed detection power equivalent to that of routine CMA/SNP arrays (750K). The subtle difference between the two methods is that CMA-seq tends to report several small, non-contiguous AOHs where the CMA/SNP array reports a single contiguous segment. Conclusion: Based on our newly developed bioinformatic algorithm, it is feasible to detect clinically significant AOH using CMA-seq in prenatal diagnosis.


Introduction
Chromosomal microarray analysis (CMA) was initially recommended as a first-tier clinical genetic tool for the evaluation of postnatal patients with suspected developmental disabilities or congenital anomalies [1]. During the last decade, chromosomal microarray analysis with single nucleotide polymorphism arrays (CMA/SNP arrays) has enabled the rapid advancement of genome-wide detection of copy number variations (CNVs) at a medium resolution for the discovery of microdeletion and microduplication syndromes [2]. More importantly, CMA/SNP arrays can also provide clinically useful information regarding the absence of homozygosity/uniparental disomy (AOH/UPD) in copy-number neutral scenarios and polyploidy by genotyping [3][4][5]. Therefore, CMA/SNP arrays are recommended for first-tier testing in the prenatal setting for fetuses with structural anomalies [3,6,7]. Since AOH is associated with autosomal recessive diseases and imprinting disorders resulting from UPD, the detection of AOH is of significant clinical importance in both prenatal and postnatal settings.
Recently, population-level genomic studies using sequencing technologies have provided tremendous new insights regarding the clinical importance of AOH/UPD. First, the estimated prevalence of AOH/UPD based on a general population of approximately 4 million individuals revealed that its occurrence is 1 in 2000 births, which is a nearly 2× increase compared to previous estimations [8]. It is considerably higher (1 in 167) in a pediatric patient cohort with a broad spectrum of clinical manifestations subjected to exome sequencing [9]. Second, one of the plausible factors accounting for such a large discrepancy in prevalence is that the hybridization-based microarray approach has an intrinsic resolution limitation, which directly impacts the detection sensitivity of AOH. The limitation arises mainly from the nonuniform probe distribution and the resulting nonuniform genomic coverage, which have been known since the inception of this technology [10]. A recent study showed that each individual carries, on average, 2.9 rare structural variants (SVs) affecting coding regions and 19.1 rare noncoding deletions. The mutation burden of CNVs is nearly equal to that of loss-of-function single nucleotide variants (SNVs) and indel variants, which is only just being realized by the scientific community [11]. Third, as the importance of detecting variants in both coding and noncoding regions is being realized in clinical tests, a guideline for the clinical interpretation of variants found at all loci across the whole genome has recently been developed and published [12].
The successful application of sequencing technologies has been exemplified by CNV detection in clinical settings. Low-pass whole-genome sequencing (LP-WGS) or CNV sequencing (CNV-seq) with ~0.1-1× coverage depth has been reported for clinical application and is widely used for CNV detection [13][14][15][16]. For example, Dong et al. used a ~0.25× coverage LP-WGS approach in CNV analysis to identify aneuploidies, pathogenic CNVs and chromosomal mosaicism as low as 25% [7,17]. Our group also reported the detection of CNVs for prenatal diagnosis with LP-WGS [18]. However, in most cases, additional testing is required for AOH detection, using either short tandem repeat markers or methylation analyses.
The clinical importance and necessity of accurate AOH detection has become more significant, particularly for fetuses with abnormal ultrasound findings, while detection and clinical interpretation remain challenging tasks for the sequencing-based methods thus far. First, genotyping the B-allele by LP-WGS is difficult or infeasible via SNP information. Even with the tremendous advantages of LP-WGS compared with the CMA/SNP array for CNV detection, the inability to reliably detect AOH limits the sequencing-based approach for wide clinical application [19][20][21]. Second, only a few studies have investigated the feasibility of AOH detection by medium-pass genome sequencing [22,23], and the key bioinformatic algorithm has not yet been fully developed and optimized. Therefore, illustrating the detailed bioinformatic algorithm, as well as systematically evaluating its performance in clinical samples, is warranted.
In this study, we first performed sequencing at a series of different coverage depths and various genomic window sizes to determine the minimal sequencing coverage depth for reliable AOH detection. We then applied this method, called chromosome analysis by medium coverage whole genome sequencing (CMA-seq), to AOH detection in different types of prenatal/postnatal samples previously confirmed by the CMA/SNP array. We demonstrated that SNVs called within 200 kb bins from a minimum of 5× sequencing coverage yield sufficient information for accurate AOH detection, and that the CMA-seq results are equivalent to those of the CMA/SNP array using our newly developed and optimized bioinformatic algorithms.

Study Design and Sample Preparation
The study protocol was approved by the medical ethics committee of Peking Union Medical College Hospital. The analyses of anonymized samples and reporting of deidentified molecular data with minimum clinical information were approved by the Institutional Review Board of Peking Union Medical College Hospital. In our study, twenty-eight samples with a negative AOH finding, verified by the CMA/SNP array (750K), were used as control samples for the training set of algorithm development. In the following validation stage of algorithm development, a total of 27 clinical samples were enrolled as the patient cohort, including prenatal/postnatal cases (chorionic villus sampling (CVS, 5), amniotic fluid (AF, 12), cord blood (CB, 1), peripheral blood (PB, 8) and abortive tissue (AB, 1)) with a positive finding of AOH by the CMA/SNP array (750K). The PB group consisted of adult samples with positive AOH findings that were clinically collected when the testing result of the fetal sample was abnormal and parental validation was warranted. For objectivity, the detailed AOH results derived from the CMA/SNP array were blinded throughout the whole process of CMA-seq testing and disclosed only at the end of the CMA-seq experiment and analysis. Figure 1 illustrates the flow chart of development and validation of the CMA-seq algorithm for AOH detection.

Whole Genome Sequencing at Different Coverage Depths
Genomic DNA from CVS and uncultured AF, CB, PB or AB was extracted following the standard operating instructions of the QIAamp DNA Blood Mini Kit (Qiagen, Valencia, CA, USA). Genomic DNA was randomly fragmented with a Q800R ultrasonicator (Qsonica, Newtown, CT, USA). Library construction and sequencing on the AmCareSeq-2000 sequencer (AmCare Genomics Lab, Ltd., Guangzhou, China) were conducted according to the instructions, with a 200-500 bp insert size and a PE 150 bp sequencing strategy. Whole genome sequencing at different coverage depths (from 1× to 10× coverage at increments of 1×) was performed in 28 control samples. Sequencing reads were cleaned by discarding reads with a base quality less than Q20 and mapped to the reference human genome version hg19. The alignment was performed by using Burrows-Wheeler Aligner (BWA) [22].

Bioinformatics Analysis of AOH Detection
Quality parameters of the analysis were considered to be satisfactory with a data yield ≥ 20 Gb, Q30 > 85% and read counts ≥ 100 M. SNVs and indel variants were extracted by an in-house bioinformatics pipeline [18,23]. Based on the variant allele fraction (VAF), SNVs were classified into homozygous SNVs (B allele), heterozygous SNVs (AB allele) and nondiploid heterozygous SNVs (AAB allele and ABB allele) [24]. The parameter definitions can be seen in the Supplementary Materials. The coverage and SNV profiles of the normal cohort at window sizes of 10 kb, 50 kb, 100 kb, 200 kb, 300 kb, and 400 kb across the human genome were also generated. The Z-score was calculated for each 200 kb window for a single sample only. Regions containing at least 10 contiguous 200 kb windows with a Z-score either below −1 or above 1 were marked for further manual examination.
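The marking rule described above (at least 10 contiguous 200 kb windows with a Z-score below −1 or above 1) can be illustrated with a minimal Python sketch. It is not the in-house pipeline: the function names are hypothetical, and the requirement that all windows in a run lie on the same side of zero is an assumption made for this illustration.

```python
from itertools import groupby

def window_state(z, z_cutoff=1.0):
    """Return +1 if z is above the cutoff, -1 if below the negative cutoff, else 0."""
    if z > z_cutoff:
        return 1
    if z < -z_cutoff:
        return -1
    return 0

def flag_candidate_runs(z_scores, z_cutoff=1.0, min_windows=10, window_bp=200_000):
    """List (start_bp, end_bp, direction) for runs of at least min_windows
    consecutive windows whose Z-scores exceed the cutoff in the same direction
    (same-direction grouping is an assumption of this sketch)."""
    runs = []
    index = 0
    for state, group in groupby(z_scores, key=lambda z: window_state(z, z_cutoff)):
        length = len(list(group))
        if state != 0 and length >= min_windows:
            runs.append((index * window_bp, (index + length) * window_bp, state))
        index += length
    return runs

# Toy Z-score track: 12 elevated windows are flagged, 9 depressed windows are not.
toy_track = [0.2] * 5 + [1.8] * 12 + [0.1] * 3 + [-1.4] * 9
candidate_runs = flag_candidate_runs(toy_track)
```

Regions returned by such a function would still go to manual examination, as stated above.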

Validation of AOH Detection by CMA/SNP Array
According to the SNP array quality control system, DNA quality control (>250 ng, electrophoretic bands > 2000 bp), amplified and purified product quality control (the concentration of amplified purified products was more than 300 ng/µL when diluted 10 times), and fragmented product quality control (electrophoretic bands were 25-125 bp) were all achieved. Then, the labeled DNA was hybridized to the Affymetrix® CytoScan™ 750K Array (Affymetrix, Santa Clara, CA, USA). After washing with the Affymetrix GeneChip® Fluidics Station 450, the arrays were scanned with the GeneChip® System (GCS) 3000Dx. Finally, we used the Chromosome Analysis Suite (ChAS) 14.2 software to analyze the CEL files obtained from scanning the arrays. The human reference genome was the GRCh37 (hg19) genome. The pathogenicity interpretation of CNVs and AOHs was performed according to the American College of Medical Genetics (ACMG) guidelines [25].

Assessment of Minimum Sequencing Coverage for AOH Detection
The sequencing depth of WGS was previously shown to be positively correlated with SNV calling accuracy. Based on WGS data, it was estimated that SNP calls from at least 13.7× coverage depth can be >99% concordant with the genotypes obtained from the CMA/SNP array, and a 15× coverage depth was recommended for accurate single SNV genotyping [26]. However, such a high coverage depth is unlikely to be affordable for routine implementation.
It has previously been demonstrated that the number of called variants at 20× coverage reaches saturation at approximately 4.2 million, whereas the number of called variants at 5× reaches only 3 million, which means that approximately 30% of variants are missed [26]. Thus, we first performed WGS at a cascade of coverage depths of 1×, 2×, 3×, 4×, 5×, 6×, 7×, 8×, 9× and 10×. Figure 2A shows the number of SNVs detected at each coverage depth. It further validated that the number of called variants is positively correlated with the sequencing depth and reaches a plateau at a coverage depth of approximately 7-9×. When only reads below 5× were used for alignment, the single SNV calling accuracy declined drastically. Such low coverage, coupled with random sampling, alignment errors, sequencing errors, etc., produces more noise than signal, making it unreliable to extract single SNV genotyping information. In sum, LP-WGS might not be suitable for AOH detection, considering the low quality of individual SNV calls and the total number of SNVs within a genomic block. A previous study made a similar recommendation that at least 5× coverage appeared to be necessary for the accurate assessment of genomic variations, as for the study of linkage disequilibrium [27].
To enhance the signal and obtain a statistically meaningful total number of SNVs within a genomic block from 5× medium coverage data, we further extracted the numbers of B allele, AB allele, AAB allele and ABB allele SNVs at bin sizes from 10 kb to 500 kb. As shown in Figure 2B and Supplementary Figure S1, the total number of variants per bin follows a Gaussian distribution at every bin size except 10 kb and 50 kb. The number of each allele at the 100 kb bin size is relatively small, and its standard deviation is large, which makes the downstream calculation of the Z-score challenging. Considering that the typical AOH size in the Chinese population is approximately 2 Mb [28,29], and if we require at least 10 contiguous bins for reliable AOH detection, a 200 kb bin size is appropriate (10 × 200 kb = 2 Mb). Figure 2B-F show the counts and distributions of these four allele types at a 200 kb bin size for our normal cohort.
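The binning step can be sketched as follows. This is a hedged illustration only: the VAF cutoffs in `classify_vaf` are hypothetical placeholders (the actual parameter definitions are given in the Supplementary Materials and are not reproduced here), and the function names and input format (position, VAF pairs for one chromosome) are assumptions.

```python
from collections import Counter, defaultdict

BIN_BP = 200_000          # bin size chosen in the text
MIN_BINS = 10             # minimum number of contiguous bins for a call
MIN_AOH_BP = BIN_BP * MIN_BINS   # 2 Mb, the smallest segment this scheme can resolve

def classify_vaf(vaf):
    """Assign an allele class from the variant allele fraction.
    The cutoffs below are HYPOTHETICAL and for illustration only."""
    if vaf >= 0.9:
        return "B"      # homozygous
    if vaf > 0.6:
        return "ABB"    # nondiploid heterozygous, variant allele in excess
    if vaf >= 0.4:
        return "AB"     # heterozygous
    if vaf > 0.1:
        return "AAB"    # nondiploid heterozygous, reference allele in excess
    return None         # too few variant reads to classify

def count_by_bin(snvs, bin_bp=BIN_BP):
    """Count SNVs of each allele class per fixed-size bin on one chromosome."""
    counts = defaultdict(Counter)
    for position, vaf in snvs:
        label = classify_vaf(vaf)
        if label is not None:
            counts[position // bin_bp][label] += 1
    return counts

# Toy input: (position in bp, VAF) pairs.
toy_snvs = [(10_500, 1.0), (150_000, 0.5), (199_999, 0.5),
            (200_000, 0.75), (410_000, 0.25), (420_000, 0.05)]
toy_counts = count_by_bin(toy_snvs)
```

Aggregating counts per bin, rather than trusting each genotype, is what lets noisy 5× calls contribute a usable signal.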

Performance of Detection of AOH for Prenatal Samples with CMA-seq
A total of 27 prenatal/postnatal samples (CVS, AF, CB, PB, AB) were subjected to CMA-seq and analyzed by our newly developed analytical parameters as outlined above. These 27 positive samples cover a wide spectrum of AOH types encountered during our clinical practice in recent years. Types of AOHs included chromosomal level AOHs (14.8%, 4/27), whole genome wide AOHs (3.7%, 1/27), and segmental AOHs (81.5%, 22/27). The positive findings by CMA-seq were 100% concordant with those of the CMA/SNP arrays (Table 1). This result indicated that a variety of prenatal samples were suitable for sequencing-based approaches. These data also suggested that AOHs occur ubiquitously in the human genome.

Evaluation and Analysis of Normal Sample and Detection of Chromosomal Level AOH
CNV and AOH analyses of 28 normal control samples verified by the CMA/SNP array were carried out by sequencing as a training set. Usually, CNV analysis by CMA/arrays requires data normalization or bias correction by mixing two sets of differently labeled genomic DNA for hybridization and signal detection [30]. Meanwhile, CNV analysis by low-pass CNV-seq uses a computationally synthesized reference built from approximately 10-20 batched test samples [13,20,30]. One distinct feature of CNV analysis by CMA-seq is that the data set of one single sample is used for both interchromosomal and intrachromosomal normalization, without an experimentally or computationally generated reference. This is important for the downstream bioinformatics analysis of AOH detection, which focuses on the sample itself. Figure 3A shows the CNV coverage for chromosome 3 of sample 124446BX. The SNV counts for each allele type are indicated as B_variant_count, AB_variant_count, AAB_variant_count, and ABB_variant_count in Figure 3B. The AB allele variant count was greater than the B allele variant count, which is consistent with a normal individual carrying more heterozygous than homozygous variants. The Z-score for each allele type, computed relative to the mean of the corresponding allele count across the normal control cohort at a 200 kb bin size, is shown in Figure 3C,D. The Z-scores, colored green (B_allele), red (AB_allele), orange (ABB_allele) and blue (AAB_allele), fluctuated around zero within the range of −1 to +1 standard deviation. Occasional spikes are probably due to the particular genomic context. When we applied the criterion that at least 10 contiguous 200 kb windows with a Z-score either below −1 or above 1 constitute a candidate AOH segment, this sample was considered normal.
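As a rough illustration of how a per-window Z-score against the normal control cohort might be computed, consider the sketch below. It assumes that the mean and standard deviation are taken per window across the control samples; the text does not specify this detail, and the function name is hypothetical.

```python
from statistics import mean, stdev

def window_z_scores(sample_counts, control_counts):
    """Z-score of one sample's allele count in each window, relative to the
    mean and sample standard deviation of the same window across controls.
    control_counts is a list of per-sample lists with matching window order.
    (Per-window reference statistics are an assumption of this sketch.)"""
    z_scores = []
    for i, observed in enumerate(sample_counts):
        reference = [control[i] for control in control_counts]
        mu = mean(reference)
        sd = stdev(reference)
        # Guard against windows where all controls share the same count.
        z_scores.append((observed - mu) / sd if sd > 0 else 0.0)
    return z_scores

# Toy data: three controls, two windows of B_variant_count.
toy_controls = [[10, 20], [12, 22], [14, 24]]
toy_sample = [12, 30]
toy_z = window_z_scores(toy_sample, toy_controls)
```

The same function would be applied separately to the B, AB, AAB and ABB counts, giving one Z-score track per allele type.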
Case #032 was from a 31-year-old pregnant woman who underwent amniocentesis for a high risk of trisomy 3 detected by noninvasive prenatal testing. Chromosome 3 encompasses at least 63 genes associated with autosomal recessive diseases, but no imprinted genes. Dandy-Walker malformation, a rare congenital malformation, was found by ultrasound at 28 weeks of gestation. The parents opted to terminate the pregnancy and declined further genetic evaluation to detect potential autosomal recessive diseases. The CNV of chromosome 3 in this case was normal, as shown in Figure 4A. The B_variant_count (colored green) increased dramatically, while the AB_variant_count decreased; the changes in ABB_variant_count and AAB_variant_count are not evident in Figure 4B. The Z-scores for B_variant_count and AB_variant_count show uniform values either above 1.5 or below −1.5 in Figure 4C. The divergence of the Z-scores for the B and AB alleles is consistent with the occurrence of AOH, which corresponds to the BAF section of the CMA/SNP array result as indicated in Figure 4E. Genotyping at medium coverage suffers from insufficient depth, sequencing errors, variant calling inaccuracy and statistical biases; the resulting error is significant for each called SNV and raises the noise level, interfering with single SNP-based AOH detection. However, the scheme proposed in this study, which uses the total number of SNVs within a genomic block, is less affected by these genotyping errors, because an AOH event at a particular genomic locus is a systematic event, and genotyping errors cannot influence the true occurrence of AAB and ABB alleles. As shown in Figure 4D, even with a whole chromosome 3 AOH event, SNVs of AAB and ABB alleles are still present at the basal level, but the Z-score distributions for the ABB_variant_count and AAB_variant_count show a small but significant shift below the zero line.

Evaluation and Analysis of Whole Genome Level AOH
Case #059 was from a 31-year-old pregnant woman with a twin pregnancy who underwent chorionic villus sampling of the hydatidiform mole and amniocentesis of the coexisting live fetus. The CMA/SNP array testing result of the live fetus was normal, while a whole genome AOH of the hydatidiform mole was identified by the CMA/SNP array. The CNV of all the chromosomes of this hydatidiform mole was normal, as shown in Figure 5A,E of the Weighted log2 Ratio section. Figure 5B,C show that the number of B_variant_count increased and AB_variant_count decreased. The Z-score distribution and the divergence for B_variant_count and AB_variant_count meet the criteria we proposed in this study. Noticeably, as shown in Figure 5D, the Z-score distribution of AAB_variant_count remains relatively normal around the zero line, but that of ABB_variant_count shows large fluctuations below the zero line. The exact mechanism remains to be further elucidated. Paternal UPD was highly suspected in this case, but the parents declined further validation. Spontaneous abortion occurred at 23 weeks of gestation. Autopsy confirmed that the complete mole coexisted with a normal fetus.

Evaluation and Analysis of Segmental AOH
Case #109 was from a 39-year-old pregnant woman who underwent amniocentesis due to advanced maternal age. The CNV of chromosome 7 in this case was normal, as shown in Figure 6A,E of the Weighted log2 Ratio section. Figure 6B,C show that the B_variant_count increased and the AB_variant_count decreased, with the corresponding Z-score distribution and divergence for B_variant_count and AB_variant_count, at the genomic locus 7q32.3q36.1 (132,300,000-151,500,000, 18.1 Mb). The other regions of chromosome 7 do not meet the criteria for AOH consideration. This region encompasses at least 15 genes associated with autosomal recessive diseases, including AGK, NUP205, TBXAS1, etc., but no imprinted gene. CMA testing using parental peripheral blood samples ruled out parental consanguinity and validated that fetal AOH was not UPD. The baby was delivered at 34 weeks of gestation via cesarean section due to placenta previa, and the development was normal after 1 month of follow-up. The CMA/SNP array confirmed an AOH event at q32.3q36.1 of chromosome 7 with an estimated size of 19 Mb (132,165,146-151,810,715) (Figure 6E).
For segmental AOHs, a single AOH (>10 Mb) was detected in eight samples, and multiple AOHs (2.1-109.1 Mb) were detected in 14 samples (Table 1). AOHs on chromosomes 6, 7, 14, 15 and 20, which are associated with imprinted genes, were detected in 11 samples, five of which involved the imprinted genes of chromosome 7. Moreover, an additional 2.15 Mb-16.9 Mb AOH was detected in 8 samples by our sequencing-based method. One common observation was that the sequencing-based approach detected multiple short segmental AOHs more frequently than the CMA/SNP array, occasionally resulting in a discrepancy in the number of AOH segments within one genomic region. For example, in case #002, the CMA/SNP array detected one AOH segment at 2p21p11.2 spanning 45,974,855-87,053,152 with a size of 41.1 Mb. The same sample was identified by CMA-seq to have two AOH segments, 2p21p13.2 (45,700,000-72,400,000, 26.7 Mb) and 2p13.1p11.2 (75,000,000-89,000,000, 14 Mb), with heterozygous SNVs detected in between and verified by high read depth sequencing.
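The case #002 comparison can be made concrete with a short interval calculation. The sketch below is illustrative only: the coordinates are taken so that they match the reported segment sizes (41.1 Mb, 26.7 Mb and 14 Mb), and the helper name `overlap_bp` is hypothetical.

```python
def overlap_bp(a, b):
    """Number of base pairs shared by two (start, end) intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))

# Case #002, chromosome 2 (coordinates chosen to match the reported sizes).
array_segment = (45_974_855, 87_053_152)          # one segment by CMA/SNP array
seq_segments = [(45_700_000, 72_400_000),         # 2p21p13.2 by CMA-seq
                (75_000_000, 89_000_000)]         # 2p13.1p11.2 by CMA-seq

# Portion of the array segment covered by the two CMA-seq segments.
covered_bp = sum(overlap_bp(segment, array_segment) for segment in seq_segments)
fraction_covered = covered_bp / (array_segment[1] - array_segment[0])

# Heterozygous gap separating the two CMA-seq segments.
gap_bp = seq_segments[1][0] - seq_segments[0][1]
```

Under these coordinates the two CMA-seq segments cover most of the array segment, and the uncovered part corresponds to the gap in which heterozygous SNVs were observed.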

Discussion
Our study demonstrated that the identification of clinically significant AOHs by CMA-seq is in concordance with the high-density CMA/SNP array in prenatal diagnosis.
Currently, CMA is the first-tier recommendation for fetuses with ultrasound abnormalities in prenatal diagnosis, with multiple advantages over conventional karyotyping and fluorescence in situ hybridization (FISH). Next-generation sequencing (NGS) technologies have revolutionized DNA sequencing, enabling entire genomes to be sequenced more cost-efficiently [18,31,32]. In fact, limitations of CMA/SNP arrays have also gradually begun to be realized since array design preferentially covers clinically critical regions over other genomic regions, as mentioned before [33]. This uneven probe design may result in failure to detect some pathogenic CNVs [20]. In recent years, low-pass CNV-seq based on NGS has emerged as a high-resolution and cost-effective technology for genome-wide CNV detection. Several studies have shown that the resolutions of CNV-seq in detecting chromosomal microdeletions/microduplications and mosaic CNVs are very similar to those of CMA [7,18,20,31,33,34]. Compared with CMA, CNV-seq has the advantages of reduced DNA amount and quality control requirements and has been widely used in prenatal diagnosis for CNV detection [24,35,36]. However, most current low-pass WGS only focuses on CNV detection and must be combined with other methods to detect AOH in prenatal diagnosis.
AOH is a genetic characteristic known to cause human genetic diseases mainly through autosomal recessive or imprinting mechanisms. Therefore, AOH identification is strongly recommended during prenatal or postnatal genetic testing. However, most current low-pass WGS studies have only focused on CNV detection [24,36].
First, we addressed the minimal sequencing coverage threshold and genomic bin size at different read coverage depths for reliable AOH detection. For genome sequencing, the sequencing depth is variable, and increased depth could achieve higher analysis accuracy for AOH detection. In 2020, Chaubey et al. increased the detection depth in 409 clinical cases to detect AOHs and CNVs simultaneously and found that some patients with pathogenic CNVs were missed by the CMA/SNP array [21]. In 2021, Dong et al. demonstrated the feasibility of a 4× coverage for AOH analysis (≥5 Mb) and showed high consistency compared to the CMA/SNP array in 17 prenatal/postnatal cases [24].
Second, we have demonstrated the basal parameters at the chosen bin size and showed that the number of these SNV alleles remains relatively stable. Moreover, we also proposed the criteria for considering candidate AOH loci. The variant count for each allele type gives the absolute number of variants within a specified genomic window. The Z-score for each allele type provides a numerical and sensitive measurement of the convergence and divergence of the variant distribution against a very high genotyping error background. The minimal AOH that can currently be reliably evaluated spans at least 10 contiguous 200 kb windows with a Z-score either below −1 or above 1. Detected AOHs are then marked for further manual examination.
Third, this study was designed to perform a medium coverage WGS method to detect AOHs in prenatal samples. By using data from 28 normal samples and 27 positive samples, the AOH by CMA-seq exhibited 100% concordance with those of CMA/SNP array analysis. In addition, CMA-seq had some additional AOH findings due to increased SNVs coverage.
Fourth, according to the ACMG recommendation, the size cutoff value of the AOH reporting region for the identification of chromosome or fragment UPD is 5 Mb [17]. In this study, the data demonstrated that the minimal AOH size reporting region for the identification of fragment UPD is 2 Mb, which can meet the ACMG reporting standard.
A limitation of this study, which is based on short read sequencing, is the inadequate capability to detect inversions and balanced translocations, an issue shared by the CMA/SNP array approach. Thus, further algorithm optimizations and improvements are warranted with our current data. One caveat of this analysis is that the sensitivity and accuracy of the Z-score depend on the estimated basal size of the linkage disequilibrium block for the particular ethnic population to which it is applied, so recalibration would be required if it were used elsewhere [28,29].
In conclusion, our study demonstrated that the identification of clinically significant AOHs by CMA-seq is concordant with CMA/SNP arrays in the characterization of direct samples from prenatal genetic screenings. It is feasible to analyze AOH by CMA-seq. This differs from previous very low coverage CNV-seq, which can only yield CNV information. Simultaneous analysis of AOHs and CNVs by CMA-seq could improve the diagnostic yield and efficiency in prenatal diagnosis. This combined capability offers an example of genomics technologies that can deliver on the promise of balancing clinical testing accuracy and low economic burden.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/diagnostics13030560/s1, Figure S1: The number of variants dependence on the bin size selection for 24 normal samples.