Application of third-generation sequencing for genetic testing of thalassemia in Guizhou Province, Southwest China

ABSTRACT
 Objectives To explore the application of third-generation sequencing (TGS) for genetic diagnosis and prenatal genetic screening of thalassemia. Methods Two groups of subjects were enrolled in this study. The first group included 176 subjects with positive hematological phenotypes for thalassemia. Thalassemia-associated genes were tested in each sample using both the PacBio TGS platform, based on single-molecule real-time (SMRT) technology, and conventional PCR-reverse dot blot (PCR-RDB). Sanger sequencing was used for validation when results were discordant between the two methods. The second group included 53 couples in which at least one partner had a positive thalassemia hematological phenotype. These couples were screened for homotypic thalassemia variants by TGS, and the risk of a pregnancy affected by severe thalassemia was assessed. Results Of the 176 subjects, 175 had concordant genotypes between the two methods, including 63 normal subjects and 112 α- and/or β-thalassemia gene carriers, with a concordance rate of 99.43%. TGS detected a rare β-thalassemia gene variant, −50 (G > A) (HBB:c.−100G > A), that was not detected by conventional PCR-RDB. TGS identified seven of the 53 couples as homotypic thalassemia gene carriers, five of whom were at risk of pregnancies with severe thalassemia. Conclusion TGS could effectively detect common and rare thalassemia variants with high accuracy and efficiency. This approach would be suitable for prenatal thalassemia genetic screening in areas with a high incidence of thalassemia.


Introduction
Thalassemia is a heterogeneous group of hereditary hemolytic disorders caused by the absence or insufficient synthesis of α-globin or β-globin and is one of the most common monogenic genetic diseases in the world [1]. It is estimated that approximately 1-5% of the world's population are carriers of mutations within the thalassemia genes [2]. It is mainly distributed over the Mediterranean coast, North Africa, the Middle East, the Indian mainland, Southeast Asia and Southern China [1,3]. Guizhou is a multi-ethnic province located in the southwest of China and is an area where the incidence rate for thalassemia is high.
In most clinical laboratories in China, common variants in the population are detected by first-line techniques after hematological phenotype testing. If the result is negative, second-line techniques are used to detect rare or unknown mutations [4,5]. For thalassemia gene detection, the first-line techniques include Gap-Polymerase Chain Reaction (Gap-PCR), real-time PCR, reverse dot blot (RDB) and probe melting curve analysis (PMCA), while second-line techniques include multiplex ligation-dependent probe amplification (MLPA), array-CGH, and Sanger sequencing [4,5]. The main drawbacks of these methods are that they are cumbersome and labor-intensive, carry a risk of missed diagnoses, have limited resolution, and can detect only a limited range of mutations [6]. In recent years, next-generation sequencing (NGS) technology has also been used for the molecular diagnosis of thalassemia [7,8]. However, the disadvantages of NGS include short read length, the introduction of PCR amplification errors, GC bias [9] and high costs [8,10]. In addition, NGS cannot effectively detect repetitive regions and structural variants within the human genome [11]. The third-generation sequencing (TGS) technology developed by Pacific Biosciences (PacBio) addresses these limitations and offers high sequencing accuracy, particularly with HiFi reads generated in Circular Consensus Sequencing (CCS) mode, for which an accuracy of up to 99.999% has been reported [11]. This sequencing platform uses a simplified sample preparation procedure and reduces sequencing time and costs [12]. Taken together, TGS provides a solid technology for large-scale clinical applications. Previous studies have confirmed that SMRT technology for genetic diagnosis of thalassemia is advantageous in terms of wide detection range, high efficiency and high accuracy [1,13].
However, no study has yet examined the use of TGS for thalassemia gene detection in the population of Guizhou province. In this study, 176 individuals from Guizhou province with suspected thalassemia were tested using TGS and conventional PCR-RDB in parallel, and Sanger sequencing was used to resolve discordant results and thereby evaluate the diagnostic performance of TGS. Subsequently, 53 couples in which at least one partner had a positive thalassemia phenotype were tested using SMRT sequencing to identify couples at risk of a high-risk pregnancy, who were then advised to undergo prenatal diagnosis.

Subjects
The first group (n = 176) comprised 67 males and 109 females aged between 3 months and 80 years who presented at Guizhou Provincial People's Hospital from January 2020 to December 2021 and were assessed as having a positive thalassemia phenotype. The second group consisted of 53 couples who presented at the same hospital between January 2022 and June 2022 for thalassemia genetic screening in preparation for pregnancy or during the first trimester, with at least one member of each couple having a positive thalassemia phenotype. A positive hematological phenotype of thalassemia met at least one of the following inclusion criteria: (1) routine hematology examination showed a mean corpuscular volume (MCV) ≤ 82 fL and/or a mean corpuscular hemoglobin (MCH) ≤ 27 pg; (2) hemoglobin electrophoresis showed HbA2 ≥ 3.5%, elevated hemoglobin F (Hb F) or an abnormal hemoglobin. This study was approved by the Ethics Committee of Guizhou Provincial People's Hospital (approval number 2022-05), and all study subjects or their legal guardians signed an informed consent form.
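The inclusion rule above can be written as a small predicate. The following is a minimal sketch, not the study's actual software: the class and field names are hypothetical, and because the text gives no numeric cut-off for Hb F or for an abnormal hemoglobin, those two findings are modeled as yes/no flags.

```python
from dataclasses import dataclass


@dataclass
class Hematology:
    """Hypothetical container for one subject's screening values."""
    mcv_fl: float                 # mean corpuscular volume, fL
    mch_pg: float                 # mean corpuscular hemoglobin, pg
    hba2_percent: float           # HbA2 fraction, %
    hbf_elevated: bool = False    # assumption: no numeric Hb F cut-off is stated
    abnormal_hb: bool = False     # abnormal hemoglobin seen on electrophoresis


def is_phenotype_positive(h: Hematology) -> bool:
    """Return True if at least one inclusion criterion is met."""
    # Criterion (1): MCV <= 82 fL and/or MCH <= 27 pg
    routine = h.mcv_fl <= 82 or h.mch_pg <= 27
    # Criterion (2): HbA2 >= 3.5 %, elevated Hb F, or abnormal hemoglobin
    electrophoresis = h.hba2_percent >= 3.5 or h.hbf_elevated or h.abnormal_hb
    return routine or electrophoresis
```

A subject with microcytosis alone is therefore classified as phenotype-positive even when the HbA2 fraction is unremarkable.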

Hematologic screening
All samples were screened for the thalassemia phenotype using routine hematological methods [13]. An automatic blood cell analyzer (Sysmex XN-100-4, Japan) was used to analyze red blood cell parameters and an automatic electrophoresis analytic system (Hydrasys LC; Sebia Electrophoresis, Evry, France) was used for hemoglobin analysis.

Extraction of genomic DNA
The magnetic bead-beating method [1] with the NP968 Nucleic Acid Extraction System (Xi'an Tianlong Science and Technology, Xi'an, China) was used to extract genomic DNA from the blood samples. The concentration and purity of the extracted DNA were assessed with a NanoDrop spectrophotometer (Thermo Scientific, USA). The OD260/OD280 absorbance ratio of the extracted DNA was between 1.5 and 2.5, and the concentration was 20-40 ng/μl. The extracted DNA was stored at −20°C.
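The reported purity and concentration ranges can be checked with a few lines of code. This is a minimal sketch with a hypothetical function name. The text reports these ranges as observed values, so treating them as an acceptance window is an assumption made here for illustration.

```python
def dna_within_reported_ranges(od260: float, od280: float,
                               conc_ng_per_ul: float) -> bool:
    """Check a DNA extract against the ranges reported in the text.

    Assumption: the reported ranges (OD260/OD280 of 1.5-2.5 and
    20-40 ng/ul) are used as an acceptance window.
    """
    if od280 <= 0:
        raise ValueError("OD280 must be positive to compute the ratio")
    ratio = od260 / od280
    return 1.5 <= ratio <= 2.5 and 20 <= conc_ng_per_ul <= 40
```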

TGS for detection of α- and β-thalassemia variants
The extracted genomic DNA samples were tested for variants within the α- and β-thalassemia genes using single-molecule real-time (SMRT) technology. Briefly, the target regions were amplified by long-range multiplex PCR, and the PCR fragments were then end-repaired and ligated to PacBio barcoded adapters to form individual dumbbell-shaped pre-libraries. Equal masses of the individual pre-libraries were pooled and mixed with the sequencing primer and DNA polymerase to form the PacBio sequencing library. Sequencing was performed on the PacBio Sequel II platform (PacBio, Menlo Park, CA, USA) in circular consensus sequencing (CCS) mode with a run time of 30 hours. The average polymerase read length was 70-80 kb. The SMRT Link system provided by PacBio was used to convert raw subreads into high-fidelity CCS reads, which were assigned to individual samples according to their barcodes and then aligned to the genome build hg38. FreeBayes 1.3.4 (Biomatters, San Diego, CA) was used to call single-nucleotide variations (SNVs) and indels. The SNVs, indels, large deletions and structural variations were annotated according to the HbVar database of hemoglobin variants (https://globin.bx.psu.edu/), the Ithanet public resource on Hb disorders (https://www.ithanet.eu/) and the Leiden Open Variation Database (LOVD) (https://www.lovd.nl/).
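The equal-mass pooling step amounts to dividing a fixed target mass by each pre-library's concentration. The sketch below illustrates that arithmetic only. The function name, the barcode labels and the target mass are hypothetical and do not come from the protocol.

```python
from typing import Dict


def pooling_volumes(conc_ng_per_ul: Dict[str, float],
                    mass_ng: float) -> Dict[str, float]:
    """Volume (ul) of each pre-library that contributes `mass_ng` to the pool.

    volume = target mass / concentration, so a more concentrated
    pre-library contributes a proportionally smaller volume.
    """
    volumes = {}
    for sample, conc in conc_ng_per_ul.items():
        if conc <= 0:
            raise ValueError(f"non-positive concentration for {sample}")
        volumes[sample] = mass_ng / conc
    return volumes


# Hypothetical example: two barcoded pre-libraries, 50 ng each into the pool.
example = pooling_volumes({"bc01": 10.0, "bc02": 20.0}, mass_ng=50.0)
```

In this example the pre-library at 20 ng/μl contributes half the volume of the one at 10 ng/μl, so both add the same mass of DNA.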

Concordance between conventional RDB and TGS for thalassemia genetic testing
Of the 176 samples, 175 had consistent genotypes between the two assays, comprising 63 normal individuals and 112 α- and/or β-thalassemia gene carriers, giving a concordance rate of 99.43% (175/176) (Table 1). A total of 15 types of variants (122 alleles) were detected in the 176 samples by TGS, including six types of α-thalassemia variant (58 alleles) and nine types of β-thalassemia variant (64 alleles) (Table 2). An additional rare β-thalassemia variant, c.−100G > A, was detected in one allele by TGS (Table 2). The detection rate for thalassemia variants in subjects with positive hematological phenotypes was 64.20% (113/176). The frequencies of the 14 common variants were consistent with those reported in the literature [16,17].
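The rates and totals in this paragraph follow directly from the reported counts. A short check, using only numbers stated in the text (the helper name is arbitrary):

```python
def percent(numerator: int, denominator: int) -> float:
    """Percentage rounded to two decimal places."""
    return round(100 * numerator / denominator, 2)


concordant = 63 + 112                       # normal subjects + concordant carriers
concordance_rate = percent(concordant, 176)  # 175 of 176 samples
detection_rate = percent(112 + 1, 176)       # carriers plus the one TGS-only case
total_alleles = 58 + 64                      # alpha alleles + beta alleles
variant_types = 6 + 9                        # alpha types + beta types

print(f"concordance {concordance_rate:.2f}%, detection {detection_rate:.2f}%")
print(f"{variant_types} variant types, {total_alleles} alleles")
```

The 113 variant-positive subjects are the 112 concordant carriers plus the single subject whose rare variant was found by TGS alone.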

Clinical phenotype of the patient with a rare β-thalassemia variant
The rare variant c.−100G > A in the HBB gene detected by TGS was validated by Sanger sequencing (Figure 1). c.−100G > A was identified as a likely rare β+ thalassemia variant [18-20]. The patient with the heterozygous c.−100G > A variant was an 18-year-old male with no clinical symptoms who also carried a heterozygous --SEA deletion. Red blood cell indices showed microcytosis with a normal hemoglobin concentration (Hb 136 g/L, MCV 68.8 fL, MCH 21.1 pg, MCHC 307 g/L).

Genotyping of 53 couples screened for thalassemia
TGS was performed for both partners of a couple if either of them had positive hematological parameters. Of the 53 couples screened, TGS detected 13 couples without α- or β-thalassemia gene variants, 13 couples with only one partner carrying α-thalassemia variants, 12 couples with only one partner carrying β-thalassemia variants, 8 couples in which the partners carried different types of thalassemia, and 7 couples in which the partners carried the same type of thalassemia. Of these 7 couples, five were homotypic α-thalassemia carriers and the other two were homotypic β-thalassemia carriers (Table 3). In two of the 7 couples, both partners were α-thalassemia silent carriers, and these couples did not have a high pregnancy risk. The other five couples were at risk of having children with moderate or severe forms of thalassemia and required prenatal diagnosis. These five couples accounted for 9.43% (5/53) of couples with at least one partner who was positive for the thalassemia hematological phenotype.
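The couple categories can be tallied, and the homotypic check expressed, in a few lines. This is a minimal sketch with hypothetical names. It only tests whether both partners carry a variant of the same globin type, and it does not attempt the genotype-specific risk assessment that separated the two silent-carrier couples from the five at-risk couples.

```python
from typing import Set


def is_homotypic(partner_a: Set[str], partner_b: Set[str]) -> bool:
    """True if both partners carry a variant of the same thalassemia type.

    Each partner is described by the set of types carried,
    e.g. {"alpha"}, {"beta"}, {"alpha", "beta"} or set() for a non-carrier.
    """
    return bool(partner_a & partner_b)


# Counts as reported in the text.
couple_counts = {
    "no variants": 13,
    "one partner, alpha only": 13,
    "one partner, beta only": 12,
    "different types": 8,
    "same type (homotypic)": 7,
}
total_couples = sum(couple_counts.values())
at_risk_percent = round(100 * 5 / total_couples, 2)  # five at-risk couples
```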
Four variants of unknown significance (VUSs) in the HBB gene were detected by TGS in five individuals from the two groups, including two heterozygous carriers of the c.

Discussion
In the past, conventional genetic testing methods for thalassemia were mainly aimed at detecting common variants in specific populations and could not detect rare variants. Several provinces in southern China have launched a 'zero birth' program for fetuses with severe thalassemia, which requires clinical laboratories to conduct tests that can detect rare thalassemia variants. TGS has only come into clinical use for thalassemia genetic testing within the last two years. The TGS platform has shown advantages over conventional detection methods and NGS in terms of detection rates. In this study, TGS was used for thalassemia genetic testing and prenatal genetic screening in Guizhou Province, China, to evaluate its potential value for clinical application.
The results of this study show that the detection rate for thalassemia gene mutations in individuals with a positive thalassemia hematological phenotype was 64.20% (113/176). Among the 53 couples with at least one positive hematological phenotype, TGS identified seven couples carrying homotypic thalassemia variants, of whom five (9.43%, 5/53) were at risk and required prenatal diagnostic tests. These results suggest that Guizhou Province could be an area of high thalassemia incidence with a high positive detection rate for thalassemia gene variants in the population. Nearly one-tenth of the couples with at least one positive hematological phenotype were at risk of giving birth to a child with thalassemia major and therefore required prenatal diagnosis. Taken together, it is necessary to strengthen thalassemia screening of the reproductive-age population in Guizhou province and to adopt thalassemia gene detection methods suited to the local population in order to effectively prevent the birth of children with severe thalassemia.
Among the 176 individuals with positive hematological phenotypes of thalassemia, the concordance rate between conventional PCR-RDB and TGS for the detection of thalassemia genes was 99.43% (175/176), which was higher than that reported elsewhere. Luo et al [1] reported that the positive detection rate of TGS was 9.91% higher than that of conventional techniques in 434 enrolled cases. Zhuang et al [21] reported a 7.14% incremental yield in rare α- and β-globin gene variants by TGS compared with conventional detection methods. The difference in concordance rates may be due to our small sample size, but another reason is that we only took definite causative variants into consideration, while other studies included VUSs in the statistical analysis. For example, Luo et al [1] included likely benign variants such as c.316-45G > C and c.315 + 308delA in the statistical analysis. As far as pathogenic variants are concerned, the detection rates of the two methods are highly concordant. Therefore, conventional PCR-RDB is still an inexpensive and effective genetic test for thalassemia in economically underdeveloped areas.
Hundreds of α- and β-globin gene variants contribute to thalassemia, and these defect types and their combinations are complex. The TGS platform is able to cover the full length of both the α- and β-globin genes owing to its capacity to produce ultralong reads [11,22]. Previous studies have shown that the advantage of TGS for thalassemia genetic screening lies not only in detecting rare SNVs and indels, but also in detecting structural rearrangements of the α-globin gene cluster, including the α-globin gene triplication and the Hong Kong αα (HKαα) allele [23]. In addition, TGS supports haplotype analysis, which facilitates further understanding of the range of sequence diversity, and can determine whether compound heterozygous mutations are in cis or trans configuration [21,24]. Despite our small sample size, TGS detected an additional rare β-thalassemia gene mutation, c.−100G > A, indicating that TGS has an advantage in the detection of rare variants. We suggest that TGS can be used as a second-line technique for samples with positive thalassemia hematological phenotypes but negative results using conventional methods. TGS can also be used directly as a first-line detection procedure for thalassemia genes in laboratories able to run routine TGS analysis, to improve the detection rate of rare mutations.
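The cis/trans determination relies on single reads that span both variant sites: if the two variants appear together on the same reads they are in cis, and if they appear on different reads they are in trans. The following toy sketch illustrates that idea only. It is not the phasing method of any particular software, and the function name, input encoding and `min_reads` threshold are assumptions.

```python
from collections import Counter
from typing import Iterable, Tuple


def phase_two_variants(reads: Iterable[Tuple[bool, bool]],
                       min_reads: int = 3) -> str:
    """Classify two heterozygous variants as 'cis', 'trans' or 'undetermined'.

    Each element describes one read covering both sites:
    (carries_variant_1, carries_variant_2).
    """
    counts = Counter(reads)
    both = counts[(True, True)]      # reads supporting cis
    only_1 = counts[(True, False)]   # reads supporting trans (variant 1 alone)
    only_2 = counts[(False, True)]   # reads supporting trans (variant 2 alone)

    if both >= min_reads and both > only_1 + only_2:
        return "cis"
    if only_1 >= min_reads and only_2 >= min_reads and only_1 + only_2 > both:
        return "trans"
    return "undetermined"


# Hypothetical read sets for illustration.
cis_reads = [(True, True)] * 10 + [(False, False)] * 10
trans_reads = [(True, False)] * 10 + [(False, True)] * 10
```

Short reads that cover only one of the two sites carry no such linkage information, which is why read length matters for this analysis.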
As TGS is highly sensitive and has a wide detection range, many VUSs will inevitably be detected. Even in our small sample, five individuals were identified as carriers of four types of VUSs. In a number of previous reports on the application of NGS and TGS for thalassemia detection, the identification of such VUSs illustrated the high sensitivity of the two detection methods [1,21]. In a clinical setting, however, the clinical significance of these variants is unknown, and reporting them may lead to misdiagnosis by clinicians. This is particularly critical in prenatal diagnosis, where VUSs should be interpreted with caution. For example, we found that a male subject carried both a rare nonpathogenic variant, c.316-3C > T, and a common pathogenic variant, c.126_129delCTTT, within the HBB gene. This combination could easily lead clinicians to misdiagnose him with β-thalassemia intermedia, when in fact he presented with only the clinical phenotype of mild β-thalassemia. Therefore, we recommend that clinical laboratories should not list such VUSs in test reports, to avoid confusion and unnecessary stress for patients. At the same time, laboratory personnel and clinicians should use both genotype and phenotype data to make a comprehensive diagnosis and avoid misdiagnosis.
The TGS-derived raw reads have high error rates resulting from the single-molecule sequencing approach. However, high-fidelity CCS reads generated from multiple raw reads, combined with a high sequencing depth, are expected to reduce the occurrence of random sequencing errors. Previous studies have also demonstrated that with HiFi reads and more than 60× sequencing depth, PacBio sequence reads provide a high degree of accuracy for variant analysis [13,21,24].
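Why repeated passes suppress random errors can be illustrated with a simple majority-vote model. This is a deliberately simplified sketch and not the consensus algorithm PacBio uses. It assumes that each pass errs independently at a given position with the same probability, and it counts the consensus as wrong whenever more than half of the passes are wrong, which gives an upper bound under those assumptions.

```python
from math import comb


def majority_error(per_pass_error: float, passes: int) -> float:
    """Probability that more than half of `passes` independent passes are wrong.

    Simplified binomial model; `passes` must be odd so that a strict
    majority always exists.
    """
    if passes < 1 or passes % 2 == 0:
        raise ValueError("passes must be a positive odd number")
    p = per_pass_error
    need = passes // 2 + 1  # smallest number of wrong passes forming a majority
    return sum(comb(passes, k) * p**k * (1 - p) ** (passes - k)
               for k in range(need, passes + 1))


# Hypothetical per-pass error of 10 % at one position.
for n in (1, 3, 9):
    print(n, majority_error(0.10, n))
```

Under this model the error probability falls quickly as the number of passes grows, which is consistent with the expectation that consensus reads and high depth reduce random sequencing errors.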
Although the total cost of library preparation and sequencing reagents per sample is low, the PacBio instrument is very expensive, which makes it difficult to apply this technology in smaller clinics. Developing a less-expensive benchtop platform could be a solution to increase the clinical application of TGS in areas with a high prevalence of thalassemia.
In conclusion, our study is the first to demonstrate the value of TGS in genetic testing for thalassemia in the Guizhou population. TGS can effectively detect both common and rare thalassemia variants with high accuracy and efficiency, and should be widely used for genetic testing and prenatal thalassemia genetic screening in areas with high incidence of thalassemia.