A Review of Gene Sequencing in Infertility

Assisted reproductive technology (ART) is used primarily in infertility treatment to achieve pregnancy. With the increasing demand for preimplantation genetic testing (PGT), for safety assessment of in vitro fertilization (IVF), and for the prevention and diagnosis of infertility-related diseases, the application of gene sequencing technology in ART has increased significantly. This review summarizes the development and principles of gene sequencing technology, the application of gene sequencing in PGT, the application of whole genome sequencing in ART-conceived children, and the application of whole genome sequencing in infertility-related disorders.

After the DNA double helix structure was discovered in 1953, people began to understand that DNA fragments are responsible for genetic effects and that genes control biological properties (CHIU et al. 2020). Genes not only reproduce themselves faithfully and maintain biological characteristics, but can also mutate, leading to diseases that may be inherited by future generations. Researchers in the life sciences therefore need gene sequencing to reveal the complexity and diversity of genomes and to access and use genetic information. Because gene sequencing technology is already widely used in studies of disease mechanisms and in clinical diagnosis, it has broad practical value in biomedicine, gene therapy, and clinical practice (THENMOZHI et al. 2020).
Infertility is a global problem, affecting an estimated one in four couples in developing countries. The incidence of infertility has not changed significantly over the past two decades, although it is difficult to estimate because of inconsistent definitions and a lack of standardization in demographic surveys (MASCARENHAS et al. 2012; WILTSHIRE et al. 2020). Based on traditional semen analysis and other classical diagnostic tests, approximately one-third of infertility cases are attributable to male factors and one-third to female factors; in the remaining third the cause cannot be determined, which is known as unexplained or idiopathic infertility (ISIDORI et al. 2006; THONNEAU et al. 1991). Some cases of unexplained infertility may be due to an undiagnosed disease in the man, the woman, or both partners, and a combination of male and female factors can also lead to infertility in a couple. In general, therefore, men are just as likely as women to be infertile (GNOTH et al. 2005; TURNER et al. 2020).
Alanine sRNA (transfer RNA) from yeast and Escherichia coli was the first nucleic acid molecule to be sequenced, in 1965 (SANGER et al. 1965). A critical transitional technique, the 'plus and minus' method, was introduced for DNA sequencing in 1975 and led to the chain-termination approach that dominated gene sequencing for over 30 years (MAXAM & GILBERT 1977). In the 1970s, Maxam and Gilbert developed the 'base-specific chemical cleavage method' (DEWEY et al. 2014) and Sanger developed the 'enzymatic chain-termination dideoxy method' (BOYCOTT et al. 2013), which greatly improved the speed, efficiency, and accuracy of DNA sequencing. The 'dideoxy method' and the 'chemical cleavage method' are considered 'first-generation' sequencing technology. Notably, the Human Genome Project (HGP) was completed using this so-called first-generation sequencing technology.
After the HGP, the application of nucleic acid sequencing in genomic function studies increased significantly, with a resultant demand for highly efficient, rapid, and low-cost DNA sequencing platforms. Next-generation sequencing (NGS), or second-generation sequencing, is a generic term used to describe several high-throughput DNA sequencing technologies (GASSNER 2020). Also known as massive (or massively) parallel sequencing or deep sequencing, the technology can sequence DNA and RNA faster and more cheaply than the first-generation Sanger sequencing used previously, at scales up to and beyond an entire genome. In the past decade, this development has revolutionized genomics and molecular biology (MARDIS 2011). The high demand for low-cost sequence data has driven the development of NGS technologies that can produce thousands or millions of sequences concurrently. NGS relies on massively parallel sequencing and imaging techniques to yield several hundred million to several hundred billion DNA bases per run (SHENDURE & JI 2008). Several NGS platforms, such as Roche 454 FLX Titanium (THUDI et al. 2012), Illumina MiSeq and HiSeq 2500 (BENTLEY et al. 2008), and Ion Torrent PGM, have been developed and used recently (HE et al. 2014; QUAIL et al. 2012). High-throughput sequencing technologies are intended to lower the cost of DNA sequencing far beyond what is possible with standard dye-terminator methods (SCHUSTER 2008). In ultra-high-throughput sequencing, as many as 500,000 sequencing-by-synthesis operations may be run in parallel (METZKER 2010a; QUAIL et al. 2012).
All NGS strategies follow a similar protocol for DNA template preparation, in which universal adapters are ligated to both ends of randomly sheared DNA fragments. They also rely on the cyclic interrogation of millions of clonally amplified DNA molecules immobilized on a synthetic surface to generate up to several billion sequences in a massively parallel fashion. Sequencing is performed iteratively: the incorporation of one or more nucleotides is followed by the emission of a signal and its detection by the sequencer (METZKER 2010b). Most NGS platforms generate reliable sequences and display near-perfect coverage of GC-rich, neutral, and moderately AT-rich genomes. However, the platforms differ in key respects in the quality of their sequence data and in the applications they support (QUAIL et al. 2012).
NGS technologies commercialized by Illumina generate shorter reads, ranging from 50 to 300 bp, with sequencing throughputs ranging from 1.5 to 600 Gbp depending on the platform used. Illumina commercializes several instruments, ranging from the benchtop MiSeq sequencer to the high-throughput HiSeq 2500 sequencer. The Illumina sequencing technology combines clonal amplification of a single DNA molecule with a cyclical sequencing-by-synthesis approach. PCR amplification is performed using a solid-phase amplification protocol to generate up to 1,000 copies of an original DNA molecule, grouped together into a cluster. Sequencing is performed with proprietary reversible fluorescent terminator deoxyribonucleotides, in a series of cycles consisting of single-base extension, fluorescence detection (where the nature of the signal is used to determine the identity of the base being incorporated), and cleavage of both the fluorescent label and the chemical moiety at the 3′-hydroxyl position to allow the next cycle to occur (HE et al. 2014).
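The cycle described above (single-base extension, fluorescence detection, cleavage) can be sketched as a toy simulation. This is only an illustration under simplifying assumptions: the dye colours, the function name `sequence_by_synthesis`, and the error-free chemistry are hypothetical and are not taken from any instrument's software, and strand orientation is ignored.

```python
# Toy model of cyclic reversible-terminator sequencing (illustrative only).
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Hypothetical dye colours, one per base; real instruments use their own chemistry.
DYE_FOR_BASE = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}
BASE_FOR_DYE = {dye: base for base, dye in DYE_FOR_BASE.items()}


def sequence_by_synthesis(template: str, n_cycles: int) -> str:
    """Return the read produced by extending along `template`, one base per cycle."""
    read = []
    for position in range(min(n_cycles, len(template))):
        # 1. Single-base extension: the blocked nucleotide complementary to the
        #    template base is incorporated; its 3' block prevents further extension.
        incorporated = COMPLEMENT[template[position]]
        # 2. Fluorescence detection: the colour of the signal identifies the base.
        signal = DYE_FOR_BASE[incorporated]
        read.append(BASE_FOR_DYE[signal])
        # 3. Cleavage: removal of the dye and the 3' block is modelled simply by
        #    moving on to the next template position in the next cycle.
    return "".join(read)


print(sequence_by_synthesis("ATGCCGTA", n_cycles=5))
```

The read length is bounded by the number of cycles, which is one way to see why cycle-based platforms produce short reads.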
The application of NGS technologies highlights the striking impact of these massively parallel platforms on genotyping, which has expanded from focused readouts obtained with a variety of DNA preparation protocols to a genome-wide scale, with resolution refined to single-base precision. The sequencing-by-synthesis method takes advantage of DNA polymerases to incorporate four different fluorescently labeled deoxyribonucleotide triphosphates (dNTPs) into a DNA template strand during DNA synthesis. The nucleotides are identified by fluorophore excitation at the point of incorporation. The DNA sequence is then analyzed by proprietary software on the sequencing platform. NGS platforms are typically represented by the GS FLX Titanium from Roche, the Solexa Genome Analyzer from Illumina, and SOLiD/Ion Torrent PGM from Life Technologies. Sequencing by synthesis is the most widely adopted technology. The principle of this approach is that a fluorescently labeled, reversible terminator is imaged as each dNTP is added and is then cleaved to allow incorporation of the next base. This allows millions of DNA sequences to be read in parallel, making deep sequencing of the transcriptome achievable at low cost and high speed (WANG et al. 2014).
NGS technologies have been under continuous development and improvement since the onset of the 21st century. By eliminating the major drawback of NGS, short read length, the revolution is continuing as scientists move into the era of single-molecule sequencing, often referred to as 'third-generation' sequencing. Several platforms have been developed to provide low-cost and high-throughput RNA sequencing, such as true single molecule sequencing (tSMS), single molecule real-time (SMRT) sequencing, fluorescence resonance energy transfer (FRET), DNA strand transit through nanopores, and microscopy-based techniques.
With the advance of gene sequencing technologies, whole genome sequencing has been widely used in many fields of biology, including genetic studies of diseases (MARTIN et al. 2013). Over 4000 genes responsible for rare monogenic diseases have been identified (PALINI et al. 2013). Whole exome sequencing provides new approaches to characterize individual genomic landscapes and identify disease-related information, such as single nucleotide variations (SNVs), insertions and deletions (INDELs), copy number variation (CNV), structural variation (SV), differences in RNA expression profiles, and the base 5-hydroxymethylcytosine (HOU et al. 2013). Ultimately this will facilitate the discovery of genes that cause rare diseases and increase therapeutic opportunities.

The Application of Gene Sequencing in Preimplantation Genetic Testing (PGT)
PGT is a test performed to analyze the DNA from oocytes (polar bodies) or embryos (cleavage stage or blastocyst) for HLA typing or for determining genetic abnormalities (Figure 1). It includes PGT for aneuploidies (PGT-A), PGT for monogenic/single gene defects (PGT-M), and PGT for chromosomal structural rearrangements (PGT-SR) (ZEGERS-HOCHSCHILD et al. 2017). As an assisted reproductive technology, PGT led to the birth of the world's first PGT test-tube baby in 1990, and it has since become an important technology in clinical applications (GRECO et al. 2020; LI et al. 2015). NGS is a rapidly developing approach to PGT that is increasingly applied in clinical practice; it involves removing a single cell from an in vitro fertilization embryo for genetic testing. In comparison with fluorescence in situ hybridization and DNA chip testing, NGS detects embryonic chromosomal abnormalities rapidly and with higher accuracy, higher resolution, greater sensitivity, and greater efficiency (BOYCOTT et al. 2013). It can detect copy number variations larger than 1 Mb at the single-cell level (LI et al. 2015) and can identify individual single-nucleotide variations (SNVs) with no false positives detected. Furthermore, NGS could potentially support more 'personalized' procedures in patients at risk of genetic disease (PETERS et al. 2015).
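One common way to detect copy number variations from sequencing data is to compare read counts across fixed-size genomic bins. The sketch below assumes a read-depth approach with 1 Mb bins; the function name `call_copy_number`, the tolerance threshold, and the example counts are hypothetical, and the code is not a description of the method used in the cited studies.

```python
from statistics import median


def call_copy_number(bin_counts, tolerance=0.25):
    """Label each bin 'gain', 'loss', or 'normal' from its read count.

    Each count is divided by the median count across bins, which is taken as
    the diploid baseline (an assumption that holds only if most bins are normal).
    For a diploid genome, a one-copy gain gives a ratio near 1.5 and a one-copy
    loss a ratio near 0.5.
    """
    baseline = median(bin_counts)
    if baseline == 0:
        raise ValueError("median read count is zero; cannot normalise")
    calls = []
    for count in bin_counts:
        ratio = count / baseline
        if ratio >= 1 + tolerance:
            calls.append("gain")
        elif ratio <= 1 - tolerance:
            calls.append("loss")
        else:
            calls.append("normal")
    return calls


# Made-up read counts for eight consecutive 1 Mb bins.
counts = [100, 98, 103, 150, 152, 99, 51, 101]
print(call_copy_number(counts))
```

In single-cell data, amplification bias adds noise to the counts, so real pipelines typically normalise and segment before calling; those steps are omitted here.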
Conventional gene sequencing requires a large population of cells, so the gene expression information derived from clinical studies reflects expression at the population level; i.e., millions of cells, dominated by the majority cell types, are typically analyzed in bulk. However, phenotypic differences between cells arise from cell heterogeneity (KONG et al. 2012) and gene expression noise (SUZUKI & BIRD 2008). To understand gene expression in a single cell for PGT, genomic sequencing must therefore be performed on a single cell.
Common sources of material for PGT include blastomere cells, trophectoderm cells, first and second polar bodies, and blastocoel fluid (ZHAO et al. 2013). The quantity of genomic DNA present in an individual cell is insufficient for direct sequencing; therefore, whole-genome amplification (WGA) is used as the first step in single-cell DNA analysis. However, allele dropout (ADO) and preferential amplification (PA) still limit the genome coverage of WGA-based PGT to about 40%. To address this limitation, WGA efficiency must be increased. A new technique has been developed to obtain extensive coverage and uniform amplification of genomes (SUZUKI & BIRD 2008). This method, termed multiple annealing and looping-based amplification cycles (MALBAC), achieves 93% genome coverage at an average sequencing depth of 25× for DNA from a single cell. It has also been used to sequence sperm in order to phase the personal genome and map recombination events at high resolution (HALLIDAY et al. 2004). With this method, the genome of the oocyte pronucleus can be deduced by sequencing the first and second polar bodies (HANSEN et al. 2013), thereby avoiding maternally inherited congenital birth defects in newborns. MALBAC-based preimplantation genomic screening makes it possible to accurately select normal fertilized eggs for embryo transfer (FENG et al. 2008; KITZMAN et al. 2012; ZHENG et al. 2013). The sequencing of MALBAC-amplified material from a single cell therefore increases the number of healthy, live babies born (EHRICH 2011).
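The quantities discussed above (genome coverage, average sequencing depth, and allele dropout) can be computed from per-position data as in the following minimal sketch. The function names, the input layout, and the example numbers are assumptions made for illustration; they do not reproduce the analysis pipeline of any cited study.

```python
def coverage_summary(depths, min_depth=1):
    """Return (fraction of positions covered, mean depth) for per-position depths."""
    if not depths:
        raise ValueError("no positions supplied")
    covered = sum(1 for depth in depths if depth >= min_depth)
    return covered / len(depths), sum(depths) / len(depths)


def allele_dropout_rate(observed_alleles):
    """Fraction of known-heterozygous sites at which only one allele was observed.

    `observed_alleles` is a list of sets, one per site known to be heterozygous
    (for example from parental genotypes). Sites with no reads are skipped.
    """
    informative = [alleles for alleles in observed_alleles if alleles]
    if not informative:
        raise ValueError("no informative heterozygous sites")
    dropped = sum(1 for alleles in informative if len(alleles) == 1)
    return dropped / len(informative)


# Made-up depths at ten positions: three positions were never amplified.
depths = [0, 30, 25, 0, 40, 28, 27, 0, 26, 24]
print(coverage_summary(depths))

# Made-up allele observations at four heterozygous sites.
sites = [{"A", "G"}, {"C"}, {"T", "C"}, {"G"}]
print(allele_dropout_rate(sites))
```

Reporting coverage together with depth matters because a high mean depth can coexist with large uncovered regions when amplification is uneven.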
De novo mutations, which occur in a gamete (egg or sperm) of one of the parents or in the developing fetus, include single nucleotide polymorphisms (SNPs) and short multibase insertions/deletions (INDELs). They can cause a number of genetic diseases, such as Down syndrome, severe intellectual disability, autism, epileptic encephalopathies, and many other congenital disorders. However, because the mutation is absent from the blood DNA of the unaffected parents, screening both parents can fail to detect it. Gene sequencing-based preimplantation genomic screening of an embryo before transfer to the uterus allows de novo mutations and methylation patterns to be detected. PALOMAKI et al. (2012) developed a new process using advanced WGA to detect single-base de novo mutations from IVF blastocyst biopsies, providing highly sensitive and specific screening. In addition, WGA can be used to profile DNA methylation (QIN et al. 2007). This method facilitates the evaluation of ART outcomes and improves clinical diagnoses. WGA offers great potential for analyzing preimplantation embryos, but it also poses substantial challenges for clinical use, such as the difficulty of examining balanced translocations and of testing for CGG(n) repeat expansions (e.g., in the fragile X mental retardation 1 gene). Therefore, a better reference genome and combination with traditional PGT techniques are needed to improve its diagnostic accuracy. In addition, the limited resolution of chromosome-level PGT compared with DNA-level PGT needs to be overcome in the future. The use of WGA and NGS for PGT in IVF is just around the corner. With breakthroughs in sequencing technology and accurate sequencing data analysis, it will be possible to increase the chances of a successful pregnancy and the live birth rate, to reduce the miscarriage rate due to aneuploid embryos, and to avoid congenital birth defects.

The Application of Gene Sequencing in the Safety Assessment of ART Progeny
In mammals, gametogenesis and early preimplantation development are two critical periods of epigenetic modification. A wave of global DNA demethylation and remethylation occurs during these two periods; it plays an important role in feto-placental development and is required for the expression of imprinted genes (WANG et al. 2012). ART-related manipulations of oocytes and embryos involve several processes, such as follicular stimulation, intracytoplasmic sperm injection (ICSI), and embryo culture. These procedures might predispose embryos to imprinting errors and diseases, because the time windows of ART coincide with gametogenesis and early embryo development (WU et al. 2005). Several studies have shown that ART treatment increases the risk of birth defects (HANSEN et al. 2013). ART might result in abnormal methylation, loss of methylation, aneuploidy, structural rearrangements of chromosomes, dynamic mutation of trinucleotide repeats (ZHENG et al. 2013), and X- or Y-chromosome microdeletion (FENG et al. 2008). In addition, ICSI offspring are at higher risk of aneuploidy and structural chromosome rearrangements than control groups, and the incidence of Y-chromosome microdeletion in the AZF gene is increased in ART children. Diagnosis based on whole genome sequencing (WGS) can detect these genetic abnormalities by examining a small number of embryonic cells before a baby is born.
Chromosome abnormality is an inherited genetic disease and one of the leading causes of congenital birth defects. Because effective therapy is lacking, the pregnancy is often terminated on the basis of diagnostic testing. Currently, amniocentesis and umbilical vein sampling are the two main accurate and useful antenatal tests for diagnosing chromosomal abnormalities. These are fairly easy procedures and are relatively atraumatic to the fetus. Their disadvantage is that they are invasive and carry a small risk of miscarriage. Hence it is important that these tests are performed only in certain situations and yield an accurate diagnostic result. High-throughput sequencing of fetal DNA in maternal blood plasma, in combination with amniocentesis and umbilical vein sampling, has been widely used in the identification of trisomy 21, trisomy 18, trisomy 13, and Turner's syndrome.
Two types of fetal genetic material are present in maternal circulation: intact fetal cells and cell-free fetal DNA (cff-DNA). The entire fetal genome is represented in maternal blood in the form of cff-DNA (ALBERRY et al. 2007), which consists of short DNA fragments originating from apoptotic placental (trophoblast) cells (LUN et al. 2008). In addition, cff-DNA can be detected within 4 weeks of gestation and is cleared from maternal circulation within 2 h after delivery (SUZUMORI et al. 2018). Therefore, cff-DNA in maternal plasma has been used for detecting fetal aneuploidy, for non-invasive prenatal testing (NIPT) of paternally inherited disease, and for fetal sex identification in X-linked disease. Fetal cells are rare in maternal blood, estimated at only one to two per ml in the first trimester, whereas cff-DNA is more abundant in maternal plasma in early pregnancy. Analysis of cff-DNA therefore provides a method for non-invasive prenatal diagnosis (NIPD). A variety of methods have been used for mutation detection with cff-DNA. As an up-to-date technique, gene sequencing is increasingly applied in clinical studies (PAPAGEORGIOU & PATSALIS 2013). NGS of cell-free DNA in maternal circulation achieves high sensitivity and accuracy in prenatal screening for trisomy 21 (EHRICH 2011), trisomy 13, and trisomy 18 (PALOMAKI et al. 2012). Furthermore, the complete genome sequence of a human fetus at 18.5 weeks of gestation has advanced the analysis of monogenic diseases by non-invasive prenatal genetic diagnostics (KITZMAN et al. 2012).
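A widely described way to screen for fetal aneuploidy from sequenced maternal plasma DNA is to compare the fraction of reads mapping to a chromosome against that fraction in euploid reference pregnancies, expressed as a z-score. The sketch below illustrates this idea only; the function name, the reference values, and the cut-off of 3 are assumptions for the example and are not taken from the studies cited above.

```python
from statistics import mean, stdev


def chromosome_z_score(sample_fraction, reference_fractions):
    """z-score of a sample's chromosomal read fraction against euploid references."""
    sigma = stdev(reference_fractions)
    if sigma == 0:
        raise ValueError("reference fractions have zero variance")
    return (sample_fraction - mean(reference_fractions)) / sigma


# Made-up fractions of reads mapping to chromosome 21 in euploid pregnancies.
reference = [0.0128, 0.0130, 0.0132, 0.0129, 0.0131]

Z_CUTOFF = 3.0  # assumed cut-off for this example

for label, fraction in [("sample A", 0.0131), ("sample B", 0.0138)]:
    z = chromosome_z_score(fraction, reference)
    status = "elevated" if z > Z_CUTOFF else "within reference range"
    print(f"{label}: z = {z:.2f} ({status})")
```

Because only a small share of plasma DNA is fetal, the excess from a trisomic chromosome is small, which is why the comparison is made against a reference distribution rather than a fixed fraction.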
Safety is critical when diagnosing and treating genetic diseases, and sequencing provides an effective way to evaluate and help ensure the safety of ART. As the technology advances, information on subtle genetic changes in ART fetuses, including chromosomal aneuploidy, marker gene changes, and small deletions, can be obtained from cff-DNA or fetal cells in the blood of pregnant women. Such information allows a better assessment of the safety of ART offspring and a reduction in the risk of birth defects.

The Application of Gene Sequencing in Infertility-Related Disorders
According to World Health Organization (WHO) surveys and statistics, infertility is mainly caused by premature ovarian failure (POF), polycystic ovarian syndrome (PCOS), endometriosis (EMS), and impaired spermatogenesis. DNA sequencing is one of the most reliable and accurate methods for identifying possible disease-causing mutations when testing for genetic disorders. NGS has been revolutionizing genetic research and diagnostics, especially in the area of human fertility problems. Advances in NGS have led to a better understanding of reproductive genetics through the investigation of infertility-related genes, such as those involved in POF, PCOS, EMS, and spermatogenesis.
POF is defined as the cessation of ovarian function before the age of 40 and is associated with serum gonadotropin (FSH) levels > 40 mIU/ml (GUNNING et al. 2019). Studies have reported that the incidence of POF is 0.01% in women under 20 years old, 0.1% in women under 30 years old (KOKCU 2010), and 1%-2% in women under 40 years old (CABURET et al. 2014). The average age of onset is 23.3 years (PERSANI et al. 2010). POF reduces the fertility of women of childbearing age. In addition, it has serious health consequences, including an increase in autoimmune diseases, osteoporosis, infertility, ischemic heart disease, and cardiovascular disease. Genetic studies using gene sequencing have indicated that a variety of genes are related to premature ovarian failure, including the Nanos homolog 3 gene (NANOS3) (QIN et al. 2007), the B-cell lymphoma-2 gene (Bcl-2), the human bone morphogenetic protein 15 gene (BMP-15), and growth differentiation factor 9 (GDF-9) (LAISSUE et al. 2006). NANOS3 is a functional gene that enables primordial germ cells (PGCs) to migrate to the gonad and perform their reproductive function. NANOS3 is expressed in oocytes at different phases of follicular development and consequently plays an important role in the formation of germ cells. Bcl-2 is an apoptosis-related gene; when its expression is down-regulated, the apoptosis and atresia of follicles can be accelerated (WANG et al. 2012). BMP-15 is expressed only in oocytes, and oocyte-specific BMP-15 may promote follicle growth in vivo while preventing premature luteinization. If the BMP-15 gene is down-regulated, follicle maturation can be abnormal, leading to ovulation failure. The GDF-9 gene normally regulates the expression of key enzymes that ensure the normal development of the cumulus-oocyte complex. In mouse experiments, follicular development was restrained and could cease at the primary stage when GDF-9 was knocked out.
In addition, many other genes are related to premature ovarian failure and are distributed on the autosomes and the X chromosome, including the transforming growth factor beta receptor III gene (TGFBR3), forkhead box L2 (FOXL2), newborn ovary homeobox gene (NOBOX), factor in the germline alpha (FIGLA), sal-like 4 (SALL4), diaphanous homolog 2 (DIAPH2), natriuretic peptide C (NPPC), zinc finger X (ZFX), the X-inactivation gene (XIST), and the FSH receptor gene (QIN et al. 2007).
PCOS, which is characterized by irregular menstruation, infertility, hirsutism, and polycystic ovary morphology (PCOM), is a common disease that affects female fertility and increases the long-term risk of complications such as diabetes, cardiovascular disease, and endometrial cancer (BARRY et al. 2014). Genetic studies have shown that PCOS is caused by mutations in multiple genes accompanied by abnormalities of the endocrine and metabolic systems. These genes mainly include the aldosterone synthase gene, the CYP11A gene, a variable number of tandem repeats (VNTR), the MCF2L2 gene, melatonin receptor gene 1B (MTNR1B), the adiponectin gene, the gonadotropin releasing hormone receptor (GnRH-R) gene, serum tumor necrosis factor alpha (TNF-α), and insulin-like growth factor 1 (IGF-1). These genes are closely related to the pathophysiological mechanism of PCOS and are mainly concentrated in pathways related to insulin, sex hormones, and type II diabetes mellitus (T2DM), providing a new direction for PCOS genetic and genomic research.
EMS affects 10-15% of women of childbearing age. Endometriosis may occur in 17-44% of all patients treated with ART (REDWINE 1999). The likely impact of endometriosis on ART outcomes remains controversial, and the available data on endometriosis-mediated ovarian injuries are conflicting. A gene association study of EMS indicated that mutations in several chromosomal regions were correlated with morbidity and that the abnormally expressed genes are involved in cellular localization, ion channels, and material transport. The periodic expression of a few homeobox (HOX) genes in the endometrium has been shown to be important for implantation and decidualization. HOXA10 gene expression in the endometrium is significantly decreased in patients with EMS-related infertility (WU et al. 2005).
Studies of the correlation between sperm quality and spermatogenesis genes in infertile males indicate that abnormal epigenetic modifications, Y-chromosome microdeletion, and genetic variation in the autosomes can affect sperm quality and spermatogenesis. Normal epigenetic modifications of sperm ensure the normal formation of the sperm nucleus and trigger normal embryonic development. Abnormal epigenetic modifications of sperm, however, may result in early miscarriage, phenotypic defects in the offspring, and clinical illnesses after birth. Some researchers reported that defective expression of a transcription factor (Ets variant 5) and of the methyltransferase gene (ZAMUDIO et al. 2011) led to spermatogenesis disorders and even infertility, as well as to decreased methylation levels of immune-related genes and sperm imprinted genes. Increased promoter activity of the sperm DAZ (deleted in azoospermia) gene can lead to oligospermia. Low methylation at CpG loci results in weak sperm, whereas low methylation of the maternally imprinted gene MEST can lead to changes in sperm morphology and dynamics (ZHENG et al. 2013). The follicle stimulating hormone (FSH) β subunit and the follicle stimulating hormone receptor (FSHR) play an important role in regulating spermatogenesis. FSH and FSHR can be used as genetic markers in infertility treatment and can therefore provide a basis for hormone therapy and for optimization of the infertility treatment program.
At the genetic level, diseases related to infertility involve chromosomal diseases, single gene diseases, point mutations of related genes, epigenetics, etc. Gene sequencing technology is therefore considered a very effective way to reach an infertility diagnosis. In addition, because many diseases result from interactions between multiple genes, gene sequencing technology can help to discover differentially expressed genes and genes related to etiology. This would be very helpful in studying the mechanisms of infertility and in designing treatment plans.

Conclusions
As scientific knowledge and experience have accumulated, the ART success rate has increased significantly and the potential applications of this technology have expanded widely. The increasing prevalence of ART pregnancies has also raised many concerns, which have gradually been overcome through the improvement of ART over more than 40 years. Tens of thousands of infants have been born via ART around the world, a number that will continue to increase. Although ART can help many couples to solve their fertility issues, some cases remain unsuccessful because of individual differences, and the safety assessment of offspring is still an issue. These challenges will be addressed by the rapid development of new technology in the post-genomic age.
Gene sequencing has been applied not only to the whole genome of a single cell, but also to the transcriptome. With NGS, dynamic changes in the transcriptome of human oocytes and early embryos, as well as the differential expression of allelic genes, can be discovered in different developmental phases. The basic processes and theory of the cell cycle, gene regulation, and metabolic pathways can be clarified, and the role of gene promoter regions in early embryonic development will one day be understood. Gene sequencing technology has had a significant impact on the study of the developmental mechanisms of embryos, the diagnosis of sex-linked diseases in the human fetus, and the prevention and treatment of infertility.