Clinical Interpretation and Management of Genetic Variants

Highlights
• The human genome contains approximately 4 million variants, whose population frequencies vary according to ethnic background.
• The genetic diversity of humans in part determines interindividual variability in susceptibility to disease, response to therapy, and clinical outcomes.
• Genetic variants exert a gradient of biological and clinical effect sizes. In general, variants with the largest effect sizes are responsible for single-gene disorders, whereas those with moderate and modest effect sizes are responsible for oligogenic and polygenic diseases, respectively.
• A phenotype is the consequence of nonlinear stochastic interactions among multiple genetic and nongenetic determinants.
• Discerning the pathogenicity of genetic variants identified through genetic testing in the clinical phenotype is challenging and requires complementary expertise in human molecular genetics and clinical medicine.

The DNA replication machinery is exquisitely precise. Its major unit, the DNA polymerase, which was discovered by Arthur Kornberg in 1955, is remarkably fast and accurate, although it is not perfect. The replication machinery incorporates approximately 1 wrong nucleotide per every 100 million nucleotides that it synthesizes (the error rate is ~1.3 × 10⁻⁸ per nucleotide) (2)(3)(4). Given that the human genome comprises approximately 3.2 × 10⁹ base pairs, this error rate introduces approximately 50 de novo point mutations and a lesser number of larger mutations with each genome replication (2)(3)(4)(5). Thus, each offspring differs from the parents by about 50 novel genetic variants.
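The arithmetic behind this estimate can be sketched in a few lines. The two constants are the values quoted in the text; the function and variable names are illustrative assumptions. The simple product gives roughly 42 per genome copy, which is of the same order as the figure of about 50 cited above.

```python
# Back-of-the-envelope estimate of de novo point mutations per genome copy.
# Both constants are the values quoted in the text; the names are illustrative.
ERROR_RATE_PER_NUCLEOTIDE = 1.3e-8  # replication errors per nucleotide
GENOME_SIZE_BP = 3.2e9              # approximate size of the human genome


def expected_de_novo_mutations(error_rate: float, genome_size: float) -> float:
    """Expected number of point mutations, assuming a uniform error rate."""
    return error_rate * genome_size


expected = expected_de_novo_mutations(ERROR_RATE_PER_NUCLEOTIDE, GENOME_SIZE_BP)
print(f"Expected de novo point mutations per genome copy: {expected:.1f}")
```

The sketch assumes a uniform error rate across the genome, which is a simplification.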
The replication error rate is not uniform across the human genome and varies according to the complexity of the genome, with some spots being more prone to mutations (5)(6)(7). It is this low error rate of the DNA replication machinery that is mainly responsible for human genetic diversity, and hence, is the basis for variation in susceptibility to disease, response to therapy, and clinical outcomes. The errors, however, are not restricted to those occurring during DNA replication but also encompass mutations that arise during recombination, from DNA damage, and from impaired repair mechanisms. For example, slipped-strand mispairing, which typically occurs at tandem repeats, leads to expansion of di- and trinucleotide repeats in the genome, which are the causes of the so-called triplet repeat syndromes (8). It is not unreasonable to surmise that the low error rate of DNA replication is the essence of life, because in its absence, a genetically uniform human species would have been vulnerable to extinction by invading germs or diseases. The current pandemic of coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which is itself a product of a rare error of the replication machinery, is a prime example of the interindividual variability in susceptibility to COVID-19 and its clinical outcomes.

HUMAN GENETIC DIVERSITY
The discovery of the first restriction enzyme by [...] (Table 1) (14).
Nevertheless, only a fraction of genetic variants in human genomes has been detected. The full extent of human genetic diversity is expected to be much greater than that observed so far.

DETECTION OF THE GENETIC VARIANTS
Clinicians are increasingly challenged with interpreting the results of genetic testing performed by third-party providers, as well as with the failure to identify the pathogenic variant(s) upon genetic testing.
The lack of a commonly practiced approach to genetic testing further compounds the difficulty. Therefore, a brief discussion of the various steps involved in identifying genetic variants with current sequencing techniques is expected to inform physicians and clinical investigators about the strengths and shortcomings of the current approaches.
WHOLE GENOME AND EXOME SEQUENCING TECHNIQUES.

SHORTCOMINGS OF THE REFERENCE GENOME.
Once the short reads are aligned with the reference genome, nucleotides that differ from the reference sequence are identified as variants. As in the previous steps, confidence in calling a variant depends directly on the number of reads that cover the variant sequence. A low read count does not support high confidence in calling a variant. In addition, the reference genome to which the reads are compared has a number of shortcomings that affect accurate and comprehensive detection of variants. The reference genome, which is a composite of a small number of genomes, does not adequately represent the diversity of human genomes. Consequently, a rare variant that is also present in the reference genome will not be called, and conversely, variants that are not present in the reference genome but are relatively common in another population will be called in the individual genome. This is particularly relevant to populations with different ethnic backgrounds, because variants are typically population-specific and naturally not expected to be adequately represented in the reference genome (16)(17)(18). Broadening the composition of the reference genome, although valuable for representation, is unlikely to be sufficient for assessing the information content of variants in the clinical setting. The reference genome also contains gaps, particularly in complex genomic regions, which interfere with proper identification of variants. Given that the reference genome is constructed on the basis of short-read sequencing, it does not adequately represent large indels or SVs, and hence, is not a robust reference point for their detection (19).
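The dependence of variant calling on read depth and on the reference sequence can be illustrated with a deliberately simplified sketch. It is not a description of any production variant caller: the function name, the pileup data structure, and both thresholds (`min_depth`, `min_alt_fraction`) are assumptions chosen only for illustration.

```python
from collections import Counter


def call_variants(reference, pileup, min_depth=10, min_alt_fraction=0.2):
    """Naive caller: report bases that differ from the reference.

    reference: reference sequence as a string (0-based positions).
    pileup: dict mapping position -> list of bases seen in aligned reads.
    min_depth, min_alt_fraction: hypothetical thresholds, not standard values.
    """
    calls = []
    for pos, bases in sorted(pileup.items()):
        depth = len(bases)
        if depth < min_depth:
            continue  # too few reads to call anything with confidence
        ref_base = reference[pos]
        for base, count in Counter(bases).items():
            # Only non-reference bases can be called; an allele that matches
            # the reference is never reported, however rare it may be.
            if base != ref_base and count / depth >= min_alt_fraction:
                calls.append((pos, ref_base, base, depth))
    return calls


reference = "ACGTAC"
pileup = {
    1: ["C"] * 6 + ["T"] * 6,  # well covered, half of the reads differ
    3: ["G"] * 3,              # differs from the reference, but depth is only 3
    4: ["A"] * 15,             # matches the reference, so nothing is called
}
calls = call_variants(reference, pileup)
print(calls)
```

In this toy input, only position 1 is reported. Position 3 is skipped for lack of coverage, and position 4 is never reported because it matches the reference, regardless of how common or rare that allele is in any population.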
To overcome some of the shortcomings related to the reference genome, population-specific reference genomes comprising contiguous haploid sequence data of each chromosome are being generated (20). In addition, long-read single-molecule sequencing technologies are available that are capable of sequencing several thousand bases of DNA and of de novo assembly of each individual's genome. The long-read sequencing approach is particularly relevant to sequencing of repeat regions, where it increases mapping certainty, as well as to detection of variants located in regulatory regions and of large SVs, which are increasingly being implicated in human diseases (reviewed in Eichler [21]). However, the base-calling error rate of current single-molecule sequencing, which determines the accuracy of SNV and indel detection, is higher (3% to 15%) than that of short-read sequencing technologies, which limits the clinical applications of these technologies (22).
[...] (Table 1). Small indels are also common, but SVs are uncommon. However, SVs typically involve more nucleotides than SNVs because a fraction of SVs are large and encompass several million [...]
Each exome contains about 85 and 25 coding indels that affect 3 and 6 nucleotides, respectively, and therefore maintain the coding frame (35). In-frame indels could also lead to a phenotype, as in the case of cystic fibrosis (36). About a third of the coding indels affect 1 or 2 nucleotides, corresponding to about 35 and 5 coding indels in each genome, respectively, and lead to a frameshift. Such coding indels often abolish expression of the involved protein and have considerable biological effects, depending on the tolerance of the gene to mutation.
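The distinction between in-frame and frameshift coding indels follows from the triplet nature of the genetic code and can be sketched as below. The function name is hypothetical, and the sketch considers only the length of the indel, not its position or sequence context.

```python
def classify_coding_indel(length_nt: int) -> str:
    """Classify a coding indel by the number of nucleotides it adds or removes.

    Codons are 3 nucleotides long, so an indel whose length is a multiple
    of 3 preserves the reading frame; any other length shifts it.
    """
    if length_nt <= 0:
        raise ValueError("indel length must be a positive number of nucleotides")
    return "in-frame" if length_nt % 3 == 0 else "frameshift"


for length in (1, 2, 3, 6):
    print(length, classify_coding_indel(length))
```

Indels of 3 and 6 nucleotides are reported as in-frame, whereas those of 1 and 2 nucleotides are reported as frameshift, in line with the categories described above.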
As for the noncoding indels, those involving 1 nucleotide are the most common, and their numbers correlate inversely with the number of affected nucleotides; that is, smaller indels are more common than larger indels (34). Indels located in the [...] (Table 2).
As discussed earlier, the population frequency of a variant is an important consideration, as rare variants are more likely to impart larger effect sizes than common variants (17,50). Overall, the population frequency of genetic variants correlates inversely with their effect sizes, because a common variant seldom exerts a large effect size.
However, it is important to note that rare variants are population-specific, and therefore, the population frequency of a variant has to be assessed in the context of the specific population in which the variant is identified (16,18,26). In accord with the preceding, de novo variants, defined as variants detected in the index case but absent in the parents, are more likely to be pathogenic. Each genome has about 50 to 60 de novo variants, the vast majority of which are not expected to exert a clinically relevant effect (Table 1). However, those located in genes pertinent to the pathogenesis of the disease of interest are strong candidates to be pathogenic, and hence, clinically significant.
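The two considerations above, de novo status and population-specific frequency, can be sketched as simple filters. Everything in this example is hypothetical: the variant tuples, the population labels, the frequency table, and the rarity threshold `max_frequency` are made up for illustration and do not come from any database. Treating a variant that is missing from the frequency table as having a frequency of 0 is also an assumption.

```python
def find_de_novo(child, mother, father):
    """Variants detected in the index case but absent in both parents."""
    return child - (mother | father)


def is_rare(variant, population, frequencies, max_frequency=0.001):
    """Check rarity in the population in which the variant was identified.

    frequencies: dict mapping population -> {variant: allele frequency}.
    A variant missing from the table is treated as frequency 0 (assumption).
    max_frequency is a hypothetical cutoff, not an established standard.
    """
    return frequencies.get(population, {}).get(variant, 0.0) <= max_frequency


# Hypothetical variants encoded as (chromosome, position, ref, alt).
v1 = ("chr1", 1000, "A", "G")
v2 = ("chr2", 2000, "C", "T")
v3 = ("chr3", 3000, "G", "A")

child = {v1, v2, v3}
mother = {v1}
father = {v2}
de_novo = find_de_novo(child, mother, father)

# The same variant can be rare in one population and common in another.
frequencies = {
    "population_A": {v3: 0.0002},
    "population_B": {v3: 0.05},
}
print(de_novo)
print(is_rare(v3, "population_A", frequencies))
print(is_rare(v3, "population_B", frequencies))
```

In practice, an apparent de novo call also depends on adequate read coverage in both parents, so a set difference of this kind is only a starting point.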
The TTN gene, encoding the giant sarcomere protein titin, illustrates the challenges one faces in determining pathogenicity of the genetic variants.
TTN is a well-established causal gene for dilated cardiomyopathy (51). Mutations in the TTN gene that [...]
The point was well illustrated more than 2 decades ago by the detection of myocardial tissue Doppler abnormalities before the development of overt HCM in individuals who carried pathogenic variants in genes encoding sarcomere proteins (70,71).
The genetic discovery laid the foundation for the subsequent development of PCSK9 inhibitors, which are highly effective in reducing plasma low-density lipoprotein cholesterol levels as well as in reducing cardiovascular mortality (72,73).
To conclude, the key is to discover the fundamental secrets of nature and never to be concerned about the immediate clinical or translational impact of the discovery. The impact will become evident over time.