Prediction of functional consequences of the five newly discovered G6PD variations in Taiwan

Glucose-6-phosphate dehydrogenase deficiency (G6PD deficiency; OMIM #300908) is the most common inborn error of metabolism worldwide. Because G6PD is the key enzyme protecting erythrocytes against oxidative stress, early diagnosis is vital to prevent chronic and drug-, food- or infection-induced hemolytic anemia. Characterization of the mutations is also important for subsequent genetic counseling, especially for female carriers with ambiguous enzyme activity and for males with mild mutations. When multiplex SNaPshot assay and Sanger sequencing were performed on 500 G6PD-deficient males, five newly discovered variations, namely c.187G > A (p.E63K), c.585G > C (p.Q195H), c.586A > T (p.I196F), c.743G > A (p.G248D), and c.1330G > A (p.V444I), were detected in six of the patients who carried none of the common mutations. These variants were previously named the Pingtung, Tainan, Changhua, Chiayi, and Tainan-2 variants, respectively. In silico analysis, together with prediction of the structure of the resulting mutant G6PD proteins, indicated that these five newly discovered variants may be disease-causing mutations.


Keywords: G6PD deficiency; Mutation analysis; In silico analysis; Structural prediction
© 2019 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
Fig. 2 shows the sequences surrounding these variants in G6PD proteins from different species [2]. The species are Homo sapiens, Mus musculus, Danio rerio (zebrafish), Drosophila melanogaster (fruit fly), and Caenorhabditis elegans. Table 2 summarizes the in silico analysis using SIFT [3], PolyPhen-2 [3], MutationTaster2 [4], and Human Splicing Finder [5]. It also lists the conservation between species and the allele frequency in the Taiwanese population [6]. The amino acid alterations are mapped onto the functional domains [7] (Fig. 3) and onto a partial 3D model of G6PD [8] (Fig. 4). The structural effects of the variants on the resulting mutant G6PD proteins were analyzed with HOPE (Have yOur Protein Explained) [9] (Table 3).

Value of the Data
This study extends the G6PD mutation spectrum. The three-dimensional structure illustrates the importance of amino acid residues related to the function of the G6PD protein.
The in silico analysis serves as a tool for predicting the functional consequences of the mutations. This makes it potentially valuable for primary care as well as for research.

Mutation identification: Sanger sequencing
Among 500 G6PD-deficient male newborns detected by the G6PD enzyme activity assay [10], nine carried none of the 21 common mutations described in Taiwan and Southeast Asia, as tested with the multiplex SNaPshot assay [1]. Their dried blood spots from newborn screening were subsequently subjected to mutational analysis by sequencing. All coding exons and exon-intron boundaries of the G6PD gene were amplified and analyzed by forward and reverse Sanger sequencing. Putative mutations were confirmed by sequencing an independent PCR product. The study protocol was reviewed and approved by the Institutional Review Board of Taipei City Hospital, Taiwan.
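The confirmation rule described above can be sketched as a simple base-agreement check. The rule requires forward and reverse reads, plus an independent PCR product. This is a minimal sketch, not the authors' pipeline. It assumes base calls have already been made at the variant position. It also assumes the sample is from a hemizygous male (G6PD is X-linked, so one base is expected per read). The function names `amplicon_call` and `variant_confirmed` are hypothetical.

```python
from typing import Iterable, Optional, Tuple

# Watson-Crick complements; the reverse read reports the opposite strand.
_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def amplicon_call(forward_base: str, reverse_base: str) -> Optional[str]:
    """Return the coding-strand base if forward and reverse reads agree, else None.

    The base called on the reverse read is complemented before comparison.
    """
    fwd = forward_base.upper()
    rev = _COMPLEMENT.get(reverse_base.upper())
    return fwd if fwd == rev else None


def variant_confirmed(amplicons: Iterable[Tuple[str, str]], alt: str) -> bool:
    """True only if at least two independent PCR products each support the alternate base.

    Each amplicon is a (forward_base, reverse_base) pair at the variant position.
    Assumes a hemizygous (male) sample, so one clean base per read is expected.
    """
    calls = [amplicon_call(f, r) for f, r in amplicons]
    return len(calls) >= 2 and all(call == alt.upper() for call in calls)
```

Take c.187G > A as an example. A forward read showing A and a reverse read showing T at that position support the variant. The check passes only when a second, independent amplicon agrees. Heterozygous female carriers would need a mixed-peak model that this sketch does not attempt.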

Sequence alignments between species
Conservation of the peptide sequence around the affected residues was assessed by aligning human G6PD with its orthologs using ClustalW2 [2].
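Conservation can be read from such an alignment in two steps. First, locate the alignment column of a given human residue. Second, measure how many species share the same residue at that column. The following is a minimal sketch under that assumption. The helper names are hypothetical, and the toy sequences in the test are invented for illustration, not real G6PD sequences.

```python
from collections import Counter
from typing import Dict


def alignment_column(aligned_seq: str, residue_number: int) -> int:
    """Return the 0-based alignment column of the n-th (1-based) non-gap residue."""
    seen = 0
    for column, residue in enumerate(aligned_seq):
        if residue != "-":
            seen += 1
            if seen == residue_number:
                return column
    raise ValueError("residue number exceeds the ungapped sequence length")


def column_identity(alignment: Dict[str, str], column: int) -> float:
    """Fraction of sequences carrying the most frequent residue at one column.

    Gaps ('-') are never counted as the conserved residue.
    """
    residues = [seq[column] for seq in alignment.values()]
    counts = Counter(r for r in residues if r != "-")
    if not counts:
        return 0.0
    return counts.most_common(1)[0][1] / len(residues)
```

With a real ClustalW2 output, call `alignment_column` on the aligned human sequence with residue 63, 195, 196, 248 or 444. Then pass the result to `column_identity`. A value of 1.0 means the residue is identical in every aligned species.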

Severity prediction and allele frequency in population
Several online algorithms were used to predict the functional consequences of the five variants: SIFT [3], PolyPhen-2 [3], MutationTaster2 [4], and Human Splicing Finder [5]. In addition, the allele frequency of each alteration in the Taiwanese population was obtained from the Taiwan Biobank [6].
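Tool outputs like these are usually turned into categorical predictions using score cutoffs. The sketch below assumes the SIFT convention (score ≤ 0.05 is deleterious). It also assumes the PolyPhen-2 HumDiv cutoffs as tabulated in the dbNSFP documentation (≥ 0.957 probably damaging, ≥ 0.453 possibly damaging). Treat these cutoffs as assumptions to check against the tool versions actually used. The helper names are hypothetical, and the test inputs are illustrative numbers, not the scores reported in Table 2.

```python
def sift_prediction(score: float) -> str:
    """SIFT convention: a score of 0.05 or below is predicted to be deleterious."""
    return "deleterious" if score <= 0.05 else "tolerated"


def polyphen2_humdiv_prediction(score: float) -> str:
    """Classify a PolyPhen-2 HumDiv probability with commonly tabulated cutoffs.

    Cutoffs (0.957 and 0.453) follow the dbNSFP description of HumDiv
    categories; verify them against the PolyPhen-2 version actually used.
    """
    if score >= 0.957:
        return "probably damaging"
    if score >= 0.453:
        return "possibly damaging"
    return "benign"


def tools_agree_damaging(sift_score: float, polyphen_score: float) -> bool:
    """True when both SIFT and PolyPhen-2 flag the substitution (hypothetical helper)."""
    return (sift_prediction(sift_score) == "deleterious"
            and polyphen2_humdiv_prediction(polyphen_score) != "benign")
```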

Distribution of mutations along the coding region and protein sequence
The alterations are highlighted in the coding region and in the functional domains [7]. Nucleotides were numbered with the A of the ATG translation initiation codon as 1 (reference sequence NM_001042351). Amino acids were numbered from the N-terminal Met of the human G6PD protein.
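Under this numbering, a c. position maps to its codon by integer arithmetic: codon = (c - 1) // 3 + 1. The following is a minimal sketch. It handles only simple substitutions written as in this article, and the function names are hypothetical.

```python
import re
from typing import Tuple

# Simple coding-DNA substitutions such as "c.187G>A"; spaces around ">" are tolerated
# because the variants are written that way in this article.
_SUBSTITUTION = re.compile(r"^c\.(\d+)([ACGT])\s*>\s*([ACGT])$")


def parse_substitution(hgvs: str) -> Tuple[int, str, str]:
    """Return (c. position, reference base, alternate base)."""
    match = _SUBSTITUTION.match(hgvs.strip())
    if match is None:
        raise ValueError(f"not a simple coding substitution: {hgvs!r}")
    return int(match.group(1)), match.group(2), match.group(3)


def codon_of(c_position: int) -> Tuple[int, int]:
    """Map a c. position (A of ATG = 1) to (codon number, position in codon 1-3).

    Because numbering starts at the initiator Met, the codon number equals the
    amino acid number used in the p. notation.
    """
    if c_position < 1:
        raise ValueError("only positions inside the coding sequence are handled")
    return (c_position - 1) // 3 + 1, (c_position - 1) % 3 + 1


if __name__ == "__main__":
    for variant in ["c.187G > A", "c.585G > C", "c.586A > T", "c.743G > A", "c.1330G > A"]:
        position, ref, alt = parse_substitution(variant)
        print(variant, "-> codon %d, position %d" % codon_of(position))
```

The five variants map to codons 63, 195, 196, 248 and 444, matching p.E63K, p.Q195H, p.I196F, p.G248D and p.V444I. c.585G > C is the third base of codon 195 and c.586A > T is the first base of codon 196, so these two variants sit in adjacent codons.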

3D structure model of wild-type G6PD protein
The locations of the G6PD variations observed in this study are presented on the X-ray crystal structure of human G6PD available at the Protein Data Bank (PDB code 1QKI) [8].
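Residues can be located on such a crystal structure by reading the fixed-column ATOM records of the PDB file and measuring interatomic distances. The following is a minimal standard-library sketch. It assumes the chain ID and residue numbering in the 1QKI file follow the human G6PD numbering (verify this before use). The helper names are hypothetical.

```python
import math
from typing import Dict, Iterable, Sequence, Tuple

Coord = Tuple[float, float, float]
AtomKey = Tuple[str, int, str]  # (chain ID, residue number, atom name)


def parse_atoms(lines: Iterable[str]) -> Dict[AtomKey, Coord]:
    """Collect atom coordinates from ATOM/HETATM records (fixed PDB columns).

    Alternate locations are not distinguished: a later record overwrites an earlier one.
    """
    atoms: Dict[AtomKey, Coord] = {}
    for line in lines:
        if not line.startswith(("ATOM  ", "HETATM")):
            continue
        name = line[12:16].strip()
        chain = line[21]
        residue = int(line[22:26])
        x, y, z = (float(line[start:start + 8]) for start in (30, 38, 46))
        atoms[(chain, residue, name)] = (x, y, z)
    return atoms


def min_distance(atoms: Dict[AtomKey, Coord], chain: str,
                 residue_a: int, names_a: Sequence[str],
                 residue_b: int, names_b: Sequence[str]) -> float:
    """Smallest distance (in angstroms) between the listed atoms of two residues."""
    distances = [
        math.dist(atoms[(chain, residue_a, a)], atoms[(chain, residue_b, b)])
        for a in names_a
        for b in names_b
        if (chain, residue_a, a) in atoms and (chain, residue_b, b) in atoms
    ]
    if not distances:
        raise KeyError("none of the requested atom pairs are present")
    return min(distances)
```

For example, parse a downloaded 1QKI file with `parse_atoms(open(path))`. Then `min_distance(atoms, "A", 63, ("OE1", "OE2"), 104, ("NE", "NH1", "NH2"))` gives the closest contact between the Glu63 carboxylate and the Arg104 guanidinium group. A value below roughly 4 Å is a common heuristic for a salt bridge.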

Prediction of structural effects of variations
Because protein structure is important for predicting the effects of variants [11], the effects of the mutations on G6PD protein structure were determined using HOPE (Have yOur Protein Explained) software [9].

Fig. 3. The G6PD protein of 515 amino acids contains:
- two domains: the NAD(P)-binding domain (blue box, amino acids 25–210) and the C-terminal domain (green box, amino acids 212–503);
- two binding sites: the NAD(P)-binding site (left red box, amino acids 38–44) and the G6P-binding site (middle red box, amino acids 198–206);
- one dimer interface (right red box, amino acids 380–425).
The five mutations are highlighted in black in the coding region and protein domains.

Table 3. Structural effects of the variants predicted by HOPE a.
p.E63K: The wild-type residue forms a salt bridge with arginine at position 104. The difference in charge will disturb the ionic interaction made by the original wild-type residue.
p.Q195H: The wild-type residue forms a hydrogen bond with arginine at position 192. Because of the size difference between the wild-type and mutant residues, the new residue is not in the correct position to form the same hydrogen bond as the original wild-type residue.
p.I196F: The mutant residue is bigger than the wild-type residue. It is located in a domain that is important for the activity of the protein and is in contact with residues in another domain. The mutation can affect this interaction and thereby affect protein function.
p.G248D: The wild-type residue is a glycine, the most flexible of all residues. This flexibility might be necessary for the protein's function, and mutation of this glycine can abolish this function.
p.V444I: The mutant residue is bigger than the wild-type residue. It is located in a domain that is important for binding of other molecules. The mutation might affect this interaction and thereby disturb signal transfer from the binding domain to the activity domain.
a Predicted using Have yOur Protein Explained (HOPE, http://www.cmbi.ru.nl/hope/) [9].
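The size and charge differences cited in Table 3 can be roughly cross-checked by comparing residue volumes and side-chain charges. This is a deliberately simple substitute, not HOPE's method. The volumes are Zamyatnin's commonly cited residue volumes in cubic angstroms, and His is treated as neutral. Treat both as assumptions to verify. The helper name `compare_substitution` is hypothetical.

```python
import re
from typing import Dict, Tuple

# Approximate residue volumes in cubic angstroms (Zamyatnin's commonly cited values)
# for the residues involved here; treat them as assumptions to verify.
VOLUME: Dict[str, float] = {
    "G": 60.1, "D": 111.1, "E": 138.4, "V": 140.0, "Q": 143.8,
    "H": 153.2, "I": 166.7, "K": 168.6, "R": 173.4, "F": 189.9,
}
# Side-chain charge at physiological pH; His is treated as neutral (an assumption).
CHARGE: Dict[str, int] = {"D": -1, "E": -1, "K": 1, "R": 1}

_PROTEIN_SUB = re.compile(r"^p\.([A-Z])(\d+)([A-Z])$")


def compare_substitution(p_change: str) -> Tuple[int, float, int]:
    """Return (residue number, volume change, charge change) for e.g. 'p.E63K'."""
    match = _PROTEIN_SUB.match(p_change)
    if match is None:
        raise ValueError(f"unsupported notation: {p_change!r}")
    wild_type, number, mutant = match.group(1), int(match.group(2)), match.group(3)
    volume_change = VOLUME[mutant] - VOLUME[wild_type]
    charge_change = CHARGE.get(mutant, 0) - CHARGE.get(wild_type, 0)
    return number, volume_change, charge_change
```

Applied to the five variants, this gives:
- p.E63K: a charge reversal of +2.
- p.G248D: a gain of negative charge.
- p.Q195H, p.I196F and p.V444I: an increase in volume.

These results agree in direction with the HOPE descriptions above.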