Genetic variation in phenylketonuria: analysis of the PAHvdb database

Introduction. Phenylketonuria (PKU) is the most frequent inborn error of metabolism. The principal determinant of the metabolic phenotype in PKU is the residual enzymatic activity, which is determined by the variants in the phenylalanine hydroxylase (PAH) gene. To date, over 1200 PAH gene variants are known; they are collected in the PAH International Database of Variations in Phenylalanine Hydroxylase Gene (PAHvdb). Aim. The aim of this study is to compile an updated report of PAH variants, with their type, localization, frequency and severity. Material and method. The PAH variant analysis was performed using PAHvdb. PAHvdb presently contains 1285 PAH variants and is connected to the BIOPKU genotype-phenotype database, which holds anonymized information from nearly 18,000 PKU patients, with data on the genotype and the corresponding phenotype. Results. Of the 1285 studied variants, the most frequent types are substitutions (1051; 81.8%) and deletions (150; 11.7%). The majority (723; 56.3%) are missense mutations, followed in frequency by frameshift variants (177; 13.8%), splice ESS (Exonic Splicing Silencer) variants (127; 9.9%) and nonsense mutations (86; 6.7%). The most affected region of the gene is exon 6 (169 variants), followed by exon 7 (151 variants), exon 3 (130 variants) and exon 11 (121 variants). The majority of variants are located in the catalytic domain (57.66%). The three most frequent alleles are c.1222C>T (19.2%), c.1066-11G>A (6.8%) and c.782G>A (5.5%). Of the 1285 variants, 488 (38.5%) cause a severe phenotype, 57 (4.9%) cause a moderate phenotype and 74 (6%) cause a mild phenotype; for 666 variants (50.6%) with a low allelic frequency, the metabolic phenotype could not be established. Discussions. Most variants are substitutions, most are missense, most lie in the catalytic domain, and exon 6 is the most affected exon. Approximately 50% of the alleles are found in single patients, so they cannot be used for phenotypic prediction. The majority of variants with an established phenotype cause a severe phenotype. Conclusion. The existence of a database with a large number of PKU patients and PAH variants makes an important contribution to understanding the genotype-phenotype relationship and to improving phenotype prediction based on genotype.


INTRODUCTION
Phenylketonuria (PKU), caused by variants in the phenylalanine hydroxylase gene (PAH), is the most frequent inborn error of metabolism [1], with a global incidence of 1:24,000 newborns [2]. PAH variants, transmitted in an autosomal recessive manner, lead to a decrease in enzymatic activity and to the accumulation of phenylalanine (Phe) to neurotoxic levels [3]. More rarely, hyperphenylalaninemia is caused by a deficiency of tetrahydrobiopterin (BH4), a cofactor in Phe metabolism [4].
The determinant factor for the phenotypic expression in PKU is the residual enzymatic activity. According to the pre-treatment Phe levels, PKU is classified into: (i) classic PKU (cPKU), in which the enzymatic activity is completely (or almost completely) abolished, leading to a plasma Phe level of over 1200 μmol/L (20 mg/dl); (ii) mild PKU (mPKU), with residual PAH activity and plasma Phe levels between 600 and 1200 μmol/L (10-20 mg/dl); and (iii) mild hyperphenylalaninemia (HPA), with plasma Phe levels of 120-360 μmol/L (2-6 mg/dl) [2]. Some authors also use an intermediate, moderate form of PKU, with plasma Phe levels of 900-1200 μmol/L (15-20 mg/dl) [5]. The broad phenotypic spectrum is due mainly to the large number of genetic variants. At present, over 1200 PAH variants are known; they are collected in the PAH International Database of Variations in Phenylalanine Hydroxylase Gene (PAHvdb) [6].
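The thresholds quoted above can be written as a small lookup. The following Python sketch is illustrative only: the function names are hypothetical, the unit conversion assumes the molar mass of phenylalanine (165.19 g/mol), and values that fall outside the ranges quoted in the text (for example 360-600 μmol/L) are reported as unclassified rather than assigned to a category.

```python
PHE_MOLAR_MASS_G_PER_MOL = 165.19  # molar mass of phenylalanine


def umol_per_l_to_mg_per_dl(phe_umol_l: float) -> float:
    """Convert a plasma Phe concentration from umol/L to mg/dl."""
    # umol/L * g/mol = ug/L; /1000 gives mg/L; /10 gives mg/dl
    return phe_umol_l * PHE_MOLAR_MASS_G_PER_MOL / 10_000


def classify_by_pretreatment_phe(phe_umol_l: float) -> str:
    """Map a pre-treatment Phe level (umol/L) to the categories quoted in the text."""
    if phe_umol_l > 1200:
        return "cPKU"
    if 600 <= phe_umol_l <= 1200:
        return "mPKU"
    if 120 <= phe_umol_l <= 360:
        return "HPA"
    # The text gives no category for these values (assumption: leave them unlabeled)
    return "unclassified"
```

With this conversion, 1200 μmol/L corresponds to about 19.8 mg/dl, which the text rounds to 20 mg/dl.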
The PAH gene, which encodes the PAH enzyme, was mapped to human chromosome 12q22-24.1. It has a length of 90 kb (171 kb if the flanking untranslated regions, UTRs, are included), with 13 exons, and encodes 3 domains: regulatory, catalytic and oligomerization [1].

AIM AND IMPORTANCE OF THE STUDY
The objective of this article is to compile an updated report of the known PAH variants, with their type, localization, frequency and phenotypic implications, in order to contribute to a better understanding of the genotype-phenotype relationship, which is critical for the prognosis of disease severity.

MATERIAL AND METHOD
This article is an original study conducted with data extracted from PAHvdb.
PAHvdb (PAH International Database of Variations in Phenylalanine Hydroxylase Gene) [6] contains 1285 PAH variants. It is connected to the BIOPKU genotype-phenotype database, which offers supplementary information. The BIOPKU database [7] gathers anonymized information from nearly 18,000 PKU patients, providing data about their genotype, the corresponding phenotype, BH4 response (when reported) and pre-treatment Phe levels (when reported).
The collected information was used to establish the following indicators:
• arbitrary assigned value (AV),
• allelic phenotype value (APV),
• allelic frequency.
The AV, first used in 1998 by Guldberg et al., is a numeric value assigned to each genetic variant according to the severity of the metabolic phenotype. Initially, this classification was based on 686 patients with 133 different PAH variants [8]. According to the AV, PAH variants are divided into: (i) variants that determine cPKU (AV=1), (ii) variants that determine mPKU (AV=4) and (iii) variants that determine HPA (AV=8). In the original article, the authors also used AV=2 to describe the moderate forms of PKU; this value is not used in the present study.
APV also defines the phenotypic severity of PKU through a genotype-phenotype correlation [9]. In the BIOPKU database, APV is calculated from over 10,500 PKU or HPA patients with over 800 different PAH variants. With values between 0 and 10 (0 being assigned to the most severe form of cPKU and 10 to the mildest form of HPA), APV classifies PKU into cPKU (APV = 0-2.7), mPKU (APV = 2.8-6.6) and HPA (APV = 6.7-10); a value of -11 signifies an unknown APV.
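The AV and APV cut-offs described above amount to two simple mappings. The Python sketch below is a minimal illustration, not the procedure used by PAHvdb or BIOPKU: the names are hypothetical, and it assumes that APV is reported to one decimal place, so that the ranges 0-2.7, 2.8-6.6 and 6.7-10 leave no gaps.

```python
from typing import Optional

# AV categories used in this study (AV=2, "moderate" in Guldberg et al., is not used)
AV_TO_PHENOTYPE = {1: "cPKU", 4: "mPKU", 8: "HPA"}

APV_UNKNOWN = -11  # sentinel described in the text for an unknown APV


def phenotype_from_av(av: int) -> Optional[str]:
    """Return the phenotype for an AV, or None if the AV is not one of 1, 4, 8."""
    return AV_TO_PHENOTYPE.get(av)


def phenotype_from_apv(apv: float) -> Optional[str]:
    """Return the phenotype for an APV, or None when the APV is unknown."""
    if apv == APV_UNKNOWN:
        return None
    if not 0 <= apv <= 10:
        raise ValueError(f"APV out of range: {apv}")
    apv = round(apv, 1)  # assumption: one-decimal resolution
    if apv <= 2.7:
        return "cPKU"
    if apv <= 6.6:
        return "mPKU"
    return "HPA"
```

For example, `phenotype_from_apv(0)` returns `"cPKU"` and `phenotype_from_apv(-11)` returns `None`, so variants without an APV can be excluded from phenotype counts.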
The data accumulated in PAHvdb and the BIOPKU database feed a series of algorithms used to assess protein stability, the principal determinant mechanism of the metabolic phenotype and the key to understanding the genotype-phenotype correlation. These algorithms are FoldX, SIFT, PolyPhen-2 and SNPs 3D.
FoldX is an algorithm that uses an empirical force field to estimate the energetic effect of a mutation, as well as the interaction energy of protein complexes [7,10].
SIFT (Sorting Intolerant From Tolerant) works on the assumption that important positions in a protein sequence have been conserved during evolution, so that mutations at these positions may alter protein function. SIFT separates mutations at these important positions (intolerant) from other mutations with less impact (tolerant) [11].
PolyPhen-2 (Polymorphism Phenotyping v2) is an algorithm that uses physical considerations to compare the normal gene product with the mutated one and to assess the possible impact of the mutation on protein structure and function [7,12].
SNPs 3D identifies, through protein analysis, which amino acid substitutions may destabilize the protein [13].
The allele frequency was calculated based on the 16,270 alleles contained in the BIOPKU database in August 2015 [7].
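As a rough illustration of this calculation, the allele frequency of a variant is its allele count divided by the total number of alleles. The Python sketch below assumes exactly that definition; the function name is hypothetical and no per-variant allele counts are taken from the database.

```python
TOTAL_ALLELES = 16_270  # alleles in the BIOPKU database in August 2015


def allele_frequency_percent(allele_count: int,
                             total_alleles: int = TOTAL_ALLELES) -> float:
    """Return the frequency of a variant as a percentage of all alleles."""
    if total_alleles <= 0:
        raise ValueError("total_alleles must be positive")
    if not 0 <= allele_count <= total_alleles:
        raise ValueError("allele_count must lie between 0 and total_alleles")
    return 100.0 * allele_count / total_alleles
```

Under this definition, a variant observed on a single allele out of 16,270 would have a frequency of about 0.006%.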
IBM SPSS Statistics Version 26 was used for the statistical processing of the data. A p<0.05 was considered statistically significant.

RESULTS
Of the 1285 analyzed variants, the most common type was substitution (1051; 81.8%), followed by deletion (150; 11.7%); the remaining types together represent 6.5% of the total (Figure 1). The most affected region of the gene is exon 6 (169 variants, 13.2% of the total), followed by exon 7 (151 variants, 11.8%), exon 3 (130 variants, 10.1%) and exon 11 (121 variants, 9.4%) (Figure 2). Most of the variants are located in the catalytic domain (57.66%); 18.28% are located in the regulatory domain and 5.29% in the oligomerization domain. The remaining variants (18.75%) are found in introns or in the UTRs.
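The percentages reported above follow directly from the variant counts and the total of 1285. As a quick check, the following Python sketch recomputes the exon shares; the helper name is hypothetical, and rounding to one decimal place is an assumption that matches the figures in the text.

```python
TOTAL_VARIANTS = 1285

# Variant counts per exon, as reported in the text
EXON_VARIANT_COUNTS = {"exon 6": 169, "exon 7": 151, "exon 3": 130, "exon 11": 121}


def share_percent(count: int, total: int = TOTAL_VARIANTS, ndigits: int = 1) -> float:
    """Return count as a percentage of total, rounded to ndigits decimals."""
    return round(100.0 * count / total, ndigits)


exon_shares = {exon: share_percent(n) for exon, n in EXON_VARIANT_COUNTS.items()}
for exon, share in exon_shares.items():
    print(f"{exon}: {share}%")
```

The same helper applied to the 150 deletions gives 11.7%.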
Information on enzymatic activity is available for 140 of the 1285 variants. Enzymatic activity correlates positively with AV and APV (p=0.004).
AV was reported for 549 of the 1285 variants and APV for 587 variants. Based on AV, 421 variants (32.8% of the 1285 total) have AV = 1, representing a severe phenotype (cPKU); 65 variants (5.1%) have AV = 4, representing moderate severity (mPKU); and 63 variants (4.9%) have AV = 8, defining a mild phenotype (HPA); in 736 cases (57.3%) AV was not assigned. Based on APV, 488 variants (38.5%) have a severe phenotype, 456 of them with APV = 0, indicating the most severe form of the disease; 57 variants (4.9%) have a moderate phenotype and 76 (6%) have a mild phenotype. APV was not reported for 666 variants (50.6%). AV and APV correlate positively (p<0.0001).

DISCUSSIONS
Despite the progress made in recent decades in the diagnosis and treatment of phenylketonuria, the complete picture is not yet known. Globally, progress is being made in identifying new PAH variants and in establishing a genotype-phenotype correlation, with the objective of improving the prediction of disease evolution. As molecular testing has become more accessible, reports of new PAH variants have also increased. The existence of a database with a large amount of genotype and phenotype information is a great benefit for understanding the relationship between them. This study broadens the knowledge of phenylketonuria by adding updated information on the existing PAH variants, their localization in the gene and their impact on the metabolic phenotype.
[Caption: The most frequent variants in each metabolic phenotype]
Most variants are substitutions, most are missense, most lie in the catalytic domain, and exon 6 is the most affected exon. The distribution of variant localization and type obtained in this study is similar to that reported in previous studies [2,14,15].
Genotype is not the only factor that influences the phenotype. The phenotype can vary according to how promptly dietary treatment is initiated, the patients' access to treatment, treatment compliance, and other environmental and epigenetic factors. This is why some low-frequency variants do not have an APV assigned. The APV of variants with a frequency under 0.01% is not reliable [6]. Surprisingly, about 50% of the alleles are found in single patients and cannot be used for phenotypic prediction [2].
The majority of known variants produce a severe phenotype. These variants also occur more frequently in PKU patients: the 10 most frequent severe variants account for 53.1% of all cases, compared with 11.59% for the 10 most frequent mPKU variants and 7.81% for the 10 most frequent HPA variants.
Regardless of the genetic variant, the principal mechanism determining phenotypic expression in PKU is decreased protein stability [16]. The existing algorithms used to establish the impact of a mutation on gene function are also used to evaluate new types of treatment [17].

CONCLUSIONS
Ongoing research in phenylketonuria continuously generates new information. The existence of a large database of PKU cases and PAH variants makes an important contribution to understanding the genotype-phenotype relationship. This ongoing research aims to increase the capacity for phenotype prediction.
This study offers an updated report of the PAH variants, including their type, localization and effect.
Keeping knowledge in this field up to date is of great interest, as variant identification is becoming more extensive and more exact.