Meta-analyses of cognitive functions in early-treated adults with phenylketonuria

Our study estimated size of impairment for different cognitive functions in early-treated adults with PKU (AwPKU) by combining literature results in a meta-analytic way. We analysed a large set of functions (N = 19), each probed by different measures (average = 12). Data were extracted from 26 PKU groups and matched controls, with 757 AwPKU contributing 220 measures. Effect sizes (ESs) were computed using Glass' Δ, where differences in performance between clinical/PKU and control groups are standardized using the mean and standard deviation of the control groups. Significance was assessed using measures nested within independent PKU groups as a random factor. The weighted Glass' Δ was −0.44 for all functions taken together, and −0.60 for IQ, both highly significant. Separate, significant impairments were found for most functions, but with great variability (ESs from −1.02 to −0.18). The most severe impairments were in reasoning, visual-spatial attention speed, sustained attention, visuo-motor control, and flexibility. Effect sizes were larger with speed than accuracy measures, and with visuo-spatial than verbal stimuli. Results show a specific PKU profile that needs consideration when monitoring the disease.


Introduction
Phenylketonuria (PKU) is an autosomal recessive genetic disease with an incidence of about six cases per 100,000 live births, but with strong regional variations (e.g., Shoraka et al., 2020). It is characterized by variants in the gene encoding the enzyme phenylalanine hydroxylase (PAH) that metabolizes the essential amino acid phenylalanine (Phe) into another amino acid: tyrosine. PAH is mainly present in the liver. When it is not functioning, Phe accumulates in the blood and in the brain since Phe can cross the blood-brain barrier. How PKU impairs brain functions is not completely clear, but several different causes are likely (van Spronsen et al., 2021). High Phe levels may have a direct toxic effect on myelin, the sheet of cells which wraps around axons and increases speed of transmission (de Groot et al., 2010). In addition, no or limited PAH activity will reduce the availability of important neurotransmitters such as dopamine and serotonin. Dopamine is synthesized from tyrosine which will be less available, both because less tyrosine is synthesized from Phe and because Phe will compete with the tyrosine which is supplied by food to cross the blood-brain barrier. Phe will also compete with other essential amino acids, such as tryptophan, from which serotonin is synthesized. Finally, elevated levels of Phe are thought to inhibit the activity of the enzymes tyrosine hydroxylase and tryptophan hydroxylase, which synthesize dopamine and serotonin respectively (e.g., Donlon et al., 2015; Gonzalez et al., 2016; see Boot et al., 2017 for a review). Through all these mechanisms, high levels of Phe seriously damage brain health and, if left untreated, PKU leads to severe disability.
Fortunately, Phe levels can be controlled by following a strict diet that severely limits the ingestion of natural foods containing Phe (most protein foods), replacing them with foods where Phe is naturally low or has been artificially removed, and adding protein substitutes which guarantee adequate protein intake (see MacDonald et al., 2020 for general guidelines). If a strict PKU diet is followed from birth, the most serious consequences of high Phe levels can be avoided. Since the mid-sixties, programs to screen newborns for PKU have been adopted by an increasing number of countries, allowing recent generations of people with PKU to reach good educational standards and lead fulfilling lives, although possibly not reaching their full potential. The treatment of PKU is one of the great achievements of modern medicine. However, outcomes remain suboptimal and cognitive impairments are well documented both in early-treated children and adults (van Spronsen et al., 2021). Impairments are especially documented in IQ, speed of processing, and executive functions. Group performance is generally below average, but there is strong individual variability (for reviews see Burlina et al., 2019; Hofman et al., 2018; Trefz et al., 2011; van Spronsen et al., 2011; for adult results spanning a comprehensive set of functions see Brumm et al., 2004; Palermo et al., 2017). Most of these impairments and variability are attributable to poor metabolic control, with many studies documenting strong correlations between blood Phe levels and performance in cognitive tasks (Moyle et al., 2007a; Nardecchia et al., 2015; Ris et al., 1994; Romani et al., 2017, 2019; also for differences between groups with high vs low Phe see Bartus et al., 2018; Channon et al., 2007; Palermo et al., 2017). Treatment practices implemented in the last 30-40 years have dramatically reduced Phe levels in people with PKU.
However, in most individuals current and historical Phe levels remain well above current therapeutic recommendations. European guidelines recommend maintaining Phe within 120-360 μmol/L until 12 years of age and within 120-600 μmol/L above 12 years (van Spronsen et al., 2017; van Wegberg et al., 2017). American guidelines recommend 120-360 μmol/L throughout life (American College of Medical Genetics and Genomics, ACMG; Vockley et al., 2014). Dietary recommendations have changed over the years. Keeping a strict diet was originally recommended only until six/eight years of age (see Koch et al., 2002 for examples of early discontinuation), then till adolescence (e.g., Report of Medical Research Council Working Party on Phenylketonuria, 1993) and more recently for life (e.g., van Spronsen et al., 2017). These changes, however, are only partially responsible for failures to maintain therapeutic targets. Instead, many individuals with PKU opt to abandon the diet in adolescence or even in late childhood because the PKU diet is expensive, time-consuming, socially difficult, and, despite continuous improvements, unpalatable. Thus, even when on diet, individuals with PKU show metabolic control which is generally far from perfect (e.g., see Romani et al., 2019).
Although there is accumulating evidence that cognitive outcomes in people with PKU are below what would be expected based on control groups, it is difficult to have accurate estimates of the size of the impairments. Understandably, there is variability in the outcomes reported by different studies. This is due to the limited number of participants per study, and to differences in level of metabolic control, both across studies and across participants within studies (see later demographic characteristics of the studies included in our review). Additionally, most studies have assessed only a few functions and with few tasks because of limitations in the resources of research teams and in the engagement that can be requested from participants. Meta-analyses combine results from different studies by standardizing differences between PKU and control groups (using effect sizes; ESs) and then averaging them across studies. This allows better estimates of the size of impairments, thus providing a more comprehensive and accurate picture of which functions are spared or impaired. How tasks are attributed to different functions is necessarily a matter of judgement. Tasks (especially more sensitive, complex tasks) involve multiple functions, and the decision about which 'main' function is tapped by a task is, to some degree, a matter of opinion. However, combining results across studies allows each function to be assessed with multiple tasks overlapping in the skill of interest. This will produce a better overall estimate of the function, reducing the measurement error inherent in each task.
There are several meta-analyses of cognitive performance in children with PKU (DeRoche and Welsh, 2008; Waisbren et al., 2007). DeRoche and Welsh (2008) used meta-analyses of 33 studies to assess (Hedges' g) effect sizes for IQ and for executive functions such as flexibility, planning, inhibition and working memory. They found significant differences for all functions, with the largest difference for flexibility (ES = 1.15) and the smallest for IQ (ES = 0.42). Waisbren et al. (2007) estimated the effect of Phe levels on IQ by carrying out a meta-analysis of within-study correlations. In early-treated children, IQ correlated with life-time Phe (r = −0.35), Phe at 0-10 years of age (r = −0.34), and Phe at the time of testing (r = −0.31; N of studies = 12, 14, 29, respectively). An increase in Phe of 100 µmol/L predicted an average reduction in IQ of 1.3, 1.9, and 0.5 points, respectively, for the different Phe measures. Finally, Fonnesbeck et al. (2013) carried out meta-analyses to predict the probability of low IQ from a range of blood Phe levels. They pooled results from 17 unique studies which assessed individuals with PKU of different ages (children, adults or mixed). They found significant associations between the probability of IQ being < 85 and Phe level averaged in critical and non-critical periods (with a coefficient = −0.036), but no relationship with concurrent Phe.
These meta-analyses provide important results on the relationship between IQ and Phe. A recent review by Hofman et al. (2018) addressed, instead, the issue of the profile of cognitive outcomes in adults with PKU (from now on AwPKU). It summarized the results of 16 studies, investigating cognition in early-treated AwPKU and associations with Phe levels. The most consistent deficits involved vigilance/sustained attention (assessed by tasks such as the Continuous Performance Test, Rapid Visual Processing and Dot tasks; impaired in 6/7 studies), working memory (assessed through both verbal and visuo-spatial tasks; impaired in 12/15 studies) and motor skills (assessed by tasks such as Digit-symbol-coding, Grooved pegboard, Motor screening test, and pursuit tasks; impaired in 5/6 studies). Other functions were impaired less consistently. For example, complex executive functions (assessed by tasks such as the Brixton test, Elithorn Perceptual Maze Test, Six Element Test, Stockings of Cambridge etc.) were impaired in only 9/17 studies. This review provides important insights into the pattern of impairment present in PKU, but it reports study results in terms of presence/absence of significant impairment, without estimating size of impairment across studies. Only three meta-analyses of cognitive performance in AwPKU have been published so far. However, for good reason, these meta-analyses focused on impairments that are common in PKU (mainly regarding executive functions) and did not address the breadth/spectrum of cognitive functions considered by Hofman et al. (2018). Moyle et al. (2007a) assessed effect sizes for inhibition, attention, motor control, processing speed and working memory, combining results from five studies and 220 AwPKU. Hedges' g ESs for these functions ranged from .30 to .90 (number of measures 3-10). Working memory was not significant, and the largest effect size was obtained for tasks measuring processing speed. Bilder et al. (2016) carried out a meta-analysis of 13 studies focusing on executive functions. There were significant effects for attention (11 studies; 252 participants; ES = −0.74), inhibition (6 studies; 119 participants; ES = −0.41) and flexibility (7 studies; 157 participants; ES = −0.43). Working memory was again not significant (5 studies; 112 participants; ES = −0.08). Albrecht et al. (2009) carried out meta-analyses of studies which used timed tasks in children (N = 229), adolescents (N = 106) and adults with PKU (N = 174). Tasks assessed Trail making; Simple RT; Choice RT; Interference; N-back; Visual search; and rule identification tasks such as the Wisconsin card sorting test. Effect sizes were highly significant in all age groups (children = −0.31; 95 % CI: −0.43, −0.19; adolescents = −0.13; 95 % CI: −0.29, −0.03; adults = −0.54; 95 % CI: −0.64, −0.44), but they were significantly moderated by the average Phe level only in the child and adolescent groups.
The aim of the current study was to provide a comprehensive picture of the cognitive profile of early-treated AwPKU by considering a large set of functions and tasks as done by Hofman et al. (2018), but also combining the results from different studies through meta-analyses rather than merely counting numbers of significant results. In addition, and in contrast with previous meta-analyses which included mixed-age groups, we limited comparisons to cohorts of people > 16 years old, thus focusing more strictly on outcomes in adulthood. Finally, we used a slightly different measure of ES than previous meta-analyses. We used Glass' Δ as a measure of effect size which standardizes the difference between the means using only the standard deviation (SD) of what can be considered the reference group (the control group in our case). Previous meta-analyses have used Hedges' g ESs, where a difference between means is standardized using a combined SD from both the control and the PKU group. The Glass' Δ ES is more appropriate for clinical groups because one wants to measure the performance of a clinical group against a control group in a way conceptually analogous to a z-score. The fact that Glass' Δ is not affected by the SD of the clinical group is an added advantage, because, in the PKU groups, the SD is generally larger than in controls, since individual variation in performance is increased by differences in disease factors such as differences in metabolic control (e.g., Romani et al., 2019).
Another benefit of our review is that it will provide a snapshot of the characteristics of studies carried out since the early 1990s when the first cohorts of early-treated AwPKU reached adulthood by considering number of participants, age, current Phe levels and cognitive functions assessed. We will assess how ESs are moderated by variables such as average Phe level, age when participants started treatment, or year the study was conducted. More recent studies may include participants with better metabolic control given the adoption of more stringent targets for upper Phe levels. Note, however, that these analyses do not consider variability within studies and thus will underestimate the effects of Phe on cognition. Better estimates should be based on a meta-analysis of within-study correlations (see Waisbren et al., 2007; Fonnesbeck et al., 2013), which is outside the scope of the present study. Our primary aim is to provide estimates of impairment for different cognitive functions. These estimates are important to assess the success of current treatment practices and to have a baseline measure against which to evaluate the success of future policies. Additionally, better knowledge of the typical cognitive profile in PKU will allow affected people, their families, and their clinicians to have more accurate expectations about outcomes and to take strengths and weaknesses of the disease profile into consideration when evaluating educational achievements, job performance, and remediation strategies.

Literature search
We comprehensively searched the literature for published studies which reported cognitive performance in early-treated AwPKU and control groups. We used the Web of Science (WOS) database, which includes the Medline database among others, and Embase. The Pubmed database largely overlaps with Medline and was not separately searched (see Bramer et al., 2017). Databases were searched without any limitation for year of study up to May 2022. The first relevant study was from 1994 (around this time, results from early-treated AwPKU first became available). The following combination of terms was used: PKU OR Phenylketonuria AND Adult* AND cognitive OR IQ OR RTs OR reaction times OR speed OR executive OR inhibition OR flexibility OR attention OR visuo-motor OR spatial OR learning OR language OR memory OR Stroop OR Go no-go.
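The term list as printed does not show how the Boolean operators were grouped. The Python sketch below assembles one plausible reading of it. The parentheses, the quoting of multi-word terms, and the helper name `build_query` are assumptions made for illustration only; they are not the exact syntax submitted to WOS or Embase.

```python
# Hypothetical helper: the grouping and quoting are assumptions, not the
# authors' documented search syntax.
CONDITION_TERMS = ["PKU", "Phenylketonuria"]
POPULATION_TERM = "Adult*"
OUTCOME_TERMS = [
    "cognitive", "IQ", "RTs", "reaction times", "speed", "executive",
    "inhibition", "flexibility", "attention", "visuo-motor", "spatial",
    "learning", "language", "memory", "Stroop", "Go no-go",
]


def build_query(condition_terms, population_term, outcome_terms):
    """Join the three term groups with AND, and the terms inside a group with OR."""
    condition = " OR ".join(condition_terms)
    # Multi-word terms are quoted so that they are treated as phrases.
    outcomes = " OR ".join(f'"{t}"' if " " in t else t for t in outcome_terms)
    return f"({condition}) AND {population_term} AND ({outcomes})"


query = build_query(CONDITION_TERMS, POPULATION_TERM, OUTCOME_TERMS)
print(query)
```

Making the grouping explicit matters because most search engines give AND precedence over OR, so an ungrouped string could be parsed differently from what was intended.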
A flowchart detailing our identification process is shown in Fig. 1. All abstracts and all full articles were read by at least two authors and screened for the following inclusion criteria:
• Participants with PKU were adults aged > 16 years;
• Participants with PKU were early-treated, with treatment started within three months of birth, or were defined as early-treated;
• Outcomes were measured in a quantitative way through a cognitive task (not through a questionnaire);
• An age-matched control group was included;
• Both mean and SD of the PKU group and the matched control group were reported or could be recovered.
Disagreements regarding whether a study could be included in the meta-analysis only occurred in a handful of cases and were addressed and resolved by the first author. Our procedure left 26 articles/publications with potential results to include in the meta-analysis. Given our age criterion, the following studies included in previous adult reviews were excluded because they included some younger participants: from Moyle et al. (2007a): Brunner and Berry (1987); Clarke et al. (1987); Luciana et al. (2001, 2004); Stemerdink et al. (1999); Griffiths et al. (1995); from Bilder et al. (2016): Antenor-Dorsey et al. (2013); Griffiths et al. (1995); Luciana et al. (2001). Albrecht et al. (2009) did not provide a list of adult studies, so the age of their participants could not be checked.

Data extraction
To make sure that only unique sets of data were considered, we used the following criteria:
1. When results included different subgroups, only results for the whole PKU group were included. If only results for the subgroups were available, these were separately reported. Exceptionally, Burgard et al. (1997) assessed a French and a German PKU group; only the German group was included since diet was relaxed very early in the French group (after five years old) and a French control group was not included.
2. When results were reported for the same participants at different times, only one set of results was selected according to the following criteria:
a) Nardecchia et al. (2015) measured cognitive functions over a 14-year gap. Only the measures taken at time 2 were included, since participants were adults only at this time.
b) Weglage et al. (2013) assessed AwPKU at baseline and after five years. Only results at baseline were included, since all participants were already adults at this time and there were no differences between the two testing times.
c) Feldmann et al. (2019) assessed a subgroup of the AwPKU assessed by Weglage et al. (2013) after a further 5 years (10 from baseline). Only the Weglage results at baseline were included.
d) Schmidt et al. (1994) assessed AwPKU in three conditions (no diet at time of testing, diet, no diet again). Only the baseline (Time 1) measures were included.
e) Sundermann et al. (2011) assessed AwPKU after receiving either a Phe loading capsule or a placebo (within-participant design). Only results for the placebo condition were included, whether it was assessed after (group 1) or before Phe loading (group 2).
3. In some studies, overlapping PKU groups were assessed using different tasks. This was the case for Palermo et al., (2017, and for Jahja et al. (2016) and (2017). Results were separately included. However, for overlapping data only results from the most recent study were included.
After applying these criteria, we were able to extract results for 26 separate pairs of PKU and control groups (see Table 5 in the Appendix). For every group, averages and SDs of cognitive measures were entered in a spreadsheet. When several measures were reported for the same task, we limited input to those likely to be more independent. For example, for the peg-board task, we averaged measures from the dominant and non-dominant hand; for the Rey verbal learning test, we reported only two measures: recall over five trials and delayed recall; for the Wisconsin Card Sorting test, we only reported number of categories correct and number of perseverations. However, we always considered speed and accuracy measures separately since they could be affected differently in people with PKU. We included measures of IQ only from studies where participants in PKU and control groups had been 'roughly' matched for IQ by excluding PKU participants with pathologically low IQ. This is a conservative choice. We will see that, in spite of this, PKU and control groups differed in IQ. All raw results were extracted from the studies by two of the authors working together (LB and DP). They were then thoroughly and comprehensively checked (and rechecked) by two other authors (CR and AO), with any disagreement resolved by discussion. Moyle et al. (2007a), Albrecht et al. (2009) and Bilder et al. (2016) all carried out their meta-analyses based on Hedges' g effect sizes, which are calculated by dividing the difference between the PKU and control mean by a weighted standard deviation (SD) that pools the SDs of the two groups. Hedges' g is almost identical to the best-known effect size measure, Cohen's d, but adds a correction for small samples. We carried out the bulk of our analyses, however, using Glass' Δ, reporting Hedges' g only for comparison in Supplementary materials.
The Glass' Δ ES differs because the difference between groups is standardized using only the SD of the reference group (the control group in our case; see Ialongo, 2016). Arguably, therefore, it is a better measure of deviation from normality for clinical samples.

Types of effect sizes
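In their standard forms, the formulas for the three types of ESs (Cohen's d, Hedges' g and Glass' Δ) are:

\[ d = \frac{\bar{X}_E - \bar{X}_C}{S_{pooled}}, \qquad S_{pooled} = \sqrt{\frac{(n_E - 1)\,S_E^2 + (n_C - 1)\,S_C^2}{n_E + n_C - 2}} \]

\[ g = d \left( 1 - \frac{3}{4\,(n_E + n_C) - 9} \right) \]

\[ \Delta = \frac{\bar{X}_E - \bar{X}_C}{S_C} \]

where \bar{X}_C and \bar{X}_E are the mean values of the control group and the PKU group; S_C and S_E are the respective standard deviations (SD); and n_C and n_E are the number of participants in the two groups.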
Following Pustejovsky (2016), Glass' Δ ESs were adjusted for small samples using the same adjustment that is applied to Hedges' g. When only t-values (rather than means and standard deviations) were available (e.g., Aitkenhead et al., 2021), we computed Glass' Δ and its variance according to the formulas given by Pustejovsky (2016).
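To make the contrast between the two measures concrete, the following Python sketch computes both from summary statistics. It is an illustration only, not the code used for the analyses (which were run in R). The function names are hypothetical, and applying the small-sample factor to Glass' Δ with the control-group degrees of freedom is an assumption; the text only states that the same kind of adjustment was applied.

```python
import math


def small_sample_correction(df):
    """Approximate bias-correction factor for a standardized mean difference."""
    return 1.0 - 3.0 / (4.0 * df - 1.0)


def glass_delta(mean_pku, mean_ctrl, sd_ctrl, n_ctrl, correct=True):
    """PKU-minus-control difference scaled by the control SD only."""
    delta = (mean_pku - mean_ctrl) / sd_ctrl
    # Assumption: the correction uses the control-group degrees of freedom,
    # because only the control SD enters the denominator.
    return delta * small_sample_correction(n_ctrl - 1) if correct else delta


def hedges_g(mean_pku, sd_pku, n_pku, mean_ctrl, sd_ctrl, n_ctrl):
    """PKU-minus-control difference scaled by the pooled SD, bias-corrected."""
    df = n_pku + n_ctrl - 2
    pooled_sd = math.sqrt(
        ((n_pku - 1) * sd_pku**2 + (n_ctrl - 1) * sd_ctrl**2) / df
    )
    d = (mean_pku - mean_ctrl) / pooled_sd  # Cohen's d
    return d * small_sample_correction(df)


# Invented summary statistics: the PKU group is more variable than controls.
print(glass_delta(95.0, 102.0, 12.0, 30))
print(hedges_g(95.0, 18.0, 30, 102.0, 12.0, 30))
```

With these invented numbers the larger SD of the PKU group inflates the pooled SD, so Hedges' g comes out smaller in magnitude than Glass' Δ for the same mean difference.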
We checked for unusual effect sizes by calculating the mean and standard deviation of effect sizes for each cognitive function. We excluded a measure when its effect size was more than 2.5 standard deviations from the average effect size for that function. Six measures were excluded on this basis (= 2 % of effect sizes).
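The screening rule can be sketched in a few lines of Python. This is an illustration under stated assumptions: the helper name `flag_outliers` is hypothetical, the sample SD (n − 1 denominator) is used, the mean and SD are computed with the candidate outlier included, and the effect sizes are invented.

```python
from statistics import mean, stdev


def flag_outliers(effect_sizes, threshold=2.5):
    """Mark effect sizes lying more than `threshold` SDs from the function mean."""
    m = mean(effect_sizes)
    s = stdev(effect_sizes)  # sample SD; an assumption, see lead-in
    return [abs(es - m) > threshold * s for es in effect_sizes]


# Invented effect sizes for a single cognitive function.
es_for_function = [-0.40, -0.50, -0.30, -0.45, -0.35, -0.50,
                   -0.40, -0.30, -0.45, -0.35, -0.40, -3.00]
flags = flag_outliers(es_for_function)
kept = [es for es, out in zip(es_for_function, flags) if not out]
print(len(es_for_function) - len(kept), "measure(s) excluded")
```

Note that with a sample SD a single value can only exceed 2.5 SDs when the function has at least about ten measures, so the rule is inactive for functions assessed with very few measures.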

Combining ESs and statistical analyses for individual functions
To compute individual ESs, the control mean of each measure was subtracted from the PKU mean. However, to combine results across tasks and measures we wanted all differences between PKU and control groups to be in the same direction. Therefore, we reversed the difference (multiplied by −1) when a higher mean was indicative of lower performance, as when performance was reported in terms of reaction time or errors. Thus, a negative ES always indicated worse performance for the PKU group.
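The sign convention can be illustrated with a short Python sketch. The function name and the `higher_is_better` flag are hypothetical, and the means are invented.

```python
def aligned_difference(mean_pku, mean_ctrl, higher_is_better):
    """PKU-minus-control difference, flipped so that negative = worse PKU performance."""
    diff = mean_pku - mean_ctrl
    # Reaction times and error counts are "lower is better", so flip the sign.
    return diff if higher_is_better else -diff


# Accuracy (% correct): a lower PKU score is worse, no flip needed.
accuracy_diff = aligned_difference(85.0, 90.0, higher_is_better=True)
# Reaction time (ms): a higher PKU score is worse, so the sign is reversed.
rt_diff = aligned_difference(520.0, 480.0, higher_is_better=False)
print(accuracy_diff, rt_diff)
```

Both differences come out negative, so they can be standardized and averaged together without impairments on speed measures cancelling impairments on accuracy measures.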
When combining ESs it is also important to weight effects according to some study variables. More weight must be given to estimates obtained with more participants, using more tasks, and where more independent groups were involved (i.e., adjusting when several measures came from the same group of participants). To combine ESs by function, we used a random effects model. A fixed effects model assumes that all ESs reflect a fixed, true effect shared across PKU groups and measures. Thus, results would not generalize to different studies with different characteristics. A random effects model, instead, assumes that ESs are sampled from a population of ESs which vary depending on the degree of impairment in different PKU groups and the sensitivity of different tasks. Therefore, estimates consider both sampling error and heterogeneity in the population of ESs. Heterogeneity is smaller when results from different measures are consistent. Moyle et al. (2007a) used either a fixed or a random effects model based on a heterogeneity test. If measures were not significantly heterogeneous, a fixed effects model was used. Albrecht et al. (2009) and Bilder et al. (2016) used a random effects model for all comparisons. We also always used a random effects model as the more conservative and theoretically sound model.
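The difference between the two kinds of model can be illustrated with a simplified Python sketch. It uses the DerSimonian-Laird estimator for an ordinary two-level random effects model, which is a swapped-in technique chosen for brevity: it is not the three-level REML model used for the analyses reported here and it ignores the nesting of measures within PKU groups. The function name and input values are invented.

```python
def random_effects_pool(effects, variances):
    """DerSimonian-Laird pooling; returns (pooled ES, standard error, tau^2)."""
    # Fixed-effect step: inverse-variance weights reflect sampling error only.
    w = [1.0 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Q measures how much the ESs disagree beyond sampling error.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    # Between-ES variance (heterogeneity), truncated at zero.
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)
    # Random-effects step: heterogeneity is added to every sampling variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = (1.0 / sum(w_star)) ** 0.5
    return pooled, se, tau2


consistent = random_effects_pool([-0.5, -0.5, -0.5], [0.04, 0.05, 0.10])
scattered = random_effects_pool([-1.0, -0.2, -0.6], [0.01, 0.01, 0.01])
print(consistent)
print(scattered)
```

When the ESs agree, the heterogeneity estimate is zero and the result coincides with a fixed effects estimate; when they disagree, the standard error widens, which is why the random effects model is the more conservative choice.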
To assess significance, we used mixed-model regression analyses. We used a three-level random effects model because we wanted to account, not only for sampling error and heterogeneity in the population, but also for structure in the participant groups, since some measures were taken from the same group of participants. We used 'measure' nested within 'PKU-group' as a random factor, which clustered together measures which came from the same people. Our model, therefore, accounted for potential differences in sensitivity among measures, for potential differences in ability among groups, and for clusters of measures taken from the same people. We carried out our analyses using the Metafor package from R (see: bookdown.org/MathiasHarrer/Doing_Meta_Analysis_in_R/mlma.html) as follows: M1 <- rma.mv(g, Vg, random = ~ 1 | GroupNum/MeasureNum, test = "z", data = MyData, method = "REML"). Significance of coefficients was assessed using z values. Complete code for the analyses is available at OSF: https://osf.io/awr5e. The materials are also available through GitHub at: https://github.com/olsonac/pku_cognitive_functions_meta_analysis.
Studies with a small sample size could unduly inflate ES estimates if they are published more often when they obtain results matching the expectation that the PKU groups are impaired. We examined this possibility using a funnel plot and a sensitivity analysis. With a funnel plot analysis, ESs are plotted against a measure of study precision. If precision is plotted on the vertical axis, then one expects that the ESs derived by studies with more precision will cluster closer to the mean at the top (closer to the true value of the effect) and be more widely distributed at the bottom, thus producing a plot with a funnel shape. Moreover, if a publication bias exists, one would expect that some studies at the bottom of the distribution will be missing on the side inconsistent with predictions because small studies with inconsistent outcomes will be less likely to be published. We will assess a potential asymmetry in the distribution of a funnel plot using Egger's test (see Egger et al., 1997). In addition, we will carry out a sensitivity analysis where individual ESs are ordered by precision (based on the number of participants contributing to the estimate) and then subdivided into bins which include progressively more ESs of low precision. If there is a publication bias and studies with low precision inflate the estimate, the estimate should increase with the addition of low precision studies (for an outline of this logic see Borenstein et al., 2009, Chapter 30).
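Both checks can be sketched in plain Python. The first function implements the classical Egger regression (standardized effect regressed on precision, with an intercept away from zero indicating asymmetry). The second is a simplified stand-in for the sensitivity analysis, using unweighted cumulative means rather than the model-based estimates per bin; that simplification, the function names, and all input values are assumptions made for illustration.

```python
import math


def egger_regression(effects, std_errors):
    """Regress ES/SE on 1/SE; returns (intercept, slope, SE of intercept)."""
    x = [1.0 / se for se in std_errors]                    # precision
    y = [es / se for es, se in zip(effects, std_errors)]   # standardized effect
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid_var = sum(
        (yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y)
    ) / (n - 2)
    se_intercept = math.sqrt(resid_var * (1.0 / n + x_bar**2 / sxx))
    return intercept, slope, se_intercept


def cumulative_means_by_precision(effects, sample_sizes):
    """Mean ES as progressively less precise (smaller) samples are added."""
    ordered = [es for _, es in
               sorted(zip(sample_sizes, effects), key=lambda p: -p[0])]
    return [sum(ordered[:k]) / k for k in range(1, len(ordered) + 1)]


ses = [0.1, 0.2, 0.3, 0.4]
no_bias = egger_regression([-0.5] * 4, ses)
small_study_bias = egger_regression([-0.3 - 1.0 * se for se in ses], ses)
trend = cumulative_means_by_precision([-0.3, -0.9, -0.5], [60, 10, 30])
print(no_bias[0], small_study_bias[0], trend)
```

In the second invented data set the less precise estimates are systematically more negative, which shows up as a non-zero intercept and as a cumulative mean that drifts away from zero when small samples are added.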
We considered the effects of the Phe level of the PKU samples at the time of assessment in three different ways. When we compared groups of measures (e.g., visuo-spatial vs. language tasks), we factored out Phe level since this could influence the size of the effect for a group of measures. Secondly, we ran bivariate Pearson correlations between individual effect sizes, Phe levels, age of diet initiation, and year of study to assess the relationship among these variables. Finally, we assessed the contribution of the average current Phe (as well as the contribution of age of treatment initiation and year of study) to our ES estimates by considering whether these variables were significant moderators.
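The second and third steps can be illustrated with a minimal Python sketch: a bivariate Pearson correlation between group-level ESs and average Phe, and an inverse-variance weighted slope of ES on Phe as a simple moderator estimate. This is a simplification under stated assumptions: the weights reflect sampling variance only and ignore heterogeneity and the nesting of measures within groups, so it is not the moderator model fitted for the analyses. Function names and data are invented.

```python
import math


def pearson_r(xs, ys):
    """Bivariate Pearson correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def weighted_moderator_slope(effects, variances, moderator):
    """Inverse-variance weighted least-squares slope of ES on a moderator."""
    w = [1.0 / v for v in variances]
    sw = sum(w)
    m_bar = sum(wi * mi for wi, mi in zip(w, moderator)) / sw
    e_bar = sum(wi * ei for wi, ei in zip(w, effects)) / sw
    num = sum(wi * (mi - m_bar) * (ei - e_bar)
              for wi, mi, ei in zip(w, moderator, effects))
    den = sum(wi * (mi - m_bar) ** 2 for wi, mi in zip(w, moderator))
    return num / den


# Invented group-level data: average current Phe (µmol/L) and ES per group.
phe = [600.0, 800.0, 1000.0, 1200.0]
es = [-0.2, -0.4, -0.6, -0.8]
var_es = [0.02, 0.03, 0.04, 0.05]
r = pearson_r(phe, es)
slope = weighted_moderator_slope(es, var_es, phe)
print(r, slope)
```

A negative slope would indicate that groups with higher average Phe show larger impairments; because only group averages enter the calculation, such estimates say nothing about the relationship within groups.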

Cognitive outcome measures
We computed effect sizes for 19 different functions. They can be grouped into different descriptive domains, but our focus is on functions which are theoretically coherent and which can be potentially independently impaired. We considered the following functions, each assessed through different tasks (see also Hofman et al., 2018; Lezak et al., 2012).
Executive functions:
• a) Flexibility and planning:
Visuo-spatial cognition:
• a) Visuo-spatial skills: Rey Figure copy; Complex figure copy; Perceptual judgment task - shape and function.
• b) Visual detection speed: Finger motor speed; Simple detection - RT; Saccade latency.
Social cognition:
• a) Reading the mind in the eyes; ANT Facial Recognition - speed and accuracy; ANT identification of facial emotions - speed and accuracy; Faux-pas Recognition Test - score; Identity Test - speed and accuracy; Affect Selection - speed and accuracy; 3 Face test - speed and accuracy.
We also evaluated IQ and an overall effect size averaged across all functions, except IQ, to have an indication of overall performance.
It should be recognized that how tasks are attributed to functions is not always consistent in the literature. Disagreements, however, do not regard whether a function is involved in a task at all, but rather under which function heading the author chooses to categorize a task. Thus, categorization difficulties are mitigated when, as in our case, several tasks are used in the assessment and heterogeneity is considered. Our attributions of tasks to functions followed common assessment practices and were agreed by the three authors with experience in clinical and experimental neuropsychology (SR, LA, SH).² A task that was particularly difficult to categorize was Digit-symbol coding. It involves writing the appropriate numbers below symbols according to a key. We attributed this task to flexibility and planning because it involves moving back and forth between the symbol list and the key showing the proper associations. However, this is a complex task which also involves visuo-motor control and working memory (it is facilitated by keeping in mind the association between digits and symbols). The fact that it involves a set of different functions (as well as processing speed) could be one reason why it is often severely impaired (e.g., it is the most impaired task in Aitkenhead et al., 2021).
For a complete list of tasks and measures organized by functions, and a brief description of tasks see Supplementary Materials 1. For the number of measures contributing to each function see Table 3.

Characteristics of reviewed studies
The characteristics of the 26 PKU samples included in our meta-analyses and the distribution of studies and participants according to nationality are shown in Tables 1 and 2. A full list of studies and their characteristics is presented in Appendix 1. Participants were all young adults aged up to their early 40s. Within and across studies there were similar numbers of male and female participants. Average current Phe level was well above the current target of 120-600 µmol/L recommended by European guidelines (average of our PKU groups = 954 µmol/L), but there was high variability both across studies and among participants within studies. While treatment always started within three months of age (this was our selection criterion), age of treatment initiation varied within this period. In our analyses, we will consider potential effects of Phe level and age of treatment initiation on ESs.
1 Note: visuo-spatial skills refers to the ability to process and analyse visuo-spatial information; visuo-spatial attention refers to the ability to engage and disengage attention when processing visual displays.
2 For transparency, we report here differences between our categorizations and those of Hofman et al. (2018; H&al). With few exceptions, these differences involved us splitting functions into more homogeneous subgroups, as shown below:
1. Among executive functions, we separated reasoning from flexibility and planning.
2. H&al had a category 'attentional capacity' encompassing a number of tasks that we attributed to different functions. We attributed choice RT, feature search, conjoined search, and detection with distractors to visuo-spatial attention; the Telephone search test to sustained attention; the California Verbal Learning test to verbal LTM; the digit span and n-back to STM/WM; the video tracking task to motor skills; the control condition of the Stroop to naming.
3. We distinguished between visual and verbal memory tasks because of previous evidence of more severe impairments with visual stimuli (see Romani et al., 201).
4. In assessing LTM, we did not distinguish between recall and recognition tasks since these conditions are generally highly correlated.
5. We attributed Hayling Sentence completion part B and Naming-semantic interference to inhibitory control. H&al attributed them to 'complex language skills'.
6. H&al attributed non-word reading and semantic interference effects to complex language tasks. We attributed them to reading and inhibitory control, respectively.
7. We attributed processing prosody and emotional context to complex language skills. H&al attributed them to basic language skills.
8. We attributed Digit-symbol coding to flexibility and planning. H&al attributed it to motor skills (both attributions are equally reasonable).
9. Finally, H&al had a category named 'Processing speed' which brought together tasks which tapped different functions, but where speed of processing was measured, for example, simple motor tasks (e.g., motor screening test, saccadic latencies, simple detection, simple RT, Trail making part A) as well as tasks that involved reasoning (Stockings of Cambridge initial thinking time). We have separated tasks according to function, as well as according to RTs or accuracy measures when available.
All studies assessed AwPKU who were under the care of metabolic centres and had been continuously monitored since infancy. For most studies, it can be assumed that diet was continued at least until adolescence, although this was not always explicitly stated. In two samples, diet was interrupted by 10 years of age: Ris et al. (1994), N = 8 participants; and Ullrich et al. (1996), for an unspecified number of their 8 participants. These samples were only a small proportion of our participants (<16/756, ~2 % of all participants). In Channon et al. (2004), the 25 participants discontinued the diet after 10 years of age, but no more specific details were given. When age of diet discontinuation was reported, it is specified in Appendix 1.
To best appraise the consequences of PKU, PKU and control groups should be matched for the education and SES of the parents because these variables will affect outcomes independent of any effect of PKU. However, among our reviewed studies, 10 of 26 (38 %) matched PKU and control groups for their own education/SES rather than their parents'. This is a conservative choice. It will control for possible environmental effects since the SES of adult participants will be related to that of their parents. It will also, however, reduce any potential differences from controls: education/SES are related to cognition, so matching groups for education/SES will necessarily reduce differences in cognition. Therefore, the profile shown by our review could be attenuated compared to the profile one would have obtained with unmatched samples.
The large number of AwPKU and controls contributing to our meta-analysis gave excellent power to assess overall differences in performance. Moreover, PKU groups were assessed with a variety of tasks tapping different functions (N = 220 different ESs excluding IQ), allowing generalizations across functions. We had less power for the investigation of individual functions since each has been assessed by only a subset of studies: a mean of 5.5 studies (SD = 3.2) and 12 measures (SD = 7.4) per function.

Effect sizes by functions
All individual Glass' Δ ESs for different tasks and measures, organized by function, are reported in the Supplementary materials 2; Hedges' g ESs are also included for comparison. Table 3 shows Glass' Δ estimates for each individual function weighted by variance, number of tasks administered, and number of independent PKU samples who carried out the tasks. Examples of forest plots are shown in Fig. 2. A complete listing of all the forest plots is presented in Supplementary materials 3.
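The two effect-size definitions mentioned above can be illustrated with a minimal Python sketch. The function names and the example means, SDs and sample sizes are hypothetical and are not taken from any study in the review; the small-sample correction shown for Hedges' g is the common approximation, and the exact correction applied in the review is not specified here.

```python
import math

def glass_delta(mean_pku, mean_ctrl, sd_ctrl):
    """Glass' delta: group difference standardized by the control-group SD only."""
    return (mean_pku - mean_ctrl) / sd_ctrl

def hedges_g(mean_pku, sd_pku, n_pku, mean_ctrl, sd_ctrl, n_ctrl):
    """Hedges' g: difference standardized by the pooled SD, with the usual
    approximate small-sample correction factor J = 1 - 3 / (4 * df - 1)."""
    df = n_pku + n_ctrl - 2
    pooled_sd = math.sqrt(
        ((n_pku - 1) * sd_pku ** 2 + (n_ctrl - 1) * sd_ctrl ** 2) / df
    )
    d = (mean_pku - mean_ctrl) / pooled_sd
    return (1 - 3 / (4 * df - 1)) * d

# Hypothetical accuracy scores (higher = better), so a negative value means
# worse performance in the PKU group. For RT measures, where higher = worse,
# the sign would need to be flipped to keep this convention (an assumption).
delta = glass_delta(mean_pku=95.0, mean_ctrl=104.0, sd_ctrl=15.0)
g = hedges_g(95.0, 18.0, 20, 104.0, 15.0, 20)
print(f"Glass' delta = {delta:.2f}, Hedges' g = {g:.2f}")
```

In this made-up case the PKU group has the larger SD, so the pooled SD exceeds the control SD and Hedges' g is smaller in magnitude than Glass' Δ. This shows why the choice of standardizer matters when clinical groups are more variable than controls.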
With the exception of spelling, all functions showed negative ESs, reflecting worse performance of PKU groups compared to controls. The performance of AwPKU was −0.44 of a SD below that of the controls (unweighted average = −0.40) and even more impaired for IQ (ES = −0.60). Not all the pooled effect sizes reached significance given the stringent criteria we used for assessment, but overall, results paint a picture of clear and widespread impairment. Importantly, however, there was great variability among functions, with some showing no or very small differences and others showing a strong impairment. The largest impairments were, in order, in reasoning, visuo-spatial attention RT, sustained attention, visuo-motor control, flexibility and planning, visual learning/LTM, social cognition, higher language skills, verbal STM/WM and visuo-spatial skills (all p < .05, ES range = −1.02 to −0.18). Performance, instead, was normal or close to normal in spelling, visuo-spatial attention accuracy, language accuracy, inhibition, simple RT and naming RT (p > .09). Among language tasks, phonological tasks, which showed a marginally significant impairment (p = .07), have a strong monitoring component. Verbal learning/LTM and Visual STM were also borderline (p = .07 and p = .06). As expected, within-function heterogeneity was often significant since the degree of impairment varied depending on measures and PKU groups.
Table 3. Summary of weighted Glass' Δ effect sizes for different functions. Significance established with a random effects model and taking into consideration variability of tasks and groups of participants (with group nested in study). Highly significant results are shown in bold. Heterogeneity refers to whether effects are or are not homogeneous across measures.

Publication bias
Our overall estimate of ES including IQ was −0.46 (CI = −0.57, −0.35; N = 233). To examine the possibility of a publication bias, we first plotted Glass' Δ effects against the size of the PKU sample as a measure of study precision. The resulting funnel plot showed a moderate but non-significant asymmetry (Egger's test, using PKU sample size as moderator: z = −0.66; p = .51; see Fig. 5 in Appendix 2). To run a sensitivity analysis, we ordered all our individual ESs (N = 233) by precision, based on the number of participants contributing to the estimate, and subdivided the data into 11 cumulative bins. The first bin included the 40 most precise ESs. Each of the following bins included an additional 20 measures (the last one added 13), progressively increasing the proportion of less precise measures. Fig. 3 plots cumulative effect sizes calculated for each of these eleven bins. If a publication bias significantly distorted our estimates, estimates should have increased across the bins. This did not happen. Estimates remained relatively stable around the final estimate of −0.44 (ranging between −0.37 and −0.53) and were never close to the "no difference" value of 0.
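The binning scheme of this sensitivity analysis can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name is hypothetical, precision is approximated by the number of participants, and each cumulative estimate is a simple sample-size-weighted mean rather than the nested random-effects estimate used in the review.

```python
def cumulative_estimates(effects, first=40, step=20):
    """effects: list of (effect_size, n_participants) tuples.
    Returns one (number_of_ESs_included, weighted_mean_ES) pair per bin.
    Weighting by n is an assumption standing in for the review's model."""
    # Most precise (largest n) effect sizes come first.
    ordered = sorted(effects, key=lambda e: e[1], reverse=True)
    # Cut-offs: the first 40, then 20 more each time, then whatever is left.
    cutoffs = list(range(first, len(ordered), step)) + [len(ordered)]
    results = []
    for k in cutoffs:
        subset = ordered[:k]
        total_n = sum(n for _, n in subset)
        mean_es = sum(es * n for es, n in subset) / total_n
        results.append((k, mean_es))
    return results

# Synthetic illustration only: 233 made-up effect sizes, not the review's data.
synthetic = [(-0.44, n) for n in range(10, 243)]
for k, mean_es in cumulative_estimates(synthetic):
    print(k, round(mean_es, 2))
```

With 233 effect sizes, a first bin of 40 and steps of 20, this gives eleven cumulative bins, the last of which adds the remaining 13 measures. A publication bias would show up as a systematic drift of the estimates as less precise measures are added.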

Differences among functions
As noted above, performance was not uniformly affected across functions and measures. Here we assess differences between groups of measures, e.g., between tasks measuring speed vs. accuracy and between language vs. visual tasks. Moreover, we assessed whether ESs remained significant even when we considered tasks where speed was not at a premium.
To assess a contrast between speed vs accuracy, we considered all tasks where performance was measured both ways. The effect size was much stronger for speed than for accuracy measures (N = 39 each; unweighted mean ES = −0.56 vs −0.24; weighted ES = −0.47 vs −0.19). We assessed the significance of this difference using a mixed model analysis with 'type of measure' (RTs vs accuracy) and current Phe of the group as fixed factors, and 'task nested within group' as a random factor. Results showed a significant main effect of type of measure (coefficient for acc/rt difference = −0.25, z = −3.29, p < .001), a marginally significant but small main effect of Phe (coefficient = −0.0007, z = −1.81, p = .07), and a significant interaction between type of measure and group Phe (coefficient = 0.001, z = 2.88, p = .004). Differences between speed and accuracy ESs were generally smaller for groups with higher Phe, which were more uniformly impaired.
To assess the significance of the contrast between verbal and visual stimuli, we ran a mixed model analysis with 'type of measure' (this time: verbal vs visual stimuli) and current Phe of the group as fixed factors, and 'task nested within group' as a random factor. Results showed a significant effect of visual/verbal task-type (difference coefficient = −0.29, z = −2.37, p = .02) but no main effect of Phe (coefficient = −0.0002, z = −0.57, p = .57) or interaction with Phe (coefficient = −0.0001, z = −0.25, p = .81).
Fig. 2. Examples of forest plots showing effect sizes for differences between adults with PKU and age-matched controls grouped by cognitive function. Glass's Δ effect sizes corrected for small sample size are pooled to calculate an overall effect size (black diamond) that takes into account the number of tasks nested within independent PKU samples. PKU worse = PKU worse than controls; PKU better = PKU better than controls. % Wt = relative weight of each study in the calculation of the overall effect size. The study by Aitkenhead et al. (2021) did not report the means and SDs of PKU and control participants, but z scores of the PKU group from the control group. We estimated Glass's Δ effect sizes from these scores.
Finally, we considered whether the ES for all executive functions taken together (tasks involving flexibility, inhibitory control, reasoning and sustained attention) remained significant when we excluded speed measures and/or tasks with a focus on speed (e.g., complete the task as quickly as possible). The effect size remained significant (N = 50; ES = −0.51, z = −4.11, p < .001), indicating that speed is not the only issue in PKU.

Moderation of effect sizes by metabolic control
To assess whether and how effect sizes were moderated by differences in metabolic control among the PKU groups, we carried out correlations where all individual effect sizes for all functions were correlated with three measures:
a) Average Phe level at the time of testing.
b) Age of treatment initiation. Most studies gave some information regarding treatment initiation that we operationalized as follows: soon after birth: score 1 (N = 129); within a month: score 2 (N = 52); within 2 months: score 3 (N = 10); within 3 months: score 4 (N = 19).
c) Year of study. It is possible that clinical care and, therefore, metabolic control was better in PKU cohorts reported in more recent publications.
Pearson r correlations among Glass' Δ ESs, average Phe, age of treatment initiation, and year of study are shown in Table 4. There were significant correlations between Glass' Δ ESs, and Phe, with larger impairments in groups with higher Phe. There were also correlations between age of treatment initiation and Phe, with higher current Phe levels associated with later treatment initiation. Finally, year of study was highly negatively correlated with average Phe level and age of treatment initiation. As expected, more recent studies assessed PKU groups with lower Phe and earlier treatment initiation.
We further considered the influence of concurrent Phe average, age of treatment initiation, and year of study on ESs, including these variables as moderators in a mixed model analysis. The adjusted Glass' Δ was the dependent measure, task nested within PKU group was a random factor, and each of the above variables was used as a moderator. Results showed an influence of Phe (coefficient = −0.0004, z = −1.94, p = .05). There was no main effect of year of study (coefficient = 0.0004, z = 0.07, p = .95) or age of treatment initiation (coefficient = −0.02, z = −0.25, p = .80). There was a year of study X Phe interaction (z = 2.83, p = .005) since earlier studies had a larger range of Phe and, therefore, a clearer effect. There was no age of initiation X Phe interaction (z = −0.71, p = .48). In sum, when Glass' Δ ESs were weighted by precision (rather than using raw values as in the correlation analyses) any effect other than Phe disappeared. The lack of an effect of age of treatment initiation is not surprising given that our index had a very limited range and only gave a rough idea of when metabolic control was reached. Fig. 4 shows a scatter plot of weighted effect sizes by average Phe. A coefficient of −0.0004 means that for every 100 μmol of Phe the effect size decreases by 0.04 SDs. In a normal distribution, 1.6 % of the distribution lies between that SD and the point which identifies 50 % of the area (the mean). Therefore, a decrease of 0.04 SD means losing 1.
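The arithmetic behind this conversion can be checked with a short Python sketch. It assumes a linear moderator effect and a group that starts at the control mean; the function names are hypothetical, and the normal cumulative distribution is computed from the standard error function.

```python
import math

def normal_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def percentile_points_lost(coef_per_umol, phe_increase_umol):
    """Area (in %) between the mean and the shifted position, assuming the
    moderator effect is linear and the starting point is the 50th percentile."""
    shift_sd = abs(coef_per_umol) * phe_increase_umol
    return 100.0 * (normal_cdf(shift_sd) - 0.5)

# Moderator coefficient reported above: -0.0004 SD per umol of Phe.
for increase in (100, 500):
    lost = percentile_points_lost(-0.0004, increase)
    print(f"+{increase} umol Phe -> about {lost:.1f} percentile points")
```

An increase of 100 μmol corresponds to a shift of 0.04 SD, that is, about 1.6 % of the distribution. An increase of 500 μmol corresponds to 0.2 SD, or roughly 8 percentile points.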

General discussion
Our study aimed to provide a snapshot of the state of the literature and current cognitive outcomes in adults with PKU (AwPKU) who were born after the introduction, in the late sixties, of new-born screening policies and consequent early treatment in western countries. We wanted to expand previous reviews by providing estimates of performance for a larger set of cognitive functions assessed with more participants and more tasks. For this purpose, we have computed standardized differences (effect sizes=ESs) between the performance of participants with PKU and controls and then used meta-analyses to combine tasks and measures according to function to provide a profile of PKU outcomes.
Our results come from research articles published from 1994, when the first cohorts of early-treated individuals with PKU reached adulthood, to the present. They show that, at present, cognitive outcomes are suboptimal, resulting, on average, in a reduction in cognitive ability of −0.44 SD (our weighted Glass' Δ ES) for all cognitive skills combined excluding IQ, of −0.60 SD for IQ, and of −0.46 SD overall. These effect sizes correspond to the 33rd, 27th and 32nd percentile in a distribution of 100 scores, which means that, on average, AwPKU lose 17, 23 and 18 index points, respectively, from a mean of 50 compared to age-matched controls (see footnote 3). Impairments of this size would be expected to impact educational and occupational attainment. Our estimates are similar to the averages by Moyle et al. (2007a), Bilder et al. (2016), and Albrecht et al. (2009), but they were obtained considering more studies, more participants, and more functions. This increases our confidence in the results.
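The percentile figures quoted above follow directly from the standard normal distribution, as the following minimal Python sketch shows. The function name is hypothetical; the three effect sizes are the ones reported in the text.

```python
import math

def es_to_percentile(es):
    """Percentile of the control distribution that corresponds to an effect
    size expressed in control SD units (a z-score)."""
    return 100.0 * 0.5 * (1.0 + math.erf(es / math.sqrt(2.0)))

reported = [
    ("all functions, excluding IQ", -0.44),
    ("IQ", -0.60),
    ("overall, including IQ", -0.46),
]
for label, es in reported:
    pct = es_to_percentile(es)
    # Index points lost relative to the control mean (the 50th percentile).
    print(f"{label}: percentile {round(pct)}, points lost {round(50 - pct)}")
```

Rounded to whole numbers, this gives the 33rd, 27th and 32nd percentiles and losses of 17, 23 and 18 index points.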
Suboptimal outcomes were related to suboptimal metabolic control. Average Phe levels in our samples were much higher than the levels recommended by current European guidelines (van Spronsen et al., 2017). We found that an average group increase of 500 µmol of current Phe resulted in losing 8 percentile points (Albrecht et al., 2009 did not find a significant effect in adults, but this was likely due to lack of power) and this estimate is likely to be on the low side since it does not consider intra-sample variability which was high and systematically higher than in control groups (see SDs reported in Supplementary materials 2). It is important to stress, however, that we used only a single value of Phe collected at the time of neuropsychological assessment as an index of dietary control. We did not have enough data to consider the lifetime exposure, or Phe exposures at other ages and we did not consider intra-individual variability. Our review was focused on the profile of early-treated AwPKU and results concerning the relationship with metabolic control have clear limitations (see the later "limitations" section).

Differences among functions
While cognition showed an overall impairment, there were strong differences among functions. The functions which appeared most affected (and significantly affected) were, in order: reasoning (ES = [...]. Impairments of this size mean dropping between 16 and 35 index points from an average of 50 in a normal distribution of 100 scores. Other functions were impaired, but to a lesser extent (higher language skills, working memory and visuo-spatial skills) and still other functions (spelling, visuo-spatial accuracy, language accuracy and inhibition) were unimpaired (see also Brumm et al., 2004 for partially consistent results).
3 Effect sizes can be easily converted into percentiles. An effect size measures the deviation from the control mean in number of SD (or z-scores). Normal distribution tables provide values for the % areas of the normal distribution that correspond to given z scores. If one considers that 50% of the distribution lies to the left of the mean, one can calculate the percentage of the normal distribution which is below a z-point by subtracting the % value provided by the table from 50. This will give the percentile score.
Visuo-spatial attention speed and visuo-motor control were among the most severely affected skills. Visual attention is a skill commonly engaged in daily living when we scan for objects in our visual environment. It is tapped by laboratory tasks where participants are asked to search for a target in a visual display. This skill was not separately considered by other reviews (i.e., it was part of the more general category "attention" in Hofman et al., 2018), but impairments in comparable tasks were reported in previous studies (see the Albrecht et al., 2009 review reporting severe impairments in 'choice reaction time' tasks and Huijbregts et al., 2002 reporting impairments in similar tasks in children with PKU). Possibly related to these impairments, we also found a significant difficulty in tasks involving visuo-motor control, for example, when participants are requested to insert shaped pegs in matching holes or track a visual target with a tool (for consistent results from studies not included in our meta-analyses see Griffiths et al., 1995; Luciana et al., 2001; Weglage et al., 1995). This skill is also important for everyday tasks involving manual dexterity.
In terms of executive functions, not all functions in this domain were impaired to the same degree. There were significant difficulties in tasks involving flexibility (see also Bilder et al., 2016; Brumm et al., 2004; Hofman et al., 2018 for the category 'fluency and set shifting'), sustained attention (see also Bilder et al., 2016; Hofman et al., 2018 for the categories 'attention' and 'vigilance'; Moyle et al., 2007a), and reasoning (see Hofman et al., 2018 for the category 'complex executive functions'), but no significant impairment in inhibition (see Bilder et al., 2016 and Moyle et al., 2007a for contrary results). It is possible that difficulties in inhibitory control reduce with age. However, contrary to what is sometimes assumed, difficulties of inhibitory control are not systematically reported even in children with PKU (see the review of Christ et al., 2010 for failure to find impairments in the Stroop task). Moreover, child studies have not always assessed performance in interference conditions against performance in baseline/control conditions, which does not allow one to establish whether inhibitory control is disproportionally affected (e.g., see Jahja et al., 2014). Future studies should confirm whether or not there are difficulties in this domain, independent of task complexity and other skills. Finally, we found weak, but still significant impairments in verbal STM. It is possible that impairments arise mainly when tasks stress a monitoring, WM component rather than storage (also see Bilder et al., 2016 and Moyle et al., 2007a for no significant STM impairments in AwPKU and Christ et al., 2010 for no impairments in the storage component of working memory in children with PKU).
When we compared speed and accuracy measures in the same tasks, the average effect size was more than two times larger for speed than accuracy. This is consistent with processing speed being a key weakness in PKU (see also Albrecht et al., 2009; Moyle et al., 2007a). Our results, however, showed no speed-accuracy trade-offs in the sense that ESs for accuracy measures were still negative, although not always significant (see Albrecht et al., 2009 for a similar observation). Moreover, impairments in executive functions were significant even when performance was not measured in terms of speed, and speed was not the focus of the task. Together, these results suggest that impairments in AwPKU go beyond a reduction in processing speed (see Janos et al., 2012 for similar results with PKU children supporting impairments beyond a generalized reduction in speed of processing). Additionally, we found that performance was generally worse in visuo-spatial than language tasks (memory tasks with spatial vs verbal stimuli; search tasks vs naming tasks; see Canton et al., 2019; Janzen and Nguyen, 2010 for similar results in child studies). These contrasts among functions point to a cognitive profile which is typical of PKU. We will now turn to discuss the specificity of this profile and its relationship to neurophysiological mechanisms.

Specificity of the PKU cognitive profile compared to other populations
PKU deficits in executive functions and sustained attention overlap with similar deficits seen in other conditions affecting brain functions such as closed-head injuries, degenerative disorders and aging (for an overlap of deficits between children with PKU and ADHD see Stevenson & McNaughton, 2013;Burton et al., 2015; for an overlap between children with PKU and HIV see Bisiacchi et al., 2018). These similarities are not surprising. Complex functions involve networks of brain areas, so that lesions in different regions and in the white matter connecting these regions can all create difficulties. To understand and monitor PKU, what is more important is to establish how the PKU cognitive profile differs from what is seen in other disorders. We will provide two illustrative examples considering ADHD and developmental dyslexia.
In both PKU and in ADHD there may be an overall reduction in speed of processing. However, in PKU this reduction occurs across all trials (see Romani et al., 2018 for a demonstration). Instead, in ADHD, average speed is unduly affected by a relatively small number of very long RTs due to lapses of attention (see Kofler et al., 2013 for a review and meta-analysis). In fact, in ADHD there is a preference for speed over accuracy (for more errors in the face of preserved speed see Mulder et al., 2010; Wilding et al., 2007), while in PKU it is the opposite: speed is slowed to maintain accuracy (see also De Felice et al., 2018). Differences with developmental dyslexia are also marked. In developmental dyslexia, the main difficulties involve spelling, phonological processing, and verbal learning. There are less marked difficulties in executive functions (e.g., Di Betta and Romani, 2006; Romani et al., 2008, 2015). This is the opposite of what we have seen in PKU. The pattern of impairment in PKU is closer to what is seen in conditions involving dopamine imbalances and white matter damage such as Parkinson's disease, multiple sclerosis, and healthy aging, but direct comparisons are not available and any similarity should be considered with great caution (e.g., for similarity with Parkinson's see Evans et al., 2004; Tufekcioglu et al., 2016; Velema et al., 2015). A comparison with aging is particularly important given possible interactions between high levels of Phe and normal degenerative processes that arise from aging.
In older adults, as well as in PKU, there is a reduction of processing speed, but accuracy is minimally impaired or normal (for results in the visuospatial domain see Hommel et al., 2004; Folk and Lincourt, 1996; Foster et al., 1995). In older adults as well, there are impairments in executive functions, particularly those involving flexibility and monitoring (see Bucur and Madden, 2010; Delaloye et al., 2009), but inconsistent deficits in inhibitory control (for lack of differences with the Stroop test, see Delaloye et al., 2009; Verhaeghen and De Meersman, 1998; with the Flanker inhibitory control and attention test, see Hamilton et al., 2017; with the Hayling test, see Delaloye et al., 2009). Furthermore, aging also affects visuo-motor coordination (Hamilton et al., 2017; Volkow et al., 1998) and there are indications of stronger impairments in the visuo-spatial than the verbal domain (see Hale and Myerson, 1996; Lawrence, Myerson, and Hale, 1998; Shafto and Tyler, 2014; but also Park et al., 2002 for contradictory findings). In contrast, however, memory and learning are clearly affected in aging (see Salthouse, 2004), but less affected compared to other functions in PKU.
In PKU, impairments in processing speed may derive from damage to white matter tracts, and more pronounced impairments in visuo-spatial than verbal tasks are consistent with white matter in posterior regions being more vulnerable (see Pilotto et al., 2021 and Anderson and Leuzzi, 2010 for a review). Impairments in tasks involving executive functions may stem from a combination of impairments in white matter tracts (since efficient and fast connections among brain areas are crucial for complex tasks) and depletion of neurotransmitters (i.e., dopamine; see Boot et al., 2017). Similarities with aging may derive from an overlap in neurophysiological damage (for demyelination in aging see Eckert et al., 2010; Grueter and Schulz, 2012; for a decline in brain dopamine activity see Backman et al., 2010). Instead, relatively more preserved memory and naming in PKU may derive from preservation of the subcortical structures involved in memory encoding, and of cortical structures involved in storing information.

Limitations
Our results are important in providing an estimate of outcomes in current cohorts of AwPKU and in highlighting the pattern of spared and impaired abilities. However, it cannot be stressed enough that averages mask strong individual variability (e.g., see Palermo et al., 2017). At an individual level, some AwPKU will display impairments which are much milder or much more severe than the average. Moreover, although cognitive performance is clearly linked to metabolic control, there are indications of exceptions where some individuals are less affected by the toxic effect of high Phe (see Leuzzi et al., 2020; van Vliet et al., 2018). Variability in sensitivity to Phe may be related to environmental factors, but also to individual neurobiological characteristics which may allow some people to minimize the deleterious effects of Phe (see Boot et al., 2017; Dijkstra et al., 2021).
At the group level, several factors may affect our estimates of impairment. Firstly, level of impairment may be underestimated because results are gathered from participants who are in contact with a clinical care team and are willing to engage in research. The metabolic control of AwPKU who are lost to follow-up may be worse. In the opposite direction, the inclusion of older cohorts could potentially overestimate levels of impairment. However, when year of study was entered as a moderator we found no significant effect, showing that impairment was not significantly greater in older cohorts. Lastly, one should consider that future cohorts including older participants may show an increased level of impairment. All our results come from groups of young adults (average 26.5 years old, SD = 4.5) who have generally maintained a PKU diet up to adolescence but have relaxed their diet since. We do not know whether the degree of impairment will remain stable over the years or not. There is little indication of deterioration at present (see Feldmann, 2019), but this is the first generation of early-treated people with PKU to reach middle age and there is some evidence that white matter damage increases with age (see Nardecchia et al., 2015; Mastrangelo et al., 2015). Cognitive performance may deteriorate accordingly when the brain has been exposed to high Phe levels for many years (see Vardy et al., 2020 for a discussion; see also Pilotto et al., 2021 for evidence of possible interactions between the effect of high Phe and aging).
Although we had enough power to detect differences among cognitive functions, our power to estimate level of impairment for some individual functions was more limited. An extreme example was language functions, where narrative skills and spelling were assessed only in one sample of AwPKU. Moreover, we have noted that attribution of tasks to functions has an arbitrary element, since tasks are never pure and typically involve more than one ability. However, the fact that we considered a variety of tasks when estimating ESs and effects were estimated considering the heterogeneity of the measures significantly ameliorated this difficulty.
Finally, our estimates of the impact of metabolic control on cognitive impairments were limited. For most studies we could assume that a phenylalanine-restricted diet, in some form, was maintained until adolescence, but time of diet discontinuation was not provided by most studies. This is, in fact, difficult to establish since diet adherence may vary in degree at different times. Phe levels at different ages were not systematically available, nor were measures of Phe fluctuations, which could be equally important in determining outcomes (see Romani et al., 2017, 2019). Ideally, meta-analytic assessments of the effects of Phe on performance should be based on pooling within-group correlations (see use of this methodology by Waisbren et al., 2008 with PKU children and Fonnesbeck et al., 2013 with mixed-age groups) and considering effects of concurrent Phe, historical Phe levels, and Phe fluctuations to partial out effects. Even stronger results will arise from longitudinal studies where the effects of changes in metabolic control on cognition are assessed within participants, thus controlling for differences in socio-economic status, education, and genetic potential that may confound outcomes in between-participant studies. Unfortunately, these studies are at present very limited (see Thomas et al., submitted for publication).

Conclusions
Our results show that treatment in current PKU cohorts was successful in avoiding the more severe cognitive impairments linked to untreated disease. Still, there remain significant cognitive impairments that vary in magnitude across different functions. The average impairment was close to half a SD below the control mean, but, while some cognitive functions were close to normal (verbal memory, naming, spelling, inhibition, visuo-spatial attention accuracy), others were severely impaired (reasoning, visuo-spatial attention RT, sustained attention, and visuo-motor control) and average group performance implies much more severe impairments in a portion of individuals. These impairments may potentially curtail the aspirations of AwPKU and affect their ability to cope with the commitments of work and family life. Suboptimal outcomes were linked to suboptimal Phe levels suggesting that differences from controls may be further reduced if metabolic control improves with more support, better dietary supplements, and with the increased introduction of pharmacological and genetic treatments. The disease-specific cognitive profile identified by our review will be important to consider when monitoring disease progression and the effectiveness of new treatment interventions (see also the recommendations of the European Guidelines; van Wegberg et al., 2017).

Statements and declarations
There were no conflicts of interests affecting this study. There was no independent funding support for this study from any firm or body.

Appendix 1
See Table 5.

Table 5
Demographic and metabolic characteristics of the PKU groups included in the meta-analysis. NF = number of females; SD = standard deviation; Birth = treated soon after birth; Ed = education; SE = socio-economic status; m = month/s. All groups were continuously monitored; for most groups it is difficult to ascertain until when the diet was maintained, although it can be assumed that it was maintained until adolescence. * Phe as IDC > 18.