In this study, we aimed to utilize guided machine learning to test the most promising translational EEG measures (resting EEG power and auditory chirp oscillatory variables) in a large heterogeneous sample of individuals with FXS, and to assess the sensitivity and specificity with which EEG measures that consistently separate FXS from CON at the group level [6, 7, 9, 12, 26] correctly classify individuals. In developing and validating a biomarker for use in clinical applications, the biomarker must map onto a particular mechanistic or biologic process, it must show reliability across multiple testing, and it must be able to provide information about participants at the individual level rather than via group statistics [27]. Although definitive mechanistic explanations for resting and chirp EEG abnormalities in FXS have not been formalized, the translational nature of these measures to preclinical rodent models has provided the background in which to do so [10, 11, 13, 14, 28]. In fact, a number of studies in the fmr1 KO mouse, both in vivo and in vitro, have narrowed the focus for circuit and molecular level understanding of biologic contributors to both resting and chirp EEG findings in FXS. In particular, studies in which fmr1 was conditionally knocked out in excitatory cells in forebrain cortex [17] and inferior colliculus [19] have begun to dissociate features of the chirp EEG, separating subcortically mediated synchronization deficits that are then inherited by auditory cortex from cortically driven gamma power enhancements. Cortically mediated gamma power enhancement is also supported mechanistically by findings of enhanced synchronization between layers 2/3 and 5 in auditory cortex, indicating that local cortical circuits are hyper-excitable in the gamma frequency range [29], whereas subcortical structures drive flexibility to changing task demands such as those presented by the linearly changing synchronization frequency in the chirp task [19].
Resting power abnormalities in FXS may reflect similar mechanisms for gamma power enhancement [16, 30] or may reflect thalamic or hippocampal inhibitory deficits via changes to cross-frequency coupling between theta, alpha and gamma power [9, 12, 31]. Reliability across multiple testing remains an open question, although preliminary work suggests moderate to strong reliability for resting EEG in a wide range of individuals with FXS [32] and moderate reliability for gamma power during auditory tasks, at least in young individuals with FXS [33]. In this study, we utilized guided machine learning to delineate individual level predictions for diagnostic group and subgroup membership (FXS or CON, sex and mosaicism status), to address the third requirement for an effective biomarker.
We predicted that chirp EEG variables would be more successful at reliably differentiating FXS from CON, and that multi-variable composites from the same task would enhance signal-to-noise ratio in neural data and thus be more robust identifiers than any single variable. In slight contrast to our prediction, best-performing variables across chirp and resting EEG performed equally well overall, with the best-performing variables (in each case a composite) classifying FXS from CON with AUC values in the excellent (> .80) range for both tasks. Of note, classification strength from an algorithm based on Bayes' rule, such as was used here, does not directly correspond to prediction strength with frequentist statistics (i.e., there is not a linear relationship between prediction strength and statistical magnitude). Therefore, it is entirely possible to find larger group separation effects for the chirp task statistically while classification of individuals can be performed equally well for both chirp and resting tasks.
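One way to see why group-level effect size and individual-level classification need not scale together is the textbook case of two normally distributed groups with equal variance, where AUC = Φ(d/√2) for a standardized mean difference d. This is a minimal sketch under that simplifying assumption, which is not made in the analyses reported here; the function name and the d values are illustrative only and are not taken from the study data.

```python
import math


def auc_from_effect_size(d):
    """AUC implied by Cohen's d when both groups are normal with equal variance.

    Assumption (illustrative, not the study's model): AUC = Phi(d / sqrt(2)),
    where Phi is the standard normal CDF, Phi(x) = 0.5 * (1 + erf(x / sqrt(2))).
    Substituting gives AUC = 0.5 * (1 + erf(d / 2)).
    """
    return 0.5 * (1.0 + math.erf(d / 2.0))


# Hypothetical effect sizes, chosen only to show how the curve flattens.
for d in (0.0, 0.5, 1.0, 1.5, 2.0, 3.0):
    print(f"d = {d:.1f} -> AUC = {auc_from_effect_size(d):.3f}")
```

Because AUC is bounded at 1, equal steps in d buy progressively smaller gains in AUC, so two tasks with noticeably different group-level effects can still classify individuals about equally well once both are in the upper range.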
The overall best-performing measure for classifying all individuals with FXS was a composite of alpha peak across frontal, posterior, and whole head areas, with an AUC in the excellent (> .90) range. Performance did not noticeably increase for alpha peak measures when applied to more homogeneous sub-groups, suggesting that alpha peak is generally affected across all individuals with FXS, regardless of sex or mosaicism status. This finding is consistent with recent research showing a general slowing of the alpha peak frequency for both males and females with FXS [9] (but see [26] for an example of an intermediate PAF slowing effect in females with FXS).
Sub-group analyses showed more differentiated performance for the remaining EEG tasks and measures. For resting EEG, both frontal theta and all whole-head relative power variables performed relatively poorly for classifying females with FXS from female CON, but performed better for classifying males with FXS and non-mosaic males with FXS from male CON, suggesting that relative power abnormalities are more consistent in individuals with FXS with the least FMRP [34]. Conversely, all power variables, which consisted of absolute power and task-evoked power measures rather than relative power, performed the best at classifying females with FXS for the chirp task, with the highest AUC value over all comparisons (AUC = .9520). The cross-validation error term for this measure was somewhat increased relative to those for other best-performing sub-group analyses, however, which may be reflective of the smaller sample size for females with FXS and must be interpreted with caution. Regardless, absolute power, particularly in the frontal region in which the chirp data was measured, has been recently found to differ in dynamic utilization and timing between males and females with FXS, and may reflect different underlying mechanistic and compensatory processes [35].
Importantly, all best-performing variables for classifying diagnostic groups consisted of combinations of variables rather than any one single variable. One simple explanation for this finding may be the enhancement in signal-to-noise ratio when using multiple measures that reflect similar pathophysiology. A more plausible explanation may be that any one measure is not reliably different in FXS; rather, it is the pattern of changes in EEG across multiple measures that is inherent to the pathophysiology of FXS. One might expect similar concerns when classifying FXS from other conditions like autism spectrum disorder [36] and schizophrenia [37], where some EEG phenotypes may overlap, but the overall pattern of EEG changes across all measures is specific to each diagnostic group. Because the machine learning algorithm used here, the naïve Bayes classifier, evaluates equal contributions from linear combinations of input variables, solutions from this classification strategy readily translate to single linear composite variables that can then be used for clinical correlations and to track response to treatment.
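The signal-to-noise explanation can be illustrated with a small simulation. The following is a minimal sketch, not the pipeline used in this study: it fits a Gaussian naïve Bayes classifier to synthetic data, scores each simulated participant with leave-one-out cross-validation, and compares the AUC of one variable against a composite of several. The group sizes, the number of variables, the group shift, the Gaussian likelihood, and the leave-one-out scheme are all assumptions chosen for illustration.

```python
import math
import random


def fit_gaussian_nb(rows, labels):
    """Return {class: (prior, means, variances)} for a Gaussian naive Bayes model."""
    model = {}
    for cls in set(labels):
        members = [r for r, y in zip(rows, labels) if y == cls]
        n = len(members)
        columns = list(zip(*members))
        means = [sum(col) / n for col in columns]
        # Small constant keeps the variance strictly positive.
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9
            for col, m in zip(columns, means)
        ]
        model[cls] = (n / len(rows), means, variances)
    return model


def log_odds(model, row):
    """Log posterior odds of class 1 versus class 0 for one participant."""
    def log_joint(cls):
        prior, means, variances = model[cls]
        total = math.log(prior)
        # Naive Bayes: per-variable log-likelihoods are simply summed.
        for x, m, v in zip(row, means, variances):
            total += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        return total
    return log_joint(1) - log_joint(0)


def loo_scores(rows, labels):
    """Score each participant with a model fit on all other participants."""
    scores = []
    for i in range(len(rows)):
        train_rows = rows[:i] + rows[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        scores.append(log_odds(fit_gaussian_nb(train_rows, train_labels), rows[i]))
    return scores


def auc(scores, labels):
    """AUC as the probability that a random case outscores a random control."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Synthetic data only (hypothetical values): label 1 stands in for FXS and
# label 0 for CON; every variable is shifted by SHIFT standard deviations.
random.seed(0)
N_PER_GROUP, N_VARIABLES, SHIFT = 150, 4, 1.0
rows, labels = [], []
for label in (0, 1):
    for _ in range(N_PER_GROUP):
        rows.append([random.gauss(SHIFT * label, 1.0) for _ in range(N_VARIABLES)])
        labels.append(label)

auc_single = auc(loo_scores([r[:1] for r in rows], labels), labels)
auc_composite = auc(loo_scores(rows, labels), labels)
print(f"single variable AUC:  {auc_single:.3f}")
print(f"composite AUC:        {auc_composite:.3f}")
```

In this toy setting every variable carries the same modest group difference, so the gain from the composite comes purely from summing independent evidence. It does not distinguish that account from the pattern-based explanation offered above, which would require variables whose individual group differences are unreliable.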
One critical point of discussion is the application for which a particular biomarker will be used clinically. In the case of FXS, a genetic test is definitive for diagnostic purposes and for classifying molecular genetic mosaicism status; EEG would therefore be considered redundant and more time-consuming for these purposes, even with perfect classification performance. The purpose of delineating diagnostic groups and sub-groups for this study was not to replace current gold-standard diagnostic testing, but to determine which EEG phenotypes best characterize the FXS brain at the individual level. In this sense, a reliable marker that is mechanistically linked to specific biologic processes may be more useful than broad behavioral testing or clinical global impression scales in quantifying target engagement and biologic consequences for novel treatments, given that it is individually present in a specific diagnostic group or sub-group, for example non-mosaic males but not mosaic males or females with FXS. Better or more robust treatments may emerge from efforts to differentiate the EEG phenotype of FXS from other NDDs using similar methods as well.
While the findings of this study are robust, given the use of the NBC, and have important implications for the FXS field, this study had some limitations. First, although the naïve Bayes classifier is very robust to smaller sample size, some of the sub-group samples, in particular non-mosaic males, were relatively small compared to the overall sample. However, the cross-validation error was generally robust for the non-mosaic male sub-group on best-performing variables, somewhat mitigating this concern. Second, a biomarker validation study would ideally also consider retest reliability; however, retest data was not available for all individuals (but see [32]) and is thus an ongoing pursuit for future study. Third, we only analyzed classification accuracy within task, and did not evaluate combined accuracy across rest and chirp tasks. This was done deliberately to determine whether one task was inherently better performing, thus reducing participant burden during data collection. Our findings indicate that neither task is clearly superior to the other in terms of classification performance, and thus the recommendation is to utilize the task most fitting to the clinical question and targeted sub-group. Both tasks performed very well at classifying individuals when composite variables were evaluated, indicating that including multiple tasks in an EEG battery as a matter of course simply to increase accuracy may involve diminishing returns. One counterpoint to this recommendation, however, is the short amount of time (5 minutes or less) that was required to collect the eyes-open resting EEG data used in this study, which may not represent sufficient clinical burden to justify removal.
Group-level comparisons and robust translational work have provided a strong background for considering resting EEG and auditory stimulus-evoked EEG tasks such as the chirp task as candidate biomarkers for disruptions to neural processes in FXS [6, 9]. This study marks the first effort to delineate the ability of these biomarkers to individually characterize and classify the FXS brain, a required step in biomarker validation. We found that both resting and chirp EEG robustly classify individuals by diagnostic group and genetically mediated sub-group, showing both sensitivity and specificity against typically developing controls. Future research will determine specificity against similar conditions with overlapping phenotypes (i.e., other NDDs, like idiopathic autism spectrum disorder); however, the use of composite variables that describe patterns of brain activity, rather than single measures in isolation, increases the likelihood of specificity to FXS. The current sample was heterogeneous, reflecting a wide range of clinical severity, age, and medication use and including both sexes, yet composite EEG variables from both tasks performed well in classifying individuals despite these hurdles. This robust performance suggests that the neural composites found here are specific to FXS and not to a research domain criterion-type concept such as intellectual/cognitive ability or autistic features. The differences in sub-group performance across measures may reflect the more sensory focus of the chirp task compared to the more general resting EEG, and thus one task may be more appropriate than another for particular clinical trial applications, depending on the pharmacological target.
The breadth of preclinical work using these highly translational phenotypes, including ideally preclinical work focused on composite variables, will continue to further refine the unique mechanisms contributing to individual differences in these phenotypes, and the overlapping mechanisms inherent to neural phenotypes in FXS as a whole. These simple, short EEG composites represent feasible tasks that reflect a range of clinical characteristics in FXS [6, 7, 9, 26], are a reliable indicator of individual deficit, and are robust to noise from single variables, supporting their use in the development and outcome testing of novel drug targets with increased chance of clinical relevance in randomized controlled trials.