Introduction

Schizophrenia is a highly debilitating and complex mental disorder characterized by impairments in integrating sensory and cognitive functions leading to incoherent perception. Schizophrenia often arises in late adolescence or early adulthood and is typically preceded by a high-risk (prodromal) phase, during which subtle neurocognitive impairments and sub-threshold psychotic symptoms usually emerge1. For this reason, increasing research efforts are focused upon identifying predictive neurobiological markers for early diagnosis in individuals at risk. Heightened risk for the development of a psychotic disorder is associated with schizotypal traits2. Substantial overlapping has been found between schizotypy and schizophrenia at genetic, biological and neurocognitive levels3,4,5, strongly supporting the claim of a continuous nature of schizotypy. Specifically, genome-wide association (GWA) research indicates that a vast number of independent polymorphisms confer risk6,7,8 for psychosis proneness, whereas schizophrenia represents the extreme of these multiple quantitative dimensions. These genetic factors explain about 50% of the schizotypic variance3, whereas the remaining variance can be explained by biological9,10,11,12 and psychosocial13,14,15,16 factors. Taken together, these common genetic and environmental underpinnings have led to the assumption that schizotypy reflects the subclinical expression of the symptoms of schizophrenia in the general population17,18,19.

Several studies of patients at different stages of schizophrenia, including the prodromal phases preceding the onset of the disorder, have recently reported abnormal spontaneous alpha oscillations and altered resting-state functional connectivity of the alpha rhythm20,21,22,23. Crucially, the alpha rhythm is generated by a complex interplay between thalamic and cortical pacemakers and propagates via short- and long-range cortico-cortical, cortico-thalamic, and thalamo-cortical connections24,25. It is well known that large bursts of alpha (7–13 Hz) band activity dominate the human electroencephalogram (EEG) during periods of rest26. However, whether abnormalities of the resting-state alpha rhythm are already present in individuals with high schizotypal personality traits, and can be taken as early risk predictors for these individuals, is still unknown. Several lines of evidence support this hypothesis.

Murphy and Ongur27 reported decreased peak alpha frequency in first episode psychosis patients. Specifically, they found alpha slowing in posterior regions, while peak alpha frequency did not decrease significantly in frontal and temporal regions. Further supporting the clinical relevance of abnormal peak alpha frequency to schizophrenia, there is evidence available for therapeutic effects of individualized alpha frequency transcranial magnetic stimulation on the negative symptoms of schizophrenia28.

Moreover, in line with the idea that schizophrenia originates as a disconnection syndrome29, first episode psychosis patients show abnormal functional connectivity, as estimated using the phase lag index (PLI), especially in the alpha rhythm. Hence, alpha PLI, in addition to peak alpha frequency, appears to be clinically informative already at the onset of schizophrenia20.

Interestingly, alpha waves propagate from anterior to posterior regions and from the cortex to the thalamus30,31,32,33, so that cortico-cortical and cortico-thalamo-cortical connections allow frontal regions to drive posterior alpha activity. All in all, this evidence led to the idea that the alpha rhythm plays an important role in top-down processing in healthy conditions34,35,36. Resting-state EEG connectivity studies specifically testing the association between abnormal directionality of the anterior-to-posterior propagating alpha rhythm and schizophrenia risk are, however, currently unavailable. Nevertheless, the altered alpha band activity recorded in ultra-high-risk individuals during an auditory oddball task has been proposed to indicate that a deficit in top-down control exists before the onset of schizophrenia37. In sum, while there is now enough evidence that an abnormal resting-state EEG alpha rhythm already characterizes the onset of schizophrenia, its potential role as an early marker of a predisposition toward schizophrenia in non-clinical populations is still poorly investigated20,38.

To fill this gap in the literature, in the present study we first established the association between sub-clinical schizotypy and specific indices of resting-state EEG alpha oscillatory activity (i.e., individual alpha peak frequency, IAF) and connectivity, using both non-directional (i.e., weighted phase lag index, wPLI) and directional (i.e., time lag index, TLI) indices. The choice of investigating resting-state EEG features, rather than task-based EEG signals, was motivated by both theoretical and practical reasons. From a theoretical standpoint, schizotypy is defined as a stable personality trait, thus possible alterations of EEG activity should already be present during the resting state. On a practical note, due to its simplicity and versatility, resting-state EEG recording can be considered an efficient screening tool that enables task-independent standardized measures for large-scale assessments. Moreover, diagnostic accuracy in early onset psychosis and schizophrenia might be improved using machine learning approaches39, as already demonstrated in studies using genetic and neuroimaging features6,40. Similarly, machine learning methods make it possible to classify EEG features and thus identify clinical conditions based on EEG patterns. Indeed, recent machine learning studies in schizophrenia patients identified altered amplitudes and time lag in frontal event-related potentials41, lower levels of frontal alpha amplitude during working memory tasks42, functional alterations of the alpha power spectrum over occipital-parietal and frontal areas43 and altered thalamo-cortical connectivity44. Therefore, we trained and tested a pattern classifier to create a predictive model able to assess the presence of high schizotypal traits based on the resting-state alpha activity of an individual.

Results

Resting-state EEG activity was recorded in a sample of 48 participants. Participants were divided into two groups based on the presence of schizotypal traits, estimated via Schizotypal Personality Questionnaire (SPQ)45. Two groups of 24 participants were subsequently created, based on SPQ score: a Low Schizotypal Group (LSG) with scores below the 20th percentile (Mean score: M = 7.62, Standard Error of the mean: SE = 0.52), and a High Schizotypal Group (HSG) with scores above the 80th percentile (M = 43.29, SE = 1.29). EEG was recorded from 64 scalp electrodes at rest for two minutes, while participants kept their eyes closed. An Independent Component Analysis (ICA) was performed for each participant to identify topographies reflecting activity in frontal and parieto-occipital areas for both the left and right hemisphere, representing our regions of interest (ROIs) for functional connectivity and alpha activity analyses.

EEG features

Power spectra over the ROIs, reflecting individual alpha frequency and amplitude (i.e., alpha activity), are shown in Fig. 1. In the following, t-tests were adjusted for multiple comparisons, with a corrected significance threshold of p = 0.013 for four comparisons (for details see Methods). Analyses show generally faster alpha frequency, across groups (HSG vs. LSG) and ROIs (frontal and parieto-occipital), in the right hemisphere (M = 10.38 Hz, SE = 0.12 Hz) compared to the left hemisphere (M = 10.25 Hz, SE = 0.11 Hz) (main effect of hemisphere F(1,46) = 10.37, p = 0.002, ηp² = 0.18, 90% CI [0.03; 0.37]). In more detail, a significant three-way interaction (Group × Hemisphere × ROI, F(1,46) = 4.44, p = 0.040, ηp² = 0.09, 90% CI [0.01; 0.23]) suggests that this hemispheric asymmetry in alpha frequency is more prominent at the parieto-occipital than at the frontal level, and specific to the LSG (Mleft = 10.39 Hz, SE = 0.14 Hz; Mright = 10.70 Hz, SE = 0.17 Hz) (one-tailed t(23) = 2.48, p = 0.011, effect size Cohen’s d = 0.51, 90% CI [0.14; 0.86]). Interestingly, a group difference also emerged at the parieto-occipital level, with faster alpha frequency in the LSG (M = 10.70 Hz, SE = 0.17 Hz) compared to the HSG (M = 10.09 Hz, SE = 0.17 Hz) (one-tailed t(46) = 2.60, p = 0.006, d = 0.75, 90% CI [0.25; 1.24]).
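The effect sizes reported above can be cross-checked from the test statistics alone. The sketch below assumes that the effect size accompanying each F test is a partial eta squared and that Cohen's d was derived from t in the conventional way (t divided by the square root of n for within-group contrasts, and the pooled-variance form for between-group contrasts); the function names are illustrative and are not taken from the original analysis code.

```python
import math

def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

def cohens_d_within(t_value, n):
    """Cohen's d for a within-group (paired) contrast: d = t / sqrt(n)."""
    return t_value / math.sqrt(n)

def cohens_d_between(t_value, n1, n2):
    """Cohen's d for a between-group contrast with pooled variance."""
    return t_value * math.sqrt(1.0 / n1 + 1.0 / n2)

# Main effect of hemisphere on alpha frequency: F(1,46) = 10.37
print(round(partial_eta_squared(10.37, 1, 46), 2))
# Left vs. right parieto-occipital IAF within the LSG: t(23) = 2.48, n = 24
print(round(cohens_d_within(2.48, 24), 2))
# LSG vs. HSG parieto-occipital IAF: t(46) = 2.60, 24 participants per group
print(round(cohens_d_between(2.60, 24, 24), 2))
```

Under these assumptions the three calls give 0.18, 0.51 and 0.75, matching the values reported in the text.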

Figure 1

Individual Alpha Frequency (IAF) and Alpha Amplitude. First row: power spectra in the alpha frequency range (7–13 Hz) for left and right frontal and parieto-occipital regions of interest (ROIs), shown for the high schizotypy group (HSG) in red and the low schizotypy group (LSG) in blue. Thin lines indicate single-subject spectra and black lines reflect group means. IAF. Observed alpha frequency peak in Hertz (Hz) for the two groups and for ROIs in both hemispheres (Left vs. Right). Topographies show the scalp distribution of the alpha frequency peak for the two groups and for the difference between groups. Alpha Amplitude. Observed maximum alpha amplitude in power (10*log10(μV²)) for the two groups and for ROIs in both hemispheres. Topographies show the scalp distribution of alpha amplitude for the two groups and for the difference between groups. Corrected significant differences are marked with black asterisks. Uncorrected differences are marked with light-grey asterisks. Error bars represent the standard error of the mean. μV (microvolt); Hz (Hertz).

In line with the notion of a postero-anterior gradient46, our analyses also showed greater alpha amplitude over the parieto-occipital ROI (M = 9.50 μV, SE = 0.95 μV) compared to the frontal ROI (M = 1.26 μV, SE = 0.94 μV) (main effect of ROI, F(1,46) = 135.43, p < 0.001, ηp² = 0.75, 90% CI [0.63; 0.81]). Additionally, as for alpha frequency, a hemispheric asymmetry emerged for alpha amplitude (main effect of hemisphere F(1,46) = 5.13, p = 0.028, ηp² = 0.10, 90% CI [0.03; 0.32]). In particular, the right hemisphere shows lower alpha amplitude (M = 5.16 μV, SE = 0.88 μV) compared to the left hemisphere (M = 5.60 μV, SE = 0.88 μV). Importantly, these patterns of alpha amplitude are modulated differently in the two groups (significant three-way interaction F(1,46) = 6.30, p = 0.016, ηp² = 0.12, 90% CI [0.01; 0.27]). Specifically, in the LSG the inter-hemispheric asymmetry in alpha amplitude is present only over the frontal ROIs, with higher alpha amplitude in the left hemisphere (M = 2.03 μV, SE = 1.27 μV) compared to the right hemisphere (M = 1.13 μV, SE = 1.22 μV) (t(23) = 2.51, p = 0.019, d = 0.51, 90% CI [0.15; 0.86]). In the HSG, an analogous asymmetry is instead shifted back over the parieto-occipital ROI (Mleft = 9.39 μV, SE = 1.51 μV; Mright = 8.37 μV, SE = 1.58 μV) (t(23) = 2.09, p = 0.048, d = 0.43, 90% CI [0.07; 0.77]). Neither of these two contrasts, however, survives correction for multiple comparisons (corrected significance threshold p = 0.013; see Methods).
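The corrected thresholds quoted in this section (0.013 for four comparisons, 0.017 for three) are consistent with a family-wise alpha of 0.05 under either a Bonferroni or a Šidák adjustment. The correction procedure is described in the Methods and is not reproduced here, so the choice between the two is an assumption of this sketch, as are the function names.

```python
def bonferroni_threshold(alpha, n_comparisons):
    """Per-test threshold under a Bonferroni adjustment."""
    return alpha / n_comparisons

def sidak_threshold(alpha, n_comparisons):
    """Per-test threshold under a Sidak adjustment."""
    return 1.0 - (1.0 - alpha) ** (1.0 / n_comparisons)

def survives(p_value, threshold):
    """True when a p value falls below the corrected threshold."""
    return p_value < threshold

for m in (3, 4):
    print(m, bonferroni_threshold(0.05, m), sidak_threshold(0.05, m))

# Alpha-amplitude asymmetries reported above, tested against the 0.013 threshold
print(survives(0.019, 0.013), survives(0.048, 0.013))
```

With four comparisons the two adjustments give 0.0125 and roughly 0.0127, which are indistinguishable at the three decimals reported, so the conclusions in the text do not depend on which one was used.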

Functional connectivity across groups (LSG and HSG) was estimated based on phase connectivity measures between ROIs at the alpha frequency peak (Fig. 2). In the following, t-tests were adjusted for multiple comparisons, with a corrected significance threshold of p = 0.017 for three comparisons (for details see Methods). A between-group difference in wPLI was found for fronto-parieto-occipital connectivity in the right hemisphere (t(46) = 3.26, p = 0.002, d = 0.94, 90% CI [0.43; 1.44]), with the HSG showing a lower wPLI than the LSG (Mhigh = 0.08, SEhigh = 0.01; Mlow = 0.15, SElow = 0.02). No differences in connectivity were found between groups over the other ROIs considered (all ts < 0.90, all ps > 0.37, all ds < 0.26). Analysis of the time lag yielded similar results, further confirming that the differential effect between HSG and LSG is selectively observed between frontal and parieto-occipital ROIs in the right hemisphere (t(46) = 2.85, p = 0.006, d = 0.82, 90% CI [0.32; 1.31]). In more detail, the mean values of the TLI suggest that not only the timing, but also the direction of communication in the right hemisphere differs significantly between the two groups (Mhigh = 7.04 ms, SEhigh = 4.67 ms; Mlow = −11.21 ms, SElow = 4.37 ms). Indeed, while in the LSG the phase of the frontal ROI precedes the phase of the parieto-occipital ROI, in the HSG the opposite trend is observed.

Figure 2

Connectivity Measures. (A) Correlation matrices of the weighted Phase Lag Index for the High Schizotypy (HSG) vs. Low Schizotypy (LSG) Groups in the selected subclusters of electrodes in the frontal and parieto-occipital ROIs. Topographies show grand mean wPLI connectomes for each electrode with wPLI > 0 (non-random connectivity). Bar graphs below show the corresponding mean wPLI in the selected ROIs for connectivity analyses (see Methods). (B) Phase angle distributions over time, plotted as sinusoidal oscillations, in the selected ROIs for connectivity analyses based on time lag. The first row shows oscillations in left and right parieto-occipital ROIs (inter-hemispheric connectivity) for HSG and LSG. The second and third rows show oscillations in the frontal and parieto-occipital ROIs in the left and right hemisphere, respectively (intra-hemispheric connectivity). Bar graphs below show the corresponding mean time lags for inter-hemispheric and for left and right intra-hemispheric connectivity in both groups. Error bars represent the standard error of the mean. ms (milliseconds); Hz (Hertz).

Taken together, EEG data show group differences in the right hemisphere in terms of alpha speed (slower alpha in the HSG) and in the hemispheric distribution of alpha amplitude (asymmetry shifted from frontal to parieto-occipital ROIs for HSG compared to LSG). Moreover, connectivity measures in the HSG support the disconnection syndrome hypothesis29 by pointing to a reduced and even altered fronto-parieto-occipital connectivity in the right hemisphere.

Machine learning pattern classifiers

According to information gathered from the literature on EEG markers of schizophrenia, a pattern classifier has been trained and tested with the aim of creating a predictive model able to assess the presence of schizotypal traits based on the alpha resting state activity and connectivity (Fig. 3). In particular, we have looked at alpha peak frequency, shown to be generally reduced in schizophrenia21,27, alpha amplitude, also shown to be altered in schizophrenia23,47, and measures of fronto-parietal connectivity (including wPLI and TLI), as schizophrenia has been defined as a functional disconnection syndrome within the default mode network encompassing fronto-parietal networks20,48,49.

Figure 3

EEG features included in the machine learning pattern classifier.

Moreover, in order to distinguish between HSG and LSG, these features were included as possible input features of the models either for both the right and left hemisphere or separately for each hemisphere. At the same time, we looked separately at frontal and posterior features, with particular focus on posterior alpha activity and intrahemispheric connectivity. This choice was motivated by an important discrepancy in the schizophrenia literature between studies supporting a general alteration of these features22,23,50, others pointing to a more specific hemispheric alteration20,38,51,52 and, finally, findings revealing altered alpha indices specifically at posterior sites, excluding frontal regions, in patients with first episode psychosis27. Using a tenfold nested cross-validation (CV) procedure, repeated 1000 times, in the test sets of the outer CV we observed a sensitivity of 78.8% (4.9%) [mean (standard deviation)], a specificity of 69.7% (5.2%), a balanced accuracy of 74.3% (3.8%) and an area under the receiver operating characteristic curve (AUC) of 0.83 (0.04). The plot of the average ROC curve across the 10 000 outer loops of the nested CV (10 folds × 1000 repetitions), along with the standard deviation and the 99.9% confidence interval of the average, is shown in Fig. 4. The ranking of the best feature combinations is reported in Fig. 5.
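For readers less familiar with these performance measures, the sketch below shows how sensitivity, specificity, balanced accuracy and AUC relate to one another. It assumes that HSG is treated as the positive class; the function names and the toy scores are illustrative and do not come from the study data.

```python
def sensitivity(true_pos, false_neg):
    """Proportion of positive cases (here assumed to be HSG) correctly identified."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg, false_pos):
    """Proportion of negative cases (here assumed to be LSG) correctly identified."""
    return true_neg / (true_neg + false_pos)

def balanced_accuracy(sens, spec):
    """Mean of sensitivity and specificity."""
    return (sens + spec) / 2.0

def roc_auc(scores_pos, scores_neg):
    """AUC as the probability that a random positive outranks a random negative."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(scores_pos) * len(scores_neg))

# Reported mean sensitivity and specificity, in percent
print(balanced_accuracy(78.8, 69.7))
# Toy classifier scores, for illustration only
print(roc_auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.2]))
```

Averaging the reported mean sensitivity and specificity gives about 74.25%, in line with the reported balanced accuracy of 74.3% once rounding of the individual means is taken into account.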

Figure 4

Average ROC curve. The average ROC curve across 10 000 outer loops of the nested CV (10 folds × 1000 repetitions) along with the standard deviation and the 99.9% confidence interval of the average is shown.

Figure 5

Ranking of feature selection. The relative frequency (%) with which each feature combination was selected across all outer CV folds in 1000 repetitions is shown. The feature combinations are ordered by frequency of occurrence.

The first three combinations included only right hemisphere activity and, overall, accounted for 96.2% of the total occurrences. The remaining 3.8% was accounted for by combinations of left and right hemisphere activity. Feature combinations including only left hemisphere features were never selected.

In general, machine learning results suggest that differences between groups are maximal over the right hemisphere, or in other words, that right hemispheric features of alpha activity are the best predictors of schizotypal traits.

Discussion

In the present study, we used a machine learning approach to identify biomarkers able to predict which individuals are at higher risk of developing schizophrenic symptoms. To the best of our knowledge, our study represents the first application of machine learning techniques to resting-state EEG features for assessing the presence of high schizotypy traits. In particular, we investigated possible alterations of resting-state alpha oscillatory activity, known to be impaired in patients with schizophrenia, in healthy individuals either high or low in schizotypy traits. To this aim, different parameters of resting-state alpha band activity were included in the study: (1) alpha amplitude, (2) individual alpha frequency (IAF), and (3) indices of phase connectivity between distant neural ensembles within the alpha frequency range. Additionally, measures of alpha amplitude and IAF were investigated separately across the two hemispheres (left and right clusters of electrodes) and individually for two different regions of interest (frontal and parieto-occipital clusters), along with both interhemispheric (between parieto-occipital electrodes of the left and right hemisphere) and intrahemispheric (between frontal and parieto-occipital electrodes within each hemisphere) measures of alpha phase connectivity. Crucially, these indices of alpha activity were used as input features of a state-of-the-art pattern classifier, with the aim of building a model able to predict the presence of schizotypy traits in an individual based on resting-state alpha activity, thus establishing a distinct and influential role of alpha oscillatory activity as an electrophysiological marker of the schizotypy dimension.
The pattern classifier used a nested stratified CV loop in which the inner CV loop simultaneously selected the feature combination that best discriminated the presence of schizotypal traits and the best classifier (between a C-SVM with linear kernel and a logistic regression with L2 penalty), along with the optimization of their hyper-parameters.
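The logic of a nested stratified CV can be illustrated with a small, dependency-free sketch. To keep it self-contained, a nearest-centroid rule is swapped in for the C-SVM and logistic regression used in the study, no hyper-parameters are tuned, and the fold counts, feature sets and toy data are all hypothetical. The only point being illustrated is that feature selection happens inside the inner loop, using outer-training data only.

```python
import random

def stratified_folds(labels, k, rng):
    """Split positions 0..len(labels)-1 into k folds, balancing the classes."""
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        for pos, i in enumerate(members):
            folds[pos % k].append(i)
    return folds

def fit_centroids(X, y, rows, feats):
    """Class means over the selected features (stand-in for SVM / logistic regression)."""
    cents = {}
    for cls in sorted(set(y[i] for i in rows)):
        grp = [X[i] for i in rows if y[i] == cls]
        cents[cls] = [sum(r[f] for r in grp) / len(grp) for f in feats]
    return cents

def predict(cents, x, feats):
    """Assign x to the class with the nearest centroid (squared Euclidean distance)."""
    def dist(cls):
        return sum((x[f] - c) ** 2 for f, c in zip(feats, cents[cls]))
    return min(sorted(cents), key=dist)

def n_correct(X, y, train, test, feats):
    """Number of test rows classified correctly after fitting on the train rows."""
    cents = fit_centroids(X, y, train, feats)
    return sum(predict(cents, X[i], feats) == y[i] for i in test)

def inner_cv_accuracy(X, y, rows, feats, k, rng):
    """k-fold CV accuracy computed only on the outer-training rows."""
    hits = 0
    for fold in stratified_folds([y[i] for i in rows], k, rng):
        test = [rows[j] for j in fold]
        held = set(test)
        train = [i for i in rows if i not in held]
        hits += n_correct(X, y, train, test, feats)
    return hits / len(rows)

def nested_cv(X, y, feature_sets, outer_k, inner_k, seed=0):
    """Outer loop estimates performance; inner loop picks the feature set."""
    rng = random.Random(seed)
    hits, chosen = 0, []
    for fold in stratified_folds(y, outer_k, rng):
        held = set(fold)
        train = [i for i in range(len(y)) if i not in held]
        # Model selection sees the outer-training rows only.
        best = max(feature_sets,
                   key=lambda fs: inner_cv_accuracy(X, y, train, fs, inner_k, rng))
        hits += n_correct(X, y, train, fold, best)
        chosen.append(best)
    return hits / len(y), chosen

# Toy data: feature 0 separates the classes, feature 1 is uninformative.
X = [[0.1 * i, (7 * i) % 5] for i in range(20)] + \
    [[5.0 + 0.1 * i, (7 * i) % 5] for i in range(20)]
y = [0] * 20 + [1] * 20
accuracy, chosen = nested_cv(X, y, [(0,), (1,), (0, 1)], outer_k=5, inner_k=4)
print(accuracy, chosen)
```

Recording which feature set wins in each outer fold, as `chosen` does here, is what makes a selection-frequency ranking such as the one in Fig. 5 possible once the procedure is repeated many times.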

Our results indicate that alterations of IAF are present in the high schizotypy group. In particular, high schizotypy seems to be accompanied by a decreased IAF in the right occipital component. The slowing of resting-state alpha activity has been reported in schizophrenia patients21, as well as in first episode psychosis27. Here, we found that this slowing is present also in the high schizotypy population, but restricted to the right parieto-occipital region.

In addition to these changes in IAF, high schizotypals also demonstrated an asymmetry of alpha-amplitude in parieto-occipital regions, with reduced alpha-amplitude in the right hemisphere. Diminished resting-state alpha-amplitude has previously been found both in individuals with psychotic disorders and their healthy relatives22,23, although even opposite results have been reported47, as well as null differences between first episode psychosis patients and healthy controls27. These discrepancies could be due to between-study differences in methodology and EEG data processing, illness chronicity or diagnostic heterogeneity, some of which could be clarified by systematically investigating the identified biomarkers both in clinical and subclinical populations.

Finally, measures of long-range connectivity in the alpha range showed a distinct alteration in the HSG. Specifically, similar to the IAF results, differences between groups emerge only in the right hemisphere, with reduced connectivity between frontal and parieto-occipital areas in the HSG, as measured by wPLI, which furthermore seems to follow the opposite, parieto-frontal direction, as shown by the TLI measures. Moreover, no differences in interhemispheric connectivity were observed between HSG and LSG. Altogether, connectivity analyses point to reduced resting-state functional communication between frontal and parietal areas within the alpha range in the HSG, in line with previous results describing a similar pattern in schizophrenia patients20, but restricted to the right hemisphere in schizotypy.

To test whether individuals high in schizotypy can be identified at the single-subject level from neural indices, namely their resting-state alpha activity, we trained and tested a pattern classifier which, if the hypothesis holds, should be able to successfully differentiate between HSG and LSG. Apart from being able to predict participants' group membership (low vs. high schizotypy) based on all the examined features of resting-state alpha activity (balanced accuracy of 74.3%), two interesting dissociations emerged. The first concerns frontal and parieto-occipital features, with only the latter being able to successfully differentiate between the two groups (frontal features do not seem to contribute to differentiating the two groups; see Fig. 5). This outcome is in line with previous findings, which did not reveal altered alpha indices over frontal regions in patients with first episode psychosis27. Secondly, differences in alpha activity between the HSG and LSG seem to be present only in the right hemisphere (feature combinations including only left hemisphere features were never selected). Over the years, several theories have described schizophrenia as an interhemispheric imbalance. Among the proposed origins of this imbalance, both hypo-functioning and hyper-functioning of one hemisphere have been hypothesized53,54,55, although often without firm empirical grounds to dissociate these alternatives. Our results support an altered pattern within the right hemisphere, showing slower and reduced alpha activity and an exclusively intrahemispheric alteration of communication between frontal and parietal regions.
Following the dimensional approach, one could hypothesize either that the right hemisphere dysfunction is more pronounced in schizophrenia or, alternatively, that the onset of the first psychotic symptoms is triggered by the spread of the right hemisphere dysfunction to the left hemisphere, resulting in a more generalized disconnection syndrome. Current research in schizophrenia does not point to a selectively hindered right hemispheric activity but to a more widespread dysconnectivity, so future research should systematically examine whether the neurophysiological transition from a subclinical to a clinical psychotic condition involves such an interhemispheric spreading.

Conclusions

Overall, our results show that the altered patterns of resting-state alpha activity observed in schizophrenia patients can already be tracked before the onset of psychosis. Specifically, we observe an altered pattern of resting-state alpha oscillatory activity in the high, relative to the low, schizotypy population. Thus, alpha activity seems to represent an important electrophysiological marker, which may signal a higher risk of developing schizophrenia spectrum psychopathology according to the specific indices pointed out by our study. Interestingly, these differences are most evident in the right posterior region and its functional connections with the right anterior cortex. The right parieto-occipital deficit and fronto-parietal disconnection syndrome in the HSG may significantly alter both sensory processing per se and the top-down control of sensory processing. Therefore, this research offers firm ground for future investigations aimed at identifying the patterns of neural and cognitive development that anticipate the disorder in high-risk individuals, and at describing neurocognitive (dys)functioning across the schizophrenia spectrum.

Although it is a valid tool for the detection and measurement of schizotypy45, the SPQ remains a self-report questionnaire and thus faces various methodological issues56,57. Crucially, the current research employed computational methods to establish the relevance of the identified oscillatory features as electrophysiological markers of schizotypy. Specifically, by building a pattern classifier, we were able not only to describe possible differences in alpha activity between HSG and LSG, but also to demonstrate their ability to successfully identify individuals high in schizotypy. Thus, this approach offers a novel, accurate diagnostic tool able to detect biomarkers defining individuals at risk of developing schizophrenia spectrum disorders, based on resting-state alpha activity. In addition, the inclusion of other features (e.g., genetic and neuroimaging data) would likely enhance the overall performance of the model, although this is not always feasible due to time and resource constraints. Moreover, given the relatively small sample, the results obtained with our pattern classifier should be confirmed by extending future machine learning applications to an independent and larger sample of participants. The availability of larger datasets will also pave the way for the adoption of deep learning approaches, which may improve the overall performance.

The main aim of the current study was to identify a configurational pattern of EEG activity that could represent a fingerprint of schizophrenia proneness. As such, the focus of this study was not on comparing EEG activity across different mental disorders. Therefore, using the machine learning analysis of EEG patterns implemented here, future studies should empirically test whether the EEG activity pattern we identified is specific to schizophrenia risk or rather represents a transdiagnostic biomarker of risk for psychopathology.

Finally, the question remains of how altered patterns of alpha activity during rest translate into relevant cognitive processing. Both alpha amplitude and long-range fronto-parietal alpha synchronization have well-established roles in visuo-attentional inhibition and selection58, and occipital alpha peak frequency acts as a temporal and spatial sampling mechanism59,60,61,62,63,64,65,66. Therefore, should we expect these altered patterns to persist even during visuo-attentional tasks, leading to reduced attentional efficiency and altered perception in schizotypy? Future research is expected to address these questions, establishing a tight link between schizotypy and schizophrenia and thus enabling an accurate and detailed description of early markers of psychosis.

Materials and methods

Participants

Participants were selected from a sample of 350 students from the University of Bologna based on the presence of schizotypal traits, estimated via the Schizotypal Personality Questionnaire (SPQ). Two age- (t(46) = 1.56, p = 0.124, d = 0.45, 90% CI [−0.12; 1.02]) and gender-matched (χ² = 0.76, p = 0.38) groups of 24 participants who agreed to take part in the present study were subsequently created, based on SPQ score: a Low Schizotypal Group (LSG) with scores below the 20th percentile (Mean score: M = 7.62, Standard Error of the mean: SE = 0.52), and a High Schizotypal Group (HSG) with scores above the 80th percentile (M = 43.29, SE = 1.29). As a result, a sample of 48 participants (see Table 1 for detailed demographics) was recruited for electrophysiological data collection.

Table 1 Demographics.

Each group included one left-handed participant. All participants gave written informed consent prior to taking part in the study, which was conducted in accordance with the Declaration of Helsinki67 and approved by the Bioethics Committee of the University of Bologna. No participant had a neurocognitive or psychiatric disorder.

EEG recordings

Participants were comfortably seated in a room with dimmed lights. EEG was recorded at rest for two minutes, while participants kept their eyes closed68. A set of 64 Ag/AgCl electrodes was mounted according to the international 10–20 system (Fast’n Easy-Electrode, Easycap, Herrsching, Germany). Additionally, four EOG channels were positioned: on the outer canthi of both eyes, as well as above and below the left eye. The right and left mastoids were used as the online and off-line reference, respectively. The ground was positioned on the right cheek of the participant. All impedances were kept below 10 kΩ. EEG signals were recorded with a band-pass filter of 0.5–50 Hz (as set in Brain Vision Recorder, Brain Products, Gilching, Germany) at a sampling rate of 1000 Hz. Off-line, data were resampled at 500 Hz (function pop_resample in EEGLab) and re-referenced to the average of all electrodes.

All EEG analyses were implemented by custom-made routines developed in Matlab R2013a (The MathWorks, Inc., Natick, Massachusetts, United States) using EEGLab toolbox functions (v. 13.0.1)69.

EEG data processing

Resting-state EEG data were band-pass filtered (using a Hamming windowed sinc FIR filter implemented in the pop_eegfiltnew function in EEGLab) in the alpha frequency range from 6 to 14 Hz, and epoched in 2000 ms temporal windows. An Independent Component Analysis (ICA) was performed for each participant to identify topographies reflecting activity in frontal and parieto-occipital areas for both the left and right hemisphere, representing our regions of interest (ROIs) for alpha analysis (see Fig. 6). The ICA method separates EEG data into distinct information sources (i.e., independent components) and provides the weighted projection from each independent component to each scalp electrode70,71,72. In particular, individual alpha frequency peak (IAF) and alpha amplitude were extracted from individual power spectra separately over each ROI in subclusters of electrodes selected by visual inspection of the identified topographies: frontal ROIs (left electrode cluster: F1, FC1, C1, FC3; right electrode cluster: F2, FC2, C2, FC4), as well as parieto-occipital ROIs (left electrode cluster: PO7, PO3 and O1; right electrode cluster: PO8, PO4 and O2). Weighted phase lag index (wPLI)73 and time lag were used as indices of connectivity. In particular, inter-hemispheric connectivity was estimated between the right and left parieto-occipital ROIs and intra-hemispheric connectivity was estimated in both hemispheres between frontal and parieto-occipital ROIs.

Figure 6

Regions of interest (ROIs). Templates of relevant components for frontal (A, B), left parieto-occipital (C) and right parieto-occipital (D) components.

First, four templates, reflecting spatial topography of the ROIs, were identified via visual inspection within all ICs: two central frontal components and two parieto-occipital components, one for each hemisphere (left and right). Subsequently, for each subject, relevant ICs were identified via automatic multistep correlational template matching (CORRMAP, 0.80 correlation threshold)74. Topographies of ICs labeled as frontal and parieto-occipital components were visually inspected and back-projected to the data for frequency, amplitude and connectivity analyses75. For each participant, a minimum of one and maximum of three components were identified per template.
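The core of the template-matching step can be sketched as a correlation between a template scalp map and each independent component's scalp map. This is a single-pass simplification rather than the iterative multistep procedure of CORRMAP; it assumes that absolute correlation is used because the polarity of an IC map is arbitrary, and that maps are non-constant vectors of electrode weights in the same channel order. The function names are hypothetical; the 0.80 threshold and the cap of three components per template are taken from the text.

```python
import math

def pearson_r(a, b):
    """Pearson correlation between two equally long, non-constant vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def match_components(template, component_maps, threshold=0.80, max_matches=3):
    """Indices of IC scalp maps whose |r| with the template reaches the threshold."""
    scored = [(abs(pearson_r(template, m)), i) for i, m in enumerate(component_maps)]
    keep = sorted((s for s in scored if s[0] >= threshold), reverse=True)
    return [i for _, i in keep[:max_matches]]

# Toy 4-electrode maps: a scaled copy, a polarity-flipped copy, and an unrelated map
template = [1.0, 2.0, 3.0, 4.0]
maps = [[2.0, 4.0, 6.0, 8.0], [4.0, 3.0, 2.0, 1.0], [1.0, -1.0, 1.0, -1.0]]
print(match_components(template, maps))
```

In this toy case the first two maps are retained and the third, whose correlation with the template is about −0.45, is rejected.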

EEG features extraction

Individual alpha peak and alpha amplitude were extracted from the power spectra of each participant using an automated peak-detection algorithm (function RestingIAF in EEGLAB)76. This algorithm uses a Savitzky-Golay filter (SGF, frequency resolution 0.24 Hz, polynomial order 5), which smooths the power spectra and attenuates random noise. Alpha amplitude was defined as the maximum alpha power, expressed in normalized power (10*log10(μV²)). To calculate wPLI, EEG resting-state data were divided into 2500 ms non-overlapping windows77. The cross-spectrum of the time series signals was then calculated, and the wPLI was estimated from the magnitude of the imaginary part of the cross-spectrum. For each participant, wPLI was estimated as a function of the individual alpha frequency peak, and 14 × 14 connectivity matrices were generated over the selected electrode clusters (see above). Time lag estimates the mean phase difference, expressed in milliseconds, between two time series.

EEG features

Individual independent components (ICs) were analyzed in order to extract electrophysiological features for both frontal and parieto-occipital regions of interest (ROIs) in the right and left hemispheres, to be entered into the machine learning pattern classifier.

The following EEG features were extracted:

Individual alpha frequency (IAF)

For each participant, IAF was defined as the frequency in the alpha range (7–13 Hz) containing the maximum power. It was extracted from the individual power spectra in the alpha range using an automated peak-detection algorithm (function RestingIAF in EEGLAB)76.

Alpha amplitude

For each participant, alpha amplitude was defined as the maximum power in the alpha range (7–13 Hz), expressed in normalized power (10*log10(μV²)).
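The two alpha features defined above can be sketched in Python as follows. This is a simplified stand-in and not the RestingIAF algorithm: it takes the maximum of a Savitzky-Golay-smoothed Welch spectrum within 7–13 Hz. The function name, the Welch estimator, smoothing on the dB scale, and the 11-bin filter frame are assumptions; the 0.24 Hz resolution and polynomial order 5 follow the text.

```python
import numpy as np
from scipy.signal import savgol_filter, welch


def alpha_peak(signal, sfreq, band=(7.0, 13.0), freq_res=0.24,
               polyorder=5, frame_len=11):
    """Return (IAF in Hz, alpha amplitude in dB) for one ROI time series.

    frame_len (odd number of frequency bins) is a hypothetical choice.
    """
    # Segment length giving approximately the requested frequency resolution.
    nperseg = int(round(sfreq / freq_res))
    freqs, psd = welch(signal, fs=sfreq, nperseg=nperseg)
    psd_db = 10.0 * np.log10(psd)  # normalized power, 10*log10(power)
    smoothed = savgol_filter(psd_db, window_length=frame_len,
                             polyorder=polyorder)
    mask = (freqs >= band[0]) & (freqs <= band[1])
    idx = np.argmax(smoothed[mask])
    # IAF: frequency of maximum power; amplitude: the power at that peak.
    return freqs[mask][idx], smoothed[mask][idx]
```

The input must contain at least `nperseg` samples (about 4.2 s at 250 Hz) for the requested resolution to be reached.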

Weighted phase lag index (wPLI)

This feature was extracted to quantify functional connectivity in the alpha range. It is a measure of phase-based connectivity calculated at a specific frequency, which accounts only for non-zero phase lag/lead relations between two time series signals73. wPLI is calculated between two neurophysiological signals and can assume values between 0 and 1. Larger values of wPLI reflect a more consistent phase relation between the two signals; if the relation between the two signals is random, the wPLI value is 0. Connectivity between frontal and parieto-occipital ROIs in the right hemisphere was estimated from the averaged wPLI values calculated over the following electrode clusters: right frontal ROI (F2, FC2, C2, FC4) and right parieto-occipital ROI (PO8, PO4, O2). Similarly, left fronto-parieto-occipital connectivity was estimated over the left frontal ROI (F1, FC1, C1, FC3) and the left parieto-occipital ROI (PO7, PO3, O1).
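For a single electrode pair, the wPLI at one frequency can be sketched as the absolute mean of the imaginary part of the cross-spectrum across epochs, divided by the mean of its absolute value. The function name, the Hann taper and the FFT-based spectral estimate below are assumptions for illustration, not the pipeline used in this study.

```python
import numpy as np


def wpli(x_epochs, y_epochs, sfreq, freq):
    """Weighted phase lag index between two signals at one frequency.

    x_epochs, y_epochs: arrays of shape (n_epochs, n_samples).
    """
    n = x_epochs.shape[-1]
    window = np.hanning(n)  # assumed taper
    freqs = np.fft.rfftfreq(n, d=1.0 / sfreq)
    k = np.argmin(np.abs(freqs - freq))  # nearest FFT bin to `freq`
    X = np.fft.rfft(x_epochs * window, axis=-1)[:, k]
    Y = np.fft.rfft(y_epochs * window, axis=-1)[:, k]
    imag = np.imag(X * np.conj(Y))  # imaginary part of the cross-spectrum
    denom = np.mean(np.abs(imag))
    if denom == 0:
        return 0.0
    return float(np.abs(np.mean(imag)) / denom)
```

An ROI-level value would then be obtained by averaging `wpli` over all electrode pairs linking the two clusters, evaluated at each participant's IAF.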

Time lag index (TLI)

This feature adds a further dimension to the wPLI, as it provides information about the directionality of the communication between two synchronized signals78. It represents the mean of the temporal phase lag in the cross-spectrum between the time series signals of the selected clusters and, unlike the wPLI, it offers insight into the temporal dimension of the synchronization. Specifically, TLI is used to determine the averaged phase difference, in milliseconds, between the two signals considered78. Positive values of the TLI indicate a lag in the phase of the first signal with respect to the other, thus indicating the directionality of the communication between the two synchronized signals.
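One way to express a phase difference as a time lag is to divide the phase of the epoch-averaged cross-spectrum by 2πf. The sketch below assumes this definition, a Hann taper, and a sign convention in which positive values mean that the first signal lags the second; the function name is hypothetical and the exact TLI computation of ref. 78 may differ.

```python
import numpy as np


def time_lag_ms(x_epochs, y_epochs, sfreq, freq):
    """Mean phase lag of x relative to y at `freq`, in milliseconds.

    x_epochs, y_epochs: arrays of shape (n_epochs, n_samples).
    Positive output: x is delayed with respect to y (assumed convention).
    """
    n = x_epochs.shape[-1]
    window = np.hanning(n)
    freqs = np.fft.rfftfreq(n, d=1.0 / sfreq)
    k = np.argmin(np.abs(freqs - freq))  # nearest FFT bin to `freq`
    X = np.fft.rfft(x_epochs * window, axis=-1)[:, k]
    Y = np.fft.rfft(y_epochs * window, axis=-1)[:, k]
    # Phase of y minus phase of x, from the epoch-averaged cross-spectrum.
    phase = np.angle(np.mean(np.conj(X) * Y))
    # Convert radians to milliseconds at the analysed frequency.
    return float(1000.0 * phase / (2.0 * np.pi * freqs[k]))
```

Because the phase is only defined within ±π, lags longer than half a cycle (50 ms at 10 Hz) cannot be distinguished from shorter lags of opposite sign.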

EEG data analysis

2 × 2 × 2 mixed-model ANOVAs were performed on IAF and alpha amplitude, with the between-subject factor GROUP (HSG, LSG) and the within-subject factors HEMISPHERE (left and right) and ROI (frontal and parieto-occipital ICs). Specific differences in alpha activity were further tested both for alpha frequency (with paired and independent samples one-tailed t-tests, as a directional hypothesis was formulated14) and alpha amplitude (with paired and independent samples two-tailed t-tests). Between-group planned comparisons were performed on wPLI and TLI using independent samples two-tailed t-tests. p values < 0.05 were considered significant, with the Dunn-Šidák correction for multiple comparisons applied where necessary79, yielding a corrected significance threshold of p = 0.013 for four comparisons (alpha activity analyses) and p = 0.017 for three comparisons (connectivity analyses).
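The corrected thresholds quoted above follow from the standard Dunn-Šidák formula, 1 − (1 − α)^(1/m) for m comparisons at family-wise level α. A short sketch (the function name is ours) reproduces them:

```python
def sidak_threshold(alpha, n_comparisons):
    """Per-comparison significance threshold under the Dunn-Sidak correction."""
    return 1.0 - (1.0 - alpha) ** (1.0 / n_comparisons)


# Four comparisons (alpha activity) and three comparisons (connectivity).
alpha_activity = round(sidak_threshold(0.05, 4), 3)  # 0.013
connectivity = round(sidak_threshold(0.05, 3), 3)    # 0.017
```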

Machine learning pattern classifiers

To train, validate and test the classifier, we employed a tenfold nested stratified CV loop (Fig. 7). Empirical evidence suggests that 5- or 10-fold CV should be preferred to leave-one-out (LOO) CV, as consistently reported by both the current literature80,81,82,83 and the documentation of state-of-the-art machine learning development tools (see, e.g., https://scikit-learn.org/stable/modules/cross_validation.html).

Figure 7

A scheme of tenfold nested CV is represented. The inner CV loop is used to perform feature selection, optimize hyper-parameters and select the best classifier, whereas the outer loop estimates the selected models’ performance.

This strategy allowed us to perform simultaneously, in the inner CV loop, the selection of the feature combination that best discriminated the presence of schizotypal traits and of the best classifier (between C-SVM with linear kernel and logistic regression with L2 penalty), along with hyper-parameter optimization. Indeed, given that it is not possible to define a priori which machine learning algorithm is best for the data and the specific problem at hand84, we used two well-established classifiers (C-SVM with linear kernel and logistic regression with L2 regularization), which are generally appropriate choices for reducing overfitting in a small sample. In particular, for a binary classification task, a C-SVM constructs a hyperplane in a high-dimensional space separating the training data into two classes. Since, in general, the larger the margin the lower the generalization error of the classifier, a good separation is achieved by the hyperplane that has the largest distance to the nearest training data points of any class85. Logistic regression, on the other hand, models the relationship between the categorical dependent variable and one or more independent variables by estimating probabilities using a logistic sigmoid function86. The C hyperparameter of both the C-SVM and logistic regression classifiers is inversely proportional to the regularization strength used during the training phase. For the C-SVM classifier, for example, the choice of the C value is a trade-off between misclassification of training examples and simplicity of the decision surface. A low C value makes the decision surface smooth, while a high C value aims at classifying all training examples correctly. In this study, we varied the C value of both the C-SVM and logistic regression in the set {0.1, 0.2, 0.3}. We refer to specialized reference textbooks for a deeper description of these state-of-the-art systems85,86,87.
Once the best estimator (determined by the best classifier/hyper-parameter/feature combination) maximizing the balanced accuracy was found in the inner CV, it was re-trained on the outer training set and tested on the test set kept out from the outer CV to obtain an unbiased estimation of the model’s prediction error. This procedure was repeated for each fold of the outer CV. Before each training (both in the inner and outer CV), each feature was standardized with reference to the training set only. Test set data were not used in any way during the learning process, thus preventing any form of peeking effect88.
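The nested procedure described above can be sketched with scikit-learn as follows. This is an illustrative outline rather than the released code: the function name and random seeds are assumptions, and the search over feature combinations performed in the inner loop is omitted for brevity. Placing the scaler inside the pipeline ensures that standardization is fitted on training folds only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def nested_cv_balanced_accuracy(X, y, n_splits=10, seed=0):
    """Balanced accuracy on each outer fold of a nested stratified CV."""
    pipe = Pipeline([("scale", StandardScaler()),
                     ("clf", SVC(kernel="linear"))])
    c_values = [0.1, 0.2, 0.3]
    # The inner loop chooses between the two classifiers and their C values.
    param_grid = [
        {"clf": [SVC(kernel="linear")], "clf__C": c_values},
        # LogisticRegression uses L2 regularization by default.
        {"clf": [LogisticRegression(max_iter=1000)], "clf__C": c_values},
    ]
    inner = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    outer = StratifiedKFold(n_splits=n_splits, shuffle=True,
                            random_state=seed + 1)
    # refit=True re-trains the best inner model on the whole outer training set.
    search = GridSearchCV(pipe, param_grid, scoring="balanced_accuracy",
                          cv=inner, refit=True)
    return cross_val_score(search, X, y, scoring="balanced_accuracy", cv=outer)
```

Calling the function with a feature matrix `X` and binary labels `y` returns one balanced-accuracy score per outer fold; each class needs enough samples to populate every inner and outer fold.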

Since the performance and the selected features may vary depending on how the data are split in each fold of the CV, we repeated the nested stratified CV procedure 1000 times, recording how often each feature combination was selected across the folds of the outer CV in each repetition. The average and standard deviation over all repetitions of sensitivity (the proportion of high schizotypes correctly identified as such), specificity (the proportion of low schizotypes correctly identified as such), balanced accuracy, and AUC were computed to obtain a final model assessment score on the test sets of the outer CV. The average ROC curve77 across the 10 000 outer loops of the nested CV (10 folds × 1000 repetitions), along with its standard deviation and the 99.9% confidence interval of the average, was also computed.
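The performance measures listed above can be computed per outer test fold and then averaged, as in the following sketch. The function names and the label coding (1 for high schizotypes, 0 for low schizotypes) are assumptions for illustration.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score


def fold_metrics(y_true, y_pred, y_score):
    """Sensitivity, specificity, balanced accuracy and AUC for one test fold."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    sensitivity = tp / (tp + fn)  # high schizotypes correctly identified
    specificity = tn / (tn + fp)  # low schizotypes correctly identified
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2.0,
        "auc": roc_auc_score(y_true, y_score),
    }


def summarize(per_fold):
    """Mean and standard deviation of each metric over folds and repetitions."""
    return {
        key: (float(np.mean([m[key] for m in per_fold])),
              float(np.std([m[key] for m in per_fold])))
        for key in per_fold[0]
    }
```

Here `y_score` stands for a continuous classifier output such as a decision-function value or predicted probability; each fold must contain both classes for the metrics to be defined.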

We used our own code, freely available at https://github.com/sdiciotti/Schizotypy-prediction, developed in the Python programming language (release 3.9.1) using the scikit-learn module for data analysis.