Schizo-Net: A novel Schizophrenia Diagnosis Framework Using Late Fusion Multimodal Deep Learning on Electroencephalogram-Based Brain Connectivity Indices

Schizophrenia (SCZ) is a serious mental condition that causes hallucinations, delusions, and disordered thinking. Traditionally, SCZ diagnosis involves an interview of the subject by a skilled psychiatrist. The process is time-consuming and prone to human error and bias. Recently, brain connectivity indices have been used in a few pattern recognition methods to discriminate neuro-psychiatric patients from healthy subjects. The study presents Schizo-Net, a novel, highly accurate, and reliable SCZ diagnosis model based on a late multimodal fusion of estimated brain connectivity indices from EEG activity. First, the raw EEG activity is pre-processed exhaustively to remove unwanted artifacts. Next, six brain connectivity indices are estimated from the windowed EEG activity, and six different deep learning architectures (with varying neurons and hidden layers) are trained. The present study is the first to consider a large number of brain connectivity indices, especially for SCZ. A detailed study was also performed that identifies SCZ-related changes occurring in brain connectivity, and the vital significance of brain connectivity indices is drawn in this regard to identify the biomarkers of the disease. Schizo-Net surpasses current models and achieves 99.84% accuracy. An optimum deep learning architecture selection is also performed for improved classification. The study also establishes that the late fusion technique outperforms single architecture-based prediction in diagnosing SCZ.


I. INTRODUCTION
Schizophrenia (SCZ) is a serious mental disorder that impairs an individual's ability to think clearly, manage emotions, and behave appropriately. Recurrent episodes of psychosis are a common feature of the disorder [1]. Hallucinations (seeing things or hearing voices that aren't there), delusions (fixed, incorrect beliefs), paranoia, and disorganized thinking are other frequently reported symptoms [2]. In recent years, SCZ has been considered a disorder of "dysconnectivity". According to animal models, dysconnectivity of the pre-frontal cortex is caused by developing hippocampal injuries [3]. The concept of SCZ was soon framed on the view that both heredity and organic brain ailment were implicated [4]. However, until Johnstone et al. published research employing computed tomography in 1976, the organic aspects of the disease were overlooked. The condition usually appears in early adulthood, but men are reported to have a peak incidence about a decade earlier than women [5]. The reason behind this is still unclear. An equally puzzling but consistent finding is the slight excess of winter births among people with SCZ [6]. Thus, environmental factors associated with winter birth also seem to cause neural damage in the fetus or neonate. The cause could be a viral infection, seasonal difference, or other complications during pregnancy or delivery [7], [8]. Nevertheless, a well-defined methodology to treat SCZ is lacking.

A. Background
SCZ-related abnormalities are small and subtle, which makes the disorder difficult to detect without advanced technology. MRI has shown the presence of structural brain abnormalities in SCZ patients. There is also evidence that the abnormalities are neuro-developmental in origin but unfold later in development [9]. However, these abnormalities usually become evident only when the patient develops symptoms. Moreover, as many different brain regions are involved in the neuropathology of SCZ, a disturbance in the functioning of a particular brain region cannot fully explain the range of impairments. Therefore, new models need to be developed and tested to explain neural circuitry abnormalities affecting brain regions, "not necessarily structurally proximal" to each other but functionally interrelated [10]. According to the WHO, SCZ can be treated [11]; however, its treatment involves long-term medication and requires early detection of the disease. Determining nervous system damage is tricky since most symptoms overlap in combinations among the different disorders. Such disorders do not have definitive sources, unique markers, or tests, making the diagnosis of a neurological disease laborious. Usually, a complete medical history and a physical exam are used to diagnose such disorders. Moreover, highly trained and experienced experts are needed for a correct diagnosis. Other tests might be used, including computed tomography (CT), Magnetic Resonance Imaging (MRI), ultrasound, and electroencephalogram. Electroencephalogram (EEG) is a non-invasive technique that records the brain's electrical activity through scalp-connected electrodes. It has been found to help detect a range of neural disorders, viz., brain tumors, inflammations, sleep disorders, and stroke. Due to its portability, non-invasiveness, and excellent temporal resolution, EEG has gained a reputation as a preferred brain imaging tool for diagnosing neurological diseases with high accuracy [12], [13], [14].
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

B. Motivation
In recent years, deep learning methods have been developed to detect various neurological disorders from EEG signals [15], [75] and to solve various classification tasks [76], [77], [78]. Although these methods work well in finding hidden features and patterns in nonlinear data, they struggle to attain high classification accuracy on EEG because the data is highly complex and is frequently accompanied by non-cerebral contamination [61], [62]. Cortically generated EEG is often contaminated by non-cerebral artifacts such as eye blinks, ocular movements, Electrocardiogram (ECG), and Electromyogram (EMG) activity. Therefore, it becomes imperative to clean recorded EEG activity before extracting information and training deep learning models [16], [17]. Various methods for SCZ detection using EEG have been explored. Siuly et al. [18] suggested a method using Empirical Mode Decomposition to handle non-stationary, nonlinear EEG signals, decomposed them into intrinsic mode functions, and calculated 21 statistical features. Among the considered classifiers, the ensemble bagged tree classifier produced the best classification accuracy of 93.21%. In another work, Khare et al. [19] used a combination of time-frequency graphs and CNNs to classify SCZ patients and reported an accuracy of 93.36% using the smoothed pseudo-Wigner-Ville distribution-based time-frequency representation. Oh et al. [20] collected EEG data from 14 healthy subjects and 14 SCZ patients and proposed an 11-layered CNN architecture for the classification. The proposed model generated a classification accuracy of 98.07% and 81.26% for non-subject-based and subject-based testing, respectively. In another work, Jahmunah et al. [21] proposed an automated diagnostic tool to detect SCZ, extracting 157 features from both classes. An optimal feature set of 14 was selected using a t-test [22], and an accuracy of 92.91% was achieved with an SVM. Dvey-Aharon et al. [23] designed an SCZ detection and classification methodology based on advanced analysis of EEG recordings from a single electrode, reporting an accuracy of 88.7%.
Recently, brain connectivity measures (as features) have been deemed suitable for classifying EEG activity. The brain is a large network of neurons, and synchronous neural activities at different regions can provide useful information, referred to as brain connectivity. Connectivity between brain regions can be anatomical (structural), functional (by functional integration of separated regions), or effective (based on a causal relationship) [24]. The relationship between brain regions can be described as a brain network whose vertices and edges correspond to brain regions and their connections. If the edges are weighted, they represent the strength of connections with continuous values. Then, an adjacency matrix can be defined, whose elements are the strengths of the connections between pairs of electrodes [25]. Moon et al. [26] proposed a CNN-based system to learn the representation of neural activities based on brain connectivity to classify emotions in a video by using three different connectivity measures: Pearson Correlation Coefficient (PCC), Phase Locking Value (PLV), and Transfer Entropy (TE). The highest accuracy of 87.36% was obtained using the PLV measure with a kernel size of 5. Phang et al. [27] proposed a DNN with a deep belief network for automated SCZ classification. This study suggested the usage of vector auto-regression-based Directed Connectivity (DC), graph-theoretical Complex Network (CN), and their combination as input features. It is reported that the combined DC-CN features provide better classification performance, with the highest classification accuracy of 95%.

C. Contributions
The present work presents Schizo-Net, a framework for reliable and high-accuracy SCZ detection using late fusion multimodal deep learning on EEG-based brain connectivity measures. To extract information about the disrupted brain connectivity in an SCZ brain, various phenomena such as synchronization between different brain regions, the directionality of the information, and the correlation of signals have been analyzed. Six connectivity measures have been estimated [28], [29], [30], [31], [32], [33]. The estimated measures help quantify synchronization and provide information on directionality and causality. The output of these methods was analyzed and exported as matrices. These feature matrices are then used as input to the neural networks. In the present work, we tested several neural network architectures with varying numbers of neurons and hidden layers. The experiments show that the neural networks were able to learn latent details of the EEG connectivity patterns. Further, a late fusion method is implemented and examined to combine the results of different neural networks trained on different connectivity methods. The contributions of the present study include:
1) The present study is the first to provide a comparative study taking into consideration a large number of brain connectivity indices for SCZ using EEG.
2) We perform late multimodal feature fusion to evaluate the efficacy of combined feature vectors for SCZ diagnosis. For classification, five different DNN architectures are implemented and evaluated. (Section II, III)
3) Compared to the previous studies, Schizo-Net achieves state-of-the-art results for the fused feature vectors.
4) We also demonstrate experimentally that late fusion outperforms single architecture-based prediction. (Table II)
5) Compared to previous studies, a detailed study has also been performed that identifies SCZ-related changes that occur in brain connectivity, and the vital significance of brain connectivity indices is drawn in this regard to identify the biomarkers of the disease. (Section III)

B. Pre-Processing
During recording, ongoing EEG activity is frequently contaminated by non-cerebral artifacts originating from ocular movements, Electrocardiogram (ECG), and Electromyogram (EMG) artifacts. Moreover, other sources, including background electrical disturbances, noise due to instrumentation, and external electromagnetic activity, also affect the recording. This strong presence of artifacts diminishes the EEG signal quality and may cause erroneous classification. Therefore, it is essential to eliminate artifacts and noise before feature estimation [35]. Preprocessing aims to remove artifacts, improve stationarity, and increase the SNR of the recorded EEG activity. To clean contaminated EEG activity, Makoto's preprocessing pipeline is implemented using the EEGLAB toolbox [36] in MATLAB software. Initially, offset correction is performed on the recorded EEG activity. Next, the channel location data is added, which is essential to understand which channel montage is being followed. Since the continuous EEG data needs to be filtered before epoching or artifact removal, a basic Finite Impulse Response filter is used. Filtering before epoching minimizes the introduction of filtering artifacts at epoch boundaries. The main focus is on the alpha band frequency signal, so the lower and higher edges of the frequency pass band are set accordingly [37]. The EEGLAB algorithm automatically estimates the filter order for the FIR filter.
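As a rough illustration of the band-pass step, the following sketch designs a windowed-sinc FIR filter for the alpha band in plain Python. It is not the EEGLAB routine: the Hamming window, the tap count of 501, and the helper names `bandpass_fir` and `gain` are assumptions made for this example, while the 8-12.5 Hz band and the 250 Hz sampling rate are the values quoted elsewhere in the paper.

```python
import math

def bandpass_fir(low_hz, high_hz, fs, num_taps):
    """Windowed-sinc band-pass FIR taps (Hamming window); num_taps should be odd."""
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        if k == 0:
            ideal = 2 * (high_hz - low_hz) / fs
        else:
            # Difference of two ideal low-pass responses gives a band-pass response.
            ideal = (math.sin(2 * math.pi * high_hz * k / fs)
                     - math.sin(2 * math.pi * low_hz * k / fs)) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    return taps

def gain(taps, freq_hz, fs):
    """Magnitude of the filter's frequency response at one frequency."""
    re = sum(h * math.cos(2 * math.pi * freq_hz * n / fs) for n, h in enumerate(taps))
    im = sum(h * math.sin(2 * math.pi * freq_hz * n / fs) for n, h in enumerate(taps))
    return math.hypot(re, im)

# Assumed tap count; EEGLAB estimates the order automatically instead.
alpha_taps = bandpass_fir(8.0, 12.5, fs=250.0, num_taps=501)
```

Evaluating `gain(alpha_taps, 10.0, 250.0)` should give a value close to 1, while frequencies well outside the band, such as 2 Hz or 30 Hz, are strongly attenuated.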
After filtering, the artifacts are automatically rejected, starting from detecting any bad channels in the data. That is, the channels that are either noisy all the time or are completely flat are removed from the dataset. Next, the rejection of bad portions of data is done using Artifact Subspace Reconstruction (ASR) algorithm [38]. ASR first finds the clean portion of the data, which is referred to as the "calibration data." If a particular data region exceeds 20 times the standard deviation of this calibration data, ASR rejects it. Independent Component Analysis (ICA) is the final pre-processing step [39]. It subtracts embedded artifacts from muscle and eye movements and identifies brain sources from which a particular signal originates. We use Information maximization (Infomax) ICA in this study. Compared to FastICA, Infomax ICA rapidly converges and separates the estimated sources more efficiently. It has better results, but it is more mathematically complex than FastICA, thus requiring more computational resources [40]. For identifying decomposed ICs as artifactual/ non-artifactual, the "ADJUST" processing pipeline is followed [72]. The identified artifactual ICs were subtracted, and inverse ICA was performed to regenerate artifact-free EEG activity.

C. Feature Extraction
In this study, multiple connectivity features are used to capture the synchronous neural activity in different brain regions. This is mainly done to get an overall picture of brain connectivity. There are three types of connectivity: structural, functional, and effective. Structural connectivity refers to the connectivity between brain regions that are anatomically connected, while functional connectivity is based on the functional integration of different brain regions. In contrast, effective connectivity is based on a causal relationship [24]. In graph theory, the brain is modeled as a network consisting of nodes representing the brain's different areas or the EEG channels, with edges between them representing the connectivity between the respective nodes [34]. In the present study, six brain connectivity measures are used to compute these matrices, namely Phase-Locking Value (PLV), Partial Directed Coherence (PDC), Phase-Lag Index (PLI), Synchronization Likelihood (SL), Directed Transfer Function (DTF), and Pearson's Correlation Coefficient (COR). Here, PLV and PLI provide information about phase synchronization, whereas DTF and PDC quantify the causal interactions between brain regions and provide the directions of information flow. SL estimates the generalized synchronization, whereas COR provides information on linear correlations within the data. Each preprocessed EEG data segment has a length of 6.5 minutes. Further, the HERMES toolbox [41] is used to perform windowing on the cleaned EEG activity segment with a window length of 1 minute and an overlap of 80% between windowed segments. Windowing gives the calculated indices a temporal resolution and yields more trials. The segmentation led to 25 EEG trials corresponding to each subject. From the HERMES toolbox, the extracted data for each index was in the shape of (1, 28). Each element was a 4D matrix (Channels × Channels × Frequency Band × Windows).
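The windowing arithmetic can be sketched as follows. This is a minimal illustration that assumes a plain sliding window in which only fully contained windows are kept; the helper name `sliding_windows` is hypothetical, and the HERMES toolbox may treat the final partial window differently, so the count produced here need not coincide with the trial count reported by the toolbox.

```python
def sliding_windows(total_s, window_s, overlap_pct):
    """Return (start, end) times in seconds of fully contained sliding windows."""
    step_s = window_s * (100 - overlap_pct) / 100   # 80% overlap of 60 s -> 12 s step
    count = int((total_s - window_s) // step_s) + 1
    return [(i * step_s, i * step_s + window_s) for i in range(count)]

# 6.5 minutes = 390 s, 1-minute windows, 80% overlap (values taken from the text).
windows = sliding_windows(total_s=390, window_s=60, overlap_pct=80)
print(len(windows), windows[0], windows[-1])
```

Under these assumptions consecutive windows start 12 s apart, and the last window ends at 384 s, leaving the final 6 s of the segment unused.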
It is important to express a set of connection features as a 2D matrix to use them as an input to neural networks. The data was therefore transformed into the shape (784, 2), where the first element of each row contains a (19 × 19) matrix and the second element is the label in one-hot encoded format ([1,0] for H; [0,1] for Sz). Thus, all connectivity features were transformed into 19 × 19 connectivity matrices, whose (i, j)-th element represents the connectivity between the i-th and j-th electrodes.

D. Brain Connectivity Measures
1) Phase Locking Value (PLV): Phase locking value (PLV) is a functional connectivity metric that relies on the instantaneous phase of the time series [28]. It assumes that when two brain regions are functionally connected, the difference between the instantaneous phases of their signals should be constant. PLV uses the relative phase difference only and is defined as:
$$\mathrm{PLV} = \left| \left\langle e^{i \varphi_{rel}(t)} \right\rangle \right| = \left| \frac{1}{N} \sum_{n=1}^{N} e^{i \varphi_{rel}(t_n)} \right|$$
where $\varphi_{rel}(t)$ represents the relative phase difference at an arbitrary time t and N is the number of time samples. PLV ranges from 0 to 1, representing completely independent and perfectly synchronized signals, respectively. PLV estimates how the relative phase is distributed over the unit circle. When X and Y are strongly phase synchronized, the relative phase occupies only a small portion of the circle, and the PLV is close to 1. If the signals are not synchronized, the relative phase spreads across the unit circle, and the PLV remains low. When dealing with continuous data rather than evoked responses, PLV is also referred to as "Mean Phase Coherence" [42]. Nevertheless, PLV has certain limitations. It is prone to volume conduction effects. Volume conduction is the transmission of electric current through human tissue towards the sensors; a single source may therefore be seen by multiple electrodes, which can result in spurious PLV values [43].
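A minimal sketch of the PLV computation is given below, taking PLV as the magnitude of the mean unit phasor of the relative phase. It assumes the instantaneous phases have already been extracted (for example with a Hilbert transform, which is not shown); the function name `phase_locking_value` and the synthetic phase series are illustrative only.

```python
import cmath
import math

def phase_locking_value(phase_x, phase_y):
    """Magnitude of the time-averaged unit phasor of the relative phase."""
    total = sum(cmath.exp(1j * (px - py)) for px, py in zip(phase_x, phase_y))
    return abs(total) / len(phase_x)

n = 200
phase_x = [0.1 * k for k in range(n)]

# Constant phase lag: the relative phase stays at one point of the unit circle.
locked = phase_locking_value(phase_x, [p - 0.7 for p in phase_x])

# Relative phase spread evenly over the unit circle: the phasors cancel out.
spread = phase_locking_value(phase_x, [p - 2 * math.pi * k / n for k, p in enumerate(phase_x)])
```

Here `locked` comes out at 1 (up to rounding) and `spread` at essentially 0, matching the two limiting cases described in the text.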
2) Phase Lag Index (PLI): Phase Lag Index (PLI) addresses PLV's limitations, specifically by reducing the volume conduction effect [43]. It measures the asymmetry of the phase difference distribution between two signals [29]. PLI is defined as:
$$\mathrm{PLI} = \left| \left\langle \mathrm{sign}\left( \sin\left( \varphi_{rel}(t) \right) \right) \right\rangle \right|$$
where $\varphi_{rel}(t)$ represents the relative phase difference at an arbitrary time t. To be robust against the presence of common sources, PLI discards phase distributions centered around 0 and π, thus mitigating the volume conduction effect [41]. However, in doing so, PLI also discards a significant component of genuine interactions. In addition, the discontinuity of this measure leaves it susceptible to noise and volume conduction, since tiny perturbations can turn phase lags into leads and vice versa. For small-magnitude synchronization effects, this may become problematic.
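The behaviour of PLI can be illustrated with a short sketch that averages the sign of the sine of the relative phase. As with PLV, the phases are assumed to be available already; the function name `phase_lag_index` and the toy phase values are hypothetical.

```python
import math

def phase_lag_index(phase_x, phase_y):
    """Absolute mean sign of sin(relative phase); zero-lag samples contribute 0."""
    signs = []
    for px, py in zip(phase_x, phase_y):
        s = math.sin(px - py)
        signs.append((s > 0) - (s < 0))
    return abs(sum(signs)) / len(signs)

consistent_lag = phase_lag_index([0.7] * 100, [0.0] * 100)      # x always leads y
zero_lag = phase_lag_index([0.3] * 100, [0.3] * 100)            # as under volume conduction
alternating = phase_lag_index([0.7, -0.7] * 50, [0.0] * 100)    # leads and lags cancel
```

A consistent non-zero lag yields a PLI of 1, whereas zero-lag coupling and a symmetric mix of leads and lags both yield 0, which is why the measure is insensitive to common sources.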
3) Directed Transfer Function (DTF): Directed Transfer Function (DTF) measures the information flow between multivariate spectral components [31]. The DTF is defined in the frequency domain, is based on Granger causality, and models the time series by Multivariate Auto-regressive (MVAR) processes. In the MVAR model, only lagged effects among the time series are modeled, while instantaneous effects are neglected [44]. An MVAR process of order p and dimension M, i.e., $x_1(t), \ldots, x_M(t)$, is given by:
$$\begin{pmatrix} x_1(t) \\ \vdots \\ x_M(t) \end{pmatrix} = \sum_{r=1}^{p} A_r \begin{pmatrix} x_1(t-r) \\ \vdots \\ x_M(t-r) \end{pmatrix} + \begin{pmatrix} \epsilon_1(t) \\ \vdots \\ \epsilon_M(t) \end{pmatrix}$$
where $A_r$ is an M × M coefficient matrix, and $\epsilon_1(t), \epsilon_2(t), \ldots, \epsilon_M(t)$ are independent Gaussian white noises with covariance matrix $\Sigma$. On transforming to the frequency domain,
$$X(f) = A^{-1}(f)\, E(f) = H(f)\, E(f)$$
where $H(f)$ is the system transfer matrix representing the relationships between signals and their spectral characteristics.
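To make the transfer matrix concrete, the sketch below computes H(f) for a two-channel, first-order MVAR model. It assumes the common convention A(f) = I − A_1 e^{−i2πf/fs}, which the text does not spell out, and uses a closed-form 2 × 2 inverse; the function name `transfer_matrix_2ch` and the coefficient values are hypothetical.

```python
import cmath

def transfer_matrix_2ch(a1, freq_hz, fs):
    """H(f) = A(f)^-1 for a 2-channel MVAR(1), with A(f) = I - A1 * exp(-i 2 pi f / fs)."""
    z = cmath.exp(-2j * cmath.pi * freq_hz / fs)
    a = [[1 - a1[0][0] * z, -a1[0][1] * z],
         [-a1[1][0] * z, 1 - a1[1][1] * z]]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    # Closed-form inverse of a 2 x 2 matrix.
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]

# Hypothetical coefficients: channel 0 drives channel 1 (a1[1][0] != 0), not the reverse.
a1 = [[0.5, 0.0],
      [0.4, 0.5]]
h = transfer_matrix_2ch(a1, freq_hz=10.0, fs=250.0)
```

With these coefficients the element `h[1][0]` (influence of channel 0 on channel 1) is non-zero, while `h[0][1]` vanishes, so the transfer matrix already encodes the direction of the lagged coupling.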
Here, DTF is defined by:
$$\mathrm{DTF}_{ij}^{2}(f) = \frac{\left| H_{ij}(f) \right|^{2}}{\sum_{m=1}^{M} \left| H_{im}(f) \right|^{2}}$$
where $H_{ij}(f)$ is an element of the transfer matrix $H(f)$ of the MVAR model. DTF represents the causal influence of channel j on channel i at frequency f. This connectivity measure determines the directional influences between any given pair of channels in a multivariate data set.
4) Partial Directed Coherence (PDC): Partial Directed Coherence (PDC) provides a frequency-domain measure based on Granger causality [32]. DC tells us whether and how two structures under consideration are functionally connected. DC emphasizes their relative structural links by breaking their interactions into "feed-forward" and "feedback" components, whereas ordinary coherence focuses on the structures themselves and the reciprocal synchronicity of their activity. PDC was developed as a result, and it leads to structural information while simultaneously modeling several time series. It is based on modeling time series by multivariate autoregressive (MVAR) processes and is used to reveal the direction of information flow between different brain areas. Mathematically:
$$\mathrm{PDC}_{uv}(f) = \frac{\left| A_{uv}(f) \right|}{\sqrt{a_v^{*}(f)\, a_v(f)}}$$
where $A_{uv}(f)$ is an element of $A(f)$, which is the Fourier transform of the MVAR model coefficients $A(t)$, $a_v(f)$ is the v-th column of $A(f)$, and the asterisk denotes the transpose and complex conjugate operation. PDC takes values in [0, 1] because of the normalization criterion. It displays direct channel flows, and unlike the DTF, it is normalized to provide the ratio of the outflow from channel v to channel u to the total outflow from the source channel v, emphasizing sinks rather than sources. The observed flow intensities are influenced by the PDC normalization [45].
5) Synchronization Likelihood (SL): Synchronization likelihood (SL) [30] is a widely used metric for estimating generalized synchronization in neurophysiological data, and it is strongly connected to the idea of generalized mutual information [46].
It focuses on identifying concurrent patterns to provide a normalized approximation of the dynamical inter-dependencies between multiple time series. As a result, SL is a truly multivariate measure. Consider M time series $x_1(t), \ldots, x_M(t)$. Embedded vectors $X_{p,t_1}$ are reconstructed from each time series as given in Eq. (8):
$$X_{p,t_1} = \left( x_p(t_1),\; x_p(t_1 + l),\; \ldots,\; x_p\left(t_1 + (d-1)\,l\right) \right) \tag{8}$$
Here, p varies from 1 to M (channel number), $t_1$ varies from 1 to N (discrete time), l is the lag, and d is the embedding dimension. Eq. (9) defines the probability that two embedded vectors are closer to each other than a distance ϵ:
$$P^{\epsilon}_{p,t_1} = \frac{1}{2\,(w_2 - w_1)} \sum_{\substack{t_2 = 1 \\ w_1 < |t_1 - t_2| < w_2}}^{N} \theta\left( \epsilon - \left| X_{p,t_1} - X_{p,t_2} \right| \right) \tag{9}$$
Here, θ is the Heaviside step function, which equals 1 for every positive input and zero otherwise, $w_1$ is a window used to correct for auto-correlation effects and should be of the order of the auto-correlation time or more, and $w_2$ is a window that sharpens the time resolution of the synchronization measure. Now, for each value of p and $t_1$, the critical distance $\epsilon_{p,t_1}$ is determined for which $P^{\epsilon_{p,t_1}}_{p,t_1} = p_{ref} \ll 1$. Then, for each discrete time pair $(t_1, t_2)$ within the considered window ($w_1 < |t_1 - t_2| < w_2$), the number of channels $H_{t_1,t_2}$ for which the embedded vectors $X_{p,t_1}$ and $X_{p,t_2}$ are closer together than this critical distance $\epsilon_{p,t_1}$ can be determined:
$$H_{t_1,t_2} = \sum_{p=1}^{M} \theta\left( \epsilon_{p,t_1} - \left| X_{p,t_1} - X_{p,t_2} \right| \right) \tag{10}$$
This value, which ranges from 0 to M, indicates how many of the embedded signals "resemble" one another. Eq. (11) defines a synchronization likelihood $S_{p,t_1,t_2}$ for each channel p and each discrete time pair $(t_1, t_2)$:
$$S_{p,t_1,t_2} = \begin{cases} \dfrac{H_{t_1,t_2} - 1}{M - 1}, & \text{if } \left| X_{p,t_1} - X_{p,t_2} \right| < \epsilon_{p,t_1} \\[1ex] 0, & \text{otherwise} \end{cases} \tag{11}$$
The synchronization likelihood $S_{p,t_1}$ is the average over all $t_2$:
$$S_{p,t_1} = \frac{1}{2\,(w_2 - w_1)} \sum_{\substack{t_2 = 1 \\ w_1 < |t_1 - t_2| < w_2}}^{N} S_{p,t_1,t_2}$$
6) Pearson's Correlation Coefficient (COR): Pearson's correlation coefficient (COR) is used to detect linear dependencies. It calculates the time-domain linear correlation between two signals x(t) and y(t) at zero lag [33]. It is defined as:
$$\rho_{x,y} = \frac{\mathrm{cov}(x, y)}{\sigma_x\, \sigma_y}$$
where $\rho_{x,y}$ is the correlation coefficient between signals x and y, $\sigma_x$ and $\sigma_y$ are the standard deviations of x and y, respectively, and cov(x, y) is the covariance. Also, $-1 \le \rho_{x,y} \le 1$, where −1 represents complete inverse linear correlation, 0 represents no linear dependence, and 1 represents complete linear dependence between the two signals.
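As a small worked example of the correlation measure, the sketch below computes the zero-lag Pearson coefficient for every channel pair and assembles the result into a channels × channels matrix. The helper names `pearson` and `correlation_matrix` and the three toy channels are illustrative and are not taken from the toolbox used in the study.

```python
import math

def pearson(x, y):
    """Zero-lag Pearson correlation: cov(x, y) / (sigma_x * sigma_y)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (dev_x * dev_y)   # the 1/n factors cancel

def correlation_matrix(channels):
    """Channels x channels matrix of pairwise correlations."""
    return [[pearson(a, b) for b in channels] for a in channels]

toy_channels = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],      # scaled copy of the first channel
    [4.0, 3.0, 2.0, 1.0],      # reversed copy of the first channel
]
cor = correlation_matrix(toy_channels)
```

The resulting matrix is symmetric with ones on the diagonal; the scaled copy correlates at +1 with the first channel and the reversed copy at −1.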

E. Proposed Neural Network Architectures
In the present work, five different neural network architectures are tested and analyzed (see Table I). Several design choices are involved in designing a neural network, such as the activation functions, the loss function, the optimizer, and the number of neurons. All these choices are discussed in this section.
For the hidden layers, the Rectified Linear Unit (ReLU) is applied as the activation function. ReLU is among the simplest and most commonly used activation functions. Unlike other conventional activation functions, ReLU does not saturate for large input values: it returns zero if the input is negative or zero, and returns the input itself for all positive values [47]. This helps mitigate the vanishing gradient problem, allowing the model to train quickly and achieve high accuracy [48]. It can be simply defined as $y = \max(0, x)$, where x and y are the input and output, respectively. For the output layer, we use Softmax as the activation function due to its ability to convert a vector of numbers into a vector of probabilities. It is configured to output n values, where n is the number of classes, and also normalizes these values. Each value in the output is interpreted as a probability; thus, the sum of all output values equals 1 [47], [49]. It can be defined as:
$$\sigma(\vec{z})_i = \frac{e^{z_i}}{\sum_{j=1}^{n} e^{z_j}}$$
Here, $\vec{z}$ is the input vector, and n is the number of classes in our classifier. We use binary cross-entropy as the loss function in our neural network architectures. It can be mathematically defined as:
$$L = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log(p_i) + (1 - y_i) \log(1 - p_i) \right]$$
where $y_i$ is the actual output value, $p_i$ is the predicted probability of output 1, and N is the number of samples. Adam is used as the optimizer due to its ability to achieve results faster than other optimizers [50]. The final results are obtained after averaging over 50 iterations of Monte Carlo cross-validation with stratification [51]. Architecture 1 contains two layers: a single hidden layer with 2 neurons and ReLU activation, and an output layer with Softmax activation. Similarly, Architectures 2, 3, and 4 contain a single hidden layer with 4, 16, and 32 neurons, respectively. Architecture 5 consists of 2 hidden layers with 16 and 32 neurons, respectively.
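The building blocks described in this section can be sketched in plain Python as follows. This is an illustration rather than the authors' implementation: it assumes the 19 × 19 connectivity matrix is flattened into 361 inputs, uses untrained random weights, and shows the forward pass only; `dense`, `init_layer`, and `forward` are hypothetical helper names.

```python
import math
import random

def relu(values):
    return [max(0.0, v) for v in values]

def softmax(logits):
    peak = max(logits)                      # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    losses = []
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)     # clip to avoid log(0)
        losses.append(-(y * math.log(p) + (1 - y) * math.log(1.0 - p)))
    return sum(losses) / len(losses)

def dense(inputs, weights, biases):
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]

def init_layer(n_in, n_out, rng):
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def forward(features, layers):
    """ReLU on every hidden layer, softmax on the output layer."""
    hidden = features
    for weights, biases in layers[:-1]:
        hidden = relu(dense(hidden, weights, biases))
    weights, biases = layers[-1]
    return softmax(dense(hidden, weights, biases))

rng = random.Random(0)
sizes = [361, 16, 32, 2]                    # Architecture 5, assuming a flattened 19 x 19 input
layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
probs = forward([rng.random() for _ in range(361)], layers)
```

Changing `sizes` to `[361, 2, 2]`, `[361, 4, 2]`, `[361, 16, 2]`, or `[361, 32, 2]` gives the layer layout of Architectures 1 to 4 under the same flattening assumption.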

F. Multimodal Late Fusion Technique
The brain is a complex organ, and to analyze brain networks effectively, there is a need to look at it from various viewpoints. The above-discussed connectivity measures can help to do that, but it can be taken a step further, i.e., making use of latent features derived from the fusion of all six connectivity measures in a particular manner [27], [52]. In the present work, a multimodal fusion technique known as "Late Fusion" is performed. The structure of this fusion technique is shown in Fig. 3. Here, independent neural networks are trained for individual connectivity feature domains, and the prediction probabilities from the softmax layer of all the domain-specific neural networks are then combined to give a single final prediction. The layer-wise overview of neural network model architectures 1-5 is shown in Fig. 2. The main reason for using the fusion technique was to effectively combine the Phase Synchronization, Causality, and Correlation measures and input the same as a multimodal input feature.
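One simple way to combine the softmax outputs is to average them with equal weights and take the most probable class. The sketch below assumes this averaging rule; the exact combination rule is not spelled out in this section, so other choices such as summation or majority voting are equally possible. The function name `late_fusion` and the probability values are made up for illustration.

```python
def late_fusion(prob_per_model):
    """Average class probabilities over all models and return (fused, predicted class)."""
    n_models = len(prob_per_model)
    n_classes = len(prob_per_model[0])
    fused = [sum(p[c] for p in prob_per_model) / n_models for c in range(n_classes)]
    predicted = max(range(n_classes), key=fused.__getitem__)
    return fused, predicted

# Hypothetical softmax outputs [P(H), P(Sz)] from the six measure-specific networks.
outputs = {
    "PLV": [0.40, 0.60],
    "PLI": [0.55, 0.45],
    "DTF": [0.10, 0.90],
    "PDC": [0.20, 0.80],
    "SL":  [0.60, 0.40],
    "COR": [0.30, 0.70],
}
fused, predicted = late_fusion(list(outputs.values()))
```

In this toy case two of the six networks favour the healthy class, yet the fused probability still favours class index 1 (Sz in the one-hot convention used for the labels).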

G. Evaluation Metrics
Confusion-matrix-based evaluation was used to assess the performance of the proposed method [53]. In addition to the confusion-matrix-based metrics, ROC curves were also used to evaluate model performance [54]. Both are discussed in detail in the supplementary materials.
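For reference, commonly used confusion-matrix metrics can be computed as sketched below. The specific set of metrics reported by the authors is given in their supplementary material, so the selection here (accuracy, sensitivity, specificity, precision) is an assumption, as are the function names and the toy labels.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, tn, fp, fn) for a binary problem."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp, tn, fp, fn

def confusion_metrics(y_true, y_pred, positive=1):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred, positive)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),      # recall on the positive (SCZ) class
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
    }

# Toy labels: 1 = SCZ, 0 = healthy.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
scores = confusion_metrics(y_true, y_pred)
```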

III. RESULTS AND DISCUSSION
Performance of the various neural network architectures in classifying SCZ and non-SCZ samples using brain connectivity measures is evaluated. We used the SCZ EEG dataset [34], open-sourced by the Institute of Psychiatry and Neurology in Warsaw, Poland. The dataset consists of EEG signals from 14 healthy and 14 SCZ subjects. The resting-state EEG signals were recorded with a sampling frequency of 250Hz from 19 channels (Fp1, Fp2, F7, F3, Fz, F4, F8, T3, C3, Cz, C4, T4, T5, P3, Pz, P4, T6, O1, and O2). The adjacency matrices and the connectivity graphs are calculated using HERMES Toolbox [41] based on MATLAB. This toolbox specifically studies the functionality and effectiveness of brain connectivity and provides visualizations to analyze the group difference between different classes. Neural network architectures were trained using conventional sequential training. This includes training epochs = 50, learning rate = 0.001 and Adam as the optimizer. In place of dropout, early stopping was used to avoid overfitting (with min_delta = 0 and patience = 10, monitoring "val_accuracy"). Experiments were performed on an Intel Core i5 (5th Gen) system with a 1.6 GHz processor and 8GB DDR4 RAM running Linux (Ubuntu 20.04).
The EEG signals within the frequencies of the alpha band (8-12.5 Hz) [37] are analyzed. A comparison is made among the six connectivity measures and the five neural network architectures. The results are obtained after performing Monte Carlo cross-validation with stratification. Stratification was performed to ensure that the ratio of healthy to SCZ samples remained consistent across the random splits. All metric values are obtained by averaging over all 50 Monte Carlo iterations. For each of the 50 iterations, the samples were split randomly into a 70% training set, a 15% validation set, and a 15% test set. During implementation, a constant random seed was fixed in the Jupyter Notebook, and all results were reported after the 50 iterations of cross-validation. Table I shows a comparison of the classification performance of all five neural network architectures on the six connectivity methods. In the case of PLV, the average training accuracy for Architecture 1 is 80.32%, and it keeps increasing as the number of neurons and hidden layers grows in the other architectures. One crucial thing to observe here is that the training accuracy increases significantly until Architecture 3 and then stagnates. A similar trend is observed for the other evaluation metrics. It is well known that increasing the number of neurons or hidden layers in a neural network can help extract more features from the input data, but only to a certain extent. There is always a limit that depends on the size of the input features. Increasing the number of neurons or layers beyond that limit can result in over-fitting and thus reduce the classification accuracy. The stagnation in accuracy beyond Architecture 3 indicates that we are very close to that limit for our dataset. The results associated with the late fusion methodology are shown in Table II. It is interesting to note that Architecture 5 gives a slightly lower accuracy than Architecture 4. The same reasoning very likely applies here.
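The stratified Monte Carlo splitting described above can be sketched as follows. The sketch splits individual samples, assumes the 784 samples are evenly divided between the two classes, and uses an arbitrary seed of 42; the function name `stratified_split` is hypothetical, and training of the networks on each split is omitted.

```python
import random

def stratified_split(labels, rng, train_frac=0.70, val_frac=0.15):
    """Randomly split sample indices 70/15/15 while preserving the class ratio."""
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    split = {"train": [], "val": [], "test": []}
    for indices in by_class.values():
        shuffled = indices[:]
        rng.shuffle(shuffled)
        n_train = round(train_frac * len(shuffled))
        n_val = round(val_frac * len(shuffled))
        split["train"] += shuffled[:n_train]
        split["val"] += shuffled[n_train:n_train + n_val]
        split["test"] += shuffled[n_train + n_val:]
    return split

labels = [0] * 392 + [1] * 392          # assumed even class balance over 784 samples
rng = random.Random(42)                 # fixed seed so the 50 splits are reproducible
splits = [stratified_split(labels, rng) for _ in range(50)]
```

Each of the 50 entries in `splits` would then be used to train, validate, and test one model, and the reported metrics would be the averages over these runs.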
The ROC curves for Architectures 1-5 for all six connectivity methods are shown in the supplementary material. Each graph contains the ROC curves for all 50 Monte Carlo iterations. Here, the area under the curve (AUC) measures a classifier's ability to differentiate classes; the model with the highest AUC best differentiates between the positive and negative classes. It is observed that the model improves over subsequent iterations. The larger number of near-diagonal curves for Architecture 1, compared with Architectures 3 and 5, is also apparent. That is, the AUC is lower for Architecture 1 than for Architectures 3 and 5, which validates the trend observed.
Group differences for each connectivity measure are shown as graphs (see Fig. 4). Here, only the strongest connections are illustrated. For PLV, the calculations were performed at a threshold of 70, implying that only 30% of the strongest connections were considered. Similarly, 84%, 60%, 70%, 80%, and 40% of the strongest linkages were considered for PLI, DTF, PDC, SL, and COR, respectively. Increasing the threshold by ten signifies that 10% of the weakest connections were removed. Since the network is strongly linked, removing even 90% of the weakest links does not fully alter the typical connectivity pattern. Statistically significant differences were observed between various groups. Among the different connectivity methods, it is possible to observe that the best results are associated with the models trained with the adjacency matrices of DTF and PDC models. According to this, evidence suggests that the causality connectivity methods can deliver richer information that can be used to build classification models compared to the phase synchronization connectivity methods. In contrast to PLV and PLI, directionality measures like DTF allowed for identifying the key drivers and directions of information flow. There were more flows from the posterior to the frontal brain areas in SCZ patients, which can also be seen in Fig. 4. The increased number of flows corresponds to a stronger PLV in frontal brain regions, also seen in the obtained connectivity graphs. The EEG data collected under resting-state settings with eyes open and closed has previously revealed connections between PLV and DTF [55]. The strength and area of synchronization also depend on the method used to measure it. From Fig. 4, it is observed that in patients diagnosed with SCZ, frontal brain regions are more synchronized as depicted by PLV and PLI shows us a weaker synchronization in the posterior areas. 
Synchronization, quantified by PLV, was observed to be high in SCZ patients in contrast to their non-SCZ counterparts. Moreover, reduced information flows in the frontal part of the brain but increased flows from the occipital lobe to the frontal lobe were observed. The differences in the average adjacency matrices, calculated for the six connectivity measures, between subjects diagnosed with SCZ and the control set of non-SCZ individuals are shown in Fig. 5. This provides insight into the degree to which each connectivity method helps distinguish between the two classes. The differences in the PLI and SL adjacency matrices are very small, consistent with their low classification accuracy. Tables III and IV show the performance comparison of Schizo-Net with state-of-the-art techniques in the current SCZ literature; Table III covers models trained on the same dataset.

IV. CONCLUSION AND FUTURE WORK
The study presents Schizo-Net, a novel method of utilizing brain connectivity methods to diagnose schizophrenia from electroencephalogram (EEG) signals. The significance of this approach for SCZ detection is demonstrated experimentally by comparing six connectivity measures that reflect different aspects of brain connectivity, namely PLV, PDC, PLI, COR, DTF, and SL. Each metric conveys information about the interactions within and between brain areas in a distinct way. These measurements reveal crucial characteristics such as correlation, phase synchronization, and directional information flow between different brain areas, and they support the examination of the neural mechanisms underlying dysconnectivity disorders. Here, shallow neural networks show decent accuracy. The present study provides an efficient and reliable method for diagnosing SCZ using the concept of brain connectivity. We achieved state-of-the-art accuracy (99.84%) for different measures using deep neural networks. Monte Carlo cross-validation with stratification was performed to validate the obtained results. The contribution of the present study is that classification based on connectivity features succeeds primarily because of the effective deep learning-based processing of the brain connectivity indices associated with SCZ. The study has a few limitations. First, only alpha bands have been considered while designing the diagnosis model. Even though it has been established that alpha bands are advantageous compared to other bands in differentiating SCZ patients from controls, it would be interesting to see how deep learning models perform on other bands in future studies. Second, all neural network models have been given equal weight in late fusion. Ideally, more weight would be given to the connectivity feature that classifies better. Future studies can focus on designing a weighted late fusion of the predictions.
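The difference between equal-weight and weighted late fusion can be illustrated with a short sketch. It assumes that fusion averages the per-model probabilities of the SCZ class, which is one common form of late fusion and is not confirmed here as the exact rule used in Schizo-Net. The function name `late_fusion`, the example probabilities, and the weights are hypothetical.

```python
def late_fusion(probs, weights=None):
    """Fuse per-model class probabilities by a (weighted) average.

    probs: dict mapping a connectivity measure name to a list of predicted
        P(SCZ) values, one per sample, from the model trained on that measure.
    weights: optional dict mapping the same names to non-negative weights;
        if omitted, every model contributes equally.
    """
    names = list(probs)
    if weights is None:
        weights = {name: 1.0 for name in names}
    total = sum(weights[name] for name in names)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_samples = len(probs[names[0]])
    return [
        sum(weights[name] * probs[name][k] for name in names) / total
        for k in range(n_samples)
    ]


# Hypothetical outputs of two single-measure models on two samples.
probs = {"DTF": [0.9, 0.2], "PLI": [0.3, 0.6]}
equal = late_fusion(probs)                                   # equal weight
weighted = late_fusion(probs, {"DTF": 3.0, "PLI": 1.0})      # favour DTF
labels = [int(p >= 0.5) for p in weighted]                   # 1 = SCZ (assumed)
```

In a weighted scheme of this kind, the weights could for instance be derived from each model's validation accuracy, so that better-performing measures such as DTF and PDC influence the fused decision more than PLI or SL.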
Future studies can also focus on expanding Schizo-Net with the notions of explainability, deep-precognitive diagnosis [63], and other disorders like myocardial infarction [73], [74].