Abstract

This study aims to increase the accuracy of autism spectrum disorder (ASD) diagnosis based on cognitive and behavioral phenotypes through multiple neuroimaging modalities. We apply machine learning (ML) algorithms to classify ASD patients and healthy control (HC) participants using structural magnetic resonance imaging (s-MRI) together with resting state functional MRI (rs-f-MRI and f-MRI) data from the large multisite data repository ABIDE (autism brain imaging data exchange) and identify important brain connectivity features. The 2D f-MRI images were converted into 3D s-MRI images, and datasets were preprocessed using the Montreal Neurological Institute (MNI) atlas. The data were then denoised to remove confounding factors. Using three fusion strategies (early fusion, late fusion, and cross fusion), we show that, in this implementation, hybrid convolutional recurrent neural networks perform better than either convolutional neural networks (CNNs) or recurrent neural networks (RNNs) alone. The proposed model classifies subjects as autistic or not using functional and anatomical connectivity metrics, with the overall diagnosis based on the autism diagnostic observation schedule (ADOS) standard. Our hybrid network achieved an accuracy of 96% by fusing s-MRI and f-MRI together, which outperforms the methods used in previous studies.

1. Introduction

Millions of neurons are responsible for coordinating each part of the human body and brain. When brain networks are incorrectly connected to coordinate activities, certain disorders in the human body arise [1, 2]. Some of the most common neurodevelopmental disorders are autism spectrum disorder (ASD) [3], schizophrenia [4], attention deficit hyperactivity disorder (ADHD) [5], epilepsy [6], Parkinson’s disease [7], obsessive-compulsive disorder [8], and bipolar disorder (BD) [9].

ASD refers to a range of neurodevelopmental disorders with behavioral and cognitive impairments that place a huge burden on patients, families, and society. Identifying ASD patients directly in comparison to healthy controls is important for early detection and intervention. ASD’s exact cause is still unknown [10]. Due to lack of knowledge of neuropathology, symptom-based diagnosis often results in poor treatment.

Early accurate diagnosis of ASD is pivotal to develop specialized interventions [11]. Due to its complex nature and highly heterogeneous symptoms, the diagnosis of ASD is very challenging [12].

Neuroimaging is an attractive noninvasive modality for bridging the gap between environment, genes, and cognitive and behavioral phenotypes in ASD. Several neuroimaging studies have used techniques such as structural and functional magnetic resonance imaging (MRI) [12–17]. These studies have contributed to our understanding of brain changes in ASD subjects at the structural and functional connectivity levels. Functional connectivity has been used to predict early autism diagnosis and to restrict correlations within specific neural circuits across blood oxygen level-dependent (BOLD) signals at different brain regions [18].

A number of studies have aimed to diagnose ASD based on structural magnetic resonance imaging (s-MRI) and functional magnetic resonance imaging (f-MRI) data [1]. In an earlier study, McKeown et al. decomposed f-MRI data into spatial components by blind separation [19]. Later, Uddin et al. presented a model using a logistic regression classifier and independent component analysis in order to differentiate between diseased and healthy groups [20]. S-MRI data delineate the structural properties of the brain and have received attention from researchers [21–26]. Another study proposed a new model for distinguishing between ASD-positive and ASD-negative individuals based on features of s-MRI and f-MRI data using histograms of oriented gradients [27].

The goal of the present study is to formulate an effective machine learning (ML) architecture to enhance the effectiveness of ASD diagnosis. We aim to classify ASD patients and HC participants using s-MRI in conjunction with rs-f-MRI data from a large multisite data repository, namely, ABIDE (autism brain imaging data exchange). The dataset is phenotypically rich and consists of different modalities from an important clinical population. We also aim to identify significant brain connectivity features via functional connectivity classification of ASD patients and HC participants. We apply deep learning to identify ASD patients based on the patient’s brain blood oxygen level-dependent (BOLD) activation patterns. In our implementation, multimodality fusion of s-MRI and f-MRI improves classification performance over the existing methods. The proposed multimodality hybrid method achieves a state-of-the-art accuracy of 96% in distinguishing ASD from HC individuals. We benefited from the combination of convolutional neural networks (CNNs), which have strong modeling and feature extraction power, and recurrent neural networks (RNNs), which fuse and order time series data. The dataset and the atlas used for preprocessing offer further advantages.

2. Materials and Methods

2.1. Data Description

In the present study, both T1 weighted structural MRI and T2 weighted functional MRI data are obtained from image and data archive powered by laboratory of neuro imaging (LONI) [28] from ABIDE [29]. All data were used under the direction and approval of the respective institutions’ ethics boards. ABIDE is based on a collaboration of 17 international imaging sites that have aggregated and are openly sharing neuroimaging data from 539 individuals suffering from ASD and 573 typical HC [30] in the neuroimaging informatics technology initiative (NIfTI) format. The data collected from these 1112 subjects consist of structural and resting state functional MRI data along with an extensive array of phenotypic information. All subjects have been selected by evaluating phenotypic information like age, gender, and intelligence. It is known that the scanning infrastructure in each imaging site used different parameters such as repetition time (TR), echo time (TE), number of voxels, number of volumes, openness or closeness of the eyes, and protocols for the data.

A fivefold cross-validation strategy was used to evaluate performance. In detail, each source was split into five subsets with an approximately equal number of subjects. Each time, we used four subsets of the data for training and the remaining one for validation to select the model. Then, we conducted the adaptation process using time series cross-validation. The augmented validation data were used during the adaptation process.
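The fivefold split described above can be sketched as follows. This is a minimal illustration, not the study's implementation: the function name `five_fold_splits`, the fixed random seed, and the use of plain integer subject indices are assumptions, and the sketch splits one pooled list rather than each imaging site separately.

```python
import random

def five_fold_splits(subject_ids, n_folds=5, seed=0):
    """Yield (train_ids, val_ids) pairs with approximately equal fold sizes."""
    ids = list(subject_ids)
    random.Random(seed).shuffle(ids)  # hypothetical fixed seed for repeatability
    # Fold k takes every n_folds-th subject starting at k,
    # so fold sizes differ by at most one subject.
    folds = [ids[k::n_folds] for k in range(n_folds)]
    for k in range(n_folds):
        val = folds[k]
        train = [s for j, fold in enumerate(folds) if j != k for s in fold]
        yield train, val

# Example with 1112 placeholder indices (the ABIDE subject count).
for train_ids, val_ids in five_fold_splits(range(1112)):
    print(len(train_ids), len(val_ids))
```

To follow the per-source splitting mentioned in the text, the same function could be called once per imaging site and the resulting folds concatenated.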

In this study, we used the statistical parametric mapping (SPM) software version 12 (SPM12) built in MATLAB and computation, display, and analysis of connectivity (CONN) toolbox. SPM integrated toolbox was developed [31] as an extension to SPM for incorporating morphometric voxel-based (VBM), seed-based (SBM), or region of interest (ROI)-based neuroimaging methods.

F-MRI is a noninvasive technique to assess brain functions by using signal changes [14]. A group of small cubic elements referred to as voxels represents the brain volume in f-MRI data. F-MRI consists of time series data extracted from each voxel by keeping track of its activity over time. The time series represent the signal measured at each voxel. Rs-f-MRI is used for analyzing brain disorders by applying f-MRI techniques while the subject is in a resting state. The major approach explored for discriminating between typically developed and autistic brains has been shape- and volume-based analysis of s-MRI. S-MRI is generally classified as an anatomical study consisting of two categories of features, namely, shape features and volumetric features.

The heterogeneity of disorders among autistic individuals has increased the need for personalized approaches that analyze and prognosticate both functionally and anatomically for each autistic subject. Hence, in the present study, we combined s-MRI and f-MRI data with the aim of achieving better diagnostic accuracy and suggesting an optimum treatment plan for every autistic subject. We analyze our results to ascertain that they fit better with the autism diagnostic observation schedule (ADOS). Correlation between trait score differences and ADOS total scores is analyzed across all subjects to extract features of autism severity.

2.2. Data Preprocessing

Neuroimages display thousands of cortical and subcortical areas, providing information on structures and functions. Brain atlases are used to divide brain images into a limited number of regions of interest (ROI) in order to overcome complexity [32]. Figure 1 depicts the overall pipeline of the approach we propose. For each modality, data preprocessing is necessary in order to avoid the risk of scanner bias and the effect of heterogeneity of protocols. In addition, the steps of denoising, fusion, and analysis to evaluate hybrid deep learning methods and correlation with ADOS total score are explained in the following sections.

First, in order to convert 2D f-MRI to 3D s-MRI, we used ROI parcellation with the Harvard-Oxford atlas. Then, our preprocessing pipeline consisted of functional realignment and unwarp; slice-timing correction; outlier identification; direct segmentation and normalization; and functional smoothing within the Montreal Neurological Institute (MNI) atlas. Many studies use various atlases such as Harvard-Oxford (HO), Craddock 200 (CC200), Craddock 400 (CC400), Automated Anatomical Labeling (AAL), Eickhoff-Zilles (EZ), Talairach and Tournoux (TT), and Dosenbach 160 [1]. For the present study, we downloaded the time series for the brain areas specified in the MNI standard brain atlas [33]. In our literature review, we found that the MNI atlas has rarely been used with the large volume and different modalities of the ABIDE dataset. It is included in different neuroimaging analysis packages, including the statistical parametric mapping package (SPM). We selected the MNI atlas in order to perform comparisons across subjects and studies, particularly of subcortical data, which are more accurately aligned by nonlinear volume registration than cortical data. In addition, the MNI atlas overcomes differences between neuroimages in shape, size, and relative orientation. The advantage of the MNI atlas is that it focuses on disorders and artifacts in the neuroimaging data used to analyze functional and structural connectivity from the top portion of the brain to the bottom portion of the cerebellum [34].

Preprocessing is a significant step to remove the effects of different scanners, artifacts, or partial volume effects and the variability between subjects that may stem from data acquisition. In order to reduce execution time and achieve better accuracy, preprocessing of neuroimages generally consists of performing a fixed set of operations on the data. We used the CONN [35] functional connectivity toolbox, which works with MATLAB/SPM. CONN provides a component-based noise correction method for reducing physiological and other noise sources, together with additional removal of movement and temporal covariates, temporal filtering and windowing of the residual BOLD contrast signal, first-level estimation of multiple standard f-MRI and s-MRI measures, and second-level random-effect analysis. Although global signal regression could also have been considered, the component-based noise reduction method allows for interpretation of inverse correlations because no global signal regression is applied in our implementation. The toolbox implements f-MRI and s-MRI measures, such as estimation of seed-to-voxel and ROI-to-ROI functional correlations, as well as semipartial correlation and bivariate/multivariate regression analysis for multiple ROI sources, graph theoretical analysis, and novel voxel-to-voxel analysis of functional connectivity.

In the course of functional realignment and unwarp, all neuroimages that belong to a subject are oriented in reference to the first image of the time series of that subject. The purpose of slice-timing correction is to shift the time series of each voxel so that all the voxels in each image have a common reference time. In outlier identification, outlier scans are identified based on the observed global BOLD signal and the amount of subject motion. The change in the global BOLD signal at any time is calculated as the change in the average BOLD signal within SPM’s global mean mask, scaled to standard deviation units. In addition, we employ the relative probability densities of gray matter (GM), white matter (WM), and cerebrospinal fluid (CSF) in MNI space as inputs to the hybrid method. Therefore, direct segmentation provides segmentation into GM, WM, and CSF tissue classes. Also, direct normalization iteratively performs tissue classification from the intensity values of the functional and structural reference images and estimates nonlinear spatial transformations that approximate posterior and anterior tissue probabilities until convergence. Finally, data are smoothed in order to clean images of nonbrain artifacts from the series of voxels. This consists of averaging the signals of neighboring voxels, as blood supply and its functions are usually similar among neighboring brain voxels. Temporal filtering eliminates redundant components from the time series of voxels without disturbing the BOLD signal [36, 37].
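The outlier identification step can be illustrated with a short sketch. It flags a scan when the scan-to-scan change in the global signal, expressed in standard deviation units, or the subject motion exceeds a threshold. The function name `flag_outlier_scans` and both threshold values are assumptions chosen for illustration; the text does not state the thresholds used in the study, and this is not the SPM/CONN implementation.

```python
import statistics

def flag_outlier_scans(global_signal, framewise_motion_mm,
                       signal_z_threshold=5.0, motion_threshold_mm=0.9):
    """Return indices of scans with extreme global-signal change or motion.

    global_signal: mean BOLD value inside a brain mask, one value per scan.
    framewise_motion_mm: scan-to-scan displacement, one value per scan.
    Both thresholds are illustrative assumptions, not values from the study.
    """
    # Scan-to-scan change in the global signal (the first scan has no predecessor).
    diffs = [0.0] + [b - a for a, b in zip(global_signal, global_signal[1:])]
    mean = statistics.fmean(diffs)
    sd = statistics.pstdev(diffs) or 1.0  # avoid division by zero on flat signals
    z_scores = [(d - mean) / sd for d in diffs]
    return [i for i, (z, m) in enumerate(zip(z_scores, framewise_motion_mm))
            if abs(z) > signal_z_threshold or m > motion_threshold_mm]

# Synthetic example: one signal spike at scan 50 and one motion spike at scan 70.
signal = [100.0] * 100
signal[50] = 130.0
motion = [0.1] * 100
motion[70] = 1.5
print(flag_outlier_scans(signal, motion))  # -> [50, 51, 70]
```

Scan 51 is flagged as well because the return of the signal to baseline is also a large scan-to-scan change.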

2.3. Data Denoising

Using neuroimages to diagnose ASD is challenging due to the noise introduced by the image recording process. Consequently, there are many filtering approaches, such as non-local means (NLM) filters, wavelet-based filters, and band-pass filters, to remove the noise [38]. In this study, we prefer band-pass filtering in the denoising pipeline to reduce unwanted phase shifts.

The MATLAB signal processing toolbox is particularly useful for filtering signals with filter design parameters such as filter type, filter order, and attenuation. The denoising pipeline combines two steps: linear regression of potential artifacts in the BOLD signal and temporal band-pass filtering. Factors identified as potential confounding effects on the BOLD signal are estimated and removed separately for each voxel and for each subject. For this filtering, we resample all data to ensure equally spaced points for comparison across subjects. To that end, we use the MATLAB function resample, which applies an antialiasing low-pass filter to the time series and compensates for the delay introduced by the filter. This function resamples the input sequence, the raw head motion in our case [39].

Inhomogeneity correction is applied to reduce artifacts in images created by nonhomogeneous brain tissues. Various techniques such as histogram matching are available for normalizing the volume of images [38].

To minimize the effects of noise sources such as head movement and physiological variations, temporal frequencies below 0.008 Hz or above 0.09 Hz are removed from the BOLD signal using a band-pass filter [40].
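As a hedged illustration of this 0.008 to 0.09 Hz pass band, the sketch below zeroes all discrete Fourier components outside the band and transforms back. It is a plain Python stand-in, not the MATLAB/CONN filter used in the study; the function name `bandpass_dft` and the repetition time of 2 s in the example are assumptions (the ABIDE sites used different TR values).

```python
import cmath
import math

def bandpass_dft(series, tr_seconds, low_hz=0.008, high_hz=0.09):
    """Zero out DFT components outside [low_hz, high_hz] and transform back."""
    n = len(series)
    # Forward discrete Fourier transform (O(n^2); adequate for short BOLD series).
    spectrum = [sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(series)) for k in range(n)]
    for k in range(n):
        # Frequency of bin k in Hz; bins above n/2 mirror the negative frequencies.
        freq = min(k, n - k) / (n * tr_seconds)
        if not (low_hz <= freq <= high_hz):
            spectrum[k] = 0.0
    # Inverse transform; the result is real because the mask is symmetric.
    return [sum(c * cmath.exp(2j * math.pi * k * t / n)
                for k, c in enumerate(spectrum)).real / n for t in range(n)]

# Example: slow drift (0.0025 Hz), in-band signal (0.05 Hz), fast noise (0.2 Hz).
tr = 2.0  # hypothetical repetition time in seconds
times = [i * tr for i in range(200)]
raw = [5.0 + math.sin(2 * math.pi * 0.0025 * t)
       + math.sin(2 * math.pi * 0.05 * t)
       + math.sin(2 * math.pi * 0.2 * t) for t in times]
filtered = bandpass_dft(raw, tr)  # retains only the 0.05 Hz component
```

An ideal frequency-domain mask like this one introduces no phase shift, which matches the motivation given for band-pass filtering in the denoising step.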

Figure 2 shows a sample of denoising output obtained from our dataset. The effect of denoising on functional connectivity (FC) measures can be assessed by estimating the distribution of FC values between randomly selected pairs of points within the brain before and after denoising. After the preprocessing pipeline but before denoising of the BOLD signal, FC distributions show large intersession and intersubject variability, with varying degrees of positive bias caused by large-scale physiological and subject motion effects. After denoising, FC distributions are approximately centered, leaning slightly toward the positive side, with considerably reduced intersession and intersubject variability.

2.4. Classification Methods

In another line of research [41], a newly proposed cross fusion fully convolutional neural network (FCN) performed best among the multimodality and fusion networks. Based on that finding, three alternative fusion strategies were considered in the present work: early, late, and cross fusion, as shown in Figure 3.

For early fusion (Figure 3(a)), the preprocessed f-MRI and s-MRI neuroimages are combined for each subject, producing a single tensor. This input tensor is processed using the model network. For late fusion (Figure 3(b)), parallel streams process the f-MRI and s-MRI images independently before being fed into the model network. The output is fed through the neural network that carries out information fusion. For cross fusion (Figure 3(c)), which we propose, there are two processing branches connected by trainable scalar cross connections. The purpose of this process is to provide the functional connectivity matrix (FCM) information with cross-trainable fusion parameters rather than limiting the features to a single plane. The difference between our cross fusion and studies in the related literature is the usage of hyperparameters. To overcome dimensional differences between the feature matrices of different neuroimages during the pairwise comparison, training is carried out with a selected value of the parameter α (Figure 4). Trial runs showed that higher α values required almost prohibitive processing times and lower values resulted in unacceptably blurred images. Thus, α = 0.05 was selected to provide acceptable image quality with the available processing power. During training, the parameter is automatically adjusted to integrate the two information modalities, f-MRI and s-MRI.

With the scalar crosslinks formed with A1 (α) and B1 (α) in layer 1, N ∈ {0, 0.01, 0.02, …, 0.09, 1} probabilities of each layer are calculated within the cross fusion. α controls the gradient range. To further demonstrate the effects of α on fusion results, we selected a threshold of α = 0.05. The FCM image (Figure 4) shows areas where gray matter, white matter, and CSF features are clustered.
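The three fusion strategies can be contrasted with a minimal sketch on plain feature vectors. This is an illustration under stated assumptions, not the network used in the study: the function names, the additive form of the cross connections, and the equal feature lengths are hypothetical, and only the starting value α = 0.05 is taken from the text. In a trained network the two scalars would be updated by the optimizer rather than kept fixed.

```python
def early_fusion(fmri, smri):
    """Early fusion: concatenate the two modalities into one input vector."""
    return list(fmri) + list(smri)

def late_fusion(fmri, smri, fmri_branch, smri_branch):
    """Late fusion: process each modality separately, then join the outputs."""
    return list(fmri_branch(fmri)) + list(smri_branch(smri))

def cross_fusion(fmri, smri, alpha_s_to_f=0.05, alpha_f_to_s=0.05):
    """Cross fusion: each branch receives a scaled copy of the other branch.

    The additive mixing rule is an assumption made for illustration.
    """
    if len(fmri) != len(smri):
        raise ValueError("both branches must have the same feature length")
    fused_fmri = [f + alpha_s_to_f * s for f, s in zip(fmri, smri)]
    fused_smri = [s + alpha_f_to_s * f for f, s in zip(fmri, smri)]
    return fused_fmri, fused_smri

# Toy feature vectors standing in for one layer of each branch.
fmri_features = [1.0, 2.0]
smri_features = [10.0, 20.0]
print(early_fusion(fmri_features, smri_features))
print(cross_fusion(fmri_features, smri_features))
```

With both scalars set to zero, cross fusion leaves the two branches independent until their outputs are joined, which is the late fusion case; nonzero scalars let information flow between the modalities at every layer.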

The left side of Figure 4 shows a sample of preprocessed cross-sectional volumes, and the right side shows their corresponding feature maps. Each subimage corresponds to a single filter. The convolutional filters are sensitive to features of the preprocessed cross-sectional volumes of the patients with a diagnosis of ASD.

To tackle the high dimensionality of the acquired features, we selected tissue type as a feature. In the literature, several novel CNN or RNN models have been constructed to create different features with different configuration parameters. Taking inspiration from them, we selected only features related to different tissue areas. The maps in Figure 4 are shown with the descriptive information of the clusters obtained at the selected significance level.

After data preprocessing and denoising, the first stage of our framework consists of a CNN and an RNN in hybrid form. The main idea of these networks is to use a convolutional layer: both networks detect spatial dependencies in the data with the help of the convolution layer [42]. CNNs and RNNs are useful for analyzing multidimensional time series [43]. The advantage of this model lies in the possibility of using a pretrained model.

The CNN has three basic layers, referred to as the fully connected convolution layer, the pooling layer, and the final convolution layer. First, the input signal is directly connected to the convolution layer, and a kernel is used for the convolution operation. The results of the operation form a feature map for the next layer. Between two convolution layers is a pooling layer, which is used to reduce the size of the feature map. In contrast, the RNN sends feedback signals to the other neurons within the same hidden layer (Figure 5). The output of the CNN layer was created with an α parameter of 0.05 and given as input to the RNN layer. Then, the feature vector is formed from the RNN output. In the fully connected layer, performance evaluation was carried out first separately and then by combining subjects through concatenation of the data. At the last stage, the classifier and output process take place, and the model result is parsed as ASD or HC.
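The flow from convolution through pooling into a recurrent layer can be sketched on a one-dimensional toy signal. The kernel, the pooling size, and the recurrent weights below are hypothetical values chosen for illustration; the actual network operates on image volumes and learns its weights during training.

```python
import math

def conv1d(signal, kernel):
    """Valid 1D cross-correlation (the 'convolution' used in CNN layers)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def max_pool(feature_map, size=2):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(feature_map[i:i + size])
            for i in range(0, len(feature_map) - size + 1, size)]

def rnn_last_state(sequence, w_in=0.5, w_rec=0.5):
    """Simple tanh recurrence over a scalar sequence; returns the final state."""
    h = 0.0
    for x in sequence:
        # The previous hidden state feeds back into the same layer.
        h = math.tanh(w_in * x + w_rec * h)
    return h

bold = [0.0, 1.0, 3.0, 2.0, 0.0, -1.0, 1.0]    # toy time series
feature_map = conv1d(bold, [1.0, 0.0, -1.0])   # -> [-3.0, -1.0, 3.0, 3.0, -1.0]
pooled = max_pool(feature_map)                 # -> [-1.0, 3.0]
state = rnn_last_state(pooled)                 # scalar feature passed to a classifier
```

Pooling halves the length of the feature map here, which is the size reduction the text attributes to the pooling layer, while the recurrence carries information from earlier positions into the final state.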

We used MATLAB/SPM-based cross-platform software in a Windows environment on a computer with an Intel Core i7 processor with a clock frequency of 3 GHz, 32 GB RAM, and a 500 GB solid state drive (SSD). Training our network took a little over 2 hours per epoch and around two and a half days for the fully trained hybrid convolutional recurrent neural networks. The number of iterations is the number of passes, each pass processing the data that belong to all subjects. Our method takes on average 2-3 minutes to segment the data of a single subject from the ABIDE dataset (nearly two days for all 1112 subjects). In high performance computing environments, CONN can distribute our processing and analyses in parallel across multiple nodes. This can result in a very significant reduction in processing time.

For each pair of subjects, Pearson’s correlation coefficients have been computed against the ADOS report. Because many correlations are tested, it is important to apply multiplicity adjustments that control the false discovery rate (FDR). In this study, we applied the FDR with a threshold of 0.1 for the correlation analysis [44].
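A minimal sketch of this correlation analysis follows. The text does not name the FDR procedure, so the Benjamini-Hochberg step-up procedure is used here as an assumption; the function names and the example p-values are hypothetical, and only the threshold of 0.1 comes from the text.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def benjamini_hochberg(p_values, q=0.1):
    """Return booleans marking which hypotheses are rejected at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= (k / m) * q.
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * q:
            cutoff = rank
    # Every hypothesis ranked at or below the cutoff is rejected.
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= cutoff:
            rejected[i] = True
    return rejected

# Hypothetical p-values from five feature-versus-ADOS correlation tests.
p_values = [0.001, 0.04, 0.03, 0.5, 0.09]
print(benjamini_hochberg(p_values, q=0.1))  # -> [True, True, True, False, False]
```

The procedure compares the sorted p-values against a rising threshold, so a test with p = 0.04 can still be declared significant when enough smaller p-values precede it.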

3. Results

3.1. Summary Statistics

There is no public dataset available consisting of data from different modalities such as electroencephalography (EEG), diffusion tensor imaging (DTI), MRI, and f-MRI (resting state and task based), that belong to the same individuals. Furthermore, there is a lack of ASD subsyndromes data such as Asperger’s syndrome (AS) [45] and pervasive developmental disorder, not otherwise specified (PDD-NOS) [46], and distribution rates according to number of samples by gender are also low. For future studies, availability of datasets that provide different modalities will help researchers to improve ASD detection accuracy using ML and deep learning methods.

We observed that the combination of ML classifiers with other clinical features of ASD improved the accuracy of ASD diagnosis. The current sample size identifies relatively relevant brain regions at high risk for ASD, suggesting that this method can be extended to large and more heterogeneous ASD populations. Using s-MRI and f-MRI modalities in conjunction, we have shown that a higher level of diagnosis accuracy can be achieved.

For each subject, local diagnosis accuracy for both s-MRI and f-MRI feature matrices is calculated. Table 1 shows the accuracy, sensitivity, and specificity obtained for s-MRI and f-MRI when using all features. Accuracy measures the proportion of correct predictions made by the model. It is defined as the ratio of the number of correct predictions to the total number of predictions made. Sensitivity measures the proportion of actual positives that are correctly identified as positive by the model. It is defined as the ratio of the number of true positives to the total number of actual positives. Specificity measures the proportion of actual negatives that are correctly identified as negative by the model. It is defined as the ratio of the number of true negatives to the total number of actual negatives. Table 2 shows the accuracy achieved by the different fusion strategies (early, late, and cross). As can be seen, cross fusion with ADOS yielded the highest accuracy among the fusion strategies. We do not prefer late and cross fusion without ADOS because the score obtained with ADOS is consistently higher than that obtained without ADOS. Our results show that the hybrid model, achieving classification performances of 96.02%, 92.83%, and 85.70% for accuracy, sensitivity, and specificity, respectively, is significantly superior to the single CNN and RNN models.
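The three definitions above translate directly into code. The sketch below treats ASD as the positive class; the function name `classification_metrics` and the toy labels are illustrative and do not reproduce the study's data.

```python
def classification_metrics(y_true, y_pred, positive="ASD"):
    """Compute accuracy, sensitivity, and specificity from paired labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    tn = sum(t != positive and p != positive for t, p in pairs)  # true negatives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),   # true positives over actual positives
        "specificity": tn / (tn + fp),   # true negatives over actual negatives
    }

# Toy example: three ASD subjects and two HC subjects.
y_true = ["ASD", "ASD", "ASD", "HC", "HC"]
y_pred = ["ASD", "ASD", "HC", "HC", "ASD"]
print(classification_metrics(y_true, y_pred))
```

The function assumes that both classes occur in `y_true`; otherwise one of the denominators is zero.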

Our hybrid algorithm provides high accuracy and specificity when s-MRI and f-MRI are analyzed together. Our model also fuses the s-MRI and f-MRI datasets, which provides an accuracy of 96.02%, higher than the alternatives.

We have investigated the effects of different s-MRI and f-MRI parameters on the machine learning algorithm. The proposed diagnosis may improve when both modalities are used: we observed that adding s-MRI and f-MRI parameters to the features specific to ASD classification gives a higher, significant Pearson correlation (p = 0.001) with the ADOS total score than the benchmark data. Thus, the current data suggest that a localized diagnosis approach combining fusion of datasets from different modalities, fusion strategies, and correlation to ADOS will greatly improve accuracy, sensitivity, and specificity.

In Table 3, we compare individual CNN, RNN, hybrid CNN-RNN, and other recent machine learning methods from similar studies in terms of accuracy, albeit on different datasets and different diseases, based on their usage of neuroimaging data. Studies using only CNN, only RNN, their combination, and other methods are shown. Hosseini-Asl et al. [47] report a CNN with a very high accuracy of 100 percent for Alzheimer’s disease, another study presents a two-dimensional CNN with a high accuracy of 90.29 percent for hyperactivity disorder [71], and another achieves an accuracy of 98.8 percent for Parkinson’s disease [58]. Among the studies that utilized a Parkinson’s disease dataset, one achieved an accuracy of 82.89 percent using a hybrid method that combines CNN and RNN [65]. Researchers have shown the usefulness of ML techniques for identifying and predicting disease in general. The application of ML techniques to the EEG of patients with epilepsy is very recent and is emerging with promising results, with a balanced accuracy of 98.13% [70]. In addition, in Table 4, we compare different ASD studies in which machine learning methods have been applied to different sets of neuroimaging data, different modalities, and different ML methods. Another inspiring publication showed that a computer-aided diagnosis system was able to accurately distinguish between individuals with ASD and controls, achieving an accuracy rate of 87.1% [15]. A more recent work by the same author [18] demonstrates the potential of using dynamic functional connectivity analysis to identify brain regions associated with specific symptoms of ASD with 47 subjects, a smaller sample than ours. By identifying these regions, the author aims to contribute to the development of more targeted and personalized interventions for individuals with ASD. Many studies in the literature have focused on group level differences between individuals with ASD and typically developing controls.
While these studies have identified some brain regions consistently associated with ASD, they do not account for the variability in brain structure and function that exists within the ASD population. Another difference between our study and some related studies is the use of a combination of s-MRI and f-MRI data. The combination of these two types of data allows for a more comprehensive analysis of brain structure and function, which may improve the accuracy of ASD diagnosis. Researchers have developed several approaches for seizure detection using ML classifiers and statistical features [88, 94]. A recent publication [84] demonstrates substantial differences in the efficiency and accuracy of various biomarkers used for ASD diagnosis. The difference in the performance of various biomarkers is due to the heterogeneity of ASD. Our fusion of f-MRI and s-MRI data has improved on the accuracy of existing autism detection systems by combining two modalities. Some studies in the literature have investigated special biomarkers consisting of biological molecules used for biomedical imaging and neuromodulation. In the present study, we did not investigate biomarkers but rather focused on algorithmic enhancement of accuracy. In addition, we combined CSF with WM and GM. Our machine learning methodology and fusion strategies differ from those applied by Jamwal et al., achieving higher accuracy via a novel neural network structure.

4. Conclusion

In general, it is difficult to generalize the findings of studies utilizing a small selection of samples. In addition, many studies in related areas focus on different age populations, thus limiting generalizability. Studies in the literature that focus on gender differences also inevitably reduce sample sizes, leading to reduced statistical confidence. An important challenge of neuroimaging datasets is the unavailability of different modalities. By using the ABIDE dataset, we were able to overcome these challenges by utilizing s-MRI and f-MRI data together for a large number of subjects. Clinical studies have shown that using multimodality techniques plays a significant role in increasing the accuracy of ASD diagnosis [97]. Our contribution can be summarized as implementing fusion of different modalities with higher accuracy and correlation with ADOS within a hybrid method consisting of CNN and RNN.

Future work on the path towards more effective ASD diagnosis and treatment is expected to further exploit the potential of hybrid ML algorithms for classification. Local analysis of the brain regions is expected to enable clinicians to deliver personalized treatments to autistic individuals. Our cross fusion infrastructure will also provide region-based analysis of the brain, which we believe can place subjects on the autism spectrum and help clinicians deliver personalized treatments to individuals with autism. Another possibility that has emerged with our approach is the integration of further imaging modalities, such as DTI and EEG data, into diagnostic studies based on neuroimaging in order to obtain a higher number of features and to use biomarkers to improve classification accuracy. In addition, subcategorization of autistic disorders such as Asperger’s syndrome and PDD-NOS via multimodal neuroimaging may become possible using the proposed hybrid ML approach.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work will be funded by the author Semih Bilgen.