MADE-for-ASD: A Multi-Atlas Deep Ensemble Network for Diagnosing Autism Spectrum Disorder

In response to the global need for efficient early diagnosis of Autism Spectrum Disorder (ASD), this paper bridges the gap between traditional, time-consuming diagnostic methods and potential automated solutions. We propose a multi-atlas deep ensemble network, MADE-for-ASD, that integrates multiple atlases of the brain's functional magnetic resonance imaging (fMRI) data through a weighted deep ensemble network. Our approach integrates demographic information into the prediction workflow, which enhances ASD diagnosis performance and offers a more holistic perspective on patient profiling. We experiment with the well-known publicly available ABIDE (Autism Brain Imaging Data Exchange) I dataset, consisting of resting-state fMRI data from 17 different laboratories around the globe. Our proposed system achieves 75.20% accuracy on the entire dataset and 96.40% on a specific subset, both surpassing reported ASD diagnosis accuracy in ABIDE I fMRI studies. Specifically, our model improves by 4.4 percentage points over prior works on the same amount of data. The model exhibits a sensitivity of 82.90% and a specificity of 69.70% on the entire dataset, and 91.00% and 99.50%, respectively, on the specific subset. We leverage the F-score to pinpoint the top 10 ROIs in ASD diagnosis, such as the precuneus and the anterior cingulate/ventromedial prefrontal cortex. The proposed system can potentially pave the way for more cost-effective, efficient and scalable strategies in ASD diagnosis. Code and evaluations are publicly available at https://github.com/hasan-rakibul/MADE-for-ASD.


Introduction
Autism Spectrum Disorder (ASD) is a prevalent neurodevelopmental condition characterised by challenges in social and communicative abilities, as well as repetitive and hard-to-control behaviours in daily life [1]. ASD often co-occurs with various other conditions, such as intellectual impairment, seizures and anxiety. Autistic individuals display a wide range of characteristics, from mild to severe social and communicative differences, along with restricted and repetitive behaviours and interests [2]. According to a 2022 report endorsed by the World Health Organisation [3], approximately one in 100 children worldwide are autistic, a significant increase from the one in 160 reported in 2012 [4]. The economic impact on families of autistic individuals is substantial, making ASD a critical public health concern [5].
Diagnosis of ASD at present leverages a variety of techniques, predominantly behavioural assessments such as the Autism Diagnostic Observation Schedule (ADOS) [6] and the Autism Diagnostic Interview-Revised (ADI-R) [7]. Both ADOS and ADI-R deliver important insights into an individual's communication, social interactions and behaviour. Nevertheless, these methods primarily depend on observing symptoms, making them largely subjective. The process can be both expensive and lengthy. Additionally, variations in symptom presentations and their severity can introduce additional complications, underscoring the need for a diagnostic method that is more efficient and objective [8]. In this context, functional magnetic resonance imaging (fMRI) of the brain could be leveraged because it provides a non-invasive means of examining brain activity. It has the potential to elucidate the intricate neurological deviations representing ASD, which can be used towards developing automated, objective, efficient and early diagnostic methods [9,10].
The Autism Brain Imaging Data Exchange (ABIDE) consortium releases resting-state fMRI (rs-fMRI) data with T1 structural brain images and demographic information of autistic individuals and Typical Controls (TCs) [11]. The ABIDE I dataset was collected from 17 international research sites, which makes it diverse and comprehensive for understanding the neurological nuances of ASD. Such heterogeneity in the data can help capture the diverse manifestations of ASD across different populations and geographical locations. ABIDE I data, therefore, can strengthen the generalisability of findings and enhance their clinical relevance and applicability. The preference for ABIDE I data is evident from many prior works on ASD diagnosis [12,13,14,9,15,16,17,18,19,20,21,22]. This paper uses the ABIDE I dataset to classify between ASD and TC, hereinafter referred to as ASD diagnosis.
Machine learning (ML) and deep learning (DL) techniques have been increasingly employed to advance the understanding and diagnosis of various neurodevelopmental disorders, including ASD. ML and DL provide an objective and data-driven approach to diagnosis, thereby reducing the reliance on subjective symptom-based criteria. However, building appropriate models with high-dimensional multi-site neuroimaging data, such as ABIDE I, is challenging because of added dimensionality (i.e., different brain regions may have different cues towards ASD diagnosis) and site diversity (i.e., different data collection sites have different data collection protocols). Including other important cues, such as people's demographic information, can potentially improve the predictive performance but makes system development more challenging, since these data must be accommodated alongside the primary neuroimaging data. Addressing these challenges, we propose a novel DL-based ASD diagnosis workflow from ABIDE I brain fMRI data and corresponding demographic information. The input rs-fMRI images are processed to extract regions of interest (ROIs) according to three different atlases (brain parcellations), from which functional connectivity features are extracted. This paper's novelty is two-fold. (1) It proposes MADE-for-ASD, consisting of a stacked sparse denoising autoencoder (SSDAE) and multi-layer perceptron (MLP), followed by a weighted ensemble learning framework. We incorporate demographic information into the model, which enhances the performance through personalised prediction. (2) This paper provides visualised insights into the most significant ROIs with a high correlation with ASD, which could further assist in understanding the neurobiological underpinnings of ASD.

Neuroimaging in ASD Diagnosis
Magnetic resonance imaging (MRI) is a critical tool for understanding the pathophysiology of neurological disorders such as schizophrenia and autism. MRI is valuable due to its cost-effectiveness and non-invasive nature, which have led to its widespread acceptance and application in the medical community [23]. Specifically, functional MRI (fMRI) tracks changes in blood oxygen levels over time, making it adept at inferring brain activity and drawing considerable attention from researchers studying brain dysfunctions [24]. Changes in the intensity of fMRI images throughout the acquisition period usually serve as a representation of brain activity, typically expressed as a time series. Brain disorders rarely manifest as anomalies confined to one or a few brain regions; they typically manifest as atypical connectivity among various brain regions. In this context, functional connectivity helps investigate the association of specific activities between brain regions and has found widespread use in the classification of brain disorders [25].
Several datasets are instrumental in advancing autism research by providing extensive neuroimaging data. The ABIDE I dataset aggregates resting-state fMRI data from 17 international sites [11]. ABIDE II expands on this by including additional subjects and sites, further enhancing its utility for ASD research [26]. Various works in ASD diagnosis have leveraged a variety of ML and DL models with rs-fMRI data from the ABIDE datasets [12,13,14,9,15,16,17,18,19,20,21,22]. For example, Abraham et al. [12] experimented with several ML algorithms, including support vector regression and ridge regression, achieving a classification accuracy of 66.8%. However, these conventional machine learning models have limited ability to learn complex patterns from raw data and do not take advantage of deep representations. In a separate study, Heinsfeld et al. [13] embraced a deep learning technique with an autoencoder (AE) and deep neural network (DNN), reaching 70% accuracy. Eslami et al. [19] leveraged an AE with a single-layer perceptron and achieved 70.3% accuracy, while Almuqhim and Saeed [20] used an AE, achieving 70.8% classification accuracy. Although these studies have shown the proof of concept of using AEs in this task, their performance remains limited.
A few studies [27,28,29] utilised a subset of the ABIDE dataset. For example, Plitt et al. [29] employed a random forest classifier to predict ASD on only 179 samples and reported a classification accuracy of 95%. Chen et al. [28] selected 252 high-quality samples based on criteria such as head motion, artifacts and signal dropout, and reported an accuracy of 91% using a random forest classifier.
Some studies [16,30] used both ABIDE I and II datasets. For example, Khosla et al. [16] trained their Convolutional Neural Network (CNN) architecture with the ABIDE I data and tested their model's performance on the ABIDE II dataset. Aghdam et al. [30] proposed a 'mixture of experts' ensemble approach to diagnosing autistic young children (aged 5-10 years) in three dataset conditions: ABIDE I, ABIDE II, and a combination of them.
The National Database for Autism Research (NDAR) is another significant repository, offering a wide range of data types, including neuroimaging data, collected from various studies and institutions [31]. Using this dataset, Li et al. [32] proposed a multi-channel CNN model for early ASD diagnosis. Despite the variety of neuroimaging datasets available, the ABIDE I dataset remains the most widely used in ML-based ASD diagnosis [33].

Selection of Brain Atlases
Atlas-based parcellation, which divides the entire brain into spatially proximate ROIs, offers several advantages in neuroimaging studies [34]: it identifies brain regions with significant connectivity differences between groups, examines brain functional organisation, reduces data dimensionality and improves result interpretability by linking specific brain regions to conditions or phenotypes. The Automated Anatomical Labelling (AAL) atlas, with 116 ROIs, is widely used in ASD diagnosis studies [16,35,36,37] due to its ability to provide precise anatomical locations essential for identifying structural abnormalities caused by ASD. The Craddock 200 (CC) atlas [38], with 200 ROIs, is based on functional connectivity and is particularly useful for analysing resting-state fMRI data [39,9,36,37], as it clusters the brain into functionally homogeneous regions.
Recent studies have leveraged multiple atlases to capture diverse patterns related to ASD. For instance, Mahler et al. [37] proposed a multi-atlas framework for ASD classification using resting-state fMRI data, employing three atlases, including AAL and Craddock 200. Similarly, Deng et al. [9] combined multiple atlases. To further enhance the analysis, we leverage the Eickhoff-Zilles (EZ) atlas with 116 ROIs, which has also been utilised in recent ASD diagnosis studies [40,41]. Derived from cytoarchitectonic mapping, the EZ atlas combines anatomical and functional data, offering a nuanced view that bridges structural and functional aspects of brain regions [42]. By integrating the complementary strengths of the AAL, CC and EZ atlases to capture different aspects of brain structure and function, our approach aims at the comprehensive analysis needed for accurate ASD diagnosis.
In a typical fMRI-based ASD diagnosis pipeline [9,16], atlas data are converted to functional connectivity matrices, which measure the degree of synchronised activity between different brain regions based on the time series of resting-state fMRI brain imaging data. While two of the atlases (AAL and EZ) focus on anatomical locations, they also provide a foundational framework for understanding the structural context within which functional connectivity occurs. Accordingly, functional connectivity matrices from the AAL and EZ atlases are utilised in the ASD diagnosis literature [16,9,40].

Task Formulation
Let us consider $X = \{X_{\text{fMRI}}, X_{\text{demog}}\}$, where $X_{\text{fMRI}}$ and $X_{\text{demog}}$ are the input fMRI images and demographic/phenotypic information, respectively, and $Y = \{Y_1, Y_2\}$, where $Y_1$ and $Y_2$ are the ASD and TC classes, respectively. The goal is to develop a binary classifier $F$ to predict whether a sample input $x_i \in X$ can be classified as $y_i \in Y$.

Overall Framework
Figure 1 illustrates the overall framework of our ASD diagnosis system, which is fundamentally segmented into two distinct stages: the ASD/TC classification phase and the high-quality subset selection phase. We first calculate functional connectivity matrices on three brain atlases, followed by feature selection using F-score. To diagnose ASD, we use deep learning with a stacked sparse denoising autoencoder (SSDAE) and multi-layer perceptron (MLP), followed by a weighted ensemble. In the case of extracting high-quality data, we train the classifier with the NYU subset and predict on the whole dataset.

Data Preprocessing
As a data quality control step, we exclude all samples with missing fMRI time series. The missing values in the demographic data of the samples are imputed using the mean value of all available data of the corresponding categories [43]. In this way, the overall distribution of the data is preserved, and the imputed values are likely to be close to the true values, assuming that the missingness is completely random. While we used the mean value to impute missing data, a more sophisticated approach, such as MICE [44], could be explored in future work.
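The mean imputation described above can be sketched in a few lines. This is a minimal illustration rather than the released implementation: it assumes missing entries are encoded as `None`, and the function name `impute_mean` and the toy values are hypothetical.

```python
from statistics import mean


def impute_mean(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


# Toy full-scale IQ column with one missing entry (hypothetical values).
fiq = [100.0, None, 110.0, 120.0]
imputed = impute_mean(fiq)
print(imputed)
```

Because every gap receives the same value, the column mean is unchanged while its variance shrinks, which is one reason a multiple-imputation method may be preferable when many values are missing.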
We extract the mean time series of ROIs for each sample. We use parcellated regions as our targeted ROIs to extract voxel-level connectivity features. For each atlas, a respective connectivity matrix is formed, which is then condensed into a vector before being inputted into our model. The primary feature we use to differentiate between ASD and TC subjects is functional connectivity. We compute Pearson correlation coefficients between the time series of each pair of brain regions to produce a connectivity matrix:
$$\mathrm{PCC}(u, v) = \frac{E(uv) - E(u)E(v)}{\sqrt{E(u^2) - E(u)^2}\,\sqrt{E(v^2) - E(v)^2}}$$
where $\mathrm{PCC}(u, v)$ stands for the Pearson correlation coefficient between the time series of two brain regions $u$ and $v$, and $E(\cdot)$ is the mathematical expectation. These coefficients range from $-1$ to $1$; coefficients near $1$ indicate a strong positive correlation, while those near $-1$ indicate a strong negative correlation between the time series of two brain regions. For example, we obtain a $200 \times 200$ symmetric correlation matrix in the case of the CC atlas because it divides the brain into 200 regions. This functional connectivity matrix is used as a feature to classify subjects into ASD and TC groups.
To eliminate the duplicated values in the matrix, we take the upper triangle of the symmetric matrix as the original feature representation of the subject. We then flatten this triangle by collapsing it into a one-dimensional vector of features:
$$S = \frac{N(N-1)}{2}$$
where $S$ is the dimension of the flattened vector, and $N$ is the number of regions in the atlas.
For example, we get a 1-D vector with 19,900 features for each sample for the CC atlas. Following this computational process, for every individual subject, we obtain three functional connectivity feature representations based on the respective three atlases.
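The connectivity features described above can be sketched as follows. This is a dependency-free illustration under stated assumptions: the function names `pearson` and `connectivity_vector` are hypothetical, the time series are synthetic, and the diagonal (self-correlation) is assumed to be excluded, which is what yields 19,900 features for 200 regions.

```python
import math
import random


def pearson(u, v):
    """Pearson correlation between two equal-length time series."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)


def connectivity_vector(series):
    """Flatten the strict upper triangle of the ROI-by-ROI correlation matrix.

    `series` holds one time series per ROI; the result has N * (N - 1) / 2 entries.
    """
    n = len(series)
    return [pearson(series[i], series[j]) for i in range(n) for j in range(i + 1, n)]


# Synthetic stand-in for ROI mean time series: 4 ROIs, 50 time points each.
rng = random.Random(0)
rois = [[rng.gauss(0, 1) for _ in range(50)] for _ in range(4)]
features = connectivity_vector(rois)
print(len(features))    # 4 * 3 / 2 = 6 features
print(200 * 199 // 2)   # feature count for a 200-ROI atlas
```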

Feature Selection
Using F-score [45], we rank all features in descending order to prioritise those with the highest discriminative power between ASD and TC subjects. Mathematically, the F-score is a ratio of variance between groups to variance within groups, quantifying each feature's discriminatory power. A higher F-score indicates a larger difference in means relative to variability, suggesting better class distinction. We compute the F-score values for all features in the dataset. Let $x_k$ represent the training vectors, with $k$ ranging from 1 to $m$, $n_+$ being the count of positive instances, and $n_-$ the count of negative instances. The F-score for the $i$th feature is computed as follows:
$$F(i) = \frac{\left(\bar{r}_i^{(+)} - \bar{r}_i\right)^2 + \left(\bar{r}_i^{(-)} - \bar{r}_i\right)^2}{\dfrac{1}{n_+ - 1}\sum_{k=1}^{n_+}\left(r_{k,i}^{(+)} - \bar{r}_i^{(+)}\right)^2 + \dfrac{1}{n_- - 1}\sum_{k=1}^{n_-}\left(r_{k,i}^{(-)} - \bar{r}_i^{(-)}\right)^2}$$
where $\bar{r}_i$, $\bar{r}_i^{(+)}$ and $\bar{r}_i^{(-)}$ denote the average of the $i$th feature for the complete dataset, the positive subset and the negative subset, respectively, and $r_{k,i}^{(+)}$ and $r_{k,i}^{(-)}$ are the $i$th feature of the $k$th positive and negative instance, respectively. The numerator captures the discriminatory power between the positive and negative subsets, while the denominator encapsulates the dispersion within each subset.
We retain the top 15% of ranked features. To determine this feature retention range, we conducted a series of tests with selections ranging from 5% to 50% based on different parcellations. The optimal parameters were determined based on the classifier's average performance across different parcellations. Details of this experiment are presented in Appendix A.
The dimensions of the post-preprocessing data vary across different atlases; for example, features from the CC atlas have a dimensionality of 19,900, while the AAL and EZ atlases have a significantly smaller dimensionality of 6,670. To account for these differences and to maintain approximately 15% feature retention, we adopted an adaptive feature selection approach. We retained the top 1,000 ranked features for the AAL and EZ atlases and the top 3,000 ranked features for the CC atlas.
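A minimal sketch of F-score ranking and top-k selection follows. It assumes the standard two-class F-score (squared deviations of the class means from the overall mean, divided by the sum of the within-class sample variances) and label 1 for the positive class; the function names and toy data are hypothetical, and a feature with zero within-class variance would need special handling that is omitted here.

```python
def f_score(pos, neg):
    """F-score of one feature from its values in the positive and negative class."""
    n_pos, n_neg = len(pos), len(neg)
    mean_pos = sum(pos) / n_pos
    mean_neg = sum(neg) / n_neg
    mean_all = (sum(pos) + sum(neg)) / (n_pos + n_neg)
    between = (mean_pos - mean_all) ** 2 + (mean_neg - mean_all) ** 2
    within = (sum((x - mean_pos) ** 2 for x in pos) / (n_pos - 1)
              + sum((x - mean_neg) ** 2 for x in neg) / (n_neg - 1))
    return between / within


def top_features(samples, labels, keep):
    """Indices of the `keep` highest-scoring features, best first."""
    n_features = len(samples[0])
    scores = []
    for i in range(n_features):
        pos = [s[i] for s, y in zip(samples, labels) if y == 1]
        neg = [s[i] for s, y in zip(samples, labels) if y == 0]
        scores.append(f_score(pos, neg))
    ranked = sorted(range(n_features), key=lambda i: scores[i], reverse=True)
    return ranked[:keep]


# Toy data: feature 0 separates the classes, feature 1 does not.
X = [[0.9, 0.1], [1.1, -0.2], [1.0, 0.3], [-1.0, 0.2], [-0.9, -0.1], [-1.1, 0.0]]
y = [1, 1, 1, 0, 0, 0]
print(top_features(X, y, keep=1))
```

With the retention counts stated above, `keep` would be 1,000 for the AAL and EZ atlases and 3,000 for the CC atlas.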

The MADE-for-ASD Model
The MADE-for-ASD model consists of two primary components, as illustrated in Figure 2. The initial component is an unsupervised learning stage employing a Stacked Sparse Denoising Autoencoder (SSDAE). Each layer is trained independently, capturing key variations in the data, with sparsity constraints promoting sparse, distributed representations.
The second component is a supervised learning stage using a Multi-Layer Perceptron (MLP). The parameters learned from the SSDAE are transferred to the MLP's first two layers, followed by fine-tuning to enhance performance on the ASD classification task.

Stacked Sparse Denoising Autoencoder
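The SSDAE training objective combines a reconstruction loss term and a sparsity penalty term, defined as:
$$J = \frac{1}{m}\sum_{i=1}^{m} L\left(x^{(i)}, \hat{x}^{(i)}\right) + \beta \sum_{j} \mathrm{KL}\left(\rho \,\|\, \hat{\rho}_j\right)$$
where $L(x^{(i)}, \hat{x}^{(i)})$ denotes the reconstruction loss for the $i$th of $m$ training samples, $\beta$ is the sparsity weight and $\mathrm{KL}(\rho \,\|\, \hat{\rho}_j)$ is the Kullback-Leibler divergence between the target sparsity $\rho$ and the average activation $\hat{\rho}_j$ of hidden unit $j$. The Kullback-Leibler divergence can be computed as:
$$\mathrm{KL}\left(\rho \,\|\, \hat{\rho}_j\right) = \rho \log\frac{\rho}{\hat{\rho}_j} + (1 - \rho)\log\frac{1 - \rho}{1 - \hat{\rho}_j}$$
We employ an SSDAE with two hidden layers for unsupervised pre-training, as shown in the left part of Figure 2. Optimal model performance on the validation set is achieved using reconstruction loss (mean squared error). The input and output layers have $N$ features, where $N$ is the number of input features. The configurations and parameters are detailed in Table 1.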
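The sparse autoencoder objective can be sketched numerically as below. This is an illustration only: it assumes a mean-squared reconstruction error averaged over samples and a Bernoulli Kullback-Leibler sparsity penalty summed over hidden units, and the values chosen for `rho` and `beta` are placeholders rather than the settings in Table 1.

```python
import math


def kl_sparsity(rho, rho_hat):
    """KL divergence between Bernoulli(rho) and Bernoulli(rho_hat)."""
    return (rho * math.log(rho / rho_hat)
            + (1 - rho) * math.log((1 - rho) / (1 - rho_hat)))


def sparse_ae_loss(inputs, reconstructions, mean_activations, rho, beta):
    """Mean squared reconstruction error plus the beta-weighted sparsity penalty."""
    mse = sum(
        sum((x - r) ** 2 for x, r in zip(xs, rs)) / len(xs)
        for xs, rs in zip(inputs, reconstructions)
    ) / len(inputs)
    penalty = sum(kl_sparsity(rho, a) for a in mean_activations)
    return mse + beta * penalty


# Two toy samples, their reconstructions and the mean activation of 3 hidden units.
inputs = [[1.0, 0.0], [0.0, 1.0]]
reconstructions = [[0.9, 0.1], [0.2, 0.8]]
mean_activations = [0.05, 0.20, 0.10]
loss = sparse_ae_loss(inputs, reconstructions, mean_activations, rho=0.05, beta=0.1)
print(loss)
```

The penalty is zero only when every hidden unit's average activation equals the target `rho`, so units that fire more often than the target raise the loss.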

Transfer Learning to MLP
The SSDAE knowledge is transferred to an MLP with three hidden layers. The first two layers, with 1,000 and 500 units, inherit the SSDAE parameters, while the third layer is initialised with random weights. Four demographic features are added to the third layer as additional input. We add these into the last two layers rather than the original input because, alongside the much larger number of connectivity features, they would otherwise carry little weight. The right section of Figure 2 depicts the supervised training stage.
Fine-tuning adjusts the MLP weights to minimise prediction error in supervised tasks. The output layer consists of two units representing the likelihood of ASD or TC, using a softmax activation function to normalise the output distribution so that the outputs represent the probabilities of belonging to each class. The configurations of the MLP parameters are shown in Table 1.
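The late fusion of demographic features and the softmax output can be sketched as follows. This is one plausible reading rather than the released implementation: the helper names, the tiny layer sizes and all weights are hypothetical, and concatenating the four demographic values with a hidden representation is an assumption about how the additional input enters the network.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dense(inputs, weights, biases):
    """Affine layer producing one output per row of `weights`."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def classify(hidden, demographics, weights, biases):
    """Concatenate hidden features with demographics, then apply a softmax layer."""
    fused = list(hidden) + list(demographics)
    return softmax(dense(fused, weights, biases))


hidden = [0.2, -0.1, 0.4]             # stand-in for a pretrained hidden representation
demographics = [0.5, 1.0, 0.0, 0.3]   # stand-in for scaled age, sex, handedness, IQ
weights = [[0.1] * 7, [-0.1] * 7]     # two output units: ASD and TC
biases = [0.0, 0.0]
probs = classify(hidden, demographics, weights, biases)
print(probs)
```

Fusing the demographic values after most of the dimensionality reduction means four inputs compete with a few hundred hidden units rather than with thousands of connectivity features.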

Weighted Ensemble Voting
Ensemble learning combines several individual models that solve the same problem. This work adopts the bagging ensemble approach, which involves creating multiple subsets of the original data, training a model on each and combining their predictions, often through majority voting, to form a final prediction [46].
Voting can be hard or soft. In hard voting, each classifier in the ensemble votes for one class label, and the class label that gets the majority of votes is predicted. Our method is based on soft voting, as shown in Figure 3, which predicts based on the probabilities for each class label. We assign a weight to each classifier according to its individual classification accuracy among all three classifiers:
$$w_i = \frac{\mathrm{Acc}_i}{\sum_{k=1}^{3} \mathrm{Acc}_k}$$
where $\mathrm{Acc}_i$ is the classification accuracy of classifier $F_i$. We then compute the sum of the products of weights and probabilities corresponding to each class across all classifiers. Subsequently, the class with the highest cumulative value is designated as the output category:
$$\hat{y} = \arg\max_{j} \sum_{i=1}^{3} w_i \, P(F_i(x) = j)$$
where $P(F_i(x) = j)$ is the predicted probability that instance $x$ belongs to class $j$ according to classifier $F_i$.
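Weighted soft voting over the three atlas-specific classifiers can be sketched as below. The weighting rule is an assumption: each weight is taken to be a classifier's accuracy divided by the sum of all accuracies. The function names, accuracies and probabilities are hypothetical.

```python
def accuracy_weights(accuracies):
    """Normalise per-classifier accuracies so the weights sum to one."""
    total = sum(accuracies)
    return [a / total for a in accuracies]


def weighted_soft_vote(probabilities, weights):
    """Return the class with the largest weighted sum of probabilities.

    probabilities[i][j] is the probability classifier i assigns to class j.
    """
    n_classes = len(probabilities[0])
    scores = [sum(w * p[j] for w, p in zip(weights, probabilities))
              for j in range(n_classes)]
    return max(range(n_classes), key=lambda j: scores[j])


# Hypothetical validation accuracies of the three atlas-specific classifiers.
weights = accuracy_weights([0.70, 0.72, 0.68])
# Hypothetical class probabilities for one subject (class 0, class 1).
probabilities = [[0.60, 0.40], [0.30, 0.70], [0.45, 0.55]]
print(weighted_soft_vote(probabilities, weights))
```

In this toy case the first classifier favours class 0, but the weighted sum across all three favours class 1, which shows how soft voting can overrule a single confident model.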
As for the evaluation metrics, we report classification performance in terms of sensitivity and specificity in addition to accuracy because of the unbalanced nature of the dataset.
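For reference, the three reported metrics can be computed from binary predictions as in this short sketch, which assumes label 1 denotes ASD and label 0 denotes TC; the function name and example labels are hypothetical.

```python
def confusion_metrics(y_true, y_pred):
    """Return (accuracy, sensitivity, specificity) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    sensitivity = tp / (tp + fn)   # share of ASD samples correctly identified
    specificity = tn / (tn + fp)   # share of TC samples correctly identified
    return accuracy, sensitivity, specificity


acc, sens, spec = confusion_metrics([1, 1, 1, 0, 0], [1, 1, 0, 0, 1])
print(acc, sens, spec)
```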

Dataset
We use rs-fMRI and demographic data from ABIDE I [11], the first phase of ABIDE, with 505 autistic individuals and 530 Typical Controls (TCs). We include four key demographic features of the subjects: age (years), sex (male/female), handedness (the dominant hand; left/ambiguous/right) and full-scale IQ (overall intellectual ability). See Table B.6 and Table B.7 in Appendix B for the distribution of demographic information across ASD and TC classes and the number of missing demographic data, respectively. We also employ a subset of the T1w MRI images from the ABIDE I site, NYU Langone Medical Center, to test our proposed methodology. This subset encompasses 182 subjects, including 78 ASD and 104 TC subjects. We selected this specific subset because it contains the largest amount of data among all participating sites in ABIDE I. Additionally, previous studies [27,29] have reported ASD classification performance using the NYU subset, which allows us to compare our model's performance with theirs. Refer to Table B.8 in Appendix B for details of the NYU subset.

Preprocessing
Preprocessed ABIDE I data with four different pipelines are released under the Preprocessed Connectomes Project [47]. The number of samples varies across the resultant datasets from different pipelines. It is important to select a specific pipeline for a fair comparison with the prior works. Most of the prior works [12,13,22,19,14,20,21,17,16,15,9] leveraged the CPAC (Configurable Pipeline for the Analysis of Connectomes) pipeline. A few studies, however, experimented with other pipelines, such as CCS (Connectome Computation System) by Dvornek et al. [18] and DPARSF (Data Processing Assistant for Resting-State fMRI) by Mahler et al. [37]. The CPAC pipeline includes several operations, such as slice timing correction, motion correction and voxel intensity normalisation. In addition, the nuisance signal was removed utilising 24 motion parameters, CompCor with five components, low-frequency drifts (linear and quadratic trends), and the global signal as regressors. Functional data underwent band-pass filtering (0.01 to 0.1 Hz) and spatial registration using a non-linear method to a template space (MNI152). We, therefore, leverage the CPAC pipeline, and refer to the resulting data as the ABIDE I CPAC data. Although Di Martino et al. [11] released the ABIDE I dataset with 1,112 samples, preprocessing and quality control reduced this number to 1,035, which is consistent with prior studies [13,22,19,20]. For the NYU subset, our preprocessing and quality control reduced the number of samples from 182 to 175.

Subset Selection
Previous research [27,29] demonstrated better performance with the NYU subset than the whole ABIDE I dataset. This suggests that the NYU subset may possess lower noise levels and higher data quality. Based on this, we hypothesise that using the NYU subset for training could improve the model's ability to select high-quality data from the entire ABIDE I dataset. We, therefore, train our model on the NYU dataset using the CC atlas, save this model and repurpose it as a selector. We apply this pretrained model to classify ASD/TC instances across the whole ABIDE I dataset, which correctly classifies 645 out of 1,035 samples (around 62.3%). We separate these correctly classified samples to create a new subset, which potentially has higher quality. We refer to it as our proposed subset throughout this paper. Analyses on the quality of our proposed subset compared to the whole dataset and the NYU subset are presented in Appendix C.
While this subset selection aims to improve data quality, it may also lead to the selection of easier examples, potentially inflating performance metrics. Furthermore, this method may introduce exclusion criteria that could affect the generalisability of the results. A thorough analysis of the discarded subset to assess whether it excludes TC, ASD, or both is warranted, which we leave for future work.
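The selection rule amounts to keeping the samples that the selector classifies correctly. The sketch below illustrates only that filtering step; the function name, sample identifiers and predictions are hypothetical, and the selector's predictions are taken as given.

```python
def select_correct(sample_ids, y_true, y_pred):
    """Keep the identifiers of samples whose predicted label matches the true label."""
    return [sid for sid, t, p in zip(sample_ids, y_true, y_pred) if t == p]


# Hypothetical identifiers, true labels and selector predictions.
ids = ["s01", "s02", "s03", "s04"]
y_true = [1, 0, 1, 0]
y_pred = [1, 1, 1, 0]
subset = select_correct(ids, y_true, y_pred)
print(subset)

# Retention rate stated in the text: 645 of 1,035 samples.
print(round(645 / 1035 * 100, 1))
```

Because the filter uses the true labels, any evaluation on the retained samples is conditioned on label information, which is the source of the inflation risk noted above.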

ASD vs TC Classification
We evaluate our MADE-for-ASD model through experiments on different input data, ablation studies and comparative analyses with state-of-the-art methods. Following the approach of prior ASD diagnosis studies using the ABIDE I dataset [15,17,21,20,19,13,14,12,22], we employ 10-fold cross-validation to ensure robust evaluation and mitigate overfitting. This technique splits the dataset into ten equal-sized subsets, with the model being trained on nine subsets and tested on the remaining subset. This process is repeated ten times, and the classification scores are averaged across all runs, providing a cross-validated performance measure.
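The fold construction can be sketched as follows. This is a plain shuffled split under stated assumptions: the function name and seed are hypothetical, no stratification by class or site is applied, and with 1,035 samples the folds differ in size by at most one sample.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k shuffled folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_indices(1035, k=10))
print(len(splits))
print(sorted({len(test) for _, test in splits}))   # fold sizes in use
```

Each sample appears in exactly one test fold, so averaging the ten test scores uses every sample once for evaluation.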

Classification Result
Table 2 reports accuracy, sensitivity and specificity on the whole ABIDE I dataset, our proposed subset and the NYU subset.
The accuracy and specificity on the whole dataset are lower than those on the other subsets, which can be attributed to the noise and complexity in the whole dataset. The classifier reached its peak performance on the NYU subset, which can be attributed to the high quality of NYU data. This is in line with Kong et al. [27], who also found higher classification performance on the NYU subset.

Comparative Analysis
We compare our model, trained on the whole ABIDE I CPAC data, with prior works, which yields encouraging insights (Table 3). (1) Our model achieved an accuracy score of 75.20%, which is a boost of 4.4 percentage points over the best prior work [20] on 1,035 samples. (2) Even without considering the use of the same amount of input data, our model outperforms the 74.53% accuracy on 860 samples reported by Deng et al. [9]. (3) By incorporating ensemble techniques and introducing sparsity in the AE, our model outperforms other AE and DNN-based approaches [20,19,13] by a margin of at least 4.4 percentage points.
We further compare our work with similar studies on certain subsets of the ABIDE I dataset (Table 4). Similar to the whole dataset, our proposed MADE-for-ASD model achieved the SOTA result on the NYU subset, showing an accuracy of 96.40%. Besides, our model achieved an accuracy of 88.71% on the new subset generated by our data selection. This suggests that our model performs consistently well on different data subsets.
This result also underlines the potential influence of data quality on classification outcomes. The performance difference across subsets reinforces the need for data selection in such classification tasks. Our strategy of creating a new subset from the ABIDE I dataset also appears effective. The accuracy on this subset signifies the potential of using selective strategies for data preparation to boost model performance.

Ablation Study
As can be seen in Table 5, we carry out a series of ablation experiments to better understand the individual contributions of different components of the MADE-for-ASD model. These experiments include removing the ensemble voting and training on single atlases or combinations of atlases, removing the F-score-based feature selection, and omitting the demographic information during classification.
When we remove the voting and use only a single atlas or a combination of atlases, the classification accuracy decreases. When removing data of the AAL and EZ atlases, the accuracy drops to 73.42%. It drops further to 71.20% after removing the CC and EZ atlases. Removing the CC and AAL atlases, the accuracy reaches the lowest at 68.74%. When we remove only the EZ atlas, the accuracy increases somewhat. This shows that integrating multiple atlases via voting contributes to the improved performance of our model. The different atlas information can complement each other, thereby enhancing the robustness of the classification task.
Removing the F-score-based feature selection decreases the accuracy to 72.80%. The drop in accuracy demonstrates the significance of feature selection. By focusing on the most informative features and discarding the noisy ones, our model can better discriminate between classes. Lastly, removing the demographic information leads to an accuracy of 73.50%. This indicates that including demographic information can aid in the classification process, further improving the decision boundaries.
Although the inclusion of demographic data leads to better performance, these data should be carefully utilised in practical ML-based ASD diagnosis systems so that the system is not biased towards particular demographics. Future work should focus on quantifying and mitigating such biases in ML models predicting ASD. Furthermore, exploring demographic differences within the pipeline, such as sex differences in the top ROIs revealed by our algorithm, represents an important avenue for future research. Similarly, evaluating performance on stratified demographic subsets would help ensure that our model performs consistently across different groups.
While our fMRI-based approach with deep learning provides robust diagnostic performance for ASD, it is important to acknowledge the interpretability challenges associated with these methods. Traditional behaviour-based approaches offer specific insights into the symptomatic nuances of participants, which can enhance the substance of clinical reports. In contrast, our method, despite its diagnostic accuracy, may offer fewer insights into individual symptomatic specificities. This limitation is two-fold: first, there is a need for further research to improve the interpretation of fMRI data and its correlation with autism symptoms; second, the deep learning model used in our study offers limited transparency, making it challenging to explain individual predictions based on input features. Future work should enhance fMRI interpretability for autism to bridge the gap between diagnostic accuracy and clinical insight.

Visualisation of Top ROIs
We use F-score-based feature selection to identify and rank the most influential features. The initial 3D images from the CC atlas are clustered based on the spatial position of ROIs, and the resulting cluster centres are treated as the spatial coordinates of the ROIs. We then plot these cluster centres on an existing three-dimensional brain fMRI image from the dataset. Finally, we rank the appearance frequencies of the associated ROIs, thereby identifying the top 10 most frequently occurring regions of interest (Figure 4).
As can be seen, the precuneus, typically recognised as a central node of the default mode network (DMN) [48], has a crucial impact on ASD classification. This region is involved in self-referential thought and social cognition, which are often disrupted in autistic individuals [49]. Several studies have suggested that DMN connectivity can be associated with a neurophenotype of ASD [16]. For example, Chen et al. [28] highlighted substantial contributions from default mode and somatosensory areas toward ASD diagnosis. Similarly, Abraham et al. [12] found distinguishing connections in the DMN related to ASD diagnosis using the ABIDE dataset.
The anterior cingulate/ventromedial prefrontal cortex, a region with established connections to autism [50], was also prominent in ASD classification. This area is associated with emotional regulation and decision-making, processes that are often impaired in ASD [11]. Furthermore, anomalies in the medial prefrontal cortex node of the DMN have been shown to detect social deficits in autistic children [51]. Additionally, the left parietal cortex was highlighted for ASD prediction, aligning with the lateralised activation seen in this region in autistic individuals [52]. These findings from the visualisation of top ROIs corroborate previous studies that highlight the importance of these regions in ASD pathology [16].
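The frequency ranking described at the start of this subsection can be illustrated with a minimal sketch. It assumes that each selected feature is one entry of the flattened upper triangle of an ROI-by-ROI functional connectivity matrix, so that every feature maps to a pair of ROIs; this indexing scheme and the function name `top_rois_by_frequency` are illustrative assumptions, not details confirmed by the paper.

```python
from collections import Counter

import numpy as np


def top_rois_by_frequency(selected_features, n_rois, top_k=10):
    """Rank ROIs by how often they appear among the selected connectivity features.

    Assumption: feature index f refers to the f-th entry of the upper triangle
    (diagonal excluded) of an n_rois x n_rois connectivity matrix.
    """
    rows, cols = np.triu_indices(n_rois, k=1)
    counts = Counter()
    for f in selected_features:
        # Each connectivity feature involves two ROIs; count both.
        counts[int(rows[f])] += 1
        counts[int(cols[f])] += 1
    # List of (roi_index, frequency) pairs, most frequent first.
    return counts.most_common(top_k)
```

For a 200-ROI atlas, the function would be called with `n_rois=200` and the indices of the features retained by the F-score selection; the returned ROI indices could then be mapped to atlas labels and coordinates for plotting.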

Conclusion
ASD is a neurodevelopmental disorder characterised by a spectrum of symptoms and impairments. Common features of ASD include challenges with social interaction and communication, alongside a preference for repetitive behaviours and interests, highlighting the diverse nature of this condition. This paper proposes a novel ASD diagnosis framework, MADE-for-ASD, involving a weighted ensemble of DNNs using multi-atlas brain fMRI data. Through the F-score-based feature selection method, we obtain discriminative features that offer valuable visual insights into significant ROIs associated with ASD. They shed light on the interplay of different features and their respective contribution towards ASD diagnosis. This would help clinicians and researchers gain a more intuitive understanding of how different brain regions contribute to ASD. Our model consists of an SSDAE and an MLP, complemented by integrating demographic information, which significantly enhances our model's predictive capabilities. Furthermore, our method of selecting high-quality classification subsets serves to reduce dataset noise and improve data quality.
Our model achieves state-of-the-art (SOTA) accuracy on both the whole ABIDE I dataset (an improvement of 4.4 percentage points) and its subset. Such an imaging-based ASD prediction system can benefit patients, families and healthcare systems worldwide through objective, efficient, non-intrusive and early diagnosis.
Appendix A. Feature Selection using F-score

Figure 1: Overall framework of our ASD/TC classification workflow. We first calculate functional connectivity matrices on three brain atlases, followed by feature selection using F-score. To diagnose ASD, we use deep learning with a stacked sparse denoising autoencoder (SSDAE) and multi-layer perceptron (MLP), followed by a weighted ensemble. In the case of extracting high-quality data, we train the classifier with the NYU subset and predict on the whole dataset.

Figure 2: Overall architecture and training workflow of the deep networks of the MADE-for-ASD model. Knowledge from stacked sparse denoising autoencoders is transferred to a multi-layer perceptron for ASD diagnosis. Here, $X$ refers to the input data, $Y = \{Y_1, Y_2\}$ are the ASD and TC classes, and $W$ refers to the weight parameter.

Figure 4: Top 10 most significant regions of interest towards ASD diagnosis using fMRI CC200 atlas.

Figure A.5 illustrates our feature selection process based on F-score, showing that the classification accuracy was highest when using the top 15% of the total features. A histogram of F-scores for each atlas can reveal the decay and range of feature values, which should be investigated in future work.
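As an illustration of this selection step, the sketch below scores each feature and keeps the top 15%. It assumes the commonly used Fisher-style F-score for two classes (squared deviations of the class means from the overall mean, divided by the sum of the within-class sample variances); whether this matches the exact definition used in our pipeline is an assumption, and the function names are hypothetical.

```python
import numpy as np


def f_score(X, y):
    """Per-feature F-score for binary labels y in {0, 1} (assumed definition)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    pos, neg = X[y == 1], X[y == 0]
    mean_all = X.mean(axis=0)
    # Between-class separation of the two class means around the overall mean.
    numerator = (pos.mean(axis=0) - mean_all) ** 2 + (neg.mean(axis=0) - mean_all) ** 2
    # Within-class spread, using the unbiased sample variance of each class.
    denominator = pos.var(axis=0, ddof=1) + neg.var(axis=0, ddof=1)
    return numerator / (denominator + 1e-12)  # small constant avoids division by zero


def select_top_fraction(X, y, fraction=0.15):
    """Return indices of the highest-scoring features and all scores."""
    scores = f_score(X, y)
    k = max(1, int(round(fraction * X.shape[1])))
    top = np.argsort(scores)[::-1][:k]  # feature indices, best first
    return top, scores
```

In practice, the scores would be computed on the training folds only and the resulting indices applied to the held-out data, so that feature selection does not leak label information.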

Figure A.5: ASD classification accuracy with different feature selection ranges in our F-score-based feature selection. The presented accuracy refers to the average accuracy over all three parcellations.
utilised the AAL, Craddock 200 and Craddock 400 atlases. Their experiment indicates that Craddock 200 can outperform Craddock 400 in fMRI-based ASD diagnosis. Hence, we selected the Craddock 200 atlas, hereinafter referred to as the CC atlas.

AE) consists of input, hidden and output layers, with the hidden layer containing fewer neurons. Once feature representation is obtained in an AE, it can be used to train a new AE, leading to a stacked AE with multiple layers. A Stacked Sparse Denoising Autoencoder (SSDAE) incorporates sparsity constraints and noise addition to the input data for regularisation, enhancing model robustness. The objective function includes a reconstruction

Table 1: Parameter configurations of the Autoencoders (AE1, AE2) and Multi-Layer Perceptron (MLP) of the MADE-for-ASD model. Here, 'n' represents the number of input features.

Table 2: Classification performance of MADE-for-ASD model in terms of accuracy, sensitivity and specificity on the whole ABIDE I data and its subsets.

Table 3: Comparison with previous studies on whole ABIDE I CPAC data in terms of classification accuracy.

Table 4: Comparison with previous studies on a subset of the ABIDE I dataset in terms of classification accuracy.

Table 5: Classification accuracy in ablation study by removing atlases, feature selection and demographic information. Negative sign (−) refers to the removal of the corresponding component.

Table B.6: Distribution of demographic information of the ABIDE I subjects.