Sensing technologies and machine learning methods for emotion recognition in autism: Systematic review

Background: Human Emotion Recognition (HER) has been a popular field of study in recent years. Despite the great progress made so far, relatively little attention has been paid to the use of HER in autism. People with autism are known to face problems with daily social communication and the prototypical interpretation of emotional responses, which are most frequently exerted via facial expressions. This poses significant practical challenges to the application of regular HER systems, which are normally developed for and by neurotypical people. Objective: This study reviews the literature on the use of HER systems in autism, particularly with respect to sensing technologies and machine learning methods, so as to identify existing barriers and possible future directions. Methods: We conducted a systematic review of articles published between January 2011 and June 2023 according to the 2020 PRISMA guidelines. Manuscripts were identified by searching the Web of Science and Scopus databases. Manuscripts were included when they were related to emotion recognition, used sensors and machine learning techniques, and involved children, young people, or adults with autism. Results: The search yielded 346 articles. A total of 65 publications met the eligibility criteria and were included in the review. Conclusions: Studies predominantly used facial expression techniques as the emotion recognition method. Consequently, video cameras were the most widely used devices across studies, although a growing trend in the use of physiological sensors has been observed lately. Happiness, sadness, anger, fear, disgust, and surprise were the most frequently addressed emotions. Classical supervised machine learning techniques were primarily used at the expense of unsupervised approaches or more recent deep learning models.


Introduction
Autism spectrum disorder (ASD) is a neurodevelopmental condition characterized by deficits in communication and social interaction, and by a lack of understanding of emotions. It affects circa 1% of the population and can be detected in the first years of life [1]. One of the key reasons for this emotional misunderstanding is the inability of people with autism to comprehend prototypical feelings and emotions, which directly affects their social communication. It is generally accepted that existing approaches are not definitive. Many of these studies deal with biased data; recognition of emotions such as happiness or fear was only marginally impaired in autism, and the generalizability of the findings from the currently available data remains unclear [2,3]. Furthermore, HER algorithms primarily rely on facial cues, overlooking other important aspects such as body language, vocal tone, and contextual and situational factors that would improve the accuracy of the algorithms [4]. The works in [5] and [6] describe new tools as well as computational models to assist people with autism in understanding and operating in the socioemotional world around them. Some findings reveal that children with autism spectrum condition have residual difficulties in this aspect of empathy. The authors of [7] concluded that the relations between particular emotions and human body reactions have long been known, but that many uncertainties remain in selecting measurement and data analysis methods. Moreover, a great number of the HER models used in autism are based on data collected from neurotypical people [8]. Be that as it may, the use of general HER models in autism-related applications poses a number of challenges yet to be addressed, which demand special attention from the scientific community. While there exists a great bulk of systematic reviews addressing the technologies and methods used for emotion recognition in general [7,9-11], very few focus specifically on its use in autism. In fact, existing
systematic reviews in this direction are either centred on a specific technology such as eye-tracking [12], robots [13], and wearables [14], or on particular methods like deep learning [15]. Hence, a comprehensive systematic review focusing on the state of the art in sensing technologies and machine learning methods for emotion recognition in autism is presented here. The results of this review will contribute to improving the current emotion recognition techniques used in autism studies, encourage new research on conditions of the autism spectrum disorder that have been only marginally investigated to date, and promote the use of physiological methods, in addition to traditional behavioural methods, as potential emotion recognition modalities in autism. The primary objective of this review was to determine the trends, advances, and challenges in sensing technologies and machine learning methods for emotion recognition in autism. To that end, this review aimed to answer the following research questions: (1) What type of sensor technology has been used for emotion recognition in autism? (2) What type of machine learning techniques are most commonly used for emotion recognition in people with autism? (3) What are the main challenges in the use of emotion recognition technologies in people with autism? To the best of our knowledge, there are many reviews on autism, on HER, and on machine learning methods, but very little has been written about how these different areas complement one another. This is the main novelty of this review. Our study covers all age groups, unlike most studies, which focus on children. We raised a specific question to identify the main challenges in the use of emotion recognition technologies in autism. We also report privacy and security aspects, including the use of informed consent or approval by ethics committees. Furthermore, we offer a more recent view of the state of the art, as our search reaches up to June 2023.

Methods
The PRISMA 2020 (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines [16] were followed to perform a systematic review of the literature on sensing technologies and machine learning methods for emotion recognition in autism. The specific methodology is described in the following sections.

Eligibility criteria
This review focused on studies that dealt with sensor technology and machine learning techniques for emotion recognition in children, young people, and adults with autism. We did not restrict study location, sample size, gender, age, autism type, type of emotion, emotion recognition modality, devices and sensors, or algorithms. Studies were eligible for inclusion in this review if they had three characteristics: 1) they were related to emotion recognition; 2) they used sensors and machine learning techniques; and 3) they involved children, young people, or adults with autism.
Other eligibility criteria included: 1) published between January 2011 and June 2023; 2) written in English; 3) scientific article published in a journal or in conference proceedings; and 4) research domain related to computer science or engineering.
Studies were ineligible if affective technology was used in the therapy and treatment of patients with autism or in an educational environment. Therefore, we excluded studies related to: 1) robotic treatments or therapies; and 2) social interaction and education.

Information sources
We conducted electronic searches for eligible studies within the reference databases of Scopus and Web of Science.The search was conducted from 1st January 2011 to 30th June 2023.

Search strategy
"Autism" and "emotion recognition"/"recognition of emotion" were selected as the primary concepts to be searched. In addition, synonyms of the term "autism", namely "autistic", and of the term "emotion", namely "mood" and "affect", were also considered, as they are quite often used interchangeably in this research area. Limits were also applied to the search strategy based on the eligibility criteria. We selected papers published between 2011 and 2023, written in English, and published in computer science or engineering journals or proceedings. The resulting queries eventually run on Scopus and Web of Science are shown below.

Selection process
The records retrieved from the databases and hand search were imported to the Mendeley Web Library, which was used as a primary tool to navigate through both records and reports. Duplicate records were manually identified by cross-checking title and abstract and then removed by three reviewers (ZC, OB, CV). These reviewers also screened each record and each report retrieved, assessed their eligibility, and eventually selected the final set of studies to be included in the review after reaching a majority consensus.

Data collection process
All reviewers (ZC, OB, JM, AP, DG, JP, SA, CV) participated in the review and assessment of the included studies. The studies were evenly distributed among three groups of reviewers according to their affiliation.
We used a cloud-based collaborative spreadsheet (Google Spreadsheet) to collect data from the included studies. The document consisted of a state-of-the-art matrix where each row represented a study and the columns indicated the data items to be analyzed. Each group of reviewers had to fully screen and analyze the papers assigned to them and fill in the information in the corresponding columns of the matrix. Periodic meetings were held to harmonise terminology and overcome potential discrepancies in the assessment process. Reviewers worked independently to extract the information.

Data items
The columns defined in the collaborative spreadsheet corresponded to the outcomes for which data were sought. The specific columns defined were: study name, year of publication, type of article, research goals, subject condition (autism type), emotion recognition modality, dataset (collection or use of), description of the dataset (if applicable), emotions sensed, devices used for the data collection, machine learning techniques, validation methods, study sample (size, type), study length, performance results, study outcomes, privacy and security, and challenges and future work.

Study selection
A total of 371 records was identified from the literature search: the search in Scopus yielded 206 records, while 165 records were obtained from Web of Science. 71 duplicate records were removed before screening. After deduplication, 300 records remained and were screened based on title and abstract. 112 records were excluded and 188 reports were sought for retrieval. 13 reports could not be retrieved and the remaining 175 reports were assessed for eligibility. 110 reports were excluded according to the eligibility criteria and the remaining 65 reports were included in the analysis. The detailed workflow is shown in Fig. 1.
In relation to the sensed body regions or signals, the majority of studies, 71% (46/65), use physical data, i.e., data sensed from the external parts of the body, mostly the face. 20% (13/65) of the studies exploit the inner body, including physiological signals such as electroencephalography (EEG) and electromyography (EMG), or psychoacoustic signals [19,24,25,27,47,63,65,66,69,71,73,74,77]. The neurophysiological approaches provide valuable insights into the neural and muscular correlates of emotional states. The rationale for considering psychoacoustics lies in the fundamental role of voice in the recognition of emotions within human interactions. By delving into the nuances of vocal expression, researchers aim to deepen their understanding of how emotions are conveyed and perceived through auditory cues, contributing to a more comprehensive exploration of emotion recognition within the context of ASD. Around 8% (5/65) combine both physical and physiological signals [33,35,50,62,70]. Less than 2% (1/65) of the studies did not provide enough information in this respect [17] (Table A.3).
One day was the minimum study duration [19] and 140 days the maximum [48]; the remaining studies provided no duration information. This absence of duration details emphasizes the need for improved reporting standards to ensure a comprehensive understanding of the temporal aspects of HER research in autism.
Table A.5, Fig. 5, and Fig. 6 show the emotions used in the analyzed studies for training/validation and for test, respectively. According to the listed results, approximately half of the studies leverage the six universal emotions, often relying on or producing publicly available datasets accessible to the scientific community. This choice facilitates meaningful comparisons between these studies, given their shared use of a standardized set of emotions. In contrast, the remaining studies opt for or create ad-hoc datasets, employing a set of emotions distinct from the universal ones. As a result, conducting comparisons between these approaches becomes more intricate due to the varied and specialized nature of the emotional categories used in these datasets.
Although some studies did not mention the emotions used in the training and validation of the HER models [28,29,36,37,44,45,67,70,75,78], a prevailing trend is the consistent use of the same set of emotions across the training, validation, and test phases in most studies. Exceptions to this are [25-27,43,47,58-60,73,74,76,77,80], which used different sets of emotions for training and validation than for test, representing 20% (13/65) of the studies. Employing a different set of emotions for test introduces valuable diversity, reflecting the model's capacity to recognize a broader spectrum of emotional expressions beyond its training data. This approach enhances the robustness and real-world applicability of HER models by challenging them with unseen emotion instances during evaluation.

Devices and sensors
Two generations of devices and sensors are identified within the time frame considered for this review, corresponding to the periods 2011-2014 and 2015-2023, respectively. Around 11% (7/65) of the studies correspond to the period 2011-2014, which is characterized by the use of images and audio as the primary data sources. 57% (4/7) of these studies use a so-called first generation of devices consisting of webcams, headphones, and microphones [36,73,74,77]. To facilitate the labelling of the user's data, some controls were incorporated into the systems in 43% (3/7) of the aforementioned studies, including control knobs [74,77] or numeric keypads [79], which were easily handled by users with autism. This period marked the initial steps in using technology for autism research, establishing a foundation for future studies.
The advent of facial and body tracking technologies was also leveraged in this field. Such technologies were used in 16% (9/58) of the studies of the so-defined second generation. Devices like Kinect and Intel RealSense enabled improved facial and body tracking, enhancing the interaction with and analysis of autistic behaviours. Kinect devices were incorporated into various works [17,45,48,56] due to the off-the-shelf availability of an RGB camera, a depth sensor, and a microphone. Recently, some works have started to use the Intel RealSense device, which has characteristics similar to Kinect [41]. In the same vein, Tobii devices were proposed for eye tracking [57] or for head and eye tracking [18]. Standard cameras were also used to record images for eye tracking [75] and pose tracking [48].
Physiological sensors such as EEG were incorporated in 5% (3/58) of the second-generation studies. In [19,47], EEG data is collected using a headset with electrodes placed on the participants' scalps. In [46], the authors use a commercial EEG device (Emotiv) to collect data from the frontal, temporal, and posterior brain regions. The use of wearable devices was rare: only 3% (2/58) of the studies included them. A Microsoft Band 2 was used in [56], while Shimmer sensors were used in [47]. In an attempt to incorporate augmented reality features, Microsoft HoloLens and Google Glass were also used in [52] and [54], respectively. Full details are provided in Table A.6 and Fig. 7.

Models and performance
This subsection summarises the findings concerning machine learning techniques, performance and metrics, validation methods, and the number of data samples.
The use of deep learning is confined to recent works, constituting 31% (20/65) of the studies [20,25,28-34,40-43,63-66,68,69,71]. This suggests an untapped opportunity, as earlier research may not have fully harnessed the capabilities of deep learning for complex pattern recognition tasks in emotion recognition in autism. Notably, some of the most recent works leverage deep learning techniques, including convolutional neural networks, highlighting the emerging potential for improved performance in emotion recognition. However, it is crucial to acknowledge a bias towards supervised learning, indicating a gap in exploring unsupervised or semi-supervised methods. These alternative approaches could offer valuable insights, especially in scenarios where labelled data is scarce or challenging to acquire. Exploring a broader spectrum of deep learning methodologies could enhance the versatility and effectiveness of emotion recognition models.
The performance of the emotion recognition models varies significantly among the studies, which can be attributed to differences in target emotions, sensor data types, machine learning techniques, and dataset instances. This variation points to challenges in directly comparing study outcomes and establishing standardized benchmarks. Studies can be categorized into three groups according to performance level. First, 28% (18/65) of the studies are ranked as high performance (i.e., accuracies greater than 90%), most usually developing an offline evaluation based on datasets collected under controlled conditions [8,21,23]. At the other end lie more ambitious and challenging solutions based on emerging sensor technologies, leading to performances below 80%. The remaining studies have not sufficiently described their performance results and could not be classified into any group.
The studies make use of a variety of metrics to evaluate model performance, including accuracy, sensitivity, and specificity. This range of metrics provides a fair understanding of model performance, especially for studies handling dataset imbalance. Concerning the metrics used to estimate model performance, around 57% (37/65) of the studies chose accuracy [8,17-24,28-30,32,33,35,38-43,46,48-50,52-55,62-64,66-70]. Other studies used the unweighted average recall to deal with dataset imbalance more effectively [27,30,65,74]. Sensitivity has also been used in some studies [51,59,72], although, by omitting specificity, these still fail to assess the prominence of true negatives. A few studies [18,22,41] rely on the combination of accuracy, sensitivity, and specificity to more fully reflect the performance of their recognition systems.
The studies adopted different cross-validation techniques (e.g., ten-fold, five-fold, leave-one-out), which provide a rigorous approach to model validation, ensuring the reliability of the findings. Cross-validation is used in 22% (14/65) of the studies: ten-fold cross-validation in [19,21,29,49,53,54,60], five-fold cross-validation in [31,33,46], eight-fold cross-validation in [46], and leave-one-out cross-validation in [18,28,74]. More exceptionally, other approaches such as leave-one-subject-out with random splits [48], random-split cross-validation [32,52], and train-test split or hold-out [26,38,42,43,63,68,69] are used. A more detailed description can be found in Table A.7 and Fig. 8. This figure illustrates the variety of machine learning methods employed in the reviewed papers, emphasizing the growing prevalence of deep learning due to its robust yet intricate models. However, a noteworthy 12% (8/65) of the papers lack information on the techniques utilized. Encouragingly, there is an expectation that this trend will shift, leading to more papers sharing their models' code in repositories to enhance the scientific community's knowledge.
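To make the k-fold schemes mentioned above concrete, the following minimal sketch shows the fold-rotation logic in pure Python. It is illustrative only and not taken from any reviewed study; the dummy scorer stands in for an arbitrary train-and-evaluate routine.

```python
# Minimal sketch of k-fold cross-validation, the validation scheme most
# often reported in the reviewed studies. Pure Python; the fold logic is
# illustrative, not drawn from any particular study.

def k_fold_indices(n_samples, k):
    """Split sample indices 0..n_samples-1 into k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validate(samples, k, train_and_score):
    """Average the test score over k train/test rotations."""
    folds = k_fold_indices(len(samples), k)
    scores = []
    for test_idx in folds:
        train_idx = [j for f in folds if f is not test_idx for j in f]
        scores.append(train_and_score(train_idx, test_idx))
    return sum(scores) / k

if __name__ == "__main__":
    # Dummy scorer: proportion of the data used for training in each fold.
    data = list(range(10))
    avg = cross_validate(data, 5, lambda tr, te: len(tr) / len(data))
    print(avg)  # each fold trains on 8 of 10 samples -> 0.8
```

In a real study, `train_and_score` would fit the emotion recognition model on the training indices and return its accuracy (or recall) on the held-out fold.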

Information privacy and security
Despite the relevance of ensuring privacy and security policies in this field, only 22% (14/65) of the studies acknowledge them sufficiently [18,23,24,29,36,39,41,47,54,58,61,67,72,79]. Three of these studies [23,72,79] followed the 1964 Declaration of Helsinki, a formal statement of ethical principles published by the World Medical Association (WMA) to guide the protection of human participants in medical research [83]. The other 11 studies mentioned that they either had the consent of the relatives of the people with autism or that their work was approved by the ethics committee of the respective universities or other institutions. All details are provided in Table A.8.
Upon analyzing the obtained results, the majority of studies that addressed privacy and security aspects had obtained consent from family members, or the research was approved by ethics committees of universities or institutions; very few studies adhered to the Declaration of Helsinki. It is thus evident that privacy and security aspects are insufficiently considered in the majority of studies. Studies lacking pertinent details may not have adhered to ethical protocols, thereby raising concerns about the protection of participants.

Findings
The great majority of the analyzed studies referred to the autism spectrum disorder in different ways; namely, the two terminologies "autism" and "all kinds of autism" were used interchangeably to refer to this condition. This shows a lack of unification in the use of this terminology by the scientific community of the HER field. More importantly, more research should be directed towards mild and high-functioning autism, as well as other conditions of the autism spectrum like Asperger syndrome, which according to the results are only marginally considered.
While a majority of studies indicate the type of autism considered in their research, a significant number do not. Omitting information on the type of autism considered can hinder accurate interpretation and reduce the applicability of findings, leading to potential misinterpretations and limiting the generalizability of research outcomes. Additionally, the absence of this specification may impede meaningful comparisons across studies, hindering the overall advancement of knowledge in the field of emotion recognition in autism.
The lack of proper specification of gender aspects in over half of the reviewed studies can have several consequences. It may lead to an incomplete understanding of how gender influences the outcomes of the research, potentially masking gender-related patterns or differences in emotion recognition within the context of ASD. It also hinders the generalizability of the findings, as gender may well have a relevant impact on emotion recognition.
The number of participants involved in the studies varies remarkably, thus limiting the comparability of the results. While it is generally encouraged in the area to include as many participants as possible, the number of involved individuals should be justified via an appropriate statistical power analysis. At the very least, studies should attempt to guarantee a number of participants matching the average reported in the state of the art.
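As a pointer to what such a power analysis involves, the sketch below applies the standard normal-approximation formula for the per-group sample size of a two-sample comparison. The effect size, alpha, and power values are illustrative assumptions, not figures taken from the reviewed studies.

```python
# Hedged sketch of a statistical power analysis: per-group sample size for a
# two-sided, two-sample test under the normal approximation. Illustrative
# parameter values only.
import math
from statistics import NormalDist

def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """n per group = 2 * ((z_{1-alpha/2} + z_{power}) / d)^2, rounded up."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value
    z_beta = NormalDist().inv_cdf(power)           # power quantile
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)

if __name__ == "__main__":
    # Medium effect (Cohen's d = 0.5), alpha = 0.05, power = 0.8
    print(sample_size_per_group(0.5))  # -> 63 participants per group
```

Tools such as G*Power or statsmodels offer refined versions of this calculation (e.g., with t-distribution corrections), which would be preferable in practice.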
The emotion recognition modality most predominantly used is the one based on facial expressions, followed by speech. The reason for favouring the measurement of physical variables over physiological ones might be that emotions are socially expressed and perceived via physical cues, such as facial and visual expressions and voice tone. ASD is, however, sometimes characterized by a lack of expressiveness. Hence, it might be a good choice to observe and analyze physiological behaviour in this population in addition to the physical one.
A major part of the studies used the six basic universal emotions (anger, sadness, happiness, disgust, surprise, and fear), considered a standard for HER systems. The reasons may have to do with the fact that such emotions represent the most common set expressed by people in their daily lives [82]. Moreover, using emotions similar to the ones used in prior works facilitates cross-comparison and reproducibility, so that better conclusions can be drawn. Several studies used combinations of the six basic emotions extended with very specific emotions (neutral, calm, nervous, scared, curious, excited, sleepy, contempt, joy, and contentment). From this set of emotions, "neutral" stands out as the most frequent one, possibly due to its prevalence in daily life.
The devices and sensors employed in the period 2015-2023 are notably more advanced than the ones used in the period 2011-2014. IP and infrared cameras, face or body tracking sensors, and partially wearable sensors or robots are used in the second period, while more traditional systems such as webcams and microphones were used during the first. From our analysis we can conclude that most works use a single technology to assess emotions in autism. The main reason for considering a sole device could be to simplify the sensor setup and lessen the intrusiveness sometimes felt by users of these technologies. Combining multiple technologies to assess emotions may potentially lead to more accurate and robust decisions, as shown in the literature for neurotypical populations [84]. However, as we found in a former study of ours [85], people with autism (adolescents) show reluctance to using multiple devices, and in particular some specific ones such as infrared cameras.
All the algorithms used are of the supervised kind, which was expected since most of the reviewed studies are aimed at diagnosing autism. A limitation observed for such approaches is that they tend to be learned on general-purpose emotion recognition datasets, most likely due to a clear lack of autism-specific datasets. General-purpose datasets could serve well for boosting some machine learning models; however, they are of limited use when it comes to recognising the emotions expressed by people with autism. One goal for the community could then be the creation of new datasets particularly devised for autism applications. Moreover, we did not find any study exploring the use of unsupervised methods. These methods allow for the creation of clusters, which could help identify people with similar patterns within a similar part of the spectrum. This is of much interest especially for a disorder like autism, which is quite diverse per se.
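To illustrate the unsupervised direction suggested above, the following sketch runs a tiny k-means clustering over hypothetical per-participant feature vectors, which could group individuals with similar emotional-response patterns. The features and data are invented for illustration; any real application would use validated feature extraction.

```python
# Illustrative k-means clustering sketch in pure Python. The 2-D feature
# vectors below are hypothetical per-participant summaries, not real data.

def kmeans(points, k, iters=20):
    """Cluster 2-D points into k groups; returns a label per point."""
    centroids = [list(p) for p in points[:k]]  # deterministic init
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid (squared distance).
        for i, (x, y) in enumerate(points):
            dists = [(x - cx) ** 2 + (y - cy) ** 2 for cx, cy in centroids]
            labels[i] = dists.index(min(dists))
        # Recompute each centroid as the mean of its assigned points.
        for c in range(k):
            members = [points[i] for i in range(len(points)) if labels[i] == c]
            if members:
                centroids[c] = [sum(v) / len(members) for v in zip(*members)]
    return labels

if __name__ == "__main__":
    # Two hypothetical response profiles (e.g., mean arousal vs. valence).
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(kmeans(data, 2))  # -> [0, 0, 0, 1, 1, 1]
```

Production work would rather use a library implementation (e.g., scikit-learn) with proper initialization and cluster-count selection; this sketch only conveys the grouping idea.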
The performance results have been shown to vary among studies. High performances are obtained in a number of studies; however, most of these studies do not describe the evaluation method in sufficient detail, thus hindering the validity of the reported results. Cross-validation and accuracy metrics are the most widely used for evaluating the performance of emotion recognition models [86]. A minority of the studies characterize the performance of their systems more comprehensively using other metrics such as sensitivity and specificity. In order to avoid the effects of data bias, future research in this area is encouraged to consider more robust metrics such as the F-score [87]. The number of data samples is also key to determining the relevance of the reported results and, according to this review, only a minority of the studies appear to use a relevant sample set. This somewhat limits the validity of some of the results reported in the reviewed studies.
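The metrics discussed above can all be derived from confusion-matrix counts, as the sketch below shows for the binary case. The counts are invented for illustration and chosen so that accuracy looks flattering on imbalanced data while the F-score exposes the weakness.

```python
# Sketch of evaluation metrics computed from raw confusion-matrix counts
# (binary case). The example counts are hypothetical.

def metrics(tp, fp, fn, tn):
    """Return accuracy, sensitivity (recall), specificity, and F1-score."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    sensitivity = tp / (tp + fn)          # true-positive rate
    specificity = tn / (tn + fp)          # true-negative rate
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return accuracy, sensitivity, specificity, f1

if __name__ == "__main__":
    # Imbalanced example: accuracy = 0.85 looks high, yet sensitivity is
    # only 0.5 and F1 ~ 0.57, revealing weak minority-class performance.
    acc, sens, spec, f1 = metrics(tp=10, fp=5, fn=10, tn=75)
    print(round(acc, 2), round(sens, 2), round(spec, 2), round(f1, 2))
```

Reporting this full set of numbers, rather than accuracy alone, is precisely what only a minority of the reviewed studies currently do.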
Privacy and security aspects have been only partially addressed, and only by a minority of the studies. It should be noted that these studies work with sensitive data, and it is imperative to guarantee the protection of participants in medical research, especially when it comes to people with neurodevelopmental disorders. Presumably, the studies that gave no details on this matter may not have followed any ethical protocol, or perhaps simply neglected to report it in the manuscript. While the former is a more serious issue than the latter, we believe this aspect must be improved considerably in the future, and information privacy and security should be compulsorily addressed in all studies of this nature.

Challenges and opportunities
From the previous findings, a number of important challenges and opportunities are identified which, in our opinion, should be considered when designing, developing, and using emotion recognition technologies for people with autism.
One such challenge has to do with the heterogeneity of the autism spectrum disorder. The fact that autism is a spectrum disorder entails that people with autism have a wide range of abilities, difficulties, and preferences. Regular emotion recognition technologies may not account for this heterogeneity, as they often rely on standardized models and assumptions about the emotional expressions of neurotypical persons. Hence, we consider it important to build systems devised for each specific subtype of the disorder and, where possible, to favour the development of personalized approaches that consider the unique characteristics of each person with autism. A way to materialize this idea could be to use transfer learning, in which an existing emotion recognition model trained on a large dataset from a heterogeneous cohort of individuals is used to learn a personalized model by tuning the former with new data from a particular individual or group of subjects pertaining to a specific autism subtype.
Another important challenge relates to the nonverbal communication variability linked to the disorder. People with autism may have atypical patterns in facial expressions, body language, and vocal tone. This review underscores that current emotion recognition technologies, predominantly reliant on visual and auditory cues, may struggle to accurately interpret these atypical communication styles. To overcome this, the creation of new algorithms tailored to the specific nonverbal communication of individuals with autism is proposed, potentially involving the development of expert-validated datasets that capture this variability [88,89]. While engaging individuals with autism in this process is ideal, an alternative involves using actors to mimic these cues based on expert instructions.
Many people with autism have sensory sensitivities, which can affect their tolerance of certain stimuli, such as bright lights, loud sounds, or touch. As a result, the use of emotion recognition technologies that involve sensory input, like vivid displays, loudspeakers, or tactile sensors, may cause discomfort or distress for some individuals with autism. Hence, it is important to make sure that the technologies are adaptable to the sensory needs, and particularly to the preferences, of the user. Emotion recognition systems should be designed to be flexible and compatible enough to operate with the sensor modalities chosen by the user. One way to achieve this is to develop ensemble learning models combining multiple individual models, each running on data from a different sensor modality. By following this approach, the resulting emotion recognition model can easily adapt to the absence of one or more sensor modalities and still operate, although the accuracy of the system would normally be lower.
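A minimal sketch of that modality-tolerant ensemble follows: one probability distribution per modality-specific model, averaged over whichever modalities the user has enabled. The modality names and probabilities are hypothetical.

```python
# Sketch of a late-fusion ensemble that degrades gracefully when the user
# declines a sensor modality. All modality names and outputs are invented.

def ensemble_predict(modality_probs, enabled):
    """Average per-emotion probabilities over the enabled modalities only."""
    active = [modality_probs[m] for m in enabled if m in modality_probs]
    if not active:
        raise ValueError("no enabled modality produced a prediction")
    emotions = active[0].keys()
    return {e: sum(p[e] for p in active) / len(active) for e in emotions}

if __name__ == "__main__":
    # Hypothetical outputs of three modality-specific models.
    probs = {
        "face":  {"happy": 0.7, "sad": 0.3},
        "voice": {"happy": 0.6, "sad": 0.4},
        "eda":   {"happy": 0.2, "sad": 0.8},  # e.g., electrodermal activity
    }
    # User declines the camera: ensemble still works on the remaining two.
    result = ensemble_predict(probs, enabled=["voice", "eda"])
    print({e: round(p, 2) for e, p in result.items()})  # -> {'happy': 0.4, 'sad': 0.6}
```

Weighted averaging or a learned meta-classifier could replace the plain mean, but the simple version already shows how the system keeps operating, at presumably lower accuracy, when a modality is absent.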
Another relevant challenge emphasizes the contextual nature of emotions, particularly pertinent in the case of autism, where social nuances may pose difficulties for solely facial or vocal emotion recognition systems. To address this, integrating alternative sensing options, such as physiological cues, is suggested. However, the broader context, including situational cues, personal history, and individual preferences, is crucial for enhancing recognition accuracy. To capture this intricate information, incorporating virtual agents or chatbots in regular interactions with individuals with autism is proposed as a means to gather comprehensive data for more effective emotion recognition models.
As previously highlighted, emotion recognition technologies introduce significant ethical considerations concerning privacy and data security. The collection and storage of sensitive emotional data carry potential implications for an individual's privacy and autonomy. Therefore, it is paramount to establish rigorous privacy measures, secure informed consent, and guarantee that people with autism retain control over their personal information [90]. Given that many emotion recognition models operate on sensitive data, including video and audio, it is particularly important to communicate transparently the purpose, methodology, and safeguards associated with data collection, and to ensure that these systems fully comply with the relevant regulations. Addressing these challenges requires interdisciplinary collaboration between researchers, technologists, and the autism community. It is crucial to involve individuals with autism and their families in the design and development of these technologies to ensure that they are respectful, inclusive, and beneficial for the target population.
A summary of the principal findings described previously is provided in Fig. 9. The diagram shows the strengths and weaknesses of the reviewed studies on sensing technologies and machine learning methods for emotion recognition in autism, as well as the challenges and recommendations for the research community. Strengths are the aspects that these studies have performed well on and that could be reproduced in future investigations. Weaknesses are matters that went wrong in these studies and could be improved in future research. Challenges are the elements that the scientific community needs to address successfully to boost investigation of this topic. Recommendations are the suggestions for the research community working in this field.

Limitations
As for any other review, and despite having used a rather broad search strategy, it is certainly possible that some interesting studies were left out of our analysis. Namely, the search areas of this systematic review were circumscribed to computer science and engineering, as they are quite large areas and the most relevant for the scope of this study. Nonetheless, it is also possible that some relevant studies indexed in other related categories were filtered out. We conducted a preliminary check of other domains, such as psychology, behavioural sciences, or pediatrics, and did not find relevant studies that would meet the defined criteria. Another possible limitation of this review refers to the reference management software used to process both records and reports. We decided to use Mendeley since all reviewers were quite familiarised with the tool and all three contributing institutions support access to it. One of the major advantages of Mendeley is that it allows the creation of academic research communities through collaborative research [91]. However, other free and open-source reference management software, such as Zotero, could be more appropriate when it comes to pursuing open science principles. Finally, the protocol of the systematic review conducted here could have been pre-registered, for example via PROSPERO; however, the researchers were not aware of this option when the work started. Nonetheless, we recently searched for similar pre-registered protocols and found none, so we presume that no overlap exists between our work and other ongoing reviews in the field.

Conclusion
Automatic emotion recognition constitutes a fairly consolidated research domain in the affective computing field. However, as shown in this review, its application to autism is limited and insufficiently validated. For example, new research should explore the design and development of models that account for the particular characteristics of people with autism, rather than pushing to the limit the generalisation of existing models trained on data collected from neurotypical people. In this regard, collecting and publicly sharing new datasets involving people with autism is of paramount importance, as these are practically nonexistent to date. More effort should also be put into describing in greater detail the characteristics of the samples under study. Gender, age, and autism type are not consistently reported, making it difficult to assess the relevance of the proposed models and hindering the replicability of the studies. In view of the diverse nature of the autism spectrum, it also seems quite reasonable for future studies to explore holistic sensing approaches. Indeed, facial expression recognition is ahead of other solutions in this domain too; however, the disparity and lack of expressiveness among people with autism make it necessary to consider measuring multiple physical and physiological signals. Accomplishing these challenges demands interdisciplinary collaborative teams and appropriate funding from governments and institutions to design, develop, and validate the required technologies from an autism-centric perspective and in realistic settings. We truly hope that reflecting on the positive contributions made by researchers in this field, particularly on the ample room for improvement, can spark great interest from other colleagues in the affective computing field to devote time and effort to boosting this important domain.

Summary table
• Automatic emotion recognition constitutes a consolidated research domain in the affective computing field. Nevertheless, as shown in this review, its application to autism is limited and not sufficiently validated.
• New research should explore the design and development of models that take into account the unique characteristics of individuals with autism, rather than generalising existing models trained on data collected from neurotypical individuals.
• The collection and public sharing of new datasets that include individuals with autism is therefore considered of utmost importance, as they are virtually non-existent to date and should include more detailed characteristics of the samples under study.
• Considering the diverse nature of the autistic spectrum, it also seems quite reasonable to explore the use of holistic detection approaches in future studies.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Main research goals for each reviewed study:
[8]: Create a game to help children with autism cope with their emotional difficulties.
Leo et al. (2015) [49]: Integrate automatic emotion recognition capabilities in a robot-children interaction tool for autism treatment.
Postawka et al. (2019) [17]: Develop emotion recognition methods for behaviour model estimation based on body position.
Jiang et al. (2019) [18]: Identify subjects with/without autism by using facial emotion recognition and eye tracking.
Gao et al. (2015) [19]: Classify emotions through electroencephalography signals.
Heni et al. (2016) [35]: Design an app to recognize both emotions and voice.
Jeon et al. (2015) [50]: Examine how children with autism and neurotypical children understand and interpret emotions.
Fan et al. (2017) [46]: Explore the feasibility of using electroencephalography signals to analyze the facial affect recognition process of individuals with autism.
Joseph et al. (2018) [20]: Propose a new algorithm to detect primary emotions of children with autism in real time using deep learning.
Spicker et al. (2016) [79]: Investigate the differences in perception and categorization of emotional facial expressions of virtual characters between children and adolescents with autism, attention-deficit hyperactivity disorder, and neurotypical ones.
Enticott et al. (2014) [61]: Examine facial emotion recognition of matched static and dynamic images among adolescents and adults with autism and neurotypical individuals.
Santhoshkumar et al. (2019) [21]: Predict basic emotions from children with autism using body movements.
Sivasangari et al. (2019) [22]: Propose a new method for the automatic recognition of emotions.
Tang et al. (2017) [51]: Compare the manual tagging of emotions by teachers/parents with that produced by an automatic system during naturalistic tasks.
Ley et al. (2019) [62]: Evaluate existing tools for emotion recognition based on facial features as well as vocal features in voice interactions.
Chung et al. (2019) [52]: Develop an augmented reality system for the presentation of the emotions detected via a facial expression recognition model.
Smitha et al. (2015) [53]: Determine a feasible method for realizing a portable emotion detector for children with autism.
Daniels et al. (2018) [54]: Build a therapeutic tool for children with autism using wearable technologies to recognize emotions, and estimate how these interpretations differ between children with autism and neurotypical children.
Smitha et al. (2013) [55]: Build a hardware-efficient portable emotion recognizer on an FPGA to aid children with autism during the recognition of emotions.
Tang et al. (2016) [56]: Develop an IoT natural play environment to help neurotypical children understand the emotions of children with autism.
Liliana et al. (2020) [23]: Develop an artificial intelligence model based on psychological knowledge to recognize emotions by analyzing facial expressions.
Ghorbandaei et al. (2018) [72]: Build a robotic platform for reciprocal interaction in which a vision system recognizes the facial expressions of the user through a fuzzy clustering method.
Elamir et al. (2018) [24]: Design an automatic emotion recognition system based on nonlinear analysis of various physiological signals.
Fernandes et al. (2011) [36]: Apply a game-based approach to teach children with autism to recognize facial emotions using real-time automatic facial expression analysis and virtual character synthesis.
Anishchenko et al. (2017) [37]: Develop a tablet application for learning and detecting facial expressions.
Su et al. (2018) [78]: Examine the differences in emotion recognition and eye gaze patterns between children with autism and neurotypical ones using facial expressions.
Arellano et al. (2015) [75]: Assess how abstract emotional facial expressions influence the categorization of emotions by children and adolescents with high-functioning autism.
AndleebSiddiqui et al. (2020) [25]: Recognize emotions via speech analysis using deep learning.
Globerson et al. (2012) [77]: Explore the association between psychoacoustic abilities and vocal emotion recognition in a group of individuals with autism and a matched group of neurotypical individuals.
Sunitha et al. (2014) [73]: Collect a new dataset for emotion recognition from speech.
Bagirathan et al. (2020) [47]: Compare psycho-physiological signals from neurotypical children and children with autism.
Piparsaniyan et al. (2014) [26]: Propose a new method for facial expression recognition.
Marchi et al. (2012) [27]: Classify a number of emotions in different scenarios.
Marchi et al. (2015) [74]: Analyze various existing emotion recognition datasets.
Syeda et al. (2017) [57]: Perform visual face scanning pattern and emotion perception analysis between neurotypical children and children with autism.
Guha et al. (2018) [76]: Assess the reduced complexity in facial expression dynamics of subjects with high-functioning autism relative to their neurotypical peers.
Costescu et al. (2020) [58]: Test the effectiveness of a facial expression recognition instrument in both neurotypical individuals and adolescents with autism.
Tracy et al. (2011) [80]: Show impaired recognition of all basic emotion expressions and more socially complex ones when forced to complete the recognition process in a very brief time frame.
Chung et al. (2020) [52]: Design an e-learning model for students with autism.
Zhang et al. (2016) [60]: Propose a new emotion recognition system based on facial expression images.
Dantas et al. (2022) [38]: Build a game to support the ability of children with autism to recognize and express basic emotions.
Saranya et al. (2022) [28]: Develop a deep learning-based emotion recognition method for improving the rate of detection in children with autism.
Sukumaran et al. (2021) [63]: Identify the presence of ASD and analyze the emotions of children with autism through their voices.
Wang et al. (2021) [30]: Analyze an emotion care system based on big data analysis for autism disorder patient training, where emotion is detected in terms of facial expression.
Banire et al. (2021) [29]: Develop a face-based attention recognition model using geometric feature transformation and time-domain spatial features.
Piana et al. (2021) [39]: Build a system for automatic emotion recognition designed to help children with autism learn to recognize and express emotions by means of their full-body movement.
Ruan et al. (2022) [64]: Design and build automatic computer-based learning tools for children with ASD to improve their performance in Maths.
Milling et al. (2022) [65]: Contribute a voice activity detection (VAD) system specifically adapted to the vocalisations of children with autism.
Chitre et al. (2022) [66]: Model a real-time Speech Emotion Recognition (SER) system that takes audio signals as inputs and detects the emotions based on those signals.
Wang et al. (2022) [67]: Examine the effects of video-based intervention on emotion recognition in four children with ASD with imitation in speech.
Wan et al. (2022) [40]: Propose a novel framework for human-computer and human-robot interaction and introduce a preliminary intervention study for improving the emotion recognition of Chinese children with autism.
Silva et al. (2021) [41]: Develop a system capable of automatically detecting emotions through facial expressions and interfacing them with a robotic platform to allow social interaction with children with ASD.
Praveena et al. (2021) [68]: Recognize and predict face emotion in ASD.
Rojas et al. (2021) [42]: Help people with a degree of difficulty in interpreting emotions so that they can have a normal social interaction through a mobile app in real time.
Karanchery et al. (2021) [43]: Provide a solution to be deployed in learning environments for individuals with ASD to aid the primary caregivers in understanding their emotional states.
Valles et al. (2021) [69]: Develop a speech emotion recognition system to help children with autism better identify the emotions of their communication partner.
DIzicheh et al. (2021) [44]: Present a serious game called EmoAnim that utilizes animations to screen players' emotion recognition capabilities.
[32]: Highlight the significance of image pre-processing in Deep Neural Network models for facial expression recognition to improve training, overall accuracy, and efficacy.
Li et al. (2021) [33]: Introduce a novel way to combine human expertise and machine intelligence for ASD affect recognition via a two-stage schema.
Ghanouni et al. (2021) [45]: Develop a novel motion game to address perspective by incorporating feedback from both children/youth with ASD and their parents.
Zhang et al. (2023) [70]: Develop a novel discriminative few-shot learning method to analyze hour-long video data and explore the fusion of facial dynamics for automatic ASD trait classification.
Talaat (2023) [34]: Develop a real-time emotion recognition system based on deep learning neural networks for youngsters with autism.
Murugaiyan et al. (2023) [71]: Propose a model to help people with ASD understand others' sentiments expressed through speech.

Emotion recognition approach and sensed modality for each reviewed study:
[8]: Facial emotion recognition (facial expression).
Leo et al. (2015) [49]: Facial emotion recognition (facial expression).
Postawka et al. (2019) [17]: Emotion recognition based on activities (not sufficiently described).
Jiang et al. (2019) [18]: Multimodal emotion recognition (facial expression, eye tracking).
Gao et al. (2015) [19]: Brain activity emotion recognition (EEG).
Heni et al. (2016) [35]: Facial emotion recognition (facial expression, voice expression).
Jeon et al. (2015) [50]: Multimodal emotion recognition (facial expression, voice expression).
Fan et al. (2017) [46]: Facial emotion recognition (facial expression).
Joseph et al. (2018) [20]: Facial emotion recognition (facial expression).
Spicker et al. (2016) [79]: Facial emotion recognition (facial expression).
Enticott et al. (2014) [61]: Facial emotion recognition (facial expression).
Santhoshkumar et al. (2019) [21]: Body emotion recognition (body movement).
Sivasangari et al. (2019) [22]: Facial emotion recognition (facial expression).
Tang et al. (2017) [51]: Facial emotion recognition (facial expression).
Ley et al. (2019) [62]: Multimodal emotion recognition (facial expression, voice expression).
Chung et al. (2019) [52]: Facial emotion recognition (facial expression).
Smitha et al. (2015) [53]: Facial emotion recognition (facial expression).
Daniels et al. (2018) [54]: Facial emotion recognition (facial expression).
Smitha et al. (2013) [55]: Facial emotion recognition (facial expression).
Tang et al. (2016) [56]: Facial emotion recognition (facial expression).
Liliana et al. (2020) [23]: Facial emotion recognition (facial expression).
Ghorbandaei et al. (2018) [72]: Facial emotion recognition (facial expression).
Elamir et al. (2018) [24]: Multimodal emotion recognition (EEG, EMG).
Fernandes et al. (2011) [36]: Facial emotion recognition (facial expression).
Anishchenko et al. (2017) [37]: Facial emotion recognition (facial expression).
Su et al. (2018) [78]: Visual emotion recognition (eye tracking).
Arellano et al. (2015) [75]: Facial emotion recognition (facial expression).
AndleebSiddiqui et al. (2020) [25]: Speech emotion recognition (voice expression).
Globerson et al. (2012) [77]: Speech emotion recognition (voice expression).
Sunitha et al. (2014) [73]: Speech emotion recognition (voice expression).
Bagirathan et al. (2020) [47]: Multimodal emotion recognition (EEG, EMG, HR).
Piparsaniyan et al. (2014) [26]: Facial emotion recognition (facial expression).
Marchi et al. (2012) [27]: Speech emotion recognition (voice expression).
Marchi et al. (2015) [74]: Speech emotion recognition (voice expression).
Syeda et al. (2017) [57]: Multimodal emotion recognition (facial expression, eye tracking).
Guha et al. (2018) [76]: Facial emotion recognition (facial expression).
Costescu et al. (2020) [58]: Facial emotion recognition (facial expression).
Tracy et al. (2011) [80]: Facial emotion recognition (facial expression).
Chung et al. (2020) [52]: Facial emotion recognition (facial expression).
Zhang et al. (2016) [60]: Facial emotion recognition (facial expression).
Dantas et al. (2022) [38]: Facial emotion recognition (facial expression).
Saranya et al. (2022) [28]: Facial emotion recognition (facial expression).
Sukumaran et al. (2021) [63]: Speech emotion recognition (voice expression).
Wang et al. (2021) [30]: Facial emotion recognition (facial expression).
Banire et al. (2021) [29]: Facial emotion recognition (facial expression).
Piana et al. (2021) [39]: Body movement emotion recognition (body movement).
Ruan et al. (2022) [64]: Facial emotion recognition (facial expression).
Milling et al. (2022) [65]: Speech emotion recognition (voice expression).
Chitre et al. (2022) [66]: Speech emotion recognition (voice expression).
Wang et al. (2022) [67]: Facial emotion recognition (facial expression).
Wan et al. (2022) [40]: Facial emotion recognition (facial expression).
Silva et al. (2021) [41]: Facial emotion recognition (facial expression).
Praveena et al. (2021) [68]: Facial emotion recognition (facial expression).
Rojas et al. (2021) [42]: Facial emotion recognition (facial expression).
Karanchery et al. (2021) [43]: Facial emotion recognition (facial expression).
Valles et al. (2021) [69]: Speech emotion recognition (voice expression).
DIzicheh et al. (2021) [44]: Facial emotion recognition (facial expression).
Pulido-Castro et al. (2021) [31]: Facial emotion recognition (facial expression).
Arabian et al. (2021) [32]: Facial emotion recognition (facial expression).
Li et al. (2021) [33]: Multimodal emotion recognition (facial expression, voice expression).
Ghanouni et al. (2021) [45]: Facial emotion recognition (facial expression).
Zhang et al. (2023) [70]: Multimodal emotion recognition (facial expression, voice expression, eye tracking, body movement).
Talaat (2023) [34]: Facial emotion recognition (facial expression).
Murugaiyan et al. (2023) [71]: Speech emotion recognition (voice expression).

Devices and sensors (fragment):
Arabian et al. (2021) [32]: Not sufficiently described.
Li et al. (2021) [33]: Not sufficiently described.
Ghanouni et al. (2021) [45]: Vision sensor, Kinect (model not sufficiently described).
Zhang et al. (2023) [70]: Vision, camera (model not sufficiently described).
Talaat (2023) [34]: Not sufficiently described.
Murugaiyan et al. (2023) [71]: Audio, microphone (model not sufficiently described).

Fig. 2. Autism types investigated in each reviewed study. (For interpretation of the colours in the figure(s), the reader is referred to the web version of this article.)

Fig. 3.

Fig. 4. Distribution of the reviewed studies according to the sample size.

Fig. 5. Emotions used for the training-validation of the recognition models in the reviewed studies.

Fig. 6. Emotions used for the testing of the recognition models in the reviewed studies.

Fig. 7. Devices, sensors, and specific models used in the reviewed studies.

Fig. 8. Machine learning techniques, performance and metrics, and validation methods used in each of the reviewed studies.

Fig. 9. Summary of the principal takeaways of the reviewed manuscripts.
• Meeting these challenges requires interdisciplinary collaborative teams and adequate funding from governments and institutions to design, develop, and validate the necessary technologies from an autism-centred perspective and in realistic settings.
• The overall aim is that reflection on the positive contributions made by researchers in this field, in particular on the vast room for improvement, may inspire great interest in other colleagues in the field.

Table A.1
Main research goals for each reviewed study.

Table A.2
Autism types investigated in each reviewed study.

Table A.3
Emotional expressions and sensed body regions for each reviewed study.

Table A.4
Sample characteristics for each reviewed study.

Table A.5
Emotions used for the training-validation and testing of the recognition models in the reviewed studies.

Table A.7
Machine learning techniques, performance and metrics, and validation methods used in each of the reviewed studies.