Objectively Quantifying Pediatric Psychiatric Severity Using Artificial Intelligence, Voice Recognition Technology, and Universal Emotions: Pilot Study for Artificial Intelligence-Enabled Innovation to Address Youth Mental Health Crisis

Background: Providing psychotherapy, particularly for youth, is a pressing challenge in the health care system. Traditional methods are resource-intensive, and there is a need for objective benchmarks to guide therapeutic interventions. Automated emotion detection from speech, using artificial intelligence, presents an emerging approach to address these challenges. Speech can carry vital information about emotional states, which can be used to improve mental health care services, especially when the person is suffering.

Objective: This study aims to develop and evaluate automated methods for detecting the intensity of emotions (anger, fear, sadness, and happiness) in audio recordings of patients’ speech. We also demonstrate the viability of deploying the models. Our model was validated in a previous publication by Alemu et al with limited voice samples. This follow-up study used significantly more voice samples to validate the previous model.

Methods: We used audio recordings of patients, specifically children with high adverse childhood experience (ACE) scores. The average ACE score was 5 or higher, which places these children at the highest risk for chronic disease and social or emotional problems; only 1 in 6 have a score of 4 or above. Structured voice samples were collected by having patients read a fixed script. In total, 4 highly trained therapists classified audio segments by scoring the intensity level of each of 4 emotions. We experimented with various preprocessing methods, including denoising, voice-activity detection, and diarization. Additionally, we explored various model architectures, including convolutional neural networks (CNNs) and transformers. We trained emotion-specific transformer-based models and a generalized CNN-based model to predict emotion intensities.
Results: The emotion-specific transformer-based model achieved a test-set precision and recall of 86% and 79%, respectively, for binary emotional intensity classification (high or low). In contrast, the CNN-based model, generalized to predict the intensity of 4 different emotions, achieved test-set precision and recall of 83% for each.

Conclusions: Automated emotion detection from patients’ speech using artificial intelligence models was found to be feasible and achieved a high level of accuracy. The transformer-based model exhibited better performance in emotion-specific detection, while the CNN-based model showed promise in generalized emotion detection. These models can serve as valuable decision-support tools for pediatricians and mental health providers to triage youth to appropriate levels of mental health care services.

International Registered Report Identifier (IRRID): RR1-10.2196/51912
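For readers less familiar with the reported metrics, the following is a minimal sketch of how precision and recall are computed for a binary high or low intensity classifier. The function name and the example labels are hypothetical and are not taken from the study's data or code.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical labels: 1 = high intensity, 0 = low intensity.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Precision is the share of segments predicted as high intensity that the raters also labeled high; recall is the share of rater-labeled high-intensity segments that the model found.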

-The proposed approach of using Gaussian Mixture Model, EM, and RNNs was presented but no justification was given as to why they expect these approaches will be accurate.
-There is little preliminary analysis of the data to suggest that this architecture will actually work.

Suggestions: None
The panel assigned the following overall ranking to this proposal: Competitive
The summary was read by/to the panel and the panel concurred that the summary accurately reflects the panel discussion.

Review:
In the context of the five review elements, please evaluate the strengths and weaknesses of the proposal with respect to intellectual merit.
This SBIR Phase I project will develop a voice-based tool to classify emotional distress severity for children and adolescents (families with low socio-economic status) at-risk of trauma due to Adverse Childhood Experiences. The research work will focus on testing and validating two machine learning models to identify and predict emotional disorder severity.
Strength:
+ Emotion analysis from voice is an interesting research area for machine learning.
Weaknesses:
-The major research and innovation of this project is to use machine learning to classify emotional disorder severity from voice, which is task 1.4. However, only a very small part of the proposal description is spent on discussing the technical approach. Many problems and concerns are not addressed, e.g., are there any existing methods? Why will the proposed method work? The Gaussian Mixture Model and LSTM are mainly used in speech recognition (sequence data), which converts speech to text. Why will they work for detecting emotion?
-As the main innovation and risk is in developing machine learning algorithms, it is disappointing and confusing that so much discussion and budget are devoted to data collection, therapist training, and personnel with general management and startup experience.
-Unlike in speech recognition, personal talking style can affect an automatic emotion identification tool significantly. Personalization should be a factor in building such a tool.
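One common way to reduce the effect of individual speaking style, shown here only as an illustration and not as something the proposal or the reviewer specifies, is to normalize each acoustic feature against the speaker's own baseline. The sketch below assumes a hypothetical per-speaker list of pitch values; the function and variable names are invented for the example.

```python
from statistics import mean, pstdev


def normalize_per_speaker(features_by_speaker):
    """Z-score each speaker's values against that speaker's own mean and std."""
    normalized = {}
    for speaker, values in features_by_speaker.items():
        mu = mean(values)
        sigma = pstdev(values)
        if sigma == 0:
            # A constant feature carries no within-speaker variation.
            normalized[speaker] = [0.0 for _ in values]
        else:
            normalized[speaker] = [(v - mu) / sigma for v in values]
    return normalized


# Hypothetical pitch values (Hz) for two speakers with different baselines.
pitch_hz = {
    "speaker_a": [100.0, 110.0, 120.0],
    "speaker_b": [200.0, 220.0, 240.0],
}
z = normalize_per_speaker(pitch_hz)
```

After normalization, the two speakers' values are on the same scale even though their raw pitch ranges differ, so a downstream classifier sees deviation from each speaker's habitual voice and not the absolute level.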
In the context of the five review elements, please evaluate the strengths and weaknesses of the proposal with respect to broader impacts.
Strength:
+ The proposed work will have a positive impact on society.
+ The project has recruited two Georgia-based behavioral health organizations as pilot sites for product development.

Review (PI Copy)
Proposal:1938206 PI Name:Alemu, Yared Printed from eJacket: 03/23/23
Weaknesses:
-The commercialization and marketing plan lacks detail and is not convincing.
-It is disappointing and unrealistic to focus hiring on sales while a solid product is still under development.
-The CVs do not follow NSF requirements.
Please evaluate the strengths and weaknesses of the proposal with respect to any additional solicitation-specific review criteria, if applicable

Summary Statement
The proposed work has a positive impact on health and society in general. However, there are serious flaws with the technical discussion, project plan and priority setting, innovation, personnel, and budget in this project.

Review:
In the context of the five review elements, please evaluate the strengths and weaknesses of the proposal with respect to intellectual merit.
The goal of the project is to develop an automated way to detect emotional distress and functional impairments in children and adolescents. The PIs will leverage a software app called TQI App that has already been developed to collect the data, and analyze the voice samples using ML techniques to detect children in need and at high-risk.

Strengths:
+ The idea of extracting distress signals using voice samples is quite interesting.
+ The availability of a channel for collecting data using the TQApp is a strength.
+ Close ties to organizations (with support letters) such as Family Ties and Georgia Hope provide them with adequate patients to collect appropriate data to carry out the proposed work.
+ The team seems to be extremely strong, with the PI being an expert on psychological disorders in both research and implementation. It also contains seasoned business and technological leadership coupled with several experts who are leaders spanning technology and health data analysis.
Weaknesses:
-The proposed approach of using Gaussian Mixture Model, EM, and RNNs was presented, but no justification was given as to why they expect these approaches will be accurate. There is also no preliminary analysis of the data to suggest that this architecture will actually work.
In the context of the five review elements, please evaluate the strengths and weaknesses of the proposal with respect to broader impacts.
The project has the potential to detect and provide care to the highly vulnerable (and large) population of children of substance-abusing parents, who are often prone to mental and psychological disorders if left untreated. The technological advances made by this project will also help address the shortage of trained mental health professionals in providing mental care to families etc.
Please evaluate the strengths and weaknesses of the proposal with respect to any additional solicitation-specific review criteria, if applicable

Summary Statement
I really liked the proposal. They have identified a nice problem that can be potentially solved with ML/AI techniques. The team is also extremely strong.
They also have strong ties with organizations to access data required to carry out the project. The only remaining risk is whether the proposed technology solution for analysing voice samples will be accurate, a risk which I think is worth funding.

Review:
In the context of the five review elements, please evaluate the strengths and weaknesses of the proposal with respect to intellectual merit.
The investigators propose to develop a machine learning algorithm that detects clinically relevant emotional distress in speech samples from at-risk youth receiving mental health and family preservation services. The algorithm to detect and predict emotional disorder severity is likely to be preliminary and may require additional data and funding to develop a more robust classification system.
+ Benchmarks and evaluation metrics are included.
-The innovation appears to be modest and is not clearly articulated. There is a great deal of prior work on emotion detection that appears to be overlooked.
-The approach in general is not rigorous.
-The PI has limited publications and a limited research track record. The speech processing expertise appears to lie outside the company.
-The bio-sketches should be in NSF format.
In the context of the five review elements, please evaluate the strengths and weaknesses of the proposal with respect to broader impacts.
The proposed study, if successfully executed, will have a broader impact on community health and families.
The broader impacts on other disciplines are not highlighted.
Please evaluate the strengths and weaknesses of the proposal with respect to any additional solicitation-specific review criteria, if applicable
-It is not clear how the revenue is calculated.

NATIONAL SCIENCE FOUNDATION
Phase I: Automated Emotional Distress Severity Classification Using Speech Analytics and SFSS for SUD and OUD-Related ACE and Trauma