Assessing Consciousness in Patients With Disorders of Consciousness Using a Musical Stimulation Paradigm and Verifiable Criteria

Numerous studies have shown that musical stimulation can activate corresponding functional brain areas. Electroencephalogram (EEG) activity during musical stimulation can be used to assess the consciousness states of patients with disorders of consciousness (DOC). In this study, a musical stimulation paradigm and verifiable criteria were used for consciousness assessment. Twenty-nine participants (13 healthy subjects, 6 patients in a minimally conscious state (MCS) and 10 patients in a vegetative state (VS)) were recruited, and EEG signals were collected while participants listened to preferred and relaxing music. Fusion features based on differential entropy (DE), common spatial pattern (CSP), and EEG-based network pattern (ENP) features were extracted from EEG signals, and a convolutional neural network-long short-term memory (CNN-LSTM) model was employed to classify preferred and relaxing music. The results showed that the average classification accuracy for healthy subjects reached 85.58%. For two of the patients in the MCS group, the classification accuracies reached 78.18% and 66.14%, and they were diagnosed with emergence from MCS (EMCS) two months later. The accuracies of three patients in the VS group were 58.18%, 64.32% and 62.05%, with two patients showing slight increases in scale scores. Our study suggests that musical stimulation could be an effective method for consciousness detection, with significant diagnostic implications for patients with DOC.


I. INTRODUCTION
Assessment of consciousness in patients with disorders of consciousness (DOC), specifically those in the vegetative state (VS) or minimally conscious state (MCS), is highly important in clinical practice [1]. The Revised Coma Recovery Scale (CRS-R) is widely used in the medical field because of its scientific authority. However, its assessments are subjective, leading to a higher incidence of misdiagnosis [2]. Electroencephalography (EEG) provides objective data directly related to brain activity, and EEG data are inexpensive to obtain. Thus, numerous studies have been conducted to analyze functional brain activity by monitoring EEG signals generated in response to external stimulation. Furthermore, previous studies have demonstrated that EEG is a reliable technique for detecting consciousness in clinical settings [3].
In neuroscience research, auditory stimulation has been shown to effectively activate the corresponding cortex in the brain, and the degree of activation can be determined through neuroimaging [4]. The activation of the cerebral cortex by auditory stimulation is more pronounced in patients in an MCS than in patients in a VS [5], suggesting that the degree of activation produced by auditory stimulation is important for distinguishing between patients in an MCS and those in a VS. Moreover, significant sounds, such as those associated with individuals, may strongly stimulate the corresponding cortex of the brain in patients with DOC [6]. In addition, sounds associated with significant emotional stimulation can increase the sensitivity of clinical tests [7]. These findings suggest that patients with DOC may be more sensitive to sounds that are personally relevant and that the degree of brain activation is correlated with the level of consciousness.
Music, especially personalized music such as preferred music and relaxing music, contributes to the arousal of patients with DOC and may be utilized for diagnostic and prognostic assessments. Hole et al. [8] reported that listening to preferred music reduced pain in patients to a certain extent. Lai et al. [9] reported that relaxing music chosen by patients helps reduce anxiety, thus reducing cardiac workload and oxygen consumption. Heine et al. [10] reported an increase in brain functional connectivity when patients were exposed to their preferred music. These results suggest that music, a highly emotional sound, is a significant stimulus for patients with DOC that can positively affect patients at the behavioral level [11]. Park et al. [12] designed a randomized crossover trial using preferred and classical relaxation music and found that irritable patients with cognitive deficits after severe traumatic brain injury (TBI) were significantly less agitated when listening to preferred music. Therefore, preferred music and relaxing music may have different effects on physiological and behavioral responses and may serve as two control stimulus conditions that trigger distinct brain activities. Naci et al. [13] reported that brain activity in patients with DOC during naturalistic stimulation was similar to that of healthy participants, suggesting high-level cognitive and potentially conscious processing. This finding indicates that when the brain functional status of patients with DOC is closer to that of healthy subjects, their level of consciousness is higher, and they can distinguish between different music stimuli, as healthy subjects do. Sihvonen et al. [14] reported that the interpretation and appreciation of music requires extensive bilateral control of attention, memory, emotion, reward, motor skills, and auditory, syntactic, and semantic processing organized by complex brain networks. For patients with DOC, brain function is closely related to the level of consciousness. The gold standard of consciousness assessment, the CRS-R, provides detailed ratings of auditory, visual, motor, verbal, communication, and arousal functions to characterize the state of the patient's brain functioning and thus the level of consciousness [15].
Specific EEG features play important roles in distinguishing states of consciousness. Differential entropy (DE) and common spatial pattern (CSP) are common features for EEG signal classification in the DOC domain. For example, Huang et al. [16] extracted the DE features of patients with DOC for online EEG-based emotion recognition, and 3 of 8 patients achieved significant online accuracy. Pan et al. [17] reported that CSP features could be used to identify significant brain activity in the alpha band. In addition, the phase-locking value (PLV) assesses the signal synchronization between different brain regions and usually represents the functional connectivity of the brain. Cai et al. [18] used the PLV as a statistical metric to assess phase coupling between paired regions in different frequency bands and concluded that frontal-parietal and frontal-occipital connections play a dominant role in the classification of patients with DOC.
In recent years, several multi-feature fusion approaches have been proposed for extracting information from EEG signals. Li et al. [19] proposed a multi-feature fusion method that combines wavelet packet energy and hierarchical fuzzy entropy for stroke EEG signal classification, and the results revealed that the fusion features yielded better classification results. Other studies have shown that multi-feature fusion approaches can integrate information propagation patterns and activation differences in the brain, combining compensatory activation and connectivity information to achieve good performance in EEG signal classification [20]. However, which features to select for fusion and how different features should be fused remain challenges in the DOC domain. The three features described above are derived from different dimensions (temporal domain, spatial domain, and brain connectivity functions), and all of them achieve good results in the classification of EEG signals in the DOC domain. Therefore, we extracted the PLV, quantized it into EEG-based network pattern (ENP) features, and fused the DE, ENP, and CSP features, with the expectation that the fused features complement each other's information from each dimension and achieve better classification results by providing a more comprehensive description of brain function.
Traditional EEG signal classifiers mainly include decision tree models, naive Bayes classifiers, and support vector machines (SVMs). In recent years, various deep learning frameworks, such as convolutional neural networks (CNNs) and long short-term memory networks (LSTMs), have also been applied to classify EEG signals. Sheykhivand et al. [21] fed the raw EEG signals of 14 subjects under two music stimuli directly into a hybrid CNN-LSTM model without feature extraction and achieved a classification accuracy of 97.42%. Xu et al. [22] designed a hybrid CNN-LSTM model for detecting people with autism spectrum disorder (ASD) by determining the functional connectivity of the brain and classified task state data with an accuracy of 74.55%. These studies show that CNN-LSTM models are effective in the classification of EEG signals. Therefore, we propose a hybrid CNN-LSTM model that is applicable to the music stimulation paradigm for consciousness detection in patients with DOC.
Fig. 1. The musical stimulation paradigm. Each phase lasted five minutes, with resting intervals before and after the preferred music and the relaxing music, respectively, to avoid interference from other factors. Each subject was alone in a quiet space with suitable air temperature and humidity. EEG data were continuously collected during this process.

In pursuit of advancing the assessment methodologies for patients with DOC, this study proposes a musical stimulation paradigm coupled with verifiable criteria. The main contributions of this paper are encapsulated in the following three facets: 1) We designed a musical stimulation paradigm, which entails the systematic recording of EEG signals from DOC patients while they are exposed to preferred and relaxing music. This approach allows for an in-depth exploration of the brain's functional activity in response to these auditory stimuli, providing a novel avenue for discerning residual cognitive processes and levels of consciousness.
behavior-based CRS-R scores, we establish a robust framework for distinguishing between patients in the MCS and VS, thereby providing a more accurate evaluation of their consciousness states.

A. Musical Stimulation Paradigm Design
The researcher determined each subject's preferred music by conducting surveys, visiting, or asking the patient's family members. The relaxing music was selected from a professional music therapist's music library. The same relaxing music was used for all participants. The designed musical stimulation paradigm is shown in Fig. 1. If the subject was judged to be in an awake state, the EEG acquisition equipment was placed, and the patient was moved to a quiet environment to remain in a resting state for five minutes. The music therapist then played relaxing music for five minutes, followed by five minutes of rest, five minutes of the subject's preferred music, and five minutes of rest. Data acquisition was suspended or terminated if the subject moved his or her head, ground his or her teeth, or fell asleep during this period; it continued when the medical staff determined that the subject had returned to an awake state and that there were no other complications, thus preventing interference in the EEG signal acquisition process.

B. Participants
EEG data were collected by the authors from 29 subjects, including 13 healthy subjects (H1 to H13) and 16 patients with DOC. There were 6 MCS patients (M1 to M6), whose ages ranged from 26 to 65 years (4 males and 2 females), and 10 VS patients (V1 to V10), whose ages ranged from 22 to 50 years (9 males and 1 female). The characteristic information of each patient is shown in Table I. Each subject's family members were informed of the experimental tasks and signed a written informed consent form prior to the start of the experiment. The Ethics Committee of the Medical School of Southern Medical University approved this study.

C. EEG Data Acquisition
The subjects wore 62-channel EEG caps with electrode positions conforming to the international 10-20 system. Each DOC patient was medically supervised with a bipolar channel EEG device, and the sampling frequency was set at 2000 Hz to eliminate artifacts and ensure consistency of the data. Owing to skull deformation and other factors, we ultimately included EEG signals from 51 channels, including 23 pairs of symmetrical electrodes and 5 central axis electrodes.

D. EEG Signal Preprocessing
We preprocessed the collected EEG data using EEGLAB software. Initially, the first 220 seconds of data for each stimulus condition were considered valid data. A finite impulse response (FIR) filter was subsequently applied for bandpass filtering in the frequency range of 0.1 Hz to 50 Hz. An infinite impulse response (IIR) notch filter with a notch frequency of 50 Hz was then used to eliminate power-line (industrial frequency) interference, a 50-Hz sine wave caused by the alternating current during EEG collection. The data were re-referenced to the whole-brain average, which avoids potential bias toward either hemisphere of the brain and avoids the reference electrode being too close to the target electrode. Finally, the data were segmented into one-second epochs. Preferred music and relaxing music each had 220 trials, for a total of 440 trials for each patient.
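The preprocessing chain above (band-pass, notch, average re-reference, one-second epochs) can be sketched outside EEGLAB as follows. This is only an illustration using SciPy: the function name `preprocess`, the FIR length `numtaps`, and the notch quality factor `notch_q` are assumptions, not settings reported in this paper.

```python
import numpy as np
from scipy.signal import firwin, filtfilt, iirnotch


def preprocess(eeg, fs=2000, band=(0.1, 50.0), notch_hz=50.0,
               numtaps=1001, notch_q=30.0, epoch_s=1.0):
    """Filter, re-reference, and epoch one recording.

    eeg: array of shape (n_channels, n_samples).
    Returns an array of shape (n_epochs, n_channels, samples_per_epoch).
    """
    # FIR band-pass. numtaps is an illustrative choice; a filter this short
    # gives only a soft roll-off near the 0.1 Hz edge at fs = 2000 Hz.
    b_fir = firwin(numtaps, band, pass_zero=False, fs=fs)
    x = filtfilt(b_fir, [1.0], eeg, axis=-1)

    # IIR notch at the mains frequency; the quality factor is an assumption.
    b_notch, a_notch = iirnotch(notch_hz, notch_q, fs=fs)
    x = filtfilt(b_notch, a_notch, x, axis=-1)

    # Average reference: subtract the mean over channels at every sample.
    x = x - x.mean(axis=0, keepdims=True)

    # Cut into non-overlapping epochs and drop any incomplete tail.
    n = int(round(fs * epoch_s))
    n_epochs = x.shape[-1] // n
    x = x[:, :n_epochs * n].reshape(x.shape[0], n_epochs, n)
    return x.transpose(1, 0, 2)
```

With 220 seconds of valid data per condition, such a function would return 220 epochs per condition, matching the 440 trials per subject described above.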

E. Feature Extraction
The three features were extracted separately. DE is a generalized form of Shannon information entropy on continuous variables. In this case, the EEG signal at a certain length is assumed to approximately follow a Gaussian distribution N(µ, σ²), for which the DE is numerically equal to the logarithm of a linear multiple of the standard deviation; it can be formulated as follows:

$$DE = -\int_{a}^{b} \frac{1}{\sqrt{2\pi\sigma^{2}}}\, e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}} \log\left(\frac{1}{\sqrt{2\pi\sigma^{2}}}\, e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}}\right) dx = \frac{1}{2}\log\left(2\pi e \sigma^{2}\right) \qquad (1)$$

where [a, b] represents the interval of information values and the final calculation formula for DE is the simplified form of the differential entropy integral. First, bandpass filtering and noise removal are used to preprocess the EEG signals. Next, a continuous EEG signal is sampled as discrete data and divided into multiple time segments. Each segment of the signal is assumed to contain N sample points (x_1, x_2, ..., x_N). The mean of each segment is subsequently calculated, and the mean is removed to obtain a zero-mean signal. Finally, the variance σ² of the zero-mean segment is computed and substituted into (1), and the DE feature is calculated.
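As a minimal sketch of the DE computation for a Gaussian-distributed segment, the standard closed form 0.5·log(2πeσ²) can be evaluated directly from the segment variance. The function name `differential_entropy` is hypothetical, and the natural logarithm is an assumption (the base only rescales the feature).

```python
import numpy as np


def differential_entropy(segment):
    """DE of each row of `segment` under a Gaussian assumption.

    segment: array of shape (..., n_samples); DE is taken along the last axis.
    """
    # Remove the mean to obtain a zero-mean signal, then estimate sigma^2.
    centered = segment - segment.mean(axis=-1, keepdims=True)
    variance = np.mean(centered ** 2, axis=-1)
    # Closed form for a Gaussian: 0.5 * log(2 * pi * e * sigma^2).
    return 0.5 * np.log(2.0 * np.pi * np.e * variance)
```

Applied to a one-second epoch of shape (51, 2000), this yields one DE value per channel.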
The CSP algorithm is a feature extraction algorithm based on the mixed spatial covariance matrices of two-class data, and the covariance of each sample is calculated as shown in (2). Algebraic methods were used to project the two types of signals onto the same space so that their variances have the greatest difference; additionally, two different feature vectors were obtained to classify the two types of signals. The diag function was used to diagonalize the matrix, and the diagonal elements were used to form the feature vector. The ReliefF function was used to calculate the feature weights, giving relevant features higher weights and arranging them in order, and the noise reduction parameter was set to 10 to prevent excessive noise.
TABLE I INFORMATION ON THE CHARACTERISTICS OF PATIENTS

The EEG signal of each sample is represented as E_{N×F}, where N is the total number of electrode leads and F is the number of data sampling points. The data sampling frequency used in this study was 2000 Hz, and the sample duration was 1 second; thus, the value of F is 2000. CSP features can be used to effectively extract the spatial distribution of each type of signal from multichannel EEG signals and have been widely used to extract spatial domain information from EEG signals. However, only spatial information is considered in feature extraction, without frequency domain or time series information. Moreover, ranking features on the basis of a single factor is somewhat blind, which may lead to the generation of redundant information and thus unnecessary interference. Based on this consideration, the first three and last three eigenvectors were selected to form the CSP feature matrix for each sample in this study.
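A hedged sketch of a two-class CSP step is given below. It assumes the common trace-normalized covariance C = EEᵀ / trace(EEᵀ) and a generalized eigendecomposition, which may differ in detail from the authors' implementation; the ReliefF weighting is not reproduced. The names `normalized_cov`, `fit_csp`, and `csp_features` are hypothetical.

```python
import numpy as np
from scipy.linalg import eigh


def normalized_cov(epoch):
    """Trace-normalized spatial covariance of one epoch (n_channels, n_samples)."""
    c = epoch @ epoch.T
    return c / np.trace(c)


def fit_csp(epochs_a, epochs_b, n_pairs=3):
    """Return (2 * n_pairs, n_channels) spatial filters for two classes."""
    ca = np.mean([normalized_cov(e) for e in epochs_a], axis=0)
    cb = np.mean([normalized_cov(e) for e in epochs_b], axis=0)
    # Generalized eigenproblem ca w = lambda (ca + cb) w; eigenvalues ascend,
    # so the first and last columns are the most discriminative filters.
    _, vecs = eigh(ca, ca + cb)
    picked = np.concatenate([vecs[:, :n_pairs], vecs[:, -n_pairs:]], axis=1)
    return picked.T


def csp_features(epoch, filters):
    """Diagonal of the projected covariance, normalized to sum to one."""
    d = np.diag(filters @ normalized_cov(epoch) @ filters.T)
    return d / d.sum()
```

Keeping the first three and last three eigenvectors (`n_pairs=3`) mirrors the selection described above and gives a six-element feature vector per epoch.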
The ENP features are extracted based on extensions and deformations of PLV features, which are used to determine the synchronization and correlation of two phases. Assuming that the instantaneous phases of the two signals x(t) and y(t) are φ(x) and φ(y), the PLV feature is expressed in the following equation [17]: where i is the number of trials, N is the total number of trials in the whole process, and Δφ(t) is defined as follows [17]: where φ_x(jΔt) and φ_y(jΔt) are the instantaneous phases of the signal at x and y, respectively, and Δφ(t) represents the instantaneous phase difference between the signals x(t) and y(t) [17]. Δt is the sampling period, and j represents the order of the sampling points. The PLV feature always has a value between 0 and 1, with 0 indicating a completely random rise and fall and 1 indicating that one signal completely follows the other. The phase difference (Δφ) between the two signals at moment t represents the phase lock between these two electrodes.
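For illustration, one widely used PLV formulation averages the unit phasors of the phase difference over the sample points of an epoch, with instantaneous phases obtained from the Hilbert transform. Whether the averaging here runs over sample points or over trials is an assumption, and `plv_matrix` is a hypothetical name.

```python
import numpy as np
from scipy.signal import hilbert


def plv_matrix(epoch):
    """Pairwise PLV for one epoch of shape (n_channels, n_samples).

    The epoch should already be band-pass filtered to the band of interest.
    Returns a symmetric (n_channels, n_channels) matrix with values in [0, 1].
    """
    # Instantaneous phase of each channel from the analytic signal.
    phase = np.angle(hilbert(epoch, axis=-1))
    unit = np.exp(1j * phase)
    # Entry (x, y) is |mean over samples of exp(i * (phase_x - phase_y))|.
    return np.abs(unit @ unit.conj().T) / epoch.shape[-1]
```

Two channels with a constant phase lag give a value close to 1, while unrelated signals give a value close to 0.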
The Brain Connectivity Toolbox (BCT) [23] was used to simulate the construction of complex networks in the brain, and the PLV matrices of the five frequency bands (δ band (1-3 Hz), θ band (4-7 Hz), α band (8-13 Hz), β band (14-30 Hz) and γ band (31-48 Hz)) were used as parameter inputs for network construction. Specifically, four network measures, namely the clustering coefficient, global average diffusion efficiency, pairwise diffusion efficiency, and distance matrix, were computed from the PLV matrices to define the ENP features. Connecting and merging the four complex network node feature matrices of the ENP features for each band resulted in the total ENP feature.
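Two of the network measures named above can be sketched in NumPy and SciPy without the toolbox. The code below uses the Onnela-style weighted clustering coefficient and a shortest-path distance matrix with connection length taken as 1/weight; both choices are assumptions about how the measures are defined, the function names are hypothetical, and the diffusion efficiency measures are not covered.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path


def clustering_coef_weighted(w):
    """Weighted clustering coefficient per node of an undirected network."""
    w = np.array(w, dtype=float)
    np.fill_diagonal(w, 0.0)
    degree = np.count_nonzero(w, axis=1).astype(float)
    # Geometric-mean triangle intensity: diag((W^(1/3))^3).
    root = np.cbrt(w)
    triangles = np.diag(root @ root @ root)
    denom = degree * (degree - 1.0)
    return np.divide(triangles, denom,
                     out=np.zeros_like(triangles), where=denom > 0)


def distance_matrix(w):
    """Shortest weighted path lengths, with edge length = 1 / weight."""
    w = np.array(w, dtype=float)
    np.fill_diagonal(w, 0.0)
    lengths = np.zeros_like(w)
    np.divide(1.0, w, out=lengths, where=w > 0)
    # Zero entries of a dense input are treated as missing edges.
    return shortest_path(lengths, method="D", directed=False)
```

Feeding a band-specific PLV matrix into these functions gives a per-node clustering vector and a node-by-node distance matrix that could be concatenated into a band-level network feature.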
To improve the accuracy and precision of the feature extraction process, the EEG signal was divided into five frequency bands. To make the DE and CSP features consistent with the ENP features, we similarly applied a bandpass filter to separate the five frequency bands and extracted the features from the signal of each band separately. The eigenvalues in each of the five bands were calculated and synthesized to generate the final feature matrix.
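The band-wise extraction can be illustrated with a small filter bank that computes one DE value per channel and band. The Butterworth design, the filter order, and the names `BANDS` and `band_de_features` are assumptions made for this sketch; the band edges are those listed above.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

# Band edges in Hz, as listed in the text.
BANDS = {"delta": (1, 3), "theta": (4, 7), "alpha": (8, 13),
         "beta": (14, 30), "gamma": (31, 48)}


def band_de_features(epoch, fs=2000, order=4):
    """Return a (n_channels, 5) matrix of DE values, one column per band."""
    columns = []
    for low, high in BANDS.values():
        # Zero-phase Butterworth band-pass in second-order sections.
        sos = butter(order, [low, high], btype="bandpass", fs=fs, output="sos")
        filtered = sosfiltfilt(sos, epoch, axis=-1)
        variance = np.var(filtered, axis=-1)
        # Gaussian closed form for differential entropy.
        columns.append(0.5 * np.log(2.0 * np.pi * np.e * variance))
    return np.stack(columns, axis=1)
```

The same loop structure could wrap the CSP and PLV computations so that all three feature families share the five-band layout before fusion.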

F. Classification Model Design
The model has a 6-layer CNN structure, followed by a tandem LSTM structure to extract temporal information and a softmax function to complete the classification; the structure is shown in Fig. 2. In this case, the CNN consists of one input layer (3D array data), a stack of four convolutional layers (three layers with 3 × 3 convolutional kernels and one layer with a 1 × 1 convolutional kernel) and one fully connected layer (for dimensional conversion), and the data are passed upward through the LSTM model in time order.
First, in the input layer, the reshape function is used to define a three-dimensional array to complete the feature matrix input. The convolutional layers are initialized using the sequential function, which allows a list of multiple training layers to be entered as input parameters. The first three convolutional layers use a 3 × 3 convolutional kernel, and the ReLU activation function is used to add a nonlinear factor, which deepens the neural network layer, with the padding parameter set to "same". Each convolutional layer is followed by a batch normalization layer to normalize the data and accelerate convergence. The final convolution layer uses a 1 × 1 convolution kernel for linear transformation of the information between channels; in addition, the number of model parameters is reduced to improve the subsequent connections to the LSTM network. The model then proceeds to the pooling layer, in which downsampling is performed via the max pooling function. In addition, the matrix is flattened and connected to the fully connected layers using the flatten function. Finally, the data are fed into the LSTM network for temporal modeling, and classification is performed through the softmax function.
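The layer order described above can be sketched with the Keras API as follows. This is not the authors' model: the input shape, filter counts, dense width, number of pseudo time steps, and LSTM units are all placeholder assumptions, and `build_cnn_lstm` is a hypothetical name. The sketch only shows how three 3 × 3 convolutions, a 1 × 1 convolution, pooling, flattening, a fully connected layer, an LSTM, and a softmax output can be chained.

```python
import numpy as np
from tensorflow.keras import layers, models


def build_cnn_lstm(input_shape=(51, 15, 1), n_classes=2, steps=8, width=16):
    """CNN front end followed by an LSTM and a softmax classifier.

    input_shape, steps, width, and all filter counts are illustrative only.
    """
    return models.Sequential([
        layers.Input(shape=input_shape),
        # Three 3x3 convolutions with ReLU and "same" padding,
        # each followed by batch normalization.
        layers.Conv2D(16, (3, 3), padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.Conv2D(32, (3, 3), padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
        layers.BatchNormalization(),
        # 1x1 convolution: linear mixing across channels, fewer parameters.
        layers.Conv2D(8, (1, 1), padding="same"),
        layers.BatchNormalization(),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        # Fully connected layer used for dimensional conversion, then
        # reshaped into a short sequence for the LSTM.
        layers.Dense(steps * width, activation="relu"),
        layers.Reshape((steps, width)),
        layers.LSTM(32),
        layers.Dense(n_classes, activation="softmax"),
    ])
```

Calling `build_cnn_lstm().summary()` prints the resulting layer stack; training settings such as the optimizer, loss, and batch size are left open because they are not specified in this section.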

G. Evaluation Criteria
Three important evaluation criteria were used in this study.
Data accuracy evaluation criteria: The accuracy corresponding to the neurological responses to music in DOC patients provides a window into their cognitive and conscious states. The better patients with DOC can distinguish between these two types of music, the higher their level of consciousness. By examining their ability to distinguish between preferred and relaxing music, we gain insight into their preserved neural capacities, offering a nuanced perspective on their level of consciousness. In this study, the test set was partitioned into trials with lengths of 1 second, and each trial was labeled either relaxing music or preferred music. During the testing process, the classification model predicted the music category for each trial, and the ratio of the number of correctly predicted trials to the total number of trials was taken as the classification accuracy of the test set.
Statistical evaluation criteria: We used the chi-square test to verify whether the differences in accuracy were significant. Specifically, we determined whether the number of hit trials compared to the total number of trials was greater than the statistical standard. This statistical criterion was calculated as follows [24]:

$$\chi^{2} = \sum_{i=1}^{k} \frac{\left(f_{o_i} - f_{e_i}\right)^{2}}{f_{e_i}}$$

where f_{o_i} is the actual (observed) value and f_{e_i} is the expected value for category i.
In this study, the predictions are divided into hits and misses, so k has a value of 2, the degree of freedom is 1, and the critical value of χ² is 3.84 at the significance level of p = 0.05. Assuming that there is no relationship between the musical stimuli and the state of the subject's brain, the goodness-of-fit test implies that, among the 440 trials, we should reject this hypothesis only if the number of hits reaches at least 241. Thus, the accuracy should be at least 54.78% to verify that there is a significant difference in subjects' responses to these two music stimuli.
Verifiable consciousness assessment criteria: EEG data from healthy subjects were used as the training set for model training, and the EEG data from patients with DOC were used as the testing set. If a patient achieves high accuracy, indicating that the activation state of their brain with both types of music closely resembles that of a healthy subject, the patient is considered to have a high level of cognitive functioning; thus, we infer that they may have a higher level of consciousness. Importantly, we followed up with all patients to obtain their CRS-R scores two months later to verify the results of the consciousness assessment, thus demonstrating the reliability of the designed consciousness assessment criteria.
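The hit threshold in the statistical criterion can be checked numerically. The sketch below assumes a goodness-of-fit test against equal expected counts of hits and misses (chance level for two classes); the function names are hypothetical.

```python
from scipy.stats import chi2


def chi_square_hits(hits, n_trials):
    """Chi-square statistic for hits and misses against a 50/50 expectation."""
    expected = n_trials / 2.0
    misses = n_trials - hits
    return ((hits - expected) ** 2 / expected
            + (misses - expected) ** 2 / expected)


def min_significant_hits(n_trials, alpha=0.05):
    """Smallest above-chance hit count whose statistic exceeds the critical value."""
    critical = chi2.ppf(1.0 - alpha, df=1)
    for hits in range((n_trials + 1) // 2, n_trials + 1):
        if chi_square_hits(hits, n_trials) > critical:
            return hits
    return None
```

For 440 trials, 240 hits give a statistic of about 3.64 and 241 hits give about 4.01, so 241 is the first count above the critical value of 3.84; 241/440 is roughly 0.548, in line with the accuracy threshold quoted above.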

A. Research Procedure
The flowchart of this study is shown in Fig. 3. The EEG data of the subjects were collected while musical stimulation was presented. After a series of preprocessing steps, such as noise reduction and filtering, individual three-dimensional features were extracted from the temporal, spatial, and brain functional connectivity information. Multiple feature matrices were fused at the dimensional level and fed into a CNN-LSTM model for training. The models were trained to classify the feature data through four layers in the convolutional module and one layer in the temporal module, and the results of each convolutional layer were normalized using the batch normalization function, accelerating convergence. The model ultimately outputs the predicted music categories. The level of consciousness of each patient was analyzed on the basis of the evaluation criteria, and the results were verified using the patient's CRS-R scores two months later.

B. Experimental Design
The experiment consists of two parts. The first part is a validation experiment for the designed CNN-LSTM model, and the second part is a consciousness assessment experiment for patients with DOC. A music stimulation paradigm experiment is conducted for each subject, intending to assess the patient's current level of consciousness through the patient's neural response to music. EEG data from all 13 healthy subjects were used as the training set, and EEG data from each healthy subject were used as the testing set; these data were tested on the traditional SVM model and the CNN-LSTM model to verify the validity of the latter. EEG data from each DOC patient were used as the testing set to assess their level of consciousness, and the validity of the assessment was verified by comparing the CRS-R scores after two months.
Fig. 2. The CNN-LSTM model structure. The convolutional kernel is set to 3 × 3, the batch size is sequentially increased to optimize model training, and the last convolutional layer performs only a linear transformation, which is beneficial for subsequent dimension conversion. Subsequently, a temporal module is added to enhance the temporal classification of the data.

The process of scoring patients on the CRS-R follows the design of a consciousness assessment experiment, which aims to assess the patient's current level of consciousness through the patient's neural response to music. During the stable period after treatment, researchers performed the first CRS-R assessment. Patients with a certain level of auditory ability were selected for the musical stimulation paradigm experiments. The screening criteria also included patients whose onset time was within one year and who did not have a cranial injury or who were not undergoing repair after a cranial injury to ensure that normal electrode pads could be used to collect EEG signals. Two months after the end of the experiment, the patients were assessed using the CRS-R again, and the results of the two evaluations were compared. Each CRS-R assessment lasted approximately 30 minutes and was repeated several times on the same day by the same assessor to prevent random errors, and the assessment with the highest score was chosen as the patient's CRS-R assessment result.

C. Experimental Results
We first analyzed the DE, ENP, CSP and fusion features of the subjects under preferred music and relaxing music stimulation. Brain electrical activity maps of the mean DE, ENP, CSP and fusion features of the thirteen healthy subjects are shown in Fig. 4. Based on these features, we used the traditional support vector machine (SVM) model to classify the brain activities of healthy subjects while listening to preferred music and relaxing music. After training the SVM model, we visualized the coefficients of the feature vectors through the decision function to demonstrate the separability of the feature vector matrix and classifier, as illustrated in Fig. 5. Given that the scatterplot is a two-dimensional visualization and the features are multidimensional vectors, we obtained a two-dimensional mapping of the feature weights using dimensionality reduction, and the axes in Fig. 5 represent the feature weights in two dimensions after dimensionality reduction. The weights of the feature vectors are between −1 and 1. A weight closer to 1 indicates that the feature is closer to the preferred music stimulus state, while a weight closer to −1 indicates that the feature is closer to the relaxing music stimulus state. Fig. 5 shows that the vector weights of the DE, ENP, CSP, and fused features are separable for the two types of musical stimuli (preferred music and relaxing music), suggesting that these features reflect distinct activation states of the brain and that the SVM classifier can effectively classify these feature vectors. The classification accuracies for each feature and the fusion features are shown in Fig. 6. On the basis of the comparative analysis of the results, the fusion features of the 13 healthy subjects had the highest median and the lowest variance among the four groups, indicating that the fusion features are more stable and more reliable. In addition, the fusion feature results of 11 out of 13 healthy subjects were the highest among the four comparative experiments. The fusion feature is better aligned with the classification requirements of our music stimulation paradigm, and we use it as an important reference for the assessment of awareness in patients with DOC. These findings indicate that the temporal, spatial, and brain functional connectivity information associated with the DE-ENP-CSP fusion features had complementary effects, improving information extraction and classification in the musical stimulation paradigm experiments.
To validate the effectiveness of our proposed DE-ENP-CSP fusion features, we conducted four sets of experiments for each feature module. The results are shown in Table II, which summarizes the effects of these feature modules in the musical stimulation paradigm for each healthy subject. According to Table II, 7 out of 13 healthy subjects had the highest classification accuracy when the three types of features were fused, and the fusion feature achieved the highest average accuracy of 84.99% across these four sets of experiments. According to the paired sample t-test results, the p-value for the comparison between the ENP-CSP features and the fusion feature is less than 0.001, which indicates that the fusion feature is significantly better than the ENP-CSP features. However, the p-values for the comparisons of the DE-ENP and DE-CSP features with the fusion feature were 0.208 and 0.434, respectively, which are greater than 0.05. These findings suggest that DE features may play a more important role in the fusion features. Combining the overall performance and the statistical analysis, we concluded that the classification accuracy was best overall when the three types of features were fused.
Furthermore, we compared the fusion feature-based classification accuracies of the traditional SVM machine learning model and our proposed CNN-LSTM model on the basis of data from healthy subjects. As shown in Table III, among all 13 healthy subjects, an accuracy of greater than 90% was achieved for 7 out of 13 participants using the CNN-LSTM model on the basis of the fusion features, with a mean accuracy of approximately 85.58%. The average accuracy of the SVM model was approximately 84.99%. Given that 8 out of 13 healthy subjects demonstrated higher accuracy with the deep learning model, we deduced that this approach is better aligned with the classification requirements of our music stimulation paradigm. Consequently, we opted to utilize the CNN-LSTM model for consciousness detection experiments involving patients with DOC.
Fig. 4. Brain electrical activity mapping. Brain electrical activity maps of the average DE, ENP, CSP, and fusion features in 13 healthy subjects under preferred and relaxing music. These images reflect the functional changes in the brain and represent, in color, the value of each feature at the electrode locations.

Fig. 5. Scatterplots of the DE, ENP, CSP, and fusion feature vector weights. Fusion refers to DE-ENP-CSP fusion features. The number of weight coefficients corresponds to the number of rows of the trained feature vector matrix. The axes give a two-dimensional representation of the feature weights after dimensionality reduction, and the weights of the feature vectors are between −1 and 1. A weight closer to 1 indicates that the feature is closer to the preferred music stimulus state, while a weight closer to −1 indicates that the feature is closer to the relaxing music stimulus state.

TABLE II COMPARATIVE EXPERIMENTS FOR THE FUSION FEATURE AND ELIMINATING EACH FEATURE MODULE

TABLE III THE ACCURACIES OF FUSION FEATURES FOR HEALTHY SUBJECTS IN THE SVM MODEL AND CNN-LSTM MODEL

For MCS patients, the classification accuracy of the CNN-LSTM model for brain activities under the two types of music stimulation based on various features is shown in Table IV. For patient M2, the classification accuracy based on the fusion features reached 78.18%, the highest accuracy among the six MCS patients. For patient M4, the accuracy based on the fusion features reached 66.14%, and the accuracy based on the DE features was also greater than the threshold. These findings may indicate that the two patients had higher levels of brain activation and consciousness than did the other patients in the MCS group, and the recovery status of the two patients after 2 months supports this view. The CRS-R score of patient M2 before the experiment was 9 (1-0-5-1-0-2). After two months, the score was 23 (4-5-6-3-2-3), indicating a significant improvement in the patient's auditory, visual, motor, verbal, communication and arousal functions. The score of patient M4 increased from 13 (2-3-5-1-0-2) to 14 (2-3-6-1-0-2), indicating that the patient had reached the level of functional object use in terms of motor function. Patients were considered to have emerged from the MCS (EMCS) when they achieved a score of 6 for motor function or 2 for communication [15]. Thus, the scores indicate that patients M2 and M4 both recovered from MCS to EMCS.
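The EMCS rule stated above (a motor score of 6 or a communication score of 2) can be applied mechanically to the hyphenated CRS-R profiles reported in this section. The sketch below is illustrative only. The class and function names are hypothetical, and the subscale order (auditory, visual, motor, verbal, communication, arousal) is assumed to follow the order in which the functions are listed in the text.

```python
from typing import NamedTuple


class CRSRProfile(NamedTuple):
    """CRS-R subscale scores in the order assumed from the text."""
    auditory: int
    visual: int
    motor: int
    verbal: int
    communication: int
    arousal: int

    @property
    def total(self) -> int:
        # The total CRS-R score is the sum of the six subscale scores.
        return sum(self)


def parse_crsr(profile: str) -> CRSRProfile:
    """Parse a profile string such as '4-5-6-3-2-3' into subscale scores."""
    parts = [int(p) for p in profile.split("-")]
    if len(parts) != 6:
        raise ValueError("a CRS-R profile needs exactly six subscale scores")
    return CRSRProfile(*parts)


def is_emcs(score: CRSRProfile) -> bool:
    """EMCS criterion as stated in the text: motor = 6 or communication = 2."""
    return score.motor == 6 or score.communication == 2


# Profiles reported for patients M2 and M4, before and after two months.
for label, profile in [("M2 before", "1-0-5-1-0-2"), ("M2 after", "4-5-6-3-2-3"),
                       ("M4 before", "2-3-5-1-0-2"), ("M4 after", "2-3-6-1-0-2")]:
    score = parse_crsr(profile)
    print(label, score.total, is_emcs(score))
```

Summing the subscales also gives a quick consistency check between each reported total and its profile.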
For VS patients, the classification accuracy of the CNN-LSTM model for brain activities under the two types of music stimulation based on various features is shown in Table V. For patient V6, the accuracies based on the ENP, CSP, and fusion features exceeded 54.78%. Two months later, the CRS-R score of this patient had increased from 6 (1-0-2-1-0-2) to 7 (1-1-2-1-0-2), indicating a slight improvement in visual function. For patient V9, the accuracies based on the ENP and fusion features also exceeded the threshold, and the CRS-R score of this patient increased from 7 (2-0-2-1-0-2) to 9 (2-1-2-2-0-2). This indicates an improvement in the patient's visual and verbal function. Moreover, according to our evaluation model, for patient V3, the accuracy based on the fusion features reached 58.18%, but the CRS-R score of this patient remained unchanged two months later. We will continue to review the recovery status of this patient.
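The accuracy threshold referred to above is described in this paper as being based on a chi-square test. One common way to derive such a threshold for a balanced two-class task is sketched below. This is an assumption-laden illustration rather than the authors' exact procedure: the 0.05 significance level, the single degree of freedom, the equal expected counts for the two classes, the function names, and the trial count used in the example are all assumptions.

```python
import math
from typing import Optional

# Critical value of the chi-square distribution with 1 degree of freedom at
# alpha = 0.05 (assumed significance level).
CHI2_CRIT_DF1_P05 = 3.841


def chi_square_two_class(hits: int, n_trials: int) -> float:
    """Chi-square statistic for observed hits/misses against a 50/50 chance split."""
    expected = n_trials / 2
    misses = n_trials - hits
    return (hits - expected) ** 2 / expected + (misses - expected) ** 2 / expected


def min_significant_accuracy(n_trials: int,
                             critical_value: float = CHI2_CRIT_DF1_P05) -> Optional[float]:
    """Smallest above-chance accuracy whose chi-square statistic exceeds the critical value.

    Returns None if even a perfect score is not significant for this trial count.
    """
    for hits in range(math.ceil(n_trials / 2), n_trials + 1):
        if chi_square_two_class(hits, n_trials) > critical_value:
            return hits / n_trials
    return None


# Hypothetical trial count, chosen only to illustrate the calculation.
print(min_significant_accuracy(100))
```

Under these assumptions the threshold depends only on the number of classified trials, and it moves closer to 50% as the number of trials grows.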

IV. DISCUSSION
This study demonstrated the effectiveness of the musical stimulation paradigm in detecting consciousness in patients with DOC. Our experiments showed that, relative to healthy subjects, the classification accuracy for patients with DOC decreased as their level of consciousness decreased. We speculate that the brain activities produced by preferred and relaxing music can be effectively distinguished only when the functional areas of the brains of patients with DOC have recovered to stages capable of processing complex musical rhythms and musical emotions. That is, only when brain function is restored, or when patients have a higher level of consciousness, can certain functional brain areas respond differently to preferred music and relaxing music. In such cases, the differences between the responses to the two types of music can be identified directly from the experimental results with high accuracy. Music can have a positive impact on memory, learning, and attention, which is beneficial for cognitive development [25]. Moreover, music may help patients with DOC regain consciousness and improve their emotions [26]. Therefore, music-based stimulation and treatment may be effective approaches for personalized medical treatment and rehabilitation.
Fusion features are more applicable in consciousness detection than single features are. On the basis of the accuracy analysis of healthy subjects, fusion features showed the highest accuracy and best performance in the musical stimulation paradigm experiments, probably because the multiple pieces of information contained in the fusion features have a complementary effect [27], which enables the classification model to recognize different music stimuli more accurately. For patients with DOC, we validated this using CRS-R scores. The validation showed that when patients' fusion feature accuracies exceeded the chance level, they also exhibited a relative increase in CRS-R scores. Two of the six MCS patients had fusion feature accuracies above the chance level, and their scale scores increased to the point of EMCS after two months. Two of the ten VS patients also had slight increases in their scale scores, and their fusion feature accuracies were higher than the chance level. This finding suggests that fusion features are better able than a single feature to differentiate the state of brain activity under the two musical stimuli, resulting in a more accurate assessment of the patient's state of consciousness. Therefore, extracting multiple types of features from EEG signals and fusing them may be particularly beneficial for brain state recognition; thus, this approach has high experimental value.
We compared the accuracies of healthy subjects with the SVM model and the CNN-LSTM model and found that the latter had a higher average accuracy; moreover, with this model, most subjects had accuracies well above the threshold. These findings suggest that our CNN-LSTM model is superior to traditional machine learning models in terms of extracting EEG information and is more effective in distinguishing brain activation states under the two types of musical stimulation. This may be because the spatial convolutional and temporal layers in the deep learning model incorporate hidden variables for training, and the feature information is expanded and superimposed during the iterative convolutional process, which compensates for the model defects caused by the limited amount of data.
Three of the six MCS patients who participated in the experiment showed improvement in their CRS-R scores two months after the experiment. Patients M1, M2 and M4 received a score of 6 for the motor item; these individuals demonstrated functional object use, and patients M1 and M2 could communicate accurately. These results indicate that these three patients actually recovered to EMCS, with better recovery of consciousness in M1 and M2 in particular. This finding is generally consistent with the results of our EEG-based assessment criteria. For M2, the accuracy based on the fusion features reached 78.18%, and for M4, the accuracy based on the fusion features in the CNN-LSTM model reached 66.14%. However, the experimental results for M1 were not significant, and M1 achieved good performance only in terms of DE feature accuracy in the CNN-LSTM model. This may be because the classification model failed to distinguish this patient's brain activities under the different musical stimuli. For some subjects, such as M3, M5, and M6, even though their scale scores were higher than those of the VS patients, their lower attention levels and weaker physiological reactions indicated that they could not keep their consciousness active at all times. We speculate that decoding accuracy may be related to the subject's current level of conscious activity as well as the level of attention, so it must be interpreted by taking into full account the specific physiological state of patients with DOC.
In addition, we compared the CRS-R scores of M3, M5, and M6 two months after the experiment and found that the total scores of M3 and M5 remained unchanged. For M6, the total score increased slightly, owing to the verbal item. These results are essentially consistent with our experimental results. The accuracies of the CNN-LSTM model based on the single and fusion features of M3, M5, and M6 were poor, indicating that their brain activation status did not significantly change when presented with the two musical stimuli. Notably, the DE feature accuracy of M6 was the highest among the three patients, and this subject's score slightly increased. This suggests that DE features may be more sensitive than other features in music stimulus paradigm experiments, but this hypothesis must be verified through additional experiments and more data.
For VS patients, we observed that for patient V6, the accuracies based on the ENP features, CSP features and fusion features all exceeded the threshold, and the CRS-R score of this patient increased from 6 (1-0-2-1-0-2) to 7 (1-1-2-1-0-2) within two months. For V9, the accuracies based on the ENP features and fusion features also exceeded the threshold, and the CRS-R score increased from 7 (2-0-2-1-0-2) to 9 (2-1-2-2-0-2). For V3, the accuracy based on the fusion features reached 58.18% when using the CNN-LSTM model, but the CRS-R score remained unchanged after two months. The overall brain condition of V3 may be recovering slowly, but within two months the recovery was not sufficient for the scores to increase substantially. Therefore, this patient became the subject of our ongoing follow-up. We found that the verbal function score of V8 and the arousal function score of V1 on the CRS-R increased slightly, from 1 to 2 points; however, their accuracies under our evaluation criteria were not ideal. Similar results were observed for M6. This finding may suggest that our musical stimulation paradigm is targeted: music material naturally stimulates auditory function more strongly while having weaker effects on other functions. Other stimulating materials, such as videos [28], can stimulate visual and auditory functions simultaneously, achieving different effects in consciousness detection. The functional areas that the musical stimulation paradigm can effectively activate need to be explored in depth using more advanced methods from interdisciplinary fields such as medicine and imaging [29], [30].
In contrast, the DE feature accuracy of V2 and the CSP feature accuracies of V5 and V7 in the proposed model exceeded the chance level. This may be due to the normalization function of the deep learning model, which limits the processing of extreme data. Moreover, we believe that although some brain functions of patients with DOC are impaired, other regions may still function normally, so their performance may vary across features. At present, the connection between musical stimulation and functional brain areas remains unclear. Music may trigger activity in one or more functional brain areas. Therefore, patients with DOC may exhibit greater accuracy with certain features that vary from person to person, possibly because certain brain areas retain relatively normal cognitive function and a good ability to respond to stimuli. A feature that corresponds to the EEG information extracted from such a region may then yield high accuracy.
This study has several limitations that should be considered. The number of subjects involved in the experiment, especially patients with DOC, was relatively small. In addition, deep learning models suitable for EEG task classification need further exploration. In future work, we will increase the sample size and construct better classification models for further studies on consciousness assessment in patients with DOC.

V. CONCLUSION
In this paper, we proposed a musical stimulation paradigm and verifiable consciousness assessment criteria, providing an objective reference for consciousness assessment in patients with DOC. We conducted experiments using preferred music and relaxing music and found that the brain activation states under these two types of music stimulation differed strongly and can be used as a valid control condition for assessing patients with DOC. A hybrid CNN-LSTM model was devised that uses multidimensional fusion features derived from the DE-ENP-CSP framework. Our findings further explored the potential of music stimulation as a tool for consciousness assessment in patients with DOC, paving the way for advancements in their care and rehabilitation strategies.

Fig. 3. The experimental procedure. Conv indicates convolutional layer; BN indicates batch normalization; D-Conv indicates dimension conversion; T-Model indicates temporal model. DE, ENP, and CSP features were selected as the fusion features to complement each other in terms of temporal, spatial, and brain functional connectivity. The fusion features are input into the CNN-LSTM model for training and classification, and the prediction results of the music categories are obtained. The patient's level of consciousness was assessed on the basis of the threshold of the chi-square test, and consciousness was assessed using CRS-R scores.

Fig. 6. The accuracies of individual features and fusion features in the SVM model. The classification accuracies of the individual and fusion features of 13 healthy subjects in the SVM model are shown in the box plot, and the bold horizontal line in the middle represents the median accuracy of the group. The bottom of the box represents the lower quartile, the top of the box represents the upper quartile, the bottom line represents the minimum value, and the top line represents the maximum value.

TABLE IV THE ACCURACY OF PATIENTS WITH MCS IN THE CNN-LSTM MODEL

TABLE V THE ACCURACY OF PATIENTS WITH VS IN THE CNN-LSTM MODEL