Voice assessment in patients with amyotrophic lateral sclerosis: Association with bulbar and respiratory function

doi:10.21203/rs.3.rs-3933807/v1

Research Article

This work is licensed under a CC BY 4.0 License

Version 1

Patients with amyotrophic lateral sclerosis (ALS) face respiratory and bulbar dysfunction causing profound functional disability. Speech production requires integrity of bulbar muscles and good breathing capacity, making it a possible way to monitor these functions in ALS. Here, we studied the relationship of bulbar and respiratory functions with voice characteristics of ALS patients, taking advantage of the convenience of a simple smartphone for voice recordings. For voice assessment we considered a speech therapists’ standardized tool, the Consensus Auditory-Perceptual Evaluation of Voice (CAPE-V), and an acoustic analysis toolbox (for both time and frequency domains). The bulbar subscore of the revised ALS Functional Rating Scale (ALSFRS-R) was used, and pulmonary function measurements included forced vital capacity (FVC%) and maximum expiratory and inspiratory pressures (MEP% and MIP%, respectively). Correlation coefficients and both linear and logistic regression models were applied. A total of 27 ALS patients (12 males; 61 years mean age; 28 months median disease duration) were included. Patients with significant bulbar dysfunction revealed greater CAPE-V scores in overall severity, roughness, strain, pitch, and loudness. They also presented slower speaking rates, longer pauses, and higher jitter values in acoustic analysis (all p < 0.05). The CAPE-V’s overall severity and sub-scores for pitch and loudness demonstrated significant correlations with MIP% and MEP% (all p < 0.05). By contrast, acoustic metrics (speaking rate, absolute energy, shimmer, and harmonic-to-noise ratio) were significantly correlated with FVC% (all p < 0.05). The results provide supporting evidence for using smartphone-based recordings in ALS patients for CAPE-V and acoustic analysis as good correlates of bulbar and respiratory function.

Neurology

Translational Medicine

Personalized Medicine

ALS

Acoustic Analysis

Digital Health

Amyotrophic Lateral Sclerosis (ALS) is a progressive neurodegenerative disease characterized by both upper and lower motor neuron degeneration [1]. This results in progressive muscle atrophy and paralysis, including the bulbar region (impairing speech and swallowing) and the respiratory system (diaphragm, thoracic, and abdominal muscles) [1, 2]. Around 85% of patients experience bulbar dysfunction during the course of the disease [3] and around 5% present with respiratory involvement [2, 4]. Early bulbar and respiratory dysfunction are the most devastating variants of the disease, being associated with shorter survival [5, 6, 7]. Usually, patients die two to five years after first symptoms from respiratory complications [4, 8].

The heterogeneity of ALS regarding clinical presentation and progression rate presents one of the biggest challenges to finding new treatments, and also to managing and monitoring its course. The ALSFRS-R (revised Amyotrophic Lateral Sclerosis Functional Rating Scale) [9] is a subjective, ordinal scale universally accepted as a sensitive tool to assess the patient’s ability to perform daily activities. However, there is an urgent need to find objective and personalized measures directly linked to the disease’s impact on patients. In 1968, Darley [10] pointed out that voice articulation and phonation have important clinical implications. However, only recently has it been possible to identify specific voice features in ALS [11, 12, 13].

During vocalization, the vocal cords are initially adducted, resulting in either the complete closure of the glottis or a reduction of the space between the vocal folds to a linear fissure. When expelling the air from the lungs, this closure leads to its accumulation in the subglottic region, which persists until the adducting muscles can no longer withstand the increasing subglottic pressure, causing the vocal folds to open and release the air to the supralaryngeal vocal tract. As the air passes through the vocal cords, they oscillate and generate a fundamental sound source contributing to the production of subsequent speech sounds [14]. These sounds are characterized by three essential attributes: a) frequency, perceived as the pitch of the sound; b) volume, perceived as intensity; and c) timbre, perceived as the quality of the voice [15]. Moreover, the articulation process involves the modulation of the outgoing airstream by the articulatory organs, such as the lips, oral cavity, tongue, teeth, palate, pharynx, and nasal cavity, resulting in a rapidly changing and unique quality of the produced sound. This coordinated process between the bulbar and respiratory muscles generates a highly stereotyped sound pattern (in healthy people), making voice analysis a suitable approach to study function. Acoustic measures of speech, such as speech intelligibility (i.e., the extent to which a speaker is comprehensible) and speaking rate (i.e., the pace of speaking), have been suggested as means for evaluating bulbar dysfunction in ALS patients [16]. These two measures have also been shown to correlate with progressive functional decline.

The main objective of this study is to analyze specific features of smartphone-recorded sound signal and spectrogram (extracting frequency- and intensity-related voice sound features) from ALS patients, and correlate them with their functional status, and bulbar and respiratory functions. Secondly, this analysis is correlated with the speech therapist evaluation. The CAPE-V scale (Consensus Auditory-Perceptual Evaluation of Voice), a standardized protocol [17, 18] adapted to the Portuguese population [19], was employed as the standard voice assessment tool in this work.

2.1 Participants

We included consecutive ALS patients observed in our ALS clinic in Lisbon, diagnosed according to Gold Coast criteria [20]. All patients were followed and had complete neurological, neurophysiological, neuroimaging and blood tests to rule out mimicking conditions [21]. Patients with a previous history of lung disorders, resting dyspnea, laryngeal injury, upper airway infection, marked cognitive involvement impairing the understanding of the phonatory task, and those declining to participate were excluded. The study was approved by the local research ethics committee of the Centro Académico de Medicina de Lisboa (CAML-Ref. 146/21), and all participants gave written informed consent.

2.2 Clinical evaluation

We collected demographic data, including age, sex, body mass index (BMI), disease duration at the time of study entry, and the region of disease onset. To evaluate the functional disability, we used the ALSFRS-R scale [9].

Bulbar symptoms were quantified by the ALSFRS-R bulbar subscore. Sitting predicted forced vital capacity (FVC%) was measured using a computer-based USB spirometer (microQuark®, Cosmed®, Italy), and the best of three reliable manoeuvres was used for statistics [6]. In addition to FVC%, predicted maximum expiratory and inspiratory pressures (MEP% and MIP%, respectively) were included as respiratory measures.

2.3 Voice sound recordings and auditory-perceptual assessment

The CAPE-V scale was employed as the voice assessment tool in this study. The scale, validated and translated to European Portuguese [19], quantifies auditory-perceptual parameters, encompassing severity, roughness, breathiness, strain, pitch, and loudness. Severity represents the general impression of voice impairment; roughness denotes perceived irregularities in the voice source; breathiness refers to the audible escape of air in the voice; strain relates to the perception of excessive vocal effort; pitch represents the perceptual reflection of fundamental frequency; and loudness corresponds to the perceptual reflection of sound intensity [17]. The CAPE-V (supplemental Fig. 1) encompasses three distinct vocal tasks: firstly, participants were instructed to produce three sustained vowels (/a/, /i/, and /u/); secondly, they were asked to read six predetermined sentences containing diverse phonetic contexts; lastly, the evaluation involved an assessment of spontaneous speech.

To ensure standardization, all subjects were seated in a quiet room and instructed to perform these three specific phonatory tasks. These were recorded according to the prescribed guidelines of the CAPE-V. A smartphone was employed for the sound recordings, positioned at an approximate distance of 20–25 cm from the mouth and an angle of approximately 45°. These measures were implemented to mitigate the influence of wind noise generated when a forceful expulsion of air directly interacts with the microphone [22]. The sound recordings were conducted during the patient's current clinical visit. Each participant underwent a recording session encompassing sixteen distinct sound recordings. Subsequently, four specific recordings (comprising three instances for vowel /a/ and one instance of spoken sentence) were subjected to objective and comprehensive sound analysis, making a total of 108 recordings analyzed within the context of this study.

After data collection, the voice quality assessment was performed by one speech-language therapist, according to the CAPE-V scoring system. Each CAPE-V subcategory was scored using a 100 mm visual analog scale (VAS). The degree of voice quality impairments was evaluated for each vocal variable with a marking along the VAS: the higher the rating, the more severe the impairment (supplemental Fig. 2).

2.4 Signal processing and feature extraction

Regarding objective analysis, phrase C and vowel /a/ were chosen for a more detailed investigation (supplemental Fig. 1). This specific phrase was chosen because it includes only voiced phonemes [19]. Vowel /a/ was selected, as it is widely recognized in literature as suitable for instrumental-based voice measures [23, 24, 25]. For this analysis, the raw signal was first processed with Librosa – a Python package for audio signal analysis [26]. The analysis was conducted using a frame length of 2048 samples per frame and a hop length of 512. To minimize potential biases stemming from the beginning and end of the recordings, the split function of Librosa was employed with a cutoff of 20 dB, eliminating the initial and final periods of silence in the voice samples.
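The trimming step described above can be illustrated with a minimal standard-library sketch. It is not Librosa's implementation and not the authors' code: it assumes unpadded frames, RMS energy per frame, and a threshold expressed relative to the loudest frame, and the function names `frame_rms` and `trim_silence` are hypothetical. Only the frame length (2048), the hop length (512), and the 20 dB cutoff come from the text.

```python
import math


def frame_rms(signal, frame_length=2048, hop_length=512):
    """Return the RMS value of each analysis frame (no padding, no centering)."""
    last_start = max(len(signal) - frame_length, 0)
    values = []
    for start in range(0, last_start + 1, hop_length):
        frame = signal[start:start + frame_length]
        values.append(math.sqrt(sum(x * x for x in frame) / len(frame)))
    return values


def trim_silence(signal, top_db=20.0, frame_length=2048, hop_length=512):
    """Drop leading and trailing frames that are more than top_db below the loudest frame."""
    if not signal:
        return []
    rms = frame_rms(signal, frame_length, hop_length)
    peak = max(rms)
    if peak == 0.0:
        return []  # the whole recording is digital silence
    # Convert the dB cutoff into a linear RMS threshold relative to the peak frame.
    threshold = peak * 10.0 ** (-top_db / 20.0)
    active = [i for i, value in enumerate(rms) if value > threshold]
    start = active[0] * hop_length
    end = min(len(signal), active[-1] * hop_length + frame_length)
    return signal[start:end]


# Illustrative use: a 200 Hz tone (assumed 16 kHz sampling) padded with silence.
sample_rate = 16000
tone = [0.5 * math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(8192)]
recording = [0.0] * 4096 + tone + [0.0] * 4096
trimmed = trim_silence(recording)
```

Because the cutoff is relative to the loudest frame, the same setting behaves consistently across recordings made at different loudness levels, which is one plausible reason for choosing a relative rather than absolute threshold.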

Once the pre-processing was completed, the resulting voice sound signals were analyzed to extract audio-based features. We used the Time Series Feature Extraction Library (TSFEL) [27], which extracts over 60 different features in the statistical, temporal, and spectral domains. Considering prior research findings and relevance in general sound analysis, we included the harmonic-to-noise ratio (HNR), jitter (frequency perturbation), shimmer (amplitude perturbation), absolute energy, sound power, entropy, fundamental frequency, and spectral bandwidth, as well as speaking rate (sp_rate) and pause time duration (p_time). Since three recordings of vowel /a/ were taken, the results considered the mean values of the extracted features. All extracted features were normalized to their maximum value (with a range between −1 and 1).
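As an illustration of the averaging and normalization just described, the sketch below averages three repeated measurements per patient and then scales one feature by its largest absolute value. The text does not state whether normalization was done per feature across patients, so that is an assumption here, and the patient identifiers and jitter values are invented for the example.

```python
def mean_of_repeats(repeats):
    """Average a feature over repeated recordings of the same task."""
    return sum(repeats) / len(repeats)


def max_abs_normalize(values):
    """Scale values by their largest absolute value so results lie in [-1, 1]."""
    peak = max(abs(v) for v in values)
    if peak == 0:
        return [0.0] * len(values)
    return [v / peak for v in values]


# Hypothetical jitter values for three sustained /a/ recordings per patient.
jitter_repeats = {
    "P01": [0.8, 1.0, 1.2],
    "P02": [2.1, 1.9, 2.0],
    "P03": [0.4, 0.5, 0.6],
}

# Step 1: one value per patient (mean of the three recordings).
patient_means = [mean_of_repeats(values) for values in jitter_repeats.values()]

# Step 2: normalize the feature across patients (assumed scope of normalization).
normalized = max_abs_normalize(patient_means)
```

Dividing by the maximum absolute value preserves the sign of each feature, which is consistent with the stated range of −1 to 1.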

2.5 Statistical analysis

Data analysis was performed using Python version 3.11.2 (Python Software Foundation). For the significance level, α = 0.05 was considered. Descriptive statistics consisted of frequencies (with proportions) for categorical variables and mean values (with standard deviation) for continuous variables. Parametric tests such as the two-sample t-test or the one-way ANOVA were applied to compare mean values. If the normality assumption of the continuous variable was violated (significant Kolmogorov-Smirnov test with an absolute skewness > 2), non-parametric tests such as Mann-Whitney U-test or Kruskal-Wallis test were considered and results reported, if different from parametric analysis. Linear correlations were used to elucidate associations between the instrumental-based voice sound features and the CAPE-V scores with the ALSFRS-R total score, and the pulmonary function measurements, such as FVC%, MEP%, and MIP%. Logistic regressions were applied to identify sound features capable of distinguishing between patients with and without bulbar symptoms.
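To make the correlation analysis concrete, the following sketch computes a Pearson correlation coefficient and the corresponding t statistic with n − 2 degrees of freedom. It is a generic illustration and not the authors' analysis script; the function names are hypothetical, and the critical value of about 2.06 for 25 degrees of freedom (two-sided, α = 0.05) is a standard tabulated figure rather than something reported in the text.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length with at least 3 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Two-sided 5% critical value of Student's t for 25 degrees of freedom (tabulated).
T_CRIT_DF25 = 2.060

# Example: a correlation of 0.44 observed in 27 patients.
t_value = t_statistic(0.44, 27)
is_significant = abs(t_value) > T_CRIT_DF25
```

With 27 patients, a coefficient of 0.44 gives a t statistic of roughly 2.45, which exceeds the critical value, whereas coefficients below about 0.38 in absolute value would not reach significance at this sample size.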

3.1 Demographics and clinical characteristics

We included 27 ALS patients, with 12 presenting bulbar dysfunction. Demographic and clinical variables are shown in Table 1. Age, sex, duration of symptoms, ALSFRS-R, and respiratory variables did not show statistically significant differences (p > 0.05) between patients with and without bulbar dysfunction.

3.2 Correlations between instrumental-based voice measures and CAPE-V scores and the disease functional state

We investigated the correlation between the ALSFRS-R total score and the voice assessments, including both the instrumental-based voice measures and the CAPE-V scores. The length of pauses while reading phrase C and its spectral bandwidth showed significant moderate correlations with the ALSFRS-R. In contrast, no significant correlations were found for the CAPE-V’s sub-scores (see Table 2). Figure 1 shows examples of sound wave patterns generated by two patients in different functional states of the disease.

3.3 Correlations between instrumental-based voice measures and CAPE-V scores and the respiratory function

Table 3 presents the correlations between the metrics derived from pulmonary function tests and the voice assessments, obtained either by instrumental-based voice measures (extracted from phrase C and vowel /a/) or by CAPE-V scores. Regarding the instrumental-based voice assessments, there were significant correlations for FVC%, MIP%, and MEP%. Specifically, for phrase C, speaking rate and pause time were the features revealing higher coherence with all respiratory measures. We noted that FVC% exhibited a positive correlation with speaking rate and shimmer, while displaying a negative correlation with absolute energy and HNR. MIP% exhibited a positive correlation with speaking rate and a negative correlation with the length of pause time. Jitter was also significantly correlated with MIP%, but it is important to note the difference between this relation and the one with MEP%. Lastly, MEP% displayed a significant positive correlation only with the speaking rate and an inverse correlation with the pause time. Furthermore, for the sustained phonation of vowel /a/, FVC% was negatively correlated only with the fundamental frequency and spectral bandwidth.

CAPE-V scores did not reveal a significant correlation with FVC%. Nonetheless, the overall severity and the sub-scores for pitch and loudness demonstrated significant correlations with MIP% and MEP% (Table 3). Notably, these correlations were negative – the lower the scores on the CAPE-V assessment, i.e., the lower the pitch and loudness severities – the higher the values of respiratory function variables. This finding indicates an absence of perceived voice quality alterations in patients with better respiratory function.

3.4 Voice sound features related to bulbar symptomology

We compared ALS patients with and without bulbar dysfunction using the instrumental-based voice measurements and the CAPE-V scoring. Regarding the instrumental-based voice measures, significant group differences were found for several metrics extracted from phrase C. Patients with bulbar dysfunction showed significantly higher absolute energy (p < 0.01) and HNR (p < 0.01), while revealing a lower jitter (p = 0.043). Moreover, patients with bulbar dysfunction also exhibited a significantly slower speaking rate (p < 0.01) and longer pause time (p = 0.049) (Fig. 2). Applying CAPE-V scores, several differences were disclosed: patients with bulbar dysfunction presented significantly higher scores in overall severity (p < 0.01), roughness (p < 0.01), strain (p = 0.038), pitch (p < 0.001), and loudness (p < 0.001) (Fig. 3). Regarding the sustained phonation of vowel /a/, only the jitter measure was significantly different between the two groups (Fig. 4).

3.5 Correlations between the CAPE-V scores and instrumental-based voice measures

Having identified the associations between instrumental-based voice measures and CAPE-V scores with the bulbar and respiratory function of ALS patients, we examined the correlations between these voice measures and CAPE-V scores. The results are presented in Table 4. Interestingly, the CAPE-V scores are only consistently correlated with the speaking rate and pause time of phrase C. Regarding the vowel /a/, significant correlations were observed only in jitter and HNR, specifically with overall severity scores (for both measures), and with pitch and roughness (limited to jitter).

Table 5 summarizes the calculated correlations, highlighting those that were statistically significant.

From a clinical perspective, as hypothesized, we found significant correlations between voice sound features and the bulbar and respiratory function in ALS. Regarding the objective analysis of instrumental-based voice measures, particularly those derived from phrase C, we found that they mirrored global functional status (Table 2). In more detail, correlations between intelligible factors such as speaking rate and pause duration and the functional state of the disease align with findings from previous studies [28, 29, 30]. Patients in advanced stages of the disease (with lower ALSFRS-R total scores) exhibit reduced speaking rates and increased pause times.

Additionally, this work revealed a positive correlation between the ALSFRS-R total score and the spectral bandwidth (Table 2). In normal, healthy speech, sounds are composed of a combination of different frequencies, and the spectral bandwidth provides insight into the distribution of these frequencies. This finding implies that individuals in poorer functional states often exhibit a more restricted frequency range in their speech compared to those in better states. On the other hand, CAPE-V scores applied by the speech therapist did not correlate with the total ALSFRS-R score (Table 2). Nevertheless, they proved effective in evaluating both bulbar and respiratory impairments, as nearly all the sub-scores enabled the differentiation of patients with and without bulbar impairments, and the CAPE-V’s overall severity, pitch, and loudness were significantly correlated with MIP% and MEP% (Table 3). These positive outcomes were anticipated because voice sound production results from a highly coordinated process between the bulbar and respiratory muscles. Therefore, voice assessment should be sensitive to detect bulbar and respiratory impairments. Similarly, in the analysis of phrase C, instrumental-based voice sound measures also had significant correlations with bulbar symptomology and respiratory function variables (FVC%, MIP%, and MEP%) (Table 3), particularly the speaking rate, pause time, sound energy, and variables assessing sound variability, such as jitter, shimmer, and HNR.
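For readers unfamiliar with spectral bandwidth, one common definition is given below as an illustration. It is an assumption that this form matches the feature used here, since the exact definition implemented by the extraction library may differ. With $S(k)$ the spectral magnitude in frequency bin $k$ and $f_k$ the centre frequency of that bin, the spectral centroid $f_c$ and the bandwidth $B$ can be written as:

```latex
\[
f_c = \frac{\sum_k f_k \, S(k)}{\sum_k S(k)},
\qquad
B = \sqrt{\frac{\sum_k S(k)\,\bigl(f_k - f_c\bigr)^2}{\sum_k S(k)}}
\]
```

Under this definition, $B$ is the magnitude-weighted standard deviation of frequency around the centroid, so a voice whose energy is concentrated in a narrow band of frequencies yields a small $B$.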

Jitter, shimmer, and HNR are becoming very prominent as they were found to be enough to accurately detect laryngological pathologies using machine learning algorithms [31] and also bulbar involvement in ALS patients [24, 32]. However, they were rarely used to assess respiratory function. Jitter is a measure of frequency perturbation, shimmer is a measure of amplitude perturbation, and HNR represents the ratio between the periodic (vibrations of the vocal cord) and non-periodic elements (glottal noise).
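The three perturbation measures defined above can be sketched in a few lines. The code uses the widely used "local" variants (mean absolute difference between consecutive cycles divided by the mean) and an energy-ratio form of HNR; the text does not state which variants were computed in this study, so these choices and the function names are assumptions, and the inputs (cycle periods, cycle peak amplitudes, harmonic and noise energies) are presumed to come from a separate pitch-tracking step.

```python
import math


def relative_perturbation(values):
    """Mean absolute difference of consecutive values, divided by their mean."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    mean_value = sum(values) / len(values)
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return (sum(diffs) / len(diffs)) / mean_value


def local_jitter(periods):
    """Cycle-to-cycle variation of the glottal period (frequency perturbation)."""
    return relative_perturbation(periods)


def local_shimmer(amplitudes):
    """Cycle-to-cycle variation of the peak amplitude (amplitude perturbation)."""
    return relative_perturbation(amplitudes)


def hnr_db(harmonic_energy, noise_energy):
    """Harmonic-to-noise ratio in decibels from periodic and non-periodic energy."""
    return 10.0 * math.log10(harmonic_energy / noise_energy)


# Hypothetical cycle measurements from a sustained vowel.
periods_s = [0.0100, 0.0110, 0.0100, 0.0110]  # seconds per glottal cycle
amplitudes = [0.50, 0.48, 0.52, 0.50]         # peak amplitude per cycle
jitter = local_jitter(periods_s)
shimmer = local_shimmer(amplitudes)
```

A perfectly periodic signal gives zero jitter and zero shimmer, so larger values indicate less stable control of frequency and intensity from one cycle to the next.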

This work supports the view that patients with bulbar impairments have less capacity for varying the intensity, as shown by lower shimmer values. Regarding jitter, we found that this was higher in patients with bulbar symptoms during the phonation of vowel /a/, highlighting the importance of the inherent nature of the task. This finding aligns with the study by Xie et al. (2014) [33], which demonstrated a similar result. We speculate that, when sustained phonation is required, patients with bulbar dysfunction encounter greater challenges in controlling slight variances in sound frequency, especially due to varying properties of the medium (the vocal tract) through which the sound wave travels.

From a technical perspective, we intended to demonstrate the association of frequency-related voice sound features with bulbar dysfunction, and of intensity-related voice sound features with respiratory impairment (Table 5, in general). As briefly explained, frequency is perceived as the pitch of the voice and intensity as its volume. Broadly, frequency is very dependent on vocal cord functionality, as it results from its variations, and intensity is very dependent on air volume, which results from respiratory muscle function.

Overall, in both subjective and objective evaluations, we found a consistent pattern: jitter, roughness, pitch, and strain values exhibited stronger correlations with bulbar symptomatology, while shimmer and loudness showed stronger associations with respiratory impairment. This implies that outcomes such as loss of harmonic complexity, narrowed frequency range, and increased regularity or periodicity of the voice sound are linked with tension or stiffness in the vocal cords, which significantly restricts the vibrational patterns of sound, whereas the loss of variation in sound intensity is linked to abnormal lung function. Absolute energy and HNR demonstrated correlations with both bulbar and respiratory function, providing insights into sound frequency and intensity.

Another critical finding involved examining the specific sound wave features on which the subjective evaluation relied (Table 4). While this type of evaluation is non-invasive, well-tolerated by the patients, brief, and cost-effective, it remains a challenging endeavor due to its subjective nature, as it is influenced by the internal standards of listeners, their background experience, and training. In this work, we found that subjective evaluations, across all sub-scores, heavily depended on intelligibility factors, particularly the speaking rate and pause time, as we found moderate to strong correlations between these metrics and the CAPE-V’s overall severity, roughness, strain, pitch, and loudness.

The findings mentioned above reinforce three key points: 1) the extent to which a speaker is comprehensible to a listener is very important, with speaking rate and pause time being crucial contributors (even considering the correlations of sound entropy in phrase C, and of jitter and HNR in vowel /a/, with the CAPE-V’s overall severity); 2) the challenge of accurately evaluating features like fundamental frequency, sound energy, power, and others solely through auditory perception (even with correlations between jitter in vowel /a/ and roughness and pitch); and 3) the importance of assessing a phrase in combination with a sustained vowel. Thus, the subjective analysis should be complemented with a more objective and personalized acoustic analysis, directly related to muscle function. Lastly, in the realm of acoustic analysis as a method for detecting voice impairments, there is not only a lack of standardized methodologies, protocols for collecting voice samples, and approaches and algorithms for extracting sound features, but, frequently, conclusions are also drawn from diverse populations.

From a physiological perspective, especially in the context of voice assessment, it is crucial to consider that not all languages share the same phonemes, and even within a single language, phonemes can vary, influenced by factors such as regional differences [15]. Furthermore, reproducing phonemes not present in one's native language poses challenges, as it requires an unfamiliar positioning of the organs responsible for producing speech, such as the lips, oral cavity, tongue, teeth, palate, pharynx, and nasal cavity, and thus results in different instrumental-based voice measures. This work highlights the acoustic analysis in a Portuguese population, which speaks a Latin-derived language.

Limitations

The most impactful constraint was the limited sample size, which posed challenges in assessing the generalizability of the findings. Specifically, it hindered the possibility of establishing correlations while controlling for various confounding factors, including age, gender, and symptom duration. Moreover, considering that the evaluation hinges on perceptual assessment and the experience of the speech-language therapist, it would have been beneficial to have voice recordings evaluated by multiple specialists to minimize potential errors. Furthermore, restricting the objective analysis to phrase C and vowel /a/ may also limit the relationship between this approach and the subjective assessment. Owing to the cross-sectional nature of the study, causal relationships could not be determined. Performing a longitudinal voice sound analysis, making clinical correlations with phrenic nerve motor amplitudes and cervical muscle strength, and comparing the results with other motor neuron diseases are future perspectives that could also help elucidate the results of this work.

The present work demonstrates that analyzing voice sounds can serve as a valuable technique for evaluating ALS patients, particularly those with respiratory and bulbar impairments. Through this research, we have identified crucial sound features to prioritize, some of which are quite perceptible to the human ear. This holds particular importance because bulbar and respiratory involvement requires coordinated interventions. Therefore, early detection is pivotal to improving the quality of life and extending the lifespan of ALS patients experiencing these impairments. However, it is important to note that these analyses should not be the only indicators utilized to evaluate respiratory and bulbar health, as ALS is a multifaceted and intricate disease. Rather, they can be used as adjunct measures, supplementing commonly used measures of disease progression. Finally, this methodology is also well received by patients and very convenient, as it does not require specialized equipment or handling. This allows researchers to start collecting data from patients' homes, decreasing the burden of hospital visits and potentially improving outcomes.

Acknowledgments

This study was part of a broader ALS project (HomeSenseALS - PTDC/MEC-NEU/6855/2020), supported by the Foundation for Science and Technology.

Disclosure Statement

No potential conflict of interest was reported by the author(s).

O. Hardiman et al., “Amyotrophic lateral sclerosis,” Nature Reviews Disease Primers, vol. 3. Nature Publishing Group, Oct. 05, 2017. doi: 10.1038/nrdp.2017.71.
S. Wales et al., “Seminar Amyotrophic lateral sclerosis,” Lancet, vol. 377, pp. 942–55, 2011, doi: 10.1016/S0140.
L. C. Wijesekera and P. N. Leigh, “Amyotrophic lateral sclerosis,” Orphanet J Rare Dis, vol. 4, no. 1, 2009, doi: 10.1186/1750-1172-4-3.
Darrell Hulisz, “ Amyotrophic Lateral Sclerosis: Disease State Overview,” Am J Manag Care.
P. Kaufmann et al., “The ALSFRSr predicts survival time in an ALS clinic population,” 2005.
S. Pinto and M. de Carvalho, “Comparison of slow and forced vital capacities on ability to predict survival in ALS,” Amyotroph Lateral Scler Frontotemporal Degener, vol. 18, no. 7–8, pp. 528–533, Oct. 2017, doi: 10.1080/21678421.2017.1354995.
S. Shellikeri et al., “The neuropathological signature of bulbar-onset ALS: A systematic review,” Neuroscience and Biobehavioral Reviews, vol. 75. Elsevier Ltd, pp. 378–392, Apr. 01, 2017. doi: 10.1016/j.neubiorev.2017.01.045.
R. H. Brown and A. Al-Chalabi, “Amyotrophic Lateral Sclerosis,” New England Journal of Medicine, vol. 377, no. 2, pp. 162–172, Jul. 2017, doi: 10.1056/NEJMra1603471.
J. M. Cedarbaum, N. Stambler, E. Malta, C. Fuller, and D. Hilt, “The ALSFRS-R: a revised ALS functional rating scale that incorporates assessments of respiratory function,” 1999. [Online]. Available:
F. L. Darley, A. E. Aronson, and J. R. Brown, “Motor Speech Signs in Neurologic Disease,” 1968.
G. Milella et al., “Acoustic Voice Analysis as a Useful Tool to Discriminate Different ALS Phenotypes,” Biomedicines, vol. 11, no. 9, Sep. 2023, doi: 10.3390/biomedicines11092439.
Y. Yunusova, E. K. Plowman, J. R. Green, C. Barnett, and P. Bede, “Clinical measures of bulbar dysfunction in ALS,” Front Neurol, vol. 10, no. FEB, 2019, doi: 10.3389/fneur.2019.00106.
H. Vieira, N. Costa, T. Sousa, S. Reis, and L. Coelho, “Voice-Based Classification of Amyotrophic Lateral Sclerosis: Where Are We and Where Are We Going? A Systematic Review,” Neurodegenerative Diseases, vol. 19, no. 5–6. S. Karger AG, pp. 163–170, Jun. 01, 2020. doi: 10.1159/000506259.
R. S. Snell, “CLINICAL NEUROANATOMY,” 2009.
Susan Stranding et al., Gray’s: Atlas de anatomia, 40th ed. Elsevier, 2010.
T. Makkonen, H. Ruottinen, R. Puhto, M. Helminen, and J. Palmio, “Speech deterioration in amyotrophic lateral sclerosis (ALS) after manifestation of bulbar symptoms,” Int J Lang Commun Disord, vol. 53, no. 2, pp. 385–392, Mar. 2018, doi: 10.1111/1460-6984.12357.
G. B. Kempster, B. R. Gerratt, K. Verdolini Abbott, J. Barkmeier-Kraemer, and R. E. Hillman, “Consensus Auditory-Perceptual Evaluation of Voice: Development of a Standardized Clinical Protocol,” Am J Speech Lang Pathol, vol. 18, no. 2, pp. 124–132, May 2009, doi: 10.1044/1058-0360(2008/08-0017).
R. I. Zraick et al., “Establishing Validity of the Consensus Auditory-Perceptual Evaluation of Voice (CAPE-V),” Am J Speech Lang Pathol, vol. 20, no. 1, pp. 14–22, Feb. 2011, doi: 10.1044/1058-0360(2010/09-0105).
S. C. de Almeida, A. P. Mendes, and G. B. Kempster, “The Consensus Auditory-Perceptual Evaluation of Voice (CAPE-V) Psychometric Characteristics: II European Portuguese Version (II EP CAPE-V),” Journal of Voice, vol. 33, no. 4, p. 582.e5-582.e13, Jul. 2019, doi: 10.1016/j.jvoice.2018.02.013.
J. M. Shefner et al., “A proposal for new diagnostic criteria for ALS,” Clinical Neurophysiology, vol. 131, no. 8. Elsevier Ireland Ltd, pp. 1975–1978, Aug. 01, 2020. doi: 10.1016/j.clinph.2020.04.005.
Mamede de Carvalho et al., “Electrodiagnostic criteria for diagnosis of ALS,” Review Clin Neurophysiol, 2008.
R. V. Sharan, U. R. Abeyratne, V. R. Swarnkar, S. Claxton, C. Hukins, and P. Porter, “Predicting spirometry readings using cough sound features and regression,” Physiol Meas, vol. 39, no. 9, Sep. 2018, doi: 10.1088/1361-6579/aad948.
M. Vashkevich and Y. Rushkevich, “Classification of ALS patients based on acoustic analysis of sustained vowel phonations,” Biomed Signal Process Control, vol. 65, Mar. 2021, doi: 10.1016/j.bspc.2020.102350.
Tena A, Claria F, Solsona F, Meister E, and Povedano M., “Detection of Bulbar Involvement in Patients With Amyotrophic Lateral Sclerosis by Machine Learning Voice Analysis: Diagnostic Decision Support Development Study.,” JMIR Med Inform., 2021.
M. Vashkevich, E. Azarov, A. Petrovsky, and Y. Rushkevich, “Features extraction for the automatic detection of ALS disease from acoustic speech signals; Features extraction for the automatic detection of ALS disease from acoustic speech signals,” 2018.
B. McFee et al., “librosa/librosa: 0.10.0.post2,” Mar. 2023, doi: 10.5281/ZENODO.7746972.
M. Barandas et al., “TSFEL: Time Series Feature Extraction Library,” SoftwareX, vol. 11, Jan. 2020, doi: 10.1016/j.softx.2020.100456.
P. Rong et al., “Predicting speech intelligibility decline in amyotrophic lateral sclerosis based on the deterioration of individual speech subsystems,” PLoS One, vol. 11, no. 5, May 2016, doi: 10.1371/journal.pone.0154971.
Y. Yunusova et al., “Profiling speech and pausing in amyotrophic lateral sclerosis (ALS) and frontotemporal dementia (FTD),” PLoS One, vol. 11, no. 1, Jan. 2016, doi: 10.1371/journal.pone.0147573.
K. M. Allison, Y. Yunusova, T. F. Campbell, J. Wang, J. D. Berry, and J. R. Green, “The diagnostic utility of patient-report and speech-language pathologists’ ratings for detecting the early onset of bulbar symptoms due to ALS,” Amyotroph Lateral Scler Frontotemporal Degener, vol. 18, no. 5–6, pp. 358–366, Jul. 2017, doi: 10.1080/21678421.2017.1303515.
J. P. Teixeira, P. O. Fernandes, and N. Alves, “Vocal Acoustic Analysis - Classification of Dysphonic Voices with Artificial Neural Networks,” in Procedia Computer Science, Elsevier B.V., 2017, pp. 19–26. doi: 10.1016/j.procs.2017.11.004.
R. Cebola, D. Folgado, A. Carreiro, and H. Gamboa, “Speech-Based Supervised Learning Towards the Diagnosis of Amyotrophic Lateral Sclerosis,” INSTICC, Mar. 2023, pp. 74–85. doi: 10.5220/0011694700003414.
H. S. Xie, F. R. Ma, D. S. Fan, L. P. Wang, Y. Yan, and P. Q. Lu, “[Acoustic analysis for 21 patients with amyotrophic lateral sclerosis complaining of dysarthria],” Beijing Da Xue Xue Bao Yi Xue Ban, Oct. 2014.

Table 1

Clinical characteristics of the ALS population.
Clinical characteristics	ALS patients (N = 27)
Age, years (mean ± SD)	60.9 ± 12.4
Gender
Men	12 (44%)
Women	15 (56%)
BMI (kg/m²) (mean ± SD)	23.4 ± 8.2
Symptom duration (months)
Median	28
Interquartile range (1st–3rd quartile)	8–141
Disease onset
Bulbar onset	7 (26%)
Upper limb onset	7 (26%)
Lower limb onset	13 (48%)
ALSFRS-R total score (0–48) (mean ± SD)	39.4 ± 3.1
Bulbar dysfunction	12 (44%)
FVC (%) (mean ± SD)	73 ± 15.86

BMI, body mass index; ALSFRS-R, revised amyotrophic lateral sclerosis functional rating scale; FVC, forced vital capacity.

Table 2

Correlation analysis between ALSFRS-R total score and the instrumental-based voice measures (extracted from phrase C and vowel /a/), and the CAPE-V scores.
	ALSFRS-R total score
	R/r value	p value
Phrase C
Speaking rate	R = 0.37	0.055
Pause time	r = -0.40*	0.039
Absolute energy	R = -0.25	0.20
Fundamental frequency	r = -0.31	0.11
Entropy of the signal	R = 0.27	0.18
Power of the signal	r = 0.073	0.71
Spectral bandwidth	r = 0.44*	0.021
Shimmer	R = -0.061	0.76
Jitter	R = 0.078	0.69
HNR	R = -0.34	0.084
Vowel /a/
Absolute energy	R = -0.23	0.24
Fundamental frequency	R = -0.33	0.097
Entropy of the signal	r = -0.26	0.19
Power of the signal	R = 0.23	0.25
Spectral bandwidth	r = 0.22	0.22
Shimmer	r = 0.18	0.36
Jitter	r = -0.13	0.52
HNR	R = -0.19	0.33
CAPE-V score
Overall severity	r = -0.34	0.079
Roughness	r = -0.25	0.21
Breathiness	r = -0.26	0.18
Strain	r = -0.18	0.36
Pitch	r = -0.29	0.14
Loudness	r = -0.34	0.060

Spearman’s rank coefficient is presented as r and Pearson coefficient is presented as R. Correlation is significant at the 0.05 level *.
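
As an illustration of the two coefficients named in the footnote above, the following is a minimal sketch of how a Pearson R and a Spearman r can be computed. It is not the authors' analysis code. The variable names (`fvc_pct`, `speaking_rate`) and the six value pairs are hypothetical and synthetic, chosen only to make the example runnable. They are not study data. Spearman's coefficient is obtained here as the Pearson coefficient of the average ranks, which is its standard definition when ties are present.

```python
from statistics import mean


def pearson(x, y):
    """Pearson product-moment coefficient (reported as R in the tables)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank coefficient (reported as r in the tables)."""
    return pearson(average_ranks(x), average_ranks(y))


# Synthetic, hypothetical values for illustration only (not study data).
fvc_pct = [95, 88, 80, 72, 65, 50]
speaking_rate = [4.1, 4.0, 3.6, 3.7, 3.0, 1.2]

print("Pearson R  =", round(pearson(fvc_pct, speaking_rate), 3))
print("Spearman r =", round(spearman(fvc_pct, speaking_rate), 3))
```

The sketch returns coefficients only. The p values reported in the tables would additionally require a significance test, which a statistics package normally supplies.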

Table 3

Correlation analysis between pulmonary function tests and the instrumental-based voice measures (extracted from phrase C and vowel /a/), and CAPE-V scores.
	FVC%		MIP%		MEP%	
	R/r value	p value	R/r value	p value	R/r value	p value
Phrase C
Speaking rate	R = 0.43*	0.025	R = 0.56**	< 0.01	R = 0.52**	< 0.01
Pause time	r = -0.28	0.15	r = -0.51**	< 0.01	r = -0.53**	< 0.01
Absolute energy	R = -0.51**	< 0.01	R = -0.20	0.32	R = -0.058	0.77
Fundamental frequency	r = -0.32	0.10	r = 0.21	0.29	r = 0.049	0.80
Entropy of the signal	R = -0.084	0.67	R = 0.35	0.071	R = 0.38	0.051
Power of the signal	r = 0.32	0.11	r = 0.25	0.22	r = 0.20	0.31
Spectral bandwidth	r = -0.19	0.35	r = 0.19	0.34	r = 0.089	0.65
Shimmer	R = 0.48*	0.011	R = 0.24	0.23	R = 0.20	0.31
Jitter	R = 0.23	0.22	R = 0.42*	0.027	R = 0.28	0.15
Harmonic-to-noise ratio	R = -0.59**	< 0.01	R = -0.34	0.086	R = -0.35	0.076
Vowel /a/
Absolute energy	R = -0.19	0.35	R = -0.10	0.61	R = 0.19	0.35
Fundamental frequency	R = -0.54**	< 0.01	R = -0.14	0.48	R = -0.21	0.30
Entropy of the signal	r = -0.25	0.19	r = -0.26	0.19	r = < 0.001	0.98
Power of the signal	R = 0.37	0.059	R = 0.25	0.20	R = 0.38	0.052
Spectral bandwidth	r = -0.60***	< 0.001	r = -0.19	0.34	r = -0.37	0.058
Shimmer	r = 0.19	0.35	r = 0.17	0.40	r = -0.073	0.71
Jitter	r = -0.22	0.26	r = -0.081	0.68	r = -0.29	0.14
Harmonic-to-noise ratio	R = -0.044	0.82	R = 0.064	0.74	R = -0.28	0.15
CAPE-V score
Overall severity	r = -0.33	0.097	r = -0.49*	0.010	r = -0.44*	0.021
Roughness	r = -0.30	0.13	r = -0.36	0.062	r = -0.36	0.062
Breathiness	r = -0.36	0.066	r = -0.33	0.085	r = -0.36	0.068
Strain	r = -0.36	0.063	r = -0.12	0.54	r = -0.24	0.22
Pitch	r = -0.33	0.093	r = -0.39*	0.042	r = -0.39*	0.044
Loudness	r = -0.38	0.052	r = -0.51**	< 0.01	r = -0.48*	0.012

Spearman’s rank coefficient is presented as r and Pearson coefficient is presented as R.

Correlation is significant at the 0.05 level *, 0.01 level ** and < 0.001 level ***.
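
The star notation in the footnote above can be expressed as a small helper. This is a hedged sketch rather than the authors' code: the function names `significance_stars` and `format_cell` are hypothetical, and the strict thresholds (p < 0.05, p < 0.01, p < 0.001) are an assumption inferred from the table, where a reported p value of 0.010 carries a single asterisk.

```python
def significance_stars(p):
    """Map a p value to the star notation used in the table footnotes.

    Assumption: thresholds are strict inequalities, so p = 0.010 gets "*".
    """
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""


def format_cell(symbol, coefficient, p):
    """Format one table cell, e.g. symbol "r" for Spearman or "R" for Pearson."""
    return f"{symbol} = {coefficient:.2f}{significance_stars(p)}"


# Two cells taken from Table 3: overall severity vs MIP%, speaking rate vs FVC%.
print(format_cell("r", -0.49, 0.010))
print(format_cell("R", 0.43, 0.025))
```

Because the tables report some p values only as bounds (for example "< 0.01"), the helper needs the underlying numeric p value, which is not given in the tables for those cells.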

Table 4

Correlation between CAPE-V scores and instrumental-based voice measures (extracted from phrase C and vowel /a/).
	Overall severity		Roughness		Breathiness	
	r value	p value	r value	p value	r value	p value
Phrase C
Speaking rate	-0.53**	< 0.01	-0.47*	0.014	-0.24	0.22
Pause time	0.62***	< 0.001	0.54**	< 0.01	0.30	0.12
Absolute energy	0.26	0.18	0.30	0.13	0.082	0.68
Fundamental frequency	-0.24	0.23	-0.027	0.89	-0.27	0.17
Entropy of the signal	-0.39*	0.043	-0.30	0.13	-0.10	0.60
Power of the signal	-0.020	0.92	-0.0076	0.97	-0.25	0.21
Spectral bandwidth	-0.19	0.33	-0.18	0.36	0.0058	0.97
Shimmer	0.042	0.83	0.079	0.69	0.13	0.51
Jitter	-0.24	0.22	-0.16	0.41	0.0096	0.96
HNR	0.26	0.19	0.23	0.24	0.12	0.56
Vowel /a/
Absolute energy	0.019	0.92	0.040	0.84	-0.011	0.95
Fundamental frequency	-0.031	0.87	0.081	0.68	0.034	0.86
Entropy of the signal	0.14	0.47	0.18	0.34	0.18	0.35
Power of the signal	0.059	0.76	0.067	0.73	0.071	0.72
Spectral bandwidth	0.015	0.93	-0.0047	0.98	-0.016	0.93
Shimmer	0.28	0.14	0.33	0.089	0.073	0.71
Jitter	0.47*	0.013	0.47*	0.012	0.20	0.30
HNR	-0.39*	0.041	-0.34	0.086	-0.17	0.39
	Strain		Pitch		Loudness	
	r value	p value	r value	p value	r value	p value
Phrase C
Speaking rate	-0.37	0.057	-0.50**	< 0.01	-0.57**	< 0.01
Pause time	0.47*	0.014	0.64***	< 0.001	0.61***	< 0.001
Absolute energy	0.13	0.50	0.30	0.12	0.31	0.11
Fundamental frequency	-0.067	0.74	-0.070	0.72	-0.23	0.24
Entropy of the signal	0.27	0.17	-0.36	0.065	-0.35	0.076
Power of the signal	0.0064	0.97	-0.076	0.70	-0.034	0.86
Spectral bandwidth	-0.21	0.29	-0.17	0.40	-0.25	0.20
Shimmer	0.13	0.50	0.095	0.63	-0.041	0.84
Jitter	0.072	0.71	-0.18	0.36	-0.30	0.13
HNR	0.12	0.55	0.28	0.15	0.33	0.089
Vowel /a/
Absolute energy	-0.13	0.51	0.068	0.73	-0.037	0.85
Fundamental frequency	0.054	0.78	0.10	0.61	0.0019	0.99
Entropy of the signal	-0.017	0.93	0.17	0.39	0.13	0.48
Power of the signal	0.056	0.77	0.047	0.81	0.059	0.76
Spectral bandwidth	0.099	0.62	0.0075	0.97	0.010	0.95
Shimmer	0.31	0.11	0.30	0.12	0.28	0.16
Jitter	0.36	0.064	0.52**	< 0.01	0.40*	0.040
HNR	-0.20	0.31	-0.32	0.10	-0.37	0.054

Correlation is significant at the 0.05 level *, 0.01 level ** and < 0.001 level ***.

Table 5 is available in the Supplementary Files section.

The authors declare no competing interests.
