From a clinical perspective, as hypothesized, we found significant correlations between voice sound features and bulbar and respiratory function in ALS. The instrumental-based voice measures, particularly those derived from phrase C, mirrored global functional status (Table 2). In more detail, the correlations between intelligibility factors, such as speaking rate and pause duration, and the functional state of the disease align with findings from previous studies [28, 29, 30]. Patients in advanced stages of the disease (with lower ALSFRS-R total scores) exhibit reduced speaking rates and increased pause times.
Additionally, this work revealed a positive correlation between the ALSFRS-R total score and the spectral bandwidth (Table 2). In normal, healthy speech, sounds are composed of a combination of different frequencies, and the spectral bandwidth provides insight into the distribution of these frequencies. This finding implies that individuals in poorer functional states often exhibit a more restricted frequency range in their speech than those in better states. On the other hand, the CAPE-V scores assigned by the speech therapist did not correlate with the total ALSFRS-R score (Table 2). Nevertheless, they proved effective in evaluating both bulbar and respiratory impairments, as nearly all the sub-scores differentiated patients with and without bulbar impairments, and the CAPE-V’s overall severity, pitch, and loudness were significantly correlated with MIP% and MEP% (Table 3). These positive outcomes were anticipated because voice sound production results from a highly coordinated process between the bulbar and respiratory muscles. Therefore, voice assessment should be sensitive enough to detect bulbar and respiratory impairments. Similarly, in the analysis of phrase C, instrumental-based voice sound measures also correlated significantly with bulbar symptomatology and respiratory function variables (FVC%, MIP%, and MEP%) (Table 3), particularly the speaking rate, pause time, sound energy, and variables assessing sound variability, such as jitter, shimmer, and HNR.
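As an illustration of the spectral bandwidth discussed above, the following is a minimal sketch of one common definition: the magnitude-weighted standard deviation of the frequencies around the spectral centroid. The text does not state how this feature was extracted in the present work, so the function name `spectral_bandwidth`, the choice of a second-order spread, and the absence of windowing are all assumptions made for illustration.

```python
import numpy as np

def spectral_bandwidth(frame, sr):
    """Magnitude-weighted spread (in Hz) of a frame's spectrum around its centroid.

    Hypothetical helper for illustration only; assumes a non-silent frame.
    """
    frame = np.asarray(frame, dtype=float)
    mag = np.abs(np.fft.rfft(frame))                  # magnitude spectrum
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)   # bin frequencies in Hz
    weights = mag / mag.sum()                         # normalize to sum to 1
    centroid = np.sum(freqs * weights)                # spectral centroid
    # Second-order spread around the centroid.
    return float(np.sqrt(np.sum(weights * (freqs - centroid) ** 2)))

# Synthetic signals (not patient data): one tone versus two tones 200 Hz apart.
sr = 8000
t = np.arange(sr) / sr
narrow = np.sin(2 * np.pi * 200 * t)
wide = np.sin(2 * np.pi * 100 * t) + np.sin(2 * np.pi * 300 * t)
print(spectral_bandwidth(narrow, sr), spectral_bandwidth(wide, sr))
```

Under this definition, a signal whose energy is concentrated in a narrow range of frequencies yields a small bandwidth, which is the sense in which a lower value reflects a more restricted frequency range.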
Jitter, shimmer, and HNR are gaining prominence because they have been found sufficient to accurately detect laryngological pathologies using machine learning algorithms [31], as well as bulbar involvement in ALS patients [24, 32]. However, they have rarely been used to assess respiratory function. Jitter is a measure of frequency perturbation, shimmer is a measure of amplitude perturbation, and HNR represents the ratio between the periodic (vocal cord vibration) and non-periodic (glottal noise) elements of the voice signal.
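The three definitions above can be made concrete with a short sketch. It assumes that the glottal cycle periods and per-cycle peak amplitudes have already been extracted from the recording, and it uses the widely used "local" variants of jitter and shimmer together with an autocorrelation-based expression for HNR. The text does not specify which variants were used in this work, so the function names and formulas below are illustrative assumptions rather than the authors' pipeline.

```python
import numpy as np

def local_jitter(periods):
    """Mean absolute difference between consecutive cycle periods,
    divided by the mean period (frequency perturbation)."""
    periods = np.asarray(periods, dtype=float)
    return float(np.mean(np.abs(np.diff(periods))) / np.mean(periods))

def local_shimmer(amplitudes):
    """Mean absolute difference between consecutive cycle peak amplitudes,
    divided by the mean amplitude (amplitude perturbation)."""
    amplitudes = np.asarray(amplitudes, dtype=float)
    return float(np.mean(np.abs(np.diff(amplitudes))) / np.mean(amplitudes))

def hnr_db(r_max):
    """Harmonics-to-noise ratio in dB from the normalized autocorrelation
    peak r_max at the pitch lag; requires 0 < r_max < 1."""
    return float(10.0 * np.log10(r_max / (1.0 - r_max)))

# Synthetic values (not patient data).
print(local_jitter([0.010, 0.012, 0.010]))   # alternating periods, in seconds
print(local_shimmer([1.0, 1.0, 1.0]))        # perfectly steady amplitude
print(hnr_db(0.5))                           # equal periodic and noise energy
```

In this formulation, a perfectly regular voice gives jitter and shimmer of zero, and HNR grows as the periodic component dominates the noise.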
This work supports that patients with bulbar impairments have less capacity for varying intensity, as shown by lower shimmer values. Jitter, in turn, was higher in patients with bulbar symptoms during phonation of the vowel /a/, highlighting the importance of the inherent nature of the task. This finding aligns with the results of Xie et al. (2014) [33]. We speculate that, when sustained phonation is required, patients with bulbar dysfunction encounter greater challenges in controlling slight variances in sound frequency, especially due to varying properties of the medium (the vocal tract) through which the sound wave travels.
From a technical perspective, we intended to demonstrate the association of frequency-related voice sound features with bulbar dysfunction, and of intensity-related voice sound features with respiratory impairment (Table 5, in general). As briefly explained, frequency is perceived as voice pitch and intensity as loudness. Broadly, frequency depends strongly on vocal cord functionality, as it results from variations in the vocal cords, and intensity depends strongly on air volume, which results from respiratory muscle function.
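To make the distinction between frequency-related and intensity-related features concrete, the sketch below estimates the fundamental frequency of a short frame by autocorrelation and its intensity as a root-mean-square level in decibels. This is one common approach and is not claimed to be the extraction method used in this work; the function names, the search range `fmin`/`fmax`, and the reference level `ref` are hypothetical choices.

```python
import numpy as np

def f0_autocorr(frame, sr, fmin=75.0, fmax=500.0):
    """Estimate fundamental frequency (Hz) as the lag of the highest
    autocorrelation peak within the [fmin, fmax] search range."""
    frame = np.asarray(frame, dtype=float)
    frame = frame - frame.mean()                       # remove DC offset
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)            # lag bounds in samples
    lag = lo + int(np.argmax(ac[lo:hi + 1]))
    return sr / lag

def rms_db(frame, ref=1.0):
    """Intensity of a frame as RMS level in dB relative to `ref`."""
    frame = np.asarray(frame, dtype=float)
    rms = np.sqrt(np.mean(frame ** 2))
    return float(20.0 * np.log10(rms / ref))

# Synthetic 200 Hz tone (not patient data), 0.1 s at 8 kHz.
sr = 8000
t = np.arange(800) / sr
tone = np.sin(2 * np.pi * 200 * t)
print(f0_autocorr(tone, sr))                 # pitch-related feature
print(rms_db(2 * tone) - rms_db(tone))       # doubling amplitude adds about 6 dB
```

Scaling the amplitude changes the intensity estimate but not the frequency estimate, which is why the two families of features can carry separate information.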
Overall, in both subjective and objective evaluations, we found a consistent pattern: jitter, roughness, pitch, and strain values exhibited stronger correlations with bulbar symptomatology, while shimmer and loudness showed stronger associations with respiratory impairment. This implies that outcomes such as loss of harmonic complexity, narrowed frequency range, and increased regularity or periodicity of the voice sound are linked with tension or stiffness in the vocal cords, which significantly restricts the vibrational patterns of sound, whereas the loss of variation in sound intensity is linked to abnormal lung function. Absolute energy and HNR correlated with both bulbar and respiratory function, providing insights into sound frequency and intensity.
Another critical finding involved examining the specific sound wave features on which the subjective evaluation relied (Table 4). While this type of evaluation is non-invasive, well-tolerated by the patients, brief, and cost-effective, it remains a challenging endeavor due to its subjective nature, as it is influenced by the internal standards of listeners, their background experience, and training. In this work, we found that subjective evaluations, across all sub-scores, heavily depended on intelligibility factors, particularly the speaking rate and pause time, as we found moderate to strong correlations between these metrics and the CAPE-V’s overall severity, roughness, strain, pitch, and loudness.
The findings mentioned above reinforce three key points: 1) the extent to which a speaker is comprehensible to a listener is very important, with speaking rate and pause time being crucial contributors (even considering the correlations of sound entropy in phrase C, and of jitter and HNR in vowel /a/, with the CAPE-V’s overall severity); 2) features such as fundamental frequency, sound energy, and power, among others, are difficult to evaluate accurately through auditory perception alone (even with the correlations between jitter in vowel /a/ and roughness and pitch); and 3) it is important to assess a phrase in combination with a sustained vowel. Thus, the subjective analysis should be complemented with a more objective and personalized acoustic analysis, directly related to muscle functionality. Lastly, acoustic analysis as a method for detecting voice impairments not only lacks standardized methodologies, protocols for collecting voice samples, and approaches and algorithms for extracting sound features, but its conclusions are also frequently drawn from diverse populations.
From a physiological perspective, especially in the context of voice assessment, it is crucial to consider that not all languages share the same phonemes and that, even within a single language, phonemes can vary with factors such as regional differences [15]. Furthermore, reproducing phonemes not present in one's native language poses challenges, as it requires an unfamiliar positioning of the organs responsible for producing speech, such as the lips, oral cavity, tongue, teeth, palate, pharynx, and nasal cavity, and thus results in different instrumental-based voice measures. This work highlights acoustic analysis in a Portuguese population, which speaks a Latin-derived language.
Limitations
The most impactful constraint was the limited sample size, which made it difficult to assess the generalizability of the findings. Specifically, it prevented us from establishing correlations while controlling for confounding factors, including age, gender, and symptom duration. Moreover, because the evaluation hinges on the perceptual assessment and experience of the speech-language therapist, it would have been beneficial to have the voice recordings evaluated by multiple specialists to minimize potential errors. Furthermore, restricting the objective analyses to phrase C and vowel /a/ may also limit the relationship between this approach and the subjective assessment. Owing to the cross-sectional nature of the study, causal relationships could not be determined. Performing a longitudinal voice sound analysis, making clinical correlations with phrenic nerve motor amplitudes and cervical muscle strength, and adjusting the results for other motor neuron diseases are future perspectives that could also help elucidate the results of this work.