Detecting Effect of Levodopa in Parkinson’s Disease Patients Using Sustained Phonemes

Background: Parkinson’s disease (PD) is a multi-symptom neurodegenerative disease generally managed with medications, of which levodopa is the most effective. Determining the dosage of levodopa requires regular meetings where motor function can be observed. Speech impairment is an early symptom in PD and has been proposed for early detection and monitoring of the disease. However, findings from previous research on the effect of levodopa on speech have not shown a consistent picture. Method: This study has investigated the effect of medication on PD patients for three sustained phonemes; /a/, /o/, and /m/, which were recorded from 24 PD patients during medication off and on stages, and from 22 healthy participants. The differences were statistically investigated, and the features were classified using Support Vector Machine (SVM). Results: The results show that medication has a significant effect on the change of time and amplitude perturbation (jitter and shimmer) and harmonics of /m/, which was the most sensitive individual phoneme to the levodopa response. /m/ and /o/ performed at a comparable level in discriminating PD-off from control recordings. However, SVM classifications based on the combined use of the three phonemes /a/, /o/, and /m/ showed the best classifications, both for medication effect and for separating PD from control voice. The SVM classification for PD-off versus PD-on achieved an AUC of 0.81. Conclusion: Studies of phonation by computerized voice analysis in PD should employ recordings of multiple phonemes. Our findings are potentially relevant in research to identify early parkinsonian dysarthria, and to tele-monitoring of the levodopa response in patients with established PD.


I. INTRODUCTION
PARKINSON'S disease (PD) is the second most common neurodegenerative disorder [1]. With aging populations, its prevalence is expected to increase. The motor deficits of PD are caused by degeneration of the dopamine-producing (dopaminergic) neurons in the substantia nigra region of the brain. Pathological changes are present in other neuronal populations as well, explaining the development of various non-motor impairments in PD [2]. Most diagnoses are based on clinical detection of motor signs-the presence of two or more of tremor, rigidity, bradykinesia, or postural impairment [3]. Confirmatory evidence can be provided by dopamine transporter scanning, though this test is not widely available across the world. Other medical imaging modalities lack sensitivity. There is a need for biomarkers that can, with high reliability, recognize PD before overt motor signs appear.
The Movement Disorder Society Unified Parkinson's Disease Rating Scale Part III (MDS-UPDRS-III) [4] is the standard tool for objective measurement of parkinsonian motor disability. However, scoring requires clinical observations and has the potential limitations of subjectivity, clinician bias, and inter-rater variability [5]. Consequently, there is some loss of sensitivity for early stage diagnostics, for monitoring disease progression, and for assessing the effectiveness of medication or other therapies [6]. The requirement for regular clinical visits can be burdensome in some circumstances, and there is a need for an objective measure of PD symptoms VOLUME 9, 2021 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ that is suitable for telehealth applications. The use of gait analysis [7] and handwriting have been proposed [8], but these require specialized equipment.
One of the early symptoms of PD is change in voice, which can precede other motor features [9]. Voice testing has been proposed for early diagnosis of the disease, or to monitor its progression [10]. Human speech is an overtrained and habitual response that requires fine-motor control, cognitive abilities, auditory feedback, and muscle strength [11]. Parkinsonian dysarthria can be characterized by reduced vocal tract loudness, reduced speech prosody, imprecise articulation, significantly narrower pitch range, longer pauses, vocal tremor, breathy vocal quality, harsh voice quality, and disfluency [12].
Studies on the voice or speech parameters can be divided into four groups based on the analyzed aspect: phonatory, articulatory, prosodic, and linguistic [13]. The study of articulatory, prosodic, and linguistic aspects [14] involves more complex and broad factors such as the psychology, linguistics, and cognitive conditions of patients, and this makes it difficult to diagnosis. On the other hand, phonatory aspects of voice are less obscured by these conditions. Phonation relates to the glottal source and resonant structures of the vocal tract and has greater potential for reliable diagnose of PD.
Numbers of studies have investigated the voice parameters obtained from sustained phonemes to determine the differences between PD patients and healthy participants [15]- [19]. Behroozi and Sami [20] introduced a multi-classifier framework to separate PD patients from healthy controls. The use of deep learning has also been applied for the classification of voice recordings [21]. Vaiciuknas et al. [12] investigated the strategy for PD screening from sustained phoneme parameters and text-dependent speech modalities. Voice analysis has also been proposed for estimating the severity of the disease. Tsanas et al. [22] investigated the relationship between speech signal parameters and motor disability score of PD patients. Perez et al. [23] developed an automatic feature extraction to diagnose PD and to track its progression. Khan et al. [24] evaluated PD severity based on vocal function assessment from audio recordings.
The voice features that have shown a significant difference between the voice of healthy and PD patients are pitch frequency, jitter, shimmer, and harmonics to noise ratio [9]. However, these parameters can also be affected by other factors such as age, gender, and ethnicity, and this can result in poor reliability. The pitch frequency, f 0 , is the fundamental frequency of the vocal cords when producing a sound or phoneme and varies with sex, age, and health conditions. Jitter is the perturbation of the glottal vibration period which is affected by the diminished motor control, rigidity, and tremor of the larynx. Shimmer, the amplitude perturbation, is related to the glottal resistance and increases due to lack of control of the voice box and the breathing muscles. Harmonics to noise ratios (HNR and NHR) are the ratios between the periodic (voiced) and non-periodic (noise) component of the speech. These indicate the relative harmonic strength which is reduced with diminished glottal vibration. Low HNR is an indicator of the existence of dysarthria. Studies have reported the use of other features such as the fractal dimension (FD) [25] linear predictive model (LPM) [13], multivariate deep features [21], and entropy [25]. Many of these studies have shown these features are very effective in differentiating between the voices of PD and healthy people.
The motor symptoms of PD are managed by dopaminergic pharmacological treatments, of which levodopa is the most effective and widely used. Most patients improve on levodopa, though one weakness of the drug is the tendency for an unstable, fluctuating response to develop after a number of years [26]. Careful balancing of levodopa dosage and addition of other agents are often required to counteract these motor fluctuations. This is often a trial-and-error process, which may require a patient to undertake multiple visits to their neurologist [27]. Computerized analysis of speech could be useful for remotely monitoring the medication effects in PD patients. However, the results of studies that have evaluated the effect of medication on speech and voice parameters in PD are inconsistent, even contradictory [28]. The study by Rusz et al. [9] showed that levodopa might improve the consonant articulation in the early stages of PD. Elfmarkova et al. [29] found that levodopa only partially improves speech prosody in some patients. Contradicting these are the findings of Tykalova et al. [30] who reported an increase in dysfluent speech after 3 -6 years of dopaminergic treatment compared to a drug-naive condition. Cusnie-Sparrow [31] found that the magnitude of the levodopa response may increase with increasing severity of the voice quality symptoms. Skodda et al. [32] studied the short and long-term dopaminergic effects on dysarthria in early Parkinson's disease. They found that none of the parameters of phonation, intonation, articulation, and speech velocity improved significantly in the ''on'' state. Ho et al. [33] reported that there was no reliable or meaningful improvement in speech with the use of levodopa. However, the study did not use any of the above phonatory parameters. Instead, they used average intensity, intensity decay, and duration of speech within one breath envelope.
Studies that have primarily focused on phonation in PD do not show a consensus about levodopa effect. Sanabria et al. [34] found some decrease in jitter, fundamental frequency and harmonic-to-noise ratio in response to levodopa medication. However, they found no significant differences in shimmer. Goberman et al. [35] did not find any significant differences between PD and healthy groups in the fundamental frequency variability of prolonged vowels. They also found that the group differences between PD patients before and after medication was small. The study of Fabbri et al. [36] found no significant effect of levodopa on speech or voice. De Letter et al. [37] studied the changes of phonatory speech characteristics across a levodopa dose cycle. They investigated several respiratory, articulatory, prosodic measures, as well as phonatory features such as pitch, jitter, shimmer, harmonics to noise ratio. They found that the majority of speech acoustic parameters do not vary significantly with levodopa. Notably, only a single sustained phonation, either /a/ or /i/ was used in the studies of Rusz et al. [9], Santos et al. [38], and De Letter et al. [37].
The aim of this study was to investigate the use of phonatory parameters to classify PD patients before and after levodopa medication. We examined the change in phonatory parameters by using a statistical hypothesis test and the Support Vector Machine (SVM) algorithm to separate the two PD medication states and the control groups. Three different sustained phonemes were considered: /a/, /o/ and /m/. These phonemes were selected to examine a range of voice production [11]. The vowel /a/, as in ''car'', is an openback or low vowel, which is produced while the jaw is widely open, with the tongue is inactive and positioned low in the mouth. The vibration of the vocal folds dominates the sound of the vowel. The vowel /o/, as in ''oh'', is a closed-mid-back vowel, in which the back of the tongue is positioned midhigh towards the palate, and the lips are at a rounded position. The phoneme /m/ is a voiced nasal phoneme which is produced by the vibration of the vocal folds with the air flowing through the nasal cavity. Although all three phonemes require control of the respiratory and laryngeal vocal fold muscles, there are considerable differences in patterns of activation of the rostral muscles of articulation (of pharynx, tongue, jaw and lips). Observations on a selection of voice parameters will reveal the effect of PD and its medication on each of these phonemes. Besides the statistical analysis, the machine learning approach was used to investigate the possible nonlinear separation between the two classes.

A. PARTICIPANTS
Twenty-four PD patients (13 males and 11 females) were recruited from the Movement Disorders Clinic at Monash Medical Centre. All had been diagnosed within the last ten years and complied with the Queen Square Brain Bank criteria for idiopathic PD [39]. The presence of any advanced PD clinical symptoms-visual hallucinations, frequent falling, cognitive disability, or need for institutional care-was an exclusion criterion [40]. Twenty-two healthy participants (12 males and 10 females) were recruited from several retirement centers.
PD participants were first assessed in a practically defined off state (PD-off) (fasting, with anti-parkinsonian medication withheld for at least 12 hours). They were retested in the on state (PD-on), taken to be the maximum improvement 30 -90 minutes after a subject's usual morning levodopa dose. Motor function in off and on states was scored by a neurologist on the MDS-UPDRS-III [4]. Cognitive function was scored on the Montreal Cognitive Assessment scale [41]. The mean levodopa equivalent daily dose, calculated using standard conversion factors, was 480 ± 296 mg/day [42]. Table 1 presents participants' demographic and clinical information. The study protocol was approved by the ethics committee of Monash Health, Melbourne, Australia (LNR/16/MonH/319) and RMIT University Human Research Ethics Committee, Melbourne, Australia (BSEHAPP22-15KUMAR). Before the experiments, written informed consent was obtained from all the participants.

B. VOICE RECORDING
Three sustained phonemes /a/, /o/, and /m/ were recorded from each participant. The participants were instructed to pronounce the vowel for as long as it was comfortable, with their natural pitch and loudness.
The phonemes were recorded using Samson-SE50, an omnidirectional head-worn microphone. The recordings were saved into a single-channel uncompressed WAV format with a sampling rate of 48 kHz and a 16-bit resolution. Each recording contained one single sustained phoneme of 5.1 to 38.6 seconds duration as shown in Table 2. There were 60 seconds of relaxation time between each recording. The recording was performed in a noise-restricted room. The deidentified data is available on RMIT website and has been reported earlier [25].

MATLAB2018b (MathWorks) was used for all analyses.
Each recording was manually segmented to eliminate any unwanted sections such as silent pieces and the voice of the instructor. Based on the assumption that vowels correspond to largely stationary signals, and the need for more samples for the purpose of cross-validation, each recording was divided into ten segments of 0.5 seconds each, and the jitter, shimmer, pitch, and harmonics parameters of each segment was calculated. Each segment was considered as an individual example.
The features of each segment were calculated using Praat [43], a publicly available software for analyzing, synthesizing, and manipulating speech. The first step for feature extraction was to locate the time instances (t i ) of the pulses VOLUME 9, 2021 in the recording that represent the glottal vibration. The instantaneous period of the glottal wave (T i ) was calculated as the difference between subsequent instances of the pulses, Four jitter parameters were extracted from the recordings: jitter absolute (abs), jitter relative (rel), relative average perturbation (rap), and period perturbation quotient-5 (ppq5). The rap and ppq5 are the perturbation of the difference between T i and the moving average of T i with a window size of 3 and 5, respectively. The equation to calculate the four jitter parameters [44] are shown in equations 1 to 4: Five shimmer parameters extracted from the segments are the absolute shimmer (in dB), the relative shimmer, apq3, apq5, and apq11 measured in percentage. The apq3, apq5, and apq11 are the perturbation of the difference between A i and the moving average of A i with a window size of 3, 5, and 11, respectively. The parameter calculations are described in equations 5 to 9.
The pitch parameters are the mean, median, standard deviation, maximum, and minimum of the instantaneous pitch frequency f 0i = 1/T i . The HNR and NHR were calculated based on the normalized autocorrelation function of the segment. R xx [T 0 ] is the peak next to the centre of R xx at a distance corresponding to the T 0 of the recording. The HNR and NHR were calculated as described in equations 10 and 11 [45], [46]: The scatter plots on Fig. 1 illustrate the distribution of the features for the different classes. The figure indicates that there is a high level of overlap between classes.

D. STATISTICAL ANALYSIS
All the statistical analyses were performed using MAT-LAB2018b (MathWorks). The normality of the extracted parameters was examined with the Anderson-Darling test [47]. Statistical non-parametric Wilcoxon signed-rank test [48] was then applied to compare voice parameters between PD-on and PD-off to determine the effect of medication on the patient. Mann Whitney U-test [48] was used to compare the group differences for voice parameters between CO and PD-off, and CO and PD-on. The p-values for age and MoCA scores were calculated using independent sample t-tests, while paired t-testing was used to compare PD-off and PD-on MDS-UPDRS-III scores.
The 95% confidence level was considered for the analysis and p − value < 0.05 indicated that the mean of the groups was significantly different.

E. MACHINE LEARNING BASED CLASSIFICATION
Support Vector Machines (SVM) [49] classifier with Gaussian kernel and fifth-order cross-validation model was used in this study. The Gaussian kernel was selected since it yielded the best result compared to the other kernels. It was trained to model the hyperplane that can separate the groups using the input as the extracted voice features. Seven SVMs were created to classify the groups. The input to the SVMs were the voice features described in Section II (c) with the exception of mean, median, max, and min of pitch features because of the known gender-based difference-the dataset was not suitable for testing based on the gender divide. The input to the first three SVMs was the parameters of phoneme /a/, /o/, and /m/, respectively. The input to the other three SVMs was the combination of two phonemes, /a/+/o/, /a/+/m/, and /o/+/m/. The parameters of all three phonemes were given as the input to the seventh SVM.
For each SVM training, only 80% of the training sets (randomly picked) were used, while the other 20% were used for testing (5 th order validation). The above has now been inserted in the revised manuscript.
The   false-negative (FN). The Receiver Operating Characteristic (ROC) curve was generated, and the Area Under Curve (AUC) was calculated for each SVM model.

A. STATISTICAL ANALYSIS
Anderson-Darling test confirmed that the voice parameters for the three groups and the three phonemes were not normally distributed and thus unsuitable for parametric test. Mann Whitney U test was used to test for group differences in each of the features. Wilcoxon signed-rank test was used to test the differences between dependent data of PD-on and PD-off. Table 3 presents the result of the Wilcoxon signed-rank test between PD-on and PD-off, and this identifies the features that are significantly changed by medication. It shows that p-value was less than 0.05 for most of the parameters of phoneme /m/. The jitter and harmonics parameters of phoneme /o/, as well as the pitch of phoneme /m/ were also changed due to medication.

2) MANN WHITNEY U-TEST BETWEEN CO AND PD-OFF
The Mann Whitney U-test results for group differences between CO and PD-off are demonstrated in Table 4. The   table shows that there was a significant group difference between the majority of the voice features of all the phonemes except the shimmer of phoneme /a/. Table 5 gives the p-values for group differences between CO and PD-on. The results show that jitter, shimmer, and harmonics parameters of phoneme /o/ of the two groups were well separated. The jitter of phoneme /a/ and /m/ were effective to differentiate CO and PD-on.

4) SUMMARY OF THE STATISTICAL ANALYSIS
Comparison of the results on the above statistical analysis, confirm that the majority of the parameters of the phoneme /o/ were significantly different between the healthy subjects (CO) and PD patients; both, PD-on and PD-off. Phoneme /m/ was effective to identify the change due to medication as well as differentiating between CO and PD-off. It is also seen that harmonics parameters of /o/ and /m/ were statistically different between PD-off and PD-on.

B. CLASSIFICATION ANALYSIS
The results of classification by SVM of the voice features of each of the three phonemes are shown in Table 6. It is seen that the best classification result between the three groups were obtained with the combination of the phonemes. The best AUC for the classification between PD-off and PD-on was 0.81. Classification between CO and PD-off shows that the best result was with the combination of the three phonemes, with AUC = 0.90. The best classification result between CO and PD-on was with the combination of phonemes /a/ and /m/ with AUC = 0.86.

IV. DISCUSSION
The statistical analysis in Table 3 shows that /m/ was the best performing individual phoneme in differentiating PDoff from PD-on. Time and amplitude perturbations (jitter and shimmer) decrease with medication, suggesting that levodopa improves voice quality. In differentiating control from PDoff recordings, in effect detecting parkinsonian dysarthria, all /a/, /o/ and /m/ achieved comparably good levels of statistical significance (Table 4 ) with the exception of shimmer of /a/. Table 5 shows a drop-off in significance of /m/ after medication, reflecting the shift of PD-on towards control values because of the better levodopa response for this phoneme. Table 6 presents the SVM classifications with a Gaussian kernel. SVM classifications reveal that the best results were obtained by combining the features for all three phonemes used in this study: /a/, /o/ and /m/. This can be seen for each of the comparisons, with the combined phonemes achieving AUCs between 0.81 and 0.90. SVM classification to separate control and PD-on with the phonemes /a/ and /m/ was slightly better than that of the three phonemes. Table 6 also shows that the ranking of the individual phonemes to separate control from PD values was consistent with those derived from Mann Whitney U testing in Tables IV and V, with the exception of the classification between control and PD-on. The explanation for these divergences could be that there are both linear and non-linear effects of the parkinsonian state on voice. The SVM used a Gaussian kernel and thus performed nonlinear separation, in contrast to the linear statistical analysis of Tables 3-5.
The voice features that identify parkinsonian dysarthria are not exactly congruent with those that recognize the levodopa response in PD. This is relevant to the two different types of task for which computerized voice techniques might be used in research and clinical practice. One is the early detection of motoric evidence of PD in individuals at risk of developing the disorder. The other is the monitoring of treatment effects in established PD, either in the clinic or in drug trials. Nev-ertheless, for both of these purposes, we have demonstrated that combined analysis of a set of phonemes comprising /a/, /o/ and /m/ should give a satisfactory level of sensitivity.
Cusnie-Sparrow [31] found a significant change in percent and absolute shimmer when comparing control and PD patients. They reported that jitter and shimmer of the sustained phoneme /a/ demonstrated moderate correlations with perceived voice quality and showed sensitivity to medication. We suspect that some earlier studies that used only /a/ or /i/ to assess levodopa responsiveness may have performed better if other phonemes had been considered [34], [49], [50].
In our study, the features most significantly changed by medication were the time, amplitude, and harmonic perturbation of /m/. Of the three phonemes examined, /o/ probably requires the greatest aggregate control of muscles of articulation-precise positioning of the tongue at mid-height, with a rounded formation of the lips [51]. Some tongue control is required for /a/, with the lips open. For /m/, the lips are simply closed, tongue position is of little consequence, and air is passed through the nasal cavity. The relatively good response of /m/ to levodopa could imply that fine control of anterior articulatory muscles (of tongue, lips, and jaw) shows a degree of resistance to medication. There are many examples of uneven levodopa responsiveness in other aspects of parkinsonism. Gait freezing and postural instability can be refractory to drug treatment [52]. Levodopa improves speed and amplitude of finger tapping, but motor decrement shows little benefit [53]. Basic motor deficits of tremor and bradykinesia are often differentially affected by dopaminergic medication. Further research is needed to understand better the selective character of levodopa's actions on voice production in PD. We recruited a group of patients at relatively early in their PD course (mean duration 5.3 years). Their MDS-UPDRS-III motor disability scores and degree of levodopa responsiveness were in keeping with this stage of the disease [54].
There are two novelties of this study. The first is that it has found that medication has a significant effect on the change of time and amplitude perturbation (jitter and shimmer) and harmonic to noise ratio of the phoneme /m/ which is confirmed also by SVM classification. Thus, /m/ can be used to differentiate between PD-on and PD-off. The second novelty is that this study confirms that the three groups can be differentiated best when the three phonemes, /a/, /o/ and /m/, are used.
The limitation of this study is that because of the relatively small number of participants, it was not possible to differentiate between genders. Further, the differences such as accents, demographics, and language skills have not been investigated. Another shortcoming of this study is that each participant was only studied once and hence the repeatability has not been checked. Recruitment of controls was subject to some conditions from Research Ethics approval to advertising, and the mean control group age is about 5 years younger than the PD patients.

V. CONCLUSION
This study has investigated the effect of levodopa medication on the voice of PD patients based on utterance of three phonemes, /a/, /o/ and /m/. It has found that medication has a significant effect on the change of time and amplitude perturbation (jitter, shimmer and harmonic to noise ratio) of the phoneme /m/. But the highest accuracy in differentiating PD-on and PD-off, using SVM, was when all three phonemes were used. While /o/ showed the greatest differences between PD and controls, the best classifications when using SVM were again obtained from combined analysis of all three phonemes. Whether attempting to separate parkinsonian dysarthria from control voice, or to detect the levodopa effect on voice in PD, this study shows that computerized analysis of multiple phonemes should be employed. Our findings are potentially relevant in research to identify early parkinsonian dysarthria, and for tele-monitoring of the levodopa response in patients with established PD.