Effects of Intensive Voice Treatment (the Lee Silverman Voice Treatment [LSVT]) on Vowel Articulation in Dysarthric Individuals With Idiopathic Parkinson Disease: Acoustic and Perceptual Findings

Purpose: To evaluate the effects of intensive voice treatment targeting vocal loudness (the Lee Silverman Voice Treatment [LSVT]) on vowel articulation in dysarthric individuals with idiopathic Parkinson’s disease (PD). Method: A group of individuals with PD receiving LSVT (n = 14) was compared with a group of individuals with PD not receiving LSVT (n = 15) and a group of age-matched healthy individuals (n = 14) on the following variables: vocal sound pressure level (VocSPL); various measures of the first (F1) and second (F2) formants of the vowels /i/, /u/, and /a/; vowel triangle area; and perceptual vowel ratings. The vowels were extracted from the words key, stew, and Bobby embedded in phrases. Perceptual vowel ratings were performed by trained raters using a visual analog scale. Results: Only VocSPL, F2 of the vowel /u/ (F2u), and the ratio F2i/F2u differed significantly between patients and healthy individuals pretreatment. These variables, along with perceptual vowel ratings, changed significantly (improved) only in the group receiving LSVT. Conclusion: These results, together with previous findings, lend further support to the generalized therapeutic impact of intensive voice treatment on orofacial functions (speech, swallowing, facial expression) and on respiratory and laryngeal functions in individuals with PD.

The voice and speech deficits of PD, collectively termed hypokinetic dysarthria (Darley, Aronson, & Brown, 1975), are often accompanied by orofacial abnormalities such as disturbed swallowing, reduced facial expression, and tremor (Perez, Ramig, Smith, & Dromey, 1996; Sharkawi et al., 2002; Spielman, Borod, & Ramig, 2003). Pharmacological, surgical, and traditional speech therapy methods for PD have yielded disappointing results, both in the magnitude of therapeutic outcome and in its long-term maintenance (cf. Goberman & Coelho, 2002; Sapir, Ramig, & Fox, 2006, 2007; Trail et al., 2005). In contrast, the Lee Silverman Voice Treatment (LSVT) has been shown to produce marked and long-lasting improvement in voice and speech functions in individuals with PD (Farley, Fox, Ramig, & McFarland, in press; Trail et al., 2005). In fact, it is the only speech treatment for PD that has been tested in randomized controlled trials (Ramig, Sapir, Countryman, et al., 2001; Ramig, Sapir, Fox, & Countryman, 2001) and that has Level 1 evidence for short- and long-term treatment outcomes (Fox et al., 2006; Suchowersky et al., 2006). The LSVT is an intensive, 1-month speech therapy regimen that trains dysarthric individuals with PD to speak in a louder voice while self-monitoring the effort it takes to produce that voice (Ramig et al., 1988).
The LSVT capitalizes on the known effects of increased vocal effort and loudness on the respiratory, phonatory, and articulatory subsystems of speech (Baker, Ramig, Sapir, Luschei, & Smith, 2001; Schulman, 1989). Vocal loudness is largely a perceptual correlate of the acoustic power radiated from the speaker's mouth. This acoustic power is the product of the glottal power source and the vocal tract power gain (Titze, 2004; Titze & Sundberg, 1992). The glottal power source is regulated largely by respiratory and laryngeal adjustments and by the interaction between aerodynamic forces and the viscoelastic forces of the vocal folds (Baker et al., 2001; Finnegan, Luschei, & Hoffman, 2000; Stathopoulos & Sapienza, 1993; Titze, 2004). The amplification of particular frequency bands by the vocal tract is largely determined by the three-dimensional configuration of the vocal tract, which is regulated by the orofacial and extrinsic laryngeal muscles (Erickson, 2002; Honda & Kusakawa, 1997; Maeda & Honda, 1994; Perkell, Matthies, Svirsky, & Jordan, 1993; Riordan, 1977; Sapir, 1989).
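Expressed in decibels, the product relation above becomes a sum. The following is a simplified restatement using our own symbols, not a full model of vocal intensity. P_rad is the radiated power, P_g the glottal source power, G_vt the vocal tract power gain expressed as a dimensionless ratio, and P_0 a reference power:

```latex
P_{\mathrm{rad}} = P_{g}\,G_{\mathrm{vt}}
\quad\Longrightarrow\quad
10\log_{10}\frac{P_{\mathrm{rad}}}{P_0}
  = 10\log_{10}\frac{P_{g}}{P_0} + 10\log_{10} G_{\mathrm{vt}}
```

In this idealization, doubling the vocal tract gain (10 log10 2, about 3 dB) raises the radiated level by about 3 dB regardless of the source level. Articulatory adjustments that change G_vt can therefore alter output level independently of the glottal source.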
This study deals primarily with the effects of LSVT on speech articulation in individuals with PD. It also deals with trained (rather than stimulated) loud speech. Stimulation refers to situations in which the speaker is asked or instructed to perform a task, such as speaking in a loud voice (e.g., "say that twice as loud"), within a single session. That is, stimulation induces a transient behavior in response to an external cue. Training refers to a systematic and intensive program designed to change a behavior (e.g., in the case of LSVT/LOUD, sixteen 60-min sessions of individual therapy in 1 month). After training, the speaker cues himself or herself internally, no longer depends on external cues, and sustains the change over time (i.e., over months or years). Thus, training involves learning, memory, and reliance on internal sources (self-cueing, self-regulation) to maintain the acquired behavior.
Physiologic evidence suggests that the three subsystems of speech (respiration, phonation, and articulation) are involved in vocal loudness regulation in a highly orchestrated, integrated, quasi-automated manner (McClean & Tasko, 2002, 2003; Wohlert & Hammen, 2000). The central mechanism underlying this regulatory system is not clear. It possibly involves a phylogenetically old neural network that regulates emotive vocalization and that is subordinated to higher neural networks involved in linguistically driven motor control (Sapir et al., 2006). Vocal loudness might also be regulated, or at least influenced, by a biomechanical and neurophysiologic linkage between the articulatory and phonatory systems. Specifically, articulatory positions or movements have been shown to influence laryngeal muscle activity, vocal fold closure, laryngeal tension, transglottal airflow, maximum airflow declination rate, and air pressure. Some of these adjustments also correlate strongly with vocal sound pressure level (VocSPL) or vocal loudness level (Cookman & Verdolini, 1999; Higgins, Netsell, & Schulte, 1998; Larson & Sapir, 1995; McClean & Tasko, 2002; Sapir, 1989). Whether these articulatory influences on laryngeal function are intentional is not clear (Higgins et al., 1998). At least in singers, articulatory adjustments appear to be used intentionally to influence laryngeal function, for example, in the control of vibrato, vocal loudness, and pitch, as argued elsewhere (Sapir, 1989; Sapir & Larson, 1993).
The specific articulatory adjustments associated with the regulation of vocal loudness in healthy speakers are only partially understood. Using kinematic and acoustic measures, Schulman (1989) compared stimulated loud speech with speech at normal loudness and found that loud speech was characterized by an amplification of normal articulatory movement patterns. Tasko and McClean (2004) found that articulatory displacement increased from low to habitual to high loudness levels in speech tasks performed by healthy speakers. However, there was a complex interaction among speech loudness level; rate of speech; speech task (a nonsense phrase, a sentence, an oral reading passage, and a spontaneous analog); and kinematic measures of lower lip, upper lip, tongue blade, and jaw movements. Some of the changes in articulatory movements with loudness, rate, and type of speech were specific to a particular articulator. Dromey and Ramig (1998) studied the effects of speech rate and vocal loudness in healthy adult speakers. Their kinematic measurements showed that increased VocSPL was associated with increased articulatory displacements and peak velocities of the upper and lower lips. Using magnetic resonance imaging of laryngeal and oropharyngeal movements, Di Girolamo and colleagues (1996) studied vowel production with loud phonation. They found that the vocal tract was enlarged at the oral cavity for the vowels [a] and [u] and at the pharynx for the vowel [i]. Using acoustic and kinematic measurements, Erickson (2002) found that an emphasized low vowel /a/ in a syllable was produced with greater jaw movement (mouth opening) and a lower tongue dorsum position than the same vowel in an unemphasized syllable. She also found that the emphasized /i/ was produced with greater jaw movement (mouth opening) and a more anterior tongue dorsum position than the unemphasized /i/. The acoustic correlates of such emphasis were an increase in F1 and a decrease in F2 for the vowel /a/ and a decrease in F1 and an increase in F2 for the vowel /i/. Erickson interpreted these findings to suggest that jaw movement in emphasized vowels is directly related to the regulation of vocal loudness. Tongue movement, in contrast, is mostly related to the phonetic gesture and its extreme expression in the emphasized condition.
Several studies have examined the effects of loud voice stimulation or loud speech training on articulatory function in dysarthric speakers. Tjaden and Wilding (2004) compared dysarthric speakers (with PD or multiple sclerosis) with healthy speakers during loud speech and speech at habitual loudness. They found that increased loudness was associated with greater acoustic distinctiveness of stop consonants, as reflected in the dynamics of formant transitions. Scaled intelligibility for the dysarthric speakers with PD also improved in the stimulated loud condition. Dromey, Ramig, and Johnson (1995) studied an individual with PD treated with LSVT. They found that increased VocSPL following therapy was accompanied by improved laryngeal function and speech articulation, the latter reflected in an increase in the duration and extent of F2 transitions. Using acoustic and perceptual measurements, Sapir et al. (2003) studied the impact of LSVT on the speech of a woman with a 5-year history of ataxic dysarthria secondary to cerebellar dysfunction. LSVT resulted in marked improvement in VocSPL and in articulatory function. This improvement was reflected in a significant pre- to posttreatment expansion of vowel triangle area and in improved perceptual ratings of speech quality and intelligibility. Two perceptual studies (Ramig, Countryman, Thompson, & Horii, 1995; Sapir et al., 2002) used larger samples of individuals with PD treated with LSVT and demonstrated significant improvement in speech quality and intelligibility. Several studies have also demonstrated LSVT-induced improvement in nonspeech orofacial functions, such as tongue pressure and motility (Ward, Theodoros, Murdoch, & Silburn, 2000), the oropharyngeal phase of swallowing (Sharkawi et al., 2002), and facial expression (Spielman et al., 2003).
The foregoing evidence suggests that loud phonation, whether stimulated or trained, has marked effects on speech articulation in both normal and dysarthric speakers. However, some of this evidence is based on perceptual assessment of speech quality and intelligibility. Such assessment can be strongly influenced by VocSPL, voice quality, and prosodic pitch inflection (De Bodt, Hernandez-Diaz, & Van De Heyning, 2002; Ramig, 1992; Watson & Hughes, 2006). LSVT has been shown to improve VocSPL, voice quality, and prosodic pitch inflection (Baumgartner, Sapir, & Ramig, 2001; Ramig, Sapir, Countryman, et al., 2001). It is therefore important to determine whether some of the improvement in speech clarity and intelligibility is related to changes in articulatory movements. The purpose of the present study was to address this question by means of articulatory acoustics and perceptual vowel ratings. Also, most of the acoustic and physiologic data on the effects of LSVT on speech articulation in dysarthric speakers have come from case studies. We therefore elected to study these effects in a group of individuals with PD treated with LSVT and to compare the results with two control groups: dysarthric individuals with PD not receiving LSVT and individuals with normal speech.
In the present study, we were specifically interested in changes in the F1 and F2 of vowels. These formants and their various combinations (e.g., F2-F1, F2/F1) have been shown to play a major role in the perception of vowels and consonants (Fant, 1960). They are also lawfully related to articulatory movements, to the three-dimensional configuration of the vocal tract, and to the activation of specific articulatory muscles of the tongue and lips (Erickson, 2002; Fant, 1960; Honda & Kusakawa, 1997; Maeda & Honda, 1994; Perkell et al., 1993). For example, as a rule, F1 varies inversely with tongue height, both F2 and the difference F2-F1 increase with tongue advancement, and both F1 and F2 decrease with lip rounding (cf. Kent, Weismer, Kent, Vorperian, & Duffy, 1999). We also assessed LSVT-induced changes in articulatory function using the ratio F2i/F2u, for two reasons. First, F2 is a major acoustic cue for both vowels and consonants, especially for place and manner of articulation (Most, Amir, & Tobin, 2000). Second, the range of F2 center frequencies spanned by the English vowels /i/ and /u/ is relatively large, from about 1000 Hz for /u/ to about 2500 Hz for /i/ (Hillenbrand, Getty, Clark, & Wheeler, 1995). The ratio should therefore serve as a sensitive index of changes in the extent of articulatory movements and positions. Additionally, we studied LSVT-induced changes in vowel triangle area (VTA). In dysarthric speech, vowels tend to centralize because of the limited movements of the speech articulators. This centralization should be reflected in a smaller-than-normal VTA (Ziegler & von Cramon, 1983). Treatment-induced improvement in articulatory function should, therefore, be reflected in an enlargement of the VTA relative to pretreatment (Kent & Kim, 2003; Weismer, Jeng, Laures, Kent, & Kent, 2001).
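The two indices can be written out from the mean (F1, F2) coordinates of the corner vowels. The shoelace form below gives the same area as computing it from the three Euclidean side lengths. The second line is an idealized illustration only. It assumes that centralization is a uniform contraction by a factor k (0 < k < 1) toward the centroid c = (c_1, c_2) of the three vowels, which real dysarthric speech need not follow:

```latex
\mathrm{VTA} = \tfrac{1}{2}\left| F1_i\,(F2_u - F2_a) + F1_u\,(F2_a - F2_i) + F1_a\,(F2_i - F2_u) \right|

F_v' = c + k\,(F_v - c),\ v \in \{i,u,a\}
\;\Longrightarrow\;
\mathrm{VTA}' = k^2\,\mathrm{VTA},
\qquad
\frac{F2_i'}{F2_u'} = \frac{c_2 + k\,(F2_i - c_2)}{c_2 + k\,(F2_u - c_2)}
```

Provided F2_u < c_2 < F2_i, the ratio falls monotonically toward 1 as k decreases. Under this idealization, both indices shrink with centralization, and the area shrinks quadratically.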
Acoustic changes secondary to LSVT can provide important information about articulatory movements. These changes might be subtle, however, and may not necessarily reflect significant perceptual changes in speech. Thus, to aid interpretation of the acoustic data, we included a perceptual study comparing vowels produced in sentences before and after therapy. We refer to this as a measure of "vowel goodness." Strictly, vowel goodness refers to how well an uttered vowel is judged as an exceptionally good instance, or best exemplar, of an intended vowel (Iverson & Kuhl, 1995). We assumed that, because of poor articulation and the centralization of vowel formants, vowel quality or identity would deteriorate in dysarthric speech. Conversely, improvement in vowel articulation with treatment should result in higher perceived vowel goodness. As shown by Liu, Tsao, and Kuhl (2005), ratings of vowel goodness can provide strong indications of changes in articulatory function, especially when accompanied by acoustic (vowel formant) data.

Method
Participants
Three groups of participants were included in this study. Two groups comprised participants with idiopathic PD, and the third consisted of neurologically healthy participants with normal speech. All participants were native speakers of American English. The participants with PD had all been diagnosed and medically treated by neurologists with special expertise in Parkinson's disease and other movement disorders. These participants were randomly assigned to either a treatment group (PD-T, n = 14, 50% men, 50% women) or a nontreatment group (PD-NT, n = 15, 53.3% men, 46.7% women). The two groups did not differ significantly in age (PD-T: M = 68 years, SD = 6 years; PD-NT: M = 77.6 years, SD = 8.0 years). Nor did they differ in time since diagnosis (PD-T: M = 9.08 years, SD = 6.97 years; PD-NT: M = 6.29 years, SD = 2.21 years) or in stage of disease (Hoehn & Yahr, 1967; PD-T: M = 1.33, SD = 1.63; PD-NT: M = 1.86, SD = 1.35). They also did not differ in severity of speech and voice disorder prior to treatment, as judged independently by two speech-language pathologists highly experienced with voice and motor speech disorders. Most of the participants with PD in the two groups had voice problems characterized by reduced vocal loudness, hoarseness, and monotone speech. Additional speech characteristics observed in some participants included imprecise articulation and, in 1 individual, palilalia. In most (90%) of the participants with PD, the dysarthria was judged mild to moderate in severity; in 3 (10%) participants, it was judged severe. None of the participants with PD had a history of speech-language treatment prior to participation in this study. Cognitive function was not formally tested. However, all participants were judged by the clinicians to have grossly intact cognition, with a clear ability to follow directions and to perform the tasks and tests adequately.
All of the participants with PD were taking anti-Parkinson medications at the time of data collection and did not change medications during this period. They were all optimally medicated and stable at the time of the study. Efforts were made to collect data from each participant at the same point in his or her medication cycle; however, the timing was not always exact. This was not judged to be a major concern in the present study. Anti-Parkinson medications have been documented to have only limited or no effects on speech and voice in people with PD (cf. Goberman & Coelho, 2002; Sapir et al., 2006). The point in the medication cycle has also been shown to have no significant effect on voice and speech (Larson, Ramig, & Scherer, 1994). Nevertheless, at the beginning of each session the experimenters recorded the time of each participant's last and next medication doses. This record was kept for reference in the event that large or unusual variability was observed in a participant's performance from session to session. No such variability was observed.
The participants with PD were recruited in Tucson, Arizona, through various neurological clinics and PD support groups. The third group consisted of individuals who were neurologically normal (NN, n = 14, 50% men) and age-matched to the participants in the other groups. These participants were free of any known condition that could affect their speech or voice, including neurological disease, speech or voice complaints, or a history of speech-language disorders. This information was obtained from screening questions asked during recruitment and from an initial interview conducted before data collection. These participants were recruited via the PD support groups; most were spouses of participants in the PD-T or PD-NT groups.
All participants underwent a videolaryngoscopic examination, performed by an otolaryngologist prior to participation in the study. This examination confirmed that the participants were free of laryngeal tissue pathology, such as vocal nodules or polyps, and of signs of gastric reflux. Vocal fold bowing (at mid-portion) was found in 9 of 14 (64%) participants in the PD-T group, in 5 of 15 (33%) participants in the PD-NT group, and in none of the participants in the NN group. No other form of glottal incompetence was reported. Laryngeal tremor was observed in 1 participant in the PD-T group and in 1 participant in the PD-NT group. Hearing status was not assessed audiometrically, but all participants were reported to have adequate hearing when not in a noisy environment. The clinicians who tested and assessed the participants' speech and language abilities reported that the participants appeared to have adequate functional hearing, at least for communication in a quiet environment. Formal auditory testing was not done, primarily because the laboratory protocol was already long. Extending the session further risked the wearing off of medication, the worsening of symptoms, and fatigue.
All participants signed an institutional review board consent form stating that they agreed to participate. The participants in the PD-NT group were told that they would be offered treatment with the LSVT 1 month after the end of the study. Participants were not informed of the specific aims of the study. They were told only that its purpose was to assess voice and speech abilities or problems in individuals with and without PD.

Data Collection
Recording times. Data collection for all three groups took place on 3 different days just before treatment and on 2 different days just after the end of treatment. The terms PRE and POST are used to indicate the times of recording. Only the PD-T group received treatment, and the specific recording dates differed across participants. The overall schedule, as described above, was nevertheless the same for all participants.
Speech samples. On each recording day, the participants performed various tasks as part of a larger protocol. One of these tasks was to read aloud three phrases, each repeated three times, for a total of nine phrases per day. The phrases were "The blue spot is on the key," "The potato stew is in the pot," and "Buy Bobby a puppy." The words key, stew, and Bobby in these phrases were later used for the acoustic analyses and perceptual ratings of the vowels /i/, /u/, and /a/, respectively.
Recording methods. For all participants, the acoustic data were collected in a sound-treated booth using a head-mounted condenser microphone (AKG C410) and an 8-channel DAT recorder (Sony PC-208AUC). The microphone was positioned 6 cm from the participant's lips. The gain of the microphone amplification system was adjusted for each participant to ensure optimal recording. The data were then digitized from the DAT tape to a computer at a sampling rate of 22 kHz using GoldWave software.
The VocSPL data were collected both by hand and on the DAT. Hand collection of VocSPL has been done routinely in our laboratory and has been shown to be reliable and valid (Fox & Ramig, 1997). It was performed by a highly trained data collector using the digital display of a Class I Brüel & Kjær 2236 sound level meter (SLM), which was updated at 1-s intervals. As the data collector heard each phrase uttered by the participant, she wrote down each value (dB SPL) displayed on the SLM. The SLM was positioned at a constant distance of 30 cm from the lips, and this distance was checked throughout the recording session. The output of the SLM was simultaneously recorded onto the DAT. Problems with digital sampling were later discovered in the specific channel on which the SLM output was recorded, so the SLM data from the DAT could not be used. Therefore, all of the VocSPL data reported in this study are based on the hand-collected measurements. None of the other DAT channels, including the channel carrying the raw speech signal, showed any problems. Across all recording sessions, data were collected at approximately the same time of day by the same experimenter (Lorraine O. Ramig). The experimenter did not administer treatment and was blind to treatment group assignment. The treatment was administered by another individual (Cynthia Fox), a highly trained speech-language pathologist with extensive experience in voice and motor speech disorders and in the administration of the LSVT. This individual helped with the technical aspects of the recordings. She was not present in the sound-treated room and did not talk with any of the participants during the recording session. She also did not discuss the identity of the participants with the experimenter, nor did she reveal information about treatment assignment. She was occasionally and briefly seen by the participants during the recording session.
Calibration signals were recorded onto the DAT tapes for each participant before the speaking and voice tasks were recorded and after any adjustment of input levels on the DAT recorder. Standard procedures for recording calibration signals (tone generator and sustained phonation) were followed. The distance for the calibration signals and all tasks was 30 cm, and this distance was monitored throughout the recording session. The calibration signal was recorded as part of the general protocol, for use in the event that we wanted to extract VocSPL from the microphone signal. In the present study, however, we elected not to use this method. We relied instead on the SLM readings for VocSPL measurements, as discussed above.
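The study relied on hand-read SLM values. The microphone-based alternative mentioned above can nonetheless be sketched as follows. This is a minimal illustration, not the laboratory's procedure. It assumes a calibration signal whose SPL was read from the SLM and that was recorded at the same gain and distance as the speech, which is why a new calibration is needed after any change in input level. The function names are hypothetical.

```python
import math
from typing import Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square amplitude of a block of audio samples."""
    if len(samples) == 0:
        raise ValueError("cannot compute RMS of an empty signal")
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def spl_from_calibration(segment: Sequence[float],
                         cal_signal: Sequence[float],
                         cal_spl_db: float) -> float:
    """Estimate the SPL (dB) of `segment` relative to a calibration signal.

    Assumes both signals were recorded with the same input gain and the same
    mouth-to-microphone distance, and that `cal_spl_db` is the level the sound
    level meter showed while the calibration signal was being recorded.
    """
    ref = rms(cal_signal)
    level = rms(segment)
    if ref == 0.0 or level == 0.0:
        raise ValueError("signals must not be silent")
    # Pressure amplitude ratio converted to decibels: 20 * log10(p / p_ref).
    return cal_spl_db + 20.0 * math.log10(level / ref)


if __name__ == "__main__":
    # Synthetic example: a 100-Hz tone read as 80 dB SPL on the SLM, and a
    # "speech" segment with twice its RMS amplitude (about +6 dB).
    fs = 22050
    tone = [0.1 * math.sin(2 * math.pi * 100 * n / fs) for n in range(fs)]
    speech = [2.0 * x for x in tone]
    print(round(spl_from_calibration(speech, tone, 80.0), 2))  # 86.02
```

Working from RMS ratios means the absolute scaling of the digitized signal cancels out. Only the known calibration level anchors the result.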

Treatment
The LSVT program was administered only to members of the PD-T group. The treatment was delivered as described elsewhere (Fox, Morrison, Ramig, & Sapir, 2002; Ramig et al., 1995). A detailed rationale for the specific LSVT treatment tasks has been summarized previously (Farley, Fox, Ramig, & McFarland, in press; Fox et al., 2002, 2006; Ramig, Bonitati, Lemke, & Horii, 1994). In brief, the LSVT uses high-effort, but not strenuous, healthy loud phonation to encourage maximum phonatory efficiency and the co-activation and coordination of the speech subsystems. Patients are taken through exercises daily, repeatedly practicing maximum-duration loud phonations, maximum high- and low-pitch phonations, and speech exercises with increased loudness. This improved phonation is then carried over into speech and conversation following a standardized hierarchy, with a focus on monitoring the effort required to sustain sufficient vocal loudness (calibration). No direct attention is given to speech rate, prosodic pitch inflection, or articulation. Therapy is administered four times per week for 4 weeks, with each session lasting 50-60 min. Therapy was provided to the participants in this study free of charge. The individuals with PD serving as controls (PD-NT) were offered the same treatment free of charge 1 month after the completion of this study.

Measurements of Vowel Formants
Measurements of the first (F1) and second (F2) formants were obtained from the vowels /i/, /u/, and /a/ extracted from the phrases described above. Formant frequency analysis was done using TF32, a Windows-based version of the CSpeech software. With TF32, both wideband spectrographic displays and linear predictive coding (LPC) spectra were used to determine formant frequencies. F1 and F2 values for /i/ and /a/ were measured over a 30-ms window at the temporal midpoint of each vowel. For /u/, the window covered the final 30 ms of the vowel. This segment was chosen to avoid the formant transition immediately preceding the /u/ in "stew." To assess the validity of the acoustic measurements obtained with TF32, we submitted 40% of the vowel data to a second method of formant measurement using MATLAB (Version 5.3). This method also used linear predictive coding to determine the formant frequencies, but with a 50-ms window. F1 and F2 were measured at their most extreme positions at or near the temporal midpoint of the vowel. Again, wideband spectrographic displays helped confirm that the formants were correctly tracked. Both methods of formant measurement were applied by the same person, who is highly experienced in the acoustic analysis of normal and disordered speech.
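The window-placement rule described above can be sketched as a small helper that converts a hand-labeled vowel onset and offset (in seconds) into sample indices. The helper implements only the stated rule: a 30-ms window at the midpoint for /i/ and /a/, and the final 30 ms for /u/. The function name and arguments are hypothetical. The LPC analysis itself, done in TF32 and MATLAB in the study, is not reproduced.

```python
from typing import Tuple


def formant_window(onset_s: float, offset_s: float, vowel: str,
                   fs: int, win_ms: float = 30.0) -> Tuple[int, int]:
    """Return (start, end) sample indices (end exclusive) of the analysis window.

    /i/ and /a/: a window centered on the temporal midpoint of the vowel.
    /u/: the final `win_ms` of the vowel, avoiding the preceding transition.
    """
    first = round(onset_s * fs)
    last = round(offset_s * fs)
    n = round(win_ms / 1000.0 * fs)  # window length in samples
    if last - first < n:
        raise ValueError("vowel is shorter than the analysis window")
    if vowel in ("i", "a"):
        mid = (first + last) // 2
        start = mid - n // 2
    elif vowel == "u":
        start = last - n
    else:
        raise ValueError(f"unsupported vowel: {vowel!r}")
    return start, start + n


if __name__ == "__main__":
    # Illustrative 200-ms vowel at a 1000-Hz rate, chosen for easy arithmetic.
    print(formant_window(0.10, 0.30, "i", fs=1000))  # (185, 215)
    print(formant_window(0.10, 0.30, "u", fs=1000))  # (270, 300)
```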
Only acoustic variables that significantly (p < .05) differentiated between the participants with and without PD prior to the treatment phase were used to assess treatment effects. This was done to limit the inflation of Type I error associated with testing multiple hypotheses. These selected variables were also used to assess differences between the two patient groups and between men and women in the three groups. The specific formants and formant combinations tested were as follows: F1 and F2 of each of the three vowels (/i/, /u/, /a/); the difference F2-F1 for each vowel; the ratio F2/F1 for each vowel; the ratio F2i/F2u; and the area of the triangle formed by the Euclidean distances between the three vowels in the F1-F2 plane.
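The candidate variables listed above can be computed per speaker and session as in the sketch below, assuming that mean F1 and F2 values per vowel are already available. The class and function names are hypothetical. The area uses Heron's formula on the three Euclidean side lengths, which is one direct reading of the description above; the shoelace formula gives the same value. The pretreatment screening test is omitted because the section does not name the statistic used. The example formant values are illustrative round numbers, not study data.

```python
import math
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Formants:
    """Mean first and second formant frequencies (Hz) of one vowel."""
    f1: float
    f2: float


def triangle_area(i: Formants, u: Formants, a: Formants) -> float:
    """Vowel triangle area (Hz^2) via Heron's formula on Euclidean side lengths."""
    d_iu = math.dist((i.f1, i.f2), (u.f1, u.f2))
    d_ua = math.dist((u.f1, u.f2), (a.f1, a.f2))
    d_ai = math.dist((a.f1, a.f2), (i.f1, i.f2))
    s = (d_iu + d_ua + d_ai) / 2.0  # semi-perimeter
    # Clamp tiny negative values caused by rounding in near-degenerate triangles.
    return math.sqrt(max(s * (s - d_iu) * (s - d_ua) * (s - d_ai), 0.0))


def candidate_measures(vowels: Dict[str, Formants]) -> Dict[str, float]:
    """Compute the candidate acoustic variables for one speaker and session."""
    out: Dict[str, float] = {}
    for v in ("i", "u", "a"):
        f = vowels[v]
        out[f"F1{v}"] = f.f1
        out[f"F2{v}"] = f.f2
        out[f"F2-F1{v}"] = f.f2 - f.f1
        out[f"F2/F1{v}"] = f.f2 / f.f1
    out["F2i/F2u"] = vowels["i"].f2 / vowels["u"].f2
    out["VTA"] = triangle_area(vowels["i"], vowels["u"], vowels["a"])
    return out


if __name__ == "__main__":
    example = {"i": Formants(300, 2300), "u": Formants(300, 1000),
               "a": Formants(800, 1300)}
    m = candidate_measures(example)
    print(m["F2i/F2u"], round(m["VTA"]))  # 2.3 325000
```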

Rating Vowel Goodness
Two certified speech-language pathologists and four graduate students in speech-language pathology served as judges in the perceptual rating of vowels. All were native speakers of American English, and all were experienced in voice and motor speech disorders through courses, clinical work, and laboratory studies. Each judge was presented, via computer, with pairs of the same vowel (/i/, /u/, or /a/) spoken by each participant. Each pair contained one token produced at the PRE recording and one produced at the POST recording. The order of the vowels was randomized within and between pairs and across participants. The VocSPL of all vowels was normalized within and across participants and conditions prior to presentation, to ensure that the ratings were not influenced by VocSPL. The normalized vowels were presented binaurally via Logitech 200 semi-open stereo headphones to each judge at her comfortable loudness level, and this level remained the same throughout the experiment. The mean duration (in ms) of the vowels was 257.1 (SD = 68.1) for /i/, 185.2 (SD = 66.7) for /u/, and 156.3 (SD = 27.3) for /a/. The differences in mean vowel duration between the PRE and POST recordings were small (/u/ = 4.2 ms, /a/ = 3.8 ms, /i/ = 21 ms) and not statistically significant (p > .05). Previous results support the adequacy of these durations. Similar vowel durations were used in a study of vowel discrimination and identification (Hillenbrand, Clark, & Houde, 2000), and much shorter isolated vowels were used in a vowel recognition task (Tekieli & Cullinan, 1979). On this basis, we deemed the vowel durations in the present study sufficiently long to allow valid ratings of vowel goodness.
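The section does not specify how the VocSPL of the vowel tokens was equalized. One common approach, assumed here purely for illustration, is to scale every token to the same root-mean-square (RMS) amplitude before presentation. The function names and the default target level are hypothetical. Equal RMS equates average acoustic intensity, which approximates, but is not identical to, equal perceived loudness.

```python
import math
from typing import List, Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square amplitude of a vowel token."""
    if len(samples) == 0:
        raise ValueError("empty signal")
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def normalize_rms(samples: Sequence[float], target_rms: float = 0.05) -> List[float]:
    """Scale a vowel token (floats in [-1, 1]) so that its RMS equals target_rms."""
    current = rms(samples)
    if current == 0.0:
        raise ValueError("cannot normalize a silent token")
    gain = target_rms / current
    scaled = [x * gain for x in samples]
    if max(abs(x) for x in scaled) > 1.0:
        # This target would clip the token; choose a lower target for the whole set.
        raise ValueError("target_rms would clip this token")
    return scaled


if __name__ == "__main__":
    soft = normalize_rms([0.02, -0.02, 0.02, -0.02])
    loud = normalize_rms([0.4, -0.4, 0.4, -0.4])
    print(round(rms(soft), 3), round(rms(loud), 3))  # 0.05 0.05
```

The clipping check is applied per token. In practice, the target would have to be chosen low enough for the most peaked token in the whole set.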
For vowel goodness ratings, vowels from the second repetition of phrases produced in PRE 1 and POST 1 recordings were used, for a total of 43 pairs of vowels.
Twenty-five (58%) of the vowel pairs were rated a second time to assess intrarater reliability. The judge's task was to rate the second vowel in the pair relative to the first on a 0- to 100-point visual analog scale. On this scale, a rating of 50 indicates no perceptible difference in vowel goodness between the first and second vowels in the pair. A rating greater than 50 indicates that the second vowel is a better exemplar of the target vowel than the first. A rating less than 50 indicates that it is a worse exemplar.
All judges completed an initial computerized training module to familiarize themselves with the rating scale and with what was meant by judgments of "better than," "same as," or "worse than." As part of the training, the judges were presented with pairs of vowels that contained a poor and a good exemplar of the target vowel. The voice samples in the training module were taken from the pool of vowels produced by several participants in the study, but these samples were not used for any actual rating. The judges were then presented with the scale and shown how to rate the exemplars in each pair, always rating the second vowel in the pair (Vowel B) relative to the first (Vowel A). The scale as displayed on the computer was 11 cm long. Its left end was labeled "much worse," its midpoint "same as," and its right end "much better." Under these labels were the numbers 0, 50, and 100, respectively. The judges were then asked to rate some of the practice items independently. The training module guided the judges in using the computerized response form for each pair of vowels presented. Before making a decision, the judges were free to listen repeatedly to each of the two vowels in the pair by clicking on one of two icons representing the first (Vowel A) and second (Vowel B) vowels. The judges were told that in the actual experiment, two tokens of the same vowel, spoken by the same person, would be presented in each pair. They were also instructed that, in all cases, vowel goodness should be rated by comparing the second vowel in the pair (Vowel B) with the first (Vowel A). At the beginning of the practice module, they were told the following:
The purpose of this study is to determine the effects of speech therapy on vowel production in PD. You will be asked to listen to different vowels produced by speakers with and without PD and rate them on the clarity of their production. The vowels we are studying are:
Ah as in "Bob"
Ee as in "key"
Oo as in "stew"
Here are some examples of vowels you might hear. Listen to each one and think about how clear they are as examples of each vowel. Imagine how they would sound in a word or sentence and if they would be easy or difficult to understand. Note that some of the vowels are very short. Try to listen to the articulation, and not the length.
Once the judges understood the task and had familiarized themselves with the computerized rating forms, the rating session began. As in the practice session, the judges could click on the Vowel A or Vowel B icons to hear the vowels in the pair as many times as they needed to determine their rating. The computer recorded the judges' numerical ratings and stored them in a table for later analysis.

Reliability Measures
Reliability of formant measures. To assess interjudge reliability of F1 and F2 measurements, 25% of the formant measurements obtained with the TF32 method were selected at random and reanalyzed with the same program by a second person who was well trained in the analysis method. A Pearson product-moment correlation yielded r = .99 for each formant. The mean interjudge standard errors of measurement (SEMs) for the F1 and F2 measurements were 20 and 42 Hz, respectively. Interjudge reliability of vowel duration measurements, performed independently by the same two judges on 387 vowel samples, was high, with a Pearson product-moment correlation of .93 and a mean SEM of 27.2 ms.
To assess the reliability and validity of formant measurements, we correlated the measurements from the two methods of formant extraction described above. These correlations were very high for F2 (r = .96-.99) and high for F1 (r = .83-.95) across the different vowels, times (PRE, POST), and groups. SEMs were relatively small for both F2 (M = 26 Hz) and F1 (M = 19 Hz). These findings indicate high reliability of formant measurements across the two methods. Given the high interrater reliability for both formants, the somewhat lower agreement between methods for F1 likely reflects slight differences in how the two programs extract formants.
Reliability of perceptual ratings. To establish intrarater reliability for the perceptual data, 25 (58%) of the vowel pairs were randomly selected for repeated ratings. Pearson product-moment correlation coefficients between the first and repeated ratings were low for 2 raters (r = .57 and .65) and acceptable for the other 4 raters (r = .76, .79, .86, .92). Therefore, the ratings of the two judges with low reliability were discarded, and all subsequent analyses were based on the ratings of the remaining four judges.
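The intrarater screening step can be sketched as follows: compute Pearson's r between each rater's first and repeated ratings of the same pairs, then keep only raters at or above a cutoff. The text does not state a numeric cutoff; the `cutoff=0.70` default below is an assumption chosen only because it separates the reported low (.57, .65) and acceptable (.76 to .92) values. The function names and example ratings are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation for two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one sequence is constant")
    return sxy / math.sqrt(sxx * syy)


def screen_raters(first_pass, second_pass, cutoff=0.70):
    """Return each rater's test-retest r and the sorted list of raters kept.

    first_pass / second_pass map a rater ID to that rater's ratings of the
    same re-presented vowel pairs, in the same order. The cutoff is hypothetical.
    """
    r_by_rater = {
        rater: pearson_r(first_pass[rater], second_pass[rater])
        for rater in first_pass
    }
    kept = sorted(rater for rater, r in r_by_rater.items() if r >= cutoff)
    return r_by_rater, kept


# Invented ratings for two raters on four repeated pairs.
first = {"R1": [40, 55, 70, 62], "R2": [40, 55, 70, 62]}
second = {"R1": [42, 52, 72, 60], "R2": [65, 48, 50, 58]}
r_values, retained = screen_raters(first, second)
print(r_values, retained)
```

Here R1 reproduces its earlier judgments closely and is retained, whereas R2's repeated ratings run opposite to its first pass and R2 is dropped.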
To establish interjudge reliability, we examined the ratings of the four judges for each vowel, participant, time (PRE, POST), and group. The rating of a vowel was considered acceptable if at least 3 of the 4 raters agreed in their categorical rating, regardless of whether the rating indicated improvement (a rating of more than 50 on the scale) or no improvement (a rating of 50 or less). We then calculated percent interrater agreement by dividing the number of cases with interrater agreement by the total number of cases (pairs rated). By this calculation, percent interrater agreement across groups and vowels was 87.6%. Percent interrater agreement across groups was 88.4% for /a/, 81.4% for /u/, and 93.0% for /i/. Percent interrater agreement across vowels was 78.6% for the PD-T group, 91.1% for the PD-NT group, and 90.5% for the NN group. Thus, interrater agreement was high overall.
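The 3-of-4 agreement criterion can be expressed directly in code. This is a minimal sketch under the rule stated above (above 50 = improvement, 50 or below = no improvement); the function names and the example ratings are hypothetical.

```python
def improved(rating):
    """Categorical reading used in the text: a rating above 50 = improvement."""
    return rating > 50


def pair_agrees(ratings, min_agree=3):
    """True if at least min_agree raters put the pair in the same category."""
    n_improved = sum(improved(r) for r in ratings)
    n_not_improved = len(ratings) - n_improved
    return max(n_improved, n_not_improved) >= min_agree


def percent_agreement(pairs, min_agree=3):
    """Percentage of rated pairs that meet the agreement criterion."""
    if not pairs:
        raise ValueError("no rated pairs supplied")
    n_agree = sum(pair_agrees(p, min_agree) for p in pairs)
    return 100.0 * n_agree / len(pairs)


# Invented ratings from four judges for three vowel pairs.
pairs = [
    [62, 70, 55, 45],  # 3 of 4 above 50: agreement
    [60, 40, 55, 45],  # 2-2 split: no agreement
    [30, 45, 50, 20],  # all 50 or below: agreement
]
print(percent_agreement(pairs))  # 2 of the 3 pairs meet the criterion
```

With four raters and two categories, the 3-of-4 criterion fails only on a 2-2 split, so it is more lenient than requiring unanimity.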

Statistical Analyses
Differences between the groups for each of the dependent variables were evaluated separately for the PRE and POST data using a one-way analysis of variance (ANOVA). Differences between recording times at PRE or POST, and across gender, were tested with a two-way repeated measures ANOVA (RM-ANOVA). A two-way RM-ANOVA was also used to assess interactions between pairs of variables. A repeated measures multivariate analysis of variance and a multivariate analysis of covariance were run to assess posttreatment differences while accounting for pretreatment variation. Tukey's studentized range test and Scheffé's test were used for planned comparisons, with alpha set at .05. Fisher-Yates exact tests were used to assess significant differences in frequency measures, with alpha set at .05. Pearson product-moment correlations were used to assess the relationships among the dependent variables. The magnitude of change from PRE to POST recordings was assessed with effect size (ES) measures using a pooled variance method (Cohen, 1988). To calculate the ESs, we compared the mean PRE to POST change (and SD) in the PD-T group with the mean PRE to POST change (and SD) in the PD-NT group and with the mean PRE to POST change in the NN group. We also compared the mean PRE to POST change in the PD-NT and NN groups. By the pooled variance method, an ES of 0.80 is considered large, 0.50 medium, and 0.20 small (Cohen, 1988).
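A minimal sketch of the pooled-variance ES described above, applied to PRE to POST change scores of two groups. Assumptions: change is computed as POST minus PRE for each participant, and the pooled SD uses sample (n - 1) variances; the text does not specify these details, and the function names and example values are hypothetical.

```python
import math
from statistics import mean, stdev


def change_scores(pre, post):
    """Per-participant PRE to POST change (POST minus PRE)."""
    if len(pre) != len(post):
        raise ValueError("PRE and POST lists must be paired")
    return [b - a for a, b in zip(pre, post)]


def pooled_sd(x, y):
    """Pooled standard deviation of two samples (sample variances)."""
    n1, n2 = len(x), len(y)
    pooled_var = ((n1 - 1) * stdev(x) ** 2 + (n2 - 1) * stdev(y) ** 2) / (n1 + n2 - 2)
    return math.sqrt(pooled_var)


def effect_size(change_a, change_b):
    """Difference in mean change between two groups, divided by the pooled SD."""
    return (mean(change_a) - mean(change_b)) / pooled_sd(change_a, change_b)


def cohen_label(es):
    """Cohen's (1988) benchmarks: 0.20 small, 0.50 medium, 0.80 large."""
    size = abs(es)
    if size >= 0.80:
        return "large"
    if size >= 0.50:
        return "medium"
    if size >= 0.20:
        return "small"
    return "below small"


# Invented SPL values (dB) for a treated and an untreated group.
treated = change_scores(pre=[70, 72, 68, 71], post=[76, 80, 73, 78])
untreated = change_scores(pre=[70, 71, 69, 72], post=[72, 70, 69, 75])
es = effect_size(treated, untreated)
print(round(es, 2), cohen_label(es))
```

Comparing change scores rather than POST values alone keeps each participant's PRE level as the reference, which matches the PRE to POST comparison described in the text.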

Acoustic Measures
Data pooling. A one-way RM-ANOVA indicated no significant differences (p > .05) between the acoustic data obtained on the 3 different days of PRE recordings for any of the variables. Similarly, there were no significant differences between the acoustic data obtained on the 2 days of POST recordings for any of the variables. Therefore, the PRE data were pooled, as were the POST data. All subsequent analyses were performed on the pooled data.
Differences between groups at PRE. Table 1 summarizes the means and standard deviations of the acoustic measurements (F1 and F2 of the three vowels, F2i/F2u, vowel triangle area, and SPL for each of the vowels) and perceptual ratings at PRE and POST for each of the three groups of participants.
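Vowel triangle area is commonly computed as the area of the triangle whose corners are the (F1, F2) coordinates of /i/, /u/, and /a/, using the shoelace (determinant) formula. The Method section defining this measure is not part of this excerpt, so the sketch below assumes that standard definition; the function name and formant values are invented for illustration.

```python
def vowel_triangle_area(i, u, a):
    """Area (Hz^2) of the triangle with (F1, F2) corners for /i/, /u/, and /a/."""
    (f1_i, f2_i), (f1_u, f2_u), (f1_a, f2_a) = i, u, a
    # Shoelace formula: half the absolute value of the signed determinant.
    return 0.5 * abs(
        f1_i * (f2_u - f2_a) + f1_u * (f2_a - f2_i) + f1_a * (f2_i - f2_u)
    )


# Invented (F1, F2) values in Hz.
area = vowel_triangle_area(i=(300, 2300), u=(300, 900), a=(700, 1200))
print(area)
```

Because the area grows with the spread of the corner vowels in F1-F2 space, a centralized (reduced) vowel space yields a smaller value.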
A one-way ANOVA of the PRE data indicated that of all the acoustic variables tested (see Method section), only five variables (F2u, F2i/F2u, SPLi, SPLu, and SPLa) significantly differentiated between the participants with and without PD. Specifically, each of these five variables showed significant differences across the three groups: SPLi, F(2, 255) = 16.33, p < .0001; SPLu, F(2, 255) = 12.67, p < .0001; SPLa, F(2, 255) = 22.93, p < .0001; F2u, F(2, 255) = 15.88, p < .0001; and F2i/F2u, F(2, 255) = 29.51, p < .0001. Planned comparisons indicated significant differences between the PD-T and NN groups for each of the five variables, between the PD-NT and NN groups for each of the five variables, and between the PD-T and PD-NT groups for SPLi only. As can be seen in Table 1, of the five acoustic variables that differentiated between the participants with and without PD, SPLa, SPLu, SPLi, and F2i/F2u had higher means, and F2u had a lower mean, in the healthy participants than in the participants with PD.
As can be seen in Table 1, the POST VocSPL in the PD-T group exceeded that of the NN group, whereas the POST F2i/F2u and F2u values shifted in the direction of the NN values but did not reach them. The ES data are shown in Table 2. As can be seen, the PRE to POST differences between the PD-T group and each of the other two groups show large ES values for VocSPL and moderate to large ES values for F2u and F2i/F2u. The differences between the two control groups (PD-NT and NN) are associated with minimal ES values for all variables. Collectively, these findings indicate marked improvement in VocSPL and moderate to large improvements in F2u and F2i/F2u in the PD-T group only, with no significant changes in these variables from PRE to POST recordings in the PD-NT and NN groups.

Perceptual Vowel Ratings
As can be seen in Table 1, the PD-T group had mean ratings between 57.9 and 65.5, indicating that in this group, the POST vowels were rated as better exemplars of the target vowel than the PRE vowels. In the control groups, the POST vowels were rated near or below 50 on the visual analog scale, indicating either no perceptible difference between the PRE and POST vowel productions or slightly worse ratings of the POST vowels. A multivariate analysis of variance indicated a significant overall group effect, F(6, 76) = 4.38, p = .0007, and significant group differences for the vowel /i/, F(2, 40) = 10.62, p = .0002; /u/, F(2, 40) = 9.80, p = .0003; and /a/, F(2, 40) = 4.77, p = .0138. Planned comparisons showed significant differences between the PD-T group and each of the control groups for the vowels /i/ and /u/, and between the PD-T group and the NN group for the vowel /a/. There were no significant differences between the PD-NT and NN groups in the ratings of the three vowels.
We also analyzed the frequency (percentage) of ratings indicating improvement (i.e., a rating greater than 50 on the scale) in each group. By this analysis, 78.8% of the ratings in the PD-T group indicated improvement, compared with 34.1% in the PD-NT group and 21.1% in the NN group, χ²(2, N = 43) = 26.04, p < .0001. The difference between the PD-T and PD-NT groups was significant, χ²(1, N = 29) = 12.93, p = .0003, as was the difference between the PD-T and NN groups, χ²(1, N = 28) = 21.33, p < .0001. The ES values for the perceptual vowel ratings are shown in Table 2. As can be seen, the ES values are large for the differences between the ratings of the PD-T group and those of the PD-NT group and of the NN group. The differences between the PD-NT and NN groups have small ES values. Collectively, these findings indicate significant changes (in the direction of improvement) in vowel distinctiveness ratings in the PD-T group only.
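The frequency comparison above can be illustrated with a Pearson chi-square test on a groups-by-outcome table of counts. This is a minimal, dependency-free sketch: it computes the statistic without a continuity correction and returns p-values only for df = 1 and df = 2, where the chi-square survival function has closed forms. The counts below are invented, not the study's data, and the function names are hypothetical. When expected counts are small, an exact test such as the Fisher-Yates test named in the Statistical Analyses section is the safer choice.

```python
import math


def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


def chi_square_p(stat, df):
    """Upper-tail p-value for df = 1 or 2, using closed-form survival functions."""
    if df == 1:
        return math.erfc(math.sqrt(stat / 2))
    if df == 2:
        return math.exp(-stat / 2)
    raise NotImplementedError("use a statistics library for df > 2")


# Invented counts: [ratings indicating improvement, ratings not indicating improvement].
table = [
    [30, 8],   # treated group
    [13, 25],  # untreated PD group
    [8, 30],   # healthy group
]
stat, df = chi_square(table)
print(round(stat, 2), df, chi_square_p(stat, df))
```

A 3 x 2 table gives df = 2, and each pairwise comparison of two groups gives df = 1, matching the degrees of freedom reported above.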

Pearson Product-Moment Correlation Analyses
Pearson product-moment correlation analyses were conducted to assess the relationships between changes in vowel goodness ratings and changes in the acoustic variables from PRE to POST recordings. Because no significant changes occurred in the PD-NT and NN groups, only the PD-T data were subjected to correlation analyses. The results of these analyses are shown in Table 3. As can be seen, there are significant correlations between the perceptual and most of the acoustic variables, with some of the correlations being very strong (/a/ rating and SPLa; /u/ rating and F2i/F2u) or moderately strong (/a/ rating and SPLi; /u/ rating and SPLi). These correlations indicate a significant, and in some cases very strong, relationship among improvements in SPL, vowel goodness ratings, and vowel acoustics.

Discussion
In the present study, individuals with PD treated with LSVT showed significant changes in VocSPL, F2u, F2i/F2u, and vowel goodness ratings after treatment. These changes were in the direction of normal values, indicating improvement in vocal and articulatory functions. The changes are consistent with what we know about the role of the articulatory system in vocal loudness regulation, as discussed previously. The lack of change in the control groups suggests that the effects in the treated group were treatment specific and were not related to the passage of time or familiarity with the testing procedures. These findings are congruent with previous findings regarding the effects of loud phonation training on voice and articulatory functions in dysarthric individuals (Dromey et al., 1995; Sapir et al., 2003). These findings are also consistent with the effects of stimulated loud phonation on articulation in normal speech (Di Girolamo et al., 1996; Dromey & Ramig, 1998; Erickson, 2002; Schulman, 1989) and dysarthric speech (e.g., Tjaden & Wilding, 2004). Collectively, these findings suggest that the LSVT is an efficient way to improve multiple aspects of speech production in dysarthric individuals with PD.
In the present study, only two vowel acoustic variables, F2u and F2i/F2u, differentiated between individuals with and without PD prior to any treatment. This finding is somewhat surprising given that other formant variables tested here have been shown, in other studies, to be sensitive to dysarthric speech. However, it is also the case that some studies have failed to find strong relationships between perceptual measures and acoustic measures, between speech intelligibility and acoustic measures, or between severity of motor impairment and perceptual and acoustic measures of dysarthric speech (cf. Bunton & Weismer, 2001; De Letter, Santens, & Van Borsel, 2004; McAuliffe, Ward, & Murdoch, 2006; Tjaden & Wilding, 2004; Weismer et al., 2001). This incongruity may be related to factors such as intersubject variability, the nature of the motor impairment, and the nature of the task being performed (Tasko & McClean, 2004). The vulnerability of the articulatory system to a breakdown in certain speech tasks but not others seems to be especially evident in the speech of individuals with PD (Ackermann & Ziegler, 1991; Connor & Abbs, 1991; Ho, Bradshaw, Cunnington, Phillips, & Iansek, 1998; Kempler & Van Lancker, 2002).
Why F2u and F2i/F2u differentiated between the groups with and without PD, and why they were sensitive to treatment effects, is also not clear. One possible explanation is that these two variables are sensitive to the particular orofacial deficits of individuals with PD and that treatment alleviated or minimized these deficits, hence the improvement in the acoustic measures. For example, it is possible that prior to treatment, the participants had reduced lip movements, which adversely affected lip rounding for /u/ and lip spreading for /i/. These effects can result in a less-than-optimal decrease in F2u, a less-than-optimal increase in F2i, and a less-than-optimal F2i/F2u ratio. After treatment, with the improvement in facial muscle activity, lip rounding and spreading improve, bringing a decrease in F2u and an increase in both F2i and F2i/F2u toward normal values. Improvement in facial function following LSVT in individuals with PD has already been demonstrated (Spielman et al., 2003). However, the improvement was in facial expression, which may not necessarily reflect improvement in speech articulation because these two functions are subserved by different neural mechanisms (Cummings, Benson, Houlihan, & Gosenfeld, 1983). Thus, future physiologic studies are needed to confirm improvement in lip function for speech following LSVT.
Another explanation for the changes in F2u and F2i/F2u is improvement in tongue movements with LSVT, especially in the anterior-posterior direction. It has been shown that tongue movements in individuals with PD are reduced in range, especially when the tongue has to move between heterogeneous phonetic gestures, such as from an alveolar to a velar place of articulation and vice versa (Ho et al., 1998). It is, therefore, possible that the abnormally high F2u and abnormally low F2i and F2i/F2u values prior to LSVT were related to the reduced range of tongue movements and that with treatment, these articulatory problems were minimized, hence the improvement in F2u and the F2i/F2u ratio. Improvement in lingual function with LSVT in individuals with PD has already been demonstrated, although the improvement was in tongue pressure and motility (Ward et al., 2000) and swallowing (Sharkawi et al., 2002) rather than in speech. Again, physiologic studies of the tongue are needed to verify the effects of LSVT on speech articulation in individuals with PD.
Still another explanation for the changes in F2u and F2i/F2u from PRE to POST has to do with vertical laryngeal and/or anterior suprahyoid movements during speech. These movements can significantly affect vowel production and the locations of vowel formants along the frequency domain (Riordan, 1977; Sapir, 1989). It is therefore possible that prior to LSVT these movements were limited and that they improved with treatment, producing the changes in F2u and F2i/F2u. However, whether vertical laryngeal and anterior hyoid bone movements change (improve) with LSVT needs to be verified physiologically. Finally, all three mechanisms (lip rounding, tongue movement, and laryngeal movement) might have affected the acoustic variables simultaneously.
In the present study, among the formant measurements, only F2u and F2i/F2u were sensitive to differences between groups and to treatment effects. One might argue that the F2i/F2u ratio is superfluous because F2i did not change significantly and only F2u changed. However, the F2i/F2u ratio was more sensitive than F2u, in terms of both level of significance and effect size, to differences between groups and to treatment effects. Thus, the F2i/F2u ratio appears to be an important parameter that indexes changes in the vocal tract, whether in response to the disease or to treatment.
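The arithmetic behind this argument can be illustrated with invented values: when F2i rises slightly and F2u falls, each change is modest on its own, but the ratio combines both shifts into one larger proportional change. The function names and numbers below are hypothetical; the example illustrates only the arithmetic and does not reproduce the study's data.

```python
def f2_ratio(f2_i, f2_u):
    """F2i/F2u: larger values mean greater F2 separation between /i/ and /u/."""
    if f2_u <= 0:
        raise ValueError("F2u must be positive")
    return f2_i / f2_u


def percent_change(before, after):
    """Signed percentage change from before to after."""
    return 100.0 * (after - before) / before


# Invented PRE and POST formant values in Hz.
pre = {"F2i": 2000.0, "F2u": 1600.0}
post = {"F2i": 2100.0, "F2u": 1400.0}

print("F2i change (%):", percent_change(pre["F2i"], post["F2i"]))
print("F2u change (%):", percent_change(pre["F2u"], post["F2u"]))
print("F2i/F2u change (%):", percent_change(
    f2_ratio(pre["F2i"], pre["F2u"]), f2_ratio(post["F2i"], post["F2u"])))
```

Scaling both formants by the same factor leaves the ratio unchanged, which is one reason ratio measures are sometimes preferred over raw formant values when speakers differ in overall formant range.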

Limitations of the Study
Certain limitations in this study should be addressed in future studies. First, individuals with PD differ markedly in their speech and nonspeech symptoms, the severity of these symptoms, and the degree to which these symptoms fluctuate. In this study, there were only 14 or 15 participants in each group. Also, the majority of the participants with PD in this study had mild or moderate speech abnormalities and were in the early to middle stages of the disease. Thus, the results from this study may not apply to all individuals with PD. Second, the acoustic and perceptual data in this study were based on three phrases that were read aloud. It has been demonstrated that the presence, type, and severity of dysarthric speech in individuals with PD depend on the specific speech task being performed (Caligiuri, 1989; Kempler & Van Lancker, 2002; Rosen, Kent, & Duffy, 2005). In general, conversational speech is more likely to reveal the true deficits in the speech of individuals with PD than more structured and less automatic modes of speech, such as reading aloud. We did not examine the effects of LSVT on vowel acoustics in conversational speech because it is methodologically very difficult to compare changes in vowel acoustics when the utterances vary within and across participants and across times (PRE vs. POST). Nevertheless, it would be important for future studies to use methods that allow comparison of speech before and after treatment under more natural conditions. Third, the present study used acoustic and perceptual correlates of vowel articulation. Although these methods, when combined, can provide important information, they are limited in their ability to reveal the specific physiologic abnormalities underlying hypokinetic dysarthria and the effects of treatment in PD. Thus, it would be important to combine the present methodology with more direct physiologic (kinematic) measures of articulatory function. Fourth, this study examined changes in articulation in response to LSVT only at the end of treatment. It is possible that the effects of LSVT on articulation diminish over time. However, SPL data obtained in previous studies up to 2 years after treatment suggest that the increase in SPL after LSVT is maintained above pretreatment levels (Ramig, Sapir, Countryman, et al., 2001; Sapir et al., 2002), and we suspect that the same might be true for the effects of LSVT on articulation. Follow-up studies are therefore needed to ascertain the long-term benefits of LSVT on articulatory function.

Conclusion
The aforementioned limitations notwithstanding, the present findings provide empirical support for the therapeutic effects of LSVT on articulatory function in individuals with PD. The effects demonstrate that therapy with a single focus, increased loudness, can have a positive effect across the speech mechanism without direct attention to other systems. Future follow-up studies should help verify the long-term therapeutic effects of LSVT on speech articulation, intelligibility, and acceptability.

Table 1.
Means and standard deviations of SPL, vowel formants, vowel triangle area, and vowel ratings at PRE and POST recordings for each of the groups.

Table 2.
Effect size (ES) measurements.
Note. These ESs indicate the magnitude of change in one group relative to the magnitude of change in each of the other two groups for each of the acoustic and perceptual variables.

Table 3.
Pearson product-moment correlations between PRE to POST changes in the acoustic variables and PRE to POST changes in the ratings of vowel goodness in the PD-T group.