Student Learning of Perceptual Skills Related to Differentiating Motor Speech Disorders

Purpose: This study aimed to determine if Speech-Language Pathology (SLP) graduate students’ perceptual skills improved after taking a motor speech disorders (MSD) course by comparing pre- and posttest performance. The potential relationship between posttest perceptual-skills performance and academic performance was also investigated. Method: Before beginning instruction in MSD course content, students in a Master’s program in SLP were given a pretest, the Baseline & Post Learning Assessment of Listening & Diagnostics Skills (BPLALDS; Duffy, n.d.a). Throughout the semester, students were exposed to didactic learning in the classroom supplemented by audio and video modules. At the end of the course, the BPLALDS was used as a posttest. Variation in perceptual skills development was described and compared to overall course performance. Results: Scores on posttests of perceptual ability were significantly higher than pretest scores. Post-hoc comparisons revealed that students who learned relatively more were those who generalized perceptual knowledge to novel stimuli. Academic grade assignment correlated strongly with, but accounted for only some of the variation in, perceptual ability. Conclusion: Although some variation in perceptual ability related to differentially diagnosing motor speech disorders can be accounted for by academic attainment, additional factors, such as students’ ability to generalize knowledge from familiar to novel cases, likely contribute. The authors reflect on the manner in which learning theory can inform these results.


Need for the Current Study
It is important for course instructors in Speech-Language Pathology (SLP) to document the attainment of student knowledge and skills (American Speech-Language-Hearing Association [ASHA], 2016). In particular, perceptual skills need to be taught in courses that require students to make perceptual distinctions between different types of speech output. Accurately perceiving speech output is an important skill component in courses such as Phonetics, Articulation Disorders, Voice, and Motor Speech Disorders (MSD). To include perceptual skill development as a course goal, teachers of MSD courses must be able both to teach and to document learning of the skills necessary to perceptually diagnose MSDs. Although auditory perception is likely the main perceptual modality used to clinically distinguish MSD features, it is not used exclusively. Visual perception is also used to (a) visualize confirmatory signs such as masked facies (associated with Parkinson's Disease and hypokinetic dysarthria) or (b) visualize the neuropathophysiology of abnormal movements associated with hyperkinetic dysarthria. Previous research, however, has addressed mainly the teaching and learning of auditory perception of speech and seems to fall into three categories: (1) investigations of the relationship between student variables and perceptual abilities, (2) group comparisons of SLPs' perceptual abilities with those of students and other less-trained listeners, and (3) studies of the ability of students to learn to perceive specific voice and resonance characteristics.
Studies of student characteristics have focused on the effects of variables, such as exposure to musical or vocal lessons, on students' ability to learn to perceive speech characteristics such as vocal quality, pitch, and speech intonation (Dankovičová, House, Crooks, & Jones, 2007; DeBoer & Shealy, 1995). The characteristics of one's native language have also been found to affect the sensitivity with which listeners perceive the presence of hypernasality and phonemic contrasts between synthesized vowels (Kreiman, Gerratt, & ud Dowla Khan, 2010; Lee, Brown, & Gibbon, 2008). Others have focused on the level of students' phonemic awareness in relation to their ability to perceive and transcribe speech phonetically (Robinson, Mahurin, & Justus, 2011).
Additional research has revealed group differences in the auditory and/or visual perception of speech stimuli between trained SLPs and those with less training. Gelfer (1993) found that trained SLPs perceived five dimensions of difference in normal voices: (1) pitch, (2) loudness, (3) age and speech rate, (4) pitch variation, and (5) vocal quality; whereas untrained listeners only picked up on two dimensions: (1) pitch associated with resonant/shrill quality and (2) pitch variability, age and speech rate. This finding supports the notion that more training and/or professional experience leads to more detailed, and overall better, perceptual ability. Kreiman, Gerratt, and Precoda (1990) found that trained clinicians attended to multiple aspects of voice quality (i.e., fundamental frequency, roughness and breathiness), whereas their untrained counterparts paid attention mainly to the feature of fundamental frequency when judging both normal and pathological voices. This finding suggests that some groups of listeners will be better at perceiving salient vocal characteristics than others. For example, with professional experience, listeners develop perceptual strategies that allow them to flexibly home in on idiosyncratic details relevant to accurately perceiving the multiple dimensions of vocal quality. Naïve listeners listen instead for general feature differences, which are not specific enough to allow them to accurately perceive all the dimensions of quality. Similarly, a few individual case studies show that students did relatively poorly at the task of differentiating characteristics of speakers with MSDs, as measured by low accuracy scores on a pretest of auditory and visual perceptual diagnostic skills, the Baseline & Post Learning Assessment of Listening & Diagnostics Skills (BPLALDS; Duffy, 2013a, 2013b).
On the same task, additional case studies revealed that SLPs with some kind of professional experience, not necessarily specific to MSDs, tended to score higher than students on the BPLALDS, and SLPs with years of experience with MSDs scored higher still (Duffy, 2013a). Others have found contradictory results: that students' perceptions related to MSDs were better than those of professionals (Zyski & Weisiger, 1987). Specifically, Zyski and Weisiger (1987) investigated students' as compared to professionals' ability to categorize dysarthria types according to Darley, Aronson, and Brown's (1969a, 1969b, 1975) classic subgroups. SLP professionals with at least 5 years of experience and who routinely diagnosed MSDs were able to do so with an average of only 19% accuracy, whereas students, who had recently received dysarthria classification training, performed much better, with an average of 56% accuracy. Others have looked at the development of the ability to perceive specific dimensions of voice and resonance. Oates and Russell (1998) found that SLP students in a focus group reported their belief that their perceptual skills for distinguishing vocal characteristics had improved by means of using an interactive, multimedia program (A Sound Judgment; Oates & Russell, 2003) that presented video and audio stimuli of systematically increasing complexity. Similarly, Lee, Whitehill, and Ciocca (2008) found that three conditions (listening experience, direct feedback about judgement accuracy, or both) were effective in improving the inter-examiner reliability of SLP students' perceptual judgements. Finally, the specific dimensions of "roughness" and "breathiness" were found to be teachable to SLP students using an anchor approach and synthesized speech stimuli (Yiu, Chan, & Mok, 2007).
There seems to be limited data on the learnability of the global set of perceptual features that distinguish MSDs from one another as measured by students' relative perceptual ability before and after studying the subject. Like phonetics, perceptual skills related to MSDs are directly relevant to the abilities to recognize and label motor-speech characteristics needed for diagnosis and treatment of dysarthria and apraxia of speech. As a result, students should be held accountable for acquiring these skills in order to prepare themselves for competent clinical practice. To date, however, there is no research available to document how SLP master's students should be expected to perform perceptually at baseline as compared to how much perceptual ability they might be expected to gain after taking a university course. That is, no one has systematically documented whether a group of students can learn the global set of auditory and visual perceptual skills needed to differentiate MSDs.

Definitions and Mechanisms of Perception
Although some might limit the definition of perception to sensation (e.g., using the sense organs to engage in activities such as touching, hearing, and tasting), most current definitions of the term include an aspect that acknowledges the transfer of basic sensory information into higher-level thought. For example, the Dictionary of Sport and Exercise Science (Churchill Livingstone, 2008) describes perception in the following manner: the act or process of becoming aware of internal or external sensory stimuli or events, involving the meaningful organization and interpretation of those stimuli; . . .
[p]erception is to be distinguished from sensation which refers to the subjective experience that results from excitation of the sensory apparatus without any interpretation or imposition of meaning (p. ). Goldstone (1998) described the four likely mechanisms responsible for transfer of sensory information from the peripheral sense organs to the level of storage of meaningful information. They include (1) attentional weighting, (2) stimulus imprinting, (3) differentiation and (4) unitization. Attentional weighting occurs early on in the process, when we pay selective attention to the important aspects of stimuli given feedback from our environment. For example, a learner might learn to give attentional weight to the color dimension of a set of objects if color has been determined, through prior experience, to be an important feature for making distinctions between them. The dimension of color is going to be important for making the distinction between fruit, and so a learner will use the feature of the color "orange" to characterize oranges and the features of the colors "red or green" to characterize apples. Next, during stimulus imprinting, the stimuli cause learners to develop specialized receptors, functional areas in which to process perceptual information. Imprinting can occur for an entire stimulus or receptors can be specialized for features of a stimulus, if the features are important to capturing the stimulus's regularities. So, using the fruit example, imprinting might allow the learner to store a mental template for the concept of "fruit" but can also allow storage of those particular features of edible flesh, sweetness, and seeds that capture the regularities of the stimulus. Differentiation can then occur, a mechanism that allows for separation of two or more concepts that were once conflated. 
Using a different example, at some point, the learner may have called all mammals by one name, "dog," then later learned that mammals could be distinguished from each other and to call dogs, "dog," and bears "bear." In this case, two levels of learning occurred: that the bear and dog stimuli differed from each other, and that there were features that could help make that distinction: size, body shape, etc. A fourth mechanism is unitization in which perception of an integrated, multifeatured concept can become clear given exposure to only one important feature. Unitization is more efficient than earlier developing abilities that would have required exposure to each of the relevant stimuli before being able to identify the whole. Using the unitization example, the learner might be able to tell that a bear is a bear not a dog simply by perceiving the dimension of size. Both might be brown and furry, but a bear tends to be larger than a dog. Through these mechanics, the workings of human perception can be described. Explanations for how and why these mechanics work as they do can be informed by perceptual learning theories.

Perceptual Learning Theories
Perceptual differentiation vs. transactional theories. Gibson (1969) provides an overview of the continuum of perceptual learning theories with two opposing ends. On the first end, "perceptual differentiation theory" (Gibson, 1969) holds that environmental stimuli provide most of what we need to learn to perceive. On the other end, theorists such as Bruner (1958) and Ittelson and Cantril (1954) argue that because the environment provides inadequate information, human beings must extrapolate using various internal resources. Theorists in this latter group have been termed "transactional" or "interactionist" theorists.
Perceptual differentiation theory holds that the environment and its stimuli are sufficient, or that alone they are adequate for perceptual development. According to perceptual differentiation theory, the developing child must learn to do two things: first, to pay attention to the essential, distinctive features of environmental stimuli, and second, to use those features to classify objects into separate categories based on those distinctive features. Perceptual learning begins with paying attention to the distinctive features whereas memory storage occurs later, when the features are applied to real world objects. This theory can be applied to the case of MSDs, where a student may be able to learn to perceive motor speech stimuli through sufficient environmental input. For example: "slow and strained" vocal quality may first be perceived as distinctive features of a familiar case study. But when these features are both perceived and used to label "spastic dysarthria", as opposed to an incorrect alternative such as flaccid dysarthria in a novel case study, we have evidence that the student is drawing from memory.
Transactional (aka interactionist) theories (Bruner, 1958; Ittelson & Cantril, 1954) contrast with perceptual differentiation theory in that they view the environment as inadequate; that is, perceptual development cannot occur given input from the environment alone. Instead, learners are given credit, at least in some respect, for the extrapolations they are able to make given limited input from environmental stimuli. As such, two different people may have two different perceptions of a similar stimulus. For example, regarding motor speech, one person's assumptions and expectations regarding speech sound stimuli may be very different from the next person's. These differences are due, at least in part, to the life experiences they have had associated with MSDs in the past. The implication of this theory is that, based on different life experiences, learners will bring very different levels of perception to the learning task. Quantitatively, some will be better than others at the task of perceiving MSD characteristics at baseline. From a qualitative standpoint, they may perceive stimuli differently from one another, so that what one student perceives as "fast" may not sound perceptually fast to another. It is through mutual life experience (e.g., the experience of an MSD course) that mutual agreement regarding such perceptual parameters can be reached. Qualitative adjustments can also be made to perceptual abilities by adding categories to our learned classification systems (Bruner, 1958). For example, at the beginning of an MSD class a student may have a vague notion that there is a single category of MSDs called "dysarthria." With perceptual experience, the categories of the subtypes of dysarthria can be increased from one dysarthria to seven dysarthrias (i.e., flaccid, spastic, ataxic, hypokinetic, hyperkinetic, unilateral upper motor neuron, and mixed; Darley et al., 1969a, 1969b, 1975).
With an increase in categories comes the opportunity to adjust perceptual systems to make additional contrasts between new and different categories.

Language determines thought: The Sapir-Whorf hypothesis.
Related to the idea that cognitively distinct categories can change perception, the Sapir-Whorf hypothesis (Sapir, 1921; Whorf, 1956) holds loosely that the language that we use allows us to think in certain ways (Whorf, 1956). One of Whorf's (1956) most often quoted and also commonly criticized examples is that the many words that the Eskimo people have for snow allow them to perceive snow differently than people who speak languages like English, which has only one word for it. This applies not only to categorical knowledge, as described earlier regarding dysarthria type, but also to other linguistic/conceptual pairs. For example, students learn to perceive completely new isolated concepts within an MSD course (e.g., irregular articulatory breakdown, commonly associated with cerebellar damage and ataxic dysarthria; vocal flutter, commonly associated with lower motor neuron damage and flaccid dysarthria; palmomental reflex, commonly associated with upper motor neuron damage and a symptom co-occurring with spastic dysarthria). It is not clear how much the Sapir-Whorf hypothesis accounts for original perceptual development during word learning as originally proposed. Rather, there is some newer evidence (Bowerman, 1996) that universal, non-linguistic spatial concepts, such as the spatial meanings underlying the words "in" and "on," are predetermined across languages, but then later modified in nuance of meaning through perceptual learning made possible by the lexical items available in a speaker's native language. As a result, concepts such as "in" may be universal to begin with but, with exposure to one's native language, end up meaning something slightly different to the speaker of Korean as opposed to English. Whereas in English, "in" can refer to any object that has been put into any container, no matter the size of the object or container, the Korean words "nehta/kkenayta" specify an object that has been put into or taken out of a loose-fitting container.
A different Korean word would be used if the fit were tight. The words of our native language help us to modify universal concepts in a way that differs from other languages. Overall, this theory holds, in its extreme version, that words can be used to create distinct differences in perception, and in its less extreme version, that words modulate universal perceptions to create nuanced differences in meaning.
Gibson (1995), later in her career, extended the perceptual differentiation theory (1969) by adding many concepts that other theorists would have categorized under the broader heading of cognition. She proposed that perceptual learning is fundamental to knowledge and that perception is where all thought begins. Furthermore, she criticized her previous work (1969) for not acknowledging the importance of action to the process of perceptual development. As infants develop, they are both interacting with their worlds motorically and perceiving them in order to learn (Gibson, 1988). As such, her revised theory was integrally linked with behavior and included three hallmarks: "agency" (the self in control), "prospectivity" (the forward-looking direction of activity), and "behavioral flexibility" (transfer of means and strategies to new situations) (Gibson, 1970). First, agency occurs when infants learn through motor control and perceptual ability that they can control their own actions in order to have an effect on the world (e.g., learning that moving my foot up to make contact with the mobile above my crib will make it move). Second, prospectivity occurs when perception leads to anticipatory thought to preplan potential impending action (for example, preplanning to throw my sippy cup over my highchair to see what my mother's reaction will be).
Finally, flexibility occurs when infants can perceive the applicability of their actions to new and different situations (for example, being presented with a new doll, and realizing that the feeding and dressing behaviors learned with one doll generalize to the new doll).

The revision of perceptual differentiation as applied to MSDs.
The revised version of perceptual differentiation theory might apply to the adult learner and the adult learner of MSDs as a case in point. Agency, in this case, might involve engaging actively in the lab work done outside of class to practice the actions involved in applying previously learned perceptual information from course materials into applied situations. Prospectivity might be applied to studying lab and lecture notes in anticipation of the upcoming post-test at the end of the semester. Finally, flexibility might be applied in transferring perceptual learning from a case study of MSDs with which a student is familiar, to a more novel case.

Purposes of the Current Study
This study aimed to:
1. Determine if perceptual skills related to distinguishing MSDs improved after taking an MSD course by comparing pre- and posttest perceptual skills performance.
2. Investigate whether there was a difference in perception between familiar items, which were specifically taught in didactic instruction and did not require generalization of knowledge, and novel items, which were not specifically taught in the course, were unfamiliar to participants, and required generalization of knowledge.
3. Investigate whether skill in generalizing to perceive MSDs varied with membership in the top or bottom half of the class of learners, as defined in the method section, based on two different comparisons: (a) large and small gainers, who made relatively large as compared to small score gains from pre- to posttest, and (b) high and low scorers, who scored relatively high as compared to low on the perceptual posttest.
4. Investigate the potential relationship between posttest perceptual skills performance and general academic performance, a measure of didactic knowledge acquisition as indicated by overall course grade.
5. Investigate participants' subjective ratings of difficulty in completing the perceptual skills task, as measured by Likert scale ratings of 1-10 taken both pre- and posttest, and whether changes in subjective ratings were related to changes in perceptual skills and to course grades.
Hypotheses. The following were hypothesized:
1. Perception scores will increase from pre- to posttest.
2. As a group, participants will perform better in familiar as opposed to novel conditions overall.
3. In contrast, those at the top of the class may do better than those at the bottom on novel as opposed to familiar items.
4. Course grades will be related to perception score.
5. Subjective ratings of difficulty will be related to perception score.

Participants
Participants in this study were 13 Master's students, ages 22-30, who were enrolled in a first-year Master's-level MSD course within a university SLP program. All of the participants were Caucasian females. Two semesters earlier, each participant had passed a bilateral, pure-tone, air conduction hearing screening conducted by a fellow master's student as part of a course, taught by an ASHA-certified audiologist, taken during the first semester of the graduate program. The screenings used pure tones presented over headphones in an audiology booth or a quiet therapy room at the frequencies of 1000, 2000 and 4000 Hz at 25 dB HL, in accordance with ASHA screening standards for adult hearing screenings (ASHA, 1997). Students are required to report any illnesses to clinical supervisors and, in the interim, none had reported any hearing difficulties. No participant had previously taken a specific course dedicated to MSD; all had previously taken two semesters of graduate-level coursework including Phonetics, Articulation Disorders, and Aphasia. A waiver of informed consent was applied for and granted through the first author's Institutional Review Board, as is frequently the case with retrospective analyses.

Materials
The BPLALDS (Duffy, 2013b) was used as a measure of perceptual skills related to features that distinguish motor speech disorders. The BPLALDS is presented as a "…crude assessment of listening and diagnostic skills, either at baseline (before any formal learning or clinical instruction/mentoring) or after learning and skill acquisition has presumably taken place" (Duffy, 2013b, p. 2). The BPLALDS is a supplemental program in the form of digital slides similar to PowerPoint (Microsoft Office, 2013). It is made available to instructors using the textbook entitled Motor Speech Disorders (Duffy, 2013d) and is accessible through the Elsevier website.
The BPLALDS contained 44 predesigned PowerPoint-type slides (Duffy, 2013b), each with one speech sample representing a case of a person with either a particular dysarthria type or apraxia of speech. Half (22) of the slides showed embedded video files with accompanying audio and half (22) contained audio-recorded files without video. A total of 172 questions accompanied the entire BPLALDS presentation. Each predesigned slide showed between two and six questions on the following topics: speech characteristics, confirmatory signs, underlying neuropathophysiology, neurological localization, and/or traditional MSD diagnosis as described by Darley and colleagues (1969a, 1969b, 1975). Speech characteristics responses included terms such as "fast, slow, harsh, irregular, quiet." Confirmatory signs included terms such as "masked facies" and "fasciculations." Underlying neuropathophysiology could have included "weakness, spasticity or incoordination." Neurological localization may have included "Unilateral Upper Motor Neuron (UUMN), Basal Gangliar Control Circuit, or Cranial Nerve VII," for example. Traditional MSD diagnosis would have required the name of one of the dysarthria types or apraxia of speech. Each of these characteristics is relevant to the perceptual diagnosis of motor speech disorders. Some were directly related to perceptual skill, whereas others used perceptual skill in higher order manners (Bruner, 1958; Gibson, 1995; Sapir, 1921; Whorf, 1956) in order to categorize motor speech type or associate localization or confirmatory signs. MSD diagnosis types were expected to be classified in terms of the traditional dysarthria types. An example of a BPLALDS slide shows a video file of a woman with dysarthria with five corresponding questions: "13. What term best describes speech rate? 14. What term describes voice quality? 15. What term describes articulation? This person has two types of dysarthria. 16. Name one. 17. Name another" (Duffy, 2013d, Slide #3, questions 13-17). BPLALDS response sheets were in the form of an open-ended list numbered from 1-172, which required, in most cases, the correct word(s) to be recalled and written in each blank. A small number of the 172 questions (n = 2, or 1.2%) were in a multiple-choice format.
Although 29 of the samples also appeared in the online student training materials and were therefore relatively familiar to participants on posttesting, the remaining 15 samples were unique to the BPLALDS and thus "novel" to the pre- and posttesting conditions. Categorization of the samples as unique (n = 15) or not (n = 29) was determined by the first author through comparison of the student training materials and the BPLALDS; correct categorization of items was later verified (J. Duffy, personal communication, August 13, 2016). This categorization was important for later assignment of stimuli to "familiar" and "novel" conditions for statistical analysis.
An answer key was provided as part of the supplemental course materials exclusively for use by the instructor; participants did not have access to the answer key.
A Dell Optiplex 755 with Internet access was used to project the BPLALDS PowerPoint to a large 64" Aquos 1080P flat-panel television screen. An Altec Lansing multimedia computer surround sound speaker system (model number ACS 400) with adjustable volume was used to present the audio signal associated with each clip.
Outside of class, participants had access to an online program that supplemented the Duffy (2013d) text entitled, Developing Perceptual & Diagnostic Skills (DPDS; Duffy 2013c), "intended to guide the acquisition of auditory and visual perceptual skills necessary to describing motor speech disorders (MSDs) and understanding their meaning" (Duffy, 2013a, p. 1).
For the didactic portion of the course, oral lectures were supplemented with PowerPoint presentations. The presentations contained written notes, figures, tables, and at least one example for each dysarthria type and apraxia of speech from the samples provided in the DPDS student training materials. Participants were also required to read the course textbook (Duffy, 2013d). Furthermore, they had access to instructor-created PowerPoint lecture outlines associated with each lecture. Links to the lecture outlines were provided via the Moodle 2.0 (Moodle Pty LTD, 2016) open-source course delivery platform.

Research Design
This study used two different designs (Orlikoff, Schiavetti, & Metz, 2015). The first was a within-subjects, repeated-measures design used to investigate whether there were differences in scores on the BPLALDS before and after having taken an MSD course. A within-subjects design was also used to compare posttest scores of all participants to academic grades of the same participants. The second was a mixed between-within subjects, posttest-only design used to compare the same group of participants, divided into lower and upper halves of the class (a between-groups variable), in terms of their responses to novel and familiar stimuli (a within-groups variable) on a posttest measure of accuracy in perception score as measured by the BPLALDS. Within this second design, there were two independent variables (IVs) and one dependent variable (DV). The IVs were (1) familiar and novel conditions, in which all participants were exposed to two distinct types of stimuli (items to which they had previously been exposed and items to which they had not been exposed), and (2) the grouping variable, top and bottom halves of the class, defined in two different ways (those who made large as compared to small gains from pre- to posttest and those who made high as compared to low scores on the BPLALDS posttest). The DV was percentage accuracy score on the BPLALDS posttest (see Procedure below). A control group was not used due to concerns regarding the practicality and ethics of using a "no treatment" group with participants.
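As an illustration only, the gain-based grouping variable described above can be sketched in Python. The participant labels and scores below are hypothetical and are not data from this study, and the function name is introduced only for the example. The rule for assigning the middle participant when the class size is odd is also an assumption; the source does not state it in this section.

```python
from typing import Dict, List, Tuple


def split_by_gain(
    pre: Dict[str, float], post: Dict[str, float]
) -> Tuple[List[str], List[str]]:
    """Split participants into large and small gainers by pre-to-post gain.

    Assumption: with an odd number of participants, the middle participant
    is placed in the small-gain group. The source does not specify this.
    """
    # Gain is the posttest score minus the pretest score for each participant.
    gains = {pid: post[pid] - pre[pid] for pid in pre}
    # Rank participants from largest gain to smallest gain.
    ranked = sorted(gains, key=lambda pid: gains[pid], reverse=True)
    half = len(ranked) // 2
    return ranked[:half], ranked[half:]


# Hypothetical percentage scores for five made-up participants.
pre_scores = {"P1": 10, "P2": 20, "P3": 15, "P4": 30, "P5": 12}
post_scores = {"P1": 60, "P2": 40, "P3": 70, "P4": 45, "P5": 50}

large_gainers, small_gainers = split_by_gain(pre_scores, post_scores)
print("Large gainers:", large_gainers)
print("Small gainers:", small_gainers)
```

The same function could be reused for the second grouping (high versus low posttest scorers) by ranking on posttest score alone rather than on gain.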

Procedure
BPLALDS pretest. Before didactic course instruction began, all thirteen participants sat at desks in a quiet classroom with a blank response sheet and a pencil. The BPLALDS was played by the course instructor via the television and speakers. Prior to presentation of each slide, the course instructor called attention to the numbers of response items that would be addressed.
Participants were asked to respond to each of the 172 questions on the previously described topics of (a) speech characteristics, (b) confirmatory signs, (c) underlying neuropathophysiology, (d) neurological localization, and/or (e) traditional MSD diagnosis. Each slide was delivered at approximately the same conversational volume, although minor volume adjustments were made at the request of the participants. Participants completed response sheets independently. Administration of the BPLALDS pretest lasted for a total of approximately one hour. Although BPLALDS pretest scores were not used in the main analyses of variance, they were used to (a) compare overall pre- to posttest scores and (b) determine participants' membership in the category of top or bottom half of the class as defined by amount of gain.
Course instruction. The didactic portion of the course took place over a spring semester and consisted of fourteen individual three-hour-long class periods. Each period included a short break, usually 5-10 minutes, about 1.5 hours into (approximately halfway through) the class period. Lectures were delivered primarily in a traditional lecture format supplemented by PowerPoints and small-group work. DPDS audio and video slides were embedded within some PowerPoint lectures to supplement course content. Each lecture on a specific dysarthria type or apraxia used at least one DPDS video or audio clip. In addition to the traditional lecture format, small-group work similar to Think-Pair-Share methodology (King, 1993) was used in nearly every class period for approximately 5-10 minutes each time. Small-group work required participants to engage in activities such as summarizing, diagramming, and applying knowledge to theoretical frameworks.
BPLALDS posttest. After didactic course instruction was completed, the pretest procedure was carried out again as a posttest. All pre- and posttest data were gathered as part of the typical activities of the course. As a result, permission to analyze the data for the purposes of conducting this study was obtained retrospectively from the IRB at the authors' home institution.

Data scoring and analysis.
Perception scores. Perception scores were derived by the first author (course instructor) by comparing responses to the answer key. Items that were not identical to, or a reasonable synonym for, an answer provided on the answer key were marked wrong. The number of items correct on each response sheet was tallied and divided by the total number possible (N = 172) to derive a percentage of items correct, or "perception score." Perception scores were then entered into a Microsoft Excel (2013) spreadsheet to derive descriptive statistics such as measures of central tendency and variance. Next, pre- and posttest perception scores were compared using a Wilcoxon signed-ranks test.
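The scoring and comparison steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' Excel or SPSS workflow: the function names and the raw scores are hypothetical, and zero differences are dropped before ranking, which is one common convention for the signed-ranks test.

```python
from typing import List, Sequence, Tuple

TOTAL_ITEMS = 172  # number of BPLALDS response items reported in the text


def perception_score(items_correct: int, total_items: int = TOTAL_ITEMS) -> float:
    """Percentage of items correct on one response sheet."""
    return 100.0 * items_correct / total_items


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]


def signed_rank_sums(pre: Sequence[float], post: Sequence[float]) -> Tuple[float, float]:
    """Return (W+, W-), the rank sums of positive and negative differences.

    Zero differences are dropped before ranking (an assumed convention).
    """
    diffs = [b - a for a, b in zip(pre, post) if b != a]
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return float(w_plus), float(w_minus)


# Hypothetical raw scores (items correct) for 13 participants; NOT study data.
pre_items = [30, 28, 35, 40, 25, 33, 31, 36, 29, 27, 38, 34, 32]
gains = [14, 91, 40, 55, 60, 45, 70, 35, 50, 65, 30, 75, 80]
post_items = [p + g for p, g in zip(pre_items, gains)]

w_plus, w_minus = signed_rank_sums(pre_items, post_items)
print(perception_score(86))  # 86 of 172 items correct -> 50.0
print(w_plus, w_minus)       # every participant improved -> 91.0 0.0
```

With 13 participants, the largest possible rank sum is 13 × 14 / 2 = 91, which is reached only when every nonzero difference is positive. Statistical packages differ in which rank sum they report as the test statistic, so the label attached to this value may vary.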

Familiar vs. novel conditions comparison.
After the basic analysis of scores as a whole, a post-hoc analysis of familiar and novel stimuli was conducted. The items were broken into two groups. The first group was labeled "familiar" because it contained items to which participants had had exposure within instructional materials, specifically within the instructional labs. The second group was labeled "novel" because it contained items to which participants had had no exposure during the course; that is, the items were unique to the BPLALDS. These two groups of stimuli were assumed to require two different types of learning: (1) familiar, which required "stereotyped" application of perceptual knowledge resulting from previous exposure to the stimulus, and (2) novel, which required "flexibility" of application of perceptual knowledge even with a lack of previous exposure to the stimulus (Gibson, 1970). This stereotyped type of knowledge is often referred to as "memorization," whereas flexible knowledge requires a deeper understanding (Bransford, Brown, & Cocking, 1999). The Statistical Package for the Social Sciences (SPSS) was then used to run an independent-samples t-test to compare accuracy of responses in the familiar and novel conditions. Two separate two-way analyses of variance (ANOVAs) were then conducted to determine whether there were differences in the percentage accuracy of responses on the posttest (DV) in the familiar and novel stimulus conditions (within-subjects factor with two levels) across the top and bottom halves of the class of participants (between-subjects factor with two levels). The between-subjects factor was determined in two different ways, so two different analyses were conducted, one using each method. The first analysis of variance broke participants into two groups that varied in terms of overall gain in perceptual scores, roughly the top and bottom halves of the class in terms of gain.
The top half was composed of those who improved perceptual ability relatively more (those who gained 30 percentage points or more; n = 7; labeled "large gainers"), and the bottom half was composed of those who improved relatively less (those who gained less than 30 percentage points; n = 6; labeled "small gainers"). The second analysis of variance again broke participants into two groups, roughly the top and bottom halves of the class, but this time in terms of overall posttest score. These groups varied in terms of percentage correct on the posttest regardless of the amount of gain made. The first group had high posttest perception scores (50% or above; n = 7), and the second had low posttest perception scores (below 50%; n = 6), regardless of initial score (see Table 1).
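The two grouping rules can be expressed as a short sketch. The cutoffs (30 percentage points of gain and a posttest score of 50%) come from the description above; the function name and the example scores are hypothetical and are not participant data.

```python
from typing import Tuple

GAIN_CUTOFF = 30.0      # percentage points of gain, per the text
POSTTEST_CUTOFF = 50.0  # posttest percentage correct, per the text


def classify(pre_pct: float, post_pct: float) -> Tuple[str, str]:
    """Return the (gain group, posttest group) labels for one participant."""
    gain = post_pct - pre_pct
    gain_group = "large gainer" if gain >= GAIN_CUTOFF else "small gainer"
    post_group = "high posttester" if post_pct >= POSTTEST_CUTOFF else "low posttester"
    return gain_group, post_group


# Hypothetical participants; NOT study data.
print(classify(20.0, 55.0))  # ('large gainer', 'high posttester')
print(classify(25.0, 52.0))  # ('small gainer', 'high posttester')
print(classify(15.0, 47.0))  # ('large gainer', 'low posttester')
```

The second and third examples show why the two definitions of "top half" need not select exactly the same participants: a high starting score can yield a high posttest score with a modest gain, and a low starting score can yield a large gain that still ends below 50%.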
[Insert Table 1 here]
Course grades. Course grades, which were unrelated to the data collected in this study, were based on the number of points participants earned out of a total of 360 possible points. Three exams composed of multiple-choice and short-answer questions were worth 100 points each, a case study project was worth 50 points, and 10 points could be earned for class participation. The strength of the relationship between course grades and posttest perception scores was investigated using a correlation analysis to derive a Spearman's rank-order correlation coefficient, a nonparametric measure of relationship used for these data due to the small sample size.
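As an illustration of the grade calculation and the rank-order correlation, the following minimal Python sketch converts points earned into a course percentage and computes Spearman's coefficient as the Pearson correlation of the ranks. The function names and the four example pairs are hypothetical; the study itself does not report which software computed the coefficient.

```python
import math
from typing import List, Sequence

TOTAL_POINTS = 3 * 100 + 50 + 10  # three exams, case study, participation = 360


def course_percentage(points_earned: float, total_points: int = TOTAL_POINTS) -> float:
    """Course grade as a percentage of the total possible points."""
    return 100.0 * points_earned / total_points


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank-order correlation: Pearson's r computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical grade/posttest pairs; NOT study data.
grades = [70.0, 80.0, 90.0, 85.0]
posttest = [40.0, 55.0, 70.0, 50.0]

print(course_percentage(243))                   # 67.5
print(round(spearman_rho(grades, posttest), 2))  # 0.8
```

Because the coefficient depends only on rank order, it is unaffected by whether grades are expressed as points or percentages.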

Subjective ratings of difficulty.
Participants were asked to quantify their subjective impression of the difficulty of the pre- and posttest tasks. After completing each task, they were asked to rate its difficulty by writing a number at the top of their paper on a scale of 1 to 10, with 1 being extremely easy and 10 being extremely difficult. Difficulty ratings were then categorized into four groups: (1) very difficult (a score of 8, 9, or 10); (2) moderately difficult (a score of 6 or 7); (3) moderately easy (a score of 4 or 5); and (4) very easy (a score of 1, 2, or 3). Descriptive statistics were derived for pre- and posttest difficulty ratings.
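The four-category coding of ratings can be sketched as follows. The category boundaries are those given above; the function name and example ratings are hypothetical, and assigning a fractional rating to the band whose lower bound it meets is an assumption, since the categories are defined only for whole numbers.

```python
from collections import Counter


def difficulty_category(rating: float) -> str:
    """Map a 1-10 difficulty rating to one of the four categories in the text."""
    if not 1 <= rating <= 10:
        raise ValueError("rating must be between 1 and 10")
    if rating >= 8:
        return "very difficult"
    if rating >= 6:
        return "moderately difficult"
    if rating >= 4:
        return "moderately easy"
    return "very easy"


# Hypothetical ratings; NOT study data.
ratings = [9, 8, 7, 6, 5, 4, 3]
counts = Counter(difficulty_category(r) for r in ratings)
print(counts["very difficult"], counts["moderately difficult"],
      counts["moderately easy"], counts["very easy"])  # 2 2 2 1
```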
Interjudge agreement regarding data scoring. Fifteen percent (n = 4) of the perception scores originally derived by the first author were randomly selected and rescored by the second author, resulting in an interjudge agreement score of 90.99%.

Statistical Analyses
Description of perception scores. The mean pretest perception score was 18.88% (SD = 4.0%). The mean posttest perception score was 50.18% (SD = 13.2%). On average, participants gained 53.84 points, or 31.3 percentage points, from pretest to posttest. Gains across individuals ranged from 14 points (8.2%) to 91 points (52.9%). Additional descriptive analyses of individual participant performance revealed that the person with the highest pretest score did not make the largest gain and yet ended up in the group with the highest posttest scores. Furthermore, the person with the largest gain (91 points) gained more points than some participants earned in total on the posttest; the participant with the lowest posttest score, for example, earned 51 points.
[Insert Figure 1 here]
Perception score comparison. A Wilcoxon signed-ranks test indicated that posttest scores were significantly higher than pretest scores, T = 91, p < .005. A large to very large effect size for this difference (Cohen, 1988; Rosenthal, 1994), r = .64, was calculated using r = Z / √N (Field, 2013; Pallant, 2007; Rosenthal, 1994).
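The effect-size formula r = Z / √N can be illustrated with a short sketch. Several assumptions are made here that the text does not state: Z is obtained from the normal approximation to the signed-ranks statistic without tie or continuity correction, and N is taken as the total number of observations across both tests (26), which is one common convention. The function names are hypothetical.

```python
import math


def wilcoxon_z(rank_sum: float, n_pairs: int) -> float:
    """Normal-approximation Z for a signed-rank sum (no tie or continuity correction)."""
    mean = n_pairs * (n_pairs + 1) / 4
    sd = math.sqrt(n_pairs * (n_pairs + 1) * (2 * n_pairs + 1) / 24)
    return (rank_sum - mean) / sd


def effect_size_r(z: float, n_observations: int) -> float:
    """Effect size r = |Z| / sqrt(N)."""
    return abs(z) / math.sqrt(n_observations)


n_pairs = 13                  # participants, per the text
n_observations = 2 * n_pairs  # assumed: pretest plus posttest observations

z = wilcoxon_z(91, n_pairs)   # rank sum of 91, per the text
print(round(z, 2))                                 # 3.18
print(round(effect_size_r(z, n_observations), 2))  # 0.62
```

Under these assumptions the sketch yields a value close to, but not identical with, the reported r = .64; the difference may reflect a tie correction or other software settings, which the sketch does not attempt to reproduce.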

Familiar vs. novel conditions comparison.
Response accuracy of all participants on each item (1 = correct and 0 = incorrect) was then compared between the familiar condition (M = .51, SD = .50) and the novel condition (M = .48, SD = .50) using an independent-samples t-test, which revealed no significant difference in overall response accuracy, t(2234) = -1.34, p = .182. That is, as a group, participants responded with similar accuracy to familiar items and novel items.
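The degrees of freedom and standard deviations reported for this item-level comparison follow from simple arithmetic, sketched below. The sketch assumes that every response from every participant entered the analysis as one observation, which is consistent with the reported degrees of freedom; the function names are hypothetical.

```python
import math

N_PARTICIPANTS = 13  # reported in the text
N_ITEMS = 172        # reported in the text


def item_level_observations(n_participants: int, n_items: int) -> int:
    """Number of scored responses when each participant answers each item."""
    return n_participants * n_items


def error_df(n_observations: int, n_cells: int) -> int:
    """Error degrees of freedom when observations are split into n_cells groups."""
    return n_observations - n_cells


def binary_sd(proportion_correct: float) -> float:
    """Population SD of a 0/1 variable whose mean is proportion_correct."""
    return math.sqrt(proportion_correct * (1.0 - proportion_correct))


n_obs = item_level_observations(N_PARTICIPANTS, N_ITEMS)
print(n_obs)                      # 2236
print(error_df(n_obs, 2))         # 2234, matching t(2234)
print(round(binary_sd(0.51), 2))  # 0.5
print(round(binary_sd(0.48), 2))  # 0.5
```

Because each response is scored 0 or 1, the standard deviation is fixed by the proportion correct and stays near .50 whenever accuracy is close to 50%.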

Posttest comparisons of top vs. bottom half of the class's performance with novel and familiar stimuli.
The top and bottom halves of the class were compared in terms of their performance in perceiving familiar stimuli, to which they had been exposed, and novel stimuli, to which they had not been exposed. This comparison was made in two different ways. First, "large gainers" on the posttest were compared to "small gainers," and then "high posttesters" were compared to "low posttesters." Generally, the two groupings were composed of the same participants; that is, the large gainers were generally also high posttesters, and the low posttesters were generally also small gainers (see Table 1).

Large gainers compared to small gainers.
In the first of two two-way analyses of variance, participants who made larger gains from pre- to posttest were compared to those who made smaller gains in terms of their performance on familiar as compared to novel stimuli on the posttest (see Figure 2).
[Insert Figure 2 here]
The interaction effect was not significant, F(1, 2232) = 3.72, p = .054, ηp² = .002, indicating that the difference between small and large gainers was not significantly greater in the novel-items condition than in the familiar-items condition on the posttest. A main effect was found for familiar vs. novel test items, F(1, 2232) = 9.10, p = .003, ηp² = .004, such that the average score for familiar items (M = .52, SD = .50) was significantly higher than the average score for novel items (M = .46, SD = .50). The main effect of group was also significant, F(1, 2232) = 81.45, p < .001, ηp² = .035, across small gainers (M = .40, SD = .49) and large gainers (M = .58, SD = .49). The effect sizes for the main effect of familiar and novel items and for the interaction effect were small, whereas the effect size for the main effect across groups that made small and large gains was large (Cohen, 1988). Although the interaction was not significant (p = .054), the data showed an overall trend in which those who made larger score gains were also the better generalizers.
High as compared to low posttesters. In the second two-way analysis of variance, the interaction effect was significant, F(1, 2232) = 20.84, p < .001, ηp² = .009, indicating that those who made lower posttest scores did significantly worse than those who had higher posttest scores, and that this difference was significantly greater in the novel-items condition than in the familiar-items condition (see Figure 3).
[Insert Figure 3 here]
A main effect was found for familiar vs. novel test items, F(1, 2232) = 5.75, p = .02, ηp² = .003, such that the average score for familiar items (M = .51, SD = .50) was significantly higher than the average score for novel items (M = .48, SD = .50). The main effect of group was also significant, F(1, 2232) = 89.53, p < .001, ηp² = .039, across the low posttester (M = .40, SD = .49) and high posttester (M = .58, SD = .49) groups. Similar to the previous analysis of variance, the effect sizes for the main effect of familiar and novel items and for the interaction effect were both small, whereas the effect size for the main effect across groups with low vs. high posttest scores was large (Cohen, 1988).
Taken together, these two comparisons of the top and bottom halves of the class indicate a group difference in perceptual ability performance on items which require generalization and those that do not.
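For readers who wish to check the reported effect sizes, partial eta squared can be recovered from each F ratio and its degrees of freedom using the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error). The sketch below applies that identity to the F values reported in the two analyses of variance; the function and dictionary names are hypothetical.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


DF_EFFECT, DF_ERROR = 1, 2232  # degrees of freedom reported in the text

# F values reported in the text, keyed by analysis and effect.
reported_f = {
    "gain groups: interaction": 3.72,
    "gain groups: familiar vs. novel": 9.10,
    "gain groups: group": 81.45,
    "posttest groups: interaction": 20.84,
    "posttest groups: familiar vs. novel": 5.75,
    "posttest groups: group": 89.53,
}

for label, f_value in reported_f.items():
    eta = partial_eta_squared(f_value, DF_EFFECT, DF_ERROR)
    print(f"{label}: {eta:.3f}")
```

Rounded to three decimals, the recovered values are .002, .004, .035, .009, .003, and .039, which agree with the values reported above.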
Description of course grades. Overall course grades averaged 82.48% (SD = 6.95). The lowest course grade was a 67.5%, D, and the highest grade was a 93.33%, A.

Comparison of difference scores to final grades.
In comparing pre-posttest difference scores to grades in the course, a Spearman's correlation coefficient revealed a strong positive relationship (rs = .64, p < .01). Additional descriptive analysis of grades as compared to perceptual scores revealed that those with the two highest academic grades in the course also ended up with the two highest perceptual posttest scores, although neither of these top two participants started with the highest perceptual score on the pretest. In contrast, the one person who received a grade of "D" in the course had the lowest overall posttest score but did not begin with the lowest pretest score.
Description of difficulty-ratings data. Subjective ratings of difficulty averaged 9.46 (SD = .78) with a range of 8-10 for the pretest and 7.5 (SD = 1.79) with a range of 3-9 for the posttest.
Difficulty ratings comparison. All of the pretest subjective difficulty ratings could be categorized as "very difficult." On the posttest, three participants rated the task as "very difficult," but none of them gave it the extreme rating of "10." The most frequent posttest rating was "moderately difficult" (n = 5). Several participants gave a posttest rating of "moderately easy" (n = 4), and one participant rated the task as "very easy."

Examination of the Results
Although ASHA Council on Academic Accreditation in Audiology and Speech-Language Pathology (CAA)-accredited graduate programs in SLP are now required to document students' skills development (ASHA, 2016), it is not always clear how academic course instructors should do so. The results of this study indicate that the development of the gross clinical skills needed for perceptual diagnosis of MSDs is documentable. Both academic knowledge and perceptual skill development were tracked, and a strong correlation between the two was noted. This result indicates that gains in perceptual ability closely co-occurred with gains in overall academic knowledge. Although the array of variables accounting for individual differences in performance was not investigated in this study, one difference, performance on novel as opposed to familiar stimuli, seemed to influence overall perceptual performance as indicated by posttest score. As a group, participants performed similarly across familiar and novel conditions, indicating no overall difference in generalization of knowledge to unfamiliar items. However, analysis of group differences revealed a notable difference in learning. The top half of the class, or "high posttesters," tended to generalize to novel items better than the lower half of the class, or "low posttesters." This finding is consistent with previous research, which indicates that deeper learning occurs when items are not learned strictly through memorization but rather through generalization to similar tokens of a given type (Huberman, Bitter, Anthony, & O'Day, 2014). Memorization involves storing each example of a learned input explicitly, whereas generalization involves storing the concepts and general rules that underlie the input so that they are available for application to future tokens.
Generalization or "transfer" of knowledge to new contexts allows for efficient storage of multiple tokens with similar features, which then allows for prediction of the features that will be present when additional tokens of a certain type are presented (Bruner, 1958; Bransford, Brown, & Cocking, 2000; Bransford & Schwartz, 1999). This is flexibility, the end point of perceptual development to which Gibson (1995) refers, with which learners are able to apply perceptual skills to new experiences in new environments.
Reflecting on theory, the findings from the current study provide support for Gibson's (1969) basic tenets of perceptual differentiation theory in that all students learned a basic set of perceptual features that distinguish MSDs from one another. However, those in the bottom half of the class seemed to have knowledge that was restricted to the stereotyped knowledge they had gained from familiar stimuli. In contrast, those at the top of the class tended to be flexible learners, performing well not only on stimulus items that were familiar but also on novel items. This application of perceptual knowledge shows flexibility in that the knowledge was applied to new cases that the student engaged with, analyzed, and interpreted using a previously learned set of perceptual knowledge. This study revealed that perceptual development involved not only learning to strictly match interpretations within a familiar sample, but also the ability to apply these perceptions flexibly to new and different cases. This perceptual development was likely achieved through a combination of the effective application of cognitive and linguistic resources (Bruner, 1958; Sapir, 1921; Whorf, 1956) and active engagement in course material and lab work (Gibson, 1995). Although specific a priori listener characteristics were not investigated in this study, previous research suggests that several such characteristics influence one's skill in perceiving speech disorders. These include listener familiarity with the speaker and linguistic experience (Kent, 1996) as well as musical training, phonemic awareness, and native language status (Dankovičová et al., 2007; DeBoer & Shealy, 1995; Kreiman et al., 2010; Robinson et al., 2011).
The experiential factors that seem to influence one's skill in perceiving speech disorders include education in speech pathology and specific training in perceiving speech disorders (Gelfer, 1993; Kreiman et al., 1990; Zyski & Weisiger, 1987). Those educational and training factors will certainly continue to be elucidated by ongoing research.
The takeaway for instructors is that students in MSD courses with strong outcomes on measures of perception seem more likely to readily generalize learning to novel cases. In contrast, those who learn solely through memorization may be more likely to generalize poorly. Rather than throwing up our hands in regard to those who are more concrete learners, we as instructors should remain open to the possibility that generalization in more concrete learners can be supported by providing bootstrapping supports. These supports could come in the form of (1) provision of multiple case studies that broaden the number of exemplars of each MSD type, (2) visual organization of each motor speech disorder type to provide symbolic reference for novel cases, (3) practice in making judgments regarding novel cases (e.g., via lab assignments), (4) instructor modeling of the reasoning regarding categorization of familiar and novel exemplars, and (5) setting high expectations regarding outcomes as a means of increasing student motivation to generalize knowledge.
Stated differently, the data from the current study revealed that all students experienced gain, but each showed different amounts of gain. Students' ability to generalize knowledge seemed to help, as did baseline perceptual ability prior to training. It remains unclear to what degree each of these variables accounts for this difference in gain across students; and likely overall ability to learn perceptual skills is influenced by nature and nurture to different degrees in each student.
As perceptual abilities improved, ratings of perceived level of difficulty decreased (improved). In other words, there seemed to be an inverse relationship between perceptual ability and difficulty rating. Although this trend might be expected, posttest difficulty ratings remained near the "difficult" end of the continuum; and although they were less extreme than difficulty ratings at pretest, overall, perceived difficulty did not decrease as much as expected. Furthermore, extent of difficulty rating changes could not be explained by extent of improvement in perceptual skill alone. Confidence seemed to increase in some to a degree greater than their increase in perceptual ability, and vice versa. Some with relatively great increases in perceptual ability did not report a marked decrease in difficulty. This finding is in line with previous research that relatively high clinical confidence can correspond to relatively low competence (Stockman, Boult, & Robinson, 2008). Other research (Pasupathy & Bogschutz, 2013) has found a positive relationship between students' clinical ability and "self-efficacy." Self-efficacy is a construct that can be defined as: "beliefs in one's capabilities to organize and execute the courses of action required to produce given attainments" (Bandura, 1997, p. 3). As competence in perceiving motor speech disorders grows, clinical supervisors will certainly find benefit in monitoring their students' self-efficacy (Pasupathy & Bogschutz, 2013).

Limitations of the Current Study
The current study offered a first look into students' development of the gross perceptual skills needed for perceptual diagnosis of MSDs. However, additional, more controlled studies could employ comparison groups to impose greater experimental control. A lack of a control group in the present study prevents analysis of the reason(s) for the change from pre-to posttest.
An additional criticism might be made that a threat to external validity was introduced by the small and homogeneous sample. As a result, generalization to the larger population of "SLP students" should be approached with caution.

Future Research
Further research could investigate the effect of pedagogical and/or student variables on perceptual outcomes. Would outcomes change given different instructional methods? Furthermore, what additional student variables would affect outcomes?
Additionally, focusing on particular perceptual features within traditional classification schemes may be a promising direction. Alternatives to those perceptual features associated with traditional classification systems could be investigated as well. Some such potential targets were identified by Lansford, Liss, and Norton (2014), who found that students of SLP freely associate specific perceptual dimensions in their classification of dysarthria. Student research participants were asked to listen to dysarthric speakers then drag and drop icons representing each speaker onto a computer screen. In so doing, a visual display of six distinct clusters of speakers, and conceivably, six perceptually distinct MSD types was created. These clusters did not match with Darley, Aronson, & Brown's traditional classifications but with the perceptual distinctions made by the listeners. Further analysis revealed that the six groups could be distinguished on the basis of their differences in use of three perceptual features of (1) rate and rhythm, (2) intelligibility and (3) vocal quality (Lansford et al., 2014). Since these features are so salient in naturally grouping dysarthric speakers, a pedagogical focus on these particular features may aid in generalization of knowledge for students as a whole. That is, all students may learn better with direct instruction to listen for distinctions in those three salient features. Furthermore, different types of learners may benefit differentially from this added focus. The current research suggests that those with better perceptual discrimination as measured by high posttest BPLALDS scores are more likely to generalize than those with poorer discrimination ability. It remains unclear whether each of these groups would benefit equally from instructional methods that focus on those features which are amenable to free classification (i.e., rate, rhythm, intelligibility and vocal quality). 
It may be that honing the perceptual foci of those who do not generalize well can bootstrap them into generalizing better than they would without such instructional support, and may help generalizers to learn even more than they might have.

Conclusion
It appears that the skills needed to perceptually diagnose MSDs are amenable to course instruction. Future research within the Scholarship of Teaching and Learning will be needed to determine the pedagogical methods and student variables that lead to optimal outcomes. Students' ability to generalize underlying concepts, from stimuli with which they have had experience to novel stimuli, seems to be an essential dynamic within the learning and teaching process.

Figure 2. Post-test scores of students who made large and small gains by novelty of stimuli. This figure illustrates the differences in performance of two groups (those who made relatively small and those who made relatively large gains across pre- and posttests) in terms of their scores as measured by percentage correct (on the y-axis) on two types of stimulus items (those that required generalization (GEN) due to being novel and those that did not require generalization (NOT) due to being familiar). Note: This comparison includes student #10 in the large gain group (LRG), and student #6 in the small gain group (SML).

Figure 3. Post-test scores of students who had large and small post-test scores by novelty of stimuli. This figure illustrates the differences in performance of two groups (those who received relatively small and those who received relatively large post-test scores) in terms of their scores (on the y-axis) on two types of stimulus items (those that required generalization due to being novel (GEN) and those that did not require generalization due to being familiar (NOT)). Note: This comparison includes student #6 in the high final score group (LRG), and student #10 in the low final score group (SML).