Introduction

A cleft lip and palate (CLP) is a relatively common congenital malformation, with an incidence of about 1 in 700 newborns in the Caucasian population [1]. While an isolated cleft lip is primarily an aesthetic problem, complete CLP may cause velopharyngeal insufficiency (VPI), which can interfere with speaking, breathing, and swallowing. CLP is often associated with an articulation disorder that is generally regarded as compensation for an anatomical abnormality leading to VPI. This may lead to dysfunctions not only of the velopharyngeal sphincter, but also of the entire vocal tract [2]. CLP-associated VPI typically leads to deviations in resonance such as hypernasality or hyponasality, nasal air emission, weak pressure consonants, and compensatory articulation [3].

VPI is an eminently clinical diagnosis. Any surgical intervention to correct the underlying anatomy should be planned based on the combination of video-naso-pharyngoscopy and multiplanar videofluoroscopy. Magnetic resonance imaging is an emerging diagnostic tool but has not yet been widely adopted [4, 5].

Typically, speech and language pathologists assess articulation placement and manner, and examine the oral cavity and the pharynx through direct vision and palpation of the hard and soft palate [6]. However, the most important diagnostic procedure is a subjective evaluation by speech and language pathologists, who assess hyper- or hyponasality, nasal air emission and/or turbulence, consonant production errors, and voice disorders. For this subjective evaluation, the patient’s language background and age are important.

A more objective method may be the so-called nasometry, which measures nasalance instrumentally and can provide objective data for evaluating nasal resonance.

Different centers may use different speech parameters in testing, and therefore, their results are not always comparable [7]. Several assessment protocols have been described [8–10], but none of them has been widely adopted. The most accepted assessment protocol was developed by an international working group. Henningsson et al. [11] reported a universal system for reporting speech outcome measures. The system includes five universal characteristics: hypernasality, hyponasality, nasal air emission and/or turbulence, consonant production errors, and voice disorders. Concerning the grading of hypernasality, mild forms are appreciated primarily in certain vowels and may not be socially disturbing to the patient or family. Moderate hypernasality is audible with most vowels and deemed socially unacceptable. Severe hypernasality would usually prompt a recommendation for intervention by the patient and the clinician, because speech intelligibility is significantly diminished [11]. A standard test regimen for the perceptual nasality evaluation is routinely performed using specific test items, like those from the Heidelberg Rhinophonia assessment form [12]. Typically, for perceptive nasality assessment, a scale with three to five grades is used. Recently, Baylis et al. [13] reported that visual analog scales are more accurate than documenting nasality within only a few categories. Many studies have investigated instrumental, objective diagnosis, which typically provides a higher diagnostic resolution than the few assessment classes used by speech therapists [1, 14–16].

Our goal was to determine possible differences between perceptual and instrumental measurements in eastern Austria. We selected various items of the Heidelberg Rhinophonia assessment form and determined their nasalance scores on the NasalView® System to explore their potential to assist the perceptive nasality assessment.

Materials and methods

Patients

We recruited 39 patients with a repaired cleft palate who grew up in eastern Austria. All patients (or their parents) provided written informed consent to their study-specific video/voice recording and instrumental nasalance assessment. All recruited patients consented to the electronic storage of their speech recordings and personal data, and to the use of their assessments for scientific purposes. Because our study focused on hypernasality after cleft palate repair, we excluded three patients with obvious nasal obstructions, e.g., due to acute infections, and in one case cul-de-sac resonance. There were no further inclusion criteria, such as age or gender.

Perceptual nasality assessment

We selected four vowel items from the Heidelberg Rhinophonia test form, two words with fricatives and plosives, an oral sentence (without any nasal consonants), and one sentence with eight nasal consonants (Table 1). We termed the four items without nasal content (#1, #3, #6, #7) “oral” and the four items with nasal content (#2, #4, #5, #8) “nasal”.

Table 1 Selected 8 items from the Heidelberg Rhinophonia assessment form

During the testing, a video with sound was recorded for each patient. Two speech therapists (one experienced and one trainee) assessed the perceptive nasality of the patients. We provided both evaluators with identical video recordings to reduce the burden for the patients and to minimize any variance in the patients’ presentations. The evaluators’ assessments were categorized in four grades (grade 0: normal, 1: mild, 2: moderate, and 3: severe hypernasality) as proposed by an international working group [11].

Instrumental nasalance evaluation

Nasalance was measured with the NasalView® System (Version 1.2, Tiger Electronics DRS Inc., Seattle, WA, USA). The instrument was calibrated according to the manufacturer’s instructions. The instrumental measurements and the video speech recordings were made during the same appointment. To compare the instrumental and the perceptive assessments, we used the same speech items for the NasalView® measurements and the perceptive evaluation. For each of the eight speech items, the mean nasalance values per test person were statistically analyzed.

Data processing and statistics

For the statistical analyses, we used Stata (Version 13.1, StataCorp, TX, USA) to compute means and standard deviations. The interrater reliability between the two speech therapists was analyzed by computing Cohen’s kappa with linear weighting [17], and the agreement was found to be “almost perfect”. Although the analysis showed no difference between the novice and the experienced observer, we plotted only the diagnostic groups of the experienced speech therapist against the NasalView® measurements in box plots (Figs. 1, 2) to avoid intermediate scores.
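The linearly weighted kappa can be illustrated with a short sketch. This is not the Stata routine used in the study; it is a minimal pure-Python version under the assumption that both raters assign integer grades 0 to 3 and that the disagreement weight between grades i and j is |i − j| / (k − 1). The function name and the example ratings are hypothetical.

```python
from collections import Counter


def weighted_kappa_linear(rater_a, rater_b, n_categories=4):
    """Cohen's kappa with linear weights for ordinal grades 0..n_categories-1.

    Illustrative sketch only; the function name and defaults are assumptions.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must grade the same, non-empty set of patients")
    n = len(rater_a)
    k = n_categories
    observed = Counter(zip(rater_a, rater_b))  # joint counts of grade pairs
    marg_a = Counter(rater_a)                  # marginal counts, rater A
    marg_b = Counter(rater_b)                  # marginal counts, rater B

    obs_disagreement = 0.0
    exp_disagreement = 0.0
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)      # linear disagreement weight
            obs_disagreement += weight * observed[(i, j)] / n
            exp_disagreement += weight * (marg_a[i] / n) * (marg_b[j] / n)

    if exp_disagreement == 0.0:
        # Both raters used one and the same grade throughout.
        return 1.0
    return 1.0 - obs_disagreement / exp_disagreement


# Made-up grades for eight patients (not study data).
experienced = [0, 1, 1, 2, 3, 2, 0, 1]
trainee = [0, 1, 2, 2, 3, 2, 0, 1]
print(weighted_kappa_linear(experienced, trainee))
```

With linear weights, a disagreement of one grade (e.g., mild vs. moderate) is penalized less than a disagreement of three grades (normal vs. severe), which suits an ordinal scale.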

Fig. 1 Box plots perceptual/instrumental. Plot of the four diagnostic nasalance groups against the NasalView® results. Box plots with medians, quartiles, minimum and maximum, and outliers of the eight speech items #1, #2, #3, #4, #5, #6, #7, #8

To discard suboptimal speech items, we tested the individual items and item groups by receiver-operating characteristic (ROC) analyses, as done before by Bressmann [18]. Because ROC analysis evaluates a binary classifier [19], we transformed the ordinal data of the perceptive tests into two categories. The group rated “0” was defined as “normal nasality”, and the groups rated “1”, “2”, and “3” were combined into “hypernasality”. Figures 3 and 4 show the ROC curves.

In the next step, to describe how well the test separates the groups with and without hypernasality, we measured the accuracy by the area under the ROC curve (AUC). The AUC summarizes how well the nasalance scores discriminate between the two groups: a value of 0.5 corresponds to chance and a value of 1.0 to perfect separation. We computed the cut points and the AUCs, and determined the sensitivity and specificity for each of the eight individual speech items and various speech item groups (Table 2).
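The dichotomization and the ROC-based measures described above can be sketched as follows. This is an illustration in plain Python, not the Stata analysis used in the study. All function names and the example values are hypothetical. The sketch assumes that a score at or above the cut point is classified as hypernasal and that the cut point is chosen to maximize the fraction of correctly classified patients; the source does not specify the exact selection rule.

```python
def dichotomize(grades):
    """Map perceptual grades 0-3 to 0 (normal) or 1 (hypernasal)."""
    return [1 if g >= 1 else 0 for g in grades]


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sens_spec(scores, labels, cutpoint):
    """Sensitivity and specificity when score >= cutpoint means hypernasal."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutpoint)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < cutpoint)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutpoint)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= cutpoint)
    return tp / (tp + fn), tn / (tn + fp)


def best_cutpoint(scores, labels):
    """Lowest observed score that maximizes the fraction correctly classified
    (an assumed selection rule, used here only for illustration)."""
    best_cut, best_correct = None, -1
    for cut in sorted(set(scores)):
        correct = sum(
            1 for s, y in zip(scores, labels) if (s >= cut) == (y == 1)
        )
        if correct > best_correct:
            best_cut, best_correct = cut, correct
    return best_cut


# Made-up example: perceptual grades and nasalance scores (%) of six patients.
grades = [0, 0, 1, 2, 3, 1]
nasalance = [20, 35, 30, 50, 60, 40]
labels = dichotomize(grades)

cut = best_cutpoint(nasalance, labels)
print(auc(nasalance, labels), cut, sens_spec(nasalance, labels, cut))
```

In this toy data set, the lowest cut point with maximal accuracy detects every hypernasal patient but misclassifies one of the two normal patients, which mirrors the trade-off between high sensitivity and low specificity.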

Table 2 Correlation of subjective and objective speech evaluations

Results

Participants

Thirty-six participants contributed to the data set for this study. The age range was from 8 to 27 years (mean ± SD: 15.4 ± 5.3); 13 participants were female (36.1%) and 23 were male (63.9%). Most patients (86.1%) were CLP patients (N = 31; 11 female, 20 male). Five patients (13.9%; 3 female, 2 male) had an isolated cleft palate. Of the 31 CLP patients, the cleft was on the left side in 15 cases (48.4%), on the right side in 8 patients (25.8%), and on both sides in 8 patients (25.8%). In our study cohort, we observed no statistically significant differences with respect to gender, location, and type of the malformation (data not shown). Of the patients, 22.2% had normal nasality and 77.8% were hypernasal.

Interrater agreement

The interrater reliability between the two speech therapists resulted in a weighted kappa of 0.8816 (“almost perfect” agreement, κw = 0.81–1.0).

Perceptive vs. instrumental diagnosis

The data obtained with the NasalView® System were categorized based on the four perceptual assessment groups: group 0 (N = 8, normal), group 1 (N = 14, mild), group 2 (N = 9, moderate), and group 3 (N = 5, severe hypernasality). Figure 1 shows the instrumentally measured nasalance distributions for each perceptually diagnosed grade per single speech item. The nasal items #2, #4, and #5, and the oral item #7 showed the best linear correlation with the four perceptual grades (Fig. 1; Table 2). Figure 2 shows the results of the nasalance distance (ND) and the nasalance ratio (NR) from speech items #7 and #8, of all four oral and all four nasal speech items, and of all eight speech items together. Among all groups of four or eight items, only the nasal group yielded results similar to those of the four best single items (Fig. 2; Table 2).

Fig. 2 Grouped speech items. Box plots of the four diagnostic groups; median, quartiles, minimum and maximum, and outliers of ND (a) and NR (b) of the speech items #7 and #8; group score of oral speech items #1, #3, #6, and #7 (c); group score of nasal speech items #2, #4, #5, and #8 (d); and group score of all eight speech items (e)

Figure 3 shows the ROC analyses for each speech item. Figure 4 shows the results of ND and NR from the speech items #7 and #8, the average of the oral or the nasal items, and the average of all eight speech items. The AUC and the proportion of correctly classified patients are listed in Table 2. Based on these two parameters, we determined the cut points and the respective sensitivity and specificity (Table 2).

Fig. 3 ROC curves, 8 items. Receiver-operating characteristics (ROC) to determine the accuracy of the test reliability of NasalView® measurement compared to the perceptual nasality measurement. The ROC curves show the relationship between the perceptual and instrumental method; eight individual speech items #1, #2, #3, #4, #5, #6, #7, #8

Fig. 4 ROC, combined items. Receiver-operating characteristics (ROC) to determine the accuracy of the test reliability of NasalView® measurement compared to the perceptual nasality measurement. The ROC curves show the relationship between the perceptual and instrumental method; ND (a) and NR (b) of the speech items #7 and #8; grouped oral speech items #1, #3, #6, and #7 (c); grouped nasal speech items #2, #4, #5, and #8 (d); and all eight speech items (e)

Neither the ND and NR of the speech items #7 and #8, nor the grouped oral (#1, #3, #6, and #7) or nasal (#2, #4, #5, and #8) speech items, nor all eight speech items together revealed a higher sensitivity than the four single speech items #2, #4, #5, and #7 (Table 2).

Discussion

By comparing the gold standard “perceptual diagnosis” with instrumental measurements, we found speech items suitable for instrumental follow-up assessments in patients with diagnosed hypernasality. The sensitivity of only four specific speech items is superior to the averaged sensitivity using additional speech items. Therefore, our findings may provide clinicians with a strategy to increase the sensitivity in the follow-up of patients with perceptively diagnosed nasality.

Considering the reliability of perceptually diagnosed nasality, multiple raters may be preferred over diagnosis by only one evaluator [20]. Therefore, we engaged two speech therapists for the perceptive assessment: an experienced speech therapist and a trainee. It is clear that a comparison between two speech therapists cannot be statistically significant, and therefore, our specific results cannot be extrapolated to other clinical centers. The comparison of the vowels [i:] and [a:] reflects a possible compensatory function of the tongue. During [i:] phonation, the tongue is positioned high, close to the soft palate. During phonation of the vowel [a:], the tongue lies deep and far back [21]. Our finding that the use of a nasal and an oral sentence can provide clinically useful results confirms Bressmann [18], who described that the use of only one short nasal and one short oral sentence does not compromise the validity of the examination. In addition, Watterson et al. [22] described that speech items with a minimum of six syllables are sufficient for valid determination of nasalance. Therefore, we selected a short sentence without nasal consonants (Table 1; item #7) and another with only nasal consonants (Table 1; item #8). To reveal possible additional information, we computed the so-called nasalance distance (ND) and nasalance ratio (NR) from the two sentences (items #7 and #8). The nasal sentence reveals the nasalance maximum, while the oral sentence indicates the individual’s nasalance minimum [23].
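The two derived measures can be sketched in a few lines. The text does not state the formulas, so the following assumes the commonly used definitions that match the maximum/minimum description above: ND is the nasal-sentence nasalance minus the oral-sentence nasalance, and NR is the oral-sentence nasalance divided by the nasal-sentence nasalance. The function names and the example values are hypothetical.

```python
def nasalance_distance(nasal_pct, oral_pct):
    """ND: nasalance maximum (nasal sentence) minus minimum (oral sentence).

    Assumed definition; the formula is not given in the text.
    """
    return nasal_pct - oral_pct


def nasalance_ratio(nasal_pct, oral_pct):
    """NR: nasalance minimum (oral sentence) divided by maximum (nasal sentence).

    Assumed definition; the formula is not given in the text.
    """
    if nasal_pct == 0:
        raise ValueError("nasal-sentence nasalance must be non-zero")
    return oral_pct / nasal_pct


# Made-up mean nasalance values (%) for one speaker, items #8 and #7.
nasal_sentence = 60.0
oral_sentence = 15.0
print(nasalance_distance(nasal_sentence, oral_sentence))
print(nasalance_ratio(nasal_sentence, oral_sentence))
```

Under these assumed definitions, increasing hypernasality raises the oral-sentence nasalance, so ND shrinks and NR approaches 1.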

When plotting the four perceptually diagnosed groups against the NasalView® measures, we observed a good correlation with some but not all individual speech items. The positive discrimination was best in the speech items #2 and #4 (Fig. 1). The ND and NR between speech items #7 and #8 corresponded rather poorly to the perceptual grading (Fig. 2). The means of the grouped speech items (four oral, four nasal, all eight speech items) corresponded rather well to the perceptual grading (Fig. 2).

The comparison of perceptive and instrumental assessments is a common approach to investigate the theory and practice of speech assessment [24]. Using ROC curves, we found that the nasalance associated with individual items and item groups corresponded to varying degrees to the perceptual assessments.

NasalView® measurements can correlate with perceptual assessments [16]. Single speech items can correlate better than item groups (Table 2). However, it would be a key flaw to transfer our specific results to other clinical centers without further tests, simply because a comparison of only one speech-language pathologist with an instrumental diagnostic method cannot reveal statistically significant results.

Each of the single nasal speech items #2, #4, and #5, and the oral speech item #7, provided the same or better AUC measures compared to the four oral speech items combined, the four nasal speech items combined, or all eight speech items in total (compare Fig. 3 with Fig. 4). In the single-item analyses, speech items #2, #4, #5, and #7 scored highest for sensitivity (96.43–100%), while the averaged “four oral”, “four nasal”, or “all eight speech items together” scored lower (89.29–96.43%; Table 2). The specificity of all results was below acceptable standards (Table 2).
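The reported sensitivities can be related to patient counts with a short calculation. This sketch assumes that sensitivity was computed over the 28 patients perceptually rated as hypernasal (14 mild, 9 moderate, 5 severe); the numbers of detected patients used below are inferred from the percentages and are not reported in the source.

```python
# Perceptually hypernasal patients: groups 1, 2, and 3.
n_hypernasal = 14 + 9 + 5


def sensitivity_pct(detected, total):
    """Sensitivity in percent, rounded to two decimals."""
    return round(100.0 * detected / total, 2)


# Inferred (not reported) counts of correctly detected hypernasal patients.
for detected in (28, 27, 25):
    print(detected, sensitivity_pct(detected, n_hypernasal))
```

Under this assumption, 96.43% corresponds to one missed hypernasal patient and 89.29% to three missed patients out of 28.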

Our findings with the NasalView® System are only partially transferable to other systems due to differences between instruments [25]. Because the specificity of the instrumental nasalance measures was generally low in our study, our findings support the previously published opinion that instrumental assessment can never substitute for, but only complement, perceptual evaluation [7].

Sociocultural background and regional dialect affect the comparability between studies [26, 27]. Seaver et al. [28] and Watterson et al. [29] considered regional differences and proposed the need for standardization for different regions. The usability of specific speech items may depend on the cultural and linguistic background of the assessed person [30].

Nasalance can vary between individual speakers and regional dialects [26]. However, in the follow-up situation, where every patient provides his/her personal baseline and only intraindividual comparisons are relevant, a repeated instrumental assessment can document subtle changes. Therefore, we postulate that the instrumental assessment can be used independently of the patient’s specific linguistic background.

Bressmann et al. [23] described the ND and the NR as useful values, which can provide additional nasalance information. In our study, the sensitivity and specificity of ND and NR between speech items #7 and #8 were low (Table 2; AUC < 0.73). However, ND and NR may depend on the individual test person and specific test items.

Instrumental measures could be superior to perceptual examination in two aspects: a finer scale (0–100%, instead of 0–3 in perceptual assessments) and the objectivity of the instrument. Because Baylis et al. [13] reported that a finer scale can provide more accurate documentation, instruments may provide better opportunities to quantitatively describe subtle improvements during follow-ups after therapeutic interventions.

Conclusion

The perceptual assessment of nasality by speech-language pathologists remains the gold standard method for diagnosis, as it can also elucidate the grade of speech impairment. Instrumental evaluation cannot replace perceptual examination. However, after hypernasality has been diagnosed by perceptual methods, the instrumental nasalance assessment, owing to its finer scale, may objectively document subtle changes in the follow-up evaluation. Further studies to test the efficacy of instrumentally assisted follow-ups (e.g., after surgical intervention) are warranted.