Development and validation of the protocol for the evaluation of voice in patients with hearing impairment (PEV-SHI)

Introduction The voice of individuals with hearing impairment has been widely described, and can be compromised at all levels of the phonatory system. Objective To develop and validate an instrument for evaluating the voice of this population. Methods The instrument underwent the validation steps suggested by the Scientific Advisory Committee of the Medical Outcomes Trust. The study sample consisted of 78 Brazilian individuals with cochlear implants (experimental group) and 78 individuals with normal hearing (control group), divided by age range into three groups: children from 3 to 5 years, children from 6 to 10 years and adults from 18 to 46 years. Participants were recorded producing the sustained vowel /a/, connected speech and spontaneous conversation, and three voice specialists rated the recordings using the proposed instrument, which consists of visual-analog scales for suprasegmental aspects, respiratory-phonatory coordination, resonance, phonation, additional parameters and general vocal perception. Results Evaluation by an expert committee and a pilot test established content validity. Reliability measures showed excellent test-retest reproducibility for the majority of the parameters. ROC curve analysis showed that perceptual evaluation of the sustained vowel did not strongly differentiate individuals with cochlear implants from those with normal hearing, and the parameter “speech rate” did not differentiate the groups at all. For connected speech and spontaneous conversation, the majority of the parameters differentiated the experimental group from the control group with an area under the curve ≥0.7. The cutoff values with maximum sensitivity and specificity were 30.5 for mild, 49.0 for moderate and 69.5 for intense deviation.
Conclusions The protocol for the evaluation of voice in subjects with hearing impairment, PEV-SHI, is a reliable and useful tool for assessing the particularities of the voice of individuals with hearing impairment treated with cochlear implants and can be used in research and clinical settings to standardize evaluation and facilitate information exchange among services.


Introduction
Voice production results from the integration of the respiratory, phonatory and articulatory systems, 1,2 and also involves highly complex mechanisms of the central and peripheral nervous systems, such as auditory monitoring. 3 It can be described through auditory-perceptual, acoustic and aerodynamic evaluations and through laryngeal imaging. 3 Auditory-perceptual evaluation is considered the gold standard in voice assessment and enables the characterization and quantification of perceptual vocal features. 4,5 The voice characteristics of individuals with hearing impairment can vary according to the type, severity and onset of the hearing loss, and to the treatment of choice. Perceptual attributes used over the last 10 years to characterize the voice of these individuals include: negative overall impression of voice quality 6-8; roughness 6; strain 6,9; resonance disorders 7,10,11; high pitch 7; instability 7,12; and altered suprasegmental features such as intelligibility, articulation 10 and intonation. 13 Respiration, phonation, resonance and suprasegmental features are intimately related. For example, many references to nasality in deaf speech may refer not only to nasal resonance itself, but also to misarticulation of nasals, lack of oral/nasal distinctions, pitch variation, or any combination of these parameters. 14 These perceived characteristics can be explained by the lack of auditory monitoring of the voice, which makes it difficult to develop phonatory control and the ability to regulate and vary voice use across situations. 3,15 Therefore, in addition to social, educational and language limitations, hearing impairment can cause specific speech and voice deviations that interfere with intelligibility and can seriously compromise the social integration of the individual, 3 so the assessment of voice production should cover all of these elements.
Studies that performed auditory-perceptual evaluation of the voice of individuals with hearing impairment used protocols and scales designed for the general population with voice problems, such as the GRBAS scale (G, Grade; R, Roughness; B, Breathiness; A, Asthenia; S, Strain) 16 and the Consensus Auditory-Perceptual Evaluation of Voice (CAPE-V). 17 These scales, however, focus mainly on voice production at the glottal level and therefore do not address other relevant features of the voice of the population with hearing impairment, such as the various possible resonance deviations and the suprasegmental features of the voice. In addition, the lack of standardization of the evaluation process across studies, such as the choice of scale and rating method, can lead to unreliable and conflicting results.
For an adequate evaluation, the instrument should consider all the parameters relevant to the population being studied. In addition, the scale should reliably discriminate between the normal voice and the voice of the target population. 18 Validating an instrument that addresses the particular voice characteristics of individuals with hearing impairment can therefore provide important guidance to speech-language pathologists in investigating voice production and in rehabilitating the oral communication of these individuals. The purpose of this study was to develop an instrument for evaluating the voice of individuals with hearing impairment who use cochlear implants, and to establish its validity for clinical and scientific purposes.

Methods
The ethics committee of the College of Health Sciences of Brasília University approved this study under process number 16887713.4.0000.0030. All participants, or their parents or legal guardians, signed the informed consent.

Participants
This study involved 156 participants: 78 individuals with cochlear implants (Experimental Group, EG) and 78 hearing peers (Control Group, CG), divided by age range into three groups: 52 children from 3 to 5 years (G1), 54 children from 6 to 10 years (G2) and 50 adults from 18 to 46 years (G3). Half of the participants in each group belonged to the EG and half to the CG. All participants were native speakers of Brazilian Portuguese. The EG included individuals with bilateral severe to profound sensorineural hearing loss who used a cochlear implant, had no associated disorders, attended a rehabilitation program, and had at least one year of experience with the device. This study did not consider other criteria such as hearing loss onset, unilateral or bilateral implantation, or use of a contralateral hearing aid, since its purpose was to develop an instrument for the overall population with cochlear implants. The CG consisted of individuals with normal hearing, confirmed by pure-tone threshold audiometry. Exclusion criteria for both groups were professional voice use, menopause (for women), current or previous smoking, regular alcohol consumption, previous laryngeal surgery, and pulmonary or upper airway infection on the day of the recording session.

Validation steps
The criteria recommended by the Scientific Advisory Committee of the Medical Outcomes Trust 19 directed the development and validation process of the instrument. The validation steps include describing the conceptual and measurement model and determining reliability, content validity, construct validity, interpretability, and respondent and administrative burden.

Conceptual and measurement model
The Protocolo de Avaliação de Voz do Deficiente Auditivo (PAV-DA), translated as Protocol for the Evaluation of Voice in Subjects with Hearing Impairment (PEV-SHI) (Appendix A), was developed by consensus among three speech-language and hearing sciences professionals, based on perceptual features reported in the literature as salient in the voice of individuals with hearing impairment. The voice tasks selected were the sustained vowel /a/, connected speech (counting from 1 to 10) and spontaneous conversation. Each parameter is rated on a 100 mm or 200 mm Visual-Analog Scale (VAS). On the 100 mm line, the left end represents absence of deviation and the right end represents the most intense deviation. For the parameters intonation, speech rate, pitch and loudness, a 200 mm line was used, since these deviations can occur in opposite directions; pitch, for example, can be either too low or too high. On the 200 mm scale, the midpoint was therefore defined as adequate, with possible deviations to the left or right of it, allowing the rater to visualize the full range of deviation on the VAS. Suprasegmental features and respiratory-phonatory coordination were assessed only in the spontaneous conversation. The selected parameters and their respective definitions were:
-Suprasegmental aspects of voice quality: Intelligibility: How understandable the speech is; Articulation: The correct production of speech sounds; Intonation: The melodic pattern and frequency variation in speech; Speech rate: How fast or slow speech is produced within a sentence.
-Respiratory-phonatory coordination: Coordination between breath and speech.
-Resonance: The way in which the voice is projected into space. It may be isolated or mixed; in case of mixed resonance, the raters selected more than one item in the protocol. The term "excessively" was used to express imbalance and predominance of resonance in a certain region of the vocal tract. Resonance was classified as: Excessively laryngeal: Low resonance focus; the voice seems stuck in the throat; Excessively pharyngeal: The resonance focus is less low and centered in the oropharynx, which gives the voice a metallic quality; Excessively hyponasal: Insufficient use of the nasal cavity, which causes a perception of nasal obstruction. This parameter must be disregarded in the evaluation of the sustained vowel /a/; Excessively hypernasal: Excessive use of the nasal cavity, which causes a perceived nasal voice; Excessively anterior: Oral resonance focus, which gives an adult voice a child-like quality; in children, the voice does not match the child's age. The speaker seems to place the tongue anteriorly during speech; Excessively posterior: The resonance focus is in the posterior oral space, resembling someone speaking with a hot potato in the mouth.
-Phonation: Strain: Excessive phonatory effort; Breathiness: Audible air escape in the voice; Roughness: Irregularity in the voicing source; Instability: Unstable quality of emission regarding frequency and/or intensity. The same emission can have short-term or long-term instability, and both should be considered; Pitch: Perceptual correlate of fundamental frequency. A medium pitch is neither too low nor too high and varies with gender and age. The deviation may be toward high or low; Loudness: Perceptual correlate of intensity. A medium loudness is neither too loud nor too soft, considering the features of the environment. The deviation may be toward loud or soft.
-Additional parameter: Any other relevant vocal characteristic the rater may notice that is not addressed in the protocol.
-General vocal perception: Global, integrated perception of voice deviation, rated after every parameter has been assessed separately. The general vocal perception involves all aspects assessed in the protocol.
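The bipolar 200 mm scale can be scored as a signed distance from its midpoint. The protocol does not state how marks on the 200 mm line were converted to scores, nor which pole lies on the left, so the following is only a minimal sketch under stated assumptions: the mark is measured in mm from the left end, the left half encodes deviation toward the lower pole (e.g. low pitch, soft loudness), and the magnitude is the distance from the midpoint, giving 0 to 100 like the 100 mm scale. The function name `score_bipolar_vas` is hypothetical.

```python
from typing import Tuple


def score_bipolar_vas(mark_mm: float, length_mm: float = 200.0) -> Tuple[str, float]:
    """Convert a mark on a bipolar VAS into (direction, magnitude in mm).

    Assumptions (not stated in the protocol): the mark is measured from the
    left end, the left half means deviation toward the lower pole (e.g. low
    pitch, soft loudness) and the right half toward the upper pole.
    """
    if not 0.0 <= mark_mm <= length_mm:
        raise ValueError("the mark must lie on the line")
    midpoint = length_mm / 2  # defined as 'adequate' in the PEV-SHI
    offset = mark_mm - midpoint
    if offset == 0:
        return ("adequate", 0.0)
    direction = "upper" if offset > 0 else "lower"
    # Distance from the midpoint: 0-100 on a 200 mm line
    return (direction, abs(offset))


# A mark 30 mm from the left end of a 200 mm pitch line
print(score_bipolar_vas(30.0))  # ('lower', 70.0)
```

Reporting direction and magnitude separately keeps the bipolar score comparable in range with the unipolar 100 mm parameters.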

Content validity
Content validity was established in two steps. In the first, an expert committee of speech-language pathologists and audiologists who were not involved in the development of the protocol judged the initial version of the PEV-SHI for its clarity, parameters and form of evaluation. All suggestions were analyzed and a partial version was produced. In the second, two speech-language pathologists with 20 years of training, both of whom had participated in the expert committee, performed a pilot test in which they used the instrument to analyze five voice samples of each speech task from individuals with cochlear implants. After the pilot test, final adjustments were made, resulting in the final version of the PEV-SHI.

Data collection and auditory perceptual evaluation
After the final version of the PEV-SHI was determined, three voice specialists who had not participated in any of the previous steps of this study used it to rate the voice samples. An odd number of raters avoids potential ties in the evaluation, and three raters were chosen based on common practice in auditory-perceptual assessment of voice. 20-25 The three raters had extensive experience in perceptual evaluation of normal and disordered voices, and one of them had experience with voice disorders in individuals with cochlear implants. The raters first took part in training sessions to become acquainted with the protocol and to reach a shared understanding of the parameters assessed in each speech task. Ratings were performed separately by age range and speech task. The raters knew the age and gender of each voice sample, but not whether it belonged to a participant of the EG or the CG, and they were unfamiliar with the participants. Each rater performed the task individually and the data were tabulated. If the scores given by the three raters for a given parameter differed by no more than 10 points, the mean of the three scores was used. For parameters in which the difference exceeded 10 points, a consensus rating was carried out: the raters met in additional sessions to reanalyze, discuss and rate these parameters.
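The agreement rule above can be expressed as a small function applied to each parameter of each voice sample. This is a sketch, assuming that "difference between the scores" means the spread between the highest and lowest of the three scores; `aggregate_ratings` and `AGREEMENT_MARGIN` are hypothetical names.

```python
from statistics import mean
from typing import Optional, Sequence

AGREEMENT_MARGIN = 10.0  # maximum tolerated spread between raters, in points


def aggregate_ratings(scores: Sequence[float]) -> Optional[float]:
    """Return the mean of three rater scores for one parameter, or None when
    the scores disagree by more than the margin and need a consensus rating.

    Assumption: 'difference' is read as max(scores) - min(scores).
    """
    if len(scores) != 3:
        raise ValueError("expected one score from each of the three raters")
    spread = max(scores) - min(scores)
    if spread <= AGREEMENT_MARGIN:
        return mean(scores)
    return None  # flag this parameter for a consensus session


print(aggregate_ratings([20.0, 25.0, 30.0]))  # spread of exactly 10: averaged
print(aggregate_ratings([20.0, 25.0, 31.0]))  # spread of 11: consensus needed
```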
The voice samples were recorded with Sony Sound Forge 10.0 software at a sampling rate of 44,100 Hz, 16-bit, mono. A head-mounted AKG C512 microphone, an M-Audio Fast Track Pro preamplifier and a notebook computer were used. Recording took place in a quiet, soundproof room, with the microphone positioned at 45° and 3 cm from the participant's mouth.

Reliability
To establish the reliability of the PEV-SHI, the raters repeated the auditory-perceptual evaluation of 20% of the voice samples in random order. The Intraclass Correlation Coefficient (ICC) was used to verify test-retest reproducibility. The correlation scale adopted is shown in Fig. 1.
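The text does not specify which ICC form was computed. As one common choice for test-retest data, the sketch below computes the two-way random-effects, absolute-agreement, single-measure ICC, often written ICC(2,1), from a samples × sessions table in plain Python. Both the function name `icc_2_1` and the choice of ICC form are assumptions, and the resulting value would still be read against the correlation scale referred to in Fig. 1.

```python
from typing import Sequence


def icc_2_1(ratings: Sequence[Sequence[float]]) -> float:
    """Two-way random-effects, absolute-agreement, single-measure ICC.

    ratings[i][j] is the score of voice sample i in session j
    (e.g. j = 0 for the first rating, j = 1 for the retest).
    """
    n = len(ratings)
    k = len(ratings[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need >= 2 samples, each rated in the same >= 2 sessions")
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    # ANOVA sums of squares: samples (rows), sessions (columns), residual
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical test/retest scores for three samples
print(round(icc_2_1([[10.0, 20.0], [30.0, 40.0], [50.0, 60.0]]), 3))  # 0.889
```

The absolute-agreement form penalizes a systematic shift between sessions (here every retest score is 10 points higher), which a consistency-only ICC would ignore.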

Construct validity
Construct validity was determined in two steps. First, the scores of the EG and CG were compared using ANOVA, and efficiency, sensitivity and specificity were analyzed with the receiver operating characteristic (ROC) curve. The closer the area under the ROC curve (AUC) is to 1.0, the greater the distinction between the EG and the CG. Second, the scores of the PEV-SHI were correlated with an external clinical criterion: on a separate occasion, the raters classified the voice samples according to the overall dysphonia Grade (G) of the GRBAS scale, and this score was compared with the general vocal perception score of the PEV-SHI.
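The AUC has a direct interpretation: it is the probability that a randomly chosen EG score is higher than a randomly chosen CG score, with ties counting one half, which equals the Mann-Whitney U statistic divided by the number of EG-CG pairs. The minimal sketch below computes the empirical AUC this way for one parameter, assuming higher PEV-SHI scores indicate greater deviation; `roc_auc` is a hypothetical helper, not the software used in the study.

```python
from typing import Sequence


def roc_auc(eg_scores: Sequence[float], cg_scores: Sequence[float]) -> float:
    """Empirical area under the ROC curve via pairwise comparison.

    Counts, over all (EG, CG) pairs, how often the EG score is higher
    (ties count 1/2), then divides by the number of pairs.
    """
    if not eg_scores or not cg_scores:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for e in eg_scores:
        for c in cg_scores:
            if e > c:
                wins += 1.0
            elif e == c:
                wins += 0.5
    return wins / (len(eg_scores) * len(cg_scores))


# Hypothetical scores: 5 of the 6 EG-CG pairs are correctly ordered
print(roc_auc([50.0, 60.0, 20.0], [10.0, 30.0]))
```

The pairwise loop is quadratic in group size, which is negligible for samples of a few dozen participants per group.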

Interpretability
The cutoff values were determined from the overall dysphonia Grade (G) and from the sensitivity and specificity levels given by the ROC curve for differentiating the voice of an individual with hearing impairment from that of an individual with normal hearing, using the general vocal perception score of the PEV-SHI for the three speech tasks. Along with the cutoff values, the severity of the vocal deviation was determined, ranging from normal variability of voice quality through mild and moderate deviation to intense deviation.
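A cutoff that jointly maximizes sensitivity and specificity can be found by scanning candidate thresholds along the empirical ROC curve. The study describes a "maximum efficacy rule"; the sketch below instead uses Youden's index (sensitivity + specificity - 1), a common stand-in that may not match the authors' exact procedure. Candidate cutoffs are midpoints between consecutive observed scores, and a score above the cutoff counts as positive, following the rule stated in the Discussion. The name `best_cutoff` is hypothetical.

```python
from typing import Sequence, Tuple


def best_cutoff(eg_scores: Sequence[float],
                cg_scores: Sequence[float]) -> Tuple[float, float, float]:
    """Return (cutoff, sensitivity, specificity) maximizing Youden's index.

    A score strictly above the cutoff is classed as positive (EG-like).
    When several cutoffs tie on the index, the lowest one is kept.
    """
    if not eg_scores or not cg_scores:
        raise ValueError("both groups need at least one score")
    values = sorted(set(eg_scores) | set(cg_scores))
    if len(values) < 2:
        raise ValueError("need at least two distinct scores")
    best = None
    for low, high in zip(values, values[1:]):
        cut = (low + high) / 2  # midpoint between consecutive observed scores
        sens = sum(s > cut for s in eg_scores) / len(eg_scores)   # true positive rate
        spec = sum(s <= cut for s in cg_scores) / len(cg_scores)  # true negative rate
        if best is None or sens + spec - 1 > best[1] + best[2] - 1:
            best = (cut, sens, spec)
    return best


eg = [40.0, 55.0, 70.0]  # hypothetical general vocal perception scores
cg = [10.0, 20.0, 35.0]
print(best_cutoff(eg, cg))  # (37.5, 1.0, 1.0)
```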

Burden
Respondent and administrative burden included a full description of any demands involving the administration of the PEV-SHI, including time, training and necessary resources.

Results
This study presents the development and validation process of the Protocol for the Evaluation of Voice in Subjects with Hearing Impairment (PEV-SHI).

Content validity
After analysis of all the suggestions made by the expert committee and of the pilot test, the final version of the PEV-SHI was determined. Changes from the initial version included revised definitions and order of presentation of parameters, changes in terminology, and the unification of the parameters articulation and instability, each of which had previously been split into several parameters. The final version is clear, comprehensible and contains adequate content for the target population.

Reliability
Table 1 presents the ICC results for the three groups together, showing excellent reliability for all tasks and excellent test-retest reproducibility. For the groups separately, only one parameter showed poor correlation: strain, for G1 in the sustained vowel and for G3 in the connected speech. Correlation was good or, in most cases, excellent for all other parameters in all tasks in the separate groups.

Construct validity
Comparison of the EG and CG scores using ANOVA showed significant differences in most parameters. The sustained vowel was the task with the fewest significant results, followed by connected speech and spontaneous conversation (Tables 2 and 3). The efficiency of the PEV-SHI, given by the area under the ROC curve (AUC), showed that the majority of the parameters are adequate to differentiate individuals with hearing impairment from individuals with normal hearing, especially for connected speech and spontaneous conversation. Table 4 presents the AUC of each parameter for the separate groups and for all groups together. Table 5 presents the cutoff values and the highest levels of sensitivity and specificity of each parameter for all groups together.
Regarding the correlation between the PEV-SHI scores and an external clinical criterion, there were significant positive correlations between the G parameter of the GRBAS scale and the general vocal perception score of the PEV-SHI: as the general vocal perception score on the VAS increases, the G score on the numerical scale (NS) also increases, and vice versa. Most correlations were classified as good or excellent. This analysis was not performed for connected speech and spontaneous conversation in G3, since there was no variability of responses on the NS (Table 6).

Interpretability
To determine the interpretability of the PEV-SHI, the ROC curve was used to set cutoff values based on the general vocal perception parameter and the G parameter of the GRBAS scale. The cutoff values on the VAS were obtained from the correspondence between the VAS and the NS (Table 7). The maximum efficacy rule was used to estimate the cutoff values, selecting the highest values of sensitivity and specificity combined with the highest values of efficiency. The cutoff values were obtained by group and task. In some cases, the analysis was not performed because there was no variability of responses on the NS (Table 7).

Burden
Respondent burden refers to the recording of the voice samples. In this study, the PEV-SHI was used to assess the voice of 156 individuals, divided into the EG and CG by age range. Each participant attended the recording site, where they received instructions to perform the three tasks. Recording took about 10 min. The PEV-SHI was considered unsuitable for individuals with hearing impairment with poor language development who could not perform the speech tasks. Administrative burden included a quiet environment, recording equipment (computer, sound card and microphone), headphones, a printed PEV-SHI, a pencil and a ruler. Analysis took the raters about two minutes per speech task. To complete the PEV-SHI, the rater must be familiar with all of the definitions and instructions of the protocol, and must be experienced in evaluating normal and altered voices as well as the speech and voice of people with cochlear implants.

Discussion
Auditory-perceptual evaluation of voice quality is a key element in the clinical assessment of the voice. Non-specific instruments, however, may fail to capture some relevant characteristics of a given population. The population with hearing impairment is an example of people with particular voice features that extend beyond alterations at the glottal level. An instrument that addresses all of the potential attributes of the voice is therefore of great importance to characterize the voice of this population precisely. The validation process was conducted in steps. 19 Content validity was established through development, revision and pilot testing. By the definition of content validity, 26,27 the PEV-SHI addresses the voices of subjects with hearing impairment who use cochlear implants in a relevant and representative way, and is adequate in its instructions, parameters and scoring.
In the following steps, auditory-perceptual evaluation was performed with the PEV-SHI for the extraction of the psychometric measures of reliability, efficiency, sensitivity, specificity and cutoff values.
Reliability measures, extracted with the ICC from the repeated auditory-perceptual evaluation of 20% of the sample, showed good and excellent reliability for the majority of the parameters (Table 1). The PEV-SHI has good test-retest reproducibility and is therefore considered a reliable instrument. 19 The comparison of the EG with the CG using ANOVA (Tables 2 and 3) showed significant differences for most parameters, with the fewest significant differences in the sustained vowel. Variance analysis alone cannot determine whether these results were due to the voice characteristics of the populations or to the sensitivity of the PEV-SHI, since this test compares population means. 28 Measures given by the ROC curve (Table 4) complemented and corroborated this analysis. The ROC curve represents the relationship between the sensitivity and the specificity of a given test. 29 The AUC measures the performance (efficiency) of the test, in this case its accuracy in identifying individuals with hearing impairment. The closer the AUC is to 1.0, the better the instrument classifies what it proposes to evaluate; a test that cannot discriminate between individuals with and without a certain disorder has an AUC of 0.5. 29 In this study, parameters with AUC ≤ 0.5 were considered not suitable for distinguishing CI users from individuals with normal hearing, values between 0.5 and 0.7 were considered acceptable, and values ≥0.7 were considered adequate. There were cases of AUC ≤ 0.5 for isolated parameters of the PEV-SHI in all of the groups.
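The AUC bands adopted in this study can be encoded as a simple lookup. Treating exactly 0.5 as "not suitable" and exactly 0.7 as "adequate" follows the ≤ and ≥ signs in the text; the function name `interpret_auc` is hypothetical.

```python
def interpret_auc(auc: float) -> str:
    """Label an AUC using the bands adopted in this study."""
    if not 0.0 <= auc <= 1.0:
        raise ValueError("AUC must lie between 0 and 1")
    if auc <= 0.5:
        return "not suitable"  # no better than chance discrimination
    if auc < 0.7:
        return "acceptable"
    return "adequate"


for value in (0.45, 0.62, 0.81):
    print(value, interpret_auc(value))
```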
In the sustained vowel, AUC ≤ 0.5 occurred for the parameters breathiness (G1, G3 and all groups together) and anterior resonance (G2 and G3) (Table 4). The sustained vowel is a test of glottal efficiency, 30 essentially evaluating an individual's ability to control the aerodynamic forces of the pulmonary airflow and the myoelastic forces of the larynx, 31 and is not influenced by suprasegmental features of the voice. Instability is an important feature to evaluate in this task and, in fact, it was the only parameter of the PEV-SHI with AUC ≥ 0.7 for all groups (Table 4). The parameters that least differentiated the CG from the EG were breathiness and anterior resonance. Most of the remaining parameters had 0.5 < AUC < 0.7 for the sustained vowel. The PEV-SHI was more efficient in differentiating the population with CI from the population with normal hearing in the tasks involving speech. Although the parameters pitch and loudness had only acceptable AUC in connected speech and spontaneous conversation for most groups (Table 4), they have great clinical relevance, are easily interpreted and are routinely used in voice assessment. 17 Breathiness is an expected feature in children and women due to laryngeal configuration, 32 as is roughness in the male voice. 33 Although these parameters do not strongly distinguish the EG from the CG, they are important for the PEV-SHI since they are expected voice characteristics for a given age and gender, regardless of hearing loss.
The same applies to the parameters resonance and intonation. Resonance had AUC ≥ 0.7 for two groups in connected speech and one group in spontaneous conversation (Table 4). Individuals with hearing loss tend to present resonance disorders, since the lack of auditory monitoring leads them to use inadequate vocal tract adjustments in voice production. Mixed resonance is a common feature, 3 and for this reason the PEV-SHI sought to address all possible types of resonance. Intonation disorders are a perceived feature of the voice of individuals with hearing loss; 34,35 however, this parameter differentiated the EG from the CG with 0.5 < AUC < 0.7 for three groups and with AUC ≥ 0.7 for only one group. For every group, the AUC for the parameter speech rate was ≤0.5, so it was excluded from the protocol.
The sensitivity and specificity of an instrument refer to its ability to correctly identify individuals with and without a disorder, respectively. 36 The results in Table 5 suggest that the PEV-SHI is susceptible to error, especially in the sustained vowel. These errors occur when an individual with normal hearing is classified as an individual with hearing impairment (false positive) and vice versa (false negative).
To determine construct validity, two scales were used: a simple auditory-perceptual scale with the power to discriminate different degrees of vocal deviation, based on a robust parameter (G of the GRBAS scale), and a unidimensional scale (general vocal perception of the PEV-SHI). 5,33 This also allowed correspondence between the VAS and the NS (numerical scale) 5,20,33,37,38 and an understanding of the boundaries between normal and disordered voices in the EG and CG. Findings showed a significant positive correlation between the G parameter of the GRBAS scale and the general vocal perception score of the PEV-SHI (Table 6).
The cutoff value is the value from which the result of a test is classified as either positive (presence of the deviation, disorder or illness being tested) or negative (absence of what is being tested). If the result is smaller than the cutoff value, the test is classified as negative, and vice versa. 39 Depending on group and task, the PEV-SHI presented different cutoff values to differentiate the CG from the EG, with AUC close to 1.0 and satisfactory values of sensitivity and specificity (Table 7). This discriminatory power supports reliable use of these measures in clinical and scientific contexts. 29 As Table 7 shows, cutoff values vary with the speech task, the parameter 5,33 and the age range. In practice, however, raters are advised to use the most robust cutoff values to distinguish the voice of individuals with hearing impairment, which provides greater reliability in the use of this instrument for the population with cochlear implants. These values were obtained for all groups together in spontaneous conversation (Table 7). For the PEV-SHI, therefore, 30.5 is the cutoff point between normal variability and mild vocal deviation, 49.0 between mild and moderate vocal deviation, and 69.5 between moderate and intense deviation (Fig. 1).
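The reference cutoffs for the general vocal perception score (all groups, spontaneous conversation) can be applied as a simple lookup. This is a sketch, assuming scores lie on the 0 to 100 mm VAS and that a score equal to a cutoff falls in the lower band, as implied by the ranges given in the Conclusion (for example, 0 to 30.5 is normal variability). Scores between 49.0 and 50 are not covered by those ranges; here they are assigned to moderate deviation, following the 49.0 cutoff stated above. `classify_severity` is a hypothetical helper.

```python
# Inclusive upper bound of each band on the 0-100 mm VAS
SEVERITY_CUTOFFS = (
    (30.5, "normal variability"),
    (49.0, "mild deviation"),
    (69.5, "moderate deviation"),
)


def classify_severity(score: float) -> str:
    """Map a general vocal perception score to a severity band."""
    if not 0.0 <= score <= 100.0:
        raise ValueError("score must lie on the 0-100 mm scale")
    for upper, label in SEVERITY_CUTOFFS:
        if score <= upper:
            return label
    return "intense deviation"  # above 69.5


for s in (12.0, 30.6, 55.0, 80.0):
    print(s, classify_severity(s))
```

Because the cutoffs vary by group and task (Table 7), a clinic could swap in the group- or task-specific values by replacing the tuple.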
The results discussed in this section show that the sustained vowel did not differentiate the voices of individuals with cochlear implants from those with normal hearing as robustly as connected speech and spontaneous conversation. Even so, the vowel can be used, with caution, in evaluation with the PEV-SHI, since this task is very important for a global understanding of vocal behavior. 30,40 Benefits of using the PEV-SHI for the target population over existing auditory-perceptual tools include: evaluating in a single instrument all elements of voice production (respiration, phonation, resonance and suprasegmental aspects) 1-3; breaking down resonance to evaluate the predominance of one or more resonance foci; assessing instability; providing a VAS for an additional parameter; and assessing general vocal perception after all parameters have been considered.
Although this validation study was performed with CI users, the PEV-SHI may also contribute to the evaluation of other groups of individuals with hearing impairment, such as users of hearing aids or other implantable devices. Psychometric measures should be extracted for other groups with hearing loss, since the cutoff values established in this study correspond to CI users of the studied age range. Further studies should also apply the PEV-SHI to individuals with hearing impairment during puberty and aging. The PEV-SHI is currently undergoing cross-cultural adaptation into English.
The PEV-SHI is a reliable and useful tool for assessing the particularities of the voice of individuals with hearing impairment who use cochlear implants, and can be used in research to standardize evaluation and facilitate information exchange among services. It can also be used as part of the clinical assessment of patients, which should encompass all aspects of oral communication, from auditory abilities to language development, orofacial functions and voice production. Finally, it can be useful in defining therapeutic goals and in patient follow-up.

Conclusion
The content of the Protocol for the Evaluation of Voice in Subjects with Hearing Impairment (PEV-SHI) is adequate for the intended target population. It has good test-retest reproducibility and is sensitive and reliable for all the studied age groups, especially for connected speech and conversational speech. The cutoff values with maximum sensitivity and specificity were those found for the overall population in conversational speech, and these can be used as reference values when applying the PEV-SHI. The cutoff values to be considered are therefore: 0 to 30.5, normal variability of voice quality; 30.6 to 49, mild deviation; 50 to 69.5, moderate deviation; and above 69.5, intense deviation. The use of the PEV-SHI requires adequate sound capture, clinical experience and familiarity of the rater with the voice of individuals with hearing impairment.

Conflicts of interest
The authors declare no conflicts of interest.