Development of a Visuoperceptual Measure for Fiberoptic Endoscopic Evaluation of Swallowing (V-FEES) in Adults with Oropharyngeal Dysphagia: An International Delphi Study

Visuoperceptual evaluation of fiberoptic endoscopic evaluation of swallowing (FEES) is a commonly used assessment in dysphagia or swallowing disorders. Currently, no international consensus exists regarding which visuoperceptual measures to use for the analysis of FEES recordings. Moreover, existing visuoperceptual FEES measures are limited by poor and incomplete psychometric data, identifying an urgent need for developing a visuoperceptual measure to interpret FEES recordings. Following the COSMIN group’s (COnsensus-based Standards for the selection of health Measurement INstruments) psychometric taxonomy and guidelines, this study aimed to establish the content validity of a new visuoperceptual FEES (V-FEES) measure in adults with oropharyngeal dysphagia. Using the Delphi technique, international consensus was achieved among dysphagia experts across 21 countries, resulting in a new prototype measure for V-FEES, comprising 30 items, 8 function testing items (i.e., specific tasks performed by patients while observing and rating items), and 36 unique operationalisations (i.e., defining items into measurable factors that could be measured empirically using visuoperceptual observation). This study supports good content validity for V-FEES, including participants’ feedback on the relevance, comprehensiveness, and comprehensibility of the included items. Future studies will continue the instrument development process and determine the remaining psychometric properties using both the classic test theory (CTT) and item response theory (IRT) models.


Introduction
Since its introduction in the 1980s, fiberoptic endoscopic evaluation of swallowing (FEES) has been an important instrumental assessment used to evaluate dysphagia or swallowing problems [1]. Several clinical protocols [2][3][4] and visuoperceptual assessments [5][6][7][8] have been published since FEES was introduced to practice and research. Dysphagia or swallowing disorders are a frequent symptom of one or more underlying anatomical abnormalities or impairments and disorders in cognitive, sensory and motor acts involved in transporting food and liquids from the mouth to the stomach [9]. Dysphagia can lead to reduced efficiency and safety of swallowing, failure to maintain hydration and nutrition, risk of choking and aspiration leading to pulmonary complications, and reduced quality of life [9,10]. Dysphagia may refer to oropharyngeal disorders involving upper digestive tract problems and esophageal disorders involving lower digestive tract problems (or a combination of these). The prevalence of oropharyngeal dysphagia in the general population has been estimated to range between 2.3 and 16% [11]. Pooled prevalence estimates of oropharyngeal swallowing problems determined by meta-analyses, for example, are as high as 42% in post-stroke populations [12], 31.5% (95% CI 8.9-68.4%) in head and neck oncology [13], 36.9% (95% CI 30.7-43.6%) in Parkinson's disease [14], 44.8% (95% CI 40.4-49.2%) in multiple sclerosis [15], 50.4% (95% CI 36.0-64.8%) in cerebral palsy [16], and 72.4% (95% CI 26.7-95.0%) in dementia [13].
Prevalence data may differ depending on which screen or assessment has been used, but in general, instrumental assessments (i.e., endoscopic and videoradiographic recordings of the swallowing process) are considered the optimal evaluation methods to identify dysphagia, especially because both 'gold standard' assessments can diagnose aspiration (including silent aspiration) and other physiological problems in the pharyngeal phase [17]. However, no international consensus exists regarding which visuoperceptual measures to use to analyse these video recordings. Moreover, insufficient psychometric evidence has been identified from the literature to recommend any individual measure as valid and reliable to interpret swallowing recordings [18]. Consequently, implementing assessments with poor psychometric qualities will undermine evidence-based practice and research, as current health status or intervention effects cannot be objectified if measures lack psychometric robustness [19,20].
The lack of robust psychometric visuoperceptual measures to evaluate the 'gold standard' instrumental recordings identifies an urgent need for instrument development. A recent Delphi study aimed to achieve international consensus on the visuoperceptual evaluation of videoradiographic recordings of swallowing (VideoFluoroscopic Swallowing Study or VFSS) as a starting point for instrument development [21], but there is no such study yet for the evaluation of FEES recordings. The ongoing discussion in the literature about which instrumental assessment to use shows advantages and disadvantages for both FEES and VFSS. Videoradiography is associated with radiation exposure, expensive resources (e.g., equipment and required personnel including a physician and allied health clinician), and limited availability to clinicians. Conversely, videoendoscopy requires no radiation, is less expensive, and is usually more accessible to healthcare providers, but it cannot assess the oral phase of swallowing and shows a brief period of white-out during the actual swallow act. Apart from the listed advantages and disadvantages, the choice of implementing either FEES or VFSS in dysphagia care will also depend on the main purposes of the examination, as well as factors related to the clinical environment (such as availability and/or affordability).
The COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) group established an international consensus-based taxonomy, terminology, and definitions of measurement properties [22,23]. The framework comprises nine measurement properties within three domains: reliability, validity and responsiveness. In line with the COSMIN framework, content validity is the most important measurement property, referring to the degree to which an instrument's content adequately reflects the construct to be measured [22]. A measure is considered of questionable value for either research or clinical practice if content validity is flawed or lacking. To meet the COSMIN criteria for good content validity, both recent literature on the construct of interest and clinical experts should be involved in the process of developing new measures. According to COSMIN, professionals should be asked about the relevance (all items should be relevant to the construct of interest within a specific population and context of use), comprehensiveness (no key aspects of the construct should be missing), and comprehensibility (the items should be understood by the target populations as intended) of the items of a measure [22].
Informed by the COSMIN guidelines [22], this study aims to report on the first step towards developing a visuoperceptual measure to evaluate endoscopic recordings of swallowing in oropharyngeal dysphagia. To ensure good content validity, this study reports on an international Delphi study involving dysphagia experts to seek agreement on definitions and items of a prototype of visuoperceptual measurement in FEES recordings.

Study Design
This study used the Delphi technique to develop group consensus between experts on a defined topic using a series of survey rounds as part of a structured process [24]. Consecutive Delphi surveys are modified based on the percentage of agreement and participants' feedback from preceding survey rounds. Participants remain anonymous throughout all Delphi rounds to avoid bias and discourage individuals from dominating the consensus process. The same participants are invited to complete each Delphi round, although some participants may choose to withdraw during the Delphi process. Delphi rounds continue until group consensus has been reached or it becomes apparent that consensus cannot be reached. This study used online surveys (e-Delphi) to seek expert consensus regarding the items to be included in a visuoperceptual measure to evaluate FEES recordings in adult patients with oropharyngeal dysphagia.

Participants
To be eligible to participate in the Delphi study, participants needed to: (1) have English reading skills adequate for work (e.g., understanding of the main points of texts and technical terms within the participant's field of expertise); and (2) have five or more years (full-time equivalent) of clinical, research or teaching experience in dysphagia, with at least 50% of their caseload related to adults with dysphagia (e.g., provision of clinical services, research, staff development, academic teaching) and using FEES.

Recruitment
This study was approved by the Curtin Human Research Ethics Committee (Curtin University, Perth, Australia: HRE2021-0187). Delphi participants were recruited via professional organisations (e.g., European Society for Swallowing Disorders), from the professional networks of the authors, through reviewing relevant publications in FEES, and by asking recruited participants to identify other potential participants (snowballing). Identified participants were sent an invitation and an information sheet about the Delphi study. Participants who accepted the invitation were sent a link to the first online Delphi survey. As all survey data were processed anonymously, all participants were reinvited to consecutive Delphi rounds regardless of whether they had completed previous rounds.

e-Delphi Surveys
Definitions for main concepts and a list of potential measure items were constructed based on: (a) existing visuoperceptual measures in FEES as identified in a previous systematic review [18], (b) selected consensus-based definitions related to the visuoperceptual analysis of videoradiographic recordings of swallowing in oropharyngeal dysphagia as reported in a recent Delphi study [21], (c) other relevant international literature, and (d) group discussions between the authors. Potential items were presented to participants across three rounds via an online survey platform Qualtrics over fourteen months (December 2021-January 2023). Participants indicated consensus on definitions, comprehensibility and relevance using a five-point scale. In both the second and third Delphi rounds, participants were also asked about their choice of function testing (i.e., specific tasks performed by patients while observing and rating items) and the operationalisation thereof (i.e., defining items into measurable factors that could be measured empirically using visual perceptual observation). Participants who disagreed were asked to provide further details and suggestions for revision in open text boxes. In addition, participants were asked about the comprehensiveness of the preliminary measure and to identify missing items to capture underlying constructs fully. At the end of each Delphi survey, open-ended comment sections were available. Between Delphi rounds, participants received summarised findings on participants' characteristics, percentage agreement on definitions and items, and revisions made using their feedback (i.e., rewording definitions, revising, and adding or deleting items). Samples of all three Delphi rounds are presented in Supplementary File S1, providing further details on structure and content.

Analysis
Survey responses were analysed using the Statistical Package for the Social Sciences [25]. Criteria for agreed consensus were defined before the first Delphi survey using a five-point scale (i.e., strongly disagree, disagree, neither agree nor disagree, agree, strongly agree). Consensus between participants was achieved if at least 70% of respondents indicated 'Strongly agree' or 'Agree' for the formulation of definitions, ease of understanding items (comprehensibility), the importance of including items (relevance), operationalisation of items, and function testing [26,27]. Participants' responses to open-ended questions were analysed per item before rewording the original items or creating a new item based on participants' suggestions. As proposed by participants, responses to open-ended questions on the measure's comprehensiveness were grouped into themes where potential new items were identified based on the aggregated feedback. Overall, measure revisions were based on themes noted in most participants' comments, feedback supported by literature, and identified gaps or ambiguities in items. The total number of Delphi rounds was to be determined by the level of agreement following each round. When performing data analysis, the authors were blinded to the identity of the participants.
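The consensus criterion described above can be illustrated with a minimal sketch. It is not the authors' analysis (which was run in the Statistical Package for the Social Sciences); the function names and the example ratings are hypothetical, and the sketch assumes that the denominator is every respondent who rated the item, with 'Neither agree nor disagree' counted as non-agreement.

```python
# Five-point scale and consensus threshold as stated in the Analysis section.
SCALE = (
    "Strongly disagree",
    "Disagree",
    "Neither agree nor disagree",
    "Agree",
    "Strongly agree",
)
AGREE = {"Agree", "Strongly agree"}
THRESHOLD = 70.0  # percent; predefined before the first Delphi survey


def percent_agreement(ratings):
    """Percentage of ratings that are 'Agree' or 'Strongly agree'.

    Assumption: all supplied ratings form the denominator.
    """
    if not ratings:
        raise ValueError("no ratings supplied")
    unknown = set(ratings) - set(SCALE)
    if unknown:
        raise ValueError(f"ratings outside the five-point scale: {unknown}")
    agree = sum(1 for r in ratings if r in AGREE)
    return 100.0 * agree / len(ratings)


def has_consensus(ratings, threshold=THRESHOLD):
    """True if at least `threshold` percent of ratings indicate agreement."""
    return percent_agreement(ratings) >= threshold


# Hypothetical ratings for one item from ten respondents.
example = (
    ["Strongly agree"] * 5
    + ["Agree"] * 3
    + ["Neither agree nor disagree"]
    + ["Disagree"]
)
print(percent_agreement(example), has_consensus(example))
```

With these hypothetical ratings, eight of ten respondents agree, so the item would meet the 70% criterion.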

Delphi Process
In addition to the structure and content of Delphi rounds with examples (Supplementary File S1), a summarised overview of the Delphi process is outlined in Table 2. The first Delphi round included 21 items taken from a previous Delphi study [21]. An additional 11 items were defined based on other relevant literature and authors' expert opinions. As the first 21 items had achieved international consensus on definition and comprehensibility in the previous Delphi study by Swan et al. (2021) [21], participants were only asked to rate the relevance of these 21 items for the visuoperceptual evaluation of FEES recordings using a five-point ordinal scale (strongly disagree to strongly agree). For the remaining 11 items, participants rated both the relevance and the degree of agreement with each item's suggested definition and comprehensibility using similar ordinal scales. If participants disagreed or strongly disagreed, they were invited to comment in an open text box.
Fifteen items were included without requiring revisions, whereas seven were slightly reworded. The following six items were excluded as they were not considered relevant for FEES evaluation: 'Arytenoid tilting', 'Base of tongue to posterior pharyngeal wall approximation', 'Epiglottic return to rest position', 'Linguavelar seal', 'Tongue base activity', and 'Swallow initiation'. The relevance percentage scores for these six items ranged between 39.1% and 65.8%. The original item 'Piecemeal deglutition' and its corresponding definition were also deleted, but the item name was used to rename 'Clearing swallow (oral)'. 'Nasopharynx penetration' was the eighth item that was deleted as participants doubted its visibility in FEES. Two items ('Premature spillage [Liquids]' and 'Premature spillage [Other than liquids]') moved to the second Delphi round after renaming both item names (replacing 'Premature' with 'Posterior') and rewording their definitions. Further details can be found in Table 2 (Overview of the Delphi process) and Table 3 (Relevance of items to the visuoperceptual evaluation of FEES).
Table notes: (a) Items from [21] were rated for relevance for FEES only, as these items resulted from a previous Delphi study on visuoperceptual evaluation of videofluoroscopic evaluation of swallowing recordings, having achieved international consensus on definition and comprehensibility. (b) If 'Disagree' or 'Strongly disagree', changes can be suggested in comment boxes. (c) Consensus agreement is defined as ≥70% of participants rating 'Strongly agree' or 'Agree'. (d) Participants' feedback on definition and comprehensibility.

Delphi Round II
The second Delphi round consisted of three sections. The first section included the two items that were carried over from the first round ('Posterior spillage [Liquids]' and 'Posterior spillage [Other than liquids]'). Both items were accepted without further rewording. The second section targeted the comprehensiveness of the measure under development. Participants were asked to study a list of all included items from the previous Delphi round (n = 24, including both items from the first section) and identify any missing items relevant for visuoperceptual evaluation in FEES. Four new items were suggested ('Esophageal backflow', 'Symmetry', 'Pooling of secretions', and 'Respiratory rate and effort'). Based on participants' feedback, all four items and corresponding definitions were carried over to the third Delphi round for agreement ratings.
The final section focused on how to assess each item or aspect of the item (i.e., operationalisation) and, where applicable, the tasks that needed to be performed by patients (i.e., function testing) while observing and rating the item. For each operationalisation, examples of possible response scales were provided. Several operationalisations and function testing items were listed for participants' evaluation. For example, for the item 'Aspiration', the following three aspects of the item were presented (each with a different operationalisation): (1) 'Aspiration of material (e.g., liquids, solids)' operationalised by 'Volume of aspirated bolus (e.g., nil material, a small amount of material, a large amount of material)'; (2) 'Patient response to aspiration (i.e., an overt sign of aspiration, such as cough/throat-clear)' operationalised by 'Cough (e.g., immediate cough, late cough, no response)'; and (3) 'Success of ejecting aspirated bolus' operationalised by 'Success in ejecting material from the airway (e.g., complete clearing, incomplete clearing, nil clearing)'. For the item 'Pharyngeal constriction', both a 'Saliva swallow' and a 'High pitched strained 'eeee' (pharyngeal squeeze manoeuvre)' were suggested to test function. Participants rated their level of agreement with operationalisations and function testing using the same five-point ordinal scales as in previous Delphi sections, including the option for comments in case of disagreement (see Supplementary File S1).
Agreement ratings for function testing (n = 10) and operationalisations (n = 35) for all 24 items were determined by considering the comments listed by participants. This resulted in a total of eight function testing items and thirty-nine operationalisations being included in the third Delphi round. Similarly, based on participants' feedback, two items ('Vocal fold medialisation' and 'Laryngeal vestibule closure') were rephrased again and included in the third Delphi round.

Delphi Round III
The third Delphi round consisted of three sections. The first section asked participants to rate their agreement with both revised and renamed items from the second round. One revised item was accepted without need for rewording. However, even though the other item, 'Laryngeal vestibule closure', was considered relevant to the visuoperceptual evaluation of FEES, participants could still not agree on a definition after three Delphi rounds and thus it was excluded. The second section asked about the relevance and agreement on definitions for all four new items, as suggested in the second Delphi round. Participants agreed on including three new items, but the fourth item, 'Respiratory rate and effort', was excluded due to low relevance ratings. Further, using participants' feedback, the items 'Regurgitation' and 'Esophagopharyngeal backflow' were combined into a new item and renamed 'Esophageal backflow'.
The third section of this final Delphi round asked about participants' agreement with the scales used to operationalise the included items using the same five-point ordinal scales as used in previous rounds. For example, after participants agreed in round two to operationalise the item 'Bolus holding (to command)' by the presence of material in the pharynx, a three-point ordinal scale (i.e., Nil material present in the pharynx; <one-third of bolus present in pharynx; ≥one-third of bolus present in pharynx) was presented for agreement ratings in round three. In the end, 29 out of 39 operationalisations presented were accepted for inclusion in the prototype measure. Table 4 provides an overview of agreement ratings of function testing and operationalisations.

Final Prototype
This Delphi study resulted in the prototype measure, Visuoperceptual measure for Fiberoptic Endoscopic Evaluation of Swallowing (V-FEES). The final prototype measure comprises 30 items, 8 function testing items and 36 unique operationalisations (see Table 4). The final percentage agreement for the relevance of the included items to visuoperceptual evaluation of FEES is presented in Table 3: mean 86.7% (SD 9.85%). Table 5 provides an overview of the included items and percentage agreement with definitions (mean 84.0%; SD 6.71%), and Table 4 shows participants' agreement with function testing (mean 81.5%; SD 8.60%) and final operationalisations (mean 75.8%; SD 10.21%).
Table 5 (excerpt). White out: A flash of intense white glare at the maximal constriction of the swallow due to the decreased distance between pharyngeal tissue and the light source (agreement with definition 79.7%; Delphi round I).

Content Validity
This study is the first step towards developing and validating a visuoperceptual measure to evaluate fiberoptic endoscopic recordings of swallowing in adults with oropharyngeal dysphagia. To meet the COSMIN guidelines for content validity [23], an international Delphi study was conducted to seek agreement among dysphagia experts on definitions and items of a prototype measure and achieve consensus on function testing and operationalisations for the included items, covering all three aspects of content validity (i.e., relevance, comprehensibility and comprehensiveness).
An initial number of 64 dysphagia experts completed the first Delphi round, of whom 41 experts also completed the final, third Delphi round. As the COSMIN guidelines consider a minimum of 30 experts to support adequate and a minimum of 50 experts to support very good methodological quality for quantitative studies (e.g., Delphi study) [22], the current study meets the COSMIN standards for the required number of respondents. Further, most participants had completed a higher degree by research (92.3-93.7%), and most experts reported having over ten years of experience working with FEES in adult patients with dysphagia (73.4-82.9%) of whom between a quarter and a third of experts (23.4-34.6%) noted over 20 years of experience. Therefore, the Delphi participants represented a highly qualified and experienced dysphagia expert group.
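The sample-size reasoning above can be restated as a small sketch. The two thresholds (30 and 50 experts) and the participant counts (64 and 41) come from the text; the function name and the label returned for fewer than 30 experts are hypothetical, since the text does not state how COSMIN rates smaller samples.

```python
def cosmin_sample_rating(n_experts):
    """Classify a quantitative expert sample against the thresholds in the text.

    At least 50 experts: 'very good'; at least 30: 'adequate'.
    The label for smaller samples is a placeholder, not a COSMIN term.
    """
    if n_experts < 0:
        raise ValueError("number of experts cannot be negative")
    if n_experts >= 50:
        return "very good"
    if n_experts >= 30:
        return "adequate"
    return "below adequate"  # hypothetical label


first_round = 64  # experts completing the first Delphi round
final_round = 41  # experts also completing the third Delphi round

# Share of first-round participants who completed the final round.
retention = 100.0 * final_round / first_round

print(cosmin_sample_rating(first_round))  # very good
print(cosmin_sample_rating(final_round))  # adequate
print(round(retention, 1))                # 64.1
```

On these figures, the first round exceeds the threshold for very good methodological quality and the final round exceeds the threshold for adequate quality, with roughly 64% of first-round participants retained.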

Definitions and Items
The final prototype measure V-FEES includes 12 item definitions from a previously published international Delphi study [21], whereas consensus agreement was achieved for the remaining new definitions in the current study. Overall, relevance ratings were high (mean 86.7%), with seven items (i.e., 'Aspiration', 'Cough (reflexive)', 'Penetration', 'Pharyngeal residue', 'Pooling of secretions', 'Posterior spillage [Liquids]', and 'Silent aspiration') showing ratings above 95% (range 96.9-100%). Similarly, agreement on item definitions was high (mean 84%), with one item ('Esophageal backflow') achieving ratings above 95% (97.2%) after three Delphi rounds. For one item ('Laryngeal vestibule closure'), however, participants could not agree on a definition despite high relevance ratings (78.1%), after which the item was excluded from the prototype. For all other items, disagreements about terminology and phrasing were resolved within three Delphi rounds.

Function Testing and Operationalisations
Although participants agreed on function testing with minimal need for discussion, achieving consensus on the scales used to operationalise the included items was more challenging. Two recurrent topics for discussion remained unresolved. The first point of contention involved describing the location of residue or material in the pharynx. Participants disagreed on anatomical boundaries and reference scalars. As a compromise, the authors agreed that anatomical descriptors would be augmented by providing example pictures for each level of the location scales. The second point of contention involved how to report on volumes of material or bolus. The authors decided, based on participants' feedback, to retain three-point ordinal scales as suggested (e.g., item 'Pharyngeal residue': (1) no residue or minimal coating [none-trace]; (2) <one-third of bolus present in pharynx [mild-moderate]; (3) ≥one-third of bolus present in pharynx [severe]). During the next stage of instrument development, the implementability and reliability of volume scales will be evaluated.
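As an illustration only, the three-point 'Pharyngeal residue' scale quoted above could be encoded as follows. In V-FEES the rating is a visual judgement made by the observer, so the numeric `fraction_of_bolus` input and the `coating_only` flag are hypothetical stand-ins for that judgement; the text does not define a numeric boundary between 'minimal coating' and the second scale level.

```python
from enum import IntEnum


class ResidueLevel(IntEnum):
    """Three-point ordinal scale for 'Pharyngeal residue' as quoted in the text."""
    NONE_TRACE = 1      # no residue or minimal coating
    MILD_MODERATE = 2   # less than one-third of bolus present in pharynx
    SEVERE = 3          # one-third or more of bolus present in pharynx


def rate_pharyngeal_residue(fraction_of_bolus, coating_only=False):
    """Map a rater's estimate onto the three-point scale.

    fraction_of_bolus: hypothetical estimate (0.0 to 1.0) of the bolus
        remaining in the pharynx.
    coating_only: rater's judgement that only a minimal coating remains.
    """
    if not 0.0 <= fraction_of_bolus <= 1.0:
        raise ValueError("fraction_of_bolus must be between 0 and 1")
    if fraction_of_bolus == 0.0 or coating_only:
        return ResidueLevel.NONE_TRACE
    if fraction_of_bolus < 1 / 3:
        return ResidueLevel.MILD_MODERATE
    return ResidueLevel.SEVERE


print(rate_pharyngeal_residue(0.2).name)  # MILD_MODERATE
```

Using an ordered enumeration keeps the levels comparable (for example, `SEVERE > MILD_MODERATE`), which matches the ordinal nature of the scale without implying equal spacing between levels.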

Strengths and Limitations
An important strength of the current Delphi study is that it did not solely focus on participants' percentage agreement with definitions, items, function testing and operationalisations but emphasised incorporating opportunities for feedback and discussion. Participants were encouraged to provide the logic for their ratings and comment on the study in general after each Delphi round. The authors used these arguments and comments to make decisions about the wording of items and the conceptualisation of response options. In addition, participants were informed about the previous round's overall results between Delphi rounds, including revisions made based on experts' feedback. This approach ensured that participants' views and opinions were carefully considered in constructing the V-FEES.
However, even though this Delphi study represented many different geographical locations (i.e., 21 countries across 5 continents) and experts from various professional backgrounds, study outcomes were strongly influenced by the viewpoints of the included participants. Furthermore, participant dropout across Delphi rounds may impact results [24], even though the completion rate is considered to be within the expected rate for web-based survey studies [28,29]. Finally, this Delphi study is the first step towards developing the V-FEES. The current results do not address whether the items are valid or can be measured reliably. Future research will focus on determining the psychometric properties of the V-FEES.

Future Research
Achieving consensus among international experts on definitions and items of a prototype visuoperceptual measure for FEES recordings (V-FEES) supports the content validity of the newly developed measure. This constitutes an important milestone as content validity is considered a measure's most important psychometric property according to the COSMIN framework [22]. Future studies will trial the V-FEES in patients with dysphagia to determine its psychometric properties using both classic test theory (CTT) and item response theory (IRT; Rasch analyses).
The COSMIN framework will guide the psychometric evaluation of V-FEES. The internal structure of V-FEES will be defined by evaluating structural validity and internal consistency, after which reliability, measurement error, measurement invariance, and hypotheses testing for construct validity (e.g., convergent validity) will be assessed. Responsiveness will be evaluated by comparing pre- and post-treatment data and reporting on the measure's sensitivity to change. Because no internationally agreed 'gold standard' in visuoperceptual evaluation of FEES is available, criterion validity cannot be determined. Lastly, although not considered psychometric properties, the feasibility and interpretability of V-FEES in daily clinical practice will be evaluated by assigning qualitative meaning to quantitative scores. Following the COSMIN guidelines and using robust psychometric methodologies, V-FEES aims to be the first valid and reliable visuoperceptual measure for FEES based on international expert consensus.

Conclusions
This study has reported on the first steps towards validating a visuoperceptual measure to evaluate FEES recordings of swallowing in adults with oropharyngeal dysphagia by establishing the content validity of the V-FEES. Following COSMIN guidelines, an international Delphi study among dysphagia experts resulted in a new prototype measure comprising 30 items, 8 function testing items and 36 unique operationalisations. The findings from the current study support good content validity by incorporating participants' feedback on the relevance, comprehensiveness, and comprehensibility of included items. Following the instrument development process, future studies will determine the psychometric properties of V-FEES using both classic test theory (CTT) and item response theory (IRT).
Funding: This research received no external funding.

Institutional Review Board Statement: The study was conducted in accordance with the Declaration of Helsinki and approved by the Human Research Ethics Committee of Curtin University (protocol code HRE2021-0187; date of approval 21 April 2021).
Informed Consent Statement: Informed consent was obtained from all participants involved in the study.