Reliability and Responsiveness of a Novel Device to Evaluate Tongue Force

Background: Measurements of tongue force are important in clinical practice during both the diagnostic process and rehabilitation progress. It has been shown that patients with chronic temporomandibular disorders have less tongue strength than asymptomatic subjects. Currently, there are few devices on the market to measure tongue force, each with different limitations, which is why a new device has been developed to overcome them. The objectives of the study were to determine the intra- and inter-rater reliability and the responsiveness of a new low-cost device to evaluate tongue force in an asymptomatic population. Materials and Methods: Two examiners assessed the maximal tongue force in 26 asymptomatic subjects using a developed prototype of an Arduino device. Each examiner performed a total of eight measurements of tongue force in each subject. Each tongue direction (elevation, depression, right lateralization, and left lateralization) was measured twice in order to test the intrarater reliability. Results: The intrarater reliability using the new device was excellent for the measurements of the tongue force for up (ICC > 0.94), down (ICC > 0.93) and right (ICC > 0.92) movements, and good for the left movement (ICC > 0.82). The SEM and MDC values were below 0.98 and 2.30, respectively, for the intrarater reliability analysis. Regarding the inter-rater reliability, the ICC was excellent for measuring the tongue up movements (ICC = 0.94), and good for all the others (down ICC = 0.83; right ICC = 0.87; and left ICC = 0.81). The SEM and MDC values were below 1.29 and 3.01, respectively, for the inter-rater reliability. Conclusions: This study showed a good-to-excellent intra- and inter-rater reliability and good responsiveness in the new device to measure different directions of tongue force in an asymptomatic population. This could be a new, more accessible tool to consider and add to the assessment and treatment of different clinical conditions in which a deficit in tongue force could be found.


Introduction
The tongue is a muscle that is part of the stomatognathic system and plays an important role in phonation, breathing, and eating [1,2]. The tongue has been classified as a muscular hydrostat structure due to its ability to move and deform without a column of air or fluid.
The Oropress device is characterized as being portable, having low-cost sensors, and being able to capture pressure while swallowing food or fluids. Oropress demonstrated good-to-excellent ICC values (ICC = 0.86) for its reliability [17]. Nevertheless, only a pilot study has so far attempted to demonstrate its validity [17]. A bigger sample is needed to develop a good study of validity, and an intra- and inter-rater reliability study must also be carried out. This is very important to corroborate the safety, the psychometric properties, and the clinical utility of all these devices.
To overcome the limitations of existing devices and improve on their features, a validated, portable, handy, and lower-cost prototype device to measure tongue force was developed. The prototype has an intuitive interface, and it has been developed to assess and train tongue force in different movements, allowing its use not only by professionals, but also by patients for clinical and home rehabilitation. The software includes videogames with biofeedback for training at home which could increase the patient's adherence to the treatment [23]. The new device promotes patient independence in the rehabilitation process and reduces social and health care costs [24]. This new instrument, unlike the others developed up to now, proposes accurate assessments and future treatments based on gamification (Table 1). Moreover, compared to current tongue force instruments, this new device has already demonstrated good validity values and a high intrarater reliability, ensuring its safe use in the clinic [25]. Nevertheless, as a first step in the validation process, good inter-rater reliability for this device is also needed. Reliability is defined as the probability that a system, instrument, or device could perform a specific function in certain circumstances. It refers not only to the agreement but also the consistency between measurements. Moreover, random and systematic errors must be quantified to obtain reliability data and ensure accurate results. For this reason, devices must demonstrate good stability and reliability before their use or commercialization. This makes the device safer to use in a variety of clinical and research settings, by any type of user (professionals, patients, or patients' relatives), and under different environmental conditions.
Accordingly, this type of study should first be conducted in healthy subjects, both to protect vulnerable individuals and to ensure that the device is safe to use. After that, reliability studies must be performed in patients to demonstrate its clinical usefulness [26]. The main objectives of this study were to determine the intra- and inter-rater reliability and the minimum detectable change (responsiveness) in the maximum tongue force measurements using a newly developed device. The authors of the study wanted to demonstrate that this device could measure with the same reliability independently of the professional or patient who is using it. Since current commercial systems do not have enough evidence of their validity or inter-rater reliability and due to the high costs of technologies such as fluoroscopy, currently applied screening techniques are very subjective and depend on the training and experience of the therapist. This makes the resulting assessments less reliable. For this reason, demonstrating the reliability and sensitivity of our system would help in developing more objective assessments, regardless of the therapist performing the measurements.

Materials and Methods
An intra- and inter-rater reliability single-blind study with repeated measurements was conducted based on the guidelines for reporting reliability and agreement studies (GRRAS) [27]. This study was approved by the Ethics Committee from the Centro Superior de Estudios Universitarios La Salle (CSEULS) of the Universidad Autónoma de Madrid (project code: CSEULS-PI-036/2019). Subjects were recruited from the CSEULS of the Universidad Autónoma de Madrid. Participants were recruited through nonprobability sampling.

Subjects
A total of 26 asymptomatic subjects older than 18 participated in this study. The sample size was calculated based on the intraclass correlation coefficient (ICC) values obtained in previous studies [28][29][30][31]. An ICC of 0.90 was estimated based on the hypothesis. A sample of 26 subjects with 2 measurements per subject was needed to achieve 80% power (β = 0.2) to detect an ICC of 0.90, with a significance level of 0.05.
Subjects were excluded if they presented TMD, cancer, or an active infection of the neck/head/mouth, had a history of orofacial or cervical surgery, had temporomandibular/orofacial/cervical acute pain before or during the test, were undergoing physical therapy for the neck or craniofacial region, had more than 6 points out of 10 on the subjective perception of fatigue scale, or had neurological disorders or rheumatic systemic disorders.

Instrumentation
The new low-cost prototype device, introduced in a previous article [23], was specifically designed and developed to measure tongue force objectively and accurately. The device consists of a physical part and associated software. The physical part consists of a hardware system that measures the pressure exerted on a force-sensing resistor (FSR 402, Interlink Electronics Inc., Irvine, CA, USA) [32] and transmits the information with an Arduino UNO via a wired connection to a personal computer, where the software is located (Figure 1). This type of sensor is very thin and flexible and does not cause any discomfort to the patient. The software is responsible for processing and displaying the information in real time. In addition, the software facilitates the recording of the demographic information of the subjects and the information recorded by the sensor is stored in a database for the subsequent extraction of reports (Figure 2).
The interface has a user-centered design for ease of use in the clinical environment. The device can measure the pressure exerted on the sensor by placing it in different positions. Depending on the positioning of the sensor, it is possible to measure the force exerted in the following movements: lip to lip, tongue elevation (tongue against the anterior part of the hard palate), tongue depression (tongue against the jaw), right tongue lateralization (tongue against the right cheek), left tongue lateralization (tongue against the left cheek), and their combinations.

Procedure
Two experienced physical therapists with more than 3 years' experience working in the cervico-craniofacial area were trained on how to perform the maximum tongue force test and the whole intervention. The biomedical engineer who developed the device helped and trained both physical therapists on how to use it. The tongue force test was performed on each participant in a sitting position for the tongue movements mentioned above in Section 2.2. Two measurements of each tongue movement were performed by each rater. The GraphPad QuickCalcs website was used to randomize which assessor went first. The measurements were performed on the same day for both raters. Each rater was blinded to the other rater's measurements. The subjects and raters were not able to see the results between the 2 measurements performed for each movement.
A single-use hypoallergenic protective cover made of nitrile was used to cover the sensor during the measurements for each subject (Figure 3). The single-use protection was not changed during the whole test, only between different participants. The subjects were asked to sit with their back against the chair, feet on the ground, and head in its natural position. The tongue sensors were placed by the subjects following the instructions given by the rater according to the movement tested. During the maximum tongue force test, the subjects were then asked to exert the maximum tongue force against the sensor for 10 s. A 5 min resting period was used between each measurement. Firstly, for the lip-to-lip movement, the sensor was placed between the lips, not including the teeth. Secondly, the sensor was placed behind the superior incisors in the anterior part of the hard palate for the tongue elevation movement. Thirdly, the subjects placed the sensor behind the inferior incisors in the jaw for the tongue depression movement. Finally, right and left tongue lateralization movements were tested by placing the sensor in the anterior part of the right and left cheeks, respectively. The whole procedure is described in Figure 4.

Analysis and Sample Size
The sample size was calculated based on the intraclass correlation coefficient (ICC) values obtained in previous studies [28][29][30][31]. An ICC of 0.90 was estimated based on the hypothesis. A sample of 26 subjects with 2 measurements per subject was needed to achieve 80% power (β = 0.2) to detect an ICC of 0.90, with a significance level of 0.05.
The intraclass correlation coefficient and standard error of measurement (SEM) were used to calculate the reliability. The ICC3,1 was designated as the two-way analysis of variance mixed model for the absolute agreement of single measures. The ICC3,2 was designated the same way as the ICC3,1 but using the average of the two measures of each rater to determine the inter-rater reliability [33]. Intraclass correlation coefficient values greater than 0.75 indicate good reliability, those between 0.50 and 0.75 indicate moderate agreement, and those below 0.50 indicate poor agreement [33]. A 95% confidence interval (CI) was also calculated, and p < 0.05 was used as the level of statistical significance.
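As an illustration of how a two-way ICC is obtained from the analysis-of-variance mean squares, the following is a minimal sketch rather than the analysis code used in this study. It assumes the Shrout and Fleiss consistency forms, ICC(3,1) for single measures and ICC(3,k) for the average of k measures; an absolute-agreement variant, as named in the text, uses a different denominator. The function names and the example values are hypothetical.

```python
def _mean_squares(scores):
    """Return (MS_rows, MS_error, k) for an n_subjects x k_measures table."""
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between measures
    ss_error = ss_total - ss_rows - ss_cols                  # residual
    return ss_rows / (n - 1), ss_error / ((n - 1) * (k - 1)), k


def icc_3_1(scores):
    """Shrout-Fleiss ICC(3,1): two-way mixed model, consistency, single measure."""
    ms_rows, ms_error, k = _mean_squares(scores)
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


def icc_3_k(scores):
    """Shrout-Fleiss ICC(3,k): two-way mixed model, consistency, mean of k measures."""
    ms_rows, ms_error, _ = _mean_squares(scores)
    return (ms_rows - ms_error) / ms_rows


if __name__ == "__main__":
    # Hypothetical data: one row per subject, one column per repeated measurement.
    trials = [[10.2, 10.8], [7.5, 7.1], [12.0, 12.6], [9.3, 9.0], [11.1, 11.5]]
    print(round(icc_3_1(trials), 3), round(icc_3_k(trials), 3))
```

Each row holds one subject and each column one measurement (or one rater), so the same function can serve the intrarater and inter-rater layouts.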
Bland-Altman plots were constructed using mean differences between measurements [34]. Limits of agreement (LOA) were calculated as the mean difference ± 1.96 × the standard deviation of the differences [35]. The occurrence of systematic or random changes in the means of the data was assessed by calculating the 95% confidence intervals (CI) of the mean differences between the values of the measurements.
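The limits of agreement described above can be sketched as follows. This is a minimal illustration, assuming paired measurements and the sample standard deviation of their differences; the function name and the example values are hypothetical and are not data from this study.

```python
from statistics import mean, stdev


def bland_altman(first, second):
    """Return (bias, lower LOA, upper LOA) for two paired series of measurements."""
    diffs = [a - b for a, b in zip(first, second)]
    bias = mean(diffs)        # mean difference between the paired measurements
    sd = stdev(diffs)         # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


if __name__ == "__main__":
    # Hypothetical paired values for one tongue movement.
    rater_a = [10, 12, 11, 13]
    rater_b = [9, 12, 12, 11]
    bias, lower, upper = bland_altman(rater_a, rater_b)
    print(f"bias={bias:.2f}, LOA=({lower:.2f}, {upper:.2f})")
```

A bias close to 0 with narrow limits suggests that the two series agree closely; the 95% CI of the mean difference is a separate calculation not shown here.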
The responsiveness was determined with the minimal detectable change at the 90% confidence level (MDC90), which was calculated as SEM × 1.65 × √2 [36,37]. The MDC90 expresses the minimal change required to be 90% confident that the change observed between two measurements reflects a real change (sensitive measure) and not a measurement error.
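The MDC90 formula is simple enough to check by hand. The short sketch below applies it to the largest inter-rater SEM reported in this study (1.29); the function name is hypothetical.

```python
import math


def mdc90(sem):
    """Minimal detectable change at 90% confidence: SEM x 1.65 x sqrt(2)."""
    return sem * 1.65 * math.sqrt(2)


if __name__ == "__main__":
    print(round(mdc90(1.29), 2))  # 3.01
```

The result matches the largest inter-rater MDC reported alongside that SEM (3.01).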

Results
A total of 26 subjects were included in the reliability analysis (57.7% men and 42.3% women). The mean age of the sample was 25.69 years, with a standard deviation of 7.46 years. The body mass index was 26.1 (95% CI: 25.7-26.5) in men and 24.1 (95% CI: 23.7-24.7) in women. In addition, 63% of the participants had completed or were completing tertiary education. According to the Shapiro-Wilk test, the data were normally distributed (p > 0.05).

Intrarater Reliability Results
The descriptive data for intrarater reliability, ICC3,1, SEM, MDC90, and Bland-Altman analysis with the 95% CI and LOA are summarized in Table 2. Good-to-excellent intrarater reliability for all tongue movements was found for both raters (ICC3,1 ≥ 0.80). The SEM was <0.70 for rater A and <0.98 for rater B. The MDC was between 1.10 and 1.64 for rater A and between 0.96 and 2.30 for rater B.

Inter-Rater Reliability Results
The descriptive data for inter-rater reliability, ICC3,2, SEM, MDC, and Bland-Altman analysis with the 95% CI and LOA are summarized in Table 3. Good-to-excellent inter-rater reliability was found for all tongue movements (ICC3,2 ≥ 0.80). The SEM was <1.29. The MDC was between 1.20 and 3.01. Graphical representations of the Bland-Altman plot are shown in Figure 5.

Discussion
As far as the authors know, this is the first study evaluating the maximum tongue force in four different directions of tongue movement. According to the results, a good-to-excellent intra- and inter-rater reliability was found for all movements. The measurements were also responsive to detect real changes.
This was also the first study testing the reliability of a device with a force-sensitive resistor (FSR) sensor to measure the maximum tongue force. Although the MOST device is composed of the same type of sensor, its reliability has not been tested [20]. There is currently no gold standard for maximum tongue force outcome measurements. That is why the results from this study are compared with the devices that are often used in clinical practice and research.
The present study has demonstrated an excellent intrarater reliability for maximum tongue force measurements of the superior, inferior, and right tongue movements (ICC3,1 > 0.93) and a good intrarater reliability for measurements of the left tongue movement (ICC3,1 > 0.82). The measurements of the superior tongue movement obtained the highest ICC3,1 values (>0.95). These values were slightly greater than those found for the reliability measurements of tongue force in superior movements using the IOPI device, which ranged from 0.77 to 0.90 [38]. Likewise, better ICC values were obtained when compared to the study by White et al., who reported an excellent intrarater reliability for the KSW device in a healthy population (ICC = 0.92) [19]. In reference to the Oropress reliability results, similar ICC values were found (ICC = 0.86) when compared to the present study [17].
An excellent inter-rater reliability for measurements of the tongue force in elevation (ICC3,2 = 0.94) and a good inter-rater reliability for measurements of the tongue force in depression and right and left lateralization (ICC3,2 = 0.83, 0.87, and 0.81, respectively) were found in the current study. Youmans and Stierwalt (2006) obtained a 94% inter-rater agreement (r = 0.94) during the maximum isometric force measurement using the IOPI device [39]. The IOPI device is commonly used; however, it is only used to measure tongue force in one direction (tongue elevation). Additionally, the IOPI analysis protocols are different from the ones utilized in the present study. While the common IOPI protocol for analysis uses the highest value obtained during the three tests or the mean of the two best tests, the current study used the mean of the two measurements. Nevertheless, no standardized analysis protocol has been defined for any of these devices. Similarly to the IOPI device, the KSW instrument only measures superior tongue force movement and commonly collects the highest measure of the three tests performed.
The inter-rater reliability of tongue force measurements using the IOPI device in subjects with different conditions was reported to be good to excellent (ICC > 0.75) [22], with the exception of a study evaluating dysarthria patients in which a moderate reliability was found (ICC = 0.535) [22,38]. However, there are no recent studies available on the evaluation of the inter-rater reliability for the IOPI in healthy subjects, and the authors of this paper believe that this should be the first step prior to measurement and use in patients. Likewise, there is no inter-rater reliability research for measuring tongue force with the KSW device. The KSW device uses the same type of sensor as the IOPI, a silicon air-filled bulb. The main difference is that the KSW bulb is fixed to the palate, providing more stability and reliability. Probably due to its multiple functions, the KSW device is used more for research evaluating tongue force during swallowing. According to Fei et al., the KSW device is more reliable than the IOPI when evaluating tongue force during the function of swallowing [40].
Regarding responsiveness, the SEM values were low for elevation, depression, and right and left tongue movements (1.03, 1.09, 0.60, and 0.51, respectively). The MDC values were also low for elevation, depression, and right and left tongue movements: 2.40, 3.01, 1.43, and 1.20, respectively. Therefore, the new device was able to capture real change in tongue measures in all directions. Although we can assure good reliability and responsiveness for the device presented in this study in an asymptomatic population, we cannot guarantee the same findings in symptomatic subjects yet. Only one previous study determined the SEM and the MDC of the IOPI device in asymptomatic subjects [38]. That study estimated these values using standard deviation (SD), while the present study based the calculation on the root mean square (RMS) [40]. The SD was used to estimate the SEM, avoiding possible uncertainties due to the selected ICC type [35]. Therefore, the evaluation of the SEM varies between studies. Additionally, a Bland-Altman method was used to evaluate agreement, including the LOA. A good LOA was found, and the SEM, MDC, and LOA revealed a good level of concordance. These values are very important for the use of the device in clinical practice as they help ensure that any improvement in tongue force is due to the treatment rather than measurement errors.
This study demonstrated that the newly developed tongue force device is reliable for measuring the maximum tongue force in different directions both within and between raters. The new device overcomes some limitations of the tongue devices commonly used in the literature. This validated, safe, portable, and easy-to-use device can allow patients to perform tongue exercises at home, and the ability of the device to display the tongue activity in real time may increase their motivation to progress with their rehabilitation program. In addition to all these features, it is a low-cost instrument. Future studies are needed to test the tongue force device in both healthy subjects and patients. Additionally, future studies must include in silico/computational simulation to ensure that the force data used from the device is accommodated correctly [41].

Limitations
This study presents some limitations. Nonprobability sampling is always a limitation of a study. Ideally, a sufficiently large population would have been accessible for probability sampling. The reliability of the developed tongue force device was tested mainly on healthy young subjects (average age of 25.7 years) and, therefore, these results should be taken with caution when transferring them to other populations. Further studies should test the device in different age groups in order to generalize the results. Likewise, future studies should include subjects with different health conditions. The results showed a statistically significant difference in some values of the Bland-Altman plot. These differences are close to 0 and all mean difference values are below the MDC in all cases. This led us to assume those results are statistically significant but not clinically relevant. The minimal clinically important difference (MCID) should be evaluated in future studies. Likewise, the values of other populations must be established and validated in future studies as with any measurement device or questionnaire.

Clinical Implications
From a neurophysiological point of view, it is known that the cerebral cortex has areas where information (input and output) from the V (trigeminal nerve), VII (facial nerve), and XII (hypoglossal nerve) cranial nerves is integrated [42]. In this way, these cranial nerves control the muscles of mastication, facial gestures, and the tongue, respectively, in order to achieve the optimal functionality of the entire system during speech and mastication, among other functions. Additionally, we have already published an observational study which showed significant differences in the maximum tongue force between asymptomatic women and those with chronic temporomandibular disorder, corroborating the necessity for the assessment of the tongue force in this pathology [43]. In that article, a decrease in tongue strength of about 30% on average across all directions was found in the group of patients with chronic TMD. In line with this, clinical experience shows that many patients with TMD (especially the chronic type) have lingual alterations both in terms of flexibility (length) and strength in various directions.
This new device provides objective measurements of tongue force in clinical practice in order to help clinicians with the diagnosis process and treatment progression. This will give clinicians and patients real data to observe the changes during the treatment. Moreover, the new tongue force device has a diagnostic interface and a treatment interface with different games to train the force at home and in the clinic. This training with games may motivate the patients and increase the adherence to the treatment. The device is accessible to patients and clinicians, whereas the few devices available on the market are much more expensive. Moreover, its validity has been demonstrated in a recently published study [25]. This could be a new tool to consider and add for the assessment and treatment of these patients. Likewise, as a new tool in the treatment of TMD, it could decrease the social and health care costs that this condition imposes on the health system owing to its chronicity.

Conclusions
This study showed a good-to-excellent intra- and inter-rater reliability for the newly developed device to measure the maximum tongue force in four different directions in an asymptomatic population. The measurements with the new device were also able to detect real changes, suggesting a more sensitive measure (good responsiveness in the device). These results confirmed that the device is suitable for objective and precise tongue measurements independently of the subject that is using this tool. The new prototype device seems to be an improved tongue force measurement tool that is safe, validated, and more accessible than others on the market.

Institutional Review Board Statement:
The study was conducted in accordance with the guidelines for reporting reliability and agreement studies (GRRAS) [27]. This study was approved by the Ethics Committee from the Centro Superior de Estudios Universitarios La Salle (CSEULS) of the Universidad Autónoma de Madrid (project code: CSEULS-PI-036/2019). Subjects were recruited from the Centro Superior de Estudios Universitarios La Salle (CSEULS) of the Universidad Autónoma de Madrid.
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study. Written informed consent has been obtained from the patient(s) to publish this paper.
Data Availability Statement: Not applicable.