Measuring lip force by oral screens. Part 1: Importance of screen size and individual variability

Abstract To reduce drooling and facilitate food transport in rehabilitation of patients with oral motor dysfunction, lip force can be trained using an oral screen. Longitudinal studies evaluating the effect of training require objective methods. The aim of this study was to evaluate a method for measuring lip strength, to investigate normal values and fluctuation of lip force in healthy adults on 1 occasion and over time, to study how the size of the screen affects the force, to evaluate the most appropriate measure of reliability, and to identify force performed in relation to gender. Three different sizes of oral screens were used to measure the lip force for 24 healthy adults on 3 different occasions, during a period of 6 months, using an apparatus based on strain gauge. The maximum lip force as evaluated with this method depends on the area of the screen size. By calculating the projected area of the screen, the lip force could be normalized to an oral screen pressure quantity expressed in kPa, which can be used for comparing measurements from screens with different sizes. Both the mean value and standard deviation were shown to vary between individuals. The study showed no differences regarding gender and only small variation with age. Normal variation over time (months) may be up to 3 times greater than the standard error of measurement at a certain occasion. The lip force increases in relation to the projected area of the screen. No general standard deviation can be assigned to the method and all measurements should be analyzed individually based on oral screen pressure to compensate for different screen sizes.


| INTRODUCTION
Lip force is related to the ability of perioral musculature to produce adequate pressure to tightly close the lips and keep them closed. In the act of swallowing, blowing, sucking, chewing, and pronouncing vowels, the orbicularis oris, buccinators, and superior constrictor muscles function as a unit (Logemann, 1998;Perkins, Blanton, & Biggs, 1977). Lip force is of great importance to remove food from the spoon and to avoid leakage of food and liquid (Chigira, Omoto, Mukai, & Kaneko, 1994). Impaired lip force might cause drooling, retention of food in the vestibulum and affect the swallowing. Apart from being a considerable social handicap, this can be a severe and life-threatening complication, as aspiration of contaminated saliva in many cases results in pneumonia (Yoneyama et al., 2002). Decreased ability to eliminate food from the oral cavity due to oral muscular dysfunction increases the risk of developing caries. It has been shown that the severity of drooling is positively correlated to sugar clearance time (Gabre, Norrman, & Birkhed, 2005).
Drooling and leakage of food from the mouth make eating with friends and relatives an embarrassing and sometimes even traumatic experience (Axelsson, Norberg, & Asplund, 1984). Accidental biting of the lip and tongue is reported to be common in patients with poor oral motor function due to brain damage (Millwood & Fiske, 2001). Furthermore, lip closure is of great importance in articulation when producing bilabial sounds (Barlow & Rath, 1985). To rehabilitate patients with oral motor dysfunction, lip force can be trained using an oral screen, which is a curved shield made of acrylic with a handle. Several prefabricated oral screens of different sizes are available on the market. Training 2 to 3 times a day has been suggested (Thüer & Ingervall, 1990).

Hägg and Sjögreen used a handheld dynamometer, the Lip Force Meter LF 100, and prefabricated oral screens in different materials and sizes (Hägg, Olgarsson, & Anniko, 2008;Sjögreen, Lohmander, & Kiliaridis, 2011). Hägg found excellent intra-investigator reliability when testing both patients and controls. Control persons had a significantly stronger lip force than stroke patients when using a hard prefabricated oral screen. Using a soft oral screen, intraindividual variability was tested on healthy adults on two occasions (Sjögreen et al., 2011). The oral screens used in these studies were of different sizes, and thus it is not possible to compare the measured forces. To our knowledge, whether the size of the screen influences the measured force has not been investigated.
To evaluate whether a patient improves, fluctuations of lip force in healthy adults must be studied, both the variation within one measurement occasion and how the force may change over time. A prefabricated oral screen allows the test person to suck or squeeze during the measurement. Thus, it is uncertain whether the force being measured is produced by the perioral muscles alone or is a mixture of the forces created by sucking and squeezing. This aspect has not been taken into account in any studies. To obtain a reliable measurement of lip force, a method should be selected where the test person squeezes the oral screen without being able to suck.
To express the relative reliability of the measurement, the intraclass correlation coefficient (ICC) is a commonly used statistical method. However, a high ICC does not always indicate a small error of measurement in terms of absolute reliability (Atkinson & Nevill, 1998). The value is sensitive to the heterogeneity of the participants. Increasing heterogeneity, with a higher standard deviation between subjects and a similar error of measurement, will increase the ICC value, thus giving a false impression of accuracy (Atkinson & Nevill, 1998;Hopkins, 2000;Lexell & Downham, 2005). To determine the range of measurement error, the standard error of measurement (SEM) and the smallest real difference (SRD) should be explored (Beckerman et al., 2001;Lexell & Downham, 2005). A real improvement is shown if the strength increases by more than the SRD.
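The sensitivity of the ICC to heterogeneity can be illustrated with a minimal sketch. It assumes the simplified form ICC = σ_b² / (σ_b² + SEM²), where σ_b is the between-subject standard deviation; the function name and all numbers are hypothetical and are not taken from the study data.

```python
def icc(sd_between: float, sem: float) -> float:
    """Simplified ICC: between-subject variance divided by total variance."""
    return sd_between**2 / (sd_between**2 + sem**2)

SEM = 1.0  # identical measurement error in both groups (hypothetical value)

homogeneous = icc(sd_between=1.0, sem=SEM)    # subjects resemble each other
heterogeneous = icc(sd_between=3.0, sem=SEM)  # subjects differ widely

print(f"homogeneous group:   ICC = {homogeneous:.2f}")
print(f"heterogeneous group: ICC = {heterogeneous:.2f}")
```

With the same error of measurement, the more heterogeneous group yields the higher ICC, which is why the ICC alone says little about absolute reliability.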
The aims of this study were to 1. Study how lip force is affected by the size of the screen. 3. Identify force in relation to gender.

| MATERIALS AND METHODS
The Ethics Committee of the University of Gothenburg approved the study, (Dnr S43-96), and it was performed in accordance with the Declaration of Helsinki.

| Lip force
The Lip Force Meter LF 100 is an electronic instrument that measures the maximum lip force in newtons over a set period of 10 s. A wire is connected to a strain-gauge force transducer that senses forces from 0 to 250 N with a resolution of 1 N (0.4%). Calibration measurements before and after the test period showed that the uncorrected bias was within ±1 N.

| Subjects
Twenty-four healthy adults (12 males and 12 females; age range: 26-73 years) were recruited on a voluntary basis, and informed consent was obtained. The group was mainly composed of dental health personnel at the Public Dental Service. The test persons had ordinary morphology of the face, normal oral motor function, and normal occlusion. Two males and two females were recruited to each age group. The age groups were 20-29, 30-39, 40-49, 50-59, 60-69, and 70+.

| Oral screens
Three different sizes of oral screens (small, medium, and large) were made from plaster casts measuring 45 mm, 49 mm, and 56 mm between the buccal surfaces of teeth 15 and 25. The oral screens were made of acrylic and covered the oral vestibule in the front and back to the distal surfaces of the second premolars on each side. They were designed with a small hollow tube around the handle (Figure 1a,b). The tube allowed air to pass and thereby prevented suction.

| Projected area of the oral screen
The screen was placed on a piece of paper. By looking from a perpendicular direction, the parallel projected contour was identified and drawn on the paper. A reference area of known size was applied to the paper. The paper was scanned and analyzed in an image manipulation program (GIMP). The projected area of the small screen was 13.4 cm²; the medium, 15.5 cm²; and the large, 22.6 cm². The maximum error was estimated at 5% of the measured area.
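The area estimate from a scanned drawing reduces to a ratio of pixel counts against the reference area. The following is a minimal sketch of that arithmetic; the function name and the pixel counts are hypothetical, and the original analysis was done interactively in GIMP rather than in code.

```python
def projected_area_cm2(screen_pixels: int, reference_pixels: int,
                       reference_area_cm2: float) -> float:
    """Scale the screen's pixel count by the known reference area."""
    if reference_pixels <= 0:
        raise ValueError("reference_pixels must be positive")
    return screen_pixels * reference_area_cm2 / reference_pixels

# Hypothetical pixel counts for the screen contour and a 10 cm^2 reference.
area = projected_area_cm2(screen_pixels=53_600, reference_pixels=40_000,
                          reference_area_cm2=10.0)

# Bounds implied by the 5% maximum error stated in the text.
low, high = area * 0.95, area * 1.05
print(f"{area:.1f} cm^2 (range {low:.1f} to {high:.1f} cm^2)")
```

Because only the ratio of pixel counts matters, the scan resolution cancels out as long as the screen contour and the reference area are on the same scanned page.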

| Measurement procedure
The examiner demonstrated the measuring procedure and gave the verbal instruction: "Hold the oral screen in your mouth as firm as you can, while I pull it out." The screen was placed inside the lips. The wire was stretched perpendicular to an imaginary line between the nose and the chin of the test person, and the measurement was started. The examiner pulled the wire, gradually increasing the force until the oral screen was pulled loose. The procedure was repeated 3 times in succession for each screen.

| Data collection
The lip force was measured on 3 occasions during a period of 6 months, with 3 months between occasions. The participants were instructed not to perform any lip exercise during this period. The measurement procedure was carried out 3 times for each size of oral screen on each occasion. When changing from one size of oral screen to another, the test person rested for 2 min. For each of the 24 individuals, 27 measurements were carried out in total. There were 216 measurements for each screen and, in total, 648 measurements in the dataset. The same investigator made all measurements.

| Statistical analysis
The dataset was analyzed with SPSS and further processed in MS Excel. Calculations of the confidence limits for the standard deviations were based on the χ² distribution. Upper and lower bounds of the standard deviations at the 95% confidence level were calculated in MS Excel for different n values. Homogeneity of variances was tested in SPSS with Levene's test, and one-way analysis of variance (ANOVA) was used to compare means. The data were analyzed in SPSS for normality by the Shapiro-Wilk test. The difference between men and women was tested with Student's t test. One-way ANOVA was used to test variation over time for different individuals.
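The χ²-based confidence limits for a standard deviation can be sketched as multipliers of the sample SD: lower = sqrt(df / χ²(0.975, df)) and upper = sqrt(df / χ²(0.025, df)). The study used MS Excel for this step; the code below is an illustrative stand-in that uses only the Python standard library, with a series expansion and bisection as assumed substitutes for a spreadsheet's inverse χ² function.

```python
import math

def chi2_cdf(x: float, df: int) -> float:
    """Chi-square CDF via the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 0.0
    a, half_x = df / 2.0, x / 2.0
    term = total = 1.0
    n = 0
    while term > 1e-15 * total:
        n += 1
        term *= half_x / (a + n)
        total += term
    log_prefactor = a * math.log(half_x) - half_x - math.lgamma(a + 1.0)
    return total * math.exp(log_prefactor)

def chi2_quantile(p: float, df: int) -> float:
    """Invert the CDF by bisection on a bracket wide enough for 0.025 <= p <= 0.975."""
    lo, hi = 0.0, df + 10.0 * math.sqrt(2.0 * df) + 10.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def sd_limit_factors(df: int, level: float = 0.95) -> tuple[float, float]:
    """Return (lower, upper) multipliers to apply to a sample SD."""
    alpha = 1.0 - level
    lower = math.sqrt(df / chi2_quantile(1.0 - alpha / 2.0, df))
    upper = math.sqrt(df / chi2_quantile(alpha / 2.0, df))
    return lower, upper

for df in (215, 17):
    lower, upper = sd_limit_factors(df)
    print(f"df = {df}: lower = {lower:.2f}*SD, upper = {upper:.2f}*SD")
```

The limits are asymmetric around the SD, and the asymmetry grows as the degrees of freedom decrease.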
Measurements were divided among the three time groups, and an estimated standard deviation within the same occasion (SEM) was calculated from a one-way ANOVA as the square root of the mean square within groups (MSWG):

$$\mathrm{SEM} = \sqrt{\mathrm{MSWG}} \qquad (1)$$
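The SEM estimate can be sketched directly from its definition as the square root of the within-group mean square. The function name and the data below are hypothetical and chosen only to make the arithmetic easy to follow; the study itself used SPSS for the ANOVA.

```python
import math

def sem_from_groups(groups: list[list[float]]) -> float:
    """Square root of the within-group mean square (MSWG) of a one-way ANOVA."""
    n = sum(len(g) for g in groups)  # total number of measurements
    k = len(groups)                  # number of groups (occasions)
    ss_within = 0.0
    for g in groups:
        group_mean = sum(g) / len(g)
        ss_within += sum((x - group_mean) ** 2 for x in g)
    return math.sqrt(ss_within / (n - k))

# Hypothetical values (kPa) for one subject: 3 occasions with 6 measurements each.
occasions = [
    [13.0, 14.0, 15.0, 13.0, 14.0, 15.0],
    [15.0, 16.0, 17.0, 15.0, 16.0, 17.0],
    [14.0, 15.0, 16.0, 14.0, 15.0, 16.0],
]
sem = sem_from_groups(occasions)
print(f"SEM = {sem:.2f} kPa")
```

Shifts of the mean between occasions do not enter this estimate, because each measurement is compared only with the mean of its own occasion.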
A normalized quantity SEM% can be calculated from the relation

$$\mathrm{SEM\%} = 100 \cdot \frac{\mathrm{SEM}}{\mathrm{mean}} \qquad (2)$$

where mean is the mean of all measurements. In order to analyze the magnitude of changes with time, a relative mean value change $d_{i1}$ was calculated.

A 95% confidence level of significant difference between two measurements is often calculated according to the following relation (Beckerman et al., 2001):

$$\mathrm{SRD} = 1.96 \cdot \sqrt{2} \cdot \mathrm{SEM}$$

In our case, we calculate the mean $x_i$ at every occasion from m = 6 measurements. With k = 3 occasions, we get in total n = mk = 18 measurements for each individual. A 95% confidence level of significant difference between two means could then be calculated as

$$\mathrm{SRD}_{\mathrm{mean}} = 1.96 \cdot \sqrt{2} \cdot \frac{\mathrm{SEM}}{\sqrt{m}}$$

However, from ANOVA the calculation of the SEM value is based on a limited number of measurements with df = n − k degrees of freedom. We must then introduce the t statistic for a more accurate calculation of $\mathrm{SRD}_{\mathrm{mean}}$, giving

$$\mathrm{SRD}_{\mathrm{mean}} = t_{.975,\,df} \cdot \sqrt{2} \cdot \frac{\mathrm{SEM}}{\sqrt{m}}$$

where $t_{.975,\,df}$ is the value of the t statistic with cumulative probability .975 and df degrees of freedom. In our case, we get $t_{.975,\,15} = 2.13$ and

$$\mathrm{SRD}_{\mathrm{mean}} = 2.13 \cdot \sqrt{2/6} \cdot \mathrm{SEM} \approx 1.23 \cdot \mathrm{SEM}$$

| RESULTS

| Screen size
An overall picture of the whole dataset is given in Figure 2, where the data are divided between the three different screen sizes. The mean value and standard deviation for single measurements differ between the screen sizes. Error bars show the 95% confidence limits for the measured parameters. Calculation of the 95% confidence limits for the standard deviations is based on the χ² distribution (df = 215, lower limit 0.91·SD, upper limit 1.10·SD). The mean value of lip force varies significantly with screen size. From Levene's test, it was concluded that the variances were significantly different, F(2, 645) = 16.1, p < .001. By dividing lip force by the projected area of the screen, a normalized value can be obtained which is independent of the screen size. The new parameter, the oral screen pressure (OSP), has the dimension of pressure, that is,

$$\mathrm{OSP} = \frac{F}{A}$$

where F is the measured lip force and A is the projected area of the screen. After analyzing the OSP data with Levene's test, it was found that the variances were not significantly different, F(2, 645) = 1.06, p = .346.
The mean value for the smallest screen was significantly smaller than the mean values from the medium and large screens (Table 1). However, the mean value did not differ significantly between the medium and large screens. The data were analyzed in SPSS for normality by the Shapiro-Wilk test. The measurements might be normally distributed, since the p values in the Shapiro-Wilk test are greater than .05. However, for the small screen, a deviation from a normal distribution is seen.
From these results, it was concluded that measurements from the medium and large screen could be combined in order to analyze individual variability. However, measurements from the small screen could be biased with a small systematic error. Thus, the small screen values were excluded from further variability analyses. The total number of measurements is N = 18 for each individual in the following analyses.
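The normalization used in this section is a unit conversion: force in newtons divided by projected area in cm², multiplied by 10 to obtain kPa (1 N/cm² = 10 kPa). A minimal sketch follows; the areas are those reported for the three screens, while the function name and the force reading are hypothetical.

```python
# Projected areas of the three screens in cm^2, as reported in the Methods.
AREAS_CM2 = {"small": 13.4, "medium": 15.5, "large": 22.6}

def osp_kpa(force_n: float, screen: str) -> float:
    """Oral screen pressure: force (N) over projected area, converted to kPa."""
    return force_n / AREAS_CM2[screen] * 10.0  # 1 N/cm^2 = 10 kPa

# Hypothetical force reading, chosen only to show the conversion.
example = osp_kpa(31.0, "medium")
print(f"{example:.1f} kPa")
```

The same force reading gives a lower OSP on a larger screen, which is what allows measurements from different screen sizes to be placed on a common scale.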

| Differences between individuals
In Figure 3, OSP standard deviations for single measurements are shown versus mean OSP for each subject. Calculation of the confidence limits for the standard deviations was based on the χ² distribution (df = 17, lower limit 0.75·SD, upper limit 1.50·SD).

| Variation associated with gender
The OSP values for women were 13.7 ± 3.5 kPa and 14.7 ± 2.3 kPa for men (mean ± SD). An independent samples t test showed no significant difference in mean value between men and women, t(22) = −0.88, p = .39.
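The gender comparison can be approximated from the summary statistics alone with a pooled-variance Student's t test. The sketch below assumes 12 subjects per group, as stated in the Methods; the function name is hypothetical. Because the inputs are rounded means and standard deviations, the result only approximates the reported t(22) = −0.88.

```python
import math

def pooled_t(mean_a: float, sd_a: float, n_a: int,
             mean_b: float, sd_b: float, n_b: int) -> tuple[float, int]:
    """Student's t statistic and df for two independent samples, pooled variance."""
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / df
    se = math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    return (mean_a - mean_b) / se, df

# Rounded summary values from the text: women first, then men (kPa).
t, df = pooled_t(13.7, 3.5, 12, 14.7, 2.3, 12)
print(f"t({df}) = {t:.2f}")
```

The absolute value of t is well below the two-sided critical value of about 2.07 for 22 degrees of freedom, in line with the nonsignificant result reported.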

| Variation over time
A one-way ANOVA analysis was carried out for each subject in order to investigate possible significant changes in the mean value with time.
From Levene's test, it was found that for all individuals except two, the variances were not significantly different at the three different occasions. Figure 4 shows a plot of the estimated SEM value versus the mean value of OSP for each subject. Calculation of the confidence limits for the SEM value was based on the χ² distribution (df = 15, lower limit 0.73·SD, upper limit 1.58·SD). The magnitude of the SEM value is lower than the previous standard deviation in Figure 3, since the spread due to measurements at different occasions is now eliminated. However, there is still a wide spread in SEM values among the subjects, indicating that this is an individual parameter. Individual parameter data are summarized in Table 2. Here, it can be seen that both mean values and SEM values may be normally distributed, because the Shapiro-Wilk test gave p values greater than .05. The data in Figure 4 and Table 2 may be converted into values of SEM% (Equation 2). The SEM% values were found to be in the range 4%-14% with a mean of 8.6%.
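The conversion of an individual SEM into SEM% and into a smallest real difference between two occasion means can be sketched as follows. The SRD form used here, t · sqrt(2) · SEM / sqrt(m), is an assumption based on the Beckerman et al. relation adapted to means of m = 6 measurements, with the t value of 2.13 for 15 degrees of freedom quoted in the statistical analysis. The function names and the subject's values are hypothetical.

```python
import math

T_975_DF15 = 2.13  # t value for df = 15 quoted in the statistical analysis

def sem_percent(sem: float, mean: float) -> float:
    """SEM expressed as a percentage of the subject's mean."""
    return 100.0 * sem / mean

def srd_mean(sem: float, m: int = 6, t: float = T_975_DF15) -> float:
    """Assumed form: smallest real difference between two means of m measurements."""
    return t * math.sqrt(2.0 / m) * sem

# Hypothetical subject: mean OSP 14.0 kPa, SEM 1.2 kPa.
pct = sem_percent(1.2, 14.0)
srd = srd_mean(1.2)
print(f"SEM% = {pct:.1f}%, SRD_mean = {srd:.2f} kPa")
```

Under these assumptions, a change between two occasion means would need to exceed roughly 1.23 times the subject's own SEM to count as a real difference.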

| Gender
As in other studies, no significant difference regarding sex has been found (Sjögreen et al., 2011). A possible variation associated with gender (around 1.0 kPa) is small compared to the individual variability (standard deviation 2.9 kPa).

| Further studies
Oral screens that do not allow suction to be mixed with squeezing were used in this study. There is a great need for studies to clarify the difference between measuring with and without suction.

| CONCLUSION
1. The maximum lip force depends on the projected area of the screen. By evaluating the projected area of the screen, lip force could be normalized to an OSP quantity that can be used for comparing measurements from screens with different sizes.
2. Both the mean value and standard deviation for single measurements were shown to vary between individuals. Therefore, no general standard deviation measure can be assigned to the method and all measurements should be analyzed individually.
3. For a particular individual, longitudinal data can be analyzed by variance analysis (ANOVA).

4. Normal variation over time (months) may be up to 4 times greater than the SEM at a certain occasion.
5. No significant relation to gender was found.