Reliability analyses and values of isometric shoulder flexion and trunk extension strengths stratified by body mass index

The main goal of this study was to investigate the reliability of muscle strength across different levels of obesity. A sample of 142 healthy subjects performed maximum voluntary isometric contractions for shoulder flexion and trunk extension on each of four days. Subjects were recruited into one of three groups, non-obese, overweight, or obese, based on body mass index (BMI). Reliability of the strength measurements within each session and across the four sessions was determined from the intraclass correlation coefficient, coefficient of repeatability, coefficient of variation, and standard error of measurement. For the shoulder flexion measures, the coefficient of variation was < 10% and intraclass correlation coefficient was > 0.75. The absolute reliability of trunk extension strength measurement was rejected due to a high variability across sessions. For both tasks, comparable strengths across the BMI groups were found.


Introduction
A shift in the adult population to include ~1.9 billion obese (body mass index (BMI) ≥ 30 kg/m²) and overweight (25 ≤ BMI < 30 kg/m²) adults worldwide has had the negative consequence of increased risk of injuries with obesity [1]. Increasing BMI has been shown to be associated with impaired back extensor muscle function [2,3] and shoulder pain complaints [4] during physical resistance tests. In a meta-analysis of 33 publications in the Medline and Embase databases, overweight and obesity have been linked with chronic low back pain [5]. Sedentary lifestyle and lower physical activity among individuals who are obese may contribute to weaker muscle strength. However, this negative effect can be counterbalanced with a favorable chronic weight-bearing effect as a result of increased body mass [6].
Mixed findings have been reported in the literature on the effect of obesity on muscle strength. Most studies have reported a higher absolute isometric strength of the trunk and lower extremity muscles [7-11], as well as upper extremity muscles [7,12-16], with obesity. Interestingly, some of these studies have also reported conflicting results. For example, contrary to findings reported by Hulens et al. [8] and Rolland et al. [9], Kitagawa [16] reported no strength differences of the trunk and knee muscles. This raises concerns about sources of variability such as experimental protocol, equipment settings and calibration, postures, experimenters, and subjects. To guard against various sources of error, reporting the reliability or repeatability of the strength measurements is critical [17], yet this has been neglected in previous studies of strength differences with obesity. Interpretability of strength results in product and workstation design, rehabilitation decision-making, and ergonomics practice depends on the reliability of the measurements, particularly when participants with a wider range of BMI are included, as this increases the between-subject variability. Within-subject variability, such as biological changes due to physical or mental fatigue and learning effects, could also affect the results [18], and this effect can vary between subject groups. For example, lack of motivation, mental fatigue, or impaired central activation for maximal muscle contractions have been reported to be more apparent after sustained isometric contractions to exhaustion for individuals who are obese [19,20]. Therefore, a reasonable number of subjects and trials is needed to achieve higher inter- and intra-individual reliability of strength measures [18].
During repeated measurements, relative reliability reflects the degree of maintaining each individual's rank relative to a sample of subjects, and absolute reliability shows the amount of within-subject variability [21,22]. As suggested by Bland and Altman [23] and Rankin and Stokes [24], as no single measure sufficiently covers both relative and absolute reliability, multiple reliability metrics are needed for accurate assessment. In this study, maximum voluntary isometric contractions (MVCs) were repeated across four sessions, each on a different day, in a large heterogeneous (wide range of BMI) sample of subjects, which enabled measuring the relative and absolute reliabilities. Verifying the reliability in this study enables generalization of the findings more reliably to other conditions and individuals. Therefore, the main objective of this study was to test whether maximum muscle strength is reliable across repeated measurements across multiple sessions and across different levels of obesity. Two tasks of shoulder flexion and trunk extension were selected for the purpose of strength measurement. These muscle groups were selected for the high frequency of shoulder use during activities of daily living and the role of trunk extensor strength in supporting the more anterior center of mass, due to accumulated fat tissues around the abdomen, for individuals who are obese. Secondary objectives were to capture strength differences across subject groups and to identify significant predictors of strength. It was hypothesized that strength is affected by obesity (classified either using BMI or percent body fat). It was also hypothesized that trunk extensor strength would exhibit a greater influence of obesity due to the greater accumulation of fat tissue in the abdominal area.

Subjects and ethical approval
As part of a larger study on functional capacity changes related to obesity, 142 healthy subjects were recruited. Participants were assigned to three groups: 49 (24 males, 25 females) normal weight (18 ≤ BMI < 25 kg/m²), 50 (25 males, 25 females) overweight (25 ≤ BMI < 30 kg/m²) and 43 (22 males, 21 females) obese (BMI ≥ 30 kg/m²). Based on an acceptable reliability (p0) of 0.6 and an expected reliability (p) of 0.8, with α = 0.05, power = 0.8, and number of observations = 4, the target sample size per group was 22. All participants completed demographic, health history, and physical activity questionnaires prior to the experiment. Only healthy individuals who did not perform extensive physical activity more than one hour per day up to three days per week were included in this experiment. Detailed demographic and anthropometric information for the participants, divided by group, is provided in Table 1. As described in Mehta and Cavuoto [25], participants' heights were measured with a standard stadiometer to the nearest 0.1 cm and weights with a digital metric scale to the nearest 0.1 kg. Waist and hip circumferences were measured to the nearest 0.1 cm using a standard flexible, inelastic measuring tape. Body fat percentage (%BF) was measured with an electronic impedance scale (TANITA Corporation, Tokyo, Japan). All study protocols were approved by the University at Buffalo and Texas A&M University Institutional Review Boards and all participants provided informed consent prior to participation. The individual pictured in this manuscript was a research assistant who recreated the postures adopted and has given written informed consent (as outlined in PLOS consent form) to publish this image.
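The per-group target can be checked with a short calculation. The source does not name the method behind it, so the sketch below assumes the Walter, Eliasziw and Donner approximation for reliability studies with a one-sided α, a common choice for this kind of design. The function and argument names are illustrative only.

```python
import math
from statistics import NormalDist

def icc_sample_size(rho0, rho1, k, alpha=0.05, power=0.80):
    """Approximate number of subjects needed to show ICC > rho0 when the
    true ICC is rho1, with k repeated observations per subject.

    Assumption: Walter-Eliasziw-Donner approximation, one-sided alpha.
    The source does not state which formula was actually used.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)   # one-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    theta0 = rho0 / (1 - rho0)                  # variance ratio under H0
    theta1 = rho1 / (1 - rho1)                  # variance ratio under H1
    c0 = (1 + k * theta0) / (1 + k * theta1)
    n = 1 + (2 * (z_alpha + z_beta) ** 2 * k) / (math.log(c0) ** 2 * (k - 1))
    return math.ceil(n)

# Values taken from the text: acceptable 0.6, expected 0.8, four observations.
n_per_group = icc_sample_size(rho0=0.6, rho1=0.8, k=4)
print(n_per_group)
```

Under these assumptions the calculation returns 22 subjects per group, in line with the stated target.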

Experimental setup and protocol
Three MVCs each of shoulder flexion and trunk extension were repeated across four sessions. Sessions were separated by at least 48 hours to minimize the effect of any residual fatigue on performance, and task order was counterbalanced across subjects. The timing of sessions was based on participant availability, and thus could not be controlled across participants. Effort was made to schedule each participant at the same time of day for the four sessions, but this was not controlled. To reduce the sources of variability related to lack of motivation and inability to fully activate motor neurons, the MVCs were repeated three times and the maximum of the three was selected as the measure of muscle strength for each subject in each session. Reproducibility or repeatability of the observed values was tested across sessions. In each session, participants warmed up before each task with repeated shoulder and trunk flexion, extension, and rotation. The three isometric MVCs, each 4-5 seconds long with two minutes of rest in between, were then performed, based on standard strength testing procedures [26]. Real-time visual feedback and verbal encouragement were provided during the trials. The shoulder flexion and trunk extension tasks were separated by at least 10 minutes.
An isokinetic dynamometer (Cybex Humac NORM, Ronkonkoma, NY, USA) was used to measure the shoulder flexion and trunk extension torque at a rate of 100 Hz. Shoulder flexion of the right arm was tested with participants lying supine on the dynamometer chair with a seat belt around the pelvis and the arm flexed at 90° with the elbow extended (Fig 1A). The dynamometer's axis of rotation and shoulder adaptor height were set with respect to the acromion process and arm length, respectively. The forearm was kept parallel to the shoulder adaptor, grasping a handle with the wrist in a neutral position. Participants were allowed to release the handle during the rest periods. For trunk extension, participants stood upright on the dynamometer footplate with a slightly flexed (< 5°) trunk against the sacral pad (Fig 1B). The dynamometer's axis of rotation was aligned based on the iliac crest and L5/S1 location. The scapular and chest pads were fastened in parallel across the center of the scapulae and against the subject's chest. The feet were placed in a fixed position against footplate heel cups separated at about shoulder width. The thigh pad, tibial pad, and pelvic belt were attached to firmly secure the lower body, minimizing the confounding effect of other muscles during trunk extension.

Measures and statistical analysis
Due to differences in strength by gender, and a hypothesized interaction between obesity level and gender on strength, all analyses were conducted separately for males and females. For each task, the level of absolute agreement across the four MVCs for each person was evaluated using the intraclass correlation coefficient (ICC). ICC estimates and their 95% confidence intervals were calculated based on a single measure (k = 1), absolute agreement, two-way random effects model. k = 1 was chosen since strength testing is typically performed on a single day with the intent of using that value as the representative strength for an individual. The ICCs, as a measure of relative reliability, were calculated for the overall sample of subjects as well as for each of the three groups separately to evaluate between-group differences in reliability [27,28]. A high agreement (ICC > 0.9) indicates excellent reliability of repeated MVCs as a measure of strength for each person within each task [21]. The interpretation for the ICC was based on the benchmarks suggested by Landis and Koch [29] and updated by Shrout [30]: values between 0.4 and 0.6 were considered fair, 0.61-0.8 moderate, and 0.81-1 substantial. Since the ICC is a relative measure of reliability and cannot be used to assess trial-to-trial differences, additional absolute indices of reliability were calculated [22,27], including the coefficient of repeatability (CR), standard error of measurement (SEM), and coefficient of variation (CV) with its 95% confidence interval [17,31]. CR, suggested by Bland and Altman [23] for the case of repeated measures, was calculated as 2.77 × s_w, where s_w is the within-subject standard deviation from the one-way analysis of variance (ANOVA) with subject as the factor. The mean CV was calculated as the average of the within-subject standard deviations divided by the within-subject means.
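The indices described so far can be computed from a subjects-by-sessions table of peak torques. The following is a minimal sketch, not the SPSS procedure used in the study: it assumes the standard mean-square form of the single-measure, absolute-agreement, two-way random effects ICC, and the torque values and function name are hypothetical.

```python
from statistics import mean, stdev

def reliability_indices(scores):
    """scores: one row per subject, one column per session (e.g. torque in Nm)."""
    n = len(scores)        # number of subjects
    k = len(scores[0])     # number of sessions
    grand = mean([x for row in scores for x in row])
    row_means = [mean(row) for row in scores]
    col_means = [mean([row[j] for row in scores]) for j in range(k)]

    # Two-way ANOVA sums of squares (subjects x sessions, one value per cell).
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    # Single-measure, absolute-agreement, two-way random effects ICC.
    icc = (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )

    # Within-subject SD from a one-way ANOVA with subject as the factor.
    s_w = ((ss_total - ss_rows) / (n * (k - 1))) ** 0.5
    cr = 2.77 * s_w                                     # coefficient of repeatability
    cv = 100 * mean([stdev(row) / mean(row) for row in scores])  # mean CV, percent
    return {"icc": icc, "s_w": s_w, "cr": cr, "cv_percent": cv}

# Hypothetical shoulder flexion torques (Nm): five subjects, four sessions.
example = [
    [48.0, 50.5, 49.0, 51.0],
    [62.0, 60.0, 63.5, 61.0],
    [35.5, 37.0, 36.0, 38.5],
    [55.0, 52.5, 54.0, 56.5],
    [41.0, 43.5, 42.0, 40.5],
]
print(reliability_indices(example))
```

CR is in the units of the measurement (Nm here), whereas the ICC and CV are unitless, which is why the indices are reported side by side rather than substituted for one another.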
Variability less than 10% of the mean (CV < 10%) indicates that about 68% of the measurements are within 10% of the mean [32]. SEM was calculated overall and by group using the relevant ICC and the standard deviation (SD) of the sample strength measures (SEM = SD × √(1 − ICC)). SEM eliminates the effect of inter-individual variability captured in the ICC [17]. The SEMs were then used to provide 95% CIs for the true strength measurements, quantified as strength ± 1.96 × SEM, for each task and the overall population. In addition, the minimum detectable change (MDC) was calculated as SEM × 1.96 × √2 to show the minimum change in performance that can be considered real [27].
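A short worked example makes the relation between these quantities concrete. The sketch below assumes the common form SEM = SD × √(1 − ICC), which matches the inputs named above (the ICC and the sample SD); the SD, ICC, and observed torque are hypothetical and are not values from Table 2 or Table 3.

```python
import math

def sem_and_mdc(sd, icc, z=1.96):
    """Standard error of measurement and minimum detectable change.

    Assumes SEM = SD * sqrt(1 - ICC) and MDC = SEM * z * sqrt(2).
    """
    sem = sd * math.sqrt(1 - icc)
    mdc = sem * z * math.sqrt(2)
    return sem, mdc

def true_score_ci(observed, sem, z=1.96):
    """95% confidence interval for the true strength given one observed value."""
    return observed - z * sem, observed + z * sem

# Hypothetical inputs: sample SD of 10 Nm and ICC of 0.84.
sem, mdc = sem_and_mdc(sd=10.0, icc=0.84)
low, high = true_score_ci(observed=50.0, sem=sem)
print(f"SEM = {sem:.2f} Nm, MDC = {mdc:.2f} Nm, 95% CI = ({low:.2f}, {high:.2f}) Nm")
```

With these inputs the SEM is 4 Nm and the MDC is about 11.1 Nm, so a retest difference smaller than roughly 11 Nm could not be distinguished from measurement error at the 95% level.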
Linear regression analysis was performed for each task in order to find the linear trend of strength changes with age and either increasing BMI or percent body fat (%BF). The strength data were then used to provide the percentile values of strength for each task overall as well as for each group. The variability of the strength in the submaximal percentiles commonly used in workstation, task, and product design (i.e., 5th, 10th, 25th, etc.) was also considered. All statistical analyses were conducted in SPSS Ver. 24 (IBM Corporation) and the level of significance was set at α = 0.05.
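Design percentiles of the kind described here can be derived directly from the per-subject strength values. This is a minimal sketch using Python's standard library rather than SPSS; the interpolation rule is the library default and may differ slightly from the one SPSS applies, and the function name and torque values are hypothetical.

```python
from statistics import quantiles

def strength_percentiles(torques, wanted=(5, 10, 25, 50, 75, 90, 95)):
    """Return the requested percentiles of a list of strength values."""
    # n=100 yields 99 cut points; cuts[p - 1] is the p-th percentile.
    cuts = quantiles(torques, n=100)
    return {p: cuts[p - 1] for p in wanted}

# Hypothetical peak torques (Nm) for one group.
group_torques = [38.0, 41.5, 44.0, 46.5, 47.0, 49.5, 50.0, 52.5, 55.0, 58.5, 61.0, 66.0]
for p, value in strength_percentiles(group_torques).items():
    print(f"{p}th percentile: {value:.1f} Nm")
```

With small groups the extreme percentiles (5th, 95th) rest on very few observations, which is one reason their variability deserves separate attention in design applications.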

Reliability of strength measurements
The ICC and CV with 95% CIs, CR, and SEM results by gender, overall and by BMI group, are presented in Table 2. Overall strength measurement across sessions had substantial reliability as indicated by the ICC values exceeding 0.8 for both genders and tasks. If the average measure was used, these ICCs would be expected to increase. When divided by BMI group, reliability remained good, with all group-level ICCs > 0.7. The lowest mean ICC was for the obese female group for trunk extension. The lower bound for the 95% confidence interval was above 0.6 in all but one case (obese female for trunk extension). Therefore, MVC measurement from a Measures of strength for the shoulder flexion task were the most reliable, which is evident from the lower SEM and CR values. Generally, higher variability was observed for the trunk extension compared to the shoulder flexion task, and for males compared to females. For females, the normal weight group had lower variability between the tasks compared to the overweight and obese groups across all measures, with the overweight group having the largest CR value and the obese group having the largest SEM and CV values. For males, the pattern of the overweight group having the largest CR remained for both tasks. The overweight male group also had larger CV and SEM values for the trunk extension task, but not for shoulder flexion.
Additionally, mean, 95% CI of the true strength estimates, and MDC for the overall sample of subjects by gender and for each task are presented in Table 3. MDCs for females are 15.1% and 25.6% of the mean for shoulder flexion and trunk extension tasks, respectively. For males they are 14.3% and 26.9%, respectively.

Effect of obesity on strength
Regression analyses revealed no significant associations between BMI and strength or body fat percentage and strength for shoulder flexion and trunk extension strengths. Age was not significant in the regression models either. For practical purposes, strength percentiles of each task, overall and for each obesity group, are provided in Table 4. Fig 2 presents boxplots of the strength outcomes and the variability across participants.

Reliability of strength measurements
Absolute and relative reliabilities of the shoulder flexion strength measures were supported by agreement between the measurements (ICCs > 0.75), variabilities about the means < 10%, and repeatability coefficients < 15 for females and < 20 for males [31,32]. These results are consistent with those for healthy young adults, including those for isometric elbow flexion strength [33], isokinetic shoulder peak torque [21], and isokinetic trunk flexion [34]. For the trunk extension task, while relative reliability remained substantial for most groups (only two had ICC < 0.8), the absolute reliability of the measures was not shown (CV > 10% for all groups and CR > 25 with large differences between groups). For this task, in both females and males, the normal weight group had a lower mean CV (12.0% and 12.8%, respectively) compared to the overweight (16.2% and 18.3%) and obese groups (18.9% and 16.5%). Similarly, the normal weight group had a lower CR compared to the overweight and obese groups, with the overweight group having the largest coefficient of repeatability. The heterogeneous sample over a large BMI range resulted in an increased coefficient of variation for the trunk strength compared to shoulder strength. Larger variability during trunk exertions was previously reported and attributed to the complex muscle involvements and co-contractions involved [35]. In addition, while the experimental setup was designed to minimize recruitment of supporting muscles, such as the quadriceps, by controlling the posture and limiting movement (as has been done in other studies [34]), participants may have adjusted their posture during the task and used their legs in generating the trunk extension movement.
Measuring trunk extensor strength in other postures or settings, such as a seated posture, might be required to increase the stability of the muscles involved and minimize unintended activation of antagonist muscles, to increase the reliability and precision of the measures. Findings of this study suggest that individual changes in strength equal to or greater than ~15% and ~26% of the mean above or below the previous score can be interpreted as a real change, with 95% confidence, for shoulder flexion and trunk extension tasks, respectively. These MDCs help in performance assessments and return to work evaluations to test the effect of an intervention on performance improvements [27].

Effect of obesity on strength
The present study found comparable trunk extension strength between groups of individuals who are obese and non-obese, a finding consistent with previous studies [13,16]. However, the former classified obesity based on %BF rather than BMI, and overweight individuals were not considered in either of the studies. In contrast, Hulens et al. [8] found significantly higher absolute trunk extension strength, but these results may be attributable to a sample that included individuals who were extremely obese and/or older, along with a lack of control over participant physical activity. As described by Pajoutan et al. [36], trunk strength has previously been shown to be positively correlated with fat-free mass. When obesity-related increases in strength have been found for the lower extremity or trunk (e.g., Maffiuletti et al. [10] and Lafortuna et al.), chronic training and increased cross-sectional area of the muscle have been hypothesized. While larger cross-sectional area has been found with obesity [37], the muscle contraction can be impaired by the presence of intramuscular fat [38]. Contrary to the present study findings, two other studies [12,13] found significantly higher shoulder flexion strength with obesity, with ~20-25% higher strength for the obese groups in those studies. Mean shoulder flexion strength in both of those studies was ~50 Nm for the non-obese groups and ~60-65 Nm for the obese groups, using a similar posture to the one adopted in the current study. While there was not a significant difference in shoulder flexion strength in the current study, these means are similar to those observed for the current male sample, where the 50th percentile strengths were 50 Nm for the normal weight group and 62 Nm for the obese group.
The prior studies do not distinguish strength outcomes by gender, but differences in the sample and participant physical activity levels may explain the lower female strength values in the current study compared to previously reported overall means.
Comparable strength across obesity levels aligns with findings reported in existing studies of hand grip strength as well [8,9,35]. For example, in a large study of 1443 elderly subjects in all three BMI categories, isometric grip strength remained intact with obesity [9]. In that study, while obesity was not a significant factor, recreational physical activity was found to be the determinant of the handgrip force capacity. Since the present study sample was similar in their physical activity levels, i.e., recreationally active to sedentary individuals, we were unable to test strength predictability based on physical activity levels. It is recommended that future studies address this by including subjects across different physical activity levels. Age was not a significant negative linear predictor of strength in any of the regression models. This is not contrary to expectation, as most studies that have identified a significant age effect have considered samples with individuals greater than 60 years old, versus 56 in the current study. For example, Eksioglu [39] observed equivalent hand grip strength for males 18-59 and the only significant differences for females were for those greater than 50 years old. Similarly, in a meta-analysis of reference values for adult hand grip [40], the age-related decline in grip strength (for either hand) occurs from the 55-59 years group and older. While muscle strength values have been shown to differ by ethnicity/race group (Blakley et al. [41]), these differences have been shown to be influenced by factors such as age [42]. Given that obesity rates are higher in African American and Hispanic adults in the United States [43], research is warranted to document strength values stratified by race/ethnicity, obesity, and age groups to represent current workforce demographics.

Limitations
While the reliability of the measures was verified for the shoulder flexion task in a large and heterogeneous sample of subjects, representative of the population, it is worth mentioning that the results of this study can only be extended to younger healthy adults, for the specific tasks tested. The BMI and age recruitment criteria for this study were limited to Class I and II obesity (30 < BMI < 40 kg/m²) and younger adults. Extremely obese individuals were excluded since they represent ~6% of the population [44]. While efforts were made to recruit workers in the present study, the study sample largely comprised university students and residents from local communities. In addition, experimental sessions were scheduled based on participant availability and thus the time of day was not controlled. Despite potential circadian influence, reliability remained high. Future work should control timing of strength measurements to confirm these findings. The study also only considered the reliability of isometric strength measurements. Future studies should consider dynamic strength tasks where the influence of body mass support may have a larger impact on repeatability of strength measurement. Generalizing the results to other postures, tasks, populations, and individual factors requires further research.

Conclusion
Across obesity levels, strength measurements had high levels of absolute and relative reliabilities for the shoulder flexion task, even when accounting for various sources of errors (e.g., repeated measurements). The findings of this study for the shoulder flexion task support the use of a single strength measurement for practical purposes to estimate the required strength and allowances in product design and performance evaluation. Trunk extension strength values should be used with caution. While the relative reliability was good to substantial, the absolute reliability measures showed high variability across sessions when considering gender and BMI group.