Large-scale analysis of iliopsoas muscle volumes in the UK Biobank

Psoas muscle measurements are frequently used as markers of sarcopenia and predictors of health. Manually measured cross-sectional areas are most commonly used, but there is a lack of consistency regarding the position of the measurement, and manual annotations are not practical for large population studies. We have developed a fully automated method to measure iliopsoas muscle volume (comprising the psoas and iliacus muscles) using a convolutional neural network. Magnetic resonance images were obtained from the UK Biobank for 5000 participants, balanced for age, gender and BMI. Ninety manual annotations were available for model training and validation. The model showed excellent performance against out-of-sample data (average Dice score coefficient of 0.9046 ± 0.0058 for six-fold cross-validation). Iliopsoas muscle volumes were successfully measured in all 5000 participants. Iliopsoas volume was greater in male compared with female subjects. There was a small but significant asymmetry between left and right iliopsoas muscle volumes. We also found that iliopsoas volume was significantly related to height, BMI and age, and that there was an acceleration in muscle volume decrease in men with age. Our method provides a robust technique for measuring iliopsoas muscle volume that can be applied to large cohorts.

The iliopsoas muscles, predominantly made up of slow-twitch fibers, are a composite of the psoas major and iliacus muscles; they are anatomically separate in the abdomen and pelvis but merge in the thigh. The iliopsoas is engaged during most day-to-day activities, including posture, walking and running. Together these muscles serve as the chief flexor of the hip and a dynamic stabiliser of the lumbar spine 1, with the psoas uniquely having a role in the movement of both the trunk and lower extremities 2. Given the key involvement of the iliopsoas muscles in daily activities, there is increasing interest in their potential as a health biomarker. This has most commonly taken the form of a cross-sectional area (CSA) through one (generally the right) or both iliopsoas muscles, with the most common measurement taken through the psoas muscle. This CSA can be used either as an independent measurement, as a ratio to vertebral body size 3,4, or in the form of the psoas muscle index, calculated as the psoas major muscle CSA divided by the height squared 5. Indeed, psoas CSA has been suggested as a predictor of sarcopenia 6, surgical outcome and length of hospital stay post surgery 7-9, poor prognosis in response to cancer treatment 10, morbidity following trauma 4, a surrogate marker of whole body lean muscle mass 11, cardiovascular fitness 12, changes in cardiometabolic risk variables following lifestyle intervention 13 and even risk of mortality 14,15.
Measurements of the psoas major muscle are most commonly made from the CSA of axial MRI or CT images 7,12, with most studies relying on manual annotation of a single slice through the abdomen. These images tend to be retrospectively repurposed from clinical scans rather than obtained from a specific acquisition 16-18. However, the CSA of the psoas muscle varies considerably along its length 2; therefore, small differences in measurement position can potentially have a significant effect on its overall measured size. Moreover, there is a lack of consistency within the literature regarding the precise location at which measurement of the psoas CSA should be made, with researchers using a variety of approaches including: the level of the third lumbar vertebra (L3) 6,9,10,17,18, L4 3,4,14,16, between L4-L5 11,13, as well as at the level of the umbilicus 7,8,19, the precise position of which is known to vary with obesity/ascites. There is further discrepancy between studies regarding whether the measurements should comprise one single 10 or both psoas muscles 17, with the majority of publications combining the areas of both muscles.
This lack of consistency, together with the relatively low attention given to the robustness and reproducibility of its measurement and the reliance on images from retrospective clinical scans, has led many to question its validity as a biomarker 20. A more objective proposition may be to measure total psoas muscle volume. Published volumetric approaches vary 21-23, including measurement from the origin of the psoas at the lumbar vertebrae (unspecified) to its insertion in the lesser trochanter 24, or with no anatomical information provided at all 25. Whilst all of these approaches include substantially more muscle than is included in simple CSA measurements, these are still incomplete volume measurements. Moreover, measuring the entire psoas muscle volume as a single entity is challenging, since even with 3D volumetric scans it is difficult to differentiate between the composite iliacus and psoas muscles once they merge at the level of the inguinal ligament. Therefore, to measure psoas volume as an independent muscle it is necessary either to assign an arbitrary cutoff and exclude a considerable proportion of the psoas muscle (estimated to be approximately 50% in some studies 22) or simply to include the iliacus muscle and measure the iliopsoas muscle volume in its entirety.
Convolutional neural networks (CNNs) have become a strong tool for automated image segmentation, especially architectures such as the U-Net 26 for two-dimensional (2D) data or the V-Net 27 for three-dimensional (3D) data. These techniques owe their popularity to the modest amount of training data required, their robustness and their fast execution speed. CNNs have been applied for automated muscle segmentation in computed tomography 28-30, specifically for 2D segmentation of the psoas major muscle 29, as well as in MRI 31,32.
The increasing use of whole body imaging 33 in large cohort studies such as the UK Biobank (UKBB), which plans to acquire MRI scans from the neck to the knee in 100,000 individuals 34, requires different approaches to image analysis. Manual image segmentation is time-consuming and infeasible in a cohort as large as the UKBB. However, this dataset provides a unique opportunity to measure iliopsoas muscle volume in a large cross-sectional population. Therefore, development of a robust and reliable automated method is essential. In this paper, we present an automated method to segment iliopsoas muscle volume using a CNN and discuss results arising from 5000 participants from the UKBB imaging cohort, balanced for BMI, age, and gender.

Data.
A total of 5000 subjects were randomly selected for this study from the UKBB imaging cohort, while controlling for BMI, age, and gender. Age was discretised into four groups: 44-53, 54-63, 64-72 and 73-82 years. Eight strata were defined to cover both age and gender. Weights were used to maintain the proportion of subjects within each age group to match that of the larger UKBB population.
Demographics for the study population (Table 1) were balanced for gender (female:male ratio of 49.9:50.1). The average age of the male subjects was 63.3 ± 8.4 years and that of the female subjects was 63.3 ± 8.3 years. The average BMI of the male subjects was 27.0 ± 3.9 kg/m² (range 17.6-50.9 kg/m²) and that of the female subjects was 26.2 ± 4.7 kg/m² (range 16.1-55.2 kg/m²), with the mean for both groups being categorised as overweight. The self-reported ethnicity was predominantly White European (96.76%). As with the whole UKBB population, the sub-cohort in the current study was significantly healthier than the UK general population. The most common ailments were related to arthropathies, with a smaller proportion reporting a variety of neoplasms, ranging from skin melanomas to benign neoplasms (Supplementary Table S1).
Participant data from the UKBB cohort was obtained as previously described 34 through UKBB Access Application number 23889. The UKBB has approval from the North West Multi-Centre Research Ethics Committee (REC reference: 11/NW/0382). All methods were performed in accordance with the relevant guidelines and regulations, and informed consent was obtained from all participants. Researchers may apply to use the UKBB data resource by submitting a health-related research proposal that is in the public interest. More information may be found on the UKBB researchers and resource catalogue pages (https://www.ukbiobank.ac.uk/). Raw MR images were obtained from the UKBB Abdominal Protocol 35 and preprocessed as previously reported 36,37. The data were acquired on the same scanner model, a Siemens Aera 1.5 T (Syngo MR D13) (Siemens, Erlangen, Germany), across three sites (Stockport, Newcastle, Reading, UK). The Dixon sequence involved six overlapping series that were acquired using a common set of parameters: TR = 6.67 ms, TE = 2.39/4.77 ms, in-plane voxel size 2.232 × 2.232 mm, FA = 10° and bandwidth = 440 Hz. The first series, over the neck, consisted

Manual annotation.
A single expert radiographer manually annotated both iliopsoas muscles for 90 subjects using the open-source software MITK 38. Each axial slice of the water images was examined, the iliopsoas identified, and the borders of the psoas and iliopsoas manually drawn. On average, manual annotation of both muscles took five to seven hours per subject. The annotated data covered a broad range of age and BMI from male and female UKBB participants. A typical Dixon abdominal dataset, centred on the iliopsoas muscles, is shown in Fig. 1, with the manual iliopsoas muscle annotations overlaid on the anatomical reference volume in red, alongside a 3D rendering of the manual annotation.
Model.
We trained a model able to predict both muscles individually. The preprocessing steps for the training data are as follows (the cropping step is also required when applying the model to unseen data). Two arrays of size 96 × 96 × 192 were cropped around the hip landmarks 36 to approximate the location of the muscles before segmentation (an example of the cropped regions may be found in Supplementary Fig. S1). After cropping, each volume was normalised such that the signal intensities lie between zero and one, where the 99th percentile was used instead of the maximum to avoid possible spikes in signal intensity. That is, all signal intensities above the 99th percentile were mapped to one. Two sets of 16 training samples were generated for every subject by separating the right and the left muscles and introducing reflections that exploit the symmetry of the structures. Further data augmentation included seven random transformations, in addition to the original data, consisting of translations by up to six voxels in-plane and up to 24 voxels out-of-plane, and random scaling ranging from −50% to +50% out-of-plane and from −25% to +25% in-plane. We chose larger factors for out-of-plane transformations to account for the skewed variability in shape and position of the muscles, reflecting the fact that there is more variation in height than width in the population. After data augmentation, 2880 training samples were produced from the original 90 manually annotated pairs of iliopsoas muscles. The model used for 3D iliopsoas muscle segmentation closely follows the architecture of the U-Net 26 and the V-Net 27, with a contracting path and an expansive path connected by skip connections at each resolution level.
These network architectures have been established as the gold standard for image segmentation over the last few years, as they require modest amounts of training data as a consequence of operating at multiple resolution levels while providing excellent results within seconds. Several convolution blocks are used in our model architecture. An initial block (I) contains a 5 × 5 × 5 convolution with eight filters followed by a 2 × 2 × 2 convolution with 16 filters and stride two. The down-sampling blocks in the contraction (D i,m) consist of i successive 5 × 5 × 5 convolutions with m filters followed by a 2 × 2 × 2 convolution with stride two, used to decrease the resolution. In the expansion, the up-sampling blocks (U j,n) mirror the ones in the contraction, with transpose convolutions in place of the stride-two convolutions. The block (L) at the lowest resolution level of the architecture consists of three successive 5 × 5 × 5 convolutions with 128 filters followed by a 2 × 2 × 2  39 , where the self-normalising properties allow it to bypass batch normalisation layers, enabling higher learning rates that lead to more robust and faster training. The model was trained by minimising the Dice score coefficient (DSC) loss 27 with a batch size of three using the Adam optimiser and a learning rate of 1e−4 until convergence at 100 epochs. The learning rate was determined through a parameter sweep (1e−1 to 1e−6). We performed all of the CNN development, learning, and predictions using Keras (TensorFlow backend) 40 on an NVIDIA Titan V 12 GB GPU. We limited the batch size to three due to the GPU memory.
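The intensity normalisation and the augmentation bookkeeping described in this section can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `normalise`, the nearest-rank percentile estimator, the zero lower bound and the flat-list representation of a volume are all assumptions made for illustration.

```python
import math


def normalise(voxels, percentile=99.0):
    """Scale intensities to [0, 1]; values above the chosen percentile map to one.

    `voxels` is a flat list of non-negative signal intensities (assumption:
    the lower bound is taken as zero, which the text does not specify).
    """
    ordered = sorted(voxels)
    # Nearest-rank percentile (assumed estimator; the paper does not name one).
    rank = max(1, math.ceil(percentile * len(ordered) / 100.0))
    cap = ordered[rank - 1]
    if cap <= 0:
        return [0.0 for _ in voxels]
    return [min(max(v, 0.0) / cap, 1.0) for v in voxels]


# Augmentation bookkeeping, following the counts given in the text.
subjects = 90
muscles_per_subject = 2       # left and right iliopsoas
reflections = 2               # original plus mirrored copy
transforms = 8                # seven random transformations plus the original
samples_per_muscle = reflections * transforms
total_samples = subjects * muscles_per_subject * samples_per_muscle

scaled = normalise(list(range(1, 101)))
```

Under these assumptions each muscle contributes 16 samples, and the total reproduces the 2880 training samples quoted in the text.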

Validation.
A common metric used to evaluate segmentation performance is the DSC, also known as the F1 score. It is defined as twice the intersection of the labels divided by the total number of elements. Intersection of labels can also be seen as a True Positive (TP) outcome. The total number of elements can also be seen as the sum of all False Positives (FP), False Negatives (FN) and twice the number of TPs.
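In terms of the counts defined above, the definition reads DSC = 2TP / (2TP + FP + FN). A minimal sketch of this calculation for two binary label masks follows; the function name `dice_score` and the convention of returning one for two empty masks are assumptions for illustration, not part of the published method.

```python
def dice_score(prediction, reference):
    """Dice score coefficient between two equally sized binary masks (flat lists)."""
    if len(prediction) != len(reference):
        raise ValueError("masks must have the same number of voxels")
    tp = sum(1 for p, r in zip(prediction, reference) if p and r)
    fp = sum(1 for p, r in zip(prediction, reference) if p and not r)
    fn = sum(1 for p, r in zip(prediction, reference) if not p and r)
    denominator = 2 * tp + fp + fn
    # Assumed convention: two empty masks agree perfectly.
    if denominator == 0:
        return 1.0
    return 2.0 * tp / denominator


# Illustrative masks: TP = 2, FP = 1, FN = 1, so DSC = 4 / 6.
example = dice_score([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```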
For validation of the model, we performed a six-fold cross-validation experiment, where in a single iteration 75 of the manually annotated images (approximately 83%) were used to train the model and the performance was evaluated on the remaining 15 out-of-sample images (approximately 17%).
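The 75/15 partition can be sketched as below. This is an illustrative sketch only: the helper name `six_fold_splits`, the random seed, and the choice to split at the subject level (so that all samples derived from one subject stay in the same fold) are assumptions that the text does not confirm.

```python
import random


def six_fold_splits(subject_ids, n_folds=6, seed=0):
    """Yield (train, test) lists of subject identifiers for each fold."""
    ids = list(subject_ids)
    random.Random(seed).shuffle(ids)
    # Deal the shuffled subjects into n_folds interleaved groups.
    folds = [ids[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        test = folds[k]
        train = [s for j, fold in enumerate(folds) if j != k for s in fold]
        yield train, test


splits = list(six_fold_splits(range(90)))
```

With 90 annotated subjects, each of the six iterations trains on 75 subjects and evaluates on the remaining 15, and every subject appears in exactly one test set.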
Statistical analysis. All summary statistics, hypothesis tests and figures were produced using the R software environment for statistical computing and graphics 41. Variables were tested for normality using the Shapiro-Wilk test; the null hypothesis was rejected in all cases. Spearman's rank correlation coefficient (ρ) was used to assess monotonic trends between variables. The Wilcoxon rank-sum test was used to compare groups, and the Wilcoxon signed-rank test was used for paired observations. Methods for segmenting the iliopsoas muscle volume were compared using the Bland-Altman plot. Given the exploratory nature of the research, p-values < 0.05 were judged to be statistically significant.

Validation. A summary of the cross-validation experiment may be found in Supplementary Table S2. The average bias was −0.2%, with upper and lower limits of agreement of 13.3% and −13.7%, respectively (Fig. 2). The overlap between the CNN-based and manual segmentations for two subjects is also provided in Fig. 2. Example segmentations from our method are provided in Fig. 3, displaying a sample of 12 subjects covering a variety of body sizes and habitus. The first three subjects (a-c) have some of the smallest iliopsoas muscles (total volume ≈ 346 ml), the next three subjects (d-f) have typical iliopsoas muscles (total volume ≈ 800 ml) and the third set of three subjects (g-i) have some of the largest iliopsoas muscles (total volume ≈ 1300 ml). The final set of three subjects represents subjects whose left and right iliopsoas muscles differ in volume (difference in volume ≈ 93 ml for j and k, difference in volume = 182 ml for l). The model performs well for all of them, with additional details regarding model validation provided in Supplementary Fig. S2.

Iliopsoas muscle volume. In each gender there was a small (approximately 2%) yet statistically significant asymmetry between left and right iliopsoas muscles (Wilcoxon signed-rank test; male: d = −7.3 ml; female: d = −6.5 ml; both p < 10^−15) (Fig. 4). These differences were not significantly associated with the handedness of the participants. Significantly larger iliopsoas muscle volumes were measured in male compared with female subjects (Table 2).
Relationship between iliopsoas muscle volume and physical characteristics. Significant correlations were observed between the total iliopsoas muscle volume and height in both genders (male: ρ = 0.51; female: ρ = 0.54; both p < 10^−15) (Fig. 5).
To account for the potential confounding effect of height on iliopsoas muscle volume, an iliopsoas muscle index (IMI), the total iliopsoas muscle volume divided by height squared, was defined with units ml/m². Significant correlations were observed between the IMI and BMI in both genders (male: ρ = 0.48; female: ρ = 0.47; both p < 10^−15) (Fig. 6).
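The index is the total iliopsoas muscle volume in millilitres divided by the square of height in metres. A minimal sketch of that arithmetic is given below; the function name and the input values are hypothetical and are not taken from the study data.

```python
def iliopsoas_muscle_index(total_volume_ml, height_m):
    """Return the iliopsoas muscle index in ml/m^2."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return total_volume_ml / height_m ** 2


# Hypothetical subject: 800 ml total iliopsoas volume, 2.0 m tall.
imi = iliopsoas_muscle_index(800.0, 2.0)  # 800 / 4 = 200 ml/m^2
```

Dividing by height squared mirrors the construction of BMI and of the psoas muscle index mentioned in the introduction.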
A significant negative correlation was observed between IMI and age in both genders (male: ρ = −0.31, p < 10^−15; female: ρ = −0.11, p < 10^−7). However, the relationship could not be easily explained by a simple linear model (Fig. 7). In fact, the decrease in IMI as a function of age accelerates for men, starting in their early 60s, while for women it remains relatively constant.

$$\mathrm{IMI} = \frac{\text{total iliopsoas muscle volume}}{\text{height}^2} \tag{2}$$

Table 2. Iliopsoas muscle volumes (n = 5000). Significance refers to the p-value for a Wilcoxon rank-sum test, where the null hypothesis is that the medians of the two groups (male and female subjects) are equal.

Discussion
There is considerable interest in measuring psoas muscle size, primarily related to its potential as a sarcopenic marker, thereby making it an indirect predictor of conditions influenced by sarcopenia and frailty, including health outcomes such as morbidity and mortality 4,6-10,14,15. The complexity of measuring total muscle directly, particularly in a frail population, has necessitated the reliance on easily measured surrogates, and the psoas muscle CSA is increasingly used for this purpose. However, there is little consistency in the field regarding how the psoas muscle is measured, with considerable variation between publications. An automated approach to analysis will reduce the need for manual annotation, allowing more of the muscle to be measured and enabling much larger cohorts to be studied. This is particularly important as large population-based biobanks are becoming more common. In this paper we have described a CNN-based method to automatically extract and quantify iliopsoas muscle volume from MRI scans for 5000 participants from the UKBB. Excellent agreement was obtained between automated measurements and the manual annotation undertaken by a trained radiographer, as demonstrated by the extremely high DSC with testing data. CNNs have been established as the gold standard in automated image segmentation. The results, which can be produced with a modest amount of manual annotations as training data and smart data augmentation, are highly accurate, fast, and reproducible. Manual annotations become a bottleneck for large-scale population studies when the number of participants reaches many thousands, as with the UKBB.
Applying automated methods to vast amounts of data requires a thorough set of quality-control procedures beyond just out-of-sample testing data, which is often used to validate new methods in machine learning studies. Large-scale quality control can be done by steps such as looking at maximum and minimum values, asymmetric values (for symmetric structures such as the iliopsoas muscles), outliers, and overall behaviour of the results.
The vast majority of previous studies investigating psoas muscle size have relied on CSA measurements, primarily because of data availability and time constraints 3,4,6-11,13,14,16-19. Analysis of CSA is considerably less labour intensive than manually measuring tissue volumes. Furthermore, many studies have repurposed clinical CT or MRI scans 16-18, which typically will not have been acquired in a manner that enables volume measurements. This has led to psoas muscle CSA being measured at a variety of positions relating to lumbar landmarks including L3, L4 and between L4-5, as well as more unreliable soft tissue landmarks such as the umbilicus, with the CSA measurements used alone or relative to lumbar area, height, height squared or total abdominal muscle within the image at the selected level. While lumbar landmarks should provide a relatively consistent CSA in longitudinal studies, comparison between studies and cohorts becomes almost impossible. This is further compounded by studies that have shown considerable variation in psoas CSA along its length 2,42, and by regional differences in psoas CSA that have been observed in athletes 43 and following exercise training or inactivity 44. This suggests that CSA at a fixed position may not accurately reflect changes in psoas size elsewhere in response to health-related processes. To overcome these confounding factors, it is essential to measure total psoas volume.
In this study, we have trained a CNN to segment iliopsoas muscles, applied it to 5000 UKBB subjects and measured their total volume. This measurement includes the psoas major and iliacus muscles, and as mentioned in the proceeding section, the psoas minor muscle (if present). This reflects the practical difficulties of isolating the entire psoas muscle in images in a consistent and robust manner. The merging of the iliacus and psoas muscles below the inguinal ligament makes their separation not only impractical, but unachievable with standard imaging protocols. Similarly, it is not possible to separate the psoas major and minor muscles under these conditions, even if CSA measurements were to be made. Therefore, a standard operating procedure was required: either to measure a partial psoas volume, selecting an anatomical cut-off before the junction with the iliacus muscle, or to include the iliacus and measure the iliopsoas muscle volume in its entirety. In this study we have opted for the latter, as selecting an arbitrary set point would clearly introduce a significant confounding factor with unforeseeable impact on the subsequent results. Thus, we have measured the entire iliopsoas muscle. Literature comparisons are limited, as there is a paucity of comparable volumetric studies within the general population, but our average reported values for male subjects (407.2 ± 62.7 ml) were within the range of 351.1-579.5 ml reported in a cohort which included male athletes and controls 43.
Furthermore, our CNN-based method performs very well, with a small but systematic underestimation of −0.2 % when compared with manual annotations. Incremental improvement of the model is possible using straightforward techniques, such as increasing the number and variety of training data or expanding the breadth of data augmentation 45 . These are currently under investigation.
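The bias quoted here comes from the Bland-Altman comparison of automated and manual volumes. A minimal sketch of how such a percentage bias and its limits of agreement are commonly computed is given below. It assumes differences are taken as automated minus manual, expressed as a percentage of the pair mean, with limits at the bias ± 1.96 standard deviations; the text does not state these conventions, and the input volumes are hypothetical.

```python
import statistics


def bland_altman_percent(automated, manual):
    """Return (bias, lower limit, upper limit) of percentage differences."""
    # Percentage difference relative to the mean of each pair (assumed convention).
    diffs = [100.0 * (a - m) / ((a + m) / 2.0) for a, m in zip(automated, manual)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical volumes in ml, for illustration only.
bias, lower, upper = bland_altman_percent([400.0, 410.0, 390.0],
                                          [400.0, 400.0, 400.0])
```

Under these conventions a negative bias indicates that the automated volumes are, on average, smaller than the manual ones, and the reported limits of 13.3% and −13.7% are symmetric about the −0.2% bias, as this construction requires.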
We observed a small (approximately 2%) but significant asymmetry in iliopsoas muscle volume, with the right muscle being larger in both male and female subjects. Previous studies have looked at muscle asymmetry in tennis players and found that the iliopsoas muscle was 13% smaller on the non-dominant compared with the dominant side of the body, whereas in inactive controls the dominant side was 4% larger than the non-dominant 43. Similarly, football players have significantly larger psoas CSA on their dominant kicking side 46. The best equivalent to this within the UKBB phenotyping data was handedness, which we found not to be related to left-right differences in iliopsoas volume in the current study. An additional factor which may contribute towards iliopsoas asymmetry relates to the presence or absence of the psoas minor muscle, a long slim muscle typically found in front of the psoas major. This muscle can often fail to develop during embryonic growth 2, and there can be considerable differences in the incidence of agenesis, which can be unilateral or bilateral, with ethnicity thought to be a factor 47. Further work is required to understand whether this contributes to the left-right asymmetry observed in the present study, since it is not possible to resolve this muscle on standard MRI images.
In line with previous studies of psoas CSA, male subjects had significantly larger iliopsoas muscles compared to females 6. This is unsurprising since gender differences in both total muscle and regional muscle volumes are well established 48,49. Indeed, some studies have suggested using gender-specific cut-offs of either psoas CSA alone or psoas muscle index to identify patients at risk of poorer health outcomes 10. Furthermore, some studies have suggested that the magnitude of gender differences in trunk muscle CSA varies depending on where they are measured. This adds weight to the argument that volumetric measurements are perhaps more robust than CSA measures for this comparison 50. It has been proposed that the gender differences in psoas volume could in part relate to the impact of height on psoas volume 12. Indeed, we found a significant correlation between iliopsoas muscle volume and height, similar to those reported by earlier studies 49. However, the gender differences observed in our study were still present when correcting for height. Interestingly, it has been reported that the relationship between muscle volume and body weight is curvilinear, since increases in body weight often reflect gain in fat as well as muscle mass. In the present study we observed a significant correlation between IMI and BMI. This is in agreement with previous studies of psoas CSA, which have also shown a significant correlation with BMI 6; indeed, some studies combined both metrics as a prognostic marker 17. We also found a significant correlation between IMI and age. It is widely reported that muscle mass declines with age, particularly beyond the fifth decade, a fundamental characteristic of sarcopenia 51.
The magnitude of this decline was relatively small, but this may arise from the limited age range within the UKBB data set (44-82 years), compared with other studies that have investigated the impact of age on muscle volume across the entire adult age span (18-88 years), which tend to reveal a more dramatic decline in muscle volume 49.
In conclusion, we have developed a robust and reliable model using a CNN to automatically segment iliopsoas muscles and demonstrated the applicability of this methodology in a large cohort, which will enable future population-wide studies of the utility of iliopsoas muscle as a predictor of health outcomes.

Code availability
Model weights and instructions for use are available at https://github.com/recoh/iliopsoas_muscle.