Describing the application of statistical shape modelling to DXA images to quantify the shape of the proximal femur at ages 14 and 18 years in the Avon Longitudinal Study of Parents and Children

Bones are complex objects with considerable variation in shape and structure, often attributed to anatomical, environmental or genetic differences. In addition, bone shape has been of interest in relation to its associations with disease processes. Hip shape is an important determinant of hip osteoarthritis and osteoporotic hip fracture; however, its quantification is difficult. While previous studies largely focused on individual indices of hip geometry such as neck-shaft angle or femoral neck width, statistical shape modelling offers the means to quantify the entire contour of the proximal femur, including the lesser trochanter and acetabular eyebrow. We describe the derivation of independent modes of variation (hip shape mode scores) to characterise variation in hip shape from dual-energy X-ray absorptiometry (DXA) images in the Avon Longitudinal Study of Parents and Children (ALSPAC) offspring, using statistical shape modelling. ALSPAC is a rich source of phenotypic and genotypic data which provides a unique opportunity to investigate the environmental and genetic influences on hip shape in adolescence, as well as comparison with adult hip shape.


Amendments from Version 1
We considered the reviewers' comments and revised our manuscript accordingly. The following changes have been made:
• To clarify the focus of the paper, the title, abstract and introduction have been edited in line with the reviewers' comments.
• The introduction section now explains how the data can be used in future analyses, and the last sentence of the 'dataset' section gives an example of analyses already carried out using these data.
• As suggested, a table with descriptive statistics (combined and sex-stratified) such as age, height, weight and BMI (including the number of males and females included in the final datasets) has now been added.
• Clarification regarding key landmark points has now been added to the 'methods' section.
• Additional text explaining what the reference model is has been added at suitable points in the manuscript, and references to other studies that have used reference models in this way have been provided.
• As suggested, Intraclass Correlation Coefficients have been calculated to assess intra- and inter-observer reliability (the results are briefly described in the text and provided as a figure).
• The sentence regarding the independence of the models has now been clarified.
• For consistency, since the data note describes a dataset derived in adolescents, Figure 1 has been replaced with an image from this cohort (version 1 contains an image from an adult cohort).

Introduction
Bones are complex objects, with each bone showing considerable variation in size and shape between individuals. This variation can be attributed to anatomical differences, to environmental and genetic influences, or can be a consequence of a disease process. Traditionally, these differences have been assessed by measuring lengths and angles; however, it has been recognised that single geometrical measurements are often correlated with measures of body size as well as with other geometrical indices 1. Statistical shape modelling (SSM) is a method which uses a set of landmark points to describe the outline of an object, as opposed to a single geometrical measurement, and can represent a combination of several different aspects of the shape of that object (e.g. in the case of the proximal femur, concomitant variations in femoral neck (FN) and femoral head size and shape).
Musculoskeletal disorders are a significant cause of disability worldwide, and the number of people affected is expected to increase given the ageing population, the rise in obesity and increasingly sedentary lifestyles 2. Osteoarthritis (OA) and osteoporotic fractures are the most common age-related musculoskeletal diseases and are associated with a significant healthcare burden. Previous studies suggest that hip shape is an important risk factor for both hip OA 3,4 and osteoporotic hip fracture 5. Little is known, however, about its development in childhood and adolescence. Statistical shape modelling provides a means of capturing the global shape of the proximal femur; it uses principal components analysis to generate modes of variation (Hip Shape Modes (HSMs)) which describe each image in terms of standard deviations below or above the mean shape, after removing variation in size. One disadvantage of SSM in previous literature has been that models reflect the variation within the dataset they were trained on, making direct quantitative comparison between similar studies difficult. This can, however, be overcome by using a previously built model as a reference model for a subsequent dataset 6,7.
The Avon Longitudinal Study of Parents and Children (ALSPAC) is a longitudinal birth cohort which recruited pregnant women in South West England in the 1990s 8. ALSPAC is a rich source of data, including phenotypic and genetic data collected for the mothers, fathers and children. It is uniquely suited for examining variation in hip shape in earlier life, based on hip dual-energy X-ray absorptiometry (DXA) scans obtained when the children were, on average, 14 and 18 years old. This data note describes the methodology and data used to quantify the shape of the proximal femur in ALSPAC offspring at these time points. To allow direct comparability with other studies and between the time points, an adult reference statistical shape model (SSM) template (based on 19,379 images from five cohorts 9) was applied to these data.
The generated data (HSMs describing variation in hip shape) provide an opportunity to quantify variation in hip shape and can be used in future analyses to examine sex differences in hip shape and to explore associations with other factors, including genetic factors.

ALSPAC Data
ALSPAC is a longitudinal birth cohort which recruited a total of 14,541 pregnant women with an expected delivery date between 1st April 1991 and 31st December 1992. Of these pregnancies, 69 have no known birth outcome, and of the remaining 14,472 pregnancies, 195 were twin, 3 were triplet and 1 was quadruplet, accounting for 14,676 known foetuses. These pregnancies resulted in 14,062 live births, of which 13,988 children were alive at 1 year of age.
In addition to the initial enrolment that took place between 1991 and 1992, a further recruitment phase took place when the children were, on average, 7 years old, and another from age 8 onwards, to which eligible children and those not initially enrolled were also invited. This resulted in a total of 15,247 pregnancies enrolled.
Since recruitment, these children have been followed up at regular intervals, and questionnaire and clinical assessment data have been collected. Additional data on siblings, mothers and their partners have also been collected.
Hip DXA scans
Hip DXA scans collected during two assessment clinics, Teen Focus (TF) 2 and TF 4, were used to quantify the shape of the proximal femur. TF 2 was performed between January 2005 and

Statistical shape model (SSM)
Raw hip DXA images were securely transferred to collaborators in Aberdeen for image processing and uploaded into Shape software (University of Aberdeen). Each image was marked up with a set of landmark points. Key landmark points are those placed at easily identifiable anatomical features of an object. Figure 1 shows the placement of the landmark points, and Table 2 describes the anatomical position of each key landmark point (shown in red in Figure 1).
Following point placement, Procrustes analysis was used to estimate the mean shape. The aim of this step is, first, to remove any translational, rotational and scaling information and then to align each image as closely as possible. Any effect of age, sex or other non-image variables is not accounted for at this stage; the derived HSMs are therefore not adjusted for these variables. After completing the alignment, principal component analysis (PCA) was performed using the coordinates of each point to build the SSM, producing a set of orthogonal modes of variation known as principal components (referred to as hip shape modes (HSMs)). These modes together explain 100% of the variance in the dataset, with the first HSM accounting for the largest amount of variance and each subsequent HSM accounting for less. Each HSM has a mean of zero and unit standard deviation (SD), and each image (and, consequently, each individual) is assigned a value for each HSM which describes the number of SDs away from the mean shape.
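The alignment and PCA steps described above can be illustrated with a minimal sketch. It is not the Shape software used for this data note: the function names (`align_to`, `generalised_procrustes`, `build_ssm`), the iteration count and the simulated 53-point outlines are all assumptions made for illustration, and the rotation step does not exclude reflections.

```python
import numpy as np

def align_to(shape, ref):
    """Centre a (k, 2) shape, scale it to unit size and rotate it onto ref."""
    s = shape - shape.mean(axis=0)           # remove translation
    s = s / np.linalg.norm(s)                # remove scale
    u, _, vt = np.linalg.svd(s.T @ ref)      # orthogonal Procrustes solution
    return s @ (u @ vt)                      # remove rotation (reflections not excluded)

def generalised_procrustes(shapes, n_iter=10):
    """Iteratively align all (n, k, 2) shapes to their evolving mean shape."""
    ref = shapes[0] - shapes[0].mean(axis=0)
    ref = ref / np.linalg.norm(ref)
    for _ in range(n_iter):
        aligned = np.array([align_to(s, ref) for s in shapes])
        ref = aligned.mean(axis=0)
        ref = ref / np.linalg.norm(ref)
    return aligned

def build_ssm(aligned, n_modes=10):
    """PCA on aligned coordinates; scores are expressed in SD units per mode."""
    n = aligned.shape[0]
    X = aligned.reshape(n, -1)               # one row of x, y coordinates per image
    mean = X.mean(axis=0)
    _, s, vt = np.linalg.svd(X - mean, full_matrices=False)
    variance = s**2 / (n - 1)                # variance along each mode
    explained = variance / variance.sum()    # fractions summing to 1 (100%)
    modes = vt[:n_modes]                     # orthogonal eigenvectors
    sd = np.sqrt(variance[:n_modes])
    scores = (X - mean) @ modes.T / sd       # mean 0, SD 1 within this dataset
    return mean, modes, sd, scores, explained

# Simulated outlines (hypothetical data): 200 images, 53 points each
rng = np.random.default_rng(0)
base = rng.normal(size=(53, 2))
shapes = base + 0.05 * rng.normal(size=(200, 53, 2))
aligned = generalised_procrustes(shapes)
mean, modes, sd, scores, explained = build_ssm(aligned)
```

Within the dataset used to build the model, every column of `scores` has mean zero and unit SD by construction, which is the property described in the paragraph above.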
Applying external adult reference SSM template to adolescent data
One of the limitations of statistical shape modelling is the lack of comparability of HSMs with other datasets and studies, since each SSM is unique to that particular set of images. One way of overcoming this limitation is to apply a set of pre-defined HSMs, previously obtained from a reference population. An SSM template based on a reference set generated from a GWAS meta-analysis of hip shape from five cohorts (based on 19,379 images) 9 was applied to both adolescent datasets in order to directly compare hip shape between adolescent time points as well as with adult hip shape. See Table 3 for details regarding the cohorts contributing to the adult reference SSM. Briefly, the reference model was built as described above, and the eigenvectors were saved and used to calculate the mode scores for subsequent models (without adding the new image to the reference model or changing it in any way).
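As a rough illustration of how a locked reference model can be applied to new images, the sketch below aligns each new outline to the reference mean shape and projects it onto the saved eigenvectors. The function names, the choice to align to the reference mean shape, and all simulated values are assumptions for illustration and are not taken from the Shape software.

```python
import numpy as np

def align_to(shape, ref):
    """Centre a (k, 2) shape, scale it to unit size and rotate it onto ref."""
    s = shape - shape.mean(axis=0)
    s = s / np.linalg.norm(s)
    u, _, vt = np.linalg.svd(s.T @ ref)
    return s @ (u @ vt)

def reference_scores(shapes, ref_mean_shape, ref_modes, ref_sd):
    """Score new (n, k, 2) shapes against a fixed reference model.

    ref_mean_shape: (k, 2) mean shape of the reference model
    ref_modes:      (m, 2k) saved eigenvectors, one mode per row
    ref_sd:         (m,) SD of each mode in the reference population
    The reference arrays are only read, never updated.
    """
    aligned = np.array([align_to(s, ref_mean_shape) for s in shapes])
    X = aligned.reshape(len(shapes), -1) - ref_mean_shape.reshape(-1)
    return X @ ref_modes.T / ref_sd          # SDs from the reference mean

# Hypothetical reference model and new images, for illustration only
rng = np.random.default_rng(1)
k, m = 53, 10
ref_mean_shape = rng.normal(size=(k, 2))
ref_mean_shape -= ref_mean_shape.mean(axis=0)
ref_mean_shape /= np.linalg.norm(ref_mean_shape)
q, _ = np.linalg.qr(rng.normal(size=(2 * k, m)))
ref_modes = q.T                              # orthonormal rows
ref_sd = np.linspace(0.02, 0.005, m)
new_shapes = ref_mean_shape + 0.01 * rng.normal(size=(30, k, 2))
new_scores = reference_scores(new_shapes, ref_mean_shape, ref_modes, ref_sd)
```

Because each mode is scored independently against the fixed eigenvectors, the score for a given mode is the same whether three, ten or more modes are retained. Scores obtained this way need not have mean zero or unit SD in the new dataset.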

Reproducibility of point placement
A set of 100 images, collected during the TF 4 clinic, was randomly selected and marked again 2 months after completing the initial point placement in ALSPAC adolescents. The same set of images was also marked by a second marker. Intra-observer (within-observer) and inter-observer (between-observer) repeatability of manual point placement was measured as the difference in pixels between coordinates of 58 points. The intra- and inter-observer reliability assessed by mean point-to-point repeatability was 1.22 and 1.78 pixels, respectively. Given that the average hip DXA image measured 250 × 180 pixels, these errors are small; a median point-to-point difference of 3 pixels or less has previously been considered accurate 10. In addition, average Intraclass Correlation Coefficients (ICCs) for the top ten HSMs were calculated. Figure 2 shows the intra- and inter-observer agreement values for each of the modes. The mean ICC values were 0.87 for intra-observer and 0.70 for inter-observer agreement. Whilst all ICC values for intra-observer agreement were 0.70 or above, inter-observer values for modes 3, 6, 9 and 10 were below 0.70. Although the initial template was a 58-point model, it was subsequently reduced to a 53-point model because of high variability in points placed at the acetabular overhang and the medial and lateral femoral shaft, in both adolescent and adult SSM templates.
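The two repeatability measures can be sketched as follows. The text does not state how the pixel difference was computed or which ICC form was used, so the Euclidean distance and the two-way random-effects, absolute-agreement, single-measure ICC (Shrout and Fleiss ICC(2,1)) used here are assumptions, as are the function names.

```python
import numpy as np

def mean_point_to_point(marks_a, marks_b):
    """Mean Euclidean distance (pixels) between two markings.

    marks_a, marks_b: (n_images, n_points, 2) pixel coordinates of the same
    points placed on two occasions or by two observers.
    """
    return np.linalg.norm(marks_a - marks_b, axis=-1).mean()

def icc_2_1(ratings):
    """ICC(2,1) for an (n_subjects, k_raters) array, e.g. one HSM score per
    image from each of two markings."""
    n, k = ratings.shape
    grand = ratings.mean()
    ss_rows = k * np.sum((ratings.mean(axis=1) - grand) ** 2)   # between images
    ss_cols = n * np.sum((ratings.mean(axis=0) - grand) ** 2)   # between raters
    ss_err = np.sum((ratings - grand) ** 2) - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
```

In practice `icc_2_1` would be called once per mode, passing a two-column array of the HSM scores derived from the first and second markings of the same images.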

Dataset
The first ten HSM scores generated using the external adult reference SSM for adolescent data collected at ages 14 and 18 years are available in the ALSPAC resource. A total of 4,468 individuals had hip shape data generated at age 14 (2,140 male, 2,328 female) and a total of 4,413 had data available at age 18 (1,939 male, 2,474 female). Please refer to Table 4 for descriptive statistics of the final sample for ALSPAC adolescents. As in previously published literature 10-12, the first 10 modes, which together explained 85% of the variance, were selected (modes beyond the tenth can often be regarded as noise, as each represents less than 1.5% of the variance). Figure 3 and Figure 4 provide a graphical representation, and Table 5 provides a summary, of the features described by each HSM. When a dataset serves as its own reference, each HSM has mean = 0 and SD = 1. By contrast, when the adult reference SSM (based on adult data with ages ranging from 48 to 74 years) was used, means for the first ten HSMs ranged from -1.14 to 2.26 at age 14 and from -1.5 to 2.42 at age 18, whereas SDs ranged from 0.42 to 0.97 at age 14 and from 0.41 to 0.91 at age 18 (Table 6).
When the adult reference SSM was applied to ALSPAC mothers' images, means for HSMs 2-9 were close to 0 (ranging from -0.35 to 0.34) and SDs were close to 1 (ranging from 0.8 to 1), whereas the mean and SD of the HSM1 score were 1.45 and 0.5, respectively.
The differences in means and SDs could be due to sex and/or age differences (i.e. mothers were on average 48 years old, therefore more closely resembling the ages of cohorts included in the reference model as opposed to ALSPAC offspring). The deviation away from the mean was particularly noted for HSM1, which is likely to reflect scanner differences between ALSPAC and other cohorts in the adult reference set. Different pixel spacing in the Lunar Prodigy scanner (used to acquire DXA scans in ALSPAC) relative to other scanners alters the aspect ratio (ratio between image height and width), and therefore HSM1 reflects these differences. Likewise, the smaller standard deviation is likely to reflect the narrower range generated when only one scanner is used.
Whilst direct comparison of the modes across the time points is an added advantage of applying an external reference SSM, one potential issue is that previously independent HSMs might no longer be independent of each other. To quantify the extent of any loss of independence after applying the SSM based on the combined adult reference model to the adolescent data, Matrix Spectral Decomposition was performed using the matSpD tool to compute the number of independent modes. The top ten HSMs based on the adult reference SSM at both time points were first correlated (Table 7 and Table 8), and the number of independent variables (HSMs) was then estimated using matSpD. As expected, the results showed that the top ten HSMs were essentially independent, as reflected by a matSpD score of 9.6 out of 10, indicating a 4% loss of independence.
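The idea behind this check can be sketched from the eigenvalues of the correlation matrix of the ten HSMs. The sketch below uses Nyholt's effective-number formula, Meff = 1 + (M - 1)(1 - Var(λ)/M). Whether this is the exact estimate reported here is an assumption, since the text does not say which matSpD output was used, and the function name is hypothetical.

```python
import numpy as np

def effective_number(corr):
    """Effective number of independent variables from an (M, M) correlation
    matrix, using Nyholt's formula (assumed here for illustration)."""
    eig = np.linalg.eigvalsh(corr)           # eigenvalues of the symmetric matrix
    m = len(eig)
    return 1 + (m - 1) * (1 - np.var(eig, ddof=1) / m)

# Limiting cases: fully independent and fully correlated modes
independent = effective_number(np.eye(10))            # 10 uncorrelated modes
redundant = effective_number(np.ones((10, 10)))       # 10 identical modes

# Loss of independence implied by the reported score of 9.6 out of 10
loss = (10 - 9.6) / 10
```

With uncorrelated modes the estimate equals the number of modes, and with perfectly correlated modes it falls to one, so a value of 9.6 for ten modes corresponds to the 4% loss quoted above.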
SSM methodology offers a powerful approach to study subtle changes in hip morphology, and it has been successfully applied to study variation in hip shape associated with the incidence 13,14 and progression of OA 15, as well as associations with hip fracture 16 in adult cohorts. A major drawback of the methodology has previously been that, as each model is data-driven, the HSMs generated are unique to the sample used, thus preventing direct cross-comparison with other studies. One of the key strengths of the hip shape data presented here is the application of an adult reference SSM to hip DXA images at ages 14 and 18 years, which allows direct comparison of associations with HSMs between these time points and comparison of findings with results in adults. For example, using the results from the largest meta-analysis of DXA-derived hip shape to date 9, we were able to replicate these analyses in adolescents and directly compare the relationships between genetic loci associated with hip shape in adults with those in adolescents 17. Furthermore, future analyses examining associations between hip shape and OA case status that apply the same SSM template used for this data note will enable future studies in adolescents to focus on those aspects of hip morphology more strongly related to pathology in later life.

Ethical approval and consent
Ethical approval for the study was obtained from the ALSPAC Ethics and Law Committee and the Local Research Ethics Committees; full details of the approvals obtained are available from the study website (http://www.bristol.ac.uk/alspac/researchers/research-ethics/).
Written informed consent was obtained from parents, and children were invited to give consent where appropriate. Study members have the right to withdraw their consent for elements of the study or from the study entirely at any time.

Data availability
ALSPAC data access is through a system of managed open access. The steps below highlight how to apply for access to the data included in this data note and all other ALSPAC data. The dataset generated in this data note has been deposited within the ALSPAC data resource and is linked to ALSPAC project number B1274. Please quote this number to request required variables which have been described in this dataset (HSMs generated at ages 14 and 18 years).
1. Please read the ALSPAC access policy (PDF, 627kB) which describes the process of accessing the data and samples in detail, and outlines the costs associated with doing so.
2. You may also find it useful to browse our fully searchable research proposals database, which lists all research projects that have been approved since April 2011.
3. Please submit your research proposal for consideration by the ALSPAC Executive Committee using the online process. You will receive a response within 10 working days to advise you whether your proposal has been approved.
If you have any questions about accessing data, please email alspac-data@bristol.ac.uk.
The ALSPAC data management plan describes in detail the policy regarding data sharing, which is through a system of managed open access. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Version
"Applying external adult reference SSM template to adolescent data": Was the comparison done using similar modes between cohorts, or the first ten modes? What exactly is the reference model? Please describe how the reference model was built or what it is.
"The intra- and inter-observer reliability assessed by mean point-to-point repeatability was 1.22 and 1.78 pixels, respectively". I would like to see a known estimate of intra-class correlation, because a value in pixels does not tell the readers of the article much. The question is: was the correlation within and between readers good? What are the estimates?
"The differences in means and SDs could be due to sex and/or age differences": does this mean that the HSMs were not adjusted for age and gender? Please clarify in methods.
In methods, how many females, range of age, BMI... do you have information on those variables? If you have it, please add it.
In my opinion HSM1 could be used as a reference for the other modes. It represents the gross percentage of variation. If what this mode (HSM1) really reflects is scanner differences, then when you correct in your equation for type of scanner the difference would disappear. Can you confirm this?
"The results showed close to 10 (9.6) independent variables for both time points". Please clearly describe in the document whether the 10 modes were independent or not (or which of them were not) and what the real loss of independence was (as a percentage or a number that readers can understand).
In Tables 6 and 7, what is represented on the X and Y axes to make the correlation?

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes

Are sufficient details of methods and materials provided to allow replication by others? Partly

Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Orthopedics, rheumatology, epidemiology including genetics epidemiology.
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.
We thank the referee for their helpful comments. The reference model is a numerical description of a set of images using, in this case, a previous model based on N=19,379 images. By default, PCA produces a number of HSMs explaining 100% of variation in the dataset. However, as in most previous publications, the top few modes explain the majority of the variance in the dataset and ten were selected for analysis. The reference model is locked and HSMs describing each image from the adolescent cohort data were calculated in terms of this model. Again, we selected the top ten for analysis in order to be able to compare them with the previous study. The number of modes selected for analysis, however, has no bearing on the results of the reference model (so had we chosen 9 or 90 modes, the values would be unaffected). We have added additional text to explain this point at suitable points in the manuscript and provided references to other studies that have used reference models in this way.
"The intra- and inter-observer reliability assessed by mean point-to-point repeatability was 1.22 and 1.78 pixels, respectively". I would like to see a known estimate of intra-class correlation because in pixels it does not tell much to the readers of the article. The question is: the correlation within and between reader was good? What are the estimates?
We have calculated intra-class correlation coefficients as suggested (description and a figure showing the results have been added to the manuscript) and updated the sentence relating to pixel results to add clarity.
"The differences in means and SDs could be due to sex and/or age differences" it means that the HSMs were not adjusted for age and gender? Please clarify in methods.
We have now clarified in methods that derived HSMs were not adjusted for age and gender.
In methods, how many females, range of age, BMI...do you have information on those variables? If you have it please add it.
We have now added a table with descriptive statistics to the dataset section.
In my opinion HSM1 could be use as reference for the other modes. It represents the gross methodology and data that in other studies the authors will use, the authors should focus on that. Therefore, the authors should clearly define their objectives and construct the title of the paper and the Introduction section according to these objectives. Both of them (Title and Introduction) should lead the reader to the final objectives of the paper. I think that, in addition to describing the methodology used to analyse the shape of the proximal femur by statistical shape modelling and the landmarks used, the authors should also explain how they will use all of this in their analyses. I wonder why they do not construct different models for boys and girls. It is well known that the female and male femur each follow divergent growth trajectories which are clearly marked from 12 years of age onward (Pujol et al., 2016). How are they going to use these ten PCs in future papers? How will the application of the external adult reference statistical shape model template to adolescents and mothers aid comparability with other studies and ages? What results do they think they can obtain from the application of these data in their future analyses? I think that all of this should be better explained and discussed in the paper.
Other comments: What is the difference between Geometric Morphometrics and Statistical shape modelling? This, and the advantages of using Statistical shape modelling over Geometric Morphometrics, should be explained in the Introduction section. What do key landmarks mean in Statistical shape modelling? In the Material and Methods section, when the authors describe the final chosen sample of the Avon Longitudinal Study of Parents and Children used for their study, they should indicate the final number of boys and girls to be analysed.

What is the difference between Geometric Morphometrics and Statistical shape modelling? This, and the advantages of using Statistical shape modelling over Geometric Morphometrics, should be explained in the Introduction section.
There is no material difference between these methodologies. The primary difference lies in how the results are displayed (direct variation in shape rather than warping of an underlying grid). In musculoskeletal research the SSM approach and terminology has been used in preference to GM so we prefer to continue to use this to avoid confusion.
What do key landmarks mean in Statistical shape modelling?
Key landmark points in this data note relate to points that are placed at easily identifiable anatomical features of an object; this sentence has now been added to the 'Statistical shape model (SSM)' section of the data note.

In Material and Methods section, when the authors describe the final chosen sample of the Avon Longitudinal Study of Parents and Children used for their study, they should indicate the final number of boys and girls to be analysed.
This has now been added to the dataset section.
Competing Interests: No competing interests were disclosed.