Mechanisms of Ageing and Development
From bed to bench: How in silico medicine can help ageing research

Driven by rising ethical concerns surrounding animal experimentation, there is growing interest in non-animal methods, in vitro or in silico technologies that can be used to reduce, refine, and replace animal experimentation. In addition, animal experimentation is being critically re-examined with regard to its ability to predict clinical outcomes. In this manuscript we describe an initial exploration where a set of in vivo imaging-based subject-specific technologies, originally developed to predict femoral strength and the risk of hip fracture in osteoporotic patients, were adapted to assess the efficacy of bone drugs pre-clinically in mice. The CT2S technology we developed generates subject-specific models based on Computed Tomography that can separate fractured and non-fractured patients with an accuracy of 82%. In mouse experiments, the use of in vivo imaging and modelling was found to improve the reproducibility of Bone Mineral Content measurements to the point where up to 63% fewer mice would be required to achieve the same statistical power as a conventional cross-sectional study. We also speculate about a possible approach where animal-specific and patient-specific models could be used to better translate the observations made on animal models into predictions of response in humans.


Introduction
There is growing interest in non-animal methods, in vitro or in silico technologies that can be used to reduce, refine, and replace animal experimentation (see for example this review relating to vision research: Combes and Shah, 2016); the primary motivation is the rise in our societies of ethical concerns about animal experimentation.
But another problem also drives the critical re-examination of animal experimentation. According to a recent BIO report, based on a database of nearly 10,000 regulatory phase transitions, the Likelihood of Approval from phase I (after the compound has successfully passed the pre-clinical assessment, mostly based on animal experimentation) ranges from 26.1% in haematology to 5.1% in oncology, with an average across all indications of 9.6%. This means that, in the best-case scenario, animal models are able to spot all safety and efficacy issues associated with a new compound only about one time out of four.
In parallel there is growing enthusiasm around the use of subject-specific computer modelling and simulation to support clinical decisions. Developed in Europe under the flag name of Virtual Physiological Human (VPH), and elsewhere under that of the Physiome Project, these technologies are capable of accurately predicting disease progression and the effect of different treatments in individual patients, for a variety of conditions (Viceconti and Hunter, 2016).
Computational methods have been used in the last few years to investigate various aspects of skeletal ageing (Lambers et al., 2015; Patel et al., 2014; Razi et al., 2015a,b; Yang et al., 2017). In this manuscript we describe an initial exploration where a set of VPH technologies, originally developed to predict femoral strength and the risk of hip fracture in osteoporotic patients, were adapted to assess the efficacy of bone drugs pre-clinically in mice. We also propose an approach where animal-specific and patient-specific models could be used to better translate the observations made on animal models into predictions of response in humans.
because of the severity of the consequences, is of particular clinical relevance.
Pharmacological treatments exist that slow down the progression of the disease, reducing the incidence of fragility fractures by 40% or more (Sanderson et al., 2016). But because this is a long-term therapy, with considerable associated costs and some potential side effects, it is essential to recognise which patients have a sufficiently high risk of fragility fractures to justify such pharmacological treatment.
It is not trivial to define the accuracy with which a predictor identifies the patients at higher risk. One possible approach is to build artificial cohorts by selecting patients who experienced a fragility fracture before being treated, and then identifying for each of them another patient who has not yet had any fragility fracture but matches the fractured patient in gender, age, weight, and height. One such cohort, hereinafter referred to as the Sheffield Cohort, includes 50 postmenopausal women with a fragility hip fracture and 50 control cases (no fractures), pair-matched by gender, age, weight, and height (Yang et al., 2014). With such cohorts it is possible to compare the accuracy of risk predictors in terms of their ability to discriminate between fractured and non-fractured patients; this will be referred to hereinafter as discrimination accuracy.
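The text above does not state how discrimination accuracy is computed. As an illustration only, the sketch below assumes one common choice, the area under the ROC curve, evaluated as the fraction of fractured/control pairs in which the fractured patient receives the higher risk score. The function name and the scores are hypothetical and are not taken from the Sheffield Cohort.

```python
from itertools import product


def discrimination_auc(fractured_scores, control_scores):
    """Probability that a randomly chosen fractured patient has a higher
    risk score than a randomly chosen control (ties count one half).

    ASSUMPTION: AUC is used here as a stand-in for "discrimination
    accuracy"; the source does not define the metric.
    """
    pairs = list(product(fractured_scores, control_scores))
    wins = sum(1.0 for f, c in pairs if f > c)
    ties = sum(0.5 for f, c in pairs if f == c)
    return (wins + ties) / len(pairs)


# Hypothetical risk scores (higher = predicted weaker femur), illustration only.
fractured = [0.9, 0.8, 0.6, 0.4]
controls = [0.7, 0.5, 0.3, 0.2]
auc = discrimination_auc(fractured, controls)
print(f"discrimination (AUC): {auc:.4f}")
```

A value of 0.5 would correspond to a predictor no better than chance, and 1.0 to perfect separation of the two groups.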
The risk of experiencing a fragility hip fracture is strongly related to the side-fall strength of the patient's femur, defined as the force required to fracture the femoral bone under loading conditions similar to those observed during a side-fall impact with the floor. Strength can be measured only invasively; however, a number of methods use medical imaging to acquire non-invasively information about the shape and/or the degree of mineralisation of the patient's bone, and use this information to estimate the strength. The current clinical standard involves measuring the areal Bone Mineral Density (aBMD) at the hip region using a specialised medical imaging technology called Dual X-ray Absorptiometry (DXA). Since 1985 (Basu et al., 1985), a number of research groups have refined an alternative approach, in which quantitative computed tomography (QCT) of the femoral region is used to generate a subject-specific finite element model (SSFE) of the patient's bone, a computer model that is used to predict the femoral strength.
In a recent meta-analysis, conducted over a set of studies run by five different groups and involving over 400 cadaver femurs, we showed that QCT-SSFE is more accurate (84-85%) than DXA-aBMD (77-81%) in predicting femoral strength in side fall, as measured in cadaver femurs (Viceconti et al., 2018). However, until recently, QCT-SSFE was reported to be not significantly more accurate than DXA-aBMD in predicting the risk of hip fracture (Zysset et al., 2015), due to a number of methodological issues. The CT2S technology developed in Sheffield has addressed most of these issues effectively. The femoral geometry is segmented from the CT images, and then automatically meshed with parabolic 10-node tetrahedral elements using a modified version of the advancing front method. The resulting mesh is projected back onto the CT images, and the calibrated CT attenuation coefficients are numerically averaged over the volume of each finite element, assuming trilinear interpolation between voxel values, in order to generate a heterogeneous mapping of material properties. As human femurs exhibit a clearly brittle failure under side-fall loading, we use a maximum strain failure criterion with different limits for tension and compression. Over the years the method was refined through extensive methodological research (Polgar et al., 2001; Taddei et al., 2004; Viceconti and Taddei, 2003; Viceconti et al., 1999a,b; Zannoni et al., 1998), subjected to extensive verification (Taddei et al., 2006; Viceconti et al., 2004), and equally extensive validation against strain and strength measurements on cadaver femurs (Grassi et al., 2012; Schileo et al., 2007, 2008a, 2008b, 2014). The method was further developed to be used as a clinical predictor of the risk of fracture (Falcinelli et al., 2014; Qasim et al., 2016); on the Sheffield cohort the stratification accuracy of CT2S is 82%, while that of DXA-aBMD is 75% (Viceconti et al., 2018).
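To make the failure criterion mentioned above more concrete, the following is a minimal sketch of a maximum strain criterion with separate limits in tension and compression. It is not the CT2S implementation: the limit values, the function names and the linear scaling of the load are assumptions introduced here for illustration.

```python
import numpy as np

# ASSUMPTION: illustrative limit strains, not the values used by CT2S.
TENSILE_LIMIT = 0.0073
COMPRESSIVE_LIMIT = 0.0104


def risk_factor(max_principal, min_principal):
    """Per-node ratio between the governing principal strain and its limit.

    Tensile (positive) strains are compared with the tensile limit,
    compressive (negative) strains with the compressive limit.
    """
    e1 = np.asarray(max_principal, dtype=float)
    e3 = np.asarray(min_principal, dtype=float)
    tension = np.clip(e1, 0.0, None) / TENSILE_LIMIT
    compression = np.clip(-e3, 0.0, None) / COMPRESSIVE_LIMIT
    return np.maximum(tension, compression)


def failure_load(applied_load, max_principal, min_principal):
    """Scale the applied load until the most critical node reaches its limit.

    ASSUMPTION: the model is linear elastic, so strains scale with load.
    """
    return applied_load / risk_factor(max_principal, min_principal).max()


# Hypothetical principal strains at two nodes under a 1000 N load.
strength = failure_load(1000.0, [0.00365, 0.001], [-0.001, -0.0052])
print(f"estimated strength: {strength:.0f} N")
```

In this toy case both nodes reach half of their respective limits, so the estimated strength is twice the applied load.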

DigiMouse: imaging & modelling technologies to quantify bone adaptation to therapies
The generic term "bone drugs" indicates a variety of pharmaceutical products (Alendronate, Risedronate, Ibandronate, Zoledronic acid, Denosumab, Raloxifene, Teriparatide, Abaloparatide, etc.) which, through fairly different mechanisms of action, slow down or reverse the progressive loss of bone mass. The most common animal models used to test the efficacy of new bone drugs use wild-type (WT) female mice of the C57BL/6 or Balb-C strains, typically aged around 14 weeks, which all undergo abdominal surgery; for half of them this involves the removal of the ovaries (OVX), and for the other half only the sham surgery (SHAM). Each of the two groups is subdivided into two sub-groups, one that is injected with the drug being tested, the other with just the vehicle. After a period of treatment, typically four weeks, all animals are sacrificed, one bone (typically the tibia or the femur) is harvested, and two small portions of 2-3 cubic millimetres of bone tissue, below the proximal growth plate and in the mid-shaft, are scanned with microcomputed tomography (microCT) (Ruegsegger et al., 1996) in order to evaluate the properties of the cancellous and cortical bone, respectively (Bouxsein et al., 2010). From the obtained images, 3D histomorphometric measurements can be performed to describe the tissue density and morphology. The most important parameter is the bone volume fraction (BV/TV), which is the volume of mineralised bone in the considered region of interest (ROI) divided by the total volume of the ROI. The higher the BV/TV of the treated animals compared with the untreated ones, the more effective the treatment is considered. The high-resolution microCT images also allow the measurement of other important microstructural parameters, such as trabecular thickness (Tb.Th), trabecular number (Tb.N), trabecular spacing (Tb.Sp), cortical area (Ct.A) and cortical thickness (Ct.Th), which provide an assessment of local bone quality (Bouxsein et al., 2010).
For example, see Johnston et al. (2007), who investigated the effect of a combined Teriparatide and Alendronate treatment with this animal model.
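The definition of BV/TV given above can be sketched in a few lines. This is a simplified illustration that assumes the ROI has already been extracted and that bone is identified by a single global grey-level threshold; the threshold value and the synthetic ROI are hypothetical.

```python
import numpy as np


def bone_volume_fraction(grey_roi, threshold):
    """BV/TV: fraction of ROI voxels classified as mineralised bone.

    ASSUMPTION: a single global threshold separates bone from background;
    real protocols may use different segmentation strategies.
    """
    roi = np.asarray(grey_roi)
    bone_voxels = np.count_nonzero(roi >= threshold)
    return bone_voxels / roi.size


# Synthetic 10 x 10 x 10 ROI in which 3 of the 10 slices are "bone".
roi = np.zeros((10, 10, 10))
roi[:, :, :3] = 1000.0
bvtv = bone_volume_fraction(roi, threshold=500.0)
print(f"BV/TV = {bvtv:.2f}")
```

Because all voxels have the same volume, the ratio of voxel counts equals the ratio of volumes.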
Although they are rarely stated explicitly, this model relies on a number of assumptions:
1. The ovariectomy produces in mice a systemic osteopenia similar to age-related osteoporosis in humans;
2. The sham surgery is required to normalise for the systemic effect of surgery;
3. The properties of the proximal tibia cancellous bone and of the cortical bone at a thin portion of the mid-shaft represent global skeletal changes well (negligible spatial gradient);
4. The changes at four weeks are representative of the long-term response (negligible temporal gradient after the time point).
The above-mentioned analyses can also be performed longitudinally by using in vivo microCT imaging, where each mouse acts as its own control, instead of the standard cross-sectional design. In a first study (Lu et al., 2015) we explored surface-to-surface distance as a measure of bone remodelling between two consecutive time points, computed after proper image segmentation and Boolean operations, an approach similar to that used by Ralph Müller's group at ETH Zurich to investigate the bone remodelling of mouse caudal vertebrae (Christen and Muller, 2017; Schulte et al., 2011).
While the method was found to be very accurate on repeated scans, when applied to longitudinal data it highlighted that even after 14 weeks of age mouse tibiae continue to grow. Even if the linear growth per week is small, in the order of a few hundred micrometres, it heavily biases the quantification of bone remodelling, making it difficult to separate changes due to growth from those due to bone remodelling. To compensate for this, a new approach was adopted (Lu et al., 2016). The 3D images obtained at different time points were all aligned to the one generated at the first time point (baseline) using rigid registration. In each image the tibia was segmented by thresholding and its length calculated. The analyses included only the 80% of the bone below the proximal growth plate, which was subdivided into ten equally spaced sections, each further subdivided into four anatomically oriented (medial, lateral, anterior, posterior) compartments (Fig. 1, top). Using this anatomical partitioning, we evaluated the reproducibility of various measurements, finding that the total Bone Mineral Content (BMC), obtained through the necessary densitometric calibration, was more reproducible than other morphometric indicators, such as BV/TV. A simple power calculation showed that by using a longitudinal approach enabled by in vivo imaging, and the more reproducible BMC as the metric of bone adaptation, we could reduce by 63% the number of animals required in each group to achieve the same statistical power, with respect to the traditional cross-sectional study described above (see Table 1). This reduction in the animals required for each group adds to the saving from the animals that no longer need to be sacrificed at each time point. For example, if we are interested in measuring the bone changes at six different time points between two groups of 10 mice, with a standard cross-sectional study we would need 120 mice. With a longitudinal approach we would need just 8 mice.
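The arithmetic behind the 120 versus 8 mice example can be reproduced as follows. This is only a sketch: the function names are introduced here, and rounding the reduced group size up to the next whole animal is an assumption, chosen because it matches the figure quoted above.

```python
import math


def cross_sectional_animals(groups, per_group, time_points):
    """A new set of animals is sacrificed at every time point."""
    return groups * per_group * time_points


def longitudinal_animals(groups, per_group, reduction):
    """The same animals are scanned at every time point, and the group size
    shrinks by the given fraction thanks to the more reproducible metric.

    ASSUMPTION: the reduced group size is rounded up to a whole animal.
    """
    return groups * math.ceil(per_group * (1.0 - reduction))


n_cross = cross_sectional_animals(groups=2, per_group=10, time_points=6)
n_long = longitudinal_animals(groups=2, per_group=10, reduction=0.63)
print(n_cross, n_long)
```

With two groups of 10 mice and six time points, the cross-sectional design needs 2 x 10 x 6 animals, whereas the longitudinal design needs 2 x ceil(10 x 0.37) animals in total.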
Nevertheless, one potential concern for the longitudinal assessment is that the X-ray ionising radiation that the repeated microCT scans involve (approximately 500 mGy per scan) might affect the bone remodelling process, altering the very biological mechanism we want to observe. In a small group of mice (N = 5 per group) treated with parathyroid hormone (PTH) or with vehicle, the right leg was scanned 8 times in vivo (once per week between weeks 14 and 22 of age, with week 15 skipped), and at week 24 both the right (irradiated) and left (not irradiated) tibiae were scanned ex vivo. The differences between the total BMC of the tibiae were between 3.7% and 8.0% (difference for the control group: 5.8 ± 1.9%; difference for the treated group: 5.1 ± 0.9%). This difference was slightly larger than that found between the left and right tibiae of 10 control mice of the same age (22 weeks) that were not irradiated (2.0 ± 1.3%, range: 0.3% to 3.7%). Therefore, as a precaution, we recently revised the longitudinal protocol to perform the scans every two weeks instead of weekly and to halve the scanning time, which reduces the nominal ionising radiation dose from 513 mGy to 256 mGy per scan. This reduction in scanning time was not associated with significant differences in the estimated densitometric and mechanical properties of the tibia (Oliviero et al., 2017).

Table 1
Theoretical sample size reduction (with 95% confidence interval, CI) if data from the longitudinal study are analysed by accounting for baseline measurements or if they are analysed as a standard cross-sectional study. The data are reported for morphometric and densitometric parameters: Bone Volume Fraction (BV/TV), trabecular thickness (Tb.Th), trabecular number (Tb.N), cortical thickness (Ct.Th), total trabecular bone mineral content (Tb.BMC), total cortical bone mineral content (Ct.BMC) and total bone mineral content (Tot.BMC). The intra-class correlation (ICC) is also reported for each variable.

The new method described here opened additional possibilities, beyond the reduction of animal experimentation. A first important benefit is that for the first time it was possible to observe the effect of an intervention over a large anatomical site (the whole tibia), and across time. We used this to explore how the injection of PTH would manifest in space and time, with respect to WT mice without any intervention. Indeed, the study found that the anabolic effect of PTH follows a specific spatial-temporal pattern, confirming that spatial and temporal gradients are not negligible.
Furthermore, the longitudinal scanning of the whole mouse tibia allows the use of mouse-specific computer modelling for the non-invasive prediction of bone strength, a much more informative quantity with which to investigate the effect of an intervention on the mechanical competence of the skeleton. This application is explored in detail in the following section.

Predicting bone strength non-invasively in mice
While animal models are useful for predicting the effect of novel interventions on the density, microstructure and mechanical competence of the bone tissue, the differences in anatomy, physiology and kinematics make it impossible to have a preclinical counterpart of the clinical estimation of the risk of fracture. However, the bone mechanical competence, an endpoint in the assessment of interventions that is much more clinically relevant than standard densitometric and morphometric measurements, should be measured or estimated whenever possible.
As described above, from longitudinal microCT images we can measure a large number of densitometric and morphometric parameters that can be affected by ageing, diseases and/or interventions. In most cases it is hard to quantify the efficacy of an intervention from this multi-dimensional dataset: it is hard to assess how good a treatment is if it improves only one of the morphometric parameters, or only the bone density in a small region of the considered bone. In this context the measurement or estimation of a few mechanical parameters can provide a much easier interpretation of the effect of interventions that target an improvement of the bone mechanical competence. These parameters include, for example, the load required to induce a fracture (failure load or bone strength), the load required to start permanent deformation in the bone (yield load), the bone stiffness and the energy required to induce a fracture in the bone.
The mechanical properties of mouse bones can be measured in the laboratory through standard mechanical testing (Schriefer et al., 2005). However, due to the complex shape and the small size of these bones, ex vivo analyses are affected by experimental errors (Wallace et al., 2014) or are limited to loading conditions that are less physiologically relevant (e.g. the three-point bending test). Moreover, the estimation of yield or failure loads requires destructive mechanical tests and can therefore be used only in standard cross-sectional studies, which require large numbers of animals and resources.
Similarly to the subject-specific QCT-based approaches developed for clinical applications, microCT-based finite element models (microFE) can be used to non-invasively estimate the bone mechanical properties longitudinally (Borg et al., 2018; Isaksson et al., 2009; Javaheri et al., 2018; Lambers et al., 2012; Liu et al., 2012; Lu et al., 2012; Lynch et al., 2010; Tiwari et al., 2018; Tiwari and Prasad, 2017; Ulrich et al., 1997, 1998; Wernle et al., 2010). The workflow to create a microFE model from the medical image is similar to that developed for clinical CT-based finite element models in humans (Fig. 1, bottom). The bone voxels of the rigidly registered microCT images of the mouse tibia, collected in vivo at the different time points, are separated from the background and then converted into hexahedral finite elements. This in silico approach allows us to simulate different types of loading, the simplest and most important for the tibia being uniaxial compression.
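The conversion of segmented bone voxels into hexahedral elements described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the cited studies: it assumes the image has already been registered and binarised, the function name is hypothetical, and the voxel size in the example is arbitrary.

```python
import numpy as np


def voxels_to_hexahedra(bone_mask, voxel_size):
    """Convert a 3D boolean bone mask into an 8-node hexahedral mesh.

    Returns (nodes, elements): nodes is an (n_nodes, 3) array of
    coordinates, elements is an (n_elements, 8) array of node indices.
    Only nodes that belong to at least one bone voxel are kept.
    """
    mask = np.asarray(bone_mask, dtype=bool)
    lattice = tuple(n + 1 for n in mask.shape)  # grid of voxel corners
    # Corner offsets of one voxel: bottom face first, then top face.
    corners = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0],
                        [0, 0, 1], [1, 0, 1], [1, 1, 1], [0, 1, 1]])
    voxels = np.argwhere(mask)                             # (n_el, 3)
    corner_ijk = voxels[:, None, :] + corners[None, :, :]  # (n_el, 8, 3)
    # Give every lattice corner a unique integer id, then renumber the
    # corners actually used so that node indices are contiguous.
    flat = np.ravel_multi_index(tuple(corner_ijk.reshape(-1, 3).T), lattice)
    used, inverse = np.unique(flat, return_inverse=True)
    elements = inverse.reshape(-1, 8)
    nodes = np.column_stack(np.unravel_index(used, lattice)) * voxel_size
    return nodes, elements


# Two adjacent bone voxels; the voxel size (in mm) is a hypothetical value.
mask = np.ones((2, 1, 1), dtype=bool)
nodes, elements = voxels_to_hexahedra(mask, voxel_size=0.01)
print(nodes.shape, elements.shape)
```

Adjacent voxels share the nodes on their common face, so the two-voxel example yields 12 nodes rather than 16.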
Unfortunately, in contrast to the clinical counterpart, the validation of microFE predictions is limited to a few studies. While microFE models have been found to estimate well the structural stiffness and strength of trabecular bone specimens (Schwiedrzik et al., 2016; Wolfram et al., 2010) and the local displacements under compression for trabecular bone (Chen et al., 2017) and vertebral bodies (Costa et al., 2017), little has been done to validate models of the mouse tibia. A few studies have found good agreement between the predicted local strains in a region of the external surface of the mouse tibia and experimental measurements performed with strain gauges (Patel et al., 2014; Razi et al., 2015b) or digital image correlation (Carriero et al., 2014; Pereira et al., 2015). However, the accuracy of the predictions of internal strains or of the structural mechanical properties of the tibia had never been challenged. In a recent study performed in our group (unpublished data) we combined mechanical testing, microCT imaging, digital volume correlation and computational modelling to evaluate the ability of microFE models of the mouse tibia to predict local and structural properties (Oliviero et al., 2018). Digital volume correlation is an image processing technique that measures the full deformation field given two microCT images of an undeformed and a deformed bone (Dall'Ara et al., 2014). The results of this pilot study (mechanical tests performed on six tibiae) showed good predictions of local displacements when compared to digital volume correlation measurements (coefficient of determination of the linear regressions between predicted and measured displacements greater than 82%, intercepts close to 0 and slopes in the range 0.69-0.95). Furthermore, both structural bone stiffness (difference between predicted and measured stiffness equal to 14 ± 11%) and bone strength (difference between predicted and measured strength equal to 9 ± 9%) were well predicted.
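The validation metrics quoted above (slope, intercept and coefficient of determination of a linear regression, and mean ± standard deviation of the percentage difference) can be computed as in the sketch below. The regression direction (predicted against measured), the use of absolute differences and the sample values are assumptions made for illustration; they are not the data of the study.

```python
import numpy as np


def regression_summary(measured, predicted):
    """Slope, intercept and R^2 of the line predicted = a * measured + b."""
    x = np.asarray(measured, dtype=float)
    y = np.asarray(predicted, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    r = np.corrcoef(x, y)[0, 1]
    return slope, intercept, r ** 2


def percent_difference(measured, predicted):
    """Mean and sample standard deviation of |predicted - measured| / measured, in %.

    ASSUMPTION: absolute differences are used; the source does not say
    whether its differences are signed.
    """
    m = np.asarray(measured, dtype=float)
    p = np.asarray(predicted, dtype=float)
    diff = 100.0 * np.abs(p - m) / m
    return diff.mean(), diff.std(ddof=1)


# Hypothetical displacements (measured by DVC, predicted by microFE).
measured_disp = np.array([1.0, 2.0, 3.0, 4.0])
predicted_disp = 0.8 * measured_disp + 0.1
slope, intercept, r2 = regression_summary(measured_disp, predicted_disp)

# Hypothetical stiffness values for two specimens.
mean_diff, sd_diff = percent_difference([100.0, 200.0], [110.0, 180.0])
print(slope, intercept, r2, mean_diff, sd_diff)
```

A slope below 1 with an intercept near 0, as in this synthetic example, would indicate that the model systematically underestimates the measured displacements.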
By applying these validated microFE models to longitudinal microCT images we have estimated the effect of an anabolic treatment (parathyroid hormone, PTH1-34, the basis of the FDA-approved osteoporosis drug Teriparatide) on the bone mechanical competence of female C57BL/6 mice. In particular, we have shown that the tibia stiffness and strength, normalised by baseline measurements (week 14 of age), were significantly higher after two weeks of treatment (treatment started at week 18 of age) compared to the control group.

The future: using models to translate from mouse to human
For the particular application addressed in this paper we have described two methodologies that make it possible, in humans and in mice respectively, to quantify non-invasively the changes in bone mass distribution in a given bone, and the related changes in mechanical strength (and thus in the risk of fracture) under representative loading conditions. It might be interesting to speculate how these two technologies could be combined. As a guiding example we will imagine a study aimed at establishing whether the combination of PTH and physical therapy is more effective than PTH alone in reducing the risk of bone fracture (Gardinier et al., 2015).
First, we would observe longitudinally in mice the changes in bone mass distribution in the tibia, with and without the superposition of some loading regime (say, for example, a vibration from the ground in standing position). Longitudinal in vivo microCT would give us the evolution of bone mass distribution over time, and animal-specific finite element models would give us an estimate of the actual biomechanical stimulus delivered at each point of that mouse's tibia under the vibration exercise, so as to build a clear correlation between the dose of PTH injected, the intensity of the biomechanical stimulus, the anatomical location, and the bone adaptation response over time.
A second experiment would be done on osteoporosis patients, under very similar conditions. We would use clinical CT to follow longitudinally the changes in bone mass distribution in the tibia, but also in the femur. We would build patient-specific finite element models to calculate the actual biomechanical stimulus delivered at each point of that patient's tibia by the vibration exercise.
As a first step we would use the matching data at the tibia to build a "scaling function" that tells us how the bone remodelling response observed in the mouse for a given normalised dose of PTH and a given normalised level of biomechanical stimulus would correlate with that observed in the human under the same normalised conditions. We would then use this scaling function to predict the remodelling in the human femur, using the mouse response in the tibia and the actual normalised level of biomechanical stimulus in the femur predicted by the patient-specific model. We would then validate this mouse-human scaling model by comparing the predicted bone remodelling with that observed in the human femur from longitudinal clinical CT.
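The form of such a scaling function is not specified here and would have to be established experimentally. Purely to illustrate the steps described above, the sketch below assumes the simplest possible form, a single proportionality factor fitted by least squares on matching tibia data at one normalised PTH dose, and then applies it to the stimulus level predicted in the femur. All names and numbers are hypothetical.

```python
import numpy as np


def fit_scaling_factor(mouse_response, human_response):
    """Least-squares factor k such that human ~= k * mouse.

    ASSUMPTION: a purely proportional relation; a real scaling function
    could be nonlinear and depend on dose, site and time.
    """
    m = np.asarray(mouse_response, dtype=float)
    h = np.asarray(human_response, dtype=float)
    return float(m @ h / (m @ m))


def predict_human_response(stimulus, mouse_stimulus, mouse_response, k):
    """Interpolate the mouse tibia response at the given normalised
    stimulus level, then scale it to the human."""
    return k * np.interp(stimulus, mouse_stimulus, mouse_response)


# Hypothetical tibia data at matching normalised stimulus levels.
stimulus_levels = np.array([0.0, 0.5, 1.0])
mouse_tibia = np.array([0.0, 2.0, 4.0])   # e.g. % change in bone mass
human_tibia = np.array([0.0, 1.0, 2.0])

k = fit_scaling_factor(mouse_tibia, human_tibia)
# Normalised stimulus in the femur, as a patient-specific model might give.
femur_prediction = predict_human_response(0.75, stimulus_levels, mouse_tibia, k)
print(k, femur_prediction)
```

The validation step described in the text would then compare a prediction such as `femur_prediction` with the change actually observed in the femur.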
Assuming the predictive accuracy is sufficiently good, we could now test only in the mouse a different dose or loading regime, or another anabolic composition, and use the scaling model to predict how such a regime would change the bone mass distribution in the femur of a cohort of patients, using the initial data of those enrolled in the previous study. Then we would use the CT2S technology to estimate how, in that cohort, such changes in femoral bone mass would reduce the actual risk of fracture at the hip.
All this would of course not replace the clinical trial, but we would go into the clinical trial with much more confidence in the efficacy of our new treatment than we can possibly achieve today with the current animal experimentation approaches.

Discussion
In this paper we propose one possible scenario for the use of modelling and simulation to reduce, refine and replace animal experimentation, and to improve the ability of animal experiments to predict the outcome in humans. Of course, there are other, more radical scenarios: for example, to use the animal experiments simply to inform the inputs of an in silico clinical trial population model, where a virtual population is "virtually treated" with a new drug whose effect is modelled from experiments on mice. An even more radical approach would be to skip animal experimentation altogether and inform such an in silico clinical trial population model with the results of experiments done in vitro, on cell cultures. All these scenarios are plausible, and in fact in at least one case the regulators have already accepted in silico testing as a full replacement for animal experiments (Man et al., 2014). The approach we propose here is somewhat incremental with respect to the current methods based on animal experimentation, and as such it may be more rapidly adopted.
A fundamental assumption in such a scenario is that a "scaling function" exists to quantitatively translate observations made on mice to humans. Of course, the existence of such a function must be demonstrated case by case. Where no such function could be found, this would cast serious doubts on the usefulness of that animal experiment for predicting the response in humans; in such a case, modelling and simulation could still be useful, this time to replace animal experimentation entirely.

Conclusions
In this manuscript we described an initial exploration where a set of VPH technologies, originally developed to predict femoral strength and the risk of hip fracture in osteoporotic patients, were adapted to assess the efficacy of bone drugs pre-clinically in mice. We also speculated about a possible approach where animal-specific and patient-specific models could be used to better translate the observations made on animal models into predictions of response in humans.
We are confident that in vivo imaging and subject-specific modelling technologies like the ones described here can not only reduce, refine, and in some cases even partially replace animal experimentation, but also provide a much more rigorous reasoning framework through which to interpret clinically the results of animal experiments.

Conflict of interest
The authors declare that they do not have any financial or personal relationships with other people or organisations that could have inappropriately influenced this study.