Finite element models of the tibiofemoral joint: A review of validation approaches and modelling challenges

The knee joint is a complex mechanical system, and computational modelling can provide vital information for the prediction of disease progression and of the potential for therapeutic interventions. This review provides an overview of the challenges involved in developing finite element models of the tibiofemoral joint, including the representation of appropriate geometry and material properties, loads and motions, and establishing pertinent outputs. The importance of validation for computational models in biomechanics has been highlighted by a number of papers, and finite element models of the tibiofemoral joint are a particular area in which validation can be challenging, due to the complex nature of the knee joint, its geometry and its constituent tissue properties. A variety of study designs have emerged to tackle these challenges, and these can be categorised into several different types. The role of validation, and the strategies adopted by these different study types, are discussed. Models representing trends and sensitivities often utilise generic representations of the knee and provide conclusions with relevance to general populations, usually without explicit validation. Models representing in vitro specimens or in vivo subjects can, to varying extents, be more explicitly validated, and their conclusions are more subject-specific. The potential for these approaches to examine the effects of patient variation is explored, which could lead to future applications in defining how treatments may be stratified for sub-groups of patients.


Introduction
The knee is the articulating joint most commonly affected by osteoarthritis [1] , and there are still major challenges to overcome in the development of lasting treatments. There is now an increasing effort to develop early stage interventions to prevent knee degeneration and delay the need for joint replacement surgery. This includes regenerative therapies for cartilage and bone [2] , as well as repairs for the meniscus [3] and ligaments [4] . Development of such tissue-sparing interventions requires an understanding of the mechanical environment of the knee, necessitating improved pre-clinical testing methods such as in vitro simulation [5] . Experimental methods can provide a controlled environment for assessing joint mechanics, but are generally expensive and time-intensive when using large numbers of in vitro specimens or in vivo subjects, and are limited in the scenarios and outputs that can practically be investigated. Computational models therefore play an important role in non-invasively understanding knee mechanics [6] ; they can provide information that would be difficult or impossible to obtain from experimental studies and can also be utilised for sensitivity testing in order to assist the setup of experimental models.
Finite element (FE) modelling has been used extensively in biomechanics, and a growing number of studies of the knee that use FE methods are being reported. Examples include the investigation of cartilage degeneration and osteochondral defects [7][8][9] , the influence of meniscus shape [10,11] and the biphasic response of cartilage to loading [12] . Models have also been developed to investigate stresses in the patellofemoral joint [13] , which is beyond the scope of the present review.
The computational investigation of the contact mechanics of the tibiofemoral joint is particularly challenging because there are multiple contacts between tissues and complex articulating surfaces. Validation of knee models is therefore non-trivial, and despite the large body of work, there has so far been only limited progress in translating the findings and tools of modelling research into clinical practice. This may become a more common aim as modelling technology progresses. Because of the complexities in terms of the structures, material representations and forces that can be included in knee models, many studies currently aim for increased understanding of the knee's mechanical behaviour, particularly in the context of disease scenarios where interventions are becoming increasingly common. These investigations may occur on highly subject-specific models or more generic representations of the knee.
Kazemi et al. [6] wrote an extensive review on advances in computational mechanics of the human knee in 2013, and a more general review on knee biomechanics was provided by Madeti et al. [14] in 2015, but significant further work has since been produced, especially in the area of model validation. The purpose of the present review is to provide an overview of the main processes and current challenges in knee modelling, and then to focus on examining validation strategies and the circumstances in which validation can be omitted. Three particular categories of study are identified: those representing trends and sensitivities, often using generic models; and two types of more subject-specific models, representing in vitro specimens and representing in vivo subjects. This review is not intended to provide a comprehensive list of papers utilising FE models to investigate the knee, but to focus on the key challenges and the state of the art for validation when these different study types are utilised. This includes highlighting the importance of model reuse, verification, calibration and context of use, as well as discussing good practices and potential areas for future development.

Processes and challenges for knee models
The knee is a highly complex physical system, and comprehensive models remain elusive due to the sparsity of precise data on knee tissue properties and the limited understanding of the interactions between them, as well as of how these factors vary among different subjects. Therefore, models of the knee may be generated by considering a subset of the system, pragmatically chosen based on a focused question that the model is designed to address. The focus of studies modelling the tibiofemoral joint in particular is often on the behaviour of the cartilage, meniscus or ligaments ( Fig. 1 ).
In addition to validation, which will be covered in detail in Section 3 , there are several key challenges to address in order to develop computational models of the tibiofemoral joint. These include:

(1) The capture and representation of appropriate geometry and material properties.
(2) The representation of appropriate motions, loads and constraints.
(3) The establishment of relevant outputs and their levels of uncertainty.

Fig. 1. Diagram of some of the typical components commonly featured in computational models of the tibiofemoral joint.

Geometry
The level of detail required for the representation of the structures in the knee depends on the particular application of the model. For example, in a study focused primarily on cartilage response, it may be appropriate to use basic spring elements to represent ligaments, or even to omit them completely for computational efficiency, in which case their effect on the primary tissue of interest should be taken into account using kinematic constraints [7] . Thus there exists variation among models in the manner of implementation and the detail of representation of different structures in the knee. There are several options for including constituent tissues (e.g. ligaments, meniscus) in knee models:
• The tissues can be explicitly modelled, with output measures taken from them (detail driven by the precision/accuracy required of those outputs).
• The tissues can be explicitly modelled, but perform a supporting role in the model (detail driven by the precision/accuracy of their effect on the outputs of interest).
• The effect of the tissues can be wrapped into boundary conditions and loads, with no geometry included (applies primarily to ligaments).
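Where ligaments are reduced to spring elements, a common choice in the literature is a tension-only force-strain law with a non-linear toe region followed by a linear region. The sketch below illustrates one such piecewise formulation; the stiffness and linear-limit strain values are illustrative placeholders, not properties taken from any specific study.

```python
def ligament_force(strain, k=1000.0, eps_l=0.03):
    """Tension-only piecewise ligament force (illustrative values).

    strain : current strain of the ligament bundle
    k      : linear-region stiffness (N per unit strain)
    eps_l  : strain marking the transition out of the non-linear toe region
    """
    if strain <= 0.0:
        return 0.0                               # no compressive load carried
    if strain <= 2.0 * eps_l:
        return 0.25 * k * strain ** 2 / eps_l    # quadratic toe region
    return k * (strain - eps_l)                  # linear region
```

The two branches meet continuously at a strain of 2·eps_l, which avoids force discontinuities during a quasi-static solve.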
For tissues that are included with explicit geometric representation, geometry is usually incorporated by utilising medical imaging (CT or MRI) of cadaveric specimens, volunteers or patients [15][16][17] . Dependent on image resolution, this can provide an approximation of the native joint geometry, although it does not provide a true representation since there will be errors due to image resolution, imaging artefacts and simplifications inherent in the segmentation process [18] , as well as smoothing applied to specimen-specific models to ensure robustness of contact algorithm solutions [19] . In some cases, multiple users may perform segmentation to minimise variability [15] , with variation between paired images required to be below a specified threshold [20] .
Pena and colleagues provided some earlier instances of models using CT and MR imaging to include all of the main structures of the knee, producing models featuring cartilage layers, menisci and ligaments, as well as rigid bone representations [21,22] . These models were used to analyse the effects of meniscectomy [22] , and later to investigate the combined role of menisci and ligaments in load transmission and knee joint stability [21] . Although only basic validation was provided, in terms of comparisons to the kinematics and stresses reported by other studies using different subjects, and an idealised model was necessary to test mesh convergence, these studies nevertheless demonstrated the potential for subject-specific FE models to predict the complex stress and strain patterns and kinematics occurring in knee joints.
As an alternative to segmentation-based models, the geometry for knee models can also be described mathematically [23][24][25] , reducing computation and analysis time [26][27][28] . It has been shown recently that trends predicted by idealised parametric models of joints based on mathematical geometric descriptors can be similar to those seen in models based on image segmentation, in both the knee and hip joints [10,29,30] . Thus simplified models can provide reliable qualitative predictions of expected trends, with particular potential to identify the aspects on which to focus in more sophisticated models. However, quantitative data from idealised models may not match well with experimental predictions [10,31] . When a fully parametric geometric approach is taken, levels of geometric complexity can be added depending on the intended application. For example, the menisci play an important role in knee stability and they are essential for both load transmission and joint lubrication [32] . Meniscus injury is associated with osteoarthritis [33] , and meniscal pathology provides a key biomarker for osteoarthritis progression. Modelling the behaviour of the menisci can therefore potentially provide new insight into disease progression and a parametric approach can be taken to investigate the effects of different meniscal geometries [11] . On the other hand, a study focused on forces occurring in osteochondral grafts during cartilage-on-cartilage contact may include a more basic meniscal representation [8] .

Material properties
In terms of material properties, a large amount is known about the internal structure and properties of the bone, articular cartilage, meniscus, and ligaments within the knee. Equally, there exist theoretical models and numerical techniques that include collagen fibre alignment, hyperelastic behaviour, fluid contribution and multi-layered aspects of the knee as a mechanical system [12,15,34,35] . However, obtaining sufficiently detailed experimental data to calculate the many property values required for such sophisticated representations is very challenging. Thus the material property models within computational knee simulations are commonly set up based on a sensible choice of physics informed by literature and not explicitly validated. There is great variability in the properties used for each tissue, which may be taken from existing literature or be derived experimentally. Imaging can also be used to obtain subject-specific material properties; for example location-specific bone density is commonly derived using CT imaging [36][37][38] , and one study [39] used sodium MRI to determine fixed charge density distributions in the tibial cartilage, although this technology is limited by imaging resolution.
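The CT-based derivation of location-specific bone properties mentioned above is typically implemented as a linear Hounsfield-unit-to-density calibration followed by an empirical power-law density-modulus relationship. The sketch below assumes illustrative coefficients; published calibrations differ between scanners, anatomical sites and studies, so real values must come from a phantom calibration and the relevant literature.

```python
import numpy as np

def hu_to_modulus(hu, rho_a=0.0, rho_b=0.0008, a=6850.0, b=1.49):
    """Map CT Hounsfield units to an elastic modulus for bone elements.

    Linear HU -> apparent-density calibration (rho_a, rho_b) followed by
    a power law E = a * rho**b; all coefficients here are illustrative.
    """
    rho = rho_a + rho_b * np.asarray(hu, dtype=float)  # apparent density, g/cm^3
    rho = np.clip(rho, 0.0, None)                      # disallow negative density
    return a * rho ** b                                # modulus in MPa
```

In practice this mapping is evaluated element-by-element over the mesh so that each bone element receives its own modulus.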
Bone is often assumed to be rigid in knee models for comparative studies where loading effects on cartilaginous soft tissues or ligaments are of particular interest [10,16] . A more complex bone representation may be important for making subject-specific predictions of regions at risk of joint failure for particular specimens, as bone stiffness can affect tibial cartilage stresses [36] .
For representing cartilage, a linear elastic material model is commonly used, due to the equivalence between short-time biphasic and incompressible elastic material responses demonstrated by Ateshian et al. [40] . Depth-dependent material properties, inhomogeneity of the cartilage and the biphasic response may also be relevant if the intended application is to better understand longer-term cartilage mechanics [12,34,41] . Osteochondral defects, consisting of damage to both articular cartilage and the underlying subchondral bone, present a large clinical burden by altering the local biomechanics and biotribology of the knee joint and causing joint pain [2] . Whilst imaging approaches can be used to assess cartilage deformation [42] , such methods do not provide an effective means to analyse the contact force distribution on the articulating surfaces. Inclusion of detailed cartilage layers within FE models of the knee is therefore crucial for progressing understanding of their degeneration. This may include multiscale modelling of cartilage [43][44][45] . Freutel et al. [46] discussed the challenges of material models for soft tissues in greater detail, so this will not be covered further here.

Inclusion of motions and loads
This section is concerned with the challenge of ensuring knee models produce motions corresponding to an experimental situation of interest. This requires an understanding of the moving parts within the knee and how they react to different motions modifying the mechanical environment. Both computationally and experimentally, motions may be applied directly or they may result from applied loads when translational or rotational freedoms are applied to the femur or tibia. The application of only loads without any constraints on motion allows the model the most freedom but could result in physiologically inaccurate movements. Different modelling studies therefore approximate knee function in different ways. Loads may be controlled and adjusted for sensitivity purposes [11] , or specific loads and boundary conditions may be chosen to ensure motions replicate experiments [47] .
Specific in vivo motions can be difficult to derive; Stentz et al. [48] reported using CT bone models combined with a dynamic radiostereometric analysis system to achieve non-invasive measurements of joint kinematics. This approach to measuring knee movement was validated against a gold standard skin marker method, and has potential clinical applications such as the assessment of prosthesis migration. Deriving in vivo knee forces for use in quasi-static FE models may also be achieved using multibody models [49,50] . Fully dynamic FE models may be necessary if the effects of inertia are thought to be crucial, for example in a study of knee replacements [49] .
Ligaments play a large role in both knee kinematics and biomechanical load bearing [51] , and their injury is associated with increased pain, osteoarthritis and knee joint instability [33,52] . Ligaments are therefore a key factor influencing the relationship between applied loads (or motions) and resulting motions (or loads) in the knee. One of the principal clinical drivers for the detailed modelling of ligaments in the knee is the analysis of their repair after injury, particularly for the anterior and posterior cruciate ligaments (ACL and PCL), and FE models are increasingly being utilised to investigate ligament rupture and reconstruction [53][54][55] . Ali et al. [16] demonstrated that ACL resection can produce altered knee mechanics and motion by testing cadaveric knee specimens in an electro-hydraulic knee simulator with motor-actuated quadriceps and loads applied at the hip and ankle, in each case first with the ligament intact and then resected. FE models developed to simulate these scenarios revealed that changes resulting from ACL resection can manifest differently among different specimens; one specimen exhibited altered anterior tibial translation, whilst the other exhibited elevated joint loads. Since individual differences were exhibited most clearly when calibrated ligament properties were used, this suggests a subject-specific ligament modelling approach would be beneficial for a larger study. This is supported by the findings of Beidokhti et al. [15] , who found that including subject-specifically derived ligament properties in continuum-modelled ligaments improved predictions of experimental kinematics and contact pressures. Earlier models [19] also found that tuning ligament properties so that model kinematics matched those found in a cadaveric specimen aided the validation of model-derived joint contact forces.
More recently, it has been suggested that additional peripheral soft tissues including knee capsules as well as ligaments may alter predicted knee mechanics [56] .
Another major challenge related to modelling knee motion is understanding the role of the meniscus and analysing meniscal movement in response to loading. This may be crucial for modelling the potential for damage progression in the knee. Meniscal translation has previously been captured using MR imaging [57] , and Halonen et al. [47] created a subject-specific FE model using MR images of a volunteer's knee specifically to investigate meniscus movements and cartilage strains. One particular issue with accurately modelling the meniscus is incorporating meniscal attachments. This aspect can be difficult to accurately capture within models due to the challenge of achieving precise segmentation of attachment site geometry and establishing material models for their behaviour. The attachment sites can be particularly difficult to identify in imaging, especially without prior ligament removal if using a cadaveric specimen, and may be virtually impossible to distinguish from other soft tissues if using CT imaging. Thus generic spring elements with estimated properties and locations have been used to represent meniscal attachments, which along with friction serve to limit meniscal movement. Freutel et al. [58] segmented medial meniscus geometry from MR images of porcine knee joints, with meniscal displacements having previously been determined experimentally. From this data, optimisation was used to determine subject-specific material properties of the meniscus and its attachments, allowing the time-dependent behaviour of the meniscus and its attachments to be investigated.
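The optimisation-based identification used by Freutel et al. can be illustrated in miniature: candidate attachment stiffnesses are evaluated against experimentally measured displacements, and the best-fitting value is retained. This is a hypothetical sketch; the linear surrogate u = F/k stands in for the full finite element solve that a real inverse analysis would run at each candidate parameter set.

```python
import numpy as np

def calibrate_stiffness(loads, measured_disp, k_grid):
    """Select the attachment stiffness whose surrogate model best matches
    experimentally measured displacements (least-squares over a grid).

    loads         : applied loads at which displacements were measured
    measured_disp : corresponding experimental displacements
    k_grid        : candidate stiffness values to evaluate
    """
    loads = np.asarray(loads, dtype=float)
    measured = np.asarray(measured_disp, dtype=float)
    # Sum-of-squares error of the surrogate u = F / k for each candidate k
    errors = [np.sum((loads / k - measured) ** 2) for k in k_grid]
    return k_grid[int(np.argmin(errors))]
```

A gradient-based optimiser would normally replace the grid search, but the structure (re-evaluate the model, compare with experiment, update the parameter) is the same.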

Input precision and establishing relevant outputs
There are many different motivations for developing computational models of the knee, producing many distinctive approaches to doing so. It is therefore important to consider the relevance of model outputs assessed to the original aim of the study. In particular, when the ultimate aim is to use the models to predict disease risk or assess the suitability of treatments in vivo, it is necessary to consider the clinical relevance of outputs reported [59] . This may include outputs to indicate risk of damage progression or to assess intervention suitability. For example, meniscus movement might be a crucial metric in a study of the progression of meniscal tears [58] , whilst in a study of femoral osteochondral grafts it may be pertinent to analyse tibial cartilage contact patterns to understand the effects of graft recession or extrusion [8,60] .
In the case of FE models used to examine mechanical response in the knee, model outputs can be highly sensitive to the chosen representation and condition of included tissues such as the menisci and cartilage. Ambiguity in input values can result in a wide range of reported values for outputs of interest. One study [61] found that output uncertainty can be reduced when specimen-specific data for certain input parameters are known, including joint geometry such as meniscal insertion site positions, kinematics, and BMI to inform loading. Thus some uncertainty can be reduced for specimen-specific models, although many inputs may be difficult or impossible to obtain clinically. Furthermore, in certain situations some parameters may be impossible to control experimentally, in which case they may be used as tuning parameters for each knee-specific model. For example, an earlier study by the same group [62] used the varus-valgus angle as a tuning parameter and found similar regions of contact stress between models and experimental work (in this case quantified by normalised cross-correlation values between 69% and 85%).
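Normalised cross-correlation of the kind used to quantify the similarity of contact stress regions can be computed directly from two pressure maps. A minimal zero-mean implementation is sketched below; the cited study's exact formulation may differ, so this is an assumption about the general technique rather than a reproduction of their code.

```python
import numpy as np

def normalised_cross_correlation(a, b):
    """Normalised cross-correlation between two contact-stress maps.

    Returns a scalar in [-1, 1]; 1 means the two (zero-mean) pressure
    patterns are identical up to a positive scale factor.
    """
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    a = a - a.mean()  # remove mean so only the spatial pattern is compared
    b = b - b.mean()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Applied to model-predicted and experimentally measured pressure maps sampled on the same grid, this yields a single similarity score that can be reported alongside peak-pressure comparisons.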
Even when the uncertainty for output measures is minimised, it remains crucial to establish exactly which outputs are of interest for the particular focus of the model, so that they can be used to predict intervention response or disease progression. The Osteoarthritis Initiative (OAI) [63] provides a database on the natural history of osteoarthritis by making publicly available clinical evaluation data and imaging (X-ray and MRI) for nearly 5000 subjects. One study [7] considering subjects from the OAI database defined outputs specific to disease progression by splitting subjects into two groups based on osteoarthritis risk and BMI. FE models of one representative subject from each group were generated, and collagen fibril damage was defined to occur when tensile stresses exceeded a threshold limit during gait loading, with control of degeneration based on the duration of loading in different regions over successive iterations. In this way an algorithm was presented to predict knee cartilage degeneration based on accumulated excessive stresses in the medial tibiofemoral compartment. Approaches like this may become more common as modelling complexity increases, allowing outputs relevant to specific scenarios to be analysed.
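The threshold-based degeneration logic described above can be caricatured in a few lines: elements whose tensile stress exceeds a threshold have their modulus reduced on each iteration. This is a deliberately simplified sketch; in the actual algorithm the FE model is re-solved after each update so that stresses redistribute, which is crudely approximated here by scaling the stored stresses.

```python
import numpy as np

def degrade(moduli, tensile_stress, threshold, factor=0.9, iterations=5):
    """Iteratively soften elements whose tensile stress exceeds a threshold.

    moduli / tensile_stress : per-element arrays
    threshold               : stress above which degeneration accumulates
    factor                  : per-iteration modulus reduction (illustrative)
    """
    E = np.asarray(moduli, dtype=float).copy()
    s = np.asarray(tensile_stress, dtype=float).copy()
    for _ in range(iterations):
        over = s > threshold
        if not over.any():
            break                # no element exceeds the damage threshold
        E[over] *= factor        # soften overstressed elements
        s[over] *= factor        # stand-in for re-solving and redistributing stress
    return E
```

Elements that never exceed the threshold keep their original modulus, so degeneration localises to the overloaded regions, mirroring the accumulated-stress idea in the cited approach.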

Model sharing
As the prominence of computational modelling in biomechanics increases, uncertainty about modelling results can be reduced through greater sharing of models and data [64,65] . It can be challenging to fully describe the methods used in FE models of the knee within research papers, but this is important for understanding simplifications that may affect the results. Sharing the models and data of sensitivity studies in particular can help clarify the effects of different levels of input precision on likely model outcomes. Sharing scripts and protocols in addition to data can help mitigate potential issues with software and version compatibility and ensure repeatability. In addition to the OAI project [63] mentioned previously, other researchers are beginning to make knee models freely available online [16,66,67] . This can have significant impact; models made available through the Open Knee Project [66] have supported many new publications, including [10,11,12,68] . Sharing models also provides the means for improved understanding of which aspects of a model were validated. Transparently reporting numerical quantification of validation evidence is also essential, because it is plausible that a given set of experimental and computational results would be described as similar by one group, while another would conclude that the model has failed to precisely replicate the experiment. Knee model validation is the focus of the next section.

Validation, verification and calibration
Having considered some of the ways in which methodological challenges in knee modelling are being addressed, the challenge of providing validation for computational models of the knee is now examined. To validate a model is to provide evidence that model generated results correspond to the outcomes of the real world scenario simulated [69] . Several guidelines exist with considerations for reporting FE validation studies in biomechanics and including sufficient detail for repeatability [70][71][72] . In particular, Pathmanathan et al. [73] recently proposed a framework for the applicability analysis of validation evidence in computational models for biomedical applications, and this provides a resource for evaluating validation quality. The framework recommends the systematic assessment of the relevance of the validation evidence for the proposed context of use, which encompasses the purpose of the model and what factors its results are used to inform.
For the purposes of this paper, direct validation refers to situations in which a comparison between a model prediction and an experimental test result is made after the model has been developed to match the corresponding experiment as closely as possible [69] . Indirect validation is used when the model prediction is compared to a physical case where it is not known whether the conditions are the same. Confidence can also be built by performing several related validation checks, for example by comparing:
• Several different outputs (e.g. displacement, stress)
• Under a variety of conditions (e.g. loading cases, restraint cases)
• Against data sets from a variety of sources (e.g. multiple specimens)

Table 1. Summary of the main types of model used in finite element studies of the knee. Here subject-specificity is used to refer to the geometric representation employed in the model, but calibration of material properties may mean that the materials could also be described as subject-specific.

Alongside validation, verification that the model is solved correctly is also required, and simplifications to specimen-specific geometry may be needed to achieve model convergence. Contact algorithms in finite element codes may not be sufficiently robust to handle meshes on complex specimen-specific geometries such as those found in the knee, with small changes to geometry resulting in models that do not converge. Through the FEBio project [74] , open-source code has been specifically designed for such biomechanical applications and is addressing some of these challenges. One aspect of verification for which the modeller is responsible is demonstrating the suitability of the chosen mesh. Hexahedral elements are generally preferred for modelling contact, but present a particular challenge when meshing complex geometries such as the femoral cartilage and the menisci. Quadratic tetrahedral elements provide a possible alternative to alleviate this issue; they are more straightforward to implement and have previously been seen to perform well in models of foot biomechanics [75] and articular contact in the hip [76,77] . Recently, some authors have also used quadratic tetrahedral elements in modelling knee joints [15] .
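Demonstrating mesh suitability usually takes the form of a refinement study: the model is re-solved on successively finer meshes until the output of interest stops changing. A minimal convergence check might look like the following; the 5% relative tolerance is an illustrative choice, not a standard.

```python
def is_converged(outputs, tol=0.05):
    """Check a mesh-refinement series for convergence.

    outputs : output of interest (e.g. peak contact pressure) at
              successively finer meshes, coarsest first
    tol     : maximum acceptable relative change between the two
              finest meshes (illustrative value)
    """
    if len(outputs) < 2:
        return False             # need at least one refinement step
    prev, last = outputs[-2], outputs[-1]
    return abs(last - prev) / abs(prev) < tol
```

A full study would track more than one output, since contact pressures typically converge more slowly than displacements or reaction forces.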
Computational models can be developed contemporaneously with, and validated against, in vitro or in vivo experiments. This may require some calibration of model parameters so that model results align with experimental results [78] . Calibration involves tuning input parameters based on model results; if the tuned parameters are not specific to each specimen, this generally means minimising the model-experiment error across a set of specimens. A gold standard for validation is thus to test whether model results continue to correspond well with experimental data when independent specimens are tested. However, because several factors affect model outputs, it is important to avoid erroneously concluding that a model is validated on the basis of its calibration. At the study design phase, researchers should carefully consider what their models aim to elucidate and plan validation steps accordingly. Several different combinations of model parameters may lead to similar results, and consequently it may be possible to erroneously 'validate' a model by chance. Parameters that initially appear unimportant could cause crucial differences in model output following the addition of further parameters. For example, in a model of knee contact mechanics, calibrating the meniscus properties may produce cartilage contact pressures that align well with experimental data, but the meniscus properties themselves may nevertheless be incorrect. These incorrect properties, coupled with incorrect properties of the meniscal attachments and of the coefficient of friction between the meniscus and cartilage layers, could lead to inaccurate conclusions about meniscal pathology even if cartilage pressures were observed to be correct. Experimental data also have limitations (for example in the resolution and accuracy of sensors and in the detection of environmental noise), so researchers should take into consideration that model outputs may need to be compared to suboptimal experimental data.
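The distinction between calibration and validation drawn above can be made concrete with a toy workflow: a parameter is tuned on one subset of specimens, and the remaining, independent specimens are used only to quantify predictive error. The surrogate model here (output proportional to input) is a hypothetical stand-in for a full FE solve.

```python
def calibrate_and_validate(specimens, candidate_params, n_cal):
    """Tune a single model parameter on a calibration subset, then report
    error on the held-out specimens as validation evidence.

    specimens        : list of (input, measured_output) pairs
    candidate_params : parameter values to try during calibration
    n_cal            : number of specimens reserved for calibration
    """
    cal, val = specimens[:n_cal], specimens[n_cal:]

    def sse(p, data):
        # Sum-of-squares error of the surrogate model output = p * input
        return sum((p * x - y) ** 2 for x, y in data)

    best = min(candidate_params, key=lambda p: sse(p, cal))
    return best, sse(best, val)   # held-out error, not calibration error
```

Reporting the held-out error rather than the calibration error is what distinguishes a validation claim from a goodness-of-fit claim.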
There is substantial literature on sensitivity testing in knee models [10,61] and researchers should be encouraged to report these findings to provide the community with an improved basis for output interpretation and understanding of the circumstances in which conclusions remain valid.

Approaches to validation in different study designs
Validation is often more challenging when models are developed to represent in vivo subjects. On the other hand, generic models aiming for broad conclusions may not require detailed validation strategies. Calibration and validation strategies used in different knee modelling studies therefore vary according to the study purpose and the types of model utilised; this is summarised in Table 1 .
Further discussion of each of these identified approaches is provided in subsequent parts of this section. Some of the key studies discussed are outlined in Table 2 to demonstrate examples of each study type from the literature.

Models representing trends and sensitivities
Computational models can be used for investigating features of the knee as a complex physical system, with the aim of evaluating the sensitivity of general outcomes to input parameters. In these scenarios, models may incorporate generic or previously measured inputs, which have inherent uncertainty associated with them, into an assumed physics framework. Generic representations of the knee are particularly well suited to parametric testing to demonstrate trends and highlight which uncertainties are most critical. In this case the conclusions are population-based rather than specimen- or group-specific. The Open Knee model [66] is commonly used for investigating generic trends in knee mechanics. For example, its geometry has been used to investigate the time-dependent behaviour of cartilage and fluid pressure at the cartilage-meniscus interface [12] , and to investigate the effects of meniscal tears and full meniscectomy [11] . In studies like these, there is generally no direct validation included, due to the lack of an experimental counterpart for the developed models, but the results are often compared with literature findings. In one study [12] results were compared with other published models and experimental predictions of outputs, including contact areas and femoral displacements under static loads, to provide confidence in the modelling approach. In another study [11] the authors explicitly stated that physical validation could not be included, as an abnormally flat meniscus geometry was used to investigate meniscal extrusion in the presence of meniscal tears. Their findings are likely relevant to certain real cases with similar morphological characteristics, and would be more difficult to achieve using a complex specimen-specific model, where convergence issues may arise due to more complex articular geometry.
Generic or specimen-specific models can also be developed alongside experimental tests to generate additional information regarding internal stresses or strains, or to provide sensitivity data allowing for fewer test runs. In these cases the response of the tibiofemoral joint to different loading scenarios can be investigated without requiring detailed direct validation to support the study conclusions. For example, one study [60] used a model based on a single subject to parametrically investigate different alignments, geometries and properties for osteochondral grafts. This study mentioned basic validation in terms of comparisons to experimental studies of different cadaveric specimens, and whilst this means the results could not provide subject-specific information on approaches to alleviate chondral lesions, the data do provide guidance on the types of specimen on which to focus in future studies using more complex models. Thus the specimen-specific geometry here was largely incidental, and the study findings suggest generic trends, such as indicating that proud placement of grafts in particular increases the stress they experience. These trends can be used to inform future studies and provide useful sensitivity data.
When highly complex material representations are used in knee models, particular outputs from specimen-specific cases may be impossible to directly validate. Gu et al. [82] generated a model of a single volunteer to investigate the effects of collagen fibres on fluid pressurisation in cartilage; their findings were not directly validated, since collagen fibre directions were assumed to follow a generic split-line pattern and zonal differences were not considered. Similarly, Mononen et al. [41] used a poro-viscoelastic model based on a single subject to examine the importance of collagen fibril organisation for the optimal function of articular cartilage, which again could not be directly validated. Although subject-specific geometries from in vivo subjects were utilised for these studies, they were not seeking to make clinical recommendations, but rather to contribute to the body of evidence on the factors which are important in cartilage behaviour. Thus models from studies such as these fit into the category of models used to better understand the trends and sensitivities in the knee as a mechanical system.

Table 2
Applications of a selection of recent finite element modelling studies of the human tibiofemoral joint, arranged into the three identified study types to highlight the validation strategies used. Model parameters such as material properties are commonly taken from literature, so instances of calibration are also highlighted.

Models representing trends and sensitivities
• Meng et al. [10] — Application: compare image-based and polynomial-based knee geometry using the Open Knee model. Subjects: Open Knee Model. Calibrated parameters: none, but properties consistent between models. Aspects validated: no physical validation; a previous study compared biphasic model contact areas with literature (Meng et al. [12,10]).
• Łuczkiewicz et al. [11] — Application: investigate effects of meniscal tears and meniscectomy using the Open Knee Model. Subjects: Open Knee Model. Calibrated parameters: none, but properties consistent between models. Aspects validated: no physical validation.

Models representing in vitro subjects
• Ali et al. [16] — Application: evaluate patellofemoral and tibiofemoral mechanics in knees with and without ACL resection. Subjects: cadaveric knees. Calibrated parameters: soft tissue properties and attachment locations via laxity tests. Aspects validated: kinematics over gait cycle; peak quadriceps forces during stance and swing phase.
• Beidokhti et al. [15] — Application: assess ligament modelling strategy; non-linear springs or transversely isotropic continuum models. Subjects: cadaveric knees, MRI and CT. Calibrated parameters: ligament properties from both literature and optimised based on laxity tests. Aspects validated: torques; translational and rotational kinematic response; contact area and peak contact pressure under axial loading.
• Guo et al. [61] — Application: quantify reduction in output uncertainty when using clinically measurable input variables. Subjects: cadaveric knees, MRI and CT. Calibrated parameters: varus-valgus angle, uncontrolled in the experiment but tuned for each model within physical ranges. Aspects validated: regions of contact stress (Guo et al. [62]); loads through medial and lateral cartilage.
• Mootanah et al. [19] — Application: predict contact forces and pressures for different degrees of malalignment. Aspects validated: peak pressure and force in medial and lateral compartments.

Models representing in vivo subjects
• Räsänen et al. [17] — Application: investigate effects of fixed charge density on cartilage response during gait. Subjects: volunteer, MRI. Calibrated parameters: fixed charge density content of tibial cartilage determined with Na-MRI.
• Mononen et al. [7] — Application: predict cartilage degeneration. Subjects: patients, MRI from OAI database. Calibrated parameters: threshold limit to determine degeneration initiation tested at different levels. Aspects validated: predictions compared with Kellgren-Lawrence grades from X-ray data of OAI database sub-groups.
• Halonen et al. [47] — Application: evaluate meniscus movements during standing with CT contrast media and study collagen fibril effects. Subjects: volunteer, MRI and CT. Calibrated parameters: material properties derived from bovine tissue. Aspects validated: meniscal motion patterns and mean strains.
• Kazemi and Li [35] — Application: investigate creep and stress relaxation of the knee joint in full extension. Subjects: volunteer, MRI. Calibrated parameters: cartilage properties fitted to bovine explant data. Aspects validated: indirect comparison with published experimental data.

Balancing output relevance and validation in specimen-specific models
Whilst generic models are well suited to parametric testing to establish trends, the development of models with greater subject-specific detail can provide biomechanical insights allowing for subject-specific predictions. This is important since variations in anatomy and tissue properties between patients may lead to differences in treatment outcomes. Furthermore, specimen-specific models provide the means to set up one-to-one matches with experiments to allow for direct validation. Geometrically specimen-specific models may also include generic or previously measured inputs, with the associated uncertainty this brings, but incorporate some subject-specific inputs relevant to the scenario they are used to examine. Such complex, specimen-specific knee models are becoming increasingly prevalent [15,16], but remain in their infancy, warranting a more in-depth discussion regarding their validation and the extraction of useful information from their outputs.

Fig. 2. There are multiple possible sources of error when comparing computational models of the knee against experimental tests in order to assess their capability to predict real-world scenarios. Potential areas for alignment or discrepancy are displayed within dashed boxes. The double-lined arrows indicate that there must be a trade-off between validating computational models against experimental data with controlled conditions and replicating uncertain real-world data as closely as possible.
Whether simulating in vitro or in vivo subjects, a stronger relationship between a model and an experiment can often only be achieved at the expense of the relevance of the experimental data to the real world scenario, such as the progression of osteoarthritis under different mechanical conditions. Here there must be a trade-off; experiments can be closely aligned with a model and potentially lose certainty about the relevance of motions and loads to in vivo scenarios, or the model could be compared directly to an in vivo scenario, potentially losing the capacity for calibration of specimen-specific properties ( Fig. 2 ). In both cases, the outputs that can be practically compared in validation tests may be proxy measures for the true output(s) of interest. For example, elevated contact pressures may be used to predict regions with increased risk of osteoarthritis in knee models, as this can be measured both experimentally and computationally, but a direct observation of cartilage quality might be preferred in a cadaveric joint. In the case of models based on a cadaveric specimen, it is possible to obtain specimen-specific measures of some material properties (for example ligament parameters can be derived through laxity testing [15] ), and experiments can be set up in controlled conditions that can be replicated in FE models using matching boundary conditions [15,16] . However it is difficult to be certain that applied loads match the conditions that the specimen experienced in vivo, and removal of supporting tissues such as knee ligaments during experimental set up may alter constraints experienced by cartilaginous soft tissues of interest. In the case of a model based on an in vivo subject, it is possible to set up in vivo experiments to capture cartilage deformation under in vivo conditions, for example by using an MR imaging compatible loading device [83] . 
However, non-invasively deriving specimen-specific material properties to include in the computational model may not be possible, so literature-based material properties are often required [39] .
Regardless of the validation approach taken, it is important to consider that there are always a number of different aspects which can influence how effectively experimental and computational data can be compared, as well as the relevance of the data to the real life scenario of interest. Potential sources of error are illustrated in Fig. 2 .
Whilst validation evidence for knee models should ideally be obtained from an experimental scenario as relevant as possible to its proposed context of use, it is not unusual to report an initial validation study and later use a modified version of the model for a different purpose [73]. In these cases, it is important to understand whether the validation evidence supports the trustworthiness of the model's predictions in the new context. The risks of uncertainty over model trustworthiness depend on the model's application and its decision consequence. In the future, for example, validation may be performed on software intended to provide medical advice, in which case the consequences of erroneous validation are likely to be severe. More typically in current knee modelling, validation is performed to provide credibility to simulation results as scientific evidence.

Models representing in vitro subjects
Subject-specific inputs for models representing in vitro subjects may be derived from imaging of the in vitro specimens or through calibration with experimental measures on these specimens. Calibration may be repeated for the model of each subject to yield a set of validated subject-specific models (models where the response is known to be correct for each subject), or calibrated average properties may be used with potential for testing across a population subset.
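The calibration step described above can be illustrated with a deliberately simple case: fitting a single linear spring stiffness to laxity force-displacement measurements by closed-form least squares. Published studies [15] calibrate nonlinear ligament laws with iterative optimisation; this sketch only shows the shape of the workflow, and all data values are invented.

```python
# Hypothetical anterior-drawer laxity data: displacement (mm) vs applied force (N)
laxity_disp_mm = [1.0, 2.1, 3.0, 4.2, 5.1]
laxity_force_n = [15.0, 33.0, 44.0, 64.0, 78.0]

def fit_linear_stiffness(disp, force):
    """Least-squares stiffness k minimising sum((f - k*d)^2), line through the origin."""
    return sum(f * d for f, d in zip(force, disp)) / sum(d * d for d in disp)

k = fit_linear_stiffness(laxity_disp_mm, laxity_force_n)
print(f"calibrated stiffness: {k:.1f} N/mm")

# The calibrated k would replace a literature value in that subject's FE model;
# the residual error indicates how well the assumed linear law fits this subject.
residuals = [f - k * d for f, d in zip(laxity_force_n, laxity_disp_mm)]
rms = (sum(r * r for r in residuals) / len(residuals)) ** 0.5
print(f"fit RMS error: {rms:.2f} N")
```

Repeating this fit per specimen yields subject-specific parameters, whereas averaging the fitted values across specimens yields the calibrated average properties mentioned above.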
Piezoresistive thin-film sensors are commonly used for measuring experimental contact mechanics in experiments using cadaveric specimens [19] . In one of the earlier demonstrations of this technique, it was shown that a transversely isotropic linearly elastic material model for meniscal tissue could provide good computational estimations of experimental measurements of contact pressure, with discrepancies between mean contact pressures in the region of 10-15% [84] . Fundamental validation studies like this can provide confidence when models are later used for more specific scenarios, in this case the investigation of meniscectomy under axial loading [85] . However, since physical experiments contain inherent uncertainty, an important part of performing effective validation of in vitro models is judging when to stop calibrating to exactly match laboratory test results. In particular, thin-film sensors have some limitations; measurements of contact pressure and area are subject to experimental errors in detection and signal processing, particularly on highly curved or uneven surfaces, and they cannot record shear components of stress [15,86] .
Furthermore, the insertion of sensors may lead to alteration of the surrounding tissue, interfering with the natural joint mechanics.
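One concrete way to frame the question of when to stop calibrating is to compare the model-experiment discrepancy against the sensor's own uncertainty. The sketch below computes a mean-pressure discrepancy and an RMSE over matched sample points, with an assumed 10% sensor error band; all numbers are illustrative, not taken from the cited studies.

```python
# Model-predicted and sensor-measured contact pressures (MPa) at matched
# locations on the tibial plateau (illustrative values only)
model_mpa  = [2.1, 3.4, 4.0, 2.8, 1.9, 3.1]
sensor_mpa = [2.4, 3.0, 4.5, 2.6, 2.2, 3.3]

mean_model = sum(model_mpa) / len(model_mpa)
mean_sensor = sum(sensor_mpa) / len(sensor_mpa)
pct_diff = 100.0 * abs(mean_model - mean_sensor) / mean_sensor

rmse = (sum((m - s) ** 2 for m, s in zip(model_mpa, sensor_mpa))
        / len(model_mpa)) ** 0.5

# Thin-film sensors carry their own error (assumed ~10% here), so tuning the
# model to agree more tightly than the sensor's uncertainty is not meaningful.
SENSOR_ERROR_PCT = 10.0
print(f"mean pressure discrepancy: {pct_diff:.1f}%  (RMSE {rmse:.2f} MPa)")
print("stop calibrating" if pct_diff < SENSOR_ERROR_PCT else "keep calibrating")
```

The choice of stopping band is itself a judgment call that should reflect the reported accuracy of the specific sensor system used.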
When describing subject-specific knee models as validated, it is important to be clear about the scope of the model. One study [8] describes a knee model as validated, but this was based primarily on literature comparisons rather than direct validation, and the model used in the study was a development of a much earlier knee model [87] with the mesh modified to refine the cartilage and menisci. Although the conclusions were within a scope that could be reasonably determined using more generic models (indicating that bone damage and cartilage splits can alter the magnitude and pattern of cartilage pressure and strain), to make more specific conclusions would require further supporting validation work. Contrastingly, another study, which assessed relationships between contact pressure and osteochondral defect size [38], was clear about the extent of validation achieved. Although a non-specimen-specific linearly elastic material representation was used for the cartilage, predictions of peak pressure in the lateral and medial condyles closely matched results from cadaveric experimental models for defect sizes up to 20 mm, with the exception of the smallest size tested (5 mm). Thus a range was provided for the predictive capabilities of the presented model, with poorer agreement for smaller defects possibly due to the mesh density. Freutel et al. [58] were similarly clear about the circumstances in which their model was valid. On the basis of low prediction errors for experimental meniscal displacements, the authors analysed stress distributions in modelled porcine menisci, and stress magnitudes were seen to correspond well with previous studies of the meniscus [22]. This level of validation suggests the methods could be utilised for further investigations, for example into the effects of meniscal tears, but in order to be applied to human specimens, corresponding experimental data with human tissue would be required.
Comparison of models against in vitro experiments can also be used as a method to illuminate the necessary complexity of a model in different testing scenarios. Beidokhti et al. [15] for example demonstrated that models featuring subject-specific material parameters for ligaments (derived from experimental tensile tests) produced kinematics more aligned with experimental data than those with literature-based properties. Literature-based spring models for ligaments produced high errors in contact pressure but acceptable kinematics, suggesting scenarios where this approach may be sufficient. As restricting knee movement is a major function of ligaments, validating the kinematic output was important for the application of these models. Precisely matching contact pressures for this scenario was arguably less crucial, but this aspect was nevertheless also reported in detail in the paper.
It is important to recall that experimental tests themselves may not precisely emulate the in vivo situation of interest, although demonstration that a model can replicate a controlled environment is generally the first step in assessing its potential to simulate more complex in vivo scenarios. Loads applied during the validation experiments in Beidokhti et al.'s study [15] were reduced due to structural limitations of the testing apparatus, and were not intended to represent in vivo quadriceps loads. They were however selected based on the intended application of the models, the analysis of ACL reconstructions. Technical issues can also reduce the relevance of an in vitro scenario to an in vivo situation; for one of the study's specimens, the collateral ligaments required excision in order to permit sensor insertion. To mitigate this issue, this condition was replicated in the corresponding model before contact pressure and area assessments.
In highly complex models of in vitro specimens, several calibration stages are often necessary to align computational and experimental results and achieve validation. One group initially performed calibration of patellofemoral mechanics [79] , followed by ligament laxity tests to allow calibration of tibiofemoral soft tissue material properties and attachment locations for models of cadaveric human knees [80] . Following this, subject-specific knee mechanics were modelled in silico for intact and ACL-deficient conditions under several loads to simulate experimentally modelled dynamic activity [16] . This complex study design with several stages resulted in models where validation evidence was provided under both healthy and pathological conditions through comparison of experiments and models. Validation data were generated in terms of both kinematics in the tibiofemoral and patellofemoral joints and in terms of forces experienced by the knee. Model predicted kinematics were seen to largely agree with experimental data in both trends and magnitudes, assessed using root mean squared differences, and as expected, the largest differences occurred at flexion angles greater than those for which ligament laxity calibration had been performed. Furthermore, quadriceps forces seen in the models were comparable to quadriceps forces seen following loading in the experimental simulator tests, although these forces actually changed negligibly following ACL resection, possibly missing potential effects of any adaptive behaviour that could manifest in vivo.
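Root mean squared difference of the kind used for the kinematic comparisons above is straightforward to compute. The sketch below applies it to two hypothetical flexion-angle traces sampled over a gait cycle; the values are invented, and real comparisons would cover several degrees of freedom in both joints.

```python
import math

# Model vs experimental tibiofemoral flexion angle (deg) at matched points
# of a gait cycle (hypothetical traces for illustration)
cycle_pct     = [0, 20, 40, 60, 80, 100]
flexion_exp   = [5.0, 18.0, 12.0, 30.0, 60.0, 8.0]
flexion_model = [6.2, 16.5, 13.1, 33.0, 55.8, 9.4]

def rmsd(a, b):
    """Root mean squared difference between two equally sampled traces."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

print(f"flexion RMSD over gait cycle: {rmsd(flexion_exp, flexion_model):.2f} deg")
```

A single RMSD value summarises magnitude agreement but not trend agreement, which is why such studies typically report both the RMSD and a visual or correlation-based comparison of the full traces.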
A key aspect of Ali et al.'s study [16] was the commendable use of validation under very different loading scenarios: knees with intact and resected ACLs. However, in this study the ligaments were modelled as spring elements for calibration purposes, and the meniscus was not included when these ligament calibrations were performed [16] . Considering that Beidokhti et al. [15] found a continuum model for ligaments more closely reproduced experimental results, a logical next step would be to use constitutive material models for the ligaments. Furthermore, it would be beneficial to include the meniscus when these calibrations are carried out, since the meniscus may provide additional stability, and damage from dissection to facilitate its removal could cause altered responses. Since multiple experiments were run on the same samples, it is also important to consider whether this may induce tissue damage in the specimens beyond that which is likely to occur in vivo. Further, it may be possible to consider any detrimental effects of bone fixation processes and the order in which the experiments were conducted. All these factors would however need to be offset against increased costs in terms of additional experimental and computational resources in studies which are already complex and challenging.

Models representing in vivo subjects
Developing subject-specific models of in vivo subjects, as opposed to in vitro cadaveric specimens, presents its own unique set of challenges. Obtaining data for calibration is likely to be much more challenging in an in vivo scenario, and similarly outputs that can be practically compared for validation are more limited. For example, there are both ethical and practical limitations to the direct measurement of joint contact forces in vivo, so it may be necessary to partially validate an in vivo model against experimental data of other specimens [81] . On the other hand, modelling in vivo subjects allows patient-specific multibody dynamics models to be developed based on in vivo movements to aid the application of potentially more clinically relevant joint forces to an FE model [88] .
One group approached the validation of an FE model based on an in vivo subject by using an MR compatible compression device to load the knees of an asymptomatic subject [39]. The cartilage deformation magnitudes were found to correspond well at equivalent loads, but there was a discrepancy in load distribution between the medial and lateral plateaus, which might be partially explained by free varus-valgus rotation allowed in the model. This validation was sufficient to support comparative studies into the effect of local variations of fixed charge density in the cartilage [17,39]. However, to derive conclusions specific to the knee mechanics of the particular subject used for the model, it would be beneficial to seek an improved match between the computational and experimental conditions. Another group also developed models based on MR images of healthy volunteers, initially observing the knee joint becoming stiffer as a result of elevated fluid pressure following meniscectomy [89]. Although there was no way to validate these findings, the group provided discussion of validation for follow-up models used to investigate load transfer from the cartilage to the meniscus [35]. In this study, indirect validation was discussed on two levels. Firstly, the material model used for cartilage was compared to stress relaxation and creep data for bovine cartilage explants [90]. Secondly, results from the overall knee model were compared to older experimental data [91]. Whilst these steps do not constitute direct validation, they provide a means to evaluate whether the model results seem sensible. The region of maximum contact pressure was observed in the medial compartment, consistent with previous data, although pressure values were relatively low. It is challenging to derive specimen-specific material properties in vivo, so bovine cartilage was used for material calibration, implying a general representation of cartilage was used.
Thus the discrepancy in pressure values may have been due to the non-specimen-specific nature of the employed material model, and potentially to the shape of the specific knee used in the study. Another study [47] also used material properties originally derived from bovine tissue in a knee model to assess cartilage strains and meniscal motions. This model was able to capture patterns of meniscal motion with only slight differences from those observed in vivo using CT with contrast media. In this case, the non-specific material properties along with other uncertainties meant the authors stated that they did not seek to calibrate the model for further validation against the experimental data. However, these comparisons again provide a means to assess whether the results may be sensible without the authors claiming to have developed a fully validated model.
In order to derive conclusions relevant to larger populations, development of models with subject-specific geometry will be required for greater numbers of in vivo subjects. This could allow predictions to be made on the efficacy of potential treatments for the knee in different patient subgroups. This will require the development of models that can be validated for multiple uses (such as under different loading scenarios) and confirmation that model development procedures are reproducible. It would also be beneficial to understand where the greatest variation lies in populations, in order to concentrate modelling efforts on particular knee subgroups. A recent study [20] used measurements of the menisci from both knees of subjects from the OAI database to generate active appearance models. Meniscal damage locations were consistently seen primarily in the posterior medial region, and meniscal thickness and tibial coverage were identified as risk factors for osteoarthritis progression, which may be an important consideration for future modelling studies. Mononen et al. [7] also used in vivo subjects from the OAI database, in this case seeking to validate FE models based on these subjects. The complexity of the model scenario, in which an algorithm was presented to predict cartilage degeneration, meant that direct validation would be very challenging and would need to occur over a prolonged time period, because a model would need to be developed to match the experimentally observed conditions seen as osteoarthritis progressed in a particular patient. Follow-up Kellgren-Lawrence grades on the OAI database subjects did however provide a basis for comparison, and correlations were derived between model data and clinical observations. Progression of collagen fibril degeneration in the models was seen to occur mostly in the initial stages of osteoarthritis, consistent with experimental data from obese subjects in the OAI database.
These comparisons provided confidence in the predictive capability of the models, but areas of uncertainty, such as the threshold limit for determining degeneration initiation, prevented more subject-specific conclusions. Future studies that employ a range of at-risk subjects and use subject-specific rather than representative loading data could be used to predict osteoarthritis onset in a range of knee subsets, suggesting in vivo models have strong potential for future clinical applications.
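Correlating model predictions against ordinal clinical scores such as Kellgren-Lawrence grades calls for a rank-based statistic. The sketch below implements Spearman's rank correlation from scratch (with tie handling) and applies it to hypothetical predicted-degeneration values and KL grades; none of the numbers are real OAI data.

```python
def ranks(values):
    """Average 1-based ranks, assigning tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values starting at i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical: model-predicted degenerated cartilage volume (%) vs follow-up
# Kellgren-Lawrence grade for six subjects (invented data for illustration)
predicted_degeneration = [2.0, 5.5, 1.0, 8.0, 12.0, 6.5]
kl_grade               = [1,   2,   0,   3,   4,    2]
print(f"Spearman rho: {spearman(predicted_degeneration, kl_grade):.2f}")
```

A rank correlation only establishes that predictions order subjects consistently with clinical grading; it says nothing about the absolute accuracy of the predicted degeneration magnitudes, which is one reason such comparisons fall short of direct validation.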

Summary
This paper has described some of the challenges in developing computational models of the tibiofemoral joint, describing the difficulties in ascertaining representative geometries, material properties and loading conditions, as well as challenges in deciding appropriate outputs to measure. Although the focus has been on the tibiofemoral joint, many of the concepts discussed are also relevant to the modelling of other joints. In particular, the sharing of computational models and data in biomechanics has the potential to move the field forward more rapidly.
Thorough validation strategies remain an important aspect for inclusion in knee modelling studies. As has been indicated by others [70][71][72][73] , a model can only be described as validated for the scenarios and outputs that have been tested against corresponding experimental data, and validated models can only provide strong conclusions when varied within the scope of their original purpose. However, to maximise the returns of developing models, further scenarios or specimens are generally investigated. Thus it is crucial to consider whether validation evidence is appropriate for the context of use of a given model. Equally, if every aspect of every model requires validation, there is little purpose in producing models of the knee joint at all. Eventually, unvalidated cases must be run in order to take advantage of the capability that computational models provide. In this case, researchers must decide what constitutes a reasonable step away from a validated case, where modelled outcomes remain trustworthy. In particular, it may be desirable to use models to bypass further experiments and predict in vivo knee biomechanics or kinematics. For example Ali et al. [16] analysed additional measurements beyond those directly validated, such as ligament AP shear forces with respect to the tibia. In future, it would be useful for study authors to include their rationale for why additional outputs are thought to be appropriate. Parametric tests, such as material changes from a validated baseline are also in a sense unvalidated, but initial validated cases provide a degree of confidence, for example in work by Räsänen et al. [17,39] .
In some cases detailed validation may not be required at all; this can be true when generic knee models are generated with the aim of investigating trends in the complex joint system without making more specific conclusions related to individual specimens or sub-populations. Computational modelling has been used to great effect to provide detailed information about the trends and uncertainties in the knee joint using generic and idealised models [10,26], although it is not yet clear in which circumstances unvalidated cases can be trusted, so it remains useful to compare the conclusions of these studies with contemporary literature findings.

Outlook
The long-term ambitious aim for the modelling community as a whole could be to continue to increase understanding of the knee joint using complex computational models. Checking surprising or counterintuitive model predictions against physical experiments remains a necessary task, but confidence in models can also be increased with better sharing of comprehensive information. This will require:
• Material property and imaging data for knee tissues including the meniscus and cartilage, including data on their variation across populations, e.g. with age, sex and presence of pathology.
• Precise subject-specific measurement techniques for calibration, with associated sensitivity data.
• Robust numerical methods capable of generating solutions for the irregular geometry and complex materials of the tibiofemoral joint.
• Methods for connecting information across institutions and projects, e.g. using approaches such as the Open Knee Model.
More immediately, future studies featuring knee models should focus validation efforts on gathering evidence relevant for their particular application of interest. As the complexity of modelling ability increases, these applications will become of greater clinical relevance. In order to draw clinically relevant conclusions, it is important not only to be confident that models are able to provide valid predictions of outputs, but also that these outputs are pertinent to the problem of interest. For example, when investigating the onset of osteoarthritis, it is reasonable to suggest elevated contact pressures could indicate a greater risk of symptomatic joint damage, and models can be used to highlight the scenarios in which this is most likely to occur [68] . However in order to progress to using modelling to understand the natural history of the disease, it will be necessary to validate model outputs that have more direct relevance to joint damage. In contrast to the modelling of joint replacement materials, where contact pressures are directly associated with material wear [88] , it is less obvious how elevated contact stresses in soft tissue are related to damage mechanisms that contribute to joint disease. Furthermore, since it can be challenging to report the full complexity of contact stress distributions from models, some studies report only peak contact stresses or contact area, and these metrics may be even further removed from the damage mechanisms of interest. On the other hand, such fundamental contact mechanics outputs can be useful for pre-clinical testing of potential therapies, in order to obtain an initial indication of whether an intervention can provide a more favourable mechanical environment in a degenerated knee joint. 
In the future, studies with greater numbers of subjects and loading scenarios could provide more insight into how manifestations of elevated pressure are affected by subject-specific factors, which has potential applications in stratifying future therapeutic interventions.
Understanding subsurface stress distributions may be critical for predicting specific osteoarthritic changes in the knee joint, such as the potential for cartilage fragmentation and delamination from the underlying subchondral bone [60] . Additionally, details of strain behaviour, such as deformation of the meniscus as it undergoes loading, could provide more elucidation on joint damage mechanisms [11,47] . Depending on patient-specific geometries and motions, these findings may better explain how joint degeneration is initiated. Another ambitious future direction would be to link investigations and evidence in this area to attempts to understand the process of cartilage damage through collagen network degeneration [7] . Generating such models for a large number of subjects with varied properties and geometries, and testing them in a range of loading scenarios, would elucidate how treatments may be tailored to stratified patient groups. Taken all together, this could result in an improved understanding of the progression of knee osteoarthritis.

Conclusion
The purpose of this paper was to provide an overview of the current challenges in computational modelling of the tibiofemoral joint and the role of validation in different study designs which utilise FE modelling of the knee. It is evident that the community is developing many valuable computational and validation tools to address the emerging biomechanical questions in the knee. In parallel with huge increases in computational power and improvements to imaging techniques, significant advances have been made in the FE modelling of subject-specific knee joints. For complex subject-specific models based on in vitro specimens, validation remains a crucial aspect, and particular focus should be given to validating outputs with specific relevance to a model's intended applications in order to maximise the utility of research time and the impact of results. Equally, however, the power of computational models lies in their ability to be used to investigate scenarios beyond those that can be experimentally examined. Generic knee models remain important for sensitivity testing and parametric analysis to understand population-wide trends, whilst subject-specific in vivo models can provide insight into internal mechanical behaviour within individual patients' knees, providing data which could aid stratification of future treatments for different patient groups. With the growing capacity for increasingly complex models, outputs from knee modelling studies are likely to have increasing clinical importance.

Declaration of Competing Interest
None.