Deep reinforcement learning coupled with musculoskeletal modelling for a better understanding of elderly falls

Reinforcement learning (RL) has been used to study human locomotion learning. One of the current challenges in healthcare is our understanding of and ability to slow the decline due to muscle ageing and its effect on human falls. The purpose of this study was to investigate reinforcement learning for human movement strategies when modifying muscle parameters to account for age-related changes. In particular, human falls with modified physiological factors were modelled and simulated to determine the effect of muscle descriptors for ageing on kinematic behaviour and muscle force control. A 3D musculoskeletal model (8 DoF and 22 muscles) of the human body was used. The deep deterministic policy gradient (DDPG) method was implemented. Different muscle descriptors for ageing were integrated, including changes in maximum isometric force, contraction velocity, the deactivation time constant and passive muscle strain. Additionally, the effects of isometric force reductions of 10, 20 and 30% were also considered independently. An environment for the simulation was developed using the opensim-rl package for Python with the training process completed on Google Compute Engine. The simulation outcomes for healthy young adult and elderly falls under modified muscle behaviours were compared to experimental observations for validation. The result of our elderly simulation for multiple ageing-related factors (M_all) produced a walking speed of 0.26 m/s for the two steps taken prior to the fall. The overactivation of the hip extensors and inactivation of the knee extensors led to a backward fall for this elderly simulation. Inactivation of the rectus femoris and right tibialis was the main contributor to the forward fall. Our simulation outcomes are consistent with experimental observations through the comparison of kinematic features and motion history evolution.
We showed in the present study, for the first time, that RL can be used as a strategy to explore the effect of ageing muscle physiological factors on kinematics and muscle control during falls. Our findings show that the elderly fall model for the M_all condition more closely resembles experimental elderly fall data than our simulations which considered age-related reductions of force alone. As future perspectives, the behaviour preceding a fall will be studied to establish the strategies used to avoid falls or fall with minimal consequence, leading to the identification of patient-specific rehabilitation programmes for elderly people.


Introduction
Ageing is a complex physiological process which is of particular interest in countries with ageing populations. For example, according to the French National Institute of Statistics and Economic Studies, the number of French individuals over the age of 60 surpassed the number of those under 20 for the first time in 2014 [1]. More recently, in 2018, those less than 20 years of age represented 20.4% of the population and those over the age of 60 represented 25.6%. In particular, the French health agency also currently reports approximately 9300 deaths in those aged 65 and over each year as a result of fall-related complications and 76,000 hospitalizations for femoral neck fractures following falls. Of all hospitalizations following a fall, 6.8% of admissions correspond to individuals of 65-69 years of age, 8.6% for those aged 70-74 and 29.8% for those 85-89 years of age [2]. Moreover, hip fractures are viewed as a major morbidity associated with falls and have also been linked to bone density and neuromuscular function changes [3]. Similar trends have also been observed in other European and non-European countries [4]. Muscle strength losses and the fear of falling have also been identified amongst the risk factors for falls [5].
Muscle ageing is also often synonymous with sarcopenia, a condition defined by a loss of mass and force which is more prevalent with age. Humans attain their greatest skeletal mass and force at midlife; however, these values are known to fall by 50% before the ninth decade of life [6]. As such, a large number of experimental studies have been completed on both animal and human subjects in order to characterize the ageing process and better understand these losses [7][8][9][10][11][12][13]. Different descriptors for ageing have been characterized in order to analyze age-related changes within skeletal muscle. This includes values for muscle mass, contraction velocity, fat distribution, range of movement (ROM), fast and slow muscle fibre percentage and size, calcium sensitivity, ATP production, the number of mitochondria per section, the number of capillaries surrounding the muscle, maximum isometric force, muscle stiffness and cross-sectional area of different sections of muscle, amongst other factors. In one study of patients in a long-term care facility, 84.5% of falls occurred amongst those aged over 70, 42.7% of fallers walked independently, 22.7% of falls were due to loss of balance and 22.5% occurred when patients did not have the capacity to perform the activity that they were attempting [14]. While datasets for falls are readily available for younger populations, only a few studies have been able to gather data, most importantly video capture, in real situations for elderly populations [15][16][17].
In addition to experimental observations, numerical methods have been applied to study ageing's cause-effect relationships. As predictive models are often only as reliable as the studies on which they are based, the choice of which parameters to include is critical to the success of the model. It has been observed that new models often build on previous ones by changing or adding muscle parameters, the number of muscles considered, geometry and complexity [18]. Despite this growing body of literature building on simulation-based insights using these different musculoskeletal models, investigations into the effects of these models on simulation results remain limited. It should be mentioned, however, that the definition of the coordinate system and the muscle parameters considered have already been proven to significantly impact simulation results for walking in healthy young adults [18]. In particular, strength and muscle mass, as well as fatigue, are influential factors on walking patterns for the elderly [19]. Despite some interesting results, few numerical studies have simulated ageing's cause-effect relationships and explored the effect of different muscle descriptors for ageing on kinematics and muscle mechanics.
In our previous study, human falls of young adults under normal conditions were successfully modelled through deep reinforcement learning techniques [20]. Deep learning has been successfully applied to a large range of situations thanks to new model architectures and properly labelled data. Moreover, different learning strategies such as reinforcement learning or transfer learning [21] have been developed to mimic human intelligence. The purpose of this study was to investigate the use of reinforcement learning for human movement development when modifying muscle parameters to account for age-related changes. In addition to investigating the effect of changing these muscle parameters, we also considered the potential effect of muscle memory when sudden changes occur in muscle function with ageing. This second task was accomplished by training models on young muscle parameter values, then testing the learned behaviours of the model with the aged parameters. Human falls with modified physiological parameters were thus modelled, and learning was simulated to determine the effect of muscle descriptors for ageing on kinematic behaviour and muscle force control.
Five models were trained and tested. In reinforcement learning, training refers to the process by which the neural network learns the desired behaviour, or policy, which in our case is how to fall. Testing consists of evaluating this policy by producing simulations and analyzing the movement outcome. Of the five models trained, one model was produced for muscle parameter values found in young adult populations. The remaining four models, the elderly models, were trained using different values for age-related changes. The first elderly model (M_all) considered a 30% age-related reduction in isometric contraction in addition to changes in muscle properties including maximal contraction velocity, deactivation time constant and passive muscle strain at maximal isometric force, as suggested by Song and Geyer for their model on walking performance declines with ageing [19]. The following three models were based on the young model and considered only a decrease in maximum isometric contraction for each muscle of 30%, 20% and 10% respectively.
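The three force-only elderly models described above amount to scaling the maximum isometric force of every muscle by a fixed percentage. The following is a minimal sketch of that scaling, assuming the muscle parameters are held in a plain dictionary. The muscle names, the force values and the model labels are illustrative placeholders and are not the values used in the study.

```python
def reduce_isometric_force(max_forces, reduction):
    """Return a copy of max_forces with every value lowered by `reduction`.

    `reduction` is a fraction, so 0.30 means a 30% reduction.
    """
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must lie in [0, 1)")
    return {name: force * (1.0 - reduction) for name, force in max_forces.items()}


# Hypothetical young-adult maximum isometric forces in newtons (illustration only).
young = {"rect_fem_r": 1000.0, "soleus_r": 4000.0}

# One force-only elderly parameter set per reduction level named in the text.
elderly_models = {
    f"force_minus_{pct}": reduce_isometric_force(young, pct / 100.0)
    for pct in (10, 20, 30)
}
```

The M_all model would additionally change the contraction velocity, deactivation time constant and passive muscle strain, which this sketch leaves out.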
The model currently accounts for the individual mass of each body structure considered and the maximum and minimum ranges of motion of each joint. For each muscle, in addition to the geometry path and a minimum and maximum force value given as 0 and 1 respectively, numerous biomechanical parameters are considered, as can be seen in Tables 1, 2 and 3. This virtual environment was initially developed to encourage individuals to build a controller that drives the musculoskeletal model to explore an unseen environment given a specific motor task (e.g. walking, running, falling, etc.). The simulation is driven by exploring a high-dimensional action space (i.e. muscle activation patterns which vary from 0 to 1 (fully activated)) from a small dimensionality of states (e.g. segment position and velocity, joint kinematics) of the musculoskeletal system. This is essentially an optimization problem whereby the muscle coordination patterns are found from the muscle active matrix space for the action that is most suited for a specific motor task, in our case, human falls.

Single-agent reinforcement learning solved using the actor-critic method: DDPG
A single-agent reinforcement learning problem is often expressed as a Markov decision process ⟨S, A, P, R⟩, where S denotes the state space of the agent (i.e. the musculoskeletal model), A denotes the action space of the agent (i.e. the muscle activation space), P denotes the transition function and R denotes the reward function. The solution to this problem is known as the policy $\pi$. As in our previous study, the optimal policy $\pi^{*}(s)$ is found by solving the Bellman equations, in which $s$ and $s'$ are the current and next states of the musculoskeletal model, $T$ is the transition function (denoted P in the tuple above), $U$ is the value function, $R$ is the reward and $\gamma$ is the discount factor. To solve the Bellman equations, the deep deterministic policy gradient (DDPG) method was used to produce the model weights.
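For reference, the Bellman equations are given here in their standard textbook form, written with the symbols defined above. This is a sketch that assumes a state-based reward and a discrete sum over next states; the exact form used in the study may differ.

```latex
\begin{align}
U(s) &= R(s) + \gamma \max_{a \in A} \sum_{s'} T(s, a, s')\, U(s') \\
\pi^{*}(s) &= \operatorname*{arg\,max}_{a \in A} \sum_{s'} T(s, a, s')\, U(s')
\end{align}
```

The first equation defines the value of a state recursively, and the second selects, in each state, the action with the highest expected value of the next state.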

Bioinspired reward reshaping strategies and dynamic movement simulation
A fall was defined as a biomechanical event occurring when the centre of the trunk moves outside of the body's base of support. There are multiple mechanisms for falls, including those in which the body is destabilized by an external force and those in which it is not. The purpose of this model was to study simple forward falls. The reward function for a forward fall, $R_{forward\,fall}$, combines two terms: $R_{walk}$ is the reward function for a normal walking strategy before the initiation of a fall ($F < n$), and $R_{fall}$ is the reward function used to begin the fall event ($F \geq n$). $F$ is the number of footsteps taken, and $n$ is the number of footsteps taken before a fall is privileged. $P_x^{t}$ and $P_x^{t-1}$ are the positions of the pelvis at the current and previous time steps. $M$ is the number of muscles included in the model. $A$ is the muscle activation matrix. $d_t = 0.01$ s is the simulation time step. The velocities of the pelvis and of the foot (the latter written $v_{foot}^{t-1}$) also enter the reward.
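The switch between the walking and falling phases of the reward can be sketched as follows. This is a minimal illustration under stated assumptions: the expressions for the walking and fall rewards themselves are not reproduced here, so they are passed in as precomputed numbers, and the finite-difference pelvis velocity is only one plausible use of the two pelvis positions and the time step. Both function names are hypothetical.

```python
def pelvis_forward_velocity(p_x_now, p_x_prev, dt=0.01):
    """Finite-difference forward velocity of the pelvis over one time step."""
    return (p_x_now - p_x_prev) / dt


def forward_fall_reward(footsteps, n, r_walk, r_fall):
    """Select the reward term for the current phase.

    The walking reward applies while fewer than n footsteps have been taken
    (F < n); the fall reward applies from the n-th footstep onwards (F >= n).
    """
    return r_walk if footsteps < n else r_fall
```

For example, with n = 2 the agent is rewarded for walking during its first two footsteps and for falling afterwards.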

Implementation
For the learning phase and simulation for forward falls, a sequential model type was used. The actor network consisted of two fully connected hidden layers (500 and 400 neurons respectively) and had a learning rate of 0.0001. For this network, rectified linear unit (ReLU) and hyperbolic tangent (tanh) functions were also used. The critic network consisted of two hidden layers (500 and 400 neurons respectively), with a learning rate of 0.001. For the critic network, ReLU and the linear functions were used. The agent was compiled using the ADAM optimizer. The network chosen was largely inspired by previous work on deep learning for continuous control problems [23].
Hyperparameter tuning was performed. The number of neurons in each of the two layers was tested from as low as 64 upwards. No significant improvement was noted beyond 500 and 400 neurons in the first and second layers respectively, nor from using the scaled exponential linear unit (SELU) function in place of the ReLU function. It should be noted that the second-place entry in the NIPS 2017 Learning to Run challenge [22] also used the DDPG approach. That entry implemented two hidden layers of 800 and 400 units with SELU activations followed by a tanh activation, in addition to a learning rate of 0.0003 in both the actor network and critic network. The first hidden layer of the critic network took the states as inputs, and the second took the actions before being concatenated and further run through the network. Our falls models implement a common feature extraction path which proved sufficient for producing a movement pattern for human falls. This difference in strategy likely accounts for the differences in the hyperparameters used. Training was completed by using Google Compute Engine (n1-standard-4: 4 vCPUs, 15 GB RAM). A total of five different models were trained, one for each of the muscle models developed (young parameters, all ageing parameters, 30% isometric force reduction, 20% isometric force reduction and 10% isometric force reduction). During the testing phase, the young trained model weights were tested on each of the elderly muscle models in order to test the hypothesis of muscle memory. Afterwards, each set of elderly trained model weights was tested with its corresponding muscle model parameters (Fig. 1).
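The testing protocol described above can be enumerated as pairs of (weights trained on, muscle parameters tested on). The sketch below is illustrative only: the model labels are hypothetical, and it assumes that the young trained weights were also tested on the young parameters as a baseline.

```python
MODELS = ["young", "M_all", "force_minus_30", "force_minus_20", "force_minus_10"]


def evaluation_plan(models, baseline="young"):
    """Return the (trained_on, tested_on) pairs for the testing phase."""
    elderly = [m for m in models if m != baseline]
    plan = [(baseline, baseline)]                 # young trained / young tested
    plan += [(baseline, m) for m in elderly]      # muscle-memory hypothesis
    plan += [(m, m) for m in elderly]             # elderly trained / elderly tested
    return plan


for trained_on, tested_on in evaluation_plan(MODELS):
    print(f"weights: {trained_on:>15}  muscle parameters: {tested_on}")
```

Under these assumptions the five trained models give nine test configurations.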

Kinematics
In order to evaluate the effectiveness of such a model, we compared our kinematic results with experimental observations found in the literature. We have focused part of our research on analyzing video capture databases which provide open source video files. These files show the circumstances preceding different falls for elderly individuals living in long-term care facilities [17] and also from experimental studies for different fall movement types [15]. This second database, known as the Sistemic database, provides videos for subjects between 19 and 30 years of age and also for those aged between 60 and 75 years of age. In addition to acceleration values, angular velocity raw data files are provided. Only one elderly subject, a judoka, performed the falls as a safety precaution. While other studies have successfully compiled databases, not all video databases are readily available to the scientific community and thus have not been included within this study.
We considered the acceleration data from the Sistemic study for two subjects, one young subject and the one elderly subject that performed the falls. Three trials for each fall type, including slipping forward, tripping forward and vertical falls, were analyzed. The data in the raw files was recorded from an ADXL345 sensor, with a resolution of 13 bits and a range of ±16 g. The acceleration was obtained from the raw data using the following equation:
$$a = \frac{2 R_g}{2^{R_e}} \times a_{raw} \qquad (6)$$
where $a$ is the acceleration (g), $R_g$ and $R_e$ are the range and the resolution, and $a_{raw}$ is the raw acceleration in bits. For noise removal, a FIR filter was applied to the data with a cutoff frequency of 10 Hz and a collection frequency of 200 Hz. Then, a Hamming window function was applied.
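The conversion and filtering steps can be sketched in plain Python as follows. This is a hedged illustration rather than the processing code used in the study: the filter length (31 taps) is an assumption, as the filter order is not stated, and the Hamming window is assumed to be the design window of a windowed-sinc low-pass filter, which is one common reading of the description above. The sensor range and resolution are passed in as arguments rather than hard-coded.

```python
import math


def counts_to_g(raw, range_g, resolution_bits):
    """Convert a raw sensor count to acceleration in g: a = (2 * Rg / 2**Re) * a_raw."""
    return (2.0 * range_g / 2 ** resolution_bits) * raw


def hamming_lowpass_taps(cutoff_hz, fs_hz, num_taps=31):
    """Windowed-sinc low-pass FIR taps with a Hamming window, scaled to unit DC gain."""
    fc = cutoff_hz / fs_hz              # cutoff as a fraction of the sampling rate
    mid = (num_taps - 1) / 2.0
    taps = []
    for n in range(num_taps):
        x = n - mid
        # Ideal low-pass impulse response; the centre tap is the limit as x -> 0.
        ideal = 2.0 * fc if x == 0 else math.sin(2.0 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]


def fir_filter(signal, taps):
    """Causal FIR filtering; samples before the start of the signal count as zero."""
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(taps):
            if i - k >= 0:
                acc += h * signal[i - k]
        out.append(acc)
    return out
```

A typical use would convert every raw sample with `counts_to_g`, build the taps once with `hamming_lowpass_taps(10, 200)` and then pass the converted signal through `fir_filter`. A causal filter of this kind delays the signal by (num_taps - 1) / 2 samples, which matters when reading off the timing of the peak acceleration.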
In addition to the Sistemic database analysis, other studies have focused on how to analyze data from video footage where accelerometry data is not available. It has been suggested that a top bounding box point strategy be used to track the head centroid, considered to be the most suitable landmark for tracking amongst all possible angles of filming and potential occlusions [24]. Additionally, depth data is necessary and is often obtained from infrared signals, which are not always available. As such, it has been suggested that an expert estimate the frames corresponding to the onset of the imbalance preceding the fall, the initiation of the descent (defined as one frame after foot contact with the ground in the last recovery step) and the first occurrence of impact (for each of head and pelvis) [25]. This allows for the calculation of a total fall duration (i.e. the interval between the onset of imbalance and initial impact of a body part) and descent duration (the interval between the onset of fall initiation and impact). Impact velocities can be estimated by manually digitizing points for the anterior superior iliac spine and the head (either ear or forehead), using the Hendrick method, a two-dimensional direct linear transform (2D DLT) and a finite difference to estimate time-varying vertical and horizontal velocities. Finally, a fifth-order polynomial can be fit to the traces using a polyfit function. In order to further analyze our falls, we thus calculated total fall duration and descent duration by identifying the frames in question and using the available simulation data to obtain our values.
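The two durations and the finite-difference velocity estimate can be sketched as follows. This is a minimal illustration assuming a constant frame rate; the function names are hypothetical, and the central-difference scheme is one possible choice, since the exact finite-difference scheme is not specified above.

```python
def fall_timing(imbalance_frame, descent_frame, impact_frame, fps):
    """Durations in seconds from three expert-identified frame indices.

    total_fall_duration: onset of imbalance to first impact.
    descent_duration:    initiation of descent to first impact.
    """
    return {
        "total_fall_duration": (impact_frame - imbalance_frame) / fps,
        "descent_duration": (impact_frame - descent_frame) / fps,
    }


def finite_difference(values, dt):
    """Velocity estimate: central differences inside, one-sided at both ends."""
    if len(values) < 2:
        raise ValueError("at least two samples are required")
    last = len(values) - 1
    velocity = [(values[1] - values[0]) / dt]
    for i in range(1, last):
        velocity.append((values[i + 1] - values[i - 1]) / (2.0 * dt))
    velocity.append((values[last] - values[last - 1]) / dt)
    return velocity
```

Applying `finite_difference` separately to the vertical and horizontal coordinates of a digitized landmark gives the two time-varying velocity traces.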

Motion history analysis
The video databases of falls as well as the simulations were also analyzed using motion history analysis. Motion history analysis is a tool for representing movement from a video in a single image. It allows us to compare the overall appearance of falls in a single image and see the progression of movement through time. This was completed by modifying the easy-mhi 1.3 library created by Luke Barnett for Python. First, video of either the simulation or experimental human falls was processed for background removal. This process relied on the OpenCV library, whereby background subtraction was completed using the MOG2 algorithm for the simulated video and the KNN or the MOG2 algorithm for the experimental videos (Fig. 2). For the experimental videos, the algorithm was chosen by trial and error according to which produced the best video quality and contrast. A 3 × 3 kernel was sufficient for the simulated videos, while a 2 × 2 kernel was used for the experimental videos and applied as a filter after median blurring of each frame. Finally, a threshold was applied to ensure that each image contained only pure white or pure black pixels. This allowed us to easily create a gradient legend for the motion history images by considering the number of frames included in each image. In our case, each image shows 25 frames for a capture of 1 s of movement. A minimum brightness value of 48 was retained. A legend was created using matplotlib's colormap feature and divided into 25 sections ranging from pure white to the minimum brightness value.
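The way 25 thresholded frames combine into one motion history image can be sketched without any imaging library. This is an illustration under stated assumptions and not the easy-mhi API: the grey levels are assumed to be evenly spaced between the minimum brightness and pure white, the most recent frame is assumed to be the brightest, and the binary masks are given as nested lists.

```python
def brightness_levels(num_frames=25, min_brightness=48, max_brightness=255):
    """Evenly spaced grey levels: oldest frame darkest, most recent frame pure white."""
    if num_frames < 2:
        raise ValueError("at least two frames are required")
    step = (max_brightness - min_brightness) / (num_frames - 1)
    return [round(min_brightness + i * step) for i in range(num_frames)]


def motion_history_image(masks, levels):
    """Overlay binary foreground masks, oldest first.

    Each mask is a list of rows of 0/1 values. Later frames overwrite earlier
    ones, so the most recent position of the body is the brightest.
    """
    height, width = len(masks[0]), len(masks[0][0])
    mhi = [[0] * width for _ in range(height)]
    for mask, level in zip(masks, levels):
        for r in range(height):
            for c in range(width):
                if mask[r][c]:
                    mhi[r][c] = level
    return mhi
```

Background pixels stay at 0 (pure black), so the minimum brightness of 48 keeps the oldest frame distinguishable from the background.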
For the Sistemic videos, three videos were of particular interest: F01, a forward fall while walking caused by a slip; F04, a forward fall while walking caused by a trip; and F06, a vertical fall while walking caused by fainting. Unfortunately, to capture the entirety of the fall in these videos, the camera pans across the environment, creating motion history images with backgrounds that appear to move. As such, the background images were manually removed from the frames in question for these videos as shown in Fig. 3.

Fall simulation outcomes
The results of the simulation using the model weights which were fully trained on young muscle parameters and tested on the elderly M_all muscle parameters are shown in Fig. 4a. In all five episodes of testing, each fall for this simulation was oriented in the backward direction. The displacement of the head, pelvis, toes and centre of mass shows that at the beginning of the simulation, the model took small steps with little displacement, before eventually falling backward. For this model, the muscle activation patterns shown in Fig. 4d indicate that at approximately the 2-s mark of the simulation, the right short head of the biceps femoris, right hip flexors and right rectus femoris show a sudden decline in activation. Shortly thereafter, a strong activation of the right gluteal muscles (i.e. hip extensors) leads to a period of backward mass displacement until approximately 2.75 s into the fall. At this moment, the left rectus femoris and left vasti become largely inactive, creating the displacement leading to the backward fall. Before the simulation hits the ground and stops, we also note an overactivation of the left short head of the biceps femoris, left gastrocnemius, left glutei, left adductors and abductors, hamstrings and hip flexors. The right short head of the biceps femoris, gastrocnemius, rectus femoris, soleus, tibialis anterior and vasti also show a strong activation once the fall is initiated.
For the elderly model that considered the M_all condition for both training and testing, the testing results provided five similar falls, each of which shows a stepping response by the simulation before an eventual fall as shown in Fig. 5a. From 0 s to just before 1 s, we see a stronger push off from the right leg, causing a compensatory response of two steps in the left leg as can be seen by the two peaks in ankle extension. For this model, the muscle activation corresponding to the initiation of the fall phase begins just before 1 s has elapsed. At this moment, the right rectus femoris and right tibialis suddenly show no activation. Afterwards, the right soleus, right abductors and right adductors show an increase in activation, and the right rectus femoris reactivates shortly afterwards. Meanwhile in the left leg, the main change preceding the fall occurs after 1.25 s. At this time, the left gastrocnemius and left adductors show a decrease in activation at the same moment as an increase in the left short head of the biceps femoris and the left tibialis anterior. In the final moments of the fall, while the body is descending, we see an increase in activation from the left glutei, abductors, adductors, soleus, tibialis anterior and vasti, as well as an increase in activation from the right adductors, hip flexors and rectus femoris. This fall is largely characterized as a fall forward onto the flexed left leg with an extension in the right leg.

Fall simulation evaluation with experimental kinematics
The Sistemic study's pelvis acceleration data from two fall types was analyzed and compared to our simulation data. Plots for pelvis acceleration for the slipping forward fall are shown in Fig. 6 for both the young and elderly subjects. Figure 7 shows the pelvis acceleration data for the vertical falls of both young and elderly subjects. From the forward slipping data shown in Fig. 6, we see the peak acceleration in the trial shown reaching a higher value for the young subject than for the elderly subject. Over three trials, this peak acceleration averaged 3.41 g (varying from 2.61 to 4.99 g) for the young subject and 2.06 g (varying from 1.28 to 2.56 g) for the elderly subject. Considering the gradient characterizing the increase in acceleration resulting from the fall, we calculate a value for jerk from the last value lower than the average walking acceleration prior to the peak acceleration, to the peak acceleration. Averages of 24.05 g/s for the young subject (values from 13.12 to 41 g/s) and 7.926 g/s for the elderly subject (values from 2.12 to 18.85 g/s) were found for this gradient.
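The jerk measure described above can be sketched as follows. This is a minimal illustration assuming that the acceleration trace and its time stamps are given as plain lists and that the average walking acceleration has already been computed; the function name is hypothetical.

```python
def fall_jerk(times, accel, walking_mean):
    """Average gradient of acceleration leading up to the peak, in g/s.

    The start point is the last sample before the peak whose acceleration is
    lower than the average walking acceleration; the end point is the peak.
    """
    peak = max(range(len(accel)), key=lambda i: accel[i])
    start = None
    for i in range(peak - 1, -1, -1):
        if accel[i] < walking_mean:
            start = i
            break
    if start is None:
        raise ValueError("no sample below the walking mean precedes the peak")
    return (accel[peak] - accel[start]) / (times[peak] - times[start])
```

Because the start point depends on the walking average, a trial with a noisier walking phase can yield a noticeably different jerk value, which may contribute to the wide ranges reported across trials.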
The difference between the young and elderly subject's peak acceleration and the jerk for the same fall type under the same conditions may also be influenced by hesitation and a more cautious approach that may intuitively be taken by an elderly subject. This data is also coherent with values presented in our simulation as shown in Table 4, showing that the time from fall initiation to impact is greater in the elderly. We also see that vertical impact velocity is lower in the young than in the elderly [25]. This suggests that younger people can recover from greater peak accelerations and implement strategies to slow any potential impact better than the elderly.
From Fig. 7, we see the variation across three trials for vertical falls for both the young subject and elderly subject. Though our simulation does not consider vertical falls, the data does highlight a challenge in the analysis of simple forward falls. We suspect that the downward spike seen for vertical falls is indicative of the moment that the individual stops walking forward, a braking moment. As this may break the forward momentum, we see less significant upward spikes in acceleration as compared to the forward slipping falls. In fact, the differences previously seen disappear as we fall vertically. This may further confirm that hesitation due to the speed and consequence of injury may affect the kinematics of a fall (Fig. 7).
Fig. 4 Results of testing the young trained model for an elderly falls simulation with a 30% reduction in isometric contraction in addition to changes in muscle properties including maximal contraction velocity, deactivation time constant and passive muscle strain at maximal isometric force (M_all). a A screenshot of the fall shortly after it has been initiated. b The trajectories of the centre of mass, head, pelvis and torso during the fall. c The joint angular velocity of the right and left leg. d The muscle activation throughout the simulation
Our simulations stop once the pelvis touches the floor. As such, the young fall simulation data stops shortly after the peak pelvis acceleration is reached (Fig. 8). For the young simulation, the peak acceleration reached is 3.01 g, a value comparable to the experimental average of 3.41 g and within the range of values seen over the three trials. The jerk value of the young simulation fall is 7.92 g/s, a value well under the experimental average of 24.05 g/s. For the elderly simulation, a lower peak acceleration of 2.36 g was indeed found, which is again within the range of the experimental values. The jerk value for the elderly fall simulation is 5.96 g/s.

Fall simulation evaluation using motion history analysis
Motion history analyses have been performed on each of the muscle models tested. These representations show the second before contact with the ground for each of the falls. Lighter images occurred more recently in time than darker ones. This allows us to represent the evolution of movement over time in a 2D image. Figure 9 shows the young trained/young tested model, along with the young trained/elderly tested model and elderly trained/elderly tested model whereby the elderly model considered the M_all condition. Figure 10 shows a comparison between the young trained/young tested model in the left column, the young trained/ It should be noted that the young trained/young tested data is for the same fall for easier comparison. Testing provides kinematic data and videos for five trials; however, only one trial for each model is shown in the images in this paper. In Fig. 9, although forward falls were rewarded in each case, the young trained/elderly tested model for the M_all case produced five backward falls, while the other cases produced falls in the forward direction.
The vertical and forward tripping fall videos from the Sistemic study along with the forward fall video from the Robinovitch study [17] were also analyzed by motion history analysis and presented in Fig. 11.

Discussion
The exploration of human locomotion learning is a difficult engineering task due to the complex interactions within the musculoskeletal system. Reinforcement learning has recently been used to achieve this objective. The use of reinforcement learning allows us to analyze motions without inputting kinematic data. In particular, thanks to the use of dedicated bio-inspired reward reshaping strategies, we were previously able to simulate falls for healthy people, leading to the identification of risk factors [20]. In our present study, we extended our previous work to explore falls for elderly people using the reinforcement learning approach. It has been suggested that muscle weakness has an effect on walking which reduces joint angle motion and minimizes toe clearance, which, in turn, affects falling [28]. Our simulations are therefore rewarded for taking two steps before falling. Kerrigan et al. (2001) reported that the walking speed of elderly individuals who fall is reduced (0.89 m/s on average) in comparison with those who do not fall (1.21 m/s on average) [29]. In the case of our model, the elderly simulation produced a walking speed over those two steps of 0.26 m/s, less than the 0.41 m/s found for the young model (Table 4). As most falls occur in the bedroom and bathroom [30], we believe these to be short distances where a steady walking speed is not likely to have yet been attained. The difference seen in our values and those of Kerrigan can therefore be explained by the fact that normal walking speed had yet to be reached within the two steps taken; however, the tendency for elderly individuals to walk more slowly is still evident.
It has also been suggested that older adults use mechanisms to absorb energy during a fall and reduce the impact velocity and thus the risk for injury [25]. This often results in a stepping response, which is present in approximately 64% of falls. It has also been found that pelvis velocity was 5% lower in falls involving stepping compared with those involving no stepping response. Our elderly trained/elderly tested model, which considered all of the muscle parameter changes together (M_all), showed a stepping response in each testing scenario. While some other testing cases showed small steps taken between the walking phase and the falling phase, no other model showed this as prominently.
Fig. 6 Extraction of the Sistemic study acceleration data for falls for young subject SA02 and elderly subject SE06 for forward slipping. a Young subject acceleration data for the entire fall. b Young subject acceleration data for the 0.6 s encapsulating the peak of the fall. c Elderly subject acceleration data for the entire fall. d Elderly subject data for the 2.5 s encapsulating the peak of the fall
Further research also suggests that lower leg antagonist muscle coactivation during normal walking increases with age and, irrespective of falls history, shows unperturbed stance at approximately 10% coactivation for the young and 20% for the elderly [31]. While we do not see the coactivation effect in our model, it is interesting to note the near constant activation in the young fall case for the gluteus, adductors and abductors, while the elderly trained falls did not show activations persisting throughout the entirety of the simulation, with the exception of gluteus contraction from at least one of the limbs.
One of the young trained models that were tested on the elderly muscle parameter values resulted in a backward fall, despite the simulation rewarding forward falls. While the same model weights from the young simulation were tested on each of the elderly muscle parameter cases, including those for the different isometric reduction cases, the only case to produce consistent backward falls was that which considered all of the muscle parameter changes at once (M_all). We speculate that this could be due to the fact that the model weights are optimized in a manner that made them particularly sensitive to the reduction in maximal contraction velocity or the increased deactivation time constant. These two factors were not considered in the other elderly models that were still able to produce forward falls on testing. From the Sistemic data, we note the challenge in validating fall simulation data from kinematic sensor data. Through the analysis of forward slipping and vertical falls, we note that when a fall is initiated with a greater pelvis acceleration, the maximum acceleration and jerk are lower for the elderly subject, despite the same variation not being present for a vertical fall.
Fig. 7 Analysis of the vertical fall for three different trials: a trial R1 for young subject SA02, b trial R1 for elderly subject SE06, c trial R2 for young subject SA02, d trial R2 for elderly subject SE06, e trial R3 for young subject SA02, f trial R3 for elderly subject SE06
Table 4 Average values and ranges (where possible) for measures used to characterize falls. Values from experimental studies for young and elderly individuals as well as values from our simulation, with elderly (trained and tested) simulation values given for the M_all condition shown in Fig. 6. * represents walking speed over two steps and not steady walking speed [25][26][27]
As such, a means by which to analyze candid falls and falls within nursing homes through video analysis becomes even more pertinent. Given that the subjects in the Sistemic study knew they were to fall, it is unclear how much of this deceleration was due to anticipation, and how much was due to neurological or muscular changes due to ageing. It is also noticeable from the motion history analysis that, in the falls with a greater reduction in maximum isometric force, the model bends at the knee and falls in a more vertical manner. As each motion history image shows just 1 s of the fall, we also see more forward translation with increasing reductions in isometric force in the elderly trained/elderly tested models. Though this is difficult to see in the motion history images of the candid fall from the Robinovitch study shown in Fig. 11, the simple forward fall translated further forward in the video than either of the cases shown by the young subject in the Sistemic study. This forward translation may further be explained by the phase of hesitation previously seen in the acceleration data from the elderly subject in the Sistemic study.
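For readers unfamiliar with motion history images, the idea can be sketched as follows. This is an illustrative frame-differencing formulation, not necessarily the implementation used for the figures in this study: the function name `motion_history_image`, the threshold value and the assumption of greyscale input frames are all hypothetical.

```python
import numpy as np

def motion_history_image(frames, fps, duration_s=1.0, diff_threshold=30):
    """Build a normalised motion history image from greyscale frames.

    frames         : array of shape (T, H, W), intensities 0-255 (assumed)
    fps            : frame rate of the video in frames per second
    duration_s     : time window remembered by the image (1 s here)
    diff_threshold : intensity change treated as motion (assumed value)
    """
    # Signed integers avoid wrap-around when subtracting uint8 frames.
    frames = np.asarray(frames, dtype=np.int16)
    n_frames = max(1, int(round(duration_s * fps)))
    mhi = np.zeros(frames.shape[1:], dtype=float)
    for prev, curr in zip(frames[:-1], frames[1:]):
        moving = np.abs(curr - prev) > diff_threshold
        # Moving pixels are reset to 1; all others fade linearly to 0
        # over the chosen time window.
        mhi = np.where(moving, 1.0, np.maximum(mhi - 1.0 / n_frames, 0.0))
    return mhi

# Synthetic example: a bright pixel moving left to right across one row.
demo = np.zeros((4, 1, 4), dtype=np.uint8)
for i in range(4):
    demo[i, 0, i] = 255
print(motion_history_image(demo, fps=4))
```

Brighter pixels in the result correspond to more recent motion, so the direction and extent of a fall over the window can be read from the intensity gradient.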
Several limitations of our study stem from the assumptions made in our model. Firstly, the upper body is not modelled, in order to significantly reduce training time. The reaction and effect of arm balance and upper body reflexes are therefore not incorporated. As a result, any deceleration before the pelvis hits the ground that may come from falling onto the arms or using the arms to slow the fall cannot be considered. Secondly, even though we modified the model properties to take the ageing effect into consideration, the rheological models used for muscle and joint mechanics remain plausible only for healthy people. Thus, further studies should investigate how to incorporate novel rheological models that better describe the behaviours seen with ageing. Further limitations lie in the difficulty of validating a falls model with the limited data publicly available from real human falls, owing to the practical difficulty of obtaining in vivo data in a noninvasive manner. Thus, further enhanced validation should be done. Transfer learning is a potential solution for dealing with limited data [21]. Finally, uncertainties in the input data and model properties also lead to imprecision in the simulation outcomes. These uncertainties should be considered in future implementations to obtain more accurate outcomes, so that reliable risk factors and associated rehabilitation strategies to prevent falls in elderly people can be determined.

Conclusion and perspectives
We have shown in the present study, for the first time, the use of RL as a strategy to explore the effect of ageing muscle physiological factors on the kinematics and muscle control during falls. Our findings showed that the elderly fall model which considers multiple ageing-related factors produced a fall that more closely resembles an elderly fall. Additionally, given that the elderly model translated further in the forward direction and took more than the two steps favoured by the reward function, future work should consider improvements to the reward function so that the simulation is more physiologically accurate. We also suspect that including the decline in other muscle factors related to sarcopenia and muscle ageing may allow for a more realistic fall. It is also necessary to consider additional means of validating this work by exploring innovative ways to analyze true elderly falls. Moreover, the behaviours prior to the fall will be studied to establish strategies for avoiding falls or falling with minimal consequences, in order to determine potential patient-specific rehabilitation strategies for elderly people.
Fig. 11 In the left column, motion history analysis has been completed for a vertical fall from the Sistemic study [15] showing a young subject. In the middle column, the Sistemic forward trip fall case with a young subject has been analysed by motion history analysis. Finally, on the right, the forward fall from the Robinovitch study [17] for an elderly subject is presented outside of a laboratory setting.
Funding
The authors would like to thank the Hauts-de-France Region and Labex MS2T (Maîtrise des systèmes de systèmes technologiques) for the funding of this work.