Wearable Fall Detector Using Recurrent Neural Networks

Falls have become a relevant public health issue due to their high prevalence and negative effects in elderly people. Wearable fall detector devices allow the implementation of continuous and ubiquitous monitoring systems. The effectiveness for analyzing temporal signals with low energy consumption is one of the most relevant characteristics of these devices. Recurrent neural networks (RNNs) have demonstrated a great accuracy in some problems that require analyzing sequential inputs. However, getting appropriate response times in low power microcontrollers remains a difficult task due to their limited hardware resources. This work shows a feasibility study about using RNN-based deep learning models to detect both falls and falls’ risks in real time using accelerometer signals. The effectiveness of four different architectures was analyzed using the SisFall dataset at different frequencies. The resulting models were integrated into two different embedded systems to analyze the execution times and changes in the model effectiveness. Finally, a study of power consumption was carried out. A sensitivity of 88.2% and a specificity of 96.4% was obtained. The simplest models reached inference times lower than 34 ms, which implies the capability to detect fall events in real-time with high energy efficiency. This suggests that RNN models provide an effective method that can be implemented in low power microcontrollers for the creation of autonomous wearable fall detection systems in real-time.


Introduction
Falls are major public health problems worldwide for elderly people. Reports from the World Health Organization (W.H.O.) indicate that approximately 28%-35% of seniors over 65 years old suffer at least one fall per year [1]. The reports also show that this rate increases when considering people over 70 years old. The analysis of the records of emergency departments reported in [2] identified that fall victims suffered at least one new fall every six months. A major factor that influences this fact is that many elderly people lose confidence and adopt a more sedentary life, losing mobility, quality of life and, thus, increasing the probability of falling because of their poor shape [3,4]. Direct consequences of falls can be injuries to muscles or ligaments, bone fractures and head trauma with consequent brain damage, among others. Major injuries pose significant risk for post-fall morbidity and mortality. In addition to that, it has strong economic impacts on family and public health. For instance, it was estimated that the United States spent $19 billion as a consequence of fall related hospitalizations in 2006 [5]. This topic is gaining importance due to the progressive increase in the elderly population [6,7].
Fall detection systems (FDS) are devices that monitor user activity and ideally alert when a fall has occurred. Their main goal can be summarized as distinguishing between two states: Activity of daily living (ADL) and fall events (alerting when this one happens) [8]. These devices allow sending an accident notification immediately to medical entities, caregivers and family members for quick assistance.
The detection of falls through technological systems is a very active field of study, given the importance of the subject. The literature review in [8] distinguishes between context-aware and wearable systems. The first one uses sensors such as cameras, pressure sensors or microphones, deployed in the environment. Their main advantages are that it is not necessary to wear any special device, and that acquisition sensors can be more complex for an increased effectiveness as they do not have significant computational or energy supply limitations. However, these kinds of solutions are limited to their deployment area, which usually implies having to perform an installation of sensors in the different rooms where the user lives or is monitored. These facts mean that these systems are not suitable in some situations, for instance if the user lives in sparsely populated areas such as small towns and leaves home often. In addition, these systems are generally expensive because of the installation they require and the sensors they use, which could make them economically unfeasible for some population niches. Another important aspect is that its installation in public health systems could be difficult because these systems would not only collect information from the target patients, but from other people, undermining their privacy.
On the other hand, wearable devices allow continuous monitoring without any dependence from environment-based sensors. That makes them ubiquitous systems that only acquire user-related data, which favors its use in hospitals and many other scenarios. In addition, they usually use simple sensors, commonly accelerometers and gyroscopes, that require low-power consumption. Several review studies have been done about this topic and one of them is presented in this work [9]. This fact allows to reduce the size of the devices and to increase their battery life. This also usually implies lower economical costs compared to context-aware systems. As disadvantages, these devices need to be worn by the user and must be charged periodically. In order to make these systems autonomous, they must combine efficiency and effectiveness: Fall detection techniques require a continuous sensor monitoring process (several times per second) that may demand a high power consumption if the data is processed externally (in order to obtain better results); but, if the detection is done inside the embedded system itself (to reduce power consumption), the detection algorithm may reduce the fall detection accuracy and the system could have high response times if the algorithm implemented is computationally expensive.
Among the different algorithms that exist for wearable devices, we can find two main types: Threshold based and machine learning based algorithms. While threshold based algorithms show very high performance [10] in terms of detection effectiveness and low computational complexity, they present many difficulties when trying to adapt them to new types of falls and user characteristics [11]. Machine learning methods are considered more sophisticated approaches to solving this problem, but they require a high number of samples to achieve high effectiveness rates, and nowadays there is a scarcity of datasets for study these events [12]. Other functionalities that can be investigated for this type of system is the prevention of falls or the possibility of damage mitigation [13].
Recurrent neural networks (RNN) such as long short-term memory units (LSTM) and gated recurrent units (GRU) are deep learning networks specifically designed to process sequences. Recent studies shed some light on the potential of RNNs for dynamic signals classifications [14] and more precisely for accelerometer data [15,16]. However, these algorithms have a high computational cost due to the large number of algebraic operations they perform. Running these models on low power microcontrollers with limited features, suitable for wearable devices, can lead to long response times and high power consumption, even for simple tasks [17]. This fact makes difficult to create real-time wearable fall detectors based on RNN.
The research described in this paper aims to assess the feasibility of implementing a wearable system for the detection of both falls and fall hazards using RNN architectures which has a good performance in terms of computational complexity and real-time effectiveness.
The article is organized as follows: the current Section 1 continues with the description of the most recent works in the literature that use machine learning algorithms for fall detection, implemented on wearable devices, as well as the basis of the two types of RNN used, that is, Long Short Term Memory (LSTM) and Gated Recurrent Units (GRU); Section 2 describes the proposed materials and methodology used for the assessment of the RNN-based wearable fall detector systems; Section 3 presents the results and discussion regarding the effectiveness of the trained deep learning models, the performance obtained after their integration into an embedded system, as well as an analysis of energy consumption; and Section 4 includes the conclusions and points out possible future works.

Previous Works
Fall detection systems are a very active research area. In this section we consider several of the most recent studies that are based on the use of wearable devices to detect falls. Table 1 summarizes these works highlighting information about the methodology and results. In [18] four different machine learning algorithms were analyzed using two combined datasets: k-nearest neighbors (K-NN), artificial neural network (ANN), quadratic support vector machine (QSVM) and ensembled bagged tree (EBT). The main contribution to this research area is the proposal of a set of new features obtained from accelerometer information, so these can be used as output from the machine learning algorithms. The best accuracy obtained (97.7%) was obtained with the ensembled bagged tree algorithm, a type of decision tree algorithm. The study shown in [22] also proposed new features, based on the first and second order moments, extracting 12 new features that were used with a Support Vector Machine algorithm. The results are very good, with an accuracy of 99.9% when using the features.
The work in [21] combines threshold based metrics (TBM) with multiple kernel learning support vector machine (MKL-SVM). The system was implemented in an Android app, and was trained to identify falls with the mobile phone located near both the waist and the thigh. The first TBM stages allow to discard false positives resulting from performing a daily activity that has sharp acceleration moments, such as lying on a bed. The best results were obtained when the mobile was located in the waist, with an accuracy of 97.8%.
The study in [11] also considers the effectiveness of different algorithms, that is, k-NN, linear discriminant analysis (LDA), logistic regression (LR) and classic decision tree (DT). In this case, the fall detector system consist of an ATMega32 Arduino microcontroller located in the user wrist. Thus, the features considered as output of these algorithms have to identify arm movement key values. In this work, k-NN algorithm had the best results with a 99.0% of accuracy, a 100% sensitivity and 97.9% specificity. In this case, three sensors were used: An accelerometer, a gyroscope and a magnetometer.
The work in [24] showed a fall detection system architecture design that combines big data techniques used for a continuous improvement of a decision tree algorithm. Initially, the algorithm was trained with a subset of activities from the SisFall dataset [23] to classify three different classes of falls, and ADL. It was tested with data obtained from empirical experiments, with good results. While the wearable device only acts as an accelerometer signal acquisition tool, it would be possible to create a version that dumps the updated decision tree in the embedded system periodically to get improved alert times.
A more unusual detection system is described in [13], where the used signals consist of muscle impulses measured by a surface electromyography sensor. The study analyzes the capacity of a LDA algorithm to identify the initial phases of a fall and prevent damage with an actuator system. The results obtained showed that these signals can also be used to detect falls and can complement the most common acquisition systems to reduce the number of false positives.
The study in [25] also combined TBM with Machine Learning. The TBM stage detected potential falls and was implemented in an embedded system with accelerometer located in the user front-pocket. The potential falls were finally classified using a k-NN algorithm implemented in an Android app. The system was empirically tested with 20 users who simulated falls and activities of daily living. With this approach short execution times were achieved, which allow real-time classification and good accuracy.
Finally, the proposal in [26] is unique, to the best of our knowledge, as it assesses the use of a RNN-based algorithm to detect falls. The used approach, which we address in this work as well, is the detection of both falls and fall hazards. The obtained effectiveness was exceptionally good, considering that it is possibly the first study that uses this technology for fall detection using accelerometers, that the architecture used comes from other studies and no modifications were made to adapt it to this problem, and that the algorithm inputs are raw sensor samples without preprocessing or calculating any feature. However, the main problem lies in its computational cost, which ruled out its use in real time when executed on a microcontroller. One of the reasons that made real-time execution non viable were the high sampling rates and the complexity of the used RNN architecture. The term architecture refers to the number and type of layers that configure a specific neural network based algorithm.
In this work we assess architectures where execution times are improved without losing effectiveness. We also perform tests with different sampling frequencies.

Gated RNNs
Gated recurrent neural networks are RNN architectures that provides an effective solution to the vanishing gradient problem [27] and the exploding gradient problem [28] that affected backpropagation through time [29] in previous RNN versions. The central idea behind these architectures is a memory cell with nonlinear gating units. The memory cells hold information separated, maintaining its state over time. The information is managed through a set of activation functions, named gates. During the training process, each cell adjusts the activation weights, that is, learns to close or open its gates, according to the relevance of the information obtained from the sequence and the information currently stored. This information is used in the learning process of the classical RNN part. Since the information contained in the cells is isolated from the flow of the conventional RNN, they are not affected by the vanishing and exploding problems. long short-term memory units [30] were the first proposed Gated RNN. They contain three gates, two of which, called input and forget gates, are responsible for evaluating the addition of new information into memory and the deletion of part of the stored information, respectively. A third one, called output gate, controls what information is provided to the next step of the neural network in the training process. The set of vector formulas that rule a LSTM layer can be expressed mathematically as where h t is the unit state. c t represents the cell memory, whilec t is the new information coming from the recurrent neural network. o t , f t , i t are the results of the output gate, forget gate and input gates, respectively. σ and tanh represent the sigmoid and hyperbolic tangent activation functions, respectively. Vectorial pointwise multiplication is denoted by • . We get the following weights: where N is the number of LSTM units, and M the number of inputs.
On the other hand, gated recurrent units (GRU) [31] are more recent cells similar to LSTM. They are distinguished mainly by the lack of the output gate and, thus, what is stored in the memory by the cell is dumped into the neural network completely during the entire training process. The remaining gates are named update and reset, which add new input information and clear data stored from previous iterations, respectively. The equations are quite different from those modeling the LSTM, mainly as a result of the absence of output gate: where z t , r t are the result of the update gate, and reset gates, respectively. For this architecture, there are fewer weights involved: Both RNN layer alternatives have shown to be similarly effective [32], but GRUs have a slightly lower computational cost because of the absence of the output gate.

Dataset
The research protocol and results presented in this work were performed using the SisFall dataset [23]. It is composed of several simulated activities mainly classified in falls and ADL. The participants in the data collection were 38, among which there are 23 adults and 15 elderly people. Each sample contains accelerometer and gyroscope measurements obtained from a device fixed to the user's waist and acquired at 200 Hz. This dataset was complemented in [16] with a labeling proposal. Each temporary sample was classified according to whether it belonged to a fall event, a fall hazard or an activity of daily life. To our best knowledge this is the only public fall dataset that contemplates fall hazard events, consisting of moments before a fall, or during a dangerous situation where the user was able to avoid a fall.
As mentioned in previous sections, the inputs of recurrent neural networks consist of a sequence of values with a fixed length. That length is named width. Each value in the sequence has a fixed dimension. In the context of this problem, the values consist of a tuple with three elements corresponding to the three axes of the accelerometer. From now on, throughout the manuscript we will refer to each tuple with the term sample. In the same way, each sequence of samples with fixed width will be referred as block. To train a RNN model, each block must have an associated label, corresponding to the event class that contemplates. We used the proposal established in [16], in which each block is classified according to the percentage of appearance of the most relevant class. The classes in order of relevance refer to a fall event (FALL), a risk of falling (ALERT) and others, labeled as background (BKG). Background or BKG class considers the rest of time intervals, that mainly includes activities of daily life, other activities not related to a fall, such as jumping, and also the time that the user remains lying after a fall. The classification criteria are schematized in Figure 1 (left). This rule was applied to each activity record from the dataset, establishing a block width of 256 samples, equivalent to 1.28 seconds. A 50% stride was applied.
Lastly, three additional versions from the resulting dataset were created, reducing the number of samples per block, that is, the width. It is intended to evaluate the performance of the models when they are trained with less information, simulating a lower sampling rate. The process of reduction of samples consisted in eliminating the samples in even position of each block (see Figure 1, right). It was performed three times with each resulting dataset, obtaining blocks with a width of 128, 64 and 32 samples, which correspond to 100 Hz, 50 Hz and 25 Hz sampling frequency, respectively.

RNN Architectures
Results obtained in [33] showed that the regularization of sample values substantially improves the effectiveness. To achieve this, a batch normalization layer is included at the beginning of the architecture. A recent study [34] revealed that this smooths the objective function to improve the performance. A 10-fold cross validation study [35] determined that this was not effective for obtaining non-sequential characteristics. Based on these results, in this work we deepened our study and analyzed the feasibility of integration for four different architectures. These architectures are those with higher performance determined in previous studies.
The two simplest architectures consist of batch normalization, a RNN layer and a fully-connected output layer (see Figure 2). Softmax is used to determine the event class. The difference between one and the other is the use of LSTM or GRU as the recurrent layer. The other two architectures contain a second RNN layer of the same type as the previous one. While the computational cost in the most complex versions is higher, their effectiveness is also slightly higher.
In order to optimize the results, we adjusted batch size and learning rate hyperparameters by grid search. Dropout [36] technique was also applied to the inputs of the fully-connected layer.

Embedded System Features
We chose two STM32 32-bit microcontrollers (MCUs) for the integration and performance analysis of the trained models. Both are based on the high-performance ARM Cortex-M4 processors, with features that allow real-time capabilities, digital signal processing and low-power operation.
The first device selected is a STM32L476RG, part of the ultra-low-power catalog with the specified ARM processor MCU. It operates at a frequency up to 80MHz, contents 1 Mbyte of flash memory and 128 Kbyte of SRAM. The second device is a STM32F411RE, that offers a higher processing performance. It operates at a frequency up to 80 MHz, 512 Kbytes of flash memory and 128 Kbyte of SRAM. Both feature a floating point unit for a better precision in data-processing.

Protocol
The feasibility analysis consisted in a set of tests, divided into three stages. The first aims to study the algorithm effectiveness before the training, optimizing the hyperparameters. Secondly, the performance of the modes were assessed once they are integrated in the microcontroller. Lastly, a power consumption analysis was performed.

Effectiveness Analysis
The architectures were trained using the data from 30 users, (near of 80% of the dataset), while the rest, corresponding to 8 users, were used for the final evaluation. The users for each subset were randomly chosen, but maintaining an equitable distribution between adults and elderly. The training subset were the used in [35] applying 10-fold cross validation and estimate the goodness of the models with a correct reliability. In a first stage, five training processes for each architecture with different sampling frequencies were performed, in order to determine those with the best performance. In a second stage, we used smart grid search for optimizing the architectures with better results in the first stage.
Due to the dataset being highly unbalanced, the overall classification accuracy is not an appropriate way to measure the effectiveness of the system. We compared the effectiveness employing the macro F1-score [37], that measures the relations between data's positive labels and those given by a classifier through a harmonic mean of macro-precision (precision m ) and macro-recall (recall m ).
where m index refers to macro metric and classes = {BKG, ALERT, FALL}. TP c , FP c and FN c denotes the number of true positives, false positives and false negatives of each class c ∈ classes, respectively. While the F1-score is an appropriate metric for a multi-class problem, it is not usual to assess the performance of a FDS. In this context, sensitivity and specificity metrics are more commonly used. Sensitivity is another term to refer to recall. The formula for specificity is where TN c denotes the number of true negatives of each class c ∈ classes. These metrics are also considered in this work.

Performance on Embedded Systems
Two main aspects were analyzed for the embedded devices with each proposed model. First, the time spent processing a block, that is, the execution time of the integrated model. This parameter seeks to locate those architectures that can work in real time, that is, that are capable of providing a response in the time that elapses until a new sample of the accelerometer is read. Secondly, we assessed the differences on the inference outputs of the models optimized for their execution on the embedded systems. This is obtained by calculating the relative L2 error: where F g enerated refers to the flatten array of the generated model last output layer and F o riginal refers to the flatten of the original model.

Power Usage Analysis
We assessed if the implementation of these kinds of models in an embedded system provides some advantage in terms of energy consumption. For this, two fall detector system designs were considered (see Figure 3). The alternative version consisted in using the embedded system as only an acquisition and transmission tool, so that its tasks are reading of the accelerometer measures and transmitting each new sample to an external device with greater computational capacity and no energy related constraints. We considered Bluetooth as the communication technology. This first scenario was compared with the target version, consisting of an embedded system which integrates the RNN model and executes it in real-time. This version has as main tasks the accelerometer reading, the execution of the implemented model and an alert transmission to an external device, only in case of an alert or a fall event. The power consumption for each task was calculated based on the technique specifications for the embedded systems and the auxiliary modules: The bluetooth module and the triaxial accelerometer. The execution time for each task was also estimated based on the hardware features and the RNN execution time.

Models Analysis
The number of blocks per each subset from the dataset is shown in Table 2. As mentioned in the methods section, the number of blocks from classes ALERT and FALL are much lower than BKG. This is due to the short duration of risk and fall events. For the training process, we used a graphic processor unit NVIDIA GTX 1080 Ti and the CuDNN versions of this RNN layer provided, implemented in the Keras framework. The use of CuDNN RNN layers improves the training speed substantially, 8 to 10 times faster. The results of F1-score with cross-validation using the training set ( Figure 4, left) indicate that the reduction of the number of samples per block does not affect the results negatively. The standard deviation (around ±0.25 and ±0.35) reveals a slight dependence on training and validation subsets. Each architecture was trained five times with initial random weights. Figure 4 (right) shows the macro F1-score average results using the subset reserved for test. Both architectures presented similar effectiveness. The architecture with two GRU layers shows a sightly better F1-score. However, we did not consider the differences between the models substantial enough to discard any model in terms of effectiveness. On the other hand, since the computational efficiency of these models was greatly influenced by the reduction in the number of samples processed, the rest of the study was conducted with blocks of 32 samples (25 Hz).  Table 3 covers the values considered for the hyperparameters and the dropout for grid search. The best results obtained for each model and the associated hyperparameters are shown in Table 4. The accuracy is greater than the most recent works which consider a multi-class problem. However, the sensitivity is quite lower. We have assessed the architectures using 10-fold cross-validation to ensure the results are independent from the test subset used. The effectiveness deviation depending on the test subset that reveals Figure 4 (left) can explain the differences in the results with [16], where a typical 80%/20% dataset split was used. Macro F1-score results are mainly affected by the low macro precision metric value which, in turn, is low due to the low precision value in the ALERT class. This is due to the scarcity of data for this class. A small percentage of BKG events are wrongly predicted as ALERT, but comparing with the amount of blocks of the ALERT class this is a very significant percentage. This fact reveals the difficulty in training machine learning algorithms with unbalanced data. A larger quantity of datasets is necessary, something difficult for this problem, since falls can only be obtained from simulations and they imply putting at risk the health of the participants, especially if the participants are elderly, which is unfortunately the target population.
The receiver operating characteristic (ROC) curves (see Figure 5) per each model and class reveal a good reliability in the inference of event classes. These curves were obtained from the results for each node of the output layer by modifying the confident threshold. The areas under the curve (AUCs) are higher than 96%. The confusion matrix for each model (see Figure 6) shows high accuracy values, but in addition, it reveals the previously mentioned problem about the scarcity of ALERT events and the percentage of BKG predicted as ALERT.

Integrated Model Performance
The different RNN models were integrated in the ST-Nucleo boards using the X-CUBE-AI STM32CubeMX expansion pack. It allows the conversion of pre-trained models optimized for their execution on SMT32 devices. Furthermore, it provides tools for measuring the execution times of the model more accurately, as well as for the comparison between the original algorithm version and the C-model running on the microcontroller. It is important to mention that, due to the models being trained using CuDNN versions for the RNN layers, it was needed to transmit and adapt the weights to non-CuDNN equivalent layers, before their conversion to optimized c-models. The framework allows this. To verify that the change did not affect to the model effectiveness, it was checked that the classification of the test subset matched to the results shown in the previous section. There were no differences in the classification.
To evaluate the variation in the effectiveness of the models after their conversion to optimized versions for ST32 devices, we compared the values of the outputs of the last layer for both cases. The outputs per block inference consists of three values, one for each class considered, with ranges between 0 and 1, in floating point. The L2 error for LSTM model (see Table 5) was less than 10 −6 , which indicates very little variation in the generated models. However, the L2 error obtained is much lower in LSTM models than GRU ones. This fact can be due to differences in Keras and X-CUBE-AI libraries that affect the GRU layer implementation.  Figure 7 shows the time required for each inference, that is, the classification of a unique block. It is calculated as the average execution time for 10 executions per model and block size. The lines in the chart indicate the accelerometer sampling rate, which implies approximately the available deadline of each model to run in real time. In case of the F411RE device, only the simplest models complied with the required running time, with a sampling frequency of 25 Hz, equivalent to 32 samples per block. For the L476RG, only the simplest GRU model satisfied the time requirements, but it was very close to the sampling rate (35.8 ms per classification). Due to the fact that the microcontroller also has to perform other operations such as the accelerometer reading, the L476RG device had to be discarded. Since the system can operate in real time, at a frequency of 25 Hz, this implies that the system is capable of sending an alert notification in less than 40 ms. Based on the criteria used to classify the dataset blocks, a fall would be detected in less than 180 ms since it starts. Additionally, an alert event could be detected in less than 680 ms since it begins. This implies that these types of systems can be a preventive tool, connected to some element such as a portable airbag.

Power Consumption Estimations
The components that conform the systems are a ADXL345 accelerometer and a Bluetooth HC-06 module connected to a F411RE microcontroller, and a general-purpose device as receptor. During the tests, this receptor was a personal computer, but in a real environment it would be ideally a portable device with a continuous connection to a health emergency center. The transmission protocol used for the accelerometer was I2C.
According to the technical features of the F411RE microcontroller, the current consumption when executing from Flash memory should be as low as 100 µA/MHz. In stop mode the power consumption is lower than 10 µA, which can be considered negligible. Using an I2C protocol for the accelerometer register values from the ADXL345 sensor the current estimated during the reading process is 5 mA. In case of the device without an integrated RNN, the battery is mainly used in the transmission of data, that is determined by the sampling frequency. The current for stage, consisting in transform the values to be sent, was estimated in 5 mA, and the sample sending via Bluetooth was 43 mA considering the power consumption in the specifications. At 25 Hz, the device battery life would be approximately 9.9 h if it is powered with a 150 mAh battery.
Regarding to the device with the simplest LSTM model implemented, the energy consumption comes mainly from the accelerometer values reading and the execution of the algorithm. The current estimated during the RNN execution is 5 mA, although the time spent running it is considerably longer than the transformation of values performed in the previous case (82.5% running for the simplest LSTM model and 57.5% for the simplest GRU model). The remaining power consumption depends on the number of transmissions made to alert on a fall or a risk event detection. According to [1,38], the number of falls of an elderly person is near to once a year. However, we consider in this analysis unfavorable cases, such as the case of people with poor balance or motor difficulties. Figure   Results obtained improve the battery life reported by other works with machine learning solutions [22,26]. Due to this, it can be possible to add new characteristics, such as a wifi module or connection to mobile networks, instead of bluetooth, to directly transmit information without the need for an auxiliary device.
Given the scarcity of datasets that currently exists from falls, that is the biggest problem currently for the improvement of deep learning algorithms, the system should be improved with an infrastructure based on big data analysis, as proposed in [24]. In order not to affect the battery consumption while in use, these wearable devices could integrate a data storage module that saves the data registered during the day, to be synchronized in the cloud when charging the device. This would allow this anonymized data to be used to improve the algorithm.

Conclusions
This work provides a study of the feasibility for the creation of wearable fall detector systems in real time using RNN architectures. The obtained results reveal that the architectures with 1 RNN layer at 25 Hz sampling frequency can be executed into a low power microcontroller in real time. The assessment of the trained models reveals that the reduction in the sampling frequency only affects the effectiveness very slightly. The estimated consumption indicates that it is possible to use small batteries. It allows to design a miniaturized device that is easy and comfortable to wear by the users.
The results in accuracy and specificity are greater or similar to other multi-class fall detector classifiers using accelerometer signals. However, sensitivity is slightly lower. The lack of data on the optimal values used and absence of F1-score metric in these studies did not allow us to make a more exhaustive comparison of effectiveness. In this study, 10-fold cross-validation has been used for greater result reliability, independently of the training subset. This reveals an F1-score deviation depending of the subset used and can explain the differences in sensitivity with other studies with evaluation methods that may be influenced by the dataset split used. In any case, this work focuses mainly on the integration of this type of model in low performance embedded systems. The execution times obtained with the proposed models are much higher than those obtained in [26], allowing real-time prediction using low power microcontrollers and higher battery life.
Due to the fact that these systems can be executed in real time, we consider that this work shows that deep learning RNN architectures are a new approach to the creation of more effective wearable fall detection systems. Therefore, we encourage research on these models, for instance by applying techniques that are already used in traditional machine learning models such as the introduction of features as input data, or reducing the complexity of the proposed models.
In future works a complete fall detection system based on this model will be thoroughly tested with new participants in order to verify the effectiveness in real scenarios. Funding: This work has been partially supported by the Telefonica Chair "Intelligence in Networks" of the Universidad de Sevilla, Spain.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: