Towards personalized environment-aware outdoor gait analysis using a smartphone

Automatic gait analysis in free-living environments using inertial sensors requires an individualized approach, as local acceleration and velocity profiles vary with the walker and the terrain properties of the environment (e.g., walking in the forest vs. walking on sand). Here, we propose a smartphone-based gait assessment architecture consisting of two data processing modules. The first module employs a set of personalized classifiers for automatic recognition of the walking environment. The second module provides accurate step time estimates by selecting the optimal filtering frequency tailored to the predicted environment. The performance of the architecture was evaluated using experimental data collected from 10 participants walking in 10 different conditions typically encountered during daily living. Compared with ground truth data, the architecture successfully recognized the walking environments; the percentage of correctly classified instances was above 92%. It also estimated step time with high accuracy; the mean absolute error was less than 10 ms, outperforming or at least matching the performance levels achieved in controlled laboratory trials (indoor flat-surface walking). Compared with using one filtering frequency for all environments, using the optimal frequency tailored to each environment reduced step time estimation error by more than 39%. To the best of our knowledge, this is the first study which successfully demonstrates that parameter tuning can improve gait characterization in outdoor environments. However, further research using a larger data set (including more participants with varying demographics and degrees of impairment) is needed to confirm this result. Our findings highlight the importance of environment-aware gait analysis, and lay the groundwork for a smartphone-based technology that can be used in the community.


| INTRODUCTION
In recent years, developing cheap and portable technologies for automatic assessment and characterization of how well people walk and maintain balance in free-living environments has gained momentum with potential applications in areas of healthcare, sports science, and surveillance. One promising approach is to use body-worn devices (either custom-built or off-the-shelf including smartwatches and mobile phones) which record movement data using inertial measurement unit (IMU) sensors (e.g., accelerometer and gyroscope) (Tao et al., 2012).
Compared with laboratory-based measurements (performed by human experts using expensive 3D motion analysis sensors), body-worn devices will enable large-scale, continuous, and long-term data collection under more natural conditions. These rich data sets can then be used to investigate how walking patterns vary depending on a wide range of factors: physiological (e.g., fatigue and injury), biomechanical (e.g., walking with an assisting device or carrying a shopping bag), environmental (e.g., walking on irregular surfaces or climbing uphill), and behavioural (e.g., cleaning and shopping) (Ippersiel et al., 2022; Kowalsky et al., 2021; Luo et al., 2020; Yang et al., 2012).
However, analysis and interpretation of outdoor walking data are not straightforward as the local acceleration and velocity profiles obtained from IMU sensors are noisy (e.g., because of undesired motion artefacts), highly variable (due to factors listed above), and not intuitive. The successful recovery of meaningful gait parameters hinges on context-aware data processing methods; that is, methods capable of recognizing under which circumstances a recording is made and processing data accordingly to improve parameter estimation accuracy.
To take a step forward in this direction, this study focuses on one aspect of context-awareness, which is knowing where walking takes place. We propose a new software architecture to achieve personalized, environment-aware outdoor gait analysis (Figure 1). The architecture consists of two modules: an environment classification module, which recognizes the environment in which a recording is made (e.g., while walking uphill or on a pebble beach), and a gait characterization module, which uses an adaptive algorithm to estimate temporal gait parameters. The gait characterization module is adaptive because the predictions made by the environment classification module are used to adjust its parameters and improve its performance. In particular, we concentrate on estimating one clinically relevant gait parameter, step time. When combined with step length, step time is used to predict walking speed (i.e., step length divided by step time).
Within the remit of automatic gait analysis using wearable IMU sensors, the majority of the previous work studied walking in controlled laboratory environments, and the proposed step detection and analysis methods were not adaptive in the sense that they did not consider who was walking and under which conditions walking took place (Avvenuti et al., 2018; Manor et al., 2018; Zhong & Rau, 2020). The proposed software architecture attempts to address both limitations by aiming for adaptive gait analysis in free-living environments. In particular, we study whether it is feasible to recognize outdoor walking environments using classical machine learning methods. We also evaluate whether personalized (i.e., obtaining a separate model for each individual as opposed to one generalized model for all) and environment-aware parameter tuning improves the accuracy of the gait characterization module.
The remainder of the paper is organized as follows. Section 2 discusses related work on environment classification and gait characterization using smartphones or other devices utilizing IMU sensors. Section 3 provides details about the design and implementation of the environment classification and gait characterization modules, as well as experimental procedures including data collection and analysis. Section 4 presents results summarizing the performance of the two modules. Section 5 presents a short summary of the work and highlights its main findings. It also discusses its current limitations as well as potential avenues for future work.
F I G U R E 1 A proposed software architecture for automatic estimation of temporal gait parameters in outdoor environments using a smartphone. The details about the environment classification and gait characterization modules are discussed in Section 3. For this study, phone data were processed offline using a standard laptop. The long-term goal is to run the architecture on the phone for real-time gait analysis.

2 | RELATED WORK

2.1 | Environment classification

Hu et al. (2021) studied walking patterns of 30 participants (15 females and 15 males, age = 23.5 ± 4.2 years old, height = 169.3 ± 21.5 cm and weight = 70.9 ± 13.9 kg). Participants wore six IMU sensors (one on the wrist, one on the lower back, one on each thigh and one on each tibia) and walked 15 m at preferred speed in nine different environments: flat, cobblestone, grass, stairs up, stairs down, uphill, downhill, bank left, and bank right. They evaluated the performance of three complex deep neural networks in recognizing where walking occurred: a convolutional neural network, a long short-term memory network, and a long short-term memory network with global pooling. They showed that the classification accuracy was 84% when only the lower back IMU sensor was included in the analysis, and it increased to 92% when all sensors were included.
This data set is publicly available. We used it in Bunker et al. (2021) to evaluate the classification performance of seven classical machine learning methods (i.e., fuzzy-rough nearest neighbour, vaguely classified nearest neighbour, random forest, decision tree, naive Bayes, support vector machine, and multi-layer perceptron); 82% classification accuracy was achieved, matching the performance of the deep neural networks mentioned above. However, neither study evaluated the generalization performance of the classifiers; that is, whether they could extend successfully to unseen participants.

Dixon et al. (2019) studied the running patterns of 29 participants (14 females and 15 males, age = 23.3 ± 3.6 years old, height = 180 ± 10 cm and weight = 63.6 ± 8.5 kg). Participants wore two IMU sensors (one on the tibia and one on the lower back) and ran on three different surfaces (synthetic track, concrete pavement and wood chip trail). They evaluated the performance of two classifiers (gradient boosting and deep convolutional network) and their variations using a 90% training and 10% testing data split (repeated five times by randomly reshuffling the training and testing data). Above 90% classification accuracy was reported for all classifiers (the best performance being around 97%). Again, there was no mention of how well the classifiers generalized to unseen participants' data.

Benson et al. (2020) studied the running patterns of three groups of participants: Group 1 (28 participants, 10 females and 18 males, age = 32.2 ± 13.4 years old, height = 174 ± 9 cm and weight = 70.5 ± 10.3 kg), Group 2 (25 participants, 13 females and 12 males, age = 36.9 ± 10.1 years old, height = 173 ± 10 cm and weight = 70.2 ± 13 kg) and Group 3 (16 participants, 8 females and 8 males, age = 31.2 ± 10.3 years old, height = 170 ± 9 cm and weight = 67.1 ± 8.1 kg). All participants wore an IMU sensor on the lower back. Group 1 participants ran on a treadmill (duration = 5 min), Group 2 participants ran on a concrete sidewalk (length = 600 m), and Group 3 participants ran in both environments. A binary support vector machine classifier was trained using data from Groups 1 and 2, and its performance was evaluated using 10-fold cross-validation, resulting in 93% accuracy. The authors also evaluated the generalization performance of the model using data from Group 3 (84% accuracy).

Ahamed et al. (2018) compared the running patterns of six participants (five females, age = 47.5 ± 9.6 years old, height = 169 ± 2 cm and weight = 67.4 ± 11.5 kg, and one male, age = 29 years old, height = 170 cm and weight = 75 kg). Participants wore two IMU sensors (one on the lower back and one on a wrist watch) and ran in two different weather conditions: one in winter (−10 °C) and one in spring (6 °C). A random forest classifier (including 100 decision trees) was trained and evaluated using a 70% training and 30% testing data split, resulting in 87% accuracy. The authors also tried training a separate model for each participant, which increased the accuracy to 95%.

| Gait characterization in outdoor environments
Within the scope of automatic gait analysis using IMU sensors, previous research has primarily focused on step detection (i.e., whether someone is walking or not) (Avvenuti et al., 2018) and extraction of step-related spatial and temporal gait parameters (such as cadence, step length, step time, swing-to-stance ratio, double support time, gait asymmetry and variability) during steady walking in indoor environments (Manor et al., 2018; Silsupadol et al., 2019; Zhong & Rau, 2020). Only a handful of studies have looked into gait analysis in free-living environments. Silsupadol et al. (2019) investigated the walking patterns of two groups of participants: Group 1 (12 young adults, 8 females and 4 males, age = 21.4 ± 1.2 years old) and Group 2 (12 older adults, all females, age = 72.4 ± 6.1 years old); participants' height and weight were not reported. All participants carried three smartphones (two on the lower back and one in a shoulder bag) and walked on pedestrian walkways in seven different ways: (1) preferred speed, (2) turn left, (3) turn right, (4) decelerate from normal to slow speed, (5) accelerate from slow to normal speed, (6) accelerate from normal to fast speed, and (7) decelerate from fast to normal speed.
Step time (and other gait parameters including gait speed, cadence, step length and asymmetry) was estimated from the anterior-posterior acceleration channel, which was low-pass filtered at a 2 Hz cut-off frequency. This channel was used to detect heel-strike time points, which were then used to estimate the gait parameters. The step time estimation error varied between 0 and 60 ms and was higher during unsteady walking, including acceleration and deceleration. Weiss et al. (2011) studied 22 Parkinson's participants (7 females and 15 males, age = 65.9 ± 5.9 years old) and 17 control participants (9 females and 8 males, age = 69.9 ± 8.8 years old); participants' height and weight were not reported. Participants wore an IMU sensor (lower back) and walked for a minute in a hospital corridor and outside the hospital. In addition, one Parkinson's and one control participant were asked to wear the sensor at home and outside for three consecutive days. The study compared acceleration profiles between groups and between environments but did not attempt to estimate temporal gait parameters from the data.

| MATERIALS AND METHODS
The study was approved by the Aberystwyth University Ethics Committee Board, and all experiments were conducted in accordance with the Declaration of Helsinki. All participants gave their informed consent before participating in the study.

| Participants
Ten participants from Aberystwyth town were recruited for data collection (4 females and 6 males, age = 29.0 ± 8.7 years old, height = 173 ± 7.9 cm and weight = 78.2 ± 16.2 kg). All participants were healthy with no apparent neurological or physical impairments that could affect their gait or compromise their safety while walking outdoors.

| Data recording mobile app
A custom-built Android app, developed by our research group, was used to record motion data from the embedded sensors of a Google Pixel 4 smartphone at an average sampling rate of 400 Hz. The motion data included time stamps, accelerometer (three-channel) and gyroscope (three-channel) readings. These data were combined with participant information and GPS coordinates of the experimental location, and were transferred to a secure online server for storage and processing. Note that the GPS data were not used in this study.

| Data collection
Experiments were conducted in the Summer of 2021 at eight different outdoor locations in Aberystwyth town: a grass patch, a running track, a pavement, a sandy beach, a pebble beach, a forest track, a road with a slope (for uphill and downhill walking), and a set of stairs (for walking up and down) (Figure 2). These locations were chosen to cover a wide range of walking patterns seen in real life. In each environment, the participants were instructed to walk back-and-forth between two landmarks at preferred speed without stopping.
The landmarks were 14 m apart. In each session, we ensured that at least 1 min of walking data was recorded, except for the sessions performed on the road and stairs. In these locations, the duration of the data recording was doubled to collect data for both walking uphill/going up the stairs and walking downhill/going downstairs. All experiments were repeated twice, and performed in daylight and under supervision to ensure the safety of the participants. To create more realistic walking conditions, we deliberately did not control for external factors such as weather conditions (e.g., windy or rainy day), surface conditions (e.g., beach conditions varied after a tide), or having a crowd in the vicinity (although bystanders were not allowed to cross the experimental path).
During experiments, participants carried the data recording phone in a fixation belt with a phone holder fixed on the lower back, close to the L3 vertebra. The belt was tied tightly around the waist (without causing discomfort to the participants) to reduce motion artefacts. The phone was placed in the holder horizontally with the z-axis corresponding to the anterior-posterior axis (i.e., the direction of walking). In addition, all experiments were recorded using a high-speed camera (GoPro Hero 10, frame rate 240 frames s−1). These videos were annotated manually to measure actual foot-ground contact times (and subsequently step times) as the gold standard. To synchronize phone and GoPro data, at the beginning of each data recording session the experimenter performed a predefined set of motions in the field of view of the camera (i.e., holding the phone at rest close to the chest for 5 s, lifting it up and bringing it down quickly to the resting position, and holding it there for another 5 s). The relative time offset between the phone and the GoPro was estimated by aligning the time points of maximum vertical acceleration.
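To make the synchronization step concrete, the sketch below (Python; variable and function names are hypothetical) shows one way the relative offset could be computed: find the phone sample with maximum vertical acceleration during the predefined motion and compare its timestamp with the manually annotated video time of the same event. This is a minimal illustration of the alignment idea, not the exact implementation used in the study.

```python
import numpy as np

def estimate_time_offset(phone_t, phone_acc_vertical, gopro_event_time, search_window_s=15.0):
    """Estimate the phone-to-GoPro time offset from the synchronization motion.

    phone_t            : timestamps of phone samples (s)
    phone_acc_vertical : vertical acceleration channel (m/s^2)
    gopro_event_time   : manually annotated video time (s) of the peak of the
                         lift-up/bring-down motion
    The peak is searched only within the first `search_window_s` seconds,
    where the predefined synchronization motion was performed.
    """
    mask = (phone_t - phone_t[0]) <= search_window_s
    peak_idx = np.argmax(np.abs(phone_acc_vertical[mask]))
    phone_event_time = phone_t[mask][peak_idx]
    return gopro_event_time - phone_event_time

# Usage (hypothetical): shift GoPro annotations into the phone's time base.
# offset = estimate_time_offset(t, az, gopro_peak_time)
# heel_strike_times_phone = heel_strike_times_video - offset
```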

| Data preparation
In total, we collected 220 min of phone data, comprising 20 data sets per participant (10 walking environments × 2 repetitions). Each recording was inspected visually to identify intervals of steady straight walking; that is, turnings were excluded from the analysis. The extracted steady walking data were used for training and validating classifiers in the environment classification module, and for estimating step time in the gait characterization module.
3.5 | Part 1: Environment classification

3.5.1 | Feature extraction

Each recording was divided into short segments using a 2-s sliding window with 50% overlap. This resulted in at least 25 segments per trial (i.e., data instances). From each segment, seven time domain features (min, max, mean, SD, skewness, kurtosis and number of zero crossings) and two frequency domain features (dominant frequency and its amplitude) were extracted, leading to 54 features in total; nine features × six channels (three channels from the accelerometer and three from the gyroscope).
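As an illustration of this step, the following Python sketch computes the nine per-channel features from 2-s windows with 50% overlap, yielding the 54-dimensional feature vector described above. It assumes a 400 Hz sampling rate, and the exact feature definitions (e.g., of zero crossings) may differ slightly from the study's implementation.

```python
import numpy as np
from scipy.stats import skew, kurtosis

def channel_features(x, fs):
    """Seven time domain and two frequency domain features from one channel."""
    x_centered = x - x.mean()
    zero_crossings = np.sum(np.diff(np.signbit(x_centered).astype(int)) != 0)
    spectrum = np.abs(np.fft.rfft(x_centered))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    dom = np.argmax(spectrum[1:]) + 1  # skip the DC bin
    return [x.min(), x.max(), x.mean(), x.std(), skew(x), kurtosis(x),
            zero_crossings, freqs[dom], spectrum[dom]]

def extract_features(channels, fs=400, win_s=2.0, overlap=0.5):
    """channels: (n_samples, 6) array of accelerometer and gyroscope data.
    Returns an (n_segments, 54) feature matrix (9 features x 6 channels)."""
    win = int(win_s * fs)
    step = int(win * (1 - overlap))
    rows = []
    for start in range(0, channels.shape[0] - win + 1, step):
        seg = channels[start:start + win]
        rows.append(np.concatenate([channel_features(seg[:, c], fs)
                                    for c in range(seg.shape[1])]))
    return np.asarray(rows)
```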

| Classifier training and testing
The training and testing data formed a 2D array consisting of different data segments (rows) and 54 features (columns). The participant ID and environment decision class were added as the 55th and 56th columns, respectively.
Previous studies on human activity recognition have shown that training multiple personalized models (one classifier per person) could lead to better classification performance than training one generalized model for multiple people (Mannini & Intille, 2018). In our data set, we also expected a degree of inter-participant variability. To compare personalized versus generalized models, two distinct classifier-training methods were followed. First, one classifier was obtained for all participants. Nine participants were chosen to train the classifier, and the remaining participant was used to evaluate its true performance (onefold). This process was repeated 10 times by reshuffling the participants in the training and testing data sets (10-fold in total). Second, for each participant, a separate classifier was obtained. In this case, 90% of the personalized data were used to train a classifier, and the remaining 10% was used to evaluate its performance (onefold). Again, this process was repeated 10 times by randomly reshuffling the data in the training and test data sets (10-fold in total).
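The two training schemes can be sketched as follows using scikit-learn, with a random forest as a stand-in for the classifiers evaluated in the study: leave-one-participant-out cross-validation for the generalized model, and per-participant 10-fold cross-validation for the personalized models. This is a sketch under those assumptions, not the study's exact tooling.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneGroupOut, StratifiedKFold, cross_val_score

def generalized_scores(X, y, participant_id):
    """Generalized model: train on nine participants, test on the held-out one."""
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    return cross_val_score(clf, X, y, groups=participant_id, cv=LeaveOneGroupOut())

def personalized_scores(X, y, participant_id):
    """Personalized models: a separate classifier per participant, 10-fold CV each."""
    scores = {}
    for pid in np.unique(participant_id):
        m = participant_id == pid
        clf = RandomForestClassifier(n_estimators=100, random_state=0)
        cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
        scores[pid] = cross_val_score(clf, X[m], y[m], cv=cv)
    return scores
```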

| Classifier evaluation
For all classifiers, performance was evaluated using 10-fold cross-validation (including 10× randomization at each fold) as described above. The classification accuracy was reported as the percentage of correctly classified instances. The stability of each classifier was evaluated based on the coefficient of variation (SD divided by mean). The class confusion matrix of one of the high-performing classifiers was also visualized to investigate the similarity of walking patterns across different environments. We predicted higher confusion (i.e., more incorrectly classified instances) among environments where participants exhibited similar walking patterns (e.g., pavement and track).
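For reference, these evaluation metrics map onto a short sketch like the one below; `clf`, `X_p` and `y_p` stand for a trained classifier and one participant's feature matrix and labels from the previous example, and are assumptions rather than names used in the study.

```python
import numpy as np
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_predict

def coefficient_of_variation(fold_accuracies):
    """Classifier stability: SD of the fold accuracies divided by their mean."""
    return np.std(fold_accuracies) / np.mean(fold_accuracies)

# Cumulative confusion matrix over the cross-validation folds for one classifier:
# y_pred = cross_val_predict(clf, X_p, y_p, cv=10)
# cm = confusion_matrix(y_p, y_pred)   # rows: true environments, columns: predicted
```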

| Part 2: Gait characterization
The goal of the gait characterization module is to accurately estimate temporal gait parameters such as step time, its variance, and swing-to-stance ratio, and the goal of this study is to improve the robustness of the gait characterization module in outdoor conditions by making it more adaptive to the environment in which walking takes place.
The gait characterization module starts with filtering the data to remove high-frequency noise that is not related to walking (Phase 1). It then detects heel-strike and toe-off time points using a peak detection algorithm (Phase 2). Next, it estimates step times by measuring the time difference between two consecutive heel-strike or toe-off time points (Phase 3). Finally, a post-processing algorithm looks for missed steps (false-negatives) or pseudo steps (false-positives) by analysing the distribution of estimated step times (Phase 4). If the distribution of the estimated step times does not match the expected distribution, the algorithm changes the peak detection threshold and reanalyses the data (repeats Phase 2, 3 and 4). More details about the gait characterization module can be found in .
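A minimal sketch of Phases 1-3 is shown below (Python/SciPy). The filter order, peak prominence and minimum step interval are illustrative assumptions, and the threshold-adjustment loop of Phase 4 is omitted; the actual module adjusts the peak detection threshold when the step time distribution looks implausible.

```python
import numpy as np
from scipy.signal import butter, filtfilt, find_peaks

def estimate_step_times(acc_z, fs=400, cutoff_hz=10.0, min_step_s=0.3):
    """Sketch of Phases 1-3: low-pass filter, detect heel-strike/toe-off peaks,
    and compute step times from consecutive heel strikes."""
    # Phase 1: zero-phase low-pass filtering to remove non-gait noise
    b, a = butter(4, cutoff_hz / (fs / 2), btype="low")
    acc_f = filtfilt(b, a, acc_z)

    # Phase 2: toe-off ~ positive peaks (maximum forward acceleration),
    #          heel-strike ~ negative peaks (maximum deceleration)
    distance = int(min_step_s * fs)
    toe_off, _ = find_peaks(acc_f, distance=distance, prominence=0.5)
    heel_strike, _ = find_peaks(-acc_f, distance=distance, prominence=0.5)

    # Phase 3: step time = interval between consecutive heel strikes
    step_times = np.diff(heel_strike) / fs
    return step_times, heel_strike, toe_off
```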
The peak detection algorithm (Phase 2) runs under the assumption that the negative and positive peaks prominent in the forward acceleration (recorded by the accelerometer z-channel) align well with heel-strike (i.e., when maximum deceleration occurs) and toe-off time points (i.e., when maximum acceleration occurs), respectively. Previous studies have shown that this alignment assumption holds reasonably true when analysing data recorded in controlled laboratory conditions (Khandelwal & Wickström, 2017); however, it has not been tested on data recorded in outdoor environments. Walking on uneven and granular surfaces or against the wind may change the acceleration profile of a participant, creating additional peaks or shifting existing ones in time. This would negatively impact the accuracy of step time estimation. In addition, these environment-dependent acceleration profiles may vary from person to person, necessitating tailoring the gait characterization module to each participant.
3.6.1 | Identifying the optimal cut-off frequency and step time estimation error

One parameter that impacts the performance of the gait characterization module is the cut-off frequency of the low-pass filter (Phase 1). In the original implementation, we proposed a relatively high cut-off frequency (10 Hz) to preserve the walking-related dominant frequency and its harmonics in the acceleration and angular velocity profiles. To evaluate whether lowering the cut-off frequency improves step time estimation, we ran the gait characterization module using different cut-off frequencies (varied between 2 and 10 Hz in 1 Hz increments), and saved the optimal frequency that led to the minimum step time estimation error (for each person and environment). The estimation error was calculated as the mean absolute error (e) between predicted and actual step time measurements (obtained from the camera). The relative change in error was also calculated as Δe = (ê − e)/ê × 100%, where ê is the estimation error when the cut-off frequency was fixed at 10 Hz. Δe varies between 0% (ê = e) and 100% (e = 0); the higher Δe, the better the optimal cut-off frequency. Figure 3a shows three examples of how estimation error varied as a function of filtering frequency: participant 1 walking on sand, participant 1 walking downhill, and participant 2 walking on sand. The relationship between estimation error and frequency differed between participant 1 and participant 2. In participant 1, frequencies <7 Hz resulted in lower estimation errors, whereas in participant 2 frequencies >6 Hz resulted in lower estimation errors. When data were filtered using the optimal frequency, the peaks aligned better with heel-strike and toe-off time points (Figure 3b-d).
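The frequency sweep and the relative error reduction Δe can be expressed compactly as in the sketch below, reusing the hypothetical `estimate_step_times` helper from the previous sketch; the simple index-based pairing of estimated and ground-truth steps is an assumption.

```python
import numpy as np

def optimal_cutoff(acc_z, true_step_times, fs=400, candidates=range(2, 11)):
    """Sweep cut-off frequencies (2-10 Hz, 1 Hz steps) and return the one that
    minimizes the mean absolute step time error against camera ground truth."""
    errors = {}
    for fc in candidates:
        est, _, _ = estimate_step_times(acc_z, fs=fs, cutoff_hz=fc)
        n = min(len(est), len(true_step_times))      # naive step pairing
        errors[fc] = np.mean(np.abs(est[:n] - np.asarray(true_step_times)[:n]))
    best_fc = min(errors, key=errors.get)
    e_hat = errors[10]                               # error with the default 10 Hz filter
    delta_e = 100.0 * (e_hat - errors[best_fc]) / e_hat
    return best_fc, errors[best_fc], delta_e
```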

| Statistical analysis
Two statistical tests were performed: one to compare the performance of personalized and generalized classifiers, and one to evaluate whether tuning the cut-off frequency resulted in lower step time estimation error. In both cases, a one-sample Kolmogorov-Smirnov test was performed to evaluate whether the data came from a normal distribution (in each group). The null hypotheses were rejected at the 5% significance level; that is, the distributions were not normal. Hence, non-parametric Wilcoxon rank sum tests, followed by Tukey-Kramer multiple comparison tests, were performed to evaluate whether mean accuracy differed at the 5% (or lower) significance level.
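The statistical procedure maps onto standard SciPy routines roughly as follows; this is a sketch only (the study used MATLAB equivalents, and standardizing the data before the one-sample Kolmogorov-Smirnov test is a simplification).

```python
from scipy import stats

def compare_accuracies(personalized, generalized, alpha=0.05):
    """Normality check followed by a non-parametric group comparison."""
    normal = all(
        stats.kstest(stats.zscore(group), "norm").pvalue > alpha
        for group in (personalized, generalized)
    )
    if normal:
        return stats.ttest_ind(personalized, generalized)
    # Wilcoxon rank sum test for non-normal distributions
    return stats.ranksums(personalized, generalized)
```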

| RESULTS
All data preprocessing and analysis were performed offline using custom-built Matlab and Python scripts on a standard laptop. All results were reported as either mean ± SD of the mean (for continuous variables such as step time) or median ± interquartile range (IQR) (for the optimal cut-off frequency).

| Environment classification
The performance of personalized classifiers was high, with an average accuracy of 92.3 ± 5.3% (Table 1). Inter-participant variation in classifier performance was up to 10%, with participants 1 and 7 having the lowest and highest accuracy, respectively (87.6 ± 5.4% and 98.1 ± 2.8%). Similarly, there was also up to 10% variation across classifiers, with J48 and MLP having the lowest and highest accuracy, respectively (84.5 ± 4.4% and 96.8 ± 2.0%). On rare occasions, classifiers failed to identify the correct environment; the majority of mislabelling occurred between similar environments, such as pavement and track, or pebble and sand (e.g., see Figure 4). Overall, the classifiers also had high stability.

F I G U R E 3 (a) Step time estimation error as a function of low-pass filter cut-off frequency. Three examples are shown: participant 1 walking on sand (black), participant 1 walking downhill (cyan), and participant 2 walking on sand (magenta). Circle markers indicate the optimal cut-off frequencies resulting in the lowest estimation error for each data set. Note that the optimal frequency for each data set is distinct. (b) Phone data from participant 1 (sand): low-pass filtered at the optimal frequency of 3 Hz (top) and at 10 Hz as a control (bottom). (c) Phone data from participant 1 (downhill): filtered at the optimal frequency of 6 Hz (top) and at 10 Hz (bottom). (d) Phone data from participant 2 (sand): filtered at the optimal frequency of 7 Hz (top) and at 10 Hz (bottom). Vertical lines correspond to actual toe-off (dashed grey line) and heel-strike (solid grey line) time points obtained from the video camera.

Personalized classifiers performed significantly better than generalized classifiers, which were trained and tested on different participants (nine participants for training and one participant for testing) (p < 0.01). The accuracy of generalized classifiers reached a ceiling around 33%: 28.5 ± 3.2% (FNN), 19.9 ± 1.5% (J48), 31.8 ± 4.0% (JRip), 30.7 ± 2.7% (NN), 30.8 ± 5.6% (NB), 23.6 ± 2.4% (SMO), 27.2 ± 3.4% (MLP). Generalized models were also more complex than personalized models. For instance, on average a generalized J48 classifier (a decision tree) had almost 17 times more leaves than a personalized J48 classifier (679 vs. 38), suggesting that there were no overlapping rules among participants, probably due to inter-participant variability.

| Step time estimation
The optimal cut-off frequency minimizing step time estimation error was different for each participant and environment, and there was no single frequency that could be associated with a particular participant or environment (Table 2). Most cases had either 2 Hz (38%) or 3 Hz (17%) as the optimal frequency. In five walking conditions the median frequency was less than 3 Hz: 2.5 ± 2.5 Hz (downstairs), 2.5 ± 3.5 Hz (forest), 2.5 ± 3.0 Hz (track), 2.5 ± 3.25 Hz (uphill) and 2.5 ± 2.5 Hz (upstairs), and in one condition (sand) it was 3.5 ± 2.75 Hz. In the other four conditions (downhill, grass, pavement and pebbles), the optimal frequency was more variable; for instance, 5.5 ± 4.25 Hz on grass.

F I G U R E 4 Naive Bayes (NB) classifier confusion matrix, calculated as a cumulative sum over the 10-fold cross-validation. Light colours indicate higher numbers.
With optimal cut-off frequency, the mean step time estimation error was 8.6 ± 15.4 ms (Table 4). Again, the amount of error varied among participants and environments. The minimum error was measured for participant 7 (2.9 ± 1.9 ms) and sand (5.1 ± 4.4 ms) whereas the maximum error was measured for participant 10 (28.6 ± 26.2 ms) and downhill (16.2 ± 32.1 ms).
T A B L E 2 Optimal cut-off frequency (Hz) for each participant and environment. Note: The last two rows show the median and interquartile range across participants (same environment). The last two columns show the median and interquartile range across environments (same participant). The overall median and IQR when all data were combined were 3.0 and 4.0 Hz, respectively.

T A B L E 3 Reduction in step time estimation error (percentage) when the optimal cut-off frequency was used. Note: The last two rows show the mean reduction (and SD) across participants (same environment). The last two columns show the mean reduction (and SD) across environments (same participant). The overall mean reduction when all data were combined was 39%.

| Summary
This study presents a novel software architecture which uses an adaptive method to successfully characterize gait in different outdoor environments, and lays the groundwork for a smartphone-based gait assessment technology that can be used in the community. The architecture relies on two data processing modules: environment classification module which predicts the environmental conditions under which walking takes place, and a gait characterization module which uses an environment-specific algorithm to estimate the temporal gait parameters of the walker.
The utility of the architecture was demonstrated by analysing real-world data collected from 10 participants walking in 10 different conditions.
The main results are summarized below:
1. The accuracy of the environment classification module was above 90%.
2. It was not possible to obtain a single classifier that generalized to all participants. Instead, a separate classifier was obtained for each participant.
3. The step time estimation error was less than 10 ms, which is less than (or equal to) error values reported for indoor, flat-surface walking.
4. Tailoring the filtering frequency to each person and environment reduced step time estimation error by 39%.
5. The optimal filtering frequency was different for each person and environment.
To the best of our knowledge, this is the first study which successfully demonstrates that parameter tuning can improve gait characterization in outdoor environments.

| Analysis of results
Why did generalized classifiers have low classification accuracy? The most plausible explanation is that the inter-participant variability (variability in how participants walk) was higher than the inter-environment variability (variability of walking in different environments). In other words, the gait variability between two participants walking in the same environment was higher than the gait variability of the same participant walking in two different environments. This was rather expected, as everyone has a distinct gait depending on a multitude of biomechanical and physiological factors (Winter, 1991), so much so that gait can be used as a human identification tool (Nixon et al., 2010), similar to a fingerprint. Alternatively, the low classification accuracy could be due to the selected feature set, which included only the most basic time and frequency domain features. It remains to be seen whether adding new features (e.g., see Lubba et al., 2019) or using deeper networks (capable of learning their own features) would improve the classification performance.
The personalized classifiers had an average classification accuracy higher than 90%, which was very encouraging. Incorrect classifications typically occurred across potentially similar conditions. The percentage of incorrect classifications was around 10% between track and pavement (both environments had a flat surface), and between sand and pebble beach (both environments had a granular surface, although at a different scale).

T A B L E 4 Step time estimation error (in milliseconds) when the optimal cut-off frequency was used

There was also some confusion between pavement and grass, pavement and pebble beach, pavement and forest, and forest and sand. On some occasions, participants deviated from their path or changed their gait based on external factors; for instance, to avoid a rock (pebble beach), a tree branch (forest) or a pedestrian walking nearby. These gait perturbations also changed the acceleration profiles and may have contributed to the incorrect classifications to a certain extent.
Overall, the gait characterization module had an outstanding performance in estimating step time. The average step time estimation error was less than 10 ms, which was a marked improvement compared with previous indoor studies reporting error values between 10 and 20 ms (e.g., 13 ms in Del Din et al. (2015) and 19 ms in Kim et al. (2015)). However, it is worth noting that previous studies evaluated performance using larger data sets (including participants with varying demographics and degrees of gait impairment). In our study, there were a few exceptional cases (nine in total) where the estimation error was relatively high (>25 ms, or 5% assuming an average step time of 500 ms). For instance, it was 100 ms when participant 3 walked downhill. In this particular case, the participant walked very fast (almost running) and took larger steps, shifting the position of local peaks in the acceleration profile (Figure 5a). Notably, five of these nine cases came from participant 10. Further analysis showed that this was because the peaks in this participant's acceleration profile did not align with heel-strike and toe-off time points as well as they did in other participants (Figure 5b), violating the key assumption of the gait characterization module.
The results highlight the importance of parameter tuning in smartphone-based outdoor gait characterization; that is, tuning the low-pass filter cut-off frequency reduced step time estimation error by as much as 100%. The optimal frequency varied between 2 and 9 Hz, and on average the error reduction varied between 8% and 55% across participants and between 20% and 55% across environments. In general, lowering the cut-off frequency improved prediction performance; the optimal cut-off frequency was equal to or lower than 5 Hz in 70% of the instances, and equal to or lower than 3 Hz in 55% of the instances. We speculate that walking outside required more corrective movements (for instance, to maintain balance after stepping on a stone).
These movements were typically fast and transient, creating high-frequency noise in the data. Filtering data with a relatively low cut-off frequency might have enhanced the peaks created by the fundamental stepping frequency, which was typically less than 2 Hz.

| Limitations of the study and future work
While the proposed software architecture focused on estimating one gait parameter, step time, it can be easily extended to estimate other temporal parameters (e.g., swing to stance ratio and left-right asymmetry). In addition, there is a prospect for estimating spatial gait parameters which are associated with mobility (e.g., step length) and dynamic stability (e.g., step width) (Brach et al., 2005;Sekiya et al., 1997). Several methods have been proposed for estimating step length from IMU data (Klein & Asraf, 2020;Köse et al., 2012;Zijlstra & Hof, 2003). So far, these algorithms have been almost exclusively tested on indoor data, and further studies are needed to evaluate their performance on outdoor data.
Apart from filtering frequency, there are other parameters inside the gait characterization module that can be tuned to improve performance (e.g., amplitude threshold in the peak detection algorithm). The long-term goal is to include a new optimization module capable of tuning these parameters automatically on the fly without needing a priori training. This new module will make the architecture more adaptive to unseen participants and environmental conditions.
We are currently working on improving the gait characterization module to handle fast, unsteady and impaired walking. In these conditions, acceleration profiles change unpredictably. Hence, more reliable and assumption-free step detection and characterization methods are needed.
One promising approach is training a deep neural network to predict foot contact times (Kidziński et al., 2019). In particular, long short-term memory networks and transformers, which are capable of learning temporal dependencies, would be suitable for this task.
To expand the project, we have started recording data from more participants with diverse backgrounds, including healthy participants from different age groups and participants with neurological movement disorders (e.g., stroke and Parkinson's disease), and in more dynamic environments (e.g., while walking in the town centre or shopping in a grocery store). The new data set will pose new challenges in recognizing the 'context' (who is walking and where walking takes place), and will provide a rich test bed to evaluate the performance of the improved gait characterization module.

F I G U R E 5 (a) Phone data from participant 3 and participant 4 during downhill walking. (b) Phone data from participant 10 and participant 4 during climbing stairs. Vertical lines correspond to actual toe-off (dashed grey line) and heel-strike (solid grey line) time points obtained from the video camera.
Up until now, all data processing has been done offline using a standard laptop. The next step is to implement these computations on the phone itself to provide real-time feedback to participants and healthcare professionals. The readout from the phone sensors was kept as high as possible (i.e., 400 Hz) to obtain high-resolution data. It is desirable, however, to lower the sampling rate to reduce data bandwidth and hence improve phone battery life. Preliminary results from an ongoing investigation in our research group suggest that 100 Hz is sufficient to maintain high performance.
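As a sketch of how such a check might look, the recorded 400 Hz signals can be decimated to 100 Hz before re-running the gait characterization pipeline; `acc_z` stands for one recorded forward-acceleration channel and `estimate_step_times` is the hypothetical helper from the earlier sketch, with SciPy's default anti-aliasing settings assumed.

```python
from scipy.signal import decimate

# Downsample a 400 Hz acceleration channel to 100 Hz (factor of 4) with an
# anti-aliasing filter, then re-estimate step times at the lower rate.
acc_z_100 = decimate(acc_z, q=4, zero_phase=True)
step_times_100, _, _ = estimate_step_times(acc_z_100, fs=100)
```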
Similarly, we are investigating how the position and orientation of the phone affect its sensitivity. It is not unreasonable to assume that if the phone is placed in one of the front pockets (which is more realistic in daily living scenarios), its sensors will be less sensitive to steps from the contralateral side than to steps from the ipsilateral side. A naive solution to mitigate this problem would be to use separate amplitude thresholds for each side.
However, peaks in the accelerometer data generated by the steps from the contralateral side may not be as prominent (or may be distorted), warranting further data processing.
This study lays the groundwork for a smartphone-based gait assessment technology that can be used in the community for long-term continuous health monitoring.

AUTHOR CONTRIBUTIONS
Otar Akanyeti proposed the study. Megan Taylor Bunker developed the data recording mobile app. Megan Taylor Bunker and Arshad Sher collected the data. Otar Akanyeti and Arshad Sher analysed the data. Otar Akanyeti and Arshad Sher wrote and edited the manuscript.